Post-translational modification (PTM) of proteins is a crucial event involved in almost every biological process. A tremendous amount of PTM information is buried in the biomedical literature indexed in NCBI PubMed, and the number of published papers is growing continuously. To utilize this information, we have developed a literature mining system that extracts PTM information from the biomedical literature. Our system can extract information on five PTM types (acetylation, hydroxylation, methylation, glycosylation, and phosphorylation) from abstracts. The system uses a pattern-template-based method as well as extra-clausal methods. Using this system, we have constructed a prototype PTM database containing the extracted PTM information.
A promising application of this system is in mass spectrometry. Validating high-throughput mass spectrometry experiments is difficult, and detecting PTMs from mass spectral data is especially hard. Many researchers consult PTM databases such as dbPTM and UniProt for validation. However, those databases cannot cover all recent findings because they are manually updated by experts. A PTM text mining system can address this problem by automatically extracting PTM information from published papers.
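To make the pattern-template idea concrete, the following is a minimal sketch in Python and not the system's actual implementation. The two templates, the simplistic one-token protein pattern, and the names `PTM_STEMS`, `TEMPLATES`, `PTMMention`, and `extract_ptm_mentions` are all assumptions introduced for illustration; a real system would rely on a protein name recognizer and a much larger template set, and the extra-clausal methods are not covered here.

```python
import re
from typing import List, NamedTuple


class PTMMention(NamedTuple):
    """One extracted (PTM type, substrate) pair. Hypothetical record type."""
    ptm_type: str
    substrate: str


# Word stems for the five PTM types named in the abstract.
PTM_STEMS = {
    "acetylat": "acetylation",
    "hydroxylat": "hydroxylation",
    "methylat": "methylation",
    "glycosylat": "glycosylation",
    "phosphorylat": "phosphorylation",
}

STEM = "(?P<stem>" + "|".join(PTM_STEMS) + ")"
# Assumption: a protein name is a single alphanumeric token (e.g. "p53", "H3").
PROTEIN = r"(?P<protein>[A-Za-z][A-Za-z0-9-]*)"

# Two illustrative templates:
#   "<PTM>ion of <PROTEIN>"      e.g. "phosphorylation of p53"
#   "<PROTEIN> is/was <PTM>ed"   e.g. "H3 is methylated"
TEMPLATES = [
    re.compile(rf"\b{STEM}ion of {PROTEIN}", re.IGNORECASE),
    re.compile(rf"\b{PROTEIN} (?:is|was) {STEM}ed\b", re.IGNORECASE),
]


def extract_ptm_mentions(text: str) -> List[PTMMention]:
    """Apply every template to the text and collect the matched pairs."""
    mentions = []
    for template in TEMPLATES:
        for match in template.finditer(text):
            ptm_type = PTM_STEMS[match.group("stem").lower()]
            mentions.append(PTMMention(ptm_type, match.group("protein")))
    return mentions


if __name__ == "__main__":
    abstract = "Phosphorylation of p53 at Ser15 was observed."
    for mention in extract_ptm_mentions(abstract):
        print(mention.ptm_type, mention.substrate)
```

The leading word boundary keeps the templates from firing on words such as "demethylation", which illustrates why template-based extraction needs careful pattern design before its output is loaded into a database.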