PLPD: reliable protein localization prediction from imbalanced and overlapped datasets

Subcellular localization is one of the key functional characteristics of proteins. An automatic and efficient method for predicting protein subcellular localization is highly desirable given the need for large-scale genome analysis. From a machine learning point of view, a protein localization dataset has several challenging characteristics: it has many classes (there are more than 10 localizations in a cell), it is multi-label (a protein may occur in several different subcellular locations), and it is highly imbalanced (the number of proteins per localization differs markedly). Although many previous works have addressed the prediction of protein subcellular localization, none of them effectively tackles all of these characteristics at the same time. A new computational method for protein localization is therefore needed to obtain more reliable outcomes. To address this issue, we present PLPD, a protein localization predictor based on D-SVDD, which can estimate the likelihood of a specific localization for a protein more easily and more accurately. Moreover, we introduce three measurements for a more precise evaluation of protein localization predictors. Results on various datasets derived from the experiments of Huh et al. (2003) show that PLPD represents a different approach that might play a complementary role to existing methods, such as the nearest neighbor method and the covariant discriminant method. Finally, after finding a good boundary for each localization using the 5184 classified proteins as training data, we predicted the localizations of 138 proteins whose subcellular localizations could not be clearly observed in the experiments of Huh et al. (2003).
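The idea described above, one boundary per localization plus a likelihood-like score so that a protein can fall into several localizations (or none), can be sketched as follows. This is a deliberately simplified stand-in and not D-SVDD or the authors' PLPD: each boundary is a plain Euclidean hypersphere around the class mean, with no kernel, no support vectors and no density weighting. The function names (`fit_sphere`, `fit_all`, `predict`), the toy 2-D feature vectors and the acceptance threshold of 1.0 are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Sphere = Tuple[List[float], float]  # (centre, radius)


def _dist(a: Vector, b: Vector) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fit_sphere(points: List[Vector]) -> Sphere:
    """Hypothetical stand-in for an SVDD boundary: centre = class mean,
    radius = distance to the farthest training point. A real SVDD solves a
    quadratic program (optionally in a kernel space); this does not."""
    dim = len(points[0])
    centre = [sum(p[i] for p in points) / len(points) for i in range(dim)]
    radius = max(_dist(p, centre) for p in points)
    return centre, radius


def fit_all(train: Dict[str, List[Vector]]) -> Dict[str, Sphere]:
    """Fit one boundary per localization, using only that localization's proteins."""
    return {loc: fit_sphere(pts) for loc, pts in train.items()}


def _relative_distance(x: Vector, sphere: Sphere) -> float:
    """Distance to the centre divided by the radius; <= 1.0 means inside."""
    centre, radius = sphere
    d = _dist(x, centre)
    if radius == 0.0:
        return 0.0 if d == 0.0 else math.inf
    return d / radius


def predict(models: Dict[str, Sphere], x: Vector) -> List[Tuple[str, float]]:
    """Every localization whose boundary contains x, best (smallest ratio) first.
    More than one hit is a multi-label prediction; no hit means 'unknown'."""
    hits = [(loc, _relative_distance(x, s)) for loc, s in models.items()]
    return sorted((h for h in hits if h[1] <= 1.0), key=lambda h: h[1])


# Toy usage with made-up 2-D feature vectors (not real protein data).
train = {
    "nucleus": [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)],
    "cytoplasm": [(0.8, 0.8), (2.0, 2.0)],
    "mitochondrion": [(5.0, 5.0), (6.0, 5.0)],
}
models = fit_all(train)
print(predict(models, (0.9, 0.9)))  # two hits: nucleus, then cytoplasm
```

Because each localization is described only by its own proteins, a small localization is not outvoted by a large one inside a single multi-class model, which is one plausible way per-class boundaries help with imbalance. A real SVDD additionally lets each boundary take non-spherical shapes through a kernel.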
Publisher
OXFORD UNIV PRESS
Issue Date
2006-10
Language
English
Article Type
Article
Keywords

AMINO-ACID-COMPOSITION; SUPPORT VECTOR MACHINES; SUBCELLULAR LOCATION PREDICTION; FUNCTIONAL DOMAIN COMPOSITION; BUDDING YEAST; SACCHAROMYCES-CEREVISIAE; STRUCTURAL CLASSES; SEQUENCE; DATABASE; GENOME

Citation

NUCLEIC ACIDS RESEARCH, v.34, no.17, pp.4655 - 4666

ISSN
0305-1048
DOI
10.1093/nar/gkl638
URI
http://hdl.handle.net/10203/91787
Appears in Collection
BiS-Journal Papers