Identification of protein homology using domain architecture comparison
(Korean title: A study of protein homology using domain architecture comparison)

Homology identification is the first step in many genomic studies, and existing approaches can be divided into sequence-based and domain-based methods. Sequence-based homology searches can produce false assignments, especially when multidomain proteins are compared. Here we propose a new method that detects homologs through explicit comparison of domain architecture, that is, the organization of domains within a protein. A major obstacle to comparing domain architectures is the existence of ‘promiscuous’ domains, which carry out auxiliary functions and appear in many unrelated proteins. To distinguish these promiscuous domains from other domains, we assigned a weight score to each domain extracted from RefSeq proteins based on its abundance and versatility. To measure these two properties, we defined the Inverse Abundance Frequency (IAF) and the Inverse Versatility (IV), both derived from the Inverse Document Frequency (IDF), a statistic commonly used in information retrieval. A domain’s score represents its importance in the “protein world” and is used when comparing domain architectures.

We examined the weight scores of 215 known promiscuous domains to determine the relationship between weight score and domain promiscuity. All of the known promiscuous domains have very low weight scores, most of them below 10. In addition to single domains, domain combinations were also scored.

To measure the similarity between two domain architectures, we compared their domain content, domain order, and domain duplication. For this we used three indices drawing on methods used in information retrieval: cosine similarity, the Goodman-Kruskal gamma statistic, and a domain duplication index. The three indices were combined into a final similarity score between two domain architectures, and the COG database was used to determine the best combination of the three. Our method was compared with ot...
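The weighting and similarity steps described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the method from the thesis: the abstract does not give the IAF, IV, or duplication-index formulas, so the ones below (IAF = log(N / n_d), IV = log(D / (1 + number of partner domains)), weight = IAF × IV, and a min/max copy-number ratio over shared domains) are hypothetical stand-ins, and the combination coefficients (0.5, 0.3, 0.2) are placeholders for the values tuned on COG. Cosine similarity and the Goodman-Kruskal gamma are used in their standard forms, with gamma computed over the first positions of the domains both architectures share. The toy proteins are invented for illustration.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Set, Tuple

# A domain architecture: domain names listed from N- to C-terminus.
Architecture = Sequence[str]


def domain_weights(proteins: List[Architecture]) -> Dict[str, float]:
    """IDF-style domain weights. The IAF and IV formulas are illustrative assumptions."""
    n_proteins = len(proteins)
    containing: Counter = Counter()      # number of proteins that contain each domain
    partners: Dict[str, Set[str]] = {}   # distinct domains each domain co-occurs with
    for arch in proteins:
        unique = set(arch)
        containing.update(unique)
        for d in unique:
            partners.setdefault(d, set()).update(unique - {d})
    n_domains = len(containing)
    weights = {}
    for d, n_d in containing.items():
        iaf = math.log(n_proteins / n_d)                   # abundant domains -> low IAF
        iv = math.log(n_domains / (1 + len(partners[d])))  # versatile domains -> low IV
        weights[d] = iaf * iv                              # assumed combination rule
    return weights


def content_similarity(a: Architecture, b: Architecture, w: Dict[str, float]) -> float:
    """Cosine similarity of weighted domain-presence vectors."""
    sa, sb = set(a), set(b)
    dot = sum(w.get(d, 0.0) ** 2 for d in sa & sb)
    norm_a = math.sqrt(sum(w.get(d, 0.0) ** 2 for d in sa))
    norm_b = math.sqrt(sum(w.get(d, 0.0) ** 2 for d in sb))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def order_gamma(a: Architecture, b: Architecture) -> float:
    """Goodman-Kruskal gamma over the first positions of shared domains."""
    pos_a: Dict[str, int] = {}
    pos_b: Dict[str, int] = {}
    for i, d in enumerate(a):
        pos_a.setdefault(d, i)
    for i, d in enumerate(b):
        pos_b.setdefault(d, i)
    shared = [d for d in pos_a if d in pos_b]
    concordant = discordant = 0
    for i in range(len(shared)):
        for j in range(i + 1, len(shared)):
            dx = pos_a[shared[i]] - pos_a[shared[j]]
            dy = pos_b[shared[i]] - pos_b[shared[j]]
            if dx * dy > 0:
                concordant += 1
            elif dx * dy < 0:
                discordant += 1
    pairs = concordant + discordant
    return (concordant - discordant) / pairs if pairs else 0.0  # 0.0 = no order evidence


def duplication_index(a: Architecture, b: Architecture) -> float:
    """Hypothetical index: agreement of copy numbers for shared domains."""
    ca, cb = Counter(a), Counter(b)
    shared = ca.keys() & cb.keys()
    if not shared:
        return 0.0
    return sum(min(ca[d], cb[d]) for d in shared) / sum(max(ca[d], cb[d]) for d in shared)


def architecture_similarity(a: Architecture, b: Architecture, w: Dict[str, float],
                            coef: Tuple[float, float, float] = (0.5, 0.3, 0.2)) -> float:
    """Weighted sum of the three indices; the coefficients are placeholders."""
    if not set(a) & set(b):
        return 0.0
    gamma01 = (order_gamma(a, b) + 1) / 2  # rescale gamma from [-1, 1] to [0, 1]
    return (coef[0] * content_similarity(a, b, w)
            + coef[1] * gamma01
            + coef[2] * duplication_index(a, b))


# Toy data (hypothetical, not from RefSeq).
proteins = [
    ["SH3", "SH2", "Pkinase"],
    ["SH2", "SH3", "Pkinase"],
    ["SH3", "PH"],
    ["Pkinase"],
    ["SH3", "SH3", "SH2"],
]
w = domain_weights(proteins)
print(round(architecture_similarity(proteins[0], proteins[1], w), 3))  # → 0.9
```

In this toy set SH3 co-occurs with every other domain, so its assumed IV is zero and it contributes nothing to the content term, a crude illustration of down-weighting a promiscuous domain. The first two proteins share the same domains with the same copy numbers; swapping SH3 and SH2 lowers their score from 1.0 to 0.9 only through the order (gamma) term.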
Lee, Kwang H. (이광형); Lee, Do-Heon (이도헌)
KAIST (Korea Advanced Institute of Science and Technology): Department of Bio and Brain Engineering
309047/325007  / 020025854

Thesis (Ph.D.) - KAIST: Department of Bio and Brain Engineering, 2009.2, [x, 103 p.]


Keywords: homology; protein; domain; architecture; bioinformatics
