DSpace at KOASAS: Identification of protein homology using domain architecture comparison


Identification of protein homology using domain architecture comparison (Korean title: 도메인 아키텍쳐 비교를 이용한 단백질 상동성 연구)


Lee, Byung-Wook / 이병욱

Homology identification is the first step in many genomic studies, and its methods can be divided into sequence-based and domain-based approaches. Sequence-based homology searches can yield false assignments, especially when comparing multidomain proteins. Here we propose a new method to detect homologs through explicit comparison of domain architecture. A major obstacle in comparing domain architectures (the organization of domains within a protein) is the existence of ‘promiscuous’ domains, which carry out auxiliary functions and appear in many unrelated proteins. To distinguish these promiscuous domains from other protein domains, we assigned a weight score to each domain extracted from RefSeq proteins, based on its abundance and versatility. To measure domain abundance and versatility, we defined the Inverse Versatility (IV) and the Inverse Abundance Frequency (IAF), both derived from Inverse Document Frequency (IDF), a statistic commonly used in information retrieval. A domain’s score represents its importance in the “protein world” and is used in the comparison of domain architectures. We examined the weight scores of 215 known promiscuous domains to identify the relationship between weight score and domain promiscuity. We found that all of the known promiscuous domains have very low weight scores, most of which are below 10. Domain combinations, as well as single domains, were considered when scoring. To measure domain architecture similarity, we compared domain content, domain order, and domain duplication between two domain architectures. To do this, we used several measures based on algorithms from information retrieval (the cosine similarity, the Goodman-Kruskal gamma function, and a domain duplication index). A final similarity score between two domain architectures was obtained by combining these three indices. To determine the best combination of the three indices, we used the COG database. Our method was compared with ot...
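The abstract names three standard building blocks: an IDF-style domain weight, cosine similarity over domain content, and the Goodman-Kruskal gamma over domain order. The following is a minimal sketch of those ideas, not the thesis's implementation. The abstract does not give the exact definitions of IV and IAF, so plain IDF stands in for the weight here; the function names, the single-letter domain labels, and the use of first-occurrence positions for ordering are all assumptions made for illustration. The domain duplication index and the way the three indices are combined are not specified in the abstract and are left out.

```python
import math
from collections import Counter
from itertools import combinations


def idf_weights(proteins):
    """Plain IDF per domain: log(N / number of proteins containing it).

    Stand-in for the thesis's IV/IAF scores, whose formulas are not
    given in the abstract. A domain present in every protein gets 0.
    """
    n = len(proteins)
    doc_freq = Counter()
    for arch in proteins:
        doc_freq.update(set(arch))  # count each domain once per protein
    return {d: math.log(n / df) for d, df in doc_freq.items()}


def cosine_similarity(arch_a, arch_b, weights):
    """Cosine of the weighted presence/absence vectors of two architectures."""
    set_a, set_b = set(arch_a), set(arch_b)
    dot = sum(weights.get(d, 0.0) ** 2 for d in set_a & set_b)
    norm_a = math.sqrt(sum(weights.get(d, 0.0) ** 2 for d in set_a))
    norm_b = math.sqrt(sum(weights.get(d, 0.0) ** 2 for d in set_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def goodman_kruskal_gamma(arch_a, arch_b):
    """Gamma = (C - D) / (C + D) over pairs of shared domains.

    A pair is concordant if both architectures list the two domains in
    the same relative order (by first occurrence), discordant otherwise.
    """
    shared = [d for d in dict.fromkeys(arch_a) if d in arch_b]
    pos_a = {d: arch_a.index(d) for d in shared}
    pos_b = {d: arch_b.index(d) for d in shared}
    concordant = discordant = 0
    for x, y in combinations(shared, 2):
        sign = (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    total = concordant + discordant
    return (concordant - discordant) / total if total else 0.0


if __name__ == "__main__":
    # Toy architectures with hypothetical domain labels; "P" occurs in
    # every protein and so plays the role of a promiscuous domain.
    toy = [["P", "K"], ["P", "X"], ["P", "Y", "K"], ["P", "Z"]]
    w = idf_weights(toy)
    print(w)
    print(cosine_similarity(toy[0], toy[2], w))
    print(goodman_kruskal_gamma(["P", "Y", "K"], ["K", "Y", "P"]))
```

In this toy setup the ubiquitous domain receives zero weight, so it contributes nothing to the content similarity, which mirrors the abstract's goal of down-weighting promiscuous domains.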

Advisors: Lee, Kwang H. (이광형); Lee, Do-Heon (이도헌)

Description: Korea Advanced Institute of Science and Technology (KAIST): Department of Bio and Brain Engineering

Publisher: Korea Advanced Institute of Science and Technology (KAIST)

Issue Date: 2009

Identifier: 309047/325007 / 020025854

Language: eng

Description: Doctoral thesis (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST): Department of Bio and Brain Engineering, 2009.2, [ x, 103 p. ]

Keywords: homology; protein; domain; architecture; bioinformatics

URI: http://hdl.handle.net/10203/27066

Link: http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=309047&flag=dissertation

Appears in Collection: BiS-Theses_Ph.D. (Doctoral theses)

Files in This Item: There are no files associated with this item.


Knowledge Service Development Team, KAIST 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea. T. 82-42-350-4493 Email. koasas@kaist.ac.kr
Copyright © 2016. Korea Advanced Institute of Science and Technology. All Rights Reserved.
