#### Data mining for characterizing protein-protein interaction interfaces = 데이터 마이닝 기법을 적용한 단백질 분자 사이의 상호작용 특성에 대한 연구

The main objective of this thesis is to suggest a new, data-mining-based perspective for describing protein-protein interaction interfaces effectively. Two systematic approaches are applied to answer the question of which features are effective in representing the interaction interfaces. One addresses specificity and the other stability, the two critical factors that determine binding association. First, the specificity of molecular interactions is examined in the context of protein functions. This is the first approach to analyze interaction interfaces at the molecular interaction level in the context of protein functions. We perform a systematic analysis at the molecular interaction level using classification and feature subset selection techniques widely used in pattern recognition. To represent the physicochemical properties of protein-protein interfaces, we design 18 molecular interaction types using canonical and non-canonical interactions. We then construct input vectors from the frequency of each interaction type in each protein-protein interface. The 131 interfaces of transient protein-protein heterocomplexes in the PDB are analyzed: 33 protease-inhibitor, 52 antibody-antigen, and 46 signaling protein complexes, the last including 4 cyclin-dependent kinase and 26 G-protein complexes. Using kNN classification and feature subset selection, we show that specific interaction types are associated with each functional category, and that these interaction types are conserved through a common binding mechanism rather than through sequence or structure conservation. The $C^\alpha-H\cdots O=C$ interaction shows binding specificity for protease-inhibitor complexes, while the cation-anion interaction is predominant in signaling complexes. For antibody-antigen complexes, the signal is somewhat ambiguous. From the evolutionary perspective, while protease-inhibitors and signaling proteins have optimized their int...
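As a rough illustration of the pipeline the abstract describes (interaction-type frequency vectors, kNN classification, feature subset selection), the following is a minimal sketch and not the thesis's actual implementation. The four interaction-type names, the toy counts, the Euclidean distance, the leave-one-out scoring, and the greedy forward search are all assumptions made for the example; the abstract does not list the 18 interaction types or specify the selection procedure.

```python
from collections import Counter
from math import dist

# Hypothetical interaction-type names; the thesis defines 18 types,
# which the abstract does not enumerate.
TYPES = ["CaH_O", "cation_anion", "hbond", "hydrophobic"]


def frequency_vector(counts, types=TYPES):
    """Convert raw interaction counts for one interface into relative frequencies."""
    total = sum(counts.get(t, 0) for t in types)
    return [counts.get(t, 0) / total if total else 0.0 for t in types]


def knn_predict(train, query, k, features):
    """Majority vote among the k nearest training vectors, using only `features`."""
    def project(vec):
        return [vec[i] for i in features]

    ranked = sorted(train, key=lambda item: dist(project(item[0]), project(query)))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def loo_accuracy(data, k, features):
    """Leave-one-out accuracy of kNN restricted to the given feature indices."""
    hits = 0
    for i, (vec, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += knn_predict(rest, vec, k, features) == label
    return hits / len(data)


def forward_selection(data, k, n_features):
    """Greedy forward search: add the feature that most improves accuracy, stop when none does."""
    selected, best = [], 0.0
    remaining = list(range(n_features))
    while remaining:
        scored = [(loo_accuracy(data, k, selected + [f]), f) for f in remaining]
        acc, feat = max(scored, key=lambda s: (s[0], -s[1]))  # ties go to the lower index
        if acc <= best:
            break
        selected.append(feat)
        remaining.remove(feat)
        best = acc
    return selected, best


# Made-up counts for six imaginary interfaces (not data from the thesis).
toy = [
    (frequency_vector({"CaH_O": 6, "hbond": 2, "hydrophobic": 2}), "protease-inhibitor"),
    (frequency_vector({"CaH_O": 5, "hbond": 3, "hydrophobic": 2}), "protease-inhibitor"),
    (frequency_vector({"CaH_O": 7, "hbond": 1, "hydrophobic": 2}), "protease-inhibitor"),
    (frequency_vector({"cation_anion": 6, "hbond": 2, "hydrophobic": 2}), "signaling"),
    (frequency_vector({"cation_anion": 5, "hbond": 3, "hydrophobic": 2}), "signaling"),
    (frequency_vector({"cation_anion": 7, "hbond": 1, "hydrophobic": 2}), "signaling"),
]

selected, accuracy = forward_selection(toy, k=1, n_features=len(TYPES))
print([TYPES[i] for i in selected], accuracy)
```

In this toy setting the search keeps only the features that separate the classes, which mirrors the idea of identifying interaction types specific to a functional category. A real analysis would use the full set of interaction types and a cross-validation scheme appropriate to the 131-interface dataset.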
Lee, Kwang-H. (이광형); Kim, Dong-Sup (김동섭)
Description
Korea Advanced Institute of Science and Technology (KAIST) : Department of Bio and Brain Engineering
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2007
Identifier
268685/325007  / 020025881
Language
eng
Description

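Doctoral thesis (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST) : Department of Bio and Brain Engineering, August 2007, [ xviii, 112 p. ]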

Keywords

데이터 마이닝; 단백질; 상호작용면; 특이성; 분자수준; 고밀도 구조; 단량체; 핫스팟 아미노산; 통계분석; 특징추출; 특징분류; 주변량 분석; Data mining; protein; interface; specificity; molecular interaction; densely structured organization; unbounded state; hot spot residues; Mann-Whitney; classification; feature selection; SVMs; kNN; SVD; PCA; Decision tree

URI
http://hdl.handle.net/10203/27058