DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 김호민 | - |
dc.contributor.advisor | Kim, Homin | - |
dc.contributor.advisor | 차미영 | - |
dc.contributor.author | Jung, Hyunkyu | - |
dc.contributor.author | 정현규 | - |
dc.date.accessioned | 2024-07-30T19:31:43Z | - |
dc.date.available | 2024-07-30T19:31:43Z | - |
dc.date.issued | 2024 | - |
dc.identifier.uri | http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1097251&flag=dissertation | en_US |
dc.identifier.uri | http://hdl.handle.net/10203/321671 | - |
dc.description | Thesis (Master's) - Korea Advanced Institute of Science and Technology : School of Computing, 2024.2, [iv, 29 p.] | - |
dc.description.abstract | The ability to process 3D data such as point clouds has had a tremendous impact on a variety of applications. Proteins are functional components of biological processes, comprising amino acid residues linked by peptide bonds. Linear polypeptides fold into specific 3D structures and form complexes with other proteins or biomolecules to carry out their cellular functions. Predicting whether two proteins interact, known as protein-protein interaction (PPI) prediction, is a fundamental challenge in the biomedical field. Here, we propose PPI-BERT, a pre-trained Transformer that learns PPI from protein sequences and structures represented as heterogeneous point clouds. Our model uses a rotation-invariant method to obtain a canonical representation of protein structures and segments them into fragments of fixed amino acid length while retaining information on atom positions and amino acid classes. This “sequence-structure” representation is used to train a tokenizer that learns discrete token IDs optimized for sequence and structure reconstruction. Masked modeling is then used to train the Transformer encoder on the tokenized fragments. Our self-supervised model was trained on protein complex structures (N=85,885) from the Protein Data Bank. Evaluation shows that our model outperforms existing methods on two critical PPI downstream tasks: binding prediction and interface region prediction. These results are an important step toward computational models for PPI applications such as drug discovery. | - |
dc.language | eng | - |
dc.publisher | 한국과학기술원 | - |
dc.subject | 단백질 구조; 기하적 심층 학습; 비지도 학습; 사전 훈련된 모델; 마스크 모델 | - |
dc.subject | Protein structure; Geometric deep learning; Unsupervised learning; Pre-trained model; Masked model | - |
dc.title | PPI-BERT: Pretraining transformers with masked sequence-structure of protein fragments for learning protein-protein interactions | - |
dc.title.alternative | PPI-BERT: 단백질-단백질 상호작용 학습을 위한 마스크된 서열-구조의 단백질 단편 구성의 사전 학습된 트랜스포머 | - |
dc.type | Thesis(Master) | - |
dc.identifier.CNRN | 325007 | - |
dc.description.department | Korea Advanced Institute of Science and Technology : School of Computing | - |
dc.contributor.alternativeauthor | Cha, Meeyoung | - |
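The masked-modeling pre-training described in the abstract can be sketched in a few lines of Python. This is a minimal illustration of BERT-style masking over a fragment's discrete token IDs; the `MASK_ID`, vocabulary size, and 15% masking ratio are illustrative assumptions, not values taken from the thesis, whose tokenizer learns its own token IDs.

```python
import random

MASK_ID = 0        # hypothetical ID reserved for the [MASK] token
VOCAB_SIZE = 512   # hypothetical codebook size of the learned tokenizer


def mask_tokens(token_ids, mask_ratio=0.15, seed=42):
    """Randomly replace a fraction of fragment token IDs with MASK_ID.

    Returns (masked_ids, labels): labels hold the original ID at masked
    positions and -100 (ignored by the loss) elsewhere, as in BERT.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tid in token_ids:
        if rng.random() < mask_ratio:
            masked.append(MASK_ID)   # hide the token from the encoder
            labels.append(tid)       # the model must reconstruct this ID
        else:
            masked.append(tid)
            labels.append(-100)      # position excluded from the loss
    return masked, labels


# Toy fragment of discrete token IDs produced by a (hypothetical) tokenizer.
tokens = [17, 305, 42, 8, 199, 77, 256, 13]
masked, labels = mask_tokens(tokens)
```

The Transformer encoder would then be trained to predict the original token ID at every masked position, which is how the self-supervised objective in the abstract is typically realized.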
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.