DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 김호민 | - |
dc.contributor.advisor | Kim, Homin | - |
dc.contributor.advisor | 차미영 | - |
dc.contributor.author | Jung, Hyunkyu | - |
dc.contributor.author | 정현규 | - |
dc.date.accessioned | 2024-07-30T19:31:43Z | - |
dc.date.available | 2024-07-30T19:31:43Z | - |
dc.date.issued | 2024 | - |
dc.identifier.uri | http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1097251&flag=dissertation | en_US |
dc.identifier.uri | http://hdl.handle.net/10203/321671 | - |
dc.description | Thesis (Master's) - Korea Advanced Institute of Science and Technology : School of Computing, 2024.2, [iv, 29 p.] | - |
dc.description.abstract | The ability to process 3D data such as point clouds has had a tremendous impact on a variety of applications. Proteins are functional components of biological processes, comprising amino acid residues linked by peptide bonds. Linear polypeptides fold into specific 3D structures and form complexes with other proteins or biomolecules to carry out their cellular functions. Predicting whether two proteins interact, known as protein-protein interaction (PPI) prediction, is a fundamental challenge in the biomedical field. Here, we propose PPI-BERT, a pre-trained Transformer that learns PPI from protein sequences and structures represented as heterogeneous point clouds. Our model uses a rotation-invariant method to obtain a canonical representation of protein structures and segments them into fragments of fixed amino acid length while retaining information on atom positions and amino acid classes. This “sequence-structure” representation is used to train a tokenizer that learns discrete token IDs optimized for sequence and structure reconstruction. Masked modeling is then used to train the Transformer encoder on the tokenized fragments. Our self-supervised model was trained on protein complex structures (N=85,885) from the Protein Data Bank. Evaluation shows that our model outperforms existing methods on two critical PPI downstream tasks: binding prediction and interface region prediction. These results are an important step toward computational models for PPI applications such as drug discovery. | - |
dc.language | eng | - |
dc.publisher | 한국과학기술원 | - |
dc.subject | 단백질 구조; 기하적 심층 학습; 비지도 학습; 사전 훈련된 모델; 마스크 모델 | - |
dc.subject | Protein structure; Geometric deep learning; Unsupervised learning; Pre-trained model; Masked model | - |
dc.title | PPI-BERT: Pretraining transformers with masked sequence-structure of protein fragments for learning protein-protein interactions | - |
dc.title.alternative | PPI-BERT: 단백질-단백질 상호작용 학습을 위한 마스크된 서열-구조의 단백질 단편 구성의 사전 학습된 트랜스포머 | - |
dc.type | Thesis(Master) | - |
dc.identifier.CNRN | 325007 | - |
dc.description.department | Korea Advanced Institute of Science and Technology : School of Computing | - |
dc.contributor.alternativeauthor | Cha, Meeyoung | - |
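The masked-modeling pre-training described in the abstract can be sketched in a few lines of Python. This is a minimal illustration of BERT-style masking over a fragment's discrete token IDs; the `MASK_ID`, vocabulary size, and 15% masking ratio are illustrative assumptions, not values taken from the thesis, whose tokenizer learns its own token IDs.

```python
import random

MASK_ID = 0        # hypothetical ID reserved for the [MASK] token
VOCAB_SIZE = 512   # hypothetical codebook size of the learned tokenizer


def mask_tokens(token_ids, mask_ratio=0.15, seed=42):
    """Randomly replace a fraction of fragment token IDs with MASK_ID.

    Returns (masked_ids, labels): labels hold the original ID at masked
    positions and -100 (ignored by the loss) elsewhere, as in BERT.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tid in token_ids:
        if rng.random() < mask_ratio:
            masked.append(MASK_ID)   # hide the token from the encoder
            labels.append(tid)       # the model must reconstruct this ID
        else:
            masked.append(tid)
            labels.append(-100)      # position excluded from the loss
    return masked, labels


# Toy fragment of discrete token IDs produced by a (hypothetical) tokenizer.
tokens = [17, 305, 42, 8, 199, 77, 256, 13]
masked, labels = mask_tokens(tokens)
```

The Transformer encoder would then be trained to predict the original token ID at every masked position, which is how the self-supervised objective in the abstract is typically realized.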
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.