Prediction of binding property of RNA-binding proteins using multi-kernel multi-modal deep convolutional neural network
(Korean title, translated: Prediction of RNA-binding protein (RBP) binding sites using a multi-kernel, multi-modal deep convolutional neural network)

Background: With the development of next-generation sequencing (NGS)-based technologies such as CLIP-seq and RNAcompete, RNA sequence and structure have been revealed to be important factors in RNA-protein binding. Because the function of an RNA can be related to the proteins it interacts with, RNA-protein interaction has been a topic of interest for decades. RNA-binding proteins (RBPs) play important roles in regulating gene expression through post-transcriptional control of RNAs, and in the development and function of the immune system. Thanks to sequencing technology, numerous RNA sequences are being newly discovered without their binding partner RBPs being known. The demand for accurate methods to predict RBP binding sites is therefore increasing.

Results: There have been many attempts to predict RBP binding sites using various machine-learning techniques combined with various RNA features. In this doctoral thesis, we developed a new deep convolutional neural network model, trained on CLIP-seq datasets, that uses multi-sized filters and multi-modal features to predict the binding properties of RBPs. The model integrates sequence and structure information to extract sequence motifs, structure motifs, and combined motifs at the same time. RBP binding site prediction on the RBP-24 dataset was compared with two multi-modal methods, GraphProt and Deepnet-rbp, using the area under the curve (AUC) of the receiver operating characteristic (ROC). Our method (average AUC = 0.920) outperformed GraphProt (average AUC = 0.888) on 20 RBPs and Deepnet-rbp (average AUC = 0.902) on 15 RBPs. The improvement was achieved by using multi-sized convolution filters, which gave an average relative error reduction of 17%. Introducing a new RNA structure representation, the structure probability matrix, reduced the average relative error by 3% compared with a one-hot encoded secondary structure representation. Interestingly, the structure probability matrix was more effective for ALKBH5, where the relative error reduction was 30%.
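As a rough guide to the evaluation figures quoted above, the following sketch computes a ROC AUC from labels and scores and then a relative error reduction between two AUC values. The function names are hypothetical, and the definition of error as 1 - AUC measured against the baseline is an assumption, since the abstract does not spell it out. This is not the thesis's evaluation code.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive example
    is scored above a random negative one (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def relative_error_reduction(auc_baseline, auc_new):
    """ASSUMED definition: error = 1 - AUC, and the reduction is
    expressed as a fraction of the baseline error."""
    baseline_error = 1.0 - auc_baseline
    if baseline_error == 0.0:
        raise ValueError("baseline AUC of 1.0 leaves no error to reduce")
    new_error = 1.0 - auc_new
    return (baseline_error - new_error) / baseline_error


# Toy labels and scores (made up for illustration): 7 of the 9
# positive/negative pairs are ranked correctly, so AUC = 7/9.
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.65]
print(round(roc_auc(toy_labels, toy_scores), 3))

# Average AUCs quoted in the abstract (GraphProt vs. the thesis model).
print(round(relative_error_reduction(0.888, 0.920), 3))
```

Under this assumed definition, moving from an average AUC of 0.888 to 0.920 corresponds to a reduction of roughly 29% in error. The 17% and 3% figures in the abstract refer to different comparisons (filter sizes and structure representations) and may have been averaged per protein, so they are not expected to match this number.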
Finally, we developed a new sequence motif enrichment method, which we call the filter enrichment method. We successfully enriched sequence motifs for 15 RBPs; these closely resembled motifs reported in other sources, RBPgroup and CISBP-RNA, and the motifs for 10 RBPs were statistically significant when p-values were calculated. By analyzing sequence and structure motifs together, we identified combined motifs for ELAVL1, ALKBH5, IGF2BP123, PUM2, and TDP43.

Conclusion: We developed a new deep learning framework that integrates two different types of features, sequence and structure. With this framework and the filter enrichment method, we successfully extracted sequence and structure motifs that were statistically significant and resembled literature evidence more closely than those from other prediction methods. Finally, analyzing these results revealed an intricate interplay between sequence motifs and structure motifs, in agreement with other studies.
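To make the idea of multi-sized filters over multi-modal input more concrete, here is a minimal sketch in plain Python. It is not the thesis's architecture: the function names, the two structure channels (paired and unpaired probabilities), the hand-picked kernel weights, and the toy sequence are all assumptions chosen for illustration. The abstract does not describe the actual layout of the structure probability matrix.

```python
NUCLEOTIDES = "ACGU"


def one_hot(seq):
    """Encode an RNA sequence as one 4-channel row per position."""
    return [[1.0 if base == n else 0.0 for n in NUCLEOTIDES] for base in seq]


def stack_modalities(seq, struct_probs):
    """Append per-position structure probabilities to the sequence channels."""
    if len(seq) != len(struct_probs):
        raise ValueError("sequence and structure inputs must have equal length")
    return [s + list(p) for s, p in zip(one_hot(seq), struct_probs)]


def conv_max(features, kernel):
    """Slide one kernel along the sequence, apply ReLU, then take the
    global maximum, so every kernel yields one value whatever its width."""
    width = len(kernel)
    best = 0.0  # ReLU floor
    for start in range(len(features) - width + 1):
        window = features[start:start + width]
        score = sum(
            w * x
            for k_row, f_row in zip(kernel, window)
            for w, x in zip(k_row, f_row)
        )
        best = max(best, score)
    return best


def multi_kernel_features(features, kernels):
    """Apply kernels of different widths; the outputs can be concatenated."""
    return [conv_max(features, k) for k in kernels]


# Toy input: hypothetical [paired, unpaired] probabilities per position.
seq = "GCUUUAGC"
struct_probs = [
    [0.75, 0.25], [0.75, 0.25], [0.25, 0.75], [0.0, 1.0],
    [0.25, 0.75], [0.25, 0.75], [0.75, 0.25], [0.75, 0.25],
]
features = stack_modalities(seq, struct_probs)

# Width-3 kernel that looks at sequence only (rewards U at each position).
kernel_w3 = [[0.0, 0.0, 0.0, 1.0, 0.0, 0.0] for _ in range(3)]
# Width-5 kernel that combines both modalities (rewards unpaired U).
kernel_w5 = [[0.0, 0.0, 0.0, 1.0, 0.0, 1.0] for _ in range(5)]

print(multi_kernel_features(features, [kernel_w3, kernel_w5]))  # [3.0, 6.5]
```

Because each kernel is reduced to a single value by global max pooling, kernels of different widths can sit side by side in one layer. A kernel whose weights span both the sequence and structure channels is one way a combined motif could be represented. In a trained network the weights would be learned rather than set by hand.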
Advisors
Kim, Dongsup (김동섭)
Description
KAIST (Korea Advanced Institute of Science and Technology): Department of Bio and Brain Engineering
Publisher
KAIST (Korea Advanced Institute of Science and Technology)
Issue Date
2019
Identifier
325007
Language
eng
Description

Doctoral thesis (Ph.D.) - KAIST: Department of Bio and Brain Engineering, 2019.8, [xi, 100 p.]

Keywords

RNA-binding protein; deep learning; convolution neural network; multi-modal; multi-filter; CLIP-seq; prediction method
Korean keywords (translated): RNA-binding protein; deep learning; convolutional neural network; diverse RNA information; multi-filter; CLIP-seq data; prediction method

URI
http://hdl.handle.net/10203/283194
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=871351&flag=dissertation
Appears in Collection
BiS-Theses_Ph.D. (Doctoral theses)
Files in This Item
There are no files associated with this item.
