Machine learning for the identification of noncoding driver mutations in cancer (Korean title: A study of machine learning algorithms for identifying the function of mutations arising in cancer cells)

DC Field | Value | Language
dc.contributor.advisor | Choi, Jung Kyoon | -
dc.contributor.advisor | 최정균 | -
dc.contributor.author | Yang, Woojin | -
dc.contributor.author | 양우진 | -
dc.date.accessioned | 2018-05-23T19:34:07Z | -
dc.date.available | 2018-05-23T19:34:07Z | -
dc.date.issued | 2017 | -
dc.identifier.uri | http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=718828&flag=dissertation | en_US
dc.identifier.uri | http://hdl.handle.net/10203/241811 | -
dc.description | Doctoral thesis (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST) : Department of Bio and Brain Engineering, 2017.8, [iv, 83 p.] | -
dc.description.abstract | One of the greatest challenges in cancer genomics is to distinguish driver mutations from passenger mutations. Whereas recurrence is a hallmark of driver mutations, recurring noncoding mutations are difficult to observe because the number of whole-genome sequenced samples is limited. Hence, a method for predicting potentially recurrent mutations is needed. In this work, I developed a random forest classifier that predicts regulatory mutations that may recur, based on the features of the mutations that appear repeatedly in a given cohort. Recurrent mutations can arise at the same site or affect the same gene from different sites. Here I identified a set of mutations that arise in individual samples and alter different cis-regulatory elements that converge on a common gene via chromatin interactions. Using breast cancer and lung cancer as models, I profiled up to 50 quantitative features describing genetic and epigenetic signals at the mutation site, transcription factors whose binding motifs were disrupted by the mutation, and genes targeted by long-range chromatin interactions. A true set of mutations for the random forest was generated by interrogating publicly available pan-cancer genomes based on our statistical model of mutation recurrence. The performance of my random forest classifier was evaluated by cross-validation. My methods make it possible to characterize recurrent regulatory mutations using a limited number of whole-genome samples and, based on that characterization, to predict potential driver mutations whose recurrence is not found in the given samples but is likely to be observed with additional samples. The mutations and genes identified in this fashion showed strong relevance to cancer, in contrast to those with site-specific recurrence. My methods were capable of accurately predicting mutations recurring at the target gene level but not those recurring at the same site.
In conclusion, I propose a novel approach to discovering potential cancer-driving mutations in noncoding regions. | -
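dc.language | eng | -
dc.publisher | Korea Advanced Institute of Science and Technology (KAIST) | -
dc.subject | machine learning; epigenome; cancer somatic mutation; long-range chromatin interaction; transcriptome (translated from the Korean keywords) | -
dc.subject | machine learning; epigenome; cancer somatic mutation; distal chromatin interaction; transcriptome | -
dc.title | Machine learning for the identification of noncoding driver mutations in cancer | -
dc.title.alternative | A study of machine learning algorithms for identifying the function of mutations arising in cancer cells | -
dc.type | Thesis(Ph.D) | -
dc.identifier.CNRN | 325007 | -
dc.description.department | Korea Advanced Institute of Science and Technology (KAIST) : Department of Bio and Brain Engineering | -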
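
The abstract in this record describes a random forest classifier over up to 50 quantitative features, evaluated by cross-validation. The following is only a minimal sketch of that general setup, assuming scikit-learn and NumPy are available. The data are synthetic, and the names `make_synthetic_mutations` and `cross_validated_auc`, the hyperparameters, and the ROC AUC metric are all illustrative assumptions; none of this is taken from the thesis itself.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score


def make_synthetic_mutations(n_mutations=400, n_features=50, seed=0):
    """Hypothetical stand-in data: one row per mutation, one column per feature.

    Real features (epigenetic signals, disrupted motifs, target genes) are
    NOT modeled here; these are random numbers for illustration only.
    """
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_mutations, n_features))
    # Assumption: the label depends on a few informative features plus noise.
    signal = X[:, 0] + 0.5 * X[:, 1] - 0.5 * X[:, 2]
    y = (signal + rng.normal(scale=0.5, size=n_mutations) > 0).astype(int)
    return X, y


def cross_validated_auc(X, y, n_splits=5, seed=0):
    """Return one ROC AUC score per fold of stratified k-fold cross-validation."""
    clf = RandomForestClassifier(n_estimators=100, random_state=seed)
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    return cross_val_score(clf, X, y, cv=cv, scoring="roc_auc")


X, y = make_synthetic_mutations()
scores = cross_validated_auc(X, y)
print("per-fold ROC AUC:", np.round(scores, 3))
print("mean ROC AUC:", round(float(scores.mean()), 3))
```

Stratified folds keep the class balance similar across splits, which matters when recurrent mutations are the minority class; whether the thesis used stratification or this metric is not stated in the record.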
Appears in Collection
BiS-Theses_Ph.D. (Doctoral theses)
