Topic models for discovering latent inter-dependency within text data

Topic models are a family of Bayesian statistical models that discover the latent structure of documents in the form of human-readable topics. They have been widely researched and applied in industry for tasks such as document classification, sentiment analysis, and review prediction. Recently, text data has grown explosively and statistical language modeling has taken new directions; these changes call for more advanced models. The goal of this thesis is to present novel Bayesian generative models capable of gaining a deeper understanding of the data. First, most human knowledge and information has been digitized through efforts such as Wikipedia and Project Gutenberg. Topic models are often used to extract topics and obtain insights from such corpora, yet it is much harder to learn the relations and structure among the topics themselves. I build the recursive Chinese Restaurant Process (rCRP), a novel nonparametric generative model that discovers a hierarchical structure of topics. Second, traces of our daily lives have moved online with the proliferation of mobile devices. This has coupled text with other types of data, such as text-photo (Facebook, Instagram), text-video (YouTube), and text-review score (Amazon). Given its flexibility, researchers have extended the topic model beyond its original form, LDA, to account for such additional data. However, text-click data, which underlies most online user activity, has not been studied. I build the Headline Click-based Topic Model (HCTM), a novel generative model that learns the click value of words in their relevant semantic context. Finally, the development of competing models such as word embeddings has created the need to model the specific context of individual words. Topic models are good at extracting the general topics of an entire document, whereas word embeddings excel at extracting the local context at a specific position in a document. I build the Dual Context Topic Model (DCTM), a novel generative model that accounts for both the document context and the local context of individual words.
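The rCRP above builds on the standard Chinese Restaurant Process prior. As a minimal illustrative sketch (my own code, not the thesis implementation), the plain CRP seats each new customer at an existing table with probability proportional to that table's occupancy, or at a fresh table with probability proportional to a concentration parameter alpha; rCRP applies this seating choice recursively to grow a topic hierarchy.

```python
import random

def crp_assignments(n_customers, alpha, seed=0):
    """Sample table assignments under a Chinese Restaurant Process.

    Customer i joins existing table k with probability n_k / (i + alpha)
    and opens a new table with probability alpha / (i + alpha).
    Returns (assignments, counts): the table index of each customer and
    the final number of customers at each table.
    """
    rng = random.Random(seed)
    counts = []        # customers seated at each existing table
    assignments = []
    for _ in range(n_customers):
        # Candidate weights: one per existing table, plus alpha for a new table.
        weights = counts + [alpha]
        r = rng.uniform(0, sum(weights))
        acc = 0.0
        for k, w in enumerate(weights):
            acc += w
            if r <= acc:
                break
        if k == len(counts):
            counts.append(1)   # open a new table
        else:
            counts[k] += 1
        assignments.append(k)
    return assignments, counts
```

Because the first customer always opens table 0 and each draw is proportional to occupancy, larger alpha yields more tables (a richer topic inventory), which is the "rich get richer" behavior that nonparametric topic models exploit.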
Advisors
Oh, Hae Yun
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2017
Identifier
325007
Language
eng
Description

Doctoral thesis - Korea Advanced Institute of Science and Technology : School of Computing, 2017.8, [vii, 68 p.]

Keywords

Topic Modeling; Text Analysis; Generative Bayesian Model

URI
http://hdl.handle.net/10203/242094
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=718882&flag=dissertation
Appears in Collection
CS-Theses_Ph.D. (Doctoral Theses)
Files in This Item
There are no files associated with this item.
