Topic models for discovering latent inter-dependency within text data

Topic models are a family of Bayesian statistical models that discover the latent structure of documents in the form of human-readable topics. They have been widely researched and applied in industry for tasks such as document classification, sentiment analysis, and review prediction. Recently, text data has grown explosively and statistical language modeling has taken new directions; these changes call for more advanced models. The goal of this thesis is to present novel Bayesian generative models capable of gaining a deeper understanding of the data. First, most human knowledge and information has been digitized through efforts such as Wikipedia and Project Gutenberg. Topic models are often used to extract topics and obtain insights from such corpora, yet it is much harder to learn the relations and structure among the topics themselves. I build the recursive Chinese Restaurant Process (rCRP), a novel nonparametric generative model that discovers a hierarchical structure of topics. Second, traces of our daily lives have moved online with the proliferation of mobile devices. This has coupled text with other types of data, such as text-photo (Facebook, Instagram), text-video (YouTube), and text-review score (Amazon). Given its flexibility, researchers have extended the topic model beyond its original form, LDA, to account for such additional data. However, text-click data, which underlies most online user activity, has not been studied. I build the Headline Click-based Topic Model (HCTM), a novel generative model that learns the click value of words in their relevant semantic context. Finally, the development of competing models such as word embeddings has created the need to model the specific context of individual words. Topic models are good at extracting the general topics of an entire document, whereas word embeddings excel at extracting the local context at a specific position in a document. I build the Dual Context Topic Model (DCTM), a novel generative model that accounts for both the document context and the local context of individual words.
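The rCRP above builds on the standard Chinese Restaurant Process prior. As a minimal illustrative sketch (my own code, not the thesis implementation), the plain CRP seats each new customer at an existing table with probability proportional to that table's occupancy, or at a fresh table with probability proportional to a concentration parameter alpha; rCRP applies this seating choice recursively to grow a topic hierarchy.

```python
import random

def crp_assignments(n_customers, alpha, seed=0):
    """Sample table assignments under a Chinese Restaurant Process.

    Customer i joins existing table k with probability n_k / (i + alpha)
    and opens a new table with probability alpha / (i + alpha).
    Returns (assignments, counts): the table index of each customer and
    the final number of customers at each table.
    """
    rng = random.Random(seed)
    counts = []        # customers seated at each existing table
    assignments = []
    for _ in range(n_customers):
        # Candidate weights: one per existing table, plus alpha for a new table.
        weights = counts + [alpha]
        r = rng.uniform(0, sum(weights))
        acc = 0.0
        for k, w in enumerate(weights):
            acc += w
            if r <= acc:
                break
        if k == len(counts):
            counts.append(1)   # open a new table
        else:
            counts[k] += 1
        assignments.append(k)
    return assignments, counts
```

Because the first customer always opens table 0 and each draw is proportional to occupancy, larger alpha yields more tables (a richer topic inventory), which is the "rich get richer" behavior that nonparametric topic models exploit.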
Advisors
Oh, Hae Yun
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2017
Identifier
325007
Language
eng
Description

Doctoral thesis - Korea Advanced Institute of Science and Technology : School of Computing, 2017.8, [vii, 68 p.]

Keywords

Topic Modeling; Text Analysis; Generative Bayesian Model

URI
http://hdl.handle.net/10203/242094
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=718882&flag=dissertation
Appears in Collection
CS-Theses_Ph.D. (Doctoral Theses)
Files in This Item
There are no files associated with this item.
