Distance dependent Chinese restaurant franchise (거리 의존관계를 이용한 비모수적 베이지안 확률 모형: A nonparametric Bayesian probabilistic model using distance dependence)

Topic models provide a simple way to analyze large volumes of unlabeled documents by automatically identifying the latent semantics of the corpus. Such models have been widely applied in text modeling, cognitive science, computational biology, and many other fields where meaningful patterns are hidden in the data. Driven by the ever-increasing amount of available information and by the efforts of researchers who have built many tools for topic modeling, variants and applications of topic models have spread widely and quickly. This thesis proposes a new model, the distance dependent Chinese restaurant franchise (ddCRF), which takes the distance between latent variables into account. The thesis starts with the Chinese restaurant process (CRP), a non-parametric prior for Bayesian models; extends it to the Chinese restaurant franchise (CRF), a hierarchical non-parametric prior for Bayesian topic models; and finally incorporates the distance dependent Chinese restaurant process (ddCRP) into the CRF to build the ddCRF. For posterior inference in the ddCRF, an important computational issue in probabilistic generative topic models, the thesis proposes Markov chain Monte Carlo (MCMC) algorithms. The resulting model reflects the intuition that topics in nearby documents are more likely to be similar. When applied to a corpus collected over several years, in which topics emerge and disappear over time, the ddCRF produces much clearer patterns than previously proposed models for capturing such temporal patterns. The improved performance of the ddCRF on such corpora is shown with four corpora of conference proceedings: SIGIR, SIGMOD, SIGGRAPH, and NIPS. The ddCRF performs better than the CRF and the most widely used topic model, latent Dirichlet allocation (LDA), in terms of held-out likelihood and complexity. Another advantage of the ddCRF over LDA, dynamic ...
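As a rough illustration of the ddCRP building block named in the abstract, the following is a minimal sketch of drawing table assignments from a sequential ddCRP prior. It is not the thesis's model or inference code. The exponential decay function, the parameter names `alpha` and `decay_scale`, and both function names are assumptions chosen for illustration, and the input timestamps are assumed to be sorted in non-decreasing order.

```python
import math
import random


def sample_ddcrp_links(times, alpha, decay_scale, rng):
    """Draw one link per customer from a sequential ddCRP prior.

    Hypothetical helper: customer i links to an earlier customer j with
    weight exp(-(t_i - t_j) / decay_scale), or to itself with weight alpha.
    Assumes `times` is sorted in non-decreasing order.
    """
    links = []
    for i, t_i in enumerate(times):
        # Weights for earlier customers shrink as the time gap grows.
        weights = [math.exp(-(t_i - times[j]) / decay_scale) for j in range(i)]
        # The final weight, alpha, is for a self-link (a new table).
        weights.append(alpha)
        links.append(rng.choices(range(i + 1), weights=weights, k=1)[0])
    return links


def links_to_tables(links):
    """Convert links to table ids: linked customers share a table."""
    tables = []
    next_table = 0
    for i, j in enumerate(links):
        if j == i:
            # Self-link: the customer opens a new table.
            tables.append(next_table)
            next_table += 1
        else:
            # Link to an earlier customer: inherit that customer's table.
            tables.append(tables[j])
    return tables
```

With document years as `times`, a small `decay_scale` makes links to distant years unlikely, which loosely mirrors the intuition that nearby documents tend to share topics. The ddCRF described in the abstract adds the franchise hierarchy and MCMC posterior inference, neither of which is sketched here.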
Advisors
Oh, Alice (오혜연)
Description
Korea Advanced Institute of Science and Technology (KAIST) : Department of Computer Science
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2011
Identifier
467960/325007  / 020093057
Language
eng
Description

Thesis (Master's) - Korea Advanced Institute of Science and Technology (KAIST) : Department of Computer Science, 2011.2, [ v, 32 p. ]

Keywords

ddCRF; Bayesian; probabilistic model; nonparametric

URI
http://hdl.handle.net/10203/35001
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=467960&flag=dissertation
Appears in Collection
CS-Theses_Master(석사논문)
Files in This Item
There are no files associated with this item.
