Distributed online learning for topic models

DC Field | Value | Language
dc.contributor.advisor | Oh, Hae-Yun | -
dc.contributor.advisor | 오혜연 | -
dc.contributor.author | Bak, Jin-Yeong | -
dc.contributor.author | 박진영 | -
dc.date.accessioned | 2013-09-12T01:48:56Z | -
dc.date.available | 2013-09-12T01:48:56Z | -
dc.date.issued | 2013 | -
dc.identifier.uri | http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=515124&flag=dissertation | -
dc.identifier.uri | http://hdl.handle.net/10203/180445 | -
dc.description | Thesis (Master's) - Korea Advanced Institute of Science and Technology (KAIST) : Department of Computer Science, 2013.2, [ vi, 46 p. ] | -
dc.description.abstract | A major obstacle in using a probabilistic topic model, such as Latent Dirichlet Allocation (LDA) or Hierarchical Dirichlet Processes (HDP), is the time it takes for posterior inference, especially on Web data, which are huge and continuously expanding. Recent developments in distributed inference algorithms and minibatch-based online learning algorithms have offered partial solutions to this problem. In this thesis, I propose a distributed online learning algorithm for LDA and HDP that addresses both aspects of the problem at once. I apply the algorithm to three datasets: a corpus of 973K Twitter conversations and a corpus of 4.8M Wikipedia articles, used for quantitative evaluation, and a larger corpus of 5.1M Twitter conversations, used for a case study. I compare the algorithm with the distributed version of variational inference using MapReduce and with online learning using stochastic variational inference. I show that the algorithm achieves the same model fit and topic quality as these inference algorithms within a much shorter learning time. Using the distributed online learning framework, I conduct a case study that visualizes how topic proportions change over time in a stream of Web documents, and through it I discover interesting temporal dynamics of topics in Twitter conversations. | eng
dc.language | eng | -
dc.publisher | Korea Advanced Institute of Science and Technology (KAIST) | -
dc.subject | Hierarchical Dirichlet Processes | -
dc.subject | Latent Dirichlet Allocation | -
dc.subject | Distributed inference | -
dc.subject | Online Learning | -
dc.subject | Topic modeling | -
dc.subject | Variational inference | -
dc.subject | MapReduce | -
dc.title | Distributed online learning for topic models | -
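dc.title.alternative | Distributed online machine learning algorithm for topic models (Korean-language title) | -
dc.type | Thesis (Master's) | -
dc.identifier.CNRN | 515124/325007 | -
dc.description.department | KAIST : Department of Computer Science | -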
dc.identifier.uid | 020113246 | -
dc.contributor.localauthor | Oh, Hae-Yun | -
dc.contributor.localauthor | 오혜연 | -
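The abstract combines two ideas: minibatch online learning (stochastic variational inference) and MapReduce-style distributed inference. The following is a generic sketch, not the thesis's algorithm, of how the two can fit together for LDA. It assumes the standard online LDA update of Hoffman, Blei and Bach: each worker runs the per-document E-step on its shard of a minibatch (map), the expected topic-word counts are summed (reduce), and the global topic parameters `lam` take one stochastic step with step size `rho = (tau0 + t) ** -kappa`. The function names, the fixed number of E-step iterations, and all hyperparameter values are hypothetical choices for illustration.

```python
import numpy as np
from scipy.special import digamma


def dirichlet_expectation(x):
    """E[log theta] for theta ~ Dirichlet(x); row-wise when x is 2-D."""
    if x.ndim == 1:
        return digamma(x) - digamma(x.sum())
    return digamma(x) - digamma(x.sum(axis=1, keepdims=True))


def local_step(shard, exp_elog_beta, alpha, n_iter=50):
    """Map step (hypothetical name): variational E-step on one worker's shard.

    shard is a list of (word_ids, counts) pairs, word_ids unique per document.
    Returns a K x V array of expected topic-word counts for the shard.
    """
    K, V = exp_elog_beta.shape
    sstats = np.zeros((K, V))
    for ids, cts in shard:
        cts = np.asarray(cts, dtype=float)
        eb = exp_elog_beta[:, ids]                      # K x n_d
        gamma = np.ones(K)                              # document-topic params
        exp_elog_theta = np.exp(dirichlet_expectation(gamma))
        phinorm = exp_elog_theta @ eb + 1e-100          # per-word normalizer
        for _ in range(n_iter):                         # fixed iterations (assumed)
            gamma = alpha + exp_elog_theta * (eb @ (cts / phinorm))
            exp_elog_theta = np.exp(dirichlet_expectation(gamma))
            phinorm = exp_elog_theta @ eb + 1e-100
        sstats[:, ids] += np.outer(exp_elog_theta, cts / phinorm)
    return sstats * exp_elog_beta


def svi_update(lam, shards, t, D, alpha=0.1, eta=0.01, tau0=1.0, kappa=0.7):
    """One global step on a minibatch split into shards (hypothetical name).

    lam: K x V topic-word variational parameters; D: corpus size.
    """
    exp_elog_beta = np.exp(dirichlet_expectation(lam))
    # Map: shards are processed independently (sequentially here; one per
    # worker in a cluster).
    partial = [local_step(s, exp_elog_beta, alpha) for s in shards]
    # Reduce: sum the sufficient statistics from all shards.
    sstats = np.sum(partial, axis=0)
    batch_size = sum(len(s) for s in shards)
    rho = (tau0 + t) ** (-kappa)                        # decaying step size
    lam_hat = eta + (D / batch_size) * sstats           # noisy full-corpus estimate
    return (1 - rho) * lam + rho * lam_hat
```

Because the E-step treats each document independently, splitting a minibatch across shards yields the same update as processing it in one piece, up to floating-point rounding. That is what makes the map/reduce split compatible with the online step.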
Appears in Collection
CS-Theses_Master (Master's Theses)
Files in This Item
There are no files associated with this item.
