Measuring popularity of machine-generated sentence using term occurrence and dependency language model (어휘등장빈도와 의존관계언어모델을 이용한 기계생성문장의 대중성 측정)

Natural language generation is widely used in a variety of Natural Language Processing (NLP) applications. To improve the quality of generated sentences, appropriate evaluation criteria are critical. We investigated the notion of "popularity" for machine-generated sentences as a new criterion for sentence evaluation. We approached the popularity of a sentence from two perspectives: words and word sequences. We defined a popular sentence as one that contains words that are frequently used and appear in many documents, and that contains frequent dependencies. We measured the popularity of sentences based on three components: content morpheme count, document frequency, and dependency relationships. The language resources used for these three components were obtained by analyzing a massive online document repository. Additionally, we attempted to improve search quality based on the following intuition: search queries that consist of popular terms retrieve a greater number of results, which increases the chance that the retrieved documents contain the desired results. To account for the characteristics of agglutinative languages, we used content morpheme frequency instead of term frequency. The key idea of our method is that we use the product of content morpheme count and document frequency to measure word popularity, and apply language models based on dependency relationships to capture popularity arising from co-occurring words. We verify that our method accurately reflects popularity using Pearson correlations, and we assess the influence of query popularity on search results using the mean reciprocal rank (MRR), precision-at-k (p@k), and individual comparisons of search term pairs. Through these experiments, we demonstrate that our method has a high correlation with human judgments and that better search results can be obtained by considering the popularity of the query.
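The word-level score and the retrieval metrics named in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the function names, the pre-tokenized `docs` input, and the absence of any normalization or smoothing are all assumptions made for illustration, and the dependency language model is not covered. MRR and precision-at-k follow their standard definitions.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable, Sequence


def word_popularity(docs: Sequence[Sequence[str]]) -> dict[str, int]:
    """Score each content morpheme as (corpus count) * (document frequency).

    `docs` is a hypothetical pre-tokenized corpus: one list of content
    morphemes per document. Any normalization the thesis may apply is
    not reproduced here (assumption).
    """
    counts: Counter = Counter()    # total occurrences across the corpus
    doc_freq: Counter = Counter()  # number of documents containing the morpheme
    for doc in docs:
        counts.update(doc)
        doc_freq.update(set(doc))
    return {m: counts[m] * doc_freq[m] for m in counts}


def mean_reciprocal_rank(ranked_relevance: Iterable[Sequence[bool]]) -> float:
    """Standard MRR: mean of 1/rank of the first relevant result per query."""
    reciprocal_ranks = []
    for flags in ranked_relevance:
        rr = 0.0  # a query with no relevant result contributes 0
        for rank, relevant in enumerate(flags, start=1):
            if relevant:
                rr = 1.0 / rank
                break
        reciprocal_ranks.append(rr)
    if not reciprocal_ranks:
        return 0.0
    return sum(reciprocal_ranks) / len(reciprocal_ranks)


def precision_at_k(flags: Sequence[bool], k: int) -> float:
    """Standard P@k: fraction of the top-k results that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for f in flags[:k] if f) / k
```

In this sketch a morpheme that occurs often but only in a single document scores lower than one spread across many documents, which matches the abstract's requirement that popular words be both frequent and widely distributed.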
Advisors
Choi, Ho-Jin (최호진)
Description
Korea Advanced Institute of Science and Technology (KAIST): School of Computing
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2016
Identifier
325007
Language
eng
Description

Thesis (Master's) - Korea Advanced Institute of Science and Technology (KAIST): School of Computing, 2016.2, [v, 32 p.]

Keywords

popularity; language model; term frequency; evaluation; machine-generated sentence

URI
http://hdl.handle.net/10203/221903
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=649674&flag=dissertation
Appears in Collection
CS-Theses_Master (Master's Theses)
Files in This Item
There are no files associated with this item.
