Neural language models with multi-sense representations for natural language understanding system

Abstract
Language models (LMs) are the most basic technology in the field of natural language processing and are essential in various applications for natural language understanding and text generation. Existing LMs represent each word with only a single representation, which is unsuitable for processing words with multiple meanings. Early studies on multi-sense words attempted to resolve word ambiguity with rule-based systems. As large amounts of text data became available, learning-based systems were proposed. The cost of human-annotated data and its tagging errors have led to research on unsupervised learning methods that require no annotation. In this dissertation, we propose a sense-aware framework that can process multi-sense word information without relying on annotated data. In contrast to existing multi-sense representation models, which handle information only within a restricted context, our framework provides context representations encoded without discarding word-order information or long-term dependencies. The proposed framework consists of a context representation stage to encode the variable-size context, a sense-labeling stage that uses unsupervised clustering to infer a probable sense for a word in each context, and a multi-sense LM (MSLM) learning stage to learn the multi-sense representations. In particular, for the evaluation of MSLMs with different vocabulary sizes, we propose a new metric, unigram-normalized perplexity (PPLu), which can also be understood as the negative mutual information between a word and its context. We additionally provide a theoretical analysis of PPLu with respect to changes in vocabulary size. Furthermore, we adopt a method for estimating the number of senses that does not require a further hyperparameter search for LM performance. For the LMs in our framework, both unidirectional and bidirectional architectures based on long short-term memory (LSTM) and Transformers are adopted. We conduct comprehensive experiments on three language modeling datasets to perform quantitative and qualitative comparisons of various LMs. Our MSLM outperforms single-sense LMs (SSLMs) with the same network architecture and parameters. It also shows better performance on several downstream natural language processing tasks in the General Language Understanding Evaluation (GLUE) and SuperGLUE benchmarks.
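To make the sense-labeling stage concrete, below is a minimal sketch of unsupervised sense assignment by clustering per-word context vectors; the k-means choice, function names, and data layout are illustrative assumptions rather than the exact procedure used in the thesis.

import numpy as np
from sklearn.cluster import KMeans

def label_senses(context_vectors_by_word, num_senses_by_word):
    """Assign a sense label to every occurrence of a word by clustering the
    context representations of its occurrences (no human annotation needed).

    context_vectors_by_word: dict mapping a word to a list of context vectors,
        one per occurrence (e.g., LSTM or Transformer encodings of the context).
    num_senses_by_word: dict mapping a word to its estimated number of senses.
    """
    sense_labels = {}
    for word, vectors in context_vectors_by_word.items():
        k = num_senses_by_word.get(word, 1)
        if k <= 1 or len(vectors) < k:
            # Too few occurrences, or a single-sense word: treat all uses as one sense.
            sense_labels[word] = [0] * len(vectors)
            continue
        clusters = KMeans(n_clusters=k, n_init=10, random_state=0)
        sense_labels[word] = clusters.fit_predict(np.asarray(vectors)).tolist()
    return sense_labels

Each (word, sense) pair can then be treated as a distinct token in the MSLM learning stage, so the model learns a separate representation per sense.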
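As a rough illustration of the metric, and assuming PPLu normalizes each conditional word probability by the word's unigram probability (the precise formulation is given in the thesis), it can be written as

\mathrm{PPLu}(w_{1:N}) = \exp\!\left(-\frac{1}{N}\sum_{t=1}^{N}\log\frac{P(w_t \mid c_t)}{P(w_t)}\right)

where c_t is the context of w_t and P(w_t) its unigram probability; the exponent is the negative of an empirical estimate of the mutual information between a word and its context, which is what makes the measure comparable across different vocabulary sizes.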
Advisors
Kim, Dae-Shik; Lee, Soo-Young
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2022
Identifier
325007
Language
eng
Description
Doctoral thesis (Ph.D.) - Korea Advanced Institute of Science and Technology: School of Electrical Engineering, 2022.2, [vii, 100 p.]

URI
http://hdl.handle.net/10203/309067
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=996244&flag=dissertation
Appears in Collection
EE-Theses_Ph.D. (Doctoral theses)
Files in This Item
There are no files associated with this item.
