Feature-based weighting method for type classification with deep learning = 딥 러닝 기반 개체 유형 분류를 위한 단어 자질 가중치 적용 방법

Question Answering (QA) is the task of providing correct answers to natural language questions. Answering open-domain questions in particular requires knowledge with sufficiently wide coverage. The Web is a possible information source with such coverage, but using it requires natural language processing to understand the meaning of the text. Type classification supports this task by assigning a predefined type to an entity mention in the text. There are two kinds of type classification methods. One is the traditional feature-based classifier, which uses various features drawn from the mention words and their context. The other uses a recent word embedding model, which shows comparable performance without any explicit features because it learns deeper word semantics than traditional features capture. However, tuning word embeddings on a given corpus requires a large amount of data. Feature-based models, in contrast, can obtain a significant amount of information about the context words even from small training corpora, but they suffer from shallow semantics. To combine the benefits of both methods, we propose a word embedding model with a term weighting scheme. We design a scoring perceptron that scales each word embedding by its weight before it is processed by the type classification model. This has the effect of term weighting without losing the meaning contained in the word embedding. We use 11 features, based on traditional feature-based classifiers, to measure the weight of a context word. Our experiments compare the features in terms of loose micro R-precision. To examine the differences, we classify the features into eight groups based on their expected functions. We train these models on five small data sets whose sizes vary between 1,000 and 25,000. We find that, for data sets smaller than 10,000, syntactic features are effective at slowing the performance drop caused by small training data. This shows that those features support the system by giving a higher weight to significant words. In addition, we investigate four important points that require attention when designing features for small training examples.
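The weighting idea in the abstract can be illustrated with a minimal sketch. The abstract states only that a scoring perceptron scales each word embedding by a weight before the type classifier sees it; the sigmoid activation, the averaging of scaled embeddings, and all names below (`score_word`, `scale_embedding`, `weighted_context`) are assumptions made for illustration, not the thesis's actual implementation.

```python
import math


def sigmoid(x):
    """Squash a real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def score_word(features, weights, bias):
    """Single perceptron mapping a word's feature vector to a scalar weight.

    Assumption: a sigmoid keeps the weight between 0 and 1.
    """
    z = sum(w * f for w, f in zip(weights, features)) + bias
    return sigmoid(z)


def scale_embedding(embedding, weight):
    """Scale every dimension by the same weight, preserving the direction."""
    return [weight * x for x in embedding]


def weighted_context(embeddings, feature_vectors, weights, bias):
    """Average the context-word embeddings after per-word scaling.

    Assumption: averaging stands in for whatever the downstream
    type classification model does with the scaled embeddings.
    """
    scaled = [
        scale_embedding(emb, score_word(feats, weights, bias))
        for emb, feats in zip(embeddings, feature_vectors)
    ]
    dim = len(embeddings[0])
    return [sum(vec[i] for vec in scaled) / len(scaled) for i in range(dim)]


# Toy usage: two context words, 2-dimensional embeddings, 2 hypothetical features.
context = weighted_context(
    embeddings=[[2.0, 4.0], [2.0, 4.0]],
    feature_vectors=[[0.0, 0.0], [0.0, 0.0]],
    weights=[1.0, 1.0],
    bias=0.0,
)
```

Because scaling multiplies all dimensions by one scalar, the direction of each embedding is unchanged, which is one way to read the claim that term weighting is applied without losing the meaning held in the embedding.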
Advisors
Myaeng, Sung-Hyon (맹성현)
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2016
Identifier
325007
Language
eng
Description

Thesis (Master's) - Korea Advanced Institute of Science and Technology (KAIST) : Graduate School of Web Science, 2016.2, [v, 44 p.]

Keywords

Neural Network; Type Classification; Term Weighting; Deep Learning; Word Embedding; 인공신경망; 개체 유형 분류; 단어 자질 가중치; 딥 러닝; 단어 표현

URI
http://hdl.handle.net/10203/221650
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=649565&flag=t
Appears in Collection
WST-Theses_Master(석사논문)
Files in This Item
There are no files associated with this item.