Temporal information extraction from Korean texts (한국어 문서로부터의 시간 정보 추출)

With the growing number of unstructured documents available on the Web and from other sources, techniques that automatically extract knowledge from documents have become increasingly important. Among the many aspects of knowledge extraction, the extraction of temporal information has recently drawn much attention, because documents usually contain temporal information that is useful for downstream applications such as Information Retrieval (IR) and Question Answering (QA) systems. Given a simple question such as "Who was the president of the U.S. 8 years ago?", for example, a QA system may have difficulty finding the right answer without knowing when the question was posed and what "8 years ago" refers to. Before temporal information can be extracted, a representation scheme or annotation language for temporal information must be defined. The most popular annotation languages are TimeML and ISO-TimeML. Although they are designed to represent various types of temporal information, they do not account for language diversity: some languages have language-specific characteristics that cannot be properly annotated with TimeML or ISO-TimeML. Korean is one such language, so Korean TimeML (KTimeML) was proposed in 2009. However, KTimeML also has limitations. For example, it does not consider the lunar calendar, although lunar-calendar temporal expressions appear often in Korean texts. It is also based on morpheme-level annotation, which is impractical for data distribution and data sharing. This dissertation proposes a revised version of KTimeML and presents Korean TimeBank, which is constructed using a part of the new KTimeML. Using Korean TimeBank, a system for temporal information extraction, named ExoTime, is developed.
Several challenges specific to Korean are discussed, along with how the proposed system addresses them. The system uses a Korean analyzer that provides POS tags, NE tags, and dependency parsing results. Because the performance of the Korean analyzer is less stable than that of tools for English, a new method for generating complementary features is also proposed. This method is a data-driven model designed to be applicable to any language, and it generates syntactic and semantic features in an unsupervised way. The proposed system is expected to benefit industry and various research fields, because documents usually contain temporal information that is useful for a wide range of applications.
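The question example in the abstract ("8 years ago") can be made concrete with a small sketch. The Python below is an illustration only and is not the ExoTime system or any part of the dissertation: the function names, the regular expression, and the reference date are assumptions introduced here, and the output is a simplified TimeML-style TIMEX3 tag using only the `tid`, `type`, and `value` attributes. It shows why the time at which a question is posed is needed to resolve a relative expression.

```python
import re
from datetime import date
from typing import Optional

# Hypothetical pattern for expressions of the form "<N> year(s) ago".
_YEARS_AGO = re.compile(r"(\d+)\s+years?\s+ago", re.IGNORECASE)


def resolve_years_ago(text: str, reference: date) -> Optional[int]:
    """Return the calendar year that "<N> years ago" refers to, or None.

    `reference` stands in for the time at which the text was written
    or the question was posed (an assumption of this sketch).
    """
    match = _YEARS_AGO.search(text)
    if match is None:
        return None
    return reference.year - int(match.group(1))


def to_timex3(text: str, reference: date) -> Optional[str]:
    """Wrap the matched expression in a simplified TIMEX3-style tag."""
    match = _YEARS_AGO.search(text)
    if match is None:
        return None
    year = reference.year - int(match.group(1))
    return f'<TIMEX3 tid="t1" type="DATE" value="{year}">{match.group(0)}</TIMEX3>'


question = "who was the president of the U.S. 8 years ago?"
asked_on = date(2016, 2, 1)  # assumed reference date, for illustration only
print(resolve_years_ago(question, asked_on))
print(to_timex3(question, asked_on))
```

The same question resolves to a different year when the reference date changes, which is the dependency the abstract describes. Real temporal expressions, including Korean and lunar-calendar ones, need far richer handling than a single pattern.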
Advisors
Choi, Ho-Jin (최호진)
Description
Korea Advanced Institute of Science and Technology (KAIST): School of Computing
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2016
Identifier
325007
Language
eng
Description

Doctoral dissertation (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST): School of Computing, 2016.2, [vii, 110 p.]

Keywords

Temporal Information; Korean Texts; Temporal Information Extraction; Language Independent Features; Topic Modeling

URI
http://hdl.handle.net/10203/222423
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=648285&flag=dissertation
Appears in Collection
CS-Theses_Ph.D.(박사논문)
Files in This Item
There are no files associated with this item.
