Korean Treebank Transformation for Parser Training

DC Field | Value | Language
dc.contributor.author | DongHyun Choi | -
dc.contributor.author | Jungyeul Park | -
dc.contributor.author | Choi, Key-Sun | -
dc.date.accessioned | 2013-03-29T15:22:47Z | -
dc.date.available | 2013-03-29T15:22:47Z | -
dc.date.created | 2012-08-23 | -
dc.date.issued | 2012-07-12 | -
dc.identifier.citation | ACL 2012 Joint Workshop on Statistical Parsing and Semantic Processing of Morphologically Rich Languages (SP-Sem-MRL 2012) | -
dc.identifier.uri | http://hdl.handle.net/10203/171709 | -
dc.description.abstract | Korean is a morphologically rich language in which grammatical functions are marked by inflections and affixes, which can indicate grammatical relations such as subject, object, and predicate. A Korean sentence can be thought of as a sequence of eojeols. An eojeol is a word or its variant word form agglutinated with grammatical affixes, and eojeols are separated by white space, as words are in written English. Korean treebanks (Choi et al., 1994; Han et al., 2002; Korean Language Institute, 2012) use the eojeol as their fundamental unit of analysis, thus representing an eojeol as a prepreterminal phrase inside the constituent tree. This eojeol-based annotation scheme complicates parser training in several ways: for example, an entity represented by a sequence of nouns will be annotated as two or more different noun phrases, depending on the number of spaces used. In this paper, we propose methods to transform eojeol-based Korean treebanks into entity-based Korean treebanks. The methods are applied to the Sejong treebank, which is the largest constituent treebank in Korean, and the transformed treebank is used to train and test various probabilistic CFG parsers. The experimental results show that the proposed transformation methods reduce ambiguity in the training corpus, increasing the overall F1 score by up to about 9%. | -
dc.language | English | -
dc.publisher | LIMSI-CNRS | -
dc.title | Korean Treebank Transformation for Parser Training | -
dc.type | Conference | -
dc.type.rims | CONF | -
dc.citation.publicationname | ACL 2012 Joint Workshop on Statistical Parsing and Semantic Processing of Morphologically Rich Languages (SP-Sem-MRL 2012) | -
dc.identifier.conferencecountry | KO | -
dc.identifier.conferencelocation | Jeju Island | -
dc.contributor.localauthor | Choi, Key-Sun | -
dc.contributor.nonIdAuthor | DongHyun Choi | -
dc.contributor.nonIdAuthor | Jungyeul Park | -
