DC Field | Value | Language |
---|---|---|
dc.contributor.author | DongHyun Choi | - |
dc.contributor.author | Jungyeul Park | - |
dc.contributor.author | Choi, Key-Sun | - |
dc.date.accessioned | 2013-03-29T15:22:47Z | - |
dc.date.available | 2013-03-29T15:22:47Z | - |
dc.date.created | 2012-08-23 | - |
dc.date.issued | 2012-07-12 | - |
dc.identifier.citation | ACL 2012 Joint Workshop on Statistical Parsing and Semantic Processing of Morphologically Rich Languages (SP-Sem-MRL 2012) | - |
dc.identifier.uri | http://hdl.handle.net/10203/171709 | - |
dc.description.abstract | Korean is a morphologically rich language in which grammatical functions are marked by inflections and affixes, which can indicate grammatical relations such as subject, object, and predicate. A Korean sentence can be thought of as a sequence of eojeols. An eojeol is a word or a variant word form agglutinated with grammatical affixes, and eojeols are separated by white space, as words are in written English. Korean treebanks (Choi et al., 1994; Han et al., 2002; Korean Language Institute, 2012) use the eojeol as their fundamental unit of analysis, thus representing an eojeol as a prepreterminal phrase inside the constituent tree. This eojeol-based annotation schema complicates parser training in various ways: for example, an entity represented by a sequence of nouns is annotated as two or more different noun phrases, depending on the number of spaces used. In this paper, we propose methods to transform eojeol-based Korean treebanks into entity-based Korean treebanks. The methods are applied to the Sejong treebank, the largest constituent treebank for Korean, and the transformed treebank is used to train and test various probabilistic CFG parsers. The experimental results show that the proposed transformation methods reduce ambiguity in the training corpus, increasing the overall F1 score by up to about 9%. | - |
dc.language | English | - |
dc.publisher | LIMSI-CNRS | - |
dc.title | Korean Treebank Transformation for Parser Training | - |
dc.type | Conference | - |
dc.type.rims | CONF | - |
dc.citation.publicationname | ACL 2012 Joint Workshop on Statistical Parsing and Semantic Processing of Morphologically Rich Languages (SP-Sem-MRL 2012) | - |
dc.identifier.conferencecountry | KO | - |
dc.identifier.conferencelocation | Jeju Island | - |
dc.contributor.localauthor | Choi, Key-Sun | - |
dc.contributor.nonIdAuthor | DongHyun Choi | - |
dc.contributor.nonIdAuthor | Jungyeul Park | - |