Monolingual Pre-trained Language Models for Tigrinya

DC Field: Value
  • dc.contributor.author: GAIM GEBRE, FITSUM
  • dc.contributor.author: YANG, WONSUK
  • dc.contributor.author: Park, Jong-Cheol
  • dc.date.accessioned: 2021-11-24T06:41:58Z
  • dc.date.available: 2021-11-24T06:41:58Z
  • dc.date.created: 2021-11-23
  • dc.date.issued: 2021-11-11
  • dc.identifier.citation: The 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021
  • dc.identifier.uri: http://hdl.handle.net/10203/289422
  • dc.description.abstract: Pre-trained language models (PLMs) are driving much of the recent progress in natural language processing. However, due to the resource-intensive nature of these models, underrepresented languages without sizable curated data have not seen comparable progress. Multilingual PLMs have been introduced with the potential to generalize across many languages, but their performance trails that of their monolingual counterparts and depends on the characteristics of the target language. In the case of Tigrinya, recent studies report sub-optimal performance when applying current multilingual models. This may be due to its orthography and unique linguistic characteristics, especially compared to the Indo-European and other typologically distant languages used to train those models. In this work, we pre-train three monolingual PLMs for Tigrinya on a newly compiled corpus and compare them with their multilingual counterparts on two downstream tasks, part-of-speech tagging and sentiment analysis, achieving significantly better results and establishing the state of the art. We make the data and trained models publicly available.
  • dc.language: English
  • dc.publisher: Association for Computational Linguistics
  • dc.title: Monolingual Pre-trained Language Models for Tigrinya
  • dc.type: Conference
  • dc.type.rims: CONF
  • dc.citation.publicationname: The 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021
  • dc.identifier.conferencecountry: DR
  • dc.identifier.conferencelocation: Online & Barcelo Bavaro Convention Centre, Punta Cana
  • dc.contributor.localauthor: Park, Jong-Cheol
Appears in Collection
CS - Conference Papers
Files in This Item
There are no files associated with this item.
