Monolingual Pre-trained Language Models for Tigrinya

DC Field: Value
  • dc.contributor.author: GAIM GEBRE, FITSUM
  • dc.contributor.author: YANG, WONSUK
  • dc.contributor.author: Park, Jong-Cheol
  • dc.date.accessioned: 2021-11-24T06:41:58Z
  • dc.date.available: 2021-11-24T06:41:58Z
  • dc.date.created: 2021-11-23
  • dc.date.issued: 2021-11-11
  • dc.identifier.citation: The 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021
  • dc.identifier.uri: http://hdl.handle.net/10203/289422
  • dc.description.abstract: Pre-trained language models (PLMs) are driving much of the recent progress in natural language processing. However, due to the resource-intensive nature of these models, underrepresented languages without sizable curated data have not seen comparable progress. Multilingual PLMs have been introduced with the potential to generalize across many languages, but their performance trails that of their monolingual counterparts and depends on the characteristics of the target language. In the case of Tigrinya, recent studies report sub-optimal performance when applying current multilingual models. This may be due to its orthography and unique linguistic characteristics, especially compared to the Indo-European and other typologically distant languages used to train those models. In this work, we pre-train three monolingual PLMs for Tigrinya on a newly compiled corpus and compare them with their multilingual counterparts on two downstream tasks, part-of-speech tagging and sentiment analysis, achieving significantly better results and establishing the state of the art. We make the data and trained models publicly available.
  • dc.language: English
  • dc.publisher: Association for Computational Linguistics
  • dc.title: Monolingual Pre-trained Language Models for Tigrinya
  • dc.type: Conference
  • dc.type.rims: CONF
  • dc.citation.publicationname: The 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021
  • dc.identifier.conferencecountry: DR
  • dc.identifier.conferencelocation: Online & Barcelo Bavaro Convention Centre, Punta Cana
  • dc.contributor.localauthor: Park, Jong-Cheol
Appears in Collection
CS - Conference Papers
Files in This Item
There are no files associated with this item.
