DSpace at KOASAS: Data augmentation for natural language processing

Data augmentation for natural language processing (Korean title: 자연언어처리를 위한 데이터 증강방법, "Data augmentation methods for natural language processing")

DC Field	Value	Language
dc.contributor.advisor	Hwang, Sung Ju	-
dc.contributor.advisor	황성주	-
dc.contributor.advisor	Lee, Juho	-
dc.contributor.advisor	이주호	-
dc.contributor.author	Lee, Seanie	-
dc.date.accessioned	2023-06-22T19:31:29Z	-
dc.date.available	2023-06-22T19:31:29Z	-
dc.date.issued	2022	-
dc.identifier.uri	http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=997682&flag=dissertation	en_US
dc.identifier.uri	http://hdl.handle.net/10203/308231	-
dc.description	Thesis (Master's) - Korea Advanced Institute of Science and Technology (KAIST) : Kim Jaechul Graduate School of AI, 2022.2, [iv, 47 p.]	-
dc.description.abstract	Deep neural networks have achieved remarkable performance on various natural language processing tasks, such as text classification, machine translation, and question answering. Although pretraining a model on large unlabeled corpora and finetuning it on labeled data is a sample-efficient method, it still requires a large amount of annotated data. Data augmentation is known to be one of the most effective methods for tackling the problem of scarce labeled data. However, it is challenging to construct a well-defined data augmentation for NLP that is diverse yet preserves the semantics of the original data. In this thesis, we propose three data augmentation methods for question answering and conditional text generation tasks. First, we leverage probabilistic generative models regularized with information maximization to sample diverse and consistent question-answer pairs. Second, we propose adversarial perturbation to generate negative examples for text generation, and we train a text generation model to push these negative examples away from the given source sentences. Lastly, we propose a stochastic word embedding perturbation to regularize a QA model for domain generalization. With stochastic word embedding perturbation, we can transform the original question and context without any semantic drift.	-
dc.language	eng	-
dc.publisher	Korea Advanced Institute of Science and Technology (KAIST)	-
dc.title	Data augmentation for natural language processing	-
dc.title.alternative	자연언어처리를 위한 데이터 증강방법 (Korean; "Data augmentation methods for natural language processing")	-
dc.type	Thesis(Master)	-
dc.identifier.CNRN	325007	-
dc.description.department	KAIST : Kim Jaechul Graduate School of AI	-
dc.contributor.alternativeauthor	이신의	-
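The abstract does not specify the form of the stochastic word embedding perturbation. A minimal sketch can still illustrate the idea of transforming the input embeddings of a question or context while keeping each token close to its original meaning. It assumes element-wise multiplicative Gaussian noise with a fixed scale `sigma`. The names `perturb_embeddings`, `row_cosine`, and `sigma` are hypothetical and are not taken from the thesis, which may parameterize or learn the noise differently.

```python
from typing import Optional

import numpy as np


def perturb_embeddings(
    embeddings: np.ndarray,
    sigma: float = 0.1,
    rng: Optional[np.random.Generator] = None,
) -> np.ndarray:
    """Return a stochastically perturbed copy of a (tokens, dim) embedding matrix.

    Assumption (hypothetical, not the thesis's exact method): every element is
    scaled by noise drawn i.i.d. from N(1, sigma^2).
    """
    if sigma < 0:
        raise ValueError("sigma must be non-negative")
    if rng is None:
        rng = np.random.default_rng()
    # Noise centred at 1 keeps each embedding near its original direction,
    # so the perturbed input is meant to stay semantically close to the original.
    noise = rng.normal(loc=1.0, scale=sigma, size=embeddings.shape)
    return embeddings * noise


def row_cosine(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Cosine similarity between corresponding rows of a and b."""
    return np.sum(a * b, axis=1) / (
        np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy stand-in for the embeddings of a 5-token question (8-dimensional vectors).
    original = rng.normal(size=(5, 8))
    augmented = perturb_embeddings(original, sigma=0.1, rng=rng)
    # Per-token cosine similarity between the original and the augmented view.
    print(row_cosine(original, augmented).round(3))
```

In a training loop, a QA model could see such perturbed embeddings as additional views of each example. The abstract describes the perturbation only as a regularizer for domain generalization, so how the thesis combines original and perturbed inputs is not stated in this record.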

Appears in Collection: AI-Theses_Master (Master's theses)

Files in This Item: There are no files associated with this item.

KOASAS

Knowledge Service Development Team, KAIST 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea. T. 82-42-350-4493 Email. koasas@kaist.ac.kr
Copyright © 2016. Korea Advanced Institute of Science and Technology. All Rights Reserved.