Deep neural networks have achieved remarkable performance on various natural language processing tasks, including text classification, machine translation, and question answering. Although pretraining a model on large unlabeled corpora and finetuning it on labeled data is a sample-efficient approach, it still requires a large amount of annotated data. Data augmentation is one of the most effective methods for learning when labeled data is scarce. However, it is challenging to construct well-defined data augmentation for NLP that preserves the semantics of the original data while introducing diversity. In this thesis, we propose three data augmentation methods for question answering and conditional text generation tasks. First, we leverage probabilistic generative models regularized with information maximization to sample diverse and consistent question-answer pairs. Second, we propose an adversarial perturbation method that generates negative examples for text generation and trains a text generation model to push the negative examples away from the given source sentences. Last, we propose a stochastic word embedding perturbation that regularizes a QA model for domain generalization. With stochastic word embedding perturbation, we can transform the original question and context without any semantic drift.
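The stochastic word embedding perturbation mentioned above can be illustrated with a minimal sketch: zero-mean Gaussian noise is added to the embedding vectors of the input tokens, so the token identities (and hence the surface semantics) are unchanged while the model sees a slightly different input each pass. The function name, noise scale, and toy embedding table below are hypothetical choices for illustration, not the thesis's actual implementation.

```python
import numpy as np

def perturb_embeddings(emb, sigma=0.1, seed=None):
    """Add zero-mean Gaussian noise to each word embedding row.

    Hypothetical sketch of stochastic word embedding perturbation:
    emb is a (num_tokens, dim) array of embedding vectors; the
    returned array has the same shape, with each vector shifted by
    independent N(0, sigma^2) noise.
    """
    rng = np.random.default_rng(seed)
    noise = rng.normal(loc=0.0, scale=sigma, size=emb.shape)
    return emb + noise

# toy embedding table: 5 tokens, 4 dimensions
emb = np.ones((5, 4))
perturbed = perturb_embeddings(emb, sigma=0.05, seed=0)
print(perturbed.shape)  # same shape as the input: (5, 4)
```

Because the perturbation acts in embedding space rather than on the discrete tokens, no word is replaced or deleted, which is why this kind of augmentation avoids the semantic drift that token-level substitutions can introduce.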