DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Hwang, Sung Ju | - |
dc.contributor.advisor | 황성주 | - |
dc.contributor.advisor | Lee, Juho | - |
dc.contributor.advisor | 이주호 | - |
dc.contributor.author | Lee, Seanie | - |
dc.date.accessioned | 2023-06-22T19:31:29Z | - |
dc.date.available | 2023-06-22T19:31:29Z | - |
dc.date.issued | 2022 | - |
dc.identifier.uri | http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=997682&flag=dissertation | en_US |
dc.identifier.uri | http://hdl.handle.net/10203/308231 | - |
dc.description | Thesis (Master's) - Korea Advanced Institute of Science and Technology (KAIST) : Kim Jaechul Graduate School of AI, 2022.2, [iv, 47 p.] | - |
dc.description.abstract | Deep neural networks have achieved remarkable performance on various natural language processing tasks, such as text classification, machine translation, and question answering. Although pretraining a model on large unlabeled corpora and finetuning it on labeled data is a sample-efficient method, it still requires a large amount of annotated data. Data augmentation is known to be one of the most effective methods for tackling the problem of scarce labeled data. However, it is challenging to construct a well-defined data augmentation for NLP that preserves the semantics of the original data while adding diversity. In this thesis, we propose three data augmentation methods for question answering and conditional text generation tasks. First, we leverage probabilistic generative models regularized with information maximization to sample diverse and consistent question-answer pairs. Second, we propose adversarial perturbation to generate negative examples for text generation and train a text generation model to push negative examples away from the given source sentences. Last, we propose a stochastic word embedding perturbation to regularize a QA model for domain generalization. With stochastic word embedding perturbation, we can transform the original question and context without any semantic drift. | - |
dc.language | eng | - |
dc.publisher | Korea Advanced Institute of Science and Technology (KAIST) | - |
dc.title | Data augmentation for natural language processing | - |
dc.title.alternative | 자연언어처리를 위한 데이터 증강방법 (Data augmentation methods for natural language processing) | - |
dc.type | Thesis(Master) | - |
dc.identifier.CNRN | 325007 | - |
dc.description.department | Korea Advanced Institute of Science and Technology (KAIST) : Kim Jaechul Graduate School of AI | - |
dc.contributor.alternativeauthor | 이신의 | - |