Information retrieval by augmenting document representation

| DC Field | Value | Language |
| --- | --- | --- |
| dc.contributor.author | Jeong, Soyeong | - |
| dc.date.accessioned | 2023-06-26T19:31:27Z | - |
| dc.date.available | 2023-06-26T19:31:27Z | - |
| dc.date.issued | 2022 | - |
| dc.identifier.uri | http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1000338&flag=dissertation | en_US |
| dc.identifier.uri | http://hdl.handle.net/10203/309531 | - |
| dc.description | Thesis (Master's) - Korea Advanced Institute of Science and Technology (KAIST) : School of Computing, 2022.2, [iv, 35 p. :] | - |
| dc.description.abstract | One of the challenges in information retrieval (IR) is the vocabulary mismatch problem: the failure to retrieve a query-relevant document when the terms in the query and the document are lexically different but semantically similar. Recent work has tried to tackle the problem by expanding sparse representations with additional relevant terms or by embedding the representations into a learnable dense space. However, both expansion and dense models generally require a large volume of labeled query-document pairs for training, and human-annotated pairs are often difficult to acquire. This thesis focuses on augmenting document representations, either at the document text level or at the training dataset level, without requiring additional labeled query-document pairs, for both sparse and dense retrieval models. For the sparse retrieval model, we propose Unsupervised Document Expansion with Generation (UDEG), which generates diverse supplementary sentences for the original document without using labels on query-document pairs for training. When generating sentences, we further stochastically perturb their embeddings to generate more diverse sentences for document expansion. We validate UDEG on two standard IR benchmark datasets, and the results show that it significantly outperforms relevant expansion baselines. For the dense retrieval model, we propose Document Augmentation for dense Retrieval (DAR), which augments the document representations with interpolation and perturbation. We validate DAR on retrieval tasks with two benchmark datasets, showing that it significantly outperforms relevant baselines on dense retrieval of both seen and unseen documents. We believe that UDEG and DAR contribute to sparse and dense retrievers by augmenting document representations without annotating additional query-document pairs. | - |
| dc.language | eng | - |
| dc.publisher | Korea Advanced Institute of Science and Technology (KAIST) | - |
| dc.subject | Natural language understanding; Information retrieval; Data augmentation; Document expansion; Interpolation; Perturbation | - |
| dc.subject | Natural language understanding; Information retrieval; Data augmentation; Document expansion; Interpolation; Perturbation | - |
| dc.title | Information retrieval by augmenting document representation | - |
| dc.title.alternative | Information retrieval through document representation augmentation | - |
| dc.type | Thesis(Master) | - |
| dc.identifier.CNRN | 325007 | - |
| dc.description.department | Korea Advanced Institute of Science and Technology (KAIST) : School of Computing | - |
| dc.contributor.alternativeauthor | 정소영 | - |
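
The abstract says DAR augments document representations "with interpolation and perturbation" but does not say how either operation is defined. The sketch below is only an illustration under stated assumptions, not the thesis's implementation: it assumes interpolation is a mixup-style convex combination of two document embeddings, and that perturbation is dropout-style random masking. The names `interpolate`, `perturb`, `lam`, and `drop_prob` are hypothetical.

```python
import random
from typing import List

Vector = List[float]


def interpolate(doc_a: Vector, doc_b: Vector, lam: float) -> Vector:
    """Assumed mixup-style interpolation: lam * doc_a + (1 - lam) * doc_b."""
    if len(doc_a) != len(doc_b):
        raise ValueError("embeddings must have the same dimension")
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return [lam * a + (1.0 - lam) * b for a, b in zip(doc_a, doc_b)]


def perturb(doc: Vector, drop_prob: float, rng: random.Random) -> Vector:
    """Assumed dropout-style perturbation.

    Each entry is zeroed with probability drop_prob; surviving entries are
    rescaled by 1 / (1 - drop_prob) so the expected value is unchanged.
    """
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must lie in [0, 1)")
    keep_scale = 1.0 / (1.0 - drop_prob)
    return [0.0 if rng.random() < drop_prob else x * keep_scale for x in doc]
```

Under these assumptions, a training loop could call `interpolate` on pairs of document embeddings and `perturb` on single embeddings to obtain extra document representations without new labeled query-document pairs. Passing a seeded `random.Random` keeps the perturbation reproducible.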
Appears in Collection
CS-Theses_Master (Master's Theses)
Files in This Item
There are no files associated with this item.
