DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 노용만 | - |
dc.contributor.author | Kim, Yeonju | - |
dc.contributor.author | 김연주 | - |
dc.date.accessioned | 2024-07-30T19:31:23Z | - |
dc.date.available | 2024-07-30T19:31:23Z | - |
dc.date.issued | 2024 | - |
dc.identifier.uri | http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1096788&flag=dissertation | en_US |
dc.identifier.uri | http://hdl.handle.net/10203/321570 | - |
dc.description | Thesis (Master's) - Korea Advanced Institute of Science and Technology (KAIST) : School of Electrical Engineering, 2024.2, [iii, 22 p.] | - |
dc.description.abstract | Large Vision-Language Models (LVLMs) have revolutionized the field of computer vision by unifying various computer vision tasks through their ability to comprehend visual information. However, they often suffer from hallucination, generating descriptions that are inconsistent with the input images. This paper introduces Besra, a Large Vision-Language Model designed to address hallucination by incorporating a self-correction task. Besra leverages its iterative refinement capability to make generated sentences more consistent with the provided images. The model iteratively refines descriptions by refeeding them alongside the corresponding images, facilitating a detailed examination of specific image regions. Besra-Self-Correction-30K, a proposed dataset, trains Besra's self-correction ability by inducing corrections based on predictions from a baseline LVLM. The approach aims to mitigate hallucination, enabling Besra to generate more accurate and contextually relevant descriptions through active image scrutiny. We evaluate Besra on the POPE and MME benchmarks and show that a self-correction task is helpful for hallucination mitigation. | - |
dc.language | eng | - |
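dc.publisher | Korea Advanced Institute of Science and Technology (KAIST) | - |
dc.subject | Large vision-language model; Hallucination; Self-correction task; Besra; Besra self-correction dataset | - |
dc.subject | Large vision-language model; Hallucination; Self-correction; Besra; Besra-self-correction-30K | - |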
dc.title | Besra: Self-correction for hallucination mitigation in large vision-language models | - |
dc.title.alternative | 베스라: 대형 시각 언어 모델의 환각 완화를 위한 자체 교정 | - |
dc.type | Thesis(Master) | - |
dc.identifier.CNRN | 325007 | - |
dc.description.department | Korea Advanced Institute of Science and Technology (KAIST) : School of Electrical Engineering | - |
dc.contributor.alternativeauthor | Ro, Yongman | - |