DSpace at KOASAS: Besra: Self-correction for hallucination mitigation in large vision-language models

Besra: Self-correction for hallucination mitigation in large vision-language models (베스라: 대형 시각 언어 모델의 환각 완화를 위한 자체 교정)

DC Field	Value	Language
dc.contributor.advisor	Ro, Yongman (노용만)	-
dc.contributor.author	Kim, Yeonju	-
dc.contributor.author	김연주	-
dc.date.accessioned	2024-07-30T19:31:23Z	-
dc.date.available	2024-07-30T19:31:23Z	-
dc.date.issued	2024	-
dc.identifier.uri	http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1096788&flag=dissertation	en_US
dc.identifier.uri	http://hdl.handle.net/10203/321570	-
dc.description	Thesis (Master's) - Korea Advanced Institute of Science and Technology: School of Electrical Engineering, 2024.2, [iii, 22 p.]	-
dc.description.abstract	Large Vision-Language Models (LVLMs) have revolutionized computer vision by unifying a variety of vision tasks through their ability to comprehend visual information. However, they often suffer from hallucination: they generate descriptions that are inconsistent with the input image. This paper introduces Besra, a Large Vision-Language Model designed to address hallucination by incorporating a self-correction task. Besra uses iterative refinement to improve the consistency of generated sentences with the provided images. The model refines a description by feeding it back in together with the corresponding image, which allows a detailed examination of specific image regions. A proposed dataset, Besra-Self-Correction-30K, trains Besra's self-correction ability by inducing corrections based on the predictions of a baseline LVLM. The approach aims to mitigate hallucination, enabling Besra to generate more accurate and contextually relevant descriptions through active scrutiny of the image. We evaluate Besra on the POPE and MME benchmarks and show that a self-correction task helps mitigate hallucination.	-
dc.language	eng	-
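dc.publisher	Korea Advanced Institute of Science and Technology (한국과학기술원)	-
dc.subject	Large vision-language model; Hallucination phenomenon; Self-correction task; Besra; Besra-self-correction dataset	-
dc.subject	Large vision-language model; Hallucination; Self-correction; Besra; Besra-self-correction-30K	-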
dc.title	Besra: Self-correction for hallucination mitigation in large vision-language models	-
dc.title.alternative	베스라: 대형 시각 언어 모델의 환각 완화를 위한 자체 교정	-
dc.type	Thesis(Master)	-
dc.identifier.CNRN	325007	-
dc.description.department	Korea Advanced Institute of Science and Technology: School of Electrical Engineering	-
dc.contributor.alternativeauthor	Ro, Yongman	-
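
The abstract describes refinement as feeding a generated description back to the model together with its image. A minimal sketch of such a loop follows. It is not the thesis's implementation: the `refine` callable, the `max_rounds` limit, the stop-when-unchanged rule, and the toy refiner (which treats an image as a set of object names) are all assumptions made for illustration.

```python
from typing import Callable, List, Set, Tuple

# Hypothetical interface: (image, current description) -> revised description.
Refiner = Callable[[Set[str], str], str]


def self_correct(image: Set[str], draft: str, refine: Refiner,
                 max_rounds: int = 3) -> Tuple[str, List[str]]:
    """Refeed the description with the image until it stops changing.

    The stopping rule and round limit are assumptions, not from the abstract.
    """
    history = [draft]
    for _ in range(max_rounds):
        revised = refine(image, history[-1])
        if revised == history[-1]:
            break  # no further correction proposed
        history.append(revised)
    return history[-1], history


def toy_refiner(image_objects: Set[str], description: str) -> str:
    """Stand-in for a model: drop the first mentioned object not in the image."""
    items = description.split(", ")
    for i, item in enumerate(items):
        if item not in image_objects:
            return ", ".join(items[:i] + items[i + 1:])
    return description


final, history = self_correct({"dog", "ball"}, "dog, frisbee, ball, cat", toy_refiner)
print(final)
```

In this toy run each round removes one object that is absent from the image, so the description converges once every mentioned object is present. A real system would replace `toy_refiner` with a call to the vision-language model.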

Appears in Collection: EE-Theses_Master (Master's Theses)

Files in This Item: There are no files associated with this item.

Knowledge Service Development Team, KAIST 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea. T. 82-42-350-4493 Email. koasas@kaist.ac.kr
Copyright © 2016. Korea Advanced Institute of Science and Technology. All Rights Reserved.
