DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Thorne, James | - |
dc.contributor.advisor | Chong, Song | - |
dc.contributor.author | Waheed, Sania | - |
dc.date.accessioned | 2024-07-30T19:30:39Z | - |
dc.date.available | 2024-07-30T19:30:39Z | - |
dc.date.issued | 2024 | - |
dc.identifier.uri | http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1096067&flag=dissertation | en_US |
dc.identifier.uri | http://hdl.handle.net/10203/321362 | - |
dc.description | Thesis (Master's) - Korea Advanced Institute of Science and Technology: Kim Jaechul Graduate School of AI, 2024.2, [iii, 17 p.] | - |
dc.description.abstract | Vision-Language Models (VLMs) play a crucial role in bridging the gap between visual and linguistic understanding. However, current models require extensive pre-training and fine-tuning, which often makes them difficult to employ for downstream tasks. Large language models (LLMs) were introduced as an alternative to fine-tuning VLMs because of their zero-shot applicability to downstream tasks, but using LLMs effectively for vision-language tasks demands comprehensive textual representations of the visual data in the form of captions. Unfortunately, the textual representations generated by current VLMs are repetitive and do not provide a detailed understanding of the data. To address this gap, we propose a novel framework, Hierarchical Bag of Phrases (HBoP), that effectively connects visual and textual data by generating a comprehensive description of all pertinent information in an image. Our framework not only enables the use of LLMs in multi-modal tasks but also produces image-patch/text pairs that could be useful for training vision-language models toward better image representations. To evaluate its performance, we conduct experiments comparing HBoP against state-of-the-art VLMs in terms of semantic integrity, image-text retrieval, and the diversity of generated captions. Our results show a diversity score close to that of human-generated captions and a substantial improvement on text-retrieval tasks, demonstrating the effectiveness of the HBoP framework. | - |
dc.language | eng | - |
dc.publisher | Korea Advanced Institute of Science and Technology (KAIST) | - |
dc.subject | Multi-modal tasks; Image understanding; Visual understanding; Information extraction; Image-to-text transformation | - |
dc.title | HBoP: Hierarchical bag of phrases | - |
dc.title.alternative | Hierarchical collection of phrases | - |
dc.type | Thesis (Master) | - |
dc.identifier.CNRN | 325007 | - |
dc.description.department | Korea Advanced Institute of Science and Technology: Kim Jaechul Graduate School of AI | - |
dc.contributor.alternativeauthor | Chong, Song | - |
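
The abstract above names two technical pieces without giving implementation detail: the "image-patch/text pairs" that make up a hierarchical bag of phrases, and an image-text retrieval evaluation against VLM baselines. The sketch below is a minimal, hypothetical illustration of how such a setup could be wired together, not the thesis's actual method: the `PhraseNode` structure, its field names, the example phrases, and the choice of the public `openai/clip-vit-base-patch32` checkpoint for retrieval scoring are all assumptions made for illustration.

```python
# Illustrative sketch only: not the HBoP implementation from the thesis.
# It shows one plausible shape for the "image-patch/text pairs" the abstract
# mentions, plus CLIP-based image-text retrieval scoring of the phrases.
# The PhraseNode structure, field names, and example data are all assumptions.
from dataclasses import dataclass, field

import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor


@dataclass
class PhraseNode:
    """One node in an assumed hierarchical bag of phrases."""
    phrase: str                        # caption/phrase for this image region
    bbox: tuple | None = None          # (x0, y0, x1, y1) image patch, if any
    children: list["PhraseNode"] = field(default_factory=list)


def flatten(node: PhraseNode) -> list[str]:
    """Collect every phrase in the hierarchy into a flat bag."""
    return [node.phrase] + [p for child in node.children for p in flatten(child)]


def retrieval_scores(image: Image.Image, phrases: list[str]) -> torch.Tensor:
    """Score each phrase against the image with a public CLIP checkpoint."""
    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
    inputs = processor(text=phrases, images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.logits_per_image.squeeze(0)  # one similarity score per phrase


# Fabricated two-level example: a global caption with two region-level phrases.
root = PhraseNode("a busy street at dusk", children=[
    PhraseNode("a red bicycle leaning on a lamppost", bbox=(10, 40, 120, 200)),
    PhraseNode("pedestrians crossing at a crosswalk", bbox=(130, 60, 300, 220)),
])
# scores = retrieval_scores(Image.open("street.jpg"), flatten(root))
```

Under these assumptions, a phrase hierarchy would be judged by how well its flattened phrases retrieve the source image relative to single-caption baselines; whether this matches the thesis's actual evaluation protocol cannot be determined from the abstract alone.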