Toward universal computer vision task solver with single unified model (A study on solving general-purpose computer vision tasks with a single unified model)

With the advancement of Large Language Models (LLMs), a variety of Natural Language Processing (NLP) tasks can be effectively addressed using a single unified LLM backbone. Notably, instruction tuning leverages the emergent abilities of LLMs to handle diverse language tasks through language instructions. However, in the field of computer vision, there is no single unified system capable of solving all types of computer vision tasks, owing to the inherent diversity of such tasks. In this paper, we propose an approach to address various computer vision tasks by utilizing the capabilities of visual instruction tuning. By unifying the model’s input and output as either text or image, we design a sequence-to-sequence modeling framework for computer vision tasks. In summary, we present a framework designed to solve any type of computer vision task: a universal computer vision task solver.
Advisors
Hwang, Sung Ju (황성주)
Description
Korea Advanced Institute of Science and Technology (KAIST): Kim Jaechul Graduate School of AI
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2024
Identifier
325007
Language
eng
Description

Thesis (Master's) - Korea Advanced Institute of Science and Technology (KAIST): Kim Jaechul Graduate School of AI, 2024.2, [iii, 27 p.]

Keywords

Multimodal learning; Large language model; Foundation model; Instruction tuning; Visual instruction tuning; Sequence-to-sequence modeling; Computer vision tasks

URI
http://hdl.handle.net/10203/321345
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1096050&flag=dissertation
Appears in Collection
AI-Theses_Master (Master's theses)
Files in This Item
There are no files associated with this item.
