DSpace at KOASAS: Multi-view full sentence visual question answering with full sentence answer network and question-driven object attention network



Multi-view full sentence visual question answering with full sentence answer network and question-driven object attention network

Chung, Guhyun

Visual Question Answering (VQA) is the task of answering a question about a given image. A VQA model therefore needs to understand both the image and the question, and to reason over them. Previous research on VQA has mainly focused on improving reasoning or on better understanding of images and questions. However, in a real VQA application in which a robot or a mobile device interacts with a human, the model must handle the surrounding environment rather than a single image taken at a specific time. We call this task Multi-view VQA (MV-VQA) when the objective is a word answer, and Multi-view Full Sentence VQA (MV-FSVQA) when the objective is a full-sentence answer. We propose a question-driven, object-based attention model for these tasks. Furthermore, unlike the baseline algorithm, we separately train a seq2seq model for the FSVQA and MV-FSVQA tasks to obtain better full-sentence answers. We carried out experiments on VQA, FSVQA, MV-VQA, and MV-FSVQA with the MS COCO dataset and customized datasets. The results show that our model improves over the baseline, especially in multi-view scenarios, and demonstrate the feasibility of the proposed model for real applications.
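The abstract does not spell out how the question-driven, object-based attention is computed, so the following is only a minimal illustrative sketch under stated assumptions, not the thesis's method. It assumes a generic dot-product soft attention: object feature vectors detected in several views are pooled, scored against a question vector, and combined by softmax weights. The function names, the toy vectors, and the scoring rule are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def question_driven_attention(question_vec, views):
    """Attend over objects pooled from all views (hypothetical sketch).

    question_vec: list of floats, an assumed question embedding.
    views: list of views, each a list of object feature vectors
           with the same dimension as question_vec.
    Returns (weights, attended): one weight per object and the
    weighted sum of the object features.
    """
    # Pool the objects from every view into a single candidate list.
    objects = [obj for view in views for obj in view]
    # Assumption: relevance is the dot product of question and object.
    scores = [sum(q * o for q, o in zip(question_vec, obj)) for obj in objects]
    weights = softmax(scores)
    dim = len(question_vec)
    attended = [
        sum(w * obj[i] for w, obj in zip(weights, objects)) for i in range(dim)
    ]
    return weights, attended


# Toy example: two views, three objects in total, 2-dimensional features.
question = [1.0, 0.0]
views = [
    [[0.9, 0.1], [0.0, 1.0]],  # view 1: two detected objects
    [[0.2, 0.8]],              # view 2: one detected object
]
weights, attended = question_driven_attention(question, views)
```

In this toy setup the first object of view 1 receives the largest weight because it aligns best with the question vector. A real model would learn the embeddings and the scoring function rather than use a fixed dot product.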

Advisors: Kim, Jong-Hwan

Description: Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering

Publisher: Korea Advanced Institute of Science and Technology (KAIST)

Issue Date: 2019

Identifier: 325007

Language: eng

Description: Thesis (Master's) - Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering, 2019.2, [iv, 23 p.]

Keywords: Multi-view full sentence visual question answering; object-based attention model

URI: http://hdl.handle.net/10203/266742

Link: http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=843366&flag=dissertation

Appears in Collection: EE-Theses_Master (Master's theses)

Files in This Item: There are no files associated with this item.
