Deep learning based approaches for multimodal video question answering

This dissertation considers the problem of Multimodal Video Question Answering (MVQA), which aims at joint understanding of a video and its accompanying subtitles in order to answer a given question. Compared to visual question answering (VQA), which is question answering on a single image, MVQA is challenging in two respects: (1) it requires pinpointing the temporal parts relevant to answering the question, since the input is a long untrimmed video, and (2) it involves reasoning over heterogeneous modalities, where different questions require different modalities to answer. We propose two MVQA networks to address these challenges: (1) the Progressive Attention Memory Network (PAMN) and (2) the Modality Shifting Attention Network (MSAN). Experimental results on MovieQA and TVQA show that the proposed PAMN and MSAN achieve significant performance improvements over previous state-of-the-art methods. Furthermore, we propose Structured Co-reference Graph Attention for the Video-grounded Dialog (VideoDial) task and show a performance boost on the AVSD benchmark.
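
The two challenges described in the abstract, temporal localization within a long untrimmed video and selecting between the video and subtitle modalities, can be made concrete with a small sketch. Below is a minimal, hypothetical PyTorch module; it is not the thesis' PAMN or MSAN implementation, and the class name, projections, and dimensions are assumptions for illustration only. It applies question-guided temporal attention over video and subtitle features, then fuses the two contexts with a learned modality weight.

# Illustrative sketch only: question-guided temporal attention plus modality
# weighting, in the spirit of the two MVQA challenges above. NOT the thesis'
# PAMN/MSAN code; all names and dimensions here are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class QuestionGuidedModalityAttention(nn.Module):
    def __init__(self, dim=512):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)       # project question embedding
        self.v_proj = nn.Linear(dim, dim)       # project video frame features
        self.s_proj = nn.Linear(dim, dim)       # project subtitle sentence features
        self.modality_gate = nn.Linear(dim, 2)  # question decides video vs. subtitle weight

    def forward(self, question, video, subtitles):
        # question: (B, D); video: (B, Tv, D); subtitles: (B, Ts, D)
        q = self.q_proj(question)                                    # (B, D)

        # Temporal attention: score each frame / subtitle line against the question.
        v_scores = torch.einsum('bd,btd->bt', q, self.v_proj(video))
        s_scores = torch.einsum('bd,btd->bt', q, self.s_proj(subtitles))
        v_ctx = torch.einsum('bt,btd->bd', F.softmax(v_scores, dim=1), video)
        s_ctx = torch.einsum('bt,btd->bd', F.softmax(s_scores, dim=1), subtitles)

        # Modality weighting: fuse the two contexts according to the question.
        w = F.softmax(self.modality_gate(q), dim=-1)                 # (B, 2)
        fused = w[:, :1] * v_ctx + w[:, 1:] * s_ctx                  # (B, D)
        return fused

In a multiple-choice setting such as MovieQA or TVQA, the fused vector would typically be compared against candidate answer embeddings to score each answer choice.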
Advisors
Yoo, Changdong
Description
Korea Advanced Institute of Science and Technology (KAIST) : School of Electrical Engineering
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2021
Identifier
325007
Language
eng
Description

Doctoral thesis - Korea Advanced Institute of Science and Technology (KAIST) : School of Electrical Engineering, 2021.2, [v, 65 p.]

Keywords

Multimodal Video Question Answering; Memory Network; Attention Mechanism; Multimodal Video Dialog; Graph Neural Network

URI
http://hdl.handle.net/10203/295671
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=956669&flag=dissertation
Appears in Collection
EE-Theses_Ph.D.(박사논문)
Files in This Item
There are no files associated with this item.
