Deep learning based approaches for multimodal video question answering

This dissertation considers the problem of Multimodal Video Question Answering (MVQA), which aims at joint understanding of a video and its accompanying subtitles in order to answer a given question. Compared to visual question answering (VQA), which is question answering on a single image, MVQA is challenging in two respects: (1) it requires pinpointing the temporal parts relevant to answering the question, since the input is a long untrimmed video, and (2) it involves reasoning over heterogeneous modalities, where different questions require different modalities to answer. We propose two MVQA networks to address these challenges: (1) the Progressive Attention Memory Network (PAMN) and (2) the Modality Shifting Attention Network (MSAN). Experimental results on MovieQA and TVQA show that the proposed PAMN and MSAN achieve significant performance improvements over previous state-of-the-art methods. Furthermore, we propose Structured Co-reference Graph Attention for the Video-grounded Dialog (VideoDial) task and show a performance boost on the AVSD benchmark.
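
The two challenges described in the abstract, temporal localization within a long untrimmed video and selecting between the video and subtitle modalities, can be made concrete with a small sketch. Below is a minimal, hypothetical PyTorch module; it is not the thesis' PAMN or MSAN implementation, and the class name, projections, and dimensions are assumptions for illustration only. It applies question-guided temporal attention over video and subtitle features, then fuses the two contexts with a learned modality weight.

# Illustrative sketch only: question-guided temporal attention plus modality
# weighting, in the spirit of the two MVQA challenges above. NOT the thesis'
# PAMN/MSAN code; all names and dimensions here are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class QuestionGuidedModalityAttention(nn.Module):
    def __init__(self, dim=512):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)       # project question embedding
        self.v_proj = nn.Linear(dim, dim)       # project video frame features
        self.s_proj = nn.Linear(dim, dim)       # project subtitle sentence features
        self.modality_gate = nn.Linear(dim, 2)  # question decides video vs. subtitle weight

    def forward(self, question, video, subtitles):
        # question: (B, D); video: (B, Tv, D); subtitles: (B, Ts, D)
        q = self.q_proj(question)                                    # (B, D)

        # Temporal attention: score each frame / subtitle line against the question.
        v_scores = torch.einsum('bd,btd->bt', q, self.v_proj(video))
        s_scores = torch.einsum('bd,btd->bt', q, self.s_proj(subtitles))
        v_ctx = torch.einsum('bt,btd->bd', F.softmax(v_scores, dim=1), video)
        s_ctx = torch.einsum('bt,btd->bd', F.softmax(s_scores, dim=1), subtitles)

        # Modality weighting: fuse the two contexts according to the question.
        w = F.softmax(self.modality_gate(q), dim=-1)                 # (B, 2)
        fused = w[:, :1] * v_ctx + w[:, 1:] * s_ctx                  # (B, D)
        return fused

In a multiple-choice setting such as MovieQA or TVQA, the fused vector would typically be compared against candidate answer embeddings to score each answer choice.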
Advisors
Yoo, Changdong
Description
Korea Advanced Institute of Science and Technology (KAIST) : School of Electrical Engineering
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2021
Identifier
325007
Language
eng
Description

Doctoral thesis - Korea Advanced Institute of Science and Technology (KAIST) : School of Electrical Engineering, 2021.2, [v, 65 p.]

Keywords

Multimodal Video Question Answering; Memory Network; Attention Mechanism; Multimodal Video Dialog; Graph Neural Network

URI
http://hdl.handle.net/10203/295671
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=956669&flag=dissertation
Appears in Collection
EE-Theses_Ph.D.(박사논문)
Files in This Item
There are no files associated with this item.
