Moment proposal network for multi-modal video question answering
Korean title: Moment proposal network for multi-modal question answering

This paper proposes the Moment Proposal Network (MPN) for Multimodal Video Question Answering (MMVQA). MMVQA requires understanding the story of a TV show from both its video and its dialogue in order to answer a given question. Existing methods rely on temporal attention mechanisms to retrieve the moment relevant to the current question-answer pair. However, these methods have two main limitations. First, the attention map tends to blur as the video grows longer, which makes it hard to pinpoint the required moment. Second, the contribution of each modality is not considered. To address these limitations, MPN retrieves the golden moment with a hard attention mechanism, which reduces the search space for the subsequent reasoning networks. In addition, MPN dynamically determines the importance of each modality for the given question through Modality Importance Modulation. MPN is trained to solve a ranking problem between the query and the candidate moment proposals. Experiments on the publicly available TVQA dataset show that MPN achieves state-of-the-art performance and offers interpretability of where the model attends.
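The abstract combines three ideas: hard selection of one candidate moment, question-dependent weighting of modalities, and a ranking objective over moment proposals. The following is a toy sketch of how these pieces could fit together, built entirely on assumptions of our own rather than on the thesis: candidate moments are sliding windows over per-clip feature vectors, each modality scores a window by the cosine similarity between its mean-pooled features and a question embedding, Modality Importance Modulation is approximated by a softmax over question-key dot products (`modality_keys` is a hypothetical stand-in for learned parameters), and training uses a generic margin ranking loss. All function names, the window sizes, and the margin value are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Moment = Tuple[int, int]  # [start, end) clip indices


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mean_pool(frames: Sequence[Vector]) -> List[float]:
    """Average a non-empty list of per-clip feature vectors."""
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]


def propose_moments(num_clips: int, window_sizes: Sequence[int]) -> List[Moment]:
    """Hypothetical proposal scheme: every sliding window of each given size."""
    return [(s, s + w) for w in window_sizes for s in range(num_clips - w + 1)]


def modality_weights(question: Vector, modality_keys: Dict[str, Vector]) -> Dict[str, float]:
    """Assumed stand-in for Modality Importance Modulation: softmax of question-key dot products."""
    logits = {m: sum(q * k for q, k in zip(question, key)) for m, key in modality_keys.items()}
    top = max(logits.values())  # subtract the max for numerical stability
    exps = {m: math.exp(v - top) for m, v in logits.items()}
    total = sum(exps.values())
    return {m: e / total for m, e in exps.items()}


def score_moment(question: Vector, features: Dict[str, List[Vector]],
                 moment: Moment, weights: Dict[str, float]) -> float:
    """Weighted sum over modalities of question-to-moment cosine similarity."""
    start, end = moment
    return sum(weights[m] * cosine(question, mean_pool(feats[start:end]))
               for m, feats in features.items())


def hard_select(question: Vector, features: Dict[str, List[Vector]],
                moments: Sequence[Moment], weights: Dict[str, float]) -> Moment:
    """Hard attention: keep only the single best-scoring moment, not a soft mixture."""
    return max(moments, key=lambda p: score_moment(question, features, p, weights))


def margin_ranking_loss(pos_score: float, neg_scores: Sequence[float], margin: float = 0.2) -> float:
    """Hinge loss pushing the positive moment above each negative by at least `margin`."""
    return sum(max(0.0, margin - pos_score + n) for n in neg_scores) / len(neg_scores)


# Toy usage with made-up 2-D features for 4 clips.
question = [1.0, 0.0]
features = {
    "video":    [[0.0, 1.0], [0.0, 1.0], [1.0, 0.0], [1.0, 0.0]],
    "subtitle": [[0.0, 1.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]],
}
modality_keys = {"video": [1.0, 0.0], "subtitle": [0.0, 1.0]}  # hypothetical parameters

moments = propose_moments(num_clips=4, window_sizes=(1, 2))
weights = modality_weights(question, modality_keys)
best = hard_select(question, features, moments, weights)
negatives = [score_moment(question, features, p, weights) for p in moments if p != best]
loss = margin_ranking_loss(score_moment(question, features, best, weights), negatives)
print(weights, best, round(loss, 4))
```

In this toy setup the question aligns with the `video` key, so video receives the larger weight (about 0.73), and the window `(2, 3)`, where both modalities match the question, is selected. Hard selection passes a single window to later stages, which is one way to read the abstract's claim that the search space for the reasoning networks shrinks.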
Advisors
Yoo, Chang Dong (유창동)
Description
Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2020
Identifier
325007
Language
eng
Description

Thesis (Master's) - KAIST: School of Electrical Engineering, 2020.2, [v, 28 p.]

Keywords

multi modality; question answering; video moment proposal; deep learning; computer vision

URI
http://hdl.handle.net/10203/284717
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=911326&flag=dissertation
Appears in Collection
EE-Theses_Master(석사논문)
Files in This Item
There are no files associated with this item.
