DSpace at KOASAS: Deep learning based approaches for multimodal video question answering

Deep learning based approaches for multimodal video question answering

DC Field	Value	Language
dc.contributor.advisor	Yoo, Changdong	-
dc.contributor.advisor	유창동	-
dc.contributor.author	Kim, Junyeong	-
dc.date.accessioned	2022-04-21T19:34:04Z	-
dc.date.available	2022-04-21T19:34:04Z	-
dc.date.issued	2021	-
dc.identifier.uri	http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=956669&flag=dissertation	en_US
dc.identifier.uri	http://hdl.handle.net/10203/295671	-
dc.description	Thesis (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering, 2021.2, [v, 65 p.]	-
dc.description.abstract	This dissertation considers the problem of Multimodal Video Question Answering (MVQA), which aims at a joint understanding of a video and its accompanying subtitles in order to answer a given question. Compared to visual question answering (VQA), which is question answering on a single image, MVQA is challenging in two respects: (1) because the input is a long, untrimmed video, it requires pinpointing the temporal parts relevant to answering the question, and (2) it involves reasoning over heterogeneous modalities, where different questions require different modalities to answer. We propose two MVQA networks to address these challenges: (1) the Progressive Attention Memory Network (PAMN) and (2) the Modality Shifting Attention Network (MSAN). Experimental results on MovieQA and TVQA show that the proposed PAMN and MSAN achieve significant performance improvements over previous state-of-the-art methods. Furthermore, we propose Structured Co-reference Graph Attention for the Video-grounded Dialog (VideoDial) task and show a performance boost on the AVSD benchmark.	-
dc.language	eng	-
dc.publisher	Korea Advanced Institute of Science and Technology (KAIST)	-
dc.subject	Multimodal Video Question Answering; Memory Network; Attention Mechanism; Multimodal Video Dialog; Graph Neural Network	-
dc.subject	multimodal video question answering; memory network; attention mechanism; multimodal video dialog; graph neural network	-
dc.title	Deep learning based approaches for multimodal video question answering	-
dc.title.alternative	딥러닝을 활용한 멀티모달 비디오 질의응답 기법 (Multimodal video question answering techniques using deep learning)	-
dc.type	Thesis(Ph.D)	-
dc.identifier.CNRN	325007	-
dc.description.department	Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering	-
dc.contributor.alternativeauthor	김준영	-

Appears in Collection: EE-Theses_Ph.D.(박사논문)

Files in This Item: There are no files associated with this item.

KOASAS

Knowledge Service Development Team, KAIST 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea. T. 82-42-350-4493 Email. koasas@kaist.ac.kr
Copyright © 2016. Korea Advanced Institute of Science and Technology. All Rights Reserved.
