DSpace at KOASAS: Multi-channel neural network structure including two-stream spatiotemporal feature extractor and attention mechanism for solving video QA task

Multi-channel neural network structure including two-stream spatiotemporal feature extractor and attention mechanism for solving video QA task

DC Field	Value	Language
dc.contributor.advisor	Yoon, Sung-eui	-
dc.contributor.advisor	윤성의	-
dc.contributor.author	Song, Chiwan	-
dc.date.accessioned	2021-05-11T19:34:11Z	-
dc.date.available	2021-05-11T19:34:11Z	-
dc.date.issued	2019	-
dc.identifier.uri	http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=875465&flag=dissertation	en_US
dc.identifier.uri	http://hdl.handle.net/10203/283089	-
dc.description	Thesis (Master's) - Korea Advanced Institute of Science and Technology (KAIST): School of Computing, 2019.8, [iii, 17 p.]	-
dc.description.abstract	Understanding the content of videos is a core technique for real-world applications such as recognizing human actions in surveillance systems or analyzing customer behavior in an autonomous shop. However, understanding the content or story of a video remains challenging because of the sheer amount of data and its temporal structure. In this paper, we propose a multi-channel neural network for the video question answering task. It adopts a two-stream network, which has shown high performance in human action recognition, as a spatiotemporal video feature extractor. We also add a squeeze-and-excitation structure to the two-stream network to obtain channel-wise attended spatiotemporal features. To jointly model the spatiotemporal features from the video and the textual features from the question, we design a context matching module with a level adjusting layer, which applies an attention mechanism during joint modeling to close the information gap between visual and textual features. Finally, we adopt a scoring mechanism and a smoothed ranking loss objective to select the correct answer from the answer candidates. We evaluate our model on the TVQA dataset. Our approach improves results in the text-only setting, while the results with visual features show both the limitations and the potential of our approach.	-
dc.language	eng	-
dc.publisher	Korea Advanced Institute of Science and Technology (KAIST)	-
dc.subject	Two-stream ConvNet; attention mechanism; video question and answering; computer vision; artificial intelligence	-
dc.subject	Two-stream neural network; attention mechanism; video question answering; computer vision; artificial intelligence	-
dc.title	Multi-channel neural network structure including two-stream spatiotemporal feature extractor and attention mechanism for solving video QA task	-
dc.title.alternative	Video question answering using a multi-channel neural network with a two-stream spatiotemporal feature extractor and attention mechanism	-
dc.type	Thesis(Master)	-
dc.identifier.CNRN	325007	-
dc.description.department	Korea Advanced Institute of Science and Technology (KAIST): School of Computing	-
dc.contributor.alternativeauthor	송치완	-

Appears in Collection: CS-Theses_Master (Master's theses)

Files in This Item: There are no files associated with this item.
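
Since no full text is attached to this record, the following is only an illustrative sketch of the squeeze-and-excitation idea named in the abstract (channel-wise attention over extracted features). It is not the thesis implementation. The function name `squeeze_excite`, the weight shapes, the reduction ratio, and the omission of bias terms are all assumptions made for illustration. It uses plain Python lists rather than a deep learning framework.

```python
import math
import random


def squeeze_excite(features, w1, w2):
    """Channel-wise attention in the squeeze-and-excitation style.

    features: list of C channels, each a flat list of activations.
    w1: (C // r) x C weight matrix (bottleneck layer, hypothetical shape).
    w2: C x (C // r) weight matrix (expansion layer, hypothetical shape).
    Bias terms are omitted for brevity (an assumption of this sketch).
    """
    # Squeeze: global average of each channel gives one descriptor per channel.
    z = [sum(ch) / len(ch) for ch in features]
    # Excitation, step 1: bottleneck fully connected layer followed by ReLU.
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    # Excitation, step 2: expansion layer followed by a sigmoid gate in (0, 1).
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every activation in a channel by that channel's gate.
    scaled = [[g * x for x in ch] for g, ch in zip(gates, features)]
    return scaled, gates


def random_matrix(rows, cols, rng):
    """Small random weights, standing in for learned parameters."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# Toy input: 4 channels with 6 activations each, reduction ratio r = 2.
rng = random.Random(0)
channels, reduction, length = 4, 2, 6
features = [[rng.uniform(0.0, 1.0) for _ in range(length)] for _ in range(channels)]
w1 = random_matrix(channels // reduction, channels, rng)
w2 = random_matrix(channels, channels // reduction, rng)

scaled, gates = squeeze_excite(features, w1, w2)
print("per-channel gates:", [round(g, 3) for g in gates])
```

In a trained network the two weight matrices are learned, so the gates emphasize informative channels and suppress the rest. Here they are random and only demonstrate the data flow.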

Knowledge Service Development Team, KAIST 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea. T. 82-42-350-4493 Email. koasas@kaist.ac.kr
Copyright © 2016. Korea Advanced Institute of Science and Technology. All Rights Reserved.
