DSpace at KOASAS: Audio-Visual Mismatch-Aware Video Retrieval via Association and Adjustment


Audio-Visual Mismatch-Aware Video Retrieval via Association and Adjustment


DC Field	Value	Language
dc.contributor.author	Lee, Sangmin	ko
dc.contributor.author	Park, Sungjune	ko
dc.contributor.author	Ro, Yong Man	ko
dc.date.accessioned	2022-11-15T06:00:38Z	-
dc.date.available	2022-11-15T06:00:38Z	-
dc.date.created	2022-07-09	-
dc.date.issued	2022-10-25	-
dc.identifier.citation	European Conference on Computer Vision, ECCV 2022, pp.497 - 514	-
dc.identifier.issn	0302-9743	-
dc.identifier.uri	http://hdl.handle.net/10203/299637	-
dc.description.abstract	Retrieving desired videos using natural language queries has attracted increasing attention in research and industry as a huge number of videos appear on the internet. Some existing methods have attempted to address this video retrieval problem by exploiting multi-modal information, especially the audio-visual data of videos. However, many videos have mismatched visual and audio cues for several reasons, including background music, noise, and even missing sound. Therefore, the naive fusion of such mismatched visual and audio cues can negatively affect the semantic embedding of video scenes. The mismatch condition can be categorized into two cases: (i) the audio itself does not exist, and (ii) the audio exists but does not match the visual content. To deal with (i), we introduce an audio-visual associative memory (AVA-Memory) to associate audio cues even from videos without audio data. The associated audio cues can guide the video embedding feature to be aware of audio information even in the missing-audio condition. To address (ii), we propose audio embedding adjustment, which considers the degree of matching between visual and audio data. In this procedure, the constructed AVA-Memory makes it possible to determine how well the visual and audio content of the video are matched and to adjust the weighting between the actual audio and the associated audio. Experimental results show that the proposed method outperforms other state-of-the-art video retrieval methods. Further, we validate the effectiveness of the proposed network designs with ablation studies and analyses.	-
dc.language	English	-
dc.publisher	European Computer Vision Association	-
dc.title	Audio-Visual Mismatch-Aware Video Retrieval via Association and Adjustment	-
dc.type	Conference	-
dc.identifier.wosid	000904096200029	-
dc.identifier.scopusid	2-s2.0-85142765848	-
dc.type.rims	CONF	-
dc.citation.beginningpage	497	-
dc.citation.endingpage	514	-
dc.citation.publicationname	European Conference on Computer Vision, ECCV 2022	-
dc.identifier.conferencecountry	IL	-
dc.identifier.conferencelocation	Tel Aviv	-
dc.identifier.doi	10.1007/978-3-031-19781-9_29	-
dc.contributor.localauthor	Ro, Yong Man	-
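
The association and adjustment steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the record gives no architectural details, so the key-value memory layout, the cosine-similarity addressing, the softmax `temperature`, the linear blending rule, and every function name below are assumptions made only for illustration.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    na = math.sqrt(dot(a, a))
    nb = math.sqrt(dot(b, b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot(a, b) / (na * nb)


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def associate_audio(visual, visual_keys, audio_values, temperature=0.1):
    """Hypothetical memory read: address slots with the visual feature and
    return the attention-weighted sum of the stored audio values."""
    weights = softmax([cosine(visual, k) / temperature for k in visual_keys])
    dim = len(audio_values[0])
    return [sum(w * v[i] for w, v in zip(weights, audio_values))
            for i in range(dim)]


def adjust_audio(actual_audio, associated_audio):
    """Hypothetical adjustment: blend actual and associated audio by a
    matching degree in [0, 1]; missing audio (None) gets degree 0."""
    if actual_audio is None:
        return list(associated_audio), 0.0
    # Assumed matching degree: cosine similarity rescaled from [-1, 1] to [0, 1].
    match = (cosine(actual_audio, associated_audio) + 1.0) / 2.0
    adjusted = [match * a + (1.0 - match) * b
                for a, b in zip(actual_audio, associated_audio)]
    return adjusted, match


# Toy memory with two slots (values chosen only for illustration).
visual_keys = [[1.0, 0.0], [0.0, 1.0]]
audio_values = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

associated = associate_audio([1.0, 0.0], visual_keys, audio_values)
missing_case, missing_degree = adjust_audio(None, associated)
matched_case, matched_degree = adjust_audio([1.0, 0.0, 0.0], associated)
mismatched_case, mismatched_degree = adjust_audio([-1.0, 0.0, 0.0], associated)
```

In this toy setup, a video with no audio falls back entirely on the associated audio, a well-matched audio track is kept almost unchanged, and a strongly mismatched track is largely replaced by the associated one. In the paper the memory and the matching measure are presumably learned; here they are fixed by hand.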

Appears in Collection: EE-Conference Papers(학술회의논문)

Files in This Item: There are no files associated with this item.


KOASAS

Knowledge Service Development Team, KAIST 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea. T. 82-42-350-4493 Email. koasas@kaist.ac.kr
Copyright © 2016. Korea Advanced Institute of Science and Technology. All Rights Reserved.
