Audio-Visual Mismatch-Aware Video Retrieval via Association and Adjustment

DC Field | Value | Language
dc.contributor.author | Lee, Sangmin | ko
dc.contributor.author | Park, Sungjune | ko
dc.contributor.author | Ro, Yong Man | ko
dc.date.accessioned | 2022-11-15T06:00:38Z | -
dc.date.available | 2022-11-15T06:00:38Z | -
dc.date.created | 2022-07-09 | -
dc.date.issued | 2022-10-25 | -
dc.identifier.citation | European Conference on Computer Vision, ECCV 2022, pp. 497-514 | -
dc.identifier.issn | 0302-9743 | -
dc.identifier.uri | http://hdl.handle.net/10203/299637 | -
dc.description.abstract | Retrieving desired videos using natural language queries has attracted increasing attention in research and industry as a huge number of videos appear on the internet. Some existing methods have attempted to address this video retrieval problem by exploiting multi-modal information, especially the audio-visual data of videos. However, many videos have mismatched visual and audio cues for several reasons, including background music, noise, and even missing sound. Therefore, the naive fusion of such mismatched visual and audio cues can negatively affect the semantic embedding of video scenes. The mismatch condition can be categorized into two cases: (i) the audio itself does not exist; (ii) the audio exists but does not match the visual content. To deal with (i), we introduce audio-visual associative memory (AVA-Memory) to associate audio cues even from videos without audio data. The associated audio cues can guide the video embedding feature to be aware of audio information even in the missing-audio condition. To address (ii), we propose audio embedding adjustment that considers the degree of matching between visual and audio data. In this procedure, the constructed AVA-Memory makes it possible to determine how well the visual and audio content of the video are matched and to adjust the weighting between the actual audio and the associated audio. Experimental results show that the proposed method outperforms other state-of-the-art video retrieval methods. Further, we validate the effectiveness of the proposed network designs with ablation studies and analyses. | -
dc.language | English | -
dc.publisher | European Computer Vision Association | -
dc.title | Audio-Visual Mismatch-Aware Video Retrieval via Association and Adjustment | -
dc.type | Conference | -
dc.identifier.wosid | 000904096200029 | -
dc.identifier.scopusid | 2-s2.0-85142765848 | -
dc.type.rims | CONF | -
dc.citation.beginningpage | 497 | -
dc.citation.endingpage | 514 | -
dc.citation.publicationname | European Conference on Computer Vision, ECCV 2022 | -
dc.identifier.conferencecountry | IS | -
dc.identifier.conferencelocation | Tel Aviv | -
dc.identifier.doi | 10.1007/978-3-031-19781-9_29 | -
dc.contributor.localauthor | Ro, Yong Man | -
Appears in Collection
EE-Conference Papers (Conference Papers)
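
The abstract describes two steps: recalling an associated audio cue from a memory using only visual information, and reweighting actual versus associated audio by how well they match. The following is a minimal sketch of that idea only, not the authors' implementation. The key/value slot layout, the softmax addressing, the `sharpness` parameter, the use of cosine similarity as the matching degree, and all function names are assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def associate_audio(visual, keys, values, sharpness=10.0):
    """Recall an audio-like vector by addressing memory slots with a visual feature.

    Assumption: `keys` live in the visual space, `values` in the audio space.
    """
    weights = softmax([sharpness * cosine(visual, k) for k in keys])
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def adjust_audio(visual, audio, keys, values):
    """Blend actual and associated audio according to their matching degree.

    Returns (adjusted_audio, match). With no audio (None), the associated
    audio is used alone and the match is reported as 0.0.
    """
    associated = associate_audio(visual, keys, values)
    if audio is None:
        return associated, 0.0
    # Assumption: matching degree is the cosine similarity clipped to [0, 1].
    match = max(0.0, cosine(audio, associated))
    adjusted = [match * a + (1.0 - match) * s for a, s in zip(audio, associated)]
    return adjusted, match
```

In this sketch, a video whose audio disagrees with what the memory recalls from its frames (for example, unrelated background music) gets a low match value, so the recalled audio dominates the blend; a video with well-matched audio keeps mostly its own audio feature.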