Event-Specific Audio-Visual Fusion Layers: A Simple and New Perspective on Video Understanding

To understand the world around us, our brains are continuously inundated with multisensory information and its complex interactions at any given moment. While processing this information may seem effortless for humans, building a machine that performs similar tasks is challenging, because complex audio-visual interactions cannot be handled by a single type of integration and instead require more sophisticated approaches. In this paper, we propose a simple new method for multisensory integration in video understanding. Unlike previous works that use a single fusion type, we design a multi-head model with individual event-specific layers that handle different audio-visual relationships, enabling different ways of audio-visual fusion. Experimental results show that our event-specific layers can discover unique properties of the audio-visual relationships in videos, e.g., semantically matched moments and rhythmic events. Moreover, although our network is trained with single labels, our multi-head design inherently outputs additional, semantically meaningful multi-labels for a video. As an application, we demonstrate that our proposed method can expose the extent of event characteristics in popular benchmark datasets.
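The multi-head design described above lends itself to a compact illustration. The following is a minimal, hypothetical PyTorch sketch, not the authors' implementation: each assumed event-specific head applies its own audio-visual fusion (here a simple gated element-wise product) and produces its own score, so stacking the head outputs yields per-event predictions that can also be read as multi-labels. All module names, the particular fusion operation, and the feature dimensions are assumptions made for illustration only.

import torch
import torch.nn as nn

class EventSpecificFusionHead(nn.Module):
    """One fusion head; each head may model a different audio-visual relationship."""
    def __init__(self, dim):
        super().__init__()
        self.audio_proj = nn.Linear(dim, dim)
        self.visual_proj = nn.Linear(dim, dim)
        self.classifier = nn.Linear(dim, 1)

    def forward(self, audio, visual):
        # Fuse projected audio and visual features (here: gated element-wise product),
        # then score the fused representation for this head's event type.
        fused = torch.tanh(self.audio_proj(audio)) * torch.tanh(self.visual_proj(visual))
        return self.classifier(fused).squeeze(-1)

class MultiHeadAVFusion(nn.Module):
    """Multi-head model: one event-specific fusion layer per assumed event type."""
    def __init__(self, dim=512, num_event_types=4):
        super().__init__()
        self.heads = nn.ModuleList(
            EventSpecificFusionHead(dim) for _ in range(num_event_types)
        )

    def forward(self, audio, visual):
        # Each head produces its own score; stacking them yields per-event-type
        # predictions that can be interpreted as multi-label outputs.
        return torch.stack([head(audio, visual) for head in self.heads], dim=-1)

if __name__ == "__main__":
    model = MultiHeadAVFusion(dim=512, num_event_types=4)
    audio = torch.randn(2, 512)   # batch of pooled audio features (assumed shape)
    visual = torch.randn(2, 512)  # batch of pooled visual features (assumed shape)
    scores = model(audio, visual)
    print(scores.shape)  # torch.Size([2, 4]): one score per event-specific head

In practice, a single ground-truth event label can supervise the corresponding head during training, while at inference the full vector of head scores provides the additional multi-label view mentioned in the abstract.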
Publisher
Institute of Electrical and Electronics Engineers Inc.
Issue Date
2023-01
Language
English
Citation

23rd IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2023), pp. 2236-2246

DOI
10.1109/WACV56688.2023.00227
URI
http://hdl.handle.net/10203/305987
Appears in Collection
EE-Conference Papers (Conference Papers)
