Recent advances in artificial intelligence for audio-to-video generation have shown the ability to generate high-quality videos from audio, particularly by focusing on temporal semantics and magnitude. However, existing works struggle to capture all semantics in audio, as real-world audio often consists of mixed sources, making it challenging to generate semantically aligned videos. To address this problem, we present a novel multi-source audio-to-video generation framework that incorporates decomposed audio sources into video generative models. Specifically, our proposed Attention Mosaic directly maps each decomposed audio feature to its corresponding spatial attention feature. In addition, our condition injection module helps produce more natural contexts containing non-audible objects by leveraging the knowledge of existing generative models. Our experiments show that the proposed framework achieves state-of-the-art performance compared with both multi- and single-source audio-to-video generation methods.