Sound-Guided Semantic Video Generation

Cited 2 times in Web of Science · Cited 0 times in Scopus
  • Hits: 110
  • Downloads: 0
DC Field                          Value                                         Language
dc.contributor.author             Lee, Seung Hyun                               ko
dc.contributor.author             Yoon, Sang Ho                                 ko
dc.contributor.author             Kim, Sangpil                                  ko
dc.contributor.author             Kim, Jinkyu                                   ko
dc.contributor.author             Oh, Gyeongrok                                 ko
dc.contributor.author             Byeon, Wonmin                                 ko
dc.contributor.author             Kim, Chanyoung                                ko
dc.contributor.author             Ryoo, Won Jeong                               ko
dc.contributor.author             Bae, Jihyun                                   ko
dc.contributor.author             Cho, Hyunjun                                  ko
dc.date.accessioned               2022-11-17T02:01:00Z                          -
dc.date.available                 2022-11-17T02:01:00Z                          -
dc.date.created                   2022-11-17                                    -
dc.date.issued                    2022-10-23                                    -
dc.identifier.citation            2022 European Conference on Computer Vision, pp. 34-50   -
dc.identifier.issn                978-3-031                                     -
dc.identifier.uri                 http://hdl.handle.net/10203/299776            -
dc.description.abstract           The recent success of StyleGAN demonstrates that the pre-trained StyleGAN latent space is useful for realistic video generation. However, the motion in the generated video is usually not semantically meaningful, because it is difficult to determine the direction and magnitude of movement in the StyleGAN latent space. In this paper, we propose a framework that generates realistic videos by leveraging a multimodal (sound-image-text) embedding space. Since sound provides the temporal context of a scene, our framework learns to generate a video that is semantically consistent with the sound. First, our sound inversion module maps the audio directly into the StyleGAN latent space. We then incorporate the CLIP-based multimodal embedding space to further capture audio-visual relationships. Finally, the proposed frame generator learns to find a trajectory in the latent space that is coherent with the corresponding sound, and generates a video in a hierarchical manner. We provide a new high-resolution landscape video dataset (audio-visual pairs) for the sound-guided video generation task. Experiments show that our model outperforms state-of-the-art methods in terms of video quality. We further demonstrate several applications, including image and video editing, to verify the effectiveness of our method. (See the illustrative sketch below this record.)   -
dc.language                       English                                       -
dc.publisher                      Springer                                      -
dc.title                          Sound-Guided Semantic Video Generation        -
dc.type                           Conference                                    -
dc.identifier.wosid               000904106100003                               -
dc.identifier.scopusid            2-s2.0-85142678353                            -
dc.type.rims                      CONF                                          -
dc.citation.beginningpage         34                                            -
dc.citation.endingpage            50                                            -
dc.citation.publicationname       2022 European Conference on Computer Vision   -
dc.identifier.conferencecountry   IL                                            -
dc.identifier.conferencelocation  Tel Aviv                                      -
dc.identifier.doi                 10.1007/978-3-031-19790-1_3                   -
dc.contributor.localauthor        Yoon, Sang Ho                                 -
dc.contributor.nonIdAuthor        Lee, Seung Hyun                               -
dc.contributor.nonIdAuthor        Kim, Sangpil                                  -
dc.contributor.nonIdAuthor        Kim, Jinkyu                                   -
dc.contributor.nonIdAuthor        Oh, Gyeongrok                                 -
dc.contributor.nonIdAuthor        Byeon, Wonmin                                 -
dc.contributor.nonIdAuthor        Kim, Chanyoung                                -
dc.contributor.nonIdAuthor        Ryoo, Won Jeong                               -
dc.contributor.nonIdAuthor        Bae, Jihyun                                   -
dc.contributor.nonIdAuthor        Cho, Hyunjun                                  -
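
The abstract describes a three-stage pipeline: invert the audio into the StyleGAN latent space, align it with a CLIP-based sound-image-text embedding, and walk a latent trajectory to synthesize frames. Below is a minimal PyTorch sketch of the first and last stages, assuming a 512-dimensional W space and toy stand-ins for the pre-trained generator and the sound inversion module; none of the module names, dimensions, or architectures come from the paper.

# Illustrative sketch only: toy stand-ins for the paper's sound inversion
# module and StyleGAN generator. All names and dimensions are assumptions.
import torch
import torch.nn as nn

LATENT_DIM = 512      # StyleGAN W-space dimensionality (assumed)
N_MELS = 128          # mel-spectrogram bins (assumed)
N_FRAMES = 16         # number of video frames to generate

class SoundInverter(nn.Module):
    """Maps an audio spectrogram to a per-frame trajectory of latent offsets."""
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(N_MELS, 256, batch_first=True)   # temporal context
        self.to_latent = nn.Linear(256, LATENT_DIM)

    def forward(self, mel):                   # mel: (B, T, N_MELS)
        h, _ = self.rnn(mel)                  # (B, T, 256)
        return self.to_latent(h)              # (B, T, LATENT_DIM) offsets

class ToyGenerator(nn.Module):
    """Stand-in for a pre-trained StyleGAN synthesis network."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(LATENT_DIM, 3 * 64 * 64), nn.Tanh())

    def forward(self, w):                     # w: (B, LATENT_DIM)
        return self.net(w).view(-1, 3, 64, 64)   # one low-res frame

inverter, generator = SoundInverter(), ToyGenerator()
w_anchor = torch.randn(1, LATENT_DIM)         # latent of the source image
mel = torch.randn(1, N_FRAMES, N_MELS)        # placeholder audio features

offsets = inverter(mel)                       # sound-guided latent trajectory
frames = [generator(w_anchor + offsets[:, t]) for t in range(N_FRAMES)]
video = torch.stack(frames, dim=1)            # (1, T, 3, 64, 64)
print(video.shape)

In the actual method, w_anchor would be obtained by inverting a source image and ToyGenerator would be a pre-trained StyleGAN guided by the CLIP-based multimodal embedding; the random tensors here only keep the sketch self-contained and runnable.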
Appears in Collection
GCT - Conference Papers
Files in This Item
There are no files associated with this item.