DSpace at KOASAS: NeRFFaceSpeech: one-shot audio-driven 3D talking head synthesis via generative prior


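NeRFFaceSpeech: one-shot audio-driven 3D talking head synthesis via generative prior
Korean title: 생성적 사전 지식을 이용한 단일 이미지로부터 음성 입력 기반 말하는 3D 얼굴 생성 (Audio-driven talking 3D face generation from a single image using generative prior knowledge)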


Kim, Gihoon / 김기훈

Audio-driven talking head generation is advancing from 2D to 3D content. Recent methods based on Neural Radiance Fields (NeRF) can synthesize 3D output, but they require extensive paired audio-visual data for each identity, which limits their scalability. Other studies have shown that convincing audio-driven talking heads can be generated from even a single image. Despite their promise, these techniques struggle to produce accurate 3D-aware results because a single image carries insufficient information about obscured regions. In this paper, we propose a novel pipeline, NeRFFaceSpeech, which balances the trade-off between the number of input images and the fidelity of 3D information. By combining the prior knowledge of generative models with NeRF, our method constructs a 3D-consistent facial feature space corresponding to a single image. Our approach then employs ray deformation to map the audio-correlated vertex dynamics of a parametric face model onto the facial feature space, ensuring realistic 3D facial motion. Moreover, to fill in the information missing from the inner-mouth area, which cannot be obtained from a single image, we introduce LipaintNet, a novel network trained in a self-supervised manner. Finally, comprehensive experiments demonstrate that our pipeline achieves better 3D consistency than previous approaches when generating audio-driven talking heads from a single image.
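The ray deformation step mentioned in the abstract can be illustrated with a toy sketch. The abstract does not specify the thesis's actual formulation, so everything below is an assumption made for illustration: the function name `deform_ray_samples`, the nearest-vertex lookup, the linear falloff, and the `radius` parameter are all hypothetical. The idea shown is only the general one of warping ray sample points by the displacement of nearby face-model vertices, so that a static feature space can be queried as if the face had moved.

```python
import math

def deform_ray_samples(samples, neutral_vertices, deformed_vertices, radius=0.1):
    """Map ray sample points back to the neutral (canonical) space.

    Hypothetical helper, not the thesis's method: each sample is shifted by
    the negated displacement of its nearest deformed vertex, scaled by a
    linear falloff in distance.
    """
    warped = []
    for p in samples:
        # Find the deformed-mesh vertex closest to this sample point.
        best_i, best_d = 0, float("inf")
        for i, v in enumerate(deformed_vertices):
            d = math.dist(p, v)
            if d < best_d:
                best_i, best_d = i, d
        # Linear falloff: full offset at the vertex, none at `radius` or beyond.
        w = max(0.0, 1.0 - best_d / radius)
        offset = [dv - nv for dv, nv in
                  zip(deformed_vertices[best_i], neutral_vertices[best_i])]
        warped.append(tuple(c - w * o for c, o in zip(p, offset)))
    return warped

# Toy usage: one lip vertex moves down by 0.05 units.
neutral = [(0.0, 0.0, 0.0)]
deformed = [(0.0, -0.05, 0.0)]
samples = [(0.0, -0.05, 0.0), (1.0, 1.0, 1.0)]
print(deform_ray_samples(samples, neutral, deformed))
```

In this toy setup, a sample sitting on the moved vertex is pulled back to the vertex's neutral position, while a sample far from the mesh is left unchanged. A real pipeline would operate on batched tensors and a full parametric face mesh.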

Advisors: Noh, Junyong (노준용)

Description: Korea Advanced Institute of Science and Technology (KAIST): Graduate School of Culture Technology

Publisher: Korea Advanced Institute of Science and Technology (KAIST)

Issue Date: 2024

Identifier: 325007

Language: eng

Description: Thesis (Master's) - Korea Advanced Institute of Science and Technology (KAIST): Graduate School of Culture Technology, 2024.2, [iv, 31 p.]

Keywords: Audio-driven talking head generation; Neural radiance field (NeRF); 3D-aware imaging; 3D animation; Self-supervised learning; Generative prior

URI: http://hdl.handle.net/10203/321390

Link: http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1096175&flag=dissertation

Appears in Collection: GCT-Theses_Master (Master's theses)

Files in This Item: There are no files associated with this item.
