DSpace at KOASAS: Talking face generation by disentangled audio-visual representation and implicit natural code

College of Engineering > School of Computing > CS-Theses_Master (Master's Theses)

Talking face generation by disentangled audio-visual representation and implicit natural code

Cited 0 times in Web of Science

Cited 0 times in Scopus

Hits: 3
Downloads: 0

DC Field	Value	Language
dc.contributor.advisor	최성희 (Choi, Sunghee)	-
dc.contributor.author	Kwak, Sangwon	-
dc.contributor.author	곽상원 (Kwak, Sangwon)	-
dc.date.accessioned	2024-07-30T19:31:45Z	-
dc.date.available	2024-07-30T19:31:45Z	-
dc.date.issued	2024	-
dc.identifier.uri	http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1097262&flag=dissertation	en_US
dc.identifier.uri	http://hdl.handle.net/10203/321682	-
dc.description	Thesis (Master's) - Korea Advanced Institute of Science and Technology (KAIST) : School of Computing, 2024.2, [iv, 19 p.]	-
dc.description.abstract	This thesis addresses generating a video in which the speaker is replaced with another identity while the result still looks natural. A related work, the Disentangled Audio-Visual System (DAVS), achieved good results by disentangling audio-visual representations from images. However, DAVS produces static results in which only the mouth region moves. Building on DAVS, I aim to generate dynamic results that include head pose and facial expression. I approach this problem by adding an implicit natural code. The natural code extends an idea from the Pose-Controllable Audio-Visual System (PCAVS) of Zhou et al., which controls the pose of a talking face with only a 12-dimensional vector instead of 2D or 3D landmarks. In addition, I add a discriminator for accurate lip sync. Using a large-scale audio-visual dataset, I compare the resulting model with existing talking face generation models. On videos with little movement, my model achieves visual quality equal to or better than that of the other models. Ablation studies on the natural code and the lip-sync discriminator show that each component contributes as intended.	-
dc.language	eng	-
dc.publisher	Korea Advanced Institute of Science and Technology (KAIST)	-
dc.subject	Talking face generation; Lip-sync; Disentangling audio-visual representation from images; Implicit natural code	-
dc.title	Talking face generation by disentangled audio-visual representation and implicit natural code	-
dc.title.alternative	Talking face generation by disentangled audio-visual representation and implicit natural code (Korean-language title, translated)	-
dc.type	Thesis(Master)	-
dc.identifier.CNRN	325007	-
dc.description.department	Korea Advanced Institute of Science and Technology (KAIST) : School of Computing	-
dc.contributor.alternativeauthor	Choi, Sunghee	-
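The abstract above does not say how the lip-sync discriminator is built. As one illustration only, the sketch below follows a SyncNet-style "expert" sync loss of the kind used in Wav2Lip (a swapped-in technique, not confirmed as the thesis's design): an audio embedding and a mouth-region embedding are compared by cosine similarity, the similarity is read as the probability that the pair is in sync, and the result is scored with binary cross-entropy. The function names, the assumption of non-negative (ReLU-activated) embeddings, and the clamping constant are hypothetical choices made for this example.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # degenerate embedding: treat as "no agreement"
    return dot / (norm_a * norm_b)


def sync_probability(audio_emb: Sequence[float],
                     video_emb: Sequence[float],
                     eps: float = 1e-7) -> float:
    # Assumption: embeddings are non-negative (e.g. after ReLU), so the cosine
    # lies in [0, 1] and can be read as P(audio and mouth are in sync).
    # Clamping keeps log() finite; any negative cosine is clamped to eps.
    p = cosine_similarity(audio_emb, video_emb)
    return min(max(p, eps), 1.0 - eps)


def sync_bce_loss(audio_emb: Sequence[float],
                  video_emb: Sequence[float],
                  in_sync: bool) -> float:
    """Binary cross-entropy between the sync probability and the true label."""
    p = sync_probability(audio_emb, video_emb)
    target = 1.0 if in_sync else 0.0
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


if __name__ == "__main__":
    # Identical embeddings labelled "in sync" give a loss close to zero;
    # orthogonal embeddings labelled "in sync" are penalized heavily.
    print(sync_bce_loss([1.0, 0.0, 1.0], [1.0, 0.0, 1.0], in_sync=True))
    print(sync_bce_loss([1.0, 0.0], [0.0, 1.0], in_sync=True))
```

In a real model the embeddings would come from trained audio and visual encoders, and the generator would be penalized with the "in sync" term on its own output frames; the plain-Python version here only shows the scoring rule.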

Appears in Collection: CS-Theses_Master (Master's Theses)

Files in This Item: There are no files associated with this item.

KOASAS

Knowledge Service Development Team, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea. Tel. +82-42-350-4493. Email: koasas@kaist.ac.kr
Copyright © 2016. Korea Advanced Institute of Science and Technology. All Rights Reserved.