DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Hwang, Sung Ju | - |
dc.contributor.advisor | 황성주 | - |
dc.contributor.author | Min, Dong Chan | - |
dc.date.accessioned | 2023-06-22T19:31:29Z | - |
dc.date.available | 2023-06-22T19:31:29Z | - |
dc.date.issued | 2022 | - |
dc.identifier.uri | http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=997690&flag=dissertation | en_US |
dc.identifier.uri | http://hdl.handle.net/10203/308232 | - |
dc.description | Thesis (Master's) - Korea Advanced Institute of Science and Technology (KAIST) : Kim Jaechul Graduate School of AI, 2022.2, [iii, 26 p.] | - |
dc.description.abstract | With rapid progress in neural text-to-speech (TTS) models, personalized speech generation is now in high demand for many applications. To be practical, a TTS model should generate high-quality speech from only a few audio samples of a given speaker, and those samples may also be short. However, existing methods either require fine-tuning the model or achieve low adaptation quality without it. In this work, we propose StyleSpeech, a new TTS model that not only synthesizes high-quality speech but also adapts effectively to new speakers. Specifically, we propose Style-Adaptive Layer Normalization (SALN), which adapts the gain and bias applied to the text input according to the style extracted from a reference speech audio. With SALN, our model synthesizes speech in the style of the target speaker even from a single speech sample. Furthermore, to improve StyleSpeech's adaptation to speech from new speakers, we extend it to Meta-StyleSpeech by introducing two discriminators trained with style prototypes and by performing episodic training. The experimental results show that our models generate high-quality speech that accurately follows the speaker's voice from a single short (1 to 3 s) speech sample, significantly outperforming the baselines. | - |
dc.language | eng | - |
dc.publisher | Korea Advanced Institute of Science and Technology (KAIST) | - |
dc.title | Meta-StyleSpeech | - |
dc.title.alternative | A study on synthesis methods for a multi-speaker adaptive speech synthesis system | - |
dc.type | Thesis(Master) | - |
dc.identifier.CNRN | 325007 | - |
dc.description.department | Korea Advanced Institute of Science and Technology (KAIST) : Kim Jaechul Graduate School of AI | - |
dc.contributor.alternativeauthor | 민동찬 | - |
dc.title.subtitle | multi-speaker adaptive text-to-speech generation | - |
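
The abstract describes SALN only at a high level: the gain and bias applied to the normalized text features are set according to a style vector extracted from reference speech. Below is a minimal pure-Python sketch under one reading of that description, assuming the gain and bias are each produced by a linear projection of the style vector. The names `SALN`, `layer_norm`, and `affine`, and the random initialization, are hypothetical illustrations, not the thesis's implementation.

```python
import math
import random
from typing import List

Vector = List[float]


def layer_norm(h: Vector, eps: float = 1e-5) -> Vector:
    """Normalize one feature vector to zero mean and unit variance (no learned affine)."""
    mean = sum(h) / len(h)
    var = sum((x - mean) ** 2 for x in h) / len(h)
    return [(x - mean) / math.sqrt(var + eps) for x in h]


def affine(w: Vector, weight: List[Vector], bias: Vector) -> Vector:
    """Hypothetical linear projection; weight has shape (out_dim x style_dim)."""
    return [sum(wij * wj for wij, wj in zip(row, w)) + b for row, b in zip(weight, bias)]


class SALN:
    """Sketch of style-adaptive layer norm: gain and bias are predicted from a style vector."""

    def __init__(self, hidden_dim: int, style_dim: int, seed: int = 0):
        rng = random.Random(seed)
        # Assumed parameters: randomly initialized here, learned in a trained model.
        self.gain_w = [[rng.gauss(0, 0.02) for _ in range(style_dim)] for _ in range(hidden_dim)]
        self.gain_b = [1.0] * hidden_dim  # gain starts at 1 (identity scaling)
        self.bias_w = [[rng.gauss(0, 0.02) for _ in range(style_dim)] for _ in range(hidden_dim)]
        self.bias_b = [0.0] * hidden_dim  # bias starts at 0 (no shift)

    def __call__(self, h: Vector, style: Vector) -> Vector:
        g = affine(style, self.gain_w, self.gain_b)  # style-dependent gain
        b = affine(style, self.bias_w, self.bias_b)  # style-dependent bias
        return [gi * xi + bi for gi, xi, bi in zip(g, layer_norm(h), b)]


if __name__ == "__main__":
    saln = SALN(hidden_dim=4, style_dim=3)
    frame = [1.0, 2.0, 3.0, 4.0]  # one hidden frame of the text encoder (illustrative)
    print(saln(frame, [0.5, -0.2, 0.1]))  # same frame, restyled by the style vector
```

In this sketch the same style vector would be applied to every frame of an utterance, so one reference clip conditions the whole sequence; swapping the style vector changes the output without touching the normalization itself.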
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.