Meta-StyleSpeech: multi-speaker adaptive text-to-speech generation (A study on synthesis methods for multi-speaker adaptive speech synthesis systems)

DC Field | Value | Language
dc.contributor.advisor | Hwang, Sung Ju | -
dc.contributor.advisor | 황성주 | -
dc.contributor.author | Min, Dong Chan | -
dc.date.accessioned | 2023-06-22T19:31:29Z | -
dc.date.available | 2023-06-22T19:31:29Z | -
dc.date.issued | 2022 | -
dc.identifier.uri | http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=997690&flag=dissertation | en_US
dc.identifier.uri | http://hdl.handle.net/10203/308232 | -
dc.description | Thesis (Master's), Korea Advanced Institute of Science and Technology (KAIST): Kim Jaechul Graduate School of AI, 2022.2, [iii, 26 p.] | -
dc.description.abstract | With rapid progress in neural text-to-speech (TTS) models, personalized speech generation is now in high demand for many applications. For practical applicability, a TTS model should generate high-quality speech from only a few short audio samples of the given speaker. However, existing methods either require fine-tuning the model or achieve low adaptation quality without fine-tuning. In this work, we propose StyleSpeech, a new TTS model that not only synthesizes high-quality speech but also effectively adapts to new speakers. Specifically, we propose Style-Adaptive Layer Normalization (SALN), which aligns the gain and bias of the text input according to the style extracted from a reference speech audio. With SALN, our model effectively synthesizes speech in the style of the target speaker even from a single speech audio sample. Furthermore, to enhance StyleSpeech's adaptation to speech from new speakers, we extend it to Meta-StyleSpeech by introducing two discriminators trained with style prototypes and by performing episodic training. The experimental results show that our models generate high-quality speech that accurately follows the speaker's voice given a single short (1 to 3 seconds) speech audio sample, significantly outperforming baselines. | -
dc.language | eng | -
dc.publisher | Korea Advanced Institute of Science and Technology (KAIST) | -
dc.title | Meta-StyleSpeech | -
dc.title.alternative | A study on synthesis methods for multi-speaker adaptive speech synthesis systems (다중 화자 적응형 음성 합성 시스템을 위한 합성 방식 연구) | -
dc.type | Thesis (Master's) | -
dc.identifier.CNRN | 325007 | -
dc.description.department | KAIST: Kim Jaechul Graduate School of AI | -
dc.contributor.alternativeauthor | 민동찬 | -
dc.title.subtitle | multi-speaker adaptive text-to-speech generation | -
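The abstract describes Style-Adaptive Layer Normalization (SALN) only at a high level: the gain and bias applied to the text-side features are predicted from a style vector extracted from reference speech. The following is a minimal NumPy sketch of that idea, not the thesis implementation. It assumes the formulation SALN(h, w) = g(w) * LayerNorm(h) + b(w), where a single affine layer maps the style vector w to the gain g and bias b. The class name `SALN`, the dimensions, and the initialization (gain 1, bias 0) are hypothetical choices for illustration, and the style encoder that produces w is omitted.

```python
import numpy as np


def layer_norm(h, eps=1e-5):
    # Normalize each time step's feature vector to zero mean and unit variance
    # (no learned affine parameters; SALN supplies gain and bias instead).
    mu = h.mean(axis=-1, keepdims=True)
    sigma = np.sqrt(h.var(axis=-1, keepdims=True) + eps)
    return (h - mu) / sigma


class SALN:
    """Hypothetical sketch of Style-Adaptive Layer Normalization.

    Assumption: one affine layer maps the style vector to a per-channel
    gain and bias of size hidden_dim each.
    """

    def __init__(self, hidden_dim, style_dim, rng=None):
        rng = np.random.default_rng(0) if rng is None else rng
        self.hidden_dim = hidden_dim
        # Weights of the affine map: style_dim -> 2 * hidden_dim (gain, bias).
        self.W = rng.normal(0.0, 0.02, size=(style_dim, 2 * hidden_dim))
        # Assumed init: gain offset 1, bias offset 0, so SALN starts near plain LayerNorm.
        self.b = np.concatenate([np.ones(hidden_dim), np.zeros(hidden_dim)])

    def __call__(self, h, w):
        # h: (T, hidden_dim) text-side hidden sequence; w: (style_dim,) style vector.
        params = w @ self.W + self.b
        gain, bias = params[: self.hidden_dim], params[self.hidden_dim :]
        # The same speaker-dependent gain and bias are broadcast over all T steps.
        return gain * layer_norm(h) + bias


# Usage: 10 phoneme positions with 256-dim features and an assumed 128-dim style vector.
rng = np.random.default_rng(42)
h = rng.normal(size=(10, 256))
w = rng.normal(size=128)
saln = SALN(hidden_dim=256, style_dim=128)
out = saln(h, w)
print(out.shape)  # (10, 256)
```

Because the gain and bias depend only on w, the same text-side features h can be rendered with a different speaker's statistics simply by swapping the style vector, which is the mechanism the abstract credits for adapting from a single reference utterance without fine-tuning.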
Appears in Collection
AI-Theses_Master (Master's theses)
Files in This Item
There are no files associated with this item.
