An end-to-end synthesis method for Korean text-to-speech systems (한국어 text-to-speech(TTS) 시스템을 위한 엔드투엔드 합성 방식 연구)

A typical statistical parametric speech synthesis (text-to-speech, TTS) system consists of separate modules, such as a text analysis module, an acoustic modeling module, and a speech synthesis module. This causes two problems: 1) expert knowledge of each module is required, and 2) errors generated in each module accumulate as they propagate through the subsequent modules. An end-to-end TTS system can avoid these problems by synthesizing voice signals directly from an input string. In this study, we implemented an end-to-end Korean TTS system using Google's Tacotron, an end-to-end TTS system based on a sequence-to-sequence model with an attention mechanism. We used 4392 utterances spoken by a Korean female speaker, an amount that corresponds to 37% of the dataset Google used for training Tacotron. Our system obtained a mean opinion score (MOS) of 2.98 and a degradation mean opinion score (DMOS) of 3.25. We discuss the factors that affected training of the system. Experiments demonstrate that the post-processing network needs to be designed with the output language and the input characters in mind, and that the maximum value of n for the n-grams modeled by the encoder should be kept small enough for the amount of training data available.
Publisher
The Korean Society of Speech Sciences (한국음성학회)
Issue Date
2018-03
Language
Korean
Keywords

attention mechanism; end-to-end; Korean text-to-speech system; sequence-to-sequence; Tacotron

Citation

Phonetics and Speech Sciences (말소리와 음성과학), v.10, no.1, pp.39-48

ISSN
2005-8063
DOI
10.13064/KSSS.2018.10.1.039
URI
http://hdl.handle.net/10203/243809
Appears in Collection
EE-Journal Papers(저널논문)
