PHONEMIC-LEVEL DURATION CONTROL USING ATTENTION ALIGNMENT FOR NATURAL SPEECH SYNTHESIS

Cited 12 time in webofscience Cited 11 time in scopus
  • Hit : 618
  • Download : 0
Recent attention-based end-to-end speech synthesis from text systems have achieved human-level performance. However, many approaches cause a sequence-to-sequence model to generate only averaged results of the input text, making it difficult to control the duration of utterance. In this study, we present a novel mechanism for phonemic-level duration control ( PDC) in a nearly end-to-end manner in order to solve this problem. We used a teacher attention alignment generated by an annotation speech analyzer program. Our method is inspired by the idea that the duration of a phoneme is highly related to its phonemic features. These phonemic features are saved on the attention alignment by adding duration embedding to it. This enables the model to learn and control the phonemic and rhythmic features of speech. We also show that providing alignment information as a teacher loss term improves training speed and notably, makes the model better at controlling the speed of dramatic change in phonemic-level duration with subjective demonstration. As a result, we show that our PDC speech synthesis with alignment loss outperforms other baseline methods without losing the ability to control the duration of phonemes in extremely adjusted environments with faster convergence.
Publisher
IEEE
Issue Date
2019-05
Language
English
Citation

44th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.5896 - 5900

ISSN
1520-6149
DOI
10.1109/ICASSP.2019.8683827
URI
http://hdl.handle.net/10203/274903
Appears in Collection
BiS-Conference Papers(학술회의논문)
Files in This Item
There are no files associated with this item.
This item is cited by other documents in WoS
⊙ Detail Information in WoSⓡ Click to see webofscience_button
⊙ Cited 12 items in WoS Click to see citing articles in records_button

qr_code

  • mendeley

    citeulike


rss_1.0 rss_2.0 atom_1.0