This thesis describes a methodology for predicting prosodic phrase boundaries in Korean text-to-speech (TTS) conversion systems. The proposed method models both the temporal constraints of the human articulatory system and the syntactic influence of dependency relations, which are effective for describing languages with relatively free word order.
TTS conversion, a well-known technique for communication between humans and computers, is the process of generating speech from text and the ultimate goal of speech synthesis: for any string of words, a TTS system can approximate the way a human would read those same words. Although the need for communication between humans and computers is increasing as computers become more prevalent, TTS systems are currently used only in a few restricted applications because of the poor quality of their synthetic speech.
Prosody plays an important role in speech production as well as in speech understanding. In continuous speech, speakers tend to group words into phrases whose boundaries are marked by durational and intonational cues, and many phonological rules are constrained to operate only within such phrases, usually termed prosodic phrases. A computational model of prosodic structure is therefore necessary for high-quality TTS conversion, since the correct assignment of phrase breaks can increase the intelligibility of a sentence as well as improve its naturalness.
In this work, several statistical models for predicting the prosodic phrase boundaries of speech are proposed. The computational prosody model can be trained automatically using only syntactic information and can be incorporated into existing TTS conversion systems. This work makes use of dependency grammar, which is known to be more effective for parsing free word-order languages such as Korean. For prosodic boundary prediction, various relevant features extracted from text analysis are used instead of the input word sequence itself, whose motivation and ...