For synthetic speech to be used more widely in daily life, it is very important to improve the naturalness of currently available speech synthesizers. It cannot be denied, however, that the naturalness of present synthetic speech is still far from satisfactory. This study aims to provide useful information on the prosodic and phonetic characteristics of Korean read- and dialogue-style speech, in the hope that it can be used to improve the naturalness of synthetic Korean speech. To examine these characteristics, four features are mainly used: the spectrogram, the short-time energy, the pitch frequency, and the duration. First, spectrogram analysis shows that dialogue-style speech generally exhibits stronger coarticulation effects than dictation-style speech; as a result, its allophones differ much more from the corresponding original phonemes in their frequency characteristics. Both the short-time energy and the pitch frequency tend to be higher in dialogue-style speech, and their variances are also larger.
However, no clear relationship between the two is observed. Finally, the duration of dialogue-style speech shows a much larger variance, as expected. This may be caused by several factors, such as the speaker’s emotion, speaking style, and level of understanding. In addition, it is clearly observed that accented speech, which occurs more frequently in dialogue-style speech, is shortened. Our results confirm that dialogue-style and dictation-style speech differ significantly in both their phonetic and prosodic characteristics. This suggests that current Korean synthesizers built on a dictation-style speech database may fail to achieve acceptable naturalness when dialogue-style synthesis is attempted using only the prosodic information of dialogue-style speech.
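
As an illustration of two of the features discussed above, the following is a minimal sketch of how short-time energy and pitch frequency might be computed frame by frame, and how their means and variances could then be compared between speaking styles. The abstract does not say how these features were extracted, so everything here is an assumption: the function names, the frame length and hop, the use of a simple autocorrelation peak for pitch, and the 60 to 400 Hz search range are all hypothetical choices made for illustration.

```python
from statistics import mean, pvariance


def frames(signal, frame_len, hop):
    """Split a sequence of samples into overlapping fixed-length frames."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def short_time_energy(signal, frame_len, hop):
    """Mean squared amplitude of each frame (one common definition)."""
    return [sum(x * x for x in f) / frame_len
            for f in frames(signal, frame_len, hop)]


def pitch_autocorr(frame, sample_rate, f_min=60.0, f_max=400.0):
    """Estimate the pitch (Hz) of one frame from its autocorrelation peak.

    Assumed method: pick the lag with the largest autocorrelation value
    inside the lag range implied by f_min and f_max. Returns 0.0 when no
    positive peak is found (e.g. silence).
    """
    lag_min = int(sample_rate / f_max)
    lag_max = min(int(sample_rate / f_min), len(frame) - 1)
    best_lag, best_val = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        val = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return sample_rate / best_lag if best_lag else 0.0


def summarize(values):
    """Return (mean, variance) of a per-frame feature track."""
    return mean(values), pvariance(values)
```

Applying `summarize` to the energy and pitch tracks of a dialogue-style recording and of a dictation-style recording of the same sentence would give the kind of mean and variance comparison described above. A real analysis would also need voiced/unvoiced detection and a more robust pitch tracker, which this sketch omits.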