Prosody control in a large corpus-based TTS system (Korean title: 대규모 코퍼스 기반 TTS 시스템에서의 운율 제어)

A text-to-speech (TTS) system converts arbitrary text into synthetic speech. As TTS systems are incorporated into an increasingly wide range of applications, such as e-mail readers and language education systems, users' demand for higher-quality systems is increasing. Recently, large corpus-based concatenative speech synthesis has become the most popular approach for constructing TTS systems. With this method, it should be possible to synthesize more natural-sounding speech than can be produced with a small set of controlled units. Although the intelligibility of TTS systems built with this method is extremely good, and certainly good enough for many real applications, the lack of natural prosody is the major barrier to meeting users' expectations. Prosody, therefore, is the feature of TTS systems most in need of improvement. In this thesis, we develop a large corpus-based Korean TTS system and propose prosody control methods for the system to improve the naturalness of synthetic speech. The implemented TTS system uses the triphone as the basic unit for concatenation, and its speech corpus contains 400,042 triphone instances covering 16,072 unique triphone types. Since a triphone includes context information, it can represent all possible allophones. However, using the triphone as the basic synthesis unit raises two problems. One is the absence or sparsity of some triphone types, and the other is the size of the search space caused by triphone types that have too many instances. To avoid these problems, in the text selection process, where a set of sentences for recording is prepared, we use a greedy algorithm with a score table designed to account for triphone coverage and the balance of instances. After recording the speech corpus, we use bottom-up clustering and three backing-off trees to solve the sparsity problem. To reduce the search space for real-time processing, we use pre-selected candidate unit lists, and the performance te...
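The greedy text selection step mentioned in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the actual score table, so everything here is an assumption made for illustration: the `unit_score` function (unseen triphone types score highest, types that already have `cap` instances score zero), the `sil` padding at sentence boundaries, and the names `triphones` and `greedy_select`. It shows only the general idea of rewarding coverage while discouraging over-represented types, not the thesis's method.

```python
from collections import Counter


def triphones(phones):
    """Return the (left, center, right) triphones of a phone sequence.

    Sentence boundaries are padded with 'sil' (an assumption of this sketch).
    """
    padded = ["sil"] + list(phones) + ["sil"]
    return [tuple(padded[i:i + 3]) for i in range(len(padded) - 2)]


def unit_score(count, cap):
    """Hypothetical score table entry for one triphone type.

    Unseen types score 1.0, the score decays as instances accumulate
    (balance), and types that already have `cap` instances score 0.
    """
    return 0.0 if count >= cap else 1.0 / (1 + count)


def greedy_select(sentences, n_select, cap=3):
    """Greedily pick up to n_select sentence ids.

    `sentences` maps a sentence id to its phone list. Each round picks the
    sentence whose distinct triphone types have the highest total score.
    """
    counts = Counter()  # instances collected so far, per triphone type
    remaining = {sid: triphones(p) for sid, p in sentences.items()}
    chosen = []
    while remaining and len(chosen) < n_select:
        def gain(sid):
            return sum(unit_score(counts[t], cap) for t in set(remaining[sid]))

        # Iterate in sorted order so ties are broken deterministically.
        best = max(sorted(remaining), key=gain)
        if gain(best) == 0.0:
            break  # no remaining sentence adds a useful triphone
        chosen.append(best)
        counts.update(remaining.pop(best))
    return chosen


if __name__ == "__main__":
    toy = {"a": ["k", "a", "m"], "b": ["k", "a", "m"], "c": ["p", "o"]}
    print(greedy_select(toy, 2))
```

In this toy input, sentences `a` and `b` have identical triphones, so once `a` is chosen the score of `b` drops and `c` is preferred in the next round. A real score table would likely also weight triphone types by frequency or linguistic importance, which this sketch does not attempt.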
Advisors
Oh, Yung-Hwan (오영환)
Description
Korea Advanced Institute of Science and Technology (KAIST) : Computer Science major
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2004
Identifier
237664/325007  / 000965814
Language
eng
Description

Doctoral thesis (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST) : Computer Science major, 2004.2, [ vii, 93 p. ]

Keywords

TTS SYSTEM; PROSODY CONTROL; 운율 제어 (prosody control); 문서-음성 변환 시스템 (text-to-speech conversion system)

URI
http://hdl.handle.net/10203/32861
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=237664&flag=dissertation
Appears in Collection
CS-Theses_Ph.D.(박사논문)
Files in This Item
There are no files associated with this item.
