A study on utterance verification using phone and state log-likelihood ratios in large vocabulary speech recognition

A high-quality automatic speech recognition (ASR) system with high recognition performance can now be built for some applications, provided sufficient training data are available for the target task. In real-world use, however, both diverse environmental conditions and out-of-vocabulary inputs degrade recognition performance. It is therefore important to develop technology that decides whether to accept or reject a recognition result according to its reliability. In a large vocabulary speech recognition system, the search network is likely to contain words similar to an out-of-vocabulary input, so it is difficult to reject an incorrectly recognized word with conventional confidence measures alone, particularly when the recognized word is similar to the correct transcription. In this thesis, we propose several confidence measures that use word voiceprint models and the state log-likelihood ratio (SLLR) to verify speech recognition results. Word voiceprint models are designed to capture word-dependent characteristics from the distributions of phone log-likelihood ratios and durations. In addition, when computing a log-likelihood-ratio-based word voiceprint score, we propose a new log-scale normalization function based on the distribution of the phone log-likelihood ratio, in place of the widely used sigmoid function. This function emphasizes the contribution of an incorrectly recognized phone to the confidence score. The word-dependent information helps achieve a more discriminative score for out-of-vocabulary words. The proposed method achieves a 16.9% relative reduction in equal error rate compared with the baseline that uses simple phone log-likelihood ratios. The second proposed utterance verification algorithm uses the state log-likelihood ratio with frame and state selection. The hidden Markov models have three states, and each state represents different characteristics of a phone. Thus we...
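As a rough illustration of the baseline setting the abstract refers to, the sketch below computes a word confidence score from sigmoid-normalized phone log-likelihood ratios and estimates an equal error rate. It is a minimal sketch under stated assumptions, not the thesis's implementation: the function names, the `alpha` and `beta` sigmoid parameters, the duration normalization, and the averaging over phones are all illustrative choices. The proposed log-scale normalization function is not specified in the abstract and is not reproduced here.

```python
import math


def sigmoid(x, alpha=1.0, beta=0.0):
    """Logistic function; alpha (slope) and beta (offset) are assumed tuning parameters."""
    return 1.0 / (1.0 + math.exp(-alpha * (x - beta)))


def phone_llr(target_loglik, anti_loglik, n_frames):
    """Duration-normalized log-likelihood ratio for one phone segment.

    target_loglik: log-likelihood under the recognized phone model.
    anti_loglik: log-likelihood under an alternative (anti or filler) model.
    Dividing by the frame count is an assumption made for this sketch.
    """
    return (target_loglik - anti_loglik) / n_frames


def word_confidence(phone_llrs, alpha=1.0, beta=0.0):
    """Average of sigmoid-normalized phone LLRs (one common baseline form)."""
    return sum(sigmoid(llr, alpha, beta) for llr in phone_llrs) / len(phone_llrs)


def equal_error_rate(correct_scores, incorrect_scores):
    """Sweep thresholds over the observed scores and return (eer, threshold).

    A result is accepted when its score is >= threshold. The EER is taken at
    the threshold where false rejection and false acceptance rates are closest.
    """
    best = None
    for t in sorted(set(correct_scores) | set(incorrect_scores)):
        frr = sum(s < t for s in correct_scores) / len(correct_scores)
        far = sum(s >= t for s in incorrect_scores) / len(incorrect_scores)
        gap = abs(frr - far)
        if best is None or gap < best[0]:
            best = (gap, (frr + far) / 2.0, t)
    return best[1], best[2]
```

A word whose phones all score well above the alternative model receives a confidence near 1, while a single poorly matching phone lowers the average only modestly. This is the kind of behavior that a normalization emphasizing incorrectly recognized phones, as the abstract proposes, would aim to change.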
Advisors
Kim, Hoi-Rin (김회린)
Description
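Korea Advanced Institute of Science and Technology (KAIST) : Department of Information and Communications Engineering,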
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2010
Identifier
418778/325007  / 020045912
Language
eng
Description

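Thesis (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST) : Department of Information and Communications Engineering, 2010.2, [ iv, 96 p. ]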

Keywords

state log-likelihood ratio; word voiceprint; confidence measure; utterance verification; adaptive word thresholding

URI
http://hdl.handle.net/10203/39856
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=418778&flag=dissertation
Appears in Collection
ICE-Theses_Ph.D. (doctoral theses)
Files in This Item
There are no files associated with this item.
