Audio-visual speech recognition: stochastic optimization of hidden Markov models, modeling of interframe correlations, and integration with neural networks

Automatic speech recognition has become a popular and important technique for human-machine interfaces. Although many existing speech recognition systems achieve high accuracy in controlled conditions, their performance in noisy environments is still unsatisfactory. Overcoming this limitation to achieve noise-robust recognition is an important but difficult problem in automatic speech recognition. Audio-visual speech recognition (AVSR) addresses it by recognizing speech from both acoustic and visual signals: a microphone records the voice signal, a camera captures the speaker's lip movements, and the two signals are combined to recognize the speech. Although visual speech recognition has relatively low accuracy compared with conventional acoustic speech recognition in low-noise environments, it is unaffected by acoustic noise and can therefore compensate for the performance degradation of acoustic speech recognition in noisy environments.

In this dissertation, we focus on improving the robustness of AVSR by considering the three parts of the recognition system: acoustic speech recognition, visual speech recognition, and integration of the two modalities. First, to improve visual speech recognition performance, we propose a novel stochastic optimization algorithm for the hidden Markov models (HMMs) used in the recognizer. We combine simulated annealing, a powerful stochastic search algorithm, with a local optimization technique to obtain a hybrid simulated annealing algorithm with improved speed and performance. Whereas the conventional learning algorithm for HMMs, the expectation-maximization (EM) method, performs only local optimization of the likelihood function, the proposed algorithm performs a global search and thus improves the recognition performance of the HMMs. It ...
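The abstract does not spell out the hybrid algorithm, so the following is only a minimal sketch of one common way to combine simulated annealing with local optimization for HMM training. It assumes a discrete-observation HMM trained on a single symbol sequence, and it assumes that "local optimization" means a few Baum-Welch (EM) iterations applied to each annealing candidate before the Metropolis acceptance test. All function names (`forward_loglik`, `baum_welch_step`, `perturb`, `random_params`, `hybrid_sa`) and parameter values (`t0`, `cooling`, `step`, `em_steps`) are hypothetical; the dissertation's actual visual-speech HMMs, perturbation scheme, and cooling schedule may differ.

```python
import numpy as np


def forward_loglik(pi, A, B, obs):
    """Scaled forward algorithm: log P(obs | pi, A, B) for a discrete HMM."""
    alpha = pi * B[:, obs[0]]
    c = alpha.sum()
    alpha = alpha / c
    loglik = np.log(c)
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        c = alpha.sum()
        alpha = alpha / c
        loglik += np.log(c)
    return float(loglik)


def baum_welch_step(pi, A, B, obs):
    """One EM (Baum-Welch) re-estimation step on a single observation sequence."""
    obs = np.asarray(obs)
    T, N = len(obs), len(pi)
    alpha, beta, c = np.zeros((T, N)), np.ones((T, N)), np.zeros(T)
    alpha[0] = pi * B[:, obs[0]]
    c[0] = alpha[0].sum()
    alpha[0] /= c[0]
    for t in range(1, T):                       # scaled forward pass
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        c[t] = alpha[t].sum()
        alpha[t] /= c[t]
    for t in range(T - 2, -1, -1):              # scaled backward pass
        beta[t] = (A @ (B[:, obs[t + 1]] * beta[t + 1])) / c[t + 1]
    gamma = alpha * beta                        # state posteriors; each row sums to 1
    xi_sum = np.zeros((N, N))                   # expected transition counts
    for t in range(T - 1):
        xi_sum += np.outer(alpha[t], B[:, obs[t + 1]] * beta[t + 1]) * A / c[t + 1]
    new_pi = gamma[0]
    new_A = xi_sum / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.stack([gamma[obs == k].sum(axis=0) for k in range(B.shape[1])], axis=1)
    new_B /= gamma.sum(axis=0)[:, None]
    return new_pi, new_A, new_B


def random_params(n_states, n_symbols, rng):
    """Random row-stochastic initial parameters (pi, A, B)."""
    pi = rng.dirichlet(np.ones(n_states))
    A = rng.dirichlet(np.ones(n_states), size=n_states)
    B = rng.dirichlet(np.ones(n_symbols), size=n_states)
    return pi, A, B


def perturb(params, scale, rng):
    """Multiplicative log-normal noise, then renormalize so each row stays a distribution."""
    out = []
    for P in params:
        Q = P * np.exp(scale * rng.standard_normal(P.shape))
        out.append(Q / Q.sum(axis=-1, keepdims=True))
    return tuple(out)


def hybrid_sa(obs, n_states, n_symbols, n_iter=50, t0=1.0, cooling=0.9,
              step=0.5, em_steps=3, seed=0):
    """Simulated annealing whose candidates are first refined by a few EM steps."""
    rng = np.random.default_rng(seed)

    def refine(params):
        for _ in range(em_steps):
            params = baum_welch_step(*params, obs)
        return params

    cur = refine(random_params(n_states, n_symbols, rng))
    cur_ll = forward_loglik(*cur, obs)
    best, best_ll = cur, cur_ll
    temp = t0
    for _ in range(n_iter):
        cand = refine(perturb(cur, step * temp, rng))   # global move + local hill-climb
        cand_ll = forward_loglik(*cand, obs)
        # Metropolis rule: always accept improvements, sometimes accept worse candidates
        if cand_ll >= cur_ll or rng.random() < np.exp((cand_ll - cur_ll) / temp):
            cur, cur_ll = cand, cand_ll
            if cur_ll > best_ll:
                best, best_ll = cur, cur_ll
        temp *= cooling                                 # geometric cooling schedule
    return best, best_ll


if __name__ == "__main__":
    obs = [0, 1, 1, 2, 0, 2, 2, 1, 0, 0, 1, 2] * 3    # toy symbol sequence
    params, ll = hybrid_sa(obs, n_states=2, n_symbols=3)
    print("best log-likelihood:", ll)
```

In this sketch, each candidate is pulled toward a nearby local maximum by EM before it is judged, so the annealing step only decides which basin of the likelihood surface to move to. A worse basin can still be accepted with probability exp(ΔL/T), which is what lets the search leave a local optimum at which plain EM would stop.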
Advisors
Park, Cheol-Hoon (박철훈)
Description
Korea Advanced Institute of Science and Technology (KAIST): Major in Electrical and Electronic Engineering
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2006
Identifier
258134/325007  / 020015218
Language
eng
Description

Thesis (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST): Major in Electrical and Electronic Engineering, August 2006, [x, 113 p.]

Keywords

hidden Markov model; robust speech recognition; lipreading; audio-visual speech recognition; neural network

URI
http://hdl.handle.net/10203/36066
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=258134&flag=dissertation
Appears in Collection
EE-Theses_Ph.D. (doctoral dissertations)
Files in This Item
There are no files associated with this item.
