DSpace at KOASAS: Audio-visual speech recognition: stochastic optimization of hidden Markov models, modeling of interframe correlations, and integration with neural networks

Audio-visual speech recognition: stochastic optimization of hidden Markov models, modeling of interframe correlations, and integration with neural networks (Korean title: 시청각 음성인식: 은닉 마르코프 모델의 확률적 최적화, 프레임간 상관관계의 모델링 및 신경회로망을 이용한 통합)

Author: Lee, Jong-Seok (이종석)

Automatic speech recognition has become an important technology for human-machine interfaces. Although many existing speech recognition systems achieve high accuracy in controlled conditions, their performance in noisy environments is still unsatisfactory, and achieving noise-robust recognition remains an important but difficult problem in the field. Audio-visual speech recognition (AVSR) addresses this problem by using both acoustic and visual signals: a microphone records the voice, a camera captures the speaker's lip movements, and the two signals are combined to recognize the speech. Although visual speech recognition has relatively low accuracy compared with conventional acoustic speech recognition in low-noise environments, it is unaffected by acoustic noise and can therefore compensate for the performance degradation of acoustic recognition in noisy environments. In this dissertation, we improve the robustness of AVSR by addressing each of the three components of the recognition system: acoustic speech recognition, visual speech recognition, and integration of the two modalities. First, to improve visual speech recognition performance, we propose a novel stochastic optimization algorithm for the hidden Markov models (HMMs) used in the recognizer. We combine simulated annealing, a powerful stochastic search algorithm, with a local optimization technique to obtain a hybrid simulated annealing algorithm with improved speed and performance. Whereas the conventional HMM training algorithm, the expectation-maximization (EM) method, performs only local optimization of the likelihood function, the proposed algorithm can perform a global search and can thus improve the recognition performance of the HMMs. It ...
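The abstract describes the hybrid idea only at a high level: simulated annealing searches the likelihood surface globally while a local optimizer refines each candidate. The Python sketch below illustrates that general pattern under assumptions that do not come from the dissertation: a discrete-observation HMM chosen for brevity, a neighbour move that applies multiplicative log-normal noise to the parameters, a geometric cooling schedule, and a few Baum-Welch (EM) steps as the local optimizer. All function names, parameter values (`T0`, `cooling`, `step`, `local_steps`) and the toy data are hypothetical; this is not the author's algorithm.

```python
import numpy as np


def forward_backward(A, B, pi, obs):
    """Scaled forward-backward pass for a discrete HMM on one sequence.

    Returns (log-likelihood, state posteriors gamma, summed transition posteriors xi).
    """
    T, N = len(obs), A.shape[0]
    alpha, beta, scale = np.zeros((T, N)), np.ones((T, N)), np.zeros(T)
    alpha[0] = pi * B[:, obs[0]]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        scale[t] = alpha[t].sum()
        alpha[t] /= scale[t]
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1]) / scale[t + 1]
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)
    xi_sum = np.zeros((N, N))
    for t in range(T - 1):
        xi = alpha[t][:, None] * A * (B[:, obs[t + 1]] * beta[t + 1])[None, :]
        xi_sum += xi / xi.sum()
    return np.log(scale).sum(), gamma, xi_sum


def total_log_likelihood(A, B, pi, sequences):
    return sum(forward_backward(A, B, pi, obs)[0] for obs in sequences)


def baum_welch_step(A, B, pi, sequences, floor=1e-12):
    """One EM (Baum-Welch) re-estimation step; used here as the local optimizer."""
    N, M = B.shape
    A_num, B_num, pi_num = np.zeros((N, N)), np.zeros((N, M)), np.zeros(N)
    for obs in sequences:
        _, gamma, xi_sum = forward_backward(A, B, pi, obs)
        pi_num += gamma[0]
        A_num += xi_sum
        for t, o in enumerate(obs):
            B_num[:, o] += gamma[t]
    # A tiny floor keeps every probability strictly positive.
    A_num, B_num, pi_num = A_num + floor, B_num + floor, pi_num + floor
    return (A_num / A_num.sum(axis=1, keepdims=True),
            B_num / B_num.sum(axis=1, keepdims=True),
            pi_num / pi_num.sum())


def perturb(P, step, rng):
    """Hypothetical neighbour move: log-normal noise, then renormalize each row."""
    Q = P * np.exp(step * rng.standard_normal(P.shape))
    return Q / Q.sum(axis=-1, keepdims=True)


def hybrid_sa(sequences, n_states, n_symbols, iters=60, T0=5.0, cooling=0.93,
              step=0.5, local_steps=2, seed=0):
    """Simulated annealing over HMM parameters with EM refinement of each candidate."""
    rng = np.random.default_rng(seed)
    cur = (rng.dirichlet(np.ones(n_states), size=n_states),
           rng.dirichlet(np.ones(n_symbols), size=n_states),
           rng.dirichlet(np.ones(n_states)))
    cur_ll = total_log_likelihood(*cur, sequences)
    best, best_ll, temp = cur, cur_ll, T0
    for _ in range(iters):
        cand = tuple(perturb(P, step, rng) for P in cur)  # global (random) move
        for _ in range(local_steps):                        # local refinement
            cand = baum_welch_step(*cand, sequences)
        cand_ll = total_log_likelihood(*cand, sequences)
        # Metropolis rule for maximization: sometimes accept a worse model.
        if cand_ll >= cur_ll or rng.random() < np.exp((cand_ll - cur_ll) / temp):
            cur, cur_ll = cand, cand_ll
            if cur_ll > best_ll:
                best, best_ll = cur, cur_ll
        temp *= cooling  # geometric cooling schedule (assumed)
    return best, best_ll


if __name__ == "__main__":
    # Hypothetical toy symbol sequences, e.g. vector-quantized lip-shape codes.
    seqs = [np.array([0, 0, 1, 2, 2, 1, 0, 0, 1, 2]), np.array([2, 2, 1, 0, 0, 1, 2, 2])]
    (A, B, pi), ll = hybrid_sa(seqs, n_states=2, n_symbols=3)
    print(f"best total log-likelihood: {ll:.3f}")
```

Because every candidate is polished by EM before the acceptance test, the annealing compares locally refined models; at high temperatures the search can still accept a worse one and leave the basin that EM from a single starting point would settle into. The best model seen so far is returned, since the current state may drift to a worse model under the Metropolis rule.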

Advisor: Park, Cheol-Hoon (박철훈)

Description: Korea Advanced Institute of Science and Technology (KAIST): Electrical and Electronic Engineering program

Publisher: Korea Advanced Institute of Science and Technology (KAIST)

Issue Date: 2006

Identifier: 258134/325007 / 020015218

Language: eng

Description: Thesis (Ph.D.), Korea Advanced Institute of Science and Technology (KAIST): Electrical and Electronic Engineering program, August 2006, [x, 113 p.]

Keywords: hidden Markov model; robust speech recognition; lipreading; audio-visual speech recognition; neural network

URI: http://hdl.handle.net/10203/36066

Link: http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=258134&flag=dissertation

Appears in Collection: EE-Theses_Ph.D. (Ph.D. theses)

Files in This Item: There are no files associated with this item.

Knowledge Service Development Team, KAIST 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea. T. 82-42-350-4493 Email. koasas@kaist.ac.kr
Copyright © 2016. Korea Advanced Institute of Science and Technology. All Rights Reserved.
