Coherence-based quantitative analysis of reverberation effect on English automatic speech recognition error

Automatic speech recognition (ASR) is one of the core techniques for human-machine interaction, yet it is too vulnerable to external noise for real-life use. Reverberation in particular is convolutive in nature: it reduces speech clarity, which hinders ASR, and it is very difficult to remove from speech recorded in reverberant environments. Improving the robustness of ASR to reverberation is therefore essential to applying ASR in various environments. In this research, as a preliminary study toward optimizing ASR performance on reverberated speech, the effect of reverberation on ASR error is quantitatively analyzed using coherence. The ASR setting considered is single-channel machine listening in English. Room impulse responses obtained under various reverberant conditions are convolved with clean speech from an English-language corpus to simulate reverberated speech. Coherence is used to measure the similarity between the spectrogram of the reverberated speech and the corresponding clean speech spectrogram at each time frame and frequency bin. A variable named mean phoneme coherence (MPC) is introduced to quantify the spectral contamination of a phoneme in reverberated speech. The MPC of a phoneme is obtained by averaging the coherence values over the time frames and frequency bins within the time interval in which that phoneme is articulated. Spectral contamination of a phoneme is small when its MPC is close to one and severe when its MPC is close to zero. By applying ASR to reverberated speech and comparing the MPC distributions of each phoneme in correctly and incorrectly recognized words, it is shown that MPC values are statistically higher for phonemes in correctly recognized words than for phonemes in incorrectly recognized words. This result quantitatively verifies that severe spectral contamination caused by reverberation leads to more ASR errors.
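The simulation and MPC steps described above can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the abstract does not give the exact coherence estimator, so the sketch assumes a magnitude-squared coherence whose auto- and cross-spectra are smoothed over a few neighboring STFT frames. The function names, the window length `nperseg`, and the smoothing width `smooth_frames` are all hypothetical choices.

```python
import numpy as np
from scipy.ndimage import uniform_filter1d
from scipy.signal import fftconvolve, stft


def reverberate(clean, rir):
    """Simulate reverberated speech: convolve clean speech with a room impulse response."""
    return fftconvolve(clean, rir)[: len(clean)]


def tf_coherence(clean, reverbed, fs, nperseg=512, smooth_frames=5):
    """Coherence per (frequency bin, time frame) between two signals of equal length.

    Assumption: magnitude-squared coherence with spectra averaged over
    `smooth_frames` adjacent frames; the thesis may define it differently.
    """
    _, t, X = stft(clean, fs=fs, nperseg=nperseg)
    _, _, Y = stft(reverbed, fs=fs, nperseg=nperseg)

    def smooth(a):
        # Moving average along the time-frame axis.
        return uniform_filter1d(a, size=smooth_frames, axis=1, mode="nearest")

    sxx = smooth(np.abs(X) ** 2)
    syy = smooth(np.abs(Y) ** 2)
    cross = X * np.conj(Y)
    # Smooth real and imaginary parts separately to stay on real-valued filters.
    sxy = smooth(cross.real) + 1j * smooth(cross.imag)

    coh = np.abs(sxy) ** 2 / (sxx * syy + np.finfo(float).tiny)
    return t, np.clip(coh, 0.0, 1.0)


def mean_phoneme_coherence(t, coh, start, end):
    """MPC: average coherence over all bins and the frames inside [start, end) seconds."""
    frames = (t >= start) & (t < end)
    return float(coh[:, frames].mean())
```

Under these assumptions, a phoneme whose interval is passed as `start` and `end` (for example from a forced alignment, which is not part of this sketch) gets an MPC near one when the reverberated spectrogram closely tracks the clean one, and a lower value as the room impulse response smears energy across frames.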
A comparison of the MPC distributions of phoneme groups shows that, as spectral contamination increases, stops increase the ASR error rate the least and fricatives increase it the most. In addition, sequential interaction between phonemes is analyzed by grouping phonemes into voiced consonants, unvoiced consonants, and vowels. As spectral contamination increases, voiced consonants increase the ASR error rate less when they are preceded by consonants, whereas vowels and unvoiced consonants increase the ASR error rate more when one precedes the other. With these methods, the physical interactions between phonemes and reverberation-induced spectral contamination, and their effect on English ASR error, are quantitatively analyzed based on coherence.
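A comparison of MPC distributions by phoneme group might be organized as in the sketch below. The abstract does not name the statistical test used, so the one-sided Mann-Whitney U test here is an assumption. The record layout `(group, mpc, word_correct)` and the function name are also hypothetical.

```python
from collections import defaultdict

import numpy as np
from scipy.stats import mannwhitneyu


def compare_mpc_by_group(records):
    """Summarize MPC for phonemes in correctly vs. incorrectly recognized words.

    records: iterable of (group, mpc, word_correct) tuples, where `group` is a
    phoneme group label such as "stop" or "fricative" (hypothetical layout).
    """
    buckets = defaultdict(lambda: {True: [], False: []})
    for group, mpc, word_correct in records:
        buckets[group][bool(word_correct)].append(float(mpc))

    summary = {}
    for group, split in buckets.items():
        right, wrong = split[True], split[False]
        if not right or not wrong:
            continue  # both samples are needed for a comparison
        # Assumed test: is MPC stochastically greater in correctly recognized words?
        result = mannwhitneyu(right, wrong, alternative="greater")
        summary[group] = {
            "median_correct": float(np.median(right)),
            "median_wrong": float(np.median(wrong)),
            "p_value": float(result.pvalue),
        }
    return summary
```

A small `p_value` for a group would be consistent with the reported finding that MPC is higher in correctly recognized words. The sketch only compares MPC distributions within each group, so ranking groups by how strongly contamination raises the error rate, as the abstract does for stops and fricatives, would need an additional measure that it does not cover.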
Advisors
Park, Yong-Hwa
Description
Korea Advanced Institute of Science and Technology (KAIST): Department of Mechanical Engineering
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2020
Identifier
325007
Language
eng
Description

Thesis (Master's) - Korea Advanced Institute of Science and Technology (KAIST): Department of Mechanical Engineering, 2020.2, [vii, 45 p.]

Keywords

reverberation; single-channel; machine listening; English language; automatic speech recognition; quantitative; phoneme; spectral contamination; coherence; mean phoneme coherence (MPC)

URI
http://hdl.handle.net/10203/284604
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=910901&flag=dissertation
Appears in Collection
ME-Theses_Master (Master's theses)
Files in This Item
There are no files associated with this item.
