Speech separation from multi-speaker dialogues under a reverberant environment based on enhanced interaural coherence = Coherence-based speech separation technique using two microphones in a reverberant multi-speaker environment

The degenerate unmixing estimation technique (DUET) and model-based expectation-maximization source separation and localization (MESSL) separate the spectrogram based on histograms of interaural cues. However, accurate histogram-based separation is difficult because the histogram peaks spread around the actual source locations and overlap with each other due to reverberation. In addition, since speech recognition performance on reverberant speech is lower than on clean speech, only the direct sound, which is less affected by reverberation, should be extracted. To address this problem, the interaural coherence proposed in a previous study is used to isolate spectrogram bins that are strongly affected by reverberation. However, that approach does not apply sufficient ensemble averaging, so the effect of reverberation cannot be observed accurately. In this research, we apply sufficient ensemble averaging by determining the quasi-steady-state intervals of speech; the Canny edge detection algorithm, which is used in image processing, is applied to the spectrogram image to determine these intervals. Based on the determined intervals, an optimal interaural coherence calculation is applied so that the effect of reverberation can be observed more accurately at the same resolution. To extract only the direct sound, which is less affected by reverberation, we propose a model in which the coherence is applied to MESSL through a sigmoid function. As a result, we improve speech separation performance by narrowing the histogram distribution, and we extract only the spectrogram bins that are less affected by reverberation so that speech recognition performance does not deteriorate. This research makes it possible to improve the separation of multiple direct speech sources in a reverberant environment with a small number of microphones, and to apply the method to mobile devices or companion robots so as to provide better service through improved speech recognition performance.
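The abstract's core quantities can be sketched in code. The following is a minimal illustration, not the thesis's actual implementation: it computes a magnitude-squared interaural coherence between two microphone channels with ensemble averaging over neighboring STFT frames (a stand-in for averaging over the quasi-steady intervals the thesis detects with Canny edge detection), then turns the coherence into a soft time-frequency mask via a sigmoid, as in the proposed coherence-weighted MESSL model. All function names, the averaging window, and the sigmoid parameters are illustrative assumptions.

```python
import numpy as np
from scipy.signal import stft

def interaural_coherence(x_left, x_right, fs, nperseg=512, avg_frames=8):
    """Magnitude-squared coherence between two microphone channels.

    Ensemble averaging is approximated by a moving average over
    `avg_frames` STFT frames (assumed to lie in a quasi-steady interval).
    """
    _, _, XL = stft(x_left, fs=fs, nperseg=nperseg)
    _, _, XR = stft(x_right, fs=fs, nperseg=nperseg)

    def smooth(S):
        # Moving average along the time axis (works for complex arrays too)
        kernel = np.ones(avg_frames) / avg_frames
        return np.apply_along_axis(
            lambda v: np.convolve(v, kernel, mode="same"), 1, S)

    Sll = smooth(np.abs(XL) ** 2)          # auto-spectrum, left
    Srr = smooth(np.abs(XR) ** 2)          # auto-spectrum, right
    Slr = smooth(XL * np.conj(XR))         # cross-spectrum
    # Coherence in [0, 1]; values near 1 indicate direct-sound dominance
    return np.abs(Slr) ** 2 / (Sll * Srr + 1e-12)

def sigmoid_mask(coherence, threshold=0.7, slope=20.0):
    """Soft time-frequency mask emphasizing bins with high coherence."""
    return 1.0 / (1.0 + np.exp(-slope * (coherence - threshold)))
```

Such a mask could then multiply the spectrogram (or reweight MESSL's per-bin posteriors) so that bins dominated by reverberant energy, where coherence is low, are attenuated before resynthesis.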
Advisors
Park, Yong-Hwa (박용화)
Description
Korea Advanced Institute of Science and Technology (KAIST): Department of Mechanical Engineering
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2019
Identifier
325007
Language
eng
Description

Thesis (Master's) - Korea Advanced Institute of Science and Technology: Department of Mechanical Engineering, 2019.2, [iv, 50 p.]

Keywords

Time-frequency masking; speech separation; speech enhancement; reverberation; coherence

URI
http://hdl.handle.net/10203/265854
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=843010&flag=dissertation
Appears in Collection
ME-Theses_Master (Master's theses)
Files in This Item
There are no files associated with this item.
