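Speech enhancement utilizing input correlation matrix characteristics for dual- and speech presence probability for multi-channel (Korean title: A study on multichannel speech enhancement using the characteristics of the input correlation matrix and speech presence probability)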

The ultimate goal of speech enhancement should be ‘maximal noise reduction with minimal speech distortion.’ Since Norbert Wiener proposed an optimum filter derived from an unconstrained optimization problem that minimizes the total estimation error, speech enhancement technology has progressed enormously. However, single-channel speech enhancement techniques inevitably distort the signal in exchange for noise reduction, and the degradation becomes much worse for nonstationary noise. As a remedy for nonstationary noise, research on microphone array-based noise reduction has achieved great advances in blind source separation (BSS) and adaptive beamforming (ABF). Nonetheless, BSS methods must still become more practical before they can be used for real-time noise reduction, and ABF methods must overcome the signal cancellation caused by inaccurate time alignment in adverse noisy environments.
Recently, some researchers have reported that, especially for nonstationary noise reduction, multiple-input multiple-output (MIMO)-based methods are superior to the generalized sidelobe canceller, the representative ABF method. Our work therefore focuses on improving the performance of MIMO-based speech enhancement methods. MIMO-based noise reduction methods rely primarily on SNR estimation from the second-order statistics of the noise-corrupted inputs and the estimated noise, without any additional information. They can be broadly classified into the parameterized multichannel Wiener filter (PMWF) and multichannel subspace-based filters (MSFs). The PMWF is comparatively efficient and easy to implement, whereas MSFs are computationally demanding because they require a data-dependent transformation, i.e., singular value decomposition or eigenvalue decomposition.
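
As a hedged illustration of the PMWF named above, the sketch below uses the standard formulation from the multichannel noise reduction literature, not code from this dissertation: h_β = Φ_v⁻¹ Φ_x u_ref / (β + tr(Φ_v⁻¹ Φ_x)), with Φ_x = Φ_y − Φ_v, where Φ_y and Φ_v are the noisy-input and noise correlation matrices of one frequency bin and u_ref selects the reference microphone. For a rank-1 speech correlation matrix, β = 0 gives the MVDR filter and β = 1 the multichannel Wiener filter. Two microphones are assumed so that the 2x2 inverse can be written in closed form in plain Python; the function names `pmwf`, `inv2`, `matmul2`, and `apply_filter` are hypothetical.

```python
from typing import List

Matrix = List[List[complex]]


def inv2(a: Matrix) -> Matrix:
    """Closed-form inverse of a 2x2 complex matrix."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    if abs(det) < 1e-12:
        raise ValueError("matrix is (nearly) singular")
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]


def matmul2(a: Matrix, b: Matrix) -> Matrix:
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def pmwf(phi_y: Matrix, phi_v: Matrix, beta: float = 1.0, ref: int = 0) -> List[complex]:
    """PMWF weights for one frequency bin (two microphones assumed)."""
    # Speech correlation matrix estimated by subtraction; real systems usually regularize it.
    phi_x = [[phi_y[i][j] - phi_v[i][j] for j in range(2)] for i in range(2)]
    a = matmul2(inv2(phi_v), phi_x)        # Phi_v^{-1} Phi_x
    trace = (a[0][0] + a[1][1]).real       # real-valued for Hermitian Phi_y, Phi_v
    denom = beta + trace
    if denom <= 1e-12:
        return [0j, 0j]                    # no speech estimated: mute this bin
    # Column `ref` of Phi_v^{-1} Phi_x, scaled by the PMWF denominator.
    return [a[0][ref] / denom, a[1][ref] / denom]


def apply_filter(h: List[complex], y: List[complex]) -> complex:
    """Filter output z = h^H y for one STFT bin."""
    return sum(hi.conjugate() * yi for hi, yi in zip(h, y))


# Example: unit-power white noise plus rank-1 speech (phi_s = 4, steering d = [1, 1j]).
phi_v = [[1, 0], [0, 1]]
phi_y = [[5, -4j], [4j, 5]]
h_mvdr = pmwf(phi_y, phi_v, beta=0.0)   # -> [0.5, 0.5j]
h_mwf = pmwf(phi_y, phi_v, beta=1.0)    # -> [4/9, 4j/9]
```

Varying β trades noise reduction against speech distortion: the MVDR weights pass the steering vector d with unit gain, while larger β values attenuate more noise at the cost of some speech distortion.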
Despite their computational cost, MSFs have great potential to surpass the PMWF in noise reduction performance because of their optimal signal decomposition. In addition, as a special case of the MIMO-based methods, a dual-microphone noise reduction method based on the phase difference between two microphones has shown outstanding noise reduction performance. However, because this phase-based method does not use magnitude information, it can become vulnerable in reverberant and adverse noisy environments. To address these problems, we propose two methods.
First, an MSF called the optimal filter in the spatiospectral domain (OFSS) is derived, a multichannel speech presence probability (MC-SPP) in that domain is defined, and a gain modification of the OFSS using the MC-SPP is proposed. In this method, the power spectral density matrix of the multichannel inputs at each frequency bin is decomposed into smaller units, such as eigenvectors or subspaces, to perform finer and more efficient noise reduction. In particular, the proposed OFSS reduces the computational burden and improves the practicality of MSFs compared with existing MSF methods. Simulation results show that this approach is effective, although it still needs further improvement.
Second, a novel dual-microphone noise reduction method based on a determinant analysis of the input correlation matrix is proposed. From this analysis, an equation relating the determinants of the noise-corrupted input and noise correlation matrices is derived. Using this equation, a prominent feature is extracted both for speech activity detection, which controls the update of the noise statistics, and for SNR estimation, which is used to compute the Wiener filter. An evaluation on a database collected in a real car environment shows that the proposed dual-microphone method outperforms the state-of-the-art dual-microphone noise reduction method.
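
The determinant equation derived in the dissertation is not reproduced on this page. As a hedged sketch of how a determinant relation between the input and noise correlation matrices can drive both speech activity detection and SNR estimation, the code below assumes a single coherent speech source, so the speech correlation matrix is rank-1 (Φ_x = φ_s d dᴴ). Under that assumption the standard matrix determinant lemma gives det(Φ_y) = det(Φ_v) · (1 + φ_s dᴴ Φ_v⁻¹ d), so det(Φ_y)/det(Φ_v) − 1 equals the multichannel SNR tr(Φ_v⁻¹ Φ_x). The detection threshold, the smoothing factor `alpha`, the scalar Wiener gain ξ/(1 + ξ), and all function names are illustrative assumptions, not the author's feature, parameters, or implementation.

```python
from typing import List, Tuple

Matrix = List[List[complex]]


def det2(a: Matrix) -> complex:
    """Determinant of a 2x2 matrix."""
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]


def snr_from_determinants(phi_y: Matrix, phi_v: Matrix) -> float:
    """Hypothetical estimator xi = det(Phi_y) / det(Phi_v) - 1.

    Exact when the speech correlation matrix is rank-1 (matrix determinant
    lemma); negative values caused by estimation error are clipped to zero.
    """
    den = det2(phi_v)
    if abs(den) < 1e-12:
        raise ValueError("noise correlation matrix is (nearly) singular")
    ratio = (det2(phi_y) / den).real
    return max(ratio - 1.0, 0.0)


def wiener_gain(xi: float) -> float:
    """Scalar Wiener gain xi / (1 + xi) for an SNR estimate xi."""
    return xi / (1.0 + xi)


def update_noise(phi_v: Matrix, y: List[complex], speech_active: bool,
                 alpha: float = 0.95) -> Matrix:
    """Recursive averaging of the noise statistics, frozen while speech is detected."""
    if speech_active:
        return phi_v
    return [[alpha * phi_v[i][j] + (1.0 - alpha) * y[i] * y[j].conjugate()
             for j in range(2)] for i in range(2)]


def process_bin(phi_y: Matrix, phi_v: Matrix, y: List[complex],
                threshold: float = 1.0, alpha: float = 0.95) -> Tuple[float, bool, Matrix]:
    """One frequency bin: SNR from determinants, activity flag, gain, noise update."""
    xi = snr_from_determinants(phi_y, phi_v)
    speech_active = xi > threshold
    new_phi_v = update_noise(phi_v, y, speech_active, alpha)
    return wiener_gain(xi), speech_active, new_phi_v


# Example: correlated noise plus rank-1 speech (phi_s = 1, steering d = [1, 1j]).
phi_v = [[2.0, 0.5], [0.5, 1.0]]
phi_y = [[3.0, 0.5 - 1j], [0.5 + 1j, 2.0]]
gain, active, phi_v_next = process_bin(phi_y, phi_v, y=[1 + 0j, 1j])
# Here xi = 12/7, which equals phi_s * d^H Phi_v^{-1} d, so gain = 12/19 and active is True.
```

For two microphones the determinant ratio needs no explicit matrix inverse, which keeps the per-bin cost to a few complex multiplications.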
Advisors
Hahn, Minsoo (한민수)
Description
Korea Advanced Institute of Science and Technology (KAIST), School of Electrical Engineering
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2016
Identifier
325007
Language
eng
Description

Doctoral dissertation (Ph.D.), KAIST, School of Electrical Engineering, August 2016, [v, 82 p.]

Keywords

microphone array; MIMO-based speech enhancement; multichannel speech presence probability; determinant analysis; spatiospectral subspace method

URI
http://hdl.handle.net/10203/222380
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=663193&flag=dissertation
Appears in Collection
EE-Theses_Ph.D. (Doctoral dissertations)
Files in This Item
There are no files associated with this item.
