Spatial hearing algorithms based on binaural zero-crossings : sound source localization, segregation, and dereverberation = 영교차점에 기초한 공간 청각 알고리즘 : 음원 국지화, 분리 및 반향제거
This thesis presents a new zero-crossing-based binaural model for spatial hearing. Conventional binaural models compute cross-correlations of the binaural signals to estimate the interaural time difference, which is a primary spatial cue. However, cross-correlation-based binaural processing is computationally expensive and localizes sound sources inaccurately, especially in noisy multisource environments.
The proposed model extracts two important binaural cues, the interaural time difference (ITD) and the interaural intensity difference (IID), from the zero-crossing times and interval powers of the filtered signals. This fundamental difference in binaural cue extraction gives great flexibility in designing spatial hearing algorithms. Another distinctive feature of the model is that it estimates the signal-to-noise ratios (SNRs) of the filtered signals from the variances of the ITD samples, which enables noise-robust estimation of ITDs. Using the zero-crossing-based binaural model, we developed three novel spatial hearing algorithms: localization, segregation, and dereverberation.
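As a rough illustration of extracting ITD samples from zero-crossing times, the sketch below finds upward zero-crossings in one band of a left and a right signal, pairs each left crossing with the nearest right crossing, and reports the time differences and their variance. It is a minimal sketch and not the thesis's implementation: the function names, the nearest-crossing pairing rule, the sign convention (right minus left), the 1 ms gate `max_itd`, and the synthetic 500 Hz test tone are all assumptions made for the example. The mapping from ITD variance to SNR is specific to the thesis and is not reproduced here.

```python
import math

def upward_zero_crossings(x, fs):
    """Times (in seconds) of upward zero-crossings, linearly interpolated."""
    times = []
    for n in range(len(x) - 1):
        if x[n] < 0.0 <= x[n + 1]:
            frac = -x[n] / (x[n + 1] - x[n])  # position between samples n and n+1
            times.append((n + frac) / fs)
    return times

def itd_samples(left, right, fs, max_itd=1e-3):
    """One ITD sample per left crossing: nearest right crossing minus left crossing.

    Assumed convention: a positive value means the right signal lags the left.
    Pairs further apart than max_itd are discarded as implausible.
    """
    zl = upward_zero_crossings(left, fs)
    zr = upward_zero_crossings(right, fs)
    samples = []
    if not zr:
        return samples
    for t in zl:
        nearest = min(zr, key=lambda u: abs(u - t))
        d = nearest - t
        if abs(d) <= max_itd:
            samples.append(d)
    return samples

def variance(values):
    """Population variance of a non-empty list."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

# Synthetic single-band example (assumed values): a 500 Hz tone that reaches
# the right channel 4 samples (0.25 ms) after the left channel.
fs = 16000
f0 = 500.0
true_itd = 4 / fs
left = [math.sin(2 * math.pi * f0 * n / fs + 0.3) for n in range(1600)]
right = [math.sin(2 * math.pi * f0 * (n / fs - true_itd) + 0.3) for n in range(1600)]

samples = itd_samples(left, right, fs)
mean_itd = sum(samples) / len(samples)
itd_var = variance(samples)
```

For this clean tone the ITD samples should cluster tightly around the true delay, so their variance is close to zero. With added noise the crossings jitter and the variance grows, which is the property the model uses to estimate SNR.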
Localization: Multiple sound source directions are localized in noisy environments from the histogram of ITD samples weighted by the estimated SNRs. In experiments on noisy multisource environments, the proposed localization algorithm estimated sound source directions more accurately and with greater noise robustness than the conventional cross-correlation-based method.
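The weighted-histogram idea can be sketched as follows: each ITD sample adds its weight to a histogram bin, and the largest local maxima are taken as source ITDs. This is only an illustration under stated assumptions. The bin count, the peak-picking rule, the `min_frac` floor, and the toy data are hypothetical, and the weights simply stand in for whatever SNR estimates the model produces.

```python
def weighted_itd_histogram(itds, weights, max_itd=1e-3, n_bins=41):
    """Accumulate weights into ITD bins covering [-max_itd, max_itd)."""
    hist = [0.0] * n_bins
    width = 2 * max_itd / n_bins
    for itd, w in zip(itds, weights):
        if -max_itd <= itd < max_itd:
            idx = min(int((itd + max_itd) / width), n_bins - 1)
            hist[idx] += w
    centers = [-max_itd + (i + 0.5) * width for i in range(n_bins)]
    return centers, hist

def pick_peaks(centers, hist, n_sources, min_frac=0.1):
    """Return the bin centers of the n_sources largest local maxima."""
    floor = min_frac * max(hist)
    peaks = []
    for i, h in enumerate(hist):
        prev_h = hist[i - 1] if i > 0 else 0.0
        next_h = hist[i + 1] if i < len(hist) - 1 else 0.0
        if h > floor and h >= prev_h and h > next_h:
            peaks.append((h, centers[i]))
    peaks.sort(reverse=True)  # strongest peaks first
    return sorted(c for _, c in peaks[:n_sources])

# Toy data (assumed): two sources near -0.4 ms and +0.3 ms with high weights,
# plus scattered low-weight samples standing in for noise-dominated ITDs.
source_a = [-4.0e-4 + 1e-6 * k for k in range(-5, 6)]
source_b = [3.0e-4 + 1e-6 * k for k in range(-5, 6)]
noise = [-9.0e-4 + 6.0e-5 * k for k in range(31)]

itds = source_a + source_b + noise
weights = [1.0] * len(source_a) + [0.8] * len(source_b) + [0.05] * len(noise)

centers, hist = weighted_itd_histogram(itds, weights)
estimated_itds = pick_peaks(centers, hist, n_sources=2)
```

Because the scattered samples carry small weights, they barely raise the histogram floor, and the two peaks remain easy to separate. An unweighted histogram would count every noisy sample as heavily as a reliable one.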
Segregation: Using the locations of the sound sources, we assigned each zero-crossing interval power to one of the sound sources to estimate the target-to-interferers power ratio. Two types of masks, binary and soft, were then derived from the estimated power ratios for the segregation and missing data recognition tasks. On both the speech segregation and recognition tests, our ratio mask showed superior results to the cross-correlation-ba...