DSpace at KOASAS: Monaural speech segregation based on pitch track correction using bayesian filters

Monaural speech segregation based on pitch track correction using bayesian filters

Kim, Han-Gyu

This work proposes a pitch tracking technique based on Bayesian filters and a speech/music pitch classifier based on recurrent neural networks (RNNs) for segregating speech from mixtures of speech and competing sounds. Conventional speech segregation methods use sub-band masking, in which the masks are obtained by modulation at the estimated speech pitch frequency. Segregation performance therefore relies heavily on the quality of the pitch estimate. However, pitch estimation is difficult in severely noisy environments. To improve estimation accuracy, we use Bayesian filters, which are widely used for tracking objects in noisy video. Two types of Bayesian filter, the particle filter and the ensemble Kalman filter, are adopted for tracking pitch contours. The particle filter uses a simple first-order Markov process from the past state to the present one, and the ensemble Kalman filter adds a linear transition model to the same Markov model. Because speech and music have similar harmonic structures, conventional segregation methods based on sub-band masking perform poorly against music interference. We therefore propose speech/music pitch classification using three RNN variants (a simple recurrent network, long short-term memory (LSTM), and bidirectional LSTM) to model the characteristics of speech pitch and music pitch. Experiments on mixtures of speech with various types of noise and music sources show that the proposed methods achieve significantly better segregation performance than the conventional method in most cases. Among the proposed methods, the one combining the ensemble Kalman filter with the bidirectional LSTM achieved the best performance.
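
The particle-filter idea in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the abstract gives no observation model or parameter values, so the random-walk transition noise, the Gaussian likelihood with a constant outlier floor, the pitch range, and the names `track_pitch` and `rmse` are all assumptions chosen for illustration. The synthetic data simulate a pitch estimator that makes an octave error on every tenth frame.

```python
import math
import random


def track_pitch(observations, n_particles=500, process_std=3.0, obs_std=15.0,
                outlier_floor=1e-3, f_min=60.0, f_max=400.0, seed=0):
    """Bootstrap particle filter over a random-walk pitch state (Hz).

    All default values are illustrative assumptions, not thesis settings.
    """
    rng = random.Random(seed)
    # Initialise particles uniformly over the assumed pitch range.
    particles = [rng.uniform(f_min, f_max) for _ in range(n_particles)]
    track = []
    for z in observations:
        # Predict: first-order Markov step (random walk), clamped to the range.
        particles = [min(f_max, max(f_min, p + rng.gauss(0.0, process_std)))
                     for p in particles]
        # Update: Gaussian likelihood plus a constant floor, so an observation
        # far from every particle (e.g. an octave error) is nearly ignored.
        weights = [math.exp(-0.5 * ((z - p) / obs_std) ** 2) + outlier_floor
                   for p in particles]
        total = sum(weights)
        # Estimate: weighted mean of the particles.
        track.append(sum(w * p for w, p in zip(weights, particles)) / total)
        # Resample in proportion to the weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return track


def rmse(estimate, reference):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((e - r) ** 2 for e, r in zip(estimate, reference))
                     / len(reference))


# Synthetic demo: a slowly varying contour, noisy estimates, and an octave
# error (doubled pitch) on every tenth frame.
noise = random.Random(1)
true_f0 = [200.0 + 20.0 * math.sin(2.0 * math.pi * t / 100.0) for t in range(200)]
observed = [(2.0 if t % 10 == 5 else 1.0) * f + noise.gauss(0.0, 5.0)
            for t, f in enumerate(true_f0)]
corrected = track_pitch(observed)
print(f"raw RMSE: {rmse(observed, true_f0):.1f} Hz")
print(f"filtered RMSE: {rmse(corrected, true_f0):.1f} Hz")
```

The outlier floor is one simple way to keep a single bad frame from pulling the track away from the contour. An ensemble Kalman filter variant would replace the weighting and resampling step with a Kalman-style update of each ensemble member.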

Advisors: Choi, Ho-Jin; Oh, Yung-Hwan

Description: School of Computing

Publisher: Korea Advanced Institute of Science and Technology (KAIST)

Issue Date: 2018

Identifier: 325007

Language: eng

Description: Doctoral thesis (Ph.D.) - School of Computing, 2018.8, [v, 65 p.]

Keywords: Monaural speech segregation; pitch track correction; particle filter; ensemble Kalman filter; speech/music pitch classification; recurrent neural network

URI: http://hdl.handle.net/10203/265354

Link: http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=828223&flag=dissertation

Appears in Collection: CS-Theses_Ph.D. (Doctoral Theses)

Files in This Item: There are no files associated with this item.

Knowledge Service Development Team, KAIST 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea. T. 82-42-350-4493 Email. koasas@kaist.ac.kr
Copyright © 2016. Korea Advanced Institute of Science and Technology. All Rights Reserved.
