DSpace at KOASAS: Embedding approach based speech enhancement robust for noise and speaker variability


Embedding approach based speech enhancement robust for noise and speaker variability (Korean title: 임베딩 접근 기반의 잡음과 화자 변이에 강인한 음성 향상에 관한 연구)


Lee, Joohyung

Noise and speaker variations can degrade the performance of deep learning-based speech enhancement (SE) systems. One way to overcome this is to train the SE model adaptively, supplying information about the background noise or the speaker during training so that the model can produce results suited to unseen noise or speakers at inference. To make the SE system robust to these variations, we propose two types of embedding that represent information about the background noise and the speaker. To improve the representation ability of each embedding, voice activity detection (VAD) is applied ahead of SE. The speech presence probability obtained from VAD is used to focus on non-speech frames when extracting the noise-related embedding, the dynamic noise embedding (DNE), and on speech frames when extracting the speaker-related embedding, the deep speaker embedding (DSE). This approach also flexibly resolves the chicken-and-egg problem associated with the order in which VAD and SE are applied. For VAD, three types of attention module are proposed to improve performance. Temporal attention (TA) and frequential attention (FA) produce attention vectors over the time and frequency axes, respectively. These attention vectors improve VAD performance by concentrating on the important components of the hidden states. Dual attention (DA), which uses both modules, gives the best results and is used to show the correlation between VAD and SE. Experiments are conducted on the TIMIT dataset for a single-channel denoising task, with a convolutional recurrent neural network (CRNN) as the baseline. The experimental results show that the DNE and DSE play an important role in the SE model, increasing the quality and intelligibility of corrupted speech even when the noise and speaker are unseen. In addition, ablation studies show that applying the proposed attention modules to the VAD model improves not only VAD performance but also SE performance.
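The way a speech presence probability can steer attention toward non-speech or speech frames might be pictured with a small sketch. This is only an illustration under stated assumptions, not the thesis's method: the thesis extracts learned embeddings with neural networks, whereas the hypothetical helpers `weighted_mean` and `pool_embeddings` below simply average toy frame features, weighting by (1 - probability) for a noise-side summary and by the probability itself for a speaker-side summary.

```python
from typing import List, Sequence, Tuple


def weighted_mean(frames: Sequence[Sequence[float]],
                  weights: Sequence[float],
                  eps: float = 1e-8) -> List[float]:
    """Average per-frame feature vectors using one non-negative weight per frame."""
    if len(frames) != len(weights):
        raise ValueError("need exactly one weight per frame")
    dim = len(frames[0])
    total = sum(weights) + eps  # eps avoids division by zero when all weights are 0
    pooled = [0.0] * dim
    for frame, w in zip(frames, weights):
        for d in range(dim):
            pooled[d] += w * frame[d]
    return [v / total for v in pooled]


def pool_embeddings(frames: Sequence[Sequence[float]],
                    spp: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Return (noise_summary, speaker_summary) for the given frames.

    spp holds one speech presence probability in [0, 1] per frame.
    Both names and the pooling rule are assumptions made for illustration.
    """
    noise = weighted_mean(frames, [1.0 - p for p in spp])  # emphasise non-speech frames
    speaker = weighted_mean(frames, list(spp))             # emphasise speech frames
    return noise, speaker


# Toy input: 4 frames of 2-dimensional features; the first two frames are non-speech.
frames = [[1.0, 0.0], [1.0, 0.0], [0.0, 2.0], [0.0, 2.0]]
spp = [0.0, 0.0, 1.0, 1.0]
noise_vec, speaker_vec = pool_embeddings(frames, spp)
```

With hard 0/1 probabilities, as in the toy input, each summary is just the mean of the frames in its own class. Soft probabilities would blend the two, so an uncertain VAD decision contributes partially to both summaries instead of forcing a hard split.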

Advisors: Kim, Hoirin (김회린)

Description: Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering

Publisher: Korea Advanced Institute of Science and Technology (KAIST)

Issue Date: 2021

Identifier: 325007

Language: eng

Description: Thesis (Master's) - Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering, 2021.2, [iv, 52 p.]

Keywords: Speech Enhancement; Voice Activity Detection; Noise Embedding; Speaker Embedding; Attention (Korean: 음성향상; 음성검출; 잡음 임베딩; 화자 임베딩; 어텐션)

URI: http://hdl.handle.net/10203/295968

Link: http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=948741&flag=dissertation

Appears in Collection: EE-Theses_Master(석사논문)

Files in This Item: There are no files associated with this item.


Knowledge Service Development Team, KAIST 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea. T. 82-42-350-4493 Email. koasas@kaist.ac.kr
Copyright © 2016. Korea Advanced Institute of Science and Technology. All Rights Reserved.
