DeFT-AN: Dense Frequency-Time Attentive Network for Multichannel Speech Enhancement

In this study, we propose a dense frequency-time attentive network (DeFT-AN) for multichannel speech enhancement. DeFT-AN is a mask estimation network that predicts a complex spectral masking pattern for suppressing the noise and reverberation embedded in the short-time Fourier transform (STFT) of an input signal. The proposed mask estimation network incorporates three different types of blocks for aggregating information in the spatial, spectral, and temporal dimensions. It utilizes a spectral transformer with a modified feed-forward network and a temporal conformer with sequential dilated convolutions. The use of dense blocks and transformers dedicated to the three different characteristics of audio signals enables more comprehensive enhancement in noisy and reverberant environments. The remarkable performance of DeFT-AN over state-of-the-art multichannel models is demonstrated based on two popular noisy and reverberant datasets in terms of various metrics for speech quality and intelligibility.
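As a rough illustration of what complex spectral masking means, the sketch below multiplies an STFT by a complex-valued mask, element by element. It is not the DeFT-AN network itself: the function name `apply_complex_mask`, the array sizes, the random toy data, and the choice to mask a single reference channel are all assumptions made for illustration, and the "ideal" mask is computed from a known clean signal rather than predicted by a model.

```python
import numpy as np

def apply_complex_mask(noisy_stft, mask):
    """Multiply an STFT by a complex mask, element by element.

    noisy_stft: complex array of shape (freq_bins, frames), assumed here
                to be the STFT of one reference microphone channel.
    mask:       complex array of the same shape.
    """
    if noisy_stft.shape != mask.shape:
        raise ValueError("mask and STFT must have the same shape")
    return mask * noisy_stft

# Toy data with hypothetical sizes (5 frequency bins, 4 frames).
rng = np.random.default_rng(0)
F, T = 5, 4
clean = rng.standard_normal((F, T)) + 1j * rng.standard_normal((F, T))
noise = 0.5 * (rng.standard_normal((F, T)) + 1j * rng.standard_normal((F, T)))
noisy = clean + noise

# Oracle mask for illustration only: a trained network would have to
# estimate this from the noisy multichannel input instead.
ideal_mask = clean / noisy
enhanced = apply_complex_mask(noisy, ideal_mask)
```

Because the mask is complex, it can correct both the magnitude and the phase of each time-frequency bin, which a real-valued magnitude mask cannot do.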
Publisher
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
Issue Date
2023
Language
English
Article Type
Article
Citation

IEEE SIGNAL PROCESSING LETTERS, v.30, pp.155 - 159

ISSN
1070-9908
DOI
10.1109/LSP.2023.3244428
URI
http://hdl.handle.net/10203/305794
Appears in Collection
EE-Journal Papers (Journal Papers)
