Target sound extraction on reverberant mixture

Target sound extraction is the task of extracting only a desired sound signal from a mixture of different sounds, using a clue given by a target class label or a target signal similar to the desired sound. Currently available network architectures for this task are designed to handle only dry sounds. In this work, we introduce a transformer-based target sound extraction model that can extract reverberant sounds. To separate reverberant sound mixtures, we begin with the Dense Frequency-Time Attentive Network (DeFT-AN) architecture developed for speech enhancement, which generates a complex short-time Fourier transform (STFT) mask of clean speech from a noisy reverberant mixture to suppress noise. To make DeFT-AN compatible with target sound extraction, we modify its architecture so that the embedding vector for the target class label can be fused in the middle of the sequentially connected DeFT-A blocks that constitute DeFT-AN. We demonstrate that the transformer-based speech enhancement model can be successfully converted into a target sound extraction model and that it outperforms state-of-the-art extraction models in tests carried out with reverberant mixtures.
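The two mechanisms named in the abstract, complex STFT masking and fusing a class-label embedding partway through a stack of sequential blocks, can be illustrated with a small sketch. This is not the authors' implementation. The function names, the one-hot embedding, the element-wise multiplicative fusion, and the toy stand-in blocks are all assumptions made for illustration; the abstract does not specify the fusion operator or the internals of the DeFT-A blocks.

```python
from typing import Callable, List, Sequence

Spectrogram = List[List[complex]]  # [frequency][time] grid of STFT bins
Features = List[float]             # toy feature vector (hypothetical)


def apply_complex_mask(mixture: Spectrogram, mask: Spectrogram) -> Spectrogram:
    """Multiply each mixture bin by the matching complex mask value.

    A complex mask can change both the magnitude and the phase of a bin.
    """
    return [
        [m * x for m, x in zip(mask_row, mix_row)]
        for mask_row, mix_row in zip(mask, mixture)
    ]


def one_hot(label: int, num_classes: int) -> Features:
    """Hypothetical label embedding: a plain one-hot vector."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def multiply_fuse(features: Features, embedding: Features) -> Features:
    """Assumed fusion operator: element-wise product of features and embedding."""
    return [f * e for f, e in zip(features, embedding)]


def run_with_mid_fusion(
    features: Features,
    blocks: Sequence[Callable[[Features], Features]],
    embedding: Features,
    fuse: Callable[[Features, Features], Features] = multiply_fuse,
) -> Features:
    """Run the blocks in order, fusing the embedding once before the middle block."""
    mid = len(blocks) // 2
    for index, block in enumerate(blocks):
        if index == mid:
            features = fuse(features, embedding)
        features = block(features)
    return features


def double(features: Features) -> Features:
    """Toy stand-in for a network block; it only doubles each value."""
    return [2.0 * value for value in features]


masked = apply_complex_mask([[1 + 1j, 2 + 0j]], [[1j, 0.5 + 0j]])
fused = run_with_mid_fusion([1.0, 2.0], [double, double], one_hot(0, 2))
```

In this toy run, the mask value `1j` rotates the first bin by 90 degrees and `0.5` halves the second, while the one-hot embedding for class 0 zeroes out the second feature before the last block is applied. A learned embedding and learned blocks would replace these stand-ins in any real model.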
Publisher
Acoustical Society of America
Issue Date
2023-12-07
Language
English
Citation

Acoustics 2023

DOI
10.1121/10.0023494
URI
http://hdl.handle.net/10203/316669
Appears in Collection
EE-Conference Papers
