Multi-Task Network for Noise-Robust Keyword Spotting and Speaker Verification using CTC-based Soft VAD and Global Query Attention

DC Field | Value | Language
dc.contributor.author | Jung, Myunghun | ko
dc.contributor.author | Jung, Youngmoon | ko
dc.contributor.author | Goo, Jahyun | ko
dc.contributor.author | Kim, Hoi-Rin | ko
dc.date.accessioned | 2020-12-18T07:10:18Z | -
dc.date.available | 2020-12-18T07:10:18Z | -
dc.date.created | 2020-11-28 | -
dc.date.issued | 2020-10-26 | -
dc.identifier.citation | Interspeech 2020, pp. 931-935 | -
dc.identifier.uri | http://hdl.handle.net/10203/278700 | -
dc.description.abstract | Keyword spotting (KWS) and speaker verification (SV) have been studied independently, although the acoustic and speaker domains are known to be complementary. In this paper, we propose a multi-task network that performs KWS and SV simultaneously to fully utilize the interrelated domain information. The multi-task network tightly combines its sub-networks to improve performance in challenging conditions such as noisy environments, open-vocabulary KWS, and short-duration SV, by introducing two novel techniques: connectionist temporal classification (CTC)-based soft voice activity detection (VAD) and global query attention. Frame-level acoustic and speaker information is integrated using phonetically derived weights to form a word-level global representation. This representation is then used to aggregate feature vectors into discriminative embeddings. Our proposed approach shows 4.06% and 26.71% relative improvements in equal error rate (EER) over the baselines for the two tasks. We also present a visualization example and the results of ablation experiments. | -
dc.language | English | -
dc.publisher | ISCA | -
dc.title | Multi-Task Network for Noise-Robust Keyword Spotting and Speaker Verification using CTC-based Soft VAD and Global Query Attention | -
dc.type | Conference | -
dc.type.rims | CONF | -
dc.citation.beginningpage | 931 | -
dc.citation.endingpage | 935 | -
dc.citation.publicationname | Interspeech 2020 | -
dc.identifier.conferencecountry | CC | -
dc.identifier.conferencelocation | Virtual | -
dc.identifier.doi | 10.21437/Interspeech.2020-1420 | -
dc.contributor.localauthor | Kim, Hoi-Rin | -
Appears in Collection
EE-Conference Papers
Files in This Item
There are no files associated with this item.
