DC Field | Value | Language |
---|---|---|
dc.contributor.author | Jung, Myunghun | ko |
dc.contributor.author | Jung, Youngmoon | ko |
dc.contributor.author | Goo, Jahyun | ko |
dc.contributor.author | Kim, Hoi-Rin | ko |
dc.date.accessioned | 2020-12-18T07:10:18Z | - |
dc.date.available | 2020-12-18T07:10:18Z | - |
dc.date.created | 2020-11-28 | - |
dc.date.issued | 2020-10-26 | - |
dc.identifier.citation | Interspeech 2020, pp. 931-935 | - |
dc.identifier.uri | http://hdl.handle.net/10203/278700 | - |
dc.description.abstract | Keyword spotting (KWS) and speaker verification (SV) have been studied independently, although the acoustic and speaker domains are known to be complementary. In this paper, we propose a multi-task network that performs KWS and SV simultaneously to fully utilize the interrelated domain information. The multi-task network tightly combines its sub-networks, aiming to improve performance in challenging conditions such as noisy environments, open-vocabulary KWS, and short-duration SV, by introducing two novel techniques: connectionist temporal classification (CTC)-based soft voice activity detection (VAD) and global query attention. Frame-level acoustic and speaker information is integrated with phonetically originated weights to form a word-level global representation. This representation is then used to aggregate feature vectors into discriminative embeddings. Our proposed approach shows 4.06% and 26.71% relative improvements in equal error rate (EER) compared to the baselines for both tasks. We also present a visualization example and results of ablation experiments. | - |
dc.language | English | - |
dc.publisher | ISCA | - |
dc.title | Multi-Task Network for Noise-Robust Keyword Spotting and Speaker Verification using CTC-based Soft VAD and Global Query Attention | - |
dc.type | Conference | - |
dc.type.rims | CONF | - |
dc.citation.beginningpage | 931 | - |
dc.citation.endingpage | 935 | - |
dc.citation.publicationname | Interspeech 2020 | - |
dc.identifier.conferencecountry | CC | - |
dc.identifier.conferencelocation | Virtual | - |
dc.identifier.doi | 10.21437/Interspeech.2020-1420 | - |
dc.contributor.localauthor | Kim, Hoi-Rin | - |