Pushing the limits of raw waveform speaker recognition

Cited 10 times in Web of Science; cited 0 times in Scopus
DC Field | Value | Language
dc.contributor.author | Jung, Jee-weon | ko
dc.contributor.author | Kim, Youjin | ko
dc.contributor.author | Heo, Hee-Soo | ko
dc.contributor.author | Lee, Bong-Jin | ko
dc.contributor.author | Kwon, Youngki | ko
dc.contributor.author | Chung, Joon Son | ko
dc.date.accessioned | 2023-05-10T12:01:23Z | -
dc.date.available | 2023-05-10T12:01:23Z | -
dc.date.created | 2023-05-03 | -
dc.date.issued | 2022-09 | -
dc.identifier.citation | 23rd Annual Conference of the International Speech Communication Association, INTERSPEECH 2022, pp. 2228-2232 | -
dc.identifier.issn | 2308-457X | -
dc.identifier.uri | http://hdl.handle.net/10203/306697 | -
dc.description.abstract | In recent years, speaker recognition systems based on raw waveform inputs have received increasing attention. However, the performance of such systems is typically inferior to that of state-of-the-art handcrafted feature-based counterparts, which demonstrate equal error rates under 1% on the popular VoxCeleb1 test set. This paper proposes a novel speaker recognition model based on raw waveform inputs. The model incorporates recent advances in machine learning and speaker verification, including the Res2Net backbone module and multi-layer feature aggregation. Our best model achieves an equal error rate of 0.89%, which is competitive with the state-of-the-art models based on handcrafted features, and outperforms the best model based on raw waveform inputs by a large margin. We also explore the application of the proposed model in the context of a self-supervised learning framework. Our self-supervised model outperforms existing single-phase-based works in this line of research. Finally, we show that self-supervised pre-training is effective for the semi-supervised scenario where we only have a small set of labelled training data, along with a larger set of unlabelled examples. | -
dc.language | English | -
dc.publisher | ISCA-INT SPEECH COMMUNICATION ASSOC | -
dc.title | Pushing the limits of raw waveform speaker recognition | -
dc.type | Conference | -
dc.identifier.wosid | 000900724502081 | -
dc.identifier.scopusid | 2-s2.0-85140069445 | -
dc.type.rims | CONF | -
dc.citation.beginningpage | 2228 | -
dc.citation.endingpage | 2232 | -
dc.citation.publicationname | 23rd Annual Conference of the International Speech Communication Association, INTERSPEECH 2022 | -
dc.identifier.conferencecountry | KO | -
dc.identifier.conferencelocation | Incheon | -
dc.identifier.doi | 10.21437/Interspeech.2022-126 | -
dc.contributor.localauthor | Chung, Joon Son | -
dc.contributor.nonIdAuthor | Jung, Jee-weon | -
dc.contributor.nonIdAuthor | Kim, Youjin | -
dc.contributor.nonIdAuthor | Heo, Hee-Soo | -
dc.contributor.nonIdAuthor | Lee, Bong-Jin | -
dc.contributor.nonIdAuthor | Kwon, Youngki | -
Appears in Collection
EE-Conference Papers (학술회의논문)
Files in This Item
There are no files associated with this item.
This item is cited by other documents in WoS
