Augmentation Adversarial Training for Self-Supervised Speaker Representation Learning

DC Field | Value | Language
dc.contributor.author | Kang, Jingu | ko
dc.contributor.author | Huh, Jaesung | ko
dc.contributor.author | Heo, Hee Soo | ko
dc.contributor.author | Chung, Joon Son | ko
dc.date.accessioned | 2022-11-02T06:00:58Z | -
dc.date.available | 2022-11-02T06:00:58Z | -
dc.date.created | 2022-11-01 | -
dc.date.issued | 2022-10 | -
dc.identifier.citation | IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, v.16, no.6, pp.1253 - 1262 | -
dc.identifier.issn | 1932-4553 | -
dc.identifier.uri | http://hdl.handle.net/10203/299242 | -
dc.description.abstract | The goal of this work is to train robust speaker recognition models using self-supervised representation learning. Recent works on self-supervised speaker representations are based on contrastive learning, in which within-utterance embeddings are encouraged to be similar and across-utterance embeddings to be dissimilar. However, since within-utterance segments share the same acoustic characteristics, it is difficult to separate the speaker information from the channel information. To this end, we propose an augmentation adversarial training strategy that trains the network to be discriminative for the speaker information while remaining invariant to the augmentation applied. Since the augmentation simulates the acoustic characteristics, training the network to be invariant to augmentation also encourages it to be invariant to the channel information in general. Extensive experiments on the VoxCeleb and VOiCES datasets show significant improvements over previous works using self-supervision, and the performance of our self-supervised models far exceeds that of humans. We also conduct semi-supervised learning experiments to show that augmentation adversarial training benefits performance in the presence of speaker labels. | -
dc.language | English | -
dc.publisher | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC | -
dc.title | Augmentation Adversarial Training for Self-Supervised Speaker Representation Learning | -
dc.type | Article | -
dc.identifier.wosid | 000870301500010 | -
dc.identifier.scopusid | 2-s2.0-85137590360 | -
dc.type.rims | ART | -
dc.citation.volume | 16 | -
dc.citation.issue | 6 | -
dc.citation.beginningpage | 1253 | -
dc.citation.endingpage | 1262 | -
dc.citation.publicationname | IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING | -
dc.identifier.doi | 10.1109/JSTSP.2022.3200915 | -
dc.contributor.localauthor | Chung, Joon Son | -
dc.contributor.nonIdAuthor | Kang, Jingu | -
dc.contributor.nonIdAuthor | Huh, Jaesung | -
dc.contributor.nonIdAuthor | Heo, Hee Soo | -
dc.description.isOpenAccess | N | -
dc.type.journalArticle | Article | -
dc.subject.keywordAuthor | Training | -
dc.subject.keywordAuthor | Speaker recognition | -
dc.subject.keywordAuthor | Measurement | -
dc.subject.keywordAuthor | Task analysis | -
dc.subject.keywordAuthor | Semisupervised learning | -
dc.subject.keywordAuthor | Representation learning | -
dc.subject.keywordAuthor | Entropy | -
dc.subject.keywordAuthor | Self-supervised learning | -
dc.subject.keywordAuthor | speaker recognition | -
dc.subject.keywordPlus | VERIFICATION | -
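
The abstract above describes two coupled objectives: a contrastive loss that pulls together embeddings of differently augmented segments from the same utterance, and an adversarial term that discourages the embedding from encoding which augmentation (and, by proxy, which channel) was applied. The sketch below is an illustrative PyTorch reading of that idea, not the authors' released code: the gradient reversal layer, the augmentation-type classifier head, the embedding size (192), the hidden width (256), the number of augmentation classes (5), and the fixed scale/offset in the contrastive loss are all assumptions made for the example.

```python
# Illustrative sketch only: contrastive speaker loss plus an augmentation-
# adversarial classifier trained through a gradient reversal layer (GRL).
import torch
import torch.nn as nn
import torch.nn.functional as F


class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; multiplies the gradient by -lambd in backward."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None


def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)


class AugmentationClassifier(nn.Module):
    """Predicts which augmentation type was applied to a segment.

    Because its input passes through the GRL, the encoder that produced the
    embedding is pushed to make this prediction harder, i.e. to become
    invariant to the augmentation."""

    def __init__(self, emb_dim=192, hidden=256, n_aug_types=5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(emb_dim, hidden), nn.ReLU(), nn.Linear(hidden, n_aug_types)
        )

    def forward(self, emb, lambd=1.0):
        return self.net(grad_reverse(emb, lambd))


def contrastive_speaker_loss(emb_a, emb_b, scale=10.0, offset=-5.0):
    """Cosine-similarity contrastive loss: the i-th segment of view A should
    match the i-th segment of view B (same utterance) against all others."""
    sim = F.cosine_similarity(emb_a.unsqueeze(1), emb_b.unsqueeze(0), dim=-1)
    logits = scale * sim + offset
    labels = torch.arange(emb_a.size(0), device=emb_a.device)
    return F.cross_entropy(logits, labels)


if __name__ == "__main__":
    # Toy batch: random tensors stand in for encoder outputs of two
    # differently augmented segments drawn from the same utterances.
    batch, emb_dim, n_aug = 8, 192, 5
    emb_a = torch.randn(batch, emb_dim, requires_grad=True)
    emb_b = torch.randn(batch, emb_dim, requires_grad=True)
    aug_labels_a = torch.randint(0, n_aug, (batch,))

    aug_head = AugmentationClassifier(emb_dim, n_aug_types=n_aug)
    loss = contrastive_speaker_loss(emb_a, emb_b) \
        + F.cross_entropy(aug_head(emb_a), aug_labels_a)
    loss.backward()  # gradients reach the embeddings with reversed sign
                     # through the augmentation branch
    print(float(loss))
```

If the paper follows the common angular-prototypical formulation, the scale and offset would be learnable rather than fixed; here they are constants purely to keep the sketch short, and the five augmentation classes are placeholders for however augmentation is categorised in the actual setup.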
Appears in Collection
EE-Journal Papers (Journal Papers)
Files in This Item
There are no files associated with this item.
This item is cited by other documents in WoS
  • Cited 1 time in Web of Science
