Pushing the limits of raw waveform speaker recognition

Cited 10 times in Web of Science; cited 0 times in Scopus
DC Field | Value | Language
dc.contributor.author | Jung, Jee-weon | ko
dc.contributor.author | Kim, Youjin | ko
dc.contributor.author | Heo, Hee-Soo | ko
dc.contributor.author | Lee, Bong-Jin | ko
dc.contributor.author | Kwon, Youngki | ko
dc.contributor.author | Chung, Joon Son | ko
dc.date.accessioned | 2023-05-10T12:01:23Z | -
dc.date.available | 2023-05-10T12:01:23Z | -
dc.date.created | 2023-05-03 | -
dc.date.issued | 2022-09 | -
dc.identifier.citation | 23rd Annual Conference of the International Speech Communication Association, INTERSPEECH 2022, pp. 2228-2232 | -
dc.identifier.issn | 2308-457X | -
dc.identifier.uri | http://hdl.handle.net/10203/306697 | -
dc.description.abstract | In recent years, speaker recognition systems based on raw waveform inputs have received increasing attention. However, the performance of such systems is typically inferior to that of state-of-the-art handcrafted feature-based counterparts, which demonstrate equal error rates under 1% on the popular VoxCeleb1 test set. This paper proposes a novel speaker recognition model based on raw waveform inputs. The model incorporates recent advances in machine learning and speaker verification, including the Res2Net backbone module and multi-layer feature aggregation. Our best model achieves an equal error rate of 0.89%, which is competitive with the state-of-the-art models based on handcrafted features, and outperforms the best model based on raw waveform inputs by a large margin. We also explore the application of the proposed model in the context of a self-supervised learning framework. Our self-supervised model outperforms existing single-phase-based works in this line of research. Finally, we show that self-supervised pre-training is effective for the semi-supervised scenario where we only have a small set of labelled training data, along with a larger set of unlabelled examples. | -
dc.language | English | -
dc.publisher | ISCA-INT SPEECH COMMUNICATION ASSOC | -
dc.title | Pushing the limits of raw waveform speaker recognition | -
dc.type | Conference | -
dc.identifier.wosid | 000900724502081 | -
dc.identifier.scopusid | 2-s2.0-85140069445 | -
dc.type.rims | CONF | -
dc.citation.beginningpage | 2228 | -
dc.citation.endingpage | 2232 | -
dc.citation.publicationname | 23rd Annual Conference of the International Speech Communication Association, INTERSPEECH 2022 | -
dc.identifier.conferencecountry | KO | -
dc.identifier.conferencelocation | Incheon | -
dc.identifier.doi | 10.21437/Interspeech.2022-126 | -
dc.contributor.localauthor | Chung, Joon Son | -
dc.contributor.nonIdAuthor | Jung, Jee-weon | -
dc.contributor.nonIdAuthor | Kim, Youjin | -
dc.contributor.nonIdAuthor | Heo, Hee-Soo | -
dc.contributor.nonIdAuthor | Lee, Bong-Jin | -
dc.contributor.nonIdAuthor | Kwon, Youngki | -
Appears in Collection
EE-Conference Papers (학술회의논문)
Files in This Item
There are no files associated with this item.
This item is cited by other documents in WoS
