DSpace at KOASAS: Learning a Cross-Domain Embedding Space of Vocal and Mixed audio with a Structure-Preserving Triplet Loss

Learning a Cross-Domain Embedding Space of Vocal and Mixed audio with a Structure-Preserving Triplet Loss

DC Field	Value	Language
dc.contributor.author	Kim, Keunhyoung	ko
dc.contributor.author	Lee, Jongpil	ko
dc.contributor.author	Kum, Sangeun	ko
dc.contributor.author	Nam, Juhan	ko
dc.date.accessioned	2021-12-14T06:50:23Z	-
dc.date.available	2021-12-14T06:50:23Z	-
dc.date.created	2021-12-03	-
dc.date.issued	2021-11-09	-
dc.identifier.citation	International Society for Music Information Retrieval Conference, ISMIR 2021	-
dc.identifier.uri	http://hdl.handle.net/10203/290603	-
dc.description.abstract	Recent advances in music source separation have achieved high-quality isolation of vocals from mixed audio. This has paved the way for various applications in the area of music information retrieval (MIR). In this paper, we propose a method to learn a cross-domain embedding space between isolated vocal and mixed audio for vocal-centric MIR tasks, leveraging a pre-trained music source separation model. Learning the cross-domain embedding was previously attempted with a triplet-based similarity model in which vocal and mixed audio are encoded by two different convolutional neural networks. We improve on this approach with a structure-preserving triplet loss that exploits not only cross-domain similarity between vocal and mixed audio but also intra-domain similarity within vocal tracks or within mixed tracks. We learn the vocal embedding using a large-scale dataset and evaluate it on singer identification and query-by-singer tasks. In addition, we use the vocal embedding for vocal-based music tagging and artist classification in transfer learning settings. We show that the proposed model significantly improves on the previous cross-domain embedding model, particularly when the two embedding spaces from isolated vocals and mixed audio are concatenated.	-
dc.language	English	-
dc.publisher	International Society for Music Information Retrieval	-
dc.title	Learning a Cross-Domain Embedding Space of Vocal and Mixed audio with a Structure-Preserving Triplet Loss	-
dc.type	Conference	-
dc.type.rims	CONF	-
dc.citation.publicationname	International Society for Music Information Retrieval Conference, ISMIR 2021	-
dc.identifier.conferencecountry	US	-
dc.identifier.conferencelocation	Online	-
dc.contributor.localauthor	Nam, Juhan	-
dc.contributor.nonIdAuthor	Kim, Keunhyoung	-
dc.contributor.nonIdAuthor	Lee, Jongpil	-
dc.contributor.nonIdAuthor	Kum, Sangeun	-
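
The abstract describes the loss only in words, so the following is a minimal sketch of one plausible reading, not the authors' formulation. It assumes a hinge-style triplet loss over cosine distance, with two cross-domain terms (vocal anchor against mixed positive/negative, and the reverse) plus two intra-domain terms (vocal against vocal, mixed against mixed). The function names, the `margin` and `intra_weight` parameters, and the dict layout are all hypothetical; the record does not give these details.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity between two embedding vectors (lists of floats)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def triplet_hinge(anchor, positive, negative, margin=0.4):
    """Standard hinge triplet term: the positive should be closer than the negative by `margin`."""
    return max(0.0, margin
               + cosine_distance(anchor, positive)
               - cosine_distance(anchor, negative))

def structure_preserving_triplet_loss(vocal, mixed, margin=0.4, intra_weight=1.0):
    """Hypothetical combination of cross-domain and intra-domain triplet terms.

    `vocal` and `mixed` are dicts with keys 'anchor', 'positive', 'negative'.
    Anchor and positive share a singer; the negative is a different singer.
    The margin value and the weighting scheme are assumptions, not from the paper.
    """
    # Cross-domain terms: anchor from one domain, positive/negative from the other.
    cross = (triplet_hinge(vocal["anchor"], mixed["positive"], mixed["negative"], margin)
             + triplet_hinge(mixed["anchor"], vocal["positive"], vocal["negative"], margin))
    # Intra-domain terms: all three embeddings from the same domain.
    intra = (triplet_hinge(vocal["anchor"], vocal["positive"], vocal["negative"], margin)
             + triplet_hinge(mixed["anchor"], mixed["positive"], mixed["negative"], margin))
    return cross + intra_weight * intra
```

Setting `intra_weight=0.0` in this sketch leaves only the cross-domain terms, which corresponds loosely to the earlier triplet-based model that the abstract says the paper improves on.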

Appears in Collection: GCT-Conference Papers(학술회의논문)

Files in This Item: There are no files associated with this item.

KOASAS

Knowledge Service Development Team, KAIST 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea. T. 82-42-350-4493 Email. koasas@kaist.ac.kr
Copyright © 2016. Korea Advanced Institute of Science and Technology. All Rights Reserved.