Learning a Cross-Domain Embedding Space of Vocal and Mixed audio with a Structure-Preserving Triplet Loss

DC Field | Value | Language
dc.contributor.author | Kim, Keunhyoung | ko
dc.contributor.author | Lee, Jongpil | ko
dc.contributor.author | Kum, Sangeun | ko
dc.contributor.author | Nam, Juhan | ko
dc.date.accessioned | 2021-12-14T06:50:23Z | -
dc.date.available | 2021-12-14T06:50:23Z | -
dc.date.created | 2021-12-03 | -
dc.date.issued | 2021-11-09 | -
dc.identifier.citation | International Society for Music Information Retrieval Conference, ISMIR 2021 | -
dc.identifier.uri | http://hdl.handle.net/10203/290603 | -
dc.description.abstract | Recent advances in music source separation have achieved high-quality vocal isolation from mixed audio. This has paved the way for various applications in the area of music information retrieval (MIR). In this paper, we propose a method to learn a cross-domain embedding space between isolated vocal and mixed audio for vocal-centric MIR tasks, leveraging a pre-trained music source separation model. Learning the cross-domain embedding was previously attempted with a triplet-based similarity model where vocal and mixed audio are encoded by two different convolutional neural networks. We improve the approach with a structure-preserving triplet loss that exploits not only cross-domain similarity between vocal and mixed audio but also intra-domain similarity within vocal tracks or mixed tracks. We learn the vocal embedding using a large-scale dataset and evaluate it in singer identification and query-by-singer tasks. In addition, we use the vocal embedding for vocal-based music tagging and artist classification in transfer learning settings. We show that the proposed model significantly improves on the previous cross-domain embedding model, particularly when the two embedding spaces from isolated vocals and mixed audio are concatenated. | -
dc.language | English | -
dc.publisher | International Society for Music Information Retrieval | -
dc.title | Learning a Cross-Domain Embedding Space of Vocal and Mixed audio with a Structure-Preserving Triplet Loss | -
dc.type | Conference | -
dc.type.rims | CONF | -
dc.citation.publicationname | International Society for Music Information Retrieval Conference, ISMIR 2021 | -
dc.identifier.conferencecountry | US | -
dc.identifier.conferencelocation | Online | -
dc.contributor.localauthor | Nam, Juhan | -
dc.contributor.nonIdAuthor | Kim, Keunhyoung | -
dc.contributor.nonIdAuthor | Lee, Jongpil | -
dc.contributor.nonIdAuthor | Kum, Sangeun | -
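
The abstract describes a triplet loss that combines cross-domain similarity (vocal vs. mixed audio) with intra-domain similarity (vocal vs. vocal, mixed vs. mixed). This record does not give the actual formulation, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. Everything in it is an assumption made for illustration: the cosine distance, the hinge form, the `margin` and `intra_weight` values, the function names, and the convention that anchor and positive share a singer while the negative comes from a different singer.

```python
import numpy as np


def cosine_distance(a, b):
    """Row-wise cosine distance between two (batch, dim) arrays."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return 1.0 - np.sum(a * b, axis=1)


def triplet_hinge(anchor, positive, negative, margin):
    """Mean of max(0, d(anchor, positive) - d(anchor, negative) + margin)."""
    d_pos = cosine_distance(anchor, positive)
    d_neg = cosine_distance(anchor, negative)
    return float(np.mean(np.maximum(0.0, d_pos - d_neg + margin)))


def structure_preserving_triplet_loss(v_anchor, m_anchor, v_pos, m_pos,
                                      v_neg, m_neg,
                                      margin=0.4, intra_weight=1.0):
    """Hypothetical loss: cross-domain terms plus weighted intra-domain terms.

    v_* are vocal embeddings and m_* are mixed-audio embeddings, each of
    shape (batch, dim). margin and intra_weight are illustrative defaults,
    not values taken from the paper.
    """
    # Cross-domain: a vocal anchor should sit closer to a same-singer mix
    # than to a different-singer mix, and the same in the other direction.
    cross = (triplet_hinge(v_anchor, m_pos, m_neg, margin)
             + triplet_hinge(m_anchor, v_pos, v_neg, margin))
    # Intra-domain: the same ordering is requested within each domain.
    intra = (triplet_hinge(v_anchor, v_pos, v_neg, margin)
             + triplet_hinge(m_anchor, m_pos, m_neg, margin))
    return cross + intra_weight * intra


if __name__ == "__main__":
    # Random stand-ins for the outputs of the two encoders.
    rng = np.random.default_rng(0)
    batch = [rng.normal(size=(8, 16)) for _ in range(6)]
    print(structure_preserving_triplet_loss(*batch))
```

In this sketch, setting `intra_weight=0.0` reduces the loss to the two cross-domain terms alone, which makes it easy to compare the two settings on the same batch.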
Appears in Collection
GCT-Conference Papers (Academic Conference Papers)
