Multi-Level and Multi-Scale Feature Aggregation Using Pretrained Convolutional Neural Networks for Music Auto-Tagging

Cited 66 times in Web of Science; cited 0 times in Scopus.
DC Field | Value | Language
dc.contributor.author | Lee, Jongpil | ko
dc.contributor.author | Nam, Juhan | ko
dc.date.accessioned | 2017-08-08T06:05:07Z | -
dc.date.available | 2017-08-08T06:05:07Z | -
dc.date.created | 2017-06-09 | -
dc.date.issued | 2017-06 | -
dc.identifier.citation | IEEE SIGNAL PROCESSING LETTERS, v.24, no.8, pp.1208 - 1212 | -
dc.identifier.issn | 1070-9908 | -
dc.identifier.uri | http://hdl.handle.net/10203/225076 | -
dc.description.abstract | Music auto-tagging is often handled in a similar manner to image classification by regarding the two-dimensional audio spectrogram as image data. However, music auto-tagging is distinguished from image classification in that the tags are highly diverse and have different levels of abstraction. Considering this issue, we propose a convolutional neural network (CNN)-based architecture that embraces multi-level and multi-scale features. The architecture is trained in three steps. First, we conduct supervised feature learning to capture local audio features using a set of CNNs with different input sizes. Second, we extract audio features from each layer of the pretrained convolutional networks separately and aggregate them altogether given a long audio clip. Finally, we put them into fully connected networks and make final predictions of the tags. Our experiments show that using the combination of multi-level and multi-scale features is highly effective in music auto-tagging and the proposed method outperforms the previous state-of-the-art methods on the MagnaTagATune dataset and the Million Song Dataset. We further show that the proposed architecture is useful in transfer learning. | -
dc.language | English | -
dc.publisher | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC | -
dc.title | Multi-Level and Multi-Scale Feature Aggregation Using Pretrained Convolutional Neural Networks for Music Auto-Tagging | -
dc.type | Article | -
dc.identifier.wosid | 000404291100022 | -
dc.identifier.scopusid | 2-s2.0-85028368778 | -
dc.type.rims | ART | -
dc.citation.volume | 24 | -
dc.citation.issue | 8 | -
dc.citation.beginningpage | 1208 | -
dc.citation.endingpage | 1212 | -
dc.citation.publicationname | IEEE SIGNAL PROCESSING LETTERS | -
dc.identifier.doi | 10.1109/LSP.2017.2713830 | -
dc.contributor.localauthor | Nam, Juhan | -
dc.contributor.nonIdAuthor | Lee, Jongpil | -
dc.description.isOpenAccess | N | -
dc.type.journalArticle | Article | -
dc.subject.keywordAuthor | Convolutional neural networks | -
dc.subject.keywordAuthor | feature aggregation | -
dc.subject.keywordAuthor | music auto-tagging | -
dc.subject.keywordAuthor | transfer learning | -
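
The abstract describes a three-step pipeline: pretrain CNNs of several input scales on short excerpts, pool each layer's activations over a long clip, and classify the concatenated features with a fully connected network. The PyTorch sketch below illustrates that flow; the layer sizes, the mean-over-excerpts pooling, and all names (LocalCNN, aggregate, TagNet) are illustrative assumptions, not the authors' exact configuration.

# Minimal sketch of the three-step pipeline from the abstract.
# Assumptions: PyTorch, log-mel input, average pooling; not the paper's exact setup.
import torch
import torch.nn as nn

class LocalCNN(nn.Module):
    # Step 1: a 1-D CNN pretrained on short mel-spectrogram excerpts of
    # one input scale, with a temporary tag head for supervision.
    def __init__(self, n_mels=128, n_tags=50):
        super().__init__()
        self.blocks = nn.ModuleList([
            nn.Sequential(nn.Conv1d(n_mels, 128, 3, padding=1), nn.ReLU(), nn.MaxPool1d(2)),
            nn.Sequential(nn.Conv1d(128, 128, 3, padding=1), nn.ReLU(), nn.MaxPool1d(2)),
            nn.Sequential(nn.Conv1d(128, 256, 3, padding=1), nn.ReLU(), nn.MaxPool1d(2)),
        ])
        self.head = nn.Linear(256, n_tags)  # used only during pretraining

    def features(self, x):
        # Step 2 (per excerpt): one time-averaged vector per conv layer,
        # i.e. the multi-LEVEL features.
        feats = []
        for block in self.blocks:
            x = block(x)
            feats.append(x.mean(dim=-1))
        return feats

def aggregate(cnns, scales, mel):
    # Step 2 (per clip): slide each scale's pretrained CNN over a long clip
    # (mel: (n_mels, T) log-mel spectrogram), average each layer's features
    # over all excerpts, then concatenate across levels and scales.
    pooled = []
    for cnn, win in zip(cnns, scales):
        excerpts = mel.unfold(1, win, win).permute(1, 0, 2)  # (n_win, n_mels, win)
        with torch.no_grad():
            feats = cnn.features(excerpts)                   # list of (n_win, C)
        pooled += [f.mean(dim=0) for f in feats]             # average over excerpts
    return torch.cat(pooled)                                 # multi-level, multi-scale vector

class TagNet(nn.Module):
    # Step 3: fully connected network over the aggregated feature vector.
    def __init__(self, in_dim, n_tags=50):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(in_dim, 512), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(512, n_tags),
        )
    def forward(self, v):
        return torch.sigmoid(self.fc(v))  # independent per-tag probabilities

With three input scales of, say, 32, 64, and 128 frames, each scale contributes 128 + 128 + 256 = 512 dimensions, so TagNet(in_dim=1536) would classify the aggregated vector; training would minimize per-tag binary cross-entropy, since the tags are not mutually exclusive.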
Appears in Collection
GCT - Journal Papers
Files in This Item
There are no files associated with this item.