DC Field | Value | Language |
---|---|---|
dc.contributor.author | Lee, Jongpil | ko |
dc.contributor.author | Nam, Juhan | ko |
dc.date.accessioned | 2017-08-08T06:05:07Z | - |
dc.date.available | 2017-08-08T06:05:07Z | - |
dc.date.created | 2017-06-09 | - |
dc.date.issued | 2017-06 | - |
dc.identifier.citation | IEEE SIGNAL PROCESSING LETTERS, v.24, no.8, pp.1208 - 1212 | - |
dc.identifier.issn | 1070-9908 | - |
dc.identifier.uri | http://hdl.handle.net/10203/225076 | - |
dc.description.abstract | Music auto-tagging is often handled in a similar manner to image classification by regarding the two-dimensional audio spectrogram as image data. However, music auto-tagging is distinguished from image classification in that the tags are highly diverse and have different levels of abstraction. Considering this issue, we propose a convolutional neural network (CNN)-based architecture that embraces multi-level and multi-scale features. The architecture is trained in three steps. First, we conduct supervised feature learning to capture local audio features using a set of CNNs with different input sizes. Second, we extract audio features from each layer of the pretrained convolutional networks separately and aggregate them all together over a long audio clip. Finally, we feed them into fully connected networks and make final predictions of the tags. Our experiments show that using the combination of multi-level and multi-scale features is highly effective in music auto-tagging and that the proposed method outperforms the previous state-of-the-art methods on the MagnaTagATune dataset and the Million Song Dataset. We further show that the proposed architecture is useful in transfer learning. | - |
dc.language | English | - |
dc.publisher | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC | - |
dc.title | Multi-Level and Multi-Scale Feature Aggregation Using Pretrained Convolutional Neural Networks for Music Auto-Tagging | - |
dc.type | Article | - |
dc.identifier.wosid | 000404291100022 | - |
dc.identifier.scopusid | 2-s2.0-85028368778 | - |
dc.type.rims | ART | - |
dc.citation.volume | 24 | - |
dc.citation.issue | 8 | - |
dc.citation.beginningpage | 1208 | - |
dc.citation.endingpage | 1212 | - |
dc.citation.publicationname | IEEE SIGNAL PROCESSING LETTERS | - |
dc.identifier.doi | 10.1109/LSP.2017.2713830 | - |
dc.contributor.localauthor | Nam, Juhan | - |
dc.contributor.nonIdAuthor | Lee, Jongpil | - |
dc.description.isOpenAccess | N | - |
dc.type.journalArticle | Article | - |
dc.subject.keywordAuthor | Convolutional neural networks | - |
dc.subject.keywordAuthor | feature aggregation | - |
dc.subject.keywordAuthor | music auto-tagging | - |
dc.subject.keywordAuthor | transfer learning | - |
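
The second step in the abstract (extracting features from each layer of several pretrained CNNs and aggregating them over a long clip) might be sketched roughly as below. This is only an illustration under stated assumptions: the abstract does not specify the pooling operators, so max-pooling over time within a segment and averaging across segments are assumed here, and the names `pool_segment` and `aggregate_clip` are hypothetical. Each "source" stands for one layer of one CNN, and plain nested lists stand in for activation tensors.

```python
def pool_segment(feature_map):
    """Max-pool one segment's activations over time.

    feature_map: list of frames, each frame a list of channel activations.
    Returns one value per channel. (Max-pooling is an assumption.)
    """
    return [max(channel) for channel in zip(*feature_map)]


def aggregate_clip(segments_per_source):
    """Build one clip-level vector from several feature sources.

    segments_per_source: one entry per source (a layer of a CNN with a
    given input size); each entry is a list of segments covering the clip.
    Sources may have different segment counts and channel counts.
    """
    clip_vector = []
    for segments in segments_per_source:
        pooled = [pool_segment(seg) for seg in segments]
        # Average the per-segment vectors over the whole clip (assumed).
        clip_vector.extend(sum(col) / len(col) for col in zip(*pooled))
    return clip_vector


# Toy data: source_a has 2 segments of 2 frames x 2 channels,
# source_b has 3 segments of 1 frame x 1 channel.
source_a = [[[1.0, 0.0], [3.0, 2.0]], [[5.0, 4.0], [1.0, 0.0]]]
source_b = [[[2.0]], [[4.0]], [[6.0]]]
print(aggregate_clip([source_a, source_b]))  # [4.0, 3.0, 4.0]
```

The concatenated vector would then be the input to the fully connected networks mentioned in the third step; the actual feature dimensions and network sizes are not given in this record.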