DSpace at KOASAS: Multi-Level and Multi-Scale Feature Aggregation Using Pretrained Convolutional Neural Networks for Music Auto-Tagging

Multi-Level and Multi-Scale Feature Aggregation Using Pretrained Convolutional Neural Networks for Music Auto-Tagging

DC Field	Value	Language
dc.contributor.author	Lee, Jongpil	ko
dc.contributor.author	Nam, Juhan	ko
dc.date.accessioned	2017-08-08T06:05:07Z	-
dc.date.available	2017-08-08T06:05:07Z	-
dc.date.created	2017-06-09	-
dc.date.issued	2017-06	-
dc.identifier.citation	IEEE SIGNAL PROCESSING LETTERS, v.24, no.8, pp.1208 - 1212	-
dc.identifier.issn	1070-9908	-
dc.identifier.uri	http://hdl.handle.net/10203/225076	-
dc.description.abstract	Music auto-tagging is often handled in a similar manner to image classification by regarding the two-dimensional audio spectrogram as image data. However, music auto-tagging is distinguished from image classification in that the tags are highly diverse and have different levels of abstraction. Considering this issue, we propose a convolutional neural network (CNN)-based architecture that embraces multi-level and multi-scale features. The architecture is trained in three steps. First, we conduct supervised feature learning to capture local audio features using a set of CNNs with different input sizes. Second, we extract audio features from each layer of the pretrained convolutional networks separately and aggregate them all together over a given long audio clip. Finally, we feed the aggregated features into fully connected networks and make final predictions of the tags. Our experiments show that using the combination of multi-level and multi-scale features is highly effective in music auto-tagging and that the proposed method outperforms the previous state-of-the-art methods on the MagnaTagATune dataset and the Million Song Dataset. We further show that the proposed architecture is useful in transfer learning.	-
dc.language	English	-
dc.publisher	IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC	-
dc.title	Multi-Level and Multi-Scale Feature Aggregation Using Pretrained Convolutional Neural Networks for Music Auto-Tagging	-
dc.type	Article	-
dc.identifier.wosid	000404291100022	-
dc.identifier.scopusid	2-s2.0-85028368778	-
dc.type.rims	ART	-
dc.citation.volume	24	-
dc.citation.issue	8	-
dc.citation.beginningpage	1208	-
dc.citation.endingpage	1212	-
dc.citation.publicationname	IEEE SIGNAL PROCESSING LETTERS	-
dc.identifier.doi	10.1109/LSP.2017.2713830	-
dc.contributor.localauthor	Nam, Juhan	-
dc.contributor.nonIdAuthor	Lee, Jongpil	-
dc.description.isOpenAccess	N	-
dc.type.journalArticle	Article	-
dc.subject.keywordAuthor	Convolutional neural networks	-
dc.subject.keywordAuthor	feature aggregation	-
dc.subject.keywordAuthor	music auto-tagging	-
dc.subject.keywordAuthor	transfer learning	-
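
The aggregation step described in the abstract (taking features from each layer of several networks with different input sizes and combining them over a long clip) can be illustrated with a small sketch. This is not the authors' implementation: the random weights stand in for pretrained CNNs, the pairwise-frame layers replace real convolutions, and the pooling choices (max over time within a segment, mean across segments) are assumptions made for illustration. All function names are hypothetical, and the final fully connected tag predictor is omitted.

```python
import random


def make_toy_layers(rng, n_layers, in_dim, hidden_dim):
    """Random weights standing in for a pretrained CNN (assumption: not trained)."""
    dims = [in_dim] + [hidden_dim] * n_layers
    return [
        [[rng.gauss(0.0, 0.1) for _ in range(dims[i + 1])] for _ in range(2 * dims[i])]
        for i in range(n_layers)
    ]


def layer_activations(segment, layers):
    """Run one segment (a list of frames) through the layers; keep every layer's output."""
    outputs, x = [], segment
    for w in layers:
        # Join adjacent frame pairs (stride 2), then apply a linear map and ReLU.
        pairs = [x[i] + x[i + 1] for i in range(0, len(x) - 1, 2)]
        x = [
            [max(0.0, sum(v * w[k][j] for k, v in enumerate(p))) for j in range(len(w[0]))]
            for p in pairs
        ]
        outputs.append(x)
    return outputs


def clip_features(clip, layers, seg_len):
    """Max-pool each layer over time within a segment, then average over segments."""
    segments = [clip[s:s + seg_len] for s in range(0, len(clip) - seg_len + 1, seg_len)]
    pooled = []
    for seg in segments:
        vec = []
        for act in layer_activations(seg, layers):
            vec.extend(max(col) for col in zip(*act))  # multi-level: keep every layer
        pooled.append(vec)
    return [sum(col) / len(pooled) for col in zip(*pooled)]


def aggregate(clip, models):
    """Concatenate clip-level features from networks with different input sizes."""
    feats = []
    for seg_len, layers in sorted(models.items()):
        feats.extend(clip_features(clip, layers, seg_len))  # multi-scale
    return feats


rng = random.Random(0)
clip = [[rng.gauss(0.0, 1.0) for _ in range(12)] for _ in range(64)]  # 64 frames, 12 bands
models = {8: make_toy_layers(rng, 3, 12, 4), 16: make_toy_layers(rng, 3, 12, 4)}
features = aggregate(clip, models)
print(len(features))  # 2 scales * 3 layers * 4 channels
```

The resulting vector has one block per (scale, layer) pair, which is the sense in which the features are both multi-scale and multi-level. In the described method this vector would then be passed to fully connected layers that predict the tags.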

Appears in Collection: GCT-Journal Papers(저널논문)

Files in This Item: There are no files associated with this item.

This item is cited by 72 documents in Web of Science (WoS).

KOASAS

Knowledge Service Development Team, KAIST 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea. T. 82-42-350-4493 Email. koasas@kaist.ac.kr
Copyright © 2016. Korea Advanced Institute of Science and Technology. All Rights Reserved.