SampleCNN: End-to-End Deep Convolutional Neural Networks Using Very Small Filters for Music Classification

Cited 71 times in Web of Science; cited 0 times in Scopus
Convolutional neural networks (CNNs) have been applied to diverse machine learning tasks for different modalities of raw data in an end-to-end fashion. In the audio domain, a raw waveform-based approach has been explored to directly learn hierarchical characteristics of audio. However, the majority of previous studies have limited their model capacity by taking a frame-level structure similar to short-time Fourier transforms. We previously proposed a CNN architecture that learns representations using sample-level filters, going beyond typical frame-level input representations. The architecture showed performance comparable to the spectrogram-based CNN model in music auto-tagging. In this paper, we extend the previous work in three ways. First, because the sample-level model requires much longer training time, we progressively downsample the input signals and examine how this affects performance. Second, we extend the model using a multi-level and multi-scale feature aggregation technique and subsequently conduct transfer learning for several music classification tasks. Finally, we visualize the filters learned by the sample-level CNN in each layer to identify hierarchically learned features and show that they are sensitive to log-scaled frequency.
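The contrast between frame-level and sample-level structures can be made concrete with a little arithmetic. The sketch below is illustrative only and is not taken from the paper: it models each layer as an unpadded strided 1D convolution, and the function name `stack_summary`, the 59049-sample input, the filter size of 3, and the 512/256 frame settings are all assumptions chosen for the example. It reports how many time steps survive a stack of layers and how many input samples each final output sees.

```python
def stack_summary(num_samples, layers):
    """Return (output_length, receptive_field) for stacked 1D convolutions.

    `layers` is a list of (kernel_size, stride) pairs. Each layer is treated
    as an unpadded strided convolution, which is a simplifying assumption.
    """
    length = num_samples
    receptive_field = 1  # input samples seen by one output unit
    jump = 1             # spacing, in input samples, between adjacent outputs
    for kernel_size, stride in layers:
        length = (length - kernel_size) // stride + 1
        receptive_field += (kernel_size - 1) * jump
        jump *= stride
    return length, receptive_field


# Hypothetical frame-level front end: one wide filter, STFT-like hop.
frame_level = stack_summary(59049, [(512, 256)])

# Hypothetical sample-level stack: ten layers of very small filters.
sample_level = stack_summary(59049, [(3, 3)] * 10)

print("frame-level :", frame_level)
print("sample-level:", sample_level)
```

Under these assumptions, the small-filter stack covers the whole excerpt through many shallow steps, while the frame-level layer commits to a fixed window in a single step. The depth of the sample-level stack also hints at why training is slower and why downsampling the input, which shortens the stack needed to cover the same duration, is worth examining.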
Publisher
MDPI
Issue Date
2018-01
Language
English
Article Type
Article
Citation

APPLIED SCIENCES, v.8, no.1

ISSN
2076-3417
DOI
10.3390/app8010150
URI
http://hdl.handle.net/10203/240632
Appears in Collection
GCT-Journal Papers(저널논문)
Files in This Item
applsci-08-00150-v2.pdf (15 MB)