Sample-level CNN Architectures for Music Auto-tagging Using Raw Waveforms

Recent work has shown that the end-to-end approach using convolutional neural networks (CNNs) is effective in various types of machine learning tasks. For audio signals, the approach takes raw waveforms as input using a 1-D convolution layer. In this paper, we improve the 1-D CNN architecture for music auto-tagging by adopting building blocks from state-of-the-art image classification models, ResNets and SENets, and adding multi-level feature aggregation to it. We compare different combinations of these modules in building CNN architectures. The results show that they achieve significant improvements over previous state-of-the-art models on the MagnaTagATune dataset and comparable results on the Million Song Dataset. Furthermore, we analyze and visualize our model to show how the 1-D CNN operates.
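As a rough illustration of the kind of building block the abstract refers to, the sketch below applies a strided 1-D convolution to a raw waveform and then a squeeze-and-excitation (SE) step of the sort used in SENets: average each channel over time, pass the result through a small bottleneck, and rescale the channels with sigmoid gates. It is not the authors' implementation. The kernel size and stride of 3, all weights, and the function names `conv1d`, `dense`, and `squeeze_excite` are assumptions made for this example, and the residual connections and multi-level feature aggregation mentioned in the abstract are left out.

```python
import math


def conv1d(x, weights, bias, stride):
    """Valid 1-D convolution.

    x: [in_channels][time], weights: [out_channels][in_channels][kernel].
    """
    kernel = len(weights[0][0])
    out_len = (len(x[0]) - kernel) // stride + 1
    out = []
    for w_out, b in zip(weights, bias):
        row = []
        for t in range(out_len):
            start = t * stride
            acc = b
            for c, w_in in enumerate(w_out):
                for k in range(kernel):
                    acc += w_in[k] * x[c][start + k]
            row.append(acc)
        out.append(row)
    return out


def dense(v, weights, bias):
    """Fully connected layer: weights is [out_dim][in_dim]."""
    return [sum(w * a for w, a in zip(row, v)) + b
            for row, b in zip(weights, bias)]


def squeeze_excite(features, w1, b1, w2, b2):
    """Rescale each channel of features ([channels][time]) by a learned gate."""
    # Squeeze: global average over the time axis, one value per channel.
    squeezed = [sum(ch) / len(ch) for ch in features]
    # Excite: bottleneck layer with ReLU, then one sigmoid gate per channel.
    hidden = [max(0.0, h) for h in dense(squeezed, w1, b1)]
    gates = [1.0 / (1.0 + math.exp(-z)) for z in dense(hidden, w2, b2)]
    # Scale: multiply every time step of a channel by that channel's gate.
    return [[g * a for a in ch] for g, ch in zip(gates, features)]


# Toy input: one channel, nine samples (hypothetical values).
waveform = [[0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0, 0.0]]
# Two filters of three samples each, applied with stride 3 (assumed sizes).
conv_w = [[[1.0, 1.0, 1.0]], [[1.0, 0.0, -1.0]]]
conv_b = [0.0, 0.0]
features = conv1d(waveform, conv_w, conv_b, stride=3)

# Bottleneck of width 1; the weights are arbitrary illustrative numbers.
recalibrated = squeeze_excite(
    features,
    w1=[[0.5, -0.5]], b1=[0.1],
    w2=[[1.0], [-1.0]], b2=[0.0, 0.0],
)
```

In a real model the convolution and SE weights would be learned jointly, and a deep-learning framework would replace these explicit loops; the plain-Python version is only meant to make the data flow visible.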
Publisher
IEEE
Issue Date
2018-04-18
Language
English
Citation

IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 366-370

DOI
10.1109/ICASSP.2018.8462046
URI
http://hdl.handle.net/10203/247510
Appears in Collection
GCT-Conference Papers