Deep learning for vocal melody extraction

In this thesis, we propose several deep learning (DL) based methods for vocal melody extraction. Vocal melody extraction is the task of identifying the melody pitch contour of the singing voice in music that contains multiple sources. Previous studies proposed methods that calculate pitch saliency from a spectrogram or isolate the melody source from the mixture. However, these methods have limitations in obtaining optimal outputs across diverse music. Although the performance of melody extraction has improved with recent advances in DL, limitations remain in overall performance, in the use of music-related knowledge in the model, and in the lack of labeled data. Here we report effective methods for estimating the melody pitch and detecting the singing voice by introducing novel DL models and a loss function. We also propose a multi-task network in which pitch estimation and voice detection are tightly coupled. To address the lack of labeled data, we apply semi-supervised learning that utilizes large amounts of unlabeled data. We explore the effects of three teacher-student model setups, data augmentation, and unlabeled data, and propose the most effective learning method for vocal melody extraction. In addition, we apply semi-supervised learning to singing voice detection and show that it can be extended to other music information retrieval (MIR) tasks that suffer from a lack of labeled data.
Advisors
Nam, Juhan (남주한)
Description
Korea Advanced Institute of Science and Technology (KAIST): Graduate School of Culture Technology
Country
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2021
Identifier
325007
Language
eng
Article Type
Thesis(Ph.D)
URI
http://hdl.handle.net/10203/294535
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=956569&flag=dissertation
Appears in Collection
GCT-Theses_Ph.D. (Doctoral theses)
Files in This Item
There are no files associated with this item.
