Singing melody extraction using multi-column deep neural networks (다중 심층 신경망을 사용한 가창 멜로디 추출)

DC Field: Value (Language)
dc.contributor.advisor: Nam, Juhan
dc.contributor.advisor: 남주한
dc.contributor.author: Kum, Sangeun
dc.contributor.author: 금상은
dc.date.accessioned: 2017-03-29T02:31:35Z
dc.date.available: 2017-03-29T02:31:35Z
dc.date.issued: 2016
dc.identifier.uri: http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=663324&flag=dissertation (en_US)
dc.identifier.uri: http://hdl.handle.net/10203/221343
dc.description: Master's thesis - KAIST : Graduate School of Culture Technology, 2016.8, [v, 39 p.]
dc.description.abstract: While the music market has been growing, the need for new services such as cover song identification and query-by-humming has also been increasing. These services use a melody to search for songs, so extracting the melody, particularly from the singing voice, is important for implementing such systems. In this thesis, we focus on algorithms that extract the singing melody from audio signals. Singing melody extraction is the task of tracking the pitch contour of the singing voice in polyphonic music. While the majority of melody extraction algorithms are based on computing a salience function of pitch candidates or on separating the melody source from the mixture, data-driven approaches based on classification have rarely been explored. In this thesis, we present a classification-based approach to singing melody extraction using multi-column deep neural networks. In the proposed model, each neural network is trained to predict a pitch label of the singing voice from a spectrogram, but their outputs have different pitch resolutions. The melody contour is inferred by combining the outputs of the networks. We apply Viterbi decoding based on a hidden Markov model to capture long-term temporal information. Our system also includes a singing voice detector that selects singing-voice frames using an additional deep neural network; it is trained with singing-voice activity labels and the output of the melody extraction networks. To take advantage of the data-driven approach, we also augment the training data by pitch-shifting the audio content and modifying the pitch labels accordingly. We use the RWC dataset and part of the MedleyDB dataset to train the model and evaluate it on the ADC 2004, MIREX 2005, and MIR-1K datasets. Through several experimental settings, we show incremental improvements in melody prediction. Lastly, we compare our best result with those of previous state-of-the-art methods.
dc.language: eng
dc.publisher: 한국과학기술원 (KAIST)
dc.subject: melody extraction
dc.subject: data-driven approach
dc.subject: multi-column deep neural network
dc.subject: data augmentation
dc.subject: singing voice detection
dc.subject: 가창 멜로디 추출 (singing melody extraction)
dc.subject: 데이터 기반 방법 (data-driven approach)
dc.subject: 다중 심층 신경망 (multi-column deep neural network)
dc.subject: 데이터 증가 방법 (data augmentation)
dc.subject: 가창 목소리 검출 (singing voice detection)
dc.title: Singing melody extraction using multi-column deep neural networks
dc.title.alternative: 다중 심층 신경망을 사용한 가창 멜로디 추출
dc.type: Thesis(Master)
dc.identifier.CNRN: 325007
dc.description.department: 한국과학기술원 : 문화기술대학원 (KAIST : Graduate School of Culture Technology)
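The abstract describes smoothing the frame-wise pitch predictions of the networks with HMM-based Viterbi decoding to capture long-term temporal information. The sketch below shows a minimal version of that step, assuming Gaussian transition probabilities over pitch-bin distance (a common choice; the thesis's exact transition model and parameters are not specified in this record, and `trans_sigma` is an illustrative parameter):

```python
import numpy as np

def viterbi_smooth(posteriors, trans_sigma=2.0):
    """Viterbi decoding of frame-wise pitch posteriors under an HMM.

    posteriors: array of shape (n_frames, n_pitch_bins), e.g. softmax
    outputs of a pitch-classification network, rows summing to 1.
    Transitions favor small pitch jumps between consecutive frames.
    Returns the most likely pitch-bin index per frame.
    """
    n_frames, n_states = posteriors.shape
    states = np.arange(n_states)

    # Transition matrix: Gaussian over distance between pitch bins,
    # normalized so each row is a proper probability distribution.
    dist = np.abs(states[:, None] - states[None, :])
    trans = np.exp(-0.5 * (dist / trans_sigma) ** 2)
    trans /= trans.sum(axis=1, keepdims=True)

    log_post = np.log(posteriors + 1e-12)
    log_trans = np.log(trans + 1e-12)

    # Forward pass: best log-score ending in each state at each frame.
    delta = np.full((n_frames, n_states), -np.inf)
    back = np.zeros((n_frames, n_states), dtype=int)
    delta[0] = log_post[0]
    for t in range(1, n_frames):
        scores = delta[t - 1][:, None] + log_trans  # (prev, cur)
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_post[t]

    # Backtrace the optimal state sequence.
    path = np.zeros(n_frames, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(n_frames - 2, -1, -1):
        path[t] = back[t + 1, path[t + 1]]
    return path
```

With a sufficiently small `trans_sigma`, large single-frame jumps become expensive, so an isolated noisy frame is pulled back toward its neighbors; in the full system these states would span the quantized pitch range, with the singing voice detector masking non-voiced frames.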
Appears in Collection
GCT-Theses_Master (Master's Theses)