Singing melody extraction using multi-column deep neural networks (다중 심층 신경망을 사용한 가창 멜로디 추출)

DC Field: Value (Language)
dc.contributor.advisor: Nam, Juhan
dc.contributor.advisor: 남주한
dc.contributor.author: Kum, Sangeun
dc.contributor.author: 금상은
dc.date.accessioned: 2017-03-29T02:31:35Z
dc.date.available: 2017-03-29T02:31:35Z
dc.date.issued: 2016
dc.identifier.uri: http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=663324&flag=dissertation (en_US)
dc.identifier.uri: http://hdl.handle.net/10203/221343
dc.description: Master's thesis - KAIST : Graduate School of Culture Technology, 2016.8, [v, 39 p.]
dc.description.abstract: While the music market has been growing, the need for new services such as cover song identification and query-by-humming has also been increasing. These services use a melody to search for songs, so extracting the melody, particularly from the singing voice, is important for implementing such systems. In this thesis, we focus on algorithms that extract the singing melody from audio signals. Singing melody extraction is the task of tracking the pitch contour of the singing voice in polyphonic music. While the majority of melody extraction algorithms are based on computing a salience function of pitch candidates or on separating the melody source from the mixture, data-driven approaches based on classification have rarely been explored. In this thesis, we present a classification-based approach to singing melody extraction using multi-column deep neural networks. In the proposed model, each neural network is trained to predict a pitch label of the singing voice from a spectrogram, but their outputs have different pitch resolutions. The melody contour is inferred by combining the outputs of the networks. We apply Viterbi decoding based on a hidden Markov model to capture long-term temporal information. Our system also includes a singing voice detector that selects singing-voice frames using an additional deep neural network; it is trained with singing-voice activity labels and the output of the melody extraction networks. To take advantage of the data-driven approach, we also augment the training data by pitch-shifting the audio content and modifying the pitch labels accordingly. We use the RWC dataset and part of the MedleyDB dataset to train the model and evaluate it on the ADC 2004, MIREX 2005, and MIR-1K datasets. Through several experimental settings, we show incremental improvements in melody prediction. Lastly, we compare our best result with those of previous state-of-the-art methods.
dc.language: eng
dc.publisher: 한국과학기술원 (KAIST)
dc.subject: melody extraction
dc.subject: data-driven approach
dc.subject: multi-column deep neural network
dc.subject: data augmentation
dc.subject: singing voice detection
dc.subject: 가창 멜로디 추출 (singing melody extraction)
dc.subject: 데이터 기반 방법 (data-driven approach)
dc.subject: 다중 심층 신경망 (multi-column deep neural network)
dc.subject: 데이터 증가 방법 (data augmentation)
dc.subject: 가창 목소리 검출 (singing voice detection)
dc.title: Singing melody extraction using multi-column deep neural networks
dc.title.alternative: 다중 심층 신경망을 사용한 가창 멜로디 추출
dc.type: Thesis(Master)
dc.identifier.CNRN: 325007
dc.description.department: 한국과학기술원 : 문화기술대학원 (KAIST : Graduate School of Culture Technology)
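The abstract describes smoothing the frame-wise pitch predictions of the networks with HMM-based Viterbi decoding to capture long-term temporal information. The sketch below shows a minimal version of that step, assuming Gaussian transition probabilities over pitch-bin distance (a common choice; the thesis's exact transition model and parameters are not specified in this record, and `trans_sigma` is an illustrative parameter):

```python
import numpy as np

def viterbi_smooth(posteriors, trans_sigma=2.0):
    """Viterbi decoding of frame-wise pitch posteriors under an HMM.

    posteriors: array of shape (n_frames, n_pitch_bins), e.g. softmax
    outputs of a pitch-classification network, rows summing to 1.
    Transitions favor small pitch jumps between consecutive frames.
    Returns the most likely pitch-bin index per frame.
    """
    n_frames, n_states = posteriors.shape
    states = np.arange(n_states)

    # Transition matrix: Gaussian over distance between pitch bins,
    # normalized so each row is a proper probability distribution.
    dist = np.abs(states[:, None] - states[None, :])
    trans = np.exp(-0.5 * (dist / trans_sigma) ** 2)
    trans /= trans.sum(axis=1, keepdims=True)

    log_post = np.log(posteriors + 1e-12)
    log_trans = np.log(trans + 1e-12)

    # Forward pass: best log-score ending in each state at each frame.
    delta = np.full((n_frames, n_states), -np.inf)
    back = np.zeros((n_frames, n_states), dtype=int)
    delta[0] = log_post[0]
    for t in range(1, n_frames):
        scores = delta[t - 1][:, None] + log_trans  # (prev, cur)
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_post[t]

    # Backtrace the optimal state sequence.
    path = np.zeros(n_frames, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(n_frames - 2, -1, -1):
        path[t] = back[t + 1, path[t + 1]]
    return path
```

With a sufficiently small `trans_sigma`, large single-frame jumps become expensive, so an isolated noisy frame is pulled back toward its neighbors; in the full system these states would span the quantized pitch range, with the singing voice detector masking non-voiced frames.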
Appears in Collection
GCT-Theses_Master (Master's Theses)