A MELODY-UNSUPERVISION MODEL FOR SINGING VOICE SYNTHESIS

Cited 1 time in Web of Science; cited 0 times in Scopus
Abstract

Recent studies in singing voice synthesis have achieved high-quality results by leveraging advances in deep-neural-network-based text-to-speech models. One of the main issues in training singing voice synthesis models is that they require melody and lyric labels temporally aligned with the audio data, and this temporal alignment is time-consuming manual work when preparing the training data. To address this issue, we propose a melody-unsupervision model that requires only audio and lyrics pairs without temporal alignment at training time, yet generates singing voice audio from a melody and lyrics input at inference time. The proposed model is composed of a phoneme classifier and a singing voice generator, jointly trained in an end-to-end manner. The model can be fine-tuned by adjusting the amount of supervision from temporally aligned melody labels. Through experiments in melody-unsupervision and semi-supervision settings, we compare the audio quality of the synthesized singing voice. We also show that the proposed model can be trained with speech audio and text labels, yet generate singing voice at inference time.
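The semi-supervision scheme described in the abstract can be sketched as a weighted training objective; note this is a toy illustration, and all names (`phoneme_loss`, `recon_loss`, `melody_loss`, `w_melody`) are hypothetical placeholders, not the paper's actual formulation:

```python
# Toy sketch of a semi-supervised objective combining a phoneme-classifier
# loss with a singing-voice generator (reconstruction) loss. Setting the
# melody weight to zero recovers the fully melody-unsupervised setting;
# a positive weight mixes in supervision from temporally aligned melody
# labels, as in the paper's fine-tuning scenario. All names are illustrative.

def total_loss(phoneme_loss: float, recon_loss: float,
               melody_loss: float = 0.0, w_melody: float = 0.0) -> float:
    """Weighted sum of the component losses (hypothetical formulation)."""
    return phoneme_loss + recon_loss + w_melody * melody_loss

# Melody-unsupervised training step: only the audio/lyrics losses contribute.
unsup = total_loss(phoneme_loss=0.8, recon_loss=1.2)

# Semi-supervised fine-tuning: aligned melody labels add a weighted term.
semi = total_loss(phoneme_loss=0.8, recon_loss=1.2,
                  melody_loss=0.5, w_melody=0.3)
```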
Publisher
Institute of Electrical and Electronics Engineers Inc.
Issue Date
2022-05-25
Language
English
Citation

47th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022, pp.7242 - 7246

ISSN
1520-6149
DOI
10.1109/ICASSP43922.2022.9747422
URI
http://hdl.handle.net/10203/298715
Appears in Collection
GCT-Conference Papers
Files in This Item
There are no files associated with this item.