Label Embedding for Chinese Grapheme-to-Phoneme Conversion

Chinese grapheme-to-phoneme (G2P) conversion plays a significant role in text-to-speech systems by generating the pronunciations of Chinese input characters. The main challenge in Chinese G2P conversion is polyphone disambiguation, which requires selecting the appropriate pronunciation among several candidates. In polyphone disambiguation, calculating probabilities over the entire set of pronunciations is unnecessary, since each Chinese character has only a few (mostly two or three) candidate pronunciations. In this study, we introduce a label embedding approach that matches the character embedding with the closest label embedding among the possible candidates. Specifically, negative sampling and triplet loss are applied to maximize the difference between the correct embedding and the other candidate embeddings. Experimental results show that the label embedding approach improves polyphone disambiguation accuracy by 4.50% and 1.74% on two datasets compared to the one-hot label classification approach. Moreover, the bidirectional long short-term memory model with the label embedding approach outperforms the previous state-of-the-art model, BERT, in polyphone disambiguation. Lastly, we discuss the effect of contextual information in character embeddings on the G2P conversion task.
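The matching step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the Euclidean distance, the margin value, the two-dimensional toy vectors, and the names `predict`, `sample_negative`, and `triplet_loss` are all assumptions made for illustration. It shows a character embedding being compared only against the label embeddings of that character's own candidate pronunciations, and a triplet loss that uses another candidate as the negative sample.

```python
import math
import random

def distance(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def predict(char_vec, label_vecs, candidates):
    """Return the candidate label whose embedding is closest to char_vec."""
    return min(candidates, key=lambda lab: distance(char_vec, label_vecs[lab]))

def sample_negative(candidates, correct, rng):
    """Pick one incorrect pronunciation from the same character's candidates."""
    return rng.choice([lab for lab in candidates if lab != correct])

def triplet_loss(char_vec, label_vecs, correct, negative, margin=1.0):
    """Hinge on (distance to correct label) - (distance to negative) + margin."""
    d_pos = distance(char_vec, label_vecs[correct])
    d_neg = distance(char_vec, label_vecs[negative])
    return max(0.0, d_pos - d_neg + margin)

# Hypothetical toy label embeddings; real ones would be learned.
label_vecs = {
    "hang2": [1.0, 0.0],
    "xing2": [0.0, 1.0],
    "le4": [-1.0, 0.0],  # belongs to another character, never scored for 行
}
# Candidate pronunciations per polyphonic character (toy dictionary).
candidates = {"行": ["hang2", "xing2"]}

# Made-up contextual embedding of 行 in a sentence where it is read "xing2".
char_vec = [0.2, 0.9]

rng = random.Random(0)
pred = predict(char_vec, label_vecs, candidates["行"])
negative = sample_negative(candidates["行"], "xing2", rng)
loss = triplet_loss(char_vec, label_vecs, "xing2", negative)
print(pred, negative, round(loss, 4))
```

Because the search is restricted to the candidate list, the label `le4` is never scored for 行, which mirrors the abstract's point that probabilities over the whole pronunciation inventory are not needed.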
Publisher
ISCA
Issue Date
2021-08
Language
English
Citation

22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021, pp. 3196-3200

ISSN
2308-457X
DOI
10.21437/interspeech.2021-885
URI
http://hdl.handle.net/10203/312359
Appears in Collection
RIMS Conference Papers
