Seamless equal accuracy ratio for inclusive CTC speech recognition

Cited 5 times in Web of Science · Cited 0 times in Scopus
  • Hit : 252
  • Download : 0
DC Field: Value
dc.contributor.author: Gao, Heting
dc.contributor.author: Wang, Xiaoxuan
dc.contributor.author: Kang, Sunghun
dc.contributor.author: Mina, Rusty
dc.contributor.author: Issa, Dias
dc.contributor.author: Harvill, John
dc.contributor.author: Sari, Leda
dc.contributor.author: Hasegawa-Johnson, Mark
dc.contributor.author: Yoo, Chang-Dong
dc.date.accessioned: 2021-12-25T06:40:11Z
dc.date.available: 2021-12-25T06:40:11Z
dc.date.created: 2021-12-07
dc.date.issued: 2022-01
dc.identifier.citation: SPEECH COMMUNICATION, v.136, pp.76-83
dc.identifier.issn: 0167-6393
dc.identifier.uri: http://hdl.handle.net/10203/291261
dc.description.abstract: Concerns have been raised regarding performance disparity in automatic speech recognition (ASR) systems, as they provide unequal transcription accuracy for different user groups defined by attributes that include gender, dialect, and race. In this paper, we propose the “equal accuracy ratio”, a novel inclusiveness measure for ASR systems that can be seamlessly integrated into the standard connectionist temporal classification (CTC) training pipeline of an end-to-end neural speech recognizer to increase the recognizer’s inclusiveness. We also create a novel multi-dialect benchmark dataset to study the inclusiveness of ASR, by combining data from existing corpora in seven dialects of English (African American, General American, Latino English, British English, Indian English, Afrikaner English, and Xhosa English). Experiments on this multi-dialect corpus show that using the equal accuracy ratio as a regularization term along with the CTC loss succeeds in lowering the accuracy gap between user groups and in reducing the recognition error rate compared with a non-regularized baseline. Experiments on additional speech corpora with different user groups confirm these findings.
dc.language: English
dc.publisher: ELSEVIER
dc.title: Seamless equal accuracy ratio for inclusive CTC speech recognition
dc.type: Article
dc.identifier.wosid: 000789483100004
dc.identifier.scopusid: 2-s2.0-85121444596
dc.type.rims: ART
dc.citation.volume: 136
dc.citation.beginningpage: 76
dc.citation.endingpage: 83
dc.citation.publicationname: SPEECH COMMUNICATION
dc.identifier.doi: 10.1016/j.specom.2021.11.004
dc.contributor.localauthor: Yoo, Chang-Dong
dc.contributor.nonIdAuthor: Gao, Heting
dc.contributor.nonIdAuthor: Wang, Xiaoxuan
dc.contributor.nonIdAuthor: Mina, Rusty
dc.contributor.nonIdAuthor: Harvill, John
dc.contributor.nonIdAuthor: Sari, Leda
dc.contributor.nonIdAuthor: Hasegawa-Johnson, Mark
dc.description.isOpenAccess: N
dc.type.journalArticle: Article
dc.subject.keywordAuthor: Speech recognition
dc.subject.keywordAuthor: Fairness
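The abstract above describes using the equal accuracy ratio as a regularization term added to the CTC loss, but it does not give the exact formula. The following is a minimal, hypothetical sketch of that idea, assuming the ratio is the worst group's accuracy divided by the best group's accuracy, the penalty is one minus that ratio, and `lam` is an assumed regularization weight; all names and the penalty form here are illustrative, not the paper's actual definitions.

```python
def equal_accuracy_ratio(group_accuracies):
    """Ratio of the worst to the best per-group accuracy (1.0 = perfectly equal)."""
    worst = min(group_accuracies)
    best = max(group_accuracies)
    return worst / best if best > 0 else 0.0

def regularized_loss(ctc_loss, group_accuracies, lam=0.1):
    """Total training loss: CTC loss plus a penalty for unequal group accuracy."""
    penalty = 1.0 - equal_accuracy_ratio(group_accuracies)
    return ctc_loss + lam * penalty

# Example: three dialect groups with unequal recognition accuracy.
accs = [0.90, 0.75, 0.60]
print(round(equal_accuracy_ratio(accs), 4))            # 0.6667
print(round(regularized_loss(2.5, accs, lam=0.1), 4))  # 2.5333
```

As the accuracies across groups converge, the ratio approaches 1, the penalty vanishes, and the total loss reduces to the plain CTC loss, which is how such a term would steer training toward equal accuracy without replacing the recognition objective.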
Appears in Collection
EE-Journal Papers (Journal Papers)
Files in This Item
There are no files associated with this item.
