Counterfactually Fair Automatic Speech Recognition

Cited 7 times in Web of Science; cited 0 times in Scopus
DC Field | Value | Language
dc.contributor.author | Sari, Leda | ko
dc.contributor.author | Hasegawa-Johnson, Mark | ko
dc.contributor.author | Yoo, Chang-Dong | ko
dc.date.accessioned | 2021-12-14T06:41:49Z | -
dc.date.available | 2021-12-14T06:41:49Z | -
dc.date.created | 2021-12-07 | -
dc.date.issued | 2021-12 | -
dc.identifier.citation | IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, v.29, pp.3515 - 3525 | -
dc.identifier.issn | 2329-9290 | -
dc.identifier.uri | http://hdl.handle.net/10203/290528 | -
dc.description.abstract | Widely used automatic speech recognition (ASR) systems have been empirically demonstrated in various studies to be unfair, having higher error rates for some groups of users than for others. One way to define fairness in ASR is to require that changing the demographic group affiliation of any individual (e.g., changing their gender, age, education, or race) should not change the probability distribution over possible speech-to-text transcriptions. In the paradigm of counterfactual fairness, all variables independent of group affiliation (e.g., the text being read by the speaker) remain unchanged, while variables dependent on group affiliation (e.g., the speaker's voice) are counterfactually modified. Hence, we approach the fairness of ASR by training the ASR to minimize change in its outcome probabilities despite a counterfactual change in the individual's demographic attributes. Starting from the individualized counterfactual equal odds criterion, we derive relaxations of it and compare their performance for connectionist temporal classification (CTC) based end-to-end ASR systems. We perform our experiments on the Corpus of Regional African American Language (CORAAL) and the LibriSpeech dataset to account for differences due to gender, age, education, and race. We show that with counterfactual training, we can reduce average character error rates while achieving a lower performance gap between demographic groups and a lower error standard deviation among individuals. | -
dc.language | English | -
dc.publisher | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC | -
dc.title | Counterfactually Fair Automatic Speech Recognition | -
dc.type | Article | -
dc.identifier.wosid | 000725790900002 | -
dc.identifier.scopusid | 2-s2.0-85119948304 | -
dc.type.rims | ART | -
dc.citation.volume | 29 | -
dc.citation.beginningpage | 3515 | -
dc.citation.endingpage | 3525 | -
dc.citation.publicationname | IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | -
dc.identifier.doi | 10.1109/TASLP.2021.3126949 | -
dc.contributor.localauthor | Yoo, Chang-Dong | -
dc.contributor.nonIdAuthor | Sari, Leda | -
dc.contributor.nonIdAuthor | Hasegawa-Johnson, Mark | -
dc.description.isOpenAccess | Y | -
dc.type.journalArticle | Article | -
dc.subject.keywordAuthor | Training | -
dc.subject.keywordAuthor | Machine learning | -
dc.subject.keywordAuthor | Speech processing | -
dc.subject.keywordAuthor | Error analysis | -
dc.subject.keywordAuthor | Machine learning algorithms | -
dc.subject.keywordAuthor | Computational modeling | -
dc.subject.keywordAuthor | Transducers | -
dc.subject.keywordAuthor | Automatic speech recognition | -
dc.subject.keywordAuthor | speaker adaptation | -
dc.subject.keywordAuthor | fairness in machine learning | -
dc.subject.keywordAuthor | counterfactual fairness | -
dc.subject.keywordPlus | BIAS | -
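The abstract describes training the ASR model so that its outcome probabilities change as little as possible under a counterfactual change of demographic attributes. A minimal sketch of that idea is a task loss plus a divergence penalty between the model's output distributions for the original and counterfactual inputs. The function names and the symmetrised-KL penalty below are illustrative assumptions, not the paper's exact relaxation of the counterfactual equal odds criterion:

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two categorical distributions."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.sum(p * np.log(p / q)))

def counterfactual_fairness_loss(task_loss, probs_orig, probs_cf, lam=1.0):
    """Task loss (e.g., CTC loss) plus a penalty on the divergence
    between the output distributions for the original utterance and
    its counterfactually modified version (hypothetical formulation)."""
    # Symmetrise the KL so the penalty does not depend on which
    # input is treated as the "original".
    penalty = 0.5 * (kl_divergence(probs_orig, probs_cf)
                     + kl_divergence(probs_cf, probs_orig))
    return task_loss + lam * penalty
```

When the two distributions coincide the penalty vanishes and the objective reduces to the plain task loss; the weight `lam` trades recognition accuracy against counterfactual invariance.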
Appears in Collection
EE-Journal Papers (Journal Papers)
Files in This Item
There are no files associated with this item.
This item is cited by other documents in WoS
