Unpaired Speech Enhancement by Acoustic and Adversarial Supervision for Speech Recognition

Cited 7 times in Web of Science · Cited 0 times in Scopus
  • Hit : 251
  • Download : 0
DC Field / Value / Language
dc.contributor.author: GEONMIN, KIM [ko]
dc.contributor.author: Lee, Hwaran [ko]
dc.contributor.author: Kim, Bo-Kyeong [ko]
dc.contributor.author: Oh, Sang-Hoon [ko]
dc.contributor.author: Lee, Soo-Young [ko]
dc.date.accessioned: 2019-01-22T08:30:07Z
dc.date.available: 2019-01-22T08:30:07Z
dc.date.created: 2018-12-26
dc.date.issued: 2019-01
dc.identifier.citation: IEEE SIGNAL PROCESSING LETTERS, v.26, no.1, pp.159 - 163
dc.identifier.issn: 1070-9908
dc.identifier.uri: http://hdl.handle.net/10203/248985
dc.description.abstract: Many speech enhancement methods try to learn the relationship between noisy and clean speech, obtained using an acoustic room simulator. We point out several limitations of enhancement methods that rely on clean-speech targets; the goal of this letter is to propose an alternative learning algorithm, called acoustic and adversarial supervision (AAS). AAS trains the enhanced output both to maximize the likelihood of the transcription under a pre-trained acoustic model and to have the general characteristics of clean speech, which improves generalization to unseen noisy speech. We employ connectionist temporal classification and an unpaired conditional boundary equilibrium generative adversarial network as the loss functions of AAS. AAS is tested on two datasets with additive noise, without and with reverberation: Librispeech + DEMAND and CHiME-4. By visualizing the enhanced speech under different loss combinations, we demonstrate the role of each supervision. AAS achieves a lower word error rate than other state-of-the-art methods that use the clean-speech target on both datasets.
dc.language: English
dc.publisher: IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
dc.title: Unpaired Speech Enhancement by Acoustic and Adversarial Supervision for Speech Recognition
dc.type: Article
dc.identifier.wosid: 000452619700002
dc.identifier.scopusid: 2-s2.0-85056343657
dc.type.rims: ART
dc.citation.volume: 26
dc.citation.issue: 1
dc.citation.beginningpage: 159
dc.citation.endingpage: 163
dc.citation.publicationname: IEEE SIGNAL PROCESSING LETTERS
dc.identifier.doi: 10.1109/LSP.2018.2880285
dc.contributor.localauthor: Lee, Soo-Young
dc.contributor.nonIdAuthor: Oh, Sang-Hoon
dc.description.isOpenAccess: N
dc.type.journalArticle: Article
dc.subject.keywordAuthor: Speech enhancement
dc.subject.keywordAuthor: room simulator
dc.subject.keywordAuthor: connectionist temporal classification
dc.subject.keywordAuthor: generative adversarial network
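The abstract describes a two-term training objective: an acoustic-supervision term (CTC likelihood of the transcription under a fixed, pre-trained acoustic model) combined with an adversarial term (an unpaired conditional BEGAN discriminator loss). A minimal sketch of how such terms are typically combined follows; the function name, the scalar loss inputs, and the weighting scheme are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch of a combined acoustic + adversarial objective.
# The actual AAS losses are computed over CTC outputs and a BEGAN
# discriminator; here both terms are stand-in scalars.

def aas_loss(ctc_neg_log_likelihood: float,
             adversarial_loss: float,
             adv_weight: float = 0.1) -> float:
    """Total loss = acoustic supervision + weighted adversarial supervision.

    adv_weight balances matching the transcription (acoustic term)
    against producing speech with clean-speech characteristics
    (adversarial term); its value here is an assumed placeholder.
    """
    return ctc_neg_log_likelihood + adv_weight * adversarial_loss

# Example: CTC NLL of 2.0 plus adversarial loss 5.0 at weight 0.1
total = aas_loss(2.0, 5.0, adv_weight=0.1)  # → 2.5
```

In practice both terms would be differentiable tensors so the enhancer can be updated by backpropagating through the frozen acoustic model and the discriminator.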
Appears in Collection
EE-Journal Papers (journal papers)
Files in This Item
There are no files associated with this item.
This item is cited by other documents in WoS
