Acoustic Model Combination Incorporated With Mask-Based Multi-Channel Source Separation for Automatic Speech Recognition

DC Field | Value | Language
dc.contributor.author | Yoon, JS | ko
dc.contributor.author | Park, JH | ko
dc.contributor.author | Kim, HK | ko
dc.contributor.author | Kim, HoiRin | ko
dc.date.accessioned | 2011-03-14T08:03:56Z | -
dc.date.available | 2011-03-14T08:03:56Z | -
dc.date.created | 2012-02-06 | -
dc.date.issued | 2010-10 | -
dc.identifier.citation | IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, v.4, no.5, pp.772 - 784 | -
dc.identifier.issn | 1932-4553 | -
dc.identifier.uri | http://hdl.handle.net/10203/22631 | -
dc.description.abstract | In this paper, we propose an acoustic model combination (AMC) technique for reducing a mismatch between training and testing conditions of an automatic speech recognition (ASR) system in a multi-channel noisy environment. In our previous work, we proposed a hidden Markov model (HMM)-based mask estimation method for multi-channel source separation using two microphones, where HMMs were adopted for mask estimation in order to incorporate an observation that the mask information should be correlated over contiguous analysis frames. However, it was observed that a certain degree of noise still remained in the separated speech source especially under low signal-to-noise ratio (SNR) conditions. This was because the estimated mask was not ideal, which resulted in limiting the improvement of ASR performance. To mitigate this problem, the remaining noise can be further compensated in the acoustic model domain under a framework of parallel model combination (PMC). In particular, a noise model and a weighting factor for the proposed AMC can be estimated from the remaining noise and the average of the relative magnitude of the mask, respectively. It is shown from the experiments that an ASR system employing the proposed AMC technique achieves a relative average word error rate (WER) reduction of 56.91%, when compared to a system using the mask-based source separation alone. In addition, compared to a conventional PMC implemented with a log-normal approximation, the proposed AMC relatively reduces WER by 43.64%. | -
dc.language | English | -
dc.language.iso | en_US | en
dc.publisher | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC | -
dc.subject | NOISE | -
dc.title | Acoustic Model Combination Incorporated With Mask-Based Multi-Channel Source Separation for Automatic Speech Recognition | -
dc.type | Article | -
dc.identifier.wosid | 000283266800002 | -
dc.identifier.scopusid | 2-s2.0-77956739077 | -
dc.type.rims | ART | -
dc.citation.volume | 4 | -
dc.citation.issue | 5 | -
dc.citation.beginningpage | 772 | -
dc.citation.endingpage | 784 | -
dc.citation.publicationname | IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING | -
dc.identifier.doi | 10.1109/JSTSP.2010.2057196 | -
dc.embargo.liftdate | 9999-12-31 | -
dc.embargo.terms | 9999-12-31 | -
dc.contributor.localauthor | Kim, HoiRin | -
dc.contributor.nonIdAuthor | Yoon, JS | -
dc.contributor.nonIdAuthor | Park, JH | -
dc.contributor.nonIdAuthor | Kim, HK | -
dc.type.journalArticle | Article | -
dc.subject.keywordAuthor | Computational auditory scene analysis (CASA) | -
dc.subject.keywordAuthor | mask estimation | -
dc.subject.keywordAuthor | mask-based noise model estimation | -
dc.subject.keywordAuthor | mask-based weighting factor estimation | -
dc.subject.keywordAuthor | multi-channel source separation (MCSS) | -
dc.subject.keywordAuthor | parallel model combination | -
dc.subject.keywordAuthor | speech recognition | -
dc.subject.keywordPlus | NOISE | -
Appears in Collection
EE-Journal Papers(저널논문)