Robust audio-visual speech recognition based on late integration

| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Lee, Jong-Seok | ko |
| dc.contributor.author | Park, Cheol Hoon | ko |
| dc.date.accessioned | 2013-03-06T17:29:37Z | |
| dc.date.available | 2013-03-06T17:29:37Z | |
| dc.date.created | 2012-02-06 | |
| dc.date.issued | 2008-08 | |
| dc.identifier.citation | IEEE TRANSACTIONS ON MULTIMEDIA, v.10, no.5, pp.767 - 779 | |
| dc.identifier.issn | 1520-9210 | |
| dc.identifier.uri | http://hdl.handle.net/10203/87783 | |
| dc.description.abstract | Audio-visual speech recognition (AVSR) using acoustic and visual signals of speech has received attention because of its robustness in noisy environments. In this paper, we present a late integration scheme-based AVSR system whose robustness under various noise conditions is improved by enhancing the performance of the three parts composing the system. First, we improve the performance of the visual subsystem by using the stochastic optimization method for the hidden Markov models as the speech recognizer. Second, we propose a new method of considering dynamic characteristics of speech for improved robustness of the acoustic subsystem. Third, the acoustic and the visual subsystems are effectively integrated to produce final robust recognition results by using neural networks. We demonstrate the performance of the proposed methods via speaker-independent isolated word recognition experiments. The results show that the proposed system improves robustness over the conventional system under various noise conditions without a priori knowledge about the noise contained in the speech. | |
| dc.language | English | |
| dc.publisher | IEEE-Inst Electrical Electronics Engineers Inc | |
| dc.subject | FUSION | |
| dc.title | Robust audio-visual speech recognition based on late integration | |
| dc.type | Article | |
| dc.identifier.wosid | 000258223800010 | |
| dc.identifier.scopusid | 2-s2.0-47649103796 | |
| dc.type.rims | ART | |
| dc.citation.volume | 10 | |
| dc.citation.issue | 5 | |
| dc.citation.beginningpage | 767 | |
| dc.citation.endingpage | 779 | |
| dc.citation.publicationname | IEEE TRANSACTIONS ON MULTIMEDIA | |
| dc.identifier.doi | 10.1109/TMM.2008.922789 | |
| dc.contributor.localauthor | Park, Cheol Hoon | |
| dc.type.journalArticle | Article | |
| dc.subject.keywordAuthor | audio-visual speech recognition | |
| dc.subject.keywordAuthor | late integration | |
| dc.subject.keywordAuthor | robustness hidden Markov model | |
| dc.subject.keywordAuthor | interframe correlation | |
| dc.subject.keywordAuthor | neural network | |
| dc.subject.keywordAuthor | stochastic optimization | |
| dc.subject.keywordPlus | FUSION | |
Appears in Collection
EE-Journal Papers(저널논문)