Brain-inspired speech segmentation for automatic speech recognition using the speech envelope as a temporal reference

Cited 6 times in Web of Science; cited 0 times in Scopus
  • Hits: 374
  • Downloads: 571
DC Field | Value | Language
dc.contributor.author | Lee, Byeongwook | ko
dc.contributor.author | Cho, Kwang-Hyun | ko
dc.date.accessioned | 2017-01-12T07:21:12Z | -
dc.date.available | 2017-01-12T07:21:12Z | -
dc.date.created | 2016-12-18 | -
dc.date.issued | 2016-11 | -
dc.identifier.citation | SCIENTIFIC REPORTS, v.6 | -
dc.identifier.issn | 2045-2322 | -
dc.identifier.uri | http://hdl.handle.net/10203/218316 | -
dc.description.abstract | Speech segmentation is a crucial step in automatic speech recognition because additional speech analyses are performed for each framed speech segment. Conventional segmentation techniques primarily segment speech using a fixed frame size for computational simplicity. However, this approach is insufficient for capturing the quasi-regular structure of speech, which causes substantial recognition failure in noisy environments. How does the brain handle quasi-regular structured speech and maintain high recognition performance under any circumstance? Recent neurophysiological studies have suggested that the phase of neuronal oscillations in the auditory cortex contributes to accurate speech recognition by guiding speech segmentation into smaller units at different timescales. A phase-locked relationship between neuronal oscillation and the speech envelope has recently been observed, which suggests that the speech envelope provides a foundation for multi-timescale speech segmental information. In this study, we quantitatively investigated the role of the speech envelope as a potential temporal reference to segment speech using its instantaneous phase information. We evaluated the proposed approach by the achieved information gain and recognition performance in various noisy environments. The results indicate that the proposed segmentation scheme not only extracts more information from speech but also provides greater robustness in a recognition test. | -
dc.language | English | -
dc.publisher | NATURE PUBLISHING GROUP | -
dc.subject | HUMAN AUDITORY-CORTEX | -
dc.subject | FRAME RATE ANALYSIS | -
dc.subject | PHASE | -
dc.subject | PATTERNS | -
dc.subject | INTELLIGIBILITY | -
dc.subject | OSCILLATIONS | -
dc.subject | COMPREHENSION | -
dc.subject | MECHANISMS | -
dc.subject | CONSONANTS | -
dc.subject | LISTENERS | -
dc.title | Brain-inspired speech segmentation for automatic speech recognition using the speech envelope as a temporal reference | -
dc.type | Article | -
dc.identifier.wosid | 000388362800001 | -
dc.identifier.scopusid | 2-s2.0-84996523818 | -
dc.type.rims | ART | -
dc.citation.volume | 6 | -
dc.citation.publicationname | SCIENTIFIC REPORTS | -
dc.identifier.doi | 10.1038/srep37647 | -
dc.contributor.localauthor | Cho, Kwang-Hyun | -
dc.description.isOpenAccess | Y | -
dc.type.journalArticle | Article | -
dc.subject.keywordPlus | HUMAN AUDITORY-CORTEX | -
dc.subject.keywordPlus | FRAME RATE ANALYSIS | -
dc.subject.keywordPlus | PHASE | -
dc.subject.keywordPlus | PATTERNS | -
dc.subject.keywordPlus | INTELLIGIBILITY | -
dc.subject.keywordPlus | OSCILLATIONS | -
dc.subject.keywordPlus | COMPREHENSION | -
dc.subject.keywordPlus | MECHANISMS | -
dc.subject.keywordPlus | CONSONANTS | -
dc.subject.keywordPlus | LISTENERS | -
