DSpace at KOASAS: Top-down selective attention with a deep neural network and confidence measure for automatic speech recognition


Top-down selective attention with a deep neural network and confidence measure for automatic speech recognition (심층 신경망에의 하향식 주의 집중과 신뢰도 측정을 이용한 자동 음성 인식 연구)


Kim, Ho-Gyeong

In cognitive science, the top-down selective attention (TDSA) mechanism of humans has been studied for decades and is known to be controlled by “objects” in the mind via feedback processes. This cognitive process enhances the perceptual saliency of the response to the object of interest and filters out irrelevant responses. Engineering models based on TDSA have been proposed for out-of-vocabulary rejection and isolated word recognition. In this work, we apply the TDSA mechanism to the N-best rescoring framework to provide attentional information about confusing words within competing hypotheses. The TDSA mechanism adapts a test input feature toward each of several confusing words. The attentional information required to rescore the hypotheses is then derived as the probability of the adapted features and the amount of feature deformation. Recently, numerous neural network models with attention have been developed and successfully applied to diverse tasks. The sequence-to-sequence learning framework with attention has become especially popular for sequence labeling tasks such as neural machine translation, image caption generation, and speech recognition. Whereas previous attention approaches predict a soft window over the input sequence corresponding to each output target, our approach adapts a test input feature “directly,” using a gradient to maximize the probability of the feature given the target words. Our system therefore provides the most probable feature for the target words without the need to train extra attention networks. We propose N-best rescoring and utterance verification systems that integrate attentional information for locally confusing words, extracted from alternative hypotheses, into a conventional speech recognition system. The attentional information is derived by adapting a test input feature for the word of interest, which is motivated by the top-down selective attention mechanism of the brain.
To rescore the competing hypotheses, we define a new confidence measure that combines the conventional posterior probability with the attentional information for the confusing words. In addition, a neural network is designed to provide utterance-specific weights within the confidence measure. The network is then optimized to minimize the word error rate. Tests were conducted on the WSJ and Aurora4 speech recognition tasks, and our best rescoring results achieve word error rates of 3.83% and 11.09%, respectively, corresponding to relative reductions of 5.20% and 2.55% over the baselines.
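
The input adaptation described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes a toy linear-softmax classifier in place of the deep acoustic model, uses the class posterior as the quantity being maximized, and the names `adapt_feature`, `step`, and `n_steps` are hypothetical. The sketch moves an input feature along the gradient of the log-probability of a target word, then reports the two attentional quantities the abstract mentions: the probability of the adapted feature and the amount of feature deformation.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array of logits."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def adapt_feature(x, W, b, target, step=0.1, n_steps=50):
    """Adapt input x by gradient ascent on log p(target | x).

    Assumption: a toy linear-softmax model stands in for the recognizer.
    Returns the adapted feature, its log-probability for the target,
    and the Euclidean norm of the change applied to x.
    """
    x_adapted = x.copy()
    for _ in range(n_steps):
        p = softmax(W @ x_adapted + b)
        # Gradient of log p_target with respect to the input:
        # W[target] - sum_k p_k * W[k]
        grad = W[target] - p @ W
        x_adapted += step * grad
    log_prob = float(np.log(softmax(W @ x_adapted + b)[target]))
    deformation = float(np.linalg.norm(x_adapted - x))
    return x_adapted, log_prob, deformation

# Toy data: 3 candidate "words", 4-dimensional input feature (assumed sizes).
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))
b = np.zeros(3)
x = rng.normal(size=4)

for word in range(3):
    _, log_prob, deformation = adapt_feature(x, W, b, word)
    print(f"word {word}: log-prob {log_prob:.3f}, deformation {deformation:.3f}")
```

In this toy setting, a candidate word that reaches a high probability with little deformation is better supported by the input than one that needs a large change. How the thesis combines these quantities with the posterior probability is not specified in the abstract.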

Advisors: Lee, Soo-Young (이수영)

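Description: Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering

Publisher: Korea Advanced Institute of Science and Technology (KAIST)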

Issue Date: 2018

Identifier: 325007

Language: eng

Description: Doctoral thesis (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering, 2018.2, [vii, 103 p.]

Keywords: top-down selective attention; confidence measure; N-best rescoring; parameter optimization; utterance verification; automatic speech recognition

URI: http://hdl.handle.net/10203/265254

Link: http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=734385&flag=dissertation

Appears in Collection: EE-Theses_Ph.D.(박사논문)

Files in This Item: There are no files associated with this item.


Knowledge Service Development Team, KAIST 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea. T. 82-42-350-4493 Email. koasas@kaist.ac.kr
Copyright © 2016. Korea Advanced Institute of Science and Technology. All Rights Reserved.
