Learning How Long to Wait: Adaptively-Constrained Monotonic Multihead Attention for Streaming ASR

DC Field / Value / Language
dc.contributor.author: Song, Jaeyun (ko)
dc.contributor.author: Shim, Hajin (ko)
dc.contributor.author: Yang, Eunho (ko)
dc.date.accessioned: 2022-12-05T02:05:41Z
dc.date.available: 2022-12-05T02:05:41Z
dc.date.created: 2022-12-04
dc.date.issued: 2021-12-15
dc.identifier.citation: 2021 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2021, pp. 441 - 448
dc.identifier.uri: http://hdl.handle.net/10203/301615
dc.description.abstract: Monotonic Multihead Attention, which allows each head to learn its own alignment, shows strong performance on simultaneous machine translation and streaming speech recognition. However, it incurs high latency from waiting for the slowest head. Recent advances such as Head-Synchronous Beam Search Decoding and its learnable version, Mutually-Constrained Monotonic Multihead Attention, address this issue by restricting the difference in the times of the frames chosen by the heads to a fixed waiting-time threshold. In this paper, we hypothesize that the optimal threshold for high performance with low latency depends on the input sequence, and we propose an adaptive algorithm that learns how long to wait depending on the input tokens by introducing a threshold prediction module. We evaluate our approach on two benchmark datasets for online Automatic Speech Recognition and demonstrate that our method reduces latency while even improving recognition accuracy.
dc.language: English
dc.publisher: Institute of Electrical and Electronics Engineers Inc.
dc.title: Learning How Long to Wait: Adaptively-Constrained Monotonic Multihead Attention for Streaming ASR
dc.type: Conference
dc.identifier.wosid: 000792364700059
dc.identifier.scopusid: 2-s2.0-85126795282
dc.type.rims: CONF
dc.citation.beginningpage: 441
dc.citation.endingpage: 448
dc.citation.publicationname: 2021 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2021
dc.identifier.conferencecountry: CK
dc.identifier.conferencelocation: Cartagena
dc.identifier.doi: 10.1109/ASRU51503.2021.9688138
dc.contributor.localauthor: Yang, Eunho
dc.contributor.nonIdAuthor: Shim, Hajin
Appears in Collection
AI-Conference Papers (Conference Papers)
Files in This Item
There are no files associated with this item.
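The abstract describes two mechanisms: constraining the gap between the frames chosen by the different attention heads to a waiting-time threshold, and predicting that threshold per token. A minimal sketch of the idea follows; the function names, the linear-plus-sigmoid predictor, and all parameter choices are illustrative assumptions, not the paper's actual implementation:

```python
import math

def predict_threshold(hidden, weights, max_wait):
    """Hypothetical per-token threshold predictor: a linear score
    passed through a sigmoid and scaled to [0, max_wait] frames."""
    score = sum(h * w for h, w in zip(hidden, weights))
    return round(max_wait / (1.0 + math.exp(-score)))

def constrain_head_frames(frames, tau):
    """Force every head to stop no more than tau frames after the
    earliest-stopping head, bounding the wait for the slowest head."""
    earliest = min(frames)
    return [min(f, earliest + tau) for f in frames]

# Example: three heads picked frames 3, 10, and 5; with tau = 4 the
# slow head is clamped to frame 3 + 4 = 7.
print(constrain_head_frames([3, 10, 5], tau=4))  # [3, 7, 5]
```

With a fixed tau (as in Head-Synchronous Beam Search Decoding) the clamp is the same for every token; the adaptive variant instead recomputes tau from the decoder state at each step, so easy tokens can commit early while ambiguous ones wait longer.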
