Learning How Long to Wait: Adaptively-Constrained Monotonic Multihead Attention for Streaming ASR

Monotonic Multihead Attention, which allows each head to learn its own alignment, performs well on simultaneous machine translation and streaming speech recognition. However, it incurs high latency because decoding must wait for the slowest head. Recent approaches such as Head-Synchronous Beam Search Decoding, and its learnable variant Mutually-Constrained Monotonic Multihead Attention, address this issue by restricting the difference between the frames chosen by the heads to a fixed waiting-time threshold. In this paper, we hypothesize that the optimal threshold for high accuracy at low latency depends on the input sequence, and we propose an adaptive algorithm that learns how long to wait for each input token by introducing a threshold prediction module. We evaluate our approach on two benchmark datasets for online Automatic Speech Recognition and demonstrate that our method reduces latency while even improving recognition accuracy.
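The core mechanism described in the abstract — forcing all heads to stop within a bounded number of frames of the fastest head, with the bound predicted per token rather than fixed — can be sketched as follows. This is a toy illustration only: `clamp_heads` and `predict_threshold` are hypothetical stand-ins, and the paper's actual threshold module is learned jointly with the ASR model.

```python
# Hedged sketch of head-synchronous frame selection with an adaptive
# waiting threshold. All function names and the toy predictor below are
# illustrative assumptions, not the paper's implementation.

def clamp_heads(chosen_frames, threshold):
    """Force every head to stop within `threshold` frames of the
    earliest-stopping (fastest) head, as in head-synchronous decoding."""
    fastest = min(chosen_frames)
    return [min(f, fastest + threshold) for f in chosen_frames]

def predict_threshold(token_score):
    """Toy stand-in for the learned threshold prediction module:
    maps a per-token scalar to an integer waiting-time threshold."""
    return max(1, int(round(token_score)))  # allow at least 1 frame of slack

# Example: four heads pick frames 10, 12, 19, 30 for the current token.
heads = [10, 12, 19, 30]

# A fixed threshold of 2 (as in mutually-constrained MMA) caps the
# decoding latency for this token at frame 12:
print(clamp_heads(heads, 2))    # [10, 12, 12, 12]

# An input-dependent threshold can grant more slack when the token needs it:
tau = predict_threshold(8.7)    # -> 9
print(clamp_heads(heads, tau))  # [10, 12, 19, 19]
```

Under this view, the latency for emitting a token is the largest clamped frame index, so predicting a small threshold on easy tokens lowers latency while a larger one on hard tokens preserves accuracy.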
Publisher
Institute of Electrical and Electronics Engineers Inc.
Issue Date
2021-12-15
Language
English
Citation

2021 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2021, pp.441 - 448

DOI
10.1109/ASRU51503.2021.9688138
URI
http://hdl.handle.net/10203/301615
Appears in Collection
AI-Conference Papers (Conference Papers)
Files in This Item
There are no files associated with this item.
