Learning How Long to Wait: Adaptively-Constrained Monotonic Multihead Attention for Streaming ASR

DC Field / Value / Language
dc.contributor.author: Song, Jaeyun (ko)
dc.contributor.author: Shim, Hajin (ko)
dc.contributor.author: Yang, Eunho (ko)
dc.date.accessioned: 2022-12-05T02:05:41Z
dc.date.available: 2022-12-05T02:05:41Z
dc.date.created: 2022-12-04
dc.date.issued: 2021-12-15
dc.identifier.citation: 2021 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2021, pp. 441 - 448
dc.identifier.uri: http://hdl.handle.net/10203/301615
dc.description.abstract: Monotonic Multihead Attention, which allows each head to learn its own alignment, shows strong performance on simultaneous machine translation and streaming speech recognition. However, it incurs high latency from waiting for the slowest head. Recent advances such as Head-Synchronous Beam Search Decoding and its learnable version, Mutually-Constrained Monotonic Multihead Attention, address this issue by restricting the difference in the times of the frames chosen by the heads to a fixed waiting-time threshold. In this paper, we hypothesize that the optimal threshold for high performance with low latency depends on the input sequence, and we propose an adaptive algorithm that learns how long to wait depending on the input tokens by introducing a threshold prediction module. We evaluate our approach on two benchmark datasets for online Automatic Speech Recognition and demonstrate that our method reduces latency while even improving recognition accuracy.
dc.language: English
dc.publisher: Institute of Electrical and Electronics Engineers Inc.
dc.title: Learning How Long to Wait: Adaptively-Constrained Monotonic Multihead Attention for Streaming ASR
dc.type: Conference
dc.identifier.wosid: 000792364700059
dc.identifier.scopusid: 2-s2.0-85126795282
dc.type.rims: CONF
dc.citation.beginningpage: 441
dc.citation.endingpage: 448
dc.citation.publicationname: 2021 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2021
dc.identifier.conferencecountry: CK
dc.identifier.conferencelocation: Cartagena
dc.identifier.doi: 10.1109/ASRU51503.2021.9688138
dc.contributor.localauthor: Yang, Eunho
dc.contributor.nonIdAuthor: Shim, Hajin
Appears in Collection
AI-Conference Papers (Conference Papers)
Files in This Item
There are no files associated with this item.
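The abstract describes two mechanisms: constraining the gap between the frames chosen by the different attention heads to a waiting-time threshold, and predicting that threshold per token. A minimal sketch of the idea follows; the function names, the linear-plus-sigmoid predictor, and all parameter choices are illustrative assumptions, not the paper's actual implementation:

```python
import math

def predict_threshold(hidden, weights, max_wait):
    """Hypothetical per-token threshold predictor: a linear score
    passed through a sigmoid and scaled to [0, max_wait] frames."""
    score = sum(h * w for h, w in zip(hidden, weights))
    return round(max_wait / (1.0 + math.exp(-score)))

def constrain_head_frames(frames, tau):
    """Force every head to stop no more than tau frames after the
    earliest-stopping head, bounding the wait for the slowest head."""
    earliest = min(frames)
    return [min(f, earliest + tau) for f in frames]

# Example: three heads picked frames 3, 10, and 5; with tau = 4 the
# slow head is clamped to frame 3 + 4 = 7.
print(constrain_head_frames([3, 10, 5], tau=4))  # [3, 7, 5]
```

With a fixed tau (as in Head-Synchronous Beam Search Decoding) the clamp is the same for every token; the adaptive variant instead recomputes tau from the decoder state at each step, so easy tokens can commit early while ambiguous ones wait longer.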
