Recycle-and-Distill: Universal Compression Strategy for Transformer-based Speech SSL Models with Attention Map Reusing and Masking Distillation

DC Field | Value | Language
dc.contributor.author | Jang, Kangwook | ko
dc.contributor.author | Kim, Sungnyun | ko
dc.contributor.author | Yun, Seyoung | ko
dc.contributor.author | Kim, Hoi-Rin | ko
dc.date.accessioned | 2023-11-22T07:02:17Z | -
dc.date.available | 2023-11-22T07:02:17Z | -
dc.date.created | 2023-11-22 | -
dc.date.issued | 2023-08-21 | -
dc.identifier.citation | 24th International Speech Communication Association, Interspeech 2023, pp. 316-320 | -
dc.identifier.uri | http://hdl.handle.net/10203/315050 | -
dc.description.abstract | Transformer-based speech self-supervised learning (SSL) models, such as HuBERT, show surprising performance in various speech processing tasks. However, the huge number of parameters in speech SSL models necessitates compression into a more compact model for wider use in academia or small companies. In this study, we propose reusing attention maps across Transformer layers, which removes key and query parameters while retaining the number of layers. Furthermore, we propose a novel masking distillation strategy to improve the student model's speech representation quality. We extend the distillation loss to utilize both masked and unmasked speech frames, fully leveraging the teacher model's high-quality representation. Our universal compression strategy yields a student model that achieves a phoneme error rate (PER) of 7.72% and a word error rate (WER) of 9.96% on the SUPERB benchmark. (An illustrative sketch of these two ideas follows this record.) | -
dc.language | English | -
dc.publisher | International Speech Communication Association | -
dc.title | Recycle-and-Distill: Universal Compression Strategy for Transformer-based Speech SSL Models with Attention Map Reusing and Masking Distillation | -
dc.type | Conference | -
dc.identifier.scopusid | 2-s2.0-85171584650 | -
dc.type.rims | CONF | -
dc.citation.beginningpage | 316 | -
dc.citation.endingpage | 320 | -
dc.citation.publicationname | 24th International Speech Communication Association, Interspeech 2023 | -
dc.identifier.conferencecountry | IE | -
dc.identifier.conferencelocation | Dublin | -
dc.contributor.localauthor | Yun, Seyoung | -
dc.contributor.localauthor | Kim, Hoi-Rin | -
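
The abstract above describes two techniques: reusing attention maps across Transformer layers so that reusing layers can drop their key and query projections, and a distillation loss applied to both masked and unmasked speech frames. Below is a minimal PyTorch sketch of how such a setup could be wired. The module and function names (ReuseAttentionLayer, masking_distill_loss), the L1 distance, and the equal weighting of the two loss terms are illustrative assumptions, not the paper's released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ReuseAttentionLayer(nn.Module):
    """Transformer layer that either computes its own attention map or
    reuses one produced by an earlier layer (illustrative sketch)."""

    def __init__(self, dim: int, n_heads: int, compute_attn: bool = True):
        super().__init__()
        self.n_heads = n_heads
        self.head_dim = dim // n_heads
        self.compute_attn = compute_attn
        if compute_attn:
            # Key/query projections exist only in layers that compute attention;
            # reusing layers omit them entirely, which is where parameters are saved.
            self.q_proj = nn.Linear(dim, dim)
            self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        self.out_proj = nn.Linear(dim, dim)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x, attn=None):
        B, T, D = x.shape
        h = self.norm1(x)
        if self.compute_attn:
            q = self.q_proj(h).view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
            k = self.k_proj(h).view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
            attn = torch.softmax(q @ k.transpose(-2, -1) / self.head_dim ** 0.5, dim=-1)
        else:
            assert attn is not None, "a reusing layer needs a recycled attention map"
        # The value/output path always runs; only the attention map is recycled.
        v = self.v_proj(h).view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
        x = x + self.out_proj((attn @ v).transpose(1, 2).reshape(B, T, D))
        x = x + self.ffn(self.norm2(x))
        return x, attn


def masking_distill_loss(student_repr, teacher_repr, mask):
    """Distillation over both masked and unmasked frames.

    student_repr, teacher_repr: (B, T, D) frame representations.
    mask: (B, T) boolean tensor marking masked frames.
    The L1 distance and the 1:1 weighting are assumptions for illustration.
    """
    masked_loss = F.l1_loss(student_repr[mask], teacher_repr[mask])
    unmasked_loss = F.l1_loss(student_repr[~mask], teacher_repr[~mask])
    return masked_loss + unmasked_loss
```

In a full student encoder, layers could be grouped so that each layer that computes an attention map is followed by one or more layers that reuse it, keeping the layer count unchanged while shedding the key/query weights of the reusing layers.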
Appears in Collections
  • AI-Conference Papers (학술대회논문)
  • EE-Conference Papers (학술회의논문)
Files in This Item
There are no files associated with this item.
