Audio-text word representations for open-vocabulary keyword spotting (Korean title: A study on audio-text word representation techniques for open-vocabulary keyword spotting)

Open-vocabulary keyword spotting (KWS) detects occurrences of arbitrary keywords in input audio. It has high research value because users can define their own keywords instead of being restricted to the pre-defined keyword set of conventional KWS. However, compared with the conventional KWS widely deployed on mobile devices, open-vocabulary KWS still requires substantial performance improvements before it becomes practical. In this dissertation, we analyze the promising progress of recent deep learning approaches that jointly use audio and text to represent words, focusing on the role each modality plays when both are mapped into a shared embedding space, and we confirm this analysis with a proposed Decoder-Sharing method. We extend the audio-text representation framework to proxy-based deep metric learning (DML) and propose an Asymmetric-Proxy loss, obtained by exploring the optimal combination of existing DML loss functions. In addition, we introduce an Adaptive Margin and Scale method, in which class-wise learnable parameters change dynamically as training progresses; this yields a significant improvement in generalization performance. Finally, we propose a Monotonic-Aligned Audio-Text loss to resolve the data segmentation problem that embedding-based open-vocabulary KWS approaches suffer from at inference.
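The abstract extends audio-text word representations to proxy-based deep metric learning but does not spell out the loss. As a point of reference only, the sketch below implements the publicly known Proxy-Anchor loss (Kim et al., 2020), one of the existing DML losses that a proxy-based setup can build on; it is not the dissertation's Asymmetric-Proxy loss. It assumes, hypothetically, that each keyword class has one learnable proxy vector (which could, for example, be a text embedding of the word), that audio embeddings are compared with proxies by cosine similarity, and that `margin` and `scale` are fixed scalars (0.1 and 32, values commonly used with Proxy-Anchor) rather than the class-wise adaptive parameters described above.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=axis, keepdims=True), eps)


def proxy_anchor_loss(audio_emb, labels, proxies, margin=0.1, scale=32.0):
    """Proxy-Anchor loss (illustrative sketch, not the dissertation's method).

    audio_emb: (batch, dim) audio word embeddings
    labels:    (batch,) integer class index of each embedding
    proxies:   (num_classes, dim) one proxy per class (hypothetical setup)
    """
    a = l2_normalize(audio_emb)
    p = l2_normalize(proxies)
    sim = a @ p.T  # (batch, num_classes) cosine similarities
    num_classes = p.shape[0]

    # pos_mask[i, c] is True when sample i belongs to class c.
    pos_mask = np.zeros_like(sim, dtype=bool)
    pos_mask[np.arange(len(labels)), labels] = True
    neg_mask = ~pos_mask

    # Pull positives above the margin; push negatives below -margin.
    pos_exp = np.where(pos_mask, np.exp(-scale * (sim - margin)), 0.0)
    neg_exp = np.where(neg_mask, np.exp(scale * (sim + margin)), 0.0)

    # The positive term is averaged only over proxies that have a positive in the batch.
    with_pos = pos_mask.any(axis=0)
    pos_term = np.log1p(pos_exp.sum(axis=0))[with_pos].sum() / max(with_pos.sum(), 1)
    # The negative term is averaged over all proxies.
    neg_term = np.log1p(neg_exp.sum(axis=0)).sum() / num_classes
    return pos_term + neg_term
```

For example, with three orthogonal proxies and audio embeddings identical to them, the correct labels `[0, 1, 2]` give a much lower loss than the shuffled labels `[1, 2, 0]`. In this formulation the margin and scale are shared by all classes; making them per-class and learnable is the kind of change the Adaptive Margin and Scale method refers to.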
Advisors
김회린 (Kim, Hoirin)
Description
Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2024
Identifier
325007
Language
eng
Description

Dissertation (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering, 2024.2, [v, 52 p.]

Keywords

Open-vocabulary keyword spotting; Decoder-sharing; Asymmetric-proxy loss; Adaptive margin and scale; Monotonic-aligned audio-text loss

URI
http://hdl.handle.net/10203/322153
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1100053&flag=dissertation
Appears in Collection
EE-Theses_Ph.D. (Ph.D. dissertations)
Files in This Item
There are no files associated with this item.
