Semantically-driven cut-and-paste data augmentation strategy for automatic speech recognition

Cut-and-paste data augmentation has attracted significant interest in computer vision because it is simple and has been shown to improve generalization. Applying it to Automatic Speech Recognition (ASR) is challenging, however, because the speech segments corresponding to individual output tokens, such as words or sub-words, vary in length. Furthermore, combining speech segments without regard for their meaning risks producing incoherent or nonsensical sentences. In this paper, we introduce WeavSpeech, a simple yet powerful cut-and-paste augmentation approach for ASR that addresses these challenges. WeavSpeech weaves together pairs of speech data while taking their semantics into account. The method requires no language-specific knowledge, so it can be applied to any language, and it can be combined with other established augmentation techniques such as SpecAugment. Our research demonstrates the superiority of WeavSpeech on well-known ASR benchmark datasets, including LibriSpeech and WSJ.
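The variable-length problem mentioned in the abstract can be illustrated with a minimal sketch. This is not the WeavSpeech algorithm, whose details are not given in this record; it is a generic word-level cut-and-paste under stated assumptions: word boundaries are already known (for example from a forced aligner), each frame is represented by a single float, and the names `Utterance` and `paste_word` are hypothetical. How semantically compatible pairs are chosen is left out entirely.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Utterance:
    # Stand-in for acoustic features: one float per frame (assumption).
    frames: List[float]
    # Word alignments as (token, start_frame, end_frame), end exclusive,
    # sorted by time and non-overlapping (assumption).
    words: List[Tuple[str, int, int]]


def paste_word(target: Utterance, source: Utterance,
               t_idx: int, s_idx: int) -> Utterance:
    """Replace word t_idx of target with word s_idx of source.

    The pasted segment may be longer or shorter than the one it replaces,
    so every later alignment in the target is shifted by the difference.
    """
    _, t_start, t_end = target.words[t_idx]
    s_tok, s_start, s_end = source.words[s_idx]

    segment = source.frames[s_start:s_end]
    new_frames = target.frames[:t_start] + segment + target.frames[t_end:]

    # Length difference between the pasted and the removed segment.
    shift = len(segment) - (t_end - t_start)

    new_words = []
    for i, (tok, start, end) in enumerate(target.words):
        if i < t_idx:
            new_words.append((tok, start, end))
        elif i == t_idx:
            new_words.append((s_tok, t_start, t_start + len(segment)))
        else:
            new_words.append((tok, start + shift, end + shift))
    return Utterance(new_frames, new_words)


def transcript(utt: Utterance) -> str:
    """Join the aligned tokens back into a label string."""
    return " ".join(tok for tok, _, _ in utt.words)
```

In this sketch, pasting a 6-frame word over a 4-frame word lengthens the utterance by 2 frames and updates the transcript to match. Nothing here prevents a nonsensical result, which is the gap the abstract says a semantics-aware selection step is meant to close.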
Advisors
양은호 (Eunho Yang)
Description
Korea Advanced Institute of Science and Technology (KAIST): Kim Jaechul Graduate School of AI
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2024
Identifier
325007
Language
eng
Description

Thesis (Master's) - Korea Advanced Institute of Science and Technology (KAIST): Kim Jaechul Graduate School of AI, 2024.2, [iii, 17 p.]

Keywords

Speech recognition; Data augmentation; Cut-and-paste; CutMix; Mixup

URI
http://hdl.handle.net/10203/321356
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1096061&flag=dissertation
Appears in Collection
AI-Theses_Master (Master's theses)
Files in This Item
There are no files associated with this item.
