Semantically-driven cut-and-paste data augmentation strategy for automatic speech recognition

DC Field | Value | Language
dc.contributor.advisor | 양은호 | -
dc.contributor.author | Seo, Kyusung | -
dc.contributor.author | 서규성 | -
dc.date.accessioned | 2024-07-30T19:30:38Z | -
dc.date.available | 2024-07-30T19:30:38Z | -
dc.date.issued | 2024 | -
dc.identifier.uri | http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1096061&flag=dissertation | en_US
dc.identifier.uri | http://hdl.handle.net/10203/321356 | -
dc.description | Thesis (Master's) - Korea Advanced Institute of Science and Technology (KAIST) : Kim Jaechul Graduate School of AI, 2024.2, [iii, 17 p. :] | -
dc.description.abstract | A data augmentation technique involving cut-and-paste operations has garnered significant interest within the field of computer vision because of its straightforward nature and its proven effectiveness in enhancing the ability to generalize. However, applying this method to Automatic Speech Recognition (ASR) tasks poses challenges due to the varying lengths of segments corresponding to specific output tokens such as words or sub-words. Furthermore, if speech segments are combined without regard for their meaning, there is a risk of generating incoherent or nonsensical sentences. In this paper, we introduce a method called WeavSpeech, which addresses these challenges by offering a straightforward yet powerful cut-and-paste augmentation approach for ASR tasks. WeavSpeech weaves together pairs of speech data while taking into account their semantics. This method is universally applicable to languages without requiring language-specific knowledge and can be seamlessly incorporated with other verified augmentation techniques such as SpecAugment. Our research demonstrates the superiority of WeavSpeech on well-known ASR benchmark datasets, including LibriSpeech and WSJ. | -
dc.language | eng | -
dc.publisher | Korea Advanced Institute of Science and Technology (KAIST) | -
dc.subject | 음성 인식; 데이터 증강; 컷앤페이스트; 컷믹스; 믹스업 | -
dc.subject | Speech recognition; Data augmentation; Cut-and-paste; Cutmix; Mixup | -
dc.title | Semantically-driven cut-and-paste data augmentation strategy for automatic speech recognition | -
dc.title.alternative | 자동 음성 인식을 위한 의미 중심 컷앤페이스트 데이터 증강 전략 | -
dc.type | Thesis(Master) | -
dc.identifier.CNRN | 325007 | -
dc.description.department | Korea Advanced Institute of Science and Technology (KAIST) : Kim Jaechul Graduate School of AI | -
dc.contributor.alternativeauthor | Yang, Eunho | -
Appears in Collection
AI-Theses_Master(석사논문)
Files in This Item
There are no files associated with this item.
