DSpace at KOASAS: Semantically-driven cut-and-paste data augmentation strategy for automatic speech recognition

Semantically-driven cut-and-paste data augmentation strategy for automatic speech recognition

DC Field	Value	Language
dc.contributor.advisor	양은호	-
dc.contributor.author	Seo, Kyusung	-
dc.contributor.author	서규성	-
dc.date.accessioned	2024-07-30T19:30:38Z	-
dc.date.available	2024-07-30T19:30:38Z	-
dc.date.issued	2024	-
dc.identifier.uri	http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1096061&flag=dissertation	en_US
dc.identifier.uri	http://hdl.handle.net/10203/321356	-
dc.description	Thesis (Master's) - Korea Advanced Institute of Science and Technology (KAIST): Kim Jaechul Graduate School of AI, 2024.2, [iii, 17 p.]	-
dc.description.abstract	A data augmentation technique involving cut-and-paste operations has garnered significant interest within the field of computer vision because of its straightforward nature and its proven effectiveness in enhancing the ability to generalize. However, applying this method to Automatic Speech Recognition (ASR) tasks poses challenges due to the varying lengths of segments corresponding to specific output tokens such as words or sub-words. Furthermore, if speech segments are combined without regard for their meaning, there is a risk of generating incoherent or nonsensical sentences. In this paper, we introduce a method called WeavSpeech, which addresses these challenges by offering a straightforward yet powerful cut-and-paste augmentation approach for ASR tasks. WeavSpeech weaves together pairs of speech data while taking into account their semantics. This method is universally applicable to languages without requiring language-specific knowledge and can be seamlessly incorporated with other verified augmentation techniques such as SpecAugment. Our research demonstrates the superiority of WeavSpeech on well-known ASR benchmark datasets, including LibriSpeech and WSJ.	-
dc.language	eng	-
dc.publisher	Korea Advanced Institute of Science and Technology (KAIST)	-
dc.subject	음성 인식; 데이터 증강; 컷앤페이스트; 컷믹스; 믹스업	-
dc.subject	Speech recognition; Data augmentation; Cut-and-paste; Cutmix; Mixup	-
dc.title	Semantically-driven cut-and-paste data augmentation strategy for automatic speech recognition	-
dc.title.alternative	자동 음성 인식을 위한 의미 중심 컷앤페이스트 데이터 증강 전략	-
dc.type	Thesis(Master)	-
dc.identifier.CNRN	325007	-
dc.description.department	Korea Advanced Institute of Science and Technology (KAIST): Kim Jaechul Graduate School of AI	-
dc.contributor.alternativeauthor	Yang, Eunho	-

Appears in Collection: AI-Theses_Master(석사논문)

Files in This Item: There are no files associated with this item.

Knowledge Service Development Team, KAIST 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea. T. 82-42-350-4493 Email. koasas@kaist.ac.kr
Copyright © 2016. Korea Advanced Institute of Science and Technology. All Rights Reserved.
