Meta-learning for speaker recognition in practical scenarios
(Korean title, translated: A study on meta-learning methods for speaker recognition in real-world situations)

This thesis addresses text-independent speaker recognition for utterances recorded 'in the wild', which may contain insufficient or irrelevant information. In other words, speaker recognition in practical scenarios can be framed as two problems: matching pairs of long and short utterances, and pair matching in general. For the first, we introduce a meta-learning framework for pairs of imbalanced length. Specifically, we use Prototypical Networks and train the model with a support set of long utterances and a query set of short utterances of varying lengths. Because optimizing only over the classes in a given episode may be insufficient for learning embeddings that discriminate unseen classes, we additionally require the model to classify both the support and the query set against the entire set of classes in the training set. By combining these two learning schemes, our model outperforms existing state-of-the-art speaker verification models trained with a standard supervised learning framework on short utterances (1-2 seconds) from the VoxCeleb datasets. We also validate the proposed model on unseen speaker identification, where it likewise achieves significant performance gains over existing approaches. Second, for the pair-matching problem of speaker verification, we propose Cross Attentive Pooling (CAP), which uses context information across the reference-query pair to generate utterance-level embeddings that contain the most discriminative information for pair matching. Experiments on the VoxCeleb datasets show that our method outperforms comparable pooling strategies.
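The combination of an episodic prototypical objective with classification against all training classes, as described in the abstract, might be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis implementation: the squared Euclidean distance, the linear global classifier, the unweighted sum of the two losses, and every name (`episode_loss`, `global_weights`, `global_ids`) are hypothetical choices that the abstract does not specify.

```python
import numpy as np

def log_softmax(logits):
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def cross_entropy(logits, labels):
    # Mean negative log-likelihood of the correct class.
    logp = log_softmax(logits)
    return float(-logp[np.arange(len(labels)), labels].mean())

def episode_loss(support, support_labels, query, query_labels,
                 global_weights, global_ids):
    """Hypothetical combined loss for one training episode.

    support:        (n_support, d) embeddings of long utterances
    support_labels: (n_support,) episode-local class indices 0..n_way-1
    query:          (n_query, d) embeddings of short utterances
    query_labels:   (n_query,) episode-local class indices
    global_weights: (n_train_classes, d) linear classifier (assumption)
    global_ids:     (n_way,) maps episode-local index to training-set class
    """
    n_way = len(global_ids)

    # Prototype = mean support embedding per episode class.
    prototypes = np.stack(
        [support[support_labels == c].mean(axis=0) for c in range(n_way)]
    )

    # Episodic term: classify each query by negative squared distance
    # to the prototypes (distance metric is an assumption).
    diff = query[:, None, :] - prototypes[None, :, :]
    sq_dists = (diff ** 2).sum(axis=2)
    proto_loss = cross_entropy(-sq_dists, query_labels)

    # Global term: classify support and query embeddings against
    # every class in the training set.
    all_emb = np.concatenate([support, query])
    all_labels = global_ids[np.concatenate([support_labels, query_labels])]
    global_loss = cross_entropy(all_emb @ global_weights.T, all_labels)

    # Unweighted sum; any weighting between the terms is not given
    # in the abstract.
    return proto_loss + global_loss
```

In this sketch the embeddings are passed in as plain arrays; in practice they would come from a speaker encoder applied to long (support) and short (query) utterances.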
Advisors
Kim, Hoirin (김회린)
Description
Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2021
Identifier
325007
Language
eng
Description

Thesis (Master's) - Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering, 2021.2, [v, 42 p.]

Keywords

speaker verification; speaker identification; meta-learning; short duration; text-independent; open-set

URI
http://hdl.handle.net/10203/295966
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=948677&flag=dissertation
Appears in Collection
EE-Theses_Master(석사논문)
Files in This Item
There are no files associated with this item.
