Analysis-Based Optimization of Temporal Dynamic Convolutional Neural Network for Text-Independent Speaker Verification

Temporal dynamic convolutional neural networks (TDY-CNNs) extract speaker embeddings that account for the time-varying characteristics of speech and improve text-independent speaker verification performance. In this paper, we optimize TDY-CNNs based on a detailed analysis of the network architecture. The temporal dynamic convolution generates attention weights for basis kernels from features formed by concatenating the channel-averaged and frequency-averaged data, which reduces the number of network parameters by 26%. In addition, temporal dynamic convolutions replace vanilla convolutions in the earlier layers, while in the later layers the optimized temporal dynamic convolutions use a steady kernel regardless of the time-bin data. As a result, Opt-TDY-ResNet-34(x0.50) achieves the best speaker verification performance, with an EER of 1.07%, among speaker verification models trained without data augmentation, including ResNet-based baseline networks and other state-of-the-art networks. Moreover, we validate through several methods that Opt-TDY-CNNs adapt to time-bin data. By comparing the inter- and intra-phoneme distances of attention weights, we confirmed that the temporal dynamic convolution uses different kernels depending on the phoneme group, which is directly related to the time-bin data. In addition, by applying gradient-weighted class activation mapping (Grad-CAM) to speaker verification to obtain a speaker activation map (SAM), we showed that temporal dynamic convolution extracts speaker information from the frequency characteristics of time bins, such as the formant frequencies of phonemes, whereas vanilla convolution extracts only a vague outline of the Mel-spectrogram.
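The temporal dynamic convolution summarized above can be illustrated with a minimal NumPy sketch. It assumes an input of shape (channels, frequency, time) and K basis kernels. As a hypothetical stand-in for the paper's attention module, which is not specified here, it uses a single linear projection `attn_w` followed by a softmax over the K kernels for each time bin. The attention input follows the description above: frequency-averaged data concatenated with channel-averaged data, per time bin. The helper names (`conv2d_same`, `temporal_dynamic_conv`, `attn_w`) and the `temperature` parameter are illustrative assumptions, not the authors' code.

```python
import numpy as np


def softmax(z, axis=0, temperature=1.0):
    """Numerically stable softmax along `axis`, with an optional temperature."""
    z = z / temperature
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def conv2d_same(x, w):
    """Naive 'same'-padded 2D cross-correlation (stride 1).

    x: (C_in, F, T) input feature map
    w: (C_out, C_in, kF, kT) kernel with odd kF and kT
    returns: (C_out, F, T)
    """
    c_out, _, kf, kt = w.shape
    _, f, t = x.shape
    pf, pt = kf // 2, kt // 2
    xp = np.pad(x, ((0, 0), (pf, pf), (pt, pt)))
    out = np.zeros((c_out, f, t))
    for i in range(kf):
        for j in range(kt):
            # Kernel tap (i, j): mix input channels over a shifted window.
            patch = xp[:, i:i + f, j:j + t]  # (C_in, F, T)
            out += np.einsum("oc,cft->oft", w[:, :, i, j], patch)
    return out


def temporal_dynamic_conv(x, basis, attn_w, temperature=1.0):
    """Sketch of a temporal dynamic convolution (hypothetical layout).

    x:      (C_in, F, T) input
    basis:  (K, C_out, C_in, kF, kT) basis kernels
    attn_w: (K, C_in + F) hypothetical linear attention projection
    returns: output (C_out, F, T) and attention weights (K, T)
    """
    # Per-time-bin descriptor: frequency-averaged data (C_in values)
    # concatenated with channel-averaged data (F values).
    freq_avg = x.mean(axis=1)  # (C_in, T)
    chan_avg = x.mean(axis=0)  # (F, T)
    feat = np.concatenate([freq_avg, chan_avg], axis=0)  # (C_in + F, T)

    # Attention over the K basis kernels for every time bin (sums to 1 over K).
    pi = softmax(attn_w @ feat, axis=0, temperature=temperature)  # (K, T)

    # Convolution is linear in the kernel, so using the aggregated kernel
    # sum_k pi[k, t] * W_k at time bin t equals the pi-weighted sum of the
    # K basis-kernel outputs evaluated at that time bin.
    outs = np.stack([conv2d_same(x, wk) for wk in basis])  # (K, C_out, F, T)
    y = np.einsum("kt,koft->oft", pi, outs)
    return y, pi


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((4, 8, 10))           # C_in=4, F=8, T=10
    basis = rng.standard_normal((3, 6, 4, 3, 3))  # K=3 kernels, C_out=6, 3x3
    attn_w = rng.standard_normal((3, 4 + 8))      # K x (C_in + F)
    y, pi = temporal_dynamic_conv(x, basis, attn_w)
    print(y.shape, pi.shape)  # (6, 8, 10) (3, 10)
```

Because convolution is linear in the kernel, the sketch runs each basis kernel once over the whole input and then mixes the K outputs per time bin. This is equivalent to convolving each time bin with its own aggregated kernel, but avoids materializing a separate kernel for every time bin.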
Publisher
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
Issue Date
2023-06
Language
English
Article Type
Article
Citation
IEEE ACCESS, v.11, pp.60646-60659
ISSN
2169-3536
DOI
10.1109/ACCESS.2023.3286034
URI
http://hdl.handle.net/10203/310480
Appears in Collection
ME-Journal Papers (Journal Papers)