Learnable gammatone filterbank and energy normalized gated convolutional network for environmental sound classification
(Korean title: 환경음 인식을 위한 학습가능 감마톤 필터뱅크 및 에너지 정규화된 게이티드 컨볼루셔널 신경망 연구)

This dissertation considers a deep neural network architecture based on a gammatone filterbank (GTFB) and gated convolutional neural networks (GCNN) for environmental sound classification (ESC). In previous ESC methods, 2D CNNs applied to time-frequency representations have shown good performance. In particular, the mel-frequency filterbank (MelFB), which reflects a model of human hearing, is the most widely used time-frequency representation for environmental sounds. However, processing based on human hearing may not be the most appropriate method for environmental sounds. In this dissertation, a learnable gammatone filterbank (LGTFB) layer is proposed to obtain a time-frequency representation from raw waveform input. The LGTFB layer is a 1D convolutional layer whose kernels are based on bandpass gammatone filters, which have been used to model auditory systems. Moreover, a normalization method based on switchable normalization (SN) is introduced to improve the generalization ability of the time-frequency representation obtained by the LGTFB. Here, SN learns a weighted combination of instance normalization (IN) per frequency bin and local response normalization (LRN). The proposed normalization method can learn a good combination of the normalization methods to increase training accuracy. Finally, an energy normalized gated CNN (ENGCNN) is proposed to extract features from the LGTFB activation. The purpose of the gated architecture is to pass target sound features and reduce surrounding sound features in the time-frequency domain. However, we find empirically that the gating map depends on the local input energy. To reduce this dependency, the input energy is normalized to obtain a gating function that does not depend on the input energy. To confirm the effectiveness of the considered architecture and normalization method, ESC experiments on ESC-50 and UrbanSound8K were conducted. The proposed model showed state-of-the-art results on the considered datasets. Moreover, an ensemble model achieved a further performance improvement.
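As a rough illustration of the idea that a 1D convolutional layer can use gammatone-shaped kernels, the following NumPy sketch builds a small fixed gammatone filterbank and applies it to a raw waveform. It is not the dissertation's implementation: the standard gammatone impulse response t^(n-1) exp(-2πbt) cos(2πf_c t), the Glasberg and Moore ERB approximation for the bandwidth, the filter order, kernel length, sample rate, center frequencies, and all function names are assumptions chosen for the example. In a learnable version, parameters such as the center frequencies and bandwidths would presumably be trained along with the rest of the network; here they are fixed.

```python
import numpy as np

def erb(fc):
    """Equivalent rectangular bandwidth in Hz (Glasberg and Moore approximation)."""
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)

def gammatone_kernels(center_freqs, sample_rate=16000, kernel_len=401, order=4):
    """Return an array of shape (num_filters, kernel_len) of gammatone kernels.

    All defaults are illustrative assumptions, not values from the dissertation.
    """
    t = np.arange(kernel_len) / sample_rate               # time axis in seconds
    fc = np.asarray(center_freqs, dtype=float)[:, None]   # shape (num_filters, 1)
    b = 1.019 * erb(fc)                                   # bandwidth parameter in Hz
    # Gammatone impulse response: gamma envelope times a cosine carrier.
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    # Normalize each kernel to unit energy so the bands are comparable.
    g /= np.linalg.norm(g, axis=1, keepdims=True)
    return g

def apply_filterbank(waveform, kernels):
    """Convolve a 1D waveform with every kernel; output is (num_filters, num_samples)."""
    return np.stack([np.convolve(waveform, k, mode="same") for k in kernels])

# Usage: a 1 kHz tone should excite the 1 kHz band the most.
sample_rate = 16000
center_freqs = np.array([250.0, 1000.0, 4000.0])
kernels = gammatone_kernels(center_freqs, sample_rate)
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000.0 * t)
response = apply_filterbank(tone, kernels)
band_energy = (response ** 2).mean(axis=1)
strongest_band = int(np.argmax(band_energy))
```

Taking the per-band energy of `response` over short frames would give a time-frequency representation comparable in shape to a filterbank spectrogram, which is the kind of input the abstract describes passing to the later layers.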
Advisors
Yoo, Chang Dong (유창동)
Description
Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2019
Identifier
325007
Language
eng
Description

Doctoral dissertation (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering, 2019.8, [v, 60 p.]

Keywords

Environmental sound classification; gammatone filterbank; energy normalization; convolutional neural network activation

URI
http://hdl.handle.net/10203/283269
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=871443&flag=dissertation
Appears in Collection
EE-Theses_Ph.D.(박사논문)
Files in This Item
There are no files associated with this item.
