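Non-contrastive self-supervised learning with UNO process for respiratory sound classification (Korean title: Evaluation of a UNO process-based non-contrastive self-supervised learning method for respiratory auscultation sound classification)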

Both auscultation and deep learning have come a long way since the 19th century. Advances in technology open new possibilities in respiratory auscultation by joining forces with deep learning-based analysis of digital stethoscope sounds. This new era of auscultation promises exciting applications, such as augmented intelligence for cost-effective medical student training and telemedicine. Unfortunately, it is hard for deep neural networks to generalise over complex data representations in a small-scale, long-tailed regime. Given other potential limitations, such as the trustworthiness of labels, limited audio augmentation methods and privacy constraints, we believe that self-supervised learning (SSL) is a good starting point for addressing these problems through representation learning. However, one of the state-of-the-art SSL models, BYOL, suffers from dimensional collapse on the ICBHI'17 dataset because of a low initial latent entropy caused by the limited training sample size and small data augmentation distribution. Therefore, we propose a non-contrastive SSL algorithm, the Unpredictable Neuron Operation (UNO) process. UNO process is a simple yet effective algorithm that utilises neuron masking as a model augmentation variant to maximise the latent representation and avoid collapse. We replace the BatchNorm layer of the predictor with a Neuron Mask, which ignores neurons or injects noise into the prediction latent by sampling from a Binomial or Gaussian distribution. We demonstrate theoretically and empirically that UNO process is invariant to training sample size and data augmentation distribution while acting as an upper bound to the latent representation. Our UNO-trained audio spectrogram transformer model reaches a score of 59.14% on the official split of the pretext dataset ICBHI'17 and 84.23% on the downstream dataset Fraiwan. In addition, UNO process alleviates the effect of long-tailed data imbalance, and our ablation study suggests that it is more robust to the choice of data augmentation.
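The neuron-masking idea in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the function name `neuron_mask`, the parameters `p` and `sigma`, the choice to zero out masked neurons, and the additive form of the Gaussian noise are all assumptions made for illustration, since the record does not give these details.

```python
import random


def neuron_mask(latent, mode="binomial", p=0.5, sigma=0.1, rng=None):
    """Hypothetical neuron mask applied to a prediction latent (list of floats).

    mode="binomial": each neuron is independently zeroed ("ignored") with
                     probability p (assumed behaviour, no rescaling).
    mode="gaussian": zero-mean Gaussian noise with standard deviation sigma
                     is added to each neuron (assumed additive form).
    """
    rng = rng or random.Random()
    if mode == "binomial":
        # Keep the neuron when the uniform draw falls at or above p.
        return [x if rng.random() >= p else 0.0 for x in latent]
    if mode == "gaussian":
        return [x + rng.gauss(0.0, sigma) for x in latent]
    raise ValueError(f"unknown mode: {mode}")


# Example: perturb a toy 4-dimensional prediction latent.
latent = [0.3, -1.2, 0.8, 2.1]
masked = neuron_mask(latent, mode="binomial", p=0.5, rng=random.Random(0))
noisy = neuron_mask(latent, mode="gaussian", sigma=0.1, rng=random.Random(0))
```

In a real model such a layer would presumably operate on batched tensors in the predictor during training; the list-based version here only shows the two sampling modes named in the abstract.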
Advisors
Youn, Chan-Hyun (윤찬현)
Description
Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2022
Identifier
325007
Language
eng
Description

Thesis (Master's) - Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering, August 2022, [v, 71 p.]

Keywords

Respiratory Sounds Classification; Non-Contrastive Self-Supervised Learning; Audio Deep Learning; Representation Learning; Entropy Maximisation

URI
http://hdl.handle.net/10203/309991
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1008385&flag=dissertation
Appears in Collection
EE-Theses_Master(석사논문)
Files in This Item
There are no files associated with this item.
