Comparing Kullback-Leibler Divergence and Mean Squared Error Loss in Knowledge Distillation

Kim, Taehyeon / Oh, Jaehoon / Kim, Nak Yil / Cho, Sangwook / Yun, Se-Young

Knowledge distillation (KD), which transfers knowledge from a cumbersome teacher model to a lightweight student model, has been investigated as a way to design efficient neural architectures. Generally, the objective function of KD is the Kullback-Leibler (KL) divergence loss between the softened probability distributions of the teacher model and the student model, with a temperature scaling hyperparameter τ. Despite its widespread use, few studies have discussed the influence of such softening on generalization. Here, we theoretically show that the KL divergence loss focuses on logit matching as τ increases and on label matching as τ goes to 0, and we empirically show that logit matching is positively correlated with performance improvement in general. From this observation, we consider an intuitive KD loss function, the mean squared error (MSE) between the logit vectors, so that the student model can directly learn the logits of the teacher model. The MSE loss outperforms the KL divergence loss, which is explained by the difference in the penultimate layer representations between the two losses. Furthermore, we show that sequential distillation can improve performance and that KD, particularly when using the KL divergence loss with small τ, mitigates label noise. The code to reproduce the experiments is publicly available online at https://github.com/jhoon-oh/kd_data/.

Publisher: IJCAI

Issue Date: 2021-08-24

Language: English

Citation: 30th International Joint Conference on Artificial Intelligence (IJCAI-21), pp.2628 - 2635

URI: http://hdl.handle.net/10203/290714

Appears in Collection: RIMS Conference Papers

Files in This Item: There are no files associated with this item.
