Rethinking the Entropy of Instance in Adversarial Training

Abstract
Adversarial training, which minimizes the loss on adversarially perturbed training examples, has been extensively studied as a way to improve the robustness of deep neural networks. However, most adversarial training methods treat all training examples equally, even though each example may affect the model's robustness differently over the course of training. A few recent works exploit this unequal importance by assigning larger weights to misclassified samples or to samples that violate the margin more severely, and these schemes attain high robustness against untargeted PGD attacks. However, we empirically find that they cause the feature spaces of adversarial samples from different classes to overlap, yielding more high-entropy samples whose labels can easily be flipped. This makes the resulting models more vulnerable to adversarial perturbations, so their seemingly good robustness against PGD attacks actually reflects a false sense of robustness. To address these limitations, we propose a simple yet effective re-weighting scheme that weights the loss of each adversarial training example proportionally to the entropy of its predicted distribution, thereby focusing on examples whose labels are more uncertain.
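The re-weighting idea described in the abstract is easy to sketch in code. The following is a minimal PyTorch illustration, not the authors' implementation: the function name entropy_weighted_loss and the normalization of the weights are assumptions, and the paper's exact scaling of the per-example entropy may differ.

    import torch
    import torch.nn.functional as F

    def entropy_weighted_loss(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        """Weight each example's cross-entropy loss by the Shannon entropy
        of its predicted distribution, so uncertain (high-entropy)
        adversarial examples contribute more to the training signal.
        Illustrative sketch only; the paper's normalization may differ."""
        log_probs = F.log_softmax(logits, dim=1)
        probs = log_probs.exp()
        # Per-example Shannon entropy of the predicted distribution, shape (batch,).
        entropy = -(probs * log_probs).sum(dim=1)
        # Standard per-example cross-entropy on the (adversarial) inputs.
        per_example_ce = F.cross_entropy(logits, targets, reduction="none")
        # Normalize so the batch loss stays on the same scale as unweighted CE
        # (assumed normalization, not necessarily the paper's choice).
        weights = entropy / entropy.mean().clamp_min(1e-12)
        # Detach the weights so the entropy term does not contribute gradients.
        return (weights.detach() * per_example_ce).mean()

In an adversarial training loop, logits would be the model's outputs on PGD-perturbed inputs; detaching the weights keeps the re-weighting from altering the gradient of the entropy itself.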
Publisher
Institute of Electrical and Electronics Engineers Inc.
Issue Date
2023-02-08
Language
English
Citation
2023 IEEE Conference on Secure and Trustworthy Machine Learning, SaTML 2023, pp. 316-326
DOI
10.1109/SaTML54575.2023.00029
URI
http://hdl.handle.net/10203/316276
Appears in Collection
AI-Conference Papers (Conference Papers)
Files in This Item
There are no files associated with this item.
