DSpace at KOASAS: Textual Adversarial Training of Machine Learning Model for Resistance to Adversarial Examples



Authors: Kwon, Hyun; Lee, Sanghyun

Deep neural networks perform well in image recognition, speech recognition, text recognition, and pattern recognition. However, such networks are vulnerable to adversarial examples. An adversarial example is created by adding a small amount of noise to an original sample so that the change is imperceptible to humans, yet the sample is misclassified by a model. Adversarial examples have been studied mainly in the context of images, but research has expanded to the text domain. In the textual context, an adversarial example is a text sample in which certain important words have been changed so that a model misclassifies it even though, to humans, it has the same meaning and grammar as the original text. In the text domain, there have been relatively few studies on defenses against adversarial examples compared with the number of studies on adversarial example attacks. In this paper, we propose an adversarial training method to defend against adversarial examples that target the latest text model, bidirectional encoder representations from transformers (BERT). In the proposed method, adversarial examples are generated using various parameters and then used for additional training of the target model to instill robustness against unknown adversarial examples. Experiments were conducted on five datasets (AG's News, a movie review dataset, the IMDB Large Movie Review Dataset (IMDB), the Stanford Natural Language Inference (SNLI) corpus, and the Multi-Genre Natural Language Inference (MultiNLI) corpus), with TensorFlow as the machine learning library.
According to the experimental results, the baseline model had an accuracy of 88.1% on the original sentences and 9.2% on the adversarial sentences, whereas the model trained with the proposed method maintained an average accuracy of 87.2% on the original sentences and reached an average accuracy of 22.5% on the adversarial sentences.
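
The general idea described in the abstract (generate adversarial text under several parameter settings, then retrain on it) can be illustrated with a toy sketch. This is not the authors' method: it assumes a bag-of-words perceptron in place of BERT, a greedy synonym swap in place of the paper's attack, and the number of allowed swaps as the varied "parameter". The names `SYNONYMS`, `TRAIN`, `perturb`, and `make_adversarial_set` are hypothetical, and the data is invented for illustration only.

```python
# Toy illustration only: a perceptron stands in for BERT, and a hand-written
# synonym table stands in for a real word-substitution attack (assumptions).
SYNONYMS = {"good": ["fine"], "great": ["superb"],
            "bad": ["poor"], "terrible": ["dreadful"]}

TRAIN = [("good great film", 1), ("bad terrible film", -1),
         ("the plot was good", 1), ("the plot was bad", -1)]


def score(weights, text):
    """Sum of per-word weights; unseen words contribute 0."""
    return sum(weights.get(word, 0) for word in text.split())


def margin(weights, text, label):
    """Positive when the model classifies `text` as `label`."""
    return label * score(weights, text)


def train(samples, epochs=10):
    """Plain perceptron updates over bag-of-words features."""
    weights = {}
    for _ in range(epochs):
        for text, label in samples:
            if margin(weights, text, label) <= 0:
                for word in text.split():
                    weights[word] = weights.get(word, 0) + label
    return weights


def perturb(weights, text, label, max_swaps):
    """Greedily swap up to `max_swaps` words for the synonym that lowers
    the margin most; stop early once the model is no longer correct."""
    words = text.split()
    for _ in range(max_swaps):
        current = margin(weights, " ".join(words), label)
        if current <= 0:
            break
        best_margin, best_words = current, None
        for i, word in enumerate(words):
            for synonym in SYNONYMS.get(word, []):
                candidate = words[:i] + [synonym] + words[i + 1:]
                m = margin(weights, " ".join(candidate), label)
                if m < best_margin:
                    best_margin, best_words = m, candidate
        if best_words is None:
            break
        words = best_words
    return " ".join(words)


def make_adversarial_set(weights, samples, swap_budgets):
    """Generate perturbed copies of `samples` under several swap budgets."""
    adversarial = []
    for budget in swap_budgets:
        for text, label in samples:
            pair = (perturb(weights, text, label, budget), label)
            if pair[0] != text and pair not in adversarial:
                adversarial.append(pair)
    return adversarial


def accuracy(weights, samples):
    """Fraction of samples with a strictly positive margin."""
    return sum(margin(weights, t, y) > 0 for t, y in samples) / len(samples)


baseline = train(TRAIN)
adversarial = make_adversarial_set(baseline, TRAIN, swap_budgets=(1, 2))
robust = train(TRAIN + adversarial)  # additional training on adversarial text

print("baseline on adversarial:", accuracy(baseline, adversarial))
print("retrained on adversarial:", accuracy(robust, adversarial))
```

Note that this sketch evaluates the retrained model on the same perturbed sentences it was trained on, so it only shows the mechanics of the training loop. It says nothing about robustness to unknown adversarial examples, which is what the paper measures.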

Publisher: WILEY-HINDAWI

Issue Date: 2022-04

Language: English

Article Type: Article

Citation: SECURITY AND COMMUNICATION NETWORKS, v.2022

ISSN: 1939-0114

DOI: 10.1155/2022/4511510

URI: http://hdl.handle.net/10203/296558

Appears in Collection: RIMS Journal Papers

