Detecting textual adversarial examples through text modification on text classification systems

Cited 2 times in Web of Science; cited 0 times in Scopus
In this paper, we propose a method for detecting adversarial examples using a text modification module. The proposed method detects adversarial examples based on the change in the classification result that occurs when a sample is modified by replacing a specific word with a similar word. The method exploits the fact that an adversarial example is more sensitive to changes in specific words than the original sample. Experiments were conducted on three datasets (AG's News, a movie review dataset, and the IMDB Large Movie Review Dataset), with TensorFlow as the machine learning library. On these datasets, the proposed method detected an average of 71.7% of adversarial sentences while limiting the change in the model's results on original sentences to an average of 2.9%.
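The detection idea described in the abstract can be sketched as follows. The toy lexicon classifier, the similar-word table, and the flip threshold below are illustrative assumptions for the sketch, not the authors' implementation; the principle shown is the one the abstract states, namely that an adversarial sample's label flips more readily under word substitution than a clean sample's.

```python
# Sketch of word-substitution-based adversarial detection (hypothetical
# names and data; not the paper's code).

POS = {"good", "great", "excellent"}   # toy positive-sentiment lexicon
NEG = {"bad", "awful", "poor"}         # toy negative-sentiment lexicon

def classify(sentence):
    """Toy classifier: 'pos' when positive words are at least as frequent."""
    words = sentence.split()
    pos = sum(w in POS for w in words)
    neg = sum(w in NEG for w in words)
    return "pos" if pos >= neg else "neg"

# Hypothetical similar-word table; a real system might use word embeddings
# or WordNet synonyms instead.
SUBSTITUTES = {
    "good": ["great", "decent"],
    "great": ["good"],
    "bad": ["poor", "awful"],
}

def detect_adversarial(sentence, classify_fn, substitutes, flip_threshold=0.2):
    """Flag `sentence` as adversarial when word substitutions often flip its label."""
    words = sentence.split()
    original = classify_fn(sentence)
    flips = trials = 0
    for i, word in enumerate(words):
        for sub in substitutes.get(word, []):
            modified = " ".join(words[:i] + [sub] + words[i + 1:])
            trials += 1
            flips += classify_fn(modified) != original
    # No substitutable words -> no evidence of adversarial behavior.
    return trials > 0 and flips / trials >= flip_threshold

# A borderline ("adversarial-like") sample flips under a single substitution,
# while a clearly positive sample keeps its label for every substitution.
print(detect_adversarial("good bad film", classify, SUBSTITUTES))
print(detect_adversarial("great good film", classify, SUBSTITUTES))
```

The threshold trades off the two numbers the abstract reports: a lower threshold catches more adversarial sentences but changes the model's results on more original sentences.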
Publisher
SPRINGER
Issue Date
2023-08
Language
English
Article Type
Article
Citation
APPLIED INTELLIGENCE, v.53, no.16, pp.19161 - 19185
ISSN
0924-669X
DOI
10.1007/s10489-022-03313-w
URI
http://hdl.handle.net/10203/312291
Appears in Collection
RIMS Journal Papers
Files in This Item
There are no files associated with this item.