KAIST
Research of Adversarial Example
on a Deep Neural Network
18 February 2019
Hyun Kwon
2
Outline
Introduction
Related work
Adversarial example attacks are divided into four categories
Target model information, distance measure, recognition, and generation method
Adversarial example defense
Reactive defense
Proactive defense
Problem Statement
Scheme 1
Scheme 2
Conclusion
Reference
3
Deep neural network (DNN)
Effective performance in machine learning tasks
Image recognition
Speech recognition
Intrusion detection
Pattern analysis
Introduction
4
Threat to the security of DNN
Adversarial example
Slightly modified data that lead to incorrect classification
Introduction
<DNN figure: input layer, hidden layers, and output layer of nodes connected by weight links; the clean sample is classified as class 0 with Pr(0) = 0.89, Pr(1) = 0.03, …, Pr(n−1) = 0.01, Pr(n) = 0.02>
5
Threat to the security of DNN
Adversarial example
Slightly modified data that lead to incorrect classification
Introduction
<DNN figure: the same network misclassifies the adversarial example as class n with Pr(0) = 0.02, Pr(1) = 0.03, …, Pr(n−1) = 0.01, Pr(n) = 0.84>
6
Introduction
Adversarial example generation
<Flowchart: feed 𝑋∗ = 𝑋 + 𝑤 into the neural network, calculate each class probability, check whether the targeted class is chosen, and adjust 𝑤 otherwise. Ex) targeted class “0”: 𝑓(𝑋) = 7, and the loop ends when 𝑓(𝑋∗) = 𝑓(𝑋 + 𝑤) = 0>
7
Introduction
Adversarial example 𝑥∗
𝑥∗: argmin_{𝑥∗} 𝐿(𝑥, 𝑥∗) s.t. 𝑓(𝑥∗) ≠ 𝑓(𝑥)
𝐿(∙): distance between 𝑥 and 𝑥∗
<Figure: the original 𝑥 and several candidate examples 𝑥∗ at distance 𝐿(𝑥, 𝑥∗)>
8
Introduction
Categories of ML security issues
Causative attacks influence learning with control over training data
Ex) poisoning attack
Exploratory attacks exploit misclassification but don’t affect training
Ex) Adversarial example
Poisoning attack
Decreases the recognition accuracy of the target model
By adding malicious training data to the targeted model
Assumption
The attacker must be able to access the training data
9
Introduction
Adversarial example
Causes misclassification by adding a small amount of noise to the sample.
Szegedy et al. first presented the adversarial example
The attacker transforms an image slightly, creating an adversarial example
Advanced attacks and their countermeasures have been proposed
10
Type of target model information
White box attack
The attacker knows detailed information about the target model
Model architecture, parameters, and class probabilities
The success rate of a white box attack reaches almost 100%
Black box attack
The attacker does not know the information about the target model
They can only query the target model
The well-known black box attacks are threefold
Transferability
Universal perturbation
Substitute network
11
Type of target model information
Transferability (black-box attack)
An adversarial example crafted for a single target model is also effective against
other models.
Adversarial examples generated using ensemble-based approaches can
successfully attack black box image classification.
Yanpei Liu, Xinyun Chen, Chang Liu, and Dawn Song. Delving into transferable adversarial examples and black-box attacks. ICLR,
abs/1611.02770, 2017.
Kwon, Hyun, et al. "Advanced ensemble adversarial example on unknown deep neural network classifiers." IEICE Transactions on
Information and Systems 101.10 (2018): 2485-2500.
12
Type of target model information
Universal perturbation
Find a single universal perturbation vector 𝜂 such that
‖𝜂‖_𝑝 ≤ 𝜖
𝑃(𝑓(𝑥 + 𝜂) ≠ 𝑓(𝑥)) ≥ 1 − 𝜎
𝜖 limits the size of the universal perturbation
𝜎 controls the failure rate over all the adversarial examples
The loop continues until most data samples are fooled (while the fooling rate 𝑃 < 1 − 𝜎)
Seyed Mohsen Moosavi Dezfooli, Alhussein Fawzi, Omar Fawzi, and Pascal Frossard.
Universal adversarial perturbations.
In Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), number EPFL-CONF-226156, 2017.
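The loop above can be sketched in a few lines. This is a toy sketch, not the paper's algorithm: a linear two-class model stands in for the DNN, and a one-step linear boundary crossing stands in for the DeepFool inner step; the sample points and radii are made up for illustration.

```python
import numpy as np

# Toy linear classifier standing in for f: sign(w·x) gives class 0 or 1.
w = np.array([1.0, -1.0])
f = lambda x: int(w @ x > 0)

def universal_perturbation(X, f, eps, sigma, max_iters=20):
    """Sketch of the loop: grow a single perturbation eta until at least
    a (1 - sigma) fraction of the samples are fooled, projecting eta
    back into an L2 ball of radius eps after each update."""
    eta = np.zeros_like(X[0])
    for _ in range(max_iters):
        fooled = np.mean([f(x + eta) != f(x) for x in X])
        if fooled >= 1 - sigma:
            break
        for x in X:
            if f(x + eta) == f(x):
                # minimal step across the linear decision boundary
                # (a stand-in for the DeepFool inner step of the paper)
                margin = w @ (x + eta)
                eta = eta - (margin + np.sign(margin) * 1e-3) * w / (w @ w)
                norm = np.linalg.norm(eta)
                if norm > eps:           # project back into the eps-ball
                    eta = eta * eps / norm
    return eta

X = np.array([[0.1, 0.0], [0.2, 0.0], [0.15, 0.05]])  # all class 1
eta = universal_perturbation(X, f, eps=0.5, sigma=0.0)
```

One perturbation `eta` then flips every sample in `X`, which is the defining property of the universal attack.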
13
Type of target model information
Substitute network (black-box attack)
The attacker can create a substitute network similar to the target model
By repeating the query process.
Once the substitute network is created, the attacker can perform a white box
attack on it.
Approximately 80% attack success against Amazon and Google services
Nicolas Papernot, Patrick McDaniel, Ian Goodfellow, Somesh Jha, Z Berkay Celik, and Ananthram Swami. Practical black-box attacks
against machine learning. In Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, pages
506–519. ACM, 2017.
14
Type of distance measure
There are three ways to measure the distortion: 𝐿0, 𝐿2, 𝐿∞
𝐿0 is the number of changed pixels: Σᵢ₌₀ⁿ 𝟙(𝑥ᵢ ≠ 𝑥ᵢ∗)
𝐿2 is the standard Euclidean norm: √(Σᵢ₌₀ⁿ (𝑥ᵢ − 𝑥ᵢ∗)²)
𝐿∞ is the maximum distance between 𝑥ᵢ and 𝑥ᵢ∗: maxᵢ |𝑥ᵢ − 𝑥ᵢ∗|
As a distance measure becomes smaller,
the adversarial example becomes more similar to the original sample.
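The three distortion measures above are one-liners in numpy. A minimal sketch on a hypothetical 4-pixel image (the pixel values are made up for illustration):

```python
import numpy as np

# Hypothetical 4-pixel images: x_adv differs from x in two pixels.
x = np.array([0.0, 0.5, 0.5, 1.0])
x_adv = np.array([0.0, 0.8, 0.5, 0.6])

diff = x_adv - x
l0 = np.count_nonzero(diff)        # number of changed pixels
l2 = np.sqrt(np.sum(diff ** 2))    # standard Euclidean norm
linf = np.max(np.abs(diff))        # largest single-pixel change
```

Here two pixels changed (𝐿0 = 2), the Euclidean distortion is 0.5, and the largest single change is 0.4.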
15
Type of target recognition
There are two types: targeted attack and untargeted attack
Targeted attack
Makes the target model recognize the adversarial example as a
particular intended class
𝑥∗: argmin_{𝑥∗} 𝐿(𝑥, 𝑥∗) s.t. 𝑓(𝑥∗) = 𝑦∗
Untargeted attack
Makes the target model recognize the adversarial example as any class
other than the original class
𝑥∗: argmin_{𝑥∗} 𝐿(𝑥, 𝑥∗) s.t. 𝑓(𝑥∗) ≠ 𝑦
16
Methods of adversarial attack
Fast-gradient sign method (FGSM)
Take a step in the direction of the gradient of the loss function
𝑥∗ = 𝑥 + 𝜖 ∙ sign(∇loss_{𝐹,𝑡}(𝑥))
Simple, with good performance
Iterative FGSM (I-FGSM)
Updated version of FGSM
Instead of a single step of size 𝜖, smaller steps of size 𝛼 are used
The result is clipped by the same 𝜖
𝑥ᵢ∗ = 𝑥ᵢ₋₁∗ − clip_𝜖(𝛼 ∙ sign(∇loss_{𝐹,𝑡}(𝑥ᵢ₋₁∗)))
Ian Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. In International
Conference on Learning Representations, 2015.
Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial examples in the physical world. ICLR Workshop, 2017.
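Both updates above fit in a few lines of numpy. This is a minimal sketch, assuming the untargeted sign convention (gradient ascent on the loss; the slide's targeted variant subtracts instead), with a toy all-ones gradient standing in for a real ∇loss:

```python
import numpy as np

def fgsm(x, grad, eps):
    """One FGSM step: x* = x + eps * sign(grad of the loss w.r.t. x)."""
    return np.clip(x + eps * np.sign(grad), 0.0, 1.0)

def i_fgsm(x, grad_fn, eps, alpha, steps):
    """Iterative FGSM: many small steps of size alpha, kept inside an
    eps-ball around the original x (the clip_eps operation)."""
    x_adv = x.copy()
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(grad_fn(x_adv))
        x_adv = np.clip(x_adv, x - eps, x + eps)   # clip to the eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)           # keep a valid image
    return x_adv

# Toy gradient: pretend the loss increases with every pixel value.
grad_fn = lambda x: np.ones_like(x)
x = np.full(4, 0.5)
x_star = i_fgsm(x, grad_fn, eps=0.1, alpha=0.03, steps=10)
```

With this toy gradient both variants push every pixel to the edge of the 𝜖-ball, i.e. from 0.5 to 0.6.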
17
Methods of adversarial attack
Carlini-Wagner (CW)
7 objective functions (𝑓1–𝑓7) + 3 distance attacks (𝐿0, 𝐿2, 𝐿∞)
Choosing a constant 𝑐 (controls the weight of the class term)
minimize 𝐷(𝑥, 𝑥 + 𝑤) + 𝑐 ∙ 𝑓(𝑥 + 𝑤) s.t. 𝑥 + 𝑤 ∈ [0, 1]ⁿ
Confidence 𝑘 (controls distortion)
𝑓(𝑥∗) = max(max{𝑍(𝑥∗)ᵢ : 𝑖 ≠ 𝑡} − 𝑍(𝑥∗)ₜ, −𝑘)
It can be applied to the image and audio domains.
Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. In Security and Privacy (SP), 2017 IEEE
Symposium on, pages 39–57. IEEE, 2017.
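The CW objective above can be sketched directly in numpy. A minimal sketch, assuming 𝐷 is taken as squared L2 and `z_fn` is a stand-in for the model's logit function 𝑍:

```python
import numpy as np

def cw_objective(z, t, k=0.0):
    """f(x*) = max( max{Z(x*)_i : i != t} - Z(x*)_t , -k ).
    z are the logits Z(x*); the value bottoms out at -k once the
    target class t wins by a margin of at least k."""
    other = np.max(np.delete(z, t))
    return max(other - z[t], -k)

def cw_loss(x, w, c, z_fn, t, k=0.0):
    """minimize D(x, x + w) + c * f(x + w), with D taken as squared L2."""
    return np.sum(w ** 2) + c * cw_objective(z_fn(x + w), t, k)
```

For logits [3, 1, 0.5] and target 𝑡 = 1 the objective is 2 (class 1 still loses by 2); once class 1 wins it drops to −𝑘.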
18
Methods of adversarial attack
One pixel attack
Modifies only one pixel to cause the misclassification
Differential evolution (DE) is used to find the optimal solution
Current solution (father) vs. candidate solution (child)
70.97% success rate on CIFAR10
Against All Convolutional Network (AllConv), Network in Network (NiN), and VGG16
Su, Jiawei, Danilo Vasconcellos Vargas, and Kouichi Sakurai. "One pixel attack for fooling deep neural networks." IEEE
Transactions on Evolutionary Computation (2019).
19
Methods of defense
The defense of adversarial examples has two types
Reactive: detect the adversarial example
Proactive: make deep neural networks more robust.
Reactive defense
Adversarial example detection
Proactive defense
Distillation method
Adversarial training
Filtering method
Ensemble defense methods are available.
20
Reactive defense
Adversarial detecting
Binary threshold: uses the last layer’s output as the features
Distinguishes distribution differences
Confidence value, p-value
Y.-C. Lin, M.-Y. Liu, M. Sun, and J.-B. Huang, “Detecting adversarial attacks on neural network policies with visual
foresight,” arXiv preprint arXiv:1710.00814, 2017.
T. Pang, C. Du, Y. Dong, and J. Zhu, “Towards robust detection of adversarial examples,”
arXiv preprint arXiv:1706.00633, 2017.
21
Proactive defense
Distillation method
Uses two neural networks (detailed class probabilities)
Ex) class “1”: [0 1 0 0 0 0 0 0 0 0] → class “1”: [0.02 0.91 … 0.02]
Prevents the attacker from computing useful gradients of the loss function.
<Figure: the initial network 𝑓(𝑥) is trained at temperature 𝑇 on the hard labels 𝑦 (e.g., [0 1 0 0]) and outputs soft probability vectors (e.g., [0.02 0.92 0.04 0.02]); the distilled network 𝑓_distil(𝑥), also trained at temperature 𝑇, uses those soft outputs 𝑓(𝑥) as its training labels and produces similar predictions (e.g., [0.03 0.93 0.01 0.03])>
Nicolas Papernot, Patrick McDaniel, XiWu, Somesh Jha, and Ananthram Swami. Distillation as a defense to adversarial
perturbations against deep neural networks. In Security and Privacy (SP), 2016 IEEE Symposium on, pages 582–597.
IEEE, 2016.
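The soft labels that drive distillation come from a temperature-scaled softmax. A minimal sketch (the logits are made up for illustration):

```python
import numpy as np

def softmax_T(z, T):
    """Softmax at temperature T: large T yields the softer, more detailed
    class probabilities that the distilled network is trained on."""
    z = np.asarray(z, dtype=float) / T
    e = np.exp(z - z.max())       # subtract max for numerical stability
    return e / e.sum()

logits = np.array([2.0, 10.0, 4.0, 2.0])
hard = softmax_T(logits, T=1)     # nearly one-hot, like [0 1 0 0]
soft = softmax_T(logits, T=20)    # detailed, like [0.02 0.91 ... 0.02]
```

At 𝑇 = 1 the winning class takes almost all the mass; at high 𝑇 the same logits yield the detailed probability vector shown in the figure.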
22
Proactive defense
Adversarial training
Trains on original examples + adversarial examples
Simple and effective
Care must be taken that the accuracy on original samples is not
compromised.
Ensemble adversarial training
Uses several neural networks
It is more resistant to adversarial examples.
Tramèr, Florian, et al. "Ensemble adversarial training: Attacks and defenses." arXiv preprint
arXiv:1705.07204 (2017).
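The clean-plus-adversarial training loop can be sketched end to end. This is a toy sketch under stated assumptions: a logistic-regression "network" on synthetic 2-d points stands in for the DNN, and FGSM on the current weights stands in for the attack; a real defense would use a DNN and a stronger attack.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = (X[:, 0] > 0).astype(float)     # label depends only on the first feature

sigmoid = lambda z: 1 / (1 + np.exp(-z))

def adversarial_training(X, y, eps=0.1, lr=0.5, epochs=200):
    """Each epoch: craft FGSM examples against the current weights,
    then train on the clean and adversarial batches together."""
    w = np.zeros(2)
    for _ in range(epochs):
        p = sigmoid(X @ w)
        # FGSM on the input: gradient of the logistic loss w.r.t. x
        grad_x = (p - y)[:, None] * w[None, :]
        X_adv = X + eps * np.sign(grad_x)
        for Xb in (X, X_adv):                 # clean + adversarial batches
            p = sigmoid(Xb @ w)
            w -= lr * Xb.T @ (p - y) / len(y)
    return w

w = adversarial_training(X, y)
acc = np.mean((sigmoid(X @ w) > 0.5) == (y == 1))
```

The mixed batches keep clean accuracy high while the model also sees, every epoch, the perturbations an attacker would use against it.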
23
Proactive defense
Filtering method
Eliminates the perturbation of the adversarial example
Creating the filtering module requires time and processing.
<Figure: a generator is trained to filter 𝑋_adv before the targeted model, which then classifies the filtered input as “7” with a 96.5% success rate>
Shen et al. "AE-GAN: adversarial eliminating with GAN." arXiv preprint arXiv:1707.05474 (2017).
24
Ensemble defense method
Magnet method
There are two modules: Detector and Reformer
Generality
Detector
Finds the adversarial example by comparing its output with that of several
original samples.
Detects adversarial examples with large distortion.
However, if the distortion is small, the detection probability is
lowered.
Multiple detector configurations can also be combined
<Pipeline: Detector → Reformer → Classifier>
Dongyu Meng and Hao Chen. Magnet: a two-pronged defense against adversarial examples. In Proceedings of the 2017 ACM SIGSAC
Conference on Computer and Communications Security, pages 135–147. ACM, 2017.
25
Ensemble defense method
Reformer
Targets adversarial examples with small distortion.
An auto-encoder is used.
Reform: converts an adversarial example into an output that most
closely resembles the original sample.
26
Ensemble defense method
Ex) Magnet method
Detector
Reformer
Classifier
<37 examples pass the Detector>
<Of those 37, only 2 adversarial examples succeed after the Reformer>
27
Ensemble defense method
Feature Squeezing
Detects adversarial examples
By comparing the model’s outputs on three versions of the input (the original and its squeezed versions)
Xu, Weilin, David Evans, and Yanjun Qi. "Feature squeezing: Detecting adversarial examples in deep neural networks." NDSS 2018.
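The comparison above can be sketched with one of the paper's squeezers, bit-depth reduction. This is a toy sketch: the brittle threshold classifier, the single squeezer, and the detection threshold are all illustrative assumptions, not the paper's setup.

```python
import numpy as np

def bit_depth_squeeze(x, bits):
    """Reduce color bit depth, one of the paper's squeezers."""
    levels = 2 ** bits - 1
    return np.round(x * levels) / levels

def detect(x, predict, squeezers, threshold):
    """Flag x as adversarial when the prediction on the original input
    differs too much (L1 distance) from the prediction on any squeezed
    version; the threshold would be tuned on clean data."""
    p0 = predict(x)
    scores = [np.sum(np.abs(p0 - predict(s(x)))) for s in squeezers]
    return max(scores) > threshold

# Toy brittle classifier: class 0 iff the first pixel exceeds 0.75.
predict = lambda x: np.array([1.0, 0.0]) if x[0] > 0.75 else np.array([0.0, 1.0])
squeezers = [lambda x: bit_depth_squeeze(x, 2)]
```

An input sitting just past the decision boundary (0.76) flips class after squeezing and is flagged; a comfortably classified input (0.9) is not.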
28
Scheme 1. Problem Definition
In the military domain, the adversarial example is useful
It can deceive an enemy’s machine classifier
Ex) Battlefield road signs
Modified to deceive an adversary’s self-driving vehicle
But friendly self-driving vehicles should not be deceived
<Figure: the friendly self-driving vehicle reads the sign as “Left”; the adversary’s self-driving vehicle reads it as “Right”>
29
Scheme 2. Problem Definition
Untargeted adversarial examples
tend to be misclassified into
certain classes for each original class
This makes the misclassification
condition easy to satisfy
But it creates a pattern problem
when generating untargeted adversarial
examples
By analyzing the output classes,
the defense can determine the
original class.
<Confusion matrix in MNIST>
30
Scheme 1
Friend-safe Evasion Attack: an adversarial example that is
correctly classified by a friendly classifier.
Goal
Proposed Methods
Experiment & Evaluation
Discussion
Kwon, Hyun, et al. "Friend-safe evasion attack: An adversarial example that is correctly recognized by a friendly classifier."
Computers & Security 78 (2018): 380-397.
31
Scheme 1. Goal
Proposes an evasion attack scheme that creates an adversarial example
Incorrectly classified by enemy classifiers
Correctly recognized by friendly classifiers
While maintaining low distortion
Has two configurations
Targeted classes: the original sample is recognized as a specific class
Untargeted classes: misclassified into any class other than the right class
Analyzes the differences between the two configurations
Difference in distortion between the targeted and untargeted schemes
Differences among targeted digits
32
Scheme 1. Proposed method
Given 𝐷_friend, 𝐷_enemy, and an original input 𝑥 ∈ 𝑋,
the problem is an optimization problem that generates 𝑥∗
Targeted adversarial example
𝑥∗: argmin_{𝑥∗} 𝐿(𝑥, 𝑥∗) s.t. 𝐷_friend(𝑥∗) = 𝑦 and 𝐷_enemy(𝑥∗) = 𝑦∗ (targeted class)
Untargeted adversarial example
𝑥∗: argmin_{𝑥∗} 𝐿(𝑥, 𝑥∗) s.t. 𝐷_friend(𝑥∗) = 𝑦 and 𝐷_enemy(𝑥∗) ≠ 𝑦
<Proposed architecture: a transformer takes the original sample 𝑥 and original class 𝑦, produces 𝑥∗, and feeds it to both 𝐷_friend and 𝐷_enemy, whose loss functions provide the feedback for the update>
33
Scheme 1. Proposed method
A friend-safe adversarial example is generated by minimizing 𝑙𝑜𝑠𝑠_𝑇
𝑙𝑜𝑠𝑠_𝑇 = 𝑙𝑜𝑠𝑠_distortion + 𝑙𝑜𝑠𝑠_friend + 𝑙𝑜𝑠𝑠_enemy
𝑙𝑜𝑠𝑠_distortion: the distortion of the transformed example
𝑙𝑜𝑠𝑠_distortion = ‖𝑥∗ − tanh(𝑥)/2‖₂²
𝑙𝑜𝑠𝑠_friend: makes 𝐷_friend(𝑥∗) predict the original class with higher probability
𝑙𝑜𝑠𝑠_friend = 𝑔_𝑓(𝑥∗), where 𝑔_𝑓(𝑘) = max{𝑍(𝑘)ᵢ : 𝑖 ≠ org} − 𝑍(𝑘)_org, and org is the original class
𝑍(∙) is the class probability predicted by 𝐷_friend or 𝐷_enemy
𝑙𝑜𝑠𝑠_enemy: makes 𝐷_enemy(𝑥∗) predict another class with higher probability
(Targeted) 𝑙𝑜𝑠𝑠_enemy = 𝑔_𝑒ᵗ(𝑥∗), where 𝑔_𝑒ᵗ(𝑘) = max{𝑍(𝑘)ᵢ : 𝑖 ≠ 𝑡} − 𝑍(𝑘)ₜ, and 𝑡 is the targeted class chosen by the attacker
(Untargeted) 𝑙𝑜𝑠𝑠_enemy = 𝑔_𝑒ᵘ(𝑥∗), where 𝑔_𝑒ᵘ(𝑘) = 𝑍(𝑘)_org − max{𝑍(𝑘)ᵢ : 𝑖 ≠ org}
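The total loss can be sketched in numpy. A simplified sketch: `z_friend` and `z_enemy` stand for the two models' class probabilities on 𝑥∗ (here fixed vectors for illustration), and the optimizer and box-constraint handling of the real method are omitted.

```python
import numpy as np

def g(z, cls):
    """max{Z_i : i != cls} - Z_cls: negative when class cls wins."""
    return np.max(np.delete(z, cls)) - z[cls]

def loss_T(x, x_star, z_friend, z_enemy, org, target=None):
    """Total friend-safe loss: distortion + friend term (D_friend should
    keep predicting org) + enemy term (targeted or untargeted)."""
    loss_distortion = np.sum((x_star - np.tanh(x) / 2) ** 2)
    loss_friend = g(z_friend, org)     # small when the friend says org
    if target is not None:             # targeted: enemy should say target
        loss_enemy = g(z_enemy, target)
    else:                              # untargeted: enemy should not say org
        loss_enemy = -g(z_enemy, org)
    return loss_distortion + loss_friend + loss_enemy

z_friend = np.array([0.1, 0.8, 0.1])   # friend predicts the original class 1
z_enemy = np.array([0.7, 0.2, 0.1])    # enemy predicts the targeted class 0
total = loss_T(np.zeros(2), np.zeros(2), z_friend, z_enemy, org=1, target=0)
```

When both conditions already hold, as here, every term is at its minimum and the total is negative; the transformer's optimizer drives 𝑥∗ toward exactly this state.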
34
Scheme 1. Proposed method
35
Scheme 1. Experiment & Evaluation
Dataset: MNIST and CIFAR10
A collection of handwritten digit images (0-9)
A collection of color images in 10 classes
Planes, cars, birds, cats, deer, dogs, frogs, horses, boats, and trucks
Language: python 2.3
Machine learning library: Tensorflow
Server: Xeon E5-2609 1.7 GHz
36
Scheme 1. Experiment & Evaluation
Experimental method
First, 𝐷_friend and 𝐷_enemy are pre-trained.
𝐷_friend: CNN; 𝐷_enemy: distillation
60,000 training samples (original MNIST)
10,000 test samples (original MNIST)
𝐷_friend and 𝐷_enemy accuracy: 99.25% and
99.12%
Second, the transformer updates its output 𝑥∗
and gives it to 𝐷_friend and 𝐷_enemy, from which
it then receives feedback (for a number of iterations).
Adam is used as the optimizer to minimize
𝑙𝑜𝑠𝑠_𝑇
Learning rate: 1e−2; initial constant: 1e−3
<𝐷𝑓𝑟𝑖𝑒𝑛𝑑 and 𝐷𝑒𝑛𝑒𝑚𝑦 architecture>
<𝐷𝑓𝑟𝑖𝑒𝑛𝑑 and 𝐷𝑒𝑛𝑒𝑚𝑦 parameter>
37
Scheme 1. Experiment & Evaluation
Experimental result
Two sections: targeted and untargeted adversarial examples
Targeted adversarial example
39
Scheme 1. Experiment & Evaluation
40
Scheme 1. Experiment & Evaluation
<Targeted attack success rate, 𝐷𝑓𝑟𝑖𝑒𝑛𝑑 accuracy, and average distortion>
<Images of the friend-safe adversarial example for the iteration>
41
Scheme 1. Experiment & Evaluation
Untargeted adversarial example
<Confusion matrix of 𝐷𝑒𝑛𝑒𝑚𝑦 for an untargeted class (400 iterations)>
42
Scheme 1. Experiment & Evaluation
<Untargeted attack success rate, 𝐷𝑓𝑟𝑖𝑒𝑛𝑑 accuracy, and average distortion>
43
Scheme 1. Experiment & Evaluation
<Comparison between targeted and untargeted attacks when the success rate is 100%>
44
Scheme 1. Experiment & Evaluation
Targeted adversarial example
45
Scheme 1. Experiment & Evaluation
Untargeted adversarial example
46
Scheme 1. Experiment & Evaluation
<Comparison between targeted and untargeted attacks when the success rate is 100%>
47
Scheme 1. Discussion
If the two models are exactly the same, it is impossible to generate the example.
If the two models are very similar, it is possible to generate the example.
Same model, but a different training set or a different training-sample order
Untargeted attacks require less distortion and are ideal when
targeting is unnecessary.
A covert channel scheme can be applied
The roles of the friend and enemy are reversed.
The targeted class is hidden information transferred via the covert channel.
48
Scheme 1. Discussion
A covert channel scheme can be applied.
It is a matter of ascertaining which of the nine objects (those other than the
visible object) is hidden.
50
Scheme 1. Discussion
A multi-targeted adversarial example can be applied.
The attacker makes multiple models recognize a single original image as
different classes
Ex) Battlefield road signs
Adversary 1 Adversary 3
“U-turn” “Right”
Adversary 2
“Straight”
51
Scheme 1. Discussion
Multi-targeted Adversarial Example
Kwon, Hyun, et al. "Multi-Targeted Adversarial Example in Evasion Attack on Deep Neural Network." IEEE Access 6 (2018): 46084-
46096.
52
Scheme 1. Discussion
Multi-targeted Adversarial Example
53
Scheme 2
Random Untargeted Adversarial Example
Goal
Proposed Methods
Experiment & Evaluation
Discussion
Kwon, Hyun, et al. "Fooling a Neural Network in Military Environments: Random Untargeted Adversarial Example." MILCOM 2018-2018
IEEE Military Communications Conference (MILCOM). IEEE, 2018.
Kwon, Hyun, et al. "Random Untargeted Adversarial Example on Deep Neural Network." Symmetry 10.12 (2018): 738.
54
Scheme 2. Goal
Proposes the random untargeted adversarial example
Uses an arbitrary class in the generation process
Keeps a high attack success rate
Maintains low distortion
Eliminates the pattern vulnerability
Analyzed via the confusion matrix
55
Scheme 2. Proposed method
Given the target model 𝐷, an original input 𝑥 ∈ 𝑋, and a random class 𝑟,
the problem is an optimization problem that generates the random
untargeted adversarial example 𝑥∗
𝑥∗: argmin_{𝑥∗} 𝐿(𝑥, 𝑥∗) s.t. 𝑓(𝑥∗) = 𝑟 (𝑟 ≠ 𝑦)
𝐿(∙) is the distance between the original sample 𝑥 and the transformed example 𝑥∗.
𝑓(∙) is the operation function of the target model 𝐷.
<Proposed architecture: a transformer takes the original sample 𝑥, original class 𝑦, and random class 𝑟, produces 𝑥∗, and feeds it to the target model 𝐷, whose loss function provides the feedback for the update>
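The random-class selection at the heart of the scheme can be sketched as follows. A minimal sketch: the uniform draw, the 10-class setting, and the draw count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)

def random_target(y, num_classes):
    """Pick the random class r used by the scheme: any class except the
    original y, drawn uniformly so the wrong answers leave no pattern."""
    choices = [c for c in range(num_classes) if c != y]
    return int(rng.choice(choices))

# Over many draws the misclassified classes spread evenly, so a defender
# cannot infer the original class from the confusion matrix.
counts = np.bincount([random_target(3, 10) for _ in range(9000)], minlength=10)
```

Every class except the original (3) is hit roughly 1,000 times, which is exactly the flat confusion-matrix row the scheme aims for.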
56
Scheme 2. Experiment & Evaluation
Experimental result
The confusion matrices of MNIST and CIFAR10 for the proposed example
The wrong classes are evenly distributed for each original class
There is no pattern vulnerability
< Confusion matrix in MNIST > < Confusion matrix in CIFAR10 >
57
Scheme 2. Discussion
The proposed scheme can attack enemy DNNs because it eliminates the
pattern vulnerability.
The proposed method requires white-box access to the target model.
Distortion depends on the dataset dimension.
Distortion: the square root of the sum of squared pixel differences from the original data.
MNIST is a (28×28×1) pixel matrix and CIFAR10 is a (32×32×3) pixel matrix.
The proposed scheme can be applied to various applications.
Road signs
Audio domain
Camouflage
58
Conclusion
Advanced attacks and their defenses continue to be proposed.
Recently, interest in black-box attacks has increased.
In addition, detection methods for adversarial examples have been studied.
Applications
CAPTCHA system
Face recognition system
Speech recognition system
59
Reference
Hyun Kwon, Yongchul Kim, Hyunsoo Yoon, Daeseon Choi, "Optimal Cluster Expansion-Based Intrusion Tolerant
System to Prevent Denial of Service Attacks." Applied Sciences, 2017.11
Hyun Kwon, Yongchul Kim, Hyunsoo Yoon, Daeseon Choi, "CAPTCHA Image Generation Systems Using Generative
Adversarial Networks.” IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2017.10
Hyun Kwon, Yongchul Kim, Ki-Woong Park, Hyunsoo Yoon, Daeseon Choi, “Advanced Ensemble Adversarial Example
on Unknown Deep Neural Network Classifiers”, IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2018.07
Hyun Kwon, Yongchul Kim, Hyunsoo Yoon, Daeseon Choi, "Friend-safe Evasion Attack: an adversarial example that is
correctly recognized by a friendly classifier.” COMPUTERS & SECURITY, 2018.08
Hyun Kwon, Yongchul Kim, Ki-Woong Park, Hyunsoo Yoon, Daeseon Choi, “Multi-targeted Adversarial Example in
Evasion Attack on Deep Neural Network”, IEEE Access, 2018.08
Hyun Kwon, Yongchul Kim, Hyunsoo Yoon, Daeseon Choi, "Random Untargeted Adversarial Example on Deep Neural
Network,” Symmetry, 2018.12
Hyun Kwon, Hyunsoo Yoon, Daeseon Choi. "Friend-Safe Adversarial Examples in an Evasion Attack on a Deep Neural
Network”, International Conference on Information Security and Cryptology (ICISC 2017: pp. 351-367). Springer
Cham, 2017.11
Hyun Kwon, Hyunsoo Yoon, Daeseon Choi. "POSTER: Zero-Day Evasion Attack Analysis on Race between Attack and
Defense”, The 13th ACM on Asia Conference on Computer and Communications Security (ASIA CCS '18), 2018.06
Hyun Kwon, Yongchul Kim, Hyunsoo Yoon, Daeseon Choi, "One-Pixel Adversarial Example that is Safe for Friendly
Deep Neural Networks. ”, WISA 2018
Hyun Kwon, Yongchul Kim, Hyunsoo Yoon, Daeseon Choi, "Fooling a Neural Network in Military Environments:
Random Untargeted Adversarial Example", Military Communications Conference 2018 (MILCOM 2018)
Hyun Kwon, Hyunsoo Yoon, Daeseon Choi, "Priority Adversarial Example in Evasion Attack on Multiple Deep Neural
Networks", International Conference on Artificial Intelligence in information and communication (ICAIIC 2019),
2019.02
60
Q & A

Research of adversarial example on a deep neural network

  • 1.
    KAIST Research of AdversarialExample on a Deep Neural Network 18 February 2019 Hyun Kwon
  • 2.
    2 Outline Introduction Related work Adversarial exampleattacks are divided into four categories Target model information, distance measure, recognition, and generating method Adversarial example defense Reactive defense Proactive defense Problem Statement Scheme 1 Scheme 2 Conclusion Reference
  • 3.
    3 Deep neural network(DNN) Effective performance in machine learning tasks Image recognition Speech recognition Intrusion detection Pattern analysis Introduction
  • 4.
    4 Threat to thesecurity of DNN Adversarial example Slightly modified data that lead to incorrect classification Introduction ㆍㆍㆍ ㆍㆍㆍ Pr(0) = 0.89 Pr(1) = 0.03 Pr(n-1) = 0.01 Pr(n) = 0.02 ㆍㆍㆍ Input layer Output layer : Node: Weight link
  • 5.
    5 Threat to thesecurity of DNN Adversarial example Slightly modified data that lead to incorrect classification Introduction ㆍㆍㆍ ㆍㆍㆍ Pr(0) = 0.02 Pr(1) = 0.03 Pr(n-1) = 0.01 Pr(n) = 0.84 ㆍㆍㆍ Input layer Output layer : Node: Weight link
  • 6.
    6 Introduction Adversarial example generation Calculate eachclass probability Choose targeted class Adjust 𝑤 value 𝑋 NoYes 𝑋∗ = 𝑋 + 𝑤 𝑓 𝑋 = 7 𝑓 𝑋∗ = 0 𝑓 𝑋 + 𝑤 = 0 Neural Network Ex) Targeted class “0”
  • 7.
    7 𝒙 ㆍ 𝒙∗ ㆍ𝑳(𝒙, 𝒙∗ ) 𝒙∗ ㆍ 𝒙∗ ㆍ Introduction Adversarial example𝑥∗ 𝒙∗ : 𝒂𝒓𝒈𝒎𝒊𝒏 𝒙∗ 𝑳 𝒙, 𝒙∗ 𝐬. 𝐭. 𝐟 𝐱∗ ≠ 𝒇(𝒙) 𝐿(ㆍ) : Distance between 𝒙 and 𝒙∗
  • 8.
    8 Introduction ML Security IssuesCategory Causative attacks influence learning with control over training data Ex) poisoning attack Exploratory attacks exploit misclassification but don’t affect training Ex) Adversarial example Poisoning attack Decreasing the recognition accuracy of the target model. Adding malicious training data on targeted model Assume It requires that the attacker access training data
  • 9.
    9 Introduction Adversarial example Causing misclassificationby adding a little of noise to the sample. Szedgedy et al. first presented the adversarial example Attacker transforms an image slightly, causing this adversarial example Advance attacks and their counter measures has been proposed
  • 10.
    10 Type of targetmodel information White box attack The attacker knows the detailed information about the target model Model architecture, parameters, and class probabilities Success rate of white box attack reaches almost 100% Black box attack The attacker dose not know the information about the target model Only query the target model The well-known black box attack are twofold Transferability Universal perturbation Substitute network
  • 11.
    11 Type of targetmodel information Transferability (black-box attack) An adversarial example modified for a single target model is effective for other model. Adversarial examples generated using ensemble-based approaches can successfully attack black box image classification. Yanpei Liu, Xinyun Chen, Chang Liu, and Dawn Song. Delving into transferable adversarial examples and black-box attacks. ICLR, abs/1611.02770, 2017. Kwon, Hyun, et al. "Advanced ensemble adversarial example on unknown deep neural network classifiers." IEICE Transactions on Information and Systems 101.10 (2018): 2485-2500.
  • 12.
    12 Type of targetmodel information Universal perturbation Find a universal perturbation vector. 𝜂 𝑝 ≤ 𝜖 Ρ 𝑥∗ ≠ 𝑓 𝑥 ≥ 1 − 𝜎 𝜖 limits the size of universal perturbation 𝜎 controls the failure rate of all the adversarial examples. This loop continue until the most data sample are fooled (P < 1 − 𝜎 ) Seyed Mohsen Moosavi Dezfooli, Alhussein Fawzi, Omar Fawzi, and Pascal Frossard. Universal adversarial perturbations. In Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), number EPFL-CONF-226156, 2017.
  • 13.
    13 Type of targetmodel information Substitute network (black-box attack) The attacker can create a substitute network similar to the target model By repeating the query process. Once a substitute network is created, the attacker can perform a white box attack. Approximately 80% attack success for Amazon and Google services Nicolas Papernot, Patrick McDaniel, Ian Goodfellow, Somesh Jha, Z Berkay Celik, and Ananthram Swami. Practical black-box attacks against machine learning. In Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, pages 506–519. ACM, 2017.
  • 14.
    14 Type of distancemeasure There are three ways to measure the distortion 𝐿0, 𝐿2, 𝐿∞ 𝐿0 represents the sum of the number of all changed pixels ෍ 𝑖=0 𝑛 𝑥𝑖 − 𝑥𝑖 ∗ 𝐿2 represents standard Euclidean norm ෍ 𝑖=0 𝑛 (𝑥𝑖 − 𝑥𝑖 ∗ )2 𝐿∞ is the maximum distance value between 𝑥𝑖 and 𝑥𝑖 ∗ As the three distance measures become small, the similarity of the sample image increases.
  • 15.
    15 Type of targetrecognition There are two types: targeted attack and untargeted attack Targeted attack The target model to recognize the adversarial example as a particular intended class 𝑥∗: 𝑎𝑟𝑔𝑚𝑖𝑛 𝑥∗ 𝐿 𝑥, 𝑥∗ s. t. f x∗ = 𝑦∗ Untargeted attack The target model to recognize the adversarial example as a class other than the original class. 𝑥∗ : 𝑎𝑟𝑔𝑚𝑖𝑛 𝑥∗ 𝐿 𝑥, 𝑥∗ s. t. f x∗ ≠ 𝑦
  • 16.
    16 Methods of adversarialattack Fast-gradient sign method (FGSM) Take a step in the direction of the gradient of the loss function 𝑥∗ = 𝑥 + 𝜖 ∙ 𝑠𝑖𝑔𝑛(𝛻𝑙𝑜𝑠𝑠 𝐹,𝑡 𝑥 ) This is simple and good performance. Iterative FGSM (I-FGSM) Update version of the FGSM Instead of changing the amount of 𝜖 , a smaller amount of 𝛼 is used. Clipped by the same 𝜖 𝑥𝑖 ∗ = 𝑥𝑖−1 ∗ − 𝑐𝑙𝑖𝑝 𝜖(𝛼 ∙ 𝑠𝑖𝑔𝑛(𝛻𝑙𝑜𝑠𝑠 𝐹,𝑡 𝑥𝑖−1 ∗ ) Ian Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. In International Conference on Learning Representations, 2015. Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial examples in the physical world. ICLR Workshop, 2017.
  • 17.
    17 Methods of adversarialattack Carlini-Wagner (CW) 7 objective function (𝑓1~𝑓7) + 3 distance attack (𝐿1, 𝐿2, 𝐿∞) Choosing constant 𝑐 (control weight of class) 𝑚𝑖𝑛𝑖𝑚𝑧𝑒 𝐷 𝑥, 𝑥 + 𝑤 + 𝑐 ∙ 𝑓(𝑥 + 𝑤) 𝑠. 𝑡. 𝑥 + 𝑤 ∈ 0,1 𝑛 Confidence 𝑘 (control distortion) 𝑓 𝑥∗ = 𝑚𝑎 𝑥 𝑚𝑎𝑥 𝑍 𝑥∗ 𝑖 ∶ 𝑖 ≠ 𝑡 − 𝑍 𝑥∗ 𝑡, −𝑘 It can applied to the image and audio domain. Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. In Security and Privacy (SP), 2017 IEEE Symposium on, pages 39–57. IEEE, 2017.
  • 18.
    18 Methods of adversarialattack One pixel attack Only modifying one pixel to cause the misclassification Differential evolution (DE) to find the optimal solution. Current solution (father) Candidate solution (child) CIFAR10 with 70.97% success rate All convolution network (AllConv), Network in Network (NiN), VGG16 Su, Jiawei, Danilo Vasconcellos Vargas, and Kouichi Sakurai. "One pixel attack for fooling deep neural networks." IEEE Transactions on Evolutionary Computation (2019).
  • 19.
    19 Methods of defense Thedefense of adversarial examples have two types Reactive: detect the adversarial example Proactive: make deep neural networks more robust. Reactive defense Adversarial example detection Proactive defense Distillation method Adversarial training Filtering method Ensemble defense methods are available.
  • 20.
    20 Reactive defense Adversarial detecting Binarythreshold: last layer’s output as the features Distinguish distribution differences Confidence value, p-value Y.-C. Lin, M.-Y. Liu, M. Sun, and J.-B. Huang, “Detecting adversarial attacks on neural network policies with visual foresight,” arXiv preprint arXiv:1710.00814, 2017. T. Pang, C. Du, Y. Dong, and J. Zhu, “Towards robust detection of adversarial examples,” arXiv preprint arXiv:1706.00633, 2017.
  • 21.
    21 Proactive defense Distillation method Usingtwo neural network (detailed class probability) Ex) “1”, class: [0100000000]  1”, class:[0.02 0.91 … 0.02] Avoid calculating the gradient of the loss function. Training Data 𝑥 Training label y 0 1 0 0 Training Data 𝑥 Training label f(x) DNN 𝑓(𝑥) trained at temperature T DNN 𝑓 𝑑𝑖𝑠𝑡𝑖𝑙 (𝑥) trained at temperature T Probability Vector Predictions 𝑓(𝑥) Probability Vector Predictions 𝑓 𝑑𝑖𝑠𝑡𝑖𝑙 (𝑥) 0.02 0.92 0.04 0.02 0.02 0.92 0.04 0.02 Initial Network Distilled Network 0.03 0.93 0.01 0.03 Nicolas Papernot, Patrick McDaniel, XiWu, Somesh Jha, and Ananthram Swami. Distillation as a defense to adversarial perturbations against deep neural networks. In Security and Privacy (SP), 2016 IEEE Symposium on, pages 582–597. IEEE, 2016.
  • 22.
    22 Proactive defense Adversarial training Originalexample + adversarial example training process Simple and effective Make sure that the accuracy of the original sample is not compromised. Ensemble adversarial training Using several neural networks It is more resistance to the adversarial example. Tramèr, Florian, et al. "Ensemble adversarial training: Attacks and defenses." arXiv preprint arXiv:1705.07204 (2017).
  • 23.
    23 Proactive defense Filtering method Eliminatingthe perturbation of the adversarial example Creating a filtering module requires time and process. Generator𝑿 𝒂𝒅𝒗 “7” (96.5%) training Success rate Targeted model Shen et al. "AE-GAN: adversarial eliminating with GAN." arXiv preprint arXiv:1707.05474 (2017).
  • 24.
    24 Ensemble defense method Magnetmethod There are two modules: Detector and Reformer Generality Detector Find the adversarial example by comparing the output of several original samples. Detecting adversarial examples with large distortion. However, if the distortion is small, the detection probability is lowered Multiple Detector configurations can be also combined Detector Reformer Classifier Dongyu Meng and Hao Chen. Magnet: a two-pronged defense against adversarial examples. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pages 135–147. ACM, 2017.
  • 25.
    25 Ensemble defense method Reformer Targetingadversarial example with small distortion. auto-encoder is used. Reform: it convert adversarial examples with output that most closely resembles the original sample.
  • 26.
    26 Ensemble defense method Ex)Magnet method Detector Reformer Classifier <Detector를 37개가 통과> <Reform에서는 37개 중에는 2개의 adversarial example이 성공>
  • 27.
    27 Ensemble defense method FeatureSqueezing By comparing the output result from three models Detecting adversarial examples Xu, Weilin, David Evans, and Yanjun Qi. "Feature squeezing: Detecting adversarial examples in deep neural networks." NDSS 2018.
  • 28.
    28 Scheme 1. ProblemDefinition In military domain, adversarial example is useful Deceive an enemy’s machine classifier Ex) Battlefield road signs Modified to deceive an adversary’s self-driving vehicle But, friendly self-driving vehicles should not be deceived Friendly self-driving vehicle Adversary’s self-driving vehicle “Left” “Right”
  • 29.
    29 Scheme 2. ProblemDefinition Untargeted adversarial example focus on certain classes for a original class It is easy to satisfy misclassification There is a pattern problem in generating untargeted adversarial example By analyzing the output classes, the defense determines the original class. <Confusion matrix in MNIST>
  • 30.
    30 Scheme 1 Friend-safe EvasionAttack: an adversarial example is correctly classified by a friendly classifier. Goal Proposed Methods Experiment & Evaluation Discussion Kwon, Hyun, et al. "Friend-safe evasion attack: An adversarial example that is correctly recognized by a friendly classifier." Computers & Security 78 (2018): 380-397.
  • 31.
    31 Scheme 1. Goal Proposedan evasion attack scheme that creates adversarial example Incorrectly classified by enemy classifiers Correctly recognized by friendly classifiers Maintaining low distortion Has two configuration Targeted classes: original sample to be recognized as a specific class Untargeted classes: misclassification to any class other than the right class Analyze the difference of two configuration. Difference in distortion between the targeted and untargeted scheme. Difference among targeted digits.
    32 Scheme 1. Proposed method
Given 𝐷𝑓𝑟𝑖𝑒𝑛𝑑, 𝐷𝑒𝑛𝑒𝑚𝑦, and an original input 𝑥 ∈ 𝑋, the problem is an optimization that generates 𝑥∗
Targeted adversarial example
𝑥∗: 𝑎𝑟𝑔𝑚𝑖𝑛 𝑥∗ 𝐿(𝑥, 𝑥∗) s.t. 𝐷𝑓𝑟𝑖𝑒𝑛𝑑(𝑥∗) = 𝑦 and 𝐷𝑒𝑛𝑒𝑚𝑦(𝑥∗) = 𝑦∗ (targeted class)
Untargeted adversarial example
𝑥∗: 𝑎𝑟𝑔𝑚𝑖𝑛 𝑥∗ 𝐿(𝑥, 𝑥∗) s.t. 𝐷𝑓𝑟𝑖𝑒𝑛𝑑(𝑥∗) = 𝑦 and 𝐷𝑒𝑛𝑒𝑚𝑦(𝑥∗) ≠ 𝑦
<Proposed architecture: the transformer takes the original sample 𝑥 (class 𝑦) and outputs 𝑥∗, which is fed to 𝐷𝑓𝑟𝑖𝑒𝑛𝑑 and 𝐷𝑒𝑛𝑒𝑚𝑦; their loss functions feed back to the transformer>
    33 Scheme 1. Proposed method
A friend-safe adversarial example is generated by minimizing 𝑙𝑜𝑠𝑠𝑇 = 𝑙𝑜𝑠𝑠𝑑𝑖𝑠𝑡𝑜𝑟𝑡𝑖𝑜𝑛 + 𝑙𝑜𝑠𝑠𝑓𝑟𝑖𝑒𝑛𝑑 + 𝑙𝑜𝑠𝑠𝑒𝑛𝑒𝑚𝑦
𝑙𝑜𝑠𝑠𝑑𝑖𝑠𝑡𝑜𝑟𝑡𝑖𝑜𝑛: the distortion of the transformed example, 𝑙𝑜𝑠𝑠𝑑𝑖𝑠𝑡𝑜𝑟𝑡𝑖𝑜𝑛 = ‖𝑥∗ − tanh(𝑥)‖₂²
𝑙𝑜𝑠𝑠𝑓𝑟𝑖𝑒𝑛𝑑: makes 𝐷𝑓𝑟𝑖𝑒𝑛𝑑(𝑥∗) predict the original class with higher probability
𝑙𝑜𝑠𝑠𝑓𝑟𝑖𝑒𝑛𝑑 = 𝑔𝑓(𝑥∗), where 𝑔𝑓(𝑘) = max{𝑍(𝑘)𝑖 : 𝑖 ≠ 𝑜𝑟𝑔} − 𝑍(𝑘)𝑜𝑟𝑔 and 𝑜𝑟𝑔 is the original class
𝑍(∙) is the class probability predicted by 𝐷𝑓𝑟𝑖𝑒𝑛𝑑 or 𝐷𝑒𝑛𝑒𝑚𝑦
𝑙𝑜𝑠𝑠𝑒𝑛𝑒𝑚𝑦: makes 𝐷𝑒𝑛𝑒𝑚𝑦(𝑥∗) predict another class with higher probability
(Targeted) 𝑙𝑜𝑠𝑠𝑒𝑛𝑒𝑚𝑦 = 𝑔𝑒ᵗ(𝑥∗), where 𝑔𝑒ᵗ(𝑘) = max{𝑍(𝑘)𝑖 : 𝑖 ≠ 𝑡} − 𝑍(𝑘)𝑡 and 𝑡 is the target class chosen by the attacker
(Untargeted) 𝑙𝑜𝑠𝑠𝑒𝑛𝑒𝑚𝑦 = 𝑔𝑒ᵘ(𝑥∗), where 𝑔𝑒ᵘ(𝑘) = 𝑍(𝑘)𝑜𝑟𝑔 − max{𝑍(𝑘)𝑖 : 𝑖 ≠ 𝑜𝑟𝑔}
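A minimal NumPy sketch of the three loss terms described on this slide. The function names are mine, Z is a vector of class scores, and the actual scheme minimizes loss_T with Adam over the transformer's variables rather than evaluating it in isolation.

```python
import numpy as np

def loss_distortion(x_star, x):
    """Squared L2 distance between the transformed example and tanh(x)."""
    return np.sum((x_star - np.tanh(x)) ** 2)

def g(Z, org):
    """max{Z_i : i != org} - Z_org; negative exactly when class org has the top score."""
    others = np.delete(Z, org)
    return others.max() - Z[org]

def loss_friend(Z_friend, org):
    # Push D_friend toward the ORIGINAL class org.
    return g(Z_friend, org)

def loss_enemy_targeted(Z_enemy, t):
    # Push D_enemy toward the attacker-chosen target class t.
    return g(Z_enemy, t)

def loss_enemy_untargeted(Z_enemy, org):
    # Push D_enemy away from the original class org.
    return -g(Z_enemy, org)

def loss_T(x_star, x, Z_friend, Z_enemy, org, t=None):
    """Total loss; targeted when t is given, untargeted otherwise."""
    le = loss_enemy_targeted(Z_enemy, t) if t is not None else loss_enemy_untargeted(Z_enemy, org)
    return loss_distortion(x_star, x) + loss_friend(Z_friend, org) + le

# Example: friend predicts class 0 strongly, enemy predicts target class 2 strongly.
Z_friend = np.array([5.0, 1.0, 0.0])
Z_enemy = np.array([0.0, 1.0, 5.0])
x = np.zeros(2)
print(loss_T(np.tanh(x), x, Z_friend, Z_enemy, org=0, t=2))  # -8.0
```

Each hinge term goes negative once its constraint is satisfied, so minimizing the sum trades off distortion against making the friend keep the original class while the enemy flips.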
    35 Scheme 1. Experiment & Evaluation
Dataset: MNIST and CIFAR10
MNIST: a collection of handwritten digit images (0-9)
CIFAR10: a collection of color images in 10 classes: planes, cars, birds, cats, deer, dogs, frogs, horses, boats, and trucks
Language: Python 2.3
Machine learning library: TensorFlow
Server: Xeon E5-2609 1.7 GHz
    36 Scheme 1. Experiment & Evaluation
Experimental method
First, 𝐷𝑓𝑟𝑖𝑒𝑛𝑑 and 𝐷𝑒𝑛𝑒𝑚𝑦 are pre-trained
𝐷𝑓𝑟𝑖𝑒𝑛𝑑: CNN, 𝐷𝑒𝑛𝑒𝑚𝑦: Distillation
60,000 training samples and 10,000 test samples (original MNIST)
𝐷𝑓𝑟𝑖𝑒𝑛𝑑 and 𝐷𝑒𝑛𝑒𝑚𝑦: 99.25% and 99.12% accuracy
Second, the transformer updates its output 𝑥∗ and gives it to 𝐷𝑓𝑟𝑖𝑒𝑛𝑑 and 𝐷𝑒𝑛𝑒𝑚𝑦, from which it then receives feedback (repeated for the number of iterations)
Adam is used as the optimizer to minimize 𝑙𝑜𝑠𝑠𝑇
Learning rate: 1𝑒−2, initial constant: 1𝑒−3
<𝐷𝑓𝑟𝑖𝑒𝑛𝑑 and 𝐷𝑒𝑛𝑒𝑚𝑦 architecture> <𝐷𝑓𝑟𝑖𝑒𝑛𝑑 and 𝐷𝑒𝑛𝑒𝑚𝑦 parameters>
    37 Scheme 1. Experiment & Evaluation
Experimental results in two sections: targeted and untargeted adversarial examples
Targeted adversarial example
    40 Scheme 1. Experiment & Evaluation <Targeted attack success rate, 𝐷𝑓𝑟𝑖𝑒𝑛𝑑 accuracy, and average distortion> <Images of the friend-safe adversarial example per iteration>
    41 Scheme 1. Experiment & Evaluation Untargeted adversarial example <Confusion matrix of 𝐷𝑒𝑛𝑒𝑚𝑦 for an untargeted class (400 iterations)>
    42 Scheme 1. Experiment & Evaluation <Untargeted attack success rate, 𝐷𝑓𝑟𝑖𝑒𝑛𝑑 accuracy, and average distortion>
    43 Scheme 1. Experiment & Evaluation <Comparison between targeted and untargeted attacks when the success rate is 100%>
    44 Scheme 1. Experiment & Evaluation Targeted adversarial example
    45 Scheme 1. Experiment & Evaluation Untargeted adversarial example
    46 Scheme 1. Experiment & Evaluation <Comparison between targeted and untargeted attacks when the success rate is 100%>
    47 Scheme 1. Discussion
If the two models are exactly the same, it is impossible to generate the example
If the two models are very similar, it is still possible to generate the example
Same model, but a different training set or a different training-sample order
Untargeted attacks require less distortion and are ideal when targeting is unnecessary
A covert channel scheme can be applied
The roles of the friend and enemy are reversed
The targeted class is the hidden information transferred via the covert channel
    48 Scheme 1. Discussion
A covert channel scheme can be applied
The receiver ascertains which of the nine objects (those other than the visible object) is hidden
    50 Scheme 1. Discussion
A multi-targeted adversarial example can be applied
The attacker makes multiple models recognize a single original image as different classes
Ex) Battlefield road signs: Adversary 1 "U-turn", Adversary 2 "Straight", Adversary 3 "Right"
    51 Scheme 1. Discussion
Multi-targeted Adversarial Example
Kwon, Hyun, et al. "Multi-Targeted Adversarial Example in Evasion Attack on Deep Neural Network." IEEE Access 6 (2018): 46084-46096.
    53 Scheme 2
Random Untargeted Adversarial Example
Goal / Proposed Methods / Experiment & Evaluation / Discussion
Kwon, Hyun, et al. "Fooling a Neural Network in Military Environments: Random Untargeted Adversarial Example." MILCOM 2018-2018 IEEE Military Communications Conference (MILCOM). IEEE, 2018.
Kwon, Hyun, et al. "Random Untargeted Adversarial Example on Deep Neural Network." Symmetry 10.12 (2018): 738.
    54 Scheme 2. Goal
Proposed a random untargeted adversarial example
Uses an arbitrary class in the generation process
Keeps a high attack success rate
Maintains low distortion
Eliminates the pattern vulnerability
Verified by analyzing the confusion matrix
    55 Scheme 2. Proposed method
Given the target model 𝐷, original input 𝑥 ∈ 𝑋, and random class 𝑟, the problem is an optimization that generates the random untargeted adversarial example 𝑥∗
𝑥∗: 𝑎𝑟𝑔𝑚𝑖𝑛 𝑥∗ 𝐿(𝑥, 𝑥∗) such that 𝑓(𝑥∗) = 𝑟 (not 𝑦)
𝐿(∙) is the distance between the original sample 𝑥 and the transformed example 𝑥∗
𝑓(∙) is the operation function of the target model 𝐷
<Proposed architecture: the transformer takes the original sample 𝑥, original class 𝑦, and random class 𝑟, and outputs 𝑥∗, which is fed to 𝐷 with loss feedback>
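The key step, drawing the random class 𝑟 uniformly from all classes except the original class 𝑦, can be sketched as follows; the function names and the hinge-style targeted loss are illustrative assumptions rather than the paper's exact formulation.

```python
import numpy as np

def pick_random_target(y, num_classes, rng):
    """Draw the random class r uniformly from all classes except the original y."""
    candidates = [c for c in range(num_classes) if c != y]
    return int(rng.choice(candidates))

def targeted_loss(Z, r):
    """Hinge-style loss that reaches 0 exactly when r is the model's top class."""
    others = np.delete(Z, r)
    return max(others.max() - Z[r], 0.0)

rng = np.random.default_rng(0)
targets = [pick_random_target(y=3, num_classes=10, rng=rng) for _ in range(9000)]
# Over many draws every wrong class appears about equally often, which removes
# the output-class pattern that a defender could invert to recover the original class.
print(sorted(set(targets)))  # [0, 1, 2, 4, 5, 6, 7, 8, 9]
```

Because each generation run targets a freshly drawn 𝑟, the misclassified outputs spread uniformly over the wrong classes instead of collapsing onto the "easiest" class.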
    56 Scheme 2. Experiment & Evaluation
Experimental result
The confusion matrices of MNIST and CIFAR10 for the proposed example
The wrong classes are evenly distributed for each original class; there is no pattern vulnerability
<Confusion matrix in MNIST> <Confusion matrix in CIFAR10>
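The pattern check described here can be sketched by building a confusion matrix and testing whether each row's misclassifications concentrate on a single wrong class; the 0.5 concentration threshold is an illustrative assumption.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """cm[i, j] = number of samples of true class i predicted as class j."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

def patterned_classes(cm):
    """Original classes whose misclassifications pile onto one wrong class,
    i.e. rows a defender could invert to recover the original class."""
    flagged = []
    for i, row in enumerate(cm):
        wrong = np.delete(row, i)
        if wrong.sum() > 0 and wrong.max() / wrong.sum() > 0.5:
            flagged.append(i)
    return flagged

# Patterned attack: class 0 is always misclassified as class 7.
cm_pattern = confusion_matrix([0] * 10, [7] * 10, num_classes=10)
# Random untargeted attack: class 0's errors spread over all wrong classes.
cm_spread = confusion_matrix([0] * 9, list(range(1, 10)), num_classes=10)
print(patterned_classes(cm_pattern))  # [0]
print(patterned_classes(cm_spread))   # []
```

An empty flagged list corresponds to the evenly distributed rows shown in the slide's confusion matrices.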
    57 Scheme 2. Discussion
The proposed scheme can attack enemy DNNs because it eliminates the pattern vulnerability
The proposed method requires white-box access to the target model
Distortion depends on the dataset dimension
Distortion: the sum of the square root of each pixel difference from the original data
MNIST is a (28×28×1) pixel matrix and CIFAR10 is a (32×32×3) pixel matrix
The proposed scheme can be applied to various domains: road signs, audio, camouflage
    58 Conclusion
Advanced attacks and their defenses continue to be proposed
Recently, interest in black-box attacks has increased
In addition, detection methods for adversarial examples have been studied
Applications: CAPTCHA systems, face recognition systems, speech recognition systems
    59 Reference
Hyun Kwon, Yongchul Kim, Hyunsoo Yoon, Daeseon Choi, "Optimal Cluster Expansion-Based Intrusion Tolerant System to Prevent Denial of Service Attacks," Applied Sciences, 2017.11
Hyun Kwon, Yongchul Kim, Hyunsoo Yoon, Daeseon Choi, "CAPTCHA Image Generation Systems Using Generative Adversarial Networks," IEICE Transactions on Information and Systems, 2017.10
Hyun Kwon, Yongchul Kim, Ki-Woong Park, Hyunsoo Yoon, Daeseon Choi, "Advanced Ensemble Adversarial Example on Unknown Deep Neural Network Classifiers," IEICE Transactions on Information and Systems, 2018.07
Hyun Kwon, Yongchul Kim, Hyunsoo Yoon, Daeseon Choi, "Friend-safe Evasion Attack: An Adversarial Example That Is Correctly Recognized by a Friendly Classifier," Computers & Security, 2018.08
Hyun Kwon, Yongchul Kim, Ki-Woong Park, Hyunsoo Yoon, Daeseon Choi, "Multi-targeted Adversarial Example in Evasion Attack on Deep Neural Network," IEEE Access, 2018.08
Hyun Kwon, Yongchul Kim, Hyunsoo Yoon, Daeseon Choi, "Random Untargeted Adversarial Example on Deep Neural Network," Symmetry, 2018.12
Hyun Kwon, Hyunsoo Yoon, Daeseon Choi, "Friend-Safe Adversarial Examples in an Evasion Attack on a Deep Neural Network," International Conference on Information Security and Cryptology (ICISC 2017), pp. 351-367, Springer, 2017.11
Hyun Kwon, Hyunsoo Yoon, Daeseon Choi, "POSTER: Zero-Day Evasion Attack Analysis on Race between Attack and Defense," The 13th ACM Asia Conference on Computer and Communications Security (ASIA CCS '18), 2018.06
Hyun Kwon, Yongchul Kim, Hyunsoo Yoon, Daeseon Choi, "One-Pixel Adversarial Example that is Safe for Friendly Deep Neural Networks," WISA 2018
Hyun Kwon, Yongchul Kim, Hyunsoo Yoon, Daeseon Choi, "Fooling a Neural Network in Military Environments: Random Untargeted Adversarial Example," Military Communications Conference 2018 (MILCOM 2018)
Hyun Kwon, Hyunsoo Yoon, Daeseon Choi, "Priority Adversarial Example in Evasion Attack on Multiple Deep Neural Networks," International Conference on Artificial Intelligence in Information and Communication (ICAIIC 2019), 2019.02