1. Adversarial Machine Learning
Junfei Wang
Supervisor: Pirathayini Srikantha
Graduated from UWO in April 2020
Now a PhD student at York University
2. Outline
1. Introduction
2. Connections Between Standard ML and Adversarial ML
3. Attack Algorithm
4. Defense Mechanism
3. Introduction
Use cases:
Self-driving car: a physical change to a traffic sign may cause misclassification.
ASR system: https://adversarial-attacks.net/
…
A small change to the input causes a huge difference in the output.
It is not really noise.
4. Introduction
In 2014, this phenomenon was first reported in [1].
Definition: legitimate inputs altered by adding small, often imperceptible, perturbations to force a learned classifier to misclassify the resulting adversarial inputs, while remaining correctly classified by a human observer.
The perturbation can be physical, as in the traffic-sign example.
[1] C. Szegedy, et al. "Intriguing properties of neural networks." Proceedings of the International Conference on Learning Representations, 2014.
5. Recap of Machine Learning Training Process (1)
Given inputs and labels, we keep updating the weights of the model to fit them.
6. Recap of Machine Learning Training Process (1)
Given the trained model, we instead change the input so that it travels across the decision boundary.
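A minimal PyTorch sketch contrasting the two recaps (the toy linear model, input, and learning rate are all hypothetical): training takes a gradient-descent step on the weights, while the attack freezes the weights and takes a gradient-ascent step on the input.

```python
import torch
import torch.nn.functional as F

model = torch.nn.Linear(4, 2)   # hypothetical toy classifier
x = torch.randn(1, 4)           # one input
y = torch.tensor([1])           # its true label
lr = 0.1

# Training: move the WEIGHTS downhill on the loss.
loss = F.cross_entropy(model(x), y)
loss.backward()
with torch.no_grad():
    for p in model.parameters():
        p -= lr * p.grad        # gradient descent on w

# Attack: freeze the weights and move the INPUT uphill on the loss.
model.zero_grad()
x_adv = x.clone().requires_grad_(True)
loss = F.cross_entropy(model(x_adv), y)
loss.backward()
x_adv = (x_adv + lr * x_adv.grad).detach()  # gradient ascent on x
```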
7. Recap of Machine Learning Process (2)
[Figure: loss as a function of the weights w; training descends this curve]
8. Recap of Machine Learning Process (2)
[Figure: loss as a function of the input X; an attacker can ascend this curve]
9. White-box Adversarial Attack (1)
Perspective 1: Given the model and the original label, we keep updating the input so as to change the output label.
Perspective 2: Instead of traveling downhill along the loss curve, we do gradient ascent to increase the loss.
Perspective 3: Any input can be perturbed to fool the target model, but the perturbation may not be stealthy enough.
So a successful adversarial attack should also evade detection (sketch below).
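A sketch of Perspectives 1 and 2, assuming a differentiable PyTorch `model` and a single (x, y) pair as in the previous snippet: repeat gradient-ascent steps on the input until the predicted label flips.

```python
import torch
import torch.nn.functional as F

def ascend_until_misclassified(model, x, y, step=0.05, max_iters=100):
    x_adv = x.clone().requires_grad_(True)
    for _ in range(max_iters):
        loss = F.cross_entropy(model(x_adv), y)
        model.zero_grad()
        loss.backward()
        with torch.no_grad():
            x_adv += step * x_adv.grad           # travel uphill on the loss curve
        x_adv.grad.zero_()
        if model(x_adv).argmax(dim=1).item() != y.item():  # crossed the boundary
            break
    return x_adv.detach()
```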
10. Detector and Stealthiness
Detector:
a. Image & audio attacks: a human observer
b. Fraudulent transactions (time series): an anomaly-detection mechanism
c. The detector can itself serve as a defense mechanism against adversarial attacks
How do we make an attack stealthy?
Since it is impossible to build a model of the detector, we restrict the norm of the perturbation (sketch after this list):
a. L0 norm: the number of dimensions that may be perturbed
b. L2 norm: the Euclidean distance
c. L∞ norm: the maximum change across all dimensions
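A small illustrative sketch (all values hypothetical) of how the three norms measure a perturbation delta = x_adv − x:

```python
import numpy as np

x = np.array([0.2, 0.5, 0.9, 0.1])
x_adv = np.array([0.2, 0.53, 0.9, 0.08])
delta = x_adv - x

l0 = np.count_nonzero(delta)   # how many dimensions were touched
l2 = np.linalg.norm(delta)     # Euclidean distance
linf = np.max(np.abs(delta))   # largest change in any single dimension
print(l0, l2, linf)            # 2, ~0.036, 0.03
```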
12. White-box Attack Algorithm (1)
1. Projected Gradient Descent (PGD) [2]: iterative gradient steps on the loss under an L∞ constraint (sketch below).
[2] Madry, Aleksander, et al. "Towards deep learning models resistant to adversarial attacks." Proceedings of the International Conference on Learning Representations, 2018.
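A minimal PGD sketch under these assumptions: a differentiable PyTorch `model`, inputs scaled to [0, 1], and the usual iterate of a signed gradient-ascent step followed by projection back into the ε-ball around the original input. The step sizes are illustrative defaults, not values from [2].

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, iters=10):
    x0 = x.detach()
    x_adv = x0.clone()
    for _ in range(iters):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()             # signed ascent step
            x_adv = torch.clamp(x_adv, x0 - eps, x0 + eps)  # project into the L-inf ball
            x_adv = torch.clamp(x_adv, 0.0, 1.0)            # keep a valid image
    return x_adv
```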
13. White-box Attack Algorithm (2)
2. Fast Gradient Sign Method (FGSM) [3]: relies on the first-order derivative, x_adv = x + ε · sgn(∇x J(θ, x, y)); the sgn function keeps the step from vanishing where the gradient is very small.
http://jlin.xyz/advis/
[3] Goodfellow, Ian J., et al. "Explaining and harnessing adversarial examples." Proceedings of the International Conference on Learning Representations, 2015.
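A one-step FGSM sketch under the same assumed `model` and inputs as the PGD sketch:

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, eps=0.1):
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]
    return (x + eps * grad.sign()).detach()  # single signed step of size eps
```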
14. White-box Attack Algorithm (3)
3. Jacobian Saliency Map Attack (JSMA) [4]:
• JSMA: iteratively modify the most sensitive pixel (dimension)
• Jacobian saliency map: the sensitivity function
• Example: a 70 km/h speed-limit sign misclassified as a 30 km/h speed-limit sign (sketch below)
[4] Papernot, Nicolas, et al. "The limitations of deep learning in adversarial settings." Proceedings of the IEEE European Symposium on Security and Privacy, 2016.
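A simplified single-step sketch of the JSMA idea (batch size 1, flattened features; the pixel-pairing and search-domain details of the full algorithm in [4] are omitted): build the Jacobian of the logits with respect to the input, score each dimension by how much it helps the target class while hurting the others, and bump only the most salient one.

```python
import torch

def jsma_step(model, x, target, theta=0.1):
    x = x.clone().requires_grad_(True)
    logits = model(x)[0]                          # assumes a batch of one
    jac = torch.stack([
        torch.autograd.grad(logits[c], x, retain_graph=True)[0][0]
        for c in range(logits.shape[0])
    ])                                            # (num_classes, num_features)
    d_target = jac[target]                        # sensitivity toward the target class
    d_others = jac.sum(dim=0) - d_target          # sensitivity toward all other classes
    saliency = torch.where((d_target > 0) & (d_others < 0),
                           d_target * d_others.abs(),
                           torch.zeros_like(d_target))
    i = saliency.argmax()                         # most sensitive dimension
    x_new = x.detach().clone()
    x_new.view(-1)[i] += theta                    # perturb only that pixel
    return x_new
```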
15. White-box Attack Algorithm (4)
4. AdvGAN [5]: train a generator whose output perturbation G(x) makes x + G(x) fool the target model, while a discriminator keeps the result realistic (sketch below).
[5] Xiao, Chaowei, et al. "Generating adversarial examples with adversarial networks." arXiv preprint arXiv:1801.02610, 2018.
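A sketch of the generator objective described in [5], assuming modules `G` (generator), `D` (discriminator), and target classifier `f` already exist; the untargeted cross-entropy term and the loss weights are illustrative simplifications of the paper's losses.

```python
import torch
import torch.nn.functional as F

def advgan_generator_loss(G, D, f, x, y, c=0.3, alpha=1.0, beta=1.0):
    perturbation = G(x)
    x_adv = x + perturbation
    d_out = D(x_adv)
    loss_gan = F.binary_cross_entropy(d_out, torch.ones_like(d_out))  # look realistic
    loss_adv = -F.cross_entropy(f(x_adv), y)      # untargeted: push f away from y
    loss_hinge = torch.clamp(perturbation.flatten(1).norm(dim=1) - c, min=0).mean()
    return loss_adv + alpha * loss_gan + beta * loss_hinge
```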
17. Black-box Attack Strategy
Train a substitute model [6]: a local representative model built by strategically querying the targeted model.
Transferability: adversarial examples generated against model A can also fool model B.
Jacobian-based dataset augmentation: on the substitute F, identify the directions in which the model's output varies (sketch below).
[6] Papernot, Nicolas, et al. "Practical black-box attacks against machine learning." Proceedings of the 2017 ACM Asia Conference on Computer and Communications Security, 2017.
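A sketch of Jacobian-based dataset augmentation, assuming a differentiable `substitute` model and labels already collected from the target oracle: each new query point steps along the sign of the gradient of the substitute's output for the oracle's label.

```python
import torch

def jacobian_augmentation(substitute, xs, oracle_labels, lmbda=0.1):
    new_points = []
    for x, y in zip(xs, oracle_labels):
        x = x.clone().requires_grad_(True)
        score = substitute(x.unsqueeze(0))[0, y]   # output for the oracle's label
        grad = torch.autograd.grad(score, x)[0]
        new_points.append((x + lmbda * grad.sign()).detach())  # step where output varies
    return new_points
```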
18. Defense Mechanism
Attacks are unavoidable: a somewhat pessimistic view.
Robustness raises the price that attackers have to pay.
Defense mechanisms:
a. Detection-based defense
b. Adversarial training: augment the training set to reduce the sensitivity of the model (see the sketch after this list)
c. Input-data sanitization: denoise the input, mapping it back onto the learned manifold
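A sketch of adversarial training (defense b), reusing the hypothetical `pgd_attack` from the PGD slide; `model`, `loader`, and `opt` are assumed to exist.

```python
import torch.nn.functional as F

def adversarial_training_epoch(model, loader, opt):
    for x, y in loader:
        x_adv = pgd_attack(model, x, y)          # craft attacks against the current model
        opt.zero_grad()
        loss = 0.5 * (F.cross_entropy(model(x), y) +
                      F.cross_entropy(model(x_adv), y))  # train on clean + adversarial
        loss.backward()
        opt.step()
```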
19. MagNet: AE-based Defense
Detector + Reformer (sketch below)
[7] Meng, Dongyu, and Hao Chen. "MagNet: a two-pronged defense against adversarial examples." Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, 2017.
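A sketch of MagNet's two prongs, assuming an autoencoder `ae` already trained on clean data: the detector rejects inputs whose reconstruction error is large (far from the learned manifold), and the reformer hands the classifier the reconstruction instead of the raw input. The threshold is illustrative.

```python
import torch

def magnet_predict(classifier, ae, x, threshold=0.01):
    recon = ae(x)
    error = ((recon - x) ** 2).mean()        # distance from the learned manifold
    if error > threshold:
        return None                          # detector: reject as adversarial
    return classifier(recon).argmax(dim=1)   # reformer: classify the reconstruction
```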
20. Defense-GAN
Training stage: standard GAN training.
Inference stage: project the input onto the generator's range and classify the projection (sketch below).
[8] Samangouei, Pouya, Maya Kabkab, and Rama Chellappa. "Defense-GAN: protecting classifiers against adversarial attacks using generative models." arXiv preprint arXiv:1805.06605, 2018.
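A sketch of Defense-GAN's inference stage, assuming a trained generator `G` whose output shape matches the input: search the latent space for z* minimizing ||G(z) − x||², then classify G(z*) instead of x, discarding off-manifold adversarial detail.

```python
import torch

def defense_gan_project(G, x, latent_dim=100, steps=200, lr=0.05):
    z = torch.randn(x.shape[0], latent_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((G(z) - x) ** 2).mean()   # distance to the GAN's manifold
        loss.backward()
        opt.step()
    return G(z).detach()                  # classify this projection instead of x
```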