Nicolas Papernot
Pennsylvania State University & Google Brain
Lecture for Prof. Trent Jaeger's CSE 543 Computer Security class
November 2017 - Penn State

Thank you to my collaborators
- Martín Abadi (Google Brain)
- Alexey Kurakin (Google Brain)
- Xi Wu (Google)
- Somesh Jha (U of Wisconsin)
Machine Learning
A machine learning classifier maps an input x (e.g., an image of a handwritten digit) to a probability for each class, p(0|x,θ), p(1|x,θ), p(2|x,θ), ..., p(7|x,θ), p(8|x,θ), p(9|x,θ), for example 0.01, 0.84, 0.02, 0.01, 0.03, 0.01.
The model's parameters θ are learned by minimizing a cost/loss function (~model error) on training data.
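To make the two slides above concrete, here is a minimal, self-contained sketch (illustrative only, not the model from the lecture) of a linear softmax classifier that outputs class probabilities p(y|x,θ) and of the cross-entropy cost/loss used to train it:

```python
import numpy as np

def softmax(z):
    z = z - z.max()                    # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def predict_proba(x, W, b):
    """Class probabilities p(y|x, theta) for a linear model with theta = (W, b)."""
    return softmax(W @ x + b)

def cross_entropy_loss(x, y, W, b):
    """Cost/loss (~model error) on a single labeled example (x, y)."""
    p = predict_proba(x, W, b)
    return -np.log(p[y] + 1e-12)

# Toy usage: a 784-pixel digit image classified into 10 classes.
rng = np.random.default_rng(0)
W, b = rng.normal(scale=0.01, size=(10, 784)), np.zeros(10)
x, y = rng.random(784), 1
print(predict_proba(x, W, b))          # ten probabilities that sum to 1
print(cross_entropy_loss(x, y, W, b))  # smaller when p(y|x, theta) is larger
```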
Outline of this lecture
1. Security in machine learning
2. Privacy in machine learning
Part I
Security in machine learning
Attack Models
- Bad: even if an attacker needs to know the details of the machine learning model to mount an attack (a white-box attacker).
- Worse: if an attacker who knows very little (e.g., only gets to ask a few queries) can mount an attack (a black-box attacker).
Adversarial examples (white-box attacks)
Jacobian-based Saliency Map Approach (JSMA)
Papernot et al. The Limitations of Deep Learning in Adversarial Settings
Jacobian-Based Iterative Approach: source-target misclassification
Papernot et al. The Limitations of Deep Learning in Adversarial Settings
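As a rough sketch of the JSMA idea behind the two slides above (a simplified rendition, not the exact algorithm from the cited paper), assume hypothetical helpers jacobian(x), returning the matrix of partial derivatives ∂F_j/∂x_i of the model's class scores, and predict(x), returning class probabilities. The attack scores each input feature with a saliency map for a chosen target class and iteratively increases the most salient feature until the target class is predicted:

```python
import numpy as np

def saliency_map(jac, target):
    """JSMA-style saliency: large when increasing x_i raises the target class
    score while lowering the summed scores of all other classes."""
    d_target = jac[target]                      # dF_target / dx_i
    d_others = jac.sum(axis=0) - d_target       # sum over j != target of dF_j / dx_i
    return np.where((d_target > 0) & (d_others < 0),
                    d_target * np.abs(d_others), 0.0)

def jsma_like_attack(predict, jacobian, x, target, eps=1.0, max_iters=100):
    """Perturb the most salient feature until `target` is predicted.
    predict(x) -> class probabilities, jacobian(x) -> (n_classes, n_features)."""
    x_adv = x.copy()
    for _ in range(max_iters):
        if np.argmax(predict(x_adv)) == target:
            break                               # source-target misclassification achieved
        s = saliency_map(jacobian(x_adv), target)
        if s.max() <= 0:
            break                               # no helpful feature left to perturb
        i = int(np.argmax(s))
        x_adv[i] = np.clip(x_adv[i] + eps, 0.0, 1.0)
    return x_adv
```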
Evading a Neural Network Malware Classifier
A malware sample X classified with P[X=Benign] = 0.10 is perturbed into X* with P[X*=Benign] = 0.90, so the classifier is evaded.
Grosse et al. Adversarial Perturbations Against Deep Neural Networks for Malware Classification
Supervised vs. reinforcement learning

Supervised learning:
- Model inputs: an observation (e.g., traffic sign, music, email)
- Model outputs: a class (e.g., stop/yield, jazz/classical, spam/legitimate)
- Training "goal" (i.e., cost/loss): minimize class prediction error over pairs of (inputs, outputs)

Reinforcement learning:
- Model inputs: environment & reward function
- Model outputs: an action
- Training "goal": maximize reward by exploring the environment and taking actions
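To make the contrast concrete, here is a toy sketch (illustrative only, not taken from the lecture): a supervised step nudges parameters to reduce prediction error on a labeled pair, while a reinforcement-learning rollout only observes rewards from an assumed environment and never sees "correct" actions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Supervised learning: minimize prediction error over labeled (input, output) pairs.
def supervised_step(w, x, y, lr=0.1):
    """One gradient step on squared error for a linear model y_hat = w @ x."""
    y_hat = w @ x
    grad = 2.0 * (y_hat - y) * x
    return w - lr * grad

# Reinforcement learning: maximize reward by taking actions in an environment.
def rl_episode(policy, env_step, state, horizon=10):
    """Roll out a policy; the learner observes rewards, never 'correct' actions."""
    total_reward = 0.0
    for _ in range(horizon):
        action = policy(state)
        state, reward = env_step(state, action)
        total_reward += reward
    return total_reward

# Toy usage with made-up data and a made-up environment.
w = supervised_step(np.zeros(3), x=np.array([1.0, 2.0, 3.0]), y=1.0)
policy = lambda s: int(s > 0)                                    # action from observation
env_step = lambda s, a: (s + rng.normal(), float(a == (s > 0)))  # reward if action matched sign
print(w, rl_episode(policy, env_step, state=0.0))
```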
Example: Adversarial attacks on neural network policies
Huang et al. Adversarial Attacks on Neural Network Policies
Adversarial examples (black-box attacks)
Threat model of a black-box attack
Our approach to black-box attacks
- Alleviate the lack of knowledge about the model
- Alleviate the lack of training data
Adversarial example transferability
Adversarial examples crafted against one model (ML A) often also mislead other, independently trained models. This property comes in several variants:
● Intra-technique transferability:
  ○ Cross-model transferability
  ○ Cross-training-set transferability
● Cross-technique transferability
Szegedy et al. Intriguing properties of neural networks
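A hedged sketch of how transferability is typically measured (illustrative, not the experimental code behind the cited results): craft adversarial examples against a local model A, here with a fast-gradient-sign perturbation, and count how many are also misclassified by a different victim model B. The helpers grad_loss_a and predict_b are assumed interfaces to the two models.

```python
import numpy as np

def fgsm(grad_loss_a, x, eps=0.1):
    """Fast gradient sign perturbation computed against the local model A.
    grad_loss_a(x) is assumed to return dLoss_A/dx for the true label of x."""
    return np.clip(x + eps * np.sign(grad_loss_a(x)), 0.0, 1.0)

def transfer_rate(predict_b, grad_loss_a, inputs, labels, eps=0.1):
    """Fraction of adversarial examples crafted on A that also fool victim B."""
    fooled = 0
    for x, y in zip(inputs, labels):
        x_adv = fgsm(grad_loss_a, x, eps)
        if np.argmax(predict_b(x_adv)) != y:
            fooled += 1
    return fooled / len(inputs)
```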
Cross-technique transferability
Papernot et al. Transferability in Machine Learning: from Phenomena to Black-Box Attacks using Adversarial Samples
Attacking remotely hosted black-box models
(Figure: the adversary interacts with the remote ML system only through the queries it sends and the labels it gets back.)
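The practical black-box attack referenced on the next slide trains a local substitute model from the remote system's label responses and then transfers adversarial examples crafted against that substitute. The sketch below is a simplified, hedged rendition of that loop; remote_label, train, and craft_adversarial are hypothetical helpers, and the random jitter is a crude stand-in for the Jacobian-based dataset augmentation used in the paper.

```python
import numpy as np

def blackbox_attack(remote_label, train, craft_adversarial,
                    seed_inputs, rounds=3, step=0.1, rng=None):
    """Train a substitute for a remote black-box model, then attack it locally.

    remote_label(x)         -> label returned by the remote ML system (one query)
    train(X, y)             -> local substitute model fit on (X, y)
    craft_adversarial(m, x) -> adversarial example for x against substitute m
    """
    rng = rng or np.random.default_rng(0)
    X = np.array(seed_inputs, dtype=float)
    for _ in range(rounds):
        y = np.array([remote_label(x) for x in X])       # query the remote oracle
        substitute = train(X, y)                         # fit the local substitute
        # Crude augmentation: jitter existing points to explore the oracle's
        # decision boundaries (the paper uses Jacobian-based augmentation).
        X = np.concatenate([X, X + step * rng.choice([-1.0, 1.0], size=X.shape)])
    # Examples crafted on the substitute often transfer to the remote model.
    return [craft_adversarial(substitute, x) for x in seed_inputs]
```

What matters in practice is the number of queries sent to the remote system, which is exactly what the next slide reports.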
Results on real-world remote systems
[PMG16a] Papernot et al. Practical Black-Box Attacks against Deep Learning Systems using Adversarial Examples

For each remotely hosted model: the ML technique it uses, the number of queries made to it, and the share of adversarial examples it misclassifies (after querying):
- Deep Learning: 6,400 queries, 84.24% misclassified
- Logistic Regression: 800 queries, 96.19% misclassified
- Unknown technique: 2,000 queries, 97.72% misclassified
Benchmarking progress in the adversarial ML community
Growing community
1.3K+ stars, 340+ forks, 40+ contributors
Adversarial examples are a tangible instance of hypothetical AI safety problems
Image source: http://www.nerdist.com/wp-content/uploads/2013/07/Space-Odyssey-4.jpg
Part II
Privacy in machine learning
Types of adversaries and our threat model
In our work, the threat model assumes:
- The adversary can make a potentially unbounded number of queries
- The adversary has access to model internals
(Figure: model querying by a black-box adversary against the ML system.)
A definition of privacy
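The body of this slide did not survive extraction; the definition used in this part of the lecture is presumably differential privacy, which the later analysis relies on. As a reminder (the standard definition, not transcribed from the slide):

```latex
% A randomized mechanism M is (\epsilon, \delta)-differentially private if,
% for all pairs of datasets d, d' differing in a single record and all
% sets of outcomes S,
\[
  \Pr[M(d) \in S] \;\le\; e^{\epsilon} \, \Pr[M(d') \in S] + \delta .
\]
```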
Our design goals
The PATE approach
Teacher ensemble
(Figure: the sensitive data is split into disjoint partitions 1, 2, 3, ..., n, and a separate teacher model is trained on each partition.)

Aggregation
Intuitive privacy analysis

Noisy aggregation
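PATE's noisy aggregation answers a query by adding noise to the teachers' per-class vote counts and returning the label with the largest noisy count. The sketch below is a simplified rendition assuming Laplace noise and a hypothetical list of trained teachers exposing a .predict(x) method. Intuitively, when most teachers agree the noise rarely flips the answer, and the outcome reveals little about any single partition of the sensitive data.

```python
import numpy as np

def noisy_aggregate(teachers, x, num_classes, gamma=0.05, rng=None):
    """Noisy-max aggregation of teacher votes (PATE-style sketch).

    Laplace noise of scale 1/gamma is added to the vote counts before the
    argmax, so no single teacher (and hence no single sensitive partition)
    can decisively change the returned label."""
    rng = rng or np.random.default_rng()
    votes = np.zeros(num_classes)
    for teacher in teachers:
        votes[teacher.predict(x)] += 1     # assumed .predict(x) -> class index
    noisy_votes = votes + rng.laplace(scale=1.0 / gamma, size=num_classes)
    return int(np.argmax(noisy_votes))
```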
Teacher ensemble
(Figure: the teachers trained on partitions 1, 2, 3, ..., n feed their predictions into the aggregated teacher.)
Student training
(Figure: a student model is trained on public data labeled by the aggregated teacher.)
Why train an additional “student” model?
1. Each label released by the aggregated teacher spends privacy budget, and the teachers' parameters depend directly on the sensitive data, so the teachers cannot be released or queried indefinitely.
2. The student only ever sees public data and a bounded number of (noisy) aggregated labels, so the resulting model can be deployed and inspected without further privacy loss.
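A hedged sketch of the student-training step, reusing the noisy_aggregate helper sketched above; the fixed query budget and the hypothetical train helper are illustrative assumptions. In the experiments below the student is actually trained semi-supervised (with GANs for the image datasets); the sketch only shows the supervised core.

```python
def train_student(teachers, public_inputs, train, num_classes,
                  query_budget=100, gamma=0.05):
    """Label a bounded number of public inputs with noisy teacher votes,
    then fit the student on those (input, label) pairs only."""
    labeled = []
    for x in public_inputs[:query_budget]:   # privacy loss grows with each query
        y = noisy_aggregate(teachers, x, num_classes, gamma=gamma)
        labeled.append((x, y))
    X, y = zip(*labeled)
    return train(list(X), list(y))           # the student never touches sensitive data
```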
Deployment
(Figure: only the student model is deployed to answer end-user queries.)
Differential privacy analysis
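The slide's content is not in the extracted text; the differential privacy analysis of PATE tracks the privacy cost of the aggregated teacher's answers with the moments accountant of Abadi et al. (2016). As a reminder of that tool (standard definitions, not transcribed from the slide):

```latex
% Privacy loss of mechanism M at outcome o, for neighboring datasets d, d':
\[
  c(o; M, d, d') \;=\; \log \frac{\Pr[M(d) = o]}{\Pr[M(d') = o]} ,
\]
% the moments accountant bounds its moment generating function,
\[
  \alpha_M(\lambda) \;=\; \max_{d, d'} \, \log \, \mathbb{E}_{o \sim M(d)}
    \left[ e^{\lambda \, c(o;\, M,\, d,\, d')} \right],
\]
% moments add up over composed queries, and M is (\epsilon, \delta)-DP with
\[
  \delta \;=\; \min_{\lambda} \, \exp\!\big( \alpha_M(\lambda) - \lambda \epsilon \big).
\]
```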
Experimental results
Experimental setup

Dataset        Teacher model                  Student model
MNIST          Convolutional Neural Network   Generative Adversarial Networks
SVHN           Convolutional Neural Network   Generative Adversarial Networks
UCI Adult      Random Forest                  Random Forest
UCI Diabetes   Random Forest                  Random Forest

/models/tree/master/differential_privacy/multiple_teachers
Aggregated teacher accuracy
Trade-off between student accuracy and privacy
UCI Diabetes: ε = 1.44, δ = 10⁻⁵
Non-private baseline accuracy: 93.81%
Student accuracy: 93.94%
Synergy between privacy and generalization
www.papernot.fr
@NicolasPapernot
Gradient masking
