The document presents insights into machine learning's intersection with security, highlighting malware detection, privacy of training data, and attack vectors such as adversarial examples and model theft. It outlines the significance of differential privacy and training-set poisoning, and notes that current defenses rest on unrealistic threat models. The presentation also emphasizes the need for robust methodologies when deploying machine learning models, particularly those that handle sensitive data.
The session introduces Ian Goodfellow discussing machine learning security and privacy, outlining core concepts and emphasizing the collaborative nature of the work.
Describes the components of the machine learning pipeline: training data, learning algorithm, learned parameters, test inputs, and outputs.
Focuses on training-data privacy, the definition of differential privacy, and the implications of training-set poisoning (a sketch of the definition follows this outline).
Discusses adversarial examples, model theft, and transfer attacks, showing how these vulnerabilities enable malicious exploitation of deployed models.
Explores cross-model transfer, the question of whether transfer attacks extend to human perception, and their relevance in the physical world.
Contrasts adversarial training with certified defenses, discussing their effectiveness and the threat models they assume.
Introduces the Clever Hans effect as it applies to machine learning algorithms and encourages engagement and involvement in the field.
Highlights the importance of considering data sensitivity, adversarial incentives, and the practical limitations of current defenses.
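Where the outline mentions differential privacy, the relevant notion is the standard (epsilon, delta) definition. The statement below is a minimal sketch; the symbols M, D, D', S, epsilon, and delta are introduced here for illustration rather than taken from the presentation.

% (epsilon, delta)-differential privacy for a randomized training mechanism M.
% D and D' are any two training sets differing in a single record.
\[
  \Pr\big[\,M(D) \in S\,\big] \;\le\; e^{\varepsilon}\,\Pr\big[\,M(D') \in S\,\big] + \delta
  \qquad \text{for every set of outcomes } S .
\]
% Smaller epsilon and delta mean the trained model reveals less about any
% individual training record.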
An overview of a field
This presentation summarizes the work of many people, not just my own or my collaborators'.
Download the slides for links to extensive references.
The presentation focuses on the concepts, not the history or the inventors.
Deep Dive on Adversarial Examples
Since 2013, deep neural networks have matched human performance at solving CAPTCHAs and reading addresses (Goodfellow et al, 2013), recognizing objects and faces (Szegedy et al, 2014; Taigman et al, 2013), and other tasks.
Transfer attack
Target model: unknown weights, machine learning algorithm, and training set; maybe non-differentiable.
Train your own substitute model that mimics the target with a known, differentiable function.
Craft adversarial examples against the substitute.
Deploy the adversarial examples against the target; the transferability property results in them succeeding.
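As a concrete illustration of these steps, below is a minimal sketch in Python using PyTorch on synthetic data: the target is treated as a black box that only returns labels, a substitute is trained on those labels, and adversarial examples are crafted against the substitute with the fast gradient sign method before being replayed against the target. Every name, architecture, and hyperparameter here is an assumption for illustration, not from the talk.

# Minimal transfer-attack sketch (synthetic data; all names and settings illustrative).
import torch
import torch.nn as nn

torch.manual_seed(0)

def make_model():
    return nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))

# "Black-box" target: weights unknown to the attacker, only label queries allowed.
target = make_model()

def query_target(x):
    with torch.no_grad():
        return target(x).argmax(dim=1)  # returns labels only, no gradients

# Step 1: train a known, differentiable substitute on the target's labels.
substitute = make_model()
opt = torch.optim.Adam(substitute.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
queries = torch.randn(2048, 20)          # attacker-chosen query inputs
labels = query_target(queries)
for _ in range(200):
    opt.zero_grad()
    loss_fn(substitute(queries), labels).backward()
    opt.step()

# Step 2: craft adversarial examples against the substitute (fast gradient sign method).
x = torch.randn(256, 20)
x_adv = x.clone().requires_grad_(True)
loss_fn(substitute(x_adv), query_target(x)).backward()
eps = 0.5                                 # perturbation budget (illustrative)
x_adv = (x_adv + eps * x_adv.grad.sign()).detach()

# Step 3: deploy against the target; transferability means many perturbations carry over.
flipped = (query_target(x_adv) != query_target(x)).float().mean().item()
print(f"fraction of target predictions changed by transferred examples: {flipped:.2f}")

A stronger attacker would iterate, querying the target on new points near the substitute's decision boundary and retraining, but the three steps above mirror the diagram.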
Adversarial Training vs Certified Defenses
Adversarial training:
Train on adversarial examples
This minimizes a lower bound on the true worst-case error (see the formulation sketch after this slide)
Achieves a high amount of (empirically tested) robustness on small to medium datasets
Certified defenses:
Minimize an upper bound on the true worst-case error
Robustness is guaranteed, but the amount of robustness is small
Verification of models that weren't trained to be easy to verify is hard
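The lower-bound / upper-bound contrast can be made precise with the standard robust-optimization objective; the notation below (parameters theta, loss L, perturbation budget epsilon) is generic and not taken from the slides.

\[
  \min_{\theta} \; \mathbb{E}_{(x,y)\sim\mathcal{D}}
  \Big[ \max_{\|\delta\| \le \varepsilon} L\big(f_{\theta}(x+\delta),\, y\big) \Big]
\]
% Adversarial training approximates the inner maximum with the loss on
% examples found by a heuristic attack, so it minimizes a lower bound on
% the true worst-case error. Certified defenses instead minimize a
% tractable upper bound on the inner maximum, so their robustness is
% guaranteed but typically smaller.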
Limitations of defenses
Even certified defenses so far assume an unrealistic threat model.
Typical model: the attacker can change the input within some norm ball.
Real attacks will be stranger and hard to characterize ahead of time (Brown et al., 2017).
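To make the norm-ball threat model concrete, the typical assumption can be written as follows; the choice of the infinity norm and the symbol epsilon are illustrative.

\[
  \text{the attacker may replace } x \text{ with any } x' \text{ satisfying }
  \|x' - x\|_{\infty} \le \varepsilon ,
\]
% i.e. every input coordinate (e.g. a pixel) may change by at most epsilon.
% Real attacks such as a physical adversarial patch (Brown et al., 2017)
% need not stay inside any such ball.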
Apply What You Have Learned
Publishing an ML model or a prediction API?
Is the training data sensitive? -> train with differential privacy (a minimal DP-SGD sketch follows this checklist)
Consider how an attacker could cause damage by fooling your model
Current defenses are not practical, so rely on situations with no incentive to cause harm or a limited amount of potential harm
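For the "train with differential privacy" item, the usual recipe is DP-SGD: clip each example's gradient to a fixed norm and add Gaussian noise before the update. The following is a minimal from-scratch sketch on synthetic data; the model, clipping norm, noise multiplier, and learning rate are assumptions, and a real deployment would use a vetted library plus a privacy accountant to track the resulting (epsilon, delta).

# Minimal DP-SGD sketch: per-example gradient clipping plus Gaussian noise.
# The model, data, and hyperparameters below are illustrative assumptions.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
loss_fn = nn.CrossEntropyLoss()
params = [p for p in model.parameters() if p.requires_grad]

clip_norm = 1.0          # per-example gradient norm bound C
noise_multiplier = 1.1   # noise standard deviation = noise_multiplier * C
lr = 0.1
batch_size = 32

x = torch.randn(256, 10)            # synthetic "sensitive" training data
y = (x[:, 0] > 0).long()            # synthetic labels

for step in range(20):
    idx = torch.randint(0, len(x), (batch_size,)).tolist()
    summed = [torch.zeros_like(p) for p in params]
    for i in idx:
        # Gradient of the loss on a single example.
        loss = loss_fn(model(x[i:i+1]), y[i:i+1])
        grads = torch.autograd.grad(loss, params)
        # Clip this example's overall gradient norm to clip_norm.
        total_norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = (clip_norm / (total_norm + 1e-6)).clamp(max=1.0)
        for s, g in zip(summed, grads):
            s += g * scale
    # Add Gaussian noise to the clipped sum, then average and take a step.
    with torch.no_grad():
        for p, s in zip(params, summed):
            noise = torch.randn_like(s) * noise_multiplier * clip_norm
            p -= lr * (s + noise) / batch_size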