This document summarizes IBM's Adversarial Robustness Toolbox (ART), an open source library for defending deep learning models against adversarial attacks. ART includes methods for attacking models, such as the Fast Gradient Method, and for defending them with approaches like adversarial training. It supports TensorFlow, Keras, PyTorch, and MXNet. The document outlines the types of attacks and defenses in ART, points to an example Jupyter notebook, and notes that ART is used in IBM's Watson Studio platform. It closes with background notes on neural networks and adversarial machine learning.
2. Animesh Singh
STSM (Senior Technical Staff Member), Lead for IBM Watson and Cloud Platform
Member of IBM Academy of Technology
MS in Software Engineering from the University of Texas at Dallas
@AnimeshSingh
Svetlana Levitan
Developer Advocate with IBM CODAIT
Software Engineer for SPSS analytic components (2000-2018)
IBM Representative to the Data Mining Group
PhD in Applied Math and MS in CS from the University of Maryland, College Park
Originally from Moscow, Russia
@SvetaLevitan
7. Adversarial machine learning
Very active area of research since ~2013
Evasion attack: a very small change to an input causes misclassification (see the sketch below)
- White-box or black-box attack (a black-box attack may use a surrogate model and transferability)
Adversarial defense: model hardening and runtime detection of adversarial inputs
- Model hardening: augment training data with adversarial examples, or preprocess inputs
Poisoning attack: manipulated training data
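To make the evasion-attack idea concrete, here is a minimal sketch of the Fast Gradient Sign Method written in plain TensorFlow (not ART's implementation); `model` is assumed to be a TF2 Keras classifier, `x` a batch of images scaled to [0, 1], and `y` the one-hot labels:

```python
import tensorflow as tf

def fgsm_perturb(model, x, y, eps=0.1):
    """Return x plus a small perturbation that increases the loss.

    Assumes a Keras classifier `model`, inputs in [0, 1], and
    one-hot labels `y` (all placeholders for illustration).
    """
    loss_fn = tf.keras.losses.CategoricalCrossentropy()
    x = tf.convert_to_tensor(x)
    with tf.GradientTape() as tape:
        tape.watch(x)
        loss = loss_fn(y, model(x))
    grad = tape.gradient(loss, x)
    # One small step in the direction that maximally increases the
    # loss; clipping keeps the result a valid image
    return tf.clip_by_value(x + eps * tf.sign(grad), 0.0, 1.0)
```

Even a perturbation that is invisible to a human (e.g. eps = 0.01 on [0, 1] pixel values) can be enough to flip the predicted class.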
8. IBM Adversarial Robustness Toolbox
https://github.com/IBM/adversarial-robustness-toolbox
https://ibm.biz/Bd2fd8
Includes many attack and defense methods, as well as methods for detecting adversarial samples and poisoning
Developed by an IBM Research group led by Irina Nicolae and Mathieu Sinn (Ireland)
9. Types of adversarial attacks in the latest version (0.4.0)
DeepFool (Moosavi-Dezfooli et al., 2015)
Fast Gradient Method (Goodfellow et al., 2014)
Basic Iterative Method (Kurakin et al., 2016)
Projected Gradient Descent (Madry et al., 2017)
Jacobian Saliency Map (Papernot et al., 2016)
Universal Perturbation (Moosavi-Dezfooli et al., 2016)
Virtual Adversarial Method (Miyato et al., 2015)
C&W Attack (Carlini and Wagner, 2016)
NewtonFool (Jang et al., 2017)
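As a hedged sketch of how these attacks are invoked, the example below runs the Fast Gradient Method through ART. It uses the current ART 1.x module paths (`art.estimators.classification`, `art.attacks.evasion`); version 0.4.0 discussed here used slightly different ones (e.g. `art.attacks`, `art.classifiers`), so treat this as illustrative rather than exact for 0.4.0:

```python
import numpy as np
import tensorflow as tf

# ART's KerasClassifier expects graph mode when used with TF2
tf.compat.v1.disable_eager_execution()

from art.estimators.classification import KerasClassifier
from art.attacks.evasion import FastGradientMethod

# Small illustrative CNN on MNIST
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None].astype("float32") / 255.0
x_test = x_test[..., None].astype("float32") / 255.0
y_train_oh = tf.keras.utils.to_categorical(y_train, 10)

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, activation="relu",
                           input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train_oh, epochs=1, batch_size=128)

# Wrap the model for ART and craft adversarial test images
classifier = KerasClassifier(model=model, clip_values=(0.0, 1.0))
attack = FastGradientMethod(estimator=classifier, eps=0.1)
x_adv = attack.generate(x=x_test)

# Accuracy typically drops sharply on the adversarial inputs
acc_clean = np.mean(np.argmax(classifier.predict(x_test), 1) == y_test)
acc_adv = np.mean(np.argmax(classifier.predict(x_adv), 1) == y_test)
print(f"clean: {acc_clean:.3f}  adversarial: {acc_adv:.3f}")
```

The other attacks listed above follow the same pattern: construct the attack object around the wrapped classifier, then call `generate`.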
10. Types of defense methods in ART
Feature squeezing (Xu et al., 2017)
Spatial smoothing (Xu et al., 2017)
Label smoothing (Warde-Farley and Goodfellow, 2016)
Adversarial training (Szegedy et al., 2013)
Virtual adversarial training (Miyato et al., 2015)
Gaussian data augmentation (Zantedeschi et al., 2017)
Thermometer encoding (Buckman et al., 2018)
Total variance minimization (Guo et al., 2018)
JPEG compression (Dziugaite et al., 2016)
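Below is a hedged sketch of two of these defenses in ART, again using the ART 1.x module layout (`art.defences`), which differs from 0.4.0; it reuses `classifier`, `attack`, and the MNIST arrays from the attack sketch above:

```python
from art.defences.preprocessor import SpatialSmoothing
from art.defences.trainer import AdversarialTrainer

# Input preprocessing: spatial smoothing applies a local median
# filter, which often removes small adversarial perturbations
smoother = SpatialSmoothing(window_size=3)
x_adv_smoothed, _ = smoother(x_adv)
acc = np.mean(np.argmax(classifier.predict(x_adv_smoothed), 1) == y_test)
print("accuracy on smoothed adversarial inputs:", acc)

# Model hardening: adversarial training mixes attack-generated
# samples into the training batches (ratio = adversarial fraction)
trainer = AdversarialTrainer(classifier, attacks=attack, ratio=0.5)
trainer.fit(x_train, y_train_oh, nb_epochs=1, batch_size=128)
```

This illustrates the two defense families from slide 7: preprocessing the inputs versus hardening the model itself.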
11. Poisoning detection
• Detection based on clustering activations
• Proof of attack strategy

Evasion detection
• Detector based on inputs
• Detector based on activations

Robustness metrics
• CLEVER
• Empirical robustness
• Loss sensitivity

Unified model API
• Training
• Prediction
• Access to loss and prediction gradients

Evasion defenses
• Feature squeezing
• Spatial smoothing
• Label smoothing
• Adversarial training
• Virtual adversarial training
• Thermometer encoding
• Gaussian data augmentation

Evasion attacks
• FGSM
• JSMA
• BIM
• PGD
• Carlini & Wagner
• DeepFool
• NewtonFool
• Universal perturbation
Implementations of state-of-the-art methods for attacking and defending classifiers.
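The robustness metrics above can also be computed programmatically. The following is a hedged sketch using `art.metrics` as it appears in ART 1.x (exact names and signatures may differ in 0.4.0), reusing `classifier` and the MNIST test set from the attack sketch:

```python
from art.metrics import clever_u, empirical_robustness

# Untargeted CLEVER score: an attack-independent estimate of the
# minimal perturbation needed to change one sample's prediction
score = clever_u(classifier, x_test[0], nb_batches=10, batch_size=100,
                 radius=0.3, norm=2)
print("CLEVER score:", score)

# Empirical robustness: average perturbation size a concrete attack
# (here FGSM) needs to change the model's predictions
er = empirical_robustness(classifier, x_test[:100], attack_name="fgsm",
                          attack_params={"eps": 0.1})
print("empirical robustness (FGSM):", er)
```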
12. Jupyter notebook with an example
https://nbviewer.jupyter.org/github/IBM/adversarial-robustness-toolbox/blob/master/notebooks/attack_defense_imagenet.ipynb
24. Watson Studio (formerly Data Science Experience)
ART is used in Watson Studio along with many other open source modules
25. Conclusions
Adversarial attacks present a serious threat
ART is an open source library of tools for protection from such attacks
Works with TensorFlow, Keras, PyTorch, and MXNet
Developed by IBM Research Ireland: Irina Nicolae, Mathieu Sinn
Current version: 0.4.0
Inspired by the brain
A perceptron can't do XOR, only linearly separable problems (Marvin Minsky and Seymour Papert, 1969)
A multi-layer perceptron with a nonlinear activation function can approximate any function (Hornik et al., 1989)
Backpropagation works, but can be very slow for large networks
Recently, deep networks became practical thanks to progress in hardware and algorithms
Convolutional networks are based on how the retina works
Modern networks include convolutional, pooling, and ReLU layers
Here we consider image recognition models, but adversarial attacks happen on other models too, e.g. speech recognition
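To illustrate the XOR point above, here is a small hedged sketch (using scikit-learn, which is an assumption not made in the original) showing that a single perceptron fails on XOR while a small nonlinear MLP fits it:

```python
import numpy as np
from sklearn.linear_model import Perceptron
from sklearn.neural_network import MLPClassifier

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])  # XOR labels: not linearly separable

# A linear perceptron can classify at most 3 of the 4 points
lin = Perceptron(max_iter=1000).fit(X, y)
print("perceptron accuracy:", lin.score(X, y))  # <= 0.75

# One hidden layer with a nonlinear activation solves XOR
mlp = MLPClassifier(hidden_layer_sizes=(4,), activation="tanh",
                    solver="lbfgs", random_state=0).fit(X, y)
print("MLP accuracy:", mlp.score(X, y))  # typically 1.0
```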