Detecting adversarial example attacks to deep neural networks
CBMI, 19-21 June 2017, Florence, Italy
Fabio Carrara, Fabrizio Falchi, Roberto Caldelli, Giuseppe Amato, Roberta Fumarola, Rudy Becarelli
Outline
● Introduction
○ Adversarial examples for deep neural network image classifiers
○ Risk of attacks to vision systems
● Our defense strategy
○ Detecting adversarial examples
○ CBIR approach
● Evaluation
Deep Neural Networks as Image Classifiers
● DNNs as image classifiers for several vision tasks
○ image annotation, face recognition, etc.
● Increasingly used in sensitive applications (safety- or security-related)
○ content filtering (spam, porn, violence, terrorist propaganda images, etc.)
○ malware detection
○ self-driving cars
[Figure: a deep convolutional network (CONV1 → RELU1 → POOL1 → … → FC8) used as an image classifier; given a photo, it answers: "It's a stop sign. I'm pretty sure."]
Adversarial images
● DNN image classifiers are vulnerable to adversarial images
malicious images crafted by adding a small but intentional (not random!)
perturbation
○ adversarial images fool DNNs into predicting a wrong class with high confidence
○ imperceptible to the human eye, like an optical illusion for the DNN
○ efficient algorithms exist to find them (a sketch follows below)
[Figure: original image + adversarial perturbation (5× amplified for visualization) = adversarial image; the attacked DNN answers: "It's a roundabout sign! No doubt."]
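FGS (fast gradient sign) [2] is one such efficient generation algorithm. A minimal PyTorch sketch, assuming a differentiable `model` and an image tensor with values in [0, 1] (the `eps` value is illustrative, not taken from the paper):

```python
import torch
import torch.nn.functional as F

def fgs_adversarial(model, image, true_label, eps=0.007):
    # One gradient step that increases the classification loss as
    # fast as possible: x_adv = x + eps * sign(grad_x loss).
    image = image.clone().detach().requires_grad_(True)
    logits = model(image.unsqueeze(0))          # add batch dimension
    loss = F.cross_entropy(logits, torch.tensor([true_label]))
    loss.backward()
    # eps bounds the per-pixel change, keeping it imperceptible.
    adversarial = image + eps * image.grad.sign()
    return adversarial.clamp(0.0, 1.0).detach()
```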
Risk of attacks to DNNs
● Attacks are possible:
○ if you have the model [1,2]
○ if you only have access to the input and output! [3] (error rates achieved by such black-box attacks: 84.24%, 88.94%, 96.19%)
○ in the physical world (printed-out adversarial images) [4]
● Attacks can bypass filters
○ e.g., NSFW image filters: https://github.com/yahoo/open_nsfw
● Safety-related issues
○ e.g., a self-driving car crash
[1] Szegedy, Christian, et al. "Intriguing properties of neural networks." arXiv preprint arXiv:1312.6199 (2013).
[2] Goodfellow, Ian J., Jonathon Shlens, and Christian Szegedy. "Explaining and harnessing adversarial examples." arXiv preprint arXiv:1412.6572 (2014).
[3] Papernot, Nicolas, et al. "Practical black-box attacks against deep learning systems using adversarial examples." arXiv preprint arXiv:1602.02697 (2016).
[4] Kurakin, Alexey, Ian Goodfellow, and Samy Bengio. "Adversarial examples in the physical world." arXiv preprint arXiv:1607.02533 (2016).
How to defend against adversarial attacks?
● Make more robust classifiers
○ e.g., include adversarial images in the training phase
○ "Every law has a loophole" → every model has adversarial images it is vulnerable to
● Detect adversarial inputs
○ Understand when the network is talking nonsense
Our Adversarial Detection Approach
[Figure, built up over slides 9-13: the input image is fed to the Image Classifier (DNN), which predicts a class ("DNN says: Stop Sign"); the same image is then used to search a set of labelled images (the train set) for the most similar ones. When the retrieved neighbors agree with the predicted class, the classification is accepted (✓); when they disagree, as happens for an adversarial input, it is rejected (✗).]
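A minimal sketch of this accept/reject step in NumPy (the array and function names are illustrative, not the paper's code):

```python
import numpy as np

def accept_prediction(feature, predicted_label,
                      train_features, train_labels,
                      k=10, threshold=0.5):
    # Retrieve the k labelled training images closest to the input
    # image in deep-feature space (Euclidean distance, next slide).
    dists = np.linalg.norm(train_features - feature, axis=1)
    nearest = np.argsort(dists)[:k]
    # Accept the DNN's prediction only if enough of the retrieved
    # neighbors carry the same label (the actual score is
    # distance-weighted, see the kNN scoring slide).
    agreement = np.mean(train_labels[nearest] == predicted_label)
    return bool(agreement >= threshold)
```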
Deep Features as Similarity Measure
● Reuse the intermediate output of the network (deep features)
○ an intermediate representation of the visual aspects of the image
○ the Euclidean distance between deep features can be used to evaluate visual similarity
[Figure: intermediate activations of the network (CONV1 … POOL5 … FC8) are read out as deep-feature vectors, e.g. (0.2, 1.5, 5.4, …, 8.3).]
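OverFeat (used later in the evaluation) is not bundled with common libraries, so this sketch substitutes torchvision's AlexNet to illustrate the idea: read an intermediate (pool5-like) activation as the deep feature and compare images by Euclidean distance:

```python
import torch
from torchvision.models import alexnet, AlexNet_Weights

net = alexnet(weights=AlexNet_Weights.IMAGENET1K_V1).eval()

def deep_feature(batch):
    # net.features is the conv/pool stack; its output plays the role
    # of the pool5 activation, flattened to one vector per image.
    with torch.no_grad():
        return torch.flatten(net.features(batch), start_dim=1)

def visual_distance(img_a, img_b):
    # Euclidean distance between deep features as visual dissimilarity.
    fa = deep_feature(img_a.unsqueeze(0))
    fb = deep_feature(img_b.unsqueeze(0))
    return torch.linalg.norm(fa - fb).item()
```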
kNN scoring
● Use a k-Nearest-Neighbors score to evaluate how trustworthy the classification is
● The score is assigned by looking at the classes of the k neighbors
● Neighbor distance matters: closer neighbors weigh more
● A threshold on the score decides whether to accept or reject the classification
The score combines the classes of the k neighbors through the indicator

$$\delta_i = \begin{cases} 1 & \text{if the label of the } i\text{-th neighbor equals the predicted label}\\ 0 & \text{otherwise}\end{cases}$$
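A sketch of the scoring in code. The indicator δᵢ is as defined above; since the exact distance weighting is not spelled out on this slide, the 1/(1 + d) weights below are an illustrative choice that simply makes closer neighbors count more:

```python
import numpy as np

def knn_score(neighbor_labels, neighbor_dists, predicted_label):
    # delta_i = 1 when the i-th neighbor's label matches the DNN's
    # prediction, 0 otherwise (the indicator defined above).
    delta = (np.asarray(neighbor_labels) == predicted_label).astype(float)
    # Illustrative distance weighting (not necessarily the paper's):
    # closer neighbors contribute more to the score.
    weights = 1.0 / (1.0 + np.asarray(neighbor_dists))
    return float(np.sum(weights * delta) / np.sum(weights))
```

The classification is then accepted when the score clears the chosen threshold and rejected otherwise.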
Evaluation (I)
● OverFeat classifier on ImageNet, pool5 layer (1024-D) as deep feature
● the set of images to be classified is generated from the ImageNet validation set:
(a) 1000 authentic images correctly classified by the DCNN (1 per class, randomly chosen)
(b) ~2000 adversarial images generated from (a) with the L-BFGS and FGS generation algorithms
(c) 1000 authentic images incorrectly classified by the DCNN (errors, 1 per class, randomly chosen)
● the model is evaluated as a binary classifier (authentic / spurious)
[Figure: evaluation pipeline: the OverFeat DCNN, pretrained on ImageNet, outputs the predicted class and the POOL5 deep feature; kNN scoring against the ImageNet train set, followed by a threshold, accepts or rejects the prediction.]
Evaluation (II)
[Plot: detection performance as a function of the score threshold. Curves: % of authentic images correctly retained; % of FGS adversarials correctly discarded; % of L-BFGS adversarials correctly discarded; % of wrong classifications of authentic images correctly discarded.]
Evaluation (II)
● Two operating points marked on the plot:
○ [1] (low threshold): filter out 50% of adversarial images (and 10% of errors) while retaining almost all the authentic images
○ [2] (higher threshold): filter out 80% of adversarial images (and 30% of errors) while retaining 90% of the authentic images
● the aggressiveness of the filter can be adjusted via the threshold
[Plot: same curves as above, with the operating points [1] and [2] marked along the threshold axis.]
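Curves and operating points like the ones above come from sweeping the threshold over the kNN scores of the three test sets; a sketch of such a sweep (array names are illustrative):

```python
import numpy as np

def threshold_sweep(authentic_scores, adversarial_scores, error_scores,
                    thresholds=np.linspace(0.0, 1.0, 101)):
    # Authentic images are correctly retained when score >= t;
    # adversarials and misclassified authentic images ("errors")
    # are correctly discarded when score < t.
    for t in thresholds:
        retained = np.mean(authentic_scores >= t)
        adv_out = np.mean(adversarial_scores < t)
        err_out = np.mean(error_scores < t)
        print(f"t={t:.2f}  retained={retained:.1%}  "
              f"adversarials discarded={adv_out:.1%}  "
              f"errors discarded={err_out:.1%}")
```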
Evaluation (IV) - Good Detections
● Examples of content that might be filtered
● Our approach successfully identifies adversarial images, assigning them low scores
http://deepfeatures.org/adversarials/
Evaluation (IV) - Bad Detections
● The most difficult adversarial images to detect (the ones having the highest kNN scores)
● Note the visual similarity and common aspects among the classes
http://deepfeatures.org/adversarials/
Conclusions
● We presented an approach to cope with adversarial images
○ with a satisfactory level of accuracy
○ without changing the model (no retrain)
○ without using additional data
● Future work
○ test more network architectures
○ test more generation algorithms for adversarial images
○ compare with other defense methodologies
Thanks for your attention!
Questions?
http://deepfeatures.org/adversarials/
Fabio Carrara <fabio.carrara@isti.cnr.it>