Anthill adversarial

Draft slides for the talk. I will be reducing the number of slides to make it ready for a crisp talk.

  1. 1. Adversarial Attacks on Deep Learning Gaurav Goswami
  2. 2. IBM Systems Outline • Introduction and motivation • Our approach and contributions: proposed perturbations, proposed adversarial attack detection method, proposed adversarial attack mitigation method • Results • Conclusions
  3. 3. IBM Systems What is Deep Learning?
  4. 4. IBM Systems Inside a Neural Network
  5. 5. IBM Systems Neural Network to Deep Learning
  6. 6. IBM Systems How does the network learn?
  7. 7. IBM Systems What does the network learn?
  8. 8. IBM Systems Deep Learning Applications • Automotive and Transportation: autonomous driving, pedestrian detection, accident avoidance • Security and Public Safety: video surveillance, facial recognition and detection • Healthcare: skin cancer detection, lung cancer, neurological • Broadcast, Media and Entertainment: captioning, search
  9. 9. IBM Systems Adversaries for deep learning systems Input: • Perceptible vs. imperceptible input perturbations • Targeted vs. non-targeted attacks • Image-specific vs. universal Network: • Black-box vs. white-box Benign vs. dangerous [Figure: Input, Network, Embedding matching; Perturb; Black-box vs. White-box]
  10. 10. IBM Systems Creating Adversaries
  11. 11. IBM Systems Benign Adversaries
  12. 12. IBM Systems Benign Adversaries
  13. 13. IBM Systems Dangerous Adversaries • Stop → Yield: Practical Black-Box Attacks against Machine Learning • Flipping face attributes: Are Facial Attributes Adversarially Robust? • Accessorize to a Crime: Real and Stealthy Attacks on State-of-the-Art Face Recognition • 158.00 → 10,000.00 • Normal → cancer
  14. 14. IBM Systems Black-box vs White-box Adversaries Black box: • No detail about the network is known • Input-output mapping can be obtained • Train a substitute (dummy) network and attack it; the attack transfers to the black-box network White box: • All the details about the network are known • Can attack the network directly • Less practical, but easier • Attacks often transfer to other similarly structured networks
  15. 15. IBM Systems Existence of Adversaries
  16. 16. IBM Systems Gap in Machine Learning and Real World
  17. 17. IBM Systems What is Face Recognition?
  18. 18. IBM Systems Overview Three components: 1. How do face recognition systems handle adversarial face images? 2. How does one detect the adversarial images? 3. How does one mitigate once the adversarial image is detected? Perturbation generation: 1. Standard state-of-the-art universal perturbation 2. Perceptible but face-specific perturbations 3. Targeted attack
  19. 19. IBM Systems Adversarial Attacks on Deep Learning based Face Recognition
  20. 20. IBM Systems Our Contributions ✓ Assess the impact of adversarial attacks on deep learning ✓ Detecting adversarial images ✓ Improving performance for adversarial images
  21. 21. IBM Systems Attacks on Faces
  22. 22. IBM Systems Literature Review: Summary Summary of existing methods of defense: • Modified training method or modified input during testing • Modifying networks: adding more layers/changing loss/activation functions • Using external models as network add-on • Image level pre-processing done inside or out of the network pipeline • Using particular type of intermediate layers’ activations and statistics Proposed method: • Works without modifying the existing training method or network • Irrespective of loss function involved • Does not use external/add-on deep learning models
  23. 23. IBM Systems Comparison with existing detection methods

      | Existing method | Approach | Proposed method |
      |---|---|---|
      | SafetyNet | Learn an SVM on different patterns of late-stage ReLU activations | Not limited to just a particular stage or type of layer |
      | Detector subnetwork | Add layers to the original network to detect adversarial attacks | Does not modify or require re-training of the models |
      | Exploiting convolution filter statistics | Cascaded classifier applied on convolution filter statistics | Not limited to just a particular type of layer, using relative distances not statistics |
      | Additional class augmentation | Add a class to the list of classes that captures all adversarial samples | Does not modify or require re-training of the models |
      | Feature Squeezing | Use a second squeezing network and compare the output of the 2 networks | Does not need a … |
      | MagNet | A manifold of clean images is learned and used for detection | The focus is on the … |
      | Denoising | Scalar quantization and spatial smoothing filter used to detect the attacks | The image is not … |
      | Uncertainty estimates | Uncertainty estimates and density estimation for dropout-based networks | Not specific to dropout |
  24. 24. IBM Systems Network Activation Observation
  25. 25. IBM Systems Adversarial Perturbation Detection • Each intermediate layer of the deep network is characterized by its mean output over undistorted images seen during training • Compute the Canberra distance between the activations of each intermediate layer for a distorted input and that layer's mean activations for undistorted input • Using these distances as the feature vector (feature vector length = number of layers), an SVM classifier is trained to classify each image as normal or adversarial
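As a rough illustration of the feature extraction described on this slide, the sketch below computes per-layer mean activations from clean images and then one Canberra distance per layer for a probe image. The Canberra distance between vectors p and q is the sum over i of |p_i - q_i| / (|p_i| + |q_i|). The function names, the flattened-list representation of activations and the toy numbers are assumptions made for illustration, not details from the talk. The SVM training step is left out; it would take these feature vectors as input.

```python
from statistics import fmean


def mean_activation(activations):
    """Element-wise mean of equally sized activation vectors."""
    return [fmean(column) for column in zip(*activations)]


def canberra(p, q):
    """Canberra distance: sum of |p_i - q_i| / (|p_i| + |q_i|), skipping 0/0 terms."""
    total = 0.0
    for a, b in zip(p, q):
        denom = abs(a) + abs(b)
        if denom > 0:
            total += abs(a - b) / denom
    return total


def layer_means(clean_images):
    """clean_images: list of images, each a list of per-layer activation vectors.
    Returns one mean activation vector per layer."""
    per_layer = zip(*clean_images)  # regroup the activations by layer
    return [mean_activation(list(layer)) for layer in per_layer]


def detection_features(image_activations, means):
    """One Canberra distance per layer: the feature vector for the classifier."""
    return [canberra(act, mu) for act, mu in zip(image_activations, means)]


# Toy data (hypothetical): two clean images, two layers each.
clean = [
    [[1.0, 2.0], [0.5, 0.5, 1.0]],
    [[3.0, 2.0], [1.5, 0.5, 1.0]],
]
means = layer_means(clean)
probe = [[2.0, 6.0], [1.0, 0.5, 3.0]]
features = detection_features(probe, means)
print(features)  # [0.5, 0.5]
```

The feature vector has one entry per layer, matching the slide's "feature vector length = number of layers".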
  26. 26. IBM Systems Results of Adversary Detection

      MEDS

      | Distortions | LightCNN | VGG-Face | Liang et al. 2017 | Feinman et al. 2017 |
      |---|---|---|---|---|
      | Beard | 89.5 | 99.8 | 83.4 | 85.1 |
      | ERO | 90.6 | 99.7 | 84.9 | 84.6 |
      | FHBO | 81.7 | 99.8 | 78.3 | 77.8 |
      | Grids | 89.7 | 99.9 | 85.1 | 85.7 |
      | xMSB | 93.2 | 99.8 | 88.2 | 87.9 |

      PaSC

      | Distortions | LightCNN | VGG-Face | Liang et al. 2017 | Feinman et al. 2017 |
      |---|---|---|---|---|
      | Beard | 92.2 | 86.8 | 81.2 | 80.9 |
      | ERO | 91.9 | 86.0 | 80.4 | 80.0 |
      | FHBO | 92.9 | 84.4 | 79.8 | 79.6 |
      | Grids | 68.4 | 84.4 | 62.1 | 62.4 |
      | xMSB | 92.9 | 85.4 | 80.2 | 80.9 |
  27. 27. IBM Systems Prior work on mitigation Summary of existing methods of defense: • Modified training method or modified input during testing • Modifying networks: adding more layers/changing loss/activation functions • Using external models as network add-on Proposed method: • Works without modifying the existing training method or network • Irrespective of loss function involved • Does not use external models
  28. 28. IBM Systems Comparison with existing mitigation methods

      | Existing method | Approach |
      |---|---|
      | Deep Contractive Networks | Using DAEs with smoothness penalty + CNN |
      | Gradient regularization | Train networks to regularize the difference in output with relation to change in input |
      | Defensive Distillation | Using class probability vectors from trained network to re-train the original model |
      | Biologically inspired protection | Using highly non-linear activation functions |
      | Parseval networks | Layer-wise regularization by maintaining a small global Lipschitz constant |
      | DeepCloak | Add a mask layer, trained explicitly with clean and adversarial samples |
      | Defense against universal perturbations | Adds pre-input layers for detecting and mitigating universal perturbations |
      | GAN-based defense | The generator network is used to create/rectify adversarial images |

      Proposed method: • Does not require modifying the network • Does not require re-training the network • Does not depend on a specific activation function
  29. 29. IBM Systems Comparison with existing mitigation methods

      | Existing method | Approach | Proposed method |
      |---|---|---|
      | Brute-force adversarial training | Use adversarial samples during training of the network | The network does not need to be re-trained or fine-tuned |
      | Data compression | Compress the image before testing | No compression techniques are applied to the images before testing |
      | Foveation | Applying the network to different regions of the image | Does not require regional application of the network |
      | Data Randomization | Using data augmentation and data transformation techniques before testing | Simpler denoising approach combined with network activation suppression |
  30. 30. IBM Systems Proposed Mitigation Method • Learn a score for each filter in each layer • ϵ_ij denotes the score for the j-th filter in the i-th layer • These scores identify the layers and filters most affected by adversarial distortions; those filters are then selectively dropped out at runtime
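The slide does not show how the scores are computed, so the following is only a sketch of the selective dropout step, assuming the scores are already available and that a higher score means a filter is more affected by adversarial distortion. The names `select_filters` and `selective_dropout`, the `top_k` cut-off and the use of one scalar response per filter are all hypothetical simplifications; a real filter produces a feature map, not a single number.

```python
def select_filters(scores, top_k):
    """Return the (layer, filter) index pairs with the top_k highest scores.

    scores[i][j] stands in for the slide's score of filter j in layer i.
    Assumption: a higher score marks a filter as more affected.
    """
    ranked = sorted(
        ((s, i, j) for i, row in enumerate(scores) for j, s in enumerate(row)),
        key=lambda item: item[0],
        reverse=True,
    )
    return {(i, j) for _, i, j in ranked[:top_k]}


def selective_dropout(layer_index, filter_outputs, dropped):
    """Zero the responses of the filters flagged as most affected."""
    return [
        0.0 if (layer_index, j) in dropped else out
        for j, out in enumerate(filter_outputs)
    ]


# Toy scores for two layers with three filters each (made-up values).
scores = [[0.1, 0.9, 0.2], [0.7, 0.05, 0.3]]
dropped = select_filters(scores, top_k=2)
layer0 = selective_dropout(0, [4.0, 5.0, 6.0], dropped)
print(sorted(dropped))  # [(0, 1), (1, 0)]
print(layer0)           # [4.0, 0.0, 6.0]
```

Unlike ordinary dropout, nothing here is random: the same flagged filters are suppressed on every forward pass, and the network weights are left unchanged.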
  31. 31. IBM Systems Mitigation Results (GAR (%) at 1% FAR) on the MEDS and PaSC databases

      | Algorithm | Database | Original | Distorted | Corrected |
      |---|---|---|---|---|
      | LightCNN | PaSC | 60.5 | 25.9 | 36.2 |
      | LightCNN | MEDS | 89.3 | 41.6 | 61.3 |
      | VGG-Face | PaSC | 54.3 | 14.6 | 24.8 |
      | VGG-Face | MEDS | 78.4 | 30.5 | 40.6 |
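For readers unfamiliar with the metric, GAR at a fixed FAR is the fraction of genuine comparisons accepted at a threshold that lets through at most the stated fraction of impostor comparisons. The sketch below shows one common way to compute it from similarity scores. The function name, the score lists and the 25% FAR used in the toy example are illustrative assumptions, and the evaluation protocol behind the reported numbers is not described in the slides.

```python
def gar_at_far(genuine, impostor, far):
    """Genuine accept rate at the most lenient threshold whose false accept
    rate stays at or below `far`.

    Scores are similarities: a higher score means a more likely match.
    """
    impostor_sorted = sorted(impostor, reverse=True)
    allowed = int(far * len(impostor_sorted))  # impostor scores we may accept
    if allowed < len(impostor_sorted):
        # Accept only scores strictly above the first impostor we must reject.
        threshold = impostor_sorted[allowed]
        accepted = sum(1 for g in genuine if g > threshold)
    else:
        accepted = len(genuine)
    return accepted / len(genuine)


# Toy scores: with 4 impostors, FAR = 25% allows exactly one false accept.
genuine_scores = [0.9, 0.7, 0.5, 0.35]
impostor_scores = [0.1, 0.3, 0.4, 0.8]
gar = gar_at_far(genuine_scores, impostor_scores, far=0.25)
print(gar)  # 0.75
```

Multiplying the result by 100 gives a percentage comparable in form to the values reported on the slide.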
  32. 32. IBM Systems Conclusions • Deep networks are susceptible to adversarial attacks • Deep networks do not behave like the human mind • Attacks may be perceptible or imperceptible • Measures exist to detect and mitigate the effect of such attacks, but they are not perfect either
  33. 33. IBM Systems References [1] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, R. Fergus, Intriguing properties of neural networks, arXiv preprint arXiv:1312.6199, 2014. [2] I. J. Goodfellow, J. Shlens, C. Szegedy, Explaining and Harnessing Adversarial Examples, arXiv preprint arXiv:1412.6572, 2015. [3] S. Moosavi-Dezfooli, A. Fawzi, P. Frossard, DeepFool: a simple and accurate method to fool deep neural networks, In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2574-2582, 2016. [4] T. Miyato, A. M. Dai, and I. Goodfellow, Adversarial Training Methods for Semi-Supervised Text Classification, arXiv preprint arXiv:1605.07725, 2016. [5] S. Zheng, Y. Song, T. Leung, and I. Goodfellow, Improving the robustness of deep neural networks via stability training, In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4480-4488, 2016. [6] S. M. Moosavi-Dezfooli, A. Fawzi, O. Fawzi and P. Frossard, Universal adversarial perturbations, In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017. [7] C. Guo, M. Rana, M. Cisse, L. Maaten, Countering Adversarial Images using Input Transformations, arXiv preprint arXiv:1711.00117, 2017. [8] A. N. Bhagoji, D. Cullina, C. Sitawarin, P. Mittal, Enhancing Robustness of Machine Learning Systems via Data Transformations, arXiv preprint arXiv:1704.02654, 2017. [9] G. K. Dziugaite, Z. Ghahramani, and D. M. Roy, A study of the effect of JPG compression on adversarial images, arXiv preprint arXiv:1608.00853, 2016. [10] Y. Luo, X. Boix, G. Roig, T. Poggio, and Q. Zhao, Foveation-based mechanisms alleviate adversarial examples, arXiv preprint arXiv:1511.06292, 2015.
  34. 34. IBM Systems References [11] C. Xie, J. Wang, Z. Zhang, Y. Zhou, L. Xie, and A. Yuille, Adversarial Examples for Semantic Segmentation and Object Detection, arXiv preprint arXiv:1703.08603, 2017. [12] Q. Wang, W. Guo, K. Zhang, I. I. Ororbia, G. Alexander, X. Xing, C. L. Giles, and X. Liu, Learning Adversary-Resistant Deep Neural Networks, arXiv preprint arXiv:1612.01401, 2016. [13] S. Gu, L. Rigazio, Towards Deep Neural Network Architectures Robust to Adversarial Examples, arXiv preprint arXiv:1412.5068, 2015 [14] W. Bai, C. Quan, and Z. Luo, Alleviating adversarial attacks via convolutional autoencoder, In International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD), pp. 53-58, 2017. [15] A. S. Ross, F. Doshi-Velez, Improving the Adversarial Robustness and Interpretability of Deep Neural Networks by Regularizing their Input Gradients, arXiv preprint arXiv:1711.09404, 2017. [16] C. Lyu, K. Huang, H. Liang, A Unified Gradient Regularization Family for Adversarial Examples, In IEEE International Conference on Data Mining, pp. 301-309, 2015. [17] L. Nguyen, A. Sinha, A Learning and Masking Approach to Secure Learning, arXiv preprint arXiv:1709.04447, 2017. [18] N. Papernot, P. McDaniel, X. Wu, S. Jha, A. Swami, Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks, In IEEE Symposium on Security and Privacy (SP), pp. 582-597, 2016 [19] N. Papernot, P. McDaniel, Extending Defensive Distillation, arXiv preprint arXiv:1705.05264, 2017. [20] A. Nayebi, and S. Ganguli, Biologically inspired protection of deep networks from adversarial attacks, arXiv preprint arXiv:1703.09202, 2017.
  35. 35. IBM Systems References [21] D. Krotov, and J. J. Hopfield, Dense Associative Memory is Robust to Adversarial Inputs, arXiv preprint arXiv:1701.00939, 2017. [22] M. Cisse, Y. Adi, N. Neverova, and J. Keshet, Houdini: Fooling deep structured prediction models, arXiv preprint arXiv:1707.05373, 2017. [23] J. Gao, B. Wang, Z. Lin, W. Xu, and Y. Qi, DeepCloak: Masking Deep Neural Network Models for Robustness Against Adversarial Samples, 2017. [24] N. Akhtar, J. Liu, A. Mian, Defense against Universal Adversarial Perturbations, arXiv preprint arXiv:1711.05929, 2017. [25] H. Lee, S. Han, J. Lee, Generative Adversarial Trainer: Defense to Adversarial Perturbations with GAN, arXiv preprint arXiv:1705.03387, 2017. [26] S. Shen, G. Jin, K. Gao, Y. Zhang, APE-GAN: Adversarial Perturbation Elimination with GAN, arXiv preprint arXiv:1707.05474, 2017.
  36. 36. IBM Systems References [27] J. Lu, T. Issaranon, D. Forsyth, SafetyNet: Detecting and Rejecting Adversarial Examples Robustly, arXiv preprint arXiv:1704.00103, 2017. [28] J. H. Metzen, T. Genewein, V. Fischer, B. Bischoff, On Detecting Adversarial Perturbations, arXiv preprint arXiv:1702.04267, 2017. [29] X. Li, F. Li, Adversarial Examples Detection in Deep Networks with Convolutional Filter Statistics, In Proceedings of International Conference on Computer Vision, 2017. [30] K. Grosse, P. Manoharan, N. Papernot, M. Backes, P. McDaniel, On the (Statistical) Detection of Adversarial Examples, arXiv preprint arXiv:1702.06280, 2017. [31] W. Xu, D. Evans, Y. Qi, Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks, arXiv preprint arXiv:1704.01155, 2017. [32] D. Meng, H. Chen, MagNet: a Two-Pronged Defense against Adversarial Examples, In Proceedings of ACM Conference on Computer and Communications Security (CCS), 2017. [33] B. Liang, H. Li, M. Su, X. Li, W. Shi, X. Wang, Detecting Adversarial Examples in Deep Networks with Adaptive Noise Reduction, arXiv preprint arXiv:1705.08378, 2017. [34] R. Feinman, R. R. Curtin, S. Shintre, A. B. Gardner, Detecting Adversarial Samples from Artifacts, arXiv preprint arXiv:1703.00410, 2017.