
WILD PATTERNS - Introduction to Adversarial Machine Learning - ITASEC 2019


Slides from the talk given by Battista Biggio and Fabio Roli at ITASEC 2019, February 14, Pisa



  1. 1. Introduction to Adversarial Machine Learning Fabio Roli https://www.pluribus-one.it/research/sec-ml/wild-patterns/ Pattern Recognition and Applications Lab University of Cagliari, Italy ITASEC 2019, Italian Conference on Cybersecurity, February 14, Pisa, Italy
  2. 2. http://pralab.diee.unica.it We Are Living in the Best of the Worlds… 2 AI is going to transform industry and business as electricity did about a century ago (Andrew Ng, Jan. 2017)
  3. 3. 3 All Right? All Good?
  4. 4. http://pralab.diee.unica.it iPhone 5s and 6s with Fingerprint Reader… 4
  5. 5. http://pralab.diee.unica.it Hacked a Few Days After Release… 5
  6. 6. 6 But maybe this happens only for old, shallow machine learning… end-to-end deep learning is another story…
  7. 7. http://pralab.diee.unica.it Adversarial School Bus 7 Biggio, Roli et al., Evasion attacks against machine learning at test time, ECML-PKDD 2013 Szegedy et al., Intriguing properties of neural networks, ICLR 2014
  8. 8. 8 But maybe this happens only for image recognition…
  9. 9. http://pralab.diee.unica.it Deep Neural Networks for EXE Malware Detection • MalConv: convolutional deep network trained on raw bytes to detect EXE malware • Gradient-based attacks can evade it by adding a few padding bytes [Kolosnjaji, Biggio, Roli et al., Adversarial Malware Binaries, EUSIPCO 2018]
  10. 10. http://pralab.diee.unica.it Take-home Message We are living in exciting times for machine learning… …Our work feeds a lot of consumer technologies for personal applications... This opens up big new possibilities, but also new security risks
  11. 11. http://pralab.diee.unica.it The Classical Statistical Model Note these two implicit assumptions of the model: 1. the source of the data is given, and it does not depend on the classifier 2. the noise affecting the data is stochastic [Figure: data source → acquisition/measurement → raw data → feature vector (x1, x2, ..., xd) → learning algorithm → classifier, with stochastic noise. Example: OCR]
  12. 12. http://pralab.diee.unica.it Can This Model Be Used Under Attack? [Same pipeline as before: data source → acquisition/measurement → feature vector → learning algorithm → classifier, with stochastic noise. Example: OCR]
  13. 13. http://pralab.diee.unica.it An Example: Spam Filtering • The famous SpamAssassin filter is really a linear classifier (http://spamassassin.apache.org) • Feature weights: buy = 1.0, viagra = 5.0 • Example: "From: spam@example.it / Buy Viagra!" gets total score = 6.0 > 5.0 (threshold), so it is classified as Spam by the linear classifier
  14. 14. http://pralab.diee.unica.it Feature Space View [Plot: feature space (x1, x2) with + and - samples separated by the decision boundary yc(x)] Feature weights: buy = 1.0, viagra = 5.0 From: spam@example.it Buy Viagra! • Classifier’s weights can be learnt using a training set • The SpamAssassin filter uses the perceptron algorithm
  15. 15. 15 But spam filtering is not a stationary classification task, the data source is not neutral…
  16. 16. http://pralab.diee.unica.it The Data Source Can Add “Good” Words Total score = 1.0 From: spam@example.it Buy Viagra ! conference meeting < 5.0 (threshold) → Ham Linear Classifier Feature weights: buy = 1.0, viagra = 5.0, conference = -2.0, meeting = -3.0 • Adding good words is a typical spammers’ trick [Z. Jorgensen et al., JMLR 2008]
  17. 17. http://pralab.diee.unica.it Adding Good Words: Feature Space View [Plot: the modified email crosses the decision boundary yc(x) into the - region] Feature weights: buy = 1.0, viagra = 5.0, conference = -2.0, meeting = -3.0 From: spam@example.it Buy Viagra! conference meeting • Note that spammers corrupt patterns with noise that is not random...
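The toy linear scoring rule used in the two slides above can be written in a few lines. This is a minimal sketch, not SpamAssassin's actual implementation: the word weights and the 5.0 threshold are just the illustrative values from the slides.

```python
# Toy linear spam scorer mirroring the slides' example (not the real SpamAssassin).
WEIGHTS = {"buy": 1.0, "viagra": 5.0, "conference": -2.0, "meeting": -3.0}
THRESHOLD = 5.0  # illustrative threshold from the slides

def score(message: str) -> float:
    """Sum the weights of the known words occurring in the message."""
    words = message.lower().replace("!", " ").split()
    return sum(WEIGHTS.get(w, 0.0) for w in words)

def is_spam(message: str) -> bool:
    return score(message) > THRESHOLD

print(score("Buy Viagra !"))                     # 6.0 -> classified as spam
print(score("Buy Viagra ! conference meeting"))  # 1.0 -> "good words" push it below the threshold
```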
  18. 18. http://pralab.diee.unica.it Is This Model Good for Spam Filtering? • the source of the data is given, and it does not depend on the classifier • the noise affecting the data is stochastic (“random”) [Same pipeline figure as before: data source → acquisition/measurement → feature vector → learning algorithm → classifier, with stochastic noise. Example: OCR]
  19. 19. 19 No, it is not…
  20. 20. http://pralab.diee.unica.it Adversarial Machine Learning 1. the source of data is not neutral: it really depends on the classifier 2. noise is not stochastic: it is adversarial, crafted to maximize the classification error [Figure: non-neutral data source → raw data → measurement → feature vector (x1, x2, ..., xn) → learning algorithm → classifier, with adversarial noise; e.g., the spam message “Buy Viagra” camouflaged as “Buy Vi@gra Dublin University”]
  21. 21. http://pralab.diee.unica.it Adversarial Noise vs. Stochastic Noise 21 Hamming’s adversarial noise model: the channel acts as an adversary that arbitrarily corrupts the code-word subject to a bound on the total number of errors Shannon’s stochastic noise model: probabilistic model of the channel, the probability of occurrence of too many or too few errors is usually low • This distinction is not new...
  22. 22. http://pralab.diee.unica.it The Classical Model Cannot Work • Standard classification algorithms assume that the data-generating process is independent of the classifier – This is not the case for adversarial tasks • It is easy to see that classifier performance will degrade quickly if the adversarial noise is not taken into account • Adversarial tasks are a mission impossible for the classical model
  23. 23. 23 How Should We Design Pattern Classifiers Under Attack?
  24. 24. http://pralab.diee.unica.it Adversary-aware Machine Learning 24 Adversary Classifier Designer 1. Analyze classifier 2. Devise and execute attack 3. Analyze attack 4. Design defences Machine learning systems should be aware of the arms race with the adversary [Biggio, Fumera, Roli. Security evaluation of pattern classifiers under attack, IEEE TKDE, 2014]
  25. 25. 25 How Can We Design Adversary-aware Machine Learning Systems?
  26. 26. http://pralab.diee.unica.it The Three Golden Rules 1. Know your adversary 2. Be proactive 3. Protect your classifier 26
  27. 27. 27 Know your adversary If you know the enemy and know yourself, you need not fear the result of a hundred battles (Sun Tzu, The art of war, 500 BC)
  28. 28. http://pralab.diee.unica.it Adversary’s 3D Model 28 Adversary’s Knowledge Adversary’s Capability Adversary’s Goal
  29. 29. http://pralab.diee.unica.it Adversary’s Goal [R. Lippmann, Dagstuhl Workshop, Sept. 2012] Confusion matrix (truth vs. classifier output): Normal classified as Normal = OK; Normal classified as Attack = False Alarm; Attack classified as Normal = Missed Alarm; Attack classified as Attack = OK
  30. 30. http://pralab.diee.unica.it Adversary’s Goal [R. Lippmann, Dagstuhl Workshop, Sept. 2012] Same confusion matrix as before: an evasion attack aims at missed alarms (attacks classified as normal), while a denial-of-service (DoS) attack aims at false alarms (normal samples classified as attacks)
  31. 31. http://pralab.diee.unica.it Adversary’s Knowledge [B. Biggio, G. Fumera, F. Roli, IEEE Trans. on KDE 2014] • Perfect-knowledge (white-box) attacks – upper bound on the performance degradation under attack – the attacker knows: training data, feature representation, learning algorithm (e.g., SVM), its parameters (e.g., feature weights), and may get feedback on decisions
  32. 32. http://pralab.diee.unica.it Adversary’s Knowledge [B. Biggio, G. Fumera, F. Roli, IEEE Trans. on KDE 2014] • Limited-knowledge attacks – ranging from gray-box to black-box attacks – the same components (training data, feature representation, learning algorithm and parameters, feedback on decisions) are only partially known to the attacker
  33. 33. http://pralab.diee.unica.it Kerckhoffs’ Principle • Kerckhoffs’ Principle (Kerckhoffs 1883) states that the security of a system should not rely on unrealistic expectations of secrecy – It’s the opposite of the principle of “security by obscurity” • Secure systems should make minimal assumptions about what can realistically be kept secret from a potential attacker • For machine learning systems, one could assume that the adversary is aware of the learning algorithm and can obtain some degree of information about the data used to train the learner • But the best strategy is to assess system security under different levels of adversary’s knowledge 33[A. D. Joseph et al., Adversarial Machine Learning, Cambridge Univ. Press, 2017]
  34. 34. http://pralab.diee.unica.it Adversary’s Capability 34[B. Biggio, G. Fumera, F. Roli, IEEE TKDE 2014; M. Barreno et al., ML 2010] Training data Learning-based classifier Attack at training time (a.k.a. poisoning) http://www.vulnerablehotel.com/login/ malicious
  35. 35. http://pralab.diee.unica.it Adversary’s Capability 35 Training data Learning-based classifier Attack at training time (“poisoning”) http://www.vulnerablehotel.com/login/ http://www.vulnerablehotel.com/login/ malicious
  36. 36. http://pralab.diee.unica.it A Deliberate Poisoning Attack? [http://exploringpossibilityspace.blogspot.it/2016/03/poor-software-qa-is-root-cause-of-tay.html] Microsoft deployed Tay, an AI chatbot designed to talk to youngsters on Twitter, but after 16 hours the chatbot was shut down since it started to post racist and offensive comments.
  37. 37. http://pralab.diee.unica.it Adversary’s Capability 37 Evasion attack at test time Classifier Camouflaged input data Buy Vi@gra Dublin University [B. Biggio, G. Fumera, F. Roli, IEEE TKDE 2014; M. Barreno et al., ML 2010]
  38. 38. http://pralab.diee.unica.it Adversary’s Capability • Luckily, the adversary is not omnipotent, she is constrained… 38[B. Biggio et al., IET Biometrics, 2012; R. Lippmann, Dagstuhl Workshop , Sept. 2012] Email messages must be understandable by human readers Data packets must execute on a computer, usually exploit a known vulnerability, and violate a sometimes explicit security policy Spoofing attacks are not perfect replicas of the live biometric traits
  39. 39. http://pralab.diee.unica.it Adversary’s Capability [B. Biggio, G. Fumera, F. Roli, IEEE Trans. KDE 2014] • Constraints on data manipulation – maximum number of samples that can be added to the training data • the attacker usually controls only a small fraction of the training samples – maximum amount of modifications • application-specific constraints in feature space • e.g., max. number of words that are modified in spam emails • formally: d(x, x′) ≤ d_max [Plot: feasible domain around x, with the perturbed point x′ and the decision boundary f(x); TRAINING vs. TEST]
  40. 40. http://pralab.diee.unica.it Conservative Design • The design and analysis of a system should avoid unnecessary or unreasonable assumptions about and limitations on the adversary • This allows one to assess “worst-case” scenarios • Conversely, however, analysing the capabilities of an omnipotent adversary reveals little about a learning system’s behaviour against realistic constrained attackers • Again, the best strategy is to assess system security under different levels of adversary’s capability 40[A. D. Joseph et al., Adversarial Machine Learning, Cambridge Univ. Press, 2017]
  41. 41. 41 2014: Deep Learning Meets Adversarial Machine Learning
  42. 42. http://pralab.diee.unica.it The Discovery of Adversarial Examples 42 ... we find that deep neural networks learn input-output mappings that are fairly discontinuous to a significant extent. We can cause the network to misclassify an image by applying a certain hardly perceptible perturbation, which is found by maximizing the network’s prediction error ... [Szegedy, Goodfellow et al., Intriguing Properties of NNs, ICLR 2014, ArXiv 2013]
  43. 43. http://pralab.diee.unica.it Adversarial Examples and Deep Learning • C. Szegedy et al. (ICLR 2014) independently developed a gradient-based attack against deep neural networks – minimally-perturbed adversarial examples 43[Szegedy, Goodfellow et al., Intriguing Properties of NNs, ICLR 2014, ArXiv 2013]
  44. 44. http://pralab.diee.unica.it Creation of Adversarial Examples • Find the smallest noise r such that f(x + r) = l, with l ≠ f(x): here x is a school bus and x + r is classified as an ostrich (Struthio camelus) • The adversarial image x + r is visually hard to distinguish from x • Informally speaking, the solution x + r is the closest image to x classified as l by f • The solution is approximated using a box-constrained limited-memory BFGS [Szegedy, Goodfellow et al., Intriguing Properties of NNs, ICLR 2014, ArXiv 2013]
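A rough sketch of this box-constrained formulation is shown below. It is only an illustration under simplifying assumptions: a logistic regression model stands in for the deep network, the trade-off constant c is fixed instead of being found by line search, and scipy's L-BFGS-B optimizer handles the box constraint.

```python
import numpy as np
from scipy.optimize import minimize
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Stand-in differentiable classifier (the slides use a deep net; any model
# exposing class probabilities works the same way).
X, y = make_classification(n_samples=200, n_features=20, random_state=0)
X = (X - X.min()) / (X.max() - X.min())           # map features to [0, 1]
clf = LogisticRegression().fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]

x0 = X[0]
target = 1 - clf.predict(X[:1])[0]                # push x0 into the other class
c = 1.0                                           # fixed trade-off between ||r|| and the loss

def objective(r):
    """c*||r||^2 + cross-entropy loss of the target label at x0 + r."""
    z = w @ (x0 + r) + b
    p1 = 1.0 / (1.0 + np.exp(-z))                 # P(class 1)
    p_target = p1 if target == 1 else 1.0 - p1
    loss = -np.log(np.clip(p_target, 1e-12, None))
    return c * (r @ r) + loss

# Box constraint: keep x0 + r inside [0, 1]^d.
bounds = [(-xi, 1.0 - xi) for xi in x0]
res = minimize(objective, np.zeros_like(x0), method="L-BFGS-B", bounds=bounds)
x_adv = x0 + res.x
print(clf.predict([x0])[0], "->", clf.predict([x_adv])[0])
```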
  45. 45. Many Adversarial Examples After 2014… [Search https://arxiv.org with keywords “adversarial examples”] 45 Several defenses have been proposed against adversarial examples, and more powerful attacks have been developed to show that they are ineffective. Most of these attacks are modifications to the optimization problems reported for evasion attacks / adversarial examples, using different gradient-based solution algorithms, initializations and stopping conditions. Most popular attack algorithms: FGSM (Goodfellow et al.), JSMA (Papernot et al.), CW (Carlini & Wagner, and follow-up versions)
  46. 46. 46 Why Are Adversarial Perturbations Imperceptible?
  47. 47. http://pralab.diee.unica.it Why Are Adversarial Perturbations against Deep Networks Imperceptible? • Large sensitivity of g(x) to input changes – i.e., the input gradient ∇x g(x) has a large norm (which scales with the input dimension!) – Thus, even small modifications along that direction will cause large changes in the predictions [Simon-Gabriel et al., Adversarial vulnerability of NNs increases with input dimension, ArXiv 2018]
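The dimensionality argument can be illustrated with a toy linear score g(x) = w·x + b, whose input gradient is simply w. This is only a sketch under the assumption that the per-feature weight scale stays fixed as the number of features d grows; it shows how both the gradient norm and the score change caused by a tiny, imperceptible per-feature perturbation grow with d.

```python
import numpy as np

# Toy illustration of input sensitivity for a linear score g(x) = w.x + b.
rng = np.random.default_rng(0)

for d in (10, 100, 1000, 10000):
    w = rng.normal(size=d) / np.sqrt(10)       # weights with a fixed per-feature scale
    grad = w                                   # input gradient of a linear model
    eps = 0.01                                 # tiny per-feature (l-inf) perturbation
    delta_g = (eps * np.sign(grad)) @ grad     # score change when moving along sign(grad)
    print(d, np.linalg.norm(grad), delta_g)    # both grow with the input dimension d
```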
  48. 48. 48 Do Adversarial Examples exist in the Physical World?
  49. 49. http://pralab.diee.unica.it Adversarial Images in the Physical World • Do adversarial images fool deep networks even when they operate in the physical world, e.g., when images are taken with a cell-phone camera? – Alexey Kurakin et al. (2016, 2017) explored the possibility of creating adversarial images for machine learning systems which operate in the physical world. They used images taken from a cell-phone camera as input to an Inception v3 image classification neural network. – They showed that in such a set-up, a significant fraction of adversarial images crafted using the original network are misclassified even when fed to the classifier through the camera. [Alexey Kurakin et al., ICLR 2017]
  50. 50. http://pralab.diee.unica.it Adversarial Turtle… 50
  51. 51. http://pralab.diee.unica.it [M. Sharif et al., ACM CCS 2016] The adversarial perturbation is applied only to the eyeglasses image region Adversarial Glasses 51
  52. 52. http://pralab.diee.unica.it Should We Be Worried ? 52
  53. 53. http://pralab.diee.unica.it No, we shouldn’t… 53 In this paper, we show experiments that suggest that a trained neural network classifies most of the pictures taken from different distances and angles of a perturbed image correctly. We believe this is because the adversarial property of the perturbation is sensitive to the scale at which the perturbed picture is viewed, so (for example) an autonomous car will misclassify a stop sign only from a small range of distances. [arXiv:1707.03501; CVPR 2017]
  54. 54. http://pralab.diee.unica.it Yes, we should… 54
  55. 55. http://pralab.diee.unica.it Yes, we should... 55
  56. 56. http://pralab.diee.unica.it Yes, we should… 56 [https://blog.openai.com/robust-adversarial-inputs/]
  57. 57. http://pralab.diee.unica.it Are Adversarial Examples a Real Security Threat? • Adversarial examples can exist in the physical world: we can fabricate concrete adversarial objects (glasses, road signs, etc.) • But the effectiveness of attacks carried out by adversarial objects is still to be investigated with large-scale experiments in realistic security scenarios • Gilmer et al. (2018) have recently discussed the realism of the security threat caused by adversarial examples, pointing out that it should be investigated more carefully – Are indistinguishable adversarial examples a real security threat? – For which real security scenarios are adversarial examples the best attack vector, better than attacking components outside the machine learning module? – … [Justin Gilmer et al., Motivating the Rules of the Game for Adversarial Example Research, https://arxiv.org/abs/1807.06732]
  58. 58. http://pralab.diee.unica.it Attacks with content preservation 58 There are well known security applications where minimal perturbations and indistinguishability of adversarial inputs are not required at all…
  59. 59. http://pralab.diee.unica.it To Conclude… This is a recent research field… 59 Dagstuhl Perspectives Workshop on “Machine Learning in Computer Security” Schloss Dagstuhl, Germany, Sept. 9th-14th, 2012
  60. 60. http://pralab.diee.unica.it Timeline of Learning Security
Adversarial ML:
• 2004-2005: pioneering work (Dalvi et al., KDD 2004; Lowd & Meek, KDD 2005)
• 2006-2010: Barreno, Nelson, Rubinstein, Joseph, Tygar: The Security of Machine Learning
• 2013: Srndic & Laskov, NDSS: claim nonlinear classifiers are secure
• 2013-2014: Biggio et al., ECML, IEEE TKDE: high-confidence & black-box evasion attacks to show the vulnerability of nonlinear classifiers
• 2014: Srndic & Laskov, IEEE S&P: shows the vulnerability of nonlinear classifiers with our ECML ’13 gradient-based attack
Security of DNNs:
• 2014: Szegedy et al., ICLR: adversarial examples vs. deep learning
• 2016: Papernot et al., IEEE S&P: evasion attacks / adversarial examples
• 2017: Papernot et al., ASIACCS: black-box evasion attacks
• 2017: Carlini & Wagner, IEEE S&P: high-confidence evasion attacks
• 2017: Grosse et al., ESORICS: application to Android malware
• 2017: Demontis et al., IEEE TDSC: secure learning for Android malware
  61. 61. http://pralab.diee.unica.it Timeline of Learning Security (same timeline as the previous slide, now highlighting Demontis et al., IEEE TDSC 2017: secure learning for Android malware)
  62. 62. http://pralab.diee.unica.it Timeline of Learning Security
Legend: (1) pioneering work on adversarial machine learning; (2) work on security evaluation of learning algorithms; (3) work on evasion attacks (a.k.a. adversarial examples); (4) ... in malware detection (PDF / Android)
Adversarial ML:
• 2004-2005: Dalvi et al., KDD 2004; Lowd & Meek, KDD 2005. Main contributions: minimum-distance evasion of linear classifiers; notion of adversary-aware classifiers
• 2006-2010: Barreno, Nelson, Rubinstein, Joseph, Tygar: The Security of Machine Learning (and references therein). Main contributions: first consolidated view of the adversarial ML problem; attack taxonomy; exemplary attacks against some learning algorithms
• 2006: Globerson & Roweis, ICML; 2009: Kolcz et al., CEAS; 2010: Biggio et al., IJMLC. Main contributions: evasion attacks against linear classifiers in spam filtering
• 2013: Srndic & Laskov, NDSS. Main contributions: evasion of linear PDF malware detectors; claims nonlinear classifiers can be more secure
• 2013: Biggio et al., ECML-PKDD: demonstrated vulnerability of nonlinear algorithms to gradient-based evasion attacks, also under limited knowledge. Main contributions: 1. gradient-based adversarial perturbations (against SVMs and neural nets); 2. projected gradient descent / iterative attack (also on discrete features from malware data), transfer attack with surrogate/substitute model; 3. maximum-confidence evasion (rather than minimum-distance evasion)
• 2014: Biggio et al., IEEE TKDE. Main contributions: framework for security evaluation of learning algorithms; attacker’s model in terms of goal, knowledge, capability
• 2014: Srndic & Laskov, IEEE S&P: used Biggio et al.’s ECML-PKDD ’13 gradient-based evasion attack to demonstrate the vulnerability of nonlinear PDF malware detectors
Security of DNNs:
• 2014: Szegedy et al., ICLR: independent discovery of (gradient-based) minimum-distance adversarial examples against deep nets; earlier implementation of adversarial training
• 2015: Goodfellow et al., ICLR: maximin formulation of adversarial training, with adversarial examples generated iteratively in the inner loop
• 2016: Papernot et al., IEEE S&P: framework for security evaluation of deep nets
• 2016: Papernot et al., Euro S&P: distillation defense (gradient masking)
• 2016: Kurakin et al.: basic iterative attack with projected gradient to generate adversarial examples
• 2017: Papernot et al., ASIACCS: black-box evasion attacks with substitute models (breaks distillation with transfer attacks on a smoother surrogate classifier)
• 2017: Carlini & Wagner, IEEE S&P: breaks distillation again with maximum-confidence evasion attacks (rather than using minimum-distance adversarial examples)
• 2017: Grosse et al., ESORICS: adversarial examples for malware detection
• 2017: Demontis et al., IEEE TDSC: Yes, Machine Learning Can Be More Secure! A Case Study on Android Malware Detection. Main contributions: secure SVM against adversarial examples in malware detection
• 2018: Madry et al., ICLR: improves the basic iterative attack from Kurakin et al. by adding noise before running the attack; first successful use of adversarial training to generalize across many attack algorithms
Biggio and Roli, Wild Patterns: Ten Years After the Rise of Adversarial Machine Learning, Pattern Recognition, 2018
  63. 63. http://pralab.diee.unica.it Learning Comes at a Price! 63 The introduction of novel learning functionalities increases the attack surface of computer systems and produces new vulnerabilities Safety of machine learning will be more and more important in future computer systems, as well as accountability, transparency, and the protection of fundamental human values and rights
  64. 64. Vulnerability Assessment of Machine Learning Battista Biggio https://www.pluribus-one.it/research/sec-ml/wild-patterns/ Pattern Recognition and Applications Lab University of Cagliari, Italy ITASEC 2019, Italian Conference on Cybersecurity, February 14, Pisa, Italy
  65. 65. 65 Be Proactive To know your enemy, you must become your enemy (Sun Tzu, The art of war, 500 BC)
  66. 66. http://pralab.diee.unica.it Be Proactive • Given a model of the adversary characterized by her: – Goal – Knowledge – Capability Try to anticipate the adversary! • What is the optimal attack she can do? • What is the expected performance decrease of your classifier? 66
  67. 67. http://pralab.diee.unica.it Evasion of Linear Classifiers • Problem: how to evade a linear (trained) classifier f(x) = sign(wᵀx)? • Original email x: “Start 2007 with a bang! Make WBFS YOUR PORTFOLIO’s first winner of the year ...”, with binary features (start, bang, portfolio, winner, year, ..., university, campus) = (1, 1, 1, 1, 1, ..., 0, 0) and weights w = (+2, +1, +1, +1, +1, ..., -3, -4): score = +6 > 0, SPAM (correctly classified) • Modified email x’: “St4rt 2007 with a b4ng! Make WBFS YOUR PORTFOLIO’s first winner of the year ... campus”, with features (0, 0, 1, 1, 1, ..., 0, 1): score = +3 - 4 = -1 < 0, HAM (misclassified email)
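A hedged sketch of this kind of evasion on a toy linear model follows. The word list and weights mirror the slide, while the greedy rule (obfuscate the present words with the largest positive weights, or add the absent words with the most negative weights) is just one simple way to realize the attack and may pick different words than the slide's hand-crafted example.

```python
import numpy as np

# Greedy evasion of a linear classifier f(x) = sign(w.x): flip the feature whose
# change reduces the score the most, until the score drops below zero or the
# change budget is exhausted.
words  = ["start", "bang", "portfolio", "winner", "year", "university", "campus"]
w      = np.array([+2.0, +1.0, +1.0, +1.0, +1.0, -3.0, -4.0])
x      = np.array([1, 1, 1, 1, 1, 0, 0], dtype=float)   # original spam email
budget = 2                                               # max number of modified words

x_adv = x.copy()
for _ in range(budget):
    if w @ x_adv < 0:
        break
    # candidate gains: remove a present word with w_i > 0, or add an absent word with w_i < 0
    gains = np.where(x_adv == 1, np.maximum(w, 0), np.maximum(-w, 0))
    i = int(np.argmax(gains))                 # word whose change reduces the score most
    x_adv[i] = 1 - x_adv[i]
    print("flip", words[i], "-> score:", w @ x_adv)
```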
  68. 68. http://pralab.diee.unica.it Evasion of Nonlinear Classifiers • What if the classifier is nonlinear? • Decision functions can be arbitrarily complicated, with no clear relationship between features (x) and classifier parameters (w) [Plot: nonlinear decision regions]
  69. 69. http://pralab.diee.unica.it Detection of Malicious PDF Files Srndic & Laskov, Detection of malicious PDF files based on hierarchical document structure, NDSS 2013 “The most aggressive evasion strategy we could conceive was successful for only 0.025% of malicious examples tested against a nonlinear SVM classifier with the RBF kernel [...]. Currently, we do not have a rigorous mathematical explanation for such a surprising robustness. Our intuition suggests that [...] the space of true features is “hidden behind” a complex nonlinear transformation which is mathematically hard to invert. [...] the same attack staged against the linear classifier [...] had a 50% success rate; hence, the robustness of the RBF classifier must be rooted in its nonlinear transformation” 69
  70. 70. http://pralab.diee.unica.it Evasion Attacks against Machine Learning at Test Time Biggio, Corona, Maiorca, Nelson, Srndic, Laskov, Giacinto, Roli, ECML-PKDD 2013 • Goal: maximum-confidence evasion • Knowledge: perfect (white-box attack) • Attack strategy: nonlinear, constrained optimization: min_{x'} g(x') s.t. d(x, x') ≤ d_max – Projected gradient descent: approximate solution for smooth functions • Gradients of g(x) can be analytically computed in many cases – SVMs, neural networks • Here f(x) = sign(g(x)): +1 malicious, -1 legitimate
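A minimal sketch of the projected-gradient strategy is given below. It assumes an l2 ball as the feasible domain d(x, x') ≤ d_max and a caller-supplied gradient of the discriminant function g; the linear g in the usage example is only there to make the snippet self-contained.

```python
import numpy as np

def pgd_evasion(x0, grad_g, eps, step=0.1, n_iter=100):
    """Projected-gradient sketch of maximum-confidence evasion:
    minimize g(x') subject to ||x' - x0||_2 <= eps.
    `grad_g` returns the gradient of the discriminant function g at a point."""
    x = x0.copy()
    for _ in range(n_iter):
        x = x - step * grad_g(x)              # descend on g to cross the boundary with margin
        delta = x - x0
        norm = np.linalg.norm(delta)
        if norm > eps:                        # project back onto the feasible l2 ball
            x = x0 + delta * (eps / norm)
    return x

# usage with a linear discriminant g(x) = w.x + b, whose gradient is simply w
w, b = np.array([1.0, -2.0]), 0.5
x_adv = pgd_evasion(np.array([2.0, 0.5]), grad_g=lambda x: w, eps=1.5)
print(w @ x_adv + b)                          # negative: the point now evades the classifier
```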
  71. 71. http://pralab.diee.unica.it Computing Descent Directions • Support vector machines: g(x) = Σ_i α_i y_i k(x, x_i) + b, ∇g(x) = Σ_i α_i y_i ∇k(x, x_i) • RBF kernel gradient: ∇k(x, x_i) = -2γ exp(-γ ||x - x_i||²) (x - x_i) • Neural networks: g(x) = [1 + exp(-Σ_{k=1}^m w_k δ_k(x))]^{-1}, ∂g(x)/∂x_f = g(x) (1 - g(x)) Σ_{k=1}^m w_k δ_k(x) (1 - δ_k(x)) v_kf [Biggio, Roli et al., ECML PKDD 2013]
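The SVM formulas above translate into a few lines of numpy. This sketch reads the dual coefficients α_i y_i, the support vectors and the intercept b from a fitted scikit-learn SVC; the toy data and the value of γ are arbitrary choices.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Analytic gradient of an RBF-kernel SVM's discriminant function g(x).
X, y = make_blobs(n_samples=100, centers=2, random_state=0)
gamma = 0.1
clf = SVC(kernel="rbf", gamma=gamma).fit(X, y)
sv, dual = clf.support_vectors_, clf.dual_coef_[0]       # dual = alpha_i * y_i

def g(x):
    k = np.exp(-gamma * np.sum((sv - x) ** 2, axis=1))   # RBF kernel values k(x, x_i)
    return dual @ k + clf.intercept_[0]

def grad_g(x):
    k = np.exp(-gamma * np.sum((sv - x) ** 2, axis=1))
    # each row of -2*gamma*(x - sv) is the kernel gradient direction for one support vector
    return (dual * k) @ (-2 * gamma * (x - sv))

x = X[0]
print(g(x), clf.decision_function([x])[0])   # the two values should coincide
print(grad_g(x))                             # descent direction used by the evasion attack
```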
  72. 72. http://pralab.diee.unica.it An Example on Handwritten Digits • Nonlinear SVM (RBF kernel) to discriminate between ‘3’ and ‘7’ • Features: gray-level pixel values (28 x 28 image = 784 features) • Few modifications are enough to evade detection! • The 1st adversarial examples generated with gradient-based attacks date back to 2013, one year before the attacks on deep neural networks [Panels: before attack (3 vs 7); after attack, g(x)=0; after attack, last iteration; g(x) vs. number of iterations; after attack the digit is misclassified as 7] [Biggio, Roli et al., ECML PKDD 2013]
  73. 73. http://pralab.diee.unica.it Bounding the Adversary’s Knowledge Limited-knowledge (gray/black-box) attacks • Only feature representation and (possibly) learning algorithm are known • Surrogate data sampled from the same distribution as the classifier’s training data • Classifier’s feedback to label surrogate data PD(X,Y)data Surrogate training data Send queries Get labels f(x) Learn surrogate classifier f’(x) This is the same underlying idea behind substitute models and black- box attacks (transferability) [N. Papernot et al., IEEE Euro S&P ’16; N. Papernot et al., ASIACCS’17] 73[Biggio, Roli et al., ECML PKDD 2013]
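The surrogate-model strategy above can be sketched with a few lines of scikit-learn: query the black-box target on surrogate data, train a substitute on the returned labels, then craft the evasion point against the differentiable substitute and transfer it. The data, the models and the fixed step size are arbitrary choices, and the transfer is not guaranteed to succeed on every point.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=10, random_state=0)
target = SVC(kernel="rbf", gamma=0.2).fit(X[:200], y[:200])   # internals unknown to the attacker

X_query = X[200:]                            # surrogate data from a similar distribution
y_query = target.predict(X_query)            # labels obtained by querying the target
substitute = LogisticRegression().fit(X_query, y_query)

w = substitute.coef_[0]                      # gradient direction of the substitute
x0 = X_query[y_query == 1][0]                # a point the target labels as class 1
x_adv = x0 - 2.0 * w / np.linalg.norm(w)     # move against the substitute's gradient
print("target on x0:", target.predict([x0])[0],
      "| target on x_adv:", target.predict([x_adv])[0])
```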
  74. 74. http://pralab.diee.unica.it Recent Results on Android Malware Detection • Drebin: Arp et al., NDSS 2014 – Android malware detection directly on the mobile phone – Linear SVM trained on features extracted from static code analysis [Demontis, Biggio et al., IEEE TDSC 2017] x2 Classifier 0 1 ... 1 0 Android app (apk) malware benign x1 x f (x) 74
  75. 75. http://pralab.diee.unica.it Recent Results on Android Malware Detection • Dataset (Drebin): 5,600 malware and 121,000 benign apps (TR: 30K, TS: 60K) • Detection rate at FP=1% vs max. number of manipulated features (averaged on 10 runs) – Perfect knowledge (PK) white-box attack; Limited knowledge (LK) black-box attack 75[Demontis, Biggio et al., IEEE TDSC 2017]
  76. 76. http://pralab.diee.unica.it Take-home Messages • Linear and non-linear supervised classifiers can be highly vulnerable to well-crafted evasion attacks • Performance evaluation should always be performed as a function of the adversary’s knowledge and capability – Security Evaluation Curves • Attack formulation: min_{x'} g(x') s.t. d(x, x') ≤ d_max, x ≤ x'
  77. 77. http://pralab.diee.unica.it Why Is Machine Learning So Vulnerable? • Learning algorithms tend to overemphasize some features to discriminate among classes • Large sensitivity to changes of such input features: the input gradient ∇x g(x) has a large norm • Different classifiers tend to find the same set of relevant features – that is why attacks can transfer across models! [Melis et al., Explaining Black-box Android Malware Detection, EUSIPCO 2018]
  78. 78. 78 Is Deep Learning Safe for Robot Vision?
  79. 79. http://pralab.diee.unica.it Is Deep Learning Safe for Robot Vision? • Evasion attacks against the iCub humanoid robot – Deep Neural Network used for visual object recognition 79[Melis, Biggio et al., Is Deep Learning Safe for Robot Vision? ICCVW ViPAR 2017]
  80. 80. http://pralab.diee.unica.it [http://old.iit.it/projects/data-sets] iCubWorld28 Data Set: Example Images 80
  81. 81. http://pralab.diee.unica.it From Binary to Multiclass Evasion • In multiclass problems, classification errors occur in different classes. • Thus, the attacker may aim: 1. to have a sample misclassified as any class different from the true class (error-generic attacks) 2. to have a sample misclassified as a specific class (error-specific attacks) 81 cup sponge dish detergent Error-generic attacks any class cup sponge dish detergent Error-specific attacks [Melis, Biggio et al., Is Deep Learning Safe for Robot Vision? ICCVW ViPAR 2017]
  82. 82. http://pralab.diee.unica.it Error-generic Evasion • Error-generic evasion – k is the true class (blue) – l is the competing (closest) class in feature space (red) • The attack minimizes the objective to have the sample misclassified as the closest class (could be any!) [Plot: indiscriminate evasion] [Melis, Biggio et al., Is Deep Learning Safe for Robot Vision? ICCVW ViPAR 2017]
  83. 83. http://pralab.diee.unica.it Error-specific Evasion • Error-specific evasion – k is the target class (green) – l is the competing class (initially, the blue class) • The attack maximizes the objective to have the sample misclassified as the target class [Plot: targeted evasion] [Melis, Biggio et al., Is Deep Learning Safe for Robot Vision? ICCVW ViPAR 2017]
  84. 84. http://pralab.diee.unica.it Adversarial Examples against iCub – Gradient Computation • The given optimization problems can both be solved with gradient-based algorithms • The gradient of the objective can be computed using the chain rule: ∇f_i(x) = (∂f_i(z)/∂z) (∂z/∂x) 1. the gradient of the functions f_i(z) can be computed if the chosen classifier is differentiable 2. ... and then backpropagated through the deep network with automatic differentiation
  85. 85. http://pralab.diee.unica.it Example of Adversarial Images against iCub An adversarial example from class laundry-detergent, modified by the proposed algorithm to be misclassified as cup 85[Melis, Biggio et al., Is Deep Learning Safe for Robot Vision? ICCVW ViPAR 2017]
  86. 86. http://pralab.diee.unica.it The “Sticker” Attack against iCub Adversarial example generated by manipulating only a specific region, to simulate a sticker that could be applied to the real-world object. This image is classified as cup. 86[Melis, Biggio et al., Is Deep Learning Safe for Robot Vision? ICCVW ViPAR 2017]
  87. 87. 87 Countering Evasion Attacks What is the rule? The rule is protect yourself at all times (from the movie “Million dollar baby”, 2004)
  88. 88. http://pralab.diee.unica.it Security Measures against Evasion Attacks 1. Reduce sensitivity to input changes with robust optimization – Adversarial Training / Regularization: min_θ Σ_i max_{||δ_i|| ≤ ε} ℓ(y_i, f_θ(x_i + δ_i)) (bounded perturbation!) 2. Introduce rejection / detection of adversarial examples [Plots: SVM-RBF (no reject) vs. SVM-RBF (higher rejection rate)]
  89. 89. 89 Countering Evasion: Reducing Sensitivity to Input Changes with Robust Optimization
  90. 90. http://pralab.diee.unica.it Reducing Input Sensitivity via Robust Optimization • Robust optimization (a.k.a. adversarial training): min_w max_{||δ_i|| ≤ ε} Σ_i ℓ(y_i, f_w(x_i + δ_i)) (bounded perturbation!) • Robustness and regularization (Xu et al., JMLR 2009) – under linearity of ℓ and f_w, equivalent to the regularized problem min_w Σ_i ℓ(y_i, f_w(x_i)) + ε ||∇_x f_w||_*, where the regularizer is the dual norm of the perturbation: ||∇_x f_w||_* = ||w||_*
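For a linear classifier with hinge loss and an l-infinity-bounded perturbation, the inner maximization has a closed form, which makes the robust-optimization view easy to sketch: the worst-case margin is y(w·x + b) - ε||w||₁, exactly the dual-norm penalty mentioned above. The sketch below is a plain subgradient solver on toy data; the learning rate, epochs and l2 term are arbitrary choices.

```python
import numpy as np

def adv_train_linear(X, y, eps=0.1, lr=0.01, n_epochs=200, lam=1e-3):
    """Adversarial training of a linear hinge-loss classifier under ||delta||_inf <= eps,
    using the closed-form worst case: margin = y*(w.x + b) - eps*||w||_1."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_epochs):
        margins = y * (X @ w + b) - eps * np.sum(np.abs(w))   # worst-case margins
        active = margins < 1                                  # samples with non-zero hinge loss
        # subgradient of the worst-case hinge loss plus a small l2 term
        grad_w = -(y[active, None] * X[active]).sum(0) + eps * active.sum() * np.sign(w) + lam * w
        grad_b = -y[active].sum()
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

# usage on toy data with labels in {-1, +1}
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(+1, 1, (50, 5)), rng.normal(-1, 1, (50, 5))])
y = np.hstack([np.ones(50), -np.ones(50)])
w, b = adv_train_linear(X, y, eps=0.2)
print(np.mean(np.sign(X @ w + b) == y))        # clean accuracy of the robustly trained model
```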
  91. 91. http://pralab.diee.unica.it Experiments on Android Malware • Infinity-norm regularization is the optimal regularizer against sparse evasion attacks – Sparse evasion attacks penalize ||δ||_1, promoting the manipulation of only a few features • Why? It bounds the maximum absolute weight values! Sec-SVM: min_{w,b} ||w||_∞ + C Σ_i max(0, 1 - y_i f(x_i)), with ||w||_∞ = max_{i=1,...,d} |w_i| [Plot: absolute weight values |w| in descending order] Results on Adversarial Android Malware [Demontis, Biggio et al., Yes, ML Can Be More Secure!..., IEEE TDSC 2017]
  92. 92. http://pralab.diee.unica.it Adversarial Training and Regularization • Adversarial training can also be seen as a form of regularization, which penalizes the (dual) norm of the input gradients ||∇_x ℓ|| • Known as double backprop or gradient/Jacobian regularization – see, e.g., Simon-Gabriel et al., Adversarial vulnerability of neural networks increases with input dimension, ArXiv 2018; and Lyu et al., A unified gradient regularization family for adversarial examples, ICDM 2015 • Take-home message: the net effect of these techniques is to make the prediction function of the classifier smoother
  93. 93. http://pralab.diee.unica.it Ineffective Defenses: Obfuscated Gradients • Work by Carlini & Wagner (S&P ’17) and Athalye et al. (ICML ’18) has shown that – some recently-proposed defenses rely on obfuscated / masked gradients, and – they can be circumvented • Obfuscated gradients do not allow the correct execution of gradient-based attacks... but substitute models and/or smoothing can correctly reveal meaningful input gradients!
  94. 94. 94 Countering Evasion: Detecting & Rejecting Adversarial Examples
  95. 95. http://pralab.diee.unica.it Detecting & Rejecting Adversarial Examples • Adversarial examples tend to occur in blind spots – Regions far from training data that are anyway assigned to ‘legitimate’ classes 95 blind-spot evasion (not even required to mimic the target class) rejection of adversarial examples through enclosing of legitimate classes
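The rejection idea above can be sketched with a simple stand-in (not the exact class-enclosing method of the cited paper): besides classifying, measure how far a test point lies from the training data of its predicted class and reject it as a potential blind-spot adversarial example when that distance exceeds a threshold. The data, classifier and threshold below are arbitrary.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.neighbors import NearestNeighbors
from sklearn.svm import SVC

X, y = make_blobs(n_samples=200, centers=3, random_state=0)
clf = SVC(kernel="rbf", gamma=0.2).fit(X, y)
# one nearest-neighbor index per class, used to measure distance to the class's training data
nn = {c: NearestNeighbors(n_neighbors=1).fit(X[y == c]) for c in np.unique(y)}

def predict_with_reject(x, threshold=3.0):
    c = clf.predict([x])[0]
    dist = nn[c].kneighbors([x])[0][0, 0]     # distance to the closest training point of class c
    return ("reject", dist) if dist > threshold else (c, dist)

print(predict_with_reject(X[0]))              # a training point: accepted
print(predict_with_reject(X[0] + 20.0))       # a far-away blind-spot point: rejected
```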
  96. 96. http://pralab.diee.unica.it Detecting & Rejecting Adversarial Examples input perturbation (Euclidean distance) 96[Melis, Biggio et al., Is Deep Learning Safe for Robot Vision? ICCVW ViPAR 2017]
  97. 97. http://pralab.diee.unica.it [S. Sabour at al., ICLR 2016] Why Rejection (in Representation Space) Is Not Enough? 97
  98. 98. 98 Poisoning Machine Learning
  99. 99. http://pralab.diee.unica.it Poisoning Machine Learning [Figure: training pipeline: training data (with labels) → pre-processing and feature extraction (x1, x2, ..., xd) → classifier learning; e.g., the spam email “Start 2007 with a bang! Make WBFS YOUR PORTFOLIO’s first winner of the year ...” is mapped to binary features (start, bang, portfolio, winner, year, ..., university, campus) = (1, 1, 1, 1, 1, ..., 0, 0), and the learned weights are w = (+2, +1, +1, +1, +1, ..., -3, -4); the classifier generalizes well on test data]
  100. 100. http://pralab.diee.unica.it Poisoning Machine Learning [Figure: same pipeline with corrupted training data: poisoning points are injected so that the spam email now also contains “university campus”, i.e., features (1, 1, 1, 1, 1, ..., 1, 1); classifier learning is compromised and the learned weights become w = (+2, +1, +1, +1, +1, ..., +1, +1), maximizing the error on test data]
  101. 101. http://pralab.diee.unica.it Poisoning Attacks against Machine Learning • Goal: to maximize classification error • Knowledge: perfect / white-box attack • Capability: injecting poisoning samples into TR • Strategy: find an optimal attack point xc in TR that maximizes classification error [Plots: the classification error grows from 0.022 to 0.039 after injecting xc; classification error shown as a function of xc] [Biggio, Nelson, Laskov. Poisoning attacks against SVMs. ICML, 2012]
  102. 102. http://pralab.diee.unica.it Poisoning is a Bilevel Optimization Problem • Attacker’s objective – to maximize generalization error on untainted data, w.r.t. the poisoning point xc: max_{xc} L(D_val, w*) s.t. w* = argmin_w ℒ(D_tr ∪ {(xc, yc)}, w) – the outer loss L is estimated on validation data (no attack points!), while the algorithm is trained on surrogate data including the attack point • Poisoning problem against (linear) SVMs: max_{xc} Σ_{j=1}^m max(0, 1 - y_j f*(x_j)) s.t. w* = argmin_{w,b} ½ wᵀw + C [Σ_{i=1}^n max(0, 1 - y_i f(x_i)) + max(0, 1 - y_c f(x_c))] [Biggio, Nelson, Laskov. Poisoning attacks against SVMs. ICML, 2012] [Xiao, Biggio, Roli et al., Is feature selection secure against training data poisoning? ICML, 2015] [Munoz-Gonzalez, Biggio, Roli et al., Towards poisoning of deep learning..., AISec 2017]
  103. 103. http://pralab.diee.unica.it xc (0) xc Gradient-based Poisoning Attacks • Gradient is not easy to compute – The training point affects the classification function • Trick: – Replace the inner learning problem with its equilibrium (KKT) conditions – This enables computing gradient in closed form • Example for (kernelized) SVM – similar derivation for Ridge, LASSO, Logistic Regression, etc. 103 xc (0) xc [Biggio, Nelson, Laskov. Poisoning attacks against SVMs. ICML, 2012] [Xiao, Biggio, Roli et al., Is feature selection secure against training data poisoning? ICML, 2015]
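The bilevel structure can be illustrated with a deliberately naive sketch: instead of the closed-form KKT-based gradient used in the papers above, the gradient of the validation loss with respect to xc is approximated numerically by retraining the SVM at each probe. This is far too slow for real attacks; the data, the label yc = -1, and the step sizes are arbitrary, and the snippet only shows the structure of the outer ascent loop.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import hinge_loss
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=300, n_features=5, random_state=0)
y = 2 * y - 1                                             # labels in {-1, +1}
X_tr, y_tr, X_val, y_val = X[:100], y[:100], X[100:], y[100:]

def val_loss(xc, yc=-1):
    """Hinge loss on validation data after retraining on the poisoned training set."""
    clf = LinearSVC(C=1.0, max_iter=5000).fit(np.vstack([X_tr, xc]), np.append(y_tr, yc))
    return hinge_loss(y_val, clf.decision_function(X_val))

xc, step, h = X_tr[0].copy(), 0.5, 1e-2
for _ in range(20):
    # central-difference approximation of the gradient of the validation loss w.r.t. xc
    grad = np.array([(val_loss(xc + h * e) - val_loss(xc - h * e)) / (2 * h)
                     for e in np.eye(len(xc))])
    xc += step * grad                                     # gradient *ascent* on the validation loss
print("validation loss after poisoning:", val_loss(xc))
```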
  104. 104. http://pralab.diee.unica.it Experiments on MNIST digits Single-point attack • Linear SVM; 784 features; TR: 100; VAL: 500; TS: about 2000 – ‘0’ is the malicious (attacking) class – ‘4’ is the legitimate (attacked) one xc (0) xc 104[Biggio, Nelson, Laskov. Poisoning attacks against SVMs. ICML, 2012]
  105. 105. http://pralab.diee.unica.it Experiments on MNIST digits Multiple-point attack • Linear SVM; 784 features; TR: 100; VAL: 500; TS: about 2000 – ‘0’ is the malicious (attacking) class – ‘4’ is the legitimate (attacked) one 105[Biggio, Nelson, Laskov. Poisoning attacks against SVMs. ICML, 2012]
  106. 106. http://pralab.diee.unica.it How about Poisoning Deep Nets? • ICML 2017 Best Paper by Koh et al., “Understanding black-box predictions via Influence Functions” has derived adversarial training examples against a DNN – they have been constructed attacking only the last layer (KKT-based attack against logistic regression) and assuming the rest of the network to be ”frozen” 106
  107. 107. http://pralab.diee.unica.it Towards Poisoning Deep Neural Networks • Solving the poisoning problem without exploiting KKT conditions (back-gradient) – Muñoz-González, Biggio, Roli et al., AISec 2017 https://arxiv.org/abs/1708.08689 107
  108. 108. 108 Countering Poisoning Attacks What is the rule? The rule is protect yourself at all times (from the movie “Million dollar baby”, 2004)
  109. 109. http://pralab.diee.unica.it Security Measures against Poisoning • Rationale: poisoning injects outlying training samples • Two main strategies for countering this threat 1. Data sanitization: remove poisoning samples from training data • Bagging for fighting poisoning attacks • Reject-On-Negative-Impact (RONI) defense 2. Robust Learning: learning algorithms that are robust in the presence of poisoning samples xc (0) xc xc (0) xc 109
  110. 110. http://pralab.diee.unica.it Robust Regression with TRIM • TRIM learns the model by retaining only the training points with the smallest residuals: argmin_{w,b,I} L(w, b, I) = (1/|I|) Σ_{i∈I} (f(x_i) - y_i)² + λ Ω(w), with I ⊂ {1, ..., N}, |I| = n, and N = (1 + α) n [Jagielski, Biggio et al., IEEE Symp. Security and Privacy, 2018]
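A short sketch of the TRIM idea follows: alternately fit the regression model on the currently selected subset of n points and re-select the n points with the smallest residuals, so that poisoned points with large residuals are dropped. Ridge regression, the iteration count and the synthetic data are arbitrary choices, not the paper's exact setup.

```python
import numpy as np
from sklearn.linear_model import Ridge

def trim_regression(X, y, n, n_iters=20, alpha=1e-3):
    """Alternating minimization: fit on the trusted subset, then keep the n best-fitting points."""
    idx = np.random.default_rng(0).choice(len(y), size=n, replace=False)
    model = Ridge(alpha=alpha)
    for _ in range(n_iters):
        model.fit(X[idx], y[idx])                         # fit on the current subset I
        residuals = (model.predict(X) - y) ** 2
        idx = np.argsort(residuals)[:n]                   # re-select the n smallest residuals
    return model, idx

# usage: 100 clean points plus 20 poisoned ones with shifted responses
rng = np.random.default_rng(1)
X = rng.uniform(size=(120, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.01 * rng.normal(size=120)
y[100:] += 5.0                                            # poisoning: corrupted targets
model, idx = trim_regression(X, y, n=100)
print(model.coef_)                                        # close to the clean coefficients [1, -2, 0.5]
```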
  111. 111. http://pralab.diee.unica.it Experiments with TRIM (Loan Dataset) • TRIM MSE is within 1% of the original model MSE [Plot legend: no defense, existing methods, our defense; lower MSE means a better defense] [Jagielski, Biggio et al., IEEE Symp. Security and Privacy, 2018]
  112. 112. 112 Other Attacks against ML
  113. 113. http://pralab.diee.unica.it Attacks against Machine Learning [Biggio & Roli, Wild Patterns, 2018, https://arxiv.org/abs/1712.03141]
Attacker’s goal: Integrity (misclassifications that do not compromise normal system operation), Availability (misclassifications that compromise normal system operation), Privacy / Confidentiality (querying strategies that reveal confidential information on the learning model or its users)
Attacker’s capability (test vs. training data):
• Test data, Integrity: evasion (a.k.a. adversarial examples)
• Test data, Privacy / Confidentiality: model extraction / stealing, model inversion (hill-climbing), membership inference attacks
• Training data, Integrity: poisoning (to allow subsequent intrusions), e.g., backdoors or neural network trojans
• Training data, Availability: poisoning (to maximize classification error)
Attacker’s knowledge: perfect-knowledge (PK) white-box attacks; limited-knowledge (LK) black-box attacks (transferability with surrogate/substitute learning models)
  114. 114. http://pralab.diee.unica.it Model Inversion Attacks Privacy Attacks • Goal: to extract users’ sensitive information (e.g., face templates stored during user enrollment) – Fredrikson, Jha, Ristenpart. Model inversion attacks that exploit confidence information and basic countermeasures. ACM CCS, 2015 • Also known as hill-climbing attacks in the biometric community – Adler. Vulnerabilities in biometric encryption systems. 5th Int’l Conf. AVBPA, 2005 – Galbally, McCool, Fierrez, Marcel, Ortega-Garcia. On the vulnerability of face verification systems to hill-climbing attacks. Patt. Rec., 2010 • How: by repeatedly querying the target system and adjusting the input sample to maximize its output score (e.g., a measure of the similarity of the input sample with the user templates) 114 Reconstructed Image Training Image
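The querying strategy can be sketched with a generic black-box hill climber. The "score" oracle below is only a stand-in for the similarity value returned by a real verification system (here it is just the negative distance to a hidden template), and the step size and query budget are arbitrary.

```python
import numpy as np

def hill_climb(score, x0, n_queries=5000, step=0.05, seed=0):
    """Repeatedly apply small random modifications, keeping those that raise the score."""
    rng = np.random.default_rng(seed)
    x, best = x0.copy(), score(x0)
    for _ in range(n_queries):
        candidate = np.clip(x + step * rng.normal(size=x.shape), 0.0, 1.0)
        s = score(candidate)
        if s > best:                        # keep the modification only if the score improves
            x, best = candidate, s
    return x, best

# usage with a toy verification score: negative distance to a hidden template
template = np.random.default_rng(1).uniform(size=64)
score = lambda x: -np.linalg.norm(x - template)       # the attacker only sees this value
x_rec, s = hill_climb(score, x0=np.full(64, 0.5))
print("final score (negative distance to the hidden template):", s)
```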
  115. 115. http://pralab.diee.unica.it Membership Inference Attacks Privacy Attacks (Shokri et al., IEEE Symp. SP 2017) • Goal: to identify whether an input sample is part of the training set used to learn a deep neural network based on the observed prediction scores for each class 115
  116. 116. http://pralab.diee.unica.it Backdoor Attacks / Poisoning Integrity Attacks Backdoor / poisoning integrity attacks place mislabeled training points in a region of the feature space far from the rest of training data. The learning algorithm labels such a region as desired, allowing for subsequent intrusions / misclassifications at test time. [Figures: training data (no poisoning); training data (poisoned); backdoored stop sign (labeled as speed limit)] Attacks referred to as ‘backdoor’: T. Gu, B. Dolan-Gavitt, and S. Garg. Badnets: Identifying vulnerabilities in the machine learning model supply chain. In NIPS Workshop on Machine Learning and Computer Security, 2017. X. Chen, C. Liu, B. Li, K. Lu, and D. Song. Targeted backdoor attacks on deep learning systems using data poisoning. ArXiv e-prints, 2017. Attacks referred to as ‘poisoning integrity’: M. Barreno, B. Nelson, R. Sears, A. D. Joseph, and J. D. Tygar. Can machine learning be secure? In Proc. ACM Symp. Information, Computer and Comm. Sec., ASIACCS ’06, pages 16-25, New York, NY, USA, 2006. ACM. M. Barreno, B. Nelson, A. Joseph, and J. Tygar. The security of machine learning. Machine Learning, 81:121-148, 2010. B. Biggio, B. Nelson, and P. Laskov. Poisoning attacks against support vector machines. In J. Langford and J. Pineau, editors, 29th Int’l Conf. on Machine Learning, pages 1807-1814. Omnipress, 2012. B. Biggio, G. Fumera, and F. Roli. Security evaluation of pattern classifiers under attack. IEEE Transactions on Knowledge and Data Engineering, 26(4):984-996, April 2014. H. Xiao, B. Biggio, G. Brown, G. Fumera, C. Eckert, and F. Roli. Is feature selection secure against training data poisoning? In F. Bach and D. Blei, editors, JMLR W&CP - Proc. 32nd Int’l Conf. Mach. Learning (ICML), volume 37, pages 1689-1698, 2015. L. Munoz-Gonzalez, B. Biggio, A. Demontis, A. Paudice, V. Wongrassamee, E. C. Lupu, and F. Roli. Towards poisoning of deep learning algorithms with back-gradient optimization. In 10th ACM Workshop on Artificial Intelligence and Security, AISec ’17, pp. 27-38, 2017. ACM. B. Biggio and F. Roli. Wild patterns: Ten years after the rise of adversarial machine learning. ArXiv e-prints, 2018. M. Jagielski, A. Oprea, B. Biggio, C. Liu, C. Nita-Rotaru, and B. Li. Manipulating machine learning: Poisoning attacks and countermeasures for regression learning. In 39th IEEE Symp. on Security and Privacy, 2018.
  117. 117. Pentesting of Machine Learning Battista Biggio https://www.pluribus-one.it/research/sec-ml/wild-patterns/ Pattern Recognition and Applications Lab University of Cagliari, Italy ITASEC 2019, Italian Conference on Cybersecurity, February 14, Pisa, Italy
  118. 118. http://pralab.diee.unica.it secml: an open source Python library for ML Security • adv: attacks (evasion, poisoning, ...) with custom/faster solvers; defenses (advx rejection, adversarial training, ...) • ml: ML algorithms via sklearn; DL algorithms and optimizers via PyTorch and Tensorflow • expl: explanation methods based on influential features and on influential prototypes • others: parallel computation, support for dense/sparse data, advanced plotting functions (via matplotlib), modular and easy to extend • First release scheduled for April 2019! Marco Melis, Ambra Demontis
  119. 119. http://pralab.diee.unica.it ML Security Evaluation [Diagram: data, the machine-learning model, and an attack model with its parameters are fed to secml’s Security Evaluation (attack simulation), which outputs a Security Risk Level (numeric value)] e.g., evasion attacks: modify samples of class +1 with an l1 perturbation, for eps in {0, 5, 10, 50}; poisoning attacks: inject samples of classes +1/-1 into the training set, with the fraction of poisoning points in {0, 2, 5, 10, 20}%
  120. 120. http://pralab.diee.unica.it Security Evaluation Curves • Security evaluation curves – accuracy vs increasing perturbation • Security value: – mean accuracy under attack • Security level: – Low / Med / High 120
  121. 121. http://pralab.diee.unica.it Security Evaluation Curves 121
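Such a curve can be sketched for the simplest case, a linear classifier under a worst-case l2-bounded evasion, where accuracy under attack has a closed form: a correctly classified point survives as long as its margin y(w·x + b)/||w|| exceeds the budget eps. This is only an illustration on synthetic data, not the secml API; the budgets in the list are arbitrary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=400, n_features=20, random_state=0)
y = 2 * y - 1
clf = LinearSVC(max_iter=5000).fit(X[:200], y[:200])
w, b = clf.coef_[0], clf.intercept_[0]
X_ts, y_ts = X[200:], y[200:]

# signed l2 distances of test points to the decision hyperplane
margins = y_ts * (X_ts @ w + b) / np.linalg.norm(w)
budgets = (0.0, 0.1, 0.25, 0.5, 1.0)
curve = [np.mean(margins > eps) for eps in budgets]       # accuracy under attack per budget
for eps, acc in zip(budgets, curve):
    print(f"eps={eps:4.2f}  accuracy under attack={acc:.2f}")
print("security value (mean accuracy under attack):", np.mean(curve))
```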
  122. 122. 122 Interactive Demos Adversarial Examples: https://sec-ml.pluribus-one.it/demo/ Security Evaluation: https://advx-secml.pluribus-one.it
  123. 123. http://pralab.diee.unica.it Thanks for Listening! Any questions? 123 Engineering isn't about perfect solutions; it's about doing the best you can with limited resources (Randy Pausch, 1960-2008)
