
Secure Kernel Machines against Evasion Attacks


Authors: Paolo Russu, Ambra Demontis, Battista Biggio, Giorgio Fumera, and Fabio Roli (University of Cagliari, Italy).

Talk by Battista Biggio at AISec '16, co-located with CCS '16 in Vienna, Oct. 28 2016.



  1. Secure Kernel Machines against Evasion Attacks. Paolo Russu, Ambra Demontis, Battista Biggio, Giorgio Fumera, Fabio Roli (battista.biggio@diee.unica.it). Pattern Recognition and Applications Lab, Dept. of Electrical and Electronic Engineering, University of Cagliari, Italy. AISec 2016, Vienna, Austria, Oct. 28th, 2016.
  2. Recent Applications of Machine Learning: consumer technologies for personal applications.
  3. iPhone 5s with Fingerprint Recognition…
  4. … Cracked a Few Days After Its Release (EU FP7 project: TABULA RASA).
  5. New Challenges for Machine Learning. The use of machine learning opens up big new possibilities, but also new security risks: attacks and cyberthreats are proliferating and growing in sophistication, driven by skilled, economically-motivated attackers (e.g., ransomware). Several security systems use machine learning to detect attacks, but… is machine learning itself secure enough?
  6. Is Machine Learning Secure Enough? Problem: how to evade a linear (trained) classifier f(x) = sign(w^T x)? Consider a spam filter with per-word weights w: start +2, bang +1, portfolio +1, winner +1, year +1, ..., university -3, campus -4. The original email "Start 2007 with a bang! Make WBFS YOUR PORTFOLIO's first winner of the year ..." activates the spammy words (x = 1 for start, bang, portfolio, winner, year), scoring +6 > 0: SPAM, correctly classified. The manipulated email "St4rt 2007 with a b4ng! Make WBFS YOUR PORTFOLIO's first winner of the year ... campus" obfuscates the two highest-weight words and adds a good word, scoring +3 - 4 = -1 < 0: HAM, a misclassified email. But… what if the classifier is non-linear?
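The slide's arithmetic is easy to verify in a few lines. A minimal sketch, using only the weights shown on the slide (the truncated feature dimensions are omitted):

```python
import numpy as np

# Per-word weights from the slide (truncated dimensions omitted).
tokens = ["start", "bang", "portfolio", "winner", "year", "university", "campus"]
w = np.array([+2.0, +1.0, +1.0, +1.0, +1.0, -3.0, -4.0])

x = np.array([1, 1, 1, 1, 1, 0, 0])        # original spam email
print(np.sign(w @ x), w @ x)               # +1 (score +6): SPAM

# Obfuscate the two highest-weight words ("St4rt", "b4ng") and add "campus":
x_adv = np.array([0, 0, 1, 1, 1, 0, 1])
print(np.sign(w @ x_adv), w @ x_adv)       # -1 (score -1): misclassified as HAM
```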
  7. Gradient-based Evasion [Biggio et al., ECML 2013]. Goal: maximum-confidence evasion of a classifier f(x) = sign(g(x)), where +1 is malicious and -1 is legitimate. Attack strategy: the non-linear, constrained optimization $\min_{x'} g(x')$ s.t. $d(x, x') \le d_{\max}$, where the constraint defines the feasible domain of manipulated samples x'. Gradient descent yields an approximate solution for smooth functions, and gradients of g(x) can be computed analytically in many cases (SVMs, neural networks).
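A minimal sketch of the attack loop, assuming `g` and `grad_g` are callables returning the classifier's score and its gradient, and taking d as the l2 distance (step size and iteration count are illustrative):

```python
import numpy as np

def evade(x, g, grad_g, d_max, step=0.1, n_iter=100):
    """Descend on g(x') while keeping x' within the feasible l2 ball."""
    x_adv = x.copy()
    best = x_adv.copy()
    for _ in range(n_iter):
        x_adv = x_adv - step * grad_g(x_adv)    # gradient-descent step
        delta = x_adv - x
        norm = np.linalg.norm(delta)
        if norm > d_max:                        # project back onto the
            x_adv = x + delta * (d_max / norm)  # constraint d(x, x') <= d_max
        if g(x_adv) < g(best):                  # keep the most-evasive point
            best = x_adv.copy()
    return best
```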
  8. Computing Descent Directions [Biggio et al., ECML 2013]. For support vector machines: $g(x) = \sum_i \alpha_i y_i k(x, x_i) + b$, so $\nabla g(x) = \sum_i \alpha_i y_i \nabla k(x, x_i)$; for the RBF kernel, $\nabla k(x, x_i) = -2\gamma \exp\{-\gamma \|x - x_i\|^2\}(x - x_i)$. For a one-hidden-layer neural network with inputs $x_1, \dots, x_d$, hidden units $\delta_1, \dots, \delta_m$ (input weights $v$) and output weights $w$: $g(x) = \left(1 + \exp\left(-\sum_{k=1}^{m} w_k \delta_k(x)\right)\right)^{-1}$, so $\frac{\partial g(x)}{\partial x_f} = g(x)\,(1 - g(x)) \sum_{k=1}^{m} w_k \delta_k(x)\,(1 - \delta_k(x))\, v_{kf}$. But… what if the classifier is non-differentiable?
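For the SVM case these formulas translate directly into code. A sketch, assuming `sv` (support vectors), `alpha_y` (the products α_i·y_i), `b`, and `gamma` come from a trained RBF-SVM (e.g., scikit-learn exposes these via `support_vectors_`, `dual_coef_`, and `intercept_`):

```python
import numpy as np

def g(x, sv, alpha_y, b, gamma):
    """SVM score: g(x) = sum_i alpha_i y_i k(x, x_i) + b, RBF kernel."""
    k = np.exp(-gamma * np.sum((sv - x) ** 2, axis=1))   # k(x, x_i) per SV
    return alpha_y @ k + b

def grad_g(x, sv, alpha_y, gamma):
    """Gradient: sum_i alpha_i y_i * (-2 gamma k(x, x_i) (x - x_i))."""
    k = np.exp(-gamma * np.sum((sv - x) ** 2, axis=1))
    return -2.0 * gamma * (alpha_y * k) @ (x - sv)
```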
  9. Evasion of Non-differentiable Classifiers: draw surrogate training data from the underlying distribution p(X, Y), send queries to the targeted classifier f(x) and get its labels, then learn a differentiable surrogate classifier f'(x) on this data and attack the surrogate in place of the target.
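A minimal sketch of this surrogate step, where `query_target` stands for the (assumed) black-box interface to the targeted classifier and an RBF-SVM is one possible choice of differentiable surrogate:

```python
from sklearn.svm import SVC

def learn_surrogate(X_surrogate, query_target):
    """Label surrogate data via black-box queries and fit a smooth f'(x)."""
    y_surrogate = query_target(X_surrogate)        # send queries, get labels
    surrogate = SVC(kernel="rbf", gamma=0.1, C=1.0)
    surrogate.fit(X_surrogate, y_surrogate)
    return surrogate                               # differentiable: attack this
```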
  10. Dense and Sparse Evasion Attacks. l2-norm noise corresponds to dense evasion attacks, $\min_{x'} g(x')$ s.t. $\|x - x'\|_2 \le d_{\max}$: all features are modified by a small amount. l1-norm noise corresponds to sparse evasion attacks, $\min_{x'} g(x')$ s.t. $\|x - x'\|_1 \le d_{\max}$: few features are significantly modified.
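In a projected-gradient implementation the two attacks differ only in the projection step. A sketch: the l2 projection rescales the whole perturbation (every feature moves a little), while the exact l1-ball projection (Duchi et al., 2008) thresholds it, concentrating the budget on few features:

```python
import numpy as np

def project_l2(x, x_adv, d_max):
    delta = x_adv - x
    n = np.linalg.norm(delta)
    return x_adv if n <= d_max else x + delta * (d_max / n)

def project_l1(x, x_adv, d_max):
    delta = x_adv - x
    if np.abs(delta).sum() <= d_max:
        return x_adv
    u = np.sort(np.abs(delta))[::-1]               # sorted magnitudes
    css = np.cumsum(u)
    j = np.arange(1, len(u) + 1)
    rho = np.nonzero(u * j > css - d_max)[0][-1]   # largest feasible index
    theta = (css[rho] - d_max) / (rho + 1.0)       # soft-threshold level
    return x + np.sign(delta) * np.maximum(np.abs(delta) - theta, 0.0)
```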
  11. Goal of This Work. Secure learning against evasion attacks exploits game-theoretical models, robust optimization, multiple classifiers, adversarial training, etc. Practical adoption of current secure-learning algorithms is hindered by several factors: strong theoretical requirements, complexity of implementation, and scalability issues (computational time and space for training). Our goal: to develop secure kernel machines that are no more computationally demanding than their non-secure counterparts.
  12. Security of Linear Classifiers
  13. Secure Linear Classifiers. Intuition in previous work on spam filtering [Kolcz and Teo, CEAS 2007; Biggio et al., IJMLC 2010]: the attacker aims to modify few features, and the features assigned the highest absolute weights are modified first; hence, heuristic methods were proposed to design secure linear classifiers with more evenly-distributed weights. We now know that the aforementioned attack is sparse (l1-norm constrained). Then, what does having more evenly-distributed weights mean from a more theoretical perspective?
  14. Robustness and Regularization [Xu et al., JMLR 2009]. SVM learning, $\min_{w,b} \frac{1}{2} w^T w + C \sum_i \max(0, 1 - y_i f(x_i))$, is equivalent to the robust optimization problem $\min_{w,b} \max_{u_i \in \mathcal{U}} \sum_i \max(0, 1 - y_i f(x_i + u_i))$: the regularizer depends on the noise on the training data! l2-norm regularization is optimal against l2-norm noise; infinity-norm regularization is optimal against l1-norm noise!
  15. Infinity-norm SVM (I-SVM): $\min_{w,b} \|w\|_\infty + C \sum_i \max(0, 1 - y_i f(x_i))$, with $\|w\|_\infty = \max_{i=1,\dots,d} |w_i|$. The infinity-norm regularizer is optimal against sparse evasion attacks. [Figure: weight distributions, showing more evenly-distributed weights for I-SVM than for the standard SVM.]
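Since the infinity norm is piecewise linear, the I-SVM objective above can be solved as a linear program. A minimal sketch, assuming the cvxpy library and labels y ∈ {−1, +1}:

```python
import cvxpy as cp

def train_isvm(X, y, C=1.0):
    """min ||w||_inf + C * sum of hinge losses, as on the slide."""
    n, d = X.shape
    w, b = cp.Variable(d), cp.Variable()
    hinge = cp.sum(cp.pos(1 - cp.multiply(y, X @ w + b)))
    cp.Problem(cp.Minimize(cp.norm(w, "inf") + C * hinge)).solve()
    return w.value, b.value
```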
  16. Cost-sensitive Learning: unbalancing the costs of classification errors accounts for different levels of noise over the training classes [Katsumata and Takeda, AISTATS '15]. Evasion attacks put a higher amount of noise on the malicious data, so errors on that class should be penalized more heavily (see the sketch below).
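A minimal sketch of cost-sensitive SVM learning with scikit-learn's class_weight; the toy data and the weight values are illustrative, not those of the paper. Errors on the malicious class (+1) are penalized ten times more than errors on the legitimate class:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)    # toy labels: +1 = malicious

# Penalize errors on the (noisier) malicious class more heavily.
clf = SVC(kernel="rbf", gamma=0.5, C=1.0, class_weight={1: 10.0, -1: 1.0})
clf.fit(X, y)
```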
  17. Experiments on MNIST Handwritten Digits: 8 vs 9, 28x28 images (784 features: grey-level pixel values); 500 training samples, 500 test samples, 5 repetitions; parameters tuned to maximize the detection rate at 1% FP. [Plots: TP at FP=1% vs d_max for SVM, cSVM, I-SVM, and cI-SVM, under the dense and the sparse attack.]
  18. Examples of Manipulated MNIST Digits. [Figure: an original digit and its manipulated versions against SVM, cSVM, I-SVM, and cI-SVM, with the attained g(x) values, under sparse (l1-norm constrained) and dense (l2-norm constrained) evasion attacks.]
  19. Experiments on Spam Filtering: 5,000 samples from TREC 07 (spam/ham emails); 200 features (words) selected to maximize information gain; parameters tuned to maximize the detection rate at 1% FP; results averaged over 5 repetitions. [Plot: TP at FP=1% vs d_max for SVM, cSVM, I-SVM, and cI-SVM under the sparse attack.]
  20. Security of Non-linear Classifiers
  21.–25. Secure Nonlinear Classifiers (Intuition). [Animation, five frames: the non-linear decision boundary is progressively tightened around the benign data, closing the blind spots that an evasion attack could reach.]
  26. Secure Kernel Machines. Key idea: better enclose the benign data (eliminate blind spots), as adversarial training and game-theoretical models do. A similar effect can be achieved by properly modifying the SVM parameters, i.e., the classification costs and the kernel parameters (see the sketch below). [Figure: decision boundaries for the standard SVM, cost-sensitive learning, and kernel modification.]
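A sketch of the kernel-modification knob with scikit-learn, under the illustrative assumption (not the paper's exact setup) that the benign class forms a compact cluster: increasing the RBF gamma shrinks the region classified as legitimate around the benign training data, so blind spots far from the data fall on the malicious side.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = np.where(np.linalg.norm(X, axis=1) < 1.0, -1, 1)   # -1 = benign cluster

standard = SVC(kernel="rbf", gamma=0.1, C=1.0).fit(X, y)  # wide benign region
secure = SVC(kernel="rbf", gamma=5.0, C=1.0).fit(X, y)    # tightly enclosed
```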
  27. Experiments on PDF Malware Detection: Lux0R [Corona et al., AISec '14], a learning-based detector of malicious PDF files. Runtime analysis extracts the JavaScript API references of a PDF (e.g., eval, isNaN, this.getURL, ...); suspicious references are selected and fed to the classifier, which labels the file benign or malicious. Adversary's capability: adding up to d_max API calls, while removing API calls may compromise the embedded malware code. The attack thus becomes $\min_{x'} g(x')$ s.t. $d(x, x') \le d_{\max}$ and $x \le x'$ (features can only increase; see the projection sketch below).
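A sketch of a projection enforcing this capability, where feature values count API calls: x ≤ x' means calls can only be added, and at most d_max of them. The rescaling step is a simple heuristic, not necessarily the paper's exact procedure:

```python
import numpy as np

def project_add_only(x, x_adv, d_max):
    x_adv = np.maximum(x_adv, x)               # removals would break the malware
    delta = x_adv - x                          # non-negative by construction
    added = delta.sum()                        # l1 budget: number of added calls
    if added > d_max:
        x_adv = x + delta * (d_max / added)    # scale back the additions
    return x_adv
```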
  28. Experiments on PDF Malware Detection. Lux0R data: benign/malicious PDF files with JavaScript; 5,000 training samples, 5,000 test samples, 5 repetitions; 100 API calls selected from the training data. [Plot: detection rate (TP) at FP=1% vs maximum number of added API calls.]
  29. Conclusions and Future Work. Classifier security can be significantly improved by properly tuning the classifier parameters: regularization terms, classification costs, and kernel parameters. Future work: a security/complexity comparison against current adversarial approaches (adversarial training, game-theoretical models), and more theoretical insights on classifier/feature vulnerability.
  30. Any questions? Thanks for your attention!
