Authors: Ambra Demontis, Paolo Russu, Battista Biggio, Giorgio Fumera and Fabio Roli. Presentation given at S+SSPR 2016

- 1. Pattern Recognition and Applications Lab, Department of Electrical and Electronic Engineering, University of Cagliari, Italy. On Security and Sparsity of Linear Classifiers for Adversarial Settings. Ambra Demontis, Paolo Russu, Battista Biggio, Giorgio Fumera, Fabio Roli (battista.biggio@diee.unica.it). S+SSPR, Merida, Mexico, Dec. 1, 2016
- 2. http://pralab.diee.unica.it Recent Applications of Machine Learning • Consumer technologies for personal applications 2
- 3. http://pralab.diee.unica.it iPhone 5s with Fingerprint Recognition… 3
- 4. http://pralab.diee.unica.it … Cracked a Few Days After Its Release 4 EU FP7 Project: TABULA RASA
- 5. http://pralab.diee.unica.it New Challenges for Machine Learning • The use of machine learning opens up new big possibilities but also new security risks • Proliferation and sophistication of attacks and cyberthreats – Skilled / economically-motivated attackers (e.g., ransomware) • Several security systems use machine learning to detect attacks – but … is machine learning secure enough? 5
- 7. http://pralab.diee.unica.it Is Machine Learning Secure Enough? • Problem: how to evade a linear (trained) classifier f(x) = sign(w^T x)? • Example: the spam email "Start 2007 with a bang! Make WBFS YOUR PORTFOLIO's first winner of the year ..." has binary features x = (start: 1, bang: 1, portfolio: 1, winner: 1, year: 1, ..., university: 0, campus: 0); with weights w = (+2, +1, +1, +1, +1, ..., −3, −4) the score is +6 > 0: SPAM (correctly classified) • The manipulated copy "St4rt 2007 with a b4ng! Make WBFS YOUR PORTFOLIO's first winner of the year ... campus" obfuscates "start" and "bang" and adds "campus": x' = (0, 0, 1, 1, 1, ..., 0, 1), score +3 − 4 = −1 < 0: HAM (misclassified email) 7
- 8. http://pralab.diee.unica.it Evasion of Linear Classifiers • Formalized as an optimization problem: min_{x'} w^T x'  s.t.  d(x, x') ≤ d_max – Goal: to minimize the discriminant function • i.e., to be classified as legitimate with the maximum confidence – Constraints on input data manipulation • e.g., the number of words to be modified in each spam email 8
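For binary bag-of-words features, this constrained minimization can be approximated with a simple greedy attack: flip the features whose weights contribute most to the spam score, up to the modification budget. A minimal sketch (the weights and features mirror the spam example on slide 7; the greedy solver itself is an illustrative assumption, not the paper's exact attack):

```python
import numpy as np

def evade_linear(x, w, d_max):
    """Greedy sparse evasion of f(x) = sign(w.x) with binary features.

    Flips at most d_max features, choosing those whose flip decreases
    the score w.x most: remove present words with large positive weights
    (1 -> 0), add absent words with large negative weights (0 -> 1).
    """
    x = x.astype(float).copy()
    gains = np.where(x > 0, w, -w)   # score drop obtained by flipping feature i
    order = np.argsort(-gains)       # largest drop first
    for i in order[:int(d_max)]:
        if gains[i] <= 0:
            break                    # no further improvement possible
        x[i] = 1.0 - x[i]
    return x

# weights and features from the slide-7 example
w = np.array([2., 1., 1., 1., 1., -3., -4.])  # start, bang, ..., university, campus
x = np.array([1., 1., 1., 1., 1.,  0.,  0.])  # original spam email
print(w @ x)                                  # → 6.0 (classified as spam)
x_adv = evade_linear(x, w, d_max=2)
print(w @ x_adv)                              # → -1.0 (misclassified as ham)
```

With a budget of two changes the attack adds the two most negatively weighted words ("university", "campus"), which is enough to cross the decision boundary.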
- 9. http://pralab.diee.unica.it Dense and Sparse Evasion Attacks • l2-norm noise corresponds to dense evasion attacks: min_{x'} w^T x'  s.t.  ||x − x'||_2^2 ≤ d_max – All features are modified by a small amount • l1-norm noise corresponds to sparse evasion attacks: min_{x'} w^T x'  s.t.  ||x − x'||_1 ≤ d_max – Few features are significantly modified 9
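In the l2-constrained (dense) case the minimizer has a closed form: move the sample in the direction opposite to the weight vector until the distance budget is exhausted. A minimal sketch, assuming an unconstrained feature space (no box constraints on x'):

```python
import numpy as np

def dense_evasion(x, w, d_max):
    """l2-constrained (dense) evasion of f(x) = w.x.

    x' = x - d_max * w / ||w||_2 minimizes w.x' over the l2 ball of
    radius d_max around x, lowering the score by exactly d_max * ||w||_2.
    """
    return x - d_max * w / np.linalg.norm(w)

w = np.array([3., -4.])             # toy weights, ||w||_2 = 5
x = np.array([2., 1.])              # w.x = 2: classified positive
x_adv = dense_evasion(x, w, d_max=1.0)
print(w @ x_adv)                    # → -3.0: 2 - 1*5, crosses the boundary
```

Every feature moves a little (here both coordinates change), in contrast with the sparse attack, which concentrates the change on few features.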
- 10. http://pralab.diee.unica.it Examples on Handwritten Digits (9 vs 8) 10 [Figure: original vs. manipulated digit images. Left pair: sparse evasion attack (l1-norm constrained) on SVM, g(x) = −0.216. Right pair: dense evasion attack (l2-norm constrained) on cSVM, g(x) = 0.242]
- 12. http://pralab.diee.unica.it Robustness and Regularization [Xu et al., JMLR 2009] • SVM learning is equivalent to a robust optimization problem: min_{w,b} (1/2) w^T w + C Σ_i max(0, 1 − y_i f(x_i))  (the first term controls 1/margin, the second is the classification error on training data, i.e., the hinge loss)  ⇔  min_{w,b} max_{u_i ∈ U} Σ_i max(0, 1 − y_i f(x_i + u_i))  (bounded perturbations u_i!) 12
- 13. http://pralab.diee.unica.it Generalizing to Other Norms • The optimal regularizer is the dual norm of the noise uncertainty set – l2-norm regularization is optimal against l2-norm noise: min_{w,b} (1/2) w^T w + C Σ_i max(0, 1 − y_i f(x_i)) – Infinity-norm regularization is optimal against l1-norm noise: min_{w,b} ||w||_∞ + C Σ_i max(0, 1 − y_i f(x_i)), with ||w||_∞ = max_{i=1,...,d} |w_i| 13
- 14. http://pralab.diee.unica.it Interesting Fact • The infinity-norm SVM is more secure against l1 attacks, as it bounds the maximum absolute value of the feature weights • This explains the heuristic intuition of using more uniform feature weights in previous work [Kolcz and Teo, 2009; Biggio et al., 2010] 14 [Figure: weight distributions]
- 15. http://pralab.diee.unica.it Security and Sparsity of Linear Classifiers 15
- 16. http://pralab.diee.unica.it Security vs Sparsity • Problem: the SVM and the infinity-norm SVM provide dense solutions! • Trade-off between security (to l2 or l1 attacks) and sparsity – Sparsity reduces computational complexity at test time! 16 [Figure: weight distributions]
- 17. http://pralab.diee.unica.it Elastic-Net Regularization [H. Zou & T. Hastie, 2005] • Originally proposed for feature selection – to group correlated features together • Trade-off between sparsity and security against l2-norm attacks: Ω_e-net(w) = (1 − λ) ||w||_1 + (λ/2) ||w||_2^2  (an interpolation between the l1 and l2 penalties) 17
- 18. http://pralab.diee.unica.it Octagonal Regularization • Trade-off between sparsity and security against l1-norm attacks: Ω_oct(w) = (1 − ρ) ||w||_1 + ρ ||w||_∞  (an interpolation between the l1 and infinity (max) penalties) 18
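Both regularizers are convex combinations of an l1 term with a term matched to the attack norm. A minimal sketch of the two penalty functions (the helper names are illustrative, not from the paper's code):

```python
import numpy as np

def elastic_net(w, lam):
    """(1 - lam) * ||w||_1 + (lam / 2) * ||w||_2^2.
    Trades sparsity (l1) against security to l2 (dense) attacks."""
    return (1 - lam) * np.abs(w).sum() + 0.5 * lam * (w ** 2).sum()

def octagonal(w, rho):
    """(1 - rho) * ||w||_1 + rho * ||w||_inf.
    Trades sparsity (l1) against security to l1 (sparse) attacks;
    its unit ball in 2D is an octagon, hence the name."""
    return (1 - rho) * np.abs(w).sum() + rho * np.abs(w).max()

w = np.array([1., -2., 0.])
print(elastic_net(w, 0.5))   # → 2.75  (0.5*3 + 0.25*5)
print(octagonal(w, 0.5))     # → 2.5   (0.5*3 + 0.5*2)
```

At λ = 0 (or ρ = 0) both reduce to the pure l1 penalty (maximal sparsity); at λ = 1 (or ρ = 1) they become the pure l2 and infinity-norm penalties (maximal security against dense and sparse attacks, respectively).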
- 20. http://pralab.diee.unica.it Linear Classifiers, f(x) = w^T x + b • SVM (quadratic prog.): min_{w,b} (1/2) ||w||_2^2 + C Σ_{i=1}^n max(0, 1 − y_i f(x_i)) • Infinity-norm SVM (linear prog.): min_{w,b} ||w||_∞ + C Σ_{i=1}^n max(0, 1 − y_i f(x_i)) • 1-norm SVM (linear prog.): min_{w,b} ||w||_1 + C Σ_{i=1}^n max(0, 1 − y_i f(x_i)) • Elastic-net SVM (quadratic prog.): min_{w,b} (1 − λ) ||w||_1 + (λ/2) ||w||_2^2 + C Σ_{i=1}^n max(0, 1 − y_i f(x_i)) • Octagonal SVM (linear prog.): min_{w,b} (1 − ρ) ||w||_1 + ρ ||w||_∞ + C Σ_{i=1}^n max(0, 1 − y_i f(x_i)) 20
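The infinity-norm SVM is indeed a linear program: introduce an auxiliary variable t = ||w||_∞ and slack variables ξ_i for the hinge loss. A sketch using scipy.optimize.linprog (an illustrative formulation, not the authors' implementation):

```python
import numpy as np
from scipy.optimize import linprog

def inf_norm_svm(X, y, C=1.0):
    """Infinity-norm SVM as a linear program over z = [w, b, t, xi].

    Minimize  t + C * sum(xi)
    s.t.      y_i (w.x_i + b) >= 1 - xi_i   (hinge-loss constraints)
              -t <= w_j <= t                (t = ||w||_inf at the optimum)
              xi >= 0, t >= 0.
    """
    n, d = X.shape
    c = np.concatenate([np.zeros(d + 1), [1.0], C * np.ones(n)])
    # margin constraints: -y_i (w.x_i + b) - xi_i <= -1
    A1 = np.hstack([-y[:, None] * X, -y[:, None],
                    np.zeros((n, 1)), -np.eye(n)])
    # |w_j| <= t:  w_j - t <= 0  and  -w_j - t <= 0
    A2 = np.hstack([np.eye(d), np.zeros((d, 1)),
                    -np.ones((d, 1)), np.zeros((d, n))])
    A3 = np.hstack([-np.eye(d), np.zeros((d, 1)),
                    -np.ones((d, 1)), np.zeros((d, n))])
    A_ub = np.vstack([A1, A2, A3])
    b_ub = np.concatenate([-np.ones(n), np.zeros(2 * d)])
    bounds = [(None, None)] * (d + 1) + [(0, None)] * (1 + n)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:d], res.x[d]

# toy linearly separable data
X = np.array([[2., 1.], [1., 2.], [-2., -1.], [-1., -2.]])
y = np.array([1., 1., -1., -1.])
w, b = inf_norm_svm(X, y, C=10.0)
print(np.sign(X @ w + b))
```

The 1-norm and octagonal SVMs admit analogous LP formulations by also splitting w into positive and negative parts to linearize ||w||_1.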
- 21. http://pralab.diee.unica.it Security and Sparsity Measures • Sparsity: fraction of weights equal to zero, S = (1/d) |{w_k : w_k = 0, k = 1, ..., d}| • Security (weight evenness): E = (1/d) ||w||_1 / ||w||_∞ ∈ [1/d, 1] – E = 1/d if only one weight is different from zero – E = 1 if all weights are equal in absolute value • Parameter selection with 5-fold cross-validation optimizing: AUC + 0.1 S + 0.1 E 21
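Both measures are direct functions of the learned weight vector; a minimal sketch:

```python
import numpy as np

def sparsity(w):
    """S = fraction of weights exactly equal to zero."""
    return float(np.mean(w == 0))

def evenness(w):
    """E = (1/d) * ||w||_1 / ||w||_inf, in [1/d, 1]:
    1/d when a single weight is nonzero, 1 when all |w_k| are equal."""
    d = w.size
    return float(np.abs(w).sum() / (d * np.abs(w).max()))

w = np.array([0., 0., 3., -3.])
print(sparsity(w))   # → 0.5  (two of four weights are zero)
print(evenness(w))   # → 0.5  (6 / (4 * 3))
```

Higher E means more evenly spread weights, so a sparse evasion attack must modify more features to change the score by the same amount.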
- 22. http://pralab.diee.unica.it Results on Spam Filtering (Sparse Evasion Attack) • 5000 samples from TREC 07 (spam/ham emails) • 200 features (words) selected to maximize information gain • Results averaged over 5 repetitions, using 500 TR/TS samples • (S, E) measures reported in the legend (in %) 22 [Figure: AUC10% vs. d_max, the maximum number of words modified in each spam; legend: SVM (0, 37), ∞-norm (4, 96), 1-norm (86, 4), el-net (67, 6), 8gon (12, 88)]
- 23. http://pralab.diee.unica.it Results on PDF Malware Detection (Sparse Evasion Attack) • PDF: hierarchy of interconnected objects (keyword/value pairs), e.g.: 13 0 obj << /Kids [ 1 0 R 11 0 R ] /Type /Page ... >> endobj, 17 0 obj << /Type /Encoding ... >> endobj • Features: keyword counts (e.g., /Type 2, /Page 1, /Encoding 1, ...) • 11,500 samples; 5 repetitions with 500 TR/TS samples; 114 features (keywords) selected with information gain 23 [Figure: AUC10% vs. d_max, the maximum number of keywords added to each malicious PDF file; legend: SVM (0, 47), ∞-norm (0, 100), 1-norm (91, 2), el-net (55, 13), 8gon (69, 29)]
- 24. http://pralab.diee.unica.it Conclusions and Future Work • We have shed light on the theoretical and practical implications of sparsity and security in linear classifiers • We have defined a novel regularizer to tune the trade-off between sparsity and security against sparse evasion attacks • Future work – To investigate a similar trade-off for • poisoning (training) attacks • nonlinear classifiers 24
- 25. http://pralab.diee.unica.it Any questions? Thanks for your attention! 26
- 26. http://pralab.diee.unica.it Limited-Knowledge (LK) Attacks 26 [Diagram: the attacker draws surrogate training data from the data distribution p(X, Y), sends queries to the target classifier f(x), gets the predicted labels, and learns a surrogate classifier f'(x)]
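The limited-knowledge pipeline on this backup slide (query the target, collect labels, fit a surrogate) can be sketched with a simple perceptron as the surrogate learner (an illustrative choice; the surrogate model, data, and training loop are assumptions, not the paper's setup):

```python
import numpy as np

# hidden target classifier: the attacker can only query it for labels
w_true = np.array([1.0, -1.0])
def query(x):
    return 1.0 if w_true @ x > 0 else -1.0

def learn_surrogate(query, X_sur, lr=0.1, epochs=100):
    """Label surrogate data by querying the target, then fit a linear
    surrogate f'(x) = sign(w.x + b) with a plain perceptron."""
    y = np.array([query(x) for x in X_sur])   # get labels from the target
    w, b = np.zeros(X_sur.shape[1]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X_sur, y):
            if yi * (w @ xi + b) <= 0:        # mistake: perceptron update
                w += lr * yi * xi
                b += lr * yi
    return w, b

X_sur = np.array([[2., 0.], [0., 2.], [-2., 0.],
                  [0., -2.], [1., 3.], [3., 1.]])
w_s, b_s = learn_surrogate(query, X_sur)
# the surrogate reproduces the target's labels on the query points
print(np.sign(X_sur @ w_s + b_s))
```

The evasion attacks from the earlier slides can then be run against the surrogate weights w_s instead of the unknown target weights.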