Understanding the risk factors of learning in adversarial environments

  1. Center for Bioinformatics Tübingen
     Understanding the Risk Factors of Learning in Adversarial Environments
     Blaine Nelson (1), Battista Biggio (2) and Pavel Laskov (1)
     (1) Cognitive Systems Group, Wilhelm Schickard Institute for Computer Science, University of Tübingen, Germany
     (2) Pattern Recognition and Applications (PRA) Group, Department of Electrical and Electronic Engineering (DIEE), University of Cagliari, Italy
  2. Robust Classification for Security Applications
     • Machine learning can be used in security applications
       • E.g., spam filtering, fraud/intrusion detection
       • Benefits: adaptability, scalability, and sound inference
     • ML in security domains is susceptible to attacks
       • Classification relies on stationarity
       • The learner can be manipulated by an adversary
     • Security requires a sound theoretical foundation to provide assurances against adversaries [1,6]
  3. Background: Robust Estimation
     • Core idea of robustness: small perturbations should have a small impact on the estimator [5]
     [Figure: sample mean and median of a clean dataset]
  4. Background: Robust Estimation
     • Core idea of robustness: small perturbations should have a small impact on the estimator [5]
     [Figure: the same dataset after a perturbation, with the mean and median shown again]
  5. Background: Influence Function
     • The influence function (IF) is the response of an estimator to infinitesimal contamination at a point x [4]
     [Figure: IF of the mean (unbounded, linear in x) vs. IF of the median (bounded)]
     • The IF shows the quantitative effect of contamination
     • A bounded IF is an indication of robustness
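A minimal sketch (Python, not from the slides) of the idea behind these influence functions: place a single contaminating point at x and measure how far the sample mean and the sample median move. The data and the (n+1) scaling below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=0.0, scale=1.0, size=200)   # clean sample (assumed)

def sensitivity(estimator, data, x):
    """(n+1)-scaled change in the estimate caused by one added point at x."""
    n = len(data)
    return (n + 1) * (estimator(np.append(data, x)) - estimator(data))

for x in [1.0, 5.0, 50.0]:
    print(x, sensitivity(np.mean, data, x), sensitivity(np.median, data, x))
# The mean's sensitivity grows linearly in x (unbounded IF), while the
# median's stays bounded -- the hallmark of a robust estimator.
```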
  6. Extending to Classifiers
     • The influence-function approach extends to regression, statistical testing, and other settings
     • Classification presents challenges:
       • Classifiers are bounded functions
       • Robustness must measure the change in the decision over the input space
     • The approach via the influence function measures change in the classifier's parameters
  7. Problem with the IF Approach
     • The IF approach is intuitive but has strange implications:
       • Every classifier is robust if the input space is bounded
       • Every classifier with bounded parameters is robust
  8. Problem with the IF Approach
     • The IF approach is intuitive but has strange implications:
       • Every classifier is robust if the input space is bounded
       • Every classifier with bounded parameters is robust
     [Figure: a contamination point added to the training data]
  9. Rotational Intuition
     • For hyperplanes, robustness should capture a notion of rotational invariance under contamination
     • Is there a general principle behind this intuition?
     • Main result: we connect this intuition to empirical risk minimization; cf. [2]
  10. Empirical Risk Framework
      • Learners seek to minimize the risk (i.e., the average loss):
        $\min_{f \in \mathcal{F}} \mathbb{E}_{D \sim P}[R_P(f)]$, where $R_P(f) = \mathbb{E}_{(x,y) \sim P}[\ell(y, f(x))]$
  11. Empirical Risk Framework
      • Approximation error due to the limited hypothesis space $\mathcal{F}$:
        $\epsilon_{\mathrm{apprx}} = \mathbb{E}_P[R_P(f^\dagger) - R_P(f^*)]$
        where $f^\dagger$ is the best hypothesis within $\mathcal{F}$ and $f^*$ is the overall risk minimizer
  12. Empirical Risk Framework
      • Estimation error due to the limited dataset $D$:
        $\epsilon_{\mathrm{est}} = \mathbb{E}_P[R_P(f_N) - R_P(f^\dagger)]$
        where $f_N \in \mathcal{F}$ is learned from the $N$-sample dataset $D$
  13. Empirical Risk Framework
      • Modeling contamination gives a notion of stability:
        $\epsilon_{\mathrm{rbst}} = \mathbb{E}_P[R_P(\hat{f}_N) - R_P(f_N)]$
        where $\hat{f}_N$ is learned from the contaminated version of $D$
  14. Empirical Risk Framework
      • The expected risk decomposes into three components:
        $\mathbb{E}_P[R_P(\hat{f}_N) - R_P(f^*)] = \epsilon_{\mathrm{rbst}} + \epsilon_{\mathrm{est}} + \epsilon_{\mathrm{apprx}}$
      [Diagram: hypothesis space $\mathcal{F}$ containing $f^\dagger$, $f_N$, $\hat{f}_N$, with $f^*$ outside it and the three error components between them]
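A minimal sketch (Python; an assumed toy setup, not the paper's experiments) illustrating the three-term decomposition for a single dataset draw, using 1-D threshold classifiers: $f^*$ is the Bayes rule, $f^\dagger$ the best threshold on a coarse grid, $f_N$ the empirical risk minimizer on a clean sample, and $\hat{f}_N$ the minimizer after an assumed label-flipping contamination. The outer expectations over datasets are dropped, and all constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
BAYES_T, NOISE = 0.3, 0.1                        # assumed true boundary, label-flip noise
GRID = np.linspace(-1.0, 1.0, 9)                 # hypothesis space F: coarse thresholds

def risk(t):
    """True 0-1 risk of the classifier sign(x - t) when x ~ Uniform(-1, 1)."""
    return NOISE + (1 - 2 * NOISE) * abs(t - BAYES_T) / 2.0

def erm(x, y):
    """Empirical risk minimizer over the grid of thresholds."""
    errs = [np.mean(np.where(x >= t, 1, -1) != y) for t in GRID]
    return GRID[int(np.argmin(errs))]

x = rng.uniform(-1.0, 1.0, 50)                   # training set D (N = 50)
y = np.where(rng.random(50) < NOISE, -1, 1) * np.where(x >= BAYES_T, 1, -1)

t_N = erm(x, y)                                  # f_N: learned from the clean D
y_bad = y.copy(); y_bad[:5] = -y_bad[:5]         # assumed contamination: 5 flipped labels
t_hat = erm(x, y_bad)                            # f-hat_N: learned from the contaminated D

R_star = risk(BAYES_T)                           # f*: Bayes classifier (not on the grid)
R_dag = min(risk(t) for t in GRID)               # f-dagger: best classifier within F
rbst, est, apprx = risk(t_hat) - risk(t_N), risk(t_N) - R_dag, R_dag - R_star
print(rbst, est, apprx, "sum =", rbst + est + apprx, "total =", risk(t_hat) - R_star)
```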
  15. Bounding the Robustness Component
      • For classifiers, we consider the 0-1 loss
      • Classifiers are of the form $f(x) = 2\,\mathbb{I}[g(x) \ge 0] - 1$
        • i.e., a threshold on a decision function $g$
      • Algorithmic stability can thus be bounded:
        $\epsilon_{\mathrm{rbst}} \le \mathbb{E}_P\big[\Pr_{x \sim P}[\,g_N(x)\,\hat{g}_N(x) < 0\,]\big]$
      • Robustness is related to the disagreement between the decision functions learned from clean and from contaminated data (see the sketch below)
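A minimal sketch checking this disagreement bound numerically; the data distribution, labeling rule, and the two weight vectors standing in for the clean and contaminated fits are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
x = rng.normal(size=(n, 2))                         # x ~ P (assumed 2-D Gaussian)
y = np.where(x @ np.array([1.0, 1.0]) >= 0, 1, -1)  # labels from an assumed fixed rule

w_clean = np.array([1.0, 0.8])                      # stand-in for g_N (clean fit)
w_dirty = np.array([0.6, 1.3])                      # stand-in for g-hat_N (contaminated fit)

def zero_one_risk(w):
    """0-1 risk of the linear classifier sign(w^T x), estimated on the sample."""
    return np.mean(np.where(x @ w >= 0, 1, -1) != y)

risk_gap = zero_one_risk(w_dirty) - zero_one_risk(w_clean)
disagreement = np.mean((x @ w_clean) * (x @ w_dirty) < 0)
print(risk_gap, "<=", disagreement)                 # the disagreement bound holds
```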
  16. Distributional Independence
      • Problem: the bound depends on the distribution of X…
      [Figure: the support of X shown against the two decision boundaries]
        $\epsilon_{\mathrm{rbst}} \le \mathbb{E}_P\big[\Pr_{x \sim P}[\,g_N(x)\,\hat{g}_N(x) < 0\,]\big] \approx 0$
  17. Distributional Independence
      • Problem: the bound depends on the distribution of X…
        $\epsilon_{\mathrm{rbst}} \le \mathbb{E}_P\big[\Pr_{x \sim P}[\,g_N(x)\,\hat{g}_N(x) < 0\,]\big] \approx 1$
  18. Distributional Independence
      • Problem: the bound depends on the distribution of X…
      • Solution: measure against the uniform distribution, as a measure of change over the whole space:
        $\mathbb{E}_P\big[\Pr_{x \sim U}[\,g_N(x)\,\hat{g}_N(x) < 0\,]\big]$
  19. Case of Hyperplanes
      • For a hyperplane $w$ (through the origin), the uniform measure yields the expected angular change:
        $\epsilon^{U}_{\mathrm{rbst}} = \frac{1}{\pi}\,\mathbb{E}_P\!\left[\cos^{-1}\!\left(\frac{w_N^{\top}\hat{w}_N}{\lVert w_N\rVert\,\lVert\hat{w}_N\rVert}\right)\right]$
      • Result from Dasgupta et al. [3]
      • The expectation is over datasets and their resulting transformation by the adversary
      • The robustness component is bounded between 0 (no change) and 1 (complete rotation)
      • This measure gives an intuitive way to compare (linear) learning algorithms (see the sketch below)
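A minimal sketch of the hyperplane case: for hyperplanes through the origin and a rotation-invariant reference measure (here, uniform on the unit disk), the disagreement probability equals the angle between the normals divided by $\pi$, i.e., the quantity inside the expectation above. The two weight vectors are illustrative stand-ins for $w_N$ and $\hat{w}_N$.

```python
import numpy as np

rng = np.random.default_rng(3)
w_clean = np.array([1.0, 0.8])                      # stand-in for w_N
w_dirty = np.array([0.6, 1.3])                      # stand-in for w-hat_N

# Closed form: (1/pi) * arccos of the cosine similarity of the two normals.
cosine = w_clean @ w_dirty / (np.linalg.norm(w_clean) * np.linalg.norm(w_dirty))
angular = np.arccos(cosine) / np.pi

# Monte Carlo check under a rotation-invariant measure: uniform on the unit disk.
pts = rng.normal(size=(500_000, 2))
pts = pts / np.linalg.norm(pts, axis=1, keepdims=True)   # uniform directions
pts = pts * rng.uniform(size=(len(pts), 1)) ** 0.5       # radius ~ sqrt(U): uniform on the disk
mc = np.mean((pts @ w_clean) * (pts @ w_dirty) < 0)

print(angular, mc)   # the two estimates agree up to Monte Carlo error
```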
  20. Discussion
      • Incorporation of rotational stability is needed for robust classification
      • Feasibility of estimating $\epsilon_{\mathrm{rbst}}$ under realistic contamination
      • Development of algorithms based on trade-offs between $\epsilon_{\mathrm{rbst}}$ and the other error terms
  21. References
      1) M. Barreno, B. Nelson, A. D. Joseph, and J. D. Tygar. The Security of Machine Learning. MLJ, 81(2):121-148, 2010.
      2) L. Bottou and O. Bousquet. The Tradeoffs of Large Scale Learning. In NIPS, volume 20, pages 161-168, 2008.
      3) S. Dasgupta, A. T. Kalai, and C. Monteleoni. Analysis of Perceptron-based Active Learning. JMLR, 10:281-299, 2009.
      4) F. R. Hampel, E. M. Ronchetti, P. J. Rousseeuw, and W. A. Stahel. Robust Statistics: The Approach Based on Influence Functions. John Wiley and Sons, 1986.
      5) P. Huber. Robust Statistics. John Wiley and Sons, 1981.
      6) P. Laskov and M. Kloft. A Framework for Quantitative Security Analysis of Machine Learning. In AISec Workshop, pages 1-4, 2009.
  23. Bound Derivations
  24. Bound based on the triangle inequality $\ell(x, y) \le \ell(x, z) + \ell(z, y)$:
      $\epsilon_{\mathrm{rbst}} = \mathbb{E}_P[R_P(\hat{f}_N) - R_P(f_N)]$
      $\qquad = \mathbb{E}_P\big[\mathbb{E}_{(x,y)\sim P}\,\ell(\hat{f}_N(x), y) - \mathbb{E}_{(x,y)\sim P}\,\ell(f_N(x), y)\big]$
      $\qquad = \mathbb{E}_P\big[\mathbb{E}_{(x,y)\sim P}[\,\ell(\hat{f}_N(x), y) - \ell(f_N(x), y)\,]\big]$
      $\qquad \le \mathbb{E}_P\big[\mathbb{E}_{(x,y)\sim P}\,\ell(\hat{f}_N(x), f_N(x))\big]$
      Bound using the alternative form of the 0-1 loss, $\ell_{0\text{-}1}(x, y) = \tfrac{1}{2}(1 - x\,y)$:
      $\epsilon_{\mathrm{rbst}} \le \mathbb{E}_P\big[\mathbb{E}_{x\sim P}\,\ell(\hat{f}_N(x), f_N(x))\big] = \tfrac{1}{2}\big(1 - \mathbb{E}_P\big[\mathbb{E}_{x\sim P}[\,\hat{f}_N(x)\,f_N(x)\,]\big]\big)$
  25. Classifiers are of the form $f(x) = 2\,\mathbb{I}[g(x) \ge 0] - 1$. The product of a pair of such classifiers, given by $g_1$ and $g_2$:
      $f_1(x)\,f_2(x) = 4\,\mathbb{I}[g_1(x) \ge 0]\,\mathbb{I}[g_2(x) \ge 0] - 2\big(\mathbb{I}[g_1(x) \ge 0] + \mathbb{I}[g_2(x) \ge 0]\big) + 1$
      $\qquad = 4\,\mathbb{I}[g_1(x) \ge 0 \text{ and } g_2(x) \ge 0] - 2\big(\mathbb{I}[g_1(x) \ge 0] + \mathbb{I}[g_2(x) \ge 0]\big) + 1$
      The bracketed sum is 0 if $g_1(x) < 0$ and $g_2(x) < 0$, 1 if $g_1(x) \ge 0$ xor $g_2(x) \ge 0$, and 2 if $g_1(x) \ge 0$ and $g_2(x) \ge 0$, so
      $f_1(x)\,f_2(x) = -2\,\mathbb{I}[g_1(x) \ge 0 \text{ xor } g_2(x) \ge 0] + 1 = 2\,\mathbb{I}[\lnot(g_1(x) \ge 0 \text{ xor } g_2(x) \ge 0)] - 1 = 2\,\mathbb{I}[g_1(x)\,g_2(x) \ge 0] - 1$
      The bound then becomes:
      $\epsilon_{\mathrm{rbst}} \le \tfrac{1}{2} - \tfrac{1}{2}\,\mathbb{E}_P\big[\mathbb{E}_{x\sim P}[\hat{f}_N(x)\,f_N(x)]\big] = \tfrac{1}{2} - \tfrac{1}{2}\,\mathbb{E}_P\big[\mathbb{E}_{x\sim P}\big(2\,\mathbb{I}[g_N(x)\,\hat{g}_N(x) \ge 0] - 1\big)\big]$
      $\qquad = 1 - \mathbb{E}_P\big[\Pr_{x\sim P}[g_N(x)\,\hat{g}_N(x) \ge 0]\big] = \mathbb{E}_P\big[\Pr_{x\sim P}[g_N(x)\,\hat{g}_N(x) < 0]\big]$
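A quick numerical check (Python, random illustrative inputs) of the product identity used above; it holds whenever neither decision function sits exactly on its boundary.

```python
import numpy as np

rng = np.random.default_rng(4)
g1, g2 = rng.normal(size=100_000), rng.normal(size=100_000)  # random decision values

f1 = 2 * (g1 >= 0) - 1                    # f_1(x) = 2*I[g_1(x) >= 0] - 1
f2 = 2 * (g2 >= 0) - 1
rhs = 2 * (g1 * g2 >= 0) - 1              # 2*I[g_1(x)*g_2(x) >= 0] - 1
assert np.array_equal(f1 * f2, rhs)       # the identity holds off the boundary
print("identity verified on", len(g1), "random points")
```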
  26. Models of Adversary Capabilities
      • Outlier Injection
        • The adversary arbitrarily alters some of the data (of fixed size)
      • Data Perturbation
        • The adversary manipulates all of the data (to a limited degree)
      • Label Flipping
        • The adversary only changes data labels (a fixed number of them)
      • Feature-Constrained Changes
        • The adversary only alters a fixed set of features
      (A toy sketch of these four models follows.)
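A toy sketch (assumed interfaces, not from the paper) of the four capability models as transformations of a dataset (X, y); the parameter names k, eps, and features are hypothetical, and the exact capability definitions are one possible reading of the slide.

```python
import numpy as np

rng = np.random.default_rng(5)

def outlier_injection(X, y, k, x_new, y_new):
    """Adversary injects k arbitrary points (fixed contamination size)."""
    return np.vstack([X, np.tile(x_new, (k, 1))]), np.append(y, [y_new] * k)

def data_perturbation(X, y, eps):
    """Adversary moves every point, but by at most eps per coordinate."""
    return X + rng.uniform(-eps, eps, size=X.shape), y

def label_flipping(X, y, k):
    """Adversary flips the labels of k points; the features are untouched."""
    idx = rng.choice(len(y), size=k, replace=False)
    y = y.copy(); y[idx] = -y[idx]
    return X, y

def feature_constrained(X, y, features, values):
    """Adversary may only overwrite a fixed set of feature columns."""
    X = X.copy(); X[:, features] = values
    return X, y
```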
  29. Classical Risk Minimization
      • Learners seek to minimize the risk (i.e., the average loss):
        $\min_{f \in \mathcal{F}} \mathbb{E}_{D \sim P}[R_P(f)]$, where $R_P(f_N) = \mathbb{E}_{(x,y) \sim P}[\ell(y, f_N(x))]$
      • The risk is classically decomposed into two components:
        $\mathbb{E}_P[R_P(f_N) - R_P(f^*)] = \epsilon_{\mathrm{est}} + \epsilon_{\mathrm{apprx}}$
        • $\epsilon_{\mathrm{est}} = \mathbb{E}_P[R_P(f_N) - R_P(f^\dagger)]$ is the estimation error due to finite data
        • $\epsilon_{\mathrm{apprx}} = \mathbb{E}_P[R_P(f^\dagger) - R_P(f^*)]$ is the approximation error from the hypothesis space
