
Battista Biggio @ AISec 2014 - Poisoning Behavioral Malware Clustering

Clustering algorithms have become a popular tool in computer security to analyze the behavior of malware variants, identify novel malware families, and generate signatures for antivirus systems.
However, the suitability of clustering algorithms for security-sensitive settings has been recently questioned by showing that they can be significantly compromised if an attacker can exercise some control over the input data.
In this paper, we revisit this problem by focusing on behavioral malware clustering approaches, and investigate whether and to what extent an attacker may be able to subvert these approaches through a careful injection of samples with poisoning behavior.
To this end, we present a case study on Malheur, an open-source tool for behavioral malware clustering. Our experiments not only demonstrate that this tool is vulnerable to poisoning attacks, but also that it can be significantly compromised even if the attacker can only inject a very small percentage of attacks into the input data. As a remedy, we discuss possible countermeasures and highlight the need for more secure clustering algorithms.


  1. Poisoning Behavioral Malware Clustering. Battista Biggio (1), Konrad Rieck (2), Davide Ariu (1), Christian Wressnegger (2), Igino Corona (1), Giorgio Giacinto (1), and Fabio Roli (1). (1) Pattern Recognition and Applications Lab, Department of Electrical and Electronic Engineering, University of Cagliari, Italy; (2) University of Goettingen (GE). AISec 2014, Scottsdale, Arizona, US, Nov. 7, 2014.
  2. Threats and Attacks in Computer Security (http://pralab.diee.unica.it)
     • Huge number of devices, services and apps on the Internet
       – Vulnerabilities in code, services, apps, etc.
     • Attacks through malicious software (malware)
       – Botnets, spam, identity theft / stolen credit card numbers
     • Manual analysis and crafting of signatures is costly
       – Need for automated / assisted detection (and rule generation)
       – Machine learning-based defenses (data clustering)
     [Figure: detection (antivirus systems, rule-based systems) versus evasion (malware families / variants). +65% new malware variants from 2012 to 2013 (Mobile Adware & Malw. Analysis, Symantec, 2014).]
  3. Data Clustering for Computer Security
     • Goal: clustering of malware families to identify common characteristics and design suitable countermeasures
       – e.g., antivirus rules / signatures
     [Figure: processing pipeline. Data collection (honeypots) yields malware samples; feature extraction (e.g., executed instructions, system calls, etc.) maps each sample to a vector x1, x2, ..., xd; clustering of malware families (e.g., similar program behavior); data analysis / countermeasure design (e.g., signature generation): for each cluster, if … then … else …]
  4. Is Data Clustering Secure?
     • Attackers can poison input data to subvert malware clustering
     [Figure: the same pipeline (data collection, feature extraction, clustering of malware families, data analysis / countermeasure design) fed with malware samples designed to subvert clustering. The clustering is significantly compromised, and the countermeasure design becomes useless (too many false alarms, low detection rate).]
     (1) B. Biggio et al. Is data clustering in adversarial settings secure? In AISec 2013.
     (2) B. Biggio et al. Poisoning complete-linkage hierarchical clustering. In S+SSPR 2014.
  5. Is Data Clustering Secure?
     • Our previous work (1,2):
       – Framework for security evaluation of clustering algorithms
       – Formalization of poisoning attacks (optimization) against single- and complete-linkage hierarchical clustering
     • In this work we focus on a realistic application example: poisoning attacks against a behavioral malware clustering approach (3), Malheur (http://www.mlsec.org/malheur/)
     (1) B. Biggio et al. Is data clustering in adversarial settings secure? In AISec 2013.
     (2) B. Biggio et al. Poisoning complete-linkage hierarchical clustering. In S+SSPR 2014.
     (3) K. Rieck et al. Automatic analysis of malware behavior using machine learning. JCS 2011.
  6. Poisoning Attacks
     • Goal: to maximally compromise the clustering output on D
     • Capability: adding m attack samples
     • Knowledge: perfect / worst-case attack
     • Attack strategy: $\max_{A} d_c(Y, Y'(A))$, with $A = \{a_i\}_{i=1}^{m}$
       – $Y = f(D)$: clustering on untainted data D
       – $Y' = f_D(D \cup A)$: clustering under attack, with attack samples A
       – $d_c$: distance between the clustering in the absence of attack and that under attack
  7. Poisoning Attacks
     • Attack strategy: $\max_{A} d_c(Y, Y'(A))$, with $A = \{a_i\}_{i=1}^{m}$
     • Distance between clusterings: $d_c(Y, Y') = \| Y Y^T - Y' Y'^T \|_F$
     • For a given clustering (rows are Sample 1 … Sample 5):
       $$Y = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}, \qquad Y Y^T = \begin{bmatrix} 1 & 0 & 0 & 1 & 0 \\ 0 & 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 & 0 \\ 1 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}$$
     • This distance counts how many pairs of samples have been clustered together in one clustering and not in the other, and vice versa
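The distance on this slide can be checked numerically. The following is a minimal sketch in plain Python, assuming clusterings are given as one label per sample; the function names and the attacked clustering `y_attacked` are hypothetical and only serve the illustration. Because every entry of $YY^T - Y'Y'^T$ is 0 or ±1, the squared Frobenius norm equals the number of ordered sample pairs on which the two clusterings disagree.

```python
import math


def co_membership(labels):
    """Return YY^T as nested lists: entry (i, j) is 1 if samples i and j share a cluster."""
    return [[1 if a == b else 0 for b in labels] for a in labels]


def clustering_distance(labels_a, labels_b):
    """Frobenius norm of the difference between the two co-membership matrices."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same samples")
    m_a = co_membership(labels_a)
    m_b = co_membership(labels_b)
    squared = sum(
        (x - y) ** 2
        for row_a, row_b in zip(m_a, m_b)
        for x, y in zip(row_a, row_b)
    )
    return math.sqrt(squared)


# Clustering from the slide: samples 1 and 4 together, samples 2 and 3 together, sample 5 alone.
y = [0, 2, 2, 0, 1]
# Hypothetical clustering under attack: the first two clusters have been fused.
y_attacked = [0, 0, 0, 0, 1]

print(co_membership(y))
print(clustering_distance(y, y_attacked))
```

Working with label lists rather than the indicator matrix Y keeps the sketch short; the distance does not depend on how clusters are numbered, only on which samples are grouped together.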
  8. Single-Linkage Hierarchical Clustering
     • Bottom-up agglomerative clustering
       – each point is initially considered as a cluster
       – closest clusters are iteratively merged
     • Linkage criterion to define distance between clusters
       – single-linkage criterion: $\mathrm{dist}(C_i, C_j) = \min_{a \in C_i,\, b \in C_j} d(a, b)$
     • Clustering output is a hierarchy of clusterings
       – Criterion needed to select a given clustering (e.g., number of clusters)
       – Cutoff threshold on the maximum intra-cluster distance
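The procedure on this slide can be sketched in a few lines of plain Python. This is an illustrative, unoptimized version, not the implementation used in the experiments: the function name `single_linkage` is hypothetical, points are assumed to be numeric tuples compared with the Euclidean distance, and the cutoff is applied to the single-linkage merge distance, which is one possible reading of the threshold criterion.

```python
import math


def single_linkage(points, cutoff):
    """Agglomerative single-linkage clustering, stopped at a distance cutoff.

    Returns one cluster label per point.
    """
    # Each point starts as its own cluster (stored as a list of point indices).
    clusters = [[i] for i in range(len(points))]

    def linkage(ci, cj):
        # Single-linkage criterion: distance between the closest pair of members.
        return min(math.dist(points[a], points[b]) for a in ci for b in cj)

    while len(clusters) > 1:
        # Find the closest pair of clusters.
        d, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if d > cutoff:
            break  # remaining clusters are too far apart to merge
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]

    labels = [0] * len(points)
    for k, members in enumerate(clusters):
        for idx in members:
            labels[idx] = k
    return labels


print(single_linkage([(0, 0), (0, 1), (5, 0), (5, 1)], cutoff=2))
```

Because only the closest pair of members matters, a chain of nearby points is merged into a single cluster even when its endpoints are far apart. This is the property that bridge-based attacks exploit.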
  9. Poisoning Single-Linkage Clustering
     • Attack strategy: $\max_{A} d_c(Y, Y'(A))$, with $A = \{a_i\}_{i=1}^{m}$
     • Heuristic-based solutions
       – Greedy approach: adding one attack sample at a time
       – Bridge-based heuristics: local maxima are found in between the closest points of adjacent clusters
  10. Poisoning Single-Linkage Clustering
      • Underlying idea: bridging the closest clusters
        – Given K clusters, K-1 candidate attack points
      [Figure: candidate attack points placed between adjacent clusters.]
  11. Poisoning Single-Linkage Clustering
      1. Bridge (Best): evaluates $Y'(a)$ for each candidate attack, retaining the best one
         – Clustering is run for each candidate attack point
      2. Bridge (Hard): estimates $Y'(a)$ assuming that each candidate will split the corresponding cluster, potentially merging it with a fragment of the closest cluster
         – It does not require running clustering to find the best attack point
      3. Bridge (Soft): estimates $Y'(a)$ as Bridge (Hard), but using a soft probabilistic estimate instead of 0/1 sample-to-cluster assignments
         – It does not require running clustering to find the best attack point
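The greedy Bridge (Best) strategy can be illustrated with a small sketch under simplifying assumptions. Points are numeric tuples with Euclidean distance, the K-1 candidates are taken as midpoints between the closest points of the clusters that single linkage would merge next, and the cutoff is fixed. All function names are hypothetical, and the midpoint construction is an assumption made for the illustration rather than the exact procedure from the paper.

```python
import math


def single_linkage(points, cutoff):
    """Single-linkage clustering cut at `cutoff`.

    Cutting a single-linkage dendrogram at a distance threshold yields the
    connected components of the graph that links points within the threshold,
    so it is enough to merge the labels of every such pair.
    """
    labels = list(range(len(points)))
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if labels[i] != labels[j] and math.dist(points[i], points[j]) <= cutoff:
                old, new = labels[j], labels[i]
                labels = [new if lab == old else lab for lab in labels]
    return labels


def clustering_distance(labels_a, labels_b):
    """Frobenius norm of the difference between the co-membership matrices."""
    n = len(labels_a)
    squared = sum(
        ((labels_a[i] == labels_a[j]) - (labels_b[i] == labels_b[j])) ** 2
        for i in range(n)
        for j in range(n)
    )
    return math.sqrt(squared)


def bridge_candidates(points, labels):
    """K-1 candidate attack points: midpoints of the next single-linkage merges."""
    groups = {}
    for idx, lab in enumerate(labels):
        groups.setdefault(lab, []).append(idx)
    groups = list(groups.values())
    candidates = []
    while len(groups) > 1:
        # Closest pair of points (a, b) belonging to two different clusters (i, j).
        _, a, b, i, j = min(
            (math.dist(points[a], points[b]), a, b, i, j)
            for i in range(len(groups))
            for j in range(i + 1, len(groups))
            for a in groups[i]
            for b in groups[j]
        )
        candidates.append(tuple((p + q) / 2 for p, q in zip(points[a], points[b])))
        groups[i] = groups[i] + groups[j]
        del groups[j]
    return candidates


def bridge_best(points, cutoff, m):
    """Greedily add up to m attack points, re-running clustering per candidate."""
    n = len(points)
    base = single_linkage(points, cutoff)  # Y = f(D)
    attack = []
    for _ in range(m):
        tainted = points + attack
        candidates = bridge_candidates(tainted, single_linkage(tainted, cutoff))
        if not candidates:
            break  # everything is already fused into one cluster

        def objective(candidate):
            # Y' restricted to the original samples D, compared against Y.
            labels = single_linkage(tainted + [candidate], cutoff)
            return clustering_distance(base, labels[:n])

        attack.append(max(candidates, key=objective))
    return attack


data = [(0.0,), (1.0,), (4.0,), (5.0,)]
print(bridge_best(data, cutoff=2.0, m=2))
```

In this toy run a single point placed between the two clusters is enough to fuse them, after which no candidates remain. Bridge (Hard) and Bridge (Soft) would replace the `objective` function with an estimate that avoids re-running the clustering.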
  12. Poisoning Single-Linkage Clustering
      • The attack compromises the initial clustering by forming heterogeneous clusters
      [Figure: two scatter plots, "Clustering on untainted data" and "Clustering after adding 20 attack samples".]
  13. Malheur Behavioral Malware Clustering
      • Malware executed in a sandbox (e.g., virtual machine)
        – Monitoring of program behavior (instructions, system calls, etc.)
      • Embedding of malware behavior in feature space
        – Each feature denotes presence / absence of a given instruction
        – Each vector is normalized to unit Euclidean norm
      • Clustering using single-linkage (or other linkage variants)
      [Figure: a sandbox report (Filesystem: copy file 'a' to 'b', open file 'foo.txt'; Network: ping host '10.1.2.3', listen on port '31337'; Registry: set key 'reboot' to '1') is encoded as MIST (level 1) instructions, each an instruction (opcode) followed by arguments, e.g. "14 01 | 11 04 …", "02 02 | 02 02 …", "0d 01 | 03 0a …", "03 03 | 03 01 …", "03 0a | 11 04 …". Instructions such as "14 01" and "02 02" are then mapped to the feature space.]
      Source: K. Rieck et al. Automatic analysis of malware behavior using machine learning. JCS 2011.
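The embedding described on this slide (one binary feature per instruction, then normalization to unit Euclidean norm) can be sketched as follows. This is an illustration of the idea only and does not call Malheur itself; the function name `embed`, the vocabulary, and the report are hypothetical, with instruction strings borrowed from the slide's figure.

```python
import math


def embed(report, vocabulary):
    """Map a behavior report to a unit-norm vector of instruction presence flags.

    report: sequence of instruction identifiers observed in the sandbox.
    vocabulary: ordered list of the instructions used as features.
    """
    present = set(report)
    raw = [1.0 if instruction in present else 0.0 for instruction in vocabulary]
    # For a 0/1 vector, the squared Euclidean norm is just the number of ones.
    norm = math.sqrt(sum(raw))
    return [value / norm for value in raw] if norm > 0 else raw


# Hypothetical vocabulary and report (opcodes as written in the slide's figure).
vocabulary = ["02 02", "03 0a", "0d 01", "14 01"]
report = ["14 01", "02 02", "02 02"]

print(embed(report, vocabulary))
```

Repeating an instruction in the report does not change the vector, since only presence or absence is recorded. After normalization, the Euclidean distance between two samples depends only on the fraction of instructions they share.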
  14. Poisoning Malheur
      • Poisoning single-linkage hierarchical clustering
      • Problem: how to create bridge points in this feature space?
        – Binary-valued vectors normalized to unit Euclidean norm
      • Additional constraint on the manipulation of malware samples
        – Malware should be modified without affecting malicious functionality
        – Adding instructions after malware program execution
        – Feature values can only be incremented
      • Example: $x_1 = (1\ 1\ 0\ 0\ 0)$, $x_2 = (0\ 0\ 1\ 1\ 1)$
      [Figure: distances $d(x, x_1)$ and $d(x, x_2)$ versus the number of added features (0 to 3); the bridge point $x$ is obtained by adding features of $x_2$ to $x_1$.]
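The example on this slide can be reproduced numerically. The sketch below starts from $x_1$, switches on the features of $x_2$ one at a time (feature values are only incremented), and reports the Euclidean distances of the normalized vector to $x_1$ and $x_2$. The function names are hypothetical, and the order in which features are added is an assumption.

```python
import math


def normalize(bits):
    """Scale a binary vector to unit Euclidean norm."""
    norm = math.sqrt(sum(bits))
    return [b / norm for b in bits]


def bridge_path(x1_bits, x2_bits):
    """Add to x1, one at a time, the features that x2 has and x1 lacks.

    Returns a list of (number of added features, d(x, x1), d(x, x2)).
    """
    x1 = normalize(x1_bits)
    x2 = normalize(x2_bits)
    current = list(x1_bits)
    missing = [i for i, (a, b) in enumerate(zip(x1_bits, x2_bits)) if b == 1 and a == 0]

    rows = [(0, math.dist(normalize(current), x1), math.dist(normalize(current), x2))]
    for added, index in enumerate(missing, start=1):
        current[index] = 1  # features can only be incremented, never removed
        x = normalize(current)
        rows.append((added, math.dist(x, x1), math.dist(x, x2)))
    return rows


for added, d1, d2 in bridge_path([1, 1, 0, 0, 0], [0, 0, 1, 1, 1]):
    print(f"added={added}  d(x,x1)={d1:.3f}  d(x,x2)={d2:.3f}")
```

As features are added, the distance to $x_1$ grows and the distance to $x_2$ shrinks, so the modified sample moves from one cluster toward the other. Whether it actually bridges them depends on the cutoff threshold, because both distances must stay within it.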
  15. Experimental Setup and Datasets
      • Setup
        – Data split into two portions of equal size, T and S
        – T used for extracting instructions and setting the cutoff threshold
        – S used for performance evaluation
        – F-measure: agreement between clusters and malware families
      • Malheur data
        – 3131 malware samples collected in 2009 (publicly available)
        – 85 instructions / features (on average)
        – Cutoff distance (max. F-measure on T): 0.49 (on average)
      • Recent Malware data
        – 657 malware samples from the most prominent families in 2013
        – 78 instructions / features (on average)
        – Cutoff distance (max. F-measure on T): 0.63 (on average)
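The slide does not spell out how the F-measure is computed. A common definition for clustering evaluation, assumed here, takes precision as the fraction of samples that belong to the majority family of their cluster, and recall as the fraction of samples that fall in the cluster holding most of their family. The sketch below follows that assumed definition; the function name `f_measure` and the toy labels are hypothetical.

```python
from collections import Counter


def f_measure(clusters, families):
    """F-measure between cluster labels and ground-truth family labels.

    Assumed definition: precision rewards clusters dominated by one family,
    recall rewards families concentrated in one cluster.
    """
    n = len(clusters)
    by_cluster = {}  # cluster label -> family counts inside that cluster
    by_family = {}   # family label -> cluster counts for that family
    for cluster, family in zip(clusters, families):
        by_cluster.setdefault(cluster, Counter())[family] += 1
        by_family.setdefault(family, Counter())[cluster] += 1

    precision = sum(max(counts.values()) for counts in by_cluster.values()) / n
    recall = sum(max(counts.values()) for counts in by_family.values()) / n
    return 2 * precision * recall / (precision + recall)


families = ["a", "a", "b", "b"]
print(f_measure([0, 0, 1, 1], families))  # clusters match the families
print(f_measure([0, 0, 0, 0], families))  # the two clusters have been fused
```

Under this definition, fusing clusters of different families lowers precision while recall stays high, which is one way a bridging attack can reduce the F-measure.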
  16. Experimental Results (Malheur data)
      • Attack strategies
        – Bridge (Best/Hard/Soft), Random, Random (Best), F-measure (Best)
      • Results for Malheur data
        – Random-based attacks are not effective (high-dimensional space)
        – Bridging is effective / clusters are fused together (cutoff threshold is fixed)
        – F-measure decreases while maximizing distance between clusterings
      [Figure: objective function and F-measure versus fraction of poisoning attacks (0% to 20%) for Random, Random (Best), Bridge (Best), Bridge (Soft), Bridge (Hard), and F-measure (Best).]
  17. Experimental Results (Recent Malware data)
      • Attack strategies
        – Bridge (Best/Hard/Soft), Random, Random (Best), F-measure (Best)
      • Results for Recent Malware data
        – Random-based attacks are not effective (high-dimensional space)
        – Bridging is effective / clusters are fused together (cutoff threshold is fixed)
        – F-measure decreases while maximizing distance between clusterings
      [Figure: objective function and F-measure versus fraction of poisoning attacks (0% to 20%) for Random, Random (Best), Bridge (Best), Bridge (Soft), Bridge (Hard), and F-measure (Best).]
  18. Conclusions and Future Work
      • Poisoning attacks can subvert behavioral malware clustering
      • Future work
        – Extensions to other clustering algorithms, common attack strategy (e.g., black-box optimization with suitable heuristics)
        – Attacks with limited knowledge of the data / clustering algorithm
      [Figure: "Attacks against clustering" and "Secure clustering algorithms".]
  19. Thanks for your attention! Any questions?
