- Malware Similarity and Clustering Made Easy



  1. Simseer.com
     Malware Similarity and Clustering Made Easy
     Silvio Cesare <>
  2. Introduction
     • is a set of web services to analyse malware using program structure as a signature.
     • Why?
     • AV string signatures not very robust.
     • Can’t detect ‘approximate’ matches.
     • Hard to generate signature for an entire family.
     • Program structure improves signature-based methods.
  3. Who am I?
     • Ph.D. Student at Deakin University.
     • Presented at Ruxcon, Black Hat, AusCERT, etc.
     • Published in academia.
     • Book author
     • Recently relocated to Canberra.
  4. Outline
     1. Introduction
     2. ’s Malware Services
     3. Supporting Infrastructure
     4. Other Services
     5. Conclusion
  5. Signatures
     • In my other presentations.
     • Signature is based on ‘set of control flow graphs’.
  6. Signature Extraction
     • Transform ‘set of control flow graphs’ into a ‘feature vector’.
     • Decompilation + N-Grams
     [Figure: a control flow graph with nodes L_0 to L_7 is decompiled to
     proc(){ L_0: while (v1 || v2) { L_1: if (v3) { L_2: } else { L_4: } L_5: } L_7: return; }
     which is encoded as the string W|IEH}R and split into the 4-grams W|IE, |IEH, IEH}, EH}R.]
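
The N-gram step on this slide can be sketched in a few lines of Python. This is only an illustration, not the service's implementation: it assumes each control flow graph has already been decompiled into a string such as W|IEH}R, and the helper names (`ngrams`, `build_vocabulary`, `feature_vector`) and the choice of counting occurrences are assumptions made for the example.

```python
from collections import Counter


def ngrams(s, n=4):
    """Return every length-n substring of s, left to right."""
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def build_vocabulary(samples, n=4):
    """Sorted list of every n-gram seen in any sample.

    `samples` is a list of samples; each sample is a list of strings,
    one string per decompiled control flow graph (hypothetical layout).
    """
    return sorted({g for strings in samples for s in strings for g in ngrams(s, n)})


def feature_vector(strings, vocabulary, n=4):
    """Count how often each vocabulary n-gram occurs in one sample."""
    counts = Counter(g for s in strings for g in ngrams(s, n))
    return [counts[g] for g in vocabulary]


if __name__ == "__main__":
    samples = [["W|IEH}R"], ["W|IE}R"]]
    vocab = build_vocabulary(samples)
    for strings in samples:
        print(strings, feature_vector(strings, vocab))
```

Sharing one vocabulary across all samples keeps every vector the same length, so the vectors can be compared directly.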
  7. Simseer
     • Begin demo...
     • A revamp of my existing service.
     • Submit an archive of malware samples.
     • Results
       ▫ A similarity matrix comparing samples.
       ▫ An evolutionary tree showing relationships.
  8. Submission Page
  9. Results
  10. Simseer
     • Demo complete...
     • Use ‘distance between vectors’ to show similarity.
     • Visualize using phylogenetics software.
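
As a rough sketch of how a ‘distance between vectors’ can become a similarity matrix: the slide does not say which distance Simseer uses here, so Euclidean distance and the mapping 1 / (1 + d) are assumptions chosen for illustration, and the function names are hypothetical.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def similarity_matrix(vectors):
    """Pairwise similarity in (0, 1]; identical vectors score 1.0.

    The 1 / (1 + d) mapping is an assumption for this sketch,
    not something stated on the slide.
    """
    return [[1.0 / (1.0 + euclidean(p, q)) for q in vectors] for p in vectors]


if __name__ == "__main__":
    for row in similarity_matrix([[1, 0], [1, 0], [4, 4]]):
        print(["%.3f" % s for s in row])
```

A matrix like this, or the underlying distance matrix, is the kind of input that tree-building software works from.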
  11. SimseerCluster
     • Begin demo...
     • A new service.
     • Submit an archive of malware samples.
     • Define the number of clusters.
     • Results
       ▫ Samples grouped into clusters.
       ▫ Cross checking samples with AV.
       ▫ Identification of families.
  12. Submission Page
  13. Results
  14. SimseerCluster
     • Demo complete...
     • Use ‘similarity matrix’ and ‘cosine similarity’.
     • Pass to ‘cluster analysis software’ – The Weka Machine Learning Toolkit.
     • Use hierarchical clustering.
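
To make the clustering step concrete, here is a minimal pure-Python sketch of cosine similarity followed by agglomerative (hierarchical) clustering down to a chosen number of clusters. The service itself hands this work to Weka; the code below is a stand-in, and the average-linkage rule and function names are assumptions for the example.

```python
import math


def cosine_similarity(p, q):
    """Cosine of the angle between p and q; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def hierarchical_clusters(vectors, k):
    """Merge the two most similar clusters until only k remain.

    Average linkage is assumed. Returns lists of indices into `vectors`.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    clusters = [[i] for i in range(len(vectors))]

    def linkage(a, b):
        sims = [cosine_similarity(vectors[i], vectors[j]) for i in a for j in b]
        return sum(sims) / len(sims)

    while len(clusters) > k:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = linkage(clusters[x], clusters[y])
                if best is None or s > best[0]:
                    best = (s, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


if __name__ == "__main__":
    vectors = [[3, 0, 1], [6, 0, 2], [0, 5, 0], [0, 4, 1]]
    print(hierarchical_clusters(vectors, 2))
```

The number of clusters `k` plays the role of the value the user enters on the submission page.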
  15. SimseerSearch
     • Begin demo...
     • A new service.
     • Submit a malware sample.
     • Specify threshold of similarity.
     • Results
       ▫ All samples in database similar to query.
       ▫ An AV report.
       ▫ Heuristics to detect obfuscations (packing).
  16. Submission Page
  17. Results
  18. SimseerSearch
     • Demo complete...
     • Use ‘nearest neighbour similarity search’ based on ‘Euclidean distance’.
     • Packer detection based on entropy analysis.
     [Figure: queries drawn as points with search radius r; d(p,q) is the distance between points p and q; labels: Query, Benign, Malicious, Malware.]
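
The two ideas on this slide can be sketched as follows. Both parts are assumptions made for illustration rather than the service's code: the search is a plain linear scan that keeps every database vector within Euclidean distance r of the query (a real service would more likely use an index), and the packing heuristic flags data whose byte entropy reaches a hypothetical threshold of 7.0 bits per byte.

```python
import math
from collections import Counter


def euclidean(p, q):
    """Euclidean distance d(p, q) between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def similar_samples(query, database, r):
    """Return (name, distance) for every entry within radius r, nearest first.

    `database` maps a sample name to its feature vector (hypothetical layout).
    """
    hits = [(name, euclidean(query, vec)) for name, vec in database.items()]
    return sorted((h for h in hits if h[1] <= r), key=lambda h: h[1])


def shannon_entropy(data):
    """Shannon entropy of a bytes object, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())


def looks_packed(data, threshold=7.0):
    """Flag high-entropy data; the 7.0 threshold is an assumed value."""
    return shannon_entropy(data) >= threshold


if __name__ == "__main__":
    db = {"a": [0, 0], "b": [3, 4], "c": [10, 10]}
    print(similar_samples([0, 0], db, 5.0))
    print(shannon_entropy(bytes(range(256))), looks_packed(b"A" * 64))
```

Compressed or encrypted data tends towards 8 bits per byte, which is why high entropy is a common hint that a sample is packed.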
  19. Supporting Infrastructure
  20. Other Services
     • Other services on the same infrastructure
       ▫ Clonewise
       ▫ Bugwise
  21. Clonewise – Detecting embedded libraries.
  22. Bugwise on real Debian Linux binaries
  23. Future Work
     • Integrate Cuckoo sandbox
       ▫ Unpacking with Volatility.
       ▫ Non EXE formats (PDF, DOC, etc).
       ▫ API call classification (non signature-based).
  24. Conclusion
     • Free services.
     • Control flow better than traditional string signatures.
     • Try it!