  • 1. Simseer.com: Malware Similarity and Clustering Made Easy. Silvio Cesare <>
  • 2. Introduction
      • is a set of web services to analyse malware using program structure as a signature.
      • Why?
      • AV string signatures are not very robust.
      • They can’t detect ‘approximate’ matches.
      • It is hard to generate a signature for an entire family.
      • Program structure improves signature-based methods.
  • 3. Who am I?
      • Ph.D. student at Deakin University.
      • Presented at Ruxcon, Black Hat, AusCERT, etc.
      • Published in academia.
      • Book author.
      • Recently relocated to Canberra.
  • 4. Outline
      1. Introduction
      2. ’s Malware Services
      3. Supporting Infrastructure
      4. Other Services
      5. Conclusion
  • 5. Signatures
      • Covered in my other presentations.
      • The signature is based on a ‘set of control flow graphs’.
  • 6. Signature Extraction
      • Transform the ‘set of control flow graphs’ into a ‘feature vector’.
      • Decompilation + N-Grams
      [Figure: a control flow graph (nodes L_0 to L_7 with ‘true’ edges), its decompiled form proc(){ L_0: while (v1 || v2) { L_1: if (v3) { L_2: } else { L_4: } L_5: } L_7: return; }, the resulting string W|IEH}R, and its n-grams W|IE, |IEH, IEH}, EH}R.]
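The n-gram step on this slide can be illustrated with a short Python sketch. It assumes each decompiled procedure has already been reduced to a string such as W|IEH}R, and that the feature vector is a simple count of the n-grams found across all procedures. The slide only says "Decompilation + N-Grams", so the counting scheme, the choice of n = 4 (inferred from the slide's example), and the names `ngrams` and `feature_vector` are illustrative assumptions, not the service's actual implementation.

```python
from collections import Counter


def ngrams(s, n=4):
    """Return every contiguous substring of length n, in order."""
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def feature_vector(procedure_strings, n=4):
    """Count n-grams over all procedure strings of one program.

    Assumption: the vector is a sparse mapping from n-gram to count.
    """
    counts = Counter()
    for s in procedure_strings:
        counts.update(ngrams(s, n))
    return counts


# The string from the slide, treated as a program with one procedure.
vec = feature_vector(["W|IEH}R"])
print(ngrams("W|IEH}R"))
print(dict(vec))
```

A sparse mapping keeps the vector small, since any one program contains only a tiny fraction of all possible n-grams.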
  • 7. Simseer
      • Begin demo...
      • A revamp of my existing service.
      • Submit an archive of malware samples.
      • Results
        ▫ A similarity matrix comparing samples.
        ▫ An evolutionary tree showing relationships.
  • 8. Submission Page
  • 9. Results
  • 10. Simseer
      • Demo complete...
      • Use ‘distance between vectors’ to show similarity.
      • Visualize using phylogenetics software.
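As a rough illustration of the ‘distance between vectors’ idea, the sketch below builds a pairwise distance matrix over sparse feature vectors. The slide does not say which distance Simseer uses, so Euclidean distance is an assumption here, and the sample names and n-gram counts are made up for the example.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two sparse vectors (dicts of counts)."""
    keys = set(u) | set(v)
    return math.sqrt(sum((u.get(k, 0) - v.get(k, 0)) ** 2 for k in keys))


def distance_matrix(vectors):
    """Return sample names and a square matrix of pairwise distances."""
    names = sorted(vectors)
    matrix = [[euclidean(vectors[a], vectors[b]) for b in names] for a in names]
    return names, matrix


# Hypothetical samples with hypothetical n-gram counts.
samples = {
    "a.exe": {"W|IE": 1, "|IEH": 1},
    "b.exe": {"W|IE": 1, "IEH}": 1},
    "c.exe": {"EH}R": 3},
}
names, matrix = distance_matrix(samples)
for name, row in zip(names, matrix):
    print(name, [round(d, 2) for d in row])
```

A matrix like this is the kind of input that tree-building (phylogenetics) tools typically accept: small distances place samples on nearby branches.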
  • 11. SimseerCluster
      • Begin demo...
      • A new service.
      • Submit an archive of malware samples.
      • Define the number of clusters.
      • Results
        ▫ Samples grouped into clusters.
        ▫ Cross checking samples with AV.
        ▫ Identification of families.
  • 12. Submission Page
  • 13. Results
  • 14. SimseerCluster
      • Demo complete...
      • Use ‘similarity matrix’ and ‘cosine similarity’.
      • Pass to ‘cluster analysis software’ – The Weka Machine Learning Toolkit.
      • Use hierarchical clustering.
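The clustering step can be sketched without Weka. The example below computes cosine similarity between sparse feature vectors and then runs a plain bottom-up (agglomerative) hierarchical clustering until the requested number of clusters remains. The service itself delegates clustering to Weka; this standard-library stand-in, its average-linkage rule, and the sample names are assumptions made for illustration only.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors (dicts of counts)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def hierarchical_clusters(vectors, k):
    """Merge the two most similar clusters until only k clusters remain.

    Assumption: average linkage, i.e. cluster similarity is the mean
    pairwise cosine similarity of their members.
    """
    clusters = [[name] for name in sorted(vectors)]

    def linkage(c1, c2):
        sims = [cosine_similarity(vectors[a], vectors[b]) for a in c1 for b in c2]
        return sum(sims) / len(sims)

    while len(clusters) > max(k, 1):
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = max(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical samples from two made-up families.
samples = {
    "fam1_a": {"W|IE": 2, "|IEH": 1},
    "fam1_b": {"W|IE": 2, "|IEH": 2},
    "fam2_a": {"EH}R": 3},
    "fam2_b": {"EH}R": 2, "IEH}": 1},
}
print(hierarchical_clusters(samples, 2))
```

Stopping at a fixed cluster count mirrors the "define the number of clusters" option on the submission page; a real toolkit would also expose the full merge tree.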
  • 15. SimseerSearch
      • Begin demo...
      • A new service.
      • Submit a malware sample.
      • Specify threshold of similarity.
      • Results
        ▫ All samples in database similar to query.
        ▫ An AV report.
        ▫ Heuristics to detect obfuscations (packing).
  • 16. Submission Page
  • 17. Results
  • 18. SimseerSearch
      • Demo complete...
      • Use ‘nearest neighbour similarity search’ based on ‘Euclidean distance’.
      • Packer detection based on entropy analysis.
      [Figure: similarity search around a query, showing points p and q, the distance d(p,q), a search radius r, and benign and malicious (malware) samples.]
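Both ideas on this slide can be sketched in a few lines of Python. The first function returns every database entry within a given Euclidean distance of the query, using a linear scan for simplicity (a real service would more likely use an index structure). The second computes the Shannon entropy of a byte string, which tends towards 8 bits per byte for compressed or encrypted (packed) content. The function names, the distance threshold, and the 7.0 entropy cut-off are illustrative assumptions, not values taken from the service.

```python
import math
from collections import Counter


def euclidean(p, q):
    """Euclidean distance d(p, q) between two sparse vectors."""
    keys = set(p) | set(q)
    return math.sqrt(sum((p.get(k, 0) - q.get(k, 0)) ** 2 for k in keys))


def similar_samples(query, database, max_distance):
    """Return (distance, name) pairs within max_distance, nearest first."""
    hits = ((euclidean(query, vec), name) for name, vec in database.items())
    return sorted((d, name) for d, name in hits if d <= max_distance)


def shannon_entropy(data):
    """Entropy in bits per byte, from 0.0 (constant) to 8.0 (uniform)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(data).values())


def looks_packed(data, threshold=7.0):
    """Heuristic only: the 7.0 cut-off is an assumed value."""
    return shannon_entropy(data) >= threshold


# Hypothetical database of known samples.
database = {
    "m1": {"W|IE": 1},
    "m2": {"W|IE": 1, "|IEH": 1},
    "b1": {"EH}R": 5},
}
print(similar_samples({"W|IE": 1}, database, max_distance=1.0))
print(shannon_entropy(bytes(range(256))), looks_packed(b"A" * 1024))
```

Here the similarity threshold is expressed as a maximum distance; a service that takes a similarity percentage would need to map it onto a radius like this one.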
  • 19. Supporting Infrastructure
  • 20. Other Services
      • Other services on the same infrastructure
        ▫ Clonewise
        ▫ Bugwise
  • 21. Clonewise – Detecting embedded libraries.
  • 22. Bugwise on real Debian Linux binaries
  • 23. Future Work
      • Integrate Cuckoo sandbox
        ▫ Unpacking with Volatility.
        ▫ Non-EXE formats (PDF, DOC, etc).
        ▫ API call classification (non signature-based).
  • 24. Conclusion
      • Free services.
      • Control flow is better than traditional string signatures.
      • Try it!