DEEPSEC 2013: Malware Datamining And Attribution

At Black Hat 2010, Greg Hoglund explained that the development environments malware authors use leave traces in the code, and that these traces can be used to attribute malware to an individual or a group of individuals. The attribution does not have the precision of a name, date of birth and address, but it does provide evidence: an arrested suspect's computer can be analysed and compared with the "tool marks" on the collected malware sample.


  1. 1. Malware Attribution Theory, Code and Result
  2. 2. Who am I? • Michael Boman, M.A.R.T. project • Have been “playing around” with malware analysis “for a while” • Working for FireEye • This is a HOBBY project that I use my SPARE TIME to work on
  3. 3. Agenda Theory behind Malware Attribution Code to conduct Malware Attribution analysis Result of analysis
  4. 4. Theory
  5. 5. • Malware Attribution: tracking cyber spies - Greg Hoglund, Blackhat 2010 http://www.youtube.com/watch?v=k4Ry1trQhDk
  6. 6. What am I trying to do? Move this way Binary Human
  7. 7. What am I trying to do? Blacklists Binary Net Recon Command and Control Developer Fingerprints Tactics Techniques Procedures Social Cyberspace DIGINT Physical Surveillance HUMINT Human
  8. 8. What am I trying to do? Blacklists Binary Net Recon Command and Control Developer Fingerprints Tactics Techniques Procedures Social Cyberspace DIGINT Physical Surveillance HUMINT Human
  9. 9. Blacklists Net Recon Command and Control Developer Fingerprints Tactics Techniques Procedures Social Cyberspace DIGINT Physical Surveillance HUMINT
  10. 10. Physical Surveillance HUMINT Social Cyberspace DIGINT Developer Fingerprints Tactics Techniques Procedures Blacklists Net Recon Command and Control Actions / Intent Installation / Deployment CNA (spreader) / CNE (search & exfil tool) COMS Defensive / Anti-forensic Exploit Shellcode DNS, Command and Control Protocol, Encryption
  11. 11. Physical Surveillance HUMINT Social Cyberspace DIGINT Developer Fingerprints Tactics Techniques Procedures Blacklists Net Recon Command and Control Actions / Intent Installation / Deployment CNA (spreader) / CNE (search & exfil tool) COMS Defensive / Anti-forensic Exploit Shellcode DNS, Command and Control Protocol, Encryption
  12. 12. Steps • Step 0: Gather malware • Step 1: Extract metadata from binary • Step 2: Store metadata and binary in MongoDB • Step 3: Analyze collected data
  13. 13. Step 0: Gather malware • VirusShare (virusshare.com) • Malware Domain List (www.malwaredomainlist.com/mdl.php) • OpenMalware (www.offensivecomputing.net) • MalShare (www.malshare.com) • CleanMX (support.clean-mx.de/clean-mx/viruses)
  14. 14. Step 1: Extract metadata from binary
  15. 15. Development Steps Source Core “backbone” sourcecode Machine Binary Tweaks & Mods Compiler 3rd party sourcecode 3rd party libraries Time Runtime libraries Paths MAC Address Malware Packing
  16. 16. Development Steps Source Core “backbone” sourcecode Machine Binary Tweaks & Mods Compiler 3rd party sourcecode 3rd party libraries Time Runtime libraries Paths MAC Address Malware Packing
  17. 17. Development Steps Source Core “backbone” sourcecode Machine Binary Tweaks & Mods Compiler 3rd party sourcecode 3rd party libraries Time Runtime libraries Paths MAC Address Malware Packing
  18. 18. Step 1: Extract metadata from binary • Hashes (for sample identification): md5, sha1, sha256, sha512, ssdeep etc. • File type / Exif / PEiD: Compiler / Packer etc. • PE Headers / Imports / Exports etc. • VirusTotal results • Tags
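
The hashing part of slide 18 can be sketched with Python's standard library. This is a minimal illustration, not the project's actual code: the function name `sample_hashes` is hypothetical, and ssdeep is left out because it requires a third-party binding.

```python
import hashlib


def sample_hashes(data: bytes) -> dict:
    """Return the cryptographic digests named on the slide for one sample.

    Hypothetical helper for illustration; ssdeep is omitted because it is
    not part of the standard library.
    """
    algorithms = ("md5", "sha1", "sha256", "sha512")
    return {name: hashlib.new(name, data).hexdigest() for name in algorithms}


if __name__ == "__main__":
    # Hash a small in-memory buffer; a real run would read the sample file.
    for name, value in sample_hashes(b"example sample bytes").items():
        print(name, value)
```

Storing several digests side by side makes it easy to look a sample up in other repositories, which tend to key on different hash types.
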
  19. 19. Identifying compiler / packer • PEiD • Python • peutils.SignatureDatabase().match_all()
  20. 20. PE Header information
  21. 21. VirusTotal Results
  22. 22. Tags • User-supplied tags to identify sample source and behavior • analyst / analyst-system supplied
  23. 23. Step 2: Store metadata and binary in MongoDB
  24. 24. Components • Modified VXCage server • Stores malware & metadata in MongoDB instead of FS / ORDBMS • Collects a lot more metadata than the original
  25. 25. VXCage REST API • /malware/add: Add sample • /malware/get/<filehash>: Download sample. If no local sample, search other repos • /malware/find: Search for sample by md5, sha256, ssdeep, tag, date • /tags/list: List tags
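
A client for the endpoints on slide 25 might build its requests as sketched below. The slide gives only the paths and search keys. The base URL, the HTTP methods, and the form-encoded body for `/malware/find` are assumptions made for illustration, and the helper names are hypothetical. The requests are only constructed here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: server address is hypothetical, not taken from the slides.
BASE_URL = "http://localhost:8080"

# Search keys listed on the slide for /malware/find.
SEARCH_KEYS = {"md5", "sha256", "ssdeep", "tag", "date"}


def find_request(key: str, value: str) -> Request:
    """Build a search request (assumed: POST with a form-encoded body)."""
    if key not in SEARCH_KEYS:
        raise ValueError("unsupported search key: " + key)
    body = urlencode({key: value}).encode("ascii")
    return Request(BASE_URL + "/malware/find", data=body, method="POST")


def get_request(filehash: str) -> Request:
    """Build a download request for one sample (assumed: GET)."""
    return Request(BASE_URL + "/malware/get/" + filehash, method="GET")


def tags_request() -> Request:
    """Build a request that lists all known tags (assumed: GET)."""
    return Request(BASE_URL + "/tags/list", method="GET")
```

Passing one of these objects to `urllib.request.urlopen` would perform the call against a running server.
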
  26. 26. Step 3: Analyze collected data
  27. 27. Identifying development environments • Compiler / Linker / Libraries • Strings • Paths • PE Translation header • Compile times • Number of times the software has been built
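
One of the items on slide 27, the compile time, can be read straight from the PE file header. The sketch below uses only the standard library and is one possible approach rather than the project's method. The function name `pe_compile_time` is hypothetical. The timestamp is set by the linker and can be forged, so it is better treated as a hint than as proof.

```python
import struct
from datetime import datetime, timezone


def pe_compile_time(data: bytes) -> datetime:
    """Read TimeDateStamp from the COFF file header of a PE image."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew at offset 0x3C holds the file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # The COFF header follows the 4-byte signature:
    # Machine (2 bytes), NumberOfSections (2 bytes), TimeDateStamp (4 bytes).
    (timestamp,) = struct.unpack_from("<I", data, pe_offset + 8)
    return datetime.fromtimestamp(timestamp, tz=timezone.utc)
```

Typical use would be `pe_compile_time(open(path, "rb").read())`, with the result stored next to the other metadata so that samples built close together in time can be grouped.
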
  28. 28. Cataloging behaviors • Packers • Encryption • Anti-debugging • Anti-VM • Anti-forensics
  29. 29. Result
  30. 30. Have I seen you before? • Detects similar malware (based on SSDEEP fuzzy hashing)
  31. 31. Different MD5, 100% SSDeep match
  32. 32. SSDEEP Analysis (3007)
  33. 33. SSDEEP Analysis (3007)
  34. 34. SSDEEP Analysis (851)
  35. 35. Challenges • Party handshake problem: • 707k samples analyzed and counting (resulting in over 250 billion comparisons!) • Need a better target (pre-)selection
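
The handshake problem on slide 35 is the count of unordered pairs, n(n-1)/2. For 707,000 samples that comes to 249,924,146,500 comparisons, in line with the roughly 250 billion quoted. The sketch below also shows one possible pre-selection step, which is an assumption and not something the slides describe. It relies on the fact that ssdeep only yields a meaningful score when two hashes have block sizes that are equal or differ by a factor of two, so other pairs can be skipped. The function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def pair_count(n: int) -> int:
    """Number of unordered pairs among n samples (the handshake count)."""
    return n * (n - 1) // 2


def block_size(ssdeep_hash: str) -> int:
    """Extract the block size from a 'blocksize:hash1:hash2' string."""
    return int(ssdeep_hash.split(":", 1)[0])


def candidate_pairs(hashes):
    """Yield only pairs whose block sizes are equal or one doubling apart."""
    buckets = defaultdict(list)
    for h in hashes:
        buckets[block_size(h)].append(h)
    for size, group in buckets.items():
        # Pairs inside the same block-size bucket.
        yield from combinations(group, 2)
        # Pairs with the next bucket up (twice the block size), if present.
        for other in buckets.get(size * 2, ()):
            for h in group:
                yield (h, other)


if __name__ == "__main__":
    print(pair_count(707_000))
```

Each surviving pair would still need a real ssdeep comparison. The bucketing only reduces how many of those calls are made.
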
  36. 36. What compilers / packers are common? 1. "Borland Delphi 3.0 (???)", 54298 2. "Microsoft Visual C++ v6.0", 33364 3. "Microsoft Visual C++ 8", 28005 4. "Microsoft Visual Basic v5.0 - v6.0", 26573 5. "UPX v0.80 - v0.84", 22353
  37. 37. Are there any unidentified packers? • How to identify a packer • PE Section is empty in binary, is writable and executable
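
The heuristic on slide 37 can be written as a check on a single PE section header. This sketch reads "empty in binary" as a `SizeOfRawData` of zero, which is an interpretation of the slide and not something it states. The function name is hypothetical. The two flag values are the standard PE section characteristics.

```python
# Standard PE section characteristic flags.
IMAGE_SCN_MEM_EXECUTE = 0x20000000
IMAGE_SCN_MEM_WRITE = 0x80000000


def looks_like_unpack_target(size_of_raw_data: int, characteristics: int) -> bool:
    """Flag a section that has no data on disk but is writable and executable.

    Assumption: 'empty in binary' is taken to mean SizeOfRawData == 0.
    """
    is_empty = size_of_raw_data == 0
    writable = bool(characteristics & IMAGE_SCN_MEM_WRITE)
    executable = bool(characteristics & IMAGE_SCN_MEM_EXECUTE)
    return is_empty and writable and executable
```

A section that occupies no space on disk yet is mapped writable and executable is a plausible landing area for code unpacked at run time, which is why the combination is worth flagging even when no known packer signature matches.
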
  38. 38. How common are anti-debugging techniques? • 31622 out of 531182 PE binaries use IsDebuggerPresent (6 %) • Packed executables not counted
  39. 39. Analysis Coverage Source Core “backbone” sourcecode Machine Binary Tweaks & Mods Compiler 3rd party sourcecode 3rd party libraries Time Runtime libraries Paths MAC Address Malware Packing
  40. 40. Future
  41. 41. What am I trying to do in the future Blacklists Binary Net Recon Command and Control Developer Fingerprints Tactics Techniques Procedures Social Cyberspace DIGINT Physical Surveillance HUMINT Human Expand scope of analysis +network +memory +os changes +behavior
  42. 42. What am I trying to do in the future • More automation • More modular design • Solve the “Big Data” issue I am getting myself into (Hadoop?) • More pretty graphs
  43. 43. Thank you • Michael Boman • michael@michaelboman.org • @mboman • http://blog.michaelboman.org • Code available at https://github.com/mboman/vxcage
