DEEPSEC 2013: Malware Datamining And Attribution

Greg Hoglund explained at Black Hat 2010 that the development environments malware authors use leave traces in the code, which can be used to attribute malware to an individual or a group of individuals. Not with the precision of a name, date of birth and address, but with evidence: an arrested suspect's computer can be analysed and compared with the "tool marks" on the collected malware sample.

Malware Attribution: Theory, Code and Result

Who am I?
• Michael Boman, M.A.R.T. project
• Have been “playing around” with malware analysis “for a while”
• Working for FireEye
• This is a HOBBY project that I work on in my SPARE TIME

Agenda
• Theory behind Malware Attribution
• Code to conduct Malware Attribution analysis
• Result of analysis

Theory

• Malware Attribution: tracking cyber spies - Greg Hoglund, Black Hat 2010
  http://www.youtube.com/watch?v=k4Ry1trQhDk

What am I trying to do?
[Diagram: "Move this way" arrow; labels: Binary, Human]

What am I trying to do?
[Diagram labels: Blacklists; Binary; Net Recon; Command and Control; Developer Fingerprints; Tactics, Techniques, Procedures; Social Cyberspace (DIGINT); Physical Surveillance (HUMINT); Human]

[Diagram labels: Blacklists; Net Recon; Command and Control; Developer Fingerprints; Tactics, Techniques, Procedures; Social Cyberspace (DIGINT); Physical Surveillance (HUMINT)]

[Diagram labels: Physical Surveillance (HUMINT); Social Cyberspace (DIGINT); Developer Fingerprints; Tactics, Techniques, Procedures; Blacklists; Net Recon; Command and Control; Actions / Intent; Installation / Deployment; CNA (spreader) / CNE (search & exfil tool); COMS; Defensive / Anti-forensic; Exploit; Shellcode; DNS, Command and Control Protocol, Encryption]

Steps
• Step 0: Gather malware
• Step 1: Extract metadata from binary
• Step 2: Store metadata and binary in MongoDB
• Step 3: Analyze collected data

Step 0: Gather malware
• VirusShare (virusshare.com)
• Malware Domain List (www.malwaredomainlist.com/mdl.php)
• OpenMalware (www.offensivecomputing.net)
• MalShare (www.malshare.com)
• CleanMX (support.clean-mx.de/clean-mx/viruses)

Step 1: Extract metadata from binary

Development Steps
[Diagram labels: Source; Machine; Binary; Core “backbone” source code; Tweaks & Mods; Compiler; 3rd party source code; 3rd party libraries; Time; Runtime libraries; Paths; MAC Address; Malware; Packing]

Step 1: Extract metadata from binary
• Hashes (for sample identification)
  • md5, sha1, sha256, sha512, ssdeep etc.
• File type / Exif / PEiD
  • Compiler / Packer etc.
• PE Headers / Imports / Exports etc.
• VirusTotal results
• Tags
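
The hashing part of this step can be sketched with Python's standard `hashlib`. This is a minimal illustration, not the VXCage code: the function name `sample_hashes` is hypothetical, and the ssdeep value assumes the third-party `ssdeep` Python binding (its `ssdeep.hash()` function), which the sketch simply skips when that module is not installed.

```python
import hashlib

try:
    import ssdeep  # third-party fuzzy-hashing binding; optional in this sketch
except ImportError:
    ssdeep = None


def sample_hashes(data: bytes) -> dict:
    """Return the identification hashes for one sample, keyed by algorithm name."""
    hashes = {
        name: hashlib.new(name, data).hexdigest()
        for name in ("md5", "sha1", "sha256", "sha512")
    }
    if ssdeep is not None:
        # Fuzzy hash, used later to find similar (not identical) samples
        hashes["ssdeep"] = ssdeep.hash(data)
    return hashes
```

Calling `sample_hashes(open(path, "rb").read())` gives a flat dict that can be stored as-is alongside the other metadata.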

Identifying compiler / packer
• PEiD
• Python
  • peutils.SignatureDatabase().match_all()
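
peutils (shipped with pefile) loads PEiD signature databases and matches them against a parsed PE file. To show what that matching actually does, here is a stdlib-only sketch. It assumes the PEiD text format (`[name]`, `signature = ...` hex bytes with whole-byte `??` wildcards, `ep_only = true|false`) and an entry point already converted to a file offset. The function names are hypothetical, and the signature in the test is made up, not a real packer signature.

```python
def parse_signatures(text: str) -> list:
    """Parse PEiD-style database text into (name, pattern, ep_only) tuples."""
    sigs, name, pattern = [], None, None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            name, pattern = line[1:-1], None
        elif name and line.lower().startswith("signature"):
            # '??' is a wildcard byte; store it as None
            pattern = [None if tok == "??" else int(tok, 16)
                       for tok in line.split("=", 1)[1].split()]
        elif name and pattern is not None and line.lower().startswith("ep_only"):
            ep_only = line.split("=", 1)[1].strip().lower() == "true"
            sigs.append((name, pattern, ep_only))
            name, pattern = None, None
    return sigs


def matches_at(data: bytes, offset: int, pattern: list) -> bool:
    """True if the pattern (None = wildcard) matches data starting at offset."""
    if offset < 0 or offset + len(pattern) > len(data):
        return False
    return all(p is None or data[offset + i] == p for i, p in enumerate(pattern))


def match_signatures(data: bytes, ep_offset: int, sigs: list) -> list:
    """Names of matching signatures: at the entry point if ep_only, else anywhere."""
    hits = []
    for name, pattern, ep_only in sigs:
        if ep_only:
            found = matches_at(data, ep_offset, pattern)
        else:
            found = any(matches_at(data, off, pattern)
                        for off in range(len(data) - len(pattern) + 1))
        if found:
            hits.append(name)
    return hits
```

With pefile installed, the library route on the slide looks roughly like `peutils.SignatureDatabase("userdb.txt").match_all(pefile.PE(path), ep_only=True)`; the database file name is an assumption.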

PE Header information
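
The basic PE header fields can be read with nothing more than `struct`. pefile exposes the same values as `pe.FILE_HEADER.Machine`, `.NumberOfSections`, `.TimeDateStamp` and so on; the sketch below (`parse_coff_header` is a hypothetical name) parses the 20-byte COFF file header directly so the on-disk layout is visible. Only two machine types are named; anything else is reported as a hex value.

```python
import struct

MACHINE_NAMES = {0x14C: "i386", 0x8664: "AMD64"}


def parse_coff_header(data: bytes) -> dict:
    """Locate the PE signature via e_lfanew and unpack the COFF file header."""
    if data[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)  # e_lfanew
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    (machine, n_sections, timestamp, _symtab_ptr, _n_symbols,
     opt_header_size, characteristics) = struct.unpack_from("<HHIIIHH", data, pe_offset + 4)
    return {
        "pe_offset": pe_offset,
        "machine": MACHINE_NAMES.get(machine, hex(machine)),
        "number_of_sections": n_sections,
        "timestamp": timestamp,  # TimeDateStamp, seconds since the Unix epoch
        "size_of_optional_header": opt_header_size,
        "characteristics": characteristics,
    }
```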

VirusTotal Results
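
A sketch of fetching and summarizing a VirusTotal result, assuming the public v3 REST API (`GET https://www.virustotal.com/api/v3/files/{hash}` with an `x-apikey` header) and its `last_analysis_stats` field. The deck does not say which API version the project used, and the helper names are hypothetical. Building the request is kept separate from sending it, so the parsing can be tried on a saved report without network access or an API key.

```python
import json
import urllib.request

VT_FILE_URL = "https://www.virustotal.com/api/v3/files/{}"


def vt_report_request(file_hash: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a v3 file-report request."""
    return urllib.request.Request(VT_FILE_URL.format(file_hash),
                                  headers={"x-apikey": api_key})


def fetch_report(file_hash: str, api_key: str) -> dict:
    """Send the request and decode the JSON body (needs network and a valid key)."""
    with urllib.request.urlopen(vt_report_request(file_hash, api_key), timeout=30) as resp:
        return json.load(resp)


def detection_ratio(report: dict) -> tuple:
    """Return (malicious verdicts, total of all verdict counts) from a v3 report."""
    stats = report["data"]["attributes"]["last_analysis_stats"]
    return stats.get("malicious", 0), sum(stats.values())
```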

Tags
• User-supplied tags to identify sample source and behavior
• Supplied by an analyst or an analyst system

Step 2: Store metadata and binary in MongoDB

Components
• Modified VXCage server
  • Stores malware & metadata in MongoDB instead of FS / ORDBMS
  • Collects a lot more metadata than the original
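
A sketch of how one sample's metadata might be laid out as a MongoDB document and upserted. The schema (sha256 as `_id`, a `tags` array, a `first_seen` timestamp) and the helper names are assumptions for illustration, not the modified VXCage schema. The first two functions only build plain dicts; `store()` needs pymongo and a running server, and uses the standard `Collection.update_one(filter, update, upsert=True)` call.

```python
import hashlib
from datetime import datetime, timezone


def build_sample_document(data: bytes, tags=(), extra_metadata=None) -> dict:
    """Assemble one metadata document for a sample (hypothetical schema)."""
    doc = {
        "_id": hashlib.sha256(data).hexdigest(),  # sha256 as the natural key
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
        "size": len(data),
        "tags": sorted(set(tags)),                # de-duplicated, stable order
        "first_seen": datetime.now(timezone.utc),
    }
    doc.update(extra_metadata or {})
    return doc


def upsert_update(doc: dict) -> dict:
    """Update spec that merges tags and keeps the first-seen time on re-submission."""
    return {
        "$setOnInsert": {"first_seen": doc["first_seen"]},
        "$set": {k: v for k, v in doc.items() if k not in ("_id", "first_seen", "tags")},
        "$addToSet": {"tags": {"$each": doc["tags"]}},
    }


def store(collection, doc: dict) -> None:
    """Upsert into a pymongo Collection (needs pymongo and a reachable MongoDB)."""
    collection.update_one({"_id": doc["_id"]}, upsert_update(doc), upsert=True)
```

`$addToSet` with `$each` merges tags from repeated submissions instead of overwriting them, and `$setOnInsert` keeps the original `first_seen`. Raw binaries larger than MongoDB's 16 MB document limit would need GridFS, which this sketch leaves out.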

VXCage REST API
• /malware/add
  • Add sample
• /malware/get/<filehash>
  • Download sample. If no local sample, search other repos
• /malware/find
  • Search for sample by md5, sha256, ssdeep, tag, date
• /tags/list
  • List tags
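
A small client for these endpoints, sketched with only `urllib`. The host and port, the class name, and sending `/malware/find` as a form-encoded POST whose field names follow the search keys on the slide (md5, sha256, ssdeep, tag, date) are all assumptions; check them against the server code before relying on them. `/malware/add` is left out because it needs a multipart file upload.

```python
import urllib.parse
import urllib.request

SEARCH_KEYS = {"md5", "sha256", "ssdeep", "tag", "date"}


class VXCageClient:
    """Hypothetical minimal client for the VXCage REST endpoints."""

    def __init__(self, base_url: str = "http://localhost:8080"):  # host/port assumed
        self.base_url = base_url.rstrip("/")

    def get_request(self, file_hash: str) -> urllib.request.Request:
        """Download a sample by hash."""
        return urllib.request.Request(
            f"{self.base_url}/malware/get/{urllib.parse.quote(file_hash)}")

    def find_request(self, **criteria) -> urllib.request.Request:
        """Search by any of the keys listed on the slide (form field names assumed)."""
        unknown = set(criteria) - SEARCH_KEYS
        if unknown:
            raise ValueError(f"unsupported search keys: {sorted(unknown)}")
        body = urllib.parse.urlencode(criteria).encode()
        return urllib.request.Request(f"{self.base_url}/malware/find",
                                      data=body, method="POST")

    def tags_request(self) -> urllib.request.Request:
        """List all tags."""
        return urllib.request.Request(f"{self.base_url}/tags/list")

    @staticmethod
    def send(request: urllib.request.Request) -> bytes:
        """Perform the request against a running server."""
        with urllib.request.urlopen(request, timeout=30) as resp:
            return resp.read()
```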

Step 3: Analyze collected data

Identifying development environments
• Compiler / Linker / Libraries
• Strings
• Paths
• PE Translation header
• Compile times
• Number of times the software has been built
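
Two of these indicators, compile times and paths, can be sketched with the standard library. The PE `TimeDateStamp` is seconds since the Unix epoch; a histogram of build hours across related samples hints at the developer's working day, though timestamps can be forged or zeroed, so treat them as a weak signal. Program database (`.pdb`) paths left in a binary often leak the build machine's directory layout. The function names are hypothetical, and the string scan is a simplified, ASCII-only version of what `strings` does.

```python
import re
from collections import Counter
from datetime import datetime, timezone


def compile_time(timestamp: int) -> datetime:
    """Convert a PE TimeDateStamp to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc)


def compile_hour_histogram(timestamps) -> Counter:
    """Count samples per UTC build hour."""
    return Counter(compile_time(ts).hour for ts in timestamps)


def extract_strings(data: bytes, min_len: int = 4) -> list:
    """Printable ASCII runs of at least min_len bytes."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


def pdb_paths(data: bytes) -> list:
    """Strings that look like debug-symbol (.pdb) paths."""
    return [s for s in extract_strings(data) if s.lower().endswith(".pdb")]
```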

Cataloging behaviors
• Packers
• Encryption
• Anti-debugging
• Anti-VM
• Anti-forensics
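
A minimal sketch of cataloging behaviors from a sample's import table and strings. The indicator tables are hypothetical and deliberately tiny (a few well-known Windows API names and virtualization product names); they cover anti-debugging, encryption and anti-VM only. The import layout (`{dll_name: [function names]}`) is an assumed format for the stored metadata.

```python
# Hypothetical, deliberately small indicator tables
IMPORT_INDICATORS = {
    "anti-debugging": {"IsDebuggerPresent", "CheckRemoteDebuggerPresent", "OutputDebugStringA"},
    "encryption": {"CryptEncrypt", "CryptDecrypt", "CryptGenKey"},
}
STRING_INDICATORS = {
    "anti-vm": ("vmware", "vbox", "virtualbox", "qemu"),
}


def catalog_behaviors(imports: dict, strings=()) -> dict:
    """Map behavior category to the sorted evidence found in one sample."""
    imported = {func for funcs in imports.values() for func in funcs}
    found = {}
    for category, apis in IMPORT_INDICATORS.items():
        hits = sorted(imported & apis)
        if hits:
            found[category] = hits
    for category, needles in STRING_INDICATORS.items():
        # Case-insensitive substring match against each extracted string
        hits = sorted({s for s in strings if any(n in s.lower() for n in needles)})
        if hits:
            found.setdefault(category, []).extend(hits)
    return found
```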

Result

Have I seen you before?
• Detects similar malware (based on SSDEEP fuzzy hashing)

Different MD5, 100% SSDEEP match
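
Finding "have I seen you before?" groups amounts to comparing fuzzy hashes pairwise and linking samples whose score passes a threshold. The sketch below does that with a union-find structure. The score function is injected, so it can be `ssdeep.compare` (from the third-party `ssdeep` binding, which returns 0 to 100) or a stand-in for testing. The threshold of 80 and the function name are assumptions, not values from the deck.

```python
from itertools import combinations

try:
    import ssdeep  # optional third-party binding
    DEFAULT_COMPARE = ssdeep.compare
except ImportError:
    DEFAULT_COMPARE = None


def cluster_similar(fuzzy_hashes: dict, compare=DEFAULT_COMPARE, threshold: int = 80) -> list:
    """Group sample IDs whose pairwise similarity score is >= threshold."""
    if compare is None:
        raise RuntimeError("no compare function: install ssdeep or pass one in")
    parent = {key: key for key in fuzzy_hashes}

    def find(x):
        # Walk to the root, halving the path as we go
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(fuzzy_hashes, 2):  # every pair of samples
        if compare(fuzzy_hashes[a], fuzzy_hashes[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for key in fuzzy_hashes:
        groups.setdefault(find(key), []).append(key)
    # Report only groups with at least two members, largest first
    return sorted((sorted(g) for g in groups.values() if len(g) > 1), key=len, reverse=True)
```

Union-find yields single-linkage groups: if A matches B and B matches C, all three end up together even when A and C score low. That suits triage of variant families, but can chain unrelated samples at low thresholds.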

SSDEEP Analysis (3007)

SSDEEP Analysis (851)

Challenges
• Party handshake problem:
  • 707k samples analyzed and counting (n(n-1)/2 pairwise compares, roughly 250 billion!)
  • Need a better target (pre-)selection
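
The pair count and one way to shrink it can be made concrete. n samples give n(n-1)/2 unordered pairs, so 707,000 samples give 249,924,146,500 comparisons. For target pre-selection, ssdeep itself only scores two hashes when their block sizes are equal or one is exactly twice the other (any other pair scores 0). Bucketing hashes by block size, and pairing only within a bucket and with the double-size bucket, therefore skips work without changing any score. The sketch assumes hashes in ssdeep's `blocksize:hash1:hash2` text form; the function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations, product


def handshake_pairs(n: int) -> int:
    """Unordered pairs among n samples: n(n-1)/2."""
    return n * (n - 1) // 2


def block_size(fuzzy_hash: str) -> int:
    """Leading block size of an ssdeep hash 'blocksize:hash1:hash2'."""
    return int(fuzzy_hash.split(":", 1)[0])


def candidate_pairs(fuzzy_hashes: list) -> list:
    """Pairs ssdeep can score: equal block sizes, or one exactly double the other."""
    buckets = defaultdict(list)
    for h in fuzzy_hashes:
        buckets[block_size(h)].append(h)
    pairs = []
    for size, members in buckets.items():
        pairs.extend(combinations(members, 2))                     # same block size
        pairs.extend(product(members, buckets.get(size * 2, [])))  # double block size
    return pairs
```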

What compilers / packers are common?
1. "Borland Delphi 3.0 (???)", 54298
2. "Microsoft Visual C++ v6.0", 33364
3. "Microsoft Visual C++ 8", 28005
4. "Microsoft Visual Basic v5.0 - v6.0", 26573
5. "UPX v0.80 - v0.84", 22353
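
A sketch of producing such a ranking, both in Python and as an equivalent MongoDB aggregation pipeline (run through pymongo's `collection.aggregate()`). The field name `peid`, holding a list of signature names per sample, is a hypothetical schema choice. Both versions count every listed name, so a sample matching two signatures contributes to both.

```python
from collections import Counter


def rank_signatures(documents, field="peid", top=5):
    """Count compiler/packer signature names across sample documents."""
    counts = Counter()
    for doc in documents:
        counts.update(doc.get(field, []))  # each listed name adds 1
    return counts.most_common(top)


# Server-side equivalent: collection.aggregate(AGGREGATION_PIPELINE)
AGGREGATION_PIPELINE = [
    {"$unwind": "$peid"},                                # one row per name
    {"$group": {"_id": "$peid", "count": {"$sum": 1}}},
    {"$sort": {"count": -1}},
    {"$limit": 5},
]
```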

Are there any unidentified packers?
• How to identify a packer:
  • A PE section that is empty in the binary (no raw data on disk) but is writable and executable
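
The heuristic can be sketched with `struct` by walking the PE section table: a section with `SizeOfRawData == 0` whose characteristics include both `IMAGE_SCN_MEM_WRITE` (0x80000000) and `IMAGE_SCN_MEM_EXECUTE` (0x20000000) is space reserved for code that only appears at runtime. UPX's `UPX0` section is the textbook case. The function names are hypothetical, and a hit is a lead for manual review, not proof of a packer.

```python
import struct

IMAGE_SCN_MEM_EXECUTE = 0x20000000
IMAGE_SCN_MEM_WRITE = 0x80000000


def read_sections(data: bytes) -> list:
    """Return (name, raw_size, characteristics) for each section header of a PE file."""
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)  # e_lfanew
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    _, n_sections, _, _, _, opt_size, _ = struct.unpack_from("<HHIIIHH", data, pe_offset + 4)
    table = pe_offset + 4 + 20 + opt_size  # section table follows the optional header
    sections = []
    for i in range(n_sections):
        # 40-byte IMAGE_SECTION_HEADER: Name[8], six DWORDs, two WORDs, Characteristics
        fields = struct.unpack_from("<8sIIIIIIHHI", data, table + 40 * i)
        name, raw_size, characteristics = fields[0], fields[3], fields[9]
        sections.append((name.rstrip(b"\0").decode("latin-1"), raw_size, characteristics))
    return sections


def suspicious_sections(sections) -> list:
    """Sections with no raw data on disk that are both writable and executable."""
    wx = IMAGE_SCN_MEM_WRITE | IMAGE_SCN_MEM_EXECUTE
    return [name for name, raw_size, chars in sections
            if raw_size == 0 and (chars & wx) == wx]
```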

How common are anti-debugging techniques?
• 31622 out of 531182 PE binaries use IsDebuggerPresent (about 6%)
• Packed executables not counted
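
The percentage is 31622 / 531182, about 5.95%, which rounds to the 6% on the slide. A sketch of computing such a figure from stored metadata follows; the document fields (`is_pe`, `packed`, and `imports` as `{dll: [functions]}`) are hypothetical. It also reports how many PE samples are flagged as packed, since a packer usually hides the real import table, and any `IsDebuggerPresent` use behind it goes uncounted.

```python
def import_prevalence(documents, api_name="IsDebuggerPresent"):
    """Return (hits, pe_total, percent, packed) for one imported API name."""
    pe_docs = [d for d in documents if d.get("is_pe")]
    hits = sum(1 for d in pe_docs
               if any(api_name in funcs for funcs in d.get("imports", {}).values()))
    # Packed samples: imports visible on disk may not reflect real behavior
    packed = sum(1 for d in pe_docs if d.get("packed"))
    percent = 100.0 * hits / len(pe_docs) if pe_docs else 0.0
    return hits, len(pe_docs), percent, packed
```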

Analysis Coverage
[Diagram labels: Source; Machine; Binary; Core “backbone” source code; Tweaks & Mods; Compiler; 3rd party source code; 3rd party libraries; Time; Runtime libraries; Paths; MAC Address; Malware; Packing]

Future

What am I trying to do in the future
[Diagram labels: Blacklists; Binary; Net Recon; Command and Control; Developer Fingerprints; Tactics, Techniques, Procedures; Social Cyberspace (DIGINT); Physical Surveillance (HUMINT); Human]
• Expand scope of analysis: +network, +memory, +OS changes, +behavior

What am I trying to do in the future
• More automation
• More modular design
• Solve the “Big Data” issue I am getting myself into (Hadoop?)
• More pretty graphs

Thank you
• Michael Boman
• michael@michaelboman.org
• @mboman
• http://blog.michaelboman.org
• Code available at https://github.com/mboman/vxcage
