The Future of Automated Malware Generation
Transcript

  • 1. The Future of Automated Malware Generation. Stephan Chenette, Director of Security Research & Development
  • 2. Who Am I?
      • Stephan Chenette, @StephanChenette (Twitter)
      • Currently Director of Security R&D @ IOActive
          • Building / Breaking / Hacking / Researching
      • R&D @ eEye Digital Security, 4+ years
      • Head Security Researcher @ Websense, 6+ years
      • (Graduate Student @ UCSD, Network Security)
  • 3. What I hope you learn… • An understanding of the current malware landscape • Various malware/exploit defense techniques • Where I think detection/defense technologies are headed • How malware authors will most likely react, which will drive the future of automated malware generation
  • 4. Statement: This topic is a personal research interest of mine. I’m hoping to motivate you to think offensively when building or using defensive technologies. For example, I’m currently helping on an open source automated detection technology for the Cuckoo Sandbox, and am trying to evade/bypass it at the same time.
  • 5. Agenda • Current State of Automated Malware Generation • Current State of Malware Defense (Tech.) • Malware Trends • The Future of Malware Defense • The Future of Automated Malware Generation
  • 6. Malware Distribution Networks (MDNs)
  • 7. Malware Distribution Networks: malware has evolved into a profitable business for cyber criminals
      • Complex/Organized/Distributed Network
      • Malware Distribution Networks (MDNs)
      • Pay-per-install (PPI) clients (RogueAV, SpamBot, keylogger)
      • PPI Services
      • PPI Affiliates (landing pages, redirection services, etc.)
  • 8. Malware Distribution Networks (MDNs) [Figure: diagram with numbered steps 1 to 4] Source: Microsoft Security Intelligence Threat Report (http://www.microsoft.com/sir)
  • 9. Malware Distribution Networks (MDNs)
      • Single Sample Repository: a repository that does not update the malicious executable for the lifetime of the repository.
      • Multiple Sample Repository: a repository that updates the malicious executable over time, but does not generate a new sample for each request.
      • Polymorphic/Metamorphic Repository: a repository that produces a unique malicious executable for every download request.
  • 10. Example: Blackhole Exploit Kit. Blackhole contains an integrated AV scanner and will auto-repackage if the malware is detected. Figure: Blackhole exploit kit download chain. Source: Manufacturing Compromise: The Emergence of Exploit-as-a-Service (http://cseweb.ucsd.edu/~voelker/pubs/eaas-ccs12.pdf)
  • 11. Exploit Kits and Malware. Exploit kits: Blackhole, Incognito. Malware: ZeroAccess, TDSS. Source: Manufacturing Compromise: The Emergence of Exploit-as-a-Service (http://cseweb.ucsd.edu/~voelker/pubs/eaas-ccs12.pdf)
  • 12. Agenda • Current State of Automated Malware Generation • Current State of Malware Defense (Tech.) • Malware Trends • The Future of Malware Defense • The Future of Automated Malware Generation
  • 13. Current State of Malware Defense (Tech.)
  • 14. Current Techniques • Hash • Signatures • Heuristics • Semantics-aware detection
  • 15. Current Techniques
        Attacker            Defender
        Easier to bypass    Easier to implement
        Harder to change    Harder to implement
  • 16. Hash-based detection
      • Full file hashing (cryptographic checksum)
          • MD5, SHA1, SHA256
      • Portable Executable (PE)
          • Sectional hashing
          • Custom hashing
          • Fuzzy hashing (ssdeep)
      • Err on the side of caution
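The hashing approaches on this slide can be sketched from the defender's side with Python's standard `hashlib`. This is only an illustration: `chunk_hashes` hashes fixed-size chunks as a crude stand-in for PE sectional hashing (a real implementation would parse the section table), and the function names and the 4096-byte chunk size are assumptions, not anything from the talk.

```python
import hashlib


def file_hashes(data: bytes) -> dict:
    """Full-file digests for the three algorithms named on the slide."""
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def chunk_hashes(data: bytes, chunk_size: int = 4096) -> list:
    """SHA-256 of each fixed-size chunk (assumed stand-in for per-section hashes)."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def shared_chunks(a: bytes, b: bytes, chunk_size: int = 4096) -> int:
    """Number of chunk positions whose digests are identical in both inputs."""
    pairs = zip(chunk_hashes(a, chunk_size), chunk_hashes(b, chunk_size))
    return sum(x == y for x, y in pairs)
```

A full-file digest changes completely when any byte changes, while chunk-level digests still agree on the untouched parts of the file. That is the motivation for sectional and fuzzy hashing.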
  • 17. Defeating hash-based detection
      • Create a unique malware sample per user request
          • Randomizing a single byte at an irrelevant file offset
          • Re-packaging the binary (FSG, ASPack, Themida)
          • Re-building the malware dynamically
  • 18. Signature-based detection
      • Regular expression based signatures (PCRE, RE2)
      • Byte signatures
            rule ASPack {
                strings:
                    $ = { 60 E8 ?? ?? ?? ?? 5D 81 ED ?? ?? (43 | 44) ?? B8 ?? ?? (43 | 44) ?? 03 C5 }
                    $ = { 60 EB ?? 5D EB ?? FF ?? ?? ?? ?? ?? E9 }
                    $ = { 60 EB 03 5D FF E5 E8 F8 FF FF FF 81 ED 1B 6A 44 00 BB 10 6A 44 00 03 DD 2B 9D 2A }
                    $ = { 60 E8 00 00 00 00 5D ?? ?? ?? ?? ?? ?? BB ?? ?? ?? ?? 03 DD }
                    $ = { 60 E8 41 06 00 00 EB 41 }
                    $ = { 60 E8 7? 05 00 00 EB (33 | 4C) }
                condition:
                    for any of them : ($ at entrypoint)
            }
      • Deeper contextual content scanning with proprietary language
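To make the byte-signature idea concrete, here is a minimal Python sketch that translates a space-separated hex signature with `??` wildcards into a bytes regular expression and tests it at a given offset. It is not how YARA is implemented, and it handles only plain bytes and full-byte wildcards: alternations such as `(43 | 44)` and nibble wildcards such as `7?` from the rule above are deliberately left out. The function names and the `offset` parameter (standing in for the entry point's file offset) are assumptions.

```python
import re


def compile_hex_signature(signature: str):
    """Compile 'AA ?? BB' style hex text into a bytes regex.

    Supports only two token kinds: a two-digit hex byte, or '??' for any byte.
    """
    parts = []
    for token in signature.split():
        if token == "??":
            parts.append(b".")                          # wildcard: any single byte
        else:
            parts.append(b"\\x%02x" % int(token, 16))   # literal byte as a \xNN escape
    # DOTALL so that '.' also matches the newline byte 0x0A
    return re.compile(b"".join(parts), re.DOTALL)


def matches_at(signature, data: bytes, offset: int) -> bool:
    """True if the compiled signature matches starting exactly at `offset`."""
    return signature.match(data, offset) is not None
```

For example, `matches_at(compile_hex_signature("60 E8 41 06 00 00 EB 41"), data, entry_offset)` checks one of the fixed patterns from the rule, where `entry_offset` would have to come from parsing the PE header.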
  • 19. Defeating signature-based detection
      • Syntax mutation easily defeats this technique
      • Garbage code insertion, e.g. NOP, “MOV ax, ax”, “SUB ax, 0”
      • Register renaming
      • Subroutine permutation
      • Code reordering through jumps
      • Equivalent instruction substitution
            Instruction        Equivalent instruction
            MOV EAX, EBX       PUSH EBX; POP EAX

            Call             Emulated call                             Misused call
            CALL <target>    PUSH <PC + sizeof(PUSH) + sizeof(JMP)>    CALL <target>
                             JMP <target>                              .target POP <register-name>
      • Same behavior but different syntax
  • 20. Heuristics are introduced… AV engines were forced to evolve and use heuristics by way of emulation/behavioral analysis due to:
      • Polymorphic engines
          • Encrypt the body with a randomly generated encryption algorithm
          • Private key normally in the decoding engine
      • Metamorphic engines
          • Employ obfuscation/substitution techniques instead of encryption
          • Junk insertion, equivalent instruction substitution, etc.
  • 21. Heuristics-based detection: a general term for the different techniques used to detect malware by its behavior: emulation, API hooking, sandboxing, file anomalies and other analysis techniques. Example rule chain: IF Rule A, then Rule B, then Rule C, then Poison Ivy. Source: (http://hooked-on-mnemonics.blogspot.com)
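The "Rule A, then Rule B, then Rule C" chain can be sketched as an ordered match over a behavior log. The slide does not say what the three rules are, so the event strings and rule prefixes below are purely hypothetical placeholders and not real Poison Ivy indicators.

```python
from typing import Callable, Iterable, List, Optional

Rule = Callable[[str], bool]


def starts_with(prefix: str) -> Rule:
    """Build a rule that fires on any event beginning with `prefix`."""
    return lambda event: event.startswith(prefix)


def rules_fire_in_order(events: Iterable[str], rules: List[Rule]) -> bool:
    """True if every rule fires, in order (other events may occur in between)."""
    next_rule = 0
    for event in events:
        if next_rule < len(rules) and rules[next_rule](event):
            next_rule += 1
    return next_rule == len(rules)


# Hypothetical stand-ins for Rule A, Rule B and Rule C.
RULE_CHAIN = [
    starts_with("mutex_create:"),
    starts_with("registry_write:"),
    starts_with("tcp_connect:"),
]


def label_behavior(events: Iterable[str]) -> Optional[str]:
    """Return the family label from the slide if the whole chain matches."""
    return "Poison Ivy" if rules_fire_in_order(events, RULE_CHAIN) else None
```

Requiring the rules to fire in sequence is one plausible reading of the slide. An unordered "all rules matched" check would be an equally simple variant.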
  • 22. Defeating heuristics-based detection • Detect emulation and execute a different code path • Break the emulation engine • Avoid the heuristics • Overall solid method • Possible false positives
  • 23. Semantics-aware detection • The captured execution trace is transformed into a higher-level representation capturing its semantic meaning, i.e., the trace is first abstracted before being compared to a malicious behavior • Time-lock puzzles can make building the code flow or extracting a model too slow for real-time AV • Intermediate representation (IR) • Abstract Syntax Trees, Register Transfer Language
  • 24. Semantics-aware detection: a good idea in theory, but it is unknown (to me) how widely implemented this is in security products
  • 25. Defeating semantics-aware detection • Implementation is difficult • Limited support for equivalent code sequences, e.g. a = b * 2 and a = b << 1: a left arithmetic shift by n is equivalent to multiplying by 2^n (provided the value does not overflow) • Focus on the same techniques used to defeat signatures and heuristics, plus the likelihood of limited support for less popular instructions
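The shift/multiply equivalence shows what a semantics-aware engine has to handle: two different instructions must be normalized to one form before they are compared. A minimal sketch, assuming a toy intermediate representation of `(operation, operand, constant)` tuples that is not taken from any real product:

```python
def canonicalize(expr: tuple) -> tuple:
    """Rewrite a left shift by a constant n as a multiplication by 2**n."""
    op, operand, constant = expr
    if op == "shl":
        return ("mul", operand, 2 ** constant)
    return expr


def semantically_equal(left: tuple, right: tuple) -> bool:
    """Compare two toy IR expressions after canonicalization."""
    return canonicalize(left) == canonicalize(right)
```

Here `semantically_equal(("mul", "b", 2), ("shl", "b", 1))` holds because both sides normalize to the same tuple. Each further equivalence (for example `add b, b`) needs its own rewrite rule, which is the "limited support" problem the slide describes.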
  • 26. Recap
  • 27. Agenda • Current State of Automated Malware Generation • Current State of Malware Defense (Tech.) • Malware Trends • The Future of Malware Defense • The Future of Automated Malware Generation
  • 28. Malware Trends
  • 29. Malware Detection Reality Check • How well are current detection techniques working? 33%!
  • 30. Malware Samples. Observation: the number of malware samples is increasing. Source: McAfee Global Q1 2012 Threat Report (http://mcafee.com/us/resources/reports/rp-quarterly-threat-q1-2012.pdf)
  • 31. Mobile Malware Samples. Observation: the number of Android malware samples is increasing. Source: Kaspersky Q1 2012 Threat Report (http://www.securelist.com/en/analysis/204792231/IT_Threat_Evolution_Q1_2012)
  • 32. Use of Behavior Sandboxes: the client binary is malware but isn’t detected. Suspicious files are sent back to the “home base/cloud” lab for analysis
      1. Sent to sandbox system
      2. Metadata report is created for easier export of new rules
          a. Hash and blacklist entries are added
          b. Signatures are added
          c. Heuristic detection is added
  • 33. The Overworked Malware Analyst
  • 34. Solving the problem with people [Figure: malware analysts versus malware samples, labeled “OVERLOAD!!”]
  • 35. Agenda • Current State of Automated Malware Generation • Current State of Malware Defense (Tech.) • Malware Trends • The Future of Malware Defense • The Future of Automated Malware Generation
  • 36. The Future of Malware Defense: Skynet? …probably not. But some of the concepts aren’t too far-fetched…
  • 37. The Future of Malware Defense: perhaps malware detection should have more science applied to it.
  • 38. The Malware Infinity Problem
      • Malware detection: as the number of malware samples approaches ∞, we can’t manually add detection for every file. We must model WHAT actions malware takes, HOW it takes those actions and WHERE it connects.
      • Malware attribution: as the attack surface approaches ∞, we can’t defend everything from everyone. We must model WHO is after WHICH assets and HOW they attack.
  • 39. The Future of Malware Defense: IF we are going to start modeling, we must make some assumptions: 1. Attackers are going to change their code and techniques only enough to avoid detection 2. The majority of malware/exploit code and techniques will continue to represent future malware/exploit code and techniques
  • 40. The Who is important… “Researchers at Symantec traced the group’s work after finding a number of similarities between the Google attack code and methods and those used against other companies and organizations over the last few years. The researchers, who describe their findings in a report published Friday, say the gang, which they have dubbed the “Elderwood gang” based on the name of a parameter used in the attack codes, appears to have breached more than 1,000 computers in companies spread throughout several sectors, including defense, shipping, oil and gas, financial, technology and ISPs. The group has also targeted non-governmental organizations, particularly ones connected to human rights activities related to Tibet and China” Source: http://www.wired.com/threatlevel/2012/09/google-hacker-gang-returns/
  • 41. Statistics: a discipline that helps you understand data and make decisions based on data [Figure: Data → STATISTICS → Decisions]
  • 42. Train the Machines • Classify • Cluster
  • 43. Automatic Classification Steps: 1. Extract features 2. Train models using ML algorithms 3. Feature selection 4. Use models as classifiers 5. Use models to classify unknown files as 0 or 1. Source: http://eval.symantec.com/mktginfo/enterprise/white_papers/b-dlp_machine_learning.WP_en-us.pdf
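The five steps above can be sketched end to end in a few lines of plain Python. The slide does not name an algorithm, so this sketch swaps in a nearest-centroid classifier as the simplest possible model. The function names, the `min_gap` threshold and the use of labels 0 (benign) and 1 (malicious) are assumptions for illustration.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def train_centroids(samples: List[Vector], labels: List[int]) -> Dict[int, List[float]]:
    """Step 2: the 'model' is the mean feature vector of each class."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for vector, label in zip(samples, labels):
        acc = sums.setdefault(label, [0.0] * len(vector))
        for i, value in enumerate(vector):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def select_features(samples: List[Vector], labels: List[int], min_gap: float = 0.1) -> List[int]:
    """Step 3: keep feature indices whose class means differ by at least min_gap."""
    centroids = train_centroids(samples, labels)
    return [
        i for i in range(len(samples[0]))
        if abs(centroids[1][i] - centroids[0][i]) >= min_gap
    ]


def project(vector: Vector, keep: List[int]) -> List[float]:
    """Reduce a vector to the selected feature indices."""
    return [vector[i] for i in keep]


def classify(centroids: Dict[int, List[float]], vector: Vector) -> int:
    """Steps 4 and 5: label an unknown vector 0 or 1 by its nearest class centroid."""
    def squared_distance(label: int) -> float:
        return sum((c - v) ** 2 for c, v in zip(centroids[label], vector))
    return min(centroids, key=squared_distance)
```

A typical flow is to call `select_features` on the labeled training vectors, retrain with `train_centroids` on the projected vectors, then call `classify` on each projected unknown file. Step 1, feature extraction, is assumed to have already produced the numeric vectors.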
  • 44. Machine learning: where we train computers to make statistical decisions on real-time data based on previously supplied data. While machine learning as a concept has been around for decades and has been used in everything from anti-spam engines to Google™ algorithms for translating text, it is only now being applied to web filtering, DLP and malware content analysis.
  • 45. Historical Observation: historically, certain malware
      • Has no icon
      • Has no description or company in the resource section
      • Is packed
      • Lives in the Windows directory or user profile
      These are the type of “features” that expert humans would feed to machine learning classifiers to train on
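The four observations above map directly onto a binary feature vector. The sketch below assumes the file metadata has already been collected into a dictionary by some other tool. The key names (`path`, `has_icon`, `description`, `company`, `is_packed`) are hypothetical, and the "no description or company" flag is interpreted here as both fields being empty.

```python
from pathlib import PureWindowsPath


def extract_features(meta: dict) -> list:
    """Turn assumed file metadata into a [no_icon, no_vendor_info, packed, location] vector."""
    parts = PureWindowsPath(meta["path"].lower()).parts
    top_dir = parts[1] if len(parts) > 1 else ""
    in_windows_dir = top_dir == "windows"
    in_user_profile = top_dir in ("users", "documents and settings")
    return [
        int(not meta.get("has_icon", False)),
        int(not (meta.get("description") or meta.get("company"))),
        int(meta.get("is_packed", False)),
        int(in_windows_dir or in_user_profile),
    ]
```

Each flag on its own is weak evidence, since plenty of benign files are packed or live under the user profile. This is why the features are handed to a classifier instead of being used as hard rules.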
  • 46. Expert Humans train Machines: “You can’t effectively and consistently manage what you can’t measure, and you can’t measure what you haven’t defined…” Source: http://fairwiki.riskmanagementinsight.com/?page_id=3
      • The job of the human
          • List features
      • The job of the machine
          • Model which features are important, in what grouping and in what order
          • Classify
          • Cluster
  • 47. Machine Learning (ML) Algorithms
      • Naive Bayes classifier (assumes each feature is independent of the other features)
      • Support Vector Machine (SVM) for high dimensionality (more than a thousand variables in the model)
      • Random Forest when you want an interpretable model (< 2000 features)
      • Markov chains (as used in Natural Language Processing) when you want to assess the probability of a sequence
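As one concrete instance from this list, here is a minimal Bernoulli Naive Bayes over binary feature vectors, written in plain Python so the independence assumption is visible: the per-feature probabilities are simply multiplied (added in log space). The Laplace smoothing and all function names are choices made for this sketch.

```python
import math
from typing import Dict, List, Tuple

Model = Dict[int, Tuple[float, List[float]]]


def train_naive_bayes(samples: List[List[int]], labels: List[int]) -> Model:
    """Estimate P(label) and P(feature = 1 | label) with add-one smoothing."""
    model: Model = {}
    for label in set(labels):
        rows = [s for s, l in zip(samples, labels) if l == label]
        prior = len(rows) / len(samples)
        probs = [(sum(column) + 1) / (len(rows) + 2) for column in zip(*rows)]
        model[label] = (prior, probs)
    return model


def predict(model: Model, vector: List[int]) -> int:
    """Return the label with the highest log-posterior for a binary vector."""
    def log_posterior(label: int) -> float:
        prior, probs = model[label]
        score = math.log(prior)
        for present, p in zip(vector, probs):
            # Independence assumption: each feature contributes its own factor.
            score += math.log(p if present else 1.0 - p)
        return score
    return max(model, key=log_posterior)
```

With only a handful of binary features, such a model trains instantly and its per-feature probabilities can be read directly, which makes it a convenient baseline before trying SVMs or random forests.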
  • 48. The Future of Malware Defense. Inspection points: Network, File System, Physical Memory. Every layer provides various degrees of “features” to inspect
  • 49. The Future of Malware Defense
  • 50. Existing Academic work… • D. Plonka and P. Barford. Context-Aware Clustering of DNS Query Traffic. In Proceedings of the 8th ACM SIGCOMM conference on Internet Measurement, October 2008. • R. Perdisci, W. Lee, and N. Feamster. Behavioral Clustering of HTTP-Based Malware and Signature Generation Using Malicious Network Traces. In Proceedings of the 7th USENIX conference on Networked Systems Design and Implementation, April 2010. • K. Rieck, P. Trinius, C. Willems, T. Holz. Automatic Analysis of Malware Behavior using Machine Learning. Journal of Computer Security, 2011.
  • 51. Projects using machine learning
      • Razorback™: http://sourceforge.net/projects/razorbacktm/files/
      • Malheur: http://www.mlsec.org/malheur/
      • Malvic: http://www.malvic.org
      • Adobe Open Source Malware Classification Tool: http://sourceforge.net/projects/malclassifier.adobe/
          • 98.21% accuracy
          • 6.7% false positive rate
          • 7 features = DebugSize, ImageVersion, IatRVA, ExportSize, ResourceSize, VirtualSize2, NumberOfSections
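Accuracy and false positive rate, the two figures quoted for the Adobe tool, come from the same confusion counts but answer different questions. A small sketch of how they are computed, with label 1 meaning malicious and 0 meaning benign (the function names are illustrative, and no data from the Adobe tool is used):

```python
from typing import Sequence, Tuple


def confusion_counts(predicted: Sequence[int], actual: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (true positives, true negatives, false positives, false negatives)."""
    tp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 1)
    tn = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 0)
    fp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 0)
    fn = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 1)
    return tp, tn, fp, fn


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all files that were labeled correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def false_positive_rate(tn: int, fp: int) -> float:
    """Fraction of benign files that were wrongly flagged as malicious."""
    return fp / (fp + tn)
```

Because the false positive rate is measured only over benign files, a classifier can score high accuracy on a malware-heavy test set and still flag an unacceptable share of clean files.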
  • 52. Statistics Based Detection Tools
  • 53. The Future of Malware Defense
      • Using machine learning for malware detection is only as useful as the features you create and the good and bad sample sets it’s trained on.
          • Features
          • Good sample set
          • Bad sample set
      • If you have thousands of samples but they are all of the same malware or the same exploit… not good!!!
  • 54. PDF Example Features • Compressed JavaScript • PDF header location, e.g. %PDF- within first 1024 bytes • Does it contain an embedded file (e.g. Flash, sound file) • Signed by a trusted certificate • Encoded/encrypted streams, e.g. FlateDecode • Names hex escaped • Bogus xref table. Reference: http://blog.fireeye.com/files/27c3_julia_wolf_omg-wtf-pdf.pdf
  • 55. Detecting shellcode
      • Markov chains to determine the probability of instruction sequences [Figure: state diagram with transition probabilities 0.3, 0.7, 0.4, 0.6]
      • Technique clustering
            XOR ECX, ECX                  ; ECX = 0
            MOV ESI, [FS:ECX + 0x30]      ; ESI = &(PEB) ([FS:0x30])
            MOV ESI, [ESI + 0x0C]         ; ESI = PEB->Ldr
            MOV ESI, [ESI + 0x1C]         ; ESI = PEB->Ldr.InInitOrder
          next_module:
            MOV EBP, [ESI + 0x08]         ; EBP = InInitOrder[X].base_address
            MOV EDI, [ESI + 0x20]         ; EDI = InInitOrder[X].module_name (unicode)
            MOV ESI, [ESI]                ; ESI = InInitOrder[X].flink (next module)
            CMP [EDI + 12*2], CL          ; modulename[12] == 0 ?
            JNE next_module               ; No: try next module.
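The Markov-chain idea can be sketched for detection as follows: learn first-order transition probabilities between instruction mnemonics from labeled sequences, then score how likely a new sequence is under that model. The disassembly step is assumed to happen elsewhere. The function names and the `floor` probability for unseen transitions are assumptions of this sketch.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence

Transitions = Dict[str, Dict[str, float]]


def train_transitions(sequences: Sequence[Sequence[str]]) -> Transitions:
    """Estimate P(next mnemonic | current mnemonic) from training sequences."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for sequence in sequences:
        for current, following in zip(sequence, sequence[1:]):
            counts[current][following] += 1
    return {
        current: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for current, c in counts.items()
    }


def sequence_probability(model: Transitions, sequence: List[str], floor: float = 1e-6) -> float:
    """Multiply transition probabilities; unseen transitions get `floor`."""
    probability = 1.0
    for current, following in zip(sequence, sequence[1:]):
        probability *= model.get(current, {}).get(following, floor)
    return probability
```

Trained on the mnemonics of the listing above (`XOR`, six `MOV`s, `CMP`, `JNE`), the model gives `MOV` to `MOV` a probability of 5/6 and `MOV` to `CMP` 1/6. For long sequences, summing log-probabilities avoids floating-point underflow.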
  • 56. Shellcode detection • Decoder routine clustering • Measure the entropy of bytes to indicate an encoded payload • ...features =]
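Byte entropy is straightforward to compute and is a common feature for spotting encoded or packed data. Below is a minimal sketch of Shannon entropy in bits per byte; any threshold for calling a buffer "encoded" would be an assumption to tune on real data, and none is suggested here.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte: 0.0 for constant data, 8.0 at most."""
    if not data:
        return 0.0
    total = len(data)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(data).values()
    )
```

Plain code and text usually score well below 8 bits per byte, while compressed or encrypted payloads approach it. A sliding window over a buffer can therefore highlight the region where an encoded payload sits.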
  • 57. Malware features in action…
      • Features:
          • Static:
              • Packed
              • File size
              • Origin
          • Dynamic (Network)
              • Makes a connection
              • Number of DNS requests
              • Encrypted communication
              • Burst/length of communication
          • Dynamic (File)
              • Registry keys
              • File level modifications
  • 58. The Future of Malware Defense
      • Choose features that are harder for the attacker to change.
          • E.g. bot network communication protocol (if not encrypted)
  • 59. Agenda • Current State of Automated Malware Generation • Current State of Malware Defense (Tech.) • Malware Trends • The Future of Malware Defense • The Future of Automated Malware Generation
  • 60. The Future of Automated Malware Generation
  • 61. The Future of Malware Offense: the attacker has a few things in their favor:
      1. Prone to false positives: machine learning can be prone to false positives and false negatives if feature and sample sets aren’t extensive enough
      2. Avoid feature indicators: detection via machine learning can be defeated if an attacker can find out which features are used and avoid them
      3. New features come out… You can’t protect yourself from a new weapon if you don’t know it exists
  • 62. Prone to false positives: If the defense side creates models based on a small sample set, or one that isn’t diverse enough, then the model will be too restrictive: false negatives. If the defense creates models based only on malicious files and not enough good files, there will be tons of false positives. An attacker can always try to poison the sample sets if they have enough manipulation power and resources (VirusTotal)
  • 63. Avoid feature indicators • Attackers can always do the same research, model generic malware and avoid features that are being used by most malware • …to instead use features that are more popular in benign software • This will also avoid being placed in known clusters
  • 64. New features come out…
      • If a format changes, or gets updated:
          • A new file/protocol parser must be created/updated to understand and extract features
          • The model must be retrained and shipped out
  • 65. …OR just keep it simple: encrypt binaries with a user-specific key so that AV can’t decrypt them
      • Targeted binary like Gauss
          • Encrypted DLL with user key
      • Zeus
          • Encrypted the downloaded binary with user key
  • 66. Conclusion
      • Complex/Organized Network
      • Malware distribution networks (MDNs)
          • Pay-per-install (PPI) clients
          • Malware crypt services will include
              • Feature verification
              • Anti-clustering technology → the future?
              • Anti-classification technology → the future?
      Will this be the future of automated malware generation? Or will it just be more of the same?
  • 67. Conclusion: Today, what I hope you learned is that if you want to truly understand your defensive technology, you have to understand its limitations and look at things from an attacker/offensive viewpoint.
  • 68. Conclusion: Proper security is all about a defense-in-depth strategy. Create multiple layers of defense, with every layer presenting a different set of challenges and requiring different skill sets and technology, so that every layer increases the time and effort needed to compromise your environment and exfiltrate data.
  • 69. Conclusion: External reconnaissance → Penetration → Internal reconnaissance + stage persistent state → Exfiltration. If the security strategy is successful, your layered defenses stop the attack before exfiltration of data can happen.
  • 70. Questions?
      questions.py:
          while len(questions) > 0:
              if time <= 0:
                  break
              print(answers[questions.pop()])
  • 71. Thanks PacSec! Stephan Chenette | @StephanChenette, Director of Research and Development, IOActive, Inc. http://ioactive.com
