Inbot10 vxclass

Z
zynamics GmbHzynamics GmbH
VxClass
Clustering Malware, Generating
          Signatures




        zynamics GmbH
      info@zynamics.com
Overview

• Introduction
• What is VxClass? How does it work?
  – The Malware Pipeline
• Generating signatures from
  clusters of malware
Malware
• Keeping up with the flood of malware is hard:
  – Steady increase in number of new variants
     measured by “unique hash” (MD5, SHA1)
  – Off-the-shelf tools to produce malware-variants
     think Swizzor
  – AV-signature databases growing fast
     problems: duplicates, junk, false-positives, …
  – Time of human malware analysts is scarce
     don’t let them do repetitive and error-prone work
 Automated methods needed
VxClass
• Full infrastructure for processing new malware
  – Automated generic unpacking to remove crypters
     based on full-system emulation
  – Comparison engine to detect similarities
    between executables (based on zynamics BinDiff)
     often better than a human analyst
  – Clustering algorithms for grouping into “families”
     allows to visualize the malware phylogeny
  – Generation of byte-based AV-signatures
    for each cluster
Malware Pipeline
Malware is processed in the following stages
                                                                         Fast
 New           Unpacking   Memory       Disassembly     Graph
Malware                    Dumps                      Structure      Comparison
                Engine                     Engine
                                                                      Scheduler

                                               Approximizations
                                               used to schedule
                                               “full comparison”


     Byte      Signature     Malware                    Similarity   Comparison
  Signatures                 Clusters    Clusterer        Scores
               Generator                                               Engine
Unpacking Engine

• Full-system Emulation
  – Uses snapshot of fully booted Win XP SP3 (32-bit)
  – Malware is injected into the system
  – Different execution modes
     • Execute a pre-defined number of instructions
     • Examine repeated snapshots until “good enough”
  – Acquires new processes and new kernel memory
     injected memory into other processes is not yet acquired
 Full-system emulation solves many problems
     legacy APIs, API detections, …
Disassembly Engine

• Processes full executables
  or memory dumps
• Scan for and recognize typical
  function prologues
• Generates
  – Flowgraphs
  – Executable Callgraphs
• Compiler and library filtering (FLIRT)
Comparison Engine

The “lifeblood” of VxClass
  – Based on algorithms developed for
    the industry-standard zynamics BinDiff
  – Disregard byte sequences,
    focus on structural comparison
  – Operates on function flowgraphs and the
    callgraph structure
     performs comparison based on these structures
     instead of concrete byte/instruction sequences
  Highly resilient against compiler/platform changes
     different compilers and settings, even different CPUs (!)
Comparison Engine
Example: GCC vs. Visual C++
  below is SpiderMonkey versus escript.dll (Adobe Reader)
Comparison Engine
Example: Mac OS X vs. iPhone
  allows cross-CPU comparison
Scheduling Engine

• Calculating a full similarity matrix is O(n²)
      prohibitively expensive for large sample sets
• Need to reduce complexity
   Filter samples that are not similar at all
• Fast comparison engine using
  128-bit flowgraph “hashes” (MD-Index)
   – Calculating MD-Indices is fast
   – Fast comparison via search for common hashes
• Result of fast comparison prioritizes samples
      yields the complete matrix eventually
MD-Index
Hashing flowgraphs for fast database lookup




                               128-bit Hash Value




                         Result: Fast DB lookup for
                         particular functions
Clustering

• Based on the similarity matrix generate
  clusters in different ways
  – Compute connected components
        use a similarity threshold for graph edges
  – Apply phylogeny algorithms (bioinformatics)
        yields a family tree
  – Use any other clustering algorithm
Clustering

Example tree using phylogeny algorithms
Clustering
What to do with those clusters?
  tag, name and otherwise analyze them using the web interface
Signature Generator

• Fact: Items in the same cluster/family
  usually share a lot of code
  – Comparison algorithms work like an
    “intersection operator”:

      Executable A


                            Executable B

      Common Code
Signature Generator
• To build a signature, find functions “common to all
  elements of a cluster”
• Map matched similar functions in each executable:
Signature Generator


• Eliminate functions not present everywhere
Signature Generator
• A k-LCS algorithm makes sure only functions that are
  in the same order retained
    this works because functions are identified by their address
Signature Generator

• Repeat for the basic blocks in each executable
 Results in a “common core” of basic blocks
     these occur in all items of the cluster and in the same order
• Compute regular k-LCS to determine common byte
  sequences
     approximate (exact k-LCS has exponential complexity)
• Fill the “gaps” with wildcards:
  Worm.SigGen-20080603193253-1876:0:*:03ff*0100*4424*8d54
  24*8d4fff*525051*0100*83c418*85c07c*8b4424*8d4c24*4100*
  51c64424*02e8*5424*83c40c8d4424*52685c*4100*8b4424*8b46
  048b4e08*d1e085c9894604*4100*535657a8018bf175*84c074*8a
  d0b9*4100...


     Converted to ClamAV format
Signature Generator
    Stats: Classified 5000 random executables from VirusTotal and
    named resulting clusters.
    Signatures were generated and applied to 15000 new executables.

Cluster Name            # executables   sig. size in bytes   new detected variants

Win32.KillAV.Variants            183                1785                      111
Win32.Bacuy.Variants             599               27942                      863
Win32.SkinTrim                   173             290318                       356
Win32.SwizzorA                    15               69286                      929
Win32.WinTrim                    114                3925                      126
FakeAlert                         54                 460                        0
Win32.Chifrax                     12               40098                       26
Signature Generator

• What about false positives?
   – Scanning 22239 known-good executables (ClamAV)
   – Seemingly false positives were always due to bugs
      bugs in FLIRT, bugs in our library filtering
   – False positive rates empirically around 0.005%


• What about false negatives, then?
   – By construction always either valid signature or none
Signature Generator

• Signatures consist of malware bytes
  minus the “variable” bits
   generated signatures carry some
    “predictive power”
   All except one of the generated signatures caught
    some “new” variant of the same malware
Performance

• One (beefy) machine processes ≈1400
  samples/day
  – Includes unpacking
  – Higher performance if specific unpacking happens
    first
• Scalable: VxClass routinely runs on a
  4-machine cluster
  – Scaling to 20-25 machines should be possible
Limitations

• Heavy obfuscation of control flow
    breaks classification and leads to empty signatures
• Virtualizing packers
• Unpacking only works on 32-bit Windows
   – No Linux/Mac OS X/Mobile unpacking
   – 64-bit support is in the works
• Using the generated signatures in a AV product
  requires good unpacking capability
    signatures are generated post-unpacking

  But: Manual intervention possible (upload IDBs)
Questions?

   http://zynamics.com/
   http://blog.zynamics.com/



christian.blichmann@zynamics.com
1 of 26

Recommended

DEF CON 27 - CHRISTOPHER ROBERTS - firmware slap by
DEF CON 27 - CHRISTOPHER ROBERTS - firmware slapDEF CON 27 - CHRISTOPHER ROBERTS - firmware slap
DEF CON 27 - CHRISTOPHER ROBERTS - firmware slapFelipe Prado
61 views50 slides
Fault tolerance by
Fault toleranceFault tolerance
Fault toleranceMichał Waleszczuk
215 views26 slides
Application Profiling for Memory and Performance by
Application Profiling for Memory and PerformanceApplication Profiling for Memory and Performance
Application Profiling for Memory and Performancepradeepfn
1.7K views30 slides
Group meeting: UniSan - Proactive Kernel Memory Initialization to Eliminate D... by
Group meeting: UniSan - Proactive Kernel Memory Initialization to Eliminate D...Group meeting: UniSan - Proactive Kernel Memory Initialization to Eliminate D...
Group meeting: UniSan - Proactive Kernel Memory Initialization to Eliminate D...Yu-Hsin Hung
168 views39 slides
JProfiler / an introduction by
JProfiler / an introductionJProfiler / an introduction
JProfiler / an introductionTommaso Torti
1.6K views29 slides
Owning computers without shell access dark by
Owning computers without shell access darkOwning computers without shell access dark
Owning computers without shell access darkRoyce Davis
3.5K views23 slides

More Related Content

What's hot

Introduction to .NET Performance Measurement by
Introduction to .NET Performance MeasurementIntroduction to .NET Performance Measurement
Introduction to .NET Performance MeasurementSasha Goldshtein
6.6K views19 slides
Profiler Guided Java Performance Tuning by
Profiler Guided Java Performance TuningProfiler Guided Java Performance Tuning
Profiler Guided Java Performance Tuningosa_ora
2.8K views63 slides
Modern Evasion Techniques by
Modern Evasion TechniquesModern Evasion Techniques
Modern Evasion TechniquesJason Lang
5.2K views58 slides
Stealthy, Hypervisor-based Malware Analysis by
Stealthy, Hypervisor-based Malware AnalysisStealthy, Hypervisor-based Malware Analysis
Stealthy, Hypervisor-based Malware AnalysisTamas K Lengyel
13.1K views58 slides
Practical Malware Analysis: Ch 11: Malware Behavior by
Practical Malware Analysis: Ch 11: Malware BehaviorPractical Malware Analysis: Ch 11: Malware Behavior
Practical Malware Analysis: Ch 11: Malware BehaviorSam Bowne
3.3K views59 slides
Nagios Conference 2011 - Michael Medin - NSClient++: Whats New by
Nagios Conference 2011 - Michael Medin - NSClient++: Whats NewNagios Conference 2011 - Michael Medin - NSClient++: Whats New
Nagios Conference 2011 - Michael Medin - NSClient++: Whats NewNagios
2.1K views69 slides

What's hot(10)

Introduction to .NET Performance Measurement by Sasha Goldshtein
Introduction to .NET Performance MeasurementIntroduction to .NET Performance Measurement
Introduction to .NET Performance Measurement
Sasha Goldshtein6.6K views
Profiler Guided Java Performance Tuning by osa_ora
Profiler Guided Java Performance TuningProfiler Guided Java Performance Tuning
Profiler Guided Java Performance Tuning
osa_ora2.8K views
Modern Evasion Techniques by Jason Lang
Modern Evasion TechniquesModern Evasion Techniques
Modern Evasion Techniques
Jason Lang5.2K views
Stealthy, Hypervisor-based Malware Analysis by Tamas K Lengyel
Stealthy, Hypervisor-based Malware AnalysisStealthy, Hypervisor-based Malware Analysis
Stealthy, Hypervisor-based Malware Analysis
Tamas K Lengyel13.1K views
Practical Malware Analysis: Ch 11: Malware Behavior by Sam Bowne
Practical Malware Analysis: Ch 11: Malware BehaviorPractical Malware Analysis: Ch 11: Malware Behavior
Practical Malware Analysis: Ch 11: Malware Behavior
Sam Bowne3.3K views
Nagios Conference 2011 - Michael Medin - NSClient++: Whats New by Nagios
Nagios Conference 2011 - Michael Medin - NSClient++: Whats NewNagios Conference 2011 - Michael Medin - NSClient++: Whats New
Nagios Conference 2011 - Michael Medin - NSClient++: Whats New
Nagios2.1K views
"Automated Malware Analysis" de Gabriel Negreira Barbosa, Malware Research an... by SegInfo
"Automated Malware Analysis" de Gabriel Negreira Barbosa, Malware Research an..."Automated Malware Analysis" de Gabriel Negreira Barbosa, Malware Research an...
"Automated Malware Analysis" de Gabriel Negreira Barbosa, Malware Research an...
SegInfo 1.2K views
Sql Wars - SQL Clone vs Docker Containers by Alessandro Alpi
Sql Wars - SQL Clone vs Docker Containers Sql Wars - SQL Clone vs Docker Containers
Sql Wars - SQL Clone vs Docker Containers
Alessandro Alpi318 views
Practical Malware Analysis: Ch 2 Malware Analysis in Virtual Machines & 3: Ba... by Sam Bowne
Practical Malware Analysis: Ch 2 Malware Analysis in Virtual Machines & 3: Ba...Practical Malware Analysis: Ch 2 Malware Analysis in Virtual Machines & 3: Ba...
Practical Malware Analysis: Ch 2 Malware Analysis in Virtual Machines & 3: Ba...
Sam Bowne2.1K views

Similar to Inbot10 vxclass

VxClass for Incident Response by
VxClass for Incident ResponseVxClass for Incident Response
VxClass for Incident Responsezynamics GmbH
617 views34 slides
Simseer and Bugwise - Web Services for Binary-level Software Similarity and D... by
Simseer and Bugwise - Web Services for Binary-level Software Similarity and D...Simseer and Bugwise - Web Services for Binary-level Software Similarity and D...
Simseer and Bugwise - Web Services for Binary-level Software Similarity and D...Silvio Cesare
966 views22 slides
[RAT資安小聚] Study on Automatically Evading Malware Detection by
[RAT資安小聚] Study on Automatically Evading Malware Detection[RAT資安小聚] Study on Automatically Evading Malware Detection
[RAT資安小聚] Study on Automatically Evading Malware DetectionAj MaChInE
795 views71 slides
Scripting and Automation within the MAX Platform - Mark Petrie by
Scripting and Automation within the MAX Platform - Mark Petrie Scripting and Automation within the MAX Platform - Mark Petrie
Scripting and Automation within the MAX Platform - Mark Petrie MAXfocus
5.1K views60 slides
Malware Analysis and Defeating using Virtual Machines by
Malware Analysis and Defeating using Virtual MachinesMalware Analysis and Defeating using Virtual Machines
Malware Analysis and Defeating using Virtual Machinesintertelinvestigations
1.4K views74 slides
Full-System Emulation Achieving Successful Automated Dynamic Analysis of Evas... by
Full-System Emulation Achieving Successful Automated Dynamic Analysis of Evas...Full-System Emulation Achieving Successful Automated Dynamic Analysis of Evas...
Full-System Emulation Achieving Successful Automated Dynamic Analysis of Evas...Lastline, Inc.
1.3K views70 slides

Similar to Inbot10 vxclass(20)

VxClass for Incident Response by zynamics GmbH
VxClass for Incident ResponseVxClass for Incident Response
VxClass for Incident Response
zynamics GmbH617 views
Simseer and Bugwise - Web Services for Binary-level Software Similarity and D... by Silvio Cesare
Simseer and Bugwise - Web Services for Binary-level Software Similarity and D...Simseer and Bugwise - Web Services for Binary-level Software Similarity and D...
Simseer and Bugwise - Web Services for Binary-level Software Similarity and D...
Silvio Cesare966 views
[RAT資安小聚] Study on Automatically Evading Malware Detection by Aj MaChInE
[RAT資安小聚] Study on Automatically Evading Malware Detection[RAT資安小聚] Study on Automatically Evading Malware Detection
[RAT資安小聚] Study on Automatically Evading Malware Detection
Aj MaChInE795 views
Scripting and Automation within the MAX Platform - Mark Petrie by MAXfocus
Scripting and Automation within the MAX Platform - Mark Petrie Scripting and Automation within the MAX Platform - Mark Petrie
Scripting and Automation within the MAX Platform - Mark Petrie
MAXfocus 5.1K views
Full-System Emulation Achieving Successful Automated Dynamic Analysis of Evas... by Lastline, Inc.
Full-System Emulation Achieving Successful Automated Dynamic Analysis of Evas...Full-System Emulation Achieving Successful Automated Dynamic Analysis of Evas...
Full-System Emulation Achieving Successful Automated Dynamic Analysis of Evas...
Lastline, Inc.1.3K views
How to build observability into a serverless application by Yan Cui
How to build observability into a serverless applicationHow to build observability into a serverless application
How to build observability into a serverless application
Yan Cui669 views
Penetration Testing and Intrusion Detection System by Bikrant Gautam
Penetration Testing and Intrusion Detection SystemPenetration Testing and Intrusion Detection System
Penetration Testing and Intrusion Detection System
Bikrant Gautam1.2K views
The Future of Automated Malware Generation by Stephan Chenette
The Future of Automated Malware GenerationThe Future of Automated Malware Generation
The Future of Automated Malware Generation
Stephan Chenette7.7K views
Reading Group Presentation: The Power of Procrastination by Michael Rushanan
Reading Group Presentation: The Power of ProcrastinationReading Group Presentation: The Power of Procrastination
Reading Group Presentation: The Power of Procrastination
Michael Rushanan739 views
Leveraging Intra-Node Parallelization in HPCC Systems by HPCC Systems
Leveraging Intra-Node Parallelization in HPCC SystemsLeveraging Intra-Node Parallelization in HPCC Systems
Leveraging Intra-Node Parallelization in HPCC Systems
HPCC Systems75 views
DevOps Fest 2019. Игорь Фесенко. DevOps: Be good, Get good or Give up by DevOps_Fest
DevOps Fest 2019. Игорь Фесенко. DevOps: Be good, Get good or Give upDevOps Fest 2019. Игорь Фесенко. DevOps: Be good, Get good or Give up
DevOps Fest 2019. Игорь Фесенко. DevOps: Be good, Get good or Give up
DevOps_Fest138 views
Reversing Microsoft patches to reveal vulnerable code by Harsimran Walia
Reversing Microsoft patches to reveal vulnerable codeReversing Microsoft patches to reveal vulnerable code
Reversing Microsoft patches to reveal vulnerable code
Harsimran Walia664 views
how-to-bypass-AM-PPL by nitinscribd
how-to-bypass-AM-PPLhow-to-bypass-AM-PPL
how-to-bypass-AM-PPL
nitinscribd305 views
GPCE16: Automatic Non-functional Testing of Code Generators Families by Mohamed BOUSSAA
GPCE16: Automatic Non-functional Testing of Code Generators FamiliesGPCE16: Automatic Non-functional Testing of Code Generators Families
GPCE16: Automatic Non-functional Testing of Code Generators Families
Mohamed BOUSSAA859 views
Exactly-once Stream Processing Done Right with Matthias J Sax by HostedbyConfluent
Exactly-once Stream Processing Done Right with Matthias J SaxExactly-once Stream Processing Done Right with Matthias J Sax
Exactly-once Stream Processing Done Right with Matthias J Sax
HostedbyConfluent819 views
Managing Millions of Tests Using Databricks by Databricks
Managing Millions of Tests Using DatabricksManaging Millions of Tests Using Databricks
Managing Millions of Tests Using Databricks
Databricks331 views

More from zynamics GmbH

How to really obfuscate your pdf malware by
How to really obfuscate   your pdf malwareHow to really obfuscate   your pdf malware
How to really obfuscate your pdf malwarezynamics GmbH
3.5K views52 slides
How to really obfuscate your pdf malware by
How to really obfuscate your pdf malwareHow to really obfuscate your pdf malware
How to really obfuscate your pdf malwarezynamics GmbH
1.6K views52 slides
Architectural Diversity (German) by
Architectural Diversity (German)Architectural Diversity (German)
Architectural Diversity (German)zynamics GmbH
2.1K views40 slides
Architectural Diversity (German) by
Architectural Diversity (German)Architectural Diversity (German)
Architectural Diversity (German)zynamics GmbH
557 views40 slides
Uni mannheim debuggers by
Uni mannheim debuggersUni mannheim debuggers
Uni mannheim debuggerszynamics GmbH
1.7K views53 slides
Introduction to mobile reversing by
Introduction to mobile reversingIntroduction to mobile reversing
Introduction to mobile reversingzynamics GmbH
963 views32 slides

More from zynamics GmbH(10)

How to really obfuscate your pdf malware by zynamics GmbH
How to really obfuscate   your pdf malwareHow to really obfuscate   your pdf malware
How to really obfuscate your pdf malware
zynamics GmbH3.5K views
How to really obfuscate your pdf malware by zynamics GmbH
How to really obfuscate your pdf malwareHow to really obfuscate your pdf malware
How to really obfuscate your pdf malware
zynamics GmbH1.6K views
Architectural Diversity (German) by zynamics GmbH
Architectural Diversity (German)Architectural Diversity (German)
Architectural Diversity (German)
zynamics GmbH2.1K views
Architectural Diversity (German) by zynamics GmbH
Architectural Diversity (German)Architectural Diversity (German)
Architectural Diversity (German)
zynamics GmbH557 views
Uni mannheim debuggers by zynamics GmbH
Uni mannheim debuggersUni mannheim debuggers
Uni mannheim debuggers
zynamics GmbH1.7K views
Introduction to mobile reversing by zynamics GmbH
Introduction to mobile reversingIntroduction to mobile reversing
Introduction to mobile reversing
zynamics GmbH963 views
0-knowledge fuzzing white paper by zynamics GmbH
0-knowledge fuzzing white paper0-knowledge fuzzing white paper
0-knowledge fuzzing white paper
zynamics GmbH488 views
Formale Methoden im Reverse Engineering by zynamics GmbH
Formale Methoden im Reverse EngineeringFormale Methoden im Reverse Engineering
Formale Methoden im Reverse Engineering
zynamics GmbH1.2K views

Inbot10 vxclass

  • 1. VxClass Clustering Malware, Generating Signatures zynamics GmbH info@zynamics.com
  • 2. Overview • Introduction • What is VxClass? How does it work? – The Malware Pipeline • Generating signatures from clusters of malware
  • 3. Malware • Keeping up with the flood of malware is hard: – Steady increase in number of new variants measured by “unique hash” (MD5, SHA1) – Off-the-shelf tools to produce malware-variants think Swizzor – AV-signature databases growing fast problems: duplicates, junk, false-positives, … – Time of human malware analysts is scarce don’t let them do repetitive and error-prone work  Automated methods needed
  • 4. VxClass • Full infrastructure for processing new malware – Automated generic unpacking to remove crypters based on full-system emulation – Comparison engine to detect similarities between executables (based on zynamics BinDiff) often better than a human analyst – Clustering algorithms for grouping into “families” allows to visualize the malware phylogeny – Generation of byte-based AV-signatures for each cluster
  • 5. Malware Pipeline Malware is processed in the following stages Fast New Unpacking Memory Disassembly Graph Malware Dumps Structure Comparison Engine Engine Scheduler Approximizations used to schedule “full comparison” Byte Signature Malware Similarity Comparison Signatures Clusters Clusterer Scores Generator Engine
  • 6. Unpacking Engine • Full-system Emulation – Uses snapshot of fully booted Win XP SP3 (32-bit) – Malware is injected into the system – Different execution modes • Execute a pre-defined number of instructions • Examine repeated snapshots until “good enough” – Acquires new processes and new kernel memory injected memory into other processes is not yet acquired Full-system emulation solves many problems legacy APIs, API detections, …
  • 7. Disassembly Engine • Processes full executables or memory dumps • Scan for and recognize typical function prologues • Generates – Flowgraphs – Executable Callgraphs • Compiler and library filtering (FLIRT)
  • 8. Comparison Engine The “lifeblood” of VxClass – Based on algorithms developed for the industry-standard zynamics BinDiff – Disregard byte sequences, focus on structural comparison – Operates on function flowgraphs and the callgraph structure performs comparison based on these structures instead of concrete byte/instruction sequences Highly resilient against compiler/platform changes different compilers and settings, even different CPUs (!)
  • 9. Comparison Engine Example: GCC vs. Visual C++ below is SpiderMonkey versus escript.dll (Adobe Reader)
  • 10. Comparison Engine Example: Mac OS X vs. iPhone allows cross-CPU comparison
  • 11. Scheduling Engine • Calculating a full similarity matrix is O(n²) prohibitively expensive for large sample sets • Need to reduce complexity Filter samples that are not similar at all • Fast comparison engine using 128-bit flowgraph “hashes” (MD-Index) – Calculating MD-Indices is fast – Fast comparison via search for common hashes • Result of fast comparison prioritizes samples yields the complete matrix eventually
  • 12. MD-Index Hashing flowgraphs for fast database lookup 128-bit Hash Value Result: Fast DB lookup for particular functions
  • 13. Clustering • Based on the similarity matrix generate clusters in different ways – Compute connected components use a similarity threshold for graph edges – Apply phylogeny algorithms (bioinformatics) yields a family tree – Use any other clustering algorithm
  • 14. Clustering Example tree using phylogeny algorithms
  • 15. Clustering What to do with those clusters? tag, name and otherwise analyze them using the web interface
  • 16. Signature Generator • Fact: Items in the same cluster/family usually share a lot of code – Comparison algorithms work like an “intersection operator”: Executable A Executable B Common Code
  • 17. Signature Generator • To build a signature, find functions “common to all elements of a cluster” • Map matched similar functions in each executable:
  • 18. Signature Generator • Eliminate functions not present everywhere
  • 19. Signature Generator • A k-LCS algorithm makes sure only functions that are in the same order retained this works because functions are identified by their address
  • 20. Signature Generator • Repeat for the basic blocks in each executable  Results in a “common core” of basic blocks these occur in all items of the cluster and in the same order • Compute regular k-LCS to determine common byte sequences approximate (exact k-LCS has exponential complexity) • Fill the “gaps” with wildcards: Worm.SigGen-20080603193253-1876:0:*:03ff*0100*4424*8d54 24*8d4fff*525051*0100*83c418*85c07c*8b4424*8d4c24*4100* 51c64424*02e8*5424*83c40c8d4424*52685c*4100*8b4424*8b46 048b4e08*d1e085c9894604*4100*535657a8018bf175*84c074*8a d0b9*4100... Converted to ClamAV format
  • 21. Signature Generator Stats: Classified 5000 random executables from VirusTotal and named resulting clusters. Signatures were generated and applied to 15000 new executables. Cluster Name # executables sig. size in bytes new detected variants Win32.KillAV.Variants 183 1785 111 Win32.Bacuy.Variants 599 27942 863 Win32.SkinTrim 173 290318 356 Win32.SwizzorA 15 69286 929 Win32.WinTrim 114 3925 126 FakeAlert 54 460 0 Win32.Chifrax 12 40098 26
  • 22. Signature Generator • What about false positives? – Scanning 22239 known-good executables (ClamAV) – Seemingly false positives were always due to bugs bugs in FLIRT, bugs in our library filtering – False positive rates empirically around 0.005% • What about false negatives, then? – By construction always either valid signature or none
  • 23. Signature Generator • Signatures consist of malware bytes minus the “variable” bits  generated signatures carry some “predictive power”  All except one of the generated signatures caught some “new” variant of the same malware
  • 24. Performance • One (beefy) machine processes ≈1400 samples/day – Includes unpacking – Higher performance if specific unpacking happens first • Scalable: VxClass routinely runs on a 4-machine cluster – Scaling to 20-25 machines should be possible
  • 25. Limitations • Heavy obfuscation of control flow breaks classification and leads to empty signatures • Virtualizing packers • Unpacking only works on 32-bit Windows – No Linux/Mac OS X/Mobile unpacking – 64-bit support is in the works • Using the generated signatures in a AV product requires good unpacking capability signatures are generated post-unpacking But: Manual intervention possible (upload IDBs)
  • 26. Questions? http://zynamics.com/ http://blog.zynamics.com/ christian.blichmann@zynamics.com