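[Figure: The software similarity problem. A birthmark is extracted from Program p and from Program q, and the two birthmarks are compared ("Similar?"). The outcome is either "MATCH!" or "Different".]
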
[Figure: A control flow graph, its structured form, and its string representation.]

Control flow graph: nodes L_0, L_1, L_2, L_3, L_4, L_5, L_6 and L_7, with some edges labelled "true".

Structured form:

    proc(){
    L_0:
      while (v1 || v2) {
    L_1:
        if (v3) {
    L_2:
        } else {
    L_4:
        }
    L_5:
      }
    L_7:
      return;
    }

String representation: W|IEH}R


$$d_1(p, q) = \lVert p - q \rVert_1 = \sum_{i=1}^{n} \lvert p_i - q_i \rvert$$
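
As a rough illustration of the distance d_1 above, the following Python sketch builds feature vectors and takes their Manhattan distance. It assumes the features are counts of q-grams (substrings of length q) taken from a string representation such as W|IEH}R. The function names `qgram_vector` and `manhattan`, the choice q = 2, and the second input string are hypothetical and chosen only for the example.

```python
from collections import Counter


def qgram_vector(s: str, q: int = 2) -> Counter:
    """Count every substring of length q in s (assumed feature extraction)."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def manhattan(p: Counter, q: Counter) -> int:
    """d_1(p, q): sum of absolute count differences over all features."""
    # A Counter returns 0 for a missing key, so features that occur in
    # only one of the two vectors are handled without special cases.
    return sum(abs(p[k] - q[k]) for k in p.keys() | q.keys())


a = qgram_vector("W|IEH}R")   # string representation from the slide
b = qgram_vector("W|IEH}")    # hypothetical variant without the final R
print(manhattan(a, b))        # the two vectors differ only in the "}R" 2-gram
```

Sparse counters are used here so that only q-grams that actually occur take up space; a fixed-length array indexed by q-gram would work equally well.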

$$R = \left\{\, r \in D \;\middle|\; 1 - \frac{d(r, q)}{|q|} \ge t \,\right\}$$

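
The similarity search above can be sketched as a linear scan that keeps every database entry whose normalised similarity to the query reaches the threshold t. This is only a sketch under stated assumptions: |q| is taken to be the sum of the query's feature counts, vectors are plain dictionaries, and the sample names and values in `database` are invented for illustration. A real system would likely use an index rather than scanning every entry.

```python
def manhattan(p: dict, q: dict) -> int:
    """Manhattan distance between two sparse feature vectors."""
    return sum(abs(p.get(k, 0) - q.get(k, 0)) for k in set(p) | set(q))


def similarity_search(database: dict, query: dict, t: float) -> list:
    """Return names r in D with 1 - d(r, q) / |q| >= t."""
    q_size = sum(query.values())  # assumption: |q| is the L1 norm of the query
    if q_size == 0:
        raise ValueError("query vector must not be empty")
    return [
        name
        for name, vec in database.items()
        if 1 - manhattan(vec, query) / q_size >= t
    ]


# Hypothetical database of q-gram count vectors.
database = {
    "sample_a": {"W|": 1, "|I": 1, "IE": 1},
    "sample_b": {"W|": 1, "|I": 1},
    "sample_c": {"R}": 3},
}
query = {"W|": 1, "|I": 1, "IE": 1, "EH": 1}
print(similarity_search(database, query, t=0.6))
```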
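$$M_1 = S(P_1), \qquad M_2 = S(P_2)$$

$$M_1' = \{a_i \in M_1\} \cup \{b_j\} : 1 + |M_1| \le j \le |M_2|$$

$$M_2' = \{a_i \in M_2\} \cup \{b_j\} : 1 + |M_2| \le j \le |M_1|$$

$$C : M_1' \times M_2' \to \mathbb{R}$$

$$C(a, b) = \begin{cases} |a|, & \text{if } a \in M_1,\ b \notin M_2 \\ |b|, & \text{if } b \in M_2,\ a \notin M_1 \\ ed(a, b), & \text{if } a \in M_1,\ b \in M_2 \end{cases}$$

Find a bijection $f : M_1' \to M_2'$ such that the distance $d$ is minimized:

$$d = \sum_{a \in M_1'} C(a, f(a))$$
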
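
A small Python sketch of this minimum-cost matching follows. It assumes each set holds strings, that ed is the Levenshtein edit distance, and that the smaller set is padded with dummy elements (represented here as `None`) so both sets have the same size. The bijection is found by brute force over all permutations, which is only practical for a handful of elements; a polynomial-time assignment solver such as the Hungarian algorithm would replace it in practice. All function names and the example strings are hypothetical.

```python
from itertools import permutations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one table row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or keep
        prev = cur
    return prev[-1]


def cost(a, b) -> int:
    """C(a, b): string length against a dummy, edit distance otherwise."""
    if a is None and b is None:
        return 0
    if b is None:
        return len(a)
    if a is None:
        return len(b)
    return edit_distance(a, b)


def set_distance(m1, m2) -> int:
    """Minimum total cost over all bijections between the padded sets."""
    n = max(len(m1), len(m2))
    p1 = list(m1) + [None] * (n - len(m1))  # pad the smaller set with dummies
    p2 = list(m2) + [None] * (n - len(m2))
    return min(sum(cost(a, b) for a, b in zip(p1, perm))
               for perm in permutations(p2))


m1 = ["W|IEH}R", "IEH}R"]       # hypothetical procedure strings of program 1
m2 = ["W|IEH}", "IEH}R", "R"]   # hypothetical procedure strings of program 2
print(set_distance(m1, m2))
```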
$$\left\{\, p : p \in E,\ 1 - \frac{d(p, q)}{|q|} \ge t,\ d(p, q) \le |q| \,\right\}$$

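[Figure: The Malwise malware classification system. Flowchart with the following elements: inputs "Unknown Sample" and "Samples From Honeypots"; decision "Packed" (Yes/No); "Emulate" and decision "End of Unpacking?" (Yes/No), grouped under "Dynamic Analysis"; "Classify", grouped under "Static Analysis"; outcomes "Malicious" and "Non Malicious"; decision "From Honeypot?" leading to "New Signature"; and the "Malware Signature Database", which receives each new signature.]
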
False Positives

Similarity   K-Subgraphs   Q-Grams
0.0              1302161   2334251
0.1               463170    413667
0.2               356345     40055
0.3               285202      7899
0.4               200326      3790
0.5               129790       327
0.6                46320        11
0.7                10784         0
0.8                 5883         0
0.9                   19         0
1.0                    0         0

Malware Detection Rates

Classification Algorithm       Klez   Netsky   Roron   Frethem
Maximum                          36       49      81       289
Exact                            20       29      17       139
Heuristic Approximate            20       27      43       144
Q-Grams                          20       31      79       226
Optimal Distance                 22       46      73       220
Q-Grams + Optimal Distance       20       43      73       217

False Positives with 10,000 Malware

Classification Algorithm       False Positives   FP Percentage
Q-Grams                                     10            0.62
Q-Grams + Optimal Distance                   7            0.43

Pairwise similarity between the variants ao, b, d, e, g, k, m, q and a under each matching method ("-" marks the diagonal).

Exact Matching

      ao     b      d      e      g      k      m      q      a
ao    -      0.44   0.28   0.27   0.28   0.55   0.44   0.44   0.47
b     0.44   -      0.27   0.27   0.27   0.51   1.00   1.00   0.58
d     0.28   0.27   -      0.48   0.56   0.27   0.27   0.27   0.27
e     0.27   0.27   0.48   -      0.59   0.27   0.27   0.27   0.27
g     0.28   0.27   0.56   0.59   -      0.27   0.27   0.27   0.27
k     0.55   0.51   0.27   0.27   0.27   -      0.51   0.51   0.75
m     0.44   1.00   0.27   0.27   0.27   0.51   -      1.00   0.58
q     0.44   1.00   0.27   0.27   0.27   0.51   1.00   -      0.58
a     0.47   0.58   0.27   0.27   0.27   0.75   0.58   0.58   -

Heuristic Approximate Matching

      ao     b      d      e      g      k      m      q      a
ao    -      0.70   0.28   0.28   0.27   0.75   0.70   0.70   0.75
b     0.74   -      0.31   0.34   0.33   0.82   1.00   1.00   0.87
d     0.28   0.29   -      0.50   0.74   0.29   0.29   0.29   0.29
e     0.31   0.34   0.50   -      0.64   0.32   0.34   0.34   0.33
g     0.27   0.33   0.74   0.64   -      0.29   0.33   0.33   0.30
k     0.75   0.82   0.29   0.30   0.29   -      0.82   0.82   0.96
m     0.74   1.00   0.31   0.34   0.33   0.82   -      1.00   0.87
q     0.74   1.00   0.31   0.34   0.33   0.82   1.00   -      0.87
a     0.75   0.87   0.30   0.31   0.30   0.96   0.87   0.87   -

Q-Grams

      ao     b      d      e      g      k      m      q      a
ao    -      0.86   0.53   0.64   0.59   0.86   0.86   0.86   0.86
b     0.88   -      0.66   0.76   0.71   0.97   1.00   1.00   0.97
d     0.65   0.72   -      0.88   0.93   0.73   0.72   0.72   0.73
e     0.72   0.80   0.87   -      0.93   0.80   0.80   0.80   0.80
g     0.69   0.77   0.93   0.93   -      0.77   0.77   0.77   0.77
k     0.88   0.97   0.67   0.77   0.72   -      0.97   0.97   0.99
m     0.88   1.00   0.66   0.76   0.71   0.97   -      1.00   0.97
q     0.88   1.00   0.66   0.76   0.71   0.97   1.00   -      0.97
a     0.87   0.97   0.67   0.77   0.72   0.99   0.97   0.97   -

Optimal Distance Using Assignment Problem

      ao     b      d      e      g      k      m      q      a
ao    -      0.86   0.49   0.54   0.50   0.87   0.86   0.86   0.86
b     0.87   -      0.57   0.63   0.62   0.96   1.00   1.00   0.96
d     0.61   0.64   -      0.85   0.91   0.64   0.64   0.64   0.64
e     0.64   0.69   0.85   -      0.90   0.68   0.69   0.69   0.68
g     0.62   0.68   0.91   0.91   -      0.68   0.68   0.68   0.68
k     0.88   0.96   0.58   0.62   0.61   -      0.96   0.96   0.99
m     0.87   1.00   0.57   0.63   0.62   0.96   -      1.00   0.96
q     0.87   1.00   0.57   0.63   0.62   0.96   1.00   -      0.96
a     0.87   0.96   0.58   0.62   0.61   0.99   0.96   0.96   -

Benign and Malicious Processing Time

% Samples   Benign Time (s)   Malware Time (s)
10                     0.02               0.16
20                     0.02               0.28
30                     0.03               0.30
40                     0.03               0.36
50                     0.06               0.84
60                     0.09               0.94
70                     0.13               0.97
80                     0.25               1.03
90                     0.56               1.31
100                    8.06             585.16

Malware Variant Detection Using Similarity Search over Sets of Control Flow Graphs
