Malware Variant Detection Using Similarity Search over Sets of Control Flow Graphs
Published in: Technology, Business

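  5. Figure: the software similarity problem. A birthmark is extracted from program p and from program q, and the two birthmarks are compared to decide whether the programs are similar (match) or different.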
  7. Figure: a control flow graph, its structured form, and its string representation.
     Control flow graph: nodes L_0 to L_7, with branch edges labelled "true".
     Structured form: proc(){ L_0: while (v1 || v2) { L_1: if (v3) { L_2: } else { L_4: } L_5: } L_7: return; }
     String representation: W|IEH}R
  10. $d_1(p, q) = \lVert p - q \rVert_1 = \sum_{i=1}^{n} \lvert p_i - q_i \rvert$
  11. $R = \{\, r \in D \;:\; \frac{d(r, q)}{|q|} \le 1 - t \,\}$
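  13. $M_1 = S(P_1)$, $M_2 = S(P_2)$
      $M_1' = \{a_i \in M_1\} \cup \{b_j : 1 + |M_1| \le j \le |M_2|\}$
      $M_2' = \{a_i \in M_2\} \cup \{b_j : 1 + |M_2| \le j \le |M_1|\}$
      $C : M_1' \times M_2' \to \mathbb{R}$
      $C(a, b) = \begin{cases} |a|, & \text{if } a \in M_1,\ b \notin M_2 \\ |b|, & \text{if } b \in M_2,\ a \notin M_1 \\ ed(a, b), & \text{if } a \in M_1,\ b \in M_2 \end{cases}$
      Find a bijection $f : M_1' \to M_2'$ such that the distance $d$ is minimized:
      $d = \sum_{a \in M_1'} C(a, f(a))$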
  14. d ( p, q ) p: p E, | 1 t , d ( p, q ) q q
  16. Figure: the Malwise malware classification system. Flowchart elements: Samples From Honeypots; Unknown Sample; Malware Database; New Signature; From Honeypot?; Packed; Dynamic Analysis; Emulate; End of Unpacking?; Static Analysis; Classify; Non Malicious; Malicious.
  18. Malware Detection Rates

      Classification Algorithm     Klez  Netsky  Roron  Frethem
      Maximum                        36      49     81      289
      Exact                          20      29     17      139
      Heuristic Approximate          20      27     43      144
      Q-Grams                        20      31     79      226
      Optimal Distance               22      46     73      220
      Q-Grams + Optimal Distance     20      43     73      217

      False Positives

      Similarity  K-Subgraphs  Q-Grams
      0.0             1302161  2334251
      0.1              463170   413667
      0.2              356345    40055
      0.3              285202     7899
      0.4              200326     3790
      0.5              129790      327
      0.6               46320       11
      0.7               10784        0
      0.8                5883        0
      0.9                  19        0
      1.0                   0        0

      False Positives with 10,000 Malware

      Classification Algorithm     False Positives  FP Percentage
      Q-Grams                                   10           0.62
      Q-Grams + Optimal Distance                 7           0.43
  19. Exact Matching

            ao     b     d     e     g     k     m     q     a
      ao     -  0.44  0.28  0.27  0.28  0.55  0.44  0.44  0.47
      b   0.44     -  0.27  0.27  0.27  0.51  1.00  1.00  0.58
      d   0.28  0.27     -  0.48  0.56  0.27  0.27  0.27  0.27
      e   0.27  0.27  0.48     -  0.59  0.27  0.27  0.27  0.27
      g   0.28  0.27  0.56  0.59     -  0.27  0.27  0.27  0.27
      k   0.55  0.51  0.27  0.27  0.27     -  0.51  0.51  0.75
      m   0.44  1.00  0.27  0.27  0.27  0.51     -  1.00  0.58
      q   0.44  1.00  0.27  0.27  0.27  0.51  1.00     -  0.58
      a   0.47  0.58  0.27  0.27  0.27  0.75  0.58  0.58     -

      Heuristic Approximate Matching

            ao     b     d     e     g     k     m     q     a
      ao     -  0.70  0.28  0.28  0.27  0.75  0.70  0.70  0.75
      b   0.74     -  0.31  0.34  0.33  0.82  1.00  1.00  0.87
      d   0.28  0.29     -  0.50  0.74  0.29  0.29  0.29  0.29
      e   0.31  0.34  0.50     -  0.64  0.32  0.34  0.34  0.33
      g   0.27  0.33  0.74  0.64     -  0.29  0.33  0.33  0.30
      k   0.75  0.82  0.29  0.30  0.29     -  0.82  0.82  0.96
      m   0.74  1.00  0.31  0.34  0.33  0.82     -  1.00  0.87
      q   0.74  1.00  0.31  0.34  0.33  0.82  1.00     -  0.87
      a   0.75  0.87  0.30  0.31  0.30  0.96  0.87  0.87     -

      Q-Grams

            ao     b     d     e     g     k     m     q     a
      ao     -  0.86  0.53  0.64  0.59  0.86  0.86  0.86  0.86
      b   0.88     -  0.66  0.76  0.71  0.97  1.00  1.00  0.97
      d   0.65  0.72     -  0.88  0.93  0.73  0.72  0.72  0.73
      e   0.72  0.80  0.87     -  0.93  0.80  0.80  0.80  0.80
      g   0.69  0.77  0.93  0.93     -  0.77  0.77  0.77  0.77
      k   0.88  0.97  0.67  0.77  0.72     -  0.97  0.97  0.99
      m   0.88  1.00  0.66  0.76  0.71  0.97     -  1.00  0.97
      q   0.88  1.00  0.66  0.76  0.71  0.97  1.00     -  0.97
      a   0.87  0.97  0.67  0.77  0.72  0.99  0.97  0.97     -

      Optimal Distance Using Assignment Problem

            ao     b     d     e     g     k     m     q     a
      ao     -  0.86  0.49  0.54  0.50  0.87  0.86  0.86  0.86
      b   0.87     -  0.57  0.63  0.62  0.96  1.00  1.00  0.96
      d   0.61  0.64     -  0.85  0.91  0.64  0.64  0.64  0.64
      e   0.64  0.69  0.85     -  0.90  0.68  0.69  0.69  0.68
      g   0.62  0.68  0.91  0.91     -  0.68  0.68  0.68  0.68
      k   0.88  0.96  0.58  0.62  0.61     -  0.96  0.96  0.99
      m   0.87  1.00  0.57  0.63  0.62  0.96     -  1.00  0.96
      q   0.87  1.00  0.57  0.63  0.62  0.96  1.00     -  0.96
      a   0.87  0.96  0.58  0.62  0.61  0.99  0.96  0.96     -
  20. Benign and Malicious Processing Time

      % Samples  Benign Time(s)  Malware Time(s)
      10                   0.02             0.16
      20                   0.02             0.28
      30                   0.03             0.30
      40                   0.03             0.36
      50                   0.06             0.84
      60                   0.09             0.94
      70                   0.13             0.97
      80                   0.25             1.03
      90                   0.56             1.31
      100                  8.06           585.16
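
Slides 10 and 11 pair a distance d1(p, q) over feature vectors with a similarity threshold t, and the results slides name q-grams as one classification algorithm. A minimal Python sketch of that idea follows. It assumes that the feature vector is a count of q-grams taken from a control flow graph's string representation (such as W|IEH}R on slide 7), that d1 is the Manhattan distance, and that a database entry matches when its distance divided by the query's total q-gram count is at most 1 - t. The function names, the choice q = 2, and the normalisation are illustrative assumptions, not details confirmed by the slides.

```python
from collections import Counter


def qgrams(s, q=2):
    """Count every substring of length q in s (a sparse feature vector)."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def manhattan(p, q):
    """Manhattan (L1) distance between two sparse count vectors."""
    # A Counter returns 0 for a missing key, so absent features count as zero.
    return sum(abs(p[k] - q[k]) for k in p.keys() | q.keys())


def range_query(database, query, t):
    """Return the names of entries within normalised distance 1 - t of query.

    database maps a name to a q-gram Counter. Normalising by the query's
    total q-gram count is an assumption made for this sketch.
    """
    size = sum(query.values())
    if size == 0:
        return []
    return [name for name, vec in database.items()
            if manhattan(vec, query) / size <= 1 - t]


if __name__ == "__main__":
    db = {
        "same": qgrams("W|IEH}R"),
        "near": qgrams("W|IEH}"),
        "far": qgrams("IE"),
    }
    print(range_query(db, qgrams("W|IEH}R"), t=0.6))
```

A linear scan over the database keeps the sketch short; a large signature database would need an index built for nearest-neighbour or range search over vectors.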

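Slide 13 asks for a bijection f between two sets of strings that minimises the summed cost C(a, f(a)), where C is the edit distance between two real strings. The sketch below is one way to compute such a distance in Python, under stated assumptions: the smaller set is padded with dummy elements (represented here as None), pairing a string with a dummy costs the string's length, and the minimum is found by brute force over all permutations. The brute-force search is a stand-in chosen for brevity and is only practical for very small sets; an assignment-problem solver such as the Hungarian algorithm would replace it in practice. All function names are hypothetical.

```python
from itertools import permutations


def edit_distance(a, b):
    """Levenshtein distance between strings a and b (row-by-row DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def cost(a, b):
    """Pairing cost; None is a dummy element (an assumption of this sketch)."""
    if b is None:
        return len(a)
    if a is None:
        return len(b)
    return edit_distance(a, b)


def assignment_distance(m1, m2):
    """Minimum total pairing cost between two collections of strings.

    Only the smaller collection is padded, so a dummy is never paired
    with another dummy. Brute force: O(n!) permutations.
    """
    n = max(len(m1), len(m2))
    left = list(m1) + [None] * (n - len(m1))
    right = list(m2) + [None] * (n - len(m2))
    return min(sum(cost(a, b) for a, b in zip(left, perm))
               for perm in permutations(right))


if __name__ == "__main__":
    print(assignment_distance(["W|IEH}R", "IE"], ["W|IEH}R"]))
```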