# Malware Variant Detection Using Similarity Search over Sets of Control Flow Graphs

## by Silvio Cesare

## Malware Variant Detection Using Similarity Search over Sets of Control Flow Graphs: Presentation Transcript

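• Figure: The software similarity problem. Programs p and q are each reduced to a birthmark, and the two birthmarks are compared: if they are similar, the programs MATCH; otherwise they are different.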
• Figure: A control flow graph, its structured form, and its string representation. The graph has nodes L_0 to L_7 with edges labeled "true". Structured form: `proc() { L_0: while (v1 || v2) { L_1: if (v3) { L_2: } else { L_4: } L_5: } L_7: return; }`. String representation: `W|IEH}R`.
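• Manhattan ($L_1$) distance between feature vectors $p$ and $q$ of dimension $n$: $d_1(p, q) = \lVert p - q \rVert_1 = \sum_{i=1}^{n} \lvert p_i - q_i \rvert$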
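The Manhattan distance can be illustrated on q-gram count vectors built from control flow graph strings. This is a minimal sketch, not the presenter's implementation: the helper names `qgram_vector` and `manhattan`, the choice `q=2`, and the second string `W|IH}R` are all assumptions made for illustration.

```python
from collections import Counter

def qgram_vector(s: str, q: int = 2) -> Counter:
    """Count every substring of length q in s (hypothetical helper)."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))

def manhattan(p: Counter, q: Counter) -> int:
    """L1 distance: sum of |p_i - q_i| over features present in either vector."""
    return sum(abs(p[k] - q[k]) for k in p.keys() | q.keys())

a = qgram_vector("W|IEH}R")   # string taken from the control flow graph slide
b = qgram_vector("W|IH}R")    # assumed variant with the 'E' removed
print(manhattan(a, b))        # 3: the q-grams IE, EH and IH each differ by one
```

Storing the vectors as sparse counters keeps the distance computation proportional to the number of distinct q-grams rather than to the size of the full feature space.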
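• Range query over a database $D$ for a query $q$ and similarity threshold $t$: $R = \{\, r \in D \mid 1 - \frac{d(r, q)}{\lvert q \rvert} \ge t \,\}$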
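A threshold-based range query of this kind can be sketched as a linear scan. The sketch rests on several assumptions that the transcript does not confirm: similarity is taken as one minus the distance divided by the size of the query, the distance is the $L_1$ distance over sparse count vectors, and $\lvert q \rvert$ is the sum of the query's counts. The names `l1`, `range_query`, and the sample data are hypothetical. A real system would likely use an index rather than scanning every entry.

```python
from collections import Counter

def l1(p: Counter, q: Counter) -> int:
    """L1 distance between two sparse count vectors."""
    return sum(abs(p[k] - q[k]) for k in p.keys() | q.keys())

def range_query(database: dict, query: Counter, t: float) -> list:
    """Return names of entries r with 1 - d(r, query) / |query| >= t.

    Assumption: |query| is the sum of the query's counts.
    """
    size = sum(query.values())
    if size == 0:
        return []  # an empty query has no defined similarity here
    return [name for name, r in database.items()
            if 1 - l1(r, query) / size >= t]

# Made-up feature vectors, for illustration only.
database = {
    "sample_a": Counter({"W|": 1, "|I": 1, "IE": 1}),
    "sample_b": Counter({"W|": 1, "IH": 1, "H}": 1}),
    "sample_c": Counter({"R": 3}),
}
query = Counter({"W|": 1, "|I": 1, "IE": 1, "EH": 1})
hits = range_query(database, query, t=0.6)
print(hits)  # ['sample_a']: distance 1 from the query, similarity 0.75
```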
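• Distance between two sets of control flow graph strings as an assignment problem. Let $M_1 = S(P_1)$ and $M_2 = S(P_2)$, and pad each set with dummy elements $b_j$:

  $M_1' = \{a_i \in M_1\} \cup \{b_j\} : 1 + \lvert M_1 \rvert \le j \le \lvert M_2 \rvert$

  $M_2' = \{a_i \in M_2\} \cup \{b_j\} : 1 + \lvert M_2 \rvert \le j \le \lvert M_1 \rvert$

  The cost function $C : M_1' \times M_2' \to \mathbb{R}$ is

  $C(a, b) = \begin{cases} \lvert a \rvert, & \text{if } a \in M_1,\ b \notin M_2 \\ \lvert b \rvert, & \text{if } b \in M_2,\ a \notin M_1 \\ ed(a, b), & \text{if } a \in M_1,\ b \in M_2 \end{cases}$

  Find a bijection $f : M_1' \to M_2'$ such that the distance $d$ is minimized:

  $d = \sum_{a \in M_1'} C(a, f(a))$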
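The minimum-cost bijection can be illustrated with a small brute-force sketch. It assumes that `ed` is the Levenshtein edit distance and represents dummy elements as `None`. It tries every permutation, so it only suits tiny sets; a practical system would use a polynomial-time assignment solver such as the Hungarian algorithm. The function names and the example strings are hypothetical.

```python
from itertools import permutations

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using a rolling dynamic-programming row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]

def cost(a, b) -> int:
    """Pairing with a dummy (None) costs the length of the real string."""
    if a is None and b is None:
        return 0
    if b is None:
        return len(a)
    if a is None:
        return len(b)
    return edit_distance(a, b)

def optimal_distance(m1, m2) -> int:
    """Minimum total cost over all bijections between the padded sets."""
    n = max(len(m1), len(m2))
    p1 = list(m1) + [None] * (n - len(m1))
    p2 = list(m2) + [None] * (n - len(m2))
    return min(sum(cost(a, b) for a, b in zip(p1, perm))
               for perm in permutations(p2))

m1 = ["W|IEH}R", "IR"]        # assumed procedure strings of program 1
m2 = ["W|IH}R", "IR", "WR"]   # assumed procedure strings of program 2
print(optimal_distance(m1, m2))  # 3: 1 (edit) + 0 (exact) + 2 (unmatched "WR")
```

Padding the smaller set means every procedure is accounted for: an unmatched procedure contributes its full length to the distance instead of being silently ignored.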
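• Similarity query over a candidate set $E$ with threshold $t$, expressed in terms of the distance $d(p, q)$ relative to $\lvert q \rvert$ for each $p \in E$ (the full set-builder expression is not recoverable from the transcript).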
• Figure: The Malwise malware classification system. Flowchart labels: Unknown Sample; New Sample From Honeypots; Packed? (Yes: Emulate via Dynamic Analysis until End of Unpacking; No: Static Analysis); Classify against the Malware Signature Database; outcomes Non Malicious or Malicious; New Signature.
• Malware Detection Rates

| Classification Algorithm | Klez | Netsky | Roron | Frethem |
|---|---|---|---|---|
| Maximum | 36 | 49 | 81 | 289 |
| Exact | 20 | 29 | 17 | 139 |
| Heuristic Approximate | 20 | 27 | 43 | 144 |
| Q-Grams | 20 | 31 | 79 | 226 |
| Optimal Distance | 22 | 46 | 73 | 220 |
| Q-Grams + Optimal Distance | 20 | 43 | 73 | 217 |

False Positives

| Similarity | K-Subgraphs | Q-Grams |
|---|---|---|
| 0.0 | 1302161 | 2334251 |
| 0.1 | 463170 | 413667 |
| 0.2 | 356345 | 40055 |
| 0.3 | 285202 | 7899 |
| 0.4 | 200326 | 3790 |
| 0.5 | 129790 | 327 |
| 0.6 | 46320 | 11 |
| 0.7 | 10784 | 0 |
| 0.8 | 5883 | 0 |
| 0.9 | 19 | 0 |
| 1.0 | 0 | 0 |

False Positives with 10,000 Malware

| Classification Algorithm | False Positives | FP Percentage |
|---|---|---|
| Q-Grams | 10 | 0.62 |
| Q-Grams + Optimal Distance | 7 | 0.43 |
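• Pairwise similarity between the malware variants ao, b, d, e, g, k, m, q, and a under each matching algorithm (diagonal entries omitted).

Exact Matching

| | ao | b | d | e | g | k | m | q | a |
|---|---|---|---|---|---|---|---|---|---|
| ao | | 0.44 | 0.28 | 0.27 | 0.28 | 0.55 | 0.44 | 0.44 | 0.47 |
| b | 0.44 | | 0.27 | 0.27 | 0.27 | 0.51 | 1.00 | 1.00 | 0.58 |
| d | 0.28 | 0.27 | | 0.48 | 0.56 | 0.27 | 0.27 | 0.27 | 0.27 |
| e | 0.27 | 0.27 | 0.48 | | 0.59 | 0.27 | 0.27 | 0.27 | 0.27 |
| g | 0.28 | 0.27 | 0.56 | 0.59 | | 0.27 | 0.27 | 0.27 | 0.27 |
| k | 0.55 | 0.51 | 0.27 | 0.27 | 0.27 | | 0.51 | 0.51 | 0.75 |
| m | 0.44 | 1.00 | 0.27 | 0.27 | 0.27 | 0.51 | | 1.00 | 0.58 |
| q | 0.44 | 1.00 | 0.27 | 0.27 | 0.27 | 0.51 | 1.00 | | 0.58 |
| a | 0.47 | 0.58 | 0.27 | 0.27 | 0.27 | 0.75 | 0.58 | 0.58 | |

Heuristic Approximate Matching

| | ao | b | d | e | g | k | m | q | a |
|---|---|---|---|---|---|---|---|---|---|
| ao | | 0.70 | 0.28 | 0.28 | 0.27 | 0.75 | 0.70 | 0.70 | 0.75 |
| b | 0.74 | | 0.31 | 0.34 | 0.33 | 0.82 | 1.00 | 1.00 | 0.87 |
| d | 0.28 | 0.29 | | 0.50 | 0.74 | 0.29 | 0.29 | 0.29 | 0.29 |
| e | 0.31 | 0.34 | 0.50 | | 0.64 | 0.32 | 0.34 | 0.34 | 0.33 |
| g | 0.27 | 0.33 | 0.74 | 0.64 | | 0.29 | 0.33 | 0.33 | 0.30 |
| k | 0.75 | 0.82 | 0.29 | 0.30 | 0.29 | | 0.82 | 0.82 | 0.96 |
| m | 0.74 | 1.00 | 0.31 | 0.34 | 0.33 | 0.82 | | 1.00 | 0.87 |
| q | 0.74 | 1.00 | 0.31 | 0.34 | 0.33 | 0.82 | 1.00 | | 0.87 |
| a | 0.75 | 0.87 | 0.30 | 0.31 | 0.30 | 0.96 | 0.87 | 0.87 | |

Q-Grams

| | ao | b | d | e | g | k | m | q | a |
|---|---|---|---|---|---|---|---|---|---|
| ao | | 0.86 | 0.53 | 0.64 | 0.59 | 0.86 | 0.86 | 0.86 | 0.86 |
| b | 0.88 | | 0.66 | 0.76 | 0.71 | 0.97 | 1.00 | 1.00 | 0.97 |
| d | 0.65 | 0.72 | | 0.88 | 0.93 | 0.73 | 0.72 | 0.72 | 0.73 |
| e | 0.72 | 0.80 | 0.87 | | 0.93 | 0.80 | 0.80 | 0.80 | 0.80 |
| g | 0.69 | 0.77 | 0.93 | 0.93 | | 0.77 | 0.77 | 0.77 | 0.77 |
| k | 0.88 | 0.97 | 0.67 | 0.77 | 0.72 | | 0.97 | 0.97 | 0.99 |
| m | 0.88 | 1.00 | 0.66 | 0.76 | 0.71 | 0.97 | | 1.00 | 0.97 |
| q | 0.88 | 1.00 | 0.66 | 0.76 | 0.71 | 0.97 | 1.00 | | 0.97 |
| a | 0.87 | 0.97 | 0.67 | 0.77 | 0.72 | 0.99 | 0.97 | 0.97 | |

Optimal Distance Using Assignment Problem

| | ao | b | d | e | g | k | m | q | a |
|---|---|---|---|---|---|---|---|---|---|
| ao | | 0.86 | 0.49 | 0.54 | 0.50 | 0.87 | 0.86 | 0.86 | 0.86 |
| b | 0.87 | | 0.57 | 0.63 | 0.62 | 0.96 | 1.00 | 1.00 | 0.96 |
| d | 0.61 | 0.64 | | 0.85 | 0.91 | 0.64 | 0.64 | 0.64 | 0.64 |
| e | 0.64 | 0.69 | 0.85 | | 0.90 | 0.68 | 0.69 | 0.69 | 0.68 |
| g | 0.62 | 0.68 | 0.91 | 0.91 | | 0.68 | 0.68 | 0.68 | 0.68 |
| k | 0.88 | 0.96 | 0.58 | 0.62 | 0.61 | | 0.96 | 0.96 | 0.99 |
| m | 0.87 | 1.00 | 0.57 | 0.63 | 0.62 | 0.96 | | 1.00 | 0.96 |
| q | 0.87 | 1.00 | 0.57 | 0.63 | 0.62 | 0.96 | 1.00 | | 0.96 |
| a | 0.87 | 0.96 | 0.58 | 0.62 | 0.61 | 0.99 | 0.96 | 0.96 | |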
• Benign and Malicious Processing Time

| % Samples | Benign Time (s) | Malware Time (s) |
|---|---|---|
| 10 | 0.02 | 0.16 |
| 20 | 0.02 | 0.28 |
| 30 | 0.03 | 0.30 |
| 40 | 0.03 | 0.36 |
| 50 | 0.06 | 0.84 |
| 60 | 0.09 | 0.94 |
| 70 | 0.13 | 0.97 |
| 80 | 0.25 | 1.03 |
| 90 | 0.56 | 1.31 |
| 100 | 8.06 | 585.16 |