Research project discussing how to identify malware variants in real-world scenarios. We discuss same-behavior function replacement and the relevance of similarity and continence metrics.
Malware Variants Identification in Practice
Motivation · Challenges & Alternatives · Experiments · Concluding Remarks
Marcus Botacin1, André Grégio1, Paulo Lício de Geus2
1Federal University of Paraná (UFPR) – {mfbotacin, gregio}@inf.ufpr.br
2University of Campinas (Unicamp) – paulo@lasca.ic.unicamp.br
SBSEG 2019
Malware Variants Identification in Practice 1 / 26 SBSeg’19
Challenge 2
Malware Embedding
Figure: Sample 1. The original sample.
Figure: Sample 2. Variant sample embedding the original one.
Challenge 2
Matching Metrics
Definition
The similarity of two malware samples, represented as sets A and B of the vertices or edges of their graphs, is defined as:

    Sim(A, B) = |A ∩ B| / |A ∪ B|    (1)
Definition
The maximum continence of two malware samples, represented as sets A and B of the vertices or edges of their graphs, is defined as:

    Cont(A, B) = max(|A ∩ B| / |B|, |B ∩ A| / |A|)    (2)
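Both metrics map directly onto Python set operations. A minimal sketch, assuming each sample is represented as a set of callgraph edges (the function names below are made up for illustration):

```python
def similarity(a, b):
    """Jaccard similarity (Eq. 1): |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    if not a | b:
        return 0.0
    return len(a & b) / len(a | b)

def max_continence(a, b):
    """Maximum continence (Eq. 2): max(|A ∩ B| / |B|, |B ∩ A| / |A|)."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    inter = len(a & b)
    return max(inter / len(b), inter / len(a))

# Hypothetical callgraph edges: a variant that embeds the original sample.
original = {("main", "connect"), ("main", "encrypt"), ("connect", "send")}
variant = original | {("dropper", "main"), ("dropper", "unpack")}

print(similarity(original, variant))      # 0.6 — diluted by the added code
print(max_continence(original, variant))  # 1.0 — the embedding is detected
```

Note how the maximum continence flags the embedding case that plain Jaccard similarity penalizes.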
Challenge 2
Continence Results
Table: Continence of Sample 1 in Sample 2.
CG    A      B      C
I     0.66   0.52   0.64
J     0.75   0.49   0.50
K     0.42   0.80   0.44

Table: Continence of Sample 2 in Sample 1.
CG    A      B      C
I     0.57   0.56   0.43
J     0.33   0.51   0.44
K     0.76   0.65   0.44

Table: Maximum continence of Sample 1 and Sample 2.
CG    A      B      C
I     0.66   0.56   0.64
J     0.75   0.51   0.50
K     0.76   0.80   0.44
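The third table is the element-wise maximum of the first two, per the maximum-continence definition. A quick check in Python, with the table values transcribed from above:

```python
# Continence of Sample 1 in Sample 2 (rows: callgraphs I/J/K, cols: A/B/C).
cont_1_in_2 = {"I": [0.66, 0.52, 0.64], "J": [0.75, 0.49, 0.50], "K": [0.42, 0.80, 0.44]}
# Continence of Sample 2 in Sample 1.
cont_2_in_1 = {"I": [0.57, 0.56, 0.43], "J": [0.33, 0.51, 0.44], "K": [0.76, 0.65, 0.44]}

# Element-wise maximum of the two directional continence tables.
max_cont = {cg: [max(x, y) for x, y in zip(cont_1_in_2[cg], cont_2_in_1[cg])]
            for cg in cont_1_in_2}
print(max_cont["K"])  # [0.76, 0.8, 0.44] — matches the third table's K row
```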
Evaluation
The Whitelisting Effect.
Figure: Evaluating the whitelisting effect (Mimail.a × Mimail.c, Mimail.c × Mimail.e, Mimail.f × Mimail.g; similarity 0–0.8, with vs. without whitelist). Similarity scores are higher when using the whitelist-based approach.
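One way to realize a whitelist-based approach is to drop common library functions from both samples before computing the similarity, so the score is driven by payload code rather than runtime boilerplate. A minimal sketch, with a hypothetical whitelist and made-up function sets:

```python
# Hypothetical whitelist of common library/runtime functions that inflate
# the union without carrying malicious behavior.
WHITELIST = {"memcpy", "strlen", "malloc", "free", "printf"}

def jaccard(a, b):
    """Plain Jaccard similarity over sets of function names."""
    return len(a & b) / len(a | b) if a | b else 0.0

def whitelist_jaccard(a, b, whitelist=WHITELIST):
    """Jaccard similarity after removing whitelisted library functions."""
    return jaccard(a - whitelist, b - whitelist)

# Made-up function sets: the two variants share their payload functions
# but differ in incidental library calls.
sample_a = {"memcpy", "strlen", "connect", "send", "RegSetValueA"}
sample_b = {"malloc", "printf", "connect", "send", "RegSetValueA"}

print(round(jaccard(sample_a, sample_b), 2))  # 0.43 — diluted by library noise
print(whitelist_jaccard(sample_a, sample_b))  # 1.0 — payload overlap exposed
```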
Evaluation
Advantages of the Behavioral model.
Figure: Function-based vs. behavior-based approaches (Mimail.a × Klez.a, Mimail.c × Klez.a, Mimail.k × Klez.g; similarity 0–0.5). Scores are higher when considering behavioral patterns.
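The behavioral model's advantage comes from mapping same-behavior function replacements onto a common label before comparing, so variants that swap one API for an equivalent one still match. A minimal sketch, assuming a hypothetical function-to-behavior mapping:

```python
# Hypothetical mapping from individual API functions to coarse behavior
# classes; a variant may replace a function with a same-behavior equivalent.
BEHAVIOR = {
    "fopen": "file_access", "CreateFileA": "file_access",
    "send": "network", "WSASend": "network",
    "RegSetValueA": "registry", "RegSetValueExA": "registry",
}

def jaccard(a, b):
    """Jaccard similarity over two sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

variant_1 = {"fopen", "send", "RegSetValueA"}
variant_2 = {"CreateFileA", "WSASend", "RegSetValueExA"}

func_score = jaccard(variant_1, variant_2)  # 0.0 — no function names shared
beh_score = jaccard({BEHAVIOR[f] for f in variant_1},
                    {BEHAVIOR[f] for f in variant_2})  # 1.0 — same behaviors
```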
Evaluation
Evaluating Metrics.
Figure: Proposed metric vs. the usual metric (Mimail.a × Mimail.c, Mimail.c × Mimail.e, Mimail.f × Mimail.g; scores 0–1). Scores are higher when using the proposed metric.
Evaluation
Solutions Comparison.
Figure: Mimail sample similarity (H × Q, I × Q, J × Q, L × Q, M × Q) for Solution #1, Solution #2, and our solution. Our solution's scores are higher than those of the other solutions.
Evaluation
Domain Transformation and Similarity Measures.
Figure: Threshold evaluation (50%–90%) over the function-based (F.) and behavior-based (B.) Mimail, Klez, and cross datasets. The threshold should be higher than 80% to properly label the cross dataset.
Evaluation
Real-World Experiments (1/2)
Table: Identified variants among unknown, wild-collected samples.
Family   Sample   Hash                               Label
1        A        c2ef1aabb15c979e932f5ea1d214cbeb   Generic vb.OBY
1        B        747b9fe5819a76529abc161bb449b8eb   Generic vb.OBO
1        C        39a04a11234d931bfa60d68ba8df9021   Generic vb.OBL
2        A        96d13246971e4368b9ed90c6f996a884   Atros4.CENI
2        B        e23588078ba6a5f5ca1c961a8336ec08   Atros4.CENI
2        C        31a2b6adc781328cb1d77e5debb318ff   Atros4.CENI
Evaluation
Real-World Experiments (2/2)
Figure: Study case: variant identification (pairs 1.a × 1.b, 1.a × 1.c, 1.b × 1.c, 2.a × 2.b, 2.a × 2.c, 2.b × 2.c; ssdeep vs. our solution, with coverage shown; scores 10%–100%). Our approach outperforms the others even in low-coverage scenarios.