Csongor Tamás - Examples of Locality Sensitive Hashing & their Usage for Malware Classification

© 2018 CrySyS Lab, BME
EXAMPLES OF LOCALITY SENSITIVE HASHING AND THEIR USAGE FOR MALWARE CLASSIFICATION
Csongor Tamás
Budapest University of Technology and Economics
csotam@crysys.hu
Supervisors: Dr. Boldizsár Bencsáth, Dr. Levente Buttyán
www.crysys.hu

|
Problem statement
• We need a feed of fresh ransomware!
• Fresh: from the last week or month
• But how?
NotPetya

|
Solution Concept
Old ransomware + feed of new files → Method → Fresh ransomware

|
Solution Concept
Search corpus + feed of new files → Method → Similar samples

|
Solution Concept
YARA rules + feed of new files → YARA rule matching → Fresh samples
– Drawbacks: poor automatic rule generation; slow matching (0.015 s)

|
Solution Concept
Method → Fresh ransomware

|
Solution Concept
Method → Fresh ransomware
+ malware database

|
Solution Concept
Method: LSH → Fresh ransomware
+ malware database

|
Locality Sensitive Hashing
Examples for Locality Sensitive Hashing and their usage for malware similarity checking 10
• What is Locality Sensitive Hashing?
– Similar data → "similar" hash
– "aims to maximize the probability of a collision for similar items"
– A distance can be calculated between two digests (hashes)
– Similar files (hashes) are "close" to each other; dissimilar ones are "far"
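The slide describes LSH only in general terms. As an illustration (not one of the three hashes discussed in this talk), MinHash over 4-byte substrings is a classic LSH scheme: the probability that two signatures agree in one slot equals the Jaccard similarity of the inputs' substring sets. The sketch contrasts it with SHA-256, where a one-character change yields an unrelated digest. The n-gram size, the number of hash functions and the use of keyed BLAKE2b as the hash family are assumptions made for the example.

```python
import hashlib


def shingles(data: bytes, n: int = 4) -> set[bytes]:
    """All overlapping n-byte substrings (the whole input if it is shorter than n)."""
    grams = {data[i:i + n] for i in range(len(data) - n + 1)}
    return grams or {data}


def minhash(data: bytes, num_hashes: int = 64, n: int = 4) -> list[int]:
    """MinHash signature: for each keyed hash function, the smallest hash over all shingles."""
    grams = shingles(data, n)
    signature = []
    for seed in range(num_hashes):
        key = seed.to_bytes(8, "little")  # a different BLAKE2b key per hash function
        signature.append(min(
            int.from_bytes(hashlib.blake2b(g, digest_size=8, key=key).digest(), "little")
            for g in grams
        ))
    return signature


def estimated_similarity(sig_a: list[int], sig_b: list[int]) -> float:
    """Fraction of agreeing slots: an estimate of the Jaccard similarity of the shingle sets."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)
```

For `b"For hands of gold are always cold,"` and `b"For lands of gold are always cold,"` the SHA-256 digests have nothing in common, while the MinHash estimate should land near the exact Jaccard similarity of the two 4-gram sets (27/35, roughly 0.77).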

|
11
• SSDEEP
– Context Triggered Piecewise Hashing

|
12
• SSDEEP
– Context Triggered Piecewise Hashing
• SDHASH
– Statistically improbable features
• TLSH
– Trend Micro Locality Sensitive Hash
– 5-grams → statistics → hash
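A heavily simplified TLSH-style digest illustrates "5-grams → statistics → hash": it slides a 5-byte window over the input, maps six byte triplets from each window into 128 buckets, and encodes each bucket count as 2 bits relative to the quartiles of all counts. This is a sketch, not Trend Micro's implementation: real TLSH maps triplets with a Pearson hash (BLAKE2b stands in here), adds a header (checksum, length and quartile ratios) and uses a more elaborate distance, whereas this sketch sums the code differences. The triplet selection is modeled on the reference implementation but should be checked against it.

```python
import hashlib

NUM_BUCKETS = 128
# (salt, j, k): each triplet is (window[0], window[j], window[k]); window[0] is the newest byte.
TRIPLETS = [(2, 1, 2), (3, 1, 3), (5, 2, 3), (7, 2, 4), (11, 1, 4), (13, 3, 4)]


def bucket(salt: int, a: int, b: int, c: int) -> int:
    """Stand-in for TLSH's Pearson hash: any fixed byte-mixing function works for the sketch."""
    digest = hashlib.blake2b(bytes((salt, a, b, c)), digest_size=2).digest()
    return int.from_bytes(digest, "little") % NUM_BUCKETS


def toy_tlsh(data: bytes) -> list[int]:
    """Simplified TLSH-style body: one 2-bit code per bucket (no header, no checksum)."""
    counts = [0] * NUM_BUCKETS
    for i in range(4, len(data)):
        window = data[i - 4:i + 1][::-1]  # the 5-byte window, newest byte first
        for salt, j, k in TRIPLETS:
            counts[bucket(salt, window[0], window[j], window[k])] += 1
    ordered = sorted(counts)
    q1 = ordered[NUM_BUCKETS // 4 - 1]
    q2 = ordered[NUM_BUCKETS // 2 - 1]
    q3 = ordered[3 * NUM_BUCKETS // 4 - 1]
    codes = []
    for count in counts:  # quartile position of each bucket, 0..3
        if count <= q1:
            codes.append(0)
        elif count <= q2:
            codes.append(1)
        elif count <= q3:
            codes.append(2)
        else:
            codes.append(3)
    return codes


def toy_tlsh_distance(a: list[int], b: list[int]) -> int:
    """0 for identical bodies; larger values mean less similar inputs."""
    return sum(abs(x - y) for x, y in zip(a, b))
```

128 buckets × 2 bits give a 32-byte body; the real TLSH digest adds a 3-byte header and is usually printed as 70 hex characters, which fits the 70 bytes in the storage comparison.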

|
SSDEEP
13
[Figure: piecewise hashing of two inputs that differ in one character: "For hands of gold are always cold," vs. "For lands of gold are always cold,"]
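A toy version of context-triggered piecewise hashing shows why the two inputs in the figure keep similar digests: a hash over the last 7 bytes decides where a piece ends, each piece contributes one Base64 character, and after an edit the piece boundaries resynchronize once the window has slid past the changed byte. This is not the real ssdeep algorithm: the rolling hash is a plain byte sum, the piece hash is 32-bit FNV-1, the block size is fixed instead of adaptive, and `difflib` similarity stands in for ssdeep's edit-distance-based 0 to 100 score. `toy_ctph` and `toy_score` are hypothetical names.

```python
import string
from difflib import SequenceMatcher

B64 = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
WINDOW = 7  # bytes of context that decide where a piece ends
FNV_OFFSET, FNV_PRIME = 0x811C9DC5, 0x01000193  # 32-bit FNV-1 constants


def toy_ctph(data: bytes, block_size: int = 8) -> str:
    """Toy context-triggered piecewise hash: one Base64 character per piece."""
    digest = []
    piece_hash = FNV_OFFSET
    window: list[int] = []
    for byte in data:
        piece_hash = ((piece_hash * FNV_PRIME) ^ byte) & 0xFFFFFFFF  # FNV-1 over the current piece
        window.append(byte)
        if len(window) > WINDOW:
            window.pop(0)
        # Trigger point: depends only on the last WINDOW bytes, so an edit
        # can only move boundaries until the window has slid past it.
        if sum(window) % block_size == block_size - 1:
            digest.append(B64[piece_hash % 64])
            piece_hash = FNV_OFFSET
    digest.append(B64[piece_hash % 64])  # last (possibly partial) piece
    return "".join(digest)


def toy_score(a: str, b: str) -> int:
    """0 (mismatch) to 100 (identical); difflib stands in for ssdeep's scoring."""
    return round(100 * SequenceMatcher(None, a, b, autojunk=False).ratio())
```

Changing one byte near the start of a long input alters at most the digest characters for the pieces around the edit; every character after the first common trigger point is identical.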

|
14
• Reasons:
– Small data to store
– Fast automatic generation
– Fast comparison

                YARA           SSDEEP        TLSH
Comparison      0.015 s        0.003 s       0.002 s
Generation      n/a            0.100 s       0.037 s
Storage         whole binary   <110 bytes    70 bytes

|
15
But are they applicable?

|
Testing LSH on a small dataset
16
• Dataset:
– 34,681 real binaries
– NOT classified (no labels)
• Clustering algorithms:
– 1. Simple: if two samples are "close", they belong to the same group
– 2. k-medoids: k group centers
– 3. A sample joins a group if it is similar to at least a few group members
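Algorithms 1 and 3 can be sketched for any distance function and threshold (for TLSH, the results slide suggests 70). Algorithm 1 is single linkage via union-find, so chains of close samples merge into one group; algorithm 3 is sketched here as a greedy, order-dependent variant that lets a sample join a group only when it is close to at least `min_links` members. The function names and the `min_links` parameter are hypothetical, and k-medoids is omitted.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def threshold_clusters(items: Sequence[T], dist: Callable[[T, T], float],
                       threshold: float) -> list[list[int]]:
    """Algorithm 1: samples within `threshold` of each other share a group (single linkage)."""
    parent = list(range(len(items)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(items)):
        for j in range(i + 1, len(items)):  # all pairs: quadratic in the number of samples
            if dist(items[i], items[j]) <= threshold:
                parent[find(i)] = find(j)

    groups: dict[int, list[int]] = {}
    for i in range(len(items)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def min_support_clusters(items: Sequence[T], dist: Callable[[T, T], float],
                         threshold: float, min_links: int = 2) -> list[list[int]]:
    """Algorithm 3 (greedy sketch): join the first group with enough close members."""
    groups: list[list[int]] = []
    for i, item in enumerate(items):
        for group in groups:
            close = sum(1 for j in group if dist(item, items[j]) <= threshold)
            if close >= min(min_links, len(group)):  # small groups need fewer links
                group.append(i)
                break
        else:
            groups.append([i])  # no group accepted the sample: start a new one
    return groups
```

With integers and absolute difference as a toy distance, items `[1, 2, 3, 10, 11, 30]` and threshold 1.5, `threshold_clusters` returns `[[0, 1, 2], [3, 4], [5]]`, while `min_support_clusters` with `min_links=2` returns `[[0, 1], [2], [3, 4], [5]]`: sample 2 is close to only one member of the first group, so the chain is not followed.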

|
Testing LSH on a small dataset
17
• Results:
– Evaluated by hand
– Samples in the same group are similar
– SDHASH is not applicable
– The SSDEEP score ("closeness") is badly scaled
» 0 (mismatch) to 100 (perfect match)
– Similar samples end up in different groups
– TLSH appears to be the best for this application
» With threshold = 70

|
Search SSDEEP
18
• Original sample (GandCrabV4.X):
• Similar samples:

|
Search SSDEEP
19
• Similar samples:

|
Search SSDEEP
20
• Similar samples:

|
Search SSDEEP
21
• Similar samples:

|
Search TLSH
22
• Original sample (Saturn):
• Similar samples:

|
Search TLSH
23
• Similar samples:

|
Search TLSH
24
• Similar samples:

|
Moving on to the database
25
• Generate hashes for every sample
– ~1–2 months
• Grouping algorithms use XREF (comparing every sample with every other sample)
• XREF is not scalable:
$\binom{300\,000\,000}{2} \times 0.002\ \mathrm{s} \approx 2\,853\,881$ years
• Search will do instead
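The XREF estimate on this slide can be checked directly: comparing every unordered pair of 300 million samples takes C(n, 2) ≈ 4.5 × 10^16 comparisons, and at 0.002 s each (the TLSH comparison time) that is about 2.85 million years. The sketch assumes 365-day years and a single thread.

```python
from math import comb

N_SAMPLES = 300_000_000         # database size on the slide
SECONDS_PER_COMPARISON = 0.002  # TLSH comparison time from the earlier table
SECONDS_PER_YEAR = 365 * 24 * 3600

pairs = comb(N_SAMPLES, 2)      # every unordered pair is compared once
years = pairs * SECONDS_PER_COMPARISON / SECONDS_PER_YEAR
print(f"{pairs:,} comparisons, about {years:,.0f} years on one thread")
```

A search, by contrast, compares each corpus sample with the database once, so its cost grows linearly rather than quadratically with the database size.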

|
Ransomware corpus & search
26
• Currently 477 samples from 15 families
• Search currently uses 1 process, 1 thread
• Search for samples similar to 1 sample:
– SSDEEP → ~10–20 minutes (with prefix filter)
– TLSH → ~50 minutes
• Search for samples similar to all 477 samples:
– SSDEEP → 14 hours
– TLSH → 29 hours
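The slides do not say what the SSDEEP prefix filter is. One plausible reading, offered as an assumption: ssdeep digests have the form `blocksize:hash1:hash2`, and ssdeep only gives a nonzero score when two block sizes are equal or differ by a factor of two, so the database can be indexed by that prefix and only the matching buckets compared. The function names and the example digests in the test are hypothetical.

```python
from collections import defaultdict


def block_size(digest: str) -> int:
    """ssdeep digests look like '<blocksize>:<hash1>:<hash2>'."""
    return int(digest.split(":", 1)[0])


def build_index(database: dict[str, str]) -> dict[int, list[str]]:
    """Group sample names by the block size prefix of their digest."""
    index: dict[int, list[str]] = defaultdict(list)
    for name, digest in database.items():
        index[block_size(digest)].append(name)
    return index


def candidates(query: str, index: dict[int, list[str]]) -> list[str]:
    """Only samples whose block size is equal, double or half can get a nonzero score."""
    size = block_size(query)
    sizes = {size, size * 2}
    if size % 2 == 0:
        sizes.add(size // 2)
    return [name for s in sorted(sizes) for name in index.get(s, [])]
```

Only the returned candidates need a full ssdeep comparison; everything else can be skipped without changing the results.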

|
Search
Search corpus + malware database → LSH → Similar samples
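A minimal sketch of the search step, assuming hashes have already been computed for the corpus and the database: the large malware database is streamed once, and each database entry is compared against every corpus sample, keeping matches within the threshold (for TLSH, e.g. 70). This layout reads the database only once regardless of corpus size; it is an illustration, not the authors' implementation, and `search` is a hypothetical name.

```python
from typing import Callable, Iterable, TypeVar

H = TypeVar("H")


def search(corpus: dict[str, H], database: Iterable[tuple[str, H]],
           distance: Callable[[H, H], float],
           threshold: float) -> dict[str, list[tuple[str, float]]]:
    """For each corpus sample, list the database entries within `threshold`, closest first."""
    hits: dict[str, list[tuple[str, float]]] = {name: [] for name in corpus}
    for db_name, db_hash in database:       # one sequential pass over the large database
        for name, query in corpus.items():  # compare against every corpus sample
            d = distance(query, db_hash)
            if d <= threshold:
                hits[name].append((db_name, d))
    for matches in hits.values():
        matches.sort(key=lambda m: m[1])    # closest matches first
    return hits
```

Because `database` can be any iterable, it can be read lazily from disk or a database cursor instead of being held in memory.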

|
Final Solution
Old ransomware + feed of new files → LSH → Fresh ransomware

|
Future work
29
• Parallelization
• Widen the ransomware corpus
• Develop a better LSH
• Label the database
