© 2018 CrySyS Lab, BME
EXAMPLES OF LOCALITY SENSITIVE
HASHING AND THEIR USAGE FOR
MALWARE CLASSIFICATION
Csongor Tamás
Budapest University of Technology and Economics
csotam@crysys.hu
Supervisors: Dr. Boldizsár Bencsáth, Dr. Levente Buttyán
www.crysys.hu
|
Problem statement
• We need a feed of fresh ransomware!
• Fresh – from the last week or month
• But how?
NotPetya
|
Solution Concept
Old ransomware + Feed of new files -> Method -> Fresh ransomware
|
Solution Concept
Search corpus + Feed of new files -> Method -> Similar samples
|
Solution Concept
YARA rules + Feed of new files -> YARA rule matching -> Fresh samples
Drawbacks: bad automatic rule generation; slow matching (0.015 s)
|
Solution Concept
Old ransomware + Feed of new files + malware database -> Method -> Fresh ransomware
|
Solution Concept
Old ransomware + Feed of new files + malware database -> Method: LSH -> Fresh ransomware
|
Locality Sensitive Hashing
Examples for Locality Sensitive Hashing and their usage for malware similarity checking
• What is Locality Sensitive Hashing?
– Similar data -> "similar hash"
– It "aims to maximize the probability of a collision for similar items"
– A distance can be calculated between two digests (hashes)
– Similar files (hashes) are "close" to each other, others are "far"
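A minimal sketch of the idea on this slide, not one of the tools discussed in the talk: SimHash over byte 4-grams, compared by Hamming distance. The 128-bit width, the 4-gram size, the BLAKE2b feature hash and the sample strings are all assumptions chosen for illustration.

```python
import hashlib

BITS = 128  # digest width of this toy scheme (arbitrary choice)


def ngrams(data: bytes, n: int = 4):
    """Yield every overlapping n-byte window of the input."""
    for i in range(len(data) - n + 1):
        yield data[i:i + n]


def simhash(data: bytes) -> int:
    """Each n-gram votes +1/-1 on every bit; the digest keeps the sign."""
    votes = [0] * BITS
    for gram in ngrams(data):
        raw = hashlib.blake2b(gram, digest_size=BITS // 8).digest()
        h = int.from_bytes(raw, "big")
        for bit in range(BITS):
            votes[bit] += 1 if (h >> bit) & 1 else -1
    return sum(1 << bit for bit in range(BITS) if votes[bit] > 0)


def distance(a: int, b: int) -> int:
    """Hamming distance between two digests (0 means identical digests)."""
    return bin(a ^ b).count("1")


original = b"For hands of gold are always cold,"
variant = b"For lands of gold are always cold,"
unrelated = b"An entirely different sequence of bytes."

# The one-byte variant should land much closer than the unrelated input.
print(distance(simhash(original), simhash(variant)))
print(distance(simhash(original), simhash(unrelated)))
```

Unlike a cryptographic hash, where a one-byte change flips about half of the output bits, most n-grams here are shared between the two similar inputs, so most bit votes stay the same.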
|
Locality Sensitive Hashing
• SSDEEP
– Context Triggered Piecewise Hashing
• SDHASH
– Statistically improbable features
• TLSH
– Trend Micro Locality Sensitive Hash
– 5-grams -> statistics -> hash
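The "5-grams -> statistics -> hash" pipeline can be illustrated with a heavily simplified sketch. This is not the TLSH algorithm: real TLSH differs in how windows are mapped to buckets, adds a header, and scores differences in its own way. The CRC32 bucket mapping, the 128 buckets, the plain absolute-difference distance and the sample data below are assumptions.

```python
import zlib

BUCKETS = 128  # number of counters (assumed for this sketch)


def bucket_counts(data: bytes) -> list[int]:
    """Step 1: slide a 5-byte window over the input and count bucket hits."""
    counts = [0] * BUCKETS
    for i in range(len(data) - 4):
        window = data[i:i + 5]
        counts[zlib.crc32(window) % BUCKETS] += 1
    return counts


def quartiles(counts: list[int]) -> tuple[int, int, int]:
    """Step 2: statistics, here the three quartile boundaries of the counts."""
    ordered = sorted(counts)
    n = len(ordered)
    return ordered[n // 4 - 1], ordered[n // 2 - 1], ordered[3 * n // 4 - 1]


def digest(data: bytes) -> list[int]:
    """Step 3: encode every bucket as a 2-bit code (0..3) by its quartile."""
    counts = bucket_counts(data)
    q1, q2, q3 = quartiles(counts)
    return [0 if c <= q1 else 1 if c <= q2 else 2 if c <= q3 else 3
            for c in counts]


def distance(a: list[int], b: list[int]) -> int:
    """Sum of per-bucket code differences; 0 means identical digests."""
    return sum(abs(x - y) for x, y in zip(a, b))


sample = bytes((i * 7 + i // 13) % 256 for i in range(4096))
patched = sample[:2000] + b"PATCH" + sample[2005:]  # small local change
print(distance(digest(sample), digest(patched)))
```

Because the digest records only which quartile each bucket falls into, a small local change moves few buckets across a quartile boundary, which is what keeps similar inputs close.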
|
SSDEEP
For hands of gold are always cold,
For lands of gold are always cold,
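A rough sketch of context triggered piecewise hashing, under stated assumptions. It is not the ssdeep implementation: the sum-of-window rolling hash, the FNV-1a piece hash and the fixed block size are simplifications, and real ssdeep also chooses the block size from the input length and emits two signatures.

```python
B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
WINDOW = 7  # bytes of context that decide where a piece ends


def fnv1a(chunk: bytes) -> int:
    """32-bit FNV-1a, used here as the per-piece hash."""
    h = 0x811C9DC5
    for byte in chunk:
        h = ((h ^ byte) * 0x01000193) & 0xFFFFFFFF
    return h


def ctph(data: bytes, block_size: int = 4) -> str:
    """Cut the input wherever the rolling value hits the trigger,
    then emit one character per piece."""
    signature = []
    start = 0
    for i in range(len(data)):
        # Rolling value: depends only on the last WINDOW bytes.
        rolling = sum(data[max(0, i - WINDOW + 1):i + 1])
        if rolling % block_size == block_size - 1:
            signature.append(B64[fnv1a(data[start:i + 1]) % 64])
            start = i + 1
    if start < len(data):  # trailing piece without a trigger
        signature.append(B64[fnv1a(data[start:]) % 64])
    return "".join(signature)


print(ctph(b"For hands of gold are always cold,"))
print(ctph(b"For lands of gold are always cold,"))
```

Piece boundaries depend only on nearby bytes, so a one-byte change can alter only the signature characters around it. The rest of the signature stays identical, which is what makes two signatures comparable.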
|
Locality Sensitive Hashing
• Reasons:
– Small data to store
– Fast automatic generation
– Fast comparison

                        YARA           SSDEEP        TLSH
Data to store           Whole binary   < 110 bytes   70 bytes
Automatic generation    -              0.100 s       0.037 s
Comparison              0.015 s        0.003 s       0.002 s

But are they applicable?
|
Testing LSH on a small dataset
• Dataset:
– 34681 real binaries
– NOT classified
• Clustering algorithms:
– 1. Simple: if two samples are "close", they belong to the same group
– 2. k-medoids: k group centers
– 3. A sample joins a group if it is similar to at least a few group members
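Clustering rules 1 and 3 can be sketched as follows. This is one possible reading of the slide, not the code used in the talk: the function names, the `min_members` parameter, the handling of groups that are still smaller than `min_members`, and the toy integer "digests" are assumptions. Any digest distance can be passed in as `distance`.

```python
from itertools import combinations


def cluster_simple(items, distance, threshold):
    """Rule 1: any two 'close' samples end up in the same group (union-find)."""
    parent = list(range(len(items)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(items)), 2):
        if distance(items[i], items[j]) <= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i, item in enumerate(items):
        groups.setdefault(find(i), []).append(item)
    return list(groups.values())


def cluster_min_members(items, distance, threshold, min_members=2):
    """Rule 3: join a group only if close to at least min_members of it
    (or to every member while the group is still smaller than that)."""
    groups = []
    for item in items:
        for group in groups:
            close = sum(distance(item, other) <= threshold for other in group)
            if close >= min(min_members, len(group)):
                group.append(item)
                break
        else:  # no group accepted the item: start a new one
            groups.append([item])
    return groups


toy = [1, 2, 3, 10, 11, 30]      # stand-ins for digests
gap = lambda a, b: abs(a - b)    # stand-in for a digest distance
print(cluster_simple(toy, gap, threshold=1))       # [[1, 2, 3], [10, 11], [30]]
print(cluster_min_members(toy, gap, threshold=1))  # [[1, 2], [3], [10, 11], [30]]
```

The toy output shows the difference: rule 1 chains 1, 2 and 3 together through pairwise closeness, while rule 3 keeps 3 out because it is close to only one member of the existing group.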
|
Testing LSH on a small dataset
• Results:
– (evaluation by hand)
– Samples in the same group are similar
– SDHASH is not applicable
– SSDEEP score ("closeness") is badly scaled
» 0 - 100 (mismatch - perfect match)
– Similar samples can end up in different groups
– TLSH appears to be the best for this application
» With threshold = 70
|
Search SSDEEP
• Original sample (GandCrabV4.X):
• Similar samples:
|
Search TLSH
• Original sample (Saturn):
• Similar samples:
|
Moving on to the database
• Generate hashes for every sample
– ~ 1-2 months
• Grouping algorithms use XREF (every sample compared with every other)
• XREF is not scalable
• (300 000 000² / 2) × 0.002 s ≈ 2 853 881 years
• Search (one sample against the database) will do
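The estimate on this slide can be checked with a few lines of arithmetic. The slide's figure corresponds to counting each pair of samples once, which is about n²/2 comparisons. The 365-day year and the variable names are assumptions.

```python
SAMPLES = 300_000_000            # samples in the database (from the slide)
SECONDS_PER_COMPARISON = 0.002   # one digest comparison (from the slide)
SECONDS_PER_YEAR = 365 * 24 * 3600

pairs = SAMPLES ** 2 // 2        # each unordered pair counted once (approx.)
years = pairs * SECONDS_PER_COMPARISON / SECONDS_PER_YEAR
print(f"{years:,.0f} years")     # prints "2,853,881 years"
```

The cost grows quadratically with the number of samples, which is why comparing every sample with every other is ruled out at this scale.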
|
Ransomware corpus & search
• Currently 477 samples from 15 families
• Search currently uses 1 process, 1 thread
• Search for samples similar to 1 sample
– SSDEEP -> ~10-20 minutes (prefix filter)
– TLSH -> ~50 minutes
• Search for samples similar to 477 samples
– SSDEEP -> 14 hours
– TLSH -> 29 hours
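The slides do not say how the SSDEEP "prefix filter" works. One plausible reading, sketched here as an assumption, uses the block size at the start of every ssdeep digest (`blocksize:hash1:hash2`): two digests are only comparable when their block sizes are equal or differ by a factor of two, so everything else can be skipped before any comparison. The helper names and the placeholder digests are hypothetical.

```python
from collections import defaultdict


def block_size(digest: str) -> int:
    """Read the block-size prefix of a 'blocksize:hash1:hash2' digest."""
    return int(digest.split(":", 1)[0])


def build_index(digests):
    """Group stored digests by block size, once, ahead of all searches."""
    index = defaultdict(list)
    for digest in digests:
        index[block_size(digest)].append(digest)
    return index


def candidates(index, query: str):
    """Return only the digests whose block size is b/2, b or 2b."""
    b = block_size(query)
    sizes = {b, b * 2}
    if b % 2 == 0:
        sizes.add(b // 2)
    return [d for size in sorted(sizes) for d in index.get(size, [])]


# Placeholder digests, not real ssdeep output.
stored = ["48:x:y", "96:a:b", "192:c:d", "384:e:f"]
index = build_index(stored)
print(candidates(index, "96:q:r"))  # ['48:x:y', '96:a:b', '192:c:d']
```

Only the surviving candidates would then be handed to the actual, more expensive similarity comparison.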
|
Search
Search corpus + Malware database -> LSH -> Similar samples
|
Final Solution
Old ransomware + Feed of new files -> LSH -> Fresh ransomware
|
Future work
• Parallelization
• Widen ransomware corpus
• Develop better LSH
• Label database
Questions?

Csongor Tamás - Examples of Locality Sensitive Hashing & their Usage for Malware Classification

  • 1. © 2018 CrySyS Lab, BME EXAMPLES OF LOCALITY SENSITIVE HASHING AND THEIR USAGE FOR MALWARE CLASSIFICATION Csongor Tamás Budapest University of Technology and Economics csotam@crysys.hu Supervisors: Dr. Boldizsár Bencsáth, Dr. Levente Buttyán w w w . c r y s y s . h u
  • 2. | Problem statement  We need a feed of fresh ransomware!  Fresh – from last week or month  But how? EXAMPLES OF LOCALITY SENSITIVE HASHING AND THEIR USAGE FOR MALWARE CLASSIFICATION 2 NotPetya
  • 3. | Solution Concept EXAMPLES OF LOCALITY SENSITIVE HASHING AND THEIR USAGE FOR MALWARE CLASSIFICATION 3 Old ransomware Feed of new files Method Fresh ransomware
  • 4. | Solution Concept EXAMPLES OF LOCALITY SENSITIVE HASHING AND THEIR USAGE FOR MALWARE CLASSIFICATION 4 Search corpus Feed of new files Method Similar samples
  • 5. | Solution Concept EXAMPLES OF LOCALITY SENSITIVE HASHING AND THEIR USAGE FOR MALWARE CLASSIFICATION 5 YARA rules Feed of new files YARA rule matching Fresh samples Bad automatic generation Slow (0.015 s)
  • 6. | Solution Concept EXAMPLES OF LOCALITY SENSITIVE HASHING AND THEIR USAGE FOR MALWARE CLASSIFICATION 6 Old ransomware Feed of new files Method Fresh ransomware
  • 7. | Solution Concept EXAMPLES OF LOCALITY SENSITIVE HASHING AND THEIR USAGE FOR MALWARE CLASSIFICATION 7 Old ransomware Feed of new files Method Fresh ransomware + database malware
  • 8. | Solution Concept EXAMPLES OF LOCALITY SENSITIVE HASHING AND THEIR USAGE FOR MALWARE CLASSIFICATION 8 Old ransomware Feed of new files Method Fresh ransomware + database malware
  • 9. | Solution Concept EXAMPLES OF LOCALITY SENSITIVE HASHING AND THEIR USAGE FOR MALWARE CLASSIFICATION 9 Old ransomware Feed of new files Method Fresh ransomware + database LSH malware
  • 10. | Locality Sensitive Hashing Examples for Locality Sensitive Hashing and their usage for malware similarity checking 10  What is Locality Sensitive Hashing? –similar data –> ˝similar hash˝ –„aims to maximize the probability of a collision for similar items” –Distance can be calculated between two digests (hashes) –Similar files (hashes) are ˝close˝ to each other, others are ˝far˝
  • 11. Locality Sensitive Hashing
      • SSDEEP
          – Context Triggered Piecewise Hashing
  • 12. Locality Sensitive Hashing
      • SSDEEP
          – Context Triggered Piecewise Hashing
      • SDHASH
          – Statistically improbable features
      • TLSH
          – Trend Micro Locality Sensitive Hash
          – 5-grams -> statistics -> hash
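The "5-grams -> statistics -> hash" pipeline listed for TLSH on slide 12 can be sketched roughly as follows. This is a heavily simplified illustration, not the TLSH algorithm: the bucket mapping, the digest layout and the distance function are assumptions made for the example, and the names `bucket_counts`, `toy_digest` and `toy_distance` are hypothetical. Real TLSH adds a header and uses its own bucket mapping and distance scoring.

```python
import hashlib
import random

BUCKETS = 128  # assumed bucket count for this sketch

def bucket_counts(data: bytes, n: int = 5) -> list:
    """Step 1 (5-grams): count how many n-byte windows fall into each bucket."""
    counts = [0] * BUCKETS
    for i in range(len(data) - n + 1):
        gram = data[i:i + n]
        counts[hashlib.blake2b(gram, digest_size=1).digest()[0] % BUCKETS] += 1
    return counts

def toy_digest(data: bytes) -> list:
    """Steps 2-3 (statistics -> hash): encode each bucket by its quartile (0..3)."""
    counts = bucket_counts(data)
    ordered = sorted(counts)
    q1, q2, q3 = (ordered[BUCKETS * k // 4 - 1] for k in (1, 2, 3))
    return [(c > q1) + (c > q2) + (c > q3) for c in counts]

def toy_distance(a: list, b: list) -> int:
    """Sum of per-bucket quartile differences; 0 means identical digests."""
    return sum(abs(x - y) for x, y in zip(a, b))

# Hypothetical test inputs.
rng = random.Random(1)
original = bytes(rng.getrandbits(8) for _ in range(2000))
variant = original[:1000] + bytes([original[1000] ^ 0xFF]) + original[1001:]
unrelated = bytes(rng.getrandbits(8) for _ in range(2000))

d_close = toy_distance(toy_digest(original), toy_digest(variant))
d_far = toy_distance(toy_digest(original), toy_digest(unrelated))
print("near-identical:", d_close, "unrelated:", d_far)
```

Because the digest only records the relative ranking of bucket counts, a local edit moves few buckets across a quartile boundary, which is what keeps similar inputs close.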
  • 13. SSDEEP. Example inputs: "For hands of gold are always cold," and "For lands of gold are always cold,"
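Slide 13 appears to show two near-identical lines of text being hashed by SSDEEP. The idea behind context triggered piecewise hashing can be sketched as below, under loud assumptions: the trigger rule (sum of the last 7 bytes), the piece hash (SHA-256) and the names `split_pieces` and `toy_ctph` are stand-ins for illustration. Real ssdeep uses its own rolling hash and piece hash, picks the block size adaptively, and emits two signatures.

```python
import hashlib
import random

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
WINDOW = 7  # bytes of context that decide where a piece ends (assumed value)

def split_pieces(data: bytes, modulus: int = 16) -> list:
    """Cut the input after byte i whenever the recent context hits the trigger value."""
    pieces, start = [], 0
    for i in range(len(data)):
        context = data[max(0, i - WINDOW + 1):i + 1]
        if sum(context) % modulus == modulus - 1:
            pieces.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        pieces.append(data[start:])  # trailing piece without a trigger
    return pieces

def toy_ctph(data: bytes, modulus: int = 16) -> str:
    """One signature character per piece, so a local edit changes few characters."""
    return "".join(
        ALPHABET[hashlib.sha256(piece).digest()[0] % 64]
        for piece in split_pieces(data, modulus)
    )

# The two lines from the slide; a small modulus gives short inputs several pieces.
print(toy_ctph(b"For hands of gold are always cold,", modulus=4))
print(toy_ctph(b"For lands of gold are always cold,", modulus=4))

# Hypothetical larger inputs that differ in a single byte in the middle.
rng = random.Random(1)
data_a = bytes(rng.getrandbits(8) for _ in range(2000))
data_b = data_a[:1000] + bytes([data_a[1000] ^ 0xFF]) + data_a[1001:]
sig_a, sig_b = toy_ctph(data_a), toy_ctph(data_b)
print(sig_a[:10], sig_b[:10])
print(sig_a[-10:], sig_b[-10:])
```

Because piece boundaries depend only on nearby bytes, an edit disturbs the pieces around it while the signature before and after the edit stays aligned.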
  • 14. Locality Sensitive Hashing
      • Reasons:
          – Small data to store (YARA: whole binary; SSDEEP: <110 bytes; TLSH: 70 bytes)
          – Fast automatic generation (SSDEEP: 0.100 s; TLSH: 0.037 s)
          – Fast comparison (YARA: 0.015 s; SSDEEP: 0.003 s; TLSH: 0.002 s)
  • 15. Locality Sensitive Hashing
      • Reasons:
          – Small data to store (YARA: whole binary; SSDEEP: <110 bytes; TLSH: 70 bytes)
          – Fast automatic generation (SSDEEP: 0.100 s; TLSH: 0.037 s)
          – Fast comparison (YARA: 0.015 s; SSDEEP: 0.003 s; TLSH: 0.002 s)
      • But are they applicable?
  • 16. Testing LSH on a small dataset
      • Dataset:
          – 34681 real binaries
          – NOT classified
      • Clustering algorithms:
          – 1. simple: if two samples are "close", they belong to the same group
          – 2. k-medoids: k group centers
          – 3. a sample joins a group if it is similar to at least a few group members
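The first clustering algorithm on slide 16 (two "close" samples belong to the same group) can be read as grouping by connected components. The sketch below is one possible implementation under that reading, not the author's code: the integer digests, the Hamming distance and the function name `group_by_threshold` are hypothetical stand-ins for real LSH digests and their distance function.

```python
from itertools import combinations

def hamming(x: int, y: int) -> int:
    """Stand-in distance between two toy digests (number of differing bits)."""
    return bin(x ^ y).count("1")

def group_by_threshold(digests: dict, threshold: int) -> list:
    """Put two samples in the same group whenever their distance is <= threshold."""
    parent = {name: name for name in digests}

    def find(name):
        # Follow parent links to the group representative, shortening the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    # Every pair is compared once, so the cost grows quadratically.
    for a, b in combinations(digests, 2):
        if hamming(digests[a], digests[b]) <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in digests:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())

# Hypothetical digests: a-b and b-c are close, so a, b and c end up together.
digests = {"a": 0b0000, "b": 0b0001, "c": 0b0011, "x": 0b11110000, "y": 0b11110001}
groups = group_by_threshold(digests, threshold=1)
print(groups)  # [['a', 'b', 'c'], ['x', 'y']]
```

Note the chaining effect: "a" and "c" share a group although their own distance (2) exceeds the threshold, because both are close to "b".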
  • 17. Testing LSH on a small dataset
      • Results:
          – (evaluation by hand)
          – Samples in the same group are similar
          – SDHASH is not applicable
          – SSDEEP score ("closeness") is badly scaled
              » 0 - 100 (mismatch - perfect match)
          – Similar samples in different groups
          – TLSH appears to be the best for this application
              » With threshold = 70
  • 18-21. Search SSDEEP. Original sample (GandCrabV4.X) and similar samples [figures]
  • 22-24. Search TLSH. Original sample (Saturn) and similar samples [figures]
  • 25. Moving on to the database
      • Generate hashes for every sample
          – ~1-2 months
      • Grouping algorithms use XREF
      • XREF is not scalable
      • $\binom{300\,000\,000}{2} \times 0.002\ \text{s} \approx 2\,853\,881$ years
      • Search will do
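The estimate on slide 25 can be checked with a few lines of arithmetic, assuming the garbled figure is the number of unordered pairs among 300 000 000 samples multiplied by 0.002 s per comparison, with a 365-day year. The variable names are illustrative only.

```python
import math

samples = 300_000_000
seconds_per_comparison = 0.002       # per-pair comparison time quoted on the slide
seconds_per_year = 365 * 24 * 3600   # assumes a 365-day year

pairs = math.comb(samples, 2)        # n * (n - 1) / 2 unordered pairs
total_seconds = pairs * seconds_per_comparison
years = total_seconds / seconds_per_year

print(f"{pairs:,} pairs")
print(f"about {int(years):,} years")
```

Under these assumptions the result matches the slide's figure. A search for one query sample needs only one comparison per database entry instead of one per pair, which is why the quadratic blow-up disappears.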
  • 26. Ransomware corpus & search
      • Currently 477 samples from 15 families
      • Search currently uses 1 process, 1 thread
      • Search for samples similar to 1 sample
          – SSDEEP -> ~10-20 minutes (prefix filter)
          – TLSH -> ~50 minutes
      • Search for samples similar to 477 samples
          – SSDEEP -> 14 hours
          – TLSH -> 29 hours
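The slide does not say what the SSDEEP "prefix filter" is. One plausible reading, assumed here, is a filter on the block size that prefixes every ssdeep digest (`blocksize:signature1:signature2`): two digests are only comparable when their block sizes are equal or differ by a factor of two, so all other database entries can be skipped without running the comparison. The sketch below shows a linear search with such a filter; `search`, `stand_in_compare` and the digests are hypothetical, and the comparison function is passed in rather than taken from any library.

```python
def block_size(digest: str) -> int:
    """Read the block size prefix of an ssdeep-style digest."""
    return int(digest.split(":", 1)[0])

def comparable(a: str, b: str) -> bool:
    """Digests can only match if block sizes are equal or differ by a factor of two."""
    x, y = block_size(a), block_size(b)
    return x == y or x == 2 * y or y == 2 * x

def search(query: str, database: dict, compare, min_score: int) -> list:
    """Linear scan: cheap prefix check first, expensive comparison only if needed."""
    hits = []
    for name, digest in database.items():
        if not comparable(query, digest):
            continue  # filtered out by the block size prefix
        score = compare(query, digest)
        if score >= min_score:
            hits.append((name, score))
    return sorted(hits, key=lambda hit: -hit[1])

# Hypothetical digests and a stand-in comparison that records how often it runs.
database = {"s1": "96:AAAA:BBBB", "s2": "192:AAAA:CCCC", "s3": "3072:ZZZZ:YYYY"}
calls = []

def stand_in_compare(a: str, b: str) -> int:
    calls.append((a, b))
    return 100 if a == b else 50  # placeholder score on a 0-100 scale

hits = search("96:AAAA:BBBB", database, stand_in_compare, min_score=60)
print(hits, "comparisons run:", len(calls))
```

Each query is independent of the others, so a scan like this could be split across processes, in line with the parallelization mentioned under future work.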
  • 27. Search. Diagram labels: Search corpus; Malware database; LSH; Similar samples
  • 28. Final Solution. Diagram labels: Old ransomware; Feed of new files; LSH; Fresh ransomware
  • 29. Future work
      • Parallelization
      • Widen ransomware corpus
      • Develop better LSH
      • Label database
  • 30. Questions?