SlideShare a Scribd company logo
1 of 24
Download to read offline
Recognizing malware
Libor Mořkovský
Computer virus
bacterial cell
based on work by Anderson Brito
Computer virus
executable file
entry point
Computer virus
Inserting code into files is never “good”.
executable file
entry point
image courtesy of
Looking Glass Studios
Malware
How do you recognize a thief?
image courtesy of
Looking Glass Studios
Twentieth Century Fox
Malware
How do you recognize a thief?
Malware
How do you recognize a thief?
image courtesy of
Looking Glass Studios
Twentieth Century Fox
Paramount Pictures
Malware
completely different behaviors are considered “bad”
we need a judge to decide who crossed the line
•
•
Malware | Many faces
unlike real thieves, malware can be duplicated
not only duplicated, but also modified
all this is done by machines
too much work to judge each one manually
•
•
•
•
Finding similar files
oooooooooooo
o
oo
o
oo
oooo
ooooo
oooo
o
oo
o
oo
o
ooooooooo
o
o
MDS1
MDS2
class
oo
oo
oo
oo
CLEAN
MALWARE
QUERY
UNKNOWN
Finding similar files
need a file representation
need a distance function
•
•
Finding similar files | File vector
each executable file is represented by a feature vector
the PE format is complex, so we keep exactly one
version of the extractor code (C++)
the vector comprises static and dynamic features, the
exact content is proprietary
•
•
•
Database record
• One record = constant vector of over 100 attributes
• the “file fingerprint”
• Each attribute has a data type and semantic
Attribute Data Type Semantic
sha256 32 byte array CHECKSUM
pe_sect_cnt uint16_t VALUE
pe_sect_rawoff_entry uint32_t OFFSET
• The complete contents of the vector are kept secret
• static and dynamic features of PE executables
Finding similar files | Distance
sum of partial distances
each distance operator assigned manually
weights assigned manually to equalize contribution
•
•
•
Nearest neighbor query
• Compound distance function
• Data type and semantic determine partial dist. func.
Data Type Semantic Partial distance function
32 byte array CHECKSUM RETURN_ZERO
uint16_t VALUE EQUAL_RET32
uint32_t OFFSET LOG
• Each partial distance function = one kernel function
• Over 100 kernels for every NN query
• Intermediate results kept in the “Scratchpad”
Finding similar files | Data
~60 M data points
sparse and well separated
(in many cases)
•
•
Finding similar files | Implementation
we started with GPUs
their high memory throughput allows “naive”
implementation and rapid prototyping
column-oriented database
•
•
•
Classification | Requirements
find easily what is responsible for
a mistake – transparency
fix the problem quickly – tractability
•
•
Classification | Algorithm
Instance based classifier.
Classification | Optimizations
scaling and HW problems with GPUs
we invested in algorithmic optimizations:
VP-tree, distance bounded search
hand optimized distance function (assembly)
CPU version is ~100x faster
•
•
•
•
Classification | Deployment
→
FileSHAandu
ser id →
←Fileprevale
nce ←
←
Fileclass
ification ←
→
Filefinger
print →
← Generic detections ←
↑ File classifications and
Evo-gen detections
→ Threats →
Set updates ↓
Medusa
Scavenger
Avast users
FileRep
Classification | Deployment
→
FileSHAandu
ser id →
←Fileprevale
nce ←
←
Fileclass
ification ←
→
Filefinger
print →
← Generic detections ←
↑ File classifications and
Evo-gen detections
→ Threats →
Set updates ↓
Medusa
Scavenger
Avast users
FileRep
Classification | Deployment
→
FileSHAandu
ser id →
←Fileprevale
nce ←
←
Fileclass
ification ←
→
Filefinger
print →
← Generic detections ←
↑ File classifications and
Evo-gen detections
→ Threats →
Set updates ↓
Medusa
Scavenger
Avast users
FileRep
Rule generator
detect more variants in the wild
(our) rule is a conjunction of several conditions
known as Win32:Evo-Gen
completely different optimization problem than
classification - still uses the GPU
•
•
•
•
Q&A

More Related Content

What's hot

CNIT 126 13: Data Encoding
CNIT 126 13: Data EncodingCNIT 126 13: Data Encoding
CNIT 126 13: Data EncodingSam Bowne
 
Practical Malware Analysis Ch12
Practical Malware Analysis Ch12Practical Malware Analysis Ch12
Practical Malware Analysis Ch12Sam Bowne
 
Practical Malware Analysis: Ch 10: Kernel Debugging with WinDbg
Practical Malware Analysis: Ch 10: Kernel Debugging with WinDbgPractical Malware Analysis: Ch 10: Kernel Debugging with WinDbg
Practical Malware Analysis: Ch 10: Kernel Debugging with WinDbgSam Bowne
 
Basic Malware Analysis
Basic Malware AnalysisBasic Malware Analysis
Basic Malware AnalysisAlbert Hui
 
Introduction to Malware Analysis
Introduction to Malware AnalysisIntroduction to Malware Analysis
Introduction to Malware AnalysisAndrew McNicol
 
Practical Malware Analysis Ch 14: Malware-Focused Network Signatures
Practical Malware Analysis Ch 14: Malware-Focused Network SignaturesPractical Malware Analysis Ch 14: Malware-Focused Network Signatures
Practical Malware Analysis Ch 14: Malware-Focused Network SignaturesSam Bowne
 
Practical Malware Analysis: Ch 8: Debugging
Practical Malware Analysis: Ch 8: Debugging Practical Malware Analysis: Ch 8: Debugging
Practical Malware Analysis: Ch 8: Debugging Sam Bowne
 
CNIT 126: 10: Kernel Debugging with WinDbg
CNIT 126: 10: Kernel Debugging with WinDbgCNIT 126: 10: Kernel Debugging with WinDbg
CNIT 126: 10: Kernel Debugging with WinDbgSam Bowne
 
CNIT 126 Ch 11: Malware Behavior
CNIT 126 Ch 11: Malware BehaviorCNIT 126 Ch 11: Malware Behavior
CNIT 126 Ch 11: Malware BehaviorSam Bowne
 
Awesome Concurrency with Elixir Tasks
Awesome Concurrency with Elixir TasksAwesome Concurrency with Elixir Tasks
Awesome Concurrency with Elixir TasksJonathan Magen
 
CNIT 126 7: Analyzing Malicious Windows Programs
CNIT 126 7: Analyzing Malicious Windows ProgramsCNIT 126 7: Analyzing Malicious Windows Programs
CNIT 126 7: Analyzing Malicious Windows ProgramsSam Bowne
 
CNIT 126 Ch 0: Malware Analysis Primer & 1: Basic Static Techniques
CNIT 126 Ch 0: Malware Analysis Primer & 1: Basic Static TechniquesCNIT 126 Ch 0: Malware Analysis Primer & 1: Basic Static Techniques
CNIT 126 Ch 0: Malware Analysis Primer & 1: Basic Static TechniquesSam Bowne
 
Practical Malware Analysis: Ch 7: Analyzing Malicious Windows Programs
Practical Malware Analysis: Ch 7: Analyzing Malicious Windows Programs Practical Malware Analysis: Ch 7: Analyzing Malicious Windows Programs
Practical Malware Analysis: Ch 7: Analyzing Malicious Windows Programs Sam Bowne
 
Basic Dynamic Analysis of Malware
Basic Dynamic Analysis of MalwareBasic Dynamic Analysis of Malware
Basic Dynamic Analysis of MalwareNatraj G
 
"Automated Malware Analysis" de Gabriel Negreira Barbosa, Malware Research an...
"Automated Malware Analysis" de Gabriel Negreira Barbosa, Malware Research an..."Automated Malware Analysis" de Gabriel Negreira Barbosa, Malware Research an...
"Automated Malware Analysis" de Gabriel Negreira Barbosa, Malware Research an...SegInfo
 
CNIT 126 Ch 9: OllyDbg
CNIT 126 Ch 9: OllyDbgCNIT 126 Ch 9: OllyDbg
CNIT 126 Ch 9: OllyDbgSam Bowne
 
Practical Malware Analysis: Ch 11: Malware Behavior
Practical Malware Analysis: Ch 11: Malware BehaviorPractical Malware Analysis: Ch 11: Malware Behavior
Practical Malware Analysis: Ch 11: Malware BehaviorSam Bowne
 
CNIT 126 Ch 7: Analyzing Malicious Windows Programs
CNIT 126 Ch 7: Analyzing Malicious Windows ProgramsCNIT 126 Ch 7: Analyzing Malicious Windows Programs
CNIT 126 Ch 7: Analyzing Malicious Windows ProgramsSam Bowne
 
Materials Project Validation, Provenance, and Sandboxes by Dan Gunter
Materials Project Validation, Provenance, and Sandboxes by Dan GunterMaterials Project Validation, Provenance, and Sandboxes by Dan Gunter
Materials Project Validation, Provenance, and Sandboxes by Dan GunterDan Gunter
 

What's hot (20)

CNIT 126 13: Data Encoding
CNIT 126 13: Data EncodingCNIT 126 13: Data Encoding
CNIT 126 13: Data Encoding
 
Practical Malware Analysis Ch12
Practical Malware Analysis Ch12Practical Malware Analysis Ch12
Practical Malware Analysis Ch12
 
Practical Malware Analysis: Ch 10: Kernel Debugging with WinDbg
Practical Malware Analysis: Ch 10: Kernel Debugging with WinDbgPractical Malware Analysis: Ch 10: Kernel Debugging with WinDbg
Practical Malware Analysis: Ch 10: Kernel Debugging with WinDbg
 
Basic Malware Analysis
Basic Malware AnalysisBasic Malware Analysis
Basic Malware Analysis
 
Introduction to Malware Analysis
Introduction to Malware AnalysisIntroduction to Malware Analysis
Introduction to Malware Analysis
 
Practical Malware Analysis Ch 14: Malware-Focused Network Signatures
Practical Malware Analysis Ch 14: Malware-Focused Network SignaturesPractical Malware Analysis Ch 14: Malware-Focused Network Signatures
Practical Malware Analysis Ch 14: Malware-Focused Network Signatures
 
Practical Malware Analysis: Ch 8: Debugging
Practical Malware Analysis: Ch 8: Debugging Practical Malware Analysis: Ch 8: Debugging
Practical Malware Analysis: Ch 8: Debugging
 
CNIT 126: 10: Kernel Debugging with WinDbg
CNIT 126: 10: Kernel Debugging with WinDbgCNIT 126: 10: Kernel Debugging with WinDbg
CNIT 126: 10: Kernel Debugging with WinDbg
 
CNIT 126 Ch 11: Malware Behavior
CNIT 126 Ch 11: Malware BehaviorCNIT 126 Ch 11: Malware Behavior
CNIT 126 Ch 11: Malware Behavior
 
Awesome Concurrency with Elixir Tasks
Awesome Concurrency with Elixir TasksAwesome Concurrency with Elixir Tasks
Awesome Concurrency with Elixir Tasks
 
CNIT 126 7: Analyzing Malicious Windows Programs
CNIT 126 7: Analyzing Malicious Windows ProgramsCNIT 126 7: Analyzing Malicious Windows Programs
CNIT 126 7: Analyzing Malicious Windows Programs
 
CNIT 126 Ch 0: Malware Analysis Primer & 1: Basic Static Techniques
CNIT 126 Ch 0: Malware Analysis Primer & 1: Basic Static TechniquesCNIT 126 Ch 0: Malware Analysis Primer & 1: Basic Static Techniques
CNIT 126 Ch 0: Malware Analysis Primer & 1: Basic Static Techniques
 
Practical Malware Analysis: Ch 7: Analyzing Malicious Windows Programs
Practical Malware Analysis: Ch 7: Analyzing Malicious Windows Programs Practical Malware Analysis: Ch 7: Analyzing Malicious Windows Programs
Practical Malware Analysis: Ch 7: Analyzing Malicious Windows Programs
 
Basic Dynamic Analysis of Malware
Basic Dynamic Analysis of MalwareBasic Dynamic Analysis of Malware
Basic Dynamic Analysis of Malware
 
9: OllyDbg
9: OllyDbg9: OllyDbg
9: OllyDbg
 
"Automated Malware Analysis" de Gabriel Negreira Barbosa, Malware Research an...
"Automated Malware Analysis" de Gabriel Negreira Barbosa, Malware Research an..."Automated Malware Analysis" de Gabriel Negreira Barbosa, Malware Research an...
"Automated Malware Analysis" de Gabriel Negreira Barbosa, Malware Research an...
 
CNIT 126 Ch 9: OllyDbg
CNIT 126 Ch 9: OllyDbgCNIT 126 Ch 9: OllyDbg
CNIT 126 Ch 9: OllyDbg
 
Practical Malware Analysis: Ch 11: Malware Behavior
Practical Malware Analysis: Ch 11: Malware BehaviorPractical Malware Analysis: Ch 11: Malware Behavior
Practical Malware Analysis: Ch 11: Malware Behavior
 
CNIT 126 Ch 7: Analyzing Malicious Windows Programs
CNIT 126 Ch 7: Analyzing Malicious Windows ProgramsCNIT 126 Ch 7: Analyzing Malicious Windows Programs
CNIT 126 Ch 7: Analyzing Malicious Windows Programs
 
Materials Project Validation, Provenance, and Sandboxes by Dan Gunter
Materials Project Validation, Provenance, and Sandboxes by Dan GunterMaterials Project Validation, Provenance, and Sandboxes by Dan Gunter
Materials Project Validation, Provenance, and Sandboxes by Dan Gunter
 

Similar to Libor Mořkovský - Recognizing Malware

DEF CON 27 - CHRISTOPHER ROBERTS - firmware slap
DEF CON 27 - CHRISTOPHER ROBERTS - firmware slapDEF CON 27 - CHRISTOPHER ROBERTS - firmware slap
DEF CON 27 - CHRISTOPHER ROBERTS - firmware slapFelipe Prado
 
Using Static Binary Analysis To Find Vulnerabilities And Backdoors in Firmware
Using Static Binary Analysis To Find Vulnerabilities And Backdoors in FirmwareUsing Static Binary Analysis To Find Vulnerabilities And Backdoors in Firmware
Using Static Binary Analysis To Find Vulnerabilities And Backdoors in FirmwareLastline, Inc.
 
Watchtowers of the Internet - Source Boston 2012
Watchtowers of the Internet - Source Boston 2012Watchtowers of the Internet - Source Boston 2012
Watchtowers of the Internet - Source Boston 2012Stephan Chenette
 
Building next gen malware behavioural analysis environment
Building next gen malware behavioural analysis environment Building next gen malware behavioural analysis environment
Building next gen malware behavioural analysis environment isc2-hellenic
 
Software Analytics: Data Analytics for Software Engineering and Security
Software Analytics: Data Analytics for Software Engineering and SecuritySoftware Analytics: Data Analytics for Software Engineering and Security
Software Analytics: Data Analytics for Software Engineering and SecurityTao Xie
 
Adversarial machine learning for av software
Adversarial machine learning for av softwareAdversarial machine learning for av software
Adversarial machine learning for av softwarejunseok seo
 
Patching Windows Executables with the Backdoor Factory | DerbyCon 2013
Patching Windows Executables with the Backdoor Factory | DerbyCon 2013Patching Windows Executables with the Backdoor Factory | DerbyCon 2013
Patching Windows Executables with the Backdoor Factory | DerbyCon 2013midnite_runr
 
The Hacking Games - Operation System Vulnerabilities Meetup 29112022
The Hacking Games - Operation System Vulnerabilities Meetup 29112022The Hacking Games - Operation System Vulnerabilities Meetup 29112022
The Hacking Games - Operation System Vulnerabilities Meetup 29112022lior mazor
 
ShaREing Is Caring
ShaREing Is CaringShaREing Is Caring
ShaREing Is Caringsporst
 
Malware collection and analysis
Malware collection and analysisMalware collection and analysis
Malware collection and analysisChong-Kuan Chen
 
CONFidence 2017: Hiding in plain sight (Adam Burt)
CONFidence 2017: Hiding in plain sight (Adam Burt)CONFidence 2017: Hiding in plain sight (Adam Burt)
CONFidence 2017: Hiding in plain sight (Adam Burt)PROIDEA
 
Intro2 malwareanalysisshort
Intro2 malwareanalysisshortIntro2 malwareanalysisshort
Intro2 malwareanalysisshortVincent Ohprecio
 
BinaryPig - Scalable Malware Analytics in Hadoop
BinaryPig - Scalable Malware Analytics in HadoopBinaryPig - Scalable Malware Analytics in Hadoop
BinaryPig - Scalable Malware Analytics in HadoopJason Trost
 
Sans london april sans at night - tearing apart a fileless malware sample
Sans london april   sans at night - tearing apart a fileless malware sampleSans london april   sans at night - tearing apart a fileless malware sample
Sans london april sans at night - tearing apart a fileless malware sampleMichel Coene
 
0box Analyzer--Afterdark Runtime Forensics for Automated Malware Analysis and...
0box Analyzer--Afterdark Runtime Forensics for Automated Malware Analysis and...0box Analyzer--Afterdark Runtime Forensics for Automated Malware Analysis and...
0box Analyzer--Afterdark Runtime Forensics for Automated Malware Analysis and...Wayne Huang
 
Secure Because Math: A Deep-Dive on Machine Learning-Based Monitoring (#Secur...
Secure Because Math: A Deep-Dive on Machine Learning-Based Monitoring (#Secur...Secure Because Math: A Deep-Dive on Machine Learning-Based Monitoring (#Secur...
Secure Because Math: A Deep-Dive on Machine Learning-Based Monitoring (#Secur...Alex Pinto
 
Discovering Vulnerabilities For Fun and Profit
Discovering Vulnerabilities For Fun and ProfitDiscovering Vulnerabilities For Fun and Profit
Discovering Vulnerabilities For Fun and ProfitAbhisek Datta
 

Similar to Libor Mořkovský - Recognizing Malware (20)

DEF CON 27 - CHRISTOPHER ROBERTS - firmware slap
DEF CON 27 - CHRISTOPHER ROBERTS - firmware slapDEF CON 27 - CHRISTOPHER ROBERTS - firmware slap
DEF CON 27 - CHRISTOPHER ROBERTS - firmware slap
 
Using Static Binary Analysis To Find Vulnerabilities And Backdoors in Firmware
Using Static Binary Analysis To Find Vulnerabilities And Backdoors in FirmwareUsing Static Binary Analysis To Find Vulnerabilities And Backdoors in Firmware
Using Static Binary Analysis To Find Vulnerabilities And Backdoors in Firmware
 
Eusecwest
EusecwestEusecwest
Eusecwest
 
Watchtowers of the Internet - Source Boston 2012
Watchtowers of the Internet - Source Boston 2012Watchtowers of the Internet - Source Boston 2012
Watchtowers of the Internet - Source Boston 2012
 
Building next gen malware behavioural analysis environment
Building next gen malware behavioural analysis environment Building next gen malware behavioural analysis environment
Building next gen malware behavioural analysis environment
 
Software Analytics: Data Analytics for Software Engineering and Security
Software Analytics: Data Analytics for Software Engineering and SecuritySoftware Analytics: Data Analytics for Software Engineering and Security
Software Analytics: Data Analytics for Software Engineering and Security
 
Adversarial machine learning for av software
Adversarial machine learning for av softwareAdversarial machine learning for av software
Adversarial machine learning for av software
 
Patching Windows Executables with the Backdoor Factory | DerbyCon 2013
Patching Windows Executables with the Backdoor Factory | DerbyCon 2013Patching Windows Executables with the Backdoor Factory | DerbyCon 2013
Patching Windows Executables with the Backdoor Factory | DerbyCon 2013
 
The Hacking Games - Operation System Vulnerabilities Meetup 29112022
The Hacking Games - Operation System Vulnerabilities Meetup 29112022The Hacking Games - Operation System Vulnerabilities Meetup 29112022
The Hacking Games - Operation System Vulnerabilities Meetup 29112022
 
Defending Your "Gold"
Defending Your "Gold"Defending Your "Gold"
Defending Your "Gold"
 
You suck at Memory Analysis
You suck at Memory AnalysisYou suck at Memory Analysis
You suck at Memory Analysis
 
ShaREing Is Caring
ShaREing Is CaringShaREing Is Caring
ShaREing Is Caring
 
Malware collection and analysis
Malware collection and analysisMalware collection and analysis
Malware collection and analysis
 
CONFidence 2017: Hiding in plain sight (Adam Burt)
CONFidence 2017: Hiding in plain sight (Adam Burt)CONFidence 2017: Hiding in plain sight (Adam Burt)
CONFidence 2017: Hiding in plain sight (Adam Burt)
 
Intro2 malwareanalysisshort
Intro2 malwareanalysisshortIntro2 malwareanalysisshort
Intro2 malwareanalysisshort
 
BinaryPig - Scalable Malware Analytics in Hadoop
BinaryPig - Scalable Malware Analytics in HadoopBinaryPig - Scalable Malware Analytics in Hadoop
BinaryPig - Scalable Malware Analytics in Hadoop
 
Sans london april sans at night - tearing apart a fileless malware sample
Sans london april   sans at night - tearing apart a fileless malware sampleSans london april   sans at night - tearing apart a fileless malware sample
Sans london april sans at night - tearing apart a fileless malware sample
 
0box Analyzer--Afterdark Runtime Forensics for Automated Malware Analysis and...
0box Analyzer--Afterdark Runtime Forensics for Automated Malware Analysis and...0box Analyzer--Afterdark Runtime Forensics for Automated Malware Analysis and...
0box Analyzer--Afterdark Runtime Forensics for Automated Malware Analysis and...
 
Secure Because Math: A Deep-Dive on Machine Learning-Based Monitoring (#Secur...
Secure Because Math: A Deep-Dive on Machine Learning-Based Monitoring (#Secur...Secure Because Math: A Deep-Dive on Machine Learning-Based Monitoring (#Secur...
Secure Because Math: A Deep-Dive on Machine Learning-Based Monitoring (#Secur...
 
Discovering Vulnerabilities For Fun and Profit
Discovering Vulnerabilities For Fun and ProfitDiscovering Vulnerabilities For Fun and Profit
Discovering Vulnerabilities For Fun and Profit
 

More from Machine Learning Prague

Lukáš Vrábel - Deep Convolutional Neural Networks
Lukáš Vrábel - Deep Convolutional Neural NetworksLukáš Vrábel - Deep Convolutional Neural Networks
Lukáš Vrábel - Deep Convolutional Neural NetworksMachine Learning Prague
 
Tomáš Cícha - Machine Learning Solutions at Seznam.cz
Tomáš Cícha - Machine Learning Solutions at Seznam.czTomáš Cícha - Machine Learning Solutions at Seznam.cz
Tomáš Cícha - Machine Learning Solutions at Seznam.czMachine Learning Prague
 
Michael Levin - MatrixNet Applications at Yandex
Michael Levin - MatrixNet Applications at YandexMichael Levin - MatrixNet Applications at Yandex
Michael Levin - MatrixNet Applications at YandexMachine Learning Prague
 
Chris Brew - TR Discover: A Natural Language Interface for Exploring Linked D...
Chris Brew - TR Discover: A Natural Language Interface for Exploring Linked D...Chris Brew - TR Discover: A Natural Language Interface for Exploring Linked D...
Chris Brew - TR Discover: A Natural Language Interface for Exploring Linked D...Machine Learning Prague
 
Tomáš Mikolov - Distributed Representations for NLP
Tomáš Mikolov - Distributed Representations for NLPTomáš Mikolov - Distributed Representations for NLP
Tomáš Mikolov - Distributed Representations for NLPMachine Learning Prague
 
Kateřina Veselovská - ML Approaches to Sentiment Analysis
Kateřina Veselovská - ML Approaches to Sentiment AnalysisKateřina Veselovská - ML Approaches to Sentiment Analysis
Kateřina Veselovská - ML Approaches to Sentiment AnalysisMachine Learning Prague
 
Jiří Materna - Artificial Intelligence in Creative Writing
Jiří Materna - Artificial Intelligence in Creative WritingJiří Materna - Artificial Intelligence in Creative Writing
Jiří Materna - Artificial Intelligence in Creative WritingMachine Learning Prague
 
Jan Šedivý - Intelligent Personal Assistants
Jan Šedivý - Intelligent Personal AssistantsJan Šedivý - Intelligent Personal Assistants
Jan Šedivý - Intelligent Personal AssistantsMachine Learning Prague
 
Marek Rosa - Inventing General Artificial Intelligence: A Vision and Methodology
Marek Rosa - Inventing General Artificial Intelligence: A Vision and MethodologyMarek Rosa - Inventing General Artificial Intelligence: A Vision and Methodology
Marek Rosa - Inventing General Artificial Intelligence: A Vision and MethodologyMachine Learning Prague
 
Xuedong Huang - Deep Learning and Intelligent Applications
Xuedong Huang - Deep Learning and Intelligent ApplicationsXuedong Huang - Deep Learning and Intelligent Applications
Xuedong Huang - Deep Learning and Intelligent ApplicationsMachine Learning Prague
 

More from Machine Learning Prague (13)

Vít Listík - Email.cz workshop
Vít Listík - Email.cz workshopVít Listík - Email.cz workshop
Vít Listík - Email.cz workshop
 
Lukáš Vrábel - Deep Convolutional Neural Networks
Lukáš Vrábel - Deep Convolutional Neural NetworksLukáš Vrábel - Deep Convolutional Neural Networks
Lukáš Vrábel - Deep Convolutional Neural Networks
 
Tomáš Cícha - Machine Learning Solutions at Seznam.cz
Tomáš Cícha - Machine Learning Solutions at Seznam.czTomáš Cícha - Machine Learning Solutions at Seznam.cz
Tomáš Cícha - Machine Learning Solutions at Seznam.cz
 
Jan Pospíšil - Azure ML
Jan Pospíšil - Azure MLJan Pospíšil - Azure ML
Jan Pospíšil - Azure ML
 
Michael Levin - MatrixNet Applications at Yandex
Michael Levin - MatrixNet Applications at YandexMichael Levin - MatrixNet Applications at Yandex
Michael Levin - MatrixNet Applications at Yandex
 
Adam Ashenfelter - Finding the Oddballs
Adam Ashenfelter - Finding the OddballsAdam Ashenfelter - Finding the Oddballs
Adam Ashenfelter - Finding the Oddballs
 
Chris Brew - TR Discover: A Natural Language Interface for Exploring Linked D...
Chris Brew - TR Discover: A Natural Language Interface for Exploring Linked D...Chris Brew - TR Discover: A Natural Language Interface for Exploring Linked D...
Chris Brew - TR Discover: A Natural Language Interface for Exploring Linked D...
 
Tomáš Mikolov - Distributed Representations for NLP
Tomáš Mikolov - Distributed Representations for NLPTomáš Mikolov - Distributed Representations for NLP
Tomáš Mikolov - Distributed Representations for NLP
 
Kateřina Veselovská - ML Approaches to Sentiment Analysis
Kateřina Veselovská - ML Approaches to Sentiment AnalysisKateřina Veselovská - ML Approaches to Sentiment Analysis
Kateřina Veselovská - ML Approaches to Sentiment Analysis
 
Jiří Materna - Artificial Intelligence in Creative Writing
Jiří Materna - Artificial Intelligence in Creative WritingJiří Materna - Artificial Intelligence in Creative Writing
Jiří Materna - Artificial Intelligence in Creative Writing
 
Jan Šedivý - Intelligent Personal Assistants
Jan Šedivý - Intelligent Personal AssistantsJan Šedivý - Intelligent Personal Assistants
Jan Šedivý - Intelligent Personal Assistants
 
Marek Rosa - Inventing General Artificial Intelligence: A Vision and Methodology
Marek Rosa - Inventing General Artificial Intelligence: A Vision and MethodologyMarek Rosa - Inventing General Artificial Intelligence: A Vision and Methodology
Marek Rosa - Inventing General Artificial Intelligence: A Vision and Methodology
 
Xuedong Huang - Deep Learning and Intelligent Applications
Xuedong Huang - Deep Learning and Intelligent ApplicationsXuedong Huang - Deep Learning and Intelligent Applications
Xuedong Huang - Deep Learning and Intelligent Applications
 

Recently uploaded

AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Cyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfCyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfOverkill Security
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKJago de Vreede
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 

Recently uploaded (20)

AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Cyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfCyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdf
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 

Libor Mořkovský - Recognizing Malware

  • 2. Computer virus bacterial cell based on work by Anderson Brito
  • 4. Computer virus Inserting code into files is never “good”. executable file entry point
  • 5. image courtesy of Looking Glass Studios Malware How do you recognize a thief?
  • 6. image courtesy of Looking Glass Studios Twentieth Century Fox Malware How do you recognize a thief?
  • 7. Malware How do you recognize a thief? image courtesy of Looking Glass Studios Twentieth Century Fox Paramount Pictures
  • 8. Malware completely different behaviors are considered “bad” we need a judge to decide who crossed the line • •
  • 9. Malware | Many faces unlike real thieves, malware can be duplicated not only duplicated, but also modified all this is done by machines too much work to judge each one manually • • • •
  • 11. Finding similar files need a file representation need a distance function • •
  • 12. Finding similar files | File vector each executable file is represented by a feature vector the PE format is complex, so we keep exactly one version of the extractor code (C++) the vector comprises static and dynamic features, the exact content is proprietary • • • Database record • One record = constant vector of over 100 attributes • the “file fingerprint” • Each attribute has a data type and semantic Attribute Data Type Semantic sha256 32 byte array CHECKSUM pe_sect_cnt uint16_t VALUE pe_sect_rawoff_entry uint32_t OFFSET • The complete contents of the vector are kept secret • static and dynamic features of PE executables
  • 13. Finding similar files | Distance sum of partial distances each distance operator assigned manually weights assigned manually to equalize contribution • • • Nearest neighbor query • Compound distance function • Data type and semantic determine partial dist. func. Data Type Semantic Partial distance function 32 byte array CHECKSUM RETURN_ZERO uint16_t VALUE EQUAL_RET32 uint32_t OFFSET LOG • Each partial distance function = one kernel function • Over 100 kernels for every NN query • Intermediate results kept in the “Scratchpad”
  • 14. Finding similar files | Data ~60 M data points sparse and well separated (in many cases) • •
  • 15. Finding similar files | Implementation we started with GPUs their high memory throughput allows “naive” implementation and rapid prototyping column-oriented database • • •
  • 16. Classification | Requirements find easily what is responsible for a mistake – transparency fix the problem quickly – tractability • •
  • 18. Classification | Optimizations scaling and HW problems with GPUs we invested in algorithmic optimizations: VP-tree, distance bounded search hand optimized distance function (assembly) CPU version is ~100x faster • • • •
  • 19. Classification | Deployment → FileSHAandu ser id → ←Fileprevale nce ← ← Fileclass ification ← → Filefinger print → ← Generic detections ← ↑ File classifications and Evo-gen detections → Threats → Set updates ↓ Medusa Scavenger Avast users FileRep
  • 20. Classification | Deployment → FileSHAandu ser id → ←Fileprevale nce ← ← Fileclass ification ← → Filefinger print → ← Generic detections ← ↑ File classifications and Evo-gen detections → Threats → Set updates ↓ Medusa Scavenger Avast users FileRep
  • 21. Classification | Deployment → FileSHAandu ser id → ←Fileprevale nce ← ← Fileclass ification ← → Filefinger print → ← Generic detections ← ↑ File classifications and Evo-gen detections → Threats → Set updates ↓ Medusa Scavenger Avast users FileRep
  • 22. Rule generator detect more variants in the wild (our) rule is a conjunction of several conditions known as Win32:Evo-Gen completely different optimization problem than classification - still uses the GPU • • • •
  • 23.
  • 24. Q&A