SlideShare a Scribd company logo
Recognizing malware
Libor Mořkovský
Computer virus
bacterial cell
based on work by Anderson Brito
Computer virus
executable file
entry point
Computer virus
Inserting code into files is never “good”.
executable file
entry point
image courtesy of
Looking Glass Studios
Malware
How do you recognize a thief?
image courtesy of
Looking Glass Studios
Twentieth Century Fox
Malware
How do you recognize a thief?
Malware
How do you recognize a thief?
image courtesy of
Looking Glass Studios
Twentieth Century Fox
Paramount Pictures
Malware
completely different behaviors are considered “bad”
we need a judge to decide who crossed the line
•
•
Malware | Many faces
unlike real thieves, malware can be duplicated
not only duplicated, but also modified
all this is done by machines
too much work to judge each one manually
•
•
•
•
Finding similar files
oooooooooooo
o
oo
o
oo
oooo
ooooo
oooo
o
oo
o
oo
o
ooooooooo
o
o
MDS1
MDS2
class
oo
oo
oo
oo
CLEAN
MALWARE
QUERY
UNKNOWN
Finding similar files
need a file representation
need a distance function
•
•
Finding similar files | File vector
each executable file is represented by a feature vector
the PE format is complex, so we keep exactly one
version of the extractor code (C++)
the vector comprises static and dynamic features, the
exact content is proprietary
•
•
•
Database record
• One record = constant vector of over 100 attributes
• the “file fingerprint”
• Each attribute has a data type and semantic
Attribute Data Type Semantic
sha256 32 byte array CHECKSUM
pe_sect_cnt uint16_t VALUE
pe_sect_rawoff_entry uint32_t OFFSET
• The complete contents of the vector are kept secret
• static and dynamic features of PE executables
Finding similar files | Distance
sum of partial distances
each distance operator assigned manually
weights assigned manually to equalize contribution
•
•
•
Nearest neighbor query
• Compound distance function
• Data type and semantic determine partial dist. func.
Data Type Semantic Partial distance function
32 byte array CHECKSUM RETURN_ZERO
uint16_t VALUE EQUAL_RET32
uint32_t OFFSET LOG
• Each partial distance function = one kernel function
• Over 100 kernels for every NN query
• Intermediate results kept in the “Scratchpad”
Finding similar files | Data
~60 M data points
sparse and well separated
(in many cases)
•
•
Finding similar files | Implementation
we started with GPUs
their high memory throughput allows “naive”
implementation and rapid prototyping
column-oriented database
•
•
•
Classification | Requirements
find easily what is responsible for
a mistake – transparency
fix the problem quickly – tractability
•
•
Classification | Algorithm
Instance based classifier.
Classification | Optimizations
scaling and HW problems with GPUs
we invested in algorithmic optimizations:
VP-tree, distance bounded search
hand optimized distance function (assembly)
CPU version is ~100x faster
•
•
•
•
Classification | Deployment
→
FileSHAandu
ser id →
←Fileprevale
nce ←
←
Fileclass
ification ←
→
Filefinger
print →
← Generic detections ←
↑ File classifications and
Evo-gen detections
→ Threats →
Set updates ↓
Medusa
Scavenger
Avast users
FileRep
Classification | Deployment
→
FileSHAandu
ser id →
←Fileprevale
nce ←
←
Fileclass
ification ←
→
Filefinger
print →
← Generic detections ←
↑ File classifications and
Evo-gen detections
→ Threats →
Set updates ↓
Medusa
Scavenger
Avast users
FileRep
Classification | Deployment
→
FileSHAandu
ser id →
←Fileprevale
nce ←
←
Fileclass
ification ←
→
Filefinger
print →
← Generic detections ←
↑ File classifications and
Evo-gen detections
→ Threats →
Set updates ↓
Medusa
Scavenger
Avast users
FileRep
Rule generator
detect more variants in the wild
(our) rule is a conjunction of several conditions
known as Win32:Evo-Gen
completely different optimization problem than
classification - still uses the GPU
•
•
•
•
Q&A

More Related Content

What's hot

CNIT 126 13: Data Encoding
CNIT 126 13: Data EncodingCNIT 126 13: Data Encoding
CNIT 126 13: Data Encoding
Sam Bowne
 
Practical Malware Analysis Ch12
Practical Malware Analysis Ch12Practical Malware Analysis Ch12
Practical Malware Analysis Ch12
Sam Bowne
 
Practical Malware Analysis: Ch 10: Kernel Debugging with WinDbg
Practical Malware Analysis: Ch 10: Kernel Debugging with WinDbgPractical Malware Analysis: Ch 10: Kernel Debugging with WinDbg
Practical Malware Analysis: Ch 10: Kernel Debugging with WinDbg
Sam Bowne
 
Basic Malware Analysis
Basic Malware AnalysisBasic Malware Analysis
Basic Malware Analysis
Albert Hui
 
Introduction to Malware Analysis
Introduction to Malware AnalysisIntroduction to Malware Analysis
Introduction to Malware Analysis
Andrew McNicol
 
Practical Malware Analysis Ch 14: Malware-Focused Network Signatures
Practical Malware Analysis Ch 14: Malware-Focused Network SignaturesPractical Malware Analysis Ch 14: Malware-Focused Network Signatures
Practical Malware Analysis Ch 14: Malware-Focused Network Signatures
Sam Bowne
 
Practical Malware Analysis: Ch 8: Debugging
Practical Malware Analysis: Ch 8: Debugging Practical Malware Analysis: Ch 8: Debugging
Practical Malware Analysis: Ch 8: Debugging
Sam Bowne
 
CNIT 126: 10: Kernel Debugging with WinDbg
CNIT 126: 10: Kernel Debugging with WinDbgCNIT 126: 10: Kernel Debugging with WinDbg
CNIT 126: 10: Kernel Debugging with WinDbg
Sam Bowne
 
CNIT 126 Ch 11: Malware Behavior
CNIT 126 Ch 11: Malware BehaviorCNIT 126 Ch 11: Malware Behavior
CNIT 126 Ch 11: Malware Behavior
Sam Bowne
 
Awesome Concurrency with Elixir Tasks
Awesome Concurrency with Elixir TasksAwesome Concurrency with Elixir Tasks
Awesome Concurrency with Elixir Tasks
Jonathan Magen
 
CNIT 126 7: Analyzing Malicious Windows Programs
CNIT 126 7: Analyzing Malicious Windows ProgramsCNIT 126 7: Analyzing Malicious Windows Programs
CNIT 126 7: Analyzing Malicious Windows Programs
Sam Bowne
 
CNIT 126 Ch 0: Malware Analysis Primer & 1: Basic Static Techniques
CNIT 126 Ch 0: Malware Analysis Primer & 1: Basic Static TechniquesCNIT 126 Ch 0: Malware Analysis Primer & 1: Basic Static Techniques
CNIT 126 Ch 0: Malware Analysis Primer & 1: Basic Static Techniques
Sam Bowne
 
Practical Malware Analysis: Ch 7: Analyzing Malicious Windows Programs
Practical Malware Analysis: Ch 7: Analyzing Malicious Windows Programs Practical Malware Analysis: Ch 7: Analyzing Malicious Windows Programs
Practical Malware Analysis: Ch 7: Analyzing Malicious Windows Programs
Sam Bowne
 
Basic Dynamic Analysis of Malware
Basic Dynamic Analysis of MalwareBasic Dynamic Analysis of Malware
Basic Dynamic Analysis of Malware
Natraj G
 
9: OllyDbg
9: OllyDbg9: OllyDbg
9: OllyDbg
Sam Bowne
 
"Automated Malware Analysis" de Gabriel Negreira Barbosa, Malware Research an...
"Automated Malware Analysis" de Gabriel Negreira Barbosa, Malware Research an..."Automated Malware Analysis" de Gabriel Negreira Barbosa, Malware Research an...
"Automated Malware Analysis" de Gabriel Negreira Barbosa, Malware Research an...
SegInfo
 
CNIT 126 Ch 9: OllyDbg
CNIT 126 Ch 9: OllyDbgCNIT 126 Ch 9: OllyDbg
CNIT 126 Ch 9: OllyDbg
Sam Bowne
 
Practical Malware Analysis: Ch 11: Malware Behavior
Practical Malware Analysis: Ch 11: Malware BehaviorPractical Malware Analysis: Ch 11: Malware Behavior
Practical Malware Analysis: Ch 11: Malware Behavior
Sam Bowne
 
CNIT 126 Ch 7: Analyzing Malicious Windows Programs
CNIT 126 Ch 7: Analyzing Malicious Windows ProgramsCNIT 126 Ch 7: Analyzing Malicious Windows Programs
CNIT 126 Ch 7: Analyzing Malicious Windows Programs
Sam Bowne
 
Materials Project Validation, Provenance, and Sandboxes by Dan Gunter
Materials Project Validation, Provenance, and Sandboxes by Dan GunterMaterials Project Validation, Provenance, and Sandboxes by Dan Gunter
Materials Project Validation, Provenance, and Sandboxes by Dan Gunter
Dan Gunter
 

What's hot (20)

CNIT 126 13: Data Encoding
CNIT 126 13: Data EncodingCNIT 126 13: Data Encoding
CNIT 126 13: Data Encoding
 
Practical Malware Analysis Ch12
Practical Malware Analysis Ch12Practical Malware Analysis Ch12
Practical Malware Analysis Ch12
 
Practical Malware Analysis: Ch 10: Kernel Debugging with WinDbg
Practical Malware Analysis: Ch 10: Kernel Debugging with WinDbgPractical Malware Analysis: Ch 10: Kernel Debugging with WinDbg
Practical Malware Analysis: Ch 10: Kernel Debugging with WinDbg
 
Basic Malware Analysis
Basic Malware AnalysisBasic Malware Analysis
Basic Malware Analysis
 
Introduction to Malware Analysis
Introduction to Malware AnalysisIntroduction to Malware Analysis
Introduction to Malware Analysis
 
Practical Malware Analysis Ch 14: Malware-Focused Network Signatures
Practical Malware Analysis Ch 14: Malware-Focused Network SignaturesPractical Malware Analysis Ch 14: Malware-Focused Network Signatures
Practical Malware Analysis Ch 14: Malware-Focused Network Signatures
 
Practical Malware Analysis: Ch 8: Debugging
Practical Malware Analysis: Ch 8: Debugging Practical Malware Analysis: Ch 8: Debugging
Practical Malware Analysis: Ch 8: Debugging
 
CNIT 126: 10: Kernel Debugging with WinDbg
CNIT 126: 10: Kernel Debugging with WinDbgCNIT 126: 10: Kernel Debugging with WinDbg
CNIT 126: 10: Kernel Debugging with WinDbg
 
CNIT 126 Ch 11: Malware Behavior
CNIT 126 Ch 11: Malware BehaviorCNIT 126 Ch 11: Malware Behavior
CNIT 126 Ch 11: Malware Behavior
 
Awesome Concurrency with Elixir Tasks
Awesome Concurrency with Elixir TasksAwesome Concurrency with Elixir Tasks
Awesome Concurrency with Elixir Tasks
 
CNIT 126 7: Analyzing Malicious Windows Programs
CNIT 126 7: Analyzing Malicious Windows ProgramsCNIT 126 7: Analyzing Malicious Windows Programs
CNIT 126 7: Analyzing Malicious Windows Programs
 
CNIT 126 Ch 0: Malware Analysis Primer & 1: Basic Static Techniques
CNIT 126 Ch 0: Malware Analysis Primer & 1: Basic Static TechniquesCNIT 126 Ch 0: Malware Analysis Primer & 1: Basic Static Techniques
CNIT 126 Ch 0: Malware Analysis Primer & 1: Basic Static Techniques
 
Practical Malware Analysis: Ch 7: Analyzing Malicious Windows Programs
Practical Malware Analysis: Ch 7: Analyzing Malicious Windows Programs Practical Malware Analysis: Ch 7: Analyzing Malicious Windows Programs
Practical Malware Analysis: Ch 7: Analyzing Malicious Windows Programs
 
Basic Dynamic Analysis of Malware
Basic Dynamic Analysis of MalwareBasic Dynamic Analysis of Malware
Basic Dynamic Analysis of Malware
 
9: OllyDbg
9: OllyDbg9: OllyDbg
9: OllyDbg
 
"Automated Malware Analysis" de Gabriel Negreira Barbosa, Malware Research an...
"Automated Malware Analysis" de Gabriel Negreira Barbosa, Malware Research an..."Automated Malware Analysis" de Gabriel Negreira Barbosa, Malware Research an...
"Automated Malware Analysis" de Gabriel Negreira Barbosa, Malware Research an...
 
CNIT 126 Ch 9: OllyDbg
CNIT 126 Ch 9: OllyDbgCNIT 126 Ch 9: OllyDbg
CNIT 126 Ch 9: OllyDbg
 
Practical Malware Analysis: Ch 11: Malware Behavior
Practical Malware Analysis: Ch 11: Malware BehaviorPractical Malware Analysis: Ch 11: Malware Behavior
Practical Malware Analysis: Ch 11: Malware Behavior
 
CNIT 126 Ch 7: Analyzing Malicious Windows Programs
CNIT 126 Ch 7: Analyzing Malicious Windows ProgramsCNIT 126 Ch 7: Analyzing Malicious Windows Programs
CNIT 126 Ch 7: Analyzing Malicious Windows Programs
 
Materials Project Validation, Provenance, and Sandboxes by Dan Gunter
Materials Project Validation, Provenance, and Sandboxes by Dan GunterMaterials Project Validation, Provenance, and Sandboxes by Dan Gunter
Materials Project Validation, Provenance, and Sandboxes by Dan Gunter
 

Similar to Libor Mořkovský - Recognizing Malware

DEF CON 27 - CHRISTOPHER ROBERTS - firmware slap
DEF CON 27 - CHRISTOPHER ROBERTS - firmware slapDEF CON 27 - CHRISTOPHER ROBERTS - firmware slap
DEF CON 27 - CHRISTOPHER ROBERTS - firmware slap
Felipe Prado
 
Using Static Binary Analysis To Find Vulnerabilities And Backdoors in Firmware
Using Static Binary Analysis To Find Vulnerabilities And Backdoors in FirmwareUsing Static Binary Analysis To Find Vulnerabilities And Backdoors in Firmware
Using Static Binary Analysis To Find Vulnerabilities And Backdoors in Firmware
Lastline, Inc.
 
Eusecwest
EusecwestEusecwest
Eusecwest
zynamics GmbH
 
Watchtowers of the Internet - Source Boston 2012
Watchtowers of the Internet - Source Boston 2012Watchtowers of the Internet - Source Boston 2012
Watchtowers of the Internet - Source Boston 2012
Stephan Chenette
 
Building next gen malware behavioural analysis environment
Building next gen malware behavioural analysis environment Building next gen malware behavioural analysis environment
Building next gen malware behavioural analysis environment
isc2-hellenic
 
Software Analytics: Data Analytics for Software Engineering and Security
Software Analytics: Data Analytics for Software Engineering and SecuritySoftware Analytics: Data Analytics for Software Engineering and Security
Software Analytics: Data Analytics for Software Engineering and Security
Tao Xie
 
Adversarial machine learning for av software
Adversarial machine learning for av softwareAdversarial machine learning for av software
Adversarial machine learning for av software
junseok seo
 
Patching Windows Executables with the Backdoor Factory | DerbyCon 2013
Patching Windows Executables with the Backdoor Factory | DerbyCon 2013Patching Windows Executables with the Backdoor Factory | DerbyCon 2013
Patching Windows Executables with the Backdoor Factory | DerbyCon 2013
midnite_runr
 
The Hacking Games - Operation System Vulnerabilities Meetup 29112022
The Hacking Games - Operation System Vulnerabilities Meetup 29112022The Hacking Games - Operation System Vulnerabilities Meetup 29112022
The Hacking Games - Operation System Vulnerabilities Meetup 29112022
lior mazor
 
Defending Your "Gold"
Defending Your "Gold"Defending Your "Gold"
Defending Your "Gold"
Will Schroeder
 
You suck at Memory Analysis
You suck at Memory AnalysisYou suck at Memory Analysis
You suck at Memory Analysis
Francisco Ribeiro
 
ShaREing Is Caring
ShaREing Is CaringShaREing Is Caring
ShaREing Is Caring
sporst
 
Malware collection and analysis
Malware collection and analysisMalware collection and analysis
Malware collection and analysis
Chong-Kuan Chen
 
CONFidence 2017: Hiding in plain sight (Adam Burt)
CONFidence 2017: Hiding in plain sight (Adam Burt)CONFidence 2017: Hiding in plain sight (Adam Burt)
CONFidence 2017: Hiding in plain sight (Adam Burt)
PROIDEA
 
Intro2 malwareanalysisshort
Intro2 malwareanalysisshortIntro2 malwareanalysisshort
Intro2 malwareanalysisshort
Vincent Ohprecio
 
BinaryPig - Scalable Malware Analytics in Hadoop
BinaryPig - Scalable Malware Analytics in HadoopBinaryPig - Scalable Malware Analytics in Hadoop
BinaryPig - Scalable Malware Analytics in Hadoop
Jason Trost
 
Sans london april sans at night - tearing apart a fileless malware sample
Sans london april   sans at night - tearing apart a fileless malware sampleSans london april   sans at night - tearing apart a fileless malware sample
Sans london april sans at night - tearing apart a fileless malware sample
Michel Coene
 
0box Analyzer--Afterdark Runtime Forensics for Automated Malware Analysis and...
0box Analyzer--Afterdark Runtime Forensics for Automated Malware Analysis and...0box Analyzer--Afterdark Runtime Forensics for Automated Malware Analysis and...
0box Analyzer--Afterdark Runtime Forensics for Automated Malware Analysis and...
Wayne Huang
 
Secure Because Math: A Deep-Dive on Machine Learning-Based Monitoring (#Secur...
Secure Because Math: A Deep-Dive on Machine Learning-Based Monitoring (#Secur...Secure Because Math: A Deep-Dive on Machine Learning-Based Monitoring (#Secur...
Secure Because Math: A Deep-Dive on Machine Learning-Based Monitoring (#Secur...
Alex Pinto
 
Discovering Vulnerabilities For Fun and Profit
Discovering Vulnerabilities For Fun and ProfitDiscovering Vulnerabilities For Fun and Profit
Discovering Vulnerabilities For Fun and Profit
Abhisek Datta
 

Similar to Libor Mořkovský - Recognizing Malware (20)

DEF CON 27 - CHRISTOPHER ROBERTS - firmware slap
DEF CON 27 - CHRISTOPHER ROBERTS - firmware slapDEF CON 27 - CHRISTOPHER ROBERTS - firmware slap
DEF CON 27 - CHRISTOPHER ROBERTS - firmware slap
 
Using Static Binary Analysis To Find Vulnerabilities And Backdoors in Firmware
Using Static Binary Analysis To Find Vulnerabilities And Backdoors in FirmwareUsing Static Binary Analysis To Find Vulnerabilities And Backdoors in Firmware
Using Static Binary Analysis To Find Vulnerabilities And Backdoors in Firmware
 
Eusecwest
EusecwestEusecwest
Eusecwest
 
Watchtowers of the Internet - Source Boston 2012
Watchtowers of the Internet - Source Boston 2012Watchtowers of the Internet - Source Boston 2012
Watchtowers of the Internet - Source Boston 2012
 
Building next gen malware behavioural analysis environment
Building next gen malware behavioural analysis environment Building next gen malware behavioural analysis environment
Building next gen malware behavioural analysis environment
 
Software Analytics: Data Analytics for Software Engineering and Security
Software Analytics: Data Analytics for Software Engineering and SecuritySoftware Analytics: Data Analytics for Software Engineering and Security
Software Analytics: Data Analytics for Software Engineering and Security
 
Adversarial machine learning for av software
Adversarial machine learning for av softwareAdversarial machine learning for av software
Adversarial machine learning for av software
 
Patching Windows Executables with the Backdoor Factory | DerbyCon 2013
Patching Windows Executables with the Backdoor Factory | DerbyCon 2013Patching Windows Executables with the Backdoor Factory | DerbyCon 2013
Patching Windows Executables with the Backdoor Factory | DerbyCon 2013
 
The Hacking Games - Operation System Vulnerabilities Meetup 29112022
The Hacking Games - Operation System Vulnerabilities Meetup 29112022The Hacking Games - Operation System Vulnerabilities Meetup 29112022
The Hacking Games - Operation System Vulnerabilities Meetup 29112022
 
Defending Your "Gold"
Defending Your "Gold"Defending Your "Gold"
Defending Your "Gold"
 
You suck at Memory Analysis
You suck at Memory AnalysisYou suck at Memory Analysis
You suck at Memory Analysis
 
ShaREing Is Caring
ShaREing Is CaringShaREing Is Caring
ShaREing Is Caring
 
Malware collection and analysis
Malware collection and analysisMalware collection and analysis
Malware collection and analysis
 
CONFidence 2017: Hiding in plain sight (Adam Burt)
CONFidence 2017: Hiding in plain sight (Adam Burt)CONFidence 2017: Hiding in plain sight (Adam Burt)
CONFidence 2017: Hiding in plain sight (Adam Burt)
 
Intro2 malwareanalysisshort
Intro2 malwareanalysisshortIntro2 malwareanalysisshort
Intro2 malwareanalysisshort
 
BinaryPig - Scalable Malware Analytics in Hadoop
BinaryPig - Scalable Malware Analytics in HadoopBinaryPig - Scalable Malware Analytics in Hadoop
BinaryPig - Scalable Malware Analytics in Hadoop
 
Sans london april sans at night - tearing apart a fileless malware sample
Sans london april   sans at night - tearing apart a fileless malware sampleSans london april   sans at night - tearing apart a fileless malware sample
Sans london april sans at night - tearing apart a fileless malware sample
 
0box Analyzer--Afterdark Runtime Forensics for Automated Malware Analysis and...
0box Analyzer--Afterdark Runtime Forensics for Automated Malware Analysis and...0box Analyzer--Afterdark Runtime Forensics for Automated Malware Analysis and...
0box Analyzer--Afterdark Runtime Forensics for Automated Malware Analysis and...
 
Secure Because Math: A Deep-Dive on Machine Learning-Based Monitoring (#Secur...
Secure Because Math: A Deep-Dive on Machine Learning-Based Monitoring (#Secur...Secure Because Math: A Deep-Dive on Machine Learning-Based Monitoring (#Secur...
Secure Because Math: A Deep-Dive on Machine Learning-Based Monitoring (#Secur...
 
Discovering Vulnerabilities For Fun and Profit
Discovering Vulnerabilities For Fun and ProfitDiscovering Vulnerabilities For Fun and Profit
Discovering Vulnerabilities For Fun and Profit
 

More from Machine Learning Prague

Vít Listík - Email.cz workshop
Vít Listík - Email.cz workshopVít Listík - Email.cz workshop
Vít Listík - Email.cz workshop
Machine Learning Prague
 
Lukáš Vrábel - Deep Convolutional Neural Networks
Lukáš Vrábel - Deep Convolutional Neural NetworksLukáš Vrábel - Deep Convolutional Neural Networks
Lukáš Vrábel - Deep Convolutional Neural Networks
Machine Learning Prague
 
Tomáš Cícha - Machine Learning Solutions at Seznam.cz
Tomáš Cícha - Machine Learning Solutions at Seznam.czTomáš Cícha - Machine Learning Solutions at Seznam.cz
Tomáš Cícha - Machine Learning Solutions at Seznam.cz
Machine Learning Prague
 
Jan Pospíšil - Azure ML
Jan Pospíšil - Azure MLJan Pospíšil - Azure ML
Jan Pospíšil - Azure ML
Machine Learning Prague
 
Michael Levin - MatrixNet Applications at Yandex
Michael Levin - MatrixNet Applications at YandexMichael Levin - MatrixNet Applications at Yandex
Michael Levin - MatrixNet Applications at Yandex
Machine Learning Prague
 
Adam Ashenfelter - Finding the Oddballs
Adam Ashenfelter - Finding the OddballsAdam Ashenfelter - Finding the Oddballs
Adam Ashenfelter - Finding the Oddballs
Machine Learning Prague
 
Chris Brew - TR Discover: A Natural Language Interface for Exploring Linked D...
Chris Brew - TR Discover: A Natural Language Interface for Exploring Linked D...Chris Brew - TR Discover: A Natural Language Interface for Exploring Linked D...
Chris Brew - TR Discover: A Natural Language Interface for Exploring Linked D...
Machine Learning Prague
 
Tomáš Mikolov - Distributed Representations for NLP
Tomáš Mikolov - Distributed Representations for NLPTomáš Mikolov - Distributed Representations for NLP
Tomáš Mikolov - Distributed Representations for NLP
Machine Learning Prague
 
Kateřina Veselovská - ML Approaches to Sentiment Analysis
Kateřina Veselovská - ML Approaches to Sentiment AnalysisKateřina Veselovská - ML Approaches to Sentiment Analysis
Kateřina Veselovská - ML Approaches to Sentiment Analysis
Machine Learning Prague
 
Jiří Materna - Artificial Intelligence in Creative Writing
Jiří Materna - Artificial Intelligence in Creative WritingJiří Materna - Artificial Intelligence in Creative Writing
Jiří Materna - Artificial Intelligence in Creative Writing
Machine Learning Prague
 
Jan Šedivý - Intelligent Personal Assistants
Jan Šedivý - Intelligent Personal AssistantsJan Šedivý - Intelligent Personal Assistants
Jan Šedivý - Intelligent Personal Assistants
Machine Learning Prague
 
Marek Rosa - Inventing General Artificial Intelligence: A Vision and Methodology
Marek Rosa - Inventing General Artificial Intelligence: A Vision and MethodologyMarek Rosa - Inventing General Artificial Intelligence: A Vision and Methodology
Marek Rosa - Inventing General Artificial Intelligence: A Vision and Methodology
Machine Learning Prague
 
Xuedong Huang - Deep Learning and Intelligent Applications
Xuedong Huang - Deep Learning and Intelligent ApplicationsXuedong Huang - Deep Learning and Intelligent Applications
Xuedong Huang - Deep Learning and Intelligent Applications
Machine Learning Prague
 

More from Machine Learning Prague (13)

Vít Listík - Email.cz workshop
Vít Listík - Email.cz workshopVít Listík - Email.cz workshop
Vít Listík - Email.cz workshop
 
Lukáš Vrábel - Deep Convolutional Neural Networks
Lukáš Vrábel - Deep Convolutional Neural NetworksLukáš Vrábel - Deep Convolutional Neural Networks
Lukáš Vrábel - Deep Convolutional Neural Networks
 
Tomáš Cícha - Machine Learning Solutions at Seznam.cz
Tomáš Cícha - Machine Learning Solutions at Seznam.czTomáš Cícha - Machine Learning Solutions at Seznam.cz
Tomáš Cícha - Machine Learning Solutions at Seznam.cz
 
Jan Pospíšil - Azure ML
Jan Pospíšil - Azure MLJan Pospíšil - Azure ML
Jan Pospíšil - Azure ML
 
Michael Levin - MatrixNet Applications at Yandex
Michael Levin - MatrixNet Applications at YandexMichael Levin - MatrixNet Applications at Yandex
Michael Levin - MatrixNet Applications at Yandex
 
Adam Ashenfelter - Finding the Oddballs
Adam Ashenfelter - Finding the OddballsAdam Ashenfelter - Finding the Oddballs
Adam Ashenfelter - Finding the Oddballs
 
Chris Brew - TR Discover: A Natural Language Interface for Exploring Linked D...
Chris Brew - TR Discover: A Natural Language Interface for Exploring Linked D...Chris Brew - TR Discover: A Natural Language Interface for Exploring Linked D...
Chris Brew - TR Discover: A Natural Language Interface for Exploring Linked D...
 
Tomáš Mikolov - Distributed Representations for NLP
Tomáš Mikolov - Distributed Representations for NLPTomáš Mikolov - Distributed Representations for NLP
Tomáš Mikolov - Distributed Representations for NLP
 
Kateřina Veselovská - ML Approaches to Sentiment Analysis
Kateřina Veselovská - ML Approaches to Sentiment AnalysisKateřina Veselovská - ML Approaches to Sentiment Analysis
Kateřina Veselovská - ML Approaches to Sentiment Analysis
 
Jiří Materna - Artificial Intelligence in Creative Writing
Jiří Materna - Artificial Intelligence in Creative WritingJiří Materna - Artificial Intelligence in Creative Writing
Jiří Materna - Artificial Intelligence in Creative Writing
 
Jan Šedivý - Intelligent Personal Assistants
Jan Šedivý - Intelligent Personal AssistantsJan Šedivý - Intelligent Personal Assistants
Jan Šedivý - Intelligent Personal Assistants
 
Marek Rosa - Inventing General Artificial Intelligence: A Vision and Methodology
Marek Rosa - Inventing General Artificial Intelligence: A Vision and MethodologyMarek Rosa - Inventing General Artificial Intelligence: A Vision and Methodology
Marek Rosa - Inventing General Artificial Intelligence: A Vision and Methodology
 
Xuedong Huang - Deep Learning and Intelligent Applications
Xuedong Huang - Deep Learning and Intelligent ApplicationsXuedong Huang - Deep Learning and Intelligent Applications
Xuedong Huang - Deep Learning and Intelligent Applications
 

Recently uploaded

Finale of the Year: Apply for Next One!
Finale of the Year: Apply for Next One!Finale of the Year: Apply for Next One!
Finale of the Year: Apply for Next One!
GDSC PJATK
 
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying AheadDigital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Wask
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Malak Abu Hammad
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
Brandon Minnick, MBA
 
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStrDeep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
saastr
 
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdfNunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
flufftailshop
 
AWS Cloud Cost Optimization Presentation.pptx
AWS Cloud Cost Optimization Presentation.pptxAWS Cloud Cost Optimization Presentation.pptx
AWS Cloud Cost Optimization Presentation.pptx
HarisZaheer8
 
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Jeffrey Haguewood
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
MichaelKnudsen27
 
dbms calicut university B. sc Cs 4th sem.pdf
dbms  calicut university B. sc Cs 4th sem.pdfdbms  calicut university B. sc Cs 4th sem.pdf
dbms calicut university B. sc Cs 4th sem.pdf
Shinana2
 
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing InstancesEnergy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Alpen-Adria-Universität
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
Jakub Marek
 
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
Jeffrey Haguewood
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Tosin Akinosho
 
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxOcean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
SitimaJohn
 
WeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation TechniquesWeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation Techniques
Postman
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
innovationoecd
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
ssuserfac0301
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
Ivanti
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
Chart Kalyan
 

Recently uploaded (20)

Finale of the Year: Apply for Next One!
Finale of the Year: Apply for Next One!Finale of the Year: Apply for Next One!
Finale of the Year: Apply for Next One!
 
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying AheadDigital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying Ahead
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
 
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStrDeep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
 
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdfNunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
 
AWS Cloud Cost Optimization Presentation.pptx
AWS Cloud Cost Optimization Presentation.pptxAWS Cloud Cost Optimization Presentation.pptx
AWS Cloud Cost Optimization Presentation.pptx
 
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
 
dbms calicut university B. sc Cs 4th sem.pdf
dbms  calicut university B. sc Cs 4th sem.pdfdbms  calicut university B. sc Cs 4th sem.pdf
dbms calicut university B. sc Cs 4th sem.pdf
 
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing InstancesEnergy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
 
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
 
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxOcean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
 
WeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation TechniquesWeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation Techniques
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
 

Libor Mořkovský - Recognizing Malware

  • 2. Computer virus bacterial cell based on work by Anderson Brito
  • 4. Computer virus Inserting code into files is never “good”. executable file entry point
  • 5. image courtesy of Looking Glass Studios Malware How do you recognize a thief?
  • 6. image courtesy of Looking Glass Studios Twentieth Century Fox Malware How do you recognize a thief?
  • 7. Malware How do you recognize a thief? image courtesy of Looking Glass Studios Twentieth Century Fox Paramount Pictures
  • 8. Malware completely different behaviors are considered “bad” we need a judge to decide who crossed the line • •
  • 9. Malware | Many faces unlike real thieves, malware can be duplicated not only duplicated, but also modified all this is done by machines too much work to judge each one manually • • • •
  • 11. Finding similar files need a file representation need a distance function • •
  • 12. Finding similar files | File vector each executable file is represented by a feature vector the PE format is complex, so we keep exactly one version of the extractor code (C++) the vector comprises static and dynamic features, the exact content is proprietary • • • Database record • One record = constant vector of over 100 attributes • the “file fingerprint” • Each attribute has a data type and semantic Attribute Data Type Semantic sha256 32 byte array CHECKSUM pe_sect_cnt uint16_t VALUE pe_sect_rawoff_entry uint32_t OFFSET • The complete contents of the vector are kept secret • static and dynamic features of PE executables
  • 13. Finding similar files | Distance sum of partial distances each distance operator assigned manually weights assigned manually to equalize contribution • • • Nearest neighbor query • Compound distance function • Data type and semantic determine partial dist. func. Data Type Semantic Partial distance function 32 byte array CHECKSUM RETURN_ZERO uint16_t VALUE EQUAL_RET32 uint32_t OFFSET LOG • Each partial distance function = one kernel function • Over 100 kernels for every NN query • Intermediate results kept in the “Scratchpad”
  • 14. Finding similar files | Data ~60 M data points sparse and well separated (in many cases) • •
  • 15. Finding similar files | Implementation we started with GPUs their high memory throughput allows “naive” implementation and rapid prototyping column-oriented database • • •
  • 16. Classification | Requirements find easily what is responsible for a mistake – transparency fix the problem quickly – tractability • •
  • 18. Classification | Optimizations scaling and HW problems with GPUs we invested in algorithmic optimizations: VP-tree, distance bounded search hand optimized distance function (assembly) CPU version is ~100x faster • • • •
  • 19. Classification | Deployment → FileSHAandu ser id → ←Fileprevale nce ← ← Fileclass ification ← → Filefinger print → ← Generic detections ← ↑ File classifications and Evo-gen detections → Threats → Set updates ↓ Medusa Scavenger Avast users FileRep
  • 20. Classification | Deployment → FileSHAandu ser id → ←Fileprevale nce ← ← Fileclass ification ← → Filefinger print → ← Generic detections ← ↑ File classifications and Evo-gen detections → Threats → Set updates ↓ Medusa Scavenger Avast users FileRep
  • 21. Classification | Deployment → FileSHAandu ser id → ←Fileprevale nce ← ← Fileclass ification ← → Filefinger print → ← Generic detections ← ↑ File classifications and Evo-gen detections → Threats → Set updates ↓ Medusa Scavenger Avast users FileRep
  • 22. Rule generator detect more variants in the wild (our) rule is a conjunction of several conditions known as Win32:Evo-Gen completely different optimization problem than classification - still uses the GPU • • • •
  • 23.
  • 24. Q&A