Understand How Machine Learning Defends Against Zero-Day Threats

1
Vinoo Thomas
Senior Product Manager
Intel Security
Rahul Mohandas
Research Manager
Intel Security


3
Agenda
• Detection Challenges
• Machine Learning Approaches
• Modeling Machine Learning classifiers
• Attacks on Machine Learning Defenses
• Real Protect
• Deep Learning in Sandbox

5
The Age of “Signatures” Is Fading
• Signature-based detection is reactive by nature. Although very precise, it is becoming unsustainable given the sheer number and growth of malware variants
• Malware authors continuously monitor antivirus vendor detections and release new variants
• The use of commercial, open-source, or underground packers and protectors makes repacking new variants trivial
Signatures identify with near certainty that an object is either malicious or clean
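Why signatures are precise but brittle can be shown with a toy sketch. It assumes the simplest possible signature, an exact SHA-256 hash of a known sample; real signatures are usually richer byte patterns, so this exaggerates the effect. The sample bytes are hypothetical stand-ins, not real malware.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of a byte string as hex."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical stand-in for a known malicious sample (not real malware).
original_variant = b"MZ" + b"\x00" * 62 + b"payload: steal credentials"

# A hash-based "signature database" containing only that exact sample.
signature_db = {sha256_hex(original_variant)}


def is_detected(sample: bytes) -> bool:
    """Exact-match lookup: detects a sample only if its hash is known."""
    return sha256_hex(sample) in signature_db


# A "new variant": same payload, one padding byte changed, as happens
# when an author repacks or lightly modifies a binary.
mutated = bytearray(original_variant)
mutated[10] = 0x01
new_variant = bytes(mutated)

print(is_detected(original_variant))  # True: exact match
print(is_detected(new_variant))       # False: one byte differs, so the hash differs
```

The lookup never flags anything it has not seen byte for byte, which is the near-certainty the slide describes, and also why every trivial repack needs a new signature.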

6
Detection Challenges
Image: https://www2.picturepush.com
What did this snake eat for lunch? ;)

7
Unpacking Challenges
Think of a packed file as an executable inside another executable, which can itself be inside yet another executable.
Think Russian dolls (Matryoshka).
When executed, the “outer” executable unpacks the contents of the “inner” executable into memory and executes it.
Image: https://www.pinterest.com
The innermost executable is the “real” executable!
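The Russian-doll idea can be sketched with a toy packer. This assumes a hypothetical `PACK` marker and uses zlib as the "packing" step; real packers use their own stubs, compression, and encryption, and unpack in memory at run time rather than in an analysis script.

```python
import zlib

MARKER = b"PACK"  # hypothetical header marking one packed layer


def pack(payload: bytes) -> bytes:
    """Wrap a payload in one compressed 'outer executable' layer."""
    return MARKER + zlib.compress(payload)


def unpack_all(blob: bytes) -> tuple[bytes, int]:
    """Peel layers until no marker remains; return the innermost payload and depth."""
    depth = 0
    while blob.startswith(MARKER):
        blob = zlib.decompress(blob[len(MARKER):])
        depth += 1
    return blob, depth


real_executable = b"MZ...innermost 'real' executable..."
nested = pack(pack(pack(real_executable)))  # three Russian dolls

inner, layers = unpack_all(nested)
print(layers)                    # 3
print(inner == real_executable)  # True
```

Only after every layer is peeled do the features of the "real" executable become visible to analysis.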

8
Field Example—Mimikatz
Source: http://blog.gentilkiwi.com/mimikatz

11
Mimikatz Detection
Resources, strings, packer and compiler details, compile time, APIs, and function calls are readily available for authoring signatures.
The native binary has thousands of interesting features!
Image: http://www.abcya.com/word_clouds.htm
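One of the static features listed above, embedded strings, can be pulled from raw bytes much like the Unix `strings` tool does. This is a minimal sketch assuming ASCII-only strings of at least four characters; the byte fragment is hypothetical and only imitates the kind of text an unpacked tool carries.

```python
import re

# Printable ASCII runs of at least MIN_LEN characters, like the Unix `strings` tool.
MIN_LEN = 4
_PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{%d,}" % MIN_LEN)


def extract_strings(data: bytes) -> list[str]:
    """Return printable ASCII strings found in raw binary data."""
    return [m.group().decode("ascii") for m in _PRINTABLE_RUN.finditer(data)]


# Hypothetical binary fragment with embedded command and API-name strings.
sample = b"\x00\x01sekurlsa::logonpasswords\x00\xff\xfeLsaOpenPolicy\x00ab\x00"
print(extract_strings(sample))
# ['sekurlsa::logonpasswords', 'LsaOpenPolicy']
```

The two-character run `ab` is dropped by the minimum length, which is the usual trade-off between noise and coverage.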

12
Modifying A Compiled Binary
Source: http://www.gironsec.com

13
Mimikatz—Packed with MPRESS

14
Mimikatz—Post MPRESS
The packer destroys previously available static features, making them unavailable!
Only limited choices remain for authoring a generic signature.
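The loss of static features after packing can be illustrated by compressing a payload and checking which of its strings remain visible. This sketch uses zlib purely as a stand-in for a real packer such as MPRESS, and the payload strings are hypothetical.

```python
import re
import zlib

# Printable ASCII runs of 8+ characters: a crude static "strings" feature.
_LONG_STRING = re.compile(rb"[\x20-\x7e]{8,}")


def long_strings(data: bytes) -> set[bytes]:
    """Return the set of long printable strings visible in raw bytes."""
    return set(_LONG_STRING.findall(data))


# Hypothetical unpacked payload carrying API names and a message (not real malware).
names = [b"GetProcAddress", b"LoadLibraryA", b"VirtualAlloc", b"credential dump complete"]
unpacked = b"".join(name + b"\x00" for name in names) * 20

# zlib stands in for a real packer's compression stage.
packed = zlib.compress(unpacked, 9)

before = long_strings(unpacked)
surviving = sorted(s for s in before if s in packed)
print(len(before))  # 4 distinct strings before packing
print(surviving)    # []: none of them are visible after packing
```

A signature written against those strings would no longer match the packed file, even though decompressing it restores the identical payload.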

15
VBS/Houdini—Initial Variant

16
VBS/Houdini—Subsequent Variants

17
Machine Learning Approaches

18
Sources of Features
Static Analysis (file type, resources, meta-data)
Fuzzy Hashing (identical byte or checksum sequences)
Import Address Hash (function calls, order of function calls)
Dynamic Analysis (file system, registry, network behaviors)
Memory Analysis (process or system memory analysis)
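The import-hash idea (function calls and their order) can be sketched in the spirit of the widely used "imphash": lowercase each `dll.function` pair, drop common extensions, keep import order, and hash the joined list. This is an illustrative approximation, not a byte-exact reimplementation of any tool, and the import tables are hypothetical; a real system would read them from the PE import directory.

```python
import hashlib


def import_hash(imports: list[tuple[str, str]]) -> str:
    """Imphash-style digest: MD5 over ordered, lowercased 'dll.function' pairs."""
    parts = []
    for dll, func in imports:
        name = dll.lower()
        base, _, ext = name.rpartition(".")
        if ext in ("dll", "sys", "ocx") and base:
            name = base  # "kernel32.dll" -> "kernel32"
        parts.append(f"{name}.{func.lower()}")
    return hashlib.md5(",".join(parts).encode()).hexdigest()


# Hypothetical import tables of two builds of the same tool.
variant_a = [("KERNEL32.dll", "VirtualAlloc"), ("KERNEL32.dll", "CreateProcessW"),
             ("ADVAPI32.dll", "RegSetValueExW")]
variant_b = [("kernel32.DLL", "virtualalloc"), ("kernel32.dll", "createprocessw"),
             ("advapi32.dll", "regsetvalueexw")]
reordered = list(reversed(variant_a))

print(import_hash(variant_a) == import_hash(variant_b))  # True: same imports, same order
print(import_hash(variant_a) == import_hash(reordered))  # False: order matters
```

Because it ignores everything except the import table, such a hash can group variants whose bytes differ, as long as the packer has not replaced the imports.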

19
Leveraging Multiple Sources of Knowledge
• Identify a suspicious characteristic or activity
• If existing signature-based methods don’t detect the object, it is given a reputation and a confidence level
• Pre-execution: static file feature extraction (file type, import hash, entry point, resources, strings, packer and compiler details, compile time, APIs, section names)
• Post-execution: behavioral features and memory analysis (behavioral sequence, process tree, file system, registry events, network communication events, mutexes, strings from memory)
A hybrid approach provides the best classification rates!
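The flow above, signatures first and then a reputation with a confidence level from pre- and post-execution models, can be sketched as follows. The equal weighting, the 0.5 threshold, and the confidence formula are hypothetical choices for illustration; the slides do not specify how Real Protect combines its scores.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Verdict:
    source: str        # which layer produced the verdict
    malicious: bool
    confidence: float  # 0.0 to 1.0


def reputation_verdict(sample_hash: str, static_score: float,
                       behavior_score: Optional[float], signatures: set[str],
                       threshold: float = 0.5) -> Verdict:
    """Signatures first; otherwise turn model scores into a reputation."""
    if sample_hash in signatures:
        return Verdict("signature", True, 1.0)
    if behavior_score is None:
        # Pre-execution only: rely on the static model's score.
        score, source = static_score, "static-model"
    else:
        # Hypothetical equal weighting of static and behavioral model scores.
        score, source = (static_score + behavior_score) / 2, "hybrid-model"
    malicious = score >= threshold
    confidence = score if malicious else 1.0 - score
    return Verdict(source, malicious, confidence)


print(reputation_verdict("abc", 0.4, None, set()))  # static model: leans clean
print(reputation_verdict("abc", 0.4, 0.9, set()))   # behavior tips it to malicious
```

The usage lines show the point of a hybrid: a borderline static score can be resolved once post-execution behavior is observed.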

20
Extracting Static Features
• File type, resources, and strings
• Packer and compiler details
• Compile time, entry point
• Import address hash
• Function calls and APIs
Ransomware: CTB-Locker (pre-execution)
Image: http://www.abcya.com/word_clouds.htm

21
Extracting Behavioral Features
File system, registry, and network changes as it begins encrypting files
Ransomware: CTB-Locker (post-execution)

22
Building Feature Vectors
Features:
CreateProcess("c:\user\roaming\malware.exe")
CreateRegistryKey("HKLM","Software\CTB-Locker")
SetRegistryValue("InstallDate","213355533")
GetEntryPoint("Return Address", 55 EB)
Feature hashes: AF12ACE76D, F2A212AC6E, 22F1CAFFA8, BBAF11284E
Feature vector: AF12ACE76D F2A212AC6E 22F1CAFFA8 BBAF11284E
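The slide hashes each observed feature and concatenates the hashes. A common way to turn such hashed features into a fixed-length numeric vector is the "hashing trick", sketched below as a swapped-in technique; the vector size and the choice of SHA-1 are assumptions. A stable hash is used because Python's built-in `hash()` for strings is randomized per process.

```python
import hashlib

DIM = 16  # hypothetical vector size; real systems use far larger spaces


def feature_index(feature: str, dim: int = DIM) -> int:
    """Map a feature string to a bucket using a stable hash (SHA-1 here)."""
    digest = hashlib.sha1(feature.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "little") % dim


def to_vector(features: list[str], dim: int = DIM) -> list[int]:
    """Hashing trick: count each feature in its hashed bucket."""
    vec = [0] * dim
    for f in features:
        vec[feature_index(f, dim)] += 1
    return vec


events = [
    r'CreateProcess("c:\user\roaming\malware.exe")',
    r'CreateRegistryKey("HKLM","Software\CTB-Locker")',
    'SetRegistryValue("InstallDate","213355533")',
    'GetEntryPoint("Return Address", 55 EB)',
]
vec = to_vector(events)
print(vec)
print(sum(vec))  # 4: one count per event
```

Every sample maps to a vector of the same length regardless of how many distinct events it produced, which is what most classifiers expect as input.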

23
Unsupervised Machine Learning
Figure: dogs plotted by height and weight
We are given a large set of dogs of different breeds (Chihuahuas, Beagles, Dachshunds). We can use two features to distinguish them: their height and weight. How can we determine which breed each dog belongs to?

24
Similarity: Prototype-Based Clustering
Figure: dogs plotted by height and weight, grouped into Chihuahua, Beagle, and Dachshund clusters
Similarity is the Euclidean distance between two objects
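Prototype-based clustering of the dog example can be sketched with k-means, one standard prototype-based method (the slides do not name the algorithm). The heights (cm) and weights (kg) are made-up values, and the initial prototypes are simply picked from the data; in practice features on different scales are usually normalized first.

```python
import math

Point = tuple[float, float]  # (height in cm, weight in kg)


def kmeans(points: list[Point], centroids: list[Point], iterations: int = 10):
    """Assign each point to its nearest prototype (Euclidean distance),
    then move each prototype to the mean of its cluster; repeat."""
    clusters: list[list[Point]] = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        centroids = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical measurements: small and light, short and heavier, taller.
dogs = [(20, 2.5), (22, 3.0), (18, 2.0),
        (23, 9.5), (21, 10.5), (24, 11.0),
        (38, 11.0), (36, 10.0), (39, 12.5)]

centroids, clusters = kmeans(dogs, [dogs[0], dogs[3], dogs[6]])
print([len(c) for c in clusters])  # [3, 3, 3]
print([tuple(round(v, 1) for v in c) for c in centroids])
```

The algorithm only finds the groups; naming them "Chihuahuas", "Dachshunds", and "Beagles" is still up to a human or a labeled reference set.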

25
Similarity: Classification Based on Clustering
Figure: a dog is classified as a Beagle based on the Chihuahua, Beagle, and Dachshund clusters (height vs. weight)
Similarity is the Euclidean distance between two objects
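Classification based on clustering can be sketched as a nearest-prototype rule: a new sample gets the label of the closest cluster center. The breed prototypes below are hypothetical (height cm, weight kg) values, not measurements from the presentation.

```python
import math

# Hypothetical breed prototypes (cluster centers): (height in cm, weight in kg).
prototypes = {
    "Chihuahua": (20.0, 2.5),
    "Dachshund": (22.7, 10.3),
    "Beagle": (37.7, 11.2),
}


def nearest_breed(dog: tuple[float, float]) -> str:
    """Label a new dog with the breed whose prototype is nearest (Euclidean)."""
    return min(prototypes, key=lambda breed: math.dist(dog, prototypes[breed]))


print(nearest_breed((35.0, 10.5)))  # Beagle
print(nearest_breed((19.0, 2.2)))   # Chihuahua
```

Replacing dogs with file feature vectors and breeds with malware families or "clean" gives the same mechanism applied to samples.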

26
Classification with Real Protect
Graphical representation of clusters of similar samples

27
Modeling Machine Learning Classifiers

28
Modeling a Machine Learning Classifier
Input Data
• Executables, compiled code, documents
Feature Engineering
• N-grams, entropy of sections
Labels
• Is it malicious or clean?
• Does it belong to a certain malware family?
• Capabilities (keyloggers, backdoors)
Model
• Assigns a sample to an output class
• Support vector machines, Naïve Bayes,
random forests, neural networks
Figure: neural network diagram (hidden layers, output layer)
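The two feature-engineering examples named above, n-grams and section entropy, can be sketched over raw bytes. In practice the entropy would be computed per PE section using a parser; here plain byte strings stand in for sections. High entropy (close to 8 bits per byte) is typical of compressed or encrypted content such as packed code.

```python
import math
from collections import Counter


def byte_ngrams(data: bytes, n: int = 2) -> Counter:
    """Count overlapping byte n-grams."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return sum((c / total) * math.log2(total / c) for c in Counter(data).values())


print(byte_ngrams(b"ABAB", 2))             # Counter({b'AB': 2, b'BA': 1})
print(shannon_entropy(b"\x00" * 64))       # 0.0: a constant "section"
print(shannon_entropy(bytes(range(256))))  # 8.0: every byte value once
```

The n-gram counts and per-section entropies become columns of the feature vector that the model then assigns to an output class.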

29
Attacking Machine Learning Defenses

30
Exploratory: Obfuscate to Evade Detection

31
Causative: Poisoning Sample Collections
1. Insert signature fragments into clean files
2. Submit the samples to VirusTotal or any other public malware collection site
3. A trusted vendor starts detecting those files
4. Many vendors reshare the samples and trust the malicious classification
5. A vendor uses the “malicious” samples to train its models
6. The model potentially produces false positives (FPs) on clean files

32
Source: Virus Bulletin

33
Source: Reuters

34
Defenses Against Machine Learning Attacks
Exploratory attack
• Training data: prevent the attacker from knowing the training data
• Feature selection: harden classifiers against attack by using multiple features
Causative attack: the attacker has some degree of control over the training data, so learning should be resilient to poisoning attacks
• Empirically analyze training instances to make the model more resilient
• Human-in-the-loop approach
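One simple form of empirical analysis of training instances is a k-nearest-neighbor consistency check: flag any sample whose label disagrees with most of its neighbors, and route it to an analyst (the human in the loop). This is a swapped-in illustration, not the specific method the presenters use; the 2-D feature vectors and the poisoned sample are hypothetical.

```python
import math
from collections import Counter


def filter_suspicious(samples, k=3):
    """Split (features, label) pairs into kept and flagged, flagging any
    sample whose label disagrees with the majority of its k nearest neighbors."""
    kept, flagged = [], []
    for i, (x, label) in enumerate(samples):
        others = [s for j, s in enumerate(samples) if j != i]
        neighbors = sorted(others, key=lambda s: math.dist(x, s[0]))[:k]
        majority, _ = Counter(lbl for _, lbl in neighbors).most_common(1)[0]
        (kept if majority == label else flagged).append((x, label))
    return kept, flagged


training = [
    ((0.0, 0.1), "clean"), ((0.1, 0.0), "clean"),
    ((0.2, 0.2), "clean"), ((0.1, 0.3), "clean"),
    ((5.0, 5.1), "malicious"), ((5.2, 4.9), "malicious"), ((4.8, 5.0), "malicious"),
    # Poisoned: a clean-looking file labeled malicious because of inserted fragments.
    ((0.15, 0.1), "malicious"),
]

kept, flagged = filter_suspicious(training)
print(flagged)  # [((0.15, 0.1), 'malicious')]: sent for human review
```

The check only catches poisoned samples that look unlike their claimed class; coordinated poisoning of many samples would need stronger defenses.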

36
Real Protect
• Detects zero-day malware in near real time
• Classification of malware based on behavior and static analysis
• Uses machine learning to automate classification
• Signature-less, small client footprint
• Supports both offline and online (cloud) classification modes
• Improves detection by up to 30% on top of .DAT and McAfee® Global Threat Intelligence detections
• Augments McAfee endpoint security products for Windows
• Produces actionable threat intelligence
• Useful for patient-zero discovery, threat actor attribution, and forensic investigations
• Available now!
• Standalone: www.mcafee.com/us/downloads/free-tools/raptor.aspx
• Consumer Cloud AV product
• Enterprise availability in McAfee Endpoint Security 10.5 this year

37
McAfee® Endpoint Security 10 Threat Prevention
Layered Approach
• Pre-execution: Whitelisting (Hash + Cert), .DAT, McAfee Global Threat Intelligence, McAfee Threat Intelligence Exchange (Hash + Cert), Real Protect - Static
• Post-execution: Dynamic App Containment, Real Protect - Behavioral
Modules: Threat Prevention, Web Control, Firewall, TIE, Future Modules

38
Deep Learning in the Sandbox

39
ATDml technology in a Nutshell
ATDml is a signature-less deep learning classifier that leverages sandboxing technology to achieve a high-precision malware conviction rate

40
Deep Learning in the Sandbox
Figure: malware samples run in the Sandbox (original binary, unpacked file, behavior) produce a feature vector, which passes through feature normalization and dimensionality reduction into a deep learning framework (input layer, hidden layers, output layer); training yields trained parameters, which are then used for prediction
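The prediction path in the figure (feature vector, normalization, dimensionality reduction, then a network with input, hidden, and output layers) can be sketched in plain Python. Everything specific here is hypothetical: the four sandbox features, the fixed projection matrix standing in for a real reduction method such as PCA, and the untrained placeholder weights standing in for the trained parameters.

```python
import math

# Hypothetical sandbox features per sample: files written, registry keys set,
# network connections, unpacked-file size in KB.
raw = [[120, 3, 0, 45], [5, 1, 0, 2], [300, 12, 4, 80]]

# Placeholder dimensionality reduction: a fixed 4 -> 2 linear projection.
PROJECTION = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]

# Untrained placeholder parameters; real ones come from the training phase.
HIDDEN_W = [[1.2, -0.4], [0.3, 0.9]]
HIDDEN_B = [0.0, 0.1]
OUT_W = [[1.5, 1.1]]
OUT_B = [-1.0]


def min_max_normalize(rows: list[list[float]]) -> list[list[float]]:
    """Feature normalization: rescale each column into [0, 1]."""
    lows = [min(col) for col in zip(*rows)]
    highs = [max(col) for col in zip(*rows)]
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, lo, hi in zip(row, lows, highs)] for row in rows]


def linear(x: list[float], weights: list[list[float]], bias: list[float]) -> list[float]:
    """Dense layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def predict(x: list[float]) -> float:
    """Input layer -> ReLU hidden layer -> sigmoid output (score in (0, 1))."""
    hidden = [max(0.0, h) for h in linear(x, HIDDEN_W, HIDDEN_B)]
    logit = linear(hidden, OUT_W, OUT_B)[0]
    return 1.0 / (1.0 + math.exp(-logit))


scores = [predict(linear(row, PROJECTION, [0.0, 0.0])) for row in min_max_normalize(raw)]
print([round(s, 3) for s in scores])
```

Normalizing before the network keeps large raw counts (such as files written) from dominating smaller ones; the output is a score that a threshold would turn into a conviction.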

41
What Are We Going to Demo Here?
1. Show advanced ways of evading detection by using a crypter that adds static and behavioral evasion
2. Show how deep learning in the sandbox is able to detect the most evasive and previously unseen malware
Unmask the Attack

44
ATDml Value Proposition
1. Zero-day detection by deep analysis: efficient classification of new and previously unseen malware by leveraging deep learning
2. Resilience to evasion: the model is designed to be highly resilient to evasive techniques used to bypass detection
3. Identify intention of attack: the ability to perform malware attribution to identify the intention of the attack

Intel and the Intel and McAfee logos are trademarks of Intel Corporation in the US and/or other countries. Other marks and brands may be claimed as the property of others. The product
plans, specifications and descriptions herein are provided for information only and subject to change without notice, and are provided without warranty of any kind, express or implied.
Copyright © 2016 Intel Corporation.
