Talk at the Workshop for Robustness of AI Systems Against Adversarial Attacks 2020 (RAISA3)
https://www.skrasser.com/blog/2020/08/31/adversarial-machine-learning-and-robust-classification/
Of Search Lights and Blind Spots: Machine Learning in Cybersecurity
1. 2020 CROWDSTRIKE, INC. ALL RIGHTS RESERVED.
OF SEARCH LIGHTS AND BLIND SPOTS:
MACHINE LEARNING IN
CYBERSECURITY
SVEN KRASSER, CHIEF SCIENTIST, CROWDSTRIKE
2. WHO?
§ CrowdStrike
§ Endpoint protection & breach
prevention
§ Endpoint sensor connecting to the cloud
§ Processing 3 trillion events per week
§ My team: Data Science
§ Malware and threat research
§ Sandbox and dynamic analysis
§ Data engineering
§ Machine Learning research
§ Machine Learning software development
§ Hybrid-Analysis.com
9. PROJECTIONS THROUGH 2022
Source: Gartner (2019)
§ 75% of data governance initiatives will not adequately consider AI security risks, resulting in financial loss
§ 30% of cyberattacks will leverage data poisoning, model theft, or adversarial samples
10. “DO YOU SECURE YOUR ML SYSTEMS TODAY?"
Source: Shankar et al., “Adversarial Machine Learning – Industry Perspectives” (2020)
14%* answered “Yes”
* ⅓ of organizations polled are in the cybersecurity space
13. WHY TALK ABOUT THIS FIELD TODAY?
§ Data is plentiful and unencumbered
§ Challenges translate into other domains
§ Static analysis, while limited, is a cheap workhorse
§ Reducing volume of low-effort attacks
§ Saving compute (and hence dollars) for more complex analysis
§ Pre-execution detection
§ Detection on-the-wire (attachment) and at rest (storage)
14. [Chart: detection rate over time. The rate drops when new malware appears and recovers about one day later with the next AV update.]
18. HOW THE GAME WAS PLAYED
Manual evasions and corresponding countermeasures
19. TRADITIONAL ATTACKER ARSENAL
§ Hashbusting
§ Polymorphism
§ Packing
§ Droppers
§ File infectors / hiding in regular files
§ Wrapped scripts
21. ① Adversaries focus on traditional evasions, which stick out to ML
② Adversaries target ML blind spots
③ Adversaries leverage ML for robust evasions
The panacea “track”
23. WORKING IN FEATURE SPACE
§ Choosing a feature space that always produces realizable files
§ Such as specific binary traits that can be added (but not necessarily removed), e.g. Al-Dujaili
et al. (2018)
§ Imported function names, resources, sections, strings, digital signature, etc.
§ Similar to how an adversary would attack the model
§ Use a substitute model with such a feature space to attack a blackbox model
§ E.g. MalGAN, Hu and Tan (2017)
§ Create (likely) unrealizable feature vectors with some utility
§ Not a realizable attack itself, but useful for preparing against one
§ Increasing robustness at training time
§ Creating pseudo variants for test time (“new family” scenario)
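As a toy illustration of the additive-only feature space described above (in the spirit of Al-Dujaili et al.), the sketch below attacks a hypothetical linear surrogate model in which binary traits can only be switched on; all weights and features here are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear surrogate over binary traits (imports, strings,
# sections, ...). Positive weights push toward "malicious".
n_features = 20
weights = rng.normal(size=n_features)
bias = 0.5

def score(x):
    """Surrogate maliciousness score; values above 0 would be flagged."""
    return float(x @ weights + bias)

# Additive-only constraint: traits can be added to a file (0 -> 1) but
# existing traits cannot be removed (never 1 -> 0).
x = rng.integers(0, 2, size=n_features).astype(float)

def additive_attack(x, weights, budget=5):
    """Greedily switch on the absent traits with the most negative weights."""
    x_adv = x.copy()
    addable = np.flatnonzero(x_adv == 0)
    for i in sorted(addable, key=lambda i: weights[i])[:budget]:
        if weights[i] < 0:  # only add traits that lower the score
            x_adv[i] = 1.0
    return x_adv

x_adv = additive_attack(x, weights)
```

Because the attack only ever adds traits, any feature vector it produces corresponds to a realizable file; transferring the perturbation to a black-box target model is then the MalGAN-style substitute-model step.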
24. WORKING IN PROBLEM SPACE
A look at both realizable and real-world attacks
25. Ashkenazy and Zini (2019)
“CHAFF” ATTACK
§ Attack on a security vendor production model deployed on endpoints
§ Unconstrained sparse string-based features
§ “This string exists somewhere in the file”
§ Likely heavily weighted
§ Non-monotonic model
§ Extracting strings from files on the product’s whitelist
§ How to toggle the corresponding features?
§ Add the string somewhere
§ Appending to the end of a Portable Executable (the “overlay”) generally keeps the executable working
§ → All modifications realizable
§ Bypass achieved
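A minimal sketch of the overlay-append step of this attack; the strings and the stand-in PE bytes below are hypothetical, and a real attack would pull the strings from actually whitelisted files.

```python
# Hypothetical benign-looking strings (a real attack extracts them from
# files on the product's whitelist).
benign_strings = [
    b"Microsoft Base Cryptographic Provider v1.0",
    b"Copyright (C) Microsoft Corporation",
]

def add_chaff(pe_bytes: bytes, strings) -> bytes:
    """Append NUL-terminated strings past the end of the image (the overlay).
    The Windows loader ignores overlay bytes, so the file still executes."""
    return pe_bytes + b"\x00".join(strings) + b"\x00"

original = b"MZ" + b"\x00" * 128  # stand-in for a real PE image
chaffed = add_chaff(original, benign_strings)
```

Any sparse "string exists somewhere in the file" feature is toggled by this append, which is exactly why unconstrained, non-monotonic string features make the model vulnerable.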
26. Winning Offensive Solution – Fleshman (2019)
ML STATIC EVASION COMPETITION
§ Modify malware to bypass 3 non-production research models
§ MalConv (DNN, raw bytes)
§ Non-negative MalConv
§ EMBER (engineered features and LightGBM; Anderson and Roth, 2018)
§ Modified files are verified in a sandbox environment
§ DNN models have only unconstrained features (data anywhere in the file can nudge the score)
§ EMBER has some unconstrained features
§ Byte entropy histogram (continuous features)
§ Strings
§ Data injected in various areas
§ Overlay
§ New sections
§ Empty space at end of sections (alignment)
§ Bypass achieved
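To make the byte-entropy-histogram point concrete, here is a simplified version of that feature family; the window size and bin counts are illustrative, not EMBER's actual parameters. Appending low-entropy padding (overlay or new sections) shifts mass in the feature vector without touching the original bytes.

```python
import numpy as np

def byte_entropy_histogram(data: bytes, window=1024, step=256, bins=16):
    """Joint histogram over (local entropy, byte value), flattened to a vector.
    Simplified take on the EMBER-style feature; parameters are illustrative."""
    arr = np.frombuffer(data, dtype=np.uint8)
    hist = np.zeros((bins, bins))
    for start in range(0, max(len(arr) - window, 0) + 1, step):
        chunk = arr[start:start + window]
        counts = np.bincount(chunk, minlength=256)
        probs = counts[counts > 0] / len(chunk)
        entropy = -np.sum(probs * np.log2(probs))  # 0..8 bits per byte
        ent_bin = min(int(entropy * bins / 8), bins - 1)
        hist[ent_bin] += np.bincount(chunk >> 4, minlength=bins)
    return hist.flatten() / max(hist.sum(), 1.0)

sample = bytes(range(256)) * 8          # high-entropy stand-in file
padded = sample + b"\x00" * 4096        # low-entropy data appended (overlay)

vec = byte_entropy_histogram(sample)
vec_padded = byte_entropy_histogram(padded)
```

Because the histogram is normalized over the whole file, injected bytes dilute the original distribution: a continuous, unconstrained feature the attacker can nudge from anywhere in the file.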
27. Anderson et al. (2018)
LEARNING TO EVADE
§ Reinforcement Learning approach to pick the best sequence of modifications to
achieve evasiveness
§ Action space
§ Modest evasiveness achieved (but no manual intervention as in previous two
approaches)
§ Add import
§ Change section names
§ Create section
§ Append data to sections
§ New entry point (EP) that jumps to original EP
§ Remove signer info
§ Change debug info
§ Pack / unpack
§ Break header checksum
§ Add to overlay
§ Etc.
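A toy rendition of this setup: a policy repeatedly picks an action from a heavily simplified action space and queries a stand-in classifier. Both the classifier and the actions below are invented for illustration, and a random policy stands in for the learned RL agent.

```python
import random

random.seed(0)

# Heavily simplified action space; the real actions preserve executability.
ACTIONS = {
    "add_overlay": lambda b: b + bytes(random.randrange(256) for _ in range(64)),
    "pad_section": lambda b: b + b"\x00" * 64,
    "rename_marker": lambda b: b.replace(b".text", b".athx", 1),
}

def classifier_score(b: bytes) -> float:
    """Hypothetical target model returning a maliciousness score in [0, 1]."""
    return max(0.0, 1.0 - len(b) / 4096)  # toy stand-in, not a real model

def evade(sample: bytes, threshold=0.7, max_steps=50):
    """Random policy over the action space; an RL agent instead learns which
    action to take in which state, rather than choosing uniformly."""
    for step in range(max_steps):
        if classifier_score(sample) < threshold:
            return sample, step
        sample = ACTIONS[random.choice(list(ACTIONS))](sample)
    return sample, max_steps

evaded, steps = evade(b"MZ.text" + b"\x00" * 512)
```

The RL formulation replaces the random choice with a Q-function over file-state features, so the agent learns which modification sequences are most likely to flip the verdict.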
28. Elkind (2019)
MITIGATING THROUGH REGULARIZATION
§ Premise
§ We know of several perturbation techniques resulting in realizable attacks
§ We want the model to ignore such modifications without constraining the feature space and
reducing expressiveness
§ Pairwise Hidden Regularization
§ Penalize differences in hidden representations h(·) of the DNN between original file x and perturbed file x̃
§ min_θ Loss(θ) + λ ‖h(x, θ) − h(x̃, θ)‖²
§ Training on perturbed pairs
§ Notionally, perturbed files have a modified overlay (appended data)
§ Other modifications can be implemented accordingly (e.g. adding sections)
§ Models more robust; evasions more expensive
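A numpy sketch of the regularized objective under stated assumptions: a tiny one-hidden-layer network stands in for the DNN, and the overlay perturbation is modeled as noise on a slice of the feature vector. All shapes and values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny MLP; h() is the hidden representation the regularizer compares.
W1 = rng.normal(scale=0.1, size=(256, 32))
W2 = rng.normal(scale=0.1, size=(32, 1))

def h(x):
    """Hidden representation h(x, theta)."""
    return np.maximum(x @ W1, 0.0)  # ReLU layer

def forward(x):
    return 1.0 / (1.0 + np.exp(-h(x) @ W2))  # sigmoid output

def pairwise_loss(x, x_pert, y, lam=1.0):
    """Cross-entropy plus lambda * ||h(x) - h(x_pert)||^2, as on the slide."""
    p = np.clip(forward(x), 1e-7, 1 - 1e-7)
    ce = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    reg = np.mean(np.sum((h(x) - h(x_pert)) ** 2, axis=1))
    return ce + lam * reg

# Original features vs. the same file with a perturbed overlay; here the
# perturbation is modeled as noise on the last slice of the feature vector.
x = rng.normal(size=(8, 256))
x_pert = x.copy()
x_pert[:, -16:] += rng.normal(scale=0.5, size=(8, 16))
y = rng.integers(0, 2, size=(8, 1)).astype(float)

loss = pairwise_loss(x, x_pert, y)
```

Minimizing the extra term pushes the network to map a file and its perturbed variants to the same hidden representation, so overlay-style modifications stop moving the score without constraining the feature space itself.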
29. CONCLUSIONS
§ Educating decision makers about ML
§ Off-the-shelf guardrails; best practices for safety
§ Cost reduction for the adversary; means to increase it again
§ Opportunity for defenders to achieve higher levels of robustness
§ Detectability; avoid silent failure