Machine learning (ML) techniques for network intrusion detection have gained notable traction in the web security industry over the past decade. Some Intrusion Detection Systems (IDS) have used these techniques successfully to detect and deflect network intrusions before they could cause significant harm to network services. Simply put, such an IDS builds a statistical model of what normal traffic looks like, using data retrieved from web access logs as input. An online processing system then maintains a model of expected network traffic, malicious traffic, or both. When traffic deviates from the expected model by more than a defined threshold, the IDS flags it as malicious. The theory is that the more data the system sees, the more accurate the model becomes. This provides a flexible system for traffic analysis, seemingly perfect for constantly evolving and growing web traffic patterns.
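In code, this baseline-plus-threshold idea reduces to something like the sketch below (Python; the traffic feature, sample values, and threshold are hypothetical illustrations, not taken from any production IDS):

# Minimal sketch of threshold-based anomaly flagging over a learned baseline.
import numpy as np

baseline = np.array([95, 102, 98, 110, 101, 97, 105])  # "normal" requests/min samples
mu, sigma = baseline.mean(), baseline.std()

def is_anomalous(observation, k=3.0):
    # Flag traffic whose z-score exceeds k standard deviations from the baseline.
    return abs(observation - mu) / sigma > k

print(is_anomalous(104))  # False: close to the learned baseline
print(is_anomalous(480))  # True: far outside expected behavior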
However, this fairy tale did not last long. It was soon found that attackers had been evading detection by 'poisoning' the classifier models these systems rely on (many of them built on Principal Component Analysis, PCA). The adversaries slowly retrain the detection model by sending large volumes of seemingly benign web traffic, making the classifier more tolerant of outliers and of actual malicious attempts. They succeeded.
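A toy simulation of that poisoning loop, using the same simplified sliding-window model as the sketch above (all numbers are synthetic; a real campaign unfolds over far more traffic):

# Toy poisoning simulation: feed the online model traffic that sits just
# inside its tolerance, so every sample looks benign but each one stretches
# the model until a real attack volume no longer stands out.
import numpy as np

rng = np.random.default_rng(0)
window = list(rng.normal(100, 5, 500))  # online model: sliding window of traffic

def flagged(x, k=3.0):
    mu, sigma = np.mean(window), np.std(window)
    return abs(x - mu) / sigma > k

attack = 400  # the traffic volume the attacker ultimately wants to send
print("flagged before poisoning:", flagged(attack))  # True

poisoned = 0
while flagged(attack) and poisoned < 20_000:
    mu, sigma = np.mean(window), np.std(window)
    window.append(mu + 2.9 * sigma)  # benign-looking: just under the threshold
    window.pop(0)                    # the model forgets old, clean traffic
    poisoned += 1

print(f"flagged after {poisoned} poisoned samples:", flagged(attack))  # False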
In this talk, we will give a live demo of this 'model-poisoning' attack and analyze methods that have been proposed to make ML-based network anomaly detection systems less susceptible to manipulation by attackers. Instead of diving into the ML theory, we will focus on examples of these systems working in the real world, the attacks that render them impotent, and what this means for developers looking to protect themselves from network intrusion. Most importantly, we will look towards the future of ML-based network intrusion detection.
3. My goal
• give an overview of Machine Learning Anomaly Detectors
• spark discussions on when/where/how to create these
• explore how “safe” these systems are
• discuss where we go from here
16. The big ML + Anomaly Detection Problem
a lot of machine learning + anomaly detection research, but not a lot of successful systems in the real world.
WHY?
17. The big ML + Anomaly Detection Problem
Anomaly Detection: find novel attacks, identify never-before-seen things
Traditional Machine Learning: learn patterns, identify similar things
18. What makes Anomaly Detection so different?
fundamentally different from other ML problems:
• very high cost of errors
• lack of training data
• “semantic gap”
• difficulties in evaluation
• adversarial setting
19. Really bad if the system is wrong…
• compared to other learning applications, very intolerant to errors
• what happens if we have a high false positive rate? a high false negative rate?
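A back-of-the-envelope base-rate calculation makes the false-positive cost concrete (all rates below are assumed purely for illustration):

# Why even a "good" false positive rate buries analysts in bogus alerts.
requests_per_day = 10_000_000
attack_rate = 0.001   # assume 0.1% of traffic is actually malicious
tpr = 0.99            # detector catches 99% of attacks
fpr = 0.01            # and wrongly flags 1% of benign traffic

attacks = requests_per_day * attack_rate
benign = requests_per_day - attacks
true_alerts = attacks * tpr
false_alerts = benign * fpr

precision = true_alerts / (true_alerts + false_alerts)
print(f"{false_alerts:,.0f} false alerts/day vs {true_alerts:,.0f} real ones")
print(f"only {precision:.1%} of alerts are real attacks")  # ~9%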
20. Lack of training data…
• what data to train the model on?
• so hard to clean input data!
21. Hard to interpret the results/alerts…
the “semantic gap”
ok… I got the alert…
why did I get the alert…?
22. The evaluation problem
• devising a sound evaluation scheme is even more difficult than building the system itself
• problems with relying on ML Anomaly Detection evaluations in academic research papers
24. How have real world AD systems failed?
• many false positives
• hard to find attack-free training data
• used without deep understanding
• model-poisoning
30. How to select features?
• often ends up being the most challenging piece of the problem
• isn’t it just a parameter optimization problem?
31. How to select features?
Difficulties:
• too many possible combinations to iterate!
• hard to evaluate
• frequently changing “optimal”
• performance accuracy is not the only criterion; also consider:
  • improved model interpretability
  • shorter training times
  • enhanced generalization / reduced overfitting
32. Principal Component Analysis
• common statistical method to automatically select features
How?
• transforms the data into a new coordinate system
• returns an ordered list of directions (principal components) that best capture the data’s variance
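A minimal PCA sketch with scikit-learn on synthetic data (feature counts and values are made up for illustration):

# Project synthetic "traffic features" onto principal components and
# inspect how much of the data's variance each component explains.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))  # 1000 samples, 10 raw features
X[:, 1] = 3 * X[:, 0] + rng.normal(scale=0.1, size=1000)  # a correlated feature

pca = PCA(n_components=3)
Z = pca.fit_transform(X)  # the data expressed in the top 3 components

print(pca.explained_variance_ratio_)  # variance captured per component
print(Z.shape)  # (1000, 3): reduced representation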
54. Can Machine Learning be secure?
• not easy to achieve for unsupervised, online learning
• slow adversaries down: it gives you time to detect when you’re being targeted
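One hypothetical way to buy that detection time (a sketch of the idea, not a method from the talk): freeze a reference snapshot of the model at deployment and alarm when the live model drifts away from it faster than benign traffic plausibly could.

# Hypothetical drift monitor: a slow poisoning campaign must move the live
# model, and that movement itself is a detectable signal.
import numpy as np

rng = np.random.default_rng(1)
reference_mu = 100.0  # model snapshot frozen at deployment time (assumed)
max_drift = 10.0      # assumed bound on benign drift of the live model

def drift_alarm(live_window):
    # Flag when the live model's mean has moved suspiciously far from the snapshot.
    return abs(np.mean(live_window) - reference_mu) > max_drift

print(drift_alarm(rng.normal(102, 5, 500)))  # False: normal wander
print(drift_alarm(rng.normal(125, 5, 500)))  # True: possible poisoning campaign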
55. How do you defend against this?
Improved PCA
• Antidote
• Principal component pursuit
• Robust PCA
56. Robust statistics
• use the median instead of the mean
• PCA’s ‘variance’ maximization vs. Antidote’s ‘median absolute deviation’
• find an appropriate distribution that models your dataset: normal/Gaussian vs. Laplacian
• use Robust PCA
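A quick illustration of why the median and median absolute deviation (MAD) resist poisoning better than the mean and standard deviation (synthetic data):

# A handful of injected outliers drags the mean and std badly, while the
# median and MAD barely move.
import numpy as np

rng = np.random.default_rng(0)
clean = rng.normal(100, 5, 1000)
poisoned = np.concatenate([clean, np.full(30, 500.0)])  # 3% extreme samples

def mad(x):
    return np.median(np.abs(x - np.median(x)))

for name, data in [("clean", clean), ("poisoned", poisoned)]:
    print(f"{name:8s} mean={data.mean():7.1f} std={data.std():6.1f} "
          f"median={np.median(data):6.1f} MAD={mad(data):5.1f}")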
57. My own tests.
I ran my own simulations with some real data…
why did I do this?
71. Anomaly detection systems today
• not so good, but improving…
• pure ML-based anomaly detectors are still vulnerable to compromise
• use ML to find features and thresholds, then run streaming anomaly detection using static rules (see the sketch below)
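A sketch of that hybrid approach, assuming PCA reconstruction error as the learned feature (synthetic data; the dimensions and threshold quantile are illustrative):

# Offline: use ML (PCA residuals) to pick a threshold from vetted traffic.
# Online: freeze everything into a static rule an adversary can't retrain.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X_train = rng.normal(size=(5000, 10))  # historical, vetted traffic features

pca = PCA(n_components=3).fit(X_train)

def residual(x):
    # Reconstruction error: how much of x the "normal" subspace cannot explain.
    z = pca.inverse_transform(pca.transform(x.reshape(1, -1)))
    return float(np.linalg.norm(x - z))

# Offline step: fix a static threshold, e.g. the 99.9th percentile of residuals.
threshold = np.quantile([residual(x) for x in X_train], 0.999)

def alert(x):
    # Online step: a frozen rule -- no retraining, nothing left to poison.
    return residual(x) > threshold

print(alert(rng.normal(size=10)))  # typically False for normal traffic
print(alert(np.full(10, 8.0)))     # True for a wildly abnormal vector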
72. What next?
• do more tests on AD systems that others have created
• other defenses against poisoning techniques
• experiment with more resilient ML models