Presented 26-JUN-2018 at Identiverse. Tells the story of how STEALTHbits tried, failed, and then learned to use Machine Learning in their security solutions. Along the way you learn about foundational concepts in ML (e.g. features, data science), and you get to have an emotional connection to a fake toy car.
Pushing Machine Learning Down the Security Stack to Make It More Effective for @Identiverse
1. Jonathan Sander, Chief Technology Officer
@sanderiam
Pushing Machine Learning Down the Security Stack to Make It Effective
2. AGENDA
Machine Learning basics, to form a vocabulary about what & why
Watching Machine Learning being applied to SIEM
Why many SIEM Machine Learning applications fail
The lesson we took away and how we have applied it
“Machine Learning” by XKCD
(https://xkcd.com/1838/)
3. MACHINE LEARNING
ALGORITHM: Neural network, naïve Bayes, decision tree, clustering, regression, etc.
MODEL: The way you will make the algorithm apply to your use case
FEATURES: If the model is the graph, then these are the points and lines
DATA: The reason it’s called “data science” is because this is where the real work is
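The four terms can be hard to keep apart in the abstract. The following sketch is not from the talk; it is a minimal stdlib-only illustration (a hand-rolled 1-nearest-neighbor classifier on made-up login records) of how they map onto code: the ALGORITHM is the learning procedure, fitting it to DATA produces a MODEL, and the FEATURES are the two numbers we chose to measure.

```python
# ALGORITHM + DATA -> MODEL; FEATURES are the measurements we picked.
from math import dist

def fit_nearest_neighbor(X, y):
    """The ALGORITHM: fitting it to data returns a MODEL (a closure)
    that predicts the label of the closest known example."""
    def model(point):
        return min(zip(X, y), key=lambda xy: dist(xy[0], point))[1]
    return model

# DATA: toy login records. FEATURES: (hour of day, MB transferred).
X = [(2, 900), (3, 880), (14, 1), (15, 2), (13, 1)]
y = ["suspicious", "suspicious", "normal", "normal", "normal"]

model = fit_nearest_neighbor(X, y)
print(model((3, 910)))  # -> suspicious: a 3 a.m. bulk transfer
```

Swapping in a decision tree or a neural network changes the algorithm; the model/features/data distinction stays the same.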
4. “Deep Learning Cars” by Samuel Arzt
(https://www.youtube.com/watch?v=Aut32pR5PQA)
5. You use Machine Learning when you know the data and the outcome, but not how to turn one into the other. (Sort of… but that’s a good place to start.)
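To make that statement concrete, here is an illustration of my own (not the speaker's): we "know the data and the outcome" -- Celsius readings and their Fahrenheit equivalents -- and pretend we don't know the conversion rule. A least-squares fit over the examples recovers the rule we never wrote down.

```python
# Known data and known outcomes; the mapping is what gets learned.
celsius = [0, 10, 20, 30, 40]
fahrenheit = [32, 50, 68, 86, 104]

# Ordinary least squares for a single feature, by hand.
n = len(celsius)
mean_c = sum(celsius) / n
mean_f = sum(fahrenheit) / n
slope = sum((c - mean_c) * (f - mean_f) for c, f in zip(celsius, fahrenheit)) \
        / sum((c - mean_c) ** 2 for c in celsius)
intercept = mean_f - slope * mean_c

print(slope, intercept)  # -> 1.8 32.0, i.e. F = 1.8 * C + 32
```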
6. WHY USE ML? WHAT IS IT DOING THAT’S ATTRACTIVE?
• Machine Learning makes prediction cheap.
• How many oil-burning vehicles were there when the first wells were dug?
• How many business problems were broken down into arithmetic before the first computers were introduced outside research?
• How much communication was digital-ready when the internet was first born?
• Which problems will we transform from their current form into prediction problems?
7. WE DIDN’T KNOW MACHINE LEARNING WAS A HAMMER YET, BUT SIEM SURE LOOKED LIKE A NAIL…
• SIEM has tons of data coming in from many sources (when you’re doing it right)
• The outcomes that are desired are pretty clear
– Find things that represent leading indicators and threats
– Guide systems and staff to address those conditions by arming them with data
• SIEM begins its life as log aggregation, morphs into a “single pane of glass”, and then changes again to be “analytics” (or at least the data stream for it)
• This is all rule based in the beginning, which is like the worst game of whack-a-mole
• Then we have the emergence of UEBA and others that use more math and ML methods to attempt to cut through that noise and pump up the signal
8. It’s at this point we collectively learn the phrases
“false positive” and “false negative”
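As a small aside of mine (this example is not from the deck), the two phrases boil down to simple counts against ground truth: a false positive is an alert with no real threat behind it (noise), and a false negative is a real threat with no alert (a miss).

```python
# Compare an alert system's verdicts against ground truth.
truth   = [1, 0, 0, 1, 0, 1, 0, 0]  # 1 = real threat
alerted = [1, 1, 0, 0, 0, 1, 1, 0]  # 1 = SIEM raised an alert

false_positives = sum(1 for t, a in zip(truth, alerted) if a == 1 and t == 0)
false_negatives = sum(1 for t, a in zip(truth, alerted) if a == 0 and t == 1)

print(false_positives, false_negatives)  # -> 2 1
```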
10. IF AT FIRST YOU DON’T SUCCEED…
PEER GROUP ANALYSIS: This takes the data and corrals it, but doesn’t solve all the problems (e.g. what happens when peer groups shift with the seasons?).
REDEFINED SUCCESS: The goal becomes pure anomaly detection, and there are efforts to use “fine tuning” of Machine Learning models and algorithms.
HYBRID: There is a combination of aggressive rules and Machine Learning. The rules essentially try to narrow the scope of data.
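A sketch of the hybrid pattern, under my own assumptions (the event fields, the peer group, and the use of a z-score in place of a real ML model are all made up for illustration): an aggressive rule first narrows events to one peer group, then a statistical score flags outliers within that narrowed scope.

```python
# Hybrid: rules narrow the scope of data, then an anomaly score runs on what's left.
from statistics import mean, stdev

events = [
    {"user": "ann",   "dept": "finance", "files_read": 40},
    {"user": "bob",   "dept": "finance", "files_read": 35},
    {"user": "carol", "dept": "finance", "files_read": 38},
    {"user": "dan",   "dept": "finance", "files_read": 42},
    {"user": "eve",   "dept": "finance", "files_read": 37},
    {"user": "dave",  "dept": "finance", "files_read": 400},
    {"user": "erin",  "dept": "eng",     "files_read": 500},  # normal for eng
]

# RULE: only look at the finance peer group (narrows the scope of data).
scoped = [e for e in events if e["dept"] == "finance"]

# Scoring step: flag anyone far from their peer group's baseline.
reads = [e["files_read"] for e in scoped]
mu, sigma = mean(reads), stdev(reads)
anomalies = [e["user"] for e in scoped if abs(e["files_read"] - mu) > 2 * sigma]

print(anomalies)  # -> ['dave']
```

Note how the seasonal-shift problem shows up even here: the baseline `mu` and `sigma` are only as good as the window of data they were computed over.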
11. None of this is using Machine Learning to do what it is best at doing. My thesis is that the issue is we never got over being rule based – being procedural. The troubles were not to be solved by refinements in how the systems work. The trouble is the data.
12. Who had the title or role Webmaster at some point in their career?
Let’s embrace the Data Scientist.
14. HOW THIS TRANSLATED INTO OUR JOURNEY
• We came out of the gate attacking the UEBA problem at the top
– We were fooled by early success because we modeled our data, which we knew well, in ways that yielded useful results
– Then we tackled problems of scale, architecture, etc. (geek comfort food)
• When we started feeding the system other data, it failed
– That data didn’t fit our models, and we weren’t even looking at the right features for that type of data in a similar model
– At this stage, I suspect even the algorithms didn’t apply
• We scaled back immensely and decided to conquer the data we knew well that we had success with to start with – and did so with a vastly simpler set of technology
15. THINGS WE’VE IGNORED & WHERE TO FIND THEM…
SUPERVISED VS. UNSUPERVISED MACHINE LEARNING
ADVERSARIAL MODELS
THE “HOW” FROM A TECH POINT OF VIEW
THE SUPER-COOL MATH!
The “deep learning” series by 3Blue1Brown: https://www.youtube.com/watch?v=aircAruvnKk
Get your hands dirty: https://machinelearningmastery.com/machine-learning-in-python-step-by-step/
The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World, by Pedro Domingos (ISBN-13: 9780465065707)
Prediction Machines: The Simple Economics of Artificial Intelligence, by Ajay Agrawal, Joshua Gans & Avi Goldfarb (ISBN-13: 9781633695672)
Thank you!
Editor's Notes
An algorithm
Neural network, naïve Bayes, decision tree, clustering, regression, …
A model
The way you will make the algorithm apply to your use case
The features
If the model is the graph, then these are the points and lines
The data
The reason it’s called “data science” is because this is where the real work is
We get things like peer group analysis
This takes the data and corrals it, but doesn’t solve all the problems
For example, what happens when peer groups shift with seasons
Some redefine success
The goal becomes pure anomaly detection
There are efforts to use “fine tuning” of ML models and algorithms
Others go “hybrid”
There is a combination of aggressive rules and ML
The rules essentially try to narrow the scope of data
Features are hard
Everyone in software knows that picking what to measure is hard
This is doubly so when you don’t get explicit feedback on feature impact
Data science must be done close to the data
The ingenuity of measuring the “lines of sight” for the cars does not translate
Bring your expertise. You will find you know features instinctively – the hardest part is making that explicit.
Algorithms get the attention because (ironically) math seems like the cool part
…but picking the right algorithm is also about knowing the data
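One way to picture "making your instinctive features explicit" close to the data (my example; the log layout and the chosen features are invented for illustration): turn a raw log line into named measurements a model could consume. The `off_hours` flag is exactly the kind of domain knowledge a security practitioner carries instinctively.

```python
# Feature extraction: raw log line -> explicit, named features.
from datetime import datetime

def extract_features(log_line):
    """Parse a 'timestamp user action bytes' line into model-ready features."""
    ts, user, action, size = log_line.split()
    when = datetime.fromisoformat(ts)
    return {
        "hour": when.hour,                 # time-of-day pattern
        "off_hours": when.hour < 6,        # domain knowledge made explicit
        "is_delete": action == "DELETE",   # destructive actions matter more
        "megabytes": int(size) / 1_000_000,
    }

features = extract_features("2018-06-26T03:14:00 jsmith DELETE 250000000")
print(features)  # -> {'hour': 3, 'off_hours': True, 'is_delete': True, 'megabytes': 250.0}
```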
We scaled back immensely and decided to conquer the data we knew well that we had success with to start with – and did so with a vastly simpler set of technology