Presented 26-JUN-2018 at Identiverse. Tells the story of how STEALTHbits tried, failed, and then learned to use Machine Learning in their security solutions. Along the way you learn about foundational concepts in ML (e.g. features, data science), and you get to have an emotional connection to a fake toy car.
Pushing Machine Learning Down the Security Stack to Make It More Effective for @Identiverse
1. Jonathan Sander, Chief Technology Officer
@sanderiam
Pushing Machine Learning Down the Security Stack to Make It Effective
2. AGENDA
Machine Learning basics, to form a vocabulary about what & why
Watching Machine Learning being applied to SIEM
Why many SIEM Machine Learning applications fail
The lesson we took away and how we have applied it
“Machine Learning” by XKCD
(https://xkcd.com/1838/)
3. MACHINE LEARNING
ALGORITHM: Neural network, naïve Bayes, decision tree, clustering, regression, etc.
MODEL: The way you will make the algorithm apply to your use case
FEATURES: If the model is the graph, then these are the points and lines
DATA: The reason it’s called “data science” is because this is where the real work is
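The four terms can be hard to keep apart in the abstract. The following sketch is not from the talk; it is a minimal stdlib-only illustration (a hand-rolled 1-nearest-neighbor classifier on made-up login records) of how they map onto code: the ALGORITHM is the learning procedure, fitting it to DATA produces a MODEL, and the FEATURES are the two numbers we chose to measure.

```python
# ALGORITHM + DATA -> MODEL; FEATURES are the measurements we picked.
from math import dist

def fit_nearest_neighbor(X, y):
    """The ALGORITHM: fitting it to data returns a MODEL (a closure)
    that predicts the label of the closest known example."""
    def model(point):
        return min(zip(X, y), key=lambda xy: dist(xy[0], point))[1]
    return model

# DATA: toy login records. FEATURES: (hour of day, MB transferred).
X = [(2, 900), (3, 880), (14, 1), (15, 2), (13, 1)]
y = ["suspicious", "suspicious", "normal", "normal", "normal"]

model = fit_nearest_neighbor(X, y)
print(model((3, 910)))  # -> suspicious: a 3 a.m. bulk transfer
```

Swapping in a decision tree or a neural network changes the algorithm; the model/features/data distinction stays the same.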
4. “Deep Learning Cars” by Samuel Arzt
(https://www.youtube.com/watch?v=Aut32pR5PQA)
5. You use Machine Learning when you know the data and the outcome, but not how to turn one into the other. (Sort of… but that’s a good place to start.)
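To make that statement concrete, here is an illustration of my own (not the speaker's): we "know the data and the outcome" -- Celsius readings and their Fahrenheit equivalents -- and pretend we don't know the conversion rule. A least-squares fit over the examples recovers the rule we never wrote down.

```python
# Known data and known outcomes; the mapping is what gets learned.
celsius = [0, 10, 20, 30, 40]
fahrenheit = [32, 50, 68, 86, 104]

# Ordinary least squares for a single feature, by hand.
n = len(celsius)
mean_c = sum(celsius) / n
mean_f = sum(fahrenheit) / n
slope = sum((c - mean_c) * (f - mean_f) for c, f in zip(celsius, fahrenheit)) \
        / sum((c - mean_c) ** 2 for c in celsius)
intercept = mean_f - slope * mean_c

print(slope, intercept)  # -> 1.8 32.0, i.e. F = 1.8 * C + 32
```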
6. WHY USE ML? WHAT IS IT DOING THAT’S ATTRACTIVE?
• Machine Learning makes prediction cheap.
• How many oil-burning vehicles were there when the first wells were dug?
• How many business problems were broken down into arithmetic before the first computers were introduced outside research?
• How much communication was digital-ready when the internet was first born?
• Which problems will we transform from their current form into prediction problems?
7. WE DIDN’T KNOW MACHINE LEARNING WAS A HAMMER YET, BUT SIEM SURE LOOKED LIKE A NAIL…
• SIEM has tons of data coming in from many sources (when you’re doing it right)
• The outcomes that are desired are pretty clear
– Find things that represent leading indicators and threats
– Guide systems and staff to address those conditions by arming them with data
• SIEM begins its life as log aggregation, morphs into a “single pane of glass”, and then changes again to be “analytics” (or at least the data stream for it)
• This is all rule based in the beginning, which is like the worst game of whack-a-mole
• Then we have the emergence of UEBA and others that use more math and ML methods to attempt to cut through that noise and pump up the signal
8. It’s at this point we collectively learn the phrases
“false positive” and “false negative”
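As a small aside of mine (this example is not from the deck), the two phrases boil down to simple counts against ground truth: a false positive is an alert with no real threat behind it (noise), and a false negative is a real threat with no alert (a miss).

```python
# Compare an alert system's verdicts against ground truth.
truth   = [1, 0, 0, 1, 0, 1, 0, 0]  # 1 = real threat
alerted = [1, 1, 0, 0, 0, 1, 1, 0]  # 1 = SIEM raised an alert

false_positives = sum(1 for t, a in zip(truth, alerted) if a == 1 and t == 0)
false_negatives = sum(1 for t, a in zip(truth, alerted) if a == 0 and t == 1)

print(false_positives, false_negatives)  # -> 2 1
```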
10. IF AT FIRST YOU DON’T SUCCEED…
PEER GROUP ANALYSIS: This takes the data and corrals it, but doesn’t solve all the problems (e.g. what happens when peer groups shift with the seasons?).
REDEFINED SUCCESS: The goal becomes pure anomaly detection, and there are efforts to use “fine tuning” of Machine Learning models and algorithms.
HYBRID: There is a combination of aggressive rules and Machine Learning. The rules essentially try to narrow the scope of data.
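A sketch of the hybrid pattern, under my own assumptions (the event fields, the peer group, and the use of a z-score in place of a real ML model are all made up for illustration): an aggressive rule first narrows events to one peer group, then a statistical score flags outliers within that narrowed scope.

```python
# Hybrid: rules narrow the scope of data, then an anomaly score runs on what's left.
from statistics import mean, stdev

events = [
    {"user": "ann",   "dept": "finance", "files_read": 40},
    {"user": "bob",   "dept": "finance", "files_read": 35},
    {"user": "carol", "dept": "finance", "files_read": 38},
    {"user": "dan",   "dept": "finance", "files_read": 42},
    {"user": "eve",   "dept": "finance", "files_read": 37},
    {"user": "dave",  "dept": "finance", "files_read": 400},
    {"user": "erin",  "dept": "eng",     "files_read": 500},  # normal for eng
]

# RULE: only look at the finance peer group (narrows the scope of data).
scoped = [e for e in events if e["dept"] == "finance"]

# Scoring step: flag anyone far from their peer group's baseline.
reads = [e["files_read"] for e in scoped]
mu, sigma = mean(reads), stdev(reads)
anomalies = [e["user"] for e in scoped if abs(e["files_read"] - mu) > 2 * sigma]

print(anomalies)  # -> ['dave']
```

Note how the seasonal-shift problem shows up even here: the baseline `mu` and `sigma` are only as good as the window of data they were computed over.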
11. None of this is using Machine Learning to do what it is best at doing. My thesis is that the issue is we never got over being rule based – being procedural. The troubles were not to be solved by refinements in how the systems work. The trouble is the data.
12. Who had the title or role Webmaster at some point in their career?
Let’s embrace the Data Scientist.
14. HOW THIS TRANSLATED INTO OUR JOURNEY
• We came out of the gate attacking the UEBA problem at the top
– We were fooled by early success because we modeled our data, which we knew well, in ways that yielded useful results
– Then we tackled problems of scale, architecture, etc. (geek comfort food)
• When we started feeding the system other data, it failed
– That data didn’t fit our models, and we weren’t even looking at the right features for that type of data in a similar model
– At this stage, I suspect even the algorithms didn’t apply
• We scaled back immensely and decided to conquer the data we knew well that we had success with to start with – and did so with a vastly simpler set of technology
15. THINGS WE’VE IGNORED & WHERE TO FIND THEM…
SUPERVISED VS. UNSUPERVISED MACHINE LEARNING
ADVERSARIAL MODELS
THE “HOW” FROM A TECH POINT OF VIEW
THE SUPER-COOL MATH!
The “deep learning” series by 3Blue1Brown: https://www.youtube.com/watch?v=aircAruvnKk
Get your hands dirty: https://machinelearningmastery.com/machine-learning-in-python-step-by-step/
The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World, by Pedro Domingos (ISBN-13: 9780465065707)
Prediction Machines: The Simple Economics of Artificial Intelligence, by Ajay Agrawal, Joshua Gans & Avi Goldfarb (ISBN-13: 9781633695672)
Thank you!
Editor's Notes
An algorithm
Neural network, naïve Bayes, decision tree, clustering, regression, …
A model
The way you will make the algorithm apply to your use case
The features
If the model is the graph, then these are the points and lines
The data
The reason it’s called “data science” is because this is where the real work is
We get things like peer group analysis
This takes the data and corrals it, but doesn’t solve all the problems
For example, what happens when peer groups shift with seasons
Some redefine success
The goal becomes pure anomaly detection
There are efforts to use “fine tuning” of ML models and algorithms
Others go “hybrid”
There is a combination of aggressive rules and ML
The rules essentially try to narrow the scope of data
Features are hard
Everyone in software knows that picking what to measure is hard
This is doubly so when you don’t get explicit feedback on feature impact
Data science must be done close to the data
The ingenuity of measuring the “lines of sight” for the cars does not translate
Bring your expertise. You will find you know features instinctively – the hardest part is making that explicit.
Algorithms get the attention because (ironically) math seems like the cool part
…but picking the right algorithm is also about knowing the data
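One way to picture "making your instinctive features explicit" close to the data (my example; the log layout and the chosen features are invented for illustration): turn a raw log line into named measurements a model could consume. The `off_hours` flag is exactly the kind of domain knowledge a security practitioner carries instinctively.

```python
# Feature extraction: raw log line -> explicit, named features.
from datetime import datetime

def extract_features(log_line):
    """Parse a 'timestamp user action bytes' line into model-ready features."""
    ts, user, action, size = log_line.split()
    when = datetime.fromisoformat(ts)
    return {
        "hour": when.hour,                 # time-of-day pattern
        "off_hours": when.hour < 6,        # domain knowledge made explicit
        "is_delete": action == "DELETE",   # destructive actions matter more
        "megabytes": int(size) / 1_000_000,
    }

features = extract_features("2018-06-26T03:14:00 jsmith DELETE 250000000")
print(features)  # -> {'hour': 3, 'off_hours': True, 'is_delete': True, 'megabytes': 250.0}
```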
We scaled back immensely and decided to conquer the data we knew well that we had success with to start with – and did so with a vastly simpler set of technology