Extending Splunk with
Machine-learning Predictive Analytics

Rich Collier
Solutions Architect
rich@prelert.com
SplunkLive! Prelert Session - Extending Splunk with Machine Learning

Published in: Technology, Education
  • Speaker note: the probability distribution of real data comes in all shapes and sizes; it rarely fits a nice bell curve
  • Transcript of "SplunkLive! Prelert Session - Extending Splunk with Machine Learning"

    1. Extending Splunk with Machine-learning Predictive Analytics. Rich Collier, Solutions Architect, rich@prelert.com
    2. Why Machine Learning? • Overcome limitations of human analysis • Auto-learn baseline behavior using proper modeling • Detect anomalous behavior
    3. Why Machine Learning? • Overcome limitations of human analysis • Auto-learn baseline behavior using proper modeling • Detect anomalous behavior
    4. Overcoming limitations of Human Analysis • Judging what’s “normal” is not always easy • Humans don’t always choose the right techniques
    5. IPTables (firewall) • How to find the most anomalous users (aggressive brute-force attackers)? • Here is a typical (manual) process
    6. Step 1) Search. Questions: What’s normal? What about that spike? We should probably try to visualize counts by SRC over time…
    7. Step 2) stats command, sort by count. Question: How to show as a function of time, not just overall?
    8. Step 3) add bucketing for breakdown by time. Question: What is an anomalous count per bucket? 100? 1000? 10,000? Maybe we should try to use some more stats?
    9. Step 4) add some “basic” statistical analysis: avg +/- 2σ. Question: How to show the individual “outliers” (and not lose the concept of time)?
    10. Step 5) use eventstats to repair the time problem and add a “where” clause to only show those outside of +/- 2σ. Question: Are these 161 results accurate? (I hope you didn’t build an alert and get 161 of them!)
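
The manual steps on slides 6 to 10 can be sketched outside Splunk as follows. This is a minimal Python sketch, not the SPL used in the talk: the function name `two_sigma_outliers`, the 300-second bucket span, the sample IP addresses, and the `(timestamp, src)` event shape are all assumptions made for illustration. It buckets events by time and source, computes the average and standard deviation of the per-bucket counts, and keeps only the buckets that fall outside the average plus or minus two standard deviations.

```python
from collections import Counter
from statistics import mean, pstdev


def two_sigma_outliers(events, span=300):
    """Return (outliers, lower, upper) for per-bucket event counts.

    events: iterable of (epoch_seconds, src) pairs (hypothetical shape).
    span:   bucket width in seconds (assumed value, not from the talk).
    """
    # Steps 2-3: count events per (time bucket, source).
    counts = Counter((ts - ts % span, src) for ts, src in events)
    values = list(counts.values())
    # Step 4: average and standard deviation over all bucket counts.
    avg, sd = mean(values), pstdev(values)
    lower, upper = avg - 2 * sd, avg + 2 * sd
    # Step 5: keep only buckets outside the band, with their time intact.
    outliers = sorted(
        (bucket, src, n)
        for (bucket, src), n in counts.items()
        if n < lower or n > upper
    )
    return outliers, lower, upper


# One quiet source (1 event per bucket) plus one burst of 50 events.
quiet = [(b * 300, "10.0.0.1") for b in range(20)]
burst = [(i, "10.0.0.9") for i in range(50)]
outliers, lower, upper = two_sigma_outliers(quiet + burst)
print(outliers, round(lower, 1), round(upper, 1))
```

Even on this tiny made-up data set the lower bound comes out negative, which is a bound that an event count can never cross.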
    11. Problem: Statistical modeling is INCORRECT for this data – (-75) events doesn’t make sense for avg - 2σ – how much confidence do you have in avg + 2σ? Result: wrong model = false positives/negatives
    12. The Problem: +/- 2σ assumes the data is Gaussian (Bell Curve). Clearly, this data is better fit by a Poisson curve
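
To make the Gaussian versus Poisson point concrete, here is a small sketch under stated assumptions: the rate of 2 events per bucket and the 1% cut-off are invented for illustration, and the helper names are hypothetical. It uses the fact that a Poisson distribution has variance equal to its mean, so the two-standard-deviation band is `lam ± 2 * sqrt(lam)`, and compares that band with an upper threshold taken from the Poisson tail itself.

```python
import math


def poisson_pmf(k, lam):
    """P(X == k) for a Poisson rate lam > 0, computed in log space."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def gaussian_band(lam):
    """avg +/- 2 standard deviations, using Poisson variance == mean."""
    sd = math.sqrt(lam)
    return lam - 2 * sd, lam + 2 * sd


def poisson_upper_threshold(lam, alpha=0.01):
    """Smallest count k such that P(X >= k) < alpha.

    alpha is an assumed cut-off; keep it well above float precision.
    """
    k, tail = 0, 1.0  # invariant: tail == P(X >= k)
    while tail >= alpha:
        tail -= poisson_pmf(k, lam)
        k += 1
    return k


lam = 2.0  # assumed average events per bucket
low, high = gaussian_band(lam)
print(f"Gaussian band: {low:.2f} .. {high:.2f}")  # lower bound is negative
print("Poisson 1% threshold:", poisson_upper_threshold(lam))
```

For low-rate count data the Gaussian band dips below zero and puts its upper edge too close to the average, while the Poisson threshold only flags counts that are genuinely improbable under the assumed rate.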
    13. Examples of Non-Gaussian Data: status=503, Memory Utilization, CPU load, status=404, Revenue Transactions
    14. One More Problem… • Even if the demonstrated technique were accurate: – You still need to persist what you’ve learned “so far” so that you don’t have to keep re-inspecting historical data as new data comes in – This requires you to manually write/read information into a summary index
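
The persistence problem on slide 14 can be illustrated with a running summary that is updated one value at a time and saved between runs. The sketch below uses Welford's online algorithm for mean and variance and a JSON string as a stand-in for a summary index; the class name `RunningStats` and the storage format are assumptions, not how Splunk or Prelert store state.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class RunningStats:
    """Welford's online mean/variance; the state is three numbers."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Population variance of everything seen so far."""
        return self.m2 / self.n if self.n else 0.0

    def dumps(self):
        # Hypothetical persistence format standing in for a summary index.
        return json.dumps(asdict(self))

    @classmethod
    def loads(cls, text):
        return cls(**json.loads(text))


# First run: learn from old data, then persist the state.
first = RunningStats()
for count in [2, 4, 4, 4]:
    first.update(count)
saved = first.dumps()

# Later run: resume from the saved state without re-reading history.
resumed = RunningStats.loads(saved)
for count in [5, 5, 7, 9]:
    resumed.update(count)
print(resumed.n, resumed.mean, resumed.variance)
```

The same idea applies to any model: as long as the learned state can be updated incrementally and serialized, historical data does not need to be searched again when new data arrives.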
    15. Why Machine Learning? • Overcome limitations of human analysis • Auto-learn baseline behavior using proper modeling • Detect anomalous behavior
    16. First, an Analogy • How could I accurately predict how much postal mail you are likely to get delivered to your home tomorrow?
    17. I Would… • Watch your mail delivery for a while – 1 day? – 1 week? – 1 month? – 1 year? • Use my observations to create a…
    18. Average? Std. Deviation? Probability Distribution Function?
    19. A Probability Distribution Function! Best for my house (chart: % likelihood (probability) vs. pieces of mail per day)
    20. A Probability Distribution Function! College Student? (chart: % likelihood (probability) vs. pieces of mail per day)
    21. A Probability Distribution Function! My Mom (chart: % likelihood (probability) vs. pieces of mail per day)
    22. Using Machine Learning to build a Probability Distribution Function • PDF must be built specifically for each “instance” • PDF should be constructed automatically merely by watching the data
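
As a rough illustration of the two requirements on slide 22, the sketch below keeps one empirical distribution per instance and builds it purely from observations. It is a simple frequency histogram, not the self-learning model the talk refers to; the class name `EmpiricalPDF`, the instance labels, and the mail counts are made up for the example.

```python
from collections import Counter, defaultdict


class EmpiricalPDF:
    """Frequency-based estimate of P(value), learned by watching data."""

    def __init__(self):
        self.counts = Counter()
        self.total = 0

    def observe(self, value):
        self.counts[value] += 1
        self.total += 1

    def probability(self, value):
        # Values never seen so far get probability 0 in this naive model.
        return self.counts[value] / self.total if self.total else 0.0


# One model per instance, created automatically on first sight.
models = defaultdict(EmpiricalPDF)

# Hypothetical observations: pieces of mail per day for two households.
observations = {
    "my house": [3, 3, 4, 5, 3],
    "college student": [0, 0, 1, 0],
}
for instance, daily_counts in observations.items():
    for pieces in daily_counts:
        models[instance].observe(pieces)

print(models["my house"].probability(3))
print(models["college student"].probability(0))
```

Three pieces of mail is the most common day for one household and has never been seen for the other, which is why a single shared model would mislead.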
    23. Using Machine Learning to build a Probability Distribution Function
    24. Now what?
    25. Why Machine Learning? • Overcome limitations of human analysis • Auto-learn baseline behavior using proper modeling • Detect anomalous behavior
    26. Finding “what’s unexpected”… Your job is often looking for unexpected change in your environment, either proactively through monitoring or reactively through diagnostics/troubleshooting
    27. Using the PDF to Find What is Unexpected: zero pieces of mail? fifteen pieces of mail? (chart: % likelihood (probability) vs. pieces of mail per day)
    28. Relate back to data in Splunk • # Pieces of mail = # events of a certain type – number of failed logins – number of errors of different types – number of events with certain status codes – etc. • Or, performance metrics – response time – utilization %
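
One simple way to turn a learned distribution into a measure of how unexpected an observation is: add up the probability of every outcome that is no more likely than the one observed. The sketch below applies that to the mail analogy. The history values and function names are invented, and this p-value-style score is only a stand-in for illustration, not the Bayesian scoring the product is described as using.

```python
from collections import Counter


def learn_pdf(history):
    """Map each observed value to its relative frequency."""
    counts = Counter(history)
    total = len(history)
    return {value: n / total for value, n in counts.items()}


def rarity_p_value(pdf, observed):
    """Total probability of outcomes no more likely than `observed`.

    Close to 1 means typical; close to 0 means unexpected.
    """
    p_obs = pdf.get(observed, 0.0)
    return sum(p for p in pdf.values() if p <= p_obs)


# Hypothetical history: pieces of mail (or events) per day.
history = [3, 4, 3, 5, 4, 3, 2, 4, 3, 4]
pdf = learn_pdf(history)

for observed in (3, 2, 0, 15):
    print(observed, round(rarity_p_value(pdf, observed), 2))
```

Replacing "pieces of mail per day" with "failed logins per bucket" or "events with a given status code" gives the same calculation on Splunk data. Both zero and fifteen score as unexpected here even though they sit on opposite sides of the usual range.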
    29. Back to our Example!
    30. • Prelert Anomaly Detective – Automatically and correctly models data via self-learning – Applies sophisticated Bayesian techniques – Persists “on-going” analysis to allow real-time alerting – Makes it easy to use. 3 significant alerts, not 161!
    31. • Results are: – Accurate outliers – Automatically clustered and scored by their probabilistic “unlikelihood” – Relevant in time, easy to make alerts – Clickable for drill-down
    32. • Drill-downs: – Automatically construct useful search syntax and time selection – Show anomalies in context of the original data – Serve as a possible jumping-off point for subsequent manual mining
    33. Automated Anomaly Detection • Less time searching & troubleshooting • Proactive trustworthy alerts without thresholds • Auto-discovers the previously unknown
    34. Automated Anomaly Detection for splunk> Additional Use Cases
    35. Use Case: Correlating Anomalies Across Data Types • Data sources: – App logs – Network performance – SQL-Server metrics • Prelert identifies network discards that cause app to disconnect from DB
    36. Use Case: Servers making unusual TCP connections • Data source: Netstat • Prelert finds a rare FTP connection from a server that doesn’t normally use FTP
    37. Use Case: Revenue Transactions • Data source: Custom logs • Prelert identifies unusual $0.60 transaction – traced to bug in currency conversion
    38. Use Case: Clients pervasively visiting rare URLs • Data source: BlueCoat proxy • Prelert identifies users abusing Internet privileges (gambling sites, porn sites)
    39. Use Case: Monitoring Performance w/o Thresholds • Response time of online bank website • Prelert alerts on spikes without the need to create a single threshold
    40. Use Case: Unusual outbound traffic rates • Data source: BlueCoat proxy • Prelert identifies client attempting to exploit an outside IIS webserver
    41. Automated Anomaly Detection for splunk>
