SplunkLive! Prelert Session - Extending Splunk with Machine Learning

1,521
-1

Published on

Published in: Technology, Education
0 Comments
8 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
1,521
On Slideshare
0
From Embeds
0
Number of Embeds
1
Actions
Shares
0
Downloads
0
Comments
0
Likes
8
Embeds 0
No embeds

No notes for slide
  • [no audio here]
  • Probability of data comes in all shapes and sizes – rarely does it fit a nice bell curve
  • SplunkLive! Prelert Session - Extending Splunk with Machine Learning

    1. 1. Extending Splunk with Machine-learning Predictive Analytics Rich Collier Solutions Architect rich@prelert.com
    2. 2. Why Machine Learning? • Overcome limitations of human analysis • Auto-learn baseline behavior using proper modeling • Detect anomalous behavior
    3. 3. Why Machine Learning? • Overcome limitations of human analysis • Auto-learn baseline behavior using proper modeling • Detect anomalous behavior
    4. 4. Overcoming limitations of Human Analysis • Judging what’s “normal” is not always easy • Humans don’t always choose the right techniques
    5. 5. IPTables (firewall) • How to find most anomalous users (aggressive brute force attackers)? • Here is a typical (manual) process
    6. 6. Step 1) Search Questions: What’s normal? What about that spike? Probably should try to visualize counts by SRC over time…
    7. 7. Step 2) stats command, sort by count Question: How to show as a function of time, not just overall?
    8. 8. Step 3) add bucketing for breakdown by time Question: What is an anomalous count per bucket? 100? 1000? 10,000? Maybe we should try to use some more stats?
    9. 9. Step 4) add some “basic” statistical analysis: avg +/- 2 Question: How to show the individual “outliers” (and not lose the concept of time)?
    10. 10. Step 5) use eventstats to repair time problem and add “where” clause to only show those outside of +/-2 Question: Are these 161 results accurate? (I hope you didn’t build an alert and get 161 of them!)
    11. 11. Problem: Statistical modeling is INCORRECT for this data – (-75) events doesn’t make sense for avg - 2 – how much confidence do you have in avg + 2 ? Result: • Wrong model= false positives/negatives
    12. 12. The Problem: +/-2 assumes data is Gaussian (Bell Curve) Clearly, this data is better fit by a Poisson curve
    13. 13. Examples of Non-Gaussian Data status=503 Memory Utilization CPU load status=404 Revenue Transactions
    14. 14. One More Problem… • Even if the demonstrated technique was accurate: – Still need to persist what you’ve learned “so far” so that you don’t have to keep re-inspecting historical data as new data comes in – This requires you to manually write/read information into a summary index
    15. 15. Why Machine Learning? • Overcome limitations of human analysis • Auto-learn baseline behavior using proper modeling • Detect anomalous behavior
    16. 16. First, an Analogy • How could I accurately predict how much Postal-mail you are likely to get delivered to your home tomorrow?
    17. 17. I Would… • Watch your mail delivery for a while – 1 day? – 1 week? – 1 month? – 1 year? • Use my observations to create a…
    18. 18. Average? Std. Deviation? Probability Distribution Function?
    19. 19. A Probability Distribution Function! % likelihood (probability) Best for my house pieces of mail per day
    20. 20. A Probability Distribution Function! % likelihood (probability) College Student? pieces of mail per day
    21. 21. % likelihood (probability) A Probability Distribution Function! My Mom pieces of mail per day
    22. 22. Using Machine Learning to build a Probability Distribution Function • PDF must be built specifically for each “instance” • PDF should be constructed automatically merely by watching the data
    23. 23. Using Machine Learning to build a Probability Distribution Function 23
    24. 24. Now what?
    25. 25. Why Machine Learning? • Overcome limitations of human analysis • Auto-learn baseline behavior using proper modeling • Detect anomalous behavior
    26. 26. Finding “what’s unexpected”… Your job is often looking for unexpected change in your environment, either proactively through monitoring or reactively through diagnostics/troubleshooting
    27. 27. % likelihood (probability) Using the PDF to Find What is Unexpected zero pieces of mail? fifteen pieces of mail? pieces of mail per day
    28. 28. Relate back to data in Splunk • # Pieces of mail = # events of a certain type – number of failed logins – number of errors of different types – number of events with certain status codes – etc. • Or, performance metrics – response time – utilization %
    29. 29. Back to our Example!
    30. 30. • Prelert Anomaly Detective – Automatically, and correctly models data via self-learning – Applies sophisticated Bayesian techniques – Persists “on-going” analysis to allow real-time alerting – Makes it easy to use 3 significant alerts, not 161!
    31. 31. • Results are: – Accurate outliers – Automatically clustered and scored by their probabilistic “unlikelihood” – Relevant in time, easy to make alerts – Clickable for drill-down
    32. 32. • Drill-downs: – Automatically constructs useful search syntax and time selection – Shows anomalies in context of the original data – Serve as a possible jumping-off point for subsequent manual mining
    33. 33. Automated Anomaly Detection • Less time searching & troubleshooting • Proactive trustworthy alerts without thresholds • Auto-discovers the previously unknown
    34. 34. Automated Anomaly Detection for splunk> Additional Use Cases
    35. 35. Use Case • Data sources: – App logs – Network performance – SQL-Server metrics • Prelert identifies network discards that cause app to disconnect from DB Correlating Anomalies Across Data Types
    36. 36. Use Case • Data source: Netstat • Prelert finds a rare FTP connection from a server that doesn’t normally use FTP Servers making unusual TCP connections
    37. 37. Use Case • Data source: Custom logs • Prelert identifies unusual $0.60 transaction – traced to bug in currency conversion Revenue Transactions
    38. 38. Use Case • Data source: BlueCoat proxy • Prelert identifies users abusing Internet privileges gambling sites porn sites Clients pervasively visiting rare URLs
    39. 39. Use Case • Response time of online bank website • Prelert alerts on spikes without the need to create a single threshold Monitoring Performance w/o Thresholds
    40. 40. Use Case • Data source: BlueCoat proxy • Prelert identifies client attempting to exploit an outside IIS webserver Unusual outbound traffic rates
    41. 41. Automated Anomaly Detection for splunk>

    ×