# SplunkLive! Prelert Session - Extending Splunk with Machine Learning

Published in: Technology, Education

1. Extending Splunk with Machine-learning Predictive Analytics • Rich Collier, Solutions Architect, rich@prelert.com
2. Why Machine Learning? • Overcome limitations of human analysis • Auto-learn baseline behavior using proper modeling • Detect anomalous behavior
3. Why Machine Learning? • Overcome limitations of human analysis • Auto-learn baseline behavior using proper modeling • Detect anomalous behavior
4. Overcoming limitations of Human Analysis • Judging what’s “normal” is not always easy • Humans don’t always choose the right techniques
5. IPTables (firewall) • How to find the most anomalous users (aggressive brute-force attackers)? • Here is a typical (manual) process
6. Step 1) Search. Questions: What’s normal? What about that spike? Probably should try to visualize counts by SRC over time…
7. Step 2) Use the stats command, sort by count. Question: How to show as a function of time, not just overall?
8. Step 3) Add bucketing for breakdown by time. Question: What is an anomalous count per bucket? 100? 1000? 10,000? Maybe we should try to use some more stats?
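
Steps 2 and 3 can be sketched outside Splunk to make the logic concrete. The following is a minimal Python illustration, not the searches shown on the slides: the event tuples, the source addresses, and the one-hour bucket size are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical parsed firewall events: (epoch_seconds, source_ip).
events = [
    (0, "10.0.0.1"), (30, "10.0.0.1"), (45, "10.0.0.2"),
    (3600, "10.0.0.1"), (3700, "10.0.0.2"), (3800, "10.0.0.2"),
    (3900, "10.0.0.2"),
]

BUCKET_SECONDS = 3600  # one-hour buckets (an arbitrary choice for this sketch)


def count_by_source(events):
    """Step 2: overall event count per source, highest count first."""
    return Counter(src for _, src in events).most_common()


def count_by_bucket_and_source(events, span=BUCKET_SECONDS):
    """Step 3: event count per (bucket start time, source)."""
    return Counter((ts - ts % span, src) for ts, src in events)


overall = count_by_source(events)
per_bucket = count_by_bucket_and_source(events)

for (bucket_start, src), n in sorted(per_bucket.items()):
    print(bucket_start, src, n)
```

The overall count hides when the events happened; the per-bucket count keeps time but leaves open the question the slide asks, namely which per-bucket count should be considered anomalous.
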
9. Step 4) Add some “basic” statistical analysis: avg +/- 2σ. Question: How to show the individual “outliers” (and not lose the concept of time)?
10. Step 5) Use eventstats to repair the time problem and add a “where” clause to show only those outside of avg +/- 2σ. Question: Are these 161 results accurate? (I hope you didn’t build an alert and get 161 of them!)
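
The idea behind Steps 4 and 5 can be sketched in a few lines of Python. This assumes the band on the slides means the average plus or minus two standard deviations; the per-bucket counts are invented for illustration, and the population standard deviation is used here as an arbitrary choice.

```python
from statistics import mean, pstdev

# Hypothetical (bucket_index, event_count) pairs for one source:
# mostly quiet buckets plus a single burst.
counts = [(0, 2), (1, 0), (2, 1), (3, 3), (4, 0), (5, 250), (6, 1), (7, 2)]


def two_sigma_band(values):
    """Step 4: return (avg - 2*sigma, avg + 2*sigma) over all values."""
    mu = mean(values)
    sigma = pstdev(values)
    return mu - 2 * sigma, mu + 2 * sigma


def outliers(counts):
    """Step 5: keep each bucket's time and return only buckets outside the band."""
    lower, upper = two_sigma_band([c for _, c in counts])
    return [(bucket, c) for bucket, c in counts if c < lower or c > upper]


lower, upper = two_sigma_band([c for _, c in counts])
print("band:", lower, upper)
print("outliers:", outliers(counts))
```

With bursty count data like this, the lower edge of the band is negative, which cannot correspond to any real event count.
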
11. Problem: Statistical modeling is INCORRECT for this data – (-75) events doesn’t make sense for avg - 2σ – how much confidence do you have in avg + 2σ? Result: • Wrong model = false positives/negatives
12. The Problem: +/- 2σ assumes the data is Gaussian (bell curve). Clearly, this data is better fit by a Poisson curve.
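
To see why the Gaussian assumption misleads on count data, the sketch below compares a two-standard-deviation band with exact Poisson tail probabilities. It assumes an idealized Poisson process with a hypothetical mean of 0.5 events per bucket; real firewall data will only approximate this.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def poisson_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    return 1.0 - sum(poisson_pmf(i, lam) for i in range(k))


def gaussian_band(lam):
    """avg +/- 2*sigma if the same counts are (wrongly) treated as Gaussian."""
    sigma = math.sqrt(lam)  # a Poisson variable has variance equal to its mean
    return lam - 2 * sigma, lam + 2 * sigma


LAM = 0.5  # hypothetical mean events per bucket
lower, upper = gaussian_band(LAM)
print("Gaussian band:", lower, upper)
print("P(X >= 2) under Poisson:", poisson_tail(2, LAM))
```

Under these assumptions the lower bound is negative, and any bucket with two or more events falls above the upper bound. The Poisson model says that happens in roughly 9% of buckets, far more often than the roughly 2.3% a Gaussian upper tail would suggest, so the band generates false positives.
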
13. Examples of Non-Gaussian Data • status=503 • Memory Utilization • CPU load • status=404 • Revenue Transactions
14. One More Problem… • Even if the demonstrated technique were accurate: – You still need to persist what you’ve learned “so far” so that you don’t have to keep re-inspecting historical data as new data comes in – This requires you to manually write/read information into a summary index
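
The persistence problem can be illustrated with a running-statistics object that is saved and restored between batches. This sketch uses Welford's online algorithm and JSON as stand-ins; it is not how a Splunk summary index or Prelert stores its state, and the class and method names are hypothetical.

```python
import json


class RunningStats:
    """Mean and variance updated one value at a time (Welford's algorithm)."""

    def __init__(self, n=0, mean=0.0, m2=0.0):
        self.n = n
        self.mean = mean
        self.m2 = m2  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Population variance of everything seen so far."""
        return self.m2 / self.n if self.n else 0.0

    def to_json(self):
        return json.dumps({"n": self.n, "mean": self.mean, "m2": self.m2})

    @classmethod
    def from_json(cls, text):
        return cls(**json.loads(text))


# Learn from an early batch, persist, then resume with newer data
# without re-reading the early batch.
stats = RunningStats()
for x in [2, 4, 4, 4]:
    stats.update(x)
saved = stats.to_json()

resumed = RunningStats.from_json(saved)
for x in [5, 5, 7, 9]:
    resumed.update(x)
print(resumed.mean, resumed.variance)
```

Only three numbers need to be stored per modeled series, so the historical events never have to be inspected again as new data arrives.
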
15. Why Machine Learning? • Overcome limitations of human analysis • Auto-learn baseline behavior using proper modeling • Detect anomalous behavior
16. First, an Analogy • How could I accurately predict how much postal mail is likely to be delivered to your home tomorrow?
17. I Would… • Watch your mail delivery for a while – 1 day? – 1 week? – 1 month? – 1 year? • Use my observations to create a…
18. Average? Std. Deviation? Probability Distribution Function?
19. A Probability Distribution Function! Best for my house [chart: % likelihood (probability) vs. pieces of mail per day]
20. A Probability Distribution Function! College Student? [chart: % likelihood (probability) vs. pieces of mail per day]
21. A Probability Distribution Function! My Mom [chart: % likelihood (probability) vs. pieces of mail per day]
22. Using Machine Learning to build a Probability Distribution Function • PDF must be built specifically for each “instance” • PDF should be constructed automatically, merely by watching the data
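
A per-instance distribution built "merely by watching the data" can be sketched as a simple histogram kept separately for each instance. This is a deliberately naive illustration, not the modeling the product performs; the instance names and daily mail counts are made up.

```python
from collections import Counter, defaultdict


class EmpiricalPmf:
    """Histogram-based estimate of P(count = k) for one instance."""

    def __init__(self):
        self.counts = Counter()
        self.total = 0

    def observe(self, value):
        self.counts[value] += 1
        self.total += 1

    def probability(self, value):
        if self.total == 0:
            return 0.0
        return self.counts[value] / self.total


# One model per instance, created automatically the first time it is seen.
models = defaultdict(EmpiricalPmf)

# Hypothetical pieces of mail per day for two households.
daily_mail = {
    "my_house": [3, 4, 3, 5, 4, 3],
    "college_student": [0, 0, 1, 0, 0, 1],
}
for instance, observations in daily_mail.items():
    for pieces in observations:
        models[instance].observe(pieces)

print(models["my_house"].probability(3))
print(models["college_student"].probability(0))
```

The same observation (zero pieces of mail) is common for one instance and unseen for the other, which is why a single shared model would not work.
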
23. Using Machine Learning to build a Probability Distribution Function
24. Now what?
25. Why Machine Learning? • Overcome limitations of human analysis • Auto-learn baseline behavior using proper modeling • Detect anomalous behavior
26. Finding “what’s unexpected”… Your job is often looking for unexpected change in your environment, either proactively through monitoring or reactively through diagnostics/troubleshooting
27. Using the PDF to Find What is Unexpected • zero pieces of mail? • fifteen pieces of mail? [chart: % likelihood (probability) vs. pieces of mail per day]
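
Once a distribution has been learned, "unexpected" can be expressed as a small probability of seeing a value at least that extreme. The sketch below assumes, purely for illustration, that the learned model is a Poisson distribution with a rate of 5 pieces of mail per day.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def tail_probability(k, lam):
    """Probability of a value at least as extreme as k, on whichever side k lies."""
    at_most = sum(poisson_pmf(i, lam) for i in range(k + 1))    # P(X <= k)
    at_least = 1.0 - sum(poisson_pmf(i, lam) for i in range(k))  # P(X >= k)
    return min(at_most, at_least)


LEARNED_RATE = 5.0  # hypothetical learned average pieces of mail per day

for pieces in (0, 5, 15):
    print(pieces, tail_probability(pieces, LEARNED_RATE))
```

Both zero and fifteen pieces come out as improbable under this model, while five pieces does not, and no hand-picked threshold on the raw count was needed.
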
28. Relate back to data in Splunk • # pieces of mail = # events of a certain type – number of failed logins – number of errors of different types – number of events with certain status codes – etc. • Or, performance metrics – response time – utilization %
29. Back to our Example!
30. • Prelert Anomaly Detective – Automatically and correctly models data via self-learning – Applies sophisticated Bayesian techniques – Persists “on-going” analysis to allow real-time alerting – Is easy to use • 3 significant alerts, not 161!
31. • Results are: – Accurate outliers – Automatically clustered and scored by their probabilistic “unlikelihood” – Relevant in time, easy to make alerts – Clickable for drill-down
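
Scoring by "unlikelihood" and clustering in time can be sketched as follows. This is a generic illustration and not Prelert's algorithm: the negative log-probability score, the time-gap clustering rule, and the sample anomalies are all assumptions.

```python
import math


def unlikelihood_score(p):
    """Higher score means less probable; 3.0 corresponds to p = 0.001."""
    return -math.log10(max(p, 1e-300))  # floor avoids log10(0)


def cluster_by_time(anomalies, max_gap):
    """Group (time, score) pairs whose times are within max_gap of the previous one."""
    clusters = []
    for t, score in sorted(anomalies):
        if clusters and t - clusters[-1][-1][0] <= max_gap:
            clusters[-1].append((t, score))
        else:
            clusters.append([(t, score)])
    return clusters


# Hypothetical anomalous buckets: (bucket_index, probability under the model).
raw = [(10, 1e-3), (11, 1e-5), (50, 1e-2)]
scored = [(t, unlikelihood_score(p)) for t, p in raw]
clusters = cluster_by_time(scored, max_gap=2)

# Report one alert per cluster, ranked by its most unlikely member.
alerts = sorted((max(s for _, s in c), c[0][0]) for c in clusters)[::-1]
print(alerts)
```

Grouping neighboring anomalies into a single alert is one way a long list of raw outliers can collapse into a handful of alerts.
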
32. • Drill-downs: – Automatically construct useful search syntax and time selection – Show anomalies in context of the original data – Serve as a possible jumping-off point for subsequent manual mining
33. Automated Anomaly Detection • Less time searching & troubleshooting • Proactive, trustworthy alerts without thresholds • Auto-discovers the previously unknown
34. Automated Anomaly Detection for splunk> Additional Use Cases
35. Use Case: Correlating Anomalies Across Data Types • Data sources: – App logs – Network performance – SQL-Server metrics • Prelert identifies network discards that cause app to disconnect from DB
36. Use Case: Servers making unusual TCP connections • Data source: Netstat • Prelert finds a rare FTP connection from a server that doesn’t normally use FTP
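
Rarity detection of this kind can be sketched by tracking, per server, which destination ports it has used before. The server name, ports, and the 1% rarity threshold below are hypothetical, and this frequency count is only a stand-in for whatever model the product applies.

```python
from collections import Counter, defaultdict

# Per-server history of destination ports seen in past netstat samples.
history = defaultdict(Counter)


def learn(server, port):
    history[server][port] += 1


def rarity(server, port):
    """Fraction of this server's past connections on this port (0.0 = never seen)."""
    seen = history[server]
    total = sum(seen.values())
    return seen[port] / total if total else 0.0


def is_rare(server, port, threshold=0.01):
    return rarity(server, port) < threshold


# Hypothetical baseline: web01 normally talks HTTPS (443) and SSH (22) only.
for _ in range(50):
    learn("web01", 443)
    learn("web01", 22)

print(is_rare("web01", 443))  # port used routinely by this server
print(is_rare("web01", 21))   # FTP, never seen from this server before
```

The history is keyed by server because a port that is rare for one server may be routine for another.
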
37. Use Case: Revenue Transactions • Data source: Custom logs • Prelert identifies unusual \$0.60 transaction – traced to bug in currency conversion
38. Use Case: Clients pervasively visiting rare URLs • Data source: BlueCoat proxy • Prelert identifies users abusing Internet privileges (gambling sites, porn sites)
39. Use Case: Monitoring Performance w/o Thresholds • Response time of online bank website • Prelert alerts on spikes without the need to create a single threshold
40. Use Case: Unusual outbound traffic rates • Data source: BlueCoat proxy • Prelert identifies client attempting to exploit an outside IIS webserver
41. Automated Anomaly Detection for splunk>