Splunk live! Customer Presentation – Prelert

From Splunklive! San Francisco

Speaker notes
  • Probability distributions of data come in all shapes and sizes – rarely does real data fit a nice bell curve
  • index="invite" | timechart span=1h count as mycount | predict mycount | rename upper95(prediction(mycount)) as ceiling | rename lower95(prediction(mycount)) as floor | eval alarm1=if(mycount > ceiling, "10000", "0") | eval alarm2=if(mycount < floor, "-10000", "0") | table _time,alarm1,alarm2,mycount,ceiling,floor
  • Prelert has users analyzing 100,000+ simultaneous unique metrics, not just 20!
  • Transcript

    • 1. Anomaly Detection using Machine Learning Predictive Analytics – Prelert, "the anomaly detection company"
    • 2. Terminology • Machine-learning: autonomous self-learning without the assistance of humans (unsupervised learning) • Predictive Analytics: probabilistic prediction of behavior based upon observed past behavior • Anomaly Detection: what’s “different” or “weird” versus what’s “good” or “bad”
    • 3. Q: What’s Interesting Here?
    • 4. A: Only What’s Behaving Abnormally
    • 5. Anomaly Detection - an Analogy • How could I accurately predict how much postal mail is likely to be delivered to your home tomorrow? • And, how would I know if the amount you received was “abnormal”?
    • 6. A practical methodology would involve… • First, determine what’s normal before I can declare what’s abnormal • Watch your mail delivery volume for a while…  1 day?  1 week?  1 month? • Notice that you intuitively expect your predictions to become more accurate as you see more data. • Ideally, use those observations to create a…
    • 7. Probability Distribution Function (chart: x-axis = pieces of mail per day, y-axis = % likelihood / probability)
    • 8. Probability Distribution Function – “Best for my house”
    • 9. Probability Distribution Function – “College Student?”
    • 10. Probability Distribution Function – “My Mom”
    • 11. Finding “what’s unexpected”… Your job often involves looking for unexpected change in your environment, either proactively through monitoring or reactively through diagnostics/troubleshooting
    • 12. Using the PDF to Find What is Unexpected (same chart, with annotated points: zero pieces of mail? fifteen pieces of mail?)
    • 13. Relate back to IT and Security data • # pieces of mail = # events of a certain type: number of failed logins, number of errors of different types, number of events with certain status codes, etc. • Or, performance metrics: response time, utilization % => Every kind of data will need its own unique “model” (probability distribution function); see the first SPL sketch after the transcript
    • 14. Do You Know How to Accurately Model? • Which one(s) model your data best? • You will want to get it right (figure source: “Doing Data Science”, O’Neil & Schutt) • Note: avg +/- 2 stdev assumes a Gaussian (Normal) distribution!
    • 15. Gaussian (“Normal”) Distribution
    • 16. Non-Gaussian Data (example charts: status=503 counts, status=404 counts, CPU load, memory utilization, revenue transactions)
    • 17. Standard Deviations – Not so Good • 33,000+ performance metrics analyzed using +/- 2.5σ (chart: total # alerts per hour, 28 Feb – 03 Mar) • Never fewer than 900 alerts per hour • Real outage (circled) overshadowed by ~6,000 extraneous alerts (see the avg +/- 2 stdev SPL sketch after the transcript)
    • 18. Don’t worry, we have you covered • Prelert uses sophisticated machine-learning techniques to best-fit the right statistical model for your data • Better models = better outlier detection = fewer false alarms
    • 19. DEMO
    • 20. Kinds of Anomalies Detected: deviations in event count vs. time; deviations in values vs. time; rare occurrences of things; population/peer outliers
    • 21. #1) Deviations in Event Counts/Rates • Use Case: Online Commerce Site  Cyclical online ordering volume (credit cards, etc.)  Service outage on May 10th: orders not being processed, dip in afternoon volume
    • 22. Hard to automatically detect because… • Tricky to catch with thresholds because the overall count didn’t dip below a low watermark • Output of Splunk “predict” (the full search is in the speaker notes above):
    • 23. Prelert finds the anomaly perfectly • No extraneous false alarms • Despite the inherent challenge of the data’s periodic nature
    • 24. #2) Deviations in Performance Metrics • Use Case: Online travel portal • Makes web service calls to airlines for fare quotes • Each airline responds to fare requests with its own typical response time (20 airlines):
    • 25. Hard to automatically detect because… • Tricky to construct unique thresholds for each airline individually • Cannot do “avg +/- 2σ” because it is too noisy for this kind of data • Splunk’s “predict” doesn’t support splitting out via a by clause (“by airline”); see the per-airline SPL sketch after the transcript
    • 26. Prelert finds the anomaly perfectly • Only 1 of the many airlines is having an issue
    • 27. #3) Rare Items as Anomalies • Use Case: Security team @ services company • Wanted to profile typical processes on each host using netstat • Goal was to identify rare processes that “start up and communicate” on each host, individually
    • 28. Hard to automatically detect because… • Each host has its own separate “set” of typical processes that are potentially unique • e.g. FTP may routinely run on server A, but never runs on server B • Maintaining a running list of “typical processes” across hundreds of servers is not practical • Splunk’s “rare” command is not truly a rarity measurement, just “least occurring”; see the per-host rarity SPL sketch after the transcript
    • 29. Prelert finds the anomaly perfectly • Finds an FTP process running for 3 hours on a system that doesn’t normally run FTP
    • 30. #4) Population / Peer Outliers • Use Case: Proxy log data  Need to determine which users/systems are sending out requests/data much differently than the others (see the peer-comparison SPL sketch after the transcript)
    • 31. Hard to automatically detect because… • Peer analysis is impossible without Prelert
    • 32. Prelert finds the anomaly perfectly • One particular host is sending many requests (20,000/hr) to an IIS webserver • This is an attempt to hack the webserver
    • 33. Anomaly Detective App • Free to download and try – 100% native Splunk app • Easy to use – “push button anomaly detection” • More powerful anomaly detection than Splunk on its own • Scalable for big data sets • http://goo.gl/KJY9B
    • 34. Bonus – Anomaly Cross-Correlation • Use Case: Retail company with flaky POS application (gift card redemption)  App occasionally disconnects from DB  Team suspects either a DB or a network problem, but the cause is hard to find • Prelert configured to run anomaly detection across 3 data types simultaneously  App logs (unstructured) – count by dynamic message type  SQL Server performance metrics  Network performance metrics (see the cross-correlation SPL sketch after the transcript)
    • 35. Result: Instant Answers • Symptom: sudden influx of DB errors in the log • Symptom: drop in SQL Server client connections • Cause: network spike and TCP discards
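SPL sketches referenced in the transcript above. All index, sourcetype, and field names below are illustrative assumptions, not taken from the deck, and the searches are rough hand-rolled approximations of the ideas being discussed, not Prelert's method.

First, the empirical distribution idea from slides 6–7 and 13: count events per interval, then turn the interval counts into an empirical probability distribution.

    index="maildemo" sourcetype="delivery_log"
    | timechart span=1d count as pieces_per_day
    | stats count as days_observed by pieces_per_day
    | eventstats sum(days_observed) as total_days
    | eval probability=round(days_observed/total_days, 3)
    | sort pieces_per_day

The timechart yields one count per day, the stats step counts how many days saw each volume, and the final eval turns those frequencies into shares of all observed days, i.e. a crude probability distribution function.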
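Next, the "avg +/- 2 stdev" thresholding that slides 14 and 17 critique: flag any interval whose value falls outside two standard deviations of the overall mean.

    index="perf_metrics" metric_name="cpu_load"
    | timechart span=10m avg(value) as cpu
    | eventstats avg(cpu) as mean stdev(cpu) as sd
    | eval upper=mean+2*sd, lower=mean-2*sd
    | eval alert=if(cpu>upper OR cpu<lower, 1, 0)
    | where alert=1

Because the mean and standard deviation only describe a Gaussian well, a rule like this over-alerts badly on skewed or multi-modal data such as the status-code counts on slide 16, which is how the 900+ alerts per hour on slide 17 come about.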
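For the per-airline response times on slides 24–25: Splunk's "predict" has no "by" clause, but a rough per-airline baseline can be hand-rolled with eventstats ... by airline. It inherits the same Gaussian assumption the deck warns about.

    index="travel" sourcetype="fare_quotes"
    | bin _time span=15m
    | stats avg(response_time) as rt by _time, airline
    | eventstats avg(rt) as mean stdev(rt) as sd by airline
    | eval outlier=if(abs(rt-mean) > 2*sd, 1, 0)
    | where outlier=1

Each airline gets its own mean and standard deviation, so there is no need to maintain 20 separate thresholds, but noisy airlines will still generate constant false alarms.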
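For the rare-process use case on slides 27–28: compute how often each process appears on each host and keep only the processes that account for a tiny share of that host's observations. As slide 28 notes, Splunk's "rare" command just returns the least-frequent values, so a share-of-total calculation like this is closer to a rarity measure, though it still says nothing about how unlikely a value actually is.

    index="os" sourcetype="netstat"
    | stats count as occurrences by host, process
    | eventstats sum(occurrences) as host_total by host
    | eval share=occurrences/host_total
    | where share < 0.01
    | sort host, share

The 1% cutoff is arbitrary and has to be tuned by hand across hundreds of hosts, which is the maintenance problem the slide describes.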
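For the proxy-log population analysis on slides 30–31, a naive peer comparison: measure each source host's hourly request count against the population of all hosts in the same hour.

    index="proxy" sourcetype="proxy_log"
    | bin _time span=1h
    | stats count as requests by _time, src_ip
    | eventstats avg(requests) as peer_avg stdev(requests) as peer_sd by _time
    | eval outlier=if(requests > peer_avg + 3*peer_sd, 1, 0)
    | where outlier=1

A comparison like this would surface the host making 20,000 requests/hr on slide 32, but it again assumes a roughly Gaussian spread across peers and needs hand tuning, which is the gap the deck is pointing at.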
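Finally, a crude way to eyeball the cross-correlation from slides 34–35 in plain SPL is to overlay the three data types on one timeline:

    index="pos_app" (sourcetype="app_log" OR sourcetype="mssql_perf" OR sourcetype="network_perf")
    | timechart span=5m count by sourcetype

Spikes that line up across the three series hint at the shared network cause, but this visual overlay has none of the automated anomaly scoring and linking described on slide 35.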
