Strata 2014 Anomaly Detection

10,088 views

Published on

This talk describes the general architecture common to anomaly detections systems that are based on probabilistic models. By examining several realistic use cases, I illustrate the common themes and practical implementation methods.

Published in: Technology
2 Comments
48 Likes
Statistics
Notes
No Downloads
Views
Total views
10,088
On SlideShare
0
From Embeds
0
Number of Embeds
223
Actions
Shares
0
Downloads
358
Comments
2
Likes
48
Embeds 0
No embeds

No notes for slide

Strata 2014 Anomaly Detection

  1. 1. ® Anomaly Detection © MapR Technologies, confidential ®
  2. 2. Agenda •  •  •  •  •  What is anomaly detection? Some examples Some generalization More interesting examples Sample implementation methods © MapR Technologies, confidential ®
  3. 3. Who I am •  Ted Dunning, Chief Application Architect, MapR tdunning@mapr.com tdunning@apache.org @ted_dunning •  Committer, mentor, champion, PMC member on several Apache projects •  Mahout, Drill, Zookeeper others © MapR Technologies, confidential ®
  4. 4. Who we are •  MapR makes the technology leading distribution including Hadoop •  MapR integrates real-time data semantics directly into a system that also runs Hadoop programs seamlessly •  The biggest and best choose MapR –  Google, Amazon –  Largest credit card, retailer, health insurance, telco –  Ping me for info © MapR Technologies, confidential ®
  5. 5. What is Anomaly Detection? •  What just happened that shouldn’t? –  but I don’t know what failure looks like (yet) •  Find the problem before other people see it –  especially customers and CEO’s •  But don’t wake me up if it isn’t really broken © MapR Technologies, confidential ®
  6. 6. © MapR Technologies, confidential ®
  7. 7. Looks pretty anomalous to me © MapR Technologies, confidential ®
  8. 8. Will the real anomaly please stand up? © MapR Technologies, confidential ®
  9. 9. What Are We Really Doing •  We want action when something breaks (dies/falls over/otherwise gets in trouble) •  But action is expensive •  So we don’t want false alarms •  And we don’t want false negatives •  We need to trade off costs © MapR Technologies, confidential ®
  10. 10. A Second Look © MapR Technologies, confidential ®
  11. 11. A Second Look 99.9%-ile © MapR Technologies, confidential ®
  12. 12. With Spikes © MapR Technologies, confidential ®
  13. 13. With Spikes 99.9%-ile including spikes © MapR Technologies, confidential ®
  14. 14. How Hard Can it Be? x x>t? Alarm ! t Online Summarizer 99.9%-ile © MapR Technologies, confidential ®
  15. 15. On-line Percentile Estimates •  Apache Mahout has on-line percentile estimator –  very high accuracy for extreme tails –  new in version 0.9 !! •  What’s the big deal with anomaly detection? •  This looks like a solved problem © MapR Technologies, confidential ®
  16. 16. Spot the Anomaly Anomaly? © MapR Technologies, confidential ®
  17. 17. Maybe not! © MapR Technologies, confidential ®
  18. 18. Where’s Waldo? This is the real anomaly © MapR Technologies, confidential ®
  19. 19. Normal Isn’t Just Normal •  What we want is a model of what is normal •  What doesn’t fit the model is the anomaly •  For simple signals, the model can be simple … x ~ N(0, ε ) •  The real world is rarely so accommodating © MapR Technologies, confidential ®
  20. 20. We Do Windows © MapR Technologies, confidential ®
  21. 21. We Do Windows © MapR Technologies, confidential ®
  22. 22. We Do Windows © MapR Technologies, confidential ®
  23. 23. We Do Windows © MapR Technologies, confidential ®
  24. 24. We Do Windows © MapR Technologies, confidential ®
  25. 25. We Do Windows © MapR Technologies, confidential ®
  26. 26. We Do Windows © MapR Technologies, confidential ®
  27. 27. We Do Windows © MapR Technologies, confidential ®
  28. 28. We Do Windows © MapR Technologies, confidential ®
  29. 29. We Do Windows © MapR Technologies, confidential ®
  30. 30. We Do Windows © MapR Technologies, confidential ®
  31. 31. We Do Windows © MapR Technologies, confidential ®
  32. 32. We Do Windows © MapR Technologies, confidential ®
  33. 33. We Do Windows © MapR Technologies, confidential ®
  34. 34. We Do Windows © MapR Technologies, confidential ®
  35. 35. Windows on the World •  The set of windowed signals is a nice model of our original signal •  Clustering can find the prototypes –  Fancier techniques available using sparse coding •  The result is a dictionary of shapes •  New signals can be encoded by shifting, scaling and adding shapes from the dictionary © MapR Technologies, confidential ®
  36. 36. Most Common Shapes (for EKG) © MapR Technologies, confidential ®
  37. 37. Reconstructed signal Original signal Reconstructed signal Reconstruction error © MapR Technologies, confidential ®
  38. 38. An Anomaly Original technique for finding 1-d anomaly works against reconstruction error © MapR Technologies, confidential ®
  39. 39. Close-up of anomaly Not what you want your heart to do. And not what the model expects it to do. © MapR Technologies, confidential ®
  40. 40. A Different Kind of Anomaly © MapR Technologies, confidential ®
  41. 41. Model Delta Anomaly Detection δ + δ>t? Alarm ! t Model Online Summarizer 99.9%-ile © MapR Technologies, confidential ®
  42. 42. The Real Inside Scoop •  The model-delta anomaly detector is really just a mixture distribution –  the model we know about already –  and a normally distributed error •  The output (delta) is (roughly) the log probability of the mixture distribution •  Thinking about probability distributions is good © MapR Technologies, confidential ®
  43. 43. Example: Event Stream (timing) •  Events of various types arrive at irregular intervals –  we can assume Poisson distribution •  The key question is whether frequency has changed relative to expected values •  Want alert as soon as possible © MapR Technologies, confidential ®
  44. 44. Poisson Distribution •  Time between events is exponentially distributed Δt ~ λ e − λt •  This means that long delays are exponentially rare − λT P(Δt > T ) = e − log P(Δt > T ) = λT •  If we know λ we can select a good threshold –  or we can pick a threshold empirically © MapR Technologies, confidential ®
  45. 45. Converting Event Times to Anomaly 99.9%-ile 99.99%-ile © MapR Technologies, confidential ®
  46. 46. Example: Event Stream (sequence) •  The scenario: –  Web visitors are being subjected to phishing attacks –  The hook directs the visitors to a mocked login –  When they enter their credentials, the attacker uses them to log in –  We assume captcha’s or similar are part of the authentication so the attacker has to show the images on the login page to the visitor •  The problem: –  We don’t really know how the attack works © MapR Technologies, confidential ®
  47. 47. Normal Event Flow Image-1 Image-2 login Image-3 Image-4 © MapR Technologies, confidential ®
  48. 48. Phishing Flow Real login Image1 Image1 login Image1 Image1 Image1 Image1 login Image1 Image1 Fake login © MapR Technologies, confidential ®
  49. 49. Key Observations •  Regardless of exact details, there are patterns •  Event stream per user shows these patterns •  Phishing will have different patterns at much lower rate •  Measuring statistical surprise gives a good anomaly (fraud or malfunction) indicator © MapR Technologies, confidential ®
  50. 50. Recap (out of order) •  Anomaly detection is best done with a probability model •  Deep learning is a neat way to build this model –  converting to symbolic dynamics simplifies life •  -log p is a good way to convert to anomaly measure •  Adaptive quantile estimation works for autosetting thresholds © MapR Technologies, confidential ®
  51. 51. Recap •  Different systems require different models •  Continuous time-series –  sparse coding or deep learning to build signal model •  Events in time –  rate model base on variable rate Poisson –  segregated rate model •  Events with labels –  language modeling –  hidden Markov models © MapR Technologies, confidential ®
  52. 52. How Do I Build Such a System •  The key is to combine real-time and long-time –  real-time evaluates data stream against model –  long-time is how we build the model •  Extended Lambda architecture is my favorite •  See my other talks on slideshare.net for info •  Ping me directly © MapR Technologies, confidential ®
  53. 53. Hadoop is Not Very Real-time Unprocessed Data now t Fully processed Latest full period Hadoop job takes this long for this data © MapR Technologies, confidential ®
  54. 54. Real-time and Long-time together Blended View view now t Hadoop works great back here Storm works here © MapR Technologies, confidential ®
  55. 55. Who I am •  Ted Dunning, Chief Application Architect, MapR tdunning@mapr.com tdunning@apache.org @ted_dunning •  Committer, mentor, champion, PMC member on several Apache projects •  Mahout, Drill, Zookeeper others © MapR Technologies, confidential ®
  56. 56. © MapR Technologies, confidential ® ®

×