How to find what you didn't know to look for, oractical anomaly detection
Upcoming SlideShare
Loading in...5
×

Like this? Share it with your network

Share
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Be the first to comment
    Be the first to like this
No Downloads

Views

Total Views
575
On Slideshare
575
From Embeds
0
Number of Embeds
0

Actions

Shares
Downloads
11
Comments
0
Likes
0

Embeds 0

No embeds

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
    No notes for slide
  • USE THIS
  • USE THIS

Transcript

  • 1. © 2014 MapR Technologies 1 © MapR Technologies, confidential Anomaly Detection, Time-Series Data Bases and More How to Find What You Didn’t Know to Look For
  • 2. © 2014 MapR Technologies 2 Agenda • What is anomaly detection? • Some examples • Some generalization • Compression == Truth • Deep dive into deep learning • Why this matters for time series databases This goes beyond what was described in the abstract…
  • 3. © 2014 MapR Technologies 3 Who I am • Ted Dunning, Chief Application Architect, MapR tdunning@mapr.com tdunning@apache.org Twitter @ted_dunning • Committer, mentor, champion, PMC member on several Apache projects • Mahout, Drill, Zookeeper, Spark and others • Hashtag for today’s talk #HadoopSummit
  • 4. © 2014 MapR Technologies 4 Who we are • MapR makes the technology leading distribution including Hadoop • MapR integrates real-time data semantics directly into a system that also runs Hadoop programs seamlessly • The biggest and best choose MapR – Google, Amazon – Largest credit card, retailer, health insurance, telco – Ping me for info
  • 5. © 2014 MapR Technologies 5 A New Look at Anomaly Detection • O’Reilly series Practical Machine Learning 2nd volume • Print copies available at Hadoop Summit at MapR booth • eBook available at http://bit.ly/1jQ9QuL • Book signing by both authors at 4pm Wed (today)
  • 6. © 2014 MapR Technologies 6 What is Anomaly Detection? • What just happened that shouldn’t? – but I don’t know what failure looks like (yet) • Find the problem before other people see it – especially customers and CEO’s • But don’t wake me up if it isn’t really broken
  • 7. © 2014 MapR Technologies 7
  • 8. © 2014 MapR Technologies 8 Looks pretty anomalous to me
  • 9. © 2014 MapR Technologies 9 Will the real anomaly please stand up?
  • 10. © 2014 MapR Technologies 10 What Are We Really Doing • We want action when something breaks (dies/falls over/otherwise gets in trouble) • But action is expensive • So we don’t want false alarms • And we don’t want false negatives • We need to trade off costs
  • 11. © 2014 MapR Technologies 11 A Second Look
  • 12. © 2014 MapR Technologies 12 A Second Look 99.9%-ile
  • 13. © 2014 MapR Technologies 13 Online Summarizer 99.9%-ile t x > t ? Alarm ! x How Hard Can it Be?
  • 14. © 2014 MapR Technologies 14 On-line Percentile Estimates • Apache Mahout has on-line percentile estimator – very high accuracy for extreme tails – new in version 0.9 !! • What’s the big deal with anomaly detection? • This looks like a solved problem
  • 15. © 2014 MapR Technologies 15 Already Done? Etsy Skyline?
  • 16. © 2014 MapR Technologies 16 What About This? 0 5 10 15 −20246810 offset+noise+pulse1+pulse2 A B
  • 17. © 2014 MapR Technologies 17 Spot the Anomaly Anomaly?
  • 18. © 2014 MapR Technologies 18 Maybe not!
  • 19. © 2014 MapR Technologies 19 Where’s Waldo? This is the real anomaly
  • 20. © 2014 MapR Technologies 20 Normal Isn’t Just Normal • What we want is a model of what is normal • What doesn’t fit the model is the anomaly • For simple signals, the model can be simple … • The real world is rarely so accommodating x ~ N(0,e)
  • 21. © 2014 MapR Technologies 21 We Do Windows
  • 22. © 2014 MapR Technologies 22 We Do Windows
  • 23. © 2014 MapR Technologies 23 We Do Windows
  • 24. © 2014 MapR Technologies 24 We Do Windows
  • 25. © 2014 MapR Technologies 25 We Do Windows
  • 26. © 2014 MapR Technologies 26 We Do Windows
  • 27. © 2014 MapR Technologies 27 We Do Windows
  • 28. © 2014 MapR Technologies 28 We Do Windows
  • 29. © 2014 MapR Technologies 29 We Do Windows
  • 30. © 2014 MapR Technologies 30 We Do Windows
  • 31. © 2014 MapR Technologies 31 We Do Windows
  • 32. © 2014 MapR Technologies 32 We Do Windows
  • 33. © 2014 MapR Technologies 33 We Do Windows
  • 34. © 2014 MapR Technologies 34 We Do Windows
  • 35. © 2014 MapR Technologies 35 We Do Windows
  • 36. © 2014 MapR Technologies 36 Windows on the World • The set of windowed signals is a nice model of our original signal • Clustering can find the prototypes – Fancier techniques available using sparse coding • The result is a dictionary of shapes • New signals can be encoded by shifting, scaling and adding shapes from the dictionary
  • 37. © 2014 MapR Technologies 37 Most Common Shapes (for EKG)
  • 38. © 2014 MapR Technologies 38 Reconstructed signal Original signal Reconstructed signal Reconstruction error < 1 bit / sample
  • 39. © 2014 MapR Technologies 39 An Anomaly Original technique for finding 1-d anomaly works against reconstruction error
  • 40. © 2014 MapR Technologies 40 Close-up of anomaly Not what you want your heart to do. And not what the model expects it to do.
  • 41. © 2014 MapR Technologies 41 A Different Kind of Anomaly
  • 42. © 2014 MapR Technologies 42 Model Delta Anomaly Detection Online Summarizer δ > t ? 99.9%-ile t Alarm ! Model - + δ
  • 43. © 2014 MapR Technologies 43 The Real Inside Scoop • The model-delta anomaly detector is really just a sum of random variables – the model we know about already – and a normally distributed error • The output (delta) is (roughly) the log probability of the sum distribution (really δ2) • Thinking about probability distributions is good
  • 44. © 2014 MapR Technologies 44 Example: Event Stream (timing) • Events of various types arrive at irregular intervals – we can assume Poisson distribution • The key question is whether frequency has changed relative to expected values • Want alert as soon as possible
  • 45. © 2014 MapR Technologies 45 Poisson Distribution • Time between events is exponentially distributed • This means that long delays are exponentially rare • If we know λ we can select a good threshold – or we can pick a threshold empirically Dt ~ le-lt P(Dt > T) = e-lT -logP(Dt > T) = lT
  • 46. © 2014 MapR Technologies 46 Converting Event Times to Anomaly 99.9%-ile 99.99%-ile
  • 47. © 2014 MapR Technologies 51 Recap (out of order) • Anomaly detection is best done with a probability model • -log p is a good way to convert to anomaly measure • Adaptive quantile estimation works for auto-setting thresholds
  • 48. © 2014 MapR Technologies 52 Recap • Different systems require different models • Continuous time-series – sparse coding to build signal model • Events in time – rate model base on variable rate Poisson – segregated rate model • Events with labels – language modeling – hidden Markov models
  • 49. © 2014 MapR Technologies 53 But Wait! Compression is Truth • Maximizing log πk is minimizing compressed size – (each symbol takes -log πk bits on average) • Maximizing log πk happens where πk = pk – (maximum likelihood principle)
  • 50. © 2014 MapR Technologies 54 But Auto-encoders Find Max Likelihood • Minimal error => maximum likelihood • Maximum likelihood => maximum compression • So good anomaly detectors give good compression
  • 51. © 2014 MapR Technologies 55 In Case You Want the Details E pk logpk[ ] = pk logpk k å log x £ x -1 pk log pk pkk å £ pk 1- pk pk æ è ç ö ø ÷ = pk k å - pk k å k å = 0 pk logpk - pk log pk k å £ 0 k å pk logpk £ pk log pk k å k å E pk logpk[ ] » 1 n logpxi i å
  • 52. © 2014 MapR Technologies 56 Pause To Reflect on Clustering • Use windowing to apportion signal – Hamming windows add up to 1 • Find nearest cluster for each window – Can use dot product because all clusters normalized • Scale cluster to right size – Dot product again • Subtract from original signal
  • 53. © 2014 MapR Technologies 57 Auto-encoding - Information Bottleneck Original Signal Encoder Reconstructed Signal
  • 54. © 2014 MapR Technologies 58 x1 x2 ... xn-1 xn x'1 x'2 ... x'n-1 x'n Input Hidden layer (clusters) Reconstruction Clustering as Neural Network Hidden layer is 1 of k Could be m of kSparsity allows k >> 100
  • 55. © 2014 MapR Technologies 59 Overlapping Networks Time series input Reconstructed time series ... ... ... ... ... ... ... ... ... ...
  • 56. © 2014 MapR Technologies 60 Deep Learning ... ... ... ... ... ... ... ... ... ... ... ... ... ... A B A'
  • 57. © 2014 MapR Technologies 61 What About the Database? • We don’t have to keep the reconstruction • We can keep the first level nodes – And the reconstruction error • To keep the first level nodes – We can keep the second level nodes – Plus the reconstruction error
  • 58. © 2014 MapR Technologies 62 What Does it Matter? • Even one level of auto-encoding compresses – 30-50x in EKG example with k-means • Multiple levels compress more – Understanding => Truth => Compression • Higher levels give semantic search
  • 59. © 2014 MapR Technologies 63 How Do I Build Such a System • The key is to combine real-time and long-time – real-time evaluates data stream against model – long-time is how we build the model • Extended Lambda architecture is my favorite • See my other talks on slideshare.net for info • Ping me directly
  • 60. © 2014 MapR Technologies 64 t now Hadoop is Not Very Real-time UnprocessedData Fully processed Latest full period Hadoop job takes this long for this data
  • 61. © 2014 MapR Technologies 65 t now Hadoop works great back here Storm works here Real-time and Long-time together Blended viewBlended viewBlended View
  • 62. © 2014 MapR Technologies 66 Who I am • Ted Dunning, Chief Application Architect, MapR tdunning@mapr.com tdunning@apache.org @ted_dunning • Committer, mentor, champion, PMC member on several Apache projects • Mahout, Drill, Zookeeper others
  • 63. © 2014 MapR Technologies 67 © MapR Technologies, confidential