How to find what you didn't know to look for, oractical anomaly detection
Upcoming SlideShare
Loading in...5
×
 

How to find what you didn't know to look for, oractical anomaly detection

on

  • 359 views

 

Statistics

Views

Total Views
359
Views on SlideShare
359
Embed Views
0

Actions

Likes
0
Downloads
8
Comments
0

0 Embeds 0

No embeds

Accessibility

Categories

Upload Details

Uploaded via as Microsoft PowerPoint

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment
  • USE THIS
  • USE THIS

How to find what you didn't know to look for, oractical anomaly detection How to find what you didn't know to look for, oractical anomaly detection Presentation Transcript

  • © 2014 MapR Technologies 1 © MapR Technologies, confidential Anomaly Detection, Time-Series Data Bases and More How to Find What You Didn’t Know to Look For
  • © 2014 MapR Technologies 2 Agenda • What is anomaly detection? • Some examples • Some generalization • Compression == Truth • Deep dive into deep learning • Why this matters for time series databases This goes beyond what was described in the abstract…
  • © 2014 MapR Technologies 3 Who I am • Ted Dunning, Chief Application Architect, MapR tdunning@mapr.com tdunning@apache.org Twitter @ted_dunning • Committer, mentor, champion, PMC member on several Apache projects • Mahout, Drill, Zookeeper, Spark and others • Hashtag for today’s talk #HadoopSummit
  • © 2014 MapR Technologies 4 Who we are • MapR makes the technology leading distribution including Hadoop • MapR integrates real-time data semantics directly into a system that also runs Hadoop programs seamlessly • The biggest and best choose MapR – Google, Amazon – Largest credit card, retailer, health insurance, telco – Ping me for info
  • © 2014 MapR Technologies 5 A New Look at Anomaly Detection • O’Reilly series Practical Machine Learning 2nd volume • Print copies available at Hadoop Summit at MapR booth • eBook available at http://bit.ly/1jQ9QuL • Book signing by both authors at 4pm Wed (today)
  • © 2014 MapR Technologies 6 What is Anomaly Detection? • What just happened that shouldn’t? – but I don’t know what failure looks like (yet) • Find the problem before other people see it – especially customers and CEO’s • But don’t wake me up if it isn’t really broken
  • © 2014 MapR Technologies 7
  • © 2014 MapR Technologies 8 Looks pretty anomalous to me
  • © 2014 MapR Technologies 9 Will the real anomaly please stand up?
  • © 2014 MapR Technologies 10 What Are We Really Doing • We want action when something breaks (dies/falls over/otherwise gets in trouble) • But action is expensive • So we don’t want false alarms • And we don’t want false negatives • We need to trade off costs
  • © 2014 MapR Technologies 11 A Second Look
  • © 2014 MapR Technologies 12 A Second Look 99.9%-ile
  • © 2014 MapR Technologies 13 Online Summarizer 99.9%-ile t x > t ? Alarm ! x How Hard Can it Be?
  • © 2014 MapR Technologies 14 On-line Percentile Estimates • Apache Mahout has on-line percentile estimator – very high accuracy for extreme tails – new in version 0.9 !! • What’s the big deal with anomaly detection? • This looks like a solved problem
  • © 2014 MapR Technologies 15 Already Done? Etsy Skyline?
  • © 2014 MapR Technologies 16 What About This? 0 5 10 15 −20246810 offset+noise+pulse1+pulse2 A B
  • © 2014 MapR Technologies 17 Spot the Anomaly Anomaly?
  • © 2014 MapR Technologies 18 Maybe not!
  • © 2014 MapR Technologies 19 Where’s Waldo? This is the real anomaly
  • © 2014 MapR Technologies 20 Normal Isn’t Just Normal • What we want is a model of what is normal • What doesn’t fit the model is the anomaly • For simple signals, the model can be simple … • The real world is rarely so accommodating x ~ N(0,e)
  • © 2014 MapR Technologies 21 We Do Windows
  • © 2014 MapR Technologies 22 We Do Windows
  • © 2014 MapR Technologies 23 We Do Windows
  • © 2014 MapR Technologies 24 We Do Windows
  • © 2014 MapR Technologies 25 We Do Windows
  • © 2014 MapR Technologies 26 We Do Windows
  • © 2014 MapR Technologies 27 We Do Windows
  • © 2014 MapR Technologies 28 We Do Windows
  • © 2014 MapR Technologies 29 We Do Windows
  • © 2014 MapR Technologies 30 We Do Windows
  • © 2014 MapR Technologies 31 We Do Windows
  • © 2014 MapR Technologies 32 We Do Windows
  • © 2014 MapR Technologies 33 We Do Windows
  • © 2014 MapR Technologies 34 We Do Windows
  • © 2014 MapR Technologies 35 We Do Windows
  • © 2014 MapR Technologies 36 Windows on the World • The set of windowed signals is a nice model of our original signal • Clustering can find the prototypes – Fancier techniques available using sparse coding • The result is a dictionary of shapes • New signals can be encoded by shifting, scaling and adding shapes from the dictionary
  • © 2014 MapR Technologies 37 Most Common Shapes (for EKG)
  • © 2014 MapR Technologies 38 Reconstructed signal Original signal Reconstructed signal Reconstruction error < 1 bit / sample
  • © 2014 MapR Technologies 39 An Anomaly Original technique for finding 1-d anomaly works against reconstruction error
  • © 2014 MapR Technologies 40 Close-up of anomaly Not what you want your heart to do. And not what the model expects it to do.
  • © 2014 MapR Technologies 41 A Different Kind of Anomaly
  • © 2014 MapR Technologies 42 Model Delta Anomaly Detection Online Summarizer δ > t ? 99.9%-ile t Alarm ! Model - + δ
  • © 2014 MapR Technologies 43 The Real Inside Scoop • The model-delta anomaly detector is really just a sum of random variables – the model we know about already – and a normally distributed error • The output (delta) is (roughly) the log probability of the sum distribution (really δ2) • Thinking about probability distributions is good
  • © 2014 MapR Technologies 44 Example: Event Stream (timing) • Events of various types arrive at irregular intervals – we can assume Poisson distribution • The key question is whether frequency has changed relative to expected values • Want alert as soon as possible
  • © 2014 MapR Technologies 45 Poisson Distribution • Time between events is exponentially distributed • This means that long delays are exponentially rare • If we know λ we can select a good threshold – or we can pick a threshold empirically Dt ~ le-lt P(Dt > T) = e-lT -logP(Dt > T) = lT
  • © 2014 MapR Technologies 46 Converting Event Times to Anomaly 99.9%-ile 99.99%-ile
  • © 2014 MapR Technologies 51 Recap (out of order) • Anomaly detection is best done with a probability model • -log p is a good way to convert to anomaly measure • Adaptive quantile estimation works for auto-setting thresholds
  • © 2014 MapR Technologies 52 Recap • Different systems require different models • Continuous time-series – sparse coding to build signal model • Events in time – rate model base on variable rate Poisson – segregated rate model • Events with labels – language modeling – hidden Markov models
  • © 2014 MapR Technologies 53 But Wait! Compression is Truth • Maximizing log πk is minimizing compressed size – (each symbol takes -log πk bits on average) • Maximizing log πk happens where πk = pk – (maximum likelihood principle)
  • © 2014 MapR Technologies 54 But Auto-encoders Find Max Likelihood • Minimal error => maximum likelihood • Maximum likelihood => maximum compression • So good anomaly detectors give good compression
  • © 2014 MapR Technologies 55 In Case You Want the Details E pk logpk[ ] = pk logpk k å log x £ x -1 pk log pk pkk å £ pk 1- pk pk æ è ç ö ø ÷ = pk k å - pk k å k å = 0 pk logpk - pk log pk k å £ 0 k å pk logpk £ pk log pk k å k å E pk logpk[ ] » 1 n logpxi i å
  • © 2014 MapR Technologies 56 Pause To Reflect on Clustering • Use windowing to apportion signal – Hamming windows add up to 1 • Find nearest cluster for each window – Can use dot product because all clusters normalized • Scale cluster to right size – Dot product again • Subtract from original signal
  • © 2014 MapR Technologies 57 Auto-encoding - Information Bottleneck Original Signal Encoder Reconstructed Signal
  • © 2014 MapR Technologies 58 x1 x2 ... xn-1 xn x'1 x'2 ... x'n-1 x'n Input Hidden layer (clusters) Reconstruction Clustering as Neural Network Hidden layer is 1 of k Could be m of kSparsity allows k >> 100
  • © 2014 MapR Technologies 59 Overlapping Networks Time series input Reconstructed time series ... ... ... ... ... ... ... ... ... ...
  • © 2014 MapR Technologies 60 Deep Learning ... ... ... ... ... ... ... ... ... ... ... ... ... ... A B A'
  • © 2014 MapR Technologies 61 What About the Database? • We don’t have to keep the reconstruction • We can keep the first level nodes – And the reconstruction error • To keep the first level nodes – We can keep the second level nodes – Plus the reconstruction error
  • © 2014 MapR Technologies 62 What Does it Matter? • Even one level of auto-encoding compresses – 30-50x in EKG example with k-means • Multiple levels compress more – Understanding => Truth => Compression • Higher levels give semantic search
  • © 2014 MapR Technologies 63 How Do I Build Such a System • The key is to combine real-time and long-time – real-time evaluates data stream against model – long-time is how we build the model • Extended Lambda architecture is my favorite • See my other talks on slideshare.net for info • Ping me directly
  • © 2014 MapR Technologies 64 t now Hadoop is Not Very Real-time UnprocessedData Fully processed Latest full period Hadoop job takes this long for this data
  • © 2014 MapR Technologies 65 t now Hadoop works great back here Storm works here Real-time and Long-time together Blended viewBlended viewBlended View
  • © 2014 MapR Technologies 66 Who I am • Ted Dunning, Chief Application Architect, MapR tdunning@mapr.com tdunning@apache.org @ted_dunning • Committer, mentor, champion, PMC member on several Apache projects • Mahout, Drill, Zookeeper others
  • © 2014 MapR Technologies 67 © MapR Technologies, confidential