More Related Content
Similar to How to find what you didn't know to look for, oractical anomaly detection (20)
More from DataWorks Summit (20)
How to find what you didn't know to look for, oractical anomaly detection
- 1. © 2014 MapR Technologies 1
© MapR Technologies, confidential
Anomaly Detection, Time-Series Data Bases
and More
How to Find What You Didn’t
Know to Look For
- 2. © 2014 MapR Technologies 2
Agenda
• What is anomaly detection?
• Some examples
• Some generalization
• Compression == Truth
• Deep dive into deep learning
• Why this matters for time series databases
This goes beyond what was described in the abstract…
- 3. © 2014 MapR Technologies 3
Who I am
• Ted Dunning, Chief Application Architect, MapR
tdunning@mapr.com
tdunning@apache.org
Twitter @ted_dunning
• Committer, mentor, champion, PMC member on several Apache
projects
• Mahout, Drill, Zookeeper, Spark and others
• Hashtag for today’s talk #HadoopSummit
- 4. © 2014 MapR Technologies 4
Who we are
• MapR makes the technology leading distribution including
Hadoop
• MapR integrates real-time data semantics directly into a system
that also runs Hadoop programs seamlessly
• The biggest and best choose MapR
– Google, Amazon
– Largest credit card, retailer, health insurance, telco
– Ping me for info
- 5. © 2014 MapR Technologies 5
A New Look at Anomaly Detection
• O’Reilly series Practical Machine Learning 2nd volume
• Print copies available at Hadoop Summit at MapR booth
• eBook available at http://bit.ly/1jQ9QuL
• Book signing by both authors at
4pm Wed (today)
- 6. © 2014 MapR Technologies 6
What is Anomaly Detection?
• What just happened that shouldn’t?
– but I don’t know what failure looks like (yet)
• Find the problem before other people see it
– especially customers and CEO’s
• But don’t wake me up if it isn’t really broken
- 8. © 2014 MapR Technologies 8
Looks pretty
anomalous
to me
- 9. © 2014 MapR Technologies 9
Will the real anomaly
please stand up?
- 10. © 2014 MapR Technologies 10
What Are We Really Doing
• We want action when something breaks
(dies/falls over/otherwise gets in trouble)
• But action is expensive
• So we don’t want false alarms
• And we don’t want false negatives
• We need to trade off costs
- 12. © 2014 MapR Technologies 12
A Second Look
99.9%-ile
- 13. © 2014 MapR Technologies 13
Online
Summarizer
99.9%-ile
t
x > t ? Alarm !
x
How Hard Can it Be?
- 14. © 2014 MapR Technologies 14
On-line Percentile Estimates
• Apache Mahout has on-line percentile estimator
– very high accuracy for extreme tails
– new in version 0.9 !!
• What’s the big deal with anomaly detection?
• This looks like a solved problem
- 15. © 2014 MapR Technologies 15
Already Done? Etsy Skyline?
- 16. © 2014 MapR Technologies 16
What About This?
0 5 10 15
−20246810
offset+noise+pulse1+pulse2
A
B
- 17. © 2014 MapR Technologies 17
Spot the Anomaly
Anomaly?
- 19. © 2014 MapR Technologies 19
Where’s Waldo?
This is the real
anomaly
- 20. © 2014 MapR Technologies 20
Normal Isn’t Just Normal
• What we want is a model of what is normal
• What doesn’t fit the model is the anomaly
• For simple signals, the model can be simple …
• The real world is rarely so accommodating
x ~ N(0,e)
- 36. © 2014 MapR Technologies 36
Windows on the World
• The set of windowed signals is a nice model of our original signal
• Clustering can find the prototypes
– Fancier techniques available using sparse coding
• The result is a dictionary of shapes
• New signals can be encoded by shifting, scaling and adding
shapes from the dictionary
- 37. © 2014 MapR Technologies 37
Most Common Shapes (for EKG)
- 38. © 2014 MapR Technologies 38
Reconstructed signal
Original
signal
Reconstructed
signal
Reconstruction
error
< 1 bit / sample
- 39. © 2014 MapR Technologies 39
An Anomaly
Original technique for finding
1-d anomaly works against
reconstruction error
- 40. © 2014 MapR Technologies 40
Close-up of anomaly
Not what you want your
heart to do.
And not what the model
expects it to do.
- 41. © 2014 MapR Technologies 41
A Different Kind of Anomaly
- 42. © 2014 MapR Technologies 42
Model Delta Anomaly Detection
Online
Summarizer
δ > t ?
99.9%-ile
t
Alarm !
Model
-
+ δ
- 43. © 2014 MapR Technologies 43
The Real Inside Scoop
• The model-delta anomaly detector is really just a sum of random
variables
– the model we know about already
– and a normally distributed error
• The output (delta) is (roughly) the log probability of the sum
distribution (really δ2)
• Thinking about probability distributions is good
- 44. © 2014 MapR Technologies 44
Example: Event Stream (timing)
• Events of various types arrive at irregular intervals
– we can assume Poisson distribution
• The key question is whether frequency has changed relative to
expected values
• Want alert as soon as possible
- 45. © 2014 MapR Technologies 45
Poisson Distribution
• Time between events is exponentially distributed
• This means that long delays are exponentially rare
• If we know λ we can select a good threshold
– or we can pick a threshold empirically
Dt ~ le-lt
P(Dt > T) = e-lT
-logP(Dt > T) = lT
- 46. © 2014 MapR Technologies 46
Converting Event Times to Anomaly
99.9%-ile
99.99%-ile
- 47. © 2014 MapR Technologies 51
Recap (out of order)
• Anomaly detection is best done with a probability model
• -log p is a good way to convert to anomaly measure
• Adaptive quantile estimation works for auto-setting thresholds
- 48. © 2014 MapR Technologies 52
Recap
• Different systems require different models
• Continuous time-series
– sparse coding to build signal model
• Events in time
– rate model base on variable rate Poisson
– segregated rate model
• Events with labels
– language modeling
– hidden Markov models
- 49. © 2014 MapR Technologies 53
But Wait! Compression is Truth
• Maximizing log πk is minimizing compressed size
– (each symbol takes -log πk bits on average)
• Maximizing log πk happens where πk = pk
– (maximum likelihood principle)
- 50. © 2014 MapR Technologies 54
But Auto-encoders Find Max Likelihood
• Minimal error => maximum likelihood
• Maximum likelihood => maximum compression
• So good anomaly detectors give good compression
- 51. © 2014 MapR Technologies 55
In Case You Want the Details
E pk logpk[ ] = pk logpk
k
å
log x £ x -1
pk log
pk
pkk
å £ pk 1-
pk
pk
æ
è
ç
ö
ø
÷ = pk
k
å - pk
k
å
k
å = 0
pk logpk - pk log pk
k
å £ 0
k
å
pk logpk £ pk log pk
k
å
k
å
E pk logpk[ ] »
1
n
logpxi
i
å
- 52. © 2014 MapR Technologies 56
Pause To Reflect on Clustering
• Use windowing to apportion signal
– Hamming windows add up to 1
• Find nearest cluster for each window
– Can use dot product because all clusters normalized
• Scale cluster to right size
– Dot product again
• Subtract from original signal
- 53. © 2014 MapR Technologies 57
Auto-encoding - Information Bottleneck
Original Signal
Encoder
Reconstructed Signal
- 54. © 2014 MapR Technologies 58
x1 x2
...
xn-1 xn
x'1 x'2
...
x'n-1 x'n
Input
Hidden layer
(clusters)
Reconstruction
Clustering as Neural Network
Hidden layer is 1
of k
Could be m
of kSparsity allows k >>
100
- 55. © 2014 MapR Technologies 59
Overlapping Networks
Time series input
Reconstructed time series
...
...
...
...
...
...
...
...
...
...
- 56. © 2014 MapR Technologies 60
Deep Learning
...
...
...
...
... ... ... ... ...
... ... ... ... ...
A
B
A'
- 57. © 2014 MapR Technologies 61
What About the Database?
• We don’t have to keep the reconstruction
• We can keep the first level nodes
– And the reconstruction error
• To keep the first level nodes
– We can keep the second level nodes
– Plus the reconstruction error
- 58. © 2014 MapR Technologies 62
What Does it Matter?
• Even one level of auto-encoding compresses
– 30-50x in EKG example with k-means
• Multiple levels compress more
– Understanding => Truth => Compression
• Higher levels give semantic search
- 59. © 2014 MapR Technologies 63
How Do I Build Such a System
• The key is to combine real-time and long-time
– real-time evaluates data stream against model
– long-time is how we build the model
• Extended Lambda architecture is my favorite
• See my other talks on slideshare.net for info
• Ping me directly
- 60. © 2014 MapR Technologies 64
t
now
Hadoop is Not Very Real-time
UnprocessedData
Fully processed Latest full
period
Hadoop job
takes this long
for this data
- 61. © 2014 MapR Technologies 65
t
now
Hadoop works great
back here
Storm
works
here
Real-time and Long-time together
Blended viewBlended viewBlended View
- 62. © 2014 MapR Technologies 66
Who I am
• Ted Dunning, Chief Application Architect, MapR
tdunning@mapr.com
tdunning@apache.org
@ted_dunning
• Committer, mentor, champion, PMC member on several Apache
projects
• Mahout, Drill, Zookeeper others
- 63. © 2014 MapR Technologies 67
© MapR Technologies, confidential
Editor's Notes
- USE THIS
- USE THIS