Mathematical bridges From Old to New

© 2014 MapR Technologies
Ted Dunning

Steps in Anomaly Detection
•  Build a model: collect and process data for training a model
•  Use the machine learning model to determine what the normal pattern is
•  Decide how far from this normal pattern a point must be to be considered anomalous
•  Use the anomaly detection (AD) model to detect anomalies in new data
–  Methods such as clustering for discovery can be helpful

How hard is it to set an alert for anomalies?
Grey data is from normal events; x’s are anomalies.
Where would you set the threshold?

Basic idea: set adaptive thresholds

What Are We Really Doing?
•  We want action when something breaks (dies/falls over/otherwise gets in trouble)
•  But action is expensive
•  So we don’t want too many false alarms
•  And we don’t want too many false negatives
•  What’s the right threshold to set for alerts?
–  We need to trade off costs
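The cost trade-off above can be made concrete with a small sketch. It assumes a labeled history of scores and two hypothetical per-event costs (`cost_false_alarm`, `cost_miss`); neither the function name nor the numbers come from the talk.

```python
def best_threshold(scores, is_anomaly, cost_false_alarm, cost_miss):
    """Return (threshold, total_cost) with the lowest cost on labeled history.

    An alert fires when score > threshold. Both costs are assumed inputs.
    """
    best = None
    for t in sorted(set(scores)):
        false_alarms = sum(1 for s, a in zip(scores, is_anomaly) if s > t and not a)
        misses = sum(1 for s, a in zip(scores, is_anomaly) if s <= t and a)
        cost = cost_false_alarm * false_alarms + cost_miss * misses
        if best is None or cost < best[1]:
            best = (t, cost)
    return best

# Made-up example data: one normal event (0.9) scores as high as the anomalies.
scores = [0.1, 0.4, 0.35, 0.8, 0.9, 0.3, 0.95, 0.2]
labels = [False, False, False, True, False, False, True, False]
threshold, cost = best_threshold(scores, labels, cost_false_alarm=1.0, cost_miss=5.0)
print(threshold, cost)
```

With a miss costing five times a false alarm, the cheapest choice here accepts one false alarm rather than missing an anomaly.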

A Second Look
99.9%-ile

Cool algorithm: t-digest

How Hard Can it Be?
[Diagram: each input x feeds an online summarizer that estimates the 99.9%-ile threshold t; if x > t, raise an alarm]
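A minimal sketch of the adaptive-threshold loop follows. To stay self-contained it keeps every value in a sorted list and reads off the exact quantile, which is only a stand-in for a real online summarizer such as t-digest; the class name `QuantileAlarm` and the `warmup` parameter are assumptions made for illustration.

```python
import bisect
import random

class QuantileAlarm:
    """Alarm when a value exceeds the running q-quantile of past values.

    Stand-in only: stores all values, so memory grows without bound.
    """

    def __init__(self, q=0.999, warmup=1000):
        self.q = q
        self.warmup = warmup
        self.sorted_values = []

    def threshold(self):
        n = len(self.sorted_values)
        return self.sorted_values[min(n - 1, int(self.q * n))]

    def observe(self, x):
        # Compare against the threshold learned so far, then learn from x.
        alarm = len(self.sorted_values) >= self.warmup and x > self.threshold()
        bisect.insort(self.sorted_values, x)
        return alarm

random.seed(0)
detector = QuantileAlarm()
normal_alarms = sum(detector.observe(random.gauss(0.0, 1.0)) for _ in range(20000))
spike_alarm = detector.observe(25.0)
print(normal_alarms, spike_alarm)
```

The threshold follows the data rather than being fixed by hand, so roughly one normal sample in a thousand still raises an alarm by construction.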

Using t-Digest
•  The t-digest is an online percentile estimator
–  very high accuracy for extreme tails
•  t-digest is also widely available
–  in Elasticsearch, in Solr
–  in streamlib (open source library on GitHub)
–  in Mahout Math (open source library on GitHub)
–  standalone (GitHub and Maven Central)
•  Very handy for general distributions; makes few assumptions
•  For latency, exponential binning may be useful
–  See, for instance, HdrHistogram

So are we all done?

What About This?
[Figure: a signal built as offset + noise + pulse1 + pulse2, with labels A and B]

Model Delta Anomaly Detection
[Diagram: the model's output is subtracted from the input to give δ; an online summarizer estimates the 99.9%-ile threshold t of δ; if δ > t, raise an alarm]
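The model-delta idea can be sketched as below. The sinusoidal `model`, the noise level, and the use of the absolute difference |x − model| as δ are all assumptions chosen for illustration. The point is that a value can look ordinary on its own yet be far from what the model expects at that moment.

```python
import math
import random

def model(t):
    # Hypothetical "normal" pattern: a sinusoid with a period of 100 samples.
    return 10.0 * math.sin(2.0 * math.pi * t / 100.0)

def percentile(values, q):
    ordered = sorted(values)
    return ordered[min(len(ordered) - 1, int(q * len(ordered)))]

random.seed(1)
history = [(t, model(t) + random.gauss(0.0, 0.5)) for t in range(5000)]

# Threshold on the raw signal versus threshold on the model delta.
raw_threshold = percentile([x for _, x in history], 0.999)
delta_threshold = percentile([abs(x - model(t)) for t, x in history], 0.999)

# At t = 5075 the model expects about -10, so x = 0 is far off the pattern
# even though 0 is a perfectly common raw value.
t_new, x_new = 5075, 0.0
raw_alarm = x_new > raw_threshold
delta_alarm = abs(x_new - model(t_new)) > delta_threshold
print(raw_alarm, delta_alarm)
```

Only the delta-based check fires here, because the raw threshold sits near the top of the sinusoid.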

Spot the Anomaly
Anomaly?

Maybe not!

Where’s Waldo?
This is the real anomaly

Normal Isn’t Just Normal
•  What we want is a model of what is normal
•  What doesn’t fit the model is the anomaly
•  For simple signals, the model can be simple …
x ~ m(t) + N(0, ε)
•  The real world is rarely so accommodating

We Do Windows

Windows on the World
•  The set of windowed signals is a nice model of our original signal
•  Clustering can find the prototypes
–  Fancier techniques available using sparse coding
•  The result is a dictionary of shapes
•  New signals can be encoded by shifting, scaling and adding shapes from the dictionary
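The dictionary idea can be sketched in a simplified form. This version makes several assumptions not stated on the slide: non-overlapping windows (so no shifting or adding), a synthetic signal built from two made-up shapes, and a spherical variant of k-means with a farthest-first start. Function names such as `build_dictionary` are hypothetical.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def unit(v):
    norm = math.sqrt(dot(v, v)) or 1.0
    return [x / norm for x in v]

def build_dictionary(windows, k, iterations=10):
    points = [unit(w) for w in windows]
    # Farthest-first start: add the window least similar to the centroids so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(min(points, key=lambda p: max(dot(p, c) for c in centroids)))
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for p in points:
            groups[max(range(k), key=lambda j: dot(p, centroids[j]))].append(p)
        for j, group in enumerate(groups):
            if group:
                centroids[j] = unit([sum(col) / len(group) for col in zip(*group)])
    return centroids

def reconstruct(window, dictionary):
    # Pick the best-matching shape and scale it by the dot product.
    shape = max(dictionary, key=lambda c: dot(window, c))
    scale = dot(window, shape)
    return [scale * s for s in shape]

WIDTH = 32
random.seed(3)
shapes = [
    [math.sin(2 * math.pi * i / WIDTH) for i in range(WIDTH)],
    [math.sin(6 * math.pi * i / WIDTH) for i in range(WIDTH)],
]
signal = []
for _ in range(200):
    shape = random.choice(shapes)
    amplitude = random.uniform(0.5, 2.0)
    signal.extend(amplitude * s + random.gauss(0.0, 0.05) for s in shape)

windows = [signal[i:i + WIDTH] for i in range(0, len(signal), WIDTH)]
dictionary = build_dictionary(windows, k=2)
rebuilt = [v for w in windows for v in reconstruct(w, dictionary)]
error_energy = sum((a - b) ** 2 for a, b in zip(signal, rebuilt))
relative_error = math.sqrt(error_energy / dot(signal, signal))
print(len(dictionary), relative_error)
```

What remains after reconstruction is mostly the added noise, which is the quantity an anomaly detector would then watch.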

Most Common Shapes (for EKG)

Reconstructed signal
[Figure: original signal, reconstructed signal, and reconstruction error (< 1 bit / sample)]

An Anomaly
The original technique for finding 1-d anomalies works on the reconstruction error

Close-up of anomaly
Not what you want your heart to do.
And not what the model expects it to do.

A Different Kind of Anomaly

Model Delta Anomaly Detection
[Diagram: the model's output is subtracted from the input to give δ; an online summarizer estimates the 99.9%-ile threshold t of δ; if δ > t, raise an alarm]

The Real Inside Scoop
•  The model-delta anomaly detector is really just a sum of random variables
–  the model we know about already
–  and a normally distributed error
•  The output (delta) is (roughly) the log probability of the sum distribution (really δ²)
•  Thinking about probability distributions is good
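The link between δ² and log probability can be written out under one assumption: the error is Gaussian with variance σ² (the symbol σ is introduced here for illustration and does not appear on the slides).

```latex
x = m(t) + e, \qquad e \sim N(0, \sigma^2), \qquad \delta = x - m(t)

\log p(x \mid m(t)) = -\frac{\delta^2}{2\sigma^2} - \frac{1}{2}\log(2\pi\sigma^2)
```

The second term does not depend on x, so the log probability falls as δ² grows. Putting a threshold on |δ| is therefore the same as putting a threshold on how improbable the observation is under the model.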

Some k-means Caveats
•  But Eamonn Keogh says that k-means can’t work on time series
•  That is silly … and kind of correct: k-means does have limits
–  Other kinds of auto-encoders are much more powerful
•  More fun and code demos at
–  https://github.com/tdunning/k-means-auto-encoder
Reference: Eamonn Keogh and Jessica Lin, “Clustering of Time Series Subsequences is Meaningless: Implications for Previous and Future Research”, Computer Science & Engineering Department, University of California, Riverside
http://www.cs.ucr.edu/~eamonn/meaningless.pdf
Abstract (excerpt): Given the recent explosion of interest in streaming data and online algorithms, clustering of time series subsequences, extracted via a sliding window, has received much attention. In this work we make a surprising claim. Clustering of time series subsequences is meaningless. More concretely, clusters extracted from these time series are forced to obey a certain constraint that is pathologically unlikely to be satisfied by any dataset, and because of this, the clusters extracted by any clustering algorithm are essentially random. While this constraint can be intuitively demonstrated with a simple illustration and is simple to prove, it has never appeared in the literature. We can justify calling our claim surprising, since it invalidates the contribution of dozens of previously published papers. We will justify our claim with a theorem, illustrative …

The Limits of Clustering as Auto-encoder
•  Clustering is like trying to tile your sample distribution
•  Can be used to approximate a signal
•  Filling a d-dimensional region with k clusters should give
$\epsilon \approx 1 / \sqrt[d]{k}$
•  If d is large, this is no good
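To see why large d is a problem, the scaling rule can be inverted. The sketch reads the slide's formula as error ≈ k^(−1/d), which is an interpretation of the garbled original, and asks how many clusters a target error would need; `clusters_needed` is a hypothetical helper.

```python
def clusters_needed(target_error, dimension):
    """Invert error ~ k ** (-1 / d) to get k ~ (1 / error) ** d."""
    return (1.0 / target_error) ** dimension

# Clusters needed for a 10% error at increasing dimension.
for d in (1, 2, 3, 10, 32):
    print(d, clusters_needed(0.1, d))
```

At d = 3 a thousand clusters suffice, but the count grows by a factor of ten for every added dimension.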

[Figure: Time series training data (first 2000 samples); x-axis: Time; traces: test data, reconstruction, error]

[Figure: Reconstruction error for time-series data; x-axis: Centroids; y-axis: MAV Error; series: training data, held-out data]

Another Example
•  Take points randomly in , project non-linearly into
•  Approximation using clustering should give

[Figure: Reconstruction error for random points; x-axis: Centroids; y-axis: Error; series: training data, held-out data]

[Figure: Error is approximately cube root of k; x-axis: k; y-axis: Error; series: actual, cube root model]

Moral For Auto-encoders
•  The simplest auto-encoders can be good models
•  For more complex spaces/signals, more elaborate models may be required
–  Winner take (absolutely) all may be problematic
–  In particular, models that allow sparse linear combination may be better
•  Consider deep learning, recurrent networks, denoising

How Does Clustering Do Reconstruction?
[Diagram: input layer x1, x2, ..., xn-1, xn]
For normalized cluster centroids, dot-product and distance are equivalent
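The equivalence follows from ‖x − c‖² = ‖x‖² − 2 x·c + ‖c‖²: when every centroid has ‖c‖ = 1, the nearest centroid is the one with the largest dot product. A quick numerical check on random data (the dimensions and counts are arbitrary choices for illustration):

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def unit(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

random.seed(2)
centroids = [unit([random.gauss(0, 1) for _ in range(8)]) for _ in range(5)]

agree = True
for _ in range(1000):
    x = [random.gauss(0, 1) for _ in range(8)]
    by_distance = min(range(5), key=lambda j: squared_distance(x, centroids[j]))
    by_dot = max(range(5), key=lambda j: dot(x, centroids[j]))
    agree = agree and by_distance == by_dot
print(agree)
```

This is why the winning cluster can be computed like a layer of neurons taking dot products with the input.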

[Diagram: input layer x1, x2, ..., xn-1, xn]
Winner takes all with k-means

[Diagram: input x1, x2, ..., xn-1, xn; hidden layer (clusters); reconstruction x'1, x'2, ..., x'n-1, x'n]
Dot-product scales centroid to reconstruct

AKA - Neural Network
[Diagram: input x1, x2, ..., xn-1, xn; hidden layer (clusters); reconstruction x'1, x'2, ..., x'n-1, x'n]

What If … We Had More Layers?
[Diagram: a multi-layer network with labels A, B, and A']

Other Thoughts
•  What if we allow more than one cluster to be active?
–  k-sparse learning!
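One simple way to let more than one dictionary entry be active is greedy matching pursuit: repeatedly pick the entry that best matches what is left of the signal and subtract its contribution. This illustrates the general idea only and is not necessarily the method the slide has in mind; the tiny dictionary and signal are made up.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sparse_encode(x, dictionary, k):
    """Greedy matching pursuit with at most k active (unit-norm) entries.

    Returns (list of (entry_index, weight), residual).
    """
    residual = list(x)
    active = []
    for _ in range(k):
        j = max(range(len(dictionary)), key=lambda i: abs(dot(residual, dictionary[i])))
        weight = dot(residual, dictionary[j])
        residual = [r - weight * c for r, c in zip(residual, dictionary[j])]
        active.append((j, weight))
    return active, residual

dictionary = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.5, 0.5, 0.5, 0.5],
]
x = [2.0, 0.5, 0.0, 0.0]

_, residual_one = sparse_encode(x, dictionary, k=1)
_, residual_two = sparse_encode(x, dictionary, k=2)
error_one = math.sqrt(dot(residual_one, residual_one))
error_two = math.sqrt(dot(residual_two, residual_two))
print(error_one, error_two)
```

With one active entry (winner takes all) part of the signal is lost; allowing a second entry recovers it.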

Other Thoughts
•  Well, almost

Summary
•  Start with philosophy
–  Anomaly detection is finding normal, then finding discrepancy
•  Model the world with probabilities
–  Realistic probabilistic models and statistical inference are optimal
•  Very simple techniques can extend easily to very fancy ones

e-book available courtesy of MapR
http://bit.ly/1jQ9QuL
A New Look at Anomaly Detection
by Ted Dunning and Ellen Friedman © June 2014 (published by O’Reilly)

Read online mapr.com/6ebooks-read
Download pdfs mapr.com/6ebooks-pdf
6 Free ebooks
Streaming
Architecture
Ted Dunning &
Ellen Friedman
and MapR Streams

Thank you for coming today!

Find my slides & other related materials to this talk here:
bit.ly/big-data-science-june-2016
or search:

…helping you put data technology to work
●  Find answers
●  Ask technical questions
●  Join on-demand training course
discussions
●  Follow release announcements
●  Share and vote on product ideas
●  Find Meetup and event listings
Connect with fellow Apache
Hadoop and Spark professionals
community.mapr.com
