Simple math for anomaly detection toufic boubez - metafor software - monitorama pdx 2014-05-05

Some Simple Math for
Anomaly Detection
#Monitorama PDX
2014.05.05
Toufic Boubez, Ph.D.
Co-Founder, CTO
Metafor Software
toufic@metaforsoftware.com
@tboubez

3
Preamble
• I lied!
– There are no “simple” tricks
– If it’s too good to be true, it probably is
• I usually beat up on parametric, Gaussian, supervised techniques
– This talk is to show some alternatives
– Only enough time to cover a couple of relatively simple but very useful
techniques
– Oh, and I will still beat up on the usual suspects
• Adrian and James are right! Listen to them! 
– What’s the point of collecting all that data if you can’t get useful information
out of it!?
• Note: real data
• Note: no y-axis labels on charts – on purpose!!
• Note to self: remember to SLOW DOWN!
• Note to self: mention the cats!! Everybody loves cats!!

4
Toufic intro – who I am
• Co-Founder/CTO Metafor Software
• Co-Founder/CTO Layer 7 Technologies
– Acquired by Computer Associates in 2013
– I escaped
• Co-Founder/CTO Saffron Technology
• IBM Chief Architect for SOA
• Co-Author, Co-Editor: WS-Trust, WS-SecureConversation, WS-Federation, WS-Policy
• Building large scale software systems for >20
years (I’m older than I look, I know!)

6
The WoC side-effects: alert fatigue
“Alert fatigue is the single
biggest problem we have
right now … We need to be
more intelligent about our
alerts or we’ll all go insane.”
- John Vincent (@lusis)
(#monitoringsucks)

7
Watching screens cannot scale + it’s useless

8
Gotta turn things over to the machines

9
TO THE RESCUE: Anomaly Detection!!
• Anomaly detection (also known as outlier
detection) is the search for items or events
which do not conform to an expected pattern.
[Chandola, V.; Banerjee, A.; Kumar, V. (2009). "Anomaly detection: A
survey". ACM Computing Surveys 41 (3): 1]
• For devops: Need to know when one or more
of our metrics is going wonky

10
Attempt #1: thresholds …
• Roots in manufacturing process QC

11
… are based on Gaussian distributions
• Make assumptions about probability
distributions and process behaviour
– Data is normally distributed with a useful and
usable mean and standard deviation
– Data is probabilistically “stationary”

12
Three-Sigma Rule
• Three-sigma rule
– ~68% of the values lie within 1 std deviation of the mean
– ~95% of the values lie within 2 std deviations
– 99.73% of the values lie within 3 std deviations: anything
else is an outlier
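
A minimal sketch of the three-sigma rule as a static threshold, in Python. The function name `three_sigma_outliers` and the sample values are made up for illustration; they are not from the talk. It assumes the data really is roughly Gaussian and stationary, which is exactly the assumption questioned later.

```python
import statistics

def three_sigma_outliers(values, k=3.0):
    """Return (lower, upper, outliers) for a mean +/- k*sigma threshold."""
    mean = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # population standard deviation
    lower, upper = mean - k * sigma, mean + k * sigma
    outliers = [v for v in values if v < lower or v > upper]
    return lower, upper, outliers

# Hypothetical sample: twenty well-behaved readings and one spike.
sample = [10.0, 11.0, 9.0, 10.0] * 5 + [50.0]
lower, upper, outliers = three_sigma_outliers(sample)
print(outliers)  # [50.0]
```

Note that the spike itself inflates both the mean and the standard deviation, so on short samples a static threshold like this can hide the very outlier it is meant to catch.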

13
Aaahhhh
• The mysterious red lines explained

14
Stationary Gaussian distributions are powerful
• Because far far in the future, in a galaxy far far
away:
– I can make the same predictions because the
statistical properties of the data haven’t changed
– I can easily compare different metrics since they
have similar statistical properties
• Let’s do this!!
• BUT…
• Cue in DRAMATIC MUSIC

19
WTF!? So what gives!?
• Remember this?

20
Histogram – probability distribution

21
Histogram – probability distribution

22
Attempts #2, #3, etc: mo’ better thresholds
• Static thresholds ineffective on dynamic data
– Thresholds use the mean as predictor and alert if
data falls more than 3 sigma away from the mean
• Need “moving” or “adaptive” thresholds:
– Value of mean changes with time to
accommodate new data values/trends

23
Moving Averages “big idea”
• At any point in time in a well-behaved time
series, your next value should not significantly
deviate from the general trend of your data
• Mean as a predictor is too static, relies on too
much past data (ALL of the data!)
• Instead of overall mean use a finite window of
past values, predict most likely next value
• Alert if actual value “significantly” (3 sigmas?)
deviates from predicted value

24
Moving Averages typical method
• Generate a “smoothed” version of the time series
– Average over a sliding (moving) window
• Compute the squared error between raw series
and its smoothed version
• Compute a new effective standard deviation by
smoothing the squared error
• Generate a moving threshold:
– Outliers are 3-sigma outside the new, smoothed data!
• Ta-da!
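
The steps above could be sketched roughly as follows. This is one possible reading, not the talk's own code: the helper names, the window size, the warm-up period, and the choice to compare each point against the band computed from the previous step are all assumptions made for illustration.

```python
import math

def moving_average(values, window):
    """Trailing mean over the last `window` values (shorter at the start)."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def moving_threshold_outliers(series, window=5, k=3.0):
    """Flag indices that fall more than k 'effective sigmas' from the smoothed series."""
    smoothed = moving_average(series, window)
    squared_error = [(x - s) ** 2 for x, s in zip(series, smoothed)]
    # Smoothing the squared error gives a moving variance; its root is the sigma.
    sigma = [math.sqrt(v) for v in moving_average(squared_error, window)]
    flagged = []
    # Skip the first `window` points so the band has enough history (assumption).
    for t in range(window, len(series)):
        if abs(series[t] - smoothed[t - 1]) > k * sigma[t - 1]:
            flagged.append(t)
    return flagged

# Hypothetical series with a single spike at index 10.
series = [10, 12, 11, 13, 12, 11, 12, 13, 11, 12, 30, 12, 11, 13, 12]
flagged = moving_threshold_outliers(series)
print(flagged)  # [10]
```

After the spike, the band widens for a full window because the spike is averaged into both the smoothed series and the error, which is the weakness of equal weighting noted on the next slide.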

25
Simple and Weighted Moving Averages
• Simple Moving Average
– Average of last N values in your time series
• S[t] <- sum(X[t-(N-1):t])/N
– Each value in the window contributes equally to
prediction
– …INCLUDING spikes and outliers
• Weighted Moving Average
– Similar to SMA but assigns linearly (arithmetically)
decreasing weights to every value in the window
– Older values contribute less to the prediction
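
A small Python sketch of both averages, to make the difference in weighting concrete. The function names and the sample data are illustrative assumptions; the weights 1..N (newest value weighted N) are one common choice for a linearly weighted moving average.

```python
def simple_moving_average(x, n):
    """S[t] = mean of the last n values ending at t, for t >= n-1."""
    return [sum(x[t - n + 1: t + 1]) / n for t in range(n - 1, len(x))]

def weighted_moving_average(x, n):
    """Linearly decreasing weights: newest value gets n, oldest gets 1."""
    weights = list(range(1, n + 1))
    total = sum(weights)
    return [
        sum(w * v for w, v in zip(weights, x[t - n + 1: t + 1])) / total
        for t in range(n - 1, len(x))
    ]

# Hypothetical data ending in a spike.
data = [1.0, 2.0, 3.0, 4.0, 10.0]
sma = simple_moving_average(data, 3)
wma = weighted_moving_average(data, 3)
print(sma)
print(wma)
```

On the last window (3, 4, 10) the weighted average reacts more strongly to the newest value than the simple average does, since the spike carries half of the total weight.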

26
Exponential Smoothing techniques
• Exponential Smoothing
– Similar to weighted average, but the weights decay
exponentially over the whole set of historic samples
• S[t]=αX[t-1] + (1-α)S[t-1]
– Does not deal with trends in data
• Double Exponential Smoothing (DES)
– In addition to data smoothing factor (α), introduces a trend
smoothing factor (β)
– Better at dealing with trending
– Does not deal with seasonality in data
• Triple Exponential Smoothing (TES), Holt-Winters
– Introduces additional seasonality factor
– … and so on
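
A rough sketch of single and double exponential smoothing in Python. The single version follows the recurrence on this slide, with S[0] = X[0] assumed as the starting value. The double version uses Holt's linear formulation with a simple initialization from the first two points; that formulation, the initialization, and the function names are assumptions for illustration, since the slide does not spell them out.

```python
def exp_smooth_forecast(x, alpha):
    """One-step-ahead forecasts: S[t] = alpha*X[t-1] + (1-alpha)*S[t-1], S[0] = X[0]."""
    s = [x[0]]
    for t in range(1, len(x)):
        s.append(alpha * x[t - 1] + (1 - alpha) * s[t - 1])
    return s

def double_exp_smooth_forecast(x, alpha, beta):
    """Holt's linear method. forecasts[i] is the prediction for x[i + 2]."""
    level, trend = x[1], x[1] - x[0]  # assumed initialization
    forecasts = [level + trend]
    for t in range(2, len(x)):
        prev_level = level
        level = alpha * x[t] + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        forecasts.append(level + trend)
    return forecasts

# Hypothetical, perfectly linear trend.
x = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
ses = exp_smooth_forecast(x, alpha=0.5)
des = double_exp_smooth_forecast(x, alpha=0.5, beta=0.5)
print(ses[-1])  # 6.125, lags well behind the actual value of 10.0
print(des[-2])  # 10.0, the trend term removes the lag
```

Neither version models seasonality; that is what the additional seasonal factor in Holt-Winters is for.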

30
Exponential smoothing predictions

31
Hmmmm, so are we doomed?
• No!
• ALL smoothing predictive methods work best
with normally distributed data!
• But there are lots of other non-Gaussian
based techniques
– We can only scratch the surface in this talk

36
Trick #2: Kolmogorov-Smirnov test
• Non-parametric test
– Compares two probability distributions
– Makes no assumptions (e.g. Gaussian) about the distributions of the samples
– Measures maximum distance between cumulative distributions
– Can be used to compare periodic/seasonal metric periods (e.g. day-to-day or week-to-week)
http://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test
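
A minimal two-sample sketch in plain Python, assuming each "period" is just a list of metric values. The function names and the sample windows are hypothetical. The critical value uses the standard large-sample approximation, so treat it as a rough guide on small windows; a library routine such as SciPy's `ks_2samp` would be the usual choice in practice.

```python
import math
from bisect import bisect_right

def ks_statistic(a, b):
    """Maximum vertical distance between the empirical CDFs of samples a and b."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    # The empirical CDFs only change at observed values, so checking those is enough.
    for v in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, v) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, v) / len(b_sorted)
        d = max(d, abs(cdf_a - cdf_b))
    return d

def ks_critical_value(n, m, alpha=0.05):
    """Large-sample approximation of the rejection threshold at level alpha."""
    c = math.sqrt(-0.5 * math.log(alpha / 2))
    return c * math.sqrt((n + m) / (n * m))

# Hypothetical windows of the same metric on different days.
yesterday = [float(v) for v in range(1, 11)]
today_similar = [v + 0.5 for v in yesterday]
today_shifted = [v + 10.0 for v in yesterday]

threshold = ks_critical_value(len(yesterday), len(today_similar))
d_similar = ks_statistic(yesterday, today_similar)
d_shifted = ks_statistic(yesterday, today_shifted)
print(d_similar > threshold)  # False: distributions look alike
print(d_shifted > threshold)  # True: distribution has changed
```

Nothing here depends on a mean or standard deviation, which is why the test remains usable on non-Gaussian data.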

44
Trick #3: Diffing/Derivatives
• Often, even when the data itself is not
stationary, its derivative tends to be!
• Most frequently, first difference is sufficient:
dS(t) <- S(t+1) - S(t)
• Can then perform some analytics on first
difference
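
The first difference is a one-liner; a sketch in Python, with a made-up series that drifts upward while its differences stay in a narrow range. The function name is an assumption for illustration.

```python
def first_difference(s):
    """dS(t) = S(t+1) - S(t); the result is one element shorter than s."""
    return [s[t + 1] - s[t] for t in range(len(s) - 1)]

# Hypothetical series: steady growth with a small wobble.
series = [100.0, 103.0, 105.0, 108.0, 110.0, 113.0, 115.0]
diffs = first_difference(series)
print(diffs)  # [3.0, 2.0, 3.0, 2.0, 3.0, 2.0]
```

The raw series has no stable mean, but the differenced series hovers around 2.5, so threshold-style or distribution-comparison analytics become reasonable to apply to it.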

46
Its first difference – possible random walk?

47
We’re not doomed, but: Know your data!!
• You need to understand the statistical properties
of your data, and where it comes from, in order
to determine what kind of analytics to use.
– Your data is very important!
– You spend time collecting it so spend time analyzing
it!
• A large amount of data center data is non-Gaussian
– Gaussian statistics won’t work
– Use appropriate techniques

48
More?
• Only scratched the surface
• I want to talk more about algorithms, analytics,
current issues, etc, in more depth, but time’s up!!
– Come talk to me or email me if interested.
• Thank you!
toufic@metaforsoftware.com
@tboubez

49
Oh yeah, and we’re hiring!
In Vancouver, BC
