Simple math to get signal out of your data noise - Anomaly Detection - Toufic Boubez - Metafor Software - Velocity Santa Clara 2014-06-25

1
Some Simple Math to Get Signal
out of your data noise
#VelocityConf
25.06.2014
Toufic Boubez, Ph.D.
Co-Founder, CTO
Metafor Software
toufic@metaforsoftware.com

2
Preamble
• I lied: There is no “simple” math for Anomaly Detection!
• I usually beat up on parametric, Gaussian, supervised techniques
– This talk is to show some alternatives
– Only enough time to cover a couple (four, really) of relatively simple
but very useful techniques
– Oh, and I will actually still start off by beating up on the usual suspects,
but don’t despair, there’s good stuff towards the end
• Note: all real data
• Note: no y-axis labels on charts – on purpose!!
• Note to self: remember to SLOW DOWN!
• Note to self: mention the cats!! Everybody loves cats!!

3
• Co-Founder/CTO Metafor Software
• Co-Founder/CTO Layer 7 Technologies
– Acquired by Computer Associates in 2013
– I escaped 
• Co-Founder/CTO Saffron Technology
• IBM Chief Architect for SOA/Web Services
• Co-Author, Co-Editor: WS-Trust, WS-SecureConversation,
WS-Federation, WS-Policy
• Building large scale software systems for >20
years (I’m older than I look, I know!)
Toufic intro – who I am

5
The WoC side-effects: alert fatigue
“Alert fatigue is the single biggest problem we
have right now … We need to be more
intelligent about our alerts or we’ll all go
insane.”
- John Vincent (@lusis)
(#monitoringsucks)
We have forensic tools for
analytics after the fact BUT
we need to KNOW that
something has happened!
We need alerts!

6
Watching screens cannot scale

7
Time to turn things over to the machines!

8
Attempt #1: static thresholds …
• Roots in manufacturing process QC

9
… are based on Gaussian distributions
• Make assumptions about probability
distributions and process behaviour
– Data is normally distributed with a useful and
usable mean and standard deviation
– Data is probabilistically “stationary”

10
Three-Sigma Rule
• Three-sigma rule
– ~68% of the values lie within 1 std deviation of the mean
– ~95% of the values lie within 2 std deviations
– 99.73% of the values lie within 3 std deviations: anything
else is an outlier
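
As a rough illustration of the three-sigma rule as a static threshold, here is a minimal Python sketch. The function name `three_sigma_outliers` and the sample series are made up for this example; the sketch assumes the data really is stationary and roughly Gaussian, which is exactly the assumption the talk goes on to question.

```python
import statistics

def three_sigma_outliers(values, k=3.0):
    """Return (index, value) pairs lying more than k standard deviations from the mean."""
    mean = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # population standard deviation of the whole series
    lower, upper = mean - k * sigma, mean + k * sigma
    return [(i, v) for i, v in enumerate(values) if v < lower or v > upper]

# Hypothetical, roughly stationary series with one injected spike at the end.
series = [10.0, 10.2, 9.8, 10.1, 9.9] * 6 + [25.0]
outliers = three_sigma_outliers(series)
```

Note that the spike itself is included when computing the mean and sigma, so it widens the very threshold meant to catch it.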

11
Aaahhhh
• The mysterious red lines explained
mean
3σ
3σ

12
Stationary Gaussian distributions are powerful
• Because far far in the future, in a galaxy far far
away:
– I can make the same predictions because the
statistical properties of the data haven’t changed
– I can easily compare different metrics since they
have similar statistical properties
• Let’s do this!!
• BUT…
• Cue in DRAMATIC MUSIC

17
WTF!? So what gives!?
• Remember this?

18
Histogram – probability distribution

19
Histogram – probability distribution

20
Attempts #2, #3, etc: mo’ better thresholds
• Static thresholds ineffective on dynamic data
– Thresholds use the (static) mean as predictor and
alert if data falls more than 3 sigma away
• Need “moving” or “adaptive” thresholds:
– Value of mean changes with time to
accommodate new data values, trends, periodicity

21
Moving Averages “big idea”
• At any point in time in a well-behaved time
series, your next value should not significantly
deviate from the general trend of your data
• Mean as a predictor is too static, relies on too
much past data (ALL of the data!)
• Instead of overall mean use a finite window of
past values, predict most likely next value
• Alert if actual value “significantly” (3 sigmas?)
deviates from predicted value

22
Moving Averages typical method
• Generate a “smoothed” version of the time series
– Average over a sliding (moving) window
• Compute the squared error between raw series
and its smoothed version
• Compute a new effective standard deviation
(sigma’) by smoothing the squared error
• Generate a moving threshold:
– Outliers are 3-sigma’ outside the new, smoothed data!
• Ta-da!
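
The steps above could be sketched in Python roughly as follows. This is one possible reading, not necessarily the method behind the charts in the talk: it assumes a trailing window (the prediction for each point uses only the previous `window` values) and takes sigma' as the square root of the averaged squared errors. The name `moving_threshold_outliers` and the test series are hypothetical.

```python
from collections import deque
import math

def moving_threshold_outliers(values, window=5, k=3.0):
    """Flag points more than k * sigma' away from a trailing moving average."""
    recent = deque(maxlen=window)     # last `window` raw values
    sq_errors = deque(maxlen=window)  # last `window` squared prediction errors
    flagged = []
    for i, x in enumerate(values):
        if len(recent) == window:
            predicted = sum(recent) / window          # smoothed value used as prediction
            error = x - predicted
            if len(sq_errors) == window:
                sigma = math.sqrt(sum(sq_errors) / window)  # smoothed squared error -> sigma'
                if abs(error) > k * sigma:
                    flagged.append((i, x))
            sq_errors.append(error * error)
        recent.append(x)
    return flagged

# Hypothetical series: gentle upward trend, small alternating noise, one spike.
values = [10 + 0.1 * i + (0.2 if i % 2 else -0.2) for i in range(30)]
values[20] = 30.0
flagged = moving_threshold_outliers(values)
```

After the spike, both windows are contaminated for a while, so the threshold stays wide until the spike ages out.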

23
Simple and Weighted Moving Averages
• Simple Moving Average
– Average of last N values in your time series
• S[t] <- sum(X[t-(N-1):t])/N
– Each value in the window contributes equally to
prediction
– …INCLUDING spikes and outliers
• Weighted Moving Average
– Similar to SMA but assigns linearly (arithmetically)
decreasing weights to every value in the window
– Older values contribute less to the prediction
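
A small Python sketch of the two averages, for comparison. The function names and data are illustrative only, and the linear weights 1..N (oldest to newest) are one common choice rather than something specified on the slide.

```python
def simple_moving_average(values, n):
    """Mean of the last n values; every value in the window counts equally."""
    window = values[-n:]
    return sum(window) / len(window)

def weighted_moving_average(values, n):
    """Linearly weighted mean of the last n values: oldest weight 1, newest weight n."""
    window = values[-n:]
    weights = range(1, len(window) + 1)
    return sum(w * x for w, x in zip(weights, window)) / sum(weights)

# Hypothetical data: the newest value is a spike.
series = [10.0, 10.0, 10.0, 10.0, 50.0]
sma = simple_moving_average(series, 5)
wma = weighted_moving_average(series, 5)
```

Because the spike is the newest value, the weighted average is pulled towards it more strongly than the simple average. Once the spike becomes the oldest value in the window, the opposite holds.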

24
Exponential Smoothing techniques
• Exponential Smoothing
– Similar to weighted average, but with weights that decay
exponentially over the whole set of historic samples
• S[t]=αX[t-1] + (1-α)S[t-1]
– Does not deal with trends in data
• Double Exponential Smoothing (DES)
– In addition to data smoothing factor (α), introduces a trend
smoothing factor (β)
– Better at dealing with trending
– Does not deal with seasonality in data
• Triple Exponential Smoothing (TES), Holt-Winters
– Introduces additional seasonality factor
– … and so on
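
To make the first two variants concrete, here is a hedged Python sketch. The single-smoothing function follows the formula on this slide; the double-smoothing function uses Holt's linear method, a standard formulation of DES. The seeding of the initial level and trend and the function names are assumptions made for this example.

```python
def exponential_smoothing_forecasts(values, alpha):
    """One-step-ahead forecasts: S[t] = alpha * X[t-1] + (1 - alpha) * S[t-1]."""
    forecasts = [values[0]]  # assumption: seed S[0] with the first observation
    for t in range(1, len(values)):
        forecasts.append(alpha * values[t - 1] + (1 - alpha) * forecasts[t - 1])
    return forecasts

def double_exponential_forecast(values, alpha, beta):
    """Holt's linear method: smooth a level and a trend, return the next-step forecast."""
    level, trend = values[0], values[1] - values[0]  # assumption: simple initialisation
    for x in values[1:]:
        previous_level = level
        level = alpha * x + (1 - alpha) * (level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
    return level + trend

# Hypothetical, perfectly linear trend.
trend_series = [1.0, 2.0, 3.0, 4.0, 5.0]
ses = exponential_smoothing_forecasts(trend_series, alpha=0.5)
des_next = double_exponential_forecast(trend_series, alpha=0.5, beta=0.5)
```

On this trending series the single-smoothing forecast lags behind the data, while the double-smoothing forecast follows the trend.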

28
Exponential smoothing predictions

29
Hmmmm, so are we doomed?
• No!
• ALL smoothing predictive methods work best
with normally distributed data!
• But there are lots of other non-Gaussian
based techniques
– We can only scratch the surface in this talk

34
Trick #2: Kolmogorov-Smirnov test
• Non-parametric test
– Compare two probability
distributions
– Makes no assumptions (e.g.
Gaussian) about the
distributions of the samples
– Measures maximum
distance between
cumulative distributions
– Can be used to compare
periodic/seasonal metric
periods (e.g. day-to-day or
week-to-week)
http://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test
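
A minimal sketch of the two-sample statistic in plain Python, assuming two windows of the same metric (say, yesterday and today) are being compared. The function names are made up for this example, and the rejection threshold is the usual large-sample approximation, so it should be treated as a rough guide for small windows. SciPy's `scipy.stats.ks_2samp` provides a library implementation.

```python
from bisect import bisect_right
import math

def ks_statistic(sample_a, sample_b):
    """Two-sample K-S statistic: maximum distance between the empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in a + b:  # the maximum gap can only occur at an observed value
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d

def distributions_differ(sample_a, sample_b, alpha=0.05):
    """Large-sample approximation: reject 'same distribution' if D exceeds the critical value."""
    n, m = len(sample_a), len(sample_b)
    critical = math.sqrt(-math.log(alpha / 2) / 2) * math.sqrt((n + m) / (n * m))
    return ks_statistic(sample_a, sample_b) > critical

# Hypothetical periods of the same metric.
yesterday = list(range(50))
today_shifted = [x + 100 for x in yesterday]
```

No mean or standard deviation is used anywhere; only the ordering of the samples matters.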

42
Trick #3: Box Plots / Tukey
• Again, need non-parametric method:
– Does not rely on mean and standard deviation
• When you can’t count on good old Gaussian:
– Median is always a great alternative to the mean
– Quartiles are an alternative to standard deviation
• Q1 = first quartile (25% of the data lies at or below it)
• Q2 = second quartile == median (50% of the data)
• Q3 = third quartile (75% of the data)
• Interquartile Range (IQR) = Q3 – Q1
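
For illustration, the quartiles and IQR can be computed with Python's standard `statistics` module. The `"inclusive"` interpolation method is just one of several quartile conventions, and the data set is invented for this example.

```python
import statistics

def quartiles_and_iqr(values):
    """Return (Q1, Q2, Q3, IQR) using the 'inclusive' quantile convention."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q1, q2, q3, q3 - q1

# Hypothetical data with one extreme value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 40]
q1, median, q3, iqr = quartiles_and_iqr(data)
mean = statistics.fmean(data)
```

Here the single extreme value drags the mean up to 8.5, while the median stays at 5.5.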

43
Example: box plots and fences for a Gaussian
http://en.wikipedia.org/wiki/Interquartile_range

44
IQR method for streaming time series
• IQR method works well for some non-normal
distributions
– Generates continuously adaptive fences at
(Q1 - 1.5 x IQR) and (Q3 + 1.5 x IQR)
– Adjusted box plot (for skewed data) scales the 1.5 x IQR
fence distances by a skewness-dependent factor
• Method:
– As time series is streaming, for every window:
• Re-compute quartiles
• Re-compute IQR, fences
• Determine if any outliers
• Repeat
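
The streaming loop above might look something like this in Python. It is a sketch under assumptions: each new point is tested against fences computed from the preceding `window` points, the window size of 20 is arbitrary, and `streaming_iqr_outliers` is a name invented for this example.

```python
from collections import deque
import statistics

def streaming_iqr_outliers(stream, window=20, k=1.5):
    """Yield (index, value) for points outside the fences of the preceding window."""
    recent = deque(maxlen=window)
    for i, x in enumerate(stream):
        if len(recent) == window:
            # Re-compute quartiles, IQR and fences for the current window.
            q1, _, q3 = statistics.quantiles(recent, n=4, method="inclusive")
            iqr = q3 - q1
            if x < q1 - k * iqr or x > q3 + k * iqr:
                yield i, x
        recent.append(x)

# Hypothetical periodic metric with one injected spike.
stream = [5, 6, 7, 8, 9, 8, 7, 6] * 5
stream[30] = 50
flagged = list(streaming_iqr_outliers(stream, window=20))
```

Unlike the moving-average threshold, the fences barely move after the spike enters the window, because quartiles are insensitive to a single extreme value.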

49
Trick #4: Diffing/Derivatives
• Often, even when the data itself is not
stationary, its derivatives tend to be!
• Most frequently, first difference is sufficient:
dS(t) <- S(t+1) - S(t)
• Can then perform some analytics on first
difference
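
A first difference is a one-liner in Python. The series below is a made-up trending metric with a single jump.

```python
def first_difference(values):
    """dS(t) = S(t+1) - S(t); the result is one element shorter than the input."""
    return [later - earlier for earlier, later in zip(values, values[1:])]

# Hypothetical upward-trending metric with one jump.
trending = [100, 103, 106, 109, 112, 130, 133, 136]
diffs = first_difference(trending)
# diffs == [3, 3, 3, 3, 18, 3, 3]
```

The raw series never settles around a fixed level, but its differences do, so the jump stands out and any of the earlier techniques can be applied to `diffs`.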

51
Its first difference – possible random walk?

52
Trick #5: Neural Networks
• Really?
• No time – To Be Continued!

53
We’re not doomed, but: Know your data!!
• You need to understand the statistical properties
of your data, and where it comes from, in order
to determine what kind of analytics to use.
– Your data is very important!
– You spend time collecting it so spend time analyzing
it!
• A large amount of data center data is non-
Gaussian
– Gaussian statistics won’t work
– Use appropriate techniques

54
More?
• Only scratched the surface
• I want to talk more about algorithms, analytics,
current issues, etc., in more depth, but time’s up!!
– Come talk to me or email me if interested.
• Office Hour: Tomorrow at 11:30 Booth #801
• Thank you!
toufic@metaforsoftware.com
@tboubez

55
Oh yeah, and we’re hiring!
In Vancouver, Canada
