Some Simple Math for
Anomaly Detection
#Monitorama PDX
2014.05.05
Toufic Boubez, Ph.D.
Co-Founder, CTO
Metafor Software
This is my presentation at Monitorama PDX in Portland on May 5, 2014

Simple math to get some signal out of your noisy sea of data

You’ve instrumented your system and application to the hilt. You can now “measure all the things”. Your team has set up thousands of metrics collecting millions of data points a day. Now what?

Most IT ops teams only keep an eye on a small fraction of the metrics they collect because analyzing this mountain of data and extracting signal from the noise is not easy. The choice of what analytic method to use ranges from simple statistical analysis to sophisticated machine learning techniques. And one algorithm doesn’t fit all data.

Simple math for anomaly detection toufic boubez - metafor software - monitorama pdx 2014-05-05

  1. [No text on slide]
  2. Some Simple Math for Anomaly Detection
     #Monitorama PDX, 2014.05.05
     Toufic Boubez, Ph.D., Co-Founder, CTO, Metafor Software
     toufic@metaforsoftware.com, @tboubez
  3. Preamble
     • I lied!
       – There are no “simple” tricks
       – If it’s too good to be true, it probably is
     • I usually beat up on parametric, Gaussian, supervised techniques
       – This talk is to show some alternatives
       – Only enough time to cover a couple of relatively simple but very useful techniques
       – Oh, and I will still beat up on the usual suspects
     • Adrian and James are right! Listen to them!
       – What’s the point of collecting all that data if you can’t get useful information out of it!?
     • Note: real data
     • Note: no y-axis labels on charts – on purpose!!
     • Note to self: remember to SLOW DOWN!
     • Note to self: mention the cats!! Everybody loves cats!!
  4. Toufic intro – who I am
     • Co-Founder/CTO Metafor Software
     • Co-Founder/CTO Layer 7 Technologies
       – Acquired by Computer Associates in 2013
       – I escaped
     • Co-Founder/CTO Saffron Technology
     • IBM Chief Architect for SOA
     • Co-Author, Co-Editor: WS-Trust, WS-SecureConversation, WS-Federation, WS-Policy
     • Building large scale software systems for >20 years (I’m older than I look, I know!)
  5. Wall of Charts™
  6. The WoC side-effects: alert fatigue
     “Alert fatigue is the single biggest problem we have right now … We need to be more intelligent about our alerts or we’ll all go insane.” - John Vincent (@lusis) (#monitoringsucks)
  7. Watching screens cannot scale + it’s useless
  8. Gotta turn things over to the machines
  9. TO THE RESCUE: Anomaly Detection!!
     • Anomaly detection (also known as outlier detection) is the search for items or events which do not conform to an expected pattern. [Chandola, V.; Banerjee, A.; Kumar, V. (2009). "Anomaly detection: A survey". ACM Computing Surveys 41 (3): 1]
     • For devops: Need to know when one or more of our metrics is going wonky
  10. Attempt #1: thresholds …
      • Roots in manufacturing process QC
  11. … are based on Gaussian distributions
      • Make assumptions about probability distributions and process behaviour
        – Data is normally distributed with a useful and usable mean and standard deviation
        – Data is probabilistically “stationary”
  12. Three-Sigma Rule
      • Three-sigma rule
        – ~68% of the values lie within 1 std deviation of the mean
        – ~95% of the values lie within 2 std deviations
        – 99.73% of the values lie within 3 std deviations: anything else is an outlier
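
The three-sigma rule on slide 12 can be sketched in a few lines of Python. This is only an illustration under the slide's own assumption (roughly Gaussian, stationary data); the function name `three_sigma_outliers`, the parameter `k`, and the sample readings are hypothetical and not taken from the talk.

```python
import statistics

def three_sigma_outliers(values, k=3.0):
    """Return (index, value) pairs lying more than k standard deviations from the mean."""
    mean = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # population standard deviation of the whole series
    lower, upper = mean - k * sigma, mean + k * sigma
    return [(i, v) for i, v in enumerate(values) if v < lower or v > upper]

# Hypothetical metric: 20 steady readings followed by one spike.
readings = [10.0, 11.0, 9.0, 10.0] * 5 + [30.0]
print(three_sigma_outliers(readings))
```

Note that the spike itself is part of the data used to compute the mean and standard deviation, so a large outlier widens the very band that is supposed to catch it.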
  13. Aaahhhh
      • The mysterious red lines explained
  14. Stationary Gaussian distributions are powerful
      • Because far far in the future, in a galaxy far far away:
        – I can make the same predictions because the statistical properties of the data haven’t changed
        – I can easily compare different metrics since they have similar statistical properties
      • Let’s do this!!
      • BUT…
      • Cue in DRAMATIC MUSIC
  15. Then THIS happens
  16. 3-sigma rule alerts
  17. Or worse, THIS happens!
  18. 3-sigma rule alerts
  19. WTF!? So what gives!?
      • Remember this?
  20. Histogram – probability distribution
  21. Histogram – probability distribution
  22. Attempts #2, #3, etc: mo’ better thresholds
      • Static thresholds ineffective on dynamic data
        – Thresholds use the mean as predictor and alert if data falls more than 3 sigma outside the mean
      • Need “moving” or “adaptive” thresholds:
        – Value of mean changes with time to accommodate new data values/trends
  23. Moving Averages “big idea”
      • At any point in time in a well-behaved time series, your next value should not significantly deviate from the general trend of your data
      • Mean as a predictor is too static, relies on too much past data (ALL of the data!)
      • Instead of overall mean use a finite window of past values, predict most likely next value
      • Alert if actual value “significantly” (3 sigmas?) deviates from predicted value
  24. Moving Averages typical method
      • Generate a “smoothed” version of the time series
        – Average over a sliding (moving) window
      • Compute the squared error between raw series and its smoothed version
      • Compute a new effective standard deviation by smoothing the squared error
      • Generate a moving threshold:
        – Outliers are 3-sigma outside the new, smoothed data!
      • Ta-da!
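
The steps on slide 24 can be sketched roughly as follows. This is one possible reading, not the speaker's implementation: it assumes a trailing simple moving average as the "smoothed" series, uses the same window length to smooth the squared error, and the names `moving_threshold_alerts`, `window`, and `k` are hypothetical.

```python
from collections import deque
import math

def moving_threshold_alerts(series, window=4, k=3.0):
    """Return indices whose value deviates more than k effective sigmas from a trailing average."""
    recent = deque(maxlen=window)          # last `window` raw values
    recent_sq_err = deque(maxlen=window)   # last `window` squared prediction errors
    alerts = []
    for t, x in enumerate(series):
        if len(recent) == window:
            predicted = sum(recent) / window           # smoothed value used as prediction
            err = x - predicted
            if len(recent_sq_err) == window:
                # Effective standard deviation: square root of the smoothed squared error.
                sigma = math.sqrt(sum(recent_sq_err) / window)
                if abs(err) > k * sigma:
                    alerts.append(t)
            recent_sq_err.append(err * err)
        recent.append(x)
    return alerts

# Hypothetical series: a repeating pattern, one spike at index 16, then back to normal.
series = [10.0, 12.0, 9.0, 11.0] * 4 + [40.0, 10.0, 12.0]
print(moving_threshold_alerts(series))
```

In this sketch the spike stays in both windows for the next few points, which inflates the moving threshold right after an anomaly; that is the same weakness the next slide points out for simple moving averages.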
  25. Simple and Weighted Moving Averages
      • Simple Moving Average
        – Average of last N values in your time series
      • S[t] <- sum(X[t-(N-1):t])/N
        – Each value in the window contributes equally to prediction
        – …INCLUDING spikes and outliers
      • Weighted Moving Average
        – Similar to SMA but assigns linearly (arithmetically) decreasing weights to every value in the window
        – Older values contribute less to the prediction
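
A minimal Python sketch of the two averages on slide 25. The weighting scheme for the weighted version (newest value gets weight N, oldest gets weight 1) is an assumption about what "linearly decreasing" means here; the function names and sample data are hypothetical.

```python
def simple_moving_average(x, n):
    """S[t] = sum(X[t-(n-1) .. t]) / n, defined for t >= n-1."""
    return [sum(x[t - n + 1 : t + 1]) / n for t in range(n - 1, len(x))]

def weighted_moving_average(x, n):
    """Like the SMA, but the newest value in the window gets weight n and the oldest weight 1."""
    weights = range(1, n + 1)      # oldest -> newest
    total = n * (n + 1) / 2        # sum of the weights 1..n
    return [
        sum(w * v for w, v in zip(weights, x[t - n + 1 : t + 1])) / total
        for t in range(n - 1, len(x))
    ]

data = [1.0, 2.0, 3.0, 10.0]       # hypothetical series ending in a spike
sma = simple_moving_average(data, 3)
wma = weighted_moving_average(data, 3)
print(sma, wma)
```

Because the spike is the newest value in the last window, the weighted average follows it more closely than the simple average does.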
  26. Exponential Smoothing techniques
      • Exponential Smoothing
        – Similar to weighted average, but with weights that decay exponentially over the whole set of historic samples
      • S[t]=αX[t-1] + (1-α)S[t-1]
        – Does not deal with trends in data
      • DES
        – In addition to data smoothing factor (α), introduces a trend smoothing factor (β)
        – Better at dealing with trending
        – Does not deal with seasonality in data
      • TES, Holt-Winters
        – Introduces additional seasonality factor
        – … and so on
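
As a rough illustration of slide 26, the sketch below implements the single exponential smoothing recurrence exactly as written on the slide, plus a common textbook form of double exponential smoothing (level plus trend). The initial values (S[0] = X[0], initial trend = X[1] - X[0]) are assumptions the slide does not specify, and the function names are hypothetical. Seasonal (Holt-Winters) smoothing is not covered here.

```python
def exp_smooth_predictions(x, alpha):
    """S[t] = alpha*X[t-1] + (1-alpha)*S[t-1], seeded with S[0] = X[0] (an assumption)."""
    s = [x[0]]
    for t in range(1, len(x)):
        s.append(alpha * x[t - 1] + (1 - alpha) * s[t - 1])
    return s

def holt_forecast(x, alpha, beta):
    """One-step-ahead forecast from double exponential smoothing (level and trend)."""
    level, trend = x[0], x[1] - x[0]   # assumed initialisation
    for value in x[1:]:
        prev_level = level
        level = alpha * value + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level + trend

print(exp_smooth_predictions([4.0, 8.0, 8.0], 0.5))
print(holt_forecast([0.0, 2.0, 4.0, 6.0], 0.5, 0.5))
```

On the steadily rising series the trend term lets the double version predict the next step of the ramp, which the single version would lag behind.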
  27. Let’s look at an example
  28. Holt-Winters predictions
  29. A harder example
  30. Exponential smoothing predictions
  31. Hmmmm, so are we doomed?
      • No!
      • ALL smoothing predictive methods work best with normally distributed data!
      • But there are lots of other non-Gaussian based techniques
        – We can only scratch the surface in this talk
  32. Trick #1: Histogram!
  33. THIS is normal
  34. This isn’t
  35. Neither is this
  36. Trick #2: Kolmogorov-Smirnov test
      • Non-parametric test
        – Compare two probability distributions
        – Makes no assumptions (e.g. Gaussian) about the distributions of the samples
        – Measures maximum distance between cumulative distributions
        – Can be used to compare periodic/seasonal metric periods (e.g. day-to-day or week-to-week)
      http://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test
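
The two-sample KS statistic from slide 36 is simple enough to sketch in plain Python. The helper names `ks_statistic` and `windows_differ` are hypothetical, and the decision rule uses the usual large-sample approximation for the critical value (c ≈ 1.358 at the 5% level), which is an assumption about how one might turn the statistic into an alert, not something stated in the talk.

```python
from bisect import bisect_right
import math

def ks_statistic(sample_a, sample_b):
    """Maximum distance between the empirical CDFs of two samples."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for v in a + b:                       # the maximum occurs at one of the observed values
        cdf_a = bisect_right(a, v) / len(a)
        cdf_b = bisect_right(b, v) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d

def windows_differ(window_a, window_b, c_alpha=1.358):
    """Large-sample check: reject 'same distribution' if D exceeds c_alpha * sqrt((n+m)/(n*m))."""
    n, m = len(window_a), len(window_b)
    return ks_statistic(window_a, window_b) > c_alpha * math.sqrt((n + m) / (n * m))

# Hypothetical windows, e.g. the same hour on two different days.
yesterday = [float(v) for v in range(50)]
today_same = [float(v) for v in range(50)]
today_shifted = [float(v) + 100.0 for v in range(50)]
print(windows_differ(yesterday, today_same), windows_differ(yesterday, today_shifted))
```

For real use, a statistics library routine such as SciPy's `ks_2samp` also reports a p-value; the hand-rolled version above is only meant to show what "maximum distance between cumulative distributions" means.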
  37. KS with windowing
  38. [No text on slide]
  39. [No text on slide]
  40. [No text on slide]
  41. [No text on slide]
  42. [No text on slide]
  43. KS Test on difficult data
  44. Trick #3: Diffing/Derivatives
      • Often, even when the data itself is not stationary, its derivatives tend to be!
      • Most frequently, first difference is sufficient: dS(t) <- S(t+1) – S(t)
      • Can then perform some analytics on first difference
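
The first difference from slide 44 is a one-liner in Python. The function name and the sample series are hypothetical; the point is only that a series with a steady trend (non-stationary mean) can have a first difference that hovers around a constant.

```python
def first_difference(s):
    """dS(t) = S(t+1) - S(t); the result is one element shorter than s."""
    return [nxt - cur for cur, nxt in zip(s, s[1:])]

# Hypothetical metric with a steady upward trend.
trending = [100 + 5 * t for t in range(6)]   # 100, 105, 110, ...
print(first_difference(trending))
```

The differenced series can then be fed to whichever check fits it, for example a threshold or a KS comparison between windows.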
  45. CPU time series
  46. Its first difference – possible random walk?
  47. We’re not doomed, but: Know your data!!
      • You need to understand the statistical properties of your data, and where it comes from, in order to determine what kind of analytics to use.
        – Your data is very important!
        – You spend time collecting it so spend time analyzing it!
      • A large amount of data center data is non-Gaussian
        – Gaussian statistics won’t work
        – Use appropriate techniques
  48. More?
      • Only scratched the surface
      • I want to talk more about algorithms, analytics, current issues, etc, in more depth, but time’s up!!
        – Come talk to me or email me if interested.
      • Thank you!
      toufic@metaforsoftware.com, @tboubez
  49. Oh yeah, and we’re hiring! In Vancouver, BC