SlideShare a Scribd company logo
1 of 49
1
Some Simple Math for
Anomaly Detection
#Monitorama PDX
2014.05.05
Toufic Boubez, Ph.D.
Co-Founder, CTO
Metafor Software
toufic@metaforsoftware.com
@tboubez
3
Preamble
• I lied!
– There are no “simple” tricks
– If it’s too good to be true, it probably is
• I usually beat up on parametric, Gaussian, supervised techniques
– This talk is to show some alternatives
– Only enough time to cover a couple of relatively simple but very useful
techniques
– Oh, and I will still beat up on the usual suspects
• Adrian and James are right! Listen to them! 
– What’s the point of collecting all that data if you can’t get useful information
out of it!?
• Note: real data
• Note: no y-axis labels on charts – on purpose!!
• Note to self: remember to SLOW DOWN!
• Note to self: mention the cats!! Everybody loves cats!!
4
• Co-Founder/CTO Metafor Software
• Co-Founder/CTO Layer 7 Technologies
– Acquired by Computer Associates in 2013
– I escaped 
• Co-Founder/CTO Saffron Technology
• IBM Chief Architect for SOA
• Co-Author, Co-Editor: WS-Trust, WS-
SecureConversation, WS-Federation, WS-Policy
• Building large scale software systems for >20
years (I’m older than I look, I know!)
Toufic intro – who I am
5
Wall of Charts™
6
The WoC side-effects: alert fatigue
“Alert fatigue is the single
biggest problem we have
right now … We need to be
more intelligent about our
alerts or we’ll all go insane.”
- John Vincent (@lusis)
(#monitoringsucks)
7
Watching screens cannot scale + it’s useless
8
Gotta turn things over to the machines
9
TO THE RESCUE: Anomaly Detection!!
• Anomaly detection (also known as outlier
detection) is the search for items or events
which do not conform to an expected pattern.
[Chandola, V.; Banerjee, A.; Kumar, V. (2009). "Anomaly detection: A
survey". ACM Computing Surveys 41 (3): 1]
• For devops: Need to know when one or more
of our metrics is going wonky
10
Attempt #1: thresholds …
• Roots in manufacturing process QC
11
… are based on Gaussian distributions
• Make assumptions about probability
distributions and process behaviour
– Data is normally distributed with a useful and
usable mean and standard deviation
– Data is probabilistically “stationary”
12
Three-Sigma Rule
• Three-sigma rule
– ~68% of the values lie within 1 std deviation of the mean
– ~95% of the values lie within 2 std deviations
– 99.73% of the values lie within 3 std deviations: anything
else is an outlier
13
Aaahhhh
• The mysterious red lines explained
14
Stationary Gaussian distributions are powerful
• Because far far in the future, in a galaxy far far
away:
– I can make the same predictions because the
statistical properties of the data haven’t changed
– I can easily compare different metrics since they
have similar statistical properties
• Let’s do this!!
• BUT…
• Cue in DRAMATIC MUSIC
15
Then THIS happens
16
3-sigma rule alerts
17
Or worse, THIS happens!
18
3-sigma rule alerts
19
WTF!? So what gives!?
• Remember this?
20
Histogram – probability distribution
21
Histogram – probability distribution
22
Attempts #2, #3, etc: mo’ better thresholds
• Static thresholds ineffective on dynamic data
– Thresholds use the mean as predictor and alert if
data falls more than 3 sigma outside the mean
• Need “moving” or “adaptive” thresholds:
– Value of mean changes with time to
accommodate new data values/trends
23
Moving Averages “big idea”
• At any point in time in a well-behaved time
series, your next value should not significantly
deviate from the general trend of your data
• Mean as a predictor is too static, relies on too
much past data (ALL of the data!)
• Instead of overall mean use a finite window of
past values, predict most likely next value
• Alert if actual value “significantly” (3 sigmas?)
deviates from predicted value
24
Moving Averages typical method
• Generate a “smoothed” version of the time series
– Average over a sliding (moving) window
• Compute the squared error between raw series
and its smoothed version
• Compute a new effective standard deviation by
smoothing the squared error
• Generate a moving threshold:
– Outliers are 3-sigma outside the new, smoothed data!
• Ta-da!
25
Simple and Weighted Moving Averages
• Simple Moving Average
– Average of last N values in your time series
• S[t] <- sum(X[t-(N-1):t])/N
– Each value in the window contributes equally to
prediction
– …INCLUDING spikes and outliers
• Weigthed Moving Average
– Similar to SMA but assigns linearly (arithmetically)
decreasing weights to every value in the window
– Older values contribute less to the prediction
26
Exponential Smoothing techniques
• Exponential Smoothing
– Similar to weighted average, but with weights decay
exponentially over the whole set of historic samples
• S[t]=αX[t-1] + (1-α)S[t-1]
– Does not deal with trends in data
• DES
– In addition to data smoothing factor (α), introduces a trend
smoothing factor (β)
– Better at dealing with trending
– Does not deal with seasonality in data
• TES, Holt-Winters
– Introduces additional seasonality factor
– … and so on
27
Let’s look at an example
28
Holt-Winters predictions
29
A harder example
30
Exponential smoothing predictions
31
Hmmmm, so are we doomed?
• No!
• ALL smoothing predictive methods work best
with normally distributed data!
• But there are lots of other non-Gaussian
based techniques
– We can only scratch the surface in this talk
32
Trick #1: Histogram!
33
THIS is normal
34
This isn’t
35
Neither is this
36
Trick #2: Kolmogorov-Smirnov test
• Non-parametric test
– Compare two probability
distributions
– Makes no assumptions (e.g.
Gaussian) about the
distributions of the samples
– Measures maximum
distance between
cumulative distributions
– Can be used to compare
periodic/seasonal metric
periods (e.g. day-to-day or
week-to-week)
http://en.wikipedia.org/wiki/Kolmogorov%E2%
80%93Smirnov_test
37
KS with windowing
38
39
40
41
42
43
KS Test on difficult data
44
Trick #3: Diffing/Derivatives
• Often, even when the data itself is not
stationary, its derivatives tends to be!
• Most frequently, first difference is sufficient:
dS(t) <- S(t+1) – S(t)
• Can then perform some analytics on first
difference
45
CPU time series
46
Its first difference – possible random walk?
47
We’re not doomed, but: Know your data!!
• You need to understand the statistical properties
of your data, and where it comes from, in order
to determine what kind of analytics to use.
– Your data is very important!
– You spend time collecting it so spend time analyzing
it!
• A large amount of data center data is non-
Gaussian
– Guassian statistics won’t work
– Use appropriate techniques
48
More?
• Only scratched the surface
• I want to talk more about algorithms, analytics,
current issues, etc, in more depth, but time’s up!!
– Come talk to me or email me if interested.
• Thank you!
toufic@metaforsoftware.com
@tboubez
49
Oh yeah, and we’re hiring!
In Vancouver, BC

More Related Content

What's hot

Scaling to Millions of Simultaneous Connections by Rick Reed from WhatsApp
Scaling to Millions of Simultaneous Connections by Rick Reed from WhatsAppScaling to Millions of Simultaneous Connections by Rick Reed from WhatsApp
Scaling to Millions of Simultaneous Connections by Rick Reed from WhatsApp
mustafa sarac
 
Debugging PySpark - PyCon US 2018
Debugging PySpark -  PyCon US 2018Debugging PySpark -  PyCon US 2018
Debugging PySpark - PyCon US 2018
Holden Karau
 
Performance Optimizations in Apache Impala
Performance Optimizations in Apache ImpalaPerformance Optimizations in Apache Impala
Performance Optimizations in Apache Impala
Cloudera, Inc.
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream Processing
Guido Schmutz
 

What's hot (20)

Hoodie: Incremental processing on hadoop
Hoodie: Incremental processing on hadoopHoodie: Incremental processing on hadoop
Hoodie: Incremental processing on hadoop
 
Scaling to Millions of Simultaneous Connections by Rick Reed from WhatsApp
Scaling to Millions of Simultaneous Connections by Rick Reed from WhatsAppScaling to Millions of Simultaneous Connections by Rick Reed from WhatsApp
Scaling to Millions of Simultaneous Connections by Rick Reed from WhatsApp
 
Kanban VS Scrum
Kanban VS ScrumKanban VS Scrum
Kanban VS Scrum
 
Handle Large Messages In Apache Kafka
Handle Large Messages In Apache KafkaHandle Large Messages In Apache Kafka
Handle Large Messages In Apache Kafka
 
Debugging PySpark - PyCon US 2018
Debugging PySpark -  PyCon US 2018Debugging PySpark -  PyCon US 2018
Debugging PySpark - PyCon US 2018
 
Memory Management in Apache Spark
Memory Management in Apache SparkMemory Management in Apache Spark
Memory Management in Apache Spark
 
Apache Hive Hook
Apache Hive HookApache Hive Hook
Apache Hive Hook
 
Stephan Ewen - Experiences running Flink at Very Large Scale
Stephan Ewen -  Experiences running Flink at Very Large ScaleStephan Ewen -  Experiences running Flink at Very Large Scale
Stephan Ewen - Experiences running Flink at Very Large Scale
 
Performance Optimizations in Apache Impala
Performance Optimizations in Apache ImpalaPerformance Optimizations in Apache Impala
Performance Optimizations in Apache Impala
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream Processing
 
Schema-on-Read vs Schema-on-Write
Schema-on-Read vs Schema-on-WriteSchema-on-Read vs Schema-on-Write
Schema-on-Read vs Schema-on-Write
 
Flink vs. Spark
Flink vs. SparkFlink vs. Spark
Flink vs. Spark
 
Intro to Kanban - AgileDayChile2011 Keynote
Intro to Kanban - AgileDayChile2011 KeynoteIntro to Kanban - AgileDayChile2011 Keynote
Intro to Kanban - AgileDayChile2011 Keynote
 
Team Topologies - how and why to design your teams - AllDayDevOps 2017
Team Topologies - how and why to design your teams - AllDayDevOps 2017Team Topologies - how and why to design your teams - AllDayDevOps 2017
Team Topologies - how and why to design your teams - AllDayDevOps 2017
 
How to Automate Performance Tuning for Apache Spark
How to Automate Performance Tuning for Apache SparkHow to Automate Performance Tuning for Apache Spark
How to Automate Performance Tuning for Apache Spark
 
Step-by-Step Introduction to Apache Flink
Step-by-Step Introduction to Apache Flink Step-by-Step Introduction to Apache Flink
Step-by-Step Introduction to Apache Flink
 
AWSKRUG DS - 데이터 엔지니어가 실무에서 맞닥뜨리는 문제들
AWSKRUG DS - 데이터 엔지니어가 실무에서 맞닥뜨리는 문제들AWSKRUG DS - 데이터 엔지니어가 실무에서 맞닥뜨리는 문제들
AWSKRUG DS - 데이터 엔지니어가 실무에서 맞닥뜨리는 문제들
 
Building Data Product Based on Apache Spark at Airbnb with Jingwei Lu and Liy...
Building Data Product Based on Apache Spark at Airbnb with Jingwei Lu and Liy...Building Data Product Based on Apache Spark at Airbnb with Jingwei Lu and Liy...
Building Data Product Based on Apache Spark at Airbnb with Jingwei Lu and Liy...
 
Advanced Change Data Streaming Patterns in Distributed Systems | Gunnar Morli...
Advanced Change Data Streaming Patterns in Distributed Systems | Gunnar Morli...Advanced Change Data Streaming Patterns in Distributed Systems | Gunnar Morli...
Advanced Change Data Streaming Patterns in Distributed Systems | Gunnar Morli...
 
Heart of agile by Pierre Hervouet
Heart of agile by Pierre HervouetHeart of agile by Pierre Hervouet
Heart of agile by Pierre Hervouet
 

Viewers also liked

devops - what's missing? what's next?
devops - what's missing? what's next?devops - what's missing? what's next?
devops - what's missing? what's next?
Andrew Shafer
 

Viewers also liked (20)

Five Things I Learned While Building Anomaly Detection Tools - Toufic Boubez ...
Five Things I Learned While Building Anomaly Detection Tools - Toufic Boubez ...Five Things I Learned While Building Anomaly Detection Tools - Toufic Boubez ...
Five Things I Learned While Building Anomaly Detection Tools - Toufic Boubez ...
 
devops - what's missing? what's next?
devops - what's missing? what's next?devops - what's missing? what's next?
devops - what's missing? what's next?
 
Adaptive availability
Adaptive availabilityAdaptive availability
Adaptive availability
 
Beyond pretty charts, Analytics for the rest of us. Toufic Boubez DevOps Days...
Beyond pretty charts, Analytics for the rest of us. Toufic Boubez DevOps Days...Beyond pretty charts, Analytics for the rest of us. Toufic Boubez DevOps Days...
Beyond pretty charts, Analytics for the rest of us. Toufic Boubez DevOps Days...
 
Go or No-Go: Operability and Contingency Planning at Etsy.com
Go or No-Go: Operability and Contingency Planning at Etsy.comGo or No-Go: Operability and Contingency Planning at Etsy.com
Go or No-Go: Operability and Contingency Planning at Etsy.com
 
Calculus - St. Petersburg Electrotechnical University "LETI"
Calculus - St. Petersburg Electrotechnical University "LETI"Calculus - St. Petersburg Electrotechnical University "LETI"
Calculus - St. Petersburg Electrotechnical University "LETI"
 
9주차
9주차9주차
9주차
 
Pearson y sperman
Pearson y spermanPearson y sperman
Pearson y sperman
 
Hs Industrial Insights Manufacturing
Hs Industrial Insights   ManufacturingHs Industrial Insights   Manufacturing
Hs Industrial Insights Manufacturing
 
Estadística: Cálculos SPSS
Estadística: Cálculos SPSSEstadística: Cálculos SPSS
Estadística: Cálculos SPSS
 
Immutable infrastructure with Docker and containers (GlueCon 2015)
Immutable infrastructure with Docker and containers (GlueCon 2015)Immutable infrastructure with Docker and containers (GlueCon 2015)
Immutable infrastructure with Docker and containers (GlueCon 2015)
 
Exploratory Statistics with R
Exploratory Statistics with RExploratory Statistics with R
Exploratory Statistics with R
 
disenos experimentales
disenos experimentalesdisenos experimentales
disenos experimentales
 
Mineral processing-design-and-operation, gupta
Mineral processing-design-and-operation, guptaMineral processing-design-and-operation, gupta
Mineral processing-design-and-operation, gupta
 
Jk (1)
Jk (1)Jk (1)
Jk (1)
 
JKSimMet Course - Part 1
JKSimMet Course - Part 1JKSimMet Course - Part 1
JKSimMet Course - Part 1
 
The Binomial, Poisson, and Normal Distributions
The Binomial, Poisson, and Normal DistributionsThe Binomial, Poisson, and Normal Distributions
The Binomial, Poisson, and Normal Distributions
 
Presentación del curso,agosto,18,2014
Presentación del curso,agosto,18,2014Presentación del curso,agosto,18,2014
Presentación del curso,agosto,18,2014
 
Estadística: Revisión Estadística
Estadística: Revisión EstadísticaEstadística: Revisión Estadística
Estadística: Revisión Estadística
 
Clase 4 diseños de bloques - final
Clase 4   diseños de bloques - finalClase 4   diseños de bloques - final
Clase 4 diseños de bloques - final
 

Similar to Simple math for anomaly detection toufic boubez - metafor software - monitorama pdx 2014-05-05

Data centre analytics toufic boubez-metafor-dev ops days vancouver-2013-10-25
Data centre analytics toufic boubez-metafor-dev ops days vancouver-2013-10-25Data centre analytics toufic boubez-metafor-dev ops days vancouver-2013-10-25
Data centre analytics toufic boubez-metafor-dev ops days vancouver-2013-10-25
tboubez
 
Velocity Europe 2013: Beyond Pretty Charts: Analytics for the cloud infrastru...
Velocity Europe 2013: Beyond Pretty Charts: Analytics for the cloud infrastru...Velocity Europe 2013: Beyond Pretty Charts: Analytics for the cloud infrastru...
Velocity Europe 2013: Beyond Pretty Charts: Analytics for the cloud infrastru...
tboubez
 
2010 smg training_cardiff_day1_session3_higgins
2010 smg training_cardiff_day1_session3_higgins2010 smg training_cardiff_day1_session3_higgins
2010 smg training_cardiff_day1_session3_higgins
rgveroniki
 

Similar to Simple math for anomaly detection toufic boubez - metafor software - monitorama pdx 2014-05-05 (20)

Data centre analytics toufic boubez-metafor-dev ops days vancouver-2013-10-25
Data centre analytics toufic boubez-metafor-dev ops days vancouver-2013-10-25Data centre analytics toufic boubez-metafor-dev ops days vancouver-2013-10-25
Data centre analytics toufic boubez-metafor-dev ops days vancouver-2013-10-25
 
Velocity Europe 2013: Beyond Pretty Charts: Analytics for the cloud infrastru...
Velocity Europe 2013: Beyond Pretty Charts: Analytics for the cloud infrastru...Velocity Europe 2013: Beyond Pretty Charts: Analytics for the cloud infrastru...
Velocity Europe 2013: Beyond Pretty Charts: Analytics for the cloud infrastru...
 
Outlier analysis and anomaly detection
Outlier analysis and anomaly detectionOutlier analysis and anomaly detection
Outlier analysis and anomaly detection
 
forecasting model
forecasting modelforecasting model
forecasting model
 
Chapters 14 and 15 presentation
Chapters 14 and 15 presentationChapters 14 and 15 presentation
Chapters 14 and 15 presentation
 
R - what do the numbers mean? #RStats
R - what do the numbers mean? #RStatsR - what do the numbers mean? #RStats
R - what do the numbers mean? #RStats
 
Multivariate Analysis
Multivariate AnalysisMultivariate Analysis
Multivariate Analysis
 
Multivariate Analysis.ppt
Multivariate Analysis.pptMultivariate Analysis.ppt
Multivariate Analysis.ppt
 
Multivariate analysis
Multivariate analysisMultivariate analysis
Multivariate analysis
 
Outlier analysis for Temporal Datasets
Outlier analysis for Temporal DatasetsOutlier analysis for Temporal Datasets
Outlier analysis for Temporal Datasets
 
Data Wrangling_1.pptx
Data Wrangling_1.pptxData Wrangling_1.pptx
Data Wrangling_1.pptx
 
Anomaly detection Meetup Slides
Anomaly detection Meetup SlidesAnomaly detection Meetup Slides
Anomaly detection Meetup Slides
 
Statistics for analytics
Statistics for analyticsStatistics for analytics
Statistics for analytics
 
lec21.VAE_1.pdf
lec21.VAE_1.pdflec21.VAE_1.pdf
lec21.VAE_1.pdf
 
2010 smg training_cardiff_day1_session3_higgins
2010 smg training_cardiff_day1_session3_higgins2010 smg training_cardiff_day1_session3_higgins
2010 smg training_cardiff_day1_session3_higgins
 
4 26 2013 1 IME 674 Quality Assurance Reliability EXAM TERM PROJECT INFO...
4 26 2013 1 IME 674  Quality Assurance   Reliability EXAM   TERM PROJECT INFO...4 26 2013 1 IME 674  Quality Assurance   Reliability EXAM   TERM PROJECT INFO...
4 26 2013 1 IME 674 Quality Assurance Reliability EXAM TERM PROJECT INFO...
 
Anomaly detection: Core Techniques and Advances in Big Data and Deep Learning
Anomaly detection: Core Techniques and Advances in Big Data and Deep LearningAnomaly detection: Core Techniques and Advances in Big Data and Deep Learning
Anomaly detection: Core Techniques and Advances in Big Data and Deep Learning
 
Lect 3 background mathematics
Lect 3 background mathematicsLect 3 background mathematics
Lect 3 background mathematics
 
The zen of predictive modelling
The zen of predictive modellingThe zen of predictive modelling
The zen of predictive modelling
 
Spc training
Spc training Spc training
Spc training
 

Recently uploaded

Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
amitlee9823
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
amitlee9823
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
JoseMangaJr1
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
amitlee9823
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
amitlee9823
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
amitlee9823
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 

Recently uploaded (20)

Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
 

Simple math for anomaly detection toufic boubez - metafor software - monitorama pdx 2014-05-05

  • 1. 1
  • 2. Some Simple Math for Anomaly Detection #Monitorama PDX 2014.05.05 Toufic Boubez, Ph.D. Co-Founder, CTO Metafor Software toufic@metaforsoftware.com @tboubez
  • 3. 3 Preamble • I lied! – There are no “simple” tricks – If it’s too good to be true, it probably is • I usually beat up on parametric, Gaussian, supervised techniques – This talk is to show some alternatives – Only enough time to cover a couple of relatively simple but very useful techniques – Oh, and I will still beat up on the usual suspects • Adrian and James are right! Listen to them!  – What’s the point of collecting all that data if you can’t get useful information out of it!? • Note: real data • Note: no y-axis labels on charts – on purpose!! • Note to self: remember to SLOW DOWN! • Note to self: mention the cats!! Everybody loves cats!!
  • 4. 4 • Co-Founder/CTO Metafor Software • Co-Founder/CTO Layer 7 Technologies – Acquired by Computer Associates in 2013 – I escaped  • Co-Founder/CTO Saffron Technology • IBM Chief Architect for SOA • Co-Author, Co-Editor: WS-Trust, WS- SecureConversation, WS-Federation, WS-Policy • Building large scale software systems for >20 years (I’m older than I look, I know!) Toufic intro – who I am
  • 6. 6 The WoC side-effects: alert fatigue “Alert fatigue is the single biggest problem we have right now … We need to be more intelligent about our alerts or we’ll all go insane.” - John Vincent (@lusis) (#monitoringsucks)
  • 7. 7 Watching screens cannot scale + it’s useless
  • 8. 8 Gotta turn things over to the machines
  • 9. 9 TO THE RESCUE: Anomaly Detection!! • Anomaly detection (also known as outlier detection) is the search for items or events which do not conform to an expected pattern. [Chandola, V.; Banerjee, A.; Kumar, V. (2009). "Anomaly detection: A survey". ACM Computing Surveys 41 (3): 1] • For devops: Need to know when one or more of our metrics is going wonky
  • 10. 10 Attempt #1: thresholds … • Roots in manufacturing process QC
  • 11. 11 … are based on Gaussian distributions • Make assumptions about probability distributions and process behaviour – Data is normally distributed with a useful and usable mean and standard deviation – Data is probabilistically “stationary”
  • 12. 12 Three-Sigma Rule • Three-sigma rule – ~68% of the values lie within 1 std deviation of the mean – ~95% of the values lie within 2 std deviations – 99.73% of the values lie within 3 std deviations: anything else is an outlier
  • 13. 13 Aaahhhh • The mysterious red lines explained
  • 14. 14 Stationary Gaussian distributions are powerful • Because far far in the future, in a galaxy far far away: – I can make the same predictions because the statistical properties of the data haven’t changed – I can easily compare different metrics since they have similar statistical properties • Let’s do this!! • BUT… • Cue in DRAMATIC MUSIC
  • 17. 17 Or worse, THIS happens!
  • 19. 19 WTF!? So what gives!? • Remember this?
  • 22. 22 Attempts #2, #3, etc: mo’ better thresholds • Static thresholds ineffective on dynamic data – Thresholds use the mean as predictor and alert if data falls more than 3 sigma outside the mean • Need “moving” or “adaptive” thresholds: – Value of mean changes with time to accommodate new data values/trends
  • 23. 23 Moving Averages “big idea” • At any point in time in a well-behaved time series, your next value should not significantly deviate from the general trend of your data • Mean as a predictor is too static, relies on too much past data (ALL of the data!) • Instead of overall mean use a finite window of past values, predict most likely next value • Alert if actual value “significantly” (3 sigmas?) deviates from predicted value
  • 24. 24 Moving Averages typical method • Generate a “smoothed” version of the time series – Average over a sliding (moving) window • Compute the squared error between raw series and its smoothed version • Compute a new effective standard deviation by smoothing the squared error • Generate a moving threshold: – Outliers are 3-sigma outside the new, smoothed data! • Ta-da!
  • 25. 25 Simple and Weighted Moving Averages • Simple Moving Average – Average of last N values in your time series • S[t] <- sum(X[t-(N-1):t])/N – Each value in the window contributes equally to prediction – …INCLUDING spikes and outliers • Weigthed Moving Average – Similar to SMA but assigns linearly (arithmetically) decreasing weights to every value in the window – Older values contribute less to the prediction
  • 26. 26 Exponential Smoothing techniques • Exponential Smoothing – Similar to weighted average, but with weights decay exponentially over the whole set of historic samples • S[t]=αX[t-1] + (1-α)S[t-1] – Does not deal with trends in data • DES – In addition to data smoothing factor (α), introduces a trend smoothing factor (β) – Better at dealing with trending – Does not deal with seasonality in data • TES, Holt-Winters – Introduces additional seasonality factor – … and so on
  • 27. 27 Let’s look at an example
  • 31. 31 Hmmmm, so are we doomed? • No! • ALL smoothing predictive methods work best with normally distributed data! • But there are lots of other non-Gaussian based techniques – We can only scratch the surface in this talk
  • 36. 36 Trick #2: Kolmogorov-Smirnov test • Non-parametric test – Compare two probability distributions – Makes no assumptions (e.g. Gaussian) about the distributions of the samples – Measures maximum distance between cumulative distributions – Can be used to compare periodic/seasonal metric periods (e.g. day-to-day or week-to-week) http://en.wikipedia.org/wiki/Kolmogorov%E2% 80%93Smirnov_test
  • 38. 38
  • 39. 39
  • 40. 40
  • 41. 41
  • 42. 42
  • 43. 43 KS Test on difficult data
  • 44. 44 Trick #3: Diffing/Derivatives • Often, even when the data itself is not stationary, its derivatives tends to be! • Most frequently, first difference is sufficient: dS(t) <- S(t+1) – S(t) • Can then perform some analytics on first difference
  • 46. 46 Its first difference – possible random walk?
  • 47. 47 We’re not doomed, but: Know your data!! • You need to understand the statistical properties of your data, and where it comes from, in order to determine what kind of analytics to use. – Your data is very important! – You spend time collecting it so spend time analyzing it! • A large amount of data center data is non- Gaussian – Guassian statistics won’t work – Use appropriate techniques
  • 48. 48 More? • Only scratched the surface • I want to talk more about algorithms, analytics, current issues, etc, in more depth, but time’s up!! – Come talk to me or email me if interested. • Thank you! toufic@metaforsoftware.com @tboubez
  • 49. 49 Oh yeah, and we’re hiring! In Vancouver, BC