Anomaly Detection Analytics for the
Data Centre
devopsdays Vancouver
25 October 2013
Toufic Boubez, Ph.D.
Co-Founder, CTO
Metafor Software
Toufic intro – who I am
• Co-Founder/CTO Metafor Software
• Co-Founder/CTO Layer 7 Technologies
– Acquired by Computer Associates in 2013
– I escaped

• Co-Founder/CTO Saffron Technology
• Chief Architect IBM (SOA)
• Building large scale software systems for 20
years (I’m older than I look, I know!)
2
Why this talk?
• April: devopsdays Austin: Open Space talk
– Blog: http://metaforsoftware.com/beyond-the-prettycharts-a-report-from-devopsdays-in-austin/

• June: devopsdays Silicon Valley presentation:
– Five major lessons learned

• Explore issues mentioned in June

Note: real data
Note: no labels on charts – on purpose!!
Note to self: remember to SLOW DOWN!
Note to self: mention the cats!! Everybody loves cats!!

3
Wall of Charts™

4
The Wall of Charts side-effects
Alert Overload

Metrics Overload

“Alert fatigue is the single biggest problem we have right now … We need to be
more intelligent about our alerts or we’ll all go insane.”
- John Vincent, Monitorama, March 2013

5
Need mo’ better alerting
– So what if my unicorn usage is at 89-91%, and has been stable?
– I’d much rather know if it’s at 60% and has been rapidly increasing

– Static thresholds and rules won’t help you in this case
– Need some intelligent Anomaly Detection mechanism
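
To make the contrast above concrete, here is a minimal sketch that compares a static threshold with a simple trend check (a least-squares slope over a recent window). The readings, the 85% threshold, and the slope limit are all made-up assumptions for illustration; the slope check stands in for "rapidly increasing" and is not a method named in the talk.

```python
def slope_per_sample(values):
    """Least-squares slope of values against their sample index."""
    n = len(values)
    mean_t = (n - 1) / 2
    mean_v = sum(values) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in enumerate(values))
    den = sum((t - mean_t) ** 2 for t in range(n))
    return num / den

# Hypothetical usage readings, in percent
stable = [89, 91, 90, 89, 91, 90]      # high but steady
climbing = [40, 44, 48, 52, 56, 60]    # lower but rising fast

STATIC_THRESHOLD = 85.0  # hypothetical static alert level, in percent
SLOPE_THRESHOLD = 2.0    # hypothetical limit, percentage points per sample

for name, window in (("stable", stable), ("climbing", climbing)):
    static_alert = window[-1] > STATIC_THRESHOLD
    trend_alert = slope_per_sample(window) > SLOPE_THRESHOLD
    print(name, "static alert:", static_alert, "trend alert:", trend_alert)
```

With these numbers the static rule fires only on the steady series, while the trend check fires only on the climbing one.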

6
Anomaly Detection for DevOps
• Anomaly detection (also known as outlier
detection) is the search for items or events
which do not conform to an expected pattern.
[Chandola, V.; Banerjee, A.; Kumar, V. (2009). "Anomaly detection: A
survey". ACM Computing Surveys 41 (3): 1]

• For devops: Need to know when one or more
of our metrics is going wonky

7
#monitoringsucks vs #iheartmonitoring
• Proper monitoring tools should give us all the
information we need to be PROACTIVE
– But they don’t

• Current monitoring tools assume that the
underlying system is relatively static
– Surround it with static thresholds and rules.
– Good for detecting catastrophic events but not
much else
– BUT WHY!!??
8
“Traditional” analytics …
• Roots in manufacturing process QC

9
… are based on Gaussian distributions
• Makes assumptions about probability
distributions and process behaviour
– Usually assumes data is normally distributed with
a useful and usable mean and standard deviation

• Blah blah blah what does it mean?

10
What’s normal!!??

11
Distribution Schmistribution

12
Three-Sigma Rule
• Three-sigma rule
– ~68% of the values lie within 1 std deviation of the mean
– ~95% of the values lie within 2 std deviations
– 99.73% of the values lie within 3 std deviations
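
As a rough sketch of how the three-sigma rule turns into an outlier check, the snippet below flags values more than k standard deviations from the mean. The function name and the readings are hypothetical, and the check only makes sense if the data is roughly normally distributed.

```python
from statistics import mean, stdev

def three_sigma_outliers(values, k=3.0):
    """Return (index, value) pairs lying more than k standard deviations from the mean."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    if sigma == 0:
        return []          # constant series: nothing can be an outlier
    return [(i, v) for i, v in enumerate(values) if abs(v - mu) > k * sigma]

# Hypothetical metric: 20 stable readings around 50, then one spike
readings = [50, 51, 49, 50, 52, 48, 50, 51, 49, 50,
            50, 51, 49, 50, 52, 48, 50, 51, 49, 50, 95]
print(three_sigma_outliers(readings))
```

Note that the spike itself inflates both the mean and the standard deviation, so on short series a single extreme value can hide smaller anomalies.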

13
Aaahhhh
• The mysterious red lines explained

14
Moving Averages for detecting outliers
• Big idea:
– Based on past values, predict most likely next value
– Alert if actual value “significantly” deviates from
predicted value

• Simple Moving Average
– Average of last N values in your time series
• S[t] <- sum(X[t-(N-1):t])/N

– Each value in the window contributes equally to
prediction
– Idea is that your next value should not significantly
deviate from the general trend of your data
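
A minimal sketch of the idea, under two assumptions that are not in the slide: the prediction for a point is the average of the N values before it (so the point being tested is excluded from its own window), and "significantly" means more than k standard deviations of that window.

```python
from statistics import mean, stdev

def sma_alerts(series, n=5, k=3.0):
    """Flag points that deviate from the simple moving average of the previous n values."""
    alerts = []
    for t in range(n, len(series)):
        window = series[t - n:t]      # the n values before series[t]
        predicted = mean(window)      # every value contributes equally
        spread = stdev(window)
        if spread > 0 and abs(series[t] - predicted) > k * spread:
            alerts.append((t, series[t], predicted))
    return alerts

# Hypothetical series with one spike at index 7
series = [10, 11, 10, 12, 11, 10, 11, 30, 11, 10]
for t, actual, predicted in sma_alerts(series):
    print(f"t={t}: actual={actual}, predicted={predicted:.1f}")
```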
15
Weighted Moving Average
• Weighted Moving Average
– Similar to SMA but assigns linearly (arithmetically)
decreasing weights to every value in the window
– Older values contribute less to the prediction

• Neither SMA nor WMA deals well with
periodicity in your data
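
A short sketch of a linearly weighted moving average, assuming the window is ordered oldest to newest and the weights run 1, 2, ..., N (one common choice; the slide does not fix the exact weights).

```python
def weighted_moving_average(window):
    """Linearly weighted average: the oldest value gets weight 1, the newest gets len(window)."""
    n = len(window)
    weights = range(1, n + 1)
    total_weight = n * (n + 1) / 2  # 1 + 2 + ... + n
    return sum(w * x for w, x in zip(weights, window)) / total_weight

# Hypothetical rising window: the weighted average leans towards the newest value
window = [10, 20, 30]
print(weighted_moving_average(window))  # compare with the plain average of 20
```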

16
Exponential Smoothing
• Exponential Smoothing
– Similar to weighted moving average, but with weights that decay
exponentially over the whole set of historic samples
• S[t] = αX[t-1] + (1-α)S[t-1]

– Is almost as bad as moving averages in dealing with
periodicity and trending time series!!

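• DES (double exponential smoothing) and Holt-Winters
– In addition to data smoothing factor (α), DES introduces a
trend smoothing factor (β)
– Holt-Winters adds a seasonal smoothing factor (γ), so it is
better at dealing with periodicity as well as trending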

• ALL assume Gaussian!
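
The two smoothers can be sketched as follows. The first function uses the slide's recurrence directly; the second is double exponential smoothing with a trend term (Holt's linear method), with no seasonal component. The seeding choices (S[0] = X[0], initial trend = X[1] - X[0]) are assumptions; other initialisations are common.

```python
def exponential_smoothing(xs, alpha):
    """One-step-ahead predictions: S[t] = alpha * X[t-1] + (1 - alpha) * S[t-1]."""
    s = [xs[0]]  # assumed seed: S[0] = X[0]
    for t in range(1, len(xs)):
        s.append(alpha * xs[t - 1] + (1 - alpha) * s[t - 1])
    return s

def double_exponential_smoothing(xs, alpha, beta):
    """Smooth a level and a trend; the prediction for xs[t] is level + trend."""
    level, trend = xs[0], xs[1] - xs[0]   # assumed seeds
    predictions = [xs[0]]                 # nothing to predict the first point from
    for t in range(1, len(xs)):
        predictions.append(level + trend)  # forecast made before seeing xs[t]
        new_level = alpha * xs[t] + (1 - alpha) * (level + trend)
        trend = beta * (new_level - level) + (1 - beta) * trend
        level = new_level
    return predictions

# Hypothetical steadily trending series
xs = [0, 2, 4, 6, 8]
print(exponential_smoothing(xs, alpha=0.5))
print(double_exponential_smoothing(xs, alpha=0.5, beta=0.5))
```

On a steadily rising series, the single smoother lags further and further behind, while the version with a trend term follows the line.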
17
Gaussian distributions are powerful because:
• Far far in the future, in a galaxy far far away:
– I can make the same predictions because the
statistical properties of the data haven’t changed
– I can compare different metrics since they have
similar statistical properties

• BUT…
• Cue in DRAMATIC MUSIC
18
What’s my distribution?

19
Another common distribution

20
Let’s look at an example

21
Histogram – probability distribution

22
3-sigma rule

23
Holt-Winters predictions

24
Are we doomed?
• There’s A LOT you can do with the data, other
than just looking at it and putting thresholds!
– Adaptive Mixture of Gaussians
– Non-parametric techniques
(http://www.metaforsoftware.com/everythingyou-should-know-about-anomaly-detectionknow-your-data-parametric-or-non-parametric/)
– Spectral analysis
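
The slide does not name a specific non-parametric technique, so as one illustrative option, here is a sketch of the two-sample Kolmogorov-Smirnov statistic. It measures the largest gap between the empirical distributions of two samples without assuming either is Gaussian. Comparing a reference window with a recent window, and the sample values themselves, are assumptions made for this example.

```python
from bisect import bisect_right

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # fraction of a that is <= x
        cdf_b = bisect_right(b, x) / len(b)  # fraction of b that is <= x
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap

# Hypothetical windows of the same metric
reference = [1, 2, 3, 4]
recent_similar = [1, 2, 3, 4]
recent_shifted = [3, 4, 5, 6]
print(ks_statistic(reference, recent_similar))
print(ks_statistic(reference, recent_shifted))
```

A value near 0 means the two windows look alike; a value near 1 means they barely overlap. Turning the statistic into an alert still needs a threshold or a significance test, which is left out here.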

25
Mixture of Gaussians

26
We’re not doomed, but: Know your data!!
• You need to understand the statistical
properties of your data, and where it comes
from, in order to determine what kind of
analytics to use.
• A large amount of data center data is non-Gaussian
– Gaussian statistics won’t work
– Use appropriate techniques
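
One quick way to start "knowing your data" is to look at skewness and excess kurtosis, both of which are close to zero for Gaussian data. This is only a rough screening sketch with made-up samples, not a formal normality test and not a method taken from the talk.

```python
from statistics import mean, pstdev

def skewness_and_excess_kurtosis(values):
    """Population skewness and excess kurtosis; both are near 0 for Gaussian data."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("constant series has no defined skewness or kurtosis")
    n = len(values)
    skew = sum(((v - mu) / sigma) ** 3 for v in values) / n
    kurt = sum(((v - mu) / sigma) ** 4 for v in values) / n - 3.0
    return skew, kurt

# Hypothetical samples: one symmetric, one with a long right tail
symmetric = [1, 2, 3, 4, 5]
long_tailed = [1, 1, 1, 1, 10]
print(skewness_and_excess_kurtosis(symmetric))
print(skewness_and_excess_kurtosis(long_tailed))
```

A strongly positive skew, as in the second sample, is a hint that mean-and-sigma rules may mislead on that metric.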
27
Pet Peeve #1: How much data do we need?
• Trend towards higher and higher sampling
rates in data collection
• Reminds me of Jorge Luis Borges’ story about
Funes the Memorious
– Perfect recollection of the slightest details of every
instant of his life, but lost the ability for
abstraction

• Our brain works on abstraction
– We notice patterns BECAUSE we can abstract
28
The danger of over-abstraction

+
= comfortable?
29
So, how much data DO you need?
• You don’t need more resolution than twice
your highest frequency (Nyquist-Shannon
sampling theorem)
• Most of the algorithms for analytics will
smooth, average, filter, and pre-process the
data.
• Watch out for correlated metrics (e.g. used vs.
available memory)
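
Two of the points above can be sketched in a few lines: the longest sampling interval that still captures the fastest cycle you care about, and a Pearson correlation check for metrics that carry the same information. The 60-second cycle, the 1000 MB total, and the readings are hypothetical.

```python
from statistics import mean

def max_sampling_interval(shortest_period_s):
    """Upper bound on the sampling interval: half the shortest period of interest.

    Sample slightly faster than this in practice.
    """
    return shortest_period_s / 2.0

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical: the fastest behaviour of interest repeats every 60 seconds
print(max_sampling_interval(60))

# Hypothetical memory readings on a host with 1000 MB in total
total_mb = 1000
used = [100, 250, 400, 380, 700, 650]
available = [total_mb - u for u in used]
print(pearson(used, available))
```

A correlation very close to -1 or +1 suggests that one of the two metrics can be dropped without losing information.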
30
Think: Is all data important to collect?
• Two camps:
– Data is data, let’s collect and analyze everything and
figure out the trends.
– Not all data is important, so let’s figure out what’s
important first and understand the underlying model
so we don’t waste resources on the rest.

• Similar to the very public bun fight between
Noam Chomsky and Peter Norvig
– http://norvig.com/chomsky.html

• Unresolved as far as I know
31
Do we need both metrics?

32
More?
• Only scratched the surface
• I want to talk more about analytics, in more
depth, but time’s up!!
– (Actually Jenny won’t let me)

• Come talk to me during the breaks!
• Thank you!

33


Editor's Notes

  1. TOUFIC = “WHAT WE DO!”