Beyond Pretty Charts
Analytics for the Cloud Infrastructure
Velocity Europe 2013
Toufic Boubez, Ph.D.
Co-Founder, CTO
Metafor Software
toufic@metaforsoftware.com
@tboubez
Toufic intro – who I am
• Co-Founder/CTO Metafor Software
• Co-Founder/CTO Layer 7 Technologies
– Acquired by Computer Associates in 2013
– I escaped :)

• Co-Founder/CTO Saffron Technology
• IBM Chief Architect for SOA
• Co-Author, Co-Editor: WS-Trust, WS-SecureConversation, WS-Federation, WS-Policy
• Building large scale software systems for 20 years
(I’m older than I look, I know!)
2
Genesis of this talk
• Evolving from various conference presentations
– Blog: http://www.metaforsoftware.com/category/anomaly-detection-101/
– Many briefly mentioned issues, never explored
– Needed more details and examples
• Note: real data
• Note: no y-axis labels on charts – on purpose!!
• Note to self: remember to SLOW DOWN!
• Note to self: mention the cats!! Everybody loves cats!!

3
Wall of Charts™

4
The WoC side-effects: alert fatigue
“Alert fatigue is the single
biggest problem we have
right now … We need to be
more intelligent about our
alerts or we’ll all go insane.”
- John Vincent (@lusis)
(#monitoringsucks)

5
The fallacy of thresholds
• So what if my unicorn usage is at 89-91%, and has been stable?
• I’d much rather know if it’s at 60% and has been rapidly increasing

• Static thresholds and rules won’t help you in this case
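To make the point above concrete, here is a minimal Python sketch that compares a static threshold with a crude rate-of-change check. The function names, the 90% threshold, the 2-points-per-sample slope limit and the sample data are all assumptions picked for illustration, not values from the talk.

```python
def static_threshold_alert(values, threshold=90.0):
    """Fire when the latest sample is at or above a fixed threshold."""
    return values[-1] >= threshold


def rate_of_change_alert(values, max_delta_per_step=2.0):
    """Fire when the average per-sample change across the window is too steep."""
    if len(values) < 2:
        return False
    slope = (values[-1] - values[0]) / (len(values) - 1)
    return abs(slope) >= max_delta_per_step


# Hypothetical usage percentages, one sample per interval.
stable_high = [90, 89, 91, 90, 89, 91]  # busy but steady
climbing = [40, 44, 48, 52, 56, 60]     # only at 60%, but rising fast

for name, series in (("stable_high", stable_high), ("climbing", climbing)):
    print(name,
          "threshold:", static_threshold_alert(series),
          "trend:", rate_of_change_alert(series))
```

The static rule pages you for the boring series and stays silent on the interesting one; the trend check does the opposite. A real detector would need something more robust than a two-point slope.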
6
Work smarter not harder
• We don’t need more metrics
• We don’t need more thresholds and rules
• We DO need better, smarter tools

7
TO THE RESCUE: Anomaly Detection!!
• Anomaly detection (also known as outlier
detection) is the search for items or events
which do not conform to an expected pattern.
[Chandola, V.; Banerjee, A.; Kumar, V. (2009). "Anomaly detection: A
survey". ACM Computing Surveys 41 (3): 1]

• For devops: Need to know when one or more
of our metrics is going wonky

8
#monitoringsucks vs #i♥monitoring
• Proper monitoring tools should give us all the
information we need to be PROACTIVE
– But they don’t

• Current monitoring tools assume that the
underlying system is relatively static
– Surround it with static thresholds and rules.
– Good for detecting catastrophic events but not
much else
– WHY!!??
9
“Traditional” analytics …
• Roots in manufacturing process QC

10
… are based on Gaussian distributions
• Make assumptions about probability
distributions and process behaviour
– Usually assume data is normally distributed
with a useful and usable mean and standard
deviation

11
What’s normal!!??

12
THIS is normal

13
Three-Sigma Rule
• Three-sigma rule
– ~68% of the values lie within 1 std deviation of the mean
– ~95% of the values lie within 2 std deviations
– 99.73% of the values lie within 3 std deviations: anything
else is an outlier
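As a rough illustration of the rule above, the following Python sketch flags every value that lies more than three standard deviations from the mean of the whole series. The function name and the sample series are made up for the example, and it only makes sense if the data really is roughly Gaussian.

```python
import statistics


def three_sigma_outliers(values, n_sigmas=3.0):
    """Return (index, value) pairs lying more than n_sigmas std deviations from the mean."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        return []  # a perfectly flat series has no outliers
    return [(i, v) for i, v in enumerate(values)
            if abs(v - mean) > n_sigmas * std]


# Hypothetical metric: 99 quiet samples and one large spike at the end.
series = [10.0] * 50 + [10.5] * 49 + [100.0]
print(three_sigma_outliers(series))
```

Note that the spike itself inflates both the mean and the standard deviation, so with short series a single outlier can hide behind its own influence.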

14
Aaahhhh
• The mysterious red lines explained

15
The four horsemen
• Four horsemen of the modelpocalypse™ :)
[Abe Stanway & Jon Cowie, http://www.slideshare.net/jonlives/bring-the-noise]

– Seasonality
– Spike influence
– Normality
– Parameters

16
Moving Averages for detecting outliers
• Moving Averages “Big idea”:
– At any point in time in a well-behaved time series,
your next value should not significantly deviate
from the general trend of your data
– Mean as a predictor is too static, relies on too
much past data (ALL of the data!)
– Instead of overall mean use a finite window of
past values, predict most likely next value
– Alert if actual value “significantly” (3 sigmas?)
deviates from predicted value
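The idea above can be sketched in a few lines of Python: predict the next value as the mean of a finite window of past values, and alert when the actual value is more than 3 sigmas (of that same window) away. The function name, window size and sample data are assumptions for illustration only.

```python
import statistics
from collections import deque


def rolling_window_alerts(values, window=10, n_sigmas=3.0):
    """Return (index, value) pairs that deviate from the mean of the
    previous `window` values by more than n_sigmas window std deviations."""
    history = deque(maxlen=window)  # finite window of past values
    alerts = []
    for i, x in enumerate(values):
        if len(history) == window:
            predicted = statistics.mean(history)
            spread = statistics.pstdev(history)
            if spread > 0 and abs(x - predicted) > n_sigmas * spread:
                alerts.append((i, x))
        history.append(x)  # the new value joins the window either way
    return alerts


# Hypothetical series: a noisy but stable pattern with one spike at index 30.
series = [10, 11, 10, 12, 11, 10, 11, 12, 10, 11] * 3 + [30] + [11, 10, 12]
print(rolling_window_alerts(series, window=10))
```

Once the spike enters the window it widens the window's standard deviation, which is the "spike influence" problem: for the next `window` samples the detector is much less sensitive.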
17
Simple and Weighted Moving Averages
• Simple Moving Average
– Average of last N values in your time series
• S[t] <- sum(X[t-(N-1):t])/N

– Each value in the window contributes equally to
prediction
– …INCLUDING spikes and outliers

• Weighted Moving Average
– Similar to SMA but assigns linearly (arithmetically)
decreasing weights to every value in the window
– Older values contribute less to the prediction
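Here is a small Python sketch of both averages, assuming the window is the last N values of a plain list. The function names and sample data are illustrative, and the WMA uses weights 1..N (oldest to newest), which is one common way to get linearly decreasing weights.

```python
def simple_moving_average(values, n):
    """Average of the last n values; every value counts equally."""
    if n < 1 or len(values) < n:
        raise ValueError("need at least n values and n >= 1")
    window = values[-n:]
    return sum(window) / n


def weighted_moving_average(values, n):
    """Average of the last n values with linearly decreasing weights:
    the newest value gets weight n, the oldest gets weight 1."""
    if n < 1 or len(values) < n:
        raise ValueError("need at least n values and n >= 1")
    window = values[-n:]
    weights = range(1, n + 1)
    return sum(w * x for w, x in zip(weights, window)) / sum(weights)


# Same spike, once as the newest sample and once as the oldest.
fresh_spike = [10, 10, 10, 10, 50]
old_spike = [50, 10, 10, 10, 10]

print(simple_moving_average(fresh_spike, 5), simple_moving_average(old_spike, 5))
print(weighted_moving_average(fresh_spike, 5), weighted_moving_average(old_spike, 5))
```

The SMA gives the same prediction regardless of where the spike sits in the window, while the WMA lets an old spike fade.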
18
Exponential Smoothing
• Exponential Smoothing
– Similar to weighted average, but with weights that decay exponentially
over the whole set of historic samples
• S[t]=αX[t-1] + (1-α)S[t-1]

– Does not deal with trends in data

• DES (Double Exponential Smoothing)
– In addition to data smoothing factor (α), introduces a trend smoothing
factor (β)
– Better at dealing with trending
– Does not deal with seasonality in data

• TES (Triple Exponential Smoothing), Holt-Winters
– Introduces additional seasonality factor
– … and so on

• ALL assume Gaussian!
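A minimal Python sketch of the first two smoothers, to show why the trend factor matters. The single smoother follows the recurrence S[t] = αX[t-1] + (1-α)S[t-1] from the slide; the double smoother is the usual Holt linear-trend form. The seeding choices (first sample for the level, first difference for the trend), function names and sample data are assumptions.

```python
def single_exponential_smoothing(values, alpha):
    """Return forecasts where forecasts[i] is the prediction for values[i + 1]."""
    if not values:
        raise ValueError("need at least one value")
    s = values[0]  # assumption: seed the smoothed value with the first sample
    forecasts = []
    for x in values:
        s = alpha * x + (1 - alpha) * s
        forecasts.append(s)
    return forecasts


def double_exponential_smoothing(values, alpha, beta):
    """Holt's linear method: smooth a level and a trend separately.
    forecasts[i] is the prediction for the value after values[i + 1]."""
    if len(values) < 2:
        raise ValueError("need at least two values to seed the trend")
    level = values[0]
    trend = values[1] - values[0]  # assumption: seed trend with first difference
    forecasts = []
    for x in values[1:]:
        previous_level = level
        level = alpha * x + (1 - alpha) * (level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
        forecasts.append(level + trend)
    return forecasts


# A steadily trending metric; the true next value would be 12.
ramp = [0, 2, 4, 6, 8, 10]
single = single_exponential_smoothing(ramp, alpha=0.5)
double = double_exponential_smoothing(ramp, alpha=0.5, beta=0.5)
print(single[-1], double[-1])
```

On the ramp, the single smoother lags well behind the data while the double smoother tracks the trend. Neither knows anything about seasonality.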

19
Gaussian distributions are powerful because:
• Far far in the future, in a galaxy far far away:
– I can make the same predictions because the
statistical properties of the data haven’t changed
– I can easily compare different metrics since they
have similar statistical properties

• BUT…
• Cue in DRAMATIC MUSIC
20
What’s my distribution?

21
Another common distribution

22
Let’s look at an example

23
3-sigma rule

24
Holt-Winters predictions

25
Histogram – probability distribution

26
Another example

27
3-sigma rule

28
Holt-Winters predictions

29
Histogram – probability distribution

30
Are we doomed?
• No!
• There are lots of other non-Gaussian based
techniques:
– Adaptive Mixture of Gaussians
– Non-parametric techniques
(http://www.metaforsoftware.com/everything-you-should-know-about-anomaly-detection-know-your-data-parametric-or-non-parametric/)
– Spectral analysis
31
Kolmogorov-Smirnov test
• Non-parametric test
– Compare two probability distributions
– Makes no assumptions (e.g. Gaussian) about the distributions of the samples
– Measures maximum distance between cumulative distributions
– Can be used to compare periodic/seasonal metric periods (e.g. day-to-day or week-to-week)

http://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test
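To show what "maximum distance between cumulative distributions" means in practice, here is a standard-library-only Python sketch of the two-sample KS statistic. The function name and the sample windows are made up for the example. If SciPy is available, `scipy.stats.ks_2samp` computes the same statistic along with a p-value.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the empirical cumulative distributions of the two samples."""
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a = sorted(sample_a)
    b = sorted(sample_b)
    d = 0.0
    # Both empirical CDFs are step functions, so the largest gap
    # is reached at one of the observed values.
    for x in set(a) | set(b):
        cdf_a = bisect_right(a, x) / len(a)  # fraction of a that is <= x
        cdf_b = bisect_right(b, x) / len(b)  # fraction of b that is <= x
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Hypothetical day-to-day comparison of the same metric.
monday = [1, 2, 3, 4]
tuesday_same = [1, 2, 3, 4]
tuesday_shifted = [3, 4, 5, 6]

print(ks_statistic(monday, tuesday_same))
print(ks_statistic(monday, tuesday_shifted))
```

The statistic runs from 0 (identical distributions) to 1 (no overlap at all). What counts as "anomalous" still needs a cut-off, which is where a p-value or a bootstrap comes in.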

32
KS test with bootstrap

33
What about slow trends?

34
KS test on slow memory leak

35
Histogram – probability distribution

36
We’re not doomed, but: Know your data!!
• You need to understand the statistical
properties of your data, and where it comes
from, in order to determine what kind of
analytics to use.
• A large amount of data center data is non-Gaussian
– Gaussian statistics won’t work
– Use appropriate techniques
37
Pet Peeve: How much data do we need?
• Trend towards higher and higher sampling
rates in data collection
• Reminds me of Jorge Luis Borges’ story about
Funes the Memorious
– Perfect recollection of the slightest details of every
instant of his life, but lost the ability for
abstraction

• Our brain works on abstraction
– We notice patterns BECAUSE we can abstract
38
The danger of over-abstraction

+
= comfortable?
39
So, how much data DO you need?
• You don’t need more resolution than twice
your highest frequency (Nyquist-Shannon
sampling theorem)
• Most of the algorithms for analytics will
smooth, average, filter, and pre-process the
data.
• Watch out for correlated metrics (e.g. used vs.
available memory)
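Two of the points above are easy to check with a few lines of Python. The sketch below computes the Nyquist bound on the sampling interval for the fastest cycle you care about, and the Pearson correlation between two metrics. The function names, the 16 GB total and the sample values are assumptions for illustration.

```python
import math


def nyquist_sampling_interval(shortest_period_seconds):
    """Upper bound on the sampling interval: to capture a cycle you need
    more than two samples per period, so stay just under half the period."""
    return shortest_period_seconds / 2.0


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# If the fastest pattern of interest repeats hourly, sampling roughly
# every 30 minutes (or a bit faster) is the theoretical limit.
print(nyquist_sampling_interval(3600))

# Used vs. available memory on a hypothetical 16 GB host.
total_gb = 16.0
used = [4.0, 6.5, 8.0, 7.2, 12.1]
available = [total_gb - u for u in used]
print(pearson(used, available))
```

A correlation of -1 means the second metric carries no extra information, so collecting and analyzing both mostly doubles the work.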
40
Think: Is all data important to collect?
• Two camps:
– Data is data, let’s collect and analyze everything and
figure out the trends.
– Not all data is important, so let’s figure out what’s
important first and understand the underlying model
so we don’t waste resources on the rest.

• Similar to the very public bun fight between
Noam Chomsky and Peter Norvig
– http://norvig.com/chomsky.html

• Unresolved as far as I know :)
41
Shout out to Etsy
• Check out kale for some analytics:
– http://codeascraft.com/2013/06/11/introducing-kale/
– https://github.com/etsy/skyline/blob/master/src/analyzer/algorithms.py

42
More?
• Only scratched the surface
• I want to talk more about algorithms, analytics,
current issues, etc, in more depth, but time’s up!!
– Go back in time to my Office Hours session, or
– Come talk to me or email me if interested.

• Thank you!
toufic@metaforsoftware.com
@tboubez
43

Velocity Europe 2013: Beyond Pretty Charts: Analytics for the cloud infrastructure.

  • 1. Beyond Pretty Charts Analytics for the Cloud Infrastructure Velocity Europe 2013 Toufic Boubez, Ph.D. Co-Founder, CTO Metafor Software toufic@metaforsoftware.com @tboubez
  • 2. Toufic intro – who I am • Co-Founder/CTO Metafor Software • Co-Founder/CTO Layer 7 Technologies – Acquired by Computer Associates in 2013 – I escaped • Co-Founder/CTO Saffron Technology • IBM Chief Architect for SOA • Co-Author, Co-Editor: WS-Trust, WS-SecureConversation, WS-Federation, WS-Policy • Building large scale software systems for 20 years (I’m older than I look, I know!) 2
  • 3. Genesis of this talk • Evolving from various conference presentations – Blog: http://www.metaforsoftware.com/category/anomaly-detection-101/ – Many briefly mentioned issues, never explored – Needed more details and examples • Note: real data • Note: no y-axis labels on charts – on purpose!! • Note to self: remember to SLOW DOWN! • Note to self: mention the cats!! Everybody loves cats!! 3
  • 5. The WoC side-effects: alert fatigue “Alert fatigue is the single biggest problem we have right now … We need to be more intelligent about our alerts or we’ll all go insane.” - John Vincent (@lusis) (#monitoringsucks) 5
  • 6. The fallacy of thresholds • So what if my unicorn usage is at 89-91%, and has been stable? • I’d much rather know if it’s at 60% and has been rapidly increasing • Static thresholds and rules won’t help you in this case 6
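
The point of slide 6 can be illustrated with a minimal Python sketch. It is not taken from the talk: it simply contrasts a static threshold check with a rate-of-change check. The 95% threshold, the function names, and the sample readings are all assumptions chosen for illustration.

```python
# Illustrative sketch only: the threshold, names, and data are assumptions.

def static_threshold_alert(values, threshold=95.0):
    """Fire only when the latest reading crosses a fixed threshold."""
    return values[-1] >= threshold


def slope_per_sample(values):
    """Least-squares slope of the readings against their sample index.

    Needs at least two readings.
    """
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


stable = [89, 91, 90, 89, 91, 90]   # high but flat
rising = [40, 44, 48, 52, 56, 60]   # lower but climbing fast

# Neither series trips the static threshold...
print(static_threshold_alert(stable), static_threshold_alert(rising))
# ...but the slopes tell the two situations apart.
print(slope_per_sample(stable), slope_per_sample(rising))
```

A fixed rule on the slope is of course just another static threshold; the sketch only shows that the level alone hides the more interesting signal.
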
  • 7. Work smarter not harder • We don’t need more metrics • We don’t need more thresholds and rules • We DO need better, smarter tools 7
  • 8. TO THE RESCUE: Anomaly Detection!! • Anomaly detection (also known as outlier detection) is the search for items or events which do not conform to an expected pattern. [Chandola, V.; Banerjee, A.; Kumar, V. (2009). "Anomaly detection: A survey". ACM Computing Surveys 41 (3): 1] • For devops: Need to know when one or more of our metrics is going wonky 8
  • 9. #monitoringsucks vs #i monitoring • Proper monitoring tools should give us all the information we need to be PROACTIVE – But they don’t • Current monitoring tools assume that the underlying system is relatively static – Surround it with static thresholds and rules. – Good for detecting catastrophic events but not much else – WHY!!?? 9
  • 10. “Traditional” analytics … • Roots in manufacturing process QC 10
  • 11. … are based on Gaussian distributions • Make assumptions about probability distributions and process behaviour – Usually assume data is normally distributed with a useful and usable mean and standard deviation 11
  • 14. Three-Sigma Rule • Three-sigma rule – ~68% of the values lie within 1 std deviation of the mean – ~95% of the values lie within 2 std deviations – 99.73% of the values lie within 3 std deviations: anything else is an outlier 14
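
As a rough sketch of the three-sigma rule (an illustration, not code from the talk), the snippet below recomputes the coverage figures for a normal distribution and flags values more than three standard deviations from the mean. The function names and the sample readings are assumptions.

```python
import math
import statistics


def gaussian_coverage(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


def three_sigma_outliers(values, k=3.0):
    """Return (index, value) pairs lying more than k std deviations from the mean."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return []
    return [(i, v) for i, v in enumerate(values) if abs(v - mean) > k * sd]


readings = [9.0, 11.0] * 10 + [100.0]   # made-up data with one obvious spike

for k in (1, 2, 3):
    print(k, round(gaussian_coverage(k), 4))   # 0.6827, 0.9545, 0.9973
print(three_sigma_outliers(readings))          # [(20, 100.0)]
```

Note that the mean and standard deviation here are computed over data that includes the spike itself, so a large enough spike inflates the very band it is judged against.
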
  • 15. Aaahhhh • The mysterious red lines explained 15
  • 16. The four horsemen • Four horsemen of the modelpocalypse™ [Abe Stanway & Jon Cowie http://www.slideshare.net/jonlives/bring-the-noise] – Seasonality – Spike influence – Normality – Parameters 16
  • 17. Moving Averages for detecting outliers • Moving Averages “Big idea”: – At any point in time in a well-behaved time series, your next value should not significantly deviate from the general trend of your data – Mean as a predictor is too static, relies on too much past data (ALL of the data!) – Instead of overall mean use a finite window of past values, predict most likely next value – Alert if actual value “significantly” (3 sigmas?) deviates from predicted value 17
  • 18. Simple and Weighted Moving Averages • Simple Moving Average – Average of last N values in your time series • S[t] <- sum(X[t-(N-1):t])/N – Each value in the window contributes equally to prediction – …INCLUDING spikes and outliers • Weighted Moving Average – Similar to SMA but assigns linearly (arithmetically) decreasing weights to every value in the window – Older values contribute less to the prediction 18
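
The two averages can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the speaker's implementation: the function names, the window contents, and the `deviates` alert rule (window standard deviation as the yardstick) are hypothetical choices.

```python
import statistics


def simple_moving_average(window):
    """Equal-weight average of the values in the window."""
    return sum(window) / len(window)


def weighted_moving_average(window):
    """Linearly weighted average: oldest value gets weight 1, newest gets N."""
    n = len(window)
    weighted_sum = sum(w * x for w, x in zip(range(1, n + 1), window))
    return weighted_sum / (n * (n + 1) / 2)


def deviates(window, actual, k=3.0):
    """Assumed alert rule: actual is more than k window std deviations from the SMA."""
    predicted = simple_moving_average(window)
    return abs(actual - predicted) > k * statistics.pstdev(window)


window = [10, 10, 10, 10, 40]            # made-up window ending in a spike
print(simple_moving_average(window))     # 16.0
print(weighted_moving_average(window))   # 20.0
```

Because the newest value carries the largest weight, a spike at the end of the window pulls the weighted average further than the simple one.
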
  • 19. Exponential Smoothing • Exponential Smoothing – Similar to weighted average, but with weights that decay exponentially over the whole set of historic samples • S[t]=αX[t-1] + (1-α)S[t-1] – Does not deal with trends in data • Double Exponential Smoothing (DES) – In addition to data smoothing factor (α), introduces a trend smoothing factor (β) – Better at dealing with trending – Does not deal with seasonality in data • Triple Exponential Smoothing (TES), Holt-Winters – Introduces additional seasonality factor – … and so on • ALL assume Gaussian! 19
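
A minimal sketch of single and double exponential smoothing follows. The single version uses the recurrence from the slide; the double version uses the standard Holt linear-trend form, which is an assumption about what the speaker had in mind. The initial values, function names, and sample data are also assumptions.

```python
def exponential_smoothing(xs, alpha):
    """Return S where S[t] = alpha*X[t-1] + (1-alpha)*S[t-1], with S[0] = X[0] (assumed)."""
    s = [xs[0]]
    for t in range(1, len(xs) + 1):
        s.append(alpha * xs[t - 1] + (1 - alpha) * s[t - 1])
    return s  # s[-1] is the forecast for the next, unseen value


def double_exponential_forecast(xs, alpha, beta):
    """One-step-ahead forecast from Holt's linear (double exponential) smoothing."""
    level, trend = xs[0], xs[1] - xs[0]   # assumed initialisation
    for x in xs[1:]:
        prev_level = level
        level = alpha * x + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level + trend


ramp = [0, 2, 4, 6, 8]   # made-up, perfectly linear trend; the next value would be 10
print(exponential_smoothing(ramp, 0.5)[-1])          # 6.125
print(double_exponential_forecast(ramp, 0.5, 0.5))   # 10.0
```

On the ramp, the single-smoothed forecast lags well behind the trend while the trend-aware version tracks it, which is the limitation the slide describes.
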
  • 20. Gaussian distributions are powerful because: • Far far in the future, in a galaxy far far away: – I can make the same predictions because the statistical properties of the data haven’t changed – I can easily compare different metrics since they have similar statistical properties • BUT… • Cue in DRAMATIC MUSIC 20
  • 23. Let’s look at an example 23
  • 26. Histogram – probability distribution 26
  • 30. Histogram – probability distribution 30
  • 31. Are we doomed? • No! • There are lots of other non-Gaussian based techniques: – Adaptive Mixture of Gaussians – Non-parametric techniques (http://www.metaforsoftware.com/everything-you-should-know-about-anomaly-detection-know-your-data-parametric-or-non-parametric/) – Spectral analysis 31
  • 32. Kolmogorov-Smirnov test • Non-parametric test – Compare two probability distributions – Makes no assumptions (e.g. Gaussian) about the distributions of the samples – Measures maximum distance between cumulative distributions – Can be used to compare periodic/seasonal metric periods (e.g. day-to-day or week-to-week) http://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test 32
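
To make "maximum distance between cumulative distributions" concrete, here is a small pure-Python sketch of the two-sample KS statistic. It is an illustration, not the code behind the talk's charts: the function names and sample data are assumptions, and `ks_critical_value` uses the usual large-sample approximation, which is not meaningful for samples as tiny as the ones shown.

```python
import math
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)

    def ecdf(sorted_sample, x):
        # Fraction of the sample that is <= x.
        return bisect_right(sorted_sample, x) / len(sorted_sample)

    # Both ECDFs are step functions, so the largest gap occurs at a data point.
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in a + b)


def ks_critical_value(n, m, significance=0.05):
    """Large-sample approximation of the rejection threshold for the statistic."""
    c = math.sqrt(-math.log(significance / 2) / 2)
    return c * math.sqrt((n + m) / (n * m))


monday = [1, 2, 3, 4]     # made-up samples standing in for two periods of a metric
tuesday = [3, 4, 5, 6]
print(ks_statistic(monday, tuesday))   # 0.5
```

In practice a library routine such as `scipy.stats.ks_2samp` would likely be the better choice; the hand-rolled version is only meant to show what is being measured.
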
  • 33. KS test with bootstrap 33
  • 34. What about slow trends? 34
  • 35. KS test on slow memory leak 35
  • 36. Histogram – probability distribution 36
  • 37. We’re not doomed, but: Know your data!! • You need to understand the statistical properties of your data, and where it comes from, in order to determine what kind of analytics to use. • A large amount of data center data is non-Gaussian – Gaussian statistics won’t work – Use appropriate techniques 37
  • 38. Pet Peeve: How much data do we need? • Trend towards higher and higher sampling rates in data collection • Reminds me of Jorge Luis Borges’ story about Funes the Memorious – Perfect recollection of the slightest details of every instant of his life, but lost the ability for abstraction • Our brain works on abstraction – We notice patterns BECAUSE we can abstract 38
  • 39. The danger of over-abstraction + = comfortable? 39
  • 40. So, how much data DO you need? • You don’t need more resolution than twice your highest frequency (Nyquist-Shannon sampling theorem) • Most of the algorithms for analytics will smooth, average, filter, and pre-process the data. • Watch out for correlated metrics (e.g. used vs. available memory) 40
  • 41. Think: Is all data important to collect? • Two camps: – Data is data, let’s collect and analyze everything and figure out the trends. – Not all data is important, so let’s figure out what’s important first and understand the underlying model so we don’t waste resources on the rest. • Similar to the very public bun fight between Noam Chomsky and Peter Norvig – http://norvig.com/chomsky.html • Unresolved as far as I know 41
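
The sampling-theorem point can be worked through with a short sketch. The function names and the 60-second example cycle are assumptions for illustration; the aliasing formula is the standard one for a pure sinusoid.

```python
def max_sampling_interval(shortest_period_s):
    """Upper bound on the sampling interval for a cycle of the given period.

    The theorem strictly requires sampling faster than this bound.
    """
    return shortest_period_s / 2


def apparent_period(true_period_s, sampling_interval_s):
    """Period a sampled sinusoid appears to have (aliased if sampled too slowly)."""
    f = 1 / true_period_s
    fs = 1 / sampling_interval_s
    alias = abs(f - round(f / fs) * fs)
    return 1 / alias if alias else float("inf")


print(max_sampling_interval(60))   # 30.0
print(apparent_period(60, 10))     # sampled fast enough: close to the true 60 s
print(apparent_period(60, 50))     # sampled too slowly: shows up as a slow ~300 s cycle
```

So for a metric whose fastest meaningful cycle is about a minute, sampling much more often than every 30 seconds adds resolution the downstream smoothing will mostly discard, while sampling less often can produce cycles that are not really there.
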
  • 42. Shout out to etsy • Check out kale for some analytics: – http://codeascraft.com/2013/06/11/introducing-kale/ – https://github.com/etsy/skyline/blob/master/src/analyzer/algorithms.py 42
  • 43. More? • Only scratched the surface • I want to talk more about algorithms, analytics, current issues, etc, in more depth, but time’s up!! – Go back in time to my Office Hours session, or – Come talk to me or email me if interested. • Thank you! toufic@metaforsoftware.com @tboubez 43

Editor's Notes

  1. TOUFIC = “WHAT WE DO!”