SlideShare a Scribd company logo
Know Your Data:
The stats behind the alerts
Dave McAllister
Sr. OSS Technologist
NGINX
©2023 F5
2
Quick:
What’s the difference between
Mean, Median and Mode?
©2023 F5
3
Quick:
What’s the difference between
Mean, Median and Mode?
And for extra credit, What’s the 9th Dedekind number?
©2023 F5
5
Monitoring is a numbers game
• Metrics are numbers that represent selected
behavior
• Generally
• Timestamped
• Key-Values
• Data, to be useful, must be
• Aggregated
• Analyzed
• Visualized
©2023 F5
6
• How do you deal with outliers (spikes) in
monitoring?
• How do you get a representative value when
values build on each other?
• How do you arrive at values to represent rate of
change over time?
Do you know what your alert is
really showing you?
Some questions to ponder
©2023 F5
7
Mean, Median, Mode
Median:
Represents the middle
value in a set of ordered
data
Mean:
Measure of central
tendency, represents
average value of a set of
data
Mode:
Value that appears most
frequently in a set of data
Data:
2, 6, 4, 9, 5, 1, 7, 8, 1, 9, 9, 1, 10, 2, 9, 6, 7, 2, 1, 4, 7, 1, 10, 9, 2, 7, 1, 1, 4, 3, 5, 6,
3, 8, 1, 8, 4, 7, 6, 3, 9, 9, 9, 4, 9, 1, 4, 1, 9, 8, 10, 10, 1, 1, 1, 7, 10, 9, 7, 3, 7, 4
Median = 6 Mode = 1
Mean = 5.444
©2023 F5
8
Mean, Median, Mode
Data:
2, 6, 4, 9, 5, 1, 7, 8, 1, 9, 9, 1, 10, 2, 9, 6, 7, 2, 1, 4, 7, 1, 10, 9, 2, 7, 1, 1, 4, 3, 5, 6, 3, 8, 1, 8, 4, 7, 6, 3, 9, 9, 9, 4, 9, 1,
4, 1, 9, 8, 10, 10, 1, 1, 1, 7, 10, 9, 7, 3, 7, 4
Mean = 5.444
Or is it 4.130 or 2.791?
Median:
Represents the middle
value in a set of ordered
data
Mean:
Measure of central
tendency, represents
average value of a set of
data
Mode:
Value that appears most
frequently in a set of data
©2023 F5
9
Arithmetic, Harmonic, Geometric, Trimmed, Weighted, Moving
Means to an End
• Each has potential uses and drawbacks
• Often already implemented in monitoring software
• Can give very different results
• Can make like and unlike comparisons easier
©2023 F5
10
Also often called Average
• Most common
• Is the central point in a normal distribution
• This is not the 50% mark (mostly)
• Useful for comparing current to previous conditions
• May be aggregated into groups (time series)
In a time series, we usually calculate
constantly to incorporate new data
Arithmetic
Internet
Load
Balancer
300
Request/sec
200
Request/sec
100
Request/sec
Amean = (100 + 200 + 300) / 3
Amean = 200 Requests per second
©2023 F5
11
• Multiply all the items together, take the nth
root
• Often used for things growing exponentially
• In DevOps
• Average number of deploys per unit of
time
• Average lead time for changes
• MTTR
• Throughput
Geometric
DevOps team: optimizing app deployment
Sprint 1: 5% reduction
Sprint 2: 10% reduction
Sprint 3: 3% reduction
Sprint 4: 7% reduction
How well are they reducing deployment times?
X1=1-0.05 = 0.95, x2 = 0.90, X3 = 0.97, x4 = 0.93
GM = 0.937 or
Deployment has a 6.3% improvement
©2023 F5
12
Harmonic
• Divide n by the sum of the reciprocals
• Measure the performance where
multiple systems are involved
• Weights the lowest figure the highest
• In DevOps
• Performance within range
• Overall indication of latency or thruput
• Use in complex environments
• Especially useful for outliers
Internet
Load
Balancer
300
Request/sec
200
Request/sec
100
Request/sec
HM = 150 Requests per second
©2023 F5
13
Take a comparison look
What’s more important to the equation?
• Latency
• Thruput
• GM is the moral equivalent
• AM weights the larger number
• HM weights the lower value
Average
Latency
ms/sec
Average
Count/sec
Suggested
metric AM GM HM
250 40 10000 145.00 100.00 68.97
200 50 10000 125.00 100.00 80.00
400 25 10000 212.50 100.00 47.06
198 70 13860 134.00 117.73 103.43
105 99 10395 102.00 101.96 101.91
474 73 34602 273.50 186.02 126.52
195 97 18915 146.00 137.53 129.55
196 138 27048 167.00 164.46 161.96
133 10 1330 71.50 36.47 18.60
298 55 16390 176.50 128.02 92.86
0.00
50.00
100.00
150.00
200.00
250.00
300.00
1 2 3 4 5 6 7 8 9 10
Means overlay
AM GM HM
©2023 F5
14
• Amazingly underutilized!
• Center value of a sorted list
• Median is always the 50% point of a normal curve
Median
Mean 25.25
Median 26
©2023 F5
15
Choosing Between Mean and Median
• Mean can be impacted by outliers
• Resilience is better in median
• In DevOps
• Response time monitoring
• Anomaly Detection
• Capacity Planning
Old New
Outlier 250
25.54 Mean 31.15
26 Median 26.5
©2023 F5
16
Old New
Outlier 291
21.90 Mean 28.63
22 Median 22.5
18.61 Harmonic 19.06
©2023 F5
17
If you are using P95
you are using using a percentage value
Congrats!
©2023 F5
18
Slight sidetrack: Measure of Variability
How the numbers behave.
• Standard Deviation
• Range
• InterQuartile Range (IQR)
• Variance
• Clusters
• Outliers
Properly used, variability can
help you target outliers
©2023 F5
19
Those outside edges are good candidates for outliers
Outlier Formula, visually
Q2
IQR
Outlier Formula
Q3
Q1
©2023 F5
20
How about the Mode?
• The most commonly recurring
value in the set
• Often presented as a histograph
• Not commonly used in DevOps,
mostly inferential
• Log Analysis
• Security Monitoring
• User Behavior Analysis
0 2 4 6 8 10 12 14
one
two
three
four
five
six
seven
eight
nine
ten
Mode Chart
©2023 F5
21
Slight sidetrack: Descriptive versus Inferential stats
• Descriptive uses the whole data set to draw statistical conclusions
• Used for visualization
• Can define and extract trends
• Inferential uses a sampled set to draw conclusions
• Used for predictions or hypotenuse testing
• Can also visualize
• But this leads us to sampling
©2023 F5
22
Dealing with the data
• Monitoring is now a data problem
• Observability signals: Metrics, Traces, Logs
• Analysis is often
• Aggregated or Analyzed in segments: Time-defined
• Sampled and inferential
• Random sampling
• Stratified sampling
• Cluster sampling
• Systematic sampling
• Purposive sampling
©2023 F5
23
Sampling No Sampling
©2023 F5
24
• Changes behavior from Descriptive to Inferential
• Can hide outlier behavior
• Metrics are not usually sampled
• May make forensics tougher
• Lack of direct correlation
• A necessary evil
• But understand what you are giving up
Sampling
©2023 F5
25
Distributions
• Normal
• Data equally distributed
• Poisson
• used to model the occurrence of rare
events
• Beta
• Success/failure of binomial events
• Exponential
• Time between async events
• Weibull
• Likelihood of failure
• Log-normal
• Values based on many small events
©2023 F5
26
Slight sidetrack: Standard deviation
• Measures the variability of your data
• Identifies trends and outliers
• NOT percentage based
• Except with coefficient of variability
• CV=Mean / std dev X 100
• Useful for measurement ignoring range
• SRE cases
• Lead times
• Recovery times
• Anomalies (alerts)
• SLO / SLI
This Photo by Unknown Author is licensed under CC BY-NC-ND
©2023 F5
27
Deeper dive: Weibull
• Usually used for time-to-failure
• Defined by a Shape and a Scale
parameter
• This can be challenging
• Don’t ask the math
• F(t)=1−e−(t/λ)β
• R does it for you!
Component Time-to-Failure
Spinning Rust 500 hours
Memory 1000 hours
Power Supply 1500 hours
CPU 2000 hours
SSD 2500 hours
library(fitdistrplus)
data <- c(500, 1000, 1500, 2000, 2500)
fit.weib <- fitdist(data, "weibull")
summary(fit.weib)
Fitting of the distribution ' weibull ' by maximum likelihood
Parameters : shape 1.0624082 scale 2158.256
p.failure <- pweibull(300, shape = fit.weib$estimate[1],
scale = fit.weib$estimate[2])
1 - p.failure
Failure (disk, memory) = 64.81%
Failure (entire system) = 75.52%
Fdisk(300)≈1−e−0.6918≈0.50025
Fmemory(300)≈1− −0.3490≈0.2959Fmemory(300)≈1−e−0.3490≈0.2959
Fpower(300)≈1− −0.2327≈0.2092Fpower(300)≈1−e−0.2327≈0.2092
FCPU(300)≈1− −0.1745≈0.1593FCPU(300)≈1−e−0.1745≈0.1593
FSSD(300)≈1− −0.1396≈0.1298FSSD(300)≈1−e−0.1396≈0.1298
©2023 F5
28
Deeper dive: Exponential
• Models the “rate” (time between
events that are unrelated)
• Use cases
• Network performance
• User Requests (traces)
• Messaging service
• System failures
• f(x) = me–mx
• λ = 1/µ
• e ~ 2.71828182846
Average
Latency
ms/sec
Average
Count/sec
Suggested
metric AM GM HM
250 40 10000 145.00 100.00 68.97
200 50 10000 125.00 100.00 80.00
400 25 10000 212.50 100.00 47.06
198 70 13860 134.00 117.73 103.43
105 99 10395 102.00 101.96 101.91
474 73 34602 273.50 186.02 126.52
195 97 18915 146.00 137.53 129.55
196 138 27048 167.00 164.46 161.96
133 10 1330 71.50 36.47 18.60
298 55 16390 176.50 128.02 92.86
Latency
λ = 1/244.9
Probability density of 158 ms = 0.2142%
Cumulative density = 47.5422%
Expdist median ≈ 169.7 ms
Throughput
λ = 1/65.7
Probability density at 58 count = 0.628%
Cumulative density = 58.7%
Expdist median ≈ 45.6
©2023 F5
29
You may stumble upon:
“On scale, statistics are not your friend”
WRONG
On scale, probability is not your friend.
Slight Sidetrack
and a pet peeve
©2023 F5
30
Conditional Probability – The Monty Hall Problem
Bayes Theorem
• You have three doors
• Behind one of them is a car
• The other two are goats
• You pick Door # 2
1 3
2
©2023 F5
31
Conditional Probability – The Monty Hall Problem
Bayes Theorem
• You have three doors
• Behind one of them is a car
• The other two are goats
• You pick Door # 2
• The host shows you door # 1 and
offers to let you switch
• Do you switch or not?
1 3
2
©2023 F5
32
Conditional Probability
Bayes Theorem
• Mathematical framework for
updating probability for an event as
new information becomes available
• Based on this theorem
• Initial probability of Door 2 = 1/3
• Initial probability of Doors 1 or 3 = 2/3
• Since it isn’t behind Door # 1 then
Door # 3 is the remaining 2/3
1 3
2
©2023 F5
33
Bayes Theorem in DevOps
Incident Troubleshooting
• You're seeing an increase in error rates
after a recent deployment. Historically,
80% of such incidents have been tied to
database issues after major updates.
• Bayes' theorem suggests that the
current issue is database-related
considering the recent deployment and
historical data.
Predictive Monitoring
• Historically there's a 5% chance that
your application will crash on any given
day. Now, you get alerts about
increasing memory consumption.
• Bayes' theorem indicates the probability
of an impending application crash.
Optimizing A/B Testing
• You're testing a new feature against a
baseline to determine its impact on
server load. Initially, both variants might
be assumed equally likely to have an
effect.
• Data from the test allows you to update
the probability distribution of the effect
of the new feature on server load.
Bayes' theorem provides a mathematical framework for incorporating new evidence into existing beliefs
©2023 F5
34
Common Pitfalls in Statistics
Looking at the
wrong central
measure
Ignoring Scale Confusing
correlation with
causation
Getting causation
backwards
Failing to see
biases
©2023 F5
35
Correlation and Causation
©2023 F5
36
• Statistics are how we tend to analyze our metrics
• Statistics are aggregation and reduction to reveal central tendencies
• They do not show individual behavior
• Most choices make use of very few basics
• But other choices may show amazing inferential results
• And finally
Summary
©2023 F5
37
• Statistics are how we tend to analyze our metrics
• Statistics are aggregation and reduction to reveal central tendencies
• They do not show individual behavior
• Most choices make use of very few basics
• But other choices may show amazing inferential results
• And finally
Summary
The most effective debugging tool is still careful thought,
coupled with judiciously placed print statements.
-Brian Kernighan Unix for Beginners 1979
©2023 F5
38
D(9) = 286 386 577 668 298
411 128 469 151 667 598
498 812 366
©2023 F5
39
Thanks!
Linkedin:
in/davemc
NGINX Community Slack
Slides on
GitHub

More Related Content

Similar to Know Your Data: The stats behind your alerts

Short Data Rules for Observability.pdf
Short Data Rules for Observability.pdfShort Data Rules for Observability.pdf
Short Data Rules for Observability.pdf
Dave McAllister
 
Intro to ml_2021
Intro to ml_2021Intro to ml_2021
Intro to ml_2021
Sanghamitra Deb
 
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
Edge AI and Vision Alliance
 
Dependable Systems - Structure-Based Dependabiilty Modeling (6/16)
Dependable Systems - Structure-Based Dependabiilty Modeling (6/16)Dependable Systems - Structure-Based Dependabiilty Modeling (6/16)
Dependable Systems - Structure-Based Dependabiilty Modeling (6/16)
Peter Tröger
 
Using Time Series for Full Observability of a SaaS Platform
Using Time Series for Full Observability of a SaaS PlatformUsing Time Series for Full Observability of a SaaS Platform
Using Time Series for Full Observability of a SaaS Platform
DevOps.com
 
Heuristic design of experiments w meta gradient search
Heuristic design of experiments w meta gradient searchHeuristic design of experiments w meta gradient search
Heuristic design of experiments w meta gradient search
Greg Makowski
 
Time Series Anomaly Detection with .net and Azure
Time Series Anomaly Detection with .net and AzureTime Series Anomaly Detection with .net and Azure
Time Series Anomaly Detection with .net and Azure
Marco Parenzan
 
DMM9 - Data Migration Testing
DMM9 - Data Migration TestingDMM9 - Data Migration Testing
DMM9 - Data Migration TestingNick van Beest
 
Training - What is Performance ?
Training  - What is Performance ?Training  - What is Performance ?
Training - What is Performance ?
Betclic Everest Group Tech Team
 
Bridging the Gap: Machine Learning for Ubiquitous Computing -- Evaluation
Bridging the Gap: Machine Learning for Ubiquitous Computing -- EvaluationBridging the Gap: Machine Learning for Ubiquitous Computing -- Evaluation
Bridging the Gap: Machine Learning for Ubiquitous Computing -- Evaluation
Thomas Ploetz
 
Presentation Machine Learning
Presentation Machine LearningPresentation Machine Learning
Presentation Machine LearningPeriklis Gogas
 
Process Control
Process ControlProcess Control
Process Control
Ronald Shewchuk
 
Designing apps for resiliency
Designing apps for resiliencyDesigning apps for resiliency
Designing apps for resiliency
Masashi Narumoto
 
GLM & GBM in H2O
GLM & GBM in H2OGLM & GBM in H2O
GLM & GBM in H2O
Sri Ambati
 
Time-resource v&v for complex systems
Time-resource v&v for complex systemsTime-resource v&v for complex systems
Time-resource v&v for complex systems
Predictable Network Solutions Ltd.
 
Amaya_Presentation
Amaya_PresentationAmaya_Presentation
Amaya_PresentationIsaias Amaya
 
Observability - the good, the bad, and the ugly
Observability - the good, the bad, and the uglyObservability - the good, the bad, and the ugly
Observability - the good, the bad, and the ugly
Aleksandr Tavgen
 
Observability - The good, the bad and the ugly Xp Days 2019 Kiev Ukraine
Observability -  The good, the bad and the ugly Xp Days 2019 Kiev Ukraine Observability -  The good, the bad and the ugly Xp Days 2019 Kiev Ukraine
Observability - The good, the bad and the ugly Xp Days 2019 Kiev Ukraine
Aleksandr Tavgen
 
Data_Preparation.pptx
Data_Preparation.pptxData_Preparation.pptx
Data_Preparation.pptx
ImXaib
 
customer_profiling_based_on_fuzzy_principals_linkedin
customer_profiling_based_on_fuzzy_principals_linkedincustomer_profiling_based_on_fuzzy_principals_linkedin
customer_profiling_based_on_fuzzy_principals_linkedinAsoka Korale
 

Similar to Know Your Data: The stats behind your alerts (20)

Short Data Rules for Observability.pdf
Short Data Rules for Observability.pdfShort Data Rules for Observability.pdf
Short Data Rules for Observability.pdf
 
Intro to ml_2021
Intro to ml_2021Intro to ml_2021
Intro to ml_2021
 
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
 
Dependable Systems - Structure-Based Dependabiilty Modeling (6/16)
Dependable Systems - Structure-Based Dependabiilty Modeling (6/16)Dependable Systems - Structure-Based Dependabiilty Modeling (6/16)
Dependable Systems - Structure-Based Dependabiilty Modeling (6/16)
 
Using Time Series for Full Observability of a SaaS Platform
Using Time Series for Full Observability of a SaaS PlatformUsing Time Series for Full Observability of a SaaS Platform
Using Time Series for Full Observability of a SaaS Platform
 
Heuristic design of experiments w meta gradient search
Heuristic design of experiments w meta gradient searchHeuristic design of experiments w meta gradient search
Heuristic design of experiments w meta gradient search
 
Time Series Anomaly Detection with .net and Azure
Time Series Anomaly Detection with .net and AzureTime Series Anomaly Detection with .net and Azure
Time Series Anomaly Detection with .net and Azure
 
DMM9 - Data Migration Testing
DMM9 - Data Migration TestingDMM9 - Data Migration Testing
DMM9 - Data Migration Testing
 
Training - What is Performance ?
Training  - What is Performance ?Training  - What is Performance ?
Training - What is Performance ?
 
Bridging the Gap: Machine Learning for Ubiquitous Computing -- Evaluation
Bridging the Gap: Machine Learning for Ubiquitous Computing -- EvaluationBridging the Gap: Machine Learning for Ubiquitous Computing -- Evaluation
Bridging the Gap: Machine Learning for Ubiquitous Computing -- Evaluation
 
Presentation Machine Learning
Presentation Machine LearningPresentation Machine Learning
Presentation Machine Learning
 
Process Control
Process ControlProcess Control
Process Control
 
Designing apps for resiliency
Designing apps for resiliencyDesigning apps for resiliency
Designing apps for resiliency
 
GLM & GBM in H2O
GLM & GBM in H2OGLM & GBM in H2O
GLM & GBM in H2O
 
Time-resource v&v for complex systems
Time-resource v&v for complex systemsTime-resource v&v for complex systems
Time-resource v&v for complex systems
 
Amaya_Presentation
Amaya_PresentationAmaya_Presentation
Amaya_Presentation
 
Observability - the good, the bad, and the ugly
Observability - the good, the bad, and the uglyObservability - the good, the bad, and the ugly
Observability - the good, the bad, and the ugly
 
Observability - The good, the bad and the ugly Xp Days 2019 Kiev Ukraine
Observability -  The good, the bad and the ugly Xp Days 2019 Kiev Ukraine Observability -  The good, the bad and the ugly Xp Days 2019 Kiev Ukraine
Observability - The good, the bad and the ugly Xp Days 2019 Kiev Ukraine
 
Data_Preparation.pptx
Data_Preparation.pptxData_Preparation.pptx
Data_Preparation.pptx
 
customer_profiling_based_on_fuzzy_principals_linkedin
customer_profiling_based_on_fuzzy_principals_linkedincustomer_profiling_based_on_fuzzy_principals_linkedin
customer_profiling_based_on_fuzzy_principals_linkedin
 

More from All Things Open

Building Reliability - The Realities of Observability
Building Reliability - The Realities of ObservabilityBuilding Reliability - The Realities of Observability
Building Reliability - The Realities of Observability
All Things Open
 
Modern Database Best Practices
Modern Database Best PracticesModern Database Best Practices
Modern Database Best Practices
All Things Open
 
Open Source and Public Policy
Open Source and Public PolicyOpen Source and Public Policy
Open Source and Public Policy
All Things Open
 
Weaving Microservices into a Unified GraphQL Schema with graph-quilt - Ashpak...
Weaving Microservices into a Unified GraphQL Schema with graph-quilt - Ashpak...Weaving Microservices into a Unified GraphQL Schema with graph-quilt - Ashpak...
Weaving Microservices into a Unified GraphQL Schema with graph-quilt - Ashpak...
All Things Open
 
The State of Passwordless Auth on the Web - Phil Nash
The State of Passwordless Auth on the Web - Phil NashThe State of Passwordless Auth on the Web - Phil Nash
The State of Passwordless Auth on the Web - Phil Nash
All Things Open
 
Total ReDoS: The dangers of regex in JavaScript
Total ReDoS: The dangers of regex in JavaScriptTotal ReDoS: The dangers of regex in JavaScript
Total ReDoS: The dangers of regex in JavaScript
All Things Open
 
What Does Real World Mass Adoption of Decentralized Tech Look Like?
What Does Real World Mass Adoption of Decentralized Tech Look Like?What Does Real World Mass Adoption of Decentralized Tech Look Like?
What Does Real World Mass Adoption of Decentralized Tech Look Like?
All Things Open
 
How to Write & Deploy a Smart Contract
How to Write & Deploy a Smart ContractHow to Write & Deploy a Smart Contract
How to Write & Deploy a Smart Contract
All Things Open
 
Spinning Your Drones with Cadence Workflows, Apache Kafka and TensorFlow
 Spinning Your Drones with Cadence Workflows, Apache Kafka and TensorFlow Spinning Your Drones with Cadence Workflows, Apache Kafka and TensorFlow
Spinning Your Drones with Cadence Workflows, Apache Kafka and TensorFlow
All Things Open
 
DEI Challenges and Success
DEI Challenges and SuccessDEI Challenges and Success
DEI Challenges and Success
All Things Open
 
Scaling Web Applications with Background
Scaling Web Applications with BackgroundScaling Web Applications with Background
Scaling Web Applications with Background
All Things Open
 
Supercharging tutorials with WebAssembly
Supercharging tutorials with WebAssemblySupercharging tutorials with WebAssembly
Supercharging tutorials with WebAssembly
All Things Open
 
Using SQL to Find Needles in Haystacks
Using SQL to Find Needles in HaystacksUsing SQL to Find Needles in Haystacks
Using SQL to Find Needles in Haystacks
All Things Open
 
Configuration Security as a Game of Pursuit Intercept
Configuration Security as a Game of Pursuit InterceptConfiguration Security as a Game of Pursuit Intercept
Configuration Security as a Game of Pursuit Intercept
All Things Open
 
Scaling an Open Source Sponsorship Program
Scaling an Open Source Sponsorship ProgramScaling an Open Source Sponsorship Program
Scaling an Open Source Sponsorship Program
All Things Open
 
Build Developer Experience Teams for Open Source
Build Developer Experience Teams for Open SourceBuild Developer Experience Teams for Open Source
Build Developer Experience Teams for Open Source
All Things Open
 
Deploying Models at Scale with Apache Beam
Deploying Models at Scale with Apache BeamDeploying Models at Scale with Apache Beam
Deploying Models at Scale with Apache Beam
All Things Open
 
Sudo – Giving access while staying in control
Sudo – Giving access while staying in controlSudo – Giving access while staying in control
Sudo – Giving access while staying in control
All Things Open
 
Fortifying the Future: Tackling Security Challenges in AI/ML Applications
Fortifying the Future: Tackling Security Challenges in AI/ML ApplicationsFortifying the Future: Tackling Security Challenges in AI/ML Applications
Fortifying the Future: Tackling Security Challenges in AI/ML Applications
All Things Open
 
Securing Cloud Resources Deployed with Control Planes on Kubernetes using Gov...
Securing Cloud Resources Deployed with Control Planes on Kubernetes using Gov...Securing Cloud Resources Deployed with Control Planes on Kubernetes using Gov...
Securing Cloud Resources Deployed with Control Planes on Kubernetes using Gov...
All Things Open
 

More from All Things Open (20)

Building Reliability - The Realities of Observability
Building Reliability - The Realities of ObservabilityBuilding Reliability - The Realities of Observability
Building Reliability - The Realities of Observability
 
Modern Database Best Practices
Modern Database Best PracticesModern Database Best Practices
Modern Database Best Practices
 
Open Source and Public Policy
Open Source and Public PolicyOpen Source and Public Policy
Open Source and Public Policy
 
Weaving Microservices into a Unified GraphQL Schema with graph-quilt - Ashpak...
Weaving Microservices into a Unified GraphQL Schema with graph-quilt - Ashpak...Weaving Microservices into a Unified GraphQL Schema with graph-quilt - Ashpak...
Weaving Microservices into a Unified GraphQL Schema with graph-quilt - Ashpak...
 
The State of Passwordless Auth on the Web - Phil Nash
The State of Passwordless Auth on the Web - Phil NashThe State of Passwordless Auth on the Web - Phil Nash
The State of Passwordless Auth on the Web - Phil Nash
 
Total ReDoS: The dangers of regex in JavaScript
Total ReDoS: The dangers of regex in JavaScriptTotal ReDoS: The dangers of regex in JavaScript
Total ReDoS: The dangers of regex in JavaScript
 
What Does Real World Mass Adoption of Decentralized Tech Look Like?
What Does Real World Mass Adoption of Decentralized Tech Look Like?What Does Real World Mass Adoption of Decentralized Tech Look Like?
What Does Real World Mass Adoption of Decentralized Tech Look Like?
 
How to Write & Deploy a Smart Contract
How to Write & Deploy a Smart ContractHow to Write & Deploy a Smart Contract
How to Write & Deploy a Smart Contract
 
Spinning Your Drones with Cadence Workflows, Apache Kafka and TensorFlow
 Spinning Your Drones with Cadence Workflows, Apache Kafka and TensorFlow Spinning Your Drones with Cadence Workflows, Apache Kafka and TensorFlow
Spinning Your Drones with Cadence Workflows, Apache Kafka and TensorFlow
 
DEI Challenges and Success
DEI Challenges and SuccessDEI Challenges and Success
DEI Challenges and Success
 
Scaling Web Applications with Background
Scaling Web Applications with BackgroundScaling Web Applications with Background
Scaling Web Applications with Background
 
Supercharging tutorials with WebAssembly
Supercharging tutorials with WebAssemblySupercharging tutorials with WebAssembly
Supercharging tutorials with WebAssembly
 
Using SQL to Find Needles in Haystacks
Using SQL to Find Needles in HaystacksUsing SQL to Find Needles in Haystacks
Using SQL to Find Needles in Haystacks
 
Configuration Security as a Game of Pursuit Intercept
Configuration Security as a Game of Pursuit InterceptConfiguration Security as a Game of Pursuit Intercept
Configuration Security as a Game of Pursuit Intercept
 
Scaling an Open Source Sponsorship Program
Scaling an Open Source Sponsorship ProgramScaling an Open Source Sponsorship Program
Scaling an Open Source Sponsorship Program
 
Build Developer Experience Teams for Open Source
Build Developer Experience Teams for Open SourceBuild Developer Experience Teams for Open Source
Build Developer Experience Teams for Open Source
 
Deploying Models at Scale with Apache Beam
Deploying Models at Scale with Apache BeamDeploying Models at Scale with Apache Beam
Deploying Models at Scale with Apache Beam
 
Sudo – Giving access while staying in control
Sudo – Giving access while staying in controlSudo – Giving access while staying in control
Sudo – Giving access while staying in control
 
Fortifying the Future: Tackling Security Challenges in AI/ML Applications
Fortifying the Future: Tackling Security Challenges in AI/ML ApplicationsFortifying the Future: Tackling Security Challenges in AI/ML Applications
Fortifying the Future: Tackling Security Challenges in AI/ML Applications
 
Securing Cloud Resources Deployed with Control Planes on Kubernetes using Gov...
Securing Cloud Resources Deployed with Control Planes on Kubernetes using Gov...Securing Cloud Resources Deployed with Control Planes on Kubernetes using Gov...
Securing Cloud Resources Deployed with Control Planes on Kubernetes using Gov...
 

Recently uploaded

Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
nkrafacyberclub
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
Dorra BARTAGUIZ
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Nexer Digital
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
RinaMondal9
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Aggregage
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 

Recently uploaded (20)

Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 

Know Your Data: The stats behind your alerts

  • 1. Know Your Data: The stats behind the alerts Dave McAllister Sr. OSS Technologist NGINX
  • 2. ©2023 F5 2 Quick: What’s the difference between Mean, Median and Mode?
  • 3. ©2023 F5 3 Quick: What’s the difference between Mean, Median and Mode? And for extra credit, What’s the 9th Dedekind number?
  • 4. ©2023 F5 5 Monitoring is a numbers game • Metrics are numbers that represent selected behavior • Generally • Timestamped • Key-Values • Data, to be useful, must be • Aggregated • Analyzed • Visualized
  • 5. ©2023 F5 6 • How do you deal with outliers (spikes) in monitoring? • How do you get a representative value when values build on each other? • How do you arrive at values to represent rate of change over time? Do you know what your alert is really showing you? Some questions to ponder
  • 6. ©2023 F5 7 Mean, Median, Mode Median: Represents the middle value in a set of ordered data Mean: Measure of central tendency, represents average value of a set of data Mode: Value that appears most frequently in a set of data Data: 2, 6, 4, 9, 5, 1, 7, 8, 1, 9, 9, 1, 10, 2, 9, 6, 7, 2, 1, 4, 7, 1, 10, 9, 2, 7, 1, 1, 4, 3, 5, 6, 3, 8, 1, 8, 4, 7, 6, 3, 9, 9, 9, 4, 9, 1, 4, 1, 9, 8, 10, 10, 1, 1, 1, 7, 10, 9, 7, 3, 7, 4 Median = 6 Mode = 1 Mean = 5.444
  • 7. ©2023 F5 8 Mean, Median, Mode Data: 2, 6, 4, 9, 5, 1, 7, 8, 1, 9, 9, 1, 10, 2, 9, 6, 7, 2, 1, 4, 7, 1, 10, 9, 2, 7, 1, 1, 4, 3, 5, 6, 3, 8, 1, 8, 4, 7, 6, 3, 9, 9, 9, 4, 9, 1, 4, 1, 9, 8, 10, 10, 1, 1, 1, 7, 10, 9, 7, 3, 7, 4 Mean = 5.444 Or is it 4.130 or 2.791? Median: Represents the middle value in a set of ordered data Mean: Measure of central tendency, represents average value of a set of data Mode: Value that appears most frequently in a set of data
  • 8. ©2023 F5 9 Arithmetic, Harmonic, Geometric, Trimmed, Weighted, Moving Means to an End • Each has potential uses and drawbacks • Often already implemented in monitoring software • Can give very different results • Can make like and unlike comparisons easier
  • 9. ©2023 F5 10 Also often called Average • Most common • Is the central point in a normal distribution • This is not the 50% mark (mostly) • Useful for comparing current to previous conditions • May be aggregated into groups (time series) In a time series, we usually calculate constantly to incorporate new data Arithmetic Internet Load Balancer 300 Request/sec 200 Request/sec 100 Request/sec Amean = (100 + 200 + 300) / 3 Amean = 200 Requests per second
  • 10. ©2023 F5 11 • Multiply all the items together, take the nth root • Often used for things growing exponentially • In DevOps • Average number of deploys per unit of time • Average lead time for changes • MTTR • Throughput Geometric DevOps team: optimizing app deployment Sprint 1: 5% reduction Sprint 2: 10% reduction Sprint 3: 3% reduction Sprint 4: 7% reduction How well are they reducing deployment times? X1=1-0.05 = 0.95, x2 = 0.90, X3 = 0.97, x4 = 0.93 GM = 0.937 or Deployment has a 6.3% improvement
  • 11. ©2023 F5 12 Harmonic • Divide n by the sum of the reciprocals • Measure the performance where multiple systems are involved • Weights the lowest figure the highest • In DevOps • Performance within range • Overall indication of latency or thruput • Use in complex environments • Especially useful for outliers Internet Load Balancer 300 Request/sec 200 Request/sec 100 Request/sec HM = 150 Requests per second
  • 12. ©2023 F5 13 Take a comparison look What’s more important to the equation? • Latency • Thruput • GM is the moral equivalent • AM weights the larger number • HM weights the lower value Average Latency ms/sec Average Count/sec Suggested metric AM GM HM 250 40 10000 145.00 100.00 68.97 200 50 10000 125.00 100.00 80.00 400 25 10000 212.50 100.00 47.06 198 70 13860 134.00 117.73 103.43 105 99 10395 102.00 101.96 101.91 474 73 34602 273.50 186.02 126.52 195 97 18915 146.00 137.53 129.55 196 138 27048 167.00 164.46 161.96 133 10 1330 71.50 36.47 18.60 298 55 16390 176.50 128.02 92.86 0.00 50.00 100.00 150.00 200.00 250.00 300.00 1 2 3 4 5 6 7 8 9 10 Means overlay AM GM HM
  • 13. ©2023 F5 14 • Amazingly underutilized! • Center value of a sorted list • Median is always the 50% point of a normal curve Median Mean 25.25 Median 26
  • 14. ©2023 F5 15 Choosing Between Mean and Median • Mean can be impacted by outliers • Resilience is better in median • In DevOps • Response time monitoring • Anomaly Detection • Capacity Planning Old New Outlier 250 25.54 Mean 31.15 26 Median 26.5
  • 15. ©2023 F5 16 Old New Outlier 291 21.90 Mean 28.63 22 Median 22.5 18.61 Harmonic 19.06
  • 16. ©2023 F5 17 If you are using P95 you are using using a percentage value Congrats!
  • 17. ©2023 F5 18 Slight sidetrack: Measure of Variability How the numbers behave. • Standard Deviation • Range • InterQuartile Range (IQR) • Variance • Clusters • Outliers Properly used, variability can help you target outliers
  • 18. ©2023 F5 19 Those outside edges are good candidates for outliers Outlier Formula, visually Q2 IQR Outlier Formula Q3 Q1
  • 19. ©2023 F5 20 How about the Mode? • The most commonly recurring value in the set • Often presented as a histograph • Not commonly used in DevOps, mostly inferential • Log Analysis • Security Monitoring • User Behavior Analysis 0 2 4 6 8 10 12 14 one two three four five six seven eight nine ten Mode Chart
  • 20. ©2023 F5 21 Slight sidetrack: Descriptive versus Inferential stats • Descriptive uses the whole data set to draw statistical conclusions • Used for visualization • Can define and extract trends • Inferential uses a sampled set to draw conclusions • Used for predictions or hypotenuse testing • Can also visualize • But this leads us to sampling
  • 21. ©2023 F5 22 Dealing with the data • Monitoring is now a data problem • Observability signals: Metrics, Traces, Logs • Analysis is often • Aggregated or Analyzed in segments: Time-defined • Sampled and inferential • Random sampling • Stratified sampling • Cluster sampling • Systematic sampling • Purposive sampling
  • 23. ©2023 F5 24 • Changes behavior from Descriptive to Inferential • Can hide outlier behavior • Metrics are not usually sampled • May make forensics tougher • Lack of direct correlation • A necessary evil • But understand what you are giving up Sampling
  • 24. ©2023 F5 25 Distributions • Normal • Data equally distributed • Poisson • used to model the occurrence of rare events • Beta • Success/failure of binomial events • Exponential • Time between async events • Weibull • Likelihood of failure • Log-normal • Values based on many small events
  • 25. ©2023 F5 26 Slight sidetrack: Standard deviation • Measures the variability of your data • Identifies trends and outliers • NOT percentage based • Except with coefficient of variability • CV=Mean / std dev X 100 • Useful for measurement ignoring range • SRE cases • Lead times • Recovery times • Anomalies (alerts) • SLO / SLI This Photo by Unknown Author is licensed under CC BY-NC-ND
  • 26. ©2023 F5 27 Deeper dive: Weibull • Usually used for time-to-failure • Defined by a Shape and a Scale parameter • This can be challenging • Don’t ask the math • F(t)=1−e−(t/λ)β • R does it for you! Component Time-to-Failure Spinning Rust 500 hours Memory 1000 hours Power Supply 1500 hours CPU 2000 hours SSD 2500 hours library(fitdistrplus) data <- c(500, 1000, 1500, 2000, 2500) fit.weib <- fitdist(data, "weibull") summary(fit.weib) Fitting of the distribution ' weibull ' by maximum likelihood Parameters : shape 1.0624082 scale 2158.256 p.failure <- pweibull(300, shape = fit.weib$estimate[1], scale = fit.weib$estimate[2]) 1 - p.failure Failure (disk, memory) = 64.81% Failure (entire system) = 75.52% Fdisk(300)≈1−e−0.6918≈0.50025 Fmemory(300)≈1− −0.3490≈0.2959Fmemory(300)≈1−e−0.3490≈0.2959 Fpower(300)≈1− −0.2327≈0.2092Fpower(300)≈1−e−0.2327≈0.2092 FCPU(300)≈1− −0.1745≈0.1593FCPU(300)≈1−e−0.1745≈0.1593 FSSD(300)≈1− −0.1396≈0.1298FSSD(300)≈1−e−0.1396≈0.1298
  • 27. ©2023 F5 28 Deeper dive: Exponential • Models the “rate” (time between events that are unrelated) • Use cases • Network performance • User Requests (traces) • Messaging service • System failures • f(x) = me–mx • λ = 1/µ • e ~ 2.71828182846 Average Latency ms/sec Average Count/sec Suggested metric AM GM HM 250 40 10000 145.00 100.00 68.97 200 50 10000 125.00 100.00 80.00 400 25 10000 212.50 100.00 47.06 198 70 13860 134.00 117.73 103.43 105 99 10395 102.00 101.96 101.91 474 73 34602 273.50 186.02 126.52 195 97 18915 146.00 137.53 129.55 196 138 27048 167.00 164.46 161.96 133 10 1330 71.50 36.47 18.60 298 55 16390 176.50 128.02 92.86 Latency λ = 1/244.9 Probability density of 158 ms = 0.2142% Cumulative density = 47.5422% Expdist median ≈ 169.7 ms Throughput λ = 1/65.7 Probability density at 58 count = 0.628% Cumulative density = 58.7% Expdist median ≈ 45.6
  • 28. ©2023 F5 29 You may stumble upon: “On scale, statistics are not your friend” WRONG On scale, probability is not your friend. Slight Sidetrack and a pet peeve
  • 29. ©2023 F5 30 Conditional Probability – The Monty Hall Problem Bayes Theorem • You have three doors • Behind one of them is a car • The other two are goats • You pick Door # 2 1 3 2
  • 30. ©2023 F5 31 Conditional Probability – The Monty Hall Problem Bayes Theorem • You have three doors • Behind one of them is a car • The other two are goats • You pick Door # 2 • The host shows you door # 1 and offers to let you switch • Do you switch or not? 1 3 2
  • 31. ©2023 F5 32 Conditional Probability Bayes Theorem • Mathematical framework for updating probability for an event as new information becomes available • Based on this theorem • Initial probability of Door 2 = 1/3 • Initial probability of Doors 1 or 3 = 2/3 • Since it isn’t behind Door # 1 then Door # 3 is the remaining 2/3 1 3 2
  • 32. ©2023 F5 33 Bayes Theorem in DevOps Incident Troubleshooting • You're seeing an increase in error rates after a recent deployment. Historically, 80% of such incidents have been tied to database issues after major updates. • Bayes' theorem suggests that the current issue is database-related considering the recent deployment and historical data. Predictive Monitoring • Historically there's a 5% chance that your application will crash on any given day. Now, you get alerts about increasing memory consumption. • Bayes' theorem indicates the probability of an impending application crash. Optimizing A/B Testing • You're testing a new feature against a baseline to determine its impact on server load. Initially, both variants might be assumed equally likely to have an effect. • Data from the test allows you to update the probability distribution of the effect of the new feature on server load. Bayes' theorem provides a mathematical framework for incorporating new evidence into existing beliefs
  • 33. ©2023 F5 34 Common Pitfalls in Statistics Looking at the wrong central measure Ignoring Scale Confusing correlation with causation Getting causation backwards Failing to see biases
  • 35. ©2023 F5 36 • Statistics are how we tend to analyze our metrics • Statistics are aggregation and reduction to reveal central tendencies • They do not show individual behavior • Most choices make use of very few basics • But other choices may show amazing inferential results • And finally Summary
  • 36. ©2023 F5 37 • Statistics are how we tend to analyze our metrics • Statistics are aggregation and reduction to reveal central tendencies • They do not show individual behavior • Most choices make use of very few basics • But other choices may show amazing inferential results • And finally Summary The most effective debugging tool is still careful thought, coupled with judiciously placed print statements. -Brian Kernighan Unix for Beginners 1979
  • 37. ©2023 F5 38 D(9) = 286 386 577 668 298 411 128 469 151 667 598 498 812 366