Views and opinions expressed in this presentation are solely those of its authors and do not
necessarily represent those of Alphabet, Inc. or its subsidiaries, including Google, Inc.
Select images and formulae are provided with permission from Google, Inc.
Percentile-Based Approach to Forecasting Workload Growth
Alex Gilgur, Douglas Browning, Stephen Gunn,
Xiaojun Di, Wei Chen, and Rajesh Krishnaswamy
IT Capacity and Performance
41st International Conference by
the Computer Measurement Group (CMG'15)
San Antonio, TX November 5, 2015
Session 525
How often do you see such patterns?
How do you predict them?
“What's in a name?
That which we call a rose / By any other name /Would smell as sweet”
● Useful Workload Measures:
○ Outlier Boundary
○ Mean + “Z” Standard Deviations (e.g. “3 sigma”, “6 sigma”, etc.)
○ 95th percentile
○ 90th percentile
○ 75th percentile
○ Simple Average (Mean)
○ Median
○ 25th percentile
● Define Workload via Little's Law:
○ Number of units of work in the system
■ Number of packets in flight
■ Number of queries in queue
○ W = X * T
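As a quick illustration of the formula above, here is a minimal sketch in Python. It assumes the usual Little's Law reading, with X as throughput and T as the time a unit of work spends in the system; the function name and the sample numbers are illustrative and do not come from the presentation.

```python
def workload(throughput_per_sec: float, time_in_system_sec: float) -> float:
    """Little's Law: W = X * T (average units of work in the system)."""
    return throughput_per_sec * time_in_system_sec


# Hypothetical link: 2000 packets/s, each in flight for 50 ms on average.
packets_in_flight = workload(2000.0, 0.05)
```

With these made-up inputs, about 100 packets are in flight at any moment.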
What We Will Talk About
● Workload Forecasting
○ When Classical Methods Fail
○ Workload statistics:
■ “Z-sigma”
■ Quantile Regression
● p95
● Outlier Boundaries
○ Knees, Hyperbolae, and Sensitivity
○ Quantile Compression
○ Predicting the Workload
○ Use Cases
“It may be normal, darling; but I'd rather be natural.”
Truman Capote, Breakfast at Tiffany’s
Usual Assumptions:
● Residuals are Normally distributed
● Mean and StDev of residuals are constant
“Double, Double, Toil and Trouble”
Solutions
And the Winner Is...
… Predicting the 95th Percentile Directly!
Why not Split the Workload into Two Servers?
● Sometimes app1 and app2 have to go to the same
server for processing.
● To size the server (VM/Network Link/Storage LUN/… ),
we only want to forecast the upper bound.
● Sometimes it’s hard to get additional capacity:
○ budget, justification, approvals, etc.
○ Cloud helps, but...
Percentile-based Modeling is often the only solution
QuantReg for the 2 types of workload
fits very nicely in both cases
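To make the idea of fitting a percentile directly more concrete, here is a minimal quantile regression sketch in Python. It is not the authors' implementation: it fits a straight line by minimizing the pinball (check) loss, and it finds the exact optimum by brute force, relying on the fact that for a two-parameter line some optimal fit passes through two data points. The function names and the synthetic data are assumptions made for illustration; a production setup would normally use a statistics library instead.

```python
from itertools import combinations


def pinball_loss(residual: float, q: float) -> float:
    """Asymmetric absolute loss minimized by quantile regression."""
    return q * residual if residual >= 0 else (q - 1.0) * residual


def fit_quantile_line(xs, ys, q):
    """Return (intercept, slope) of the line minimizing total pinball loss.

    Scans every pair of points, so it is only suitable for small data sets.
    """
    best = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # vertical pair, no finite slope
        slope = (ys[j] - ys[i]) / (xs[j] - xs[i])
        intercept = ys[i] - slope * xs[i]
        loss = sum(pinball_loss(y - (intercept + slope * x), q)
                   for x, y in zip(xs, ys))
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    return best[1], best[2]


# Synthetic "fanning" workload (made up): the spread widens as x grows.
xs = list(range(1, 41))
ys = [10 + x * (1 + (x % 5) / 4) for x in xs]

_, slope_p50 = fit_quantile_line(xs, ys, 0.50)
_, slope_p95 = fit_quantile_line(xs, ys, 0.95)
```

On this synthetic data the p95 line comes out steeper than the median line, which is the behavior a single mean-based regression line cannot express.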
Should we Size Hardware for 95%-ile of Workload?
5% of the time, the SLA (99.9%? 99.999%?) will be violated
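The arithmetic behind this concern can be sketched as follows. The sketch takes the slide's premise at face value, namely that any time demand exceeds the sized capacity counts as an SLA violation, and it assumes a 30-day month; the function name is hypothetical.

```python
HOURS_IN_30_DAYS = 30 * 24  # 720 hours


def hours_outside(percent_covered: float,
                  total_hours: float = HOURS_IN_30_DAYS) -> float:
    """Hours per period that fall outside the given coverage percentage."""
    return (100.0 - percent_covered) / 100.0 * total_hours


over_p95 = hours_outside(95.0)     # time demand exceeds a p95-sized resource
sla_budget = hours_outside(99.9)   # violation time a 99.9% SLA tolerates
```

Under these assumptions a p95-sized resource is exceeded for about 36 hours a month, while a 99.9% SLA tolerates well under one hour.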
IQR vs. 5th and 95th Percentiles
IQR excludes “true” outliers
[Figure panels: IQR (Tukey’s method) | 5th and 95th Percentiles]
Shouldn’t we Size Resources for Non-Outliers Instead?
John Tukey’s IQR method
● Why?
○ Normality does not matter
○ 5th & 95th percentiles in this scenario
would have “outlawed” a good part of data
points that are NOT outliers
● Why Not?
○ Multi-Modal distributions
We size for SLA, as long as traffic
stays within outlier boundaries
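For reference, Tukey's outlier boundaries are the fences Q1 - 1.5*IQR and Q3 + 1.5*IQR. The sketch below computes them with Python's standard library; the sample data are made up, and the choice of the "inclusive" quartile method is an assumption, since the slides do not say which quartile convention was used.

```python
from statistics import quantiles


def tukey_fences(values, k: float = 1.5):
    """Return (lower, upper) outlier boundaries: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


traffic = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]  # made-up sample with one spike
lower, upper = tukey_fences(traffic)
outliers = [v for v in traffic if v < lower or v > upper]
```

Only the spike falls outside the fences here; no assumption about the shape of the distribution was needed.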
Tukey’s Boundaries in Trended Data
Tukey’s Method
● Unimodal Distribution:
○ robust boundaries
○ distribution-agnostic
○ can be used to guarantee high QoS
● Multi-Modal Distribution:
○ don’t predict based on all data
○ find natural groupings (GMM, DBSCAN, ...)
○ then fit the model
○ use the higher cluster to guarantee QoS
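For the multi-modal case, the grouping step can be sketched as below. The slide names GMM and DBSCAN; this sketch deliberately swaps in a much simpler stand-in, a one-dimensional two-means split, only to show the shape of the workflow. The function name and the data are hypothetical.

```python
from statistics import mean


def two_means_1d(values, iterations: int = 20):
    """Split values into (low, high) clusters with a 1-D k-means, k = 2.

    A simplified stand-in for GMM or DBSCAN, not a replacement for them.
    """
    lo, hi = min(values), max(values)
    for _ in range(iterations):
        low = [v for v in values if abs(v - lo) <= abs(v - hi)]
        high = [v for v in values if abs(v - lo) > abs(v - hi)]
        if not high:
            break  # all values sit in one group
        new_lo, new_hi = mean(low), mean(high)
        if (new_lo, new_hi) == (lo, hi):
            break  # centers stopped moving
        lo, hi = new_lo, new_hi
    return low, high


load = [10, 11, 12, 9, 10, 50, 52, 49, 51]  # made-up bimodal workload
low_cluster, high_cluster = two_means_1d(load)
```

The model and the outlier boundaries would then be fitted on `high_cluster` alone, in line with the advice to use the higher cluster to guarantee QoS.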
Long Story Short
● Forecast Percentiles
● Find Natural Groupings
● Size For Outlier Boundaries
Knees, Hyperbolae, and Sensitivity
Capacity:
Workload:
In a closed (constrained) system,
sensitivity of throughput to latency
increases with the throughput
In Human Terms
● As throughput increases, latency can only increase.
● As latency increases, throughput in a constrained
queueing system can only decrease.
● As we increase throughput in a constrained system
near its saturation point, its upper percentiles must
grow at a slower pace than lower percentiles.
In Mathematical Terms
Quantile Compression Theorem:
IF raw demand on a constrained system X′ is
moderated via a monotonically increasing damped
function X = f(X′),
THEN, as the system is approaching saturation,
smaller percentiles of moderated demand X grow
on average faster than higher percentiles.
This is only a presentation; for mathematical proof, please see the paper.
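The statement can be explored numerically. The sketch below is not the proof from the paper: it assumes one particular damping function, f(x) = C*x / (C + x), which is monotonically increasing and saturates at a capacity C, and it models growth as a multiplicative scaling of raw demand. All names and numbers are illustrative.

```python
import math


def moderated(raw: float, capacity: float) -> float:
    """Assumed damping function f(x) = C*x / (C + x); increasing, saturates at C."""
    return capacity * raw / (capacity + raw)


def percentile(values, p: float) -> float:
    """Nearest-rank percentile of a list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100.0 * len(ordered)))
    return ordered[rank - 1]


def percentile_growth(raw_sample, p, scale_from, scale_to, capacity=100.0):
    """Change in the p-th percentile of moderated demand as raw demand scales up."""
    before = percentile([moderated(scale_from * x, capacity) for x in raw_sample], p)
    after = percentile([moderated(scale_to * x, capacity) for x in raw_sample], p)
    return after - before


raw_demand = list(range(50, 151))  # made-up raw demand sample, 50..150

# Far from saturation (raw demand well below capacity): p95 grows faster.
light_p50 = percentile_growth(raw_demand, 50, 0.1, 0.2)
light_p95 = percentile_growth(raw_demand, 95, 0.1, 0.2)

# Near saturation (raw demand above capacity): the ordering flips.
heavy_p50 = percentile_growth(raw_demand, 50, 2.0, 3.0)
heavy_p95 = percentile_growth(raw_demand, 95, 2.0, 3.0)
```

With this assumed damping function, the upper percentile grows faster while the system is lightly loaded and slower once raw demand exceeds capacity, which matches the direction of the theorem.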
Long Story Short
“It’s just there...”
-Miles Davis
Quantile Compression:
As the system is approaching saturation, smaller
percentiles of moderated demand X grow on
average faster than higher percentiles.
In Practical Terms: are We Constrained?
Percentile trajectories diverge; we are NOT constrained here.
p5 and p95 trajectories converge; we ARE GETTING constrained here.
Percentile trajectories are almost all parallel; we are almost NOT constrained here.
“It’s always the quiet ones”
In Practical Terms: are We Constrained?
Unconstrained Growth Rates:
p97.5′ > p95′ > p75′ > p50′
Observed Growth Rates (predictions made):
p75′ > p95′ > p50′ > p97.5′
[Figure: lines predicted by p50, p75, p95, and p97.5]
p95 trajectory is growing slower than p50; we ARE constrained here.
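One way to turn this visual check into a test is to compare fitted growth rates of a lower and an upper percentile. The sketch below uses plain least-squares slopes on made-up weekly series; the function names are hypothetical, and the slides do not prescribe this particular comparison.

```python
def ols_slope(ts, ys) -> float:
    """Ordinary least-squares slope of ys against ts."""
    n = len(ts)
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    num = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
    den = sum((t - t_mean) ** 2 for t in ts)
    return num / den


def looks_constrained(ts, lower_series, upper_series) -> bool:
    """True when the upper percentile grows slower than the lower one."""
    return ols_slope(ts, upper_series) < ols_slope(ts, lower_series)


weeks = [0, 1, 2, 3, 4, 5]
p50 = [40, 46, 52, 57, 63, 69]   # made-up weekly p50 of throughput
p95 = [90, 92, 93, 95, 96, 97]   # made-up weekly p95, flattening near a ceiling
constrained = looks_constrained(weeks, p50, p95)
```

A series where the upper percentile pulls away from the median would return False, which corresponds to the diverging, unconstrained picture.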
In Statistical Terms: When Resource is Unconstrained
Unbounded Resource Throughput: Unimodal; Asymmetric; Skew is Constant
In Statistical Terms: When Resource is Constrained
Bounded (Constrained) Resource Throughput: may become Bimodal; Skew may vary
Can we Measure Asymmetry?
Not All Distributions are Easy to Deal With
What if... Mean and Variance are undefined?
Percentiles win!
Long Story Short
When resource is constrained:
1. Distribution changes:
a. becomes left-skewed
b. becomes bimodal
2. Skew is very important
3. Percentile-based Skew is the preferable statistic
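The slides do not spell out which percentile-based skew statistic the authors use. As one well-known example of such a statistic, the sketch below computes Bowley's quartile skewness, (Q3 + Q1 - 2*Q2) / (Q3 - Q1); treat the choice of this measure, the quartile method, and the data as assumptions.

```python
from statistics import quantiles


def quartile_skew(values) -> float:
    """Bowley's quartile skewness, in [-1, 1]; undefined when Q3 equals Q1."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return (q3 + q1 - 2.0 * q2) / (q3 - q1)


right_tailed = [1, 2, 2, 3, 3, 4, 6, 9, 15, 30]   # made-up sample, long right tail
left_tailed = [-v for v in right_tailed]          # mirror image, long left tail

skew_right = quartile_skew(right_tailed)
skew_left = quartile_skew(left_tailed)
```

The measure is positive for the right-tailed sample and negative for its mirror image, and it stays defined even when the mean and variance are not.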
Some More Examples that You May Have Seen Before
Growth Was Constrained
Unconstrained Growth
Right-Skewed (long right tail)
Controlled Growth
Left-Skewed (long left tail)
“None that I know will be, much that I fear may chance”
● Regression:
○ Business Metrics
○ Little's Law
○ Time-related Covariates
● Time Series Analysis (Forecasting):
○ EWMA
○ ARIMA
Is it right to Size Resources Using Upper Percentiles of Bounded Data?
Forecasting demand using bounded data leads to undersizing the resource
Doing so is the path to the dark side.
Resource Constraint =>
Quantile Compression =>
Underforecasting the load =>
Undersizing the resource
Quantile Compression:
As the system is approaching saturation, smaller
percentiles of moderated demand X grow on
average faster than higher percentiles.
Can we infer unbounded lines from bounded data?
[Figure: Skew vs. TimeStamp]
1. Find Skew for Unbounded Data
2. Forecast Upper and Lower Percentiles to the Time Horizon of Interest
3. Infer Unbounded Upper Percentiles (Skew = const)
4. If (unbounded = forecasted) => system is still unbounded
5. If (unbounded > forecasted & forecast > history) => system will be constrained
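The steps above can be sketched in code under one explicit assumption about how skew is measured. Here skew is taken to be the ratio of the upper spread to the lower spread, (p_upper - p50) / (p50 - p25); the presentation does not give its formula, so this ratio, the tolerance, and all names and numbers below are hypothetical.

```python
def spread_ratio(p25: float, p50: float, p_upper: float) -> float:
    """Assumed skew measure: upper spread relative to lower spread."""
    return (p_upper - p50) / (p50 - p25)


def infer_unbounded_upper(p25_fc: float, p50_fc: float, ratio: float) -> float:
    """Upper percentile implied by forecast p25 and p50 if the skew stays constant."""
    return p50_fc + ratio * (p50_fc - p25_fc)


def classify(unbounded: float, forecasted: float, last_observed: float,
             rel_tol: float = 0.02) -> str:
    """Apply steps 4 and 5; rel_tol is a hypothetical tolerance for 'equal'."""
    if abs(unbounded - forecasted) <= rel_tol * unbounded:
        return "still unbounded"
    if unbounded > forecasted and forecasted > last_observed:
        return "will be constrained"
    return "inconclusive"


# Made-up numbers: skew measured on a period known to be unbounded.
ratio = spread_ratio(p25=100.0, p50=150.0, p_upper=300.0)        # 3.0
unbounded_p95 = infer_unbounded_upper(200.0, 300.0, ratio)        # 600.0
verdict = classify(unbounded_p95, forecasted=450.0, last_observed=300.0)
```

In this made-up case the directly forecast p95 (450) falls well short of the would-have-been-unbounded value (600), so the sketch reports that the system will be constrained.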
Throughput Forecasting Algorithm
1. Start: get U(t) data
2. For each timestamp, build hourly boxplots
3. Identify the most appropriate trend type
4. Predict trajectories for the LB (p25) and Median: (LB′, M′) = prediction for Low Bounds and Median
5. Save the forecast
6. Done
Identify the most appropriate trend type
[Figure: Throughput fitted with each candidate trend type]
● LIN: R2 = 0.45
● QUAD: R2 = 0.46
● EXP: R2 = 0.47
● LOG: R2 = 0.34
● PWR: R2 = 0.38
Trend Type Selection
● we know the variance is huge
● we are selecting TREND TYPE
● we are NOT selecting MODEL
A few words about R2
Now we can use T-test
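A minimal sketch of comparing trend types is shown below. It fits the four trend types that can be linearized with logarithms (LIN, LOG, EXP, PWR) by ordinary least squares and scores each with R2 on the original scale; QUAD is left out because it needs a three-parameter fit. The function names and the data are illustrative. When variance is large, R2 values can be nearly tied, as on the slide; the t-test the deck mentions for that situation is not reproduced here.

```python
import math


def ols(xs, ys):
    """Least-squares intercept and slope of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope


def r_squared(ys, fitted):
    """Coefficient of determination computed on the original scale."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def trend_scores(ts, ys):
    """R2 per trend type; requires ts > 0 and ys > 0 because of the logs."""
    log_t = [math.log(t) for t in ts]
    log_y = [math.log(y) for y in ys]
    scores = {}
    a, b = ols(ts, ys)                       # LIN: y = a + b*t
    scores["LIN"] = r_squared(ys, [a + b * t for t in ts])
    a, b = ols(log_t, ys)                    # LOG: y = a + b*ln(t)
    scores["LOG"] = r_squared(ys, [a + b * lt for lt in log_t])
    a, b = ols(ts, log_y)                    # EXP: ln(y) = a + b*t
    scores["EXP"] = r_squared(ys, [math.exp(a + b * t) for t in ts])
    a, b = ols(log_t, log_y)                 # PWR: ln(y) = a + b*ln(t)
    scores["PWR"] = r_squared(ys, [math.exp(a + b * lt) for lt in log_t])
    return scores


hours = list(range(1, 11))
throughput = [2.0 * math.exp(0.3 * t) for t in hours]   # made-up exponential growth
scores = trend_scores(hours, throughput)
best_trend = max(scores, key=scores.get)
```

Only the shape of the trend is being chosen here; the fitted coefficients are discarded, consistent with selecting a trend type rather than a model.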
Long Story Short
Forecasting Algorithm:
1. Compute the Skew
2. Identify the Trend Type
3. Forecast p25 and p50
4. Apply Skew to Compute Upper Percentiles
5. Compute Outlier Boundaries
Use Cases: Unbounded: How Far to the Threshold?
[Figure: Throughput forecast against a Threshold line; axis values 250 and 1000]
Non-Outliers above threshold
Pr {traffic > threshold} > 5%
Another interesting scenario:
Forecasting Resource Congestion Zone
By predicting collision points for different percentiles,
we can get a general idea of a Resource Congestion Zone
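A minimal sketch of this idea, assuming each percentile has already been forecast as a straight line, is shown below. The collision point of a line with the threshold is (threshold - intercept) / slope, and the zone spans the earliest to the latest collision. The function names, the linear form, and the numbers are assumptions made for illustration.

```python
def time_to_threshold(intercept: float, slope: float, threshold: float):
    """Time at which a linear trajectory reaches the threshold, or None if never."""
    if intercept >= threshold:
        return 0.0
    if slope <= 0:
        return None
    return (threshold - intercept) / slope


def congestion_zone(trajectories: dict, threshold: float):
    """(earliest, latest) collision time across percentile trajectories."""
    hits = [time_to_threshold(a, b, threshold) for a, b in trajectories.values()]
    hits = [h for h in hits if h is not None]
    return (min(hits), max(hits)) if hits else None


# Made-up linear forecasts: percentile -> (intercept, slope per day).
forecasts = {"p50": (250.0, 5.0), "p75": (300.0, 7.0), "p95": (400.0, 10.0)}
zone = congestion_zone(forecasts, threshold=1000.0)
```

With these made-up lines the p95 trajectory collides first (day 60) and the median last (day 150), so the congestion zone spans days 60 to 150.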
HAL9000: I've just picked up a fault
in the AE35 unit. It's going to go
100% failure in 72 hours.
Use Cases: Unbounded: How Much to Add?
(unbounded = forecasted) => system is still unbounded
Use Cases: Bounded (Congested): How Much to Add?
(unbounded > forecasted) => system was, and will be, constrained
Use Cases: Bounded (Congested): How Much to Add?
(unbounded > forecasted) => system may have been, and will be, constrained
Long Story Short
● Feedback Loop & Quantile Compression:
○ “It’s just there”:
■ explicitly, via the protocol.
■ implicitly, in the saturation dynamics.
● Do not assume anything!
○ Especially about shapes of distributions.
● Do not forecast p95!
○ Forecast Outlier Boundaries instead.
○ Mean and Variance are overrated!
● Do Size Hardware for the would-have-been-unbounded Forecasts
Alex Gilgur
agilgur@google.com / alexgilgur@gmail.com
+1 (408) 475-7582 / +1 (408) 828-2115
Appendix
“Big 7” of Linearizable Equations
odds log
Is it Right to Size Resources Using Upper Percentiles?
Forecasting Methods:
● EWMA
● ARIMA
● Regression
EWMA models are very specific and computationally fast, but they have to be told the type of trend
(linear or exponential) and the type of seasonality (additive or multiplicative).
An ARIMA model will implicitly account for trends, seasonality, and stationarity of the data.
Autocorrelation of ARIMA residuals reveals any periodicities that have been missed.
For stationary data, use ARIMA
For non-stationary data, use EWMA
EWMA and ARIMA overlap
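For readers unfamiliar with EWMA, the simplest variant can be sketched in a few lines. This version has neither the trend nor the seasonality terms discussed above, so it is only the core smoothing recurrence; the function name and the sample series are illustrative.

```python
def ewma(series, alpha: float):
    """Simple exponential smoothing: s[t] = alpha*x[t] + (1 - alpha)*s[t-1]."""
    smoothed = [float(series[0])]
    for x in series[1:]:
        smoothed.append(alpha * x + (1.0 - alpha) * smoothed[-1])
    return smoothed


levels = ewma([10, 20, 30], alpha=0.5)
one_step_forecast = levels[-1]   # flat forecast; no trend or seasonality terms
```

Because the forecast is flat, this basic form lags behind trending data, which is why the trend and seasonality types have to be supplied to the fuller EWMA models.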
When to use Regression:
● data are monotonic.
● seasonality is NOT statistically significant.
● EWMA and ARIMA fail.
When to use Quantile Regression:
● Upper and Lower bounds behave differently.
● Outliers are possible.
For each data set, we can run a model competition, computing forecast model quality based
on a weighted sum of model goodness of fit, model suitability for forecasting, data stationarity
and data variability, and selecting the model that works best for each data set.
EWMA
ARIMA
Quantile Regression

CMG15 Session 525

  • 1. Views and opinions expressed in this presentation are solely of its authors and do not necessarily represent those of Alphabet, Inc or its subsidiaries, including Google, Inc. Select images and formulae are provided with permission from Google, Inc Percentile-Based Approach To Forecasting Workload Growth Alex Gilgur, Douglas Browning, Stephen Gunn, Xiaojun Di, Wei Chen, and Rajesh Krishnaswamy IT Capacity and Performance 41st International Conference by the Computer Measurement Group (CMG'15) San Antonio, TX November 5, 2015 Session 525
  • 2. How often do you see such patterns? How do you predict them?
  • 3. “What's in a name? That which we call a rose / By any other name /Would smell as sweet” ● Useful Workload Measures: ○ Outlier Boundary ○ Mean + “Z” Standard Deviations (e.g. “3 sigma”, “6 sigma”, etc.) ○ 95th percentile ○ 90th percentile ○ 75th percentile ○ Simple Average (Mean) ○ Median ○ 25th percentile ● Define Workload via Little's Law: ○ Number of units of work in the system ■ Number of packets in flight ■ Number of queries in queue ○ W = X * T
  • 4. ● Workload Forecasting ○ When Classical Methods Fail ○ Workload statistics: ■ “Z-sigma” ■ Quantile Regression ● p95 ● Outlier Boundaries ○ Knees, Hyperbolae, and Sensitivity ○ Quantile Compression ○ Predicting the Workload ○ Use Cases What We Will Talk About
  • 5. “It may be normal, darling; but I'd rather be natural.” Truman Capote, Breakfast at Tiffany’s Usual Assumptions: ● Residuals are Normally distributed ● Mean and StDev of residuals are constant
  • 6. “Double, Double, Toil and Trouble”
  • 8. … Predicting 95th Percentile Directly !
  • 9. Why not Split the Workload into Two Servers? ● Sometimes app1 and app2 have to go to the same server for processing. ● To size the server (VM/Network Link/Storage LUN/… ), we only want to forecast the upper bound. ● Sometimes it’s hard to get additional capacity: ○ budget, justification, approvals, etc. ○ Cloud helps, but... Percentile-based Modeling is often the only solution
  • 10. QuantReg for the 2 types of workload fits very nicely in both cases
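Quantile regression fits a line by minimizing the pinball (quantile) loss instead of squared error. The sketch below is not the authors' fitting code; it only shows the loss and, by brute force over candidate constants, that its minimizer at tau = 0.95 lands in the upper tail of the data rather than at the mean. Function names are illustrative.

```python
def pinball_loss(y_true, y_pred, tau):
    """Mean pinball loss: under-predictions cost tau, over-predictions 1 - tau."""
    total = 0.0
    for y, f in zip(y_true, y_pred):
        diff = y - f
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(y_true)


def best_constant(y, tau):
    """Brute-force the sample value that minimizes the pinball loss."""
    candidates = sorted(set(y))
    return min(candidates, key=lambda c: pinball_loss(y, [c] * len(y), tau))


demand = list(range(1, 101))
p50_fit = best_constant(demand, 0.50)   # lands at the middle of the data
p95_fit = best_constant(demand, 0.95)   # lands in the upper tail
```

A real quantile regression replaces the constant with a trend line and minimizes the same loss over its coefficients.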
  • 11. Should we Size Hardware for the 95th Percentile of Workload? 5% of the time the SLA (99.9%? 99.999%?) will be violated
  • 12. IQR vs. 5th and 95th Percentiles: IQR excludes “true” outliers. Panels: IQR (Tukey’s method); 5th and 95th Percentiles
  • 13. Shouldn’t we Size Resources for Non-Outliers Instead? John Tukey’s IQR method ● Why? ○ Normality does not matter ○ 5th & 95th percentiles in this scenario would have “outlawed” a good part of data points that are NOT outliers ● Why Not? ○ Multi-Modal distributions We size for SLA, as long as traffic stays within outlier boundaries
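Tukey's method places the outlier boundaries at 1.5 × IQR beyond the quartiles, with no assumption about the shape of the distribution. A minimal sketch (function names are illustrative; the multiplier k = 1.5 is Tukey's conventional choice):

```python
import statistics


def tukey_fences(samples, k=1.5):
    """Return (lower, upper) outlier boundaries: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(samples, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def non_outliers(samples, k=1.5):
    """Keep only the points that fall inside Tukey's boundaries."""
    lower, upper = tukey_fences(samples, k)
    return [x for x in samples if lower <= x <= upper]


traffic = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
lower, upper = tukey_fences(traffic)
kept = non_outliers(traffic)   # the spike at 100 is dropped
```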
  • 14. Tukey’s Boundaries in Trended Data
  • 15. Tukey’s Boundaries in Trended Data
  • 16. Tukey’s Boundaries in Trended Data
  • 17. Tukey’s Method ● robust boundaries ● distribution-agnostic ● can be used to guarantee high QoS ○ Unimodal Distribution ○ Multi-Modal Distribution ● Don’t predict based on all data: ○ find natural groupings (GMM, DBSCAN, ...) ○ then fit the model ○ use the higher cluster to guarantee QoS
  • 18. Long Story Short ● Forecast Percentiles ● Find Natural Groupings ● Size For Outlier Boundaries
  • 19. What We Will Talk About Next ● Workload Forecasting ○ When Classical Methods Fail ○ Workload statistics: ■ “Z-sigma” ■ Quantile Regression ● p95 ● Outlier Boundaries ○ Knees, Hyperbolae, and Sensitivity ○ Quantile Compression ○ Predicting the Workload ○ Use Cases
  • 20. Knees, Hyperbolae, and Sensitivity Capacity: Workload: In a closed (constrained) system, sensitivity of throughput to latency increases with the throughput
  • 21. In Human Terms ● As throughput increases, latency can only increase. ● As latency increases, throughput in a constrained queueing system can only decrease. ● As we increase throughput in a constrained system near its saturation point, its upper percentiles must grow at a slower pace than lower percentiles.
  • 22. In Mathematical Terms Quantile Compression Theorem: IF raw demand on a constrained system X′ is moderated via a monotonically increasing damped function X = f(X′), THEN, as the system is approaching saturation, smaller percentiles of moderated demand X grow on average faster than higher percentiles. This is only a presentation; for mathematical proof, please see the paper.
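A numeric illustration of the theorem, not a proof. The damping function X = C·X′ / (C + X′) is an assumption chosen because it is monotonically increasing and saturates at C; the paper may use a different f. Raw demand is scaled by a growth factor, and the sketch compares how much a low and a high percentile of the moderated demand move.

```python
import statistics


def damped(x_raw, capacity):
    """Assumed moderation function: increasing in x_raw, saturating at capacity."""
    return capacity * x_raw / (capacity + x_raw)


def percentile(samples, p):
    """p-th percentile (integer p in 1..99) by linear interpolation."""
    return statistics.quantiles(samples, n=100, method="inclusive")[p - 1]


def percentile_growth(raw, capacity, g0, g1, p):
    """Change in the p-th percentile of moderated demand as raw demand grows g0 -> g1."""
    before = percentile([damped(x * g0, capacity) for x in raw], p)
    after = percentile([damped(x * g1, capacity) for x in raw], p)
    return after - before


raw = list(range(101))  # raw demand profile, arbitrary units

# Near saturation (capacity 100, raw demand up to about 1000):
low_near = percentile_growth(raw, 100.0, 10, 11, 25)
high_near = percentile_growth(raw, 100.0, 10, 11, 95)

# Effectively unconstrained (capacity far above demand):
low_free = percentile_growth(raw, 1e9, 10, 11, 25)
high_free = percentile_growth(raw, 1e9, 10, 11, 95)
```

Under these assumptions p25 grows faster than p95 near saturation, while the ordering is reversed when capacity is far away.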
  • 23. Long Story Short “It’s just there...” -Miles Davis Quantile Compression: As the system is approaching saturation, smaller percentiles of moderated demand X grow on average faster than higher percentiles.
  • 24. In Practical Terms: are We Constrained? Percentile trajectories diverge; we are NOT constrained here.
  • 25. In Practical Terms: are We Constrained? p5 and p95 trajectories converge; we ARE GETTING constrained here.
  • 26. In Practical Terms: are We Constrained? “It’s always the quiet ones” Percentile trajectories are almost all parallel; we are almost NOT constrained here.
  • 27. In Practical Terms: are We Constrained? Unconstrained Growth Rates: p97.5′ > p95′ > p75′ > p50′. p95 trajectory is growing slower than p50; we ARE constrained here.
  • 28. In Practical Terms: are We Constrained? Unconstrained Growth Rates: p97.5′ > p95′ > p75′ > p50′. Predictions made: p75′ > p95′ > p50′ > p97.5′. (Figure legend: lines predicted by p50, p75, p95, and p97.5.) p95 trajectory is growing slower than p50; we ARE constrained here.
  • 29. In Practical Terms: are We Constrained? Unconstrained Growth Rates: p97.5′ > p95′ > p75′ > p50′. Observed Growth Rates: p75′ > p95′ > p50′ > p97.5′. Predictions made: p75′ > p95′ > p50′ > p97.5′. (Figure legend: lines predicted by p50, p75, p95, and p97.5.) p95 trajectory is growing slower than p50; we ARE constrained here.
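The check on these slides can be sketched as a comparison of fitted growth rates: in an unconstrained system, higher percentiles should grow at least as fast as lower ones. The sketch assumes linear trends fitted by ordinary least squares; the function names and the input layout (a dict of percentile to trajectory) are illustrative.

```python
def slope(ts, ys):
    """Ordinary least-squares slope of ys against ts."""
    n = len(ts)
    mean_t = sum(ts) / n
    mean_y = sum(ys) / n
    sxx = sum((t - mean_t) ** 2 for t in ts)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, ys))
    return sxy / sxx


def looks_constrained(ts, trajectories):
    """True if any higher percentile grows slower than the one below it.

    trajectories maps a percentile (e.g. 50, 75, 95) to its values over ts.
    """
    ordered = sorted(trajectories)
    slopes = [slope(ts, trajectories[p]) for p in ordered]
    return any(upper < lower for lower, upper in zip(slopes, slopes[1:]))


ts = [0, 1, 2, 3]
diverging = {50: [10, 12, 14, 16], 95: [20, 24, 28, 32]}
converging = {50: [10, 13, 16, 19], 95: [20, 21, 22, 23]}
```

With noisy data the slope comparison would need a significance test rather than a strict inequality.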
  • 30. In Statistical Terms: When Resource is Unconstrained Unbounded Resource Throughput: Unimodal; Asymmetric; Skew is Constant
  • 31. In Statistical Terms: When Resource is Constrained Bounded (Constrained) Resource Throughput: may become Bimodal; Skew may vary
  • 32. Can we Measure Asymmetry?
  • 33. Not All Distributions are Easy to Deal With What if... ...Mean and Variance are undefined ?
  • 34. Not All Distributions are Easy to Deal With What if... ...Mean and Variance are undefined ?
  • 35. Percentiles win! What if... ...Mean and Variance are undefined ?
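The slides do not name a distribution here; the Cauchy distribution is the textbook case of undefined mean and variance and is used below purely as an illustration. Its percentiles are nevertheless well defined in closed form: Q(p) = x0 + γ·tan(π(p − 0.5)).

```python
import math


def cauchy_quantile(p, location=0.0, scale=1.0):
    """Quantile function of the Cauchy distribution, for 0 < p < 1."""
    return location + scale * math.tan(math.pi * (p - 0.5))


median = cauchy_quantile(0.50)   # equals the location parameter
p75 = cauchy_quantile(0.75)      # location + scale
p95 = cauchy_quantile(0.95)
```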
  • 36. Long Story Short When resource is constrained: 1. Distribution changes: a. becomes left-skewed b. becomes bimodal 2. Skew is very important 3. Percentile-based Skew is the preferable statistic
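The transcript does not give the formula for percentile-based skew. One standard choice, assumed here, is Bowley's quartile skewness, (Q3 + Q1 − 2·Q2) / (Q3 − Q1): it is bounded in [−1, 1], needs no moments, and goes negative when the distribution becomes left-skewed.

```python
import statistics


def quartile_skew(samples):
    """Bowley's quartile skewness (an assumed stand-in for the paper's skew)."""
    q1, q2, q3 = statistics.quantiles(samples, n=4, method="inclusive")
    return (q3 + q1 - 2.0 * q2) / (q3 - q1)


symmetric = quartile_skew(list(range(101)))
right_skewed = quartile_skew([1, 2, 3, 5, 9])   # long upper tail
left_skewed = quartile_skew([1, 5, 7, 8, 9])    # upper values squeezed together
```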
  • 37. Some More Examples that you May Have Seen Before
  • 41. What We Will Talk About Next ● Workload Forecasting ○ When Linear Regression fails ○ Workload statistics: ■ “Z-sigma” ■ Quantile Regression ● p95 ● Outlier Boundaries ○ Knees, Hyperbolae, and Sensitivity ○ Quantile Compression ○ Predicting the Workload ○ Use Cases
  • 42. “None that I know will be, much that I fear may chance” ● Regression: ○ Business Metrics ○ Little's Law ○ Time-related Covariates ● Time Series Analysis (Forecasting): ○ EWMA ○ ARIMA
  • 43. Is it right to Size Resources Using Upper Percentiles of Bounded Data? Forecasting demand using bounded data leads to undersizing the resource Doing so is the path to the dark side. Resource Constraint => Quantile Compression => Underforecasting the load => Undersizing the resource Quantile Compression: As the system is approaching saturation, smaller percentiles of moderated demand X grow on average faster than higher percentiles.
  • 44. Can we infer unbounded lines from bounded data? (Figure: Skew vs. TimeStamp.) 1. Find Skew for Unbounded Data 2. Forecast Upper and Lower Percentiles to the Time Horizon of Interest 3. Infer Unbounded Upper Percentiles (Skew = const) 4. If (unbounded = forecasted) => system is still unbounded 5. If (unbounded > forecasted & forecast > history) => system will be constrained
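Steps 3 to 5 can be sketched as follows, under two loudly stated assumptions: skew is Bowley's quartile skewness (the presentation's own definition is not in the transcript), and the "upper percentile" is Q3 rather than p95. With skew B held constant, Q3 follows from the forecast Q1 and Q2 as Q3 = (2·Q2 − (1 + B)·Q1) / (1 − B). The tolerance used for "unbounded = forecasted" is also an illustrative choice.

```python
import math


def infer_unbounded_q3(q1_forecast, q2_forecast, skew):
    """Upper quartile implied by forecast Q1 and Q2 if skew stays constant."""
    if skew >= 1.0:
        raise ValueError("quartile skew must be below 1")
    return (2.0 * q2_forecast - (1.0 + skew) * q1_forecast) / (1.0 - skew)


def classify(q3_unbounded, q3_forecast, q3_history, rel_tol=0.02):
    """Apply steps 4 and 5 of the slide to one forecast horizon."""
    if math.isclose(q3_unbounded, q3_forecast, rel_tol=rel_tol):
        return "still unbounded"
    if q3_unbounded > q3_forecast and q3_forecast > q3_history:
        return "will be constrained"
    return "undetermined"


# Unbounded history: Q1 = 20, Q2 = 30, Q3 = 50 gives skew 1/3.
skew = (50 + 20 - 2 * 30) / (50 - 20)
q3_if_unbounded = infer_unbounded_q3(40.0, 60.0, skew)
verdict = classify(q3_if_unbounded, 70.0, 50.0)
```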
  • 45. Throughput Forecasting Algorithm (flowchart): Start => Get U(t) => Build hourly boxplots data => For each timestamp: identify the most appropriate trend type; predict trajectories for the LB (p25) and Median, where (LB′, M′) = prediction for Low Bound and Median; save the forecast => Done
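The "hourly boxplots data" step can be sketched as grouping raw throughput samples by hour and keeping the quartiles of each bucket. The input layout (epoch seconds paired with a throughput value) and the function name are assumptions for illustration.

```python
import statistics
from collections import defaultdict


def hourly_boxplot_stats(samples):
    """Group (epoch_seconds, throughput) pairs by hour.

    Returns {hour_start_epoch: (p25, median, p75)}; hours with fewer than
    two samples are skipped because quartiles are not defined for them.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        hour_start = int(ts) // 3600 * 3600
        buckets[hour_start].append(value)

    stats = {}
    for hour_start, values in sorted(buckets.items()):
        if len(values) < 2:
            continue
        stats[hour_start] = tuple(
            statistics.quantiles(values, n=4, method="inclusive")
        )
    return stats


samples = [(0, 1), (600, 2), (1200, 3), (1800, 4), (2400, 5),
           (3600, 10), (4200, 20), (4800, 30)]
boxes = hourly_boxplot_stats(samples)
```

The p25 and median series from this table are what the later steps trend and forecast.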
  • 46. Identify the most appropriate trend type. Trend Type Selection ● we know the variance is huge ● we are selecting TREND TYPE ● we are NOT selecting MODEL (Figure: throughput trend fits LIN, LOG, EXP, QUAD, PWR; R² values range from 0.34 to 0.47.)
  • 47. A few words about R²: now we can use a t-test
  • 48. A few words about R²: now we can use a t-test. Throughput fits: LIN R² = 0.45; QUAD R² = 0.46; EXP R² = 0.47; LOG R² = 0.34; PWR R² = 0.38
  • 49. Identify the most appropriate trend type. Trend Type Selection ● we know the variance is huge ● we are selecting TREND TYPE ● we are NOT selecting MODEL. MODELS: “LIN”, “PWR”, “EXP”, “LOG”, “QUAD” (Figure: Throughput, LIN fit, R² = 0.45.)
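A sketch of trend type selection by comparing R² across linearizable forms. It covers LIN, LOG, EXP, and PWR, each fitted by ordinary least squares after the usual log transform; QUAD is left out to keep the example short. R² is computed on the original scale so that the candidates are comparable. This is an illustration under those assumptions, not the authors' selection code (which, per the slides, also involves a t-test).

```python
import math


def _ols(xs, ys):
    """Return (intercept, slope) of the least-squares line ys ~ xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def _r2(ys, fitted):
    """Coefficient of determination on the original scale."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def trend_r2(ts, ys):
    """R² per trend type; requires strictly positive ts and ys."""
    log_t = [math.log(t) for t in ts]
    log_y = [math.log(y) for y in ys]
    scores = {}
    a, b = _ols(ts, ys)
    scores["LIN"] = _r2(ys, [a + b * t for t in ts])
    a, b = _ols(log_t, ys)
    scores["LOG"] = _r2(ys, [a + b * lt for lt in log_t])
    a, b = _ols(ts, log_y)
    scores["EXP"] = _r2(ys, [math.exp(a + b * t) for t in ts])
    a, b = _ols(log_t, log_y)
    scores["PWR"] = _r2(ys, [math.exp(a + b * lt) for lt in log_t])
    return scores


def best_trend(ts, ys):
    """Name of the trend type with the highest R²."""
    scores = trend_r2(ts, ys)
    return max(scores, key=scores.get)


ts = list(range(1, 21))
trend = best_trend(ts, [2.0 * math.exp(0.2 * t) for t in ts])
```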
  • 50. Long Story Short Forecasting Algorithm: 1. Compute the Skew 2. Identify the Trend Type 3. Forecast p25 and p50 4. Apply Skew to Compute Upper Percentiles 5. Compute Outlier Boundaries
  • 51. What We Will Talk About Next ● Workload Forecasting ○ When Classical Methods Fail ○ Workload statistics: ■ “Z-sigma” ■ Quantile Regression ● p95 ● Outlier Boundaries ○ Knees, Hyperbolae, and Sensitivity ○ Quantile Compression ○ Predicting the Workload ○ Use Cases
  • 52. Use Cases: Unbounded: How Far to the Threshold? Non-outliers above threshold: Pr{traffic > threshold} > 5%. (Figure: throughput plots with threshold; labels 1000 and 250.)
  • 53. Another interesting scenario: Forecasting Resource Congestion Zone
  • 54. Forecasting Resource Congestion Zone By predicting collision points for different percentiles, we can get a general idea of a Resource Congestion Zone HAL9000: I've just picked up a fault in the AE35 unit. It's going to go 100% failure in 72 hours.
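A sketch of the collision-point idea, assuming each percentile has already been forecast as a straight line a + b·t (other trend types would need their own inverse). The earliest and latest crossing times across percentiles bracket the congestion zone. Names and the input layout are illustrative.

```python
def crossing_time(intercept, slope, threshold):
    """Time at which intercept + slope * t reaches threshold.

    Returns 0.0 if the line already starts at or above the threshold and
    None if it never gets there.
    """
    if intercept >= threshold:
        return 0.0
    if slope <= 0:
        return None
    return (threshold - intercept) / slope


def congestion_zone(lines, threshold):
    """(earliest, latest) crossing time over all percentile lines, or None.

    lines maps a percentile label to its (intercept, slope) forecast.
    """
    times = [crossing_time(a, b, threshold) for a, b in lines.values()]
    times = [t for t in times if t is not None]
    if not times:
        return None
    return min(times), max(times)


forecast_lines = {"p50": (100.0, 10.0), "p75": (200.0, 20.0), "p95": (400.0, 30.0)}
zone = congestion_zone(forecast_lines, threshold=1000.0)
```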
  • 55. Use Cases: Unbounded: How Much to Add? (unbounded = forecasted) => system is still unbounded
  • 56. Use Cases: Bounded (Congested): How Much to Add? (unbounded > forecasted) => system was, and will be, constrained
  • 57. Use Cases: Bounded (Congested): How Much to Add? (unbounded > forecasted) => system may have been, and will be, constrained
  • 58. Long Story Short ● Feedback Loop & Quantile Compression: ○ “It’s just there”: ■ explicitly, via the protocol. ■ implicitly, in the saturation dynamics. ● Do not assume anything! ○ Especially about shapes of distributions. ● Do not forecast p95! ○ Forecast Outlier Boundaries instead. ○ Mean and Variance are overrated! ● Do Size Hardware for the would-have-been-unbounded Forecasts
  • 59. Alex Gilgur agilgur@google.com / alexgilgur@gmail.com +1 (408) 475-7582 / +1 (408) 828-2115
  • 61. “Big 7” of Linearizable Equations odds log
  • 62. Is it Right to Size Resources Using Upper Percentiles? Quantile Compression: As the system is approaching saturation, smaller percentiles of moderated demand X grow on average faster than higher percentiles.
  • 63. Is it Right to Size Resources Using Upper Percentiles?
  • 64. Forecasting Methods: ● EWMA ● ARIMA ● Regression. EWMA models are very specific and computationally fast, but they have to be told the trend (linear or exponential) and the seasonality (additive or multiplicative). An ARIMA model implicitly accounts for trends, seasonality, and stationarity of the data; autocorrelation of ARIMA residuals provides all the periodicities that have been missed. For stationary data, use ARIMA. For non-stationary data, use EWMA. EWMA and ARIMA overlap. When to use Regression: ● data are monotonic. ● seasonality is NOT statistically significant. ● EWMA and ARIMA fail. When to use Quantile Regression: ● Upper and Lower bounds behave differently. ● Outliers are possible. For each data set, we can run a model competition: compute forecast model quality as a weighted sum of model goodness of fit, model suitability for forecasting, data stationarity, and data variability, then select the model that works best for that data set. (Figure panels: EWMA; ARIMA; Quantile Regression.)
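To make "EWMA has to be told the trend" concrete, here is a sketch of Holt's linear method (double exponential smoothing), one common EWMA variant with an explicit additive linear trend and no seasonality. The smoothing constants alpha and beta are illustrative defaults, not values from the presentation.

```python
def holt_forecast(series, alpha=0.5, beta=0.3, horizon=1):
    """Forecast `horizon` steps ahead with Holt's linear-trend smoothing.

    alpha smooths the level, beta smooths the trend; both lie in (0, 1].
    The series must contain at least two points.
    """
    level = series[0]
    trend = series[1] - series[0]
    for y in series[1:]:
        previous_level = level
        # Blend the new observation with the one-step-ahead prediction.
        level = alpha * y + (1.0 - alpha) * (level + trend)
        # Blend the latest level change with the running trend estimate.
        trend = beta * (level - previous_level) + (1.0 - beta) * trend
    return [level + (h + 1) * trend for h in range(horizon)]


next_two = holt_forecast([10, 12, 14, 16, 18], horizon=2)
```

In a model competition like the one described on the slide, a forecast such as this would be scored against ARIMA and regression candidates on held-out data.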