Different Approaches for Different Tasks
CMG imPACt 2016
42nd International Conference by Computer Measurement Group
Alexander Gilgur, Steve Politis
Session 361
November 08, 2016
La Jolla, CA, USA
Steve Politis
Josue Kuri
Alex Nikolaidis
Grace Smith
Yuri Smirnov
Tyler Price
Paul Sorenson
For contributing knowledge, ideas, solutions, and support
What’s the difference between Performance and Capacity?
• Two languages in the IT world
• Need tools, metrics, and stats compatible with both languages
• Need fluency in both languages
• Metrics:
  • Time (latency)
  • Rate
  • Count:
    • Packets in flight
    • Packet loss
  • A few words about % Utilization
• Models:
  • Correlations
  • Trends in Data
  • Time-Series Analysis
• Approach:
  • Top-Down?
  • Bottom-Up?
  • Hybrid?
• Measures and Aggregations:
  • $\mu + Z \cdot \sigma$
  • Busy Hour/Peak Minute
  • Nonparametric Measures:
    • P95
    • Outlier Boundaries
For Capacity Planning:
• Uncertainty: “Redistribution of wealth”
• Distribution Looks Gaussian
• Washout of local anomalies
For Performance Analysis:
• “Big Picture”
• Immediate impact assessment
• Drilldown is easier than aggregation:
  • No need to worry about which aggregation function to choose
For Performance Analysis:
• Immediate anomaly detection
• Trend identification
• Practical Significance is unknown
For Capacity Planning:
• “Just the right” bandwidth
• Actual distributions & trends
• Time-consuming
• Aggregation can get complicated
• Aggregate A-Z Pairs (“Flows”) for each service (product)
• Aggregate services (products) for each A-Z Pair
For Infra Performance Analysis:
• Cannot tell where the “hot” issues are
For Capacity Planning:
• Can tell how much each svc (product) needs
For Infra Performance Analysis:
• Will ID “hot” flows (A-Z Pairs)
For Capacity Planning:
• Will not find “hot” services (products)
$[\mu - Z \cdot \sigma,\ \mu + Z \cdot \sigma]$
• The $Z$ is arbitrary
• Assumptions about the distribution:
  • Mean and Variance defined
  • Gaussian (Symmetrical)
  • Stationary
  • No outliers
• Simple math:
  • Addition
  • Regression
  • TSA Forecasting
$[\mu - Z \cdot \sigma,\ \mu + Z \cdot \sigma]$
$[\mu - Z_1 \cdot \sigma,\ \mu + Z_2 \cdot \sigma]$
Sampling:
With enough random samples, their means will be Gaussian (Central Limit Theorem)
Data Transformation:
• log
• exp
• Box-Cox
[Figure panels: For Capacity Planning; For Monitoring]
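As a rough illustration of why the Gaussian assumption matters, the sketch below computes the $[\mu - Z \cdot \sigma,\ \mu + Z \cdot \sigma]$ interval on right-skewed data, first on the raw values and then after a log transform. The lognormal sample, the seed, and the helper name `mu_z_sigma_interval` are assumptions made for this example; none of them come from the talk.

```python
import math
import random
from statistics import NormalDist, mean, stdev

def mu_z_sigma_interval(values, coverage=0.95):
    """Return (mu - Z*sigma, mu + Z*sigma), with Z set by the two-sided normal coverage."""
    z = NormalDist().inv_cdf(0.5 + coverage / 2)
    mu, sigma = mean(values), stdev(values)
    return mu - z * sigma, mu + z * sigma

random.seed(7)
# Hypothetical right-skewed demand samples (lognormal), for illustration only.
demand = [random.lognormvariate(0.0, 1.0) for _ in range(5000)]

# Interval computed directly on the skewed data.
raw_lo, raw_hi = mu_z_sigma_interval(demand)

# Interval computed on the log scale, then transformed back with exp.
log_lo, log_hi = (math.exp(b) for b in mu_z_sigma_interval([math.log(v) for v in demand]))

# Share of observations that fall inside the back-transformed interval.
inside = sum(log_lo <= v <= log_hi for v in demand) / len(demand)

print(f"raw interval: [{raw_lo:.2f}, {raw_hi:.2f}]")
print(f"log interval: [{log_lo:.2f}, {log_hi:.2f}], share inside = {inside:.3f}")
```

In this sample the raw lower bound falls below zero, which is meaningless for traffic, while the back-transformed interval stays positive and covers close to the nominal share of points.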
Busy Hour / Peak Minute
Losing information:
• Can’t identify the day’s outliers.
• Need 3+ weeks of daily peaks to measure p95.
For Performance Monitoring
For Capacity Planning
• Signal : noise → 0.
• Top-Down forecast is hard to interpret.
• Misses underlying services.
• Hides trends.
If used as an aggregated measure (Top-Down):
• Accurate representation of Multiplexing.
• Drilldown answers the “Who is hit the most?” question.
• We size for busy-hour traffic.
• Snapshot of service distribution:
  • Great for Top-Down approach.
Nonparametric: p95
• Sensitive to Aggregation Level:
  • $\sum_i p95(x_i) \neq p95\left(\sum_i x_i\right)$.
• Need the 20+ latest data points ($p95 \equiv \frac{1}{20}$: one point in 20 lies above it)
• How many of these are outliers?
For Performance Monitoring For Capacity Planning
• Summation may lead to oversizing
• Ignores the bulk of the distribution
• We will miss SLA 5% of the time
• Forecasting p95: ergodicity assumption
• Distribution shape does not matter
• OK to have outliers
• Only 5% of data points will cause alerts
• Easy to understand
• “Tradition!”
• Math implemented in R, Python, Matlab, SAS
  • even for regression
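The sensitivity of p95 to the aggregation level can be checked numerically. The sketch below compares the sum of per-flow p95 values with the p95 of the summed traffic; the three exponential flows, their independence, and the seed are assumptions chosen only for illustration.

```python
import random
from statistics import quantiles

def p95(values):
    """95th percentile: the last of the 19 cut points that split the data into 20 groups."""
    return quantiles(values, n=20)[-1]

random.seed(42)
# Three hypothetical, independent flows sampled at the same 2000 instants.
flows = [[random.expovariate(1.0) for _ in range(2000)] for _ in range(3)]

# Option A: compute p95 per flow, then add the percentiles.
sum_of_p95 = sum(p95(flow) for flow in flows)

# Option B: add the flows sample by sample, then compute p95 of the total.
total = [sum(samples) for samples in zip(*flows)]
p95_of_sum = p95(total)

print(f"sum of p95 = {sum_of_p95:.2f}, p95 of sum = {p95_of_sum:.2f}")
```

For independent flows like these, adding percentiles overstates the p95 of the aggregate, which is the oversizing risk listed above; strongly correlated flows would narrow the gap.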
Should we Size Hardware for $p95$?
5% of the time SLO (99.9%? 99.999%?) will be violated
Shouldn’t we Size Resources for Non-Outliers Instead?
We size for SLA, as long as traffic stays within outlier boundaries
John Tukey’s IQR method:
$IQR = p75 - p25$
$LowerBound = p25 - \beta \cdot IQR$
$UpperBound = p75 + \beta \cdot IQR$
• $p95$ will “outlaw” NON-outliers
  • IFF $p95 < UpperBound$
• Need a “smart” $\beta$
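A minimal sketch of the IQR fences follows. The value $\beta = 1.5$ is Tukey's conventional default and is used here only as an assumption, since the slide asks for a "smart" $\beta$ without giving one; the sample data are hypothetical.

```python
from statistics import quantiles

def tukey_fences(values, beta=1.5):
    """Return (LowerBound, UpperBound) = (p25 - beta*IQR, p75 + beta*IQR)."""
    p25, _, p75 = quantiles(values, n=4)
    iqr = p75 - p25
    return p25 - beta * iqr, p75 + beta * iqr

# Hypothetical utilization samples with one obvious spike.
samples = list(range(1, 21)) + [100]

lower_bound, upper_bound = tukey_fences(samples)
outliers = [v for v in samples if v < lower_bound or v > upper_bound]

print(f"fences = [{lower_bound}, {upper_bound}], outliers = {outliers}")
```

Only the spike lands outside the fences here, so sizing for the upper fence would cover every non-outlier observation in this sample.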
Nonparametric: Outlier Boundaries
• Sensitive to Aggregation Level
• How many of these are real outliers?
For Performance Monitoring For Capacity Planning
• Summation may lead to oversizing
• Ergodicity assumption
• We size for non-outliers
• We guarantee SLA
• Math implemented (R, Python, Matlab, SAS)
  • even for regression
• It looks at the bulk of the distribution.
• We will NOT miss SLA 5% of the time.
• Distribution shape does not matter
• Need fewer data points than for p95
• Only respond to outliers
A Word of Caution: Outlier Boundaries
• $Skew = 0.13$: $p95 < UpperBound$. Use $UpperBound$.
• $Skew = 0.26$: $p95 > UpperBound$. Use $p95$? Split into HI and BULK?
• There is no “one size fits all” approach.
• There is no “one size fits all” statistic.
• There are common principles:
  • Aggregate before computing percentiles.
  • Use Outlier Boundaries:
    • Performance & Capacity:
      • Accounts for the bulk of the data;
      • Distribution does not matter;
    • Performance:
      • Easy to ID outliers;
    • Capacity:
      • Sizing for Non-Outliers => less $
  • Avoid Predefined Percentiles.
• Metrics:
  • Time (latency)
  • Rate
  • Count:
    • Packets in flight; Packet Loss; % Utilization
• Models:
  • Correlations
  • Trends in Data
Coming Next
User Metrics:
• throughput
• latency
• data loss
• data loss & latency
• latency & data loss
The Whirlpool of Metrics
For Monitoring
Real metrics:
• # of Packets in Flight
• # of Packets in Queue or Lost
For Capacity Planning
Traditional Metrics in Planning:
• $Throughput$ [Gbps]
• $Latency_{packet}$
• $Latency_{page} = loading\ ||\ reassembly$
• Packets get queued and blocked.
• Bits may be bursty while packets are smooth.
• The reverse is true as well.
• $Latency_{packet} = transport + queueing$
• Packets need capacity.
• Packet sizes vary => capacity [Gbps]
Example:
$320\ Gbps = 26.7M \cdot \frac{1.5\ kB \cdot 8\ bits}{sec}$
$320\ Gbps = 10 \cdot \frac{4\ GB \cdot 8\ bits}{sec}$
Traditional Metrics for Monitoring:
• % Utilization
• Packet Loss Rate
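The 320 Gbps example above can be reproduced with a few lines of arithmetic. The helper name `bits_per_second` is made up for this sketch, and the large transfer size is read as decimal gigabytes (4e9 bytes), which is an assumption; read as binary gibibytes, the second product would come to about 343.6 Gbps.

```python
def bits_per_second(units_per_second, bytes_per_unit):
    """Bit rate produced by `units_per_second` items of `bytes_per_unit` bytes each."""
    return units_per_second * bytes_per_unit * 8

# 26.7 million packets per second, 1.5 kB each.
small_packets_bps = bits_per_second(26.7e6, 1_500)

# 10 transfers per second, 4 GB each (decimal gigabytes assumed).
large_transfers_bps = bits_per_second(10, 4e9)

print(f"small packets:   {small_packets_bps / 1e9:.1f} Gbps")
print(f"large transfers: {large_transfers_bps / 1e9:.1f} Gbps")
```

The same bit rate therefore corresponds to packet counts that differ by more than six orders of magnitude, which is why a Gbps figure alone says little about queueing and loss.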
Time, Rate, Count, and Utilization
Packet Queueing:
$P_q = ErlangC(N, C)$
Packet Blocking:
$P_b = ErlangB(N, C)$
2012 paper
$N = PPS \cdot Latency$
$Latency = \frac{1}{2} \cdot RTT + T_{buff}$
$bps = PPS \cdot \frac{bits}{packet}$
$C = \max\left(PPS \cdot \frac{1}{2} \cdot RTT\right)$
2006 paper
(CPU centric)
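The two Erlang formulas can be evaluated with the standard recurrence for Erlang B and the usual conversion to Erlang C. This is a generic textbook sketch, not the implementation from the cited papers; treating $N$ as the offered load in packets in flight and $C$ as an integer number of packet slots is an assumption, and the traffic numbers are hypothetical.

```python
def erlang_b(offered_load, servers):
    """Blocking probability via the recurrence B(0) = 1, B(k) = A*B(k-1) / (k + A*B(k-1))."""
    b = 1.0
    for k in range(1, servers + 1):
        b = offered_load * b / (k + offered_load * b)
    return b

def erlang_c(offered_load, servers):
    """Queueing probability; a load at or above capacity always queues."""
    if offered_load >= servers:
        return 1.0
    b = erlang_b(offered_load, servers)
    return servers * b / (servers - offered_load * (1.0 - b))

# Hypothetical link: 2000 packets/s, 10 ms latency, room for 30 packets in flight.
pps, latency_s, capacity = 2_000, 0.010, 30
in_flight = pps * latency_s          # N = PPS * Latency

p_block = erlang_b(in_flight, capacity)
p_queue = erlang_c(in_flight, capacity)
print(f"N = {in_flight:.0f}, P_b = {p_block:.4f}, P_q = {p_queue:.4f}")
```

The recurrence avoids the factorials of the closed-form expression, so it stays numerically stable for large $C$.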
• Utilization CAN BE useless
  • If the metric does not reflect what it is used for.
Links were utilized near 100% [$\frac{bps}{bps}$] but no packet drops
• There is no “one size fits all” approach.
• There is no “one size fits all” statistic.
• There are common principles:
  • Aggregate before computing percentiles.
  • Use a statistic that accounts for the bulk of the data.
  • Metric = what’s important to the BW user:
    • Quality of Service (QoS):
      • Network Latency
      • Packets lost
• Models:
  • Trend in Data
  • Correlation
  • Time-Series Analysis
Coming Next
Trend in Data
Performance Monitoring:
• Is this “normal” behavior?
• Will this trend continue?
• High values will be marked as outliers
  • Are they?
Dealing with Trends
Option 1:
• Fit a linear regression
• If it’s a good fit:
  • Get distribution of residuals
  • Add p95 or outlier boundary of residuals to regression line
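Option 1 can be sketched as follows: fit a least-squares line, take the residuals, and shift the line up by the p95 of the residuals to get a trend-aware alert threshold. The synthetic series (linear growth plus stationary Gaussian noise), the seed, and the helper names are assumptions for this example.

```python
import random
from statistics import mean, quantiles

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

random.seed(1)
days = list(range(200))
# Hypothetical demand: linear growth plus stationary Gaussian noise.
demand = [100 + 0.5 * d + random.gauss(0, 5) for d in days]

intercept, slope = fit_line(days, demand)
residuals = [y - (intercept + slope * x) for x, y in zip(days, demand)]
residual_p95 = quantiles(residuals, n=20)[-1]

def threshold(day):
    """Regression line shifted up by the p95 of the residuals."""
    return intercept + slope * day + residual_p95

alerts = sum(y > threshold(x) for x, y in zip(days, demand))
print(f"slope = {slope:.3f}, residual p95 = {residual_p95:.2f}, alerts = {alerts}")
```

Because the threshold follows the trend, late high values are not flagged merely for being higher than early ones; swapping `residual_p95` for an upper Tukey fence of the residuals gives the outlier-boundary variant.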
Performance Monitoring:
This results in:
Option 1:
Linear Regression -> Residuals allows us to detrend the data and deal with a stationary proxy…
… IFF:
• Residuals are stationary
• Residuals are normal
• Residuals are homoscedastic
Performance Monitoring
If Residuals are Not Normal / Not Homoscedastic?
Linear Regression does not work
Performance Monitoring:
Plan B: Directly Predict %-iles
Option 2:
Quantile Regression:
rq(Demand ~ Time)
Performance Monitoring:
• Requires stationary trends
• No need for homoscedasticity
• No need for normality
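The `rq` call on the slide appears to be R's quantile-regression function. As a dependency-free stand-in, the sketch below fits a single-predictor quantile line by brute force: it tries every line through two data points and keeps the one with the smallest pinball (check) loss, relying on the fact that a minimizer of this loss can be found among such lines. The function names, the heteroscedastic test data, and the seed are assumptions for this example; for real workloads a proper solver is the better choice.

```python
import random
from itertools import combinations

def pinball_loss(residuals, tau):
    """Quantile loss: under-predictions weigh tau, over-predictions weigh 1 - tau."""
    return sum(tau * r if r >= 0 else (tau - 1) * r for r in residuals)

def quantile_line(xs, ys, tau):
    """Brute-force linear quantile fit over all lines through two data points."""
    best = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # vertical line, not a function of x
        slope = (ys[j] - ys[i]) / (xs[j] - xs[i])
        intercept = ys[i] - slope * xs[i]
        loss = pinball_loss([y - (intercept + slope * x) for x, y in zip(xs, ys)], tau)
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    return best[1], best[2]

random.seed(3)
hours = list(range(80))
# Hypothetical demand whose spread grows with time (heteroscedastic noise).
demand = [50 + 0.8 * t + random.gauss(0, 2 + 0.1 * t) for t in hours]

intercept, slope = quantile_line(hours, demand, tau=0.95)
share_below = sum(
    y <= intercept + slope * x + 1e-9 for x, y in zip(hours, demand)
) / len(hours)
print(f"p95 line: {intercept:.2f} + {slope:.3f} * t, share at or below = {share_below:.3f}")
```

The fit needs no normality or constant-variance assumption: the line is placed so that roughly 95% of the points sit at or below it, whatever the shape of the noise.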
Using Regression
1. Build regression for $KPI = f(BW)$;
2. TSA-forecast $Residuals$;
3. Predict $KPI(BW|_{t^*})$;
4. Combine with forecast of $Residuals$
Capacity Planning:
Problems:
$Residuals = f(time)$
$Intercept = f(time)$
Using Quantile Regression
Capacity Planning:
Great for most use cases
• No need for homoscedasticity
• No need for normality
• Works for correlated Time Series
• Requires stationary trends
• Old and New have same weight
Quantile Regression in Performance Monitoring
Using Quantile Regression: Performance Monitoring
Compare Model Prediction Ranges for models built on Baseline and New data sets.
Quantify the change:
Baseline data
Data Shown Here are Generated Exclusively for this Hypothetical Example
Performance Monitoring: Look for Quantile Compression
Congestion Detection and Prediction
Time Series Analysis
Using Time Series Analysis
ETS (Error, Trend, Seasonality) Decomposition
For Performance Monitoring:
1. Fit a Forecasting (ETS) Model
2. Get residuals
3. Identify & Interpret Outliers in Residuals
4. Interpolate (or Predict) Outliers
5. Re-fit Forecasting Model
6. Predict Using the Fitted model
For Capacity Planning:
TSA Forecasting
Autoregressive (ARIMA) || ETS (EWMA) Forecasting
1. Fit a Forecasting Model
2. Get residuals
3. Identify Outliers
4. Interpolate (or Predict) Outliers
5. Re-fit Forecasting Model
6. Predict Using the Fitted model
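The six steps above can be sketched with simple exponential smoothing standing in for a full ETS or ARIMA model. That substitution, the Tukey fences used to flag residual outliers, the smoothing constant, and the test series are all assumptions for this example; a production model would also handle trend and seasonality.

```python
from statistics import quantiles

def ses_fit(series, alpha=0.3):
    """Simple exponential smoothing; returns one-step-ahead predictions and the final level."""
    level, preds = series[0], []
    for value in series:
        preds.append(level)
        level = alpha * value + (1 - alpha) * level
    return preds, level

def interpolate(series, bad):
    """Replace flagged points by linear interpolation between the nearest unflagged neighbours."""
    good = [i for i in range(len(series)) if i not in bad]
    cleaned = list(series)
    for i in bad:
        left = max((g for g in good if g < i), default=None)
        right = min((g for g in good if g > i), default=None)
        if left is None:
            cleaned[i] = series[right]
        elif right is None:
            cleaned[i] = series[left]
        else:
            w = (i - left) / (right - left)
            cleaned[i] = (1 - w) * series[left] + w * series[right]
    return cleaned

def clean_and_forecast(series, alpha=0.3, beta=1.5):
    preds, _ = ses_fit(series, alpha)                        # 1. fit
    residuals = [y - p for y, p in zip(series, preds)]       # 2. residuals
    q1, _, q3 = quantiles(residuals, n=4)
    lo, hi = q1 - beta * (q3 - q1), q3 + beta * (q3 - q1)
    outliers = [i for i, r in enumerate(residuals) if r < lo or r > hi]  # 3. identify
    cleaned = interpolate(series, outliers)                  # 4. interpolate
    _, level = ses_fit(cleaned, alpha)                       # 5. re-fit
    return level, outliers                                   # 6. flat forecast from the final level

series = [10.0] * 30
series[15] = 100.0   # hypothetical one-off spike
forecast, outliers = clean_and_forecast(series)
naive_forecast = ses_fit(series)[1]
print(f"cleaned forecast = {forecast:.2f}, naive forecast = {naive_forecast:.2f}")
```

Note that the spike also distorts the predictions that follow it, so a few later points are flagged as well; interpolating across the whole flagged run removes that echo before the re-fit.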
Issues / Problems / Challenges
How can we account for these variabilities?
• Underlying services have their own plans:
  • Growth
  • Deprecation
  • Relocation
• Supporting infrastructure has its own lifecycle:
  • New Product Introduction
  • Implementation and Growth
  • Depreciation
  • Tech Refresh
• Topologies and policies change in time
• Change in policies and topology can lead to changes in demand
Possible Solutions:
Flow Level
1. Bottom-Up:
• Forecast each service individually;
• Follow up with Monte-Carlo aggregation
[Figure labels: Svc1, Svc2, Svc3, Flow1]
Possible Problem:
Different prediction intervals not indicative of different data variability
Advantages:
• Each service’s trend and variability is accounted for.
• Each service’s growth plans are easy to account for.
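A minimal sketch of the Monte-Carlo aggregation step in the Bottom-Up approach: draw one value from each service's forecast distribution, add the draws, repeat, and read the flow-level percentile off the simulated totals. The Gaussian forecast distributions, their independence, the parameter values, and the helper name are assumptions for this example.

```python
import random
from statistics import quantiles

def monte_carlo_total(service_forecasts, draws=10_000, seed=0):
    """service_forecasts: list of (mean, stdev) per service; returns simulated flow totals."""
    rng = random.Random(seed)
    return [
        sum(rng.gauss(mu, sigma) for mu, sigma in service_forecasts)
        for _ in range(draws)
    ]

# Hypothetical per-service forecasts in Gbps, assumed independent and Gaussian.
forecasts = [(40.0, 5.0), (25.0, 8.0), (10.0, 2.0)]

totals = monte_carlo_total(forecasts)
flow_p95 = quantiles(totals, n=20)[-1]

# For comparison: adding each service's own p95 (mean + 1.645 * stdev).
summed_p95 = sum(mu + 1.645 * sigma for mu, sigma in forecasts)

print(f"Monte-Carlo flow p95 = {flow_p95:.1f} Gbps, sum of service p95 = {summed_p95:.1f} Gbps")
```

Simulating the total keeps each service's own variability while avoiding the oversizing that comes from adding per-service percentiles; correlated services would need correlated draws instead.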
Possible Solutions:
Flow Level
2. Top-Down:
• Forecast the flow.
• Get Distribution of each component’s weight in the flow.
• Compute each component’s demand forecast
Possible Problems:
Component Weights can drift in time
Interaction and Contention => “unknown unknown”
Solutions:
Estimate Component Weights
Account for Quantile Compression
[Figure labels: Svc1, Svc2, Svc3]
Flow-Level TSA Forecasting
Autoregressive (ARIMA) || ETS (EWMA) Forecasting
1. Fit a Forecasting Model
2. Get residuals
3. Identify Outliers
4. Interpolate (or Predict) Outliers
5. Re-fit Forecasting Model
6. Predict Using the Fitted model
For Capacity Planning:
• TSA is NOT the Whole Story:
  • Business Growth is not accounted for
Flow-Level Top-Down Stochastic Problem
Problem:
1. Flow composition varies from day to day.
2. Flow composition also varies within a day.
3. Old components may not be relevant anymore.
4. New components may not have enough history.
Solution to Top-Down Forecasting Stochastic Problem
[Flowchart]
For each Flow:
• Identify Services active in this Flow
• For each Service, for each Hour, compute Stats: lower_bound, min, p05, p10, p25, p50, Mean, StDev, p75, p90, p95, p99, max, upper_bound
• For each Stat, compute this Svc’s weight for this Stat (long-term means)
• Infer unconstrained weights (use long-term skew)
• Forecast demand
• $Fcst_{flow} \cdot Weight_{svc}$
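The last step of the top-down solution, multiplying the flow forecast by each service's weight, can be sketched as below. Defining a weight as the service's share of the long-term mean of one statistic is an assumed reading of the flowchart, the step that infers unconstrained weights from skew is left out, and the history values and function names are hypothetical.

```python
from statistics import mean

def service_weights(history):
    """history: {service: [hourly values of one stat]}; weight = share of the long-term means."""
    long_term = {svc: mean(values) for svc, values in history.items()}
    total = sum(long_term.values())
    return {svc: value / total for svc, value in long_term.items()}

def split_forecast(flow_forecast, weights):
    """Per-service demand forecast: flow forecast times service weight."""
    return {svc: flow_forecast * weight for svc, weight in weights.items()}

# Hypothetical hourly p95 values (Gbps) for the three services active in one flow.
p95_history = {
    "svc1": [30.0, 32.0, 31.0],
    "svc2": [20.0, 18.0, 19.0],
    "svc3": [10.0, 10.0, 10.0],
}

weights = service_weights(p95_history)
per_service = split_forecast(120.0, weights)   # hypothetical flow-level forecast of 120 Gbps
print(per_service)
```

Normalizing the weights makes the per-service forecasts add up to the flow forecast, which matters because per-service percentiles do not add up to the flow percentile on their own.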
This Solves Most of the Problems
• Underlying services have their own plans:
  • Growth
  • Deprecation
  • Relocation
  • USE PER-SERVICE DEPENDENCIES
• Supporting infrastructure has its own lifecycle:
  • New Product Introduction
  • Implementation and Growth
  • Depreciation
  • Tech Refresh
  • USE PER-SERVICE / PER-FLOW DEPENDENCIES
• Topologies and policies change in time
• Change in policies and topology can lead to changes in demand
  • USE DUMMY VARIABLES
Now we can account for these variabilities!
Usefulness depends on:
• Aggregation of Data
  • StatMuxing?
  • Peak Hour?
  • Hourly Stats?
• Forecast Demand based on the Model
  • Bottom-Up
  • Traffic × QoS Drives DC Load Drives Space & Power
• Account for QoS in Demand Forecasting
  • Plan for SLO
• DO NOT Assume Anything!
  • Especially about Shapes of Distributions.
  • Mean and Variance are Overrated!
  • So is $p95$!
• Use Outlier Boundaries (“fences”)
• Size Systems for “would-be-unbounded” forecasts
• DO Use Entire Distribution to be Proactive
agilgur@fb.com / alexgilgur@gmail.com
spolitis@fb.com
All data in this presentation are generated solely for illustration purposes
Select images and formulae are provided with permission from Facebook
Capacity Planning
Is the number of Gbps on a constrained system indicative of demand?
Is it right to forecast the upper bound of traffic on a constrained system?
• Use $p25$, $p50$, and $p75$ to compute the $Skew$
• Forecast $p25$ and $p50$
• Use the $Skew$ to infer forecast of $p75_{unconstrained}$
• Compute the forecast of $UpperBound$
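The four steps above can be worked through numerically. The slide does not say which skew measure is used; the sketch assumes the quartile-based (Bowley) skew, $Skew = \frac{p75 + p25 - 2 \cdot p50}{p75 - p25}$, because it needs only the three quartiles named in the first step. Solving that formula for $p75$ gives $p75 = \frac{2 \cdot p50 - (1 + Skew) \cdot p25}{1 - Skew}$. The quartile values and $\beta = 1.5$ are hypothetical.

```python
def bowley_skew(p25, p50, p75):
    """Quartile skew: (p75 + p25 - 2*p50) / (p75 - p25)."""
    return (p75 + p25 - 2 * p50) / (p75 - p25)

def infer_p75(p25, p50, skew):
    """Solve the quartile-skew formula for p75, given p25, p50 and the skew."""
    return (2 * p50 - (1 + skew) * p25) / (1 - skew)

def upper_bound(p25, p75, beta=1.5):
    """Upper outlier boundary: p75 + beta * IQR."""
    return p75 + beta * (p75 - p25)

# Step 1: long-term skew from hypothetical quartiles of the observed traffic.
skew = bowley_skew(p25=40.0, p50=50.0, p75=70.0)

# Step 2: hypothetical forecasts of the two lower quartiles.
p25_fcst, p50_fcst = 60.0, 75.0

# Step 3: infer the p75 forecast from the skew.
p75_unconstrained = infer_p75(p25_fcst, p50_fcst, skew)

# Step 4: forecast of the upper boundary.
upper_bound_fcst = upper_bound(p25_fcst, p75_unconstrained)
print(f"skew = {skew:.3f}, p75 = {p75_unconstrained:.1f}, UpperBound = {upper_bound_fcst:.1f}")
```

The upper quartile is inferred from the skew instead of being forecast directly, so the sizing figure does not inherit a ceiling from the measured upper quantiles.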
Resource Constraint =>
Quantile Compression =>
Underforecasting the load =>
Undersizing the resource
Account for Quantile Compression
What is Quantile Compression?

Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Β 
Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.
Β 
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
Β 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
Β 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
Β 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Β 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptx
Β 

Performance OR Capacity #CMGimPACt2016

  • 1. Different Approaches for Different Tasks CMG imPACt 2016 42nd International Conference by Computer Measurement Group Alexander Gilgur, Steve Politis Session 361 November 08, 2016 La Jolla, CA USA
  • 2. Steve Politis Josue Kuri Alex Nikolaidis Grace Smith Yuri Smirnov Tyler Price Paul Sorenson For contributing knowledge, ideas, solutions, and support
  • 3. What’s the difference between Performance and Capacity? • Two languages in the IT world • Need tools, metrics, and stats compatible with both languages • Need fluency in both languages
  • 4. • Metrics: • Time (latency) • Rate • Count: • Packets in flight • Packet loss • A few words about % Utilization • Models: • Correlations • Trends in Data • Time-Series Analysis • Approach • Top-Down? • Bottom-Up? • Hybrid? • Measures and Aggregations: • μ + Z * σ • Busy Hour/Peak Minute • Nonparametric Measures: • P95 • Outlier Boundaries
  • 5. For Capacity Planning: • Uncertainty: “Redistribution of wealth” • Distribution Looks Gaussian • Washout of local anomalies For Performance Analysis: • “Big Picture” • Immediate impact assessment • Drilldown is easier than aggregation: • No need to worry about which aggregation function to choose
  • 6. For Performance Analysis: • Immediate anomaly detection • Trend identification • Practical Significance is unknown For Capacity Planning: • “Just the right” bandwidth • Actual distributions & trends • Time Consuming • Aggregation can get complicated
  • 7. • Aggregate AZ Pairs (“Flows”) for each service (product) • Aggregate services (products) for each AZ Pair For Infra Performance Analysis: • Cannot tell where the “hot” issues are For Capacity Planning: • Can tell how much each svc (product) needs For Infra Performance Analysis: • Will ID “hot” flows (A-Z Pairs) For Capacity Planning: • Will not find “hot” services (products)
  • 8. • Metrics: • Time (latency) • Rate • Count: • Packets in flight • Packet loss • A few words about % Utilization • Models: • Correlations • Trends in Data • Time-Series Analysis • Approach • Top-Down? • Bottom-Up? • Hybrid? • Measures and Aggregations: • μ + Z * σ • Busy Hour/Peak Minute • Nonparametric Measures: • P95 • Outlier Boundaries
  • 9. [μ − Z * σ, μ + Z * σ] • The Z is arbitrary • Assumptions about the distribution: • Mean and Variance defined • Gaussian (Symmetrical) • Stationary • No outliers • Simple math: • Addition • Regression • TSA Forecasting
  • 10. [μ − Z * σ, μ + Z * σ] [μ − Z_1 * σ, μ + Z_2 * σ] Sampling: With enough random samples, their means will be Gaussian (Central Limit Theorem) Data Transformation: • log • exp • Box-Cox For Capacity Planning: For Monitoring:
  • 11. Busy Hour / Peak Minute Losing information: • Can’t identify the day’s outliers. • Need 3+ wks. of daily peaks to measure p95. For Performance Monitoring For Capacity Planning • Signal : Noise → 0. • Top-Down forecast is hard to interpret. • Misses underlying services. • Hides trends. If used as an aggregated measure (Top-Down), • Accurate representation of Multiplexing. • Drilldown answers the “Who is hit the most?” question • We size for busy-hour traffic. • Snapshot of service distribution: • Great for Top-Down approach.
  • 12. Nonparametric: p95 • Sensitive to Aggregation Level: • ∑ p95(x_i) ≠ p95(∑ x_i). • Have to have 20+ latest data points (p95 ≡ 1/20) • How many of these are outliers? For Performance Monitoring For Capacity Planning • Summation may lead to oversizing • Ignores the bulk of the distribution • We will miss SLA 5% of the time • Forecasting p95: ergodicity assumption • Distribution shape does not matter • OK to have outliers • Only 5% of data points will cause alerts • Easy to understand • “Tradition!” • Math implemented in R, Python, Matlab, SAS • even for regression
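The aggregation-level sensitivity on this slide can be shown with a small sketch. The two services and their traffic values below are hypothetical, chosen so that their bursts never coincide; the `percentile` helper is a plain linear-interpolation percentile written for this illustration.

```python
import math

def percentile(values, q):
    """Linear-interpolation percentile for q in [0, 100]."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

# Hypothetical traffic for two services over 20 intervals.
svc_a = [10] * 18 + [100, 100]   # bursts in the last two intervals
svc_b = [100, 100] + [10] * 18   # bursts in the first two intervals

# The flow carries the sum of both services in each interval.
total = [a + b for a, b in zip(svc_a, svc_b)]

sum_of_p95 = percentile(svc_a, 95) + percentile(svc_b, 95)
p95_of_sum = percentile(total, 95)

print(sum_of_p95, p95_of_sum)
```

Because the bursts do not overlap, adding the per-service p95 values sizes the flow for a peak that never occurs, which is the oversizing risk the slide lists under summation.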
  • 13. Should we Size Hardware for p95? 5% of the time SLO (99.9%? 99.999%?) will be violated
  • 14. Shouldn’t we Size Resources for Non-Outliers Instead? We size for SLA, as long as traffic stays within outlier boundaries John Tukey’s IQR method: IQR = p75 − p25; LowerBound = p25 − β * IQR; UpperBound = p75 + β * IQR • p95 will “outlaw” NON-outliers • IFF p95 < UpperBound • Need a “smart” β
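A minimal sketch of the IQR fences from this slide follows. The demand values are made up, and β = 1.5 is only Tukey's conventional default, used here as an assumption; the slide itself asks for a "smart" β and does not say how to choose it.

```python
import math

def percentile(values, q):
    """Linear-interpolation percentile for q in [0, 100]."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def tukey_fences(values, beta=1.5):
    """Return (LowerBound, UpperBound) = (p25 - beta*IQR, p75 + beta*IQR)."""
    p25 = percentile(values, 25)
    p75 = percentile(values, 75)
    iqr = p75 - p25
    return p25 - beta * iqr, p75 + beta * iqr

# Hypothetical demand samples with one obvious spike.
demand = [40, 42, 43, 44, 45, 45, 46, 47, 48, 50,
          51, 52, 53, 55, 56, 58, 60, 61, 63, 120]

lower, upper = tukey_fences(demand)
p95 = percentile(demand, 95)
outliers = [x for x in demand if x < lower or x > upper]

print(lower, upper, p95, outliers)
```

In this sample p95 falls below the upper fence, so sizing or alerting at p95 would treat some non-outlier points as violations, while the fence isolates only the spike.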
  • 15. Nonparametric: Outlier Boundaries • Sensitive to Aggregation Level • How many of these are real outliers? For Performance Monitoring For Capacity Planning • Summation may lead to oversizing • Ergodicity assumption • We size for non-outliers • We guarantee SLA • Math implemented (R, Python, Matlab, SAS) • even for regression • It looks at the bulk of distribution. • We will NOT miss SLA 5% of the time. • Distribution shape does not matter • Need fewer data points than for p95 • Only respond to outliers
  • 16. A Word of Caution: Outlier Boundaries Skew = 0.13 Skew = 0.26 p95 < UpperBound p95 > UpperBound Use UpperBound Use p95? Split into HI and BULK?
  • 17. • There is no “one size fits all” approach. • There is no “one size fits all” statistic. • There are common principles: • Aggregate before computing percentiles. • Use Outlier Boundaries: • Performance & Capacity: • Accounts for the bulk of the data; • Distribution does not matter; • Performance: • Easy to ID outliers; • Capacity: • Sizing for Non-Outliers => less $ • Avoid Predefined Percentiles. • Metrics: • Time (latency) • Rate • Count: • Packets in flight; Packet Loss; % Utilization • Models: • Correlations • Trends in Data Coming Next
  • 18. User Metrics: • throughput • latency • data loss • data loss & latency • latency & data loss The Whirlpool of Metrics For Monitoring Real metrics: • # of Packets in Flight • # of Packets in Queue or Lost For Capacity Planning Traditional Metrics in Planning: • Throughput [Gbps] • Latency_packet • Latency_page = loading || reassembly • Packets get queued and blocked. • Bits may be bursty while packets are smooth. • Reverse statement is true as well. • Latency_packet = transport + queueing • Packets need capacity. • Packet sizes vary => capacity [Gbps] Example: 320 Gbps = 26.7M * 1.5kB * 8 bits/sec 320 Gbps = 10 * 4GiB * 8 bits/sec Traditional Metrics for Monitoring: • % Utilization • Packet Loss Rate
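The two throughput examples on this slide can be checked with a few lines of arithmetic. This is only a sketch; reading "26.7M" and "10" as payloads sent per second is an assumption, as is the helper name `gbps`.

```python
def gbps(units_per_sec, bytes_per_unit):
    """Throughput in decimal Gbps for a given payload rate and payload size."""
    return units_per_sec * bytes_per_unit * 8 / 1e9

# Many small packets: 26.7M packets/sec of 1.5 kB each.
many_small = gbps(26.7e6, 1500)

# Few large transfers: 10 per second of 4 GiB each.
few_large = gbps(10, 4 * 2**30)

# The same large-transfer load expressed in binary gigabits (Gibit/s).
few_large_gibit = 10 * 4 * 2**30 * 8 / 2**30

print(many_small, few_large, few_large_gibit)
```

The first line reproduces roughly 320 Gbps. The second comes to 320 only in binary units (about 343.6 in decimal Gbps), so the slide's second figure appears to mix GiB with Gbps; either way, the point stands that very different packet counts can produce the same bit rate.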
  • 19. Time, Rate, Count, and Utilization Packet Queueing: P_q = ErlangC(N, C) Packet Blocking: P_b = ErlangB(N, C) 2012 paper N = PPS * Latency Latency = 1/2 * RTT + T_buff bps = PPS * bits/packet C = max PPS * 1/2 * RTT 2006 paper (CPU centric) • Utilization CAN BE useless • If the metric does not reflect what it is used for. links were utilized near 100% [bps/bps] but no packet drops
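The relations on this slide can be sketched with the textbook Erlang B recursion and the Erlang C formula derived from it. Treating N (packets in flight) as the offered load and C as the number of servers is an assumption made for this illustration, and all traffic numbers are hypothetical.

```python
def erlang_b(servers, offered_load):
    """Blocking probability via the standard Erlang B recursion."""
    b = 1.0
    for n in range(1, servers + 1):
        b = offered_load * b / (n + offered_load * b)
    return b

def erlang_c(servers, offered_load):
    """Probability of queueing; saturates at 1.0 when load >= servers."""
    if offered_load >= servers:
        return 1.0
    b = erlang_b(servers, offered_load)
    return servers * b / (servers - offered_load * (1.0 - b))

# Hypothetical link: 100k packets/sec, 10 ms RTT, 1 ms buffering delay.
pps = 100_000
rtt = 0.010
t_buff = 0.001
max_pps = 130_000

latency = 0.5 * rtt + t_buff                    # Latency = 1/2 * RTT + T_buff
n_in_flight = pps * latency                     # N = PPS * Latency
capacity = int(round(max_pps * 0.5 * rtt))      # C = max PPS * 1/2 * RTT

p_q = erlang_c(capacity, n_in_flight)
p_b = erlang_b(capacity, n_in_flight)
print(n_in_flight, capacity, p_q, p_b)
```

Both probabilities respond to packets in flight relative to capacity, not to percent utilization of the link in bits.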
  • 20. • There is no “one size fits all” approach. • There is no “one size fits all” statistic. • There are common principles: • Aggregate before computing percentiles. • Use a statistic that accounts for the bulk of the data. • Metric = what’s important to the BW user: • Quality of Service (QoS): • Network Latency • Packets lost • Models: • Trend in Data • Correlation • Time-Series Analysis Coming Next
  • 21. Trend in Data Performance Monitoring: • Is this “normal” behavior? • Will this trend continue? • High values will be marked as outliers • Are they?
  • 22. Dealing with Trends Option 1: • Fit a linear regression • If it’s a good fit: • Get distribution of residuals • Add p95 or outlier boundary of residuals to regression line Performance Monitoring:
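Option 1 can be sketched as an ordinary least-squares line plus a percentile of its residuals. The data and the function names are hypothetical, and the sketch does not test goodness of fit, which the slide makes a precondition.

```python
import math

def percentile(values, q):
    """Linear-interpolation percentile for q in [0, 100]."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def trend_threshold(xs, ys, q=95):
    """Fit the trend, then return (slope, intercept, q-th percentile of residuals)."""
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    return slope, intercept, percentile(residuals, q)

# Hypothetical daily demand with an upward trend.
days = list(range(12))
demand = [100, 104, 103, 108, 110, 109, 115, 113, 118, 121, 119, 125]

slope, intercept, offset = trend_threshold(days, demand)

# Alert line: regression line shifted up by the residual percentile.
alert_line = [intercept + slope * d + offset for d in days]
print(slope, intercept, offset)
```

Replacing the residual percentile with the upper Tukey fence of the residuals gives the outlier-boundary variant the slide mentions.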
  • 23. This results in: Option 1: Linear Regression -> Residuals allows us to detrend the data and deal with a stationary proxy… … IFF: • Residuals are stationary • Residuals are normal • Residuals are homoscedastic Performance Monitoring
  • 24. If Residuals are Not Normal / Not Homoscedastic? Linear Regression does not work Performance Monitoring:
  • 25. Plan B: Directly Predict %-iles Option 2: Quantile Regression: rq(Demand ~ Time) Performance Monitoring: • Requires stationary trends • No need for homoscedasticity • No need for normality
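The slide's `rq(Demand ~ Time)` is R syntax. As a rough stand-in, the sketch below fits a single quantile line in plain Python by brute force: it tries the line through every pair of points and keeps the one with the lowest pinball (check) loss, relying on the fact that an optimal line can always be found passing through two data points. This is an illustration for small samples, not how a production quantile-regression solver works, and the fan-shaped data are hypothetical.

```python
from itertools import combinations

def pinball_loss(residual, tau):
    """Check loss: tau * r for r >= 0, (tau - 1) * r otherwise."""
    return tau * residual if residual >= 0 else (tau - 1.0) * residual

def quantile_line(xs, ys, tau):
    """Return (slope, intercept) minimizing total pinball loss for quantile tau."""
    best = None
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        if x1 == x2:
            continue  # a vertical pair cannot define the line
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        loss = sum(pinball_loss(y - (intercept + slope * x), tau)
                   for x, y in zip(xs, ys))
        if best is None or loss < best[0]:
            best = (loss, slope, intercept)
    if best is None:
        raise ValueError("need at least two points with distinct x values")
    return best[1], best[2]

# Hypothetical demand whose spread grows with time (heteroscedastic).
time = [1, 2, 3, 4, 5] * 3
demand = [t * k for k in (1, 2, 3) for t in (1, 2, 3, 4, 5)]

median_slope, median_intercept = quantile_line(time, demand, 0.5)
upper_slope, upper_intercept = quantile_line(time, demand, 0.9)
print(median_slope, upper_slope)
```

The upper-quantile line grows faster than the median line, which a single regression line with a constant residual offset could not capture.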
  • 26. Using Regression 1. Build regression for KPI = f(BW); 2. TSA-forecast Residuals; 3. Predict KPI(BW | t*); 4. Combine with forecast of Residuals Capacity Planning: Problems: Residuals = f(time) Intercept = f(time)
  • 27. Using Quantile Regression Capacity Planning: Great for most use cases • No need for homoscedasticity • No need for normality • Works for correlated Time Series • Requires stationary trends • Old and New have same weight
  • 28. Quantile Regression in Performance Monitoring
  • 29. Using Quantile Regression Performance Monitoring Compare Model Prediction Ranges for models built on Baseline and New data sets. Quantify the change: Baseline data Data Shown Here are Generated Exclusively for this Hypothetical Example
  • 30. Performance Monitoring Look for Quantile Compression Congestion Detection and Prediction
  • 32. Using Time Series Analysis ETS (Error, Trend, Seasonality) Decomposition For Performance Monitoring: 1. Fit a Forecasting (ETS) Model 2. Get residuals 3. Identify & Interpret Outliers in Residuals 4. Interpolate (or Predict) Outliers 5. Re-fit Forecasting Model 6. Predict Using the Fitted model For Capacity Planning:
  • 33. TSA Forecasting Autoregressive(ARIMA) || ETS (EWMA) Forecasting 1. Fit a Forecasting Model 2. Get residuals 3. Identify Outliers 4. Interpolate (or Predict) Outliers 5. Re-fit Forecasting Model 6. Predict Using the Fitted model
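The six steps can be sketched with simple exponential smoothing (EWMA) standing in for the forecasting model. The smoothing factor, the use of Tukey fences to flag residual outliers, neighbor-based interpolation, and the sample series are all assumptions for illustration; a real ETS or ARIMA fit would also handle trend and seasonality.

```python
import math

def percentile(values, q):
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def ewma_fit(series, alpha):
    """Return (one-step-ahead fitted values, forecast for the next point)."""
    level = series[0]
    fitted = []
    for y in series:
        fitted.append(level)                       # forecast made before seeing y
        level = alpha * y + (1 - alpha) * level
    return fitted, level

def interpolate(series, flagged):
    """Replace flagged points with the mean of the nearest unflagged neighbors."""
    cleaned = list(series)
    bad = set(flagged)
    for i in flagged:
        left = next((series[j] for j in range(i - 1, -1, -1) if j not in bad), None)
        right = next((series[j] for j in range(i + 1, len(series)) if j not in bad), None)
        neighbors = [v for v in (left, right) if v is not None]
        if neighbors:
            cleaned[i] = sum(neighbors) / len(neighbors)
    return cleaned

def clean_and_forecast(series, alpha=0.3, beta=1.5):
    fitted, _ = ewma_fit(series, alpha)                      # 1. fit
    residuals = [y - f for y, f in zip(series, fitted)]      # 2. residuals
    p25, p75 = percentile(residuals, 25), percentile(residuals, 75)
    low, high = p25 - beta * (p75 - p25), p75 + beta * (p75 - p25)
    flagged = [i for i, r in enumerate(residuals) if r < low or r > high]  # 3. identify
    cleaned = interpolate(series, flagged)                   # 4. interpolate
    _, forecast = ewma_fit(cleaned, alpha)                   # 5. re-fit, 6. predict
    return flagged, cleaned, forecast

# Hypothetical flat demand with a single spike.
series = [50] * 10 + [200] + [50] * 10
_, naive_forecast = ewma_fit(series, 0.3)
flagged, cleaned, forecast = clean_and_forecast(series)
print(flagged, naive_forecast, forecast)
```

In this example the points right after the spike are flagged as well, because the smoothed level is still pulled up by the spike; flagged points deserve a look before they are replaced.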
  • 34. Issues / Problems / Challenges How can we account for these variabilities? • Underlying services have their own plans: • Growth • Deprecation • Relocation • Supporting infrastructure has its own lifecycle: • New Product Introduction • Implementation and Growth • Depreciation • Tech Refresh • Topologies and policies change in time • Change in policies and topology can lead to changes in demand
  • 35. Possible Solutions: Flow Level 1. Bottom-Up: • Forecast each service individually; • Follow up with Monte-Carlo aggregation Svc1 Svc2 Svc3 Possible Problem: Different prediction intervals not indicative of different data variability Advantages: • Each service’s trend and variability is accounted for. • Each service’s growth plans are easy to account for. Flow1
  • 36. 1. Bottom-Up: Forecast each service individually; follow up with Monte-Carlo aggregation Possible Problem: Different prediction intervals not indicative of different data variability Possible Solutions: Flow Level Svc1 Svc2 Svc3
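A minimal sketch of the Monte-Carlo aggregation step follows. It assumes each service forecast is summarized as a hypothetical (low, most likely, high) triple, draws from a triangular distribution for each, and treats the services as independent; none of these choices come from the slides.

```python
import random

# Hypothetical per-service forecasts in Gbps: (low, most_likely, high).
service_forecasts = {
    "svc1": (10.0, 14.0, 25.0),
    "svc2": (5.0, 6.0, 12.0),
    "svc3": (20.0, 22.0, 30.0),
}

def simulate_flow(forecasts, draws=10_000, seed=42):
    """Draw each service independently and sum the draws; return sorted flow totals."""
    rng = random.Random(seed)
    totals = []
    for _ in range(draws):
        totals.append(sum(rng.triangular(low, high, mode)
                          for low, mode, high in forecasts.values()))
    return sorted(totals)

def percentile_of_sorted(sorted_values, q):
    """Nearest-rank style percentile on an already sorted list."""
    return sorted_values[int(round((len(sorted_values) - 1) * q / 100.0))]

totals = simulate_flow(service_forecasts)
flow_p95 = percentile_of_sorted(totals, 95)
sum_of_highs = sum(high for _, _, high in service_forecasts.values())
print(flow_p95, sum_of_highs)
```

The simulated flow percentile can be compared with the plain sum of per-service upper values to see how much headroom naive summation would add. Correlated services would need a joint sampler instead of independent draws.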
  • 37. 2. Top-Down: • Forecast the flow. • Get Distribution of each component’s weight in the flow. • Compute each component’s demand forecast Possible Problems: Component Weights can drift in time Interaction and Contention => “unknown unknown” Possible Solutions: Flow Level Solutions: Estimate Component Weights Account for Quantile Compression Svc1 Svc2 Svc3 Svc3 Svc2 Svc1
  • 38. Flow-Level TSA Forecasting Autoregressive (ARIMA) || ETS (EWMA) Forecasting 1. Fit a Forecasting Model 2. Get residuals 3. Identify Outliers 4. Interpolate (or Predict) Outliers 5. Re-fit Forecasting Model 6. Predict Using the Fitted model For Capacity Planning: • TSA is NOT the Whole Story: • Business Growth is not accounted for
  • 39. Flow-Level Top-Down Stochastic Problem Problem: 1. Flow composition varies from day to day. 2. Flow composition also varies within a day. 3. Old components may not be relevant anymore. 4. New components may not have enough history.
  • 40. Top-Down Forecasting Stochastic Problem Solution For each Flow Forecast demand (next 2 slides)
  • 41. Stochastic Problem Solution For each Flow For each Service Identify Services active in this Flow Compute Stats: lower_bound min p05 p10 p25 p50 Mean StDev p75 p90 p95 p99 max upper_bound For each Hour Forecast demand
  • 42. For each Flow For each Service Identify Services active in this Flow Compute Stats: lower_bound min p05 p10 p25 p50 Mean StDev p75 p90 p95 p99 max upper_bound For each Hour Compute this Svc’s weight for this Stat For each Stat (long-term means) Infer unconstrained weights (use long-term skew) Forecast demand Top-Down Forecasting Stochastic Problem Solution
  • 43. For each Flow For each Service Identify Services active in this Flow Compute Stats: lower_bound min p05 p10 p25 p50 Mean StDev p75 p90 p95 p99 max upper_bound For each Hour Compute this Svc’s weight for this Stat For each Stat (long-term means) Infer unconstrained weights (use long-term skew) Forecast demand Fcst_flow * Weight_svc Solution to Top-Down Forecasting Stochastic Problem
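The final multiplication, Fcst_flow * Weight_svc, can be sketched for a single statistic. The version below computes each service's hourly share of the flow and averages the shares over time as the long-term weight; it skips the per-statistic loop and the skew-based correction for constrained weights, and the history values and names are hypothetical.

```python
def service_weights(history):
    """history: {service: [hourly demand, ...]} -> long-term mean share per service.

    Assumes equal-length series and a nonzero flow total in every hour.
    """
    hours = len(next(iter(history.values())))
    shares = {svc: 0.0 for svc in history}
    for h in range(hours):
        flow_total = sum(series[h] for series in history.values())
        for svc, series in history.items():
            shares[svc] += series[h] / flow_total
    return {svc: total / hours for svc, total in shares.items()}

def split_forecast(flow_forecast, weights):
    """Per-service demand forecast = flow forecast * service weight."""
    return {svc: flow_forecast * w for svc, w in weights.items()}

# Hypothetical two-hour history for two services sharing one flow.
history = {"svc_a": [60.0, 30.0], "svc_b": [40.0, 70.0]}

weights = service_weights(history)
per_service = split_forecast(200.0, weights)
print(weights, per_service)
```

Repeating the same computation per statistic (p25, p50, p75 and so on) would give one weight per service per statistic, as the loop on the slide suggests.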
  • 44. This Solves Most of the Problems • Underlying services have their own plans: • Growth • Deprecation • Relocation • USE PER-SERVICE DEPENDENCIES • Supporting infrastructure has its own lifecycle: • New Product Introduction • Implementation and Growth • Depreciation • Tech Refresh • USE PER-SERVICE / PER-FLOW DEPENDENCIES • Topologies and policies change in time • Change in policies and topology can lead to changes in demand • USE DUMMY VARIABLES Now we can account for these variabilities! Usefulness depends on: • Aggregation of Data • StatMuxing? • Peak Hour? • Hourly Stats?
  • 45. • Forecast Demand based on the Model • Bottoms-Up • Traffic * QoS Drives DC Load Drives Space & Power • Account for QoS in Demand Forecasting • Plan for SLO • DO NOT Assume Anything! • Especially about Shapes of Distributions. • Mean and Variance are Overrated! • So is p95! • Use Outlier Boundaries (“fences”) • Size Systems for “would-be-unbounded” forecasts • DO Use Entire Distribution to be Proactive
  • 47. All data in this presentation are generated solely for illustration purposes Select images and formulae are provided with permission from Facebook
  • 48.
  • 49. Capacity Planning Is the number of Gbps on a constrained system indicative of demand? Is it right to forecast upper bound of traffic on a constrained system? • Use p25, p50, and p75 to compute the Skew • Forecast p25 and p50 • Use the Skew to infer forecast of p75_unconstrained • Compute the forecast of UpperBound Resource Constraint => Quantile Compression => Underforecasting the load => Undersizing the resource Account for Quantile Compression
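The four bullets can be turned into a worked sketch once a skew measure is fixed. The slide does not name one; Bowley's quartile skewness, Skew = (p75 + p25 − 2 * p50) / (p75 − p25), is assumed here because it uses exactly p25, p50, and p75. Solving it for p75 gives p75 = (2 * p50 − (1 + Skew) * p25) / (1 − Skew). All numbers are hypothetical, and β = 1.5 is the conventional default rather than a value from the deck.

```python
def bowley_skew(p25, p50, p75):
    """Quartile skewness: (p75 + p25 - 2*p50) / (p75 - p25)."""
    return (p75 + p25 - 2.0 * p50) / (p75 - p25)

def infer_p75(p25, p50, skew):
    """Invert the quartile skewness to recover p75 from p25, p50, and the skew."""
    return (2.0 * p50 - (1.0 + skew) * p25) / (1.0 - skew)

def upper_bound(p25, p75, beta=1.5):
    """Tukey upper fence: p75 + beta * IQR."""
    return p75 + beta * (p75 - p25)

# Long-term quartiles from a period assumed to be unconstrained.
skew = bowley_skew(40.0, 50.0, 70.0)

# Forecast the lower quartiles, which a capacity ceiling distorts least.
p25_forecast, p50_forecast = 60.0, 75.0

# Infer what p75 would be without the ceiling, then size for the fence.
p75_unconstrained = infer_p75(p25_forecast, p50_forecast, skew)
sizing_target = upper_bound(p25_forecast, p75_unconstrained)
print(skew, p75_unconstrained, sizing_target)
```

The idea is that a saturated resource squeezes the upper quantiles toward the ceiling, so forecasting the observed p75 directly would understate demand.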
  • 50. What is Quantile Compression?