A Systematic Approach to Capacity Planning in the Real World

Arun Kejariwal, Statistical Learning Principal at Machine Zone, Inc.
@Twitter | Velocity 2013 1
A Systematic Approach to Capacity Planning in the Real World
Bryce Yan, Arun Kejariwal
(@bryce_yan, @arun_kejariwal)
Capacity Engineering @ Twitter
June 2013
User Experience
•  Anytime, Anywhere, Any device
•  Real-time performance
•  Additional challenges
  Fault Tolerance
  Variability [2]
[1] Xu et al. NSDI 2013 - https://www.usenix.org/system/files/conference/nsdi13/nsdi13-final77.pdf
[2] http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/people/jeff/Berkeley-Latency-Mar2012.pdf
Approaches to Capacity Planning
•  Throw hardware at the problem
•  Reactive approach
o  How much?
o  What kind? (Inventory management, etc.)
Poor UX
Bottom line
Capacity Planning is Non-trivial
•  Organic growth
  Over 200M monthly active users [1]
•  Events planned or unplanned [2, 3]
  Events/incidents (e.g., Super Bowl ’13 blackout)
  Behavioral response
o  Demographics, Cultural
o  Retweets, Photos, Vines
  Tax different services/applications
o  Different capacity requests
[1] https://twitter.com/twitter/status/281051652235087872
[2] http://arstechnica.com/information-technology/2012/10/hurricane-sandy-takes-data-centers-offline-with-flooding-power-outages/
[3] http://www.zdnet.com/amazons-compute-cloud-has-a-networking-hiccup-7000005776/
Capacity Planning is Non-trivial (cont’d)
•  Evolving product development landscape
  New features
  New products
•  New hardware platforms
  Purchase pipeline
  How much and when to buy – Cost performance trade-off
•  Overall goal
  User Experience
  Operational footprint
Capacity Modeling Overview
Capacity Modeling
•  Takes core drivers as inputs to generate usage demand
  Forecasts the amount of work based on core driver projections
•  Relates the work metric to a primary resource to identify the capacity threshold
  Primary resources
    Computing power (CPU, RAM)
    Storage (disk I/O, disk space)
    Network (network bandwidth)
•  Generate hardware demand based on the limiting primary resource
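One way to read the last bullet: size the fleet for each primary resource separately and let the most constrained resource decide. The sketch below assumes hypothetical resource names, demand figures, and per-server capacities; none of them come from the talk.

```python
import math

def servers_needed(demand, per_server_capacity):
    """Return (server_count, limiting_resource) for a forecast demand.

    Both arguments map a resource name to a quantity in the same unit,
    e.g. CPU cores, GB of RAM, Mbit/s of network bandwidth.
    """
    counts = {
        resource: math.ceil(demand[resource] / per_server_capacity[resource])
        for resource in demand
    }
    # The resource that requires the most servers is the limiting one.
    limiting = max(counts, key=counts.get)
    return counts[limiting], limiting

# Hypothetical figures, for illustration only.
demand = {"cpu_cores": 900, "ram_gb": 2000, "network_mbps": 30000}
capacity = {"cpu_cores": 16, "ram_gb": 64, "network_mbps": 1000}
print(servers_needed(demand, capacity))  # (57, 'cpu_cores')
```

Here CPU needs 57 servers, RAM 32, and network 30, so CPU is the limiting primary resource and sets the hardware demand.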
Core Drivers
•  Underlying business metrics that drive demand for more capacity
  Active Users
  Tweets per second (TPS)
  Favorites per second (FPS)
  Requests per second (RPS)
•  Normalized by Active Users to isolate user engagement
•  Project user engagement and Active Users independently
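The last two bullets can be sketched as follows: divide the core driver by Active Users to get per-user engagement, fit a trend to each series separately, and multiply the two projections to recover the core driver forecast. This is a minimal sketch that assumes linear trends for both series; the function names and sample numbers are hypothetical.

```python
def fit_line(ys):
    """Least-squares line through (0, ys[0]), (1, ys[1]), ...

    Returns (slope, intercept).
    """
    n = len(ys)
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def forecast_core_driver(active_users, driver_totals, horizon):
    """Project users and per-user engagement independently, then multiply."""
    per_user = [d / u for d, u in zip(driver_totals, active_users)]
    user_slope, user_icpt = fit_line(active_users)
    eng_slope, eng_icpt = fit_line(per_user)
    n = len(active_users)
    forecast = []
    for t in range(n, n + horizon):
        users = user_slope * t + user_icpt
        engagement = eng_slope * t + eng_icpt
        forecast.append(users * engagement)
    return forecast

# Hypothetical history: users grow by 10 per period, engagement by 0.1.
active_users = [100, 110, 120, 130]
photos_per_day = [200, 231, 264, 299]
print(forecast_core_driver(active_users, photos_per_day, horizon=2))
```

Keeping the two projections separate means a change in engagement (for example, a new feature) can be modeled without touching the user growth forecast.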
Core Drivers (cont’d)
[Figure: Normalized Core Drivers for Engagement. Per-active-user values over time for Favorites (polynomial trend) and Retweets (linear trend).]
[Figure: Active Users aka User Growth. Active user count over time, with a linear trend.]
Core Drivers (cont’d)
[Figure: User Growth: Active Users. Active users over time, with a linear trend.]
[Figure: Engagement: Photos/Active User. Photos per active user over time, with a linear trend.]
[Figure: Core Driver: Photos per Day. Photos and Photos Forecast over time.]
Capacity Threshold
•  Primary resource scalability threshold
  Determined by load testing
    Synthetic load
    Replaying production traffic
    Real-time production traffic
  Test systems may be
    Isolated replicas of production
    Staging systems in production
    Production systems
[Figure: Average Response Times vs CPU. Service response time plotted against CPU, with one point marked X.]
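One possible way to turn load-test results into a threshold is to take the highest CPU utilization reached before response time exceeds an acceptable limit. The talk does not say how the threshold was picked, so the rule, the response-time limit, and the sample data below are all assumptions.

```python
def capacity_threshold(samples, max_response_ms):
    """Highest CPU level seen before response time first exceeds the limit.

    samples is a list of (cpu_percent, response_ms) pairs from a load test.
    Returns None if even the lowest CPU level breaches the limit.
    """
    threshold = None
    for cpu, response_ms in sorted(samples):
        if response_ms > max_response_ms:
            break
        threshold = cpu
    return threshold

# Hypothetical load-test samples: (CPU %, average response time in ms).
samples = [(20, 48), (40, 50), (60, 55), (70, 70), (80, 120), (90, 400)]
print(capacity_threshold(samples, max_response_ms=100))  # 70
```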
Hardware Demand
•  Core driver → capacity threshold → scaling formula → server count
•  Example
  Core driver: Requests per Second
  Per server request throughput determined by capacity threshold
  Scaling formula for Sizing
    Number of Servers = (RPS) / Per Server Threshold
[Figure: Core driver (RPS) and server count over time: RPS (Actuals), RPS (Forecast), # Servers (Actuals), # Servers (Forecast).]
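The scaling formula can be applied to each point of an RPS forecast. A minimal sketch: rounding up to whole servers is an added assumption, and the forecast values and the per-server threshold of 400 RPS are hypothetical.

```python
import math

def server_count(rps, per_server_threshold):
    """Number of Servers = RPS / Per Server Threshold, rounded up."""
    return math.ceil(rps / per_server_threshold)

# Hypothetical RPS forecast and per-server threshold.
rps_forecast = [12000, 13500, 15000]
counts = [server_count(rps, 400) for rps in rps_forecast]
print(counts)  # [30, 34, 38]
```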
Statistical Approach to Capacity Modeling
Capacity Planning Methodology
•  Predict expected value based on historical and temporal statistical analysis
  Metrics
    Average, standard deviation, 95th and 99th percentiles
  Techniques
    Moving Average – EMA (exponential moving average)
    Correlation
    β analysis
    MACD
    Forecasting - ARIMA
•  Limitations
  Changing usage patterns
    Organic growth, behavioral, cultural
  Event driven
    Super Bowl: how will the game turn out?
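Two of the building blocks listed above, the EMA and the high percentiles, can be sketched in a few lines. The sketch assumes the common smoothing factor alpha = 2 / (span + 1) and the nearest-rank percentile definition; the talk does not specify either, and the sample data is hypothetical.

```python
import math

def ema(values, span):
    """Exponential moving average with alpha = 2 / (span + 1)."""
    alpha = 2 / (span + 1)
    out = [values[0]]
    for v in values[1:]:
        # Move the previous average a fraction alpha toward the new value.
        out.append(out[-1] + alpha * (v - out[-1]))
    return out

def percentile(values, p):
    """Nearest-rank percentile for an integer p between 1 and 100."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]

# Hypothetical per-minute CPU utilization samples.
cpu = [35, 40, 38, 55, 90, 42, 39, 41, 37, 60]
print(ema(cpu, span=3))
print(percentile(cpu, 95), percentile(cpu, 99))
```

The percentiles capture peaks that an average would hide, which matters when capacity has to cover the busy periods rather than the typical ones.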
Capacity Planning Methodology (contd.)
•  Correlation Analysis
  Assess the relation between resource metric(s) and core driver
  Caution: Correlation does not imply causation 
[Figure: Core Driver, Network, and CPU over time.]
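The relation between a resource metric and a core driver can be quantified with the Pearson correlation coefficient. A minimal sketch in plain Python; the series below are hypothetical, and a high coefficient still says nothing about causation.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical hourly samples of a core driver (RPS) and CPU utilization (%).
rps = [1000, 1200, 1500, 1800, 2100, 1900]
cpu = [22, 25, 31, 38, 45, 40]
print(round(pearson(rps, cpu), 3))
```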
Capacity Planning Methodology (contd.)
•  Correlation matrix
  Capture interactions in a Service Oriented Architecture (SOA)
  Other Use: User engagement
Core Driver Correlations
                 1     2     3     4     5     6     7
Core Driver 1    1
Core Driver 2    0.95  1
Core Driver 3    0.99  0.89  1
Core Driver 4    0.98  0.95  0.97  1
Core Driver 5    0.97  0.87  0.99  0.94  1
Core Driver 6    0.94  0.98  0.88  0.95  0.85  1
Core Driver 7    0.81  0.86  0.75  0.8   0.71  0.79  1
Capacity Planning Methodology (contd.)
•  Correlation varies over time
  Growing user base
  New products, features
•  Rolling correlation analysis – capture time varying nature
  Raw time series
  EMA
  Challenge: What should be the window width?
[Figure: Rolling Correlation over time.]
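Rolling correlation recomputes the coefficient over a sliding window, so the window width becomes an explicit parameter to tune. A minimal sketch on raw time series, with hypothetical data; returning NaN for a constant window is a choice made here, not something from the talk.

```python
import math

def pearson(xs, ys):
    """Pearson correlation; returns NaN when either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return math.nan
    return cov / math.sqrt(var_x * var_y)

def rolling_correlation(xs, ys, window):
    """Correlation over each run of `window` consecutive points."""
    return [
        pearson(xs[i - window:i], ys[i - window:i])
        for i in range(window, len(xs) + 1)
    ]

# Hypothetical series whose relationship flips halfway through.
driver = [1, 2, 3, 4, 5, 6]
resource = [2, 4, 6, 5, 4, 3]
print(rolling_correlation(driver, resource, window=3))
```

A narrow window reacts quickly but is noisy; a wide window is smoother but slow to show a change in the relationship.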
Capacity Planning Methodology (contd.)
•  Relative Growth: β Analysis
  How does INTC move with respect to the S&P 500?
[Figure: Daily returns of S&P 500 and INTC, x-axis from 12/13/08 to 5/9/09, y-axis from -6.00% to 8.00%. β: 1.35]
Capacity Planning Methodology (contd.)
•  Relative Growth: β Analysis
  Relative growth of a core driver and a resource driver
[Figure: Core Driver and Resource over time, each on an axis from 0 to 1600. β: 1.08]
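In finance, β is the covariance of an asset's returns with the benchmark's returns divided by the variance of the benchmark's returns. The sketch below applies the same definition to capacity data, treating the resource as the asset and the core driver as the benchmark; that mapping and the sample series are assumptions, not taken from the talk.

```python
def returns(series):
    """Period-over-period relative change of a series."""
    return [(b - a) / a for a, b in zip(series, series[1:])]

def beta(asset, benchmark):
    """beta = cov(asset returns, benchmark returns) / var(benchmark returns)."""
    ra, rb = returns(asset), returns(benchmark)
    n = len(ra)
    mean_a, mean_b = sum(ra) / n, sum(rb) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(ra, rb))
    var = sum((b - mean_b) ** 2 for b in rb)
    return cov / var

# Hypothetical series: the resource moves twice as much as the core driver.
core_driver = [100, 110, 99, 118.8]
resource = [50, 60, 48, 67.2]
print(round(beta(resource, core_driver), 3))  # 2.0
```

A β above 1 means the resource grows faster than the core driver in relative terms; below 1 means it grows more slowly.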
Capacity Planning Methodology (contd.)
•  β varies over time
  New products, features 
  New metric to log
[Figure: Rolling Beta over time.]
Capacity Planning Methodology (contd.)
•  Growth: Detecting breakout
  MACD: Moving Average Convergence Divergence
    Difference of n- and m-width EMAs, n > m
  Diverging EMAs
o  Commonly used as a buy/sell signal in the context of a stock
o  Early detection of potential capacity ask
[Figure: "MACD" and MACD Signal over time.]
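A minimal MACD sketch follows. It uses the convention from technical analysis: the MACD line is the short-width EMA minus the long-width EMA, and the signal line is an EMA of the MACD line. The widths 12, 26, and 9 are the conventional defaults, not values from the talk, and the `crossovers` helper and sample series are hypothetical.

```python
def ema(values, span):
    """Exponential moving average with alpha = 2 / (span + 1)."""
    alpha = 2 / (span + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(out[-1] + alpha * (v - out[-1]))
    return out

def macd(values, fast=12, slow=26, signal=9):
    """Return (macd_line, signal_line) for a series."""
    fast_ema = ema(values, fast)
    slow_ema = ema(values, slow)
    macd_line = [f - s for f, s in zip(fast_ema, slow_ema)]
    return macd_line, ema(macd_line, signal)

def crossovers(macd_line, signal_line):
    """Indices where the MACD line crosses above its signal line."""
    return [
        i for i in range(1, len(macd_line))
        if macd_line[i - 1] <= signal_line[i - 1] and macd_line[i] > signal_line[i]
    ]

# Hypothetical core driver: flat for 30 periods, then steady growth.
series = [100] * 30 + [100 + 5 * i for i in range(1, 21)]
macd_line, signal_line = macd(series)
print(crossovers(macd_line, signal_line))  # [30]
```

The crossover is flagged in the first period of growth, which is the kind of early signal that could prompt a capacity request before the trend is obvious in the raw series.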
Acknowledgements
•  Winston Lee, Capacity Engineer, Twitter
•  Management team
Join the Flock
•  We are hiring!!
  https://twitter.com/JoinTheFlock
  https://twitter.com/jobs
  Contact us: @bryce_yan, @arun_kejariwal
Like problem solving?
Like challenges?
Be at the cutting edge
Make an impact
1 of 23

Recommended

Techniques for Minimizing Cloud Footprint by
Techniques for Minimizing Cloud FootprintTechniques for Minimizing Cloud Footprint
Techniques for Minimizing Cloud FootprintArun Kejariwal
1.4K views17 slides
Isolating Events from the Fail Whale by
Isolating Events from the Fail WhaleIsolating Events from the Fail Whale
Isolating Events from the Fail WhaleArun Kejariwal
2K views33 slides
Gimme More! Supporting User Growth in a Performant and Efficient Fashion by
Gimme More! Supporting User Growth in a Performant and Efficient FashionGimme More! Supporting User Growth in a Performant and Efficient Fashion
Gimme More! Supporting User Growth in a Performant and Efficient FashionArun Kejariwal
2.3K views29 slides
Finding bad apples early: Minimizing performance impact by
Finding bad apples early: Minimizing performance impactFinding bad apples early: Minimizing performance impact
Finding bad apples early: Minimizing performance impactArun Kejariwal
1.1K views30 slides
Data Data Everywhere: Not An Insight to Take Action Upon by
Data Data Everywhere: Not An Insight to Take Action UponData Data Everywhere: Not An Insight to Take Action Upon
Data Data Everywhere: Not An Insight to Take Action UponArun Kejariwal
1.5K views37 slides
Intelligent Production: Deploying IoT and cloud-based machine learning to opt... by
Intelligent Production: Deploying IoT and cloud-based machine learning to opt...Intelligent Production: Deploying IoT and cloud-based machine learning to opt...
Intelligent Production: Deploying IoT and cloud-based machine learning to opt...Amazon Web Services
2.4K views31 slides

More Related Content

What's hot

MapR Edge : Act Locally Learn Globally by
MapR Edge : Act Locally Learn GloballyMapR Edge : Act Locally Learn Globally
MapR Edge : Act Locally Learn Globallyridhav
695 views18 slides
High Performance Computing by
High Performance ComputingHigh Performance Computing
High Performance ComputingNous Infosystems
279 views7 slides
What's New in 6.3 + Data On-Boarding by
What's New in 6.3 + Data On-BoardingWhat's New in 6.3 + Data On-Boarding
What's New in 6.3 + Data On-BoardingSplunk
768 views46 slides
Going Server-less for Web-Services that need to Crunch Large Volumes of Data by
Going Server-less for Web-Services that need to Crunch Large Volumes of DataGoing Server-less for Web-Services that need to Crunch Large Volumes of Data
Going Server-less for Web-Services that need to Crunch Large Volumes of DataDenis C. Bauer
176 views29 slides
Splunk Ninjas: New Features and Search Dojo by
Splunk Ninjas: New Features and Search DojoSplunk Ninjas: New Features and Search Dojo
Splunk Ninjas: New Features and Search DojoSplunk
859 views49 slides
Anomaly detection in real-time data streams using Heron by
Anomaly detection in real-time data streams using HeronAnomaly detection in real-time data streams using Heron
Anomaly detection in real-time data streams using HeronArun Kejariwal
4.7K views49 slides

What's hot(20)

MapR Edge : Act Locally Learn Globally by ridhav
MapR Edge : Act Locally Learn GloballyMapR Edge : Act Locally Learn Globally
MapR Edge : Act Locally Learn Globally
ridhav695 views
What's New in 6.3 + Data On-Boarding by Splunk
What's New in 6.3 + Data On-BoardingWhat's New in 6.3 + Data On-Boarding
What's New in 6.3 + Data On-Boarding
Splunk768 views
Going Server-less for Web-Services that need to Crunch Large Volumes of Data by Denis C. Bauer
Going Server-less for Web-Services that need to Crunch Large Volumes of DataGoing Server-less for Web-Services that need to Crunch Large Volumes of Data
Going Server-less for Web-Services that need to Crunch Large Volumes of Data
Denis C. Bauer176 views
Splunk Ninjas: New Features and Search Dojo by Splunk
Splunk Ninjas: New Features and Search DojoSplunk Ninjas: New Features and Search Dojo
Splunk Ninjas: New Features and Search Dojo
Splunk859 views
Anomaly detection in real-time data streams using Heron by Arun Kejariwal
Anomaly detection in real-time data streams using HeronAnomaly detection in real-time data streams using Heron
Anomaly detection in real-time data streams using Heron
Arun Kejariwal4.7K views
(BDT207) Use Streaming Analytics to Exploit Perishable Insights | AWS re:Inve... by Amazon Web Services
(BDT207) Use Streaming Analytics to Exploit Perishable Insights | AWS re:Inve...(BDT207) Use Streaming Analytics to Exploit Perishable Insights | AWS re:Inve...
(BDT207) Use Streaming Analytics to Exploit Perishable Insights | AWS re:Inve...
How novel compute technology transforms life science research by Denis C. Bauer
How novel compute technology transforms life science researchHow novel compute technology transforms life science research
How novel compute technology transforms life science research
Denis C. Bauer350 views
Streaming Architecture to Connect Everything (Including Hybrid Cloud) - Strat... by Mathieu Dumoulin
Streaming Architecture to Connect Everything (Including Hybrid Cloud) - Strat...Streaming Architecture to Connect Everything (Including Hybrid Cloud) - Strat...
Streaming Architecture to Connect Everything (Including Hybrid Cloud) - Strat...
Mathieu Dumoulin1.2K views
IRJET- Optimization of Completion Time through Efficient Resource Allocation ... by IRJET Journal
IRJET- Optimization of Completion Time through Efficient Resource Allocation ...IRJET- Optimization of Completion Time through Efficient Resource Allocation ...
IRJET- Optimization of Completion Time through Efficient Resource Allocation ...
IRJET Journal9 views
[Kubecon 2017 Austin, TX] How We Built a Framework at Twitter to Solve Servic... by Vinu Charanya
[Kubecon 2017 Austin, TX] How We Built a Framework at Twitter to Solve Servic...[Kubecon 2017 Austin, TX] How We Built a Framework at Twitter to Solve Servic...
[Kubecon 2017 Austin, TX] How We Built a Framework at Twitter to Solve Servic...
Vinu Charanya203 views
Streaming Goes Mainstream: New Architecture & Emerging Technologies for Strea... by MapR Technologies
Streaming Goes Mainstream: New Architecture & Emerging Technologies for Strea...Streaming Goes Mainstream: New Architecture & Emerging Technologies for Strea...
Streaming Goes Mainstream: New Architecture & Emerging Technologies for Strea...
MapR Technologies1.4K views
Keynote 1 the rise of stream processing for data management & micro serv... by Sabri Skhiri
Keynote 1  the rise of stream processing for data management & micro serv...Keynote 1  the rise of stream processing for data management & micro serv...
Keynote 1 the rise of stream processing for data management & micro serv...
Sabri Skhiri402 views
Data-Drive DevOps: Mining Machine Data for "Metrics that Matter" by Splunk
Data-Drive DevOps: Mining Machine Data for "Metrics that Matter"Data-Drive DevOps: Mining Machine Data for "Metrics that Matter"
Data-Drive DevOps: Mining Machine Data for "Metrics that Matter"
Splunk4.4K views
Escaping Flatland: Interactive High-Dimensional Data Analysis in Drug Discove... by Spark Summit
Escaping Flatland: Interactive High-Dimensional Data Analysis in Drug Discove...Escaping Flatland: Interactive High-Dimensional Data Analysis in Drug Discove...
Escaping Flatland: Interactive High-Dimensional Data Analysis in Drug Discove...
Spark Summit2.2K views
Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018 by Sri Ambati
Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018
Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018
Sri Ambati460 views
Using Cloud CAE Delivered by AWS HPC to Optimize Next-Gen Medical Devices - B... by Amazon Web Services
Using Cloud CAE Delivered by AWS HPC to Optimize Next-Gen Medical Devices - B...Using Cloud CAE Delivered by AWS HPC to Optimize Next-Gen Medical Devices - B...
Using Cloud CAE Delivered by AWS HPC to Optimize Next-Gen Medical Devices - B...
Complex event processing platform handling millions of users - Krzysztof Zarz... by GetInData
Complex event processing platform handling millions of users - Krzysztof Zarz...Complex event processing platform handling millions of users - Krzysztof Zarz...
Complex event processing platform handling millions of users - Krzysztof Zarz...
GetInData66 views
SplunkLive! - Splunk for IT Operations by Splunk
SplunkLive! - Splunk for IT OperationsSplunkLive! - Splunk for IT Operations
SplunkLive! - Splunk for IT Operations
Splunk1.2K views

Viewers also liked

Design+Performance Velocity 2015 by
Design+Performance Velocity 2015Design+Performance Velocity 2015
Design+Performance Velocity 2015Steve Souders
18.3K views59 slides
Days In Green (DIG): Forecasting the life of a healthy service by
Days In Green (DIG): Forecasting the life of a healthy serviceDays In Green (DIG): Forecasting the life of a healthy service
Days In Green (DIG): Forecasting the life of a healthy serviceArun Kejariwal
793 views32 slides
Com t'ho explico by
Com t'ho explicoCom t'ho explico
Com t'ho explicoCESIRE - Dept d'Educació - GENCAT
642 views40 slides
Velocity 2015-final by
Velocity 2015-finalVelocity 2015-final
Velocity 2015-finalArun Kejariwal
2.1K views40 slides
Days In Green : Forecasting the Life of a Healthy Service @Twitter by
Days In Green : Forecasting the Life of a Healthy Service @TwitterDays In Green : Forecasting the Life of a Healthy Service @Twitter
Days In Green : Forecasting the Life of a Healthy Service @TwitterVibhav Garg
2K views32 slides
Tric y Trake 15 junio 1967 by
Tric y Trake  15 junio 1967Tric y Trake  15 junio 1967
Tric y Trake 15 junio 1967Martin Alberto Belaustegui
319 views82 slides

Viewers also liked(20)

Design+Performance Velocity 2015 by Steve Souders
Design+Performance Velocity 2015Design+Performance Velocity 2015
Design+Performance Velocity 2015
Steve Souders18.3K views
Days In Green (DIG): Forecasting the life of a healthy service by Arun Kejariwal
Days In Green (DIG): Forecasting the life of a healthy serviceDays In Green (DIG): Forecasting the life of a healthy service
Days In Green (DIG): Forecasting the life of a healthy service
Arun Kejariwal793 views
Days In Green : Forecasting the Life of a Healthy Service @Twitter by Vibhav Garg
Days In Green : Forecasting the Life of a Healthy Service @TwitterDays In Green : Forecasting the Life of a Healthy Service @Twitter
Days In Green : Forecasting the Life of a Healthy Service @Twitter
Vibhav Garg2K views
Mitigating User Experience from 'Breaking Bad': The Twitter Approach [Velocit... by Piyush Kumar
Mitigating User Experience from 'Breaking Bad': The Twitter Approach [Velocit...Mitigating User Experience from 'Breaking Bad': The Twitter Approach [Velocit...
Mitigating User Experience from 'Breaking Bad': The Twitter Approach [Velocit...
Piyush Kumar4.5K views
A Tool for Practical Garbage Collection Analysis In the Cloud by Arun Kejariwal
A Tool for Practical Garbage Collection Analysis In the CloudA Tool for Practical Garbage Collection Analysis In the Cloud
A Tool for Practical Garbage Collection Analysis In the Cloud
Arun Kejariwal3.4K views
Metrics, Metrics Everywhere (but where the heck do you start?) by SOASTA
Metrics, Metrics Everywhere (but where the heck do you start?)Metrics, Metrics Everywhere (but where the heck do you start?)
Metrics, Metrics Everywhere (but where the heck do you start?)
SOASTA4.3K views
Location Planning and Analysis by Iza Marie
Location Planning and AnalysisLocation Planning and Analysis
Location Planning and Analysis
Iza Marie 32.5K views
Simple Log Analysis and Trending by Mike Brittain
Simple Log Analysis and TrendingSimple Log Analysis and Trending
Simple Log Analysis and Trending
Mike Brittain10.5K views
Statistical Learning Based Anomaly Detection @ Twitter by Arun Kejariwal
Statistical Learning Based Anomaly Detection @ TwitterStatistical Learning Based Anomaly Detection @ Twitter
Statistical Learning Based Anomaly Detection @ Twitter
Arun Kejariwal5.1K views
6. process selection and facility layout by Sudipta Saha
6. process selection and facility layout6. process selection and facility layout
6. process selection and facility layout
Sudipta Saha19.9K views

Similar to A Systematic Approach to Capacity Planning in the Real World

Re-Platforming Applications for the Cloud by
Re-Platforming Applications for the CloudRe-Platforming Applications for the Cloud
Re-Platforming Applications for the CloudCarter Wickstrom
181 views34 slides
Automated Discovery of Performance Regressions in Enterprise Applications by
Automated Discovery of Performance Regressions in Enterprise ApplicationsAutomated Discovery of Performance Regressions in Enterprise Applications
Automated Discovery of Performance Regressions in Enterprise ApplicationsSAIL_QU
229 views58 slides
Putting the Ops in DataOps: Orchestrate the Flow of Data Across Data Pipelines by
Putting the Ops in DataOps: Orchestrate the Flow of Data Across Data PipelinesPutting the Ops in DataOps: Orchestrate the Flow of Data Across Data Pipelines
Putting the Ops in DataOps: Orchestrate the Flow of Data Across Data PipelinesDATAVERSITY
716 views30 slides
Shikha fdp 62_14july2017 by
Shikha fdp 62_14july2017Shikha fdp 62_14july2017
Shikha fdp 62_14july2017Dr. Shikha Mehta
214 views37 slides
Druid @ branch by
Druid @ branch Druid @ branch
Druid @ branch Biswajit Das
1.5K views15 slides
IRJET-Framework for Dynamic Resource Allocation and Efficient Scheduling Stra... by
IRJET-Framework for Dynamic Resource Allocation and Efficient Scheduling Stra...IRJET-Framework for Dynamic Resource Allocation and Efficient Scheduling Stra...
IRJET-Framework for Dynamic Resource Allocation and Efficient Scheduling Stra...IRJET Journal
26 views3 slides

Similar to A Systematic Approach to Capacity Planning in the Real World(20)

Re-Platforming Applications for the Cloud by Carter Wickstrom
Re-Platforming Applications for the CloudRe-Platforming Applications for the Cloud
Re-Platforming Applications for the Cloud
Carter Wickstrom181 views
Automated Discovery of Performance Regressions in Enterprise Applications by SAIL_QU
Automated Discovery of Performance Regressions in Enterprise ApplicationsAutomated Discovery of Performance Regressions in Enterprise Applications
Automated Discovery of Performance Regressions in Enterprise Applications
SAIL_QU229 views
Putting the Ops in DataOps: Orchestrate the Flow of Data Across Data Pipelines by DATAVERSITY
Putting the Ops in DataOps: Orchestrate the Flow of Data Across Data PipelinesPutting the Ops in DataOps: Orchestrate the Flow of Data Across Data Pipelines
Putting the Ops in DataOps: Orchestrate the Flow of Data Across Data Pipelines
DATAVERSITY716 views
Druid @ branch by Biswajit Das
Druid @ branch Druid @ branch
Druid @ branch
Biswajit Das1.5K views
IRJET-Framework for Dynamic Resource Allocation and Efficient Scheduling Stra... by IRJET Journal
IRJET-Framework for Dynamic Resource Allocation and Efficient Scheduling Stra...IRJET-Framework for Dynamic Resource Allocation and Efficient Scheduling Stra...
IRJET-Framework for Dynamic Resource Allocation and Efficient Scheduling Stra...
IRJET Journal26 views
performancetestinganoverview-110206071921-phpapp02.pdf by MAshok10
performancetestinganoverview-110206071921-phpapp02.pdfperformancetestinganoverview-110206071921-phpapp02.pdf
performancetestinganoverview-110206071921-phpapp02.pdf
MAshok106 views
Innovate2010 jazz keynote by oslc
Innovate2010 jazz keynoteInnovate2010 jazz keynote
Innovate2010 jazz keynote
oslc1.2K views
Web Performance Bootcamp 2014 by Daniel Austin
Web Performance Bootcamp 2014Web Performance Bootcamp 2014
Web Performance Bootcamp 2014
Daniel Austin1.1K views
Chapter 10 by bodo-con
Chapter 10Chapter 10
Chapter 10
bodo-con2.4K views
All about that reactive ui by Paul van Zyl
All about that reactive uiAll about that reactive ui
All about that reactive ui
Paul van Zyl351 views
3 Keys to Performance Testing at the Speed of Agile by Neotys
3 Keys to Performance Testing at the Speed of Agile3 Keys to Performance Testing at the Speed of Agile
3 Keys to Performance Testing at the Speed of Agile
Neotys36 views
Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog by Redis Labs
 Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog
Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog
Redis Labs2K views
FlorenceAI: Reinventing Data Science at Humana by Databricks
FlorenceAI: Reinventing Data Science at HumanaFlorenceAI: Reinventing Data Science at Humana
FlorenceAI: Reinventing Data Science at Humana
Databricks468 views
AWS July Webinar Series: Amazon Redshift Reporting and Advanced Analytics by Amazon Web Services
AWS July Webinar Series: Amazon Redshift Reporting and Advanced AnalyticsAWS July Webinar Series: Amazon Redshift Reporting and Advanced Analytics
AWS July Webinar Series: Amazon Redshift Reporting and Advanced Analytics
Amazon Web Services5.5K views
Web Performance BootCamp 2013 by Daniel Austin
Web Performance BootCamp 2013Web Performance BootCamp 2013
Web Performance BootCamp 2013
Daniel Austin1.7K views
Data Con LA 2018 - Enabling real-time exploration and analytics at scale at H... by Data Con LA
Data Con LA 2018 - Enabling real-time exploration and analytics at scale at H...Data Con LA 2018 - Enabling real-time exploration and analytics at scale at H...
Data Con LA 2018 - Enabling real-time exploration and analytics at scale at H...
Data Con LA291 views
A Novel Dynamic Priority Based Job Scheduling Approach for Cloud Environment by IRJET Journal
A Novel Dynamic Priority Based Job Scheduling Approach for Cloud EnvironmentA Novel Dynamic Priority Based Job Scheduling Approach for Cloud Environment
A Novel Dynamic Priority Based Job Scheduling Approach for Cloud Environment
IRJET Journal24 views

More from Arun Kejariwal

Anomaly Detection At The Edge by
Anomaly Detection At The EdgeAnomaly Detection At The Edge
Anomaly Detection At The EdgeArun Kejariwal
581 views54 slides
Serverless Streaming Architectures and Algorithms for the Enterprise by
Serverless Streaming Architectures and Algorithms for the EnterpriseServerless Streaming Architectures and Algorithms for the Enterprise
Serverless Streaming Architectures and Algorithms for the EnterpriseArun Kejariwal
2.8K views227 slides
Sequence-to-Sequence Modeling for Time Series by
Sequence-to-Sequence Modeling for Time SeriesSequence-to-Sequence Modeling for Time Series
Sequence-to-Sequence Modeling for Time SeriesArun Kejariwal
3.2K views64 slides
Sequence-to-Sequence Modeling for Time Series by
Sequence-to-Sequence Modeling for Time SeriesSequence-to-Sequence Modeling for Time Series
Sequence-to-Sequence Modeling for Time SeriesArun Kejariwal
1.9K views45 slides
Model Serving via Pulsar Functions by
Model Serving via Pulsar FunctionsModel Serving via Pulsar Functions
Model Serving via Pulsar FunctionsArun Kejariwal
1.7K views44 slides
Designing Modern Streaming Data Applications by
Designing Modern Streaming Data ApplicationsDesigning Modern Streaming Data Applications
Designing Modern Streaming Data ApplicationsArun Kejariwal
2.6K views227 slides

More from Arun Kejariwal(11)

Anomaly Detection At The Edge by Arun Kejariwal
Anomaly Detection At The EdgeAnomaly Detection At The Edge
Anomaly Detection At The Edge
Arun Kejariwal581 views
Serverless Streaming Architectures and Algorithms for the Enterprise by Arun Kejariwal
Serverless Streaming Architectures and Algorithms for the EnterpriseServerless Streaming Architectures and Algorithms for the Enterprise
Serverless Streaming Architectures and Algorithms for the Enterprise
Arun Kejariwal2.8K views
Sequence-to-Sequence Modeling for Time Series by Arun Kejariwal
Sequence-to-Sequence Modeling for Time SeriesSequence-to-Sequence Modeling for Time Series
Sequence-to-Sequence Modeling for Time Series
Arun Kejariwal3.2K views
Sequence-to-Sequence Modeling for Time Series by Arun Kejariwal
Sequence-to-Sequence Modeling for Time SeriesSequence-to-Sequence Modeling for Time Series
Sequence-to-Sequence Modeling for Time Series
Arun Kejariwal1.9K views
Model Serving via Pulsar Functions by Arun Kejariwal
Model Serving via Pulsar FunctionsModel Serving via Pulsar Functions
Model Serving via Pulsar Functions
Arun Kejariwal1.7K views
Designing Modern Streaming Data Applications by Arun Kejariwal
Designing Modern Streaming Data ApplicationsDesigning Modern Streaming Data Applications
Designing Modern Streaming Data Applications
Arun Kejariwal2.6K views
Correlation Analysis on Live Data Streams by Arun Kejariwal
Correlation Analysis on Live Data StreamsCorrelation Analysis on Live Data Streams
Correlation Analysis on Live Data Streams
Arun Kejariwal321 views
Deep Learning for Time Series Data by Arun Kejariwal
Deep Learning for Time Series DataDeep Learning for Time Series Data
Deep Learning for Time Series Data
Arun Kejariwal1.7K views
Correlation Analysis on Live Data Streams by Arun Kejariwal
Correlation Analysis on Live Data StreamsCorrelation Analysis on Live Data Streams
Correlation Analysis on Live Data Streams
Arun Kejariwal2.1K views
Real Time Analytics: Algorithms and Systems by Arun Kejariwal
Real Time Analytics: Algorithms and SystemsReal Time Analytics: Algorithms and Systems
Real Time Analytics: Algorithms and Systems
Arun Kejariwal23K views

Recently uploaded

Business Analyst Series 2023 - Week 4 Session 8 by
Business Analyst Series 2023 -  Week 4 Session 8Business Analyst Series 2023 -  Week 4 Session 8
Business Analyst Series 2023 - Week 4 Session 8DianaGray10
145 views13 slides
CloudStack Managed User Data and Demo - Harikrishna Patnala - ShapeBlue by
CloudStack Managed User Data and Demo - Harikrishna Patnala - ShapeBlueCloudStack Managed User Data and Demo - Harikrishna Patnala - ShapeBlue
CloudStack Managed User Data and Demo - Harikrishna Patnala - ShapeBlueShapeBlue
137 views13 slides
"Running students' code in isolation. The hard way", Yurii Holiuk by
"Running students' code in isolation. The hard way", Yurii Holiuk "Running students' code in isolation. The hard way", Yurii Holiuk
"Running students' code in isolation. The hard way", Yurii Holiuk Fwdays
36 views34 slides
Redefining the book supply chain: A glimpse into the future - Tech Forum 2023 by
Redefining the book supply chain: A glimpse into the future - Tech Forum 2023Redefining the book supply chain: A glimpse into the future - Tech Forum 2023
Redefining the book supply chain: A glimpse into the future - Tech Forum 2023BookNet Canada
44 views19 slides
Enabling DPU Hardware Accelerators in XCP-ng Cloud Platform Environment - And... by
Enabling DPU Hardware Accelerators in XCP-ng Cloud Platform Environment - And...Enabling DPU Hardware Accelerators in XCP-ng Cloud Platform Environment - And...
Enabling DPU Hardware Accelerators in XCP-ng Cloud Platform Environment - And...ShapeBlue
108 views12 slides
Hypervisor Agnostic DRS in CloudStack - Brief overview & demo - Vishesh Jinda... by
Hypervisor Agnostic DRS in CloudStack - Brief overview & demo - Vishesh Jinda...Hypervisor Agnostic DRS in CloudStack - Brief overview & demo - Vishesh Jinda...
Hypervisor Agnostic DRS in CloudStack - Brief overview & demo - Vishesh Jinda...ShapeBlue
164 views13 slides


A Systematic Approach to Capacity Planning in the Real World

  • 1. @Twitter | Velocity 2013. A Systematic Approach to Capacity Planning in the Real World. Bryce Yan, Arun Kejariwal (@bryce_yan, @arun_kejariwal), Capacity Engineering @ Twitter, June 2013
  • 2. User Experience
      •  Anytime, Anywhere, Any device
      •  Real-time performance
      •  Additional challenges
          -  Fault Tolerance
          -  Variability [2]
      [1] Xu et al. NSDI 2013 - https://www.usenix.org/system/files/conference/nsdi13/nsdi13-final77.pdf
      [2] http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/people/jeff/Berkeley-Latency-Mar2012.pdf
  • 3. Approaches to Capacity Planning
      •  Throw hardware at the problem
      •  Reactive approach
          o  How much?
          o  What kind? (Inventory management etc.)
      (Slide labels: Poor UX, Bottom line)
  • 4. Capacity Planning is Non-trivial
      •  Organic growth
          -  Over 200M monthly active users [1]
      •  Events planned or unplanned [2, 3]
          -  Events/incidents (e.g., Super Bowl ’13 blackout)
          -  Behavioral response
              o  Demographics, Cultural
              o  Retweets, Photos, Vines
          -  Tax different services/applications
              o  Different capacity requests
      [1] https://twitter.com/twitter/status/281051652235087872
      [2] http://arstechnica.com/information-technology/2012/10/hurricane-sandy-takes-data-centers-offline-with-flooding-power-outages/
      [3] http://www.zdnet.com/amazons-compute-cloud-has-a-networking-hiccup-7000005776/
  • 5. Capacity Planning is Non-trivial (cont’d)
      •  Evolving product development landscape
          -  New features
          -  New products
      •  New hardware platforms
          -  Purchase pipeline
          -  How much and when to buy: cost-performance trade-off
      •  Overall goal
          -  User Experience
          -  Operational footprint
  • 6. Capacity Modeling Overview
  • 7. Capacity Modeling
      •  Takes core drivers as inputs to generate usage demand
          -  Forecasts the amount of work based on core driver projections
      •  Relates the work metric to a primary resource to identify the capacity threshold
          -  Primary resources
              o  Computing power (CPU, RAM)
              o  Storage (disk I/O, disk space)
              o  Network (network bandwidth)
      •  Generates hardware demand based on the limiting primary resource
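The last bullet on this slide (hardware demand follows the limiting primary resource) can be sketched as below. This is a minimal illustration, not the presenters' tooling: the function name, the resource keys, and all the numbers are hypothetical.

```python
import math


def hardware_demand(demand, per_server_capacity):
    """Return (limiting_resource, server_count).

    Both arguments are dicts keyed by resource name (hypothetical keys).
    Each resource alone would require ceil(demand / capacity) servers;
    the resource that requires the most servers is the limiting one.
    """
    needed = {
        resource: math.ceil(demand[resource] / per_server_capacity[resource])
        for resource in demand
    }
    limiting = max(needed, key=needed.get)
    return limiting, needed[limiting]


# Hypothetical forecast demand and per-server capacity thresholds.
demand = {"cpu_cores": 900, "disk_tb": 120, "network_gbps": 40}
capacity = {"cpu_cores": 16, "disk_tb": 4, "network_gbps": 1}
print(hardware_demand(demand, capacity))
```

Sizing to the maximum across resources means the non-limiting resources end up with spare capacity, which is one reason the choice of hardware platform matters.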
  • 8. Core Drivers
      •  Underlying business metrics that drive demand for more capacity
          -  Active Users
          -  Tweets per second (TPS)
          -  Favorites per second (FPS)
          -  Requests per second (RPS)
      •  Normalized by Active Users to isolate user engagement
      •  Project user engagement and Active Users independently
  • 9. Core Drivers (cont’d)
      [Figure: Normalized Core Drivers for Engagement. Per-active-user values over time for Favorites (polynomial trend) and Retweets (linear trend).]
      [Figure: Active Users aka User Growth. Active user count over time with a linear trend.]
  • 10. Core Drivers (cont’d)
      [Figure: User Growth: Active Users. Active users over time with a linear trend.]
      [Figure: Engagement: Photos/Active User. Photos per active user over time with a linear trend.]
      [Figure: Core Driver: Photos per Day. Photos and Photos Forecast over time.]
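The core-driver slides project user growth and per-user engagement independently and then combine them into a core-driver forecast. A rough sketch of that idea follows, assuming plain least-squares linear trends for both series; the helper names and the sample history are hypothetical, and the slides also show a polynomial trend for some drivers, which this sketch does not cover.

```python
def fit_line(ys):
    """Least-squares line through (0, ys[0]), (1, ys[1]), ...

    Returns (slope, intercept). Needs at least two points.
    """
    n = len(ys)
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def forecast_core_driver(active_users, driver_totals, horizon):
    """Project users and per-user engagement separately, then multiply."""
    per_user = [d / u for d, u in zip(driver_totals, active_users)]
    u_slope, u_icpt = fit_line(active_users)
    e_slope, e_icpt = fit_line(per_user)
    n = len(active_users)
    return [
        (u_slope * t + u_icpt) * (e_slope * t + e_icpt)
        for t in range(n, n + horizon)
    ]


# Hypothetical history: users grow by 10 per period,
# photos per active user grow by 0.1 per period.
users = [100, 110, 120, 130]
photos = [100.0, 121.0, 144.0, 169.0]
print(forecast_core_driver(users, photos, horizon=2))
```

Keeping the two projections separate lets a change in engagement (for example, a new feature) be modeled without touching the user-growth assumption.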
  • 11. Capacity Threshold
      •  Primary resource scalability threshold
          -  Determined by load testing
              o  Synthetic load
              o  Replaying production traffic
              o  Real-time production traffic
          -  Test systems may be
              o  Isolated replicas of production
              o  Staging systems in production
              o  Production systems
      [Figure: Average Response Times vs CPU. Service response time plotted against CPU, with a point marked X.]
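One simple way to read a threshold off load-test results like the response-time-vs-CPU plot is sketched here. It assumes a response-time objective (`slo_ms`) is given; that parameter, the function name, and the sample measurements are hypothetical and not taken from the slides.

```python
def capacity_threshold(samples, slo_ms):
    """Pick the capacity threshold from load-test samples.

    samples: iterable of (cpu_percent, avg_response_ms) pairs.
    Returns the highest CPU level reached before the average response
    time first exceeds slo_ms, or None if the lowest level already does.
    """
    threshold = None
    for cpu, response_ms in sorted(samples):
        if response_ms > slo_ms:
            break
        threshold = cpu
    return threshold


# Hypothetical load-test results: response time climbs sharply past 60% CPU.
load_test = [(20, 50), (40, 55), (60, 70), (80, 200), (90, 900)]
print(capacity_threshold(load_test, slo_ms=100))
```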
  • 12. Hardware Demand
      •  Core driver → capacity threshold → scaling formula → server count
      •  Example
          -  Core driver: Requests per Second
          -  Per-server request throughput determined by capacity threshold
          -  Scaling formula for sizing
              o  Number of Servers = RPS / Per-Server Threshold
      [Figure: Core driver (RPS) and server count over time: RPS (Actuals), RPS (Forecast), # Servers (Actuals), # Servers (Forecast).]
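The scaling formula on this slide translates directly into code. In the sketch below, rounding up to whole servers and the optional `headroom` fraction are assumptions added for illustration; the slide gives only the ratio itself.

```python
import math


def servers_needed(rps, per_server_rps, headroom=0.0):
    """Number of servers = RPS / per-server threshold, rounded up.

    headroom is a hypothetical safety margin, e.g. 0.25 for 25% extra.
    """
    if per_server_rps <= 0:
        raise ValueError("per_server_rps must be positive")
    return math.ceil(rps * (1 + headroom) / per_server_rps)


# Hypothetical RPS forecast and a per-server threshold of 500 RPS.
rps_forecast = [10000, 12000, 12001]
print([servers_needed(rps, 500) for rps in rps_forecast])  # [20, 24, 25]
```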
  • 13. Statistical Approach to Capacity Modeling
  • 14. Capacity Planning Methodology
      •  Predict expected value based on historical and temporal statistical analysis
          -  Metrics
              o  Average, standard deviation, 95th and 99th percentiles
          -  Techniques
              o  Moving average: EMA (exponential moving average)
              o  Correlation
              o  β analysis
              o  MACD
              o  Forecasting: ARIMA
      •  Limitations
          -  Changing usage patterns
              o  Organic growth, behavioral, cultural
          -  Event driven
              o  Super Bowl: how will the game turn out?
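Two of the building blocks listed on this slide, the EMA and the high percentiles, can be sketched as follows. The smoothing factor 2 / (span + 1), seeding the EMA with the first observation, and the nearest-rank percentile definition are common conventions assumed here; the slides do not say which variants were used.

```python
import math


def ema(values, span):
    """Exponential moving average with alpha = 2 / (span + 1).

    Seeded with the first observation (an assumed convention).
    """
    alpha = 2 / (span + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(out[-1] + alpha * (v - out[-1]))
    return out


def percentile(values, p):
    """Nearest-rank percentile for 0 < p <= 100."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical samples of a resource metric.
latencies_ms = list(range(1, 101))
print(ema([10, 20, 30], span=3))
print(percentile(latencies_ms, 95), percentile(latencies_ms, 99))
```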
  • 15. Capacity Planning Methodology (contd.)
      •  Correlation Analysis
          -  Assess the relation between resource metric(s) and core driver
          -  Caution: Correlation does not imply causation
      [Figure: Core Driver, Network, and CPU plotted over time.]
  • 16. Capacity Planning Methodology (contd.)
      •  Correlation matrix
          -  Capture interactions in a Service Oriented Architecture (SOA)
          -  Other use: User engagement
      Core Driver Correlations (upper triangle as extracted; CD = Core Driver)
              CD1   CD2   CD3   CD4   CD5   CD6   CD7
        CD1   1     0.95  0.99  0.98  0.97  0.94  0.81
        CD2         1     0.89  0.95  0.87  0.98  0.86
        CD3               1     0.97  0.99  0.88  0.75
        CD4                     1     0.94  0.95  0.8
        CD5                           1     0.85  0.71
        CD6                                 1     0.79
        CD7                                       1
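A correlation matrix like the one on this slide can be computed from the raw core-driver series. The sketch below assumes the Pearson correlation coefficient (the slides do not name the measure) and uses made-up series names and values.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(series):
    """series: dict of name -> list of values. Returns a nested dict."""
    names = list(series)
    return {a: {b: pearson(series[a], series[b]) for b in names} for a in names}


# Hypothetical core-driver samples taken at the same timestamps.
drivers = {
    "tps": [1, 2, 3, 4],
    "fps": [2, 4, 6, 8],
    "rps": [4, 3, 2, 1],
}
for name, row in correlation_matrix(drivers).items():
    print(name, {other: round(r, 2) for other, r in row.items()})
```

The matrix is symmetric with ones on the diagonal, so only one triangle needs to be shown.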
  • 17. Capacity Planning Methodology (contd.)
      •  Correlation varies over time
          -  Growing user base
          -  New products, features
      •  Rolling correlation analysis: capture time-varying nature
          -  Raw time series
          -  EMA
          -  Challenge: What should be the window width?
      [Figure: Rolling Correlation over time.]
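Rolling correlation recomputes the coefficient over a sliding window, so that shifts in the relationship show up over time. A minimal sketch, assuming Pearson correlation on the raw series and a fixed window width; the function names and sample data are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; NaN if either window is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def rolling_correlation(xs, ys, window):
    """One coefficient per full window, ending at each successive sample."""
    return [
        pearson(xs[end - window:end], ys[end - window:end])
        for end in range(window, len(xs) + 1)
    ]


# Hypothetical series: the resource tracks the driver at first, then diverges.
driver = [1, 2, 3, 4, 5, 6]
resource = [2, 4, 6, 5, 4, 3]
print(rolling_correlation(driver, resource, window=3))
```

The window width is the open question the slide raises: a narrow window reacts quickly but is noisy, while a wide window is smoother but slow to reflect a change.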
  • 18. Capacity Planning Methodology (contd.)
      •  Relative Growth: β Analysis
          -  How does INTC move with respect to the S&P 500?
      [Figure: Daily returns of S&P 500 and INTC, 12/13/08 to 5/9/09, y-axis from -6.00% to 8.00%. β: 1.35]
  • 19. Capacity Planning Methodology (contd.)
      •  Relative Growth: β Analysis
          -  Relative growth of a core driver and a resource driver
      [Figure: Core Driver and Resource over time, each axis from 0 to 1600. β: 1.08]
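By analogy with the stock example, β can be estimated as the covariance of the resource's period-over-period changes with the core driver's changes, divided by the variance of the driver's changes. The sketch below assumes that standard finance definition applied to percentage changes; the slides do not spell out the exact computation, and the sample series are made up.

```python
def pct_changes(values):
    """Period-over-period fractional changes of a series."""
    return [(b - a) / a for a, b in zip(values, values[1:])]


def beta(resource, driver):
    """beta = Cov(resource changes, driver changes) / Var(driver changes)."""
    r = pct_changes(resource)
    d = pct_changes(driver)
    n = len(d)
    mean_r, mean_d = sum(r) / n, sum(d) / n
    cov = sum((x - mean_d) * (y - mean_r) for x, y in zip(d, r))
    var = sum((x - mean_d) ** 2 for x in d)
    return cov / var


# Hypothetical series: the resource grows 1.5x as fast as the driver.
driver = [100, 110, 132, 145.2]
resource = [100, 115, 149.5, 171.925]
print(round(beta(resource, driver), 3))
```

A β above 1 means the resource grows faster than the core driver, so capacity for it has to be added ahead of driver growth.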
  • 20. Capacity Planning Methodology (contd.)
      •  β varies over time
          -  New products, features
          -  New metric to log
      [Figure: Rolling Beta over time.]
  • 21. Capacity Planning Methodology (contd.)
      •  Growth: Detecting breakout
          -  MACD: Moving Average Convergence Divergence
              o  Difference of n- and m-width EMAs, n > m
          -  Diverging EMAs
              o  Commonly used as a buy/sell signal in the context of a stock
              o  Early detection of potential capacity ask
      [Figure: MACD and Signal lines over time.]
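A sketch of MACD-based breakout detection follows. It uses the usual convention of fast EMA minus slow EMA with a signal line that is an EMA of the MACD line, and the 12/26/9 widths that are conventional in finance; whether the presenters used these widths is not stated, and the `breakouts` helper and sample series are hypothetical.

```python
def ema(values, span):
    """Exponential moving average, alpha = 2 / (span + 1), seeded with values[0]."""
    alpha = 2 / (span + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(out[-1] + alpha * (v - out[-1]))
    return out


def macd(values, fast=12, slow=26, signal=9):
    """Return (macd_line, signal_line) for a series."""
    fast_ema = ema(values, fast)
    slow_ema = ema(values, slow)
    macd_line = [f - s for f, s in zip(fast_ema, slow_ema)]
    return macd_line, ema(macd_line, signal)


def breakouts(macd_line, signal_line):
    """Indices where the MACD line crosses above the signal line."""
    return [
        i
        for i in range(1, len(macd_line))
        if macd_line[i - 1] <= signal_line[i - 1] and macd_line[i] > signal_line[i]
    ]


# Hypothetical core driver: flat for 20 periods, then a steady ramp.
series = [100] * 20 + [100 + 10 * k for k in range(1, 11)]
macd_line, signal_line = macd(series)
print(breakouts(macd_line, signal_line))  # [20]
```

The crossover fires on the first period of the ramp, which is the kind of early signal the slide describes for a potential capacity ask.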
  • 22. Acknowledgements
      •  Winston Lee, Capacity Engineer, Twitter
      •  Management team
  • 23. Join the Flock
      •  We are hiring!!
          -  https://twitter.com/JoinTheFlock
          -  https://twitter.com/jobs
          -  Contact us: @bryce_yan, @arun_kejariwal
      Like problem solving? Like challenges? Be at the cutting edge. Make an impact.