A Systematic Approach to Capacity Planning in the Real World

The presentation walks through the high-level methodology and details some of the statistical approaches.

Published in: Technology


  1. @Twitter | Velocity 2013
     A Systematic Approach to Capacity Planning in the Real World
     Bryce Yan, Arun Kejariwal (@bryce_yan, @arun_kejariwal)
     Capacity Engineering @ Twitter, June 2013
  2. User Experience
     • Anytime, anywhere, any device
     • Real-time performance
     • Additional challenges: fault tolerance, variability [2]
     [1] Xu et al. NSDI 2013 - https://www.usenix.org/system/files/conference/nsdi13/nsdi13-final77.pdf
     [2] http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/people/jeff/Berkeley-Latency-Mar2012.pdf
  3. Approaches to Capacity Planning
     • Throw hardware at the problem
     • Reactive approach
       o How much?
       o What kind? (Inventory management etc.)
     Figure labels: Poor UX, Bottom line
  4. Capacity Planning is Non-trivial
     • Organic growth
       - Over 200M monthly active users [1]
     • Events, planned or unplanned [2, 3]
       - Events/incidents (e.g., Super Bowl '13 blackout)
       - Behavioral response
         o Demographics, cultural
         o Retweets, Photos, Vines
       - Tax different services/applications
         o Different capacity requests
     [1] https://twitter.com/twitter/status/281051652235087872
     [2] http://arstechnica.com/information-technology/2012/10/hurricane-sandy-takes-data-centers-offline-with-flooding-power-outages/
     [3] http://www.zdnet.com/amazons-compute-cloud-has-a-networking-hiccup-7000005776/
  5. Capacity Planning is Non-trivial (cont'd)
     • Evolving product development landscape
       - New features
       - New products
     • New hardware platforms
       - Purchase pipeline
       - How much and when to buy: cost-performance trade-off
     • Overall goal
     Figure labels: User Experience, Operational footprint
  6. Capacity Modeling Overview
  7. Capacity Modeling
     • Takes core drivers as inputs to generate usage demand
       - Forecasts the amount of work based on core driver projections
     • Relates the work metric to a primary resource to identify the capacity threshold
       - Primary resources
         o Computing power (CPU, RAM)
         o Storage (disk I/O, disk space)
         o Network (network bandwidth)
     • Generates hardware demand based on the limiting primary resource
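The last step on the Capacity Modeling slide, sizing hardware on the limiting primary resource, can be sketched as follows. This is only an illustration: the resource names, the demand figures, and the per-server capacities are made-up assumptions, and rounding up to whole servers is an added choice that the slide does not state.

```python
import math


def limiting_resource(demand, per_server_capacity):
    """Return (resource, servers) for the resource that needs the most servers.

    Both arguments are dicts keyed by resource name (hypothetical names below).
    """
    needed = {
        name: math.ceil(demand[name] / per_server_capacity[name])
        for name in demand
    }
    resource = max(needed, key=needed.get)
    return resource, needed[resource]


# Made-up forecast demand and per-server capacity thresholds.
demand = {"cpu_cores": 960, "ram_gb": 3000, "network_gbps": 40}
per_server = {"cpu_cores": 32, "ram_gb": 128, "network_gbps": 10}

resource, servers = limiting_resource(demand, per_server)
print(resource, servers)
```

With these numbers CPU is the limiting resource, so the server count is driven by CPU demand even though RAM and network would be satisfied with fewer machines.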
  8. Core Drivers
     • Underlying business metrics that drive demand for more capacity
       - Active Users
       - Tweets per second (TPS)
       - Favorites per second (FPS)
       - Requests per second (RPS)
     • Normalized by Active Users to isolate user engagement
     • Project user engagement and Active Users independently
  9. Core Drivers (cont'd)
     Figure: Active Users aka User Growth (Active User Count vs Time; series: Active Users, Linear (Active Users))
     Figure: Normalized Core Drivers for Engagement (Per Active User Values vs Time; series: Favorites, Retweets, Poly. (Favorites), Linear (Retweets))
  10. Core Drivers (cont'd)
      Figure: User Growth: Active Users (vs Time; series: Active Users, Linear (Active Users))
      Figure: Engagement: Photos/Active User (vs Time; series: Photos, Linear (Photos))
      Figure: Core Driver: Photos per Day (vs Time; series: Photos, Photos Forecast)
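The three charts on this slide suggest projecting user growth and per-user engagement separately and multiplying the two to forecast the core driver. A minimal sketch of that idea, assuming simple linear trends for both series (the function names and the data are hypothetical, and the talk does not say which fitting method was actually used):

```python
def fit_line(ys):
    """Least-squares line through (0, ys[0]), (1, ys[1]), ...

    Returns (slope, intercept).
    """
    n = len(ys)
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def forecast_core_driver(active_users, per_user_rate, horizon):
    """Project users and engagement independently, then multiply them."""
    n = len(active_users)
    u_slope, u_icpt = fit_line(active_users)
    e_slope, e_icpt = fit_line(per_user_rate)
    return [
        (u_slope * t + u_icpt) * (e_slope * t + e_icpt)
        for t in range(n, n + horizon)
    ]


# Made-up history: active users and photos per active user per day.
users = [100.0, 110.0, 120.0, 130.0]
photos_per_user = [0.50, 0.52, 0.54, 0.56]

photos_forecast = forecast_core_driver(users, photos_per_user, horizon=2)
print(photos_forecast)
```

Keeping the two projections separate means a change in engagement (for example after a feature launch) can be modeled without refitting the user growth trend.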
  11. Capacity Threshold
      • Primary resource scalability threshold
        - Determined by load testing
          o Synthetic load
          o Replaying production traffic
          o Real-time production traffic
        - Test systems may be
          o Isolated replicas of production
          o Staging systems in production
          o Production systems
      Figure: Average Response Times vs CPU (Service Response Time vs CPU; threshold marked with an X)
  12. Hardware Demand
      • Core driver → capacity threshold → scaling formula → server count
      • Example
        - Core driver: Requests per Second
        - Per server request throughput determined by capacity threshold
        - Scaling formula for sizing
          o Number of Servers = (RPS) / Per Server Threshold
      Figure: Core Driver (RPS) / Server Count vs Time (series: RPS (Actuals), RPS (Forecast), # Servers (Actuals), # Servers (Forecast))
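The scaling formula on this slide translates directly into code. The sketch below applies it to a made-up RPS forecast; the per-server threshold of 1,500 RPS is a hypothetical value, and rounding up to a whole server is an assumption added here, since the slide gives only the ratio.

```python
import math


def servers_needed(rps, per_server_threshold):
    """Number of Servers = RPS / Per Server Threshold, rounded up."""
    if per_server_threshold <= 0:
        raise ValueError("per_server_threshold must be positive")
    return math.ceil(rps / per_server_threshold)


# Hypothetical per-server throughput found by load testing.
PER_SERVER_RPS = 1500

# Made-up RPS forecast for three future periods.
rps_forecast = [42000, 45500, 51000]
server_forecast = [servers_needed(r, PER_SERVER_RPS) for r in rps_forecast]
print(server_forecast)
```

This sketch adds no headroom for failures or traffic spikes; any such buffer would be layered on top of the raw count.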
  13. Statistical Approach to Capacity Modeling
  14. Capacity Planning Methodology
      • Predict expected value based on historical and temporal statistical analysis
        - Metrics
          o Average, standard deviation, 95th and 99th percentiles
        - Techniques
          o Moving average: EMA (exponential moving average)
          o Correlation
          o β analysis
          o MACD
          o Forecasting: ARIMA
      • Limitations
        - Changing usage patterns
          o Organic growth, behavioral, cultural
        - Event driven
          o Super Bowl: how would the game turn out?
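The summary metrics named on this slide (average, standard deviation, 95th and 99th percentiles) can be computed with the Python standard library. This is a sketch under assumptions: the percentile uses the nearest-rank definition, which is one of several common conventions, and the sample data is made up.

```python
import math
import statistics


def percentile(values, p):
    """Nearest-rank percentile for p in (0, 100]."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def summarize(values):
    """Average, sample standard deviation, and tail percentiles."""
    return {
        "mean": statistics.fmean(values),
        "stdev": statistics.stdev(values),
        "p95": percentile(values, 95),
        "p99": percentile(values, 99),
    }


# Made-up resource samples: the values 1.0 through 100.0.
cpu_samples = [float(v) for v in range(1, 101)]
summary = summarize(cpu_samples)
print(summary)
```

The tail percentiles matter for capacity work because sizing on the average alone hides the peaks that users actually experience.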
  15. Capacity Planning Methodology (contd.)
      • Correlation analysis
        - Assess the relation between resource metric(s) and core driver
        - Caution: correlation does not imply causation
      Figure: Core Driver, Network, and CPU plotted over Time
  16. Capacity Planning Methodology (contd.)
      • Correlation matrix
        - Capture interactions in a Service Oriented Architecture (SOA)
        - Other use: user engagement
      Core Driver Correlations (CD = Core Driver):
             CD1   CD2   CD3   CD4   CD5   CD6   CD7
      CD1    1     0.95  0.99  0.98  0.97  0.94  0.81
      CD2          1     0.89  0.95  0.87  0.98  0.86
      CD3                1     0.97  0.99  0.88  0.75
      CD4                      1     0.94  0.95  0.81
      CD5                            1     0.85  0.71
      CD6                                  1     0.79
      CD7                                        1
  17. Capacity Planning Methodology (contd.)
      • Correlation varies over time
        - Growing user base
        - New products, features
      • Rolling correlation analysis: captures the time-varying nature
        - Raw time series
        - EMA
        - Challenge: what should the window width be?
      Figure: Rolling Correlation over Time
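Rolling correlation between a core driver and a resource metric can be sketched as a Pearson coefficient computed over a sliding window of the raw series. The window width of 4 and both series are made-up assumptions; as the slide notes, choosing the width is itself an open question.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def rolling_correlation(xs, ys, window):
    """Correlation over each trailing window of the given width."""
    return [
        pearson(xs[i - window:i], ys[i - window:i])
        for i in range(window, len(xs) + 1)
    ]


# Made-up core driver and CPU utilization series.
core_driver = [10.0, 12.0, 15.0, 14.0, 18.0, 21.0, 20.0, 25.0]
cpu = [31.0, 35.0, 41.0, 40.0, 47.0, 52.0, 55.0, 50.0]

rolling = rolling_correlation(core_driver, cpu, window=4)
print(rolling)
```

Note that `pearson` divides by zero if either window is constant, so a production version would need to guard against flat stretches in the data.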
  18. Capacity Planning Methodology (contd.)
      • Relative growth: β analysis
        - How does INTC move with respect to the S&P 500?
      Figure: Daily Returns of S&P 500 and INTC, weekly from 12/13/08 to 5/9/09 (y-axis from -6.00% to 8.00%); β: 1.35
  19. Capacity Planning Methodology (contd.)
      • Relative growth: β analysis
        - Relative growth of a core driver and a resource driver
      Figure: Core Driver and Resource plotted over Time (both axes from 0 to 1600); β: 1.08
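In the stock analogy, β is conventionally the covariance of an asset's returns with the market's returns divided by the variance of the market's returns. A minimal sketch applying that same definition to period-over-period growth of a resource metric against a core driver (the data is made up, and the talk does not specify the exact computation used):

```python
def pct_changes(series):
    """Period-over-period relative change, analogous to daily returns."""
    return [(b - a) / a for a, b in zip(series, series[1:])]


def beta(resource, core_driver):
    """Covariance of the two growth series over variance of driver growth."""
    r = pct_changes(resource)
    d = pct_changes(core_driver)
    n = len(d)
    mr, md = sum(r) / n, sum(d) / n
    cov = sum((x - md) * (y - mr) for x, y in zip(d, r))
    var = sum((x - md) ** 2 for x in d)
    return cov / var


# Made-up series: the resource grows 1.5x as fast as the core driver.
core_driver = [100.0, 110.0, 132.0, 145.2]
resource = [200.0, 230.0, 299.0, 343.85]

print(beta(resource, core_driver))
```

A β above 1 indicates the resource is growing faster than the core driver, which is the signal that capacity for that resource will be needed sooner than the driver alone suggests.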
  20. Capacity Planning Methodology (contd.)
      • β varies over time
        - New products, features
        - New metric to log
      Figure: Rolling Beta over Time
  21. Capacity Planning Methodology (contd.)
      • Growth: detecting breakout
        - MACD: Moving Average Convergence Divergence
          o Difference of n- and m-width EMAs, n > m
        - Diverging EMAs
          o Commonly used as a buy/sell signal in the context of a stock
          o Early detection of a potential capacity ask
      Figure: "MACD" and MACD Signal over Time
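The MACD idea on this slide can be sketched as the difference between a short-width and a long-width EMA, with a further EMA of that difference as the signal line. The widths 12, 26, and 9 are the conventional stock-charting defaults, not values from the talk, and taking the short EMA minus the long EMA and flagging the first point where the MACD line rises above its signal are assumptions for illustration.

```python
def ema(values, span):
    """Exponential moving average with smoothing factor 2 / (span + 1)."""
    alpha = 2 / (span + 1)
    out = [values[0]]
    for v in values[1:]:
        prev = out[-1]
        out.append(prev + alpha * (v - prev))
    return out


def macd(values, short=12, long=26, signal=9):
    """Return (MACD line, signal line) for a series."""
    fast = ema(values, short)
    slow = ema(values, long)
    line = [f - s for f, s in zip(fast, slow)]
    return line, ema(line, signal)


# Made-up core driver: flat for 30 periods, then steady growth.
series = [100.0] * 30 + [100.0 + 5.0 * i for i in range(1, 21)]
macd_line, signal_line = macd(series)

# First index where the MACD line rises above its signal line.
breakout = next(
    (i for i, (m, s) in enumerate(zip(macd_line, signal_line)) if m > s),
    None,
)
print(breakout)
```

Here the crossover is flagged at the first period of growth, which illustrates why diverging EMAs can serve as an early indicator of a coming capacity request.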
  22. Acknowledgements
      • Winston Lee, Capacity Engineer, Twitter
      • Management team
  23. Join the Flock
      • We are hiring!
        - https://twitter.com/JoinTheFlock
        - https://twitter.com/jobs
        - Contact us: @bryce_yan, @arun_kejariwal
      Like problem solving? Like challenges? Be at the cutting edge. Make an impact.
