Gimme More! Supporting User Growth in a Performant and Efficient Fashion

Published in: Technology, Economy & Finance

  1. Gimme More! Supporting User Growth in a Performant and Efficient Fashion
     Arun Kejariwal (@arun_kejariwal), Winston Lee (@winstl)
     Capacity Engineering @ Twitter, November 2013
     @Twitter
  2. User Experience
     • Anytime, anywhere, any device
       - 5.2 billion mobile users by 2017 [1]
       - More than 10 billion mobile devices/connections by 2017 [1]
       - Worldwide mobile data traffic will reach 11.2 exabytes/month by 2017 (13x increase) [1]
     • Real-time performance
     [1] http://newsroom.cisco.com/release/1135354 (Feb. 5, 2013)
  3. Capacity Planning: Why bother?
     • Organic growth
       - Over 230M monthly active users [1]
     • User engagement
     • Evolving product landscape
       - Cards, Photos, Vines
         - Mobile video will increase 16-fold between 2012 and 2017 [2]
     • Events, planned or unplanned
     [1] http://www.sec.gov/Archives/edgar/data/1418091/000119312513400028/d564001ds1a.htm
     [2] http://www.cisco.com/en/US/solutions/collateral/ns341/ns525/ns537/ns705/ns827/white_paper_c11-520862.html
  4. Approaches to Capacity Planning
     • Throw hardware at the problem
       - How much?
       - What kind? (Inventory management etc.)
       - Operationally inefficient!
     • Reactive approach
     Callouts: Bottom line; Poor UX
  5. Systematic Capacity Planning
     • Objectives
       - Check under-allocation
         - Performance
         - Availability
           - Adversely impacts user experience
       - Check over-allocation
         - Operational efficiency
           - Adversely impacts the bottom line
     • Determine the capacity needed proactively via forecasting
       - Business metrics
       - System resource usage
  6. Systematic Capacity Planning: Forecasting
     • Key questions
       - Which data?
         - Raw
         - Periodic max
         - Moving average
       - Data granularity
         - Minutely
         - Daily
           - Depends
       - Which model? (Non-trivial!)
         - Linear
         - Spline
         - Holt-Winters
         - ARIMA
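The "Which data?" options on this slide can be illustrated with a small sketch. It assumes the input is a plain list of evenly spaced samples; the function names, the window sizes, and the sample values are hypothetical and do not come from the talk.

```python
def periodic_max(values, period):
    """Max of each consecutive, non-overlapping window of `period` samples."""
    if period < 1:
        raise ValueError("period must be at least 1")
    return [max(values[i:i + period]) for i in range(0, len(values), period)]


def moving_average(values, window):
    """Trailing moving average; the first window-1 points are dropped."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    running = sum(values[:window])
    out = [running / window]
    for i in range(window, len(values)):
        # Slide the window: add the newest sample, drop the oldest.
        running += values[i] - values[i - window]
        out.append(running / window)
    return out


# Hypothetical fine-grained samples, grouped into periods of 4 samples each.
raw = [3, 9, 4, 5, 6, 2, 8, 7]
maxes = periodic_max(raw, 4)       # one value per period
smoothed = moving_average(raw, 4)  # one value per full window
```

Periodic maxes keep the peaks that capacity has to absorb, while a moving average smooths them away, which is one reason the choice of input data changes the resulting forecast.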
  7. Good old Linear Regression
     [Figure: Linear Regression based Forecast. Series: Raw Data, Forecast. Adjusted R-squared: 0.6062]
  8. Linear Regression using periodic max
     [Figure: Linear Regression Using Maxes based Forecast. Series: Raw Data, Forecast. Adjusted R-squared: 0.5673; Standard Error 2.45x]
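The two linear regression slides report an adjusted R-squared for each fit. A minimal sketch of how such a fit and score could be computed follows, assuming a single predictor (time index 0, 1, 2, ...) and ordinary least squares. The function names and the sample series are hypothetical; the talk does not show its own code or data.

```python
def fit_line(y):
    """Ordinary least squares fit of y against t = 0, 1, ..., n-1."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return slope, intercept


def adjusted_r_squared(y, slope, intercept, predictors=1):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - predictors - 1).

    Assumes y is not constant (otherwise the total sum of squares is zero).
    """
    n = len(y)
    y_mean = sum(y) / n
    ss_res = sum((v - (intercept + slope * t)) ** 2 for t, v in enumerate(y))
    ss_tot = sum((v - y_mean) ** 2 for v in y)
    r2 = 1 - ss_res / ss_tot
    return 1 - (1 - r2) * (n - 1) / (n - predictors - 1)


def forecast(slope, intercept, n, horizon):
    """Extrapolate the fitted line `horizon` steps past the last sample."""
    return [intercept + slope * t for t in range(n, n + horizon)]


# Hypothetical daily metric, not data from the talk.
series = [10, 12, 13, 15, 18, 19, 23, 24]
slope, intercept = fit_line(series)
adj_r2 = adjusted_r_squared(series, slope, intercept)
future = forecast(slope, intercept, len(series), 3)
```

The same functions can be run on raw samples and on periodic maxes to compare the two fits, as the slides do.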
  9. Splines
     • Smooth Spline
       - λ: penalty for “wiggliness”
     [Figure: Spline based Fitting. Series: Raw Data, Fitted]
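The role of λ can be made concrete with the standard textbook objective for a smoothing spline. The slide does not show a formula, so the notation here (observations y_i at times t_i, fitted function f) is an assumption.

```latex
\hat{f} = \arg\min_{f} \; \sum_{i=1}^{n} \bigl(y_i - f(t_i)\bigr)^2
          \; + \; \lambda \int \bigl(f''(t)\bigr)^2 \, dt
```

The first term rewards fidelity to the data and the second penalizes curvature. As λ approaches 0 the fit interpolates the data; as λ grows the fit approaches a least-squares straight line.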
  10. Splines
      [Figure: Spline based Forecast. Series: Raw Data, Forecast]
  11. Splines
      • Sensitive to the nature of the time series at the boundary
      [Figure: time series annotated with Boundary 1 and Boundary 2]
  12. Splines – Take 2
      [Figure: Spline based Forecast (Boundary 1). Series: Raw Data, Forecast. Annotation: 8.31x higher than end of time series]
  13. Splines – Take 3
      [Figure: Spline based Forecast (Boundary 2). Series: Raw Data, Forecast. Annotation: 3.77x higher than end of time series]
  14. Holt-Winters
      • Triple exponential smoothing
      [Equation annotations: Estimate of linear trend; Seasonal correction factors]
      [Figure: Holt-Winters based Fitting. Series: Raw Data, Fitted]
  15. Holt-Winters
      [Figure: Holt-Winters based Forecast. Series: Raw Data, Upper 95% CI, Forecast]
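Triple exponential smoothing tracks a level, a linear trend, and one seasonal correction factor per position in the season. The sketch below is a minimal additive variant with a simple initialization from the first two seasons. The talk does not say which variant, initialization, or smoothing parameters were used, so all of those are assumptions, and the confidence interval shown on the slide is not computed here.

```python
def holt_winters_additive(y, period, alpha, beta, gamma, horizon):
    """Additive triple exponential smoothing; returns `horizon` forecasts."""
    if len(y) < 2 * period:
        raise ValueError("need at least two full seasons of data")
    # Simple initialization (an assumption): level and seasonal factors
    # from the first season, trend from the change between two seasons.
    first = sum(y[:period]) / period
    second = sum(y[period:2 * period]) / period
    level = first
    trend = (second - first) / period
    season = [y[i] - first for i in range(period)]

    for t in range(period, len(y)):
        s = season[t % period]
        prev_level = level
        # Level: deseasonalized observation blended with the prior projection.
        level = alpha * (y[t] - s) + (1 - alpha) * (prev_level + trend)
        # Trend: change in level blended with the prior trend estimate.
        trend = beta * (level - prev_level) + (1 - beta) * trend
        # Seasonal factor for this position in the season.
        season[t % period] = gamma * (y[t] - level) + (1 - gamma) * s

    n = len(y)
    return [level + h * trend + season[(n + h - 1) % period]
            for h in range(1, horizon + 1)]


# Hypothetical series: a repeating pattern of length 3 with no trend.
demo = [1, 2, 3] * 4
preds = holt_winters_additive(demo, period=3, alpha=0.5, beta=0.1,
                              gamma=0.1, horizon=4)
```

On this trend-free demo series the forecast simply continues the repeating pattern. Real traffic data would need tuned values for alpha, beta, and gamma, typically chosen by minimizing one-step-ahead error.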
  16. ARIMA
      • Auto-Regressive Integrated Moving Average
        - (p, d, q): p = Autoregressive order, d = Integrated order, q = Moving Average order
      [Equation annotations: Autoregressive component; Moving Average component]
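The equation on this slide did not survive extraction. For orientation, the standard textbook form of an ARIMA(p, d, q) model is given below. The symbols (backshift operator B, coefficients φ_i and θ_j, noise term ε_t) are the usual conventions and are an assumption about what the slide showed.

```latex
\Bigl(1 - \sum_{i=1}^{p} \phi_i B^i\Bigr) (1 - B)^d \, y_t
  = \Bigl(1 + \sum_{j=1}^{q} \theta_j B^j\Bigr) \varepsilon_t ,
\qquad B\,y_t = y_{t-1}
```

The left-hand factor with φ is the autoregressive component, (1 - B)^d applies d rounds of differencing, and the right-hand factor with θ is the moving average component.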
  17. ARIMA
      • Fitting
      [Figure: Auto ARIMA based Fitting. Series: Raw Data, Fitted]
  18. ARIMA – Take 1
      [Figure: ARIMA based Forecast, (p, d, q): (0,1,1)(0,1,1)[7]. Series: Raw Data, Upper 95% CI, Forecast]
  19. ARIMA – Take 2
      [Figure: Auto ARIMA based Forecast, (p, d, q): (1,1,1)(2,0,0)[7]. Series: Raw Data, Upper 95% CI, Forecast]
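In the usual seasonal notation (p,d,q)(P,D,Q)[m], the Take 1 model (0,1,1)(0,1,1)[7] applies one regular difference and one seasonal difference with a period of 7, while Take 2 uses only the regular difference. The sketch below shows just the differencing step, not the model fit; the helper name and the synthetic series are hypothetical.

```python
def difference(y, lag=1):
    """Return y[t] - y[t - lag] for every t where both terms exist."""
    if lag < 1 or lag >= len(y):
        raise ValueError("lag must be between 1 and len(y) - 1")
    return [y[t] - y[t - lag] for t in range(lag, len(y))]


# Hypothetical daily series: linear growth plus a fixed weekly pattern.
weekly = [5, 3, 0, 0, 1, 4, 8]
series = [2 * t + weekly[t % 7] for t in range(28)]

seasonal_diff = difference(series, lag=7)      # D = 1 with period 7
stationary = difference(seasonal_diff, lag=1)  # d = 1
```

For this synthetic series the seasonal difference removes the weekly pattern and leaves a constant (the growth per week), and the regular difference then leaves zeros. Real data would leave residual structure for the moving average terms to model.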
  20. Impact of Outliers
  21. Forecast without outlier
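The slides do not say how the outlier was identified or removed. One common approach, shown here purely as an assumption, is to flag points by a modified z-score based on the median absolute deviation (MAD) and replace them before forecasting. The threshold of 3.5 and the constant 0.6745 are the conventional choices for this score; the function name and sample data are hypothetical.

```python
from statistics import median


def replace_outliers(y, threshold=3.5):
    """Replace points with a large modified z-score by the series median."""
    med = median(y)
    mad = median(abs(v - med) for v in y)
    if mad == 0:
        # More than half the points equal the median; the score is undefined.
        return list(y)
    cleaned = []
    for v in y:
        score = 0.6745 * abs(v - med) / mad
        cleaned.append(med if score > threshold else v)
    return cleaned


# Hypothetical series with a single spike.
raw = [10, 11, 9, 10, 12, 95, 11, 10, 9, 11]
cleaned = replace_outliers(raw)
```

Replacing a spike with the global median ignores trend and seasonality, so on growing or strongly seasonal data a local median or interpolation between neighbors would likely be a better fit.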
  22. Good “enough”?
  23. Impact of “Corrections”
  24. Implications of data characteristics
      [Figure: ARIMA based forecast. Series: Raw Data, Upper 95% CI, Forecast]
  25. Forecast without the boundary case
      [Figure: ARIMA based Forecast - Without initial spike. Series: Raw Data, Upper 95% CI, Forecast]
  26. Forecast with truncation
      [Figure: ARIMA based Forecast - Truncated and Without initial spike. Series: Raw Data, Upper 95% CI, Forecast]
  27. Lessons learned
      • Data fidelity
        - Anomalies
        - Absence of seasonality
      • Modeling
        - Never perfect
          - Assess forecasting error
        - Continuous refinement
          - Incoming data stream is dynamic
            - Organic growth
            - New products
            - Behavioral aspect
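"Assess forecasting error" can be done by holding out the most recent data, forecasting it, and scoring the result. The sketch below uses mean absolute percentage error (MAPE) as one possible metric; the talk does not name the metric it used, and the values below are hypothetical.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, skipping points where actual is 0."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    terms = [abs((a - p) / a) for a, p in zip(actual, predicted) if a != 0]
    if not terms:
        raise ValueError("no non-zero actual values to score against")
    return 100 * sum(terms) / len(terms)


# Hypothetical held-out observations and the forecasts made for them.
actual = [100, 110, 120, 130]
predicted = [90, 110, 126, 143]
error_pct = mape(actual, predicted)
```

Recomputing the error as new data arrives gives a simple trigger for the continuous refinement the slide calls for: when the error drifts upward, the model or its inputs need revisiting.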
  28. Acknowledgements
      • Capacity Engineering Team
      • Management team
  29. Join the Flock
      Like problem solving? Like challenges? Be at the cutting edge. Make an impact.
      • We are hiring!
        - https://twitter.com/JoinTheFlock
        - https://twitter.com/jobs
        - Contact us: @arun_kejariwal, @winstl
