Gimme More! Supporting User Growth in a Performant and Efficient Fashion
Arun Kejariwal (@arun_kejariwal), Winston Lee (@winstl)
Capacity Engineering @ Twitter
November 2013
@Twitter 1
User Experience
• Anytime, Anywhere, Any device
q 5.2 billion mobile users by 2017 [1]
q More than 10 billion mobile devices/connections by 2017 [1]
q Worldwide mobile data traffic will reach 11.2 exabytes/month by 2017 (13x increase) [1]
• Real-time performance
[1] http://newsroom.cisco.com/release/1135354 (Feb. 5, 2013)
Capacity Planning: Why bother?
• Organic growth
q Over 230M monthly active users [1]
• User engagement
• Evolving product landscape
q Cards, Photos, Vines
§ Mobile video will increase 16-fold between 2012 and 2017 [2]
• Events, planned or unplanned
[1] http://www.sec.gov/Archives/edgar/data/1418091/000119312513400028/d564001ds1a.htm
[2] http://www.cisco.com/en/US/solutions/collateral/ns341/ns525/ns537/ns705/ns827/white_paper_c11-520862.html
Approaches to Capacity Planning
• Throw hardware at the problem
o How much?
o What kind? (Inventory management etc.)
o Operationally inefficient!
• Reactive approach
Bottom line: poor UX
Systematic Capacity Planning
• Objectives
q Check under-allocation
§ Performance
§ Availability
o Adversely impacts user experience
q Check over-allocation
§ Operational efficiency
o Adversely impacts the bottom line
• Determine capacity needed proactively via forecasting
q Business metrics
q System resource usage
Systematic Capacity Planning: Forecasting
• Key questions
q Which data?
§ Raw
§ Periodic Max
§ Moving average
q Data granularity
§ Minutely
§ Daily
o Depends
q Which model? (Non-trivial!)
§ Linear
§ Spline
§ Holt-Winters
§ ARIMA
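The "which data" choices above (raw, periodic max, moving average) can be sketched as simple preprocessing steps. This is a minimal illustration, not the authors' pipeline: the function names, the sample values and the assumption that samples arrive sorted by timestamp are all hypothetical.

```python
def periodic_max(samples, period):
    """Collapse (timestamp, value) samples into one max per period.

    `period` is the bucket width in the same unit as the timestamps
    (e.g. 1440 for daily maxes of minute-indexed samples).
    Assumes samples are sorted by timestamp.
    """
    buckets = {}
    for ts, value in samples:
        key = ts // period
        buckets[key] = max(value, buckets.get(key, value))
    return [(key * period, peak) for key, peak in buckets.items()]


def moving_average(values, window):
    """Trailing moving average; the first window-1 points are dropped."""
    if window <= 0 or window > len(values):
        raise ValueError("window must be in 1..len(values)")
    running = sum(values[:window])
    out = [running / window]
    for i in range(window, len(values)):
        running += values[i] - values[i - window]
        out.append(running / window)
    return out


# Hypothetical minutely samples: (minute index, requests per second)
minutely = [(0, 10), (1, 14), (2, 12), (3, 20), (4, 18), (5, 16)]
maxes = periodic_max(minutely, period=3)  # one max per 3-minute bucket
smooth = moving_average([v for _, v in minutely], window=3)
```

Periodic maxes keep the peaks that capacity has to absorb, while a moving average hides them; which one is appropriate depends on what the forecast is used for.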
Good old Linear Regression
[Figure: Linear Regression based Forecast. Series: Raw Data, Forecast. Adjusted R-squared: 0.6062]
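For reference, a linear-regression forecast and its adjusted R-squared can be computed as in the following sketch. It uses ordinary least squares with a single predictor (time); the data values and function names are made up for illustration and are not the series behind the figure.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def adjusted_r_squared(xs, ys, slope, intercept, predictors=1):
    """R-squared penalised for the number of predictors in the model."""
    n = len(xs)
    mean_y = sum(ys) / n
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r2 = 1 - ss_res / ss_tot
    return 1 - (1 - r2) * (n - 1) / (n - predictors - 1)


# Hypothetical daily usage of some resource
days = [0, 1, 2, 3, 4, 5]
usage = [10, 12, 15, 13, 18, 20]
slope, intercept = fit_line(days, usage)
adj_r2 = adjusted_r_squared(days, usage, slope, intercept)
forecast_day_30 = slope * 30 + intercept  # extrapolate the fitted line
```

A low adjusted R-squared, as on the slide, signals that a straight line explains only part of the variance, so the extrapolated value should be treated with caution.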
Linear Regression using periodic max
[Figure: Linear Regression Using Maxes based Forecast. Series: Raw Data, Forecast. Adjusted R-squared: 0.5673. Annotation: Standard Error 2.45x]
Splines – Take 2
[Figure: Spline based Forecast (Boundary 1). Series: Raw Data, Forecast. Annotation: 8.31x higher than end of time series]
Splines – Take 3
[Figure: Spline based Forecast (Boundary 2). Series: Raw Data, Forecast. Annotation: 3.77x higher than end of time series]
Holt-Winters
• Triple exponential smoothing
q Estimate of linear trend
q Seasonal correction factors
[Figure: Holt-Winters based Fitting. Series: Raw Data, Fitted]
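The trend estimate and seasonal correction factors named above can be illustrated with a small additive Holt-Winters implementation. This is a sketch under stated assumptions: the additive (rather than multiplicative) form, the initialisation from the first two seasons, and the smoothing constants are choices made here for illustration and are not taken from the slides.

```python
def holt_winters_additive(series, season_len, alpha, beta, gamma, horizon):
    """Additive triple exponential smoothing.

    Returns (fitted, forecast): one-step-ahead fits from the second
    season onward, and `horizon` points beyond the end of the series.
    """
    m = season_len
    if len(series) < 2 * m:
        raise ValueError("need at least two full seasons")
    first = sum(series[:m]) / m
    second = sum(series[m:2 * m]) / m
    trend = (second - first) / m                 # estimate of linear trend
    level = first + trend * (m - 1) / 2          # level at the end of season 1
    # Seasonal correction factors: detrended deviations within season 1
    seasonal = [series[i] - (first + (i - (m - 1) / 2) * trend)
                for i in range(m)]
    fitted = []
    for t in range(m, len(series)):
        s_prev = seasonal[t % m]
        fitted.append(level + trend + s_prev)
        prev_level = level
        level = alpha * (series[t] - s_prev) + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonal[t % m] = gamma * (series[t] - level) + (1 - gamma) * s_prev
    n = len(series)
    forecast = [level + h * trend + seasonal[(n + h - 1) % m]
                for h in range(1, horizon + 1)]
    return fitted, forecast


# Hypothetical series: linear growth plus a repeating 4-point pattern
pattern = [3, -1, -3, 1]
series = [10 + 2 * t + pattern[t % 4] for t in range(12)]
fitted, forecast = holt_winters_additive(series, 4, 0.5, 0.5, 0.5, horizon=4)
```

On this synthetic series the model reproduces the trend and the pattern exactly; real traffic data would also need the smoothing constants tuned, for example by minimising the one-step-ahead error.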
ARIMA
• Auto-Regressive Integrated Moving Average
q (p, d, q)
§ p: Autoregressive order
§ d: Integrated order
§ q: Moving Average order
q Autoregressive component
q Moving Average component
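To make the (p, d, q) orders concrete, the sketch below fits the simplest non-trivial case, ARIMA(1, 1, 0): one round of differencing (d = 1) followed by a first-order autoregression (p = 1) on the differences. With q = 0 there is no moving average term, so this shows only part of the general model. The conditional least-squares fit and all function names are assumptions for illustration; a production forecast would normally rely on a statistics library rather than hand-rolled code.

```python
def difference(series):
    """First-order differencing: the 'I' (integrated) part with d = 1."""
    return [b - a for a, b in zip(series, series[1:])]


def fit_ar1(diffs):
    """Least-squares fit of w_t = c + phi * w_{t-1} on the differences."""
    xs, ys = diffs[:-1], diffs[1:]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:  # constant differences: pure drift, no AR signal
        return mean_y, 0.0
    phi = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return mean_y - phi * mean_x, phi


def forecast_arima_110(series, horizon):
    """Forecast by predicting differences, then summing them back up."""
    if len(series) < 3:
        raise ValueError("need at least three observations")
    diffs = difference(series)
    c, phi = fit_ar1(diffs)
    last_value, last_diff = series[-1], diffs[-1]
    out = []
    for _ in range(horizon):
        last_diff = c + phi * last_diff   # autoregressive step
        last_value += last_diff           # undo the differencing
        out.append(last_value)
    return out
```

For a series with constant increments such as `[1, 3, 5, 7, 9]`, the fit degenerates to pure drift and the forecast simply continues the line.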
Implications of data characteristics
[Figure: ARIMA based Forecast. Series: Raw Data, Forecast, Upper 95% CI]
Forecast without the boundary case
[Figure: ARIMA based Forecast - Without initial spike. Series: Raw Data, Forecast, Upper 95% CI]
Forecast with truncation
[Figure: ARIMA based Forecast - Truncated and Without initial spike. Series: Raw Data, Forecast, Upper 95% CI]
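The two data-preparation steps named in these forecasts, dropping an initial spike and truncating the history, might look like the sketch below. The slides do not say how either step was done, so the spike rule (leading points above a multiple of the series median) and the reading of "truncation" as keeping only the most recent points are assumptions, as are the function names and thresholds.

```python
from statistics import median


def drop_initial_spike(series, probe=5, factor=3.0):
    """Drop leading points that exceed `factor` times the series median.

    Only the first `probe` points are examined, so later peaks are kept.
    """
    if not series:
        return []
    threshold = factor * median(series)
    start = 0
    while start < min(probe, len(series)) and series[start] > threshold:
        start += 1
    return list(series[start:])


def truncate(series, keep_last):
    """Keep only the most recent `keep_last` observations."""
    if keep_last <= 0:
        raise ValueError("keep_last must be positive")
    return list(series[-keep_last:])


# Hypothetical series with a launch spike at the start
raw = [90, 80, 10, 11, 12, 13, 14, 15]
cleaned = drop_initial_spike(raw)       # leading 90 and 80 are removed
recent = truncate(cleaned, keep_last=4)
```

Either step changes what the model sees at the boundary of the series, which is why the resulting forecasts and confidence intervals differ between the plots.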
Lessons learned
• Data fidelity
q Anomalies
q Absence of seasonality
• Modeling
q Never perfect
§ Assess forecasting error
q Continuous refinement
§ Incoming data stream is dynamic
o Organic growth
o New products
o Behavioral aspect
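One common way to assess forecasting error, and to repeat that assessment as new data arrives, is to hold out the most recent points and score the forecast against them. The sketch below uses mean absolute percentage error (MAPE) with a naive drift forecaster as a stand-in model; the metric, the forecaster and the names are illustrative choices, not the ones used in the talk, and MAPE assumes the actual values are non-zero.

```python
def mape(actual, predicted):
    """Mean absolute percentage error; actual values must be non-zero."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return 100.0 * sum(abs((a - p) / a)
                       for a, p in zip(actual, predicted)) / len(actual)


def naive_drift(train, horizon):
    """Stand-in forecaster: extend the average step of the training data."""
    step = (train[-1] - train[0]) / (len(train) - 1)
    return [train[-1] + step * h for h in range(1, horizon + 1)]


def backtest(series, holdout, forecaster):
    """Fit on all but the last `holdout` points and score on the rest."""
    train, test = series[:-holdout], series[-holdout:]
    return mape(test, forecaster(train, holdout))


# Hypothetical weekly peak usage
weekly_peaks = [100, 110, 120, 130, 140, 150]
error_pct = backtest(weekly_peaks, holdout=2, forecaster=naive_drift)
```

Re-running such a backtest on a schedule gives an early signal when organic growth, new products or behavioral shifts have made the current model stale.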
Join the Flock
Like problem solving?
Like challenges?
Be at the cutting edge
Make an impact
• We are hiring!!
q https://twitter.com/JoinTheFlock
q https://twitter.com/jobs
q Contact us: @arun_kejariwal, @winstl