Tutorial on time series analysis, modeling and forecasting.

- 1. Time Series Modeling and Forecasting
  - Mohamed Baddar, Data Scientist, Careem Networks GmbH
  - mbaddar2@gmail.com
  - https://twitter.com/mbaddar2
  - https://mbaddards.blogspot.com/
- 2. Contents
  - A gentle introduction to time series analysis and forecasting. In this session we introduce:
    - 1. Applications of time series analysis and forecasting
    - 2. Visual analysis of time series
    - 3. Stationarity of time series
    - 4. Different time series models
    - 5. Time series data preprocessing
    - 6. Building a time series model
    - 7. Evaluating models
    - 8. Demo
    - 9. Questions
- 3. 1. Applications of Time Series Analysis
  - Economic and sales forecasting
  - Stock market analysis
  - User behavior analysis
  - Process and quality control
  - Inventory studies
  - Weather forecasting
  - Workload projections
  - Census analysis
  - Reference: http://www.itl.nist.gov/div898/handbook/pmc/section4/pmc41.htm
- 4. The main idea behind time series analysis: systematically isolate each “assumed” component of the time series and identify it as a “reproducible pattern”, until the remainder is unexplainable white noise.
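
One way to check whether the remainder is white noise is to look at its sample autocorrelations. For white noise of length $n$, roughly 95% of them should fall within $\pm 1.96/\sqrt{n}$. The sketch below is plain Python with no external libraries. It assumes that approximate band and a hypothetical tolerance of two exceedances out of 20 lags. The helper names `sample_acf` and `looks_like_white_noise` are illustrative. In practice a formal portmanteau test such as Ljung-Box is often used instead.

```python
import math
import random


def sample_acf(x, max_lag):
    """Sample autocorrelations r_1..r_max_lag of x (denominator sums over all n points)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
        for k in range(1, max_lag + 1)
    ]


def looks_like_white_noise(residuals, max_lag=20, max_exceedances=2):
    """Rough check: count lags whose |r_k| exceeds the approximate 95% band 1.96/sqrt(n).

    About 5% of lags are expected to exceed the band by chance for white noise,
    so a couple of exceedances is tolerated (a hypothetical default, not a standard).
    """
    bound = 1.96 / math.sqrt(len(residuals))
    acf = sample_acf(residuals, max_lag)
    exceedances = sum(1 for r in acf if abs(r) > bound)
    return exceedances <= max_exceedances, acf


# Usage: independent Gaussian noise vs. a series that still contains a trend.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(500)]
trend = [float(t) for t in range(1, 101)]
print(looks_like_white_noise(noise)[0])
print(looks_like_white_noise(trend)[0])
```

A series that still contains a trend has large autocorrelations at every lag and fails the check. This signals that another component still needs to be isolated.
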
- 5. 2. Visual Analysis
  - The human eye and brain are among the most sophisticated analysis tools. Visual analysis gives quick answers to questions such as:
    - Are there abnormal values (outliers)?
    - Is there a detectable trend or seasonality?
    - In cross-series analysis, do the two series behave in the same way?
- 6. 3. Stationarity of a Time Series
  - Strict stationarity
    - The joint distribution of $(Y_t, Y_{t+1}, \dots, Y_{t+n})$ is the same as that of $(Y_{t+k}, Y_{t+1+k}, \dots, Y_{t+n+k})$, where $0 \le t, n, k \le N$ and $t+n+k \le N$.
    - In other words, the joint distribution of an $n$-window of the data does not change as the window moves along the series.
    - In practice, the complete joint distribution of the moving $n$-windows is hard to estimate, so we work with the mean and variance instead.
  - Weak stationarity
    - The mean $\mu$ and standard deviation $\sigma$ of a length-$n$ window are the same for every shift $k$. That is, $E[Y_{t+k}] = \mu$ and $\operatorname{Var}(Y_{t+k}) = \sigma^2$ for all $k$. The standard definition also requires $\operatorname{Cov}(Y_t, Y_{t+h})$ to depend only on the lag $h$.
    - Our next objective is to “stationarize” the time series:
      - make the mean stable over time
      - make the variance stable over time
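
Estimating full joint distributions is impractical, so a rough weak-stationarity check compares the mean and variance of successive windows instead. This minimal sketch assumes non-overlapping windows and hand-picked tolerances. `window_stats`, `roughly_weakly_stationary`, `mean_tol` and `var_ratio_tol` are hypothetical names, and this is a heuristic rather than a standard statistical test.

```python
import math
from statistics import mean, pvariance


def window_stats(y, n):
    """Mean and (population) variance of consecutive, non-overlapping windows of length n."""
    return [
        (mean(y[k:k + n]), pvariance(y[k:k + n]))
        for k in range(0, len(y) - n + 1, n)
    ]


def roughly_weakly_stationary(y, n, mean_tol, var_ratio_tol):
    """Heuristic: window means stay within mean_tol, variances within a factor var_ratio_tol."""
    stats = window_stats(y, n)
    means = [m for m, _ in stats]
    variances = [v for _, v in stats]
    mean_spread = max(means) - min(means)
    var_ratio = max(variances) / min(variances) if min(variances) > 0 else math.inf
    return mean_spread <= mean_tol and var_ratio <= var_ratio_tol


trend = list(range(100))       # the window mean drifts upward from window to window
alternating = [1, -1] * 50     # mean 0 and variance 1 in every even-length window
print(window_stats(trend, 10)[:2])
print(roughly_weakly_stationary(trend, 10, mean_tol=0.5, var_ratio_tol=1.5))
print(roughly_weakly_stationary(alternating, 10, mean_tol=0.5, var_ratio_tol=1.5))
```

In the trending series every window has the same variance. The check still fails because the window means drift, which is exactly the "stable mean" condition above.
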
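- 7. 4. Different Time Series Models
  - Error, Trend, Seasonality (ETS) model
    - $Y = F(E, T, S)$
    - The components can be combined in different ways:
      - Additive: $Y = T + S + E$
      - Multiplicative: $Y = T \cdot S \cdot E$
  - ARIMA model
    - The main signal is modeled as an autoregressive integrated moving average (ARIMA) process:
      - AR (autoregressive): $Y_t$ depends linearly on its own past values $Y_{t-1}, \dots, Y_{t-p}$
      - MA (moving average): $Y_t$ depends on past error terms $\varepsilon_{t-1}, \dots, \varepsilon_{t-q}$
      - I (integrated): the series is differenced $d$ times to make it stationary before the AR and MA parts are fitted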
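
The AR component can be illustrated with its simplest case, an AR(1) model $Y_t = \phi Y_{t-1} + \varepsilon_t$. This is a minimal sketch under several assumptions: a zero-mean series, $|\phi| < 1$, and no intercept. It estimates $\phi$ by ordinary least squares; ARIMA software typically uses maximum likelihood instead. The MA and I parts are not shown. `simulate_ar1` and `fit_ar1` are hypothetical helpers.

```python
import random


def simulate_ar1(phi, n, sigma=1.0, seed=None):
    """Simulate a zero-mean AR(1) process Y_t = phi * Y_{t-1} + e_t, e_t ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    y = [0.0]
    for _ in range(n - 1):
        y.append(phi * y[-1] + rng.gauss(0.0, sigma))
    return y


def fit_ar1(y):
    """Least-squares estimate of phi: regress Y_t on Y_{t-1} without an intercept."""
    num = sum(y[t] * y[t - 1] for t in range(1, len(y)))
    den = sum(y[t - 1] ** 2 for t in range(1, len(y)))
    return num / den


series = simulate_ar1(phi=0.7, n=5000, seed=42)
print(round(fit_ar1(series), 3))   # should land close to the true phi of 0.7
```
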
- 8. Time Series Data Transformation
  - Make the series stationary (as far as possible)
  - Log transformation
    - $Y_t = B_0 \, u_t \, S_t$
    - $\log Y_t = \log B_0 + \log u_t + \log S_t$
    - The log turns the multiplicative model into an additive one and stabilizes the variance, making it independent of the mean level or trend
  - Box-Cox transformation
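
The effect of the log on a multiplicative series can be sketched with the Box-Cox family, of which the log is the special case λ = 0. Several things here are illustrative assumptions: the toy series, the helper names `box_cox` and `inv_box_cox`, and the fixed λ. In practice λ is usually chosen from the data.

```python
import math


def box_cox(y, lam):
    """Box-Cox transform: (y**lam - 1) / lam, or log(y) when lam == 0. Requires y > 0."""
    if any(v <= 0 for v in y):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in y]
    return [(v ** lam - 1.0) / lam for v in y]


def inv_box_cox(z, lam):
    """Inverse of box_cox, e.g. to map forecasts back to the original scale."""
    if lam == 0:
        return [math.exp(v) for v in z]
    return [(lam * v + 1.0) ** (1.0 / lam) for v in z]


# Hypothetical multiplicative series: the seasonal swings grow with the level.
level = [100.0 * 1.05 ** t for t in range(24)]
season = [1.2, 0.8, 1.1, 0.9] * 6
y = [lv * s for lv, s in zip(level, season)]
logged = box_cox(y, 0)   # lam = 0 is the log transformation

# The raw gap between two fixed seasonal positions grows with the level...
print(y[4] - y[5], y[20] - y[21])
# ...while after the log the same gap is constant, i.e. the effect became additive.
print(logged[4] - logged[5], logged[20] - logged[21])
```
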
- 9. Moving Average
  - Helps reduce the effect of outliers and transient fluctuations
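
A trailing simple moving average can be computed in one pass with a running sum. This is a minimal sketch: `moving_average` is a hypothetical helper, and a centered window is an equally common choice when the smoothed value should line up with the middle of the window.

```python
def moving_average(y, window):
    """Trailing simple moving average; output element i is the mean of y[i : i + window]."""
    if not 1 <= window <= len(y):
        raise ValueError("window must be between 1 and len(y)")
    total = sum(y[:window])
    out = [total / window]
    for i in range(window, len(y)):
        total += y[i] - y[i - window]   # slide the window: add the new point, drop the oldest
        out.append(total / window)
    return out


print(moving_average([1, 2, 3, 4, 5], 3))
print(moving_average([0, 0, 9, 0, 0], 3))   # a single spike is spread out and damped
```
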
- 10. Differencing
  - For discrete data, differencing plays the role that differentiation plays for continuous data
  - The idea is to remove an nth-order trend with nth-order differencing
  - In practice, differencing at most twice is usually sufficient to stationarize a time series
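
The claim that nth-order differencing removes an nth-order trend can be checked on a quadratic. First differences of $2t^2 + 3t + 1$ still trend linearly, and second differences are constant. `difference` is a hypothetical helper, and the polynomial is an arbitrary example.

```python
def difference(y, order=1):
    """Apply first differencing `order` times: each pass maps y_t to y_t - y_{t-1}."""
    for _ in range(order):
        y = [y[t] - y[t - 1] for t in range(1, len(y))]
    return y


quadratic = [2 * t * t + 3 * t + 1 for t in range(8)]   # second-order polynomial trend
print(difference(quadratic, 1))   # still contains a linear trend
print(difference(quadratic, 2))   # constant, so the trend has been removed
```

Each pass shortens the series by one observation, which is one practical reason not to difference more often than needed.
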
- 11. Seasonality Adjustment
  - Error, Trend, Seasonal (ETS) model
    - Estimate a seasonal index for each period (e.g. each month of the year)
    - De-seasonalize by either subtracting the seasonal index (additive model) or dividing by it (multiplicative model)
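
The slide does not name an estimation method, so this sketch assumes classical decomposition, one common way to get seasonal indices. It has three steps: estimate the trend with a centered moving average one season long, average the detrended values for each position in the season, and normalize. It also assumes at least two full seasons of data. All function names and the toy quarterly pattern are hypothetical.

```python
def centered_moving_average(y, m):
    """Centered MA of order m (a 2 x m MA when m is even); None where the window does not fit."""
    half = m // 2
    out = [None] * len(y)
    for t in range(half, len(y) - half):
        window = y[t - half:t + half + 1]
        if m % 2 == 1:
            out[t] = sum(window) / m
        else:
            # m + 1 points: the two end points share a season, so each gets half weight.
            out[t] = (0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]) / m
    return out


def seasonal_indices(y, m, model="additive"):
    """Average detrended value for each position t % m, normalized to sum 0 (or mean 1)."""
    if len(y) < 2 * m:
        raise ValueError("need at least two full seasons")
    trend = centered_moving_average(y, m)
    buckets = [[] for _ in range(m)]
    for t, (value, tr) in enumerate(zip(y, trend)):
        if tr is not None:
            buckets[t % m].append(value - tr if model == "additive" else value / tr)
    raw = [sum(b) / len(b) for b in buckets]
    if model == "additive":
        shift = sum(raw) / m
        return [r - shift for r in raw]   # additive indices sum to zero
    scale = sum(raw) / m
    return [r / scale for r in raw]       # multiplicative indices average to one


def deseasonalize(y, indices, model="additive"):
    """Subtract (additive) or divide by (multiplicative) the seasonal index."""
    m = len(indices)
    if model == "additive":
        return [v - indices[t % m] for t, v in enumerate(y)]
    return [v / indices[t % m] for t, v in enumerate(y)]


true_season = [3.0, -1.0, -4.0, 2.0]   # hypothetical quarterly pattern that sums to zero
y = [10.0 + 0.5 * t + true_season[t % 4] for t in range(24)]
idx = seasonal_indices(y, 4)
adjusted = deseasonalize(y, idx)
print(idx)
print(adjusted[:4])
```

With a linear trend and a zero-sum seasonal pattern, the centered moving average recovers the trend exactly. The estimated indices therefore match the pattern that was put in.
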
- 12. Holt-Winters Seasonal Method
  - Models trend and seasonality using exponential smoothing
  - Three main components:
    - Level
    - Trend
    - Seasonal
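
The three components can be sketched as three exponential-smoothing updates, one per component, applied at every time step. This is a minimal sketch under stated assumptions:

- It uses the additive form; a multiplicative variant divides by the seasonal term instead.
- It uses a simple initialization from the first two seasons.
- The smoothing parameters `alpha`, `beta` and `gamma` are fixed by hand rather than estimated from the data.

`holt_winters_additive` is a hypothetical helper, not a library API.

```python
def holt_winters_additive(y, m, alpha, beta, gamma, horizon):
    """Additive Holt-Winters; returns (one-step-ahead fitted values, h-step forecasts).

    Initialization: level = mean of the first season, trend = average per-step change
    between the first two seasons, seasonal terms = first-season deviations from the level.
    """
    if len(y) < 2 * m:
        raise ValueError("need at least two full seasons to initialize")
    level = sum(y[:m]) / m
    trend = (sum(y[m:2 * m]) - sum(y[:m])) / (m * m)
    seasonal = [y[i] - level for i in range(m)]   # seasonal[t % m] holds the latest estimate
    fitted = []
    for t in range(m, len(y)):
        s = seasonal[t % m]
        fitted.append(level + trend + s)           # forecast of y[t] made at time t - 1
        prev_level = level
        level = alpha * (y[t] - s) + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonal[t % m] = gamma * (y[t] - level) + (1 - gamma) * s
    n = len(y)
    forecasts = [level + h * trend + seasonal[(n - 1 + h) % m] for h in range(1, horizon + 1)]
    return fitted, forecasts


season = [1.0, -1.0, 2.0, -2.0]                  # hypothetical zero-sum seasonal pattern
y = [10.0 + season[t % 4] for t in range(16)]    # four seasons, no trend, no noise
fitted, forecasts = holt_winters_additive(y, m=4, alpha=0.3, beta=0.1, gamma=0.2, horizon=4)
print(forecasts)
```

On a noise-free seasonal series the updates leave every component unchanged, so the forecasts simply continue the pattern. On real data, alpha, beta and gamma control how quickly each component adapts.
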
- 13. Holt-Winters modeling and forecast example (R AirPassengers data)
  - [Figures: fitted model; sample forecast; predicted vs. test]
- 14. Holt-Winters modeling and forecast example (AirPassengers data)
  - Default parameters used
  - Forecast for the next 12 months
  - Test data = 1 year (the test set should cover at least one seasonal period)
  - RMSE used to measure performance
  - The RMSE of a naive forecast (simple random walk) can be used as a benchmark
  - More details in the notebook
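
The evaluation described above has three parts: hold out the last year, compute the RMSE, and compare it with a naive random-walk forecast. The sketch below uses a hypothetical monthly series. A simple drift forecast stands in for the Holt-Winters output, which is not reproduced here. All helper names are illustrative.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length, non-empty sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def train_test_split(y, test_size):
    """Hold out the last `test_size` observations; time order is preserved (no shuffling)."""
    return y[:-test_size], y[-test_size:]


def naive_forecast(train, horizon):
    """Random-walk benchmark: every future value equals the last observed value."""
    return [train[-1]] * horizon


# Hypothetical monthly series: linear growth plus a summer bump.
y = [float(100 + 2 * t + (8 if t % 12 in (5, 6, 7) else 0)) for t in range(48)]
train, test = train_test_split(y, 12)                  # hold out one year = one seasonal period
drift = [train[-1] + 2.0 * (h + 1) for h in range(12)]   # stand-in for a fitted model's forecast
baseline = rmse(test, naive_forecast(train, 12))
print(rmse(test, drift), baseline)
print(rmse(test, drift) / baseline)   # a ratio below 1 means the model beats the naive benchmark
```
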
- 15. General Approach for Time Series Modeling and Forecasting
  - 1. Plotting
  - 2. Checking stationarity
  - 3. Data transformations
  - 4. Seasonal and trend decomposition
  - 5. AR and MA model checking
  - 6. Residual checking
  - 7. Building the time series model
    - a. Separate the data into train and test sets
    - b. Build the model and forecast based on the train data
    - c. Apply performance measures on the test data
  - 8. Grid search and rolling-origin evaluation are usually used to find the best model
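
Steps 7 and 8 can be combined as follows. At each forecast origin, forecast from the data up to that point and score the next few observations, pooling the errors. The candidate with the lowest pooled RMSE is then selected. The candidates here are deliberately trivial and are assumptions for illustration only: a naive forecast and a seasonal naive forecast over a small, hypothetical grid of season lengths `m`. A real grid would range over the parameters of the models being compared, such as the Holt-Winters smoothing parameters.

```python
import math
from functools import partial


def naive(train, horizon):
    """Repeat the last observed value."""
    return [train[-1]] * horizon


def seasonal_naive(train, horizon, m):
    """Repeat the last observed season: step h copies the value from the same seasonal position."""
    n = len(train)
    return [train[n - m + (h % m)] for h in range(horizon)]


def rolling_origin_rmse(y, forecaster, min_train, horizon, step=1):
    """Forecast from each origin, score the next `horizon` points, and pool all errors."""
    errors = []
    for origin in range(min_train, len(y) - horizon + 1, step):
        train, test = y[:origin], y[origin:origin + horizon]
        errors.extend(a - p for a, p in zip(test, forecaster(train, horizon)))
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def grid_search(y, candidates, min_train, horizon):
    """Return the candidate name with the lowest rolling-origin RMSE, plus all scores."""
    scores = {name: rolling_origin_rmse(y, f, min_train, horizon) for name, f in candidates.items()}
    return min(scores, key=scores.get), scores


season = [3, -1, -4, 2]                     # hypothetical period-4 pattern
y = [10 + season[t % 4] for t in range(40)]
candidates = {"naive": naive}
candidates.update({f"seasonal_naive_m={m}": partial(seasonal_naive, m=m) for m in (2, 3, 4, 6)})
best, scores = grid_search(y, candidates, min_train=12, horizon=4)
print(best)
print(scores)
```

Scoring over many origins instead of a single train/test split makes the comparison less sensitive to the particular period that happens to be held out.
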