# Forecasting without forecasters

Keynote talk given at the International Symposium on Forecasting, Seoul, South Korea. 25 June 2013


### Forecasting without forecasters

1. Rob J Hyndman: Forecasting without forecasters
2. Outline: 1 Motivation, 2 Exponential smoothing, 3 ARIMA modelling, 4 Time series with complex seasonality, 5 Hierarchical and grouped time series, 6 Functional time series
3. Motivation
8. Motivation:
    - It is common in business to have over 1000 products that need forecasting at least monthly.
    - Forecasts are often required by people who are untrained in time series analysis.
    - Some types of data can be decomposed into a large number of univariate time series that need to be forecast.

    Specifications: automatic forecasting algorithms must determine an appropriate time series model, estimate the parameters, and compute the forecasts with prediction intervals.
13. Example: Asian sheep. [Plot: numbers of sheep in Asia, 1960–2010, millions of sheep.]
14. Example: Asian sheep. [Plot: automatic ETS forecasts of the same series.]
15. Example: corticosteroid sales. [Plot: monthly corticosteroid drug sales in Australia, 1995–2010, total scripts (millions).]
16. Example: corticosteroid sales. [Plot: automatic ARIMA forecasts of the same series.]
17. M3 competition: 3003 time series. An early comparison of automatic forecasting algorithms. Best-performing methods undocumented. Limited subsequent research on general automatic forecasting algorithms.
22. Outline (section divider): next, Exponential smoothing.
23. Exponential smoothing. Classic reference: Makridakis, Wheelwright and Hyndman (1998) *Forecasting: methods and applications*, 3rd ed., Wiley: NY.
    - "Unfortunately, exponential smoothing methods do not allow the easy calculation of prediction intervals." (MWH, p. 177)
    - No satisfactory way to select an exponential smoothing method.

    Current reference: Hyndman and Athanasopoulos (2013) *Forecasting: principles and practice*, OTexts: Australia. OTexts.com/fpp.
27. Exponential smoothing methods (rows: trend component; columns: seasonal component):

| Trend component | N (None) | A (Additive) | M (Multiplicative) |
|---|---|---|---|
| N (None) | N,N | N,A | N,M |
| A (Additive) | A,N | A,A | A,M |
| Ad (Additive damped) | Ad,N | Ad,A | Ad,M |
| M (Multiplicative) | M,N | M,A | M,M |
| Md (Multiplicative damped) | Md,N | Md,A | Md,M |

Named special cases: N,N simple exponential smoothing; A,N Holt's linear method; Ad,N additive damped trend method; M,N exponential trend method; Md,N multiplicative damped trend method; A,A additive Holt-Winters' method; A,M multiplicative Holt-Winters' method. There are 15 separate exponential smoothing methods; each can have an additive or multiplicative error, giving 30 separate models.
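The simplest entry in the table, the N,N method (simple exponential smoothing), reduces to a one-line recursion. A minimal sketch in plain Python, not the slides' R `forecast` package; the smoothing parameter and initial level here are illustrative (in practice they are estimated):

```python
def ses_forecast(y, alpha, level0):
    """Simple exponential smoothing (the N,N method): update the level as
    l_t = alpha * y_t + (1 - alpha) * l_{t-1}; every future point
    forecast equals the final level."""
    level = level0
    for obs in y:
        level = alpha * obs + (1 - alpha) * level
    return level

# With alpha = 0.5, observations [0, 10] and initial level 0,
# the level moves to 0, then to 5.
print(ses_forecast([0, 10], 0.5, 0.0))  # → 5.0
```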
37. General notation: ETS(Error, Trend, Seasonal), i.e. ExponenTial Smoothing. Examples: ETS(A,N,N) is simple exponential smoothing with additive errors; ETS(A,A,N) is Holt's linear method with additive errors; ETS(M,A,M) is the multiplicative Holt-Winters' method with multiplicative errors. Innovations state space models: all ETS models can be written in innovations state space form (IJF, 2002); additive and multiplicative versions give the same point forecasts but different prediction intervals.
44. Automatic forecasting, from Hyndman et al. (IJF, 2002):
    - Apply each of the 30 models that are appropriate to the data.
    - Optimize parameters and initial values using MLE (or some other criterion).
    - Select the best model using the AIC: AIC = −2 log(Likelihood) + 2p, where p = number of parameters.
    - Produce forecasts using the best model.
    - Obtain prediction intervals using the underlying state space model.
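The select-by-AIC step can be sketched generically. This is not the `ets()` implementation, just an illustration with two toy candidate "models" (a mean model and a naive/random-walk model) scored by a Gaussian likelihood of their residuals; all names and the toy series are invented for the example:

```python
import numpy as np

def aic(loglik, n_params):
    # AIC = -2 log(Likelihood) + 2p, as on the slide.
    return -2.0 * loglik + 2.0 * n_params

def gaussian_loglik(residuals):
    # Gaussian log-likelihood with the MLE variance plugged in.
    r = np.asarray(residuals, dtype=float)
    sigma2 = np.mean(r ** 2)
    return -0.5 * len(r) * (np.log(2 * np.pi * sigma2) + 1.0)

def select_by_aic(y, candidates):
    # candidates: name -> function returning (residuals, n_params).
    scores = {name: aic(gaussian_loglik(fit(y)[0]), fit(y)[1])
              for name, fit in candidates.items()}
    return min(scores, key=scores.get)

y = np.arange(20.0)  # a trending toy series
candidates = {
    "mean": lambda y: (y - y.mean(), 1),
    "naive": lambda y: (np.diff(y), 1),
}
print(select_by_aic(y, candidates))  # the naive model wins on a trend
```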
48. [Plot: forecasts from ETS(M,A,N) for the Asian sheep series, millions of sheep, 1960–2010.]
49. The same forecasts in R:

        fit <- ets(livestock)
        fcast <- forecast(fit)
        plot(fcast)
50. [Plot: forecasts from ETS(M,Md,M) for the corticosteroid series, total scripts (millions), 1995–2010.]
51. The same forecasts in R:

        fit <- ets(h02)
        fcast <- forecast(fit)
        plot(fcast)
52. M3 comparisons:

| Method | MAPE | sMAPE | MASE |
|---|---|---|---|
| Theta | 17.83 | 12.86 | 1.40 |
| ForecastPro | 18.00 | 13.06 | 1.47 |
| ETS additive | 18.58 | 13.69 | 1.48 |
| ETS | 19.33 | 13.57 | 1.59 |
53. References:
    - RJ Hyndman, AB Koehler, RD Snyder and S Grose (2002). "A state space framework for automatic forecasting using exponential smoothing methods". *International Journal of Forecasting* 18(3), 439–454.
    - RJ Hyndman, AB Koehler, JK Ord and RD Snyder (2008). *Forecasting with exponential smoothing: the state space approach*. Springer-Verlag.
    - RJ Hyndman and G Athanasopoulos (2013). *Forecasting: principles and practice*. OTexts. OTexts.com/fpp/.
54. Outline (section divider): next, ARIMA modelling.
55. ARIMA modelling. Classic reference: Makridakis, Wheelwright and Hyndman (1998) *Forecasting: methods and applications*, 3rd ed., Wiley: NY. "There is such a bewildering variety of ARIMA models, it can be difficult to decide which model is most appropriate for a given set of data." (MWH, p. 347)
57. [Plot: forecasts from ARIMA(0,1,0) with drift for the Asian sheep series, millions of sheep, 1960–2010.]
58. The same forecasts in R:

        fit <- auto.arima(livestock)
        fcast <- forecast(fit)
        plot(fcast)
59. [Plot: forecasts from ARIMA(3,1,3)(0,1,1)[12] for the corticosteroid series, total scripts (millions), 1995–2010.]
60. The same forecasts in R:

        fit <- auto.arima(h02)
        fcast <- forecast(fit)
        plot(fcast)
61. How does auto.arima() work? A non-seasonal ARIMA process: φ(B)(1 − B)^d y_t = c + θ(B)ε_t. We need to select appropriate orders p, d, q, and whether to include c. The Hyndman & Khandakar (JSS, 2008) algorithm:
    - Select the number of differences d via the KPSS unit root test.
    - Select p, q and c by minimising the AIC.
    - Use a stepwise search to traverse the model space, starting with a simple model and considering nearby variants.

    Algorithm choices are driven by forecast accuracy.
64. A seasonal ARIMA process: Φ(B^m) φ(B) (1 − B)^d (1 − B^m)^D y_t = c + Θ(B^m) θ(B) ε_t. We need to select appropriate orders p, d, q, P, D, Q, and whether to include c. The Hyndman & Khandakar (JSS, 2008) algorithm:
    - Select d via the KPSS unit root test.
    - Select D using the OCSB unit root test.
    - Select p, q, P, Q and c by minimising the AIC.
    - Use a stepwise search to traverse the model space, starting with a simple model and considering nearby variants.
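The stepwise search can be sketched independently of the ARIMA fitting itself. Here a hypothetical `score` function stands in for "fit the model and return its AIC"; the neighbourhood moves, bounds, and starting point are illustrative, not the exact rules of the published algorithm:

```python
def stepwise_search(score, start=(2, 2), p_max=5, q_max=5):
    """Greedy stepwise order selection: start from a simple (p, q),
    repeatedly move to the best-scoring admissible neighbour, and stop
    when no neighbour improves on the current score."""
    current, best = start, score(*start)
    improved = True
    while improved:
        improved = False
        p, q = current
        for cp, cq in [(p + 1, q), (p - 1, q), (p, q + 1), (p, q - 1)]:
            if 0 <= cp <= p_max and 0 <= cq <= q_max:
                s = score(cp, cq)
                if s < best:
                    current, best, improved = (cp, cq), s, True
    return current

# A convex stand-in for the AIC surface, with its minimum at (1, 3):
print(stepwise_search(lambda p, q: (p - 1) ** 2 + (q - 3) ** 2))  # → (1, 3)
```

Because only strictly improving moves are taken, the search always terminates; like any greedy search it can stop at a local minimum of the real AIC surface, which is the price paid for not fitting every model.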
65. M3 comparisons:

| Method | MAPE | sMAPE | MASE |
|---|---|---|---|
| Theta | 17.83 | 12.86 | 1.40 |
| ForecastPro | 18.00 | 13.06 | 1.47 |
| BJauto | 19.14 | 13.73 | 1.55 |
| AutoARIMA | 18.98 | 13.75 | 1.47 |
| ETS-additive | 18.58 | 13.69 | 1.48 |
| ETS | 19.33 | 13.57 | 1.59 |
| ETS-ARIMA | 18.17 | 13.11 | 1.44 |
66. M3 conclusions. Myths: simple methods do better; exponential smoothing is better than ARIMA. Facts: the best methods are hybrid approaches; ETS-ARIMA (the simple average of ETS-additive and AutoARIMA) is the only fully documented method that is comparable to the M3 competition winners. I have an algorithm that does better than all of these, but it takes too long to be practical.
72. References:
    - RJ Hyndman and Y Khandakar (2008). "Automatic time series forecasting: the forecast package for R". *Journal of Statistical Software* 26(3).
    - RJ Hyndman (2011). "Major changes to the forecast package". robjhyndman.com/hyndsight/forecast3/.
    - RJ Hyndman and G Athanasopoulos (2013). *Forecasting: principles and practice*. OTexts. OTexts.com/fpp/.
73. Outline (section divider): next, Time series with complex seasonality.
74. Examples. [Plot: US finished motor gasoline products, weekly, 1992–2004, thousands of barrels per day.]
75. Examples. [Plot: number of calls to a large American bank (7am–9pm), 5-minute intervals, 3 March to 12 May.]
76. Examples. [Plot: Turkish electricity demand, daily, 2000–2008, GW.]
77. The TBATS model:
    - Trigonometric terms for seasonality
    - Box-Cox transformations for heterogeneity
    - ARMA errors for short-term dynamics
    - Trend (possibly damped)
    - Seasonal (including multiple and non-integer periods)
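The "T" in TBATS, trigonometric seasonal terms, amounts to pairs of sine/cosine regressors, which is what lets the model handle non-integer periods such as 52.18 weeks per year. A minimal sketch of that construction (a generic Fourier-term builder, not the TBATS state space recursion itself; the function name is invented):

```python
import numpy as np

def fourier_terms(t, period, K):
    """K pairs of trigonometric seasonal regressors for a possibly
    non-integer period: sin(2*pi*j*t/period), cos(2*pi*j*t/period)
    for j = 1..K."""
    t = np.asarray(t, dtype=float)
    cols = []
    for j in range(1, K + 1):
        cols.append(np.sin(2 * np.pi * j * t / period))
        cols.append(np.cos(2 * np.pi * j * t / period))
    return np.column_stack(cols)

# Weekly data with annual seasonality: period ~ 52.18 weeks, K = 2 harmonics.
X = fourier_terms(range(10), 52.18, 2)
print(X.shape)  # → (10, 4)
```

Raising K adds harmonics and lets the seasonal pattern be less smooth, at the cost of more coefficients; K is a tuning choice.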
78. In R:

        fit <- tbats(gasoline)
        fcast <- forecast(fit)
        plot(fcast)

    [Plot: forecasts from TBATS(0.999, {2,2}, 1, {<52.1785714285714,8>}) for the US gasoline series.]
79. In R:

        fit <- tbats(callcentre)
        fcast <- forecast(fit)
        plot(fcast)

    [Plot: forecasts from TBATS(1, {3,1}, 0.987, {<169,5>, <845,3>}) for the call centre series.]
80. In R:

        fit <- tbats(turk)
        fcast <- forecast(fit)
        plot(fcast)

    [Plot: forecasts from TBATS(0, {5,3}, 0.997, {<7,3>, <354.37,12>, <365.25,4>}) for the Turkish electricity demand series.]
81. References. The automatic algorithm is described in: AM De Livera, RJ Hyndman and RD Snyder (2011). "Forecasting time series with complex seasonal patterns using exponential smoothing". *Journal of the American Statistical Association* 106(496), 1513–1527. A slightly improved algorithm is implemented in: RJ Hyndman (2012). forecast: Forecasting functions for time series. cran.r-project.org/package=forecast. More work required!
82. Outline (section divider): next, Hierarchical and grouped time series.
83. Introduction. Example hierarchy: Total → A, B, C; A → AA, AB, AC; B → BA, BB, BC; C → CA, CB, CC. Examples: manufacturing product hierarchies; pharmaceutical sales; net labour turnover.
87. Hierarchical/grouped time series. A hierarchical time series is a collection of several time series that are linked together in a hierarchical structure. Example: pharmaceutical products are organized in a hierarchy under the Anatomical Therapeutic Chemical (ATC) Classification System. A grouped time series is a collection of time series that are aggregated in a number of non-hierarchical ways. Example: daily numbers of calls to HP call centres are grouped by product type and location of call centre.
96. 96. Hierarchical data Total, with children A, B, C. Notation: Y_t is the observed aggregate of all series at time t; Y_{X,t} is the observation on series X at time t; B_t is the vector of all bottom-level series at time t. Stacking all the series,

    Y_t = [Y_t, Y_{A,t}, Y_{B,t}, Y_{C,t}]′ = S B_t,

where B_t = [Y_{A,t}, Y_{B,t}, Y_{C,t}]′ and S is the "summing" matrix

    S = [ 1 1 1
          1 0 0
          0 1 0
          0 0 1 ]

Forecasting without forecasters Hierarchical and grouped time series 40
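The identity Y_t = S B_t can be sketched in a few lines of plain Python (the talk's own examples use R); the observation values here are hypothetical:

```python
# Summing matrix S for the Total/A/B/C hierarchy above: it maps the
# bottom-level vector B_t to the full vector Y_t of all series.
S = [
    [1, 1, 1],  # Total = A + B + C
    [1, 0, 0],  # A
    [0, 1, 0],  # B
    [0, 0, 1],  # C
]

def matvec(M, v):
    """Matrix-vector product."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

B_t = [10.0, 20.0, 30.0]   # hypothetical bottom-level observations
Y_t = matvec(S, B_t)       # [60.0, 10.0, 20.0, 30.0]
```

The first row of S produces the aggregate; the identity rows carry the bottom-level series through unchanged.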
99. 99. Grouped data The Total can be disaggregated two ways: by group A/B then X/Y, or by X/Y then A/B (the two groupings are crossed, not nested). Stacking all the series,

    Y_t = [Y_t, Y_{A,t}, Y_{B,t}, Y_{X,t}, Y_{Y,t}, Y_{AX,t}, Y_{AY,t}, Y_{BX,t}, Y_{BY,t}]′ = S B_t,

where B_t = [Y_{AX,t}, Y_{AY,t}, Y_{BX,t}, Y_{BY,t}]′ and

    S = [ 1 1 1 1
          1 1 0 0
          0 0 1 1
          1 0 1 0
          0 1 0 1
          1 0 0 0
          0 1 0 0
          0 0 1 0
          0 0 0 1 ]

Forecasting without forecasters Hierarchical and grouped time series 41
103. 103. Forecasts Key idea: forecast reconciliation. First, ignore the structural constraints and forecast every series of interest independently; then adjust the forecasts to impose the constraints. Let Ŷ_n(h) be the vector of initial forecasts for horizon h, made at time n, stacked in the same order as Y_t. Optimal reconciled forecasts:

    Ỹ_n(h) = S (S′S)^{-1} S′ Ŷ_n(h)

This is independent of the covariance structure of the hierarchy: the optimal reconciliation weights S (S′S)^{-1} S′ do not depend on the data. Forecasting without forecasters Hierarchical and grouped time series 42
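The reconciliation step above is just an OLS projection, sketched here in plain Python (the talk's own code is in R); the base-forecast values are hypothetical:

```python
# OLS forecast reconciliation for the Total/A/B/C hierarchy: base forecasts
# that violate the aggregation constraint are projected with S (S'S)^{-1} S'
# onto forecasts that add up exactly.
S = [[1, 1, 1],
     [1, 0, 0],
     [0, 1, 0],
     [0, 0, 1]]

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def inverse(M):
    """Gauss-Jordan inverse of a small square matrix."""
    n = len(M)
    A = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(M)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(A[r][i]))
        A[i], A[p] = A[p], A[i]
        piv = A[i][i]
        A[i] = [x / piv for x in A[i]]
        for r in range(n):
            if r != i:
                f = A[r][i]
                A[r] = [x - f * y for x, y in zip(A[r], A[i])]
    return [row[n:] for row in A]

St = transpose(S)
P = matmul(matmul(S, inverse(matmul(St, S))), St)   # S (S'S)^{-1} S'

base = [[100.0], [35.0], [40.0], [30.0]]   # total != 35 + 40 + 30
total, a, b, c = (row[0] for row in matmul(P, base))
# reconciled forecasts are aggregate consistent: total == a + b + c
```

With these hypothetical base forecasts, each bottom series is pulled down by 1.25 and the total is pulled up by 1.25, so the reconciled forecasts satisfy the constraint exactly.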
104. 104. Features Forget "bottom up" or "top down": this approach combines all forecasts optimally. The method outperforms bottom-up and top-down, especially for middle levels. Covariates can be included in base forecasts. Adjustments can be made to base forecasts at any level. Point forecasts are always aggregate consistent. Very simple and flexible method; can work with any hierarchical or grouped time series. Conceptually easy to implement: OLS on base forecasts. Forecasting without forecasters Hierarchical and grouped time series 43
111. 111. Challenges Computational difficulties in big hierarchies due to the size of the S matrix and the inversion of (S′S). Need to estimate the covariance matrix to produce prediction intervals. Forecasting without forecasters Hierarchical and grouped time series 44
116. 116. Example using R

library(hts)
# bts is a matrix containing the bottom level time series
# g describes the grouping/hierarchical structure
y <- hts(bts, g=c(1,1,2,2))

# Forecast 10-step-ahead using optimal combination method
# ETS used for each series by default
fc <- forecast(y, h=10)

# Select your own methods
ally <- allts(y)
allf <- matrix(, nrow=10, ncol=ncol(ally))
for(i in 1:ncol(ally))
  allf[,i] <- mymethod(ally[,i], h=10)
allf <- ts(allf, start=2004)
# Reconcile forecasts so they add up
fc2 <- combinef(allf, Smatrix(y))

Forecasting without forecasters Hierarchical and grouped time series 47
117. 117. References
RJ Hyndman, RA Ahmed, G Athanasopoulos, and HL Shang (2011). "Optimal combination forecasts for hierarchical time series". Computational Statistics and Data Analysis 55(9), 2579–2589.
RJ Hyndman, RA Ahmed, and HL Shang (2013). hts: Hierarchical time series. cran.r-project.org/package=hts.
RJ Hyndman and G Athanasopoulos (2013). Forecasting: principles and practice. OTexts. OTexts.com/fpp/.
Forecasting without forecasters Hierarchical and grouped time series 48
118. 118. Outline 1 Motivation 2 Exponential smoothing 3 ARIMA modelling 4 Time series with complex seasonality 5 Hierarchical and grouped time series 6 Functional time series Forecasting without forecasters Functional time series 49
119. 119. Fertility rates [Figure: Australia fertility rates (1921), fertility rate by age 15–50] Forecasting without forecasters Functional time series 50
120. 120. Functional data model Let f_{t,x} be the observed data in period t at age x, t = 1, ..., n.

    f_t(x) = µ(x) + Σ_{k=1}^{K} β_{t,k} φ_k(x) + e_t(x)

The decomposition separates time and age to allow forecasting. Estimate µ(x) as the mean of f_t(x) across years. Estimate β_{t,k} and φ_k(x) using functional (weighted) principal components. Univariate models are used for automatic forecasting of the scores {β_{t,k}}. Forecasting without forecasters Functional time series 51
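The forecasting idea behind the decomposition can be sketched in plain Python (the talk uses R, and all values below are hypothetical): the curves are reduced to scores β_{t,k} on fixed basis functions φ_k(x), the scores are forecast with a simple univariate method (a random walk with drift here, standing in for the automatic methods of earlier sections), and forecast curves are rebuilt.

```python
# Hypothetical fitted components on a coarse age grid x.
ages = [15, 25, 35, 45]          # age grid x
mu   = [1.0, 3.0, 2.0, 0.5]      # mean function mu(x)
phi1 = [0.1, 0.5, 0.3, 0.1]      # first principal component phi_1(x)
beta1 = [0.0, 1.0, 2.0, 3.0]     # scores beta_{t,1}, t = 1..4

def forecast_score(scores, h):
    """Random-walk-with-drift forecast of a score series, h steps ahead."""
    drift = (scores[-1] - scores[0]) / (len(scores) - 1)
    return scores[-1] + h * drift

def forecast_curve(h):
    """Rebuild the forecast curve f_{n+h}(x) = mu(x) + beta_{n+h,1} phi_1(x)."""
    b = forecast_score(beta1, h)
    return [m + b * p for m, p in zip(mu, phi1)]
```

Because only the scalar score series are forecast, any univariate automatic method (ETS, ARIMA) can be plugged in for each k.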
125. 125. Fertility application [Figure: Australia fertility rates (1921–2006), fertility rate by age 15–50] Forecasting without forecasters Functional time series 52
126. 126. Fertility model [Figure: fitted model components — mean µ(x) and basis functions φ1(x), φ2(x) by age 15–50; coefficient series β1,t and β2,t over 1920–2006] Forecasting without forecasters Functional time series 53
130. 130. Forecasts of f_t(x) [Figure: Australia fertility rates (1921–2006) with forecast curves and 80% prediction intervals] Forecasting without forecasters Functional time series 54
131. 131. R code

library(demography)
plot(aus.fert)
fit <- fdm(aus.fert)
fc <- forecast(fit)

[Figure: Australia fertility rates (1921–2006)] Forecasting without forecasters Functional time series 55
132. 132. References
RJ Hyndman and S Ullah (2007). "Robust forecasting of mortality and fertility rates: A functional data approach". Computational Statistics and Data Analysis 51(10), 4942–4956.
RJ Hyndman and HL Shang (2009). "Forecasting functional time series (with discussion)". Journal of the Korean Statistical Society 38(3), 199–221.
RJ Hyndman (2012). demography: Forecasting mortality, fertility, migration and population data. cran.r-project.org/package=demography.
Forecasting without forecasters Functional time series 56
133. 133. For further information robjhyndman.com Slides and references for this talk. Links to all papers and books. Links to R packages. A blog about forecasting research. Forecasting without forecasters Functional time series 57