Forecasting without forecasters

Keynote talk given at the International Symposium on Forecasting, Seoul, South Korea. 25 June 2013


  1. Rob J Hyndman: Forecasting without forecasters (title slide)
  2. Outline: 1 Motivation; 2 Exponential smoothing; 3 ARIMA modelling; 4 Time series with complex seasonality; 5 Hierarchical and grouped time series; 6 Functional time series
  3. Motivation (section title slide)
  8. Motivation: (1) It is common in business to have over 1000 products that need forecasting at least monthly. (2) Forecasts are often required by people who are untrained in time series analysis. (3) Some types of data can be decomposed into a large number of univariate time series that need to be forecast. Specifications: automatic forecasting algorithms must determine an appropriate time series model, estimate the parameters, and compute the forecasts with prediction intervals.
  13. Example: Asian sheep. [Figure: Numbers of sheep in Asia, 1960–2010; x-axis: Year, y-axis: millions of sheep]
  14. Example: Asian sheep. [Figure: Automatic ETS forecasts of the same series; x-axis: Year, y-axis: millions of sheep]
  15. Example: Corticosteroid sales. [Figure: Monthly corticosteroid drug sales in Australia, 1995–2010; x-axis: Year, y-axis: Total scripts (millions)]
  16. Example: Corticosteroid sales. [Figure: Automatic ARIMA forecasts; x-axis: Year, y-axis: Total scripts (millions)]
  17. M3 competition: 3003 time series; an early comparison of automatic forecasting algorithms; the best-performing methods were undocumented; there has been limited subsequent research on general automatic forecasting algorithms.
  22. Outline (section divider: 2 Exponential smoothing)
  23. Exponential smoothing. Classic reference: Makridakis, Wheelwright and Hyndman (1998), Forecasting: methods and applications, 3rd ed., Wiley: NY. “Unfortunately, exponential smoothing methods do not allow the easy calculation of prediction intervals.” (MWH, p. 177). At that time there was no satisfactory way to select an exponential smoothing method.
  26. Exponential smoothing. Current reference: Hyndman and Athanasopoulos (2013), Forecasting: principles and practice, OTexts: Australia. OTexts.com/fpp.
  27. Exponential smoothing methods. Taxonomy by trend and seasonal components:

          Trend component               Seasonal component
                                        N (None)   A (Additive)   M (Multiplicative)
          N  (None)                     N,N        N,A            N,M
          A  (Additive)                 A,N        A,A            A,M
          Ad (Additive damped)          Ad,N       Ad,A           Ad,M
          M  (Multiplicative)           M,N        M,A            M,M
          Md (Multiplicative damped)    Md,N       Md,A           Md,M

      Named special cases: N,N is simple exponential smoothing; A,N is Holt's linear method; Ad,N is the additive damped trend method; M,N is the exponential trend method; Md,N is the multiplicative damped trend method; A,A is the additive Holt-Winters' method; A,M is the multiplicative Holt-Winters' method. There are 15 separate exponential smoothing methods, and each can have an additive or multiplicative error, giving 30 separate models.
  37. ETS notation: E,T,S stands for Error, Trend, Seasonal (ExponenTial Smoothing). Examples: A,N,N is simple exponential smoothing with additive errors; A,A,N is Holt's linear method with additive errors; M,A,M is the multiplicative Holt-Winters' method with multiplicative errors. Innovations state space models: all ETS models can be written in innovations state space form (IJF, 2002); the additive- and multiplicative-error versions give the same point forecasts but different prediction intervals.
  44. Automatic forecasting, from Hyndman et al. (IJF, 2002): apply each of the 30 models that are appropriate to the data; optimize parameters and initial values using MLE (or some other criterion); select the best model using the AIC, where AIC = −2 log(Likelihood) + 2p and p = number of parameters; produce forecasts using the best model; obtain prediction intervals from the underlying state space model. (A small R sketch of the selection step follows.)
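A minimal sketch of the AIC-based selection step, assuming the annual livestock series used in the slides is available (it ships with the fpp/expsmooth packages). ets() performs this search over all admissible models automatically; the loop below only makes the idea explicit for a few non-seasonal candidates.

```r
library(forecast)

# Candidate ETS models written as error/trend/seasonal codes
candidates <- c("ANN", "AAN", "MAN")
fits <- lapply(candidates, function(m) ets(livestock, model = m))
aic  <- sapply(fits, function(f) f$aic)    # AIC = -2 log(Likelihood) + 2p
names(aic) <- candidates
aic                                        # compare the candidates
best <- fits[[which.min(aic)]]             # model minimising the AIC
plot(forecast(best, h = 10))               # forecasts with prediction intervals
```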
  48. Exponential smoothing forecasts of the sheep data. R code:
          fit <- ets(livestock)
          fcast <- forecast(fit)
          plot(fcast)
      [Figure: Forecasts from ETS(M,A,N); x-axis: Year, y-axis: millions of sheep]
  50. Exponential smoothing forecasts of the corticosteroid sales data. R code:
          fit <- ets(h02)
          fcast <- forecast(fit)
          plot(fcast)
      [Figure: Forecasts from ETS(M,Md,M); x-axis: Year, y-axis: Total scripts (millions)]
  52. M3 comparisons:

          Method         MAPE    sMAPE   MASE
          Theta          17.83   12.86   1.40
          ForecastPro    18.00   13.06   1.47
          ETS additive   18.58   13.69   1.48
          ETS            19.33   13.57   1.59
  53. References: RJ Hyndman, AB Koehler, RD Snyder and S Grose (2002), “A state space framework for automatic forecasting using exponential smoothing methods”, International Journal of Forecasting 18(3), 439–454. RJ Hyndman, AB Koehler, JK Ord and RD Snyder (2008), Forecasting with exponential smoothing: the state space approach, Springer-Verlag. RJ Hyndman and G Athanasopoulos (2013), Forecasting: principles and practice, OTexts, OTexts.com/fpp/.
  54. Outline (section divider: 3 ARIMA modelling)
  55. ARIMA modelling. Classic reference: Makridakis, Wheelwright and Hyndman (1998), Forecasting: methods and applications, 3rd ed., Wiley: NY. “There is such a bewildering variety of ARIMA models, it can be difficult to decide which model is most appropriate for a given set of data.” (MWH, p. 347)
  57. Auto ARIMA forecasts of the sheep data. R code:
          fit <- auto.arima(livestock)
          fcast <- forecast(fit)
          plot(fcast)
      [Figure: Forecasts from ARIMA(0,1,0) with drift; x-axis: Year, y-axis: millions of sheep]
  59. Auto ARIMA forecasts of the corticosteroid sales data. R code:
          fit <- auto.arima(h02)
          fcast <- forecast(fit)
          plot(fcast)
      [Figure: Forecasts from ARIMA(3,1,3)(0,1,1)[12]; x-axis: Year, y-axis: Total scripts (millions)]
  61. How does auto.arima() work? A non-seasonal ARIMA process: φ(B)(1 − B)^d y_t = c + θ(B)ε_t. We need to select appropriate orders p, q, d, and whether to include c. The Hyndman & Khandakar (JSS, 2008) algorithm: select the number of differences d via the KPSS unit root test; select p, q and c by minimising the AIC; use a stepwise search to traverse the model space, starting with a simple model and considering nearby variants. Algorithm choices are driven by forecast accuracy.
  64. How does auto.arima() work for seasonal data? A seasonal ARIMA process: Φ(B^m)φ(B)(1 − B)^d(1 − B^m)^D y_t = c + Θ(B^m)θ(B)ε_t. We need to select appropriate orders p, q, d, P, Q, D, and whether to include c. The Hyndman & Khandakar (JSS, 2008) algorithm: select the number of differences d via the KPSS unit root test; select D using the OCSB unit root test; select p, q, P, Q and c by minimising the AIC; use a stepwise search to traverse the model space, starting with a simple model and considering nearby variants. (A sketch of these choices as auto.arima() arguments follows.)
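A sketch making the choices described above explicit as auto.arima() arguments; the monthly h02 series from the slides is assumed to be available, e.g. via the fpp package.

```r
library(forecast)

fit <- auto.arima(h02,
                  ic            = "aic",   # choose p, q, P, Q, c by minimising the AIC
                  test          = "kpss",  # choose d via the KPSS unit root test
                  seasonal.test = "ocsb",  # choose D via the OCSB unit root test
                  stepwise      = TRUE)    # stepwise traversal of the model space
summary(fit)
plot(forecast(fit, h = 24))
```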
  65. M3 comparisons:

          Method         MAPE    sMAPE   MASE
          Theta          17.83   12.86   1.40
          ForecastPro    18.00   13.06   1.47
          BJauto         19.14   13.73   1.55
          AutoARIMA      18.98   13.75   1.47
          ETS-additive   18.58   13.69   1.48
          ETS            19.33   13.57   1.59
          ETS-ARIMA      18.17   13.11   1.44
  66. M3 conclusions. Myths: simple methods do better; exponential smoothing is better than ARIMA. Facts: the best methods are hybrid approaches; ETS-ARIMA (the simple average of the ETS-additive and AutoARIMA forecasts) is the only fully documented method that is comparable to the M3 competition winners; I have an algorithm that does better than all of these, but it takes too long to be practical.
  72. References: RJ Hyndman and Y Khandakar (2008), “Automatic time series forecasting: the forecast package for R”, Journal of Statistical Software 26(3). RJ Hyndman (2011), “Major changes to the forecast package”, robjhyndman.com/hyndsight/forecast3/. RJ Hyndman and G Athanasopoulos (2013), Forecasting: principles and practice, OTexts, OTexts.com/fpp/.
  73. Outline (section divider: 4 Time series with complex seasonality)
  74. Examples. [Figure: US finished motor gasoline products, weekly, 1992–2004; x-axis: Weeks, y-axis: Thousands of barrels per day]
  75. Examples. [Figure: Number of calls to a large American bank (7am–9pm), 5-minute intervals, 3 March to 12 May; x-axis: 5 minute intervals, y-axis: Number of call arrivals]
  76. Examples. [Figure: Turkish electricity demand, daily, 2000–2008; x-axis: Days, y-axis: Electricity demand (GW)]
  77. TBATS model: Trigonometric terms for seasonality; Box-Cox transformations for heterogeneity; ARMA errors for short-term dynamics; Trend (possibly damped); Seasonal components (including multiple and non-integer periods). (A sketch of setting up multiple seasonal periods in R follows.)
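A brief sketch of supplying multiple (possibly non-integer) seasonal periods to tbats() via an msts object. The series name calls is a placeholder, and the periods follow the call centre example below (5-minute data: 169 intervals per day, 845 per week); this is an illustration, not the code used for the talk.

```r
library(forecast)

# calls: a numeric vector of 5-minute call arrival counts (assumed input)
y   <- msts(calls, seasonal.periods = c(169, 845))  # daily and weekly cycles
fit <- tbats(y)
plot(forecast(fit, h = 2 * 845))                    # forecast two weeks ahead
```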
  78. Example: gasoline data. R code:
          fit <- tbats(gasoline)
          fcast <- forecast(fit)
          plot(fcast)
      [Figure: Forecasts from TBATS(0.999, {2,2}, 1, {<52.1785714285714,8>}); x-axis: Weeks, y-axis: Thousands of barrels per day]
  79. Example: call centre data. R code:
          fit <- tbats(callcentre)
          fcast <- forecast(fit)
          plot(fcast)
      [Figure: Forecasts from TBATS(1, {3,1}, 0.987, {<169,5>, <845,3>}); x-axis: 5 minute intervals, y-axis: Number of call arrivals]
  80. Example: Turkish electricity demand. R code:
          fit <- tbats(turk)
          fcast <- forecast(fit)
          plot(fcast)
      [Figure: Forecasts from TBATS(0, {5,3}, 0.997, {<7,3>, <354.37,12>, <365.25,4>}); x-axis: Days, y-axis: Electricity demand (GW)]
  81. References. The automatic algorithm is described in AM De Livera, RJ Hyndman and RD Snyder (2011), “Forecasting time series with complex seasonal patterns using exponential smoothing”, Journal of the American Statistical Association 106(496), 1513–1527. A slightly improved algorithm is implemented in RJ Hyndman (2012), forecast: Forecasting functions for time series, cran.r-project.org/package=forecast. More work required!
  82. Outline (section divider: 5 Hierarchical and grouped time series)
  83. Introduction. [Diagram: a two-level hierarchy with Total at the top; A, B, C below it; and AA, AB, AC, BA, BB, BC, CA, CB, CC at the bottom.] Examples: manufacturing product hierarchies; pharmaceutical sales; net labour turnover.
  87. Hierarchical/grouped time series. A hierarchical time series is a collection of several time series that are linked together in a hierarchical structure. Example: pharmaceutical products are organized in a hierarchy under the Anatomical Therapeutic Chemical (ATC) classification system. A grouped time series is a collection of time series that are aggregated in a number of non-hierarchical ways. Example: daily numbers of calls to HP call centres are grouped by product type and by location of call centre.
  91. Hierarchical data (Total with children A, B, C). Notation: Yt is the observed aggregate of all series at time t; YX,t is the observation on series X at time t; Bt is the vector of all bottom-level series at time t. Stacking every series into one vector gives

          [Yt, YA,t, YB,t, YC,t]′ = S Bt,   with Bt = [YA,t, YB,t, YC,t]′ and

              | 1 1 1 |
          S = | 1 0 0 |
              | 0 1 0 |
              | 0 0 1 |

      that is, Yt = S Bt, where S is the summing matrix.
  97. Grouped data. Here the bottom-level series AX, AY, BX, BY can be aggregated in two ways: by the first grouping (Total; A = AX + AY; B = BX + BY) or by the second (Total; X = AX + BX; Y = AY + BY). Stacking every series,

          Yt = [Yt, YA,t, YB,t, YX,t, YY,t, YAX,t, YAY,t, YBX,t, YBY,t]′ = S Bt,   with Bt = [YAX,t, YAY,t, YBX,t, YBY,t]′ and

              | 1 1 1 1 |
              | 1 1 0 0 |
              | 0 0 1 1 |
              | 1 0 1 0 |
          S = | 0 1 0 1 |
              | 1 0 0 0 |
              | 0 1 0 0 |
              | 0 0 1 0 |
              | 0 0 0 1 |

      so again Yt = S Bt.
  100. Forecasts. Key idea: forecast reconciliation. Ignore the structural constraints and forecast every series of interest independently, then adjust the forecasts so that they satisfy the constraints. Let Ŷn(h) be the vector of initial (base) forecasts for horizon h, made at time n, stacked in the same order as Yt. The optimally reconciled forecasts are Ỹn(h) = S(S′S)^(-1)S′ Ŷn(h). The reconciliation weights S(S′S)^(-1)S′ are independent of the data and of the covariance structure of the hierarchy. (A small numerical sketch follows.)
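A minimal numerical sketch of the reconciliation step for the small hierarchy Total = A + B + C; the base forecasts below are made-up numbers for a single horizon, purely for illustration.

```r
# Summing matrix S for the hierarchy Total = A + B + C
S <- rbind(Total = c(1, 1, 1),
           A     = c(1, 0, 0),
           B     = c(0, 1, 0),
           C     = c(0, 0, 1))

# Base forecasts, stacked in the same order as the rows of S; note that
# they do not add up: 30 + 40 + 25 = 95, not 105
yhat <- c(Total = 105, A = 30, B = 40, C = 25)

# Reconciled forecasts: S (S'S)^(-1) S' yhat
ytilde <- S %*% solve(t(S) %*% S) %*% t(S) %*% yhat
ytilde   # the Total row now equals the sum of the A, B and C rows
```

The hts package's optimal combination method, used in the examples below, applies this same projection to the full vector of base forecasts.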
  104. Features. Forget “bottom up” or “top down”: this approach combines all forecasts optimally. The method outperforms bottom-up and top-down approaches, especially for middle levels. Covariates can be included in the base forecasts. Adjustments can be made to the base forecasts at any level. Point forecasts are always aggregate consistent. The method is very simple and flexible, works with any hierarchical or grouped time series, and is conceptually easy to implement: OLS on the base forecasts.
  111. Challenges. Computational difficulties in big hierarchies, due to the size of the S matrix and the cost of inverting (S′S). We need to estimate the covariance matrix to produce prediction intervals.
  113. Example using R:
          library(hts)
          # bts is a matrix containing the bottom-level time series
          # g describes the grouping/hierarchical structure
          y <- hts(bts, g=c(1,1,2,2))
      [Diagram: the resulting hierarchy: Total; A (AX, AY) and B (BX, BY).]
  115. Example using R (continued):
          # Forecast 10 steps ahead using the optimal combination method;
          # ETS is used for each series by default
          fc <- forecast(y, h=10)
  116. Example using R (continued), selecting your own forecasting method for each series:
          ally <- allts(y)
          allf <- matrix(NA, nrow=10, ncol=ncol(ally))
          # mymethod() stands for any user-supplied function returning h-step-ahead point forecasts
          for(i in 1:ncol(ally))
            allf[,i] <- mymethod(ally[,i], h=10)
          allf <- ts(allf, start=2004)
          # Reconcile forecasts so they add up
          fc2 <- combinef(allf, Smatrix(y))
  117. References: RJ Hyndman, RA Ahmed, G Athanasopoulos and HL Shang (2011), “Optimal combination forecasts for hierarchical time series”, Computational Statistics and Data Analysis 55(9), 2579–2589. RJ Hyndman, RA Ahmed and HL Shang (2013), hts: Hierarchical time series, cran.r-project.org/package=hts. RJ Hyndman and G Athanasopoulos (2013), Forecasting: principles and practice, OTexts, OTexts.com/fpp/.
  118. Outline (section divider: 6 Functional time series)
  119. Fertility rates. [Figure: Australia fertility rates (1921); x-axis: Age (15–50), y-axis: Fertility rate]
  120. Functional data model. Let ft(x) be the observed data in period t at age x, for t = 1, ..., n. The model is ft(x) = µ(x) + Σ_{k=1..K} βt,k φk(x) + et(x). The decomposition separates time and age, which allows forecasting. Estimate µ(x) as the mean of ft(x) across years. Estimate βt,k and φk(x) using functional (weighted) principal components. Univariate models are then used for automatic forecasting of the scores {βt,k}. (A rough sketch of this decompose-then-forecast idea follows.)
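A rough sketch of the decompose-then-forecast idea using ordinary (unweighted) principal components and ETS forecasts of the scores. The demography package's fdm() implements a robust, weighted version of this; the helper names fdm_sketch() and forecast_fdm_sketch() and the input matrix Y (rows = years t, columns = ages x) are assumptions for illustration only.

```r
library(forecast)

fdm_sketch <- function(Y, K = 2) {
  mu  <- colMeans(Y)                                # mean curve mu(x)
  pca <- prcomp(sweep(Y, 2, mu), center = FALSE)    # PCA of the centred curves
  list(mu   = mu,
       phi  = pca$rotation[, 1:K, drop = FALSE],    # basis functions phi_k(x)
       beta = pca$x[, 1:K, drop = FALSE])           # scores beta_{t,k}
}

forecast_fdm_sketch <- function(dec, h = 20) {
  # forecast each score series with an automatically selected ETS model
  beta_fc <- sapply(seq_len(ncol(dec$beta)),
                    function(k) forecast(ets(ts(dec$beta[, k])), h = h)$mean)
  # rebuild the forecast curves: mu(x) + sum_k beta_{t,k} phi_k(x)
  sweep(beta_fc %*% t(dec$phi), 2, dec$mu, FUN = "+")
}
```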
  125. Fertility application. [Figure: Australia fertility rates (1921–2006); x-axis: Age (15–50), y-axis: Fertility rate]
  126. Fertility model. [Figure: estimated components of the fitted model: the mean curve µ(x) and basis functions φ1(x), φ2(x) plotted against Age, with the corresponding scores β1,t and β2,t plotted against Year.]
  127. Forecasts of ft(x). [Figure: Australia fertility rates (1921–2006) with forecast curves and 80% prediction intervals; x-axis: Age, y-axis: Fertility rate]
  131. R code:
          library(demography)
          plot(aus.fert)
          fit <- fdm(aus.fert)
          fc <- forecast(fit)
      [Figure: Australia fertility rates (1921–2006); x-axis: Age, y-axis: Fertility rate]
  132. References: RJ Hyndman and S Ullah (2007), “Robust forecasting of mortality and fertility rates: a functional data approach”, Computational Statistics and Data Analysis 51(10), 4942–4956. RJ Hyndman and HL Shang (2009), “Forecasting functional time series (with discussion)”, Journal of the Korean Statistical Society 38(3), 199–221. RJ Hyndman (2012), demography: Forecasting mortality, fertility, migration and population data, cran.r-project.org/package=demography.
  133. For further information: robjhyndman.com has the slides and references for this talk, links to all papers and books, links to the R packages, and a blog about forecasting research.
