Rob J Hyndman
Forecasting without
forecasters
Outline
1 Motivation
2 Exponential smoothing
3 ARIMA modelling
4 Time series with complex seasonality
5 Hierarchical and grouped time series
6 Functional time series
Forecasting without forecasters Motivation 2
Motivation
1 Common in business to have over 1000 products that need forecasting at least monthly.
2 Forecasts are often required by people who are untrained in time series analysis.
3 Some types of data can be decomposed into a large number of univariate time series that need to be forecast.
Specifications
Automatic forecasting algorithms must:
¯ determine an appropriate time series model;
¯ estimate the parameters;
¯ compute the forecasts with prediction intervals.
Example: Asian sheep
[Figure: Numbers of sheep in Asia, 1960–2010, millions of sheep]
[Figure: Automatic ETS forecasts for the same series]
Example: Corticosteroid sales
[Figure: Monthly corticosteroid drug sales in Australia, 1995–2010, total scripts (millions)]
[Figure: Automatic ARIMA forecasts for the same series]
M3 competition
3003 time series.
Early comparison of automatic forecasting algorithms.
Best-performing methods undocumented.
Limited subsequent research on general automatic forecasting algorithms.
Exponential smoothing

Classic Reference
Makridakis, Wheelwright and Hyndman (1998) Forecasting: methods and applications, 3rd ed., Wiley: NY.
¯ “Unfortunately, exponential smoothing methods do not allow the easy calculation of prediction intervals.” (MWH, p.177)
¯ No satisfactory way to select an exponential smoothing method.

Current Reference
Hyndman and Athanasopoulos (2013) Forecasting: principles and practice, OTexts: Australia. OTexts.com/fpp.
Exponential smoothing methods

                                          Seasonal Component
Trend Component                  N (None)    A (Additive)    M (Multiplicative)
N (None)                         N,N         N,A             N,M
A (Additive)                     A,N         A,A             A,M
Ad (Additive damped)             Ad,N        Ad,A            Ad,M
M (Multiplicative)               M,N         M,A             M,M
Md (Multiplicative damped)       Md,N        Md,A            Md,M

N,N: Simple exponential smoothing
A,N: Holt’s linear method
Ad,N: Additive damped trend method
M,N: Exponential trend method
Md,N: Multiplicative damped trend method
A,A: Additive Holt-Winters’ method
A,M: Multiplicative Holt-Winters’ method

There are 15 separate exponential smoothing methods.
Each can have an additive or multiplicative error, giving 30 separate models.

General notation ETS(Error, Trend, Seasonal): ExponenTial Smoothing
Examples:
A,N,N: Simple exponential smoothing with additive errors
A,A,N: Holt’s linear method with additive errors
M,A,M: Multiplicative Holt-Winters’ method with multiplicative errors
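The taxonomy multiplies out mechanically. A quick Python sketch (illustrative only; the notation is from the slides, the code is not) that enumerates the 15 methods and the 30 ETS models:

```python
from itertools import product

errors = ["A", "M"]                   # additive or multiplicative errors
trends = ["N", "A", "Ad", "M", "Md"]  # none, additive, damped, multiplicative, damped mult.
seasonals = ["N", "A", "M"]           # none, additive, multiplicative

# 15 exponential smoothing methods: (trend, seasonal) pairs
methods = [f"{t},{s}" for t, s in product(trends, seasonals)]

# 30 ETS models: each method combined with either error type
models = [f"ETS({e},{t},{s})" for e, t, s in product(errors, trends, seasonals)]

print(len(methods), len(models))  # 15 30
```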
Innovations state space models
¯ All ETS models can be written in innovations state space form (IJF, 2002).
¯ Additive and multiplicative versions give the same point forecasts but different prediction intervals.
Automatic forecasting
From Hyndman et al. (IJF, 2002):
Apply each of the 30 models that are appropriate to the data. Optimize parameters and initial values using MLE (or some other criterion).
Select the best method using the AIC:
AIC = −2 log(Likelihood) + 2p
where p = number of parameters.
Produce forecasts using the best method.
Obtain prediction intervals using the underlying state space model.
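The AIC selection step is easy to sketch. A minimal Python illustration of AIC-based model choice, using made-up log-likelihood and parameter-count values rather than actually fitted ETS models:

```python
def aic(log_likelihood, p):
    """AIC = -2 log(Likelihood) + 2p, where p = number of parameters."""
    return -2 * log_likelihood + 2 * p

# Hypothetical fitted candidates: model name -> (maximised log-likelihood, #parameters)
candidates = {
    "ETS(A,N,N)": (-102.3, 2),
    "ETS(A,A,N)": (-98.7, 4),
    "ETS(M,A,N)": (-97.9, 4),
}

# Pick the model with the smallest AIC
best = min(candidates, key=lambda m: aic(*candidates[m]))
print(best)  # ETS(M,A,N)
```

Note how the 2p term penalises the extra parameters: a better likelihood only wins if it pays for the added complexity.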
Exponential smoothing
fit <- ets(livestock)
fcast <- forecast(fit)
plot(fcast)
[Figure: Forecasts from ETS(M,A,N), millions of sheep, 1960–2010]
Exponential smoothing
fit <- ets(h02)
fcast <- forecast(fit)
plot(fcast)
[Figure: Forecasts from ETS(M,Md,M), total scripts (millions), 1995–2010]
M3 comparisons
Method        MAPE   sMAPE   MASE
Theta         17.83  12.86   1.40
ForecastPro   18.00  13.06   1.47
ETS-additive  18.58  13.69   1.48
ETS           19.33  13.57   1.59
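For reference, the three accuracy measures in the table can be sketched in Python using common textbook definitions (the slides' own tooling is R; sMAPE in particular has several competing variants in the literature, and this is one of them — MASE follows Hyndman & Koehler, 2006):

```python
def mape(actual, forecast):
    """Mean absolute percentage error (in %)."""
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

def smape(actual, forecast):
    """Symmetric MAPE (one common variant; definitions differ across papers)."""
    return 100 * sum(2 * abs(a - f) / (abs(a) + abs(f))
                     for a, f in zip(actual, forecast)) / len(actual)

def mase(actual, forecast, insample):
    """Mean absolute scaled error: errors scaled by the in-sample naive-forecast MAE."""
    scale = sum(abs(insample[t] - insample[t - 1])
                for t in range(1, len(insample))) / (len(insample) - 1)
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual) / scale

# Toy illustration with hypothetical data
insample = [10, 12, 11, 13, 14]
actual, forecast = [15, 16], [14, 17]
print(round(mase(actual, forecast, insample), 3))
```

Unlike MAPE and sMAPE, MASE is well defined when actual values are zero, which is one reason it is used alongside them in the M3 comparisons.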
References
RJ Hyndman, AB Koehler, RD Snyder, and
S Grose (2002). “A state space framework for
automatic forecasting using exponential
smoothing methods”. International Journal of
Forecasting 18(3), 439–454.
RJ Hyndman, AB Koehler, JK Ord, and RD Snyder
(2008). Forecasting with exponential
smoothing: the state space approach.
Springer-Verlag.
RJ Hyndman and G Athanasopoulos (2013).
Forecasting: principles and practice. OTexts.
OTexts.com/fpp/.
ARIMA modelling

Classic Reference
Makridakis, Wheelwright and Hyndman (1998) Forecasting: methods and applications, 3rd ed., Wiley: NY.
¯ “There is such a bewildering variety of ARIMA models, it can be difficult to decide which model is most appropriate for a given set of data.” (MWH, p.347)
Auto ARIMA
fit <- auto.arima(livestock)
fcast <- forecast(fit)
plot(fcast)
[Figure: Forecasts from ARIMA(0,1,0) with drift, millions of sheep, 1960–2010]
Auto ARIMA
fit <- auto.arima(h02)
fcast <- forecast(fit)
plot(fcast)
[Figure: Forecasts from ARIMA(3,1,3)(0,1,1)[12], total scripts (millions), 1995–2010]
How does auto.arima() work?
A non-seasonal ARIMA process:
φ(B)(1 − B)^d y_t = c + θ(B)ε_t
Need to select appropriate orders p, q, d, and whether to include c.
Hyndman & Khandakar (JSS, 2008) algorithm:
Select no. of differences d via KPSS unit root test.
Select p, q, c by minimising AIC.
Use stepwise search to traverse model space, starting with a simple model and considering nearby variants.
Algorithm choices driven by forecast accuracy.
How does auto.arima() work?
A seasonal ARIMA process:
Φ(B^m) φ(B) (1 − B)^d (1 − B^m)^D y_t = c + Θ(B^m) θ(B) ε_t
Need to select appropriate orders p, q, d, P, Q, D, and whether to include c.
Hyndman & Khandakar (JSS, 2008) algorithm:
Select no. of differences d via KPSS unit root test.
Select D using OCSB unit root test.
Select p, q, P, Q, c by minimising AIC.
Use stepwise search to traverse model space, starting with a simple model and considering nearby variants.
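The stepwise traversal can be illustrated in Python. Here `aic(p, q)` is a stand-in quadratic surrogate with its minimum at (2, 1), not a fitted ARIMA likelihood; the point is the neighbourhood search itself, which moves to the best nearby variant until no neighbour improves the criterion:

```python
def aic(p, q):
    # Hypothetical criterion surface; in practice this would come from
    # fitting an ARIMA(p,d,q) model by maximum likelihood.
    return (p - 2) ** 2 + (q - 1) ** 2 + 100

def stepwise_search(p=0, q=0, max_order=5):
    """Hill-climb over (p, q): try nearby variants, keep the best, repeat."""
    best, best_aic = (p, q), aic(p, q)
    improved = True
    while improved:
        improved = False
        p0, q0 = best
        # "Nearby" variants: p and/or q changed by one
        for dp, dq in [(-1, 0), (1, 0), (0, -1), (0, 1), (1, 1), (-1, -1)]:
            p1, q1 = p0 + dp, q0 + dq
            if 0 <= p1 <= max_order and 0 <= q1 <= max_order and aic(p1, q1) < best_aic:
                best, best_aic = (p1, q1), aic(p1, q1)
                improved = True
    return best

print(stepwise_search())  # settles at (2, 1), the surrogate's minimum
```

The real algorithm works the same way but over (p, q, P, Q) and the constant c, and only after d and D have been fixed by the unit root tests; the stepwise search avoids fitting every model in the space.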
M3 comparisons
Method        MAPE   sMAPE   MASE
Theta         17.83  12.86   1.40
ForecastPro   18.00  13.06   1.47
BJauto        19.14  13.73   1.55
AutoARIMA     18.98  13.75   1.47
ETS-additive  18.58  13.69   1.48
ETS           19.33  13.57   1.59
ETS-ARIMA     18.17  13.11   1.44
M3 conclusions
MYTHS
Simple methods do better.
Exponential smoothing is better than ARIMA.
FACTS
The best methods are hybrid approaches.
ETS-ARIMA (the simple average of ETS-additive and AutoARIMA) is the only fully documented method that is comparable to the M3 competition winners.
I have an algorithm that does better than all of these, but it takes too long to be practical.
References
RJ Hyndman and Y Khandakar (2008). “Automatic time series forecasting: the forecast package for R”. Journal of Statistical Software 26(3).
RJ Hyndman (2011). “Major changes to the forecast package”. robjhyndman.com/hyndsight/forecast3/.
RJ Hyndman and G Athanasopoulos (2013). Forecasting: principles and practice. OTexts. OTexts.com/fpp/.
Examples
[Figure: US finished motor gasoline products, weekly, thousands of barrels per day, 1992–2004]
[Figure: Number of calls to large American bank (7am–9pm), 5-minute intervals, 3 March–12 May]
[Figure: Turkish electricity demand, daily, GW, 2000–2008]
TBATS model
TBATS:
Trigonometric terms for seasonality
Box-Cox transformations for heterogeneity
ARMA errors for short-term dynamics
Trend (possibly damped)
Seasonal (including multiple and non-integer periods)
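The trigonometric representation is what allows non-integer periods such as 52.18 weeks. A Python sketch (not the slides' R code) of the Fourier-term regressors that TBATS-style models build for a given period:

```python
import math

def fourier_terms(t, period, K):
    """First K sine/cosine pairs for a (possibly non-integer) seasonal period."""
    return [f(2 * math.pi * k * t / period)
            for k in range(1, K + 1)
            for f in (math.sin, math.cos)]

# Weekly data: period of 365.25/7 ≈ 52.18 weeks, approximated with K = 8 harmonics,
# matching the {<52.1785714285714, 8>} seasonal specification in the gasoline example
period = 365.25 / 7
row = fourier_terms(t=10, period=period, K=8)
print(len(row))  # 16: two terms (sin, cos) per harmonic
```

Because the seasonal pattern is carried by K smooth harmonics rather than one parameter per season, the state dimension stays small even for long or fractional periods.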
Examples
fit <- tbats(gasoline)
fcast <- forecast(fit)
plot(fcast)
[Figure: Forecasts from TBATS(0.999, {2,2}, 1, {<52.1785714285714,8>}), thousands of barrels per day]
Examples
fit <- tbats(callcentre)
fcast <- forecast(fit)
plot(fcast)
[Figure: Forecasts from TBATS(1, {3,1}, 0.987, {<169,5>, <845,3>}), number of call arrivals]
Examples
fit <- tbats(turk)
fcast <- forecast(fit)
plot(fcast)
[Figure: Forecasts from TBATS(0, {5,3}, 0.997, {<7,3>, <354.37,12>, <365.25,4>}), electricity demand (GW)]
References
Automatic algorithm described in:
AM De Livera, RJ Hyndman, and RD Snyder (2011). “Forecasting time series with complex seasonal patterns using exponential smoothing”. Journal of the American Statistical Association 106(496), 1513–1527.
Slightly improved algorithm implemented in:
RJ Hyndman (2012). forecast: Forecasting functions for time series. cran.r-project.org/package=forecast.
More work required!
Introduction
Total
  A: AA AB AC
  B: BA BB BC
  C: CA CB CC
Examples
Manufacturing product hierarchies
Pharmaceutical sales
Net labour turnover
Hierarchical/grouped time series
A hierarchical time series is a collection of several time series that are linked together in a hierarchical structure.
Example: Pharmaceutical products are organized in a hierarchy under the Anatomical Therapeutic Chemical (ATC) Classification System.
A grouped time series is a collection of time series that are aggregated in a number of non-hierarchical ways.
Example: daily numbers of calls to HP call centres are grouped by product type and location of call centre.
Hierarchical data
(Two-level hierarchy: Total with children A, B, C.)
Yt : observed aggregate of all series at time t.
YX,t : observation on series X at time t.
Bt : vector of all series at the bottom level at time t.

Yt = [Yt, YA,t, YB,t, YC,t]′ = S Bt, where

S = [ 1 1 1 ]
    [ 1 0 0 ]
    [ 0 1 0 ]
    [ 0 0 1 ]

and Bt = [YA,t, YB,t, YC,t]′.

Forecasting without forecasters Hierarchical and grouped time series 40
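The relation Yt = S Bt is easy to sanity-check numerically. A minimal pure-Python sketch, using hypothetical bottom-level values (not data from the talk):

```python
# Summing matrix for the hierarchy Total -> A, B, C (from the slide).
S = [
    [1, 1, 1],  # Total = A + B + C
    [1, 0, 0],  # A
    [0, 1, 0],  # B
    [0, 0, 1],  # C
]

def matvec(M, v):
    """Multiply matrix M by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

# Hypothetical bottom-level observations B_t = (Y_{A,t}, Y_{B,t}, Y_{C,t}).
Bt = [10.0, 20.0, 30.0]
Yt = matvec(S, Bt)   # stacked vector (Y_t, Y_{A,t}, Y_{B,t}, Y_{C,t})
print(Yt)            # [60.0, 10.0, 20.0, 30.0]
```

The first row of S produces the aggregate; the identity rows below it reproduce the bottom-level series.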
Grouped data
(Two alternative groupings of the same bottom-level series:
Total → A (AX, AY), B (BX, BY) and Total → X (AX, BX), Y (AY, BY).)

Yt = [Yt, YA,t, YB,t, YX,t, YY,t, YAX,t, YAY,t, YBX,t, YBY,t]′ = S Bt, where

S = [ 1 1 1 1 ]
    [ 1 1 0 0 ]
    [ 0 0 1 1 ]
    [ 1 0 1 0 ]
    [ 0 1 0 1 ]
    [ 1 0 0 0 ]
    [ 0 1 0 0 ]
    [ 0 0 1 0 ]
    [ 0 0 0 1 ]

and Bt = [YAX,t, YAY,t, YBX,t, YBY,t]′.

Forecasting without forecasters Hierarchical and grouped time series 41
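The same construction works for grouped data; a short pure-Python sketch (again with made-up numbers) confirming that both groupings aggregate to the same total:

```python
# Grouped-data summing matrix from the slide: rows are
# Total, A, B, X, Y, AX, AY, BX, BY; columns are (AX, AY, BX, BY).
S = [
    [1, 1, 1, 1],  # Total
    [1, 1, 0, 0],  # A = AX + AY
    [0, 0, 1, 1],  # B = BX + BY
    [1, 0, 1, 0],  # X = AX + BX
    [0, 1, 0, 1],  # Y = AY + BY
    [1, 0, 0, 0],  # AX
    [0, 1, 0, 0],  # AY
    [0, 0, 1, 0],  # BX
    [0, 0, 0, 1],  # BY
]

Bt = [3.0, 4.0, 5.0, 6.0]   # hypothetical (AX, AY, BX, BY)
Yt = [sum(s * b for s, b in zip(row, Bt)) for row in S]

total, A, B, X, Y = Yt[:5]
print(total, A + B, X + Y)  # both groupings give the same total
```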
Forecasts
Key idea: forecast reconciliation
¯ Ignore structural constraints and forecast every series of interest independently.
¯ Adjust forecasts to impose constraints.
Let Ŷn(h) be the vector of initial forecasts for horizon h, made at time n, stacked in the same order as Yt.
Optimal reconciled forecasts:
Ỹn(h) = S(S′S)⁻¹S′ Ŷn(h)
Forecasting without forecasters Hierarchical and grouped time series 42
Independent of the covariance structure of the hierarchy! The optimal reconciliation weights S(S′S)⁻¹S′ are independent of the data.
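The reconciliation step is an OLS projection of the base forecasts onto the space of aggregate-consistent vectors. A small pure-Python sketch, using exact rational arithmetic and hypothetical base forecasts for the Total/A/B/C hierarchy:

```python
from fractions import Fraction

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def inverse(A):
    """Gauss-Jordan inverse with exact rational arithmetic."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(i == j) for j in range(n)]
         for i, row in enumerate(A)]
    for i in range(n):
        p = next(r for r in range(i, n) if M[r][i] != 0)   # pivot row
        M[i], M[p] = M[p], M[i]
        piv = M[i][i]
        M[i] = [x / piv for x in M[i]]
        for r in range(n):
            if r != i and M[r][i] != 0:
                f = M[r][i]
                M[r] = [x - f * y for x, y in zip(M[r], M[i])]
    return [row[n:] for row in M]

S = [[1, 1, 1], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
St = transpose(S)
P = matmul(matmul(S, inverse(matmul(St, S))), St)          # S (S'S)^{-1} S'

# Hypothetical incoherent base forecasts: 102 != 30 + 40 + 28 = 98.
yhat = [[Fraction(102)], [Fraction(30)], [Fraction(40)], [Fraction(28)]]
ytilde = matmul(P, yhat)
print([float(r[0]) for r in ytilde])   # [101.0, 31.0, 41.0, 29.0]
```

Note the deliberately incoherent base forecasts; after projection the reconciled forecasts add up exactly, and the projection depends only on S, not on the data.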
Features
¯ Forget “bottom up” or “top down”: this approach combines all forecasts optimally.
¯ Outperforms bottom-up and top-down, especially for middle levels.
¯ Covariates can be included in the base forecasts.
¯ Adjustments can be made to base forecasts at any level.
¯ Point forecasts are always aggregate consistent.
¯ Very simple and flexible; works with any hierarchical or grouped time series.
¯ Conceptually easy to implement: OLS on the base forecasts.
Forecasting without forecasters Hierarchical and grouped time series 43
Challenges
¯ Computational difficulties in big hierarchies due to the size of the S matrix and possible ill-conditioning of (S′S).
¯ Need to estimate a covariance matrix to produce prediction intervals.
Forecasting without forecasters Hierarchical and grouped time series 44
Example using R
library(hts)
# bts is a matrix containing the bottom level time series
# g describes the grouping/hierarchical structure
y <- hts(bts, g=c(1,1,2,2))
Forecasting without forecasters Hierarchical and grouped time series 45
Example using R (continued)
(Hierarchy: Total with children A (AX, AY) and B (BX, BY).)
# Forecast 10 steps ahead using the optimal combination method
# ETS is used for each series by default
fc <- forecast(y, h=10)
# Select your own forecasting methods
ally <- allts(y)
allf <- matrix(NA, nrow=10, ncol=ncol(ally))
for(i in 1:ncol(ally))
  allf[,i] <- mymethod(ally[,i], h=10)
allf <- ts(allf, start=2004)
# Reconcile forecasts so they add up
fc2 <- combinef(allf, Smatrix(y))
Forecasting without forecasters Hierarchical and grouped time series 46–47
References
RJ Hyndman, RA Ahmed, G Athanasopoulos,
and HL Shang (2011). “Optimal combination
forecasts for hierarchical time series”.
Computational Statistics and Data Analysis
55(9), 2579–2589
RJ Hyndman, RA Ahmed, and HL Shang (2013).
hts: Hierarchical time series.
cran.r-project.org/package=hts.
RJ Hyndman and G Athanasopoulos (2013).
Forecasting: principles and practice. OTexts.
OTexts.com/fpp/.
Forecasting without forecasters Hierarchical and grouped time series 48
Outline
1 Motivation
2 Exponential smoothing
3 ARIMA modelling
4 Time series with complex seasonality
5 Hierarchical and grouped time series
6 Functional time series
Forecasting without forecasters Functional time series 49
Fertility rates
Forecasting without forecasters Functional time series 50
[Figure: Australia fertility rates (1921); x-axis: Age (15–50), y-axis: Fertility rate (0–250).]
Functional data model
Let ft,x be the observed data in period t at age x, t = 1, . . . , n.

ft(x) = µ(x) + ∑_{k=1}^{K} βt,k φk(x) + et(x)

Decomposition separates time and age to allow forecasting.
Estimate µ(x) as the mean of ft(x) across years.
Estimate βt,k and φk(x) using functional (weighted) principal components.
Univariate models used for automatic forecasting of the scores {βt,k}.
Forecasting without forecasters Functional time series 51
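To make the decomposition concrete, here is a pure-Python sketch on synthetic made-up curves (a plain, unweighted PCA stand-in for the functional principal components used in the talk): estimate the mean curve, extract one component by power iteration, compute the scores, and reconstruct.

```python
import math

# Synthetic rank-one example of f_t(x) = mu(x) + beta_t * phi(x) + e_t(x),
# with e = 0; mu, phi, beta are made-up values, not the fertility data.
mu   = [10.0, 12.0, 15.0, 11.0, 8.0]
phi  = [0.1, 0.3, 0.5, 0.3, 0.1]           # one basis function (K = 1)
beta = [2.0, -1.0, 0.5, 3.0, -2.5]         # scores for 5 "years"
F = [[m + b * p for m, p in zip(mu, phi)] for b in beta]

n, m = len(F), len(F[0])
mu_hat = [sum(row[j] for row in F) / n for j in range(m)]        # mean curve
X = [[F[t][j] - mu_hat[j] for j in range(m)] for t in range(n)]  # centred

# First principal component of the centred curves by power iteration on X'X.
C = [[sum(X[t][i] * X[t][j] for t in range(n)) for j in range(m)]
     for i in range(m)]
v = [1.0] * m
for _ in range(100):
    v = [sum(C[i][j] * v[j] for j in range(m)) for i in range(m)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]

scores = [sum(X[t][j] * v[j] for j in range(m)) for t in range(n)]  # beta-hat
recon = [[mu_hat[j] + scores[t] * v[j] for j in range(m)] for t in range(n)]
err = max(abs(recon[t][j] - F[t][j]) for t in range(n) for j in range(m))
print(err)   # ~0: one component recovers the rank-one surface
```

The score series `scores` is what the talk then forecasts with a univariate model; the forecast curves are rebuilt as mu_hat plus forecast score times the basis function.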
Fertility application
Forecasting without forecasters Functional time series 52
[Figure: Australia fertility rates (1921–2006); x-axis: Age (15–50), y-axis: Fertility rate (0–250).]
Fertility model
Forecasting without forecasters Functional time series 53
[Figure: fitted model components — mean µ(x) and basis functions φ1(x), φ2(x) over Age (15–50), with score series β1,t and β2,t over 1920–2010.]
Forecasts of ft(x)
[Figure: forecasts of Australia fertility rates (1921–2006) with 80% prediction intervals; x-axis: Age (15–50), y-axis: Fertility rate (0–250).]
Forecasting without forecasters Functional time series 54
R code
library(demography)
plot(aus.fert)
fit <- fdm(aus.fert)
fc <- forecast(fit)
Forecasting without forecasters Functional time series 55
References
RJ Hyndman and S Ullah (2007). “Robust
forecasting of mortality and fertility rates: A
functional data approach”. Computational
Statistics and Data Analysis 51(10), 4942–4956
RJ Hyndman and HL Shang (2009). “Forecasting
functional time series (with discussion)”.
Journal of the Korean Statistical Society 38(3),
199–221
RJ Hyndman (2012). demography: Forecasting
mortality, fertility, migration and population
data.
cran.r-project.org/package=demography.
Forecasting without forecasters Functional time series 56
For further information
robjhyndman.com
Slides and references for this talk.
Links to all papers and books.
Links to R packages.
A blog about forecasting research.
Forecasting without forecasters Functional time series 57

Forecasting without forecasters

  • 1.
    1 Rob J Hyndman Forecastingwithout forecasters
  • 2.
    Outline 1 Motivation 2 Exponentialsmoothing 3 ARIMA modelling 4 Time series with complex seasonality 5 Hierarchical and grouped time series 6 Functional time series Forecasting without forecasters Motivation 2
  • 3.
  • 4.
  • 5.
  • 6.
  • 7.
  • 8.
    Motivation 1 Common inbusiness to have over 1000 products that need forecasting at least monthly. 2 Forecasts are often required by people who are untrained in time series analysis. 3 Some types of data can be decomposed into a large number of univariate time series that need to be forecast. Specifications Automatic forecasting algorithms must: ¯ determine an appropriate time series model; ¯ estimate the parameters; ¯ compute the forecasts with prediction intervals. Forecasting without forecasters Motivation 4
  • 9.
    Motivation 1 Common inbusiness to have over 1000 products that need forecasting at least monthly. 2 Forecasts are often required by people who are untrained in time series analysis. 3 Some types of data can be decomposed into a large number of univariate time series that need to be forecast. Specifications Automatic forecasting algorithms must: ¯ determine an appropriate time series model; ¯ estimate the parameters; ¯ compute the forecasts with prediction intervals. Forecasting without forecasters Motivation 4
  • 10.
    Motivation 1 Common inbusiness to have over 1000 products that need forecasting at least monthly. 2 Forecasts are often required by people who are untrained in time series analysis. 3 Some types of data can be decomposed into a large number of univariate time series that need to be forecast. Specifications Automatic forecasting algorithms must: ¯ determine an appropriate time series model; ¯ estimate the parameters; ¯ compute the forecasts with prediction intervals. Forecasting without forecasters Motivation 4
  • 11.
    Motivation 1 Common inbusiness to have over 1000 products that need forecasting at least monthly. 2 Forecasts are often required by people who are untrained in time series analysis. 3 Some types of data can be decomposed into a large number of univariate time series that need to be forecast. Specifications Automatic forecasting algorithms must: ¯ determine an appropriate time series model; ¯ estimate the parameters; ¯ compute the forecasts with prediction intervals. Forecasting without forecasters Motivation 4
  • 12.
    Motivation 1 Common inbusiness to have over 1000 products that need forecasting at least monthly. 2 Forecasts are often required by people who are untrained in time series analysis. 3 Some types of data can be decomposed into a large number of univariate time series that need to be forecast. Specifications Automatic forecasting algorithms must: ¯ determine an appropriate time series model; ¯ estimate the parameters; ¯ compute the forecasts with prediction intervals. Forecasting without forecasters Motivation 4
  • 13.
    Example: Asian sheep Forecastingwithout forecasters Motivation 5 Numbers of sheep in Asia Year millionsofsheep 1960 1970 1980 1990 2000 2010 250300350400450500550
  • 14.
    Example: Asian sheep Forecastingwithout forecasters Motivation 5 Automatic ETS forecasts Year millionsofsheep 1960 1970 1980 1990 2000 2010 250300350400450500550
  • 15.
    Example: Cortecosteroid sales Forecastingwithout forecasters Motivation 6 Monthly cortecosteroid drug sales in Australia Year Totalscripts(millions) 1995 2000 2005 2010 0.40.60.81.01.21.4
  • 16.
    Example: Cortecosteroid sales Forecastingwithout forecasters Motivation 6 Automatic ARIMA forecasts Year Totalscripts(millions) 1995 2000 2005 2010 0.40.60.81.01.21.4
  • 17.
    M3 competition Forecasting withoutforecasters Motivation 7
  • 18.
    M3 competition Forecasting withoutforecasters Motivation 7 3003 time series. Early comparison of automatic forecasting algorithms. Best-performing methods undocumented. Limited subsequent research on general automatic forecasting algorithms.
  • 19.
    M3 competition Forecasting withoutforecasters Motivation 7 3003 time series. Early comparison of automatic forecasting algorithms. Best-performing methods undocumented. Limited subsequent research on general automatic forecasting algorithms.
  • 20.
    M3 competition Forecasting withoutforecasters Motivation 7 3003 time series. Early comparison of automatic forecasting algorithms. Best-performing methods undocumented. Limited subsequent research on general automatic forecasting algorithms.
  • 21.
    M3 competition Forecasting withoutforecasters Motivation 7 3003 time series. Early comparison of automatic forecasting algorithms. Best-performing methods undocumented. Limited subsequent research on general automatic forecasting algorithms.
  • 22.
    Outline 1 Motivation 2 Exponentialsmoothing 3 ARIMA modelling 4 Time series with complex seasonality 5 Hierarchical and grouped time series 6 Functional time series Forecasting without forecasters Exponential smoothing 8
  • 23.
    Exponential smoothing Forecasting withoutforecasters Exponential smoothing 9 Classic Reference Makridakis, Wheelwright and Hyndman (1998) Forecasting: methods and applications, 3rd ed., Wiley: NY.
  • 24.
    Exponential smoothing Forecasting withoutforecasters Exponential smoothing 9 Classic Reference Makridakis, Wheelwright and Hyndman (1998) Forecasting: methods and applications, 3rd ed., Wiley: NY. ¯ “Unfortunately, exponential smoothing methods do not allow the easy calculation of prediction intervals.” (MWH, p.177) ¯ No satisfactory way to select an exponential smoothing method.
  • 25.
    Exponential smoothing Forecasting withoutforecasters Exponential smoothing 9 Classic Reference Makridakis, Wheelwright and Hyndman (1998) Forecasting: methods and applications, 3rd ed., Wiley: NY. ¯ “Unfortunately, exponential smoothing methods do not allow the easy calculation of prediction intervals.” (MWH, p.177) ¯ No satisfactory way to select an exponential smoothing method.
  • 26.
    Exponential smoothing Forecasting withoutforecasters Exponential smoothing 9 Classic Reference Makridakis, Wheelwright and Hyndman (1998) Forecasting: methods and applications, 3rd ed., Wiley: NY. Current Reference Hyndman and Athanasopoulos (2013) Forecasting: principles and practice, OTexts: Australia. OTexts.com/fpp.
  • 27.
    Exponential smoothing methods SeasonalComponent Trend N A M Component (None) (Additive) (Multiplicative) N (None) N,N N,A N,M A (Additive) A,N A,A A,M Ad (Additive damped) Ad,N Ad,A Ad,M M (Multiplicative) M,N M,A M,M Md (Multiplicative damped) Md,N Md,A Md,M Forecasting without forecasters Exponential smoothing 10
  • 28.
    Exponential smoothing methods SeasonalComponent Trend N A M Component (None) (Additive) (Multiplicative) N (None) N,N N,A N,M A (Additive) A,N A,A A,M Ad (Additive damped) Ad,N Ad,A Ad,M M (Multiplicative) M,N M,A M,M Md (Multiplicative damped) Md,N Md,A Md,M N,N: Simple exponential smoothing Forecasting without forecasters Exponential smoothing 10
  • 29.
    Exponential smoothing methods SeasonalComponent Trend N A M Component (None) (Additive) (Multiplicative) N (None) N,N N,A N,M A (Additive) A,N A,A A,M Ad (Additive damped) Ad,N Ad,A Ad,M M (Multiplicative) M,N M,A M,M Md (Multiplicative damped) Md,N Md,A Md,M N,N: Simple exponential smoothing A,N: Holt’s linear method Forecasting without forecasters Exponential smoothing 10
  • 30.
    Exponential smoothing methods SeasonalComponent Trend N A M Component (None) (Additive) (Multiplicative) N (None) N,N N,A N,M A (Additive) A,N A,A A,M Ad (Additive damped) Ad,N Ad,A Ad,M M (Multiplicative) M,N M,A M,M Md (Multiplicative damped) Md,N Md,A Md,M N,N: Simple exponential smoothing A,N: Holt’s linear method Ad,N: Additive damped trend method Forecasting without forecasters Exponential smoothing 10
  • 31.
    Exponential smoothing methods SeasonalComponent Trend N A M Component (None) (Additive) (Multiplicative) N (None) N,N N,A N,M A (Additive) A,N A,A A,M Ad (Additive damped) Ad,N Ad,A Ad,M M (Multiplicative) M,N M,A M,M Md (Multiplicative damped) Md,N Md,A Md,M N,N: Simple exponential smoothing A,N: Holt’s linear method Ad,N: Additive damped trend method M,N: Exponential trend method Forecasting without forecasters Exponential smoothing 10
  • 32.
    Exponential smoothing methods SeasonalComponent Trend N A M Component (None) (Additive) (Multiplicative) N (None) N,N N,A N,M A (Additive) A,N A,A A,M Ad (Additive damped) Ad,N Ad,A Ad,M M (Multiplicative) M,N M,A M,M Md (Multiplicative damped) Md,N Md,A Md,M N,N: Simple exponential smoothing A,N: Holt’s linear method Ad,N: Additive damped trend method M,N: Exponential trend method Md,N: Multiplicative damped trend method Forecasting without forecasters Exponential smoothing 10
  • 33.
    Exponential smoothing methods SeasonalComponent Trend N A M Component (None) (Additive) (Multiplicative) N (None) N,N N,A N,M A (Additive) A,N A,A A,M Ad (Additive damped) Ad,N Ad,A Ad,M M (Multiplicative) M,N M,A M,M Md (Multiplicative damped) Md,N Md,A Md,M N,N: Simple exponential smoothing A,N: Holt’s linear method Ad,N: Additive damped trend method M,N: Exponential trend method Md,N: Multiplicative damped trend method A,A: Additive Holt-Winters’ method Forecasting without forecasters Exponential smoothing 10
  • 34.
    Exponential smoothing methods SeasonalComponent Trend N A M Component (None) (Additive) (Multiplicative) N (None) N,N N,A N,M A (Additive) A,N A,A A,M Ad (Additive damped) Ad,N Ad,A Ad,M M (Multiplicative) M,N M,A M,M Md (Multiplicative damped) Md,N Md,A Md,M N,N: Simple exponential smoothing A,N: Holt’s linear method Ad,N: Additive damped trend method M,N: Exponential trend method Md,N: Multiplicative damped trend method A,A: Additive Holt-Winters’ method A,M: Multiplicative Holt-Winters’ method Forecasting without forecasters Exponential smoothing 10
  • 35.
    Exponential smoothing methods SeasonalComponent Trend N A M Component (None) (Additive) (Multiplicative) N (None) N,N N,A N,M A (Additive) A,N A,A A,M Ad (Additive damped) Ad,N Ad,A Ad,M M (Multiplicative) M,N M,A M,M Md (Multiplicative damped) Md,N Md,A Md,M There are 15 separate exponential smoothing methods. Forecasting without forecasters Exponential smoothing 10
  • 36.
    Exponential smoothing methods SeasonalComponent Trend N A M Component (None) (Additive) (Multiplicative) N (None) N,N N,A N,M A (Additive) A,N A,A A,M Ad (Additive damped) Ad,N Ad,A Ad,M M (Multiplicative) M,N M,A M,M Md (Multiplicative damped) Md,N Md,A Md,M There are 15 separate exponential smoothing methods. Each can have an additive or multiplicative error, giving 30 separate models. Forecasting without forecasters Exponential smoothing 10
  • 37.
    Exponential smoothing methods SeasonalComponent Trend N A M Component (None) (Additive) (Multiplicative) N (None) N,N N,A N,M A (Additive) A,N A,A A,M Ad (Additive damped) Ad,N Ad,A Ad,M M (Multiplicative) M,N M,A M,M Md (Multiplicative damped) Md,N Md,A Md,M General notation E T S : ExponenTial Smoothing Examples: A,N,N: Simple exponential smoothing with additive errors A,A,N: Holt’s linear method with additive errors M,A,M: Multiplicative Holt-Winters’ method with multiplicative errors Forecasting without forecasters Exponential smoothing 11
  • 38.
    Exponential smoothing methods SeasonalComponent Trend N A M Component (None) (Additive) (Multiplicative) N (None) N,N N,A N,M A (Additive) A,N A,A A,M Ad (Additive damped) Ad,N Ad,A Ad,M M (Multiplicative) M,N M,A M,M Md (Multiplicative damped) Md,N Md,A Md,M General notation E T S : ExponenTial Smoothing Examples: A,N,N: Simple exponential smoothing with additive errors A,A,N: Holt’s linear method with additive errors M,A,M: Multiplicative Holt-Winters’ method with multiplicative errors Forecasting without forecasters Exponential smoothing 11
  • 39.
    Exponential smoothing methods SeasonalComponent Trend N A M Component (None) (Additive) (Multiplicative) N (None) N,N N,A N,M A (Additive) A,N A,A A,M Ad (Additive damped) Ad,N Ad,A Ad,M M (Multiplicative) M,N M,A M,M Md (Multiplicative damped) Md,N Md,A Md,M General notation E T S : ExponenTial Smoothing ↑ Trend Examples: A,N,N: Simple exponential smoothing with additive errors A,A,N: Holt’s linear method with additive errors M,A,M: Multiplicative Holt-Winters’ method with multiplicative errors Forecasting without forecasters Exponential smoothing 11
  • 40.
    Exponential smoothing methods SeasonalComponent Trend N A M Component (None) (Additive) (Multiplicative) N (None) N,N N,A N,M A (Additive) A,N A,A A,M Ad (Additive damped) Ad,N Ad,A Ad,M M (Multiplicative) M,N M,A M,M Md (Multiplicative damped) Md,N Md,A Md,M General notation E T S : ExponenTial Smoothing ↑ Trend Seasonal Examples: A,N,N: Simple exponential smoothing with additive errors A,A,N: Holt’s linear method with additive errors M,A,M: Multiplicative Holt-Winters’ method with multiplicative errors Forecasting without forecasters Exponential smoothing 11
  • 41.
    Exponential smoothing methods SeasonalComponent Trend N A M Component (None) (Additive) (Multiplicative) N (None) N,N N,A N,M A (Additive) A,N A,A A,M Ad (Additive damped) Ad,N Ad,A Ad,M M (Multiplicative) M,N M,A M,M Md (Multiplicative damped) Md,N Md,A Md,M General notation E T S : ExponenTial Smoothing ↑ Error Trend Seasonal Examples: A,N,N: Simple exponential smoothing with additive errors A,A,N: Holt’s linear method with additive errors M,A,M: Multiplicative Holt-Winters’ method with multiplicative errors Forecasting without forecasters Exponential smoothing 11
  • 42.
    Exponential smoothing methods SeasonalComponent Trend N A M Component (None) (Additive) (Multiplicative) N (None) N,N N,A N,M A (Additive) A,N A,A A,M Ad (Additive damped) Ad,N Ad,A Ad,M M (Multiplicative) M,N M,A M,M Md (Multiplicative damped) Md,N Md,A Md,M General notation E T S : ExponenTial Smoothing ↑ Error Trend Seasonal Examples: A,N,N: Simple exponential smoothing with additive errors A,A,N: Holt’s linear method with additive errors M,A,M: Multiplicative Holt-Winters’ method with multiplicative errors Forecasting without forecasters Exponential smoothing 11
  • 43.
    Exponential smoothing methods SeasonalComponent Trend N A M Component (None) (Additive) (Multiplicative) N (None) N,N N,A N,M A (Additive) A,N A,A A,M Ad (Additive damped) Ad,N Ad,A Ad,M M (Multiplicative) M,N M,A M,M Md (Multiplicative damped) Md,N Md,A Md,M General notation E T S : ExponenTial Smoothing ↑ Error Trend Seasonal Examples: A,N,N: Simple exponential smoothing with additive errors A,A,N: Holt’s linear method with additive errors M,A,M: Multiplicative Holt-Winters’ method with multiplicative errors Forecasting without forecasters Exponential smoothing 11 Innovations state space models ¯ All ETS models can be written in innovations state space form (IJF, 2002). ¯ Additive and multiplicative versions give the same point forecasts but different prediction intervals.
  • 44.
    Automatic forecasting From Hyndmanet al. (IJF, 2002): Apply each of 30 models that are appropriate to the data. Optimize parameters and initial values using MLE (or some other criterion). Select best method using AIC: AIC = −2 log(Likelihood) + 2p where p = # parameters. Produce forecasts using best method. Obtain prediction intervals using underlying state space model. Forecasting without forecasters Exponential smoothing 12
  • 45.
    Automatic forecasting From Hyndmanet al. (IJF, 2002): Apply each of 30 models that are appropriate to the data. Optimize parameters and initial values using MLE (or some other criterion). Select best method using AIC: AIC = −2 log(Likelihood) + 2p where p = # parameters. Produce forecasts using best method. Obtain prediction intervals using underlying state space model. Forecasting without forecasters Exponential smoothing 12
  • 46.
    Automatic forecasting From Hyndmanet al. (IJF, 2002): Apply each of 30 models that are appropriate to the data. Optimize parameters and initial values using MLE (or some other criterion). Select best method using AIC: AIC = −2 log(Likelihood) + 2p where p = # parameters. Produce forecasts using best method. Obtain prediction intervals using underlying state space model. Forecasting without forecasters Exponential smoothing 12
  • 47.
    Automatic forecasting From Hyndmanet al. (IJF, 2002): Apply each of 30 models that are appropriate to the data. Optimize parameters and initial values using MLE (or some other criterion). Select best method using AIC: AIC = −2 log(Likelihood) + 2p where p = # parameters. Produce forecasts using best method. Obtain prediction intervals using underlying state space model. Forecasting without forecasters Exponential smoothing 12
  • 48.
    Exponential smoothing Forecasting withoutforecasters Exponential smoothing 13 Forecasts from ETS(M,A,N) Year millionsofsheep 1960 1970 1980 1990 2000 2010 300400500600
  • 49.
    Exponential smoothing fit <-ets(livestock) fcast <- forecast(fit) plot(fcast) Forecasting without forecasters Exponential smoothing 14 Forecasts from ETS(M,A,N) Year millionsofsheep 1960 1970 1980 1990 2000 2010 300400500600
  • 50.
    Exponential smoothing Forecasting withoutforecasters Exponential smoothing 15 Forecasts from ETS(M,Md,M) Year Totalscripts(millions) 1995 2000 2005 2010 0.40.60.81.01.21.41.6
  • 51.
    Exponential smoothing fit <-ets(h02) fcast <- forecast(fit) plot(fcast) Forecasting without forecasters Exponential smoothing 16 Forecasts from ETS(M,Md,M) Year Totalscripts(millions) 1995 2000 2005 2010 0.40.60.81.01.21.41.6
  • 52.
    M3 comparisons Method MAPEsMAPE MASE Theta 17.83 12.86 1.40 ForecastPro 18.00 13.06 1.47 ETS additive 18.58 13.69 1.48 ETS 19.33 13.57 1.59 Forecasting without forecasters Exponential smoothing 17
  • 53.
References

RJ Hyndman, AB Koehler, RD Snyder, and S Grose (2002). "A state space framework for automatic forecasting using exponential smoothing methods". International Journal of Forecasting 18(3), 439–454.
RJ Hyndman, AB Koehler, JK Ord, and RD Snyder (2008). Forecasting with exponential smoothing: the state space approach. Springer-Verlag.
RJ Hyndman and G Athanasopoulos (2013). Forecasting: principles and practice. OTexts. OTexts.com/fpp/.
Outline

1 Motivation
2 Exponential smoothing
3 ARIMA modelling
4 Time series with complex seasonality
5 Hierarchical and grouped time series
6 Functional time series
ARIMA modelling

Classic reference: Makridakis, Wheelwright and Hyndman (1998) Forecasting: methods and applications, 3rd ed., Wiley: NY.

"There is such a bewildering variety of ARIMA models, it can be difficult to decide which model is most appropriate for a given set of data." (MWH, p. 347)
Auto ARIMA

fit <- auto.arima(livestock)
fcast <- forecast(fit)
plot(fcast)

[Figure: Forecasts from ARIMA(0,1,0) with drift; sheep livestock (millions) by year, 1960–2010]
Auto ARIMA

fit <- auto.arima(h02)
fcast <- forecast(fit)
plot(fcast)

[Figure: Forecasts from ARIMA(3,1,3)(0,1,1)[12]; total scripts (millions) by year, 1995–2010]
How does auto.arima() work?

A non-seasonal ARIMA process:
φ(B)(1 − B)^d y_t = c + θ(B)ε_t
Need to select appropriate orders p, q, d, and whether to include c.

Hyndman & Khandakar (JSS, 2008) algorithm:
- Select the number of differences d via the KPSS unit root test.
- Select p, q, c by minimising the AIC.
- Use a stepwise search to traverse the model space, starting with a simple model and considering nearby variants.

Algorithm choices driven by forecast accuracy.
How does auto.arima() work?

A seasonal ARIMA process:
Φ(B^m)φ(B)(1 − B)^d (1 − B^m)^D y_t = c + Θ(B^m)θ(B)ε_t
Need to select appropriate orders p, q, d, P, Q, D, and whether to include c.

Hyndman & Khandakar (JSS, 2008) algorithm:
- Select the number of differences d via the KPSS unit root test.
- Select D using the OCSB unit root test.
- Select p, q, P, Q, c by minimising the AIC.
- Use a stepwise search to traverse the model space, starting with a simple model and considering nearby variants.
M3 comparisons

Method         MAPE   sMAPE  MASE
Theta          17.83  12.86  1.40
ForecastPro    18.00  13.06  1.47
BJ auto        19.14  13.73  1.55
Auto ARIMA     18.98  13.75  1.47
ETS additive   18.58  13.69  1.48
ETS            19.33  13.57  1.59
ETS-ARIMA      18.17  13.11  1.44
M3 conclusions

MYTHS
- Simple methods do better.
- Exponential smoothing is better than ARIMA.

FACTS
- The best methods are hybrid approaches.
- ETS-ARIMA (the simple average of ETS-additive and AutoARIMA) is the only fully documented method that is comparable to the M3 competition winners.
- I have an algorithm that does better than all of these, but it takes too long to be practical.
References

RJ Hyndman and Y Khandakar (2008). "Automatic time series forecasting: the forecast package for R". Journal of Statistical Software 26(3).
RJ Hyndman (2011). "Major changes to the forecast package". robjhyndman.com/hyndsight/forecast3/.
RJ Hyndman and G Athanasopoulos (2013). Forecasting: principles and practice. OTexts. OTexts.com/fpp/.
Outline

1 Motivation
2 Exponential smoothing
3 ARIMA modelling
4 Time series with complex seasonality
5 Hierarchical and grouped time series
6 Functional time series
Examples

[Figure: US finished motor gasoline products; thousands of barrels per day, weekly, 1992–2004]
[Figure: Number of calls to a large American bank (7am–9pm); number of call arrivals in 5-minute intervals, 3 March – 12 May]
[Figure: Turkish electricity demand; electricity demand (GW) by day, 2000–2008]
TBATS model

TBATS:
- Trigonometric terms for seasonality
- Box-Cox transformations for heterogeneity
- ARMA errors for short-term dynamics
- Trend (possibly damped)
- Seasonal (including multiple and non-integer periods)
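The "trigonometric terms" are Fourier pairs: each seasonal period, including non-integer ones such as 52.18 weeks per year, is represented by a handful of sine/cosine regressors. A minimal numpy sketch of how such regressors are built (an illustration only, not the TBATS state space form):

```python
import numpy as np

def fourier_terms(t, period, K):
    """Trigonometric seasonal regressors: K sine/cosine pairs for
    a (possibly non-integer) seasonal period."""
    t = np.asarray(t, dtype=float)
    cols = []
    for k in range(1, K + 1):
        cols.append(np.sin(2 * np.pi * k * t / period))
        cols.append(np.cos(2 * np.pi * k * t / period))
    return np.column_stack(cols)

# 8 harmonic pairs for two years of weekly data with an annual
# period of about 52.18 weeks (as in the gasoline example later)
X = fourier_terms(np.arange(104), period=52.18, K=8)
print(X.shape)  # (104, 16)
```

Because the period enters only through sin and cos, nothing requires it to be an integer.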
Examples

fit <- tbats(gasoline)
fcast <- forecast(fit)
plot(fcast)

[Figure: Forecasts from TBATS(0.999, {2,2}, 1, {<52.1785714285714,8>}); thousands of barrels per day, weekly, 1995–2010]
Examples

fit <- tbats(callcentre)
fcast <- forecast(fit)
plot(fcast)

[Figure: Forecasts from TBATS(1, {3,1}, 0.987, {<169,5>, <845,3>}); number of call arrivals in 5-minute intervals, 3 March – 9 June]
Examples

fit <- tbats(turk)
fcast <- forecast(fit)
plot(fcast)

[Figure: Forecasts from TBATS(0, {5,3}, 0.997, {<7,3>, <354.37,12>, <365.25,4>}); electricity demand (GW) by day, 2000–2010]
References

Automatic algorithm described in:
AM De Livera, RJ Hyndman, and RD Snyder (2011). "Forecasting time series with complex seasonal patterns using exponential smoothing". Journal of the American Statistical Association 106(496), 1513–1527.

Slightly improved algorithm implemented in:
RJ Hyndman (2012). forecast: Forecasting functions for time series. cran.r-project.org/package=forecast.

More work required!
Outline

1 Motivation
2 Exponential smoothing
3 ARIMA modelling
4 Time series with complex seasonality
5 Hierarchical and grouped time series
6 Functional time series
Introduction

A three-level hierarchy:
Total
  A: AA AB AC
  B: BA BB BC
  C: CA CB CC

Examples:
- Manufacturing product hierarchies
- Pharmaceutical sales
- Net labour turnover
Hierarchical/grouped time series

A hierarchical time series is a collection of several time series that are linked together in a hierarchical structure.
Example: pharmaceutical products are organized in a hierarchy under the Anatomical Therapeutic Chemical (ATC) Classification System.

A grouped time series is a collection of time series that are aggregated in a number of non-hierarchical ways.
Example: daily numbers of calls to HP call centres are grouped by product type and location of call centre.
Hierarchical data

Notation:
Y_t : observed aggregate of all series at time t.
Y_X,t : observation on series X at time t.
B_t : vector of all series at the bottom level at time t.

For a hierarchy with Total and children A, B, C:

Y_t = [Y_t, Y_A,t, Y_B,t, Y_C,t]' = S B_t,  where

S = | 1 1 1 |
    | 1 0 0 |
    | 0 1 0 |
    | 0 0 1 |

and B_t = [Y_A,t, Y_B,t, Y_C,t]'.

Y_t = S B_t
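The identity Y_t = S B_t says that every series in the hierarchy is a fixed linear combination of the bottom-level series; a numpy check with made-up numbers:

```python
import numpy as np

# Summing matrix S for the small hierarchy above:
# rows = [Total, A, B, C], columns = bottom-level series [A, B, C]
S = np.array([[1, 1, 1],
              [1, 0, 0],
              [0, 1, 0],
              [0, 0, 1]])

Bt = np.array([10.0, 20.0, 30.0])  # bottom-level observations at time t
Yt = S @ Bt                        # stacked vector [Total, A, B, C]
print(Yt)                          # [60. 10. 20. 30.]
```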
Grouped data

Two alternative aggregations of the same bottom-level series:
Total → A (AX, AY), B (BX, BY)   and   Total → X (AX, BX), Y (AY, BY).

Y_t = [Y_t, Y_A,t, Y_B,t, Y_X,t, Y_Y,t, Y_AX,t, Y_AY,t, Y_BX,t, Y_BY,t]' = S B_t,  where

S = | 1 1 1 1 |
    | 1 1 0 0 |
    | 0 0 1 1 |
    | 1 0 1 0 |
    | 0 1 0 1 |
    | 1 0 0 0 |
    | 0 1 0 0 |
    | 0 0 1 0 |
    | 0 0 0 1 |

and B_t = [Y_AX,t, Y_AY,t, Y_BX,t, Y_BY,t]'.

Y_t = S B_t
Forecasts

Key idea: forecast reconciliation
- Ignore the structural constraints and forecast every series of interest independently.
- Adjust the forecasts to impose the constraints.

Let Ŷ_n(h) be the vector of initial forecasts for horizon h, made at time n, stacked in the same order as Y_t.

Optimal reconciled forecasts:
Ỹ_n(h) = S(S'S)^{-1} S' Ŷ_n(h)

Independent of the covariance structure of the hierarchy! The optimal reconciliation weights S(S'S)^{-1}S' are independent of the data.
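The reconciliation formula is an OLS projection of the base forecasts onto the column space of S, so the adjusted forecasts form the closest (least squares) aggregate-consistent vector. A numpy sketch with made-up base forecasts:

```python
import numpy as np

S = np.array([[1., 1., 1.],
              [1., 0., 0.],
              [0., 1., 0.],
              [0., 0., 1.]])

# Incoherent base forecasts: the total (63) disagrees with A+B+C (60)
y_hat = np.array([63., 10., 20., 30.])

# Optimal (OLS) reconciliation: project onto the column space of S
y_tilde = S @ np.linalg.solve(S.T @ S, S.T @ y_hat)

print(np.round(y_tilde, 2))        # [62.25 10.75 20.75 30.75]
print(np.isclose(y_tilde[0], y_tilde[1:].sum()))  # True: aggregate consistent
```

The 3-unit discrepancy is spread between the total and the bottom-level series rather than forced onto one level, which is why neither bottom-up nor top-down is recovered in general.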
Features

Forget "bottom up" or "top down": this approach combines all forecasts optimally.
- The method outperforms bottom-up and top-down, especially for middle levels.
- Covariates can be included in the base forecasts.
- Adjustments can be made to base forecasts at any level.
- Point forecasts are always aggregate consistent.
- Very simple and flexible: works with any hierarchical or grouped time series.
- Conceptually easy to implement: OLS on the base forecasts.
Challenges

- Computational difficulties in big hierarchies due to the size of the S matrix and the non-singular behaviour of (S'S).
- Need to estimate the covariance matrix to produce prediction intervals.
Example using R

library(hts)

# bts is a matrix containing the bottom-level time series
# g describes the grouping/hierarchical structure,
# here Total -> A (AX, AY), B (BX, BY)
y <- hts(bts, g=c(1,1,2,2))

# Forecast 10-step-ahead using the optimal combination method
# (ETS used for each series by default)
fc <- forecast(y, h=10)

# Select your own methods
ally <- allts(y)
allf <- matrix(NA, nrow=10, ncol=ncol(ally))
for(i in 1:ncol(ally))
  allf[,i] <- mymethod(ally[,i], h=10)
allf <- ts(allf, start=2004)

# Reconcile forecasts so they add up
fc2 <- combinef(allf, Smatrix(y))
References

RJ Hyndman, RA Ahmed, G Athanasopoulos, and HL Shang (2011). "Optimal combination forecasts for hierarchical time series". Computational Statistics and Data Analysis 55(9), 2579–2589.
RJ Hyndman, RA Ahmed, and HL Shang (2013). hts: Hierarchical time series. cran.r-project.org/package=hts.
RJ Hyndman and G Athanasopoulos (2013). Forecasting: principles and practice. OTexts. OTexts.com/fpp/.
Outline

1 Motivation
2 Exponential smoothing
3 ARIMA modelling
4 Time series with complex seasonality
5 Hierarchical and grouped time series
6 Functional time series
Fertility rates

[Figure: Australia: fertility rates (1921), fertility rate by age 15–50]
Functional data model

Let f_t,x be the observed data in period t at age x, t = 1, ..., n.

f_t(x) = μ(x) + Σ_{k=1}^{K} β_t,k φ_k(x) + e_t(x)

- The decomposition separates time and age to allow forecasting.
- Estimate μ(x) as the mean of f_t(x) across years.
- Estimate β_t,k and φ_k(x) using functional (weighted) principal components.
- Univariate models are used for automatic forecasting of the scores {β_t,k}.
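The estimation step can be sketched with ordinary (unweighted) principal components via the SVD; this toy numpy example illustrates the decomposition only, not the weighted functional version used in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: n years x p ages, a rank-2 smooth signal plus small noise
n, p, K = 40, 30, 2
x = np.linspace(0, 1, p)
beta_true = rng.normal(size=(n, K))
phi_true = np.vstack([np.sin(np.pi * x), np.cos(np.pi * x)])
F = 5 + beta_true @ phi_true + 0.001 * rng.normal(size=(n, p))

mu = F.mean(axis=0)                      # mean curve mu(x)
U, d, Vt = np.linalg.svd(F - mu, full_matrices=False)
phi = Vt[:K]                             # basis functions phi_k(x)
scores = (F - mu) @ phi.T                # time series of scores beta_{t,k}
recon = mu + scores @ phi                # rank-K reconstruction of f_t(x)

print(np.abs(F - recon).max() < 0.05)    # True: two components suffice
```

Each column of scores is then an ordinary univariate time series, which is exactly what the automatic ETS/ARIMA algorithms earlier in the talk can forecast.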
Fertility application

[Figure: Australia fertility rates (1921–2006), fertility rate by age 15–50]
Fertility model

[Figure: estimated mean μ(x) and first two basis functions φ1(x), φ2(x) by age, with the corresponding score series β_t,1, β_t,2 by year, 1920–2000s]
Forecasts of f_t(x)

[Figure: Australia fertility rates (1921–2006) with forecasts and 80% prediction intervals, fertility rate by age 15–50]
R code

library(demography)
plot(aus.fert)
fit <- fdm(aus.fert)
fc <- forecast(fit)

[Figure: Australia fertility rates (1921–2006), fertility rate by age 15–50]
References

RJ Hyndman and S Ullah (2007). "Robust forecasting of mortality and fertility rates: a functional data approach". Computational Statistics and Data Analysis 51(10), 4942–4956.
RJ Hyndman and HL Shang (2009). "Forecasting functional time series (with discussion)". Journal of the Korean Statistical Society 38(3), 199–221.
RJ Hyndman (2012). demography: Forecasting mortality, fertility, migration and population data. cran.r-project.org/package=demography.
For further information

robjhyndman.com
- Slides and references for this talk.
- Links to all papers and books.
- Links to R packages.
- A blog about forecasting research.