STAT 497_LN8_1_FORECASTING.ppt

STAT 497
LECTURE NOTES 8
FORECASTING
1

FORECASTING
• One of the most important objectives in time
series analysis is to forecast its future values.
It is the primary objective of modeling.
• ESTIMATION (tahmin)the value of an
estimator for a parameter.
• PREDICTION (kestirim)the value of a r.v.
when we use the estimates of the parameter.
• FORECASTING (öngörü)the value of a future
r.v. that is not observed by the sample.
2

FORECASTING
(Forecast)
Y
n)
(Predictio
Y
Y
)
of
(Estimate
a
Y
Y
1
t
t
t
t
t
t









ˆ
ˆ
ˆ
ˆ
1
1




3

FORECASTING FROM AN ARMA MODEL
THE MINIMUM MEAN SQUARED ERROR FORECASTS
Observed time series, Y1, Y2,…,Yn.
n: the forecast origin
4
Y1 Y2 Yn
………………….. Yn+1? Yn+2?
Observed sample
 
 
 
















n
n
n
n
2
n
n
1
n
n
Y
of
forecast
MSE
minimum
Y
of
forecast
ahead
step
Y
of
value
forecast
the
Y
Y
of
value
forecast
the
Y
Y
of
value
forecast
the
Y
ˆ
2
ˆ
1
ˆ

FORECASTING FROM AN ARMA
MODEL
5
   
sample
observed
the
given
Y
of
n
expectatio
l
conditiona
The
Y
Y
Y
Y
E
Y
n
n
n
n
n

 





 1
1 ,
,
,
ˆ

• The stationary ARMA model for Yt is
or
• Assume that we have data Y1, Y2, . . . , Yn and
we want to forecast Yn+l (i.e., l steps ahead
from forecast origin n). Then the actual value
is
    t
q
t
p a
B
Y
B 

 
 0
q
t
q
t
t
p
t
p
t
t a
a
a
Y
Y
Y 


 






 



 
 1
1
1
1
0
q
n
q
n
n
p
n
p
n
n a
a
a
Y
Y
Y 








 






 




 
 



 1
1
1
1
0
6

• Considering the Random Shock Form of the
series
7
 
 
 

 

























n
n
n
n
t
p
q
t
n
a
a
a
a
a
B
B
a
B
Y
2
2
1
1
0
0
0






• Taking the expectation of Yn+l , we have
where
8
   
















1
1
1
1 ,
,
,
ˆ
n
n
n
n
n
n
a
a
Y
Y
Y
Y
E
Y
 








0
,
0
,
0
,
, 1
j
a
j
Y
Y
a
E
j
n
n
j
n 

• The forecast error:
9
   




















1
0
1
1
1
1
ˆ









i
i
n
i
n
n
n
n
n
n
a
a
a
a
Y
Y
e
• The expectation of the forecast error:  
  0


n
e
E
• So, the forecast in unbiased.
• The variance of the forecast error:
 
  






 







1
0
2
2
1
0




i
i
a
i
i
n
i
n a
Var
e
Var 

• One step-ahead (l=1)
10
 
   
 
  2
1
1
1
2
1
0
1
2
1
1
0
1
1
1
ˆ
1
1
ˆ
a
n
n
n
n
n
n
n
n
n
n
n
n
e
Var
a
Y
Y
e
a
a
Y
a
a
a
Y





























• Two step-ahead (l=2)
11
 
   
 
   
2
1
2
1
1
2
2
2
0
2
1
1
2
0
2
1
2
2
ˆ
2
2
ˆ

























a
n
n
n
n
n
n
n
n
n
n
n
n
e
Var
a
a
Y
Y
e
a
Y
a
a
a
Y






• Note that,
• That’s why ARMA (or ARIMA) forecasting is
useful only for short-term forecasting.
12
 
 
  








0
lim
0
ˆ
lim






n
n
e
Var
Y

PREDICTION INTERVAL FOR Yn+l
• A 95% prediction interval for Yn+l (l steps ahead)
is
13
   
 
  




1
0
2
96
.
1
ˆ
96
.
1
ˆ




i
i
a
n
n
n
Y
e
Var
Y

For one step-ahead the simplifies to
  a
n
Y 
96
.
1
1
ˆ 
For two step-ahead the simplifies to
  2
1
1
96
.
1
2
ˆ 

 a
n
Y 
• When computing prediction intervals from data, we substitute
estimates for parameters, giving approximate prediction intervals

REASONS NEEDING A LONG REALIZATION
• Estimate correlation structure (i.e., the ACF and PACF)
functions and get accurate standard errors.
• Estimate seasonal pattern (need at least 4 or 5
seasonal periods).
• Approximate prediction intervals assume that
parameters are known (good approximation if
realization is large).
• Fewer estimation problems (likelihood function better
behaved).
• Possible to check forecasts by withholding recent data .
• Can check model stability by dividing data and
analyzing both sides.
14

REASONS FOR USING A PARSIMONIOUS
MODEL
• Fewer numerical problems in estimation.
• Easier to understand the model.
• With fewer parameters, forecasts less sensitive to
deviations between parameters and estimates.
• Model may applied more generally to similar
processes.
• Rapid real-time computations for control or other
action.
• Having a parsimonious model is less important if
the realization is large.
15

EXAMPLES
• AR(1)
• MA(1)
• ARMA(1,1)
16

UPDATING THE FORECASTS
• Let’s say we have n observations at time t=n
and find a good model for this series and
obtain the forecast for Yn+1, Yn+2 and so on. At
t=n+1, we observe the value of Yn+1. Now, we
want to update our forecasts using the original
value of Yn+1 and the forecasted value of it.
17

The forecast error is
We can also write this as
18
    








1
0
ˆ


 

i
i
n
i
n
n
n a
Y
Y
e
   
 
n
e
i
i
n
i
i
i
n
i
i
i
n
i
n
n
n
a
a
a
a
Y
Y
e
n













































1
0
0
0
1
1
1
1
1
1 1
ˆ
1

19
   
   
     
 
     
 
1
ˆ
1
ˆ
ˆ
1
ˆ
1
ˆ
ˆ
1
ˆ
ˆ
ˆ
1
ˆ
1
1
1
1
1
1
n
n
n
n
n
n
n
n
n
n
n
n
n
n
n
n
Y
Y
Y
Y
Y
Y
Y
Y
a
Y
Y
a
Y
Y
Y
Y










































n=100
     
 
1
ˆ
2
ˆ
1
ˆ 100
101
1
100
101 Y
Y
Y
Y 




FORECASTS OF THE TRANSFORMED
SERIES
• If you use variance stabilizing transformation,
after the forecasting, you have to convert the
forecasts for the original series.
• If you use log-transformation, you have to
consider the fact that
20
       
 
 
n
n
n
n Y
ln
,
,
Y
ln
Y
ln
E
exp
Y
,
,
Y
Y
E 
 
 1
1 
 

FORECASTS OF THE TRANSFORMED
SERIES
• If X has a normal distribution with mean 
and variance 2,
• Hence, the minimum mean square error
forecast for the original series is given by
21
 
  .
exp
X
exp
E










2
2


   
   
n n
1
ˆ
exp where Z ln Y
2
n n
Z Var Z  
 
 
 
 
 
n
n Z
,
,
Z
Z
E 
 1


  
n
n Z
,
,
Z
Z
Var 
 1
2




MEASURING THE FORECAST
ACCURACY
22
Mean Absolute Scaled Error MASE the mean absolute error of the forecast
values, divided by the mean absolute
error of the in-sample one-step naive forecast

ACCURACY
23

ACCURACY
24

ACCURACY
25

MOVING AVERAGE AND EXPONENTIAL
SMOOTHING
• This is a forecasting procedure based on a
simple updating equations to calculate
forecasts using the underlying pattern of the
series. Not based on ARIMA approach.
• Recent observations are expected to have
more power in forecasting values so a model
can be constructed that places more weight
on recent observations than older
observations.
26

MOVING AVERAGE AND EXPONENTIAL
SMOOTHING
• Smoothed curve (eliminate up-and-down
movement)
• Trend
• Seasonality
27

SIMPLE MOVING AVERAGES
• 3 periods moving averages
Yt = (Yt-1 + Yt-2 + Yt-3)/3
• Also, 5 periods MA can be considered.
28
Period Actual 3 Quarter MA Forecast 5 Quarter MA forecast
Mar-83 239.3 Missing Missing
Jun-83 239.8 Missing Missing
Sep-83 236.1 Missing Missing
Dec-83 232 238.40 Missing
Mar-84 224.75 235.97 Missing
Jun-84 237.45 230.95 234.39
Sep-84 245.4 231.40 234.02
Dec-84 251.58 235.87 235.14
… So on..

• One can impose weights and use weighted
moving averages (WMA).
Eg Y t = 0.6Yt-1 + 0.3Yt-2+ 0.1Yt-2
• How many periods to use is a question; more
significant smoothing-out effect with longer lags.
• Peaks and troughs (bottoms) are not predicted.
• Events are being averaged out.
• Since any moving average is serially correlated,
any sequence of random numbers could appear
to exhibit cyclical fluctuation.
29

• Exchange Rates: Forecasts using the SMA(3)
model
30
Date Rate
Three-Quarter
Moving Average
Three-Quarter
Forecast
Mar-85 257.53 missing missing
Jun-85 250.81 missing missing
Se-85 238.38 248.90 missing
Dec-85 207.18 232.12 248.90
Mar-86 187.81 211.12 232.12

SIMPLE EXPONENTIAL SMOOTHING
(SES)
• Suppressing short-run fluctuation by
smoothing the series
• Weighted averages of all previous values with
more weights on recent values
• No trend, No seasonality
31

(SES)
• Observed time series
Y1, Y2, …, Yn
• The equation for the model is
where : the smoothing parameter, 0    1
Yt: the value of the observation at time t
St: the value of the smoothed obs. at time t.
32
  1
1 1 
 

 t
t
t S
Y
S 


(SES)
• The equation can also be written as
• Then, the forecast is
33
 





error
forecast
the
t
t
t
t S
Y
S
S 1
1
1 

 

 
 
 
t
t
t
t
t
t
S
Y
S
S
Y
S









 1
1

(SES)
• Why Exponential?: For the observed time series
Y1,Y2,…,Yn, Yn+1 can be expressed as a weighted
sum of previous observations.
where ci’s are the weights.
• Giving more weights to the recent observations,
we can use the geometric weights (decreasing
by a constant ratio for every unit increase in
lag).
34
  



 
 2
2
1
1
0
1
ˆ t
t
t
t Y
c
Y
c
Y
c
Y
  .
1
0
,...;
1
,
0
;
1 




 

 i
c
i
i

(SES)
• Then,
35
       
     
1
ˆ
1
1
ˆ
1
1
1
1
ˆ
1
2
2
1
1
0













t
t
t
t
t
t
t
Y
Y
Y
Y
Y
Y
Y







 
St+1 St

(SES)
• Remarks on  (smoothing parameter).
– Choose  between 0 and 1.
– If  = 1, it becomes a naive model; if  is close to
1, more weights are put on recent values. The
model fully utilizes forecast errors.
– If  is close to 0, distant values are given weights
comparable to recent values. Choose  close to 0
when there are big random variations in the data.
–  is often selected as to minimize the MSE.
36

(SES)
• Remarks on  (smoothing parameter).
– In empirical works, 0.05    0.3 commonly used.
Values close to 1 are used rarely.
– Numerical Minimization Process:
• Take different  values ranging between 0 and 1.
• Calculate 1-step-ahead forecast errors for each .
• Calculate MSE for each case.
• Choose  which has the min MSE.
37







n
t
t
t
t
t e
min
S
Y
e
1
2

(SES)
• EXAMPLE:
38
Time Yt St+1 (=0.10) (YtSt)2
1 5 - -
2 7 (0.1)5+(0.9)5=5 4
3 6 (0.1)7+(0.9)5=5.2 0.64
4 3 (0.1)6+(0.9)5.2=5.08 5.1984
5 4 (0.1)3+(0.9)5.08=5.052 1.107
TOTAL 10.945
74
.
2
1



n
SSE
MSE
• Calculate this for =0.2, 0.3,…,0.9, 1 and compare
the MSEs. Choose  with minimum MSE

(SES)
• Some softwares automatically chooses the
optimal  using the search method or non-
linear optimization techniques.
INITIAL VALUE PROBLEM
1.Setting S1 to Y1 is one method of initialization.
2.Take the average of, say first 4 or 5
observations and use this as an initial value.
39

DOUBLE EXPONENTIAL SMOOTHING
OR HOLT’S EXPONENTIAL SMOOTHING
• Introduce a Trend factor to the simple
exponential smoothing method
• Trend, but still no seasonality
SES + Trend = DES
• Two equations are needed now to handle the
trend.
40
  
    1
0
,
1
1
0
,
1
1
1
1
1
1























t
t
t
t
t
t
t
t
T
S
S
T
T
S
Y
S
Trend term is the expected increase or decrease per unit time
period in the current level (mean level)

HOLT’S EXPONENTIAL SMOOTHING
• Two parameters :
 = smoothing parameter
 = trend coefficient
• h-step ahead forecast at time t is
• Trend prediction is added in the h-step ahead
forecast.
41
  t
t
t hT
S
h
Y 

ˆ
Current level Current slope

• Now, we have two updated equations. The
first smoothing equation adjusts St directly for
the trend of the previous period Tt-1 by adding
it to the last smoothed value St-1. This helps to
bring St to the appropriate base of the current
value. The second smoothing equation
updates the trend which is expressed as the
difference between last two values.
42

• Initial value problem:
– S1 is set to Y1
– T1=Y2Y1 or (YnY1)/(n1)
 and  can be chosen as
the value between 0.02< ,<0.2
or by minimizing the MSE as in SES.
43

• Example: (use  = 0.6, =0.7; S1 = 4, T1= 1)
44
Holt Holt
time Yt St Tt
1 3 4 1
2 5 3.8 0.64
3 4 4.78 0.74
4 - 4.78+0.74
5 - 4.78+2*0.74

HOLT-WINTER’S EXPONENTIAL
SMOOTHING
• Introduce both Trend and Seasonality factors
• Seasonality can be added additively or
multiplicatively.
• Model (multiplicative):
45
  
   
  s
t
t
t
t
t
t
t
t
t
t
s
t
t
t
I
S
Y
I
T
S
S
T
T
S
I
Y
S
























1
1
1
1
1
1
1
1

SMOOTHING
Here, (Yt /St) captures seasonal effects.
s = # of periods in the seasonal cycles
(s = 4, for quarterly data)
Three parameters :
 = smoothing parameter
 = trend coefficient
 = seasonality coefficient
46

SMOOTHING
• h-step ahead forecast
• Seasonal factor is multiplied in the h-step
ahead forecast
 , and  can be chosen as
the value between 0.02< ,,<0.2
or by minimizing the MSE as in SES.
47
    s
h
t
t
t
t I
hT
S
h
Y 



ˆ

SMOOTHING
• To initialize Holt-Winter, we need at least one
complete season’s data to determine the
initial estimates of It-s.
• Initial value:
48
s
s
Y
s
Y
T
or
s
Y
Y
s
Y
Y
s
Y
Y
s
T
s
Y
S
s
s
t
t
s
t
t
0
s
s
s
s
s
s
t
t
/
/
/
1
.
2
/
.
1
2
1
1
2
2
1
1
0
1
0











 













 

















SMOOTHING
• For the seasonal index, say we have 6 years
and 4 quarter (s=4).
STEPS TO FOLLOW
STEP 1: Compute the averages of each of 6
years.
49
averages
yearly
The
,
,
,
n
,
/
Y
A
i
i
n 




6
2
1
4
4
1


SMOOTHING
• STEP 2: Divide the observations by the
appropriate yearly mean.
50
Year 1 2 3 4 5 6
Q1 Y1/A1 Y5/A2 Y9/A3 Y13/A4 Y17/A5 Y21/A6
Q2 Y2/A1 Y6/A2 Y10/A3 Y14/A4 Y18/A5 Y22/A6
Q3 Y3/A1 Y7/A2 Y11/A3 Y15/A4 Y19/A5 Y23/A6
Q4 Y4/A1 Y8/A2 Y12/A3 Y16/A4 Y20/A5 Y24/A6

SMOOTHING
• STEP 3: The seasonal indices are formed by
computing the average of each row such that
51
6
/
6
/
6
/
6
/
6
24
5
20
4
16
3
12
2
8
1
4
4
6
23
5
19
4
15
3
11
2
7
1
3
3
6
22
5
18
4
14
3
10
2
6
1
2
2
6
21
5
17
4
13
3
9
2
5
1
1
1
















































A
Y
A
Y
A
Y
A
Y
A
Y
A
Y
I
A
Y
A
Y
A
Y
A
Y
A
Y
A
Y
I
A
Y
A
Y
A
Y
A
Y
A
Y
A
Y
I
A
Y
A
Y
A
Y
A
Y
A
Y
A
Y
I

SMOOTHING
• Note that, if a computer program selects 0 for  and , this
does not mean that there is no trend or seasonality.
• For Simple Exponential Smoothing, a level weight near zero
implies that simple differencing of the time series may be
appropriate.
• For Holt Exponential Smoothing, a level weight near zero
implies that the smoothed trend is constant and that an
ARIMA model with deterministic trend may be a more
appropriate model.
• For Winters Method and Seasonal Exponential Smoothing,
a seasonal weight near one implies that a nonseasonal
model may be more appropriate and a seasonal weight
near zero implies that deterministic seasonal factors may
be present.
52

EXAMPLE
> HoltWinters(beer)
Holt-Winters exponential smoothing with trend and additive seasonal component.
Call:
HoltWinters(x = beer)
Smoothing parameters:
alpha: 0.1884622
beta : 0.3068298
gamma: 0.4820179
Coefficients:
[,1]
a 50.4105781
b 0.1134935
s1 -2.2048105
s2 4.3814869
s3 2.1977679
s4 -6.5090499
s5 -1.2416780
s6 4.5036243
s7 2.3271515
s8 -5.6818213
s9 -2.8012536
s10 5.2038114
s11 3.3874876
s12 -5.6261644
53

EXAMPLE (Contd.)
> beer.hw<-HoltWinters(beer)
> predict(beer.hw,n.ahead=12)
Jan Feb Mar Apr May Jun Jul Aug
1963
1964 49.73637 55.59516 53.53218 45.63670 48.63077 56.74932 55.04649 46.14634
Sep Oct Nov Dec
1963 48.31926 55.01905 52.94883 44.35550
54

ADDITIVE VS MULTIPLICATIVE
SEASONALITY
• Seasonal components can be additive in nature or multiplicative.
For example, during the month of December the sales for a
particular toy may increase by 1 million dollars every year. Thus,
we could add to our forecasts for every December the amount of
1 million dollars (over the respective annual average) to account
for this seasonal fluctuation. In this case, the seasonality is
additive.
• Alternatively, during the month of December the sales for a
particular toy may increase by 40%, that is, increase by a factor of
1.4. Thus, when the sales for the toy are generally weak, than the
absolute (dollar) increase in sales during December will be
relatively weak (but the percentage will be constant); if the sales
of the toy are strong, than the absolute (dollar) increase in sales
will be proportionately greater. Again, in this case the sales
increase by a certain factor, and the seasonal component is thus
multiplicative in nature (i.e., the multiplicative seasonal
component in this case would be 1.4).
55

SEASONALITY
• In plots of the series, the distinguishing characteristic
between these two types of seasonal components is
that in the additive case, the series shows steady
seasonal fluctuations, regardless of the overall level of
the series; in the multiplicative case, the size of the
seasonal fluctuations vary, depending on the overall
level of the series.
• Additive model:
Forecastt = St + It-s
• Multiplicative model:
Forecastt = St*It-s
56

SEASONALITY
57

Exponential Smoothing Models
1. No trend and additive
seasonal variability (1,0)
2. Additive seasonal variability with
an additive trend (1,1)
3. Multiplicative seasonal variability
with an additive trend (2,1)
with a multiplicative trend (2,2)

Exponential Smoothing Models
• Select the type of model to fit based on the presence of
– Trend – additive or multiplicative, dampened or not
– Seasonal variability – additive or multiplicative
5. Dampened trend with additive
seasonal variability (1,1)
and dampened trend (2,2)

Forecast profiles from exponential smoothing adapted from E. Gardner,
Journal of Forecasting, Vol. 4 (1985)
60

OTHER METHODS
(i) Adaptive-response smoothing
.. Choose  from the data using the smoothed and
absolute forecast errors
(ii) Additive Winter’s Models
.. The seasonality equation is modified.
(iii) Gompertz Curve
.. Progression of new products
(iv) Logistics Curve
.. Progression of new products (also with a limit, L)
(v) Bass Model
61

EXPONENTIAL SMOOTING IN R
General notation: ETS(Error,Trend,Seasonal)
ExponenTial Smoothing
ETS(A,N,N): Simple exponential smoothing with additive errors
ETS(A,A,N): Holt's linear method with additive errors
ETS(A,A,A): Additive Holt-Winters' method with additive errors
62

From Hyndman et al. (2008):
• Apply each of 30 methods that are appropriate to
the data. Optimize parameters and initial values
using MLE (or some other method).
• Select best method using AIC:
AIC = -2 log(Likelihood) + 2p
where p = # parameters.
• Produce forecasts using the best method.
• Obtain prediction intervals using underlying state
space model (this part is done by R automatically).
63
***http://robjhyndman.com/research/Rtimeseries_handout.pdf

ets() function
• Automatically chooses a model by default using the
AIC
• Can handle any combination of trend, seasonality
and damping
• Produces prediction intervals for every model.
• Normality of errors assumption has to be satisfied.
Otherwise, we can not trust the prediction intervals.
• Ensures the parameters are admissible (equivalent to
invertible)
64
***http://robjhyndman.com/research/Rtimeseries_handout.pdf

65
Source: https://robjhyndman.com/papers/hksg.pdf

> library(tseries)
> library(forecast)
> library(expsmooth)
> fit=ets(beer)
> fit2 <- ets(beer,model="MNM",damped=FALSE)
> fcast1 <- forecast(fit, h=24)
> fcast2 <- forecast(fit2, h=24)
sigma: 1.2714
AIC AICc BIC
478.1877 480.3828 500.8838
67
R automatically finds the best model.
We are defining the model as MNM

> fit
ETS(A,Ad,A)
alpha = 0.0739
beta = 0.0739
gamma = 0.213
phi = 0.9053
Initial states:
l = 38.2918
b = 0.6085
s=-5.9572 3.6056 5.1923 -2.8407
sigma: 1.2714
AIC AICc BIC
478.1877 480.3828 500.8838
68

> fit2
ETS(M,N,M)
Call:
ets(y = beer, model = "MNM", damped = FALSE)
alpha = 0.3689
gamma = 0.3087
Initial states:
l = 39.7259
s=0.8789 1.0928 1.108 0.9203
sigma: 0.0296
AIC AICc BIC
490.9042 491.8924 506.0349
70

• GOODNESS-OF-FIT
> accuracy(fit)
ME RMSE MAE MPE MAPE MASE
0.1007482 1.2714088 1.0495752 0.1916268 2.2306151 0.1845166
> accuracy(fit2)
ME RMSE MAE MPE MAPE MASE
0.2596092 1.3810629 1.1146970 0.5444713 2.3416001 0.1959651
72
The smaller is the better.

> plot(forecast(fit,level=c(50,80,95)))
73

> plot(forecast(fit2,level=c(50,80,95)))
74

FACEBOOK’S PROPHET MODEL
• The prophet model is capable of modelling the time
series with strong multiple seasonalities at day level,
week level, year level etc. and trend.
• It has intuitive parameters that a not-so-expert data
scientist can tune for better forecasts.
• At its core, it is an additive regressive model which can
detect change points to model the time series.
• Prophet decomposes the time series into components
of trend gt, seasonality St and holidays ht.
75
• Sources: https://www.analyticsvidhya.com/blog/2018/05/generate-accurate-forecasts-facebook-
prophet-python-r/
• https://www.england.nhs.uk/wp-content/uploads/2020/01/advanced-forecasting-techniques.pdf
• https://mode.com/example-gallery/forecasting_prophet_r_cookbook/
• https://medium.com/open-machine-learning-course/open-machine-learning-course-topic-9-part-3-
predicting-the-future-with-facebook-prophet-3f3af145cdc

The Prophet Forecasting Model
• A decomposable time series model with three
main model components: trend, seasonality, and
holidays. They are combined in the following equation:
y(t)=g(t)+s(t)+h(t)+t
• g(t): piecewise linear or logistic growth curve for
modelling non-periodic changes in time series
• s(t): periodic changes (e.g. weekly/yearly seasonality)
• h(t): effects of holidays (user provided) with irregular
schedules
• εt: error term accounts for any unusual changes not
accommodated by the model
76

PROPHET
• Model Fitting: When the seasonality and
holiday features for each observation are
combined into a matrix X and the changepoint
indicators a(t) in a matrix A, the entire model
can be fitted by Stan's L-BFGS to find a
maximum a posteriori estimate (Carpenter et
al. 2017).
78

• Using time as a regressor, Prophet is trying to fit
several linear and non linear functions of time as
components.
• Modeling seasonality as an additive component is
the same approach taken by exponential smoothing
in Holt-Winters technique.
• This approach considers the forecasting problem as a
curve-fitting exercise rather than looking explicitly at
the time based dependence of each observation
within a time series.
79

• The basic methodology is an iterative curve-
matching routine, where Prophet will then
train your data on a bigger period, then
predict again and this will repeat until the end
point is reached.
• Fully automated routine - recognizes repeating
patterns over weeks, months, years, and
identified holidays.
• Can be configured to take account of irregular
holidays 80

• To fit and forecast the effects of seasonality, prophet
relies on fourier series to provide a flexible model.
Seasonal effects s(t) are approximated by the following
function:
• P is the period (365.25 for yearly data and 7 for weekly
data).
• Parameters [a1, b1, ….., aN, bN] need to be estimated for
a given N to model seasonality.
• The fourier order N that defines high frequency
changes. It is a tuning parameter. 81

• Holidays and events incur predictable shocks to a
time series. For instance, Ramadan occurs on
different days each year and a large portion of the
population buy a lot of items during this period.
• Prophet allows the analyst to provide a custom list
of past and future events. A window around such
days are considered separately and additional
parameters are fitted to model the effect of holidays
and events.
82

Forecasting in R with Prophet
• Prophet works best with daily periodicity data
with at least one year of historical data.
• We have our data at a daily periodicity: Daily
Orders
library(prophet)
83

Summary of Prophet
• The Prophet procedure is an additive regression
model with four main components:
1. A piecewise linear or logistic growth curve trend.
Prophet automatically detects changes in trends by
selecting changepoints from the data.
2. A yearly seasonal component modeled
using Fourier series.
3. A weekly seasonal component using dummy
variables.
4. A user-provided list of important holidays.
84

• We will use a well-known stock (AAPL) for an
example and plan to apply the forecasting
technique on it. This is based on a daily
frequency of five years of historical data of the
stock. You can download the CSV file from
Yahoo Finance
(https://sg.finance.yahoo.com/quote/AAPL/hi
story?p=AAPL )
library(prophet)
aapl=read.csv('aapl.csv')
85

# Remove unwanted columns and just keep Close column
aapl_C=select(aapl,-c(Open, High, Low,Adj.Close,Volume))
str(aapl_C)
'data.frame': 253 obs. of 2 variables:
$ Date : Factor w/ 253 levels "2019-11-06","2019-11-
07",..: 1 2 3 4 5 6 7 8 9 10 ...
$ Close: num 64.3 64.9 65 65.6 65.5 ...
# Rename the columns of Date and Close as ds and y for the
prophet technique to work
aapl=aapl_C %>% rename(ds=Date, y=Close)
• As our time-frequency is based on the daily
close, we will apply the forecast with a period
of 365 days or a year. Followed by the predict
function to get our projections.
87

• Lastly, we would create an additional vector with the
following columns namely: “yhat”, “yhat_lower” and
“yhat_upper”. These variables represent the predicted
value, lower predicted and upper predicted values
respectively.
> Fore=prophet(aapl,daily.seasonality=TRUE)
Disabling yearly seasonality. Run prophet with yearly.seasonality=TRUE to
override this.
> forecast=make_future_dataframe(Fore, periods=365)
> aapl_fc=predict(Fore, forecast)
> tail(aapl_fc[c("ds", "yhat","yhat_lower", "yhat_upper")])
ds yhat yhat_lower yhat_upper
613 2021-10-31 96.87971 -39.35249 233.2583
614 2021-11-01 96.57712 -39.37143 231.8056
615 2021-11-02 96.73694 -40.17679 236.6399
616 2021-11-03 97.00014 -41.40404 235.0813
617 2021-11-04 96.59606 -38.64420 235.1065
618 2021-11-05 96.25696 -39.78927 236.4799 88

plot(Fore, aapl_fc, xlab="Years", ylab="Price")
89

> prophet_plot_components(Fore, aapl_fc)
90
Breakdown of
forecast components
such as trend, yearly
seasonality, and
weekly seasonality
with the
prophet_plot_
components function
in the previous code.

• If you prefer an interactive plot
#Interactive Plot
dyplot.prophet(Fore, aapl_fc)
91

• add_country_holidays
plot(Fore, aapl_fc) + add_changepoints_to_plot(m)
• add_country_holidays: Add in built-in holidays
for the specified country.
add_country_holidays(m, country_name)
m: prophet object
county_name= name of the country “UnitedStates” or
“US”
Holidays will be calculated for arbitrary date
ranges in the history and future.
92

Example of adding holiday effects to the model
playoffs = pd.DataFrame({
'holiday': 'playoff',
'ds': pd.to_datetime(['2008-01-13', '2009-01-03', '2010-01-16',
'2010-01-24', '2010-02-07', '2011-01-08',
'2013-01-12', '2014-01-12', '2014-01-19',
'2014-02-02', '2015-01-11', '2016-01-17',
'2016-01-24', '2016-02-07']),
'lower_window': 0,
'upper_window': 1,
})
superbowls = pd.DataFrame({
'holiday': 'superbowl',
'ds': pd.to_datetime(['2010-02-07', '2014-02-02', '2016-02-07']),
'lower_window': 0,
'upper_window': 1,
})
holidays = pd.concat((playoffs, superbowls))
93

m = Prophet(holidays=holidays)
fc = predict(m, forecast)
m.plot_components(fc);
94

Forecasting in R with Prophet
• For another example with data exploration
please visit
https://app.mode.com/modeanalytics/reports
/fe3ef2574877/details/notebook
95

TBATS
• Trigonometric seasonality, Box-Cox transformation, ARIMA errors,
Trend and Seasonal components. TBATS evaluates multiple
forecasting techniques against a training dataset, and picks the
‘best’ method based on some performance measures.
• TBATS uses a combination of Fourier terms with an exponential
smoothing state space model and a Box-Cox transformation, in a
completely automated manner.
• In a TBATS model the seasonality is allowed to change slowly over
time, while other methods force the seasonal patterns to repeat
periodically without changing.
• A downside of TBATS model, however, is that they can be slow to
estimate, especially with long time series. As TBATS is automated
sometimes the prediction is not useful, due to automated
parameters that do not represent the reality of the observed
variable. 96

TBATS
• TBATS will consider various models, such us:
– with Box-Cox transformation and without it.
– with and without Trend
– with and without Trend Damping
– with and without ARIMA process used to model residuals
– non-seasonal model
– various amounts of harmonics used to model seasonal effects
• The model with the lowest AIC score will be selected as
the final method.
• See also De Livera, A.M., Hyndman, R.J., & Snyder, R. D. (2011),
Forecasting time series with complex seasonal patterns using
exponential smoothing, Journal of the American Statistical
Association, 106(496), 1513-1527 97

TBATS
• We can extend exponential smoothing models to
accommodate T seasonal patterns as follows:
98
where m1, . . . ,mT denote the seasonal
periods, lt and bt represent the level
and trend components of the series at
time t, respectively, s(i)t represents the
ith seasonal component at time t, dt
denotes an ARMA(p, q) process and t
is a Gaussian white noise process with
zero mean and constant variance. The
smoothing parameters are given by α,
β. i for i = 1, . . . , T, and is the
dampening parameter, which gives
more control over trend extrapolation
when the trend component is damped.

TBATS
• Trigonometric representation of seasonal
components based on Fourier series. We could
replace the equation for s(i)t in the BATS model with
99
where (i)1 and (i)2 are the smoothing parameters and
(i)j = 2j/mi.
• For even mi values, ki = mi /2 and for odd mi values
ki = (mi − 1)/2 where ki is the number of harmonics
that is needed for the ith seasonal component.

TBATS
• Example: A time series giving the monthly
totals of accidental deaths in the USA.
fit <- tbats(USAccDeaths)
plot(forecast(fit))
100
TBATS(omega, p,q, phi, <m1,k1>, ...,
<mJ,kJ>) where omega is the Box-Cox
parameter and phi is the damping
parameter; the error is modelled as an
ARMA(p,q) process and m1,...,mJ list
the seasonal periods used in the
model and k1,...,kJ are the
corresponding number of Fourier terms
used for each seasonality.
Omega=1No Box-Cox
transformation
(0,0)  No ARMA(p,q)
.  No Damping parameter
<12, 5>  s=12 and # of terms used in
Fourier series=5

Multiple Seasonality
101
• If the frequency of observations is greater than once per
week, then there is usually more than one way of handling
the frequency.
• For example, data with daily observations might have a
weekly seasonality (frequency=7) or an annual seasonality
(frequency=365.25).
• Similarly, data that are observed every minute might have an
hourly seasonality (frequency=60), a daily seasonality
(frequency=24x60=1440), a weekly seasonality
(frequency=24x60x7=10080) and an annual seasonality
(frequency=24x60x365.25=525960).
• If you want to use a ts object, then you need to decide which
of these is the most important.

An alternative is to use a msts object (defined in
the forecast package) which handles multiple
seasonality time series. Then you can specify all
the frequencies that might be relevant. It is also
flexible enough to handle non-integer
frequencies.
102
Frequencies
Data Minute Hour Day Week Year
Daily 7 365.25
Hourly 24 168 8766
Half-hourly 48 336 17532
Minutes 60 1440 10080 525960
Seconds 60 3600 86400 604800 31557600

TBATS
• For example, the taylor data set from the
forecast package contains half-hourly
electricity demand data from England and
Wales over about 3 months in 2000. It was
defined as
library(forecast)
taylor <- msts(x, seasonal.periods=c(48,336))
• One convenient model for multiple seasonal
time series is a TBATS model:
taylor.fit <- tbats(taylor)
plot(forecast(taylor.fit))
103

ts.plot(x)
104
> head(taylor,n=15)
Multi-Seasonal Time Series:
Start: 1 1
Seasonal Periods: 48 336
Data:
[1] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
0.0000000
[8] 0.0000000 0.0000000 0.0000000 0.0000000 0.9980915 0.8480038
0.9884929
[15] 1.5881351

TBATS
• Strengths
– Box-cox transformation can deal with data with non-linearity and then
somewhat makes the variance becomes constant.
– ARMA model on residuals can solve autocorrelation problem.
– No need to worry about initial values.
– Can get not only point prediction but also interval prediction.
– The performance is better than simple state space model.
– Can deal with data with non-integer seasonal period, non-nested
periods and high frequency data.
– Can do multi-seasonality without increasing too many parameters.
• Weaknesses
– The assumption of εt∼NID(0,σ2) may not hold.
– Can not add explanatory variables.
– The performance for long-term prediction is not very well.
– The computation cost is big if the data size is large. 106

STAT 497_LN8_1_FORECASTING.ppt

Recommended

Recommended

More Related Content

Similar to STAT 497_LN8_1_FORECASTING.ppt

Similar to STAT 497_LN8_1_FORECASTING.ppt (20)

Recently uploaded

Recently uploaded (20)

STAT 497_LN8_1_FORECASTING.ppt