Social Forecasting
Lecture 3: Smoothing and ARIMA
Thomas Chadefaux
1
Time series components
Time series patterns
Trend pattern exists when there is a long-term increase or
decrease in the data.
Cyclic pattern exists when data exhibit rises and falls that
are not of fixed period (duration usually of at least 2
years).
Seasonal pattern exists when a series is influenced by seasonal
factors (e.g., the quarter of the year, the month, or
day of the week).
2
Time series decomposition
yt = f (St, Tt, Rt)
where yt = data at period t
Tt = trend-cycle component at period t
St = seasonal component at period t
Rt = remainder component at period t
Additive decomposition: yt = St + Tt + Rt.
Multiplicative decomposition: yt = St × Tt × Rt.
3
Time series decomposition
• Additive model appropriate if magnitude of seasonal
fluctuations does not vary with level.
• If seasonal fluctuations are proportional to the level of the
series, then a multiplicative model is appropriate.
• Multiplicative decomposition is more prevalent with
economic series.
• Technical (can ignore): alternatively, use a Box-Cox
transformation and then an additive decomposition, since
logs turn a multiplicative relationship into an additive
relationship:
yt = St × Tt × Rt ⇒ log yt = log St + log Tt + log Rt.
4
Euro electrical equipment
Monthly manufacture of electrical equipment: computer, electronic
and optical products. January 1996 - March 2012.
[Figure: STL decomposition of the series, panels: data, trend, seasonal, remainder]
5
Helper functions
• seasonal() extracts the seasonal component
• trendcycle() extracts the trend-cycle component
• remainder() extracts the remainder component.
• seasadj() returns the seasonally adjusted series.
6
Your turn
Repeat the decomposition using
library(fpp2)
elecequip %>%
stl(s.window=15, t.window=2) %>%
autoplot()
What happens as you change s.window and t.window?
7
Seasonal adjustment
Seasonal adjustment
• Useful by-product of decomposition: an easy way to calculate
seasonally adjusted data.
• Additive decomposition: seasonally adjusted data given by
yt − St = Tt + Rt
8
Euro electrical equipment
fit <- stl(elecequip, s.window=7)
autoplot(elecequip, series="Data") +
autolayer(seasadj(fit), series="Seasonally Adjusted")
[Figure: new orders index, raw data and seasonally adjusted series]
Electrical equipment manufacturing (Euro area)
9
Seasonal adjustment
• We use estimates of S based on past values to seasonally
adjust a current value.
• Seasonally adjusted series reflect remainders as well as trend.
Therefore they are not “smooth”, and “downturns” or
“upturns” can be misleading.
• It is better to use the trend-cycle component to look for
turning points.
10
How to decompose?
1. Compute the trend-cycle T̂t using a centred moving average of
order m (the seasonal period)
2. Calculate the detrended series: yt − T̂t
3. Calculate the seasonal component: average the detrended
values for that season. E.g. the seasonal component for March
is the average of all the detrended March values in the data.
Then normalize these to sum to 0. This gives Ŝt
4. The remainder component is calculated by subtracting the
estimated seasonal and trend-cycle components:
R̂t = yt − T̂t − Ŝt
11
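The four steps above can be written in a few lines. The lectures use R (`decompose()` or `stl()`); the following is a minimal illustrative sketch in Python with a made-up quarterly series, assuming an even seasonal period m. The function name is hypothetical.

```python
def classical_additive_decompose(y, m):
    """Classical additive decomposition with an even seasonal period m."""
    n = len(y)
    # 1. Trend-cycle via a centred 2xm moving average (weights 0.5,1,...,1,0.5)/m.
    w = [0.5] + [1.0] * (m - 1) + [0.5]
    half = len(w) // 2
    trend = [None] * n
    for i in range(half, n - half):
        trend[i] = sum(wj * y[i - half + j] for j, wj in enumerate(w)) / m
    # 2. Detrended series: y_t - That_t (where the trend is defined).
    detrended = [y[i] - trend[i] if trend[i] is not None else None for i in range(n)]
    # 3. Seasonal component: average detrended values per season, normalise to sum to 0.
    seas = []
    for s in range(m):
        vals = [detrended[i] for i in range(s, n, m) if detrended[i] is not None]
        seas.append(sum(vals) / len(vals))
    mean_seas = sum(seas) / m
    seas = [v - mean_seas for v in seas]
    seasonal = [seas[i % m] for i in range(n)]
    # 4. Remainder: Rhat_t = y_t - That_t - Shat_t.
    remainder = [y[i] - trend[i] - seasonal[i] if trend[i] is not None else None
                 for i in range(n)]
    return trend, seasonal, remainder

# Toy quarterly series (m = 4): linear trend plus a fixed seasonal pattern.
y = [10 + 0.5 * t + [2.0, -1.0, -2.0, 1.0][t % 4] for t in range(24)]
trend, seasonal, remainder = classical_additive_decompose(y, 4)
```

On this noise-free series the seasonal pattern is recovered exactly and the remainder is zero wherever the trend is defined.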
Smoothing
Simple methods
Time series y1, y2, . . . , yT .
Random walk forecasts
ŷT+h|T = yT
Average forecasts
ŷT+h|T = (1/T) Σ_{t=1}^{T} yt
• Want something in between that weights most
recent data more highly.
• Simple exponential smoothing uses a weighted
moving average with weights that decrease
exponentially.
12
Simple Exponential Smoothing
Forecast equation
ŷT+1|T = αyT + α(1 − α)yT−1 + α(1 − α)^2 yT−2 + · · ·
where 0 ≤ α ≤ 1.
Weights assigned to observations for:
Observation α = 0.2 α = 0.4 α = 0.6 α = 0.8
yT 0.2 0.4 0.6 0.8
yT−1 0.16 0.24 0.24 0.16
yT−2 0.128 0.144 0.096 0.032
yT−3 0.1024 0.0864 0.0384 0.0064
yT−4 (0.2)(0.8)^4 (0.4)(0.6)^4 (0.6)(0.4)^4 (0.8)(0.2)^4
yT−5 (0.2)(0.8)^5 (0.4)(0.6)^5 (0.6)(0.4)^5 (0.8)(0.2)^5
13
Optimisation
• Need to choose value for α
• As in regression, we choose α by minimising the SSE:
SSE = Σ_{t=1}^{T} (yt − ŷt|t−1)^2.
• Unlike regression there is no closed form solution — use
numerical optimization.
14
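The whole procedure fits in a few lines. R's `ses()` does the optimisation numerically and also estimates the initial level; here is a minimal Python sketch that fixes the initial level at y1 and picks α by grid search over the SSE. The helper names are hypothetical and the short series is made up, loosely echoing the oil example.

```python
def ses_fitted(y, alpha, level0):
    """One-step-ahead SES forecasts; the level updates as l_t = a*y_t + (1-a)*l_{t-1}."""
    fitted, level = [], level0
    for yt in y:
        fitted.append(level)               # forecast of y_t made at time t-1
        level = alpha * yt + (1 - alpha) * level
    return fitted, level

def sse(y, alpha, level0):
    fitted, _ = ses_fitted(y, alpha, level0)
    return sum((yt - f) ** 2 for yt, f in zip(y, fitted))

# No closed-form solution: search a grid of alpha values for the smallest SSE.
y = [445.4, 453.2, 454.4, 422.4, 456.0, 440.4, 425.2, 486.2, 500.4, 521.3]
level0 = y[0]
best_alpha = min((a / 100 for a in range(1, 100)), key=lambda a: sse(y, a, level0))
fitted, last_level = ses_fitted(y, best_alpha, level0)
forecast = last_level   # SES forecasts are flat: yhat_{T+h|T} = l_T for all h
```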
Example: Oil production
oildata <- window(oil, start=1996)
# Estimate parameters
fc <- ses(oildata, h=5)
summary(fc[["model"]])
## Simple exponential smoothing
##
## Call:
## ses(y = oildata, h = 5)
##
## Smoothing parameters:
## alpha = 0.8339
##
## Initial states:
## l = 446.5868
##
## sigma: 29.8282
##
## AIC AICc BIC
15
Example: Oil production
Year Time Observation Level Forecast
1995 0 446.59
1996 1 445.36 445.57 446.59
1997 2 453.20 451.93 445.57
1998 3 454.41 454.00 451.93
1999 4 422.38 427.63 454.00
2000 5 456.04 451.32 427.63
2001 6 440.39 442.20 451.32
2002 7 425.19 428.02 442.20
2003 8 486.21 476.54 428.02
2004 9 500.43 496.46 476.54
2005 10 521.28 517.15 496.46
2006 11 508.95 510.31 517.15
2007 12 488.89 492.45 510.31
2008 13 509.87 506.98 492.45
2009 14 456.72 465.07 506.98
2010 15 473.82 472.36 465.07
2011 16 525.95 517.05 472.36
2012 17 549.83 544.39 517.05
2013 18 542.34 542.68 544.39
2014 h = 1 542.68
2015 h = 2 542.68
2016 h = 3 542.68
16
Example: Oil production
autoplot(fc) +
autolayer(fitted(fc), series="Fitted") +
ylab("Oil (millions of tonnes)") + xlab("Year")
[Figure: oil production (millions of tonnes) with SES fitted values and forecasts]
Forecasts from Simple exponential smoothing
17
Autocorrelation
Autocorrelation
Autocorrelation measures the linear relationship between lagged
values of a time series
rk = Σ_{t=k+1}^{T} (yt − ȳ)(yt−k − ȳ) / Σ_{t=1}^{T} (yt − ȳ)^2
18
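The formula translates directly into code. A short Python sketch with a made-up series (in R, `ggAcf()` computes the same quantity); the function name is hypothetical:

```python
def acf(y, k):
    """Sample autocorrelation r_k, matching the formula above (0-indexed series)."""
    T = len(y)
    ybar = sum(y) / T
    num = sum((y[t] - ybar) * (y[t - k] - ybar) for t in range(k, T))
    den = sum((yt - ybar) ** 2 for yt in y)
    return num / den

# A strongly seasonal toy series: autocorrelation peaks at the period (lag 4).
y = [1, 5, 2, 6] * 3
r1, r4 = acf(y, 1), acf(y, 4)   # r4 is large and positive; r1 is not
```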
Autocorrelation
[Figure: quarterly beer production, 1995–2010]
## [1] "Cor(beer, beer.lag) = -0.103"
19
Trend and seasonality in ACF
aelec <- window(elec, start=1980)
autoplot(aelec) + xlab("Year") + ylab("GWh")
[Figure: monthly electricity demand (GWh) from 1980]
20
Trend and seasonality in ACF
ggAcf(aelec, lag=48)
[Figure: ACF of aelec, lags 1–48]
21
Stationarity and differencing
Stationarity
Definition
If {yt} is a stationary time series, then for all s, the distribution of
(yt, . . . , yt+s) does not depend on t.
A stationary series:
• is roughly horizontal
• has constant variance
• shows no patterns that are predictable in the long term
22
Stationary?
[Figure: Dow Jones Index over 300 days]
23
Stationary?
[Figure: daily change in the Dow Jones Index over 300 days]
24
Stationary?
[Figure: number of strikes per year, 1950–1980]
25
Stationary?
[Figure: sales of new one-family houses, USA, 1975–1995]
26
Stationary?
[Figure: price of a dozen eggs in 1993 dollars, from 1900]
27
Stationary?
[Figure: number of pigs slaughtered in Victoria (thousands), 1990–1995]
28
Stationary?
[Figure: annual Canadian lynx trappings, 1820–1920]
29
Stationary?
[Figure: Australian quarterly beer production (megalitres), 1995–2010]
30
Stationarity
Definition
If {yt} is a stationary time series, then for all s, the distribution of
(yt, . . . , yt+s) does not depend on t.
Transformations help to stabilize the variance.
For ARIMA modelling, we also need to stabilize the mean.
31
Non-stationarity in the mean
Identifying non-stationary series
• Use the time plot.
• The ACF of stationary data drops to zero relatively quickly.
• The ACF of non-stationary data decreases slowly.
• For non-stationary data, the value of r1 is often large and
positive.
32
Example: Dow-Jones index
[Figure: Dow Jones Index over 300 days]
33
Example: Dow-Jones index
[Figure: ACF of dj, lags 1–25]
34
Example: Dow-Jones index
[Figure: daily change in the Dow Jones Index]
35
Example: Dow-Jones index
[Figure: ACF of diff(dj), lags 1–25]
36
Differencing
• Differencing helps to stabilize the mean.
• The differenced series is the change between each observation
and the previous one: y′t = yt − yt−1.
• The differenced series will have only T − 1 values, since it is
not possible to calculate the difference y′1 for the first
observation.
37
Second-order differencing
Occasionally the differenced data will not appear stationary and it
may be necessary to difference the data a second time:
y″t = y′t − y′t−1
= (yt − yt−1) − (yt−1 − yt−2)
= yt − 2yt−1 + yt−2.
• y″t will have T − 2 values.
• In practice, it is almost never necessary to go beyond
second-order differences.
38
Seasonal differencing
A seasonal difference is the difference between an observation and
the corresponding observation from the previous year.
y′t = yt − yt−m
where m = number of seasons.
• For monthly data m = 12.
• For quarterly data m = 4.
39
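First, second-order, and seasonal differences are all the same one-line operation with a different lag. A small Python sketch with a made-up quarterly series (in R these are `diff(y)`, `diff(y, differences=2)` and `diff(y, lag=m)`); the function name is hypothetical:

```python
def difference(y, lag=1):
    """y'_t = y_t - y_{t-lag}; lag=1 is a first difference, lag=m a seasonal one."""
    return [y[t] - y[t - lag] for t in range(lag, len(y))]

y = [10, 12, 15, 11, 14, 16, 19, 15]    # toy quarterly series, m = 4
d1  = difference(y)                      # first differences: T - 1 values
d2  = difference(d1)                     # second-order differences: T - 2 values
ds  = difference(y, lag=4)               # seasonal differences: T - m values
dds = difference(ds)                     # seasonal difference, then first difference
# The order does not matter: difference(d1, lag=4) gives the same values as dds.
```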
Electricity production
usmelec %>% autoplot()
[Figure: usmelec time plot, 1980–2010]
40
Electricity production
usmelec %>% log() %>% autoplot()
[Figure: log(usmelec), 1980–2010]
41
Electricity production
usmelec %>% log() %>% diff(lag=12) %>%
autoplot()
[Figure: seasonally differenced log(usmelec)]
42
Electricity production
usmelec %>% log() %>% diff(lag=12) %>%
diff(lag=1) %>% autoplot()
[Figure: first and seasonally differenced log(usmelec)]
43
Electricity production
• The seasonally differenced series is closer to being stationary.
• Remaining non-stationarity can be removed with a further first
difference.
If y′t = yt − yt−12 denotes the seasonally differenced series, then
the twice-differenced series is
y∗t = y′t − y′t−1
= (yt − yt−12) − (yt−1 − yt−13)
= yt − yt−1 − yt−12 + yt−13.
44
Seasonal differencing
When both seasonal and first differences are applied:
• it makes no difference which is done first; the result will be
the same.
• If seasonality is strong, we recommend that the seasonal
difference be taken first, because sometimes the resulting
series will be stationary and no further first difference will
be needed.
It is important that, if differencing is used, the differences are
interpretable.
45
Interpretation of differencing
• first differences are the change between one observation and
the next;
• seasonal differences are the change from one year to the
next.
But taking lag 3 differences for yearly data, for example, results in a
model which cannot be sensibly interpreted.
46
ARIMA MODELS
Autoregressive models
Autoregressive (AR) models:
yt = c + φ1yt−1 + φ2yt−2 + · · · + φpyt−p + εt,
where εt is white noise. This is a multiple regression with lagged
values of yt as predictors.
[Figure: simulated AR(1) and AR(2) series, 100 observations each]
47
AR(1) model
yt = 2 − 0.8yt−1 + εt
εt ∼ N(0, 1), T = 100.
[Figure: simulated AR(1) series]
48
AR(1) model
yt = c + φ1yt−1 + εt
• When φ1 = 0, yt is equivalent to white noise.
• When φ1 = 1 and c = 0, yt is equivalent to a random walk.
• When φ1 = 1 and c ≠ 0, yt is equivalent to a random walk
with drift.
• When φ1 < 0, yt tends to oscillate between positive and
negative values.
49
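The special cases above are easy to see by simulation. A Python sketch (`arima.sim()` is the R equivalent; the function name, seed, and series length are arbitrary choices of this illustration):

```python
import random

def simulate_ar1(c, phi1, n, seed=1):
    """Simulate y_t = c + phi1*y_{t-1} + e_t with e_t ~ N(0, 1)."""
    rng = random.Random(seed)
    y = c / (1 - phi1) if abs(phi1) < 1 else 0.0   # start at (or near) the mean
    out = []
    for _ in range(n):
        y = c + phi1 * y + rng.gauss(0, 1)
        out.append(y)
    return out

noise       = simulate_ar1(0, 0.0, 200)    # phi1 = 0: white noise
oscillating = simulate_ar1(2, -0.8, 200)   # the AR(1) from the slide; mean 2/1.8
walk        = simulate_ar1(0, 1.0, 200)    # phi1 = 1, c = 0: random walk
```

With φ1 = −0.8 the series flips sign around its mean from one step to the next, which is the oscillation visible in the AR(1) plot.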
AR(2) model
yt = 8 + 1.3yt−1 − 0.7yt−2 + εt
εt ∼ N(0, 1), T = 100.
[Figure: simulated AR(2) series]
50
Stationarity conditions
We normally restrict autoregressive models to stationary data, in
which case some constraints on the parameter values are required.
51
Moving Average (MA) models
Moving Average (MA) models:
yt = c + εt + θ1εt−1 + θ2εt−2 + · · · + θqεt−q,
where εt is white noise. This is a multiple regression with past
errors as predictors. Don’t confuse this with moving average
smoothing!
[Figure: simulated MA(1) and MA(2) series, 100 observations each]
MA(1) model
yt = 20 + εt + 0.8εt−1
εt ∼ N(0, 1), T = 100.
52
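An MA(1) can be simulated the same way; yt depends only on the two most recent errors, so the series has short memory. A Python sketch for the MA(1) example above (function name and seed are arbitrary):

```python
import random

def simulate_ma1(c, theta1, n, seed=1):
    """Simulate y_t = c + e_t + theta1*e_{t-1} with e_t ~ N(0, 1)."""
    rng = random.Random(seed)
    e_prev = rng.gauss(0, 1)
    out = []
    for _ in range(n):
        e = rng.gauss(0, 1)
        out.append(c + e + theta1 * e_prev)
        e_prev = e
    return out

ma1 = simulate_ma1(20, 0.8, 200)   # the MA(1) example above: mean 20
```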
ARIMA modelling in R
How does auto.arima() work?
Need to select appropriate orders: p, d, q.
Hyndman and Khandakar (JSS, 2008) algorithm:
• Select the number of differences d via the KPSS test (null
hypothesis: the series is stationary).
• Select p, q by minimising AICc.
• Use stepwise search to traverse model space.
53
How does auto.arima() work?
AICc = −2 log(L) + 2(p + q + k + 1) [1 + (p + q + k + 2)/(T − p − q − k − 2)],
where L is the maximised likelihood fitted to the differenced
data, k = 1 if c ≠ 0 and k = 0 otherwise.
Step 1: Select current model (with smallest AICc) from:
ARIMA(2, d, 2)
ARIMA(0, d, 0)
ARIMA(1, d, 0)
ARIMA(0, d, 1)
Step 2: Consider variations of current model:
• vary one of p, q, from current model by ±1;
• p, q both vary from current model by ±1;
• Include/exclude c from current model.
Model with lowest AICc becomes current model.
54
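The AICc formula is simple to evaluate directly. A Python sketch (the helper name is hypothetical; T is the number of differenced observations). It reproduces the AICc that R reports for the ARIMA(1,1,0) fit of the internet data shown later in the lecture (log L = −262.62, p = 1, q = 0, no constant, T = 99):

```python
def arima_aicc(loglik, p, q, k, T):
    """AICc = -2*log(L) + 2(p+q+k+1)[1 + (p+q+k+2)/(T-p-q-k-2)]."""
    npar = p + q + k + 1           # the +1 accounts for the variance estimate
    return -2 * loglik + 2 * npar * (1 + (npar + 1) / (T - npar - 1))

# ARIMA(1,1,0) on the internet data: 100 observations, so T = 99 after differencing.
aicc = arima_aicc(-262.62, p=1, q=0, k=0, T=99)   # ~529.36, matching R's output
```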
Choosing your own model
Number of users logged on to an internet server each minute over a
100-minute period.
ggtsdisplay(internet)
[Figure: internet usage series with its ACF and PACF, lags 1–20]
55
Choosing your own model
ggtsdisplay(diff(internet))
[Figure: diff(internet) with its ACF and PACF, lags 1–20]
56
Choosing your own model
(fit <- Arima(internet,order=c(1,1,0)))
## Series: internet
## ARIMA(1,1,0)
##
## Coefficients:
## ar1
## 0.8026
## s.e. 0.0580
##
## sigma^2 = 11.79: log likelihood = -262.62
## AIC=529.24 AICc=529.36 BIC=534.43
57
Choosing your own model
ggtsdisplay(resid(fit))
[Figure: residuals from ARIMA(1,1,0) with ACF and PACF]
58
Choosing your own model
(fit <- Arima(internet,order=c(2,1,0)))
## Series: internet
## ARIMA(2,1,0)
##
## Coefficients:
## ar1 ar2
## 1.0449 -0.2966
## s.e. 0.0961 0.0961
##
## sigma^2 = 10.85: log likelihood = -258.09
## AIC=522.18 AICc=522.43 BIC=529.96
59
Choosing your own model
ggtsdisplay(resid(fit))
[Figure: residuals from ARIMA(2,1,0) with ACF and PACF]
60
Choosing your own model
(fit <- Arima(internet,order=c(3,1,0)))
## Series: internet
## ARIMA(3,1,0)
##
## Coefficients:
## ar1 ar2 ar3
## 1.1513 -0.6612 0.3407
## s.e. 0.0950 0.1353 0.0941
##
## sigma^2 = 9.656: log likelihood = -252
## AIC=511.99 AICc=512.42 BIC=522.37
61
Choosing your own model
ggtsdisplay(resid(fit))
[Figure: residuals from ARIMA(3,1,0) with ACF and PACF]
62
Choosing your own model
auto.arima(internet)
## Series: internet
## ARIMA(1,1,1)
##
## Coefficients:
## ar1 ma1
## 0.6504 0.5256
## s.e. 0.0842 0.0896
##
## sigma^2 = 9.995: log likelihood = -254.15
## AIC=514.3 AICc=514.55 BIC=522.08
63
Choosing your own model
checkresiduals(fit)
[Figure: residual diagnostics for ARIMA(3,1,0): time plot, ACF, and histogram of residuals]
64
Choosing your own model
fit %>% forecast %>% autoplot
[Figure: forecasts from ARIMA(3,1,0) for the internet series]
65
Modelling procedure with Arima
1. Plot the data. Identify any unusual observations.
2. If necessary, transform the data (e.g. by logging it) to
stabilize the variance.
3. If the data are non-stationary: take first differences of the
data until the data are stationary.
4. Examine the ACF/PACF: Is an AR(p) or MA(q) model
appropriate?
5. Try your chosen model(s), and use the AICc to search for a
better model.
6. Check the residuals from your chosen model by plotting the
ACF of the residuals, and doing a portmanteau test of the
residuals. If they do not look like white noise, try a
modified model.
7. Once the residuals look like white noise, calculate forecasts.
66
Modelling procedure with auto.arima
1. Plot the data. Identify any unusual observations.
2. If necessary, transform the data (using a Box-Cox
transformation) to stabilize the variance.
3. Use auto.arima to select a model.
6. Check the residuals from your chosen model by plotting the
ACF of the residuals, and doing a portmanteau test of the
residuals. If they do not look like white noise, try a
modified model.
7. Once the residuals look like white noise, calculate forecasts.
67
Seasonally adjusted electrical equipment
eeadj <- seasadj(stl(elecequip, s.window="periodic"))
autoplot(eeadj) + xlab("Year") +
ylab("Seasonally adjusted new orders index")
[Figure: seasonally adjusted new orders index, 1996–2012]
68
Another example: Seasonally adjusted electrical equipment
1. Time plot shows sudden changes, particularly big drop in
2008/2009 due to global economic environment. Otherwise
nothing unusual and no need for data adjustments.
2. No evidence of changing variance, so no Box-Cox
transformation.
3. Data are clearly non-stationary, so we take first differences.
69
Seasonally adjusted electrical equipment
ggtsdisplay(diff(eeadj))
[Figure: diff(eeadj) with its ACF and PACF, lags up to 36]
70
Seasonally adjusted electrical equipment
4. PACF is suggestive of AR(3). So initial candidate model is
ARIMA(3,1,0).
5. Fit ARIMA(3,1,0) model along with variations: ARIMA(4,1,0),
ARIMA(2,1,0), ARIMA(3,1,1), etc. ARIMA(3,1,1) has smallest
AICc value.
71
Seasonally adjusted electrical equipment
(fit <- Arima(eeadj, order=c(3,1,1)))
## Series: eeadj
## ARIMA(3,1,1)
##
## Coefficients:
## ar1 ar2 ar3 ma1
## 0.0044 0.0916 0.3698 -0.3921
## s.e. 0.2201 0.0984 0.0669 0.2426
##
## sigma^2 = 9.577: log likelihood = -492.69
## AIC=995.38 AICc=995.7 BIC=1011.72
72
Seasonally adjusted electrical equipment
6. The ACF plot of the residuals from the ARIMA(3,1,1) model looks
like white noise.
checkresiduals(fit, test=F)
[Figure: residual diagnostics for ARIMA(3,1,1): time plot, ACF, and histogram of residuals]
73
Seasonally adjusted electrical equipment
##
## Ljung-Box test
##
## data: Residuals from ARIMA(3,1,1)
## Q* = 24.034, df = 20, p-value = 0.2409
##
## Model df: 4. Total lags used: 24
74
Seasonally adjusted electrical equipment
fit %>% forecast %>% autoplot
[Figure: forecasts from ARIMA(3,1,1) for eeadj]
75
Forecasting
Point forecasts
1. Rearrange ARIMA equation so yt is on LHS.
2. Rewrite equation by replacing t by T + h.
3. On RHS, replace future observations by their forecasts, future
errors by zero, and past errors by corresponding residuals.
Start with h = 1. Repeat for h = 2, 3, . . ..
76
Point forecasts
ARIMA(3,1,1) forecasts: Step 1
(1 − φ1B − φ2B^2 − φ3B^3)(1 − B)yt = (1 + θ1B)εt,
[1 − (1 + φ1)B + (φ1 − φ2)B^2 + (φ2 − φ3)B^3 + φ3B^4] yt = (1 + θ1B)εt,
yt − (1 + φ1)yt−1 + (φ1 − φ2)yt−2 + (φ2 − φ3)yt−3 + φ3yt−4 = εt + θ1εt−1,
yt = (1 + φ1)yt−1 − (φ1 − φ2)yt−2 − (φ2 − φ3)yt−3 − φ3yt−4 + εt + θ1εt−1.
77
77
Point forecasts (h=1)
yt = (1 + φ1)yt−1 − (φ1 − φ2)yt−2 − (φ2 − φ3)yt−3 − φ3yt−4 + εt + θ1εt−1.
ARIMA(3,1,1) forecasts: Step 2
yT+1 = (1 + φ1)yT − (φ1 − φ2)yT−1 − (φ2 − φ3)yT−2 − φ3yT−3 + εT+1 + θ1εT.
ARIMA(3,1,1) forecasts: Step 3
ŷT+1|T = (1 + φ1)yT − (φ1 − φ2)yT−1 − (φ2 − φ3)yT−2 − φ3yT−3 + θ1eT.
78
Point forecasts (h=2)
yt = (1 + φ1)yt−1 − (φ1 − φ2)yt−2 − (φ2 − φ3)yt−3 − φ3yt−4 + εt + θ1εt−1.
ARIMA(3,1,1) forecasts: Step 2
yT+2 = (1 + φ1)yT+1 − (φ1 − φ2)yT − (φ2 − φ3)yT−1 − φ3yT−2 + εT+2 + θ1εT+1.
ARIMA(3,1,1) forecasts: Step 3
ŷT+2|T = (1 + φ1)ŷT+1|T − (φ1 − φ2)yT − (φ2 − φ3)yT−1 − φ3yT−2.
79
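Steps 2 and 3 amount to a simple recursion: future observations are replaced by their forecasts as they become available, future errors by zero, and past errors by the residuals. A Python sketch for this ARIMA(3,1,1) case (the function name is hypothetical and the coefficient values in the example are placeholders, not the fitted ones):

```python
def arima311_point_forecasts(y, resid, phi, theta1, h):
    """Iterate yhat_{T+j|T} for j = 1..h using the rearranged ARIMA(3,1,1)
    equation: known y's and residuals where available, forecasts and zeros after T."""
    phi1, phi2, phi3 = phi
    hist = list(y)                      # gets extended with forecasts as we go
    e = list(resid) + [0.0] * h         # future errors set to zero
    for _ in range(h):
        t = len(hist)                   # position being forecast (0-indexed)
        yhat = ((1 + phi1) * hist[t - 1]
                - (phi1 - phi2) * hist[t - 2]
                - (phi2 - phi3) * hist[t - 3]
                - phi3 * hist[t - 4]
                + theta1 * e[t - 1])    # theta1*e_T at h = 1, zero afterwards
        hist.append(yhat)
    return hist[len(y):]

# With all coefficients zero this collapses to the random-walk forecast y_T:
fc = arima311_point_forecasts([1.0, 2.0, 3.0, 4.0, 5.0], [0.0] * 5,
                              (0.0, 0.0, 0.0), 0.0, 3)
```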
European quarterly retail trade
euretail %>% diff(lag=4) %>% ggtsdisplay()
[Figure: seasonally differenced euretail with ACF and PACF, lags up to 16]
80
European quarterly retail trade
euretail %>% diff(lag=4) %>% diff() %>%
ggtsdisplay()
[Figure: first and seasonally differenced euretail with ACF and PACF, lags up to 16]
81
European quarterly retail trade
• d = 1 and D = 1 seems necessary.
• Significant spike at lag 1 in ACF suggests non-seasonal MA(1)
component.
• Significant spike at lag 4 in ACF suggests seasonal MA(1)
component.
• Initial candidate model: ARIMA(0,1,1)(0,1,1)[4].
82
European quarterly retail trade
fit <- Arima(euretail, order=c(0,1,1),
seasonal=c(0,1,1))
checkresiduals(fit)
[Figure: residual diagnostics for ARIMA(0,1,1)(0,1,1)[4]: time plot, ACF, and histogram of residuals]
83
European quarterly retail trade
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,1,1)(0,1,1)[4]
## Q* = 10.654, df = 6, p-value = 0.09968
##
## Model df: 2. Total lags used: 8
84
European quarterly retail trade
• ACF and PACF of residuals show significant spikes at lag 2,
and maybe lag 3.
• AICc of ARIMA(0,1,2)(0,1,1)[4] model is 74.27.
• AICc of ARIMA(0,1,3)(0,1,1)[4] model is 68.39.
fit <- Arima(euretail, order=c(0,1,3),
seasonal=c(0,1,1))
checkresiduals(fit)
85
European quarterly retail trade
## Series: euretail
## ARIMA(0,1,3)(0,1,1)[4]
##
## Coefficients:
## ma1 ma2 ma3 sma1
## 0.2630 0.3694 0.4200 -0.6636
## s.e. 0.1237 0.1255 0.1294 0.1545
##
## sigma^2 = 0.156: log likelihood = -28.63
## AIC=67.26 AICc=68.39 BIC=77.65
86
European quarterly retail trade
checkresiduals(fit)
[Figure: residual diagnostics for ARIMA(0,1,3)(0,1,1)[4]: time plot, ACF, and histogram of residuals]
87
European quarterly retail trade
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,1,3)(0,1,1)[4]
## Q* = 0.51128, df = 4, p-value = 0.9724
##
## Model df: 4. Total lags used: 8
88
European quarterly retail trade
autoplot(forecast(fit, h=12))
[Figure: 12-step forecasts from ARIMA(0,1,3)(0,1,1)[4] for euretail]
89
European quarterly retail trade
auto.arima(euretail)
## Series: euretail
## ARIMA(0,1,3)(0,1,1)[4]
##
## Coefficients:
## ma1 ma2 ma3 sma1
## 0.2630 0.3694 0.4200 -0.6636
## s.e. 0.1237 0.1255 0.1294 0.1545
##
## sigma^2 = 0.156: log likelihood = -28.63
## AIC=67.26 AICc=68.39 BIC=77.65
90
If time allows
Corticosteroid drug sales
[Figure: H02 sales (million scripts) and log H02 sales]
91
Corticosteroid drug sales
ggtsdisplay(diff(lh02,12), xlab="Year",
main="Seasonally differenced H02 scripts")
[Figure: seasonally differenced H02 scripts, with ACF and PACF]
92
Corticosteroid drug sales
• Choose D = 1 and d = 0.
• Spikes in PACF at lags 12 and 24 suggest seasonal AR(2) term.
• Spikes in PACF suggests possible non-seasonal AR(3) term.
• Initial candidate model: ARIMA(3,0,0)(2,1,0)[12].
93
Corticosteroid drug sales
Model AICc
ARIMA(3,0,1)(0,1,2)[12] -485.48
ARIMA(3,0,1)(1,1,1)[12] -484.25
ARIMA(3,0,1)(0,1,1)[12] -483.67
ARIMA(3,0,1)(2,1,0)[12] -476.31
ARIMA(3,0,0)(2,1,0)[12] -475.12
ARIMA(3,0,2)(2,1,0)[12] -474.88
ARIMA(3,0,1)(1,1,0)[12] -463.40
94
Corticosteroid drug sales
(fit <- Arima(h02, order=c(3,0,1), seasonal=c(0,1,2),
lambda=0))
## Series: h02
## ARIMA(3,0,1)(0,1,2)[12]
## Box Cox transformation: lambda= 0
##
## Coefficients:
## ar1 ar2 ar3 ma1
## -0.1603 0.5481 0.5678 0.3827
## s.e. 0.1636 0.0878 0.0942 0.1895
## sma1 sma2
## -0.5222 -0.1768
## s.e. 0.0861 0.0872
##
## sigma^2 = 0.004278: log likelihood = 250.04
## AIC=-486.08 AICc=-485.48 BIC=-463.28
95
Corticosteroid drug sales
checkresiduals(fit, lag=36)
[Figure: residual diagnostics for ARIMA(3,0,1)(0,1,2)[12]: time plot, ACF, and histogram of residuals]
96
Corticosteroid drug sales
##
## Ljung-Box test
##
## data: Residuals from ARIMA(3,0,1)(0,1,2)[12]
## Q* = 50.712, df = 30, p-value = 0.01045
##
## Model df: 6. Total lags used: 36
97
Corticosteroid drug sales
(fit <- auto.arima(h02, lambda=0))
## Series: h02
## ARIMA(2,1,1)(0,1,2)[12]
## Box Cox transformation: lambda= 0
##
## Coefficients:
## ar1 ar2 ma1 sma1
## -1.1358 -0.5753 0.3683 -0.5318
## s.e. 0.1608 0.0965 0.1884 0.0838
## sma2
## -0.1817
## s.e. 0.0881
##
## sigma^2 = 0.004278: log likelihood = 248.25
## AIC=-484.51 AICc=-484.05 BIC=-465
98
Corticosteroid drug sales
checkresiduals(fit, lag=36)
[Figure: residual diagnostics for ARIMA(2,1,1)(0,1,2)[12]: time plot, ACF, and histogram of residuals]
99
Corticosteroid drug sales
##
## Ljung-Box test
##
## data: Residuals from ARIMA(2,1,1)(0,1,2)[12]
## Q* = 51.096, df = 31, p-value = 0.01298
##
## Model df: 5. Total lags used: 36
100
Corticosteroid drug sales
(fit <- auto.arima(h02, lambda=0, max.order=9,
stepwise=FALSE, approximation=FALSE))
## Series: h02
## ARIMA(4,1,1)(2,1,2)[12]
## Box Cox transformation: lambda= 0
##
## Coefficients:
## ar1 ar2 ar3 ar4
## -0.0425 0.2098 0.2017 -0.2273
## s.e. 0.2167 0.1813 0.1144 0.0810
## ma1 sar1 sar2 sma1
## -0.7424 0.6213 -0.3832 -1.2019
## s.e. 0.2074 0.2421 0.1185 0.2491
## sma2
## 0.4959
## s.e. 0.2135
##
## sigma^2 = 0.004049: log likelihood = 254.31
## AIC=-488.63 AICc=-487.4 BIC=-456.1
101
Corticosteroid drug sales
checkresiduals(fit, lag=36)
[Figure: residual diagnostics for ARIMA(4,1,1)(2,1,2)[12]: time plot, ACF, and histogram of residuals]
102
Corticosteroid drug sales
##
## Ljung-Box test
##
## data: Residuals from ARIMA(4,1,1)(2,1,2)[12]
## Q* = 36.456, df = 27, p-value = 0.1057
##
## Model df: 9. Total lags used: 36
103
Corticosteroid drug sales
Training data: July 1991 to June 2006
Test data: July 2006–June 2008
getrmse <- function(x,h,...)
{
train.end <- time(x)[length(x)-h]
test.start <- time(x)[length(x)-h+1]
train <- window(x,end=train.end)
test <- window(x,start=test.start)
fit <- Arima(train,...)
fc <- forecast(fit,h=h)
return(accuracy(fc,test)[2,"RMSE"])
}
getrmse(h02,h=24,order=c(3,0,0),seasonal=c(2,1,0),lambda=0)
getrmse(h02,h=24,order=c(3,0,1),seasonal=c(2,1,0),lambda=0)
getrmse(h02,h=24,order=c(3,0,2),seasonal=c(2,1,0),lambda=0)
getrmse(h02,h=24,order=c(3,0,1),seasonal=c(1,1,0),lambda=0)
104
Corticosteroid drug sales
Model RMSE
ARIMA(4,1,1)(2,1,2)[12] 0.0615
ARIMA(3,0,1)(0,1,2)[12] 0.0622
ARIMA(3,0,1)(1,1,1)[12] 0.0630
ARIMA(2,1,4)(0,1,1)[12] 0.0632
ARIMA(2,1,3)(0,1,1)[12] 0.0634
ARIMA(3,0,3)(0,1,1)[12] 0.0638
ARIMA(2,1,5)(0,1,1)[12] 0.0640
ARIMA(3,0,1)(0,1,1)[12] 0.0644
ARIMA(3,0,2)(0,1,1)[12] 0.0644
ARIMA(3,0,2)(2,1,0)[12] 0.0645
ARIMA(3,0,1)(2,1,0)[12] 0.0646
ARIMA(3,0,0)(2,1,0)[12] 0.0661
ARIMA(3,0,1)(1,1,0)[12] 0.0679
105
Corticosteroid drug sales
• Models with lowest AICc values tend to give slightly better
results than the other models.
• AICc comparisons must have the same orders of differencing.
But RMSE test set comparisons can involve any models.
• Use the best model available, even if it does not pass all tests.
106
Corticosteroid drug sales
fit <- Arima(h02, order=c(3,0,1), seasonal=c(0,1,2),
lambda=0)
autoplot(forecast(fit)) +
ylab("H02 sales (million scripts)") + xlab("Year")
[Figure: forecasts of H02 sales (million scripts) from ARIMA(3,0,1)(0,1,2)[12]]
107

Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management systemChristalin Nelson
 
Activity 2-unit 2-update 2024. English translation
Activity 2-unit 2-update 2024. English translationActivity 2-unit 2-update 2024. English translation
Activity 2-unit 2-update 2024. English translationRosabel UA
 
EMBODO Lesson Plan Grade 9 Law of Sines.docx
EMBODO Lesson Plan Grade 9 Law of Sines.docxEMBODO Lesson Plan Grade 9 Law of Sines.docx
EMBODO Lesson Plan Grade 9 Law of Sines.docxElton John Embodo
 
Active Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdfActive Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdfPatidar M
 
ICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfVanessa Camilleri
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Celine George
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfTechSoup
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4JOYLYNSAMANIEGO
 
Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Seán Kennedy
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptxmary850239
 
ROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptxROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptxVanesaIglesias10
 

Recently uploaded (20)

Oppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and FilmOppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and Film
 
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
 
Presentation Activity 2. Unit 3 transv.pptx
Presentation Activity 2. Unit 3 transv.pptxPresentation Activity 2. Unit 3 transv.pptx
Presentation Activity 2. Unit 3 transv.pptx
 
Textual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHSTextual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHS
 
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptxLEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
 
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
 
Dust Of Snow By Robert Frost Class-X English CBSE
Dust Of Snow By Robert Frost Class-X English CBSEDust Of Snow By Robert Frost Class-X English CBSE
Dust Of Snow By Robert Frost Class-X English CBSE
 
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxQ4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
 
Measures of Position DECILES for ungrouped data
Measures of Position DECILES for ungrouped dataMeasures of Position DECILES for ungrouped data
Measures of Position DECILES for ungrouped data
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management system
 
Activity 2-unit 2-update 2024. English translation
Activity 2-unit 2-update 2024. English translationActivity 2-unit 2-update 2024. English translation
Activity 2-unit 2-update 2024. English translation
 
EMBODO Lesson Plan Grade 9 Law of Sines.docx
EMBODO Lesson Plan Grade 9 Law of Sines.docxEMBODO Lesson Plan Grade 9 Law of Sines.docx
EMBODO Lesson Plan Grade 9 Law of Sines.docx
 
Active Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdfActive Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdf
 
ICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdf
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4
 
Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx
 
ROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptxROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptx
 

Social Forecasting Lecture 3: Smoothing and ARIMA Time Series Models

  • 1. Social Forecasting. Lecture 3: Smoothing and ARIMA. Thomas Chadefaux.
  • 3. Time series patterns. A trend pattern exists when there is a long-term increase or decrease in the data. A cyclic pattern exists when the data exhibit rises and falls that are not of a fixed period (usually of duration at least 2 years). A seasonal pattern exists when a series is influenced by seasonal factors (e.g., the quarter of the year, the month, or the day of the week).
  • 4. Time series decomposition: yt = f(St, Tt, Rt), where yt = data at period t, Tt = trend-cycle component at period t, St = seasonal component at period t, and Rt = remainder component at period t. Additive decomposition: yt = St + Tt + Rt. Multiplicative decomposition: yt = St × Tt × Rt.
  • 5. Time series decomposition. An additive model is appropriate if the magnitude of the seasonal fluctuations does not vary with the level of the series. If the seasonal fluctuations are proportional to the level of the series, a multiplicative model is appropriate. Multiplicative decomposition is more prevalent with economic series. Technical aside (can ignore): alternatively, use a Box-Cox transformation and then an additive decomposition, since logs turn a multiplicative relationship into an additive one: yt = St × Tt × Rt ⇒ log yt = log St + log Tt + log Rt.
  • 6. Euro electrical equipment. Monthly manufacture of electrical equipment: computer, electronic and optical products, January 1996 - March 2012. [Figure: STL decomposition panels showing the data, trend, seasonal and remainder components.]
  • 7. Helper functions: seasonal() extracts the seasonal component; trendcycle() extracts the trend-cycle component; remainder() extracts the remainder component; seasadj() returns the seasonally adjusted series.
  • 8. Your turn. Repeat the decomposition using library(fpp2); elecequip %>% stl(s.window=15, t.window=2) %>% autoplot(). What happens as you change s.window and t.window?
  • 10. Seasonal adjustment. A useful by-product of decomposition: an easy way to calculate seasonally adjusted data. With an additive decomposition, the seasonally adjusted data are given by yt − St = Tt + Rt.
  • 11. Euro electrical equipment. fit <- stl(elecequip, s.window=7); autoplot(elecequip, series="Data") + autolayer(seasadj(fit), series="Seasonally Adjusted") [Figure: new orders index with the seasonally adjusted series overlaid; electrical equipment manufacturing (Euro area).]
  • 12. Seasonal adjustment. We use estimates of S based on past values to seasonally adjust a current value. Seasonally adjusted series reflect remainders as well as trend; they are therefore not “smooth”, and apparent “downturns” or “upturns” can be misleading. It is better to use the trend-cycle component to look for turning points.
  • 13. How to decompose? 1. Compute the trend T̂t using a moving average over m values. 2. Calculate the detrended series yt − T̂t. 3. Calculate the seasonal component: average the detrended values for each season (e.g., the seasonal component for March is the average of all the detrended March values in the data), then normalise these to sum to 0; this gives Ŝt. 4. The remainder component is calculated by subtracting the estimated seasonal and trend-cycle components: R̂t = yt − T̂t − Ŝt.
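The four steps above are easy to sketch. The snippet below is an illustrative pure-Python re-implementation of classical additive decomposition, not the R stl() routine the lecture uses; the synthetic three-season series and the window length m = 3 in the usage note are assumptions made for the example.

```python
def moving_average(y, m):
    """Centred moving average of (odd) window length m (step 1)."""
    half = m // 2
    return [sum(y[t - half:t + half + 1]) / m
            for t in range(half, len(y) - half)]

def decompose_additive(y, m):
    half = m // 2
    trend = moving_average(y, m)                         # step 1: T-hat
    detrended = [y[i + half] - trend[i]                  # step 2: y - T-hat
                 for i in range(len(trend))]
    # Step 3: average the detrended values season by season,
    # then centre the averages so they sum to 0.
    buckets = [[] for _ in range(m)]
    for i, d in enumerate(detrended):
        buckets[(i + half) % m].append(d)
    seasonal = [sum(b) / len(b) for b in buckets]
    centre = sum(seasonal) / m
    seasonal = [s - centre for s in seasonal]
    # Step 4: remainder = y - T-hat - S-hat.
    remainder = [detrended[i] - seasonal[(i + half) % m]
                 for i in range(len(trend))]
    return trend, seasonal, remainder
```

On a noiseless series built as trend plus a zero-sum seasonal pattern, the estimated seasonal components recover the pattern exactly and the remainder vanishes, matching steps 3-4.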
  • 15. Simple methods. For a time series y1, y2, . . . , yT: random walk forecasts are ŷT+h|T = yT; average forecasts are ŷT+h|T = (1/T) ∑t=1..T yt. We want something in between, which weights the most recent data more highly. Simple exponential smoothing uses a weighted moving average with weights that decrease exponentially.
  • 16. Simple exponential smoothing. Forecast equation: ŷT+1|T = αyT + α(1 − α)yT−1 + α(1 − α)²yT−2 + · · · , where 0 ≤ α ≤ 1. Weights assigned to observations:
    Observation   α = 0.2      α = 0.4      α = 0.6      α = 0.8
    yT            0.2          0.4          0.6          0.8
    yT−1          0.16         0.24         0.24         0.16
    yT−2          0.128        0.144        0.096        0.032
    yT−3          0.1024       0.0864       0.0384       0.0064
    yT−4          (0.2)(0.8)⁴  (0.4)(0.6)⁴  (0.6)(0.4)⁴  (0.8)(0.2)⁴
    yT−5          (0.2)(0.8)⁵  (0.4)(0.6)⁵  (0.6)(0.4)⁵  (0.8)(0.2)⁵
  • 17. Optimisation. We need to choose a value for α. As in regression, we choose α by minimising the sum of squared one-step errors, SSE = ∑t=1..T (yt − ŷt|t−1)². Unlike regression, there is no closed-form solution, so numerical optimisation is used.
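With no closed-form solution, even a crude grid search over α illustrates the idea. This is a pure-Python sketch; R's ses() optimises α and the initial level ℓ0 jointly, whereas here the initial level is fixed, which is a simplifying assumption of the example.

```python
def ses_fitted(y, alpha, level0):
    """One-step-ahead SES forecasts: fitted[t] = y-hat_{t|t-1}."""
    fitted, level = [], level0
    for obs in y:
        fitted.append(level)                         # forecast made before obs
        level = alpha * obs + (1 - alpha) * level    # update the level
    return fitted

def sse(y, alpha, level0):
    """Sum of squared one-step forecast errors."""
    return sum((obs - f) ** 2
               for obs, f in zip(y, ses_fitted(y, alpha, level0)))

def best_alpha(y, level0, step=0.01):
    """Brute-force grid search for the SSE-minimising alpha in (0, 1)."""
    grid = [i * step for i in range(1, round(1 / step))]
    return min(grid, key=lambda a: sse(y, a, level0))
```

For a steadily trending series, the search pushes α towards 1, since the most recent observation is then the most informative.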
  • 18. Example: oil production.
    oildata <- window(oil, start=1996)
    fc <- ses(oildata, h=5)    # estimate parameters
    summary(fc[["model"]])
    ## Simple exponential smoothing
    ## Call: ses(y = oildata, h = 5)
    ## Smoothing parameters: alpha = 0.8339
    ## Initial states: l = 446.5868
    ## sigma: 29.8282
    ## AIC AICc BIC
  • 19. Example: oil production.
    Year  Time  Observation   Level  Forecast
    1995     0                446.59
    1996     1       445.36   445.57   446.59
    1997     2       453.20   451.93   445.57
    1998     3       454.41   454.00   451.93
    1999     4       422.38   427.63   454.00
    2000     5       456.04   451.32   427.63
    2001     6       440.39   442.20   451.32
    2002     7       425.19   428.02   442.20
    2003     8       486.21   476.54   428.02
    2004     9       500.43   496.46   476.54
    2005    10       521.28   517.15   496.46
    2006    11       508.95   510.31   517.15
    2007    12       488.89   492.45   510.31
    2008    13       509.87   506.98   492.45
    2009    14       456.72   465.07   506.98
    2010    15       473.82   472.36   465.07
    2011    16       525.95   517.05   472.36
    2012    17       549.83   544.39   517.05
    2013    18       542.34   542.68   544.39
    Forecasts (h = 1, 2, 3): 2014 542.68; 2015 542.68; 2016 542.68
  • 20. Example: oil production. autoplot(fc) + autolayer(fitted(fc), series="Fitted") + ylab("Oil (millions of tonnes)") + xlab("Year") [Figure: forecasts from simple exponential smoothing with fitted values overlaid.]
  • 22. Autocorrelation. Autocorrelation measures the linear relationship between lagged values of a time series: rk = ∑t=k+1..T (yt − ȳ)(yt−k − ȳ) / ∑t=1..T (yt − ȳ)².
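The definition of rk translates directly into code; this pure-Python sketch computes the same quantity that R's ggAcf() plots.

```python
def acf(y, k):
    """Sample autocorrelation at lag k, as defined above."""
    T = len(y)
    ybar = sum(y) / T
    # Numerator: cross-products of deviations at lag k (t = k+1..T).
    num = sum((y[t] - ybar) * (y[t - k] - ybar) for t in range(k, T))
    # Denominator: sum of squared deviations over the whole series.
    den = sum((v - ybar) ** 2 for v in y)
    return num / den
```

For a perfectly alternating series the lag-1 autocorrelation is strongly negative and the lag-2 autocorrelation strongly positive, a useful sanity check on the sign conventions.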
  • 23. Autocorrelation. [Figure: quarterly beer production, 1995-2010.] ## [1] "Cor(beer, beer.lag) = -0.103"
  • 24. Trend and seasonality in the ACF. aelec <- window(elec, start=1980); autoplot(aelec) + xlab("Year") + ylab("GWh") [Figure: the aelec series, GWh by year.]
  • 25. Trend and seasonality in the ACF. ggAcf(aelec, lag=48) [Figure: ACF of aelec up to lag 48.]
  • 27. Stationarity. Definition: if {yt} is a stationary time series, then for all s, the distribution of (yt, . . . , yt+s) does not depend on t. A stationary series is roughly horizontal, has constant variance, and has no patterns predictable in the long term.
  • 28. Stationary? [Figure: Dow Jones Index, days 0-300.]
  • 29. Stationary? [Figure: change in the Dow Jones Index, days 0-300.]
  • 30. Stationary? [Figure: number of strikes per year, 1950-1980.]
  • 31. Stationary? [Figure: sales of new one-family houses, USA, 1975-1995.]
  • 32. Stationary? [Figure: price of a dozen eggs in 1993 dollars, 1900-1980.]
  • 33. Stationary? [Figure: number of pigs slaughtered in Victoria (thousands), 1990-1995.]
  • 34. Stationary? [Figure: annual Canadian lynx trappings, 1820-1920.]
  • 35. Stationary? [Figure: Australian quarterly beer production (megalitres), 1995-2010.]
  • 36. Stationarity. Definition: if {yt} is a stationary time series, then for all s, the distribution of (yt, . . . , yt+s) does not depend on t. Transformations help to stabilise the variance. For ARIMA modelling, we also need to stabilise the mean.
  • 37. Non-stationarity in the mean. Identifying non-stationary series: look at the time plot; the ACF of stationary data drops to zero relatively quickly, while the ACF of non-stationary data decreases slowly; for non-stationary data, the value of r1 is often large and positive.
  • 38. Example: Dow Jones index. [Figure: the Dow Jones Index, days 0-300.]
  • 39. Example: Dow Jones index. [Figure: ACF of the dj series up to lag 25.]
  • 40. Example: Dow Jones index. [Figure: the daily change in the Dow Jones Index.]
  • 41. Example: Dow Jones index. [Figure: ACF of diff(dj) up to lag 25.]
  • 42. Differencing. Differencing helps to stabilise the mean. The differenced series is the change between consecutive observations of the original series: y′t = yt − yt−1. The differenced series has only T − 1 values, since the difference y′1 cannot be calculated for the first observation.
  • 43. Second-order differencing. Occasionally the differenced data will not appear stationary, and it may be necessary to difference a second time: y′′t = y′t − y′t−1 = (yt − yt−1) − (yt−1 − yt−2) = yt − 2yt−1 + yt−2. The series y′′t has T − 2 values. In practice, it is almost never necessary to go beyond second-order differences.
  • 44. Seasonal differencing. A seasonal difference is the difference between an observation and the corresponding observation from the previous year: y′t = yt − yt−m, where m = number of seasons. For monthly data m = 12; for quarterly data m = 4.
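First, second-order and seasonal differences are all one operation applied with different lags (and, for second order, applied twice). A minimal pure-Python sketch playing the role of R's diff():

```python
def difference(y, lag=1):
    """y'_t = y_t - y_{t-lag}: lag=1 gives a first difference,
    lag=m a seasonal difference; apply twice for second-order."""
    return [y[t] - y[t - lag] for t in range(lag, len(y))]
```

Note the lengths: one first difference leaves T − 1 values, a second application leaves T − 2, and applying it twice reproduces the yt − 2yt−1 + yt−2 identity above.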
  • 45. Electricity production. usmelec %>% autoplot() [Figure: the usmelec series, 1980-2010.]
  • 46. Electricity production. usmelec %>% log() %>% autoplot() [Figure: the logged series.]
  • 47. Electricity production. usmelec %>% log() %>% diff(lag=12) %>% autoplot() [Figure: the seasonally differenced logged series.]
  • 48. Electricity production. usmelec %>% log() %>% diff(lag=12) %>% diff(lag=1) %>% autoplot() [Figure: the seasonally and first differenced logged series.]
  • 49. Electricity production. The seasonally differenced series is closer to being stationary. Remaining non-stationarity can be removed with a further first difference. If y′t = yt − yt−12 denotes the seasonally differenced series, then the twice-differenced series is y*t = y′t − y′t−1 = (yt − yt−12) − (yt−1 − yt−13) = yt − yt−1 − yt−12 + yt−13.
  • 50. Seasonal differencing. When both seasonal and first differences are applied, it makes no difference which is done first; the result will be the same. If seasonality is strong, we recommend doing the seasonal differencing first, because the resulting series will sometimes already be stationary, with no need for a further first difference. It is important that, if differencing is used, the differences are interpretable.
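The claim that the order of the two differences does not matter can be checked directly. In this sketch, lag 4 stands in for quarterly seasonality and the deterministic series is an arbitrary choice for the demonstration:

```python
def difference(y, lag):
    """y'_t = y_t - y_{t-lag}."""
    return [y[t] - y[t - lag] for t in range(lag, len(y))]

# Arbitrary integer-valued series, long enough for both differences.
series = [float((3 * t * t + 5 * t) % 17) for t in range(40)]

seasonal_then_first = difference(difference(series, 4), 1)
first_then_seasonal = difference(difference(series, 1), 4)
# Both orders compute y_t - y_{t-1} - y_{t-4} + y_{t-5}, so they agree.
```

The two results are identical because the difference operators (1 − B) and (1 − B⁴) commute.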
  • 51. Interpretation of differencing. First differences are the change between one observation and the next; seasonal differences are the change from one year to the next. But taking lag-3 differences of yearly data, for example, results in a model that cannot be sensibly interpreted.
  • 53. Autoregressive models. Autoregressive (AR) models: yt = c + φ1yt−1 + φ2yt−2 + · · · + φpyt−p + εt, where εt is white noise. This is a multiple regression with lagged values of yt as predictors. [Figure: simulated AR(1) and AR(2) series.]
  • 54. AR(1) model. yt = 2 − 0.8yt−1 + εt, with εt ∼ N(0, 1) and T = 100. [Figure: the simulated AR(1) series.]
  • 55. AR(1) model. yt = c + φ1yt−1 + εt. When φ1 = 0, yt is equivalent to white noise. When φ1 = 1 and c = 0, yt is equivalent to a random walk. When φ1 = 1 and c ≠ 0, yt is equivalent to a random walk with drift. When φ1 < 0, yt tends to oscillate between positive and negative values.
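These special cases show up clearly in simulation. The sketch below is a pure-Python stand-in for the R simulation of yt = 2 − 0.8yt−1 + εt on the previous slide; starting the recursion at the stationary mean c/(1 − φ1) is a convenience assumption of the example.

```python
import random

def simulate_ar1(c, phi, n, seed=1):
    """Simulate y_t = c + phi * y_{t-1} + eps_t, with eps_t ~ N(0, 1)."""
    rng = random.Random(seed)          # fixed seed for reproducibility
    y = [c / (1 - phi)]                # start at the stationary mean
    for _ in range(n - 1):
        y.append(c + phi * y[-1] + rng.gauss(0, 1))
    return y
```

With φ1 = −0.8 the sample mean settles near c/(1 − φ1) = 2/1.8 ≈ 1.11, and the lag-1 autocorrelation is strongly negative: exactly the oscillation the slide describes.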
  • 56. AR(2) model. yt = 8 + 1.3yt−1 − 0.7yt−2 + εt, with εt ∼ N(0, 1) and T = 100. [Figure: the simulated AR(2) series.]
  • 57. Stationarity conditions. We normally restrict autoregressive models to stationary data, which requires some constraints on the values of the parameters.
  • 58. Moving average (MA) models. Moving average (MA) models: yt = c + εt + θ1εt−1 + θ2εt−2 + · · · + θqεt−q, where εt is white noise. This is a multiple regression with past errors as predictors. Don’t confuse this with moving-average smoothing! [Figure: simulated MA(1) and MA(2) series.] MA(1) model: yt = 20 + εt + 0.8εt−1, with εt ∼ N(0, 1) and T = 100.
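Simulating the slide's MA(1), yt = 20 + εt + 0.8εt−1, shows the characteristic ACF of an MA(q) process: a spike of about θ1/(1 + θ1²) ≈ 0.49 at lag 1, and roughly zero beyond lag q. A pure-Python sketch:

```python
import random

def simulate_ma1(c, theta, n, seed=2):
    """Simulate y_t = c + eps_t + theta * eps_{t-1}, eps ~ N(0, 1)."""
    rng = random.Random(seed)
    eps = [rng.gauss(0, 1) for _ in range(n + 1)]   # one extra for eps_{t-1}
    return [c + eps[t + 1] + theta * eps[t] for t in range(n)]

def acf(y, k):
    """Sample autocorrelation at lag k."""
    ybar = sum(y) / len(y)
    num = sum((y[t] - ybar) * (y[t - k] - ybar) for t in range(k, len(y)))
    return num / sum((v - ybar) ** 2 for v in y)
```

This sharp cut-off in the ACF (versus the geometric decay of an AR process) is what the identification slides later in the lecture rely on.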
  • 60. How does auto.arima() work? We need to select the appropriate orders p, d, q. The Hyndman and Khandakar (JSS, 2008) algorithm: select the number of differences d via the KPSS test (a test for a unit root); select p and q by minimising the AICc; use a stepwise search to traverse the model space.
  • 61. How does auto.arima() work? AICc = −2 log(L) + 2(p + q + k + 1)[1 + (p + q + k + 2)/(T − p − q − k − 2)], where L is the maximised likelihood fitted to the differenced data, k = 1 if c ≠ 0 and k = 0 otherwise. Step 1: select the current model (the one with the smallest AICc) from ARIMA(2,d,2), ARIMA(0,d,0), ARIMA(1,d,0) and ARIMA(0,d,1). Step 2: consider variations of the current model: vary one of p, q from the current model by ±1; vary both p and q by ±1; include/exclude c. The model with the lowest AICc becomes the current model.
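The AICc formula can be checked against output shown later for the internet series: the ARIMA(1,1,0) fit reports log likelihood −262.62, AIC = 529.24 and AICc = 529.36, with p = 1, q = 0 and no constant (k = 0). Assuming T = 99 effective observations (the 100-minute series loses one value to differencing, an assumption of this check), a direct transcription of the formula reproduces both numbers:

```python
def aicc(loglik, p, q, k, T):
    """AICc as defined above; k = 1 if the constant c is included, else 0.
    T is the number of observations the (differenced) model is fitted to."""
    npar = p + q + k + 1                 # parameter count used by the penalty
    aic = -2.0 * loglik + 2.0 * npar
    # Small-sample correction term; vanishes as T grows.
    return aic + 2.0 * npar * (npar + 1) / (T - npar - 1)
```

As T grows large, the correction term disappears and AICc converges to the plain AIC.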
  • 62. Choosing your own model. Number of users logged on to an internet server each minute over a 100-minute period. ggtsdisplay(internet) [Figure: the internet series with its ACF and PACF.]
  • 63. Choosing your own model. ggtsdisplay(diff(internet)) [Figure: the differenced series with its ACF and PACF.]
  • 64. Choosing your own model.
    (fit <- Arima(internet, order=c(1,1,0)))
    ## Series: internet
    ## ARIMA(1,1,0)
    ## Coefficients:
    ##          ar1
    ##       0.8026
    ## s.e.  0.0580
    ## sigma^2 = 11.79: log likelihood = -262.62
    ## AIC=529.24 AICc=529.36 BIC=534.43
  • 65. Choosing your own model. ggtsdisplay(resid(fit)) [Figure: residuals with their ACF and PACF.]
  • 66. Choosing your own model.
    (fit <- Arima(internet, order=c(2,1,0)))
    ## Series: internet
    ## ARIMA(2,1,0)
    ## Coefficients:
    ##          ar1      ar2
    ##       1.0449  -0.2966
    ## s.e.  0.0961   0.0961
    ## sigma^2 = 10.85: log likelihood = -258.09
    ## AIC=522.18 AICc=522.43 BIC=529.96
  • 67. Choosing your own model. ggtsdisplay(resid(fit)) [Figure: residuals with their ACF and PACF.]
  • 68. Choosing your own model.
    (fit <- Arima(internet, order=c(3,1,0)))
    ## Series: internet
    ## ARIMA(3,1,0)
    ## Coefficients:
    ##          ar1      ar2     ar3
    ##       1.1513  -0.6612  0.3407
    ## s.e.  0.0950   0.1353  0.0941
    ## sigma^2 = 9.656: log likelihood = -252
    ## AIC=511.99 AICc=512.42 BIC=522.37
  • 69. Choosing your own model. ggtsdisplay(resid(fit)) [Figure: residuals with their ACF and PACF.]
  • 70. Choosing your own model.
    auto.arima(internet)
    ## Series: internet
    ## ARIMA(1,1,1)
    ## Coefficients:
    ##          ar1     ma1
    ##       0.6504  0.5256
    ## s.e.  0.0842  0.0896
    ## sigma^2 = 9.995: log likelihood = -254.15
    ## AIC=514.3 AICc=514.55 BIC=522.08
  • 71. Choosing your own model. checkresiduals(fit) [Figure: residual diagnostics for ARIMA(3,1,0): residual plot, ACF, histogram.]
  • 72. Choosing your own model. fit %>% forecast %>% autoplot [Figure: forecasts from ARIMA(3,1,0).]
  • 73. Modelling procedure with Arima. 1. Plot the data; identify any unusual observations. 2. If necessary, transform the data (e.g., by taking logs) to stabilise the variance. 3. If the data are non-stationary, take first differences until they are stationary. 4. Examine the ACF/PACF: is an AR(p) or MA(q) model appropriate? 5. Try your chosen model(s), and use the AICc to search for a better model. 6. Check the residuals from your chosen model by plotting their ACF and doing a portmanteau test; if they do not look like white noise, try a modified model. 7. Once the residuals look like white noise, calculate forecasts.
  • 74. Modelling procedure with auto.arima. 1. Plot the data; identify any unusual observations. 2. If necessary, transform the data (using a Box-Cox transformation) to stabilise the variance. 3. Use auto.arima to select a model. 4. Check the residuals from the chosen model by plotting their ACF and doing a portmanteau test; if they do not look like white noise, try a modified model. 5. Once the residuals look like white noise, calculate forecasts.
  • 75. Seasonally adjusted electrical equipment. eeadj <- seasadj(stl(elecequip, s.window="periodic")); autoplot(eeadj) + xlab("Year") + ylab("Seasonally adjusted new orders index") [Figure: the seasonally adjusted new orders index.]
  • 76. Another example: seasonally adjusted electrical equipment. 1. The time plot shows sudden changes, particularly the big drop in 2008/2009 due to the global economic environment; otherwise there is nothing unusual and no need for data adjustments. 2. There is no evidence of changing variance, so no Box-Cox transformation. 3. The data are clearly non-stationary, so we take first differences.
  • 77. Seasonally adjusted electrical equipment. ggtsdisplay(diff(eeadj)) [Figure: the differenced series with its ACF and PACF.]
  • 78. Seasonally adjusted electrical equipment. 4. The PACF is suggestive of an AR(3), so the initial candidate model is ARIMA(3,1,0). 5. Fit the ARIMA(3,1,0) model along with variations: ARIMA(4,1,0), ARIMA(2,1,0), ARIMA(3,1,1), etc. ARIMA(3,1,1) has the smallest AICc value.
  • 79. Seasonally adjusted electrical equipment.
    (fit <- Arima(eeadj, order=c(3,1,1)))
    ## Series: eeadj
    ## ARIMA(3,1,1)
    ## Coefficients:
    ##          ar1     ar2     ar3      ma1
    ##       0.0044  0.0916  0.3698  -0.3921
    ## s.e.  0.2201  0.0984  0.0669   0.2426
    ## sigma^2 = 9.577: log likelihood = -492.69
    ## AIC=995.38 AICc=995.7 BIC=1011.72
  • 80. Seasonally adjusted electrical equipment. 6. The ACF plot of residuals from the ARIMA(3,1,1) model looks like white noise. checkresiduals(fit, test=F) [Figure: residual diagnostics for ARIMA(3,1,1).]
  • 81. Seasonally adjusted electrical equipment.
    ## Ljung-Box test
    ## data: Residuals from ARIMA(3,1,1)
    ## Q* = 24.034, df = 20, p-value = 0.2409
    ## Model df: 4. Total lags used: 24
  • 82. Seasonally adjusted electrical equipment. fit %>% forecast %>% autoplot [Figure: forecasts from ARIMA(3,1,1).]
  • 84. Point forecasts. 1. Rearrange the ARIMA equation so that yt is on the LHS. 2. Rewrite the equation, replacing t by T + h. 3. On the RHS, replace future observations by their forecasts, future errors by zero, and past errors by the corresponding residuals. Start with h = 1; repeat for h = 2, 3, . . .
  • 85. Point forecasts. ARIMA(3,1,1) forecasts, step 1: (1 − φ1B − φ2B² − φ3B³)(1 − B)yt = (1 + θ1B)εt, i.e. [1 − (1 + φ1)B + (φ1 − φ2)B² + (φ2 − φ3)B³ + φ3B⁴]yt = (1 + θ1B)εt, so yt − (1 + φ1)yt−1 + (φ1 − φ2)yt−2 + (φ2 − φ3)yt−3 + φ3yt−4 = εt + θ1εt−1, which rearranges to yt = (1 + φ1)yt−1 − (φ1 − φ2)yt−2 − (φ2 − φ3)yt−3 − φ3yt−4 + εt + θ1εt−1.
  • 86. Point forecasts (h = 1). Starting from yt = (1 + φ1)yt−1 − (φ1 − φ2)yt−2 − (φ2 − φ3)yt−3 − φ3yt−4 + εt + θ1εt−1. Step 2: yT+1 = (1 + φ1)yT − (φ1 − φ2)yT−1 − (φ2 − φ3)yT−2 − φ3yT−3 + εT+1 + θ1εT. Step 3: ŷT+1|T = (1 + φ1)yT − (φ1 − φ2)yT−1 − (φ2 − φ3)yT−2 − φ3yT−3 + θ1eT.
  • 87. Point forecasts (h = 2). Step 2: yT+2 = (1 + φ1)yT+1 − (φ1 − φ2)yT − (φ2 − φ3)yT−1 − φ3yT−2 + εT+2 + θ1εT+1. Step 3: ŷT+2|T = (1 + φ1)ŷT+1|T − (φ1 − φ2)yT − (φ2 − φ3)yT−1 − φ3yT−2.
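The h-step recursion above is mechanical: future ε's are replaced by zero, so the θ1eT term only appears at h = 1, and earlier forecasts feed back in as h grows. A pure-Python sketch with made-up coefficients (not the fitted values from any model in these slides):

```python
def arima311_point_forecasts(y, last_resid, phi, theta1, h):
    """Iterate the ARIMA(3,1,1) forecast equations above.
    y: observed series (at least 4 values); last_resid: e_T."""
    p1, p2, p3 = phi
    hist = list(y)
    out = []
    for step in range(h):
        # theta1 * e_T enters only at h = 1; future errors are set to 0.
        ma_term = theta1 * last_resid if step == 0 else 0.0
        yhat = ((1 + p1) * hist[-1] - (p1 - p2) * hist[-2]
                - (p2 - p3) * hist[-3] - p3 * hist[-4] + ma_term)
        out.append(yhat)
        hist.append(yhat)     # forecasts feed back in for larger h
    return out
```

With φ = (0.5, 0.2, 0.1), θ1 = 0.3, last four observations (1, 2, 3, 4) and eT = 0.5, the h = 1 step gives 1.5·4 − 0.3·3 − 0.1·2 − 0.1·1 + 0.3·0.5 = 4.95, and the h = 2 step reuses that value in place of yT+1.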
  • 88. European quarterly retail trade. euretail %>% diff(lag=4) %>% ggtsdisplay() [Figure: the seasonally differenced series with its ACF and PACF.]
  • 89. European quarterly retail trade. euretail %>% diff(lag=4) %>% diff() %>% ggtsdisplay() [Figure: the doubly differenced series with its ACF and PACF.]
  • 90. European quarterly retail trade. d = 1 and D = 1 seem necessary. The significant spike at lag 1 in the ACF suggests a non-seasonal MA(1) component; the significant spike at lag 4 in the ACF suggests a seasonal MA(1) component. Initial candidate model: ARIMA(0,1,1)(0,1,1)4.
  • 91. European quarterly retail trade. fit <- Arima(euretail, order=c(0,1,1), seasonal=c(0,1,1)); checkresiduals(fit) [Figure: residual diagnostics for ARIMA(0,1,1)(0,1,1)[4].]
  • 92. European quarterly retail trade.
    ## Ljung-Box test
    ## data: Residuals from ARIMA(0,1,1)(0,1,1)[4]
    ## Q* = 10.654, df = 6, p-value = 0.09968
    ## Model df: 2. Total lags used: 8
  • 93. European quarterly retail trade. The ACF and PACF of the residuals show significant spikes at lag 2, and maybe lag 3. The AICc of the ARIMA(0,1,2)(0,1,1)4 model is 74.27; the AICc of the ARIMA(0,1,3)(0,1,1)4 model is 68.39. fit <- Arima(euretail, order=c(0,1,3), seasonal=c(0,1,1)); checkresiduals(fit)
  • 94. European quarterly retail trade.
    ## Series: euretail
    ## ARIMA(0,1,3)(0,1,1)[4]
    ## Coefficients:
    ##          ma1     ma2     ma3     sma1
    ##       0.2630  0.3694  0.4200  -0.6636
    ## s.e.  0.1237  0.1255  0.1294   0.1545
    ## sigma^2 = 0.156: log likelihood = -28.63
    ## AIC=67.26 AICc=68.39 BIC=77.65
  • 95. European quarterly retail trade. checkresiduals(fit) [Figure: residual diagnostics for ARIMA(0,1,3)(0,1,1)[4].]
  • 96. European quarterly retail trade.
    ## Ljung-Box test
    ## data: Residuals from ARIMA(0,1,3)(0,1,1)[4]
    ## Q* = 0.51128, df = 4, p-value = 0.9724
    ## Model df: 4. Total lags used: 8
  • 97. European quarterly retail trade. autoplot(forecast(fit, h=12)) [Figure: forecasts from ARIMA(0,1,3)(0,1,1)[4].]
  • 98. European quarterly retail trade.
    auto.arima(euretail)
    ## Series: euretail
    ## ARIMA(0,1,3)(0,1,1)[4]
    ## Coefficients:
    ##          ma1     ma2     ma3     sma1
    ##       0.2630  0.3694  0.4200  -0.6636
    ## s.e.  0.1237  0.1255  0.1294   0.1545
    ## sigma^2 = 0.156: log likelihood = -28.63
    ## AIC=67.26 AICc=68.39 BIC=77.65
  • 100. Corticosteroid drug sales. [Figure: H02 sales (million scripts) and log H02 sales.]
  • 101. Corticosteroid drug sales. ggtsdisplay(diff(lh02,12), xlab="Year", main="Seasonally differenced H02 scripts") [Figure: the seasonally differenced series with its ACF and PACF.]
  • 102. Corticosteroid drug sales. Choose D = 1 and d = 0. Spikes in the PACF at lags 12 and 24 suggest a seasonal AR(2) term, and spikes in the PACF also suggest a possible non-seasonal AR(3) term. Initial candidate model: ARIMA(3,0,0)(2,1,0)12.
  • 103. Corticosteroid drug sales.
    Model                      AICc
    ARIMA(3,0,1)(0,1,2)12   -485.48
    ARIMA(3,0,1)(1,1,1)12   -484.25
    ARIMA(3,0,1)(0,1,1)12   -483.67
    ARIMA(3,0,1)(2,1,0)12   -476.31
    ARIMA(3,0,0)(2,1,0)12   -475.12
    ARIMA(3,0,2)(2,1,0)12   -474.88
    ARIMA(3,0,1)(1,1,0)12   -463.40
  • 104. Corticosteroid drug sales.
    (fit <- Arima(h02, order=c(3,0,1), seasonal=c(0,1,2), lambda=0))
    ## Series: h02
    ## ARIMA(3,0,1)(0,1,2)[12]
    ## Box Cox transformation: lambda= 0
    ## Coefficients:
    ##          ar1     ar2     ar3     ma1     sma1     sma2
    ##      -0.1603  0.5481  0.5678  0.3827  -0.5222  -0.1768
    ## s.e.  0.1636  0.0878  0.0942  0.1895   0.0861   0.0872
    ## sigma^2 = 0.004278: log likelihood = 250.04
    ## AIC=-486.08 AICc=-485.48 BIC=-463.28
• 105. Corticosteroid drug sales
checkresiduals(fit, lag=36)
[Figure: residuals from ARIMA(3,0,1)(0,1,2)[12], with ACF plot and residual histogram]
96
• 106. Corticosteroid drug sales
##
## Ljung-Box test
##
## data: Residuals from ARIMA(3,0,1)(0,1,2)[12]
## Q* = 50.712, df = 30, p-value = 0.01045
##
## Model df: 6. Total lags used: 36
97
• 107. Corticosteroid drug sales
(fit <- auto.arima(h02, lambda=0))
## Series: h02
## ARIMA(2,1,1)(0,1,2)[12]
## Box Cox transformation: lambda= 0
##
## Coefficients:
##           ar1      ar2     ma1     sma1     sma2
##       -1.1358  -0.5753  0.3683  -0.5318  -0.1817
## s.e.   0.1608   0.0965  0.1884   0.0838   0.0881
##
## sigma^2 = 0.004278: log likelihood = 248.25
## AIC=-484.51 AICc=-484.05 BIC=-465
98
• 108. Corticosteroid drug sales
checkresiduals(fit, lag=36)
[Figure: residuals from ARIMA(2,1,1)(0,1,2)[12], with ACF plot and residual histogram]
99
• 109. Corticosteroid drug sales
##
## Ljung-Box test
##
## data: Residuals from ARIMA(2,1,1)(0,1,2)[12]
## Q* = 51.096, df = 31, p-value = 0.01298
##
## Model df: 5. Total lags used: 36
100
• 110. Corticosteroid drug sales
(fit <- auto.arima(h02, lambda=0, max.order=9,
  stepwise=FALSE, approximation=FALSE))
## Series: h02
## ARIMA(4,1,1)(2,1,2)[12]
## Box Cox transformation: lambda= 0
##
## Coefficients:
##           ar1     ar2     ar3      ar4      ma1
##       -0.0425  0.2098  0.2017  -0.2273  -0.7424
## s.e.   0.2167  0.1813  0.1144   0.0810   0.2074
##         sar1     sar2     sma1    sma2
##       0.6213  -0.3832  -1.2019  0.4959
## s.e.  0.2421   0.1185   0.2491  0.2135
##
## sigma^2 = 0.004049: log likelihood = 254.31
## AIC=-488.63 AICc=-487.4 BIC=-456.1
101
• 111. Corticosteroid drug sales
checkresiduals(fit, lag=36)
[Figure: residuals from ARIMA(4,1,1)(2,1,2)[12], with ACF plot and residual histogram]
102
• 112. Corticosteroid drug sales
##
## Ljung-Box test
##
## data: Residuals from ARIMA(4,1,1)(2,1,2)[12]
## Q* = 36.456, df = 27, p-value = 0.1057
##
## Model df: 9. Total lags used: 36
103
• 113. Corticosteroid drug sales
Training data: July 1991 to June 2006
Test data: July 2006 to June 2008

getrmse <- function(x, h, ...) {
  train.end <- time(x)[length(x)-h]
  test.start <- time(x)[length(x)-h+1]
  train <- window(x, end=train.end)
  test <- window(x, start=test.start)
  fit <- Arima(train, ...)
  fc <- forecast(fit, h=h)
  return(accuracy(fc, test)[2, "RMSE"])
}
getrmse(h02, h=24, order=c(3,0,0), seasonal=c(2,1,0), lambda=0)
getrmse(h02, h=24, order=c(3,0,1), seasonal=c(2,1,0), lambda=0)
getrmse(h02, h=24, order=c(3,0,2), seasonal=c(2,1,0), lambda=0)
getrmse(h02, h=24, order=c(3,0,1), seasonal=c(1,1,0), lambda=0)
104
• 114. Corticosteroid drug sales

Model                     RMSE
ARIMA(4,1,1)(2,1,2)[12]  0.0615
ARIMA(3,0,1)(0,1,2)[12]  0.0622
ARIMA(3,0,1)(1,1,1)[12]  0.0630
ARIMA(2,1,4)(0,1,1)[12]  0.0632
ARIMA(2,1,3)(0,1,1)[12]  0.0634
ARIMA(3,0,3)(0,1,1)[12]  0.0638
ARIMA(2,1,5)(0,1,1)[12]  0.0640
ARIMA(3,0,1)(0,1,1)[12]  0.0644
ARIMA(3,0,2)(0,1,1)[12]  0.0644
ARIMA(3,0,2)(2,1,0)[12]  0.0645
ARIMA(3,0,1)(2,1,0)[12]  0.0646
ARIMA(3,0,0)(2,1,0)[12]  0.0661
ARIMA(3,0,1)(1,1,0)[12]  0.0679
105
• 115. Corticosteroid drug sales
• Models with the lowest AICc values tend to give slightly better forecast results than the other models.
• AICc comparisons are only valid between models with the same orders of differencing, but RMSE comparisons on a test set can involve any models.
• Use the best model available, even if it does not pass all residual tests.
106
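The AICc-versus-RMSE comparison above can be reproduced in a few lines. A minimal sketch, assuming fpp2 is loaded; it holds out the last 24 months of h02 as a test set and reports both criteria for two illustrative candidates:

```r
library(fpp2)

h <- 24  # hold out the last 24 months as a test set
train <- window(h02, end = time(h02)[length(h02) - h])
test  <- window(h02, start = time(h02)[length(h02) - h + 1])

# Two illustrative non-seasonal orders, same seasonal part and differencing.
for (ord in list(c(3,0,1), c(3,0,0))) {
  fit <- Arima(train, order = ord, seasonal = c(2,1,0), lambda = 0)
  fc  <- forecast(fit, h = h)
  # AICc is computed on the training fit; RMSE on the held-out test set.
  cat(sprintf("ARIMA(%d,%d,%d)(2,1,0)12: AICc = %.2f, test RMSE = %.4f\n",
              ord[1], ord[2], ord[3], fit$aicc,
              accuracy(fc, test)[2, "RMSE"]))
}
```

This illustrates the point on the slide: AICc ranks models within a fixed differencing scheme, while test-set RMSE can compare models across any specifications.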
• 116. Corticosteroid drug sales
fit <- Arima(h02, order=c(3,0,1),
  seasonal=c(0,1,2), lambda=0)
autoplot(forecast(fit)) +
  ylab("H02 sales (million scripts)") +
  xlab("Year")
[Figure: forecasts from ARIMA(3,0,1)(0,1,2)[12] of H02 sales (million scripts)]
107