Time series data
• Daily IBM stock prices
• Monthly rainfall
• Annual Google profits
• Quarterly beer production
[Figure: time plot of quarterly beer production, 1960–2010]
Forecasting is estimating how the sequence of observations
will continue into the future.
[Figure: forecasts from ETS(M,A,M) for quarterly beer production, extending the series beyond 2010]
Defining your data as a time series
# Yearly data: one observation per year
y <- ts(c(123, 39, 78, 52, 110), start = 2012, frequency = 1)
y
## Time Series:
## Start = 2012
## End = 2016
## Frequency = 1
## [1] 123 39 78 52 110
Monthly data
# Monthly data
y <- ts(y, start = 2003, frequency = 12)
y
## Jan Feb Mar Apr May
## 2003 123 39 78 52 110
Note that quarterly data would require frequency = 4, weekly data frequency = 52, etc.
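For example, a quarterly series (with made-up values) would be defined as:
# Quarterly data: frequency = 4, starting in Q1 2015 (illustrative values only)
z <- ts(c(41, 55, 48, 60, 43, 58, 50, 63), start = c(2015, 1), frequency = 4)
z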
Time plots
autoplot(LAprices) + ggtitle('House Prices in LA')
[Figure: "House Prices in LA" — time plot of LAprices, 2008–2020]
Plotting naïve
beer2 <- window(ausbeer,start=1992,end=c(2007,4))
autoplot(beer2) +
autolayer(naive(beer2, h=11), PI=TRUE, series="Naïve") +
ggtitle("Forecasts for quarterly beer production") +
xlab("Year") + ylab("Megalitres") +
guides(colour=guide_legend(title="Forecast"))
[Figure: "Forecasts for quarterly beer production" — beer2 (megalitres), 1992–2007, with naïve forecasts and prediction intervals]
Seasonal naïve method
• Forecasts equal to last value from same season.
• Forecasts: ŷT+h|T = yT+h−m(k+1), where m = seasonal period and k is the
integer part of (h − 1)/m (i.e., the number of complete years in the forecast
period prior to time T + h).
• E.g., h = 1 and m = 12 (monthly data) → k = 0, so
ŷT+1|T = yT+1−12(0+1) = yT−11.
• E.g., if the last observation is a January, the forecast for February is the value
observed 11 months earlier, i.e. February of the previous year (see the R sketch below).
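A minimal sketch in R, using snaive() from the forecast package on the quarterly ausbeer series used elsewhere in these slides:
# Each quarterly forecast repeats the value from the same quarter one year earlier
# (beer2 as defined on the "Plotting naïve" slide)
snaive(beer2, h = 8)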
Drift method
• Forecasts equal to last value plus average change.
• Forecasts:
\hat{y}_{T+h|T} = y_T + \frac{h}{T-1}\sum_{t=2}^{T}(y_t - y_{t-1}) = y_T + \frac{h}{T-1}(y_T - y_1)
• Equivalent to extrapolating a line drawn between the first and last
observations (see the R sketch below).
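In R, the drift method corresponds to rwf() with drift = TRUE; a sketch reusing beer2 from the earlier slide:
# Random walk with drift: extrapolates the line joining the first and last observations
rwf(beer2, drift = TRUE, h = 11)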
The problem of overfitting
A model which fits the data well does not necessarily forecast well.
A perfect fit can always be obtained by using a model with enough
parameters.
Over-fitting a model to data is as bad as failing to identify the
systematic pattern in the data.
The problem of overfitting: an example
[Figure: scatter plot of the simulated (x, y) data used in the overfitting example]
Three models
# Model fitting
linearmodel <- lm(y ~ x)
# Prediction on the test data set
predict_linear <- predict(linearmodel, list(x = testx))
z <- x^2
# Model fitting
quadraticmodel <- lm(y ~ x + z)
# Prediction on the test data set
predict_quadratic <- predict(quadraticmodel, list(x = testx, z = testx^2))
# Model fitting: smoothing spline with 20 degrees of freedom
smoothspline <- smooth.spline(x, y, df = 20)
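For symmetry with the other two models, the spline's predictions on the test set could be obtained as sketched below (predict.smooth.spline returns a list with components x and y):
# Prediction on the test data set
predict_spline <- predict(smoothspline, x = testx)$y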
Plots
[Figure: "Example of Overfitting, Normal Fitting and Underfitting" — the three fitted curves over the (x, y) data]
When to use which partition?
• Fit the model only to the training period.
• Assess performance on the validation period.
• Deploy: refit the model to the combined training + validation data and forecast the future.
How to choose a validation period?
Depends on:
• Forecast horizon
• Seasonality
• Length of series
• Underlying conditions affecting series
Partitioning time series in R
[Figure: monthly Ridership series, 1991–2006, partitioned into Training, Validation and Future periods]
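A sketch of how such a partition can be created with window(); the series name ridership.ts, the validation length and the start date are assumptions chosen to match the plot above:
# Hold out the last 36 months for validation; the rest is the training period
nValid   <- 36
nTrain   <- length(ridership.ts) - nValid
train.ts <- window(ridership.ts, start = c(1991, 1), end = c(1991, nTrain))
valid.ts <- window(ridership.ts, start = c(1991, nTrain + 1),
                   end = c(1991, nTrain + nValid))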
Which model to choose?
yt+h = trend
tslm(train.ts ~ trend)
[Figure: linear trend model fitted to the training period, with forecasts over the validation period]
Which model to choose?
yt+h = trend + trend²
tslm(train.ts ~ trend + I(trend^2))
[Figure: quadratic trend model fitted to the training period, with forecasts over the validation period]
Which model to choose?
yt+h = trend + trend² + trend³
In R:
tslm(train.ts ~ trend + I(trend^2) + I(trend^3))
[Figure: cubic trend model fitted to the training period, with forecasts over the validation period]
Which model to choose?
yt+h = trend + trend² + season
In R:
tslm(train.ts ~ trend + I(trend^2) + season)
[Figure: quadratic trend plus seasonality model fitted to the training period, with forecasts over the validation period]
Choosing the model: compare errors
head(ridership.lm.pred$mean )
## Apr May Jun Jul
## 2001 2004.271 2045.419 2008.675 2128.560
## Aug Sep
## 2001 2187.911 1875.032
head(valid.ts)
## Apr May Jun Jul
## 2001 2023.792 2047.008 2072.913 2126.717
## Aug Sep
## 2001 2202.638 1707.693
MAE: Mean Absolute Error
Gives the average magnitude of the forecast errors, in the units of the series.
MAE = \frac{1}{v}\sum_{t=1}^{v}|\hat{y}_t - y_t|, where v is the number of observations in the validation period.
ridership.lm <- tslm(train.ts ~ trend)
ridership.lm.pred <- forecast(ridership.lm, h = stepsAhead, level = 0)
sum(abs(ridership.lm.pred$mean - valid.ts))
## [1] 7539.736
ridership.lm <- tslm(train.ts ~ trend + I(trend^2))
ridership.lm.pred <- forecast(ridership.lm, h = stepsAhead, level = 0)
sum(abs(ridership.lm.pred$mean - valid.ts))
## [1] 4814.579
ridership.lm <- tslm(train.ts ~ trend + I(trend^2) + season)
ridership.lm.pred <- forecast(ridership.lm, h = stepsAhead, level = 0)
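Note that the values above are sums of absolute errors; the MAE as defined here divides by the number of validation observations, e.g.:
# MAE: average the absolute errors over the validation period
mean(abs(ridership.lm.pred$mean - valid.ts))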
MAPE: Mean Absolute Percentage Error
Percentage deviation; useful for comparing accuracy across series.
MAPE = \frac{1}{v}\sum_{t=1}^{v}\left|\frac{\hat{y}_t - y_t}{y_t}\right| \times 100
ridership.lm <- tslm(train.ts ~ trend + I(trend^2))
ridership.lm.pred <- forecast(ridership.lm, h = stepsAhead, level = 0)
sum(abs((ridership.lm.pred$mean - valid.ts) / valid.ts))
## [1] 2.547263
ridership.lm <- tslm(train.ts ~ trend + I(trend^2) + season)
ridership.lm.pred <- forecast(ridership.lm, h = stepsAhead, level = 0)
sum(abs((ridership.lm.pred$mean - valid.ts) / valid.ts))
## [1] 2.411532
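Likewise, these values are sums of absolute relative errors; the MAPE as defined above averages them and multiplies by 100:
# MAPE, in percent
mean(abs((ridership.lm.pred$mean - valid.ts) / valid.ts)) * 100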
Mean Squared Error and Root Mean Squared Error
MSE = \frac{1}{v}\sum_{t=1}^{v}(\hat{y}_t - y_t)^2

RMSE = \sqrt{\frac{1}{v}\sum_{t=1}^{v}(\hat{y}_t - y_t)^2}
Mean Squared Error and Root Mean Squared Error
ridership.lm <- tslm(train.ts ~ trend + I(trend^2))
ridership.lm.pred <- forecast(ridership.lm, h = stepsAhead, level = 0)
sum(sqrt((ridership.lm.pred$mean - valid.ts)^2))
## [1] 4814.579
ridership.lm <- tslm(train.ts ~ trend + I(trend^2) + season)
ridership.lm.pred <- forecast(ridership.lm, h = stepsAhead, level = 0)
sum(sqrt((ridership.lm.pred$mean - valid.ts)^2))
## [1] 4742.101
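Since the square root of a squared scalar is just its absolute value, the sums above again equal the totals of the absolute errors. The RMSE as defined on the previous slide takes the square root after averaging the squared errors; the accuracy() function from the forecast package reports RMSE, MAE and MAPE directly:
# RMSE over the validation period
sqrt(mean((ridership.lm.pred$mean - valid.ts)^2))
# ME, RMSE, MAE, MPE, MAPE, ... for both the training and validation sets
accuracy(ridership.lm.pred, valid.ts)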
Time series cross-validation
[Diagram: "Traditional evaluation" — a single split of the series into training data and test data; "Time series cross-validation" — a sequence of progressively longer training sets, each evaluated on the observation(s) immediately following it]
• Forecast accuracy averaged over test sets.
• Also known as “evaluation on a rolling forecasting origin”
tsCV function
set.seed(0)
s1 <- (rnorm(100, mean=0.1))
s2 <- (rnorm(100, mean=-0.1))
s3 <- cumsum(c(s1, s2))
ecv <- tsCV(s3, rwf, drift=TRUE, h=1, initial =100)
plot(s3, type='l', ylim=c(-20,20))
lines(c(s3 + ecv), type='l', col=2)
pred <- (rwf(s3[1:100], drift=TRUE, h=100 ))$mean
lines(pred, type='l', col=3)
A good way to choose the best forecasting model is to find the model with
the smallest RMSE computed using time series cross-validation.
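For example, the cross-validated RMSE implied by the errors returned by tsCV() above can be computed as follows (NAs occur where no forecast was made):
# Cross-validated RMSE of the drift forecasts
sqrt(mean(ecv^2, na.rm = TRUE))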
Prediction intervals
• A forecast ŷT+h|T is (usually) the mean of the conditional
distribution yT+h | y1, . . . , yT .
• A prediction interval gives a region within which we expect
yT+h to lie with a specified probability.
• Assuming forecast errors are normally distributed, then a 95%
PI is
ŷT+h|T ± 1.96σ̂h
where σ̂h is the standard deviation of the h-step forecast distribution.
• When h = 1, σ̂h can be estimated from the residuals.
Easiest way to generate prediction intervals: bootstrap
We can simulate the next observation of a time series using
yT+1 = ŷT+1|T + eT+1,
where eT+1 is replaced by a value sampled from the collection of errors we have
seen in the past (i.e., the residuals). Adding the new simulated observation to
our data set, we can repeat the process to obtain
yT+2 = ŷT+2|T + eT+2.
Doing this repeatedly, we obtain many possible futures. We can then compute
prediction intervals by calculating percentiles across these simulated futures at
each forecast horizon.
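In the forecast package, intervals of this kind are available via the bootstrap argument of the simple forecasting functions; a sketch reusing beer2 from the earlier slides:
# Prediction intervals from resampled residuals instead of a normality assumption
naive(beer2, h = 11, bootstrap = TRUE)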
Prediction intervals
• Computed automatically using: naive(), snaive(), rwf(),
meanf(), etc.
• Use level argument to control coverage.
• Check residual assumptions before believing them.
• Usually too narrow due to unaccounted uncertainty.