Forecasting Revenue With Stationary Time Series Models
Geoffery Mullings
April 26, 2016
Executive Summary
A trend-stationary model is used to forecast Starbucks' (SBUX) revenue from a snapshot of historical data. The report provides a prediction of 2015 Q4 revenue to demonstrate the forecast's validity, in addition to predictions for up to five quarters further ahead. The forecast for 2015 Q4 undershot SBUX's actual realized revenue by over $400 million, or 0.33 standard deviations. Still, the model correctly predicted that revenue would climb in the short run.
Introduction
Effective forecasts capture patterns in observations along multiple dimensions. Comprehensive forecasts account for trend direction, stationarity, seasonality, and autoregression of observations. Additionally, a forecast will most likely hold more validity if it captures as much information as possible from the relevant history of data. The model providing the forecasts of Starbucks' (SBUX) revenue considers all of those dimensions.
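One way to write down a model that covers these dimensions is sketched below. The notation is assumed here for illustration and does not come from the report: y_t is log revenue in quarter t, Q_{j,t} is a dummy equal to 1 in quarter j, p is the number of autoregressive lags, and \varepsilon_t is the error term. The report's final specification keeps only some of these terms.

```latex
y_t = \beta_0 + \sum_{k=1}^{3} \beta_k t^{k} + \sum_{j=2}^{4} \gamma_j Q_{j,t} + \sum_{i=1}^{p} \phi_i\, y_{t-i} + \varepsilon_t
```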
Methodology
Data ETL:
library(zoo)
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
##     as.Date, as.Date.numeric
library(dyn)
library(urca)
temp = tempfile()
download.file("http://faculty.baruch.cuny.edu/smanzan/eco9723/files/EPS_REV_FALL2015.csv",temp)
data = read.csv(temp)
unlink(temp)
mytick = "SBUX" # Starbucks
index = which(data[,"tic"] == mytick)
mydata = data[index,]
Data Munging and Further Transformation and Loading:
sum(is.na(mydata)) # Two missing values in the data.
## [1] 2
head(mydata) # Both missing values are in the first row of observations.
##      datadate  tic datacqtr epsfxq  revtq
## 7775 12/31/90 SBUX   1990Q4     NA     NA
## 7776  3/31/91 SBUX   1991Q1   0.15 27.042
## 7777  6/30/91 SBUX   1991Q2   0.05 14.538
## 7778  9/30/91 SBUX   1991Q3   0.04 16.070
## 7779 12/31/91 SBUX   1991Q4   0.15 22.483
## 7780  3/31/92 SBUX   1992Q1   0.05 20.277
NoNamydata = mydata[-1,] # Removing NA values from the data.
# Note: per the head() output above, row 2 of NoNamydata is 1991Q2;
# row 1 (1991Q1) holds the first observation.
startdate = NoNamydata[2, "datacqtr"]
rev = zooreg(NoNamydata[, "revtq"], start=as.yearqtr(startdate), frequency=4)
# rev is the quarterly revenue value for Starbucks.
# Creating the Trend and Dummy Variables
trend = zooreg(1:length(rev), start=as.yearqtr(startdate), frequency=4)
trendsq = trend^2
trendcub = trend^3
Q1 = zooreg(as.numeric(cycle(rev) ==1), start=start(rev), frequency=4)
Q2 = zooreg(as.numeric(cycle(rev) ==2), start=start(rev), frequency=4)
Q3 = zooreg(as.numeric(cycle(rev) ==3), start=start(rev), frequency=4)
Q4 = zooreg(as.numeric(cycle(rev) ==4), start=start(rev), frequency=4)
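The dummy construction above relies on cycle(), which returns each observation's position within the year. A minimal base-R sketch of the same idea on a toy quarterly series follows. It uses ts() rather than zooreg() so that it runs without extra packages, and the series toy and the names q1_dummy, q2_dummy, and toy_trend are made up for illustration.

```r
# Toy quarterly series: 8 observations starting in 1991 Q1 (illustrative values).
toy <- ts(c(27, 15, 16, 22, 20, 21, 24, 30), start = c(1991, 1), frequency = 4)

# cycle() gives the quarter (1 to 4) of every observation.
quarter <- cycle(toy)

# Indicator variables: 1 when the observation falls in that quarter, else 0.
q1_dummy <- as.numeric(quarter == 1)
q2_dummy <- as.numeric(quarter == 2)

# Linear time trend and its powers, mirroring the report's trend variables.
toy_trend <- seq_along(toy)
toy_trendsq <- toy_trend^2
toy_trendcub <- toy_trend^3

print(cbind(quarter = as.numeric(quarter), q1_dummy, q2_dummy, toy_trend))
```

Leaving one quarter's dummy out of a regression that has an intercept, as the report does with Q1, avoids perfect collinearity between the dummies and the constant.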
# Determining whether to use the log of SBUX's revenue.
par(mfrow=c(2,2))
plot(rev,xlab="", main="SBUX's Revenue Values \nOver Time")
plot(log(rev), xlab="", main="SBUX's Logarithmic \nRevenue Over Time")
plot(diff(rev), xlab="", main="Changes in SBUX's \nRevenue Over Time")
plot(diff(log(rev)), xlab="", main="Changes in SBUX's Logarithmic \nRevenue Over Time")
Logarithmic values are easier to linearize, and differences in logs closely approximate percentage changes as long as those changes are not large. Starbucks' revenue data seems to be an ideal candidate for logarithmic transformation.
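The approximation behind the log transformation can be checked with a few lines of base R. This is a sketch on made-up numbers, not SBUX data: the vectors revenue and big_jump are hypothetical.

```r
# Hypothetical revenue path with 5% growth in each period.
revenue <- c(100, 105, 110.25)

# Exact percentage change between consecutive periods.
pct_change <- diff(revenue) / head(revenue, -1)

# Log difference, the same transformation as diff(log(rev)).
log_change <- diff(log(revenue))

print(round(pct_change, 4))
print(round(log_change, 4))

# The gap between the two measures is small when growth is modest...
gap_small <- abs(pct_change - log_change)

# ...but widens for a large jump, such as a hypothetical doubling.
big_jump <- c(100, 200)
gap_large <- abs(diff(big_jump) / big_jump[1] - diff(log(big_jump)))
print(c(small = max(gap_small), large = gap_large))
```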
Data Analysis:
lrev = log(rev)
dlrev = diff(lrev)
# Estimating the statistical significance of the lags and trend variables to
# predicting logarithmic revenue values.
adffit = dyn$lm(dlrev ~ lag(rev, -1) + lag(dlrev, -1:-4) + trend + Q2 + Q3 + Q4)
summary(adffit)
##
## Call:
## lm(formula = dyn(dlrev ~ lag(rev, -1) + lag(dlrev, -1:-4) + trend +
##     Q2 + Q3 + Q4))
##
## Residuals:
##       Min        1Q    Median        3Q       Max
## -0.138063 -0.022155  0.004575  0.022489  0.111106
##
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)
## (Intercept)         3.094e-01  3.688e-02   8.388 1.09e-12 ***
## lag(rev, -1)        6.412e-05  1.472e-05   4.356 3.77e-05 ***
## lag(dlrev, -1:-4)1 -2.986e-01  9.993e-02  -2.988  0.00369 **
## lag(dlrev, -1:-4)2 -7.593e-02  1.065e-01  -0.713  0.47788
## lag(dlrev, -1:-4)3 -1.545e-01  1.010e-01  -1.530  0.12991
## lag(dlrev, -1:-4)4  1.370e-01  4.875e-02   2.810  0.00617 **
## trend              -4.903e-03  9.123e-04  -5.375 6.89e-07 ***
## Q2                 -1.504e-01  2.350e-02  -6.400 8.77e-09 ***
## Q3                 -7.619e-02  2.375e-02  -3.208  0.00190 **
## Q4                 -5.520e-02  2.670e-02  -2.067  0.04182 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.04118 on 83 degrees of freedom
##   (9 observations deleted due to missingness)
## Multiple R-squared:  0.8447, Adjusted R-squared:  0.8279
## F-statistic: 50.16 on 9 and 83 DF,  p-value: < 2.2e-16
# Using an Augmented Dickey-Fuller (ADF) Test to test the null hypothesis that
# the logarithmic revenue values are non-stationary with a trend.
# Fourth lag seems statistically significant to predicting revenue, so the ADF
# test will be run with that many lags.
An Augmented Dickey-Fuller (ADF) test will assess the null hypothesis that the logarithmic revenue values contain a unit root, that is, that they are non-stationary around their trend. Non-stationary series require their own set of statistical tests and critical values to accurately determine the significance of predictors, because the usual t-distributions no longer apply.
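For reference, the regression behind an ADF test with a constant, a linear trend, and four lagged differences can be written as follows. The symbols are notation assumed here for illustration: y_t is log revenue, \Delta is the first difference, and \varepsilon_t is the error term. The test statistic is the t-ratio on \gamma, compared against Dickey-Fuller critical values rather than the standard t-table.

```latex
\Delta y_t = \alpha + \beta t + \gamma\, y_{t-1} + \sum_{i=1}^{4} \delta_i\, \Delta y_{t-i} + \varepsilon_t,
\qquad H_0 : \gamma = 0 \quad \text{vs.} \quad H_1 : \gamma < 0
```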
adf = ur.df(lrev, type="trend", lags=4)
summary(adf)
##
## ###############################################
## # Augmented Dickey-Fuller Test Unit Root Test #
## ###############################################
##
## Test regression trend
##
##
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + tt + z.diff.lag)
##
## Residuals:
##       Min        1Q    Median        3Q       Max
## -0.159265 -0.026954 -0.000358  0.026902  0.100777
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept)  0.6104241  0.0599690  10.179  < 2e-16 ***
## z.lag.1     -0.0894058  0.0121152  -7.380 9.34e-11 ***
## tt           0.0020192  0.0006394   3.158  0.00219 **
## z.diff.lag1 -0.4934043  0.0629353  -7.840 1.11e-11 ***
## z.diff.lag2 -0.3609311  0.0735722  -4.906 4.36e-06 ***
## z.diff.lag3 -0.4404602  0.0681291  -6.465 5.88e-09 ***
## z.diff.lag4  0.3007123  0.0468534   6.418 7.25e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.04599 on 86 degrees of freedom
## Multiple R-squared:  0.7993, Adjusted R-squared:  0.7853
## F-statistic:  57.1 on 6 and 86 DF,  p-value: < 2.2e-16
##
##
## Value of test-statistic is: -7.3796 41.4224 55.4738
##
## Critical values for test statistics:
##       1pct  5pct 10pct
## tau3 -4.04 -3.45 -3.15
## phi2  6.50  4.88  4.16
## phi3  8.73  6.49  5.47
The test statistic of -7.38 is far below (more negative than) the 5% critical value of -3.45 for our ADF test. This evidence rejects the null hypothesis that the logarithmic revenue values are consistent with a non-stationary trend.
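The decision rule can be spelled out in a few lines of R. The numbers are copied from the ADF output above; the variable names tau_stat, critical, and reject_null are introduced here only for illustration.

```r
# Values copied from the ADF summary output (tau3 row).
tau_stat <- -7.3796
critical <- c("1pct" = -4.04, "5pct" = -3.45, "10pct" = -3.15)

# The ADF test is left-tailed: reject the unit-root null when the statistic
# is smaller (more negative) than the critical value.
reject_null <- tau_stat < critical
print(reject_null)
```

Because the test is one-sided, a large positive statistic would not count as evidence against the null; only sufficiently negative values do.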
# Testing the potential trend models to determine which is statistically the
# most appropriate to include in this model.
fitlin = dyn$lm(lrev ~ trend)
fitquad = dyn$lm(lrev ~ trend + trendsq)
fitcub = dyn$lm(lrev ~ trend + trendsq + trendcub)
par(mfrow=c(1,1))
plot(lrev, xlab="", col="gray50", main="Trend Lines Over SBUX's Logarithmic Revenue \nOver Time")
lines(fitted(fitlin),col=2,lwd=2,lty=2)
lines(fitted(fitquad),col=4,lwd=2,lty=2)
lines(fitted(fitcub),col=6,lwd=2,lty=2)
# The cubic trend seems to provide the best fit visually. Since the model is
# stationary, we can safely assess the significance of the fit using t-test
# statistics and p values.
round(summary(fitlin)$coefficients, 4)
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   3.8042     0.0951 39.9975        0
## trend         0.0546     0.0017 32.7463        0
round(summary(fitquad)$coefficients, 4)
##             Estimate Std. Error  t value Pr(>|t|)
## (Intercept)   2.7881     0.0440  63.4008        0
## trend         0.1156     0.0021  56.3742        0
## trendsq      -0.0006     0.0000 -30.6872        0
round(summary(fitcub)$coefficients, 4)
##             Estimate Std. Error  t value Pr(>|t|)
## (Intercept)   2.5468     0.0474  53.6981        0
## trend         0.1441     0.0041  34.9149        0
## trendsq      -0.0013     0.0001 -13.7926        0
## trendcub      0.0000     0.0000   7.5225        0
All three models seem statistically significant. The standard (linear) trend shows the most statistical promise, although the quadratic one seems more appropriate visually. The Akaike Information Criterion (AIC) will help determine how much information each model loses relative to the others, with a penalty for every additional parameter. The model with the lowest AIC would be the best fit.
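To make the criterion concrete, the sketch below computes AIC by hand as 2k minus twice the log-likelihood and compares it with R's built-in AIC(). The data are simulated and purely illustrative, and manual_aic is a helper defined here, not a function from the report.

```r
set.seed(123)

# Simulated data with a curved trend (illustrative, not SBUX revenue).
t_idx <- 1:80
y <- 2 + 0.1 * t_idx - 0.0008 * t_idx^2 + rnorm(80, sd = 0.1)

fit_lin <- lm(y ~ t_idx)
fit_quad <- lm(y ~ t_idx + I(t_idx^2))

# AIC = 2k - 2 * log-likelihood, where k counts the regression coefficients
# plus the error variance (stored as the "df" attribute of logLik).
manual_aic <- function(fit) {
  ll <- logLik(fit)
  2 * attr(ll, "df") - 2 * as.numeric(ll)
}

print(c(linear = manual_aic(fit_lin), quadratic = manual_aic(fit_quad)))
print(c(linear = AIC(fit_lin), quadratic = AIC(fit_quad)))
```

AIC values are only comparable between models fitted to the same observations, which is worth keeping in mind whenever lagged terms shorten the estimation sample.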
# Running an AIC test to determine which model fits the historical data the
# best.
AIC(fitlin)
## [1] 132.9257
AIC(fitquad)
## [1] -99.28701
AIC(fitcub)
## [1] -143.4695
# The cubic model seems best at capturing data points.
Checking the autocorrelation of residuals for the cubic trend.
acf(residuals(fitcub), lag=12, xlab="", main="Auto-Correlation of Residuals On A Cubic Trend")
# Few residuals are significantly autocorrelated, and the autocorrelations
# become a lot smaller as the lag increases.
The cubic trend is statistically the most appropriate one to use in our model, displaying the lowest AIC value and few correlated residuals.
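What "few correlated residuals" means on an ACF plot can be illustrated numerically. The sketch below uses simulated series, not the report's residuals, and counts how many of the first 12 autocorrelations fall outside the approximate 95% band that acf() draws. The names white, persistent, and bound are assumptions made for this example.

```r
set.seed(7)
n <- 200

# Two illustrative residual series: white noise and a persistent AR(1).
white <- rnorm(n)
persistent <- as.numeric(arima.sim(model = list(ar = 0.8), n = n))

# Approximate 95% bound, the dashed lines on an acf() plot.
bound <- 1.96 / sqrt(n)

# Sample autocorrelations at lags 1 to 12 (element 1 of $acf is lag 0).
acf_white <- acf(white, lag.max = 12, plot = FALSE)$acf[-1]
acf_persistent <- acf(persistent, lag.max = 12, plot = FALSE)$acf[-1]

# Count how many lags fall outside the bound for each series.
print(c(white = sum(abs(acf_white) > bound),
        persistent = sum(abs(acf_persistent) > bound)))
```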
# Testing if the logarithmic revenue values are seasonal.
fitcubs = dyn$lm(lrev ~ trendcub + Q2 + Q3 + Q4)
summary(fitcubs)
##
## Call:
## lm(formula = dyn(lrev ~ trendcub + Q2 + Q3 + Q4))
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -2.6606 -0.6308  0.2962  0.8754  1.1638
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept)  5.529e+00  2.323e-01  23.801   <2e-16 ***
## trendcub     4.617e-06  3.886e-07  11.881   <2e-16 ***
## Q2          -1.991e-01  2.979e-01  -0.669    0.505
## Q3          -1.921e-01  2.979e-01  -0.645    0.521
## Q4          -1.226e-01  3.009e-01  -0.408    0.685
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.042 on 93 degrees of freedom
## Multiple R-squared:  0.6037, Adjusted R-squared:  0.5867
## F-statistic: 35.42 on 4 and 93 DF,  p-value: < 2.2e-16
# None of the seasonal dummies are statistically significant.
Plotting out the residuals and a reference diagram.
par(mfrow=c(1,3))
plot(lrev, xlab="", main="SBUX's Logarithmic Revenue \nOver Time")
plot(residuals(fitcubs), xlab="", main="Residuals On A \nSeasonal Cubic Trend")
acf(residuals(fitcubs), lag=12, xlab="", main="Auto-Correlation of \nResiduals On A Seasonal \nCubic Trend")
# Residuals are highly and persistently auto-correlated when seasonal dummies
# are included.
The data is not seasonal.
# An auto-regressive component will be built into the model. The AR factor
# ideally will capture many of the residuals left over by the selected trend.
resid = residuals(fitcub)
fitresid = ar(resid, aic=TRUE, order.max=8, demean=FALSE, method="ols")
ord = 1:fitresid$order # Finding the optimal order to capture auto-regression.
fitcubar = dyn$lm(lrev ~ lag(lrev, -ord) + trendcub)
summary(fitcubar)
##
## Call:
## lm(formula = dyn(lrev ~ lag(lrev, -ord) + trendcub))
##
## Residuals:
##       Min        1Q    Median        3Q       Max
## -0.149116 -0.023591 -0.000112  0.021632  0.112097
##
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)
## (Intercept)       3.161e-01  7.908e-02   3.998 0.000142 ***
## lag(lrev, -ord)1  5.775e-01  1.032e-01   5.595 2.98e-07 ***
## lag(lrev, -ord)2  3.450e-01  1.210e-01   2.851 0.005535 **
## lag(lrev, -ord)3 -6.674e-02  1.131e-01  -0.590 0.556923
## lag(lrev, -ord)4  6.861e-01  6.823e-02  10.055 7.54e-16 ***
## lag(lrev, -ord)5 -5.617e-01  9.929e-02  -5.657 2.30e-07 ***
## lag(lrev, -ord)6 -2.721e-01  1.145e-01  -2.377 0.019859 *
## lag(lrev, -ord)7  9.642e-02  1.026e-01   0.940 0.349926
## lag(lrev, -ord)8  1.585e-01  4.840e-02   3.275 0.001563 **
## trendcub          3.524e-08  2.682e-08   1.314 0.192657
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.03869 on 80 degrees of freedom
##   (16 observations deleted due to missingness)
## Multiple R-squared:  0.9992, Adjusted R-squared:  0.9991
## F-statistic: 1.145e+04 on 9 and 80 DF,  p-value: < 2.2e-16
# As should be expected the lags are much more statistically significant than
# the cubic trend.
AIC(fitcubar)
## [1] -318.5753
par(mfrow=c(1,2))
plot(residuals(fitcubar), xlab="", main="Residuals On An \nAuto-Regressed Cubic \nTrend")
acf(residuals(fitcubar), lag=12, xlab="", main="Auto-Correlation of \nResiduals On An \nAuto-Regressed Cubic \nTrend")
# The AIC value has fallen from -143.47 to -318.58, more than doubling in
# magnitude, and the residuals for this model are clustered more tightly
# around 0 and far less predictable, as evidenced by their low auto-correlations.
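The order-selection step used for the auto-regressive component can be tried on simulated data. The sketch below follows the same ar() call pattern as the report but on an artificial AR(2) series; sim_resid, sim_fit, and sim_ord are names made up for this example, and the selected order may differ from the true order of 2 in any given sample.

```r
set.seed(2016)

# Simulated AR(2) series standing in for detrended residuals (illustrative).
sim_resid <- as.numeric(arima.sim(model = list(ar = c(0.6, 0.3)), n = 300))

# OLS fits for orders 0 to 8, with the order chosen by minimum AIC.
sim_fit <- ar(sim_resid, aic = TRUE, order.max = 8, demean = FALSE, method = "ols")

# $order is the selected lag length; $aic holds each order's AIC
# expressed as a difference from the best one.
print(sim_fit$order)
print(round(sim_fit$aic, 2))

# Lag indices that would then be passed on to the regression.
sim_ord <- 1:sim_fit$order
```

One caveat with this pattern: if the selected order were 0, 1:0 would evaluate to c(1, 0) rather than an empty vector, so seq_len() is the safer way to build the lag indices.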
The analysis demonstrates that our model parameters are statistically suitable for forecasting SBUX's revenue. The model explains over 99% of the variation in the historical logarithmic revenue data (R-squared of 0.9992).
Forecast
myf = myforecast(lrev, ord=ord, n.ahead = 6, trend = 3, seasonal="No")
myf
##  2015 Q4  2016 Q1  2016 Q2  2016 Q3  2016 Q4  2017 Q1
## 8.499695 8.612908 8.554675 8.609586 8.609824 8.706665
myf = myforecast(rev, ord=ord, n.ahead = 6, trend = 3, seasonal="No")
round(myf, 2)
## 2015 Q4 2016 Q1 2016 Q2 2016 Q3 2016 Q4 2017 Q1
## 4916.53 5589.93 5308.09 5644.72 5607.45 6297.16
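The definition of myforecast() does not appear in this report, so its internals are not known here. As a rough illustration of how multi-step forecasts are commonly produced from an autoregressive model, the sketch below iterates one-step-ahead predictions on a simulated AR(1) series, feeding each forecast back in as the next lag. It is a simplified stand-in under those assumptions, with no trend term and a single lag, not a reconstruction of the author's function.

```r
set.seed(42)

# Simulated stationary AR(1) series (illustrative only, not SBUX revenue).
n <- 120
y <- numeric(n)
y[1] <- 3
for (i in 2:n) {
  y[i] <- 0.3 + 0.9 * y[i - 1] + rnorm(1, sd = 0.05)
}

# Regress the series on its own first lag.
lagged <- data.frame(y = y[-1], ylag = y[-n])
ar_fit <- lm(y ~ ylag, data = lagged)

# Iterate one-step-ahead predictions, reusing each forecast as the next lag.
h <- 6
forecasts <- numeric(h)
last_value <- y[n]
for (k in seq_len(h)) {
  forecasts[k] <- predict(ar_fit, newdata = data.frame(ylag = last_value))
  last_value <- forecasts[k]
}

print(round(forecasts, 3))
```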
The forecast for 2015 Q4 undershot the actual realized revenue by over $400 million, or 0.33 standard deviations. The model predicts revenue will alternately climb and drop each quarter before taking a sharp upward turn in 2017 Q1. Notably, each predicted change in revenue is within one standard deviation of the prior value.
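The alternating pattern can be read directly off the level forecasts. The short sketch below recomputes the quarter-to-quarter changes from the forecast values printed above; fc and fc_change are names introduced here for illustration, and the figures are in the same units as the report's revenue series.

```r
# Level forecasts copied from the forecast output.
fc <- c("2015 Q4" = 4916.53, "2016 Q1" = 5589.93, "2016 Q2" = 5308.09,
        "2016 Q3" = 5644.72, "2016 Q4" = 5607.45, "2017 Q1" = 6297.16)

# Change from each forecast quarter to the next.
fc_change <- diff(fc)
print(round(fc_change, 2))

# Sign of each change: +1 for a rise, -1 for a fall.
print(sign(fc_change))
```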
Conclusion
Although not perfect, the trend-stationary model designed in this report does seem to be a reliable measure of where SBUX's revenue will trend in the short term. While the model's level of precision may be questionable, it is a useful tool for quantitatively informing decisions surrounding the firm's revenue outlook.