Forecasting Revenue With Stationary Time Series Models

Geoffery Mullings

April 26, 2016

Executive Summary

A trend-stationary model is used to forecast Starbucks' (SBUX) revenue from a snapshot of historical data. The report provides a prediction of 2015 Q4 revenue to demonstrate the forecast's validity, in addition to predictions for up to five quarters further ahead. The forecast for 2015 Q4 undershot SBUX's actual realized revenue by over $400 million, or 0.33 standard deviations. Still, the model correctly predicted that revenue would climb in the short run.

Introduction

Effective forecasts capture patterns in observations along multiple dimensions. Comprehensive forecasts account for trend direction, stationarity, seasonality, and auto-regression of observations. Additionally, a forecast will most likely hold more validity if it captures as much information as possible from the relevant history of data. The model providing the forecasts of Starbucks' (SBUX) revenue considers all of those dimensions.

Methodology

Data ETL:

library(zoo)

##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
##     as.Date, as.Date.numeric

library(dyn)
library(urca)
temp = tempfile()
download.file("http://faculty.baruch.cuny.edu/smanzan/eco9723/files/EPS_REV_FALL2015.csv", temp)
data = read.csv(temp)
unlink(temp)
mytick = "SBUX" # Starbucks
index = which(data[,"tic"] == mytick)
mydata = data[index,]

Data Munging and Further Transformation and Loading:

sum(is.na(mydata)) # Two missing values in the data.

## [1] 2

head(mydata) # Both missing values are in the first row of observations.

##      datadate  tic datacqtr epsfxq  revtq
## 7775 12/31/90 SBUX   1990Q4     NA     NA
## 7776  3/31/91 SBUX   1991Q1   0.15 27.042
## 7777  6/30/91 SBUX   1991Q2   0.05 14.538
## 7778  9/30/91 SBUX   1991Q3   0.04 16.070
## 7779 12/31/91 SBUX   1991Q4   0.15 22.483
## 7780  3/31/92 SBUX   1992Q1   0.05 20.277

NoNamydata = mydata[-1,] # Removing NA values from the data.
startdate = NoNamydata[2, "datacqtr"]
rev = zooreg(NoNamydata[, "revtq"], start=as.yearqtr(startdate), frequency=4) # Rev is the quarterly revenue value for Starbucks.

# Creating the Trend and Dummy Variables
trend = zooreg(1:length(rev), start=as.yearqtr(startdate), frequency=4)
trendsq = trend^2
trendcub = trend^3
Q1 = zooreg(as.numeric(cycle(rev) == 1), start=start(rev), frequency=4)
Q2 = zooreg(as.numeric(cycle(rev) == 2), start=start(rev), frequency=4)
Q3 = zooreg(as.numeric(cycle(rev) == 3), start=start(rev), frequency=4)
Q4 = zooreg(as.numeric(cycle(rev) == 4), start=start(rev), frequency=4)

# Determining whether to use the log of SBUX's revenue.
par(mfrow=c(2,2))
plot(rev, xlab="", main="SBUX's Revenue Values \nOver Time")
plot(log(rev), xlab="", main="SBUX's Logarithmic \nRevenue Over Time")
plot(diff(rev), xlab="", main="Changes in SBUX's \nRevenue Over Time")
plot(diff(log(rev)), xlab="", main="Changes in SBUX's Logarithmic \nRevenue Over Time")

Logarithmic values are easier to linearize, and log differences closely approximate percentage changes as long as the changes from one period to the next are not large. This data from Starbucks seems to be an ideal candidate for logarithmic transformation.
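
As a rough illustration of why the log scale is convenient, the sketch below compares simple percentage changes with log differences, using the five quarterly revenue values printed in the head(mydata) output above. The variable names are illustrative and are not part of the original analysis.

```r
# Quarterly revenue (revtq) values copied from the head(mydata) output above.
rev_sample <- c(27.042, 14.538, 16.070, 22.483, 20.277)

# Simple percentage change between consecutive quarters.
pct_change <- diff(rev_sample) / head(rev_sample, -1)

# Log difference between consecutive quarters.
log_change <- diff(log(rev_sample))

# Side-by-side comparison: the two columns agree closely when changes are
# small and diverge as the changes get larger.
print(round(cbind(pct_change, log_change), 3))
```

The two measures are linked exactly by log_change = log(1 + pct_change), which is why they nearly coincide for small changes.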

Data Analysis:

lrev = log(rev)
dlrev = diff(lrev)
# Estimating the statistical significance of the lags and trend variables in
# predicting logarithmic revenue values.
adffit = dyn$lm(dlrev ~ lag(rev, -1) + lag(dlrev, -1:-4) + trend + Q2 +
    Q3 + Q4)
summary(adffit)

##
## Call:
## lm(formula = dyn(dlrev ~ lag(rev, -1) + lag(dlrev, -1:-4) + trend +
##     Q2 + Q3 + Q4))
##
## Residuals:
##       Min        1Q    Median        3Q       Max
## -0.138063 -0.022155  0.004575  0.022489  0.111106
##
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)
## (Intercept)         3.094e-01  3.688e-02   8.388 1.09e-12 ***
## lag(rev, -1)        6.412e-05  1.472e-05   4.356 3.77e-05 ***
## lag(dlrev, -1:-4)1 -2.986e-01  9.993e-02  -2.988  0.00369 **
## lag(dlrev, -1:-4)2 -7.593e-02  1.065e-01  -0.713  0.47788
## lag(dlrev, -1:-4)3 -1.545e-01  1.010e-01  -1.530  0.12991
## lag(dlrev, -1:-4)4  1.370e-01  4.875e-02   2.810  0.00617 **
## trend              -4.903e-03  9.123e-04  -5.375 6.89e-07 ***
## Q2                 -1.504e-01  2.350e-02  -6.400 8.77e-09 ***
## Q3                 -7.619e-02  2.375e-02  -3.208  0.00190 **
## Q4                 -5.520e-02  2.670e-02  -2.067  0.04182 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.04118 on 83 degrees of freedom
##   (9 observations deleted due to missingness)
## Multiple R-squared:  0.8447, Adjusted R-squared:  0.8279
## F-statistic: 50.16 on 9 and 83 DF,  p-value: < 2.2e-16

# Using an Augmented Dickey-Fuller (ADF) Test to test the null hypothesis that
# the logarithmic revenue values are non-stationary with a trend.
# The fourth lag seems statistically significant in predicting revenue, so the
# ADF test will be run with that many lags.

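An Augmented Dickey-Fuller Test will assess the null hypothesis that the logarithmic revenue values contain a unit root, that is, that they follow a non-stationary trend. Non-stationary series require their own set of statistical tests to accurately determine the significance of predictors, because standard t-statistics do not follow their usual distributions when a unit root is present.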
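
For reference, the regression behind an ADF test with a constant and a linear trend is commonly written as below. The symbols are notation introduced here for illustration: $y_t$ is logarithmic revenue, $p$ is the number of lagged differences, and $\varepsilon_t$ is the error term.

```latex
\Delta y_t = \alpha + \beta t + \gamma\, y_{t-1}
           + \sum_{i=1}^{p} \delta_i\, \Delta y_{t-i} + \varepsilon_t,
\qquad H_0 : \gamma = 0 \ \text{(unit root)}
\quad \text{vs.} \quad H_1 : \gamma < 0
```

The null is rejected when the t-statistic on $\gamma$ falls below (is more negative than) the relevant Dickey-Fuller critical value.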

adf = ur.df(lrev, type="trend", lags=4)
summary(adf)

##
## ###############################################
## # Augmented Dickey-Fuller Test Unit Root Test #
## ###############################################
##
## Test regression trend
##
##
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + tt + z.diff.lag)
##
## Residuals:
##       Min        1Q    Median        3Q       Max
## -0.159265 -0.026954 -0.000358  0.026902  0.100777
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept)  0.6104241  0.0599690  10.179  < 2e-16 ***
## z.lag.1     -0.0894058  0.0121152  -7.380 9.34e-11 ***
## tt           0.0020192  0.0006394   3.158  0.00219 **
## z.diff.lag1 -0.4934043  0.0629353  -7.840 1.11e-11 ***
## z.diff.lag2 -0.3609311  0.0735722  -4.906 4.36e-06 ***
## z.diff.lag3 -0.4404602  0.0681291  -6.465 5.88e-09 ***
## z.diff.lag4  0.3007123  0.0468534   6.418 7.25e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.04599 on 86 degrees of freedom
## Multiple R-squared:  0.7993, Adjusted R-squared:  0.7853
## F-statistic:  57.1 on 6 and 86 DF,  p-value: < 2.2e-16
##
##
## Value of test-statistic is: -7.3796 41.4224 55.4738
##
## Critical values for test statistics:
##       1pct  5pct 10pct
## tau3 -4.04 -3.45 -3.15
## phi2  6.50  4.88  4.16
## phi3  8.73  6.49  5.47

The test statistic of -7.38 is far more negative than the 5% critical value of -3.45 for our ADF test. This evidence rejects the null hypothesis that the logarithmic revenue values follow a non-stationary trend, so the series is treated as trend stationary.
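
The mechanics of that comparison can be sketched in base R on a simulated trend-stationary series (not the SBUX data). All names and simulation settings below are assumptions made for illustration; the -3.45 threshold is the 5% tau3 critical value from the output above.

```r
set.seed(123)

# Simulated trend-stationary series: linear trend plus stationary AR(1) noise.
n <- 120
noise <- as.numeric(arima.sim(model = list(ar = 0.5), n = n, sd = 0.05))
y <- 3 + 0.05 * (1:n) + noise

k <- 4                      # number of lagged differences, as in the report
dy <- diff(y)               # first differences, length n - 1
lagged <- embed(dy, k + 1)  # column 1: dy_t; columns 2..(k+1): dy_{t-1}..dy_{t-k}

z_diff <- lagged[, 1]              # dependent variable
z_diff_lag <- lagged[, -1]         # lagged differences
z_lag_1 <- y[(k + 1):(n - 1)]      # lagged level y_{t-1}
tt <- seq_along(z_diff)            # linear time trend

# ADF test regression with a constant and a trend.
fit <- lm(z_diff ~ z_lag_1 + tt + z_diff_lag)
tau <- coef(summary(fit))["z_lag_1", "t value"]

# Reject the unit-root null when tau is below (more negative than) the
# critical value.
crit_5pct <- -3.45
cat("ADF t-statistic:", round(tau, 2), "\n")
cat("Reject unit root at 5%:", tau < crit_5pct, "\n")
```

The t-statistic on the lagged level is compared with Dickey-Fuller critical values rather than the usual t-distribution, which is why the p-value printed by lm for that coefficient should not be used for this decision.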

# Testing the potential trend models to determine which is statistically the
# most appropriate to include in this model.
fitlin = dyn$lm(lrev ~ trend)
fitquad = dyn$lm(lrev ~ trend + trendsq)
fitcub = dyn$lm(lrev ~ trend + trendsq + trendcub)
par(mfrow=c(1,1))
plot(lrev, xlab="", col="gray50", main="Trend Lines Over SBUX's Logarithmic Revenue \nOver Time")
lines(fitted(fitlin),col=2,lwd=2,lty=2)
lines(fitted(fitquad),col=4,lwd=2,lty=2)
lines(fitted(fitcub),col=6,lwd=2,lty=2)

# The cubic trend seems to provide the best fit visually. Since the series is
# trend stationary, we can safely assess the significance of the fit using
# t-test statistics and p-values.

round(summary(fitlin)$coefficients, 4)

##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   3.8042     0.0951 39.9975        0
## trend         0.0546     0.0017 32.7463        0

round(summary(fitquad)$coefficients, 4)

##             Estimate Std. Error  t value Pr(>|t|)
## (Intercept)   2.7881     0.0440  63.4008        0
## trend         0.1156     0.0021  56.3742        0
## trendsq      -0.0006     0.0000 -30.6872        0

round(summary(fitcub)$coefficients, 4)

##             Estimate Std. Error  t value Pr(>|t|)
## (Intercept)   2.5468     0.0474  53.6981        0
## trend         0.1441     0.0041  34.9149        0
## trendsq      -0.0013     0.0001 -13.7926        0
## trendcub      0.0000     0.0000   7.5225        0

All three models seem statistically significant. The linear trend shows the most statistical promise, although the quadratic one seems more appropriate visually.
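
For clarity, the three candidate trend specifications fitted above can be written as follows, with $y_t$ denoting logarithmic revenue and $t$ the trend index. The coefficient symbols are notation introduced here, not taken from the original report.

```latex
\begin{aligned}
\text{linear:}    \quad & y_t = \beta_0 + \beta_1 t + \varepsilon_t \\
\text{quadratic:} \quad & y_t = \beta_0 + \beta_1 t + \beta_2 t^2 + \varepsilon_t \\
\text{cubic:}     \quad & y_t = \beta_0 + \beta_1 t + \beta_2 t^2 + \beta_3 t^3 + \varepsilon_t
\end{aligned}
```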

The Akaike Information Criterion (AIC) will help determine how much information each model loses relative to the others, while penalizing additional parameters. The model with the lowest AIC would be the best fit.
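
To make the criterion concrete, the sketch below computes AIC by hand for two trend models fitted to a simulated series and checks the result against R's built-in AIC(). The data and all names are illustrative assumptions, not the SBUX series.

```r
set.seed(42)

# Simulated series with a curved trend (illustrative, not the SBUX data).
t_idx <- 1:100
y <- 2.5 + 0.14 * t_idx - 0.0013 * t_idx^2 + rnorm(100, sd = 0.1)

fit_lin  <- lm(y ~ t_idx)
fit_quad <- lm(y ~ t_idx + I(t_idx^2))

# AIC = 2k - 2 * log-likelihood, where k counts every estimated parameter
# (regression coefficients plus the residual variance).
manual_aic <- function(fit) {
  ll <- logLik(fit)
  2 * attr(ll, "df") - 2 * as.numeric(ll)
}

aic_table <- data.frame(
  model  = c("linear", "quadratic"),
  manual = c(manual_aic(fit_lin), manual_aic(fit_quad)),
  stats  = c(AIC(fit_lin), AIC(fit_quad))
)
print(aic_table)
```

Because the simulated series is built with a quadratic term, the quadratic model receives the lower AIC here; AIC values are only comparable between models fitted to the same observations.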

# Comparing AIC values to determine which model fits the historical data the
# best.
AIC(fitlin)

## [1] 132.9257

AIC(fitquad)

## [1] -99.28701

AIC(fitcub)

## [1] -143.4695

# The cubic model seems best at capturing data points. Checking the
# autocorrelation of residuals for the cubed trend.
acf(residuals(fitcub), lag=12, xlab="", main="Auto-Correlation of Residuals On A Cubic Trend")

# Few residuals are significantly correlated and they become a lot less so at
# longer lags.

The cubic trend is statistically the most appropriate one to use in our model, displaying the lowest AIC value and few correlated residuals.

# Testing if the logarithmic revenue values are seasonal.
fitcubs = dyn$lm(lrev ~ trendcub + Q2 + Q3 + Q4)
summary(fitcubs)

##
## Call:
## lm(formula = dyn(lrev ~ trendcub + Q2 + Q3 + Q4))
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -2.6606 -0.6308  0.2962  0.8754  1.1638
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept)  5.529e+00  2.323e-01  23.801   <2e-16 ***
## trendcub     4.617e-06  3.886e-07  11.881   <2e-16 ***
## Q2          -1.991e-01  2.979e-01  -0.669    0.505
## Q3          -1.921e-01  2.979e-01  -0.645    0.521
## Q4          -1.226e-01  3.009e-01  -0.408    0.685
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.042 on 93 degrees of freedom
## Multiple R-squared:  0.6037, Adjusted R-squared:  0.5867
## F-statistic: 35.42 on 4 and 93 DF,  p-value: < 2.2e-16

# None of the seasonal dummies are statistically significant. Plotting out the
# residuals and a reference diagram.
par(mfrow=c(1,3))
plot(lrev, xlab="", main="SBUX's Logarithmic Revenue \nOver Time")
plot(residuals(fitcubs), xlab="", main="Residuals On A \nSeasonal Cubic Trend")
acf(residuals(fitcubs), lag=12, xlab="", main="Auto-Correlation of \nResiduals On A Seasonal \nCubic Trend")

# Residuals are highly and persistently auto-correlated when seasonal dummies
# are included. The data is not seasonal.
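
One way to see what quarterly dummies measure is to build them on a simulated quarterly series with a known seasonal effect. The sketch below uses base R's ts and cycle() rather than zoo, and every name and number in it is an assumption chosen for illustration.

```r
set.seed(7)

# Simulated quarterly series: linear trend plus a fixed bump every fourth quarter.
n <- 80
seasonal_bump <- rep(c(0, 0, 0, 0.3), length.out = n)
sim <- ts(0.05 * (1:n) + seasonal_bump + rnorm(n, sd = 0.05),
          start = c(1991, 1), frequency = 4)

# cycle() returns the quarter (1 to 4) of each observation.
quarter <- cycle(sim)
Q2 <- as.numeric(quarter == 2)
Q3 <- as.numeric(quarter == 3)
Q4 <- as.numeric(quarter == 4)
trend <- seq_along(sim)

# Q1 is the omitted baseline, so each dummy coefficient is the gap to Q1.
fit_seasonal <- lm(as.numeric(sim) ~ trend + Q2 + Q3 + Q4)
print(round(coef(summary(fit_seasonal)), 3))
```

In this simulated case the Q4 coefficient recovers the built-in bump of roughly 0.3 while Q2 and Q3 stay near zero, which is the pattern a seasonal series would be expected to show.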

# An auto-regressive component will be built into the model. The AR factor
# ideally will capture much of the residual variation left over by the
# selected trend.

resid = residuals(fitcub)
fitresid = ar(resid, aic=TRUE, order.max=8, demean=FALSE, method="ols")
ord = 1:fitresid$order # Finding the optimal order to capture auto-regression.
fitcubar = dyn$lm(lrev ~ lag(lrev, -ord) + trendcub)
summary(fitcubar)

##
## Call:
## lm(formula = dyn(lrev ~ lag(lrev, -ord) + trendcub))
##
## Residuals:
##       Min        1Q    Median        3Q       Max
## -0.149116 -0.023591 -0.000112  0.021632  0.112097
##
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)
## (Intercept)       3.161e-01  7.908e-02   3.998 0.000142 ***
## lag(lrev, -ord)1  5.775e-01  1.032e-01   5.595 2.98e-07 ***
## lag(lrev, -ord)2  3.450e-01  1.210e-01   2.851 0.005535 **
## lag(lrev, -ord)3 -6.674e-02  1.131e-01  -0.590 0.556923
## lag(lrev, -ord)4  6.861e-01  6.823e-02  10.055 7.54e-16 ***
## lag(lrev, -ord)5 -5.617e-01  9.929e-02  -5.657 2.30e-07 ***
## lag(lrev, -ord)6 -2.721e-01  1.145e-01  -2.377 0.019859 *
## lag(lrev, -ord)7  9.642e-02  1.026e-01   0.940 0.349926
## lag(lrev, -ord)8  1.585e-01  4.840e-02   3.275 0.001563 **
## trendcub          3.524e-08  2.682e-08   1.314 0.192657
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.03869 on 80 degrees of freedom
##   (16 observations deleted due to missingness)
## Multiple R-squared:  0.9992, Adjusted R-squared:  0.9991
## F-statistic: 1.145e+04 on 9 and 80 DF,  p-value: < 2.2e-16

# As should be expected, the lags are much more statistically significant than
# the cubic trend.
AIC(fitcubar)

## [1] -318.5753

par(mfrow=c(1,2))
plot(residuals(fitcubar), xlab="", main="Residuals On An \nAuto-Regressed Cubic \nTrend")
acf(residuals(fitcubar), lag=12, xlab="", main="Auto-Correlation of \nResiduals On An \nAuto-Regressed Cubic \nTrend")

# The AIC value has more than doubled in magnitude (from about -143 to about
# -319), and the residuals for this model are more tightly clustered around 0
# and far less predictable, as evidenced by their low auto-correlations.
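
The order-selection step can be tried in isolation on simulated data. The sketch below is a minimal example, assuming zero-mean AR(2) residuals as a stand-in for the trend residuals; the names and simulation settings are illustrative only.

```r
set.seed(99)

# Simulated zero-mean AR(2) residuals (illustrative stand-in for trend residuals).
sim_resid <- as.numeric(arima.sim(model = list(ar = c(0.6, 0.25)), n = 200))

# Fit AR models up to order 8 by OLS and let AIC choose the order.
fit_ar <- ar(sim_resid, aic = TRUE, order.max = 8, demean = FALSE, method = "ols")

# seq_len() stays empty when the chosen order is 0, whereas 1:0 would give c(1, 0).
ord_sim <- seq_len(fit_ar$order)

cat("Selected AR order:", fit_ar$order, "\n")
print(round(fit_ar$aic, 2))  # AIC differences relative to the best order
```

Using seq_len() instead of 1:order is a small defensive choice: it avoids building a nonsensical lag vector if AIC ever selects order zero.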

The analysis demonstrates that our model parameters are statistically suitable for forecasting SBUX's revenue. The model explains over 99% of the variation in the historical logarithmic revenue data (R-squared of 0.9992).
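
Written out, the regression estimated as fitcubar has the form below, with $y_t$ denoting logarithmic revenue. The Greek symbols are notation introduced here, and the eight lags reflect the order shown in the summary output.

```latex
y_t = \alpha + \sum_{i=1}^{8} \phi_i\, y_{t-i} + \beta\, t^{3} + \varepsilon_t
```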

Forecast

myf = myforecast(lrev, ord=ord, n.ahead = 6, trend = 3, seasonal="No")
myf

##  2015 Q4  2016 Q1  2016 Q2  2016 Q3  2016 Q4  2017 Q1
## 8.499695 8.612908 8.554675 8.609586 8.609824 8.706665

myf = myforecast(rev, ord=ord, n.ahead = 6, trend = 3, seasonal="No")
round(myf, 2)

## 2015 Q4 2016 Q1 2016 Q2 2016 Q3 2016 Q4 2017 Q1
## 4916.53 5589.93 5308.09 5644.72 5607.45 6297.16

The forecast for 2015 Q4 undershot the actual realized revenue by over $400 million, or 0.33 standard deviations. The model predicts revenue will alternately climb and drop each quarter before taking a sharp upward turn in 2017 Q1. Each predicted change in revenue is notably within one standard deviation of the prior value.
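
The alternating pattern can be read directly off the level forecasts. The short sketch below recomputes the quarter-over-quarter changes from the values printed by round(myf, 2); the variable names are illustrative.

```r
# Level forecasts (in the units of revtq) copied from the round(myf, 2) output above.
fc <- c("2015 Q4" = 4916.53, "2016 Q1" = 5589.93, "2016 Q2" = 5308.09,
        "2016 Q3" = 5644.72, "2016 Q4" = 5607.45, "2017 Q1" = 6297.16)

# Quarter-over-quarter change in levels and in percent.
change <- diff(fc)
pct <- 100 * change / head(fc, -1)

print(round(data.frame(change = change, pct = pct), 2))
```

The signs of the changes alternate from quarter to quarter, and the largest single increase is the final step into 2017 Q1.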

Conclusion

Although not perfect, the trend-stationary model designed in this report does seem to be a reliable measure of where SBUX's revenue will trend in the short term. While the model's level of precision may be questionable, it is a useful tool for quantitatively informing decisions surrounding the firm's revenue outlook.
