SalesyearQ1Q2Q3Q43.569199510004.1521995.25000000000001003.9981995.50.docx

Sales    year     Q1  Q2  Q3  Q4
3.569    1995     1   0   0   0
4.152    1995.25  0   1   0   0
3.998    1995.5   0   0   1   0
3.752    1995.75  0   0   0   1
4.362    1996     1   0   0   0
5.294    1996.25  0   1   0   0
4.923    1996.5   0   0   1   0
4.959    1996.75  0   0   0   1
5.657    1997     1   0   0   0
6.55     1997.25  0   1   0   0
6.217    1997.5   0   0   1   0
5.732    1997.75  0   0   0   1
7.123    1998     1   0   0   0
8.139    1998.25  0   1   0   0
7.699    1998.5   0   0   1   0
7.258    1998.75  0   0   0   1
8.952    1999     1   0   0   0
10.431   1999.25  0   1   0   0
9.877    1999.5   0   0   1   0
9.2      1999.75  0   0   0   1
11.112   2000     1   0   0   0
12.618   2000.25  0   1   0   0
11.545   2000.5   0   0   1   0
10.463   2000.75  0   0   0   1
12.2     2001     1   0   0   0
14.576   2001.25  0   1   0   0
13.289   2001.5   0   0   1   0
13.488   2001.75  0   0   0   1
14.282   2002     1   0   0   0
16.277   2002.25  0   1   0   0
14.475   2002.5   0   0   1   0
13.213   2002.75  0   0   0   1
15.104   2003     1   0   0   0
17.989   2003.25  0   1   0   0
16.598   2003.5   0   0   1   0
15.125   2003.75  0   0   0   1
17.55    2004     1   0   0   0
19.96    2004.25  0   1   0   0
18.772   2004.5   0   0   1   0
16.812   2004.75  0   0   0   1
18.973   2005     0   0   0   0
22.305   2005.25  0   0   0   0
20.744   2005.5   0   0   0   0
19.489   2005.75  0   0   0   0
21.461   2006     0   0   0   0
26.026   2006.25  0   0   0   0
23.085   2006.5   0   0   0   0
20.265   2006.75  0   0   0   0
18.585   2007     0   0   0   0
22.184   2007.25  0   0   0   0
18.861   2007.5   0   0   0   0
17.659   2007.75  0   0   0   0
17.908   2008     0   0   0   0
20.990   2008.25  0   0   0   0
17.784   2008.5   0   0   0   0
14.607   2008.75  0   0   0   0
16.175   2009     0   0   0   0
19.071   2009.25  0   0   0   0
16.361   2009.5   0   0   0   0
14.569   2009.75  0   0   0   0
16.863   2010     0   0   0   0
19.410   2010.25  0   0   0   0
16.598   2010.5   0   0   0   0
15.126   2010.75  0   0   0   0
16.823   2011     0   0   0   0
20.232   2011.25  0   0   0   0
17.326   2011.5   0   0   0   0
17.808   2011.75  0   0   0   0
17.808   2012     0   0   0   0
20.570   2012.25  0   0   0   0
18.130   2012.5   0   0   0   0
19.124   2012.75  0   0   0   0

price         time  month  day  year
149.3999939   1     01     02   13
146.5         2     01     03   13
147.3499908   3     01     04   13
150.3999939   4     01     07   13
148.1499939   5     01     08   13
147.8999939   6     01     09   13
149.6499939   7     01     10   13
153.3499908   8     01     11   13
153.3000031   9     01     14   13
152.5         10    01     15   13
153           11    01     16   13
155.5         12    01     17   13
156.3000031   13    01     18   13
148.5999908   14    01     22   13
150.3999939   15    01     23   13
146.5500031   16    01     24   13
148.3000031   17    01     25   13
149           18    01     28   13
149.8000031   19    01     29   13
147.6999969   20    01     30   13
146.9499969   21    01     31   13
147.9499969   22    02     01   13
144.3499908   23    02     04   13
144.0500031   24    02     05   13
142.0999908   25    02     06   13
140.3000031   26    02     07   13
141.0500031   27    02     08   13
140.1499939   28    02     11   13
140.6499939   29    02     12   13
138.75        30    02     13   13
138           31    02     14   13
136.9499969   32    02     15   13
136.5         33    02     19   13
141           34    02     20   13
141.75        35    02     21   13
143.0999908   36    02     22   13
142.5999908   37    02     25   13
142.8999939   38    02     26   13
142.3999939   39    02     27   13
142.6499939   40    02     28   13
142.8999939   41    03     01   13
145.9499969   42    03     04   13
140.5         43    03     05   13
140.5999908   44    03     06   13
142.4499969   45    03     07   13
143.3499908   46    03     08   13
143.0500031   47    03     11   13
141.6499939   48    03     12   13
140.0500031   49    03     13   13
138.8999939   50    03     14   13
136.5         51    03     15   13
133.1499939   52    03     18   13
133.0999908   53    03     19   13
133.5999908   54    03     20   13
133.75        55    03     21   13
135.3000031   56    03     22   13
135.5999908   57    03     25   13
137.5999908   58    03     26   13
136.5999908   59    03     27   13
137.1499939   60    03     28   13
138.3999939   61    04     01   13
136.1499939   62    04     02   13
139.4499969   63    04     03   13
139.5         64    04     04   13
140.1499939   65    04     05   13
135.8999939   66    04     08   13
135.3999939   67    04     09   13
136.0500031   68    04     10   13
136.8000031   69    04     11   13
135.25        70    04     12   13
134.4499969   71    04     15   13
135.8499908   72    04     16   13
136.0999908   73    04     17   13

R Homework for Chapter 20
Coffee Prices 2013
Coffee is the world’s second largest legal export commodity
(after oil) and is the second largest source of
foreign exchange for developing nations. The United States
consumes about one-fifth of the world’s coffee. The
International Coffee Organization (ICO) computes a coffee
price index using Colombian, Brazilian, and a
mixture of other coffee data. Data are provided for the daily
ICO price index (in US dollars) from January
2013 to April 2013.
You can find the data file on Blackboard. Download it and put it
in the same folder as your R program file.
Then, use the following command to read the data:
coffee <- read.table('Coffee_prices_2013.txt', sep = '\t',
                     header = TRUE)
1. Make a time series plot (not a scatterplot) of price against
time. Use “Coffee Price” and “Time” as the labels for the
y-axis and x-axis, respectively. Which time series components
are evident from the plot?
2. Smooth the coffee price series using simple moving averages
(SMA) of length 2 and 8. Add the two
smoothed curves (one in red and one in green) to the plot made
in part 1 and compare them.
3. Apply single exponential smoothing (SES) to the coffee price
series with weights α = 0.8 and α = 0.2,
respectively. Add the two smoothed curves (one in orange and
one in purple) to the plot made in part 1 and
compare them.
4. Find the autocorrelation between the original time series and
each of the first 5 lags. Then, fit an autoregressive
model with the lags whose autocorrelations are greater than 0.8.
Write down the fitted model and add the
fitted curve in blue to the plot made in part 1. Which lag does
the model depend on most? Why?
5. Suppose we know that the next value in the series was, in
fact, 138.90. Compute the corresponding
absolute percentage error (APE) for each of the models you
have fitted before. Which model gives us the best
prediction?
Chapter 20: Time Series Analysis
Home Depot Sales
The Home Depot chain of home improvement stores grew in the
1980s and 1990s faster than any other retailer
in history. By 2005, it was the second largest retailer in the
United States. But its extraordinary record of
growth was slowed by the financial crisis of 2008. How do
different methods of modeling time series compare
for understanding these data?
Let’s first read the dataset into R
HD <- read.table('Home_Depot_2012_GE19.txt', sep = '\t',
                 header = TRUE)
and look at its structure
str(HD)
## 'data.frame': 72 obs. of 6 variables:
## $ Sales: num 3.57 4.15 4 3.75 4.36 ...
## $ year : num 1995 1995 1996 1996 1996 ...
## $ Q1 : int 1 0 0 0 1 0 0 0 1 0 ...
## $ Q2 : int 0 1 0 0 0 1 0 0 0 1 ...
## $ Q3 : int 0 0 1 0 0 0 1 0 0 0 ...
## $ Q4 : int 0 0 0 1 0 0 0 1 0 0 ...
As we can see, there are 72 observations for each of 6 variables.
Both Sales and year are numerical variables,
while Q1, Q2, Q3 and Q4 are the indicator variables for the
quarters.
Let’s plot the time series of Home Depot quarterly sales:
plot(HD$year, HD$Sales, xlab = 'Year', ylab = 'Sales',
main = 'Home Depot Quarterly Sales 1995 - 2012')
lines(HD$year, HD$Sales)
[Figure: Home Depot Quarterly Sales 1995 - 2012. x-axis: Year; y-axis: Sales.]
As we can see, there was a consistent increasing trend until the
end of 2006. After that, sales fell sharply.
They appear to have been recovering since. Throughout this period,
however, there are fluctuations around the
trend that appear to be seasonal because they repeat every four
quarters.
Smoothing Methods
First, let’s try a simple moving average (SMA). For data with a
strong seasonal component, it’s a good idea
to choose a moving average length based on the period. The
period of Home Depot sales is 4 quarters. We therefore
apply SMAs of length 2 and 4, respectively. We use the SMA()
function in the TTR package to do this:
library(TTR)
## Registered S3 method overwritten by 'xts':
## method from
## as.zoo.xts zoo
sma1 <- SMA(HD$Sales, n = 2)
sma2 <- SMA(HD$Sales, n = 4)
The objects sma1 and sma2 hold the smoothed series, which are
plotted below:
par(mfrow = c(1, 2))
plot(HD$year, HD$Sales, xlab = 'Year', ylab = 'Sales', main =
'SMA of Length 2')
lines(HD$year, sma1, col = 'red')
plot(HD$year, HD$Sales, xlab = 'Year', ylab = 'Sales', main =
'SMA of Length 4')
lines(HD$year, sma2, col = 'red')
[Figure: two panels, "SMA of Length 2" and "SMA of Length 4". x-axis: Year; y-axis: Sales.]
As expected, the longer SMA gives a smoother series.
The SMA of length 4 smooths out most of
the seasonal effects and provides a good description
of the trend, but it has trouble modeling the sudden
change in 2007. The SMA of length 2 is more wiggly and captures the
seasonal effects and the sudden change
better.
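To make the averaging concrete, here is a minimal base-R sketch of a trailing moving average. The helper name sma_manual is hypothetical (it is not part of TTR or of the analysis above), and the input is just the last eight Sales values from the data (year 2011 through 2012.75), so the final averages can be compared with the forecasts reported later.

```r
# Trailing simple moving average in base R (no TTR needed).
# `sma_manual` is a hypothetical helper, used here only for illustration.
sma_manual <- function(x, n) {
  # sides = 1 averages the current value and the n - 1 values before it;
  # the first n - 1 results are NA because the window is incomplete.
  as.numeric(stats::filter(x, rep(1 / n, n), sides = 1))
}

# Last eight values of the Sales column (year 2011 through 2012.75)
sales_tail <- c(16.823, 20.232, 17.326, 17.808, 17.808, 20.570, 18.130, 19.124)

sma2_tail <- sma_manual(sales_tail, 2)
sma4_tail <- sma_manual(sales_tail, 4)

print(round(tail(sma2_tail, 1), 3))  # (18.130 + 19.124) / 2 = 18.627
print(round(tail(sma4_tail, 1), 3))  # mean of the last four values = 18.908
```

The length-4 window always contains exactly one value from each quarter, which is why it cancels most of the seasonal swing.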
Second, let’s try an exponential moving average (EMA). We
need to choose a smoothing weight. We will use
α = 0.5, which gives the current data value the same weight as all
the past values combined. For comparison, we
also try α = 0.2, which makes the smoothed value depend heavily on
the data in the past. The initial value is
chosen as the first observed data point. We use the EMA() function
in the TTR package to do this:
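ema1 <- EMA(HD$Sales, ratio = 0.5, n = 1)
ema2 <- EMA(HD$Sales, ratio = 0.2, n = 1)
par(mfrow = c(1, 2))
plot(HD$year, HD$Sales, xlab = 'Year', ylab = 'Sales', main =
'EMA (alpha = 0.5)')
lines(HD$year, ema1, col = 'red')
plot(HD$year, HD$Sales, xlab = 'Year', ylab = 'Sales', main =
'EMA (alpha = 0.2)')
lines(HD$year, ema2, col = 'red')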
[Figure: two panels, "EMA (alpha = 0.5)" and "EMA (alpha = 0.2)". x-axis: Year; y-axis: Sales.]

As we can see, the EMA with a bigger α provides a less smooth
series. The EMA with α = 0.5 follows the
seasonal pattern more closely and models the sudden change
better. The EMA with α = 0.2 performs poorly,
because it is too smooth to capture the seasonal effects and
responds too slowly to the trend (underestimating
before 2007 and overestimating after).
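The effect of α is easiest to see in the recursion itself. The sketch below assumes the standard single exponential smoothing update, s_1 = y_1 and s_t = α y_t + (1 − α) s_{t−1}, which matches the description above (initial value equal to the first observation). The helper name ema_manual is hypothetical, and the input is the first four Sales values from the data.

```r
# Single exponential smoothing written out as a loop.
# `ema_manual` is a hypothetical helper; it is not the TTR implementation.
ema_manual <- function(x, alpha) {
  s <- numeric(length(x))
  s[1] <- x[1]                      # start from the first observation
  for (t in seq_along(x)[-1]) {
    # new value = alpha * current data + (1 - alpha) * previous smoothed value
    s[t] <- alpha * x[t] + (1 - alpha) * s[t - 1]
  }
  s
}

# First four values of the Sales column (1995 Q1 to Q4)
sales_head <- c(3.569, 4.152, 3.998, 3.752)

print(ema_manual(sales_head, alpha = 0.5))  # 3.569 3.8605 3.92925 3.840625
print(ema_manual(sales_head, alpha = 0.2))  # 3.569 3.6856 3.74808 3.748864
```

With α = 0.2 the smoothed value barely moves when Sales jumps from 3.569 to 4.152, which is the slow response described above.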
Autoregressive Method
Let’s fit an autoregressive model. Because we know that the
seasonal pattern is 4 quarters long, we use 4
lagged variables as the predictors. The autocorrelations
at those lags are given by
acf(HD$Sales, lag.max = 4, plot = FALSE)
##
## Autocorrelations of series 'HD$Sales', by lag
##
## 0 1 2 3 4
## 1.000 0.912 0.838 0.831 0.838
It’s interesting to see that the autocorrelation at lag 4 is 0.838,
which ties with lag 2 as the second largest among lags 1 to 4 and
exceeds lag 3. This is further evidence of the seasonal effects.
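For readers who want to see what acf() computes, here is a small sketch of the lag-k sample autocorrelation, r_k = Σ (y_t − ȳ)(y_{t−k} − ȳ) / Σ (y_t − ȳ)^2. The helper name acf_manual is hypothetical, and the input is only the first eight Sales values, so the numbers will differ from the full-series output above.

```r
# Lag-k sample autocorrelation computed by hand.
# `acf_manual` is a hypothetical helper, shown only to illustrate the formula.
acf_manual <- function(y, k) {
  n <- length(y)
  d <- y - mean(y)                       # deviations from the overall mean
  sum(d[(k + 1):n] * d[1:(n - k)]) / sum(d^2)
}

# First eight values of the Sales column (1995 Q1 to 1996 Q4)
y <- c(3.569, 4.152, 3.998, 3.752, 4.362, 5.294, 4.923, 4.959)

manual  <- sapply(0:4, function(k) acf_manual(y, k))
builtin <- as.numeric(acf(y, lag.max = 4, plot = FALSE)$acf)

print(round(manual, 3))
print(isTRUE(all.equal(manual, builtin)))  # the two computations should agree
```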
Now let’s use the ar() function to fit the autoregressive model:
ar1 <- ar(HD$Sales, aic = FALSE, order.max = 4, demean = FALSE,
          intercept = TRUE, method = 'ols')
ar1
##
## Call:
## ar(x = HD$Sales, aic = FALSE, order.max = 4, method = "ols", demean = FALSE, intercept = TRUE)
##
## Coefficients:
## 1 2 3 4
## 0.4695 -0.2551 0.2233 0.4871
##
## Intercept: 1.694 (0.4544)
##
## Order selected 4 sigma^2 estimated as 1.612
The fitted AR(4) model is
$\hat{y}_t = 1.694 + 0.4695\,y_{t-1} - 0.2551\,y_{t-2} + 0.2233\,y_{t-3} + 0.4871\,y_{t-4}$
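Unfortunately, the ar1 object does not store the fitted values. But
we can use the residuals e to compute them
using $\hat{y} = y - e$ (the first four values are NA, because those
observations do not have a full set of four lags):
fitted.ar1 <- HD$Sales - ar1$resid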
Let’s plot the fitted series and the residuals:
par(mfrow = c(1, 2))
plot(HD$year, HD$Sales, xlab = 'Year', ylab = 'Sales', main =
'AR(4)')
lines(HD$year, fitted.ar1, col = 'red')
plot(HD$year, ar1$resid, xlab = 'Year', ylab = 'Residuals')
abline(0, 0)
[Figure: two panels. Left: "AR(4)", Sales against Year. Right: Residuals against Year.]
As we can see, the AR(4) model follows the seasonal pattern
well, but seems to have great difficulty capturing
the sudden change in 2007. The residual plot shows some
disturbance around the time of the financial
crisis. Because sales at that time stopped resembling previous
behavior with respect to growth, the 4 lagged
variables are less successful predictors.
Multiple Regression-based Model
Let’s fit a multiple regression model to Sales using year and the
dummy variables. We use year to model the
trend and the dummy variables to model the seasonal component. Since
there is a linear trend until the end of 2006, we
will fit the model to the time series before 2007 only. Note that
we will use the last quarter Q4 as the baseline, so we
only put the first three dummy variables in the model.
HD.new <- HD[HD$year<2007,]
imod <- lm(Sales ~ year + Q1 + Q2 + Q3, data = HD.new)
summary(imod)
##
## Call:
## lm(formula = Sales ~ year + Q1 + Q2 + Q3, data = HD.new)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.9176 -0.6004 -0.1046 0.3649 4.7269
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3.515e+03 1.084e+02 -32.414 < 2e-16 ***
## year 1.762e+00 5.414e-02 32.554 < 2e-16 ***
## Q1 5.885e-01 4.967e-01 1.185 0.242652
## Q2 1.755e+00 4.921e-01 3.567 0.000901 ***
## Q3 4.554e-01 4.878e-01 0.934 0.355686
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.178 on 43 degrees of freedom
## Multiple R-squared: 0.966, Adjusted R-squared: 0.9629
## F-statistic: 305.7 on 4 and 43 DF, p-value: < 2.2e-16
The coefficient of year is positive and highly significant,
showing a positive linear trend in the time series of
Sales. The significant coefficient of Q2 indicates that sales
in the second quarter are on average significantly
higher than in the last quarter (baseline). The whole model looks
great, with a high R-squared (0.966) and a significant F-test.
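To illustrate how the baseline coding is read, the sketch below fits the same kind of model to a synthetic series with known effects. Everything in it is made up for illustration (the years, the trend of 2 per year, and the quarterly bumps); it is not the Home Depot data. Because Q4 is left out of the formula, each dummy coefficient is the difference from Q4.

```r
# Synthetic quarterly series (made-up numbers, not the Home Depot data)
year <- seq(2000, 2004.75, by = 0.25)
quarter <- round((year %% 1) * 4) + 1      # 1, 2, 3, 4, 1, 2, ...
Q1 <- as.integer(quarter == 1)
Q2 <- as.integer(quarter == 2)
Q3 <- as.integer(quarter == 3)

# Trend of 2 per year, plus bumps of 0.5 in Q1 and 1.5 in Q2 relative to Q4
sales <- 2 * (year - 2000) + 0.5 * Q1 + 1.5 * Q2

toy <- data.frame(sales, year, Q1, Q2, Q3)
fit <- lm(sales ~ year + Q1 + Q2 + Q3, data = toy)

# The year slope and the Q1/Q2 offsets should be recovered (up to rounding),
# and Q3 should be about 0 because Q3 was given no bump relative to Q4.
print(round(coef(fit), 3))
```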
Comparison

So far, we have applied a few different methods to the Home
Depot series. Suppose we want to forecast
the sales for the first quarter of 2013. (Note: the original series
ends at the fourth quarter of 2012.) For the
smoothing methods, we use the last smoothed value in the series as
the forecast:
(yhat.sma1 <- sma1[length(sma1)])
## [1] 18.627
(yhat.sma2 <- sma2[length(sma2)])
## [1] 18.908
(yhat.ema1 <- ema1[length(ema1)])
## [1] 18.89295
(yhat.ema2 <- ema2[length(ema2)])
## [1] 18.38313
For the autoregressive model, we can use the predict() function to
forecast:
(yhat.ar1 <- predict(ar1, n.ahead = 1, se.fit = FALSE))
## Time Series:
## Start = 73
## End = 73
## Frequency = 1
## [1] 19.31656
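As a rough check on what predict() is doing here, the one-step forecast can be reproduced by plugging the last four observed values into the fitted AR(4) equation. This sketch uses the coefficients as printed (rounded to four digits) and the last four Sales values from the data, so the result is only approximately equal to the value above; the small gap is presumably due to that rounding.

```r
# One-step-ahead AR(4) forecast computed by hand from the printed output.
# Coefficients are the rounded values shown by print(ar1), an approximation.
intercept <- 1.694
coefs <- c(0.4695, -0.2551, 0.2233, 0.4871)    # lags 1, 2, 3, 4

# Most recent value first: y_72, y_71, y_70, y_69 (2012.75 back to 2012)
last4 <- c(19.124, 18.130, 20.570, 17.808)

yhat_manual <- intercept + sum(coefs * last4)
print(round(yhat_manual, 3))  # 19.315, close to the 19.31656 from predict()
```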
For the multiple regression, we also need the predict() function to
forecast:
data.new <- data.frame(year = 2013, Q1 = 1, Q2 = 0, Q3 = 0)
yhat <- predict(imod, newdata = data.new)
We know that the actual sales in the first quarter of 2013 were
$19.124 billion. Which model provides the
best prediction? We can compute the absolute percentage error
(APE) for each prediction:
y.true <- 19.124
abs(y.true - yhat.sma1)/abs(y.true)*100
## [1] 2.598829
abs(y.true - yhat.sma2)/abs(y.true)*100
## [1] 1.129471
abs(y.true - yhat.ema1)/abs(y.true)*100
## [1] 1.208153
abs(y.true - yhat.ema2)/abs(y.true)*100
## [1] 3.87401
abs(y.true - yhat.ar1)/abs(y.true)*100
## Time Series:
## Start = 73
## End = 73
## Frequency = 1
## [1] 1.006905
abs(y.true - yhat)/abs(y.true)*100
## 1
## 76.65737
The AR(4) model seems to offer the best prediction, because it
yields the smallest APE (1.01%). The multiple
regression model gives the worst forecast (APE of 76.66%), because
the financial crisis changed the trend of the time series
after 2007, but the model, fitted to the pre-2007 data only, assumes
the trend remains the same.
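The comparison can also be done in one step. The sketch below wraps the APE formula, |y − ŷ| / |y| × 100, in a small function and applies it to the forecasts printed above. The function name ape and the vector names are hypothetical; the regression forecast is left out because its value is not printed in the output above.

```r
# Absolute percentage error as a reusable function.
# `ape` and the names below are hypothetical, chosen only for this sketch.
ape <- function(actual, forecast) {
  abs(actual - forecast) / abs(actual) * 100
}

y_true <- 19.124
forecasts <- c(sma_2 = 18.627, sma_4 = 18.908,
               ema_05 = 18.89295, ema_02 = 18.38313,
               ar_4 = 19.31656)

errors <- ape(y_true, forecasts)
print(round(errors, 2))
print(names(which.min(errors)))  # "ar_4": the AR(4) forecast has the smallest APE
```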