Australian Red Wine
Time Series Data Analysis
Julia L. Nickle
Predict 411 – Section 57
11/28/2012
Introduction
As discussed in Assignment 7, modeling time series data requires the data to be stationary, that
is, to have a constant mean and variance over time. If a series is non-stationary, an ARIMA model
cannot be properly fit, and the resulting forecasts are inaccurate and inefficient. The purpose of
this assignment is to explore the Australian red wine data set, which covers monthly sales over
more than 11 years, and to generate forecasts ten periods ahead.
Initial analysis reveals that the data set suffers from both non-stationarity and seasonality. For
the non-stationary series to become stationary, the data must be transformed so that its mean,
variance, and autocorrelation structure do not change over time (e-Handbook of Statistical
Methods, 2012). A series that has been transformed in this way shows no trend and has constant
variance over time. A seasonal series shows periodic fluctuations, that is, recurring peaks and
declines in the data. Both problems can often be corrected through differencing. Differencing
replaces each value with its change from an earlier period: the previous period for trend, or
the same period one year earlier for monthly seasonality (Nau, 2005). Each difference discards
observations at the start of the series, so time series analysis calls for large sample sizes.
Another viable remedy is a multiplicative model that accounts for seasonality.
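The two kinds of differencing described above can be written in backshift notation, where B shifts a series back one period (B y_t = y_{t-1}). This is standard Box-Jenkins notation rather than anything taken from the original output. The last line is the observation bookkeeping for this data set: differencing at lags 1 and 12 leaves the 129 residuals reported in the appendix.

```latex
\begin{aligned}
\nabla y_t &= (1 - B)\,y_t = y_t - y_{t-1} && \text{(first difference, removes trend)} \\
\nabla_{12}\, y_t &= (1 - B^{12})\,y_t = y_t - y_{t-12} && \text{(seasonal difference, monthly data)} \\
w_t &= \nabla \nabla_{12} \ln y_t = (1 - B)(1 - B^{12}) \ln y_t && \\
n_w &= n - 1 - 12 = 142 - 13 = 129 && \text{(observations left after both differences)}
\end{aligned}
```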
Once the non-stationarity and seasonality issues are remedied, a model can be identified,
checked, and used for forecasting. This analysis of the Australian red wine data set covers four
steps:
- its transformation to a stationary series
- the treatment of seasonality
- model identification using autocorrelation function (ACF) and partial autocorrelation function
(PACF) plots
- the fitting of several candidate models

Finally, it presents forecasts of red wine sales from the best-fitting model.
Data
The Australian red wine data set consists of 142 monthly observations of red wine sales, in
kiloliters, from January 1980 through October 1991. The series has a mean of 1,477.77 kiloliters
and a standard deviation of 531.03 kiloliters. A time series plot shows an upward trend and a
seasonal pattern in the data: sales appear to peak in July and drop in January. This
visualization indicates that the data is affected by both non-stationarity and seasonality.
Because constant variance over time cannot be assumed, the data must be transformed before the
Box-Jenkins method for time series analysis can be applied. The first step is to take the natural
log of sales in kiloliters. The time series plot of the logged series shows that the variability
steadies somewhat, but further transformation is necessary. Taking the first difference (lag 1)
removes the upward trend, as the time series plot after differencing shows. However, the
seasonality remains. An additional seasonal difference at lag 12 addresses it. The resulting
time series plot shows the Australian red wine series as stationary and free of seasonality. The
series is then ready for the identification stage of the Box-Jenkins method.
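The transformation sequence above can be sketched in Python as follows. This is a minimal illustration, not the SAS code behind the report. `difference` and `to_stationary` are hypothetical helper names. `synthetic_sales` is an invented series with a log-linear trend and a fixed 12-month pattern peaking in July; it stands in for the real data, which are not reproduced here.

```python
import math

def difference(series, lag):
    """Return y[t] - y[t - lag]; the first `lag` values have no partner and are dropped."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

def to_stationary(sales):
    """Natural log, then a lag-1 difference (trend) and a lag-12 difference (seasonality)."""
    logged = [math.log(v) for v in sales]  # requires strictly positive sales
    return difference(difference(logged, 1), 12)

# Hypothetical stand-in for the 142 monthly observations: 0.5% growth per month and a
# +/-30% seasonal swing that peaks in July (t = 0 is January, so t = 6 is July).
synthetic_sales = [
    1000 * 1.005 ** t * (1 + 0.3 * math.sin(2 * math.pi * (t - 3) / 12))
    for t in range(142)
]

w = to_stationary(synthetic_sales)
print(len(synthetic_sales), len(w))  # 142 129
```

For this idealized series, the doubly differenced values are zero up to rounding error. A linear trend on the log scale and a fixed 12-month pattern are exactly what the two differences remove. With the real sales, a noisy but stationary remainder is left to be modeled.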
Analysis
Identifying an appropriate model for the Australian red wine data set begins with an evaluation
of the series' autocorrelation and partial autocorrelation function plots. It also draws on the
autocorrelation check for white noise. The analysis starts with the original, non-stationary
data. In the PROC ARIMA output, the white noise test is highly significant at every lag through
36 (p < .0001). The null hypothesis states that none of the autocorrelations up to a given lag
differs significantly from zero, and it can be rejected. The series is therefore not white noise,
and its autocorrelation structure must be modeled, for example with an ARIMA model, before it can
be forecast accurately. However, while the series remains non-stationary and seasonal, such
forecasts would be unreliable. Results for the natural log of sales in kiloliters are similar: the
variability is less erratic, but the series is still too unstable for effective modeling. Its ACF
plot clearly shows the effect of non-stationarity, with autocorrelations that decay and rise
again only slowly.
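The chi-square values in the autocorrelation check come from a portmanteau statistic. The sketch below assumes the Ljung-Box form, Q = n(n+2) Σ r_k²/(n−k), summed over lags k = 1 to m. This assumption is consistent with the appendix output. The first six autocorrelations of the differenced log series (n = 141) give a value within rounding of the reported 31.48. The function name is illustrative.

```python
def ljung_box_q(autocorrs, n):
    """Ljung-Box Q = n(n + 2) * sum over k of r_k^2 / (n - k), for lags k = 1..len(autocorrs)."""
    return n * (n + 2) * sum(r * r / (n - k) for k, r in enumerate(autocorrs, start=1))

# Lags 1-6 from the appendix's white-noise check (log sales differenced at lag 1, n = 141).
r_lags_1_to_6 = [-0.240, -0.100, 0.065, 0.034, -0.023, -0.374]
q = ljung_box_q(r_lags_1_to_6, n=141)
print(round(q, 2))  # 31.5 (the output reports 31.48, computed from unrounded autocorrelations)
```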
Modeling the series after taking the first difference is a step in the right direction. As noted
earlier, the time series plot of the differenced series shows roughly constant variance, yet
significant seasonal peaks remain. Modeling the log of sales in kiloliters after both the first
difference and the difference at lag 12 works better. The autocorrelation check for white noise
still rejects the null hypothesis, with p-values below .0001 through lag 36, so structure remains
to be modeled. The ACF plot shows a sharp drop after lag 1, and the autocorrelations keep
declining until they fall below zero. The PACF plot shows a somewhat smoother, exponential decay
across the lags. Because the two plots behave similarly, a mixed model may best represent the
series. To test this, the series is fitted to both AR and MA models, and the results are compared
with those of a mixed ARMA model.
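ACF and PACF plots are built from sample autocorrelations. A minimal sketch of the usual sample ACF estimator follows. The function names are hypothetical, and the PACF is omitted for brevity. The ±2/√n band is a common rule of thumb, not necessarily the exact limits PROC ARIMA draws. The identification logic works as follows:
- an ACF that cuts off after lag q suggests an MA(q) model;
- a PACF that cuts off after lag p suggests an AR(p) model;
- when both tail off, a mixed model is indicated.

```python
import math

def sample_acf(x, max_lag):
    """Sample autocorrelations r_1..r_max_lag, each scaled by the lag-0 sum of squares."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
            for k in range(1, max_lag + 1)]

def rough_band(n):
    """Rule-of-thumb +/- 2/sqrt(n) band for judging single spikes in an ACF or PACF plot."""
    return 2 / math.sqrt(n)

print(sample_acf([1, -1, 1, -1], 2))  # [-0.75, 0.5]
print(round(rough_band(129), 3))      # 0.176
```

With the 129 values left after differencing at lags 1 and 12, the band is roughly ±0.18.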
PROC ARIMA output for an MA(1) model shows that the moving average term is significant
(p = .0003). However, the autocorrelation check of residuals is highly significant at lag 12
(p < .0001), so the MA(1) model does not adequately represent the Australian red wine series.
Moreover, when the data is fitted to an MA(12) model, none of the MA terms are significant.
Taken as a whole, an MA model alone does not appear suitable for the data set.
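The autocorrelation check of residuals uses the same statistic as the white-noise check. Its degrees of freedom, however, are reduced by the number of AR and MA parameters estimated. The appendix output for the final model shows this: 4 degrees of freedom at lag 6 after two MA terms were fitted.

The sketch below converts a reported chi-square value to a p-value. It uses a closed-form chi-square tail that holds only for even degrees of freedom; every DF in the appendix is even. General use would call a library routine such as SciPy's `scipy.stats.chi2.sf`. Function names are illustrative.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability P(X > x); this closed form is valid only for even df."""
    if df <= 0 or df % 2:
        raise ValueError("closed form needs a positive, even number of degrees of freedom")
    half = x / 2
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total

def residual_check(chi_square, to_lag, n_arma_params):
    """Degrees of freedom drop by the number of estimated AR and MA terms (the mean is not counted)."""
    df = to_lag - n_arma_params
    return df, chi2_sf_even_df(chi_square, df)

# Appendix values for the final MA(1,12) model, in which two MA terms were estimated.
for lag, chi_sq in [(6, 4.86), (12, 8.89)]:
    df, p = residual_check(chi_sq, lag, n_arma_params=2)
    print(lag, df, round(p, 4))
```

The p-values come out close to the reported 0.3023 and 0.5422. The small gaps reflect rounding in the printed chi-square values.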
When the series is fitted to an AR(1) model, the AR term is highly significant (p < .0001). But,
as with the MA(1) model, the autocorrelation check of residuals is highly significant at lag 12
(p < .0001). In an AR(12) model, not all of the AR parameters are significant. Only those at lags
1, 5, 8, 11, and 12 are, and the remaining parameters could be dropped from the model.
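The significance judgements throughout this section rest on t = estimate / standard error. The sketch below applies this to the final model's estimates from the appendix. As an assumption, it uses a normal approximation for the two-sided p-value. Its p-values can therefore differ slightly from PROC ARIMA's "Approx Pr > |t|" column: about 0.447 for MU, against the reported 0.4481. The helper name is hypothetical.

```python
import math

def approx_two_sided_p(estimate, std_error):
    """t = estimate / SE, with a normal approximation to the two-sided p-value."""
    t = estimate / std_error
    return t, math.erfc(abs(t) / math.sqrt(2))

# Parameter estimates and standard errors of the final MA(1,12) model (appendix).
params = {
    "MU": (-0.0005585, 0.0007340),
    "MA1,1": (0.78686, 0.05565),
    "MA2,1": (0.75201, 0.06917),
}
for name, (est, se) in params.items():
    t, p = approx_two_sided_p(est, se)
    print(f"{name}: t = {t:.2f}, p ~ {p:.4f}")
```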
Because the data shows both trend and seasonal components, the series may be better represented
by a multiplicative model with differencing at lags 1 and 12. PROC ARIMA shows that a
multiplicative AR(1,12) model, with AR factors at lags 1 and 12, is a decent but not ideal fit.
Both AR terms are highly significant (p < .0001). The fit statistics show an AIC of -147.262, an
SBC of -138.683, and a standard error estimate of 0.135171. However, the autocorrelation check of
residuals still shows significant p-values through lag 24, so autocorrelation remains in the
residuals.
According to the fit statistics, the multiplicative MA(1,12) model provides a better fit. Not
only are both MA terms highly significant, but the AIC and SBC values are lower: -184.973 and
-176.394, respectively. The standard error estimate is also smaller, at 0.11679. The null
hypothesis of the autocorrelation check of residuals is that the residuals are white noise. Here
it cannot be rejected, because none of the p-values are statistically significant. This indicates
that the model provides an adequate fit to the data; none of the other models considered so far
passed this check at every lag. Additionally, according to the Q-Q plot, the residuals appear
normally distributed.
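This sketch assumes the standard definitions AIC = −2 ln L + 2k and SBC = −2 ln L + k ln n. Here L is the likelihood, k the number of estimated parameters, and n the number of residuals. Both reported AIC/SBC pairs are consistent with k = 3 (a mean plus two AR or MA terms) and n = 129, as the sketch below checks. The two multiplicative models share the same k and n, so their penalties are identical. Ranking them by AIC, by SBC, or by likelihood therefore gives the same answer. The helper name is hypothetical.

```python
import math

def sbc_from_aic(aic, n_residuals, n_params):
    """AIC and SBC share the -2 ln L term, so SBC = AIC + k * (ln n - 2)."""
    return aic + n_params * (math.log(n_residuals) - 2)

# Both multiplicative models: k = 3 parameters estimated from n = 129 residuals.
for model, aic in [("AR(1,12)", -147.262), ("MA(1,12)", -184.973)]:
    print(model, round(sbc_from_aic(aic, 129, 3), 3))
# AR(1,12) -138.683
# MA(1,12) -176.394
```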
As a precaution, the data is also fitted to a multiplicative ARMA(1,12) model, because the ACF
and PACF plots suggested that a mixed model might best represent the series. However, the PROC
ARIMA output shows that of the five parameters, only the MA terms are significant. The
multiplicative MA(1,12) model therefore appears to provide the best fit to the Australian red
wine data set among the models considered.
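Written in backshift notation, the selected model and the appendix estimates give the equation below. This assumes PROC ARIMA's convention of writing MA factors with a minus sign. It also drops the mean term MU (estimate -0.0005585, p = .4481), which was not significant. In seasonal ARIMA notation, this is an ARIMA(0,1,1)(0,1,1)_12 model for log sales, the form often called the "airline model."

```latex
(1 - B)(1 - B^{12}) \ln y_t = (1 - \theta_1 B)(1 - \Theta_1 B^{12})\, a_t,
\qquad \hat\theta_1 = 0.78686, \quad \hat\Theta_1 = 0.75201, \quad \hat\sigma_a^2 = 0.01364
```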
PROC ARIMA produces its forecasts on the log scale, so the forecast values are exponentiated to
return them to kiloliters. The MA(1,12) model forecasts sales for ten periods ahead (observations
143 through 152), with values ranging from 1,123.58 to 2,885.36 kiloliters. The average forecast
is 2,095.61 kiloliters. The ten forecasts have a standard deviation of about 526.11; the 166.37
figure is the standard error of that average. The lower 95% confidence limits range from 878.10
to 2,188.17, and the upper limits from 1,416.46 to 3,734.57. The average forecast standard error,
on the log scale, is 0.127983. The forecasts drop sharply from observation 144 (December 1991) to
observation 145 (January 1992). They then generally rise through observation 151 (July 1992) and
ease slightly at observation 152.
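The appendix forecasts are consistent with a specific back-transformation. This is inferred here from the numbers rather than stated in the report. Let f be the log-scale forecast and s its standard error (the STD column). Each FORECAST then equals exp(f + s²/2), the mean of a lognormal distribution, while L95 and U95 equal exp(f ∓ 1.96 s).

The sketch below rebuilds the limits from FORECAST and STD and recomputes the summary statistics of the ten forecasts. It also maps observation numbers to calendar months, with observation 1 as January 1980. All function names are hypothetical.

```python
import math

Z_95 = 1.96  # two-sided 95% normal quantile

def log_forecast_to_kl(mean_kl, std_log):
    """Recover the log-scale forecast f from FORECAST = exp(f + s^2/2), then rebuild the
    95% limits exp(f -/+ 1.96 s) in kiloliters."""
    f = math.log(mean_kl) - std_log ** 2 / 2
    return math.exp(f - Z_95 * std_log), math.exp(f + Z_95 * std_log)

def summarize(values):
    """Mean, sample standard deviation, and standard error of the mean."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return mean, sd, sd / math.sqrt(n)

def obs_to_month(obs):
    """Observation 1 is January 1980; returns (year, month)."""
    year_offset, month_index = divmod(obs - 1, 12)
    return 1980 + year_offset, month_index + 1

forecasts = [2201.84, 2327.78, 1123.58, 1566.94, 1788.68,
             1867.70, 2237.15, 2235.17, 2885.36, 2721.93]  # observations 143-152

print(log_forecast_to_kl(2201.84, 0.11679))  # close to the reported (1739.45, 2749.39)
mean, sd, sem = summarize(forecasts)
print(round(mean, 2), round(sd, 2), round(sem, 2))  # 2095.61 526.11 166.37
print(obs_to_month(145), obs_to_month(151))  # (1992, 1) (1992, 7)
```

The summary line reproduces the report's mean. The 166.37 figure appears as the standard error of the mean rather than the standard deviation of the forecasts.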
Summary/Conclusions
Exploratory analysis of the Australian red wine data set using time series plots reveals the
series' non-stationarity and seasonality. To apply the Box-Jenkins method for time series
analysis appropriately, the series must be transformed. Only once the data is stationary and
seasonality has been accounted for can an appropriate model be identified. During this critical
identification step, the ACF and PACF plots suggested that a mixed ARMA model might best
represent the series. Evaluation of the AR(1), AR(12), MA(1), and MA(12) models showed that the
data needed a multiplicative model, either a pure or a mixed one.
A multiplicative model is more useful in this scenario because the Australian red wine data set
is seasonal. The model must account for the proportions between higher and lower values rather
than assume that their difference is constant (Box, Jenkins, & Reinsel, 2008). In other words, a
multiplicative model assumes that seasonal effects act proportionally on the series. Of the three
multiplicative models, the MA(1,12) model performed best and proved adequate for the data. In
addition, the values forecast from the MA(1,12) model have relatively small standard errors. They
come with 95% confidence limits and appear consistent with the seasonal pattern of the observed
series.
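In Box-Jenkins terms, "multiplicative" also describes how the model is built: the nonseasonal and seasonal factors are multiplied together. Expanding that product for the final model, using the appendix estimates, shows an implied lag-13 term with coefficient θΘ. A model with separate lag-1 and lag-12 terms would not contain this term. The helper name below is hypothetical.

```python
def poly_multiply(a, b):
    """Multiply two backshift polynomials given as coefficient lists (index = power of B)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

theta, big_theta = 0.78686, 0.75201          # MA1,1 and MA2,1 estimates from the appendix
nonseasonal = [1.0, -theta]                   # 1 - theta * B
seasonal = [1.0] + [0.0] * 11 + [-big_theta]  # 1 - Theta * B^12
expanded = poly_multiply(nonseasonal, seasonal)
print({lag: round(c, 5) for lag, c in enumerate(expanded) if c != 0})
# {0: 1.0, 1: -0.78686, 12: -0.75201, 13: 0.59173}
```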
Future Work
If more time were available, it may be beneficial to consider the other orders suggested by the
smallest canonical correlation (SCAN) method and the extended sample autocorrelation function
(ESACF) method. Both methods suggest tentative orders for an ARMA model (SAS, 2010). Each
proposed ARMA(3,3) as the optimal order. It might be interesting to compare its results with
other recommended mixed models: ARMA(5,3), ARMA(1,5), and ARMA(2,5). Fit statistics for these
models, including AIC, SBC, and standard error estimates, might show a better fit and lead to
different, potentially more accurate forecasts. Because these models also appear able to
represent the data, it might be worthwhile to generate forecasts from them. Those forecasts could
then be compared with the MA(1,12) forecasts and evaluated for accuracy.
References
Box, G. E., Jenkins, G. M., & Reinsel, G. C. (2008). Time Series Analysis: Forecasting and
Control. Hoboken: John Wiley & Sons, Inc.
e-Handbook of Statistical Methods. (2012). NIST/SEMATECH. Retrieved November 24, 2012, from
http://www.itl.nist.gov/div898/handbook/
Nau, R. (2005, May 15). Stationarity and differencing. Statistical Forecasting. Retrieved
November 24, 2012, from http://people.duke.edu/~rnau/411diff.htm
SAS. (2010, April). SAS/STAT(R) 9.2 User's Guide, Second Edition. Retrieved November 25, 2012,
from http://support.sas.com/
Appendix
[Figure: Time Series Plot of the Australian Red Wine Dataset]
[Figure: Time Series Plot of the Australian Red Wine Dataset with Natural Log]
Name of Variable = log_sales
Period(s) of Differencing 1
Mean of Working Series 0.010527
Standard Deviation 0.271498
Number of Observations 141
Observation(s) eliminated by differencing 1
Autocorrelation Check for White Noise
To Lag   Chi-Square   DF   Pr > ChiSq   Autocorrelations
     6        31.48    6       <.0001   -0.240 -0.100  0.065  0.034 -0.023 -0.374
    12       122.91   12       <.0001   -0.062  0.048  0.042 -0.089 -0.172  0.735
    18       146.28   18       <.0001   -0.136 -0.088  0.075  0.063 -0.066 -0.322
    24       218.84   24       <.0001   -0.055  0.027  0.029 -0.095 -0.125  0.626
    30       239.26   30       <.0001   -0.099 -0.090  0.056  0.099 -0.092 -0.272
    36       308.42   36       <.0001   -0.058  0.022  0.046 -0.111 -0.109  0.575
Final Multiplicative MA(1,12) Model with Forecast
Conditional Least Squares Estimation
Parameter   Estimate    Standard Error   t Value   Approx Pr > |t|   Lag
MU          -0.0005585  0.0007340        -0.76     0.4481            0
MA1,1        0.78686    0.05565          14.14     <.0001            1
MA2,1        0.75201    0.06917          10.87     <.0001            12
Constant Estimate -0.00056
Variance Estimate 0.01364
Std Error Estimate 0.11679
AIC -184.973
SBC -176.394
Number of Residuals 129
Autocorrelation Check of Residuals
To Lag   Chi-Square   DF   Pr > ChiSq   Autocorrelations
     6         4.86    4       0.3023    0.057  0.002 -0.069 -0.065  0.116 -0.100
    12         8.89   10       0.5422   -0.041  0.083  0.013  0.084 -0.112 -0.000
    18        11.22   16       0.7956   -0.092  0.028  0.048 -0.035 -0.013 -0.053
    24        12.58   22       0.9443   -0.036  0.010 -0.020 -0.014  0.069  0.043
(FORECAST, L95, and U95 are in kiloliters; STD is the forecast standard error on the log scale.)
Obs log_sales FORECAST STD L95 U95 RESIDUAL y
143 . 2201.84 0.11679 1739.45 2749.39 . .
144 . 2327.78 0.11941 1828.94 2920.72 . .
145 . 1123.58 0.12198 878.10 1416.46 . .
146 . 1566.94 0.12449 1218.19 1984.52 . .
147 . 1788.68 0.12696 1383.45 2275.62 . .
148 . 1867.70 0.12938 1437.30 2386.70 . .
149 . 2237.15 0.13175 1713.10 2871.25 . .
150 . 2235.17 0.13408 1703.24 2880.95 . .
151 . 2885.36 0.13637 2188.17 3734.57 . .
152 . 2721.93 0.13862 2054.50 3537.54 . .
