Steel Exports
Time Series Data Analysis
Julia L. Nickle
Predict 411 – Section 57
11/19/2012
Introduction
The analysis of time series data, like linear regression, involves a methodical process.
Time series data, however, is more complicated. The process entails four major steps:
model identification, model estimation, diagnostic checking, and forecasting. This can also be
considered a three-step process: identification, estimation and checking, and forecasting.
Identification, possibly the most difficult step to follow, is what separates time series analysis
from other regression analysis methods. Overall, however, the objective of time series analysis is
to substantiate the relationship between the present value of the time series and the past values in
order to predict future values more accurately (Wang, 2008). The purpose of this assignment is
to evaluate the steel exports data set with respect to univariate time series techniques and to
closely follow the Box-Jenkins method of time series modeling.
In order for a time series to be modeled, the data must be stationary, meaning it has a constant
mean and variance. Most data sets can be transformed from a non-stationary series to a stationary
one in order to be modeled using the Box-Jenkins method (Wang, 2008). The need to account for a
non-stationary series can be detected in the identification stage. Seasonality, or periodic
fluctuations, must be assessed at this time as well.
Autoregressive-Integrated-Moving Average (ARIMA) encompasses three different
models used in time series analysis depending on the data set’s characteristics. The first, an
autoregressive model (AR), allows for independent variables as time-lagged values of the
dependent variable (Wang, 2008). For example, a lag 1 autoregressive term is x(t-1) multiplied by
a coefficient. The model's order of autoregression depends on the number of autoregressive terms.
Moving average terms, by contrast, represent past errors multiplied by a coefficient (Moving
Average Models, 2012). The third model under the ARIMA umbrella is an ARMA model, which
includes both autoregressive and moving average terms and is thus a mixed model.
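These recursions can be made concrete with a short simulation. The following Python sketch is illustrative only and is not part of the assignment's SAS workflow; the coefficient values, function names, and seed are my own arbitrary choices.

```python
import random

def simulate_ar1(phi, mu, n, seed=1):
    """AR(1): y[t] = mu + phi * (y[t-1] - mu) + e[t], with e[t] ~ N(0, 1).
    Each value depends on the time-lagged value of the series itself."""
    rng = random.Random(seed)
    y = [mu]
    for _ in range(n - 1):
        y.append(mu + phi * (y[-1] - mu) + rng.gauss(0, 1))
    return y

def simulate_ma1(theta, mu, n, seed=1):
    """MA(1): y[t] = mu + e[t] + theta * e[t-1].
    Each value depends on the past error multiplied by a coefficient."""
    rng = random.Random(seed)
    e = [rng.gauss(0, 1) for _ in range(n)]
    return [mu + e[t] + (theta * e[t - 1] if t > 0 else 0.0) for t in range(n)]
```

Simulating 44 observations with, say, phi = 0.5 and mu = 4.4 produces a series whose ACF and PACF resemble the patterns discussed below.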
The autocorrelation function (ACF) and the partial autocorrelation function (PACF) aid
in identifying the terms in a model and subsequently dictate how the data should be
modeled. George C.S. Wang's article in the Journal of Business Forecasting
offers a concise explanation for each function: ACF values are calculated from the time series at
various lags in order to measure the significance (if any) of correlations between past and present
values. PACF values represent coefficients of a linear regression of the time series using lagged
values as independent variables. In other words, the ACF illustrates how the correlation between
any two values in a time series changes as their separation changes, while the PACF measures that
correlation after accounting for the values at the intervening lags.
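The sample ACF values Wang describes can be computed directly from the series. A minimal Python sketch (the helper function is my own, not part of the assignment's SAS code):

```python
def acf(series, max_lag):
    """Sample autocorrelation r_k = c_k / c_0, where
    c_k = sum over t of (y[t] - ybar) * (y[t+k] - ybar)."""
    n = len(series)
    ybar = sum(series) / n
    c0 = sum((y - ybar) ** 2 for y in series)
    return [sum((series[t] - ybar) * (series[t + k] - ybar)
                for t in range(n - k)) / c0
            for k in range(1, max_lag + 1)]

print(acf([1, 2, 3, 4], 2))   # [0.25, -0.3]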
Data
The steel exports data set consists of 44 observations. The variable I_S_Weights
represents iron and steel exports in million tons during the period from 1937 to 1980. A time
series plot of the data suggests that the series is stationary. There is no obvious trend in the
data, and the variance appears constant over time. The scatterplot with
the trend line reiterates the data set’s stationarity, showing an irregular line that neither increases
nor decreases consistently. The data set also does not appear to be affected by seasonality. There
are no visible periodic patterns of highs and lows illustrated in the plot. While the variable
I_S_Weights fluctuates over the years, the variation appears random in nature.
Simple statistics assist in better understanding the I_S_Weights values over the 44-year
period. The average I_S_Weights is 4.41818 with a standard deviation of 1.75358. The
maximum I_S_Weights value is 8.72 while the minimum is 2.13.
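The standard deviation quoted here (1.75358) and the "Standard Deviation" of the working series in the appendix (1.73354) differ only in their denominator: PROC ARIMA's working-series value divides by n rather than n - 1. A quick Python check of the conversion, assuming that is the only difference:

```python
import math

n = 44
working_sd = 1.73354  # PROC ARIMA working-series value (n denominator)
# Convert to the usual sample standard deviation (n - 1 denominator):
sample_sd = working_sd * math.sqrt(n / (n - 1))
print(round(sample_sd, 5))   # 1.75358, the value quoted in the text
```

The two figures are therefore consistent, not contradictory.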
Analysis
The identification stage of time series analysis not only seeks to understand a data set’s
stationarity and seasonality, but also aims to choose a model that best represents its terms. In
order to identify an appropriate model for the steel exports data set, it is necessary to examine
autocorrelation and partial autocorrelation function results. PROC ARIMA with the IDENTIFY
statement produces plots of each function in addition to testing for white noise. The white noise
test evaluates the hypothesis that none of the autocorrelations up to a given lag are significantly
different from 0. If this hypothesis is found to be true for all lags within a series, then there is no
information in the series to model, meaning no ARIMA model is needed (SAS, 2010). In this
situation, the chi-square statistic 12.15 has a p-value of .0586. Because this value is over .05, the
null hypothesis cannot be rejected, meaning there is not enough evidence to determine that the
autocorrelations are significantly different from 0.
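The chi-square statistic SAS reports here is the Ljung-Box Q. Recomputing it from the lag 1-6 autocorrelations listed in the appendix reproduces the value; the Python helper below is my own sketch of the standard formula.

```python
def ljung_box(autocorr, n):
    """Ljung-Box statistic Q = n * (n + 2) * sum_k r_k^2 / (n - k)
    for autocorrelations r_1..r_m of an n-observation series."""
    return n * (n + 2) * sum(r ** 2 / (n - k)
                             for k, r in enumerate(autocorr, start=1))

# Lag 1-6 autocorrelations from the white noise check in the appendix:
r = [0.472, 0.104, 0.045, 0.103, 0.099, 0.008]
print(round(ljung_box(r, 44), 2))   # ~12.16, matching the reported 12.15 to rounding
```

The small difference from 12.15 arises because the tabled autocorrelations are themselves rounded to three decimals.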
Yet, looking at the ACF, PACF, and inverse autocorrelation plots (IACF), it appears the
first autocorrelation, .47193, is outside the standard error bands. This suggests that the data set
does in fact need to be fitted to an ARIMA model (Brocklebank & Dickey, 2003). Because neither
the ACF nor the PACF plot decreases exponentially, a mixed process does not appear to be a
suitable model. However, as the ACF plot displays a sharp cutoff after lag 1 and negative values
beyond lag 7, it appears a moving average model might suffice. The PACF plot, on the other
hand, depicts the need for an autoregressive model with a sharp cutoff after lag 1. A comparison
of error measures and fit statistics can assist in determining which of the two models more
accurately represents the data.
PROC ARIMA output illustrates how well each model fits the steel exports data set
beginning with an autoregressive model of order 1. Parameter estimates show the mean term and
its estimated value, 4.41217 as well as the coefficient of the lagged value of the change in iron
and steel exports in million tons and its estimate, .47368. Both values are statistically significant
with p-values <.05. As such, both terms are necessary to the model. The standard error for the
mean term (MU) is .43509 while the standard error for the autoregressive parameter is .13622.
The output also includes goodness of fit statistics for the AR(1) model, Constant and Variance
estimates of roughly 2.32 and 2.44, a standard error estimate of 1.56, and an AIC and SBC of
166.15 and 169.72, respectively. Overall, smaller AIC and SBC values illustrate a better fitting
model and can be used for comparison purposes (SAS, 2010). The residual autocorrelation check
for the AR(1) model shows an insignificant p-value of .8224 for the first 6 lags. This means the
no-autocorrelation hypothesis cannot be rejected, indicating the residuals are in fact white noise.
Thus, an AR(1) model appears to be an adequate model for the steel export series.
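The AR(1) fit can be approximated outside SAS by regressing the mean-centered series on its own lag. The Python sketch below is a simplification of conditional least squares (SAS estimates the mean and the coefficient jointly and iteratively, so its estimates differ slightly):

```python
def fit_ar1(y):
    """Approximate CLS for (y[t] - mu) = phi * (y[t-1] - mu) + a[t]:
    estimate mu by the series mean, then phi by least squares on the lag."""
    n = len(y)
    mu = sum(y) / n
    d = [v - mu for v in y]                       # mean-centered series
    num = sum(d[t] * d[t - 1] for t in range(1, n))
    den = sum(d[t - 1] ** 2 for t in range(1, n))
    return mu, num / den

mu, phi = fit_ar1([1.0, -1.0] * 10)   # perfectly alternating toy series
print(mu, phi)                        # 0.0 -1.0
```

Applied to the steel exports series, this lag regression would yield values close to the reported MU = 4.41217 and AR1,1 = 0.47368.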
As for the moving average process, PROC ARIMA output shows the model is estimated as
y(t) = 4.42102 + a(t) + 0.49827a(t-1), where a(t) is the error at time t (the appendix reports
the coefficient as MA1,1 = -0.49827 under SAS's sign convention). Both terms are significant to
the model, with p-values <.05. The p-value for the moving average term, .0006, is smaller than
the p-value for the autoregressive term in the AR(1) model. The model appears to fit well, as the
standard error estimate is smaller than the AR(1) model’s at 1.55. Additionally, AIC and SBC
values decreased to 165.57 and 169.14. Again, the residual autocorrelation check verifies the
residuals are white noise, as the no-autocorrelation hypothesis cannot be rejected. Review of the
Q-Q plot of residuals validates the assumption that the residuals are normally distributed. Like
the plot of the AR(1) model's residuals, the plot of the MA(1) model's residuals shows no
structured deviation from the line. Relying on fit statistics, the MA(1) model appears to be the
preferred model for representing the steel exports data set.
When the model is estimated with both autoregressive and moving average terms, PROC
ARIMA output reiterates the data set's incompatibility with a mixed model. The mean term is
the only term with a significant p-value, <.0001. The moving average term is insignificant with a
p-value of .2646, and the autoregressive term is insignificant with a p-value of .4411, meaning the
terms are unnecessary to the model. While the standard error estimate is comparable to the other
models, at 1.56, the AIC and SBC values are larger, 166.94 and 172.29, respectively. These
values indicate the previous models are of better fit to the data set.
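The AIC and SBC values compared throughout this section are linked through the parameter count: PROC ARIMA computes AIC = -2 ln L + 2k and SBC = -2 ln L + k ln(n), so SBC = AIC + k(ln n - 2). A quick Python check against the appendix output (the helper name is mine):

```python
import math

def sbc_from_aic(aic, k, n):
    """SBC = AIC + k * (ln(n) - 2), since both share the -2 ln L term."""
    return aic + k * (math.log(n) - 2)

# AR(1): k = 2 parameters (MU, AR1,1); ARMA(1,1): k = 3; n = 44 observations
print(round(sbc_from_aic(166.149, 2, 44), 4))    # 169.7174, as reported
print(round(sbc_from_aic(166.9369, 3, 44), 4))   # ~172.2895 (reported 172.2894; inputs rounded)
```

The agreement confirms the mixed model carries one more parameter than the AR(1) and MA(1) fits, which is exactly why its SBC penalty is larger.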
Summary/Conclusions
Model identification is a crucial aspect of time series analysis. Mistakes made during the
identification stage can lead to incorrect estimations and inaccurate forecasts. Time taken to
precisely identify a model allows for more efficient and effective estimation and diagnostic
checking as well as forecasting stages for time series data. If ACF and PACF plots reveal that a
data set, such as the steel exports data set, might benefit from being fit to either an AR model or
an MA model, then fit statistics comparison is needed.
For the steel exports data set, the estimation and diagnostic checking stage validates the
identification stage. Assessment of the steel exports data suggests that an MA model provides a
better fit than an AR model. Not only are the standard error, AIC, and SBC values smaller with the
MA(1) model, but the MA term is highly significant. Additionally, the parameter estimates are
uncorrelated, indicating the model does not suffer from any collinearity issues. As identification
and diagnostics stages are consistent and complete, the steel exports series can now be forecasted
using an MA model.
Future Work
The steel exports data set might benefit from further evaluation using an AR(2) or an
MA(2) model. Adding a second term could provide a better fit and more accurate estimation of
the time series. It would be worth fitting the models and comparing fit statistics, but most
importantly evaluating the second term's significance. If the second term is found to be
insignificant, then the model is overfit and should be limited to either an AR(1) or an MA(1)
process.
Models fit with the differencing option in PROC ARIMA might also reveal a better fit
for the steel exports data set. If a data set is found to be nonstationary during the
identification stage, then transformation to a stationary series is needed. If more time were
available, it would be useful to validate the data set's stationarity by comparing a differenced
model to the previously fitted AR(1) and MA(1) models.
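First differencing, the transformation the differencing option applies, is simple to express. A minimal sketch (illustrative only; the steel exports series did not require it):

```python
def difference(series, d=1):
    """Apply first differencing d times: w[t] = y[t] - y[t-1],
    shortening the series by one observation per pass."""
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return series

print(difference([1, 3, 6, 10]))        # [2, 3, 4]
print(difference([1, 3, 6, 10], d=2))   # [1, 1]
```

A differenced fit that produced a higher AIC or SBC than the MA(1) model would further confirm the series is already stationary.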
References
SAS. (2010, April). Retrieved November 17, 2012, from SAS/STAT(R) 9.2 User's Guide, Second
Edition:
http://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_intr
oreg_sect003.htm
Moving Average Models. (2012). Retrieved November 17, 2012, from The Pennsylvania State
University, Applied Time Series Analysis:
https://onlinecourses.science.psu.edu/stat510/?q=node/48
Brocklebank, J. C., & Dickey, D. A. (2003). SAS for Forecasting Time Series. SAS Institute; 2nd
edition.
Wang, G. C. (2008). A Guide to Box-Jenkins Modeling. The Journal of Business Forecasting.
Appendix
Name of Variable = isweights
Mean of Working Series 4.418182
Standard Deviation 1.73354
Number of Observations 44
Autocorrelation Check for White Noise

To Lag  Chi-Square  DF  Pr > ChiSq  Autocorrelations
     6       12.15   6      0.0586  0.472  0.104  0.045  0.103  0.099  0.008
PROC ARIMA, No Differencing Applied, Estimate P=1
Conditional Least Squares Estimation

Parameter  Estimate  Standard Error  t Value  Approx Pr > |t|  Lag
MU          4.41217         0.43509    10.14           <.0001    0
AR1,1       0.47368         0.13622     3.48           0.0012    1

Constant Estimate     2.322229
Variance Estimate     2.444518
Std Error Estimate    1.563495
AIC                    166.149
SBC                   169.7174
Number of Residuals         44
Autocorrelation Check of Residuals

To Lag  Chi-Square  DF  Pr > ChiSq  Autocorrelations
     6        2.19   5      0.8224   0.074 -0.151 -0.057  0.072  0.086 -0.020
    12        4.32  11      0.9597  -0.020 -0.072 -0.018 -0.006 -0.165  0.046
    18        7.29  17      0.9794   0.096  0.013  0.007 -0.061  0.130 -0.102
    24       12.95  23      0.9530  -0.216 -0.094 -0.081 -0.039  0.042 -0.050
PROC ARIMA, No Differencing Applied, Estimate Q=1
Conditional Least Squares Estimation

Parameter  Estimate  Standard Error  t Value  Approx Pr > |t|  Lag
MU          4.42102         0.34703    12.74           <.0001    0
MA1,1      -0.49827         0.13512    -3.69           0.0006    1

Constant Estimate     4.421016
Variance Estimate     2.412583
Std Error Estimate    1.553249
AIC                   165.5704
SBC                   169.1388
Number of Residuals         44
Autocorrelation Check of Residuals

To Lag  Chi-Square  DF  Pr > ChiSq  Autocorrelations
     6        1.31   5      0.9336   0.059  0.094 -0.028  0.085  0.075 -0.020
    12        3.23  11      0.9873  -0.006 -0.079 -0.052 -0.013 -0.146  0.039
    18        6.68  17      0.9874   0.063 -0.001  0.044 -0.092  0.096 -0.149
    24       14.00  23      0.9268  -0.206 -0.135 -0.114 -0.084  0.014 -0.072
PROC ARIMA, No Differencing Applied, Estimate P=1, Q=1
Conditional Least Squares Estimation

Parameter  Estimate  Standard Error  t Value  Approx Pr > |t|  Lag
MU          4.42597         0.39769    11.13           <.0001    0
MA1,1      -0.32579         0.28804    -1.13           0.2646    1
AR1,1       0.23004         0.29571     0.78           0.4411    1

Constant Estimate     3.407829
Variance Estimate     2.436096
Std Error Estimate      1.5608
AIC                   166.9369
SBC                   172.2894
Number of Residuals         44
Autocorrelation Check of Residuals

To Lag  Chi-Square  DF  Pr > ChiSq  Autocorrelations
     6        0.65   4      0.9577   0.002 -0.006 -0.023  0.069  0.080 -0.030
    12        2.75  10      0.9867   0.000 -0.075 -0.034  0.004 -0.159  0.053
    18        6.15  16      0.9864   0.070 -0.003  0.034 -0.093  0.124 -0.125
    24       11.84  22      0.9606  -0.202 -0.094 -0.095 -0.065  0.029 -0.061
