Steel Exports
Time Series Data Analysis
Julia L. Nickle
Predict 411 – Section 57
11/19/2012
Introduction
The analysis of time series data, like linear regression, involves a methodical process.
Time series data, however, is more complicated. The process entails four major steps:
model identification, model estimation, diagnostic checking, and forecasting. This can also be
considered a three-step process: identification, estimation and checking, and forecasting.
Identification, possibly the most difficult step to follow, is what separates time series analysis
from other regression analysis methods. Overall, however, the objective of time series analysis is
to substantiate the relationship between the present value of the time series and the past values in
order to predict future values more accurately (Wang, 2008). The purpose of this assignment is
to evaluate the steel exports data set with respect to univariate time series techniques and to
closely follow the Box-Jenkins method of time series modeling.
In order for a time series to be modeled, the data must be stationary, meaning it has a constant
mean and variance. Most data sets can be transformed from a non-stationary series to a stationary
one in order to be modeled using the Box-Jenkins method (Wang, 2008). The need to account for a
non-stationary series can be detected in the identification stage. Seasonality, or periodic
fluctuations, must be assessed at this time as well.
Autoregressive-Integrated-Moving Average (ARIMA) encompasses three different
models used in time series analysis depending on the data set’s characteristics. The first, an
autoregressive model (AR), allows for independent variables as time-lagged values of the
dependent variable (Wang, 2008). For example, a lag 1 autoregressive term is x(t-1) multiplied by
a coefficient. The model's order of autoregression depends on the number of autoregressive terms.
Moving average terms, by contrast, represent past errors multiplied by a coefficient (Moving
Average Models, 2012). The third model under the ARIMA umbrella is an ARMA model, which
includes both autoregressive and moving average terms and is thus a mixed model.
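These recursions can be made concrete with a short simulation. The following Python sketch is illustrative only and is not part of the assignment's SAS workflow; the coefficient values, function names, and seed are my own arbitrary choices.

```python
import random

def simulate_ar1(phi, mu, n, seed=1):
    """AR(1): y[t] = mu + phi * (y[t-1] - mu) + e[t], with e[t] ~ N(0, 1).
    Each value depends on the time-lagged value of the series itself."""
    rng = random.Random(seed)
    y = [mu]
    for _ in range(n - 1):
        y.append(mu + phi * (y[-1] - mu) + rng.gauss(0, 1))
    return y

def simulate_ma1(theta, mu, n, seed=1):
    """MA(1): y[t] = mu + e[t] + theta * e[t-1].
    Each value depends on the past error multiplied by a coefficient."""
    rng = random.Random(seed)
    e = [rng.gauss(0, 1) for _ in range(n)]
    return [mu + e[t] + (theta * e[t - 1] if t > 0 else 0.0) for t in range(n)]
```

Simulating 44 observations with, say, phi = 0.5 and mu = 4.4 produces a series whose ACF and PACF resemble the patterns discussed below.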
The autocorrelation function (ACF) and the partial autocorrelation function (PACF) aid
in identifying the terms in a model and subsequently dictate how the data should be
modeled. George C.S. Wang's article in the Journal of Business Forecasting
offers a concise explanation for each function: ACF values are calculated from the time series at
various lags in order to measure the significance (if any) of correlations between past and present
values. PACF values represent coefficients of a linear regression of the time series using lagged
values as independent variables. In other words, the ACF illustrates how the correlation between
any two values in a time series changes as their separation changes, while the PACF measures that
correlation after accounting for the values at the intervening lags.
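The sample ACF values Wang describes can be computed directly from the series. A minimal Python sketch (the helper function is my own, not part of the assignment's SAS code):

```python
def acf(series, max_lag):
    """Sample autocorrelation r_k = c_k / c_0, where
    c_k = sum over t of (y[t] - ybar) * (y[t+k] - ybar)."""
    n = len(series)
    ybar = sum(series) / n
    c0 = sum((y - ybar) ** 2 for y in series)
    return [sum((series[t] - ybar) * (series[t + k] - ybar)
                for t in range(n - k)) / c0
            for k in range(1, max_lag + 1)]

print(acf([1, 2, 3, 4], 2))   # [0.25, -0.3]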
Data
The steel exports data set consists of 44 observations. The variable I_S_Weights
represents iron and steel exports in million tons during the period from 1937 to 1980. A time
series plot of the data suggests that the series is stationary. There is no obvious trend in the
data, and the variance appears constant over time. The scatterplot with
the trend line reiterates the data set’s stationarity, showing an irregular line that neither increases
nor decreases consistently. The data set also does not appear to be affected by seasonality. There
are no visible periodic patterns of highs and lows illustrated in the plot. While the variable
I_S_Weights fluctuates over the years, the variation appears random in nature.
Simple statistics assist in better understanding the I_S_Weights values over the 44-year
period. The average I_S_Weights is 4.41818 with a standard deviation of 1.75358. The
maximum I_S_Weights value is 8.72 while the minimum is 2.13.
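The standard deviation quoted here (1.75358) and the "Standard Deviation" of the working series in the appendix (1.73354) differ only in their denominator: PROC ARIMA's working-series value divides by n rather than n - 1. A quick Python check of the conversion, assuming that is the only difference:

```python
import math

n = 44
working_sd = 1.73354  # PROC ARIMA working-series value (n denominator)
# Convert to the usual sample standard deviation (n - 1 denominator):
sample_sd = working_sd * math.sqrt(n / (n - 1))
print(round(sample_sd, 5))   # 1.75358, the value quoted in the text
```

The two figures are therefore consistent, not contradictory.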
Analysis
The identification stage of time series analysis not only seeks to understand a data set’s
stationarity and seasonality, but also aims to choose a model that best represents its terms. In
order to identify an appropriate model for the steel exports data set, it is necessary to examine
autocorrelation and partial autocorrelation function results. PROC ARIMA with the IDENTIFY
statement produces plots of each function in addition to testing for white noise. The white noise
test evaluates the hypothesis that none of the autocorrelations up to a given lag are significantly
different from 0. If this hypothesis is found to be true for all lags within a series, then there is no
information in the series to model, meaning no ARIMA model is needed (SAS, 2010). In this
situation, the chi-square statistic 12.15 has a p-value of .0586. Because this value is over .05, the
null hypothesis cannot be rejected, meaning there is not enough evidence to determine that the
autocorrelations are significantly different from 0.
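The chi-square statistic SAS reports here is the Ljung-Box Q. Recomputing it from the lag 1-6 autocorrelations listed in the appendix reproduces the value; the Python helper below is my own sketch of the standard formula.

```python
def ljung_box(autocorr, n):
    """Ljung-Box statistic Q = n * (n + 2) * sum_k r_k^2 / (n - k)
    for autocorrelations r_1..r_m of an n-observation series."""
    return n * (n + 2) * sum(r ** 2 / (n - k)
                             for k, r in enumerate(autocorr, start=1))

# Lag 1-6 autocorrelations from the white noise check in the appendix:
r = [0.472, 0.104, 0.045, 0.103, 0.099, 0.008]
print(round(ljung_box(r, 44), 2))   # ~12.16, matching the reported 12.15 to rounding
```

The small difference from 12.15 arises because the tabled autocorrelations are themselves rounded to three decimals.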
Yet, looking at the ACF, PACF, and inverse autocorrelation plots (IACF), it appears the
first autocorrelation, .47193, is outside the standard error bands. This suggests that the data set
does in fact need to be fitted to an ARIMA model (Brocklebank & Dickey, 2003). Because neither
the ACF nor the PACF plot decreases exponentially, a mixed process does not appear to be a
suitable model. However, as the ACF plot displays a sharp cutoff after lag 1 and negative values
beyond lag 7, it appears a moving average model might suffice. The PACF plot, on the other
hand, depicts the need for an autoregressive model with a sharp cutoff after lag 1. A comparison
of error measures and fit statistics can assist in determining which of the two models more
accurately represents the data.
PROC ARIMA output illustrates how well each model fits the steel exports data set
beginning with an autoregressive model of order 1. Parameter estimates show the mean term and
its estimated value, 4.41217 as well as the coefficient of the lagged value of the change in iron
and steel exports in million tons and its estimate, .47368. Both values are statistically significant
with p-values <.05. As such, both terms are necessary to the model. The standard error for the
mean term (MU) is .43509 while the standard error for the autoregressive parameter is .13622.
The output also includes goodness of fit statistics for the AR(1) model, Constant and Variance
estimates of roughly 2.32 and 2.44, a standard error estimate of 1.56, and an AIC and SBC of
166.15 and 169.72, respectively. Overall, smaller AIC and SBC values illustrate a better fitting
model and can be used for comparison purposes (SAS, 2010). The residual autocorrelation check
for the AR(1) model shows an insignificant p-value of .8224 for the first 6 lags. This means the
no-autocorrelation hypothesis cannot be rejected, indicating the residuals are in fact white noise.
Thus, an AR(1) model appears to be an adequate model for the steel export series.
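The AR(1) fit can be approximated outside SAS by regressing the mean-centered series on its own lag. The Python sketch below is a simplification of conditional least squares (SAS estimates the mean and the coefficient jointly and iteratively, so its estimates differ slightly):

```python
def fit_ar1(y):
    """Approximate CLS for (y[t] - mu) = phi * (y[t-1] - mu) + a[t]:
    estimate mu by the series mean, then phi by least squares on the lag."""
    n = len(y)
    mu = sum(y) / n
    d = [v - mu for v in y]                       # mean-centered series
    num = sum(d[t] * d[t - 1] for t in range(1, n))
    den = sum(d[t - 1] ** 2 for t in range(1, n))
    return mu, num / den

mu, phi = fit_ar1([1.0, -1.0] * 10)   # perfectly alternating toy series
print(mu, phi)                        # 0.0 -1.0
```

Applied to the steel exports series, this lag regression would yield values close to the reported MU = 4.41217 and AR1,1 = 0.47368.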
As for the moving average process, PROC ARIMA output shows the model is estimated as
y(t) = 4.42102 + a(t) + 0.49827a(t-1), where a(t) is the error at time t (the appendix reports
the coefficient as MA1,1 = -0.49827 under SAS's sign convention). Both terms are significant to
the model, with p-values <.05. The p-value for the moving average term, .0006, is smaller than
the p-value for the autoregressive term in the AR(1) model. The model appears to fit well, as the
standard error estimate is smaller than the AR(1) model’s at 1.55. Additionally, AIC and SBC
values decreased to 165.57 and 169.14. Again, the residual autocorrelation check verifies the
residuals are white noise, as the no-autocorrelation hypothesis cannot be rejected. Review of the
Q-Q plot of residuals validates the assumption that the residuals are normally distributed. Like
the plot of the AR(1) model's residuals, the plot of the MA(1) model's residuals shows no
structured deviation from the line. Relying on fit statistics, the MA(1) model appears to be the
preferred model for representing the steel exports data set.
When the model is estimated with both autoregressive and moving average terms, PROC
ARIMA output reiterates the data set's incompatibility with a mixed model. The mean term is
the only term with a significant p-value, <.0001. The moving average term is insignificant with a
p-value of .2646, and the autoregressive term is insignificant with a p-value of .4411, meaning the
terms are unnecessary to the model. While the standard error estimate is comparable to the other
models, at 1.56, the AIC and SBC values are larger, 166.94 and 172.29, respectively. These
values indicate the previous models are of better fit to the data set.
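The AIC and SBC values compared throughout this section are linked through the parameter count: PROC ARIMA computes AIC = -2 ln L + 2k and SBC = -2 ln L + k ln(n), so SBC = AIC + k(ln n - 2). A quick Python check against the appendix output (the helper name is mine):

```python
import math

def sbc_from_aic(aic, k, n):
    """SBC = AIC + k * (ln(n) - 2), since both share the -2 ln L term."""
    return aic + k * (math.log(n) - 2)

# AR(1): k = 2 parameters (MU, AR1,1); ARMA(1,1): k = 3; n = 44 observations
print(round(sbc_from_aic(166.149, 2, 44), 4))    # 169.7174, as reported
print(round(sbc_from_aic(166.9369, 3, 44), 4))   # ~172.2895 (reported 172.2894; inputs rounded)
```

The agreement confirms the mixed model carries one more parameter than the AR(1) and MA(1) fits, which is exactly why its SBC penalty is larger.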
Summary/Conclusions
Model identification is a crucial aspect of time series analysis. Mistakes made during the
identification stage can lead to incorrect estimations and inaccurate forecasts. Time taken to
precisely identify a model allows for more efficient and effective estimation and diagnostic
checking as well as forecasting stages for time series data. If ACF and PACF plots reveal that a
data set, such as the steel exports data set, might benefit from being fit to either an AR model or
an MA model, then fit statistics comparison is needed.
For the steel exports data set, the estimation and diagnostic checking stage validates the
identification stage. Assessment of the steel exports data suggests that an MA model provides a
better fit than an AR model. Not only are the standard error, AIC, and SBC values smaller with the
MA(1) model, but the MA term is highly significant. Additionally, the parameter estimates are
uncorrelated, indicating the model does not suffer from any collinearity issues. As identification
and diagnostics stages are consistent and complete, the steel exports series can now be forecasted
using an MA model.
Future Work
The steel exports data set might benefit from further evaluation using an AR(2) or an
MA(2) model. Adding a second term could provide a better fit and more accurate estimation of
the time series. It would be worth fitting the models and comparing fit statistics, but most
importantly evaluating the second term's significance. If the second term is found to be
insignificant, then the model is overfit and should be limited to either an AR(1) or an MA(1)
process.
Models fit with the differencing option in PROC ARIMA might also reveal a better fit
for the steel exports data set. If a data set is found to be nonstationary during the
identification stage, then transformation to a stationary series is needed. If more time were
available, it would be useful to validate the data set's stationarity by comparing a differenced
model to the previously fitted AR(1) and MA(1) models.
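First differencing, the transformation the differencing option applies, is simple to express. A minimal sketch (illustrative only; the steel exports series did not require it):

```python
def difference(series, d=1):
    """Apply first differencing d times: w[t] = y[t] - y[t-1],
    shortening the series by one observation per pass."""
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return series

print(difference([1, 3, 6, 10]))        # [2, 3, 4]
print(difference([1, 3, 6, 10], d=2))   # [1, 1]
```

A differenced fit that produced a higher AIC or SBC than the MA(1) model would further confirm the series is already stationary.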
References
SAS. (2010, April). Retrieved November 17, 2012, from SAS/STAT(R) 9.2 User's Guide, Second
Edition:
http://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_intr
oreg_sect003.htm
Moving Average Models. (2012). Retrieved November 17, 2012, from The Pennsylvania State
University, Applied Time Series Analysis:
https://onlinecourses.science.psu.edu/stat510/?q=node/48
Brocklebank, J. C., & Dickey, D. A. (2003). SAS for Forecasting Time Series. SAS Institute; 2nd
edition.
Wang, G. C. (2008). A Guide to Box-Jenkins Modeling. The Journal of Business Forecasting.
Appendix
Name of Variable = isweights
Mean of Working Series 4.418182
Standard Deviation 1.73354
Number of Observations 44
Autocorrelation Check for White Noise

To Lag  Chi-Square  DF  Pr > ChiSq  Autocorrelations
     6       12.15   6      0.0586  0.472  0.104  0.045  0.103  0.099  0.008
PROC ARIMA, No Differencing Applied, Estimate P=1
Conditional Least Squares Estimation

Parameter  Estimate  Standard Error  t Value  Approx Pr > |t|  Lag
MU          4.41217         0.43509    10.14           <.0001    0
AR1,1       0.47368         0.13622     3.48           0.0012    1

Constant Estimate     2.322229
Variance Estimate     2.444518
Std Error Estimate    1.563495
AIC                    166.149
SBC                   169.7174
Number of Residuals         44
Autocorrelation Check of Residuals

To Lag  Chi-Square  DF  Pr > ChiSq  Autocorrelations
     6        2.19   5      0.8224   0.074 -0.151 -0.057  0.072  0.086 -0.020
    12        4.32  11      0.9597  -0.020 -0.072 -0.018 -0.006 -0.165  0.046
    18        7.29  17      0.9794   0.096  0.013  0.007 -0.061  0.130 -0.102
    24       12.95  23      0.9530  -0.216 -0.094 -0.081 -0.039  0.042 -0.050
PROC ARIMA, No Differencing Applied, Estimate Q=1
Conditional Least Squares Estimation

Parameter  Estimate  Standard Error  t Value  Approx Pr > |t|  Lag
MU          4.42102         0.34703    12.74           <.0001    0
MA1,1      -0.49827         0.13512    -3.69           0.0006    1

Constant Estimate     4.421016
Variance Estimate     2.412583
Std Error Estimate    1.553249
AIC                   165.5704
SBC                   169.1388
Number of Residuals         44
Autocorrelation Check of Residuals

To Lag  Chi-Square  DF  Pr > ChiSq  Autocorrelations
     6        1.31   5      0.9336   0.059  0.094 -0.028  0.085  0.075 -0.020
    12        3.23  11      0.9873  -0.006 -0.079 -0.052 -0.013 -0.146  0.039
    18        6.68  17      0.9874   0.063 -0.001  0.044 -0.092  0.096 -0.149
    24       14.00  23      0.9268  -0.206 -0.135 -0.114 -0.084  0.014 -0.072
PROC ARIMA, No Differencing Applied, Estimate P=1, Q=1
Conditional Least Squares Estimation

Parameter  Estimate  Standard Error  t Value  Approx Pr > |t|  Lag
MU          4.42597         0.39769    11.13           <.0001    0
MA1,1      -0.32579         0.28804    -1.13           0.2646    1
AR1,1       0.23004         0.29571     0.78           0.4411    1

Constant Estimate     3.407829
Variance Estimate     2.436096
Std Error Estimate      1.5608
AIC                   166.9369
SBC                   172.2894
Number of Residuals         44
Autocorrelation Check of Residuals

To Lag  Chi-Square  DF  Pr > ChiSq  Autocorrelations
     6        0.65   4      0.9577   0.002 -0.006 -0.023  0.069  0.080 -0.030
    12        2.75  10      0.9867   0.000 -0.075 -0.034  0.004 -0.159  0.053
    18        6.15  16      0.9864   0.070 -0.003  0.034 -0.093  0.124 -0.125
    24       11.84  22      0.9606  -0.202 -0.094 -0.095 -0.065  0.029 -0.061
