2. Chapter 7: Diagnostic Tests of Data and Model
Learning Outcomes
Understand the meaning, consequences,
detection, and remedial measures of
autocorrelation, multicollinearity,
heteroscedasticity, and model misspecification.
Apply the tests for the stability of parameters.
Perform the sensitivity analysis.
Apply the tests for the dynamic interaction among
variables.
Apply the tests for the forecastability of the model.
Quantitative Research Methods by Abdul Waheed 2
3. Meaning of Autocorrelation
Autocorrelation (also called serial correlation) occurs when
the errors follow a pattern over time. It violates the
linear regression model's assumption that the error terms
are uncorrelated with one another.
Autocorrelation is a common problem in time-series
regressions. It often indicates that some relevant variable(s)
have been left out of the explanatory (independent) variables
of the model.
The terms autocorrelation and serial correlation are
used synonymously. Strictly, however, autocorrelation means
the lag correlation of a given error series with itself, while
serial correlation is the lag correlation between two different
series of error terms.
4. Meaning of Autocorrelation
Autocorrelation can be first-order or higher-order.
First-order autocorrelation is present in
the model if an error is influenced by the error
from the preceding time period.
Autocorrelation can be positive or negative.
Positive autocorrelation occurs when the error
tends to keep the same sign from one period to the
next. Negative autocorrelation occurs when the
error tends to change sign from one period to the
next.
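In symbols, first-order autocorrelation is commonly written as an AR(1) process for the error term. The notation below (u_t for the regression error, ρ for the autocorrelation coefficient, ε_t for a well-behaved random shock) is assumed here for illustration and does not come from the slides.

```latex
u_t = \rho\, u_{t-1} + \varepsilon_t, \qquad -1 < \rho < 1
```

With ρ > 0 the errors tend to keep their sign (positive autocorrelation), with ρ < 0 they tend to alternate in sign (negative autocorrelation), and with ρ = 0 there is no first-order autocorrelation.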
6. Consequences of Autocorrelation
Even when the residuals are serially
correlated (presence of autocorrelation in
the model), the estimated OLS parameters
are unbiased and consistent, but they are
not efficient.
The presence of autocorrelation affects the
standard errors of the estimated parameters.
With positive autocorrelation, the usual OLS formulas
tend to understate the standard errors, so the
confidence intervals are too narrow.
7. Consequences of Autocorrelation
Therefore, the usual t, F, and χ² tests will no
longer be valid and can lead to misleading
inference and prediction.
In this case, R² and the F-statistic may be
overestimated, and so may the t-statistics.
If the regression also includes a lagged
dependent variable, then the OLS estimates will
be biased as well as inefficient.
8. Detection of Autocorrelation
If we plot residuals against time and it shows a
systematic pattern, then autocorrelation is present in
the model.
The correlogram is the plot of the residual
autocovariances standardized by the residual variance.
It confirms that there is no autocorrelation in the residuals if
the autocorrelations and partial autocorrelations at all
lags are nearly zero and the Q-statistics are
insignificant with large p-values.
In that case, the correlogram shows no systematic
pattern.
10. Detection of Autocorrelation
The second method is to detect autocorrelation
through the Durbin-Watson test, developed by the two
statisticians Durbin and Watson.
The Durbin-Watson (DW) statistic provides a test for
first-order autocorrelation, the most common
autocorrelation.
The condition for the DW test is that the regression
model should include an intercept term, and there is
no lagged dependent variable in the model.
If DW is close to 2, there is no autocorrelation. If 0 < DW < 2,
there is some degree of positive autocorrelation (the closer
DW is to zero, the stronger it is). If 2 < DW < 4, there is
some degree of negative autocorrelation (the closer DW is to 4,
the stronger it is).
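The DW statistic is the sum of squared changes in successive residuals divided by the sum of squared residuals, which is approximately 2(1 − ρ̂) for the first-order autocorrelation ρ̂ of the residuals. The following is a minimal sketch of that calculation. The two residual series are made-up illustrative numbers, not output from any model in these slides.

```python
def durbin_watson(residuals):
    """Return the Durbin-Watson statistic for a sequence of residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residuals that keep the same sign for several periods
# (the pattern associated with positive autocorrelation).
persistent = [1.0, 1.2, 0.9, 1.1, -1.0, -1.2, -0.8, -1.1]

# Hypothetical residuals that change sign every period
# (the pattern associated with negative autocorrelation).
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

dw_persistent = durbin_watson(persistent)
dw_alternating = durbin_watson(alternating)
print(round(dw_persistent, 2), round(dw_alternating, 2))
```

The persistent series gives a value below 2 and the alternating series a value above 2, matching the interpretation ranges given above.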
11. Detection of Autocorrelation
If there is a lagged dependent variable in the
regression model, then the DW test is no longer
valid.
In this situation, we can use the Serial Correlation
LM test, that is, the Breusch-Godfrey Lagrange
Multiplier test.
The null hypothesis of the LM test is that there is
no autocorrelation. If the value of the F-statistic is
small and its probability value is more than 0.05,
we do not reject the null hypothesis and conclude that
there is no autocorrelation problem in the regression model.
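The mechanics of the LM test can be sketched with an auxiliary regression: regress the OLS residuals on the original regressors plus the lagged residual, and compare n·R² with a chi-square distribution. The sketch below covers one lag only and reports the chi-square form rather than the F form. It assumes NumPy is available, sets the presample residual to zero, and uses simulated data with a static regression for simplicity, so every number in it is illustrative.

```python
import math
import numpy as np


def ols_residuals(X, y):
    """Return the OLS residuals of y regressed on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta


def breusch_godfrey_lag1(X, y):
    """LM test for first-order serial correlation: returns (LM, p-value)."""
    e = ols_residuals(X, y)
    e_lag = np.concatenate(([0.0], e[:-1]))  # presample residual set to zero
    Z = np.column_stack([X, e_lag])          # original regressors + lagged residual
    v = ols_residuals(Z, e)
    centered = e - e.mean()
    r2 = 1.0 - (v @ v) / (centered @ centered)
    lm = max(len(y) * r2, 0.0)               # n * R^2, chi-square(1) under H0
    p_value = math.erfc(math.sqrt(lm / 2.0))  # upper tail of chi-square, 1 df
    return lm, p_value


# Simulated data (an assumption for illustration): errors follow an AR(1)
# process with coefficient 0.8, so the test should detect autocorrelation.
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
shocks = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.8 * u[t - 1] + shocks[t]
y = 1.0 + 2.0 * x + u
X = np.column_stack([np.ones(n), x])

lm_stat, p_val = breusch_godfrey_lag1(X, y)
print(f"LM = {lm_stat:.2f}, p-value = {p_val:.4f}")
```

A p-value below 0.05 leads to rejecting the null hypothesis of no autocorrelation, which is the expected outcome for this simulated series.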
12. Remedial Measures of Autocorrelation
Different approaches are used to treat autocorrelation.
The best approach is to reformulate the regression
model and re-estimate the regression coefficients to
eliminate the problem of autocorrelation.
We can reformulate the model by adding/removing a
variable(s) or changing its functional form.
However, in some cases, the autocorrelation problem
cannot be removed by adding/removing the
variable(s) or changing the model's functional form.
In these cases, we can use the Cochrane-Orcutt
method or the AR(1) method to eliminate the
autocorrelation problem.
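The Cochrane-Orcutt idea can be sketched as a loop: estimate ρ from the residuals, quasi-difference the data with that ρ, re-estimate the coefficients, and repeat until ρ settles. The code below is a minimal illustration under stated assumptions: NumPy is available, the errors are AR(1), the first observation is dropped, and the data are simulated. The function name and the tolerance are illustrative choices.

```python
import numpy as np


def ols(X, y):
    """Return OLS coefficients of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def cochrane_orcutt(X, y, max_iter=50, tol=1e-8):
    """Iterative Cochrane-Orcutt estimation for AR(1) errors.

    X must contain a column of ones. Because that column is quasi-differenced
    too, the returned coefficients are on the scale of the original model.
    """
    beta = ols(X, y)
    rho = 0.0
    for _ in range(max_iter):
        e = y - X @ beta                                  # residuals of original model
        rho_new = (e[1:] @ e[:-1]) / (e[:-1] @ e[:-1])    # regress e_t on e_{t-1}
        y_star = y[1:] - rho_new * y[:-1]                 # quasi-differenced data
        X_star = X[1:] - rho_new * X[:-1]
        beta = ols(X_star, y_star)
        converged = abs(rho_new - rho) < tol
        rho = rho_new
        if converged:
            break
    return beta, rho


# Simulated data (illustrative): true slope 2.0, AR(1) errors with rho = 0.8.
rng = np.random.default_rng(1)
n = 200
x = rng.normal(size=n)
shocks = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.8 * u[t - 1] + shocks[t]
y = 1.0 + 2.0 * x + u
X = np.column_stack([np.ones(n), x])

beta_co, rho_hat = cochrane_orcutt(X, y)
print("slope:", round(float(beta_co[1]), 3), "rho:", round(float(rho_hat), 3))
```

The estimated ρ should land near the value used in the simulation, and the slope should stay close to 2.0.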
13. Meaning of Multicollinearity
Multicollinearity is the linear relationship among
some or all explanatory variables of the regression
model.
Multicollinearity can be perfect or imperfect. If
one explanatory variable is an exact linear combination of
the other explanatory variables, the multicollinearity is
perfect.
In the case of perfect multicollinearity, we cannot
obtain unique estimates of the regression coefficients.
Multicollinearity does not reduce the model's
predictive power; it affects only the estimates of the
individual coefficients and their standard errors.
14. Consequences of Multicollinearity
Large variances and standard errors of the OLS estimators.
Insignificant t ratios of the OLS estimators, leading to a
failure to reject the null hypothesis that the true
population parameter is zero.
A high adjusted R² but few significant coefficients in the
regression model.
Unstable OLS estimators and wrong signs on some
regression coefficients.
The OLS estimators and their standard errors become
sensitive to small changes in the data.
The OLS estimators are still linear, unbiased, and
normally distributed (they remain BLUE), but their
variances are large, so the estimates are imprecise.
15. Detection of Multicollinearity
High pairwise correlations among explanatory
variables.
A priori knowledge of the relationship among
variables.
High adjusted R² but few coefficients are significant.
Coefficients of the models are statistically
insignificant, but the overall model (F-stat) is
statistically significant.
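The first check in this list, high pairwise correlations, is easy to automate. The sketch below assumes NumPy is available and uses simulated regressors in which x2 is almost an exact multiple of x1. The 0.8 cut-off is a common rule of thumb adopted here as an assumption, not a value taken from the slides.

```python
import numpy as np

# Simulated regressors (illustrative): x2 is nearly 2 * x1, x3 is unrelated.
rng = np.random.default_rng(42)
n = 100
x1 = rng.normal(size=n)
x2 = 2.0 * x1 + rng.normal(scale=0.1, size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

# Pairwise correlation matrix of the explanatory variables (columns of X).
corr = np.corrcoef(X, rowvar=False)

# Flag pairs whose absolute correlation exceeds the assumed 0.8 threshold.
THRESHOLD = 0.8
k = X.shape[1]
high_pairs = [
    (i, j)
    for i in range(k)
    for j in range(i + 1, k)
    if abs(corr[i, j]) > THRESHOLD
]
print(np.round(corr, 3))
print("highly correlated pairs (column indices):", high_pairs)
```

Only the pair (0, 1), that is x1 and x2, should be flagged. Pairwise correlations can miss multicollinearity that involves three or more variables jointly, so this check complements the other signs listed above rather than replacing them.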
16. Remedial Measures of Multicollinearity
Increase the size of the sample.
Drop a variable(s) from the model.
Re-think the model, for example by making it log-linear in the variables.
Transform the variables, for example by using the
first difference form.
Find out which independent variable is a linear
combination of other variables of the model, and
remove it.
Combine the cross-sectional and time-series data and
use pooled/panel data for analysis.
17. Meaning of Heteroscedasticity
An essential assumption of the linear regression
analysis is that the error term's variance is constant
(homoscedastic).
If the error term's variance is not constant (i.e., there is a
heteroscedasticity problem), the error term
observations come from more than one
probability distribution.
Heteroscedasticity mostly occurs in regression models
that are based on cross-section data, where there are
differences (heterogeneity) in the observations.
19. Consequences of Heteroscedasticity
The OLS estimators are still unbiased, but they are
no longer efficient.
It typically causes the OLS procedure to underestimate the
standard errors of the coefficients; thus, the t-statistics
are larger than they should be.
The F-statistic will also be larger than its true
value.
Because the t-statistics and F-statistics are larger than
they should be, the null hypothesis may be rejected when it
should not be.
20. Detection of Heteroscedasticity
The most widely used is the White heteroscedasticity test.
If the probability value of the F-statistic or χ²-statistic
under the White test is more than 0.05, we do not reject
the null hypothesis that the residuals are homoscedastic.
The Autoregressive Conditional Heteroscedasticity
(ARCH) LM test is also used. If the probability
value of its F-statistic or χ²-statistic is more than 0.05, we
do not reject the null hypothesis that the residuals are
homoscedastic.
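The White test can be sketched as an auxiliary regression of the squared OLS residuals on the regressors, their squares, and their cross-products, with n·R² compared against a chi-square distribution. The minimal example below handles a single regressor only, so the auxiliary regression has just x and x² and the statistic has 2 degrees of freedom. It assumes NumPy is available and uses simulated data whose error spread grows with x.

```python
import math
import numpy as np


def white_test_single(x, y):
    """White test with one regressor: returns (LM statistic, p-value)."""
    n = len(y)
    X = np.column_stack([np.ones(n), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e2 = (y - X @ beta) ** 2                       # squared OLS residuals

    A = np.column_stack([np.ones(n), x, x ** 2])   # auxiliary regressors
    gamma, *_ = np.linalg.lstsq(A, e2, rcond=None)
    aux_resid = e2 - A @ gamma
    centered = e2 - e2.mean()
    r2 = 1.0 - (aux_resid @ aux_resid) / (centered @ centered)

    lm = max(n * r2, 0.0)                # chi-square(2) under homoscedasticity
    p_value = math.exp(-lm / 2.0)        # upper tail of chi-square with 2 df
    return lm, p_value


# Simulated data (illustrative): the error standard deviation is proportional
# to x, so the residuals are heteroscedastic by construction.
rng = np.random.default_rng(5)
n = 300
x = rng.uniform(1.0, 10.0, size=n)
y = 1.0 + 2.0 * x + x * rng.normal(size=n)

white_lm, white_p = white_test_single(x, y)
print(f"LM = {white_lm:.2f}, p-value = {white_p:.6f}")
```

Here the p-value should fall well below 0.05, so the null hypothesis of homoscedastic residuals is rejected for this simulated series.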
21. Detection of Heteroscedasticity
Furthermore:
If we plot the residuals and see a
systematic pattern, this indicates a
heteroscedasticity problem.
The nature of the data gives information about
heteroscedasticity. If we are using cross-sectional
data that involve heterogeneous units,
heteroscedasticity may be present in the regression
results.
22. Remedial Measures of Heteroscedasticity
Use the log transformation of the data.
Estimate the model for sub-samples with a
homogeneous group.
Use the weighted least squares (WLS) method, if the
value of σ² is known.
If σ² is not known, obtain an estimate of σ² and use
it in the weighted least squares method.
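Weighted least squares amounts to dividing every observation by its error standard deviation and then running OLS on the transformed data. The sketch below assumes NumPy is available and that the error variance of each observation is known (here it equals x² by construction of the simulated data). When the variances are unknown, estimated values would be passed in the same argument.

```python
import numpy as np


def wls(X, y, sigma2):
    """Weighted least squares given each observation's error variance."""
    w = 1.0 / np.sqrt(sigma2)                 # weight = 1 / standard deviation
    beta, *_ = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)
    return beta


# Simulated data (illustrative): true intercept 1.0, true slope 2.0,
# error variance equal to x squared.
rng = np.random.default_rng(11)
n = 300
x = rng.uniform(1.0, 10.0, size=n)
y = 1.0 + 2.0 * x + x * rng.normal(size=n)
X = np.column_stack([np.ones(n), x])

beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
beta_wls = wls(X, y, sigma2=x ** 2)
beta_equal = wls(X, y, sigma2=np.ones(n))    # equal variances reduce WLS to OLS

print("OLS:", np.round(beta_ols, 3))
print("WLS:", np.round(beta_wls, 3))
```

Both estimators are unbiased in this setting. The gain from WLS is efficiency, which shows up as smaller sampling variability over repeated samples rather than in any single run.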
23. Meaning of Model Misspecification
One of the classical linear regression model's assumptions
is that the regression model used in the analysis is correctly
specified. We have a problem with model misspecification
if the model is not correctly specified. There are several
types of specification errors.
Omitting a relevant variable from the model (omitted
variable bias).
Including an unnecessary or irrelevant variable in the
model.
Adopting the wrong functional form.
Using a proxy variable instead of the true variable.
24. Consequences of Model Misspecification
The consequences of omitting a relevant variable are:
If the omitted variable is correlated with the included
variable, then the estimators will be biased and
inconsistent.
Even if the omitted variable is not correlated with the
included variable, the model's intercept will be biased, but
the slope coefficient will be unbiased.
The variance of the disturbance term (σ²) will be incorrectly
estimated.
The usual hypothesis testing is likely to give misleading
conclusions about estimated parameters.
The forecast of the dependent variable will be unreliable.
25. Consequences of Model Misspecification
If we include an irrelevant variable in the model, the
following will be the consequences.
The OLS estimators of the over-specified model
remain unbiased and consistent.
The variance of the error term (σ²) is correctly
estimated.
The usual hypothesis testing for the estimated
parameters is still valid.
The estimated parameters will be inefficient (having
larger variances than those of the correct model).
26. Detection of Model Misspecification
The examination of residuals, especially in
cross-sectional data, is a useful diagnostic
for model misspecification, such as the
omission of a relevant variable or incorrect
functional form.
If the model is not correctly specified, the
plot of the residuals will exhibit a pattern.
27. Detection of Model Misspecification
Ramsey has proposed a general test of model
specification error called Ramsey’s RESET
(Regression Specification Error Test).
It is a test of linear specification against a
nonlinear specification.
The null hypothesis of the test is that the
specification of the model is linear. If the
computed test statistic's probability is equal to or
less than 0.05, we reject the null hypothesis and
conclude that the specification is nonlinear.
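One common way to carry out RESET is to add powers of the fitted values to the original regression and F-test their joint significance. The sketch below assumes NumPy is available, uses the squared and cubed fitted values as the added terms, and reports only the F-statistic, since the p-value would come from an F distribution with (2, n − k − 2) degrees of freedom. The data are simulated.

```python
import numpy as np


def fit(X, y):
    """Return (residual sum of squares, fitted values) from OLS of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    resid = y - fitted
    return resid @ resid, fitted


def reset_f(X, y):
    """Ramsey RESET F-statistic with fitted^2 and fitted^3 as added regressors."""
    n, k = X.shape
    rss_restricted, fitted = fit(X, y)
    X_aug = np.column_stack([X, fitted ** 2, fitted ** 3])
    rss_unrestricted, _ = fit(X_aug, y)
    q = 2                                          # number of added terms
    return ((rss_restricted - rss_unrestricted) / q) / (
        rss_unrestricted / (n - k - q)
    )


# Simulated data (illustrative): one truly linear relation and one with an
# omitted squared term, both estimated with the same linear specification.
rng = np.random.default_rng(21)
n = 200
x = rng.uniform(0.0, 3.0, size=n)
noise = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y_linear = 1.0 + 2.0 * x + noise
y_quadratic = 1.0 + 2.0 * x + 1.5 * x ** 2 + noise

f_linear = reset_f(X, y_linear)
f_quadratic = reset_f(X, y_quadratic)
print(f"F (linear truth) = {f_linear:.2f}, F (quadratic truth) = {f_quadratic:.2f}")
```

For these sample sizes the 5% critical value is roughly 3.0, so the large F-statistic for the quadratic case signals that the linear specification should be rejected.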
28. Remedial Measures of Model Misspecification
The following are some steps that must be
taken to avoid model misspecification.
Review the theoretical literature for the
identification of core variables of the model.
Review the empirical literature for the most
relevant variables of the model.
Adopt a general-to-specific modeling
approach to arrive at a good-fit model.
29. Test for Stability of Parameters
The CUSUM test helps analyze possible parameter
variation. It is based on the cumulative
sum of the recursive residuals.
We plot the cumulative sum of the recursive residuals against
time together with the 5% critical lines. If the plot
stays inside the critical lines, it indicates
parameter stability.
The CUSUM of Squares test can also be used to test the
stability of parameters. If the plot of the cumulative sum of
squared recursive residuals stays inside the 5% critical
lines, it indicates that the parameters are stable.
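The CUSUM calculation can be sketched as follows: compute one-step-ahead recursive residuals, cumulate them after scaling by their standard deviation, and compare the path with a pair of straight 5% critical lines. The sketch assumes NumPy is available and follows the Brown-Durbin-Evans construction with the constant 0.948 for the 5% level. The data are simulated with a deliberate intercept shift halfway through the sample.

```python
import numpy as np


def cusum_test(X, y, a=0.948):
    """Return (CUSUM path, 5% bound, stable flag) from recursive residuals."""
    n, k = X.shape
    w = np.empty(n - k)
    for t in range(k, n):
        X_past, y_past = X[:t], y[:t]            # data up to the previous period
        xtx_inv = np.linalg.inv(X_past.T @ X_past)
        beta = xtx_inv @ X_past.T @ y_past
        x_new = X[t]
        scale = np.sqrt(1.0 + x_new @ xtx_inv @ x_new)
        w[t - k] = (y[t] - x_new @ beta) / scale  # recursive residual
    sigma = np.sqrt((w @ w) / (n - k))            # full-sample residual std. dev.
    path = np.cumsum(w) / sigma
    r = np.arange(1, n - k + 1)
    bound = a * np.sqrt(n - k) + 2.0 * a * r / np.sqrt(n - k)
    stable = bool(np.all(np.abs(path) <= bound))
    return path, bound, stable


# Simulated data (illustrative): the intercept jumps by 3 at observation 100,
# so the parameters are unstable by construction.
rng = np.random.default_rng(3)
n = 200
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
y[100:] += 3.0
X = np.column_stack([np.ones(n), x])

cusum_path, cusum_bound, is_stable = cusum_test(X, y)
print("parameters stable at the 5% level:", is_stable)
```

Because of the built-in break, the path should leave the critical band and the function should report instability. Software packages may scale the recursive residuals slightly differently, so plotted values can differ from this sketch.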
31. Impulse Response Function
The impulse response function (IRF) traces a
variable's reaction over time to exogenous
impulses (that is, it tests for dynamic interaction
among variables).
An impulse is an exogenous shock, such as a change
in a fiscal policy or monetary policy
parameter. The IRF shows the response of
endogenous variables such as consumption,
output, investment, and employment at the time
of the shock and at subsequent points in time.
32. EViews Output for Impulse Response Function
[Figure: two impulse response plots over periods 1 to 10.
Panel 1: Response of DSAV to Cholesky One S.D. GDPRB Innovation.
Panel 2: Response of DSAV to Cholesky One S.D. GOLDP Innovation.]
33. Sensitivity Analysis
In the regression analysis, suppose we find a
positive and statistically significant relationship
between X1 and Y. The causality analysis also
confirms that X1 positively causes Y. However, one
legitimate question is: how robust is our finding?
To test the robustness of our results, we perform
a sensitivity analysis. This analysis was
introduced by Levine and Renelt (1992), who used
a modified version of the Extreme Bounds Analysis
(EBA) initially developed by Leamer (1985).
34. Sensitivity Analysis
For the understanding of this analysis,
consider the following model:
$$Y = \beta_1 + \beta_2 X_1 + \beta_3 X_2 + \beta_4 Z + \varepsilon$$
where β1 is the constant term, X1 is the
variable of interest or focus variable, and X2 is
the key variable, which is always included in
the regression. Z is a subset of variables
chosen from a pool of variables.
35. Sensitivity Analysis
The model is re-estimated with different subsets Z.
If the coefficient of the focus variable keeps the same sign
and remains significant, one can maintain a fair
amount of confidence in the initial estimate, and the
result is called “robust.”
If, however, the coefficient of the focus variable changes its
sign or becomes insignificant, then one might feel less
confident in the relationship between X1 and Y,
indicating that the results are “fragile.”
Furthermore, we can conclude that the data yields
fairly sturdy information on the focus variable's
coefficient, if the highest and lowest values of the
focus variable's coefficient fall within a narrow
interval.
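The procedure can be sketched as a loop over subsets of the pool: re-estimate the model for each subset, record the focus coefficient and its t-statistic, and check sign and significance across all runs. The sketch below is a simplified illustration and not the exact Levine and Renelt procedure. It assumes NumPy is available, limits each subset to at most two pool variables, uses |t| > 1.96 as the significance rule, and runs on simulated data.

```python
import itertools
import numpy as np


def ols_with_se(X, y):
    """Return OLS coefficients and their conventional standard errors."""
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    s2 = (resid @ resid) / (n - k)
    return beta, np.sqrt(np.diag(s2 * xtx_inv))


def extreme_bounds(y, x1, x2, pool, max_z=2, t_crit=1.96):
    """Return (lowest, highest) focus coefficient and a robust/fragile flag."""
    n = len(y)
    coefs, t_stats = [], []
    for size in range(max_z + 1):
        for combo in itertools.combinations(range(pool.shape[1]), size):
            columns = [np.ones(n), x1, x2] + [pool[:, j] for j in combo]
            beta, se = ols_with_se(np.column_stack(columns), y)
            coefs.append(beta[1])                 # coefficient on focus variable
            t_stats.append(beta[1] / se[1])
    coefs, t_stats = np.array(coefs), np.array(t_stats)
    same_sign = bool(np.all(coefs > 0) or np.all(coefs < 0))
    robust = same_sign and bool(np.all(np.abs(t_stats) > t_crit))
    return coefs.min(), coefs.max(), robust


# Simulated data (illustrative): x1 truly affects y with coefficient 2.0.
rng = np.random.default_rng(7)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
pool = rng.normal(size=(n, 4))                    # hypothetical pool of Z variables
y = 1.0 + 2.0 * x1 + 1.0 * x2 + 0.5 * pool[:, 0] + rng.normal(size=n)

low, high, robust = extreme_bounds(y, x1, x2, pool)
print(f"focus coefficient ranges from {low:.3f} to {high:.3f}; robust: {robust}")
```

A narrow range between the lowest and highest coefficient, together with an unchanged sign and significance in every run, corresponds to the robust case described above.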
36. Test for Forecast Ability of Model
The regression model is often used for forecasting.
Forecasting uses historical data on the variables
to make the best guess about the dependent variable's
future values.
The forecast can be in-sample or out-of-sample.
If the dependent variable's forecast is based on the
independent variables' actual values, it is called an
in-sample forecast.
If the dependent variable's forecast is based on the
independent variables' estimated future values, it is
called an out-of-sample forecast.
37. Test for Forecast Ability of Model
RMSE: The root mean square error is a scale-dependent
measure used to compare forecast methods applied to a
single time series, or to time series with the same unit. It is
easy to understand and compute.
MAPE: The mean absolute percentage error is a unit-free
measure and is frequently used to compare forecast
performance between data sets.
Both measures have shortcomings: RMSE depends on the scale
of the data, and MAPE becomes unstable when actual values are
close to zero. A widely used scale-free measure of the forecast
ability of the model is Theil’s inequality coefficient.
38. Test for Forecast Ability of Model
Theil Inequality Coefficient
Theil’s inequality coefficient provides a
measure of how well a time series of
estimated values compares to a
corresponding time series of observed
values.
In its usual form the coefficient lies between 0 and 1. The
closer its value is to zero, the better the forecasting ability
of the model; a value of 1 indicates the worst possible fit.
(In Theil’s alternative U2 form, a value of 1 means the forecast
is no better than a naïve no-change guess.)
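The three forecast accuracy measures can be computed directly from a list of actual values and a list of forecasts. The sketch below uses the form of Theil's coefficient that is bounded between 0 and 1, namely RMSE divided by the sum of the root mean squares of the forecasts and of the actual values. The data are made-up illustrative numbers.

```python
import math


def rmse(actual, forecast):
    """Root mean square error."""
    n = len(actual)
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / n)


def mape(actual, forecast):
    """Mean absolute percentage error (undefined if any actual value is zero)."""
    n = len(actual)
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / n


def theil_u(actual, forecast):
    """Theil inequality coefficient in the form bounded between 0 and 1."""
    n = len(actual)
    root_mean_sq_forecast = math.sqrt(sum(f * f for f in forecast) / n)
    root_mean_sq_actual = math.sqrt(sum(a * a for a in actual) / n)
    return rmse(actual, forecast) / (root_mean_sq_forecast + root_mean_sq_actual)


# Hypothetical actual values and forecasts for four periods.
actual = [100.0, 110.0, 120.0, 130.0]
forecast = [102.0, 108.0, 123.0, 127.0]

rmse_value = rmse(actual, forecast)
mape_value = mape(actual, forecast)
theil_value = theil_u(actual, forecast)

# Rescaling the data (for example, changing units) multiplies RMSE by the
# same factor but leaves MAPE and Theil's coefficient unchanged.
scaled_actual = [10.0 * a for a in actual]
scaled_forecast = [10.0 * f for f in forecast]
rmse_scaled = rmse(scaled_actual, scaled_forecast)
theil_scaled = theil_u(scaled_actual, scaled_forecast)

print(round(rmse_value, 3), round(mape_value, 3), round(theil_value, 4))
```

The comparison of the original and rescaled series illustrates why RMSE is called scale-dependent while Theil's coefficient is not.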
39. Research Activity
1. Based on some theory, formulate a multiple regression
model (or select a model from Box 6.1 of Chapter 6). Get the
time series data of the variables from the data file. Perform
the following tasks using computer software.
i) Estimate the regression model using the OLS
method and interpret the results.
ii) Discuss the econometric problems of the regression
results.
iii) Test for the stability of parameters using the
CUSUM and CUSUM of Squares tests.
iv) Test for the dynamic interaction among variables.
v) Get data on additional variables and perform
sensitivity analysis.
vi) Test the forecast ability of the model.