chapter_7.pptx

Quantitative Research Methods
A Practical Approach
Abdul Waheed
Quantitative Research Methods by Abdul Waheed

Chapter 7: Diagnostic Tests of Data and Model
Learning Outcomes
 Understand the meaning, consequences,
detection, and remedial measures of
autocorrelation, multicollinearity,
heteroscedasticity, and model misspecification.
 Apply the tests for the stability of parameters.
 Perform the sensitivity analysis.
 Apply the tests for the dynamic interaction among
variables.
 Apply the tests for the forecastability of the model.

Meaning of Autocorrelation
 Autocorrelation (also called serial correlation) occurs when
the error terms are correlated with one another, typically
across time. It violates the classical linear regression
model's assumption that the error terms are uncorrelated.
 Autocorrelation is a common problem in time-series
regressions. It often indicates that some relevant variable(s)
have been omitted from the explanatory (independent)
variables of the model or that the functional form is wrong.
 The terms autocorrelation and serial correlation are
often used synonymously. Strictly, however, autocorrelation
means the lagged correlation of a given series of error
terms with itself, whereas serial correlation is the lagged
correlation between two different series.

 Autocorrelation can be first-order or higher-order.
First-order autocorrelation is present in the model
if each error term is influenced by the error term
from the immediately preceding time period.
 Autocorrelation can be positive or negative.
Positive autocorrelation occurs when the error term
tends to keep the same sign from one period to the
next. Negative autocorrelation occurs when the
error term tends to change its sign from one period
to the next.
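First-order autocorrelation is usually formalized as an AR(1) scheme, u_t = ρu_{t−1} + ε_t with −1 < ρ < 1. The short Python sketch below is an illustration under that assumption (the function names, sample size and seeds are hypothetical): it simulates errors with ρ = 0.8, 0 and −0.8 and reports the sample first-order autocorrelation and how often the error changes sign. With ρ > 0 sign changes are rare; with ρ < 0 they happen in most periods.

```python
import random


def simulate_ar1(rho, n, sigma=1.0, seed=0):
    """Simulate n errors from u_t = rho * u_{t-1} + e_t, e_t ~ N(0, sigma^2), starting at u_0 = 0."""
    rng = random.Random(seed)
    u = [0.0]
    for _ in range(n - 1):
        u.append(rho * u[-1] + rng.gauss(0.0, sigma))
    return u


def lag1_autocorrelation(u):
    """Sample first-order autocorrelation r1 of a series."""
    n = len(u)
    mean = sum(u) / n
    num = sum((u[t] - mean) * (u[t - 1] - mean) for t in range(1, n))
    den = sum((v - mean) ** 2 for v in u)
    return num / den


def sign_changes(u):
    """Count how often consecutive errors have opposite signs."""
    return sum(1 for a, b in zip(u, u[1:]) if a * b < 0)


if __name__ == "__main__":
    for rho in (0.8, 0.0, -0.8):
        u = simulate_ar1(rho, 500, seed=1)
        print(f"rho = {rho:+.1f}: r1 = {lag1_autocorrelation(u):+.2f}, "
              f"sign changes = {sign_changes(u)} of {len(u) - 1}")
```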

Consequences of Autocorrelation
 Even when the errors are serially
correlated (autocorrelation is present in
the model), the OLS estimators remain
unbiased and consistent, but they are no
longer efficient.
 Autocorrelation also biases the usual OLS
standard errors. With positive autocorrelation,
the reported standard errors are typically too
small, and the confidence intervals are
consequently too narrow.

 Therefore, the usual t and F tests are no
longer valid and can lead to misleading
inference and prediction.
 In this case, R2 may be overestimated, and
the t- and F-statistics may be inflated.
 If the regression also includes a lagged
dependent variable, the OLS estimators are not
only inefficient but also biased and inconsistent.

Detection of Autocorrelation
 If we plot the residuals against time and the plot shows a
systematic pattern, autocorrelation may be present in
the model.
 The correlogram is the plot of the residual
autocovariances at successive lags standardized by the
residual variance, that is, the residual autocorrelations.
 The correlogram indicates no autocorrelation in the
residuals if the autocorrelations and partial
autocorrelations at all lags are close to zero and the
Q-statistics are insignificant (large p-values).
 In that case, the correlogram shows no systematic
pattern.
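The numbers behind a correlogram can be sketched as follows, assuming the Q-statistic is the Ljung-Box version, Q_k = n(n + 2) Σ_{j=1..k} r_j²/(n − j), compared with a chi-square distribution with k degrees of freedom. The code computes the residual autocorrelations r_k and the Q-statistics with p-values; partial autocorrelations are left out to keep it short, and the simulated series and helper names are illustrative only.

```python
import math
import random


def acf(e, nlags):
    """Sample autocorrelations r_1..r_nlags (autocovariances divided by the variance)."""
    n = len(e)
    mean = sum(e) / n
    c0 = sum((v - mean) ** 2 for v in e) / n
    return [
        sum((e[t] - mean) * (e[t - k] - mean) for t in range(k, n)) / n / c0
        for k in range(1, nlags + 1)
    ]


def chi2_sf(x, df):
    """Upper-tail chi-square probability via the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > 1e-15 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)


def ljung_box(e, nlags):
    """Rows of (lag k, r_k, Q_k, p-value) with Q_k = n(n+2) * sum_{j<=k} r_j^2 / (n - j)."""
    n = len(e)
    rows, q = [], 0.0
    for k, rk in enumerate(acf(e, nlags), start=1):
        q += rk * rk / (n - k)
        stat = n * (n + 2) * q
        rows.append((k, rk, stat, chi2_sf(stat, k)))
    return rows


if __name__ == "__main__":
    rng = random.Random(7)
    white = [rng.gauss(0, 1) for _ in range(300)]
    ar = [0.0]
    for _ in range(299):
        ar.append(0.7 * ar[-1] + rng.gauss(0, 1))
    for label, resid in (("white noise", white), ("AR(1), rho = 0.7", ar)):
        print(label)
        for k, rk, q, p in ljung_box(resid, 5):
            print(f"  lag {k}: r = {rk:+.3f}  Q = {q:7.2f}  p = {p:.3f}")
```

For the white-noise series most p-values should be large; for the AR(1) series they are essentially zero.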

 The second method is to detect autocorrelation
through the Durbin-Watson test, developed by the
statisticians Durbin and Watson.
 The Durbin-Watson (DW) statistic provides a test for
first-order autocorrelation, the most common form of
autocorrelation.
 The DW test requires that the regression model include
an intercept term and contain no lagged dependent
variable.
 Because DW ≈ 2(1 − ρ̂), where ρ̂ is the estimated
first-order autocorrelation of the residuals, DW ranges
from 0 to 4. If DW ≈ 2, there is no first-order
autocorrelation. If 0 < DW < 2, there is some degree of
positive autocorrelation (in the worst case, DW is near
zero). If 2 < DW < 4, there is some negative
autocorrelation.
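The DW statistic is computed from the OLS residuals e_t as DW = Σ_{t=2..n}(e_t − e_{t−1})² / Σ_{t=1..n} e_t². A minimal sketch, with hypothetical helper names and a simulated positively autocorrelated series standing in for regression residuals, that also shows how closely DW tracks 2(1 − ρ̂):

```python
import random


def durbin_watson(e):
    """DW = sum_{t>=2} (e_t - e_{t-1})^2 / sum_t e_t^2, computed from OLS residuals e."""
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    den = sum(v * v for v in e)
    return num / den


def lag1_autocorrelation(e):
    """rho_hat = sum e_t e_{t-1} / sum e_t^2 (residuals assumed to have mean about zero)."""
    return sum(e[t] * e[t - 1] for t in range(1, len(e))) / sum(v * v for v in e)


def simulated_residuals(n=200, rho=0.6, seed=2):
    """Stand-in residual series with first-order autocorrelation rho."""
    rng = random.Random(seed)
    e = [0.0]
    for _ in range(n - 1):
        e.append(rho * e[-1] + rng.gauss(0, 1))
    return e


if __name__ == "__main__":
    resid = simulated_residuals()
    print(f"DW = {durbin_watson(resid):.3f}, "
          f"2(1 - rho_hat) = {2 * (1 - lag1_autocorrelation(resid)):.3f}")
    print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # alternating signs give 3.0
```

The two numbers differ only by (e_1² + e_n²)/Σe_t², which is small in long samples. A formal decision still requires the tabulated lower and upper critical values, which the sketch does not attempt.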

 If there is a lagged dependent variable in the
regression model, then the DW test is no longer
valid.
 In this situation, we can use the Breusch-Godfrey
Serial Correlation Lagrange Multiplier (LM) test.
 The null hypothesis of the LM test is that there is
no autocorrelation. If the F-statistic is small and
its probability value (p-value) is more than 0.05, we
do not reject the null hypothesis and conclude that
autocorrelation is not a problem in the regression
model.
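A sketch of the first-order Breusch-Godfrey test under stated assumptions: estimate the model by OLS, regress the residual e_t on the original regressors and e_{t−1}, and compute LM = m·R², where m is the number of observations in this auxiliary regression; under the null, LM is approximately chi-square with 1 degree of freedom (5% critical value 3.841). The chi-square (LM) form is shown rather than the F form, and the first observation is simply dropped; software packages may handle pre-sample residuals differently. The helpers `solve`, `ols` and `simulate` are hypothetical, written in plain Python to keep the example self-contained.

```python
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ols(X, y):
    """OLS via the normal equations; X is a list of rows including a constant. Returns (b, resid, R^2)."""
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    b = solve(xtx, xty)
    resid = [yi - sum(bj * xj for bj, xj in zip(b, r)) for r, yi in zip(X, y)]
    ybar = sum(y) / len(y)
    r2 = 1.0 - sum(e * e for e in resid) / sum((yi - ybar) ** 2 for yi in y)
    return b, resid, r2


def breusch_godfrey_lm1(X, y):
    """First-order BG test: regress e_t on X_t and e_{t-1} (t >= 2); LM = m * R^2."""
    _, e, _ = ols(X, y)
    X_aux = [X[t] + [e[t - 1]] for t in range(1, len(y))]
    _, _, r2 = ols(X_aux, e[1:])
    return len(X_aux) * r2


def simulate(n=200, rho=0.7, seed=5):
    """y_t = 1 + 2 x_t + u_t with AR(1) errors u_t = rho * u_{t-1} + e_t."""
    rng = random.Random(seed)
    x = [rng.uniform(0, 10) for _ in range(n)]
    u = [0.0]
    for _ in range(n - 1):
        u.append(rho * u[-1] + rng.gauss(0, 1))
    y = [1.0 + 2.0 * xi + ui for xi, ui in zip(x, u)]
    return [[1.0, xi] for xi in x], y


if __name__ == "__main__":
    lm = breusch_godfrey_lm1(*simulate())
    print(f"LM = {lm:.2f}; reject 'no autocorrelation' at 5%: {lm > 3.841}")
```

Higher-order versions add e_{t−2}, e_{t−3}, ... to the auxiliary regression, with the degrees of freedom equal to the number of lags.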

Remedial Measures of Autocorrelation
 Different approaches are used to treat autocorrelation.
 The best approach is to reformulate the regression
model and re-estimate the regression coefficients to
eliminate the problem of autocorrelation.
 We can reformulate the model by adding/removing a
variable(s) or changing its functional form.
 However, in some cases, the autocorrelation problem
cannot be removed by adding/removing the
variable(s) or changing the model's functional form.
 In these cases, we can use the Cochrane-Orcutt
iterative method or estimate the model with an AR(1)
error term to correct for autocorrelation.
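The Cochrane-Orcutt procedure for a simple regression y_t = α + βx_t + u_t with u_t = ρu_{t−1} + ε_t can be sketched as follows: estimate by OLS, estimate ρ from the residuals, quasi-difference the data (y_t − ρ̂y_{t−1} and x_t − ρ̂x_{t−1}), re-estimate, and iterate until ρ̂ converges. This is a plain-Python illustration with a single regressor and hypothetical function names, not a production routine; as in the original method, the first observation is dropped.

```python
import random


def simple_ols(x, y):
    """Intercept and slope of a simple OLS regression of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def cochrane_orcutt(x, y, tol=1e-8, max_iter=100):
    """Iterative Cochrane-Orcutt estimates (alpha, beta, rho)."""
    alpha, beta = simple_ols(x, y)
    rho = 0.0
    for _ in range(max_iter):
        e = [yi - alpha - beta * xi for xi, yi in zip(x, y)]
        new_rho = (sum(e[t] * e[t - 1] for t in range(1, len(e)))
                   / sum(e[t - 1] ** 2 for t in range(1, len(e))))
        # Quasi-difference the data; the first observation is lost.
        xs = [x[t] - new_rho * x[t - 1] for t in range(1, len(x))]
        ys = [y[t] - new_rho * y[t - 1] for t in range(1, len(y))]
        a_star, beta = simple_ols(xs, ys)
        alpha = a_star / (1.0 - new_rho)  # transformed intercept equals alpha * (1 - rho)
        converged = abs(new_rho - rho) < tol
        rho = new_rho
        if converged:
            break
    return alpha, beta, rho


def simulate(n=500, rho=0.6, seed=11):
    """y_t = 1 + 2 x_t + u_t with AR(1) errors."""
    rng = random.Random(seed)
    x = [rng.gauss(5.0, 2.0) for _ in range(n)]
    u = [0.0]
    for _ in range(n - 1):
        u.append(rho * u[-1] + rng.gauss(0, 1))
    y = [1.0 + 2.0 * xi + ui for xi, ui in zip(x, u)]
    return x, y


if __name__ == "__main__":
    x, y = simulate()
    print("OLS (alpha, beta):", simple_ols(x, y))
    print("Cochrane-Orcutt (alpha, beta, rho):", cochrane_orcutt(x, y))
```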

Meaning of Multicollinearity
 Multicollinearity is the existence of a linear
relationship among some or all explanatory variables
of the regression model.
 Multicollinearity can be perfect or imperfect. If
one explanatory variable is an exact linear combination
(for example, a multiple) of the other explanatory
variables, the multicollinearity is perfect.
 In the case of perfect multicollinearity, the regression
coefficients cannot be uniquely estimated, because the
cross-product matrix X′X of the explanatory variables
is singular (not invertible).
 Multicollinearity does not necessarily reduce the model's
predictive power; it mainly affects the estimated
coefficients of the individual independent variables and
their standard errors.
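A small numeric illustration of why perfect multicollinearity prevents estimation: with X2 = 2·X1, the determinant of X′X (constant, X1, X2) is exactly zero, so X′X cannot be inverted and the normal equations have no unique solution. Perturbing X2 slightly makes the determinant nonzero but small relative to the size of the matrix entries (which run into the hundreds). The data values and helper names are made up for illustration.

```python
def cross_product(X):
    """Return X'X for a design matrix given as a list of rows."""
    k = len(X[0])
    return [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2_perfect = [2.0 * v for v in x1]                   # X2 is exactly 2 * X1
noise = [0.1, -0.2, 0.0, 0.3, -0.1]
x2_near = [2.0 * v + d for v, d in zip(x1, noise)]   # X2 is almost 2 * X1

X_perfect = [[1.0, a, b] for a, b in zip(x1, x2_perfect)]
X_near = [[1.0, a, b] for a, b in zip(x1, x2_near)]

print("det(X'X), perfect:", det3(cross_product(X_perfect)))  # 0.0: X'X is singular
print("det(X'X), near:   ", det3(cross_product(X_near)))     # about 7.35
```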

Consequences of Multicollinearity
 Large variances and standard errors of the OLS estimators.
 Insignificant t ratios of the OLS estimators, leading to the
failure to reject the null hypothesis that the true
population parameter is zero.
 A high adjusted R2 but few significant coefficients in the
regression model.
 Unstable OLS estimators and wrong signs on some
regression coefficients.
 The OLS estimators and their standard errors become
sensitive to small changes in the data.
 The OLS estimators are still linear, unbiased, and
normally distributed (under the normality assumption);
they remain best linear unbiased (BLUE), but their
variances are large.
Detection of Multicollinearity
 High pairwise correlations among explanatory
variables.
 A priori knowledge of the relationship among
variables.
 High adjusted R2 but few coefficients are significant.
 Individual coefficients of the model are statistically
insignificant, but the overall model (F-statistic) is
statistically significant.
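Checking pairwise correlations among explanatory variables is easy to automate. The sketch below computes every pairwise correlation and flags pairs whose absolute correlation exceeds 0.8; that threshold is a common rule of thumb assumed here, not a formal test, and the variable names and values are hypothetical. Pairwise correlations can miss multicollinearity that involves three or more variables jointly.

```python
import math


def corr(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def flag_collinear(data, threshold=0.8):
    """Return (name_i, name_j, r) for every pair of regressors with |r| above the threshold."""
    names = list(data)
    flagged = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            r = corr(data[names[i]], data[names[j]])
            if abs(r) > threshold:
                flagged.append((names[i], names[j], r))
    return flagged


# Hypothetical explanatory variables for six observations.
data = {
    "income": [10, 12, 15, 18, 20, 25],
    "wealth": [52, 59, 77, 88, 103, 124],
    "age": [30, 45, 28, 50, 35, 41],
}
for a, b, r in flag_collinear(data):
    print(f"{a} and {b}: r = {r:.3f}")  # only income and wealth are flagged (r = 0.997)
```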

Remedial Measures of Multicollinearity
 Increase the size of the sample.
 Drop a variable(s) from the model.
 Rethink the model, for example by making it log-linear in the variables.
 Transform the variables, for example by using the
first-difference form.
 Find out which independent variable is a linear
combination of other variables of the model, and
remove it.
 Combine the cross-sectional and time-series data and
use pooled/panel data for analysis.

Meaning of Heteroscedasticity
 An essential assumption of the linear regression
analysis is that the error term's variance is constant
(homoscedastic).
 If the variance of the error term is not constant (i.e., there is
a heteroscedasticity problem), the error terms come from
probability distributions with different variances.
 Heteroscedasticity occurs most often in regression models
that are based on cross-section data, where the observations
differ considerably (heterogeneity).

Representation of Heteroscedasticity Problem

Consequences of Heteroscedasticity
 The OLS estimators are still unbiased, but they are
no longer efficient.
 The usual OLS formula often underestimates the
standard errors of the coefficients; thus, the t-statistics
are larger than they should be.
 The F-statistic will also be larger than its true
value.
 Because the t- and F-statistics are inflated, we may
reject the null hypothesis when it should not be rejected.

Detection of Heteroscedasticity
 The most widely used is the White heteroscedasticity test.
If the probability value of the F-statistic or the χ²-statistic
under the White heteroscedasticity test is more than
0.05, we do not reject the null hypothesis that the residuals
are homoscedastic.
 The Autoregressive Conditional Heteroscedasticity
(ARCH) LM test is also used, particularly for time-series
residuals. If the probability value of the F-statistic or the
χ²-statistic is more than 0.05, we do not reject the null
hypothesis that the residuals are homoscedastic.
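Both tests are auxiliary-regression LM tests of the form LM = m·R². The sketch below assumes a model with a single regressor x: the White test regresses the squared residuals on a constant, x and x² (2 degrees of freedom, 5% critical value 5.991), and the ARCH(1) test regresses e_t² on a constant and e_{t−1}² (1 degree of freedom, 5% critical value 3.841). Only the chi-square (LM) form is computed, not the F form. The data are simulated, the ARCH series stands in for regression residuals, and all helper names are hypothetical.

```python
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ols(X, y):
    """OLS via the normal equations; returns (b, residuals, R^2)."""
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    b = solve(xtx, xty)
    resid = [yi - sum(bj * xj for bj, xj in zip(b, r)) for r, yi in zip(X, y)]
    ybar = sum(y) / len(y)
    return b, resid, 1.0 - sum(e * e for e in resid) / sum((yi - ybar) ** 2 for yi in y)


def white_test_one_regressor(x, y):
    """White LM = n * R^2 from regressing e^2 on (1, x, x^2); chi-square(2) under H0."""
    _, e, _ = ols([[1.0, xi] for xi in x], y)
    _, _, r2 = ols([[1.0, xi, xi * xi] for xi in x], [v * v for v in e])
    return len(x) * r2


def arch1_test(e):
    """ARCH(1) LM = m * R^2 from regressing e_t^2 on (1, e_{t-1}^2); chi-square(1) under H0."""
    sq = [v * v for v in e]
    X_aux = [[1.0, sq[t - 1]] for t in range(1, len(sq))]
    _, _, r2 = ols(X_aux, sq[1:])
    return len(X_aux) * r2


def simulate_heteroscedastic(n=200, seed=4):
    rng = random.Random(seed)
    x = [rng.uniform(1, 10) for _ in range(n)]
    y = [1.0 + 2.0 * xi + rng.gauss(0, 0.5 * xi) for xi in x]  # error s.d. grows with x
    return x, y


def simulate_arch1(n=500, omega=0.2, alpha=0.5, seed=9):
    rng = random.Random(seed)
    e = [0.0]
    for _ in range(n - 1):
        e.append((omega + alpha * e[-1] ** 2) ** 0.5 * rng.gauss(0, 1))
    return e


if __name__ == "__main__":
    print(f"White LM = {white_test_one_regressor(*simulate_heteroscedastic()):.2f} (5% cv 5.991)")
    print(f"ARCH(1) LM = {arch1_test(simulate_arch1()):.2f} (5% cv 3.841)")
```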


 Informal methods can also reveal heteroscedasticity:
 If we plot the residuals (or squared residuals) against
the fitted values or an explanatory variable and see a
systematic pattern, such as a spread that widens, this
suggests a heteroscedasticity problem.
 The nature of the data also gives information about
heteroscedasticity. If we are using cross-sectional
data that involve heterogeneous units,
heteroscedasticity may be present in the regression
results.

Remedial Measures of Heteroscedasticity
 Use the log transformation of the data.
 Estimate the model for sub-samples of homogeneous
groups.
 Use the weighted least squares (WLS) method if the
error variances σi² are known.
 If the σi² are not known, first obtain estimates of
them and then apply weighted least squares.
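Weighted least squares with known variances minimizes Σ w_i(y_i − α − βx_i)² with weights w_i = 1/σi², which down-weights the noisier observations. The sketch below handles a single regressor and uses hypothetical names; in the usage example the variances are assumed to be proportional to x_i², a common but purely illustrative assumption about the form of heteroscedasticity (only proportionality matters, because rescaling all weights leaves the estimates unchanged).

```python
import random


def wls(x, y, sigma2):
    """WLS for y_i = a + b*x_i + u_i with Var(u_i) = sigma2[i]; weights are 1/sigma2[i]."""
    w = [1.0 / s for s in sigma2]
    sw = sum(w)
    xw = sum(wi * xi for wi, xi in zip(w, x)) / sw  # weighted mean of x
    yw = sum(wi * yi for wi, yi in zip(w, y)) / sw  # weighted mean of y
    sxx = sum(wi * (xi - xw) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xw) * (yi - yw) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return yw - b * xw, b


if __name__ == "__main__":
    rng = random.Random(8)
    x = [rng.uniform(1, 10) for _ in range(300)]
    y = [3.0 + 0.5 * xi + rng.gauss(0, 0.4 * xi) for xi in x]  # Var(u_i) proportional to x_i^2
    print("OLS (equal weights):   ", wls(x, y, [1.0] * len(x)))
    print("WLS, sigma_i^2 = x_i^2:", wls(x, y, [xi ** 2 for xi in x]))
```

With equal weights the function reduces to ordinary least squares, which makes the comparison direct.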

Meaning of Model Misspecification
 One of the classical linear regression model's assumptions
is that the regression model used in the analysis is correctly
specified. If this assumption fails, the model suffers from
specification error (model misspecification). There are several
types of specification errors.
 Omitting a relevant variable from the model (omitted
variable bias).
 Including an unnecessary or irrelevant variable in the
model.
 Adopting the wrong functional form.
 Using a proxy variable instead of the true variable.

Consequences of Model Misspecification
 The consequences of omitting a relevant variable are:
 If the omitted variable is correlated with an included
variable, the estimators of the included coefficients will be
biased and inconsistent.
 Even if the omitted variable is not correlated with the
included variables, the intercept will be biased (unless the
omitted variable has zero mean), but the slope coefficients
will be unbiased.
 The variance of disturbance term (𝜎2) will be incorrectly
estimated.
 The usual hypothesis testing is likely to give misleading
conclusions about estimated parameters.
 The forecast of the dependent variable will be unreliable.
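A Monte Carlo sketch of both cases, with all parameter values chosen for illustration: the true model is y = 1 + 2x1 + 3x2 + u, but we regress y on x1 alone. When x2 = 0.5x1 + v, the short-regression slope converges to 2 + 3(0.5) = 3.5 rather than 2. When x2 is independent of x1, the slope stays close to 2, but the intercept absorbs 3·E(x2); the assumption E(v) = 4 is what makes that intercept bias visible.

```python
import random


def simple_ols(x, y):
    """Intercept and slope of a simple OLS regression of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    slope = (sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
             / sum((a - xbar) ** 2 for a in x))
    return ybar - slope * xbar, slope


def short_regression(correlated, n=20000, seed=13):
    """Regress y on x1 only when the true model is y = 1 + 2*x1 + 3*x2 + u."""
    rng = random.Random(seed)
    x1 = [rng.gauss(0, 1) for _ in range(n)]
    v = [rng.gauss(4, 1) for _ in range(n)]  # E(v) = 4, so x2 has a nonzero mean
    x2 = [0.5 * a + b for a, b in zip(x1, v)] if correlated else v
    y = [1 + 2 * a + 3 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]
    return simple_ols(x1, y)


if __name__ == "__main__":
    # Probability limits: slope 3.5 (correlated) or 2 (uncorrelated); intercept 1 + 3*4 = 13 in both.
    print("x2 correlated with x1:  ", short_regression(True))
    print("x2 uncorrelated with x1:", short_regression(False))
```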

 If we include an irrelevant variable in the model, the
following will be the consequences.
 The OLS estimators of the parameters of the incorrect
(overfitted) model are still unbiased and consistent.
 The variance of the error term (𝜎2) is correctly
estimated.
 The usual hypothesis testing for the estimated
parameters is still valid.
 The estimated parameters will be inefficient (having
larger variances than those of the correct model).

Detection of Model Misspecification
 The examination of residuals, especially in
cross-sectional data, is a useful diagnostic
for model misspecification, such as the
omission of a relevant variable or incorrect
functional form.
 If the model is not correctly specified, the
plot of the residuals may exhibit a pattern.

 Ramsey proposed a general test of model
specification error called Ramsey's RESET
(Regression Specification Error Test).
 It tests a linear specification against a
nonlinear specification.
 The null hypothesis of the test is that the
model is correctly specified as linear. If the
computed test statistic's probability is equal to or
less than 0.05, we reject the null hypothesis and
conclude that the model is misspecified (for example,
that a nonlinear specification is needed).
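The RESET procedure can be sketched as follows: fit the original model, add powers of the fitted values (here ŷ² and ŷ³) to the regression, and F-test whether their coefficients are jointly zero. The sketch standardizes ŷ before taking powers; because ŷ is itself a linear function of the constant and x, this leaves the added columns spanning the same space and the F-statistic unchanged, while avoiding badly scaled columns. The critical value of about 3.04 for F(2, 196) at 5% comes from standard F tables for this sample size, and the data (with a quadratic true relationship in one case) and helper names are hypothetical.

```python
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ols(X, y):
    """OLS via the normal equations; returns coefficients and residuals."""
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    b = solve(xtx, xty)
    return b, [yi - sum(bj * xj for bj, xj in zip(b, r)) for r, yi in zip(X, y)]


def reset_f(x, y):
    """RESET F-statistic for y = a + b*x, adding squared and cubed standardized fitted values."""
    n = len(y)
    b, e_r = ols([[1.0, xi] for xi in x], y)
    fitted = [b[0] + b[1] * xi for xi in x]
    mean = sum(fitted) / n
    sd = (sum((f - mean) ** 2 for f in fitted) / n) ** 0.5
    z = [(f - mean) / sd for f in fitted]
    _, e_u = ols([[1.0, xi, zi ** 2, zi ** 3] for xi, zi in zip(x, z)], y)
    rss_r, rss_u = sum(v * v for v in e_r), sum(v * v for v in e_u)
    q, k_u = 2, 4  # restrictions tested, parameters in the unrestricted model
    return ((rss_r - rss_u) / q) / (rss_u / (n - k_u))


def simulate(quadratic, n=200, seed=17):
    rng = random.Random(seed)
    x = [rng.uniform(0, 10) for _ in range(n)]
    if quadratic:
        y = [1.0 + 0.5 * xi ** 2 + rng.gauss(0, 1) for xi in x]
    else:
        y = [1.0 + 2.0 * xi + rng.gauss(0, 1) for xi in x]
    return x, y


if __name__ == "__main__":
    for quadratic in (False, True):
        f = reset_f(*simulate(quadratic))
        print(f"quadratic truth = {quadratic}: F = {f:.2f} (5% critical value about 3.04)")
```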

Remedial Measures of Model Misspecification
 The following are some steps that should be
taken to avoid model misspecification.
 Review the theoretical literature to identify
the core variables of the model.
 Review the empirical literature for the most
relevant variables of the model.
 Adopt a general-to-specific modeling
approach to arrive at a well-fitting model.

Test for Stability of Parameters
 The CUSUM test helps detect possible variation in the
parameters over time. It is based on the cumulative
sum of the recursive residuals.
 We plot the cumulative sum of the recursive residuals
against time together with the 5% critical lines. If the
cumulative sum stays within the critical lines, this
indicates parameter stability.
 The CUSUM of Squares test, based on the cumulative sum
of the squared recursive residuals, can also be used to
test the stability of parameters. If its plot against time
stays within the 5% critical lines, this indicates the
parameters' stability.
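The CUSUM statistic of Brown, Durbin and Evans can be sketched for a simple regression y_t = a + b·x_t + u_t (k = 2 parameters). The recursive residual for observation t is the one-step-ahead prediction error from the regression estimated on observations 1 to t−1, scaled to have constant variance; W_t is their cumulative sum divided by their standard deviation, and the 5% bounds are the straight lines ±0.948[√(T−k) + 2(t−k)/√(T−k)]. The CUSUM of Squares test is not implemented here, and the data (including the slope break in the second half) and helper names are illustrative assumptions.

```python
import math
import random


def recursive_residuals(x, y):
    """Recursive residuals for y_t = a + b*x_t + u_t (k = 2), for t = k+1, ..., T."""
    out = []
    n = sx = sxx = sy = sxy = 0.0  # running sums over observations 1..t-1
    for t, (xt, yt) in enumerate(zip(x, y)):
        if t >= 2:
            det = n * sxx - sx * sx
            b = (n * sxy - sx * sy) / det
            a = (sy - b * sx) / n
            h = (sxx - 2.0 * sx * xt + n * xt * xt) / det  # x_t'(X'X)^{-1}x_t, x_t = (1, xt)
            out.append((yt - a - b * xt) / math.sqrt(1.0 + h))
        n += 1
        sx += xt
        sxx += xt * xt
        sy += yt
        sxy += xt * yt
    return out


def cusum(w, a_crit=0.948):
    """CUSUM path W_t, 5% bounds, and whether the path stays inside; w has T - k elements."""
    m = len(w)
    mean = sum(w) / m
    s = math.sqrt(sum((v - mean) ** 2 for v in w) / (m - 1))
    path, bounds, total = [], [], 0.0
    for j, v in enumerate(w, start=1):  # j = t - k
        total += v
        path.append(total / s)
        bounds.append(a_crit * (math.sqrt(m) + 2.0 * j / math.sqrt(m)))
    stable = all(abs(p) <= bd for p, bd in zip(path, bounds))
    return path, bounds, stable


def simulate(n=100, break_at=None, seed=21):
    """y = 1 + slope*x + u, with the slope moving from 2 to 4 at break_at (if given)."""
    rng = random.Random(seed)
    x = [rng.uniform(0, 10) for _ in range(n)]
    y = []
    for t, xt in enumerate(x):
        slope = 4.0 if break_at is not None and t >= break_at else 2.0
        y.append(1.0 + slope * xt + rng.gauss(0, 1))
    return x, y


if __name__ == "__main__":
    for label, brk in (("no break", None), ("slope break at t = 50", 50)):
        _, _, stable = cusum(recursive_residuals(*simulate(break_at=brk)))
        print(f"{label}: CUSUM within 5% bounds = {stable}")
```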

EViews Output for Cusum and Cusum of Squares Plot
[Figure: two panels plotted over 1986 to 2014, "CUSUM" and "CUSUM of Squares", each shown with its 5% significance bounds]

Impulse Response Function
 The impulse response function (IRF) traces a
variable's reaction over time to exogenous
impulses (that is, it is used to examine the
dynamic interaction among variables).
 An impulse is an exogenous shock, such as a change
in a fiscal policy or monetary policy parameter. The
IRF shows the response of endogenous variables
such as consumption, output, investment, and
employment at the time of the shock and at
subsequent points in time.
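In practice IRFs are computed from an estimated vector autoregression (VAR). For a VAR(1), y_t = A·y_{t−1} + u_t with Cov(u_t) = Σ, the response at horizon h to one-standard-deviation orthogonalized (Cholesky) shocks is the matrix A^h·P, where P is the lower-triangular Cholesky factor of Σ. The coefficient matrix and covariance below are hypothetical values, not estimates, and the periods are numbered from 1 at impact, matching the period axis (1 to 10) of the EViews plots. With this Cholesky ordering, the second variable's shock has no impact effect on the first variable.

```python
def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def cholesky2(s):
    """Lower-triangular P with P P' = s for a 2x2 positive-definite covariance matrix."""
    p11 = s[0][0] ** 0.5
    p21 = s[1][0] / p11
    p22 = (s[1][1] - p21 ** 2) ** 0.5
    return [[p11, 0.0], [p21, p22]]


def impulse_responses(A, sigma, horizons):
    """responses[h][i][j]: response of variable i, h periods after a one-s.d. shock to variable j."""
    M = cholesky2(sigma)  # impact responses (h = 0)
    responses = []
    for _ in range(horizons):
        responses.append(M)
        M = matmul(A, M)
    return responses


A = [[0.5, 0.1], [0.2, 0.4]]      # hypothetical VAR(1) coefficient matrix
sigma = [[1.0, 0.3], [0.3, 0.5]]  # hypothetical residual covariance matrix

for period, R in enumerate(impulse_responses(A, sigma, 10), start=1):
    print(f"period {period}: response of y2 to y1 shock = {R[1][0]:+.4f}, "
          f"to y2 shock = {R[1][1]:+.4f}")
```

Because both eigenvalues of A lie inside the unit circle (0.6 and 0.3), the responses die out over time; an unstable VAR would produce explosive responses.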

EViews Output for Impulse Response Function
[Figure: two panels over periods 1 to 10, "Response of DSAV to Cholesky One S.D. GDPRB Innovation" and "Response of DSAV to Cholesky One S.D. GOLDP Innovation"]

Sensitivity Analysis
 In the regression analysis, suppose we find a
positive and statistically significant relationship
between X1 and Y, and a causality analysis also
confirms that X1 positively causes Y. A legitimate
question is how robust this finding is.
 To test the robustness of our results, we perform
a sensitivity analysis. This analysis was
introduced by Levine and Renelt (1992), who used
a modified version of the Extreme Bound Analysis
(EBA) initially developed by Leamer (1985).

 To understand this analysis,
consider the following model:
$$Y_i = \beta_1 + \beta_2 X_{1i} + \beta_3 X_{2i} + \beta_4 Z_i + \varepsilon_i$$
 where β1 is the constant term, X1 is the
variable of interest or focus variable (with
coefficient β2), X2 is the key variable, which is
always included in the regression, and Z is a
subset of variables chosen from a pool of
candidate variables.

 If, across the regressions with different Z variables,
the coefficient of the focus variable keeps the same sign
and remains significant, one can maintain a fair
amount of confidence in the initial estimate, and the
result is called “robust.”
 If, however, the coefficient of the focus variable changes
its sign or becomes insignificant, then one might feel less
confident in the relationship between X1 and Y,
indicating that the results are “fragile.”
 Furthermore, we can conclude that the data yield
fairly sturdy information on the focus variable's
coefficient if the highest and lowest values of the
focus variable's coefficient fall within a narrow
interval.
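A minimal sketch of the extreme bounds calculation in the spirit of Levine and Renelt: regress Y on the constant, the focus variable X1, the key variable X2 and every combination of up to three Z variables from the pool (an assumed limit), record β2 and its standard error in each regression, and take the lower extreme bound as the smallest β2 − 2·SE and the upper extreme bound as the largest β2 + 2·SE. The result is labelled robust if both bounds have the same sign. The data are simulated: in the first case X1 truly affects Y, in the second X1 only proxies for z1. Helper names are hypothetical.

```python
import math
import random
from itertools import combinations


def inverse(a):
    """Matrix inverse by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def ols_with_se(X, y):
    """OLS coefficients and conventional standard errors."""
    n, k = len(X), len(X[0])
    xtx_inv = inverse([[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)])
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    b = [sum(xtx_inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(bj * xj for bj, xj in zip(b, r)) for r, yi in zip(X, y)]
    s2 = sum(e * e for e in resid) / (n - k)
    return b, [math.sqrt(s2 * xtx_inv[i][i]) for i in range(k)]


def extreme_bounds(y, focus, key, pool, max_z=3):
    """Regress Y on (1, X1, X2, Z) for every subset Z of the pool with at most max_z variables."""
    results = []
    for size in range(max_z + 1):
        for combo in combinations(sorted(pool), size):
            X = [[1.0, focus[i], key[i]] + [pool[z][i] for z in combo] for i in range(len(y))]
            b, se = ols_with_se(X, y)
            results.append((combo, b[1], se[1]))  # b[1] is the focus coefficient beta_2
    lower = min(b2 - 2 * s for _, b2, s in results)
    upper = max(b2 + 2 * s for _, b2, s in results)
    return results, lower, upper, (lower > 0 or upper < 0)


def simulate(n=200, seed=42):
    rng = random.Random(seed)
    pool = {f"z{j}": [rng.gauss(0, 1) for _ in range(n)] for j in (1, 2, 3)}
    x2 = [rng.gauss(0, 1) for _ in range(n)]
    z1 = pool["z1"]
    x1_a = [rng.gauss(0, 1) + 0.5 * z1[i] for i in range(n)]  # X1 truly matters
    y_a = [1 + 0.8 * x1_a[i] + 0.5 * x2[i] + 0.3 * z1[i] + rng.gauss(0, 1) for i in range(n)]
    x1_b = [z1[i] + 0.5 * rng.gauss(0, 1) for i in range(n)]  # X1 only proxies for z1
    y_b = [1 + 0.5 * x2[i] + 1.0 * z1[i] + rng.gauss(0, 1) for i in range(n)]
    return pool, x2, (x1_a, y_a), (x1_b, y_b)


if __name__ == "__main__":
    pool, x2, case_a, case_b = simulate()
    for label, (x1, y) in (("X1 matters", case_a), ("X1 is a proxy", case_b)):
        res, lo, hi, robust = extreme_bounds(y, x1, x2, pool)
        coefs = [b2 for _, b2, _ in res]
        print(f"{label}: beta2 in [{min(coefs):.2f}, {max(coefs):.2f}], "
              f"bounds [{lo:.2f}, {hi:.2f}], robust = {robust}")
```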

Test for Forecast Ability of Model
 The regression model is often used for forecasting.
 Forecasting uses the historical data of the variables
to make the best guess about the future values of the
dependent variable.
 The forecast can be in-sample or out-of-sample.
 If the dependent variable's forecast is based on the
actual values of the independent variables, it is called
an in-sample forecast.
 If the dependent variable's forecast is based on the
estimated future values of the independent variables, it
is called an out-of-sample forecast.

 RMSE: The root mean square error is a scale-dependent
measure used to compare forecast methods applied to a
single time series, or to time series measured in the same
units. It is easy to understand and compute.

 MAPE: The mean absolute percentage error is a unit-free
measure and is frequently used to compare forecast
performance across data sets.
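Both measures are short functions. The sketch below uses made-up numbers and hypothetical function names; it also shows the scale issue, since converting the data to different units multiplies RMSE by the same factor while leaving MAPE unchanged.

```python
import math


def rmse(actual, forecast):
    """Root mean squared forecast error, in the units of the data."""
    return math.sqrt(sum((f - a) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def mape(actual, forecast):
    """Mean absolute percentage error in percent; undefined if any actual value is zero."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


actual = [100.0, 110.0, 120.0]
forecast = [102.0, 108.0, 123.0]
print(rmse(actual, forecast), mape(actual, forecast))  # about 2.38 and 2.11

# Rescaling the data (e.g. to different units) rescales RMSE but leaves MAPE unchanged.
scaled_a = [1000 * v for v in actual]
scaled_f = [1000 * v for v in forecast]
print(rmse(scaled_a, scaled_f), mape(scaled_a, scaled_f))
```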

 The above measures have shortcomings: RMSE is
scale-dependent, and MAPE is undefined when an actual
value is zero and unstable when actual values are close to
zero. A widely used alternative for assessing the forecast
ability of the model is Theil's inequality coefficient.


Theil Inequality Coefficient
 Theil's inequality coefficient provides a
measure of how well a time series of
estimated values compares with a
corresponding time series of observed
values.
 In the form reported by EViews, the coefficient
lies between 0 and 1; the closer it is to zero, the
better the forecasting ability of the model. In an
alternative form (Theil's U2), which compares the
model with a naïve no-change forecast, a value of 1
means the forecast is no better than a naïve guess.
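A sketch of two versions of Theil's coefficient, assuming the bounded form U1 = √(mean of (f_t − a_t)²) / (√(mean of f_t²) + √(mean of a_t²)) and the U2 form that divides the model's forecast errors by those of a naïve no-change forecast (f_t = a_{t−1}), each scaled by the previous actual value. The series and function names are made up for illustration. By construction, U2 is exactly 1 for the naïve forecast itself, and values below 1 mean the model beats it.

```python
import math


def theil_u1(actual, forecast):
    """Bounded Theil inequality coefficient: 0 for a perfect forecast, at most 1."""
    n = len(actual)
    num = math.sqrt(sum((f - a) ** 2 for a, f in zip(actual, forecast)) / n)
    den = math.sqrt(sum(f * f for f in forecast) / n) + math.sqrt(sum(a * a for a in actual) / n)
    return num / den


def theil_u2(actual, forecast):
    """Theil's U2: forecast errors relative to a naive no-change forecast (1 = no better than naive)."""
    num = sum(((forecast[t] - actual[t]) / actual[t - 1]) ** 2 for t in range(1, len(actual)))
    den = sum(((actual[t] - actual[t - 1]) / actual[t - 1]) ** 2 for t in range(1, len(actual)))
    return math.sqrt(num / den)


actual = [100.0, 104.0, 103.0, 108.0, 112.0]
model = [100.0, 103.0, 104.0, 107.0, 111.0]  # hypothetical model forecasts
naive = [actual[0]] + actual[:-1]            # each forecast equals the previous actual value

print(f"U1 (model) = {theil_u1(actual, model):.4f}")
print(f"U2 (model) = {theil_u2(actual, model):.3f}, U2 (naive) = {theil_u2(actual, naive):.3f}")
```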

Research Activity
 1. Based on some theory, formulate a multiple regression
model (or select a model from Box 6.1 of chapter 6). Get the
time series data of the variables from the data file. Perform
the following tasks using computer software.
 i) Estimate the regression model using the OLS
method and interpret the results.
 ii) Discuss the econometric problems of the regression
results.
 iii) Test for the stability of parameters using the
CUSUM and CUSUM of squares tests.
 iv) Test for the dynamic interaction among variables.
 v) Get data on additional variables and perform
sensitivity analysis.
 vi) Test the forecast ability of the model.
