LECTURE6_GLS
HETEROSKEDASTICITY – GLS (WLS) ESTIMATORS – WHITE CORRECTION
Maria Elena Bontempi mariaelena.bontempi@unibo.it
Roberto Golinelli roberto.golinelli@unibo.it
07/11/2012
Preliminary; comments welcome
1. Introduction
The main assumption about the classical regression model errors is that they are identically and
independently distributed with mean equal to zero, in symbols: $\varepsilon \sim iid(0, \sigma^2)$.
The E(ε) = 0 assumption is perfectly represented by OLS residuals, which always sum to zero by definition, provided that the model's specification includes the intercept.
The assumption of independently distributed errors (errors belonging to different observations are not related to each other) is not easily checked in cross-sections, given that there is no obvious way in which cross-section observations have to be ordered (listed). In this context, an appropriate sampling design (random sampling) may prevent the emergence of the problem. On the other hand, the assessment of errors being independently distributed is crucial in time series.
The assumption of identically distributed errors is usually no longer valid in cross-section data, which are characterised by substantial variability. Heteroskedasticity of the errors is the most common problem: often the error variance is not constant over different observations, so that the assumption of identically distributed errors fails.
If the iid assumption is valid we have that:
On average the regression line is correct: $E(\varepsilon_i) = 0 \ \forall i = 1, \ldots, N$.
Homoskedasticity (identically distributed errors): $E(\varepsilon_i^2 \mid X) = Var(\varepsilon_i \mid X) = \sigma^2 \ \forall i = 1, \ldots, N$.
Non cross-correlation (independently distributed errors): $E(\varepsilon_i \varepsilon_j \mid X) = Cov(\varepsilon_i, \varepsilon_j \mid X) = 0 \ \forall i \neq j$.
In compact form, $\varepsilon \sim iid\, N(0, \sigma^2 I_N)$, where $\sigma^2 I_N$ is the VCOV matrix of the errors, equal to
$$E(\varepsilon\varepsilon') = \begin{pmatrix} \sigma^2 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \sigma^2 \end{pmatrix} = \sigma^2 \begin{pmatrix} 1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 1 \end{pmatrix} = \sigma^2 I_N .$$
In other terms, the VCOV matrix is a scalar matrix, i.e. a diagonal matrix whose diagonal elements are all equal.
We can compute the variance of the estimator $\hat\beta$ (exogeneity assumed) as:
$$Var(\hat\beta \mid X) = Var\left[(X'X)^{-1}X'y \mid X\right] = Var\left[\beta + (X'X)^{-1}X'\varepsilon \mid X\right] = (X'X)^{-1}X'\,Var(\varepsilon \mid X)\,X(X'X)^{-1} = \sigma^2 (X'X)^{-1} = \sigma^2 \left(\sum_{i=1}^{N} X_i X_i'\right)^{-1},$$
where $X_i$ is the (K×1) vector of explanatory variables for observation i.
In cross-section (and panel data) the homoskedasticity assumption is rarely satisfied.
For example, in cross-sectional data it is hard to suppose that the consumption variability around
its mean is constant independently of the income level. Instead, rich people may have more
variegated interests, tastes, and consumption opportunities: this makes the consumption variance
higher at high income levels.
Non-spherical errors can be characterized by heteroskedasticity, i.e. the error variance is not
constant over different observations:
$$Var(\varepsilon_i \mid X) = \sigma_i^2 = \sigma^2 \omega_i^h .$$
In matrix notation, we can write:
$$Var(\varepsilon \mid X) = E(\varepsilon\varepsilon' \mid X) = \sigma^2\,Diag(\omega_i^h) = \sigma^2 \begin{pmatrix} \omega_1^h & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \omega_N^h \end{pmatrix} = Diag(\sigma_i^2) = \begin{pmatrix} \sigma_1^2 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \sigma_N^2 \end{pmatrix} = \sigma^2 \Omega ,$$
where Ω is a positive definite matrix, not necessarily scalar. Hence, it may be necessary to estimate
N additional parameters (the parameters along the main diagonal).
In presence of heteroskedasticity $\hat\beta_{OLS}$ is unbiased (unbiasedness is based on linearity and
exogeneity) but not efficient:
$$Var(\hat\beta \mid X) = E\left[(\hat\beta - \beta)(\hat\beta - \beta)'\right] = (X'X)^{-1}X'\,Var(\varepsilon \mid X)\,X(X'X)^{-1} = (X'X)^{-1}X'\,\sigma^2\Omega\,X(X'X)^{-1} = \sigma^2 (X'X)^{-1}X'\,Diag(\omega_i^h)\,X(X'X)^{-1} \neq \sigma^2 (X'X)^{-1} .$$
In particular, the variance of $\hat\beta$ is higher than $\sigma^2 (X'X)^{-1}$ (the homoskedastic case) through the positive
definite matrix $(X'X)^{-1}X'\Omega X$.
Moreover, the MSE, $s^2$, is a biased estimator of $\sigma^2$:
$$E(s^2) = E\left(\frac{\hat\varepsilon'\hat\varepsilon}{N-K}\right) = E\left(\frac{\varepsilon' M \varepsilon}{N-K}\right) = E\left(\frac{tr(M\varepsilon\varepsilon')}{N-K}\right) = \frac{1}{N-K}\,tr\left(M\,E(\varepsilon\varepsilon')\right) = \frac{\sigma^2}{N-K}\,tr(M\Omega) \neq \sigma^2 ,$$
where $M = I - P_X = I - X(X'X)^{-1}X'$ is the matrix projecting Y upon the space orthogonal to the one
spanned by the columns of X:
$$\hat\varepsilon = Y - \hat Y = Y - X\hat\beta = Y - X(X'X)^{-1}X'Y = (I - P_X)Y = MY .$$
The matrix M is symmetric (M′ = M), idempotent (MM = M), with rank(M) = tr(M) = N−K.
Hence, the estimated variance of $\hat\beta_{OLS}$ is biased because the weighting matrix is no longer $(X'X)^{-1}$ and because
$s^2$ is a biased estimator of $\sigma^2$.
As a consequence, inference (t and F tests) is not correct: the test statistics do not have their standard
distributions, and the usual confidence regions are no longer valid.
Consider the following example.
use GLS_data, clear
descr
Contains data
obs: 100
vars: 3 16 Nov 2004 18:25
size: 1,300 (99.9% of memory free)
-------------------------------------------------------------------------------
storage display value
variable name type format label variable label
-------------------------------------------------------------------------------
obs byte %8.0g families
cons1000 float %9.0g consumption in 2003 at constant prices
redd1000 float %9.0g income in 2002 at constant prices
-------------------------------------------------------------------------------
The idea of explaining consumption with the income of the previous year predetermines the
dynamic relationship in a quite restrictive way, but with the advantage of avoiding consumption-
income simultaneity and endogeneity problems.
The scatterplot tells us that the consumption variability grows with the level of income: richer people
behave in more diverse ways. This fact per se implies likely heteroskedasticity of the linear model
residuals.
. graph7 cons1000 redd1000, ylabel xlabel
[Scatterplot: CONS1000 (vertical axis) against REDD1000 (horizontal axis)]
Keynes’s (linear) consumption function
. reg cons1000 redd1000
Source | SS df MS Number of obs = 100
-------------+------------------------------ F( 1, 98) = 1036.50
Model | 46059.3208 1 46059.3208 Prob > F = 0.0000
Residual | 4354.87802 98 44.4375308 R-squared = 0.9136
-------------+------------------------------ Adj R-squared = 0.9127
Total | 50414.1988 99 509.234332 Root MSE = 6.6661
------------------------------------------------------------------------------
cons1000 | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
redd1000 | .7002875 .0217517 32.19 0.000 .6571221 .743453
_cons | 5.668498 1.331578 4.26 0.000 3.026025 8.310971
------------------------------------------------------------------------------
2. Heteroskedasticity tests
Graphical analysis represents a first step towards discovering whether heteroskedasticity is present.
We are supposing that the error variance is a function of income:
version 7: rvfplot, oneway twoway box ylabel xlabel yline(0)
[Residual-versus-fitted plot: Residuals (vertical axis) against Fitted values (horizontal axis)]
Heteroskedasticity tests verify the hypothesis
$$H_0: Var(\varepsilon_i) = \sigma^2, \ \forall i = 1, \ldots, N.$$
In general, the tests use auxiliary regressions of the form
$$\hat\varepsilon_i^2 = f(Z_i'\alpha) + u_i ,$$
where $u_i \sim iid(0, \sigma_u^2)$, and α and $Z_i$ are V×1 vectors, with V the number of variables in Z (and of associated
parameters α) used to explain the error variance; for this reason the $Z_i$ are called the variance indicator
variables.
The null hypothesis to be tested becomes $H_0: \alpha = 0$.
What about the alternative hypothesis, H1? Non-constant variance implies that specific variance
behaviours must be assumed.
Under the alternative, the form of the detected heteroskedasticity depends on the choice of the
explanatory indicators $Z_i$. The test is conditional on a set of variables which are presumed to
influence the error variance: fitted values, explanatory variables, or any other variable presumed to
influence the error variance (for example, in the financial time-series setting, Engle (1982)
proposes an ARCH test, for autoregressive conditional heteroskedasticity:
$$\hat\varepsilon_t^2 = \alpha_1 \hat\varepsilon_{t-1}^2 + \alpha_2 \hat\varepsilon_{t-2}^2 + \ldots + u_t \ ).$$
The statistic is computed as either the F (small samples) or the LM (large samples) test for the overall
significance of the independent variables in explaining $\hat\varepsilon_i^2$.
The F statistic is
$$F = \frac{R_a^2 / V}{(1 - R_a^2)/(N - V - 1)} ,$$
where $R_a^2$ is the R-squared of the auxiliary regression.
The LM statistic is just the sample size times the R-squared of the auxiliary regression, $LM = N R_a^2$; under the
null, it is distributed asymptotically as $\chi^2_V$.
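As an illustration of this auxiliary-regression logic, the following is a minimal Stata sketch of Engle's ARCH(2) LM test computed by hand; the variable names y, x and the time index t are hypothetical placeholders, not variables of the dataset used in this lecture:

* hand-made ARCH(2) LM test on a hypothetical time series (y, x, time index t)
tsset t
qui reg y x
predict e_arch, resid
g e_arch2 = e_arch^2
qui reg e_arch2 L.e_arch2 L2.e_arch2
di "ARCH LM = " e(N)*e(r2) "  P-value = " chi2tail(2, e(N)*e(r2))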
A first form of the test is the Breusch-Pagan (1979) test (Breusch and Pagan (1979), Godfrey (1978), and
Cook and Weisberg (1983) separately derived the same test statistic).
It is a Lagrange multiplier test for heteroskedasticity in the error distribution.
It is the most general test, even if it is not powerful and it is sensitive to the assumption of normally
distributed errors (this is the assumption of the original formulation; see below for a change in
this assumption).
The Breusch and Pagan test-statistic is distributed as a chi-squared with V degrees of freedom. It is
obtained by the following steps:
1) run the model regression and define the dependent variable of the Breusch-Pagan auxiliary
regression
$$g_i = \frac{\hat\varepsilon_i^2}{\frac{1}{N}\sum_{i=1}^{N}\hat\varepsilon_i^2}\,;^{1}$$
2) run the auxiliary regression $g_i = \alpha_0 + Z_i'\alpha + u_i$ and obtain the BP statistic as one half of the model (explained) sum of squares of this regression, BP = MSS/2 (with V = 2 indicator variables, as in the example below, this coincides with MSS divided by the model degrees of freedom).
This test can verify whether heteroskedasticity is conditional on any list of $Z_i$ variables which are
presumed to influence the error variance (i.e. variance indicators); they can be the fitted values, the
explanatory variables of the model, or any variables you think can affect the errors'
variance. The trade-off in the choice of indicator variables in these tests is that a smaller set of
indicator variables preserves degrees of freedom, at the cost of being unable to detect
heteroskedasticity in certain directions.
¹ For this, Breusch and Pagan (1979, p. 1293) say: “… the quantity $g_i$ is of some importance in tests of heteroskedasticity.
Thus, if one is going to plot any quantity, it would seem more reasonable to plot $g_i$ than $\hat\varepsilon_i^2$.” By dividing by the
mean, the squared residuals are normalised: under the null there are no nuisance terms that can affect the chi-squared distribution, and it is
possible to use any variable you think is useful in explaining heteroskedasticity.
A second form of the heteroskedasticity test is the very often reported White (1980) test for
heteroskedasticity.
It is based on a different auxiliary regression where the squared residuals are regressed on the
model regressors, all their squares, and all their possible (not redundant) cross products.
The asymptotic chi-squared White test statistic is obtained as the number of
observations times the R-squared of the auxiliary regression.
The F version for small samples is obtained by testing that the coefficients of all the explanatory variables of the
auxiliary regression are jointly zero (i.e. by looking at the F-test for the overall significance of the auxiliary
regression).
We have several commands to execute these heteroskedasticity tests.
Suppose the model is $y_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$.
Different possibilities for the heteroskedasticity test are summarized in the following table.

Variance indicators Z   | Breusch-Pagan                          | White
------------------------|----------------------------------------|---------------------------------------
fitted values           | hettest                                |
X1 X2                   | hettest, rhs                           |
                        | bpagan X1 X2                           |
                        | ivhettest, all (ivlev) (output         |
                        | Breusch-Pagan/Godfrey/Cook-Weisberg)   |
X1 X2 X1² X2² X1×X2     | hettest X1 X2 X1² X2² X1×X2            | hettest X1 X2 X1² X2² X1×X2, iid
                        | bpagan X1 X2 X1² X2² X1×X2             | whitetst
                        | ivhettest, all ivcp (output            | ivhettest, ivcp (output
                        | Breusch-Pagan/Godfrey/Cook-Weisberg)   | White/Koenker nR² test statistic)
NOTE: the command hettest is not appropriate after regress, nocons
For example, if we suppose that in our simple consumption model the levels of income and their
squares are both valid variance indicators, we can test for heteroskedasticity in the following way:
. g redd2=redd1000^2
. hettest redd1000 redd2
Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
Ho: Constant variance
Variables: redd1000 redd2
chi2(2) = 25.00
Prob > chi2 = 0.0000
The same result can be obtained by applying a procedure, written by C. F. Baum and V. Wiggins,
that specifically runs the Breusch-Pagan (1979) test for heteroskedasticity conditional on a set of
variables.
. bpagan redd1000 redd2
Breusch-Pagan LM statistic: 25.0018 Chi-sq( 2) P-value = 3.7e-06
In general, the Breusch and Pagan test-statistic is distributed as a chi-squared with V degrees of
freedom (in the latter example V=2). The statistic above may be replicated with the following steps.
1) Compute the dependent variable of the Breusch-Pagan auxiliary regression:
. reg cons1000 redd1000
Source | SS df MS Number of obs = 100
-------------+------------------------------ F( 1, 98) = 1036.50
Model | 46059.3208 1 46059.3208 Prob > F = 0.0000
Residual | 4354.87802 98 44.4375308 R-squared = 0.9136
-------------+------------------------------ Adj R-squared = 0.9127
Total | 50414.1988 99 509.234332 Root MSE = 6.6661
------------------------------------------------------------------------------
cons1000 | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
redd1000 | .7002875 .0217517 32.19 0.000 .6571221 .743453
_cons | 5.668498 1.331578 4.26 0.000 3.026025 8.310971
------------------------------------------------------------------------------
. predict res, resid
. g BP_g= res^2/(e(rss)/e(N))
where e(rss)=4354.87802 and e(N)=100 are post estimation results corresponding, respectively,
to the residual sum of squares and to the total number of observations.
2) Run the Breusch-Pagan auxiliary regression and compute the test statistic and/or its P-value:
. reg BP_g redd1000 redd2
Source | SS df MS Number of obs = 100
-------------+------------------------------ F( 2, 97) = 12.90
Model | 50.0036064 2 25.0018032 Prob > F = 0.0000
Residual | 188.030708 97 1.93846091 R-squared = 0.2101
-------------+------------------------------ Adj R-squared = 0.1938
Total | 238.034314 99 2.40438701 Root MSE = 1.3923
------------------------------------------------------------------------------
BP_g | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
redd1000 | -.0092459 .0167538 -0.55 0.582 -.0424975 .0240058
redd2 | .0002848 .0001499 1.90 0.060 -.0000127 .0005823
_cons | .4226007 .4038909 1.05 0.298 -.3790109 1.224212
------------------------------------------------------------------------------
. di e(mss)/e(df_m)
25.001803
where e(mss)=50.0036064 and e(df_m)=2 are post estimation results corresponding, respectively,
to the model sum of squares and to the model degrees of freedom of the auxiliary regression.
The P-value of the test is obtained as:
. display chi2tail(2,e(mss)/e(df_m))
3.723e-06
The White test can be performed in several ways; the easiest is to run a procedure, also written
by Baum and Cox, that automatically computes the asymptotic version of the White test.
. qui reg cons1000 redd1000
. whitetst
White's general test statistic : 21.00689 Chi-sq( 2) P-value = 2.7e-05
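In more recent Stata versions the same asymptotic White statistic is also available through the built-in post-estimation command estat (a sketch given as an aside; whitetst remains the user-written equivalent used here):

qui reg cons1000 redd1000
* built-in White test (Cameron-Trivedi imtest with the white option)
estat imtest, white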
This result may be replicated with the following steps.
1) Compute the dependent variable of the White auxiliary regression:
. g res2=res^2
2) Run the White auxiliary regression (remember that we have only one explanatory variable):
. reg res2 redd1000 redd2
Source | SS df MS Number of obs = 100
-------------+------------------------------ F( 2, 97) = 12.90
Model | 94831.6529 2 47415.8264 Prob > F = 0.0000
Residual | 356599.534 97 3676.28386 R-squared = 0.2101
-------------+------------------------------ Adj R-squared = 0.1938
Total | 451431.187 99 4559.91098 Root MSE = 60.632
------------------------------------------------------------------------------
res2 | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
redd1000 | -.4026459 .7296083 -0.55 0.582 -1.850716 1.045424
redd2 | .0124035 .0065273 1.90 0.060 -.0005513 .0253583
_cons | 18.40375 17.58896 1.05 0.298 -16.50546 53.31296
------------------------------------------------------------------------------
3) Compute the White test statistic and its P-value for the asymptotic version of the test.
The LM test statistic for heteroskedasticity is just the sample size N times the R-squared of the
auxiliary regression:
. di e(N)*e(r2)
21.00689
where e(N)=100 and e(r2)=0.2101 are post estimation results corresponding, respectively, to the
total number of observations and to the R-squared of the auxiliary regression. The P-value of the
test is obtained as:
. display chi2tail(2,e(N)*e(r2))
.00002744
The F version of the White test for small samples² is obtained with testparm:
. testparm redd1000 redd2
( 1) redd1000 = 0.0
( 2) redd2 = 0.0
F( 2, 97) = 12.90
Prob > F = 0.0000
² This command can also be used in the Breusch-Pagan auxiliary regression; of course, the results of the two F tests
coincide.
Note that:
. qui reg cons1000 redd1000
. hettest redd1000 redd2, iid
Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
Ho: Constant variance
Variables: redd1000 redd2
chi2(2) = 21.01
Prob > chi2 = 0.0000
The Breusch-Pagan (1979) test from the hettest command is numerically equal to the White
(1980) test for heteroskedasticity, if the same White’s auxiliary regression is specified and the
option iid is used. Differently from the default of hettest and from bpagan, that compute the
original Breusch-Pagan test assuming that the regression disturbances are normally distributed, the
option iid causes hettest to compute the N·R² version of the score test, which drops the
normality assumption.³
A useful command that, despite its name, also works after OLS and performs both previous tests is:
. ivhettest, all ivcp
OLS heteroskedasticity test(s) using levels and cross products of all IVs
Ho: Disturbance is homoskedastic
White/Koenker nR2 test statistic : 21.007 Chi-sq(2) P-value = 0.0000
Breusch-Pagan/Godfrey/Cook-Weisberg : 25.002 Chi-sq(2) P-value = 0.0000
Note that if you write hettest only, the residual variance is assumed to depend on the fitted values
(i.e. $Z_i \equiv \hat y_i$, and V=1); if you use the option ,rhs the residual variance is assumed to depend on
the explanatory variables of the model (in our case of one explanatory variable these two tests
coincide).
. hettest
Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
Ho: Constant variance
Variables: redd1000
chi2(1) = 21.50
Prob > chi2 = 0.0000
. bpagan redd1000
Breusch-Pagan LM statistic: 21.50193 Chi-sq( 1) P-value = 3.5e-06
. ivhettest, all
OLS heteroskedasticity test(s) using levels of IVs only
Ho: Disturbance is homoskedastic
White/Koenker nR2 test statistic : 18.066 Chi-sq(1) P-value = 0.0000⁴
Breusch-Pagan/Godfrey/Cook-Weisberg : 21.502 Chi-sq(1) P-value = 0.0000
³ Koenker (1981) showed that when the assumption of normality is removed, a version of the test is available that can
be calculated as the sample size N times the centered R-squared from an artificial regression of the squared residuals
from the original regression on the indicator variables.
⁴ This test is the Breusch-Pagan test without the normality assumption.
3. How to account for heteroskedasticity?
3.1. Heteroskedasticity-consistent estimates of the standard errors
A first way to account for heteroskedasticity is to estimate the model's parameters by OLS (if
the Keynesian model is correctly specified, the OLS estimator is unbiased and consistent, even if not
efficient due to heteroskedasticity) and to correct the (biased) OLS estimates of the standard
errors. To do so, consistent standard errors are needed.
The robust option of the regress Stata command specifies that the Eicker (1967)/Huber
(1973)/White (1980) sandwich estimator of variance is used instead of the traditional OLS error
variance estimator; inference is heteroskedasticity-robust.
In particular, White (1980) argues that it is not necessary to estimate all the $\sigma_i^2$'s, but that we simply
need a consistent estimator of the (K×K) matrix
$$X' E(\varepsilon\varepsilon') X = \sigma^2\, X'\Omega X = X'\,Diag(\sigma_i^2)\,X = \sum_{i=1}^{N} \sigma_i^2\, X_i X_i' .$$
If we define as $X_i$ the (K×1) vector of explanatory variables for observation i, a consistent estimator
can be obtained as
$$\frac{X'\,Diag(\hat\varepsilon_i^2)\,X}{N} = \frac{1}{N}\sum_{i=1}^{N} \hat\varepsilon_i^2\, X_i X_i' ,$$
where $\hat\varepsilon_i$ is the OLS residual and $plim\ \frac{X'\,Diag(\hat\varepsilon_i^2)\,X}{N} = plim\ \frac{\sigma^2\, X'\Omega X}{N}$.
Thus, the “sandwich”:
$$\widehat{Var}(\hat\beta) = \left(\sum_{i=1}^{N} X_i X_i'\right)^{-1}\left(\sum_{i=1}^{N} \hat\varepsilon_i^2\, X_i X_i'\right)\left(\sum_{i=1}^{N} X_i X_i'\right)^{-1} = (X'X)^{-1}\,X'\,Diag(\hat\varepsilon_i^2)\,X\,(X'X)^{-1}$$
can be used as an estimate of the true variance of the OLS estimator.
In our case above, after having detected residual heteroskedasticity, and under the assumption that the
other hypotheses about our Keynesian model hold, we can obtain consistent standard errors using a
very simple option:
. reg cons1000 redd1000, robust
Regression with robust standard errors Number of obs = 100
F( 1, 98) = 799.78
Prob > F = 0.0000
R-squared = 0.9136
Root MSE = 6.6661
------------------------------------------------------------------------------
| Robust
cons1000 | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
redd1000 | .7002875 .0247622 28.28 0.000 .6511477 .7494274
_cons | 5.668498 1.076363 5.27 0.000 3.532492 7.804505
------------------------------------------------------------------------------
NOTE: parameter estimates (with and without standard errors correction) are identical: the White
correction does not modify the parameters’ estimates.
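The sandwich above can also be reproduced by hand with Stata's matrix commands. A minimal sketch (note: it omits Stata's finite-sample scaling of the center of the sandwich by N/(N−K), so the standard errors differ slightly from the robust output above; ehat and ehat2 are new working variables):

qui reg cons1000 redd1000
predict ehat, resid
g ehat2 = ehat^2
* X'X (the constant is included automatically by matrix accum)
matrix accum XX = redd1000
* sum of ehat_i^2 * X_i X_i', obtained via importance weights
matrix accum XOX = redd1000 [iweight=ehat2]
* the sandwich: (X'X)^-1 (sum ehat_i^2 X_i X_i') (X'X)^-1
matrix V = inv(XX)*XOX*inv(XX)
matrix list V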
3.2. Feasible generalised least squares (FGLS)
If we have some idea about the heteroskedasticity determinants, we can introduce a different
estimator: FGLS (feasible generalised least squares), the efficient estimator in the context of
heteroskedastic errors (remember: OLS is only consistent but inefficient, because it does not account
for the heteroskedastic behaviour of the errors).
If $Var(\varepsilon_i \mid X) = \sigma_i^2 = \sigma^2 \omega_i^h$, with $\omega_i$ an observed variable and h a known constant,
the inverse of Ω is diagonal with generic element $\omega_i^{-h}$.
Let's define the L matrix, diagonal with generic element $\omega_i^{-h/2}$.
The general principle at the basis of FGLS is the following.
Suppose we know Ω or we have a consistent estimate $\hat\Omega$.
In addition, $\hat\Omega$ is non-singular, and it is possible to find an (N×N) matrix L such that
$$L\hat\Omega L' = I_N \quad\text{and}\quad L'L = \hat\Omega^{-1} .$$
The specific form of the L matrix depends on the problem one has to tackle.
But the general principle is to minimise an appropriately weighted average of squared errors,
with lower weights given to the observations characterised by the higher residual variance.
Pre-multiply by L the heteroskedastic model: y = Xβ +ε and obtain
y* = X*β +ε*
where y* = Ly, X* = LX and ε*=Lε
Now it is true that:
E(ε*) = E(Lε) = LE(ε) = 0
$$E(\varepsilon^*\varepsilon^{*\prime}) = E(L\varepsilon\varepsilon'L') = L\,E(\varepsilon\varepsilon')\,L' = \sigma^2\,L\hat\Omega L' = \sigma^2 I_N .$$
Hence, the OLS estimator of the transformed model is best (minimum variance) and corresponds to
the FGLS estimator:
$$\hat\beta_{FGLS} = (X^{*\prime}X^{*})^{-1}X^{*\prime}y^{*} = (X'L'LX)^{-1}X'L'Ly = (X'\hat\Omega^{-1}X)^{-1}X'\hat\Omega^{-1}y .$$
The FGLS estimator is BLUE despite the presence of heteroskedasticity (and/or autocorrelation); in other
terms, the Aitken theorem applied to the transformed data substitutes for the Gauss-Markov theorem,
and, in particular, the Gauss-Markov theorem is a special case of the Aitken theorem for Ω = I_N.
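For completeness, the Aitken theorem also delivers the variance of the GLS estimator (a standard result, stated here as a reminder rather than derived in the text):
$$Var(\hat\beta_{GLS} \mid X) = \sigma^2\,(X'\Omega^{-1}X)^{-1} ,$$
the minimum variance among linear unbiased estimators when $Var(\varepsilon \mid X) = \sigma^2\Omega$.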
When Ω is known both in its structure and in its parameters, we are directly in the GLS case.
For example, in the cases of:
group-wise heteroskedasticity;
autocorrelation in the MA(1) form when we estimate a dynamic panel after taking first
differences to remove the individual effects (note that, given the presence of the lagged dependent
variable among the regressors, we need to use IV+GLS = GIVE, the generalized instrumental variable
estimator).
Weighted least squares (WLS) is a specific case of GLS, used, for example, in presence of group-
wise heteroskedasticity, i.e. when we know that heteroskedasticity derives from how the data are
collected: we only have averaged or aggregated data (by clusters, which may be industries,
typologies of companies and so on). In this case Ω is known in its structure and parameters.
Some examples are in the Appendix.
Usually Ω is stochastic, known in its structure but unknown in its parameters.
Thus, we talk of UGLS, Unfeasible GLS. Estimation is possible only once we have $\hat\Omega$, a
consistent estimate of the errors' VCOV matrix; in this case UGLS becomes feasible (FGLS).
The FGLS estimator is consistent and asymptotically efficient (the small sample properties are
unknown).
As examples:
constant autocorrelation inside the individual in panel data with random effects;
cross-correlation in seemingly unrelated regressions (SUR);
comfac models, i.e. static models with AR(1) errors (this case is very specific and not very
realistic).
Note that in an autoregressive model with autocorrelated errors OLS is biased and not consistent,
and FGLS is not applicable unless we estimate with instrumental variables (IV) in order to obtain
$\hat\Omega$. This is the generalized IV (GIVE) or heteroskedastic 2SLS (two-stage least squares) estimator:
$$\hat\beta_{GIVE} = (Z'L'LX)^{-1}Z'L'Ly = (Z'\hat\Omega^{-1}X)^{-1}Z'\hat\Omega^{-1}y .$$
See more in lecture_IV.
An alternative is to augment the dynamics, i.e. to re-specify the model.
Behavioural assumption in the consumption-income relationship: the error variance is a linear
function of income (redd1000), because wealthy people have a larger set of consumption options.
If this is true, then it is reasonable to use such information in the estimation phase, down-weighting the
observations corresponding to higher incomes because they are less informative about the regression line.
In fact, they are assumed to be more dispersed (higher variance) than those of poorer people.
Start from the model $C_i = \alpha + \beta R_i + \varepsilon_i$, where $Var(\varepsilon_i) = \sigma_i^2 = \sigma^2 R_i$.
Hence in this case
$$\sigma^2\Omega = \sigma^2\,Diag(\omega_i) = \sigma^2\begin{pmatrix} \omega_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \omega_N \end{pmatrix} = \sigma^2\begin{pmatrix} R_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & R_N \end{pmatrix} .$$
If we scale all the variables by the square root of income, we obtain the transformed model:
$$\frac{C_i}{\sqrt{R_i}} = \alpha\,\frac{1}{\sqrt{R_i}} + \beta\,\frac{R_i}{\sqrt{R_i}} + \frac{\varepsilon_i}{\sqrt{R_i}} = \alpha\,\frac{1}{\sqrt{R_i}} + \beta\,\sqrt{R_i} + u_i ,$$
where
$$Var(u_i) = Var\left(\frac{\varepsilon_i}{\sqrt{R_i}}\right) = \frac{1}{R_i}\,Var(\varepsilon_i) = \frac{1}{R_i}\,\sigma^2 R_i = \sigma^2 ,$$
i.e. the errors $u_i$ are homoskedastic.
Hence, in this case:
$$L = Diag\left(\frac{1}{\sqrt{R_i}}\right) = \begin{pmatrix} 1/\sqrt{R_1} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 1/\sqrt{R_N} \end{pmatrix} .$$
WLS is efficient precisely because the higher-variance observations (i.e. those corresponding to richer
people) have less weight.⁵
⁵ If the model we assume to explain the heteroskedasticity is right, FGLS is more efficient than robust OLS.
. reg cons1000 redd1000 [aweight=1/redd1000]
(sum of wgt is 1.0121e+02)
Source | SS df MS Number of obs = 100
-------------+------------------------------ F( 1, 98) = 3053.19
Model | 2623.25305 1 2623.25305 Prob > F = 0.0000
Residual | 84.2000896 98 .859184587 R-squared = 0.9689
-------------+------------------------------ Adj R-squared = 0.9686
Total | 2707.45314 99 27.3480115 Root MSE = .92692
------------------------------------------------------------------------------
cons1000 | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
redd1000 | .7145188 .0129311 55.26 0.000 .6888574 .7401803
_cons | 4.914329 .0935686 52.52 0.000 4.728645 5.100013
------------------------------------------------------------------------------
aweight stands for analytical weights, which are inversely proportional to the variance of an
observation. These are automatically employed in models that use averages, e.g. in between-effects
panel regression.
FGLS (WLS) can be reproduced by the following steps:
. g peso=1/redd1000^0.5
. g consp=cons1000*peso
. g reddp=redd1000*peso
. reg consp reddp peso, noconst
Source | SS df MS Number of obs = 100
-------------+------------------------------ F( 2, 98) = 3364.82
Model | 5852.17626 2 2926.08813 Prob > F = 0.0000
Residual | 85.2218967 98 .869611191 R-squared = 0.9856
-------------+------------------------------ Adj R-squared = 0.9854
Total | 5937.39815 100 59.3739815 Root MSE = .93253
------------------------------------------------------------------------------
consp | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
reddp | .7145188 .0129311 55.26 0.000 .6888574 .7401803
peso | 4.914329 .0935686 52.52 0.000 4.728645 5.100013
------------------------------------------------------------------------------
. whitetst
White's general test statistic : 5.086438 Chi-sq( 5) P-value = .4054
. g reddp2=reddp^2
. bpagan reddp reddp2
Breusch-Pagan LM statistic: 1.265137 Chi-sq( 2) P-value = .5312
Unlike the previous two heteroskedasticity tests, hettest cannot be run after a regression without
the constant term:
. hettest
not appropriate after regress, nocons
r(301);
The previous heteroskedasticity tests are run for didactical reasons only, just to check that
heteroskedasticity is no longer present in the weighted regression; strictly speaking, heteroskedasticity tests
are not meaningful after FGLS.
All the issues raised above can be summarised in a single table in order to fix ideas. In doing so,
we use the previous consumption function (which we checked is heteroskedastic) and quietly run
three regressions of interest, namely: (1) heteroskedastic OLS without White's standard
errors correction; (2) heteroskedastic OLS with White's standard errors correction; (3) WLS
assuming that the error variance is a linear function of income:
. qui reg cons1000 redd1000
. est store OLS
. qui reg cons1000 redd1000, robust
. est store white
. qui reg cons1000 redd1000 [aweight=1/redd1000]
. est store WLS
. est table OLS white WLS , b(%6.3f) se(%6.3f) t(%6.2f) /*
*/ stats(N df_r df_m r2 r2_a rmse F)
--------------------------------------------
Variable | OLS white WLS
-------------+------------------------------
redd1000 | 0.700 0.700 0.715
| 0.022 0.025 0.013
| 32.19 28.28 55.26
_cons | 5.668 5.668 4.914
| 1.332 1.076 0.094
| 4.26 5.27 52.52
-------------+------------------------------
N | 100 100 100
df_r | 98.000 98.000 98.000
df_m | 1.000 1.000 1.000
r2 | 0.914 0.914 0.969
r2_a | 0.913 0.913 0.969
rmse | 6.666 6.666 0.927
F | 1036.496 799.785 3053.189
--------------------------------------------
legend: b/se/t
Discussion. In the context of a model with heteroskedastic errors, both OLS and WLS estimators
are unbiased and consistent, therefore all the estimates are fairly close to each other. The parameters'
standard errors estimated by OLS in the first column are biased (because of heteroskedastic errors),
while those in the second column are robust to heteroskedasticity (hence, reliable). However, since
WLS is also efficient, the standard errors reported in the third column are remarkably lower than
those in the second column.
Appendix
A1. Averaged data
$$\bar y_c = \bar X_c \beta + \bar\varepsilon_c$$
where c = 1, 2, .., C indexes the groups (or clusters). Each group is composed of i = 1, 2, .., $N_c$
individuals, which are averaged.
The single individuals have homoskedastic errors, $Var(\varepsilon_i) = \sigma^2 \ \forall i$, and are not cross-sectionally
correlated, $Cov(\varepsilon_i, \varepsilon_j) = 0$ for $i \neq j$.
However, the available (averaged) error terms are
$$\bar\varepsilon_c = \frac{1}{N_c}(\varepsilon_1 + \ldots + \varepsilon_{N_c}) = \frac{1}{N_c}\sum_{i=1}^{N_c}\varepsilon_i .$$
Hence the variance is:
$$Var(\bar\varepsilon_c) = E\left[\left(\frac{1}{N_c}\sum_{i=1}^{N_c}\varepsilon_i\right)^2\right] = \frac{1}{N_c^2}\,N_c\,\sigma^2 = \frac{\sigma^2}{N_c} = \sigma_c^2 ,$$
i.e. the error variance decreases as the number of individuals within a cluster, $N_c$, increases.⁶
$$Var(\varepsilon \mid X) = E(\varepsilon\varepsilon' \mid X) = \sigma^2\Omega = \sigma^2\,Diag(\omega_c) = \sigma^2\begin{pmatrix} 1/N_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 1/N_C \end{pmatrix} .$$
FGLS/WLS weights each observation by $\sqrt{N_c}$, giving more weight to the observations with lower
variance $\sigma_c^2$ (i.e. to the larger clusters).
In particular, the L matrix is
$$L = Diag(\sqrt{N_c}) = \begin{pmatrix} \sqrt{N_1} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \sqrt{N_C} \end{pmatrix} .$$
If we multiply all the variables by the square root of each group dimension, we obtain the
transformed model $Ly = LX\beta + L\varepsilon$ that, looking at the c-th observation, corresponds to
$$\sqrt{N_c}\,\bar y_c = \sqrt{N_c}\,\bar X_c\beta + \sqrt{N_c}\,\bar\varepsilon_c ,$$
where $Var(\sqrt{N_c}\,\bar\varepsilon_c) = N_c\,\frac{\sigma^2}{N_c} = \sigma^2$, i.e. the transformed errors are homoskedastic.
⁶ Note that with this kind of data we lose the within-group variation, and hence the estimates of the parameters are less
precise. However, the fit, R², improves because the errors are averaged.
The OLS estimator of the transformed model is best (minimum variance) and corresponds to the
FGLS/WLS estimator:
$$\hat\beta_{FGLS/WLS} = (X'L'LX)^{-1}X'L'Ly = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y .$$
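In Stata this weighting can be done directly with aweights, which are inversely proportional to the variance of an observation: since $Var(\bar\varepsilon_c) = \sigma^2/N_c$, the weight is $N_c$. A minimal sketch with hypothetical cluster-level variables (ybar and xbar are group means, Nc is the group size; none of them belongs to the lecture's dataset):

* WLS with averaged data: aweights proportional to Nc
reg ybar xbar [aweight=Nc]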
A2. Aggregated data
$$y_c = X_c\beta + \varepsilon_c$$
where c = 1, 2, .., C indexes the groups (or clusters), and each group is the sum of i = 1, 2, .., $N_c$
individuals.
Our (aggregated) error terms are
$$\varepsilon_c = \sum_{i=1}^{N_c}\varepsilon_i .$$
Hence the variance is:
$$Var(\varepsilon_c) = E\left[\left(\sum_{i=1}^{N_c}\varepsilon_i\right)^2\right] = N_c\,\sigma^2 = \sigma_c^2 ,$$
i.e. the error variance increases as the number of individuals within a cluster, $N_c$, increases; this is
true even if the covariance among individuals within the cluster is negative.
$$Var(\varepsilon \mid X) = E(\varepsilon\varepsilon' \mid X) = \sigma^2\Omega = \sigma^2\,Diag(\omega_c) = \sigma^2\begin{pmatrix} N_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & N_C \end{pmatrix} .$$
FGLS/WLS weights each observation by $1/\sqrt{N_c}$, down-weighting the observations with higher
variance $\sigma_c^2$ (i.e. the larger clusters).
In particular, the L matrix is
$$L = Diag(1/\sqrt{N_c}) = \begin{pmatrix} 1/\sqrt{N_1} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 1/\sqrt{N_C} \end{pmatrix} .$$
If we scale all the variables by the square root of each group dimension, we obtain the transformed
model $Ly = LX\beta + L\varepsilon$ that, looking at the c-th observation, corresponds to
$$\frac{1}{\sqrt{N_c}}\,y_c = \frac{1}{\sqrt{N_c}}\,X_c\beta + \frac{1}{\sqrt{N_c}}\,\varepsilon_c ,$$
where $Var\left(\frac{1}{\sqrt{N_c}}\,\varepsilon_c\right) = \frac{1}{N_c}\,N_c\,\sigma^2 = \sigma^2$, i.e. the transformed errors are homoskedastic.
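Analogously, in Stata the aggregated-data case runs with aweights in the opposite direction: since $Var(\varepsilon_c) = N_c\sigma^2$, the weight is $1/N_c$. A minimal sketch with hypothetical cluster-level variables (ysum and xsum are group totals, Nc is the group size):

* WLS with aggregated data: aweights proportional to 1/Nc
reg ysum xsum [aweight=1/Nc]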
A3. Some hints on panel data
To conclude this lecture and to add useful information, especially in the panel data context, we
compare three OLS estimators with different corrections of the standard errors that are available in
the regress command.
(1) No correction of the standard errors, or homoskedastic estimator (regress):
$$Var(\hat\beta_{OLS}) = (X'X)^{-1}X'\,\widehat{Var}(\varepsilon)\,X(X'X)^{-1} = (X'X)^{-1}X'(s^2 I)X(X'X)^{-1} = s^2(X'X)^{-1} ,$$
where $s^2 = \frac{1}{N-K}\sum_{i=1}^{N}\hat\varepsilon_i^2$.
(2) Heteroskedasticity-consistent estimator (regress, robust):
$$Var(\hat\beta_{robust}) = (X'X)^{-1}\left(X'\,Diag(\hat\varepsilon_i^2)\,X\right)(X'X)^{-1} = (X'X)^{-1}\left(\sum_{i=1}^{N}\hat\varepsilon_i^2\,X_i X_i'\right)(X'X)^{-1} ,$$
where the center of the sandwich is sometimes multiplied by N/(N−K) as a degrees-of-freedom
adjustment for finite samples.
(3) Estimator that accounts for clustering into groups, with observations correlated within groups
but independent between groups [regress, cluster(name_groups)]:
$$Var(\hat\beta_{cluster}) = (X'X)^{-1}\left(\sum_{c=1}^{N_C}\hat u_c \hat u_c'\right)(X'X)^{-1} ,$$
where we have c = 1, 2, ..., $N_C$ clusters and $\hat u_c = \sum_{i \in c}\hat\varepsilon_i X_i$ is the sum of the $\hat\varepsilon_i X_i$ over the observations within each
cluster c; the center of the sandwich is sometimes multiplied by (N−1)/(N−K) × $N_C$/($N_C$−1) as a finite-
sample adjustment.
Note that cluster implies the robust option. The formula for the clustered estimator is simply that
of the robust (unclustered) estimator with the individual $\hat\varepsilon_i X_i$ replaced by their sums over each
cluster. In other terms, the standard errors are computed based on aggregate data for the $N_C$
independent groups.
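The three estimators can be compared side by side on the lecture's data; a minimal sketch, where group is a hypothetical cluster identifier (the GLS_data dataset described above contains no such variable, so the clustered line is purely illustrative):

qui reg cons1000 redd1000
est store homosk
qui reg cons1000 redd1000, robust
est store robust
* hypothetical cluster identifier "group"
qui reg cons1000 redd1000, cluster(group)
est store clustered
est table homosk robust clustered, b(%6.3f) se(%6.3f)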
If the variance of the clustered estimator (3) is smaller than that of the robust (unclustered)
estimator (2), it means that the cluster sums of $\hat\varepsilon_i X_i$ have less variability than the individual $\hat\varepsilon_i X_i$.
That is, when we sum the $\hat\varepsilon_i X_i$ within a cluster, some of the variation gets cancelled out, and the
total variation is smaller.
This means that a big positive is summed with a big negative to produce something small; in other
words, there is negative correlation within the cluster.
If the number of clusters is very small compared to the overall sample size, it could be that the
clustered standard errors (3) are quite larger than the homoskedastic ones (1), because they are
computed on aggregate data for few groups.
Interpreting a difference between (1) the OLS estimator and (2) or (3) is trickier.
In (1) the squared residuals are summed, but in (2) and (3) the residuals are multiplied by the X's
(then for (3) summed within cluster) and then “squared” and summed.
Hence, any difference between them has to do with very complicated relationships between the
residuals and the X's.
If big (in absolute value) $\hat\varepsilon_i$ are paired with big $X_i$, then the robust variance estimate will be bigger
than the OLS estimate.
On the other hand, if the robust variance estimate is smaller than the OLS estimate, it is not clear at
all what is happening (in any case, it has to do with some odd correlations between the residuals
and the X's).
Note that if the OLS model is true, the residuals should, of course, be uncorrelated with the X's.
Indeed, if all the assumptions of the OLS model are true, then the expected values of (1) the OLS
estimator and (2) the robust (unclustered) estimator are approximately the same. So, if the robust
(unclustered) estimates are just a little smaller than the OLS estimates, it may be that the OLS
assumptions are true and we are seeing a bit of random variation. If the robust (unclustered)
estimates are much smaller than the OLS estimates, then either we are seeing a lot of random
variation (which is possible, but unlikely), or else there is something odd going on between the
residuals and the X's.

More Related Content

What's hot

Heteroscedasticity | Eonomics
Heteroscedasticity | EonomicsHeteroscedasticity | Eonomics
Heteroscedasticity | Eonomics
Transweb Global Inc
 
Autocorrelation
AutocorrelationAutocorrelation
Autocorrelation
Muhammad Ali
 
Heteroscedasticity
HeteroscedasticityHeteroscedasticity
Heteroscedasticity
Muhammad Ali
 
Heteroskedasticity
HeteroskedasticityHeteroskedasticity
Heteroskedasticityhalimuth
 
Multicollinearity PPT
Multicollinearity PPTMulticollinearity PPT
Multicollinearity PPT
GunjanKhandelwal13
 
Econometrics ch3
Econometrics ch3Econometrics ch3
Econometrics ch3
Baterdene Batchuluun
 
ders 3 Unit root test.pptx
ders 3 Unit root test.pptxders 3 Unit root test.pptx
ders 3 Unit root test.pptx
Ergin Akalpler
 
Heteroscedasticity
HeteroscedasticityHeteroscedasticity
Heteroscedasticity
Geethu Rangan
 
Chapter 06 - Heteroskedasticity.pptx
Chapter 06 - Heteroskedasticity.pptxChapter 06 - Heteroskedasticity.pptx
Chapter 06 - Heteroskedasticity.pptx
Farah Amir
 
regression assumption by Ammara Aftab
regression assumption by Ammara Aftabregression assumption by Ammara Aftab
regression assumption by Ammara AftabUniversity of Karachi
 
Multicolinearity
MulticolinearityMulticolinearity
Multicolinearity
Pawan Kawan
 
Heteroscedasticity Remedial Measures.pptx
Heteroscedasticity Remedial Measures.pptxHeteroscedasticity Remedial Measures.pptx
Heteroscedasticity Remedial Measures.pptx
PatilDevendra5
 
Simple Linier Regression
Simple Linier RegressionSimple Linier Regression
Simple Linier Regressiondessybudiyanti
 
Simple lin regress_inference
Simple lin regress_inferenceSimple lin regress_inference
Simple lin regress_inferenceKemal İnciroğlu
 
Dummy variables
Dummy variablesDummy variables
Dummy variables
Irfan Hussain
 
Eco Basic 1 8
Eco Basic 1 8Eco Basic 1 8
Eco Basic 1 8kit11229
 
Autocorrelation- Concept, Causes and Consequences
Autocorrelation- Concept, Causes and ConsequencesAutocorrelation- Concept, Causes and Consequences
Autocorrelation- Concept, Causes and Consequences
Shilpa Chaudhary
 
Identification problem in simultaneous equations model
Identification problem in simultaneous equations modelIdentification problem in simultaneous equations model
Identification problem in simultaneous equations model
GarimaGupta229
 
Dummy variable
Dummy variableDummy variable
Dummy variableAkram Ali
 
ders 8 Quantile-Regression.ppt
ders 8 Quantile-Regression.pptders 8 Quantile-Regression.ppt
ders 8 Quantile-Regression.ppt
Ergin Akalpler
 

What's hot (20)

Heteroscedasticity | Eonomics
Heteroscedasticity | EonomicsHeteroscedasticity | Eonomics
Heteroscedasticity | Eonomics
 
Autocorrelation
AutocorrelationAutocorrelation
Autocorrelation
 
Heteroscedasticity
HeteroscedasticityHeteroscedasticity
Heteroscedasticity
 
Heteroskedasticity
HeteroskedasticityHeteroskedasticity
Heteroskedasticity
 
Multicollinearity PPT
Multicollinearity PPTMulticollinearity PPT
Multicollinearity PPT
 
Econometrics ch3
Econometrics ch3Econometrics ch3
Econometrics ch3
 
ders 3 Unit root test.pptx
ders 3 Unit root test.pptxders 3 Unit root test.pptx
ders 3 Unit root test.pptx
 
Heteroscedasticity
HeteroscedasticityHeteroscedasticity
Heteroscedasticity
 
Chapter 06 - Heteroskedasticity.pptx
Chapter 06 - Heteroskedasticity.pptxChapter 06 - Heteroskedasticity.pptx
Chapter 06 - Heteroskedasticity.pptx
 
regression assumption by Ammara Aftab
regression assumption by Ammara Aftabregression assumption by Ammara Aftab
regression assumption by Ammara Aftab
 
Multicolinearity
MulticolinearityMulticolinearity
Multicolinearity
 
Heteroscedasticity Remedial Measures.pptx
Heteroscedasticity Remedial Measures.pptxHeteroscedasticity Remedial Measures.pptx
Heteroscedasticity Remedial Measures.pptx
 
Simple Linier Regression
Simple Linier RegressionSimple Linier Regression
Simple Linier Regression
 
Simple lin regress_inference
Simple lin regress_inferenceSimple lin regress_inference
Simple lin regress_inference
 
Dummy variables
Dummy variablesDummy variables
Dummy variables
 
Eco Basic 1 8
Eco Basic 1 8Eco Basic 1 8
Eco Basic 1 8
 
Autocorrelation- Concept, Causes and Consequences
Autocorrelation- Concept, Causes and ConsequencesAutocorrelation- Concept, Causes and Consequences
Autocorrelation- Concept, Causes and Consequences
 
Identification problem in simultaneous equations model
Identification problem in simultaneous equations modelIdentification problem in simultaneous equations model
Identification problem in simultaneous equations model
 
Dummy variable
Dummy variableDummy variable
Dummy variable
 
ders 8 Quantile-Regression.ppt
ders 8 Quantile-Regression.pptders 8 Quantile-Regression.ppt
ders 8 Quantile-Regression.ppt
 

Viewers also liked

Problem set 4 - Statistics and Econometrics - Msc Business Analytics - Imperi...
Problem set 4 - Statistics and Econometrics - Msc Business Analytics - Imperi...Problem set 4 - Statistics and Econometrics - Msc Business Analytics - Imperi...
Problem set 4 - Statistics and Econometrics - Msc Business Analytics - Imperi...
Jonathan Zimmermann
 
Chapter8
Chapter8Chapter8
Chapter8
Vu Vo
 
Lorenz Curves
Lorenz CurvesLorenz Curves
Lorenz Curves
siriporn pongvinyoo
 
Gini coefficient
Gini coefficientGini coefficient
Gini coefficient
laysheng1995
 
Structured equation model
Structured equation modelStructured equation model
Structured equation model
King Abidi
 
Chapt 11 & 12 linear & multiple regression minitab
Chapt 11 & 12 linear &  multiple regression minitabChapt 11 & 12 linear &  multiple regression minitab
Chapt 11 & 12 linear & multiple regression minitabBoyu Deng
 
Ordinary least squares linear regression
Ordinary least squares linear regressionOrdinary least squares linear regression
Ordinary least squares linear regression
Elkana Rorio
 
Tut10 heteroskedasticity
Tut10 heteroskedasticityTut10 heteroskedasticity
Tut9 multicollinearity
Tut9 multicollinearityTut9 multicollinearity
Lorenz curve block
Lorenz curve blockLorenz curve block
Lorenz curve blockTravis Klein
 

Viewers also liked (13)

20120140503019
2012014050301920120140503019
20120140503019
 
Problem set 4 - Statistics and Econometrics - Msc Business Analytics - Imperi...
Problem set 4 - Statistics and Econometrics - Msc Business Analytics - Imperi...Problem set 4 - Statistics and Econometrics - Msc Business Analytics - Imperi...
Problem set 4 - Statistics and Econometrics - Msc Business Analytics - Imperi...
 
Chapter8
Chapter8Chapter8
Chapter8
 
Heteroskedasticity
HeteroskedasticityHeteroskedasticity
Heteroskedasticity
 
Lorenz Curves
Lorenz CurvesLorenz Curves
Lorenz Curves
 
Gini coefficient
Gini coefficientGini coefficient
Gini coefficient
 
Structured equation model
Structured equation modelStructured equation model
Structured equation model
 
Chapt 11 & 12 linear & multiple regression minitab
Chapt 11 & 12 linear &  multiple regression minitabChapt 11 & 12 linear &  multiple regression minitab
Chapt 11 & 12 linear & multiple regression minitab
 
Ordinary least squares linear regression
Ordinary least squares linear regressionOrdinary least squares linear regression
Ordinary least squares linear regression
 
Income inequality
Income inequalityIncome inequality
Income inequality
 
Tut10 heteroskedasticity
Tut10 heteroskedasticityTut10 heteroskedasticity
Tut10 heteroskedasticity
 
Tut9 multicollinearity
Tut9 multicollinearityTut9 multicollinearity
Tut9 multicollinearity
 
Lorenz curve block
Lorenz curve blockLorenz curve block
Lorenz curve block
 

Similar to gls

Talk 4
Talk 4Talk 4
Ali, Redescending M-estimator
Ali, Redescending M-estimator Ali, Redescending M-estimator
Ali, Redescending M-estimator
Muhammad Ali
 
Temporal disaggregation methods
Temporal disaggregation methodsTemporal disaggregation methods
Temporal disaggregation methodsStephen Bradley
 
econometría pruebas especificación
econometría pruebas especificacióneconometría pruebas especificación
econometría pruebas especificación
JamesMAlvaradoTolent
 
Hypothesis Testing
Hypothesis TestingHypothesis Testing
Hypothesis Testing
Ryan Herzog
 
The linear regression model: Theory and Application
The linear regression model: Theory and ApplicationThe linear regression model: Theory and Application
The linear regression model: Theory and ApplicationUniversity of Salerno
 
Regression Analysis
Regression AnalysisRegression Analysis
Regression Analysis
Michael770443
 
Regression.ppt basic introduction of regression with example
Regression.ppt basic introduction of regression with exampleRegression.ppt basic introduction of regression with example
Regression.ppt basic introduction of regression with example
shivshankarshiva98
 
Multiple Regression.ppt
Multiple Regression.pptMultiple Regression.ppt
Multiple Regression.ppt
TanyaWadhwani4
 
Multiple regression
Multiple regressionMultiple regression
Multiple regression
Antoine De Henau
 
Regression
RegressionRegression
Regression
RegressionRegression
Binary OR Binomial logistic regression
Binary OR Binomial logistic regression Binary OR Binomial logistic regression
Binary OR Binomial logistic regression
Dr Athar Khan
 
Introduction to financial forecasting in investment analysis
Introduction to financial forecasting in investment analysisIntroduction to financial forecasting in investment analysis
Introduction to financial forecasting in investment analysisSpringer
 
Heteroscedasticity Remedial Measures.pptx
Heteroscedasticity Remedial Measures.pptxHeteroscedasticity Remedial Measures.pptx
Heteroscedasticity Remedial Measures.pptx
DevendraRavindraPati
 
一比一原版(Otago毕业证书)新西兰奥塔哥大学毕业证成绩单
一比一原版(Otago毕业证书)新西兰奥塔哥大学毕业证成绩单一比一原版(Otago毕业证书)新西兰奥塔哥大学毕业证成绩单
一比一原版(Otago毕业证书)新西兰奥塔哥大学毕业证成绩单
ynlsmv4ja
 
一比一原版(RoyalVeterinary毕业证书)皇家兽医学院毕业证成绩单
一比一原版(RoyalVeterinary毕业证书)皇家兽医学院毕业证成绩单一比一原版(RoyalVeterinary毕业证书)皇家兽医学院毕业证成绩单
一比一原版(RoyalVeterinary毕业证书)皇家兽医学院毕业证成绩单
ynlsmv4ja
 
一比一原版(UCD毕业证书)爱尔兰都柏林大学毕业证成绩单
一比一原版(UCD毕业证书)爱尔兰都柏林大学毕业证成绩单一比一原版(UCD毕业证书)爱尔兰都柏林大学毕业证成绩单
一比一原版(UCD毕业证书)爱尔兰都柏林大学毕业证成绩单
ynlsmv4ja
 

Similar to gls (20)

Corrleation and regression
Corrleation and regressionCorrleation and regression
Corrleation and regression
 
Talk 4
Talk 4Talk 4
Talk 4
 
Regression for class teaching
Regression for class teachingRegression for class teaching
Regression for class teaching
 
Ali, Redescending M-estimator
Ali, Redescending M-estimator Ali, Redescending M-estimator
Ali, Redescending M-estimator
 
Temporal disaggregation methods
Temporal disaggregation methodsTemporal disaggregation methods
Temporal disaggregation methods
 
econometría pruebas especificación
econometría pruebas especificacióneconometría pruebas especificación
econometría pruebas especificación
 
Hypothesis Testing
Hypothesis TestingHypothesis Testing
Hypothesis Testing
 
The linear regression model: Theory and Application
The linear regression model: Theory and ApplicationThe linear regression model: Theory and Application
The linear regression model: Theory and Application
 
Regression Analysis
Regression AnalysisRegression Analysis
Regression Analysis
 
Regression.ppt basic introduction of regression with example
Regression.ppt basic introduction of regression with exampleRegression.ppt basic introduction of regression with example
Regression.ppt basic introduction of regression with example
 
Multiple Regression.ppt
Multiple Regression.pptMultiple Regression.ppt
Multiple Regression.ppt
 
Multiple regression
Multiple regressionMultiple regression
Multiple regression
 
Regression
RegressionRegression
Regression
 
Regression
RegressionRegression
Regression
 
Binary OR Binomial logistic regression
Binary OR Binomial logistic regression Binary OR Binomial logistic regression
Binary OR Binomial logistic regression
 
Introduction to financial forecasting in investment analysis
Introduction to financial forecasting in investment analysisIntroduction to financial forecasting in investment analysis
Introduction to financial forecasting in investment analysis
 
Heteroscedasticity Remedial Measures.pptx
Heteroscedasticity Remedial Measures.pptxHeteroscedasticity Remedial Measures.pptx
Heteroscedasticity Remedial Measures.pptx
 
一比一原版(Otago毕业证书)新西兰奥塔哥大学毕业证成绩单
一比一原版(Otago毕业证书)新西兰奥塔哥大学毕业证成绩单一比一原版(Otago毕业证书)新西兰奥塔哥大学毕业证成绩单
一比一原版(Otago毕业证书)新西兰奥塔哥大学毕业证成绩单
 
一比一原版(RoyalVeterinary毕业证书)皇家兽医学院毕业证成绩单
一比一原版(RoyalVeterinary毕业证书)皇家兽医学院毕业证成绩单一比一原版(RoyalVeterinary毕业证书)皇家兽医学院毕业证成绩单
一比一原版(RoyalVeterinary毕业证书)皇家兽医学院毕业证成绩单
 
一比一原版(UCD毕业证书)爱尔兰都柏林大学毕业证成绩单
一比一原版(UCD毕业证书)爱尔兰都柏林大学毕业证成绩单一比一原版(UCD毕业证书)爱尔兰都柏林大学毕业证成绩单
一比一原版(UCD毕业证书)爱尔兰都柏林大学毕业证成绩单
 

Recently uploaded

Search Disrupted Google’s Leaked Documents Rock the SEO World.pdf
Search Disrupted Google’s Leaked Documents Rock the SEO World.pdfSearch Disrupted Google’s Leaked Documents Rock the SEO World.pdf
Search Disrupted Google’s Leaked Documents Rock the SEO World.pdf
Arihant Webtech Pvt. Ltd
 
Creative Web Design Company in Singapore
Creative Web Design Company in SingaporeCreative Web Design Company in Singapore
Creative Web Design Company in Singapore
techboxsqauremedia
 
RMD24 | Retail media: hoe zet je dit in als je geen AH of Unilever bent? Heid...
RMD24 | Retail media: hoe zet je dit in als je geen AH of Unilever bent? Heid...RMD24 | Retail media: hoe zet je dit in als je geen AH of Unilever bent? Heid...
RMD24 | Retail media: hoe zet je dit in als je geen AH of Unilever bent? Heid...
BBPMedia1
 
Cracking the Workplace Discipline Code Main.pptx
Cracking the Workplace Discipline Code Main.pptxCracking the Workplace Discipline Code Main.pptx
Cracking the Workplace Discipline Code Main.pptx
Workforce Group
 
Evgen Osmak: Methods of key project parameters estimation: from the shaman-in...
Evgen Osmak: Methods of key project parameters estimation: from the shaman-in...Evgen Osmak: Methods of key project parameters estimation: from the shaman-in...
Evgen Osmak: Methods of key project parameters estimation: from the shaman-in...
Lviv Startup Club
 
Premium MEAN Stack Development Solutions for Modern Businesses
Premium MEAN Stack Development Solutions for Modern BusinessesPremium MEAN Stack Development Solutions for Modern Businesses
Premium MEAN Stack Development Solutions for Modern Businesses
SynapseIndia
 
Discover the innovative and creative projects that highlight my journey throu...
Discover the innovative and creative projects that highlight my journey throu...Discover the innovative and creative projects that highlight my journey throu...
Discover the innovative and creative projects that highlight my journey throu...
dylandmeas
 
What is the TDS Return Filing Due Date for FY 2024-25.pdf
What is the TDS Return Filing Due Date for FY 2024-25.pdfWhat is the TDS Return Filing Due Date for FY 2024-25.pdf
What is the TDS Return Filing Due Date for FY 2024-25.pdf
seoforlegalpillers
 
Recruiting in the Digital Age: A Social Media Masterclass
Recruiting in the Digital Age: A Social Media MasterclassRecruiting in the Digital Age: A Social Media Masterclass
Recruiting in the Digital Age: A Social Media Masterclass
LuanWise
 
一比一原版加拿大渥太华大学毕业证(uottawa毕业证书)如何办理
一比一原版加拿大渥太华大学毕业证(uottawa毕业证书)如何办理一比一原版加拿大渥太华大学毕业证(uottawa毕业证书)如何办理
一比一原版加拿大渥太华大学毕业证(uottawa毕业证书)如何办理
taqyed
 
Authentically Social Presented by Corey Perlman
Authentically Social Presented by Corey PerlmanAuthentically Social Presented by Corey Perlman
Authentically Social Presented by Corey Perlman
Corey Perlman, Social Media Speaker and Consultant
 
Cree_Rey_BrandIdentityKit.PDF_PersonalBd
Cree_Rey_BrandIdentityKit.PDF_PersonalBdCree_Rey_BrandIdentityKit.PDF_PersonalBd
Cree_Rey_BrandIdentityKit.PDF_PersonalBd
creerey
 
Meas_Dylan_DMBS_PB1_2024-05XX_Revised.pdf
Meas_Dylan_DMBS_PB1_2024-05XX_Revised.pdfMeas_Dylan_DMBS_PB1_2024-05XX_Revised.pdf
Meas_Dylan_DMBS_PB1_2024-05XX_Revised.pdf
dylandmeas
 
Mastering B2B Payments Webinar from BlueSnap
Mastering B2B Payments Webinar from BlueSnapMastering B2B Payments Webinar from BlueSnap
Mastering B2B Payments Webinar from BlueSnap
Norma Mushkat Gaffin
 
ikea_woodgreen_petscharity_cat-alogue_digital.pdf
ikea_woodgreen_petscharity_cat-alogue_digital.pdfikea_woodgreen_petscharity_cat-alogue_digital.pdf
ikea_woodgreen_petscharity_cat-alogue_digital.pdf
agatadrynko
 
LA HUG - Video Testimonials with Chynna Morgan - June 2024
LA HUG - Video Testimonials with Chynna Morgan - June 2024LA HUG - Video Testimonials with Chynna Morgan - June 2024
LA HUG - Video Testimonials with Chynna Morgan - June 2024
Lital Barkan
 
Introduction to Amazon company 111111111111
Introduction to Amazon company 111111111111Introduction to Amazon company 111111111111
Introduction to Amazon company 111111111111
zoyaansari11365
 
RMD24 | Debunking the non-endemic revenue myth Marvin Vacquier Droop | First ...
Moreover, the MSE, $s^2$, is a biased estimator of $\sigma^2$:

$$E(s^2) = E\left(\frac{\hat{\varepsilon}'\hat{\varepsilon}}{N-K}\right) = E\left(\frac{\varepsilon' M \varepsilon}{N-K}\right) = \frac{1}{N-K}\,tr\left[M\,E(\varepsilon\varepsilon')\right] = \frac{\sigma^2}{N-K}\,tr(M\Omega) \neq \sigma^2,$$

where $M = I - P_X = I - X(X'X)^{-1}X'$ is the matrix projecting $Y$ upon the space orthogonal to the one spanned by the columns of $X$:

$$\hat{\varepsilon} = Y - \hat{Y} = Y - X\hat{\beta} = Y - X(X'X)^{-1}X'Y = (I - P_X)Y = MY.$$

The matrix $M$ is symmetric ($M' = M$) and idempotent ($MM = M$), with $rank(M) = tr(M) = N-K$.

Hence, the conventional OLS variance estimator $s^2(X'X)^{-1}$ is biased both because the weighting matrix is no longer $(X'X)^{-1}$ and because $s^2$ is a biased estimator of $\sigma^2$. As a consequence, inference (t and F tests) is not correct: the test statistics do not follow their standard distributions, and the usual confidence regions are no longer valid.
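To see these consequences in practice, here is a minimal simulation sketch; the data-generating process, seed, and variable names are illustrative and not from the lecture. Errors are generated with variance proportional to the regressor, and the classical standard errors are compared with the robust ones introduced below.

. * Illustrative DGP: Var(e|x) = x, so errors are heteroskedastic by construction
. clear
. set seed 101
. set obs 500
. g x = 100*runiform()
. g e = rnormal(0, sqrt(x))
. g y = 5 + 0.7*x + e
. reg y x            // classical standard errors (biased here)
. reg y x, robust    // heteroskedasticity-robust standard errors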
Consider the following example.

. use GLS_data, clear
. descr

Contains data
  obs:           100
 vars:             3                          16 Nov 2004 18:25
 size:         1,300 (99.9% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
obs             byte   %8.0g                  families
cons1000        float  %9.0g                  consumption in 2003 at constant prices
redd1000        float  %9.0g                  income in 2002 at constant prices
-------------------------------------------------------------------------------

The idea of explaining consumption with the previous year's income predetermines the dynamic relationship in a quite restrictive way, but with the advantage of avoiding consumption-income simultaneity and endogeneity problems.

The scatterplot tells us that the variability of consumption grows with the level of income: richer people behave in more diverse ways. This fact per se implies the likely heteroskedasticity of the linear model's residuals.

. graph7 cons1000 redd1000, ylabel xlabel

[Scatterplot of cons1000 against redd1000: the dispersion of consumption around its mean widens as income rises.]

Keynes's (linear) consumption function:

. reg cons1000 redd1000

      Source |       SS       df       MS              Number of obs =     100
-------------+------------------------------           F(  1,    98) = 1036.50
       Model |  46059.3208     1  46059.3208           Prob > F      =  0.0000
    Residual |  4354.87802    98  44.4375308           R-squared     =  0.9136
-------------+------------------------------           Adj R-squared =  0.9127
       Total |  50414.1988    99  509.234332           Root MSE      =  6.6661

------------------------------------------------------------------------------
    cons1000 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    redd1000 |   .7002875   .0217517    32.19   0.000     .6571221     .743453
       _cons |   5.668498   1.331578     4.26   0.000     3.026025    8.310971
------------------------------------------------------------------------------
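A side note on the graphics command used above: graph7 invokes Stata 7's old graphics engine. In current Stata releases the same scatterplot can be drawn with the twoway syntax (a one-line sketch, assuming GLS_data is loaded):

. scatter cons1000 redd1000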
2. Heteroskedasticity tests

Graphical analysis represents a first step towards discovering whether heteroskedasticity is present. Here we are supposing that the error variance is a function of income:

. version 7: rvfplot, oneway twoway box ylabel xlabel yline(0)

[Residual-versus-fitted plot: the residuals are centred on zero, but their spread widens as the fitted values increase, suggesting heteroskedasticity.]
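In current Stata the residual-versus-fitted plot is available directly after regress with the same command name (a sketch; the oneway and box margins of the old version-7 call are not reproduced):

. rvfplot, yline(0)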
Heteroskedasticity tests verify the hypothesis H0: $Var(\varepsilon_i) = \sigma^2$, $\forall\, i = 1, \dots, N$. In general, the tests use auxiliary regressions of the form

$$\hat{\varepsilon}_i^2 = f(Z_i'\alpha) + u_i,$$

where $u_i \sim iid(0, \sigma_u^2)$, and $\alpha$ and $Z_i$ are $V \times 1$ vectors, with $V$ the number of variables in $Z$ (and of associated parameters $\alpha$) used to explain the error variance; for this reason the $Z_i$ are called the variance indicator variables. The null hypothesis to be tested becomes H0: $\alpha = 0$.

What about the alternative hypothesis, H1? Non-constant variance implies that specific variance behaviours must be assumed. Under the alternative, the form of the detected heteroskedasticity depends on the choice of the explanatory indicators $Z_i$. The test is conditional on a set of variables which are presumed to influence the error variance: the fitted values, the explanatory variables, or any other variable presumed to influence the error variance. For example, in the financial time-series setting, Engle (1982) proposes an ARCH test, for autoregressive conditional heteroskedasticity:

$$\hat{\varepsilon}_t^2 = \alpha_1\hat{\varepsilon}_{t-1}^2 + \alpha_2\hat{\varepsilon}_{t-2}^2 + \dots + u_t.$$

The statistic is computed as either the F (small samples) or the LM (large samples) test for the overall significance of the independent variables in explaining $\hat{\varepsilon}_i^2$. The F statistic is

$$\frac{R_a^2 / V}{(1 - R_a^2)/(N - V - 1)},$$

where $R_a^2$ is the R-squared of the auxiliary regression. The LM statistic is just the sample size times the R-squared of the auxiliary regression; under the null, it is distributed asymptotically as $\chi^2_V$.
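As a sketch of the mechanics (all variable names here are hypothetical placeholders), the LM version of such a test can be computed by hand after any regression:

. reg y x1 x2
. predict ehat, resid
. g ehat2 = ehat^2
. reg ehat2 z1 z2               // auxiliary regression on V = 2 variance indicators
. di e(N)*e(r2)                 // LM statistic, asymptotically chi2(V)
. di chi2tail(2, e(N)*e(r2))    // its P-value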
A first form of the test is the Breusch-Pagan (1979) test (Breusch-Pagan (1979), Godfrey (1978), and Cook-Weisberg (1983) separately derived the same test statistic). It is a Lagrange multiplier test for heteroskedasticity in the error distribution. It is the most general test, even if it is not powerful and it is sensitive to the assumption of normally distributed errors (this is the assumption of the original formulation; see below for a change in this assumption). The Breusch-Pagan test statistic is distributed as a chi-squared with V degrees of freedom. It is obtained by the following steps:

1) run the model regression and define the dependent variable of the Breusch-Pagan auxiliary regression(1)

$$g_i = \frac{\hat{\varepsilon}_i^2}{\frac{1}{N}\sum_{i=1}^N \hat{\varepsilon}_i^2};$$

2) run the auxiliary regression $g_i = \alpha_0 + Z_i'\alpha + u_i$ and obtain the BP statistic as one half of the model (explained) sum of squares of this auxiliary regression, BP = MSS/2.

This test can verify whether heteroskedasticity is conditional on any list of $Z_i$ variables which are presumed to influence the error variance (i.e. variance indicators); they can be the fitted values, the explanatory variables of the model, or any variables you think can affect the errors' variance. The trade-off in the choice of indicator variables in these tests is that a smaller set of indicator variables will preserve degrees of freedom, at the cost of being unable to detect heteroskedasticity in certain directions.

(1) For this, Breusch-Pagan (1979, p. 1293) say: "... the quantity $g_i$ is of some importance in tests of heteroskedasticity. Thus, if one is going to plot any quantity, it would seem more reasonable to plot $g_i$ than $\hat{\varepsilon}_i^2$.". By dividing by the mean, the residuals are normalised: under the null there are no nuisance terms that can affect the chi-squared distribution, and it is possible to use any variable you think is useful in explaining heteroskedasticity.
A second form of the heteroskedasticity test is the very often reported White (1980) test. It is based on a different auxiliary regression, in which the squared residuals are regressed on the model regressors, all their squares, and all their possible (not redundant) cross products. The asymptotic chi-squared White test statistic is obtained as the number of observations times the R-squared of the auxiliary regression. The F version for small samples is obtained by setting to zero all the explanatory variables of the auxiliary regression (i.e. by looking at the F test for the overall significance of the auxiliary regression).

We have several commands to execute these heteroskedasticity tests. Suppose the model is

$$y_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i.$$

The different possibilities for the heteroskedasticity test are summarised in the following table.

Variance indicators            Breusch-Pagan                        White
-----------------------------------------------------------------------------------------------
Fitted values                  hettest
-----------------------------------------------------------------------------------------------
X1, X2                         hettest, rhs
                               bpagan X1 X2
                               ivhettest, all (ivlev)
                               (output: Breusch-Pagan/Godfrey/
                               Cook-Weisberg)
-----------------------------------------------------------------------------------------------
X1, X2, X1^2, X2^2, X1×X2      hettest X1 X2 X1^2 X2^2 X1×X2        hettest X1 X2 X1^2 X2^2 X1×X2, iid
                               bpagan X1 X2 X1^2 X2^2 X1×X2         whitetst
                               ivhettest, all ivcp                  ivhettest, ivcp
                               (output: Breusch-Pagan/Godfrey/      (output: White/Koenker nR2
                               Cook-Weisberg)                       test statistic)
-----------------------------------------------------------------------------------------------

NOTE: the command hettest is not appropriate after regress, nocons.
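In more recent Stata releases the same tests are also available through built-in post-estimation commands; a sketch of the equivalents, assuming redd2 = redd1000^2 has been generated as in the example that follows:

. qui reg cons1000 redd1000
. estat hettest redd1000 redd2     // Breusch-Pagan/Cook-Weisberg on given indicators
. estat imtest, white              // White's general test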
For example, if we suppose that in our simple consumption model the levels of income and their squares are both valid variance indicators, we can test for heteroskedasticity in the following way:

. g redd2=redd1000^2
. hettest redd1000 redd2

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: redd1000 redd2
         chi2(2)      =    25.00
         Prob > chi2  =   0.0000

The same result can be obtained by applying a procedure, written by C. F. Baum and V. Wiggins, that specifically runs the Breusch-Pagan (1979) test for heteroskedasticity conditional on a set of variables.

. bpagan redd1000 redd2

Breusch-Pagan LM statistic: 25.0018  Chi-sq( 2)  P-value = 3.7e-06
In general, the Breusch-Pagan test statistic is distributed as a chi-squared with V degrees of freedom (in the latter example V = 2). The statistic above may be replicated with the following steps.

1) Compute the dependent variable of the Breusch-Pagan auxiliary regression:

. reg cons1000 redd1000

      Source |       SS       df       MS              Number of obs =     100
-------------+------------------------------           F(  1,    98) = 1036.50
       Model |  46059.3208     1  46059.3208           Prob > F      =  0.0000
    Residual |  4354.87802    98  44.4375308           R-squared     =  0.9136
-------------+------------------------------           Adj R-squared =  0.9127
       Total |  50414.1988    99  509.234332           Root MSE      =  6.6661

------------------------------------------------------------------------------
    cons1000 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    redd1000 |   .7002875   .0217517    32.19   0.000     .6571221     .743453
       _cons |   5.668498   1.331578     4.26   0.000     3.026025    8.310971
------------------------------------------------------------------------------

. predict res, resid
. g BP_g= res^2/(e(rss)/e(N))

where e(rss) = 4354.87802 and e(N) = 100 are post-estimation results corresponding, respectively, to the residual sum of squares and to the total number of observations.

2) Run the Breusch-Pagan auxiliary regression and compute the test statistic and/or its P-value:

. reg BP_g redd1000 redd2

      Source |       SS       df       MS              Number of obs =     100
-------------+------------------------------           F(  2,    97) =   12.90
       Model |  50.0036064     2  25.0018032           Prob > F      =  0.0000
    Residual |  188.030708    97  1.93846091           R-squared     =  0.2101
-------------+------------------------------           Adj R-squared =  0.1938
       Total |  238.034314    99  2.40438701           Root MSE      =  1.3923

------------------------------------------------------------------------------
        BP_g |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    redd1000 |  -.0092459   .0167538    -0.55   0.582    -.0424975    .0240058
       redd2 |   .0002848   .0001499     1.90   0.060    -.0000127    .0005823
       _cons |   .4226007   .4038909     1.05   0.298    -.3790109    1.224212
------------------------------------------------------------------------------

. di e(mss)/e(df_m)
25.001803

where e(mss) = 50.0036064 and e(df_m) = 2 are post-estimation results corresponding, respectively, to the model sum of squares and to the model degrees of freedom of the auxiliary regression. Note that the Breusch-Pagan statistic is, in general, half the model sum of squares of the auxiliary regression, e(mss)/2; since here V = e(df_m) = 2, dividing by e(df_m) gives the same number. The P-value of the test is obtained as:

. display chi2tail(2,e(mss)/e(df_m))
3.723e-06
The White test can be performed in several ways; the easiest is to run a procedure, written by Baum and Cox, that automatically computes the asymptotic version of the White test.

. qui reg cons1000 redd1000
. whitetst

White's general test statistic :  21.00689  Chi-sq( 2)  P-value = 2.7e-05

This result may be replicated with the following steps.

1) Compute the dependent variable of the White auxiliary regression:

. g res2=res^2

2) Run the White auxiliary regression (remember that we have only one explanatory variable, so there are no cross products):

. reg res2 redd1000 redd2

      Source |       SS       df       MS              Number of obs =     100
-------------+------------------------------           F(  2,    97) =   12.90
       Model |  94831.6529     2  47415.8264           Prob > F      =  0.0000
    Residual |  356599.534    97  3676.28386           R-squared     =  0.2101
-------------+------------------------------           Adj R-squared =  0.1938
       Total |  451431.187    99  4559.91098           Root MSE      =  60.632

------------------------------------------------------------------------------
        res2 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    redd1000 |  -.4026459   .7296083    -0.55   0.582    -1.850716    1.045424
       redd2 |   .0124035   .0065273     1.90   0.060    -.0005513    .0253583
       _cons |   18.40375   17.58896     1.05   0.298    -16.50546    53.31296
------------------------------------------------------------------------------

In the asymptotic version of the test, the White (LM) statistic is just the sample size N times the R-squared of the auxiliary regression:

. di e(N)*e(r2)
21.00689

where e(N) = 100 and e(r2) = 0.2101 are post-estimation results corresponding, respectively, to the total number of observations and to the R-squared of the auxiliary regression. The P-value of the test is obtained as:

. display chi2tail(2,e(N)*e(r2))
.00002744

The F version of the White test for small samples:(2)

. testparm redd1000 redd2

 ( 1)  redd1000 = 0.0
 ( 2)  redd2 = 0.0

       F(  2,    97) =   12.90
            Prob > F =    0.0000

(2) This command can be used also in the Breusch-Pagan auxiliary regression; of course, the results of the two tests coincide.
Note that:

. qui reg cons1000 redd1000
. hettest redd1000 redd2, iid

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: redd1000 redd2
         chi2(2)      =    21.01
         Prob > chi2  =   0.0000

The Breusch-Pagan (1979) test from the hettest command is numerically equal to the White (1980) test for heteroskedasticity if the same White auxiliary regression is specified and the option iid is used. Differently from the default of hettest and from bpagan, which compute the original Breusch-Pagan test assuming that the regression disturbances are normally distributed, the option iid causes hettest to compute the NR2 version of the score test, which drops the normality assumption.(3)

A useful command that, despite its name, also works after OLS and performs both previous tests is:

. ivhettest, all ivcp

OLS heteroskedasticity test(s) using levels and cross products of all IVs
Ho: Disturbance is homoskedastic
White/Koenker nR2 test statistic    :  21.007  Chi-sq(2)  P-value = 0.0000
Breusch-Pagan/Godfrey/Cook-Weisberg :  25.002  Chi-sq(2)  P-value = 0.0000

Note that if you write hettest only, the residual variance is assumed to depend on the fitted values (i.e. $Z_i \equiv \hat{y}_i$, and V = 1); if you use the option rhs, the residual variance is assumed to depend on the explanatory variables of the model (in our case of a single explanatory variable these two tests coincide).

. hettest

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: redd1000
         chi2(1)      =    21.50
         Prob > chi2  =   0.0000

. bpagan redd1000

Breusch-Pagan LM statistic: 21.50193  Chi-sq( 1)  P-value = 3.5e-06

. ivhettest, all

OLS heteroskedasticity test(s) using levels of IVs only
Ho: Disturbance is homoskedastic
White/Koenker nR2 test statistic    :  18.066  Chi-sq(1)  P-value = 0.0000 (4)
Breusch-Pagan/Godfrey/Cook-Weisberg :  21.502  Chi-sq(1)  P-value = 0.0000

(3) Koenker (1981) showed that, when the assumption of normality is removed, a version of the test is available that can be calculated as the sample size N times the centred R-squared from an artificial regression of the squared residuals from the original regression on the indicator variables.
(4) This test is the Breusch-Pagan without the normality assumption.
3. How to account for heteroskedasticity?

3.1. Heteroskedasticity-consistent estimates of the standard errors

A first way to account for heteroskedasticity is to estimate the model's parameters by OLS (if the Keynesian model is correctly specified, the OLS estimator is unbiased and consistent, even if not efficient under heteroskedasticity), and to correct the (biased) OLS estimates of the standard errors. To do so, consistent standard errors are needed.

The robust option of the regress Stata command specifies that the Eicker (1967)/Huber (1973)/White (1980) sandwich estimator of variance is used instead of the traditional OLS error variance estimator; inference is then heteroskedasticity-robust. In particular, White (1980) argues that it is not necessary to estimate all the $\sigma_i^2$: we simply need a consistent estimator of the (K×K) matrix

$$X'E(\varepsilon\varepsilon')X = \sigma^2 X'\Omega X = X'Diag(\sigma_i^2)X = \sum_{i=1}^N \sigma_i^2 X_i X_i'.$$

If we define $X_i$ as the (K×1) vector of explanatory variables for observation i, a consistent estimator can be obtained as

$$\frac{1}{N}\sum_{i=1}^N \hat{\varepsilon}_i^2 X_i X_i',$$

where $\hat{\varepsilon}_i$ is the OLS residual; this sample matrix converges in probability to $\sigma^2 X'\Omega X / N$. Thus, the "sandwich"

$$\widehat{Var}(\hat{\beta}) = \left(\sum_{i=1}^N X_i X_i'\right)^{-1}\left(\sum_{i=1}^N \hat{\varepsilon}_i^2 X_i X_i'\right)\left(\sum_{i=1}^N X_i X_i'\right)^{-1} = (X'X)^{-1}\left(\sum_{i=1}^N \hat{\varepsilon}_i^2 X_i X_i'\right)(X'X)^{-1}$$

can be used as an estimate of the true variance of the OLS estimator.

In our case, after detecting residual heteroskedasticity, and under the assumption that the other assumptions about our Keynesian model hold, we can obtain consistent standard errors using a very simple option:

. reg cons1000 redd1000, robust

Regression with robust standard errors                 Number of obs =     100
                                                       F(  1,    98) =  799.78
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.9136
                                                       Root MSE      =  6.6661

------------------------------------------------------------------------------
             |               Robust
    cons1000 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    redd1000 |   .7002875   .0247622    28.28   0.000     .6511477    .7494274
       _cons |   5.668498   1.076363     5.27   0.000     3.532492    7.804505
------------------------------------------------------------------------------

NOTE: the parameter estimates with and without the standard errors correction are identical: the White correction does not modify the parameters' estimates.
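The sandwich can also be assembled by hand with Stata's matrix commands. The following is a sketch, not the lecture's code; regress, robust additionally applies the small-sample factor N/(N-K), which is included below, so the result should reproduce the robust VCE above:

. qui reg cons1000 redd1000
. scalar Nobs = e(N)
. scalar K = e(df_m) + 1
. predict double ehat, resid
. g double ehat2 = ehat^2
. matrix accum XX = redd1000                    // X'X (constant added automatically)
. matrix accum S = redd1000 [iweight=ehat2]     // sum of ehat_i^2 * X_i X_i'
. matrix V = (Nobs/(Nobs-K)) * invsym(XX) * S * invsym(XX)
. matrix list V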
3.2. Feasible generalised least squares (FGLS)

If we have some idea about the determinants of the heteroskedasticity, we can introduce a different estimator: FGLS (feasible generalised least squares), the efficient estimator in the context of heteroskedastic errors (remember: OLS is only consistent but inefficient, because it does not account for the heteroskedastic behaviour of the errors).

If $Var(\varepsilon_i|X_i) = \sigma_i^2 = \sigma^2\omega_i^h$, with $\omega_i$ an observed variable and h a known constant, the inverse of $\Omega$ is diagonal with generic element $\omega_i^{-h}$. Let us define the matrix L, diagonal with generic element $\omega_i^{-h/2}$.

The general principle at the basis of FGLS is the following. Suppose we know $\Omega$ or we dispose of a consistent estimate $\hat{\Omega}$. In addition, $\hat{\Omega}$ is not singular, and it is possible to find an (N×N) matrix L such that $L\hat{\Omega}L' = I_N$ and $L'L = \hat{\Omega}^{-1}$. The specific form of the L matrix depends on the problem one has to tackle, but the general principle is to minimise an appropriately weighted average of squared errors, with lower weights for the observations characterised by the higher residual variance.

Pre-multiply the heteroskedastic model $y = X\beta + \varepsilon$ by L and obtain

$$y^* = X^*\beta + \varepsilon^*, \qquad\text{where } y^* = Ly,\ X^* = LX,\ \varepsilon^* = L\varepsilon.$$

Now it is true that:

$$E(\varepsilon^*) = E(L\varepsilon) = L\,E(\varepsilon) = 0,$$
$$E(\varepsilon^*\varepsilon^{*\prime}) = E(L\varepsilon\varepsilon'L') = L\,E(\varepsilon\varepsilon')\,L' = \sigma^2 L\hat{\Omega}L' = \sigma^2 I_N.$$

Hence, the OLS estimator of the transformed model is best (minimum variance) and corresponds to the FGLS estimator:

$$\hat{\beta}_{FGLS} = (X^{*\prime}X^*)^{-1}X^{*\prime}y^* = (X'L'LX)^{-1}X'L'Ly = (X'\hat{\Omega}^{-1}X)^{-1}X'\hat{\Omega}^{-1}y.$$

The GLS estimator is BLUE despite the presence of heteroskedasticity (and/or autocorrelation); in other terms, the Aitken theorem applied to the transformed data substitutes for the Gauss-Markov theorem and, in particular, the Gauss-Markov theorem is the special case of the Aitken theorem for $\Omega = I_N$.
When Ω is known in its structure and in its parameters, we are directly in the case of GLS (no parameters of Ω need to be estimated). Examples are: group-wise heteroskedasticity; autocorrelation in MA(1) form when we estimate a dynamic panel after taking first differences to remove the individual effects (note that, given the presence of the lagged dependent variable among the regressors, we need to use IV+GLS = GIVE, the generalised instrumental variable estimator).

Weighted least squares (WLS) is a specific case of GLS, used, for example, in the presence of group-wise heteroskedasticity, i.e. when we know that the heteroskedasticity derives from how the data are collected: we only dispose of averaged or aggregated data (by clusters, which may be industries, typologies of companies, and so on). In this case Ω is known in its structure and parameters. Some examples are in the Appendix.

Usually Ω is stochastic, known in its structure but unknown in its parameters; we then talk of UGLS, unfeasible GLS. Estimation is possible only once we dispose of $\hat{\Omega}$, a consistent estimate of the errors' VCOV matrix; in this case UGLS becomes feasible (FGLS). The FGLS estimator is consistent and asymptotically efficient (its small-sample properties are unknown). Examples: constant autocorrelation within the individual in panel data with random effects; cross-correlation in seemingly unrelated regressions (SUR); comfac models, i.e. static models with AR(1) errors (this case is very specific and not very realistic).

Note that in an autoregressive model with autocorrelated errors OLS is biased and not consistent, and FGLS is not applicable unless we estimate with instrumental variables (IV) in order to obtain $\hat{\Omega}$. This is the generalised IV estimator (GIVE), or heteroskedastic 2SLS (two-stage least squares):

$$\hat{\beta}_{GIVE} = (Z'L'LX)^{-1}Z'L'Ly = (Z'\hat{\Omega}^{-1}X)^{-1}Z'\hat{\Omega}^{-1}y.$$

See more in lecture_IV. An alternative is to augment the dynamics, i.e. to re-specify the model.
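When the parameters of Ω are unknown, a common feasible two-step recipe (a sketch under our assumptions, not a command from the lecture) models the error variance as an exponential function of the indicators, which keeps the fitted variances positive:

. qui reg cons1000 redd1000
. predict double u, resid
. g double lnu2 = ln(u^2)
. qui reg lnu2 redd1000                // model ln(variance) on the indicator
. predict double lnh, xb
. g double h = exp(lnh)                // fitted variances, h_i > 0 by construction
. reg cons1000 redd1000 [aweight=1/h]  // FGLS with estimated weights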
Behavioural assumption in the consumption-income relationship: the error variance is a linear function of income (redd1000), because wealthy people have a larger set of consumption options. If this is true, then it is reasonable to use such information in the estimation phase, down-weighting the observations corresponding to higher incomes because they are less informative about the regression line: they are assumed to be more dispersed (higher variance) than those of poorer people.

The model is $C_i = \alpha + \beta R_i + \varepsilon_i$, where $Var(\varepsilon_i) = \sigma_i^2 = \sigma^2 R_i$. Hence, in this case,

$$\sigma^2\Omega = \sigma^2 Diag(\omega_i) = \sigma^2 \begin{pmatrix} R_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & R_N \end{pmatrix}.$$

If we scale all the variables by the square root of income, we obtain the transformed model

$$\frac{C_i}{\sqrt{R_i}} = \alpha\frac{1}{\sqrt{R_i}} + \beta\frac{R_i}{\sqrt{R_i}} + \frac{\varepsilon_i}{\sqrt{R_i}} = \alpha\frac{1}{\sqrt{R_i}} + \beta\sqrt{R_i} + u_i,$$

where

$$Var(u_i) = Var\left(\frac{\varepsilon_i}{\sqrt{R_i}}\right) = \frac{1}{R_i}Var(\varepsilon_i) = \frac{\sigma^2 R_i}{R_i} = \sigma^2,$$

i.e. the errors $u_i$ are homoskedastic. Hence, in this case,

$$L = \begin{pmatrix} 1/\sqrt{R_1} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 1/\sqrt{R_N} \end{pmatrix}.$$

WLS is efficient precisely because the higher-variance observations (i.e. those corresponding to richer people) receive less weight.(5)

(5) If the model we suppose able to explain the heteroskedasticity is right, FGLS is more efficient than robust OLS.
. reg cons1000 redd1000 [aweight=1/redd1000]
(sum of wgt is   1.0121e+02)

      Source |       SS       df       MS              Number of obs =     100
-------------+------------------------------           F(  1,    98) = 3053.19
       Model |  2623.25305     1  2623.25305           Prob > F      =  0.0000
    Residual |  84.2000896    98  .859184587           R-squared     =  0.9689
-------------+------------------------------           Adj R-squared =  0.9686
       Total |  2707.45314    99  27.3480115           Root MSE      =  .92692

------------------------------------------------------------------------------
    cons1000 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    redd1000 |   .7145188   .0129311    55.26   0.000     .6888574    .7401803
       _cons |   4.914329   .0935686    52.52   0.000     4.728645    5.100013
------------------------------------------------------------------------------

aweight stands for analytical weights, which are inversely proportional to the variance of an observation. These are automatically employed in models that use averages, e.g. in the between-effects panel regression.

FGLS (WLS) can be reproduced by the following steps:

. g peso=1/redd1000^0.5
. g consp=cons1000*peso
. g reddp=redd1000*peso
. reg consp reddp peso, noconst

      Source |       SS       df       MS              Number of obs =     100
-------------+------------------------------           F(  2,    98) = 3364.82
       Model |  5852.17626     2  2926.08813           Prob > F      =  0.0000
    Residual |  85.2218967    98  .869611191           R-squared     =  0.9856
-------------+------------------------------           Adj R-squared =  0.9854
       Total |  5937.39815   100  59.3739815           Root MSE      =  .93253

------------------------------------------------------------------------------
       consp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       reddp |   .7145188   .0129311    55.26   0.000     .6888574    .7401803
        peso |   4.914329   .0935686    52.52   0.000     4.728645    5.100013
------------------------------------------------------------------------------

. whitetst

White's general test statistic :  5.086438  Chi-sq( 5)  P-value = .4054

. g reddp2=reddp^2
. bpagan reddp reddp2

Breusch-Pagan LM statistic: 1.265137  Chi-sq( 2)  P-value = .5312

Opposite to the previous two heteroskedasticity tests, note that after a regression without the constant term we cannot run the hettest command:

. hettest
not appropriate after regress, nocons
r(301);
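An alternative sketch uses the official vwls (variance-weighted least squares) command, feeding it the error standard deviation implied by our assumption. Note that vwls treats the supplied standard deviations as fully known (including the scale factor $\sigma$), so the point estimates coincide with the aweight regression above while the reported standard errors differ:

. g double sdi = sqrt(redd1000)
. vwls cons1000 redd1000, sd(sdi)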
The previous heteroskedasticity tests are run for didactical reasons only, just to see that heteroskedasticity is no longer present in the weighted regression; of course, heteroskedasticity tests are not performable after FGLS.

All the issues raised above can be summarised in a single table, in order to fix ideas. In doing so, we use the previous consumption function (which we checked is heteroskedastic). We quietly run three regressions of interest, namely: (1) OLS without White's standard errors correction; (2) OLS with White's standard errors correction; (3) WLS assuming that the errors' variance is a linear function of income:

. qui reg cons1000 redd1000
. est store OLS
. qui reg cons1000 redd1000, robust
. est store white
. qui reg cons1000 redd1000 [aweight=1/redd1000]
. est store WLS
. est table OLS white WLS , b(%6.3f) se(%6.3f) t(%6.2f) /*
*/ stats(N df_r df_m r2 r2_a rmse F)

--------------------------------------------
    Variable |   OLS       white      WLS
-------------+------------------------------
    redd1000 |    0.700     0.700     0.715
             |    0.022     0.025     0.013
             |    32.19     28.28     55.26
       _cons |    5.668     5.668     4.914
             |    1.332     1.076     0.094
             |     4.26      5.27     52.52
-------------+------------------------------
           N |      100       100       100
        df_r |   98.000    98.000    98.000
        df_m |    1.000     1.000     1.000
          r2 |    0.914     0.914     0.969
        r2_a |    0.913     0.913     0.969
        rmse |    6.666     6.666     0.927
           F | 1036.496   799.785  3053.189
--------------------------------------------
                              legend: b/se/t

Discussion. In the context of a model with heteroskedastic errors, both the OLS and WLS estimators are unbiased and consistent; therefore, all the estimates are fairly close to each other. The parameters' standard errors estimated by OLS in the first column are biased (because of the heteroskedastic errors), while those in the second column are robust to heteroskedasticity (hence, reliable). However, WLS being also efficient, the standard errors reported in the third column are remarkably lower than those in the second column.
Appendix

A1. Averaged data

$$y_c = X_c\beta + \varepsilon_c,$$

where c = 1, 2, ..., C indexes the groups (or clusters). Each group is composed of $i = 1, 2, \dots, N_c$ individuals, which are averaged. Single individuals have homoskedastic errors, $Var(\varepsilon_i) = \sigma^2\ \forall\, i = 1, \dots, N$, and are not cross-sectionally correlated, $Cov(\varepsilon_i, \varepsilon_j) = 0$ for $i \neq j$. However, the available observations are

$$\varepsilon_c = \frac{1}{N_c}(\varepsilon_1 + \dots + \varepsilon_{N_c}) = \frac{1}{N_c}\sum_{i=1}^{N_c}\varepsilon_i.$$

Hence the variance is

$$Var(\varepsilon_c) = E\left(\frac{1}{N_c}\sum_{i=1}^{N_c}\varepsilon_i\right)^2 = \frac{N_c\,\sigma^2}{N_c^2} = \frac{\sigma^2}{N_c},$$

i.e. the error variance decreases as the number of individuals within a cluster, $N_c$, increases.(6)

$$Var(\varepsilon|X) = E(\varepsilon\varepsilon'|X) = \sigma^2 Diag(\omega_c) = \sigma^2 \begin{pmatrix} 1/N_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 1/N_C \end{pmatrix} = \sigma^2\Omega.$$

FGLS/WLS weights each observation by $\sqrt{N_c}$, giving less weight to the higher-variance observations, i.e. to the averages based on fewer individuals. In particular, the L matrix is

$$L = \begin{pmatrix} \sqrt{N_1} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \sqrt{N_C} \end{pmatrix}.$$

If we multiply all the variables by the square root of each group's size, we obtain the transformed model $Ly = LX\beta + L\varepsilon$ which, looking at the c-th observation, corresponds to

$$\sqrt{N_c}\,y_c = \sqrt{N_c}\,X_c\beta + \sqrt{N_c}\,\varepsilon_c, \qquad Var(\sqrt{N_c}\,\varepsilon_c) = N_c\frac{\sigma^2}{N_c} = \sigma^2,$$

i.e. the transformed errors are homoskedastic.

(6) Note that with this kind of data we lose the within-group variation, and hence the estimates of the parameters are less precise. However, the fit, R2, improves because the variation of the errors is averaged away.
The OLS estimator of the transformed model is best (minimum variance) and corresponds to the FGLS/WLS estimator:

$$\hat{\beta}_{FGLS/WLS} = (X'L'LX)^{-1}X'L'Ly = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y.$$

A2. Aggregated data

$$y_c = X_c\beta + \varepsilon_c,$$

where c = 1, 2, ..., C indexes the groups (or clusters), and each group is the sum of $i = 1, 2, \dots, N_c$ individuals. Our observations are $\varepsilon_c = \sum_{i=1}^{N_c}\varepsilon_i$. Hence the variance is

$$Var(\varepsilon_c) = E\left(\sum_{i=1}^{N_c}\varepsilon_i\right)^2 = N_c\,\sigma^2,$$

i.e. the error variance increases as the number of individuals within a cluster, $N_c$, increases; this is true even if the covariance among individuals within the cluster is negative.

$$Var(\varepsilon|X) = E(\varepsilon\varepsilon'|X) = \sigma^2 Diag(\omega_c) = \sigma^2 \begin{pmatrix} N_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & N_C \end{pmatrix} = \sigma^2\Omega.$$

FGLS/WLS weights each observation by $1/\sqrt{N_c}$, giving less weight to the observations with higher variance $\sigma_c^2$. In particular, the L matrix is

$$L = \begin{pmatrix} 1/\sqrt{N_1} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 1/\sqrt{N_C} \end{pmatrix}.$$

If we scale all the variables by the square root of each group's size, we obtain the transformed model $Ly = LX\beta + L\varepsilon$ which, looking at the c-th observation, corresponds to

$$\frac{1}{\sqrt{N_c}}y_c = \frac{1}{\sqrt{N_c}}X_c\beta + \frac{1}{\sqrt{N_c}}\varepsilon_c, \qquad Var\left(\frac{1}{\sqrt{N_c}}\varepsilon_c\right) = \frac{N_c\,\sigma^2}{N_c} = \sigma^2,$$

i.e. the transformed errors are homoskedastic.
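In Stata, both grouped-data cases reduce to a one-line weighted regression. A sketch with hypothetical variable names: ybar/xbar are group means, ysum/xsum group sums, and nc the group size:

. * Averaged data: Var = sigma^2/Nc, so weight proportionally to Nc
. reg ybar xbar [aweight=nc]
. * Aggregated data: Var = sigma^2*Nc, so weight proportionally to 1/Nc
. reg ysum xsum [aweight=1/nc]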
A3. Some hints on panel data

To conclude this lecture, and to add useful information especially in the panel-data context, we compare three OLS estimators with different corrections of the standard errors that are available in the regress command.

(1) No correction of the standard errors, or homoskedastic estimator (regress):

$$\widehat{Var}(\hat{\beta}_{OLS}) = (X'X)^{-1}X'(s^2 I)X(X'X)^{-1} = s^2(X'X)^{-1}, \qquad s^2 = \frac{1}{N-K}\sum_{i=1}^N \hat{\varepsilon}_i^2.$$

(2) Heteroskedasticity-consistent estimator (regress, robust):

$$\widehat{Var}(\hat{\beta}_{robust}) = (X'X)^{-1}\left(\sum_{i=1}^N \hat{\varepsilon}_i^2 X_i X_i'\right)(X'X)^{-1},$$

where the centre of the sandwich is sometimes multiplied by N/(N-K) as a degrees-of-freedom adjustment for finite samples.

(3) Estimator that accounts for clustering into groups, with observations correlated within groups but independent between groups (regress, cluster(name_groups)):

$$\widehat{Var}(\hat{\beta}_{cluster}) = (X'X)^{-1}\left(\sum_{c=1}^{N_C} \hat{u}_c\hat{u}_c'\right)(X'X)^{-1},$$

where we have $c = 1, 2, \dots, N_C$ clusters and $\hat{u}_c = \sum_{i\in c}\hat{\varepsilon}_i X_i$ is the sum over the observations within each cluster c; the centre of the sandwich is sometimes multiplied by $(N-1)/(N-K) \times N_C/(N_C-1)$ as a finite-sample adjustment. Note that cluster implies the robust option.

The formula for the clustered estimator is simply that of the robust (unclustered) estimator with the individual $\hat{\varepsilon}_i X_i$ replaced by their sums over each cluster. In other terms, the standard errors are computed based on aggregate y for the $N_C$ independent groups.

If the variance of the clustered estimator (3) is smaller than that of the robust (unclustered) estimator (2), it means that the cluster sums of $\hat{\varepsilon}_i X_i$ have less variability than the individual $\hat{\varepsilon}_i X_i$. That is, when we sum the $\hat{\varepsilon}_i X_i$ within a cluster, some of the variation cancels out and the total variation is smaller: a big positive is summed with a big negative to produce something small; in other words, there is negative correlation within the cluster.

If the number of clusters is very small compared to the overall sample size, the clustered standard errors (3) can be considerably larger than the homoskedastic ones (1), because they are computed on aggregate data for few groups.

Interpreting a difference between (1) the OLS estimator and (2) or (3) is trickier. In (1) the squared residuals are summed, but in (2) and (3) the residuals are first multiplied by the X's (and, for (3), summed within cluster) and then "squared" and summed. Hence, any difference between them has to do with rather complicated relationships between the residuals and the X's.
If big (in absolute value) $\hat{\varepsilon}_i$ are paired with big $X_i$, then the robust variance estimate will be bigger than the OLS estimate. On the other hand, if the robust variance estimate is smaller than the OLS estimate, it is not clear at all what is happening (in any case, it has to do with some odd correlation between the residuals and the X's). Note that if the OLS model is true, the residuals should, of course, be uncorrelated with the X's.

Indeed, if all the assumptions of the OLS model are true, then the expected values of (1) the OLS estimator and (2) the robust (unclustered) estimator are approximately the same. So, if the robust (unclustered) estimates are just a little smaller than the OLS estimates, it may be that the OLS assumptions are true and we are seeing a bit of random variation. If the robust (unclustered) estimates are much smaller than the OLS estimates, then either we are seeing a lot of random variation (which is possible, but unlikely), or else there is something odd going on between the residuals and the X's.
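For reference, in current Stata syntax the three estimators compared in this appendix can be requested as follows (a sketch; y, x1, x2, and the group identifier id are hypothetical placeholders):

. reg y x1 x2                      // (1) classical standard errors
. reg y x1 x2, vce(robust)         // (2) heteroskedasticity-robust
. reg y x1 x2, vce(cluster id)     // (3) cluster-robust within groups of id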