1. 405 ECONOMETRICS
Chapter # 4: CLASSICAL NORMAL LINEAR REGRESSION MODEL (CNLRM)
Damodar N. Gujarati
Prof. M. El-Sakka
Dept of Economics: Kuwait University
2. • The classical theory of statistical inference consists of two branches, namely, estimation and hypothesis testing. We have thus far covered the topic of estimation.
• Under the assumptions of the CLRM, we were able to show that the estimators of the parameters, βˆ1, βˆ2, and σˆ², satisfy several desirable statistical properties, such as unbiasedness, minimum variance, etc. Note that, since these are estimators, their values will change from sample to sample; they are random variables.
• In regression analysis our objective is not only to estimate the SRF, but also to use it to draw inferences about the PRF. Thus, we would like to find out how close βˆ1 is to the true β1, or how close σˆ² is to the true σ². Since βˆ1, βˆ2, and σˆ² are random variables, we need to find out their probability distributions; otherwise, we will not be able to relate them to their true values.
3. THE PROBABILITY DISTRIBUTION OF DISTURBANCES ui
• Consider βˆ2. As shown in Appendix 3A.2,
• βˆ2 = Σ ki Yi (4.1.1)
• where ki = xi / Σ xi². But since the X's are assumed fixed, Eq. (4.1.1) shows that βˆ2 is a linear function of Yi, which is random by assumption. Since Yi = β1 + β2Xi + ui, we can write (4.1.1) as
• βˆ2 = Σ ki(β1 + β2Xi + ui) (4.1.2)
• Because ki, the betas, and Xi are all fixed, βˆ2 is ultimately a linear function of ui, which is random by assumption. Therefore, the probability distribution of βˆ2 (and also of βˆ1) will depend on the assumption made about the probability distribution of ui.
• OLS does not make any assumption about the probabilistic nature of ui. This void can be filled if we are willing to assume that the u's follow some probability distribution.
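The algebra above can be verified numerically. The sketch below (with an assumed small data set and assumed true parameters β1 = 2, β2 = 0.5) checks that the OLS slope really equals the weighted sum Σ ki Yi of Eq. (4.1.1), with weights ki = xi / Σ xi² built only from the fixed X's:

```python
# Numerical check of Eq. (4.1.1): the OLS slope is a linear function of the Y_i.
# The data and true parameters here are illustrative assumptions, not from the text.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])            # fixed regressor values
Y = 2.0 + 0.5 * X + rng.normal(0, 1, size=X.size)  # assumed beta1 = 2, beta2 = 0.5

x = X - X.mean()             # deviations from the mean
k = x / np.sum(x**2)         # the k_i weights of Eq. (4.1.1)

b2_weighted = np.sum(k * Y)  # slope written as sum(k_i * Y_i)
b2_ols = np.sum(x * (Y - Y.mean())) / np.sum(x**2)  # textbook OLS slope formula

print(b2_weighted, b2_ols)   # the two agree to floating-point precision
```

Note also that the weights satisfy Σ ki = 0 and Σ ki Xi = 1, which is what makes (4.1.2) collapse to β2 plus a linear combination of the ui.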
4. THE NORMALITY ASSUMPTION FOR ui
• The classical normal linear regression model assumes that each ui is distributed normally with
• Mean: E(ui) = 0 (4.2.1)
• Variance: E[ui − E(ui)]² = E(ui²) = σ² (4.2.2)
• cov(ui, uj): E{[ui − E(ui)][uj − E(uj)]} = E(ui uj) = 0, i ≠ j (4.2.3)
• The assumptions given above can be more compactly stated as
• ui ∼ N(0, σ²) (4.2.4)
• The terms in the parentheses are the mean and the variance. Under normality, ui and uj are not only uncorrelated but are also independently distributed. Therefore, we can write (4.2.4) as
• ui ∼ NID(0, σ²) (4.2.5)
• where NID stands for normally and independently distributed.
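A quick simulation makes assumptions (4.2.1)–(4.2.3) concrete. Assuming an arbitrary σ² = 4, the draws below from NID(0, σ²) should have sample mean near 0, sample variance near σ², and (by independence) negligible correlation between successive disturbances:

```python
# Simulation of u_i ~ NID(0, sigma^2); sigma^2 = 4 is an assumed illustrative value.
import numpy as np

rng = np.random.default_rng(42)
sigma2 = 4.0
u = rng.normal(0.0, np.sqrt(sigma2), size=100_000)

mean_u = u.mean()                        # Eq. (4.2.1): close to 0
var_u = u.var()                          # Eq. (4.2.2): close to sigma^2 = 4
corr = np.corrcoef(u[:-1], u[1:])[0, 1]  # Eq. (4.2.3): lag-1 correlation near 0

print(mean_u, var_u, corr)
```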
5. • Why the normality assumption? There are several reasons:
• 1. The ui represent the combined influence of omitted variables; we hope that the influence of these omitted variables is small and at best random. By the central limit theorem (CLT) of statistics, it can be shown that if there is a large number of independent and identically distributed random variables, then the distribution of their sum tends to a normal distribution as the number of such variables increases indefinitely.
• 2. A variant of the CLT states that, even if the number of variables is not very large or if these variables are not strictly independent, their sum may still be normally distributed.
• 3. With the normality assumption, the probability distributions of the OLS estimators can be easily derived, because one property of the normal distribution is that any linear function of normally distributed variables is itself normally distributed. The OLS estimators βˆ1 and βˆ2 are linear functions of ui. Therefore, if the ui are normally distributed, so are βˆ1 and βˆ2, which makes our task of hypothesis testing very straightforward.
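The CLT argument in point 1 can be illustrated directly. Treating each disturbance as the sum of many small omitted influences (here, an assumed 50 iid uniform variables, each decidedly non-normal), the standardized sum behaves very nearly like a standard normal variable:

```python
# CLT illustration: sums of many iid uniform "omitted influences" are
# approximately normal. The 50-term setup is an illustrative assumption.
import numpy as np

rng = np.random.default_rng(7)
n_terms, n_sums = 50, 200_000
sums = rng.uniform(-0.5, 0.5, size=(n_sums, n_terms)).sum(axis=1)

# standardize: each uniform(-0.5, 0.5) has mean 0 and variance 1/12
z = sums / np.sqrt(n_terms / 12.0)

# for a standard normal, P(|Z| > 1.96) is about 0.05; the simulated
# tail frequency should be very close to that
tail = np.mean(np.abs(z) > 1.96)
print(tail)
```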
6. • 4. The normal distribution is a comparatively simple distribution involving only two parameters (mean and variance).
• 5. Finally, if we are dealing with a small, or finite, sample size, say data of fewer than 100 observations, the normality assumption not only helps us to derive the exact probability distributions of the OLS estimators but also enables us to use the t, F, and χ² statistical tests for regression models.
7. PROPERTIES OF OLS ESTIMATORS UNDER THE NORMALITY ASSUMPTION
• When the ui follow the normal distribution, the OLS estimators have the following properties:
• 1. They are unbiased.
• 2. They have minimum variance. Combined with 1, this means that they are minimum-variance unbiased, or efficient, estimators.
• 3. They are consistent; that is, as the sample size increases indefinitely, the estimators converge to their true population values.
• 4. βˆ1 (being a linear function of ui) is normally distributed with
• Mean: E(βˆ1) = β1 (4.3.1)
• var(βˆ1): σ²βˆ1 = (Σ Xi² / n Σ xi²) σ² = (3.3.3) (4.3.2)
• Or, more compactly,
• βˆ1 ∼ N(β1, σ²βˆ1)
• Then, by the properties of the normal distribution, the variable Z, which is defined as
• Z = (βˆ1 − β1) / σβˆ1 (4.3.3)
8. • follows the standard normal distribution, that is, a normal distribution with zero mean and unit (= 1) variance, or
• Z ∼ N(0, 1)
• 5. βˆ2 (being a linear function of ui) is normally distributed with
• Mean: E(βˆ2) = β2 (4.3.4)
• var(βˆ2): σ²βˆ2 = σ² / Σ xi² = (3.3.1) (4.3.5)
• Or, more compactly,
• βˆ2 ∼ N(β2, σ²βˆ2)
• Then, as in (4.3.3),
• Z = (βˆ2 − β2) / σβˆ2 (4.3.6)
• also follows the standard normal distribution.
• Geometrically, the probability distributions of βˆ1 and βˆ2 are shown in Figure 4.1.
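Properties 4 and 5 can be checked by Monte Carlo. With a fixed X (as the CNLRM assumes) and assumed true values β1 = 2, β2 = 0.5, σ = 1.5, the standardized slope Z = (βˆ2 − β2)/σβˆ2 of Eq. (4.3.6), with σ²βˆ2 = σ²/Σ xi² from (4.3.5), should behave as N(0, 1) across repeated samples:

```python
# Monte Carlo sketch of property 5; the X grid and true parameters are assumptions.
import numpy as np

rng = np.random.default_rng(1)
X = np.linspace(1, 10, 20)        # fixed regressor, same in every sample
x = X - X.mean()
b1, b2, sigma = 2.0, 0.5, 1.5     # assumed true parameter values
se_b2 = sigma / np.sqrt(np.sum(x**2))   # true sd of b2_hat, from Eq. (4.3.5)

zs = []
for _ in range(50_000):
    u = rng.normal(0, sigma, size=X.size)       # u_i ~ NID(0, sigma^2)
    Y = b1 + b2 * X + u
    b2_hat = np.sum(x * Y) / np.sum(x**2)       # OLS slope for this sample
    zs.append((b2_hat - b2) / se_b2)            # Eq. (4.3.6)
zs = np.array(zs)

print(zs.mean(), zs.std())        # near 0 and 1, as Z ~ N(0, 1) requires
```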
9. [Figure 4.1: the probability distributions of βˆ1 and βˆ2]
10. • 6. (n − 2)(σˆ²/σ²) is distributed as the χ² (chi-square) distribution with (n − 2) df.
• 7. (βˆ1, βˆ2) are distributed independently of σˆ².
• 8. βˆ1 and βˆ2 have minimum variance in the entire class of unbiased estimators, whether linear or not. This result, due to Rao, is very powerful because, unlike the Gauss–Markov theorem, it is not restricted to the class of linear estimators only. Therefore, we can say that the least-squares estimators are best unbiased estimators (BUE); that is, they have minimum variance in the entire class of unbiased estimators.
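Property 6 is also easy to verify by simulation. A χ² variable with (n − 2) df has mean n − 2 and variance 2(n − 2), so with an assumed n = 12 the simulated values of (n − 2)σˆ²/σ² should average about 10 with variance about 20:

```python
# Simulation check of property 6; n = 12 and the true parameters are assumptions.
import numpy as np

rng = np.random.default_rng(3)
X = np.linspace(0, 1, 12)          # fixed X with n = 12 observations
x = X - X.mean()
b1, b2, sigma = 1.0, 2.0, 0.7      # assumed true values
n = X.size

stats = []
for _ in range(100_000):
    Y = b1 + b2 * X + rng.normal(0, sigma, size=n)
    b2_hat = np.sum(x * Y) / np.sum(x**2)
    b1_hat = Y.mean() - b2_hat * X.mean()
    resid = Y - b1_hat - b2_hat * X
    sigma2_hat = np.sum(resid**2) / (n - 2)     # unbiased estimator of sigma^2
    stats.append((n - 2) * sigma2_hat / sigma**2)
stats = np.array(stats)

print(stats.mean(), stats.var())   # near n - 2 = 10 and 2(n - 2) = 20
```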
11. • To sum up: the important point to note is that the normality assumption enables us to derive the probability, or sampling, distributions of βˆ1 and βˆ2 (both normal) and of σˆ² (related to the chi-square). This simplifies the task of establishing confidence intervals and testing (statistical) hypotheses.
• In passing, note that, with the assumption that ui ∼ N(0, σ²), Yi, being a linear function of ui, is itself normally distributed with the mean and variance given by
• E(Yi) = β1 + β2Xi (4.3.7)
• var(Yi) = σ² (4.3.8)
• More neatly, we can write
• Yi ∼ N(β1 + β2Xi, σ²) (4.3.9)