2. Definition
• An explanatory variable X is correlated with another explanatory variable in the model
• Near, or very high, multicollinearity
• Perfect multicollinearity (rarely encountered)
3. THE NATURE OF MULTICOLLINEARITY: THE CASE OF PERFECT MULTICOLLINEARITY
Table 8.1: The demand for
widgets.
4. THE NATURE OF MULTICOLLINEARITY: THE CASE OF PERFECT MULTICOLLINEARITY
𝑌𝑖 = 𝐴1 + 𝐴2𝑋2𝑖 + 𝐴3𝑋3𝑖 + 𝑢𝑖 ……. (8.1)
𝑌𝑖 = 𝐵1 + 𝐵2𝑋2𝑖 + 𝐵3𝑋4𝑖 + 𝑢𝑖 ……. (8.2)
When we attempt to fit regression (8.1) to the data in Table 8.1, the software refuses to estimate it. If we then plot the price variable 𝑋2 against 𝑋3 we get the diagram in Figure 8.1, and if we regress 𝑋3 on 𝑋2 we obtain
𝑋3𝑖 = 300 − 2𝑋2𝑖, with R² = r² = 1 ……. (8.3)
The income variable 𝑋3 and the price variable 𝑋2 are perfectly linearly related; this is called perfect multicollinearity.
5. 8.1 THE NATURE OF MULTICOLLINEARITY: THE CASE OF PERFECT MULTICOLLINEARITY
Figure 8.1 Scattergram between income (X3) and
price (X2).
7. 8.1 THE NATURE OF MULTICOLLINEARITY: THE CASE OF PERFECT MULTICOLLINEARITY
Because of the relationship in (8.3), we cannot estimate regression (8.1). What we can do is substitute (8.3) into (8.1) and obtain
𝑌𝑖 = 𝐴1 + 𝐴2𝑋2𝑖 + 𝐴3(300 − 2𝑋2𝑖) + 𝑢𝑖
   = (𝐴1 + 300𝐴3) + (𝐴2 − 2𝐴3)𝑋2𝑖 + 𝑢𝑖
𝑌𝑖 = 𝐶1 + 𝐶2𝑋2𝑖 + 𝑢𝑖 ……. (8.4)
where 𝐶1 = 𝐴1 + 300𝐴3 ……. (8.5) and 𝐶2 = 𝐴2 − 2𝐴3 ……. (8.6)
In this case we cannot estimate regression (8.1), but we can estimate (8.4), because it is a simple two-variable regression of Y on 𝑋2.
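The identification failure can be seen numerically. A minimal Python sketch (the prices are hypothetical; any data satisfying 𝑋3 = 300 − 2𝑋2 behaves the same): with that exact linear relation, the cross-product matrix X'X for the design [1, 𝑋2, 𝑋3] is singular, so the normal equations (X'X)b = X'y have no unique solution.

```python
# Why OLS fails under perfect collinearity: with x3 = 300 - 2*x2,
# the matrix X'X of the design [1, x2, x3] is singular.
x2 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]      # hypothetical prices
x3 = [300 - 2 * v for v in x2]            # income, exactly linear in price

n = len(x2)
s2, s3 = sum(x2), sum(x3)
s22 = sum(v * v for v in x2)
s33 = sum(v * v for v in x3)
s23 = sum(a * b for a, b in zip(x2, x3))
xtx = [[n,  s2,  s3],
       [s2, s22, s23],
       [s3, s23, s33]]

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
          - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
          + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

print(det3(xtx))  # 0: X'X cannot be inverted, so A1, A2, A3 are not identified
```

The third column of X'X is exactly 300 times the first minus 2 times the second, which is why the determinant is exactly zero and regression software refuses to estimate (8.1).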
The results, from the Stata output below, are:
• 𝐶1 = 49.667 and 𝐶2 = −2.1576
8. Output table 8.1: y on x2

. regress y x2

      Source |       SS           df       MS      Number of obs   =        10
-------------+----------------------------------   F(1, 8)         =    321.66
       Model |  384.048485         1  384.048485   Prob > F        =    0.0000
    Residual |  9.55151515         8  1.19393939   R-squared       =    0.9757
-------------+----------------------------------   Adj R-squared   =    0.9727
       Total |       393.6         9  43.7333333   Root MSE        =    1.0927

           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x2 |  -2.157576   .1202996   -17.94   0.000    -2.434987   -1.880164
       _cons |   49.66667   .7464394    66.54   0.000     47.94537    51.38796
𝑌𝑖 = 49.667 − 2.1576𝑋2𝑖
se = (0.746)  (0.1203)
t  = (66.538) (−17.935)   r² = 0.9757 ……. (8.7)
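The t ratios and confidence intervals in the output follow directly from the coefficients and standard errors. A quick check in Python (2.306 is the standard two-tailed 5% critical t value for 8 degrees of freedom):

```python
# Reproduce the t ratios and 95% confidence intervals of output (8.7)
# from the reported coefficients and standard errors.
coef = {"x2": -2.157576, "_cons": 49.66667}
se   = {"x2": 0.1202996, "_cons": 0.7464394}
t_crit = 2.306  # two-tailed 5% critical value, t distribution, df = n - 2 = 8

for name in coef:
    t  = coef[name] / se[name]                # t ratio = coefficient / s.e.
    lo = coef[name] - t_crit * se[name]       # lower 95% confidence bound
    hi = coef[name] + t_crit * se[name]       # upper 95% confidence bound
    print(f"{name}: t = {t:.2f}, 95% CI = [{lo:.4f}, {hi:.4f}]")
```

The printed values match the Stata table up to rounding (e.g. t = −17.94 and CI [−2.4350, −1.8802] for x2).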
9. 8.1 THE NATURE OF MULTICOLLINEARITY: THE CASE OF PERFECT MULTICOLLINEARITY
• In the case of a perfect linear relationship, or perfect multicollinearity, among the explanatory variables, we cannot obtain unique estimates of all the parameters. And since we cannot obtain unique estimates, we cannot draw any statistical inferences (hypothesis tests) about them from a given sample.
10. 8.2 THE CASE OF NEAR, OR IMPERFECT, MULTICOLLINEARITY
• This is the case of near, imperfect, or high multicollinearity. We will explain what we mean by "high" collinearity shortly.
• From now on, when we talk about multicollinearity we are referring to imperfect multicollinearity.
• To see what we mean by near, or imperfect, multicollinearity, let us return to our
data in Table 8-1, but this time, we estimate regression (8.2) with earnings as the
income variable. The regression results are as follows
• 𝑌𝑖 = 𝐵1+𝐵2𝑋2𝑖 + 𝐵3𝑋4𝑖 + 𝑢𝑖 ……. 8.2
11. Output table 8.1: y on x2 and x4 (high collinearity)

. regress y x2 x4

      Source |       SS           df       MS      Number of obs   =        10
-------------+----------------------------------   F(2, 7)         =    153.82
       Model |  384.843265         2  192.421632   Prob > F        =    0.0000
    Residual |  8.75673526         7  1.25096218   R-squared       =    0.9778
-------------+----------------------------------   Adj R-squared   =    0.9714
       Total |       393.6         9  43.7333333   Root MSE        =    1.1185

           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x2 |  -2.797465    .812182    -3.44   0.011     -4.71797   -.8769598
          x4 |  -.3190745   .4003047    -0.80   0.452    -1.265645    .6274958
       _cons |   145.3635   120.0618     1.21   0.265    -138.5376    429.2646
13. What we can learn from these results
• We can estimate regression (8.2), though not regression (8.1), even though the difference between 𝑋3𝑖 and 𝑋4𝑖 is small.
• The price coefficient is negative and statistically significant, but its t statistic in (8.8) is smaller than in (8.7); the standard error in (8.7) is smaller than in (8.8).
• R² = 0.9778 with two explanatory variables versus r² = 0.9757 with one: an increase of only 0.0021, hardly a great gain.
• The coefficient of the income (earnings) variable is statistically insignificant and, more importantly, has the wrong sign. For most commodities, income has a positive effect on the quantity demanded, unless the commodity in question happens to be an inferior good.
• Despite the insignificance of the income variable, if we were to test the hypothesis that B2 = B3 = 0 (i.e., the hypothesis that R² = 0), that hypothesis is easily rejected by the F test.
• In other words, collectively, price and earnings have a significant impact on the quantity demanded.
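The joint test can be recovered from R² alone: F = [R²/(k−1)] / [(1−R²)/(n−k)], with k−1 = 2 restrictions and n−k = 7 residual degrees of freedom here. A sketch using the reported R² (small differences from Stata's F = 153.82 come from rounding R² to four decimals):

```python
# Joint F test of B2 = B3 = 0 computed from the R-squared of the
# y-on-x2-x4 regression: F = (R^2 / (k-1)) / ((1 - R^2) / (n - k)).
r2, n, k = 0.9778, 10, 3   # k = number of estimated parameters (incl. intercept)
f_stat = (r2 / (k - 1)) / ((1 - r2) / (n - k))
print(f"F = {f_stat:.1f}")  # close to Stata's F(2, 7) = 153.82
```

An F this large is far above any conventional critical value, which is exactly the multicollinearity signature: a highly significant joint test alongside insignificant individual t ratios.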
14. What happened here?
Figure 8.2: Earnings (X4) and price (X2) relationship.
15. Price and earnings are not perfectly linearly related, but there is a high degree of dependence between the two:
𝑋4𝑖 = 299 − 2.0055𝑋2𝑖
se = (0.6748) (0.1088)
t  = (444.44) (−18.44)   r² = 0.9770 ……. (8.9)
16. 8.4 PRACTICAL CONSEQUENCES OF MULTICOLLINEARITY
1. Large variances and standard errors of OLS estimators
2. Wider confidence intervals
3. Insignificant t ratios
4. High R² value but few significant t ratios
5. OLS estimators and their standard errors become very sensitive to small changes in the data; that is, they tend to be unstable
6. Wrong signs for regression coefficients
7. Difficulty in assessing the individual contributions of explanatory variables to the explained sum of squares (ESS) or R²
17. 8.5 DETECTION OF MULTICOLLINEARITY
1. High R2 but few significant t ratios
2. High pairwise correlations among explanatory
variables
3. Examination of partial correlations
4. Subsidiary, or auxiliary, regressions
5. The variance inflation factor (VIF).
18. The Demand For Chicken (table 7.8)
DEMAND FOR CHICKENS, UNITED STATES, 1960-1982
• Y = Per Capita Consumption of Chickens, Pounds
• X2 = Real Disposable Income Per Capita, $
• X3 = Real Retail Price of Chicken Per Pound, Cents
• X4 = Real Retail Price of Pork Per Pound, Cents
• X5 = Real Retail Price of Beef Per Pound, Cents
• X6 = Composite Real Price of Chicken Substitutes Per Pound, Cents
19. Example of detecting multicollinearity: the demand for chicken (Table 7.8), output (8.15)
• Since we have fitted a log-linear demand function, all the slope coefficients are partial elasticities of Y with respect to the appropriate X variable. Thus, the income elasticity of demand is about 0.34, the own-price elasticity of demand is about −0.50, the cross-(pork) price elasticity of demand is about 0.15, and the cross-(beef) price elasticity of demand is about 0.09.
. regress logy logx2 logx3 logx4 logx5

      Source |       SS           df       MS      Number of obs   =        23
-------------+----------------------------------   F(4, 18)        =    249.93
       Model |  .761050242         4  .190262561   Prob > F        =    0.0000
    Residual |  .013702848        18  .000761269   R-squared       =    0.9823
-------------+----------------------------------   Adj R-squared   =    0.9784
       Total |   .77475309        22   .03521605   Root MSE        =    .02759

        logy |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       logx2 |   .3425546   .0832663     4.11   0.001     .1676186    .5174907
       logx3 |  -.5045934   .1108943    -4.55   0.000    -.7375737   -.2716132
       logx4 |   .1485461   .0996726     1.49   0.153    -.0608583    .3579505
       logx5 |   .0911056   .1007164     0.90   0.378    -.1204917     .302703
       _cons |   2.189793   .1557149    14.06   0.000     1.862648    2.516938
20. Method 1 – Table 8-3: Collinearity diagnostics for the demand function for chickens – the correlation matrix (high pairwise and partial correlations)
• The pairwise correlations between the explanatory variables are uniformly high: about 0.98 between the log of real income and the log of the price of beef, about 0.95 between the logs of pork and beef prices, about 0.91 between the log of real income and the log price of chicken, etc. Although such high pairwise correlations are no guarantee that our demand function suffers from the collinearity problem, the possibility exists.
. correlate logx2 logx3 logx4 logx5
(obs=23)

             |    logx2    logx3    logx4    logx5
-------------+------------------------------------
       logx2 |   1.0000
       logx3 |   0.9072   1.0000
       logx4 |   0.9725   0.9468   1.0000
       logx5 |   0.9790   0.9331   0.9543   1.0000
21. Method 2 – Table 8-3: Collinearity diagnostics for the demand function for chickens – the auxiliary regressions
Regress logx2 on the other three explanatory variables
. regress logx2 logx3 logx4 logx5

      Source |       SS           df       MS      Number of obs   =        23
-------------+----------------------------------   F(3, 19)        =    406.06
       Model |  7.03973518         3  2.34657839   Prob > F        =    0.0000
    Residual |  .109799347        19  .005778913   R-squared       =    0.9846
-------------+----------------------------------   Adj R-squared   =    0.9822
       Total |  7.14953453        22  .324978842   Root MSE        =    .07602

       logx2 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       logx3 |  -.8324264   .2385002    -3.49   0.002    -1.331613   -.3332397
       logx4 |   .9483276   .1675781     5.66   0.000     .5975826    1.299073
       logx5 |   1.017648   .1499918     6.78   0.000     .7037112    1.331584
       _cons |   .9460511   .3700782     2.56   0.019     .1714686    1.720634
22. Regress logx3 on the other three explanatory variables
. regress logx3 logx2 logx4 logx5

      Source |       SS           df       MS      Number of obs   =        23
-------------+----------------------------------   F(3, 19)        =    104.41
       Model |  1.02053801         3  .340179336   Prob > F        =    0.0000
    Residual |  .061904178        19  .003258115   R-squared       =    0.9428
-------------+----------------------------------   Adj R-squared   =    0.9338
       Total |  1.08244219        22  .049201918   Root MSE        =    .05708

       logx3 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       logx2 |  -.4693167   .1344649    -3.49   0.002    -.7507551   -.1878784
       logx4 |   .6694311   .1375953     4.87   0.000     .3814407    .9574214
       logx5 |   .5954599   .1573283     3.78   0.001     .2661681    .9247517
       _cons |   1.233211     .15405     8.01   0.000     .9107805    1.555641
23. Regress logx4 on the other three explanatory variables
. regress logx4 logx2 logx3 logx5

      Source |       SS           df       MS      Number of obs   =        23
-------------+----------------------------------   F(3, 19)        =    256.08
       Model |  3.09830185         3  1.03276728   Prob > F        =    0.0000
    Residual |   .07662784        19  .004033044   R-squared       =    0.9759
-------------+----------------------------------   Adj R-squared   =    0.9721
       Total |  3.17492969        22  .144314986   Root MSE        =    .06351

       logx4 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       logx2 |   .6618281    .116951     5.66   0.000     .4170468    .9066094
       logx3 |   .8286526   .1703218     4.87   0.000     .4721649     1.18514
       logx5 |  -.4694776   .2052784    -2.29   0.034    -.8991303     -.039825
       _cons |   -1.01269   .2729109    -3.71   0.001    -1.583899    -.4414809
24. Regress logx5 on the other three explanatory variables
. regress logx5 logx2 logx3 logx4

      Source |       SS           df       MS      Number of obs   =        23
-------------+----------------------------------   F(3, 19)        =    261.61
       Model |  3.10000087         3  1.03333362   Prob > F        =    0.0000
    Residual |  .075047746        19  .003949881   R-squared       =    0.9764
-------------+----------------------------------   Adj R-squared   =    0.9726
       Total |  3.17504861        22  .144320392   Root MSE        =    .06285

       logx5 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       logx2 |   .6955612   .1025192     6.78   0.000     .4809859    .9101364
       logx3 |   .7218887   .1907324     3.78   0.001     .3226812    1.121096
       logx4 |  -.4597968   .2010455    -2.29   0.034    -.8805899    -.0390037
       _cons |    -.70572   .3155863    -2.24   0.038     -1.36625    -.0451902
25. Table 8.4: Demand-for-chicken auxiliary regressions
As this table shows, all the auxiliary regressions have R² values in excess of 0.94; the F test shown in Eq. (4.50) shows that all these R²'s are statistically significant (see Problem 8.24), suggesting that each explanatory variable in the regression output (8.15) is highly collinear with the other explanatory variables.
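The F test referred to can be applied to each auxiliary R² directly: F = [R²/df1] / [(1−R²)/df2], with df1 = 3 regressors and df2 = n − 4 = 19 here. A sketch using the four auxiliary R² values (minor differences from the Stata F values come from rounding R² to four decimals):

```python
# F test of each auxiliary regression's R-squared (Eq. 4.50 form):
# F = (R^2 / df1) / ((1 - R^2) / df2).
aux_r2 = {"logx2": 0.9846, "logx3": 0.9428, "logx4": 0.9759, "logx5": 0.9764}
n, df1 = 23, 3
df2 = n - df1 - 1  # 19 residual degrees of freedom

for var, r2 in aux_r2.items():
    f_stat = (r2 / df1) / ((1 - r2) / df2)
    print(f"{var}: F({df1}, {df2}) = {f_stat:.1f}")
# Every F is far above the 1% critical value (about 5.0), so each
# auxiliary R-squared is statistically significant.
```

This confirms numerically what the table suggests: each regressor is almost entirely explained by the others.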
26. Method 3 – the variance inflation factor (VIF)
• Suppose we want to know whether income (X2) and wealth (X3) are highly correlated. We run the auxiliary regression 𝑋2𝑖 = 𝛼0 + 𝛼1𝑋3𝑖 + 𝑣𝑖.
Table 8.6: Hypothetical data on consumption expenditure (Y), weekly income (X2), and wealth (X3).
27. The VIF formula
The VIF is

VIF = 1 / (1 − R²)

where R² comes from the auxiliary regression of X2 on X3.
Rules of thumb:
• If VIF is less than 10 (equivalently, the auxiliary R² is less than 0.900), multicollinearity is not considered serious.
• As the auxiliary R² approaches 1, VIF approaches infinity: perfect multicollinearity.
• As the auxiliary R² approaches 0, VIF approaches 1: no multicollinearity.
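The formula and rules of thumb above translate directly into code. A minimal sketch (the function name is ours):

```python
# Variance inflation factor from the auxiliary-regression R-squared.
def vif(r_squared):
    """VIF = 1 / (1 - R^2): approaches 1 as R^2 -> 0 (no multicollinearity)
    and infinity as R^2 -> 1 (perfect multicollinearity)."""
    if r_squared >= 1.0:
        raise ValueError("R^2 = 1 means perfect multicollinearity: VIF is infinite")
    return 1.0 / (1.0 - r_squared)

print(vif(0.0))               # 1.0 -> no multicollinearity
print(round(vif(0.900), 6))   # 10.0 -> the usual rule-of-thumb threshold
```

In practice one computes this for each explanatory variable, using the R² from regressing that variable on all the others (Stata's `estat vif` does this after `regress`).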
28. The result for X2 and X3
• From the Stata output below, the auxiliary R² = 0.9876, so VIF = 1/(1 − 0.9876) = 1/0.0124 ≈ 80.6. Since this far exceeds the rule-of-thumb value of 10, there is serious multicollinearity between income (X2) and wealth (X3).
. regress x2 x3

      Source |       SS           df       MS      Number of obs   =        10
-------------+----------------------------------   F(1, 8)         =    637.71
       Model |  32591.1498         1  32591.1498   Prob > F        =    0.0000
    Residual |  408.850204         8  51.1062756   R-squared       =    0.9876
-------------+----------------------------------   Adj R-squared   =    0.9861
       Total |       33000         9  3666.66667   Root MSE        =    7.1489

          x2 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x3 |   .0952122   .0037703    25.25   0.000     .0865178    .1039067
       _cons |   2.426457   7.010304     0.35   0.738    -13.73933    18.59225
29. 8.8 WHAT TO DO WITH MULTICOLLINEARITY: REMEDIAL MEASURES
• Dropping a variable(s) from the model – BE CAREFUL: we must follow the theory, not simply drop an inconvenient variable. Dropping a relevant variable from the model leads to what is known as model specification error.
• Acquiring additional data or a new sample – increasing the sample size can reduce the severity of the collinearity problem. Getting additional data on variables already in the sample may not be feasible because of cost and other considerations, but if these constraints are not prohibitive, this remedy is certainly worth pursuing.
• Rethinking the model – sometimes a model chosen for empirical analysis is not carefully thought out: perhaps some important variables are omitted, or perhaps the functional form of the model is incorrectly chosen.
• Prior information about some parameters – sometimes a particular phenomenon, such as a demand function, is investigated time and again, so prior estimates of some parameters are available and can be used.
• Transformation of variables – for example, the "trick" of converting nominal variables into "real" variables (i.e., transforming the original variables) can eliminate the collinearity problem.
• Other remedies – several other remedies are suggested in the literature, such as combining time-series and cross-sectional data, factor or principal-components analysis, and ridge regression.
30. Dropping a variable(s) from the model – the demand-for-chicken model (equation 8.16)
We drop the pork and beef price variables from the model. Now we are not following economic theory; this is what we call a model specification error. As the results show, compared with regression (8.15) the income elasticity has gone up while the own-price elasticity, in absolute value, has declined. In other words, the estimated coefficients of the reduced model appear to be biased. The best practical advice is not to drop a variable from an economically viable model just because the collinearity problem is serious.
. regress logy logx2 logx3

      Source |       SS           df       MS      Number of obs   =        23
-------------+----------------------------------   F(2, 20)        =    491.87
       Model |  .759315684         2  .379657842   Prob > F        =    0.0000
    Residual |  .015437406        20   .00077187   R-squared       =    0.9801
-------------+----------------------------------   Adj R-squared   =    0.9781
       Total |   .77475309        22   .03521605   Root MSE        =    .02778

        logy |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       logx2 |   .4515277   .0246948    18.28   0.000     .4000153    .5030401
       logx3 |  -.3722119   .0634661    -5.86   0.000    -.5045998     -.239824
       _cons |    2.03282    .116183    17.50   0.000     1.790466    2.275173
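The direction of the bias seen above follows the standard omitted-variable result: if the true model is Y = A1 + A2·X2 + A3·X3 + u and we omit X3, the short-regression slope estimates A2 + A3·d, where d is the slope from the auxiliary regression of X3 on X2. A minimal sketch on hypothetical, noise-free data (all names and numbers here are ours, chosen only to make the algebra visible):

```python
# Omitted-variable bias demo: fitting y on x2 alone, when the true model
# also contains x3 (correlated with x2), recovers b2 + b3*d rather than b2,
# where d is the slope of the auxiliary regression of x3 on x2.
def slope(x, y):
    """OLS slope of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

b1, b2, b3 = 2.0, 1.0, 0.5                             # true parameters (hypothetical)
x2 = list(range(1, 11))
x3 = [2 * v + (1 if v % 2 == 0 else -1) for v in x2]   # highly correlated with x2
y  = [b1 + b2 * a + b3 * b for a, b in zip(x2, x3)]    # no noise, for clarity

d = slope(x2, x3)         # auxiliary slope of x3 on x2
b_short = slope(x2, y)    # slope from the misspecified short regression
print(b_short, b2 + b3 * d)  # equal: the short slope absorbs b3*d of bias
```

Because x3 rises with x2 and b3 > 0, the short regression overstates b2, mirroring how the income elasticity rose when the pork and beef prices were dropped.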