Multicollinearity, Causes, Effects, Detection and Redemption

  1. Multicollinearity - Venkat Reddy
  2. Note
     • This presentation consists of the lecture notes from a corporate training on Regression Analysis. The best way to treat it is as a high-level summary; the actual session went into more depth and contained other information.
     • Most of this material was written as informal notes, not intended for publication.
     • Please send your questions/comments/corrections to venkat@trenwiseanalytics.com or 21.venkat@gmail.com.
     • Please check my website for the latest version of this document. -Venkat Reddy
  3. Contents
     • What is "Multicollinearity"?
     • Causes
     • Detection
     • Effects
     • Redemption
  4. What is "Multicollinearity"?
     • Multicollinearity (or intercorrelation) exists when at least some of the predictor variables are correlated among themselves.
     • It is a "linear" relation between the predictors. Predictors are usually related to some extent; multicollinearity is a matter of degree.
  5. Multicollinearity - Illustration
     • When correlation among the X's is low, OLS has lots of information to estimate b. This gives us confidence in our estimates of b.
     • What is the definition of a regression coefficient, by the way?
     • When correlation among the X's is high, OLS has very little information to estimate b. This makes us relatively uncertain about our estimate of b. (The simulation sketch below makes this concrete.)
     [Diagram: X1, X2 and Y, shown once with low correlation among the X's and once with high correlation]
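The uncertainty in the illustration above can be made concrete with a small simulation. This is a minimal sketch, not part of the original slides; the sample size, correlation values, and coefficients are all made up. It refits OLS on many simulated datasets and compares how much the estimate of b1 bounces around when X1 and X2 are nearly uncorrelated versus highly correlated.

    import numpy as np

    rng = np.random.default_rng(0)
    n, reps = 100, 1000
    true_b = np.array([1.0, 2.0, 3.0])  # intercept, b1, b2

    def sd_of_b1(rho):
        """Std. dev. of the OLS estimate of b1 across simulated samples."""
        cov = [[1.0, rho], [rho, 1.0]]
        estimates = []
        for _ in range(reps):
            X = rng.multivariate_normal([0.0, 0.0], cov, size=n)
            X1 = np.column_stack([np.ones(n), X])     # add intercept column
            y = X1 @ true_b + rng.normal(size=n)
            b_hat, *_ = np.linalg.lstsq(X1, y, rcond=None)
            estimates.append(b_hat[1])
        return np.std(estimates)

    print("SD of b1, rho = 0.1 :", sd_of_b1(0.1))   # small: confident estimates
    print("SD of b1, rho = 0.95:", sd_of_b1(0.95))  # several times larger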
  6. Perfect Multicollinearity
     • Recall that to estimate b, the matrix (X'X)^-1 has to exist. (What is the OLS estimate of b, or beta, by the way? b = (X'X)^-1 X'y.)
     • This means that the matrix X has to be of full rank.
     • That is, none of the X's can be a perfect linear function of any combination of the other X's.
     • If one is, then b is undefined. But this is very rare. (See the numeric check after this list.)
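A tiny numeric check of the rank condition, assuming nothing beyond numpy (the data below are invented): when one column of X is an exact linear combination of the others, X is not of full rank, X'X is singular, and b = (X'X)^-1 X'y is undefined.

    import numpy as np

    # Third column = first column + second column: perfect multicollinearity
    X = np.array([[1.0, 2.0, 3.0],
                  [1.0, 4.0, 5.0],
                  [1.0, 6.0, 7.0],
                  [1.0, 8.0, 9.0]])

    XtX = X.T @ X
    print("rank of X :", np.linalg.matrix_rank(X))  # 2 < 3: rank deficient
    print("det(X'X)  :", np.linalg.det(XtX))        # ~0: X'X is singular
    print("cond(X'X) :", np.linalg.cond(XtX))       # astronomically large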
  7. Causes of Multicollinearity
     • Statistical model specification: adding polynomial terms or trend indicators (illustrated in the sketch below).
     • Too many variables in the model, so that several X's measure the same conceptual variable.
     • The data collection methods employed.
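As a quick illustration of the first cause (a made-up example, not from the training), a polynomial term is typically highly correlated with the variable it is built from, especially when that variable does not change sign:

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.uniform(1.0, 10.0, size=500)   # strictly positive regressor
    print(np.corrcoef(x, x ** 2)[0, 1])    # close to 1: x and x^2 nearly collinear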
  8. How to Detect Multicollinearity
     • A high F statistic or R^2 leads us to reject the joint hypothesis that all of the coefficients are zero, yet the individual t-statistics are low. (Why?)
     • Variance inflation factors: VIF_k = 1/(1 - R_k^2), where R_k^2 is the R^2 from regressing X_k on the remaining predictors.
     • One can compute the condition number, that is, the ratio of the largest to the smallest root of the matrix X'X.
     • This may not always be useful, as the standard errors of the estimates depend on the ratios of elements of the characteristic vectors to the roots.
     • High sample correlation coefficients are sufficient but not necessary for multicollinearity.
     (Both the VIF and the condition number are sketched in code after this list.)
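Both diagnostics can be computed with a few lines of numpy. This is a hedged sketch on invented data: the helper vif() and the simulated predictors are illustrative only. It regresses each X_k on the remaining predictors to get R_k^2 and VIF_k, and takes the condition number as the ratio of the largest to the smallest singular value of X.

    import numpy as np

    rng = np.random.default_rng(2)
    n = 200
    x1 = rng.normal(size=n)
    x2 = x1 + 0.1 * rng.normal(size=n)     # nearly collinear with x1
    x3 = rng.normal(size=n)                # unrelated to the others
    X = np.column_stack([x1, x2, x3])

    def vif(X, k):
        """VIF_k = 1/(1 - R_k^2): regress column k on the other columns."""
        y = X[:, k]
        others = np.column_stack([np.ones(len(y)), np.delete(X, k, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        r2 = 1.0 - np.var(y - others @ beta) / np.var(y)
        return 1.0 / (1.0 - r2)

    for k in range(X.shape[1]):
        print(f"VIF for x{k + 1}: {vif(X, k):.1f}")  # x1, x2 large; x3 near 1

    s = np.linalg.svd(X, compute_uv=False)           # singular values of X
    print("condition number:", s[0] / s[-1])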
  9. Effects of Multicollinearity
     • Even in the presence of multicollinearity, OLS is BLUE and consistent.
     • Standard errors of the estimates tend to be large (see the sketch below).
     • Large standard errors mean wide confidence intervals and small observed test statistics. The researcher will therefore fail to reject too many null hypotheses: the probability of a Type II error is large.
     • Estimates of standard errors and parameters tend to be sensitive to changes in the data and in the specification of the model.
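Why the standard errors grow can be seen directly from Var(b_hat) = sigma^2 (X'X)^-1. The sketch below (with an assumed sigma of 1 and standardized predictors, so that X'X is n times the correlation matrix) shows the standard error of b1 rising as the correlation between two predictors approaches 1:

    import numpy as np

    def se_b1(rho, n=100, sigma=1.0):
        """SE of b1 for two standardized predictors with correlation rho."""
        XtX = n * np.array([[1.0, rho], [rho, 1.0]])
        var_b = sigma ** 2 * np.linalg.inv(XtX)
        return np.sqrt(var_b[0, 0])

    for rho in (0.0, 0.5, 0.9, 0.99):
        print(f"rho = {rho:4}: SE(b1) = {se_b1(rho):.3f}")
    # SE at rho = 0.99 is roughly seven times the SE at rho = 0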
  10. Multicollinearity Redemption
     • Principal components estimator: use a weighted average of the regressors rather than all of the regressors.
     • Ridge regression: put extra weight on the main diagonal of X'X so that it produces more precise estimates. This is a biased estimator. (A sketch follows this list.)
     • Drop the troublesome RHS variables. (This invites specification error.)
     • Use additional data sources. This does not mean more of the same; it means, for example, pooling cross-section and time-series data.
     • Transform the data, for example by inversion or differencing.
     • Use prior information or restrictions on the coefficients.
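A minimal sketch of the ridge idea (the lambda value and data below are illustrative, not from the session): adding lambda to the main diagonal of X'X makes it well conditioned, stabilizing the estimates at the cost of some bias.

    import numpy as np

    def ridge(X, y, lam):
        """Ridge estimator: (X'X + lam * I)^-1 X'y (lam = 0 gives OLS)."""
        p = X.shape[1]
        return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

    rng = np.random.default_rng(3)
    n = 100
    x1 = rng.normal(size=n)
    x2 = x1 + 0.05 * rng.normal(size=n)    # nearly collinear pair
    X = np.column_stack([x1, x2])
    y = x1 + x2 + rng.normal(size=n)       # true coefficients are (1, 1)

    print("OLS  :", ridge(X, y, 0.0))      # can swing far from (1, 1)
    print("ridge:", ridge(X, y, 1.0))      # shrunk, more stable across samples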
