This lecture discusses multiple regression analysis with two independent variables. Multiple regression faces two challenges: discriminating between the influences of the individual variables, and deciding which variables should be included in the model. The example model examines the determinants of earnings, using years of schooling and a cognitive ability score to predict hourly earnings. Omitted variable bias can occur when an omitted variable is correlated with an included regressor and also influences earnings. Measures such as R-squared, adjusted R-squared, and the F test evaluate how well the regression model fits the data.
CHAPTER 3: Multiple Linear Regression
Introduction
In simple regression we study the relationship between a dependent variable and a single explanatory (independent) variable; we assume that the dependent variable is influenced by only one explanatory variable.
2. In this lecture, we will deal with two independent variables in regression analysis. This is simply an extension of the simple regression model, but we face two problems:
- Discriminating between the influence of a given explanatory variable on the dependent variable and the effects of the other explanatory variables.
- The problem of model specification. Not all variables are significant; we need to figure out which variables should be included in the model.
We will consider the determinants of earnings: years of schooling and cognitive ability (cognitive ability refers to an individual's capacity to think, reason, and problem-solve; it is measured through tests of intelligence and cognitive skills).
Dependent variable: EARNINGS
Years of schooling: S
Cognitive ability score: ASVABC
The true population relationship is: EARNINGS = β1 + β2 S + β3 ASVABC + u
3. where EARNINGS is hourly earnings, S is years of schooling (highest grade completed), ASVABC is the composite score on the cognitive tests, and u is a disturbance term.
Predicted regression line: EARNINGS = b1 + b2 S + b3 ASVABC
EARNINGS = -4.26 + 0.74 S + 0.15 ASVABC
• The equation should be interpreted as follows:
• For every additional grade completed, holding the ability score constant, hourly earnings increase by $0.74. For every point increase in the ability score, holding schooling constant, earnings increase by $0.15. The constant has no meaningful interpretation. Literally, it suggests that a respondent with 0 years of schooling (no respondent had fewer than six) and an ASVABC score of 0 (impossible) would earn minus $4.26 per hour.
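The fitted line above can be turned into a small prediction helper; a minimal sketch using the slide's coefficients (-4.26, 0.74, 0.15), with a function name of our own:

```python
def predict_earnings(s, asvabc):
    """Predicted hourly earnings for s years of schooling and an
    ASVABC score of asvabc, using the coefficients from the slide."""
    return -4.26 + 0.74 * s + 0.15 * asvabc

# Holding ASVABC fixed, one extra year of schooling adds $0.74:
gain = predict_earnings(13, 50) - predict_earnings(12, 50)
print(round(gain, 2))                       # 0.74
print(round(predict_earnings(12, 50), 2))   # 12.12
```

Note that each coefficient is a partial effect: it measures the change in predicted earnings from a one-unit change in that regressor with the other regressor held constant.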
4. [Figure only; no transcribed text.]
5. Omitted Variable Bias:
If the regressor is correlated with a variable that has been omitted from the analysis and
that determines, in part, the dependent variable, then the OLS estimator will have
omitted variable bias.
Omitted variable bias occurs when two conditions are true:
(1) the omitted variable is correlated with the included regressor; and
(2) the omitted variable is a determinant of the dependent variable.
6. OVB and the First OLS Assumption
Recall the First OLS Assumption
E(u_i | X_i) = 0
- This assumption fails if X_i (the included regressor) and u_i (other factors) are correlated.
- If the omitted variable is a determinant of Y, then it is part of u, the other factors.
- If the omitted variable is correlated with X, then u is correlated with X, which is a violation of the First Least Squares assumption.
You cannot test for OVB except by including potential omitted variables.
7. Can we estimate the size and direction of our mistake?
1) The bias does not decline with a larger sample.
2) The size of the bias depends on the strength of the correlation between X and u. The stronger the correlation,
the larger is the bias.
3) The direction of the bias depends on the sign of the correlation between X and u.
8. [Figure only; no transcribed text.]
9. [Figure only; no transcribed text.]
10. Coefficient of S when ASVABC is not included in the model: 1.07
Coefficient of S when ASVABC is included in the model: 0.739
The difference in the coefficients for S reflects that the simple model suffers from omitted variable bias.
1.07 > 0.739
Thus, the direction of the bias is upward. We call this an upward bias.
If the coefficient of S in the simple regression were less than that in the multiple regression, the model would suffer from downward bias.
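This upward bias can be reproduced in a small simulation; a minimal sketch with a made-up data-generating process (not the lecture's dataset): y depends on both x and an omitted variable z that is positively correlated with x, so the simple OLS slope of y on x overshoots the true coefficient.

```python
import random

# Made-up data-generating process for illustration only.
random.seed(42)
n = 20_000
true_b2, true_b3 = 2.0, 3.0

z = [random.gauss(0, 1) for _ in range(n)]   # the omitted variable
x = [zi + random.gauss(0, 1) for zi in z]    # x positively correlated with z
y = [1.0 + true_b2 * xi + true_b3 * zi + random.gauss(0, 1)
     for xi, zi in zip(x, z)]

def ols_slope(xs, ys):
    """Simple OLS slope: sample cov(x, y) / var(x)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    var = sum((a - mx) ** 2 for a in xs)
    return cov / var

slope = ols_slope(x, y)  # regression of y on x, omitting z
# Here cov(x, z) / var(x) = 0.5, so the slope converges to
# true_b2 + true_b3 * 0.5 = 3.5 instead of the true value 2.0.
print(round(slope, 2))
```

Both OVB conditions from slide 5 hold here (z is correlated with x, and z determines y), and the positive correlation with a positive omitted coefficient produces an upward bias, as on slide 10.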
11. Measures of Fit in Multiple Regression
• Three commonly used summary statistics in multiple regression are the standard error of the regression, R², and adjusted R². All three statistics measure how well the OLS estimate of the multiple regression describes, or 'fits', the data.
• Adjusted R²:
Because R² increases when a new variable is added, an increase in R² does not mean that adding the variable improves the fit of the model. In this sense, R² gives an inflated estimate of how well the regression fits the data. One way to correct this is to deflate or reduce R² by some factor, and this is what the adjusted R² does.
The adjusted R² is a modified version of R² that does not necessarily increase when a new regressor is added.
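The adjustment can be sketched numerically; a minimal sketch using the standard formula adjusted R² = 1 − (1 − R²)(n − 1)/(n − k), where k counts the estimated parameters including the intercept. The sample size n = 500 below is made up for illustration; the R² of 0.1236 is the value from the lecture.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared, where n is the number of observations and
    k is the number of estimated parameters (including the intercept)."""
    return 1 - (1 - r2) * (n - 1) / (n - k)

r2 = 0.1236                  # from the lecture's earnings regression
print(round(adjusted_r2(r2, n=500, k=3), 4))
# Adding a useless regressor (k = 4) that leaves R-squared unchanged
# strictly lowers the adjusted R-squared:
print(round(adjusted_r2(r2, n=500, k=4), 4))
```

This is the sense in which the adjusted R² "penalizes" an extra regressor: a new variable raises the adjusted R² only if it improves R² by more than the penalty.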
12. Back to interpretation:
Note that R² increases on adding more variables. This, however, does not mean the model has improved: R² will increase even when we add nonsensical variables. The adjusted R² corrects this problem by penalizing an additional regressor; the adjusted R² does not necessarily increase on adding another variable.
In multiple regression, R² shows the combined/joint explanatory power of the independent variables. ASVABC and S together explain 12.36% of the variability in earnings.
F tests:
In the simple regression case, the F test tests the explanatory power of the regression model. It is equivalent to a two-sided t test.
In the multiple regression model, t tests test the significance of the coefficients individually, while the F test tests the joint significance of the coefficients.
F = [ESS / (k − 1)] / [RSS / (n − k)]
If RSS decreases on the addition of explanatory variables, it means that there has been some improvement in the fit.
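The F formula can be written equivalently in terms of R², since dividing numerator and denominator by TSS gives ESS/TSS = R² and RSS/TSS = 1 − R². A minimal sketch, with the sample size n = 500 made up for illustration and R² = 0.1236 taken from the lecture:

```python
def f_statistic(r2, n, k):
    """Overall-significance F statistic computed from R-squared.
    k is the number of estimated parameters (including the intercept),
    so the degrees of freedom are (k - 1, n - k)."""
    return (r2 / (k - 1)) / ((1 - r2) / (n - k))

# Two regressors (S and ASVABC) plus an intercept: k = 3.
f = f_statistic(0.1236, n=500, k=3)
print(round(f, 2))
```

A large F relative to the F(k − 1, n − k) critical value rejects the null that all slope coefficients are jointly zero.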
13. Some Stata Commands (with S and Earnings as examples)
summarize S Earnings
describe S Earnings
scatter Earnings S
scatter Earnings S || lfit Earnings S
corr Earnings S
histogram S
histogram Earnings, normal
reg Earnings S