1. Stock/Watson - Introduction to Econometrics 4th
Edition - Answers to Exercises: Chapter 6
_____________________________________________________________________________________________________
1
6.1. By equation (6.15) in the text, we know
2 2
1
1 (1 ).
1
n
R R
n k
−
= − −
− −
Thus, the values of 2
R for columns (1)–(3) are: 0.1648, 0.1817, 0.1843.
6.2. (a) Workers with college degrees earn $10.47/hour more, on average, than workers
with only high school degrees.
(b) Men earn $4.69/hour more, on average, than women.
6.3. (a) On average, a worker earns $0.61/hour more for each year he ages.
(b) Sally’s earnings prediction is 0.11 + 10.44×1 − 4.56×1 + 0.61×29 = 23.68
dollars per hour.
Betsy’s earnings prediction is 0.11 + 10.44×1 − 4.56×1 + 0.61×34 = 26.73
dollars per hour. The difference is 3.05 $/hour (= 0.61× (34-29)).
6.4. (a) Workers in the Northeast earn $0.74 more per hour than workers in the West, on
average, controlling for other variables in the regression. Workers in the Midwest
earn $1.54 less per hour than workers in the West, on average, controlling for
other variables in the regression. Workers in the South earn $0.44 less than
workers in the West, controlling for other variables in the regression.
(b) The regressor West is omitted to avoid perfect multicollinearity. If West is
included, then the intercept can be written as a perfect linear function of the four
regional regressors.
(c) The expected difference in earnings between Juanita and Jennifer is
−0.44 − (−1.54) = $0.90/hour.
2. Stock/Watson - Introduction to Econometrics 4th
Edition - Answers to Exercises: Chapter 6
_____________________________________________________________________________________________________
2
6.5. (a) $23,400 (recall that Price is measured in $1000s).
(b) In this case BDR = 1 and Hsize = 100. The resulting expected change in price is
23.4 + 0.156 100 = 39.0 thousand dollars or $39,000.
(c) The loss is $48,800.
(d) From the text 2 2
1
1
1 (1 ),
n
n k
R R
−
− −
= − − so 2 2
1
1
1 (1 ),
n k
n
R R
− −
−
= − − thus R2
= 0.727.
6.6. (a) There are other important determinants of a country’s crime rate, including
demographic characteristics of the population.
(b) Suppose that the crime rate is positively affected by the fraction of young males
in the population, and that counties with high crime rates tend to hire more
police. In this case, the size of the police force is likely to be positively
correlated with the fraction of young males in the population leading to a
positive value for the omitted variable bias so that 1 1
ˆ .
6.7. (a) The proposed research in assessing the presence of sex bias in setting wages is
too limited. There might be some potentially important determinants of salaries:
type of engineer, amount of work experience of the employee, and education
level. The sex with the lower wages could reflect the type of engineer, the amount
of work experience of the employee, or the education level of the employee. The
research plan could be improved with the collection of additional data as indicated
above; an appropriate statistical technique for analyzing the data would then be a
multiple regression in which the dependent variable is wages and the independent
variables would include a binary variable for sex, indicator variables for type of
engineer, work experience and education level (highest grade level completed, for
example). The potential importance of the suggested omitted variables makes a
“difference in means” test inappropriate for assessing the presence of sex bias in
setting wages.
3. Stock/Watson - Introduction to Econometrics 4th
Edition - Answers to Exercises: Chapter 6
_____________________________________________________________________________________________________
3
(b) The description suggests that the research goes a long way towards controlling
for potential omitted variable bias. Yet, there still may be problems. Omitted
from the analysis are characteristics associated with behavior that led to
incarceration (excessive drug or alcohol use, gang activity, and so forth) that
might be correlated with future earnings. Ideally, data on these variables should
be included in the analysis as additional control variables.
6.8. Omitted from the analysis are reasons why the survey respondents slept more or less
than average. People with certain chronic illnesses might sleep more than 8 hours
per night. People with other illnesses might sleep less than 5 hours. This study says
nothing about the causal effect of sleep on mortality.
6.9. For omitted variable bias to occur, two conditions must be true: X1 (the included
regressor) is correlated with the omitted variable, and the omitted variable is a
determinant of the dependent variable. Since X1 and X2 are uncorrelated, the OLS
estimator of 1 does not suffer from omitted variable bias.
4. Stock/Watson - Introduction to Econometrics 4th
Edition - Answers to Exercises: Chapter 6
_____________________________________________________________________________________________________
4
6.10. (a)
Assume X1 and X2 are uncorrelated: 1 2
2
0
X X
=
1
2
ˆ
1 1 4
400 1 0 6
1 4 1
0.00167
400 6 600
=
−
= = =
(b) With 1 2
, 0.5
X X
=
sb̂1
2
=
1
400
1
1- 0.52
é
ë
ê
ù
û
ú
4
6
=
1
400
1
0.75
é
ë
ê
ù
û
ú
4
6
= 0.0022
(c) The statement correctly says that the larger is the correlation between X1 and X2
the larger is the variance of 1
ˆ ,
however the recommendation “it is best to leave
X2 out of the regression” is incorrect. If X2 is a determinant of Y, then leaving X2
out of the regression will lead to omitted variable bias in 1
ˆ .
5. Stock/Watson - Introduction to Econometrics 4th
Edition - Answers to Exercises: Chapter 6
_____________________________________________________________________________________________________
5
6.11. (a) 2
1 1 2 2
( )
i i i
Y b X b X
− −
(b)
2
1 1 2 2
1 1 1 2 2
1
2
1 1 2 2
2 1 1 2 2
2
( )
2 ( )
( )
2 ( )
i i i
i i i i
i i i
i i i i
Y b X b X
X Y b X b X
b
Y b X b X
X Y b X b X
b
− −
= − − −
− −
= − − −
(c) From (b), 1
ˆ
satisfies X1i
(Yi
- b̂1
X1i
- b̂2
X2i
) = 0
å , or 1 2 1 2
1 2
1
ˆ
ˆ i i i i
i
X Y X X
X
−
=
and the result follows immediately.
(d) Following analysis as in (c) 2 1 1 2
2 2
2
ˆ
ˆ i i i i
i
X Y X X
X
−
=
and substituting this into the
expression for 1
ˆ
in (c) yields
2 1 1 2
2
2
ˆ
1 1 2
1 2
1
ˆ .
i i i i
i
X Y X X
i i i
X
i
X Y X X
X
−
=
Solving for 1
ˆ
yields:
2
2 1 1 2 2
1 2 2 2
1 2 1 2
ˆ
( )
i i i i i i i
i i i i
X X Y X X X Y
X X X X
−
=
−
(e) The least squares objective function is 2
0 1 1 2 2
( )
i i i
Y b b X b X
− − −
and the partial
derivative with respect to b0 is
2
0 1 1 2 2
0 1 1 2 2
0
( )
2 ( ).
i i i
i i i
Y b b X b X
Y b b X b X
b
− − −
= − − − −
Setting this to zero and solving for 0
ˆ
yields: 0 1 1 2 2
ˆ ˆ ˆ .
Y X X
= − −
6. Stock/Watson - Introduction to Econometrics 4th
Edition - Answers to Exercises: Chapter 6
_____________________________________________________________________________________________________
6
(f) Substituting 0 1 1 2 2
ˆ ˆ ˆ .
Y X X
= − − into the least squares objective function yields
( )
2
2
0 1 1 2 2 1 1 1 2 2 2
ˆ
( ) ( ) ( ) ( )
i i i i i i
Y b X b X Y Y b X X b X X
− − − = − − − − −
, which is
identical to the least squares objective function in part (a), except that all variables
have been replaced with deviations from sample means. The result then follows as in
(c).
Notice that the estimator for 1 is identical to the OLS estimator from the regression
of Y onto X1, omitting X2. Said differently, when 1 1 2 2
( )( ) 0
i i
X X X X
− − =
, the
estimated coefficient on X1 in the OLS regression of Y onto both X1 and X2 is the
same as estimated coefficient in the OLS regression of Y onto X1.
6.12 (a) Treatment (assignment to small classes) was not randomly assigned in the
population (the continuing and newly-enrolled students) because of the
difference in the proportion of treated continuing and newly-enrolled students.
Thus, the treatment indicator X is correlated with W. If newly-enrolled students
perform systematically differently on standardized tests than continuing
students (perhaps because of adjustment to a new school), then this becomes
part of the error term u in (a). This leads means that E(u | X) ≠ 0. Because E(u |
X) ≠ 0, 1
ˆ
is biased and inconsistent.
(b) Because treatment was randomly assigned conditional on enrollment status
(continuing or newly-enrolled), E(u | X, W) will not depend on X. This means
that the assumption of conditional mean independence is satisfied, and 1
ˆ
is
unbiased and consistent. However, because W was not randomly assigned
(newly-enrolled students may, on average, have attributes other than being
newly enrolled that affect test scores), E(u | X, W) may depend of W, so that 2
ˆ
may be a biased and inconsistent estimator for the causal effect of transferring
to a new school.
7. Stock/Watson - Introduction to Econometrics 4th
Edition - Answers to Exercises: Chapter 6
_____________________________________________________________________________________________________
7
Empirical Exercise 6.1
Calculations for this exercise are carried out in the STATA file EE_6_1.do.
(a) The estimated regression is
= 3432.1 − 253.2Smoker
The estimated effect of smoking on birthweight is −253.2 grams.
(b) The estimated regression is
= 3051.2 − 217.6Smoker − 30.5Alcohol + 34.1Nprevist
(i) Smoking may be correlated with both alcohol and the number of pre-natal doctor
visits, thus satisfying (1) in Key Concept 6.1. Moreover, both alcohol consumption
and the number of doctor visits may have their own independent affects on
birthweight, thus satisfying (2) in Key Concept 6.1.
(ii) The estimated is somewhat smaller: it has fallen to 217 grams from 253 grams, so
the regression in (a) may suffer from omitted variable bias.
(iii) = 3051.2 −217.6×1 − 30.5×0 + 34.1×8 = 3106.4
(iv) R2
= 0.0729 and R2
= 0.0719. They are nearly identical because the sample size
is very large (n = 3000).
(v) Nprevist is a control variable. It captures, for example, mother's access to
healthcare and health. Because Nprevist is a control variable, its coefficient does not
have a causal interpretation.
(c) The results from STATA are
. ** FW calculations;
. regress birthweight alcohol nprevist;
Source | SS df MS Number of obs = 3000
-------------+------------------------------ F( 2, 2997) = 82.64
Model | 54966381 2 27483190.5 Prob > F = 0.0000
Residual | 996653623 2997 332550.425 R-squared = 0.0523
-------------+------------------------------ Adj R-squared = 0.0516
Total | 1.0516e+09 2999 350656.887 Root MSE = 576.67
------------------------------------------------------------------------------
birthweight | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
alcohol | -103.2781 76.53276 -1.35 0.177 -253.3402 46.78392
8. Stock/Watson - Introduction to Econometrics 4th
Edition - Answers to Exercises: Chapter 6
_____________________________________________________________________________________________________
8
nprevist | 36.49956 2.870272 12.72 0.000 30.87166 42.12746
_cons | 2983.739 33.35198 89.46 0.000 2918.344 3049.134
------------------------------------------------------------------------------
. predict bw_res, r;
. regress smoker alcohol nprevist;
Source | SS df MS Number of obs = 3000
-------------+------------------------------ F( 2, 2997) = 38.97
Model | 11.8897961 2 5.94489803 Prob > F = 0.0000
Residual | 457.202204 2997 .152553288 R-squared = 0.0253
-------------+------------------------------ Adj R-squared = 0.0247
Total | 469.092 2999 .156416139 Root MSE = .39058
------------------------------------------------------------------------------
smoker | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
alcohol | .334529 .0518358 6.45 0.000 .2328917 .4361662
nprevist | -.0111667 .001944 -5.74 0.000 -.0149785 -.0073549
_cons | .3102729 .0225893 13.74 0.000 .2659807 .3545651
------------------------------------------------------------------------------
. predict smoker_res, r;
. regress bw_res smoker_res;
Source | SS df MS Number of obs = 3000
-------------+------------------------------ F( 1, 2998) = 66.55
Model | 21644450.1 1 21644450.1 Prob > F = 0.0000
Residual | 975009170 2998 325219.87 R-squared = 0.0217
-------------+------------------------------ Adj R-squared = 0.0214
Total | 996653621 2999 332328.65 Root MSE = 570.28
------------------------------------------------------------------------------
bw_res | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
smoker_res | -217.5801 26.6707 -8.16 0.000 -269.8748 -165.2854
_cons | -2.75e-07 10.41185 -0.00 1.000 -20.41509 20.41509
------------------------------------------------------------------------------
(d) The estimated regression is
= 3454.5 − 228.8Smoker − 15.1Alcohol
−698.0Tripre0 − 100.8 Tripre2 − 137.0 Tripre3
(i) Tripre1 is omitted to avoid perfect multicollinearity. (Tripre0+ Tripre1+ Tripre2+
Tripre3 = 1, the value of the “constant” regressor that determines the intercept). The
regression would not run, or the software will report results from an arbitrary
normalization if Tripre0, Tripre1, Tripre2, Tripre3, and the constant term all included in
the regression.
(ii) Babies born to women who had no prenatal doctor visits (Tripre0 = 1) had
birthweights that on average were 698.0 grams (≈ 1.5 lbs) lower than babies from others
who saw a doctor during the first trimester (Tripre1 = 1).
(iii) Babies born to women whose first doctor visit was during the second trimester
(Tripre2 = 1) had birthweights that on average were 100.8 grams (≈ 0.2 lbs) lower than
babies from others who saw a doctor during the first trimester (Tripre1 = 1). Babies born
to women whose first doctor visit was during the third trimester (Tripre3 = 1) had
birthweights that on average were 137 grams (≈ 0.3 lbs) lower than babies from others
who saw a doctor during the first trimester (Tripre1 = 1).
9. Stock/Watson - Introduction to Econometrics 4th
Edition - Answers to Exercises: Chapter 6
_____________________________________________________________________________________________________
9
Empirical Exercise 6.2
Calculations for this exercise are carried out in the STATA file EE_6_2.do.
(a)
Variable Mean
Standard
Deviation Minimum Maximum Units
Growth 1.87 1.82 −2.81 7.16
Percentage
Points
Rgdp60 3131.0 2523.0 367.0 9895.0 $1960
Tradeshare 0.542 0.229 0.141 1.128 Unit free
yearsschool 3.95 2.55 0.20 10.07 years
Rev_coups 0.170 0.225 0.000 0.970
Coups per
year
Assassinations 0.281 0.494 0.000 2.467
Assassinations
per year
Oil 0.00 0.00 0.00 0.00
0–1 indicator
variable
(b) Estimated Regression (in table format):
Regressor Coefficient
tradeshare 1.34
yearsschool 0.56
rev_coups −2.15
assasinations 0.32
rgdp60 −0.00046
intercept 0.63
SER 1.59
R2
0.29
2
R 0.23
The coefficient on Rev_Coups is −2.15. An additional coup in a five year period,
reduces the average year growth rate by (2.15/5) = 0.43% over this 25 year period.
This means the GDP in 1995 is expected to be approximately .43 25 = 10.75%
lower. This is a large effect.
(c) The predicted growth rate at the mean values for all regressors is 1.87.
(d) The resulting predicted value is 2.18
(e) The variable “oil” takes on the value of 0 for all 64 countries in the sample. This
would generate perfect multicollinearity, since = − 0
1 1
i i
Oil X , and hence the
variable is a linear combination of one of the regressors, namely the constant.