5.1. (a) The 95% confidence interval for \(\beta_1\) is \(\{-5.82 \pm 1.96 \times 2.21\}\),
that is, \(-10.152 \le \beta_1 \le -1.4884\).
(b) Calculate the t-statistic:
\[
t^{act} = \frac{\hat{\beta}_1 - 0}{SE(\hat{\beta}_1)} = \frac{-5.82}{2.21} = -2.6335.
\]
The p-value for the test \(H_0: \beta_1 = 0\) vs. \(H_1: \beta_1 \ne 0\) is
\[
p\text{-value} = 2\Phi(-|t^{act}|) = 2\Phi(-2.6335) = 2 \times 0.0042 = 0.0084.
\]
The p-value is less than 0.01, so we can reject the null hypothesis at the 5%
significance level, and also at the 1% significance level.
(c) The t-statistic is
\[
t^{act} = \frac{\hat{\beta}_1 - (-5.6)}{SE(\hat{\beta}_1)} = \frac{-0.22}{2.21} = -0.10.
\]
The p-value for the test \(H_0: \beta_1 = -5.6\) vs. \(H_1: \beta_1 \ne -5.6\) is
\[
p\text{-value} = 2\Phi(-|t^{act}|) = 2\Phi(-0.10) = 0.92.
\]
The p-value is larger than 0.10, so we cannot reject the null hypothesis at the
10%, 5% or 1% significance level. Because \(\beta_1 = -5.6\) is not rejected at the 5%
level, this value is contained in the 95% confidence interval.
(d) The 99% confidence interval for \(\beta_0\) is \(\{520.4 \pm 2.58 \times 20.4\}\),
that is, \(467.7 \le \beta_0 \le 573.0\).
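The arithmetic in 5.1 can be reproduced with a short script. This is a minimal sketch, assuming the large-sample normal approximation used above; the helper names (`normal_cdf`, `t_stat`, `two_sided_p`, `conf_int`) are illustrative and not part of the exercise.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, Phi(z), computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def t_stat(estimate, null_value, se):
    """t-statistic for testing H0: coefficient = null_value."""
    return (estimate - null_value) / se

def two_sided_p(t):
    """Two-sided p-value, 2 * Phi(-|t|), under the normal approximation."""
    return 2.0 * normal_cdf(-abs(t))

def conf_int(estimate, se, crit):
    """Confidence interval: estimate +/- crit * se."""
    return (estimate - crit * se, estimate + crit * se)

# Estimates and standard errors quoted in Exercise 5.1.
beta1_hat, se_beta1 = -5.82, 2.21
beta0_hat, se_beta0 = 520.4, 20.4

ci95_beta1 = conf_int(beta1_hat, se_beta1, 1.96)   # part (a)
t_b = t_stat(beta1_hat, 0.0, se_beta1)             # part (b)
p_b = two_sided_p(t_b)
t_c = t_stat(beta1_hat, -5.6, se_beta1)            # part (c)
p_c = two_sided_p(t_c)
ci99_beta0 = conf_int(beta0_hat, se_beta0, 2.58)   # part (d)

print("95% CI for beta1:", ci95_beta1)
print("t and p for H0: beta1 = 0:", t_b, p_b)
print("t and p for H0: beta1 = -5.6:", t_c, p_c)
print("99% CI for beta0:", ci99_beta0)
```

The same four helpers cover every part of the exercise; only the null value and the critical value change between parts.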
5.2. (a) The estimated gender gap equals $2.12/hour.
(b) The hypothesis test for the gender gap is \(H_0: \beta_1 = 0\) vs. \(H_1: \beta_1 \ne 0\).
With a t-statistic
\[
t^{act} = \frac{\hat{\beta}_1 - 0}{SE(\hat{\beta}_1)} = \frac{2.12}{0.36} = 5.89,
\]
the p-value for the test is
\[
p\text{-value} = 2\Phi(-|t^{act}|) = 2\Phi(-5.89) = 2 \times 0.0000 = 0.000 \text{ (to four decimal places)}.
\]
The p-value is less than 0.01, so we can reject the null hypothesis that there is no
gender gap at a 1% significance level.
(c) The 95% confidence interval for the gender gap \(\beta_1\) is \(\{2.12 \pm 1.96 \times 0.36\}\),
that is, \(1.41 \le \beta_1 \le 2.83\).
(d) The sample average wage of women is \(\hat{\beta}_0\) = $12.52/hour. The sample average
wage of men is \(\hat{\beta}_0 + \hat{\beta}_1\) = $12.52 + $2.12 = $14.64/hour.
(e) The binary variable regression model relating wages to gender can be
written as either \(Wage = \beta_0 + \beta_1 Male + u_i\) or \(Wage = \gamma_0 + \gamma_1 Female + v_i\). In the
first regression equation, Male equals 1 for men and 0 for women; \(\beta_0\) is the
population mean of wages for women and \(\beta_0 + \beta_1\) is the population mean of
wages for men. In the second regression equation, Female equals 1 for women
and 0 for men; \(\gamma_0\) is the population mean of wages for men and \(\gamma_0 + \gamma_1\) is the
population mean of wages for women. We have the following relationship for the
coefficients in the two regression equations:
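\[
\gamma_0 = \beta_0 + \beta_1, \qquad \gamma_0 + \gamma_1 = \beta_0.
\]
Given the coefficient estimates \(\hat{\beta}_0\) and \(\hat{\beta}_1\), we have
\[
\hat{\gamma}_0 = \hat{\beta}_0 + \hat{\beta}_1 = 14.64, \qquad
\hat{\gamma}_1 = \hat{\beta}_0 - \hat{\gamma}_0 = -\hat{\beta}_1 = -2.12.
\]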
Due to the relationship among coefficient estimates, for each individual
observation, the OLS residual is the same under the two regression equations:
\(\hat{u}_i = \hat{v}_i\). Thus the sum of squared residuals, \(SSR = \sum_{i=1}^{n} \hat{u}_i^2\), is the same under the
two regressions. This implies that both \(SER = \left(\frac{SSR}{n-2}\right)^{1/2}\) and \(R^2 = 1 - \frac{SSR}{TSS}\) are
unchanged.
In summary, in regressing Wages on Female, we will get
\[
\widehat{Wages} = 14.64 - 2.12 \times Female, \qquad R^2 = 0.06, \quad SER = 4.2.
\]
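The argument in 5.2(e) can be checked numerically. The sketch below uses a small made-up wage sample (hypothetical values, not the data behind the exercise) and a hand-written single-regressor OLS helper, `ols`, which is an illustrative name. It fits the regression once on Male and once on Female and compares coefficients and residuals.

```python
def ols(x, y):
    """Single-regressor OLS: returns (intercept, slope, residuals)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    return intercept, slope, resid

# Hypothetical hourly wages; these are NOT the data used in Exercise 5.2.
wage = [10.0, 12.5, 14.0, 13.5, 16.0, 15.0, 18.5]
male = [0, 0, 0, 1, 1, 1, 1]
female = [1 - m for m in male]

b0, b1, u_hat = ols(male, wage)     # Wage = beta0 + beta1 * Male + u
g0, g1, v_hat = ols(female, wage)   # Wage = gamma0 + gamma1 * Female + v

ssr_u = sum(u * u for u in u_hat)
ssr_v = sum(v * v for v in v_hat)

print("beta0, beta1:", b0, b1)
print("gamma0, gamma1:", g0, g1)
print("SSR (Male), SSR (Female):", ssr_u, ssr_v)
```

Because both regressions simply fit the two group means, the residuals coincide observation by observation, so any statistic built from the SSR is the same in both.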
5.4. (a) −12.12 + 2.37 × 16 = $25.80 per hour.
(b) The wage is expected to increase by 2.37×2 = $4.74 per hour.
(c) The increase in wages for college education is \(\beta_1 \times 4\). Thus, the counselor’s
assertion is that \(\beta_1 = 10/4 = 2.50\). The t-statistic for this null hypothesis is
\(t = \frac{2.37 - 2.50}{0.10} = -1.3\), which has a p-value of 0.19. Thus, the counselor’s assertion
cannot be rejected at the 10% significance level. A 95% confidence interval for \(\beta_1 \times 4\) is
\(4 \times (2.37 \pm 1.96 \times 0.10)\), or $8.70 ≤ Gain ≤ $10.26.
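The calculation in 5.4(c) turns a claim about a four-year gain into a test on the per-year slope, then scales the confidence interval back up. A minimal sketch of that arithmetic follows, assuming the normal approximation; the variable names are illustrative.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Slope estimate and standard error quoted in Exercise 5.4.
beta1_hat, se_beta1 = 2.37, 0.10
years = 4              # length of a college education, in years
claimed_gain = 10.0    # counselor's claimed gain, dollars per hour

# The claim "4 * beta1 = 10" is equivalent to "beta1 = 2.50".
beta1_null = claimed_gain / years
t = (beta1_hat - beta1_null) / se_beta1
p_value = 2.0 * normal_cdf(-abs(t))

# A 95% interval for 4 * beta1 is 4 times the interval for beta1.
gain_low = years * (beta1_hat - 1.96 * se_beta1)
gain_high = years * (beta1_hat + 1.96 * se_beta1)

print("t-statistic:", t)
print("p-value:", p_value)
print("95% CI for the four-year gain:", (gain_low, gain_high))
```

Scaling the endpoints works because multiplying a coefficient by a constant multiplies its standard error by the same constant.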
5.5. (a) The estimated gain from being in a small class is 13.9 points. This is equal to
approximately 1/5 of the standard deviation in test scores, a moderate increase.
(b) The t-statistic is \(t^{act} = \frac{13.9}{2.5} = 5.56\), which has a p-value of 0.00. Thus the null
hypothesis is rejected at the 5% (and 1%) level.
(c) 13.9 ± 2.58 × 2.5 = 13.9 ± 6.45.
(d) Yes. Students were randomly assigned to small or regular classes, so SmallClass is
independent of the characteristics of the student, including those affecting test scores, that is,
u. Thus \(E(u_i \mid SmallClass_i) = 0\).
5.6. (a) The question asks whether the variability in test scores in large classes is the
same as the variability in small classes. It is hard to say. On the one hand, teachers in
small classes might be able to spend more time bringing all of the students along, reducing
the poor performance of particularly unprepared students. On the other hand, most of
the variability in test scores might be beyond the control of the teacher.
(b) The formula in 5.3 is valid under heteroskedasticity or homoskedasticity; thus
inferences are valid in either case.
5.7. (a) The t-statistic is \(\frac{3.2}{1.5} = 2.13\) with a p-value of 0.03; since the p-value is less than
0.05, the null hypothesis is rejected at the 5% level.
(b) 3.2 ± 1.96 × 1.5 = 3.2 ± 2.94.
(c) Yes. If Y and X are independent, then \(\beta_1 = 0\); but the p-value in (a) was 0.03. This
means that, when \(\beta_1 = 0\), the absolute value of the t-statistic would be 2.13 (the value
actually observed in this sample) or larger in only 3% of all samples.
(d) The null hypothesis \(\beta_1 = 0\) would be rejected at the 5% level in 5% of the samples; 95% of the
confidence intervals would contain the value \(\beta_1 = 0\).
5.8. (a) 43.2 ± 2.05 × 10.2 or 43.2 ± 20.91, where 2.05 is the 5% two-sided critical value
from the \(t_{28}\) distribution.
(b) The t-statistic is \(t^{act} = \frac{61.5 - 55}{7.4} = 0.88\), which is less (in absolute value) than the
critical value of 2.05. Thus, the null hypothesis is not rejected at the 5% level.
(c) The one-sided 5% critical value is 1.70; \(t^{act}\) is less than this critical value, so
the null hypothesis is not rejected at the 5% level.
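5.10. Let \(n_0\) denote the number of observations with \(X = 0\) and \(n_1\) denote the number of
observations with \(X = 1\); note that
\[
\sum_{i=1}^{n} X_i = n_1; \qquad \bar{X} = n_1/n; \qquad \frac{1}{n_1}\sum_{i=1}^{n} X_i Y_i = \bar{Y}_1;
\]
\[
\sum_{i=1}^{n} (X_i - \bar{X})^2 = \sum_{i=1}^{n} X_i^2 - n\bar{X}^2 = n_1 - \frac{n_1^2}{n} = n_1\left(1 - \frac{n_1}{n}\right) = \frac{n_1 n_0}{n};
\]
\[
n_1 \bar{Y}_1 + n_0 \bar{Y}_0 = \sum_{i=1}^{n} Y_i, \quad \text{so that} \quad \bar{Y} = \frac{n_1}{n}\bar{Y}_1 + \frac{n_0}{n}\bar{Y}_0.
\]
From the least squares formula
\[
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}
= \frac{\sum_{i=1}^{n} X_i (Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}
= \frac{\sum_{i=1}^{n} X_i Y_i - \bar{Y} n_1}{n_1 n_0 / n}
\]
\[
= \frac{n}{n_0}(\bar{Y}_1 - \bar{Y})
= \frac{n}{n_0}\left(\bar{Y}_1 - \frac{n_1}{n}\bar{Y}_1 - \frac{n_0}{n}\bar{Y}_0\right)
= \bar{Y}_1 - \bar{Y}_0,
\]
and
\[
\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}
= \frac{n_0}{n}\bar{Y}_0 + \frac{n_1}{n}\bar{Y}_1 - (\bar{Y}_1 - \bar{Y}_0)\frac{n_1}{n}
= \frac{n_0 + n_1}{n}\bar{Y}_0 = \bar{Y}_0.
\]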
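The result in 5.10 can be confirmed on a small example. The sketch below uses made-up data (hypothetical values chosen only for illustration) and computes the OLS coefficients directly from the least squares formula, then compares them with the two group means. It also checks the intermediate identity for the sum of squared deviations of X.

```python
# Hypothetical binary regressor and outcome; values are for illustration only.
x = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1]
y = [3.0, 5.5, 6.0, 2.5, 7.0, 4.0, 3.5, 6.5, 5.0, 8.0]

n = len(x)
n1 = sum(x)          # number of observations with X = 1
n0 = n - n1          # number of observations with X = 0

x_bar = sum(x) / n
y_bar = sum(y) / n

# Group means of Y for X = 1 and X = 0.
y1_bar = sum(yi for xi, yi in zip(x, y) if xi == 1) / n1
y0_bar = sum(yi for xi, yi in zip(x, y) if xi == 0) / n0

# Least squares formula for the slope and intercept.
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
beta1_hat = sxy / sxx
beta0_hat = y_bar - beta1_hat * x_bar

print("sum of squared deviations of X:", sxx, "vs n1*n0/n:", n1 * n0 / n)
print("beta1_hat:", beta1_hat, "vs Y1_bar - Y0_bar:", y1_bar - y0_bar)
print("beta0_hat:", beta0_hat, "vs Y0_bar:", y0_bar)
```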
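5.12. Equation (4.20) gives
\[
\sigma^2_{\hat{\beta}_0} = \frac{\mathrm{var}(H_i u_i)}{n\,[E(H_i^2)]^2},
\quad \text{where} \quad H_i = 1 - \frac{\mu_X}{E(X_i^2)} X_i.
\]
Using the facts that \(E(u_i \mid X_i) = 0\) and \(\mathrm{var}(u_i \mid X_i) = \sigma_u^2\) (homoskedasticity), we have
\[
E(H_i u_i) = E\left(u_i - \frac{\mu_X}{E(X_i^2)} X_i u_i\right)
= E(u_i) - \frac{\mu_X}{E(X_i^2)} E[X_i E(u_i \mid X_i)]
= 0 - \frac{\mu_X}{E(X_i^2)} \times 0 = 0,
\]
and
\[
E[(H_i u_i)^2] = E\left\{\left(u_i - \frac{\mu_X}{E(X_i^2)} X_i u_i\right)^2\right\}
\]
\[
= E\left\{u_i^2 - 2\frac{\mu_X}{E(X_i^2)} X_i u_i^2 + \left[\frac{\mu_X}{E(X_i^2)}\right]^2 X_i^2 u_i^2\right\}
\]
\[
= E(u_i^2) - 2\frac{\mu_X}{E(X_i^2)} E[X_i E(u_i^2 \mid X_i)] + \left[\frac{\mu_X}{E(X_i^2)}\right]^2 E[X_i^2 E(u_i^2 \mid X_i)]
\]
\[
= \sigma_u^2 - 2\frac{\mu_X}{E(X_i^2)} \mu_X \sigma_u^2 + \left[\frac{\mu_X}{E(X_i^2)}\right]^2 E(X_i^2)\, \sigma_u^2
\]
\[
= \sigma_u^2 \left(1 - \frac{\mu_X^2}{E(X_i^2)}\right).
\]
Because \(E(H_i u_i) = 0\), \(\mathrm{var}(H_i u_i) = E[(H_i u_i)^2]\), so
\[
\mathrm{var}(H_i u_i) = E[(H_i u_i)^2] = \sigma_u^2 \left(1 - \frac{\mu_X^2}{E(X_i^2)}\right).
\]
Also,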
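\[
E(H_i^2) = E\left\{\left(1 - \frac{\mu_X}{E(X_i^2)} X_i\right)^2\right\}
= E\left\{1 - 2\frac{\mu_X}{E(X_i^2)} X_i + \left[\frac{\mu_X}{E(X_i^2)}\right]^2 X_i^2\right\}
\]
\[
= 1 - 2\frac{\mu_X^2}{E(X_i^2)} + \frac{\mu_X^2}{E(X_i^2)}
= 1 - \frac{\mu_X^2}{E(X_i^2)}.
\]
Thus
\[
\sigma^2_{\hat{\beta}_0} = \frac{\mathrm{var}(H_i u_i)}{n\,[E(H_i^2)]^2}
= \frac{\sigma_u^2 \left(1 - \frac{\mu_X^2}{E(X_i^2)}\right)}{n \left(1 - \frac{\mu_X^2}{E(X_i^2)}\right)^2}
= \frac{\sigma_u^2}{n \left(1 - \frac{\mu_X^2}{E(X_i^2)}\right)}
\]
\[
= \frac{E(X_i^2)\, \sigma_u^2}{n\,[E(X_i^2) - \mu_X^2]}
= \frac{E(X_i^2)\, \sigma_u^2}{n\, \sigma_X^2}.
\]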
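The algebra in 5.12 can be sanity-checked with exact rational arithmetic. The sketch below assumes a small discrete distribution for X together with made-up values for the error variance and the sample size (all hypothetical, chosen only for illustration). It evaluates the Equation (4.20) form and the simplified homoskedastic form and confirms that they agree.

```python
from fractions import Fraction as F

# Hypothetical discrete distribution for X: support points and probabilities.
xs = [F(1), F(2), F(4)]
ps = [F(1, 2), F(1, 4), F(1, 4)]
sigma_u2 = F(9)   # hypothetical homoskedastic error variance
n = 100           # hypothetical sample size

mu_x = sum(p * x for p, x in zip(ps, xs))        # E(X)
ex2 = sum(p * x * x for p, x in zip(ps, xs))     # E(X^2)
var_x = ex2 - mu_x ** 2                          # var(X)

def h(x):
    """H = 1 - [mu_X / E(X^2)] * X, as defined in the solution."""
    return 1 - mu_x / ex2 * x

eh2 = sum(p * h(x) ** 2 for p, x in zip(ps, xs))  # E(H^2)

# Under homoskedasticity, E[(H u)^2] = E[H^2 E(u^2 | X)] = sigma_u^2 * E(H^2).
var_hu = sigma_u2 * eh2

general_form = var_hu / (n * eh2 ** 2)            # Equation (4.20) form
simplified_form = ex2 * sigma_u2 / (n * var_x)    # E(X^2) sigma_u^2 / (n sigma_X^2)

print("E(H^2):", eh2, "vs 1 - mu_X^2 / E(X^2):", 1 - mu_x ** 2 / ex2)
print("Equation (4.20) form:", general_form)
print("Simplified form:", simplified_form)
```

Using `Fraction` keeps every step exact, so the two expressions can be compared with `==` rather than a floating-point tolerance.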
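5.14. (a) From Exercise (4.11), \(\hat{\beta} = \sum_{i=1}^{n} a_i Y_i\), where \(a_i = \frac{X_i}{\sum_{j=1}^{n} X_j^2}\). Since the weights depend
only on \(X_i\) but not on \(Y_i\), \(\hat{\beta}\) is a linear function of Y.
(b) \(E(\hat{\beta} \mid X_1, \ldots, X_n) = \beta + \frac{\sum_{i=1}^{n} X_i E(u_i \mid X_1, \ldots, X_n)}{\sum_{j=1}^{n} X_j^2} = \beta\), since \(E(u_i \mid X_1, \ldots, X_n) = 0\).
(c) \(\mathrm{Var}(\hat{\beta} \mid X_1, \ldots, X_n) = \frac{\sum_{i=1}^{n} X_i^2\, \mathrm{Var}(u_i \mid X_1, \ldots, X_n)}{\left[\sum_{j=1}^{n} X_j^2\right]^2} = \frac{\sigma^2}{\sum_{j=1}^{n} X_j^2}\).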
(d) This follows the proof in the appendix.
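Parts (a) through (c) of 5.14 can be illustrated with a few lines of code. The sketch below uses made-up regressor values and an assumed error variance (both hypothetical). It builds the weights \(a_i\), checks that the estimator is linear in Y, that it recovers the slope exactly when there is no error term, and that the conditional variance implied by the weights matches the formula in (c).

```python
# Hypothetical regressor values and error variance, for illustration only.
x = [1.0, 2.0, 3.0, 4.0]
sigma2 = 2.0

sum_x2 = sum(xj ** 2 for xj in x)
weights = [xi / sum_x2 for xi in x]   # a_i = X_i / sum_j X_j^2

def beta_hat(y):
    """Regression-through-the-origin estimator, written as sum_i a_i * Y_i."""
    return sum(a * yi for a, yi in zip(weights, y))

# (a) Linearity: the estimator of a sum equals the sum of the estimators.
y = [2.0, 3.5, 7.0, 7.5]
z = [0.5, -1.0, 2.0, 1.5]
lhs = beta_hat([yi + zi for yi, zi in zip(y, z)])
rhs = beta_hat(y) + beta_hat(z)

# (b) With Y = beta * X exactly (no error), the estimator returns beta.
true_beta = 2.5
recovered = beta_hat([true_beta * xi for xi in x])

# (c) With independent, homoskedastic errors, Var = sigma^2 * sum_i a_i^2,
#     which equals sigma^2 / sum_j X_j^2.
var_from_weights = sigma2 * sum(a * a for a in weights)
var_formula = sigma2 / sum_x2

print("linearity:", lhs, rhs)
print("recovered slope:", recovered)
print("conditional variance:", var_from_weights, var_formula)
```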
5.15. Because the samples are independent, \(\hat{\beta}_{m,1}\) and \(\hat{\beta}_{w,1}\) are independent. Thus
\(\mathrm{var}(\hat{\beta}_{m,1} - \hat{\beta}_{w,1}) = \mathrm{var}(\hat{\beta}_{m,1}) + \mathrm{var}(\hat{\beta}_{w,1})\). \(\mathrm{var}(\hat{\beta}_{m,1})\) is consistently estimated as
\([SE(\hat{\beta}_{m,1})]^2\) and \(\mathrm{var}(\hat{\beta}_{w,1})\) is consistently estimated as \([SE(\hat{\beta}_{w,1})]^2\), so that
\(\mathrm{var}(\hat{\beta}_{m,1} - \hat{\beta}_{w,1})\) is consistently estimated by \([SE(\hat{\beta}_{m,1})]^2 + [SE(\hat{\beta}_{w,1})]^2\), and the result
follows by noting that the SE is the square root of the estimated variance.
Empirical Exercise 5.1
Calculations for this exercise are carried out in the STATA file EE_5_1.do.
(a) The estimated regression is
\(\widehat{Earnings}\) = −512.7 + 707.7×Height
             (3379.9)   (50.4)
The 95% confidence interval for the slope coefficient is 707.7 ± 1.96×50.4, or
\(608.9 \le \beta_1 \le 806.5\). This interval does not include \(\beta_1 = 0\), so the estimated slope is
significantly different from 0 at the 5% level. Alternatively, the t-statistic is 707.7/50.4 ≈
14.0, which is greater in absolute value than the 5% critical value of 1.96. And finally,
the p-value for the t-statistic is p-value ≈ 0.000, which is smaller than 0.05.
(b) For women the estimated regression is
\(\widehat{Earnings}\) = 12650 + 511.2×Height
             (6299)   (97.6)
The 95% confidence interval for the slope coefficient is 511.2 ± 1.96×97.6, or
\(319.9 \le \beta_{1,Female} \le 702.5\). This interval does not include \(\beta_{1,Female} = 0\), so the estimated
slope is significantly different from 0 at the 5% level.
(c) For men the estimated regression is
\(\widehat{Earnings}\) = −43130 + 1306.9×Height
             (6925)    (98.9)
The 95% confidence interval for the slope coefficient is 1306.9 ± 1.96×98.9, or
\(1113.1 \le \beta_{1,Male} \le 1500.6\). This interval does not include \(\beta_{1,Male} = 0\), so the estimated
slope is significantly different from 0 at the 5% level.
(d) The estimate of \(\beta_{1,Male} - \beta_{1,Female}\) is \(\hat{\beta}_{1,Male} - \hat{\beta}_{1,Female}\) and the standard error is
\[
SE(\hat{\beta}_{1,Male} - \hat{\beta}_{1,Female}) = \sqrt{\widehat{\mathrm{var}}(\hat{\beta}_{1,Male}) + \widehat{\mathrm{var}}(\hat{\beta}_{1,Female})} = \sqrt{SE(\hat{\beta}_{1,Male})^2 + SE(\hat{\beta}_{1,Female})^2}.
\]
Using the estimated regressions in (b) and (c): \(\hat{\beta}_{1,Male} - \hat{\beta}_{1,Female}\) = 1306.9 − 511.2 = 795.7, and
\[
SE(\hat{\beta}_{1,Male} - \hat{\beta}_{1,Female}) = \sqrt{98.9^2 + 97.6^2} = 138.9.
\]
The 95% confidence interval for \(\beta_{1,Male} - \beta_{1,Female}\) is 795.7 ± 1.96 × 138.9, or
\(523.5 \le \beta_{1,Male} - \beta_{1,Female} \le 1{,}067.8\). This interval does not include \(\beta_{1,Male} - \beta_{1,Female} = 0\), so the
estimated difference in the slopes is significantly different from 0 at the 5% level.
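The standard error of the difference between two independently estimated slopes can be computed as sketched below. The function name `diff_and_se` is illustrative. The inputs are the rounded estimates reported in (b) and (c), so the output may differ from the reported figures in the last digit, presumably because the reported values come from unrounded estimates.

```python
import math

def diff_and_se(b_a, se_a, b_b, se_b):
    """Difference of two independent estimates and its standard error."""
    return b_a - b_b, math.sqrt(se_a ** 2 + se_b ** 2)

# Rounded slope estimates and standard errors for men and women.
diff, se_diff = diff_and_se(1306.9, 98.9, 511.2, 97.6)

# 95% confidence interval and t-statistic for the difference in slopes.
ci_low = diff - 1.96 * se_diff
ci_high = diff + 1.96 * se_diff
t = diff / se_diff

print("difference in slopes:", diff)
print("standard error:", se_diff)
print("95% CI:", (ci_low, ci_high))
print("t-statistic:", t)
```

The square-root-of-summed-squares rule relies on the two samples being independent, as argued in Exercise 5.15.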
(e) The table below shows the estimated slope, its standard error, and number of
observations for various occupation groups.
Occupation                                     Slope      SE  t-stat       n
Exec/Manager                                   469.5   153.0     3.1   1,906
Professionals                                  622.8   116.2     5.4   3,158
Technicians                                    649.7   213.0     3.1     875
Sales                                         1372.4   146.4     9.4   1,957
Administration                                 201.2   131.5     1.5   3,124
Household service                             −172.9   637.9    −0.3     113
Protective service                            1503.0   391.1    3.84     364
Other Service                                   62.9   132.1    0.48   1,980
Farming                                       1049.2   297.3    3.53     361
Mechanics                                      571.2   354.9    1.61     534
Construction/Mining                            967.0   308.7    3.13     616
Precision production                          1080.3   284.3    3.80     439
Machine Operator                               972.9   151.9     6.5   1,268
Transport                                     1138.4   274.0    4.16     684
Laborer                                        549.1   237.0     2.3     491
Exec/Manager, Professionals, Technicians,
Sales, Administration                          829.8    65.1    12.7  11,020
Protective service, Farming, Mechanics,
Construction/Mining, Precision production,
Machine Operator, Transport, Laborer          1151.0    88.4    13.0   4,757
The predicted effect of height on earnings (\(\beta_1\)) is larger for occupations that require more
strength (see the last row in the table) than for others (see the penultimate row in the table). That
said, the estimated effect of height on earnings is both large and statistically significant in
several occupations in which strength would not seem to have a large effect on
productivity (again, see the penultimate row in the table).
Empirical Exercise 5.2
Calculations for this exercise are carried out in the STATA file EE_5_2.do.
(a) The estimated regression is
\(\widehat{Growth}\) = 0.96 + 1.68×Tradeshare
          (0.54)  (0.87)
The t-statistic for the slope coefficient is t = 1.68/0.87 = 1.94.
The t-statistic is larger in absolute value than the 10% critical value (1.64), but less than
the 5% and 1% critical values (1.96 and 2.58). Therefore the null hypothesis is rejected
at the 10% significance level, but not at the 5% or 1% levels.
(b) The p-value is 0.057.
(c) The 90% confidence interval is 1.68 ± 1.64×0.87, or \(0.25 \le \beta_1 \le 3.11\).
Empirical Exercise 5.3
Calculations for this exercise are carried out in the STATA file EE_5_3.do.
(a) Average birthweights, along with standard errors are shown in the table below.
(Birthweight is measured in grams.)
                 All Mothers   Non-smokers       Smokers
Mean (X-bar)            3383        3432.1        3178.8
SE(X-bar)               10.8          11.9          24.0
n                       3000          2418           582
(b) The estimated difference is \(\bar{X}_{Smokers} - \bar{X}_{NonSmokers}\) = −253.2. The standard error of the
difference is \(SE(\bar{X}_{Smokers} - \bar{X}_{NonSmokers}) = \sqrt{SE(\bar{X}_{Smokers})^2 + SE(\bar{X}_{NonSmokers})^2}\) = 26.8.
The 95% confidence interval for the difference is −253.2 ± 1.96×26.8 = (−305.9, −200.6).
(c) The estimated regression is
\(\widehat{Birthweight}\) = 3432.1 − 253.2×Smoker
                (11.9)   (26.8)
(i) The intercept is the average birthweight for non-smokers (Smoker = 0). The slope is
the difference between average birthweights for smokers (Smoker = 1) and non-smokers
(Smoker = 0).
(ii) They are the same.
(iii) This is the same as the confidence interval in (b).
(d) Yes − and we’ll investigate this more in future empirical exercises.