3.1. The central limit theorem says that when the sample size ($n$) is large, the distribution of the sample average ($\bar{Y}$) is approximately $N(\mu_Y, \sigma^2_{\bar{Y}})$ with $\sigma^2_{\bar{Y}} = \sigma^2_Y/n$. Given a population with $\mu_Y = 100$ and $\sigma^2_Y = 43.0$, we have

(a) $n = 100$, $\sigma^2_{\bar{Y}} = \sigma^2_Y/n = 43/100 = 0.43$, and
$$\Pr(\bar{Y} < 101) = \Pr\left(\frac{\bar{Y} - 100}{\sqrt{0.43}} < \frac{101 - 100}{\sqrt{0.43}}\right) \approx \Phi(1.525) = 0.9364.$$

(b) $n = 64$, $\sigma^2_{\bar{Y}} = \sigma^2_Y/n = 43/64 = 0.6719$, and
$$\Pr(101 < \bar{Y} < 103) = \Pr\left(\frac{101 - 100}{\sqrt{0.6719}} < \frac{\bar{Y} - 100}{\sqrt{0.6719}} < \frac{103 - 100}{\sqrt{0.6719}}\right) \approx \Phi(3.6599) - \Phi(1.2200) = 0.9999 - 0.8888 = 0.1111.$$

(c) $n = 165$, $\sigma^2_{\bar{Y}} = \sigma^2_Y/n = 43/165 = 0.2606$, and
$$\Pr(\bar{Y} > 98) = 1 - \Pr(\bar{Y} \le 98) = 1 - \Pr\left(\frac{\bar{Y} - 100}{\sqrt{0.2606}} \le \frac{98 - 100}{\sqrt{0.2606}}\right) \approx 1 - \Phi(-3.9178) = \Phi(3.9178) = 1.0000 \text{ (rounded to four decimal places)}.$$
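For readers who want to check the normal-approximation arithmetic above, here is a minimal sketch (not part of the original solution) using scipy.stats.norm; the variable names are illustrative.

```python
from math import sqrt
from scipy.stats import norm

mu, var = 100.0, 43.0  # population mean and variance from Exercise 3.1

def se_of_mean(n):
    """Standard deviation of the sampling distribution of Y-bar."""
    return sqrt(var / n)

# (a) Pr(Ybar < 101) with n = 100
print(norm.cdf((101 - mu) / se_of_mean(100)))            # ~0.9364
# (b) Pr(101 < Ybar < 103) with n = 64
print(norm.cdf((103 - mu) / se_of_mean(64))
      - norm.cdf((101 - mu) / se_of_mean(64)))           # ~0.1111
# (c) Pr(Ybar > 98) with n = 165
print(1 - norm.cdf((98 - mu) / se_of_mean(165)))         # ~1.0000
```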
3.2. Each random draw $Y_i$ from the Bernoulli distribution takes a value of either zero or one with probability $\Pr(Y_i = 1) = p$ and $\Pr(Y_i = 0) = 1 - p$. The random variable $Y_i$ has mean
$$E(Y_i) = 0 \cdot \Pr(Y_i = 0) + 1 \cdot \Pr(Y_i = 1) = p,$$
and variance
$$\mathrm{var}(Y_i) = E[(Y_i - \mu_Y)^2] = (0 - p)^2 \Pr(Y_i = 0) + (1 - p)^2 \Pr(Y_i = 1) = p^2(1 - p) + (1 - p)^2 p = p(1 - p).$$

(a) The fraction of successes is
$$\hat{p} = \frac{\#(\text{success})}{n} = \frac{\#(Y_i = 1)}{n} = \frac{\sum_{i=1}^{n} Y_i}{n} = \bar{Y}.$$

(b)
$$E(\hat{p}) = E\left(\frac{1}{n}\sum_{i=1}^{n} Y_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(Y_i) = \frac{1}{n}\sum_{i=1}^{n} p = p.$$

(c)
$$\mathrm{var}(\hat{p}) = \mathrm{var}\left(\frac{1}{n}\sum_{i=1}^{n} Y_i\right) = \frac{1}{n^2}\sum_{i=1}^{n} \mathrm{var}(Y_i) = \frac{1}{n^2} \cdot n \cdot p(1 - p) = \frac{p(1 - p)}{n}.$$
The second equality uses the fact that $Y_1, \ldots, Y_n$ are i.i.d. draws and $\mathrm{cov}(Y_i, Y_j) = 0$ for $i \neq j$.
3.3. Denote each voter’s preference by $Y$, with $Y = 1$ if the voter prefers the incumbent and $Y = 0$ if the voter prefers the challenger. $Y$ is a Bernoulli random variable with probability $\Pr(Y = 1) = p$ and $\Pr(Y = 0) = 1 - p$. From the solution to Exercise 3.2, $Y$ has mean $p$ and variance $p(1 - p)$.

(a) $\hat{p} = \frac{215}{400} = 0.5375.$

(b) The estimated variance of $\hat{p}$ is $\widehat{\mathrm{var}}(\hat{p}) = \frac{\hat{p}(1 - \hat{p})}{n} = \frac{0.5375 \times (1 - 0.5375)}{400} = 6.2148 \times 10^{-4}$. The standard error is $SE(\hat{p}) = (\widehat{\mathrm{var}}(\hat{p}))^{1/2} = 0.0249$.

(c) The computed t-statistic is
$$t^{act} = \frac{\hat{p} - p_0}{SE(\hat{p})} = \frac{0.5375 - 0.5}{0.0249} = 1.506.$$
Because of the large sample size ($n = 400$), we can use Equation (3.14) in the text to compute the p-value for the test $H_0\colon p = 0.5$ vs. $H_1\colon p \neq 0.5$:
$$\text{p-value} = 2\Phi(-|t^{act}|) = 2\Phi(-1.506) = 2 \times 0.066 = 0.132.$$

(d) Using Equation (3.17) in the text, the p-value for the test $H_0\colon p = 0.5$ vs. $H_1\colon p > 0.5$ is
$$\text{p-value} = 1 - \Phi(t^{act}) = 1 - \Phi(1.506) = 1 - 0.934 = 0.066.$$

(e) Part (c) is a two-sided test and the p-value is the area in the tails of the standard normal distribution outside $\pm$ (calculated t-statistic). Part (d) is a one-sided test and the p-value is the area under the standard normal distribution to the right of the calculated t-statistic.

(f) For the test $H_0\colon p = 0.5$ versus $H_1\colon p > 0.5$, we cannot reject the null hypothesis at the 5% significance level. The p-value 0.066 is larger than 0.05. Equivalently, the calculated t-statistic 1.506 is less than the critical value 1.64 for a one-sided test with a 5% significance level. The test suggests that the survey did not contain statistically significant evidence that the incumbent was ahead of the challenger at the time of the survey.
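The estimate, standard error, t-statistic, and both p-values in (b)-(d) can be reproduced with a few lines of code; the sketch below is illustrative only and assumes scipy is available.

```python
from math import sqrt
from scipy.stats import norm

n, successes, p0 = 400, 215, 0.5     # survey data from Exercise 3.3
p_hat = successes / n                # 0.5375
se = sqrt(p_hat * (1 - p_hat) / n)   # ~0.0249
t = (p_hat - p0) / se                # ~1.506

p_two_sided = 2 * norm.cdf(-abs(t))  # ~0.132, Equation (3.14)
p_one_sided = 1 - norm.cdf(t)        # ~0.066, Equation (3.17)
print(p_hat, se, t, p_two_sided, p_one_sided)
```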
3.4. Using Key Concept 3.7 in the text:

(a) The 95% confidence interval for $p$ is $\hat{p} \pm 1.96\,SE(\hat{p}) = 0.5375 \pm 1.96 \times 0.0249 = (0.4887,\ 0.5863).$

(b) The 99% confidence interval for $p$ is $\hat{p} \pm 2.57\,SE(\hat{p}) = 0.5375 \pm 2.57 \times 0.0249 = (0.4735,\ 0.6015).$

(c) Mechanically, the interval in (b) is wider because of a larger critical value (2.57 versus 1.96). Substantively, a 99% confidence interval is wider than a 95% confidence interval because a 99% confidence interval must contain the true value of $p$ in 99% of all possible samples, while a 95% confidence interval must contain the true value of $p$ in only 95% of all possible samples.

(d) Since 0.50 lies inside the 95% confidence interval for $p$, we cannot reject the null hypothesis at a 5% significance level.
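The two intervals in (a) and (b) are simple interval arithmetic; a self-contained, illustrative check:

```python
from math import sqrt

n, p_hat = 400, 0.5375
se = sqrt(p_hat * (1 - p_hat) / n)               # ~0.0249

ci_95 = (p_hat - 1.96 * se, p_hat + 1.96 * se)   # ~(0.4887, 0.5863)
ci_99 = (p_hat - 2.57 * se, p_hat + 2.57 * se)   # ~(0.4735, 0.6015)
print(ci_95, ci_99)
```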
3.5.
(a) (i) The size is given by $\Pr(|\hat{p} - 0.5| > 0.02)$, where the probability is computed assuming that $p = 0.5$:
$$\Pr(|\hat{p} - 0.5| > 0.02) = 1 - \Pr(-0.02 \le \hat{p} - 0.5 \le 0.02)$$
$$= 1 - \Pr\left(\frac{-0.02}{\sqrt{0.5 \times 0.5/1055}} \le \frac{\hat{p} - 0.5}{\sqrt{0.5 \times 0.5/1055}} \le \frac{0.02}{\sqrt{0.5 \times 0.5/1055}}\right)$$
$$= 1 - \Pr\left(-1.30 \le \frac{\hat{p} - 0.5}{\sqrt{0.5 \times 0.5/1055}} \le 1.30\right) \approx 0.19,$$
where the final equality uses the central limit theorem approximation.

(ii) The power is given by $\Pr(|\hat{p} - 0.5| > 0.02)$, where the probability is computed assuming that $p = 0.53$:
$$\Pr(|\hat{p} - 0.5| > 0.02) = 1 - \Pr(-0.02 \le \hat{p} - 0.5 \le 0.02)$$
$$= 1 - \Pr\left(\frac{-0.02}{\sqrt{0.53 \times 0.47/1055}} \le \frac{\hat{p} - 0.5}{\sqrt{0.53 \times 0.47/1055}} \le \frac{0.02}{\sqrt{0.53 \times 0.47/1055}}\right)$$
$$= 1 - \Pr\left(\frac{-0.05}{\sqrt{0.53 \times 0.47/1055}} \le \frac{\hat{p} - 0.53}{\sqrt{0.53 \times 0.47/1055}} \le \frac{-0.01}{\sqrt{0.53 \times 0.47/1055}}\right)$$
$$= 1 - \Pr\left(-3.25 \le \frac{\hat{p} - 0.53}{\sqrt{0.53 \times 0.47/1055}} \le -0.65\right) \approx 0.74,$$
where the final equality uses the central limit theorem approximation.

(b) (i) $t = \frac{0.54 - 0.5}{\sqrt{0.54 \times 0.46/1055}} = 2.61$, and $\Pr(|t| > 2.61) = 0.01$, so the null is rejected at the 5% level.
(ii) $\Pr(t > 2.61) = 0.004$, so the null is rejected at the 5% level.
(iii) $0.54 \pm 1.96\sqrt{0.54 \times 0.46/1055} = 0.54 \pm 0.03$, or 0.51 to 0.57.
(iv) $0.54 \pm 2.58\sqrt{0.54 \times 0.46/1055} = 0.54 \pm 0.04$, or 0.50 to 0.58.
(v) $0.54 \pm 0.67\sqrt{0.54 \times 0.46/1055} = 0.54 \pm 0.01$, or 0.53 to 0.55.

(c) (i) The probability that a single 95% confidence interval contains the true value is 0.95. There are 20 independent surveys, so the probability that all 20 intervals contain the true value is $0.95^{20} = 0.36$.
(ii) 95% of the 20 confidence intervals, or 19.

(d) The sample size must be chosen so that $1.96 \times SE(\hat{p}) < 0.01$. Noting that $SE(\hat{p}) = \sqrt{p(1 - p)/n}$, $n$ must be chosen so that $n > \frac{1.96^2\,p(1 - p)}{0.01^2}$, and the value of $n$ that satisfies this depends on the value of $p$. The largest value that $p(1 - p)$ can take on is 0.25 (that is, $p = 0.5$ makes $p(1 - p)$ as large as possible). Thus if $n > \frac{1.96^2 \times 0.25}{0.01^2} = 9604$, then the margin of error is less than 0.01 for all values of $p$.
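The size and power approximations in (a) and the sample-size bound in (d) can be replicated numerically; the helper below is an illustrative sketch under the same CLT approximation used in the solution.

```python
from math import sqrt
from scipy.stats import norm

n = 1055

def reject_prob(p_true, p_null=0.5, margin=0.02):
    """Pr(|p_hat - p_null| > margin) under the CLT approximation
    when the true proportion is p_true."""
    se = sqrt(p_true * (1 - p_true) / n)
    lo = (p_null - margin - p_true) / se
    hi = (p_null + margin - p_true) / se
    return 1 - (norm.cdf(hi) - norm.cdf(lo))

print(reject_prob(0.50))                # size  ~0.19
print(reject_prob(0.53))                # power ~0.74

# (d) n needed so that 1.96*sqrt(0.25/n) < 0.01 for any p
print(1.96**2 * 0.25 / 0.01**2)         # ~9604
```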
3.6. (a) No. Because the p-value is less than 0.05 (= 5%), $\mu = 5$ is rejected at the 5% level and is therefore not contained in the 95% confidence interval.
(b) No. This would require calculation of the t-statistic for $\mu = 6$, which requires $\bar{Y}$ and $SE(\bar{Y})$. Only the p-value for the test that $\mu = 5$ is given in the problem.
3.7. The null hypothesis is that the survey is a random draw from a population with $p = 0.11$. The t-statistic is $t = \frac{\hat{p} - 0.11}{SE(\hat{p})}$, where $SE(\hat{p}) = \sqrt{\hat{p}(1 - \hat{p})/n}$. (An alternative formula for $SE(\hat{p})$ is $\sqrt{0.11 \times (1 - 0.11)/n}$, which is valid under the null hypothesis that $p = 0.11$.) The value of the t-statistic is 2.71, which has a p-value that is less than 0.01. Thus the null hypothesis $p = 0.11$ (the survey is unbiased) can be rejected at the 1% level.
3.8. The 95% confidence interval is $1110 \pm 1.96 \times \frac{123}{\sqrt{1000}}$, or $1110 \pm 7.62$.
3.9. Denote the life of a light bulb from the new process by $Y$. The mean of $Y$ is $\mu$ and the standard deviation of $Y$ is $\sigma_Y = 200$ hours. $\bar{Y}$ is the sample mean with a sample size $n = 100$. The standard deviation of the sampling distribution of $\bar{Y}$ is $\sigma_{\bar{Y}} = \frac{\sigma_Y}{\sqrt{n}} = \frac{200}{\sqrt{100}} = 20$ hours. The hypothesis test is $H_0\colon \mu = 2000$ vs. $H_1\colon \mu > 2000$. The manager will accept the alternative hypothesis if $\bar{Y} > 2100$ hours.

(a) The size of a test is the probability of erroneously rejecting a null hypothesis when it is valid. The size of the manager's test is
$$\text{size} = \Pr(\bar{Y} > 2100 \mid \mu = 2000) = 1 - \Pr(\bar{Y} \le 2100 \mid \mu = 2000)$$
$$= 1 - \Pr\left(\frac{\bar{Y} - 2000}{20} \le \frac{2100 - 2000}{20} \,\Big|\, \mu = 2000\right) = 1 - \Phi(5) = 1 - 0.999999713 = 2.87 \times 10^{-7}.$$
$\Pr(\bar{Y} > 2100 \mid \mu = 2000)$ means the probability that the sample mean is greater than 2100 hours when the new process has a mean of 2000 hours.

(b) The power of a test is the probability of correctly rejecting a null hypothesis when it is invalid. We calculate first the probability $\beta$ of the manager erroneously accepting the null hypothesis when it is invalid:
$$\beta = \Pr(\bar{Y} \le 2100 \mid \mu = 2150) = \Pr\left(\frac{\bar{Y} - 2150}{20} \le \frac{2100 - 2150}{20} \,\Big|\, \mu = 2150\right)$$
$$= \Phi(-2.5) = 1 - \Phi(2.5) = 1 - 0.9938 = 0.0062.$$
The power of the manager's test is $1 - \beta = 1 - 0.0062 = 0.9938$.

(c) For a test with size 5%, the rejection region for the null hypothesis contains those values of the t-statistic exceeding 1.645:
$$t^{act} = \frac{\bar{Y}^{act} - 2000}{20} > 1.645 \;\Rightarrow\; \bar{Y}^{act} > 2000 + 1.645 \times 20 = 2032.9.$$
The manager should believe the inventor’s claim if the sample mean life of the new product is greater than 2032.9 hours if she wants the size of the test to be 5%.
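A short, illustrative check of the size, power, and 5% cutoff computed above:

```python
from scipy.stats import norm

se = 200 / 100**0.5                          # sigma_Y / sqrt(n) = 20 hours
cutoff = 2100                                # manager's decision rule

size  = 1 - norm.cdf((cutoff - 2000) / se)   # ~2.87e-7
power = 1 - norm.cdf((cutoff - 2150) / se)   # ~0.9938
crit_5pct = 2000 + 1.645 * se                # ~2032.9 hours
print(size, power, crit_5pct)
```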
3.10. (a) New Jersey sample size $n_1 = 100$, sample average $\bar{Y}_1 = 58$, and sample standard deviation $s_1 = 8$. The standard error of $\bar{Y}_1$ is $SE(\bar{Y}_1) = \frac{s_1}{\sqrt{n_1}} = \frac{8}{\sqrt{100}} = 0.8$. The 95% confidence interval for the mean score of all New Jersey third graders is
$$\mu_1 = \bar{Y}_1 \pm 1.96\,SE(\bar{Y}_1) = 58 \pm 1.96 \times 0.8 = (56.4,\ 59.6).$$

(b) Iowa sample size $n_2 = 200$, sample average $\bar{Y}_2 = 62$, sample standard deviation $s_2 = 11$. The standard error of $\bar{Y}_1 - \bar{Y}_2$ is $SE(\bar{Y}_1 - \bar{Y}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{\frac{64}{100} + \frac{121}{200}} = 1.1158$. The 90% confidence interval for the difference in mean scores between the two states is
$$\mu_1 - \mu_2 = (\bar{Y}_1 - \bar{Y}_2) \pm 1.64\,SE(\bar{Y}_1 - \bar{Y}_2) = (58 - 62) \pm 1.64 \times 1.1158 = (-5.8,\ -2.2).$$

(c) The hypothesis test for the difference in mean scores is
$$H_0\colon \mu_1 - \mu_2 = 0 \quad \text{vs.} \quad H_1\colon \mu_1 - \mu_2 \neq 0.$$
From part (b) the standard error of the difference in the two sample means is $SE(\bar{Y}_1 - \bar{Y}_2) = 1.1158$. The t-statistic for testing the null hypothesis is
$$t^{act} = \frac{\bar{Y}_1 - \bar{Y}_2}{SE(\bar{Y}_1 - \bar{Y}_2)} = \frac{58 - 62}{1.1158} = -3.5849.$$
Use Equation (3.14) in the text to compute the p-value:
$$\text{p-value} = 2\Phi(-|t^{act}|) = 2\Phi(-3.5849) = 2 \times 0.00017 = 0.00034.$$
Because of the extremely low p-value, we can reject the null hypothesis with a very high degree of confidence. That is, the population means for Iowa and New Jersey students are different.
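The two-sample interval and test can be reproduced from the summary statistics alone; the sketch below uses illustrative variable names and the same large-sample normal approximation as the text.

```python
from math import sqrt
from scipy.stats import norm

# New Jersey and Iowa summary statistics from Exercise 3.10
n1, ybar1, s1 = 100, 58.0, 8.0
n2, ybar2, s2 = 200, 62.0, 11.0

se_diff = sqrt(s1**2 / n1 + s2**2 / n2)          # ~1.1158
ci_90 = ((ybar1 - ybar2) - 1.64 * se_diff,
         (ybar1 - ybar2) + 1.64 * se_diff)       # ~(-5.8, -2.2)
t = (ybar1 - ybar2) / se_diff                    # ~-3.58
p_value = 2 * norm.cdf(-abs(t))                  # ~0.00034
print(se_diff, ci_90, t, p_value)
```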
3.11. Assume that $n$ is an even number. Then the estimator is constructed by applying a weight of $\frac{1}{2}$ to the $\frac{n}{2}$ “odd” observations and a weight of $\frac{3}{2}$ to the remaining $\frac{n}{2}$ observations.
3.12. Sample size for men $n_1 = 100$, sample average $\bar{Y}_1 = 3100$, sample standard deviation $s_1 = 200$. Sample size for women $n_2 = 64$, sample average $\bar{Y}_2 = 2900$, sample standard deviation $s_2 = 320$. The standard error of $\bar{Y}_1 - \bar{Y}_2$ is
$$SE(\bar{Y}_1 - \bar{Y}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{\frac{200^2}{100} + \frac{320^2}{64}} = 44.721.$$

(a) The hypothesis test for the difference in mean monthly salaries is
$$H_0\colon \mu_1 - \mu_2 = 0 \quad \text{vs.} \quad H_1\colon \mu_1 - \mu_2 \neq 0.$$
The t-statistic for testing the null hypothesis is
$$t^{act} = \frac{\bar{Y}_1 - \bar{Y}_2}{SE(\bar{Y}_1 - \bar{Y}_2)} = \frac{3100 - 2900}{44.721} = 4.4722.$$
Use Equation (3.14) in the text to get the p-value:
$$\text{p-value} = 2\Phi(-|t^{act}|) = 2\Phi(-4.4722) = 2 \times (3.8744 \times 10^{-6}) = 7.7488 \times 10^{-6}.$$
The extremely low p-value implies that the difference in the monthly salaries for men and women is statistically significant. We can reject the null hypothesis with a high degree of confidence.

(b) From part (a), there is overwhelming statistical evidence that mean earnings for men differ from mean earnings for women, and a related calculation shows overwhelming evidence that mean earnings for men are greater than mean earnings for women. However, by itself, this does not imply sex discrimination by the firm. Sex discrimination means that two workers, identical in every way but gender, are paid different wages. The data description suggests that some care has been taken to make sure that workers with similar jobs are being compared. But it is also important to control for characteristics of the workers that may affect their productivity (education, years of experience, etc.). If these characteristics are systematically different between men and women, then they may be responsible for the difference in mean wages. (If this is true, it raises an interesting and important question of why women tend to have less education or less experience than men, but that is a question about something other than sex discrimination by this firm.) Since these characteristics are not controlled for in the statistical analysis, it is premature to reach a conclusion about sex discrimination.
3.13. (a) Sample size $n = 420$, sample average $\bar{Y} = 646.2$, sample standard deviation $s_Y = 19.5$. The standard error of $\bar{Y}$ is $SE(\bar{Y}) = \frac{s_Y}{\sqrt{n}} = \frac{19.5}{\sqrt{420}} = 0.9515$. The 95% confidence interval for the mean test score in the population is
$$\mu = \bar{Y} \pm 1.96\,SE(\bar{Y}) = 646.2 \pm 1.96 \times 0.9515 = (644.3,\ 648.1).$$

(b) The data are: sample size for small classes $n_1 = 238$, sample average $\bar{Y}_1 = 657.4$, sample standard deviation $s_1 = 19.4$; sample size for large classes $n_2 = 182$, sample average $\bar{Y}_2 = 650.0$, sample standard deviation $s_2 = 17.9$. The standard error of $\bar{Y}_1 - \bar{Y}_2$ is $SE(\bar{Y}_1 - \bar{Y}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{\frac{19.4^2}{238} + \frac{17.9^2}{182}} = 1.8281$. The hypothesis test for higher average scores in smaller classes is
$$H_0\colon \mu_1 - \mu_2 = 0 \quad \text{vs.} \quad H_1\colon \mu_1 - \mu_2 > 0.$$
The t-statistic is
$$t^{act} = \frac{\bar{Y}_1 - \bar{Y}_2}{SE(\bar{Y}_1 - \bar{Y}_2)} = \frac{657.4 - 650.0}{1.8281} = 4.0479.$$
The p-value for the one-sided test is
$$\text{p-value} = 1 - \Phi(t^{act}) = 1 - \Phi(4.0479) = 1 - 0.999974147 = 2.5853 \times 10^{-5}.$$
With the small p-value, the null hypothesis can be rejected with a high degree of confidence. There is statistically significant evidence that the districts with smaller classes have higher average test scores.
3.14. We have the following relations: 1 in = 0.0254 m (or 1 m = 39.37 in), and 1 lb = 0.4536 kg (or 1 kg = 2.2046 lb). The summary statistics in the metric system are $\bar{X} = 70.5 \times 0.0254 = 1.79$ m; $\bar{Y} = 158 \times 0.4536 = 71.669$ kg; $s_X = 1.8 \times 0.0254 = 0.0457$ m; $s_Y = 14.2 \times 0.4536 = 6.4411$ kg; $s_{XY} = 21.73 \times 0.0254 \times 0.4536 = 0.2504$ m $\times$ kg; and $r_{XY} = 0.85$ (the correlation is unit-free and is therefore unchanged).
3.15. From the textbook equation (2.45), we know that $E(\bar{Y}) = \mu_Y$, and from (2.46) we know that $\mathrm{var}(\bar{Y}) = \frac{\sigma_Y^2}{n}$. In this problem, because $Y_a$ and $Y_b$ are Bernoulli random variables, $\hat{p}_a = \bar{Y}_a$, $\hat{p}_b = \bar{Y}_b$, $\sigma^2_{Y_a} = p_a(1 - p_a)$ and $\sigma^2_{Y_b} = p_b(1 - p_b)$. The answers to (a) follow from this.

For part (b), note that
$$\mathrm{var}(\hat{p}_a - \hat{p}_b) = \mathrm{var}(\hat{p}_a) + \mathrm{var}(\hat{p}_b) - 2\,\mathrm{cov}(\hat{p}_a, \hat{p}_b).$$
But $\hat{p}_a$ and $\hat{p}_b$ are independent (they depend on data chosen from independent samples), and this means that $\mathrm{cov}(\hat{p}_a, \hat{p}_b) = 0$. Thus $\mathrm{var}(\hat{p}_a - \hat{p}_b) = \mathrm{var}(\hat{p}_a) + \mathrm{var}(\hat{p}_b)$.

For part (c), apply equation (3.21) from the text (replacing $\bar{Y}$ with $\hat{p}$ and using the result in (b) to compute the SE).

For (d), apply the formula in (c) to obtain the 95% CI:
$$(0.859 - 0.374) \pm 1.96\sqrt{\frac{0.859(1 - 0.859)}{5801} + \frac{0.374(1 - 0.374)}{4249}}, \quad \text{or} \quad 0.485 \pm 0.017.$$
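For reference, the interval in (d) is straightforward to reproduce (illustrative sketch):

```python
from math import sqrt

p_a, n_a = 0.859, 5801
p_b, n_b = 0.374, 4249

se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
diff = p_a - p_b
print(diff, 1.96 * se)   # ~0.485 +/- ~0.017
```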
3.16. (a) The 95% confidence interval is $\bar{Y} \pm 1.96\,SE(\bar{Y})$, or $1013 \pm 1.96 \times \frac{108}{\sqrt{453}}$, or $1013 \pm 9.95$.

(b) The confidence interval in (a) does not include $\mu = 1000$, so the null hypothesis that $\mu = 1000$ (Florida students have the same average performance as students in the U.S.) can be rejected at the 5% level.

(c) (i) The 95% confidence interval is $(\bar{Y}_{prep} - \bar{Y}_{non\text{-}prep}) \pm 1.96\,SE(\bar{Y}_{prep} - \bar{Y}_{non\text{-}prep})$, where
$$SE(\bar{Y}_{prep} - \bar{Y}_{non\text{-}prep}) = \sqrt{\frac{s^2_{prep}}{n_{prep}} + \frac{s^2_{non\text{-}prep}}{n_{non\text{-}prep}}} = \sqrt{\frac{95^2}{503} + \frac{108^2}{453}} = 6.61;$$
the 95% confidence interval is $(1019 - 1013) \pm 12.96$, or $6 \pm 12.96$.
(ii) No. The 95% confidence interval includes $\mu_{prep} - \mu_{non\text{-}prep} = 0$. These data do not provide statistically significant evidence that the prep course changes the average test score.

(d) (i) Let $X$ denote the change in the test score. The 95% confidence interval for $\mu_X$ is $\bar{X} \pm 1.96\,SE(\bar{X})$, where $SE(\bar{X}) = \frac{60}{\sqrt{453}} = 2.82$; thus, the confidence interval is $9 \pm 5.52$.
(ii) Yes. The 95% confidence interval does not include $\mu_X = 0$.
(iii) Randomly select $n$ students who have taken the test only one time. Randomly select one half of these students and have them take the prep course. Administer the test again to all of the $n$ students. Compare the gain in performance of the prep-course second-time test takers to that of the non-prep-course second-time test takers.
3.17. (a) The 95% confidence interval is $(\bar{Y}_{m,2015} - \bar{Y}_{m,1996}) \pm 1.96\,SE(\bar{Y}_{m,2015} - \bar{Y}_{m,1996})$, where
$$SE(\bar{Y}_{m,2015} - \bar{Y}_{m,1996}) = \sqrt{\frac{s^2_{m,2015}}{n_{m,2015}} + \frac{s^2_{m,1996}}{n_{m,1996}}} = \sqrt{\frac{14.37^2}{1917} + \frac{11.44^2}{1387}} = 0.45;$$
the 95% confidence interval is $(28.06 - 24.87) \pm 0.88$, or $3.19 \pm 0.88 = (\$2.31,\ \$4.07)$.

(b) The 95% confidence interval is $(\bar{Y}_{w,2015} - \bar{Y}_{w,1996}) \pm 1.96\,SE(\bar{Y}_{w,2015} - \bar{Y}_{w,1996})$, where
$$SE(\bar{Y}_{w,2015} - \bar{Y}_{w,1996}) = \sqrt{\frac{s^2_{w,2015}}{n_{w,2015}} + \frac{s^2_{w,1996}}{n_{w,1996}}} = \sqrt{\frac{11.22^2}{1816} + \frac{8.93^2}{1232}} = 0.37;$$
the 95% confidence interval is $(23.04 - 20.99) \pm 0.72$, or $2.05 \pm 0.72 = (\$1.33,\ \$2.77)$.

(c) The 95% confidence interval is
$$(\bar{Y}_{m,2015} - \bar{Y}_{m,1996}) - (\bar{Y}_{w,2015} - \bar{Y}_{w,1996}) \pm 1.96\,SE[(\bar{Y}_{m,2015} - \bar{Y}_{m,1996}) - (\bar{Y}_{w,2015} - \bar{Y}_{w,1996})],$$
where
$$SE[(\bar{Y}_{m,2015} - \bar{Y}_{m,1996}) - (\bar{Y}_{w,2015} - \bar{Y}_{w,1996})] = \sqrt{\frac{14.37^2}{1917} + \frac{11.44^2}{1387} + \frac{11.22^2}{1816} + \frac{8.93^2}{1232}} = 0.58.$$
The 95% confidence interval is $(28.06 - 24.87) - (23.04 - 20.99) \pm 1.96 \times 0.58$, or $1.14 \pm 1.13 = (\$0.01,\ \$2.27)$.
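Because the four samples are independent, the variance of the difference-in-differences in (c) is the sum of the four variance terms; a small illustrative sketch:

```python
from math import sqrt

# (sample SD, sample size) for men and women in 2015 and 1996
m2015, m1996 = (14.37, 1917), (11.44, 1387)
w2015, w1996 = (11.22, 1816), (8.93, 1232)

se = sqrt(sum(s**2 / n for s, n in (m2015, m1996, w2015, w1996)))  # ~0.58
point = (28.06 - 24.87) - (23.04 - 20.99)                          # ~1.14
print(point - 1.96 * se, point + 1.96 * se)                        # ~($0.01, $2.27)
```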
3.18. $Y_1, \ldots, Y_n$ are i.i.d. with mean $\mu_Y$ and variance $\sigma^2_Y$. The covariance $\mathrm{cov}(Y_j, Y_i) = 0$ for $j \neq i$. The sampling distribution of the sample average $\bar{Y}$ has mean $\mu_Y$ and variance $\mathrm{var}(\bar{Y}) = \sigma^2_{\bar{Y}} = \frac{\sigma^2_Y}{n}$.

(a)
$$E[(Y_i - \bar{Y})^2] = E\{[(Y_i - \mu_Y) - (\bar{Y} - \mu_Y)]^2\}$$
$$= E[(Y_i - \mu_Y)^2 - 2(Y_i - \mu_Y)(\bar{Y} - \mu_Y) + (\bar{Y} - \mu_Y)^2]$$
$$= E[(Y_i - \mu_Y)^2] - 2E[(Y_i - \mu_Y)(\bar{Y} - \mu_Y)] + E[(\bar{Y} - \mu_Y)^2]$$
$$= \mathrm{var}(Y_i) - 2\,\mathrm{cov}(Y_i, \bar{Y}) + \mathrm{var}(\bar{Y}).$$

(b)
$$\mathrm{cov}(\bar{Y}, Y_i) = E[(\bar{Y} - \mu_Y)(Y_i - \mu_Y)] = E\left[\left(\frac{1}{n}\sum_{j=1}^{n} Y_j - \mu_Y\right)(Y_i - \mu_Y)\right]$$
$$= \frac{1}{n}\sum_{j=1}^{n} E[(Y_j - \mu_Y)(Y_i - \mu_Y)] = \frac{1}{n}E[(Y_i - \mu_Y)^2] + \frac{1}{n}\sum_{j \neq i} E[(Y_j - \mu_Y)(Y_i - \mu_Y)]$$
$$= \frac{\sigma^2_Y}{n} + 0 = \frac{\sigma^2_Y}{n}.$$

(c)
$$E(s^2_Y) = E\left(\frac{1}{n-1}\sum_{i=1}^{n}(Y_i - \bar{Y})^2\right) = \frac{1}{n-1}\sum_{i=1}^{n} E[(Y_i - \bar{Y})^2]$$
$$= \frac{1}{n-1}\sum_{i=1}^{n}\left[\mathrm{var}(Y_i) - 2\,\mathrm{cov}(Y_i, \bar{Y}) + \mathrm{var}(\bar{Y})\right] = \frac{1}{n-1}\sum_{i=1}^{n}\left[\sigma^2_Y - 2 \times \frac{\sigma^2_Y}{n} + \frac{\sigma^2_Y}{n}\right]$$
$$= \frac{1}{n-1}\sum_{i=1}^{n}\frac{n-1}{n}\sigma^2_Y = \sigma^2_Y.$$
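The unbiasedness result in (c), $E(s_Y^2) = \sigma_Y^2$, can be checked with a quick Monte Carlo simulation; the parameter choices and the use of normal draws below are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2, n, reps = 4.0, 10, 200_000   # illustrative choices

samples = rng.normal(loc=0.0, scale=np.sqrt(sigma2), size=(reps, n))
s2 = samples.var(axis=1, ddof=1)     # sample variance with the 1/(n-1) divisor

print(s2.mean())   # close to sigma2 = 4.0, consistent with E(s_Y^2) = sigma_Y^2
```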
3.19. (a) No. $E(Y_i^2) = \sigma^2_Y + \mu^2_Y$ and $E(Y_iY_j) = \mu^2_Y$ for $i \neq j$. Thus
$$E(\bar{Y}^2) = E\left[\left(\frac{1}{n}\sum_{i=1}^{n} Y_i\right)^2\right] = \frac{1}{n^2}\sum_{i=1}^{n} E(Y_i^2) + \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j \neq i} E(Y_iY_j) = \frac{\sigma^2_Y}{n} + \mu^2_Y.$$

(b) Yes. If $\bar{Y}$ gets arbitrarily close to $\mu_Y$ with probability approaching 1 as $n$ gets large, then $\bar{Y}^2$ gets arbitrarily close to $\mu^2_Y$ with probability approaching 1 as $n$ gets large. (As it turns out, this is an example of the “continuous mapping theorem” discussed in Chapter 18.)
3.20. Using an analysis like that in equation (3.29),
$$s_{XY} = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y}) = \frac{n}{n-1}\left[\frac{1}{n}\sum_{i=1}^{n}(X_i - \mu_X)(Y_i - \mu_Y) - (\bar{X} - \mu_X)(\bar{Y} - \mu_Y)\right].$$
Because $\bar{X} \xrightarrow{p} \mu_X$ and $\bar{Y} \xrightarrow{p} \mu_Y$, the final term converges in probability to zero.

Let $W_i = (X_i - \mu_X)(Y_i - \mu_Y)$. Note that $W_i$ is i.i.d. with mean $\sigma_{XY}$ and second moment $E[(X_i - \mu_X)^2(Y_i - \mu_Y)^2]$. Also,
$$E[(X_i - \mu_X)^2(Y_i - \mu_Y)^2] \le \sqrt{E[(X_i - \mu_X)^4]\,E[(Y_i - \mu_Y)^4]}$$
from the Cauchy-Schwarz inequality. Because $X$ and $Y$ have finite fourth moments, the second moment of $W_i$ is finite, so that it has finite variance. Thus $\frac{1}{n}\sum_{i=1}^{n} W_i \xrightarrow{p} E(W_i) = \sigma_{XY}$, and thus $s_{XY} \xrightarrow{p} \sigma_{XY}$ (because the term $\frac{n}{n-1} \to 1$).
3.21. Set $n_m = n_w = n$, and use equation (3.19) to write the squared SE of $\bar{Y}_m - \bar{Y}_w$ as
$$[SE(\bar{Y}_m - \bar{Y}_w)]^2 = \frac{s_m^2}{n} + \frac{s_w^2}{n} = \frac{\frac{1}{n-1}\sum_{i=1}^{n}(Y_{mi} - \bar{Y}_m)^2}{n} + \frac{\frac{1}{n-1}\sum_{i=1}^{n}(Y_{wi} - \bar{Y}_w)^2}{n} = \frac{\sum_{i=1}^{n}(Y_{mi} - \bar{Y}_m)^2 + \sum_{i=1}^{n}(Y_{wi} - \bar{Y}_w)^2}{n(n-1)}.$$
Similarly, using equation (3.23),
$$[SE_{pooled}(\bar{Y}_m - \bar{Y}_w)]^2 = s_{pooled}^2\left(\frac{1}{n} + \frac{1}{n}\right) = \frac{1}{2(n-1)}\left[\sum_{i=1}^{n}(Y_{mi} - \bar{Y}_m)^2 + \sum_{i=1}^{n}(Y_{wi} - \bar{Y}_w)^2\right] \times \frac{2}{n} = \frac{\sum_{i=1}^{n}(Y_{mi} - \bar{Y}_m)^2 + \sum_{i=1}^{n}(Y_{wi} - \bar{Y}_w)^2}{n(n-1)},$$
which is the same expression, so the two standard errors coincide when $n_m = n_w$.
3.22. Let $Z$ denote a standard normal random variable.

(a) Under $H_0$, $t \sim N(0,1)$ and $\Pr(t > 1.64) = \Pr(Z > 1.64) = 0.05$.

(b) When $\mu_Y = 2$, $t = (\bar{Y} - 0)/SE(\bar{Y}) = [(\bar{Y} - 2)/SE(\bar{Y})] + 2 = Z + 2 \sim N(2,1)$. Thus $\Pr(t > 1.64) = \Pr(Z + 2 > 1.64) = \Pr(Z > -0.36) = 0.64$.

(c) (i)
$$\Pr(t > 1.64) = \Pr(t > 1.64,\ \mu_Y = 0) + \Pr(t > 1.64,\ \mu_Y = 2)$$
$$= \Pr(t > 1.64 \mid \mu_Y = 0)\Pr(\mu_Y = 0) + \Pr(t > 1.64 \mid \mu_Y = 2)\Pr(\mu_Y = 2)$$
$$= 0.05 \times 0.90 + 0.64 \times 0.10 = 0.109.$$
(ii)
$$\Pr(\mu_Y = 0 \mid t > 1.64) = \frac{\Pr(\mu_Y = 0,\ t > 1.64)}{\Pr(t > 1.64)} = \frac{\Pr(\mu_Y = 0,\ t > 1.64)}{\Pr(\mu_Y = 0,\ t > 1.64) + \Pr(\mu_Y = 2,\ t > 1.64)}$$
$$= \frac{\Pr(t > 1.64 \mid \mu_Y = 0)\Pr(\mu_Y = 0)}{\Pr(t > 1.64 \mid \mu_Y = 0)\Pr(\mu_Y = 0) + \Pr(t > 1.64 \mid \mu_Y = 2)\Pr(\mu_Y = 2)}$$
$$= \frac{0.05 \times 0.90}{0.05 \times 0.90 + 0.64 \times 0.10} = 0.41.$$

(d) (i) These are the same values as in (a)-(c). The false positive rate is 0.41, or 41%.
(ii) Following the same steps as above:
$$\Pr(t > 2.57 \mid \mu_Y = 0) = \Pr(Z > 2.57) = 0.005,$$
$$\Pr(t > 2.57 \mid \mu_Y = 2) = \Pr(Z + 2 > 2.57) = \Pr(Z > 0.57) = 0.28.$$
Thus
$$\Pr(\mu_Y = 0 \mid t > 2.57) = \frac{\Pr(\mu_Y = 0,\ t > 2.57)}{\Pr(\mu_Y = 0,\ t > 2.57) + \Pr(\mu_Y = 2,\ t > 2.57)}$$
$$= \frac{\Pr(t > 2.57 \mid \mu_Y = 0)\Pr(\mu_Y = 0)}{\Pr(t > 2.57 \mid \mu_Y = 0)\Pr(\mu_Y = 0) + \Pr(t > 2.57 \mid \mu_Y = 2)\Pr(\mu_Y = 2)}$$
$$= \frac{0.005 \times 0.90}{0.005 \times 0.90 + 0.28 \times 0.10} = 0.14.$$
The false positive rate is 0.14, or 14%.
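A minimal sketch of the Bayes-rule arithmetic behind (c)(ii) and (d), using the priors and the two sampling distributions from the exercise:

```python
from scipy.stats import norm

prior_null, prior_alt = 0.90, 0.10   # Pr(mu_Y = 0), Pr(mu_Y = 2)

def false_positive_rate(crit):
    """Pr(mu_Y = 0 | t > crit) when t ~ N(0,1) under the null
    and t ~ N(2,1) under the alternative."""
    p_reject_null = 1 - norm.cdf(crit)        # Pr(t > crit | mu_Y = 0)
    p_reject_alt  = 1 - norm.cdf(crit - 2)    # Pr(t > crit | mu_Y = 2)
    num = p_reject_null * prior_null
    return num / (num + p_reject_alt * prior_alt)

print(false_positive_rate(1.64))   # ~0.41
print(false_positive_rate(2.57))   # ~0.14
```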
Empirical Exercise 3.1
Calculations for this exercise are carried out in the STATA file EE_3_1.do.
(a)
Average Hourly Earnings, Nominal $’s
Mean SE(Mean) 95% Confidence Interval
AHE1996 12.69 0.081 12.53 − 12.85
AHE2015 21.23 0.144 20.96 − 21.52
Difference SE(Difference) 95% Confidence Interval
AHE2015 − AHE1996 8.54 0.165 8.22 − 8.87
(b)
Average Hourly Earnings, Real $2015
Mean SE(Mean) 95% Confidence Interval
AHE1996 19.17 0.123 18.93 – 19.41
AHE2015 21.23 0.144 20.96 − 21.52
Difference SE(Difference) 95% Confidence Interval
AHE2015 − AHE1996 2.06 0.189 1.69 – 2.44
(c) The results from part (b) adjust for changes in purchasing power. These results should be
used.
(d)
Average Hourly Earnings in 2015
Mean SE(Mean) 95% Confidence Interval
High School 16.38 0.147 16.09 – 16.67
College 25.62 0.216 25.19 – 26.04
Difference SE(Difference) 95% Confidence Interval
College-High School 9.23 0.227 8.72 – 9.75
(e)
Average Hourly Earnings in 1996 (in $2015)
Mean SE(Mean) 95% Confidence Interval
High School 16.27 0.130 16.01 – 16.52
College 23.04 0.205 22.64 – 23.44
Difference SE(Difference) 95% Confidence Interval
College-High School 6.77 0.243 6.29 – 7.34
(f)
Average Hourly Earnings in 2015
Mean SE(Mean) 95% Confidence Interval
AHEHS,2015 − AHEHS,1996 0.11 0.196 -0.27 – 0.50
AHECol,2015 − AHECol,1996 2.58 0.298 1.99 – 3.16
Col-HS Gap (1996) 6.77 0.243 6.29 – 7.34
Col-HS Gap (2015) 9.23 0.227 8.72 – 9.75
Difference SE(Difference) 95% Confidence Interval
Gap2015 − Gap1996 2.46 0.363 1.75 – 3.17
Wages of high school graduates increased by an estimated 0.11 dollars per hour (with a 95% confidence interval of −0.27 to 0.50); wages of college graduates increased by an estimated 2.58 dollars per hour (with a 95% confidence interval of 1.99 to 3.16). The College-High School gap increased by an estimated 2.46 dollars per hour (with a 95% confidence interval of 1.75 to 3.17).
(g)
Gender Gap in Earnings for High School Graduates
Year  Mean (men)  s (men)  n (men)  Mean (women)  s (women)  n (women)  Difference  SE(Difference)  95% Confidence Interval
1996  17.78  8.24  2168  13.77  5.83  1316  4.02  0.24  3.54 – 4.48
2015  17.50  9.03  2222  14.21  7.00  1143  3.29  0.28  2.74 – 3.84
There is a large and statistically significant gender gap in earnings for high school graduates. In 2015 the estimated gap was $3.29 per hour; in 1996 the estimated gap was even larger, $4.02 per hour (in $2015). The estimated gender gap in 2015 is smaller than for college graduates (which is $5.02 in Table 3.1 in the text) in absolute terms, but similar in relative terms (18% for both high school and college graduates).
Empirical Exercise 3.2
Calculations for this exercise are carried out in the STATA file EE_3_2.do.
(a) For part (iii): A person will trade if she received good A but prefers good B, or she received good B but prefers good A. 50% received good A, and of these (100 − X)% prefer good B; 50% received good B, and of these X% prefer good A. Let x = X/100. Thus the
“Expected Fraction Traded” = 0.5 × (1 − x) + 0.5 × x = 0.5.
To answer part (i) use x = 0, and for part (ii) use x = 0.50. Note that regardless of the value of x, the expected fraction traded is equal to 50%.
(b) The fraction of trades in the sample was 0.338, with a standard error of 0.039. The 95%
confidence interval for the population fraction is 0.261 to 0.415; this set does not include 0.50, so
there is statistically significant evidence against the null hypothesis of no “endowment effect,”
that is that the true fraction is 0.50. (Alternatively the t-statistic for the null hypothesis is t =
(0.338 − 0.50)/ 0.039 = −4.15, which is larger in absolute value than the two-sided 5% critical
value of 1.96).
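The confidence interval and t-statistic in (b) can be reproduced from the reported fraction and standard error; the sketch below takes those reported values as given rather than recomputing them from the raw data.

```python
frac_traded, se = 0.338, 0.039   # reported sample fraction and its standard error

ci_95 = (frac_traded - 1.96 * se, frac_traded + 1.96 * se)  # ~(0.261, 0.415)
t = (frac_traded - 0.50) / se                               # ~-4.15
print(ci_95, t, abs(t) > 1.96)                              # reject the null of 0.50
```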
(c) The following table shows the difference between Traders and non-Traders, where $\bar{x}$ denotes the sample average of the binary variable that is equal to 1 if a trade was made.
$\bar{x}$  SE($\bar{x}$)  95% Confidence Interval for the population mean
Traders  0.45  0.06  0.33 – 0.56
Non-Traders  0.23  0.05  0.13 – 0.33
Difference  0.22  0.08  0.07 – 0.37
In the sample, 45% of the Traders made trades, while only 23% of the non-Traders made trades.
The estimate for Traders is not statistically significantly different from 0.5, so there is not
statistically significant evidence that Traders are influenced by an endowment effect. The results
suggest a large and statistically significant difference in the behavior of Traders and non-Traders.
More Related Content

Similar to Chapitre03_Solutions.pdf

C2 st lecture 11 the t-test handout
C2 st lecture 11   the t-test handoutC2 st lecture 11   the t-test handout
C2 st lecture 11 the t-test handout
fatima d
 
C2 st lecture 10 basic statistics and the z test handout
C2 st lecture 10   basic statistics and the z test handoutC2 st lecture 10   basic statistics and the z test handout
C2 st lecture 10 basic statistics and the z test handout
fatima d
 

Similar to Chapitre03_Solutions.pdf (20)

Estimating a Population Standard Deviation or Variance
Estimating a Population Standard Deviation or VarianceEstimating a Population Standard Deviation or Variance
Estimating a Population Standard Deviation or Variance
 
Estimating a Population Standard Deviation or Variance
Estimating a Population Standard Deviation or Variance Estimating a Population Standard Deviation or Variance
Estimating a Population Standard Deviation or Variance
 
Week8 Live Lecture for Final Exam
Week8 Live Lecture for Final ExamWeek8 Live Lecture for Final Exam
Week8 Live Lecture for Final Exam
 
U unit8 ksb
U unit8 ksbU unit8 ksb
U unit8 ksb
 
Sampling Theory Part 3
Sampling Theory Part 3Sampling Theory Part 3
Sampling Theory Part 3
 
Confidence Level and Sample Size
Confidence Level and Sample SizeConfidence Level and Sample Size
Confidence Level and Sample Size
 
Lec. 10: Making Assumptions of Missing data
Lec. 10: Making Assumptions of Missing dataLec. 10: Making Assumptions of Missing data
Lec. 10: Making Assumptions of Missing data
 
Mech ma6452 snm_notes
Mech ma6452 snm_notesMech ma6452 snm_notes
Mech ma6452 snm_notes
 
Week8 livelecture2010 follow_up
Week8 livelecture2010 follow_upWeek8 livelecture2010 follow_up
Week8 livelecture2010 follow_up
 
Student t t est
Student t t estStudent t t est
Student t t est
 
Chi square
Chi squareChi square
Chi square
 
C2 st lecture 11 the t-test handout
C2 st lecture 11   the t-test handoutC2 st lecture 11   the t-test handout
C2 st lecture 11 the t-test handout
 
5-Propability-2-87.pdf
5-Propability-2-87.pdf5-Propability-2-87.pdf
5-Propability-2-87.pdf
 
Chapitre02_Solutions.pdf
Chapitre02_Solutions.pdfChapitre02_Solutions.pdf
Chapitre02_Solutions.pdf
 
C2 st lecture 10 basic statistics and the z test handout
C2 st lecture 10   basic statistics and the z test handoutC2 st lecture 10   basic statistics and the z test handout
C2 st lecture 10 basic statistics and the z test handout
 
Discrete and continuous probability distributions ppt @ bec doms
Discrete and continuous probability distributions ppt @ bec domsDiscrete and continuous probability distributions ppt @ bec doms
Discrete and continuous probability distributions ppt @ bec doms
 
t-z-chi-square tests of sig.pdf
t-z-chi-square tests of sig.pdft-z-chi-square tests of sig.pdf
t-z-chi-square tests of sig.pdf
 
Chapitre04_Solutions.pdf
Chapitre04_Solutions.pdfChapitre04_Solutions.pdf
Chapitre04_Solutions.pdf
 
Ali, Redescending M-estimator
Ali, Redescending M-estimator Ali, Redescending M-estimator
Ali, Redescending M-estimator
 
Chapter07.pdf
Chapter07.pdfChapter07.pdf
Chapter07.pdf
 

Recently uploaded

Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
kauryashika82
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
ciinovamais
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Krashi Coaching
 

Recently uploaded (20)

Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdf
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajan
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpin
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 

Chapitre03_Solutions.pdf

  • 1. _____________________________________________________________________________________________________ 1 3.1. The central limit theorem says that when the sample size ( n ) is large, the distribution of the sample average (Y ) is approximately 2 , Y Y N         with 2 2 . Y n Y   = Given a population 100, Y  = 2 43 0, Y  =  we have (a) 100, n = 2 2 43 100 0 43, Y n Y   = = =  and 100 101 100 Pr ( 101) Pr (1.525) 0 9364 0 43 0 43 Y Y   − −  =    =         (b) 64, n = 2 2 43 64 64 0 6719, Y Y   = = =  and 101 100 100 103 100 Pr(101 103) Pr 0 6719 0 6719 0 6719 (3 6599) (1 2200) 0 9999 0 8888 0 1111 Y Y   − − −   =             −   =  −  =   (c) 165, n = 2 2 43 165 0 2606, Y n Y   = = =  and 100 98 100 Pr ( 98) 1 Pr ( 98) 1 Pr 0 2606 0 2606 1 ( 3 9178) (3 9178) 1 0000 (rounded to four decimal places) Y Y Y   − −  = −  = −         −  −  =   =  
  • 2. _____________________________________________________________________________________________________ 2 3.2. Each random draw i Y from the Bernoulli distribution takes a value of either zero or one with probability Pr( 1) i Y p = = and Pr( 0) 1 . i Y p = = − The random variable i Y has mean ( ) 0 Pr( 0) 1 Pr( 1) , i E Y Y Y p =  = +  = = and variance 2 2 2 2 2 var( ) [( ) ] (0 ) Pr( 0) (1 ) Pr( 1) (1 ) (1 ) (1 ) i i Y i i Y E Y p Y p Y p p p p p p  = − = −  = + −  = = − + − = −  (a) The fraction of successes is 1 ( 1) (success) ˆ n i i i Y # Y # p Y n n n =  = = = = =  (b) 1 1 1 1 1 ˆ ( ) ( ) n n n i i i i i Y E p E E Y p p n n n = = =    = = = =          (c) 1 2 2 1 1 1 1 (1 ) ˆ var( ) var var( ) (1 ) n n n i i i i i Y p p p Y p p n n n n = = =    − = = = − =          The second equality uses the fact that 1 Y , , Yn are i.i.d. draws and cov( , ) 0, i j Y Y = for . i j 
  • 3. _____________________________________________________________________________________________________ 3 3.3. Denote each voter’s preference by Y, with Y = 1 if the voter prefers the incumbent and Y = 0 if the voter prefers the challenger. Y is a Bernoulli random variable with probability Pr( 1) Y p = = and Pr( 0) 1 . Y p = = − From the solution to Exercise 3.2, Y has mean p and variance (1 ). p p − (a) 215 400 ˆ 0 5375. p = =  (b) The estimated variance of p̂ is The standard error is SE 1 2 ˆ ˆ ( ) (var( )) 0 0249. p p = =  (c) The computed t-statistic is 0 ˆ 0 5375 0 5 1 506 ˆ SE( ) 0 0249 p act p t p   −  −  = = =    Because of the large sample size ( 400), n = we can use Equation (3.14) in the text to compute the p-value for the test 0 0 5 H p  =  vs. 1 0 5: H p    -value 2 ( | |) 2 ( 1 506) 2 0 066 0 132 act p t =  − =  −  =   =   (d) Using Equation (3.17) in the text, the p-value for the test 0 0 5 H p  =  vs. 1 0 5 H p    is -value 1 ( ) 1 (1 506) 1 0 934 0 066 act p t = −  = −   = −  =   (e) Part (c) is a two-sided test and the p-value is the area in the tails of the standard normal distribution outside  (calculated t-statistic). Part (d) is a one-sided test and the p-value is the area under the standard normal distribution to the right of the calculated t-statistic. (f) For the test H0: p = 0.5 versus H1: p > 0.5, we cannot reject the null hypothesis at the 5% significance level. The p-value 0.066 is larger than 0.05. Equivalently the calculated t-statistic 1.506 is less than the critical value 1.64 for a one-sided test with a 5% significance level. The test suggests that the survey did not contain statistically significant evidence that the incumbent was ahead of the challenger at the time of the survey.
  • 4. _____________________________________________________________________________________________________ 4 3.4. Using Key Concept 3.7 in the text (a) 95% confidence interval for p is ˆ ˆ 1.96 ( ) 0.5375 1.96 0.0249 (0.4887,0.5863). p SE p  =   = (b) 99% confidence interval for p is ˆ ˆ 2.57 ( ) 0.5375 2.57 0.0249 (0.4735,0.6015). p SE p  =   = (c) Mechanically, the interval in (b) is wider because of a larger critical value (2.57 versus 1.96). Substantively, a 99% confidence interval is wider than a 95% confidence because a 99% confidence interval must contain the true value of p in 99% of all possible samples, while a 95% confidence interval must contain the true value of p in only 95% of all possible samples. (d) Since 0.50 lies inside the 95% confidence interval for p, we cannot reject the null hypothesis at a 5% significance level.
  • 5. _____________________________________________________________________________________________________ 5 3.5. (a) (i) The size is given by ˆ Pr(| 0.5| .02), p −  where the probability is computed assuming that 0.5. p = Pr(|p̂ - 0.5|> .02) =1- Pr(-0.02 £ p̂ - 0.5 £ .02) = 1- Pr -0.02 .5´.5/1055 £ p̂ - 0.5 .5´.5/1055 £ 0.02 .5´.5/1055 æ è ç ö ø ÷ = 1- Pr -1.30 £ p̂ - 0.5 .5´.5/1055 £1.30 æ è ç ö ø ÷ » 0.19 where the final equality using the central limit theorem approximation. (ii) The power is given by ˆ Pr(| 0.5| .02), p −  where the probability is computed assuming that p = 0.53. Pr(|p̂ - 0.5| > .02) =1- Pr(-0.02 £ p̂ - 0.5 £ .02) = 1- Pr -0.02 .53´.47/1055 £ p̂ - 0.5 .53´.47/1055 £ 0.02 .53´.47/1055 æ è ç ö ø ÷ = 1- Pr -0.05 .53´.47/1055 £ p̂ - 0.53 .53´.47/1055 £ -0.01 .53´.47/1055 æ è ç ö ø ÷ = 1- Pr -3.25 £ p̂ - 0.53 .53´.47/1055 £ -0.65 æ è ç ö ø ÷ » 0.74 where the final equality using the central limit theorem approximation. (b) (i) 0.54 0.5 0.54 0.46/1055 2.61, Pr(| | 2.61) .01, t t −  = =  = so that the null is rejected at the 5% level. (ii) Pr( 2.61) .004, t  = so that the null is rejected at the 5% level. (iii) 0.54 1.96 0.54 0.46 /1055 0.54 0.03, or 0.51 to 0.57.   =  (iv) 0.54 2.58 0.54 0.46 /1055 0.54 0.04, or 0.50 to 0.58.   =  (v) 0.54 0.67 0.54 0.46 /1055 0.54 0.01, or 0.53 to 0.55.   = 
  • 6. _____________________________________________________________________________________________________ 6 3.5 (continued) (c) (i) The probability is 0.95 is any single survey, there are 20 independent surveys, so the probability if 20 0.95 0.36 = (ii) 95% of the 20 confidence intervals or 19. (d) The sample size must be chosen so that 1.96×SE( p̂) < 0.01. Noting that SE( p̂) = p(1- p)/n then n must be chosen so that n > 1.962 p(1- p) .012 , and the value of n that satisties this depends on the value of p. The largest value that p(1 − p) can take on is 0.25 (that is, p = 0.5 makes p(1 − p) as large as possible). Thus if 2 2 1.96 0.25 .01 9604, n   = then the margin of error is less than 0.01 for all values of p.
  • 7. _____________________________________________________________________________________________________ 7 3.6. (a) No. Because the p-value is less than 0.05 (=5%),  = 5 is rejected at the 5% level and is therefore not contained in the 95% confidence interval. (b) No. This would require calculation of the t-statistic for  = 6, which requires Y and SE( ). Y Only the p-value for test that  = 5 is given in the problem.
  • 8. _____________________________________________________________________________________________________ 8 3.7. The null hypothesis is that the survey is a random draw from a population with p 0.11. The t-statistic is ˆ 0.11 ˆ ( ) , p SE p t − = where ˆ ˆ ˆ ( ) (1 )/ . SE p p p n = − (An alternative formula for SE( p̂ ) is 0.11 (1 0.11) / , n  − which is valid under the null hypothesis that 0.11). p = The value of the t-statistic is 2.71, which has a p-value that is less than 0.01. Thus the null hypothesis 0.11 p = (the survey is unbiased) can be rejected at the 1% level.
  • 10. _____________________________________________________________________________________________________ 10 3.9. Denote the life of a light bulb from the new process by . Y The mean of Y is  and the standard deviation of Y is 200 Y  = hours. Y is the sample mean with a sample size 100. n = The standard deviation of the sampling distribution of Y is 200 100 20 Y Y n   = = = hours. The hypothesis test is 0 : 2000 H  = vs. 1 2000 . H    The manager will accept the alternative hypothesis if 2100 Y  hours. (a) The size of a test is the probability of erroneously rejecting a null hypothesis when it is valid. The size of the manager’s test is 7 size Pr( 2100| 2000) 1 Pr( 2100| 2000) 2000 2100 2000 1 Pr | 2000 20 20 1 (5) 1 0 999999713 2 87 10 Y Y Y    − =  = = −  =   − − = −  =     = −  = −  =    Pr( 2100| 2000) Y   = means the probability that the sample mean is greater than 2100 hours when the new process has a mean of 2000 hours. (b) The power of a test is the probability of correctly rejecting a null hypothesis when it is invalid. We calculate first the probability of the manager erroneously accepting the null hypothesis when it is invalid: 2150 2100 2150 Pr( 2100| 2150) Pr | 2150 20 20 ( 2 5) 1 (2 5) 1 0 9938 0 0062 Y Y      − − =  = =  =     =  −  = −   = −  =   The power of the manager’s testing is 1 1 0 0062 0 9938.  − = −  =  (c) For a test with size 5%, the rejection region for the null hypothesis contains those values of the t-statistic exceeding 1.645. 2000 1 645 2000 1 645 20 2032 9 20 act act act Y t Y − =     +   =   The manager should believe the inventor’s claim if the sample mean life of the new product is greater than 2032.9 hours if she wants the size of the test to be 5%.
  • 11. _____________________________________________________________________________________________________ 11 3.10. (a) New Jersey sample size 1 100, n = sample average 1 Y = 58, and sample standard deviation s1 = 58. The standard error of 1 Y is SE( 1 Y ) = 1 1 8 100 s n = = 0.8. The 95% confidence interval for the mean score of all New Jersey third graders is m1 = Y1 ±1.96SE(Y1 ) = 58±1.96´0.8= (56.4,59.6). (b) Iowa sample size 2 200, n = sample average 2 62, Y = sample standard deviation 2 11. s = The standard error of 1 2 Y Y − is SE 2 2 1 2 1 2 64 121 1 2 100 200 ( ) 1 1158. s s n n Y Y − = + = + =  The 90% confidence interval for the difference in mean score between the two states is m1 - m2 = (Y1 -Y2 ) ±1.64SE(Y1 -Y2 ) = (58- 62) ±1.64´1.1158 = (-5.8,- 2.2). (c) The hypothesis tests for the difference in mean scores is 0 1 2 1 1 2 0 vs 0 H H      − =   −   From part (b) the standard error of the difference in the two sample means is SE 1 2 ( ) 1 1158. Y Y − =  The t-statistic for testing the null hypothesis is 1 2 1 2 58 62 3 5849 SE( ) 1 1158 act Y Y t Y Y − − = = = −   −  Use Equation (3.14) in the text to compute the p-value: value 2 ( | |) 2 ( 3 5849) 2 0 00017 0 00034 act p t − =  − =  −  =   =   Because of the extremely low p-value, we can reject the null hypothesis with a very high degree of confidence. That is, the population means for Iowa and New Jersey students are different.
  • 12. _____________________________________________________________________________________________________ 12 3.11. Assume that n is an even number. Then is constructed by applying a weight of 1 2 to the 2 n “odd” observations and a weight of 3 2 to the remaining 2 n observations.
  • 13. _____________________________________________________________________________________________________ 13 3.12. Sample size for men 1 100, n = sample average 1 3100, Y = sample standard deviation 1 200. s = Sample size for women 2 64, n = sample average 2 2900, Y = sample standard deviation 2 320. s = The standard error of 1 2 Y Y − is: SE( 1 Y - 2 Y ) = s1 2 n1 + s2 2 n2 = 2002 100 + 3202 64 = 44.721. (a) The hypothesis test for the difference in mean monthly salaries is 0 1 2 1 1 2 0 vs 0 H H      − =   −   The t-statistic for testing the null hypothesis is 1 2 1 2 3100 2900 4 4722 SE( ) 44 721 act Y Y t Y Y − − = = =   −  Use Equation (3.14) in the text to get the p-value: 6 6 -value 2 ( | |) 2 ( 4 4722) 2 (3 8744 10 ) 7 7488 10 act p t − − =  − =  −  =    =    The extremely low level of p-value implies that the difference in the monthly salaries for men and women is statistically significant. We can reject the null hypothesis with a high degree of confidence. (b) From part (a), there is overwhelming statistical evidence that mean earnings for men differ from mean earnings for women, and a related calculation shows overwhelming evidence that mean earning for men are greater that mean earnings for women. However, by itself, this does not imply sex discrimination by the firm. Sex discrimination means that two workers, identical in every way but gender, are paid different wages. The data description suggests that some care has been taken to make sure that workers with similar jobs are being compared. But, it is also important to control for characteristics of the workers that may affect their productivity (education, years of experience, etc.). If these characteristics are systematically different between men and women, then they may be responsible for the difference in mean wages. (If this is true, it raises an interesting and important question of why women tend to have less education or less experience than men, but that is a question about something other than sex discrimination by this firm.) Since these characteristics are not controlled for in the statistical analysis, it is premature to reach a conclusion about sex discrimination.
  • 14. _____________________________________________________________________________________________________ 14 3.13 (a) Sample size 420, n = sample average Y = 646.2 sample standard deviation 19 5. Y s =  The standard error of Y is SE 19 5 420 ( ) 0 9515. Y s n Y  = = =  The 95% confidence interval for the mean test score in the population is m = Y ±1.96SE(Y) = 646.2±1.96´0.9515= (644.3, 648.1). (b) The data are: sample size for small classes 1 238, n = sample average 1 657 4, Y =  sample standard deviation 1 19 4; s =  sample size for large classes 2 182, n = sample average 2 650 0, Y =  sample standard deviation 2 17 9. s =  The standard error of 1 2 Y Y − is 2 2 2 2 1 2 1 2 19 4 17 9 1 2 238 182 ( ) 1 8281. s s n n SE Y Y   − = + = + =  The hypothesis tests for higher average scores in smaller classes is 0 1 2 1 1 2 0 vs 0 H H      − =   −   The t-statistic is 1 2 1 2 657 4 650 0 4 0479 SE( ) 1 8281 act Y Y t Y Y −  −  = = =   −  The p-value for the one-sided test is: 5 -value 1 ( ) 1 (4 0479) 1 0 999974147 2 5853 10 act p t − = −  = −   = −  =    With the small p-value, the null hypothesis can be rejected with a high degree of confidence. There is statistically significant evidence that the districts with smaller classes have higher average test scores.
  • 15. _____________________________________________________________________________________________________ 15 3.14. We have the following relations: 1 0 0254 in m =  (or 1 39 37 ), m in =  1 0 4536 lb kg =  (or 1 2.2046 ). kg lb = The summary statistics in the metric system are 70 5 0 0254 1 79 ; X m =    =  158 0 4536 71 669 ; Y kg =   =  1 8 0 0254 0 0457 ; X s m =    =  14 2 0 4536 6 4411 ; Y s kg =    =  21 73 0 0254 0 4536 0 2504 , XY s m kg =      =   and 0 85. XY r = 
  • 16. _____________________________________________________________________________________________________ 16 3.15. From the textbook equation (2.45), we know that E(Y ) = Y and from (2.46) we know that var(Y ) = 2 Y n  . In this problem, because Ya and Yb are Bernoulli random variables, ˆa p = a Y , ˆb p = b Y , 2 Ya  = pa(1–pa) and 2 Yb  = pb(1–pb). The answers to (a) follow from this. For part (b), note that var( ˆa p – ˆb p ) = var( ˆa p ) + var( ˆb p ) – 2cov( ˆa p , ˆb p ). But, ˆa p and ˆb p are independent (they depend on data chosen from independent samples), and this means that cov( ˆa p , ˆb p ) = 0. Thus var( ˆa p – ˆb p ) = var( ˆa p ) + var( ˆb p ). For part (c), equation (3.21) from the text (replacing Y with p̂ and using the result in (b) to compute the SE). For (d), apply the formula in (c) to obtain 95% CI is (.859 – .374) ± 1.96 0.859(1 0.859) 0.374(1 0.374) 5801 4249 − − + or 0.485 ± 0.017.
  • 17. _____________________________________________________________________________________________________ 17 3.16. (a) The 95% confidence interval if 108 453 1.96 SE( ) or 1013 1.96 or 1013 9.95. Y Y     (b) The confidence interval in (a) does not include  = 1000, so the null hypothesis that  = 1000 (Florida students have the same average performance as students in the U.S.) can be rejected at the 5% level. (c) (i) The 95% confidence interval is 1.96 ( ) prep Non prep prep Non prep Y Y SE Y Y − − −  − where 2 2 2 2 95 108 503 453 SE( ) 6.61; prep non prep prep non prep S S prep Non prep n n Y Y − − − − = + = + = the 95% confidence interval is (1019 1013) 12.96 6 12.96. or −   (ii) No. The 95% confidence interval includes mprep - mnon-prep = 0. These data do not provide statistically significant evidence that the prep-course changes the average test score. (d) (i) Let X denote the change in the test score. The 95% confidence interval for X is 60 453 1.96 ( ), where SE( ) 2.82; X SE X X  = = thus, the confidence interval is 9 5.52.  (ii) Yes. The 95% confidence interval does not include X = 0. (iii) Randomly select n students who have taken the test only one time. Randomly select one half of these students and have them take the prep course. Administer the test again to all of the n students. Compare the gain in performance of the prep-course second-time test takers to the non-prep-course second-time test takers.
  • 18. _____________________________________________________________________________________________________ 18 3.17. (a) The 95% confidence interval is Ym, 2015 -Ym,1996 ±1.96 SE(Ym, 2015 -Ym,1996 ) where SE(Ym, 2015 -Ym,1996 ) = Sm, 2015 2 nm, 2015 + Sm,1996 2 nm,1996 = 14.372 1917 + 11.442 1387 = 0.45; the 95% confidence interval is (28.06 – 24.87) ± 0.88 or 3.19 ± 0.88 = ($2.31, 4.07). (b) The 95% confidence interval is Yw, 2015 -Yw,1996 ±1.96 SE(Yw, 2015 -Yw,1996 ) where SE(Yw, 2015 -Yw,1996 ) = Sw, 2015 2 nw, 2015 + Sw,1996 2 nw,1996 = 11.222 1816 + 8.932 1232 = 0.37; the 95% confidence interval is (23.04-20.99) ± 0.72 or 2.05 ± 0.72 = ($1.33, 2.77). (c) The 95% confidence interval is (Ym, 2015 -Ym,1996 )-(Yw, 2015 -Yw,1996 ) ±1.96 SE[(Ym, 2015 -Ym,1996 )- (Yw, 2015 -Yw,1996 )], where (Ym, 2015 - Ym,1996 ) - (Yw, 2015 -Yw,1996 ) = Sm, 2015 2 nm, 2015 + Sm,1996 2 nm,1996 + Sw, 2015 2 nw, 2015 + Sw,1996 2 nw,1996 = 14.372 1917 + 11.442 1387 + 11.222 1816 + 8.932 1232 = 0.58. The 95% confidence interval is (28.06 – 24.87) − (23.04-20.99) ± 1.96  0.58 or 1.14 ± 1.13 = ($0.01, 2.17)
  • 19. _____________________________________________________________________________________________________ 19 3.18. 1 n Y … Y   are i.i.d. with mean Y  and variance 2 . Y  The covariance cov( , ) 0, j i Y Y = . j i  The sampling distribution of the sample average Y has mean Y  and variance var 2 2 ( ) . Y n Y Y   = = (a) 2 2 2 2 2 2 [( ) ] {[( ) ( )] } [( ) 2( )( ) ( ) ] [( ) ] 2 [( )( )] [( ) ] var( ) 2cov( , ) var( ). i i Y Y i Y i Y Y Y i Y i Y Y Y i i E Y Y E Y Y E Y Y Y Y E Y E Y Y E Y Y Y Y Y           − = − − − = − − − − + − = − − − − + − = − + (b) 1 1 2 2 2 cov( ) [( )( )] ( ) ( ) 1 1 ( ) [( ) [( )( )] 1 1 cov( , ) n j j Y i Y Y i Y n j j Y i Y j Y i Y I Y j i Y Y j i j i Y Y Y E Y Y E Y n Y E Y E Y E Y Y n n n Y Y n n n                    =                     =               = − − = − −    − = − = − + − −       = + =    (c) E sY 2 æ è ç ö ø ÷ = E 1 n -1 i=1 n å(Yi -Y )2 æ è ç ö ø ÷ = 1 n -1 i=1 n åE[(Yi -Y )2 ] = 1 n -1 i=1 n å[var(Yi ) - 2cov(Yi , Y ) + var(Y )] = 1 n -1 i=1 n å sY 2 - 2 ´ sY 2 n + sY 2 n é ë ê ù û ú = 1 n -1 i=1 n å n -1 n sY 2 æ è ç ç ç ö ø ÷ ÷ ÷ = sY 2 .
  • 20. _____________________________________________________________________________________________________ 20 3.19. (a) No. 2 2 2 2 ( ) and ( ) for . i Y Y i j Y E Y E YY i j    = + =  Thus 2 2 2 2 2 2 2 1 1 1 1 1 1 1 ( ) ( ) ( ) n n n i i i j Y Y i i i j i E Y E Y E Y E YY n n n n   = = =    = = + = +        (b) Yes. If Y gets arbitrarily close to Y with probability approaching 1 as n gets large, then 2 Y gets arbitrarily close to 2 Y  with probability approaching 1 as n gets large. (As it turns out, this is an example of the “continuous mapping theorem” discussed in Chapter 18.)
3.20. Using analysis like that in Equation (3.29),
$$
s_{XY} = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y}) = \frac{n}{n-1}\left[\frac{1}{n}\sum_{i=1}^{n}(X_i - \mu_X)(Y_i - \mu_Y) - (\bar{X} - \mu_X)(\bar{Y} - \mu_Y)\right].
$$
Because $\bar{X} \xrightarrow{p} \mu_X$ and $\bar{Y} \xrightarrow{p} \mu_Y$, the final term converges in probability to zero. Let $W_i = (X_i - \mu_X)(Y_i - \mu_Y)$. Note that $W_i$ is i.i.d. with mean $\sigma_{XY}$ and second moment $E[(X_i - \mu_X)^2(Y_i - \mu_Y)^2]$. Also,
$$
E[(X_i - \mu_X)^2(Y_i - \mu_Y)^2] \leq \sqrt{E[(X_i - \mu_X)^4]\,E[(Y_i - \mu_Y)^4]}
$$
by the Cauchy–Schwarz inequality. Because $X$ and $Y$ have finite fourth moments, the second moment of $W_i$ is finite, so $W_i$ has finite variance. Thus
$$
\frac{1}{n}\sum_{i=1}^{n} W_i \xrightarrow{p} E(W_i) = \sigma_{XY},
$$
and thus $s_{XY} \xrightarrow{p} \sigma_{XY}$ (because the factor $\frac{n}{n-1} \to 1$).
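The decomposition used in the first display is an exact algebraic identity (it holds for any constants in place of the population means). A numerical check on simulated data, offered only as an illustration and not part of the original solution:

```python
import numpy as np

rng = np.random.default_rng(2)
n, mu_x, mu_y = 500, 1.0, -2.0                          # assumed population parameters
x = rng.normal(mu_x, 1.0, n)
y = mu_y + 0.5 * (x - mu_x) + rng.normal(0.0, 1.0, n)   # correlated draws with mean mu_y

# Sample covariance (divides by n - 1)
s_xy = np.cov(x, y, ddof=1)[0, 1]

# The decomposition from the solution:
# s_xy = n/(n-1) * [ (1/n) * sum of W_i  -  (xbar - mu_x)(ybar - mu_y) ]
w_bar = np.mean((x - mu_x) * (y - mu_y))
decomp = n / (n - 1) * (w_bar - (x.mean() - mu_x) * (y.mean() - mu_y))

print(s_xy, decomp)   # the two numbers agree up to floating-point error
```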
3.21. Set $n_m = n_w = n$, and use Equation (3.19) to write the squared standard error of $\bar{Y}_m - \bar{Y}_w$ as
$$
[SE(\bar{Y}_m - \bar{Y}_w)]^2 = \frac{s_m^2}{n} + \frac{s_w^2}{n} = \frac{\frac{1}{n-1}\sum_{i=1}^{n}(Y_{mi} - \bar{Y}_m)^2}{n} + \frac{\frac{1}{n-1}\sum_{i=1}^{n}(Y_{wi} - \bar{Y}_w)^2}{n} = \frac{\sum_{i=1}^{n}(Y_{mi} - \bar{Y}_m)^2 + \sum_{i=1}^{n}(Y_{wi} - \bar{Y}_w)^2}{n(n-1)}.
$$
Similarly, using Equation (3.23),
$$
[SE_{pooled}(\bar{Y}_m - \bar{Y}_w)]^2 = \frac{\sum_{i=1}^{n}(Y_{mi} - \bar{Y}_m)^2 + \sum_{i=1}^{n}(Y_{wi} - \bar{Y}_w)^2}{2(n-1)} \times \frac{2}{n} = \frac{\sum_{i=1}^{n}(Y_{mi} - \bar{Y}_m)^2 + \sum_{i=1}^{n}(Y_{wi} - \bar{Y}_w)^2}{n(n-1)},
$$
so the two expressions are the same when the group sizes are equal.
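A quick numerical check of this equality on simulated data (an illustrative sketch with assumed group means and variances, not part of the original solution):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50                                    # equal group sizes, as in the exercise
ym = rng.normal(10.0, 3.0, n)
yw = rng.normal(8.0, 2.0, n)

# Unequal-variances (Equation (3.19)-style) squared SE
se2 = ym.var(ddof=1) / n + yw.var(ddof=1) / n

# Pooled (Equation (3.23)-style) squared SE
ssq = ((ym - ym.mean())**2).sum() + ((yw - yw.mean())**2).sum()
s2_pooled = ssq / (2 * (n - 1))
se2_pooled = s2_pooled * (1 / n + 1 / n)

print(se2, se2_pooled)   # identical (up to floating point) when n_m = n_w
```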
3.22. Let $Z$ denote a standard normal random variable.

(a) Under $H_0$, $t \sim N(0, 1)$ and $\Pr(t > 1.64) = \Pr(Z > 1.64) = 0.05$.

(b) When $\mu_Y = 2$, $t = (\bar{Y} - 0)/SE(\bar{Y}) = [(\bar{Y} - 2)/SE(\bar{Y})] + 2 = Z + 2 \sim N(2, 1)$. Thus $\Pr(t > 1.64) = \Pr(Z + 2 > 1.64) = \Pr(Z > -0.36) = 0.64$.

(c) (i)
$$
\begin{aligned}
\Pr(t > 1.64) &= \Pr(t > 1.64, \mu_Y = 0) + \Pr(t > 1.64, \mu_Y = 2) \\
&= \Pr(t > 1.64 \mid \mu_Y = 0)\Pr(\mu_Y = 0) + \Pr(t > 1.64 \mid \mu_Y = 2)\Pr(\mu_Y = 2) \\
&= 0.05 \times 0.90 + 0.64 \times 0.10 = 0.109.
\end{aligned}
$$
(ii)
$$
\begin{aligned}
\Pr(\mu_Y = 0 \mid t > 1.64) &= \frac{\Pr(\mu_Y = 0, t > 1.64)}{\Pr(t > 1.64)} = \frac{\Pr(\mu_Y = 0, t > 1.64)}{\Pr(\mu_Y = 0, t > 1.64) + \Pr(\mu_Y = 2, t > 1.64)} \\
&= \frac{\Pr(t > 1.64 \mid \mu_Y = 0)\Pr(\mu_Y = 0)}{\Pr(t > 1.64 \mid \mu_Y = 0)\Pr(\mu_Y = 0) + \Pr(t > 1.64 \mid \mu_Y = 2)\Pr(\mu_Y = 2)} \\
&= \frac{0.05 \times 0.90}{0.05 \times 0.90 + 0.64 \times 0.10} = 0.41.
\end{aligned}
$$

(d) (i) These are the same values as in (a)–(c). The false positive rate is 0.41, or 41%.
(ii) Following the same steps as above:
$$
\Pr(t > 2.57 \mid \mu_Y = 0) = \Pr(Z > 2.57) = 0.005, \qquad \Pr(t > 2.57 \mid \mu_Y = 2) = \Pr(Z + 2 > 2.57) = \Pr(Z > 0.57) = 0.28.
$$
Thus
$$
\begin{aligned}
\Pr(\mu_Y = 0 \mid t > 2.57) &= \frac{\Pr(t > 2.57 \mid \mu_Y = 0)\Pr(\mu_Y = 0)}{\Pr(t > 2.57 \mid \mu_Y = 0)\Pr(\mu_Y = 0) + \Pr(t > 2.57 \mid \mu_Y = 2)\Pr(\mu_Y = 2)} \\
&= \frac{0.005 \times 0.90}{0.005 \times 0.90 + 0.28 \times 0.10} = 0.14.
\end{aligned}
$$
The false positive rate is 0.14, or 14%.
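The conditional probabilities in parts (c) and (d) follow from Bayes' rule; the sketch below reproduces them with the normal survival function (illustrative only, not part of the original solution).

```python
from scipy.stats import norm

prior_null, prior_alt = 0.90, 0.10   # Pr(mu_Y = 0) and Pr(mu_Y = 2)

for crit in (1.64, 2.57):
    # Rejection probability under each state: t ~ N(0,1) if mu_Y = 0, t ~ N(2,1) if mu_Y = 2
    p_reject_null = norm.sf(crit)          # Pr(t > crit | mu_Y = 0)
    p_reject_alt = norm.sf(crit - 2.0)     # Pr(t > crit | mu_Y = 2)

    p_reject = p_reject_null * prior_null + p_reject_alt * prior_alt
    false_positive_rate = p_reject_null * prior_null / p_reject   # Pr(mu_Y = 0 | t > crit)
    print(crit, round(p_reject, 3), round(false_positive_rate, 2))
```

At the 1.64 cutoff this prints a false positive rate of about 0.42 rather than 0.41; the small difference comes from the rounding of 0.05 and 0.64 in the hand calculation above.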
Empirical Exercise 3.1

Calculations for this exercise are carried out in the STATA file EE_3_1.do.

(a) Average Hourly Earnings, Nominal $'s

                        Mean     SE(Mean)    95% Confidence Interval
  AHE1996               12.69    0.081       12.53 – 12.85
  AHE2015               21.23    0.144       20.96 – 21.52

                        Difference    SE(Difference)    95% Confidence Interval
  AHE2015 − AHE1996     8.54          0.165             8.22 – 8.87

(b) Average Hourly Earnings, Real $2015

                        Mean     SE(Mean)    95% Confidence Interval
  AHE1996               19.17    0.123       18.93 – 19.41
  AHE2015               21.23    0.144       20.96 – 21.52

                        Difference    SE(Difference)    95% Confidence Interval
  AHE2015 − AHE1996     2.06          0.189             1.69 – 2.44

(c) The results from part (b) adjust for changes in purchasing power. These results should be used.

(d) Average Hourly Earnings in 2015

                           Mean     SE(Mean)    95% Confidence Interval
  High School              16.38    0.147       16.09 – 16.67
  College                  25.62    0.216       25.19 – 26.04

                           Difference    SE(Difference)    95% Confidence Interval
  College − High School    9.23          0.227             8.72 – 9.75

(e) Average Hourly Earnings in 1996 (in $2015)

                           Mean     SE(Mean)    95% Confidence Interval
  High School              16.27    0.130       16.01 – 16.52
  College                  23.04    0.205       22.64 – 23.44

                           Difference    SE(Difference)    95% Confidence Interval
  College − High School    6.77          0.243             6.29 – 7.34
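Part (b) deflates the 1996 nominal earnings into 2015 dollars. As an illustration only, scaling by the ratio of the 2015 CPI to the 1996 CPI reproduces the reported real mean; the CPI values below are assumptions (roughly the U.S. CPI-U annual averages), not figures given in the exercise.

```python
# Hypothetical illustration of the nominal-to-real adjustment in part (b).
# The CPI values are assumptions, not data supplied with the exercise.
cpi_1996 = 156.9
cpi_2015 = 237.0

ahe_1996_nominal = 12.69
ahe_1996_real_2015 = ahe_1996_nominal * cpi_2015 / cpi_1996
print(round(ahe_1996_real_2015, 2))   # close to the 19.17 reported above

# The standard error scales by the same factor: 0.081 * (237.0 / 156.9) is about 0.123.
```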
(f) Changes in Average Hourly Earnings and in the College–High School Gap, 1996 to 2015 (in $2015)

                                Mean     SE(Mean)    95% Confidence Interval
  AHEHS,2015 − AHEHS,1996       0.11     0.196       −0.27 – 0.50
  AHECol,2015 − AHECol,1996     2.58     0.298       1.99 – 3.16
  Col−HS Gap (1996)             6.77     0.243       6.29 – 7.34
  Col−HS Gap (2015)             9.23     0.227       8.72 – 9.75

                                Difference    SE(Difference)    95% Confidence Interval
  Gap2015 − Gap1996             2.46          0.363             1.75 – 3.17

Wages of high school graduates increased by an estimated 0.11 dollars per hour (with a 95% confidence interval of −0.27 to 0.50); wages of college graduates increased by an estimated 2.58 dollars per hour (with a 95% confidence interval of 1.99 to 3.16). The College–High School gap increased by an estimated 2.46 dollars per hour (with a 95% confidence interval of 1.75 to 3.17).

(g) Gender Gap in Earnings for High School Graduates

  Year    Ȳm       sm      nm      Ȳw       sw      nw      Ȳm − Ȳw    SE(Ȳm − Ȳw)    95% CI
  1996    17.78    8.24    2168    13.77    5.83    1316    4.02       0.24           3.54 – 4.48
  2015    17.50    9.03    2222    14.21    7.00    1143    3.29       0.28           2.74 – 3.84

There is a large and statistically significant gender gap in earnings for high school graduates. In 2015 the estimated gap was $3.29 per hour; in 1996 the estimated gap was even larger, $4.02 per hour (in $2015). The estimated gender gap in 2015 is smaller for high school graduates than for college graduates (the college gap is $5.02 in Table 3.1 in the text) in absolute terms, but similar in relative terms (18% for both high school and college graduates).
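The gender-gap rows in part (g) can be reproduced from the summary statistics in the table. A minimal sketch (not the original Stata code; the numbers are copied from the table above):

```python
import math

def gap_ci(mean_m, sd_m, n_m, mean_w, sd_w, n_w):
    """Difference in means, its standard error, and a 95% confidence interval."""
    gap = mean_m - mean_w
    se = math.sqrt(sd_m**2 / n_m + sd_w**2 / n_w)
    return gap, se, (gap - 1.96 * se, gap + 1.96 * se)

print(gap_ci(17.78, 8.24, 2168, 13.77, 5.83, 1316))   # 1996: gap approx 4.01, SE approx 0.24
print(gap_ci(17.50, 9.03, 2222, 14.21, 7.00, 1143))   # 2015: gap approx 3.29, SE approx 0.28
```

Small differences from the table (such as 4.01 versus 4.02) reflect rounding of the reported summary statistics.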
Empirical Exercise 3.2

Calculations for this exercise are carried out in the STATA file EE_3_2.do.

(a) For part (iii): A person will trade if she received good A but prefers good B, or if she received good B but prefers good A. 50% received good A, and of these (100 − X)% prefer good B; 50% received good B, and of these X% prefer good A. Let x = X/100. Thus the "Expected Fraction Traded" = 0.5 × (1 − x) + 0.5 × x = 0.5. To answer part (i) use x = 0, and for part (ii) use x = 0.50. Note that regardless of the value of x, the expected fraction traded is equal to 50%.

(b) The fraction of trades in the sample was 0.338, with a standard error of 0.039. The 95% confidence interval for the population fraction is 0.261 to 0.415; this interval does not include 0.50, so there is statistically significant evidence against the null hypothesis of no "endowment effect," that is, the hypothesis that the true fraction traded is 0.50. (Alternatively, the t-statistic for this null hypothesis is t = (0.338 − 0.50)/0.039 = −4.15, which is larger in absolute value than the two-sided 5% critical value of 1.96.)

(c) The following table shows the difference between Traders and non-Traders, where x̄ denotes the sample average of the binary variable that is equal to 1 if a trade was made.

                 x̄       SE(x̄)    95% Confidence Interval for the population mean
  Traders        0.45    0.06     0.33 – 0.56
  Non-Traders    0.23    0.05     0.13 – 0.33
  Difference     0.22    0.08     0.07 – 0.37

In the sample, 45% of the Traders made trades, while only 23% of the non-Traders made trades. The estimate for Traders is not statistically significantly different from 0.5, so there is not statistically significant evidence that Traders are influenced by an endowment effect. The results suggest a large and statistically significant difference in the behavior of Traders and non-Traders.
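A minimal check of the interval and t-statistic in part (b), using only the reported fraction and standard error (illustrative only, not the original Stata code; the sample size of roughly 150 used in the last line is inferred from the reported SE, not given in the exercise):

```python
import math

p_hat, se = 0.338, 0.039   # reported sample fraction of trades and its standard error

# 95% confidence interval and t-statistic for H0: p = 0.50
ci = (p_hat - 1.96 * se, p_hat + 1.96 * se)
t = (p_hat - 0.50) / se
print(ci, round(t, 2))   # approx (0.26, 0.41) and t approx -4.15

# For a Bernoulli variable the standard error formula is sqrt(p_hat*(1 - p_hat)/n);
# with roughly 150 observations it gives an SE close to the reported 0.039.
print(math.sqrt(p_hat * (1 - p_hat) / 150))
```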