9-1
Regression with a Binary Dependent Variable
(SW Ch. 9)
So far the dependent variable (Y) has been continuous:
 • district-wide average test score
 • traffic fatality rate
But we might want to understand the effect of X on a
binary variable:
 • Y = get into college, or not
 • Y = person smokes, or not
 • Y = mortgage application is accepted, or not
9-2
Example: Mortgage denial and race
The Boston Fed HMDA data set
 Individual applications for single-family mortgages
made in 1990 in the greater Boston area
 2380 observations, collected under Home Mortgage
Disclosure Act (HMDA)
Variables
 • Dependent variable:
   o Is the mortgage denied or accepted?
 • Independent variables:
   o income, wealth, employment status
   o other loan, property characteristics
   o race of applicant
9-3
The Linear Probability Model
(SW Section 9.1)
A natural starting point is the linear regression model
with a single regressor:
Yi = β0 + β1Xi + ui
But:
 • What does β1 mean when Y is binary? Is β1 = ΔY/ΔX?
 • What does the line β0 + β1X mean when Y is binary?
 • What does the predicted value Ŷ mean when Y is
   binary? For example, what does Ŷ = 0.26 mean?
9-4
The linear probability model, ctd.
Yi = β0 + β1Xi + ui
Recall assumption #1: E(ui|Xi) = 0, so
E(Yi|Xi) = E(β0 + β1Xi + ui|Xi) = β0 + β1Xi
When Y is binary,
E(Y) = 1·Pr(Y=1) + 0·Pr(Y=0) = Pr(Y=1)
so
E(Y|X) = Pr(Y=1|X)
9-5
The linear probability model, ctd.
When Y is binary, the linear regression model
Yi = β0 + β1Xi + ui
is called the linear probability model.
 • The predicted value is a probability:
   o E(Y|X=x) = Pr(Y=1|X=x) = prob. that Y = 1 given x
   o Ŷ = the predicted probability that Yi = 1, given X
 • β1 = change in probability that Y = 1 for a given Δx:

   β1 = [Pr(Y = 1|X = x + Δx) − Pr(Y = 1|X = x)] / Δx
Example: linear probability model, HMDA data
9-6
Mortgage denial v. ratio of debt payments to income
(P/I ratio) in the HMDA data set (subset)
9-7
Linear probability model: HMDA data
deny = −.080 + .604×(P/I ratio)   (n = 2380)
        (.032)  (.098)
 • What is the predicted value for P/I ratio = .3?
   Pr(deny = 1|P/I ratio = .3) = −.080 + .604×.3 = .151
 • Calculating “effects:” increase P/I ratio from .3 to .4:
   Pr(deny = 1|P/I ratio = .4) = −.080 + .604×.4 = .212
The effect on the probability of denial of an increase
in P/I ratio from .3 to .4 is to increase the probability
by .061, that is, by 6.1 percentage points (what?).
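These numbers can be reproduced in STATA; a minimal sketch, assuming the HMDA data are in memory with the variable names deny and p_irat used in the printouts later in this chapter (and, as in those printouts, the semicolon line delimiter):

. regress deny p_irat, r;
. display "Pred prob, p_irat=.3: " _b[_cons]+_b[p_irat]*.3;
. display "Pred prob, p_irat=.4: " _b[_cons]+_b[p_irat]*.4;

NOTE: the r option requests the heteroskedasticity-robust standard errors that the linear probability model requires.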
9-8
Next include black as a regressor:
deny = −.091 + .559×(P/I ratio) + .177×black
        (.032)  (.098)            (.025)
Predicted probability of denial:
 • for black applicant with P/I ratio = .3:
   Pr(deny = 1) = −.091 + .559×.3 + .177×1 = .254
 • for white applicant, P/I ratio = .3:
   Pr(deny = 1) = −.091 + .559×.3 + .177×0 = .077
 • difference = .177 = 17.7 percentage points
 • Coefficient on black is significant at the 5% level
 • Still plenty of room for omitted variable bias…
9-9
The linear probability model: Summary
 • Models probability as a linear function of X
 • Advantages:
   o simple to estimate and to interpret
   o inference is the same as for multiple regression
     (need heteroskedasticity-robust standard errors)
 • Disadvantages:
   o Does it make sense that the probability should be
     linear in X?
   o Predicted probabilities can be <0 or >1!
 • These disadvantages can be solved by using a nonlinear
   probability model: probit and logit regression
9-10
Probit and Logit Regression
(SW Section 9.2)
The problem with the linear probability model is that it
models the probability of Y = 1 as being linear:
Pr(Y = 1|X) = β0 + β1X
Instead, we want:
 • 0 ≤ Pr(Y = 1|X) ≤ 1 for all X
 • Pr(Y = 1|X) to be increasing in X (for β1 > 0)
This requires a nonlinear functional form for the
probability. How about an “S-curve”…
9-11
The probit model satisfies these conditions:
 • 0 ≤ Pr(Y = 1|X) ≤ 1 for all X
 • Pr(Y = 1|X) to be increasing in X (for β1 > 0)
9-12
Probit regression models the probability that Y = 1 using
the cumulative standard normal distribution function,
evaluated at z = β0 + β1X:
Pr(Y = 1|X) = Φ(β0 + β1X)
 • Φ is the cumulative normal distribution function.
 • z = β0 + β1X is the “z-value” or “z-index” of the
   probit model.
Example: Suppose β0 = −2, β1 = 3, X = .4, so
Pr(Y = 1|X=.4) = Φ(−2 + 3×.4) = Φ(−0.8)
Pr(Y = 1|X=.4) = area under the standard normal density
to the left of z = −.8, which is…
9-13
Pr(Z ≤ -0.8) = .2119
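A quick STATA check of this value, using the cumulative standard normal function normprob() that appears in the printouts below:

. display normprob(-0.8);

This displays approximately .2119.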
9-14
Probit regression, ctd.
Why use the cumulative normal probability distribution?
 • The “S-shape” gives us what we want:
   o 0 ≤ Pr(Y = 1|X) ≤ 1 for all X
   o Pr(Y = 1|X) to be increasing in X (for β1 > 0)
 • Easy to use – the probabilities are tabulated in the
   cumulative normal tables
 • Relatively straightforward interpretation:
   o z-value = β0 + β1X
   o β̂0 + β̂1X is the predicted z-value, given X
   o β1 is the change in the z-value for a unit change
     in X
9-15
STATA Example: HMDA data
. probit deny p_irat, r;
Iteration 0: log likelihood = -872.0853 We’ll discuss this later
Iteration 1: log likelihood = -835.6633
Iteration 2: log likelihood = -831.80534
Iteration 3: log likelihood = -831.79234
Probit estimates Number of obs = 2380
Wald chi2(1) = 40.68
Prob > chi2 = 0.0000
Log likelihood = -831.79234 Pseudo R2 = 0.0462
------------------------------------------------------------------------------
| Robust
deny | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
p_irat | 2.967908 .4653114 6.38 0.000 2.055914 3.879901
_cons | -2.194159 .1649721 -13.30 0.000 -2.517499 -1.87082
------------------------------------------------------------------------------
Pr(deny = 1|P/I ratio) = Φ(−2.19 + 2.97×(P/I ratio))
                           (.16)   (.47)
9-16
STATA Example: HMDA data, ctd.
Pr(deny = 1|P/I ratio) = Φ(−2.19 + 2.97×(P/I ratio))
                           (.16)   (.47)
 • Positive coefficient: does this make sense?
 • Standard errors have the usual interpretation
 • Predicted probabilities:
   Pr(deny = 1|P/I ratio = .3) = Φ(−2.19 + 2.97×.3)
                               = Φ(−1.30) = .097
 • Effect of change in P/I ratio from .3 to .4:
   Pr(deny = 1|P/I ratio = .4) = Φ(−2.19 + 2.97×.4) = .159
Predicted probability of denial rises from .097 to .159
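A sketch of the same calculation in STATA, assuming the probit fit from the printout above is still in memory:

. display "Pred prob, p_irat=.3: " normprob(_b[_cons]+_b[p_irat]*.3);
. display "Pred prob, p_irat=.4: " normprob(_b[_cons]+_b[p_irat]*.4);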
9-17
Probit regression with multiple regressors
Pr(Y = 1|X1, X2) = Φ(β0 + β1X1 + β2X2)
 • Φ is the cumulative normal distribution function.
 • z = β0 + β1X1 + β2X2 is the “z-value” or “z-index” of
   the probit model.
 • β1 is the effect on the z-score of a unit change in X1,
   holding constant X2
9-18
STATA Example: HMDA data
. probit deny p_irat black, r;
Iteration 0: log likelihood = -872.0853
Iteration 1: log likelihood = -800.88504
Iteration 2: log likelihood = -797.1478
Iteration 3: log likelihood = -797.13604
Probit estimates Number of obs = 2380
Wald chi2(2) = 118.18
Prob > chi2 = 0.0000
Log likelihood = -797.13604 Pseudo R2 = 0.0859
------------------------------------------------------------------------------
| Robust
deny | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
p_irat | 2.741637 .4441633 6.17 0.000 1.871092 3.612181
black | .7081579 .0831877 8.51 0.000 .545113 .8712028
_cons | -2.258738 .1588168 -14.22 0.000 -2.570013 -1.947463
------------------------------------------------------------------------------
We’ll go through the estimation details later…
9-19
STATA Example: predicted probit probabilities
. probit deny p_irat black, r;
Probit estimates Number of obs = 2380
Wald chi2(2) = 118.18
Prob > chi2 = 0.0000
Log likelihood = -797.13604 Pseudo R2 = 0.0859
------------------------------------------------------------------------------
| Robust
deny | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
p_irat | 2.741637 .4441633 6.17 0.000 1.871092 3.612181
black | .7081579 .0831877 8.51 0.000 .545113 .8712028
_cons | -2.258738 .1588168 -14.22 0.000 -2.570013 -1.947463
------------------------------------------------------------------------------
. sca z1 = _b[_cons]+_b[p_irat]*.3+_b[black]*0;
. display "Pred prob, p_irat=.3, white: "normprob(z1);
Pred prob, p_irat=.3, white: .07546603
NOTE
_b[_cons] is the estimated intercept (-2.258738)
_b[p_irat] is the coefficient on p_irat (2.741637)
sca creates a new scalar which is the result of a calculation
display prints the indicated information to the screen
9-20
STATA Example: HMDA data, ctd.
Pr(deny = 1|P/I, black)
  = Φ(−2.26 + 2.74×(P/I ratio) + .71×black)
      (.16)   (.44)              (.08)
 • Is the coefficient on black statistically significant?
 • Estimated effect of race for P/I ratio = .3:
   Pr(deny = 1|.3, 1) = Φ(−2.26 + 2.74×.3 + .71×1) = .233
   Pr(deny = 1|.3, 0) = Φ(−2.26 + 2.74×.3 + .71×0) = .075
 • Difference in rejection probabilities = .158 (15.8
   percentage points)
 • Still plenty of room for omitted variable bias…
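These two probabilities can be computed the same way as on slide 9-19; a sketch, assuming the probit fit with black is in memory (z_w and z_b are hypothetical scalar names):

. sca z_w = _b[_cons]+_b[p_irat]*.3+_b[black]*0;
. sca z_b = _b[_cons]+_b[p_irat]*.3+_b[black]*1;
. display "white: " normprob(z_w) "  black: " normprob(z_b) "  difference: " normprob(z_b)-normprob(z_w);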
9-21
Logit regression
Logit regression models the probability of Y = 1 as the
cumulative standard logistic distribution function,
evaluated at z = β0 + β1X:
Pr(Y = 1|X) = F(β0 + β1X)
F is the cumulative logistic distribution function:

   F(β0 + β1X) = 1 / (1 + e^−(β0 + β1X))
9-22
Logistic regression, ctd.
Pr(Y = 1|X) = F(β0 + β1X)
where F(β0 + β1X) = 1 / (1 + e^−(β0 + β1X)).
Example: β0 = −3, β1 = 2, X = .4,
so β0 + β1X = −3 + 2×.4 = −2.2, so
Pr(Y = 1|X=.4) = 1/(1 + e^−(−2.2)) = 1/(1 + e^2.2) = .0998
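Checking this arithmetic in STATA:

. display "Pr(Y=1|X=.4) = " 1/(1+exp(2.2));

This displays approximately .0998.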
Why bother with logit if we have probit?
 • Historically, numerically convenient
 • In practice, very similar to probit
9-23
STATA Example: HMDA data
. logit deny p_irat black, r;
Iteration 0: log likelihood = -872.0853 Later…
Iteration 1: log likelihood = -806.3571
Iteration 2: log likelihood = -795.74477
Iteration 3: log likelihood = -795.69521
Iteration 4: log likelihood = -795.69521
Logit estimates Number of obs = 2380
Wald chi2(2) = 117.75
Prob > chi2 = 0.0000
Log likelihood = -795.69521 Pseudo R2 = 0.0876
------------------------------------------------------------------------------
| Robust
deny | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
p_irat | 5.370362 .9633435 5.57 0.000 3.482244 7.258481
black | 1.272782 .1460986 8.71 0.000 .9864339 1.55913
_cons | -4.125558 .345825 -11.93 0.000 -4.803362 -3.447753
------------------------------------------------------------------------------
. dis "Pred prob, p_irat=.3, white: "
> 1/(1+exp(-(_b[_cons]+_b[p_irat]*.3+_b[black]*0)));
Pred prob, p_irat=.3, white: .07485143
NOTE: the probit predicted probability is .07546603
9-24
Predicted probabilities from estimated probit and logit
models usually are very close.
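A side-by-side check of that claim at one point, as a sketch assuming the HMDA data are in memory (quietly suppresses the estimation output; zp and zl are hypothetical scalar names):

. quietly probit deny p_irat black, r;
. sca zp = _b[_cons]+_b[p_irat]*.3;
. quietly logit deny p_irat black, r;
. sca zl = _b[_cons]+_b[p_irat]*.3;
. display "probit: " normprob(zp) "  logit: " 1/(1+exp(-zl));

This should reproduce the .0755 (probit) and .0749 (logit) white-applicant probabilities reported above.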
9-25
Estimation and Inference in Probit (and Logit)
Models (SW Section 9.3)
Probit model:
Pr(Y = 1|X) = Φ(β0 + β1X)
 • Estimation and inference
   o How to estimate β0 and β1?
   o What is the sampling distribution of the estimators?
   o Why can we use the usual methods of inference?
 • First discuss nonlinear least squares (easier to explain)
 • Then discuss maximum likelihood estimation (what is
   actually done in practice)
9-26
Probit estimation by nonlinear least squares
Recall OLS:

   min over b0, b1:  Σᵢ₌₁ⁿ [Yi − (b0 + b1Xi)]²

 • The result is the OLS estimators β̂0 and β̂1
In probit, we have a different regression function – the
nonlinear probit model. So, we could estimate β0 and β1
by nonlinear least squares:

   min over b0, b1:  Σᵢ₌₁ⁿ [Yi − Φ(b0 + b1Xi)]²

Solving this yields the nonlinear least squares estimator
of the probit coefficients.
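For concreteness, here is a sketch of what this nonlinear least squares problem looks like in STATA's nl command (not what probit actually computes); {b0} and {b1} are nl's notation for the parameters to be estimated, and normal() is the cumulative standard normal:

. nl (deny = normal({b0} + {b1}*p_irat));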
9-27
Nonlinear least squares, ctd.

   min over b0, b1:  Σᵢ₌₁ⁿ [Yi − Φ(b0 + b1Xi)]²

How to solve this minimization problem?
 • Calculus doesn’t give an explicit solution.
 • Must be solved numerically using the computer, e.g.
   by the “trial and error” method of trying one set of values
   for (b0, b1), then trying another, and another,…
 • Better idea: use specialized minimization algorithms
In practice, nonlinear least squares isn’t used because it
isn’t efficient – an estimator with a smaller variance is…
9-28
Probit estimation by maximum likelihood
The likelihood function is the conditional density of
Y1,…,Yn given X1,…,Xn, treated as a function of the
unknown parameters β0 and β1.
 • The maximum likelihood estimator (MLE) is the value
   of (β0, β1) that maximizes the likelihood function.
 • The MLE is the value of (β0, β1) that best describes the
   full distribution of the data.
 • In large samples, the MLE is:
   o consistent
   o normally distributed
   o efficient (has the smallest variance of all estimators)
9-29
Special case: the probit MLE with no X
Y = 1 with probability p
    0 with probability 1 − p     (Bernoulli distribution)
Data: Y1,…,Yn, i.i.d.
Derivation of the likelihood starts with the density of Y1:
Pr(Y1 = 1) = p and Pr(Y1 = 0) = 1 − p
so
Pr(Y1 = y1) = p^y1 (1 − p)^(1−y1)   (verify this for y1 = 0, 1!)
(Checking: y1 = 1 gives p^1 (1 − p)^0 = p, and y1 = 0 gives p^0 (1 − p)^1 = 1 − p.)
9-30
Joint density of (Y1, Y2):
Because Y1 and Y2 are independent,
Pr(Y1 = y1, Y2 = y2) = Pr(Y1 = y1)×Pr(Y2 = y2)
  = [p^y1 (1 − p)^(1−y1)] × [p^y2 (1 − p)^(1−y2)]
Joint density of (Y1,…,Yn):
Pr(Y1 = y1, Y2 = y2,…, Yn = yn)
  = [p^y1 (1 − p)^(1−y1)] × [p^y2 (1 − p)^(1−y2)] × … × [p^yn (1 − p)^(1−yn)]
  = p^(Σᵢ₌₁ⁿ yi) × (1 − p)^(n − Σᵢ₌₁ⁿ yi)
The likelihood is the joint density, treated as a function of
the unknown parameters, which here is p:
9-31
f(p; Y1,…,Yn) = p^(Σᵢ₌₁ⁿ Yi) × (1 − p)^(n − Σᵢ₌₁ⁿ Yi)
The MLE maximizes the likelihood. It’s standard to work
with the log likelihood, ln[f(p; Y1,…,Yn)]:

   ln[f(p; Y1,…,Yn)] = (Σᵢ₌₁ⁿ Yi) ln(p) + (n − Σᵢ₌₁ⁿ Yi) ln(1 − p)

   d ln f(p; Y1,…,Yn)/dp = (Σᵢ₌₁ⁿ Yi)(1/p) − (n − Σᵢ₌₁ⁿ Yi)(1/(1 − p)) = 0

Solving for p yields the MLE; that is, p̂^MLE satisfies,
9-32

   (Σᵢ₌₁ⁿ Yi)(1/p̂^MLE) − (n − Σᵢ₌₁ⁿ Yi)(1/(1 − p̂^MLE)) = 0

or

   (Σᵢ₌₁ⁿ Yi)(1/p̂^MLE) = (n − Σᵢ₌₁ⁿ Yi)(1/(1 − p̂^MLE))

or

   Ȳ/(1 − Ȳ) = p̂^MLE/(1 − p̂^MLE)

or

   p̂^MLE = Ȳ = fraction of 1’s
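A quick simulated check of this result in STATA (a sketch; the seed, sample size, and p = .3 are arbitrary choices):

. clear;
. set seed 1;
. set obs 1000;
. generate y = runiform() < .3;
. summarize y;

The reported mean of y is the MLE of p – the fraction of 1’s – and should be close to .3.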
9-33
The MLE in the “no-X” case (Bernoulli distribution):
p̂^MLE = Ȳ = fraction of 1’s
 • For Yi i.i.d. Bernoulli, the MLE is the “natural”
   estimator of p, the fraction of 1’s, which is Ȳ
 • We already know the essentials of inference:
   o In large n, the sampling distribution of p̂^MLE = Ȳ is
     normally distributed
   o Thus inference is “as usual:” hypothesis testing via the
     t-statistic, confidence interval as ±1.96SE
 • STATA note: to emphasize the requirement of large n, the
   printout calls the t-statistic the z-statistic; instead of the
   F-statistic, the chi-squared statistic (= q×F).
9-34
The probit likelihood with one X
The derivation starts with the density of Y1, given X1:
Pr(Y1 = 1|X1) = Φ(β0 + β1X1)
Pr(Y1 = 0|X1) = 1 − Φ(β0 + β1X1)
so
Pr(Y1 = y1|X1) = Φ(β0 + β1X1)^y1 × [1 − Φ(β0 + β1X1)]^(1−y1)
The probit likelihood function is the joint density of
Y1,…,Yn given X1,…,Xn, treated as a function of β0, β1:
f(β0, β1; Y1,…,Yn|X1,…,Xn)
  = {Φ(β0 + β1X1)^Y1 × [1 − Φ(β0 + β1X1)]^(1−Y1)}
    × … × {Φ(β0 + β1Xn)^Yn × [1 − Φ(β0 + β1Xn)]^(1−Yn)}
The probit likelihood function:
9-35
f(β0, β1; Y1,…,Yn|X1,…,Xn)
  = {Φ(β0 + β1X1)^Y1 × [1 − Φ(β0 + β1X1)]^(1−Y1)}
    × … × {Φ(β0 + β1Xn)^Yn × [1 − Φ(β0 + β1Xn)]^(1−Yn)}
 • Can’t solve for the maximum explicitly
 • Must maximize using numerical methods
 • As in the case of no X, in large samples:
   o β̂0^MLE, β̂1^MLE are consistent
   o β̂0^MLE, β̂1^MLE are normally distributed (more later…)
   o Their standard errors can be computed
   o Testing, confidence intervals proceed as usual
 • For multiple X’s, see SW App. 9.2
The logit likelihood with one X
9-36
 • The only difference between probit and logit is the
   functional form used for the probability: Φ is
   replaced by the cumulative logistic function.
 • Otherwise, the likelihood is similar; for details see
   SW App. 9.2
 • As with probit,
   o β̂0^MLE, β̂1^MLE are consistent
   o β̂0^MLE, β̂1^MLE are normally distributed
   o Their standard errors can be computed
   o Testing, confidence intervals proceed as usual
9-37
Measures of fit
The R² and R̄² don’t make sense here (why?). So, two
other specialized measures are used:
1. The fraction correctly predicted = the fraction of Yi’s for
   which the predicted probability is >50% (if Yi = 1) or is
   <50% (if Yi = 0).
2. The pseudo-R² measures the fit using the likelihood
   function: it measures the improvement in the value of
   the log likelihood, relative to having no X’s (see SW
   App. 9.2). This simplifies to the R² in the linear
   model with normally distributed errors.
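A sketch of measure 1 in STATA, assuming the HMDA probit fit from above (phat and correct are hypothetical variable names; after probit, predict produces predicted probabilities by default):

. quietly probit deny p_irat black, r;
. predict phat;
. generate correct = (deny==1 & phat>.5) | (deny==0 & phat<=.5);
. summarize correct;

The mean of correct is the fraction correctly predicted.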
9-38
Large-n distribution of the MLE (not in SW)
 • This is the foundation of mathematical statistics.
 • We’ll do this for the “no-X” special case, for which p is
   the only unknown parameter. Here are the steps:
1. Derive the log likelihood (“ℒ(p)”) (done).
2. The MLE is found by setting its derivative to zero;
   that requires solving a nonlinear equation.
3. For large n, p̂^MLE will be near the true p (p^true), so this
   nonlinear equation can be approximated (locally) by
   a linear equation (a Taylor series around p^true).
4. This can be solved for p̂^MLE − p^true.
5. By the Law of Large Numbers and the CLT, for n
   large, √n (p̂^MLE − p^true) is normally distributed.
9-39
1. Derive the log likelihood
Recall: the density for observation #1 is:
Pr(Y1 = y1) = p^y1 (1 − p)^(1−y1)   (density)
so
f(p; Y1) = p^Y1 (1 − p)^(1−Y1)   (likelihood)
The likelihood for Y1,…,Yn is,
f(p; Y1,…,Yn) = f(p; Y1) × … × f(p; Yn)
so the log likelihood is,

   ℒ(p) = ln f(p; Y1,…,Yn) = ln[f(p; Y1) × … × f(p; Yn)] = Σᵢ₌₁ⁿ ln f(p; Yi)
2. Set the derivative of ℒ(p) to zero to define the MLE:

   ∂ℒ(p̂^MLE)/∂p = Σᵢ₌₁ⁿ ∂ln f(p̂^MLE; Yi)/∂p = 0

3. Use a Taylor series expansion around p^true to
   approximate this as a linear function of p̂^MLE:

   0 = ∂ℒ(p̂^MLE)/∂p ≅ ∂ℒ(p^true)/∂p + [∂²ℒ(p^true)/∂p²]×(p̂^MLE − p^true)
9-41
4. Solve this linear approximation for (p̂^MLE − p^true):

   ∂ℒ(p^true)/∂p + [∂²ℒ(p^true)/∂p²]×(p̂^MLE − p^true) ≅ 0

so

   [∂²ℒ(p^true)/∂p²]×(p̂^MLE − p^true) ≅ −∂ℒ(p^true)/∂p

or

   (p̂^MLE − p^true) ≅ −[∂²ℒ(p^true)/∂p²]⁻¹ × ∂ℒ(p^true)/∂p
9-42
5. Substitute things in and apply the LLN and CLT.

   ℒ(p) = Σᵢ₌₁ⁿ ln f(p; Yi)

   ∂ℒ(p^true)/∂p = Σᵢ₌₁ⁿ ∂ln f(p^true; Yi)/∂p

   ∂²ℒ(p^true)/∂p² = Σᵢ₌₁ⁿ ∂²ln f(p^true; Yi)/∂p²

so

   (p̂^MLE − p^true) ≅ −[∂²ℒ(p^true)/∂p²]⁻¹ × ∂ℒ(p^true)/∂p
     = −[Σᵢ₌₁ⁿ ∂²ln f(p^true; Yi)/∂p²]⁻¹ × [Σᵢ₌₁ⁿ ∂ln f(p^true; Yi)/∂p]
9-43
Multiply through by √n:

   √n (p̂^MLE − p^true)
     ≅ −[(1/n) Σᵢ₌₁ⁿ ∂²ln f(p^true; Yi)/∂p²]⁻¹ × [(1/√n) Σᵢ₌₁ⁿ ∂ln f(p^true; Yi)/∂p]

Because Yi is i.i.d., the i-th terms in the summands are also
i.i.d. Thus, if these terms have enough (2) moments, then
under general conditions (not just the Bernoulli likelihood):

   (1/n) Σᵢ₌₁ⁿ ∂²ln f(p^true; Yi)/∂p²  →p  a  (a constant)   (WLLN)

   (1/√n) Σᵢ₌₁ⁿ ∂ln f(p^true; Yi)/∂p  →d  N(0, σ²ln f)   (CLT)   (Why?)

Putting this together,
9-44
   √n (p̂^MLE − p^true)
     ≅ −[(1/n) Σᵢ₌₁ⁿ ∂²ln f(p^true; Yi)/∂p²]⁻¹ × [(1/√n) Σᵢ₌₁ⁿ ∂ln f(p^true; Yi)/∂p]

where

   (1/n) Σᵢ₌₁ⁿ ∂²ln f(p^true; Yi)/∂p²  →p  a  (a constant)   (WLLN)

   (1/√n) Σᵢ₌₁ⁿ ∂ln f(p^true; Yi)/∂p  →d  N(0, σ²ln f)   (CLT)   (Why?)

so

   √n (p̂^MLE − p^true)  →d  N(0, σ²ln f /a²)   (large-n normal)

Work out the details for the probit/no-X (Bernoulli) case:
9-45
Recall:
f(p; Yi) = p^Yi (1 − p)^(1−Yi)
so
ln f(p; Yi) = Yi ln(p) + (1 − Yi) ln(1 − p)
and

   ∂ln f(p; Yi)/∂p = Yi/p − (1 − Yi)/(1 − p) = (Yi − p) / [p(1 − p)]

and

   ∂²ln f(p; Yi)/∂p² = −Yi/p² − (1 − Yi)/(1 − p)²
                     = −[Yi/p² + (1 − Yi)/(1 − p)²]
9-46
Denominator term first:

   ∂²ln f(p; Yi)/∂p² = −[Yi/p² + (1 − Yi)/(1 − p)²]

so

   (1/n) Σᵢ₌₁ⁿ ∂²ln f(p^true; Yi)/∂p² = −(1/n) Σᵢ₌₁ⁿ [Yi/p² + (1 − Yi)/(1 − p)²]
     = −[Ȳ/p² + (1 − Ȳ)/(1 − p)²]
     →p  −[p/p² + (1 − p)/(1 − p)²]   (LLN)
     = −[1/p + 1/(1 − p)] = −1/[p(1 − p)]
9-47
Next the numerator:

   ∂ln f(p; Yi)/∂p = (Yi − p) / [p(1 − p)]

so

   (1/√n) Σᵢ₌₁ⁿ ∂ln f(p^true; Yi)/∂p = (1/√n) Σᵢ₌₁ⁿ (Yi − p) / [p(1 − p)]
     = {1/[p(1 − p)]} × (1/√n) Σᵢ₌₁ⁿ (Yi − p)
     →d  N(0, σ²Y /[p(1 − p)]²)
9-48
Put these pieces together:

   √n (p̂^MLE − p^true)
     ≅ −[(1/n) Σᵢ₌₁ⁿ ∂²ln f(p^true; Yi)/∂p²]⁻¹ × [(1/√n) Σᵢ₌₁ⁿ ∂ln f(p^true; Yi)/∂p]

where

   (1/n) Σᵢ₌₁ⁿ ∂²ln f(p^true; Yi)/∂p²  →p  −1/[p(1 − p)]

   (1/√n) Σᵢ₌₁ⁿ ∂ln f(p^true; Yi)/∂p  →d  N(0, σ²Y /[p(1 − p)]²)

Thus

   √n (p̂^MLE − p^true)  →d  N(0, σ²Y)
9-49
Summary: probit MLE, no-X case
The MLE: p̂^MLE = Ȳ
Working through the full MLE distribution theory gave:

   √n (p̂^MLE − p^true)  →d  N(0, σ²Y)

But because p^true = Pr(Y = 1) = E(Y) = μY, this is:

   √n (Ȳ − μY)  →d  N(0, σ²Y)

A familiar result from the first week of class!
9-50
The MLE derivation applies generally

   √n (p̂^MLE − p^true)  →d  N(0, σ²ln f /a²)

 • Standard errors are obtained from working out
   expressions for σ²ln f /a²
 • Extends to >1 parameter (β0, β1) via matrix calculus
 • Because the distribution is normal for large n, inference
   is conducted as usual; for example, the 95% confidence
   interval is MLE ± 1.96SE.
 • The expression above uses “robust” standard errors;
   further simplifications yield non-robust standard errors,
   which apply if ∂ln f(p; Yi)/∂p is homoskedastic.
9-51
Summary: distribution of the MLE
(Why did I do this to you?)
 • The MLE is normally distributed for large n
 • We worked through this result in detail for the probit
   model with no X’s (the Bernoulli distribution)
 • For large n, confidence intervals and hypothesis testing
   proceed as usual
 • If the model is correctly specified, the MLE is efficient,
   that is, it has a smaller large-n variance than all other
   estimators (we didn’t show this).
 • These methods extend to other models with discrete
   dependent variables, for example count data (#
   crimes/day) – see SW App. 9.2.
9-52
Application to the Boston HMDA Data
(SW Section 9.4)
 • Mortgages (home loans) are an essential part of
   buying a home.
 • Is there differential access to home loans by race?
 • If two otherwise identical individuals, one white and
   one black, applied for a home loan, is there a
   difference in the probability of denial?
9-53
The HMDA Data Set
 • Data on individual characteristics, property
   characteristics, and loan denial/acceptance
 • The mortgage application process circa 1990-1991:
   o Go to a bank or mortgage company
   o Fill out an application (personal + financial info)
   o Meet with the loan officer
 • Then the loan officer decides – by law, in a race-blind
   way. Presumably, the bank wants to make profitable
   loans, and the loan officer doesn’t want to originate
   defaults.
9-54
The loan officer’s decision
 • Loan officer uses key financial variables:
   o P/I ratio
   o housing expense-to-income ratio
   o loan-to-value ratio
   o personal credit history
 • The decision rule is nonlinear:
   o loan-to-value ratio > 80%
   o loan-to-value ratio > 95% (what happens in default?)
   o credit score
9-55
Regression specifications
Pr(deny = 1|black, other X’s) = …
 • linear probability model
 • probit
Main problem with the regressions so far: potential
omitted variable bias. All of the following variables (i) enter
the loan officer’s decision function, and (ii) are or could be
correlated with race:
 • wealth, type of employment
 • credit history
 • family status
Variables in the HMDA data set…
[Slides 9-56 to 9-60: tables of the HMDA variables and estimation results – images not reproduced]
9-61
Summary of Empirical Results
 • Coefficients on the financial variables make sense.
 • Black is statistically significant in all specifications
 • Race-financial variable interactions aren’t significant.
 • Including the covariates sharply reduces the effect of
   race on denial probability.
 • LPM, probit, logit: similar estimates of effect of race
   on the probability of denial.
 • Estimated effects are large in a “real world” sense.
9-62
Remaining threats to internal, external validity
 • Internal validity
   1. omitted variable bias
      • what else is learned in the in-person interviews?
   2. functional form misspecification (no…)
   3. measurement error (originally, yes; now, no…)
   4. selection
      • random sample of loan applications
      • define population to be loan applicants
   5. simultaneous causality (no)
 • External validity
   This is for Boston in 1990-91. What about today?
9-63
Summary
(SW Section 9.5)
 • If Yi is binary, then E(Y|X) = Pr(Y=1|X)
 • Three models:
   o linear probability model (linear multiple regression)
   o probit (cumulative standard normal distribution)
   o logit (cumulative standard logistic distribution)
 • LPM, probit, logit all produce predicted probabilities
 • Effect of ΔX is a change in the conditional probability
   that Y=1. For logit and probit, this depends on the initial X
 • Probit and logit are estimated via maximum likelihood
   o Coefficients are normally distributed for large n
   o Large-n hypothesis testing and conf. intervals are as usual

More Related Content

Similar to Ch 56669 Slides.doc.2234322344443222222344

1 PROBABILITY DISTRIBUTIONS R. BEHBOUDI Triangu.docx
1  PROBABILITY DISTRIBUTIONS R. BEHBOUDI Triangu.docx1  PROBABILITY DISTRIBUTIONS R. BEHBOUDI Triangu.docx
1 PROBABILITY DISTRIBUTIONS R. BEHBOUDI Triangu.docxaulasnilda
 
1 PROBABILITY DISTRIBUTIONS R. BEHBOUDI Triangu.docx
1  PROBABILITY DISTRIBUTIONS R. BEHBOUDI Triangu.docx1  PROBABILITY DISTRIBUTIONS R. BEHBOUDI Triangu.docx
1 PROBABILITY DISTRIBUTIONS R. BEHBOUDI Triangu.docxjeremylockett77
 
MVPA with SpaceNet: sparse structured priors
MVPA with SpaceNet: sparse structured priorsMVPA with SpaceNet: sparse structured priors
MVPA with SpaceNet: sparse structured priorsElvis DOHMATOB
 
Econometric Analysis 8th Edition Greene Solutions Manual
Econometric Analysis 8th Edition Greene Solutions ManualEconometric Analysis 8th Edition Greene Solutions Manual
Econometric Analysis 8th Edition Greene Solutions ManualLewisSimmonss
 
Curve fitting and Optimization
Curve fitting and OptimizationCurve fitting and Optimization
Curve fitting and OptimizationSyahrul Senin
 
Logistic Modeling with Applications to Marketing and Credit Risk in the Autom...
Logistic Modeling with Applications to Marketing and Credit Risk in the Autom...Logistic Modeling with Applications to Marketing and Credit Risk in the Autom...
Logistic Modeling with Applications to Marketing and Credit Risk in the Autom...Magnify Analytic Solutions
 
2012 mdsp pr13 support vector machine
2012 mdsp pr13 support vector machine2012 mdsp pr13 support vector machine
2012 mdsp pr13 support vector machinenozomuhamada
 
Lecture 2: Stochastic Hydrology
Lecture 2: Stochastic Hydrology Lecture 2: Stochastic Hydrology
Lecture 2: Stochastic Hydrology Amro Elfeki
 
Machine Learning - Regression model
Machine Learning - Regression modelMachine Learning - Regression model
Machine Learning - Regression modelRADO7900
 
ISI MSQE Entrance Question Paper (2011)
ISI MSQE Entrance Question Paper (2011)ISI MSQE Entrance Question Paper (2011)
ISI MSQE Entrance Question Paper (2011)CrackDSE
 
Low Complexity Regularization of Inverse Problems - Course #2 Recovery Guaran...
Low Complexity Regularization of Inverse Problems - Course #2 Recovery Guaran...Low Complexity Regularization of Inverse Problems - Course #2 Recovery Guaran...
Low Complexity Regularization of Inverse Problems - Course #2 Recovery Guaran...Gabriel Peyré
 
Ali, Redescending M-estimator
Ali, Redescending M-estimator Ali, Redescending M-estimator
Ali, Redescending M-estimator Muhammad Ali
 
Principles of Actuarial Science Chapter 2
Principles of Actuarial Science Chapter 2Principles of Actuarial Science Chapter 2
Principles of Actuarial Science Chapter 2ssuser8226b2
 
Regression
Regression Regression
Regression Ali Raza
 

Similar to Ch 56669 Slides.doc.2234322344443222222344 (20)

Chapter5.pdf.pdf
Chapter5.pdf.pdfChapter5.pdf.pdf
Chapter5.pdf.pdf
 
1 PROBABILITY DISTRIBUTIONS R. BEHBOUDI Triangu.docx
1  PROBABILITY DISTRIBUTIONS R. BEHBOUDI Triangu.docx1  PROBABILITY DISTRIBUTIONS R. BEHBOUDI Triangu.docx
1 PROBABILITY DISTRIBUTIONS R. BEHBOUDI Triangu.docx
 
1 PROBABILITY DISTRIBUTIONS R. BEHBOUDI Triangu.docx
1  PROBABILITY DISTRIBUTIONS R. BEHBOUDI Triangu.docx1  PROBABILITY DISTRIBUTIONS R. BEHBOUDI Triangu.docx
1 PROBABILITY DISTRIBUTIONS R. BEHBOUDI Triangu.docx
 
QMC: Transition Workshop - Applying Quasi-Monte Carlo Methods to a Stochastic...
QMC: Transition Workshop - Applying Quasi-Monte Carlo Methods to a Stochastic...QMC: Transition Workshop - Applying Quasi-Monte Carlo Methods to a Stochastic...
QMC: Transition Workshop - Applying Quasi-Monte Carlo Methods to a Stochastic...
 
MVPA with SpaceNet: sparse structured priors
MVPA with SpaceNet: sparse structured priorsMVPA with SpaceNet: sparse structured priors
MVPA with SpaceNet: sparse structured priors
 
Econometric Analysis 8th Edition Greene Solutions Manual
Econometric Analysis 8th Edition Greene Solutions ManualEconometric Analysis 8th Edition Greene Solutions Manual
Econometric Analysis 8th Edition Greene Solutions Manual
 
Curve fitting and Optimization
Curve fitting and OptimizationCurve fitting and Optimization
Curve fitting and Optimization
 
Logistic Modeling with Applications to Marketing and Credit Risk in the Autom...
Logistic Modeling with Applications to Marketing and Credit Risk in the Autom...Logistic Modeling with Applications to Marketing and Credit Risk in the Autom...
Logistic Modeling with Applications to Marketing and Credit Risk in the Autom...
 
2012 mdsp pr13 support vector machine
2012 mdsp pr13 support vector machine2012 mdsp pr13 support vector machine
2012 mdsp pr13 support vector machine
 
Regression
RegressionRegression
Regression
 
Lecture 2: Stochastic Hydrology
Lecture 2: Stochastic Hydrology Lecture 2: Stochastic Hydrology
Lecture 2: Stochastic Hydrology
 
Bayesian_Decision_Theory-3.pdf
Bayesian_Decision_Theory-3.pdfBayesian_Decision_Theory-3.pdf
Bayesian_Decision_Theory-3.pdf
 
Machine Learning - Regression model
Machine Learning - Regression modelMachine Learning - Regression model
Machine Learning - Regression model
 
ISI MSQE Entrance Question Paper (2011)
ISI MSQE Entrance Question Paper (2011)ISI MSQE Entrance Question Paper (2011)
ISI MSQE Entrance Question Paper (2011)
 
Low Complexity Regularization of Inverse Problems - Course #2 Recovery Guaran...
Low Complexity Regularization of Inverse Problems - Course #2 Recovery Guaran...Low Complexity Regularization of Inverse Problems - Course #2 Recovery Guaran...
Low Complexity Regularization of Inverse Problems - Course #2 Recovery Guaran...
 
Teknik Simulasi
Teknik SimulasiTeknik Simulasi
Teknik Simulasi
 
Ali, Redescending M-estimator
Ali, Redescending M-estimator Ali, Redescending M-estimator
Ali, Redescending M-estimator
 
Principles of Actuarial Science Chapter 2
Principles of Actuarial Science Chapter 2Principles of Actuarial Science Chapter 2
Principles of Actuarial Science Chapter 2
 
Regression
Regression Regression
Regression
 
Chapter 6 Probability
Chapter 6  ProbabilityChapter 6  Probability
Chapter 6 Probability
 

More from ohenebabismark508

Ch 8 Slides.doc28383&3&3&39388383288283838
Ch 8 Slides.doc28383&3&3&39388383288283838Ch 8 Slides.doc28383&3&3&39388383288283838
Ch 8 Slides.doc28383&3&3&39388383288283838ohenebabismark508
 
Ch 4 Slides.doc655444444444444445678888776
Ch 4 Slides.doc655444444444444445678888776Ch 4 Slides.doc655444444444444445678888776
Ch 4 Slides.doc655444444444444445678888776ohenebabismark508
 
Ch 10 Slides.doc546555544554555455555777777
Ch 10 Slides.doc546555544554555455555777777Ch 10 Slides.doc546555544554555455555777777
Ch 10 Slides.doc546555544554555455555777777ohenebabismark508
 
Ch 12 Slides.doc. Introduction of science of business
Ch 12 Slides.doc. Introduction of science of businessCh 12 Slides.doc. Introduction of science of business
Ch 12 Slides.doc. Introduction of science of businessohenebabismark508
 
Ch 11 Slides.doc. Introduction to econometric modeling
Ch 11 Slides.doc. Introduction to econometric modelingCh 11 Slides.doc. Introduction to econometric modeling
Ch 11 Slides.doc. Introduction to econometric modelingohenebabismark508
 
Chapter_1_Intro.pptx. Introductory econometric book
Chapter_1_Intro.pptx. Introductory econometric bookChapter_1_Intro.pptx. Introductory econometric book
Chapter_1_Intro.pptx. Introductory econometric bookohenebabismark508
 

More from ohenebabismark508 (6)

Ch 8 Slides.doc28383&3&3&39388383288283838
Ch 8 Slides.doc28383&3&3&39388383288283838Ch 8 Slides.doc28383&3&3&39388383288283838
Ch 8 Slides.doc28383&3&3&39388383288283838
 
Ch 4 Slides.doc655444444444444445678888776
Ch 4 Slides.doc655444444444444445678888776Ch 4 Slides.doc655444444444444445678888776
Ch 4 Slides.doc655444444444444445678888776
 
Ch 10 Slides.doc546555544554555455555777777
Ch 10 Slides.doc546555544554555455555777777Ch 10 Slides.doc546555544554555455555777777
Ch 10 Slides.doc546555544554555455555777777
 
Ch 12 Slides.doc. Introduction of science of business
Ch 12 Slides.doc. Introduction of science of businessCh 12 Slides.doc. Introduction of science of business
Ch 12 Slides.doc. Introduction of science of business
 
Ch 11 Slides.doc. Introduction to econometric modeling
Ch 11 Slides.doc. Introduction to econometric modelingCh 11 Slides.doc. Introduction to econometric modeling
Ch 11 Slides.doc. Introduction to econometric modeling
 
Chapter_1_Intro.pptx. Introductory econometric book
Chapter_1_Intro.pptx. Introductory econometric bookChapter_1_Intro.pptx. Introductory econometric book
Chapter_1_Intro.pptx. Introductory econometric book
 

Recently uploaded

Call Girls Jayanagar Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jayanagar Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jayanagar Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jayanagar Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...amitlee9823
 
Pooja 9892124323, Call girls Services and Mumbai Escort Service Near Hotel Sa...
Pooja 9892124323, Call girls Services and Mumbai Escort Service Near Hotel Sa...Pooja 9892124323, Call girls Services and Mumbai Escort Service Near Hotel Sa...
Pooja 9892124323, Call girls Services and Mumbai Escort Service Near Hotel Sa...Pooja Nehwal
 
Resumes, Cover Letters, and Applying Online
Resumes, Cover Letters, and Applying OnlineResumes, Cover Letters, and Applying Online
Resumes, Cover Letters, and Applying OnlineBruce Bennett
 
Nandini Layout Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangal...
Nandini Layout Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangal...Nandini Layout Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangal...
Nandini Layout Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangal...amitlee9823
 
Call Girls Bidadi ☎ 7737669865☎ Book Your One night Stand (Bangalore)
Call Girls Bidadi ☎ 7737669865☎ Book Your One night Stand (Bangalore)Call Girls Bidadi ☎ 7737669865☎ Book Your One night Stand (Bangalore)
Call Girls Bidadi ☎ 7737669865☎ Book Your One night Stand (Bangalore)amitlee9823
 
Internship Report].pdf iiwmoosmsosmshkssmk
Internship Report].pdf iiwmoosmsosmshkssmkInternship Report].pdf iiwmoosmsosmshkssmk
Internship Report].pdf iiwmoosmsosmshkssmkSujalTamhane
 
CALL ON ➥8923113531 🔝Call Girls Nishatganj Lucknow best sexual service
CALL ON ➥8923113531 🔝Call Girls Nishatganj Lucknow best sexual serviceCALL ON ➥8923113531 🔝Call Girls Nishatganj Lucknow best sexual service
CALL ON ➥8923113531 🔝Call Girls Nishatganj Lucknow best sexual serviceanilsa9823
 
Bur Dubai Call Girl Service #$# O56521286O Call Girls In Bur Dubai
Bur Dubai Call Girl Service #$# O56521286O Call Girls In Bur DubaiBur Dubai Call Girl Service #$# O56521286O Call Girls In Bur Dubai
Bur Dubai Call Girl Service #$# O56521286O Call Girls In Bur Dubaiparisharma5056
 
Presentation on Workplace Politics.ppt..
Presentation on Workplace Politics.ppt..Presentation on Workplace Politics.ppt..
Presentation on Workplace Politics.ppt..Masuk Ahmed
 
Hyderabad 💫✅💃 24×7 BEST GENUINE PERSON LOW PRICE CALL GIRL SERVICE FULL SATIS...
Hyderabad 💫✅💃 24×7 BEST GENUINE PERSON LOW PRICE CALL GIRL SERVICE FULL SATIS...Hyderabad 💫✅💃 24×7 BEST GENUINE PERSON LOW PRICE CALL GIRL SERVICE FULL SATIS...
Hyderabad 💫✅💃 24×7 BEST GENUINE PERSON LOW PRICE CALL GIRL SERVICE FULL SATIS...sonalitrivedi431
 
Vip Modals Call Girls (Delhi) Rohini 9711199171✔️ Full night Service for one...
Vip  Modals Call Girls (Delhi) Rohini 9711199171✔️ Full night Service for one...Vip  Modals Call Girls (Delhi) Rohini 9711199171✔️ Full night Service for one...
Vip Modals Call Girls (Delhi) Rohini 9711199171✔️ Full night Service for one...shivangimorya083
 
TEST BANK For Evidence-Based Practice for Nurses Appraisal and Application of...
TEST BANK For Evidence-Based Practice for Nurses Appraisal and Application of...TEST BANK For Evidence-Based Practice for Nurses Appraisal and Application of...
TEST BANK For Evidence-Based Practice for Nurses Appraisal and Application of...robinsonayot
 
CALL ON ➥8923113531 🔝Call Girls Gosainganj Lucknow best sexual service
CALL ON ➥8923113531 🔝Call Girls Gosainganj Lucknow best sexual serviceCALL ON ➥8923113531 🔝Call Girls Gosainganj Lucknow best sexual service
CALL ON ➥8923113531 🔝Call Girls Gosainganj Lucknow best sexual serviceanilsa9823
 
Call Girls Devanahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Devanahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Devanahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Devanahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
TEST BANK For An Introduction to Brain and Behavior, 7th Edition by Bryan Kol...
TEST BANK For An Introduction to Brain and Behavior, 7th Edition by Bryan Kol...TEST BANK For An Introduction to Brain and Behavior, 7th Edition by Bryan Kol...
TEST BANK For An Introduction to Brain and Behavior, 7th Edition by Bryan Kol...rightmanforbloodline
 
Personal Brand Exploration - Fernando Negron
Personal Brand Exploration - Fernando NegronPersonal Brand Exploration - Fernando Negron
Personal Brand Exploration - Fernando Negronnegronf24
 
Call Girls Hoodi Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Hoodi Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Hoodi Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Hoodi Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangaloreamitlee9823
 
Call Girls Alandi Road Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Alandi Road Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Alandi Road Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Alandi Road Call Me 7737669865 Budget Friendly No Advance Bookingroncy bisnoi
 
Brand Analysis for reggaeton artist Jahzel.
Brand Analysis for reggaeton artist Jahzel.Brand Analysis for reggaeton artist Jahzel.
Brand Analysis for reggaeton artist Jahzel.GabrielaMiletti
 
Delhi Call Girls South Delhi 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls South Delhi 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls South Delhi 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls South Delhi 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 

Recently uploaded (20)

Call Girls Jayanagar Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jayanagar Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jayanagar Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jayanagar Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
Pooja 9892124323, Call girls Services and Mumbai Escort Service Near Hotel Sa...
Pooja 9892124323, Call girls Services and Mumbai Escort Service Near Hotel Sa...Pooja 9892124323, Call girls Services and Mumbai Escort Service Near Hotel Sa...
Pooja 9892124323, Call girls Services and Mumbai Escort Service Near Hotel Sa...
 
Resumes, Cover Letters, and Applying Online
Resumes, Cover Letters, and Applying OnlineResumes, Cover Letters, and Applying Online
Resumes, Cover Letters, and Applying Online
 
Nandini Layout Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangal...
Nandini Layout Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangal...Nandini Layout Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangal...
Nandini Layout Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangal...
 
Call Girls Bidadi ☎ 7737669865☎ Book Your One night Stand (Bangalore)
Call Girls Bidadi ☎ 7737669865☎ Book Your One night Stand (Bangalore)Call Girls Bidadi ☎ 7737669865☎ Book Your One night Stand (Bangalore)
Call Girls Bidadi ☎ 7737669865☎ Book Your One night Stand (Bangalore)
 
Internship Report].pdf iiwmoosmsosmshkssmk
Internship Report].pdf iiwmoosmsosmshkssmkInternship Report].pdf iiwmoosmsosmshkssmk
Internship Report].pdf iiwmoosmsosmshkssmk
 
CALL ON ➥8923113531 🔝Call Girls Nishatganj Lucknow best sexual service
CALL ON ➥8923113531 🔝Call Girls Nishatganj Lucknow best sexual serviceCALL ON ➥8923113531 🔝Call Girls Nishatganj Lucknow best sexual service
CALL ON ➥8923113531 🔝Call Girls Nishatganj Lucknow best sexual service
 
Bur Dubai Call Girl Service #$# O56521286O Call Girls In Bur Dubai
Bur Dubai Call Girl Service #$# O56521286O Call Girls In Bur DubaiBur Dubai Call Girl Service #$# O56521286O Call Girls In Bur Dubai
Bur Dubai Call Girl Service #$# O56521286O Call Girls In Bur Dubai
 
Presentation on Workplace Politics.ppt..
Presentation on Workplace Politics.ppt..Presentation on Workplace Politics.ppt..
Presentation on Workplace Politics.ppt..
 
Hyderabad 💫✅💃 24×7 BEST GENUINE PERSON LOW PRICE CALL GIRL SERVICE FULL SATIS...
Hyderabad 💫✅💃 24×7 BEST GENUINE PERSON LOW PRICE CALL GIRL SERVICE FULL SATIS...Hyderabad 💫✅💃 24×7 BEST GENUINE PERSON LOW PRICE CALL GIRL SERVICE FULL SATIS...
Hyderabad 💫✅💃 24×7 BEST GENUINE PERSON LOW PRICE CALL GIRL SERVICE FULL SATIS...
 
Vip Modals Call Girls (Delhi) Rohini 9711199171✔️ Full night Service for one...
Vip  Modals Call Girls (Delhi) Rohini 9711199171✔️ Full night Service for one...Vip  Modals Call Girls (Delhi) Rohini 9711199171✔️ Full night Service for one...
Vip Modals Call Girls (Delhi) Rohini 9711199171✔️ Full night Service for one...
 
TEST BANK For Evidence-Based Practice for Nurses Appraisal and Application of...
TEST BANK For Evidence-Based Practice for Nurses Appraisal and Application of...TEST BANK For Evidence-Based Practice for Nurses Appraisal and Application of...
TEST BANK For Evidence-Based Practice for Nurses Appraisal and Application of...
 
CALL ON ➥8923113531 🔝Call Girls Gosainganj Lucknow best sexual service
CALL ON ➥8923113531 🔝Call Girls Gosainganj Lucknow best sexual serviceCALL ON ➥8923113531 🔝Call Girls Gosainganj Lucknow best sexual service
CALL ON ➥8923113531 🔝Call Girls Gosainganj Lucknow best sexual service
 
Call Girls Devanahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Devanahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Devanahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Devanahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
TEST BANK For An Introduction to Brain and Behavior, 7th Edition by Bryan Kol...
TEST BANK For An Introduction to Brain and Behavior, 7th Edition by Bryan Kol...TEST BANK For An Introduction to Brain and Behavior, 7th Edition by Bryan Kol...
TEST BANK For An Introduction to Brain and Behavior, 7th Edition by Bryan Kol...
 
Personal Brand Exploration - Fernando Negron
Personal Brand Exploration - Fernando NegronPersonal Brand Exploration - Fernando Negron
Personal Brand Exploration - Fernando Negron
 
Call Girls Hoodi Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Hoodi Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Hoodi Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Hoodi Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
Call Girls Alandi Road Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Alandi Road Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Alandi Road Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Alandi Road Call Me 7737669865 Budget Friendly No Advance Booking
 
Brand Analysis for reggaeton artist Jahzel.
Brand Analysis for reggaeton artist Jahzel.Brand Analysis for reggaeton artist Jahzel.
Brand Analysis for reggaeton artist Jahzel.
 
Delhi Call Girls South Delhi 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls South Delhi 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls South Delhi 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls South Delhi 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 

Ch 56669 Slides.doc.2234322344443222222344

  • 1. 9-1 Regression with a Binary Dependent Variable (SW Ch. 9) So far the dependent variable (Y) has been continuous:  district-wide average test score  traffic fatality rate But we might want to understand the effect of X on a binary variable:  Y = get into college, or not  Y = person smokes, or not  Y = mortgage application is accepted, or not
  • 2. 9-2 Example: Mortgage denial and race The Boston Fed HMDA data set  Individual applications for single-family mortgages made in 1990 in the greater Boston area  2380 observations, collected under Home Mortgage Disclosure Act (HMDA) Variables  Dependent variable: oIs the mortgage denied or accepted?  Independent variables: oincome, wealth, employment status oother loan, property characteristics orace of applicant
  • 3. 9-3 The Linear Probability Model (SW Section 9.1) A natural starting point is the linear regression model with a single regressor: Yi = 0 + 1Xi + ui But:  What does 1 mean when Y is binary? Is 1 = Y X   ?  What does the line 0 + 1X mean when Y is binary?  What does the predicted value ˆ Y mean when Y is binary? For example, what does ˆ Y = 0.26 mean?
  • 4. 9-4 The linear probability model, ctd. Yi = 0 + 1Xi + ui Recall assumption #1: E(ui|Xi) = 0, so E(Yi|Xi) = E(0 + 1Xi + ui|Xi) = 0 + 1Xi When Y is binary, E(Y) = 1 Pr(Y=1) + 0 Pr(Y=0) = Pr(Y=1) so E(Y|X) = Pr(Y=1|X)
  • 5. 9-5 The linear probability model, ctd. When Y is binary, the linear regression model Yi = 0 + 1Xi + ui is called the linear probability model.  The predicted value is a probability: oE(Y|X=x) = Pr(Y=1|X=x) = prob. that Y = 1 given x o ˆ Y = the predicted probability that Yi = 1, given X  1 = change in probability that Y = 1 for a given x: 1 = Pr( 1| ) Pr( 1| ) Y X x x Y X x x         Example: linear probability model, HMDA data
  • 6. 9-6 Mortgage denial v. ratio of debt payments to income (P/I ratio) in the HMDA data set (subset)
  • 7. 9-7 Linear probability model: HMDA data deny = -.080 + .604P/I ratio (n = 2380) (.032) (.098)  What is the predicted value for P/I ratio = .3? Pr( 1| / .3) deny P Iratio   = -.080 + .604 .3 = .151  Calculating “effects:” increase P/I ratio from .3 to .4: Pr( 1| / .4) deny P Iratio   = -.080 + .604 .4 = .212 The effect on the probability of denial of an increase in P/I ratio from .3 to .4 is to increase the probability by .061, that is, by 6.1 percentage points (what?).
  • 8. 9-8 Next include black as a regressor: deny = -.091 + .559P/I ratio + .177black (.032) (.098) (.025) Predicted probability of denial:  for black applicant with P/I ratio = .3: Pr( 1) deny  = -.091 + .559 .3 + .177 1 = .254  for white applicant, P/I ratio = .3: Pr( 1) deny  = -.091 + .559 .3 + .177 0 = .077  difference = .177 = 17.7 percentage points  Coefficient on black is significant at the 5% level  Still plenty of room for omitted variable bias…
  • 9. 9-9 The linear probability model: Summary  Models probability as a linear function of X  Advantages: osimple to estimate and to interpret oinference is the same as for multiple regression (need heteroskedasticity-robust standard errors)  Disadvantages: oDoes it make sense that the probability should be linear in X? oPredicted probabilities can be <0 or >1!  These disadvantages can be solved by using a nonlinear probability model: probit and logit regression
  • 10. 9-10 Probit and Logit Regression (SW Section 9.2) The problem with the linear probability model is that it models the probability of Y=1 as being linear: Pr(Y = 1|X) = 0 + 1X Instead, we want:  0 ≤ Pr(Y = 1|X) ≤ 1 for all X  Pr(Y = 1|X) to be increasing in X (for 1>0) This requires a nonlinear functional form for the probability. How about an “S-curve”…
  • 11. 9-11 The probit model satisfies these conditions:  0 ≤ Pr(Y = 1|X) ≤ 1 for all X  Pr(Y = 1|X) to be increasing in X (for 1>0)
  • 12. 9-12 Probit regression models the probability that Y=1 using the cumulative standard normal distribution function, evaluated at z = 0 + 1X: Pr(Y = 1|X) = (0 + 1X)   is the cumulative normal distribution function.  z = 0 + 1X is the “z-value” or “z-index” of the probit model. Example: Suppose 0 = -2, 1= 3, X = .4, so Pr(Y = 1|X=.4) = (-2 + 3 .4) = (-0.8) Pr(Y = 1|X=.4) = area under the standard normal density to left of z = -.8, which is…
  • 14. 9-14 Probit regression, ctd. Why use the cumulative normal probability distribution?  The “S-shape” gives us what we want: o0 ≤ Pr(Y = 1|X) ≤ 1 for all X oPr(Y = 1|X) to be increasing in X (for 1>0)  Easy to use – the probabilities are tabulated in the cumulative normal tables  Relatively straightforward interpretation: oz-value = 0 + 1X o 0 ˆ  + 1 ˆ  X is the predicted z-value, given X o1 is the change in the z-value for a unit change in X
  • 15. 9-15 STATA Example: HMDA data . probit deny p_irat, r; Iteration 0: log likelihood = -872.0853 We’ll discuss this later Iteration 1: log likelihood = -835.6633 Iteration 2: log likelihood = -831.80534 Iteration 3: log likelihood = -831.79234 Probit estimates Number of obs = 2380 Wald chi2(1) = 40.68 Prob > chi2 = 0.0000 Log likelihood = -831.79234 Pseudo R2 = 0.0462 ------------------------------------------------------------------------------ | Robust deny | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- p_irat | 2.967908 .4653114 6.38 0.000 2.055914 3.879901 _cons | -2.194159 .1649721 -13.30 0.000 -2.517499 -1.87082 ------------------------------------------------------------------------------ Pr( 1| / ) deny P Iratio  = (-2.19 + 2.97 P/I ratio) (.16) (.47)
  • 16. 9-16 STATA Example: HMDA data, ctd. Pr( 1| / ) deny P Iratio  = (-2.19 + 2.97 P/I ratio) (.16) (.47)  Positive coefficient: does this make sense?  Standard errors have usual interpretation  Predicted probabilities: Pr( 1| / .3) deny P Iratio   = (-2.19+2.97 .3) = (-1.30) = .097  Effect of change in P/I ratio from .3 to .4: Pr( 1| / .4) deny P Iratio   = (-2.19+2.97 .4) = .159 Predicted probability of denial rises from .097 to .159
  • 17. 9-17 Probit regression with multiple regressors Pr(Y = 1|X1, X2) = (0 + 1X1 + 2X2)   is the cumulative normal distribution function.  z = 0 + 1X1 + 2X2 is the “z-value” or “z-index” of the probit model.  1 is the effect on the z-score of a unit change in X1, holding constant X2
  • 18. 9-18 STATA Example: HMDA data . probit deny p_irat black, r; Iteration 0: log likelihood = -872.0853 Iteration 1: log likelihood = -800.88504 Iteration 2: log likelihood = -797.1478 Iteration 3: log likelihood = -797.13604 Probit estimates Number of obs = 2380 Wald chi2(2) = 118.18 Prob > chi2 = 0.0000 Log likelihood = -797.13604 Pseudo R2 = 0.0859 ------------------------------------------------------------------------------ | Robust deny | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- p_irat | 2.741637 .4441633 6.17 0.000 1.871092 3.612181 black | .7081579 .0831877 8.51 0.000 .545113 .8712028 _cons | -2.258738 .1588168 -14.22 0.000 -2.570013 -1.947463 ------------------------------------------------------------------------------ We’ll go through the estimation details later…
  • 19. 9-19 STATA Example: predicted probit probabilities . probit deny p_irat black, r; Probit estimates Number of obs = 2380 Wald chi2(2) = 118.18 Prob > chi2 = 0.0000 Log likelihood = -797.13604 Pseudo R2 = 0.0859 ------------------------------------------------------------------------------ | Robust deny | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- p_irat | 2.741637 .4441633 6.17 0.000 1.871092 3.612181 black | .7081579 .0831877 8.51 0.000 .545113 .8712028 _cons | -2.258738 .1588168 -14.22 0.000 -2.570013 -1.947463 ------------------------------------------------------------------------------ . sca z1 = _b[_cons]+_b[p_irat]*.3+_b[black]*0; . display "Pred prob, p_irat=.3, white: "normprob(z1); Pred prob, p_irat=.3, white: .07546603 NOTE _b[_cons] is the estimated intercept (-2.258738) _b[p_irat] is the coefficient on p_irat (2.741637) sca creates a new scalar which is the result of a calculation display prints the indicated information to the screen
  • 20. 9-20 STATA Example: HMDA data, ctd. Pr( 1| / , ) deny P I black  = (-2.26 + 2.74 P/I ratio + .71 black) (.16) (.44) (.08)  Is the coefficient on black statistically significant?  Estimated effect of race for P/I ratio = .3: Pr( 1|.3,1) deny  = (-2.26+2.74 .3+.71 1) = .233 Pr( 1|.3,0) deny  = (-2.26+2.74 .3+.71 0) = .075  Difference in rejection probabilities = .158 (15.8 percentage points)  Still plenty of room still for omitted variable bias…
  • 21. 9-21 Logit regression Logit regression models the probability of Y=1 as the cumulative standard logistic distribution function, evaluated at z = 0 + 1X: Pr(Y = 1|X) = F(0 + 1X) F is the cumulative logistic distribution function: F(0 + 1X) = 0 1 ( ) 1 1 X e     
  • 22. 9-22 Logistic regression, ctd. Pr(Y = 1|X) = F(0 + 1X) where F(0 + 1X) = 0 1 ( ) 1 1 X e      . Example: 0 = -3, 1= 2, X = .4, so 0 + 1X = -3 + 2 .4 = -2.2 so Pr(Y = 1|X=.4) = 1/(1+e–(–2.2) ) = .0998 Why bother with logit if we have probit?  Historically, numerically convenient  In practice, very similar to probit
  • 23. 9-23 STATA Example: HMDA data . logit deny p_irat black, r; Iteration 0: log likelihood = -872.0853 Later… Iteration 1: log likelihood = -806.3571 Iteration 2: log likelihood = -795.74477 Iteration 3: log likelihood = -795.69521 Iteration 4: log likelihood = -795.69521 Logit estimates Number of obs = 2380 Wald chi2(2) = 117.75 Prob > chi2 = 0.0000 Log likelihood = -795.69521 Pseudo R2 = 0.0876 ------------------------------------------------------------------------------ | Robust deny | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- p_irat | 5.370362 .9633435 5.57 0.000 3.482244 7.258481 black | 1.272782 .1460986 8.71 0.000 .9864339 1.55913 _cons | -4.125558 .345825 -11.93 0.000 -4.803362 -3.447753 ------------------------------------------------------------------------------ . dis "Pred prob, p_irat=.3, white: " > 1/(1+exp(-(_b[_cons]+_b[p_irat]*.3+_b[black]*0))); Pred prob, p_irat=.3, white: .07485143 NOTE: the probit predicted probability is .07546603
  • 24. 9-24 Predicted probabilities from estimated probit and logit models usually are very close.
  • 25. 9-25 Estimation and Inference in Probit (and Logit) Models (SW Section 9.3) Probit model: Pr(Y = 1|X) = (0 + 1X)  Estimation and inference oHow to estimate 0 and 1? oWhat is the sampling distribution of the estimators? oWhy can we use the usual methods of inference?  First discuss nonlinear least squares (easier to explain)  Then discuss maximum likelihood estimation (what is actually done in practice)
  • 26. 9-26 Probit estimation by nonlinear least squares Recall OLS: 0 1 2 , 0 1 1 min [ ( )] n b b i i i Y b b X      The result is the OLS estimators 0 ˆ  and 1 ˆ  In probit, we have a different regression function – the nonlinear probit model. So, we could estimate 0 and 1 by nonlinear least squares: 0 1 2 , 0 1 1 min [ ( )] n b b i i i Y b b X      Solving this yields the nonlinear least squares estimator of the probit coefficients.
  • 27. 9-27 Nonlinear least squares, ctd. 0 1 2 , 0 1 1 min [ ( )] n b b i i i Y b b X      How to solve this minimization problem?  Calculus doesn’t give and explicit solution.  Must be solved numerically using the computer, e.g. by “trial and error” method of trying one set of values for (b0,b1), then trying another, and another,…  Better idea: use specialized minimization algorithms In practice, nonlinear least squares isn’t used because it isn’t efficient – an estimator with a smaller variance is…
9-28
Probit estimation by maximum likelihood

The likelihood function is the conditional density of Y1,…,Yn given
X1,…,Xn, treated as a function of the unknown parameters β0 and β1.
• The maximum likelihood estimator (MLE) is the value of (β0, β1) that
  maximizes the likelihood function.
• The MLE is the value of (β0, β1) that best describes the full
  distribution of the data.
• In large samples, the MLE is:
  o consistent
  o normally distributed
  o efficient (has the smallest variance of all estimators)
9-29
Special case: the probit MLE with no X

  Y = 1 with probability p
      0 with probability 1 – p       (Bernoulli distribution)

Data: Y1,…,Yn, i.i.d.

Derivation of the likelihood starts with the density of Y1:
  Pr(Y1 = 1) = p  and  Pr(Y1 = 0) = 1 – p
so
  Pr(Y1 = y1) = p^y1 (1 – p)^(1–y1)     (verify this for y1 = 0, 1!)
9-30
Joint density of (Y1, Y2):
Because Y1 and Y2 are independent,
  Pr(Y1 = y1, Y2 = y2) = Pr(Y1 = y1) × Pr(Y2 = y2)
                       = [p^y1 (1 – p)^(1–y1)] × [p^y2 (1 – p)^(1–y2)]

Joint density of (Y1,…,Yn):
  Pr(Y1 = y1, Y2 = y2, …, Yn = yn)
    = [p^y1 (1 – p)^(1–y1)] × [p^y2 (1 – p)^(1–y2)] × … × [p^yn (1 – p)^(1–yn)]
    = p^(Σyi) × (1 – p)^(n – Σyi)

The likelihood is the joint density, treated as a function of the
unknown parameters, which here is p:
9-31
  f(p; Y1,…,Yn) = p^(ΣYi) × (1 – p)^(n – ΣYi)

The MLE maximizes the likelihood. It's standard to work with the log
likelihood, ln[f(p; Y1,…,Yn)]:

  ln[f(p; Y1,…,Yn)] = (ΣYi) ln(p) + (n – ΣYi) ln(1 – p)

  d ln f(p; Y1,…,Yn)/dp = (ΣYi)(1/p) – (n – ΣYi)(1/(1 – p)) = 0

Solving for p yields the MLE; that is, p̂^MLE satisfies,
9-32
  (ΣYi)(1/p̂^MLE) – (n – ΣYi)(1/(1 – p̂^MLE)) = 0

or

  (ΣYi)(1/p̂^MLE) = (n – ΣYi)(1/(1 – p̂^MLE))

or

  Ȳ/(1 – Ȳ) = p̂^MLE/(1 – p̂^MLE)

or

  p̂^MLE = Ȳ = fraction of 1's
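A quick simulation sketch confirms this (the data, seed, and true p = .3 are made up for illustration; for a constant-only probit, the first order condition implies Φ(β̂0) equals the sample mean exactly):

  . clear;
  . set seed 12345;
  . set obs 1000;
  . gen y = (runiform() < .3);    // i.i.d. Bernoulli draws, true p = .3
  . summarize y;                  // the sample mean = MLE of p
  . probit y;                     // constant-only probit
  . dis normal(_b[_cons]);        // Phi(b0_hat) = the sample mean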
9-33
The MLE in the "no-X" case (Bernoulli distribution):

  p̂^MLE = Ȳ = fraction of 1's

• For Yi i.i.d. Bernoulli, the MLE is the "natural" estimator of p,
  the fraction of 1's, which is Ȳ
• We already know the essentials of inference:
  o In large samples, the sampling distribution of p̂^MLE = Ȳ is
    normally distributed
  o Thus inference is "as usual:" hypothesis testing via the
    t-statistic, confidence interval as Ȳ ± 1.96SE
• STATA note: to emphasize the requirement of large n, the printout
  calls the t-statistic the z-statistic; instead of the F-statistic,
  the chi-squared statistic (= q×F).
9-34
The probit likelihood with one X

The derivation starts with the density of Y1, given X1:
  Pr(Y1 = 1|X1) = Φ(β0 + β1X1)
  Pr(Y1 = 0|X1) = 1 – Φ(β0 + β1X1)
so
  Pr(Y1 = y1|X1) = Φ(β0 + β1X1)^y1 × [1 – Φ(β0 + β1X1)]^(1–y1)

The probit likelihood function is the joint density of Y1,…,Yn given
X1,…,Xn, treated as a function of β0, β1:

  f(β0, β1; Y1,…,Yn|X1,…,Xn)
    = {Φ(β0 + β1X1)^Y1 × [1 – Φ(β0 + β1X1)]^(1–Y1)} × …
      × {Φ(β0 + β1Xn)^Yn × [1 – Φ(β0 + β1Xn)]^(1–Yn)}

The probit likelihood function:
9-35
  f(β0, β1; Y1,…,Yn|X1,…,Xn)
    = {Φ(β0 + β1X1)^Y1 × [1 – Φ(β0 + β1X1)]^(1–Y1)} × …
      × {Φ(β0 + β1Xn)^Yn × [1 – Φ(β0 + β1Xn)]^(1–Yn)}

• Can't solve for the maximum explicitly
• Must maximize using numerical methods
• As in the case of no X, in large samples:
  o β̂0^MLE, β̂1^MLE are consistent
  o β̂0^MLE, β̂1^MLE are normally distributed (more later…)
  o Their standard errors can be computed
  o Testing, confidence intervals proceed as usual
• For multiple X's, see SW App. 9.2
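As an illustration of what "maximize using numerical methods" means, here is a sketch of this log likelihood coded by hand with STATA's ml machinery (the built-in probit command does essentially this, more carefully and efficiently):

  program define myprobit_lf;
      args lnf xb;
      * per-observation log likelihood: ln Phi(xb) if Y = 1,
      * ln[1 - Phi(xb)] if Y = 0; $ML_y1 is the dependent variable;
      quietly replace `lnf' = ln(normal(`xb'))     if $ML_y1 == 1;
      quietly replace `lnf' = ln(1 - normal(`xb')) if $ML_y1 == 0;
  end;
  . ml model lf myprobit_lf (deny = p_irat black);
  . ml maximize;

Swapping normal() for invlogit() in the two replace lines would give the logit likelihood discussed on the next slide.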
9-36
The logit likelihood with one X

• The only difference between probit and logit is the functional form
  used for the probability: Φ is replaced by the cumulative logistic
  function.
• Otherwise, the likelihood is similar; for details see SW App. 9.2
• As with probit,
  o β̂0^MLE, β̂1^MLE are consistent
  o β̂0^MLE, β̂1^MLE are normally distributed
  o Their standard errors can be computed
  o Testing, confidence intervals proceed as usual
9-37
Measures of fit

The R² and adjusted R² don't make sense here (why?). So, two other
specialized measures are used:

1. The fraction correctly predicted = the fraction of Y's for which
   the predicted probability is >50% (if Yi = 1) or is <50%
   (if Yi = 0).
2. The pseudo-R² measures fit using the likelihood function: it
   measures the improvement in the value of the log likelihood,
   relative to having no X's (see SW App. 9.2). This simplifies to the
   R² in the linear model with normally distributed errors.
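Both measures can be computed after a fit; a sketch (e(ll) and e(ll_0) are the log likelihoods with and without the X's, saved by logit and probit):

  . quietly logit deny p_irat black;
  . predict phat;
  . gen correct = (phat > .5 & deny == 1) | (phat <= .5 & deny == 0);
  . summarize correct;                      // fraction correctly predicted
  . dis "pseudo-R2 = " 1 - e(ll)/e(ll_0);   // matches the printed Pseudo R2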
9-38
Large-n distribution of the MLE (not in SW)

• This is the foundation of mathematical statistics.
• We'll do this for the "no-X" special case, for which p is the only
  unknown parameter. Here are the steps:

1. Derive the log likelihood, L(p) (done).
2. The MLE is found by setting its derivative to zero; that requires
   solving a nonlinear equation.
3. For large n, p̂^MLE will be near the true p (p^true), so this
   nonlinear equation can be approximated (locally) by a linear
   equation (Taylor series around p^true).
4. This can be solved for p̂^MLE – p^true.
5. By the Law of Large Numbers and the CLT, for n large,
   √n(p̂^MLE – p^true) is normally distributed.
9-39
1. Derive the log likelihood

Recall: the density for observation #1 is:
  Pr(Y1 = y1) = p^y1 (1 – p)^(1–y1)       (density)
so
  f(p; Y1) = p^Y1 (1 – p)^(1–Y1)          (likelihood)

The likelihood for Y1,…,Yn is,
  f(p; Y1,…,Yn) = f(p; Y1) × … × f(p; Yn)
so the log likelihood is,
  L(p) = ln f(p; Y1,…,Yn) = ln[f(p; Y1) × … × f(p; Yn)]
       = Σi ln f(p; Yi)

2. Set the derivative of L(p) to zero to define the MLE:
9-40
  ∂L(p)/∂p |p=p̂^MLE  =  Σi ∂ln f(p; Yi)/∂p |p=p̂^MLE  =  0

3. Use a Taylor series expansion around p^true to approximate this as
   a linear function of p̂^MLE:

  0 = ∂L(p)/∂p |p=p̂^MLE
    ≅ ∂L(p)/∂p |p=p^true + ∂²L(p)/∂p² |p=p^true × (p̂^MLE – p^true)
9-41
4. Solve this linear approximation for (p̂^MLE – p^true):

  ∂L(p)/∂p |p=p^true + ∂²L(p)/∂p² |p=p^true × (p̂^MLE – p^true) ≅ 0

so

  ∂²L(p)/∂p² |p=p^true × (p̂^MLE – p^true) ≅ – ∂L(p)/∂p |p=p^true

or

  (p̂^MLE – p^true) ≅ – [∂²L(p)/∂p² |p=p^true]^(–1) × ∂L(p)/∂p |p=p^true
9-42
5. Substitute things in and apply the LLN and CLT.

  L(p) = Σi ln f(p; Yi)
  ∂L(p)/∂p |p=p^true = Σi ∂ln f(p; Yi)/∂p |p=p^true
  ∂²L(p)/∂p² |p=p^true = Σi ∂²ln f(p; Yi)/∂p² |p=p^true

so

  (p̂^MLE – p^true)
    ≅ – [Σi ∂²ln f(p; Yi)/∂p² |p=p^true]^(–1)
        × Σi ∂ln f(p; Yi)/∂p |p=p^true
9-43
Multiply through by √n:

  √n(p̂^MLE – p^true)
    ≅ – [(1/n) Σi ∂²ln f(p; Yi)/∂p² |p=p^true]^(–1)
        × (1/√n) Σi ∂ln f(p; Yi)/∂p |p=p^true

Because Yi is i.i.d., the i-th terms in the summands are also i.i.d.
Thus, if these terms have enough (two) moments, then under general
conditions (not just the Bernoulli likelihood):

  (1/n) Σi ∂²ln f(p; Yi)/∂p² |p=p^true  →p  a (a constant)     (WLLN)
  (1/√n) Σi ∂ln f(p; Yi)/∂p |p=p^true  →d  N(0, σ²lnf)         (CLT)
  (Why?)

(Here σ²lnf denotes the variance of ∂ln f(p; Yi)/∂p at p^true.)

Putting this together,
9-44
  √n(p̂^MLE – p^true)
    ≅ – [(1/n) Σi ∂²ln f(p; Yi)/∂p² |p=p^true]^(–1)
        × (1/√n) Σi ∂ln f(p; Yi)/∂p |p=p^true

  (1/n) Σi ∂²ln f(p; Yi)/∂p² |p=p^true  →p  a (a constant)     (WLLN)
  (1/√n) Σi ∂ln f(p; Yi)/∂p |p=p^true  →d  N(0, σ²lnf)         (CLT)

so

  √n(p̂^MLE – p^true)  →d  N(0, σ²lnf/a²)       (large-n normal)

Work out the details for the probit/no-X (Bernoulli) case:
9-45
Recall:
  f(p; Yi) = p^Yi (1 – p)^(1–Yi)
so
  ln f(p; Yi) = Yi ln(p) + (1 – Yi) ln(1 – p)
and
  ∂ln f(p; Yi)/∂p = Yi/p – (1 – Yi)/(1 – p)
                  = (Yi – p)/(p(1 – p))
and
  ∂²ln f(p; Yi)/∂p² = –Yi/p² – (1 – Yi)/(1 – p)²
                    = –[Yi/p² + (1 – Yi)/(1 – p)²]
9-46
Denominator term first:
  ∂²ln f(p; Yi)/∂p² = –[Yi/p² + (1 – Yi)/(1 – p)²]
so
  (1/n) Σi ∂²ln f(p; Yi)/∂p² |p=p^true
    = –(1/n) Σi [Yi/p² + (1 – Yi)/(1 – p)²]
    = –[Ȳ/p² + (1 – Ȳ)/(1 – p)²]
    →p  –[p/p² + (1 – p)/(1 – p)²]       (LLN)
    = –[1/p + 1/(1 – p)]
    = –1/(p(1 – p))
9-47
Next the numerator:
  ∂ln f(p; Yi)/∂p = (Yi – p)/(p(1 – p))
so
  (1/√n) Σi ∂ln f(p; Yi)/∂p |p=p^true
    = (1/√n) Σi (Yi – p)/(p(1 – p))
    = [1/(p(1 – p))] × (1/√n) Σi (Yi – p)
    →d  N(0, σ²Y/[p(1 – p)]²)
9-48
Put these pieces together:

  √n(p̂^MLE – p^true)
    ≅ – [(1/n) Σi ∂²ln f(p; Yi)/∂p² |p=p^true]^(–1)
        × (1/√n) Σi ∂ln f(p; Yi)/∂p |p=p^true

where
  (1/n) Σi ∂²ln f(p; Yi)/∂p² |p=p^true  →p  –1/(p(1 – p))
  (1/√n) Σi ∂ln f(p; Yi)/∂p |p=p^true  →d  N(0, σ²Y/[p(1 – p)]²)

Thus
  √n(p̂^MLE – p^true)  →d  N(0, [p(1 – p)]² × σ²Y/[p(1 – p)]²)
                        =  N(0, σ²Y)
9-49
Summary: probit MLE, no-X case

The MLE:  p̂^MLE = Ȳ

Working through the full MLE distribution theory gave:
  √n(p̂^MLE – p^true)  →d  N(0, σ²Y)

But because p^true = Pr(Y = 1) = E(Y) = μY, this is:
  √n(Ȳ – μY)  →d  N(0, σ²Y)

A familiar result from the first week of class!
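This large-n result can be seen in a small Monte Carlo; a sketch (the sample size, p = .3, and number of replications are made up for illustration):

  program define onedraw, rclass;
      drop _all;
      set obs 400;
      gen y = (runiform() < .3);
      summarize y;
      return scalar z = sqrt(400)*(r(mean) - .3);
  end;
  . simulate z = r(z), reps(5000) nodots: onedraw;
  . summarize z;    // mean near 0, variance near .3*(1-.3) = .21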
9-50
The MLE derivation applies generally

  √n(p̂^MLE – p^true)  →d  N(0, σ²lnf/a²)

• Standard errors are obtained from working out expressions for
  σ²lnf/a²
• Extends to >1 parameter (β0, β1) via matrix calculus
• Because the distribution is normal for large n, inference is
  conducted as usual; for example, the 95% confidence interval is
  MLE ± 1.96SE.
• The expression above uses "robust" standard errors; further
  simplifications yield non-robust standard errors, which apply if
  ∂ln f(p; Yi)/∂p is homoskedastic.
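In STATA this robust/non-robust distinction is just the ", r" option; a sketch of the comparison:

  . probit deny p_irat black;      // non-robust (information-matrix) SEs
  . probit deny p_irat black, r;   // heteroskedasticity-robust SEs

The point estimates are identical; only the standard errors differ.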
9-51
Summary: distribution of the MLE
(Why did I do this to you?)

• The MLE is normally distributed for large n
• We worked through this result in detail for the probit model with
  no X's (the Bernoulli distribution)
• For large n, confidence intervals and hypothesis testing proceed
  as usual
• If the model is correctly specified, the MLE is efficient, that is,
  it has a smaller large-n variance than all other estimators (we
  didn't show this).
• These methods extend to other models with discrete dependent
  variables, for example count data (# crimes/day) – see SW App. 9.2.
9-52
Application to the Boston HMDA Data
(SW Section 9.4)

• Mortgages (home loans) are an essential part of buying a home.
• Is there differential access to home loans by race?
• If two otherwise identical individuals, one white and one black,
  applied for a home loan, is there a difference in the probability
  of denial?
9-53
The HMDA Data Set

• Data on individual characteristics, property characteristics, and
  loan denial/acceptance
• The mortgage application process circa 1990-1991:
  o Go to a bank or mortgage company
  o Fill out an application (personal + financial info)
  o Meet with the loan officer
• Then the loan officer decides – by law, in a race-blind way.
  Presumably, the bank wants to make profitable loans, and the loan
  officer doesn't want to originate defaults.
9-54
The loan officer's decision

• Loan officer uses key financial variables:
  o P/I ratio
  o housing expense-to-income ratio
  o loan-to-value ratio
  o personal credit history
• The decision rule is nonlinear (see the sketch after this slide):
  o loan-to-value ratio > 80%
  o loan-to-value ratio > 95% (what happens in default?)
  o credit score
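A hypothetical sketch of how such threshold rules can enter a regression as indicator variables (the name lvrat for the loan-to-value ratio is assumed for illustration, not taken from the HMDA file):

  * lvrat is an assumed variable name;
  . gen lv_med  = (lvrat > .80 & lvrat <= .95);
  . gen lv_high = (lvrat > .95);
  . probit deny p_irat black lv_med lv_high, r;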
9-55
Regression specifications

Pr(deny = 1|black, other X's) = …
• linear probability model
• probit

Main problem with the regressions so far: potential omitted variable
bias. All of the following variables (i) enter the loan officer's
decision function, and (ii) are, or could be, correlated with race:
• wealth, type of employment
• credit history
• family status

Variables in the HMDA data set…
9-56 – 9-60
[Slides 9-56 through 9-60: tables of the HMDA data set variables and
the estimated LPM/probit/logit results, shown as images in the
original slides.]
9-61
Summary of Empirical Results

• Coefficients on the financial variables make sense.
• Black is statistically significant in all specifications
• Race-financial variable interactions aren't significant.
• Including the covariates sharply reduces the effect of race on the
  denial probability.
• LPM, probit, logit: similar estimates of the effect of race on the
  probability of denial.
• Estimated effects are large in a "real world" sense.
9-62
Remaining threats to internal, external validity

• Internal validity
  1. omitted variable bias
     o what else is learned in the in-person interviews?
  2. functional form misspecification (no…)
  3. measurement error (originally, yes; now, no…)
  4. selection
     o random sample of loan applications
     o define population to be loan applicants
  5. simultaneous causality (no)
• External validity
  This is for Boston in 1990-91. What about today?
9-63
Summary
(SW Section 9.5)

• If Yi is binary, then E(Y|X) = Pr(Y = 1|X)
• Three models:
  o linear probability model (linear multiple regression)
  o probit (cumulative standard normal distribution)
  o logit (cumulative standard logistic distribution)
• LPM, probit, logit all produce predicted probabilities
• The effect of a change in X is the change in the conditional
  probability that Y = 1. For logit and probit, this depends on the
  initial X.
• Probit and logit are estimated via maximum likelihood
  o Coefficients are normally distributed for large n
  o Large-n hypothesis testing and confidence intervals are as usual