U N I V E R S I T Y O F S O U T H F L O R I D A //
Discrete Choice Model
Dr. Shivendu
Agenda
Discrete choice models
 Linear Probability Models
 Logit Models
 Probit Models
Quiz 7: Based on Class 8 Readings
Statistical Analysis III: Logistic Regression
 Chap 18 and 19 of DAU_SAS
 SAS Assignment 8 posted: Due before class 9
Type of Discrete Response Models
• Qualitative dichotomy (e.g., vote/not vote type variables)- We equate "no" with zero and
"yes" with 1. However, these are qualitative choices and the coding of 0-1 is arbitrary. We
could equally well code "no" as 1 and "yes" as zero.
• LPM
• Probit
• Logit
• CLASS 9: Revised Syllabus
• Multinomial Models: Qualitative multichotomy (e.g., occupational choice by an individual)-
Let 0 be a clerk, 1 an engineer, 2 an attorney, 3 a politician, 4 a college professor, and 5 other.
Here the codings are mere categories and the numbers have no real meaning.
• Ordinal Models: Rankings (e.g., opinions about a politician's job performance)- Strongly
approve (5), approve (4), don't know (3), disapprove (2), strongly disapprove (1). The values
that are chosen are not quantitative, but merely an ordering of preferences or opinions. The
difference between outcomes is not necessarily the same from 5 to 4 as it is from 2 to 1.
• Count or censored outcomes: count models; Censored Regression (Tobit) Models
Binary Response Models
• Y variable has only two outcomes, which we can abstract as two values, 0 and 1
• We start with the thinking that the outcome Y depends on a set of X variables
• This is similar to linear regression model conceptualization
Categorical Response Variables
Examples:
Whether or not a person smokes (binary response):
$$Y = \begin{cases} 1 & \text{Smoker} \\ 0 & \text{Non-smoker} \end{cases}$$
Success of a medical treatment (binary response):
$$Y = \begin{cases} 1 & \text{Survives} \\ 0 & \text{Dies} \end{cases}$$
Opinion poll responses (ordinal response):
$$Y = \begin{cases} 1 & \text{Agree} \\ 2 & \text{Neutral} \\ 3 & \text{Disagree} \end{cases}$$
OLS and Binary Y Variable
• Problem: OLS regression wasn’t really designed for dichotomous dependent
variables
• Two possible outcomes (typically labeled 0 & 1)
• What kinds of problems come up?
• Linearity assumption doesn’t hold up
• Error distribution is not normal
• The model offers nonsensical predicted values
• Instead of predicting pass (1) or fail (0), the regression line might predict -.5.
Example: Height predicts Gender
Y = Gender (0=Male 1=Female)
X = Height (inches)
Try an ordinary linear regression
> regmodel=lm(Gender~Hgt,data=Pulse)
> summary(regmodel)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 7.343647 0.397563 18.47 <2e-16 ***
Hgt -0.100658 0.005817 -17.30 <2e-16 ***
[Figure: scatterplot of Gender (0 = Male, 1 = Female) against Hgt, 60-75 inches, with the fitted OLS line]
The Linear Probability Model (LPM)
•Solution #1: Use OLS regression anyway!
•Dependent variable = the probability that Y = 1 (as opposed to 0)
• In the previous example, Y = 1 is Female and Y = 0 is Male
•We'll assume that the probability changes as a linear function of the independent variables (e.g., height):
$$P(Y_i = 1) = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_K X_{Ki} + e_i = \alpha + \sum_{j=1}^{K} \beta_j X_{ji} + e_i$$
• Note: This assumption may not be appropriate
Linear Probability Model (LPM)
• The LPM may yield reasonable results
• Often good enough to get a “crude look” at your data
• Results tend to be better if data is well behaved
• Ex: If there are decent numbers of cases in each category of the dependent variable.
• Interpretation:
• Coefficients (b) reflect the increase in probability of Y=1 for each unit change in X
• Constant (a) reflects the base probability of Y=1 if all X variables are zero
• Significance tests are done; but may not be trustworthy due to OLS assumption violations.
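As a rough sketch in R (assuming a hypothetical data frame guns with the 0/1 outcome gun and the predictors used below), the LPM is just OLS applied to a binary outcome:

# LPM: OLS on a 0/1 outcome (hypothetical 'guns' data frame)
lpm <- lm(gun ~ male + educ + income + south + liberal, data = guns)
summary(lpm)       # slopes = change in P(Y=1) per unit change in X
coef(lpm)["educ"]  # e.g., change in P(gun=1) per year of education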
LPM Example: Own a gun?
• Stata OLS output:
. regress gun male educ income south liberal
Source | SS df MS Number of obs = 850
-------------+------------------------------ F( 5, 844) = 17.86
Model | 18.3727851 5 3.67455703 Prob > F = 0.0000
Residual | 173.628391 844 .205720843 R-squared = 0.0957
-------------+------------------------------ Adj R-squared = 0.0903
Total | 192.001176 849 .226149796 Root MSE = .45356
------------------------------------------------------------------------------
gun | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
male | .1637871 .0314914 5.20 0.000 .1019765 .2255978
educ | -.0153661 .00525 -2.93 0.004 -.0256706 -.0050616
income | .0379628 .0071879 5.28 0.000 .0238546 .0520711
south | .1539077 .0420305 3.66 0.000 .0714111 .2364043
liberal | -.0313841 .011572 -2.71 0.007 -.0540974 -.0086708
_cons | .13901 .1027844 1.35 0.177 -.0627331 .3407531
------------------------------------------------------------------------------
Interpretation: Each additional year of education decreases probability of gun
ownership by .015. What about other vars?
LPM Example: Own a gun?
$$P(Y_i = 1) = \alpha + \beta_1 Male_i + \beta_2 Educ_i + \beta_3 Inc_i + \beta_4 South_i + \beta_5 Liberal_i + e_i$$
• OLS results can yield predicted probabilities
• Just plug values of the constant and X's into the linear equation
• Ex: A conservative, poor, southern male:
$$P(Y = 1) = .139 + .16(1) - .015(12) + .038(6) + .15(1) - .03(0)$$
$$P(Y = 1) = .501$$
------------------------------------------------------------------------------
gun | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
male | .1637871 .0314914 5.20 0.000 .1019765 .2255978
educ | -.0153661 .00525 -2.93 0.004 -.0256706 -.0050616
income | .0379628 .0071879 5.28 0.000 .0238546 .0520711
south | .1539077 .0420305 3.66 0.000 .0714111 .2364043
liberal | -.0313841 .011572 -2.71 0.007 -.0540974 -.0086708
_cons | .13901 .1027844 1.35 0.177 -.0627331 .3407531
------------------------------------------------------------------------------
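The same plug-in calculation can be sketched in R with the hypothetical lpm fit above; predict() simply evaluates the linear equation:

# Conservative, poor, southern male (the profile from the slide)
profile <- data.frame(male = 1, educ = 12, income = 6, south = 1, liberal = 0)
predict(lpm, newdata = profile)  # about .50, matching the hand calculation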
LPM Example: Own a gun?
• Predicted probability for a female PhD student
• Highly educated, northern, liberal female:
$$P(Y = 1) = .139 + .16(0) - .015(20) + .038(4) + .15(0) - .03(7)$$
$$P(Y = 1) = .139 - 0 - .30 + .15 + 0 - .21$$
$$P(Y = 1) \approx -.23$$
------------------------------------------------------------------------------
gun | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
male | .1637871 .0314914 5.20 0.000 .1019765 .2255978
educ | -.0153661 .00525 -2.93 0.004 -.0256706 -.0050616
income | .0379628 .0071879 5.28 0.000 .0238546 .0520711
south | .1539077 .0420305 3.66 0.000 .0714111 .2364043
liberal | -.0313841 .011572 -2.71 0.007 -.0540974 -.0086708
_cons | .13901 .1027844 1.35 0.177 -.0627331 .3407531
------------------------------------------------------------------------------
LPM: Weaknesses
• Model yields nonsensical predicted values
• Probabilities should always fall between 0 and 1.
• Assumptions of OLS regression are violated
• Linearity
• Homoskedasticity (equal error variance across values of X): here the error variance is low when predictions are near 0 or 1 and high in between.
• Normality of error distribution
• Coefficients (b) are not biased; but not “best” (i.e., lowest possible sampling variance)
• Variances & Standard errors will be inaccurate
• Hypothesis tests (t-tests, f-tests) can’t be trusted
Logistic Regression
•Better Alternative: Logistic Regression
• Also called “Logit”
• A non-linear form of regression that works well for binary or dichotomous
dependent variables
• Other non-linear formulations also work (e.g., probit)
•Based on “odds” rather than probability
• Rather than model P(Y=1), we model “log odds” of Y=1
• “Logit” refers to the natural log of an odds…
• Logistic regression is regression for a logit
• Rather than a simple variable “Y” (OLS)
• Or a probability (the Linear Probability Model).
Probability & Odds
$$p(A) = \frac{\text{number of outcomes in which } A \text{ occurs}}{\text{total number of outcomes}}$$
•Probability of event A defined as p(A):
• Example: Coin Flip… probability of “heads”
• 1 outcome is “heads”, 2 total possible outcomes
• P(“heads”) = 1 / 2 = .5
• Odds of A = Number of outcomes that are A, divided
by number of outcomes that are not A
• Odds of “heads” = 1 / 1 = 1.0
• Also equivalent to: probability of the event over probability of it not happening: p/(1 – p) = .5/(1 – .5) = 1.0
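A quick numeric check of the probability-odds conversion (plain R arithmetic):

p <- 0.5
odds <- p / (1 - p)  # probability -> odds: 1.0
odds / (1 + odds)    # odds -> probability: back to 0.5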
Logistic Regression
•We can convert a probability to odds:
$$odds_i = \frac{p_i}{1 - p_i}$$
• "Logit" = natural log (ln) of an odds
• Natural log means base "e", not base 10
– We can model a logit as a function of independent variables:
$$\text{logit}(p_i) = L_i = \ln\!\left(\frac{p_i}{1 - p_i}\right) = \alpha + \sum_{j=1}^{K} \beta_j X_{ji}$$
• Just as we model Y or a probability (the LPM)
The Logit Curve
• Note: the predicted probability always falls between 0 and 1, even though the logit itself is unbounded
• From Knoke et al. p. 300
Logistic Regression
•Note: We can solve for "p" and reformulate the model:
$$P(Y = 1) = \frac{e^{\alpha + \sum_{j=1}^{K} \beta_j X_{ji}}}{1 + e^{\alpha + \sum_{j=1}^{K} \beta_j X_{ji}}} = \frac{1}{1 + e^{-(\alpha + \sum_{j=1}^{K} \beta_j X_{ji})}}$$
• Why model this rather than a probability?
– Because it is a useful non-linear transformation
• It always generates Ps between 0 and 1, regardless of the values of the X variables
• Note: the probit transformation has a similar effect.
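R's built-in qlogis() (the logit) and plogis() (its inverse) make this transformation concrete; a small sketch:

qlogis(0.5)         # logit of p = .5: log odds of 0 (odds 1:1)
plogis(0)           # inverse logit of 0: back to p = .5
plogis(c(-10, 10))  # even extreme logits map to probabilities inside (0, 1)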
Logistic Regression: Estimation
$$\hat{L}_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_K X_{Ki}$$
• Estimation: We can model the logit
• Solution requires Maximum Likelihood Estimation
(MLE)
• In OLS there was an algebraic solution
• Here, we allow the computer to “search” for the best values of
coefficients (“a” and “b”s) to fit observed data.
OLS estimation vs MLE
• In OLS, estimated parameters are obtained from an algebraic equation
• MLE: Maximum Likelihood Estimation is a technique to find the most likely parameters (or function) that explain the observed data.
What is MLE?
• The maximum likelihood method is also based on a model and on a distribution.
• The model, P(X | p) is the probability of an event X dependent on model parameters p.
• The likelihood of the parameters given the data is the probability of observing X given p.
• The maximum likelihood method consists in optimizing the likelihood function:
• the goal is to estimate the parameters p which make it most likely to observe the data X.
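A minimal MLE sketch for a single Bernoulli parameter p, using simulated 0/1 data; the numeric search should land near the sample mean:

set.seed(1)
y <- rbinom(100, size = 1, prob = 0.3)                   # simulated 0/1 outcomes
negloglik <- function(p) -sum(dbinom(y, 1, p, log = TRUE))
optimise(negloglik, interval = c(0.001, 0.999))$minimum  # approx. mean(y)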
Logistic Regression: Estimation
• Properties of Maximum Likelihood Estimation
• “Consistent, efficient and asymptotically normal as N approaches infinity.” Large N = better!
• Rules of thumb regarding sample size
• N > 500 = fine; N < 100 can be worrisome
• Results aren’t necessarily wrong if N<100;
• But it is a possibility; and hard to know when problems crop up
• Higher N is needed if data are problematic due to:
• Multicollinearity
• Limited variation in dependent variable.
Logistic Regression
•Benefits of Logistic regression:
• You can now effectively model probability as a function of X variables
• You don’t have to worry about violations of OLS assumptions
• Predictions fall between 0 and 1
•Downsides
• You lose the “simple” interpretation of linear coefficients
• In a linear model, effect of each unit change in X on Y is consistent
• In a non-linear model, the effect isn’t consistent…
• Also, you can’t compute some stats (e.g., R-square).
Logistic Regression Example
• Stata output for gun ownership:
. logistic gun male educ income south liberal, coef
Logistic regression Number of obs = 850
LR chi2(5) = 89.53
Prob > chi2 = 0.0000
Log likelihood = -502.7251 Pseudo R2 = 0.0818
------------------------------------------------------------------------------
gun | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
male | .7837017 .156764 5.00 0.000 .4764499 1.090954
educ | -.0767763 .0254047 -3.02 0.003 -.1265686 -.026984
income | .2416647 .0493794 4.89 0.000 .1448828 .3384466
south | .7363169 .1979038 3.72 0.000 .3484327 1.124201
liberal | -.1641107 .0578167 -2.84 0.005 -.2774294 -.0507921
_cons | -2.28572 .6200443 -3.69 0.000 -3.500984 -1.070455
------------------------------------------------------------------------------
• Note: Results aren’t that different from LPM
• We’re dealing with big effects, large sample…
• But, predicted probabilities & SEs will be better.
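The corresponding fit in R is a one-line change from the LPM sketch: glm() with a binomial family (hypothetical guns data frame again):

# Logistic regression: models the log odds of gun ownership
logit_fit <- glm(gun ~ male + educ + income + south + liberal,
                 data = guns, family = binomial(link = "logit"))
summary(logit_fit)  # z tests on the raw (log-odds) coefficients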
Interpreting Coefficients
•Raw coefficients (βs) show the effect of a 1-unit change in X on the log odds of Y=1
• Positive coefficients make “Y=1” more likely
• Negative coefficients mean “less likely”
• But, effects are not linear
• Effect of unit change on p(Y=1) isn’t same for all values of X!
• Rather, Xs have a linear effect on the “log odds”
• But, it is hard to think in units of “log odds”, so we need to do further calculations
• NOTE: log-odds interpretation doesn’t work on Probit!
Interpreting Coefficients
•Best way to interpret logit coefficients is to exponentiate them
• This converts from “log odds” to simple “odds”
• Exponentiation = opposite of natural log
• On a calculator use the "e^x" or "inverse ln" function
• Exponentiated coefficients are called odds ratios
• An odds ratio of 3.0 indicates odds are 3 times higher for each unit change in X
• Or, you can say the odds increase “by a factor of 3”.
• An odds ratio of .5 indicates odds decrease by ½ for each unit change in X.
• Odds ratios < 1 indicate negative effects.
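In R, odds ratios come from exponentiating the fitted coefficients (a sketch continuing the hypothetical logit_fit):

exp(coef(logit_fit))     # odds ratios for each predictor
exp(confint(logit_fit))  # 95% confidence intervals on the odds-ratio scale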
Interpreting Coefficients
•Example: Do you drink coffee?
• Y=1 indicates coffee drinkers; Y=0 indicates no coffee
• Key independent variable: Year in grad program
• Observed “raw” coefficient: b = 0.67
• A positive effect… each year increases log odds by .67
• But how big is it really?
• Exponentiation: e^.67 = 1.95
• Odds increase multiplicatively by 1.95
• If a person’s initial odds were 2.0 (2:1), an extra year of school would
result in: 2.0*1.95 = 3.90
• The odds nearly DOUBLE for each unit change in X
• Net of other variables in the model…
Interpreting Coefficients
•Exponentiated coefficients (“odds ratios”) operate
multiplicatively
• Effect on odds is found by multiplying coefficients
• e^b of 1.0 means that a variable has no effect
• Multiplying anything by 1.0 results in the same value
• e^b > 1.0 means that the variable has a positive effect on the odds of "Y=1"
• e^b < 1.0 means that the variable has a negative effect
•Hint: Papers may present results as “raw” coefficients
or odds ratios
• It is important to be aware of what you’re looking at
• If all coeffs are positive, they might be odds ratios!
Interpreting Coefficients
•To further aid interpretation, we can: convert
exponentiated coefficients to % change in odds
• Calculate: (exponentiated coef - 1)*100%
• Ex: (e^.67 – 1) * 100% = (1.95 – 1) * 100% = 95%
• Interpretation: Every unit change in X (year of school) increases the odds of
coffee drinking by 95%
•What about a 2-point change in X?
• Is it 2 * 95%? No!!! You must multiply odds ratios:
• (1.95 * 1.95 – 1) * 100% = (3.80 – 1) * 100 = +280%
• 3-point change = (1.95 * 1.95 * 1.95 – 1) * 100%
• N-point change = (OR^N – 1) * 100%
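These conversions are easy to verify numerically (plain R arithmetic with the b = .67 coffee example):

b <- 0.67
(exp(b) - 1) * 100    # 1-unit change: about +95%
(exp(b)^2 - 1) * 100  # 2-unit change: about +280%, not 2 * 95%
(exp(b)^3 - 1) * 100  # 3-unit change: multiply odds ratios, then convert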
Interpreting Coefficients
•What is the effect of a 1-unit decrease in X?
• No, you can’t flip sign… it isn’t -95%
• You must invert odds ratios to see opposite effect
• Additional year in school = (1.95 – 1) * 100% = +95%
• One year less: (1/1.95 – 1)*100 =(.512 -1)*100= -48.7%
•What is the effect of two variables together?
• To combine odds ratios you must multiply
• Ex: Have a mean advisor; b = 1.2; OR = e^1.2 = 3.32
• Effect of 1 additional year AND mean advisor:
• (1.95 * 3.32 – 1)*100 = (6.47 – 1) * 100% = 547% increase in odds of coffee
drinking…
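And the inverse and combined effects (the 1.2 advisor coefficient is the slide's hypothetical example):

(1 / exp(0.67) - 1) * 100         # 1-unit decrease: about -49%
(exp(0.67) * exp(1.2) - 1) * 100  # extra year AND mean advisor: ~+549% (slide rounds to +547%)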
Interpreting Coefficients
•Gun ownership: Effect of education?
. logistic gun male educ income south liberal, coef
Logistic regression Number of obs = 850
LR chi2(5) = 89.53
Prob > chi2 = 0.0000
Log likelihood = -502.7251 Pseudo R2 = 0.0818
------------------------------------------------------------------------------
gun | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
male | .7837017 .156764 5.00 0.000 .4764499 1.090954
educ | -.0767763 .0254047 -3.02 0.003 -.1265686 -.026984
income | .2416647 .0493794 4.89 0.000 .1448828 .3384466
south | .7363169 .1979038 3.72 0.000 .3484327 1.124201
liberal | -.1641107 .0578167 -2.84 0.005 -.2774294 -.0507921
_cons | -2.28572 .6200443 -3.69 0.000 -3.500984 -1.070455
------------------------------------------------------------------------------
• Educ: (e^-.0768 – 1) * 100% = –7.39%, i.e., 7.39% lower odds per year
• Also: Male: (e^.78 – 1) * 100% = +118% -- more than double!
Raw Coefs vs. Odds ratios
• It is common to present results either way:
. logistic gun male educ income south liberal, coef
------------------------------------------------------------------------------
gun | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
male | .7837017 .156764 5.00 0.000 .4764499 1.090954
educ | -.0767763 .0254047 -3.02 0.003 -.1265686 -.026984
income | .2416647 .0493794 4.89 0.000 .1448828 .3384466
south | .7363169 .1979038 3.72 0.000 .3484327 1.124201
liberal | -.1641107 .0578167 -2.84 0.005 -.2774294 -.0507921
_cons | -2.28572 .6200443 -3.69 0.000 -3.500984 -1.070455
------------------------------------------------------------------------------
. logistic gun male educ income south liberal
------------------------------------------------------------------------------
gun | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
male | 2.189562 .3432446 5.00 0.000 1.610347 2.977112
educ | .926097 .0235272 -3.02 0.003 .8811137 .9733768
income | 1.273367 .0628781 4.89 0.000 1.155904 1.402767
south | 2.08823 .4132686 3.72 0.000 1.416845 3.077757
liberal | .848648 .049066 -2.84 0.005 .7577291 .9504762
------------------------------------------------------------------------------
Can you see the relationship? Negative coefficients yield odds ratios below 1.0!
Predicted Probabilities
•To determine predicted probabilities, first compute the predicted logit value:
$$\hat{L}_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_K X_{Ki}$$
• Then, plug the logit values back into the P formula:
$$P(Y = 1) = \frac{e^{\hat{L}_i}}{1 + e^{\hat{L}_i}} = \frac{1}{1 + e^{-\hat{L}_i}}$$
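In R, predict() on a glm object does both steps (a sketch with the hypothetical logit_fit and the profile data frame from the LPM example):

predict(logit_fit, newdata = profile, type = "link")      # the predicted logit L-hat
predict(logit_fit, newdata = profile, type = "response")  # the predicted probability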
Predicted Probabilities: Own a gun?
•Predicted probability for a female PhD student
• Highly educated, northern, liberal female:
$$\hat{L}_i = -2.28 + .78(0) - .077(20) + .24(4) + .73(0) - .16(7) = -4.0$$
$$P(Y = 1) = \frac{e^{\hat{L}_i}}{1 + e^{\hat{L}_i}} = \frac{e^{-4.0}}{1 + e^{-4.0}} = \frac{.018}{1.018} = .017$$
------------------------------------------------------------------------------
gun | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
male | .7837017 .156764 5.00 0.000 .4764499 1.090954
educ | -.0767763 .0254047 -3.02 0.003 -.1265686 -.026984
income | .2416647 .0493794 4.89 0.000 .1448828 .3384466
south | .7363169 .1979038 3.72 0.000 .3484327 1.124201
liberal | -.1641107 .0578167 -2.84 0.005 -.2774294 -.0507921
_cons | -2.28572 .6200443 -3.69 0.000 -3.500984 -1.070455
------------------------------------------------------------------------------
The Logit Curve
• Effect of log odds on probability = nonlinear!
• From Knoke et al. p. 300
Predicted Probabilities
•Important point: Substantive effect of a variable on
predicted probability differs depending on values of other
variables
• If probability is already high (or low), variable changes may matter less…
• Suppose a 1-point change in X doubles the odds…
• Effect isn’t substantively consequential if probability (Y=1) is already very high
• Ex: 20:1 odds = .95 probability; 40:1 odds = .975 probability
• Change in probability is only .025
• Effect matters a lot for cases with probabilities near .5
• 1:1 odds = .5 probability. 2:1 odds = .67 probability
• Change in probability is nearly .2!
Logit Example: Own a gun?
$$\hat{L}_i = -2.28 + .78(0) - .077(22) + .24(4) + .73(0) - .16(7) = -4.16$$
•Predicted probability of gun ownership for a female PhD student is very low: P = .017
• Two additional years of education lowers the probability from .017 to .015 – not a big effect
• An additional unit change can't have a big effect – because the probability can't go below zero
• It would matter much more for a southern male…
$$P(Y = 1) = \frac{e^{\hat{L}_i}}{1 + e^{\hat{L}_i}} = \frac{e^{-4.16}}{1 + e^{-4.16}} = \frac{.0156}{1.0156} = .0153$$
Predicted Probabilities
•Predicted probabilities are a great way to make findings
accessible to a reader
• Often people make bar graphs of probabilities
• 1. Show predicted probabilities for real cases
• Ex: probability of civil war for Ghana vs. Sweden
• 2. Show probabilities for “hypothetical” cases that exemplify key
contrasts in your data
• Ex: Guns: Southern male vs. female PhD student
• 3. Show how a change in critical independent variable would affect
predicted probability
• Ex: Guns: What would happen to southern male who went and got a PhD?
Marginal Change in Logit
•Issue: How to best capture effect size in non-linear models?
• % Change in odds ratios for 1-unit change in X
• Change in actual probability for 1-unit change in X
• Either for hypothetical cases or an actual case
•Another option: marginal change
• The actual slope of the curve at a specific point
• Again, can be computed for real or hypothetical cases
• Recall from calculus: derivatives are slopes...
• So, a marginal change is just a derivative.
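For the logit model this derivative has a standard closed form, worth stating explicitly (a textbook result, not specific to these slides):

$$\frac{\partial P(Y=1)}{\partial X_j} = \beta_j \, P(Y=1)\left(1 - P(Y=1)\right)$$

So the slope is steepest at P = .5, where it equals β_j/4, and flattens as P approaches 0 or 1: exactly the pattern in the gun examples above.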
Sensitivity / Specificity of Prediction
• Sensitivity: Of gun owners, what proportion were correctly predicted to own a gun?
• Specificity: Of non-gun owners, what proportion did we correctly predict?
• Choosing a different probability cutoff affects those values
• If we reduce the cutoff to P > .4, we’ll catch a higher proportion of gun owners
• But, we’ll incorrectly identify more non-gun owners.
• And, we’ll have more false positives.
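A sketch of a classification table in R (hypothetical logit_fit and guns data; the .5 cutoff is the conventional default):

phat <- predict(logit_fit, type = "response")  # in-sample predicted probabilities
pred <- as.numeric(phat > 0.5)                 # classify at the .5 cutoff
tab  <- table(observed = guns$gun, predicted = pred)
tab["1", "1"] / sum(tab["1", ])  # sensitivity: owners correctly predicted
tab["0", "0"] / sum(tab["0", ])  # specificity: non-owners correctly predicted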
Hypothesis tests
•Testing hypotheses using logistic regression
• H0: There is no effect of year in grad program on coffee drinking
• H1: Year in grad school is associated with coffee
• Or, one-tail test: Year in school increases probability of coffee
• MLE estimation yields standard errors… like OLS
• Test statistic: 2 options; both yield same results
• t = b/SE… just like OLS regression
• Wald test (Chi-square, 1df); essentially the square of t
• Reject H0 if Wald or t > critical value
• Or if p-value less than alpha (usually .05).
Model Fit: Likelihood Ratio Tests
• MLE computes a likelihood for the model
• “Better” models have higher likelihoods
• Log likelihood is typically a negative value, so “better” means a less negative value… -100 > -1000
• Log likelihood ratio test: Allows comparison of any two nested models
• One model must be a subset of vars in other model
• You can’t compare totally unrelated models!
• Models must use the exact same sample.
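In R, nested glm fits can be compared with anova() and a chi-square test (a sketch; reduced_fit is a hypothetical model dropping some predictors from logit_fit):

reduced_fit <- glm(gun ~ male + south, data = guns, family = binomial)
anova(reduced_fit, logit_fit, test = "Chisq")  # likelihood ratio test of nested models
logLik(logit_fit)                              # the full model's log likelihood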
Model Fit: Pseudo R-Square
•Pseudo R-square
• “A descriptive measure that indicates roughly the proportion of
observed variation accounted for by the… predictors.” Knoke et al, p.
313
Logistic regression Number of obs = 850
LR chi2(5) = 89.53
Prob > chi2 = 0.0000
Log likelihood = -502.7251 Pseudo R2 = 0.0818
------------------------------------------------------------------------------
gun | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
male | 2.189562 .3432446 5.00 0.000 1.610347 2.977112
educ | .926097 .0235272 -3.02 0.003 .8811137 .9733768
income | 1.273367 .0628781 4.89 0.000 1.155904 1.402767
south | 2.08823 .4132686 3.72 0.000 1.416845 3.077757
liberal | .848648 .049066 -2.84 0.005 .7577291 .9504762
------------------------------------------------------------------------------
Model explains roughly 8% of variation in Y
Assumptions & Problems
• Assumption: Independent random sample
• Serial correlation or clustering violate assumptions; bias SE estimates and hypothesis tests
• Multicollinearity: High correlation among independent variables causes
problems
• Unstable, inefficient estimates
• Watch for coefficient instability, check VIF/tolerance
• Remove unneeded variables or create indexes of related variables.
Assumptions & Problems
• Outliers/Influential cases
• Unusual/extreme cases can distort results, just like OLS
Assumptions & Problems
•Insufficient variance: You need cases for both values of
the dependent variable
• Extremely rare (or common) events can be a problem
• Suppose N=1000, but only 3 are coded Y=1
• Estimates won’t be great
•Also: Maximum likelihood estimates cannot be
computed if any independent variable perfectly predicts
the outcome (Y=1)
• Ex: Suppose taking sociology classes drives all students to drink coffee... so there is no variation…
• In that case, you cannot include a dummy variable for taking sociology classes
in the model.
Assumptions & Problems
• Model specification / Omitted variable bias
• Just like any regression model, it is critical to include appropriate variables in the model
• Omission of important factors or ‘controls’ will lead to misleading results.
Probit
• Probit models are an alternative to logistic regression
• Involves a different non-linear transformation
• Generally yields results very similar to logit models
• Coefficients are rescaled by a factor of (approximately) 1.6
• For ‘garden variety’ analyses, there is little reason to prefer either logit or probit
• But, probit has advantages in some circumstances
• Ex: Multinomial models that violate the IIA assumption (to be discussed later).
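In R, a probit fit is the same glm() call with a different link (a sketch; the coefficient ratio illustrates the rough 1.6 rescaling):

probit_fit <- glm(gun ~ male + educ + income + south + liberal,
                  data = guns, family = binomial(link = "probit"))
coef(logit_fit) / coef(probit_fit)  # ratios cluster roughly around 1.6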
Takeaway
• LPM models are easy to interpret and work well if independent variables do not take extreme
values
• Logit models are the backbone of binary choice models, and their coefficients should be interpreted carefully.