Binary outcome models are widely used in many real-world applications. Probit and logit models can be used to analyze this type of data; in particular, dose-response data can be analyzed with these two models.
Linear Probability Models and Big Data: Prediction, Inference and Selection Bias
1. Linear Probability Models and Big Data: Prediction, Inference and Selection Bias
Suneel Chatla, Galit Shmueli
Institute of Service Science, National Tsing Hua University, Taiwan
2. Outline
• Introduction to binary outcome models
• Motivation: rare use of LPM
• Study goals
o Estimation and inference
o Classification
o Selection bias
• Simulation study
• eBay data (in paper)
• Conclusions
5. The purpose of binary-outcome regression models
• Inference and estimation
• Selection bias
• Prediction (classification)
6. Summary of IS literature (MISQ, JAIS, ISR and MS: 2000–2016)
• Inference and estimation: 60
• Selection bias: 31
• Classification and prediction: 5
Only 8 papers used LPM; 3 are from this year alone.
"Implementing a campaign fixed effects model with multinomial logit is challenging due to the incidental parameter problem, so we opt to employ LPM…" – Burtch et al. (2016)
"The LPM is simple for both estimation and inference. LPM is fast and it allows for a reasonably accurate approximation of true preferences." – Schlereth & Skiera (2016)
8. Criticisms
Comparison of the three models in terms of their theoretical properties:

Criticism                     Logit   Probit   LPM
Non-normal error               ✔       ✔       ✖
Non-constant error variance    ✔       ✔       ✖
Unbounded predictions          ✔       ✔       ✖
Functional form                ✔✖      ✔✖      ✖
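The "unbounded predictions" row can be demonstrated with a quick simulation (a sketch with made-up data and coefficients, not from the paper): fitting OLS to a 0/1 outcome routinely produces fitted values below 0 and above 1, whereas logit/probit fitted probabilities are bounded by construction.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
x = rng.normal(0, 2, size=n)
p_true = 1 / (1 + np.exp(-(0.5 + 1.5 * x)))   # true logistic probabilities
z = rng.binomial(1, p_true)                    # observed binary outcome

# LPM: ordinary least squares of the 0/1 outcome on x
X = np.column_stack([np.ones(n), x])
beta_lpm = np.linalg.lstsq(X, z, rcond=None)[0]
p_lpm = X @ beta_lpm                           # fitted "probabilities"

print("LPM fitted values below 0:", int((p_lpm < 0).sum()))
print("LPM fitted values above 1:", int((p_lpm > 1).sum()))
```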
12. Latent framework
Y(n×1) = X(n×(p+1)) β((p+1)×1) + ε(n×1)
Z = 1 if Y > 0, 0 otherwise
Y is the latent continuous variable (not observable).

The assumed error distribution determines the model:
• logistic(0,1) → logit model
• N(0,1) → probit model
• U(0,1) → linear probability model
(Inference and estimation)
13. The MLEs of both logit and probit are consistent: β̂ → β in probability.
LPM estimates are proportionally and directionally consistent (Billinger, 2012): β̂_lpm → kβ in probability, for some constant k.
(Inference and estimation)
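The proportionality claim can be checked numerically (a sketch with assumed coefficients): with logistic data, OLS slopes converge to k·β for a common k, so coefficient ratios are preserved even though the levels are not.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200_000
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
beta = np.array([0.0, 0.5, 1.0])            # true slope ratio beta_2 / beta_1 = 2
p = 1 / (1 + np.exp(-(X @ beta)))           # logit data-generating process
z = rng.binomial(1, p)

# LPM (OLS) slopes: each is roughly k * beta_j, so their ratio stays near 2
b_lpm = np.linalg.lstsq(X, z, rcond=None)[0]
print("LPM slopes:", b_lpm[1], b_lpm[2])
print("LPM slope ratio:", b_lpm[2] / b_lpm[1])
```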
14. Marginal effects for interpreting effect size
For LPM (easy interpretation):
ME for x_ik = ∂E[z_i]/∂x_k = β_k
For the logit model (no direct interpretation of the coefficients):
ME for x_ik = ∂E[z_i]/∂x_k = e^{x_i β} / (1 + e^{x_i β})² · β_k
For the probit model (no direct interpretation of the coefficients):
ME for x_ik = ∂E[z_i]/∂x_k = φ(x_i β) β_k
(Inference and estimation)
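A minimal numeric check of these three formulas, using made-up coefficients and a single observation (all values are illustrative assumptions, not from the paper):

```python
import numpy as np

beta = np.array([0.5, 1.2])        # illustrative coefficients (intercept, x)
x_i = np.array([1.0, 0.3])         # one observation, intercept term included
xb = x_i @ beta                    # x_i' beta = 0.86

me_lpm = beta[1]                   # LPM: marginal effect is beta_k, constant in x

# Logit: logistic density at x_i'beta times beta_k
me_logit = np.exp(xb) / (1 + np.exp(xb)) ** 2 * beta[1]

# Probit: standard-normal density phi(x_i'beta) times beta_k
me_probit = np.exp(-xb ** 2 / 2) / np.sqrt(2 * np.pi) * beta[1]

print(f"LPM: {me_lpm:.3f}, logit: {me_logit:.3f}, probit: {me_probit:.3f}")
```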
15. Simulation study
• Sample sizes: {50, 500, 50000}
• Error distributions: {logistic, normal, uniform}
• 100 bootstrap samples
(Inference and estimation)
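The simulation design above can be sketched as nested loops (a toy version; the actual settings, estimators, and metrics are in the paper):

```python
import numpy as np

rng = np.random.default_rng(3)
samplers = {
    "logistic": lambda size: rng.logistic(0, 1, size),
    "normal":   lambda size: rng.normal(0, 1, size),
    "uniform":  lambda size: rng.uniform(-0.5, 0.5, size),
}
for n in (50, 500, 50_000):
    for dist, sample in samplers.items():
        x = rng.normal(size=n)
        z = (0.3 + 0.8 * x + sample(n) > 0).astype(int)  # latent-variable DGP
        # 100 bootstrap resamples of the LPM slope estimate
        slopes = []
        for _ in range(100):
            idx = rng.integers(0, n, n)
            X = np.column_stack([np.ones(n), x[idx]])
            slopes.append(np.linalg.lstsq(X, z[idx], rcond=None)[0][1])
        print(f"n={n} {dist}: LPM slope {np.mean(slopes):.3f} "
              f"(bootstrap sd {np.std(slopes):.3f})")
```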
23. Quasi-experiments
Like randomized experimental designs that test causal hypotheses, but lacking random assignment.
Treatment assignment:
● Assigned by experimenter
● Self-selection
(Selection bias)
25. Selection bias
Outcome model coefficients (bootstrap): both Heckman's and Olsen's methods perform similarly to the MLE.
(Selection bias)
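For context, Heckman's correction can be sketched as a two-step procedure (simulated data and all coefficients are illustrative assumptions): a probit model for selection, then OLS on the selected sample augmented with the inverse Mills ratio.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(1)
n = 5000
w = rng.normal(size=n)                      # selection-equation covariate
x = rng.normal(size=n)                      # outcome-equation covariate
u, e = rng.multivariate_normal([0, 0], [[1, 0.6], [0.6, 1]], n).T  # correlated errors
s = (0.5 + 1.0 * w + u > 0)                 # selection indicator
y = 1.0 + 2.0 * x + e                       # outcome, observed only when s is True

# Step 1: probit MLE for P(s = 1 | w)
W = np.column_stack([np.ones(n), w])
def neg_loglik(g):
    q = 2 * s - 1                           # +1 / -1 coding of selection
    return -norm.logcdf(q * (W @ g)).sum()
g_hat = minimize(neg_loglik, np.zeros(2)).x

# Step 2: OLS with the inverse Mills ratio as an extra regressor
mills = norm.pdf(W @ g_hat) / norm.cdf(W @ g_hat)
Xs = np.column_stack([np.ones(s.sum()), x[s], mills[s]])
b_hat = np.linalg.lstsq(Xs, y[s], rcond=None)[0]
print("outcome coefficients (intercept, slope, mills):", b_hat)
```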
26. Bottom line
Inference and estimation:
• Use LPM with a large sample; otherwise logit/probit is preferable
• With a small sample, use LPM with robust standard errors
Classification:
• Use LPM if the goal is classification or ranking
• Trim predicted probabilities
• If probabilities are needed, then logit/probit is preferable
Selection bias:
• Use LPM if the sample is large
• If both the selection and outcome models have the same predictors, LPM suffers from multicollinearity
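"Trim predicted probabilities" amounts to clipping out-of-range LPM fitted values into [0, 1] (the fitted values below are made-up for illustration):

```python
import numpy as np

p_lpm = np.array([-0.07, 0.12, 0.55, 0.93, 1.18])   # raw LPM fitted values
p_trimmed = np.clip(p_lpm, 0.0, 1.0)                  # trim into [0, 1]
print(p_trimmed)
```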
27. Thank you!
Suneel Chatla, Galit Shmueli (2016), "An Extensive Examination of Linear Regression Models with a Binary Outcome Variable", Journal of the Association for Information Systems (accepted).
Editor's Notes
Here is the outline of my presentation. First, I'm going to provide a brief introduction to the primary binary response models, including LPM, and talk about the motivation for our study. Then I'll examine the usage of LPM under different study goals, namely estimation and inference, classification, and selection bias. Finally, I'd like to discuss the simulation study and its results, and conclude with guidelines about when the usage of LPM is appropriate and when it is not. I will be happy to answer questions at any time during the presentation.
It actually tells us two things: 1. LPM is definitely not very popular. 2. People are still using it, probably because it has some advantages over the competing models.
Change Y(n×1) to β1, …, βk notation.
Do we really need k? We need it if we want to retrieve the original coefficients.