Linear Probability Models and Big Data: Prediction, Inference and Selection Bias

We compare the LPM with logit and probit under different study goals: inference, prediction and selection bias

  1. Linear Probability Models and Big Data: Prediction, Inference and Selection Bias. Suneel Chatla, Galit Shmueli. Institute of Service Science, National Tsing Hua University, Taiwan.
  2. Outline
     • Introduction to binary outcome models
     • Motivation: rare use of LPM
     • Study goals
       o Estimation and inference
       o Classification
       o Selection bias
     • Simulation study
     • eBay data (in paper)
     • Conclusions
  3. [Image-only slide.]
  4. Binary outcome models, $Z \in \{0, 1\}$. Define $E[Z \mid x_1, \dots, x_p] = \Pr(Z = 1 \mid x_1, \dots, x_p) \equiv p$.
     • Logit: $\log \frac{p}{1-p} = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p$
     • Probit: $\Phi^{-1}(p) = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p$, where $\Phi$ is the standard normal CDF
     • LPM (OLS regression): $p = E[Z \mid x_1, \dots, x_p] = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p$
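For concreteness, here is a minimal sketch (assuming Python with NumPy and statsmodels, and simulated data; none of this code appears on the slides) of fitting the three models to the same binary outcome:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 5000
X = sm.add_constant(rng.normal(size=(n, 2)))  # intercept + two predictors
beta = np.array([0.0, 1.0, -1.0])
p = 1 / (1 + np.exp(-X @ beta))               # logistic link for the true p
z = rng.binomial(1, p)                        # binary outcome Z in {0, 1}

logit = sm.Logit(z, X).fit(disp=0)            # log(p/(1-p)) = X beta
probit = sm.Probit(z, X).fit(disp=0)          # Phi^{-1}(p) = X beta
lpm = sm.OLS(z, X).fit()                      # p = X beta (linear probability model)
print(logit.params, probit.params, lpm.params, sep="\n")
```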
  5. The purpose of binary-outcome regression models: inference and estimation, prediction (classification), and selection bias.
  6. Summary of IS literature (MISQ, JAIS, ISR, and MS: 2000–2016): inference and estimation, 60 papers; selection bias, 31; classification and prediction, 5. Only 8 used LPM; 3 are from this year alone. "Implementing a campaign fixed effects model with Multinomial logit is challenging due to incidental parameter problem so we opt to employ LPM …" – Burtch et al. (2016). "The LPM is simple for both estimation and inference. LPM is fast and it allows for a reasonable accurate approximation of true preferences." – Schlereth & Skiera (2016).
  7. Statisticians don't like LPM. Econometricians love LPM. Researchers rarely use LPM. WHY?
  8. Criticisms: comparison of the three models in terms of their theoretical properties (✔ = not a problem, ✖ = a problem, ✔✖ = mixed).
      Model    Non-normal errors   Non-constant error variance   Unbounded predictions   Functional form
      Logit    ✔                   ✔                             ✔                       ✔✖
      Probit   ✔                   ✔                             ✔                       ✔✖
      LPM      ✖                   ✖                             ✖                       ✖
  9. Advantages: comparison in terms of practical issues.
      Model    Convergence issues   Incidental parameters   Easier interpretation   Computational speed
      Logit    ✖                    ✖                       ✔                       ✔✖
      Probit   ✖                    ✖                       ✖                       ✔✖
      LPM      ✔                    ✔                       ✔                       ✔
  10. The questions that matter to researchers: how do logit, probit, and LPM compare on inference & estimation, classification, and selection bias?
  11. Inference and estimation: consistency; marginal effects.
  12. Latent framework: $Y_{n \times 1} = X_{n \times (p+1)} \beta_{(p+1) \times 1} + \varepsilon_{n \times 1}$, where the continuous $Y$ is latent (not observable) and we observe $Z = 1$ if $Y > 0$, and $Z = 0$ otherwise. The assumed error distribution determines the model: $\varepsilon \sim \mathrm{logis}(0,1)$ gives the logit model; $\varepsilon \sim N(0,1)$ gives the probit model; $\varepsilon \sim U(0,1)$ gives the linear probability model.
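A small sketch of generating data from this latent framework (assumed Python helper; the name `simulate_latent` and the design-matrix construction are mine, while the error laws follow the slide):

```python
import numpy as np

def simulate_latent(n, beta, error="logistic", rng=None):
    """Draw Z = 1{X @ beta + eps > 0} under one of the three error laws."""
    if rng is None:
        rng = np.random.default_rng()
    X = np.column_stack([np.ones(n), rng.normal(size=(n, len(beta) - 1))])
    if error == "logistic":
        eps = rng.logistic(0, 1, n)   # -> logit model
    elif error == "normal":
        eps = rng.normal(0, 1, n)     # -> probit model
    else:
        eps = rng.uniform(0, 1, n)    # -> LPM (U(0,1), as stated on the slide)
    y_latent = X @ beta + eps         # latent continuous Y (unobserved)
    return X, (y_latent > 0).astype(int)
```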
  13. The MLEs of both logit and probit are consistent: $\hat{\beta} \xrightarrow{p} \beta$. LPM estimates are proportionally and directionally consistent (Billinger, 2012): $\hat{\beta}_{LPM} \xrightarrow{p} k\beta$ for a constant $k$, so as $n$ grows the LPM recovers the signs of $\beta$ and the coefficients up to the common scaling $k$ (coefficient ratios are preserved).
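Proportional consistency can be checked empirically; the sketch below reuses the `simulate_latent` helper above (the seed and sample size are arbitrary choices of mine):

```python
import numpy as np
import statsmodels.api as sm

beta = np.array([0.0, 1.0, -1.0, 0.5, -0.5])
X, z = simulate_latent(50_000, beta, error="logistic",
                       rng=np.random.default_rng(1))
b_lpm = sm.OLS(z, X).fit().params

# Ratios of slope estimates approximate ratios of true slopes (k cancels):
print(b_lpm[2] / b_lpm[1])   # ~ -1.0  (true: -1 / 1)
print(b_lpm[3] / b_lpm[1])   # ~  0.5  (true: 0.5 / 1)
```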
  14. Marginal effects for interpreting effect size.
      • LPM: ME for $x_{ik} = \frac{\partial E[z_i]}{\partial x_k} = \beta_k$ — easy interpretation.
      • Logit: ME for $x_{ik} = \frac{\partial E[z_i]}{\partial x_k} = \frac{e^{x_i \beta}}{(1 + e^{x_i \beta})^2} \beta_k$ — no direct interpretation.
      • Probit: ME for $x_{ik} = \frac{\partial E[z_i]}{\partial x_k} = \phi(x_i \beta) \beta_k$ — no direct interpretation.
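A sketch of computing these marginal effects (assuming the fits `logit`, `probit`, `lpm` and the data `X`, `z` from the earlier sketch; `get_margeff` is statsmodels' average-marginal-effects utility):

```python
import statsmodels.api as sm

logit = sm.Logit(z, X).fit(disp=0)
probit = sm.Probit(z, X).fit(disp=0)
lpm = sm.OLS(z, X).fit()

# Average marginal effects: mean over i of f(x_i beta) * beta_k
print(logit.get_margeff(at="overall").summary())   # e^{xb}/(1+e^{xb})^2 * beta_k
print(probit.get_margeff(at="overall").summary())  # phi(xb) * beta_k
print(lpm.params[1:])                              # LPM: ME is simply beta_k
```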
  15. Simulation study: sample sizes {50, 500, 50,000}; error distributions {logistic, normal, uniform}; 100 bootstrap samples. The driver below sketches the design.
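A condensed driver for this design (a sketch only; it reuses `simulate_latent` and leaves the per-fit bookkeeping as a comment):

```python
import numpy as np

beta = np.array([0.0, 1.0, -1.0, 0.5, -0.5])
for n in (50, 500, 50_000):
    for error in ("logistic", "normal", "uniform"):
        for b in range(100):                       # 100 bootstrap samples
            rng = np.random.default_rng(b)
            X, z = simulate_latent(n, beta, error, rng)
            # ... fit logit/probit/LPM and store coefficients,
            # marginal effects, and classifications for comparison
```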
  16. Comparison of standard models (k = 0.4).
      Coefficient   True   Logit   Probit   LPM
      Intercept     0      0       0        0.5
      x1            1      0.99    1        0.47
      x2            -1     -1      -1.01    -0.43
      x3            0.5    0.5     0.5      0.21
      x4            -0.5   -0.5    -0.5     -0.21
      Rescaled LPM slopes shown on the slide: 1.02, -1.07, 0.52, -0.52 — close to the true values once the scaling k is divided out.
  17. Comparison of significance: the coefficient significance and non-significance results are identical across the three models.
  18. Comparison of marginal effects. [Figure: distributions of estimated marginal effects ($\widehat{ME}$) for x1–x4 under logistic, normal, and uniform errors, at sample sizes 50, 500, and 50,000, for probit, logit, and LPM.] The distributions of marginal effects are identical across the three models.
  19. Classification and prediction: LPM predictions can fall beyond [0, 1].
  20. Is trimming appropriate? Replace LPM predictions above 1 with 0.99 or 0.999, and predictions below 0 with 0.001 or 0.0001. [Figure: trimmed LPM predictions plotted against logit predictions.]
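The trimming rule itself is a one-liner (assumed NumPy; `lpm` and `X` are from the earlier sketch, and the cutoffs are the ones quoted on the slide):

```python
import numpy as np

p_hat = lpm.predict(X)                  # LPM fitted values; may leave [0, 1]
p_trim = np.clip(p_hat, 0.001, 0.999)   # trim: floor at 0.001, cap at 0.999
```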
  21. Classification: classification accuracies are identical across the three models.
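A sketch of the accuracy comparison (same assumed fits and data as above; the 0.5 classification threshold is my assumption, not stated on the slide):

```python
import numpy as np

for name, p_hat in [("logit", logit.predict(X)),
                    ("probit", probit.predict(X)),
                    ("lpm", np.clip(lpm.predict(X), 0.001, 0.999))]:
    acc = np.mean((p_hat > 0.5) == z)   # classify at the 0.5 threshold
    print(name, round(acc, 3))
```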
  22. Selection Bias
  23. Quasi-experiments: like randomized experimental designs that test causal hypotheses, but lacking random assignment. Treatment assignment is either assigned by the experimenter or self-selected.
  24. Two-stage (2SLS) methods.
      • Heckman (1977), probit selection. Stage 1, selection model (T): $E[T \mid X] = \Phi(X\gamma)$. Adjustment: $IMR = \frac{\phi(X\gamma)}{\Phi(X\gamma)}$. Stage 2, outcome model (Y): $Y = X\beta + \delta \, IMR + \varepsilon$.
      • Olsen (1980), LPM selection. Stage 1: $E[T \mid X] = X\gamma$. Adjustment: $\lambda = X\gamma - 1$. Stage 2: $Y = X\beta + \delta \lambda + \varepsilon$. Olsen's adjustment is simpler.
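A sketch of Olsen's two-stage adjustment (the variables `T`, `Y`, `Xs`, `Xo` are hypothetical; Heckman's variant would swap in a probit first stage and the inverse Mills ratio):

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical data: T (0/1 treatment), Y (outcome), Xs (selection covariates,
# including an excluded instrument), Xo (outcome covariates). Without an
# excluded variable, lambda is collinear with Xo — see the Bottom line slide.
stage1 = sm.OLS(T, Xs).fit()             # Stage 1: LPM selection model
lam = Xs @ stage1.params - 1             # Olsen's adjustment: lambda = X gamma - 1

Xo_adj = np.column_stack([Xo, lam])      # Stage 2: append the adjustment term
stage2 = sm.OLS(Y[T == 1], Xo_adj[T == 1]).fit()   # outcome model, selected units
print(stage2.params)                     # last coefficient is delta
```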
  25. Selection bias: outcome model coefficients (bootstrap). Both Heckman's and Olsen's methods perform similarly to the MLE.
  26. Bottom line.
      • Inference and estimation: use LPM with a large sample; otherwise logit/probit is preferable. With small-sample LPM, use robust standard errors.
      • Classification: use LPM if the goal is classification or ranking; trim the predicted probabilities. If probabilities are needed, logit/probit is preferable.
      • Selection bias: use LPM if the sample is large. If both the selection and outcome models have the same predictors, LPM suffers from multicollinearity.
  27. Thank you! Suneel Chatla and Galit Shmueli (2016), "An Extensive Examination of Linear Regression Models with a Binary Outcome Variable," Journal of the Association for Information Systems (accepted).
