Linear Probability Models and Big Data:
Prediction, Inference and Selection Bias
Suneel Chatla
Galit Shmueli
Institute of Service Science
National Tsing Hua University
Taiwan
Outline
• Introduction to binary outcome models
• Motivation: rare use of LPM
• Study goals
  o Estimation and inference
  o Classification
  o Selection bias
• Simulation study
• eBay data (in paper)
• Conclusions

Binary outcome models, $Z \in \{0, 1\}$
$E[Z \mid x_1, \ldots, x_p] = \Pr(Z = 1 \mid x_1, \ldots, x_p) \stackrel{\text{def}}{=} p$
• Logit: $\log \frac{p}{1-p} = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p$
• Probit: $\Phi^{-1}(p) = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p$, where $\Phi$ is the standard normal CDF
• LPM: $p = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p$, i.e., OLS regression $E[Z \mid x_1, \ldots, x_p] = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p$
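To make the three link functions concrete, here is a minimal sketch using only the Python standard library. It maps one linear predictor $\eta = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p$ to $p$ under each model. The coefficients are hypothetical, and in practice each model's fitted coefficients sit on a different scale; the only point is that the logit and probit links keep $p$ inside (0, 1) while the LPM does not.

```python
import math

def linear_predictor(beta, x):
    """eta = beta_0 + beta_1 * x_1 + ... + beta_p * x_p."""
    return beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))

def logit_p(eta):
    """Invert log(p / (1 - p)) = eta."""
    return 1.0 / (1.0 + math.exp(-eta))

def probit_p(eta):
    """Invert Phi^{-1}(p) = eta using the standard normal CDF."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))

def lpm_p(eta):
    """LPM: p = eta, with nothing keeping it inside [0, 1]."""
    return eta

beta = [0.0, 1.0, -1.0]  # hypothetical coefficients
for x in ([0.0, 0.0], [1.0, -1.0]):
    eta = linear_predictor(beta, x)
    print(x, round(logit_p(eta), 3), round(probit_p(eta), 3), lpm_p(eta))
```

For the second covariate vector the linear predictor is 2, so the LPM "probability" exceeds 1 while the logit and probit values stay below 1.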
What are binary-outcome regression models used for?
• Inference and estimation
• Selection bias
• Prediction (classification)
Summary of IS literature (MISQ, JAIS, ISR and MS, 2000–2016)
• Inference and estimation: 60
• Selection bias: 31
• Classification and prediction: 5
Only 8 used LPM, and 3 of these are from this year alone.
"Implementing a campaign fixed effects model with Multinomial logit is challenging due to incidental parameter problem so we opt to employ LPM …" – Burtch et al. (2016)
"The LPM is simple for both estimation and inference. LPM is fast and it allows for a reasonable accurate approximation of true preferences." – Schlereth & Skiera (2016)
Statisticians don’t like LPM
Econometricians love LPM
Researchers rarely use LPM
WHY?
Criticisms: comparison of the three models in terms of their theoretical properties

| Model | Non-normal error | Non-constant error variance | Unbounded predictions | Functional form |
|---|---|---|---|---|
| Logit | ✔ | ✔ | ✔ | ✔✖ |
| Probit | ✔ | ✔ | ✔ | ✔✖ |
| LPM | ✖ | ✖ | ✖ | ✖ |
Advantages: comparison in terms of practical issues

| Model | Convergence issues | Incidental parameters | Easier interpretation | Computational speed |
|---|---|---|---|---|
| Logit | ✖ | ✖ | ✔ | ✔✖ |
| Probit | ✖ | ✖ | ✖ | ✔✖ |
| LPM | ✔ | ✔ | ✔ | ✔ |
The questions that matter to researchers: how do logit, probit and LPM compare for
• Inference and estimation?
• Classification?
• Selection bias?
Inference and estimation
• Consistency
• Marginal effects
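Latent framework
$Y_{n \times 1} = X_{n \times (p+1)}\, \beta_{(p+1) \times 1} + \varepsilon_{n \times 1}$, where $Y$ is a latent continuous variable (not observable)
$Z = 1$ if $Y > 0$, and $Z = 0$ otherwise
The error distribution determines the model:
• $\varepsilon \sim \mathrm{logis}(0, 1)$: logit model
• $\varepsilon \sim N(0, 1)$: probit model
• $\varepsilon \sim U(0, 1)$: linear probability model

Consistency
• The MLEs of both logit and probit are consistent: $\hat{\beta} \xrightarrow{p} \beta$.
• LPM estimates are proportionally and directionally consistent (Brillinger, 2012): $\hat{\beta}_{lpm} \xrightarrow{p} k\beta$.
[Figure: as $n$ grows, $\hat{\beta}$ converges to $\beta$ while $\hat{\beta}_{lpm}$ converges to $k\beta$]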
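The proportionality claim can be checked numerically. This is a minimal Python/NumPy sketch under our own assumptions, not the authors' simulation: Gaussian regressors (the setting of Brillinger's result), logistic latent errors, the true slopes (1, −1, 0.5, −0.5) used in the simulation tables, and an OLS fit of $Z$ on $X$. Each ratio of an LPM slope to the corresponding true slope estimates the same $k$; the value of $k$ depends on the regressor and error distributions, so it need not match any particular value reported elsewhere.

```python
import numpy as np

def simulate_latent_binary(n, beta, rng):
    """Latent model Y = X beta + eps with logistic eps; observe Z = 1{Y > 0}."""
    X = rng.standard_normal((n, len(beta)))  # Gaussian regressors (assumption)
    eps = rng.logistic(0.0, 1.0, size=n)
    Z = (X @ beta + eps > 0).astype(float)
    return X, Z

def lpm_fit(X, Z):
    """OLS of the binary outcome on an intercept plus X (the LPM)."""
    design = np.column_stack([np.ones(len(Z)), X])
    coef, *_ = np.linalg.lstsq(design, Z, rcond=None)
    return coef

rng = np.random.default_rng(0)
beta = np.array([1.0, -1.0, 0.5, -0.5])   # true slopes, zero intercept
X, Z = simulate_latent_binary(200_000, beta, rng)
coef = lpm_fit(X, Z)
ratios = coef[1:] / beta                   # each ratio estimates the same k
print("intercept:", coef[0].round(3))
print("LPM slope / true slope:", ratios.round(3))
```

With a symmetric error and a zero latent intercept, the LPM intercept lands near 0.5, and the four ratios agree closely, which is what "proportionally and directionally consistent" means in practice.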
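Marginal effects for interpreting effect size
• For LPM: $ME_{ik} = \frac{\partial E[z_i]}{\partial x_{ik}} = \beta_k$ (easy interpretation)
• For the logit model: $ME_{ik} = \frac{\partial E[z_i]}{\partial x_{ik}} = \frac{e^{x_i \beta}}{(1 + e^{x_i \beta})^2}\, \beta_k$
• For the probit model: $ME_{ik} = \frac{\partial E[z_i]}{\partial x_{ik}} = \phi(x_i \beta)\, \beta_k$, where $\phi$ is the standard normal density
Logit and probit coefficients have no direct interpretation: their marginal effects also depend on $x_i$ through the scaling factor.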
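The three marginal-effect formulas translate directly into code. A minimal standard-library sketch follows; the coefficients and covariate rows are hypothetical, and averaging the observation-level effects (an "average marginal effect") is one common summary that we chose for illustration, not necessarily the summary used in the study.

```python
import math

def logistic_density(eta):
    """e^eta / (1 + e^eta)^2, rewritten with exp(-|eta|) to avoid overflow."""
    e = math.exp(-abs(eta))
    return e / (1.0 + e) ** 2

def normal_density(eta):
    """Standard normal density phi(eta)."""
    return math.exp(-0.5 * eta * eta) / math.sqrt(2.0 * math.pi)

def marginal_effect(model, beta, x, k):
    """Marginal effect of predictor k (1-based; beta[0] is the intercept) at x."""
    eta = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
    if model == "lpm":
        return beta[k]                          # constant slope, no dependence on x
    if model == "logit":
        return logistic_density(eta) * beta[k]
    if model == "probit":
        return normal_density(eta) * beta[k]
    raise ValueError(f"unknown model: {model}")

def average_marginal_effect(model, beta, rows, k):
    """Average of the observation-level marginal effects over the rows of X."""
    return sum(marginal_effect(model, beta, x, k) for x in rows) / len(rows)

beta = [0.0, 1.0, -1.0]                         # hypothetical coefficients
rows = [[0.0, 0.0], [1.0, 0.5], [-0.5, 1.0]]    # hypothetical covariate rows
for model in ("lpm", "logit", "probit"):
    print(model, round(average_marginal_effect(model, beta, rows, 1), 4))
```

At $x_i\beta = 0$ the logit scaling factor is 0.25 and the probit factor is about 0.399, which is why raw logit and probit coefficients cannot be read as effects on the probability.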
Simulation study
• Sample sizes: {50, 500, 50,000}
• Error distributions: {logistic, normal, uniform}
• 100 bootstrap samples
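One way to lay out this design is sketched below in Python/NumPy. It is a sketch under stated assumptions, not the authors' code: Gaussian regressors, the true slopes (1, −1, 0.5, −0.5) with a zero intercept, and a centered U(−1, 1) for the uniform case (the slide writes U(0, 1); the centering is our choice). For every sample size and error distribution it generates latent data, then refits the LPM on 100 bootstrap resamples.

```python
import numpy as np

def draw_errors(dist, n, rng):
    """Latent errors; the centered U(-1, 1) is our assumption for the uniform case."""
    if dist == "logistic":
        return rng.logistic(0.0, 1.0, size=n)
    if dist == "normal":
        return rng.standard_normal(n)
    if dist == "uniform":
        return rng.uniform(-1.0, 1.0, size=n)
    raise ValueError(f"unknown error distribution: {dist}")

def lpm_coefficients(X, z):
    """OLS of the binary outcome on an intercept plus X."""
    design = np.column_stack([np.ones(len(z)), X])
    coef, *_ = np.linalg.lstsq(design, z, rcond=None)
    return coef

def bootstrap_lpm(X, z, n_boot, rng):
    """Refit the LPM on n_boot resamples of the rows (drawn with replacement)."""
    n = len(z)
    draws = np.empty((n_boot, X.shape[1] + 1))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        draws[b] = lpm_coefficients(X[idx], z[idx])
    return draws

rng = np.random.default_rng(1)
beta = np.array([1.0, -1.0, 0.5, -0.5])          # true slopes, zero intercept
summary = {}
for n in (50, 500, 50_000):
    for dist in ("logistic", "normal", "uniform"):
        X = rng.standard_normal((n, len(beta)))  # Gaussian regressors: an assumption
        z = (X @ beta + draw_errors(dist, n, rng) > 0).astype(float)
        summary[(n, dist)] = bootstrap_lpm(X, z, 100, rng)

for (n, dist), draws in summary.items():
    print(n, dist, "bootstrap SE of x1 slope:", draws[:, 1].std().round(4))
```

Swapping `lpm_coefficients` for a logit or probit fitter would give the matching columns of the comparison; that part is left out to keep the sketch dependency-free beyond NumPy.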
Comparison of standard models (k = 0.4)

| Coefficient | True | Logit | Probit | LPM | LPM (rescaled) |
|---|---|---|---|---|---|
| Intercept | 0 | 0 | 0 | 0.5 | |
| x1 | 1 | 0.99 | 1 | 0.47 | 1.02 |
| x2 | -1 | -1 | -1.01 | -0.43 | -1.07 |
| x3 | 0.5 | 0.5 | 0.5 | 0.21 | 0.52 |
| x4 | -0.5 | -0.5 | -0.5 | -0.21 | -0.52 |

Comparison of significance: both the coefficient significance and the non-significance results are identical across the three models.
Comparison of marginal effects
[Figure: estimated marginal effects of x1–x4 for the probit, logit and LPM models under logistic, normal and uniform errors, in panels for sample sizes 50, 500 and 50,000]
The distributions of the marginal effects are identical across the three models.
Classification and prediction
• LPM predictions can fall outside [0, 1]
Is trimming appropriate?
• Replace LPM predictions above 1 with 0.99 or 0.999
• Replace LPM predictions below 0 with 0.001 or 0.0001
[Figure: trimmed LPM predictions plotted against logit predictions]
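A minimal standard-library sketch of the trimming rule follows, with hypothetical LPM fitted values. We also replace values exactly equal to 0 or 1 (our choice), because any likelihood-based use of the predictions, such as the mean log-loss shown here, needs every probability strictly inside (0, 1).

```python
import math

def trim_predictions(preds, high=0.999, low=0.001):
    """Replace LPM predictions >= 1 with `high` and <= 0 with `low`.

    Boundary values 0 and 1 are included (our choice) so log-losses stay finite.
    """
    return [high if p >= 1 else low if p <= 0 else p for p in preds]

def log_loss(y, probs):
    """Mean negative Bernoulli log-likelihood; needs every prob strictly in (0, 1)."""
    return -sum(yi * math.log(p) + (1 - yi) * math.log(1 - p)
                for yi, p in zip(y, probs)) / len(y)

lpm_preds = [-0.15, 0.35, 0.62, 1.08]   # hypothetical LPM fitted values
y = [0, 0, 1, 1]                        # hypothetical outcomes
trimmed = trim_predictions(lpm_preds)
print(trimmed)
print(round(log_loss(y, trimmed), 4))
```

Calling `log_loss(y, lpm_preds)` on the untrimmed values would fail, since `math.log` rejects the negative argument produced by the 1.08 prediction.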
Classification
Classification accuracies are identical across logit, probit and LPM.
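Trimming does not change LPM class labels at a 0.5 cutoff, because values above 1 stay above the cutoff and values below 0 stay below it. The standard-library sketch below checks this on hypothetical fitted values and outcomes; the 0.5 cutoff is our assumption. Trimming can, however, create ties among out-of-range predictions, which matters when predictions are used for ranking.

```python
def trim(preds, high=0.999, low=0.001):
    """Replace predictions at or above 1 with `high` and at or below 0 with `low`."""
    return [high if p >= 1 else low if p <= 0 else p for p in preds]

def classify(probs, cutoff=0.5):
    """Label 1 when the (possibly untrimmed) prediction reaches the cutoff."""
    return [1 if p >= cutoff else 0 for p in probs]

def accuracy(y, labels):
    """Share of observations whose predicted label matches the outcome."""
    return sum(yi == li for yi, li in zip(y, labels)) / len(y)

lpm_preds = [-0.15, 0.35, 0.62, 1.08, 0.48]   # hypothetical LPM fitted values
y = [0, 0, 1, 1, 1]                           # hypothetical outcomes
raw_labels = classify(lpm_preds)
trimmed_labels = classify(trim(lpm_preds))
print(raw_labels == trimmed_labels, accuracy(y, raw_labels))
```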
Selection bias
Quasi-experiments resemble randomized experimental designs that test causal hypotheses, but lack random assignment.
Treatment assignment:
● Assigned by experimenter
● Self-selection
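Selection bias: two-stage (2SLS) methods
Stage 1: selection model (T) → adjustment → Stage 2: outcome model (Y)

Heckman (1977), probit selection model:
$E[T \mid X] = \Phi(X\gamma)$
$IMR = \frac{\phi(X\gamma)}{\Phi(X\gamma)}$
$Y = X\beta + \delta\, IMR + \varepsilon$

Olsen (1980), LPM selection model:
$E[T \mid X] = X\gamma$
$\lambda = X\gamma - 1$
$Y = X\beta + \delta\, \lambda + \varepsilon$

Selection adjustment: Olsen's is simpler.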
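The two correction terms can be computed as below, in a minimal Python/NumPy sketch with hypothetical stage-1 coefficients rather than estimated ones. It also shows why the LPM route needs care: Olsen's $\lambda = X\gamma - 1$ is an exact linear combination of the outcome regressors (including the intercept) when the selection model uses the same predictors, so the stage-2 design loses a rank. Adding a hypothetical selection-only predictor `w` restores full rank.

```python
import math
import numpy as np

def heckman_imr(xg):
    """Inverse Mills ratio phi(Xg) / Phi(Xg) for a probit selection index Xg."""
    pdf = math.exp(-0.5 * xg * xg) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(xg / math.sqrt(2.0)))
    return pdf / cdf

def olsen_lambda(xg):
    """Olsen's correction term: the LPM selection index minus one."""
    return xg - 1.0

rng = np.random.default_rng(2)
n = 1_000
X = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])  # outcome-model design
gamma = np.array([0.5, 0.2, -0.1])    # hypothetical stage-1 coefficients
w = rng.standard_normal(n)            # hypothetical selection-only predictor

lam_same = olsen_lambda(X @ gamma)               # selection model reuses X only
lam_extra = olsen_lambda(X @ gamma + 0.3 * w)    # selection model adds w

rank_same = np.linalg.matrix_rank(np.column_stack([X, lam_same]))
rank_extra = np.linalg.matrix_rank(np.column_stack([X, lam_extra]))
print("stage-2 design rank, same predictors:", rank_same, "of", X.shape[1] + 1)
print("stage-2 design rank, extra predictor:", rank_extra, "of", X.shape[1] + 1)
```

The Heckman IMR is a nonlinear function of $X\gamma$, so it avoids exact collinearity, although it can still be highly correlated with $X$ over the range of the data.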
Selection bias: outcome model coefficients (bootstrap)
Both Heckman's and Olsen's methods perform similarly to the MLE.
Bottom line
Inference and estimation
• Use LPM with a large sample; otherwise logit/probit is preferable
• With a small-sample LPM, use robust standard errors
Classification
• Use LPM if the goal is classification or ranking
• Trim predicted probabilities
• If probabilities are needed, logit/probit is preferable
Selection bias
• Use LPM if the sample is large
• If the selection and outcome models have the same predictors, the LPM-based adjustment suffers from multicollinearity
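Robust standard errors matter for the LPM because its error variance is heteroskedastic by construction: $\mathrm{Var}(z \mid x) = p(1 - p)$ changes with $x$. Below is a minimal NumPy sketch of the HC1 sandwich estimator, one common robust choice (the slides do not say which variant the authors used). In statsmodels the same estimator is available through `OLS(...).fit(cov_type="HC1")`. The small-sample data are simulated from a latent logistic model purely for illustration.

```python
import numpy as np

def lpm_with_robust_se(X, z):
    """OLS fit of a binary outcome (the LPM) with classical and HC1 robust SEs.

    X must already contain an intercept column.
    """
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ z
    resid = z - X @ beta
    # Classical SEs assume a constant error variance.
    classical = XtX_inv * (resid @ resid) / (n - k)
    # HC1 sandwich: (X'X)^-1 [sum e_i^2 x_i x_i'] (X'X)^-1, scaled by n / (n - k).
    meat = X.T @ (X * resid[:, None] ** 2)
    robust = XtX_inv @ meat @ XtX_inv * n / (n - k)
    return beta, np.sqrt(np.diag(classical)), np.sqrt(np.diag(robust))

rng = np.random.default_rng(3)
n = 50                                             # small sample
x = rng.standard_normal(n)
z = (x + rng.logistic(size=n) > 0).astype(float)   # latent logistic model
X = np.column_stack([np.ones(n), x])
beta, se_classical, se_robust = lpm_with_robust_se(X, z)
print("coefficients:", beta.round(3))
print("classical SE:", se_classical.round(3), "robust SE:", se_robust.round(3))
```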
Thank you!
Suneel Chatla, Galit Shmueli, (2016), An Extensive Examination of
Linear Regression Models with a Binary Outcome Variable, Journal of
the Association for Information Systems (Accepted).


Editor's Notes

  1. Here is the outline of my presentation. First, I'm going to give a brief introduction to the primary binary response models, including the LPM, and talk about the motivation for our study. Then I'll examine the use of the LPM under different study goals, namely estimation and inference, classification, and selection bias. Next, I'd like to discuss the simulation study and its results. Finally, I'll conclude with guidelines on when the use of the LPM is appropriate and when it is not. I'll be very happy to answer questions at any time during the presentation.
  2. This tells us two things: 1. the LPM is definitely not very popular; 2. people are still using it, probably because it has some advantages over the competing models.
  3. Change Y(n×1) to beta1,… betak notation.
  4. Do we really need k? Is it needed only if we want to retrieve the original coefficients?
  5. Trim the unbounded predictions with 0.99 or 0.001.