Linear Probability Models and Big Data
Kosher or Not?
Galit Shmueli & Suneel Chatla
Linear Regression on Y:
Y = β0 + β1X1 + … + βkXk + ε
ε ~ iid N(0, σ²)
Y = {0,1}
What is a Linear Probability Model (LPM)?
Used for…
• Explaining: estimating/testing β
• Predicting: class probabilities
Popular in some fields
but not in Information
Systems
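A minimal sketch of the definition above, assuming a single covariate: the LPM is ordinary least squares applied directly to the 0/1 outcome, and the fitted values are read as class probabilities. The helper name `ols_fit`, the sample size and the coefficients used to generate the data are illustrative assumptions, not values from the slides.

```python
import random

def ols_fit(x, y):
    """Least-squares intercept and slope for a single covariate."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

rng = random.Random(0)
x = [rng.uniform(-0.5, 0.5) for _ in range(20000)]
# Hypothetical truth, chosen only for illustration: P(Y = 1 | x) = 0.5 + 0.4 x
y = [1 if rng.random() < 0.5 + 0.4 * xi else 0 for xi in x]

b0, b1 = ols_fit(x, y)              # the LPM is plain OLS on the 0/1 outcome
p_hat = [b0 + b1 * xi for xi in x]  # fitted values read as class probabilities
print(f"intercept={b0:.3f}, slope={b1:.3f}")
```

Nothing in the fit constrains `p_hat` to the interval [0, 1], which is one of the usual criticisms of the model.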
Criticism in the Literature
ε ~ iid N(0, σ²)
Common advice: use logistic/probit model
Why do researchers still use LPM?
Compared to logit/probit:
• Easy coefficient interpretation
• Same statistical significance
• Works under quasi/full-separation
• Cheap computation
Relevant for Inference
Relevant for Prediction
LPM is rare in IS
Should we use LPM?
Our Approach: Extensive Simulation
Evaluation
Explanatory: Estimate β
Predictive: Predict new records
Big Data
Very large sample
Many variables
Models
Correctly specified
Over specified
Under specified
Simulated Data
Sample sizes: 50, 500, 2M
Signal-to-noise: High, low
Outcome Y: Binary (Yes/No), dichotomized (High/Low)
Study Design
Covariates:
X ~ U(-0.5,0.5)
ε ~ N(0, σ²)
Simulation Models:
y = 0.5 + β1x1 + ε
y = 0.5 + ε
y = 0.5 + β1x1 + β2x2 + ε
Signal-to-noise:
High: σ=0.01, β1=1, (β2=0.01)
Low: σ=0.10, β1=0.10, (β2=0.45)
Outcome Origin:
Binary: yb ~ Bernoulli (y)
Dichotomized: yd = I(y ≥ median(y))
Estimated Models:
y = 0.5 + β1x1 + ε
y = 0.5 + β1x1 + β2x2 + ε
Prediction:
n=500 holdout sample
Logit and Probit models
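The data-generating steps in the study design can be sketched as follows, for the one-covariate simulation model only. This is an illustration under assumptions, not the authors' code: the function name `simulate`, the `SETTINGS` dictionary and the seed are hypothetical, and the Bernoulli draw implicitly treats values of y outside [0, 1] as probabilities 0 or 1.

```python
import random
import statistics

def simulate(n, beta1, sigma, seed=0):
    """Draw x1, the latent numerical y, the binary yb and the dichotomized yd."""
    rng = random.Random(seed)
    x1 = [rng.uniform(-0.5, 0.5) for _ in range(n)]               # X ~ U(-0.5, 0.5)
    y = [0.5 + beta1 * x + rng.gauss(0.0, sigma) for x in x1]     # y = 0.5 + b1*x1 + e
    yb = [1 if rng.random() < yi else 0 for yi in y]              # yb ~ Bernoulli(y)
    med = statistics.median(y)
    yd = [1 if yi >= med else 0 for yi in y]                      # yd = I(y >= median(y))
    return x1, y, yb, yd

# Signal-to-noise settings as listed in the study design
SETTINGS = {
    "high": {"beta1": 1.0, "sigma": 0.01},
    "low": {"beta1": 0.10, "sigma": 0.10},
}

x1, y, yb, yd = simulate(500, **SETTINGS["high"])
print(sum(yb), sum(yd))
```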
Binary Y
Simulated: yb ~ Bernoulli(0.5 + β1x1 + ε)
Fitted: Correctly-specified model
Goal: Estimate slope (β1)
[Figure: panels for n=50, n=500, n=2M under high and low signal-to-noise; lines: True Model, LPM y = 0.5 + β1x1 + ε, LPM using WLS]
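The slides do not spell out how "LPM using WLS" is computed. A common textbook version, assumed here, is a two-step fit: run OLS, turn the fitted probabilities into weights 1 / (p(1 - p)), and refit by weighted least squares. The clipping bounds 0.01 and 0.99, the sample size and the helper name `wls_fit` are illustrative assumptions.

```python
import random

def wls_fit(x, y, w):
    """Weighted least-squares intercept and slope for a single covariate."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

rng = random.Random(2)
n = 20000
x = [rng.uniform(-0.5, 0.5) for _ in range(n)]
# High signal-to-noise setting from the study design: beta1 = 1, sigma = 0.01
latent = [0.5 + 1.0 * xi + rng.gauss(0.0, 0.01) for xi in x]
yb = [1 if rng.random() < p else 0 for p in latent]

# Step 1: OLS is WLS with equal weights
b0, b1 = wls_fit(x, yb, [1.0] * n)

# Step 2: weights from the fitted probabilities, clipped away from 0 and 1 (assumption)
p_hat = [min(max(b0 + b1 * xi, 0.01), 0.99) for xi in x]
weights = [1.0 / (p * (1.0 - p)) for p in p_hat]
w0, w1 = wls_fit(x, yb, weights)
print(f"OLS slope={b1:.3f}, WLS slope={w1:.3f}")
```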
Binary Y:
With large sample, LPM
is fine for estimation
$$E[\hat{\beta}_b] = \beta + \left(\frac{X'X}{n}\right)^{-1} \frac{X'\varepsilon}{n} \xrightarrow{\; n \to \infty \;} \beta$$
Even with low signal
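The large-sample claim can be checked informally with a small simulation in the low signal-to-noise setting (β1 = 0.10, σ = 0.10). This is a sketch under assumptions: n = 200,000 stands in for the 2M case to keep the run short, and the helper names and seed are hypothetical.

```python
import random

def ols_slope(x, y):
    """Least-squares slope for a single covariate."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx

def lpm_slope(n, beta1, sigma, seed):
    """Simulate binary outcomes from the study design and fit the LPM."""
    rng = random.Random(seed)
    x = [rng.uniform(-0.5, 0.5) for _ in range(n)]
    latent = [0.5 + beta1 * xi + rng.gauss(0.0, sigma) for xi in x]
    yb = [1 if rng.random() < p else 0 for p in latent]
    return ols_slope(x, yb)

# Low signal-to-noise setting; 200,000 is a stand-in for the 2M sample
estimates = {n: lpm_slope(n, beta1=0.10, sigma=0.10, seed=1) for n in (50, 500, 200_000)}
for n, est in estimates.items():
    print(f"n={n}: estimated slope={est:.3f} (true value 0.10)")
```

A single run per sample size only hints at the pattern; the study itself repeats the simulation to assess bias.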
Binary Y
Goal: Predict 500 new records
[Figure: panels for n=50, n=500, n=2M under high and low signal-to-noise; classes Y=0 and Y=1; models: Logit, Probit, LPM, LPM using WLS]
Binary Y:
LPM predictive power same as logit/probit;
depends on signal (not n)
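A rough way to check the predictive claim is to fit both an LPM and a logit model on simulated binary data and compare classification accuracy on a 500-record holdout. The sketch below makes several assumptions not stated in the slides: a training size of 5,000, a 0.5 cutoff, accuracy as the metric, and a hand-written Newton-Raphson logit fit in place of a statistics library.

```python
import math
import random

def simulate(n, beta1, sigma, rng):
    """Binary outcomes yb ~ Bernoulli(0.5 + beta1*x + e), as in the study design."""
    x = [rng.uniform(-0.5, 0.5) for _ in range(n)]
    yb = [1 if rng.random() < 0.5 + beta1 * xi + rng.gauss(0.0, sigma) else 0 for xi in x]
    return x, yb

def fit_lpm(x, y):
    """OLS intercept and slope on the 0/1 outcome."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def fit_logit(x, y, iters=25):
    """Logistic regression with one covariate, fitted by Newton-Raphson."""
    a0 = a1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(a0 + a1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        a0 += (h11 * g0 - h01 * g1) / det
        a1 += (h00 * g1 - h01 * g0) / det
    return a0, a1

def accuracy(pred, actual):
    return sum(int(p == a) for p, a in zip(pred, actual)) / len(actual)

rng = random.Random(4)
x_train, y_train = simulate(5000, beta1=1.0, sigma=0.01, rng=rng)  # high signal
x_test, y_test = simulate(500, beta1=1.0, sigma=0.01, rng=rng)     # 500 new records

b0, b1 = fit_lpm(x_train, y_train)
a0, a1 = fit_logit(x_train, y_train)

pred_lpm = [1 if b0 + b1 * xi >= 0.5 else 0 for xi in x_test]
pred_logit = [1 if a0 + a1 * xi >= 0.0 else 0 for xi in x_test]  # P >= 0.5 iff linear predictor >= 0
acc_lpm = accuracy(pred_lpm, y_test)
acc_logit = accuracy(pred_logit, y_test)
print(f"LPM accuracy={acc_lpm:.3f}, logit accuracy={acc_logit:.3f}")
```

With one covariate both models classify by thresholding x, so their predictions differ only for records that fall between the two estimated cut points.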
Dichotomized Y
[Figure: panels for n=50, n=500, n=2M under high and low signal-to-noise; lines: OLS (numerical Y), LPM (yd), LPM using WLS]
Dichotomized Y:
LPM gives biased coefs
$$\beta_d = \frac{1}{\sqrt{2\pi}\,\sigma_y}\,\beta$$
WLS makes it worse
Can correct bias if σy can be estimated
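The bias correction can be illustrated with a short simulation, assuming the relation β_d = β / (√(2π)·σy), so that the corrected slope is β_d·√(2π)·σy. In this sketch σy is estimated from the simulated numerical y, which would not normally be observed in practice; the sample size, seed and helper name are hypothetical. Because x is uniform rather than normal here, the correction should be read as approximate.

```python
import math
import random
import statistics

def ols_slope(x, y):
    """Least-squares slope for a single covariate."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx

rng = random.Random(3)
n = 20000
beta1, sigma = 0.10, 0.10  # low signal-to-noise setting from the study design
x = [rng.uniform(-0.5, 0.5) for _ in range(n)]
y = [0.5 + beta1 * xi + rng.gauss(0.0, sigma) for xi in x]

med = statistics.median(y)
yd = [1 if yi >= med else 0 for yi in y]  # dichotomized outcome

beta_d = ols_slope(x, yd)                 # LPM slope on the dichotomized outcome
sigma_y = statistics.pstdev(y)            # assumes the numerical y is available
beta_corrected = beta_d * math.sqrt(2.0 * math.pi) * sigma_y
print(f"LPM slope={beta_d:.3f}, corrected={beta_corrected:.3f}, true={beta1}")
```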
Simulated: y = 0.5 + β1x1 + ε, yd = I(y > med)
Fitted: Correctly-specified model
Goal: Estimate slope (β1)
Dichotomized Y
[Figure: panels for n=50, n=500, n=2M under high and low signal-to-noise; classes Y=0 and Y=1; models: Logit, Probit, LPM, LPM+WLS]
Dichotomized Y:
LPM predictive power
similar to logit/probit;
depends on signal (not n)
LPM+WLS is best
Goal: Predict 500 new records
Quick Summary: Correctly specified model
Binary Y:
• With large n, LPM is fine for estimation
Even with low signal
• LPM predictive power same as
logit/probit; depends on signal (not n)
Dichotomized Y:
• LPM gives biased coefficients
WLS makes it worse
Can correct bias with estimate of σy
• Predictive power similar to logit/probit;
depends on signal (not n)
WLS improves predictive power
Over-specified models
β1 is of interest
Simulated: y = 0.5 + β1x1 + ε
Estimated: y = 0.5 + β1x1 + β2x2 + ε
Simulated: y = 0.5 + ε
Estimated: y = 0.5 + β1x1 + ε
Binary Y:
• β1 coefficient insignificant
All sample sizes
• Prediction same as logit/probit
WLS doesn’t help
Binary Y:
• β1 (and β2) coefficients unbiased
For n=2M, identical to OLS
• Prediction same as logit/probit
WLS doesn’t help
Dichotomized Y:
• β1 coefficient insignificant
All sample sizes
• Prediction same as logit/probit
WLS improves prediction
Dichotomized Y:
• β1 coefficient biased
Worse with WLS; can correct bias
• Prediction same as logit/probit
WLS improves prediction
Modeling Auction Price
300,000 eBay auctions (Aug 2007- Jan 2008)
Price = f(min_bid, duration, seller_feedback, reserve)
1. Estimation/inference: determinants of price
2. Prediction: holdout sample (n = 5,000)
Dichotomized
Price
Inference/Estimation
Sample so large: all coefficients significant!
Bias due to dichotomization - corrected
Prediction
Removal of outliers gives identical ROC curves
Study Conclusions
• Explanatory modeling with a binary outcome –
large sample needed to reduce bias.
• Explanatory modeling with dichotomous outcome
requires σy to correct bias.
• Predicting a binary outcome (without WLS) or
dichotomous outcome (with WLS) – sample size
irrelevant
• Robust to over- or under-specified models
LPM is rare in IS
Linear Probability Models and Big Data: Kosher or Not?

  • 1. Linear Probability Models and Big Data: Kosher or Not? Galit Shmueli & Suneel Chatla
  • 2. What is a Linear Probability Model (LPM)? Linear regression on Y, where Y = {0, 1}: Y = β0 + β1X1 + … + βkXk + ε, ε ~ iid N(0, σ²). Used for: • Explaining: estimating/testing β • Predicting: class probabilities. Popular in some fields but not in Information Systems.
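To make the definition concrete, here is a minimal sketch of an LPM with a single covariate: ordinary least squares applied directly to a 0/1 outcome. The data-generating slope of 0.6, the seed, and the helper name `fit_lpm` are illustrative assumptions, not part of the slides.

```python
import random

def fit_lpm(x, y):
    """OLS of a 0/1 outcome y on one covariate x; returns (intercept, slope)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

rng = random.Random(0)  # arbitrary seed, for reproducibility only
x = [rng.uniform(-0.5, 0.5) for _ in range(20000)]
# Hypothetical data: P(Y=1 | x) = 0.5 + 0.6 * x, which stays inside [0.2, 0.8]
y = [1 if rng.random() < 0.5 + 0.6 * xi else 0 for xi in x]

b0, b1 = fit_lpm(x, y)
# Fitted values are read as class probabilities;
# b1 is the change in P(Y=1) per unit change in x.
p_hat = b0 + b1 * 0.25
print(f"b0={b0:.3f}, b1={b1:.3f}, P(Y=1 | x=0.25) ~ {p_hat:.3f}")
```

The slope is read directly as a change in probability, which is the "easy coefficient interpretation" usually cited in favor of the LPM.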
  • 3. Criticism in the Literature. ε ~ iid N(0, σ²). Common advice: use a logistic/probit model.
  • 4. Why do researchers still use LPM? Compared to logit/probit: • Easy coefficient interpretation • Same statistical significance • Works under quasi/full separation • Cheap computation. Slide labels: Relevant for Inference; Relevant for Prediction. LPM is rare in IS.
  • 6. Our Approach: Extensive Simulation. Evaluation: explanatory (estimate β); predictive (predict new records). Big Data: very large sample; many variables. Models: correctly specified; over-specified; under-specified. Simulated data: sample sizes 50, 500, 2M; signal-to-noise high, low; outcome Y binary (Yes/No), dichotomized (High/Low).
  • 7. Study Design. Covariates: X ~ U(-0.5, 0.5); ε ~ N(0, σ²). Simulation models: y = 0.5 + β1x1 + ε; y = 0.5 + ε; y = 0.5 + β1x1 + β2x2 + ε. Signal-to-noise: High: σ = 0.01, β1 = 1 (β2 = 0.01); Low: σ = 0.10, β1 = 0.10 (β2 = 0.45). Outcome origin: Binary: yb ~ Bernoulli(y); Dichotomized: yd = I(y ≥ median(y)). Estimated models: y = 0.5 + β1x1 + ε; y = 0.5 + β1x1 + β2x2 + ε. Prediction: n = 500 holdout sample; logit and probit models.
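The study design can be sketched as a small data generator. This is one reading of the slide under stated assumptions: a single covariate, the noise standard deviation passed as `s`, and the latent y clipped to [0, 1] before the Bernoulli draw (the slide does not say how values outside that range are handled). The names `simulate` and `SETTINGS` are hypothetical.

```python
import random
import statistics

def simulate(n, beta1, s, seed=0):
    """Return (x1, y, yb, yd) for the one-covariate model y = 0.5 + beta1*x1 + e."""
    rng = random.Random(seed)
    x1 = [rng.uniform(-0.5, 0.5) for _ in range(n)]           # X ~ U(-0.5, 0.5)
    y = [0.5 + beta1 * xi + rng.gauss(0.0, s) for xi in x1]   # e ~ N(0, s^2)
    # Binary outcome: yb ~ Bernoulli(y).
    # Assumption: clip y into [0, 1] so it is a valid probability.
    yb = [1 if rng.random() < min(1.0, max(0.0, yi)) else 0 for yi in y]
    # Dichotomized outcome: yd = I(y >= median(y))
    med = statistics.median(y)
    yd = [1 if yi >= med else 0 for yi in y]
    return x1, y, yb, yd

# Signal-to-noise settings as listed on the slide (beta2 omitted in this sketch)
SETTINGS = {
    "high": {"beta1": 1.0, "s": 0.01},
    "low": {"beta1": 0.10, "s": 0.10},
}

x1, y, yb, yd = simulate(500, **SETTINGS["high"])
print(f"share of yb=1: {sum(yb) / len(yb):.3f}, share of yd=1: {sum(yd) / len(yd):.3f}")
```

By construction about half of the dichotomized outcomes equal 1, while the share of ones in the binary outcome varies from sample to sample.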
  • 8. Binary Y. Goal: estimate slope (β1). Simulated: yb ~ Bernoulli(0.5 + β1x1 + ε). Fitted: correctly specified model. [Figure: panels for high and low signal-to-noise at n = 50, 500, 2M; legend: True Model, LPM y = 0.5 + β1x1 + ε, LPM using WLS.] Binary Y: with a large sample, LPM is fine for estimation, even with low signal: $E(\hat{\beta}_b) = \beta + \left(\frac{X'X}{n}\right)^{-1}\frac{X'\varepsilon}{n} \xrightarrow{n\to\infty} \beta$
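The large-sample claim can be checked with a short simulation. The sketch below fits the LPM slope on a binary outcome at increasing n in the high signal-to-noise setting. It assumes clipping of the Bernoulli probability to [0, 1], uses n = 200,000 as a stand-in for the slides' 2M to keep pure Python fast, and omits the WLS variant.

```python
import random

def simulate_binary(n, beta1, s, seed):
    """Draw x ~ U(-0.5, 0.5) and yb ~ Bernoulli(0.5 + beta1*x + e), e ~ N(0, s^2)."""
    rng = random.Random(seed)
    x, yb = [], []
    for _ in range(n):
        xi = rng.uniform(-0.5, 0.5)
        p = 0.5 + beta1 * xi + rng.gauss(0.0, s)
        p = min(1.0, max(0.0, p))  # assumption: clip to a valid probability
        x.append(xi)
        yb.append(1 if rng.random() < p else 0)
    return x, yb

def ols_slope(x, y):
    """Slope of the simple least-squares regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx

beta1, s = 1.0, 0.01  # high signal-to-noise setting
slopes = {}
for n in (50, 500, 200_000):  # 200,000 stands in for n = 2M
    x, yb = simulate_binary(n, beta1, s, seed=n)
    slopes[n] = ols_slope(x, yb)
    print(f"n={n:>7}: LPM slope = {slopes[n]:.3f} (true beta1 = {beta1})")
```

Small samples can land far from the true slope, while the largest sample should sit close to it.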
  • 9. Binary Y. Goal: predict 500 new records. [Figure: panels for high and low signal-to-noise at n = 50, 500, 2M, split by Y = 0 and Y = 1; models: Logit, Probit, LPM, LPM using WLS.] Binary Y: LPM predictive power is the same as logit/probit; it depends on signal (not n).
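A rough way to see the prediction result is to fit an LPM and a logit model on the same training sample and compare holdout accuracy. The sketch below assumes a training size of 5,000, a classification cutoff of 0.5, and a hand-written Newton-Raphson logit fit; none of these choices come from the slides, and probit and WLS are left out for brevity.

```python
import math
import random

def simulate_binary(n, beta1, s, rng):
    x = [rng.uniform(-0.5, 0.5) for _ in range(n)]
    # Assumption: clip the Bernoulli probability into [0, 1]
    p = [min(1.0, max(0.0, 0.5 + beta1 * xi + rng.gauss(0.0, s))) for xi in x]
    y = [1 if rng.random() < pi else 0 for pi in p]
    return x, y

def fit_lpm(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def fit_logit(x, y, iters=25):
    """One-covariate logistic regression via Newton-Raphson; returns (a, b)."""
    a = b = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = p * (1.0 - p)
            g0 += yi - p            # gradient, intercept
            g1 += (yi - p) * xi     # gradient, slope
            h00 += w                # entries of X'WX
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

def accuracy(prob, x, y, cutoff=0.5):
    hits = sum((prob(xi) >= cutoff) == (yi == 1) for xi, yi in zip(x, y))
    return hits / len(y)

rng = random.Random(11)
beta1, s = 1.0, 0.01                              # high signal-to-noise setting
x_tr, y_tr = simulate_binary(5000, beta1, s, rng)
x_ho, y_ho = simulate_binary(500, beta1, s, rng)  # n = 500 holdout

a_lpm, b_lpm = fit_lpm(x_tr, y_tr)
a_log, b_log = fit_logit(x_tr, y_tr)
acc_lpm = accuracy(lambda v: a_lpm + b_lpm * v, x_ho, y_ho)
acc_logit = accuracy(lambda v: 1.0 / (1.0 + math.exp(-(a_log + b_log * v))), x_ho, y_ho)
print(f"holdout accuracy: LPM = {acc_lpm:.3f}, logit = {acc_logit:.3f}")
```

With one covariate and a 0.5 cutoff, both models classify by nearly the same threshold on x, so their holdout accuracies should be very close.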
  • 10. Dichotomized Y. Goal: estimate slope (β1). Simulated: y = 0.5 + β1x1 + ε, yd = I(y > med). Fitted: correctly specified model. [Figure: panels for high and low signal-to-noise at n = 50, 500, 2M; legend: OLS (numerical Y), LPM (yd), LPM using WLS.] Dichotomized Y: LPM gives biased coefficients, $\beta_d = \frac{1}{\sqrt{2\pi}\,\sigma_y}\,\beta$; WLS makes it worse. The bias can be corrected if σy can be estimated.
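The bias and its correction can be illustrated numerically. The sketch below reads the slide's bias expression as β_d = β / (√(2π)·σ_y), which is an assumption about the garbled formula and relies on y being roughly normal. It therefore uses the low signal-to-noise setting, where the noise dominates, and estimates σ_y from the continuous y, which is available in a simulation.

```python
import math
import random
import statistics

def ols_slope(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx

rng = random.Random(7)
n, beta1, s = 200_000, 0.10, 0.10   # low signal-to-noise setting
x = [rng.uniform(-0.5, 0.5) for _ in range(n)]
y = [0.5 + beta1 * xi + rng.gauss(0.0, s) for xi in x]
med = statistics.median(y)
yd = [1 if yi >= med else 0 for yi in y]   # dichotomized outcome

slope_ols = ols_slope(x, y)    # regression on the numerical y
slope_lpm = ols_slope(x, yd)   # LPM on the dichotomized outcome

# Estimate sigma_y from the continuous y
my = sum(y) / n
sigma_y = math.sqrt(sum((yi - my) ** 2 for yi in y) / n)

# Undo the assumed bias factor 1 / (sqrt(2*pi) * sigma_y)
slope_corrected = slope_lpm * math.sqrt(2.0 * math.pi) * sigma_y

print(f"OLS on y:      {slope_ols:.3f}")
print(f"LPM on yd:     {slope_lpm:.3f}")
print(f"LPM corrected: {slope_corrected:.3f} (true beta1 = {beta1})")
```

The uncorrected LPM slope is several times larger than β1 in this setting, while the rescaled value returns close to it.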
  • 11. Dichotomized Y. Goal: predict 500 new records. [Figure: panels for high and low signal-to-noise at n = 50, 500, 2M, split by Y = 0 and Y = 1; models: Logit, Probit, LPM, LPM+WLS.] Dichotomized Y: LPM predictive power is similar to logit/probit; it depends on signal (not n). LPM+WLS is best.
  • 12. Quick Summary: Correctly specified model. Binary Y: • With large n, LPM is fine for estimation, even with low signal • LPM predictive power same as logit/probit; depends on signal (not n). Dichotomized Y: • LPM gives biased coefficients; WLS makes it worse; can correct bias with an estimate of σy • Predictive power similar to logit/probit; depends on signal (not n); WLS improves predictive power.
  • 13. Over-specified models (β1 is of interest). Case A: simulated y = 0.5 + β1x1 + ε, estimated y = 0.5 + β1x1 + β2x2 + ε. Binary Y: • β1 (and β2) coefficients unbiased; for n = 2M, identical to OLS • Prediction = logit/probit; WLS doesn't help. Dichotomized Y: • β1 coefficient biased; worse with WLS; can correct bias • Prediction = logit/probit; WLS improves prediction. Case B: simulated y = 0.5 + ε, estimated y = 0.5 + β1x1 + ε. Binary Y: • β1 coefficient insignificant at all sample sizes • Prediction = logit/probit; WLS doesn't help. Dichotomized Y: • β1 coefficient insignificant at all sample sizes • Prediction = logit/probit; WLS improves prediction.
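The over-specified case in which the data contain no covariate effect (simulated y = 0.5 + ε, estimated y = 0.5 + β1x1 + ε) can be sketched as follows. The sample size, seed, clipping of the Bernoulli probability, and the use of the classical OLS standard error are all assumptions made for illustration.

```python
import math
import random

rng = random.Random(3)
n, s = 20_000, 0.10
x1 = [rng.uniform(-0.5, 0.5) for _ in range(n)]
# True model has no covariate: y = 0.5 + e. Assumption: clip into [0, 1].
p = [min(1.0, max(0.0, 0.5 + rng.gauss(0.0, s))) for _ in range(n)]
yb = [1 if rng.random() < pi else 0 for pi in p]

# Fit the over-specified LPM yb = b0 + b1*x1 + error
mx, my = sum(x1) / n, sum(yb) / n
sxx = sum((xi - mx) ** 2 for xi in x1)
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x1, yb))
slope = sxy / sxx
intercept = my - slope * mx

# Classical standard error and t statistic for the slope
rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x1, yb))
se = math.sqrt(rss / (n - 2) / sxx)
t_stat = slope / se
print(f"b1 = {slope:.4f}, se = {se:.4f}, t = {t_stat:.2f}")
```

Because x1 plays no role in generating yb, the estimated slope should sit near zero relative to its standard error in most draws.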
  • 14. Modeling Auction Price. 300,000 eBay auctions (Aug 2007 - Jan 2008). Price = f(min_bid, duration, seller_feedback, reserve). 1. Estimation/inference: determinants of price. 2. Prediction: holdout sample (n = 5,000). Dichotomized Price.
  • 15. Inference/Estimation. The sample is so large that all coefficients are significant! Bias due to dichotomization: corrected.
  • 16. Prediction. Removal of outliers gives identical ROC curves.
  • 17. Study Conclusions • Explanatory modeling with a binary outcome: a large sample is needed to reduce bias. • Explanatory modeling with a dichotomous outcome requires σy to correct bias. • Predicting a binary outcome (without WLS) or a dichotomous outcome (with WLS): sample size is irrelevant. • Robust to over- or under-specified models. LPM is rare in IS.