Logistic Regression: Behind the Scenes
Chris White
Capital One
October 9, 2016
Logistic Regression October 9, 2016 1 / 20
Logistic Regression: A quick refresher

Generative Model

    y_i | β, x_i ~ Bernoulli( σ(β, x_i) )

where

    σ(β, x) := 1 / (1 + exp(−β · x))

is the sigmoid function.
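As a quick illustration, the sigmoid and the implied probability are a few lines of plain Python; the two-branch form is a standard numerical-stability trick, and the coefficients and inputs below are made up:

```python
import math

def sigmoid(z):
    """Numerically stable sigmoid: 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For very negative z, exp(-z) overflows; use the algebraically equal form.
    e = math.exp(z)
    return e / (1.0 + e)

beta = [0.5, -1.2]                         # hypothetical coefficients
x = [2.0, 1.0]                             # hypothetical feature vector
z = sum(b * v for b, v in zip(beta, x))    # beta . x
p = sigmoid(z)                             # P[y = 1 | beta, x]
```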
Interpretation

This setup implies that the log-odds are a linear function of the inputs:

    log( P[y = 1] / P[y = 0] ) = β · x

This linear relationship makes the individual coefficients β_k easy to
interpret: a unit increase in x_k multiplies the odds of y = 1 by a factor
of exp(β_k) (all else held equal).

Question: Given data, how do we determine the "best" values for β?
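The exp(β_k) odds multiplier can be checked numerically; the coefficients and inputs below are purely illustrative:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def odds(p):
    return p / (1.0 - p)

beta = [0.8, -0.3]        # hypothetical coefficients
x_before = [1.0, 2.0]
x_after = [2.0, 2.0]      # unit increase in the first input

p_before = sigmoid(sum(b * v for b, v in zip(beta, x_before)))
p_after = sigmoid(sum(b * v for b, v in zip(beta, x_after)))

# The odds ratio equals exp(beta_0), regardless of the other inputs.
ratio = odds(p_after) / odds(p_before)
```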
Typical Answer

Maximum Likelihood Estimation:

    β* = arg max_β  ∏_i σ(β · x_i)^{y_i} (1 − σ(β · x_i))^{1 − y_i}

The log-likelihood is concave.

Maximum Likelihood Estimation allows us to do classical statistics
(i.e., we know the asymptotic distribution of the estimator).

Note: the likelihood is a function of the predicted probabilities and
the observed response.
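In practice one maximizes the log of this likelihood. A direct transcription in plain Python (with hypothetical data shapes: X a list of feature vectors, y a list of 0/1 labels):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(beta, X, y):
    """sum_i [ y_i * log sigma(beta . x_i) + (1 - y_i) * log(1 - sigma(beta . x_i)) ]"""
    total = 0.0
    for xi, yi in zip(X, y):
        p = sigmoid(sum(b * v for b, v in zip(beta, xi)))
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total
```

Because the log-likelihood is concave, any local maximizer a gradient-based solver finds is the global MLE.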
This is not the end of the story

Now we are confronted with an optimization problem... and the story
doesn't end here! Uncritically throwing your data into an off-the-shelf
solver could result in a bad model.
Important Decisions

Inference vs. prediction: your goals should influence your
implementation choices.

  - Coefficient accuracy vs. speed vs. predictive accuracy
  - Multicollinearity leads to both statistical and numerical issues
  - Desired input (e.g., missing values?)
  - Desired output (e.g., p-values)
  - Handling of edge cases (e.g., quasi-separable data)
  - Regularization and Bayesian inference

Goal: we want to explore these questions with an eye on statsmodels,
scikit-learn, and SAS's proc logistic.
Under the Hood: How Needs Translate to Implementation Choices

Prediction vs. Inference

Coefficients and Convergence

In the case of inference, coefficient sign and precision are important.
Example: a model includes economic variables and will be used for
"stress-testing" adverse economic scenarios.
Prediction vs. Inference, cont'd

Coefficients and Multicollinearity

  - Linear models are invariant under affine transformations of the data;
    this leads to an identifiability problem in the presence of "high"
    multicollinearity
  - Multicollinearity can threaten long-term model stability
  - You can test for multicollinearity with the condition number or with
    Variance Inflation Factors (VIF)
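Both diagnostics are straightforward to compute. A sketch using numpy on synthetic, nearly collinear data (all names and the hand-rolled VIF are illustrative, not any package's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = x1 + rng.normal(scale=0.01, size=500)   # nearly a copy of x1
X = np.column_stack([x1, x2])

# Condition number of the design matrix: large values signal multicollinearity.
cond = np.linalg.cond(X)

def vif(X, k):
    """Variance Inflation Factor: 1 / (1 - R^2) from regressing column k on the rest."""
    others = np.delete(X, k, axis=1)
    coef, *_ = np.linalg.lstsq(others, X[:, k], rcond=None)
    resid = X[:, k] - others @ coef
    r2 = 1.0 - resid.var() / X[:, k].var()
    return 1.0 / (1.0 - r2)
```

A common rule of thumb treats VIF values above 5-10 as a warning sign.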
Default Convergence Criteria

    Tool                 Algorithm          Convergence criterion   Default tol
    SAS proc logistic    IRWLS              Relative gradient       10^-8
    scikit-learn         Coordinate ascent  Log-likelihood          10^-4
    statsmodels Logit    Newton             Gradient (∞-norm)       10^-8
    statsmodels GLM      IRWLS              Deviance                10^-8

Note: convergence criteria based on log-likelihood convergence emphasize
prediction stability.
Behavior under different implementations

Especially in edge cases (e.g., quasi-separability, high
multicollinearity), slightly different implementations can lead to
drastically different outputs. At github.com/moody-marlin/pydata logistic
you can find a notebook with various implementations of Newton's method
for logistic regression, with explanations.
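The flavor of that notebook can be sketched in a few lines of numpy; this is a generic Newton iteration for the logistic log-likelihood (not the notebook's exact code), stopping on the ∞-norm of the gradient:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def newton_logistic(X, y, tol=1e-8, max_iter=50):
    """Newton's method for the logistic log-likelihood.

    Gradient: X.T @ (y - p); Hessian of the negative log-likelihood:
    X.T @ diag(p * (1 - p)) @ X.  Stops when ||gradient||_inf < tol.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = sigmoid(X @ beta)
        grad = X.T @ (y - p)
        if np.max(np.abs(grad)) < tol:
            break
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

# Synthetic, non-separable data with known coefficients.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(2000), rng.normal(size=2000)])
y = (rng.random(2000) < sigmoid(X @ np.array([0.5, -1.0]))).astype(float)
beta_hat = newton_logistic(X, y)
```

On quasi-separable data this iteration pushes coefficients toward infinity, which is exactly where implementations (and their stopping rules) start to disagree.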
Under the Hood: Input / Output

Input

Missing value handling:
  - SAS and statsmodels skip incomplete observations in build;
    scikit-learn throws an error

Categorical input variables:
  - Packages which require numerical arrays (scikit-learn, "base"
    statsmodels) require the modeler to dummify
  - SAS and statsmodels + formulas can handle them by identifier

Scoring conventions:
  - scikit-learn and "base" statsmodels will score by location
  - SAS and statsmodels + formulas can score by name
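"Dummifying" just means one-hot encoding with one level dropped as the reference (to avoid perfect collinearity with an intercept). A minimal hand-rolled version, for illustration only:

```python
def dummify(values, drop_first=True):
    """One-hot encode a categorical column, dropping the first level as reference."""
    levels = sorted(set(values))
    kept = levels[1:] if drop_first else levels
    rows = [[1.0 if v == level else 0.0 for level in kept] for v in values]
    return rows, kept

rows, kept = dummify(["red", "blue", "red", "green"])
# kept == ["green", "red"]; "blue" is the reference level
```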
Output

Do you have access to necessary output?
  - Fit metrics (AIC, BIC, Adj-R2, etc.)
  - Convergence flags
  - p-values

...and if not, can you compute it? For example, scikit-learn always
regularizes, so p-values you compute from its coefficients will be
incorrect!
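For an unpenalized MLE you can recover Wald p-values yourself from the Fisher information; a numpy sketch of the standard asymptotics (not any particular package's method):

```python
import math
import numpy as np

def wald_p_values(X, beta_hat):
    """Two-sided Wald p-values at the (unpenalized) MLE beta_hat.

    Standard errors come from the inverse Fisher information
    X.T @ diag(p * (1 - p)) @ X; this is invalid for regularized coefficients.
    """
    p = 1.0 / (1.0 + np.exp(-(X @ beta_hat)))
    cov = np.linalg.inv(X.T @ (X * (p * (1.0 - p))[:, None]))
    z = beta_hat / np.sqrt(np.diag(cov))
    # P(|Z| > |z|) for standard normal Z equals 1 - erf(|z| / sqrt(2)).
    return np.array([1.0 - math.erf(abs(zi) / math.sqrt(2.0)) for zi in z])
```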
Rant on p-values

Imagine a misspecified ordinary least squares model in which the true
specification takes the form y ~ N(f(x), σ²) for some non-linear f(x).

The MLE is given by

    β* = xᵀy ‖x‖₂⁻²

and converges to its expectation in the limit n → ∞, i.e.,

    β* → E_x[ xᵀ f(x) ‖x‖₂⁻² ]

Conclusion: note that lim_{n→∞} β* > 0 if and only if E_x[ xᵀ f(x) ] > 0.
Consequently, in most situations the p-value for x will converge to 0!

Takeaway: p-values and Big Data are not friends!
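This effect is easy to reproduce with a tiny simulation: a made-up quadratic truth y = x² + noise, fit with a misspecified straight line, gives a slope p-value that collapses toward 0 as n grows (pure Python, normal-approximation p-value):

```python
import math
import random

def slope_p_value(n, seed):
    """OLS slope p-value when y = x^2 + noise is fit with a straight line."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    ys = [x * x + rng.gauss(0.0, 1.0) for x in xs]
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    beta = sxy / sxx
    alpha = ybar - beta * xbar
    resid = [y - (alpha + beta * x) for x, y in zip(xs, ys)]
    sigma2 = sum(r * r for r in resid) / (n - 2)
    z = beta / math.sqrt(sigma2 / sxx)
    # two-sided p-value under the asymptotic normal approximation
    return 1.0 - math.erf(abs(z) / math.sqrt(2.0))

p_small = slope_p_value(100, seed=1)
p_large = slope_p_value(100_000, seed=1)
# p_large is (numerically) 0: the misspecified slope looks "significant"
```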
Regularization

Regularization is the process of "penalizing" candidate coefficients for
undesirable complexity, i.e.,

    β* = arg max_β  L(x, y, β) − τ g(β)

where g(β) is some application-dependent measure of "complexity", and
τ > 0.
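The penalized objective is a one-line change to the log-likelihood, with g supplied by the caller; a plain-Python sketch (data shapes as before):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def penalized_loglik(beta, X, y, tau, g):
    """L(x, y, beta) - tau * g(beta) for a caller-supplied complexity measure g."""
    ll = 0.0
    for xi, yi in zip(X, y):
        p = sigmoid(sum(b * v for b, v in zip(beta, xi)))
        ll += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return ll - tau * g(beta)

def g_lasso(beta):
    return sum(abs(b) for b in beta)    # ||beta||_1

def g_ridge(beta):
    return sum(b * b for b in beta)     # ||beta||_2^2
```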
Regularization, cont'd

Popular examples of regularizations include:

  - g(β) = ‖β‖₁
      - Arises as the convex relaxation of the best subset problem
      - Results in sparse coefficients
  - g(β) = ‖β‖₂²
      - Results in "ridged" coefficients
      - Historically used for inversion of poorly conditioned matrices

Be aware: regularization means your estimated coefficients are no longer
maximum likelihood estimators, so be careful in your interpretation and
inference!
Caution!

Many optimization options subtly result in regularization:

  - The FIRTH option in SAS (and logistf in R) arises from setting
    g(β) = −log(det I(β)) with τ = 1/2 (i.e., adding ½ log det I(β) to
    the log-likelihood), where I(β) is the Fisher information matrix
  - Providing a "ridge factor" value in statsmodels is equivalent to
    providing τ⁻¹ with g(β) = ‖β‖₂²
  - scikit-learn always regularizes
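In scikit-learn the knob is C, the inverse of the regularization strength (roughly 1/τ); there is no "off" switch, only a large C that approximates the MLE. A sketch on a made-up, non-separable dataset:

```python
from sklearn.linear_model import LogisticRegression

X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [0, 1, 0, 1, 1, 1]   # not linearly separable, so the MLE is finite

strong = LogisticRegression(C=0.01).fit(X, y)   # heavy L2 penalty
weak = LogisticRegression(C=1e6).fit(X, y)      # nearly unpenalized

# The penalty shrinks the slope toward zero: |strong coef| < |weak coef|.
```

If you want (approximately) maximum-likelihood coefficients out of scikit-learn, you must set C very large yourself; the default is C = 1.0.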
Statistical Implications: Bayesian Interpretation

Any regularization function for which

    ∫_{R^k} exp(−g(β)) dβ < ∞

results in a penalized log-likelihood with a Bayesian interpretation:
the regularized coefficients are maximum a posteriori (MAP) estimators
for a Bayesian model with prior density proportional to exp(−g(β)).
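For the two penalties above, the implied priors are the familiar standard correspondences (with τ absorbed into the prior scale):

```latex
% Ridge penalty  <->  Gaussian prior on each coefficient
\tau \,\|\beta\|_2^2 \;\Longleftrightarrow\; \beta_k \sim \mathcal{N}\!\left(0,\; \tfrac{1}{2\tau}\right)

% Lasso penalty  <->  Laplace prior on each coefficient
\tau \,\|\beta\|_1 \;\Longleftrightarrow\; \beta_k \sim \mathrm{Laplace}\!\left(0,\; \tfrac{1}{\tau}\right)
```

(Check: the Gaussian log-density contributes −β²/(2σ²), which equals −τβ² when σ² = 1/(2τ); the Laplace log-density contributes −|β|/b, which equals −τ|β| when b = 1/τ.)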
Conclusion

Takeaway: even if you don't run into any edge cases, understanding what's
happening under the hood gives you access to much more information that
you can leverage to build better, more informative models!
The End

Thanks for your time and attention! Questions?
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 

Logistic Regression: Behind the Scenes

  • 1. Logistic Regression: Behind the Scenes. Chris White, Capital One. October 9, 2016.
  • 2. Logistic Regression: A quick refresher. Generative Model: y_i | β, x_i ~ Bernoulli(σ(β, x_i)), where σ(β, x) := 1 / (1 + exp(−β · x)) is the sigmoid function.
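As a quick concreteness check, the sigmoid above can be written in a few lines of NumPy (the function name and sample values below are illustrative, not from the slides):

```python
import numpy as np

def sigmoid(beta, x):
    """sigma(beta, x) = 1 / (1 + exp(-beta . x)), the slide's sigmoid."""
    return 1.0 / (1.0 + np.exp(-np.dot(beta, x)))

beta = np.array([0.5, -1.0])
x = np.array([2.0, 1.0])
p = sigmoid(beta, x)  # P[y = 1] under the generative model
```

At β = 0 the model is maximally uncertain: σ(0, x) = 0.5 for every x.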
  • 3. Logistic Regression: A quick refresher. Interpretation: This setup implies that the log-odds are a linear function of the inputs: log(P[y = 1] / P[y = 0]) = β · x
  • 4. (cont'd) This linear relationship makes the individual coefficients β_k easy to interpret: a unit increase in x_k multiplies the odds of y = 1 by a factor of exp(β_k) (all else held equal)
  • 5. (cont'd) Question: Given data, how do we determine the "best" values for β?
  • 6. Logistic Regression: A quick refresher. Typical Answer: Maximum Likelihood Estimation. β* = argmax_β Π_i σ(β · x_i)^(y_i) · (1 − σ(β · x_i))^(1 − y_i)
  • 7. (cont'd) Log-likelihood is concave
  • 8. (cont'd) Maximum Likelihood Estimation allows us to do classical statistics (i.e., we know the asymptotic distribution of the estimator)
  • 9. (cont'd) Note: the likelihood is a function of the predicted probabilities and the observed response
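Since the likelihood depends only on the predicted probabilities and the observed responses, the log-likelihood being maximized can be sketched directly (a minimal illustration, not code from the talk; the data values are made up):

```python
import numpy as np

def log_likelihood(beta, X, y):
    """Bernoulli log-likelihood: sum_i y_i log p_i + (1 - y_i) log(1 - p_i)."""
    p = 1.0 / (1.0 + np.exp(-X @ beta))       # predicted probabilities
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

X = np.array([[1.0, 2.0], [1.0, -1.0], [1.0, 0.5]])
y = np.array([1.0, 0.0, 1.0])
ll = log_likelihood(np.zeros(2), X, y)  # at beta = 0 every p_i = 0.5
```

Maximizing this concave function over β is exactly the optimization problem the rest of the talk is about.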
  • 10. This is not the end of the story: Now we are confronted with an optimization problem...
  • 11. (cont'd) The story doesn't end here! Uncritically throwing your data into an off-the-shelf solver could result in a bad model.
  • 12. This is not the end of the story. Important Decisions: Inference vs. Prediction: your goals should influence your implementation choices
  • 13. (cont'd) Coefficient accuracy vs. speed vs. predictive accuracy; Multicollinearity leads to both statistical and numerical issues
  • 14. (cont'd) Desired Input (e.g., missing values?)
  • 15. (cont'd) Desired Output (e.g., p-values)
  • 16. (cont'd) Handling of edge cases (e.g., quasi-separable data)
  • 17. (cont'd) Regularization and Bayesian Inference
  • 18. (cont'd) Goal: We want to explore these questions with an eye on statsmodels, scikit-learn, and SAS's proc logistic.
  • 19. Under the Hood: How Needs Translate to Implementation Choices. Prediction vs. Inference: Coefficients and Convergence. In the case of inference, coefficient sign and precision are important. Example: A model includes economic variables, and will be used for "stress-testing" adverse economic scenarios.
  • 21. Prediction vs. Inference, cont'd. Coefficients and Multicollinearity: Linear models are invariant under affine transformations of the data; this leads to a determination problem in the presence of "high" multicollinearity. Multicollinearity can threaten long-term model stability. Can test for multicollinearity with the condition number or Variance Inflation Factors (VIF).
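Both diagnostics mentioned above can be computed directly; here is a minimal NumPy sketch (the helper names are mine, and the VIF auxiliary regression omits an intercept for brevity, so treat it as illustrative rather than a drop-in replacement for a library routine):

```python
import numpy as np

def condition_number(X):
    """Ratio of the largest to smallest singular value of the design matrix."""
    s = np.linalg.svd(X, compute_uv=False)
    return s[0] / s[-1]

def vif(X, k):
    """Variance Inflation Factor of column k: 1 / (1 - R^2), where R^2 comes
    from regressing column k on the remaining columns (no intercept here)."""
    others = np.delete(X, k, axis=1)
    coef, *_ = np.linalg.lstsq(others, X[:, k], rcond=None)
    resid = X[:, k] - others @ coef
    r_squared = 1.0 - resid.var() / X[:, k].var()
    return 1.0 / (1.0 - r_squared)

rng = np.random.default_rng(0)
a = rng.normal(size=100)
X = np.column_stack([a, a + 1e-3 * rng.normal(size=100)])  # nearly collinear columns
```

On this nearly collinear design both diagnostics blow up, which is the warning sign the slide is describing.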
  • 22. Default Convergence Criteria:
        Tool               Algorithm          Convergence        Default Tol
        SAS proc logistic  IRWLS              Relative Gradient  10⁻⁸
        scikit-learn       Coordinate Ascent  Log-Likelihood     10⁻⁴
        statsmodels Logit  Newton             ∞-norm             10⁻⁸
        statsmodels GLM    IRWLS              Deviance           10⁻⁸
  • 23. (cont'd) Note: Convergence criteria based on log-likelihood convergence emphasize prediction stability.
  • 24. Behavior under different implementations: Especially in edge cases (e.g., quasi-separability, high multicollinearity), slightly different implementations can lead to drastically different outputs. At github.com/moody-marlin/pydata_logistic you can find a notebook with various implementations of Newton's method for logistic regression, with explanations.
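A bare-bones version of Newton's method for logistic regression (a sketch of the standard iteration β ← β + (XᵀWX)⁻¹ Xᵀ(y − p), not code taken from the linked notebook; the simulated data are mine):

```python
import numpy as np

def logistic_newton(X, y, tol=1e-8, max_iter=50):
    """Maximize the logistic log-likelihood with Newton's method (IRWLS)."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))        # predicted probabilities
        grad = X.T @ (y - p)                       # score (gradient of log-lik)
        W = p * (1.0 - p)                          # IRWLS weights
        hess = X.T @ (X * W[:, None])              # X^T W X (negative Hessian)
        beta = beta + np.linalg.solve(hess, grad)  # Newton update
        if np.max(np.abs(grad)) < tol:             # gradient-based stopping rule
            break
    return beta

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(500), rng.normal(size=500)])
true_beta = np.array([-0.5, 1.5])
y = (rng.random(500) < 1.0 / (1.0 + np.exp(-X @ true_beta))).astype(float)
beta_hat = logistic_newton(X, y)
```

Swapping the stopping rule here (gradient norm vs. change in log-likelihood vs. deviance) is one of the implementation differences between the tools compared above.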
  • 25. Under the Hood: Input / Output. Input
  • 26. (cont'd) Missing value handling: SAS and statsmodels skip incomplete observations in build; scikit-learn throws an error
  • 27. (cont'd) Categorical input variables: packages which require numerical arrays (scikit-learn, 'base' statsmodels) require the modeler to dummify; SAS and statsmodels + formulas can handle them by identifier
  • 28. (cont'd) Scoring conventions: scikit-learn and 'base' statsmodels will score by location; SAS and statsmodels + formula can score by name
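For the array-based packages, dummification is typically a one-liner with pandas; a small sketch (the column names and values are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "color": ["red", "blue", "green", "blue"],  # a categorical input
    "x": [1.0, 2.0, 3.0, 4.0],                  # a numeric input
})

# drop_first=True avoids the dummy-variable trap when the model has an intercept
X = pd.get_dummies(df, columns=["color"], drop_first=True)
```

Note the scoring-by-location caveat: if a future scoring set is dummified with different category levels present, the resulting columns can silently misalign.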
  • 29. Under the Hood: Input / Output. Output: Do you have access to necessary output?
  • 30. (cont'd) Fit metrics (AIC, BIC, Adj-R², etc.); Convergence flags; p-values
  • 31. (cont'd) ...and if not, can you compute it? For example, sklearn always uses regularization, so p-values you compute yourself will be incorrect!
  • 32. Rant on p-values: Imagine a misspecified ordinary least squares model in which the true specification takes the form y ~ N(f(x), σ²) for some non-linear f(x).
  • 33. (cont'd) The MLE is given by β* = xᵀy · ‖x‖₂⁻², and converges to its expectation in the limit n → ∞, i.e., β* → E_x[xᵀ f(x)] · ‖x‖₂⁻²
  • 34. (cont'd) Conclusion: Note that lim_(n→∞) β* > 0 if and only if E_x[xᵀ f(x)] > 0.
  • 35. (cont'd) Consequently, in most situations the p-value for x will converge to 0!
  • 36. (cont'd) Takeaway: p-values and Big Data are not friends!
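The collapsing p-value is easy to see in a simulation: fit a (misspecified) no-intercept linear model when the truth is, say, f(x) = x³, and the z-statistic for the slope grows without bound as n grows. The choice of f here is mine; any f with E_x[xᵀ f(x)] ≠ 0 behaves the same way:

```python
import math
import numpy as np

def slope_z_stat(n, rng):
    """z-statistic for the slope of a no-intercept OLS fit of y on x,
    when the true model is y = x**3 + noise (misspecified as linear)."""
    x = rng.normal(size=n)
    y = x**3 + rng.normal(size=n)
    beta = (x @ y) / (x @ x)                          # OLS: x^T y / ||x||^2
    resid = y - beta * x
    se = math.sqrt((resid @ resid) / (n - 1) / (x @ x))
    return beta / se

rng = np.random.default_rng(42)
z_small = slope_z_stat(100, rng)
z_large = slope_z_stat(100_000, rng)
# the z-statistic diverges with n, so the p-value collapses toward 0
```

The model is wrong at every sample size; the tiny p-value at large n only reflects that β* is not converging to zero, not that the linear specification is good.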
  • 37. Regularization: Regularization is the process of "penalizing" candidate coefficients for undesirable complexity, i.e., β* = argmax_β L(x, y, β) − τ·g(β), where g(β) is some application-dependent measure of "complexity", and τ > 0.
  • 38. Regularization, cont'd. Popular examples of regularizations include: g(β) = ‖β‖₁
  • 39. (cont'd) g(β) = ‖β‖₁ arises as the convex relaxation of the best subset problem, and results in sparse coefficients
  • 40. (cont'd) g(β) = ‖β‖₂²
  • 41. (cont'd) g(β) = ‖β‖₂² results in "ridged" coefficients; historically used for inversion of poorly conditioned matrices
  • 42. (cont'd) Be aware: Regularization means your estimated coefficients are no longer maximum likelihood estimators, so be careful in your interpretation and inference!
  • 43. Caution! Many optimization options subtly result in regularization! The 'FIRTH' option in SAS (and 'logistf' in R) arises from setting g(β) = −log(det I(β)) and τ = 1/2, where I(β) is the Fisher information matrix
  • 44. (cont'd) Providing a "ridge factor" value in statsmodels is equivalent to providing τ⁻¹ with g(β) = ‖β‖₂²
  • 45. (cont'd) sklearn always regularizes
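This behavior is visible through sklearn's `C` parameter, the inverse regularization strength: a small `C` shrinks coefficients hard, while a very large `C` approximates the unpenalized MLE. A quick sketch (the data here are simulated purely for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
logits = X @ np.array([2.0, -1.0])
y = (rng.random(200) < 1.0 / (1.0 + np.exp(-logits))).astype(int)

strong = LogisticRegression(C=0.01, max_iter=1000).fit(X, y)  # heavy L2 penalty
weak = LogisticRegression(C=1e6, max_iter=1000).fit(X, y)     # nearly unpenalized
# the penalized fit's coefficients are shrunk toward zero
```

There is no way to turn the penalty off entirely; taking `C` very large is the usual workaround when unpenalized estimates are wanted.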
  • 46. Statistical Implications: Bayesian Interpretation. Any regularization function for which ∫_(ℝᵏ) exp(−g(β)) dβ < ∞ results in a penalized log-likelihood with a Bayesian interpretation: the regularized coefficients are maximum a posteriori (MAP) estimators for a Bayesian model with prior distribution given by exp(−g(β)).
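The correspondence can be sketched in one line (absorbing τ into g, and writing L for the log-likelihood log p(y | x, β), consistent with the earlier slides):

```latex
\beta^{*}
  = \arg\max_{\beta}\; \log p(y \mid x, \beta) - g(\beta)
  = \arg\max_{\beta}\; \log\!\Big( p(y \mid x, \beta)\, e^{-g(\beta)} \Big)
  = \arg\max_{\beta}\; p(\beta \mid x, y),
\qquad \pi(\beta) \propto e^{-g(\beta)},
```

since by Bayes' rule p(β | x, y) ∝ p(y | x, β) π(β); the integrability condition is exactly what makes π a proper prior.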
  • 47. Conclusion. Takeaway: Even if you don't run into any edge cases, understanding what's happening under the hood gives you access to much more information that you can leverage to build better, more informative models!
  • 48. The End: Thanks for your time and attention! Questions?