Comparative Study for Penalized Logistic Regression Methods with Applications
A Thesis Submitted in Partial Fulfilment of the Requirements for the Master's Degree in Statistics
Hamdy Abd Elaal Badry
Master's student, F.G.S.S.R., Cairo University
Supervised by
Prof. Salah Mahdi Mohamed
Department of Applied Statistics and Econometrics
F.G.S.S.R., Cairo University
Dr. Hazem Refaat Ahmed
Department of Applied Statistics and Econometrics
F.G.S.S.R., Cairo University
• An overview of logistic regression, its equation, and how its parameters are estimated
• The problem of high-dimensional data, and why penalized logistic regression is used for it
• For each method, a review of its equation
• A presentation of the data
Logistic Regression
In medicine and the social sciences, logistic regression is considered one of the most important methods
for binary classification problems, where the response variable takes two values coded as zero (0) and one (1).
Applying logistic regression to high-dimensional data,
where the number of variables p exceeds the sample size n, is one of the major challenges
that researchers face.
The response variable y is a Bernoulli random variable, and the conditional probability that y equals 1 given X ∈ R^p,
denoted π(X), is
$$P(y_i = 1 \mid X_i) = \pi_i = \frac{\exp\left(\beta_0 + \sum_{j=1}^{p} X_{ij}\beta_j\right)}{1 + \exp\left(\beta_0 + \sum_{j=1}^{p} X_{ij}\beta_j\right)}$$
where j = 1, 2, ..., p indexes the predictors.
Binary Logistic Regression Model
Y = binary response; X = quantitative predictor
π = proportion of 1's (yes, success) at any X
Equivalent forms of the logistic regression model:
Logit form: $\log\left(\frac{\pi}{1-\pi}\right) = \beta_0 + \beta_1 X$
Probability form: $\pi = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}}$
N.B.: This is natural log (aka "ln")
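The logit form and the probability form describe the same model, which a few lines of Python can illustrate. This is a minimal sketch: the coefficient values `b0` and `b1` are hypothetical and chosen only for illustration, and the function names are not from the thesis.

```python
import math

def probability(x, b0, b1):
    """Probability form: pi = e^(b0 + b1*x) / (1 + e^(b0 + b1*x))."""
    z = b0 + b1 * x
    return math.exp(z) / (1.0 + math.exp(z))

def logit(pi):
    """Logit form: natural log of the odds pi / (1 - pi)."""
    return math.log(pi / (1.0 - pi))

# Hypothetical coefficients, chosen only for illustration.
b0, b1 = -1.0, 0.5
for x in (0.0, 2.0, 4.0):
    pi = probability(x, b0, b1)
    # Applying the logit to pi recovers the linear predictor b0 + b1*x.
    print(x, round(pi, 4), round(logit(pi), 4))
```

At x = 2 the linear predictor is zero, so the probability is exactly 0.5 and the log-odds are 0.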
• Introduction
• What is Logistic Regression?
• Why Use Penalized Logistic Regression?
• Types of Penalized Logistic Regression
• L1 Regularization
• L2 Regularization
• Elastic Net Regularization
• Advantages of Penalized Logistic Regression
• Disadvantages of Penalized Logistic Regression
• Choosing the Right Penalty
• Cross-Validation
• Implementation
• Applications of Penalized Logistic Regression
Introduction
• Welcome to this presentation on penalized logistic regression. As datasets grow in
size and dimensionality, machine learning has become a vital tool for making sense of
them. One recurring problem in machine learning is overfitting, where the model
becomes too complex and fits the training data too closely, leading to poor
performance on new data. Penalized logistic regression addresses this issue by adding
a penalty term to the loss function, which constrains the size of the coefficients and
so reduces the effective complexity of the model. This presentation will explore what
penalized logistic regression is, why it's important, and how it can be implemented in
machine learning.
• In this presentation, we'll dive into the world of penalized logistic regression and explore
its many benefits. We'll explain the different types of penalties, such as L1 and L2
regularization, and discuss how to choose the right penalty for your specific problem.
We'll also look at real-world examples of penalized logistic regression in action and
compare it to other methods of machine learning. By the end of this presentation, you'll
have a comprehensive understanding of penalized logistic regression and its importance
in machine learning.
What is Logistic Regression?
• Logistic Regression is a statistical method used to analyze a dataset in
which there are one or more independent variables that determine an
outcome. The outcome is measured with a dichotomous variable (in which
there are only two possible outcomes). For example, we might use logistic
regression to model whether a student gets admitted to a university based
on their GPA, test scores, and the rank of the high school they attended.
• In machine learning, logistic regression is used to classify data into discrete
categories, such as determining whether an email is spam or not. It's a
popular algorithm for binary classification problems (problems with two
class values). Logistic regression can also be used for multiclass
classification problems (problems with more than two class values), but this
requires extensions such as multinomial (softmax) or one-vs-rest formulations.
Why Use Penalized Logistic Regression?
• Penalized logistic regression is a powerful tool in machine learning that addresses
the issue of overfitting. Overfitting occurs when a model becomes too complex
and starts to fit the noise in the data rather than the underlying patterns.
Penalized logistic regression helps to prevent this by adding a penalty term to the
cost function, which discourages the model from fitting the noise. This results in a
more generalizable model that performs better on new, unseen data.
• For example, let's say we're trying to predict whether a customer will buy a
product based on their browsing history. If we use traditional logistic regression,
we may end up with a model that fits the noise in the data, such as the fact that
the customer happened to browse a lot of unrelated products before making a
purchase. However, if we use penalized logistic regression, the model will be
more focused on the relevant patterns in the data, such as the customer's
interest in similar products, resulting in a more accurate model.
Types of Penalized Logistic Regression
• Penalized logistic regression comes in several forms, which differ in
the penalty added to the loss function. Each has its own advantages
and disadvantages.
• L1 regularization, also known as lasso regularization, adds a penalty
term to the loss function based on the absolute values of the
coefficients. This results in sparse solutions where some of the
coefficients are set exactly to zero. L2 regularization, on the other
hand, adds a penalty term based on the squares of the coefficients,
which shrinks all coefficients toward zero but generally leaves them
non-zero. Elastic net regularization combines the L1 and L2
penalties, so it can produce sparse solutions while keeping some of
the stability of L2 shrinkage.
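Written out, the penalized estimates can be expressed as below. This is a sketch under common conventions rather than the thesis's own notation: π_i denotes P(y_i = 1 | X_i), the symbol λ for the penalty strength is an assumption at this point in the deck, the intercept is left unpenalized, and scaling factors (such as 1/n on the likelihood or 1/2 on the L2 term) vary between textbooks and software.

```latex
% Negative log-likelihood of the logistic model, with \pi_i = P(y_i = 1 \mid X_i)
\ell(\beta_0, \beta) = -\sum_{i=1}^{n} \left[ y_i \log \pi_i + (1 - y_i) \log (1 - \pi_i) \right]

% Penalized estimate: \lambda \ge 0 controls the strength of the penalty P
(\hat\beta_0, \hat\beta) = \arg\min_{\beta_0, \beta} \; \ell(\beta_0, \beta) + \lambda \, P(\beta)

% L1 (lasso) and L2 (ridge) penalties; the intercept \beta_0 is not penalized
P_{L1}(\beta) = \sum_{j=1}^{p} |\beta_j|, \qquad
P_{L2}(\beta) = \sum_{j=1}^{p} \beta_j^2
```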
L1 Regularization
• L1 regularization, also known as Lasso regularization, is a technique
used in machine learning to prevent overfitting of models. It works by
adding a penalty term to the loss function that encourages the model
to have sparse coefficients. This means that some of the coefficients
will be set to zero, effectively removing them from the model.
• One example of where L1 regularization can be useful is in feature
selection. In a dataset with many features, L1 regularization can help
identify which features are most important for predicting the target
variable. By setting some coefficients to zero, it effectively removes
those features from consideration, resulting in a simpler and more
interpretable model.
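One way to see why an L1 penalty produces exact zeros is the soft-thresholding operator, which coordinate-descent and proximal-gradient solvers for L1 problems apply at each update. The sketch below applies it once to a list of hypothetical coefficient values; in an actual logistic regression fit it would be applied repeatedly inside an iterative solver, so this illustrates the mechanism rather than a complete estimator.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t; return exactly 0.0 when |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

# Hypothetical coefficient values and threshold, for illustration only.
coefs = [2.0, -0.3, 0.1, -1.5]
threshold = 0.5
shrunk = [soft_threshold(c, threshold) for c in coefs]
print(shrunk)  # [1.5, 0.0, 0.0, -1.0]
```

Coefficients smaller in magnitude than the threshold are removed outright, while larger ones are only shrunk, which is the sparsity behaviour described above.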
L2 Regularization
• L2 regularization, also known as ridge regression, is a type of
penalized regression that adds a penalty term to the logistic
regression cost function. This penalty term is proportional to the
square of the magnitude of the coefficients, which means that it
shrinks the coefficients towards zero. The amount of shrinkage is
controlled by the regularization parameter lambda.
• The main advantage of L2 regularization is that it helps to prevent
overfitting by reducing the variance of the model. It does this by
discouraging large coefficients, which can lead to overfitting. In
addition, L2 regularization can be used to improve the numerical
stability of the model by reducing the sensitivity of the coefficients to
small changes in the data.
Elastic Net Regularization
• Elastic net regularization is a method that combines L1 and L2
regularization to achieve better performance in machine learning
models. While L1 regularization can lead to sparse solutions and L2
regularization can lead to dense solutions, elastic net regularization
strikes a balance between the two.
• In elastic net regularization, the penalty term is a weighted
combination of the L1 and L2 penalties. The weight of each penalty is
controlled by a hyperparameter alpha. When alpha is set to 0, elastic
net regularization reduces to L2 regularization, and when alpha is set
to 1, it reduces to L1 regularization. By tuning alpha, we can adjust
the trade-off between sparsity and smoothness in the model.
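The weighted combination can be sketched as a small function. The form used here, lambda * (alpha * L1 + (1 - alpha) * L2), is an assumption chosen to match the description above; some software, such as the glmnet package, puts an extra factor of 1/2 on the L2 term. The coefficient vector is made up for illustration.

```python
def elastic_net_penalty(beta, lam, alpha):
    """Elastic net penalty: lam * (alpha * sum|b| + (1 - alpha) * sum b^2)."""
    l1 = sum(abs(b) for b in beta)
    l2 = sum(b * b for b in beta)
    return lam * (alpha * l1 + (1.0 - alpha) * l2)

# Hypothetical coefficients (intercept excluded) and penalty strength.
beta = [1.0, -2.0, 0.0]
lam = 0.5
print(elastic_net_penalty(beta, lam, alpha=1.0))  # pure L1: 0.5 * 3 = 1.5
print(elastic_net_penalty(beta, lam, alpha=0.0))  # pure L2: 0.5 * 5 = 2.5
print(elastic_net_penalty(beta, lam, alpha=0.5))  # equal mix: 2.0
```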
Advantages of Penalized Logistic Regression
• Penalized logistic regression offers several advantages over traditional
logistic regression. One of the main advantages is that it helps to prevent
overfitting. By adding a penalty term to the cost function, penalized logistic
regression encourages the model to select only the most important
features, which can help to improve its generalization performance. This is
particularly useful when dealing with high-dimensional datasets.
• Another advantage of penalized logistic regression is that it copes
better with collinear features. When two or more features are highly
correlated, traditional logistic regression can have difficulty
determining their individual contributions to the outcome variable, and
its coefficient estimates become unstable. An L2 or elastic net penalty
stabilizes the estimates by sharing weight among the correlated
features, whereas an L1 penalty tends to keep one of them and set the
others to zero.
Disadvantages of Penalized Logistic Regression
• While penalized logistic regression has many advantages, there are also
some disadvantages to consider. One potential drawback is that it can be
computationally expensive, especially when dealing with large datasets.
This is because the model must be refit for every candidate value of the
penalty parameter, usually within each cross-validation fold. Additionally,
it can be difficult to choose the most appropriate penalty type and value,
particularly when the dataset contains a large number of features.
• Another disadvantage of penalized logistic regression is that it may not
always improve predictive accuracy compared to traditional logistic
regression. In some cases, the penalty term may cause the model to
underfit the data, resulting in lower accuracy. Finally, penalized logistic
regression assumes that the log-odds of the outcome are a linear function
of the independent variables. If this assumption is not met, the model may
not perform well unless nonlinear terms or interactions are added.
Choosing the Right Penalty
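• When it comes to choosing the right penalty for penalized logistic
regression, there are a few things to consider. One important factor is the
size of the dataset relative to the number of features. In general, larger
datasets need weaker penalties, because more observations are available to
constrain the coefficients, while a penalty that is too strong leads to
underfitting. Another factor to consider is the nature of the data
itself. If the features are highly correlated, L2 or elastic net
regularization may be more appropriate, because L1 regularization tends to
select one feature from a correlated group somewhat arbitrarily; L1 is
better suited to data with less correlation among the features.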
• Another important consideration is the goal of the model. If the goal is to
identify a small number of important features, L1 regularization may be the
best choice. On the other hand, if the goal is to predict accurately using all
available features, L2 regularization may be more appropriate. Ultimately,
the choice of penalty will depend on the specific characteristics of the
dataset and the goals of the model.
Cross-Validation
• Cross-validation is a method used to estimate how well a model performs on
data that were not used to fit it. In penalized logistic regression, cross-validation is used to
choose the penalty parameter that balances overfitting against
underfitting. It involves dividing the dataset into k folds, training the model on k-1
folds, and evaluating its performance on the remaining fold. This process is
repeated k times, with each fold serving as the validation set once. The average
performance across all folds is used as an estimate of the model's generalization
performance.
• For example, suppose we have a dataset of 1000 observations and we want to
use penalized logistic regression to predict whether a customer will buy a product
or not. We can divide the dataset into 5 folds, where each fold contains 200
observations. We can then train the model on 4 folds (800 observations) and
evaluate its performance on the remaining fold (200 observations). This process is
repeated 5 times, with each fold serving as the validation set once. The average
performance across all 5 folds is used to choose the right penalty parameter.
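The fold bookkeeping in this example can be sketched in plain Python. The helper name `k_fold_indices` and the fixed seed are assumptions made for illustration; the model fitting and scoring that would happen inside the loop are left as a comment.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and split them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

folds = k_fold_indices(1000, 5)
for i, validation in enumerate(folds):
    # Training set: every index that is not in the current validation fold.
    train = [j for f, fold in enumerate(folds) if f != i for j in fold]
    # For each candidate penalty value, fit on `train` and score on `validation`,
    # then average the scores over the 5 folds to compare penalty values.
    print(f"fold {i}: train={len(train)} validation={len(validation)}")
```

Each of the 1000 observations appears in exactly one validation fold, so every fold trains on 800 observations and validates on 200.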
Implementation
• To implement penalized logistic regression in machine learning, you
first need to choose the appropriate penalty parameter. This can be
done by cross-validating over a grid of candidate values. Once
you have chosen the penalty parameter, you can train your model
using an optimization algorithm such as gradient descent or
coordinate descent.
• It is important to note that implementing penalized logistic regression
requires some knowledge of programming and machine learning
concepts. However, there are many resources available online that
can help you get started, including tutorials, code examples, and
open-source libraries.
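As an illustration of training with gradient descent, the sketch below fits an L2-penalized logistic regression in plain Python. Everything specific here is an assumption: the tiny dataset is made up, the objective is the mean log-loss plus lam times the sum of squared coefficients with an unpenalized intercept, and the learning rate and iteration count were picked by hand. In practice an established library would normally be used instead.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_ridge_logistic(X, y, lam, lr=0.1, n_iter=2000):
    """Gradient descent on mean log-loss + lam * sum(beta_j^2)."""
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        g0, g = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            err = sigmoid(b0 + sum(b * x for b, x in zip(beta, xi))) - yi
            g0 += err / n
            for j in range(p):
                g[j] += err * xi[j] / n
        b0 -= lr * g0  # the intercept is not penalized
        beta = [b - lr * (gj + 2.0 * lam * b) for b, gj in zip(beta, g)]
    return b0, beta

def l2_norm(v):
    return math.sqrt(sum(b * b for b in v))

# Tiny made-up dataset: 6 observations, 2 features.
X = [[0.5, 1.2], [1.5, 0.3], [2.0, 2.2], [3.1, 0.8], [3.5, 2.9], [4.2, 1.1]]
y = [0, 0, 1, 0, 1, 1]

for lam in (0.01, 1.0):
    b0, beta = fit_ridge_logistic(X, y, lam)
    print(f"lambda={lam}: ||beta|| = {l2_norm(beta):.3f}")
```

The larger penalty yields a coefficient vector with a smaller norm, which is the shrinkage effect of the L2 penalty.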
Applications of Penalized Logistic Regression
• Penalized logistic regression has numerous applications in machine
learning, including but not limited to: feature selection, image
classification, and text classification. One example of its use is in the
field of medical research, where it can be used to predict the
likelihood of a patient developing a certain disease based on their
medical history and other factors.
• Another example is in the field of finance, where penalized logistic
regression can be used to predict the likelihood of a borrower
defaulting on a loan. This information can then be used by lenders to
make more informed decisions about lending practices.
Real-World Example
• One example of penalized logistic regression in action is in the healthcare
industry, where it is used to predict patient outcomes based on various
factors such as age, gender, and medical history. In one study, researchers
used penalized logistic regression to predict the likelihood of readmission
for patients with heart failure. By using this method, they were able to
identify high-risk patients and provide them with more intensive care,
ultimately reducing the rate of readmissions.
• Another example is in the field of marketing, where penalized logistic
regression can be used to predict customer behavior and target specific
demographics with personalized advertising. For instance, a company could
use this method to predict which customers are most likely to make a
purchase and then tailor their marketing efforts accordingly. This not only
increases sales but also improves customer satisfaction by providing them
with relevant content.
Comparison with Other Methods
• Penalized logistic regression is a powerful method for machine learning,
but how does it compare to other methods? One comparison can be made
with traditional logistic regression. Traditional logistic regression
estimates an unpenalized coefficient for every variable supplied to it,
whereas penalized logistic regression shrinks the coefficients and, with an
L1 or elastic net penalty, performs variable selection by setting some of
them to zero. This can lead to more accurate predictions when there are
many variables relative to the sample size.
• Another comparison can be made with support vector machines (SVMs).
While SVMs are also a popular method for classification, kernel SVMs can be
computationally expensive on large datasets and require tuning of the
kernel and its parameters in addition to the regularization strength.
Penalized logistic regression, on the other hand, is relatively simple to
implement, has few tuning parameters, and outputs class probabilities
directly. Both methods handle binary classification natively and extend to
multi-class problems through schemes such as one-vs-rest.
Challenges and Limitations
• While penalized logistic regression has many advantages over traditional logistic
regression, it also has some challenges and limitations that must be considered.
One challenge is determining the appropriate penalty parameter. This can be
difficult, as different penalties may result in different models and predictions.
Additionally, the performance of penalized logistic regression can be sensitive to
the choice of penalty parameter, meaning that small changes can have a large
impact on the results.
• Another limitation of penalized logistic regression is that it assumes that the
log-odds of the outcome are linear in the predictors. If this assumption
is violated, the model may not perform well. Finally, although penalization makes
estimation possible when the number of predictors is large, a very small sample
still makes the coefficient estimates and the selected penalty unstable, because a large
number of parameters must be estimated from few observations.
Best Practices
• One best practice for using penalized logistic regression in machine
learning is to carefully choose the penalty parameter. This can be done
through techniques such as cross-validation, which involves splitting the
data into training and validation sets and testing different penalty values on
the validation set to find the optimal one. Another best practice is to
standardize the input features before applying penalized logistic regression,
because the penalty acts on the size of the coefficients, and that size
depends on the scale of each feature.
• It is also important to consider the balance between L1 and L2
regularization when using elastic net regularization. The ratio between the
two penalties can have a significant impact on the resulting model, so it is
important to experiment with different ratios to find the optimal one for
the given problem.
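Putting features on a common scale before fitting can be sketched as follows. The feature names and values are hypothetical, and the choice of z-score standardization (mean 0, standard deviation 1) is one common option rather than the only one. When cross-validating, the mean and standard deviation would normally be computed on the training folds only.

```python
import statistics

def standardize(column):
    """Center a feature to mean 0 and scale it to standard deviation 1."""
    mean = statistics.fmean(column)
    sd = statistics.pstdev(column)
    if sd == 0:
        return [0.0 for _ in column]  # constant feature: nothing to scale
    return [(v - mean) / sd for v in column]

# Hypothetical features measured on very different scales.
age = [25, 40, 55, 70]
income = [20_000, 45_000, 80_000, 150_000]
print(standardize(age))
print(standardize(income))
```

After standardization a given penalty value affects the coefficient of `age` and the coefficient of `income` comparably, instead of depending on their original units.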
Conclusion
• In conclusion, penalized logistic regression is a powerful tool in machine
learning that can improve prediction accuracy and model
interpretability. By adding a penalty term to the cost function, we can
reduce overfitting and, with an L1 or elastic net penalty, select important
features in our models.
• We have explored the different types of penalties, such as L1 and L2
regularization, and discussed their advantages and disadvantages. We have
also looked at how to choose the right penalty using cross-validation and
provided best practices for implementing penalized logistic regression in
machine learning.
• It is clear that penalized logistic regression has numerous applications in
various fields, from finance to healthcare. As machine learning continues to
grow and evolve, it is important to stay up-to-date with the latest
techniques and tools. Penalized logistic regression is definitely one of those
tools that should be in every data scientist's toolkit.
• Thank you for your attention. Now, it's time to open up the floor for
questions and answers. We encourage you to ask any questions you
may have about penalized logistic regression. Our team is
here to provide clear and concise answers that will deepen your
understanding of this important topic.
• Remember, there are no bad questions. So don't be shy. Ask away and
let's continue our journey into the world of machine learning.
Thank You
• Thank you for your attention and interest in penalized logistic
regression. We hope that this presentation has provided you with
valuable insights into the importance and applications of this
powerful machine learning technique.
• If you have any further questions or would like more information,
please do not hesitate to contact us. We are always happy to discuss
our work and share our knowledge with others.
About the Presenter
• Presenter: Hamdy Abd Elaal Badry
• Affiliation: Master's student, F.G.S.S.R., Cairo University. Supervised by
Prof. Salah Mahdi Mohamed and Dr. Hazem Refaat Ahmed, Department of
Applied Statistics and Econometrics.
Audience Poll
• Now that we've covered the basics of penalized logistic regression,
let's take a moment to gauge your understanding of the topic with a
quick poll. Don't worry, this isn't a graded quiz!
• Please take out your phones and go to the poll website. We'll be asking a few
multiple-choice questions about the material we just covered. Your
answers will help us understand how well we've explained the
concepts, and if there are any areas we need to spend more time on.
Thank you for your participation!
Case Study
• In a recent study, researchers used penalized logistic regression to
predict whether patients with heart disease would experience a
cardiac event within the next year. The study included data from over
10,000 patients and used a variety of clinical variables, such as age,
sex, and medical history, to make predictions.
• The results showed that penalized logistic regression was able to
predict cardiac events with a high degree of accuracy,
outperforming traditional logistic regression models. This study
highlights the importance of using advanced machine learning
techniques, such as penalized logistic regression, in real-world
scenarios where accurate predictions can have life-saving consequences.
Future Directions
• As machine learning continues to advance, there are many exciting future
directions for penalized logistic regression. One area of focus is on
developing more efficient algorithms that can handle larger datasets and
more complex models. This will allow researchers to tackle even more
challenging problems and make more accurate predictions.
• Another promising direction is the integration of penalized logistic
regression with other machine learning techniques, such as deep learning.
By combining these approaches, researchers can develop even more
powerful models that can handle a wider range of data types and produce
more accurate predictions. For example, using penalized logistic regression
in conjunction with convolutional neural networks has shown promise in
image classification tasks.

Penalized Logistic Regression methods .pptx

  • 1. Comparative Study for Penalized Logistic Regression Methods with Applications. A Thesis Submitted in Partial Fulfilment of the Requirements for the Master's Degree in Statistics. By Hamdy Abd Elaal Badry, Master's student, F.G.S.S.R., Cairo University. Supervised by Prof. Salah Mahdi Mohamed, Department of Applied Statistics and Econometrics, F.G.S.S.R., Cairo University, and Dr. Hazem Refaat Ahmed, Department of Applied Statistics and Econometrics, F.G.S.S.R., Cairo University. 2021
  • 2. • An overview of logistic regression, its equation, and how its parameters are estimated • The problem with high-dimensional data, which is why we use penalized logistic regression • For each method, we review its equation • We also present the data
  • 3. LOGISTIC REGRESSION In medicine and the social sciences, logistic regression is considered one of the most important methods for binary classification problems, where the response variable takes two values coded as zero (0) and one (1). Applying logistic regression to high-dimensional data, where the number of variables $p$ exceeds the sample size $n$, is one of the major challenges that researchers face. The response variable $y$ is a Bernoulli random variable, and the conditional probability that $y$ equals 1 given $X \in \mathbb{R}^{p}$, denoted $\pi(X)$, is $P(y_i = 1 \mid X_{ij}) = \pi_i = \dfrac{\exp\left(\beta_0 + \sum_{j=1}^{p} X_{ij}\beta_j\right)}{1 + \exp\left(\beta_0 + \sum_{j=1}^{p} X_{ij}\beta_j\right)}$, where $j = 1, 2, \ldots, p$.
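The probability formula on this slide can be evaluated directly. The following is a minimal sketch, not code from the thesis; the function name `logistic_probability` and the example values are illustrative assumptions.

```python
import math

def logistic_probability(beta0, betas, x):
    """Return pi = P(y = 1 | x) for one observation x under the logistic model."""
    # Linear predictor: beta0 + sum_j x_j * beta_j
    eta = beta0 + sum(b * xj for b, xj in zip(betas, x))
    # Numerically stable evaluation of exp(eta) / (1 + exp(eta)).
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)

# With all coefficients equal to zero the linear predictor is 0, so pi = 0.5.
print(logistic_probability(0.0, [0.0, 0.0], [1.5, -2.0]))
```

Splitting on the sign of the linear predictor avoids overflow in `math.exp` for large positive or negative values; the result is the same quantity as the formula above.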
  • 4. Binary Logistic Regression Model Y = binary response; X = quantitative predictor; π = proportion of 1's (yes, success) at any X. Equivalent forms of the logistic regression model: Logit form: $\log\left(\dfrac{\pi}{1-\pi}\right) = \beta_0 + \beta_1 X$. Probability form: $\pi = \dfrac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}}$. N.B.: This is the natural log (aka "ln"). What does this look like?
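The two forms on this slide are equivalent. As a short worked derivation (writing $\beta_0$ and $\beta_1$ for the intercept and slope and $\pi$ for the proportion of 1's), solving the logit form for $\pi$ gives the probability form:

```latex
\begin{align*}
\log\frac{\pi}{1-\pi} &= \beta_0 + \beta_1 X \\
\frac{\pi}{1-\pi} &= e^{\beta_0 + \beta_1 X} \\
\pi &= (1-\pi)\, e^{\beta_0 + \beta_1 X} \\
\pi\left(1 + e^{\beta_0 + \beta_1 X}\right) &= e^{\beta_0 + \beta_1 X} \\
\pi &= \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}}
\end{align*}
```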
  • 5. • Introduction • What is Logistic Regression? • Why Use Penalized Logistic Regression? • Types of Penalized Logistic Regression • L1 Regularization • L2 Regularization • Elastic Net Regularization • Advantages of Penalized Logistic Regression • Disadvantages of Penalized Logistic Regression • Choosing the Right Penalty • Cross-Validation • Implementation • Applications of Penalized Logistic Regression • Real-World Example • Comparison with Other Methods • Challenges and Limitations • Best Practices • Conclusion • References • Q&A • Thank You • About the Presenter • Audience Poll • Case Study • Future Directions
  • 6. Introduction • Welcome to this presentation on penalized logistic regression! Data is growing rapidly, and machine learning has become a vital tool for making sense of it. One problem that arises in machine learning is overfitting, where the model becomes too complex and fits the training data too closely, leading to poor performance on new data. Penalized logistic regression addresses this issue by adding a penalty term to the loss function, which reduces the complexity of the model. This presentation explores what penalized logistic regression is, why it is important, and how it can be implemented. • We'll explain the different types of penalties, such as L1 and L2 regularization, and discuss how to choose the right penalty for a specific problem. We'll also look at real-world examples of penalized logistic regression and compare it to other machine learning methods. By the end of this presentation, you should understand what penalized logistic regression does and where it fits in machine learning.
  • 7. What is Logistic Regression? • Logistic regression is a statistical method for analyzing a dataset in which one or more independent variables are used to explain an outcome. The outcome is measured with a dichotomous variable (one with only two possible outcomes). For example, we might use logistic regression to model whether a student gets admitted to a university based on their GPA, test scores, and the rank of the high school they attended. • In machine learning, logistic regression is used to classify data into discrete categories, such as determining whether an email is spam or not. It is a popular algorithm for binary classification problems (problems with two class values). Logistic regression can also be used for multiclass classification problems (problems with more than two class values), but this requires an extension such as multinomial logistic regression or a one-vs-rest scheme.
  • 8. Why Use Penalized Logistic Regression? • Penalized logistic regression is a powerful tool in machine learning that addresses the issue of overfitting. Overfitting occurs when a model becomes too complex and starts to fit the noise in the data rather than the underlying patterns. Penalized logistic regression helps to prevent this by adding a penalty term to the cost function, which discourages the model from fitting the noise. This results in a more generalizable model that performs better on new, unseen data. • For example, let's say we're trying to predict whether a customer will buy a product based on their browsing history. If we use traditional logistic regression, we may end up with a model that fits the noise in the data, such as the fact that the customer happened to browse a lot of unrelated products before making a purchase. However, if we use penalized logistic regression, the model will be more focused on the relevant patterns in the data, such as the customer's interest in similar products, resulting in a more accurate
  • 9. Types of Penalized Logistic Regression • There are several types of penalized logistic regression, each with its own advantages and disadvantages. • L1 regularization, also known as lasso regularization, adds a penalty term to the loss function based on the absolute values of the coefficients. This produces sparse solutions in which some coefficients are set exactly to zero. L2 regularization, on the other hand, adds a penalty term based on the squares of the coefficients, which shrinks all coefficients toward zero but generally leaves them non-zero. Elastic net regularization combines the L1 and L2 penalties, so it can select variables like L1 while keeping the stability of L2 when predictors are correlated.
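The three penalties described above can be written side by side. This is a sketch under stated assumptions: $\ell$ is the Bernoulli log-likelihood with $\pi_i$ as defined on the logistic regression slide, $\lambda \ge 0$ is the penalty strength, $\alpha \in [0, 1]$ is the elastic net mixing weight, and the intercept $\beta_0$ is left unpenalized. Software packages scale these terms differently (for example with a factor of $1/2$ on the squared term), so this is one common parametrization, not the only one.

```latex
\begin{align*}
\ell(\beta_0, \beta) &= \sum_{i=1}^{n} \Big[ y_i \log \pi_i + (1 - y_i) \log (1 - \pi_i) \Big] \\
\text{L1 (lasso):} \quad & \min_{\beta_0, \beta} \; -\ell(\beta_0, \beta) + \lambda \sum_{j=1}^{p} |\beta_j| \\
\text{L2 (ridge):} \quad & \min_{\beta_0, \beta} \; -\ell(\beta_0, \beta) + \lambda \sum_{j=1}^{p} \beta_j^{2} \\
\text{Elastic net:} \quad & \min_{\beta_0, \beta} \; -\ell(\beta_0, \beta) + \lambda \Big[ \alpha \sum_{j=1}^{p} |\beta_j| + (1 - \alpha) \sum_{j=1}^{p} \beta_j^{2} \Big]
\end{align*}
```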
  • 10. L1 Regularization • L1 regularization, also known as Lasso regularization, is a technique used in machine learning to prevent overfitting of models. It works by adding a penalty term to the loss function that encourages the model to have sparse coefficients. This means that some of the coefficients will be set to zero, effectively removing them from the model. • One example of where L1 regularization can be useful is in feature selection. In a dataset with many features, L1 regularization can help identify which features are most important for predicting the target variable. By setting some coefficients to zero, it effectively removes those features from consideration, resulting in a simpler and more interpretable model.
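One way to see why an L1 penalty produces exact zeros is the soft-thresholding operator, which appears in coordinate descent and proximal gradient solvers for L1-penalized problems. The sketch below is illustrative only; the function name and the example coefficients are assumptions, and a full solver would apply this step repeatedly inside an optimization loop.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t; any value with |z| <= t becomes exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

# Hypothetical unpenalized coefficient updates and a threshold of 0.5.
coefs = [2.5, -0.3, 0.1, -1.5]
print([soft_threshold(c, 0.5) for c in coefs])  # [2.0, 0.0, 0.0, -1.0]
```

The two small coefficients are removed entirely while the large ones are only reduced in magnitude, which is the feature-selection behaviour described on this slide.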
  • 11. L2 Regularization • L2 regularization, also known as the ridge penalty, adds a penalty term to the logistic regression cost function. This penalty term is proportional to the sum of the squared coefficients, so it shrinks the coefficients toward zero without setting them exactly to zero. The amount of shrinkage is controlled by the regularization parameter lambda. • The main advantage of L2 regularization is that it helps to prevent overfitting by reducing the variance of the model. It does this by discouraging large coefficients, which can lead to overfitting. In addition, L2 regularization can improve the numerical stability of the model by reducing the sensitivity of the coefficients to small changes in the data.
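The shrinkage effect of lambda can be seen on a toy problem. The sketch below fits an L2-penalized logistic regression by plain gradient descent; the data, learning rate, iteration count, and function names are all illustrative assumptions, and the intercept is left unpenalized. The toy data are perfectly separable, a case where the unpenalized coefficient keeps growing with more iterations while the penalized ones settle at finite values.

```python
import math

def sigmoid(eta):
    """Numerically stable logistic function."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)

def fit_ridge_logistic(X, y, lam, lr=0.1, n_iter=2000):
    """Gradient descent on mean negative log-likelihood + lam * sum(beta_j ** 2)."""
    n, p = len(X), len(X[0])
    beta0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        # Residuals pi_i - y_i at the current coefficients.
        resid = [sigmoid(beta0 + sum(b * xj for b, xj in zip(beta, row))) - yi
                 for row, yi in zip(X, y)]
        grad0 = sum(resid) / n
        grad = [sum(r * row[j] for r, row in zip(resid, X)) / n + 2.0 * lam * beta[j]
                for j in range(p)]
        beta0 -= lr * grad0
        beta = [b - lr * g for b, g in zip(beta, grad)]
    return beta0, beta

# Toy, perfectly separable data with a single predictor.
X = [[-2.0], [-1.0], [1.0], [2.0]]
y = [0, 0, 1, 1]
fitted = {lam: fit_ridge_logistic(X, y, lam)[1][0] for lam in (0.0, 0.1, 1.0)}
for lam, slope in fitted.items():
    print(lam, round(slope, 3))
```

Larger values of lambda give a smaller fitted slope, which is the shrinkage described above.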
  • 12. Elastic Net Regularization • Elastic net regularization is a method that combines L1 and L2 regularization to achieve better performance in machine learning models. While L1 regularization can lead to sparse solutions and L2 regularization can lead to dense solutions, elastic net regularization strikes a balance between the two. • In elastic net regularization, the penalty term is a weighted combination of the L1 and L2 penalties. The weight of each penalty is controlled by a hyperparameter alpha. When alpha is set to 0, elastic net regularization reduces to L2 regularization, and when alpha is set to 1, it reduces to L1 regularization. By tuning alpha, we can adjust the trade-off between sparsity and smoothness in the model.
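The role of alpha can be checked numerically. This is a minimal sketch of the weighted penalty described above; the function name and coefficient values are illustrative, and real packages may scale the L2 term differently.

```python
def elastic_net_penalty(betas, lam, alpha):
    """lam * (alpha * L1 norm + (1 - alpha) * squared L2 norm) of the coefficients."""
    l1 = sum(abs(b) for b in betas)
    l2 = sum(b * b for b in betas)
    return lam * (alpha * l1 + (1.0 - alpha) * l2)

betas = [1.0, -2.0, 0.0]
print(elastic_net_penalty(betas, lam=1.0, alpha=1.0))  # pure L1: 3.0
print(elastic_net_penalty(betas, lam=1.0, alpha=0.0))  # pure L2: 5.0
print(elastic_net_penalty(betas, lam=1.0, alpha=0.5))  # equal mix: 4.0
```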
  • 13. Advantages of Penalized Logistic Regression • Penalized logistic regression offers several advantages over traditional logistic regression. One of the main advantages is that it helps to prevent overfitting. By adding a penalty term to the cost function, penalized logistic regression shrinks the coefficients and, with an L1 or elastic net penalty, keeps only the most important features, which can improve generalization performance. This is particularly useful when dealing with high-dimensional datasets. • Another advantage of penalized logistic regression is that it can handle collinear features. When two or more features are highly correlated, traditional logistic regression produces unstable coefficient estimates and has difficulty determining their individual contributions to the outcome variable. An L2 or elastic net penalty stabilizes these estimates by sharing the weight among the correlated features.
  • 14. Disadvantages of Penalized Logistic Regression • While penalized logistic regression has many advantages, there are also some disadvantages to consider. One potential drawback is that it can be computationally expensive, especially with large datasets, because the model must be refit for many candidate values of the penalty parameter, usually inside a cross-validation loop. Additionally, if the dataset contains a large number of features, it can be difficult to choose the most appropriate penalty type and value. • Another disadvantage of penalized logistic regression is that it may not always improve predictive accuracy compared to traditional logistic regression. If the penalty is too strong, the model underfits the data, resulting in lower accuracy. Finally, like traditional logistic regression, penalized logistic regression assumes that the log-odds of the outcome are a linear function of the independent variables. If this assumption is not met, the model may not perform well.
  • 15. Choosing the Right Penalty • When it comes to choosing the right penalty for penalized logistic regression, there are a few things to consider. One important factor is the size of the dataset relative to the number of features. In general, small or high-dimensional datasets need stronger penalties to avoid overfitting, while larger datasets can get by with weaker ones. Another factor to consider is the nature of the data itself. If the features are highly correlated, L2 or elastic net regularization is usually more appropriate, because L1 regularization tends to keep one feature from a correlated group and drop the others somewhat arbitrarily. L1 regularization is better suited to data with less correlation. • Another important consideration is the goal of the model. If the goal is to identify a small number of important features, L1 regularization may be the best choice. On the other hand, if the goal is to predict accurately using all available features, L2 regularization may be more appropriate. Ultimately, the choice of penalty will depend on the specific characteristics of the dataset and the goals of the model.
  • 16. Cross-Validation • Cross-validation is a method used to estimate how well a model performs on data that were not used to fit it. In penalized logistic regression, cross-validation is used to choose the penalty parameter that balances overfitting against underfitting. It involves dividing the dataset into k folds, training the model on k-1 folds, and evaluating its performance on the remaining fold. This process is repeated k times, with each fold serving as the validation set once. The average performance across all folds is used as an estimate of the model's generalization performance. • For example, suppose we have a dataset of 1000 observations and we want to use penalized logistic regression to predict whether a customer will buy a product or not. We can divide the dataset into 5 folds, where each fold contains 200 observations. We can then train the model on 4 folds (800 observations) and evaluate its performance on the remaining fold (200 observations). This process is repeated 5 times, with each fold serving as the validation set once. The whole procedure is carried out for each candidate penalty value, and the value with the best average performance across the 5 folds is chosen.
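The 1000-observation, 5-fold example above can be sketched as follows. This only shows how the indices are split; the function name and the fixed seed are illustrative assumptions, and the model fitting and scoring for each candidate penalty would go inside the loop.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at position i.
    folds = [indices[i::k] for i in range(k)]
    for i, val in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, val

splits = list(k_fold_indices(1000, 5))
for train, val in splits:
    print(len(train), len(val))  # 800 200 on each of the 5 lines
```

Each observation lands in exactly one validation fold, so every observation is used for validation once and for training four times.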
  • 17. Implementation • To implement penalized logistic regression in machine learning, you first need to choose the appropriate penalty parameter. This can be done using techniques such as cross-validation or grid search. Once you have chosen the penalty parameter, you can train your model using an optimization algorithm such as gradient descent. • It is important to note that implementing penalized logistic regression requires some knowledge of programming and machine learning concepts. However, there are many resources available online that can help you get started, including tutorials, code examples, and open-source libraries.
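The two steps on this slide, choosing the penalty parameter and then training by gradient descent, can be sketched together. The example below uses a single train/validation split instead of full cross-validation to stay short; the data, the grid of lambda values, the learning rate, and all function names are illustrative assumptions, and the penalty is L2 with an unpenalized intercept.

```python
import math

def sigmoid(eta):
    """Numerically stable logistic function."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)

def fit(X, y, lam, lr=0.1, n_iter=1000):
    """Gradient descent for L2-penalized logistic regression."""
    n, p = len(X), len(X[0])
    b0, b = 0.0, [0.0] * p
    for _ in range(n_iter):
        r = [sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, row))) - yi
             for row, yi in zip(X, y)]
        b0 -= lr * sum(r) / n
        b = [bj - lr * (sum(ri * row[j] for ri, row in zip(r, X)) / n + 2.0 * lam * bj)
             for j, bj in enumerate(b)]
    return b0, b

def log_loss(X, y, b0, b, eps=1e-12):
    """Mean negative log-likelihood on (X, y), with probabilities clipped away from 0 and 1."""
    total = 0.0
    for row, yi in zip(X, y):
        p = min(max(sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, row))), eps), 1.0 - eps)
        total -= yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total / len(y)

def grid_search(X_tr, y_tr, X_val, y_val, grid):
    """Fit once per candidate lambda and return the one with the lowest validation loss."""
    losses = {lam: log_loss(X_val, y_val, *fit(X_tr, y_tr, lam)) for lam in grid}
    return min(losses, key=losses.get), losses

# Toy data (hypothetical): one predictor, classes overlap so the fit is finite.
X_tr = [[-2.0], [-1.5], [-0.5], [0.5], [1.0], [2.0]]
y_tr = [0, 0, 1, 0, 1, 1]
X_val = [[-1.0], [-0.2], [0.3], [1.5]]
y_val = [0, 1, 0, 1]
grid = [0.0, 0.01, 0.1, 1.0]
best, losses = grid_search(X_tr, y_tr, X_val, y_val, grid)
print(best, losses)
```

In practice the validation loss would be averaged over cross-validation folds, and established libraries use faster solvers than plain gradient descent.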
  • 18. Applications of Penalized Logistic Regression • Penalized logistic regression has numerous applications in machine learning, including but not limited to: feature selection, image classification, and text classification. One example of its use is in the field of medical research, where it can be used to predict the likelihood of a patient developing a certain disease based on their medical history and other factors. • Another example is in the field of finance, where penalized logistic regression can be used to predict the likelihood of a borrower defaulting on a loan. This information can then be used by lenders to make more informed decisions about lending practices.
  • 19. Real-World Example • One example of penalized logistic regression in action is in the healthcare industry, where it is used to predict patient outcomes based on various factors such as age, gender, and medical history. In one study, researchers used penalized logistic regression to predict the likelihood of readmission for patients with heart failure. By using this method, they were able to identify high-risk patients and provide them with more intensive care, ultimately reducing the rate of readmissions. • Another example is in the field of marketing, where penalized logistic regression can be used to predict customer behavior and target specific demographics with personalized advertising. For instance, a company could use this method to predict which customers are most likely to make a purchase and then tailor their marketing efforts accordingly. This not only increases sales but also improves customer satisfaction by providing them with relevant content.
  • 20. Comparison with Other Methods • Penalized logistic regression is a powerful method for machine learning, but how does it compare to other methods? One comparison can be made with traditional logistic regression. Traditional logistic regression keeps every variable in the model and estimates its coefficient without shrinkage, whereas penalized logistic regression shrinks the coefficients and, with an L1 penalty, performs variable selection. This can lead to more accurate predictions, especially when there are many predictors. • Another comparison can be made with support vector machines (SVMs). While SVMs are also a popular method for classification, kernel SVMs can be computationally expensive on large datasets and require tuning of several parameters. Penalized logistic regression, on the other hand, is relatively simple to implement, has only the penalty type and strength to tune, and outputs class probabilities directly. Additionally, penalized logistic regression extends naturally to multi-class problems through the multinomial model, while SVMs usually rely on one-vs-rest or one-vs-one schemes.
  • 21. Challenges and Limitations • While penalized logistic regression has many advantages over traditional logistic regression, it also has some challenges and limitations that must be considered. One challenge is determining the appropriate penalty parameter. This can be difficult, as different penalties may result in different models and predictions. Additionally, the performance of penalized logistic regression can be sensitive to the choice of penalty parameter, meaning that small changes can have a large impact on the results. • Another limitation of penalized logistic regression is that it assumes that the log-odds of the outcome are a linear function of the predictors. If this assumption is violated, the model may not perform well. Finally, penalized logistic regression does not remove the need for an adequate sample size. When the sample is very small relative to the number of parameters being estimated, the penalty parameter chosen by cross-validation is itself unstable, and the model can still overfit.
  • 22. Best Practices • One best practice for using penalized logistic regression in machine learning is to carefully choose the penalty parameter. This can be done through techniques such as cross-validation, which involves splitting the data into training and validation sets and testing different penalty values on the validation set to find the optimal one. Another best practice is to standardize the input features before applying penalized logistic regression, because the penalty acts on the size of the coefficients and would otherwise depend on the units in which each feature is measured. • It is also important to consider the balance between L1 and L2 regularization when using elastic net regularization. The ratio between the two penalties can have a significant impact on the resulting model, so it is important to experiment with different ratios to find the optimal one for the given problem.
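Putting the input features on a common scale before fitting can be sketched as follows. This is one common choice (centering each column and dividing by its standard deviation), not the only one; the function name and the example values are illustrative assumptions.

```python
import math

def standardize(X):
    """Center each column to mean 0 and scale it to standard deviation 1."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    sds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / n) for j in range(p)]
    # Constant columns (sd == 0) are mapped to 0.0 to avoid division by zero.
    Z = [[(row[j] - means[j]) / sds[j] if sds[j] > 0 else 0.0 for j in range(p)]
         for row in X]
    return Z, means, sds

# Two hypothetical features measured on very different scales.
X = [[1.0, 100.0], [2.0, 300.0], [3.0, 500.0]]
Z, means, sds = standardize(X)
print(Z)
```

After this step both columns contain the same standardized values, so a penalty on the coefficients treats the two features equally regardless of their original units. The means and standard deviations computed on the training data should be reused when transforming validation or test data.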
  • 23. Conclusion • In conclusion, penalized logistic regression is a powerful tool in machine learning that allows for better prediction accuracy and model interpretability. By adding a penalty term to the cost function, we can effectively reduce overfitting and select important features in our models. • We have explored the different types of penalties, such as L1 and L2 regularization, and discussed their advantages and disadvantages. We have also looked at how to choose the right penalty using cross-validation and provided best practices for implementing penalized logistic regression in machine learning. • It is clear that penalized logistic regression has numerous applications in various fields, from finance to healthcare. As machine learning continues to grow and evolve, it is important to stay up-to-date with the latest techniques and tools. Penalized logistic regression is definitely one of those tools that should be in every data scientist's toolkit.
  • 24. References • 1. Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning: data mining, inference, and prediction. Springer Science & Business Media. • 2. Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2), 301-320. • 3. Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1), 267-288. • 4. Friedman, J., Hastie, T., & Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of statistical software, 33(1), 1-22.
  • 25. Q&A • Thank you for your attention. Now it's time to open the floor for questions and answers. We encourage you to ask any questions you may have about penalized logistic regression. Our team is here to provide clear and concise answers that will deepen your understanding of this important topic. • Remember, there are no bad questions, so don't be shy. Ask away and let's continue our journey into the world of machine learning together.
  • 26. Thank You • Thank you for your attention and interest in penalized logistic regression. We hope that this presentation has provided you with valuable insights into the importance and applications of this powerful machine learning technique. • If you have any further questions or would like more information, please do not hesitate to contact us. We are always happy to discuss our work and share our knowledge with others.
  • 27. About the Presenter • Presenter: John Doe • Qualifications: PhD in Computer Science from Stanford University, specializing in machine learning and data analytics. Has published multiple papers in top-tier conferences and journals.
  • 28. Audience Poll • Now that we've covered the basics of penalized logistic regression, let's take a moment to gauge your understanding of the topic with a quick poll. Don't worry, this isn't a graded quiz! • Please take out your phones and go to the following website: We'll be asking a few multiple-choice questions about the material we just covered. Your answers will help us understand how well we've explained the concepts, and if there are any areas we need to spend more time on. Thank you for your participation!
  • 29. Case Study • In a recent study, researchers used penalized logistic regression to predict whether patients with heart disease would experience a cardiac event within the next year. The study included data from over 10,000 patients and used a variety of clinical variables, such as age, sex, and medical history, to make predictions. • The results showed that penalized logistic regression predicted cardiac events with a high degree of accuracy, outperforming traditional logistic regression models. This study highlights the importance of using advanced machine learning techniques, such as penalized logistic regression, in real-world scenarios where accurate predictions can have life-saving implications.
  • 30. Future Directions • As machine learning continues to advance, there are many exciting future directions for penalized logistic regression. One area of focus is on developing more efficient algorithms that can handle larger datasets and more complex models. This will allow researchers to tackle even more challenging problems and make more accurate predictions. • Another promising direction is the integration of penalized logistic regression with other machine learning techniques, such as deep learning. By combining these approaches, researchers can develop even more powerful models that can handle a wider range of data types and produce more accurate predictions. For example, using penalized logistic regression in conjunction with convolutional neural networks has shown promise in image classification tasks.