Logistic Regression
Classification
• Classification techniques are an essential part of machine learning
• Approximately 70% of data science problems are classification
problems
• Some regression algorithms are used for classification
• There are many classification methods, but logistic
regression is a common and useful method for solving binary
(dichotomous) classification problems
Examples
• Spam detection
• Loan applications
• Customer purchase or churn
• Disease detection
• Logistic regression is easy to implement and is often considered the baseline for binary
classification problems
Classification
Email: Spam / Not Spam?
Online Transactions: Fraudulent (Yes / No)?
Tumor: Malignant / Benign?
0: “Negative Class” (e.g., benign tumor)
1: “Positive Class” (e.g., malignant tumor)
Logistic regression (logit regression) is based on the concept of
probability estimation
What is the probability that this email is spam?
It is based on a threshold: if the estimated probability is
greater than 50%, the model predicts that the email belongs to
class 1; otherwise, it belongs to class 0
This is called binary classification
It is a generalized linear model: like linear regression it computes a weighted sum of
the inputs, but the target variable is categorical in nature.
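The threshold rule described above can be sketched in a few lines. This is only an illustration: the function name `classify` and the probability values are made up, and the 50% cut-off is passed in as a parameter so that it can be changed.

```python
def classify(probability, threshold=0.5):
    """Return 1 (positive class) if the probability is greater than the threshold, else 0."""
    return 1 if probability > threshold else 0

# Made-up spam probabilities, standing in for the output of a trained model.
spam_probabilities = [0.92, 0.40, 0.13]
labels = [classify(p) for p in spam_probabilities]
print(labels)  # [1, 0, 0]
```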
Logistic Regression
• Logistic regression is a supervised statistical technique for
estimating the probability of each class of a categorical
dependent variable.
• Logistic regression uses the logit function (the log of the odds),
which helps derive a relationship between the
dependent variable and the independent variables by
predicting the probability of occurrence.
• The logistic function (also known as the sigmoid function)
converts the weighted sum into a probability between 0 and 1, which a
threshold then turns into a binary prediction
• Logistic regression computes a weighted sum of the input
features, but instead of outputting the result directly as
linear regression models do, it passes the result through the
sigmoid function and outputs the logistic of the result.
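$z = \theta^T x = \theta_0 + \theta_1 x_1 + \dots + \theta_n x_n$
In linear regression: $h_\theta(x) = \theta^T x$
In logistic regression: $h_\theta(x) = g(\theta^T x)$, where $g(z) = \frac{1}{1 + e^{-z}}$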
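As a minimal sketch of the weighted sum followed by the sigmoid, assuming the input vector starts with a bias term of 1: the names `sigmoid`, `logit` and `hypothesis` and the parameter values are illustrative choices, not taken from the slides. The logit is included to show that it is the inverse of the sigmoid.

```python
import math

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """Log of the odds p / (1 - p); the inverse of the sigmoid."""
    return math.log(p / (1.0 - p))

def hypothesis(theta, x):
    """h_theta(x) = sigmoid(theta^T x); x[0] is assumed to be the bias term 1."""
    z = sum(t * xi for t, xi in zip(theta, x))  # weighted sum of the inputs
    return sigmoid(z)

theta = [-4.0, 1.0]  # hypothetical parameters (theta_0, theta_1)
x = [1.0, 5.0]       # bias term 1 and one feature value
p = hypothesis(theta, x)
print(p)         # sigmoid(1.0), roughly 0.731
print(logit(p))  # recovers the weighted sum, roughly 1.0
```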
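[Figure: Malignant? (Yes = 1, No = 0) plotted against Tumor Size]
Threshold classifier output $h_\theta(x)$ at 0.5:
If $h_\theta(x) \geq 0.5$, predict “y = 1”
If $h_\theta(x) < 0.5$, predict “y = 0”
Classification: y = 0 or 1
With linear regression, $h_\theta(x)$ can be > 1 or < 0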
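Logistic Regression: $0 \leq h_\theta(x) \leq 1$
Logistic Regression Model
Want $0 \leq h_\theta(x) \leq 1$
$h_\theta(x) = g(\theta^T x)$, where $g(z) = \frac{1}{1 + e^{-z}}$
$g$ is called the sigmoid function or logistic function
[Figure: sigmoid curve $g(z)$ against $z$, rising from 0 through 0.5 at $z = 0$ toward 1]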
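Interpretation of Hypothesis Output
$h_\theta(x)$ = estimated probability that y = 1 on input x
Example: If $h_\theta(x) = 0.7$, tell the patient that there is a 70% chance of the tumor being malignant
$h_\theta(x) = P(y = 1 \mid x; \theta)$
“probability that y = 1, given x, parameterized by $\theta$”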
Why can’t we use Linear Regression?
• Linear regression predicts continuous variables, such as the price of a house, and
its output can range from negative infinity to positive
infinity.
• Since the predicted value is a continuous value rather than a probability for
the classes, it is very hard to find the right threshold to
distinguish between the classes.
• In a multiclass problem there can be n classes, labelled
from 0 to n-1.
Suppose we have a 5-class problem with classes 0, 1, 2, 3 and 4. These classes do not carry
any meaningful order. However, linear regression would treat the labels as numeric values and
be forced to establish some kind of ordered relation between them and the independent
features.
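The first two points can be checked on a toy example. The sketch below fits an ordinary least-squares line to made-up binary labels (the tumor sizes and the helper `fit_line` are assumptions for illustration) and evaluates it at two sizes, showing outputs that fall outside the range of a probability.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Made-up data: tumor size and label (0 = benign, 1 = malignant).
sizes = [1, 2, 3, 4, 5, 6, 7, 8]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

intercept, slope = fit_line(sizes, labels)
small = intercept + slope * 0   # prediction for size 0
large = intercept + slope * 20  # prediction for size 20
print(small, large)  # one value below 0 and one above 1, so neither is a probability
```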
Decision boundary
• A decision boundary is a line or surface that separates the classes.
• A classification algorithm is all about finding the decision boundary that
distinguishes between the classes perfectly or close to perfectly.
• Logistic regression fits a decision boundary so that
we can predict which class a new data point belongs to.
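Logistic regression: $h_\theta(x) = g(\theta^T x)$, $g(z) = \frac{1}{1 + e^{-z}}$
Suppose predict “y = 1” if $h_\theta(x) \geq 0.5$, i.e. $\theta^T x \geq 0$
predict “y = 0” if $h_\theta(x) < 0.5$, i.e. $\theta^T x < 0$
[Figure: sigmoid curve $g(z)$ against $z$]
$h_\theta(x) = g(\theta_0 + \theta_1 x_1 + \theta_2 x_2)$
Predict “y = 1” if $\theta_0 + \theta_1 x_1 + \theta_2 x_2 \geq 0$
[Figure: linear decision boundary in the $(x_1, x_2)$ plane, axes marked 1 to 3]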
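A small sketch of a linear decision boundary with two features follows. The parameter values are hypothetical, chosen so that the boundary is the line x1 + x2 = 3; they are not taken from the slides.

```python
def predict(theta, x1, x2):
    """Predict 1 when theta_0 + theta_1*x1 + theta_2*x2 >= 0, else 0.

    Since sigmoid(z) >= 0.5 exactly when z >= 0, this is the same as
    thresholding the estimated probability at 0.5.
    """
    z = theta[0] + theta[1] * x1 + theta[2] * x2
    return 1 if z >= 0 else 0

theta = [-3.0, 1.0, 1.0]  # hypothetical values: the boundary is the line x1 + x2 = 3
print(predict(theta, 1.0, 1.0))  # 0, below the line
print(predict(theta, 2.5, 2.5))  # 1, above the line
```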
How to fit (find) the parameter $\theta$
The parameter vector $\theta = (\theta_0, \theta_1, \theta_2)$ defines the decision boundary,
not the training set. The training set is used to find
$\theta$
Non-linear decision boundaries
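Adding non-linear terms of the features (for example $x_1^2$ and $x_2^2$) to $\theta^T x$ lets the boundary curve
Predict “y = 1” if $\theta^T x \geq 0$, where $x$ now includes the non-linear terms
[Figure: non-linear decision boundary in the $(x_1, x_2)$ plane, axes marked at -1 and 1]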
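One way to obtain a non-linear boundary is to add squared terms to the features by hand. The sketch below assumes that approach; the parameter values are hypothetical and chosen so that the boundary is the unit circle.

```python
def predict_circle(theta, x1, x2):
    """Predict 1 when theta^T [1, x1, x2, x1^2, x2^2] >= 0, else 0."""
    features = [1.0, x1, x2, x1 ** 2, x2 ** 2]  # squared terms added by hand
    z = sum(t * f for t, f in zip(theta, features))
    return 1 if z >= 0 else 0

theta = [-1.0, 0.0, 0.0, 1.0, 1.0]  # hypothetical: the boundary is x1^2 + x2^2 = 1
print(predict_circle(theta, 0.0, 0.0))  # 0, inside the circle
print(predict_circle(theta, 2.0, 0.0))  # 1, outside the circle
```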
Cost function
A cost function measures the performance of a machine
learning model on given data.
It quantifies the error between predicted
values and expected values and presents it as a single real
number.
The cost function and the loss function are often confused.
In simple terms, the cost function is the average error over the n
samples in the data, and the loss function is the error for an individual data point.
In other words, the loss function is for one training example; the cost function is
for the entire training set.
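The distinction can be sketched as two functions. Squared error is used here purely as an illustrative loss, and the predictions and targets are made up.

```python
def squared_error_loss(predicted, expected):
    """Loss: the error for ONE training example (squared error, for illustration)."""
    return (predicted - expected) ** 2

def cost(predictions, targets):
    """Cost: the average of the per-example losses over the whole training set."""
    losses = [squared_error_loss(p, t) for p, t in zip(predictions, targets)]
    return sum(losses) / len(losses)

predictions = [0.9, 0.2, 0.6]  # made-up model outputs
targets = [1, 0, 1]
print(cost(predictions, targets))  # (0.01 + 0.04 + 0.16) / 3, about 0.07
```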
Training set: $\{(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \dots, (x^{(m)}, y^{(m)})\}$ (m examples)
How to choose parameters $\theta$?
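Linear regression: $J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \frac{1}{2} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$
With the sigmoid hypothesis, this squared-error $J(\theta)$ is “non-convex” because of the
non-linear sigmoid function, so gradient descent may get stuck in a local minimum
We want $J(\theta)$ to be “convex”
[Figure: a “non-convex” cost curve next to a “convex” one]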
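A quick numerical check suggests why squared error is a poor fit for a sigmoid hypothesis. The sketch below uses a single made-up example (x = 1, y = 1) and a midpoint test: a convex function never lies above the average of its endpoint values. One violation is enough to show non-convexity; passing the test at one pair of points does not by itself prove convexity.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def squared_error(theta):
    """Squared-error cost for one example (x = 1, y = 1) with a sigmoid hypothesis."""
    return 0.5 * (sigmoid(theta) - 1.0) ** 2

def log_loss(theta):
    """Log loss for the same example: -log(h_theta(x))."""
    return -math.log(sigmoid(theta))

def violates_convexity(f, a, b):
    """True if f at the midpoint of a and b exceeds the average of f(a) and f(b)."""
    return f((a + b) / 2) > (f(a) + f(b)) / 2

print(violates_convexity(squared_error, -10.0, 0.0))  # True
print(violates_convexity(log_loss, -10.0, 0.0))       # False
```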
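Logistic regression cost function
Logistic regression therefore uses a different cost function:
$\mathrm{Cost}(h_\theta(x), y) = -\log(h_\theta(x))$ if y = 1
$\mathrm{Cost}(h_\theta(x), y) = -\log(1 - h_\theta(x))$ if y = 0
If you combine the two equations above into one, you get a convex function:
$\mathrm{Cost}(h_\theta(x), y) = -y \log(h_\theta(x)) - (1 - y) \log(1 - h_\theta(x))$
Because this cost function is convex, gradient descent on it can converge to the
global minimum.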
Simplified cost function & gradient descent
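$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right]$
To fit parameters $\theta$: $\min_\theta J(\theta)$
To make a prediction given new $x$:
Output $h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$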
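The averaged log-loss cost can be computed directly. The sketch below assumes each feature row starts with a bias term of 1; the data and parameter values are made up, and no numerical safeguards (such as clipping probabilities away from 0 and 1) are included.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cost(theta, X, y):
    """J(theta) = -(1/m) * sum(y*log(h) + (1 - y)*log(1 - h)).

    X is a list of feature rows, each starting with the bias term 1.
    """
    m = len(y)
    total = 0.0
    for row, label in zip(X, y):
        h = sigmoid(sum(t * v for t, v in zip(theta, row)))
        total += label * math.log(h) + (1 - label) * math.log(1 - h)
    return -total / m

X = [[1.0, 1.0], [1.0, 2.0], [1.0, 4.0], [1.0, 5.0]]  # made-up data, bias first
y = [0, 0, 1, 1]
print(cost([0.0, 0.0], X, y))   # log(2), about 0.693, since every h is 0.5
print(cost([-3.0, 1.0], X, y))  # smaller, since these parameters fit the labels
```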
Gradient Descent
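Want $\min_\theta J(\theta)$:
Repeat {
$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$
} (simultaneously update all $\theta_j$)
Substituting the derivative of $J(\theta)$:
Repeat {
$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$
} (simultaneously update all $\theta_j$)
Algorithm looks identical to linear regression! The difference is that $h_\theta(x)$ is now the sigmoid of $\theta^T x$.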
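A minimal sketch of batch gradient descent for logistic regression, under a few assumptions: the learning rate `alpha`, the iteration count and the toy data are illustrative choices, and every row of `X` starts with a bias term of 1.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gradient_descent(X, y, alpha=0.1, iterations=5000):
    """Batch gradient descent for logistic regression.

    Each step applies theta_j := theta_j - alpha * (1/m) * sum((h - y) * x_j),
    updating every theta_j simultaneously from the same set of errors.
    """
    m, n = len(X), len(X[0])
    theta = [0.0] * n
    for _ in range(iterations):
        errors = [sigmoid(sum(t * v for t, v in zip(theta, row))) - label
                  for row, label in zip(X, y)]
        gradient = [sum(e * row[j] for e, row in zip(errors, X)) / m
                    for j in range(n)]
        theta = [t - alpha * g for t, g in zip(theta, gradient)]
    return theta

X = [[1.0, 1.0], [1.0, 2.0], [1.0, 4.0], [1.0, 5.0]]  # made-up data, bias first
y = [0, 0, 1, 1]
theta = gradient_descent(X, y)
predictions = [1 if sigmoid(sum(t * v for t, v in zip(theta, row))) >= 0.5 else 0
               for row in X]
print(predictions)  # matches y on this separable toy set
```

The only difference from the linear regression update is the call to `sigmoid` when computing the errors.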
Optimization algorithm
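Cost function $J(\theta)$. Want $\min_\theta J(\theta)$.
Given $\theta$, we have code that can compute
- $J(\theta)$
- $\frac{\partial}{\partial \theta_j} J(\theta)$ (for $j = 0, 1, \dots, n$)
Gradient descent:
Repeat {
$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$
}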
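The "code that can compute" the cost and its partial derivatives might look like the sketch below. The function name `cost_and_gradient`, the data and the test point are assumptions; the finite-difference comparison is a common sanity check on the analytic gradient, not part of the slides.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cost_and_gradient(theta, X, y):
    """Return J(theta) and the list of partial derivatives dJ/dtheta_j."""
    m, n = len(X), len(theta)
    cost, gradient = 0.0, [0.0] * n
    for row, label in zip(X, y):
        h = sigmoid(sum(t * v for t, v in zip(theta, row)))
        cost -= (label * math.log(h) + (1 - label) * math.log(1 - h)) / m
        for j in range(n):
            gradient[j] += (h - label) * row[j] / m
    return cost, gradient

X = [[1.0, 1.0], [1.0, 2.0], [1.0, 4.0], [1.0, 5.0]]  # made-up data, bias first
y = [0, 0, 1, 1]
theta = [0.5, -0.25]  # arbitrary test point

# Sanity check: compare each analytic derivative with a central finite difference.
_, gradient = cost_and_gradient(theta, X, y)
eps = 1e-6
numeric_gradient = []
for j in range(len(theta)):
    up = [t + (eps if k == j else 0.0) for k, t in enumerate(theta)]
    down = [t - (eps if k == j else 0.0) for k, t in enumerate(theta)]
    difference = cost_and_gradient(up, X, y)[0] - cost_and_gradient(down, X, y)[0]
    numeric_gradient.append(difference / (2 * eps))
print(gradient, numeric_gradient)  # the two lists should agree closely
```

A function of this shape can be handed to gradient descent or to any other optimizer that accepts a cost and its gradient.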
Multiclass classification
Email foldering/tagging: Work, Friends, Family, Hobby
Medical diagnosis: Not ill, Cold, Flu
Weather: Sunny, Cloudy, Rain, Snow
[Figure: two scatter plots in the $(x_1, x_2)$ plane, titled “Binary classification” and “Multi-class classification”]
One-vs-all (one-vs-rest):
[Figure: the multi-class scatter plot split into three binary problems in the $(x_1, x_2)$ plane, one each for Class 1, Class 2 and Class 3]
One-vs-all
Train a logistic regression classifier $h_\theta^{(i)}(x)$ for each class $i$
to predict the probability that $y = i$.
On a new input $x$, to make a prediction, pick the
class $i$ that maximizes $h_\theta^{(i)}(x)$
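The one-vs-all procedure can be sketched as follows, assuming each binary classifier is trained by plain batch gradient descent. The helper names, learning rate, iteration count and the three-cluster toy data are all illustrative choices.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_binary(X, y, alpha=0.1, iterations=5000):
    """Fit one logistic regression classifier by batch gradient descent."""
    m = len(X)
    theta = [0.0] * len(X[0])
    for _ in range(iterations):
        errors = [sigmoid(sum(t * v for t, v in zip(theta, row))) - label
                  for row, label in zip(X, y)]
        theta = [t - alpha * sum(e * row[j] for e, row in zip(errors, X)) / m
                 for j, t in enumerate(theta)]
    return theta

def train_one_vs_all(X, labels):
    """Train one classifier per class: class c is relabelled 1, all others 0."""
    return {c: train_binary(X, [1 if label == c else 0 for label in labels])
            for c in sorted(set(labels))}

def predict(classifiers, x):
    """Pick the class whose classifier reports the highest probability."""
    return max(classifiers,
               key=lambda c: sigmoid(sum(t * v for t, v in zip(classifiers[c], x))))

# Made-up, well-separated data: rows are [bias, x1, x2].
X = [[1, 0, 0], [1, 1, 0], [1, 0, 1],
     [1, 5, 0], [1, 6, 0], [1, 5, 1],
     [1, 0, 5], [1, 1, 5], [1, 0, 6]]
labels = [1, 1, 1, 2, 2, 2, 3, 3, 3]
classifiers = train_one_vs_all(X, labels)
predicted = [predict(classifiers, row) for row in X]
print(predicted)  # should reproduce labels on this toy data
```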
Advantages
• Logistic regression is efficient and straightforward: it does not require high
computation power, is easy to implement and easy to interpret, and is widely used by
data analysts and scientists. Feature scaling is not strictly required, although it can help
gradient descent converge faster.
• Logistic regression provides a probability score for observations.
Disadvantages
• Logistic regression does not handle a large number of categorical features/variables well.
• It is vulnerable to overfitting. It also cannot solve non-linear problems directly,
which is why non-linear features must be transformed first.
• Logistic regression does not perform well with independent variables that are not
correlated with the target variable, or that are very similar or correlated to each other.
