2. Logistic Regression
Logistic regression is an extension of linear regression in which the dependent variable is categorical rather than continuous. It makes a prediction for a categorical variable by estimating the probability that the outcome variable is true.
In other words, the interest is in predicting which of two possible events is going to happen, given certain other information:
• Is he a prospective customer, i.e., what is the likelihood of purchase vs. non-purchase?
• Is he a defaulter on bill payments, i.e., what is his credit worthiness?
• Insurance: is an applicant a low or high claim risk?
• Medicine: is a treatment going to work or not?
3. Logistic Function
Logistic regression is named for the function used at the core of the method: the logistic function (or sigmoid function). It converts any real-valued number into a number between 0 and 1.
Logistic regression models the probability of the default class. It is a linear method, but the predictions are transformed using the logistic function.
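As a minimal sketch (assuming NumPy is available; the example inputs are purely illustrative), the logistic function σ(z) = 1 / (1 + e^(−z)) can be written as:

import numpy as np

def sigmoid(z):
    # Logistic (sigmoid) function: maps any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Large negative inputs approach 0, large positive inputs approach 1
print(sigmoid(-10), sigmoid(0), sigmoid(10))  # ~0.000045, 0.5, ~0.999955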
4. Logistic Regression - Example
You have data for some customers and need to predict the probability that a customer will buy (y) a particular product. Let us assume we are given the “Age” of each customer.
g(y) = β₀ + β(Age), where g is the logit (log-odds) link function, g(y) = ln(p / (1 − p))
In logistic regression, we are only concerned with the probability of the outcome of the dependent variable (success or failure). Solving the link equation for p gives:
p = exp(β₀ + β(Age)) / (exp(β₀ + β(Age)) + 1)
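A hedged sketch of this example with scikit-learn (the ages, outcomes, and the 42-year-old query below are made up purely for illustration):

import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical ages and purchase outcomes (1 = bought, 0 = did not buy)
age = np.array([[22], [25], [30], [35], [40], [45], [50], [55], [60], [65]])
bought = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1])

model = LogisticRegression().fit(age, bought)

# Fitted β₀ (intercept) and β (Age coefficient) from g(y) = β₀ + β(Age)
print(model.intercept_, model.coef_)

# Predicted probability of purchase, p = exp(.) / (exp(.) + 1), for a 42-year-old
print(model.predict_proba([[42]])[:, 1])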
5. Performance Metrics
Some of the metrics used to evaluate ML model performance are listed below (a short sketch of the regression metrics follows the list; the classification metrics are covered in the next sections):
Classification
• Accuracy
• Precision
• Recall
• F1 score
Regression
• Mean Absolute Error
• Mean Squared Error
• R2 Score
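As a brief, hedged sketch, the regression metrics above can be computed with scikit-learn (the values are made up):

from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = [3.0, 5.0, 2.5, 7.0]  # hypothetical actual values
y_pred = [2.8, 5.4, 2.9, 6.6]  # hypothetical predictions

print(mean_absolute_error(y_true, y_pred))  # average absolute error
print(mean_squared_error(y_true, y_pred))   # average squared error
print(r2_score(y_true, y_pred))             # 1.0 would be a perfect fit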
6. Classification Metrics
The confusion matrix is a 2 × 2 matrix which compares the actual outcomes to the predicted outcomes. The rows are labeled with actual outcomes, while the columns are labeled with predicted outcomes.
True Positives (TP) – Correctly predicted positive values: the actual class is yes and the predicted class is also yes. E.g., the actual class indicates that this passenger survived, and the predicted class tells you the same thing.
True Negatives (TN) – Correctly predicted negative values: the actual class is no and the predicted class is also no. E.g., the actual class says this passenger did not survive, and the predicted class tells you the same thing.
False Positives (FP) – The actual class is no but the predicted class is yes. E.g., the actual class says this passenger did not survive, but the predicted class tells you that this passenger survived.
False Negatives (FN) – The actual class is yes but the predicted class is no. E.g., the actual class indicates that this passenger survived, but the predicted class tells you that the passenger did not survive.
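A minimal sketch of building this matrix with scikit-learn, using made-up survival labels (1 = survived, 0 = did not survive):

from sklearn.metrics import confusion_matrix

# Hypothetical actual and predicted outcomes
actual    = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

# scikit-learn orders the 2x2 result as [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(actual, predicted).ravel()
print(f"TP={tp} TN={tn} FP={fp} FN={fn}")  # TP=4 TN=4 FP=1 FN=1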
7. Classification Metrics
Accuracy: simply the ratio of correctly predicted observations to the total observations.
Accuracy = (TP + TN) / (TP + FP + FN + TN)
Sensitivity or Recall = True Positives / (True Positives + False Negatives)
Out of all the items that are truly positive, how many were correctly classified as positive? Or simply, how many positive items were 'recalled' from the dataset? The question recall answers is: of all the passengers that truly survived, how many did we label as survived?
Precision = True Positives / (True Positives + False Positives)
Out of all the items labeled as positive, how many truly belong to the positive class? It measures how precise you are at predicting a positive label. The question this metric answers is: of all the passengers labeled as survived, how many actually survived? (Precision is distinct from specificity, which is True Negatives / (True Negatives + False Positives).)
F1-Score: the F1 score combines precision and recall relative to a specific positive class. It is the harmonic mean of precision and recall, reaching its best value at 1 and worst at 0:
F1 = 2 * (precision * recall) / (precision + recall)
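Continuing the hedged sketch above, all four metrics follow directly from the same made-up labels:

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

actual    = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

print(accuracy_score(actual, predicted))   # (TP+TN)/(TP+FP+FN+TN) = 8/10
print(precision_score(actual, predicted))  # TP/(TP+FP) = 4/5
print(recall_score(actual, predicted))     # TP/(TP+FN) = 4/5
print(f1_score(actual, predicted))         # 2*(P*R)/(P+R) = 0.8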
8. False Positives and False Negatives
Precision: Out of the patients diagnosed with an illness, how many were classified correctly? Out of the emails sent to the spam folder, how many were actually spam?
Recall: Out of the sick patients, how many did we correctly diagnose as sick? Out of all the spam emails, how many were correctly sent to spam?
9. Accuracy Paradox
Accuracy can often be misleading, especially when classes are unevenly distributed through the dataset. Consider the case where you have a dataset of 950 positive items and 50 negative items. Without even using a model, predicting everything to be positive will net a 95% accuracy score. This is known as the Accuracy Paradox: the number alone doesn't really tell us anything. Our precision would also be 95%, but that value is more telling: we know that out of all the items classified as positive, 95% truly belong to the positive class. Our recall on the positive class would be 100%. Great! But recall on the negative class is 0%. Accuracy couldn't tell us any of that.
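A small sketch reproducing these numbers: a "model" that predicts everything positive on a 950/50 split.

from sklearn.metrics import accuracy_score, precision_score, recall_score

# 950 positive items, 50 negative items; predict all positive
actual    = [1] * 950 + [0] * 50
predicted = [1] * 1000

print(accuracy_score(actual, predicted))             # 0.95
print(precision_score(actual, predicted))            # 0.95
print(recall_score(actual, predicted))               # 1.0 on the positive class
print(recall_score(actual, predicted, pos_label=0))  # 0.0 on the negative class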
10. ROC Curve
The Receiver Operating Characteristic (ROC) curve summarizes the model’s performance by evaluating the trade-off between the true positive rate (sensitivity) and the false positive rate (1 − specificity). The area under the curve (AUC), also referred to as the index of accuracy (A) or concordance index, is a widely used performance metric for the ROC curve: the higher the area under the curve, the better the predictive power of the model. The ROC of a perfect predictive model has a true positive rate of 1 and a false positive rate of 0.
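A hedged sketch of computing the ROC curve and AUC with scikit-learn (the labels and scores below are made up; y_score would normally come from predict_proba):

from sklearn.metrics import roc_curve, roc_auc_score

# Hypothetical true labels and predicted probabilities of the positive class
y_true  = [0, 0, 1, 1, 0, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.5, 0.9]

# Each threshold yields one (FPR, TPR) point on the ROC curve
fpr, tpr, thresholds = roc_curve(y_true, y_score)
print("FPR:", fpr)
print("TPR:", tpr)
print("AUC:", roc_auc_score(y_true, y_score))  # closer to 1 means better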