SlideShare a Scribd company logo
1 of 7
Download to read offline
Blogpost:
Binary classification metrics
neptune.ml We bring collaboration to data science projects.
based on blog post 22 Evaluation Metrics for Binary Classification (And When to Use Them)
@NeptuneML /neptune-ml contact@neptune.ml
1/7
False positive rate | Type-I error
from sklearn.metrics import confusion_matrix
tn, fp, fn, tp = confusion_matrix(y_true, y_pred_class).ravel()
false_positive_rate = fp / (fp + tn)
When to use it
How to calculate
Formula
Usually, it is not used alone but rather
with some other metric,
If the cost of dealing with an alert is
high you should consider increasing the
threshold to get fewer alerts.
Explanation
How many false alerts your
model raises
False negative rate | Type-II error
from sklearn.metrics import confusion_matrix
tn, fp, fn, tp = confusion_matrix(y_true, y_pred_class).ravel()
false_negative_rate = fn / (tp + fn)
When to use it
How to calculate
Formula
Usually, it is not used alone but rather
with some other metric,
If the cost of letting the fraudulent
transactions through is high and the
value you get from the users isn’t you
can consider focusing on this number.
Explanation
How often your model misses
trully fraudulent transactions.
False discovery rate
from sklearn.metrics import confusion_matrix
tn, fp, fn, tp = confusion_matrix(y_true, y_pred_class).ravel()
false_discovery_rate = fp / (tp + fp)
When to use it
How to calculate
Formula
-
Explanation
Reversed precision (1-
precision).
Negative predictive value
from sklearn.metrics import confusion_matrix
tn, fp, fn, tp = confusion_matrix(y_true, y_pred_class).ravel()
negative_predictive_value = tn / (tn + fn)
When to use it
How to calculate
Formula
Usually, you don’t use it alone but rather as an auxiliary metric,
When we care about high precision on negative predictions. For
example, imagine we really don’t want to have any additional process
for screening the transactions predicted as clean. In that case, we
may want to make sure that our negative predictive value is high.
Explanation
Think of it as precision for negative class.
True negative rate | Specificity
from sklearn.metrics import confusion_matrix
tn, fp, fn, tp = confusion_matrix(y_true, y_pred_class).ravel()
true_negative_rate = tn / (tn + fp)
When to use it
How to calculate
Formula
Usually, you don’t use it alone but rather as an auxiliary metric,
When you really want to be sure that you are right when you say
something is safe. A typical example would be a doctor telling a
patient “you are healthy”. Making a mistake here and telling a sick
person they are safe and can go home is something you may want to
avoid.
Explanation
Think of it as recall for negative class.
Blogpost:
Binary classification metrics
neptune.ml We bring collaboration to data science projects.
based on blog post 22 Evaluation Metrics for Binary Classification (And When to Use Them)
@NeptuneML /neptune-ml contact@neptune.ml
2/7
Log loss
from sklearn.metrics import log_loss
loss = log_loss(y_true, y_pred)
When to use it
How to calculate
Formula
Usually used as objective not metric.
Explanation
Averaged difference between ground truth and
logarithm of predicted score for every
observation. Heavily penalizes when the model
is confident about something yet wrong.
Brier score (loss)
from sklearn.metrics import brier_score_loss
brier = brier_score_loss(y_true, y_pred[:,1])
When to use it
How to calculate
Formula
When you care about calibrated probabilities.
Explanation
Mean squared error between ground truth and
predicted score.
ROC AUC score
from sklearn.metrics import roc_auc_score
roc_auc = roc_auc_score(y_true, y_pred_pos)
When to use it
How to calculate
Formula
You should use it when you ultimately care about ranking predictions.
You should not use it when your data is heavily imbalanced.
You should use it when you care equally about positive and negative
class.
Explanation
Rank correlation between predictions and
target. Tells you how good at ranking
predictions (positive over negative) your model
is.
Precison-Recall AUC | Average precision
from sklearn.metrics import average_precision_score
avg_precision = average_precision_score(y_true, y_pred)
When to use it
How to calculate
Formula
When you want to communicate precision/recall decision to other
stakeholders and want to choose the threshold that fits the business
problem.
When your data is heavily imbalanced.
When you care more about positive than negative class.
Explanation
What is the average precision over all recall
values.
Area under ROC curve Area under Precision-Recall curve
Blogpost:
Binary classification metrics
neptune.ml We bring collaboration to data science projects.
based on blog post 22 Evaluation Metrics for Binary Classification (And When to Use Them)
@NeptuneML /neptune-ml contact@neptune.ml
3/7
F beta
from sklearn.metrics import fbeta_score
fbeta_score(y_true, y_pred_class, beta)
When to use it
How to calculate
Formula
When you want to combine precision
and recall in one metric and would like
to be able to adjust how much focus
you put on one or the other.
Explanation
Geometrically averaged
precision and recall with a
weight beta. ( 0<beta<1 favours
precision; beta>1 favours recall )
F1 score
from sklearn.metrics import fbeta_score
fbeta_score(y_true, y_pred_class, beta=1)
When to use it
How to calculate
Formula
Pretty much in every binary
classification problem.
It is my go-to metric when working on
those problems.
Explanation
Geometric average of precision
and recall.
F2 score
from sklearn.metrics import fbeta_score
fbeta_score(y_true, y_pred_class, beta=2)
When to use it
How to calculate
Formula
When recalling positive observations
(fraudulent transactions) is more
important than being precise about it
but you still want to have a nice and
simple metric that combines precision
and recall.
Explanation
Geometric average of precison
and recall with twice as much
weight on recall.
Cohen Kappa
from sklearn.metrics import cohen_kappa_score
cohen_kappa_score(y_true, y_pred_class)
When to use it
How to calculate
Formula
When you want to combine precision
and recall in one metric and would like
to be able to adjust how much focus
you put on one or the other.
Explanation
How much better is your model
over the random classifier that
predicts based on class
frequencies.
Matthews correlation coefficient
from sklearn.metrics import matthews_corrcoef
matthews_corrcoef(y_true, y_pred_class)
When to use it
How to calculate
Formula
Pretty much in every binary
classification problem.
It is my go-to metric when working on
those problems.
Explanation
Correlation between predicted
classes and ground truth.
Kolmogorov-Smirnov statistics
from scikitplot.helpers import binary_ks_curve
res = binary_ks_curve(y_true, y_pred[:, 1])
ks_stat = res[3]
When to use it
How to calculate
Formula
when your problem is about sorting/
prioritizing the most relevant
observations and you care equally
about positive and negative class.
Explanation
It helps to assess the separation
between prediction distributions
for positive and negative class.
Max distance between KS curves
Blogpost:
Binary classification metrics
neptune.ml We bring collaboration to data science projects.
based on blog post 22 Evaluation Metrics for Binary Classification (And When to Use Them)
@NeptuneML /neptune-ml contact@neptune.ml
4/7
Positive predictive value | Precision
from sklearn.metrics import precision_score
precision_score(y_true, y_pred_class)
When to use it
How to calculate
Formula
Again, it usually doesn’t make sense to use it alone but rather
coupled with other metrics like recall.
When raising false alerts is costly and you really want all the positive
predictions to be worth looking at you should optimize for precision.
Explanation
Make sure that people that go to prison are
guilty.
True positive rate | Recall | Sensitivity
from sklearn.metrics import recall_score
recall_score(y_true, y_pred_class)
When to use it
How to calculate
Formula
Usually, you will not use it alone but rather coupled with other metrics
like precision.
That being said recall is a go-to metric, when you really care about
catching all fraudulent transactions even at a cost of false alerts.
Potentially it is cheap for you to process those alerts and very
expensive when the transaction goes unseen.
Explanation
Put all guilty in prison.
Accuracy
from sklearn.metrics import accuracy_score
accuracy_score(y_true, y_pred_class)
When to use it
How to calculate
Formula
When your problem is balanced using
accuracy is usually a good start.
When every class is equally important
to you.
Explanation
How good at classyfying both
positive and negative
cases your model is.
Blogpost:
Binary classification metrics
neptune.ml We bring collaboration to data science projects.
based on blog post 22 Evaluation Metrics for Binary Classification (And When to Use Them)
@NeptuneML /neptune-ml contact@neptune.ml
5/7
Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_true, y_pred_class)
tn, fp, fn, tp = cm.ravel()
When to use it
How to calculate
Pretty much always.
Get the gist of model imbalance and where the
predictions fall.
Explanation
Table that contains true negative (tn), false positive (fp),
false negative (fn), and true positive (tp) predictions.
Cumulative gain chart
from scikitplot.metrics import plot_cumulative_gain
fig, ax = plt.subplots()
plot_cumulative_gain(y_true, y_pred, ax=ax)
When to use it
How to calculate
Whenever you want to use your model to choose the best
customers/transactions to target by sorting all
predictions you should consider using cumulative gain
charts.
Explanation
In simple words, it helps you gauge how much you gain by
using your model over a random model for a given fraction
of top scored predictions.
Blogpost:
Binary classification metrics
neptune.ml We bring collaboration to data science projects.
based on blog post 22 Evaluation Metrics for Binary Classification (And When to Use Them)
@NeptuneML /neptune-ml contact@neptune.ml
6/7
Kolmogorov-Smirnov chart
from scikitplot.metrics import plot_ks_statistic
fig, ax = plt.subplots()
plot_ks_statistic(y_true, y_pred, ax=ax)
When to use it
How to calculate
When your problem is about sorting/prioritizing the most
relevant observations and you care equally about positive
and negative class.
Explanation
It helps to assess the separation between prediction
distributions for positive and negative class.
So it works similarly to Cumulative gain chart but instead of
just looking at positive class it looks at the separation
between positive and negative class.
Lift curve
from scikitplot.metrics import plot_lift_curve
fig, ax = plt.subplots()
plot_lift_curve(y_true, y_pred, ax=ax)
When to use it
How to calculate
Whenever you want to use your model to choose the best
customers/transactions to target by sorting all
predictions you should consider using a lift curve.
Explanation
In simple words, it helps you gauge how much you gain by
using your model over a random model for a given fraction
of top scored predictions.
It tells you how much better your model is than a random
model for the given percentile of top scored predictions.
Blogpost:
Binary classification metrics
neptune.ml We bring collaboration to data science projects.
based on blog post 22 Evaluation Metrics for Binary Classification (And When to Use Them)
@NeptuneML /neptune-ml contact@neptune.ml
7/7
Precision-Recall curve
from scikitplot.metrics import plot_precision_recall
fig, ax = plt.subplots()
plot_precision_recall(y_true, y_pred, ax=ax)
When to use it
How to calculate
It is a curve that combines precision (PPV) and Recall
(TPR) in a single visualization. For every threshold, you
calculate PPV and TPR and plot it. The higher on y-axis
your curve is the better your model performance.
Explanation
When you want to communicate precision/recall decision to
other stakeholders and want to choose the threshold that
fits the business problem.
When your data is heavily imbalanced.
When you care more about positive than negative class.
ROC curve
from scikitplot.metrics import plot_roc
fig, ax = plt.subplots()
plot_roc(y_true, y_pred, ax=ax)
When to use it
How to calculate
You should use it when you ultimately care about ranking
predictions.
You should not use it when your data is heavily
imbalanced.
You should use it when you care equally about positive
and negative class.
Explanation
It is a chart that visualizes the tradeoff between true
positive rate (TPR) and false positive rate (FPR). Basically,
for every threshold, we calculate TPR and FPR and plot it on
one chart.

More Related Content

What's hot

Machine Learning Performance Evaluation: Tips and Pitfalls - Jose Hernandez O...
Machine Learning Performance Evaluation: Tips and Pitfalls - Jose Hernandez O...Machine Learning Performance Evaluation: Tips and Pitfalls - Jose Hernandez O...
Machine Learning Performance Evaluation: Tips and Pitfalls - Jose Hernandez O...PAPIs.io
 
Machine learning algorithms
Machine learning algorithmsMachine learning algorithms
Machine learning algorithmsShalitha Suranga
 
Machine learning session5(logistic regression)
Machine learning   session5(logistic regression)Machine learning   session5(logistic regression)
Machine learning session5(logistic regression)Abhimanyu Dwivedi
 
Machine learning Mind Map
Machine learning Mind MapMachine learning Mind Map
Machine learning Mind MapAshish Patel
 
Machine learning algorithms and business use cases
Machine learning algorithms and business use casesMachine learning algorithms and business use cases
Machine learning algorithms and business use casesSridhar Ratakonda
 
Deep learning MindMap
Deep learning MindMapDeep learning MindMap
Deep learning MindMapAshish Patel
 
Automation of IT Ticket Automation using NLP and Deep Learning
Automation of IT Ticket Automation using NLP and Deep LearningAutomation of IT Ticket Automation using NLP and Deep Learning
Automation of IT Ticket Automation using NLP and Deep LearningPranov Mishra
 
An Introduction to Simulation in the Social Sciences
An Introduction to Simulation in the Social SciencesAn Introduction to Simulation in the Social Sciences
An Introduction to Simulation in the Social Sciencesfsmart01
 
A Novel Methodology to Implement Optimization Algorithms in Machine Learning
A Novel Methodology to Implement Optimization Algorithms in Machine LearningA Novel Methodology to Implement Optimization Algorithms in Machine Learning
A Novel Methodology to Implement Optimization Algorithms in Machine LearningVenkata Karthik Gullapalli
 
Data Analysis project "TITANIC SURVIVAL"
Data Analysis project "TITANIC SURVIVAL"Data Analysis project "TITANIC SURVIVAL"
Data Analysis project "TITANIC SURVIVAL"LefterisMitsimponas
 
House Sale Price Prediction
House Sale Price PredictionHouse Sale Price Prediction
House Sale Price Predictionsriram30691
 
WEKA: Credibility Evaluating Whats Been Learned
WEKA: Credibility Evaluating Whats Been LearnedWEKA: Credibility Evaluating Whats Been Learned
WEKA: Credibility Evaluating Whats Been LearnedDataminingTools Inc
 
Linear Regression Ex
Linear Regression ExLinear Regression Ex
Linear Regression Exmailund
 

What's hot (15)

Machine Learning Performance Evaluation: Tips and Pitfalls - Jose Hernandez O...
Machine Learning Performance Evaluation: Tips and Pitfalls - Jose Hernandez O...Machine Learning Performance Evaluation: Tips and Pitfalls - Jose Hernandez O...
Machine Learning Performance Evaluation: Tips and Pitfalls - Jose Hernandez O...
 
Machine learning algorithms
Machine learning algorithmsMachine learning algorithms
Machine learning algorithms
 
Machine learning session5(logistic regression)
Machine learning   session5(logistic regression)Machine learning   session5(logistic regression)
Machine learning session5(logistic regression)
 
Machine learning Mind Map
Machine learning Mind MapMachine learning Mind Map
Machine learning Mind Map
 
Machine learning algorithms and business use cases
Machine learning algorithms and business use casesMachine learning algorithms and business use cases
Machine learning algorithms and business use cases
 
Deep learning MindMap
Deep learning MindMapDeep learning MindMap
Deep learning MindMap
 
Automation of IT Ticket Automation using NLP and Deep Learning
Automation of IT Ticket Automation using NLP and Deep LearningAutomation of IT Ticket Automation using NLP and Deep Learning
Automation of IT Ticket Automation using NLP and Deep Learning
 
An Introduction to Simulation in the Social Sciences
An Introduction to Simulation in the Social SciencesAn Introduction to Simulation in the Social Sciences
An Introduction to Simulation in the Social Sciences
 
A Novel Methodology to Implement Optimization Algorithms in Machine Learning
A Novel Methodology to Implement Optimization Algorithms in Machine LearningA Novel Methodology to Implement Optimization Algorithms in Machine Learning
A Novel Methodology to Implement Optimization Algorithms in Machine Learning
 
Housing price prediction
Housing price predictionHousing price prediction
Housing price prediction
 
Data Analysis project "TITANIC SURVIVAL"
Data Analysis project "TITANIC SURVIVAL"Data Analysis project "TITANIC SURVIVAL"
Data Analysis project "TITANIC SURVIVAL"
 
House Sale Price Prediction
House Sale Price PredictionHouse Sale Price Prediction
House Sale Price Prediction
 
numerical analysis
numerical analysisnumerical analysis
numerical analysis
 
WEKA: Credibility Evaluating Whats Been Learned
WEKA: Credibility Evaluating Whats Been LearnedWEKA: Credibility Evaluating Whats Been Learned
WEKA: Credibility Evaluating Whats Been Learned
 
Linear Regression Ex
Linear Regression ExLinear Regression Ex
Linear Regression Ex
 

Similar to Binary classification metrics_cheatsheet

Supervised learning
Supervised learningSupervised learning
Supervised learningJohnson Ubah
 
Supervised Learning.pdf
Supervised Learning.pdfSupervised Learning.pdf
Supervised Learning.pdfgadissaassefa
 
Evaluation metrics for binary classification - the ultimate guide
Evaluation metrics for binary classification - the ultimate guideEvaluation metrics for binary classification - the ultimate guide
Evaluation metrics for binary classification - the ultimate guideneptune.ml
 
Important Classification and Regression Metrics.pptx
Important Classification and Regression Metrics.pptxImportant Classification and Regression Metrics.pptx
Important Classification and Regression Metrics.pptxChode Amarnath
 
Essentials of machine learning algorithms
Essentials of machine learning algorithmsEssentials of machine learning algorithms
Essentials of machine learning algorithmsArunangsu Sahu
 
Insurance Optimization
Insurance OptimizationInsurance Optimization
Insurance OptimizationAlbert Chu
 
Chapter 02-logistic regression
Chapter 02-logistic regressionChapter 02-logistic regression
Chapter 02-logistic regressionRaman Kannan
 
WEKA:Credibility Evaluating Whats Been Learned
WEKA:Credibility Evaluating Whats Been LearnedWEKA:Credibility Evaluating Whats Been Learned
WEKA:Credibility Evaluating Whats Been Learnedweka Content
 
Predict Backorder on a supply chain data for an Organization
Predict Backorder on a supply chain data for an OrganizationPredict Backorder on a supply chain data for an Organization
Predict Backorder on a supply chain data for an OrganizationPiyush Srivastava
 
Assessing Model Performance - Beginner's Guide
Assessing Model Performance - Beginner's GuideAssessing Model Performance - Beginner's Guide
Assessing Model Performance - Beginner's GuideMegan Verbakel
 
Workbook Project
Workbook ProjectWorkbook Project
Workbook ProjectBrian Ryan
 
Classification Chapter 3 Hands on Machine Learning Book
Classification Chapter 3 Hands on Machine Learning BookClassification Chapter 3 Hands on Machine Learning Book
Classification Chapter 3 Hands on Machine Learning BookUniversity of Ulsan
 
How to understand and implement regression analysis
How to understand and implement regression analysisHow to understand and implement regression analysis
How to understand and implement regression analysisClaireWhittaker5
 
INTRODUCTION TO BOOSTING.ppt
INTRODUCTION TO BOOSTING.pptINTRODUCTION TO BOOSTING.ppt
INTRODUCTION TO BOOSTING.pptBharatDaiyaBharat
 
Top 100+ Google Data Science Interview Questions.pdf
Top 100+ Google Data Science Interview Questions.pdfTop 100+ Google Data Science Interview Questions.pdf
Top 100+ Google Data Science Interview Questions.pdfDatacademy.ai
 
Using machine learning in anti money laundering part 2
Using machine learning in anti money laundering   part 2Using machine learning in anti money laundering   part 2
Using machine learning in anti money laundering part 2Naveen Grover
 
PREDICTING BANKRUPTCY USING MACHINE LEARNING ALGORITHMS
PREDICTING BANKRUPTCY USING MACHINE LEARNING ALGORITHMSPREDICTING BANKRUPTCY USING MACHINE LEARNING ALGORITHMS
PREDICTING BANKRUPTCY USING MACHINE LEARNING ALGORITHMSIJCI JOURNAL
 
Machine Learning.pdf
Machine Learning.pdfMachine Learning.pdf
Machine Learning.pdfBeyaNasr1
 

Similar to Binary classification metrics_cheatsheet (20)

Supervised learning
Supervised learningSupervised learning
Supervised learning
 
Supervised Learning.pdf
Supervised Learning.pdfSupervised Learning.pdf
Supervised Learning.pdf
 
Evaluation metrics for binary classification - the ultimate guide
Evaluation metrics for binary classification - the ultimate guideEvaluation metrics for binary classification - the ultimate guide
Evaluation metrics for binary classification - the ultimate guide
 
Important Classification and Regression Metrics.pptx
Important Classification and Regression Metrics.pptxImportant Classification and Regression Metrics.pptx
Important Classification and Regression Metrics.pptx
 
Essentials of machine learning algorithms
Essentials of machine learning algorithmsEssentials of machine learning algorithms
Essentials of machine learning algorithms
 
Insurance Optimization
Insurance OptimizationInsurance Optimization
Insurance Optimization
 
Chapter 02-logistic regression
Chapter 02-logistic regressionChapter 02-logistic regression
Chapter 02-logistic regression
 
WEKA:Credibility Evaluating Whats Been Learned
WEKA:Credibility Evaluating Whats Been LearnedWEKA:Credibility Evaluating Whats Been Learned
WEKA:Credibility Evaluating Whats Been Learned
 
Predict Backorder on a supply chain data for an Organization
Predict Backorder on a supply chain data for an OrganizationPredict Backorder on a supply chain data for an Organization
Predict Backorder on a supply chain data for an Organization
 
Assessing Model Performance - Beginner's Guide
Assessing Model Performance - Beginner's GuideAssessing Model Performance - Beginner's Guide
Assessing Model Performance - Beginner's Guide
 
Workbook Project
Workbook ProjectWorkbook Project
Workbook Project
 
working with python
working with pythonworking with python
working with python
 
Classification Chapter 3 Hands on Machine Learning Book
Classification Chapter 3 Hands on Machine Learning BookClassification Chapter 3 Hands on Machine Learning Book
Classification Chapter 3 Hands on Machine Learning Book
 
How to understand and implement regression analysis
How to understand and implement regression analysisHow to understand and implement regression analysis
How to understand and implement regression analysis
 
INTRODUCTION TO BOOSTING.ppt
INTRODUCTION TO BOOSTING.pptINTRODUCTION TO BOOSTING.ppt
INTRODUCTION TO BOOSTING.ppt
 
Top 100+ Google Data Science Interview Questions.pdf
Top 100+ Google Data Science Interview Questions.pdfTop 100+ Google Data Science Interview Questions.pdf
Top 100+ Google Data Science Interview Questions.pdf
 
Using machine learning in anti money laundering part 2
Using machine learning in anti money laundering   part 2Using machine learning in anti money laundering   part 2
Using machine learning in anti money laundering part 2
 
Telecom Churn Analysis
Telecom Churn AnalysisTelecom Churn Analysis
Telecom Churn Analysis
 
PREDICTING BANKRUPTCY USING MACHINE LEARNING ALGORITHMS
PREDICTING BANKRUPTCY USING MACHINE LEARNING ALGORITHMSPREDICTING BANKRUPTCY USING MACHINE LEARNING ALGORITHMS
PREDICTING BANKRUPTCY USING MACHINE LEARNING ALGORITHMS
 
Machine Learning.pdf
Machine Learning.pdfMachine Learning.pdf
Machine Learning.pdf
 

More from Jakub Czakon

Evaluation metrics for binary classification: The ultimate guide
Evaluation metrics for binary classification: The ultimate guideEvaluation metrics for binary classification: The ultimate guide
Evaluation metrics for binary classification: The ultimate guideJakub Czakon
 
Data science is not Software Development and how Experiment Management can ma...
Data science is not Software Development and how Experiment Management can ma...Data science is not Software Development and how Experiment Management can ma...
Data science is not Software Development and how Experiment Management can ma...Jakub Czakon
 
Hyperparameter optimization landscape Berlin ML Group meetup 8/2019
Hyperparameter optimization landscape Berlin ML Group meetup 8/2019Hyperparameter optimization landscape Berlin ML Group meetup 8/2019
Hyperparameter optimization landscape Berlin ML Group meetup 8/2019Jakub Czakon
 
How to track and organize your machine learning experimentations
How to track and organize your machine learning experimentationsHow to track and organize your machine learning experimentations
How to track and organize your machine learning experimentationsJakub Czakon
 
How to track and organize your machine learning experimentations
How to track and organize your machine learning experimentationsHow to track and organize your machine learning experimentations
How to track and organize your machine learning experimentationsJakub Czakon
 
Dss2019 hyperparameter optimization landscape
Dss2019 hyperparameter optimization landscapeDss2019 hyperparameter optimization landscape
Dss2019 hyperparameter optimization landscapeJakub Czakon
 

More from Jakub Czakon (6)

Evaluation metrics for binary classification: The ultimate guide
Evaluation metrics for binary classification: The ultimate guideEvaluation metrics for binary classification: The ultimate guide
Evaluation metrics for binary classification: The ultimate guide
 
Data science is not Software Development and how Experiment Management can ma...
Data science is not Software Development and how Experiment Management can ma...Data science is not Software Development and how Experiment Management can ma...
Data science is not Software Development and how Experiment Management can ma...
 
Hyperparameter optimization landscape Berlin ML Group meetup 8/2019
Hyperparameter optimization landscape Berlin ML Group meetup 8/2019Hyperparameter optimization landscape Berlin ML Group meetup 8/2019
Hyperparameter optimization landscape Berlin ML Group meetup 8/2019
 
How to track and organize your machine learning experimentations
How to track and organize your machine learning experimentationsHow to track and organize your machine learning experimentations
How to track and organize your machine learning experimentations
 
How to track and organize your machine learning experimentations
How to track and organize your machine learning experimentationsHow to track and organize your machine learning experimentations
How to track and organize your machine learning experimentations
 
Dss2019 hyperparameter optimization landscape
Dss2019 hyperparameter optimization landscapeDss2019 hyperparameter optimization landscape
Dss2019 hyperparameter optimization landscape
 

Recently uploaded

Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsHyundai Motor Group
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 

Recently uploaded (20)

Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 

Binary classification metrics_cheatsheet

  • 1. Blogpost: Binary classification metrics neptune.ml We bring collaboration to data science projects. based on blog post 22 Evaluation Metrics for Binary Classification (And When to Use Them) @NeptuneML /neptune-ml contact@neptune.ml 1/7 False positive rate | Type-I error from sklearn.metrics import confusion_matrix tn, fp, fn, tp = confusion_matrix(y_true, y_pred_class).ravel() false_positive_rate = fp / (fp + tn) When to use it How to calculate Formula Usually, it is not used alone but rather with some other metric, If the cost of dealing with an alert is high you should consider increasing the threshold to get fewer alerts. Explanation How many false alerts your model raises False negative rate | Type-II error from sklearn.metrics import confusion_matrix tn, fp, fn, tp = confusion_matrix(y_true, y_pred_class).ravel() false_negative_rate = fn / (tp + fn) When to use it How to calculate Formula Usually, it is not used alone but rather with some other metric, If the cost of letting the fraudulent transactions through is high and the value you get from the users isn’t you can consider focusing on this number. Explanation How often your model misses trully fraudulent transactions. False discovery rate from sklearn.metrics import confusion_matrix tn, fp, fn, tp = confusion_matrix(y_true, y_pred_class).ravel() false_discovery_rate = fp / (tp + fp) When to use it How to calculate Formula - Explanation Reversed precision (1- precision). Negative predictive value from sklearn.metrics import confusion_matrix tn, fp, fn, tp = confusion_matrix(y_true, y_pred_class).ravel() negative_predictive_value = tn / (tn + fn) When to use it How to calculate Formula Usually, you don’t use it alone but rather as an auxiliary metric, When we care about high precision on negative predictions. For example, imagine we really don’t want to have any additional process for screening the transactions predicted as clean. In that case, we may want to make sure that our negative predictive value is high. Explanation Think of it as precision for negative class. True negative rate | Specificity from sklearn.metrics import confusion_matrix tn, fp, fn, tp = confusion_matrix(y_true, y_pred_class).ravel() true_negative_rate = tn / (tn + fp) When to use it How to calculate Formula Usually, you don’t use it alone but rather as an auxiliary metric, When you really want to be sure that you are right when you say something is safe. A typical example would be a doctor telling a patient “you are healthy”. Making a mistake here and telling a sick person they are safe and can go home is something you may want to avoid. Explanation Think of it as recall for negative class.
  • 2. Blogpost: Binary classification metrics neptune.ml We bring collaboration to data science projects. based on blog post 22 Evaluation Metrics for Binary Classification (And When to Use Them) @NeptuneML /neptune-ml contact@neptune.ml 2/7 Log loss from sklearn.metrics import log_loss loss = log_loss(y_true, y_pred) When to use it How to calculate Formula Usually used as objective not metric. Explanation Averaged difference between ground truth and logarithm of predicted score for every observation. Heavily penalizes when the model is confident about something yet wrong. Brier score (loss) from sklearn.metrics import brier_score_loss brier = brier_score_loss(y_true, y_pred[:,1]) When to use it How to calculate Formula When you care about calibrated probabilities. Explanation Mean squared error between ground truth and predicted score. ROC AUC score from sklearn.metrics import roc_auc_score roc_auc = roc_auc_score(y_true, y_pred_pos) When to use it How to calculate Formula You should use it when you ultimately care about ranking predictions. You should not use it when your data is heavily imbalanced. You should use it when you care equally about positive and negative class. Explanation Rank correlation between predictions and target. Tells you how good at ranking predictions (positive over negative) your model is. Precison-Recall AUC | Average precision from sklearn.metrics import average_precision_score avg_precision = average_precision_score(y_true, y_pred) When to use it How to calculate Formula When you want to communicate precision/recall decision to other stakeholders and want to choose the threshold that fits the business problem. When your data is heavily imbalanced. When you care more about positive than negative class. Explanation What is the average precision over all recall values. Area under ROC curve Area under Precision-Recall curve
  • 3. Blogpost: Binary classification metrics neptune.ml We bring collaboration to data science projects. based on blog post 22 Evaluation Metrics for Binary Classification (And When to Use Them) @NeptuneML /neptune-ml contact@neptune.ml 3/7 F beta from sklearn.metrics import fbeta_score fbeta_score(y_true, y_pred_class, beta) When to use it How to calculate Formula When you want to combine precision and recall in one metric and would like to be able to adjust how much focus you put on one or the other. Explanation Geometrically averaged precision and recall with a weight beta. ( 0<beta<1 favours precision; beta>1 favours recall ) F1 score from sklearn.metrics import fbeta_score fbeta_score(y_true, y_pred_class, beta=1) When to use it How to calculate Formula Pretty much in every binary classification problem. It is my go-to metric when working on those problems. Explanation Geometric average of precision and recall. F2 score from sklearn.metrics import fbeta_score fbeta_score(y_true, y_pred_class, beta=2) When to use it How to calculate Formula When recalling positive observations (fraudulent transactions) is more important than being precise about it but you still want to have a nice and simple metric that combines precision and recall. Explanation Geometric average of precison and recall with twice as much weight on recall. Cohen Kappa from sklearn.metrics import cohen_kappa_score cohen_kappa_score(y_true, y_pred_class) When to use it How to calculate Formula When you want to combine precision and recall in one metric and would like to be able to adjust how much focus you put on one or the other. Explanation How much better is your model over the random classifier that predicts based on class frequencies. Matthews correlation coefficient from sklearn.metrics import matthews_corrcoef matthews_corrcoef(y_true, y_pred_class) When to use it How to calculate Formula Pretty much in every binary classification problem. It is my go-to metric when working on those problems. Explanation Correlation between predicted classes and ground truth. Kolmogorov-Smirnov statistics from scikitplot.helpers import binary_ks_curve res = binary_ks_curve(y_true, y_pred[:, 1]) ks_stat = res[3] When to use it How to calculate Formula when your problem is about sorting/ prioritizing the most relevant observations and you care equally about positive and negative class. Explanation It helps to assess the separation between prediction distributions for positive and negative class. Max distance between KS curves
  • 4. Blogpost: Binary classification metrics neptune.ml We bring collaboration to data science projects. based on blog post 22 Evaluation Metrics for Binary Classification (And When to Use Them) @NeptuneML /neptune-ml contact@neptune.ml 4/7 Positive predictive value | Precision from sklearn.metrics import precision_score precision_score(y_true, y_pred_class) When to use it How to calculate Formula Again, it usually doesn’t make sense to use it alone but rather coupled with other metrics like recall. When raising false alerts is costly and you really want all the positive predictions to be worth looking at you should optimize for precision. Explanation Make sure that people that go to prison are guilty. True positive rate | Recall | Sensitivity from sklearn.metrics import recall_score recall_score(y_true, y_pred_class) When to use it How to calculate Formula Usually, you will not use it alone but rather coupled with other metrics like precision. That being said recall is a go-to metric, when you really care about catching all fraudulent transactions even at a cost of false alerts. Potentially it is cheap for you to process those alerts and very expensive when the transaction goes unseen. Explanation Put all guilty in prison. Accuracy from sklearn.metrics import accuracy_score accuracy_score(y_true, y_pred_class) When to use it How to calculate Formula When your problem is balanced using accuracy is usually a good start. When every class is equally important to you. Explanation How good at classyfying both positive and negative cases your model is.
  • 5. Blogpost: Binary classification metrics neptune.ml We bring collaboration to data science projects. based on blog post 22 Evaluation Metrics for Binary Classification (And When to Use Them) @NeptuneML /neptune-ml contact@neptune.ml 5/7 Confusion Matrix from sklearn.metrics import confusion_matrix cm = confusion_matrix(y_true, y_pred_class) tn, fp, fn, tp = cm.ravel() When to use it How to calculate Pretty much always. Get the gist of model imbalance and where the predictions fall. Explanation Table that contains true negative (tn), false positive (fp), false negative (fn), and true positive (tp) predictions. Cumulative gain chart from scikitplot.metrics import plot_cumulative_gain fig, ax = plt.subplots() plot_cumulative_gain(y_true, y_pred, ax=ax) When to use it How to calculate Whenever you want to use your model to choose the best customers/transactions to target by sorting all predictions you should consider using cumulative gain charts. Explanation In simple words, it helps you gauge how much you gain by using your model over a random model for a given fraction of top scored predictions.
  • 6. Blogpost: Binary classification metrics neptune.ml We bring collaboration to data science projects. based on blog post 22 Evaluation Metrics for Binary Classification (And When to Use Them) @NeptuneML /neptune-ml contact@neptune.ml 6/7 Kolmogorov-Smirnov chart from scikitplot.metrics import plot_ks_statistic fig, ax = plt.subplots() plot_ks_statistic(y_true, y_pred, ax=ax) When to use it How to calculate When your problem is about sorting/prioritizing the most relevant observations and you care equally about positive and negative class. Explanation It helps to assess the separation between prediction distributions for positive and negative class. So it works similarly to Cumulative gain chart but instead of just looking at positive class it looks at the separation between positive and negative class. Lift curve from scikitplot.metrics import plot_lift_curve fig, ax = plt.subplots() plot_lift_curve(y_true, y_pred, ax=ax) When to use it How to calculate Whenever you want to use your model to choose the best customers/transactions to target by sorting all predictions you should consider using a lift curve. Explanation In simple words, it helps you gauge how much you gain by using your model over a random model for a given fraction of top scored predictions. It tells you how much better your model is than a random model for the given percentile of top scored predictions.
  • 7. Blogpost: Binary classification metrics neptune.ml We bring collaboration to data science projects. based on blog post 22 Evaluation Metrics for Binary Classification (And When to Use Them) @NeptuneML /neptune-ml contact@neptune.ml 7/7 Precision-Recall curve from scikitplot.metrics import plot_precision_recall fig, ax = plt.subplots() plot_precision_recall(y_true, y_pred, ax=ax) When to use it How to calculate It is a curve that combines precision (PPV) and Recall (TPR) in a single visualization. For every threshold, you calculate PPV and TPR and plot it. The higher on y-axis your curve is the better your model performance. Explanation When you want to communicate precision/recall decision to other stakeholders and want to choose the threshold that fits the business problem. When your data is heavily imbalanced. When you care more about positive than negative class. ROC curve from scikitplot.metrics import plot_roc fig, ax = plt.subplots() plot_roc(y_true, y_pred, ax=ax) When to use it How to calculate You should use it when you ultimately care about ranking predictions. You should not use it when your data is heavily imbalanced. You should use it when you care equally about positive and negative class. Explanation It is a chart that visualizes the tradeoff between true positive rate (TPR) and false positive rate (FPR). Basically, for every threshold, we calculate TPR and FPR and plot it on one chart.