This document discusses model calibration and how to improve probability estimates from classifiers. It explains that calibration is important when probabilities matter, such as for high-risk applications or combining models. While calibration and accuracy are related, a model can have high accuracy but poor calibration. The document recommends using reliability plots to evaluate calibration by grouping predictions into bins and comparing predicted probabilities to observed fractions of positives. It also introduces calibration methods like Platt scaling and isotonic regression to post-process models and improve probability estimates.
2. A Tale of Two Classifiers
Model A: 90% accuracy, 0.91 confidence in each prediction
Model B: 90% accuracy, 0.99 confidence in each prediction
Which is better? (Both are equally accurate, but Model A's 0.91 confidence roughly matches its 90% accuracy, while Model B is overconfident.)
3. Calibration:
Post-processing a model to improve its probability estimates.
Among the samples that received a 0.8 probability of being positive,
we expect 80% to actually be positive.
4. Intuition
A model is perfectly calibrated if, for any p, a prediction of a class with confidence p is correct 100·p percent of the time.
[Figure: reliability diagram — x-axis: probability, y-axis: fraction of positives]
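In symbols, for a binary classifier with predicted probability \hat{p}(X) (notation mine, not the deck's), perfect calibration reads:

P\big(Y = 1 \mid \hat{p}(X) = p\big) = p \quad \text{for all } p \in [0, 1]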
5. Agenda
● Why is that important?
● How do we know if our model is calibrated?
● Calibration of different models
● Calibration methods:
○ Platt Scaling
○ Isotonic Regression
6. Why is calibration important?
Calibration is important only if the probabilities themselves are important.
● High-risk applications
● Combining with other probabilistic models, thresholding
● Improving our models by inspecting:
○ mistakes made with high probability
○ true labels assigned low probability
7. Calibration != accuracy
● A well-calibrated model can have low accuracy
○ Example: predicting 0.5 for every toss of a fair coin is perfectly calibrated, yet no better than chance
● Models can have high accuracy and bad calibration, e.g. correct but systematically overconfident predictions
10. Reliability Plots
1. Group predictions into bins by predicted probability
2. Compute the fraction of positives per bin
3. Compute the confidence per bin: the mean predicted probability of its samples
Plot fraction of positives against confidence; a perfectly calibrated model lies on the diagonal (see the sketch below).
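A minimal sketch of a reliability plot using scikit-learn's calibration_curve, assuming y_test and p_test hold your true labels and predicted probabilities:

from sklearn.calibration import calibration_curve
import matplotlib.pyplot as plt

# Per-bin fraction of positives (prob_true) and mean confidence (prob_pred)
prob_true, prob_pred = calibration_curve(y_test, p_test, n_bins=10)

plt.plot(prob_pred, prob_true, marker="o", label="model")
plt.plot([0, 1], [0, 1], linestyle="--", label="perfectly calibrated")
plt.xlabel("Confidence (mean predicted probability)")
plt.ylabel("Fraction of positives")
plt.legend()
plt.show()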
12. What If I Want a Number?
● Expected Calibration Error (ECE): use a binned approximation
● Maximum Calibration Error (MCE): use a binned approximation
(formulas below)
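With the same bins B_1, ..., B_M over n samples as in the reliability plot, the standard binned estimators (these exact formulas are my addition, not the deck's) are:

\mathrm{ECE} = \sum_{m=1}^{M} \frac{|B_m|}{n} \, \big| \mathrm{acc}(B_m) - \mathrm{conf}(B_m) \big|
\qquad
\mathrm{MCE} = \max_{m} \big| \mathrm{acc}(B_m) - \mathrm{conf}(B_m) \big|

where acc(B_m) is the fraction of positives in bin m and conf(B_m) is its mean predicted probability.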
13. Calibration of Different Models
Predicting Good Probabilities with Supervised Learning,
Niculescu-Mizil & Caruana
15. Calibration of Different Classifiers
[Figure: reliability curves for Boosted Trees, SVM, Neural Network*, and Logistic Regression]
Boosted trees and SVMs show sigmoid-shaped, distorted curves; logistic regression and neural networks* stay close to the diagonal.
* before 2006
17. Platt Scaling
● Build a new data set on held-out data: the model's uncalibrated outputs paired with the true labels
● Fit a logistic regression (a sigmoid) on the output of the model

from sklearn.linear_model import LogisticRegression as LR

# sklearn expects a 2-D feature matrix, so reshape the 1-D scores
lr = LR()
lr.fit(p_holdout.reshape(-1, 1), y_holdout)
calibrated_p = lr.predict_proba(p_test.reshape(-1, 1))[:, 1]
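Under the hood, Platt scaling fits two scalar parameters A and B so that the calibrated probability is a sigmoid of the model's raw score f(x):

P\big(y = 1 \mid f(x)\big) = \frac{1}{1 + \exp\big(A\,f(x) + B\big)}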
18. Isotonic Regression
Learn a monotonic piecewise constant function f to transform
uncalibrated outputs.
from sklearn.isotonic import IsotonicRegression as IR

# Fit a monotonic, piecewise-constant map; 'clip' avoids NaNs for
# test scores outside the range seen during fitting
ir = IR(out_of_bounds='clip')
ir.fit(p_holdout, y_holdout)
calibrated_p = ir.transform(p_test)
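Both methods are also exposed through scikit-learn's CalibratedClassifierCV wrapper (method='sigmoid' for Platt scaling, method='isotonic' for isotonic regression). A minimal sketch, assuming X_train, y_train and X_test are your own arrays:

from sklearn.calibration import CalibratedClassifierCV
from sklearn.svm import LinearSVC

# Calibrate an SVM's scores with isotonic regression,
# using internal 5-fold cross-validation to fit the calibrator
calibrated_clf = CalibratedClassifierCV(LinearSVC(), method='isotonic', cv=5)
calibrated_clf.fit(X_train, y_train)
calibrated_p = calibrated_clf.predict_proba(X_test)[:, 1]

Isotonic regression is more flexible than Platt scaling but tends to overfit when the calibration set is small.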
19. Summary
● Calibration is important if you care about probabilities
● Reliability plots (and ECE/MCE) show whether a model is calibrated
● Platt scaling and isotonic regression can calibrate a model post hoc
21. Is Logistic Regression Calibrated?
Hosmer-Lemeshow (HL) goodness-of-fit statistic for logistic regression (LR) models:
https://statisticalhorizons.com/hosmer-lemeshow
https://statisticalhorizons.com/alternatives-to-the-hosmer-lemeshow-test
22. Probabilistic classification
Model output: probabilities over labels.
Calibrated model: different definitions
● Among the samples that received a 0.8 probability of being 1, we expect 80% to be 1 (this is what HL guarantees).
● If we get a 0.8 probability of being 1, we want 80% of similar examples to be 1.
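In symbols (notation mine): the first definition conditions only on the predicted value, the second on the input itself:

P\big(Y = 1 \mid \hat{p}(X) = 0.8\big) = 0.8
\qquad \text{vs.} \qquad
P\big(Y = 1 \mid X = x\big) = \hat{p}(x)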