SlideShare a Scribd company logo
1 of 22
Download to read offline
Model Calibration
Inbar Naor
Data Scientist @ Taboola
A Tale of Two Classifiers
Model A: 90% accuracy, 0.91 confidence in each prediction
Model B: 90% accuracy, 0.99 confidence in each prediction
Which is better?
Calibration:
Post processing a model to improve probability estimate
Among the samples that got a 0.8 probability of being positive,
we expect 80% to be positive.
Intuition
A model is perfectly calibrated if,
for any p, a prediction of a class
with confidence p is correct 100*p
percent of the time.
Probability
FractionofPositives
Agenda
● Why is that important?
● How do we know if our model is calibrated?
● Calibration of different models
● Calibration methods:
○ Platt Scaling
○ Isotonic Regression
Why is calibration important?
Calibration is important only if probabilities are important.
● High risk applications
● Combining with other probability models, thresholding
● Improving our models:
○ Mistakes with high probabilities
○ True labels with low probabilities
Calibration != accuracy
● Well calibrated model can have low accuracy
○ Example: random coin
● Models can have high accuracy and bad calibration
How Do We know If Our
Model Is Calibrated?
Intuition
A model is perfectly calibrated if,
for any p, a prediction of a class
with confidence p is correct 100*p
percent of the time.
Probability
FractionofPositives
Reliability Plots
1. Group predictions into bins
2. Fraction of positives per bin
3. Confidence per bin:
Reliability Plots
1. Group predictions into bins
2. Fraction of positives per bin
3. Confidence per bin:
What If I want a Number?
Expected Calibration Error Maximum Calibration Error
Use approximation Use approximation
Calibration Of Different MOdels
Predicting Good Probabilities with Supervised Learning,
Niculescu-Mizil & Caruana
Boosted Trees
● Sigmoidal shape reliability plots
● Values pushed away from 0-1
Calibration OF DiffereNt ClassifiersBoostedTreesSVM
Neural
Network*
Logistic
Regression
* before 2006
Platt Scaling
● New data set
● Fit logistic regression on the output of the model
from sklearn.linear_model import LogisticRegression as LR
lr = LR()
lr.fit(p_holdout, y_holdout)
calibrated_p = lr.predict_proba(p_test)[:, 1]
Isotonic Regression
Learn a monotonic piecewise constant function f to transform
uncalibrated outputs.
from sklearn.isotonic import IsotonicRegression as IR
ir = IR()
ir.fit(p_holdout, y_holdout)
calibrated_p = ir.transform(p_test)
SuMMary
● Calibration is important if you care about probabilities
● Reliability plots
● Different methods for calibration
Questions?
Is Logistic Regression Calibrated??
Hosmer-Lemeshow (HL) goodnessof-fit statistic for logistic
regression (LR) models
https://statisticalhorizons.com/hosmer-lemeshow
https://statisticalhorizons.com/alternatives-to-the-hosmer-l
emeshow-test
Probabilistic classification
Model output: probabilities over labels.
Calibrated model: different definitions
among the samples that got 0.8 to be 1, we expect 80% to be
1. (HL guarantees)
If we get a 0.8 probability to be 1 we want 80% of similar
examples to be 1.

More Related Content

What's hot

Linear Regression Analysis | Linear Regression in Python | Machine Learning A...
Linear Regression Analysis | Linear Regression in Python | Machine Learning A...Linear Regression Analysis | Linear Regression in Python | Machine Learning A...
Linear Regression Analysis | Linear Regression in Python | Machine Learning A...Simplilearn
 
Artificial Neural Networks - ANN
Artificial Neural Networks - ANNArtificial Neural Networks - ANN
Artificial Neural Networks - ANNMohamed Talaat
 
Machine Learning Explanations: LIME framework
Machine Learning Explanations: LIME framework Machine Learning Explanations: LIME framework
Machine Learning Explanations: LIME framework Deep Learning Italia
 
Feature selection
Feature selectionFeature selection
Feature selectionDong Guo
 
Test for equal variances
Test for equal variancesTest for equal variances
Test for equal variancesJohn Smith
 
Assessing Model Performance - Beginner's Guide
Assessing Model Performance - Beginner's GuideAssessing Model Performance - Beginner's Guide
Assessing Model Performance - Beginner's GuideMegan Verbakel
 
Artificial Neural Network Lecture 6- Associative Memories & Discrete Hopfield...
Artificial Neural Network Lecture 6- Associative Memories & Discrete Hopfield...Artificial Neural Network Lecture 6- Associative Memories & Discrete Hopfield...
Artificial Neural Network Lecture 6- Associative Memories & Discrete Hopfield...Mohammed Bennamoun
 
Regularization in deep learning
Regularization in deep learningRegularization in deep learning
Regularization in deep learningKien Le
 
Naive Bayes Classifier in Python | Naive Bayes Algorithm | Machine Learning A...
Naive Bayes Classifier in Python | Naive Bayes Algorithm | Machine Learning A...Naive Bayes Classifier in Python | Naive Bayes Algorithm | Machine Learning A...
Naive Bayes Classifier in Python | Naive Bayes Algorithm | Machine Learning A...Edureka!
 
An overview of Bayesian testing
An overview of Bayesian testingAn overview of Bayesian testing
An overview of Bayesian testingChristian Robert
 
application of correlation
application of correlationapplication of correlation
application of correlationsudhanyavinod
 
Different kind of distance and Statistical Distance
Different kind of distance and Statistical DistanceDifferent kind of distance and Statistical Distance
Different kind of distance and Statistical DistanceKhulna University
 
Matrix factorization
Matrix factorizationMatrix factorization
Matrix factorizationLuis Serrano
 

What's hot (20)

Linear Regression Analysis | Linear Regression in Python | Machine Learning A...
Linear Regression Analysis | Linear Regression in Python | Machine Learning A...Linear Regression Analysis | Linear Regression in Python | Machine Learning A...
Linear Regression Analysis | Linear Regression in Python | Machine Learning A...
 
Artificial Neural Networks - ANN
Artificial Neural Networks - ANNArtificial Neural Networks - ANN
Artificial Neural Networks - ANN
 
AR model
AR modelAR model
AR model
 
Logistic regression
Logistic regressionLogistic regression
Logistic regression
 
Machine Learning Explanations: LIME framework
Machine Learning Explanations: LIME framework Machine Learning Explanations: LIME framework
Machine Learning Explanations: LIME framework
 
Feature selection
Feature selectionFeature selection
Feature selection
 
Test for equal variances
Test for equal variancesTest for equal variances
Test for equal variances
 
Predicting the e-commerce churn
Predicting the e-commerce churnPredicting the e-commerce churn
Predicting the e-commerce churn
 
Assessing Model Performance - Beginner's Guide
Assessing Model Performance - Beginner's GuideAssessing Model Performance - Beginner's Guide
Assessing Model Performance - Beginner's Guide
 
Artificial Neural Network Lecture 6- Associative Memories & Discrete Hopfield...
Artificial Neural Network Lecture 6- Associative Memories & Discrete Hopfield...Artificial Neural Network Lecture 6- Associative Memories & Discrete Hopfield...
Artificial Neural Network Lecture 6- Associative Memories & Discrete Hopfield...
 
Regularization in deep learning
Regularization in deep learningRegularization in deep learning
Regularization in deep learning
 
Naive Bayes Classifier in Python | Naive Bayes Algorithm | Machine Learning A...
Naive Bayes Classifier in Python | Naive Bayes Algorithm | Machine Learning A...Naive Bayes Classifier in Python | Naive Bayes Algorithm | Machine Learning A...
Naive Bayes Classifier in Python | Naive Bayes Algorithm | Machine Learning A...
 
An overview of Bayesian testing
An overview of Bayesian testingAn overview of Bayesian testing
An overview of Bayesian testing
 
application of correlation
application of correlationapplication of correlation
application of correlation
 
Different kind of distance and Statistical Distance
Different kind of distance and Statistical DistanceDifferent kind of distance and Statistical Distance
Different kind of distance and Statistical Distance
 
ARIMA
ARIMA ARIMA
ARIMA
 
Confusion Matrix Explained
Confusion Matrix ExplainedConfusion Matrix Explained
Confusion Matrix Explained
 
Seasonal ARIMA
Seasonal ARIMASeasonal ARIMA
Seasonal ARIMA
 
Matrix factorization
Matrix factorizationMatrix factorization
Matrix factorization
 
Logistic regression
Logistic regressionLogistic regression
Logistic regression
 

Similar to Model Calibration: Improving Probability Estimates

Learning machine learning with Yellowbrick
Learning machine learning with YellowbrickLearning machine learning with Yellowbrick
Learning machine learning with YellowbrickRebecca Bilbro
 
House Sale Price Prediction
House Sale Price PredictionHouse Sale Price Prediction
House Sale Price Predictionsriram30691
 
Statistical Learning on Credit Data
Statistical Learning on Credit DataStatistical Learning on Credit Data
Statistical Learning on Credit DataFiras Obeid
 
PPT - Deep and Confident Prediction For Time Series at Uber
PPT - Deep and Confident Prediction For Time Series at UberPPT - Deep and Confident Prediction For Time Series at Uber
PPT - Deep and Confident Prediction For Time Series at UberJisang Yoon
 
Machine Learning Application: Credit Scoring
Machine Learning Application: Credit ScoringMachine Learning Application: Credit Scoring
Machine Learning Application: Credit Scoringeurosigdoc acm
 
Predict Backorder on a supply chain data for an Organization
Predict Backorder on a supply chain data for an OrganizationPredict Backorder on a supply chain data for an Organization
Predict Backorder on a supply chain data for an OrganizationPiyush Srivastava
 
ngboost.pptx
ngboost.pptxngboost.pptx
ngboost.pptxHadrian7
 
Uber Data Analysis - SAS Project
Uber Data Analysis - SAS ProjectUber Data Analysis - SAS Project
Uber Data Analysis - SAS ProjectKushal417
 
Robustness Metrics for ML Models based on Deep Learning Methods
Robustness Metrics for ML Models based on Deep Learning MethodsRobustness Metrics for ML Models based on Deep Learning Methods
Robustness Metrics for ML Models based on Deep Learning MethodsData Science Milan
 
4. Classification.pdf
4. Classification.pdf4. Classification.pdf
4. Classification.pdfJyoti Yadav
 
Satisfaction and loyalty
Satisfaction and loyaltySatisfaction and loyalty
Satisfaction and loyaltyTheDataNation
 
Predictive model based on Supervised ML
Predictive model based on Supervised MLPredictive model based on Supervised ML
Predictive model based on Supervised MLUmeshchandraYadav5
 
Artificial Intelligence Course: Linear models
Artificial Intelligence Course: Linear models Artificial Intelligence Course: Linear models
Artificial Intelligence Course: Linear models ananth
 
German credit score shivaram prakash
German credit score shivaram prakashGerman credit score shivaram prakash
German credit score shivaram prakashShivaram Prakash
 
Machine Learning Algorithm - KNN
Machine Learning Algorithm - KNNMachine Learning Algorithm - KNN
Machine Learning Algorithm - KNNKush Kulshrestha
 
Statistical Learning and Model Selection (1).pptx
Statistical Learning and Model Selection (1).pptxStatistical Learning and Model Selection (1).pptx
Statistical Learning and Model Selection (1).pptxrajalakshmi5921
 

Similar to Model Calibration: Improving Probability Estimates (20)

Learning machine learning with Yellowbrick
Learning machine learning with YellowbrickLearning machine learning with Yellowbrick
Learning machine learning with Yellowbrick
 
Py data19 final
Py data19   finalPy data19   final
Py data19 final
 
House Sale Price Prediction
House Sale Price PredictionHouse Sale Price Prediction
House Sale Price Prediction
 
Statistical Learning on Credit Data
Statistical Learning on Credit DataStatistical Learning on Credit Data
Statistical Learning on Credit Data
 
Model selection
Model selectionModel selection
Model selection
 
PPT - Deep and Confident Prediction For Time Series at Uber
PPT - Deep and Confident Prediction For Time Series at UberPPT - Deep and Confident Prediction For Time Series at Uber
PPT - Deep and Confident Prediction For Time Series at Uber
 
Machine Learning Application: Credit Scoring
Machine Learning Application: Credit ScoringMachine Learning Application: Credit Scoring
Machine Learning Application: Credit Scoring
 
Predict Backorder on a supply chain data for an Organization
Predict Backorder on a supply chain data for an OrganizationPredict Backorder on a supply chain data for an Organization
Predict Backorder on a supply chain data for an Organization
 
ngboost.pptx
ngboost.pptxngboost.pptx
ngboost.pptx
 
Uncertainties in Deep Learning
Uncertainties in Deep LearningUncertainties in Deep Learning
Uncertainties in Deep Learning
 
Uber Data Analysis - SAS Project
Uber Data Analysis - SAS ProjectUber Data Analysis - SAS Project
Uber Data Analysis - SAS Project
 
Robustness Metrics for ML Models based on Deep Learning Methods
Robustness Metrics for ML Models based on Deep Learning MethodsRobustness Metrics for ML Models based on Deep Learning Methods
Robustness Metrics for ML Models based on Deep Learning Methods
 
Machine Learning (Decisoion Trees)
Machine Learning (Decisoion Trees)Machine Learning (Decisoion Trees)
Machine Learning (Decisoion Trees)
 
4. Classification.pdf
4. Classification.pdf4. Classification.pdf
4. Classification.pdf
 
Satisfaction and loyalty
Satisfaction and loyaltySatisfaction and loyalty
Satisfaction and loyalty
 
Predictive model based on Supervised ML
Predictive model based on Supervised MLPredictive model based on Supervised ML
Predictive model based on Supervised ML
 
Artificial Intelligence Course: Linear models
Artificial Intelligence Course: Linear models Artificial Intelligence Course: Linear models
Artificial Intelligence Course: Linear models
 
German credit score shivaram prakash
German credit score shivaram prakashGerman credit score shivaram prakash
German credit score shivaram prakash
 
Machine Learning Algorithm - KNN
Machine Learning Algorithm - KNNMachine Learning Algorithm - KNN
Machine Learning Algorithm - KNN
 
Statistical Learning and Model Selection (1).pptx
Statistical Learning and Model Selection (1).pptxStatistical Learning and Model Selection (1).pptx
Statistical Learning and Model Selection (1).pptx
 

Recently uploaded

Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 

Recently uploaded (20)

Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 

Model Calibration: Improving Probability Estimates

  • 1. Model Calibration Inbar Naor Data Scientist @ Taboola
  • 2. A Tale of Two Classifiers Model A: 90% accuracy, 0.91 confidence in each prediction Model B: 90% accuracy, 0.99 confidence in each prediction Which is better?
  • 3. Calibration: Post processing a model to improve probability estimate Among the samples that got a 0.8 probability of being positive, we expect 80% to be positive.
  • 4. Intuition A model is perfectly calibrated if, for any p, a prediction of a class with confidence p is correct 100*p percent of the time. Probability FractionofPositives
  • 5. Agenda ● Why is that important? ● How do we know if our model is calibrated? ● Calibration of different models ● Calibration methods: ○ Platt Scaling ○ Isotonic Regression
  • 6. Why is calibration important? Calibration is important only if probabilities are important. ● High risk applications ● Combining with other probability models, thresholding ● Improving our models: ○ Mistakes with high probabilities ○ True labels with low probabilities
  • 7. Calibration != accuracy ● Well calibrated model can have low accuracy ○ Example: random coin ● Models can have high accuracy and bad calibration
  • 8. How Do We know If Our Model Is Calibrated?
  • 9. Intuition A model is perfectly calibrated if, for any p, a prediction of a class with confidence p is correct 100*p percent of the time. Probability FractionofPositives
  • 10. Reliability Plots 1. Group predictions into bins 2. Fraction of positives per bin 3. Confidence per bin:
  • 11. Reliability Plots 1. Group predictions into bins 2. Fraction of positives per bin 3. Confidence per bin:
  • 12. What If I want a Number? Expected Calibration Error Maximum Calibration Error Use approximation Use approximation
  • 13. Calibration Of Different MOdels Predicting Good Probabilities with Supervised Learning, Niculescu-Mizil & Caruana
  • 14. Boosted Trees ● Sigmoidal shape reliability plots ● Values pushed away from 0-1
  • 15. Calibration OF DiffereNt ClassifiersBoostedTreesSVM Neural Network* Logistic Regression * before 2006
  • 16.
  • 17. Platt Scaling ● New data set ● Fit logistic regression on the output of the model from sklearn.linear_model import LogisticRegression as LR lr = LR() lr.fit(p_holdout, y_holdout) calibrated_p = lr.predict_proba(p_test)[:, 1]
  • 18. Isotonic Regression Learn a monotonic piecewise constant function f to transform uncalibrated outputs. from sklearn.isotonic import IsotonicRegression as IR ir = IR() ir.fit(p_holdout, y_holdout) calibrated_p = ir.transform(p_test)
  • 19. SuMMary ● Calibration is important if you care about probabilities ● Reliability plots ● Different methods for calibration
  • 21. Is Logistic Regression Calibrated?? Hosmer-Lemeshow (HL) goodnessof-fit statistic for logistic regression (LR) models https://statisticalhorizons.com/hosmer-lemeshow https://statisticalhorizons.com/alternatives-to-the-hosmer-l emeshow-test
  • 22. Probabilistic classification Model output: probabilities over labels. Calibrated model: different definitions among the samples that got 0.8 to be 1, we expect 80% to be 1. (HL guarantees) If we get a 0.8 probability to be 1 we want 80% of similar examples to be 1.