12 July 2016
AUC - at what cost(s)?
Evaluating and comparing machine learning models
Alex Korbonits, Data Scientist
Introduction
About Remitly and Me
Introduction
• Model selection: data and algorithms aren’t the only knobs
• Problems with typical model selection strategies
• Review of model evaluation metrics
• Augmenting these metrics to address practical problems
• Why this matters to Remitly
Agenda
You may think in order to solve all of your machine
learning problems, you only need to have…
... but you need to think carefully about model selection.
Why is model selection important?
• Big data is not enough:
• Not everyone has it. Or maybe the big data you have isn’t
useful.
• Fancy algorithms are not enough:
• No Free Lunch Theorem (Wolpert, 1997). There isn’t a “one-size-fits-all” model class. Deep learning is not a silver bullet.
• Inadequate coverage in the literature:
• This is a practical problem, it’s hard, and it matters.
• Problems such as class imbalance and inclusion of economic
constraints.
Model Selection
ML + Economics
• Loss matrices are inadequate:
• The penalty for a misclassification may vary per instance.
• E.g., size of transaction. Not all misclassifications incur the
same penalty, even when they come from the same class.
• Indifference curves good for post-training selection:
• We can compare tradeoffs of selecting different
classification thresholds.
• EXTREMELY IMPORTANT when costs of false positives
and false negatives are very, very different.
Economics: including costs/revenue into model selection
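To make the per-instance point concrete, here is a minimal sketch. The `Transaction` fields, the fraud framing, and the flat `REVIEW_COST` are all hypothetical assumptions for illustration, not figures from the talk. It shows two predictions with the same confusion-matrix counts but very different total cost.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    amount: float   # size of the transaction (hypothetical units)
    is_bad: bool    # condition: actual value
    flagged: bool   # test: prediction outcome


# Assumed flat cost of wrongly flagging a good transaction (hypothetical).
REVIEW_COST = 5.0


def instance_cost(t: Transaction) -> float:
    """Cost of one prediction; a false negative costs the full amount."""
    if t.is_bad and not t.flagged:      # false negative
        return t.amount
    if not t.is_bad and t.flagged:      # false positive
        return REVIEW_COST
    return 0.0                          # true positive or true negative


def total_cost(transactions) -> float:
    return sum(instance_cost(t) for t in transactions)


# Both lists contain exactly one false negative, so a fixed loss matrix
# would score them identically; the per-instance cost does not.
small_miss = [Transaction(10.0, True, False), Transaction(500.0, False, False)]
large_miss = [Transaction(1000.0, True, False), Transaction(500.0, False, False)]
```

Comparing `total_cost(small_miss)` with `total_cost(large_miss)` shows why a single per-class penalty can hide the cases that matter most.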
Classic machine learning
• Test positive and test negative (prediction outcomes)
• Condition positive and condition negative (actual values)
• True positive: condition positive and test positive
• True negative: condition negative and test negative
• False positive (Type I error): condition negative and test
positive
• False negative (Type II error): condition positive and test
negative
Confusion matrix
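The four quadrants defined above can be counted directly. This is a small sketch; the function name `confusion_counts` and the 0/1 encoding of labels are assumptions made for the example.

```python
def confusion_counts(actual, predicted):
    """Count confusion-matrix quadrants for binary labels (1 = positive)."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if a and p:
            counts["TP"] += 1   # condition positive, test positive
        elif a and not p:
            counts["FN"] += 1   # Type II error
        elif not a and p:
            counts["FP"] += 1   # Type I error
        else:
            counts["TN"] += 1   # condition negative, test negative
    return counts


example = confusion_counts(actual=[1, 1, 0, 0, 1], predicted=[1, 0, 1, 0, 1])
```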
Radar in WWII
• Classic approach: measure the area under the receiver
operating characteristic (ROC) curve, i.e., the AUC
• Pros:
• Standard in the literature
• Descriptive of predictive power across thresholds
• Cons:
• Ignores class imbalances
• Ignores constraints such as costs of FP vs. FN
My curve is better than your curve
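One way to see the "ignores class imbalances" point is to compute AUC through its ranking interpretation: the probability that a randomly chosen positive scores above a randomly chosen negative. The sketch below uses that pairwise definition (quadratic time, fine for illustration); the scores are made-up values.

```python
def roc_auc(pos_scores, neg_scores):
    """AUC as P(score of random positive > score of random negative).

    Ties count as half a win. O(n_pos * n_neg), for illustration only.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


pos = [0.9, 0.8, 0.4]            # hypothetical scores of positives
neg = [0.7, 0.3, 0.2, 0.1]       # hypothetical scores of negatives

balanced_auc = roc_auc(pos, neg)
# Replicate the negatives tenfold: the class ratio changes a lot,
# but the pairwise win rate (and so the AUC) does not.
imbalanced_auc = roc_auc(pos, neg * 10)
```

Because the statistic is computed within pairs, it says nothing about how rare the positive class is, which is exactly the gap the later slides address.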
Metrics affected by class imbalance
• X axis is recall == tpr == TP / (TP + FN)
• I.e., of the total positive instances, what proportion did
our model classify as positive?
• Y axis is precision == TP / (TP + FP).
• I.e., of the positive classifications, what proportion were
positive instances?
• Class imbalance affects this: WLOG, class imbalance shifts
curves down (for smaller positive classes).
• There exists a one-to-one mapping from ROC space to PR
space. But optimizing ROC AUC != optimizing PR AUC.
Precision and Recall curves
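A short sketch of why imbalance pushes PR curves down: for a fixed ROC point (FPR, TPR), precision can be rewritten in terms of the positive-class prevalence. The `prevalence` parameter name and the example operating point are assumptions for illustration.

```python
def precision_at(tpr, fpr, prevalence):
    """Precision implied by a ROC point at a given positive-class prevalence.

    Per unit of data: TP = tpr * prevalence, FP = fpr * (1 - prevalence),
    so precision = TP / (TP + FP).
    """
    tp = tpr * prevalence
    fp = fpr * (1.0 - prevalence)
    return tp / (tp + fp)


# Same classifier operating point, two different class balances.
balanced = precision_at(tpr=0.8, fpr=0.1, prevalence=0.5)
rare = precision_at(tpr=0.8, fpr=0.1, prevalence=0.01)
```

Recall (the TPR) is unchanged between the two cases, but precision falls sharply when positives are rare, so the same ROC point maps to a much lower point in PR space.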
Inclusion of costs in ROC Space
• Indifference Curve:
• Level set that defines, e.g., where your classifier implies
business profitability vs. loss.
• Defined via constraint optimization (e.g., costs of
quadrants in your confusion matrix).
• Points above this curve satisfy the constraint and are
good. Points below == bad.
• Why we care:
• Orange model doesn’t have a threshold that crosses
your indifference curve, even if its AUC is larger. No
threshold for orange model can satisfy your constraint.
Cost curves in ROC Space
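The following is a sketch of one way to express such a constraint, assuming a break-even rule (expected utility per instance of at least zero) and made-up utilities for the four confusion-matrix quadrants. The ROC points for the "blue" and "orange" models are invented to mirror the situation described above; they are not the curves from the slide.

```python
def expected_utility(fpr, tpr, prevalence, u_tp, u_fn, u_fp, u_tn):
    """Expected utility per instance at a ROC point."""
    pos = prevalence * (tpr * u_tp + (1.0 - tpr) * u_fn)
    neg = (1.0 - prevalence) * (fpr * u_fp + (1.0 - fpr) * u_tn)
    return pos + neg


def feasible_points(roc_points, prevalence, u_tp, u_fn, u_fp, u_tn,
                    min_utility=0.0):
    """ROC points on or above the indifference curve for min_utility."""
    return [(f, t) for f, t in roc_points
            if expected_utility(f, t, prevalence, u_tp, u_fn, u_fp, u_tn)
            >= min_utility]


def trapezoid_auc(roc_points):
    """Area under a piecewise-linear ROC curve given sorted (fpr, tpr)."""
    area = 0.0
    for (f0, t0), (f1, t1) in zip(roc_points, roc_points[1:]):
        area += (f1 - f0) * (t0 + t1) / 2.0
    return area


# Hypothetical costs: missed positives are expensive, false alarms less so.
costs = dict(prevalence=0.1, u_tp=0.0, u_fn=-50.0, u_fp=-10.0, u_tn=2.0)

blue = [(0.0, 0.0), (0.05, 0.8), (0.5, 0.85), (1.0, 1.0)]
orange = [(0.0, 0.0), (0.05, 0.6), (0.1, 0.8), (0.2, 0.95),
          (0.3, 1.0), (1.0, 1.0)]

blue_ok = feasible_points(blue, **costs)
orange_ok = feasible_points(orange, **costs)
```

With these assumed numbers the orange points enclose the larger area, yet `orange_ok` is empty while `blue_ok` is not: a higher AUC does not guarantee that any threshold satisfies the constraint.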
How do I pick the right threshold?
• Threshold choices:
• Find point with maximum distance from indifference
curve.
• Of your threshold choices, this point maximizes your
utility.
• Technically you’re at a higher indifference curve
• Other things to consider:
• Changes in your constraints: when costs change, your
indifference curve can change too.
• Update models and thresholds subject to such changes.
Picking the right classifier threshold
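As a rough sketch of threshold selection under costs, the function below scans candidate thresholds on a labeled sample and keeps the one with the highest total utility. The function name, the utilities, and the toy scores are assumptions for illustration; candidates are limited to the observed scores.

```python
def best_threshold(scores, labels, u_tp, u_fn, u_fp, u_tn):
    """Return (threshold, total utility) maximizing utility on the sample.

    An instance is predicted positive when score >= threshold.
    Ties are broken in favor of the lowest threshold.
    """
    best = None
    for thr in sorted(set(scores)):
        total = 0.0
        for s, y in zip(scores, labels):
            predicted_positive = s >= thr
            if y and predicted_positive:
                total += u_tp
            elif y:
                total += u_fn
            elif predicted_positive:
                total += u_fp
            else:
                total += u_tn
        if best is None or total > best[1]:
            best = (thr, total)
    return best


scores = [0.1, 0.2, 0.35, 0.4, 0.6, 0.8, 0.9]   # hypothetical model scores
labels = [0, 0, 1, 0, 0, 1, 1]                  # 1 = condition positive

# False negatives ten times as costly as false positives ...
fn_heavy = best_threshold(scores, labels, u_tp=0, u_fn=-10, u_fp=-1, u_tn=0)
# ... and the reverse.
fp_heavy = best_threshold(scores, labels, u_tp=0, u_fn=-1, u_fp=-10, u_tn=0)
```

The same model and data yield different thresholds under the two cost assumptions, which is why thresholds should be revisited whenever the costs move.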
Citing our sources
Bibliography
Davis, Jesse, and Mark Goadrich. "The relationship between Precision-Recall and ROC curves." In Proceedings of the 23rd international conference on Machine learning, pp. 233-240. ACM, 2006.
Raghavan, V., Bollmann, P., & Jung, G. S. (1989). A critical investigation of recall and precision as measures of retrieval system performance. ACM Trans. Inf. Syst., 7, 205–229.
Provost, F., Fawcett, T., & Kohavi, R. (1998). The case against accuracy estimation for comparing induction algorithms. Proceeding of the 15th International Conference on Machine Learning (pp. 445–453). Morgan Kaufmann, San Francisco, CA.
Drummond, C., & Holte, R. (2000). Explicitly representing expected cost: an alternative to ROC representation. Proceeding of Knowledge Discovery and Datamining (pp. 198–207).
Drummond, C., & Holte, R. C. (2004). What ROC curves can’t do (and cost curves can). ROCAI (pp. 19–26).
Bradley, A. (1997). The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognition, 30, 1145–1159.
Fawcett, Tom. "An introduction to ROC analysis." Pattern recognition letters 27, no. 8 (2006): 861-874.
Metz, Charles E. "Basic principles of ROC analysis." In Seminars in nuclear medicine, vol. 8, no. 4, pp. 283-298. WB Saunders, 1978.
Saito, Takaya, and Marc Rehmsmeier. "The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets." PloS one 10, no. 3 (2015): e0118432.
"Information Theoretic Metrics for Multi-class Predictor Evaluation", Sam Steingold, 2016, accessed 23 June 2016, http://www.slideshare.net/SessionsEvents/sam-steingold-lead-data-scientist-magnetic-media-online-at-mlconf-sea-5201
“Machine Learning Meets Economics”, Datacratic 2016, accessed 23 June 2016, http://blog.mldb.ai/blog/posts/2016/01/ml-meets-economics/
What we talked about
• Model selection: data and algorithms aren’t the only knobs
• Problems with typical model selection strategies
• Review of model evaluation metrics
• Augmenting these metrics to address practical problems
• Why this matters to Remitly
Summary
Remitly’s Data Science team uses ML for a variety of purposes.
ML applications are core to our business – therefore our business must be core to our ML applications.
Machine learning at Remitly
www.remitly.com/careers
We’re hiring!
alex@remitly.com

Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
 
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxThe Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
 
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis Project
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 

AUC: at what cost(s)?

  • 1. 12 July 2016 AUC - at what cost(s)? Evaluating and comparing machine learning models Alex Korbonits, Data Scientist
  • 3. Introduction
      • Model selection: data and algorithms aren’t the only knobs
      • Problems with typical model selection strategies
      • Review of model evaluation metrics
      • Augmenting these metrics to address practical problems
      • Why this matters to Remitly
      Agenda
  • 4. You may think in order to solve all of your machine learning problems, you only need to have…
  • 5.
  • 6.
  • 7. ... but you need to think carefully about model selection.
  • 8. Why is model selection important?
      • Big data is not enough:
          • Not everyone has it. Or maybe the big data you have isn’t useful.
      • Fancy algorithms are not enough:
          • No Free Lunch Theorem (Wolpert, 1997). There isn’t a “one-size-fits-all” model class. Deep learning is not a silver bullet.
      • Inadequate coverage in the literature:
          • This is a practical problem, it’s hard, and it matters.
          • Problems such as class imbalance and inclusion of economic constraints.
      Model Selection
  • 9. ML + Economics
      • Loss matrices inadequate:
          • Penalty of misclassification may vary per instance.
          • E.g., size of transaction. Not all misclassifications result in the same penalty, even if misclassified from the same class.
      • Indifference curves good for post-training selection:
          • We can compare tradeoffs of selecting different classification thresholds.
          • EXTREMELY IMPORTANT when costs of false positives and false negatives are very, very different.
      Economics: including costs/revenue into model selection
  • 10. Classic machine learning
      • Test positive and test negative (prediction outcomes)
      • Condition positive and condition negative (actual values)
      • True positive: condition positive and test positive
      • True negative: condition negative and test negative
      • False positive (Type I error): condition negative and test positive
      • False negative (Type II error): condition positive and test negative
      Confusion matrix
  • 11. Radar in WWII
      • Classic approach measuring area under the receiver operating characteristic (ROC)
      • Pros:
          • Standard in the literature
          • Descriptive of predictive power across thresholds
      • Cons:
          • Ignores class imbalances
          • Ignores constraints such as costs of FP vs. FN
      My curve is better than your curve
  • 12. Metrics affected by class imbalance
      • X axis is recall == tpr == TP / (TP + FN).
          • I.e., of the total positive instances, what proportion did our model classify as positive?
      • Y axis is precision == TP / (TP + FP).
          • I.e., of the positive classifications, what proportion were positive instances?
      • Class imbalance affects this: WLOG, class imbalance shifts curves down (for smaller positive classes).
      • There exists a one-to-one mapping from ROC space to PR space. But optimizing ROC AUC != optimizing PR AUC.
      Precision and Recall curves
  • 13. Inclusion of costs in ROC Space
      • Indifference Curve:
          • Level set that defines, e.g., where your classifier implies business profitability vs. loss.
          • Defined via constrained optimization (e.g., costs of quadrants in your confusion matrix).
          • Points above this curve satisfy the constraint and are good. Points below == bad.
      • Why we care:
          • Orange model doesn’t have a threshold that crosses your indifference curve, even if its AUC is larger. No threshold for orange model can satisfy your constraint.
      Cost curves in ROC Space
  • 14. How do I pick the right threshold?
      • Threshold choices:
          • Find point with maximum distance from indifference curve.
          • Of your threshold choices, this point maximizes your utility.
          • Technically you’re at a higher indifference curve.
      • Other things to consider:
          • Changes in your constraints – costs change, therefore your indifference curve can change.
          • Update models and thresholds subject to such changes.
      Picking the right classifier threshold
  • 15. Citing our sources
      Bibliography
      • Davis, Jesse, and Mark Goadrich. "The relationship between Precision-Recall and ROC curves." In Proceedings of the 23rd international conference on Machine learning, pp. 233-240. ACM, 2006.
      • Raghavan, V., Bollmann, P., & Jung, G. S. (1989). A critical investigation of recall and precision as measures of retrieval system performance. ACM Trans. Inf. Syst., 7, 205–229.
      • Provost, F., Fawcett, T., & Kohavi, R. (1998). The case against accuracy estimation for comparing induction algorithms. Proceedings of the 15th International Conference on Machine Learning (pp. 445–453). Morgan Kaufmann, San Francisco, CA.
      • Drummond, C., & Holte, R. (2000). Explicitly representing expected cost: an alternative to ROC representation. Proceedings of Knowledge Discovery and Datamining (pp. 198–207).
      • Drummond, C., & Holte, R. C. (2004). What ROC curves can’t do (and cost curves can). ROCAI (pp. 19–26).
      • Bradley, A. (1997). The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognition, 30, 1145–1159.
      • Fawcett, Tom. "An introduction to ROC analysis." Pattern Recognition Letters 27, no. 8 (2006): 861-874.
      • Metz, Charles E. "Basic principles of ROC analysis." In Seminars in nuclear medicine, vol. 8, no. 4, pp. 283-298. WB Saunders, 1978.
      • Saito, Takaya, and Marc Rehmsmeier. "The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets." PloS one 10, no. 3 (2015): e0118432.
      • "Information Theoretic Metrics for Multi-class Predictor Evaluation", Sam Steingold, 2016, accessed 23 June 2016, http://www.slideshare.net/SessionsEvents/sam-steingold-lead-data-scientist-magnetic-media-online-at-mlconf-sea-5201
      • “Machine Learning Meets Economics”, Datacratic 2016, accessed 23 June 2016, http://blog.mldb.ai/blog/posts/2016/01/ml-meets-economics/
  • 16. What we talked about
      • Model selection: data and algorithms aren’t the only knobs
      • Problems with typical model selection strategies
      • Review of model evaluation metrics
      • Augmenting these metrics to address practical problems
      • Why this matters to Remitly
      Summary
  • 17. Remitly’s Data Science team uses ML for a variety of purposes. ML applications are core to our business – therefore our business must be core to our ML applications.
      Machine learning at Remitly
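
The definitions on slides 10 to 12 can be made concrete with a small sketch. It is not from the talk: the toy labels and scores, the function names, and the 0.5 threshold are all assumptions chosen for illustration. It counts the confusion-matrix quadrants at one threshold, computes ROC AUC as the fraction of (positive, negative) pairs that the scores rank correctly, and then replicates the negative class tenfold to show that TPR, FPR, and AUC do not move while precision drops.

```python
def confusion_counts(y_true, scores, threshold):
    """Count (TP, FP, TN, FN), classifying score >= threshold as positive."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, scores):
        test_positive = score >= threshold
        if test_positive and label == 1:
            tp += 1
        elif test_positive and label == 0:
            fp += 1
        elif label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def rates(y_true, scores, threshold):
    """TPR (recall), FPR, and precision at a single threshold."""
    tp, fp, tn, fn = confusion_counts(y_true, scores, threshold)
    return {
        "tpr": tp / (tp + fn),
        "fpr": fp / (fp + tn),
        # Precision is undefined when nothing is classified positive.
        "precision": tp / (tp + fp) if tp + fp else float("nan"),
    }


def roc_auc(y_true, scores):
    """ROC AUC as the share of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 4 positives, 4 negatives.
y = [1, 1, 1, 0, 1, 0, 0, 0]
s = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]

# Same classifier, but with the negative class replicated to 10x its size.
negatives = [(label, score) for label, score in zip(y, s) if label == 0]
y_imb = y + [label for label, _ in negatives] * 9
s_imb = s + [score for _, score in negatives] * 9

balanced = rates(y, s, 0.5)
imbalanced = rates(y_imb, s_imb, 0.5)
print(balanced, roc_auc(y, s))
print(imbalanced, roc_auc(y_imb, s_imb))
```

On this toy data the two printed lines agree on TPR, FPR, and AUC, and differ only in precision, which is the sense in which ROC "ignores class imbalances" while the PR curve shifts down.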

Editor's Notes

  1. Hi everyone. My name is Alex Korbonits, and I am a data scientist at Remitly. This talk is broadly about evaluating and comparing machine learning models.
  2. Before we dive in, here’s a little bit about Remitly and me. Remitly was founded in 2011 to forever change the way people send money to their loved ones. Worldwide, remittances represent over 600 billion dollars annually, roughly 4x the amount of foreign aid. We’re now the largest independent digital remittance company in the U.S. We’re sending nearly 2 billion dollars annually and growing quickly. Our CEO, Matt Oppenheimer, was just named one of Ernst and Young’s 2016 Entrepreneurs of the Year. I'm Remitly's first data scientist, and our team is growing. Right now my principal focus is FRAUD CLASSIFICATION. Previously, I was a data scientist at a startup called Nuiku, focusing on NLP.
  3. Model selection is crucial for delivering successful data science projects in industry. The inclusion of economic constraints and class imbalance issues into this process is often overlooked, for example, if you’re simply maximizing area under the ROC curve. Industrial settings require thinking beyond status quo model evaluation metrics: today we’ll consider tying model selection to business costs and impact. That makes sense, and dollars and cents. For us, w.r.t. fraud classification, there is a real penalty of being incorrect. We need to address the economic impact of model selection head-on.
  4. So, you may think that in order to solve all of your machine learning problems, you only need to have…
  5. BIG DATA Or maybe you think all of your problems will be solved with…
  6. DEEP LEARNING AND NEURAL NETWORKS Even the TV show Silicon Valley mentioned neural networks and machine learning in several episodes this season. Please stop fanning the flames of AI hype before another AI winter sets in. THANKS.
  7. It is not the case that BIG DATA or FANCY ALGORITHMS can solve all of your machine learning problems! How do you evaluate YOUR model or compare models?
  8. Today we’re just going to focus on model evaluation in a supervised classification setting. The No Free Lunch Theorem tells us there isn’t a one-size-fits-all model class. So how do we do model selection? What do we need to incorporate that other approaches don't? It’s not just about cross-validation, hyperparameter tuning, etc. We want to tie models into our business objectives.
  9. You may have a problem where the penalty of misclassification varies PER INSTANCE. Loss matrices – weighting training misclassifications by class – won’t work for us here, since the penalty of misclassifying one transaction worth $1,000 IS VERY DIFFERENT from misclassifying one transaction worth $100. In this talk we do not explore weighting individual points differently during training – which only works for some models – nor do we explore resampling methods. In economics, we have budget constraints and utility functions, i.e., constrained optimization and Lagrangians. Oh so many Lagrangians. Rational individuals maximize their utility subject to their budget constraints. Level sets of their utility functions represent curves of equivalently achievable utility. At Remitly, we have transactions, revenue associated with completing them, costs of reviewing them, and costs of losing money due to fraud and chargebacks. Like a rational individual, as a price-taking firm in a competitive industry, we want to maximize our own utility subject to our constraints.
  10. Models that predict a propensity score – such as logistic regression – have tradeoffs. It’s not really one classifier per se. It’s a continuum of classifiers as you vary your classification threshold from 0 to 1. Each threshold represents one confusion matrix. Selecting the right model and threshold will give us our optimal confusion matrix. Fraud is extremely expensive if it occurs, and it is also painful for customers to be put into review too easily, so you care a lot, and differently, about Type I and Type II errors.
  11. The receiver operating characteristic, or ROC curve, was first developed during World War II for detecting enemy objects in battlefields. This curve is useful because it offers a description of the predictive power of a model or set of classifiers across different thresholds, so it gives an indication of the kinds of tradeoffs you can expect to make by choosing a particular threshold.
  12. The precision-recall curve is another popular and important tool. Precision, and with it the PR curve, is affected by class imbalance, unlike ROC! Story for another time. PR is just as useful for comparing models as ROC, but there are some important differences, which we won't go into here...
  13. Now let’s talk about costs. Here’s an example in ROC space. What’s an indifference curve? It’s called “indifference” because all points along this line are equivalent – it’s a level set of tradeoffs in ROC space. Here we have two curves for two models, one whose area under the curve is greater than the other. The green classifier with WORSE area under the curve satisfies our constraint. This model, for thresholds to the right of where it intersects our indifference curve, is economically viable for us. We can make a profit. Success! Each quadrant of your confusion matrix may have different costs. Your costs will define the slope and intercept of this curve in ROC and PR space. Note that this need not be linear. Your model needs a set of thresholds that crosses this curve for it to make business sense to put into production. Incorporating business sense into our model selection helps us choose between these two models. In isolation, the model with higher AUC seems more attractive, but when considering additional constraints, we see that in fact the model with lower AUC is more attractive.
  14. Back to PR space. Now that we have a model that is economically viable, how do we choose a threshold? One way you would want to pick a threshold: find the maximum point in the direction perpendicular to the indifference curve that your classifier can achieve. This is actually at a higher indifference curve, specifically, your maximally achievable level set. Here we have two models that satisfy our constraints. It looks like, given our indifference curve in this example, your curve actually wins out in the bottom right-hand corner. Even though the area under the PR curve is significantly greater for my curve, your curve has a set of points farther from our indifference curve and thus picking a threshold on your curve to use for classification will be better for our task. I.e., the economic constraints weight the cost of false negatives so heavily that extremely high recall is required for viability. Remember that your constraints may be non-linear and may change with time. Be sure to re-evaluate your choices in thresholds when business logic changes. You may miss out on some big wins.
  15. In summary, model selection is IMPORTANT. Maximizing area under the ROC curve in an industrial setting may be inadequate. For us, there is a real penalty of being incorrect. We HAVE to incorporate costs into model selection. We are just getting started. We are not done with this analysis. We're doing post-training analysis with costs and different model metrics such as ROC and PR. We’re looking into incorporating business objectives into the loss functions of our learners during training.
  16. What does machine learning at Remitly look like? Understanding: fraud classification, anomaly detection, customer behavior, and market forces.
  17. We're hiring! Email me at alex@remitly.com. That’s all, folks! THANKS
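
The cost reasoning in notes 9, 13, and 14 can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not Remitly's method: it assumes a missed fraud costs the full transaction amount, a flagged legitimate transaction costs a fixed `review_cost`, and correct decisions cost nothing. The data and every name below are hypothetical. The first function covers the simpler case where costs are constant per class: expected cost per transaction is π(1 − TPR)·c_FN + (1 − π)·FPR·c_FP for positive-class prevalence π, so lines of equal cost in ROC space are straight, with slope ((1 − π)/π)·(c_FP/c_FN). The other two functions handle per-instance costs directly by scoring each candidate threshold on the data.

```python
def iso_cost_slope(prevalence, cost_fp, cost_fn):
    """Slope of an equal-cost line in ROC space (TPR vs. FPR).

    Assumes one fixed cost per false positive and one per false negative.
    """
    return ((1 - prevalence) / prevalence) * (cost_fp / cost_fn)


def total_cost(y_true, scores, amounts, threshold, review_cost):
    """Cost of flagging every transaction with score >= threshold.

    Assumed cost model: a flagged legitimate transaction (false positive)
    costs one review; a missed fraud (false negative) costs its amount.
    """
    cost = 0.0
    for label, score, amount in zip(y_true, scores, amounts):
        flagged = score >= threshold
        if flagged and label == 0:
            cost += review_cost
        elif not flagged and label == 1:
            cost += amount
    return cost


def best_threshold(y_true, scores, amounts, review_cost):
    """Return the candidate threshold with the lowest total cost.

    Candidates are the observed scores plus +inf (flag nothing).
    """
    candidates = sorted(set(scores)) + [float("inf")]
    return min(
        candidates,
        key=lambda t: total_cost(y_true, scores, amounts, t, review_cost),
    )


# Hypothetical toy data: 1 = fraud, 0 = legitimate.
y = [1, 1, 1, 0, 1, 0, 0, 0]
s = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
amounts = [100, 1000, 100, 50, 1000, 50, 50, 50]
review_cost = 20

t_star = best_threshold(y, s, amounts, review_cost)
print(t_star, total_cost(y, s, amounts, t_star, review_cost))
```

Here the low-scoring fraud is worth $1,000, so the cheapest threshold sits low enough to catch it even at the price of an extra review. When costs change, rerunning `best_threshold` with the new numbers is the code-level version of "re-evaluate your choices in thresholds when business logic changes".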