AUC is, and has long been, an extremely powerful lens through which machine learning practitioners evaluate and compare model performance. But is “my curve is better than your curve” the right threshold for publishing a new paper or pushing a new model into production? In this talk, I will demonstrate the ways in which we at Remitly are thinking outside the box (and the area under the curve) to challenge whether AUC is the right metric for a range of applications.

Price and cost are fundamental components of economic modeling and quintessential aspects of an economist’s education and way of thinking, yet they are foreign concepts for many machine learning practitioners. Remitly’s Data Science team manages, and thinks deeply about, a number of classification tasks such as risk management and fraud detection. For many of these tasks, misclassification is extremely costly compared to the gains of a correct classification. We are willing to sacrifice AUC in order to incorporate the costs of classification and misclassification into our loss functions. By incorporating the notion of “indifference curves” (i.e., level sets), we show that by choosing models whose ROC curves cross our indifference-curve thresholds, we can aim for models that give us the best bang for our buck.
3. Introduction
• Model selection: data and algorithms aren’t the only knobs
• Problems with typical model selection strategies
• Review of model evaluation metrics
• Augmenting these metrics to address practical problems
• Why this matters to Remitly
Agenda
4. You may think that in order to solve all of your machine
learning problems, you only need to have…
5.
6.
7. ... but you need to think carefully about model selection.
8. Why is model selection important?
• Big data is not enough:
• Not everyone has it. Or maybe the big data you have isn’t
useful.
• Fancy algorithms are not enough:
• No Free Lunch Theorem (Wolpert, 1997). There isn’t a “one-size-fits-all” model class. Deep learning is not a silver bullet.
• Inadequate coverage in the literature:
• This is a practical problem, it’s hard, and it matters.
• Problems such as class imbalance and inclusion of economic
constraints.
Model Selection
9. ML + Economics
• Loss matrices inadequate:
• Penalty of misclassification may vary per instance.
• E.g., size of transaction. Not all misclassifications result in the same penalty, even when the instances come from the same class.
• Indifference curves good for post-training selection:
• We can compare tradeoffs of selecting different
classification thresholds.
• EXTREMELY IMPORTANT when costs of false positives
and false negatives are very, very different.
Economics: including costs/revenue into model selection
10. Classic machine learning
• Test positive and test negative (prediction outcomes)
• Condition positive and condition negative (actual values)
• True positive: condition positive and test positive
• True negative: condition negative and test negative
• False positive (Type I error): condition negative and test
positive
• False negative (Type II error): condition positive and test
negative
Confusion matrix
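As a minimal sketch of the definitions above (assuming binary labels encoded as 0/1; the function name is ours, not a standard API):

```python
import numpy as np

def confusion_quadrants(y_true, y_pred):
    """Count the four confusion-matrix quadrants for binary 0/1 labels."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)    # condition positive, test positive
    tn = np.sum(~y_true & ~y_pred)  # condition negative, test negative
    fp = np.sum(~y_true & y_pred)   # Type I error
    fn = np.sum(y_true & ~y_pred)   # Type II error
    return tp, tn, fp, fn

# Six labeled instances and one model's hard predictions.
print(confusion_quadrants([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0]))  # (2, 2, 1, 1)
```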
11. Radar in WWII
• Classic approach: measure the area under the receiver operating characteristic (ROC) curve
• Pros:
• Standard in the literature
• Descriptive of predictive power across thresholds
• Cons:
• Ignores class imbalances
• Ignores constraints such as costs of FP vs. FN
My curve is better than your curve
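A quick sketch of how one ROC curve and its AUC might be computed (hypothetical labels and scores; scikit-learn assumed available):

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# Hypothetical labels and model scores (propensity of the positive class).
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.55, 0.9])

# One (FPR, TPR) point per candidate threshold; AUC summarizes them all.
fpr, tpr, thresholds = roc_curve(y_true, scores)
print(roc_auc_score(y_true, scores))  # 0.875
```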
12. Metrics affected by class imbalance
• X axis is recall == tpr == TP / (TP + FN)
• I.e., of the total positive instances, what proportion did
our model classify as positive?
• Y axis is precision == TP / (TP + FP).
• I.e., of the positive classifications, what proportion were
positive instances?
• Class imbalance affects this: all else equal, a smaller positive class shifts PR curves down.
• There exists a one-to-one mapping from ROC space to PR
space. But optimizing ROC AUC != optimizing PR AUC.
Precision and Recall curves
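A small sketch of the imbalance effect, reusing the hypothetical data from the earlier sketch: inflating the negative class leaves ROC AUC untouched but pulls the PR curve down.

```python
import numpy as np
from sklearn.metrics import (average_precision_score, precision_recall_curve,
                             roc_auc_score)

y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.55, 0.9])
precision, recall, thresholds = precision_recall_curve(y_true, scores)

# Replicate the negatives 10x (same score distribution, relatively smaller
# positive class): ROC AUC is identical, average precision drops.
y_imb = np.concatenate([y_true] + [y_true[y_true == 0]] * 9)
s_imb = np.concatenate([scores] + [scores[y_true == 0]] * 9)
print(roc_auc_score(y_true, scores), roc_auc_score(y_imb, s_imb))  # equal
print(average_precision_score(y_true, scores),
      average_precision_score(y_imb, s_imb))                       # lower
```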
13. Inclusion of costs in ROC Space
• Indifference Curve:
• Level set that defines, e.g., where your classifier implies
business profitability vs. loss.
• Defined via constrained optimization (e.g., the costs of the quadrants in your confusion matrix).
• Points above this curve satisfy the constraint and are
good. Points below == bad.
• Why we care:
• Orange model doesn’t have a threshold that crosses
your indifference curve, even if its AUC is larger. No
threshold for orange model can satisfy your constraint.
Cost curves in ROC Space
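Here is a minimal sketch of such a (linear) indifference curve, assuming made-up per-quadrant dollar values and fraud prevalence; none of these numbers are Remitly's:

```python
import numpy as np
from sklearn.metrics import roc_curve

# Hypothetical dollar value of each confusion-matrix quadrant, per instance
# (positive = gain, negative = loss), plus positive-class prevalence.
V_TP, V_TN, V_FP, V_FN = 0.0, 1.0, -5.0, -50.0
PI = 0.02

def expected_profit(fpr, tpr):
    """Expected profit per scored instance at one (FPR, TPR) point."""
    pos = PI * (tpr * V_TP + (1.0 - tpr) * V_FN)
    neg = (1.0 - PI) * ((1.0 - fpr) * V_TN + fpr * V_FP)
    return pos + neg

def crosses_indifference(y_true, scores):
    """True if any threshold puts the model above break-even profit.

    The break-even set expected_profit == 0 is the indifference curve;
    a model with no ROC point above it cannot satisfy the constraint,
    however large its AUC.
    """
    fpr, tpr, _ = roc_curve(y_true, scores)
    return bool(np.any(expected_profit(fpr, tpr) > 0.0))
```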
14. How do I pick the right threshold?
• Threshold choices:
• Find point with maximum distance from indifference
curve.
• Of your threshold choices, this point maximizes your
utility.
• Technically, you’re at a higher indifference curve.
• Other things to consider:
• Changes in your constraints: costs change, and therefore your indifference curve can change.
• Update models and thresholds subject to such changes.
Picking the right classifier threshold
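A sketch of that selection rule; the profit function is passed in (e.g., the expected_profit sketch above), and the data here are made up:

```python
import numpy as np
from sklearn.metrics import roc_curve

def best_threshold(y_true, scores, profit_fn):
    """Pick the threshold whose ROC point maximizes expected profit.

    Equivalently, pick the ROC point on the highest indifference curve
    (level set of profit_fn) the model can reach.
    """
    fpr, tpr, thresholds = roc_curve(y_true, scores)
    profits = profit_fn(fpr, tpr)
    i = int(np.argmax(profits))
    return thresholds[i], profits[i]

# Hypothetical usage with a toy linear utility: TPR - 5 * FPR.
y = np.array([0, 0, 1, 1, 0, 1])
s = np.array([0.2, 0.4, 0.5, 0.9, 0.6, 0.7])
print(best_threshold(y, s, lambda fpr, tpr: tpr - 5.0 * fpr))
```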
15. Citing our sources
Bibliography
Davis, J., & Goadrich, M. (2006). The relationship between Precision-Recall and ROC curves. Proceedings of the 23rd International Conference on Machine Learning (pp. 233–240). ACM.
Raghavan, V., Bollmann, P., & Jung, G. S. (1989). A critical investigation of recall and precision as measures of retrieval system performance. ACM Transactions on Information Systems, 7, 205–229.
Provost, F., Fawcett, T., & Kohavi, R. (1998). The case against accuracy estimation for comparing induction algorithms. Proceedings of the 15th International Conference on Machine Learning (pp. 445–453). Morgan Kaufmann.
Drummond, C., & Holte, R. (2000). Explicitly representing expected cost: An alternative to ROC representation. Proceedings of Knowledge Discovery and Data Mining (pp. 198–207).
Drummond, C., & Holte, R. C. (2004). What ROC curves can’t do (and cost curves can). ROCAI (pp. 19–26).
Bradley, A. (1997). The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognition, 30, 1145–1159.
Fawcett, T. (2006). An introduction to ROC analysis. Pattern Recognition Letters, 27(8), 861–874.
Metz, C. E. (1978). Basic principles of ROC analysis. Seminars in Nuclear Medicine, 8(4), 283–298. WB Saunders.
Saito, T., & Rehmsmeier, M. (2015). The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLoS ONE, 10(3), e0118432.
Steingold, S. (2016). Information theoretic metrics for multi-class predictor evaluation. Accessed 23 June 2016. http://www.slideshare.net/SessionsEvents/sam-steingold-lead-data-scientist-magnetic-media-online-at-mlconf-sea-5201
Datacratic (2016). Machine learning meets economics. Accessed 23 June 2016. http://blog.mldb.ai/blog/posts/2016/01/ml-meets-economics/
16. What we talked about
• Model selection: data and algorithms aren’t the only knobs
• Problems with typical model selection strategies
• Review of model evaluation metrics
• Augmenting these metrics to address practical problems
• Why this matters to Remitly
Summary
17. Remitly’s Data Science team uses ML for a variety of purposes.
ML applications are core to our business – therefore our business must be core to our ML applications.
Machine learning at Remitly
Hi everyone.
My name is Alex Korbonits, and I am a data scientist at Remitly.
This talk is broadly about evaluating and comparing machine learning models.
Before we dive in, here’s a little bit about Remitly and me.
Remitly was founded in 2011 to forever change the way people send money to their loved ones.
Worldwide, remittances represent over 600 billion dollars annually, roughly 4x the amount of foreign aid.
We’re now the largest independent digital remittance company in the U.S.
We’re sending nearly 2 billion dollars annually and growing quickly.
Our CEO, Matt Oppenheimer, was just named one of Ernst and Young’s 2016 Entrepreneurs of the Year.
I'm Remitly's first data scientist, and our team is growing.
Right now my principal focus is FRAUD CLASSIFICATION.
Previously, I was a data scientist at a startup called Nuiku, focusing on NLP.
Model selection is crucial for delivering successful data science projects in industry.
The inclusion of economic constraints and class imbalance issues into this process is often overlooked, for example, if you’re simply maximizing area under the ROC curve.
Industrial settings require thinking beyond status quo model evaluation metrics: today we’ll consider tying model selection to business costs and impact.
That makes sense, and dollars and cents.
For us, w.r.t. fraud classification, there is a real penalty of being incorrect. We need to address the economic impact of model selection head-on.
So, you may think that in order to solve all of your machine learning problems, you only need to have…
BIG DATA
Or maybe you think all of your problems will be solved with…
DEEP LEARNING AND NEURAL NETWORKS
Even the TV show Silicon Valley mentioned neural networks and machine learning in several episodes this season.
Please stop fanning the flames of AI hype before another AI winter sets in. THANKS.
It is not the case that BIG DATA or FANCY ALGORITHMS can solve all of your machine learning problems!
How do you evaluate YOUR model or compare models?
Today we’re just going to focus on model evaluation in a supervised classification setting.
The No Free Lunch Theorem tells us there isn’t a one-size-fits-all model class.
So how do we do model selection? What do we need to incorporate that other approaches don't?
It’s not just about cross-validation, hyperparameter tuning, etc. We want to tie models into our business objectives.
You may have a problem where the penalty of misclassification varies PER INSTANCE.
Loss matrices, which weight training misclassifications by class, won’t work for us here, since the penalty of misclassifying one transaction worth $1,000 IS VERY DIFFERENT from the penalty of misclassifying one transaction worth $100.
In this talk we do not explore weighting individual points differently during training (which only works for some models), nor do we explore resampling methods.
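Here’s a rough sketch of what post-training, per-instance cost evaluation can look like. The loss assumptions are illustrative, not our production numbers:

```python
import numpy as np

def dollar_loss(y_true, scores, amounts, threshold, fp_review_cost=5.0):
    """Dollar loss at one threshold, with per-instance penalties.

    Illustrative assumption: a missed fraud (false negative) costs the
    full transaction amount; a false positive costs a flat review fee.
    """
    y_true = np.asarray(y_true, dtype=bool)
    amounts = np.asarray(amounts, dtype=float)
    flagged = np.asarray(scores, dtype=float) >= threshold
    fn_cost = amounts[y_true & ~flagged].sum()            # fraud let through
    fp_cost = fp_review_cost * np.sum(~y_true & flagged)  # needless reviews
    return fn_cost + fp_cost

# Two false negatives from the same class now carry different penalties:
# missing a $1,000 transaction costs 10x missing a $100 one.
```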
In economics, we have budget constraints and utility functions, i.e., constraint optimization and Lagrangians. Oh so many Lagrangians. Rational individuals maximize their utility subject to their budget constraints. Level sets of their utility functions represent curves of equivalently achievable utility.
At Remitly, we have transactions, revenue associated with completing them, costs of reviewing them, and costs of losing money due to fraud and chargebacks. Like a rational individual, as a price-taking firm in a competitive industry, we want to maximize our own utility subject to our constraints.
Models that have to predict a propensity score, such as logistic regression, have tradeoffs.
It’s not really one classifier per se.
It’s a continuum of classifiers as you vary your classification threshold from 0 to 1.
Each threshold represents one confusion matrix.
Selecting the right model will give us our optimal confusion matrix.
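Here’s a tiny sketch of that continuum, with made-up scores: each threshold yields its own classifier and its own confusion matrix.

```python
import numpy as np

# Hypothetical propensity scores from one trained model, plus true labels.
scores = np.array([0.05, 0.20, 0.40, 0.65, 0.90])
y_true = np.array([0, 0, 1, 0, 1])

for t in (0.1, 0.5, 0.8):        # each threshold is a distinct classifier...
    y_pred = scores >= t         # ...with its own confusion matrix
    tp = np.sum((y_true == 1) & y_pred)
    fp = np.sum((y_true == 0) & y_pred)
    fn = np.sum((y_true == 1) & ~y_pred)
    tn = np.sum((y_true == 0) & ~y_pred)
    print(f"t={t}: TP={tp} FP={fp} FN={fn} TN={tn}")
```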
Fraud is extremely expensive when it occurs, and it’s painful for customers to be put into review too easily, so you care a lot, and differently, about Type I and Type II errors.
The receiver operating characteristic, or ROC curve, was first developed during World War II for detecting enemy objects in battlefields.
This curve is useful because it offers a description of the predictive power of a model or set of classifiers across different thresholds, so it gives an indication of the kinds of tradeoffs you can expect to make by choosing a particular threshold.
The precision and recall curve is another popular and important metric.
Precision and recall are affected by class imbalance, unlike ROC! Story for another time.
PR is just as useful for comparing models as ROC but there are some important differences, which we won't go into here...
Now let’s talk about costs. Here’s an example in ROC space.
What’s an indifference curve? It’s called “indifference” because all points along this line are equivalent – it’s a level set of tradeoffs in ROC space.
Here we have two curves for two models, one whose area under the curve is greater than the other.
The green classifier with WORSE area under the curve satisfies our constraint.
This model, at thresholds to the right of where it crosses our indifference curve, is economically viable for us. We can make a profit. Success!
Each quadrant of your confusion matrix may have different costs. Your costs will define the slope and intercept of this curve in ROC and PR space.
Note that this need not be linear.
You need to have a model with a set of thresholds that crosses this curve for your model to make business sense to put into production.
Incorporating business sense into our model selection helps us choose between these two models. In isolation, the model with higher AUC seems more attractive, but when considering additional constraints, we see that in fact the model with lower AUC is more attractive.
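For the linear case, a short derivation shows exactly how the quadrant values set the slope and intercept. Using the hypothetical per-quadrant values v_TP, v_TN, v_FP, v_FN and positive-class prevalence π from the earlier sketch:

```latex
\begin{align*}
\mathbb{E}[\text{profit}]
  &= \pi\bigl[\mathrm{TPR}\,v_{TP} + (1-\mathrm{TPR})\,v_{FN}\bigr]
   + (1-\pi)\bigl[(1-\mathrm{FPR})\,v_{TN} + \mathrm{FPR}\,v_{FP}\bigr].\\[4pt]
\text{Setting } \mathbb{E}[\text{profit}] = 0 &\text{ and solving for TPR gives the indifference line:}\\[4pt]
\mathrm{TPR}
  &= \underbrace{\frac{(1-\pi)(v_{TN}-v_{FP})}{\pi\,(v_{TP}-v_{FN})}}_{\text{slope}}\;\mathrm{FPR}
   \;-\; \underbrace{\frac{\pi\,v_{FN} + (1-\pi)\,v_{TN}}{\pi\,(v_{TP}-v_{FN})}}_{\text{intercept}}.
\end{align*}
```

With a rare positive class (small π), the slope is steep: break-even demands a high TPR at a very low FPR.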
Back to PR space.
Now that we have a model that is economically viable, how do we choose a threshold?
One way to pick a threshold: among the points your classifier can achieve, find the one farthest from the indifference curve in the perpendicular direction.
This is actually at a higher indifference curve, specifically, your maximally achievable level set.
Here we have two models that satisfy our constraints. It looks like, given our indifference curve in this example, your curve actually wins out in the bottom right-hand corner. Even though the area under the PR curve is significantly greater for my curve, your curve has a set of points farther from our indifference curve and thus picking a threshold on your curve to use for classification will be better for our task.
I.e., the economic constraints weight the cost of false negatives so heavily that extremely high recall is required for viability.
Remember that your constraints may be non-linear and may change with time. Be sure to re-evaluate your choices in thresholds when business logic changes. You may miss out on some big wins.
In summary, model selection is IMPORTANT.
Maximizing area under the ROC curve in an industrial setting may be inadequate.
For us, there is a real penalty of being incorrect. We HAVE to incorporate costs into model selection.
We are just getting started. We are not done with this analysis.
We're doing post-training analysis with costs and different model metrics such as ROC and PR. We’re looking into incorporating business objectives into the loss functions of our learners during training.
What does machine learning at Remitly look like?
Understanding:
Fraud classification
Anomaly detection
Customer behavior
Market forces
We're hiring!
Email me at alex@remitly.com.
That’s all, folks!
THANKS