Valencian Summer School in Machine Learning
4th edition
September 13-14, 2018
BigML, Inc 2
Evaluations
Proving a Model Works
Poul Petersen
CIO, BigML, Inc
Why Evaluations
• FACT: No model is perfect - they all make mistakes
• Your data has mistakes
• Models are “approximations”
• Today you have seen models that predict:
• Churn: How many people will churn that we didn’t predict?
• Diabetes: How many patients might have diabetes that we
said were fine?
• Home Prices: How accurate are the predicted prices?
• You have also seen several different kinds of models
• Decision Trees / Ensembles / Logistic Regression /
Deepnets
• Which one works the best for your data?
Easy Right?
INTL MIN  INTL CALLS  INTL CHARGE  CUST SERV CALLS  CHURN  PREDICT CHURN (model prediction)
8,7       4           2,35         1                False  False
11,2      5           3,02         0                False  True
12,7      6           3,43         4                True   True
9,1       5           2,46         0                False  False
11,2      2           3,02         1                False  False
12,3      5           3,32         3                False  False
13,1      6           3,54         4                False  False
5,4       9           1,46         4                True   False
13,8      4           3,73         1                False  False
Look for Mistakes!
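The row-by-row check suggested here can be sketched in a few lines of Python. The two lists copy the CHURN and PREDICT CHURN values from the table above; the variable names are illustrative and not part of any BigML API.

```python
# Actual churn values and model predictions, copied from the table above.
actual = [False, False, True, False, False, False, False, True, False]
predicted = [False, True, True, False, False, False, False, False, False]

# Collect the 1-based row numbers where the prediction differs from the truth.
mistakes = [
    row
    for row, (truth, guess) in enumerate(zip(actual, predicted), start=1)
    if truth != guess
]

print(mistakes)  # [2, 8]
```

Row 2 predicts churn for a customer who stayed, and row 8 misses a customer who churned.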
Evaluations Demo #1
What Just Happened?
• We started with the churn Datasource
• Created a Dataset
• Built a Model to predict churn
• We used the Model to predict churn for each customer in the
Dataset using a Batch Prediction
• Downloaded the Batch Prediction as a CSV and looked for
errors. That is, when the Prediction did not match the known
true value for churn
• The comparison was tedious!
• Examining one line at a time
• Hard to understand - need some metrics!!!
Evaluation Metrics
• Imagine we have a model that can predict a person’s dominant
hand, that is for any individual it predicts left / right
• Define the positive class
• This selection is arbitrary
• It is the class you are interested in!
• The negative class is the “other” class (or others)
• For this example, we choose: left
Evaluation Metrics
• We choose the positive class: left
• True Positive (TP)
• We predicted left and the correct answer was left
• True Negative (TN)
• We predicted right and the correct answer was right
• False Positive (FP)
• We predicted left but the correct answer was right
• False Negative (FN)
• We predicted right but the correct answer was left
Evaluation Metrics
True Positive: Correctly predicted the positive class
True Negative: Correctly predicted the negative class
False Positive: Incorrectly predicted the positive class
False Negative: Incorrectly predicted the negative class
Remember…
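The four definitions above can be turned into a small counting routine. This is a minimal sketch: the function name `confusion_counts` and the ten hypothetical people are assumptions made for illustration, chosen so that the counts match the handedness example used on the later slides.

```python
def confusion_counts(actual, predicted, positive):
    """Count TP, TN, FP and FN for the chosen positive class."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for truth, guess in zip(actual, predicted):
        if guess == positive:
            # Predicted positive: correct -> TP, incorrect -> FP.
            counts["TP" if truth == positive else "FP"] += 1
        else:
            # Predicted negative: a missed positive -> FN, otherwise TN.
            counts["FN" if truth == positive else "TN"] += 1
    return counts


# Hypothetical data: 3 left-handed people followed by 7 right-handed people.
actual = ["left"] * 3 + ["right"] * 7
predicted = ["left", "left", "right"] + ["left", "left"] + ["right"] * 5

counts = confusion_counts(actual, predicted, positive="left")
print(counts)  # {'TP': 2, 'TN': 5, 'FP': 2, 'FN': 1}
```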
Accuracy
$\text{Accuracy} = \dfrac{TP + TN}{\text{Total}}$, where $\text{Total} = TP + TN + FP + FN$
• “Percentage correct” - like an exam
• If Accuracy = 1 then no mistakes
• If Accuracy = 0 then all mistakes
• Intuitive but not always useful
• Watch out for unbalanced classes!
• Ex: 90% of people are right-handed and 10% are left
• A silly model which always predicts right handed is
90% accurate
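The "silly model" above is easy to reproduce. This sketch assumes a hypothetical population of 100 people with the 90/10 split from the example; it shows that the accuracy looks good even though not a single left-hander is found.

```python
# Hypothetical population: 90 right-handed and 10 left-handed people.
actual = ["right"] * 90 + ["left"] * 10

# A "silly" model that ignores its input and always answers "right".
predicted = ["right"] * len(actual)

correct = sum(truth == guess for truth, guess in zip(actual, predicted))
accuracy = correct / len(actual)

# With left as the positive class, count how many left-handers were found.
true_positives = sum(
    truth == "left" and guess == "left" for truth, guess in zip(actual, predicted)
)

print(accuracy)        # 0.9
print(true_positives)  # 0
```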
Accuracy
[Figure: 10 people, each classified as Left Handed or Right Handed. Positive class = Left, negative class = Right.]
TP = 0, FP = 0, TN = 7, FN = 3
$\text{Accuracy} = \dfrac{TP + TN}{\text{Total}} = \dfrac{0 + 7}{10} = 70\%$
Precision
$\text{Precision} = \dfrac{TP}{TP + FP}$
• “accuracy” or “purity” of positive class
• How well you did separating the positive class from the
negative class
• If Precision = 1 then no FP.
• You may have missed some left handers, but of the
ones you identified, all are left handed. No mistakes.
• If Precision = 0 then no TP
• None of the left handers you identified are actually left
handed. All mistakes.
Precision
[Figure: 10 people, each classified as Left Handed or Right Handed. Positive class = Left, negative class = Right.]
TP = 2, FP = 2, TN = 5, FN = 1
$\text{Precision} = \dfrac{TP}{TP + FP} = \dfrac{2}{2 + 2} = 50\%$
Recall
$\text{Recall} = \dfrac{TP}{TP + FN}$
• percentage of positive class correctly identified
• A measure of how well you identified all of the positive
class examples
• If Recall = 1 then no FN → All left handers identified
• There may be FP, so precision could be <1
• If Recall = 0 then no TP → No left handers identified
Recall
[Figure: 10 people, each classified as Left Handed or Right Handed. Positive class = Left, negative class = Right.]
TP = 2, FP = 2, TN = 5, FN = 1
$\text{Recall} = \dfrac{TP}{TP + FN} = \dfrac{2}{2 + 1} \approx 66.7\%$
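Precision and recall for the handedness example can be checked with a short sketch. The function names are illustrative, and returning 0.0 when a denominator is zero is a convention assumed here, not something the slides specify.

```python
def precision(tp, fp):
    """TP / (TP + FP); 0.0 if nothing was predicted positive (assumed convention)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """TP / (TP + FN); 0.0 if there are no positive examples (assumed convention)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Counts from the handedness example: positive class = left.
tp, fp, tn, fn = 2, 2, 5, 1
p = precision(tp, fp)
r = recall(tp, fn)

print(f"precision = {p:.1%}")  # precision = 50.0%
print(f"recall = {r:.1%}")     # recall = 66.7%
```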
f-Measure
$f = \dfrac{2 \cdot \text{Recall} \cdot \text{Precision}}{\text{Recall} + \text{Precision}}$
• harmonic mean of Recall & Precision
• If f-measure = 1 then Recall == Precision == 1
• If Precision OR Recall is small then the f-measure is small
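A quick numeric sketch of the harmonic mean makes the last bullet concrete. The function name `f_measure` is illustrative; the inputs are the precision and recall of the handedness example plus two hypothetical extremes.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


print(round(f_measure(0.5, 2 / 3), 3))   # 0.571
print(round(f_measure(1.0, 1.0), 3))     # 1.0
print(round(f_measure(0.99, 0.01), 3))   # 0.02
```

In the last call, one very small value drags the f-measure close to zero even though the other value is almost perfect.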
f-Measure
[Figure: individuals grouped by predicted class. Positive class = Left, negative class = Right.]
R ≈ 66.7%, P = 50%, f ≈ 57%
Phi Coefficient
$\phi = \dfrac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}$
• Returns a value between -1 and 1
• = -1: predictions are opposite reality
• = 0: no correlation between predictions and reality
• = 1: predictions are always correct
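The Phi formula can be evaluated directly from the four counts. In this sketch the function name is illustrative, and returning 0.0 when the denominator is zero is an assumed convention (the coefficient is undefined in that case).

```python
import math


def phi_coefficient(tp, fp, tn, fn):
    """Phi coefficient computed from the four confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # assumed convention: undefined when a row or column is empty
    return (tp * tn - fp * fn) / denominator


print(round(phi_coefficient(tp=2, fp=2, tn=5, fn=1), 3))  # 0.356
print(phi_coefficient(tp=3, fp=0, tn=7, fn=0))            # 1.0
print(phi_coefficient(tp=0, fp=7, tn=0, fn=3))            # -1.0
```

The second and third calls are hypothetical extremes: every prediction correct, and every prediction wrong.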
Phi Coefficient
[Figure: individuals grouped by predicted class. Positive class = Left, negative class = Right.]
TP = 2, FP = 2, TN = 5, FN = 1
Phi = 0.356
Evaluations Demo #2
What Just Happened?
• Starting with the Diabetes Source, we created a Dataset and
then a Model.
• Using both the Model and the original Dataset, we created an
Evaluation.
• We reviewed the metrics provided by the Evaluation:
• Confusion Matrix
• Accuracy, Precision, Recall, f-measure and
phi
• This Model seemed to perform really, really well…
Question: Can we trust this model?
Evaluation Danger!
• Never evaluate with the training data!
• Many models are able to “memorize” the training data
• This will result in overly optimistic evaluations!
“Memorizing” Training Data
Training:
plasma glucose  bmi   diabetes pedigree  age  diabetes
148             33,6  0,627              50   TRUE
85              26,6  0,351              31   FALSE
183             23,3  0,672              32   TRUE
89              28,1  0,167              21   FALSE
137             43,1  2,288              33   TRUE
116             25,6  0,201              30   FALSE
78              31    0,248              26   TRUE
115             35,3  0,134              29   FALSE
197             30,5  0,158              53   TRUE
Evaluating:
plasma glucose  bmi   diabetes pedigree  age  diabetes
148             33,6  0,627              50   ?
85              26,6  0,351              31   ?
• Exactly the same values!
• Who needs a model?
• What we want to know is how the model performs with values never seen at training:
plasma glucose  bmi   diabetes pedigree  age  diabetes
124             22    0,107              46   ?
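To see why evaluating on training data flatters a model, consider the most extreme memorizer: a lookup table. This sketch is an illustration only, not how any BigML model works; it copies the training rows from the table above, with decimal commas written as points.

```python
# Training rows from the table above:
# (plasma glucose, bmi, diabetes pedigree, age) -> diabetes
training = {
    (148, 33.6, 0.627, 50): True,
    (85, 26.6, 0.351, 31): False,
    (183, 23.3, 0.672, 32): True,
    (89, 28.1, 0.167, 21): False,
    (137, 43.1, 2.288, 33): True,
    (116, 25.6, 0.201, 30): False,
    (78, 31.0, 0.248, 26): True,
    (115, 35.3, 0.134, 29): False,
    (197, 30.5, 0.158, 53): True,
}


def memorizer(row):
    """Return the stored label for a row seen at training time, else None."""
    return training.get(row)


# Evaluating on the training rows: every lookup succeeds.
hits = sum(memorizer(row) == label for row, label in training.items())
training_accuracy = hits / len(training)
print(training_accuracy)  # 1.0

# A row never seen at training time: the memorizer has no answer.
unseen = (124, 22.0, 0.107, 46)
print(memorizer(unseen))  # None
```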
Evaluation Danger!
• If you only have one Dataset, use a train/test split
Train / Test Split
Train:
plasma glucose  bmi   diabetes pedigree  age  diabetes
148             33,6  0,627              50   TRUE
183             23,3  0,672              32   TRUE
89              28,1  0,167              21   FALSE
78              31    0,248              26   TRUE
115             35,3  0,134              29   FALSE
197             30,5  0,158              53   TRUE
Test:
plasma glucose  bmi   diabetes pedigree  age  diabetes
85              26,6  0,351              31   FALSE
137             43,1  2,288              33   TRUE
116             25,6  0,201              30   FALSE
• These instances were never seen at training time.
• Better evaluation of how the model will perform with “new” data
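A random train/test split can be sketched with the standard library alone. The function name, the `test_fraction` parameter and the fixed `seed` are assumptions for illustration; the slides do not state which proportion or mechanism the demo used.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(9))  # stand-ins for the 9 diabetes rows
train, test = train_test_split(rows, test_fraction=1 / 3)
print(len(train), len(test))  # 6 3
```

Every row lands in exactly one of the two sets, so the model is never evaluated on a row it was trained on.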
Train / Test Split
[Diagram: DATASET → TRAIN SET and TEST SET → PREDICTIONS → METRICS]
Evaluation Danger!
• Even a train/test split may not be enough!
• Might get a “lucky” split
• Solution is to repeat several times (formally to cross validate)
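One common way to "repeat several times" is k-fold cross-validation: each row is used for testing exactly once. The sketch below only produces the index splits; the function name and the default of 5 folds are assumptions, not something the slides prescribe.

```python
import random


def k_fold_splits(n_rows, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds, like dealing cards.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_splits(n_rows=10, k=5))
for train, test in splits:
    print("train:", sorted(train), "test:", sorted(test))
```

A model would be trained and evaluated once per fold, and the metrics averaged, so that a single "lucky" split cannot dominate the result.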
Evaluations Demo #3
What Just Happened?
• Starting with the Diabetes Dataset we created a train/test split
• We built a Model using the train set and evaluated it with the
test set
• The scores were much worse than before, showing the danger
of evaluating with training data.
• Then we launched several other types of models and used the
evaluation comparison tool to see which model algorithm
performed the best.
Question: Couldn’t we search for the best Model? Stay tuned.
Evaluation
• Never evaluate with the training data!
• Many models are able to “memorize” the training data
• This will result in overly optimistic evaluations!
• If you only have one Dataset, use a train/test split
• Even a train/test split may not be enough!
• Might get a “lucky” split
• Solution is to repeat several times (formally to cross validate)
• Don’t forget that accuracy can be misleading!
• Mostly useless with unbalanced classes (left/right?)
• Use weighting, operating points, other tricks…
Weighting
Instance  Rate  Payment  Outcome  Predict  Confidence
1         23 %  134      Paid     Paid     20 %
2         23 %  134      Paid     Paid     25 %
3         23 %  134      Paid     Paid     30 %
...       ...   ...      ...      ...
1000      23 %  134      Paid     Paid     99,5 %
1001      23 %  134      Default  Paid     99,4 %
Problem: Default is “more important”, but occurs less often than Paid
Solution: Weights tell the model to treat instances of a specific class (in this case Default) with more importance
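One simple way to pick weights is to make them inversely proportional to class frequency, so that each class carries the same total weight. This is a sketch of that general idea under the 1000-to-1 imbalance shown in the table; it is not a description of how BigML computes its weights, and the function name is illustrative.

```python
from collections import Counter


def balancing_weights(labels):
    """Weight each class inversely to its frequency so class totals are equal."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: total / (len(counts) * n) for cls, n in counts.items()}


# 1000 Paid instances and a single Default, as in the table above.
labels = ["Paid"] * 1000 + ["Default"] * 1
weights = balancing_weights(labels)
print(weights)  # {'Paid': 0.5005, 'Default': 500.5}
```

With these weights the single Default instance counts as much as all 1000 Paid instances together.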
Operating Points
• The default probability threshold is 50%
• Changing the threshold can change the outcome for a specific class
Rate    Payment  …  Actual Outcome  Probability PAID  Threshold @ 50%  Threshold @ 60%  Threshold @ 90%
8,4 %   $456     …  PAID            95 %              PAID             PAID             PAID
9,6 %   $134     …  PAID            87 %              PAID             PAID             DEFAULT
18 %    $937     …  DEFAULT         36 %              DEFAULT          DEFAULT          DEFAULT
21 %    $35      …  PAID            88 %              PAID             PAID             DEFAULT
17,5 %  $1.044   …  DEFAULT         55 %              PAID             DEFAULT          DEFAULT
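The threshold columns in the table can be reproduced from the "Probability PAID" column alone. In this sketch the function name is illustrative, and treating a probability exactly equal to the threshold as PAID is an assumption (the table contains no such tie).

```python
def classify(prob_paid_pct, threshold_pct):
    """Predict PAID when the PAID probability reaches the threshold."""
    return "PAID" if prob_paid_pct >= threshold_pct else "DEFAULT"


# "Probability PAID" column from the table above, in percent.
prob_paid = [95, 87, 36, 88, 55]

predictions = {
    threshold: [classify(p, threshold) for p in prob_paid]
    for threshold in (50, 60, 90)
}

for threshold, outcome in predictions.items():
    print(f"@ {threshold}%: {outcome}")
```

Raising the threshold moves borderline loans from PAID to DEFAULT, which trades false positives on PAID for more caught defaults.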
Lending Club Dataset
• Peer to Peer lending service
• As an investor, we want a way to
identify loans that are a lower risk
• Fortunately, the data for the outcome
(paid or default) for past loans is
available from Lending Club.
• Using this data, we can build a
model to predict which loans are
good or bad
Instance  Rate   Payment  Outcome
1         8,4 %  456      Paid
2         9,6 %  134      Paid
3         18 %   937      Default
[Diagram: the MODEL is applied to NEW LOANS and labels each one GOOD / BAD]
Evaluations Demo #4
What just happened?
• We split the Lending Club data into training and test Datasets
• We created a Model and Evaluation
• Looking at the Accuracy, we saw that the Model appeared to perform well, but only because of the unbalanced classes
• The resulting Model did well at predicting good loans
• But bad loans are "more important"
• We tried different weights to increase the Recall of bad loans:
• objective balancing: equal consideration
• class weights: bad = 1000, good = 1
• Finally, we explored the impact of changing the probability
threshold
Wait - What about regressions?
Regression - Fitting a Line
[Figure: Data Points with the Model drawn as a line fitted through them]
Mean Absolute Error
[Figure: errors e1 … e7 between each data point and the model line]
$\text{MAE} = \dfrac{|e_1| + |e_2| + \dots + |e_n|}{n}$
Mean Squared Error
[Figure: errors e1 … e7 between each data point and the model line]
$\text{MSE} = \dfrac{(e_1)^2 + (e_2)^2 + \dots + (e_n)^2}{n}$
MSE versus MAE
• For both MAE & MSE: Smaller is better, but values are
unbounded
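• MSE is not always larger than MAE: squaring shrinks errors smaller than 1. What always holds is that RMSE (the square root of MSE) is larger than or equal to MAE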
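Both metrics are simple to compute from a list of errors. The sketch below uses two hypothetical error lists to show how MAE and MSE relate: with errors larger than 1 in magnitude the squares dominate, while with errors smaller than 1 squaring shrinks them, so the guaranteed ordering is between MAE and the square root of MSE (RMSE).

```python
import math


def mean_absolute_error(errors):
    """Average of the absolute errors."""
    return sum(abs(e) for e in errors) / len(errors)


def mean_squared_error(errors):
    """Average of the squared errors."""
    return sum(e * e for e in errors) / len(errors)


# Hypothetical errors larger than 1 in magnitude: MSE exceeds MAE.
large = [1.0, -3.0]
print(mean_absolute_error(large), mean_squared_error(large))  # 2.0 5.0

# Hypothetical errors smaller than 1 in magnitude: MSE falls below MAE.
small = [0.5, -0.5]
mae, mse = mean_absolute_error(small), mean_squared_error(small)
print(mae, mse, math.sqrt(mse))  # 0.5 0.25 0.5
```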
R-Squared Error
[Figure: Data Points, the Model line and the Mean line; e1 … e7 are the errors of the model, v1 … v7 are the errors of always predicting the mean]
$\text{RSE} = 1 - \dfrac{\text{MSE}_{\text{model}}}{\text{MSE}_{\text{mean}}}$
R-Squared Error
• RSE: measure of how much better the model is than always predicting the mean
• < 0: model is worse than the mean
• MSEmodel > MSEmean
• = 0: model is no better than the mean
• MSEmodel = MSEmean
• ➞ 1: model fits the data “perfectly”
• MSEmodel = 0 (or MSEmean >> MSEmodel)
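The three cases above can be checked numerically with RSE = 1 - MSEmodel / MSEmean. This is a minimal sketch with hypothetical values; the function names are illustrative, and it assumes the actual values are not all identical (otherwise MSEmean is zero and the ratio is undefined).

```python
def mean_squared_error(actual, predicted):
    """Average squared difference between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """1 - MSE_model / MSE_mean; assumes the actual values are not all equal."""
    mean = sum(actual) / len(actual)
    mse_model = mean_squared_error(actual, predicted)
    mse_mean = mean_squared_error(actual, [mean] * len(actual))
    return 1 - mse_model / mse_mean


actual = [1.0, 2.0, 3.0, 4.0]

print(r_squared(actual, actual))                # 1.0  (perfect fit)
print(r_squared(actual, [2.5] * 4))             # 0.0  (always predicts the mean)
print(r_squared(actual, [4.0, 3.0, 2.0, 1.0]))  # -3.0 (worse than the mean)
```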
Evaluations Demo #5
What just happened?
• We split the RedFin data into training and test Datasets
• We created a Model and Evaluation
• We examined the Evaluation metrics
Wait - What about Time Series?
Independent Data
Color   Mass  PPAP
red     11    pen
green   45    apple
red     53    apple
yellow  0     pen
blue    2     pen
green   422   pineapple
yellow  555   pineapple
blue    7     pen
Discovering patterns:
• Color = “red” → Mass < 100
• PPAP = “pineapple” → Color ≠ “blue”
• Color = “blue” → PPAP = “pen”
Independent Data
Color   Mass  PPAP
green   45    apple
blue    2     pen
green   422   pineapple
blue    7     pen
yellow  0     pen
yellow  9     pineapple
red     555   apple
red     11    pen
Patterns still hold when rows re-arranged:
• Color = “red” → Mass < 100
• PPAP = “pineapple” → Color ≠ “blue”
• Color = “blue” → PPAP = “pen”
Dependent Data
Year  Pineapple Harvest
1986  50,74
1987  22,03
1988  50,69
1989  40,38
1990  29,80
1991  9,90
1992  73,93
1993  22,95
1994  139,09
1995  115,17
1996  193,88
1997  175,31
1998  223,41
1999  295,03
2000  450,53
[Chart: Pineapple Harvest in Tons (0 to 500) by Year (1986 to 2000), annotated with Trend and Error]
Dependent Data
[Chart: Pineapple Harvest in Tons (0 to 500) by Year (1986 to 2000) for the rearranged table below]
Year  Pineapple Harvest
1986  139,09
1987  175,31
1988  9,91
1989  22,95
1990  450,53
1991  73,93
1992  40,38
1993  22,03
1994  295,03
1995  50,74
1996  29,8
1997  223,41
1998  115,17
1999  193,88
2000  50,69
Rearranging Disrupts Patterns
Random Train / Test Split
Train:
plasma glucose  bmi   diabetes pedigree  age  diabetes
148             33,6  0,627              50   TRUE
183             23,3  0,672              32   TRUE
89              28,1  0,167              21   FALSE
78              31    0,248              26   TRUE
115             35,3  0,134              29   FALSE
197             30,5  0,158              53   TRUE
Test:
plasma glucose  bmi   diabetes pedigree  age  diabetes
85              26,6  0,351              31   FALSE
137             43,1  2,288              33   TRUE
116             25,6  0,201              30   FALSE
Linear Train / Test Split
Train:
Year  Pineapple Harvest
1986  50,74
1987  22,03
1988  50,69
1989  40,38
1990  29,80
1991  9,90
1992  73,93
1993  22,95
1994  139,09
1995  115,17
1996  193,88
Test:
Year  Pineapple Harvest
1997  175,31
1998  223,41
1999  295,03
2000  450,53
[Diagram: a Forecast built from the Train years is COMPARED with the Test years]
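For time-ordered data the split must respect the order: train on the earlier years, test on the later ones. The sketch below copies the pineapple harvest values from the table above (decimal commas written as points); the function name `linear_split` and its `n_test` parameter are illustrative assumptions.

```python
# (year, harvest in tons) from the table above.
harvest = [
    (1986, 50.74), (1987, 22.03), (1988, 50.69), (1989, 40.38), (1990, 29.80),
    (1991, 9.90), (1992, 73.93), (1993, 22.95), (1994, 139.09), (1995, 115.17),
    (1996, 193.88), (1997, 175.31), (1998, 223.41), (1999, 295.03), (2000, 450.53),
]


def linear_split(series, n_test):
    """Keep time order: the last n_test points (n_test > 0) become the test set."""
    ordered = sorted(series)  # tuples sort by year first
    return ordered[:-n_test], ordered[-n_test:]


train, test = linear_split(harvest, n_test=4)
print(train[-1][0], test[0][0])  # 1996 1997
```

Unlike the random split, no shuffling happens here, so the forecast is always compared against years that come after everything the model has seen.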
Evaluation Demo #6
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
 
Module 1 ppt BIG DATA ANALYTICS_NOTES FOR MCA
Module 1 ppt BIG DATA ANALYTICS_NOTES FOR MCAModule 1 ppt BIG DATA ANALYTICS_NOTES FOR MCA
Module 1 ppt BIG DATA ANALYTICS_NOTES FOR MCA
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
 
writing report business partner b1+ .pdf
writing report business partner b1+ .pdfwriting report business partner b1+ .pdf
writing report business partner b1+ .pdf
 
UofT毕业证如何办理
UofT毕业证如何办理UofT毕业证如何办理
UofT毕业证如何办理
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
 
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
 
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
 
Intelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicineIntelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicine
 
Build applications with generative AI on Google Cloud
Build applications with generative AI on Google CloudBuild applications with generative AI on Google Cloud
Build applications with generative AI on Google Cloud
 
原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理
原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理
原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理
 
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
 
一比一原版(harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(harvard毕业证书)哈佛大学毕业证如何办理一比一原版(harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(harvard毕业证书)哈佛大学毕业证如何办理
 
Monthly Management report for the Month of May 2024
Monthly Management report for the Month of May 2024Monthly Management report for the Month of May 2024
Monthly Management report for the Month of May 2024
 
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
 
原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
 
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docxDATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
 
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
 
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
 

VSSML18. Evaluations

  • 1. Valencian Summer School in Machine Learning 4th edition September 13-14, 2018
  • 2. BigML, Inc 2 Evaluations Proving a Model Works Poul Petersen CIO, BigML, Inc
  • 3. Why Evaluations
      • FACT: No model is perfect - they all make mistakes
          • Your data has mistakes
          • Models are “approximations”
      • Today you have seen models that predict:
          • Churn: How many people will churn that we didn’t predict?
          • Diabetes: How many patients might have diabetes that we said were fine?
          • Home Prices: How accurate are the predicted prices?
      • You have also seen several different kinds of models
          • Decision Trees / Ensembles / Logistic Regression / Deepnets
          • Which one works the best for your data?
  • 4. Easy Right?
      INTL MIN   INTL CALLS   INTL CHARGE   CUST SERV CALLS   CHURN   PREDICT CHURN (Model Prediction)
      8,7        4            2,35          1                 False   False
      11,2       5            3,02          0                 False   True
      12,7       6            3,43          4                 True    True
      9,1        5            2,46          0                 False   False
      11,2       2            3,02          1                 False   False
      12,3       5            3,32          3                 False   False
      13,1       6            3,54          4                 False   False
      5,4        9            1,46          4                 True    False
      13,8       4            3,73          1                 False   False
      Look for Mistakes!
  • 6. What Just Happened?
      • We started with the churn Datasource
      • Created a Dataset
      • Built a Model to predict churn
      • We used the Model to predict churn for each customer in the Dataset using a Batch Prediction
      • Downloaded the Batch Prediction as a CSV and looked for errors, that is, rows where the Prediction did not match the known true value for churn
      • The comparison was tedious!
          • Examining one line at a time
          • Hard to understand - need some metrics!!!
  • 7. Evaluation Metrics
      • Imagine we have a model that can predict a person’s dominant hand, that is, for any individual it predicts left / right
      • Define the positive class
          • This selection is arbitrary
          • It is the class you are interested in!
          • The negative class is the “other” class (or others)
      • For this example, we choose: left
  • 8. Evaluation Metrics
      • We choose the positive class: left
      • True Positive (TP): we predicted left and the correct answer was left
      • True Negative (TN): we predicted right and the correct answer was right
      • False Positive (FP): we predicted left but the correct answer was right
      • False Negative (FN): we predicted right but the correct answer was left
  • 9. Evaluation Metrics: Remember…
      • True Positive: Correctly predicted the positive class
      • True Negative: Correctly predicted the negative class
      • False Positive: Incorrectly predicted the positive class
      • False Negative: Incorrectly predicted the negative class
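The four outcome types can be counted mechanically instead of reading a CSV line by line. The sketch below is only an illustration: it reuses the nine CHURN / PREDICT CHURN values from the slide 4 table, takes True (churn) as the positive class, and the function name `confusion_counts` is a made-up helper, not part of any BigML API.

```python
# Actual and predicted churn values, copied from the nine-row table on slide 4.
actual    = [False, False, True, False, False, False, False, True, False]
predicted = [False, True,  True, False, False, False, False, False, False]


def confusion_counts(actual, predicted, positive=True):
    """Count TP, TN, FP and FN for the chosen positive class (hypothetical helper)."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1          # predicted positive, and it was positive
        elif p != positive and a != positive:
            tn += 1          # predicted negative, and it was negative
        elif p == positive and a != positive:
            fp += 1          # predicted positive, but it was negative
        else:
            fn += 1          # predicted negative, but it was positive
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}


counts = confusion_counts(actual, predicted)
print(counts)  # {'TP': 1, 'TN': 6, 'FP': 1, 'FN': 1}
```

Choosing False as the positive class instead simply swaps the roles of the counts, which is why the choice of positive class has to be stated before reading any metric.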
  • 10. Accuracy
      Accuracy = (TP + TN) / Total
      • “Percentage correct” - like an exam
      • If Accuracy = 1 then no mistakes
      • If Accuracy = 0 then all mistakes
      • Intuitive but not always useful
      • Watch out for unbalanced classes!
          • Ex: 90% of people are right-handed and 10% are left-handed
          • A silly model which always predicts right-handed is 90% accurate
  • 11. Accuracy (example; positive class = left, negative class = right)
      • Classified as left-handed: TP = 0, FP = 0
      • Classified as right-handed: TN = 7, FN = 3
      • Accuracy = (TP + TN) / Total = (0 + 7) / 10 = 70%
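The unbalanced-class trap can be reproduced in a few lines. This is a minimal sketch using the ten people from the example above (3 left-handed, 7 right-handed) and a "silly" model that always answers right; the variable names are illustrative only.

```python
# Ten people: 3 left-handed (positive class) and 7 right-handed (negative class).
actual = ["left"] * 3 + ["right"] * 7

# A "silly" model that ignores its input and always predicts right.
predicted = ["right"] * len(actual)

# Accuracy = (TP + TN) / Total, i.e. the fraction of matching labels.
correct = sum(a == p for a, p in zip(actual, predicted))
accuracy = correct / len(actual)

# How many of the left-handers did the model actually find?
left_found = sum(a == "left" and p == "left" for a, p in zip(actual, predicted))

print(f"accuracy = {accuracy:.0%}, left-handers found = {left_found}")
```

The accuracy looks respectable even though not a single member of the positive class was identified, which is the reason the following metrics exist.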
  • 12. Precision
      Precision = TP / (TP + FP)
      • “Accuracy” or “purity” of the positive class
      • How well you did separating the positive class from the negative class
      • If Precision = 1 then no FP
          • You may have missed some left-handers, but of the ones you identified, all are left-handed. No mistakes.
      • If Precision = 0 then no TP
          • None of the left-handers you identified are actually left-handed. All mistakes.
  • 13. Precision (example; positive class = left, negative class = right)
      • Classified as left-handed: TP = 2, FP = 2
      • Classified as right-handed: TN = 5, FN = 1
      • Precision = TP / (TP + FP) = 2 / 4 = 50%
  • 14. Recall
      Recall = TP / (TP + FN)
      • Percentage of the positive class correctly identified
      • A measure of how well you identified all of the positive class examples
      • If Recall = 1 then no FN → all left-handers identified
          • There may be FP, so precision could be < 1
      • If Recall = 0 then no TP → no left-handers identified
  • 15. Recall (example; positive class = left, negative class = right)
      • Classified as left-handed: TP = 2, FP = 2
      • Classified as right-handed: TN = 5, FN = 1
      • Recall = TP / (TP + FN) = 2 / 3 ≈ 66.7%
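Both ratios are easy to check by hand or in code. The sketch below plugs in the counts from the example (TP = 2, FP = 2, FN = 1); returning 0.0 when the denominator is zero is an assumption of this sketch, since the ratio is formally undefined in that case.

```python
def precision(tp, fp):
    """Precision = TP / (TP + FP); 0.0 is returned if nothing was classified positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Recall = TP / (TP + FN); 0.0 is returned if there are no positive instances."""
    return tp / (tp + fn) if (tp + fn) else 0.0


p = precision(tp=2, fp=2)   # 2 of the 4 people classified as left-handed really are
r = recall(tp=2, fn=1)      # 2 of the 3 real left-handers were found

print(f"precision = {p:.1%}, recall = {r:.1%}")
```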
  • 16. f-Measure
      f-measure = 2 * Recall * Precision / (Recall + Precision)
      • Harmonic mean of Recall & Precision
      • If f-measure = 1 then Recall == Precision == 1
      • If Precision OR Recall is small then the f-measure is small
  • 17. f-Measure (example; positive class = left, negative class = right)
      • R = 2 / 3 ≈ 66.7%
      • P = 50%
      • f = 2 * R * P / (R + P) = 4 / 7 ≈ 57%
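A short sketch of the harmonic mean, using the recall and precision of the running example (R = 2/3, P = 1/2). The second call uses made-up values to illustrate how one small component drags the score down; the zero-denominator fallback is an assumption of this sketch.

```python
def f_measure(recall, precision):
    """Harmonic mean of recall and precision; 0.0 if both are zero."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


f_example = f_measure(recall=2 / 3, precision=0.5)

# Made-up values: a very high recall cannot compensate for a tiny precision.
f_lopsided = f_measure(recall=0.99, precision=0.01)

print(f"f (example)  = {f_example:.1%}")
print(f"f (lopsided) = {f_lopsided:.1%}")
```

An arithmetic mean of 0.99 and 0.01 would be 0.5; the harmonic mean stays close to the smaller of the two values.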
  • 18. Phi Coefficient
      Phi = (TP*TN - FP*FN) / SQRT[(TP+FP)(TP+FN)(TN+FP)(TN+FN)]
      • Returns a value between -1 and 1
      • If -1 then predictions are opposite reality
      • If 0 then no correlation between predictions and reality
      • If 1 then predictions are always correct
  • 19. Phi Coefficient (example; positive class = left, negative class = right)
      • Classified as left-handed: TP = 2, FP = 2
      • Classified as right-handed: TN = 5, FN = 1
      • Phi = (2*5 - 2*1) / SQRT[4 * 3 * 7 * 6] = 8 / SQRT[504] ≈ 0.356
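The phi value in the example can be verified with the formula from the previous slide. This is a sketch only; returning 0.0 when a factor of the denominator is zero (for instance when nothing is classified as positive) is a common convention adopted here as an assumption.

```python
import math


def phi_coefficient(tp, tn, fp, fn):
    """Phi = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # assumed convention: undefined cases are reported as 0
    return (tp * tn - fp * fn) / denominator


phi = phi_coefficient(tp=2, tn=5, fp=2, fn=1)
print(round(phi, 3))  # 0.356
```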
  • 21. What Just Happened?
      • Starting with the Diabetes Source, we created a Dataset and then a Model.
      • Using both the Model and the original Dataset, we created an Evaluation.
      • We reviewed the metrics provided by the Evaluation:
          • Confusion Matrix
          • Accuracy, Precision, Recall, f-measure and phi
      • This Model seemed to perform really, really well…
      Question: Can we trust this model?
  • 22. Evaluation Danger!
      • Never evaluate with the training data!
          • Many models are able to “memorize” the training data
          • This will result in overly optimistic evaluations!
  • 23. “Memorizing” Training Data
      Training:
      plasma glucose   bmi    diabetes pedigree   age   diabetes
      148              33,6   0,627               50    TRUE
      85               26,6   0,351               31    FALSE
      183              23,3   0,672               32    TRUE
      89               28,1   0,167               21    FALSE
      137              43,1   2,288               33    TRUE
      116              25,6   0,201               30    FALSE
      78               31     0,248               26    TRUE
      115              35,3   0,134               29    FALSE
      197              30,5   0,158               53    TRUE
      Evaluating:
      plasma glucose   bmi    diabetes pedigree   age   diabetes
      148              33,6   0,627               50    ?
      85               26,6   0,351               31    ?
      • Exactly the same values!
      • Who needs a model?
      • What we want to know is how the model performs with values never seen at training:
      124              22     0,107               46    ?
  • 24. Evaluation Danger!
      • Never evaluate with the training data!
          • Many models are able to “memorize” the training data
          • This will result in overly optimistic evaluations!
          • If you only have one Dataset, use a train/test split
  • 25. Train / Test Split
      Train:
      plasma glucose   bmi    diabetes pedigree   age   diabetes
      148              33,6   0,627               50    TRUE
      183              23,3   0,672               32    TRUE
      89               28,1   0,167               21    FALSE
      78               31     0,248               26    TRUE
      115              35,3   0,134               29    FALSE
      197              30,5   0,158               53    TRUE
      Test:
      plasma glucose   bmi    diabetes pedigree   age   diabetes
      85               26,6   0,351               31    FALSE
      137              43,1   2,288               33    TRUE
      116              25,6   0,201               30    FALSE
      • These instances were never seen at training time.
      • Better evaluation of how the model will perform with “new” data
  • 26. Train / Test Split (diagram labels: DATASET, TRAIN SET, TEST SET, PREDICTIONS, METRICS)
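A random train/test split can be sketched in plain Python as follows. The function name `train_test_split`, the fixed seed and the one-third test fraction are illustrative assumptions (the slides do not state a ratio); the nine integers stand in for the nine rows of the diabetes table.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and hold out a fraction of them for testing."""
    rows = list(rows)                       # copy, so the caller's list is untouched
    random.Random(seed).shuffle(rows)       # seeded, so the split is repeatable
    n_test = max(1, round(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]     # (train set, test set)


dataset = list(range(9))                    # stand-in for the nine table rows
train_set, test_set = train_test_split(dataset, test_fraction=1 / 3)

print(len(train_set), "training rows,", len(test_set), "test rows")
```

The model is then built from `train_set` only, and the metrics are computed from its predictions on `test_set`.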
  • 27. Evaluation Danger!
      • Never evaluate with the training data!
          • Many models are able to “memorize” the training data
          • This will result in overly optimistic evaluations!
          • If you only have one Dataset, use a train/test split
      • Even a train/test split may not be enough!
          • Might get a “lucky” split
          • Solution is to repeat several times (formally to cross validate)
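One way to "repeat several times" is k-fold cross-validation: the data is cut into k folds and each fold serves as the test set exactly once. The generator below is a hedged sketch; `k_fold_indices`, the seed and k = 3 are assumptions made for illustration.

```python
import random


def k_fold_indices(n_rows, k, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)        # shuffle once so folds are random
    folds = [indices[i::k] for i in range(k)]   # k folds of roughly equal size
    for i, test_idx in enumerate(folds):
        # Every fold except the i-th one is used for training.
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, test_idx


splits = list(k_fold_indices(n_rows=9, k=3))
for train_idx, test_idx in splits:
    print("train:", sorted(train_idx), "test:", sorted(test_idx))
```

A model would be trained and evaluated once per split, and the k scores averaged, so that a single "lucky" split no longer decides the result.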
  • 29. What Just Happened?
      • Starting with the Diabetes Dataset we created a train/test split
      • We built a Model using the train set and evaluated it with the test set
      • The scores were much worse than before, showing the danger of evaluating with training data.
      • Then we launched several other types of models and used the evaluation comparison tool to see which model algorithm performed the best.
      Question: Couldn’t we search for the best Model? STAY TUNED
  • 30. Evaluation
      • Never evaluate with the training data!
          • Many models are able to “memorize” the training data
          • This will result in overly optimistic evaluations!
          • If you only have one Dataset, use a train/test split
      • Even a train/test split may not be enough!
          • Might get a “lucky” split
          • Solution is to repeat several times (formally to cross validate)
      • Don’t forget that accuracy can be misleading!
          • Mostly useless with unbalanced classes (left/right?)
          • Use weighting, operating points, other tricks…
  • 31. Weighting
      Instance   Rate   Payment   Outcome   Predict   Confidence
      1          23 %   134       Paid      Paid      20 %
      2          23 %   134       Paid      Paid      25 %
      3          23 %   134       Paid      Paid      30 %
      ...        ...    ...       ...       ...
      1000       23 %   134       Paid      Paid      99,5 %
      1001       23 %   134       Default   Paid      99,4 %
      • Problem: Default is “more important”, but occurs less often than Paid
      • Solution: Weights tell the model to treat instances of a specific class (in this case Default) with more importance
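One common heuristic for choosing weights is to make each class contribute the same total weight, so a rare class gets a proportionally larger weight per instance. The sketch below applies that heuristic to the 1000 Paid / 1 Default situation in the table. It is an assumption for illustration and is not claimed to be how BigML computes its weights; `balanced_class_weights` is a hypothetical helper.

```python
from collections import Counter


def balanced_class_weights(outcomes):
    """Weight each class by total / (number_of_classes * class_count)."""
    counts = Counter(outcomes)
    total = len(outcomes)
    n_classes = len(counts)
    return {label: total / (n_classes * count) for label, count in counts.items()}


outcomes = ["Paid"] * 1000 + ["Default"]      # class balance from the table above
weights = balanced_class_weights(outcomes)

for label, weight in weights.items():
    print(f"{label}: weight per instance = {weight}")
```

With these weights the single Default instance carries as much total weight as the 1000 Paid instances combined.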
  • 32. Operating Points
      • The default probability threshold is 50%
      • Changing the threshold can change the outcome for a specific class
      Rate     Payment   …   Actual Outcome   Probability PAID   Threshold @ 50%   Threshold @ 60%   Threshold @ 90%
      8,4 %    $456      …   PAID             95 %               PAID              PAID              PAID
      9,6 %    $134      …   PAID             87 %               PAID              PAID              DEFAULT
      18 %     $937      …   DEFAULT          36 %               DEFAULT           DEFAULT           DEFAULT
      21 %     $35       …   PAID             88 %               PAID              PAID              DEFAULT
      17,5 %   $1.044    …   DEFAULT          55 %               PAID              DEFAULT           DEFAULT
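The threshold columns of the table can be reproduced with a one-line rule. In this sketch the PAID probabilities are copied from the table; treating a probability exactly equal to the threshold as PAID (`>=`) is an assumption, since no row in the table sits on a boundary.

```python
def classify(prob_paid, threshold=0.5):
    """Predict PAID only if the model's PAID probability reaches the threshold."""
    return "PAID" if prob_paid >= threshold else "DEFAULT"


# "Probability PAID" column from the table, in row order.
prob_paid = [0.95, 0.87, 0.36, 0.88, 0.55]

for threshold in (0.5, 0.6, 0.9):
    predictions = [classify(p, threshold) for p in prob_paid]
    print(f"threshold {threshold:.0%}: {predictions}")
```

Raising the threshold turns borderline PAID predictions into DEFAULT, which trades missed good loans for fewer missed bad ones.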
  • 33. Lending Club Dataset
      • Peer to Peer lending service
      • As an investor, we want a way to identify loans that are a lower risk
      • Fortunately, the data for the outcome (paid or default) for past loans is available from Lending Club.
      • Using this data, we can build a model to predict which loans are good or bad
      Instance   Rate    Payment   Outcome
      1          8,4 %   456       Paid
      2          9,6 %   134       Paid
      3          18 %    937       Default
      (diagram labels: MODEL, NEW LOANS, GOOD / BAD)
  • 35. What just happened?
      • We split the Lending Club data into training and test Datasets
      • We created a Model and Evaluation
      • Looking at the Accuracy, we saw that the Model appeared to perform well, but only because of unbalanced classes
          • The resulting Model did well at predicting good loans
          • But bad loans are "more important"
      • We tried different weights to increase the Recall of bad loans:
          • objective balancing: equal consideration
          • class weights: bad = 1000, good = 1
      • Finally, we explored the impact of changing the probability threshold
      Wait - What about regressions?
  • 36. Regression - Fitting a Line (figure: data points and the fitted model line)
  • 37. Mean Absolute Error (figure: errors e1 … e7 between the data points and the model)
      MAE = (|e1| + |e2| + … + |en|) / n
  • 38. Mean Squared Error (figure: errors e1 … e7 between the data points and the model)
      MSE = ((e1)^2 + (e2)^2 + … + (en)^2) / n
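  • 39. MSE versus MAE
      • For both MAE & MSE: smaller is better; both are at least 0 but have no upper bound
      • MSE is not always larger than MAE: squaring shrinks errors that are smaller than 1. What always holds is RMSE >= MAE, where RMSE is the square root of MSE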
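Both metrics take only a few lines to compute. The sketch below uses two made-up error lists (they are not from the slides) to show how the comparison between MSE and MAE depends on the size of the errors, while the square root of MSE never falls below MAE.

```python
import math


def mae(errors):
    """Mean Absolute Error: average of |e_i|."""
    return sum(abs(e) for e in errors) / len(errors)


def mse(errors):
    """Mean Squared Error: average of e_i squared."""
    return sum(e * e for e in errors) / len(errors)


small_errors = [0.5, -0.5, 0.2, -0.1]   # made-up errors, all below 1 in size
large_errors = [2.0, -3.0, 4.0]         # made-up errors, all above 1 in size

for name, errors in (("small", small_errors), ("large", large_errors)):
    print(f"{name}: MAE = {mae(errors):.4f}, MSE = {mse(errors):.4f}, "
          f"RMSE = {math.sqrt(mse(errors)):.4f}")
```

Because MSE is in squared units, comparing it directly with MAE mixes units; RMSE is the like-for-like comparison.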
  • 40. R-Squared Error (figure: data points, the model line and the mean line)
  • 41. R-Squared Error (figure: distances v1 … v7 from the data points to the mean)
  • 42. R-Squared Error (figure: errors e1 … e7 to the model and distances v1 … v7 to the mean)
      RSE = 1 - MSEmodel / MSEmean
  • 43. R-Squared Error
      RSE = 1 - MSEmodel / MSEmean
      • RSE: measure of how much better the model is than always predicting the mean
      • < 0: model is worse than the mean (MSEmodel > MSEmean)
      • = 0: model is no better than the mean (MSEmodel = MSEmean)
      • ➞ 1: model fits the data “perfectly” (MSEmodel = 0, or MSEmean >> MSEmodel)
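The three cases can be checked numerically. In this sketch the five target values and the three sets of predictions are made up for illustration; the function assumes the actual values are not all identical, since MSEmean would then be zero.

```python
def mse(actual, predicted):
    """Mean squared error between two equally long lists."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """RSE = 1 - MSEmodel / MSEmean (assumes the actual values are not all equal)."""
    mean = sum(actual) / len(actual)
    mse_model = mse(actual, predicted)
    mse_mean = mse(actual, [mean] * len(actual))   # baseline: always predict the mean
    return 1 - mse_model / mse_mean


actual = [1.0, 2.0, 3.0, 4.0, 5.0]                  # made-up target values

perfect = r_squared(actual, actual)                 # model reproduces the data
baseline = r_squared(actual, [3.0] * 5)             # model always predicts the mean
reversed_fit = r_squared(actual, actual[::-1])      # model gets the trend backwards

print(perfect, baseline, reversed_fit)              # 1.0 0.0 -3.0
```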
  • 45. What just happened?
      • We split the RedFin data into training and test Datasets
      • We created a Model and Evaluation
      • We examined the Evaluation metrics
      Wait - What about Time Series?
  • 46. Independent Data
      Color    Mass   PPAP
      red      11     pen
      green    45     apple
      red      53     apple
      yellow   0      pen
      blue     2      pen
      green    422    pineapple
      yellow   555    pineapple
      blue     7      pen
      Discovering patterns:
      • Color = “red” → Mass < 100
      • PPAP = “pineapple” → Color ≠ “blue”
      • Color = “blue” → PPAP = “pen”
  • 47. Independent Data
      Color    Mass   PPAP
      green    45     apple
      blue     2      pen
      green    422    pineapple
      blue     7      pen
      yellow   0      pen
      yellow   9      pineapple
      red      555    apple
      red      11     pen
      Patterns still hold when rows are re-arranged:
      • Color = “red” → Mass < 100
      • PPAP = “pineapple” → Color ≠ “blue”
      • Color = “blue” → PPAP = “pen”
  • 48. Dependent Data
      Year   Pineapple Harvest
      1986   50,74
      1987   22,03
      1988   50,69
      1989   40,38
      1990   29,80
      1991   9,90
      1992   73,93
      1993   22,95
      1994   139,09
      1995   115,17
      1996   193,88
      1997   175,31
      1998   223,41
      1999   295,03
      2000   450,53
      (chart: Pineapple Harvest in Tons, 0 to 500, by Year, 1986 to 2000, annotated with Trend and Error)
  • 49. Dependent Data
      Year   Pineapple Harvest
      1986   139,09
      1987   175,31
      1988   9,91
      1989   22,95
      1990   450,53
      1991   73,93
      1992   40,38
      1993   22,03
      1994   295,03
      1995   50,74
      1996   29,8
      1997   223,41
      1998   115,17
      1999   193,88
      2000   50,69
      (chart: Pineapple Harvest in Tons, 0 to 500, by Year, 1986 to 2000)
      Rearranging Disrupts Patterns
  • 50. Random Train / Test Split
      Train:
      plasma glucose   bmi    diabetes pedigree   age   diabetes
      148              33,6   0,627               50    TRUE
      183              23,3   0,672               32    TRUE
      89               28,1   0,167               21    FALSE
      78               31     0,248               26    TRUE
      115              35,3   0,134               29    FALSE
      197              30,5   0,158               53    TRUE
      Test:
      plasma glucose   bmi    diabetes pedigree   age   diabetes
      85               26,6   0,351               31    FALSE
      137              43,1   2,288               33    TRUE
      116              25,6   0,201               30    FALSE
  • 51. Linear Train / Test Split
      Train:
      Year   Pineapple Harvest
      1986   50,74
      1987   22,03
      1988   50,69
      1989   40,38
      1990   29,80
      1991   9,90
      1992   73,93
      1993   22,95
      1994   139,09
      1995   115,17
      1996   193,88
      Test (COMPARE with the Forecast):
      Year   Pineapple Harvest
      1997   175,31
      1998   223,41
      1999   295,03
      2000   450,53
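For ordered data the split is made by position rather than at random: the earliest rows train the model and the most recent rows are held back for comparison with the forecast. The sketch below uses the pineapple harvest values from the table (decimal commas written as points); the cutoff year 1996 matches the table, while the function name `linear_split` is a hypothetical helper.

```python
# Pineapple harvest by year, from the table above (decimal commas written as points).
harvest = {
    1986: 50.74, 1987: 22.03, 1988: 50.69, 1989: 40.38, 1990: 29.80,
    1991: 9.90, 1992: 73.93, 1993: 22.95, 1994: 139.09, 1995: 115.17,
    1996: 193.88, 1997: 175.31, 1998: 223.41, 1999: 295.03, 2000: 450.53,
}


def linear_split(series, last_train_year):
    """Split a {year: value} series chronologically, without shuffling."""
    train = {year: v for year, v in sorted(series.items()) if year <= last_train_year}
    test = {year: v for year, v in sorted(series.items()) if year > last_train_year}
    return train, test


train, test = linear_split(harvest, last_train_year=1996)

print("train years:", min(train), "to", max(train))
print("test years: ", min(test), "to", max(test))
```

Every training year precedes every test year, so the evaluation never lets the model "see the future".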