SlideShare a Scribd company logo
1 of 31
Download to read offline
Evaluating Machine Learning Models I
Cèsar Ferri Ramírez
Universitat Politècnica de València
2!
q  Machine Learning Tasks
q  Classification
v  Imbalanced problems, probabilistic classifiers, rankers
q  Regression
q  Unsupervised Learning
q  Lessons learned
Outline
3!
q  Supervised learning: The problem is presented with
example inputs and their desired outputs and the goal is
to learn a general rule that maps inputs to outputs.
o  Classification: Output is categorical
o  Regression: Output is numerical
q  Unsupervised learning: No labels are given to the
learning algorithm, leaving it on its own to find structure
in its input. 
o  Clustering, association rules…
q  Reinforcement learning: A computer program interacts
with a dynamic environment in which it must perform a
certain goal: Driving a vehicle or videogames.
Machine Learning Tasks
4!
q  Classification: problem of identifying to which of a
set of categories a new observation belongs, on the
basis of a training set of data containing
observations whose category membership is
known. 
o  Spam filters
o  Face identification
o  Diagnosis of patients
Evaluation of Classifiers
5!
q  Given a set S of n instances, we define
classification error: 
Evaluation of Classifiers
∑∈
∂=
Sx
S xhxf
n
herror ))(),((
1
)(
Where δ(a,b)=0 if a=b and 1otherwise.
Predicted
class(h(x))
Actual class (f(x)) Error
Buy Buy No
No Buy Buy Yes
Buy No Buy Yes
Buy Buy No
No Buy No Buy No
No Buy Buy Yes
No Buy No Buy No
Buy Buy No
Buy Buy No
No Buy No Buy No
Error = 3/10 = 0.3
Mistakes/ Total
6!
q  Common solution:
o  Split between training and test data
Split the data
training
test
Models
Evaluation
Best model
∑∈
−=
Sx
S xhxf
n
herror 2
))()((
1
)(
data
Algorithms
What if there is not much data available?
GOLDEN RULE: Never use the same example
for training the model and evaluating it!!
7!
q  Too much training data: poor evaluation
q  Too much test data: poor training
q  Can we have more training data and more test data
without breaking the golden rule?
o  Repeat the experiment!
ü  Bootstrap: we perform n samples (with repetition) and test
with the rest.
ü  Cross validation: Data is split in n folds of equal size. 
Taking the most of the data
8!
q  What dataset do we use to estimate all previous
metrics?
o  If we use all data to train the models and evaluate them, we
get overoptimistic models:
ü  Over-fitting:
o  If we try to compensate by generalising the model (e.g.,
pruning a tree), we may get:
ü  Under-fitting:
o  How can we find a trade-off?
Overfitting?
9!
q  Confusion (contingency) matrix:
o  We can observe how errors are distributed.
Confusion Matrix
c Buy No Buy
Buy 4 1
No Buy 2 3
Actual
Pred.
10!
q  For two classes:
Confusion Matrix
+ -
+
-
TP
FN
FP
TN
actual
predicted
TP+FN FP+TN
true positive false positive
false negative true negative
Accuracy=​ 𝑇 𝑃+ 𝑇𝑁/𝑁 ! Error=1-Accuracy=​ 𝐹 𝑃+ 𝐹𝑁/𝑁 !
11!
q  For two classes:
Confusion Matrix
+ -
+
-
TP
FN
FP
TN
actual
predicted
TP+FN FP+TN
true positive false positive
false negative true negative
TPRate, Sensitivity=​ 𝑇 𝑃/𝑇𝑃+ 𝐹𝑁 !
TNRate, Specificity =​ 𝑇 𝑁/𝐹𝑃+ 𝑇𝑁 !
12!
q  For two classes:
Confusion Matrix
+ -
+
-
TP
FN
FP
TN
actual
predicted
TP+FN FP+TN
true positive false positive
false negative true negative
TPRate, Sensitivity, Recall=​ 𝑇 𝑃/𝑇𝑃+ 𝐹𝑁 !
Positive predictive value (PPV), Precision =​ 𝑇 𝑃/𝑇𝑃+ 𝐹𝑃 !
13!
q  Common measures in IR: Precision and Recall
q  Are defined in terms of a set of retrieved documents
and a set of relevant documents.
o  Precision: Fraction of retrieved documents that are
relevant to the query
o  Recall: Percent of all relevant documents that is
returned by the search
q  Both measures are usually combined in one
(harmonic mean):
Information Retrieval
asure=2​ 𝑃 𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛× 𝑅𝑒𝑐𝑎𝑙𝑙/𝑃𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛× 𝑅𝑒𝑐𝑎𝑙𝑙 =​2 𝑇𝑃/2 𝑇𝑃+ 𝐹𝑃+ 𝐹𝑁
14!
q  Confusion (contingency) tables can be multiclass
q  Measures based on 2-class matrices are computed
o  1 vs all (average N partial measures)
o  1 vs 1 (average N*(N-1)/2 partial measures)
ü  Weighted average?
Multiclass problems
ERROR actual
low medium high
low 20 0 13
medium 5 15 4predicted
high 4 7 60
15!
q  In some cases we can find important differences
among proportion of classes
o  A naïve classifier that always predictive majority class
(ignoring minority classes) obtains good performance
ü  In a binary problem (+/-) with 1% of negative instances,
model “always positive” gets an accuracy of 99%.
q  Macro-accuracy: Average of accuracy per class
o  The naïve classifier gets a macro-accuracy=0.5
Imbalanced Datasets
m
total
hits
total
hits
total
Hits
hmacroacc mclass
mclass
2class
2class
1class
1class
...
)(
+++
=
q  Crisp and Soft Classifiers:
o  A “hard” or “crisp” classifier predicts a class between a
set of possible classes.
o  A “soft” or “scoring” classifier (probabilistic) predicts a
class, but accompanies each prediction with an
estimation of the reliability (confidence) of each
prediction.
ü  Most learning methods can be adapted to generate this
confidence value.
Soft classifiers
16!
q  A special kind of soft classifier is a class probability
estimator.
o  Instead of predicting “a”, “b” or “c”, it gives a
probability estimation for “a”, “b” or “c”, i.e., “pa”, “pb”
and “pc”. 
ü  Example:
v  Classifier 1: pa = 0.2, pb = 0.5 and pc = 0.3.
v  Classifier 2: pa = 0.3, pb = 0.4 and pc = 0.3.
o  Both predict “b”, but classifier 1 is more confident.
Soft classifiers
17!
q  Probabilistic classifiers: Classifiers that are able to
predict a probability distribution over a set of
classes
o  Provide classification with a degree of certainty:
ü  Combining classifiers
ü  Cost sensitive contexts
q  Mean Squared Error (Brier Score)
q  Log Loss 
Evaluating probabilistic classifiers
18!
∑∑∈ ∈
−=
Si Cj
jipjif
n
MSE ),(),(
1
f(i,j)=1 if instance i is of class j, 0 otherwise.
p(i,j) returns de prob. instance i in class j
∑∑∈ ∈
∗−=
Si Cj
jipjif
n
Logloss )),(log),((
1
2
q  MSE or Brier Score can be decomposed into two
factors:
o  BS=CAL+REF
ü  Calibration: Measures the quality of classifier scores wrt
class membership probabilities 
ü  Refinement: it is an aggregation of resolution and uncertainty,
and is related to the area under the ROC Curve.
q  Calibration Methods: 
o  Try to transform classifier scores into class membership
probabilities
ü  Platt scaling, Isotonic Regression, PAVcal..
Evaluating probabilistic classifiers
19!
q  Brier Curves for analysing classifier performance 
Evaluating probabilistic classifiers
20!
Non calibrated Brier curve ! PAV-calibrated Brier curve !
q  “Rankers”:
o  Whenever we have a probability estimator for a two-
class problem:
ü  pa = x, then pb = 1 ‒ x.
o  Let’s call one class 0 (neg) and the other class 1 (pos).
o  A ranker is a soft classifier that gives a value (score)
monotonically related to the probability of class 1.
ü  Examples:
v  Probability of a customer buying a product.
v  Probability of a message being spam....
Soft classifiers
21!
22!
q  We can rank instances according to estimated
probability
o  CRM: You are interested in the top % of potential
costumers
q  Measures for ranking
o  AUC: Area Ander the ROC Curve
o  Distances between perfect ranking and estimated
ranking
Evaluation of Rankers
23!
q  Regression: In this case the variable to be predicted
is a continuous value.
o  Predicting daily value of stocks in NASDAQ
o  Forecasting number of docks available in a Valenbisi
station in the next hour
o  Predicting the amount of beers sold the next month by
a retail company
Regression
24!
q  Given a set S of n instances,
o  Mean Absolute Error: 

o  Mean Squared Error: 
o  Root Mean Suared Error: 

ü  MSE is more sensitive to extreme values
ü  RMSE and MAE are in the same magnitude of the actual values
Evaluation of Regressors
∑∈
−=
Sx
S xhxf
n
hMAE )()(
1
)(
∑∈
−=
Sx
S xhxf
n
hMSE 2
))()((
1
)(
∑∈
−=
Sx
S xhxf
n
hRMSE 2
))()((
1
)(
25!
q  Example:
Evaluation of Regressors
Predicted Value(h(x)) Acual Value (f(x)) Error Error2
100 mill. € 102 mill. € 2 4
102 mill. € 110 mill. € 8 64
105 mill. € 95 mill. € 10 100
95 mill. € 75 mill. € 20 400
101 mill. € 103 mill. € 2 4
105 mill. € 110 mill. € 5 25
105 mill. € 98 mill. € 7 49
40 mill. € 32 mill. € 8 64
220 mill. € 215 mill. € 5 25
100 mill. € 103 mill. € 3 9
MSE= 744/10 = 74,4!
MAE= 60/10 =6!
RMSE= sqrt(744/10) = 8.63!
26!
q  Sometimes relative error values are more
appropiate:
o  10% for an error of 50 when predicting 500
q  How much does the scheme improve on simply
predicting the average:
o  Relative Mean Squared Error
o  Relative Mean Absolute Error

Evaluation of Regressors
( )
( )∑
∑
∈
∈
−
−
=
Sx
Sx
S
xff
xhxf
hRSE 2
2
)(
)()(
)(
∑
∑
∈
∈
−
−
=
Sx
Sx
S
xhf
xhxf
hRAE
)(
)()(
)(
27!
q  A related measure is R2 (coefficient of
determination) 
o  Number that indicates how well data fit a statistical
model 
ü  sum of squares of residuals

ü  total sum of squares 
ü  coefficient of determination 

Evaluation of Regressors
( )∑∈
−=
Sx
s xffhSStot
2
)()(
)(
)(
1)(2
hSStot
hSSres
hR
s
s
S −=
( )∑∈
−=
Sx
s xhfhSSres
2
)()(
28!
q  Association Rules: task of discovering interesting
relations between variables in databases.
q  Common metrics:
o  Support: Estimates the popularity of a rule
o  Confidence: Estimates the reliability of a rule
q  Rules are ordered according to a measures that
combine both values
q  No partition train/test. 
Unsupervised Learning
29!
q  Clustering: task of grouping a set of objects in such
a way that objects in the same group (cluster) are
more similar to each other than to those in other
clusters. 
o  Task difficult to evaluate
q  Some evaluation measures based on distance:
o  Distance among borders of clusters
o  Distance among centers (centroids) of clusters
o  Radius and density of clusters
Unsupervised Learning
q  Model evaluation is a fundamental phase in the
knowledge discovery process.
q  In classification, depending on the feature and
context we want to analyse, we need to use the
proper metric.
q  Several (and sometimes equivalent) measures for
regression models.
q  Supervised models are easier to evaluate since we
have an estimate of the ground truth.
Lessons learned
30!
q  Witten, Ian H., and Eibe Frank. Data Mining: Practical machine learning tools
and techniques. Morgan Kaufmann, 2005.
q  Flach, P.A. (2012)“Machine Learning: The Art and Science of Algorithms that
Make Sense of Cambridge University Press.
q  Hand, D.J. (1997) “Construction and Assessment of Classification Rules”,
Wiley.
q  César Ferri, José Hernández-Orallo, R. Modroiu (2009): An experimental
comparison of performance measures for classification. Pattern Recognition
Letters 30(1): 27-38
q  Nathalie Japkowicz, ‎Mohak Shah (2011) “Evaluating Learning Algorithms: A
Classification Perspective”, Cambridge University Press 2011.
q  Antonio Bella, Cèsar Ferri Ramirez, José Hernández-Orallo, M. José Ramírez-
Quintana: On the effect of calibration in classifier combination. Appl. Intell.
38(4): 566-585 (2013)
q  José Hernández-Orallo, Peter A. Flach, Cèsar Ferri Ramirez:Brier Curves: a
New Cost-Based Visualisation of Classifier Performance. ICML 2011: 585-592
To know more
31!

More Related Content

What's hot

What's hot (20)

Machine Learning With Logistic Regression
Machine Learning  With Logistic RegressionMachine Learning  With Logistic Regression
Machine Learning With Logistic Regression
 
Data preprocessing in Machine learning
Data preprocessing in Machine learning Data preprocessing in Machine learning
Data preprocessing in Machine learning
 
Random forest
Random forestRandom forest
Random forest
 
Bayes Classification
Bayes ClassificationBayes Classification
Bayes Classification
 
Ensemble learning Techniques
Ensemble learning TechniquesEnsemble learning Techniques
Ensemble learning Techniques
 
MACHINE LEARNING - GENETIC ALGORITHM
MACHINE LEARNING - GENETIC ALGORITHMMACHINE LEARNING - GENETIC ALGORITHM
MACHINE LEARNING - GENETIC ALGORITHM
 
Support vector machines (svm)
Support vector machines (svm)Support vector machines (svm)
Support vector machines (svm)
 
Linear regression
Linear regressionLinear regression
Linear regression
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
 
Gradient descent method
Gradient descent methodGradient descent method
Gradient descent method
 
Unsupervised learning clustering
Unsupervised learning clusteringUnsupervised learning clustering
Unsupervised learning clustering
 
What is the Expectation Maximization (EM) Algorithm?
What is the Expectation Maximization (EM) Algorithm?What is the Expectation Maximization (EM) Algorithm?
What is the Expectation Maximization (EM) Algorithm?
 
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
 
Classification Based Machine Learning Algorithms
Classification Based Machine Learning AlgorithmsClassification Based Machine Learning Algorithms
Classification Based Machine Learning Algorithms
 
Hierarchical Clustering
Hierarchical ClusteringHierarchical Clustering
Hierarchical Clustering
 
Machine learning session4(linear regression)
Machine learning   session4(linear regression)Machine learning   session4(linear regression)
Machine learning session4(linear regression)
 
Decision trees in Machine Learning
Decision trees in Machine Learning Decision trees in Machine Learning
Decision trees in Machine Learning
 
Performance Metrics for Machine Learning Algorithms
Performance Metrics for Machine Learning AlgorithmsPerformance Metrics for Machine Learning Algorithms
Performance Metrics for Machine Learning Algorithms
 
Machine Learning - Ensemble Methods
Machine Learning - Ensemble MethodsMachine Learning - Ensemble Methods
Machine Learning - Ensemble Methods
 
Bayes Belief Networks
Bayes Belief NetworksBayes Belief Networks
Bayes Belief Networks
 

Viewers also liked

Viewers also liked (20)

Artificial Intelligence Progress - Tom Dietterich
Artificial Intelligence Progress - Tom DietterichArtificial Intelligence Progress - Tom Dietterich
Artificial Intelligence Progress - Tom Dietterich
 
From Turing To Humanoid Robots - Ramón López de Mántaras
From Turing To Humanoid Robots - Ramón López de MántarasFrom Turing To Humanoid Robots - Ramón López de Mántaras
From Turing To Humanoid Robots - Ramón López de Mántaras
 
LR2. Summary Day 2
LR2. Summary Day 2LR2. Summary Day 2
LR2. Summary Day 2
 
evaluation and credibility-Part 2
evaluation and credibility-Part 2evaluation and credibility-Part 2
evaluation and credibility-Part 2
 
PhD Defense - Example-Dependent Cost-Sensitive Classification
PhD Defense - Example-Dependent Cost-Sensitive ClassificationPhD Defense - Example-Dependent Cost-Sensitive Classification
PhD Defense - Example-Dependent Cost-Sensitive Classification
 
An Introduction to NLP4L
An Introduction to NLP4LAn Introduction to NLP4L
An Introduction to NLP4L
 
Bridging the Gap: Machine Learning for Ubiquitous Computing -- Evaluation
Bridging the Gap: Machine Learning for Ubiquitous Computing -- EvaluationBridging the Gap: Machine Learning for Ubiquitous Computing -- Evaluation
Bridging the Gap: Machine Learning for Ubiquitous Computing -- Evaluation
 
L3. Decision Trees
L3. Decision TreesL3. Decision Trees
L3. Decision Trees
 
Comparative Study of Granger Causality Algorithm for Gene Regulatory Network
Comparative Study of Granger Causality Algorithm for Gene Regulatory NetworkComparative Study of Granger Causality Algorithm for Gene Regulatory Network
Comparative Study of Granger Causality Algorithm for Gene Regulatory Network
 
Learning to trust artificial intelligence systems accountability, compliance ...
Learning to trust artificial intelligence systems accountability, compliance ...Learning to trust artificial intelligence systems accountability, compliance ...
Learning to trust artificial intelligence systems accountability, compliance ...
 
L5. Data Transformation and Feature Engineering
L5. Data Transformation and Feature EngineeringL5. Data Transformation and Feature Engineering
L5. Data Transformation and Feature Engineering
 
Data science workshop
Data science workshopData science workshop
Data science workshop
 
Ethical Considerations in the Design of Artificial Intelligence
Ethical Considerations in the Design of Artificial IntelligenceEthical Considerations in the Design of Artificial Intelligence
Ethical Considerations in the Design of Artificial Intelligence
 
AI Techniques for Smart Grids
AI Techniques for Smart GridsAI Techniques for Smart Grids
AI Techniques for Smart Grids
 
Dbm630 lecture08
Dbm630 lecture08Dbm630 lecture08
Dbm630 lecture08
 
Ethics of AI
Ethics of AIEthics of AI
Ethics of AI
 
Machine Learning and Data Mining: 14 Evaluation and Credibility
Machine Learning and Data Mining: 14 Evaluation and CredibilityMachine Learning and Data Mining: 14 Evaluation and Credibility
Machine Learning and Data Mining: 14 Evaluation and Credibility
 
Lecture 3: Basic Concepts of Machine Learning - Induction & Evaluation
Lecture 3: Basic Concepts of Machine Learning - Induction & EvaluationLecture 3: Basic Concepts of Machine Learning - Induction & Evaluation
Lecture 3: Basic Concepts of Machine Learning - Induction & Evaluation
 
The Ethics of Artificial Intelligence
The Ethics of Artificial IntelligenceThe Ethics of Artificial Intelligence
The Ethics of Artificial Intelligence
 
The Ethics of Machine Learning/AI - Brent M. Eastwood
The Ethics of Machine Learning/AI - Brent M. EastwoodThe Ethics of Machine Learning/AI - Brent M. Eastwood
The Ethics of Machine Learning/AI - Brent M. Eastwood
 

Similar to L2. Evaluating Machine Learning Algorithms I

Lecture7 cross validation
Lecture7 cross validationLecture7 cross validation
Lecture7 cross validation
Stéphane Canu
 

Similar to L2. Evaluating Machine Learning Algorithms I (20)

Lecture 8: Machine Learning in Practice (1)
Lecture 8: Machine Learning in Practice (1) Lecture 8: Machine Learning in Practice (1)
Lecture 8: Machine Learning in Practice (1)
 
Big Data Analysis
Big Data AnalysisBig Data Analysis
Big Data Analysis
 
13ClassifierPerformance.pdf
13ClassifierPerformance.pdf13ClassifierPerformance.pdf
13ClassifierPerformance.pdf
 
Advanced Econometrics L5-6.pptx
Advanced Econometrics L5-6.pptxAdvanced Econometrics L5-6.pptx
Advanced Econometrics L5-6.pptx
 
Ordinal Regression and Machine Learning: Applications, Methods, Metrics
Ordinal Regression and Machine Learning: Applications, Methods, MetricsOrdinal Regression and Machine Learning: Applications, Methods, Metrics
Ordinal Regression and Machine Learning: Applications, Methods, Metrics
 
Chapter10 Revised
Chapter10 RevisedChapter10 Revised
Chapter10 Revised
 
Chapter10 Revised
Chapter10 RevisedChapter10 Revised
Chapter10 Revised
 
Chapter10 Revised
Chapter10 RevisedChapter10 Revised
Chapter10 Revised
 
Lecture7 cross validation
Lecture7 cross validationLecture7 cross validation
Lecture7 cross validation
 
Corrleation and regression
Corrleation and regressionCorrleation and regression
Corrleation and regression
 
Estimating Causal Effects from Observations
Estimating Causal Effects from ObservationsEstimating Causal Effects from Observations
Estimating Causal Effects from Observations
 
EvaluationMetrics.pptx
EvaluationMetrics.pptxEvaluationMetrics.pptx
EvaluationMetrics.pptx
 
maXbox starter67 machine learning V
maXbox starter67 machine learning VmaXbox starter67 machine learning V
maXbox starter67 machine learning V
 
Les outils de modélisation des Big Data
Les outils de modélisation des Big DataLes outils de modélisation des Big Data
Les outils de modélisation des Big Data
 
WEKA:Credibility Evaluating Whats Been Learned
WEKA:Credibility Evaluating Whats Been LearnedWEKA:Credibility Evaluating Whats Been Learned
WEKA:Credibility Evaluating Whats Been Learned
 
WEKA: Credibility Evaluating Whats Been Learned
WEKA: Credibility Evaluating Whats Been LearnedWEKA: Credibility Evaluating Whats Been Learned
WEKA: Credibility Evaluating Whats Been Learned
 
Binomial probability distributions
Binomial probability distributions  Binomial probability distributions
Binomial probability distributions
 
07 dimensionality reduction
07 dimensionality reduction07 dimensionality reduction
07 dimensionality reduction
 
Data classification sammer
Data classification sammer Data classification sammer
Data classification sammer
 
Hypothesis and Test
Hypothesis and TestHypothesis and Test
Hypothesis and Test
 

More from Machine Learning Valencia

More from Machine Learning Valencia (10)

L15. Machine Learning - Black Art
L15. Machine Learning - Black ArtL15. Machine Learning - Black Art
L15. Machine Learning - Black Art
 
L14. Anomaly Detection
L14. Anomaly DetectionL14. Anomaly Detection
L14. Anomaly Detection
 
L13. Cluster Analysis
L13. Cluster AnalysisL13. Cluster Analysis
L13. Cluster Analysis
 
L9. Real World Machine Learning - Cooking Predictions
L9. Real World Machine Learning - Cooking PredictionsL9. Real World Machine Learning - Cooking Predictions
L9. Real World Machine Learning - Cooking Predictions
 
L11. The Future of Machine Learning
L11. The Future of Machine LearningL11. The Future of Machine Learning
L11. The Future of Machine Learning
 
L7. A developers’ overview of the world of predictive APIs
L7. A developers’ overview of the world of predictive APIsL7. A developers’ overview of the world of predictive APIs
L7. A developers’ overview of the world of predictive APIs
 
LR1. Summary Day 1
LR1. Summary Day 1LR1. Summary Day 1
LR1. Summary Day 1
 
L6. Unbalanced Datasets
L6. Unbalanced DatasetsL6. Unbalanced Datasets
L6. Unbalanced Datasets
 
L4. Ensembles of Decision Trees
L4. Ensembles of Decision TreesL4. Ensembles of Decision Trees
L4. Ensembles of Decision Trees
 
L1. State of the Art in Machine Learning
L1. State of the Art in Machine LearningL1. State of the Art in Machine Learning
L1. State of the Art in Machine Learning
 

Recently uploaded

Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
amitlee9823
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
amitlee9823
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
amitlee9823
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
MarinCaroMartnezBerg
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
amitlee9823
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
amitlee9823
 

Recently uploaded (20)

Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 

L2. Evaluating Machine Learning Algorithms I

  • 1. Evaluating Machine Learning Models I Cèsar Ferri Ramírez Universitat Politècnica de València
  • 2. 2! q  Machine Learning Tasks q  Classification v  Imbalanced problems, probabilistic classifiers, rankers q  Regression q  Unsupervised Learning q  Lessons learned Outline
  • 3. 3! q  Supervised learning: The problem is presented with example inputs and their desired outputs and the goal is to learn a general rule that maps inputs to outputs. o  Classification: Output is categorical o  Regression: Output is numerical q  Unsupervised learning: No labels are given to the learning algorithm, leaving it on its own to find structure in its input. o  Clustering, association rules… q  Reinforcement learning: A computer program interacts with a dynamic environment in which it must perform a certain goal: Driving a vehicle or videogames. Machine Learning Tasks
  • 4. 4! q  Classification: problem of identifying to which of a set of categories a new observation belongs, on the basis of a training set of data containing observations whose category membership is known. o  Spam filters o  Face identification o  Diagnosis of patients Evaluation of Classifiers
  • 5. 5! q  Given a set S of n instances, we define classification error: Evaluation of Classifiers ∑∈ ∂= Sx S xhxf n herror ))(),(( 1 )( Where δ(a,b)=0 if a=b and 1otherwise. Predicted class(h(x)) Actual class (f(x)) Error Buy Buy No No Buy Buy Yes Buy No Buy Yes Buy Buy No No Buy No Buy No No Buy Buy Yes No Buy No Buy No Buy Buy No Buy Buy No No Buy No Buy No Error = 3/10 = 0.3 Mistakes/ Total
  • 6. 6! q  Common solution: o  Split between training and test data Split the data training test Models Evaluation Best model ∑∈ −= Sx S xhxf n herror 2 ))()(( 1 )( data Algorithms What if there is not much data available? GOLDEN RULE: Never use the same example for training the model and evaluating it!!
  • 7. 7! q  Too much training data: poor evaluation q  Too much test data: poor training q  Can we have more training data and more test data without breaking the golden rule? o  Repeat the experiment! ü  Bootstrap: we perform n samples (with repetition) and test with the rest. ü  Cross validation: Data is split in n folds of equal size. Taking the most of the data
  • 8. 8! q  What dataset do we use to estimate all previous metrics? o  If we use all data to train the models and evaluate them, we get overoptimistic models: ü  Over-fitting: o  If we try to compensate by generalising the model (e.g., pruning a tree), we may get: ü  Under-fitting: o  How can we find a trade-off? Overfitting?
  • 9. 9! q  Confusion (contingency) matrix: o  We can observe how errors are distributed. Confusion Matrix c Buy No Buy Buy 4 1 No Buy 2 3 Actual Pred.
  • 10. 10! q  For two classes: Confusion Matrix + - + - TP FN FP TN actual predicted TP+FN FP+TN true positive false positive false negative true negative Accuracy=​ 𝑇 𝑃+ 𝑇𝑁/𝑁 ! Error=1-Accuracy=​ 𝐹 𝑃+ 𝐹𝑁/𝑁 !
  • 11. 11! q  For two classes: Confusion Matrix + - + - TP FN FP TN actual predicted TP+FN FP+TN true positive false positive false negative true negative TPRate, Sensitivity=​ 𝑇 𝑃/𝑇𝑃+ 𝐹𝑁 ! TNRate, Specificity =​ 𝑇 𝑁/𝐹𝑃+ 𝑇𝑁 !
  • 12. 12! q  For two classes: Confusion Matrix + - + - TP FN FP TN actual predicted TP+FN FP+TN true positive false positive false negative true negative TPRate, Sensitivity, Recall=​ 𝑇 𝑃/𝑇𝑃+ 𝐹𝑁 ! Positive predictive value (PPV), Precision =​ 𝑇 𝑃/𝑇𝑃+ 𝐹𝑃 !
  • 13. 13! q  Common measures in IR: Precision and Recall q  Are defined in terms of a set of retrieved documents and a set of relevant documents. o  Precision: Fraction of retrieved documents that are relevant to the query o  Recall: Percent of all relevant documents that is returned by the search q  Both measures are usually combined in one (harmonic mean): Information Retrieval asure=2​ 𝑃 𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛× 𝑅𝑒𝑐𝑎𝑙𝑙/𝑃𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛× 𝑅𝑒𝑐𝑎𝑙𝑙 =​2 𝑇𝑃/2 𝑇𝑃+ 𝐹𝑃+ 𝐹𝑁
  • 14. 14! q  Confusion (contingency) tables can be multiclass q  Measures based on 2-class matrices are computed o  1 vs all (average N partial measures) o  1 vs 1 (average N*(N-1)/2 partial measures) ü  Weighted average? Multiclass problems ERROR actual low medium high low 20 0 13 medium 5 15 4predicted high 4 7 60
  • 15. 15! q  In some cases we can find important differences among proportion of classes o  A naïve classifier that always predictive majority class (ignoring minority classes) obtains good performance ü  In a binary problem (+/-) with 1% of negative instances, model “always positive” gets an accuracy of 99%. q  Macro-accuracy: Average of accuracy per class o  The naïve classifier gets a macro-accuracy=0.5 Imbalanced Datasets m total hits total hits total Hits hmacroacc mclass mclass 2class 2class 1class 1class ... )( +++ =
  • 16. q  Crisp and Soft Classifiers: o  A “hard” or “crisp” classifier predicts a class between a set of possible classes. o  A “soft” or “scoring” classifier (probabilistic) predicts a class, but accompanies each prediction with an estimation of the reliability (confidence) of each prediction. ü  Most learning methods can be adapted to generate this confidence value. Soft classifiers 16!
  • 17. q  A special kind of soft classifier is a class probability estimator. o  Instead of predicting “a”, “b” or “c”, it gives a probability estimation for “a”, “b” or “c”, i.e., “pa”, “pb” and “pc”. ü  Example: v  Classifier 1: pa = 0.2, pb = 0.5 and pc = 0.3. v  Classifier 2: pa = 0.3, pb = 0.4 and pc = 0.3. o  Both predict “b”, but classifier 1 is more confident. Soft classifiers 17!
  • 18. q  Probabilistic classifiers: Classifiers that are able to predict a probability distribution over a set of classes o  Provide classification with a degree of certainty: ü  Combining classifiers ü  Cost sensitive contexts q  Mean Squared Error (Brier Score) q  Log Loss Evaluating probabilistic classifiers 18! ∑∑∈ ∈ −= Si Cj jipjif n MSE ),(),( 1 f(i,j)=1 if instance i is of class j, 0 otherwise. p(i,j) returns de prob. instance i in class j ∑∑∈ ∈ ∗−= Si Cj jipjif n Logloss )),(log),(( 1 2
  • 19. q  MSE or Brier Score can be decomposed into two factors: o  BS=CAL+REF ü  Calibration: Measures the quality of classifier scores wrt class membership probabilities ü  Refinement: it is an aggregation of resolution and uncertainty, and is related to the area under the ROC Curve. q  Calibration Methods: o  Try to transform classifier scores into class membership probabilities ü  Platt scaling, Isotonic Regression, PAVcal.. Evaluating probabilistic classifiers 19!
  • 20. q  Brier Curves for analysing classifier performance Evaluating probabilistic classifiers 20! Non calibrated Brier curve ! PAV-calibrated Brier curve !
  • 21. q  “Rankers”: o  Whenever we have a probability estimator for a two- class problem: ü  pa = x, then pb = 1 ‒ x. o  Let’s call one class 0 (neg) and the other class 1 (pos). o  A ranker is a soft classifier that gives a value (score) monotonically related to the probability of class 1. ü  Examples: v  Probability of a customer buying a product. v  Probability of a message being spam.... Soft classifiers 21!
  • 22. 22! q  We can rank instances according to estimated probability o  CRM: You are interested in the top % of potential costumers q  Measures for ranking o  AUC: Area Ander the ROC Curve o  Distances between perfect ranking and estimated ranking Evaluation of Rankers
  • 23. 23! q  Regression: In this case the variable to be predicted is a continuous value. o  Predicting daily value of stocks in NASDAQ o  Forecasting number of docks available in a Valenbisi station in the next hour o  Predicting the amount of beers sold the next month by a retail company Regression
  • 24. 24! q  Given a set S of n instances, o  Mean Absolute Error: o  Mean Squared Error: o  Root Mean Suared Error: ü  MSE is more sensitive to extreme values ü  RMSE and MAE are in the same magnitude of the actual values Evaluation of Regressors ∑∈ −= Sx S xhxf n hMAE )()( 1 )( ∑∈ −= Sx S xhxf n hMSE 2 ))()(( 1 )( ∑∈ −= Sx S xhxf n hRMSE 2 ))()(( 1 )(
  • 25. 25! q  Example: Evaluation of Regressors Predicted Value(h(x)) Acual Value (f(x)) Error Error2 100 mill. € 102 mill. € 2 4 102 mill. € 110 mill. € 8 64 105 mill. € 95 mill. € 10 100 95 mill. € 75 mill. € 20 400 101 mill. € 103 mill. € 2 4 105 mill. € 110 mill. € 5 25 105 mill. € 98 mill. € 7 49 40 mill. € 32 mill. € 8 64 220 mill. € 215 mill. € 5 25 100 mill. € 103 mill. € 3 9 MSE= 744/10 = 74,4! MAE= 60/10 =6! RMSE= sqrt(744/10) = 8.63!
  • 26. 26! q  Sometimes relative error values are more appropiate: o  10% for an error of 50 when predicting 500 q  How much does the scheme improve on simply predicting the average: o  Relative Mean Squared Error o  Relative Mean Absolute Error Evaluation of Regressors ( ) ( )∑ ∑ ∈ ∈ − − = Sx Sx S xff xhxf hRSE 2 2 )( )()( )( ∑ ∑ ∈ ∈ − − = Sx Sx S xhf xhxf hRAE )( )()( )(
  • 27. 27! q  A related measure is R2 (coefficient of determination) o  Number that indicates how well data fit a statistical model ü  sum of squares of residuals ü  total sum of squares ü  coefficient of determination Evaluation of Regressors ( )∑∈ −= Sx s xffhSStot 2 )()( )( )( 1)(2 hSStot hSSres hR s s S −= ( )∑∈ −= Sx s xhfhSSres 2 )()(
  • 28. 28! q  Association Rules: task of discovering interesting relations between variables in databases. q  Common metrics: o  Support: Estimates the popularity of a rule o  Confidence: Estimates the reliability of a rule q  Rules are ordered according to a measures that combine both values q  No partition train/test. Unsupervised Learning
  • 29. 29! q  Clustering: task of grouping a set of objects in such a way that objects in the same group (cluster) are more similar to each other than to those in other clusters. o  Task difficult to evaluate q  Some evaluation measures based on distance: o  Distance among borders of clusters o  Distance among centers (centroids) of clusters o  Radius and density of clusters Unsupervised Learning
  • 30. q  Model evaluation is a fundamental phase in the knowledge discovery process. q  In classification, depending on the feature and context we want to analyse, we need to use the proper metric. q  Several (and sometimes equivalent) measures for regression models. q  Supervised models are easier to evaluate since we have an estimate of the ground truth. Lessons learned 30!
  • 31. q  Witten, Ian H., and Eibe Frank. Data Mining: Practical machine learning tools and techniques. Morgan Kaufmann, 2005. q  Flach, P.A. (2012)“Machine Learning: The Art and Science of Algorithms that Make Sense of Cambridge University Press. q  Hand, D.J. (1997) “Construction and Assessment of Classification Rules”, Wiley. q  César Ferri, José Hernández-Orallo, R. Modroiu (2009): An experimental comparison of performance measures for classification. Pattern Recognition Letters 30(1): 27-38 q  Nathalie Japkowicz, ‎Mohak Shah (2011) “Evaluating Learning Algorithms: A Classification Perspective”, Cambridge University Press 2011. q  Antonio Bella, Cèsar Ferri Ramirez, José Hernández-Orallo, M. José Ramírez- Quintana: On the effect of calibration in classifier combination. Appl. Intell. 38(4): 566-585 (2013) q  José Hernández-Orallo, Peter A. Flach, Cèsar Ferri Ramirez:Brier Curves: a New Cost-Based Visualisation of Classifier Performance. ICML 2011: 585-592 To know more 31!