Program Studi Teknik Informatika
Fakultas Teknik – Universitas Surabaya
Model Evaluation
Week 11
1604C055 - Machine Learning
Model Evaluation
• Model evaluation is the process of measuring the performance of a
classification model.
• Aims to estimate the generalization performance of a model on
future (unseen/new/out-of-sample) data.
• Model evaluation is required to build a robust model.
How to Evaluate the Classifier’s
Generalization Performance?
• Assume that we test a binary classifier on some test set and obtain the
following confusion matrix:

                      Predicted class
                      Pos    Neg
  Actual    Pos       TP     FN      → P (= TP + FN)
  class     Neg       FP     TN      → N (= FP + TN)
Metrics for Classifier’s Evaluation
• Accuracy = (TP + TN) / (P + N)
• Error = (FP + FN) / (P + N) = 1 − Accuracy
• Precision = TP / (TP + FP)
• Recall / Sensitivity / TP Rate = TP / P
• FP Rate = FP / N
• Specificity = TN / N = 1 − FP Rate
• F1 score = 2 × (Precision × Recall) / (Precision + Recall)
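As a worked example, the formulas above can be computed directly from confusion-matrix counts. The numbers below are hypothetical, chosen only for illustration:

```python
# Hypothetical confusion-matrix counts: TP = 40, FN = 10, FP = 5, TN = 45
TP, FN, FP, TN = 40, 10, 5, 45
P, N = TP + FN, FP + TN            # actual positives / actual negatives

accuracy    = (TP + TN) / (P + N)  # 85/100 = 0.85
error       = (FP + FN) / (P + N)  # 15/100 = 0.15 = 1 - accuracy
precision   = TP / (TP + FP)       # 40/45
recall      = TP / P               # 40/50 = 0.80 (sensitivity / TP rate)
fp_rate     = FP / N               # 5/50  = 0.10
specificity = TN / N               # 45/50 = 0.90 = 1 - FP rate
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy}, error={error}, f1={f1:.3f}")
```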
Metrics for Classifier’s Evaluation
• For multiclass classification
– Accuracy = (#correctly classified objects) / (#objects)
– Precision and recall for a certain class can be calculated by treating
that class as the positive class and all remaining classes as the
negative class
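For illustration, the one-vs-rest convention can be computed by hand. The tiny label arrays below are made-up examples:

```python
y_true = [0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0, 2]

def one_vs_rest(y_true, y_pred, cls):
    """Precision and recall for `cls`, treating it as the positive class
    and all remaining classes as the negative class."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    return tp / (tp + fp), tp / (tp + fn)

for cls in sorted(set(y_true)):
    prec, rec = one_vs_rest(y_true, y_pred, cls)
    print(f"class {cls}: precision={prec:.3f}, recall={rec:.3f}")
```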
How to Estimate the Metrics?
• We can use:
– Training data;
– Independent test data;
– Hold-out method;
– k-fold cross-validation method;
– Leave-one-out method;
– Bootstrap method;
– And many more…
Estimation with Training Data
• The accuracy/error estimates on the training data are not good indicators of
performance on future data.
– Q: Why?
– A: Because new data will probably not be exactly the same as the training data!
• The accuracy/error estimates on the training data measure the degree of
the classifier’s overfitting.
[Diagram: the classifier is built from the training set and evaluated on
that same training set.]
Estimation with Independent Test Data
• Estimation with independent test data is used when we have plenty of
data and there is a natural way of forming the training and test data.
• For example: Quinlan in 1987 reported experiments in a medical domain
for which the classifiers were trained on data from 1985 and tested on
data from 1986.
[Diagram: the classifier is built from the training set and evaluated on
an independent test set.]
Hold-out Method
• The hold-out method splits the data into training data and test data
(e.g., 60:40, 2/3:1/3, or 70:30).
• Then we build a classifier using the training data and test it using the test data.
• The hold-out method is usually used when we have thousands of instances,
including several hundred instances from each class.
[Diagram: the data is split into a training set, used to build the
classifier, and a test set, used to evaluate it.]
Train, Validation, Test Split
[Diagram: the data with known results is split three ways. The training
set feeds the classifier builder; the validation set is used to evaluate
candidate models and tune parameters; the final test set is used once,
by the final model, for the final evaluation.]
The test data can’t be used for parameter tuning!
Making the Most of the Data
• Once evaluation is complete, all the data can be used to build the
final classifier.
• Generally, the larger the training set, the better the classifier
(though with diminishing returns).
• The larger the test set, the more accurate the error estimate.
Stratification
• The holdout method reserves a certain amount for testing and uses
the remainder for training.
– Example: one third for testing, the rest for training.
• For “unbalanced” datasets, random samples might not be representative.
– There may be few or no instances of some classes in a subset.
• Stratified sampling: an advanced version of balancing the data.
– Make sure that each class is represented with approximately equal
proportions in both subsets.
Repeated Holdout Method
• Holdout estimate can be made more reliable by repeating the
process with different subsamples.
– In each iteration, a certain proportion is randomly selected for training
(possibly with stratification).
– The error rates on the different iterations are averaged to yield an overall
error rate.
• This is called the repeated holdout method.
Repeated Holdout Method
• Still not optimal: the different test sets overlap, and we would like every
instance in the data to be tested at least once.
• Can we prevent overlapping?
(Witten & Eibe)
k-Fold Cross-Validation
• k-fold cross-validation avoids overlapping test sets:
– First step: data is split into k subsets of equal size;
– Second step: each subset in turn is used for testing and the remainder
for training.
• The subsets are stratified before the cross-validation.
• The estimates are averaged to yield an overall estimate.
[Illustration: 3-fold cross-validation — each row is one iteration over the data]
  train | train | test
  train | test  | train
  test  | train | train
More on Cross-Validation
• Standard method for evaluation: stratified 10-fold cross-validation.
• Why 10? Extensive experiments have shown that this is the best
choice to get an accurate estimate.
• Stratification reduces the estimate’s variance.
• Even better: repeated stratified cross-validation:
– E.g. ten-fold cross-validation is repeated ten times and results are
averaged (reduces the variance).
Leave-One-Out Cross-Validation
• Leave-One-Out is a particular form of cross-validation:
– Set number of folds to number of training instances;
– I.e., for n training instances, build classifier n times.
• Makes best use of the data.
• Involves no random sub-sampling.
• Very computationally expensive.
• A disadvantage of Leave-One-Out-CV is that stratification is not
possible:
– It guarantees a non-stratified sample because there is only one instance
in the test set!
Model Evaluation in Python:
Train Test Split
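The code screenshot from this slide did not survive extraction. A minimal scikit-learn sketch of a stratified hold-out split; the Iris dataset and the parameter values are illustrative assumptions, not the slide's original code:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Hold out 30% of the data for testing; stratify=y keeps the class
# proportions approximately equal in both subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)
print(X_train.shape, X_test.shape)  # (105, 4) (45, 4)
```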
Model Evaluation in Python:
Confusion Matrix
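The original code screenshot is missing here as well. A sketch of computing a confusion matrix with scikit-learn, assuming a simple decision tree on the Iris data (both are illustrative choices):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

clf = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
cm = confusion_matrix(y_test, clf.predict(X_test))
print(cm)  # rows = actual class, columns = predicted class
```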
Model Evaluation in Python:
Graphical Confusion Matrix
Model Evaluation in Python:
Evaluation Metrics
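A sketch of computing the evaluation metrics from the earlier slide with scikit-learn (the classifier and dataset are illustrative assumptions). For multiclass data, `average="macro"` takes the unweighted mean of the per-class one-vs-rest scores:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score, classification_report)

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

y_pred = DecisionTreeClassifier(random_state=42).fit(X_train, y_train).predict(X_test)

acc = accuracy_score(y_test, y_pred)
prec = precision_score(y_test, y_pred, average="macro")
rec = recall_score(y_test, y_pred, average="macro")
f1 = f1_score(y_test, y_pred, average="macro")
print(classification_report(y_test, y_pred))  # per-class breakdown
```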
Model Evaluation in Python: Train,
Validation, Test Split
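One way to sketch the three-way split from the earlier slide is two consecutive calls to `train_test_split` (the ratios here are illustrative: 20% test, then 25% of the remainder as validation, i.e. 60/20/20 overall):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# First split off the final test set, then carve a validation set
# out of the remaining data.
X_trval, X_test, y_trval, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X_trval, y_trval, test_size=0.25, stratify=y_trval, random_state=42)

print(len(X_train), len(X_val), len(X_test))  # 90 30 30
```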
Model Evaluation in Python:
Stratified Repeated Hold Out
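The stratified repeated hold-out described earlier can be sketched with `StratifiedShuffleSplit`: each of the 10 iterations draws a fresh stratified 70/30 split, and the per-iteration accuracies are averaged (the classifier and split counts are illustrative assumptions):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# 10 different stratified 70/30 splits; the test sets may overlap.
sss = StratifiedShuffleSplit(n_splits=10, test_size=0.3, random_state=42)
scores = []
for train_idx, test_idx in sss.split(X, y):
    clf = DecisionTreeClassifier(random_state=42)
    clf.fit(X[train_idx], y[train_idx])
    scores.append(clf.score(X[test_idx], y[test_idx]))

print(f"mean accuracy = {np.mean(scores):.3f} +/- {np.std(scores):.3f}")
```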
Model Evaluation in Python: k-Fold
Cross-Validation
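A sketch of the standard stratified 10-fold cross-validation with `cross_val_score` (the decision tree is an illustrative choice of classifier):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Stratified 10-fold CV: each fold serves as the test set exactly once.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_val_score(DecisionTreeClassifier(random_state=42), X, y, cv=cv)

print(f"mean accuracy = {scores.mean():.3f} +/- {scores.std():.3f}")
```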
Model Evaluation in Python:
Leave-One-Out Cross-Validation
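Finally, a sketch of leave-one-out cross-validation: `LeaveOneOut` builds one fold per instance, so for n = 150 Iris samples the classifier is trained 150 times and each score is the result on a single held-out instance (the k-NN classifier is an illustrative choice):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# One fold per instance: n train/test rounds with a single-sample test set.
scores = cross_val_score(KNeighborsClassifier(n_neighbors=3), X, y, cv=LeaveOneOut())

print(len(scores), scores.mean())  # each score is 0 or 1; the mean is the accuracy
```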