ML-Chapter Four:
Model Performance Evaluation
Belay E., Asst. Prof.
e-mail: belayenyew@gmail.com
Mobile: 0946235206
University of Gondar
College of Informatics
Department of Information Technology
Evaluating Model Performance
• In experimental machine learning, we evaluate the accuracy
of a model empirically.
• Evaluation is used to compare classifiers and learning algorithms.
• What is an evaluation metric?
o A way to quantify the performance of a machine learning model.
o It is used to evaluate the performance of a model and to justify
choosing one model over another.
• For classification:
o Confusion matrix, accuracy, precision, recall, specificity, F1
score, precision-recall (PR) curve, and ROC (Receiver Operating
Characteristic) curve
• For prediction (regression): MAE, MSE, RMSE, and R²
Evaluating Classification Model Performance
• Consider a binary classification problem, such as deciding whether
a patient has cancer (positive) or is healthy (negative).
• Some common terminology:
o True positives (TP): predicted positive and
actually positive.
o False positives (FP): predicted positive but
actually negative.
o True negatives (TN): predicted negative and
actually negative.
o False negatives (FN): predicted negative but
actually positive.
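The four terms above can be counted directly from paired lists of actual and predicted labels. The following is a minimal sketch; the function name `confusion_counts` and the toy labels are hypothetical and chosen only for illustration.

```python
def confusion_counts(actual, predicted, positive=1):
    """Return (TP, FP, TN, FN) for a binary classification problem."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1      # predicted positive, actually positive
        elif p == positive:
            fp += 1      # predicted positive, actually negative
        elif a != positive:
            tn += 1      # predicted negative, actually negative
        else:
            fn += 1      # predicted negative, actually positive
    return tp, fp, tn, fn


# Hypothetical toy labels: 1 = cancer (positive), 0 = healthy (negative).
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]

tp, fp, tn, fn = confusion_counts(actual, predicted)
print("TP, FP, TN, FN =", tp, fp, tn, fn)
```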
Evaluating Model Performance
• Confusion matrix: the terms above are summarized
in a confusion matrix.

                   Predicted positive   Predicted negative
Actual positive           TP                   FN
Actual negative           FP                   TN

• TP and TN tell us when the classifier is getting
things right, while FP and FN tell us when the
classifier is getting things wrong.
Evaluating Model Performance
• Accuracy (recognition rate): the number of correct predictions
divided by the total number of instances in the test set.
o The most commonly used metric to judge a model.
o It is not a reliable indicator of performance when classes are
imbalanced.
$$\text{accuracy} = \frac{TP + TN}{TP + FP + TN + FN}$$
• Error rate (misclassification rate):
error rate = 1 - accuracy, or
$$\text{error rate} = \frac{FP + FN}{TP + FP + TN + FN}$$
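Example of Confusion Matrix:
• An example of a confusion matrix for the two classes
buys-computer=yes (positive) and buys-computer=no
(negative):

Actual \ Predicted     buys-computer=yes   buys-computer=no   Total
buys-computer=yes            6954                 46           7000
buys-computer=no              412               2588           3000
Total                        7366               2634          10000

• Accuracy = (TP + TN)/total = (6954 + 2588)/10000 = 0.9542 ≈ 0.95
• Error rate (misclassification rate) = 1 - accuracy, or
Error rate = (FP + FN)/total = (412 + 46)/10000 = 0.0458 ≈ 0.05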
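The arithmetic of this example can be checked with a short script. This is only a sketch: the function names `accuracy` and `error_rate` are hypothetical, and the four counts are the ones used in the calculation above.

```python
def accuracy(tp, fp, tn, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fp + tn + fn)


def error_rate(tp, fp, tn, fn):
    """Fraction of all predictions that are wrong (1 - accuracy)."""
    return (fp + fn) / (tp + fp + tn + fn)


# Counts from the buys-computer example.
tp, fn, fp, tn = 6954, 46, 412, 2588

acc = accuracy(tp, fp, tn, fn)
err = error_rate(tp, fp, tn, fn)
print(f"accuracy   = {acc:.4f}")
print(f"error rate = {err:.4f}")
```

The two values always sum to 1, which is why either one is enough to report.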
Precision and Recall
• Precision: the percentage of instances predicted as positive that
are actually positive, i.e., what percentage of tuples labeled as
positive are actually positive.
$$\text{precision} = \frac{TP}{TP + FP}$$
• Recall (sensitivity, true positive rate): the percentage of actual
positive instances that are predicted as positive, i.e., what
percentage of positive tuples are labeled as positive.
$$\text{recall} = \text{TPR} = \text{sensitivity} = \frac{TP}{TP + FN}$$
o A perfect score is 1.0.
• Specificity: the percentage of actual negative instances that are
predicted as negative.
$$\text{specificity} = \frac{TN}{TN + FP}$$
Precision and Recall: Example
➢ Based on the previous confusion matrix:
✓ Precision = 6954/(6954 + 412) = 0.9441 = 94.41%
✓ Recall = 6954/(6954 + 46) = 0.9934 = 99.34%
➢ A perfect precision score of 1.0 for a class C means that
every tuple that the classifier labeled as belonging to class C
does indeed belong to class C. However, it does not tell us
anything about the number of class C tuples that the
classifier mislabeled.
➢ A perfect recall score of 1.0 for C means that every item
from class C was labeled as such, but it does not tell us how
many other tuples were incorrectly labeled as belonging to
class C.
➢ Specificity = 2588/(2588 + 412) = 0.8627 = 86.27%
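The three ratios can be sketched as small functions and applied to the counts from the buys-computer example. The function names are hypothetical, and returning 0.0 when a denominator is zero is a convention assumed here, not something the slides specify.

```python
def precision(tp, fp):
    """TP / (TP + FP); 0.0 if nothing was predicted positive (assumed convention)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """TP / (TP + FN), also called sensitivity or true positive rate."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def specificity(tn, fp):
    """TN / (TN + FP), also called the true negative rate."""
    return tn / (tn + fp) if (tn + fp) else 0.0


# Counts from the buys-computer example.
tp, fn, fp, tn = 6954, 46, 412, 2588

p = precision(tp, fp)
r = recall(tp, fn)
s = specificity(tn, fp)
print(f"precision   = {p:.4f}")
print(f"recall      = {r:.4f}")
print(f"specificity = {s:.4f}")
```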
Precision and Recall: Example
F measure (F1 or F-score): the harmonic mean of precision and recall.
➢ It is an alternative way to use precision and recall, combining
them into a single measure.
$$F_1 = \frac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}}$$
F1 = (2 × 0.944 × 0.9934)/(0.944 + 0.9934) = 1.8755392/1.9374 ≈ 0.9681
• The higher the F1 score, the better.
Suppose
Model 1: recall = 70% and precision = 60%
Model 2: recall = 80% and precision = 50%
Which model is better? Use the F measure.
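One way to work through the exercise is to compute F1 for both models. This is a minimal sketch; the function name `f1_score` is hypothetical, and returning 0.0 when both inputs are zero is an assumed convention.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 if both are zero (assumed convention)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values from the exercise, written as fractions rather than percentages.
f1_model_1 = f1_score(precision=0.60, recall=0.70)
f1_model_2 = f1_score(precision=0.50, recall=0.80)

print(f"Model 1 F1 = {f1_model_1:.4f}")
print(f"Model 2 F1 = {f1_model_2:.4f}")
print("Better by F1:", "Model 1" if f1_model_1 > f1_model_2 else "Model 2")
```

Model 1 scores about 0.646 and Model 2 about 0.615, so Model 1 is preferred under the F measure, even though Model 2 has the higher recall. The harmonic mean penalizes the larger gap between Model 2's precision and recall.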
Class Imbalance Problem
➢ The class imbalance problem arises when the main class of interest is rare.
➢ The data set distribution reflects a significant majority of the
negative class and a minority of the positive class.
• For example, in fraud detection applications, the class of
interest (the positive class) is "fraud," which occurs much
less frequently than the negative "nonfraudulent" class.
• The class imbalance problem can be identified using sensitivity and specificity:
o Sensitivity is also referred to as the true positive (recognition)
rate (i.e., the proportion of positive tuples that are correctly
identified), while specificity is the true negative rate (i.e., the
proportion of negative tuples that are correctly identified).
Class Imbalance Problem
Example: a confusion matrix for medical data where the class values are yes and no
for the class label attribute, cancer.
• Confusion matrix for the classes cancer=yes and cancer=no:

Actual \ Predicted   cancer=yes   cancer=no   Total
cancer=yes                90         210        300
cancer=no                140        9560       9700
Total                    230        9770      10000

✓ The sensitivity of the classifier is 90/300 = 30.00%
✓ The specificity of the classifier is 9560/9700 = 98.56%
✓ The overall accuracy is 9650/10000 = 96.50%
• Thus, although the classifier has a high accuracy, its
ability to correctly label the positive (rare) class is poor, given its
low sensitivity.
• It has high specificity, meaning that it can accurately recognize negative tuples.
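The gap between accuracy and sensitivity in this example can be reproduced with a few lines. This is a sketch under stated assumptions: the helper `rates` is hypothetical, the first set of counts is derived from the figures above (300 positives, 9700 negatives), and the "always predict no" baseline is an added illustration that is not part of the slides.

```python
def rates(tp, fn, fp, tn):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return sensitivity, specificity, accuracy


# Counts derived from the cancer example.
sens, spec, acc = rates(tp=90, fn=210, fp=140, tn=9560)
print(f"classifier: sensitivity={sens:.4f} specificity={spec:.4f} accuracy={acc:.4f}")

# Hypothetical baseline that labels every tuple cancer=no.
b_sens, b_spec, b_acc = rates(tp=0, fn=300, fp=0, tn=9700)
print(f"always-no : sensitivity={b_sens:.4f} specificity={b_spec:.4f} accuracy={b_acc:.4f}")
```

The baseline reaches 97% accuracy while finding no cancer cases at all, which is why accuracy alone is misleading on imbalanced data.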
ROC Curves
• A Receiver Operating Characteristic (ROC) curve plots
the true positive rate against the false positive rate as the
threshold on the confidence that an instance is positive is varied.
ROC Curves
• ROC curves allow visual comparison of models
• They originated in signal detection theory
• They show the trade-off between the true positive rate and the false
positive rate
• The area under the ROC curve is a measure of the accuracy of the
model
• To plot the curve, rank the test tuples in decreasing order: the one
that is most likely to belong to the positive class appears at the top of the list
• The closer the curve is to the diagonal line (i.e., the closer the area
is to 0.5), the less accurate the model
o The vertical axis represents the true positive rate
o The horizontal axis represents the false positive rate
o The plot also shows a diagonal line
o A model with perfect accuracy will have an area of 1.0
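The ranking procedure described above can be sketched in plain Python: sort the tuples by score, lower the threshold one score at a time, and record the (FPR, TPR) point after each step. The function names, the toy labels, and the scores are hypothetical, and the sketch assumes labels are 0/1 with at least one of each.

```python
def roc_points(labels, scores):
    """Return ROC points (FPR, TPR), starting at (0, 0) and ending at (1, 1)."""
    pos = sum(labels)
    neg = len(labels) - pos
    # Most confident positive prediction first.
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        j = i
        # Tuples with tied scores cross the threshold together.
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / neg, tp / pos))
        i = j
    return points


def area_under_curve(points):
    """Trapezoidal area under a curve given as (x, y) points sorted by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical test tuples: 1 = positive, 0 = negative, with classifier scores.
labels = [1, 1, 0, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]

points = roc_points(labels, scores)
auc = area_under_curve(points)
print("ROC points:", points)
print("AUC =", auc)
```

For this toy ranking the area is 0.75, between the 0.5 of the diagonal and the 1.0 of a perfect ranking.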
Predictor Error Measures
• Predictor accuracy: measures how far off the
predicted value is from the actual known value.
• Loss function: measures the error between the actual
value $y_i$ and the predicted value $\hat{y}_i$
o Absolute error: $|y_i - \hat{y}_i|$
o Squared error: $(y_i - \hat{y}_i)^2$
• Test error: the average loss over the test set
o Mean absolute error (MAE): $\frac{1}{N}\sum_{i=1}^{N} |y_i - \hat{y}_i|$
o Mean squared error (MSE): $\frac{1}{N}\sum_{i=1}^{N} (y_i - \hat{y}_i)^2$
o Root mean squared error (RMSE): $\sqrt{\frac{1}{N}\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}$
where $\bar{y}$ is the mean of the actual values and $N$ is the number of observations.
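The three test-error measures can be sketched with the standard library alone. The function names and the toy actual and predicted values are hypothetical.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error: average of |y_i - yhat_i|."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error: average of (y_i - yhat_i)^2."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))


# Hypothetical actual and predicted values.
y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]

print("MAE  =", mae(y_true, y_pred))
print("MSE  =", mse(y_true, y_pred))
print("RMSE =", round(rmse(y_true, y_pred), 4))
```

RMSE is expressed in the same units as the target, which makes it easier to interpret than MSE.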
Predictor Error Measures
o Relative absolute error: $\frac{\sum_{i=1}^{N} |y_i - \hat{y}_i|}{\sum_{i=1}^{N} |y_i - \bar{y}|}$
o Relative squared error: $\frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2}$
• The mean squared error exaggerates the presence of outliers
• The root mean squared error is popularly used; similarly, the root
relative squared error.
• R squared (R²):
o Indicates how well the model's predictions approximate the true
values.
o 1 indicates a perfect fit and 0 indicates low performance
$$R^2 = 1 - \frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2}$$
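The relative measures and R² compare the model's errors with those of a baseline that always predicts the mean of the actual values. The following is a minimal sketch with hypothetical function names and toy data; it assumes the actual values are not all identical, so the denominators are nonzero.

```python
def relative_absolute_error(y_true, y_pred):
    """Sum of |y_i - yhat_i| divided by sum of |y_i - ybar|."""
    y_bar = sum(y_true) / len(y_true)
    num = sum(abs(y - p) for y, p in zip(y_true, y_pred))
    den = sum(abs(y - y_bar) for y in y_true)
    return num / den


def relative_squared_error(y_true, y_pred):
    """Sum of (y_i - yhat_i)^2 divided by sum of (y_i - ybar)^2."""
    y_bar = sum(y_true) / len(y_true)
    num = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    den = sum((y - y_bar) ** 2 for y in y_true)
    return num / den


def r_squared(y_true, y_pred):
    """R^2 = 1 - relative squared error."""
    return 1.0 - relative_squared_error(y_true, y_pred)


# Hypothetical actual and predicted values.
y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]

print("RAE =", round(relative_absolute_error(y_true, y_pred), 4))
print("RSE =", round(relative_squared_error(y_true, y_pred), 4))
print("R^2 =", round(r_squared(y_true, y_pred), 4))
```

Predicting the mean for every tuple gives a relative squared error of 1 and therefore an R² of 0, which is the reference point for "low performance".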
Issues Affecting Model Selection
โ€ข Accuracy
o classifier accuracy: predicting class label
โ€ข Speed
o time to construct the model (training time)
o time to use the model (classification/prediction time)
โ€ข Robustness: handling noise and missing values
โ€ข Scalability: efficiency in disk-resident databases
โ€ข Interpretability
o understanding and insight provided by the model
โ€ข Other measures, e.g., goodness of rules, such as decision tree size
or compactness of classification rules.
Improving Classification Accuracy of Class-
Imbalanced Data
• General approaches for improving the classification
accuracy of class-imbalanced data: oversampling and
undersampling
• Both oversampling and undersampling change the
distribution of tuples in the training set so that the
rare (positive) class is well represented
Improving Classification Accuracy of Class-
Imbalanced Data (cont.)
• Oversampling works by resampling the positive tuples
so that the resulting training set contains an equal
number of positive and negative tuples.
• Undersampling works by decreasing the number of
negative tuples. It randomly eliminates tuples from the
majority (negative) class until there are an equal number
of positive and negative tuples.
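Both strategies can be sketched with random sampling from the standard library. The function names and toy tuples are hypothetical, and the sketch assumes the negative class is at least as large as the positive class. Oversampling here draws positives with replacement, which is one simple way to resample.

```python
import random


def oversample(positives, negatives, rng):
    """Add randomly re-drawn positive tuples until both classes are the same size."""
    extra = [rng.choice(positives) for _ in range(len(negatives) - len(positives))]
    return positives + extra, negatives


def undersample(positives, negatives, rng):
    """Keep a random subset of negative tuples equal in size to the positive class."""
    return positives, rng.sample(negatives, len(positives))


# Hypothetical training set: 3 rare positive tuples, 12 negative tuples.
positives = [("pos", i) for i in range(3)]
negatives = [("neg", i) for i in range(12)]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
over_pos, over_neg = oversample(positives, negatives, rng)
under_pos, under_neg = undersample(positives, negatives, rng)

print("oversampled :", len(over_pos), "positive /", len(over_neg), "negative")
print("undersampled:", len(under_pos), "positive /", len(under_neg), "negative")
```

Only the training set should be resampled in this way; the test set keeps its original distribution so that the evaluation metrics remain meaningful.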
ML Model Performance Evaluation

  • 1. 1 ML-Chapter Four: Model Performance Evaluation Belay E., Asst. Prof. e-mail: belayenyew@gmail.com Mobile: 0946235206 University of Gondar College of Informatics Department of Information Technology
  • 2. 2 Evaluating Model Performance โ€ข In experimental Machine Learning, we evaluate the accuracy of a model empirically. โ€ข Use to comparing Classifiers/Learning Algorithms โ€ข What is an evaluationโ€™s Metric? o A way to quantify a performance of a machine learning model. o It uses for the evaluation of the performance of the machine learning model and why to use one in place of the other. โ€ข For classification: o Confusion matrix, Accuracy, Precision, Recall, Specificity, F1 score Precision-Recall or PR curve and ROC (Receiver Operating Characteristics) curve โ€ข Prediction(Regression): MAE, MSE, RMSE and R2 2
  • 3. 3 Evaluating classification Model Performance โ€ข Suppose a binary classification problem like a patient is having cancer (positive) or is found healthy (negative) โ€ข Some common terminology: o True positives (TP): Predicted positive and are actually positive. o False positives (FP): Predicted positive and are actually negative o True negatives (TN): Predicted negative and are actually negative. o False negatives (FN): Predicted negative and are actually positive. 3
  • 4. 4 Evaluating Model Performance โ€ข Confusion matrix: The above terms are summarized using confusion matrix. โ€ข TP and TN tell us when the classi๏ฌer is getting things right, while FP and FN tell us when the classi๏ฌer is getting things wrong 4
  • 5. 5 Evaluating Model Performance โ€ข Accuracy (Recognition Rate): calculated as the number of all correct predictions divided by the total number of the testing dataset. o The most commonly used metric to judge a model o It is not a clear indicator of the performance when classes are imbalanced. ๐‘จ๐’„๐’„๐’–๐’“๐’‚๐’„๐’š = ๐‘ป๐‘ท + ๐‘ป๐‘ต ๐‘ป๐‘ท + ๐‘ญ๐‘ท + ๐‘ป๐‘ต + ๐‘ญ๐‘ต โ€ข Error rate or misclassi๏ฌcation rate: error rate=1 โ€“ accuracy, or ๐’†๐’“๐’“๐’๐’“ ๐’“๐’‚๐’•๐’† = ๐‘ญ๐‘ท + ๐‘ญ๐‘ต ๐‘ป๐‘ท + ๐‘ญ๐‘ท + ๐‘ป๐‘ต + ๐‘ญ๐‘ต 5
  • 6. 6 Example of Confusion Matrix: โ€ข An example of a confusion matrix for the two classes buys-computer=yes (positive) and buys-computer=no (negative) โ€ข Accuracy = (TP + TN)/total=(6954+2588)/10000=0.95 โ€ข error rate or misclassi๏ฌcation rate=1 โ€“ accuracy, or Error rate = (FP + FN)/total=(412+46)/10000=0.05
  • 7. 7 Precision and Recall โ€ข Precision: Percentage of positive instances out of the total predicted instances i.e., what percentage of tuples labeled as positive are actually positive. ๐’‘๐’“๐’†๐’„๐’Š๐’”๐’Š๐’๐’ = ๐‘ป๐‘ท ๐‘ป๐‘ท + ๐‘ญ๐‘ท โ€ข Recall/Sensitivity/True Positive Rate : Percentage of positive instances out of the total actual positive instances i.e. what percentage of positive tuples are labeled as positive. ๐’“๐’†๐’„๐’‚๐’๐’ ๐’๐’“ ๐‘ป๐‘ท๐‘น ๐’๐’“ ๐’”๐’†๐’๐’”๐’Š๐’•๐’Š๐’—๐’Š๐’•๐’š = ๐‘ป๐‘ท ๐‘ป๐‘ท + ๐‘ญ๐‘ต o Perfect score is 1.0 โ€ข Specificity: Percentage of negative instances out of the total actual negative instances ๐‘ ๐‘๐‘’๐‘๐‘–๐‘“๐‘–๐‘๐‘–๐‘ก๐‘ฆ = ๐‘‡๐‘ ๐‘‡๐‘+๐น๐‘ƒ 7
  • 8. 8 Precision and Recall-Example ๏ƒ˜ Based on the previous confusion matrix: ๏ƒผ Precision = 6954/(6954+412) = 0.944 =94.4% ๏ƒผ Recall = 6954/(6954+46)= 0.9934=99.34% ๏ƒ˜ A perfect precision score of 1.0 for a class C means that every tuple that the classi๏ฌer labeled as belonging to class C does indeed belong to class C. However, it does not tell us anything about the number of class C tuples that the classi๏ฌer mislabeled. ๏ƒ˜ A perfect recall score of 1.0 for C means that every item from class C was labeled as such, but it does not tell us how many other tuples were incorrectly labeled as belonging to class C ๏ƒ˜ Specificity=2588/(2588+412)=0.8626=86.26% 8
  • 9. 9 Precision and Recall-Example F measure (F1 or F-score): harmonic mean of precision and recall. ๏ƒ˜ It an alternative way to use precision and recall by combining them into a single measure . ๐‘ญ๐Ÿ = ๐Ÿ โˆ— ๐’“๐’†๐’„๐’‚๐’๐’ โˆ— ๐’‘๐’“๐’†๐’„๐’Š๐’”๐’Š๐’๐’ ๐’‘๐’“๐’†๐’„๐’Š๐’”๐’Š๐’๐’ + ๐’“๐’†๐’„๐’‚๐’๐’ F1=(2*0.944*0.9934)/(0.9934+0.944)=1.8755392/1.9374=0.9680 โ€ข The higher the F1 score, the better. Suppose Model 1: Recall =70% and precision= 60% Model 2: Recall =80% and precision=50% Which model is better? Use F measure
  • 10. 10 class imbalance problem ๏ƒ˜ where the main class of interest is rare. ๏ƒ˜ The data set distribution re๏ฌ‚ects a signi๏ฌcant majority of the negative class and a minority of positive class. โ€ข For example, in fraud detection applications, the class of interest (or positive class) is โ€œfraud,โ€ which occurs much less frequently than the negative โ€œnonfraudulantโ€ class. โ€ข class imbalance problem identified using: o Sensitivity is also referred to as the true positive (recognition) rate (i.e., the proportion of positive tuples that are correctly identi๏ฌed), while speci๏ฌcity is the true negative rate (i.e., the proportion of negative tuples that are correctly identi๏ฌed) 10
  • 11. Class Imbalance Problem
    • Example: a confusion matrix for medical data where the class values are yes and no for a class label attribute, cancer.
    • Confusion matrix for the classes cancer = yes and cancer = no:
      o The sensitivity of the classifier is 90/300 = 30%
      o The specificity of the classifier is 9560/9700 = 98.56%
      o The overall accuracy is 9650/10000 = 96.50%
    • Thus, although the classifier has a high accuracy, its ability to correctly label the positive (rare) class is poor, given its low sensitivity.
    • It has high specificity, meaning that it can accurately recognize negative tuples.
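To see why accuracy misleads here, the sketch below recomputes the three figures and compares them with a hypothetical classifier that always answers "no". The counts are derived from the fractions on the slide (TP = 90, FN = 210, TN = 9560, FP = 140); the always-"no" baseline is an added illustration, not part of the original example.

```python
def rates(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Classifier from the slide: 300 cancer=yes tuples, 9700 cancer=no tuples.
classifier = rates(tp=90, fn=210, tn=9560, fp=140)

# Hypothetical baseline that labels every tuple as cancer=no.
always_no = rates(tp=0, fn=300, tn=9700, fp=0)

for name, (sens, spec, acc) in [("classifier", classifier), ("always no", always_no)]:
    print(f"{name}: sensitivity={sens:.2%} specificity={spec:.2%} accuracy={acc:.2%}")
```

The baseline reaches 97% accuracy without detecting a single cancer case, which is why sensitivity and specificity are reported alongside accuracy on imbalanced data.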
  • 12. ROC Curves
    • A Receiver Operating Characteristic (ROC) curve plots the true positive rate against the false positive rate as the threshold on the confidence of an instance being positive is varied.
  • 13. ROC Curves
    • ROC curves are used for visual comparison of models.
    • They originated from signal detection theory.
    • They show the trade-off between the true positive rate and the false positive rate.
    • The area under the ROC curve is a measure of the accuracy of the model.
    • Rank the test tuples in decreasing order of predicted score: the one that is most likely to belong to the positive class appears at the top of the list.
    • The closer the curve is to the diagonal line (i.e., the closer the area is to 0.5), the less accurate the model is.
      o The vertical axis represents the true positive rate.
      o The horizontal axis represents the false positive rate.
      o The plot also shows a diagonal line.
      o A model with perfect accuracy will have an area of 1.0.
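The ranking procedure described above can be sketched in plain Python. The scores and labels below are made-up illustration data, and the helper names `roc_points` and `auc` are hypothetical; the sketch assumes both classes are present in the test set.

```python
def roc_points(scores, labels):
    """Sweep the threshold down the ranked list and collect (FPR, TPR) points."""
    positives = sum(labels)
    negatives = len(labels) - positives
    # Rank tuples in decreasing order of the positive-class score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last tuple of a group of tied scores.
        if i == len(ranked) - 1 or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Made-up scores (probability of the positive class) and true labels.
scores = [0.90, 0.80, 0.70, 0.60, 0.55, 0.54, 0.53, 0.51, 0.50, 0.40]
labels = [1, 1, 0, 1, 1, 0, 0, 0, 1, 0]

points = roc_points(scores, labels)
print("ROC points (FPR, TPR):", points)
print(f"AUC: {auc(points):.2f}")
```

For this data the area is 0.76, between the diagonal (0.5) and a perfect ranking (1.0).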
  • 14. Predictor Error Measures
    • Measuring predictor accuracy: measure how far off the predicted value is from the actual known value.
    • Loss function: measures the error between the actual values ($y_i$) and the predicted values ($\hat{y}_i$).
      o Absolute error: $|y_i - \hat{y}_i|$
      o Squared error: $(y_i - \hat{y}_i)^2$
    • Test error: the average loss over the test set.
      o Mean absolute error (MAE): $\frac{1}{N}\sum_{i=1}^{N} |y_i - \hat{y}_i|$
      o Mean squared error (MSE): $\frac{1}{N}\sum_{i=1}^{N} (y_i - \hat{y}_i)^2$
      o Root mean squared error (RMSE): $\sqrt{\frac{1}{N}\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}$
      where $\bar{y}$ (used in the relative error measures) is the average of the actual values and $N$ is the number of observations.
  • 15. Predictor Error Measures
      o Relative absolute error: $\frac{\sum_{i=1}^{N} |y_i - \hat{y}_i|}{\sum_{i=1}^{N} |y_i - \bar{y}|}$
      o Relative squared error: $\frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2}$
    • The mean squared error exaggerates the presence of outliers.
    • The root mean squared error is popularly used; similarly, the root relative squared error.
    • R squared ($R^2$):
      o Indicates how well the model predictions approximate the true values.
      o 1 indicates a perfect fit; 0 indicates low performance (the model does no better than always predicting the mean $\bar{y}$).
      $R^2 = 1 - \frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2}$
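The error measures on the last two slides can be computed together. The following is a minimal sketch using only the standard library; the function name `regression_metrics` and the four data points are invented for illustration.

```python
import math


def regression_metrics(actual, predicted):
    """Compute MAE, MSE, RMSE, relative errors and R^2 for paired values."""
    n = len(actual)
    mean_actual = sum(actual) / n                      # y-bar
    abs_err = sum(abs(y - y_hat) for y, y_hat in zip(actual, predicted))
    sq_err = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))
    abs_dev = sum(abs(y - mean_actual) for y in actual)
    sq_dev = sum((y - mean_actual) ** 2 for y in actual)
    mse = sq_err / n
    rse = sq_err / sq_dev
    return {
        "mae": abs_err / n,
        "mse": mse,
        "rmse": math.sqrt(mse),
        "rae": abs_err / abs_dev,   # relative absolute error
        "rse": rse,                 # relative squared error
        "r2": 1 - rse,              # R^2 = 1 - relative squared error
    }


# Made-up actual and predicted values.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 8.0, 9.0]

results = regression_metrics(actual, predicted)
for name, value in results.items():
    print(f"{name}: {value:.4f}")
```

Note that $R^2$ is one minus the relative squared error, so the two always move together. The sketch assumes the actual values are not all identical; otherwise the denominators of the relative errors are zero.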
  • 16. Issues Affecting Model Selection
    • Accuracy
      o Classifier accuracy: predicting the class label
    • Speed
      o Time to construct the model (training time)
      o Time to use the model (classification/prediction time)
    • Robustness: handling noise and missing values
    • Scalability: efficiency in disk-resident databases
    • Interpretability
      o Understanding and insight provided by the model
    • Other measures, e.g., goodness of rules, such as decision tree size or compactness of classification rules.
  • 17. Improving Classification Accuracy of Class-Imbalanced Data
    • General approaches for improving the classification accuracy of class-imbalanced data: oversampling and undersampling.
    • Both oversampling and undersampling change the distribution of tuples in the training set so that the rare (positive) class is well represented.
  • 18. Cont.
    • Oversampling works by resampling the positive tuples so that the resulting training set contains an equal number of positive and negative tuples.
    • Undersampling works by decreasing the number of negative tuples. It randomly eliminates tuples from the majority (negative) class until there are an equal number of positive and negative tuples.
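Both strategies can be sketched with the standard `random` module. This is only an illustration under stated assumptions: the function names `oversample` and `undersample` are hypothetical, the tuples are placeholder data, the positive class is assumed to be the minority, and oversampling is done here by drawing positive tuples with replacement.

```python
import random


def oversample(positives, negatives, rng):
    """Resample positive tuples (with replacement) until both classes are equal."""
    extra = [rng.choice(positives) for _ in range(len(negatives) - len(positives))]
    return positives + extra, negatives


def undersample(positives, negatives, rng):
    """Randomly keep only as many negative tuples as there are positives."""
    return positives, rng.sample(negatives, len(positives))


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Placeholder training set: 5 positive tuples, 95 negative tuples.
positives = [("pos", i) for i in range(5)]
negatives = [("neg", i) for i in range(95)]

over_pos, over_neg = oversample(positives, negatives, rng)
under_pos, under_neg = undersample(positives, negatives, rng)

print("oversampling: ", len(over_pos), "positive /", len(over_neg), "negative")
print("undersampling:", len(under_pos), "positive /", len(under_neg), "negative")
```

Only the training set should be resampled this way; the test set keeps its original distribution so that the reported metrics reflect real class frequencies.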