SlideShare a Scribd company logo
Model Evaluation
• Metrics for Performance Evaluation
• How to evaluate the performance of a model?
• Methods for Performance Evaluation
• How to obtain reliable estimates?
• Methods for Model Comparison
• How to compare the relative performance among
competing models?
Model Evaluation
• Metrics for Performance Evaluation
• How to evaluate the performance of a model?
• Methods for Performance Evaluation
• How to obtain reliable estimates?
• Methods for Model Comparison
• How to compare the relative performance among
competing models?
Metrics for Performance Evaluation
• Focus on the predictive capability of a model
• Rather than how fast it takes to classify or build models,
scalability, etc.
• Confusion Matrix:
PREDICTED CLASS
ACTUAL
CLASS
Class=Yes Class=No
Class=Yes a b
Class=No c d
a: TP (true positive)
b: FN (false negative)
c: FP (false positive)
d: TN (true negative)
Metrics for Performance Evaluation…
• Most widely-used metric:
PREDICTED CLASS
ACTUAL
CLASS
Class=Yes Class=No
Class=Yes a
(TP)
b
(FN)
Class=No c
(FP)
d
(TN)
FN
FP
TN
TP
TN
TP
d
c
b
a
d
a










Accuracy
Limitation of Accuracy
• Consider a 2-class problem
• Number of Class 0 examples = 9990
• Number of Class 1 examples = 10
• If model predicts everything to be class 0,
accuracy is 9990/10000 = 99.9 %
• Accuracy is misleading because model does not detect
any class 1 example
Cost Matrix
PREDICTED CLASS
ACTUAL
CLASS
C(i|j) Class=Yes Class=No
Class=Yes C(Yes|Yes) C(No|Yes)
Class=No C(Yes|No) C(No|No)
C(i|j): Cost of classifying class j example as class i
d
w
c
w
b
w
a
w
d
w
a
w
4
3
2
1
4
1
Accuracy
Weighted





Computing Cost of Classification
Cost
Matrix
PREDICTED CLASS
ACTUAL
CLASS
C(i|j) + -
+ -1 100
- 1 0
Model
M1
PREDICTED CLASS
ACTUAL
CLASS
+ -
+ 150 40
- 60 250
Model
M2
PREDICTED CLASS
ACTUAL
CLASS
+ -
+ 250 45
- 5 200
Accuracy = 80%
Cost = 3910
Accuracy = 90%
Cost = 4255
Cost vs Accuracy
Count PREDICTED CLASS
ACTUAL
CLASS
Class=Yes Class=No
Class=Yes a b
Class=No c d
Cost PREDICTED CLASS
ACTUAL
CLASS
Class=Yes Class=No
Class=Yes p q
Class=No q p
N = a + b + c + d
Accuracy = (a + d)/N
Cost = p (a + d) + q (b + c)
= p (a + d) + q (N – a – d)
= q N – (q – p)(a + d)
= N [q – (q-p)  Accuracy]
Accuracy is proportional to cost if
1. C(Yes|No)=C(No|Yes) = q
2. C(Yes|Yes)=C(No|No) = p
Precision-Recall
FN
FP
TP
TP
c
b
a
a
p
r
rp
p
r
FN
TP
TP
b
a
a
FP
TP
TP
c
a
a













 









2
2
2
2
2
2
/
1
/
1
1
(F)
measure
-
F
(r)
Recall
(p)
Precision
 Precision is biased towards C(Yes|Yes) & C(Yes|No)
 Recall is biased towards C(Yes|Yes) & C(No|Yes)
 F-measure is biased towards all except C(No|No)
Count PREDICTED CLASS
ACTUAL
CLASS
Class=Yes Class=No
Class=Yes a b
Class=No c d
Precision-Recall plot
• Usually for parameterized models, it controls the
precision/recall tradeoff
Model Evaluation
• Metrics for Performance Evaluation
• How to evaluate the performance of a model?
• Methods for Performance Evaluation
• How to obtain reliable estimates?
• Methods for Model Comparison
• How to compare the relative performance among
competing models?
Methods for Performance Evaluation
• How to obtain a reliable estimate of
performance?
• Performance of a model may depend on other
factors besides the learning algorithm:
• Class distribution
• Cost of misclassification
• Size of training and test sets
Methods of Estimation
• Holdout
• Reserve 2/3 for training and 1/3 for testing
• Random subsampling
• One sample may be biased -- Repeated holdout
• Cross validation
• Partition data into k disjoint subsets
• k-fold: train on k-1 partitions, test on the remaining one
• Leave-one-out: k=n
• Guarantees that each record is used the same number of
times for training and testing
• Bootstrap
• Sampling with replacement
• ~63% of records used for training, ~27% for testing
Dealing with class Imbalance
• If the class we are interested in is very rare, then
the classifier will ignore it.
• The class imbalance problem
• Solution
• We can modify the optimization criterion by using a cost
sensitive metric
• We can balance the class distribution
• Sample from the larger class so that the size of the two classes
is the same
• Replicate the data of the class of interest so that the classes are
balanced
• Over-fitting issues
Learning Curve
 Learning curve shows how
accuracy changes with
varying sample size
 Requires a sampling
schedule for creating learning
curve
Effect of small sample size:
- Bias in the estimate
- Poor model
- Underfitting error
- Variance of estimate
- Poor training data
- Overfitting error
Model Evaluation
• Metrics for Performance Evaluation
• How to evaluate the performance of a model?
• Methods for Performance Evaluation
• How to obtain reliable estimates?
• Methods for Model Comparison
• How to compare the relative performance among
competing models?
ROC (Receiver Operating Characteristic)
• Developed in 1950s for signal detection theory to analyze
noisy signals
• Characterize the trade-off between positive hits and false alarms
• ROC curve plots TPR (on the y-axis) against FPR (on the
x-axis)
FN
TP
TP
TPR


TN
FP
FP
FPR


PREDICTED CLASS
Actual
Yes No
Yes a
(TP)
b
(FN)
No c
(FP)
d
(TN)
Fraction of positive instances
predicted as positive
Fraction of negative instances
predicted as positive
ROC (Receiver Operating Characteristic)
• Performance of a classifier represented as a point
on the ROC curve
• Changing some parameter of the algorithm,
sample distribution, or cost matrix changes the
location of the point
ROC Curve
At threshold t:
TP=0.5, FN=0.5, FP=0.12, FN=0.88
- 1-dimensional data set containing 2 classes (positive and negative)
- any points located at x > t is classified as positive
ROC Curve
(TP,FP):
• (0,0): declare everything
to be negative class
• (1,1): declare everything
to be positive class
• (1,0): ideal
• Diagonal line:
• Random guessing
• Below diagonal line:
• prediction is opposite of
the true class
PREDICTED CLASS
Actual
Yes No
Yes a
(TP)
b
(FN)
No c
(FP)
d
(TN)
Using ROC for Model Comparison
 No model consistently
outperform the other
 M1 is better for
small FPR
 M2 is better for
large FPR
 Area Under the ROC
curve (AUC)
 Ideal: Area = 1
 Random guess:
 Area = 0.5
ROC curve vs Precision-Recall curve
Area Under the Curve (AUC) as a single number for evaluation

More Related Content

Similar to datamining-lect11.pptx

NN Classififcation Neural Network NN.pptx
NN Classififcation   Neural Network NN.pptxNN Classififcation   Neural Network NN.pptx
NN Classififcation Neural Network NN.pptx
cmpt cmpt
 
crossvalidation.pptx
crossvalidation.pptxcrossvalidation.pptx
crossvalidation.pptx
PriyadharshiniG41
 
Bridging the Gap: Machine Learning for Ubiquitous Computing -- Evaluation
Bridging the Gap: Machine Learning for Ubiquitous Computing -- EvaluationBridging the Gap: Machine Learning for Ubiquitous Computing -- Evaluation
Bridging the Gap: Machine Learning for Ubiquitous Computing -- Evaluation
Thomas Ploetz
 
DMTM Lecture 06 Classification evaluation
DMTM Lecture 06 Classification evaluationDMTM Lecture 06 Classification evaluation
DMTM Lecture 06 Classification evaluation
Pier Luca Lanzi
 
Cheatsheet machine-learning-tips-and-tricks
Cheatsheet machine-learning-tips-and-tricksCheatsheet machine-learning-tips-and-tricks
Cheatsheet machine-learning-tips-and-tricks
Steve Nouri
 
Machine learning with scikitlearn
Machine learning with scikitlearnMachine learning with scikitlearn
Machine learning with scikitlearn
Pratap Dangeti
 
DMTM 2015 - 14 Evaluation of Classification Models
DMTM 2015 - 14 Evaluation of Classification ModelsDMTM 2015 - 14 Evaluation of Classification Models
DMTM 2015 - 14 Evaluation of Classification Models
Pier Luca Lanzi
 
Recommender Systems from A to Z – Model Training
Recommender Systems from A to Z – Model TrainingRecommender Systems from A to Z – Model Training
Recommender Systems from A to Z – Model Training
Crossing Minds
 
Building and deploying analytics
Building and deploying analyticsBuilding and deploying analytics
Building and deploying analytics
Collin Bennett
 
vdocuments.mx_supplier-selection-fuzzy-ahp.ppt
vdocuments.mx_supplier-selection-fuzzy-ahp.pptvdocuments.mx_supplier-selection-fuzzy-ahp.ppt
vdocuments.mx_supplier-selection-fuzzy-ahp.ppt
hamedtab66
 
BIIntro.ppt
BIIntro.pptBIIntro.ppt
BIIntro.ppt
PerumalPitchandi
 
Classification Assessment Methods.pptx
Classification Assessment  Methods.pptxClassification Assessment  Methods.pptx
Classification Assessment Methods.pptx
Riadh Al-Haidari
 
Classification assessment methods
Classification assessment methodsClassification assessment methods
Classification assessment methods
Alaa Tharwat
 
Machine Learning with R
Machine Learning with RMachine Learning with R
Machine Learning with R
Barbara Fusinska
 
Prediction of House Sales Price
Prediction of House Sales PricePrediction of House Sales Price
Prediction of House Sales Price
Anirvan Ghosh
 
Machine Learning: An introduction โดย รศ.ดร.สุรพงค์ เอื้อวัฒนามงคล
Machine Learning: An introduction โดย รศ.ดร.สุรพงค์  เอื้อวัฒนามงคลMachine Learning: An introduction โดย รศ.ดร.สุรพงค์  เอื้อวัฒนามงคล
Machine Learning: An introduction โดย รศ.ดร.สุรพงค์ เอื้อวัฒนามงคล
BAINIDA
 
Dealing with imbalanced data in RTB
Dealing with imbalanced data in RTBDealing with imbalanced data in RTB
Dealing with imbalanced data in RTB
Yuya Kanemoto
 
Mattar_PhD_Thesis
Mattar_PhD_ThesisMattar_PhD_Thesis
Mattar_PhD_Thesis
Marwan Mattar
 
Lecture3-eval.pptx
Lecture3-eval.pptxLecture3-eval.pptx
Lecture3-eval.pptx
SahilShahPhD2020
 
Lecture 3 for the AI course in A university
Lecture 3 for the AI course in A universityLecture 3 for the AI course in A university
Lecture 3 for the AI course in A university
Cao Minh Tu
 

Similar to datamining-lect11.pptx (20)

NN Classififcation Neural Network NN.pptx
NN Classififcation   Neural Network NN.pptxNN Classififcation   Neural Network NN.pptx
NN Classififcation Neural Network NN.pptx
 
crossvalidation.pptx
crossvalidation.pptxcrossvalidation.pptx
crossvalidation.pptx
 
Bridging the Gap: Machine Learning for Ubiquitous Computing -- Evaluation
Bridging the Gap: Machine Learning for Ubiquitous Computing -- EvaluationBridging the Gap: Machine Learning for Ubiquitous Computing -- Evaluation
Bridging the Gap: Machine Learning for Ubiquitous Computing -- Evaluation
 
DMTM Lecture 06 Classification evaluation
DMTM Lecture 06 Classification evaluationDMTM Lecture 06 Classification evaluation
DMTM Lecture 06 Classification evaluation
 
Cheatsheet machine-learning-tips-and-tricks
Cheatsheet machine-learning-tips-and-tricksCheatsheet machine-learning-tips-and-tricks
Cheatsheet machine-learning-tips-and-tricks
 
Machine learning with scikitlearn
Machine learning with scikitlearnMachine learning with scikitlearn
Machine learning with scikitlearn
 
DMTM 2015 - 14 Evaluation of Classification Models
DMTM 2015 - 14 Evaluation of Classification ModelsDMTM 2015 - 14 Evaluation of Classification Models
DMTM 2015 - 14 Evaluation of Classification Models
 
Recommender Systems from A to Z – Model Training
Recommender Systems from A to Z – Model TrainingRecommender Systems from A to Z – Model Training
Recommender Systems from A to Z – Model Training
 
Building and deploying analytics
Building and deploying analyticsBuilding and deploying analytics
Building and deploying analytics
 
vdocuments.mx_supplier-selection-fuzzy-ahp.ppt
vdocuments.mx_supplier-selection-fuzzy-ahp.pptvdocuments.mx_supplier-selection-fuzzy-ahp.ppt
vdocuments.mx_supplier-selection-fuzzy-ahp.ppt
 
BIIntro.ppt
BIIntro.pptBIIntro.ppt
BIIntro.ppt
 
Classification Assessment Methods.pptx
Classification Assessment  Methods.pptxClassification Assessment  Methods.pptx
Classification Assessment Methods.pptx
 
Classification assessment methods
Classification assessment methodsClassification assessment methods
Classification assessment methods
 
Machine Learning with R
Machine Learning with RMachine Learning with R
Machine Learning with R
 
Prediction of House Sales Price
Prediction of House Sales PricePrediction of House Sales Price
Prediction of House Sales Price
 
Machine Learning: An introduction โดย รศ.ดร.สุรพงค์ เอื้อวัฒนามงคล
Machine Learning: An introduction โดย รศ.ดร.สุรพงค์  เอื้อวัฒนามงคลMachine Learning: An introduction โดย รศ.ดร.สุรพงค์  เอื้อวัฒนามงคล
Machine Learning: An introduction โดย รศ.ดร.สุรพงค์ เอื้อวัฒนามงคล
 
Dealing with imbalanced data in RTB
Dealing with imbalanced data in RTBDealing with imbalanced data in RTB
Dealing with imbalanced data in RTB
 
Mattar_PhD_Thesis
Mattar_PhD_ThesisMattar_PhD_Thesis
Mattar_PhD_Thesis
 
Lecture3-eval.pptx
Lecture3-eval.pptxLecture3-eval.pptx
Lecture3-eval.pptx
 
Lecture 3 for the AI course in A university
Lecture 3 for the AI course in A universityLecture 3 for the AI course in A university
Lecture 3 for the AI course in A university
 

More from RithikRaj25

html1.ppt
html1.ppthtml1.ppt
html1.ppt
RithikRaj25
 
Data
DataData
Data
DataData
Introduction To Database.ppt
Introduction To Database.pptIntroduction To Database.ppt
Introduction To Database.ppt
RithikRaj25
 
Data.ppt
Data.pptData.ppt
Data.ppt
RithikRaj25
 
DataTypes.ppt
DataTypes.pptDataTypes.ppt
DataTypes.ppt
RithikRaj25
 
NoSQL.pptx
NoSQL.pptxNoSQL.pptx
NoSQL.pptx
RithikRaj25
 
NoSQL
NoSQLNoSQL
text classification_NB.ppt
text classification_NB.ppttext classification_NB.ppt
text classification_NB.ppt
RithikRaj25
 
html1.ppt
html1.ppthtml1.ppt
html1.ppt
RithikRaj25
 
slide-keras-tf.pptx
slide-keras-tf.pptxslide-keras-tf.pptx
slide-keras-tf.pptx
RithikRaj25
 
Intro_OpenCV.ppt
Intro_OpenCV.pptIntro_OpenCV.ppt
Intro_OpenCV.ppt
RithikRaj25
 
lec1b.ppt
lec1b.pptlec1b.ppt
lec1b.ppt
RithikRaj25
 
PR7.ppt
PR7.pptPR7.ppt
PR7.ppt
RithikRaj25
 
objectdetect_tutorial.ppt
objectdetect_tutorial.pptobjectdetect_tutorial.ppt
objectdetect_tutorial.ppt
RithikRaj25
 
14_ReinforcementLearning.pptx
14_ReinforcementLearning.pptx14_ReinforcementLearning.pptx
14_ReinforcementLearning.pptx
RithikRaj25
 
week6a.ppt
week6a.pptweek6a.ppt
week6a.ppt
RithikRaj25
 

More from RithikRaj25 (17)

html1.ppt
html1.ppthtml1.ppt
html1.ppt
 
Data
DataData
Data
 
Data
DataData
Data
 
Introduction To Database.ppt
Introduction To Database.pptIntroduction To Database.ppt
Introduction To Database.ppt
 
Data.ppt
Data.pptData.ppt
Data.ppt
 
DataTypes.ppt
DataTypes.pptDataTypes.ppt
DataTypes.ppt
 
NoSQL.pptx
NoSQL.pptxNoSQL.pptx
NoSQL.pptx
 
NoSQL
NoSQLNoSQL
NoSQL
 
text classification_NB.ppt
text classification_NB.ppttext classification_NB.ppt
text classification_NB.ppt
 
html1.ppt
html1.ppthtml1.ppt
html1.ppt
 
slide-keras-tf.pptx
slide-keras-tf.pptxslide-keras-tf.pptx
slide-keras-tf.pptx
 
Intro_OpenCV.ppt
Intro_OpenCV.pptIntro_OpenCV.ppt
Intro_OpenCV.ppt
 
lec1b.ppt
lec1b.pptlec1b.ppt
lec1b.ppt
 
PR7.ppt
PR7.pptPR7.ppt
PR7.ppt
 
objectdetect_tutorial.ppt
objectdetect_tutorial.pptobjectdetect_tutorial.ppt
objectdetect_tutorial.ppt
 
14_ReinforcementLearning.pptx
14_ReinforcementLearning.pptx14_ReinforcementLearning.pptx
14_ReinforcementLearning.pptx
 
week6a.ppt
week6a.pptweek6a.ppt
week6a.ppt
 

Recently uploaded

My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
rwarrenll
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
Sachin Paul
 
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdfEnhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
GetInData
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
Walaa Eldin Moustafa
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
aqzctr7x
 
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
zsjl4mimo
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
roli9797
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
74nqk8xf
 
End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024
Lars Albertsson
 
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
mzpolocfi
 
University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
soxrziqu
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
javier ramirez
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Kiwi Creative
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
vikram sood
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
Social Samosa
 
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Aggregage
 
Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...
Bill641377
 
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
74nqk8xf
 
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
nuttdpt
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 

Recently uploaded (20)

My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
 
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdfEnhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
 
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
 
End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024
 
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
 
University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
 
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
 
Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...
 
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
 
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 

datamining-lect11.pptx

  • 1. Model Evaluation • Metrics for Performance Evaluation • How to evaluate the performance of a model? • Methods for Performance Evaluation • How to obtain reliable estimates? • Methods for Model Comparison • How to compare the relative performance among competing models?
  • 2. Model Evaluation • Metrics for Performance Evaluation • How to evaluate the performance of a model? • Methods for Performance Evaluation • How to obtain reliable estimates? • Methods for Model Comparison • How to compare the relative performance among competing models?
  • 3. Metrics for Performance Evaluation • Focus on the predictive capability of a model • Rather than how fast it takes to classify or build models, scalability, etc. • Confusion Matrix: PREDICTED CLASS ACTUAL CLASS Class=Yes Class=No Class=Yes a b Class=No c d a: TP (true positive) b: FN (false negative) c: FP (false positive) d: TN (true negative)
  • 4. Metrics for Performance Evaluation… • Most widely-used metric: PREDICTED CLASS ACTUAL CLASS Class=Yes Class=No Class=Yes a (TP) b (FN) Class=No c (FP) d (TN) FN FP TN TP TN TP d c b a d a           Accuracy
  • 5. Limitation of Accuracy • Consider a 2-class problem • Number of Class 0 examples = 9990 • Number of Class 1 examples = 10 • If model predicts everything to be class 0, accuracy is 9990/10000 = 99.9 % • Accuracy is misleading because model does not detect any class 1 example
  • 6. Cost Matrix PREDICTED CLASS ACTUAL CLASS C(i|j) Class=Yes Class=No Class=Yes C(Yes|Yes) C(No|Yes) Class=No C(Yes|No) C(No|No) C(i|j): Cost of classifying class j example as class i d w c w b w a w d w a w 4 3 2 1 4 1 Accuracy Weighted     
  • 7. Computing Cost of Classification Cost Matrix PREDICTED CLASS ACTUAL CLASS C(i|j) + - + -1 100 - 1 0 Model M1 PREDICTED CLASS ACTUAL CLASS + - + 150 40 - 60 250 Model M2 PREDICTED CLASS ACTUAL CLASS + - + 250 45 - 5 200 Accuracy = 80% Cost = 3910 Accuracy = 90% Cost = 4255
  • 8. Cost vs Accuracy Count PREDICTED CLASS ACTUAL CLASS Class=Yes Class=No Class=Yes a b Class=No c d Cost PREDICTED CLASS ACTUAL CLASS Class=Yes Class=No Class=Yes p q Class=No q p N = a + b + c + d Accuracy = (a + d)/N Cost = p (a + d) + q (b + c) = p (a + d) + q (N – a – d) = q N – (q – p)(a + d) = N [q – (q-p)  Accuracy] Accuracy is proportional to cost if 1. C(Yes|No)=C(No|Yes) = q 2. C(Yes|Yes)=C(No|No) = p
  • 9. Precision-Recall FN FP TP TP c b a a p r rp p r FN TP TP b a a FP TP TP c a a                         2 2 2 2 2 2 / 1 / 1 1 (F) measure - F (r) Recall (p) Precision  Precision is biased towards C(Yes|Yes) & C(Yes|No)  Recall is biased towards C(Yes|Yes) & C(No|Yes)  F-measure is biased towards all except C(No|No) Count PREDICTED CLASS ACTUAL CLASS Class=Yes Class=No Class=Yes a b Class=No c d
  • 10. Precision-Recall plot • Usually for parameterized models, it controls the precision/recall tradeoff
  • 11. Model Evaluation • Metrics for Performance Evaluation • How to evaluate the performance of a model? • Methods for Performance Evaluation • How to obtain reliable estimates? • Methods for Model Comparison • How to compare the relative performance among competing models?
  • 12. Methods for Performance Evaluation • How to obtain a reliable estimate of performance? • Performance of a model may depend on other factors besides the learning algorithm: • Class distribution • Cost of misclassification • Size of training and test sets
  • 13. Methods of Estimation • Holdout • Reserve 2/3 for training and 1/3 for testing • Random subsampling • One sample may be biased -- Repeated holdout • Cross validation • Partition data into k disjoint subsets • k-fold: train on k-1 partitions, test on the remaining one • Leave-one-out: k=n • Guarantees that each record is used the same number of times for training and testing • Bootstrap • Sampling with replacement • ~63% of records used for training, ~27% for testing
  • 14. Dealing with class Imbalance • If the class we are interested in is very rare, then the classifier will ignore it. • The class imbalance problem • Solution • We can modify the optimization criterion by using a cost sensitive metric • We can balance the class distribution • Sample from the larger class so that the size of the two classes is the same • Replicate the data of the class of interest so that the classes are balanced • Over-fitting issues
  • 15. Learning Curve  Learning curve shows how accuracy changes with varying sample size  Requires a sampling schedule for creating learning curve Effect of small sample size: - Bias in the estimate - Poor model - Underfitting error - Variance of estimate - Poor training data - Overfitting error
  • 16. Model Evaluation • Metrics for Performance Evaluation • How to evaluate the performance of a model? • Methods for Performance Evaluation • How to obtain reliable estimates? • Methods for Model Comparison • How to compare the relative performance among competing models?
  • 17. ROC (Receiver Operating Characteristic) • Developed in 1950s for signal detection theory to analyze noisy signals • Characterize the trade-off between positive hits and false alarms • ROC curve plots TPR (on the y-axis) against FPR (on the x-axis) FN TP TP TPR   TN FP FP FPR   PREDICTED CLASS Actual Yes No Yes a (TP) b (FN) No c (FP) d (TN) Fraction of positive instances predicted as positive Fraction of negative instances predicted as positive
  • 18. ROC (Receiver Operating Characteristic) • Performance of a classifier represented as a point on the ROC curve • Changing some parameter of the algorithm, sample distribution, or cost matrix changes the location of the point
  • 19. ROC Curve At threshold t: TP=0.5, FN=0.5, FP=0.12, FN=0.88 - 1-dimensional data set containing 2 classes (positive and negative) - any points located at x > t is classified as positive
  • 20. ROC Curve (TP,FP): • (0,0): declare everything to be negative class • (1,1): declare everything to be positive class • (1,0): ideal • Diagonal line: • Random guessing • Below diagonal line: • prediction is opposite of the true class PREDICTED CLASS Actual Yes No Yes a (TP) b (FN) No c (FP) d (TN)
  • 21. Using ROC for Model Comparison  No model consistently outperform the other  M1 is better for small FPR  M2 is better for large FPR  Area Under the ROC curve (AUC)  Ideal: Area = 1  Random guess:  Area = 0.5
  • 22. ROC curve vs Precision-Recall curve Area Under the Curve (AUC) as a single number for evaluation