Data Mining &
Knowledge Discovery
IME 672
Dr. Faiz Hamid
Department of Industrial & Management Engineering
Indian Institute of Technology Kanpur
Email: fhamid@iitk.ac.in
Classifier Evaluation and
Improvement Techniques
Classifier Evaluation
• Estimate how accurately the classifier can predict on future
data on which the classifier has not been trained
• Compare the performance of classifiers when there is more than one
• How to estimate accuracy?
• Are some measures of a classifier’s accuracy more
appropriate than others?
Classifier Evaluation Metrics
Confusion Matrix
                       Predicted: C1            Predicted: ¬C1
Actual: C1             True Positives (TP)      False Negatives (FN)
Actual: ¬C1            False Positives (FP)     True Negatives (TN)
• Positive tuples - tuples of the main class of interest
• Negative tuples - all other tuples
• Confusion matrix – a tool for analysing how well a classifier can recognize tuples of
different classes
• True positives (TP) - positive tuples correctly labeled by the classifier
• True negatives (TN) - negative tuples correctly labeled by the classifier
• False positives (FP) - negative tuples incorrectly labeled as positive
• False negatives (FN) - positive tuples mislabeled as negative
• Confusion matrices can be easily drawn for multiple classes
(FP and FN capture the confusion between the positive and negative classes)
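As a minimal sketch (my own illustration, not from the slides; the labels and data are made up), such a matrix can be computed directly from actual and predicted labels, here using scikit-learn:

from sklearn.metrics import confusion_matrix

# Hypothetical actual and predicted labels; "yes" is the positive class of interest
y_actual = ["yes", "yes", "no", "no", "yes", "no", "no", "yes"]
y_pred   = ["yes", "no",  "no", "yes", "yes", "no", "no", "yes"]

# Rows = actual class, columns = predicted class; labels fixes the row/column order
tn, fp, fn, tp = confusion_matrix(y_actual, y_pred, labels=["no", "yes"]).ravel()
print(f"TP={tp}, TN={tn}, FP={fp}, FN={fn}")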
Classifier Evaluation Metrics
• Classifier Accuracy, or recognition rate:
percentage of test set tuples that are
correctly classified
Accuracy = (TP + TN)/All
• Error rate: 1 – accuracy, or
Error rate = (FP + FN)/All
• Sensitivity (Recall): True Positive
recognition rate
• Sensitivity = TP/P
• Specificity: True Negative recognition
rate
• Specificity = TN/N
                Predicted: C    Predicted: ¬C    Total
Actual: C       TP              FN               P
Actual: ¬C      FP              TN               N
Total           P’              N’               All
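As a minimal sketch (my own illustration, with made-up counts rather than data from the slides), these four rates can be computed directly from the cells of the confusion matrix:

# Hypothetical cell counts of a confusion matrix
tp, fn, fp, tn = 80, 20, 30, 870
p, n = tp + fn, fp + tn          # actual positives and negatives
total = p + n

accuracy    = (tp + tn) / total  # Accuracy = (TP + TN)/All
error_rate  = (fp + fn) / total  # Error rate = 1 - accuracy
sensitivity = tp / p             # true positive recognition rate (recall)
specificity = tn / n             # true negative recognition rate
print(accuracy, error_rate, sensitivity, specificity)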
Classifier Evaluation Metrics
• Precision: exactness – what % of tuples that the classifier
labeled as positive are actually positive
• Recall: completeness – what % of positive tuples did the
classifier label as positive?
Precision = (# positive tuples retrieved) / (# tuples retrieved) = TP / (TP + FP)
Recall = (# positive tuples retrieved) / (# positive tuples) = TP / P = TP / (TP + FN)
(Notation as in the confusion matrix above: P = TP + FN, N = FP + TN)
• A perfect precision score (Precision = 1) means every tuple labeled positive is indeed positive (FP = 0); a perfect recall score (Recall = 1) means every positive tuple was labeled positive (FN = 0)
Classifier Evaluation Metrics
• F measure (F1 or F-score): harmonic mean of precision and recall
• Fβ: weighted measure of precision and recall
– assigns β times as much weight to recall as to precision
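The formulas themselves are not reproduced in the slide text above; for reference, the standard definitions (with precision and recall as defined earlier) are:
F1 = (2 × precision × recall) / (precision + recall)
Fβ = ((1 + β²) × precision × recall) / (β² × precision + recall)
Commonly used values are F2 (favouring recall) and F0.5 (favouring precision).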
Classifier Evaluation Metrics
Example of Confusion Matrix:
                               Predicted: buy_computer = yes    Predicted: buy_computer = no    Total
Actual: buy_computer = yes     6954                             46                              7000
Actual: buy_computer = no      412                              2588                            3000
Total                          7366                             2634                            10000
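Working through the metrics for this matrix (with buy_computer = yes as the positive class):
Accuracy = (6954 + 2588)/10000 = 95.42%
Error rate = (46 + 412)/10000 = 4.58%
Precision = 6954/7366 ≈ 94.4%
Recall (Sensitivity) = 6954/7000 ≈ 99.3%
Specificity = 2588/3000 ≈ 86.3%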
Classifier Evaluation Metrics
• Classify medical data tuples
• Positive tuples (cancer = yes)
• Negative tuples (cancer = no)
• The classifier seems quite accurate; 96.5% accuracy
• Sensitivity = TP/P = 90/300×100 = 30% (accuracy on the cancer tuples)
• Specificity = TN/N = 9560/9700×100 = 98.56% (accuracy on noncancer
tuples)
• Classifier is correctly labeling only the noncancer tuples and
misclassifying most of the cancer tuples!!!
• An overall accuracy rate of 96.5% is therefore not acceptable here
• Only 3% of the training set are cancer tuples
Confusion matrix (counts implied by the rates above):
                            Predicted: cancer = yes    Predicted: cancer = no    Total
Actual: cancer = yes        90                         210                       300
Actual: cancer = no         140                        9560                      9700
Total                       230                        9770                      10000
Overfitting and Underfitting
• Overall goal in machine learning is to obtain a model/
hypothesis that generalizes well to new, unseen data
– Goal is not to memorize the training data (far more efficient ways to store data
than inside a random forest)
• A good model has a “high generalization accuracy” or “low
generalization error”
• Assumptions we generally make are:
– i.i.d. assumption: training and test examples are drawn independently from the same
probability distribution (independent and identically distributed)
– For some random model that has not been fit to the training set, we expect
both the training and test error to be equal
– Training error or accuracy provides an (optimistically) biased estimate of the
generalization performance
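A minimal sketch (my own illustration, assuming scikit-learn and its bundled breast cancer dataset) of how training accuracy overstates generalization performance for a flexible model:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# An unpruned decision tree can essentially memorize the training set
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("Training accuracy:", tree.score(X_train, y_train))  # typically 1.0
print("Test accuracy:    ", tree.score(X_test, y_test))    # noticeably lower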
Overfitting and Underfitting
• In statistics, a fit refers to how well a target function is
approximated
• Overfitting refers to a model that models the training data too
well
– Model learns the detail and noise (random fluctuations) in the training data as if they were concepts
– These concepts do not apply to new data, which negatively impacts performance
– More likely with nonparametric and nonlinear models that have more
flexibility when learning a target function
– Example: decision trees
– Techniques to reduce overfitting (see the sketch after this list):
• Reduce model complexity
• Regularization, Early stopping during the training phase
• Cross-validation
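A brief sketch of two of these remedies (my own illustration, again assuming scikit-learn): limiting model complexity and estimating generalization accuracy with cross-validation.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Unpruned tree (flexible, prone to overfitting) vs. depth-limited tree
full_tree    = DecisionTreeClassifier(random_state=0)
shallow_tree = DecisionTreeClassifier(max_depth=3, random_state=0)

# 10-fold cross-validation estimates generalization accuracy for each model
print("Unpruned tree CV accuracy:     ", cross_val_score(full_tree, X, y, cv=10).mean())
print("Depth-limited tree CV accuracy:", cross_val_score(shallow_tree, X, y, cv=10).mean())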
Overfitting and Underfitting
• Underfitting refers to a model that can neither model the
training data nor generalize to new data
• Model cannot capture the underlying trend of the data
• Usually happens when:
– we have too little data to build an accurate model
– we try to fit a linear model to non-linear data
• Techniques to reduce underfitting (see the sketch after this list):
– Increase training data
– Increase model complexity
– Increase number of features, performing feature engineering
– Increase number of epochs/duration of training
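A small illustration (mine, not from the slides; synthetic data) of a linear model underfitting non-linear data, and of increasing model complexity via feature engineering with polynomial features:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.RandomState(0)
X = np.sort(rng.uniform(-3, 3, size=(200, 1)), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.randn(200)   # non-linear target plus noise

linear = LinearRegression().fit(X, y)                                     # underfits the curve
poly   = make_pipeline(PolynomialFeatures(degree=5), LinearRegression()).fit(X, y)

print("R^2, plain linear model:      ", linear.score(X, y))
print("R^2, with polynomial features:", poly.score(X, y))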
Overfitting and Underfitting
[Figure comparing three fits: Overfitting (force-fitting, too good to be true), Underfitting (too simple to explain the variance), and Appropriate fitting]
Bias and Variance
• Bias
– Assumptions made by a model to make a function easier to learn
– This is the error that arises when the approximating function is too simple for a very complex
problem, thereby ignoring the structural relationship between the predictors and the target
– High bias results in underfitting and a higher training error
– Can be reduced by adding features that better describe the association with the target variable
• Variance
– Extent to which the function learned by a model varies across different training sets
– High variance results in overfitting
– Regularization methods are commonly used to control the variance
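For squared-error prediction these two sources of error can be made precise; the standard decomposition of the expected prediction error at a point x (stated here for reference, with σ² the irreducible noise) is:
Expected error = E[(y − ŷ(x))²] = Bias[ŷ(x)]² + Var[ŷ(x)] + σ²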
Bias and Variance
• Suppose there is an unknown target function, or “true function”, that we want to approximate
• Suppose we have different training sets drawn from an unknown distribution
defined as “true function + noise”
• One plot shows different linear regression models, each fit to a different training set
• None of these models approximates the true function well, except near two points (around x = -10 and x = 6)
• Bias is large because the difference between the true value and the predicted value is, on average, large
• The other plot shows different unpruned decision tree models, each fit to a different training set
• These models fit the training data very closely
• However, in expectation over training sets, the average hypothesis would fit the true function perfectly (given that the noise is unbiased and has an expected value of 0)
• However, the variance is very high, since on average, a
prediction differs a lot from the expected value of the
prediction
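A rough simulation of this experiment (my own sketch, with synthetic data and scikit-learn models, not the plots from the slides): fit each model on many noisy training sets drawn around a known true function and compare the average prediction and its spread at a single test point.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
true_f = lambda x: np.sin(x)              # the (normally unknown) true function
x_test = np.array([[1.5]])                # a single test point to inspect

preds = {"linear": [], "tree": []}
for _ in range(200):                      # 200 different training sets
    X = rng.uniform(-3, 3, size=(50, 1))
    y = true_f(X).ravel() + 0.3 * rng.randn(50)        # true function + noise
    preds["linear"].append(LinearRegression().fit(X, y).predict(x_test)[0])
    preds["tree"].append(DecisionTreeRegressor().fit(X, y).predict(x_test)[0])

for name, p in preds.items():
    p = np.array(p)
    bias = p.mean() - true_f(x_test)[0, 0]             # average prediction vs. truth
    print(f"{name:6s}  bias≈{bias:+.3f}  variance≈{p.var():.3f}")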
Bias and Variance
Source: https://sebastianraschka.com/pdf/lecture-notes/stat479fs18/08_eval-intro_notes.pdf
[Figure from the source above, indicating the overfitting and underfitting regimes]
Bias-Variance Tradeoff
• Find a balance between bias and variance that minimizes the
total error
• Ensemble methods and cross-validation are frequently used to minimize the total error
• Scenario #1: High Bias, Low Variance - underfitting
• Scenario #2: Low Bias, High Variance - overfitting
• Scenario #3: Low Bias, Low Variance - optimal state
• Scenario #4: High Bias, High Variance - something wrong
with data (training and validation distribution mismatch,
noisy data etc.)
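One common way to see the tradeoff empirically (a sketch of mine, assuming scikit-learn and its bundled dataset): sweep a complexity parameter such as tree depth and compare training accuracy with cross-validated accuracy; very small depths underfit (both scores low), very large depths overfit (training score high, validation score lower).

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import validation_curve
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
depths = np.arange(1, 16)

# Train and cross-validated accuracy for each value of max_depth
train_scores, val_scores = validation_curve(
    DecisionTreeClassifier(random_state=0), X, y,
    param_name="max_depth", param_range=depths, cv=10)

for d, tr, va in zip(depths, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"max_depth={d:2d}  train={tr:.3f}  cv={va:.3f}")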