SlideShare a Scribd company logo
1 of 20
Download to read offline
Introduction to Audio Content Analysis
module 6.0: evaluation and metrics
alexander lerch
overview intro classification regression summary
introduction
overview
corresponding textbook section
chapter 6
lecture content
• evaluation methodology
• good practices
• metrics
learning objectives
• design proper evaluation setups for machine learning algorithms
• list relevant metrics for different machine learning models
module 6.0: evaluation and metrics 1 / 9
overview intro classification regression summary
introduction
overview
corresponding textbook section
chapter 6
lecture content
• evaluation methodology
• good practices
• metrics
learning objectives
• design proper evaluation setups for machine learning algorithms
• list relevant metrics for different machine learning models
module 6.0: evaluation and metrics 1 / 9
overview intro classification regression summary
evaluation
introduction
without proper evaluation, there is no way to say whether a system works
typical mistakes in evaluation
1 non-representative test set
1 small, too homogeneous, ...
2 tuning system parameters with the test set (explicitly or implicitly)
3 using misleading evaluation procedures and metrics
module 6.0: evaluation and metrics 2 / 9
overview intro classification regression summary
evaluation
good practices 1/2
evaluation method unrelated to the specific implementation
• has to be task driven, not algorithm driven
• metrics should be unrelated to loss function
expectations clearly defined
• worst case performance (random)
• best case performance (oracle)
• realistic performance ⇒ baseline system
▶ Zero-R classifier
▶ standard approach
module 6.0: evaluation and metrics 3 / 9
overview intro classification regression summary
evaluation
good practices 1/2
evaluation method unrelated to the specific implementation
• has to be task driven, not algorithm driven
• metrics should be unrelated to loss function
expectations clearly defined
• worst case performance (random)
• best case performance (oracle)
• realistic performance ⇒ baseline system
▶ Zero-R classifier
▶ standard approach
module 6.0: evaluation and metrics 3 / 9
overview intro classification regression summary
evaluation
good practices 2/2
comparability to state-of-the-art
• use of established datasets and identical data splits
• running existing systems on your data
increase reproducibility
• automate evaluation
• publish source code
test for statistical significance
module 6.0: evaluation and metrics 4 / 9
overview intro classification regression summary
evaluation
good practices 2/2
comparability to state-of-the-art
• use of established datasets and identical data splits
• running existing systems on your data
increase reproducibility
• automate evaluation
• publish source code
test for statistical significance
module 6.0: evaluation and metrics 4 / 9
overview intro classification regression summary
evaluation
good practices 2/2
comparability to state-of-the-art
• use of established datasets and identical data splits
• running existing systems on your data
increase reproducibility
• automate evaluation
• publish source code
test for statistical significance
module 6.0: evaluation and metrics 4 / 9
overview intro classification regression summary
classification metrics
introduction
possible outcomes of two class problem (positive and negative):
• TP: Positives correctly identified as Positives,
• TN: Negatives correctly identified Negatives,
• FP: Negatives incorrectly identified Positives, and
• FN: Positives incorrectly identified Negatives.
visualization: confusion matrix
Predicted
Positive Negative Σ
GT
Positive
TP
True Positives
FN
False Negatives
TP+FN
# of GT Positives
Negative
FP
False Positives
TN
True Negatives
FP+TN
# of GT Negatives
Σ
TP+FP
# of Predicted Positives
TN+FN
# of Predicted Negatives
TP+TN
# of True Predictions
module 6.0: evaluation and metrics 5 / 9
overview intro classification regression summary
classification metrics
accuracy and f-measure
accuracy: how many predictions are
accurate
macro accuracy: averaged over
classes (not observations)
precision: how many predicted
positives are correct
recall: how many ground truth
positives correctly predicted
f-measure: combines precision and
recall
Acc =
TP + TN
TP + TN + FP + FN
module 6.0: evaluation and metrics 6 / 9
overview intro classification regression summary
classification metrics
accuracy and f-measure
accuracy: how many predictions are
accurate
macro accuracy: averaged over
classes (not observations)
precision: how many predicted
positives are correct
recall: how many ground truth
positives correctly predicted
f-measure: combines precision and
recall
Acc =
TP + TN
TP + TN + FP + FN
AccMacro =
TP
TP+FN
+ TN
TN+FP
2
=
TPR + TNR
2
module 6.0: evaluation and metrics 6 / 9
overview intro classification regression summary
classification metrics
accuracy and f-measure
accuracy: how many predictions are
accurate
macro accuracy: averaged over
classes (not observations)
precision: how many predicted
positives are correct
recall: how many ground truth
positives correctly predicted
f-measure: combines precision and
recall
Acc =
TP + TN
TP + TN + FP + FN
AccMacro =
TP
TP+FN
+ TN
TN+FP
2
=
TPR + TNR
2
P =
TP
TP + FP
module 6.0: evaluation and metrics 6 / 9
overview intro classification regression summary
classification metrics
accuracy and f-measure
accuracy: how many predictions are
accurate
macro accuracy: averaged over
classes (not observations)
precision: how many predicted
positives are correct
recall: how many ground truth
positives correctly predicted
f-measure: combines precision and
recall
Acc =
TP + TN
TP + TN + FP + FN
AccMacro =
TP
TP+FN
+ TN
TN+FP
2
=
TPR + TNR
2
P =
TP
TP + FP
R =
TP
TP + FN
module 6.0: evaluation and metrics 6 / 9
overview intro classification regression summary
classification metrics
accuracy and f-measure
accuracy: how many predictions are
accurate
macro accuracy: averaged over
classes (not observations)
precision: how many predicted
positives are correct
recall: how many ground truth
positives correctly predicted
f-measure: combines precision and
recall
Acc =
TP + TN
TP + TN + FP + FN
AccMacro =
TP
TP+FN
+ TN
TN+FP
2
=
TPR + TNR
2
P =
TP
TP + FP
R =
TP
TP + FN
F = 2 ·
P · R
P + R
module 6.0: evaluation and metrics 6 / 9
overview intro classification regression summary
classification metrics
area under curve
module 6.0: evaluation and metrics 7 / 9
matlab
source:
plotROC.m
overview intro classification regression summary
regression metrics
mae, mse, R2
goal: measure deviation
mean absolute error
mean squared error
coefficient of determination
MAE =
1
R
X
∀r
|y(r) − ŷ(r)|
module 6.0: evaluation and metrics 8 / 9
overview intro classification regression summary
regression metrics
mae, mse, R2
goal: measure deviation
mean absolute error
mean squared error
coefficient of determination
MAE =
1
R
X
∀r
|y(r) − ŷ(r)|
MSE =
1
R
X
∀r
y(r) − ŷ(r)
2
module 6.0: evaluation and metrics 8 / 9
overview intro classification regression summary
regression metrics
mae, mse, R2
goal: measure deviation
mean absolute error
mean squared error
coefficient of determination
MAE =
1
R
X
∀r
|y(r) − ŷ(r)|
MSE =
1
R
X
∀r
y(r) − ŷ(r)
2
R2
= 1 −
MSE y − ŷ

MSE y − µy

module 6.0: evaluation and metrics 8 / 9
overview intro classification regression summary
summary
lecture content
evaluation
• system development without evaluation is meaningless
• data and method need to be carefully selected
• metrics need to reflect the sucess of the system
classification metrics
• accuracy and macro accuracy
• precision, recall, and f-measure
• AUC
regression metrics
• MAE and MSE
• coefficient of determination
module 6.0: evaluation and metrics 9 / 9

More Related Content

Similar to 06-00-ACA-Evaluation.pdf

Training Deep Networks with Backprop (D1L4 Insight@DCU Machine Learning Works...
Training Deep Networks with Backprop (D1L4 Insight@DCU Machine Learning Works...Training Deep Networks with Backprop (D1L4 Insight@DCU Machine Learning Works...
Training Deep Networks with Backprop (D1L4 Insight@DCU Machine Learning Works...Universitat Politècnica de Catalunya
 
EvaluationMetrics.pptx
EvaluationMetrics.pptxEvaluationMetrics.pptx
EvaluationMetrics.pptxshuchismitjha2
 
Machine Learning Algorithms | Machine Learning Tutorial | Data Science Algori...
Machine Learning Algorithms | Machine Learning Tutorial | Data Science Algori...Machine Learning Algorithms | Machine Learning Tutorial | Data Science Algori...
Machine Learning Algorithms | Machine Learning Tutorial | Data Science Algori...Simplilearn
 
Experimental Design for Distributed Machine Learning with Myles Baker
Experimental Design for Distributed Machine Learning with Myles BakerExperimental Design for Distributed Machine Learning with Myles Baker
Experimental Design for Distributed Machine Learning with Myles BakerDatabricks
 
Predictive Process Monitoring with Hyperparameter Optimization
Predictive Process Monitoring with Hyperparameter OptimizationPredictive Process Monitoring with Hyperparameter Optimization
Predictive Process Monitoring with Hyperparameter OptimizationMarlon Dumas
 
Chapter 6 data analysis iec11
Chapter 6 data analysis iec11Chapter 6 data analysis iec11
Chapter 6 data analysis iec11Ho Cao Viet
 
A Firefly based improved clustering algorithm
A Firefly based improved clustering algorithmA Firefly based improved clustering algorithm
A Firefly based improved clustering algorithmIRJET Journal
 
Comparative Study of Machine Learning Algorithms for Sentiment Analysis with ...
Comparative Study of Machine Learning Algorithms for Sentiment Analysis with ...Comparative Study of Machine Learning Algorithms for Sentiment Analysis with ...
Comparative Study of Machine Learning Algorithms for Sentiment Analysis with ...Sagar Deogirkar
 
How predictive models help Medicinal Chemists design better drugs_webinar
How predictive models help Medicinal Chemists design better drugs_webinarHow predictive models help Medicinal Chemists design better drugs_webinar
How predictive models help Medicinal Chemists design better drugs_webinarAnn-Marie Roche
 
Presentation
PresentationPresentation
Presentationbutest
 
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...Md. Main Uddin Rony
 
Deep_Learning__INAF_baroncelli.pdf
Deep_Learning__INAF_baroncelli.pdfDeep_Learning__INAF_baroncelli.pdf
Deep_Learning__INAF_baroncelli.pdfasdfasdf214078
 
STATISTICAL METHOD OF QSAR
STATISTICAL METHOD OF QSARSTATISTICAL METHOD OF QSAR
STATISTICAL METHOD OF QSARRaniBhagat1
 
Summer 2015 Internship
Summer 2015 InternshipSummer 2015 Internship
Summer 2015 InternshipTaylor Martell
 
Bioactivity Predictive ModelingMay2016
Bioactivity Predictive ModelingMay2016Bioactivity Predictive ModelingMay2016
Bioactivity Predictive ModelingMay2016Matthew Clark
 
From sensor readings to prediction: on the process of developing practical so...
From sensor readings to prediction: on the process of developing practical so...From sensor readings to prediction: on the process of developing practical so...
From sensor readings to prediction: on the process of developing practical so...Manuel Martín
 
A Comparative study of locality Preserving Projection & Principle Component A...
A Comparative study of locality Preserving Projection & Principle Component A...A Comparative study of locality Preserving Projection & Principle Component A...
A Comparative study of locality Preserving Projection & Principle Component A...RAHUL WAGAJ
 

Similar to 06-00-ACA-Evaluation.pdf (20)

OTTO-Report
OTTO-ReportOTTO-Report
OTTO-Report
 
Msa presentation
Msa presentationMsa presentation
Msa presentation
 
Training Deep Networks with Backprop (D1L4 Insight@DCU Machine Learning Works...
Training Deep Networks with Backprop (D1L4 Insight@DCU Machine Learning Works...Training Deep Networks with Backprop (D1L4 Insight@DCU Machine Learning Works...
Training Deep Networks with Backprop (D1L4 Insight@DCU Machine Learning Works...
 
EvaluationMetrics.pptx
EvaluationMetrics.pptxEvaluationMetrics.pptx
EvaluationMetrics.pptx
 
CSL0777-L07.pptx
CSL0777-L07.pptxCSL0777-L07.pptx
CSL0777-L07.pptx
 
Machine Learning Algorithms | Machine Learning Tutorial | Data Science Algori...
Machine Learning Algorithms | Machine Learning Tutorial | Data Science Algori...Machine Learning Algorithms | Machine Learning Tutorial | Data Science Algori...
Machine Learning Algorithms | Machine Learning Tutorial | Data Science Algori...
 
Experimental Design for Distributed Machine Learning with Myles Baker
Experimental Design for Distributed Machine Learning with Myles BakerExperimental Design for Distributed Machine Learning with Myles Baker
Experimental Design for Distributed Machine Learning with Myles Baker
 
Predictive Process Monitoring with Hyperparameter Optimization
Predictive Process Monitoring with Hyperparameter OptimizationPredictive Process Monitoring with Hyperparameter Optimization
Predictive Process Monitoring with Hyperparameter Optimization
 
Chapter 6 data analysis iec11
Chapter 6 data analysis iec11Chapter 6 data analysis iec11
Chapter 6 data analysis iec11
 
A Firefly based improved clustering algorithm
A Firefly based improved clustering algorithmA Firefly based improved clustering algorithm
A Firefly based improved clustering algorithm
 
Comparative Study of Machine Learning Algorithms for Sentiment Analysis with ...
Comparative Study of Machine Learning Algorithms for Sentiment Analysis with ...Comparative Study of Machine Learning Algorithms for Sentiment Analysis with ...
Comparative Study of Machine Learning Algorithms for Sentiment Analysis with ...
 
How predictive models help Medicinal Chemists design better drugs_webinar
How predictive models help Medicinal Chemists design better drugs_webinarHow predictive models help Medicinal Chemists design better drugs_webinar
How predictive models help Medicinal Chemists design better drugs_webinar
 
Presentation
PresentationPresentation
Presentation
 
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
 
Deep_Learning__INAF_baroncelli.pdf
Deep_Learning__INAF_baroncelli.pdfDeep_Learning__INAF_baroncelli.pdf
Deep_Learning__INAF_baroncelli.pdf
 
STATISTICAL METHOD OF QSAR
STATISTICAL METHOD OF QSARSTATISTICAL METHOD OF QSAR
STATISTICAL METHOD OF QSAR
 
Summer 2015 Internship
Summer 2015 InternshipSummer 2015 Internship
Summer 2015 Internship
 
Bioactivity Predictive ModelingMay2016
Bioactivity Predictive ModelingMay2016Bioactivity Predictive ModelingMay2016
Bioactivity Predictive ModelingMay2016
 
From sensor readings to prediction: on the process of developing practical so...
From sensor readings to prediction: on the process of developing practical so...From sensor readings to prediction: on the process of developing practical so...
From sensor readings to prediction: on the process of developing practical so...
 
A Comparative study of locality Preserving Projection & Principle Component A...
A Comparative study of locality Preserving Projection & Principle Component A...A Comparative study of locality Preserving Projection & Principle Component A...
A Comparative study of locality Preserving Projection & Principle Component A...
 

More from AlexanderLerch4

0A-02-ACA-Fundamentals-Convolution.pdf
0A-02-ACA-Fundamentals-Convolution.pdf0A-02-ACA-Fundamentals-Convolution.pdf
0A-02-ACA-Fundamentals-Convolution.pdfAlexanderLerch4
 
09-03-ACA-Temporal-Onsets.pdf
09-03-ACA-Temporal-Onsets.pdf09-03-ACA-Temporal-Onsets.pdf
09-03-ACA-Temporal-Onsets.pdfAlexanderLerch4
 
09-07-ACA-Temporal-Structure.pdf
09-07-ACA-Temporal-Structure.pdf09-07-ACA-Temporal-Structure.pdf
09-07-ACA-Temporal-Structure.pdfAlexanderLerch4
 
09-01-ACA-Temporal-Intro.pdf
09-01-ACA-Temporal-Intro.pdf09-01-ACA-Temporal-Intro.pdf
09-01-ACA-Temporal-Intro.pdfAlexanderLerch4
 
07-06-ACA-Tonal-Chord.pdf
07-06-ACA-Tonal-Chord.pdf07-06-ACA-Tonal-Chord.pdf
07-06-ACA-Tonal-Chord.pdfAlexanderLerch4
 
15-00-ACA-Music-Performance-Analysis.pdf
15-00-ACA-Music-Performance-Analysis.pdf15-00-ACA-Music-Performance-Analysis.pdf
15-00-ACA-Music-Performance-Analysis.pdfAlexanderLerch4
 
12-01-ACA-Similarity.pdf
12-01-ACA-Similarity.pdf12-01-ACA-Similarity.pdf
12-01-ACA-Similarity.pdfAlexanderLerch4
 
09-04-ACA-Temporal-BeatHisto.pdf
09-04-ACA-Temporal-BeatHisto.pdf09-04-ACA-Temporal-BeatHisto.pdf
09-04-ACA-Temporal-BeatHisto.pdfAlexanderLerch4
 
09-05-ACA-Temporal-Tempo.pdf
09-05-ACA-Temporal-Tempo.pdf09-05-ACA-Temporal-Tempo.pdf
09-05-ACA-Temporal-Tempo.pdfAlexanderLerch4
 
11-00-ACA-Fingerprinting.pdf
11-00-ACA-Fingerprinting.pdf11-00-ACA-Fingerprinting.pdf
11-00-ACA-Fingerprinting.pdfAlexanderLerch4
 
03-05-ACA-Input-Features.pdf
03-05-ACA-Input-Features.pdf03-05-ACA-Input-Features.pdf
03-05-ACA-Input-Features.pdfAlexanderLerch4
 
07-03-03-ACA-Tonal-Monof0.pdf
07-03-03-ACA-Tonal-Monof0.pdf07-03-03-ACA-Tonal-Monof0.pdf
07-03-03-ACA-Tonal-Monof0.pdfAlexanderLerch4
 
05-00-ACA-Data-Intro.pdf
05-00-ACA-Data-Intro.pdf05-00-ACA-Data-Intro.pdf
05-00-ACA-Data-Intro.pdfAlexanderLerch4
 
07-03-05-ACA-Tonal-F0Eval.pdf
07-03-05-ACA-Tonal-F0Eval.pdf07-03-05-ACA-Tonal-F0Eval.pdf
07-03-05-ACA-Tonal-F0Eval.pdfAlexanderLerch4
 
03-03-01-ACA-Input-TF-Fourier.pdf
03-03-01-ACA-Input-TF-Fourier.pdf03-03-01-ACA-Input-TF-Fourier.pdf
03-03-01-ACA-Input-TF-Fourier.pdfAlexanderLerch4
 
03-06-ACA-Input-FeatureLearning.pdf
03-06-ACA-Input-FeatureLearning.pdf03-06-ACA-Input-FeatureLearning.pdf
03-06-ACA-Input-FeatureLearning.pdfAlexanderLerch4
 

More from AlexanderLerch4 (20)

0A-02-ACA-Fundamentals-Convolution.pdf
0A-02-ACA-Fundamentals-Convolution.pdf0A-02-ACA-Fundamentals-Convolution.pdf
0A-02-ACA-Fundamentals-Convolution.pdf
 
12-02-ACA-Genre.pdf
12-02-ACA-Genre.pdf12-02-ACA-Genre.pdf
12-02-ACA-Genre.pdf
 
10-02-ACA-Alignment.pdf
10-02-ACA-Alignment.pdf10-02-ACA-Alignment.pdf
10-02-ACA-Alignment.pdf
 
09-03-ACA-Temporal-Onsets.pdf
09-03-ACA-Temporal-Onsets.pdf09-03-ACA-Temporal-Onsets.pdf
09-03-ACA-Temporal-Onsets.pdf
 
13-00-ACA-Mood.pdf
13-00-ACA-Mood.pdf13-00-ACA-Mood.pdf
13-00-ACA-Mood.pdf
 
09-07-ACA-Temporal-Structure.pdf
09-07-ACA-Temporal-Structure.pdf09-07-ACA-Temporal-Structure.pdf
09-07-ACA-Temporal-Structure.pdf
 
09-01-ACA-Temporal-Intro.pdf
09-01-ACA-Temporal-Intro.pdf09-01-ACA-Temporal-Intro.pdf
09-01-ACA-Temporal-Intro.pdf
 
07-06-ACA-Tonal-Chord.pdf
07-06-ACA-Tonal-Chord.pdf07-06-ACA-Tonal-Chord.pdf
07-06-ACA-Tonal-Chord.pdf
 
15-00-ACA-Music-Performance-Analysis.pdf
15-00-ACA-Music-Performance-Analysis.pdf15-00-ACA-Music-Performance-Analysis.pdf
15-00-ACA-Music-Performance-Analysis.pdf
 
12-01-ACA-Similarity.pdf
12-01-ACA-Similarity.pdf12-01-ACA-Similarity.pdf
12-01-ACA-Similarity.pdf
 
09-04-ACA-Temporal-BeatHisto.pdf
09-04-ACA-Temporal-BeatHisto.pdf09-04-ACA-Temporal-BeatHisto.pdf
09-04-ACA-Temporal-BeatHisto.pdf
 
09-05-ACA-Temporal-Tempo.pdf
09-05-ACA-Temporal-Tempo.pdf09-05-ACA-Temporal-Tempo.pdf
09-05-ACA-Temporal-Tempo.pdf
 
11-00-ACA-Fingerprinting.pdf
11-00-ACA-Fingerprinting.pdf11-00-ACA-Fingerprinting.pdf
11-00-ACA-Fingerprinting.pdf
 
08-00-ACA-Intensity.pdf
08-00-ACA-Intensity.pdf08-00-ACA-Intensity.pdf
08-00-ACA-Intensity.pdf
 
03-05-ACA-Input-Features.pdf
03-05-ACA-Input-Features.pdf03-05-ACA-Input-Features.pdf
03-05-ACA-Input-Features.pdf
 
07-03-03-ACA-Tonal-Monof0.pdf
07-03-03-ACA-Tonal-Monof0.pdf07-03-03-ACA-Tonal-Monof0.pdf
07-03-03-ACA-Tonal-Monof0.pdf
 
05-00-ACA-Data-Intro.pdf
05-00-ACA-Data-Intro.pdf05-00-ACA-Data-Intro.pdf
05-00-ACA-Data-Intro.pdf
 
07-03-05-ACA-Tonal-F0Eval.pdf
07-03-05-ACA-Tonal-F0Eval.pdf07-03-05-ACA-Tonal-F0Eval.pdf
07-03-05-ACA-Tonal-F0Eval.pdf
 
03-03-01-ACA-Input-TF-Fourier.pdf
03-03-01-ACA-Input-TF-Fourier.pdf03-03-01-ACA-Input-TF-Fourier.pdf
03-03-01-ACA-Input-TF-Fourier.pdf
 
03-06-ACA-Input-FeatureLearning.pdf
03-06-ACA-Input-FeatureLearning.pdf03-06-ACA-Input-FeatureLearning.pdf
03-06-ACA-Input-FeatureLearning.pdf
 

Recently uploaded

APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsPrecisely
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 

Recently uploaded (20)

APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power Systems
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 

06-00-ACA-Evaluation.pdf

  • 1. Introduction to Audio Content Analysis module 6.0: evaluation and metrics alexander lerch
  • 2. overview intro classification regression summary introduction overview corresponding textbook section chapter 6 lecture content • evaluation methodology • good practices • metrics learning objectives • design proper evaluation setups for machine learning algorithms • list relevant metrics for different machine learning models module 6.0: evaluation and metrics 1 / 9
  • 3. overview intro classification regression summary introduction overview corresponding textbook section chapter 6 lecture content • evaluation methodology • good practices • metrics learning objectives • design proper evaluation setups for machine learning algorithms • list relevant metrics for different machine learning models module 6.0: evaluation and metrics 1 / 9
  • 4. overview intro classification regression summary evaluation introduction without proper evaluation, there is no way to say whether a system works typical mistakes in evaluation 1 non-representative test set 1 small, too homogeneous, ... 2 tuning system parameters with the test set (explicitly or implicitly) 3 using misleading evaluation procedures and metrics module 6.0: evaluation and metrics 2 / 9
  • 5. overview intro classification regression summary evaluation good practices 1/2 evaluation method unrelated to the specific implementation • has to be task driven, not algorithm driven • metrics should be unrelated to loss function expectations clearly defined • worst case performance (random) • best case performance (oracle) • realistic performance ⇒ baseline system ▶ Zero-R classifier ▶ standard approach module 6.0: evaluation and metrics 3 / 9
  • 6. overview intro classification regression summary evaluation good practices 1/2 evaluation method unrelated to the specific implementation • has to be task driven, not algorithm driven • metrics should be unrelated to loss function expectations clearly defined • worst case performance (random) • best case performance (oracle) • realistic performance ⇒ baseline system ▶ Zero-R classifier ▶ standard approach module 6.0: evaluation and metrics 3 / 9
  • 7. overview intro classification regression summary evaluation good practices 2/2 comparability to state-of-the-art • use of established datasets and identical data splits • running existing systems on your data increase reproducibility • automate evaluation • publish source code test for statistical significance module 6.0: evaluation and metrics 4 / 9
  • 8. overview intro classification regression summary evaluation good practices 2/2 comparability to state-of-the-art • use of established datasets and identical data splits • running existing systems on your data increase reproducibility • automate evaluation • publish source code test for statistical significance module 6.0: evaluation and metrics 4 / 9
  • 9. overview intro classification regression summary evaluation good practices 2/2 comparability to state-of-the-art • use of established datasets and identical data splits • running existing systems on your data increase reproducibility • automate evaluation • publish source code test for statistical significance module 6.0: evaluation and metrics 4 / 9
  • 10. overview intro classification regression summary classification metrics introduction possible outcomes of two class problem (positive and negative): • TP: Positives correctly identified as Positives, • TN: Negatives correctly identified Negatives, • FP: Negatives incorrectly identified Positives, and • FN: Positives incorrectly identified Negatives. visualization: confusion matrix Predicted Positive Negative Σ GT Positive TP True Positives FN False Negatives TP+FN # of GT Positives Negative FP False Positives TN True Negatives FP+TN # of GT Negatives Σ TP+FP # of Predicted Positives TN+FN # of Predicted Negatives TP+TN # of True Predictions module 6.0: evaluation and metrics 5 / 9
  • 11. overview intro classification regression summary classification metrics accuracy and f-measure accuracy: how many predictions are accurate macro accuracy: averaged over classes (not observations) precision: how many predicted positives are correct recall: how many ground truth positives correctly predicted f-measure: combines precision and recall Acc = TP + TN TP + TN + FP + FN module 6.0: evaluation and metrics 6 / 9
  • 12. overview intro classification regression summary classification metrics accuracy and f-measure accuracy: how many predictions are accurate macro accuracy: averaged over classes (not observations) precision: how many predicted positives are correct recall: how many ground truth positives correctly predicted f-measure: combines precision and recall Acc = TP + TN TP + TN + FP + FN AccMacro = TP TP+FN + TN TN+FP 2 = TPR + TNR 2 module 6.0: evaluation and metrics 6 / 9
  • 13. overview intro classification regression summary classification metrics accuracy and f-measure accuracy: how many predictions are accurate macro accuracy: averaged over classes (not observations) precision: how many predicted positives are correct recall: how many ground truth positives correctly predicted f-measure: combines precision and recall Acc = TP + TN TP + TN + FP + FN AccMacro = TP TP+FN + TN TN+FP 2 = TPR + TNR 2 P = TP TP + FP module 6.0: evaluation and metrics 6 / 9
  • 14. overview intro classification regression summary classification metrics accuracy and f-measure accuracy: how many predictions are accurate macro accuracy: averaged over classes (not observations) precision: how many predicted positives are correct recall: how many ground truth positives correctly predicted f-measure: combines precision and recall Acc = TP + TN TP + TN + FP + FN AccMacro = TP TP+FN + TN TN+FP 2 = TPR + TNR 2 P = TP TP + FP R = TP TP + FN module 6.0: evaluation and metrics 6 / 9
  • 15. overview intro classification regression summary classification metrics accuracy and f-measure accuracy: how many predictions are accurate macro accuracy: averaged over classes (not observations) precision: how many predicted positives are correct recall: how many ground truth positives correctly predicted f-measure: combines precision and recall Acc = TP + TN TP + TN + FP + FN AccMacro = TP TP+FN + TN TN+FP 2 = TPR + TNR 2 P = TP TP + FP R = TP TP + FN F = 2 · P · R P + R module 6.0: evaluation and metrics 6 / 9
  • 16. overview intro classification regression summary classification metrics area under curve module 6.0: evaluation and metrics 7 / 9 matlab source: plotROC.m
  • 17. overview intro classification regression summary regression metrics mae, mse, R2 goal: measure deviation mean absolute error mean squared error coefficient of determination MAE = 1 R X ∀r |y(r) − ŷ(r)| module 6.0: evaluation and metrics 8 / 9
  • 18. overview intro classification regression summary regression metrics mae, mse, R2 goal: measure deviation mean absolute error mean squared error coefficient of determination MAE = 1 R X ∀r |y(r) − ŷ(r)| MSE = 1 R X ∀r y(r) − ŷ(r) 2 module 6.0: evaluation and metrics 8 / 9
  • 19. overview intro classification regression summary regression metrics mae, mse, R2 goal: measure deviation mean absolute error mean squared error coefficient of determination MAE = 1 R X ∀r |y(r) − ŷ(r)| MSE = 1 R X ∀r y(r) − ŷ(r) 2 R2 = 1 − MSE y − ŷ MSE y − µy module 6.0: evaluation and metrics 8 / 9
  • 20. overview intro classification regression summary summary lecture content evaluation • system development without evaluation is meaningless • data and method need to be carefully selected • metrics need to reflect the sucess of the system classification metrics • accuracy and macro accuracy • precision, recall, and f-measure • AUC regression metrics • MAE and MSE • coefficient of determination module 6.0: evaluation and metrics 9 / 9