SlideShare a Scribd company logo
Introduction to Audio Content Analysis
module 6.0: evaluation and metrics
alexander lerch
overview intro classification regression summary
introduction
overview
corresponding textbook section
chapter 6
lecture content
• evaluation methodology
• good practices
• metrics
learning objectives
• design proper evaluation setups for machine learning algorithms
• list relevant metrics for different machine learning models
module 6.0: evaluation and metrics 1 / 9
overview intro classification regression summary
introduction
overview
corresponding textbook section
chapter 6
lecture content
• evaluation methodology
• good practices
• metrics
learning objectives
• design proper evaluation setups for machine learning algorithms
• list relevant metrics for different machine learning models
module 6.0: evaluation and metrics 1 / 9
overview intro classification regression summary
evaluation
introduction
without proper evaluation, there is no way to say whether a system works
typical mistakes in evaluation
1 non-representative test set
1 small, too homogeneous, ...
2 tuning system parameters with the test set (explicitly or implicitly)
3 using misleading evaluation procedures and metrics
module 6.0: evaluation and metrics 2 / 9
overview intro classification regression summary
evaluation
good practices 1/2
evaluation method unrelated to the specific implementation
• has to be task driven, not algorithm driven
• metrics should be unrelated to loss function
expectations clearly defined
• worst case performance (random)
• best case performance (oracle)
• realistic performance ⇒ baseline system
▶ Zero-R classifier
▶ standard approach
module 6.0: evaluation and metrics 3 / 9
overview intro classification regression summary
evaluation
good practices 1/2
evaluation method unrelated to the specific implementation
• has to be task driven, not algorithm driven
• metrics should be unrelated to loss function
expectations clearly defined
• worst case performance (random)
• best case performance (oracle)
• realistic performance ⇒ baseline system
▶ Zero-R classifier
▶ standard approach
module 6.0: evaluation and metrics 3 / 9
overview intro classification regression summary
evaluation
good practices 2/2
comparability to state-of-the-art
• use of established datasets and identical data splits
• running existing systems on your data
increase reproducibility
• automate evaluation
• publish source code
test for statistical significance
module 6.0: evaluation and metrics 4 / 9
overview intro classification regression summary
evaluation
good practices 2/2
comparability to state-of-the-art
• use of established datasets and identical data splits
• running existing systems on your data
increase reproducibility
• automate evaluation
• publish source code
test for statistical significance
module 6.0: evaluation and metrics 4 / 9
overview intro classification regression summary
evaluation
good practices 2/2
comparability to state-of-the-art
• use of established datasets and identical data splits
• running existing systems on your data
increase reproducibility
• automate evaluation
• publish source code
test for statistical significance
module 6.0: evaluation and metrics 4 / 9
overview intro classification regression summary
classification metrics
introduction
possible outcomes of two class problem (positive and negative):
• TP: Positives correctly identified as Positives,
• TN: Negatives correctly identified Negatives,
• FP: Negatives incorrectly identified Positives, and
• FN: Positives incorrectly identified Negatives.
visualization: confusion matrix
Predicted
Positive Negative Σ
GT
Positive
TP
True Positives
FN
False Negatives
TP+FN
# of GT Positives
Negative
FP
False Positives
TN
True Negatives
FP+TN
# of GT Negatives
Σ
TP+FP
# of Predicted Positives
TN+FN
# of Predicted Negatives
TP+TN
# of True Predictions
module 6.0: evaluation and metrics 5 / 9
overview intro classification regression summary
classification metrics
accuracy and f-measure
accuracy: how many predictions are
accurate
macro accuracy: averaged over
classes (not observations)
precision: how many predicted
positives are correct
recall: how many ground truth
positives correctly predicted
f-measure: combines precision and
recall
Acc =
TP + TN
TP + TN + FP + FN
module 6.0: evaluation and metrics 6 / 9
overview intro classification regression summary
classification metrics
accuracy and f-measure
accuracy: how many predictions are
accurate
macro accuracy: averaged over
classes (not observations)
precision: how many predicted
positives are correct
recall: how many ground truth
positives correctly predicted
f-measure: combines precision and
recall
Acc =
TP + TN
TP + TN + FP + FN
AccMacro =
TP
TP+FN
+ TN
TN+FP
2
=
TPR + TNR
2
module 6.0: evaluation and metrics 6 / 9
overview intro classification regression summary
classification metrics
accuracy and f-measure
accuracy: how many predictions are
accurate
macro accuracy: averaged over
classes (not observations)
precision: how many predicted
positives are correct
recall: how many ground truth
positives correctly predicted
f-measure: combines precision and
recall
Acc =
TP + TN
TP + TN + FP + FN
AccMacro =
TP
TP+FN
+ TN
TN+FP
2
=
TPR + TNR
2
P =
TP
TP + FP
module 6.0: evaluation and metrics 6 / 9
overview intro classification regression summary
classification metrics
accuracy and f-measure
accuracy: how many predictions are
accurate
macro accuracy: averaged over
classes (not observations)
precision: how many predicted
positives are correct
recall: how many ground truth
positives correctly predicted
f-measure: combines precision and
recall
Acc =
TP + TN
TP + TN + FP + FN
AccMacro =
TP
TP+FN
+ TN
TN+FP
2
=
TPR + TNR
2
P =
TP
TP + FP
R =
TP
TP + FN
module 6.0: evaluation and metrics 6 / 9
overview intro classification regression summary
classification metrics
accuracy and f-measure
accuracy: how many predictions are
accurate
macro accuracy: averaged over
classes (not observations)
precision: how many predicted
positives are correct
recall: how many ground truth
positives correctly predicted
f-measure: combines precision and
recall
Acc =
TP + TN
TP + TN + FP + FN
AccMacro =
TP
TP+FN
+ TN
TN+FP
2
=
TPR + TNR
2
P =
TP
TP + FP
R =
TP
TP + FN
F = 2 ·
P · R
P + R
module 6.0: evaluation and metrics 6 / 9
overview intro classification regression summary
classification metrics
area under curve
module 6.0: evaluation and metrics 7 / 9
matlab
source:
plotROC.m
overview intro classification regression summary
regression metrics
mae, mse, R2
goal: measure deviation
mean absolute error
mean squared error
coefficient of determination
MAE =
1
R
X
∀r
|y(r) − ŷ(r)|
module 6.0: evaluation and metrics 8 / 9
overview intro classification regression summary
regression metrics
mae, mse, R2
goal: measure deviation
mean absolute error
mean squared error
coefficient of determination
MAE =
1
R
X
∀r
|y(r) − ŷ(r)|
MSE =
1
R
X
∀r
y(r) − ŷ(r)
2
module 6.0: evaluation and metrics 8 / 9
overview intro classification regression summary
regression metrics
mae, mse, R2
goal: measure deviation
mean absolute error
mean squared error
coefficient of determination
MAE =
1
R
X
∀r
|y(r) − ŷ(r)|
MSE =
1
R
X
∀r
y(r) − ŷ(r)
2
R2
= 1 −
MSE y − ŷ

MSE y − µy

module 6.0: evaluation and metrics 8 / 9
overview intro classification regression summary
summary
lecture content
evaluation
• system development without evaluation is meaningless
• data and method need to be carefully selected
• metrics need to reflect the sucess of the system
classification metrics
• accuracy and macro accuracy
• precision, recall, and f-measure
• AUC
regression metrics
• MAE and MSE
• coefficient of determination
module 6.0: evaluation and metrics 9 / 9

More Related Content

Similar to 06-00-ACA-Evaluation.pdf

OTTO-Report
OTTO-ReportOTTO-Report
Msa presentation
Msa presentationMsa presentation
Msa presentation
Dr. Bikram Jit Singh
 
Training Deep Networks with Backprop (D1L4 Insight@DCU Machine Learning Works...
Training Deep Networks with Backprop (D1L4 Insight@DCU Machine Learning Works...Training Deep Networks with Backprop (D1L4 Insight@DCU Machine Learning Works...
Training Deep Networks with Backprop (D1L4 Insight@DCU Machine Learning Works...
Universitat Politècnica de Catalunya
 
EvaluationMetrics.pptx
EvaluationMetrics.pptxEvaluationMetrics.pptx
EvaluationMetrics.pptx
shuchismitjha2
 
CSL0777-L07.pptx
CSL0777-L07.pptxCSL0777-L07.pptx
CSL0777-L07.pptx
KonkoboUlrichArthur
 
Machine Learning Algorithms | Machine Learning Tutorial | Data Science Algori...
Machine Learning Algorithms | Machine Learning Tutorial | Data Science Algori...Machine Learning Algorithms | Machine Learning Tutorial | Data Science Algori...
Machine Learning Algorithms | Machine Learning Tutorial | Data Science Algori...
Simplilearn
 
Experimental Design for Distributed Machine Learning with Myles Baker
Experimental Design for Distributed Machine Learning with Myles BakerExperimental Design for Distributed Machine Learning with Myles Baker
Experimental Design for Distributed Machine Learning with Myles Baker
Databricks
 
Predictive Process Monitoring with Hyperparameter Optimization
Predictive Process Monitoring with Hyperparameter OptimizationPredictive Process Monitoring with Hyperparameter Optimization
Predictive Process Monitoring with Hyperparameter Optimization
Marlon Dumas
 
Chapter 6 data analysis iec11
Chapter 6 data analysis iec11Chapter 6 data analysis iec11
Chapter 6 data analysis iec11
Ho Cao Viet
 
A Firefly based improved clustering algorithm
A Firefly based improved clustering algorithmA Firefly based improved clustering algorithm
A Firefly based improved clustering algorithm
IRJET Journal
 
Comparative Study of Machine Learning Algorithms for Sentiment Analysis with ...
Comparative Study of Machine Learning Algorithms for Sentiment Analysis with ...Comparative Study of Machine Learning Algorithms for Sentiment Analysis with ...
Comparative Study of Machine Learning Algorithms for Sentiment Analysis with ...
Sagar Deogirkar
 
How predictive models help Medicinal Chemists design better drugs_webinar
How predictive models help Medicinal Chemists design better drugs_webinarHow predictive models help Medicinal Chemists design better drugs_webinar
How predictive models help Medicinal Chemists design better drugs_webinar
Ann-Marie Roche
 
Presentation
PresentationPresentation
Presentation
butest
 
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
Md. Main Uddin Rony
 
Deep_Learning__INAF_baroncelli.pdf
Deep_Learning__INAF_baroncelli.pdfDeep_Learning__INAF_baroncelli.pdf
Deep_Learning__INAF_baroncelli.pdf
asdfasdf214078
 
STATISTICAL METHOD OF QSAR
STATISTICAL METHOD OF QSARSTATISTICAL METHOD OF QSAR
STATISTICAL METHOD OF QSAR
RaniBhagat1
 
Summer 2015 Internship
Summer 2015 InternshipSummer 2015 Internship
Summer 2015 Internship
Taylor Martell
 
Bioactivity Predictive ModelingMay2016
Bioactivity Predictive ModelingMay2016Bioactivity Predictive ModelingMay2016
Bioactivity Predictive ModelingMay2016
Matthew Clark
 
From sensor readings to prediction: on the process of developing practical so...
From sensor readings to prediction: on the process of developing practical so...From sensor readings to prediction: on the process of developing practical so...
From sensor readings to prediction: on the process of developing practical so...
Manuel Martín
 
A Comparative study of locality Preserving Projection & Principle Component A...
A Comparative study of locality Preserving Projection & Principle Component A...A Comparative study of locality Preserving Projection & Principle Component A...
A Comparative study of locality Preserving Projection & Principle Component A...
RAHUL WAGAJ
 

Similar to 06-00-ACA-Evaluation.pdf (20)

OTTO-Report
OTTO-ReportOTTO-Report
OTTO-Report
 
Msa presentation
Msa presentationMsa presentation
Msa presentation
 
Training Deep Networks with Backprop (D1L4 Insight@DCU Machine Learning Works...
Training Deep Networks with Backprop (D1L4 Insight@DCU Machine Learning Works...Training Deep Networks with Backprop (D1L4 Insight@DCU Machine Learning Works...
Training Deep Networks with Backprop (D1L4 Insight@DCU Machine Learning Works...
 
EvaluationMetrics.pptx
EvaluationMetrics.pptxEvaluationMetrics.pptx
EvaluationMetrics.pptx
 
CSL0777-L07.pptx
CSL0777-L07.pptxCSL0777-L07.pptx
CSL0777-L07.pptx
 
Machine Learning Algorithms | Machine Learning Tutorial | Data Science Algori...
Machine Learning Algorithms | Machine Learning Tutorial | Data Science Algori...Machine Learning Algorithms | Machine Learning Tutorial | Data Science Algori...
Machine Learning Algorithms | Machine Learning Tutorial | Data Science Algori...
 
Experimental Design for Distributed Machine Learning with Myles Baker
Experimental Design for Distributed Machine Learning with Myles BakerExperimental Design for Distributed Machine Learning with Myles Baker
Experimental Design for Distributed Machine Learning with Myles Baker
 
Predictive Process Monitoring with Hyperparameter Optimization
Predictive Process Monitoring with Hyperparameter OptimizationPredictive Process Monitoring with Hyperparameter Optimization
Predictive Process Monitoring with Hyperparameter Optimization
 
Chapter 6 data analysis iec11
Chapter 6 data analysis iec11Chapter 6 data analysis iec11
Chapter 6 data analysis iec11
 
A Firefly based improved clustering algorithm
A Firefly based improved clustering algorithmA Firefly based improved clustering algorithm
A Firefly based improved clustering algorithm
 
Comparative Study of Machine Learning Algorithms for Sentiment Analysis with ...
Comparative Study of Machine Learning Algorithms for Sentiment Analysis with ...Comparative Study of Machine Learning Algorithms for Sentiment Analysis with ...
Comparative Study of Machine Learning Algorithms for Sentiment Analysis with ...
 
How predictive models help Medicinal Chemists design better drugs_webinar
How predictive models help Medicinal Chemists design better drugs_webinarHow predictive models help Medicinal Chemists design better drugs_webinar
How predictive models help Medicinal Chemists design better drugs_webinar
 
Presentation
PresentationPresentation
Presentation
 
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
 
Deep_Learning__INAF_baroncelli.pdf
Deep_Learning__INAF_baroncelli.pdfDeep_Learning__INAF_baroncelli.pdf
Deep_Learning__INAF_baroncelli.pdf
 
STATISTICAL METHOD OF QSAR
STATISTICAL METHOD OF QSARSTATISTICAL METHOD OF QSAR
STATISTICAL METHOD OF QSAR
 
Summer 2015 Internship
Summer 2015 InternshipSummer 2015 Internship
Summer 2015 Internship
 
Bioactivity Predictive ModelingMay2016
Bioactivity Predictive ModelingMay2016Bioactivity Predictive ModelingMay2016
Bioactivity Predictive ModelingMay2016
 
From sensor readings to prediction: on the process of developing practical so...
From sensor readings to prediction: on the process of developing practical so...From sensor readings to prediction: on the process of developing practical so...
From sensor readings to prediction: on the process of developing practical so...
 
A Comparative study of locality Preserving Projection & Principle Component A...
A Comparative study of locality Preserving Projection & Principle Component A...A Comparative study of locality Preserving Projection & Principle Component A...
A Comparative study of locality Preserving Projection & Principle Component A...
 

More from AlexanderLerch4

0A-02-ACA-Fundamentals-Convolution.pdf
0A-02-ACA-Fundamentals-Convolution.pdf0A-02-ACA-Fundamentals-Convolution.pdf
0A-02-ACA-Fundamentals-Convolution.pdf
AlexanderLerch4
 
12-02-ACA-Genre.pdf
12-02-ACA-Genre.pdf12-02-ACA-Genre.pdf
12-02-ACA-Genre.pdf
AlexanderLerch4
 
10-02-ACA-Alignment.pdf
10-02-ACA-Alignment.pdf10-02-ACA-Alignment.pdf
10-02-ACA-Alignment.pdf
AlexanderLerch4
 
09-03-ACA-Temporal-Onsets.pdf
09-03-ACA-Temporal-Onsets.pdf09-03-ACA-Temporal-Onsets.pdf
09-03-ACA-Temporal-Onsets.pdf
AlexanderLerch4
 
13-00-ACA-Mood.pdf
13-00-ACA-Mood.pdf13-00-ACA-Mood.pdf
13-00-ACA-Mood.pdf
AlexanderLerch4
 
09-07-ACA-Temporal-Structure.pdf
09-07-ACA-Temporal-Structure.pdf09-07-ACA-Temporal-Structure.pdf
09-07-ACA-Temporal-Structure.pdf
AlexanderLerch4
 
09-01-ACA-Temporal-Intro.pdf
09-01-ACA-Temporal-Intro.pdf09-01-ACA-Temporal-Intro.pdf
09-01-ACA-Temporal-Intro.pdf
AlexanderLerch4
 
07-06-ACA-Tonal-Chord.pdf
07-06-ACA-Tonal-Chord.pdf07-06-ACA-Tonal-Chord.pdf
07-06-ACA-Tonal-Chord.pdf
AlexanderLerch4
 
15-00-ACA-Music-Performance-Analysis.pdf
15-00-ACA-Music-Performance-Analysis.pdf15-00-ACA-Music-Performance-Analysis.pdf
15-00-ACA-Music-Performance-Analysis.pdf
AlexanderLerch4
 
12-01-ACA-Similarity.pdf
12-01-ACA-Similarity.pdf12-01-ACA-Similarity.pdf
12-01-ACA-Similarity.pdf
AlexanderLerch4
 
09-04-ACA-Temporal-BeatHisto.pdf
09-04-ACA-Temporal-BeatHisto.pdf09-04-ACA-Temporal-BeatHisto.pdf
09-04-ACA-Temporal-BeatHisto.pdf
AlexanderLerch4
 
09-05-ACA-Temporal-Tempo.pdf
09-05-ACA-Temporal-Tempo.pdf09-05-ACA-Temporal-Tempo.pdf
09-05-ACA-Temporal-Tempo.pdf
AlexanderLerch4
 
11-00-ACA-Fingerprinting.pdf
11-00-ACA-Fingerprinting.pdf11-00-ACA-Fingerprinting.pdf
11-00-ACA-Fingerprinting.pdf
AlexanderLerch4
 
08-00-ACA-Intensity.pdf
08-00-ACA-Intensity.pdf08-00-ACA-Intensity.pdf
08-00-ACA-Intensity.pdf
AlexanderLerch4
 
03-05-ACA-Input-Features.pdf
03-05-ACA-Input-Features.pdf03-05-ACA-Input-Features.pdf
03-05-ACA-Input-Features.pdf
AlexanderLerch4
 
07-03-03-ACA-Tonal-Monof0.pdf
07-03-03-ACA-Tonal-Monof0.pdf07-03-03-ACA-Tonal-Monof0.pdf
07-03-03-ACA-Tonal-Monof0.pdf
AlexanderLerch4
 
05-00-ACA-Data-Intro.pdf
05-00-ACA-Data-Intro.pdf05-00-ACA-Data-Intro.pdf
05-00-ACA-Data-Intro.pdf
AlexanderLerch4
 
07-03-05-ACA-Tonal-F0Eval.pdf
07-03-05-ACA-Tonal-F0Eval.pdf07-03-05-ACA-Tonal-F0Eval.pdf
07-03-05-ACA-Tonal-F0Eval.pdf
AlexanderLerch4
 
03-03-01-ACA-Input-TF-Fourier.pdf
03-03-01-ACA-Input-TF-Fourier.pdf03-03-01-ACA-Input-TF-Fourier.pdf
03-03-01-ACA-Input-TF-Fourier.pdf
AlexanderLerch4
 
03-06-ACA-Input-FeatureLearning.pdf
03-06-ACA-Input-FeatureLearning.pdf03-06-ACA-Input-FeatureLearning.pdf
03-06-ACA-Input-FeatureLearning.pdf
AlexanderLerch4
 

More from AlexanderLerch4 (20)

0A-02-ACA-Fundamentals-Convolution.pdf
0A-02-ACA-Fundamentals-Convolution.pdf0A-02-ACA-Fundamentals-Convolution.pdf
0A-02-ACA-Fundamentals-Convolution.pdf
 
12-02-ACA-Genre.pdf
12-02-ACA-Genre.pdf12-02-ACA-Genre.pdf
12-02-ACA-Genre.pdf
 
10-02-ACA-Alignment.pdf
10-02-ACA-Alignment.pdf10-02-ACA-Alignment.pdf
10-02-ACA-Alignment.pdf
 
09-03-ACA-Temporal-Onsets.pdf
09-03-ACA-Temporal-Onsets.pdf09-03-ACA-Temporal-Onsets.pdf
09-03-ACA-Temporal-Onsets.pdf
 
13-00-ACA-Mood.pdf
13-00-ACA-Mood.pdf13-00-ACA-Mood.pdf
13-00-ACA-Mood.pdf
 
09-07-ACA-Temporal-Structure.pdf
09-07-ACA-Temporal-Structure.pdf09-07-ACA-Temporal-Structure.pdf
09-07-ACA-Temporal-Structure.pdf
 
09-01-ACA-Temporal-Intro.pdf
09-01-ACA-Temporal-Intro.pdf09-01-ACA-Temporal-Intro.pdf
09-01-ACA-Temporal-Intro.pdf
 
07-06-ACA-Tonal-Chord.pdf
07-06-ACA-Tonal-Chord.pdf07-06-ACA-Tonal-Chord.pdf
07-06-ACA-Tonal-Chord.pdf
 
15-00-ACA-Music-Performance-Analysis.pdf
15-00-ACA-Music-Performance-Analysis.pdf15-00-ACA-Music-Performance-Analysis.pdf
15-00-ACA-Music-Performance-Analysis.pdf
 
12-01-ACA-Similarity.pdf
12-01-ACA-Similarity.pdf12-01-ACA-Similarity.pdf
12-01-ACA-Similarity.pdf
 
09-04-ACA-Temporal-BeatHisto.pdf
09-04-ACA-Temporal-BeatHisto.pdf09-04-ACA-Temporal-BeatHisto.pdf
09-04-ACA-Temporal-BeatHisto.pdf
 
09-05-ACA-Temporal-Tempo.pdf
09-05-ACA-Temporal-Tempo.pdf09-05-ACA-Temporal-Tempo.pdf
09-05-ACA-Temporal-Tempo.pdf
 
11-00-ACA-Fingerprinting.pdf
11-00-ACA-Fingerprinting.pdf11-00-ACA-Fingerprinting.pdf
11-00-ACA-Fingerprinting.pdf
 
08-00-ACA-Intensity.pdf
08-00-ACA-Intensity.pdf08-00-ACA-Intensity.pdf
08-00-ACA-Intensity.pdf
 
03-05-ACA-Input-Features.pdf
03-05-ACA-Input-Features.pdf03-05-ACA-Input-Features.pdf
03-05-ACA-Input-Features.pdf
 
07-03-03-ACA-Tonal-Monof0.pdf
07-03-03-ACA-Tonal-Monof0.pdf07-03-03-ACA-Tonal-Monof0.pdf
07-03-03-ACA-Tonal-Monof0.pdf
 
05-00-ACA-Data-Intro.pdf
05-00-ACA-Data-Intro.pdf05-00-ACA-Data-Intro.pdf
05-00-ACA-Data-Intro.pdf
 
07-03-05-ACA-Tonal-F0Eval.pdf
07-03-05-ACA-Tonal-F0Eval.pdf07-03-05-ACA-Tonal-F0Eval.pdf
07-03-05-ACA-Tonal-F0Eval.pdf
 
03-03-01-ACA-Input-TF-Fourier.pdf
03-03-01-ACA-Input-TF-Fourier.pdf03-03-01-ACA-Input-TF-Fourier.pdf
03-03-01-ACA-Input-TF-Fourier.pdf
 
03-06-ACA-Input-FeatureLearning.pdf
03-06-ACA-Input-FeatureLearning.pdf03-06-ACA-Input-FeatureLearning.pdf
03-06-ACA-Input-FeatureLearning.pdf
 

Recently uploaded

Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
Alpen-Adria-Universität
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems S.M.S.A.
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
Kari Kakkonen
 
Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
Mariano Tinti
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
Zilliz
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
Neo4j
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
Neo4j
 
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
SOFTTECHHUB
 
Infrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI modelsInfrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI models
Zilliz
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
Octavian Nadolu
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
Matthew Sinclair
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
innovationoecd
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
Daiki Mogmet Ito
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
shyamraj55
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
Matthew Sinclair
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Speck&Tech
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
Neo4j
 

Recently uploaded (20)

Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 
Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
 
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
 
Infrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI modelsInfrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI models
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
 

06-00-ACA-Evaluation.pdf

  • 1. Introduction to Audio Content Analysis module 6.0: evaluation and metrics alexander lerch
  • 2. overview intro classification regression summary introduction overview corresponding textbook section chapter 6 lecture content • evaluation methodology • good practices • metrics learning objectives • design proper evaluation setups for machine learning algorithms • list relevant metrics for different machine learning models module 6.0: evaluation and metrics 1 / 9
  • 3. overview intro classification regression summary introduction overview corresponding textbook section chapter 6 lecture content • evaluation methodology • good practices • metrics learning objectives • design proper evaluation setups for machine learning algorithms • list relevant metrics for different machine learning models module 6.0: evaluation and metrics 1 / 9
  • 4. overview intro classification regression summary evaluation introduction without proper evaluation, there is no way to say whether a system works typical mistakes in evaluation 1 non-representative test set 1 small, too homogeneous, ... 2 tuning system parameters with the test set (explicitly or implicitly) 3 using misleading evaluation procedures and metrics module 6.0: evaluation and metrics 2 / 9
  • 5. overview intro classification regression summary evaluation good practices 1/2 evaluation method unrelated to the specific implementation • has to be task driven, not algorithm driven • metrics should be unrelated to loss function expectations clearly defined • worst case performance (random) • best case performance (oracle) • realistic performance ⇒ baseline system ▶ Zero-R classifier ▶ standard approach module 6.0: evaluation and metrics 3 / 9
  • 6. overview intro classification regression summary evaluation good practices 1/2 evaluation method unrelated to the specific implementation • has to be task driven, not algorithm driven • metrics should be unrelated to loss function expectations clearly defined • worst case performance (random) • best case performance (oracle) • realistic performance ⇒ baseline system ▶ Zero-R classifier ▶ standard approach module 6.0: evaluation and metrics 3 / 9
  • 7. overview intro classification regression summary evaluation good practices 2/2 comparability to state-of-the-art • use of established datasets and identical data splits • running existing systems on your data increase reproducibility • automate evaluation • publish source code test for statistical significance module 6.0: evaluation and metrics 4 / 9
  • 8. overview intro classification regression summary evaluation good practices 2/2 comparability to state-of-the-art • use of established datasets and identical data splits • running existing systems on your data increase reproducibility • automate evaluation • publish source code test for statistical significance module 6.0: evaluation and metrics 4 / 9
  • 9. overview intro classification regression summary evaluation good practices 2/2 comparability to state-of-the-art • use of established datasets and identical data splits • running existing systems on your data increase reproducibility • automate evaluation • publish source code test for statistical significance module 6.0: evaluation and metrics 4 / 9
  • 10. overview intro classification regression summary classification metrics introduction possible outcomes of two class problem (positive and negative): • TP: Positives correctly identified as Positives, • TN: Negatives correctly identified Negatives, • FP: Negatives incorrectly identified Positives, and • FN: Positives incorrectly identified Negatives. visualization: confusion matrix Predicted Positive Negative Σ GT Positive TP True Positives FN False Negatives TP+FN # of GT Positives Negative FP False Positives TN True Negatives FP+TN # of GT Negatives Σ TP+FP # of Predicted Positives TN+FN # of Predicted Negatives TP+TN # of True Predictions module 6.0: evaluation and metrics 5 / 9
  • 11. overview intro classification regression summary classification metrics accuracy and f-measure accuracy: how many predictions are accurate macro accuracy: averaged over classes (not observations) precision: how many predicted positives are correct recall: how many ground truth positives correctly predicted f-measure: combines precision and recall Acc = TP + TN TP + TN + FP + FN module 6.0: evaluation and metrics 6 / 9
  • 12. overview intro classification regression summary classification metrics accuracy and f-measure accuracy: how many predictions are accurate macro accuracy: averaged over classes (not observations) precision: how many predicted positives are correct recall: how many ground truth positives correctly predicted f-measure: combines precision and recall Acc = TP + TN TP + TN + FP + FN AccMacro = TP TP+FN + TN TN+FP 2 = TPR + TNR 2 module 6.0: evaluation and metrics 6 / 9
  • 13. overview intro classification regression summary classification metrics accuracy and f-measure accuracy: how many predictions are accurate macro accuracy: averaged over classes (not observations) precision: how many predicted positives are correct recall: how many ground truth positives correctly predicted f-measure: combines precision and recall Acc = TP + TN TP + TN + FP + FN AccMacro = TP TP+FN + TN TN+FP 2 = TPR + TNR 2 P = TP TP + FP module 6.0: evaluation and metrics 6 / 9
  • 14. overview intro classification regression summary classification metrics accuracy and f-measure accuracy: how many predictions are accurate macro accuracy: averaged over classes (not observations) precision: how many predicted positives are correct recall: how many ground truth positives correctly predicted f-measure: combines precision and recall Acc = TP + TN TP + TN + FP + FN AccMacro = TP TP+FN + TN TN+FP 2 = TPR + TNR 2 P = TP TP + FP R = TP TP + FN module 6.0: evaluation and metrics 6 / 9
  • 15. overview intro classification regression summary classification metrics accuracy and f-measure accuracy: how many predictions are accurate macro accuracy: averaged over classes (not observations) precision: how many predicted positives are correct recall: how many ground truth positives correctly predicted f-measure: combines precision and recall Acc = TP + TN TP + TN + FP + FN AccMacro = TP TP+FN + TN TN+FP 2 = TPR + TNR 2 P = TP TP + FP R = TP TP + FN F = 2 · P · R P + R module 6.0: evaluation and metrics 6 / 9
  • 16. overview intro classification regression summary classification metrics area under curve module 6.0: evaluation and metrics 7 / 9 matlab source: plotROC.m
  • 17. overview intro classification regression summary regression metrics mae, mse, R2 goal: measure deviation mean absolute error mean squared error coefficient of determination MAE = 1 R X ∀r |y(r) − ŷ(r)| module 6.0: evaluation and metrics 8 / 9
  • 18. overview intro classification regression summary regression metrics mae, mse, R2 goal: measure deviation mean absolute error mean squared error coefficient of determination MAE = 1 R X ∀r |y(r) − ŷ(r)| MSE = 1 R X ∀r y(r) − ŷ(r) 2 module 6.0: evaluation and metrics 8 / 9
  • 19. overview intro classification regression summary regression metrics mae, mse, R2 goal: measure deviation mean absolute error mean squared error coefficient of determination MAE = 1 R X ∀r |y(r) − ŷ(r)| MSE = 1 R X ∀r y(r) − ŷ(r) 2 R2 = 1 − MSE y − ŷ MSE y − µy module 6.0: evaluation and metrics 8 / 9
  • 20. overview intro classification regression summary summary lecture content evaluation • system development without evaluation is meaningless • data and method need to be carefully selected • metrics need to reflect the sucess of the system classification metrics • accuracy and macro accuracy • precision, recall, and f-measure • AUC regression metrics • MAE and MSE • coefficient of determination module 6.0: evaluation and metrics 9 / 9