5. How robust are Chroma features? 
4. How robust are MFCCs features? 
Supported by the A4U postdoctoral grants programme and projects SIGMUS (TIN2012-36650), Compmusic (ERC 267583), PHENICX (ICT-2011.8.2) and GiantSteps (ICT-2013-10) 
What is the Effect of Audio Quality on the Robustness of MFCCs and Chroma Features? 
Julián Urbano, Dmitry Bogdanov, Perfecto Herrera, Emilia Gómez and Xavier Serra
Department of Information and Communication Technologies
Contact: julian.urbano@upf.edu
http://mtg.upf.edu
Variance components (% of total) in distributions of robustness of MFCCs 
                       | 22050 Hz                                   || 44100 Hz
Lib 1                  | δ      | ε      | r      | ρ      | θ      || δ      | ε      | r      | ρ      | θ
σ² FSize               | 1 %    | 3 %    | 2 %    | 0 %    | 2 %    || 0 %    | 0 %    | 0 %    | 0 %    | 0 %
σ² Codec               | 0 %    | 0 %    | 0 %    | 0 %    | 0 %    || 0 %    | 0 %    | 0 %    | 0 %    | 0 %
σ² Brate:Codec         | 31 %   | 42 %   | 22 %   | 8 %    | 21 %   || 47 %   | 42 %   | 23 %   | 24 %   | 22 %
σ² FSize×Codec         | 0 %    | 0 %    | 0 %    | 0 %    | 0 %    || 0 %    | 0 %    | 0 %    | 0 %    | 0 %
σ² FSize×(Brate:Codec) | 5 %    | 12 %   | 12 %   | 1 %    | 13 %   || 7 %    | 18 %   | 18 %   | 11 %   | 18 %
σ² Genre               | 1 %    | 5 %    | 4 %    | 0 %    | 4 %    || 1 %    | 1 %    | 1 %    | 0 %    | 1 %
σ² Track               | 20 %   | 6 %    | 6 %    | 13 %   | 6 %    || 10 %   | 4 %    | 3 %    | 5 %    | 3 %
σ² residual            | 42 %   | 33 %   | 54 %   | 79 %   | 54 %   || 34 %   | 35 %   | 56 %   | 60 %   | 57 %
Grand mean             | 0.0591 | 1.6958 | 0.9999 | 0.9977 | 0.9999 || 0.0682 | 1.8820 | 0.9998 | 0.9939 | 0.9998
Total variance         | 0.0032 | 3.4641 | 1.8e-7 | 3.2e-5 | 1.5e-7 || 0.0081 | 11.44  | 1.6e-6 | 0.0005 | 1.4e-6
Standard deviation     | 0.0567 | 1.8612 | 0.0004 | 0.0056 | 0.0004 || 0.0897 | 3.3835 | 0.0013 | 0.0214 | 0.0012

                       | 22050 Hz                                   || 44100 Hz
Lib 2                  | δ      | ε      | r      | ρ      | θ      || δ      | ε      | r      | ρ      | θ
σ² FSize               | 1 %    | 0 %    | 0 %    | 0 %    | 0 %    || 0 %    | 0 %    | 0 %    | 0 %    | 0 %
σ² Codec               | 0 %    | 0 %    | 0 %    | 0 %    | 0 %    || 0 %    | 0 %    | 0 %    | 0 %    | 0 %
σ² Brate:Codec         | 5 %    | 6 %    | 2 %    | 1 %    | 3 %    || 23 %   | 24 %   | 14 %   | 13 %   | 15 %
σ² FSize×Codec         | 0 %    | 0 %    | 0 %    | 0 %    | 0 %    || 0 %    | 0 %    | 0 %    | 0 %    | 0 %
σ² FSize×(Brate:Codec) | 1 %    | 0 %    | 0 %    | 0 %    | 0 %    || 7 %    | 8 %    | 10 %   | 6 %    | 11 %
σ² Genre               | 4 %    | 15 %   | 3 %    | 1 %    | 4 %    || 0 %    | 5 %    | 1 %    | 0 %    | 0 %
σ² Track               | 52 %   | 61 %   | 32 %   | 66 %   | 41 %   || 27 %   | 14 %   | 7 %    | 13 %   | 6 %
σ² residual            | 36 %   | 18 %   | 63 %   | 32 %   | 51 %   || 41 %   | 48 %   | 68 %   | 67 %   | 68 %
Grand mean             | 0.0622 | 0.0278 | 0.9999 | 0.9955 | 0.9999 || 0.0656 | 0.0342 | 0.9998 | 0.9947 | 0.9999
Total variance         | 0.0040 | 0.0015 | 8.9e-8 | 0.0002 | 3.5e-8 || 0.0055 | 0.0034 | 6.4e-7 | 0.0002 | 4.8e-7
Standard deviation     | 0.0631 | 0.0391 | 0.0003 | 0.0131 | 0.0002 || 0.0740 | 0.0587 | 0.0008 | 0.0150 | 0.0007

Much research in MIR is based on descriptors computed from audio signals. Some music corpora use different audio encodings, some do not contain audio but descriptors already computed in some particular way, and sometimes we have to gather audio files ourselves. We thus assume that descriptors are robust to these changes and algorithms are not affected. We investigated this issue for MFCCs and Chroma: how do encoding quality, analysis parameters and musical characteristics affect their robustness? 
Variance components (% of total) in distributions of robustness of Chroma 
                       | 22050 Hz                                   || 44100 Hz
Lib 1                  | δ      | ε      | r      | ρ      | θ      || δ      | ε      | r      | ρ      | θ
σ² FSize               | 2 %    | 3 %    | 0 %    | 0 %    | 0 %    || 2 %    | 2 %    | 0 %    | 0 %    | 1 %
σ² Genre               | 3 %    | 3 %    | 1 %    | 1 %    | 1 %    || 3 %    | 3 %    | 1 %    | 1 %    | 1 %
σ² Track               | 21 %   | 19 %   | 18 %   | 19 %   | 17 %   || 22 %   | 21 %   | 19 %   | 20 %   | 19 %
σ² residual            | 75 %   | 75 %   | 81 %   | 80 %   | 82 %   || 72 %   | 74 %   | 80 %   | 78 %   | 80 %
Grand mean             | 0.0610 | 0.0545 | 0.9554 | 0.9366 | 0.992  || 0.0588 | 0.0521 | 0.9549 | 0.9375 | 0.9922
Total variance         | 0.0046 | 0.0085 | 0.0276 | 0.0293 | 0.0014 || 0.0048 | 0.0082 | 0.0286 | 0.0298 | 0.0013
Standard deviation     | 0.0682 | 0.0924 | 0.1663 | 0.1713 | 0.0373 || 0.0695 | 0.0904 | 0.1691 | 0.1725 | 0.0355

                       | 22050 Hz                                   || 44100 Hz
Lib 2                  | δ      | ε      | r      | ρ      | θ      || δ      | ε      | r      | ρ      | θ
σ² Codec               | 64 %   | 35 %   | 0 %    | 0 %    | 0 %    || 32 %   | 22 %   | 0 %    | 0 %    | 0 %
σ² Brate:Codec         | 1 %    | 0 %    | 0 %    | 0 %    | 0 %    || 62 %   | 40 %   | 0 %    | 0 %    | 0 %
σ² Genre               | 0 %    | 16 %   | 3 %    | 4 %    | 8 %    || 1 %    | 10 %   | 3 %    | 1 %    | 4 %
σ² Track               | 19 %   | 33 %   | 97 %   | 93 %   | 92 %   || 3 %    | 14 %   | 94 %   | 93 %   | 77 %
σ² residual            | 16 %   | 17 %   | 0 %    | 3 %    | 0 %    || 2 %    | 15 %   | 2 %    | 6 %    | 19 %
Grand mean             | 0.0346 | 0.0031 | 0.9915 | 0.9766 | 0.9998 || 0.0260 | 0.0022 | 0.9989 | 0.9928 | 1.0000
Total variance         | 0.0004 | 5e-6   | 0.0002 | 0.0007 | 6.1e-8 || 0.0005 | 4.8e-6 | 3.7e-6 | 0.0001 | 1.8e-9
Standard deviation     | 0.0195 | 0.0022 | 0.0135 | 0.0270 | 0.0002 || 0.0213 | 0.0022 | 0.0019 | 0.0122 | 0.0000

1. What factors did we study?

Encoding Quality
•Sampling Rate: 22050 and 44100 Hz
•Codec: WAV, MP3 CBR and MP3 VBR
•Bitrate: 64 to 320 Kbps

Analysis Parameters
•Analysis Tool:
  •Lib1 (Essentia 2.0.1)
    •MFCCs: 40 mel bands, bins equally spaced, 0-11000 Hz
    •Chroma: 40-5000 Hz, estimates tuning frequency
  •Lib2 (QM Vamp Plugins 1.7)
    •MFCCs: 40 mel bands, 66-6364 Hz
    •Chroma: 65-2093 Hz, constant Q transform, assumes tuning at 440 Hz, ignores harmonics and fixes frame size to 16384
•Frame Size: 1024, 2048, …, 32768 samples

Musical Characteristics
•Genre: blues, classical, rock, jazz, disco/funk/soul, country, electronic, rap/hip-hop, reggae, rock’n’roll
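
Because Frame Size is given in samples, the same setting spans twice as much time at 22050 Hz as at 44100 Hz. A minimal sketch of that arithmetic follows. It uses only the frame sizes that the list above names explicitly and does not fill in the elided ones. The function and constant names are illustrative and belong to neither library.

```python
# Frame sizes that appear explicitly in the factor list (elided sizes are left out on purpose).
FRAME_SIZES = [1024, 2048, 16384, 32768]
SAMPLING_RATES = [22050, 44100]


def frame_duration_ms(frame_size, sampling_rate):
    """Duration in milliseconds of one analysis frame of `frame_size` samples."""
    return 1000.0 * frame_size / sampling_rate


def duration_table():
    """Map each (frame size, sampling rate) pair to its duration in milliseconds."""
    return {
        (fs, sr): frame_duration_ms(fs, sr)
        for sr in SAMPLING_RATES
        for fs in FRAME_SIZES
    }


if __name__ == "__main__":
    for (fs, sr), ms in sorted(duration_table().items()):
        print(f"{fs:>6} samples at {sr} Hz = {ms:.1f} ms")
```

This is one reason to block the analysis by Sampling Rate and Frame Size: a given Frame Size does not mean the same time resolution across sampling rates.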
2. What did we do?

Corpus
•Compile ad-hoc corpus of 400 music tracks, 395 different artists, uniformly covering all 10 genres
•Clipped to 30 seconds for efficiency

File versions
•Original: lossless, encoded in FLAC
•Derived: lossy, encoded in MP3, for all Sampling Rates, Codecs and Bitrates

Method
•Compute MFCCs and Chroma vectors from all files (remove first MFCC)
•Summarize frame-wise feature vectors with the mean of each coefficient
•Compute indicators of robustness between derived and original descriptors
•Block by Tool, Sampling Rate and Frame Size (not mixed in practice)
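
The summarization step in the Method list can be sketched as follows. This is a minimal illustration, not the code used for the poster: it assumes the frame-wise features are already available as a list of equal-length lists, and `summarize_mfcc` is a hypothetical helper name.

```python
def summarize_mfcc(frames, drop_first=True):
    """Collapse frame-wise coefficient vectors into one vector of per-coefficient means.

    `frames` is a list of equal-length lists, one per analysis frame.
    With drop_first=True the first coefficient is discarded, as in the Method list.
    """
    if not frames:
        raise ValueError("need at least one frame")
    n_coeffs = len(frames[0])
    if any(len(frame) != n_coeffs for frame in frames):
        raise ValueError("all frames must have the same number of coefficients")
    start = 1 if drop_first else 0
    return [
        sum(frame[i] for frame in frames) / len(frames)
        for i in range(start, n_coeffs)
    ]


if __name__ == "__main__":
    # Made-up numbers, two frames with three coefficients each.
    print(summarize_mfcc([[10.0, 1.0, 2.0], [20.0, 3.0, 4.0]]))
```

For Chroma vectors the same averaging applies with `drop_first=False`, since only the first MFCC is removed.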
3. How did we measure robustness?

5 robustness indicators (error) between lossless and lossy versions
•Mean relative error δ across coefficients (%)
•Euclidean distance ε
•Pearson correlation r
•Spearman correlation ρ
•Cosine similarity θ
•Get 144400 datapoints per indicator and analyze their distribution
•High robustness is good if we have heterogeneous encodings
•Low stability is bad with heterogeneous encodings, but we can analyze what factor is responsible for the variability and control it

Fitted random-effects models to study each effect separately
•Controllable factors (Frame Size, Codec and Bitrate): fitted the main effects and all the interactions among them
•Uncontrollable factors (Genre and Track): fitted only main effects
•ANOVA to estimate variance components (individual effect contributions)
[Figure: distribution of the error, annotated with robustness, stability, σ² and the 0 point of the error axis]
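
The tables report variance components as percentages of the total variance. The models on the poster combine several crossed and nested factors. The sketch below is a deliberately simplified, one-factor illustration of the idea, using the classical method-of-moments (ANOVA) estimator for a balanced design. The function name and the example data are hypothetical.

```python
def variance_components(groups):
    """One-way random-effects variance components for a balanced design.

    `groups` maps a group label (for example a track) to its list of indicator values.
    Returns the estimated group and residual variances and their share of the total.
    """
    k = len(groups)
    sizes = {len(values) for values in groups.values()}
    if k < 2 or len(sizes) != 1:
        raise ValueError("need at least two groups of equal size")
    n = sizes.pop()
    if n < 2:
        raise ValueError("need at least two observations per group")

    all_values = [x for values in groups.values() for x in values]
    grand_mean = sum(all_values) / len(all_values)
    means = {g: sum(values) / n for g, values in groups.items()}

    # Mean squares between and within groups.
    ms_between = n * sum((m - grand_mean) ** 2 for m in means.values()) / (k - 1)
    ms_within = sum(
        (x - means[g]) ** 2 for g, values in groups.items() for x in values
    ) / (k * (n - 1))

    group_variance = max(0.0, (ms_between - ms_within) / n)  # truncate negative estimates at 0
    residual_variance = ms_within
    total = group_variance + residual_variance
    if total == 0:
        raise ValueError("all observations are identical, shares are undefined")
    return {
        "grand_mean": grand_mean,
        "group_variance": group_variance,
        "residual_variance": residual_variance,
        "group_pct": 100.0 * group_variance / total,
        "residual_pct": 100.0 * residual_variance / total,
    }


if __name__ == "__main__":
    # Made-up indicator values for two hypothetical tracks.
    print(variance_components({"track_a": [1.0, 3.0], "track_b": [5.0, 7.0]}))
```

In this reading, a large share for a controllable factor such as Bitrate points to something that can be fixed in practice, while a large Track or residual share cannot be controlled.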
4. How robust are MFCC features?
•Shape of MFCC vectors is preserved but individual coefficients differ
•Most variability due to Track+residual effect, which cannot be controlled anyway
•Independent of Genre
•Frame Size is irrelevant (except for 64 Kbps in Lib1: low Frame Sizes are robust)
•Lib2 more stable than Lib1
•But there is a large Codec:Bitrate effect, so we can achieve high robustness if we establish a minimum Bitrate

5. How robust are Chroma features?
•No effect of Codec in Lib1 (omitted). Frame Size omitted in Lib2 (it is fixed to 16384)
•Lib2 more robust than Lib1. Shape of Chroma vector is preserved too
•Lib2 more stable than Lib1
•Almost all variability due to uncontrollable Track+residual
•But much variability in δ and ε is due to Codec and Bitrate with Lib2; avoided by normalizing Chroma vector to unit max
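
Normalizing a Chroma vector to unit max means dividing every bin by the largest one. The sketch below shows the operation on made-up values. It assumes, purely for illustration, that the two versions differ only by an overall scale factor, in which case the difference disappears after normalization. The poster does not state the cause of the Codec and Bitrate effect, and `normalize_unit_max` is a hypothetical helper name.

```python
def normalize_unit_max(chroma):
    """Scale a chroma vector so that its largest absolute value is 1."""
    peak = max(abs(c) for c in chroma)
    if peak == 0:
        return list(chroma)  # an all-zero vector is returned unchanged
    return [c / peak for c in chroma]


if __name__ == "__main__":
    original = [0.5, 1.0, 2.0]  # made-up chroma bins
    derived = [0.25, 0.5, 1.0]  # same shape, half the scale
    print(normalize_unit_max(original))
    print(normalize_unit_max(derived))
```

After normalization both vectors are identical, so δ and ε between them are 0, while shape-based indicators such as r, ρ and θ were already unaffected by the scale.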
Establishing a minimum Bitrate for MFCCs 
•Lib1 converges to δ≈3% at 256 Kbps, and stability decreases with bitrate (low within-group variance) 
•Lib2 converges to δ≈5% at 160-192 Kbps, but stability does not change after 96 Kbps (same within-group variance) 
•Lib1 is twice as robust with homogeneous encoding 
•Lib2 is more stable with heterogeneous encodings 
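
One way to turn "establish a minimum Bitrate" into a procedure is to group the δ values by bitrate, look at the mean (robustness) and within-group variance (stability) of each group, and keep the lowest bitrate from which the mean stays close to its value at the highest bitrate. The sketch below follows that idea. The selection rule, the tolerance and the numbers are all assumptions for illustration, not the analysis behind the figures quoted above.

```python
from statistics import mean, variance


def summarize_by_bitrate(delta_by_bitrate):
    """Map each bitrate to (mean delta, within-group sample variance)."""
    return {
        bitrate: (mean(values), variance(values))
        for bitrate, values in sorted(delta_by_bitrate.items())
    }


def minimum_bitrate(delta_by_bitrate, tolerance):
    """Lowest bitrate from which mean delta stays within `tolerance` of its value at the top bitrate."""
    summary = summarize_by_bitrate(delta_by_bitrate)
    bitrates = sorted(summary)
    reference = summary[bitrates[-1]][0]
    chosen = bitrates[-1]
    # Walk down from the highest bitrate and stop at the first one that drifts away.
    for bitrate in reversed(bitrates):
        if abs(summary[bitrate][0] - reference) <= tolerance:
            chosen = bitrate
        else:
            break
    return chosen


if __name__ == "__main__":
    # Hypothetical delta values (fractions) per bitrate in Kbps.
    deltas = {
        64: [0.20, 0.30],
        128: [0.08, 0.10],
        192: [0.05, 0.06],
        320: [0.05, 0.05],
    }
    print(summarize_by_bitrate(deltas))
    print(minimum_bitrate(deltas, tolerance=0.01))
```

The within-group variances returned by `summarize_by_bitrate` can be inspected in the same way to see from which bitrate stability stops changing.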

6. What are the practical implications?

Consider Genre Classification as an example
•SVM per Sampling Rate, Codec, Bitrate, Tool & feature

Accuracy
•No significant differences found
•With same encoding in training & testing
•With different encoding in training & testing
•Always best if training and testing with same encoding
•No correlation between bitrate and accuracy; few differences attributable to Type I errors
•Should study other low-level tasks more likely to be affected by lossy compression
