Music Emotion Tracking with 
Continuous Conditional Neural Fields 
and Relative Representation 
Vaiva Imbrasaitė, Peter Robinson 
Vaiva.Imbrasaite@cl.cam.ac.uk
Basic approach 
[Figure labels: Song; 0.5s; Feature vector]
Basic approach 
[Figure labels: Model for arousal; Model for valence; Arousal values; Valence values]
[Figure labels: Combined; Relative features CCNF; Baseline]
Relative representation 
• Based on the idea of expectation 
• Feature vector: 
– Average of a particular feature for the song 
– Difference between the average and the absolute 
value for that feature vector
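
A minimal sketch of how such a relative feature vector could be built, assuming each song is a list of per-window feature vectors. The function name `relative_representation` and the sign convention (window value minus song average) are assumptions for illustration, not taken from the slides. Each output vector holds the song averages followed by the differences, so it is twice as long as the input vector.

```python
def relative_representation(frames):
    """Build relative feature vectors for one song.

    frames: list of per-window feature vectors (lists of floats).
    Returns one vector per window: song averages, then differences.
    The sign of the difference (value - average) is an assumption.
    """
    n_frames = len(frames)
    n_feats = len(frames[0])
    # Average of each feature over the whole song.
    song_mean = [sum(f[j] for f in frames) / n_frames for j in range(n_feats)]
    # Per window: averages followed by the deviation from those averages.
    return [song_mean + [f[j] - song_mean[j] for j in range(n_feats)]
            for f in frames]


# Toy song with two windows and two features (made-up numbers).
example = relative_representation([[1.0, 10.0], [3.0, 30.0]])
```

With 150 basic features per window, this layout would give 300 relative features.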
CCNF 
[Figure: model diagram; labels α1, α2, β1, β2]
Method 
● Features extracted with OpenSmile 
─ 150 features for basic and 300 for relative 
● Features: MFCC, various spectral 
descriptors, RMS energy, pitch descriptors 
─ Statistical measures on those: range, variance, 
SD, skewness 
● Two-fold cross validation to select hyper-parameters 
─ 4 random seeds for parameter selection and 
20 for training (highest likelihood) for CCNF
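
As an illustration of the statistical measures listed above, the sketch below computes range, variance, SD and skewness for one low-level descriptor over one window. It uses only the Python standard library and population (not sample) statistics; the function name `window_statistics` is hypothetical, and the exact definitions used by the feature extractor may differ.

```python
import statistics


def window_statistics(values):
    """Range, variance, SD and skewness of one descriptor in one window.

    Population statistics are assumed; the extractor's own
    definitions may differ.
    """
    mean = statistics.fmean(values)
    variance = statistics.pvariance(values, mu=mean)
    sd = variance ** 0.5
    # Skewness as the third standardised moment; 0.0 for constant input.
    if sd == 0:
        skewness = 0.0
    else:
        third_moment = sum((v - mean) ** 3 for v in values) / len(values)
        skewness = third_moment / sd ** 3
    return {
        "range": max(values) - min(values),
        "variance": variance,
        "sd": sd,
        "skewness": skewness,
    }


# Made-up descriptor values for a single window.
stats = window_statistics([1.0, 2.0, 3.0, 4.0, 10.0])
symmetric = window_statistics([1.0, 2.0, 3.0])
```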
Results 
Model           Corr for   Corr for   RMSE for   RMSE for
                valence    arousal    valence    arousal
SVR-basic       0.073      0.129      0.100      0.146
SVR-relative    0.074      0.148      0.099      0.147
CCNF-basic      0.063      0.116      0.102      0.139
CCNF-relative   0.066      0.181      0.098      0.118
Range for correlation for CCNF: twice that of SVR; RMSE: the same for both
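
For reference, the two evaluation measures can be sketched as follows, assuming "Corr" means the Pearson correlation between predicted and annotated values. The function names and the toy numbers are illustrative only and are not taken from the experiments.

```python
import math


def pearson_correlation(predicted, annotated):
    """Pearson correlation between two equal-length sequences."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_a = sum(annotated) / n
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, annotated))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_a = sum((a - mean_a) ** 2 for a in annotated)
    return cov / math.sqrt(var_p * var_a)


def rmse(predicted, annotated):
    """Root-mean-square error between two equal-length sequences."""
    n = len(predicted)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, annotated)) / n)


# Toy prediction and annotation sequences (made-up numbers).
corr_value = pearson_correlation([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
rmse_value = rmse([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

The toy sequences are perfectly correlated yet have a non-zero RMSE, which is why the table reports both measures.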
Insights about CCNF 
• Potentially very powerful and flexible 
– Can be made multi-modal 
• Can be quite slow and heavy-weight 
– More hyperparameters than SVR, but fast to check 
• Very sensitive to the number of features used 
– Can be problematic with smaller datasets
Where to find us? 
www.cl.cam.ac.uk/research/rainbow/projects/ccnf/ 
– Appendix – full derivation 
– MATLAB code 
– Other publications