This document describes research on dynamic music emotion recognition using state-space models. It covers two subtasks: feature development for static affect prediction and new modeling approaches for dynamic affect prediction. The authors extract audio features from music clips and use Kalman filters and Gaussian process state-space models (GP-SSMs) to model affect trajectories over time. Training and evaluation on the development data show that clustering the training set was necessary to train the GP-SSM but did not improve Kalman filter performance; the two model types perform similarly under similar conditions, with the GP-SSM slightly better for valence estimation. The official baseline audio features also compare well against the other feature sets.
Dynamic Music Emotion Recognition Using State-Space Models
1. Dynamic Music Emotion Recognition Using State-Space Models
Team UoA
Konstantin Markov*, Tomoko Matsui**
*The University of Aizu, Japan
**Institute of Statistical Mathematics, Japan
2. Subtask focus
Subtask 1: Feature development
Static affect prediction
New features
Signal Processing challenge
Subtask 2: Dynamic estimation
Dynamic affect prediction
New modeling approaches
Machine learning challenge
4. Affect trajectory
Dynamic emotion recognition is the estimation of the affect trajectory over time!
Apply time-series analysis tools:
Trajectory estimation is a time-series filtering/smoothing task.
State-Space Models (SSMs) are well suited to this task.
6. Gaussian Filtering/Smoothing
Given a State-Space Model with hidden states x_t and observations y_t:
The task of filtering is to approximate
p(x_t | y_{1:t}), t = 1, …, T
The task of smoothing is to approximate
p(x_t | y_{1:T}), t = 1, …, T
When these posteriors are approximated by Gaussian distributions, the task is called Gaussian filtering/smoothing.
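For a linear-Gaussian SSM the filtering posterior is exactly Gaussian and is computed by the Kalman filter; the RTS smoother then adds a backward pass over the filtered estimates. A minimal scalar sketch of the filtering recursion (hypothetical parameter names, NumPy only):

```python
import numpy as np

def kalman_filter(y, A, C, Q, R, mu0, P0):
    """Exact Gaussian filtering p(x_t | y_1:t) for a scalar linear-Gaussian SSM:
       x_t = A x_{t-1} + v_t,  v_t ~ N(0, Q)
       y_t = C x_t + w_t,      w_t ~ N(0, R)
    Returns the filtered means and variances for t = 1..T."""
    mus, Ps = [], []
    mu, P = mu0, P0
    for yt in y:
        # Predict step: p(x_t | y_1:t-1)
        mu_p = A * mu
        P_p = A * P * A + Q
        # Update step: fold in the new observation y_t
        K = P_p * C / (C * P_p * C + R)   # Kalman gain
        mu = mu_p + K * (yt - C * mu_p)
        P = (1 - K * C) * P_p
        mus.append(mu)
        Ps.append(P)
    return np.array(mus), np.array(Ps)
```

The filtered variance shrinks as observations accumulate; the smoother would revisit these estimates using the full sequence y_{1:T}.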
9. Gaussian Process SSM (GP-SSM)
Gaussian Processes based state-space model:
x_t = f(x_{t-1}) + υ_t,  f(x) ~ GP(0, K_f)
y_t = g(x_t) + ν_t,      g(x) ~ GP(0, K_g)
Advantages:
Non-linear, Non-parametric
Flexible.
Disadvantages:
No standard algorithms for training and inference.
Analytic moment-matching approximation (Deisenroth, 2012).
Computationally expensive.
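As a building block, plain GP regression with an RBF kernel illustrates how a nonparametric function estimate with uncertainty is obtained; this is only the regression step used to learn f or g, not the moment-matching inference itself (a NumPy-only sketch with hypothetical helper names):

```python
import numpy as np

def rbf(A, B, ell=1.0, sf=1.0):
    """Squared-exponential kernel K(a, b) = sf^2 exp(-|a - b|^2 / (2 ell^2))."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return sf ** 2 * np.exp(-0.5 * d2 / ell ** 2)

def gp_posterior(Xtr, ytr, Xte, noise=0.1):
    """GP regression posterior mean and variance at test inputs Xte."""
    K = rbf(Xtr, Xtr) + noise ** 2 * np.eye(len(Xtr))
    Ks = rbf(Xte, Xtr)
    Kss = rbf(Xte, Xte)
    # Solve via Cholesky for numerical stability
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, ytr))
    mean = Ks @ alpha
    v = np.linalg.solve(L, Ks.T)
    var = np.diag(Kss) - (v ** 2).sum(0)
    return mean, var
```

Far from the training inputs the posterior reverts to the prior, so the predictive variance grows, which is what makes the GP-SSM's uncertainty estimates useful.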
10. Experiments
Feature extraction.
Marsyas tool.
mfcc – Mel-frequency cepstral coefficients.
spfe – zero-crossings, spectral flux, centroid, rolloff.
scf – spectral crest factor.
baseline – features used in the official baseline system.
Independent state and observation model learning.
Multivariate linear regression for the KF.
GP regression learning for the GP-SSM.
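Independent learning of the two linear models for the KF can be sketched with ordinary least squares: regress x_t on x_{t-1} for the state model and y_t on x_t for the observation model (hypothetical function name; here the states would be the annotated A-V trajectories and the observations the audio features):

```python
import numpy as np

def fit_linear_ssm(X, Y):
    """Least-squares fit of a linear-Gaussian SSM
       x_t = A x_{t-1} + v_t,  v_t ~ N(0, Q)
       y_t = C x_t + w_t,      w_t ~ N(0, R)
    from a state trajectory X (T x dx) and observations Y (T x dy)."""
    # Transition model: regress x_t on x_{t-1}
    M, _, _, _ = np.linalg.lstsq(X[:-1], X[1:], rcond=None)
    Q = np.cov((X[1:] - X[:-1] @ M).T)   # process-noise covariance
    # Observation model: regress y_t on x_t
    N, _, _, _ = np.linalg.lstsq(X, Y, rcond=None)
    R = np.cov((Y - X @ N).T)            # observation-noise covariance
    return M.T, Q, N.T, R                # A, Q, C, R
```

Because the two regressions are independent, the state and observation models can be trained separately and then combined in the filter.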
11. Experiments
Development data.
Training set – 600 clips.
Validation set – 144 clips.
Training set clustering:
Four clusters based on the clips' static A-V vectors.
Separate SSM trained for each cluster.
Maximum-likelihood based model selection.
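The clustering and model-selection scheme can be sketched as follows: k-means on the per-clip static A-V vectors partitions the training set, one SSM is trained per cluster, and at test time the cluster model with the highest likelihood for the clip's observations is chosen (minimal sketch, hypothetical helper names):

```python
import numpy as np

def kmeans(points, k, iters=50, seed=0):
    """Plain k-means: cluster clips by their static arousal-valence vectors."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        # Assign each clip to the nearest cluster center
        labels = np.argmin(((points[:, None] - centers) ** 2).sum(-1), axis=1)
        # Move each center to the mean of its assigned clips
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels, centers

def select_model(log_likelihoods):
    """Maximum-likelihood model selection: pick the cluster-specific SSM
    whose log-likelihood of the test clip's observations is highest."""
    return int(np.argmax(log_likelihoods))
```

One SSM would then be trained on the clips of each cluster, and `select_model` applied to the per-model likelihoods of a test clip.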
12. Results on Development Data
AROUSAL – SINGLE MODEL
Feature      KF R     KF RMSE   RTS R    RTS RMSE
mfcc         0.2062   0.2894    0.1070   0.3008
mfcc+spfe    0.2326   0.2378    0.0894   0.2291
mfcc+scf     0.1171   0.2288    0.1611   0.2188
baseline     0.2791   0.3631    0.1898   0.4027

AROUSAL – MULTIPLE MODELS
Feature      KF R     KF RMSE   RTS R    RTS RMSE
mfcc         0.1698   0.1384    0.0991   0.1284
mfcc+spfe    0.2022   0.1290    0.1246   0.1277
mfcc+scf     0.0059   0.1613    0.0253   0.1615
baseline     0.0212   0.2276    0.0236   0.2259

(KF = Kalman filter, RTS = RTS smoother)
13. Results on Development Data
VALENCE – SINGLE MODEL
Feature      KF R     KF RMSE   RTS R    RTS RMSE
mfcc         0.0411   0.3131    0.0598   0.3542
mfcc+spfe    0.0304   0.3100    0.0725   0.3495
mfcc+scf     0.1545   0.3346    0.1401   0.3616
baseline     0.0753   0.1341    0.0779   0.1499

VALENCE – MULTIPLE MODELS
Feature      KF R     KF RMSE   RTS R    RTS RMSE
mfcc         -0.082   0.1847    -0.042   0.1915
mfcc+spfe    -0.054   0.1866    -0.068   0.1914
mfcc+scf     0.0149   0.1688    -0.008   0.1703
baseline     -0.080   0.2425    -0.058   0.2497

(KF = Kalman filter, RTS = RTS smoother)
16. Conclusions
Training data clustering:
Did not improve the Kalman filter performance.
Was, however, the only way the GP-SSM could be trained.
KF vs. GP-SSM:
Similar performance under similar conditions.
GP-SSM slightly better for valence estimation.
Feature types:
The baseline features perform well.
No definite winner.