Emotion in Music Task at MediaEval 2014

Emotion in Music: Task Overview
Anna Aljanaki¹, Mohammad Soleymani², Yi-Hsuan Yang³
¹ Utrecht University, Netherlands
² University of Geneva, Switzerland
³ Academia Sinica, Taiwan
16-17 October, MediaEval 2014

Task definition
Description
- A benchmark for music emotion recognition systems (similar to, but different from, MIREX).
- Focus on audio analysis (optionally, metadata).
Two subtasks
- Dynamic task (required): predict arousal and valence values for a song every 0.5 s.
- Feature design task: design new audio features, or rework existing ones, to estimate emotion either for the whole 45 s musical excerpt or dynamically.

Ground truth
Development set
- Collected for the Emotion in Music Brave New Task in 2013.
- 744 files.
- 10 annotators per file.
Test set
- Additional data collected in 2014.
- 1000 files.
- 10 annotators per file.

Ground truth. Music
- 1744 musical excerpts of 45 seconds (randomly sampled) from the Free Music Archive (freemusicarchive.org).
- Curated music licensed under Creative Commons.
- Manually checked for quality.
- 10 genres: Rock, Pop, Electronic, Hip-Hop, Classical, Soul and RnB, Country, Folk, International, Jazz.

Ground truth. Annotations.
Collecting annotations.
- Amazon Mechanical Turk (mturk.com).
- 10 Mechanical Turk workers annotated each song.
- We averaged the 10 annotations and provided participants with:
  - continuous annotations of valence and arousal (1 label every 1/2 second);
  - static annotations of valence and arousal for each file (independent of the continuous annotations).
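As an illustration of the averaging step, here is a minimal Python sketch. It assumes that each worker's continuous curve has already been resampled to one value every 0.5 s and aligned to the same start time; the function names and the example values are hypothetical, not the organizers' code or data.

```python
from statistics import mean


def average_annotations(worker_curves):
    """Average continuous annotations frame by frame.

    worker_curves: one list per worker, each holding one valence (or
    arousal) value per 0.5 s frame, already aligned in time (assumption).
    """
    if len({len(curve) for curve in worker_curves}) != 1:
        raise ValueError("all workers must provide the same number of frames")
    # zip(*...) yields one tuple per frame containing every worker's value.
    return [mean(frame) for frame in zip(*worker_curves)]


def frame_times(n_frames, hop_s=0.5):
    """Time stamp (in seconds) of each frame at one label per hop_s seconds."""
    return [i * hop_s for i in range(n_frames)]


# Hypothetical example: three workers, four frames.
curves = [[0.1, 0.2, 0.3, 0.4], [0.3, 0.2, 0.1, 0.0], [0.2, 0.2, 0.2, 0.2]]
print(list(zip(frame_times(4), average_annotations(curves))))
```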

Worker Instructions on Valence and Arousal Space
Workers were given the following instructions introducing the valence-arousal space:
- Valence refers to the degree of positive or negative emotions one experiences from a given piece of music.
  - Positive valence: happiness, joy, excitement.
  - Negative valence: sadness, fear, anxiety, anger.
- Arousal refers to the intensity of the music clip.
  - High arousal: loud, energetic, emotionally engaging.
  - Low arousal: quiet, peaceful, repetitive.

Annotation Interface

Some statistics
- 250 of 424 workers (59%) passed the qualification test.
- Annotators took 10.5 minutes on average to complete a task (3 songs), and we paid $0.40 per task.
- 99% of the time, the song was unfamiliar to the annotator.
- In general, annotators enjoyed the music (on a scale from 1 to 5, mean liking = 3.32 ± 1.22, median = 4).

Static annotations.
Inter-annotator agreement (Krippendorff’s alpha):
- Valence: 0.22
- Arousal: 0.37
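For reference, Krippendorff’s alpha compares the disagreement observed within songs with the disagreement expected by chance, alpha = 1 - D_o / D_e. The sketch below uses the interval metric (squared differences) and takes each song's static ratings as a plain list; which metric the organizers actually used is not stated on the slide, so the interval choice, the function name and the example ratings are assumptions.

```python
from itertools import combinations


def krippendorff_alpha_interval(units):
    """Krippendorff's alpha with the interval (squared-difference) metric.

    units: one list of ratings per song (any number of raters per song).
    Songs with fewer than two ratings cannot be paired and are skipped.
    """
    units = [list(u) for u in units if len(u) >= 2]
    values = [v for u in units for v in u]
    n = len(values)  # number of pairable values
    if n < 2:
        raise ValueError("need at least two pairable values")
    # Observed disagreement D_o: ordered pairs of ratings within each song,
    # weighted by 1 / (m_u - 1), where m_u is that song's number of ratings.
    d_o = sum(
        2 * sum((a - b) ** 2 for a, b in combinations(u, 2)) / (len(u) - 1)
        for u in units
    ) / n
    # Expected disagreement D_e: ordered pairs over all values, any song.
    d_e = 2 * sum((a - b) ** 2 for a, b in combinations(values, 2)) / (n * (n - 1))
    if d_e == 0:
        raise ValueError("all ratings are identical; alpha is undefined")
    return 1.0 - d_o / d_e


# Hypothetical static ratings: three songs, three raters each.
print(krippendorff_alpha_interval([[5, 6, 5], [2, 3, 2], [8, 7, 9]]))
```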

Dynamic annotations.
Inter-annotator agreement (Kendall’s W, computed after discarding the first 15 seconds):
- Valence: 0.16 ± 0.11
- Arousal: 0.20 ± 0.13
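Kendall’s W measures how consistently the annotators rank the time frames of one song. A minimal sketch, assuming curves sampled at 2 Hz (so discarding the first 15 s means dropping 30 frames) and using the textbook formula W = 12S / (m²(n³ - n)) without a tie correction; W is computed per song here, and aggregating it across songs is left out. Function names are hypothetical.

```python
def average_ranks(xs):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks


def kendalls_w(curves, skip_frames=30):
    """Kendall's W for one song (no tie correction).

    curves: one list per annotator, one value per 0.5 s frame. The first
    skip_frames frames (30 frames = 15 s at 2 Hz) are discarded.
    """
    trimmed = [c[skip_frames:] for c in curves]
    m = len(trimmed)  # number of annotators ("judges")
    n = len(trimmed[0])  # number of frames ("objects" being ranked)
    if m < 2 or n < 2:
        raise ValueError("need at least two annotators and two frames")
    rank_rows = [average_ranks(c) for c in trimmed]
    totals = [sum(col) for col in zip(*rank_rows)]  # rank sum per frame
    mean_total = m * (n + 1) / 2
    s = sum((t - mean_total) ** 2 for t in totals)
    return 12 * s / (m ** 2 * (n ** 3 - n))
```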

Evaluation
Dynamic subtask evaluation
We use Pearson’s correlation coefficient (rho) and RMSE as metrics in the following steps:
1. Calculate Pearson’s rho between the predictions and the ground truth for each song separately.
2. Average across songs, separately for valence and for arousal.
3. Rank all submissions for each dimension by the averaged rho.
4. If the difference according to a one-sided Wilcoxon test is not significant (p ≥ 0.05), use RMSE to break the tie.
5. If the ranking changed, repeat the significance tests between neighbouring pairs (as in bubble sort).
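The five steps above can be sketched in Python as follows. This is an illustrative reconstruction, not the official evaluation script: the Wilcoxon test uses the normal approximation without tie or continuity correction, a constant prediction or ground truth is assigned rho = 0 (an assumption), and the number of bubble-sort passes is capped in case the tie-break rule ever cycles. All function names are hypothetical.

```python
import math
from statistics import mean


def pearson(x, y):
    """Pearson's rho; returns 0.0 if either series is constant (assumption)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def rmse(x, y):
    return math.sqrt(mean([(a - b) ** 2 for a, b in zip(x, y)]))


def per_song_scores(predictions, truth):
    """Steps 1-2: per-song rho and RMSE (dicts song -> curve), fixed song order."""
    songs = sorted(truth)
    rhos = [pearson(predictions[s], truth[s]) for s in songs]
    errors = [rmse(predictions[s], truth[s]) for s in songs]
    return rhos, errors


def wilcoxon_greater_p(a, b):
    """One-sided Wilcoxon signed-rank test of H1: a > b (paired samples).

    Normal approximation, zero differences dropped, average ranks for ties.
    """
    d = [x - y for x, y in zip(a, b) if x != y]
    n = len(d)
    if n == 0:
        return 1.0
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    w_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    mu = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return 0.5 * math.erfc((w_plus - mu) / (sigma * math.sqrt(2)))  # P(Z >= z)


def rank_runs(runs, alpha=0.05):
    """Steps 3-5. runs: name -> (per-song rhos, per-song RMSEs), same song order."""
    order = sorted(runs, key=lambda r: mean(runs[r][0]), reverse=True)
    for _ in range(len(order) ** 2):  # cap on passes, in case the rule cycles
        swapped = False
        for i in range(len(order) - 1):
            upper, lower = order[i], order[i + 1]
            tied = wilcoxon_greater_p(runs[upper][0], runs[lower][0]) >= alpha
            if tied and mean(runs[lower][1]) < mean(runs[upper][1]):
                order[i], order[i + 1] = lower, upper  # RMSE breaks the tie
                swapped = True
        if not swapped:
            break
    return order
```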
Feature design subtask evaluation
Same procedure, but Pearson’s rho is calculated over all songs in the test set at once.
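Pooling all songs is not the same as averaging per-song correlations: when songs are pooled, differences in overall level between songs can drive the correlation, while per-song averaging only rewards following each song's curve. A minimal sketch with hypothetical helper names and made-up toy data (for the static case, each song would contribute a single value):

```python
import math
from statistics import mean


def pearson(x, y):
    """Pearson's rho between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_per_song_rho(predictions, truth):
    """Dynamic subtask style: rho per song, then averaged."""
    return mean([pearson(predictions[s], truth[s]) for s in sorted(truth)])


def pooled_rho(predictions, truth):
    """Feature design style: one rho over all songs' values concatenated."""
    songs = sorted(truth)
    pred = [v for s in songs for v in predictions[s]]
    gold = [v for s in songs for v in truth[s]]
    return pearson(pred, gold)


# Hypothetical toy data: two songs, two frames each.
predictions = {"a": [1.0, 2.0], "b": [10.0, 11.0]}
truth = {"a": [2.0, 1.0], "b": [11.0, 10.0]}
print(mean_per_song_rho(predictions, truth), pooled_rho(predictions, truth))
```

In this toy case each song's prediction runs opposite to its ground truth (per-song rho = -1), yet the pooled rho is about 0.98, because the predictions get each song's overall level right.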

Baseline
The organizers decided not to submit runs of their own and instead provided only a simple baseline that participants should beat:
- Five features: spectral flux, HCDF (harmonic change detection function), loudness, roughness and zero crossing rate.
- Linear regression.
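The regression step of such a baseline can be sketched as ordinary least squares with an intercept, solved through the normal equations. The sketch assumes the five features have already been extracted, one row per 0.5 s frame; feature extraction and the exact fitting setup the organizers used are not shown, and the function names are hypothetical.

```python
def fit_linear_regression(X, y):
    """Ordinary least squares with an intercept, via the normal equations.

    X: one row of feature values per training frame; y: target per frame.
    Returns [intercept, w_1, ..., w_k].
    """
    rows = [[1.0] + list(x) for x in X]  # prepend a constant column for the intercept
    k = len(rows[0])
    # Augmented system [X^T X | X^T y].
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(k)
    ]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("features are linearly dependent")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, k):
            f = a[r][col] / a[col][col]
            for c in range(col, k + 1):
                a[r][c] -= f * a[col][c]
    # Back substitution.
    w = [0.0] * k
    for i in range(k - 1, -1, -1):
        w[i] = (a[i][k] - sum(a[i][j] * w[j] for j in range(i + 1, k))) / a[i][i]
    return w


def predict(w, x):
    """Prediction for one frame's feature vector x."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
```

Fitting one model for arousal and one for valence, then calling `predict` on each test frame, would give the per-0.5 s values the dynamic task asks for.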

Results - Arousal
7 teams crossed the finish line; 6 of them beat the baseline (at least for arousal).
Dynamic task
Rank  Team              Arousal rho    Arousal RMSE
1     TUMMISP           0.35 ± 0.45    0.10 ± 0.05
2     SAIL              0.28 ± 0.50    0.13 ± 0.07
3     UoA               0.21 ± 0.57    0.08 ± 0.05
4     Beatsens          0.23 ± 0.56    0.12 ± 0.05
5     Rainbow           0.18 ± 0.60    0.12 ± 0.07
6     THUHCSIL          0.17 ± 0.41    0.12 ± 0.05
7     Baseline          0.18 ± 0.36    0.14 ± 0.06
8     Average baseline  0              0.39 ± 0.03

Results - Valence
Dynamic task
Teams ranked 1 to 4 beat the baseline; the other teams share rank 5 with it.
Rank  Team              Valence rho    Valence RMSE
1     TUMMISP           0.20 ± 0.49    0.08 ± 0.05
2     Beatsens          0.12 ± 0.55    0.09 ± 0.05
3     SAIL              0.15 ± 0.50    0.10 ± 0.06
4     UoA               0.17 ± 0.50    0.14 ± 0.07
5     THUHCSIL          0.10 ± 0.37    0.09 ± 0.05
5     Rainbow           0.07 ± 0.29    0.10 ± 0.06
5     Baseline          0.11 ± 0.34    0.10 ± 0.06
6     Average baseline  0              0.34 ± 0.03

Results
Only one team designed new features.
Feature design - static evaluation.
Team  Arousal R²  Arousal RMSE  Valence R²  Valence RMSE
SAIL  0.53        0.32          0.28        0.27
Feature design - dynamic evaluation.
Team  Arousal rho  Arousal RMSE  Valence rho  Valence RMSE
SAIL  0.22         0.12          0.11         0.09

Results
Dynamic runs - Arousal.

Results
Dynamic runs - Valence.

Approaches
Beatsens
- 54 features from MIRToolbox.
- Annotations are modeled as a continuous conditional random field (CCRF) process.
- SVR is used as the base regressor.
- The best performance is achieved by a combination of spectral, dynamic and rhythmic features, of which MFCCs were the most important.

Approaches
SAIL
Designed 3 types of new features:
1. Compressibility features
2. Median Spectral Band Energy
3. Spectral Centre of Mass
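The slide only names these features. The sketch below shows one plausible reading of two of them, purely as an assumption: compressibility as the zlib compression ratio of a quantized feature time series, and spectral centre of mass as the magnitude-weighted mean frequency (spectral centroid) of a frame. Median Spectral Band Energy is not sketched, the function names are hypothetical, and none of this is SAIL's actual code.

```python
import cmath
import zlib


def compressibility(values, levels=16):
    """Hypothetical 'compressibility' feature of a feature time series.

    Values are quantized to `levels` bins, stored one byte per value, and
    compressed with zlib; the result is compressed size / raw size, so more
    repetitive series give smaller ratios.
    """
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid dividing by zero for a constant series
    codes = bytes(min(int((v - lo) / span * levels), levels - 1) for v in values)
    return len(zlib.compress(codes, 9)) / len(codes)


def spectral_centre_of_mass(frame, sample_rate):
    """Magnitude-weighted mean frequency (spectral centroid) of one frame.

    Uses a naive O(n^2) DFT over the non-negative frequency bins, which is
    fine for a sketch but too slow for real audio.
    """
    n = len(frame)
    bins = range(n // 2 + 1)
    mags = [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(frame)))
        for k in bins
    ]
    total = sum(mags)
    if total == 0:
        return 0.0
    return sum(k * sample_rate / n * m for k, m in zip(bins, mags)) / total
```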
Used Partial Least Squares Regression in combination with Haar coefficients to predict the dynamic ratings from features of the whole song.
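One way to read the PLS plus Haar combination is that a fixed-length vector of Haar wavelet coefficients of the rating curve is predicted from song-level features and then inverted back into a curve. That reading is an assumption; the sketch below shows only the orthonormal Haar transform and its inverse (the PLS regression step is omitted), uses hypothetical function names, and assumes the curve length is a power of two, so a real 2 Hz curve would need padding or resampling.

```python
import math


def haar_forward(signal):
    """Full orthonormal Haar decomposition; len(signal) must be a power of two.

    Returns [approximation, coarsest detail, ..., finest details].
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    coeffs = []
    approx = list(signal)
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        detail = [(a - b) / math.sqrt(2) for a, b in pairs]
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
        coeffs = detail + coeffs  # coarser details go in front
    return approx + coeffs


def haar_inverse(coeffs):
    """Invert haar_forward, rebuilding the curve one level at a time."""
    approx = coeffs[:1]
    pos = 1
    while pos < len(coeffs):
        detail = coeffs[pos:2 * pos]
        nxt = []
        for a, d in zip(approx, detail):
            nxt.extend([(a + d) / math.sqrt(2), (a - d) / math.sqrt(2)])
        approx = nxt
        pos *= 2
    return approx
```

Keeping only the first coefficient reconstructs a flat curve at the song's mean level; each additional level of coefficients adds finer temporal detail, which is why a regressor could target just the first few.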
