Multimodal Music Tagging Task Overview
1. Multimodal Music Tagging Task
Nicola Orio – University of Padova
Cynthia C. S. Liem – Delft University of Technology
Geoffroy Peeters – UMR STMS IRCAM-CNRS, Paris
Markus Schedl – Johannes Kepler University, Linz
MediaEval, Pisa 05/10/2012 MusiClef: Multimodal Music Tagging Task 1
2. Multimodal music tagging
• Definition
• Songs of a commercial music library need to be categorized according to their usage in TV and radio broadcasts (e.g. soundtracks, jingles)
• Practical motivation
• The search for suitable music for video productions is a major activity for professionals and lay users alike
• Collaborative filtering systems are increasingly taking over this role
• Notwithstanding their known limitations: long tail, cold start…
• Annotating professional music libraries is another important professional activity
3. Human assessment
Different sources of information are routinely exploited by professionals to overcome the limitations of individual media
4. Goals of MusiClef
• To focus evaluation on professional application scenarios
• Textual description of music items
• To enable replication of experiments and results
• The feature extraction phase is crucial – released features computed with a public, open-source library (MIRToolbox)
• To promote the exploitation of multimodal sources of information
• Content (audio) + Context (tags & webpages)
• To disseminate music related initiatives
• Outside the music information retrieval community
5. Evaluation initiatives – 1
• MIREX (since 2004)
• Community-based selection of tasks
• Many tasks address audio feature extraction algorithms
• Participants submit algorithms that are run by organizers
• Music files are not shared with participants
• Million Song Dataset (since 2011)
• Task on music recommendation proposed by organizers
• Audio features are computed using proprietary algorithms
• Only features are shared with participants
6. Evaluation initiatives – 2
• Quaero-Eval (since 2012)
• Tasks agreed with participants
• Strategies to ensure public access to evaluation results
• Participants run training experiments on a shared repository
• Test-set runs are performed by the organizers
7. Test collection – 1
• Individual songs of pop and rock music
• 1355 songs (from 218 artists)
• train (975) and test (380) split
• Social tags
• Gathered from Last.fm API
• Multilingual sets of Web pages related to artists+albums
• Mined by querying Google
• Acoustic features: MFCCs (using MIRToolbox) with a window length of 200 ms and 50% overlap
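As a rough sketch of the windowing behind the released features, the framing step can be written as follows (illustrative Python; the actual extraction used MIRToolbox in MATLAB, and the 44.1 kHz sample rate here is an assumption, not stated on the slide):

```python
import numpy as np

def frame_signal(signal, sr=44100, win_ms=200, overlap=0.5):
    """Slice a mono signal into overlapping analysis windows
    (200 ms frames with 50% overlap, as used for the released MFCCs)."""
    win = int(sr * win_ms / 1000)      # samples per window
    hop = int(win * (1 - overlap))     # hop size giving 50% overlap
    n_frames = 1 + max(0, (len(signal) - win) // hop)
    return np.stack([signal[i * hop : i * hop + win]
                     for i in range(n_frames)])

# 1 second of silence at 44.1 kHz -> nine 200 ms frames
frames = frame_signal(np.zeros(44100))
```

MFCCs would then be computed per frame; only the framing parameters are fixed by the task description.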
8. Test collection – 2
• Test collection created starting from Rolling Stone's “500 Greatest Songs of All Time”
• Expected high number of social tags and web pages
• Ground truth created by experts in the domain
• 355 tags selected (167 genre, 288 usage)
• Tags associated with fewer than 20 songs were discarded
• Reference implementation in Matlab
• Participants have an example for running a complete experiment
• Evaluation code already made available
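The tag-pruning step above can be sketched as follows (illustrative Python; `song_tags` and `filter_tags` are hypothetical names, not part of the released MATLAB code):

```python
from collections import Counter

def filter_tags(song_tags, min_songs=20):
    """Keep only tags associated with at least `min_songs` songs.
    `song_tags` maps a song id to its set of tags."""
    counts = Counter(tag for tags in song_tags.values() for tag in tags)
    return {tag for tag, c in counts.items() if c >= min_songs}
```

With the threshold of 20 used for the collection, rare tags that cannot support a meaningful train/test evaluation are dropped.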
9. Evaluation measures
• Standard IR measures
• Accuracy
• Precision
• Recall
• Specificity
• F-measure
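All five measures can be computed per tag from a binary confusion matrix; a minimal Python sketch (the released evaluation code itself is in MATLAB):

```python
def binary_measures(tp, fp, tn, fn):
    """Standard IR measures from per-tag confusion counts:
    true/false positives (tp, fp) and true/false negatives (tn, fn)."""
    accuracy    = (tp + tn) / (tp + fp + tn + fn)
    precision   = tp / (tp + fp) if tp + fp else 0.0
    recall      = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f_measure   = (2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
    return accuracy, precision, recall, specificity, f_measure
```

Overall scores are then obtained by averaging these per-tag values across all tags.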
10. Examining tags more closely
• Some tags are more equal than others…
e.g. 'hard rock', 'ballroom', 'melancholic', 'travel', 'countryside', 'bright'
• Thus, we propose to also analyze results employing a higher-level tag categorization
11. Tag categorization – 1
• Affective, mood-related aspects:
• activity: the amount of perceived music activity, without implying strong positive or negative affective qualities (e.g. 'fast', 'mellow', 'lazy')
• affective state: affective qualities that can only be connected and attributed to living beings (e.g. 'aggressive', 'hopeful')
• atmosphere: affective qualities that can be connected to environments (e.g. 'chaotic', 'intimate').
12. Tag categorization – 2
• Situation, time and space aspects of the music:
• Physical situation: concrete physical environments (e.g. 'city', 'night').
• Occasion: implications of time and space, typically connected to social events (e.g. 'holiday', 'glamour').
• Sociocultural genre (e.g. 'new wave', 'r&b', 'punk')
• Sound qualities:
• timbral aspects (e.g. 'acoustic', 'bright')
• temporal aspects (e.g. 'beat', 'groove').
• Other (e.g. 'catchy', 'evocative').
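A possible lookup for this categorization, built from the example tags on these two slides only (illustrative Python; the organizers' full mapping over all task tags is not reproduced here):

```python
# Hypothetical tag -> category table, using only the slide examples.
TAG_CATEGORY = {
    'fast': 'activity', 'mellow': 'activity', 'lazy': 'activity',
    'aggressive': 'affective state', 'hopeful': 'affective state',
    'chaotic': 'atmosphere', 'intimate': 'atmosphere',
    'city': 'situation: physical', 'night': 'situation: physical',
    'holiday': 'situation: occasion', 'glamour': 'situation: occasion',
    'new wave': 'sociocultural genre', 'r&b': 'sociocultural genre',
    'punk': 'sociocultural genre',
    'acoustic': 'sound: timbral', 'bright': 'sound: timbral',
    'beat': 'sound: temporal', 'groove': 'sound: temporal',
    'catchy': 'other', 'evocative': 'other',
}

def categorize(tag):
    """Map a social tag to its higher-level category ('other' if unknown)."""
    return TAG_CATEGORY.get(tag.lower(), 'other')
```

Per-tag results can then be aggregated per category, which is how the category-level baseline analysis below is framed.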
13. Reference implementation
• Made in MATLAB and released publicly
• Simple and straightforward approaches:
• Individual GMMs for audio, user tags, web pages
• Tagging process: 1-NN classification using symmetrized KL divergence
• Scenarios tested:
• Audio, user tags, web pages individually
• Majority vote
• Union
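A minimal sketch of the tagging and fusion steps in Python, simplifying each modality's GMM to a single diagonal Gaussian so the symmetrized KL divergence has a closed form (the actual reference implementation is in MATLAB and uses GMMs; all names here are hypothetical):

```python
from collections import Counter
import numpy as np

def kl_diag(m1, v1, m2, v2):
    """KL divergence N(m1, v1) || N(m2, v2) for diagonal Gaussians."""
    return 0.5 * np.sum(np.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)

def skl(m1, v1, m2, v2):
    """Symmetrized KL, used as the 1-NN distance."""
    return kl_diag(m1, v1, m2, v2) + kl_diag(m2, v2, m1, v1)

def tag_1nn(query, train):
    """Assign the tag set of the nearest training song.
    `query` is (mean, var); `train` is a list of (mean, var, tags)."""
    dists = [skl(query[0], query[1], m, v) for m, v, _ in train]
    return train[int(np.argmin(dists))][2]

def fuse(tag_sets, mode='majority'):
    """Combine per-modality tag sets (audio, user tags, web pages)."""
    if mode == 'union':
        return set().union(*tag_sets)
    # majority vote: keep tags predicted by more than half the modalities
    counts = Counter(t for s in tag_sets for t in s)
    return {t for t, c in counts.items() if c > len(tag_sets) / 2}
```

Each modality produces its own 1-NN tag prediction; `fuse` then implements the majority-vote and union scenarios evaluated below.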
14. Baseline results – 1
• Evaluation of the submitted runs and of the reference implementation
• Results with different modalities over the full dataset
strategy     accuracy   recall   precision   specificity   F-measure
audio         0.894     0.148     0.127        0.939         0.126
tags          0.898     0.061     0.039        0.942         0.037
web pages     0.897     0.050     0.007        0.954         0.011
majority      0.880     0.123     0.086        0.922         0.086
union         0.824     0.240     0.115        0.845         0.134
15. Baseline results – 2
[Figure: baseline results broken down by tag category]
1. activity, energy
2. affective state
3. atmosphere
4. other
5. situation: occasion
6. situation: physical
7. sociocultural genre
8. sound: temporal
9. sound: timbral
16. Participation
• Initially a lot of interest: about 8 parties explicitly expressed interest
• But ultimately just one participant (LUTIN UserLab)
• Aggregation of estimators
• Currently investigating what happened to the 7 others
• So far, it appears ISMIR 2012 was inconveniently close
• The 3 other MusiClef co-organizers will discuss this there
17. Conclusions
• We established a multimodal music tagging benchmark task
• Special effort in facilitating deeper tag analysis
• We would like a 2013 multimodal music benchmark task
• Depending on survey input
• Depending on your input
18. Thank you for your attention!
For contact and more information: musiclef@dei.unipd.it