
Brave New Task: Musiclef Multimodal Music Tagging



  1. Multimodal Music Tagging Task
     Nicola Orio – University of Padova
     Cynthia C. S. Liem – Delft University of Technology
     Geoffroy Peeters – UMR STMS IRCAM-CNRS, Paris
     Markus Schedl – Johannes Kepler University, Linz
     MediaEval, Pisa, 05/10/2012
  2. Multimodal music tagging
     • Definition
       • Songs of a commercial music library need to be categorized according to their usage in TV and radio broadcasts (e.g. soundtracks, jingles)
     • Practical motivation
       • Searching for suitable music for video productions is a major activity for professionals and lay users alike
       • Collaborative filtering systems are increasingly taking over this role, notwithstanding their known limitations: long tail, cold start, …
       • Annotating professional music libraries is another important professional activity
  3. Human assessment
     • Different sources of information are routinely exploited by professionals to overcome the limitations of individual media
  4. Goals of MusiClef
     • To focus evaluation on professional application scenarios
       • Textual description of music items
     • To grant replication of experiments and results
       • The feature extraction phase is crucial – released features are computed with a public, open-source library (MIRToolbox)
     • To promote the exploitation of multimodal sources of information
       • Content (audio) + context (tags & web pages)
     • To disseminate music-related initiatives
       • Outside the music information retrieval community
  5. Evaluation initiatives – 1
     • MIREX (since 2004)
       • Community-based selection of tasks
       • Many tasks address audio feature extraction algorithms
       • Participants submit algorithms that are run by the organizers
       • Music files are not shared with participants
     • Million Song Dataset (since 2011)
       • Task on music recommendation proposed by the organizers
       • Audio features are computed using proprietary algorithms
       • Only features are shared with participants
  6. Evaluation initiatives – 2
     • Quaero-Eval (since 2012)
       • Tasks agreed upon with participants
       • Strategies to grant public access to evaluation results
       • Participants run training experiments on a shared repository
       • Runs on the test set are made by the organizers
  7. Test collection – 1
     • Individual songs of pop and rock music
       • 1355 songs (from 218 artists)
       • Train (975) / test (380) split
     • Social tags
       • Gathered from the Last.fm API
     • Multilingual sets of web pages related to artists + albums
       • Mined by querying Google
     • Acoustic features: MFCCs (using MIRToolbox) with a window length of 200 ms and 50% overlap (a comparable extraction setup is sketched below)
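The released features were computed with the MATLAB MIRToolbox; as a rough, non-official illustration of the stated configuration (200 ms windows, 50% overlap), the following Python sketch uses librosa, with the file path and number of coefficients as assumptions.

```python
# Illustrative sketch only: approximates the task's MFCC configuration
# (200 ms windows, 50% overlap) with librosa instead of MIRToolbox.
import librosa
import numpy as np

def extract_mfcc(path, n_mfcc=13):
    y, sr = librosa.load(path, sr=None, mono=True)
    win = int(0.200 * sr)          # 200 ms analysis window
    hop = win // 2                 # 50% overlap between consecutive windows
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc,
                                n_fft=win, hop_length=hop, win_length=win)
    return mfcc.T                  # one row of coefficients per frame

# Example usage (hypothetical file):
# frames = extract_mfcc("some_song.mp3")
# print(frames.shape)             # (n_frames, 13)
```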
  8. Test collection – 2
     • Test collection created starting from the “500 Greatest Songs of All Time” (Rolling Stone)
       • Expected high number of social tags and web pages
     • Ground truth created by domain experts
       • 355 tags selected (167 genre, 288 usage)
       • Tags associated with fewer than 20 songs were discarded
     • Reference implementation in MATLAB
       • Gives participants an example of how to run a complete experiment
       • Code for the evaluation is already available
  9. Evaluation measures
     • Standard IR measures (see the sketch below):
       • Accuracy
       • Precision
       • Recall
       • Specificity
       • F-measure
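As a minimal sketch (not the task's official MATLAB evaluation code), the five measures can be computed for one tag from binary ground-truth and prediction vectors:

```python
# Per-tag measures from binary ground-truth and prediction vectors.
import numpy as np

def tag_measures(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, bool), np.asarray(y_pred, bool)
    tp = np.sum(y_true & y_pred)        # tag present and predicted
    fp = np.sum(~y_true & y_pred)       # tag absent but predicted
    fn = np.sum(y_true & ~y_pred)       # tag present but missed
    tn = np.sum(~y_true & ~y_pred)      # tag absent and not predicted
    accuracy    = (tp + tn) / (tp + tn + fp + fn)
    precision   = tp / (tp + fp) if tp + fp else 0.0
    recall      = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f_measure   = (2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
    return accuracy, precision, recall, specificity, f_measure
```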
  10. Examining tags more closely
      • Some tags are more equal than others…
        • hard rock, ballroom, melancholic, travel, countryside, bright
      • Thus, we propose to also analyze results employing a higher-level tag categorization
  11. Tag categorization – 1
      • Affective, mood-related aspects:
        • Activity: the amount of perceived music activity, without implying strong positive or negative affective qualities (e.g. fast, mellow, lazy)
        • Affective state: affective qualities that can only be connected and attributed to living beings (e.g. aggressive, hopeful)
        • Atmosphere: affective qualities that can be connected to environments (e.g. chaotic, intimate)
  12. Tag categorization – 2
      • Situation, time and space aspects of the music:
        • Physical situation: concrete physical environments (e.g. city, night)
        • Occasion: implications of time and space, typically connected to social events (e.g. holiday, glamour)
      • Sociocultural genre (e.g. new wave, r&b, punk)
      • Sound qualities:
        • Timbral aspects (e.g. acoustic, bright)
        • Temporal aspects (e.g. beat, groove)
      • Other (e.g. catchy, evocative)
      (A small mapping sketch from raw tags to these categories follows below.)
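The official tag-to-category assignment was defined by the organizers; the sketch below only illustrates the idea using the example tags from slide 10, and the categories chosen for "ballroom" and "travel" are our own assumptions.

```python
# Illustrative only: category assignments for some tags are assumptions,
# not the organizers' official mapping.
TAG_CATEGORY = {
    "hard rock":   "sociocultural genre",
    "melancholic": "affective state",
    "bright":      "sound qualities: timbral",
    "countryside": "situation: physical",
    "ballroom":    "situation: occasion",   # assumption
    "travel":      "situation: occasion",   # assumption
}

def categorize(tags):
    """Group raw tags by higher-level category ('other' if unmapped)."""
    grouped = {}
    for tag in tags:
        grouped.setdefault(TAG_CATEGORY.get(tag, "other"), []).append(tag)
    return grouped

# categorize(["hard rock", "bright", "catchy"])
# -> {'sociocultural genre': ['hard rock'],
#     'sound qualities: timbral': ['bright'],
#     'other': ['catchy']}
```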
  13. Reference implementation
      • Made in MATLAB and released publicly
      • Simple and straightforward approaches:
        • Individual GMMs for audio, user tags, web pages
        • Tagging process: 1-NN classification using the symmetrized KL divergence (see the sketch below)
      • Scenarios tested:
        • Audio, user tags, web pages individually
        • Majority vote
        • Union
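The following is a minimal sketch, not the released MATLAB reference implementation: it models each item as a single diagonal Gaussian over its feature frames (the actual baseline uses GMMs, for which the symmetrized KL has no closed form) and assigns the tags of the closest training item under the symmetrized KL divergence (1-NN).

```python
# Simplified 1-NN tagging with symmetrized KL between diagonal Gaussians.
import numpy as np

def fit_gaussian(frames):
    """Diagonal Gaussian over feature frames (rows = frames)."""
    return frames.mean(axis=0), frames.var(axis=0) + 1e-8

def sym_kl(g1, g2):
    """Symmetrized KL divergence between two diagonal Gaussians."""
    (m1, v1), (m2, v2) = g1, g2
    kl12 = 0.5 * np.sum(v1 / v2 + (m2 - m1) ** 2 / v2 - 1 + np.log(v2 / v1))
    kl21 = 0.5 * np.sum(v2 / v1 + (m1 - m2) ** 2 / v1 - 1 + np.log(v1 / v2))
    return kl12 + kl21

def tag_1nn(test_frames, train_models, train_tags):
    """Return the tag set of the nearest training item."""
    g = fit_gaussian(test_frames)
    dists = [sym_kl(g, m) for m in train_models]
    return train_tags[int(np.argmin(dists))]
```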
  14. Baseline results – 1
      • Evaluation of the submitted runs and of the reference implementation
      • Results with different modalities over the full dataset (the majority-vote and union fusion strategies are sketched below):

        strategy    accuracy  recall  precision  specificity  f-measure
        audio       0.894     0.148   0.127      0.939        0.126
        tags        0.898     0.061   0.039      0.942        0.037
        web pages   0.897     0.050   0.007      0.954        0.011
        majority    0.880     0.123   0.086      0.922        0.086
        union       0.824     0.240   0.115      0.845        0.134
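An illustrative sketch of the two fusion strategies in the table, assuming each modality produces a set of predicted tags per song; this is not the reference implementation's exact fusion code.

```python
# Tag-set fusion across modalities: union vs. majority vote.
from collections import Counter

def union_fusion(predictions):
    """Keep any tag predicted by at least one modality."""
    return set().union(*predictions)

def majority_fusion(predictions):
    """Keep tags predicted by more than half of the modalities."""
    counts = Counter(tag for tags in predictions for tag in set(tags))
    threshold = len(predictions) / 2
    return {tag for tag, n in counts.items() if n > threshold}

# per_modality = [{"rock", "energetic"}, {"rock"}, {"rock", "ballad"}]
# majority_fusion(per_modality)  -> {"rock"}
# union_fusion(per_modality)     -> {"rock", "energetic", "ballad"}
```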
  15. Baseline results – 2
      • [Chart: per-category baseline results; tag categories: 1. activity, energy; 2. affective state; 3. atmosphere; 4. other; 5. situation: occasion; 6. situation: physical; 7. sociocultural genre; 8. sound: temporal; 9. sound: timbral]
  16. Participation
      • Initially a lot of interest – about 8 explicitly interested parties
      • But ultimately just one participant (LUTIN UserLab)
        • Aggregation of estimators
      • Currently investigating what happened to the 7 others
        • So far, it appears ISMIR 2012 was inconveniently close
        • The 3 other MusiClef co-organizers will discuss this there
  17. Conclusions
      • We established a multimodal music tagging benchmark task
      • Special effort went into facilitating deeper tag analysis
      • We would like to run a multimodal music benchmark task again in 2013
        • Depending on survey input
        • Depending on your input
  18. Thank you for your attention!
      For contact and more information: musiclef@dei.unipd.it
