
Multi modal music mood classification


Research Seminar at National Institute of Informatics, Japan


  1. Multi-modal Music Mood Classification Using Audio, Lyrics and Social Tags
     Xiao Hu
     National Institute of Informatics
     July 5, 2011
  2. Outline
     • Multimodal Music Mood Classification
       – Research questions
       – Methodology
       – Findings and contributions
     • Future Research
  3. Music Mood Classification
     • Exercise: What do you feel about …
       Here comes the sun, here comes the sun, and I say it's all right
       Little darling, it's been a long cold lonely winter
       Little darling, it feels like years since it's been here
       Here comes the sun, here comes the sun, …….
     • How do people categorize music mood?
     • How well can a computer do it?
  4. Why Mood
  5. State-of-the-Art
     • Mood categories directly adopted from music psychological models
       – Lack the social context of music listening (Juslin & Laukka, 2004)
       – Can social tags help?
     • Evaluation datasets are small
       – Low consistency across assessors (Skowronek et al., 2006; Hu et al., 2008)
     • Suboptimal performance of automatic music mood classification systems
       – Mostly audio-based
       – Can lyrics help?
  6. Research Questions
     • Q1: Can social tags help develop a mood taxonomy?
     • Q2: Which lyric features are the most useful for music mood classification?
     • Q3: Are lyrics better than audio in music mood classification?
     • Q4: Can combining lyrics and audio improve the effectiveness of mood classification?
     • Q5: Can combining lyrics and audio improve the efficiency of mood classification?
       – Number of training examples
       – Length of audio data
     Q2-5: Improving classification performance by combining lyrics and audio
  7. Q1: Mood Categories
     • New topic in information science
     • Influential models in music psychology, often used in previous research on music mood classification
       – Categorical: Hevner (1936)
       – Dimensional: Russell (1980)
  8. Russell’s 2D Model
  9. Can Social Tags Help?
     • Last.fm
       – One of the largest tagging sites for Western popular music
  10. Social Tags
      • Pros:
        – Users’ perspectives
        – Large quantity
      • Cons:
        – Noisy: “I aaaaam lov’in it”
        – Ambiguous: “love”
        – Synonyms: “calm”, “serene”
        – “Long tail”
      • Linguistic Resources: WordNet-Affect
      • Human Expertise: 2 music retrieval experts, native English speakers
  12. Distances between Categories
      • Calculated by song co-occurrences
        – Categories associated with the same songs are similar
      • Plotted in 2-D space using Multidimensional Scaling
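The co-occurrence idea on this slide can be illustrated with a small sketch. The slide does not say which distance measure was used, so the Jaccard distance below is an assumption, and the category names and song IDs are hypothetical toy data. The resulting matrix is the kind of input a Multidimensional Scaling routine would take.

```python
from itertools import combinations

# Hypothetical toy data: mood category -> set of song IDs tagged with it.
category_songs = {
    "calm": {"s1", "s2", "s3", "s4"},
    "serene": {"s2", "s3", "s4", "s5"},
    "angry": {"s6", "s7"},
}


def jaccard_distance(songs_a, songs_b):
    """Return 1 - |A & B| / |A | B|.

    Assumed measure for illustration; the talk does not name the one it used.
    """
    union = songs_a | songs_b
    if not union:
        return 0.0
    return 1.0 - len(songs_a & songs_b) / len(union)


def distance_matrix(cat_songs):
    """Build a symmetric category-by-category distance lookup."""
    names = sorted(cat_songs)
    dist = {(name, name): 0.0 for name in names}
    for x, y in combinations(names, 2):
        d = jaccard_distance(cat_songs[x], cat_songs[y])
        dist[(x, y)] = d
        dist[(y, x)] = d
    return names, dist


names, dist = distance_matrix(category_songs)
for x, y in combinations(names, 2):
    print(f"{x} vs {y}: {dist[(x, y)]:.2f}")
```

Categories that share most of their songs (here "calm" and "serene") end up close together, while categories with no songs in common sit at the maximum distance of 1.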
  13. Identified Categories
  14. Research Questions
      • Q1: Can social tags help identify mood categories that are more realistic?
      • Q2: Which lyric features are the most useful for music mood classification?
      • Q3: Are lyrics better than audio in music mood classification?
      • Q4: Can combining lyrics and audio improve the effectiveness of mood classification?
      • Q5: Can combining lyrics and audio improve the efficiency of mood classification?
        – Number of training examples
        – Length of audio data
  15. What do they feel about…
  16. Multi-modal
      [Diagram labels: MUSIC, Social Tags, Audio, Lyrics, Mood Categories, Ground Truth, Automatic Classification]
      Q2-5: Improving classification performance by combining lyrics and audio
  17. Classification Experiments
      • Evaluation task
        – Binary classification
      • Evaluation measures and tests
        – Accuracy
        – Friedman’s ANOVA
      • Classification algorithm
        – SVM (LIBSVM implementation)
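As a rough illustration of the two evaluation tools named on this slide, the sketch below computes accuracy for one binary task and the Friedman chi-square statistic over a table of per-category accuracies. The numbers are made up, the function names are hypothetical, and the statistic is the textbook form without a tie correction; it is not taken from the talk's own code.

```python
def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted label matches the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def average_ranks(values):
    """Rank values ascending (1 = smallest); tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def friedman_statistic(table):
    """Friedman chi-square statistic, no tie correction.

    table[i][j] is the score of system j on block (mood category) i.
    """
    n = len(table)
    k = len(table[0])
    rank_sums = [0.0] * k
    for row in table:
        for j, rank in enumerate(average_ranks(row)):
            rank_sums[j] += rank
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)


# Made-up accuracies: rows are mood categories, columns are three systems.
scores = [
    [0.60, 0.65, 0.70],
    [0.58, 0.63, 0.69],
    [0.62, 0.66, 0.71],
    [0.55, 0.64, 0.68],
]
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))
print(friedman_statistic(scores))
```

The statistic would then be compared against a chi-square distribution with k - 1 degrees of freedom to decide whether the systems differ significantly across categories.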
  18. Ground Truth Dataset
      • Built from social tags
      • Has audio, lyrics and social tags
      • 5,296 unique songs
      • 18 mood categories
      • Equal positive and negative examples
      • 12,980 examples
      [Chart: numbers of positive examples in categories]
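The "equal positive and negative examples" point can be sketched as follows. The slide does not describe how negatives were chosen, so drawing them at random from songs not tagged with the category is an assumption, and the song IDs, tags and function name are hypothetical.

```python
import random


def build_balanced_examples(song_tags, category, seed=0):
    """Return (song_id, label) pairs with as many negatives as positives.

    Positives are songs tagged with `category`. Negatives are sampled at
    random from the remaining songs (an assumed selection rule).
    """
    positives = sorted(s for s, tags in song_tags.items() if category in tags)
    candidates = sorted(s for s, tags in song_tags.items() if category not in tags)
    if len(candidates) < len(positives):
        raise ValueError("not enough untagged songs to balance the category")
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    negatives = rng.sample(candidates, len(positives))
    return [(s, True) for s in positives] + [(s, False) for s in negatives]


# Hypothetical toy data: song ID -> set of mood tags.
song_tags = {
    "song_a": {"happy", "calm"},
    "song_b": {"happy"},
    "song_c": {"angry"},
    "song_d": {"sad"},
    "song_e": {"calm"},
}
examples = build_balanced_examples(song_tags, "happy")
print(examples)
```

Because the same song can be a positive example in one category and a negative one in another, the total number of examples can exceed the number of unique songs, which is consistent with the counts on the slide.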
  19. Baseline System (audio-based)
      • The AMC tasks in MIREX
        – MIREX: Music Information Retrieval Evaluation eXchange
        – AMC: Audio Mood Classification
      • A leading system in AMC 2007 and 2008: Marsyas
        – Music Analysis, Retrieval and Synthesis for Audio Signals; led by Prof. Tzanetakis@UVic.ca
        – Uses audio spectral features
  20. Lyric-based System
      • Very little existing work
        – Only used basic text features: bag_of_words, part_of_speech
        – Worse than audio-based approaches
      • This research extracted and compared a range of novel lyric features
  21. Best Lyric Features
      • Basic features:
        – Content words, part-of-speech, function words
      • Psycholinguistic features:
        – Psychological categories in GI (General Inquirer)
        – Scores in ANEW (Affective Norms for English Words)
      • Stylistic features:
        – Punctuation marks; interjection words
        – Statistics: e.g., how many words per minute
      • Combinations: 255 of them!
      Most comprehensive study on lyric classification so far.
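A few of the feature types listed on this slide can be sketched with the standard library alone. The word lists below are tiny hypothetical stand-ins, not the ones used in the study, and the psycholinguistic features (GI, ANEW) are left out because they need external lexicons.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny word lists for illustration only.
FUNCTION_WORDS = {"the", "a", "and", "i", "it", "here", "since", "like"}
INTERJECTIONS = {"oh", "yeah", "hey", "ah"}


def lyric_features(lyrics, duration_seconds):
    """Extract a few basic and stylistic features from one song's lyrics."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    tokens = re.findall(r"[a-z']+", lyrics.lower())
    return {
        # Basic features: counts of content words and of function words.
        "content_words": Counter(t for t in tokens if t not in FUNCTION_WORDS),
        "function_words": Counter(t for t in tokens if t in FUNCTION_WORDS),
        # Stylistic features: interjections, punctuation, words per minute.
        "interjections": sum(1 for t in tokens if t in INTERJECTIONS),
        "exclamation_marks": lyrics.count("!"),
        "words_per_minute": len(tokens) * 60.0 / duration_seconds,
    }


features = lyric_features("Oh yeah, here comes the sun! Here comes the sun.", 30)
print(features)
```

Each feature type yields its own vector, so comparing combinations of feature types amounts to concatenating the chosen vectors before training.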
  22. Lyric Feature Example
  23. No significant difference between top combinations
  28. Combine Lyrics and Audio
      • Two hybrid methods:
        – Late fusion: the Lyric Classifier and the Audio Classifier each produce a Prediction, and the two are combined into a Final Prediction
        – Feature concatenation: a single Classifier produces the Prediction
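The difference between the two hybrid methods can be shown in a few lines. This is a minimal sketch: the slide does not say how the two predictions are combined in late fusion, so the weighted average and the `alpha` parameter are assumptions, and the feature values are made up.

```python
def late_fusion(p_lyrics, p_audio, alpha=0.5):
    """Combine two classifiers' positive-class probabilities.

    alpha is the weight given to the lyric classifier (assumed scheme).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return alpha * p_lyrics + (1.0 - alpha) * p_audio


def concatenate_features(lyric_vector, audio_vector):
    """Join both feature vectors so that one classifier sees all features."""
    return list(lyric_vector) + list(audio_vector)


# Late fusion: each modality is classified separately, then merged.
final_probability = late_fusion(p_lyrics=0.8, p_audio=0.4)
final_label = final_probability >= 0.5

# Feature concatenation: merge first, classify once.
joint_vector = concatenate_features([2, 0, 1], [0.13, 0.71])

print(final_probability, final_label, joint_vector)
```

Late fusion keeps the two models independent, so either can be retrained or reweighted on its own, whereas concatenation lets a single model learn interactions between lyric and audio features.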
  29. System Performances
      [Chart series: Audio + Lyrics, Lyrics, Audio]
  30. Effectiveness
  32. Automatic Classification (supervised learning)
      • Training examples for “Happy”:
        – “Here comes the sun” → Y
        – “I will be back” → N
        – “Down with the sickness” → N
        – ………
      • Classifier for “Happy”
      • New examples:
        – Song A → Y
        – Song B → N
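The training-then-prediction flow on this slide can be sketched with a toy binary classifier for "Happy". The talk used an SVM (LIBSVM); the perceptron below is a deliberately simpler stand-in so that the example needs only the standard library, and the training texts are hypothetical.

```python
from collections import Counter


def bag_of_words(text):
    """Count lowercase whitespace-separated tokens."""
    return Counter(text.lower().split())


def score(weights, bias, features):
    """Linear score: bias plus the weighted sum of word counts."""
    return bias + sum(weights[word] * count for word, count in features.items())


def train_perceptron(examples, epochs=10):
    """Train a binary perceptron (stand-in for the SVM used in the talk).

    examples is a list of (lyrics_text, is_happy) pairs.
    """
    weights = Counter()
    bias = 0.0
    for _ in range(epochs):
        for text, label in examples:
            features = bag_of_words(text)
            predicted = score(weights, bias, features) > 0
            if predicted != label:
                step = 1.0 if label else -1.0
                for word, count in features.items():
                    weights[word] += step * count
                bias += step
    return weights, bias


def predict(model, text):
    """Return True if the model labels the text as "Happy"."""
    weights, bias = model
    return score(weights, bias, bag_of_words(text)) > 0


# Hypothetical training examples for the "Happy" category.
training = [
    ("here comes the sun", True),
    ("sunny happy day", True),
    ("down with the sickness", False),
    ("cold lonely night", False),
]
model = train_perceptron(training)
print(predict(model, "the sun is here"), predict(model, "lonely sickness"))
```

One such classifier is trained per mood category, which matches the binary classification task described earlier in the talk.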
  33. Learning Curves
  34. Conclusions
      • Q1: Can social tags help identify mood categories that are more realistic?
      • Q2: The most useful lyric features are: combination of words, linguistic features and text stylistic features
      • Q3: Are lyrics better than audio in music mood classification?
      • Q4: Can combining lyrics and audio improve the effectiveness of mood classification?
      • Q5: Can combining lyrics and audio improve the efficiency of mood classification?
  35. What does the computer feel about…
  36. Contributions
      Methodology
      • Mood categories identified from social tags complement psychological models
      • Established an example of using empirical data to refine/adapt theoretical models
      • Improved lyric affect analysis and multi-modal mood classification
      Evaluation
      • Proposed an efficient method for building ground truth datasets
      • Largest dataset with ternary information sources to date, made available to the MIR community via MIREX 2009: http://www.music-ir.org/mirex/2009/index.php/Audio_Tag_Classification
      Application
      • Provided practical reference for MIR systems
      • Moodydb.com
  37. Application
  38. Feature Analysis
  39. Audio vs. Lyrics
  40. Top Lyric Features
  41. Top Lyric Features in “Calm”
  42. Top Affective Words vs.
  43. Future Research Directions
  44. Affect Analysis for Information Studies
      • Affect is an important factor in information behavior and information access
      • NLP techniques have been applied to attitude, sentiment and opinion analysis
      • I am interested in its applications to human cognition and learning
      • English and Chinese; Text and Music
      • Paper accepted to ISMIR: “Exploring the Relationship Between Mood and Creativity in Rock Lyrics”
  45. Future Research Directions
      • Multimedia, multimodal: audio-visual-textual
  46. Summary
      • Multimodal Music Mood Classification
        – Combining lyrics and audio helps improve effectiveness and efficiency
        – Contributions
        – Feature analysis
      • Future Research
        – Affect factor in informatics
        – Multimodal, multimedia (Photo mining seminar on Thursday! Prof. Winston Hsu from Taiwan)
  48. References
      • Hu, X. and Downie, J. S. (2010). When Lyrics Outperform Audio for Music Mood Classification: A Feature Analysis. In Proceedings of the 11th International Conference on Music Information Retrieval (ISMIR), Aug. 2010, Utrecht, Netherlands.
      • Hu, X. and Downie, J. S. (2010). Improving Mood Classification in Music Digital Libraries by Combining Lyrics and Audio. In Proceedings of the Joint Conference on Digital Libraries (JCDL), June 2010, Surfers Paradise, Australia. (Best Student Paper Award)
      • Hu, X. (2010). Music and Mood: Where Theory and Reality Meet. In Proceedings of the 5th iConference, University of Illinois at Urbana-Champaign, Feb. 2010, Champaign, IL. (Best Student Paper Award)
      • Hu, X., Downie, J. S. and Ehmann, A. (2009). Lyric Text Mining in Music Mood Classification. ISMIR’09.
      • Hu, X. (2009). Combining Text and Audio for Music Mood Classification in Music Digital Libraries. IEEE Bulletin of Technical Committee on Digital Libraries (TCDL), 5(3).
      • Hu, X. (2010). Multi-modal Music Mood Classification. Presented in the Jean Tague-Sutcliffe Doctoral Research Poster session at the ALISE Annual Conference, Jan. 2010, Boston, MA. (3rd Place Award)
      • Hu, X. (2009). Categorizing Music Mood in Social Context. In Proceedings of the Annual Meeting of ASIS&T (CD-ROM), Nov. 2009, Vancouver, Canada.
  49. References (2)
      • Hu, X., Downie, J. S., Laurier, C., Bay, M. and Ehmann, A. (2008a). The 2007 MIREX Audio Mood Classification Task: Lessons Learned. In Proceedings of the 9th International Conference on Music Information Retrieval (ISMIR’08), Sept. 2008, Philadelphia, USA.
      • Juslin, P. N. and Laukka, P. (2004). Expression, perception, and induction of musical emotions: a review and a questionnaire study of everyday listening. Journal of New Music Research, 33(3): 217-238.
      • Juslin, P. N. and Sloboda, J. A. (2001). Music and emotion: introduction. In P. N. Juslin and J. A. Sloboda (Eds.), Music and Emotion: Theory and Research. New York: Oxford University Press.
      • Skowronek, J., McKinney, M. F. and van de Par, S. (2006). Ground truth for automatic music mood classification. In Proceedings of the 7th International Conference on Music Information Retrieval (ISMIR’06), Oct. 2006, Victoria, Canada.
