Dimensional Music Emotion Recognition


Invited talk at Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain, 2011/12/15.

This talk describes a computational framework for automatically predicting the emotion we perceive when listening to music. Building on this framework, novel emotion-based music organization, browsing, and retrieval methods can be created to give users an intuitive, easy-to-use, and effective way to access music information.


Dimensional Music Emotion Recognition

  1. 1. Dec. 2011 @ MTG, UPF. Dimensional Music Emotion Recognition. Yi-Hsuan Yang, Assistant Research Fellow, Music & Audio Computing (MAC) Lab, Research Center for IT Innovation, Academia Sinica.
  2. 2. Music & Emotion
     - Music conveys emotion and modulates our mood
     - Music emotion recognition (MER): understand how humans perceive and feel emotion when listening to music, and develop systems for emotion-based music retrieval
  3. 3. Why Do We Listen to Music? (motive and ratio of respondents)
     - "to express, release, and influence emotions": 47%
     - "to relax and settle down": 33%
     - "for enjoyment, fun, and pleasure": 22%
     - "as company and background sound": 16%
     - "because it makes me feel good": 13%
     - "because it's a basic need, I can't live without it": 12%
     - "because I like/love music": 11%
     - "to get energized": 9%
     - "to evoke memories": 4%
     Source: "Expression, Perception, and Induction of Musical Emotions: A Review and a Questionnaire Study of Everyday Listening," Patrik N. Juslin and Petri Laukka, Journal of New Music Research, 2004.
  4. 4. Categories of Emotion
     - Expressed (intended) emotion: what a performer tries to express
     - Perceived emotion: what a listener perceives as being expressed in the music; usually the same as the expressed emotion
     - Felt (induced) emotion: what a listener actually feels; strongly influenced by the context of music listening (environment, mood)
  5. 5. Emotion Description w/ Mood Labels (courtesy of Ching-Wei Chen @ Gracenote)
  6. 6. Description w/ Latent Dimensions
  7. 7. Categorical Approach: from the audio spectrum to Hevner's model (1936)
  8. 8. Dimensional Approach: from the audio spectrum to the emotion plane (Russell 1980, Thayer 1989)
  9. 9. Categorical vs. Dimensional
     - Categorical. Pros: intuitive; natural language; atomic description. Cons: lacks a unifying model; ambiguous; subjective; difficult to offer fine-grained differentiation.
     - Dimensional. Pros: focuses on a few dimensions; good user interface. Cons: less intuitive; semantic loss in projection; difficult to obtain ground truth.
  10. 10. Q: No Consensus on Mood Taxonomy
     - Katayose et al. [icpr98], 4 classes: gloomy, urbane, pathetic, serious
     - Feng et al. [sigir03], 4: happy, angry, fear, sad
     - Li et al. [ismir03], Wieczorkowska et al. [imtci04], 13: happy, light, graceful, dreamy, longing, dark, sacred, dramatic, agitated, frustrated, mysterious, passionate, bluesy
     - Wang et al. [icsp04], 6: joyous, robust, restless, lyrical, sober, gloomy
     - Tolos et al. [ccnc05], 3: happy, aggressive, melancholic+calm
     - Lu et al. [taslp06], 4: exuberant, anxious/frantic, depressed, content
     - Yang et al. [mm06], 4: happy, angry, sad, relaxed
     - Skowronek et al. [ismir07], 12: arousing, angry, calming, carefree, cheerful, emotional, loving, peaceful, powerful, sad, restless, tender
     - Wu et al. [mmm08], 8: happy, light, easy, touching, sad, sublime, grand, exciting
     - Hu et al. [ismir08], 5: passionate, cheerful, bittersweet, witty, aggressive
     - Trohidis et al. [ismir08], 6: surprised, happy, relaxed, quiet, sad, angry
  11. 11. Fuzzy Boundary b/w Mood Classes
     - Subjective usage of affective terms: cheerful, happy, joyous, party/celebratory; melancholy, gloomy, sad, sorrowful
     - Semantic overlap (clusters 2 and 4) and acoustic overlap (clusters 1 and 5) [mirex07.cyril&perfe]
     MIREX AMC taxonomy:
     - Cluster 1: passionate, rowdy, rousing, confident, boisterous
     - Cluster 2: amiable/good-natured, sweet, fun, rollicking, cheerful
     - Cluster 3: literate, wistful, bittersweet, autumnal, brooding, poignant
     - Cluster 4: witty, humorous, whimsical, wry, campy, quirky, silly
     - Cluster 5: aggressive, volatile, fiery, visceral, tense/anxious, intense
  12. 12. Granularity of Emotion Description
     - Small set of emotion classes (e.g., happy, sad, angry, relaxed): insufficient compared to the richness of our perception
     - Large set of emotion classes (e.g., acerbic, aggressive, ambitious, amiable, angry, bittersweet, bright, brittle, calm, carefree, cathartic, cerebral, cheerful, circular, clinical, cold, confident, delicate, dramatic, dreamy, druggy, earnest, eccentric, elegant, energetic, enigmatic, epic, exciting, exuberant, fierce, fiery, fun, gentle, gloomy, greasy, happy, …): difficult to obtain reliable ground truth data
  13. 13. Sol: Describing Emotions in Emotion Space
     - Arousal: activation, activity; energy and stimulation level
     - Valence: pleasantness; positive and negative affective states [psp80]
  14. 14. The Dimensional Approach: Strengths
     - No need to decide which, and how many, emotion categories to use
     - Generalizes MER from the categorical domain to the real-valued domain
     - Easy to compare different computational models
  15. 15. The Dimensional Approach: Weaknesses
     - Semantic loss due to projection: blurs important psychological distinctions
     - 3rd dimension, potency [psy07]: angry vs. afraid, proud vs. shameful, interested vs. disappointed
     - 4th dimension, unpredictability: surprised; tense vs. afraid; contempt vs. disgust
  16. 16. Music Retrieval in VA Space
     - Provides a simple means for a 2D user interface: pick a point, or draw a trajectory, in the valence-arousal plane
     - Useful for mobile devices with small display space (demo)
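The point-query interaction above amounts to a nearest-neighbor search over the songs' predicted coordinates in the VA plane. A minimal sketch, assuming a catalogue of made-up songs with (valence, arousal) values in [-1, 1]:

```python
import numpy as np

# Hypothetical catalogue: song name -> predicted (valence, arousal).
song_va = {
    "song_a": (0.8, 0.6),    # happy / excited region
    "song_b": (-0.7, -0.5),  # sad / depressed region
    "song_c": (0.5, -0.3),   # relaxed region
}

def retrieve_by_point(query, catalogue, k=2):
    """Return the k songs whose VA coordinates are closest to the query."""
    q = np.asarray(query, dtype=float)
    dists = {name: np.linalg.norm(np.asarray(va) - q)
             for name, va in catalogue.items()}
    return sorted(dists, key=dists.get)[:k]

# Query near the positive-valence, high-arousal corner.
print(retrieve_by_point((0.6, 0.4), song_va))  # ['song_a', 'song_c']
```

A trajectory query can be handled the same way, retrieving the nearest song for each sampled point along the drawn path.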
  17. 17. Q: How to Predict Emotion Values?
     Transformation-based approach [mm06]:
     - Consider the four quadrants of the VA plane; perform 4-class mood classification
     - Apply the transformation: arousal = u1 + u2 − u3 − u4; valence = u1 + u4 − u2 − u3 (u denotes the class likelihood)
     - Not rigorous
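The transformation on the slide can be written directly in code. The quadrant numbering below follows the usual VA-plane convention (1: +V,+A; 2: −V,+A; 3: −V,−A; 4: +V,−A), which is an assumption consistent with the signs in the formulas:

```python
def likelihoods_to_va(u):
    """Map 4-class quadrant likelihoods [u1, u2, u3, u4] to (valence, arousal).
    Quadrants assumed: 1 (+V,+A), 2 (-V,+A), 3 (-V,-A), 4 (+V,-A)."""
    u1, u2, u3, u4 = u
    arousal = u1 + u2 - u3 - u4  # upper quadrants minus lower quadrants
    valence = u1 + u4 - u2 - u3  # right quadrants minus left quadrants
    return valence, arousal

# A classifier fairly confident in quadrant 1 yields positive valence and arousal.
print(likelihoods_to_va([0.5, 0.2, 0.1, 0.2]))  # approximately (0.4, 0.4)
```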
  18. 18. Sol: Perform Regression
     - Given features x, predict a numerical value y
     - Given N inputs (xi, yi), 1 ≤ i ≤ N, where xi is the feature vector and yi is the numerical value to be predicted, train a regression model f(·) that minimizes the mean squared error (MSE): min_f (1/N) Σ_{i=1}^{N} (yi − f(xi))²
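For a linear model f(x) = wᵀx + b, minimizing the MSE is ordinary least squares. A minimal sketch on synthetic data (not the talk's dataset):

```python
import numpy as np

# Synthetic training set: N=100 "songs", 3 features each, noiseless targets.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
w_true, b_true = np.array([1.0, -2.0, 0.5]), 0.3
y = X @ w_true + b_true

# Append a bias column so lstsq fits w and b jointly;
# lstsq minimizes the sum of squared errors, i.e. N * MSE.
Xb = np.hstack([X, np.ones((len(X), 1))])
coef, *_ = np.linalg.lstsq(Xb, y, rcond=None)
w, b = coef[:-1], coef[-1]

mse = np.mean((y - (X @ w + b)) ** 2)
print(mse)  # essentially zero on noiseless data
```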
  19. 19. Computational Framework [taslp08]
     - Predict the VA values: train a regression model f(·) that minimizes the MSE, min_f (1/N) Σ_{i=1}^{N} (yi − f(xi))², where yi is the numerical emotion value, xi the feature vector (input), and f(xi) the prediction (output)
     - One regressor for valence, one for arousal
     - E.g., linear regression: f(xi) = wᵀxi + b = Σ_j wj·xij + b
     - Training: manual annotation gives the emotion values, feature extraction gives the features, and the regressor is trained on both
     - Test: feature extraction, then automatic prediction of the emotion value
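The train/test pipeline on the slide, with one regressor per dimension, can be sketched as below. The data and feature dimensionality are made up; a simple least-squares linear regressor stands in for whatever model is used:

```python
import numpy as np

class LinearEmotionRegressor:
    """One linear regressor per emotion dimension (valence or arousal)."""
    def fit(self, X, y):
        Xb = np.hstack([X, np.ones((len(X), 1))])  # add bias column
        self.coef, *_ = np.linalg.lstsq(Xb, y, rcond=None)
        return self
    def predict(self, X):
        return X @ self.coef[:-1] + self.coef[-1]

# Training stage: extracted features plus (synthetic) manual annotations.
rng = np.random.default_rng(1)
X_train = rng.normal(size=(50, 4))                   # 4 hypothetical audio features
valence = X_train @ np.array([0.5, 0.0, -0.3, 0.1])  # stand-in annotations
arousal = X_train @ np.array([0.0, 0.8, 0.2, 0.0])

models = {dim: LinearEmotionRegressor().fit(X_train, t)
          for dim, t in [("valence", valence), ("arousal", arousal)]}

# Test stage: extract features from unseen songs, predict both dimensions.
X_test = rng.normal(size=(5, 4))
predictions = {dim: m.predict(X_test) for dim, m in models.items()}
print(predictions["valence"].shape)  # one value per test song
```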
  20. 20. Obtain Music Emotion Rating
     - Manual annotation: subjects rate the VA values of each song on an ordinal rating scale (scroll bar)
  21. 21. Evaluation of Emotion Rating
     - User study: 1240 Chinese pop songs, each 30 sec; 666 subjects, each rates 8 random songs
     - Subjective evaluation (1 to 5, strongly disagree to strongly agree): easiness of annotating emotion; within-subject reliability (compared to one month later); between-subject reliability (compared to other subjects)
     - Emotion rating scores: easiness 2.82, within-subject reliability 2.92, between-subject reliability 2.81
  22. 22. AnnoEmo: GUI for Emotion Rating [hcm07]
     - Encourages differentiation; drag & drop to modify an annotation; click to listen again (demo)
  23. 23. Cognitive Load is Still High
     - Determining VA values is not that easy
     - Difficult to ensure consistency: does dist(0.5, 0.8) = dist(−0.2, 0.1) in terms of our emotion perception? Does 0.7 mean the same thing for two subjects?
  24. 24. Sol: Ranking Instead of Rating [taslp11a]
     - Determine the position of a song by its relative ranking with respect to other songs, rather than by exact emotion values
     - Example ordering from positive valence (+1) to negative valence (−1): Oh Happy Day; I Want to Hold Your Hand (The Beatles); I Feel Good (James Brown); What a Wonderful World (Louis Armstrong); Into the Woods (My Morning Jacket); The Christmas Song; Cest La Vie Labita (Lisa One); Just the Way You Are (Billy Joel); Perfect Day (Lou Reed); When a Man Loves a Woman (Michael Bolton); Smells Like Teen Spirit (Nirvana)
  25. 25. Ranking-Based Emotion Annotation
     - Emotion tournament ("which song is more positive?") requires only n−1 pairwise comparisons
     - The global ordering can later be approximated by a greedy algorithm [jair99]
     - Example result over songs a–h: f > b > c = h > a = d = e = g
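The greedy ordering step from [jair99] repeatedly emits the item with the highest net preference score among the items not yet ranked. A minimal sketch, with made-up songs and judgments:

```python
def greedy_order(items, prefers):
    """Approximate a global ordering from pairwise judgments.
    prefers[(a, b)] = 1 means a was judged more positive than b."""
    remaining = set(items)
    order = []
    while remaining:
        def net(x):  # wins minus losses against the still-unranked items
            return sum(prefers.get((x, y), 0) - prefers.get((y, x), 0)
                       for y in remaining if y != x)
        best = max(remaining, key=net)
        order.append(best)
        remaining.remove(best)
    return order

songs = ["a", "b", "f"]
judgments = {("f", "b"): 1, ("b", "a"): 1, ("f", "a"): 1}
print(greedy_order(songs, judgments))  # ['f', 'b', 'a']
```

With only n−1 tournament comparisons the matrix of judgments is sparse, which is exactly why the recovered global ordering is an approximation.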
  26. 26. Online Interface
  27. 27. Simplify Emotion Annotation
     - Subjective evaluation: subjects both rate and rank; the ordering of rate and rank does not matter
     - Result shown as a figure (strong vs. weak agreement)
  28. 28. Q: Which Features are Relevant? [psy07]
     - Sound intensity, tempo, rhythm, pitch range, mode (e.g., major), consonance
  29. 29. Feature Extraction
     - Melody/harmony [MIR toolbox]: pitch estimate, key clarity, harmonic change, musical mode
     - Spectral [Marsyas]: spectral flatness measures, spectral crest factors, MFCCs
     - Temporal [Sound description toolbox]: zero-crossing rate, temporal centroid, log-attack time
     - Rhythmic [Rhythm pattern extractor]: beat histogram and average tempo
     - Psycho-acoustically motivated features [PsySound]: loudness, sharpness, timbral width, volume, spectral dissonance, tonal dissonance, pure tonal, complex tonal, multiplicity, tonality, chord
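As a concrete example of one of the temporal features listed above, the zero-crossing rate can be computed directly from the waveform. This simple variant (fraction of adjacent sample pairs with differing sign) is an assumption; toolboxes such as Marsyas may normalize differently:

```python
import numpy as np

def zero_crossing_rate(frame):
    """Fraction of consecutive sample pairs whose signs differ."""
    signs = np.sign(frame)
    return np.mean(signs[:-1] != signs[1:])

# A 440 Hz sine sampled for 1 s at 8 kHz crosses zero about 880 times,
# so the rate is roughly 880 / 8000 = 0.11.
t = np.arange(0, 1, 1 / 8000)
tone = np.sin(2 * np.pi * 440 * t)
print(zero_crossing_rate(tone))
```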
  30. 30. Data Collection
  31. 31. Q: Subjective Issue
     - Each circle represents the emotion annotation for a music piece by a subject (figure)
  32. 32. Sol: Probabilistic MER [taslp11b]
     - Predicts the probabilistic distribution P(e|d) of the perceived emotions of a music piece
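To make the idea of P(e|d) concrete: the distribution of one song's perceived emotion can be summarized, for instance, by a 2-D Gaussian over its VA annotations. This is only an illustrative sketch with made-up annotations; the actual method in the TASLP 2011 paper predicts the distribution from audio features:

```python
import numpy as np

# Made-up (valence, arousal) annotations of one song by four subjects.
annotations = np.array([[0.5, 0.6],
                        [0.3, 0.7],
                        [0.6, 0.5],
                        [0.4, 0.8]])

mean = annotations.mean(axis=0)          # central tendency of P(e|d)
cov = np.cov(annotations, rowvar=False)  # spread reflects subjectivity

print(mean)  # [0.45 0.65]
```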
  33. 33. Sol: Personalized MER [sigir09]
     - From P(e|d) to P(e|d, u): adapt the general regressor into a personal regressor
     - Utilize user feedback collected during emotion-based retrieval to personalize the prediction
  34. 34. Evaluation Setup
     - Training data: 195 Western/Japanese/Chinese pop songs; a 25-sec segment representative of each song (too long and the emotion may not be homogeneous; too short and the listener may not hear enough)
     - Manual annotation: 253 subjects, each rates 12 songs; VA values rated on 11 ordinal levels (0 to 10)
     - Each song is annotated by 10+ subjects; ground truth obtained by averaging
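The averaging step is straightforward; the rescaling of the 0-10 scale to [-1, 1] below is an assumption added for illustration (the slide does not state the target range):

```python
import numpy as np

def ground_truth(ratings):
    """Per-song ground truth: mean of the subjects' 11-level (0..10) ratings,
    rescaled here to [-1, 1] with 5 as the neutral midpoint (an assumption)."""
    return (np.mean(ratings) - 5.0) / 5.0

# Five hypothetical subjects rating one song's valence.
print(ground_truth([7, 8, 6, 9, 7]))  # 0.48, i.e. moderately positive
```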
  35. 35. Quantitative Result (R² of valence / R² of arousal)
     - Multiple linear regression: 0.109 / 0.568
     - Adaboost.RT [ijcnn04]: 0.117 / 0.553
     - SVR (support vector regression) [sc04]: 0.222 / 0.570
     - SVR + RReliefF (feature selection) [ml03]: 0.254 / 0.609
     - R²: squared correlation between y and f(x)
     - Valence prediction is challenging: typical results are 0.25-0.35 for valence vs. 0.60-0.85 for arousal
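The R² used in the table (squared correlation between ground truth and prediction) can be computed in a few lines; the y and f values below are made up:

```python
import numpy as np

def r_squared(y, f):
    """R^2 as the squared Pearson correlation between ground truth y
    and predictions f."""
    return np.corrcoef(y, f)[0, 1] ** 2

y = np.array([0.1, -0.3, 0.5, 0.2, -0.1])  # hypothetical ground truth
f = np.array([0.2, -0.2, 0.4, 0.1, 0.0])   # hypothetical predictions
print(r_squared(y, f))  # close to 1 since the predictions track y well
```

Note that this correlation-based R² differs from the coefficient-of-determination form 1 − SSE/SST, although the two coincide for least-squares linear fits.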
  36. 36. Qualitative Result
     - Example songs placed in the VA plane (figure): No No No Part 2 (Beyonce); Out Ta Get Me and Youre Crazy (Guns N Roses); All Of Me (50 Cent); Bodies (Sex Pistols); New York Giants (Big Pun); Ive Got To See You Again (Norah Jones); Mammas Dont Let Your Babies Grow Up To Be Cowboys and Why Do I Have To Choose (Willie Nelson); If Only In The Heavens Eyes (NSYNC); Live For The One I Love (Celine Dion); The Last Resort (The Eagles)
  37. 37. Missing 1: Temporal Context of Music
     - "Sweet Anticipation" by David Huron: music's most expressive qualities probably relate to structural changes across time
     - Music emotion can also vary within an excerpt [tsmc06]
  38. 38. Missing 2: Context of Music Listening
     - Listening mood/context; familiarity/associated memory; preference for the singer/performer/song; social relationship
  39. 39. Conclusion
     - A computational framework for predicting numerical emotion values: generalizes MER from categorical to dimensional
     - Resolves some issues of emotion description (rank instead of rate); enables a 2D user interface for music retrieval
     - Open issues: valence & subjectivity; content & context
     - Acknowledgement: Prof. Homer Chen, National Taiwan University
  40. 40. References
     - Music Emotion Recognition, CRC Press, 2011
     - "A regression approach to music emotion recognition," IEEE TASLP, 2008 (cited by 76)
     - "Ranking-based emotion recognition for music organization and retrieval," IEEE TASLP, 2011
     - "Prediction of the distribution of perceived music emotions using discrete samples," IEEE TASLP, 2011
     - "Exploiting online tags for music emotion classification," ACM TOMCCAP, 2011
     - "Machine recognition of music emotion: A review," ACM TIST, 2012