Exploring the Relationship Between Multi-Modal Emotion Semantics of Music


Computational modeling of music emotion has been addressed primarily by two approaches: the categorical approach, which categorizes emotions into mood classes, and the dimensional approach, which regards emotions as numerical values over a few dimensions such as valence and activation. Although they represent two extremes (discrete versus continuous), the two approaches share a unified goal of understanding the emotion semantics of music. This paper presents the first computational model that unifies the two semantic modalities under a probabilistic framework, which makes it possible to explore the relationship between them computationally. With the proposed framework, mood labels can be mapped into the emotion space in an unsupervised and content-based manner, without any ground truth annotations for training the semantic mapping. Such a function can be applied to automatically generate a semantically structured tag cloud in the emotion space. To demonstrate the effectiveness of the proposed framework, we qualitatively evaluate the mood tag clouds generated from two emotion-annotated corpora, and quantitatively evaluate the accuracy of the categorical-dimensional mapping by comparing the results with those created by psychologists, including the one proposed by Whissell & Plutchik and the one defined in the Affective Norms for English Words (ANEW).

Published in: Technology, Education

  1. Exploring the Relationship Between Multi-Modal Emotion Semantics of Music • Ju-Chiang Wang, Yi-Hsuan Yang, Kaichun Chang, Hsin-Min Wang, and Shyh-Kang Jeng • Academia Sinica, National Taiwan University, Taipei, Taiwan
  2. Outline • Introduction and Potentiality • Methodology – The ATB and AEG models – Framework to combine the two models • Evaluation and Result • Conclusion • In this presentation, the terms "mood" and "emotion" are used interchangeably
  3. Introduction – Tag and Valence-Arousal (VA) • Music emotion modeling has two approaches: categorical and dimensional • Both share a unified goal of understanding the emotion semantics of music • (Arbitrary) mood tags can be mapped into the VA space in an unsupervised and content-based manner, without any training ground truth for the semantic mapping • This makes it possible to automatically generate a semantically structured tag cloud in the VA space • [Figure: the VA plane divided into quadrants 1 to 4, with an arousal axis (high/low) and a valence axis (positive/negative)]
  4. Visualization of Music Mood (Laurier et al.) • Generated by a self-organizing map (SOM)
  5. Potentiality (Clarifying the Debate) • A novice user may be unfamiliar with the VA model, so it would be helpful to display mood tags in the VA space • This facilitates applications such as tag-based music search and browsing interfaces • Dimension reduction for tag visualization may yield dimensions that do not conform to valence and arousal • The VA values of some affective terms are available, but they were not elicited from music • Affective terms are not cross-lingual and do not always have exact translations in other languages • They are culture-dependent and corpus-dependent
  6. Taxonomy of Music Mood (Xiao Hu et al.), shown with Chinese glosses • Aggressive: 侵略的;好鬥 • Amiable: 和藹可親的;厚道的 • Autumnal: 秋的;像秋天的 • Bittersweet: 苦樂參半的 • Boisterous: 喧鬧的;狂暴的 • Brooding: 徘徊不去的;沈思的 • Calm: 冷靜;鎮定 • Campy: 裝模作樣 • Cheerful: 興高采烈的;情緒好的 • Confident: 有信心的,自負的 • Dreamy: 夢幻般的;愛作白日夢的 • Fiery: (感情)激烈的,熱烈的 • Fun: 有趣的 • Humorous: 幽默的;滑稽的 • Intense: 強烈的;熱情的 • Literate: 有文化修養的 • Nostalgic: 鄉愁的 • Passionate: 熱情的;熱烈的;易怒的 • Poignant: 深刻的;辛酸的 • Quirky: 詭詐的;多變的;古怪的 • Relaxed: 鬆懈的;放鬆的 • Rollicking: 嬉耍的;愉快的 • Rousing: 使覺醒的;使奮起的 • Rowdy: 粗暴的;喧鬧的 • Silly: 愚蠢的;糊塗的;無聊的 • Soothing: 慰藉的;使人寬心的 • Sweet: 甜的;悅耳的 • Tense: 緊張的;引起緊張的 • Visceral: 出自內心深處的 • Volatile: 易發作的;輕浮的;飛逝的 • Whimsical: 想入非非的,怪誕的,古怪的 • Wistful: 渴望的;想往的;留戀的 • Witty: 機智的;說話風趣的 • Wry: 歪斜的;曲解的;堅持錯誤的
  7. Potentiality (Clarifying the Debate) • Machine learning is necessary for such a task
  8. Methodology of the Framework • A probabilistic framework with two component models, Acoustic Tag Bernoullis (ATB) and Acoustic Emotion Gaussians (AEG) – They computationally model the generative processes from acoustic features to a mood tag and to a VA value, respectively • Because they are built on the same acoustic feature space, the ATB and AEG models can pass semantic information to each other • Bridged by the acoustic feature space, one emotion modality can be aligned to the other • The first attempt to establish a joint model for exploring the relationship between discrete mood categories and the continuous emotion space
  9. Construct Feature Reference Model • Frame-based features are extracted from the audio signals of the music tracks in a universal music database • A global set of frame vectors is randomly selected from each track • EM training on this global set yields the acoustic GMM, a global GMM with components A1, A2, ..., AK used for acoustic feature encoding
  10. Represent a Song in a Probabilistic Space • The feature vectors of a song are converted into posterior probabilities over the K components (A1, ..., AK) of the acoustic GMM • The result is a K-bin histogram of acoustic GMM posterior probabilities • Each dimension corresponds to a specific acoustic pattern
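The song representation on slide 10 can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes a diagonal-covariance GMM that has already been trained, and it assumes the clip-level vector is the plain average of the frame posteriors. The function names and the toy numbers are hypothetical.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

def frame_posterior(x, weights, means, variances):
    """Posterior probability of each GMM component given one frame vector."""
    logs = [math.log(w) + log_gauss_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)  # subtract the max before exponentiating for stability
    exps = [math.exp(value - top) for value in logs]
    total = sum(exps)
    return [e / total for e in exps]

def clip_posterior(frames, weights, means, variances):
    """Average the frame posteriors into one K-dimensional clip vector
    (assumed aggregation rule)."""
    acc = [0.0] * len(weights)
    for x in frames:
        for j, p in enumerate(frame_posterior(x, weights, means, variances)):
            acc[j] += p
    return [a / len(frames) for a in acc]

# Toy 2-component GMM over 2-D frames (hypothetical values, not from the paper).
weights = [0.5, 0.5]
means = [[0.0, 0.0], [5.0, 5.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
frames = [[0.1, -0.2], [0.3, 0.1], [4.9, 5.2]]

theta = clip_posterior(frames, weights, means, variances)
print(theta)
```

Two of the three toy frames sit near the first component, so the clip vector puts roughly two thirds of its mass on that component. In the slides, each entry of this vector is read as the strength of one acoustic pattern in the clip.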
  11. Acoustic Tag Bernoullis (ATB) • Given a mood-tagged music dataset with binary labels for a mood tag • Learn an ATB model that describes the generative process of each song in the dataset from acoustic features to the mood tag • Won (by the AUC-clip measure) the mood tag classification task at MIREX 2009 and 2010
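The slides do not spell out the ATB equations, so the following is only a sketch under a stated assumption: each acoustic GMM component k carries one Bernoulli parameter b_k for the tag, a song's tag probability is the sum of b_k weighted by the song's clip posterior, and the b_k are fitted by EM. The function names `train_atb` and `tag_probability` and the toy data are hypothetical.

```python
def train_atb(thetas, labels, n_iter=50):
    """Fit one Bernoulli parameter per acoustic component with EM.

    thetas: per-song posterior vectors over the K acoustic components.
    labels: per-song binary labels (1 if the song carries the tag).
    Assumed model: P(tag = 1 | song) = sum_k theta_k * b_k.
    """
    k = len(thetas[0])
    b = [0.5] * k
    for _ in range(n_iter):
        num = [0.0] * k
        den = [0.0] * k
        for theta, y in zip(thetas, labels):
            # E-step: responsibility of each component for this song's label.
            lik = [t * (bk if y == 1 else 1.0 - bk) for t, bk in zip(theta, b)]
            total = sum(lik)
            if total == 0.0:
                continue  # label impossible under current parameters; skip song
            for j in range(k):
                r = lik[j] / total
                num[j] += r * y
                den[j] += r
        # M-step: responsibility-weighted fraction of positive labels.
        b = [n / d if d > 0.0 else bk for n, d, bk in zip(num, den, b)]
    return b

def tag_probability(theta, b):
    """Probability that a song with posterior vector theta carries the tag."""
    return sum(t * bk for t, bk in zip(theta, b))

# Toy data: component 0 dominates the tagged songs (hypothetical values).
thetas = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
labels = [1, 1, 0, 0]
b = train_atb(thetas, labels)
print(b, tag_probability([0.9, 0.1], b))
```

On this toy set the first component ends up with a high Bernoulli parameter and the second with a low one, so a new song dominated by the first acoustic pattern receives a high tag probability.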
  12. Acoustic Emotion Gaussians (AEG) • Given a VA-annotated music dataset • Learn an AEG model that describes the generative process of each song in the dataset from acoustic features to the VA space • Presented in OS2; it outperforms its rivals, SVR and MLR
  13. Learning the VA GMM on MER60
  14. Multi-Modal Emotion Semantic Mapping • Three models are aligned: ATB, the acoustic GMM, and AEG • The weights are transferred from a mood tag to the VA GMM • The semantic mapping process is transparent and easy to observe and interpret • A tag is mapped into a VA Gaussian distribution
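One way to picture the mapping on slide 14 is sketched below. It rests on assumptions the slides do not confirm: the tag's per-component Bernoulli parameters are normalized into mixture weights, each component has a 2-D VA Gaussian, and the weighted mixture is collapsed into a single Gaussian by moment matching. The function name `tag_to_va_gaussian` and all numbers are hypothetical.

```python
def tag_to_va_gaussian(b, va_means, va_covs):
    """Collapse a tag-weighted VA mixture into one 2-D Gaussian.

    b:        per-component tag parameters (assumed to act as raw weights).
    va_means: per-component VA means, each [valence, arousal].
    va_covs:  per-component 2x2 VA covariance matrices.
    Returns (mean, covariance) obtained by moment matching.
    """
    total = sum(b)
    w = [bk / total for bk in b]  # normalize so the weights sum to 1
    mu = [sum(wk * m[d] for wk, m in zip(w, va_means)) for d in range(2)]
    cov = [[0.0, 0.0], [0.0, 0.0]]
    for wk, m, c in zip(w, va_means, va_covs):
        diff = [m[0] - mu[0], m[1] - mu[1]]
        for i in range(2):
            for j in range(2):
                # within-component spread plus spread between component means
                cov[i][j] += wk * (c[i][j] + diff[i] * diff[j])
    return mu, cov

# Toy setup: two components in opposite VA quadrants (hypothetical values).
b = [0.9, 0.1]
va_means = [[0.5, 0.5], [-0.5, -0.5]]
va_covs = [[[0.04, 0.0], [0.0, 0.04]], [[0.04, 0.0], [0.0, 0.04]]]

mu, cov = tag_to_va_gaussian(b, va_means, va_covs)
print(mu, cov)
```

Because every step is a weighted sum, it is easy to trace which acoustic components pull a tag toward a given region of the VA plane, which matches the transparency claim on the slide.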
  15. Evaluation – Corpora and Settings • Two corpora are used: MER60 and AMG1644 • MER60: a jointly annotated corpus (MER60-alone setting) – 60 music clips, each 30 seconds long – 99 subjects in total, with each clip annotated by 40 subjects – The VA values are entered by clicking on the emotion space on a computer display – Last.fm is queried and the top 50 mood tags are kept for the 60 songs • AMG1644: used for the separately annotated corpora (AMG1644-MER60 setting) – The audio of the “top songs” for 33 mood tags (AMG) is crawled; most of the tags are used in the MIREX mood classification task – This leads to 1,644 clips, each about 30 seconds long
  16. Acoustic Features • Adopt the bag-of-frames representation • Extract frame-based musical features from audio using MIRToolbox 1.3 • All the frames of a clip are aggregated into the acoustic GMM posterior, so emotion is analyzed at the clip level instead of the frame level • Frame-based features – Dynamic, spectral, timbre, and tonal – A 70-dimensional concatenated feature vector per frame
  17. Result for the MER60-Alone Setting • Graphviz is used for visualization, with a Voronoi diagram-based heuristic to avoid tag overlap
  18. Result for the AMG-MER Setting • Graphviz is used for visualization, with a Voronoi diagram-based heuristic to avoid tag overlap
  19. Comparison with Psychologists • Quantitative comparison – Refer to the VA values of 30 affective terms proposed by Whissell and Plutchik (WP) and by the Affective Norms for English Words (ANEW) – For each tag, measure the Euclidean distance between the generated VA value and the psychologists’ value • Baseline – Set the generated VA value of every tag to the origin – This represents a non-effective tag-VA mapping
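The comparison on slide 19 amounts to a mean Euclidean distance over the tags shared by the generated mapping and a reference list, set against a baseline that places every tag at the origin. A minimal sketch follows; the VA coordinates are made up for illustration and are not taken from WP, ANEW, or the paper's results, and the function name is hypothetical.

```python
import math

def mean_euclidean_distance(generated, reference):
    """Average VA distance over the tags present in both mappings."""
    shared = sorted(set(generated) & set(reference))
    dists = [
        math.hypot(generated[t][0] - reference[t][0],
                   generated[t][1] - reference[t][1])
        for t in shared
    ]
    return sum(dists) / len(dists)

# Illustrative (valence, arousal) pairs only; NOT real WP or ANEW values.
generated = {"happy": (0.6, 0.3), "sad": (-0.5, -0.2)}
reference = {"happy": (0.6, 0.7), "sad": (-0.8, -0.6)}

# Baseline: every tag sits at the origin, i.e. a non-effective mapping.
baseline = {tag: (0.0, 0.0) for tag in generated}

model_err = mean_euclidean_distance(generated, reference)
baseline_err = mean_euclidean_distance(baseline, reference)
print(model_err, baseline_err)
```

A mapping is informative only if its mean distance falls below the origin baseline, which is the case for these toy numbers.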
  20. Discussion • The result is not sensitive to K, the number of acoustic GMM components • Such a learning-based framework is scalable and can do better if more annotated data are available • Automatic discovery – For instance, construct a balanced audio music corpus and have Chinese speakers label it with Chinese mood tags – Then generate a Chinese mood tag cloud • Inverse correlation between the VA intensity and the covariance of a tag – Tags lying on the outer circle would have larger font sizes
  21. Result for the MER60-Alone Setting
  22. Conclusion • A novel framework that unifies the categorical and dimensional emotion semantics of music • We demonstrated how to map a mood tag to a 2-D VA Gaussian and generate the corresponding tag cloud; this can be further extended to arbitrary tags • The framework can verify whether an arbitrary tag is mood-related or not • Future work: user studies of the results • Future work: further investigation of acoustic feature representations for better generalization of the emotion modeling
  23. Arbitrary Tag - MajorMiner • Not mood-related
  24. Arbitrary Tag - MajorMiner • Mood-related