2. Outline
• Multimodal Music Mood Classification
– Research questions
– Methodology
– Findings and contributions
• Future Research
3. Music Mood Classification
Exercise: What do you feel about …
"Here comes the sun, here comes the sun,
and I say it's all right
Little darling, it's been a long cold lonely winter
Little darling, it feels like years since it's been here
Here comes the sun, here comes the sun, …"
How do people categorize music mood?
How well can a computer do it?
5. State-of-the-Art
• Mood categories directly adopted from music
psychological models
– Lack the social context of music listening (Juslin & Laukka, 2004)
– Can social tags help?
• Evaluation datasets are small
– Low consistency across assessors (Skowronek et al., 2006; Hu et al., 2008)
• Suboptimal performance of automatic music mood
classification systems
– Mostly audio-based
– Can lyrics help?
6. Research Questions
• Q1: Can social tags help develop mood taxonomy?
• Q2: Which lyric features are the most useful for music
mood classification?
• Q3: Are lyrics better than audio in music mood
classification?
• Q4: Can combining lyrics and audio improve the
effectiveness of mood classification?
• Q5: Can combining lyrics and audio improve the efficiency
of mood classification?
– Number of training examples
– Length of audio data
Q2-Q5: Improving classification performance
by combining lyrics and audio
7. Q1: Mood Categories
• New topic in information science
• Influential models in music psychology
– Categorical: Hevner (1936)
– Dimensional: Russell (1980), often used in previous
research on music mood classification
12. Distances between Categories
• Calculated from song co-occurrences
– Categories associated with the same songs are
similar
• Plotted in 2-D space using Multidimensional
Scaling
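The co-occurrence computation can be sketched as follows. The slide does not give the exact distance formula, so the 1 − Jaccard overlap used here, and the tiny song sets, are illustrative assumptions:

```python
def cooccurrence_distance(songs_a, songs_b):
    """Distance between two mood categories, defined here (as an
    assumption) as 1 - Jaccard overlap of the song sets tagged with
    each category. Categories sharing many songs come out close."""
    shared = len(songs_a & songs_b)
    union = len(songs_a | songs_b)
    return 1.0 - shared / union if union else 1.0

# Hypothetical tag-song associations:
happy = {"s1", "s2", "s3", "s4"}
cheerful = {"s2", "s3", "s4", "s5"}
gloomy = {"s6", "s7"}

# "happy" and "cheerful" share songs, so they are nearer each other
# than either is to "gloomy".
assert cooccurrence_distance(happy, cheerful) < cooccurrence_distance(happy, gloomy)
```

Multidimensional Scaling would then embed such a pairwise distance matrix into the 2-D plot shown on the slide.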
14. Research Questions
• Q1: Can social tags help identify mood categories that
are more realistic?
• Q2: Which lyric features are the most useful for music
mood classification?
• Q3: Are lyrics better than audio in music mood
classification?
• Q4: Can combining lyrics and audio improve the
effectiveness of mood classification?
• Q5: Can combining lyrics and audio improve the
efficiency of mood classification?
– Number of training examples
– Length of audio data
16. Multi-modal Framework
[Diagram: social tags applied to MUSIC yield the mood categories and the ground truth; the audio and lyrics of the same music feed the automatic classification]
Q2-Q5: Improving classification performance by
combining lyrics and audio
18. Ground Truth Dataset
• Built from social tags
• Has audio, lyrics and social tags
• 5,296 unique songs
• 18 mood categories
• Equal positive and negative examples
• 12,980 examples
[Chart: number of positive examples in each category]
19. Baseline System
(audio-based)
• The AMC tasks in MIREX
– MIREX: Music Information Retrieval Evaluation eXchange
– AMC: Audio Mood Classification
• A leading system in AMC 2007 and 2008: Marsyas
– Music Analysis, Retrieval and Synthesis for Audio Signals; led by
Prof. Tzanetakis at the University of Victoria
– Uses audio spectral features
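The slide names only "audio spectral features" without listing them. As one hedged illustration of what such a feature looks like, here is a toy spectral-centroid computation on a synthetic tone (real systems use FFTs and many more features; this naive DFT is for readability only):

```python
import math

def spectral_centroid(frame, sample_rate):
    """Magnitude-weighted mean frequency of one audio frame, a common
    spectral feature. Computed with a naive O(n^2) DFT for clarity."""
    n = len(frame)
    mags, freqs = [], []
    for k in range(1, n // 2):  # skip DC and Nyquist bins
        re = sum(frame[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = -sum(frame[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        mags.append(math.hypot(re, im))
        freqs.append(k * sample_rate / n)
    total = sum(mags)
    return sum(f * m for f, m in zip(freqs, mags)) / total if total else 0.0

# A pure 440 Hz sine frame: its centroid should sit at 440 Hz.
sr, n = 8000, 400
tone = [math.sin(2 * math.pi * 440 * t / sr) for t in range(n)]
```

Brighter, noisier timbres push the centroid up; mellow ones pull it down, which is why such features carry mood information.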
20. Lyric-based System
• Very little existing work
– Used only basic text features:
bag-of-words, part-of-speech
– Worse than audio-based approaches
• This research extracted and compared a range of novel
lyric features
21. Best Lyric Features
• Basic features:
– Content words, part-of-speech, function words
• Psycholinguistic features:
– Psychological categories in GI (General Inquirer)
– Scores in ANEW (Affective Norms for English Words)
• Stylistic features:
– Punctuation marks; interjection words
– Statistics: e.g., how many words per minute
• Combinations: 255 of them!
The most comprehensive study of lyric features for
mood classification to date.
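A minimal sketch of how the three feature families above combine; the function-word and interjection lists are tiny illustrative stand-ins (the study used full lexicons, and GI/ANEW lookups would slot in the same way, mapping each word to psycholinguistic categories or scores):

```python
import re
from collections import Counter

FUNCTION_WORDS = {"the", "a", "and", "it", "is", "i", "you"}  # tiny illustrative list
INTERJECTIONS = {"oh", "hey", "yeah", "ah"}                   # hypothetical subset

def lyric_features(lyrics, duration_minutes):
    """Combine basic content words, stylistic counts (interjections,
    punctuation), and a text statistic (words per minute)."""
    words = re.findall(r"[a-z']+", lyrics.lower())
    features = dict(Counter(w for w in words if w not in FUNCTION_WORDS))
    features["_interjections"] = sum(w in INTERJECTIONS for w in words)
    features["_exclamations"] = lyrics.count("!")
    features["_words_per_minute"] = len(words) / duration_minutes
    return features

f = lyric_features("Oh! Here comes the sun, and I say it's all right", 0.5)
```

Each feature set, and every combination of sets, then becomes one candidate representation, which is how 255 combinations arise from 8 feature types (2^8 − 1).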
27. Research Questions
• Q1: Can social tags help identify mood categories that
are more realistic?
• Q2: Which lyric features are the most useful for music
mood classification?
• Q3: Are lyrics better than audio in music mood
classification?
• Q4: Can combining lyrics and audio improve the
effectiveness of mood classification?
• Q5: Can combining lyrics and audio improve the
efficiency of mood classification?
– Number of training examples
– Length of audio data
28. Combine Lyrics and Audio
• Two hybrid methods:
– Late fusion: the lyric classifier and the audio classifier each output a prediction, and the two predictions are merged into the final prediction
– Feature concatenation: lyric and audio features are joined into a single vector, and one classifier makes the prediction
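The two hybrid methods can be sketched as follows; the weighted-average merging rule in the late-fusion sketch is an assumption, as the slide does not specify how predictions are combined:

```python
def late_fusion(p_lyric, p_audio, alpha=0.5):
    """Hybrid 1 (late fusion): each modality has its own trained
    classifier; their probability estimates for a mood category are
    merged, here by a weighted average (an assumed rule)."""
    return alpha * p_lyric + (1 - alpha) * p_audio

def feature_concatenation(lyric_vec, audio_vec):
    """Hybrid 2 (feature concatenation): lyric and audio feature
    vectors are joined, and a single classifier is trained on the
    combined vector."""
    return lyric_vec + audio_vec  # list concatenation

# A lyric classifier fairly sure a song is "happy" (0.9) and an
# unsure audio classifier (0.4) fuse to 0.65 with equal weights.
assert late_fusion(0.9, 0.4) == 0.65
```

Late fusion keeps the two feature spaces separate, while concatenation lets one learner exploit cross-modal interactions; both are standard hybrid designs.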
33. Research Questions
• Q1: Can social tags help identify mood categories that
are more realistic?
• Q2: Which lyric features are the most useful for music
mood classification?
• Q3: Are lyrics better than audio in music mood
classification?
• Q4: Can combining lyrics and audio improve the
effectiveness of mood classification?
• Q5: Can combining lyrics and audio improve the
efficiency of mood classification?
– Number of training examples
– Length of audio data
34. Automatic Classification
(supervised learning)
[Diagram: training examples for "Happy" ("Here comes the sun": Y; "I will be back": N; "Down with the sickness": N; …) train a classifier for "Happy", which then labels new examples (Song A: Y; Song B: N; …)]
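The supervised setup on the slide can be sketched with a toy binary classifier. The actual systems used learned models such as SVMs; this nearest-centroid rule and the 2-D feature vectors are only the simplest runnable stand-in:

```python
def train_centroid(positive, negative):
    """Toy 'Happy' classifier: store the mean feature vector of the
    positive and of the negative training songs."""
    mean = lambda vecs: [sum(col) / len(vecs) for col in zip(*vecs)]
    return mean(positive), mean(negative)

def predict(model, song_vec):
    """Label a new song Y/N by whichever centroid is closer."""
    centroid_pos, centroid_neg = model
    dist = lambda c: sum((a - b) ** 2 for a, b in zip(song_vec, c))
    return "Y" if dist(centroid_pos) < dist(centroid_neg) else "N"

# Hypothetical 2-D feature vectors for the training songs:
model = train_centroid(positive=[[0.9, 0.8], [0.8, 0.7]],
                       negative=[[0.1, 0.2], [0.2, 0.1]])
```

With 18 mood categories and equal positive/negative examples per category, the study trains one such binary classifier per category.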
36. Conclusions
• Q1: Can social tags help identify mood categories
that are more realistic?
• Q2: The most useful lyric features are: the combination of
content words, linguistic features and text stylistic features
• Q3: Are lyrics better than audio in music
mood classification?
• Q4: Can combining lyrics and audio improve
the effectiveness of mood classification?
• Q5: Can combining lyrics and audio improve
the efficiency of mood classification?
38. Contributions
Methodology
• Mood categories identified from social tags complement psychological
models
• Established an example of using empirical data to refine/adapt
theoretical models
• Improved lyric affect analysis and multi-modal mood classification
Evaluation
• Proposed an efficient method for building ground truth datasets
• Largest dataset with ternary information sources to date made
available to MIR community via MIREX 2009
http://www.music-ir.org/mirex/2009/index.php/Audio_Tag_Classification
Application
• Provided practical reference for MIR systems
• Moodydb.com
46. Affect Analysis for Information Studies
• Affect is an important factor in information behavior and
information access
• NLP techniques have been applied to attitude, sentiment
and opinion analysis
• I am interested in its applications to human cognition and
learning
• English and Chinese; Text and Music
• Paper accepted to ISMIR:
“Exploring the Relationship Between Mood and Creativity
in Rock Lyrics”
50. References
• Hu, X. and Downie, J. S. (2010) When Lyrics Outperform Audio for Music Mood
Classification: A Feature Analysis, In Proceedings of the 10th International Conference on
Music Information Retrieval (ISMIR), Aug. 2010, Utrecht, Netherlands.
• Hu, X. and Downie, J. S. (2010) Improving Mood Classification in Music Digital Libraries
by Combining Lyrics and Audio, In Proceedings of the Joint Conference on Digital
Libraries’2010, (JCDL), June 2010, Surfers Paradise, Australia. (Best Student Paper
Award).
• Hu, X. (2010) Music and Mood: Where Theory and Reality Meet, In the Proceedings of the
5th iConference, University of Illinois at Urbana-Champaign, Feb. 2010, Champaign, IL
(Best Student Paper Award).
• Hu, X., Downie, J. S. and Ehmann, A. (2009) Lyric Text Mining in Music Mood
Classification, ISMIR’ 09.
• Hu, X. (2009) Combining Text and Audio for Music Mood Classification in Music Digital
Libraries, IEEE Bulletin of Technical Committee on Digital Libraries (TCDL), 5(3)
• Hu, X. (2010) Multi-modal Music Mood Classification, presented in the Jean Tague-
Sutcliffe Doctoral Research Poster session at the ALISE Annual Conference, Jan. 2010,
Boston, MA. (3rd Place Award).
• Hu, X. (2009) Categorizing Music Mood in Social Context, In Proceedings of the Annual
Meeting of ASIS&T (CD-ROM), Nov. 2009, Vancouver, Canada.
51. References (2)
• Hu, X., Downie, J. S., Laurier, C., Bay, M. and Ehmann, A. (2008). The 2007
MIREX Audio Mood Classification task: lessons learned, In Proceedings of the
9th International Conference on Music Information Retrieval (ISMIR’08). Sept.
2008, Philadelphia, USA.
• Juslin, P. N. and Laukka, P. (2004). Expression, perception, and induction of
musical emotions: a review and a questionnaire study of everyday listening.
Journal of New Music Research, 33(3): 217-238.
• Juslin, P. N. and Sloboda, J. A. (2001). Music and emotion: introduction. In P. N.
Juslin and J. A. Sloboda (Eds.), Music and Emotion: Theory and Research. New
York: Oxford University Press.
• Skowronek, J., McKinney, M. F. and van de Par, S. (2006). Ground truth for
automatic music mood classification. In Proceedings of the 7th International
Conference on Music Information Retrieval (ISMIR’06), Oct. 2006, Victoria,
Canada.
Editor's Notes
The valence and arousal dimensions divide the space into 4 quadrants; this is why previous studies often used 4 mood categories. The dimensional model has been criticized for lacking the social context of music listening.
Social tags can help since they are input by real-life users.
Mood categories identified from social tags: in accordance with common sense; partially supported by classic psychological models; more comprehensive than psychological models; more closely connected with the reality of music listening.
The training data sizes vary from 10% to 100% of all available training samples.