The document discusses similarity measures used in music information retrieval systems. It defines music information retrieval as searching for music objects using musical queries. Some applications of MIR discussed are music search and recommendation. The document outlines different methods for calculating musical similarity, including text-based, audio feature-based, semantic concept-based, and multimodal fusion approaches. It concludes by noting future directions for similarity measures in MIR.
The document discusses music information retrieval (MIR) and content-based approaches. It describes how MIR deals with intrinsic characteristics of music like pitch, intensity, and timbre. Key concepts covered include music formats, dimensions of music language, and different types of music users. The document also summarizes Shazam's process for identifying songs from short audio clips, which involves fingerprinting audio files and matching fingerprints to identify songs.
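The fingerprint-and-match process described above can be sketched as a toy peak-pair hashing scheme, in the spirit of Shazam's published constellation approach. Everything here is illustrative: a real system extracts (time, frequency) peaks from a spectrogram and uses carefully tuned target zones and hash sizes, whereas this sketch assumes the peaks are already given.

```python
import hashlib
from collections import Counter, defaultdict

def hash_pair(f1, f2, dt):
    """Hash an anchor/target peak pair into a compact fingerprint key."""
    return hashlib.sha1(f"{f1}|{f2}|{dt}".encode()).hexdigest()[:10]

def fingerprints(peaks, fan_out=3):
    """Yield (hash, anchor_time) for each anchor peak paired with the
    next few peaks in its 'target zone'. peaks: list of (time, freq)."""
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1 : i + 1 + fan_out]:
            yield hash_pair(f1, f2, t2 - t1), t1

def build_index(songs):
    """songs: {song_id: peak list}. Returns hash -> [(song_id, time)]."""
    index = defaultdict(list)
    for song_id, peaks in songs.items():
        for h, t in fingerprints(peaks):
            index[h].append((song_id, t))
    return index

def identify(index, query_peaks):
    """Vote for (song_id, time_offset) pairs; a consistent offset
    across many matching hashes indicates the query is a clip of that song."""
    votes = Counter()
    for h, qt in fingerprints(query_peaks):
        for song_id, st in index.get(h, ()):
            votes[(song_id, st - qt)] += 1
    if not votes:
        return None
    (song_id, _offset), _count = votes.most_common(1)[0]
    return song_id
```

Because the hash encodes only frequencies and the time *difference* between peaks, a query clip taken from the middle of a song still matches: all its hashes line up at one consistent time offset.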
Music Information Retrieval (MIR) is an interdisciplinary field that retrieves information from music. MIR aims to make the world's music accessible by using techniques from computer science, information retrieval, audio processing, musicology and more. MIR applications include music document retrieval, recommender systems, emotion detection and more. Music document retrieval identifies music through metadata like Table of Contents or content-based fingerprinting like Shazam. Emotion detection in music aims to classify emotions in music but faces challenges due to subjective human emotion and requires multi-disciplinary techniques.
Query By Humming - Music Retrieval Technique (Shital Kat)
This seminar report summarizes query by humming technology. The basic architecture involves extracting melodic information from a hummed input, transcribing it, and comparing it to melodic contours in a database. Challenges include imperfect user queries and accurately capturing pitches from hums. Popular query by humming applications include Shazam, SoundHound, and Midomi. The report also discusses file formats like WAV and MIDI, and the Parsons code algorithm for representing melodies.
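The Parsons code mentioned in the report reduces a melody to its pitch contour, ignoring absolute pitch and rhythm, which is exactly what makes it forgiving of imperfect hummed queries. A direct implementation:

```python
def parsons_code(pitches):
    """Encode a pitch sequence as a Parsons contour string:
    '*' for the first note, then 'U' (up), 'D' (down), or
    'R' (repeat) for each successive interval."""
    if not pitches:
        return ""
    code = ["*"]
    for prev, cur in zip(pitches, pitches[1:]):
        if cur > prev:
            code.append("U")
        elif cur < prev:
            code.append("D")
        else:
            code.append("R")
    return "".join(code)
```

For example, the opening of "Twinkle, Twinkle, Little Star" (MIDI pitches 60, 60, 67, 67, 69, 69, 67) encodes as `*RURURD`, and any transposition of those pitches encodes identically.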
This radio trail targets adults interested in nature by using David Attenborough's formal and relaxed narration over animal sound effects, ambient sounds, and non-diegetic background music in a 37-second clip promoting a BBC Radio 4 documentary. The voiceover is louder than the other sound elements like music and archival footage.
Slides presented at a three-hour local AI music course in Taiwan in October 2021; part 1: a brief introduction to music information retrieval (plus analysis and generation).
Music Information Retrieval is about retrieving information from music entities.
The slides introduce the basic concepts of the language of music, pass through different kinds of music representations, and end by describing some low-level features used when dealing with music entities.
Real Time Drum Augmentation with Physical Modeling (Ben Eyes)
This document discusses augmenting acoustic drums with physical modeling to create new sounds and performances. It summarizes previous research that used convolution or spectral processing to digitally process drum sounds. The author then describes his own project that uses a physical model of strings as a VST plugin to process drum sounds from a snare drum and rototoms in real time. An interview with the percussionist discusses the collaborative composition process and how playing with the system required experimenting with extended techniques. The author concludes that future work will involve developing their own drum models and exploring new interfaces like facial recognition to control sound parameters.
Mehfil : Song Recommendation System Using Sentiment Detected (IRJET Journal)
This document describes a song recommendation system called Mehfil that uses sentiment analysis to recommend songs based on a user's detected mood. It has three main modules: sentiment analysis, using facial recognition and emotion detection on images via a deep learning model; music recommendation, which classifies songs by audio features and assigns mood labels; and integration, which uses the Spotify API to generate personalized playlists matching the detected sentiment. The system aims to make creating mood-based playlists easier by analyzing a user's facial expression in real time with their webcam to infer their mood and select an appropriate playlist of songs. It discusses the technologies used: Haar Cascade for face detection, MobileNetV2 for sentiment classification, and the Spotify API for music metadata and playlist generation.
The document presents a survey of music recognition search applications. While users appreciate the convenience of such apps, over half of respondents refused to help enlarge the underlying databases, and the top reason cited for not using an app was the required fees. This suggests challenges in building large databases and in monetizing without compromising the user experience.
KNN: a machine learning approach to recognize a musical instrument (IJARIIT)
An outline is provided of a proposed system to recognize musical instruments using machine learning techniques. The system first extracts features from audio files using the MIR toolbox in Matlab. It then uses a hybrid feature selection method and vector quantization to identify instruments. Specifically, the key audio descriptors are selected and feature vectors are generated and matched to standard vectors to classify the instrument. The k-nearest neighbors algorithm is used for classification. Preliminary results show the system can accurately recognize instruments based on extracted acoustic features.
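The k-nearest-neighbors classification step can be sketched independently of the feature extraction (done with the MIR toolbox in the paper). The toy two-dimensional feature vectors below are invented stand-ins for the extracted audio descriptors:

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """train: list of (feature_vector, label) pairs; query: a feature vector.
    Returns the majority label among the k nearest training vectors
    under Euclidean distance."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    nearest = sorted(train, key=lambda fv: dist(fv[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]
```

With real data the vectors would be the selected audio descriptors (e.g. spectral and timbral features), and k is tuned on a validation set.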
This document discusses music data mining. It provides an overview of music data mining tasks and approaches. Some key tasks discussed are music similarity search, clustering, and music sequence mining. Music similarity search involves finding music files similar to a given file based on features and similarity measures. Clustering divides music data into groups of similar objects based on pairwise similarity. Music sequence mining finds frequent patterns that appear in the ordered sequences of elements in music data instances.
Spotify provides personalized music recommendations to over 100 million active users based on their listening history and the listening history of similar users. It utilizes various recommendation approaches, including collaborative filtering using latent factor models to create lower-dimensional representations of users and songs. Spotify also uses natural language processing models on playlist data and deep learning on audio features to power recommendations. Personalizing music at Spotify's massive scale across 30 million tracks presents challenges around cold starts, repeated consumption, and measuring recommendation quality.
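The latent factor idea mentioned above can be sketched as stochastic-gradient-descent matrix factorization: learn a low-dimensional vector per user and per song so their dot product approximates observed preference. All numbers here are toy values; a production system like Spotify's uses far more elaborate variants (implicit feedback, regularization, biases).

```python
import random

def factorize(ratings, n_users, n_items, k=2, lr=0.05, epochs=500, seed=1):
    """Toy SGD matrix factorization. ratings: list of (user, item, rating).
    Learns k-dim user vectors U and item vectors V so that
    dot(U[u], V[i]) approximates each observed rating."""
    rng = random.Random(seed)
    U = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(a * b for a, b in zip(U[u], V[i]))
            for f in range(k):
                uf, vf = U[u][f], V[i][f]  # update both with the old values
                U[u][f] = uf + lr * err * vf
                V[i][f] = vf + lr * err * uf
    return U, V

def predict(U, V, u, i):
    return sum(a * b for a, b in zip(U[u], V[i]))
```

The learned item vectors double as a similarity space: songs liked by the same users end up with nearby vectors, which also hints at why cold-start items (no interactions yet) are hard for this approach.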
survey on Hybrid recommendation mechanism to get effective ranking results fo... (Suraj Ligade)
Users today have high expectations of technology: they want to search for songs even in situations where they cannot recall the title or other details. Retrieving music content is one of the hardest and most challenging tasks in the field of Music Information Retrieval (MIR). Various search techniques have been developed and implemented, but they no longer satisfy the queries users actually make, and related problems remain, such as automatic playlist creation, music recommendation, and music search. In earlier systems a user searched for a song by its title, artist name, or other related details, which is very time-consuming. To overcome this, singing or humming a portion of a song is the most natural way to search for it. This search method is most helpful when users have no access to an audio device or cannot recall attributes of the song such as its title, artist name, or album name. In the proposed system, users need not worry about remembering song information, and the method is not time-consuming. It uses information from a user's search history as well as common properties of users with similar backgrounds. A hybrid recommendation mechanism combines this with a content-based retrieval system built on audio information such as tone, pitch, and mood, yielding more accurate results for the user. Just as important, users can operate their devices without entering commands by hand, making this a simple and easy way to search for music.
J-P. Fauconnier, J. Roumier. Musonto - A Semantic Search Engine Dedicated to ... (MusicNet)
Jean-Philippe Fauconnier (Université Catholique de Louvain, Belgium) and Joseph Roumier (CETIC, Belgium).
Music Linked Data Workshop, 12 May 2011, JISC, London.
Music Recommendation System using Euclidean, Cosine Similarity, Correlation D... (IRJET Journal)
This document describes a music recommendation system that uses various distance and similarity algorithms like Euclidean distance, cosine similarity, and correlation distance. It integrates these recommendation models with a user-friendly web interface built using the Flask framework. The system is evaluated based on how accurately and relevantly it can recommend songs. In summary, the project created an effective music recommendation system that utilizes different similarity metrics and Flask for web integration.
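The three measures named above differ mainly in normalization: Euclidean distance is sensitive to magnitude, cosine similarity ignores it, and correlation distance additionally ignores a constant offset. A small pure-Python comparison (the vectors are made-up feature vectors, not from the paper):

```python
import math

def euclidean(a, b):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def correlation_distance(a, b):
    """1 - Pearson correlation: cosine similarity of mean-centered vectors."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return 1 - cosine_similarity([x - ma for x in a], [y - mb for y in b])
```

So a track and a uniformly "louder" version of it (all features scaled) are maximally similar under cosine; add a constant offset as well and they are still identical under correlation, while Euclidean distance reports both as far apart.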
Internet radio provides new opportunities for niche audiences and customized playlists but also new challenges for radio networks and record companies. It reduces the power of radio networks as gatekeepers and enables low-cost, tailored streaming for very small music tastes. However, record companies want to maintain control and royalty streams and limit streaming functionality to mimic traditional radio. New forms of interaction between listeners, artists and stations are also emerging using tools like Twitter.
This document discusses the history and evolution of music discovery from phonographs to modern technologies like smartphones and MP3 players. It outlines several recommendation technologies used for music discovery, including collaborative filtering based on user behavior, annotations from users, and content analysis. The document also discusses issues with recommendation systems like relevance, variety, scalability, and privacy. It provides an overview of resources that can be used to build music recommendation systems, including MusicBrainz, Last.fm, and Echo Nest for content analysis and recommendations.
This document provides an overview of a dissertation on Emofy, a classical music recommender system. The summary includes:
- Emofy is a music recommender system that recommends classical Indian music based on the user's mood by classifying moods and associating different ragas and genres with different moods.
- The dissertation discusses collecting and labeling a dataset of classical music, extracting features to classify mood, and using machine learning algorithms like random forests to achieve over 90% accuracy in mood classification.
- The recommender system uses mood classification to map users to appropriate ragas and playlists of classical music tracks on Spotify, aimed at therapeutic applications.
This document discusses multimodal music mood classification using audio, lyrics, and social tags. It outlines research questions around using social tags to develop mood taxonomies, determining the most useful lyric features for classification, and whether lyrics or audio perform better. The author finds that combining lyrics and audio improves classification effectiveness and efficiency. Contributions include establishing empirically derived mood categories and the largest dataset combining audio, lyrics and tags. Future work may explore multimedia and multimodal approaches.
Echoes, Whispers, and Footsteps from the Conflux of Sonic Interaction Design ... (Eoin Brazil)
This document discusses selecting and classifying sounds for interaction design and public spaces. It summarizes previous studies that explored mapping human activity to actions using sound, considering concurrent auditory presentations, and using metaphors from listener responses. The document recommends verifying sound mappings through ratings and comparisons, and visually depicting the human perceptual space. It also describes projects that sonified exhibitions and public spaces like train stations to enhance experiences.
This document summarizes a talk on integrating listening into music collection interfaces. It discusses how current commercial interfaces rely mainly on text searches and recommendations. It argues that listening should be integrated as it allows for faster and more effective navigation of music collections. The document then reviews several academic projects that have incorporated listening into interfaces, including passive listening interfaces like Mused, 3D browsing interfaces like SoundTorch, and landscapes like Neptune and Sonixplorer that combine visualization and audio.
This document provides an introduction to audio content analysis (ACA). It discusses how ACA aims to automatically extract content information from audio signals to enable content-driven services. Example applications of ACA include speech recognition, music transcription, and noise pollution monitoring. The document also outlines related fields like music information retrieval and computational auditory scene analysis, and notes that ACA has historically progressed from mechanical devices to data-driven machine learning systems and now deep neural networks.
Colloque IMT -04/04/2019- L'IA au cœur des mutations industrielles - "Machine... (I MT)
IMT colloquium, 04/04/2019, "L'IA au cœur des mutations industrielles" (AI at the heart of industrial change): "Machine Listening: artificial intelligence for sound and music". Presented by Gaël Richard.
Applications of AI and NLP to advance Music Recommendations on Voice Assistants (AI Publications)
This paper builds on the popular use case of music requests on voice assistants like Siri, Google Assistant, Alexa, and others and explores the different AI and NLP techniques. The paper particularly focuses on how each of these techniques can be applied in the context of musical recommendations and experiences on voice assistants. It enumerates specific problems in the space of music recommendations and illustrates how specific techniques like multi-armed bandits can be applied.
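Multi-armed bandits, named above, balance exploring new recommendations against exploiting known favorites. A minimal epsilon-greedy sketch (the playlist names and reward values are invented, not from the paper):

```python
import random

class EpsilonGreedyBandit:
    """Track a running mean reward per arm (e.g. per playlist or genre);
    with probability epsilon pick a random arm to explore, otherwise
    exploit the arm with the best observed mean."""

    def __init__(self, arms, epsilon=0.1, seed=0):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {a: 0 for a in self.arms}
        self.values = {a: 0.0 for a in self.arms}

    def select(self):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        return max(self.arms, key=lambda a: self.values[a])

    def update(self, arm, reward):
        """Incrementally update the running mean reward for an arm."""
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
```

In a voice-assistant setting the "reward" might be a completed listen versus a skip; production systems typically use contextual bandits conditioned on user and request features rather than this global variant.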
Case-based Sequential Ordering of Songs for Playlist Recommendation (Uri Levanon)
A model of contiguous sequential patterns of songs, built to capture which songs sound good together in a playlist.
A research paper presented in September 2006 at the European Conference on Case-Based Reasoning (ECCBR '06).
By Claudio Baccigalupo and Enric Plaza
This document discusses analyzing music data extracted from Spotify using various data science techniques. It describes collecting audio feature data for Billboard top songs from 1958-2017 and Grammy winning albums from 1959-2018. Exploratory analysis was performed on song attributes like duration, tempo and loudness. Principal component analysis and k-means clustering were used to group songs based on similarities in audio features like energy, danceability and acousticness. The analysis revealed patterns in attributes of popular music over time that could aid music recommendation systems.
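The k-means grouping on audio features can be sketched in pure Python (the PCA step is omitted here, and the (energy, danceability) values below are invented, not Spotify data):

```python
def kmeans(points, k, iters=20):
    """Minimal k-means clustering over tuples of equal length.
    Centroids are seeded with evenly spaced input points for
    determinism; fine for a demo, not for production use."""
    step = max(1, len(points) // k)
    centroids = [points[j * step] for j in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # assign each point to its nearest centroid (squared distance)
        labels = [
            min(range(k),
                key=lambda j: sum((p - c) ** 2 for p, c in zip(pt, centroids[j])))
            for pt in points
        ]
        # move each centroid to the mean of its members
        for j in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels
```

On real Spotify feature data one would standardize the features first (they live on different scales) and typically run PCA before clustering, as the document describes.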
This document provides an overview of natural language processing (NLP). It defines NLP, discusses common NLP tasks such as part-of-speech tagging and machine translation, and explains why NLP is challenging due to various ambiguities in natural language. The document also briefly discusses related fields like linguistics, machine learning, and information retrieval, and concludes by noting that it only covers an introduction to NLP and does not discuss solutions or the current state of the field.
This document provides an overview of natural language processing (NLP). It begins with examples of NLP applications like translation and question answering. It then discusses the backgrounds in artificial intelligence, linguistics, and the web. The document outlines several common NLP tasks like part-of-speech tagging, named-entity recognition, word sense disambiguation, and parsing. It also discusses challenges like ambiguity in natural language. The document concludes with a discussion of why NLP is difficult due to ambiguity at both the linguistic and acoustic levels.
Similar to CSTalks - Music Information Retrieval - 23 Feb
CSTalks-Sensor-Rich Mobile Video Indexing and Search-17Augcstalks
The document describes a system called GeoVid for managing and searching geo-referenced video content. GeoVid uses sensors in mobile devices to automatically collect location, direction, and other contextual data while recording video. This sensor data is associated with video frames to model the viewable scene. GeoVid then indexes and stores this sensor-enriched video data to support spatiotemporal search and retrieval of relevant video segments by location, trajectory, or other metadata. The system aims to improve searchability of user-generated video through automated geo-tagging without requiring manual user input.
9/24/2011 28
Approach – Search (2)
• Index FOV models using R-tree
This document discusses visualizing software behavior through system traces. It describes how system traces can contain huge amounts of information about software execution but are difficult for humans to understand. The author introduces a tool called WinResMon that records Windows system events and traces. Various techniques for visualizing these traces are presented, including dot plots, histograms, and barcodes, which help identify patterns, anomalies, and differences between traces to better understand software behavior. Examples analyzing file copying, a software build, and idle machines are provided.
The document discusses polymorphic heterogeneous multi-core systems as a solution to limitations in instruction-level parallelism (ILP) and thread-level parallelism (TLP) approaches for improving single-core performance. It proposes an architecture with cores that can dynamically reconfigure their internal structure and collaborate to best match software requirements. The cores are connected to a reconfigurable fabric that implements custom instructions to further speed up programs. Experimental results show this approach achieves speedups and better load balancing compared to homogeneous multi-core systems. Future work is needed to study overhead and implement dynamic scheduling.
The document proposes a framework for recommendations based on analyzing relationships between users, items, tags, and ratings (quaternary relationships). It models these relationships using a 4-order tensor and applies Higher-Order Singular Value Decomposition (HOSVD) to reveal latent semantic associations. This allows generating recommendations for users, items, tags, and predicting ratings. Experimental results on a movie dataset show the proposed quaternary approach outperforms methods using only ternary relationships.
CSTalks - Object detection and tracking - 25th Maycstalks
Object detection is a fundamental step in most of the video analysis applications. There are many research challenges involved in automatic object detection, depending on different scenarios. The most prevalent application of object detection is in the field of multimedia surveillance. In this talk we will discuss the common problems in the object detection in a surveillance video. Further, we will discuss the Gaussian Mixture Model (GMM) based object detection method. While object detection is the basic step of video analysis, higher level semantic interpretation of the scene requires trajectory information. Most of the suspicious event detection methods use tracking as the basic building block. In the second part of the talk, we will discuss particle filter based method of object tracking. To summarize, the aim of the talk is two-fold: (1) Discuss common problems in object detection and tracking (2) Hands on experience of how to use classical methods of GMM and particle filtering in problem solving.
CSTalks - The Multicore Midlife Crisis - 30 Marcstalks
The document discusses the multicore midlife crisis as processors move to multiple cores to cope with Moore's Law. As core counts increase, the memory bandwidth does not scale accordingly, creating a memory wall problem. Solutions proposed include increasing cache sizes, improving memory speeds, and better caching techniques. Future multicore designs may focus more on heterogeneous cores tailored for different workloads rather than increasing core counts uniformly. Research challenges include coping with heterogeneity, improving data locality given slow memory speeds, and software techniques to help address issues like cache coherence.
This document summarizes machine learning concepts including supervised and unsupervised learning techniques. It discusses fundamental questions in machine learning like how to build systems that improve with experience. Key problems covered include classification, regression, and clustering. Challenges like overfitting, model complexity, and optimization techniques like gradient descent are also summarized. Open problems in machine learning like transferring learned knowledge between tasks and preserving privacy in data mining are mentioned.
CSTalks - Real movie recommendation - 9 Marcstalks
This document proposes a new approach to movie recommendation that considers temporal dynamics and local user ratings. The current best approach is collaborative filtering with temporal dynamics, but this new approach clusters users based on their individual monitoring and behavior over time. It also clusters movies based on their global and dynamic class ratings. The model would monitor users, user-user patterns, user-movie patterns, and movie-movie patterns over time to update recommendations and predictions. This is aimed to provide more accurate recommendations by considering how user preferences can change over time.
The document discusses career paths after obtaining a PhD, including academia, industry, and other options. It provides an outline of the typical academic career path, from postdoc to obtaining assistant professor positions. Interview tips are given for applying to assistant professor roles. Pursuing industry careers or entrepreneurship are also addressed. The conclusion emphasizes the importance of publishing, networking, and maintaining high quality research throughout the PhD and beyond.
This document discusses the past, present, and future of peer-to-peer (P2P) networks. It describes how P2P networks emerged to help alleviate load on servers and make use of the interconnectivity between nodes. Popular early P2P file sharing applications like Napster, Gnutella, and BitTorrent are discussed. While P2P file sharing declined due to legal issues, P2P protocols now see growing use for media streaming, communication, and within data centers. The document predicts P2P will continue evolving with technologies like IPv6 and play an increasing role in areas like social networks and peer production.
The document summarizes a presentation on Named Data Networking (NDN) given by Mostafa Rezazad. It discusses the motivation for NDN, which is to make data and services rather than locations the primary objects on the network. This allows for benefits like redundancy elimination, easier mobility, and more inherent security. An overview is provided of NDN's packet types, node structure, name structure, and routing approach.
The document summarizes a presentation on modeling and verifying timed concurrent systems. It introduces a simple coffee vending machine model as an example and specifies properties of it using temporal logic. Timed automata are discussed as a way to model systems with timing parameters. The talk will cover extending the coffee machine model to make it parametric and synthesizing parameters to satisfy given properties.
This document discusses research in GPU computing. It provides an introduction to GPU computing, including how GPUs were originally for graphics processing but are now used more broadly through frameworks like CUDA and OpenCL. It discusses advantages of GPUs like their large number of cores compared to CPUs. Open problems in the field are also outlined, such as developing new data structures and algorithms suitable for massive parallelism. The document suggests GPU computing will continue growing in importance as computing moves towards more highly multithreaded architectures.
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/
DLAU und die Lizenzen nach dem CCB- und CCX-Modell sind für viele in der HCL-Community seit letztem Jahr ein heißes Thema. Als Notes- oder Domino-Kunde haben Sie vielleicht mit unerwartet hohen Benutzerzahlen und Lizenzgebühren zu kämpfen. Sie fragen sich vielleicht, wie diese neue Art der Lizenzierung funktioniert und welchen Nutzen sie Ihnen bringt. Vor allem wollen Sie sicherlich Ihr Budget einhalten und Kosten sparen, wo immer möglich. Das verstehen wir und wir möchten Ihnen dabei helfen!
Wir erklären Ihnen, wie Sie häufige Konfigurationsprobleme lösen können, die dazu führen können, dass mehr Benutzer gezählt werden als nötig, und wie Sie überflüssige oder ungenutzte Konten identifizieren und entfernen können, um Geld zu sparen. Es gibt auch einige Ansätze, die zu unnötigen Ausgaben führen können, z. B. wenn ein Personendokument anstelle eines Mail-Ins für geteilte Mailboxen verwendet wird. Wir zeigen Ihnen solche Fälle und deren Lösungen. Und natürlich erklären wir Ihnen das neue Lizenzmodell.
Nehmen Sie an diesem Webinar teil, bei dem HCL-Ambassador Marc Thomas und Gastredner Franz Walder Ihnen diese neue Welt näherbringen. Es vermittelt Ihnen die Tools und das Know-how, um den Überblick zu bewahren. Sie werden in der Lage sein, Ihre Kosten durch eine optimierte Domino-Konfiguration zu reduzieren und auch in Zukunft gering zu halten.
Diese Themen werden behandelt
- Reduzierung der Lizenzkosten durch Auffinden und Beheben von Fehlkonfigurationen und überflüssigen Konten
- Wie funktionieren CCB- und CCX-Lizenzen wirklich?
- Verstehen des DLAU-Tools und wie man es am besten nutzt
- Tipps für häufige Problembereiche, wie z. B. Team-Postfächer, Funktions-/Testbenutzer usw.
- Praxisbeispiele und Best Practices zum sofortigen Umsetzen
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/building-and-scaling-ai-applications-with-the-nx-ai-manager-a-presentation-from-network-optix/
Robin van Emden, Senior Director of Data Science at Network Optix, presents the “Building and Scaling AI Applications with the Nx AI Manager,” tutorial at the May 2024 Embedded Vision Summit.
In this presentation, van Emden covers the basics of scaling edge AI solutions using the Nx tool kit. He emphasizes the process of developing AI models and deploying them globally. He also showcases the conversion of AI models and the creation of effective edge AI pipelines, with a focus on pre-processing, model conversion, selecting the appropriate inference engine for the target hardware and post-processing.
van Emden shows how Nx can simplify the developer’s life and facilitate a rapid transition from concept to production-ready applications.He provides valuable insights into developing scalable and efficient edge AI solutions, with a strong focus on practical implementation.
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfMalak Abu Hammad
Discover how MongoDB Atlas and vector search technology can revolutionize your application's search capabilities. This comprehensive presentation covers:
* What is Vector Search?
* Importance and benefits of vector search
* Practical use cases across various industries
* Step-by-step implementation guide
* Live demos with code snippets
* Enhancing LLM capabilities with vector search
* Best practices and optimization strategies
Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications.
#MongoDB #VectorSearch #AI #SemanticSearch #TechInnovation #DataScience #LLM #MachineLearning #SearchTechnology
Removing Uninteresting Bytes in Software FuzzingAftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing xml documents, and Binutil's readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format). Our preliminary results show that AFL+DIAR does not only discover new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
How to Get CNIC Information System with Paksim Ga.pptxdanishmna97
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc
How does your privacy program stack up against your peers? What challenges are privacy teams tackling and prioritizing in 2024?
In the fifth annual Global Privacy Benchmarks Survey, we asked over 1,800 global privacy professionals and business executives to share their perspectives on the current state of privacy inside and outside of their organizations. This year’s report focused on emerging areas of importance for privacy and compliance professionals, including considerations and implications of Artificial Intelligence (AI) technologies, building brand trust, and different approaches for achieving higher privacy competence scores.
See how organizational priorities and strategic approaches to data security and privacy are evolving around the globe.
This webinar will review:
- The top 10 privacy insights from the fifth annual Global Privacy Benchmarks Survey
- The top challenges for privacy leaders, practitioners, and organizations in 2024
- Key themes to consider in developing and maintaining your privacy program
Climate Impact of Software Testing at Nordic Testing DaysKari Kakkonen
My slides at Nordic Testing Days 6.6.2024
Climate impact / sustainability of software testing discussed on the talk. ICT and testing must carry their part of global responsibility to help with the climat warming. We can minimize the carbon footprint but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be added with sustainability, and then measured continuously. Test environments can be used less, and in smaller scale and on demand. Test techniques can be used in optimizing or minimizing number of tests. Test automation can be used to speed up testing.
UiPath Test Automation using UiPath Test Suite series, part 6DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 6. In this session, we will cover Test Automation with generative AI and Open AI.
UiPath Test Automation with generative AI and Open AI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI, a test automation solution, with Open AI advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers, and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and Open AI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Communications Mining Series - Zero to Hero - Session 1DianaGray10
This session provides introduction to UiPath Communication Mining, importance and platform overview. You will acquire a good understand of the phases in Communication Mining as we go over the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How can it help today’s business and the benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
HCL Notes and Domino License Cost Reduction in the World of DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-and-domino-license-cost-reduction-in-the-world-of-dlau/
The introduction of DLAU and the CCB & CCX licensing model caused quite a stir in the HCL community. As a Notes and Domino customer, you may have faced challenges with unexpected user counts and license costs. You probably have questions on how this new licensing approach works and how to benefit from it. Most importantly, you likely have budget constraints and want to save money where possible. Don’t worry, we can help with all of this!
We’ll show you how to fix common misconfigurations that cause higher-than-expected user counts, and how to identify accounts which you can deactivate to save money. There are also frequent patterns that can cause unnecessary cost, like using a person document instead of a mail-in for shared mailboxes. We’ll provide examples and solutions for those as well. And naturally we’ll explain the new licensing model.
Join HCL Ambassador Marc Thomas in this webinar with a special guest appearance from Franz Walder. It will give you the tools and know-how to stay on top of what is going on with Domino licensing. You will be able lower your cost through an optimized configuration and keep it low going forward.
These topics will be covered
- Reducing license cost by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to best utilize it
- Tips for common problem areas, like team mailboxes, functional/test users, etc
- Practical examples and best practices to implement right away
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
5. Definition
Music Information Retrieval
Is the process of searching for, and finding, music objects, or parts of objects, via a query framed musically and/or in musical terms.
Music objects: recordings (wav, mp3, etc.), scores, parts, etc.
Musically framed query: singing, humming, keyboard, notation-based, MIDI files, sound files, etc.
Music terms: genre, style, tempo, bibliography, etc.
Applications
Music search, recommendation, identification, etc.
7. Applications -- Search
Text-based Music Search
Compare a textual query with the metadata
Adopted by most existing systems
Examples: Last.fm, Musicovery, …
8. Applications -- Search
Content-based Music Search
Compare an audio query with audio content
Query-by-humming/singing/recording: midomi
9. Applications -- Search
Content-based Music Search
Compare a tapped rhythm with audio content
Query-by-tapping: SongTapper
10. Applications -- Search
[Diagram: the search pipeline]
Online: the user's information need (intention) is expressed as an explicit query (text, audio, etc.); the mismatch between intention and query is the Intention Gap. Query formation and descriptor extraction turn the query into descriptors.
Offline: music documents in the database go through descriptor extraction and indexing to produce index descriptors; the mismatch between content and descriptors is the Semantic Gap.
Similarity measure: match the query descriptors against the indexed document descriptors.
Ranking: order relevant documents by domain-specific criteria (e.g., number of hits), then present the ranked list as results.
12. Applications – Recommendation
Collaborative-Filtering-based Recommendation
Last.fm: what you (and others) listen to and like
Amazon: customers who shopped for … also shopped for …
13. Applications – Recommendation
Collaborative-Filtering-based Recommendation
Last.fm: what you (and others) listen to and like
Amazon: customers who shopped for … also shopped for …
Example:
Users: A, B, and C; music items: 1, 2, …, 8
[Diagram] A similarity measure is computed between users from their listening histories: C has small similarity to A, while B has large similarity to A. Songs liked by the similar user B that A has not yet heard are recommended to user A.
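The collaborative-filtering example above can be sketched in a few lines of Python: user similarity as the Jaccard index of listening histories, then recommending the unheard songs of the most similar user. The listening sets below are invented for illustration, since the exact sets in the slide's diagram are not recoverable.

```python
# Minimal user-based collaborative filtering sketch (hypothetical data).
def jaccard(s, t):
    """Jaccard similarity of two sets of listened-to song ids."""
    return len(s & t) / len(s | t)

listened = {
    "A": {1, 2, 3, 4},
    "B": {2, 3, 4, 5},
    "C": {1, 6, 8},
}

def recommend(target, histories):
    """Find the most similar other user and return their unheard songs."""
    others = {u: h for u, h in histories.items() if u != target}
    best = max(others, key=lambda u: jaccard(histories[target], others[u]))
    return best, sorted(others[best] - histories[target])

print(recommend("A", listened))  # → ('B', [5])
```

Here B overlaps with A on three of five songs (Jaccard 0.6) while C overlaps on one of six (≈0.17), so A is recommended song 5 from B's history.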
14. Applications -- Recommendation
Audio Content-based Recommendation
Recommend songs whose audio content is similar to the songs that you like.
Pandora:
[Diagram] Music experts annotate each song in the music database with ~400 attributes per song (instrument, vocal, structure, …). A similarity measure over these attributes matches what the user listens to against the database to produce recommendations.
15. Applications -- Recommendation
[Diagram: the recommendation pipeline]
Online/offline: the user's information need (intention) is captured as an implicit user profile (ratings, listening history, etc.); the mismatch between intention and profile is the Intention Gap.
Offline: music documents go through descriptor extraction and indexing to produce index descriptors (Semantic Gap).
Similarity measure: match user profiles against music documents and/or other user profiles.
Ranking: order relevant documents by domain-specific criteria (e.g., number of hits), then present the ranked list as results.
16. Similarity Measure
One of the most fundamental concepts in MIR.
Closely related to:
What information music contains.
How this information is represented.
How to match between representations.
[Diagram: the same retrieval pipeline as before: profile/query capture (Intention Gap), descriptor extraction and indexing (Semantic Gap), matching, ranking, and presentation.]
17. Music Information Plane
Similarity can be measured from different aspects.
Song1: New Favorite - Alison Krauss
Song2: She Is Beautiful - Andrew W.K.
Dissimilar aspects: female vs. male vocals, gentle vs. aggressive, slow vs. fast.
Similar aspects: both feature guitar; both have tempo ~162 BPM (beats per minute).
* O. C. Herrada. Music recommendation and discovery in the long tail. PhD thesis. 2008.
19. Similarity Measure Methods
Text-based Method: Okapi BM25 Ranking
Given: query Q containing keywords q1, …, qn; music documents treated as bags of words.
The BM25 ranking function can be formulated as:
score(D, Q) = Σ_{i=1..n} IDF(qi) · [ f(qi, D) · (k1 + 1) ] / [ f(qi, D) + k1 · (1 − b + b · |D| / avgdl) ]
f(qi, D) is qi's term frequency (tf) in document D.
|D| is the length of document D in words.
avgdl is the average document length in the collection.
k1 and b are free parameters, usually set as k1 = 2.0 and b = 0.75.
IDF(qi) is the inverse document frequency (idf), calculated as:
IDF(qi) = log [ (N − n(qi) + 0.5) / (n(qi) + 0.5) ]
where N is the number of documents in the collection and n(qi) is the number of documents containing qi.
A document scores high when the query term appears frequently in it (large f(qi, D)) and rarely in other documents (large IDF).
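The BM25 scoring above can be sketched directly in Python. The documents below are made-up metadata bags of words; the parameter values k1 = 2.0 and b = 0.75 follow the slide, and the +1 inside the log is a common smoothing variant (an assumption here) that keeps the idf non-negative.

```python
# Minimal BM25 ranking sketch over bag-of-words metadata documents.
import math
from collections import Counter

def bm25_scores(query_terms, documents, k1=2.0, b=0.75):
    """Return a BM25 score for each document (a list of words)."""
    N = len(documents)
    avgdl = sum(len(d) for d in documents) / N
    # n(qi): number of documents containing each query term
    df = {q: sum(1 for d in documents if q in d) for q in query_terms}
    scores = []
    for doc in documents:
        tf = Counter(doc)
        score = 0.0
        for q in query_terms:
            # +1 inside the log avoids negative idf for very common terms
            idf = math.log((N - df[q] + 0.5) / (df[q] + 0.5) + 1)
            f = tf[q]
            score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(doc) / avgdl))
        scores.append(score)
    return scores

docs = [
    "gentle bluegrass song female vocal guitar".split(),
    "aggressive rock song male vocal guitar".split(),
    "calm piano instrumental".split(),
]
print(bm25_scores("gentle guitar".split(), docs))
```

The first document matches both query terms and ranks highest; the third matches neither and scores zero, illustrating the "frequent here, rare elsewhere" intuition above.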
20. Similarity Measure Methods
Text-based Method:
Pros
Simple and efficient
Cons
Affected by noisy or wrong texts
Songs with no text cannot be retrieved
Requires high-level domain knowledge to create good metadata
This is "text retrieval on audio metadata", not pure music retrieval
24. Existing Works
Audio Feature-based Method
Pipeline: audio feature extraction => distribution modeling => model comparison
Use low-level features directly
Pitch, loudness, MFCC (Blum et al. [3], 1999)
Histogram of MFCC (Foote [4], 1997)
Spectrum, rhythm, chord changes => single vector (Tzanetakis [5], 2002)
Low-level features => higher-level features
Cluster MFCC => model comparison (Aucouturier [6], 2002)
MFCC => Gaussian Mixture Models => model comparison
MFCC => "anchor space", compare probability models (Berenzweig et al. [7], 2003)
25. Similarity Measure Methods
Audio Feature-based Method
Pipeline: audio feature extraction => distribution modeling => model comparison
Euclidean/cosine distance (uniform-length feature vectors)
Distance between two probability distributions:
Kullback-Leibler divergence (KL distance) / relative entropy; no closed form for Gaussian mixture distributions
Centroid distance: Euclidean distance between the overall means
Sampling-based method: compute the likelihood of one model given points sampled from another; very computationally expensive
Earth Mover's Distance
Berenzweig, A., Logan, B., Ellis, D. P., and Whitman, B. P. A Large-Scale Evaluation of Acoustic and Subjective Music-Similarity Measures. Computer Music Journal, 28, 2004.
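The distribution-comparison step above can be illustrated with the one case where KL divergence does have a closed form: two single (univariate) Gaussians. For mixtures, as noted, it must be approximated, e.g., by sampling. The song parameters below are invented stand-ins for a one-dimensional audio feature such as loudness.

```python
# Closed-form KL divergence between univariate Gaussians (hypothetical data).
import math

def kl_gaussian(mu1, var1, mu2, var2):
    """KL( N(mu1, var1) || N(mu2, var2) ) in nats, closed form."""
    return 0.5 * (math.log(var2 / var1) + (var1 + (mu1 - mu2) ** 2) / var2 - 1.0)

def symmetric_kl(mu1, var1, mu2, var2):
    """KL is asymmetric; similarity measures commonly symmetrize it."""
    return kl_gaussian(mu1, var1, mu2, var2) + kl_gaussian(mu2, var2, mu1, var1)

# Each song modeled by (mean, variance) of one feature:
song_a = (0.2, 1.0)   # hypothetical gentle song
song_b = (0.25, 1.1)  # acoustically similar song
song_c = (3.0, 0.5)   # very different song

print(symmetric_kl(*song_a, *song_b))  # small: similar distributions
print(symmetric_kl(*song_a, *song_c))  # large: dissimilar distributions
```

A smaller symmetrized KL means the two feature distributions, and hence (by this measure) the two songs, are more similar.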
26. Similarity Measure Methods
Audio Feature-based Method
Pros
Can deal with new songs that have no or few texts
Saves the human labor of annotating each song manually
Cons
Time complexity is relatively high
Features ≠ audio piece: two songs with very similar features may sound very different
The average performance is reaching a glass ceiling of around 65% accuracy
28. Similarity Measure Methods
Semantic Concept-based Method
Nature of user queries
Far beyond bibliographic text and audio search
Semantically rich
Syntactically undetermined
e.g.: "Find me a classical and happy song", or "Find me a song to relax",
"Find me some songs for parties/weddings/churches" …
Collaborative (social) tagging is very popular on Web 2.0.
Users annotate music with their feelings or opinions: tags, comments, etc.
29. Similarity Measure Methods
Semantic Concept-based Method
Tags vs. user queries (Last.fm)
Tag Type         | Tag Frequency | Multi-tag Search Queries
Genre            | 68%           | 51%
Locale           | 12%           | 7%
Mood             | 5%            | 4%
Opinion          | 4%            | 2%
Instrumentation  | 4%            | 5%
Style            | 3%            | 26%
. Paul Lamere. Social tagging and music information retrieval. Journal of New Music Research. 2008.
. Klaas Bosteels, Elias Pampalk, and Etienne E. Kerre. Music retrieval based on social tags: a case study. ISMIR, 2008.
30. Similarity Measure Methods
Semantic Concept-based Method
Vocabulary: classical, jazz, …, piano, violin, …, female, male, …
[Diagram] One trained model per concept in the vocabulary. Each song is run through every model, yielding a probability vector over the vocabulary; the similarity between two songs (Song1, Song2) is then computed between their probability vectors.
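The final comparison step above reduces to vector similarity. A common choice is cosine similarity between the concept probability vectors; the vocabulary and probability values below are invented for illustration, not model outputs from the slides.

```python
# Cosine similarity between semantic concept vectors (hypothetical data).
import math

VOCAB = ["classical", "jazz", "piano", "violin", "female", "male"]

def cosine_similarity(p, q):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm

# Per-concept probabilities in VOCAB order (made-up model outputs):
song1 = [0.7, 0.1, 0.6, 0.3, 0.8, 0.1]
song2 = [0.6, 0.2, 0.5, 0.4, 0.7, 0.2]
song3 = [0.1, 0.9, 0.1, 0.0, 0.1, 0.9]

print(cosine_similarity(song1, song2))  # high: semantically close songs
print(cosine_similarity(song1, song3))  # low: semantically distant songs
```

Cosine similarity is a reasonable default here because it compares the direction of the concept profiles rather than their magnitudes.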
32. Similarity Measure Methods
Multimodal Method
Information keeps growing.
One of the most important ongoing trends:
[Diagram] combining metadata, audio content, and semantic concepts.
Users are important.
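A simple way to realize the multimodal trend above is late fusion: compute a similarity score per modality (metadata, audio content, semantic concepts) and combine them with weights. The scores and weights below are illustrative assumptions, not values from the slides, and real systems often learn the weights from data.

```python
# Weighted late fusion of per-modality similarity scores (hypothetical data).
def fuse_similarity(scores, weights):
    """Weighted average of per-modality similarity scores in [0, 1]."""
    assert set(scores) == set(weights), "scores and weights must cover the same modalities"
    total = sum(weights.values())
    return sum(weights[m] * scores[m] for m in scores) / total

# Similarity of one song pair under each modality (invented values):
pair_scores = {"metadata": 0.9, "audio": 0.4, "semantic": 0.7}
# Trusting audio most, then metadata, then tags (an assumed weighting):
weights = {"metadata": 0.3, "audio": 0.5, "semantic": 0.2}

print(fuse_similarity(pair_scores, weights))  # → 0.61 (up to float rounding)
```

Normalizing by the weight sum keeps the fused score in [0, 1] even if the weights do not sum to one.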
33. Similarity Measure Methods
Multimodal Method
[Diagram] Document vectors from multiple modalities are fused into a fuzzy music semantic vector, with customization per user.
B. Zhang, J. Shen, Q. Xiang, and Y. Wang. CompositeMap: a novel framework for music similarity measure. ACM Multimedia, 2009.
35. Conclusion and Future Directions
What makes MIR (and the similarity measure) so tricky?
Music information is
Multimodal: audio, metadata, social, …
Multicultural: e.g., modern art, Indian ragas, …
Multirepresentational: audio, MIDI, score, …
Multifaceted: melody, tempo, beat, …
…
Similarity can be measured from different aspects.
36. Conclusions and Future Directions
What do users really want? (Intention Gap)
User interactions with the system
Learn good user preference models
What kind of music features can really capture this need? (Semantic Gap)
Content and tags
Leverage more social data? Comments, ratings, groups, playlists, other user-created information, …
How to fuse multiple information sources effectively?
Identify the relevant/discriminative information aspects
Fusion methods
38. References
[1] O. C. Herrada. Music recommendation and discovery in the long tail. PhD thesis. 2008.
[2] F. Pachet. Knowledge management and musical metadata. Encyclopedia of Knowledge Management. Idea Group, 2005.
[3] T. L. Blum, D. F. Keislar, J. A. Wheaton, and E. H. Wold. Method and article of manufacture for content-based analysis, storage, retrieval, and segmentation of audio information. U.S. Patent 5,918,223.
[4] J. T. Foote. Content-based retrieval of music and audio. SPIE, 1997.
[5] G. Tzanetakis. Manipulation, analysis, and retrieval system for audio signals. PhD thesis, 2002.
[6] J. J. Aucouturier and F. Pachet. Music similarity measure: What's the use? International Symposium on Music Information Retrieval, 2002.
[7] A. Berenzweig, D. P. W. Ellis and S. Lawrence. Anchor space for classification and similarity measurement for music. ICME, 2003.
39. References
[8] B. Zhang, J. Shen, Q. Xiang and Y. Wang. CompositeMap: a novel framework for music similarity measure. SIGIR, 2009.
[9] B. Whitman and S. Lawrence. Inferring descriptions and similarity for music from community metadata. International Computer Music Conference, 2002.
[10] M. Schedl, T. Pohle, P. Knees and G. Widmer. Assigning and visualizing music genre by web-based co-occurrence analysis. ISMIR, 2006.
[11] B. Whitman and P. Smaragdis. Combining musical and cultural features for intelligent style detection. ISMIR, 2002.
[12] L. Chen, P. Wright, and W. Nejdl. Improving music genre classification using collaborative tagging data. WSDM, 2009.
[13] B. Raes. Automatic generation of music metadata. ISMIR, 2009.