MIR

Music Information Retrieval
Deema Aloum Noor Orfahly

Overview
Introduction
Music Document
Retrieval
Emotion Detection

What is MIR?
• Music Information Retrieval (MIR): the
interdisciplinary science of retrieving information
from music. MIR is a small but growing field of
research with many real-world applications.
• Objective: make the world’s vast store of music
accessible to all.
• The contributing disciplines: computer science,
information retrieval, audio engineering, digital
sound processing, musicology, library science,
cognitive science, psychology, philosophy and law.

MIR Applications
• Music Document Retrieval
• Recommender Systems
• Track Separation
• Automatic Music Transcription
• Rights Management
• Emotion Detection

Music Terms - Pitch & Melody
• Pitch is a particular frequency of sound
• E.g., 440 Hz
• A note is a pitch that has been given a name.
• E.g., Western music generally refers to the
440 Hz pitch as A, specifically A4
• A melody is a pattern of pitches.
• Only an electronically produced sound can consist of a
single pitch; all other sounds consist of multiple pitches.
• The mix of frequencies in a sound determines its
timbre.
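
The link between a pitch and its note name can be sketched in code. This is a minimal sketch that assumes twelve-tone equal temperament, A4 tuned to 440 Hz, and MIDI-style note numbering (A4 = 69); the function names are illustrative and not taken from any library.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def note_frequency(midi_number, a4_hz=440.0):
    """Frequency in Hz of a MIDI note number (assumes equal temperament, A4 = 69)."""
    return a4_hz * 2 ** ((midi_number - 69) / 12)

def nearest_note_name(freq_hz, a4_hz=440.0):
    """Name of the equal-tempered note closest to freq_hz, e.g. 'A4'."""
    midi_number = round(69 + 12 * math.log2(freq_hz / a4_hz))
    octave = midi_number // 12 - 1  # MIDI convention: note 60 is C4
    return f"{NOTE_NAMES[midi_number % 12]}{octave}"

print(nearest_note_name(440.0))   # A4
print(nearest_note_name(261.63))  # C4
```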

Music Terms - Timbre
• In music
– The characteristic quality of sound produced by
a particular instrument or voice; tone color.
• In acoustics and phonetics
– The characteristic quality of a sound,
independent of pitch and loudness
– Depends on the relative strengths
of its component frequencies;
– E.g., A4 played on a guitar is a sound
composed of the following frequencies:
440 Hz, 880 Hz, 1320 Hz, 1760 Hz, etc.
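
To illustrate how the relative strengths of the component frequencies shape timbre, here is a small sketch. Two tones share the same harmonic frequencies of A4 but weight them differently. The amplitude values are invented for illustration and are not measurements of real instruments.

```python
import math

def harmonic_frequencies(fundamental_hz, count):
    """Integer multiples of the fundamental: f0, 2*f0, 3*f0, ..."""
    return [fundamental_hz * k for k in range(1, count + 1)]

def synthesize(fundamental_hz, amplitudes, sample_rate=8000, duration_s=0.01):
    """Sum sine waves at the harmonics, one amplitude per harmonic."""
    n_samples = int(sample_rate * duration_s)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate
        value = sum(a * math.sin(2 * math.pi * fundamental_hz * (k + 1) * t)
                    for k, a in enumerate(amplitudes))
        samples.append(value)
    return samples

print(harmonic_frequencies(440.0, 4))  # [440.0, 880.0, 1320.0, 1760.0]

# Same pitch (A4), different harmonic weights, hence different waveforms.
bright = synthesize(440.0, [1.0, 0.8, 0.6, 0.4])
mellow = synthesize(440.0, [1.0, 0.2, 0.05, 0.0])
print(bright != mellow)  # True
```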

Music Document Retrieval
• Music Identification
• Music Similarity

MDR - Music Identification
• Metadata-based Approach:
– Music identification relies on information about
the content rather than the content itself.
– E.g., TOC
• Content-based Approach:
– Music identification relies on the audio content itself.
– E.g., the Shazam service

MDR - Music Identification - TOC
• TOC (Table Of Contents): a representation of the
start positions and lengths of the tracks on the disc.
• This feature is highly specific, because it is extremely
rare for different albums to share the same lengths
of tracks in the same order.
• However, slight differences in how CDs are produced, even
from the same source audio material, can produce
different TOCs, which then fail to match each
other.
• E.g., freedb
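
A minimal sketch of why TOC matching is both specific and brittle. It builds a lookup key from track start positions and lengths with a generic SHA-1 hash. This is not freedb's actual disc ID algorithm, and the frame values and album name are invented.

```python
import hashlib

def toc_key(track_starts, track_lengths):
    """Build a lookup key from track start positions and lengths (in CD frames)."""
    text = ";".join(f"{start}:{length}"
                    for start, length in zip(track_starts, track_lengths))
    return hashlib.sha1(text.encode("ascii")).hexdigest()

# Hypothetical pressings of the same album; in the second one,
# track 2 starts a single frame later.
pressing_a = toc_key([150, 18000, 40000], [17850, 22000, 21000])
pressing_b = toc_key([150, 18001, 40000], [17851, 21999, 21000])

database = {pressing_a: "Example Album"}
print(database.get(pressing_a))  # Example Album
print(database.get(pressing_b))  # None
```

An exact-match key like this never confuses two different albums in practice, but a one-frame difference is enough to miss the entry.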

MDR - Music Identification - Shazam
• Shazam: a mobile app that recognizes music and TV around
you. It lets you record up to 15 seconds of the song
you are hearing and then tells you about that song: the
artist, the name of the song, and the album. It also offers
links to YouTube or to buy the song on iTunes.

The Initial Spectrogram

• Shazam stores only the most intense sounds in the song: the time
at which each one occurs and its frequency.
The Simplified Spectrogram
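
Keeping only the intense points can be sketched as simple peak picking on a magnitude grid. This is a toy illustration, not Shazam's actual procedure: the threshold, the 3x3 neighbourhood, and the data are all assumptions.

```python
def pick_peaks(spectrogram, threshold):
    """Return (time_index, freq_index) of cells that reach the threshold and are
    strictly louder than every neighbouring cell. spectrogram[t][f] is a magnitude."""
    peaks = []
    n_times = len(spectrogram)
    n_freqs = len(spectrogram[0])
    for t in range(n_times):
        for f in range(n_freqs):
            value = spectrogram[t][f]
            if value < threshold:
                continue
            neighbours = [
                spectrogram[tt][ff]
                for tt in range(max(0, t - 1), min(n_times, t + 2))
                for ff in range(max(0, f - 1), min(n_freqs, f + 2))
                if (tt, ff) != (t, f)
            ]
            if all(value > other for other in neighbours):
                peaks.append((t, f))
    return peaks

# Invented 4x4 magnitude grid: rows are time frames, columns are frequency bins.
toy = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.3, 0.1],
    [0.1, 0.3, 0.2, 0.7],
    [0.0, 0.1, 0.2, 0.3],
]
print(pick_peaks(toy, threshold=0.5))  # [(1, 1), (2, 3)]
```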

• To store this in the database in a form that is efficient to search for a
match (easy to index), Shazam chooses some of the points in the
simplified spectrogram (called “anchor points”) and zones in their
vicinity (called “target zones”).
Pairing the anchor point with points in a target zone

• For each point in the target zone, a hash is created by
combining the following:
– F1: the frequency of the anchor point
– F2: the frequency of the point in the target zone
– T2 - T1: the difference between the time at which the
point in the target zone occurs in the song (T2) and the
time at which the anchor point occurs in the song (T1)
• Each entry is stored as a 64-bit struct: 32 bits for the hash
and 32 bits for the time offset and track ID.
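
A sketch of how F1, F2 and T2 - T1 could be combined into one integer hash. The 10-bit field widths, the fan-out of 3 and the maximum time difference are assumptions chosen to fit within 32 bits; the slides do not specify them.

```python
def pack_hash(f1, f2, dt, bits=10):
    """Pack anchor frequency bin, target frequency bin and time delta into one
    integer. Each field is assumed to fit in `bits` bits (3 * 10 = 30 <= 32)."""
    limit = 1 << bits
    if not (0 <= f1 < limit and 0 <= f2 < limit and 0 <= dt < limit):
        raise ValueError("field does not fit in the assumed bit width")
    return (f1 << (2 * bits)) | (f2 << bits) | dt

def fingerprint(peaks, fan_out=3, max_dt=63):
    """Pair each anchor peak with up to fan_out later peaks (a simple target zone).
    peaks: list of (time_index, freq_bin) sorted by time.
    Returns a list of (hash, anchor_time) pairs."""
    hashes = []
    for i, (t1, f1) in enumerate(peaks):
        paired = 0
        for t2, f2 in peaks[i + 1:]:
            if paired == fan_out or t2 - t1 > max_dt:
                break
            if t2 > t1:
                hashes.append((pack_hash(f1, f2, t2 - t1), t1))
                paired += 1
    return hashes

print(pack_hash(1, 2, 3))                           # 1050627
print(len(fingerprint([(0, 5), (2, 7), (4, 9)])))   # 3
```

Because each hash depends only on two frequencies and a time difference, it stays the same wherever the recorded sample starts within the song.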

How do they find the song based on the recorded sample?
• Apply the same fingerprinting to the recorded sample.
• Search the database for a match for each hash generated
from the sample.
• For each match found you have:
– The time of the hash in the sample (th1)
– The time of the hash in the song in the database (th2)
• Draw a scatter graph of the matches.
– The horizontal axis (X): th2
– The vertical axis (Y): th1
– Each matching pair (th1, th2) is marked with a small circle.

• If the graph contains many (th1, th2) pairs from the same
song, a diagonal line forms.
Scatter graph of a matching run

• Calculate the difference between th2 and th1 (dth) for each match and
plot the values in a histogram.
• If the sample matches the song, many dths will have
the same value.
Histogram of a matching run
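
The histogram step can be sketched as vote counting over (track, dth) pairs. The index layout, hash values and times below are invented for illustration.

```python
from collections import Counter

def best_match(sample_hashes, index):
    """sample_hashes: list of (hash, th1) from the recorded sample.
    index: dict mapping hash -> list of (track_id, th2) from the database.
    Returns (track_id, dth, votes) for the most common time difference, or None."""
    votes = Counter()
    for h, th1 in sample_hashes:
        for track_id, th2 in index.get(h, []):
            votes[(track_id, th2 - th1)] += 1
    if not votes:
        return None
    (track_id, dth), count = votes.most_common(1)[0]
    return track_id, dth, count

# Toy database index and sample.
index = {
    101: [("song_a", 40), ("song_b", 7)],
    102: [("song_a", 42)],
    103: [("song_a", 45), ("song_b", 90)],
}
sample = [(101, 0), (102, 2), (103, 5)]
print(best_match(sample, index))  # ('song_a', 40, 3)
```

All three sample hashes agree on dth = 40 for song_a, which is the peak in the histogram. The hits on song_b have scattered differences and collect only one vote each.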

MDR – Similarity Search
• The concept of similarity is less specific than identity.
• There are many different types of musical similarity.
– Two different performances played from the same
notation
– Same composer
– Same function, for example dances
– Same genre
– Same culture

QBH (Query by Humming) – Query Comparison
• The elements in the database must
have the same representation as
the query.
• E.g., Dynamic Time Warping
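
A minimal sketch of dynamic time warping for comparing a hummed query with stored melodies. It assumes both are represented as sequences of MIDI pitch numbers and uses the absolute pitch difference as the local cost; the melodies are invented.

```python
def dtw_distance(query, reference):
    """Classic dynamic time warping distance between two numeric sequences,
    using absolute difference as the local cost."""
    n, m = len(query), len(reference)
    inf = float("inf")
    # cost[i][j] = best cumulative cost aligning query[:i] with reference[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(query[i - 1] - reference[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # query note repeated
                                     cost[i][j - 1],      # reference note repeated
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]

melody = [60, 62, 64, 65, 67]
hummed_slowly = [60, 60, 62, 62, 64, 65, 65, 67]  # same tune, uneven tempo
other_tune = [67, 65, 64, 62, 60]

print(dtw_distance(hummed_slowly, melody))  # 0.0
print(dtw_distance(hummed_slowly, other_tune) > 0)  # True
```

DTW tolerates tempo differences between query and reference, which is why the slowly hummed version still matches at distance 0.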

QBH – Ranking evaluation measures
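A. Mean Reciprocal Rank (MRR):
• The average, over all queries, of the reciprocal of the rank of
the correct result.
• E.g., for three queries whose correct results are ranked 3rd, 2nd, and 1st:
MRR = (1/3 + 1/2 + 1)/3 = 11/18 or about 0.61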
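
The arithmetic above can be checked with a short sketch. The ranks 3, 2 and 1 are read off the slide's example; the function name is illustrative.

```python
from fractions import Fraction

def mean_reciprocal_rank(ranks):
    """ranks: 1-based position of the correct result for each query."""
    return sum(Fraction(1, r) for r in ranks) / len(ranks)

mrr = mean_reciprocal_rank([3, 2, 1])
print(mrr)                   # 11/18
print(round(float(mrr), 2))  # 0.61
```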

QBH – Ranking evaluation measures
B. Top-X Hit Rate
• Checks whether the position r of the correct result is within
the first X positions of the result list.
• Mathematically: r(Qi) ≤ X.
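
A sketch of the Top-X hit rate over a set of queries, assuming the rate is reported as the fraction of queries that satisfy r(Qi) ≤ X. The ranks are hypothetical.

```python
def top_x_hit_rate(ranks, x):
    """Fraction of queries whose correct result is ranked in the first x positions."""
    hits = sum(1 for r in ranks if r <= x)
    return hits / len(ranks)

ranks = [3, 2, 1, 7]  # hypothetical ranks r(Qi) of the correct result per query
print(top_x_hit_rate(ranks, x=1))  # 0.25
print(top_x_hit_rate(ranks, x=3))  # 0.75
```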

Emotions?
• Music is a language of emotion.
• Users often want to listen to music in a certain category
of emotions, or music that puts them
in a certain mood.
• What affects the mood of a song?
– Harmony
– Timbre
– Interpretation
– Lyrics

Challenging Problem!
• Ambiguous
– Due to the ambiguities of human emotions.
– Mood interpretation and perception differ between
individuals.
• Cross-disciplinary endeavor
– Signal processing
– Machine learning
– Understanding of auditory perception, psychology, and
music theory.
• The mood of a piece may change over its duration.

Different Methods
• Contextual text information
– Websites
– Tags
– Lyrics
• Content-based approaches
– Audio
– Images
– Video
• Combining multiple feature domains
– Audio & lyrics
– Audio & tags
– Audio & images (album covers, artist photos, etc.)

Contextual text information
• Web-Documents
– Artist biographies, album reviews, and song
reviews are rich sources of information about
music.
– Collected from the Internet by
• querying search engines
• monitoring MP3 blogs
• crawling a music website
– Can be noisy

Mood Representation
Categorical psychometrics
• A set of emotional descriptors (tags)
Scalar/dimensional psychometrics
• Mood can be scaled and measured by a
continuum of descriptors or simple
multidimensional metrics.
• Most noted: the two-dimensional Valence-Arousal
(V-A) space

Valence-Arousal (V-A) space
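• Axes: valence (unpleasant to pleasant) and arousal (deactivation to activation)
• Activation, pleasant: excited, elated, happy
• Activation, unpleasant: tense, stressed, upset
• Deactivation, unpleasant: sad, depressed, fatigued
• Deactivation, pleasant: serene, relaxed, calm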

Valence-Arousal (V-A) space
• Simple, powerful way of thinking about the spectrum
of human emotions.
• Both valence and arousal can be defined as
subjective experiences (Russell, 1989).
– Valence describes whether the emotion is positive or
negative
– Arousal describes the level of alertness or energy involved
in the emotion.
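
A minimal sketch of reading a mood group off a point in V-A space. It assumes both coordinates are scaled to [-1, 1] with 0 as the neutral point. The four group names are coarse labels chosen for illustration.

```python
def va_quadrant(valence, arousal):
    """Map a point in V-A space (each value assumed in [-1, 1]) to a coarse mood group."""
    if arousal >= 0:
        return "excited/happy" if valence >= 0 else "tense/upset"
    return "calm/relaxed" if valence >= 0 else "sad/depressed"

print(va_quadrant(0.8, 0.7))    # excited/happy
print(va_quadrant(-0.6, 0.5))   # tense/upset
print(va_quadrant(-0.7, -0.4))  # sad/depressed
print(va_quadrant(0.5, -0.6))   # calm/relaxed
```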

Emotion Recognition Problem
• A multiclass, multi-label classification or regression
problem
• The music piece to be labeled can be
– an entire song
– a section of a song (e.g., chorus, verse)
– a fixed-length clip (e.g., 30-second song snippet)
– a short-term segment (e.g., 1 second)

Mood Representation - Vectors
• A single multi-dimensional vector
– Each dimension represents a single emotion (e.g., angry)
or a bipolar pair of emotions (e.g., positive/negative).
• A time series of vectors over a semantic space of emotions
– Tracks changes in emotional content over the
duration of a piece.

Mood Representation - Vector Values
• A binary label
– The presence or absence of the emotion
• A real-valued score
– E.g., a Likert scale value
– A probability estimate
• A Likert scale is a psychometric scale commonly used in
research that employs questionnaires. It is the most widely used
approach to scaling responses in survey research.
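
A sketch of the two kinds of vector values for one piece. The tag set, the 5-point Likert scale, the linear rescaling to [0, 1] and the 0.5 threshold are all assumptions made for illustration.

```python
EMOTIONS = ["happy", "sad", "angry", "calm"]  # hypothetical tag set

def likert_to_scores(ratings, scale_max=5):
    """Map Likert ratings 1..scale_max to real-valued scores in [0, 1]."""
    return [(r - 1) / (scale_max - 1) for r in ratings]

def scores_to_labels(scores, threshold=0.5):
    """Binary presence/absence labels from real-valued scores."""
    return [1 if s >= threshold else 0 for s in scores]

ratings = [5, 1, 2, 4]  # one invented Likert rating per emotion
scores = likert_to_scores(ratings)
print(dict(zip(EMOTIONS, scores)))  # {'happy': 1.0, 'sad': 0.0, 'angry': 0.25, 'calm': 0.75}
print(scores_to_labels(scores))     # [1, 0, 0, 1]
```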

Annotation
• Labeling tasks are time-consuming, tedious, and
expensive.
• Online games (“Games With a Purpose”) can be used to collect labels.

Timbre Features
• Musical instruments usually produce sound waves with
several frequencies.
• The lowest frequency is
– The fundamental frequency f0
– Closely related to pitch
• The second and higher frequencies are
– Called overtones