Introduction to Music Information Retrieval

London Information Retrieval Meetup
19 Feb 2019
Thoughts from a former bass player
Andrea Gazzarini, Software Engineer
19th February 2019

Who I am
▪ Software Engineer (1999-)
▪ “Hermit” Software Engineer (2010-)
▪ Passionate about Java & Information Retrieval
▪ Apache Qpid (past) Committer
▪ Husband & Father
▪ Bass Player
Andrea Gazzarini, “Gazza”

Sease
Search Services
● Open Source Enthusiasts
● Apache Lucene/Solr experts
● Community Contributors
● Active Researchers
● Hot Trends: Learning To Rank, Document Similarity,
Search Quality Evaluation, Relevancy Tuning

✓ Music Information Retrieval (MIR)?
➢ Music Essentials
➢ Audio Processing
➢ Q&A
Agenda

“MIR is concerned with the extraction, analysis and usage of information about any kind of music
entity (e.g. a song or a music artist) on any representation level (for example, audio signal, symbolic MIDI
representation of a piece of music, or name of a music artist).”
Schedl, M.: Automatically extracting, analyzing and visualizing information on music artists from the world wide web.
Dissertation, Johannes Kepler University, Wien (2003)
Music information retrieval (MIR) is the interdisciplinary science of retrieving information from
music. MIR is a small but growing field of research with many real-world applications. Those involved in
MIR may have a background in musicology, psychoacoustics, psychology, academic music study,
signal processing, informatics, machine learning, optical music recognition, computational intelligence or
some combination of these.
https://en.wikipedia.org/wiki/Music_information_retrieval
Music Information Retrieval (MIR)

AUDIO IDENTIFICATION
GENRE IDENTIFICATION
TRANSCRIPTION
RECOMMENDATION
COVER SONG DETECTION
SYMBOLIC SIMILARITY
MOOD
SOURCE SEPARATION
INSTRUMENT RECOGNITION
TEMPO ESTIMATION
SCORE ALIGNMENT
SONG STRUCTURE
BEAT TRACKING
KEY DETECTION
QUERY BY HUMMING
Music Information Retrieval (MIR)

Computational Factors
Music Content includes all the low-level features we can extract from the audio signal (e.g. time, frequencies, loudness).
Music Context covers additional metadata that cannot be extracted from the audio signal (e.g. lyrics, tags, artists, feedback, posts).
Listener State covers the state of the listener at a given moment (e.g. mood, musical knowledge, preferences).
Listener Context relates to the environment the listener is in at a given moment (e.g. political, geographical, social).
Factors in Music Perception

➢ Music Information Retrieval (MIR)
✓ Music Essentials
‣ Essentials
‣ Score Music Representation
‣ Symbolic Representations
‣ Audio Representation
➢ Q&A
Agenda

A note denotes a sound, its pitch and its duration.
A sound is the audio signal produced by a vibrating body.
Notes are associated with graphical symbols (indicating the pitch and the duration).
Two notes whose fundamental frequencies are in a ratio of any integer power of two are perceived as similar. As a
consequence, we say they belong to the same pitch class.
A note name is also used for denoting a pitch class. Traditional music theory identifies 12 pitch classes.
Notes and pitch classes are associated with mnemonic codes (e.g. C,D,E,F,G,A,B or DO,RE,MI,FA,SOL,LA,SI).
Natural notes, ascending: C D E F G A B C
Natural notes, descending: C B A G F E D C
Sharps (ascending): C# D# F# G# A#
Flats (descending): Bb Ab Gb Eb Db
Music Language Essentials
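
The octave rule on this slide can be checked numerically. The following is a minimal sketch, assuming 12-tone equal temperament with A4 tuned to 440 Hz and the MIDI convention that A4 is note number 69. The function name `pitch_class` and the sharp-only spelling are choices made for this illustration, not something taken from the slides.

```python
import math

# The 12 pitch classes, spelled with sharps only (an arbitrary choice here).
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def pitch_class(frequency_hz, a4_hz=440.0):
    """Map a fundamental frequency to the nearest equal-tempered pitch class."""
    # Distance from A4 in semitones, shifted so that A4 becomes MIDI note 69.
    midi_note = round(69 + 12 * math.log2(frequency_hz / a4_hz))
    # Notes an octave apart differ by 12 semitones, so they share the remainder.
    return PITCH_CLASSES[midi_note % 12]


# 220, 440 and 880 Hz are in power-of-two ratios: same pitch class.
print([pitch_class(f) for f in (220.0, 440.0, 880.0)])
```

Doubling or halving a frequency moves the note by exactly 12 semitones, which is why the modulo operation collapses all octaves onto one pitch class.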

Text: Letter, Word, Phrase, Sentence
Music: Note, Chord, Ghost Note, Phrase
Text vs Music

Time Signature
Key Signature
Clef
Tempo
Note
Reference Chord
Chord
Score music representation

Symbolic music representations comprise any kind of score representation with an explicit
encoding of notes or other musical events.
Piano Roll originally denoted a roll of paper with holes that controlled the performance of a
melody on a self-playing device. Nowadays the term refers to a digital visualisation that
shows pitches over time.
Musical Instrument Digital Interface (MIDI) is another widely adopted representation of
music events (e.g. pitch, velocity, duration, intensity).
Piano Roll & MIDI
Symbolic music representation
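
To make the "pitches over time" idea concrete, here is a small sketch that turns a list of note events into a piano-roll grid. The event layout `(midi_pitch, start_beat, duration_beats)`, the function name `piano_roll` and the `steps_per_beat` resolution are assumptions made for this example; real MIDI files carry more information (velocity, channels, tempo changes).

```python
def piano_roll(notes, steps_per_beat=4):
    """Render (midi_pitch, start_beat, duration_beats) events as a pitch/time grid."""
    total_steps = max(round((start + duration) * steps_per_beat)
                      for _, start, duration in notes)
    lowest = min(pitch for pitch, _, _ in notes)
    highest = max(pitch for pitch, _, _ in notes)
    # One row per pitch, one column per time step; 1 means the pitch is sounding.
    roll = {pitch: [0] * total_steps for pitch in range(lowest, highest + 1)}
    for pitch, start, duration in notes:
        first = round(start * steps_per_beat)
        last = round((start + duration) * steps_per_beat)
        for step in range(first, last):
            roll[pitch][step] = 1
    return roll


# A C major arpeggio: C4 and E4 for one beat each, then G4 for two beats.
example_notes = [(60, 0, 1), (64, 1, 1), (67, 2, 2)]
for pitch, row in sorted(piano_roll(example_notes, steps_per_beat=2).items(), reverse=True):
    print(pitch, "".join("#" if cell else "." for cell in row))
```

Printing the highest pitch first mimics the usual piano-roll layout, with time running left to right.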

MusicXML [1] is an XML-based format for representing music.
As you can imagine from the example on the right,
encoding a whole song results in a huge and
verbose textual representation (that’s XML!).
For that reason MusicXML 2.0 introduced a
compressed format with a .mxl suffix.
• Widely supported (scorewriting, OCR, sequencer)
• Easy to understand
• Full support of music features
MusicXML
Part
Time
Clef
Note(s)
[1] https://www.musicxml.com
MusicXML
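
Since the example image is not reproduced here, the following sketch shows what reading a tiny MusicXML document could look like. The one-measure score is invented for illustration. The element names (`score-partwise`, `part`, `measure`, `attributes`, `time`, `clef`, `note`, `pitch`) follow the partwise MusicXML layout, and the helper `read_notes` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# A made-up, minimal partwise score: one part, one measure, one whole note.
SCORE = """<score-partwise version="3.1">
  <part-list>
    <score-part id="P1"><part-name>Bass</part-name></score-part>
  </part-list>
  <part id="P1">
    <measure number="1">
      <attributes>
        <divisions>1</divisions>
        <time><beats>4</beats><beat-type>4</beat-type></time>
        <clef><sign>F</sign><line>4</line></clef>
      </attributes>
      <note>
        <pitch><step>C</step><octave>3</octave></pitch>
        <duration>4</duration>
        <type>whole</type>
      </note>
    </measure>
  </part>
</score-partwise>"""


def read_notes(xml_text):
    """Return (step, octave, duration) for every pitched note in the score."""
    root = ET.fromstring(xml_text)
    notes = []
    for note in root.iter("note"):
        pitch = note.find("pitch")
        if pitch is None:  # rests carry no <pitch> element
            continue
        notes.append((pitch.findtext("step"),
                      int(pitch.findtext("octave")),
                      int(note.findtext("duration"))))
    return notes


print(read_notes(SCORE))
```

Even this single note takes more than fifteen lines of markup, which illustrates why a compressed variant of the format is useful.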

The Parsons code, formally named the Parsons
code for melodic contours, is a simple notation
used to identify a piece of music through melodic
motion — movements of the pitch up and down.
(https://en.wikipedia.org/wiki/Parsons_code)
The encoding focuses on the pitch relation between
consecutive notes. The main points about this method are:
• Simplicity
• Being a textual encoding, it poses interesting
challenges for text search engines
• Limited: it ignores important features such as
time, intervals, rests and ghost notes
Parsons Code
Symbol   Description
*        First note of a sequence
u, /     “up”: the note is higher than the previous one
d, \     “down”: the note is lower than the previous one
r, -     “repeat”: the note is the same as the previous one
Parsons Code (1/4)
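
The symbol table translates almost directly into code. Below is a minimal sketch that derives the Parsons code from a list of pitches. Representing pitches as MIDI note numbers is an assumption of this example (any comparable pitch values would work), and `parsons_code` is an illustrative name.

```python
def parsons_code(pitches):
    """Encode a melody as a Parsons contour string using *, u, d and r."""
    if not pitches:
        return ""
    symbols = ["*"]  # the first note of a sequence is always "*"
    for previous, current in zip(pitches, pitches[1:]):
        if current > previous:
            symbols.append("u")
        elif current < previous:
            symbols.append("d")
        else:
            symbols.append("r")
    return "".join(symbols)


# Opening of "Twinkle Twinkle Little Star": C C G G A A G
print(parsons_code([60, 60, 67, 67, 69, 69, 67]))
```

Because only the direction of each step is kept, two melodies with very different intervals and rhythms can share the same code, which is the limitation listed on the slide.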

Parsons Code (2/4)

[Figure: score excerpts annotated with their Parsons codes, including Money, Pink Floyd: * u d d d u u u X d]
Parsons Code (3/4)

Tempo (Time)
Intervals
Rests
Ghost Notes
Parsons Code (4/4)

An audio signal is a representation of sound that describes the fluctuation in air pressure
caused by a vibration as a function of time. Unlike sheet music or symbolic representations,
audio representations encode everything that is necessary to reproduce an acoustic realization
of a piece of music.
Digital computers can only capture this data at discrete moments in time. The rate at which a
computer captures audio data is called the sampling frequency or sampling rate.
Audio Representation: Time Domain
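
As an illustration of sampling, the sketch below evaluates a pure sine tone at discrete moments in time. The tone, the 8000 Hz sampling rate and the function name `sample_sine` are assumptions chosen for the example; a real recording would come from an analog-to-digital converter rather than a formula.

```python
import math


def sample_sine(frequency_hz, duration_s, sampling_rate_hz):
    """Sample a sine tone of the given frequency at a fixed sampling rate."""
    n_samples = round(duration_s * sampling_rate_hz)
    # Sample i is taken at time i / sampling_rate_hz seconds.
    return [math.sin(2 * math.pi * frequency_hz * i / sampling_rate_hz)
            for i in range(n_samples)]


samples = sample_sine(440.0, 0.01, 8000)
print(len(samples), "samples for 10 ms of audio at 8000 Hz")
```

A higher sampling rate yields more samples per second and therefore a finer picture of the waveform, at the cost of more data.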

The Frequency Domain representation
decomposes the audio signal into a number of
waves oscillating at different frequencies.
The frequency-domain plot shows the frequencies
on the horizontal axis against their corresponding
magnitude (power) on the vertical axis.
This representation, among other things, can
be used for highlighting the dominant
frequencies of a musical tone.
Frequency Domain
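
To show how a dominant frequency can be read off the frequency domain, here is a small sketch. It deliberately uses a naive discrete Fourier transform written with the standard library instead of an FFT, which is far slower but needs no dependencies. The names `magnitude_spectrum` and `dominant_frequency` and the test signal are assumptions of this example.

```python
import cmath
import math


def magnitude_spectrum(samples):
    """Naive DFT: magnitude of each frequency bin from 0 up to the Nyquist bin."""
    n = len(samples)
    return [abs(sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                    for i, x in enumerate(samples)))
            for k in range(n // 2 + 1)]


def dominant_frequency(samples, sampling_rate_hz):
    """Frequency in Hz of the strongest bin, ignoring the DC bin (k = 0)."""
    magnitudes = magnitude_spectrum(samples)
    strongest_bin = max(range(1, len(magnitudes)), key=magnitudes.__getitem__)
    # Bin k corresponds to k * sampling_rate / n Hz.
    return strongest_bin * sampling_rate_hz / len(samples)


# A 100 Hz tone with a quieter 250 Hz component, sampled at 800 Hz.
signal = [math.sin(2 * math.pi * 100 * i / 800)
          + 0.5 * math.sin(2 * math.pi * 250 * i / 800)
          for i in range(80)]
print(dominant_frequency(signal, 800))
```

With 80 samples at 800 Hz each bin is 10 Hz wide, so both components fall exactly on a bin and the louder one wins.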

➢ Music Essentials
✓ Audio Processing
‣ Basic Pipeline
‣ Time Domain Features
‣ Frequency Domain Features
‣ Chroma Features
➢ Q&A
Agenda

Analog Signal → Sampling / Quantization → Framing → Time Domain Features Extraction
Framing → Windowing → FFT → Frequency Domain Features Extraction
Basic Audio Processing Pipeline
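
The framing and windowing steps of the pipeline can be sketched as follows. The slides do not say which window function is used, so the Hann window here is an assumption, as are the names `split_into_frames` and `hann_window` and the frame and hop sizes.

```python
import math


def split_into_frames(samples, frame_size, hop_size):
    """Cut a signal into (possibly overlapping) frames of equal length."""
    last_start = len(samples) - frame_size
    return [samples[start:start + frame_size]
            for start in range(0, last_start + 1, hop_size)]


def hann_window(frame):
    """Taper a frame towards zero at both ends (frame length must be > 1)."""
    n = len(frame)
    return [x * 0.5 * (1 - math.cos(2 * math.pi * i / (n - 1)))
            for i, x in enumerate(frame)]


signal = [float(i) for i in range(10)]
frames = split_into_frames(signal, frame_size=4, hop_size=2)
print(len(frames), "frames; last frame:", frames[-1])
print(hann_window([1.0] * 5))
```

A hop size smaller than the frame size makes consecutive frames overlap, which compensates for the samples that the window attenuates near the frame edges.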

Feature: Description
Amplitude Envelope (AE): max amplitude within a frame
Root-Mean-Square Energy (RMS): perceived sound intensity
Zero Crossing Rate (ZCR): number of times the amplitude changes its sign within a frame
Usage: Loudness Estimation, Timbre Analysis, Speech Recognition, Audio Segmentation, Onset Detection
Time Domain Features
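
The three features above are simple enough to compute by hand on a single frame. This is a minimal sketch following the slide's definitions (in particular, the zero crossing value is a plain count, not normalised by frame length). The function names and the sample frame are invented for illustration.

```python
import math


def amplitude_envelope(frame):
    """Maximum absolute amplitude within the frame."""
    return max(abs(x) for x in frame)


def rms_energy(frame):
    """Root of the mean of the squared samples in the frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def zero_crossings(frame):
    """Number of sign changes between consecutive samples in the frame."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))


frame = [0.5, -0.5, 0.25, 0.25, -1.0]
print(amplitude_envelope(frame), rms_energy(frame), zero_crossings(frame))
```

In a full pipeline these functions would be applied to every frame, giving one value per frame and therefore a curve over time for each feature.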

Feature: Description
Band Energy Ratio (BER): ratio between the energy of the lower and higher frequency bands
Spectral Centroid: frequency band where most of the energy is concentrated
Bandwidth (BW): spectral range of the interesting part of a signal
Spectral Flux: how much the spectrum changes from one frame to the next
Usage: Timbre Analysis, Speech Recognition, Onset Detection, Speech/Music Discrimination
Frequency Domain Features
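
Two of these features can be sketched directly on a magnitude spectrum. The code below assumes the spectrum is given as a list of magnitudes for bins 0 up to the Nyquist bin. It computes the centroid as a magnitude-weighted mean frequency and the band energy ratio from squared magnitudes, which are common formulations but not necessarily the exact ones behind the slide. All names and numbers are illustrative.

```python
def spectral_centroid(magnitudes, sampling_rate_hz, n_samples):
    """Magnitude-weighted mean frequency of the spectrum, in Hz."""
    total = sum(magnitudes)
    if total == 0:
        return 0.0
    # Bin k corresponds to k * sampling_rate / n_samples Hz.
    weighted = sum(k * sampling_rate_hz / n_samples * m
                   for k, m in enumerate(magnitudes))
    return weighted / total


def band_energy_ratio(magnitudes, split_bin):
    """Energy below the split bin divided by energy at or above it."""
    low = sum(m * m for m in magnitudes[:split_bin])
    high = sum(m * m for m in magnitudes[split_bin:])
    return low / high if high else float("inf")


# Toy spectrum of an 8-sample frame at 8 Hz: bins sit at 0, 1, 2, 3, 4 Hz.
spectrum = [0.0, 2.0, 0.0, 1.0, 0.0]
print(spectral_centroid(spectrum, 8, 8))
print(band_energy_ratio(spectrum, split_bin=2))
```

A spectrum dominated by low frequencies pulls the centroid down and pushes the band energy ratio up, which is what makes both features useful for telling speech from music.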

Chroma features are a powerful representation for
music audio in which the entire spectrum is
projected onto 12 bins representing the 12 distinct
semitones (or chroma) of the musical octave.
This kind of analysis bridges low-level and
mid-level features, moving the audio signal
representation toward something that is more
readable from a functional perspective.
Chroma Features
Chroma Features (1/2)
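
The projection onto 12 bins can be sketched in a few lines. This example assumes equal temperament with A4 at 440 Hz, assigns each spectrum bin to the nearest pitch class, and sums magnitudes per class. Production chroma extractors use smoother filter banks and tuning estimation, so this is only an illustration, and `chroma_vector` is a hypothetical name.

```python
import math


def chroma_vector(magnitudes, sampling_rate_hz, n_samples, a4_hz=440.0):
    """Fold a magnitude spectrum onto 12 pitch-class bins (index 0 = C)."""
    chroma = [0.0] * 12
    for k, magnitude in enumerate(magnitudes):
        if k == 0:
            continue  # the DC bin has no pitch
        frequency = k * sampling_rate_hz / n_samples
        midi_note = round(69 + 12 * math.log2(frequency / a4_hz))
        chroma[midi_note % 12] += magnitude
    return chroma


# Toy spectrum whose bins sit at 0, 220, 440, 660 and 880 Hz.
spectrum = [0.0, 1.0, 2.0, 0.0, 3.0]
print(chroma_vector(spectrum, sampling_rate_hz=1760, n_samples=8))
```

The energy at 220, 440 and 880 Hz all lands in the same bin (A, index 9), which is exactly the octave folding that makes chroma features robust to the register in which a melody or chord is played.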

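[Figure: chromagram with Time on the horizontal axis and the 12 pitch classes (C, C#, D, D#, E, F, F#, G, G#, A, A#, B) on the vertical axis. Detected sequence: A A A A A A C A F F F F F F F G C C C C C C D C B B B B B B C B. One region is labelled NOISE.]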
Chroma Features (2/2)

FALCON: FAst Lucene-based Cover sOng identification | chromaprint (part of AcoustID)
Interesting Projects

➢ Music Representation
✓ Q&A
Agenda

Thank you!
