Yi-Hsuan Yang / Music Information Retrieval

Music Information Retrieval
Music & Audio Computing Lab,
Research Center for IT Innovation,
Academia Sinica
Yi-Hsuan Yang Ph.D.
http://www.citi.sinica.edu.tw/pages/yang/
yang@citi.sinica.edu.tw

Prelude
• PI @ Music & Audio Computing Lab, Academia Sinica, since 2011
• 10420CS 573100 “Music Information Retrieval” @ NTHU, 2016
https://twtmir.wordpress.com/
https://teachingmir.wikispaces.com/courses

Outline
• Types of music related research
• Fundamentals of music signal processing
• New opportunities in the big data era

Types of Music Related Research
1. Music creation
https://www.youtube.com/watch?v=3OEmzI52stk
https://www.youtube.com/watch?v=k1DgNfz1g_s
https://www.youtube.com/watch?v=wj1r9YJ6INA
http://www.inside.com.tw/2016/05/04/positive-grid-bias-head
https://youtu.be/rL5YKZ9ecpg?t=50m

2. Music information “analysis”
• automatic page turner
• automatic Karaoke scoring
• interactive concert
• chord recognizer
• music browsing assistant

3. Music information “retrieval”
• Search
‒ through keywords/labels (genre, instrument, emotion)
‒ through audio examples (humming, audio recording)
• Match
‒ match music to 1) a video clip, 2) a photo slideshow, 3) song lyrics, or 4) a given context
‒ cross-domain retrieval
• Discover
‒ recommendation: diversity, serendipity, explanations

1. Music creation
• Google Magenta, Smule AutoRap, Samsung Hum-On,
Positive Grid, Yamaha Vocaloid
2. Music information analysis
• Education, data visualization
3. Music information retrieval
• Search: through keywords (genre, instrument, emotion) or
audio examples (humming or audio recording)
• Match: cross-domain retrieval
• Discover: recommendation

Fundamentals of Music Signal Processing
• Pitch: which notes are played?
• Tempo: how fast?
• Timbre: which instrument(s)?
Mozart’s Variationen (1st phrase)

Pitch ♪♪♪ ♪♪♪ ♪♪♪
Tempo ♪ ♪ ♪
Timbre ♪ ♪♪ ♪
Applications: Karaoke scorer, chord recognizer, page turner

Pitch ♪♪♪ ♪
Tempo ♪♪♪
Timbre ♪♪♪ ♪
Applications: instrument classifier, content ID, Spotify running

Pitch ♪♪♪ ♪♪♪ ♪♪♪
Tempo ♪♪♪ ♪♪♪ ♪♪♪
Timbre ♪♪♪ ♪♪♪ ♪♪♪
Applications: similarity search or recommendation, music emotion or genre recognizer, automatic music video generation

• Listens to music: tempo, instrumentation, key, time signature, energy, harmonic & timbral structures
• Reads about music: lyrics, blog posts, reviews, playlists and discussion forums
• Learns about trends: online music behavior (who's talking about which artists this week, what songs are being streamed or downloaded)
• Not everything is in audio

• Let’s have a look at what we can extract from audio anyway
• Time-domain waveform

• Frequency-domain representation
• Spectrogram (obtained by the Short-Time Fourier Transform)
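A minimal sketch of how a magnitude spectrogram can be computed with a Short-Time Fourier Transform, using only the Python standard library. The frame size, hop size, Hann window, toy sine input and the function names are illustrative assumptions, not values from the slides; a real system would use an FFT library instead of this naive DFT.

```python
import cmath
import math

def hann_window(size):
    """Periodic Hann window, tapering each frame to reduce spectral leakage."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / size) for n in range(size)]

def stft_magnitude(signal, frame_size=256, hop=128):
    """Return spec[t][k]: magnitude of frequency bin k in frame t."""
    window = hann_window(frame_size)
    n_bins = frame_size // 2 + 1  # keep non-negative frequencies only
    spec = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_size)]
        # Naive DFT of the windowed frame (slow, but dependency-free).
        spec.append([
            abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                    for n in range(frame_size)))
            for k in range(n_bins)
        ])
    return spec

# Toy input (assumed): a 1000 Hz sine sampled at 8000 Hz.
sample_rate = 8000
signal = [math.sin(2 * math.pi * 1000 * n / sample_rate) for n in range(1024)]
spec = stft_magnitude(signal)

# The strongest bin of the first frame, converted back to Hz.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / 256
```

Each row of `spec` is one time frame and each column one frequency bin, which is the time-frequency grid that a spectrogram image displays.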

• Pitch
• Simple for monophonic signals (almost table lookup)
• Challenging for polyphonic signals; known as multi-pitch estimation (MPE)
‒ overlapping partials
‒ missing fundamentals
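For the monophonic case, the "almost table lookup" idea can be sketched as: estimate one fundamental frequency, then map it to the nearest note name. The autocorrelation estimator, the search range, the A4 = 440 Hz reference and the function names below are assumptions chosen for illustration; the slides do not prescribe a particular method, and this sketch does not address the polyphonic (MPE) case.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def estimate_f0(signal, sample_rate, fmin=80.0, fmax=1000.0):
    """Pick the lag with the largest autocorrelation within [fmin, fmax]."""
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)

    def autocorr(lag):
        return sum(signal[i] * signal[i + lag] for i in range(len(signal) - lag))

    best_lag = max(range(min_lag, max_lag + 1), key=autocorr)
    return sample_rate / best_lag

def frequency_to_note(freq, a4=440.0):
    """Map a frequency to the nearest equal-tempered note name, e.g. 'A4'."""
    midi = round(69 + 12 * math.log2(freq / a4))  # MIDI note 69 is A4
    return NOTE_NAMES[midi % 12] + str(midi // 12 - 1)

# Toy input (assumed): a pure 440 Hz tone sampled at 8800 Hz.
sample_rate = 8800
signal = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(1024)]
f0 = estimate_f0(signal, sample_rate)
note = frequency_to_note(f0)
```

With several simultaneous notes, the autocorrelation peaks of the individual pitches interfere with one another, which is one way to see why overlapping partials make the polyphonic case hard.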

• Tempo: beats per minute (bpm)
• Onset detection, downbeat estimation, tempo estimation, beat tracking, rhythm pattern extraction
Figure panels: energy-based, spectrum-based
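A rough sketch of energy-based onset detection followed by tempo estimation. It assumes a simple pipeline that the slides do not spell out: frame energy, then a half-wave rectified energy difference as the onset strength, then autocorrelation of that envelope over lags corresponding to 60 to 200 bpm. The click-track input, hop size, bpm range and function names are all illustrative.

```python
def frame_energy(signal, hop):
    """Energy of consecutive non-overlapping frames of `hop` samples."""
    return [sum(x * x for x in signal[i:i + hop])
            for i in range(0, len(signal) - hop + 1, hop)]

def onset_strength(energy):
    """Keep only energy increases; a rise in energy suggests a note onset."""
    return [0.0] + [max(0.0, energy[t] - energy[t - 1])
                    for t in range(1, len(energy))]

def estimate_tempo(signal, sample_rate, hop=10, bpm_min=60, bpm_max=200):
    """Return the tempo (bpm) whose beat period best matches the onsets."""
    envelope = onset_strength(frame_energy(signal, hop))
    frame_rate = sample_rate / hop
    min_lag = int(round(60 * frame_rate / bpm_max))
    max_lag = int(round(60 * frame_rate / bpm_min))

    def autocorr(lag):
        return sum(envelope[t] * envelope[t + lag]
                   for t in range(len(envelope) - lag))

    best_lag = max(range(min_lag, max_lag + 1), key=autocorr)
    return 60 * frame_rate / best_lag

# Toy input (assumed): 8 seconds at 1000 Hz with a short click every 0.5 s.
sample_rate = 1000
signal = [0.0] * 8000
for start in range(250, 8000, 500):
    for n in range(start, start + 5):
        signal[n] = 1.0

tempo = estimate_tempo(signal, sample_rate)
```

This only estimates a global tempo; beat tracking and downbeat estimation additionally need the positions of the beats, not just their period.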

• Timbre: difference in time-frequency distribution
‒ odd-to-even harmonic ratio, decay rate, vibrato, etc.
Figure panels: piano solo, human voice
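As one concrete timbre descriptor, the odd-to-even harmonic ratio can be sketched as the energy in odd-numbered harmonics divided by the energy in even-numbered ones. Definitions vary across toolboxes; the energy-based form, the function name and the example amplitudes below are assumptions for illustration, not measured data.

```python
def odd_to_even_ratio(harmonic_amplitudes):
    """harmonic_amplitudes[0] is harmonic 1 (the fundamental), [1] is harmonic 2, ..."""
    odd_energy = sum(a * a for a in harmonic_amplitudes[0::2])   # harmonics 1, 3, 5, ...
    even_energy = sum(a * a for a in harmonic_amplitudes[1::2])  # harmonics 2, 4, 6, ...
    return odd_energy / even_energy if even_energy > 0 else float("inf")

# Hypothetical amplitude profiles: one dominated by odd harmonics,
# one with smoothly decaying harmonics (amplitude 1/h for harmonic h).
odd_dominated = [1.0, 0.1, 0.5, 0.1]
smooth_decay = [1.0, 1 / 2, 1 / 3, 1 / 4]

ratio_odd = odd_to_even_ratio(odd_dominated)
ratio_smooth = odd_to_even_ratio(smooth_decay)
```

A larger ratio means the spectrum is dominated by odd harmonics, so two tones with the same pitch can still be told apart by this number.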

• The spectrogram, or its reduced-dimension version, the “Mel-spectrogram,” is usually considered a “raw” feature representation of music
• Can be treated as an image and then processed by convolutional neural nets (CNN)
Figure by Sander Dieleman: http://benanne.github.io/2014/08/05/spotify-cnns.html
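The dimension reduction behind a Mel-spectrogram can be sketched as a bank of triangular filters spaced evenly on the mel scale, applied to each spectrogram frame. The mel formula used here (2595 log10(1 + f/700)) is one common convention; the number of bands, the FFT size and the function names are illustrative assumptions.

```python
import math

def hz_to_mel(freq):
    return 2595.0 * math.log10(1.0 + freq / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sample_rate):
    """Return bank[m][k]: weight of FFT bin k in mel band m (triangular filters)."""
    n_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(sample_rate / 2)
    # n_mels + 2 edge frequencies, equally spaced on the mel scale.
    edges = [mel_to_hz(mel_max * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for m in range(n_mels):
        low, center, high = edges[m], edges[m + 1], edges[m + 2]
        bank.append([
            max(0.0, min((f - low) / (center - low), (high - f) / (high - center)))
            for f in bin_hz
        ])
    return bank

def apply_filterbank(bank, spectrum_frame):
    """Project one spectrogram frame onto the mel bands."""
    return [sum(w * x for w, x in zip(row, spectrum_frame)) for row in bank]

# Assumed sizes: 129 FFT bins (n_fft = 256) reduced to 8 mel bands.
bank = mel_filterbank(n_mels=8, n_fft=256, sample_rate=8000)
mel_frame = apply_filterbank(bank, [1.0] * 129)
```

Stacking `mel_frame` vectors over time gives the image-like array that a CNN would take as input.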

• Chromagram: a better “timbre-invariant” feature representation for pitch-related tasks (e.g. chord recognition, cover song identification)
‒ merge all the frequency bins with the same note name (C, C#, D, D#, …)
‒ 12-dim vector for each time frame
Figure by Meinard Müller
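The "merge all bins with the same note name" step can be sketched by mapping each frequency bin to its nearest equal-tempered pitch and summing magnitudes per pitch class. The hard nearest-note assignment, the 55 Hz lower cutoff, the A4 = 440 Hz reference and the toy spectrum are assumptions; practical chroma extractors usually use smoother bin-to-pitch weighting.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def chroma_from_spectrum(magnitudes, sample_rate, n_fft, fmin=55.0):
    """Fold one magnitude spectrum frame into a 12-dim pitch-class vector."""
    chroma = [0.0] * 12
    for k, magnitude in enumerate(magnitudes):
        freq = k * sample_rate / n_fft
        if freq < fmin:
            continue  # skip DC and very low bins, where pitch is ill-defined
        midi = round(69 + 12 * math.log2(freq / 440.0))
        chroma[midi % 12] += magnitude  # pitch class 0 is C, 9 is A
    return chroma

# Toy frame (assumed): energy at 440 Hz (A4) and 880 Hz (A5), 10 Hz per bin.
sample_rate, n_fft = 8800, 880
magnitudes = [0.0] * (n_fft // 2 + 1)
magnitudes[44] = 1.0   # 440 Hz
magnitudes[88] = 0.5   # 880 Hz

chroma = chroma_from_spectrum(magnitudes, sample_rate, n_fft)
strongest = NOTE_NAMES[chroma.index(max(chroma))]
```

Both octaves land in the same pitch class, which is why the representation discards octave (and much of the timbre) information while keeping what matters for chords.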

• Source separation can sometimes be helpful
‒ harmonic/percussive separation: given a mixture, separate the percussive part from the harmonic part
‒ harmonic: pitch-related info
‒ percussive: tempo-related info
Figure panels: (a) original (b) harmonic (c) percussive
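One well-known way to do harmonic/percussive separation is median filtering of the spectrogram: sustained tones form horizontal ridges (smooth along time), while drum hits form vertical ridges (smooth along frequency). The slides do not say which method was used, so the sketch below is an assumption; the filter width, binary masking and toy spectrogram are illustrative choices.

```python
from statistics import median

def median_filter_1d(values, width):
    """Running median with a window truncated at the edges."""
    half = width // 2
    return [median(values[max(0, i - half):i + half + 1])
            for i in range(len(values))]

def hpss(spec, width=5):
    """Split spec[t][f] into (harmonic, percussive) parts with binary masks."""
    n_frames, n_bins = len(spec), len(spec[0])
    # Harmonic estimate: median along time, computed per frequency bin.
    harm = [median_filter_1d([spec[t][f] for t in range(n_frames)], width)
            for f in range(n_bins)]
    # Percussive estimate: median along frequency, computed per frame.
    perc = [median_filter_1d(spec[t], width) for t in range(n_frames)]
    harmonic = [[0.0] * n_bins for _ in range(n_frames)]
    percussive = [[0.0] * n_bins for _ in range(n_frames)]
    for t in range(n_frames):
        for f in range(n_bins):
            # Assign each cell to whichever estimate is larger (ties: harmonic).
            if harm[f][t] >= perc[t][f]:
                harmonic[t][f] = spec[t][f]
            else:
                percussive[t][f] = spec[t][f]
    return harmonic, percussive

# Toy spectrogram (assumed): a sustained tone in bin 2 plus a click in frame 4.
spec = [[0.0] * 9 for _ in range(9)]
for t in range(9):
    spec[t][2] = 1.0   # horizontal ridge: harmonic
for f in range(9):
    spec[4][f] = 1.0   # vertical ridge: percussive

harmonic, percussive = hpss(spec)
```

The two outputs sum back to the input, so the harmonic part can feed pitch analysis and the percussive part can feed onset and tempo analysis.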

• Source separation can sometimes be helpful
‒ singing voice separation: given a mixture, separate the singing voice from the accompaniment

• Pitch, tempo, and timbre play different roles in different tasks
• Spectrogram: a basic feature representation
• Multi-pitch estimation: for better pitch info
• Source separation: might improve the extraction of pitch, tempo, and also timbre
• Feature design (based on domain knowledge) versus feature learning (data-driven; deep learning)

New Opportunities in the Big Data Era
• Big music audio data? Not unless you work for a big company
‒ not sharable due to copyright issues and business interests
‒ however, audio features can be shared
‒ or, start with copyright-free music (e.g. the Free Music Archive)

• Big music listening data? Yes, some of it can be crawled from social platform websites
‒ from the last.fm API and the EchoNest API
‒ from Twitter: #nowplaying dataset

• Big music text data? Yes, plenty of data
‒ scores, lyrics, reviews, playlists, tags, Wikipedia, etc.
‒ not everything is in audio
‒ some information is easier to get from non-audio data

• Big sensor data? Yes, everywhere
‒ sensors attached to “things” or “human beings”
‒ emerging new applications
1) music generation
2) context-aware music recommendation
Figures from pinterest and ask.audio

• The missing “D” in Data Science: domain knowledge
• Music information retrieval
= musicology
+ signal processing
+ machine learning
+ others

Postlude
• Further reading
‒ International Society for Music Information Retrieval Conference (ISMIR)
‒ IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
‒ MIREX (Music Information Retrieval Evaluation eXchange)
‒ IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP)
‒ IEEE Transactions on Multimedia (TMM)
