Music Information Retrieval
Music & Audio Computing Lab,
Research Center for IT Innovation,
Academia Sinica
Yi-Hsuan Yang Ph.D.
http://www.citi.sinica.edu.tw/pages/yang/
yang@citi.sinica.edu.tw
Prelude
• PI @ Music & Audio Computing Lab, Academia Sinica, since 2011
• 10420CS 573100 “Music Information Retrieval” @ NTHU, 2016
https://twtmir.wordpress.com/
2
https://teachingmir.wikispaces.com/courses
Outline
• Types of music related research
• Fundamentals of music signal processing
• New opportunities in the big data era
3
Types of Music Related Research
1. Music creation
4
https://www.youtube.com/watch?v=3OEmzI52stk
Types of Music Related Research
1. Music creation
5
https://www.youtube.com/watch?v=k1DgNfz1g_s
Types of Music Related Research
1. Music creation
6
https://www.youtube.com/watch?v=wj1r9YJ6INA
Types of Music Related Research
1. Music creation
7
http://www.inside.com.tw/2016/05/04/positive-grid-bias-head
Types of Music Related Research
1. Music creation
8
https://youtu.be/rL5YKZ9ecpg?t=50m
Types of Music Related Research
2. Music information
“analysis”
9
[Figures: automatic page turner, automatic Karaoke scoring, interactive concert]
Types of Music Related Research
2. Music information
“analysis”
10
[Figures: chord recognizer, music browsing assistant]
Types of Music Related Research
3. Music information “retrieval”
• Search
‒ through keywords/labels (genre, instrument, emotion)
‒ through audio examples (humming, audio recording)
12
Types of Music Related Research
3. Music information “retrieval”
• Match
‒ to match 1) a video clip, 2) a photo slideshow,
3) song lyrics, or 4) a given context
‒ cross-domain retrieval
13
Types of Music Related Research
3. Music information “retrieval”
• Discover
‒ recommendation: diversity, serendipity, explanations
14
Types of Music Related Research
1. Music creation
• Google Magenta, Smule AutoRap, Samsung Hum-On,
Positive Grid, Yamaha Vocaloid
2. Music information analysis
• Education, data visualization
3. Music information retrieval
• Search: through keywords (genre, instrument, emotion) or
audio examples (humming or audio recording)
• Match: cross domain retrieval
• Discover: recommendation
16
Outline
• Types of music related research
• Fundamentals of music signal processing
• New opportunities in the big data era
17
Fundamentals of Music Signal Processing
• Pitch: which notes are played?
• Tempo: how fast?
• Timbre: which instrument(s)?
18
Mozart’s Variationen
(1st phrase)
Fundamentals of Music Signal Processing
         Karaoke scorer    chord recognizer    page turner
Pitch    ♪♪♪               ♪♪♪                 ♪♪♪
Tempo    ♪                 ♪                   ♪
Timbre   ♪                 ♪♪                  ♪
19
Fundamentals of Music Signal Processing
Pitch ♪♪♪ ♪
Tempo ♪♪♪
Timbre ♪♪♪ ♪
20
Columns: instrument classifier | content ID | Spotify running
Fundamentals of Music Signal Processing
         similarity search    music emotion or    automatic music
         or recommendation    genre recognizer    video generation
Pitch    ♪♪♪                  ♪♪♪                 ♪♪♪
Tempo    ♪♪♪                  ♪♪♪                 ♪♪♪
Timbre   ♪♪♪                  ♪♪♪                 ♪♪♪
21
Fundamentals of Music Signal Processing
22
• Listens to music
tempo, instrumentation,
key, time signature, energy,
harmonic & timbral structures
• Reads about music
lyrics, blog posts, reviews,
playlists and discussion forums
• Learns about trends
online music behavior — who's
talking about which artists this
week, what songs are being
streamed or downloaded
• Not everything is in audio
Fundamentals of Music Signal Processing
• Let’s have a look at what we can extract from audio
anyway
• Time-domain waveform
23
Fundamentals of Music Signal Processing
• Frequency domain
representation
• Spectrogram (obtained
by Short-Time Fourier
Transform)
24
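A minimal sketch of how a magnitude spectrogram can be obtained with a Short-Time Fourier Transform. It uses only the Python standard library and a naive DFT for readability; the frame size, hop size, Hann window and the 1000 Hz test tone are illustrative assumptions, not values from the slides.

```python
import cmath
import math

def stft_magnitude(signal, frame_size=64, hop=32):
    """Return one magnitude spectrum per Hann-windowed frame (naive DFT)."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size)
              for n in range(frame_size)]
    n_bins = frame_size // 2 + 1  # keep only non-negative frequencies
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_size)]
        spectrum = []
        for k in range(n_bins):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n in range(frame_size))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames

# Toy input (assumed for illustration): a 1000 Hz sine sampled at 8000 Hz.
sample_rate = 8000
frame_size = 64
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(512)]
spec = stft_magnitude(tone, frame_size=frame_size, hop=32)

# The strongest bin of the first frame, converted back to Hz.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / frame_size
```

Each row of `spec` is one time frame and each column one frequency bin, which is the time-frequency grid a spectrogram plot displays. In practice an FFT routine would replace the inner loop.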
Fundamentals of Music Signal Processing
• Pitch
• Simple for monophonic
signals (almost table
lookup)
• Challenging for polyphonic
signals; known as multi-
pitch estimation (MPE)
‒ overlapping partials
‒ missing fundamentals
25
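To illustrate why the monophonic case is "almost table lookup", here is a small sketch: estimate one fundamental frequency by autocorrelation, then map it to a note name. The autocorrelation estimator, the search range, the A4 = 440 Hz equal-temperament reference and the helper names are assumptions made for this example; the slides do not prescribe a method.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def estimate_f0(signal, sample_rate, fmin=80.0, fmax=1000.0):
    """Pick the lag with the highest autocorrelation inside [fmin, fmax]."""
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(signal[n] * signal[n + lag]
                    for n in range(len(signal) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

def freq_to_note(freq_hz):
    """The 'table lookup': nearest equal-tempered note, assuming A4 = 440 Hz."""
    midi = round(69 + 12 * math.log2(freq_hz / 440.0))
    return NOTE_NAMES[midi % 12] + str(midi // 12 - 1)

# Toy monophonic input (assumed): a 440 Hz sine sampled at 8000 Hz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(1024)]
f0 = estimate_f0(tone, sample_rate)
note = freq_to_note(f0)
```

The estimate is quantised to integer lags, so `f0` is only close to 440 Hz, but rounding to the nearest semitone absorbs the error. With several simultaneous notes a single autocorrelation peak is no longer enough, which is where multi-pitch estimation comes in.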
Fundamentals of Music Signal Processing
• Tempo: beats
per minute (bpm)
• Onset detection,
downbeat estimation,
tempo estimation,
beat tracking,
rhythm pattern
extraction
26
[Figure panels: energy-based, spectrum-based]
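A rough sketch of the energy-based route from onsets to tempo: compute frame energies, mark frames where the energy jumps, and convert the median inter-onset interval to bpm. The click-track input, frame size, threshold and function names are all assumptions for illustration; real beat trackers are considerably more elaborate.

```python
from statistics import median

def frame_energy(signal, frame_size):
    """Energy of consecutive non-overlapping frames."""
    return [sum(x * x for x in signal[s:s + frame_size])
            for s in range(0, len(signal) - frame_size + 1, frame_size)]

def detect_onsets(energy, threshold):
    """Frames whose energy rises by more than `threshold` over the previous frame."""
    return [i for i in range(1, len(energy))
            if energy[i] - energy[i - 1] > threshold]

def tempo_bpm(onset_frames, frame_size, sample_rate):
    """Convert the median inter-onset interval to beats per minute."""
    intervals = [b - a for a, b in zip(onset_frames, onset_frames[1:])]
    seconds_per_beat = median(intervals) * frame_size / sample_rate
    return 60.0 / seconds_per_beat

# Toy click track (assumed): a 20-sample click every 0.5 s at 1000 Hz sampling.
sample_rate = 1000
signal = [0.0] * 2000
for start in range(250, 2000, 500):
    for n in range(start, start + 20):
        signal[n] = 1.0

energy = frame_energy(signal, frame_size=50)
onset_frames = detect_onsets(energy, threshold=1.0)
bpm = tempo_bpm(onset_frames, frame_size=50, sample_rate=sample_rate)
```

A spectrum-based variant would replace the frame energy with a measure of change between successive spectra, which tends to cope better with soft onsets.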
Fundamentals of Music Signal Processing
• Timbre: difference in time-frequency distribution
‒ odd-to-even harmonic ratio, decay rate, vibrato, etc.
28
[Figure panels: piano solo, human voice]
Fundamentals of Music Signal Processing
• The spectrogram, or its reduced-dimension version, the
“Mel-spectrogram,” is usually treated as a “raw” feature
representation of music
• Can be treated as an image and then processed by
convolutional neural nets (CNN)
29
figure made by Sander Dieleman
http://benanne.github.io/2014/08/05/spotify-cnns.html
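A minimal sketch of the dimension reduction behind a Mel-spectrogram: a bank of triangular filters, equally spaced on the mel scale, is applied to each spectrum frame. The mel formula used here (2595 log10(1 + f/700)) is one common convention, and the filter count, bin count and sample rate are assumptions for the example.

```python
import math

def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_bins, sample_rate):
    """Triangular filters over n_bins linear bins spanning 0..sample_rate/2."""
    max_mel = hz_to_mel(sample_rate / 2)
    # n_mels + 2 edge points: each filter needs a left edge, centre, right edge.
    edges_hz = [mel_to_hz(max_mel * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [k * (sample_rate / 2) / (n_bins - 1) for k in range(n_bins)]
    bank = []
    for m in range(n_mels):
        left, centre, right = edges_hz[m], edges_hz[m + 1], edges_hz[m + 2]
        row = []
        for f in bin_hz:
            if left < f <= centre:
                row.append((f - left) / (centre - left))   # rising slope
            elif centre < f < right:
                row.append((right - f) / (right - centre))  # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def apply_filterbank(bank, spectrum):
    """Reduce one spectrum frame to one value per mel band."""
    return [sum(w * s for w, s in zip(row, spectrum)) for row in bank]

# Assumed sizes: 257 linear bins reduced to 8 mel bands at 16 kHz sampling.
bank = mel_filterbank(n_mels=8, n_bins=257, sample_rate=16000)
mel_frame = apply_filterbank(bank, [1.0] * 257)
band_widths = [sum(1 for w in row if w > 0) for row in bank]
```

Stacking the reduced frames over time gives the image-like array that a CNN can take as input. The filters widen with frequency, so low frequencies keep finer resolution than high ones.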
Fundamentals of Music Signal Processing
• Chromagram: a better “timbre-invariant” feature
representation for pitch related tasks (e.g. chord
recognition, cover song identification)
‒ merge all the frequency bins
with the same note name
(C, C#, D, D#, …)
‒ 12-dim vector for each
time frame
30
figure made by
Meinard Müller
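The bin-merging step can be sketched as follows: every frequency bin of one spectrum frame is assigned to its nearest equal-tempered note, and bins sharing a note name are summed into a 12-dimensional vector. The A4 = 440 Hz reference, the nearest-note assignment and the toy spectrum are assumptions for this illustration.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def chroma_from_spectrum(magnitudes, sample_rate, frame_size):
    """Fold one magnitude spectrum into a 12-dim pitch-class vector."""
    chroma = [0.0] * 12
    for k in range(1, len(magnitudes)):  # skip the DC bin: 0 Hz has no pitch
        freq = k * sample_rate / frame_size
        midi = round(69 + 12 * math.log2(freq / 440.0))
        chroma[midi % 12] += magnitudes[k]
    return chroma

# Toy spectrum (assumed): unit energy at 220, 440 and 880 Hz, i.e. the note A
# in three octaves. With 8000 Hz sampling and 4000-sample frames, bins are 2 Hz apart.
sample_rate, frame_size = 8000, 4000
magnitudes = [0.0] * (frame_size // 2 + 1)
for hz in (220, 440, 880):
    magnitudes[hz * frame_size // sample_rate] = 1.0

chroma = chroma_from_spectrum(magnitudes, sample_rate, frame_size)
strongest = NOTE_NAMES[chroma.index(max(chroma))]
```

All three octaves land in the same entry, which is why the representation ignores octave and, to a large extent, instrument differences. Applying the function to every frame of a spectrogram yields the chromagram.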
Fundamentals of Music Signal Processing
• Source separation can sometimes be helpful
‒ harmonic/percussive separation: given a mixture, separate
the percussive part from the harmonic part
‒ harmonic: pitch-related info
‒ percussive: tempo-related info
31
(a) original (b) harmonic (c) percussive
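One well-known way to do harmonic/percussive separation is median filtering of the spectrogram: harmonic sounds form horizontal ridges (stable over time), percussive sounds form vertical ridges (broadband, short). The sketch below applies that idea to a tiny hand-made spectrogram. The slides do not say which method produced the figure, so the median-filtering approach, the filter width and the binary mask are assumptions.

```python
from statistics import median

def median_filter_1d(values, width):
    """Running median; the window is truncated at the edges."""
    half = width // 2
    return [median(values[max(0, i - half): i + half + 1])
            for i in range(len(values))]

def hpss(spec, width=3):
    """Split spec[t][f] into (harmonic, percussive) parts with a binary mask."""
    n_frames, n_bins = len(spec), len(spec[0])
    # Harmonic estimate: smooth each frequency bin along time.
    harm = [[0.0] * n_bins for _ in range(n_frames)]
    for f in range(n_bins):
        smoothed = median_filter_1d([spec[t][f] for t in range(n_frames)], width)
        for t in range(n_frames):
            harm[t][f] = smoothed[t]
    # Percussive estimate: smooth each time frame along frequency.
    perc = [median_filter_1d(frame, width) for frame in spec]
    harmonic = [[spec[t][f] if harm[t][f] >= perc[t][f] else 0.0
                 for f in range(n_bins)] for t in range(n_frames)]
    percussive = [[spec[t][f] if harm[t][f] < perc[t][f] else 0.0
                   for f in range(n_bins)] for t in range(n_frames)]
    return harmonic, percussive

# Toy 5x5 spectrogram (assumed): a sustained tone in bin 1 and a click in frame 3.
spec = [[1.0 if (f == 1 or t == 3) else 0.0 for f in range(5)] for t in range(5)]
harmonic, percussive = hpss(spec)
```

On this toy input the sustained tone ends up in `harmonic` and the click in `percussive`; the cell where they cross goes to the harmonic part because ties favour it. Resynthesising audio from the two masked spectrograms would additionally need the phase and an inverse STFT.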
Fundamentals of Music Signal Processing
• Source separation can sometimes be helpful
‒ singing voice separation: given a mixture, separate the
singing voice from the accompaniment
32
Fundamentals of Music Signal Processing
• Pitch, tempo, timbre play different roles in different
tasks
• Spectrogram: a basic feature representation
• Multipitch estimation: for better pitch info
• Source separation: might improve the extraction for
pitch, tempo and also timbre
• Feature design (based on domain knowledge) versus
feature learning (data-driven; deep learning)
33
Outline
• Types of music related research
• Fundamentals of music signal processing
• New opportunities in the big data era
34
New Opportunities in the Big Data Era
• Big music audio data? Not unless you work for a big
company
─ not sharable due to copyright issues and business interests
─ however, audio features can be shared
─ or, start with copyright-free music
35
[Figure: Free Music Archive]
New Opportunities in the Big Data Era
• Big music listening data? Yes, some of it can be
crawled from social platform websites
‒ from the Last.fm API, the Echo Nest API
‒ from Twitter: #nowplaying dataset
36
New Opportunities in the Big Data Era
• Big music text data? Yes, plenty of data
─ scores, lyrics, reviews, playlists, tags, Wikipedia, etc.
─ not everything is in audio
─ some information is easier to get from non-audio data
37
New Opportunities in the Big Data Era
• Big sensor data? Yes, everywhere
─ sensors attached to “things” or “human beings”
─ emerging new applications
1) music generation
2) context aware music
recommendation
38
figure from pinterest
figure from ask.audio
New Opportunities in the Big Data Era
• The missing “D” in Data Science —
domain knowledge
• Music information retrieval
= musicology
+ signal processing
+ machine learning
+ others
39
Postlude
• Further reading
‒ International Society for Music Information Retrieval
Conference (ISMIR)
‒ IEEE International Conference on Acoustics, Speech, and Signal
Processing (ICASSP)
‒ MIREX (Music Information Retrieval Evaluation eXchange)
‒ IEEE/ACM Transactions on Audio, Speech, and Language
Processing (TASLP)
‒ IEEE Transactions on Multimedia (TMM)
40
