Yi-Hsuan Yang is an Associate Research Fellow with Academia Sinica. He received his Ph.D. degree in Communication Engineering from National Taiwan University in 2010 and became an Assistant Research Fellow at Academia Sinica in 2011. He is also an Adjunct Associate Professor with National Tsing Hua University, Taiwan. His research interests include music information retrieval, machine learning, and affective computing. Dr. Yang received the 2011 IEEE Signal Processing Society (SPS) Young Author Best Paper Award, the 2012 ACM Multimedia Grand Challenge First Prize, and the 2014 Ta-You Wu Memorial Research Award of the Ministry of Science and Technology, Taiwan. He is an author of the book Music Emotion Recognition (CRC Press, 2011) and was a tutorial speaker on music affect recognition at the International Society for Music Information Retrieval Conference (ISMIR 2012). In 2014, he served as a Technical Program Co-chair of ISMIR and as a Guest Editor of the IEEE Transactions on Affective Computing and the ACM Transactions on Intelligent Systems and Technology.
1. Music Information Retrieval
Music & Audio Computing Lab,
Research Center for IT Innovation,
Academia Sinica
Yi-Hsuan Yang Ph.D.
http://www.citi.sinica.edu.tw/pages/yang/
yang@citi.sinica.edu.tw
2. Prelude
• PI @ Music & Audio Computing Lab, Academia Sinica, since 2011
• 10420CS 573100 “Music Information Retrieval” @ NTHU, 2016
https://twtmir.wordpress.com/
https://teachingmir.wikispaces.com/courses
3. Outline
• Types of music related research
• Fundamentals of music signal processing
• New opportunities in the big data era
4. Types of Music Related Research
1. Music creation
https://www.youtube.com/watch?v=3OEmzI52stk
5. Types of Music Related Research
1. Music creation
https://www.youtube.com/watch?v=k1DgNfz1g_s
6. Types of Music Related Research
1. Music creation
https://www.youtube.com/watch?v=wj1r9YJ6INA
7. Types of Music Related Research
1. Music creation
http://www.inside.com.tw/2016/05/04/positive-grid-bias-head
8. Types of Music Related Research
1. Music creation
https://youtu.be/rL5YKZ9ecpg?t=50m
9. Types of Music Related Research
2. Music information “analysis”
(figures: automatic page turner, automatic Karaoke scoring, interactive concert)
10. Types of Music Related Research
2. Music information “analysis”
(figures: chord recognizer, music browsing assistant)
11. Types of Music Related Research
3. Music information “retrieval”
• Search
‒ through keywords/labels (genre, instrument, emotion)
12. Types of Music Related Research
3. Music information “retrieval”
• Search
‒ through keywords/labels (genre, instrument, emotion)
‒ through audio examples (humming, audio recording)
13. Types of Music Related Research
3. Music information “retrieval”
• Match
‒ to match 1) a video clip, 2) a photo slideshow, 3) a song's lyrics, or 4) a given context
‒ cross-domain retrieval
14. Types of Music Related Research
3. Music information “retrieval”
• Discover
‒ recommendation: diversity, serendipity, explanations
15. Types of Music Related Research
3. Music information “retrieval”
• Discover
‒ recommendation: diversity, serendipity, explanations
16. Types of Music Related Research
1. Music creation
• Google Magenta, Smule AutoRap, Samsung Hum-On,
Positive Grid, Yamaha Vocaloid
2. Music information analysis
• Education, data visualization
3. Music information retrieval
• Search: through keywords (genre, instrument, emotion) or
audio examples (humming or audio recording)
• Match: cross-domain retrieval
• Discover: recommendation
17. Outline
• Types of music related research
• Fundamentals of music signal processing
• New opportunities in the big data era
18. Fundamentals of Music Signal Processing
• Pitch: which notes are played?
• Tempo: how fast?
• Timbre: which instrument(s)?
(figure: Mozart’s Variationen, 1st phrase)
19. Fundamentals of Music Signal Processing
Task: Karaoke scorer | chord recognizer | page turner
Pitch: ♪♪♪ | ♪♪♪ | ♪♪♪
Tempo: ♪ | ♪ | ♪
Timbre: ♪ | ♪♪ | ♪
20. Fundamentals of Music Signal Processing
Pitch ♪♪♪ ♪
Tempo ♪♪♪
Timbre ♪♪♪ ♪
Tasks: instrument classifier | content ID | Spotify Running
21. Fundamentals of Music Signal Processing
Task: similarity search or recommendation | music emotion or genre recognizer | automatic music video generation
Pitch: ♪♪♪ | ♪♪♪ | ♪♪♪
Tempo: ♪♪♪ | ♪♪♪ | ♪♪♪
Timbre: ♪♪♪ | ♪♪♪ | ♪♪♪
22. Fundamentals of Music Signal Processing
• Listens to music: tempo, instrumentation, key, time signature, energy, harmonic & timbral structures
• Reads about music: lyrics, blog posts, reviews, playlists and discussion forums
• Learns about trends: online music behavior, i.e., who's talking about which artists this week and what songs are being streamed or downloaded
• Not everything is in audio
23. Fundamentals of Music Signal Processing
• Let’s have a look at what we can extract from audio anyway
• Time-domain waveform
24. Fundamentals of Music Signal Processing
• Frequency-domain representation
• Spectrogram (obtained by the Short-Time Fourier Transform)
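The Short-Time Fourier Transform behind the spectrogram can be sketched in a few lines: slide a window along the signal, multiply each frame by a taper, and keep the magnitude of its discrete Fourier transform. This is a minimal pure-Python sketch; the frame length (256), hop (128), periodic Hann window, and the names `hann` and `stft_magnitude` are illustrative assumptions, and the direct O(N²) DFT stands in for the FFT a real system would use.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def stft_magnitude(x, frame_len=256, hop=128):
    """Magnitude spectrogram as a list of frames; each frame holds |X[k]| for k = 0..frame_len // 2.

    A direct DFT is used for readability (O(frame_len^2) per frame); real code would use an FFT.
    """
    window = hann(frame_len)
    n_bins = frame_len // 2 + 1
    frames = []
    for start in range(0, len(x) - frame_len + 1, hop):
        seg = [x[start + i] * window[i] for i in range(frame_len)]
        frames.append([
            abs(sum(seg[i] * cmath.exp(-2j * math.pi * k * i / frame_len) for i in range(frame_len)))
            for k in range(n_bins)
        ])
    return frames


# A 1 kHz sine sampled at 8 kHz lies exactly on bin 1000 / (8000 / 256) = 32.
sr = 8000
x = [math.sin(2 * math.pi * 1000 * t / sr) for t in range(sr // 8)]
spec = stft_magnitude(x)
peak_bins = [max(range(len(frame)), key=frame.__getitem__) for frame in spec]
print(len(spec), peak_bins)  # prints: 6 [32, 32, 32, 32, 32, 32]
```

With a hop of half the frame length, consecutive frames overlap by 50%; only magnitudes are kept, so phase information is discarded, which is the usual choice when the spectrogram is used as a feature.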
25. Fundamentals of Music Signal Processing
• Pitch
• Simple for monophonic signals (almost table lookup)
• Challenging for polyphonic signals; known as multipitch estimation (MPE)
‒ overlapping partials
‒ missing fundamentals
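For a monophonic signal, a rough pitch estimate can be obtained by picking the strongest autocorrelation lag within a plausible range and then mapping the resulting frequency to the nearest equal-tempered note, which is the “table lookup” step. The sketch below assumes A4 = 440 Hz tuning and a 50 to 1000 Hz search range; `estimate_f0` and `note_name` are hypothetical helper names, and autocorrelation is only one of several possible f0 estimators.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def estimate_f0(frame, sr, fmin=50.0, fmax=1000.0):
    """Estimate the fundamental frequency of a monophonic frame from its autocorrelation peak."""
    min_lag = int(sr / fmax)
    max_lag = min(int(sr / fmin), len(frame) - 1)
    best_lag, best_r = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        r = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return sr / best_lag


def note_name(f0, a4=440.0):
    """Map a frequency to the nearest equal-tempered note (MIDI numbering, C4 = 60)."""
    midi = round(69 + 12 * math.log2(f0 / a4))
    return f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}"


sr = 8000
frame = [math.sin(2 * math.pi * 440 * t / sr) for t in range(1024)]
f0 = estimate_f0(frame, sr)
print(round(f0, 1), note_name(f0))
```

The lag resolution is one sample, so a 440 Hz tone at 8 kHz is estimated as 8000 / 18 ≈ 444.4 Hz, which still rounds to A4. For polyphonic audio this approach breaks down for exactly the reasons listed above: partials of different notes overlap and the autocorrelation peak no longer corresponds to a single note.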
26. Fundamentals of Music Signal Processing
• Tempo: beats per minute (bpm)
• Onset detection, downbeat estimation, tempo estimation, beat tracking, rhythm pattern extraction
(figure panels: energy-based, spectrum-based)
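An energy-based onset detection function and a simple tempo estimate can be sketched as follows: frame energies are differenced and half-wave rectified to form a novelty curve, and the autocorrelation of that curve is searched for the strongest lag between 60 and 200 bpm. The frame size, hop, bpm range, and the names `frame_energy`, `energy_novelty`, and `estimate_tempo` are assumptions; this is one textbook approach, not a full beat tracker.

```python
def frame_energy(x, frame_len, hop):
    """Sum of squared samples for each frame."""
    return [sum(v * v for v in x[s:s + frame_len])
            for s in range(0, len(x) - frame_len + 1, hop)]


def energy_novelty(energy):
    """Half-wave rectified first difference: only increases in energy count as onsets."""
    prev, nov = 0.0, []
    for e in energy:
        nov.append(max(0.0, e - prev))
        prev = e
    return nov


def estimate_tempo(novelty, frame_rate, min_bpm=60.0, max_bpm=200.0):
    """Return the bpm of the novelty-autocorrelation lag with the largest value."""
    min_lag = int(round(60.0 * frame_rate / max_bpm))
    max_lag = int(round(60.0 * frame_rate / min_bpm))
    best_lag, best_r = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        r = sum(novelty[n] * novelty[n + lag] for n in range(len(novelty) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return 60.0 * frame_rate / best_lag


# Synthetic click track at 120 bpm: one 5 ms click every 0.5 s, 8 s at 8 kHz.
sr, hop = 8000, 80
x = [0.0] * (8 * sr)
for beat in range(0, len(x), sr // 2):
    for i in range(40):
        x[beat + i] = 1.0
nov = energy_novelty(frame_energy(x, frame_len=hop, hop=hop))
print(estimate_tempo(nov, frame_rate=sr / hop))  # prints: 120.0
```

A spectrum-based variant (often called spectral flux) would replace frame energy with the rectified frame-to-frame increase of each STFT bin, summed over bins, which is usually more sensitive to soft onsets whose overall energy barely changes.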
27. Fundamentals of Music Signal Processing
• Timbre: difference in time-frequency distribution
28. Fundamentals of Music Signal Processing
• Timbre: difference in time-frequency distribution
‒ odd-to-even harmonic ratio, decay rate, vibrato, etc.
(figure panels: piano solo, human voice)
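One of the timbre descriptors listed above, the odd-to-even harmonic ratio, can be sketched by measuring the amplitude at each multiple of a known f0 with a single-bin DFT. Definitions differ between toolkits; this sketch assumes the ratio of summed odd-harmonic energy (fundamental included) to summed even-harmonic energy, a known f0, and a signal containing an integer number of periods. `harmonic_amplitudes` and `odd_even_ratio` are hypothetical names.

```python
import cmath
import math


def harmonic_amplitudes(x, sr, f0, n_harmonics):
    """Amplitude of the DFT evaluated at h * f0 for h = 1..n_harmonics (scaled so a unit sine gives 1)."""
    n = len(x)
    amps = []
    for h in range(1, n_harmonics + 1):
        w = -2j * math.pi * h * f0 / sr
        amps.append(abs(sum(x[t] * cmath.exp(w * t) for t in range(n))) * 2 / n)
    return amps


def odd_even_ratio(amps):
    """Energy of harmonics 1, 3, 5, ... divided by energy of harmonics 2, 4, 6, ..."""
    odd = sum(a * a for i, a in enumerate(amps) if i % 2 == 0)   # list index 0 is harmonic 1
    even = sum(a * a for i, a in enumerate(amps) if i % 2 == 1)
    return odd / even if even > 0 else float("inf")


sr, f0 = 8000, 100.0
true_amps = [1.0, 0.5, 0.5, 0.25]  # harmonics 1..4
x = [sum(a * math.sin(2 * math.pi * (h + 1) * f0 * t / sr) for h, a in enumerate(true_amps))
     for t in range(sr)]  # one second = exactly 100 periods
amps = harmonic_amplitudes(x, sr, f0, 4)
print([round(a, 3) for a in amps], round(odd_even_ratio(amps), 3))  # prints: [1.0, 0.5, 0.5, 0.25] 4.0
```

A tone with only odd harmonics (an idealized square wave, or roughly a clarinet in its low register) pushes this ratio toward infinity, while a tone with all harmonics at similar levels stays near 1, which is why the descriptor helps separate instrument families.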
29. Fundamentals of Music Signal Processing
• The spectrogram, or its reduced-dimension version, the “mel-spectrogram,” is usually considered a “raw” feature representation of music
• It can be treated as an image and then processed by convolutional neural networks (CNNs)
(figure made by Sander Dieleman, http://benanne.github.io/2014/08/05/spotify-cnns.html)
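The reduction from a spectrogram to a mel-spectrogram amounts to multiplying each power-spectrum frame by a bank of triangular filters spaced evenly on the mel scale. This sketch assumes the HTK-style mel formula 2595 · log10(1 + f / 700) and unnormalized triangular filters; libraries differ in both choices (e.g., a Slaney-style mel scale and area-normalized filters), and `mel_filterbank` and `apply_filterbank` are hypothetical names.

```python
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sr, fmin=0.0, fmax=None):
    """Return n_mels triangular filters, each a list of weights over n_fft // 2 + 1 FFT bins."""
    fmax = sr / 2 if fmax is None else fmax
    bin_freqs = [k * sr / n_fft for k in range(n_fft // 2 + 1)]
    # n_mels + 2 edge frequencies, equally spaced on the mel scale.
    lo, hi = hz_to_mel(fmin), hz_to_mel(fmax)
    edges = [mel_to_hz(lo + (hi - lo) * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bank = []
    for m in range(n_mels):
        left, center, right = edges[m], edges[m + 1], edges[m + 2]
        filt = []
        for f in bin_freqs:
            if left <= f <= center:
                filt.append((f - left) / (center - left))    # rising slope
            elif center < f <= right:
                filt.append((right - f) / (right - center))  # falling slope
            else:
                filt.append(0.0)
        bank.append(filt)
    return bank


def apply_filterbank(bank, power_frame):
    """Mel-band energies of one power-spectrum frame (a matrix-vector product)."""
    return [sum(w * p for w, p in zip(filt, power_frame)) for filt in bank]


bank = mel_filterbank(n_mels=40, n_fft=512, sr=16000)
print(len(bank), len(bank[0]))  # prints: 40 257
```

Applying `apply_filterbank` to every frame of a power spectrogram turns 257 linear-frequency bins into 40 mel bands, narrow at low frequencies and wide at high frequencies; a log of these energies is the image-like input typically fed to a CNN.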
30. Fundamentals of Music Signal Processing
• Chromagram: a better “timbre-invariant” feature representation for pitch-related tasks (e.g., chord recognition, cover song identification)
‒ merge all the frequency bins with the same note name (C, C#, D, D#, …)
‒ 12-dim vector for each time frame
(figure made by Meinard Müller)
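The bin-merging step can be sketched directly: each spectral bin's frequency is mapped to the nearest equal-tempered pitch class, and its energy is added to that class. The sketch assumes A4 = 440 Hz tuning, a single DFT magnitude frame as input, and hard bin-to-class assignment; `dft_magnitude` and `chroma_vector` are hypothetical names, and practical chroma extractors add tuning estimation, smoothing, and finer low-frequency resolution (e.g., via a constant-Q transform).

```python
import cmath
import math

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def dft_magnitude(frame):
    """Magnitude of the DFT for bins 0..n // 2 (direct O(n^2) DFT, for illustration only)."""
    n = len(frame)
    return [abs(sum(frame[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n)))
            for k in range(n // 2 + 1)]


def chroma_vector(magnitudes, sr, n_fft, a4=440.0):
    """Fold a magnitude spectrum into a normalized 12-dim chroma vector (energy per pitch class)."""
    chroma = [0.0] * 12
    for k in range(1, len(magnitudes)):  # skip the DC bin: 0 Hz has no pitch
        freq = k * sr / n_fft
        midi = round(69 + 12 * math.log2(freq / a4))
        chroma[midi % 12] += magnitudes[k] ** 2
    total = sum(chroma)
    return [c / total for c in chroma] if total > 0 else chroma


# An A major triad: A4, C#5, E5.
sr, n_fft = 8000, 512
freqs = [440.0, 554.37, 659.26]
frame = [sum(math.sin(2 * math.pi * f * t / sr) for f in freqs) for t in range(n_fft)]
chroma = chroma_vector(dft_magnitude(frame), sr, n_fft)
top3 = sorted(range(12), key=lambda i: chroma[i], reverse=True)[:3]
print(sorted(PITCH_CLASSES[i] for i in top3))
```

Because the octave is discarded, the same chord played on a piano or a guitar, or voiced in a different register, yields a similar 12-dim vector, which is what makes chroma useful for chord recognition and cover song identification.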
31. Fundamentals of Music Signal Processing
• Source separation can sometimes be helpful
‒ harmonic/percussive separation: given a mixture, separate the percussive part from the harmonic part
‒ harmonic: pitch-related info
‒ percussive: tempo-related info
(figure panels: (a) original, (b) harmonic, (c) percussive)
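Harmonic/percussive separation can be sketched with the median-filtering idea popularized by FitzGerald: harmonic (sustained) components form horizontal lines in the spectrogram, so a median filter along time preserves them, while percussive components form vertical lines, so a median filter along frequency preserves them. The kernel size, the soft-mask exponent, the edge handling, and the names `median_filter_time`, `median_filter_freq`, and `hpss_masks` are assumptions; the spectrogram is a plain list of frames (time × frequency).

```python
from statistics import median


def median_filter_time(spec, kernel):
    """Median over neighbouring frames for every bin; windows are clipped at the edges."""
    half, n_t, n_f = kernel // 2, len(spec), len(spec[0])
    return [[median(spec[u][f] for u in range(max(0, t - half), min(n_t, t + half + 1)))
             for f in range(n_f)] for t in range(n_t)]


def median_filter_freq(spec, kernel):
    """Median over neighbouring bins within each frame; windows are clipped at the edges."""
    half, n_f = kernel // 2, len(spec[0])
    return [[median(frame[max(0, f - half):min(n_f, f + half + 1)]) for f in range(n_f)]
            for frame in spec]


def hpss_masks(spec, kernel=5, power=2.0):
    """Soft masks for the harmonic and percussive parts of a magnitude spectrogram."""
    h = median_filter_time(spec, kernel)
    p = median_filter_freq(spec, kernel)
    mask_h, mask_p = [], []
    for hrow, prow in zip(h, p):
        mh_row = []
        for hv, pv in zip(hrow, prow):
            denom = hv ** power + pv ** power
            mh_row.append(hv ** power / denom if denom > 0 else 0.5)  # 0.5 where both are silent
        mask_h.append(mh_row)
        mask_p.append([1.0 - m for m in mh_row])
    return mask_h, mask_p


# Toy 11 x 11 spectrogram: a steady tone (bin 5 in every frame) plus a click (every bin of frame 5).
n = 11
spec = [[0.0] * n for _ in range(n)]
for t in range(n):
    spec[t][5] = 1.0
for f in range(n):
    spec[5][f] = 1.0
mask_h, mask_p = hpss_masks(spec)
harmonic = [[m * s for m, s in zip(mrow, srow)] for mrow, srow in zip(mask_h, spec)]
percussive = [[m * s for m, s in zip(mrow, srow)] for mrow, srow in zip(mask_p, spec)]
print(mask_h[0][5], mask_p[5][0])  # prints: 1.0 1.0
```

The masks sum to 1 in every cell, so the harmonic and percussive spectrograms add back up to the original; in practice each masked magnitude would be combined with the mixture's STFT phase and inverted to obtain the two audio signals.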
32. Fundamentals of Music Signal Processing
• Source separation can sometimes be helpful
‒ singing voice separation: given a mixture, separate the singing voice from the accompaniment
33. Fundamentals of Music Signal Processing
• Pitch, tempo, and timbre play different roles in different tasks
• Spectrogram: a basic feature representation
• Multipitch estimation: for better pitch information
• Source separation: may improve the extraction of pitch, tempo, and also timbre
• Feature design (based on domain knowledge) versus feature learning (data-driven; deep learning)
34. Outline
• Types of music related research
• Fundamentals of music signal processing
• New opportunities in the big data era
35. New Opportunities in the Big Data Era
• Big music audio data? No, unless you work for a big company
‒ not sharable due to copyright issues and business interests
‒ however, audio features can be shared
‒ or, start with copyright-free music
(figure: Free Music Archive)
36. New Opportunities in the Big Data Era
• Big music listening data? Yes, some of it can be crawled from social platforms
‒ from the last.fm API and the Echo Nest API
‒ from Twitter: the #nowplaying dataset
37. New Opportunities in the Big Data Era
• Big music text data? Yes, plenty of it
‒ scores, lyrics, reviews, playlists, tags, Wikipedia, etc.
‒ not everything is in audio
‒ some information is easier to get from non-audio data
38. New Opportunities in the Big Data Era
• Big sensor data? Yes, everywhere
‒ sensors attached to “things” or “human beings”
‒ emerging new applications:
1) music generation
2) context-aware music recommendation
(figures from Pinterest and ask.audio)
39. New Opportunities in the Big Data Era
• The missing “D” in Data Science: domain knowledge
• Music information retrieval = musicology + signal processing + machine learning + others
40. Postlude
• Further reading
‒ International Society for Music Information Retrieval Conference (ISMIR)
‒ IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
‒ MIREX (Music Information Retrieval Evaluation eXchange)
‒ IEEE Transactions on Audio, Speech, and Language Processing (TASLP)
‒ IEEE Transactions on Multimedia (TMM)