Research projects we did in 2017 at the Music & Audio Computing (MAC) Lab, Research Center for IT Innovation, Academia Sinica, Taipei, Taiwan. The talk has three parts: 1) vocal separation, 2) music generation, and 3) AI DJ.
1. Research at MAC Lab, Academia Sinica, in 2017
Music & Audio Computing (MAC) Lab,
Research Center for IT Innovation,
Academia Sinica
Yi-Hsuan Yang Ph.D.
http://www.citi.sinica.edu.tw/pages/yang/
yang@citi.sinica.edu.tw
2. Music Information “Research” (MIR)
• Intelligent ways to analyze, retrieve, and create music
1. Music information analysis (music → features)
• multi-pitch estimation
• beat/downbeat detection
• chord recognition
• melody extraction
• audio-score alignment
• music transcription
• source separation
2. Music information retrieval (query → music)
• query by humming
• music identification
• music classification (by genre, emotion)
• similarity search
• music recommendation
• playlist generation
3. Music generation (X → music)
• accompaniment
• generation
• style transfer
• mashup/remix
4. Project #1: Singing Voice Separation
• Create the singing-only version
Input: audio mixture
Output: vocal (singing only) + accompaniment (karaoke)
Or, output all the individual tracks
• Why?
The un-mixed, multi-track version is not easy to get
Even more so for old songs or live performances
• Applications
Singing voice processing, karaoke, soundtracks for movies
Vocal melody extraction
Beat/downbeat detection
6. Project #2: Music Generation
• Create music
• Why?
Make musicians’ lives easier
Create copyright-free music (for films, ads, games)
And, well, people think this is AI
Eminem - When I'm Gone
7. Project #2: Music Generation
IBM Watson Beats Sony ->
“Create unique, royalty-free soundtracks for your videos”
8. Project #2a: Pop→8bit Style Transfer
8-bit music, also known as “chip-tunes”
► Demo: https://lemonatsu.github.io/ (pop and 8-bit versions of each song)
Adele - Someone Like You
Christina Perri - Jar of Hearts
John Legend - All of Me
ICASSP’17
9. Project #2b: Lead Sheet Generation
• Lead sheet: melody + chords
• Given chords, generate melody
• Given melody, generate chords (a.k.a. harmonization)
• Or, generate from scratch
10. Melody Generation by RNN
Model | Google MelodyRNN | C-RNN-GAN | Song from PI | DeepBach | Google WaveNet
core model | RNN | RNN | RNN | RNN | CNN
data type | symbolic | symbolic | symbolic | symbolic | audio
genre specificity | ─ | ─ | ─ | Bach chorale | ─
mandatory prior knowledge | priming melody | ─ | music scale & melody profile | melody of one part | priming wave
follow a priming melody: V V V
follow a chord sequence:
generate from scratch: V
generate multi-part music: V V V
open source: V V
11. Melody Generation by CNN+GAN
Model | Google MelodyRNN | MidiNet | Google WaveNet
core model | RNN | CNN | CNN
data type | symbolic | symbolic | audio
genre specificity | ─ | ─ | ─
mandatory prior knowledge | priming melody | ─ | priming wave
follow a priming melody | V | V | V
follow a chord sequence: V
generate from scratch: V
generate multi-part music: V V
open source: V V
Google MelodyRNN
• By Google
• RNN
• Trained with thousands of melodies
MidiNet
• By MAC Lab
• CNN
• 526 tabs (4,208 bars)
• One GPU (GTX 1080)
• <30 mins
ISMIR’17
12. Algorithm 1/5: Desired Output
• Generate the melody of a bar at a time
• Use a matrix to represent the music of a bar
• Condition on the previous bar (the history)
[Figure: piano-roll matrix of the current bar, 84 notes × 96 time steps, shown with the previous bar and the next bar]
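The bar-by-bar scheme above can be sketched as follows. This is only an illustration of the data layout (an 84 × 96 binary matrix per bar) and of conditioning each bar on the previous one. `toy_generator` is a hypothetical stand-in for a trained model, and the quarter-note length assumes 4/4 time; neither comes from the slides.

```python
N_NOTES = 84   # rows: pitches (value from the slide)
N_STEPS = 96   # columns: time steps in one bar (value from the slide)

def empty_bar():
    """Return an all-zero N_NOTES x N_STEPS piano-roll matrix."""
    return [[0] * N_STEPS for _ in range(N_NOTES)]

def add_note(bar, note, start, length):
    """Switch on `note` for `length` time steps beginning at `start`."""
    for step in range(start, min(start + length, N_STEPS)):
        bar[note][step] = 1

def toy_generator(previous_bar):
    """Hypothetical stand-in for a trained model: play one quarter note,
    one row above the highest note of the previous bar (row 40 if empty)."""
    active = [n for n, row in enumerate(previous_bar) if any(row)]
    note = min(active[-1] + 1, N_NOTES - 1) if active else 40
    bar = empty_bar()
    add_note(bar, note, start=0, length=N_STEPS // 4)  # assumes 4/4 time
    return bar

def generate(n_bars, generator=toy_generator):
    """Generate bars one at a time, each conditioned on the previous bar."""
    bars, previous = [], empty_bar()
    for _ in range(n_bars):
        current = generator(previous)
        bars.append(current)
        previous = current
    return bars

melody = generate(3)
```

Here `melody` is a list of three matrices; a real generator would replace `toy_generator` but keep the same input and output shapes.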
13. Algorithm 2/5: Main Idea
• Generative adversarial nets (GAN)
Discriminator: tell real from fake
Generator: fool the discriminator
• Generate from scratch
[Figure label: “real or fake?”]
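As a rough illustration of the two roles, the sketch below computes the standard GAN losses from discriminator outputs (probabilities that a sample is real). The slides do not say which loss variant is used, so the non-saturating generator loss here is an assumption, and the function names are hypothetical.

```python
import math

def discriminator_loss(d_real, d_fake):
    """Mean of -[log D(x) + log(1 - D(G(z)))] over paired outputs.
    Low when real samples score near 1 and fake samples near 0."""
    terms = [-(math.log(r) + math.log(1.0 - f)) for r, f in zip(d_real, d_fake)]
    return sum(terms) / len(terms)

def generator_loss(d_fake):
    """Mean of -log D(G(z)) (non-saturating form, an assumption here).
    Low when the discriminator is fooled into scoring fakes near 1."""
    return sum(-math.log(f) for f in d_fake) / len(d_fake)

# Discriminator outputs in two hypothetical situations.
confident = discriminator_loss(d_real=[0.9, 0.8], d_fake=[0.1, 0.2])
confused = discriminator_loss(d_real=[0.5, 0.5], d_fake=[0.5, 0.5])
```

A discriminator that tells real from fake (`confident`) has a lower loss than one that outputs 0.5 for everything (`confused`); the generator's loss moves the opposite way.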
14. Algorithm 3/5: Main Idea
• Generative adversarial nets (GAN)
Discriminator: tell real from fake
Generator: fool the discriminator
• Generate from scratch
• Or, given chord, generate melody
[Figure label: “real or fake?”]
15. Algorithm 4/5: Temporal Model
• Conditioner: provide 2-D conditions
use the same filter shapes as the generator CNN, so that their intermediate outputs are “compatible”
[Figure label: “real or fake?”]
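One way to read “compatible” is that the conditioner's intermediate outputs have the same sizes as the generator's. The sketch below checks this with plain size arithmetic, assuming the generator upsamples with transposed convolutions and the conditioner applies ordinary convolutions with the same filter shapes in reverse order. The filter shapes and this pairing are assumptions for illustration, not details given on the slide.

```python
def conv_out(size, kernel, stride):
    """Output length of a strided convolution without padding."""
    return (size - kernel) // stride + 1

def transposed_conv_out(size, kernel, stride):
    """Output length of a transposed convolution without padding."""
    return (size - 1) * stride + kernel

# Hypothetical (kernel, stride) filter shapes along the time axis.
FILTERS = [(2, 2), (2, 2), (2, 2), (2, 2)]

def generator_widths(start=1):
    """Widths after each generator layer (upsampling)."""
    widths = [start]
    for kernel, stride in FILTERS:
        widths.append(transposed_conv_out(widths[-1], kernel, stride))
    return widths

def conditioner_widths(start):
    """Widths after each conditioner layer (downsampling, same filters)."""
    widths = [start]
    for kernel, stride in reversed(FILTERS):
        widths.append(conv_out(widths[-1], kernel, stride))
    return widths

gen = generator_widths()             # [1, 2, 4, 8, 16]
cond = conditioner_widths(gen[-1])   # [16, 8, 4, 2, 1]
```

Because `cond` is `gen` reversed, each conditioner output has the same width as one generator layer, so the two can be combined layer by layer.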
16. Algorithm 5/5
• Generative adversarial nets (GAN)
Discriminator: tell real from fake
Generator: fool the discriminator
• The “desired output” is unknown (for example, what should be played next)
Only know whether it “sounds real”
• Does not explicitly use music theory
As opposed to more traditional methods, such as rule-based ones or those based on finite-state machines
Therefore, can be used for non-Western classical music
[Figure label: “real or fake?”]
19. Project #2c: Multi-track Generation
• Multi-track (not only melody and chord)
Begin Again (2013)
► Demo: https://salu133445.github.io/musegan/
AAAI’18
20. Algorithm 1/7: Data
• LPD (Lakh Pianoroll Dataset): 128K MIDIs (piano-rolls) from LMD (Lakh MIDI Dataset)
http://colinraffel.com/projects/lmd/
21. Algorithm 2/7: Intra- & Inter-tracks
• Multi-track
piano, guitar, bass, strings, drums
• Hybrid model
one “shared” (inter) z
five “private” (intra) z_i
five generators
one discriminator
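A minimal sketch of how the shared and private random vectors might be combined, assuming each track's generator receives the shared z concatenated with its own z_i. The concatenation, the vector sizes, and the function name are assumptions for illustration.

```python
import random

TRACKS = ["piano", "guitar", "bass", "strings", "drums"]

def sample_latents(shared_dim=4, private_dim=4, rng=None):
    """Return one input vector per track: shared z followed by private z_i."""
    rng = rng or random.Random()
    shared = [rng.gauss(0.0, 1.0) for _ in range(shared_dim)]   # inter-track z
    inputs = {}
    for track in TRACKS:
        private = [rng.gauss(0.0, 1.0) for _ in range(private_dim)]  # intra-track z_i
        inputs[track] = shared + private   # concatenation (assumed)
    return inputs

inputs = sample_latents(rng=random.Random(0))
```

Every track sees the same first `shared_dim` values, which is what lets the five generators coordinate, while the remaining values differ per track.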
22. Algorithm 3/7: Temporal Model
(a) generation from scratch
(b) track-conditional generation
26. Algorithm 6/7: G & D
• Grow time steps first: 1 → 2 → 4 → 8 → 16 → 32 → 96
• Then notes (freq): octave (7), then pitch (84)
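The growth schedule can be checked with a little arithmetic. The sketch below derives the per-layer upsampling factors implied by the sizes on the slide; starting the note axis from size 1 is an assumption.

```python
TIME_STEPS = [1, 2, 4, 8, 16, 32, 96]   # sizes from the slide
NOTE_SIZES = [1, 7, 84]                 # 7 octaves, then 84 pitches; the initial 1 is assumed

def growth_factors(sizes):
    """Upsampling factor between each pair of consecutive sizes."""
    return [after // before for before, after in zip(sizes, sizes[1:])]

time_factors = growth_factors(TIME_STEPS)   # [2, 2, 2, 2, 2, 3]
note_factors = growth_factors(NOTE_SIZES)   # [7, 12]
```

The last time factor is 3 rather than 2 because 96 = 32 × 3, and the note axis expands by 12 in its final step because each of the 7 octaves holds 12 pitches.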
27. Algorithm 7/7: Objective Metrics
• EB: ratio of empty bars (in %)
• UPC: number of used pitch classes per bar (from 0 to 12)
• QN: ratio of “qualified” notes (in %); a note no shorter than three time steps (i.e., a 32nd note) counts as qualified; QN shows whether the music is overly fragmented
• DP, or drum pattern: ratio of notes in 8- or 16-beat patterns, which are common for rock songs in 4/4 time (in %)
• TD, or tonal distance: the harmonicity between a pair of tracks; a larger TD implies weaker inter-track harmonic relations
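The first three metrics are simple enough to sketch on a binary piano-roll. The code below is an illustrative reading of the definitions above, not the evaluation code behind the reported numbers. It assumes each bar is an 84 × 96 matrix (pitch × time step) and treats a note as a run of consecutive active time steps on one pitch, so repeated notes without a gap are counted as one.

```python
def make_bar(notes, n_pitches=84, n_steps=96):
    """Build a binary piano-roll bar from (pitch, start, length) triples."""
    bar = [[0] * n_steps for _ in range(n_pitches)]
    for pitch, start, length in notes:
        for step in range(start, start + length):
            bar[pitch][step] = 1
    return bar

def empty_bar_ratio(bars):
    """EB: percentage of bars with no active note."""
    empty = sum(1 for bar in bars if not any(any(row) for row in bar))
    return 100.0 * empty / len(bars)

def used_pitch_classes(bar):
    """UPC: number of distinct pitch classes (row index mod 12) in a bar."""
    return len({pitch % 12 for pitch, row in enumerate(bar) if any(row)})

def note_lengths(bar):
    """Lengths of runs of consecutive active time steps, per pitch."""
    lengths = []
    for row in bar:
        run = 0
        for cell in row:
            if cell:
                run += 1
            elif run:
                lengths.append(run)
                run = 0
        if run:
            lengths.append(run)
    return lengths

def qualified_note_ratio(bar, min_steps=3):
    """QN: percentage of notes lasting at least `min_steps` time steps."""
    lengths = note_lengths(bar)
    if not lengths:
        return 0.0
    return 100.0 * sum(1 for n in lengths if n >= min_steps) / len(lengths)

# Three notes: a quarter note, a too-short note an octave up, a 16th note.
bar_a = make_bar([(0, 0, 24), (12, 24, 2), (4, 30, 6)])
bar_b = make_bar([])
print(empty_bar_ratio([bar_a, bar_b]))         # 50.0
print(used_pitch_classes(bar_a))               # 2
print(round(qualified_note_ratio(bar_a), 1))   # 66.7
```

In `bar_a`, rows 0 and 12 share a pitch class, so UPC is 2, and the two-step note is the only unqualified one of three.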
[Figure panels: UPC, QN]
33. Project #3a: Music Thumbnailing
1. Extract music highlights
“Automatic DJ mix generation using highlight detection,”
Proc. ISMIR 2017 (from Clova Line WAVE)
“Music thumbnailing via neural attention modeling of music
emotion,” Proc. APSIPA ASC 2017 (from MAC Lab)
周杰倫 - 稻香
光良 - 童話
胡夏 - 那些年
Linkin Park - Burn It Down
Adam Lambert - Whataya Want from Me
34. Algorithm: Music Thumbnailing
• CNN for emotion prediction + attention (predicting weights of different parts of a song)
• Transfer learning: no need for structural (chorus) labels
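Given per-chunk attention weights, one plausible way to turn them into a thumbnail is to pick the run of consecutive chunks with the largest total weight. The slide does not spell out the selection rule, so the sketch below is an assumption; `pick_highlight` and the example weights are hypothetical.

```python
def pick_highlight(attention, window):
    """Return (start, end) chunk indices of the `window` consecutive chunks
    whose attention weights sum to the largest value (end is exclusive)."""
    if not 0 < window <= len(attention):
        raise ValueError("window must be between 1 and len(attention)")
    starts = range(len(attention) - window + 1)
    best = max(starts, key=lambda s: sum(attention[s:s + window]))
    return best, best + window

# Hypothetical attention weights over six chunks of a song.
weights = [0.05, 0.10, 0.40, 0.30, 0.10, 0.05]
print(pick_highlight(weights, 2))   # (2, 4)
```

With these weights, chunks 2 and 3 carry most of the attention, so they form the highlight.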
APSIPA ASC’17
35. Project #3b: Music Sequencing
2. Find an ordering of music pieces
“Automatic playlist sequencing and transitions,”
Proc. ISMIR 2017 (from )
“Generating music medleys via playing music puzzle games by
unsupervised similarity embedding,” (from MAC Lab)
► Demo: https://remyhuang.github.io/
36. Algorithm 1/2: Music Puzzle Games
• Divide a song into several non-overlapping chunks
• Learn to order them by a Siamese CNN network
Positive pair: R1R2, R2R3
Negative pair: R2R1, R3R2, R1R3, R3R1
Unsupervised (self-supervised) learning
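The pair construction above needs no labels, only the original chunk order. A minimal sketch, assuming a positive pair is two chunks in their original consecutive order and every other ordered pair is negative (which reproduces the R1 to R3 example); the function name is hypothetical.

```python
from itertools import permutations

def make_pairs(chunks):
    """Split all ordered chunk pairs into positives (consecutive, in order)
    and negatives (everything else)."""
    positives, negatives = [], []
    for i, j in permutations(range(len(chunks)), 2):
        pair = (chunks[i], chunks[j])
        if j == i + 1:
            positives.append(pair)
        else:
            negatives.append(pair)
    return positives, negatives

positives, negatives = make_pairs(["R1", "R2", "R3"])
print(positives)   # [('R1', 'R2'), ('R2', 'R3')]
print(negatives)   # [('R1', 'R3'), ('R2', 'R1'), ('R3', 'R1'), ('R3', 'R2')]
```

These pairs would then serve as training examples for the Siamese network, which learns to score whether the second chunk follows the first.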
AAAI’18b
37. Algorithm 2/2: Similarity Embedding Net
a) Shuffle and learn: Unsupervised learning using temporal order verification (ECCV’16)
b) Unsupervised representation learning by sorting sequences (2017)
39. Project #3: DJnet Tasks
1. Music thumbnailing: extract a short snippet that represents a whole song
2. Music sequencing: find an ordering of music pieces
3. Music mash-up: blend two or more pre-recorded songs (related to vocal separation)
4. EDM sub-genre classification (related to music classification)
5. EDM generation (related to music generation)