Research at MAC Lab, Academia Sinica, in 2017

Music & Audio Computing (MAC) Lab,
Research Center for IT Innovation,
Academia Sinica
Yi-Hsuan Yang, Ph.D.
http://www.citi.sinica.edu.tw/pages/yang/
yang@citi.sinica.edu.tw

Music Information “Research” (MIR)
• Intelligent ways to analyze, retrieve, and create music
1. Music information analysis (music → features)
• multi-pitch estimation
• beat/downbeat detection
• chord recognition
• melody extraction
• audio-score alignment
• music transcription
• source separation
2. Music information retrieval (query → music)
• query by humming
• music identification
• music classification (by genre, emotion)
• similarity search
• music recommendation
• playlist generation
3. Music generation (X → music)
• accompaniment
• generation
• style transfer
• mashup/remix


Project #1: Singing Voice Separation
• Create the singing-only version
 Input: audio mixture
 Output: vocal (singing only) + accompaniment (karaoke)
 Or, output all the individual tracks
• Why?
 The un-mixed, multi-track version is not easy to get
 Even more so for old songs or live performances
• Applications
 Singing voice processing, karaoke, soundtracks for movies
 Vocal melody extraction
 Beat/downbeat detection

► Demo: http://ss.ciaua.com/
(by J Bengio’s group: https://js-mim.github.io/mss_pytorch/)
(Yamaha: http://www.y2lab.com/en/project/source_separation/)

Project #2: Music Generation
• Create music
• Why?
 Make musicians’ lives easier
 Create copyright-free music (for films, ads, games)
 And, well, people think this is AI
Eminem - When I'm Gone

Project #2: Music Generation
IBM Watson Beats Sony ->
“Create unique, royalty-free soundtracks for your videos”

Project #2a: Pop→8bit Style Transfer
• 8-bit music, also known as “chip-tunes”
► Demo: https://lemonatsu.github.io/
 Adele - Someone Like You (pop vs. 8-bit)
 Christina Perri - Jar of Hearts (pop vs. 8-bit)
 John Legend - All of Me (pop vs. 8-bit)
ICASSP’17

Project #2b: Lead Sheet Generation
• Lead sheet
 melody
 chord
• Given chord, generate melody
• Given melody, generate chord (a.k.a., harmonization)
• Or, from scratch

Melody Generation by RNN
Systems compared (in column order): Google MelodyRNN, C-RNN-GAN, Song from PI, DeepBach, Google WaveNet
• core model: RNN, RNN, RNN, RNN, CNN
• data type: symbolic, symbolic, symbolic, symbolic, audio
• genre specificity: ─, ─, ─, Bach chorale, ─
• mandatory prior knowledge: priming melody, ─, music scale & melody profile, melody of one part, priming wave
• follow a priming melody: V for 3 of the 5 systems
• follow a chord sequence: no V
• generate from scratch: V for 1 of the 5 systems
• generate multi-part music: V for 3 of the 5 systems
• open source: V for 2 of the 5 systems

Melody Generation by CNN+GAN
Systems compared (in column order): Google MelodyRNN, MidiNet, Google WaveNet
• core model: RNN, CNN, CNN
• data type: symbolic, symbolic, audio
• genre specificity: ─, ─, ─
• mandatory prior knowledge: priming melody, ─, priming wave
• follow a priming melody: V, V, V
• follow a chord sequence: V for 1 of the 3 systems
• generate from scratch: V for 1 of the 3 systems
• generate multi-part music: V for 2 of the 3 systems
• open source: V for 2 of the 3 systems
Google MelodyRNN
• By Google
• RNN
• Trained with thousands of melodies
MidiNet
• By MAC Lab
• CNN
• 526 tabs (4,208 bars)
• One GPU (GTX 1080)
• <30 mins
ISMIR’17

Algorithm 1/5: Desired Output
• Generate the melody of a bar at a time
• Use a matrix to represent the music of a bar
• Condition on the previous bar (the history)
(Figure: matrix of 84 notes × 96 time steps for the current bar, between the previous bar and the next bar)
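
The bar matrix described above can be sketched as follows. This is only an illustration, not the lab's code: the note format `(midi_pitch, start_step, end_step)`, the constant `LOWEST_PITCH`, and the helper `bar_to_matrix` are hypothetical names chosen for this example; only the 84 × 96 shape comes from the slide.

```python
import numpy as np

N_PITCHES = 84      # rows: notes, as on the slide
N_STEPS = 96        # columns: time steps in one bar, as on the slide
LOWEST_PITCH = 24   # assumption: row 0 corresponds to MIDI note 24

def bar_to_matrix(notes):
    """Turn (midi_pitch, start_step, end_step) tuples into a binary bar matrix."""
    bar = np.zeros((N_PITCHES, N_STEPS), dtype=np.uint8)
    for pitch, start, end in notes:
        row = pitch - LOWEST_PITCH
        if 0 <= row < N_PITCHES:
            # Clip the note to the bar boundaries before marking it as active.
            bar[row, max(start, 0):min(end, N_STEPS)] = 1
    return bar

# A hypothetical one-bar melody: three notes, each lasting a quarter of the bar.
melody = [(60, 0, 24), (64, 24, 48), (67, 48, 72)]
current_bar = bar_to_matrix(melody)
previous_bar = np.zeros_like(current_bar)  # empty history for the very first bar
```

Here `previous_bar` stands in for the condition (the history); for the first bar of a piece there is no history, so an all-zero matrix is used as a placeholder.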

Algorithm 2/5: Main Idea
• Generative adversarial nets (GAN)
 Discriminator: tell real from fake
 Generator: fool the discriminator
• Generate from scratch
(Figure: the discriminator decides “real or fake?”)
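
To make the two roles concrete, here is a minimal sketch of the standard GAN losses, computed from discriminator outputs in (0, 1). It assumes the textbook cross-entropy formulation with the non-saturating generator loss; the slide does not say which exact loss MidiNet uses, and the function names are chosen for this example only.

```python
import numpy as np

def discriminator_loss(d_real, d_fake, eps=1e-12):
    """Cross-entropy loss: real bars should score near 1, generated bars near 0."""
    d_real = np.asarray(d_real, dtype=float)
    d_fake = np.asarray(d_fake, dtype=float)
    real_term = -np.mean(np.log(d_real + eps))
    fake_term = -np.mean(np.log(1.0 - d_fake + eps))
    return float(real_term + fake_term)

def generator_loss(d_fake, eps=1e-12):
    """Non-saturating loss: low when the discriminator scores fakes as real."""
    d_fake = np.asarray(d_fake, dtype=float)
    return float(-np.mean(np.log(d_fake + eps)))
```

The discriminator loss drops as it separates real from fake more confidently, while the generator loss drops as its outputs are scored closer to "real", which is the tug-of-war the slide describes.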

Algorithm 3/5: Main Idea
• Generate from scratch
• Or, given chord, generate melody

Algorithm 4/5: Temporal Model
• Conditioner: provide 2-D conditions
 use the same filter shapes as the generator CNN
 so that their intermediate outputs are “compatible”

Algorithm 5/5
• The “desired output” is unknown (for example, what should be played next)
 The model only knows whether the output “sounds real”
• Music theory is not used explicitly
 Unlike more traditional methods, such as rule-based ones or those based on finite state machines
 Therefore, it can be used for non-Western classical music

MidiNet: Evaluation
(Figure: evaluation results for Google Magenta’s RNN models and MidiNet models; top: pro users, bottom: layman)

MidiNet: Examples
• Variants of MidiNet: 1, 2, 3
• Google Magenta vs. MidiNet
• With drums

Project #2c: Multi-track Generation
• Multi-track (not only melody and chord)
Begin Again (2013)
► Demo: https://salu133445.github.io/musegan/
AAAI’18

Algorithm 1/7: Data
• LPD (Lakh Pianoroll Dataset): 128K MIDIs, converted to piano-rolls, from LMD (Lakh MIDI Dataset)
http://colinraffel.com/projects/lmd/

Algorithm 2/7: Intra- & Inter-tracks
• Multi-track: piano, guitar, bass, strings, drums
• Hybrid model
 one “shared” (inter) z
 five “private” (intra) z_i
 five generators
 one discriminator
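
The shared/private split of the random inputs can be sketched as below. This is a hedged illustration: the latent size `Z_DIM`, the use of concatenation to combine z and z_i, and the helper `sample_generator_inputs` are assumptions made for this example and are not stated on the slide.

```python
import numpy as np

TRACKS = ["piano", "guitar", "bass", "strings", "drums"]
Z_DIM = 32  # assumption: the latent size is not given on the slide

def sample_generator_inputs(rng, z_dim=Z_DIM):
    """Return one input vector per track: [shared z ; private z_i]."""
    z_shared = rng.standard_normal(z_dim)        # inter-track: same for all tracks
    inputs = {}
    for track in TRACKS:
        z_private = rng.standard_normal(z_dim)   # intra-track: drawn once per track
        # assumption: the two vectors are simply concatenated
        inputs[track] = np.concatenate([z_shared, z_private])
    return inputs

inputs = sample_generator_inputs(np.random.default_rng(0))
```

Each of the five generators would receive its own entry of `inputs`; the shared half is what lets the tracks coordinate, while the private half lets each track vary on its own.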

Algorithm 3/7: Temporal Model
(a) generation from scratch
(b) track-conditional generation

Algorithm 4/7: Combined Model
generation from scratch

Algorithm 5/7: WGAN-gp
• Beginning – 500:1 D/G updates
• Later – 5:1 D/G updates
• Training time: <24 hours
(Figure: negative D loss)
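
The update schedule above can be written down as a small helper. The slide gives only the two ratios; where the "beginning" ends is not stated, so `warmup_steps` below is a hypothetical parameter with an arbitrary default, and both function names are chosen for this sketch.

```python
def critic_updates_per_generator_step(generator_step, warmup_steps=10):
    """Number of discriminator (D) updates before each generator (G) update."""
    # assumption: "beginning" means the first `warmup_steps` generator updates;
    # the slide does not say where the switch from 500:1 to 5:1 happens.
    return 500 if generator_step < warmup_steps else 5

def total_critic_updates(n_generator_steps, warmup_steps=10):
    """Total D updates needed for a given number of G updates."""
    return sum(
        critic_updates_per_generator_step(step, warmup_steps)
        for step in range(n_generator_steps)
    )
```

For example, with the default warm-up, 12 generator updates would take 10 × 500 + 2 × 5 discriminator updates, which shows how heavily the early phase is weighted toward D.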


Algorithm 6/7: G & D
• Grow time steps first
 1 → 2 → 4 → 8 → 16 → 32 → 96
• Then notes (freq)
 octave (7)
 then, pitch (84)
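
The growth order above implies a fixed sequence of output shapes, sketched here. The upsampling factors are read off the slide's numbers; the starting shape of (1, 1) and the name `generator_shapes` are assumptions for this example.

```python
# Upsampling factors along time: 1 -> 2 -> 4 -> 8 -> 16 -> 32 -> 96
TIME_FACTORS = [2, 2, 2, 2, 2, 3]
# Upsampling factors along pitch: 1 -> 7 octaves -> 84 pitches (12 per octave)
PITCH_FACTORS = [7, 12]

def generator_shapes():
    """List the (time_steps, pitches) shape after each upsampling stage."""
    steps, pitches = 1, 1  # assumption: the generator starts from a 1 x 1 map
    shapes = [(steps, pitches)]
    for factor in TIME_FACTORS:
        steps *= factor
        shapes.append((steps, pitches))
    for factor in PITCH_FACTORS:
        pitches *= factor
        shapes.append((steps, pitches))
    return shapes

shapes = generator_shapes()
```

The last entry of `shapes` is the full bar of 96 time steps by 84 pitches; note that the final time factor is 3 rather than 2, since 32 × 3 = 96.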

Algorithm 7/7: Objective Metrics
• EB: ratio of empty bars (in %)
• UPC: number of used pitch classes per bar (from 0 to 12)
• QN: ratio of “qualified” notes (in %); a note no shorter than three time steps (i.e., a 32nd note) counts as a qualified note; QN shows whether the music is overly fragmented
• DP, or drum pattern: ratio of notes in 8- or 16-beat patterns, which are common for rock songs in 4/4 time (in %)
• TD, or tonal distance: the harmonicity between a pair of tracks; a larger TD implies weaker inter-track harmonic relations
(Figure: UPC and QN)
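
Three of these metrics (EB, UPC, QN) are simple enough to sketch on binary piano-rolls. This is an illustration under stated assumptions, not the evaluation code behind the slide: bars are assumed to be an array of shape (n_bars, time_steps, pitches), a note is assumed to be a run of consecutive active steps on one pitch within a bar, and UPC is averaged over all bars including empty ones.

```python
import numpy as np

def empty_bar_ratio(bars):
    """EB: percentage of bars that contain no active note."""
    empty = ~bars.reshape(bars.shape[0], -1).any(axis=1)
    return 100.0 * empty.mean()

def used_pitch_classes(bars):
    """UPC: mean number of distinct pitch classes (0 to 12) per bar."""
    pitch_class = np.arange(bars.shape[2]) % 12
    used_pitch = bars.any(axis=1)  # shape: (n_bars, pitches)
    counts = [len(set(pitch_class[row])) for row in used_pitch]
    return float(np.mean(counts))

def note_lengths(bar):
    """Lengths, in time steps, of all notes in one bar of shape (steps, pitches)."""
    padded = np.pad(bar.astype(np.int8), ((1, 1), (0, 0)))
    change = np.diff(padded, axis=0)  # +1 at note onsets, -1 at note offsets
    lengths = []
    for p in range(bar.shape[1]):
        onsets = np.flatnonzero(change[:, p] == 1)
        offsets = np.flatnonzero(change[:, p] == -1)
        lengths.extend(offsets - onsets)
    return lengths

def qualified_note_ratio(bars, min_len=3):
    """QN: percentage of notes lasting at least `min_len` time steps."""
    lengths = [n for bar in bars for n in note_lengths(bar)]
    if not lengths:
        return 0.0
    return 100.0 * np.mean([n >= min_len for n in lengths])
```

With 96 time steps per bar, three steps equal one 32nd note, which is why `min_len` defaults to 3.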

Challenge Ahead: Make It Longer
• And with more structure


Personalized DJ
• Smart speaker + recommendation + DJ

Project #3a: Music Thumbnailing
1. Extract music highlights
 “Automatic DJ mix generation using highlight detection,” Proc. ISMIR 2017 (from Clova Line WAVE)
 “Music thumbnailing via neural attention modeling of music emotion,” Proc. APSIPA ASC 2017 (from MAC Lab)
Jay Chou (周杰倫) - Rice Field (稻香)
Michael Wong (光良) - Fairy Tale (童話)
Hu Xia (胡夏) - Those Bygone Years (那些年)
Linkin Park - Burn It Down
Adam Lambert - Whataya Want from Me

Algorithm: Music Thumbnailing
• CNN for emotion prediction + attention (predicting weights of different parts of a song)
• Transfer learning: no need for structural (chorus) labels
APSIPA ASC’17
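
One way to picture the attention step is the sketch below: per-chunk emotion predictions are combined with softmax weights, and the chunk with the largest weight is taken as the thumbnail. Both the softmax pooling and the "largest weight wins" rule are assumptions made for illustration; the slide only says that attention predicts weights for different parts of a song, and the function names are hypothetical.

```python
import numpy as np

def attention_pool(chunk_predictions, attention_logits):
    """Combine per-chunk emotion predictions into one song-level prediction."""
    logits = np.asarray(attention_logits, dtype=float)
    weights = np.exp(logits - logits.max())  # numerically stable softmax
    weights /= weights.sum()
    song_prediction = weights @ np.asarray(chunk_predictions, dtype=float)
    return song_prediction, weights

def pick_highlight(weights):
    """Index of the chunk chosen as the thumbnail."""
    # assumption: the chunk with the largest attention weight is the highlight
    return int(np.argmax(weights))
```

Because the weights are learned only from song-level emotion labels, no chorus annotations are needed, which matches the transfer-learning point above.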

Project #3b: Music Sequencing
2. Find an ordering of music pieces
 “Automatic playlist sequencing and transitions,” Proc. ISMIR 2017 (from )
 “Generating music medleys via playing music puzzle games by unsupervised similarity embedding,” (from MAC Lab)
► Demo: https://remyhuang.github.io/

Algorithm 1/2: Music Puzzle Games
• Divide a song into several non-overlapping chunks
• Learn to order them with a Siamese CNN
 Positive pair: R1R2, R2R3
 Negative pair: R2R1, R3R2, R1R3, R3R1
 Unsupervised (self-supervised) learning
AAAI’18b
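
The pair labeling above can be generated automatically from the chunk order, which is what makes the task self-supervised. A minimal sketch, with `make_pairs` as a hypothetical helper name:

```python
from itertools import permutations

def make_pairs(chunks):
    """Label every ordered pair of distinct chunks from one song."""
    positives, negatives = [], []
    for i, j in permutations(range(len(chunks)), 2):
        pair = (chunks[i], chunks[j])
        # Positive only if the second chunk directly follows the first.
        if j == i + 1:
            positives.append(pair)
        else:
            negatives.append(pair)
    return positives, negatives

positives, negatives = make_pairs(["R1", "R2", "R3"])
```

For three chunks this reproduces the two positive and four negative pairs listed on the slide; in practice the strings would be replaced by the audio chunks fed to the two branches of the Siamese network.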

Algorithm 2/2: Similarity Embedding Net
a) Shuffle and learn: Unsupervised learning using temporal order verification (ECCV’16)
b) Unsupervised representation learning by sorting sequences (2017)

Project #3: DJnet Tasks
1. Music thumbnailing: extract a short snippet that represents a whole song
2. Music sequencing: find an ordering of music pieces
3. Music mash-up: blend two or more pre-recorded songs (related to vocal separation)
4. EDM sub-genre classification (related to music classification)
5. EDM generation (related to music generation)

