Research at MAC Lab,
Academia Sinica, in 2017
Music & Audio Computing (MAC) Lab,
Research Center for IT Innovation,
Academia Sinica
Yi-Hsuan Yang Ph.D.
http://www.citi.sinica.edu.tw/pages/yang/
yang@citi.sinica.edu.tw
Music Information “Research” (MIR)
• Intelligent ways to analyze, retrieve, and create music
1. Music information analysis (music → features)
• multi-pitch estimation
• beat/downbeat detection
• chord recognition
• melody extraction
• audio-score alignment
• music transcription
• source separation
2. Music information retrieval (query → music)
• query by humming
• music identification
• music classification (by genre, emotion)
• similarity search
• music recommendation
• playlist generation
3. Music generation (X → music)
• accompaniment
• generation
• style transfer
• mashup/remix
Project #1: Singing Voice Separation
• Create the singing-only version
  - Input: audio mixture
  - Output: vocal (singing only) + accompaniment (karaoke)
  - Or, output all the individual tracks
• Why?
  - Not easy to get the un-mixed, multi-track version
  - Even more so for old songs or live performance
• Applications
  - Singing voice processing, karaoke, soundtracks for movies
  - Vocal melody extraction
  - Beat/downbeat detection
► Demo: http://ss.ciaua.com/
(by J Bengio’s group: https://js-mim.github.io/mss_pytorch/)
(Yamaha: http://www.y2lab.com/en/project/source_separation/)
Project #2: Music Generation
• Create music
• Why?
  - Make musicians’ lives easier
  - Create copyright-free music (for films, ads, games)
  - And, well, people think this is AI
Eminem - When I'm Gone
Project #2: Music Generation
IBM Watson Beats Sony ->
“Create unique, royalty-free soundtracks for your videos”
Project #2a: Pop→8bit Style Transfer
8-bit music, also known as “chip-tunes”
► Demo: https://lemonatsu.github.io/
Examples (pop original and 8-bit version):
  - Adele - Someone Like You
  - Christina Perri - Jar of Hearts
  - John Legend - All of Me
ICASSP’17
Project #2b: Lead Sheet Generation
• Lead sheet
  - melody
  - chord
• Given chord, generate melody
• Given melody, generate chord (a.k.a. harmonization)
• Or, from scratch
Melody Generation by RNN
Models compared (in column order): Google MelodyRNN | C-RNN-GAN | Song from PI | DeepBach | Google WaveNet
• core model: RNN | RNN | RNN | RNN | CNN
• data type: symbolic | symbolic | symbolic | symbolic | audio
• genre specificity: ─ | ─ | ─ | Bach chorale | ─
• mandatory prior knowledge: priming melody | ─ | music scale & melody profile | melody of one part | priming wave
• follow a priming melody: V V V (3 of 5 models)
• follow a chord sequence: (none)
• generate from scratch: V (1 of 5 models)
• generate multi-part music: V V V (3 of 5 models)
• open source: V V (2 of 5 models)
Melody Generation by CNN+GAN
Models compared (in column order): Google MelodyRNN | MidiNet | Google WaveNet
• core model: RNN | CNN | CNN
• data type: symbolic | symbolic | audio
• genre specificity: ─ | ─ | ─
• mandatory prior knowledge: priming melody | ─ | priming wave
• follow a priming melody: V | V | V
• follow a chord sequence: V (1 of 3 models)
• generate from scratch: V (1 of 3 models)
• generate multi-part music: V V (2 of 3 models)
• open source: V V (2 of 3 models)
• By Google
• RNN
• Trained with thousands of melodies
• By MAC Lab
• CNN
• 526 tabs (4,208 bars)
• One GPU (GTX 1080)
• <30 mins
ISMIR’17
Algorithm 1/5: Desired Output
• Generate the melody one bar at a time
• Use a matrix to represent the music of a bar
• Condition on the previous bar (the history)
[Figure: piano-roll matrix of the current bar (84 notes × 96 time steps), shown between the previous bar and the next bar]
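To make the bar representation concrete, here is a minimal sketch of an 84 × 96 binary piano-roll for one bar. The note-tuple format, the `bar_to_pianoroll` helper, and the lowest pitch (MIDI 24) are assumptions made for illustration; the slides only give the matrix size.

```python
N_PITCHES = 84      # rows: notes (from the slide)
N_STEPS = 96        # columns: time steps in one bar (from the slide)
LOWEST_PITCH = 24   # assumption: MIDI note number mapped to row 0

def bar_to_pianoroll(notes):
    """Convert (midi_pitch, start_step, duration_steps) tuples to a binary matrix."""
    roll = [[0] * N_STEPS for _ in range(N_PITCHES)]
    for pitch, start, duration in notes:
        row = pitch - LOWEST_PITCH
        if not 0 <= row < N_PITCHES:
            continue  # skip notes outside the represented range
        for step in range(max(start, 0), min(start + duration, N_STEPS)):
            roll[row][step] = 1
    return roll

# Hypothetical one-bar melody: four quarter notes (24 steps each).
melody = [(60, 0, 24), (64, 24, 24), (67, 48, 24), (72, 72, 24)]
current_bar = bar_to_pianoroll(melody)
print(len(current_bar), len(current_bar[0]))   # 84 96
print(sum(map(sum, current_bar)))              # 96 active cells
```

A generator that works on such matrices can take the previous bar's matrix as its condition and output the matrix of the next bar.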
Algorithm 2/5: Main Idea
• Generative adversarial nets (GAN)
  - Discriminator: tell real from fake
  - Generator: fool the discriminator
• Generate from scratch
[Figure: discriminator deciding “real or fake?”]
Algorithm 3/5: Main Idea
• Generative adversarial nets (GAN)
  - Discriminator: tell real from fake
  - Generator: fool the discriminator
• Generate from scratch
• Or, given chord, generate melody
[Figure: discriminator deciding “real or fake?”]
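The two roles can be sketched with the standard GAN losses. This is a generic illustration, not necessarily the exact objective used in MidiNet; the non-saturating generator loss and the example probabilities are assumptions.

```python
import math

def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy: push D(real) toward 1 and D(fake) toward 0."""
    return -(math.log(d_real) + math.log(1.0 - d_fake))

def generator_loss(d_fake):
    """Non-saturating generator loss: push D(fake) toward 1."""
    return -math.log(d_fake)

# Hypothetical discriminator outputs (probability that the input is real).
confident_d = discriminator_loss(d_real=0.9, d_fake=0.1)
fooled_d = discriminator_loss(d_real=0.5, d_fake=0.5)
print(confident_d < fooled_d)                      # True
print(generator_loss(0.1) > generator_loss(0.5))   # True
```

The discriminator's loss is low when it separates real from fake well, while the generator's loss falls as its output is scored closer to "real".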
Algorithm 4/5: Temporal Model
• Conditioner: provide 2-D conditions
  - use the same filter shapes as the generator CNN
  - so that their intermediate outputs are “compatible”
[Figure: discriminator deciding “real or fake?”]
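One way to read "compatible" is that each conditioner layer produces a feature map of the same width as some generator layer. The sketch below assumes the conditioner is a strided CNN mirroring the generator's transposed convolutions; the kernel size, stride, layer count, and the 16-step bar are hypothetical values chosen only to show the arithmetic.

```python
def conv_out(n, kernel, stride):
    """Output length of a strided convolution without padding."""
    return (n - kernel) // stride + 1

def transposed_conv_out(n, kernel, stride):
    """Output length of a transposed convolution without padding."""
    return (n - 1) * stride + kernel

KERNEL, STRIDE, LAYERS = 2, 2, 4   # hypothetical filter shape shared by both nets

# Generator: upsample along time from 1 step to a hypothetical bar of 16 steps.
gen_widths = [1]
for _ in range(LAYERS):
    gen_widths.append(transposed_conv_out(gen_widths[-1], KERNEL, STRIDE))

# Conditioner: downsample the previous bar with the same filter shape.
cond_widths = [gen_widths[-1]]
for _ in range(LAYERS):
    cond_widths.append(conv_out(cond_widths[-1], KERNEL, STRIDE))

print(gen_widths)    # [1, 2, 4, 8, 16]
print(cond_widths)   # [16, 8, 4, 2, 1]
```

Because the two lists are mirror images, every intermediate conditioner output has a generator layer of matching width to attach to.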
Algorithm 5/5
• Generative adversarial nets (GAN)
  - Discriminator: tell real from fake
  - Generator: fool the discriminator
• No “desired output” is given (for example, what should be played next)
  - The model only knows whether its output “sounds real”
• Does not explicitly use music theory
  - Unlike more traditional methods, such as rule-based ones or those based on finite-state machines
  - Can therefore be used for non-Western classical music
[Figure: discriminator deciding “real or fake?”]
MidiNet: Evaluation
[Figure: evaluation results for Google Magenta’s RNN models vs. MidiNet models; top: pro users, bottom: laypeople]
MidiNet: Examples
• Variants of MidiNet (examples 1, 2, 3)
• Google Magenta vs. MidiNet
• With drums
Project #2c: Multi-track Generation
• Multi-track (not only melody and chord)
Begin Again (2013)
► Demo: https://salu133445.github.io/musegan/
AAAI’18
Algorithm 1/7: Data
• LPD (Lakh Pianoroll Dataset): 128K MIDIs (piano-rolls) derived from LMD (Lakh MIDI Dataset)
http://colinraffel.com/projects/lmd/
Algorithm 2/7: Intra- & Inter-tracks
• Multi-track: piano, guitar, bass, strings, drums
• Hybrid model
  - one “shared” (inter) z
  - five “private” (intra) zi
  - five generators
  - one discriminator
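The shared/private split can be sketched as follows. Concatenating the shared vector with each private vector, the latent size, and the helper names are assumptions for illustration; the slide only states that there is one shared z and five private zi.

```python
import random

TRACKS = ["piano", "guitar", "bass", "strings", "drums"]
LATENT_DIM = 4   # hypothetical size of each noise vector

def sample_noise(rng, dim=LATENT_DIM):
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]

def sample_generator_inputs(rng):
    """Return one input vector per track: shared z followed by that track's private z_i."""
    shared_z = sample_noise(rng)               # inter-track: same for all generators
    inputs = {}
    for track in TRACKS:
        private_z = sample_noise(rng)          # intra-track: unique to this generator
        inputs[track] = shared_z + private_z   # list concatenation
    return inputs

inputs = sample_generator_inputs(random.Random(0))
print({track: len(vec) for track, vec in inputs.items()})
```

Each of the five generators would consume one of these vectors, and the single discriminator would judge the five generated tracks together.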
Algorithm 3/7: Temporal Model
[Figure: temporal model for (a) generation from scratch and (b) track-conditional generation]
Algorithm 4/7: Combined Model
[Figure: combined model for generation from scratch]
Algorithm 5/7: WGAN-GP
• Beginning: 500:1 D/G updates
• Later: 5:1 D/G updates
• Training time: <24 hours
[Figure: plot of the negative D loss]
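The update ratios can be sketched as a schedule. The point at which training switches from 500:1 to 5:1 is not given on the slide, so `WARMUP_G_STEPS` is a hypothetical value, and the actual network updates (including the gradient-penalty term) are replaced by placeholders.

```python
WARMUP_G_STEPS = 10   # hypothetical: early generator updates that use the 500:1 ratio

def d_updates_per_g_update(g_step):
    """Number of discriminator updates to run before generator update `g_step`."""
    return 500 if g_step < WARMUP_G_STEPS else 5

def training_schedule(total_g_steps):
    """Yield 'D' or 'G' in the order the two networks would be updated."""
    for g_step in range(total_g_steps):
        for _ in range(d_updates_per_g_update(g_step)):
            yield "D"   # placeholder for one discriminator update
        yield "G"       # placeholder for one generator update

schedule = list(training_schedule(12))
print(schedule.count("D"), schedule.count("G"))   # 5010 12
```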
Algorithm 6/7: G & D
• Grow time steps first
  - 1 → 2 → 4 → 8 → 16 → 32 → 96
• Then notes (freq)
  - octave (7)
  - then, pitch (84)
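The growth sequence implies the following per-step factors. Reading each arrow as one upsampling step is an assumption; only the sizes come from the slide.

```python
TIME_STEPS = [1, 2, 4, 8, 16, 32, 96]   # from the slide
OCTAVES, PITCHES = 7, 84                # from the slide

# Upsampling factor applied at each step along the time axis.
time_factors = [b // a for a, b in zip(TIME_STEPS, TIME_STEPS[1:])]
# Factor that expands each octave into individual pitches.
pitches_per_octave = PITCHES // OCTAVES

print(time_factors)                 # [2, 2, 2, 2, 2, 3]
print(pitches_per_octave)           # 12
print((PITCHES, TIME_STEPS[-1]))    # (84, 96)
```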
Algorithm 7/7: Objective Metrics
• EB: ratio of empty bars (in %)
• UPC: number of used pitch classes per bar (from 0 to 12)
• QN: ratio of “qualified” notes (in %); we consider a note no shorter than three time steps (i.e., a 32nd note) as a qualified note; QN shows if the music is overly fragmented
• DP, or drum pattern: ratio of notes in 8- or 16-beat patterns, common ones for Rock songs in 4/4 time (in %)
• TD, or tonal distance: the harmonicity between a pair of tracks; larger TD implies weaker inter-track harmonic relations
[Figure: plots of UPC and QN]
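The first three metrics can be sketched on binary piano-rolls (one 84 × 96 matrix per bar). This is an illustrative reading of the definitions above, not the authors' evaluation code; treating each run of consecutive active cells as one note and mapping row 0 to MIDI pitch 24 are assumptions.

```python
def empty_bar_ratio(bars):
    """EB: percentage of bars that contain no note at all."""
    empty = sum(1 for bar in bars if not any(any(row) for row in bar))
    return 100.0 * empty / len(bars)

def used_pitch_classes(bar, lowest_pitch=24):
    """UPC: number of distinct pitch classes (0 to 12) sounding in one bar."""
    return len({(lowest_pitch + i) % 12 for i, row in enumerate(bar) if any(row)})

def note_lengths(bar):
    """Lengths (in time steps) of all runs of consecutive active cells, per pitch."""
    lengths = []
    for row in bar:
        run = 0
        for cell in row:
            if cell:
                run += 1
            elif run:
                lengths.append(run)
                run = 0
        if run:
            lengths.append(run)
    return lengths

def qualified_note_ratio(bars, min_steps=3):
    """QN: percentage of notes lasting at least `min_steps` time steps."""
    lengths = [n for bar in bars for n in note_lengths(bar)]
    if not lengths:
        return 0.0
    return 100.0 * sum(n >= min_steps for n in lengths) / len(lengths)

def blank_bar(n_pitches=84, n_steps=96):
    return [[0] * n_steps for _ in range(n_pitches)]

# Hypothetical two-bar track: one bar with three notes, one empty bar.
bar_a = blank_bar()
bar_a[0][0:24] = [1] * 24     # 24-step note, pitch class C under the assumption above
bar_a[7][40:46] = [1] * 6     # 6-step note, pitch class G
bar_a[12][30:32] = [1] * 2    # 2-step note, pitch class C again; too short to qualify
bars = [bar_a, blank_bar()]

print(empty_bar_ratio(bars))                  # 50.0
print(used_pitch_classes(bar_a))              # 2
print(round(qualified_note_ratio(bars), 1))   # 66.7
```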
Interpolation Result
Challenge Ahead: Make It Longer
• And with more structure
Project #3: DJnet
Personalized DJ
• Smart speaker + recommendation + DJ
Project #3a: Music Thumbnailing
1. Extract music highlights
  - “Automatic DJ mix generation using highlight detection,” Proc. ISMIR 2017 (from Clova Line WAVE)
  - “Music thumbnailing via neural attention modeling of music emotion,” Proc. APSIPA ASC 2017 (from MAC Lab)
周杰倫 - 稻香
光良 - 童話
胡夏 - 那些年
Linkin Park - Burn It Down
Adam Lambert - Whataya Want from Me
Algorithm: Music Thumbnailing
• CNN for emotion prediction + attention (predicting weights of different parts of a song)
• Transfer learning: no need for structural (chorus) labels
APSIPA ASC’17
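A minimal sketch of the attention idea: per-chunk emotion predictions are combined with attention weights, and the weights are then reused to pick a highlight. Choosing the single chunk with the largest weight, the softmax normalization, and all numbers are assumptions; in the actual system the predictions and scores would come from the CNN.

```python
import math

def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(chunk_predictions, attention_scores):
    """Song-level prediction = attention-weighted sum of per-chunk predictions."""
    weights = softmax(attention_scores)
    n_classes = len(chunk_predictions[0])
    song = [sum(w * pred[c] for w, pred in zip(weights, chunk_predictions))
            for c in range(n_classes)]
    return song, weights

# Hypothetical outputs for a song split into 4 chunks, with 2 emotion classes.
chunk_predictions = [[0.6, 0.4], [0.2, 0.8], [0.1, 0.9], [0.5, 0.5]]
attention_scores = [0.1, 1.5, 2.0, 0.3]

song_prediction, weights = attention_pool(chunk_predictions, attention_scores)
highlight = max(range(len(weights)), key=weights.__getitem__)
print(highlight)   # 2: the chunk with the largest attention weight
```

Only song-level emotion labels are needed to train such a model, which is why no chorus annotations are required.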
Project #3b: Music Sequencing
2. Find an ordering of music pieces
  - “Automatic playlist sequencing and transitions,” Proc. ISMIR 2017 (from )
  - “Generating music medleys via playing music puzzle games by unsupervised similarity embedding,” (from MAC Lab)
► Demo: https://remyhuang.github.io/
Algorithm 1/2: Music Puzzle Games
• Divide a song into several non-overlapping chunks
• Learn to order them with a Siamese CNN
  - Positive pairs: R1R2, R2R3
  - Negative pairs: R2R1, R3R2, R1R3, R3R1
  - Unsupervised (self-supervised) learning
AAAI’18b
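The pair labels above follow from a simple rule: an ordered pair is positive only when the second chunk directly follows the first. A minimal sketch, with `make_pairs` as a hypothetical helper and chunk names standing in for audio excerpts:

```python
from itertools import permutations

def make_pairs(chunks):
    """Label every ordered pair of distinct chunks from one song.

    A pair is positive (1) only if the second chunk directly follows the first.
    """
    pairs = []
    for i, j in permutations(range(len(chunks)), 2):
        label = 1 if j == i + 1 else 0
        pairs.append((chunks[i], chunks[j], label))
    return pairs

pairs = make_pairs(["R1", "R2", "R3"])
positives = [a + b for a, b, label in pairs if label == 1]
negatives = [a + b for a, b, label in pairs if label == 0]
print(positives)   # ['R1R2', 'R2R3']
print(negatives)   # ['R1R3', 'R2R1', 'R3R1', 'R3R2']
```

No manual labels are involved: the supervision comes entirely from the original order of the chunks.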
Algorithm 2/2: Similarity Embedding Net
a) Shuffle and learn: Unsupervised learning using temporal order verification (ECCV’16)
b) Unsupervised representation learning by sorting sequences (2017)
Learned Embeddings
Project #3: DJnet Tasks
1. Music thumbnailing: extract a short snippet that represents a whole song
2. Music sequencing: find an ordering of music pieces
3. Music mash-up: blend two or more pre-recorded songs (related to vocal separation)
4. EDM sub-genre classification (related to music classification)
5. EDM generation (related to music generation)

More Related Content

What's hot

Learning to Generate Jazz & Pop Piano Music from Audio via MIR Techniques
Learning to Generate Jazz & Pop Piano Music from Audio via MIR TechniquesLearning to Generate Jazz & Pop Piano Music from Audio via MIR Techniques
Learning to Generate Jazz & Pop Piano Music from Audio via MIR TechniquesYi-Hsuan Yang
 
20190625 Research at Taiwan AI Labs: Music and Speech AI
20190625 Research at Taiwan AI Labs: Music and Speech AI20190625 Research at Taiwan AI Labs: Music and Speech AI
20190625 Research at Taiwan AI Labs: Music and Speech AIYi-Hsuan Yang
 
20211026 taicca 1 intro to mir
20211026 taicca 1 intro to mir20211026 taicca 1 intro to mir
20211026 taicca 1 intro to mirYi-Hsuan Yang
 
"All you need is AI and music" by Keunwoo Choi
"All you need is AI and music" by Keunwoo Choi"All you need is AI and music" by Keunwoo Choi
"All you need is AI and music" by Keunwoo ChoiKeunwoo Choi
 
Artificial intelligence and Music
Artificial intelligence and MusicArtificial intelligence and Music
Artificial intelligence and MusicJehoshaphat Abu
 
Echo nest-api-boston-2012
Echo nest-api-boston-2012Echo nest-api-boston-2012
Echo nest-api-boston-2012Paul Lamere
 
Understanding Music Playlists
Understanding Music PlaylistsUnderstanding Music Playlists
Understanding Music PlaylistsKeunwoo Choi
 
Electronic Music and Software Craftsmanship: analogue patterns.
Electronic Music and Software Craftsmanship: analogue patterns.Electronic Music and Software Craftsmanship: analogue patterns.
Electronic Music and Software Craftsmanship: analogue patterns.Guillaume Saint Etienne
 
Social Tags and Music Information Retrieval (Part I)
Social Tags and Music Information Retrieval (Part I)Social Tags and Music Information Retrieval (Part I)
Social Tags and Music Information Retrieval (Part I)Paul Lamere
 
Adaptive Music in Video Games (2018)
Adaptive Music in Video Games (2018)Adaptive Music in Video Games (2018)
Adaptive Music in Video Games (2018)Adam Sporka
 
MOODetector: Automatic Music Emotion Recognition
MOODetector: Automatic Music Emotion RecognitionMOODetector: Automatic Music Emotion Recognition
MOODetector: Automatic Music Emotion RecognitionRui Pedro Paiva
 
The Creative Process Behind Dialogismos I: Theoretical and Technical Consider...
The Creative Process Behind Dialogismos I: Theoretical and Technical Consider...The Creative Process Behind Dialogismos I: Theoretical and Technical Consider...
The Creative Process Behind Dialogismos I: Theoretical and Technical Consider...Gilberto Bernardes
 
Deep Learning Meetup #5
Deep Learning Meetup #5Deep Learning Meetup #5
Deep Learning Meetup #5Aloïs Gruson
 
MSUPPLE RESUME2016theone
MSUPPLE RESUME2016theoneMSUPPLE RESUME2016theone
MSUPPLE RESUME2016theoneMike Supple
 
Denktank 2010
Denktank 2010Denktank 2010
Denktank 2010ocor203
 
I've got key to your API, now what?
I've got key to your API, now what?I've got key to your API, now what?
I've got key to your API, now what?Javaun Moradi
 
I've Got a Key to Your API, Now What? (Joint PBS and NPR API Presentation Giv...
I've Got a Key to Your API, Now What? (Joint PBS and NPR API Presentation Giv...I've Got a Key to Your API, Now What? (Joint PBS and NPR API Presentation Giv...
I've Got a Key to Your API, Now What? (Joint PBS and NPR API Presentation Giv...Public Broadcasting Service
 
Computational models of symphonic music
Computational models of symphonic musicComputational models of symphonic music
Computational models of symphonic musicEmilia Gómez
 
social web music
social web musicsocial web music
social web musicclaudio b
 
FindStream investor deck
FindStream investor deckFindStream investor deck
FindStream investor deckFindStream
 

What's hot (20)

Learning to Generate Jazz & Pop Piano Music from Audio via MIR Techniques
Learning to Generate Jazz & Pop Piano Music from Audio via MIR TechniquesLearning to Generate Jazz & Pop Piano Music from Audio via MIR Techniques
Learning to Generate Jazz & Pop Piano Music from Audio via MIR Techniques
 
20190625 Research at Taiwan AI Labs: Music and Speech AI
20190625 Research at Taiwan AI Labs: Music and Speech AI20190625 Research at Taiwan AI Labs: Music and Speech AI
20190625 Research at Taiwan AI Labs: Music and Speech AI
 
20211026 taicca 1 intro to mir
20211026 taicca 1 intro to mir20211026 taicca 1 intro to mir
20211026 taicca 1 intro to mir
 
"All you need is AI and music" by Keunwoo Choi
"All you need is AI and music" by Keunwoo Choi"All you need is AI and music" by Keunwoo Choi
"All you need is AI and music" by Keunwoo Choi
 
Artificial intelligence and Music
Artificial intelligence and MusicArtificial intelligence and Music
Artificial intelligence and Music
 
Echo nest-api-boston-2012
Echo nest-api-boston-2012Echo nest-api-boston-2012
Echo nest-api-boston-2012
 
Understanding Music Playlists
Understanding Music PlaylistsUnderstanding Music Playlists
Understanding Music Playlists
 
Electronic Music and Software Craftsmanship: analogue patterns.
Electronic Music and Software Craftsmanship: analogue patterns.Electronic Music and Software Craftsmanship: analogue patterns.
Electronic Music and Software Craftsmanship: analogue patterns.
 
Social Tags and Music Information Retrieval (Part I)
Social Tags and Music Information Retrieval (Part I)Social Tags and Music Information Retrieval (Part I)
Social Tags and Music Information Retrieval (Part I)
 
Adaptive Music in Video Games (2018)
Adaptive Music in Video Games (2018)Adaptive Music in Video Games (2018)
Adaptive Music in Video Games (2018)
 
MOODetector: Automatic Music Emotion Recognition
MOODetector: Automatic Music Emotion RecognitionMOODetector: Automatic Music Emotion Recognition
MOODetector: Automatic Music Emotion Recognition
 
The Creative Process Behind Dialogismos I: Theoretical and Technical Consider...
The Creative Process Behind Dialogismos I: Theoretical and Technical Consider...The Creative Process Behind Dialogismos I: Theoretical and Technical Consider...
The Creative Process Behind Dialogismos I: Theoretical and Technical Consider...
 
Deep Learning Meetup #5
Deep Learning Meetup #5Deep Learning Meetup #5
Deep Learning Meetup #5
 
MSUPPLE RESUME2016theone
MSUPPLE RESUME2016theoneMSUPPLE RESUME2016theone
MSUPPLE RESUME2016theone
 
Denktank 2010
Denktank 2010Denktank 2010
Denktank 2010
 
I've got key to your API, now what?
I've got key to your API, now what?I've got key to your API, now what?
I've got key to your API, now what?
 
I've Got a Key to Your API, Now What? (Joint PBS and NPR API Presentation Giv...
I've Got a Key to Your API, Now What? (Joint PBS and NPR API Presentation Giv...I've Got a Key to Your API, Now What? (Joint PBS and NPR API Presentation Giv...
I've Got a Key to Your API, Now What? (Joint PBS and NPR API Presentation Giv...
 
Computational models of symphonic music
Computational models of symphonic musicComputational models of symphonic music
Computational models of symphonic music
 
social web music
social web musicsocial web music
social web music
 
FindStream investor deck
FindStream investor deckFindStream investor deck
FindStream investor deck
 

Similar to Research at MAC Lab, Academia Sincia, in 2017

Wojciech Franke - Composing music with clojure.spec - Clojure/conj 2016
Wojciech Franke - Composing music with clojure.spec - Clojure/conj 2016Wojciech Franke - Composing music with clojure.spec - Clojure/conj 2016
Wojciech Franke - Composing music with clojure.spec - Clojure/conj 2016Wojciech Franke
 
Music recognition
Music recognition Music recognition
Music recognition aaronloklok
 
인공지능의 음악 인지 모델 - 65차 한국음악지각인지학회 기조강연 (최근우 박사)
인공지능의 음악 인지 모델 - 65차 한국음악지각인지학회 기조강연 (최근우 박사)인공지능의 음악 인지 모델 - 65차 한국음악지각인지학회 기조강연 (최근우 박사)
인공지능의 음악 인지 모델 - 65차 한국음악지각인지학회 기조강연 (최근우 박사)Keunwoo Choi
 
Using mashup technology to improve findability
Using mashup technology to improve findabilityUsing mashup technology to improve findability
Using mashup technology to improve findabilitySten Govaerts
 
Query By humming - Music retrieval technology
Query By humming - Music retrieval technologyQuery By humming - Music retrieval technology
Query By humming - Music retrieval technologyShital Kat
 
Automatic Music Transcription
Automatic Music TranscriptionAutomatic Music Transcription
Automatic Music TranscriptionKhyati Ganatra
 
Mining the social web for music-related data: a hands-on tutorial
Mining the social web for music-related data: a hands-on tutorialMining the social web for music-related data: a hands-on tutorial
Mining the social web for music-related data: a hands-on tutorialBen Fields
 
Mining the social web for music-related data: a hands-on tutorial
Mining the social web for music-related data: a hands-on tutorialMining the social web for music-related data: a hands-on tutorial
Mining the social web for music-related data: a hands-on tutorialclaudio b
 
The MediaEval 2017 AcousticBrainz Genre Task: Content-based Music Genre Recog...
The MediaEval 2017 AcousticBrainz Genre Task: Content-based Music Genre Recog...The MediaEval 2017 AcousticBrainz Genre Task: Content-based Music Genre Recog...
The MediaEval 2017 AcousticBrainz Genre Task: Content-based Music Genre Recog...multimediaeval
 
Introduction musictech
Introduction musictechIntroduction musictech
Introduction musictechJia Liu
 
AMIS 2016 2, Music Math Coding
AMIS 2016 2, Music Math CodingAMIS 2016 2, Music Math Coding
AMIS 2016 2, Music Math CodingSamuel Wright
 
Interval Hashing Based Ranking
Interval Hashing Based RankingInterval Hashing Based Ranking
Interval Hashing Based RankingAndrea Gazzarini
 
The MediaEval 2017 AcousticBrainz Genre Task: Content-based Music Genre Recog...
The MediaEval 2017 AcousticBrainz Genre Task: Content-based Music Genre Recog...The MediaEval 2017 AcousticBrainz Genre Task: Content-based Music Genre Recog...
The MediaEval 2017 AcousticBrainz Genre Task: Content-based Music Genre Recog...multimediaeval
 
Musical Information Retrieval Take 2: Interval Hashing Based Ranking
Musical Information Retrieval Take 2: Interval Hashing Based RankingMusical Information Retrieval Take 2: Interval Hashing Based Ranking
Musical Information Retrieval Take 2: Interval Hashing Based RankingSease
 
Two-step Melody Harmonious Generator
Two-step Melody Harmonious GeneratorTwo-step Melody Harmonious Generator
Two-step Melody Harmonious GeneratorSofya Latkina
 
Music Objects to Social Machines
Music Objects to Social MachinesMusic Objects to Social Machines
Music Objects to Social MachinesDavid De Roure
 
Introduction of my research histroy: From instrument recognition to support o...
Introduction of my research histroy: From instrument recognition to support o...Introduction of my research histroy: From instrument recognition to support o...
Introduction of my research histroy: From instrument recognition to support o...kthrlab
 

Similar to Research at MAC Lab, Academia Sincia, in 2017 (20)

Wojciech Franke - Composing music with clojure.spec - Clojure/conj 2016
Wojciech Franke - Composing music with clojure.spec - Clojure/conj 2016Wojciech Franke - Composing music with clojure.spec - Clojure/conj 2016
Wojciech Franke - Composing music with clojure.spec - Clojure/conj 2016
 
Music recognition
Music recognition Music recognition
Music recognition
 
인공지능의 음악 인지 모델 - 65차 한국음악지각인지학회 기조강연 (최근우 박사)
인공지능의 음악 인지 모델 - 65차 한국음악지각인지학회 기조강연 (최근우 박사)인공지능의 음악 인지 모델 - 65차 한국음악지각인지학회 기조강연 (최근우 박사)
인공지능의 음악 인지 모델 - 65차 한국음악지각인지학회 기조강연 (최근우 박사)
 
Using mashup technology to improve findability
Using mashup technology to improve findabilityUsing mashup technology to improve findability
Using mashup technology to improve findability
 
Ism2011
Ism2011Ism2011
Ism2011
 
Query By humming - Music retrieval technology
Query By humming - Music retrieval technologyQuery By humming - Music retrieval technology
Query By humming - Music retrieval technology
 
Automatic Music Transcription
Automatic Music TranscriptionAutomatic Music Transcription
Automatic Music Transcription
 
Mining the social web for music-related data: a hands-on tutorial
Mining the social web for music-related data: a hands-on tutorialMining the social web for music-related data: a hands-on tutorial
Mining the social web for music-related data: a hands-on tutorial
 
Mining the social web for music-related data: a hands-on tutorial
Mining the social web for music-related data: a hands-on tutorialMining the social web for music-related data: a hands-on tutorial
Mining the social web for music-related data: a hands-on tutorial
 
The MediaEval 2017 AcousticBrainz Genre Task: Content-based Music Genre Recog...
The MediaEval 2017 AcousticBrainz Genre Task: Content-based Music Genre Recog...The MediaEval 2017 AcousticBrainz Genre Task: Content-based Music Genre Recog...
The MediaEval 2017 AcousticBrainz Genre Task: Content-based Music Genre Recog...
 
Introduction musictech
Introduction musictechIntroduction musictech
Introduction musictech
 
AMIS 2016 2, Music Math Coding
AMIS 2016 2, Music Math CodingAMIS 2016 2, Music Math Coding
AMIS 2016 2, Music Math Coding
 
Interval Hashing Based Ranking
Interval Hashing Based RankingInterval Hashing Based Ranking
Interval Hashing Based Ranking
 
The MediaEval 2017 AcousticBrainz Genre Task: Content-based Music Genre Recog...
The MediaEval 2017 AcousticBrainz Genre Task: Content-based Music Genre Recog...The MediaEval 2017 AcousticBrainz Genre Task: Content-based Music Genre Recog...
The MediaEval 2017 AcousticBrainz Genre Task: Content-based Music Genre Recog...
 
Musical Information Retrieval Take 2: Interval Hashing Based Ranking
Musical Information Retrieval Take 2: Interval Hashing Based RankingMusical Information Retrieval Take 2: Interval Hashing Based Ranking
Musical Information Retrieval Take 2: Interval Hashing Based Ranking
 
Two-step Melody Harmonious Generator
Two-step Melody Harmonious GeneratorTwo-step Melody Harmonious Generator
Two-step Melody Harmonious Generator
 
Setlistör Demo Slides
Setlistör Demo SlidesSetlistör Demo Slides
Setlistör Demo Slides
 
Music Objects to Social Machines
Music Objects to Social MachinesMusic Objects to Social Machines
Music Objects to Social Machines
 
MIR
MIRMIR
MIR
 
Introduction of my research histroy: From instrument recognition to support o...
Introduction of my research histroy: From instrument recognition to support o...Introduction of my research histroy: From instrument recognition to support o...
Introduction of my research histroy: From instrument recognition to support o...
 

Recently uploaded

Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfjimielynbastida
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 

Recently uploaded (20)

Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Research at MAC Lab, Academia Sinica, in 2017

  • 1. Research at MAC Lab, Academia Sinica, in 2017
      Music & Audio Computing (MAC) Lab, Research Center for IT Innovation, Academia Sinica
      Yi-Hsuan Yang, Ph.D.
      http://www.citi.sinica.edu.tw/pages/yang/
      yang@citi.sinica.edu.tw
  • 2. Music Information “Research” (MIR)
      • Intelligent ways to analyze, retrieve, and create music
      • 1. Music information analysis (music → features): multi-pitch estimation, beat/downbeat detection, chord recognition, melody extraction, audio-score alignment, music transcription, source separation
      • 2. Music information retrieval (query → music): query by humming, music identification, music classification (by genre, emotion), similarity search, music recommendation, playlist generation
      • 3. Music generation (X → music): accompaniment generation, style transfer, mashup/remix
  • 4. Project #1: Singing Voice Separation
      • Create the singing-only version
          • Input: audio mixture
          • Output: vocal (singing only) + accompaniment (karaoke)
          • Or, output all the individual tracks
      • Why?
          • Not easy to get the un-mixed, multi-track version
          • Even more so for old songs or live performances
      • Applications
          • Singing voice processing, karaoke, soundtracks for movies
          • Vocal melody extraction
          • Beat/downbeat detection
  • 5. Project #1: Singing Voice Separation
      • Create the singing-only version
          • Input: audio mixture
          • Output: vocal (singing only) + accompaniment (karaoke)
      • Demo: http://ss.ciaua.com/
          • (by J Bengio’s group: https://js-mim.github.io/mss_pytorch/)
          • (Yamaha: http://www.y2lab.com/en/project/source_separation/)
  • 6. Project #2: Music Generation
      • Create music
      • Why?
          • Make musicians’ lives easier
          • Create copyright-free music (for films, ads, games)
          • And, well, people think this is AI
      • Eminem - When I'm Gone
  • 7. Project #2: Music Generation
      • IBM Watson Beat, Sony -> “Create unique, royalty-free soundtracks for your videos”
  • 8. Project #2a: Pop→8-bit Style Transfer (ICASSP’17)
      • 8-bit music, also known as “chiptunes”
      • Demo: https://lemonatsu.github.io/
      • Examples (pop version and 8-bit version of each): Adele - Someone Like You; Christina Perri - Jar of Hearts; John Legend - All of Me
  • 9. Project #2b: Lead Sheet Generation
      • Lead sheet
          • melody
          • chord
      • Given chord, generate melody
      • Given melody, generate chord (a.k.a. harmonization)
      • Or, from scratch
  • 10. Melody Generation by RNN
      model: Google MelodyRNN | C-RNN-GAN | Song from PI | DeepBach | Google WaveNet
      core model: RNN | RNN | RNN | RNN | CNN
      data type: symbolic | symbolic | symbolic | symbolic | audio
      genre specificity: ─ | ─ | ─ | Bach chorale | ─
      mandatory prior knowledge: priming melody | ─ | music scale & melody profile | melody of one part | priming wave
      follow a priming melody: V V V
      follow a chord sequence: (none)
      generate from scratch: V
      generate multi-part music: V V V
      open source: V V
  • 11. Melody Generation by CNN+GAN (ISMIR’17)
      model: Google MelodyRNN | MidiNet | Google WaveNet
      core model: RNN | CNN | CNN
      data type: symbolic | symbolic | audio
      genre specificity: ─ | ─ | ─
      mandatory prior knowledge: priming melody | ─ | priming wave
      follow a priming melody: V | V | V
      follow a chord sequence: V
      generate from scratch: V
      generate multi-part music: V V
      open source: V V
      • Google MelodyRNN: by Google; RNN; trained with thousands of melodies
      • MidiNet: by MAC Lab; CNN; 526 tabs (4,208 bars); one GPU (GTX 1080); <30 mins
  • 12. Algorithm 1/5: Desired Output
      • Generate the melody one bar at a time
      • Use a matrix to represent the music of a bar
      • Condition on the previous bar (the history)
      [Figure: piano-roll matrices for the previous, current, and next bars; axes: 96 time steps × 84 notes]
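As an illustration of the bar matrix described on this slide, here is a minimal sketch that turns a list of notes into a binary 84 × 96 piano-roll. The function name `bar_to_matrix`, the note-tuple layout, and the choice of MIDI note 24 as the lowest row are assumptions made for this example; the slide only gives the matrix dimensions.

```python
import numpy as np

N_PITCHES = 84    # rows: notes, as labeled on the slide
N_STEPS = 96      # columns: time steps per bar, as labeled on the slide
LOWEST_MIDI = 24  # assumption: row 0 corresponds to MIDI note 24

def bar_to_matrix(notes):
    """Build a binary piano-roll for one bar.

    notes: list of (midi_pitch, onset_step, duration_steps) tuples.
    """
    bar = np.zeros((N_PITCHES, N_STEPS), dtype=np.uint8)
    for pitch, onset, duration in notes:
        row = pitch - LOWEST_MIDI
        if 0 <= row < N_PITCHES:
            # Clip notes that would run past the end of the bar.
            bar[row, onset:min(onset + duration, N_STEPS)] = 1
    return bar

# Four quarter notes (24 steps each), then two half notes (48 steps each).
previous_bar = bar_to_matrix([(60, 0, 24), (64, 24, 24), (67, 48, 24), (72, 72, 24)])
current_bar = bar_to_matrix([(72, 0, 48), (67, 48, 48)])

# A generator conditioned on the history would see previous_bar as extra input
# when producing current_bar.
training_pair = (previous_bar, current_bar)
```

With 96 steps per bar in 4/4 time, a quarter note spans 24 steps, which is why the example durations are multiples of 24.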
  • 13. Algorithm 2/5: Main Idea
      • Generative adversarial nets (GAN)
          • Discriminator: tell real from fake
          • Generator: fool the discriminator
      • Generate from scratch
      [Figure: GAN diagram; discriminator output labeled “real or fake?”]
  • 14. Algorithm 3/5: Main Idea
      • Generative adversarial nets (GAN)
          • Discriminator: tell real from fake
          • Generator: fool the discriminator
      • Generate from scratch
      • Or, given chord, generate melody
      [Figure: GAN diagram; discriminator output labeled “real or fake?”]
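For reference, the two roles named on these slides correspond to the standard GAN minimax objective, where $x$ is a real bar, $z$ is random noise, $G$ is the generator and $D$ the discriminator. The second line, with a chord condition $c$, is written here as an assumption about how the "given chord" case could be expressed; the slides do not state the exact loss, and the actual model may include additional terms.

```latex
% Generation from scratch: the standard GAN objective
\min_G \max_D \;
  \mathbb{E}_{x \sim p_{\mathrm{data}}}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_z}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]

% Assumed conditional form: both networks also see the chord condition c
\min_G \max_D \;
  \mathbb{E}_{(x, c) \sim p_{\mathrm{data}}}\bigl[\log D(x \mid c)\bigr]
  + \mathbb{E}_{z \sim p_z,\; c}\bigl[\log\bigl(1 - D(G(z \mid c) \mid c)\bigr)\bigr]
```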
  • 15. Algorithm 4/5: Temporal Model
      • Conditioner: provide 2-D conditions
          • Uses the same filter shapes as the generator CNN, so that their intermediate outputs are “compatible”
      [Figure: GAN diagram with conditioner; discriminator output labeled “real or fake?”]
  • 16. Algorithm 5/5
      • Generative adversarial nets (GAN)
          • Discriminator: tell real from fake
          • Generator: fool the discriminator
      • Don’t know the “desired output” (for example, what should be played next)
          • Only know whether it “sounds like real”
      • Do not explicitly use music theories
          • As opposed to more traditional methods, such as rule-based ones or those based on finite-state machines
          • Therefore, can be used for non-Western classical music
      [Figure: GAN diagram; discriminator output labeled “real or fake?”]
  • 17. MidiNet: Evaluation
      [Figure: evaluation results for Google Magenta’s RNN models vs. MidiNet models; top: pro users, bottom: laymen]
  • 18. MidiNet: Examples
      • Variants of MidiNet (examples 1, 2, 3)
      • Google Magenta vs. MidiNet
      • With drums
  • 19. Project #2c: Multi-track Generation (AAAI’18)
      • Multi-track (not only melody and chord)
      • Demo: https://salu133445.github.io/musegan/
      [Image: Begin Again (2013)]
  • 20. Algorithm 1/7: Data
      • LPD dataset: 128K MIDIs (piano-rolls) from LMD (http://colinraffel.com/projects/lmd/)
  • 21. Algorithm 2/7: Intra- & Inter-tracks
      • Multi-track: piano, guitar, bass, strings, drums
      • Hybrid model
          • one “shared” (inter) z
          • five “private” (intra) z_i
          • five generators
          • one discriminator
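The hybrid layout above can be sketched as follows: each of the five track generators receives the shared vector z concatenated with its own private z_i, and the five outputs are stacked so that a single discriminator could judge them together. The latent size, the 84 × 96 bar shape, and the toy linear "generators" are assumptions for illustration only; the real generators are neural networks whose details are not given on the slide.

```python
import numpy as np

TRACKS = ["piano", "guitar", "bass", "strings", "drums"]  # from the slide
Z_DIM = 32             # assumption: latent size is not given on the slide
BAR_SHAPE = (84, 96)   # assumption: notes x time steps for one bar

def make_toy_generator(seed):
    """Stand-in for a track generator: a fixed random linear map plus a threshold."""
    weights = np.random.default_rng(seed).standard_normal(
        (2 * Z_DIM, BAR_SHAPE[0] * BAR_SHAPE[1])
    )

    def generate(latent):
        # Threshold at zero to obtain a binary piano-roll.
        return (latent @ weights > 0).reshape(BAR_SHAPE)

    return generate

# Five generators, one per track.
generators = {name: make_toy_generator(i) for i, name in enumerate(TRACKS)}

rng = np.random.default_rng(0)
z_shared = rng.standard_normal(Z_DIM)                               # one inter-track z
z_private = {name: rng.standard_normal(Z_DIM) for name in TRACKS}   # five intra-track z_i

# Each generator sees [shared z, its own private z_i].
bars = {
    name: generators[name](np.concatenate([z_shared, z_private[name]]))
    for name in TRACKS
}

# The single discriminator would receive all tracks stacked together.
multitrack = np.stack([bars[name] for name in TRACKS])  # shape (5, 84, 96)
```

The shared vector is what lets the tracks coordinate with each other, while the private vectors leave room for track-specific variation.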
  • 22. Algorithm 3/7: Temporal Model
      [Figure: (a) generation from scratch; (b) track-conditional generation]
  • 23. Algorithm 4/7: Combined Model
      [Figure: generation from scratch]
  • 24. Algorithm 5/7: WGAN-GP
      • Beginning: 500:1 D/G updates
      • Later: 5:1 D/G updates
      • Training time: <24 hours
      [Figure: training curve; axis: negative D loss]
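The update ratios on this slide can be read as a training schedule in which the discriminator (D) is updated many times per generator (G) update early on, and less often later. The sketch below only produces the order of updates; the point at which "beginning" turns into "later" (`WARMUP_G_STEPS`) is an assumption, since the slide does not say.

```python
WARMUP_G_STEPS = 25  # assumption: the slide does not say when "beginning" ends

def d_updates_per_g_update(g_step):
    """Number of discriminator updates before generator update number g_step."""
    return 500 if g_step < WARMUP_G_STEPS else 5

def update_schedule(n_g_steps):
    """Yield 'D' or 'G' in the order the networks would be updated."""
    for g_step in range(n_g_steps):
        for _ in range(d_updates_per_g_update(g_step)):
            yield "D"
        yield "G"

schedule = list(update_schedule(30))
n_d = schedule.count("D")  # 25 * 500 + 5 * 5 discriminator updates
n_g = schedule.count("G")  # 30 generator updates
```

In a real training loop, each "D" would be one discriminator step on the WGAN-GP loss and each "G" one generator step.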
  • 26. Algorithm 6/7: G & D
      • Grow time steps first
          • 1 → 2 → 4 → 8 → 16 → 32 → 96
      • Then notes (freq)
          • octave (7)
          • then, pitch (84)
  • 27. Algorithm 7/7: Objective Metrics
      • EB: ratio of empty bars (in %)
      • UPC: number of used pitch classes per bar (from 0 to 12)
      • QN: ratio of “qualified” notes (in %); a note no shorter than three time steps (i.e., a 32nd note) counts as qualified; QN shows whether the music is overly fragmented
      • DP (drum pattern): ratio of notes in 8- or 16-beat patterns, which are common for rock songs in 4/4 time (in %)
      • TD (tonal distance): the harmonicity between a pair of tracks; a larger TD implies weaker inter-track harmonic relations
      [Figure: plots of UPC and QN]
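Three of these metrics (EB, UPC, QN) can be computed directly from binary piano-rolls. The sketch below is one possible reading of the definitions above, not the authors' evaluation code. It assumes bars are stored as arrays of shape (n_bars, n_pitches, n_steps), that row 0 is MIDI note 24, and that a "note" is a maximal run of consecutive active steps on one pitch (so repeated notes with no gap count as one).

```python
import numpy as np

def empty_bar_ratio(bars):
    """EB: percentage of bars that contain no active note."""
    return 100.0 * np.mean([not bar.any() for bar in bars])

def used_pitch_classes(bar, lowest_midi=24):
    """UPC: number of distinct pitch classes (0 to 12) used in one bar."""
    active_rows = np.flatnonzero(bar.any(axis=1))
    return len(set((active_rows + lowest_midi) % 12))

def note_lengths(bar):
    """Lengths (in time steps) of all runs of consecutive active steps."""
    lengths = []
    for row in bar:
        # Pad with zeros so runs touching the bar edges are still detected.
        padded = np.concatenate([[0], row.astype(int), [0]])
        edges = np.diff(padded)
        onsets = np.flatnonzero(edges == 1)
        offsets = np.flatnonzero(edges == -1)
        lengths.extend(offsets - onsets)
    return lengths

def qualified_note_ratio(bars, min_steps=3):
    """QN: percentage of notes lasting at least min_steps time steps."""
    lengths = [n for bar in bars for n in note_lengths(bar)]
    if not lengths:
        return 0.0
    return 100.0 * np.mean([n >= min_steps for n in lengths])

# Two bars: the first has one long note and one 2-step fragment, the second is empty.
bars = np.zeros((2, 84, 96), dtype=np.uint8)
bars[0, 36, 0:24] = 1
bars[0, 40, 24:26] = 1

eb = empty_bar_ratio(bars)
upc = used_pitch_classes(bars[0])
qn = qualified_note_ratio(bars)
```

In this toy input one of two bars is empty, two pitch classes are used in the first bar, and one of the two notes is long enough to qualify.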
  • 29. Challenge Ahead: Make It Longer
      • And with more structure
  • 32. Personalized DJ
      • Smart speaker + recommendation + DJ
  • 33. Project #3a: Music Thumbnailing
      1. Extract music highlights
          • “Automatic DJ mix generation using highlight detection,” Proc. ISMIR 2017 (from Clova Line WAVE)
          • “Music thumbnailing via neural attention modeling of music emotion,” Proc. APSIPA ASC 2017 (from MAC Lab)
      • Examples: 周杰倫 - 稻香; 光良 - 童話; 胡夏 - 那些年; Linkin Park - Burn It Down; Adam Lambert - Whataya Want from Me
  • 34. Algorithm: Music Thumbnailing (APSIPA ASC’17)
      • CNN for emotion prediction + attention (predicting weights of different parts of a song)
      • Transfer learning: no need for structural (chorus) labels
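One plausible reading of the attention idea above: the network scores each part (chunk) of a song for emotion, an attention layer assigns a weight to each chunk, the song-level prediction is the weighted sum, and the chunk with the largest weight is taken as the highlight. The sketch below shows only that pooling step with made-up numbers; the function name and inputs are hypothetical, and the per-chunk scores and attention logits would come from the trained CNN in practice.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def song_emotion_and_highlight(chunk_emotion_scores, chunk_attention_logits):
    """Pool chunk-level emotion scores with attention and pick the highlight chunk."""
    scores = np.asarray(chunk_emotion_scores, dtype=float)
    weights = softmax(np.asarray(chunk_attention_logits, dtype=float))
    song_score = float(weights @ scores)       # attention-weighted song-level prediction
    highlight_index = int(np.argmax(weights))  # chunk the model attends to most
    return song_score, highlight_index, weights

# Hypothetical values for a song split into three chunks.
song_score, highlight_index, weights = song_emotion_and_highlight(
    chunk_emotion_scores=[0.2, 0.9, 0.4],
    chunk_attention_logits=[0.0, 2.0, 0.5],
)
```

Because the attention weights are learned from emotion labels alone, no chorus or structure annotations are needed to obtain the highlight position, which matches the transfer-learning point on the slide.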
  • 35. Project #3b: Music Sequencing
      2. Find an ordering of music pieces
          • “Automatic playlist sequencing and transitions,” Proc. ISMIR 2017 (from )
          • “Generating music medleys via playing music puzzle games by unsupervised similarity embedding,” (from MAC Lab)
      • Demo: https://remyhuang.github.io/
  • 36. Algorithm 1/2: Music Puzzle Games (AAAI’18b)
      • Divide a song into several non-overlapping chunks
      • Learn to order them with a Siamese CNN
          • Positive pairs: R1R2, R2R3
          • Negative pairs: R2R1, R3R2, R1R3, R3R1
          • Unsupervised (self-supervised) learning
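The pair construction above needs no human labels, because the labels come from the original order of the chunks. A minimal sketch, assuming chunks are represented by hashable identifiers (names or indices) rather than audio arrays, and using a hypothetical helper name `make_pairs`:

```python
from itertools import permutations

def make_pairs(chunks):
    """Build training pairs from chunks listed in their original order.

    Positive pairs: a chunk followed directly by its true successor.
    Negative pairs: every other ordered pair of two distinct chunks.
    """
    positives = [(chunks[i], chunks[i + 1]) for i in range(len(chunks) - 1)]
    positive_set = set(positives)
    negatives = [pair for pair in permutations(chunks, 2) if pair not in positive_set]
    return positives, negatives

positives, negatives = make_pairs(["R1", "R2", "R3"])
```

For three chunks this gives the two positive and four negative pairs listed on the slide. A Siamese network would then be trained to tell the two groups apart.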
  • 37. Algorithm 2/2: Similarity Embedding Net
      a) “Shuffle and learn: Unsupervised learning using temporal order verification” (ECCV’16)
      b) “Unsupervised representation learning by sorting sequences” (2017)
  • 39. Project #3: DJnet Tasks
      1. Music thumbnailing: extract a short snippet that represents a whole song
      2. Music sequencing: find an ordering of music pieces
      3. Music mash-up: blend two or more pre-recorded songs (related to vocal separation)
      4. EDM sub-genre classification (related to music classification)
      5. EDM generation (related to music generation)