PopMAG:
Pop Music Accompaniment Generation
Hyeshin Chu
2021. 08. 20
MM 2020
Yi Ren, Jinzheng He, Xu Tan, Tao Qin, Zhou Zhao, Tie-Yan Liu
Contents
• Overview of the Paper
• Introduction
• Multi-Track MIDI Representation
• Multi-Track Modeling
• Experimental Setup
• Results and Analyses
• Conclusion & Future Work
Overview of the Paper
https://music-popmag.github.io/popmag/
• Goal
 To improve harmony of accompaniment (usually with multiple instruments)
• Previous Works
 Generate multiple tracks separately
 Notes in different tracks do NOT explicitly depend on each other
• MuMIDI (MUlti-track MIDI representation)
 Simultaneous multi-track generation in a single sequence
 Explicitly models the dependency of the notes from different tracks
 Challenge occurs!
• A single sequence is much longer ⇒ Long-term structure is difficult to model
 How to solve?
• 1) Model multiple note attributes (e.g., pitch, duration, velocity) of a musical note in one step (NOT multiple steps)
⇒ Shorten the length of MuMIDI sequence
• 2) Introduce extra long-context as memory to capture long-term dependency in music
Introduction
• Music sequence modeling using deep learning techniques
 CNN [30], RNN [26], Transformer [5, 14, 15], VAE [25], GAN [11]
• Pop music generation consists of two parts
 Chord and melody generation
 Accompaniment generation
• Accompaniment generation ⇒ Multi-track generation
 MuseGAN [10-12]
 MIDI-Sandwich2 [18]
 XiaoIce Band [34]
 LakhNES [9]
• MuMIDI
 Encodes multi-track MIDI events into one sequence of tokens
⇒ Better captures dependency among musical notes in different tracks
 Models multiple attributes in one sequence step instead of multiple steps
⇒ Shortens the sequence length
 Solves long-term music modeling challenges
Multi-Track Representation
• 2-track musical piece (Piano track, Bass track)
 Piano track: 10 notes
 Bass track: 5 notes
• Symbol types: Bar and Position, Track, Note, Chord, Meta Symbol
Multi-Track Representation
• <Bar> and <Pos> mark the beginning of a bar and the different positions within it
 <Bar> ∋ <Pos>, <Track>, <Note>, <Chord>
 Each <Bar> has a total of 32 timesteps, called <Pos>
[Figure: example bar annotated with positions Pos 1, Pos 8, Pos 12, Pos 16, and Pos 24]
Multi-Track Representation
• Six <Track> symbols:
 <Track_Melody>, <Track_Drum>, <Track_Piano>, <Track_String>, <Track_Guitar>, <Track_Bass>
Multi-Track Representation
• <Note> includes three attributes:
 Pitch: Pitch_1 (C-1) to Pitch_128 (G9) for all tracks except drum
 Velocity (how hard the key was struck): quantized into 32 levels
 Duration: the length of the note, from 1 to 32 timesteps
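A minimal sketch of how the pitch and velocity vocabularies could be indexed. The helper names (`pitch_name`, `quantize_velocity`) and the uniform binning of MIDI velocities 0..127 into 32 equal levels are illustrative assumptions; the slides only state the ranges.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def pitch_name(pitch_index: int) -> str:
    """Map a 1-based pitch symbol index (Pitch_1..Pitch_128) to a note name."""
    if not 1 <= pitch_index <= 128:
        raise ValueError("pitch index must be in 1..128")
    midi = pitch_index - 1        # MIDI note numbers run 0..127
    octave = midi // 12 - 1       # MIDI note 0 is C-1
    return f"{NOTE_NAMES[midi % 12]}{octave}"


def quantize_velocity(velocity: int, levels: int = 32) -> int:
    """Assumed uniform binning: MIDI velocity 0..127 -> level 0..levels-1."""
    if not 0 <= velocity <= 127:
        raise ValueError("velocity must be in 0..127")
    return velocity * levels // 128


print(pitch_name(1), pitch_name(128))   # lowest and highest pitch symbols
print(quantize_velocity(100))           # level of a fairly loud note
```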
Multi-Track Representation
• <Chord> guides the pitch range of notes and the emotion of the music
• 84 possible chord symbols in total
 12 chord roots (C, C#, D, D#, E, F, …) × 7 chord qualities (major, minor, diminished, …)
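The chord vocabulary size can be checked by enumerating root and quality pairs. In this sketch the token spelling `<Chord_root_quality>` is hypothetical, and only the first three qualities come from the slide; the remaining four are assumed placeholders used to reach seven.

```python
from itertools import product

# The 12 chromatic roots.
CHORD_ROOTS = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# "major", "minor", "diminished" are named on the slide.
# The last four entries are ASSUMED placeholders, not confirmed by the slides.
CHORD_QUALITIES = [
    "major", "minor", "diminished",
    "augmented", "major7", "minor7", "half_diminished",
]

# Hypothetical token spelling, one symbol per (root, quality) pair.
CHORD_SYMBOLS = [
    f"<Chord_{root}_{quality}>"
    for root, quality in product(CHORD_ROOTS, CHORD_QUALITIES)
]
CHORD_TO_ID = {symbol: index for index, symbol in enumerate(CHORD_SYMBOLS)}

print(len(CHORD_SYMBOLS))  # 12 roots x 7 qualities
```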
Multi-Track Representation
• Meta symbols encode the metadata of the whole musical piece
 Including tempo, tonality, style, and emotion
 Usually unchanged throughout the whole musical piece
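Putting the symbol types together, the sketch below flattens notes from several tracks into one sequence in the spirit of MuMIDI. It is an illustration under assumptions, not the authors' encoder: the `Note` type, the token spellings, the alphabetical track order, and the omission of chord symbols and empty bars are all simplifications made here.

```python
from collections import defaultdict
from typing import NamedTuple


class Note(NamedTuple):
    bar: int        # 0-based bar index
    pos: int        # position inside the bar, 1..32
    track: str      # e.g. "Piano", "Bass"
    pitch: int      # 1..128
    velocity: int   # quantized level, 0..31
    duration: int   # 1..32 timesteps


def encode(notes, meta=()):
    """Flatten multi-track notes into a single sequence of symbols."""
    grouped = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for note in notes:
        grouped[note.bar][note.pos][note.track].append(note)

    sequence = [f"<{m}>" for m in meta]          # meta symbols come first
    for bar in sorted(grouped):
        sequence.append("<Bar>")
        for pos in sorted(grouped[bar]):
            sequence.append(f"<Pos_{pos}>")
            for track in sorted(grouped[bar][pos]):
                sequence.append(f"<Track_{track}>")
                for note in sorted(grouped[bar][pos][track], key=lambda n: n.pitch):
                    # One note occupies one step and carries all attributes.
                    sequence.append(("Note", note.pitch, note.velocity, note.duration))
    return sequence


notes = [
    Note(0, 1, "Piano", 61, 20, 8),
    Note(0, 1, "Bass", 37, 20, 16),
    Note(0, 9, "Piano", 65, 20, 8),
]
for symbol in encode(notes, meta=["Tempo_mid"]):
    print(symbol)
```

Because both tracks share the same `<Bar>` and `<Pos>` symbols, notes that sound together end up next to each other in the sequence, which is what lets a single model condition one track on another.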
Multi-Track Modeling
Modeling One Note in One Step, Modeling Long-Term Structure, Modeling Implementation
• MuMIDI
 Encodes multi-track MIDI events into a single sequence
⇒ Long sequence
⇒ Difficult to model long-term structure
• Two ways to better model long sequences
 (1) Shorten the sequence length:
• Model multiple attributes (e.g., pitch, duration, velocity) of a note in one sequence step (NOT in multiple steps)
 (2) Adopt extra long context to capture long-term dependencies:
• In both the encoder and the decoder of the seq-to-seq model
Multi-Track Modeling
Modeling One Note in One Step
• Why?
 To let MuMIDI learn from longer music structure
• How?
 Apply note-level modeling: model multiple attributes of one note in one sequence step
 Regard each attribute of a note (pitch, velocity, duration) as an embedding
⇒ The sum of all attribute embeddings represents one note
⇒ This sum is the input to the encoder and decoder of the seq-to-seq model at each step
• Result
 Shorter input and output sequences
 Better captures the long-term dependency
 Faster training and inference
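The note-level idea can be sketched in plain Python: each attribute has its own embedding table, and a note's vector is the element-wise sum of its three attribute vectors. The toy dimension, the random initialization, and the function names are assumptions for illustration; a real model would learn these tables.

```python
import random

DIM = 8  # toy embedding size for the sketch; the slides report 512 for the model


def make_table(num_entries, dim, rng):
    """Randomly initialized embedding table (stand-in for learned weights)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(num_entries)]


rng = random.Random(0)
pitch_table = make_table(128, DIM, rng)
velocity_table = make_table(32, DIM, rng)
duration_table = make_table(32, DIM, rng)


def note_embedding(pitch, velocity, duration):
    """pitch in 1..128, velocity level in 0..31, duration in 1..32."""
    rows = (pitch_table[pitch - 1], velocity_table[velocity], duration_table[duration - 1])
    return [sum(values) for values in zip(*rows)]   # element-wise sum


notes = [(61, 20, 8), (65, 22, 8), (68, 20, 16)]
steps_note_level = len(notes)            # one sequence step per note
steps_attribute_level = 3 * len(notes)   # if each attribute were its own step
print(steps_note_level, steps_attribute_level)
```

With three attributes per note, folding them into one step cuts the note portion of the sequence to a third of its attribute-level length.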
Multi-Track Modeling
Modeling Long-Term Structure
• Why?
 To capture and exploit the long-range context
• How?
 Recurrent Transformer Encoder
• Encodes each token x_i of the conditional tracks (at sequence step i)
• Outputs of the encoder: fed into the decoder as condition context
 Recurrent Transformer Decoder
• Generates token y_j
• Conditioned on 1) the previously generated tokens y_t (t < j) and 2) the context from the encoder
• Each token in the decoder only sees the condition context of the same bar
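The same-bar restriction can be pictured as a boolean mask over decoder and encoder steps. The sketch assumes every step is tagged with a bar index and that the restriction is applied as an attention mask; the slides do not say how it is implemented, and the memory of earlier segments is not shown.

```python
def same_bar_mask(decoder_bars, encoder_bars):
    """mask[j][i] is True when decoder step j may attend to encoder step i."""
    return [[d == e for e in encoder_bars] for d in decoder_bars]


# Bar index of each encoder (condition) step and each decoder (target) step.
encoder_bars = [0, 0, 0, 1, 1]
decoder_bars = [0, 0, 1]

mask = same_bar_mask(decoder_bars, encoder_bars)
for row in mask:
    print(["x" if allowed else "." for allowed in row])
```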
Multi-Track Modeling
Modeling Implementation
• Input Module
 The input embedding at each step: the sum of the token, bar, position, and meta embeddings for that step
 Token Embeddings
• Contain <Note>, <Bar>, <Pos>, <Track>, <Chord>, etc.
• <Note>:
• (1) All attributes (pitch, duration, velocity) of one note ⇒ One token
• (2) The embeddings of all attributes are summed to form one sequence step
 Bar Embeddings
• Indicate which bar the input token is located in
• B_1, …, B_m (m: max # of bars in a music piece)
 Position Embeddings
• Indicate the timestep at which the current input token is located
• In a <Bar>, there exist O (empty), P_1, …, P_32
 Meta Embeddings
• Meta symbols: Tempo_low, Tempo_mid, Tempo_high
• Output Module
 Predict a note symbol or non-note symbol
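To look up bar and position embeddings, each step needs a bar index and a position index. The sketch below derives both from a token sequence. The token spellings, the use of index 0 for the "empty" position, and giving symbols before the first `<Bar>` bar index 0 are assumptions made for illustration.

```python
def bar_and_position_ids(tokens):
    """Return one (bar_id, pos_id) pair per step; pos_id 0 stands for 'empty'."""
    ids = []
    bar_id, pos_id = 0, 0
    for token in tokens:
        if token == "<Bar>":
            bar_id += 1          # bars are numbered B_1, B_2, ...
            pos_id = 0           # no position seen yet in this bar
        elif isinstance(token, str) and token.startswith("<Pos_"):
            pos_id = int(token[5:-1])
        ids.append((bar_id, pos_id))
    return ids


tokens = [
    "<Bar>", "<Pos_1>", "<Track_Piano>", ("Note", 61, 20, 8),
    "<Pos_9>", ("Note", 65, 20, 8),
    "<Bar>", "<Pos_1>",
]
for token, (bar_id, pos_id) in zip(tokens, bar_and_position_ids(tokens)):
    print(bar_id, pos_id, token)
```

The embedding for a step would then be the sum of the token embedding and the table rows selected by these two indices, plus the meta embedding.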
<Figure 3> Input module of MuMIDI
<Figure 4> Output module of MuMIDI (two cases: predict a note symbol, predict a non-note symbol)
Experimental Setup
Datasets, Model Configurations, Training & Evaluation Setup, Evaluation Metrics
• Three music datasets:
 LMD [23]:
• Get meta info ⇒ Filter MIDIs with ‘pop’ style tag
 FreeMidi:
• Crawl all MIDIs in pop genre in the FreeMidi website
 CPMD
• Data processing
 (1) Melody Extraction:
• MIDI Miner [13] to recognize melody track, or use flute as melody
 (2) Track Compression
• Other tracks ⇒ Compress into five tracks: bass, drum, guitar, piano, and string [11]
 (3) Data Filtration
• Filter out tracks that contain fewer than 20 notes
• ⇒ Keep MIDIs that (1) contain at least 3 tracks and (2) contain the melody track and at least one other track
 (4) Data Segmentation
• Only consider 4/4 time signature
 (5) Chord Recognition
• Infer two chords for each bar
Val: 100 samples / Test: 100 samples / Train: the remaining samples
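The data filtration step can be sketched as a small function over a per-file track dictionary. The data layout (track name mapped to a list of notes), the key `"melody"`, and the function name are hypothetical; only the thresholds come from the slide.

```python
MIN_NOTES_PER_TRACK = 20   # tracks with fewer notes are dropped
MIN_TRACKS = 3             # a MIDI must keep at least this many tracks


def filter_midi(tracks):
    """tracks: dict mapping track name -> list of notes.

    Returns the kept tracks, or None when the file should be discarded.
    """
    kept = {
        name: notes
        for name, notes in tracks.items()
        if len(notes) >= MIN_NOTES_PER_TRACK
    }
    if "melody" not in kept or len(kept) < MIN_TRACKS:
        return None
    return kept


song = {
    "melody": [0] * 30,
    "piano": [0] * 25,
    "bass": [0] * 10,    # too short, dropped
    "drum": [0] * 40,
}
print(sorted(filter_midi(song)))
```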
Experimental Setup
• Model Configurations
 Model:
• Recurrent Transformer Encoder + Recurrent Transformer Decoder
 More details:
• Encoder layers (4), decoder layers (8), encoder heads (8), decoder heads (8)
• Hidden size of all layers and dimension of token, bar, position embeddings: 512
• Training and Evaluation Setup
 Default task:
• To generate five tracks (bass, piano, guitar, string and drum) conditioned on melody and chord
 Max # of generated bars:
• Set to 32
 For inference:
• Stochastic sampling, as in most music generation systems [14, 15]
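The slides do not say which sampling variant is used, so the following is only a generic sketch of stochastic decoding: draw the next token from the softmax of the logits rather than taking the argmax. The temperature and top-k parameters are common options assumed here, not settings reported for PopMAG.

```python
import math
import random


def sample_token(logits, temperature=1.0, top_k=None, rng=random):
    """Draw one index from softmax(logits / temperature), optionally limited to top_k."""
    indices = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    if top_k is not None:
        indices = indices[:top_k]                  # keep the k most likely tokens
    scaled = [logits[i] / temperature for i in indices]
    peak = max(scaled)                             # subtract max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(indices, weights=weights, k=1)[0]


rng = random.Random(0)
logits = [2.0, 1.0, 0.5, -1.0]
print([sample_token(logits, temperature=0.8, top_k=3, rng=rng) for _ in range(10)])
```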
Experimental Setup
• Subjective Evaluation
 What:
• Participants choose the musical piece they prefer in terms of overall harmony
 Who:
• 15 participants in total (5 with a basic understanding of music theory)
 How:
• Each participant listens to a total of 100 listening sets (100 test musical pieces)
• Each set: Contains musical pieces from several settings (e.g., generated, ground truth)
Experimental Setup
• Objective Evaluation
 Chord Accuracy (CA)
• To measure harmony (Higher score ⇒ Better harmony)
• Whether the chords of generated tracks match the conditional chord sequence
 Perplexity (PPL)
• How well a model fits the sequence (Lower perplexity ⇒ The model fits the sequence better)
 Pitch (P), Velocity (V), Duration (D), and Inter-Onset Interval (IOI)
• Measure the difference between the generated musical piece and the ground-truth (GT) musical piece by computing the average OA (Overlapped Area) of the distributions of P, V, D, or IOI. High OA = High similarity
• Pitch (P):
• Compute the distribution of pitch classes (Higher score ⇒ More similar to GT)
• Velocity (V):
• Quantize the note velocity into 32 classes (Higher score ⇒ More similar to GT)
• Duration (D):
• Quantize the duration into 32 classes (Higher score ⇒ More similar to GT)
• Inter-Onset Interval (IOI): The time between the beginning of one note and that of the next one
• Quantize the intervals into 32 classes ⇒ Compute the distribution of interval classes (Higher score ⇒ More similar to GT)
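The objective metrics can be sketched with a few small functions. These are simplified stand-ins under stated assumptions: chord accuracy is a plain match rate over aligned chord labels, perplexity uses natural-log token probabilities, and the overlapped area is computed as a histogram intersection of two class distributions. The exact procedure used in the paper (for example any per-bar averaging or smoothing) is not given in the slides.

```python
import math
from collections import Counter


def chord_accuracy(generated_chords, condition_chords):
    """Fraction of positions where the generated chord matches the condition."""
    matches = sum(g == c for g, c in zip(generated_chords, condition_chords))
    return matches / len(condition_chords)


def perplexity(token_log_probs):
    """exp of the mean negative log-likelihood (natural log)."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def overlapped_area(generated_classes, reference_classes, num_classes=32):
    """Histogram intersection of two normalized class distributions, in [0, 1]."""
    gen, ref = Counter(generated_classes), Counter(reference_classes)
    n_gen, n_ref = len(generated_classes), len(reference_classes)
    return sum(min(gen[c] / n_gen, ref[c] / n_ref) for c in range(num_classes))


def inter_onset_intervals(onsets):
    """Time between the start of each note and the start of the next one."""
    ordered = sorted(onsets)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


print(chord_accuracy(["C", "G", "Am", "F"], ["C", "G", "Am", "G"]))
print(overlapped_area([0, 0, 1, 1], [0, 1, 1, 1]))
print(inter_onset_intervals([0, 4, 4, 12]))
```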
Results and Analyses
Overall Quality, Comparison with Previous Works, Method Analyses, Extension
• Goal
 To evaluate the overall harmony and quality of the musical pieces generated by PopMAG
• How
 GT vs. PopMAG, on the three datasets
• Results
 On the three datasets, 42%, 38%, and 40% of the PopMAG-generated pieces reach the quality of GT
Results and Analyses
Comparison with Previous Works
• MuseGAN [11] vs. PopMAG
• What & How
 Generate four tracks (guitar, drum, string, and bass) conditioned on the piano track
 4 bars of notes in the target tracks / chords NOT used / velocity set to 100
• Result
 PopMAG wins on all subjective and objective metrics
 PopMAG can generate long musical pieces
<Figure 6> Subjective evaluations of several settings
Results and Analyses
• Comparison with Other MIDI Representations
• Goal
 To analyze the effectiveness of MuMIDI representation
• How
 PopMAG vs. REMI [15], MIDI-Like [14]
• Result
 PopMAG: Better scores (more harmonious musical pieces)
<Table 5> Comparison of results among different settings of PopMAG on the LMD dataset
<Figure 6> Subjective evaluations of several settings
Results and Analyses
• Analyses on Note-Level Modeling
• Goal
 To verify the effectiveness of the note-level modeling method (modeling one note in one step)
• How
 PopMAG vs. MIDI-Like [14], REMI [15]
• Result
 PopMAG: Faster
• Shorter target token length
• Shorter training time and lower inference latency
Results and Analyses
• Analyses on Memory in the Encoder and Decoder
• Goal
 To investigate the effectiveness of the context memory in the encoder and decoder
• How
 PopMAG vs.
• PopMAG – DM – EM (#4):
• Removes memory in the encoder and decoder
• PopMAG – DM (#5):
• Removes memory in the decoder
• PopMAG – EM (#6):
• Removes memory in the encoder
• Results
 PopMAG (#1) outperforms the others on all metrics
⇒ Context memory in the encoder and decoder improves performance
 PopMAG – EM (#6) is better than PopMAG – DM (#5)
⇒ Memory in the decoder is more important
<Figure 6> Subjective evaluations of several settings
<Table 5> Comparison of results among different settings of PopMAG on the LMD dataset in the melody-to-others task
Results and Analyses
• Analyses on Bar and Position Embeddings
• Goal
 To verify the effectiveness of bar and position embeddings
• How
 PopMAG (#1) vs. PopMAG – POS – BAR (#7), + Sinusoidal (#8), + Relative Position Encoding (#9)
• Results
 PopMAG outperforms the Sinusoidal (#8) and Relative Position Encoding (#9) variants
⇒ Bar and position embeddings help the model better capture musical structure
<Figure 6> Subjective evaluations of several settings
<Table 5> Comparison of results among different settings of PopMAG on the LMD dataset
Results and Analyses
• Possible Future Extensions
 (1) Generate multi-track accompaniments conditioned only on melody and chord
 (2) Generate more tracks conditioned on other tracks (e.g., melody, chord)
 (3) Recompose a song (remove or generate some tracks)
Conclusion & Future Work
• Main Contributions
 Propose a novel Multi-track MIDI representation (MuMIDI)
• Enables simultaneous multi-track generation in a single sequence
• Explicitly models the dependency of the notes from different tracks
 Conduct experiments
• Three datasets
• Comparison with previous work, plus several ablation studies
 Show that PopMAG outperforms previous work
• Opinions
 Provides detailed explanations of the terms and concepts needed to understand the musical representation
 Open question: how valid are the objective evaluation metrics?
 Subjective evaluation could use more effort
• Only one question asks about the quality (harmony) of the generated songs
 Suggesting usage scenarios (how MuMIDI can help end users) would make the motivation more persuasive
PopMAG: Pop Music Accompaniment Generation

More Related Content

What's hot

Quinteto de vientos (material complementario y bibliografía) P.M.Freiberg 3 de 3
Quinteto de vientos (material complementario y bibliografía) P.M.Freiberg 3 de 3Quinteto de vientos (material complementario y bibliografía) P.M.Freiberg 3 de 3
Quinteto de vientos (material complementario y bibliografía) P.M.Freiberg 3 de 3Pablo Martín Freiberg
 
Cifra e melodia-nelson_faria
Cifra e melodia-nelson_fariaCifra e melodia-nelson_faria
Cifra e melodia-nelson_fariaPedro Victor
 
Beyond the i iv-v
Beyond the i iv-vBeyond the i iv-v
Beyond the i iv-vRufus Jones
 
Apostila batidas de violao vol 3
Apostila batidas de violao   vol 3Apostila batidas de violao   vol 3
Apostila batidas de violao vol 3Leafar Azous
 
Cifras para teclado gospel
Cifras para teclado gospelCifras para teclado gospel
Cifras para teclado gospelOseias Guita
 
Quinteto de vientos (generalidades) P.M.Freiberg 1 de 3
Quinteto de vientos (generalidades) P.M.Freiberg 1 de 3Quinteto de vientos (generalidades) P.M.Freiberg 1 de 3
Quinteto de vientos (generalidades) P.M.Freiberg 1 de 3Pablo Martín Freiberg
 
Divisor
DivisorDivisor
Divisoroupc
 
Poco Piano For Young Children Series: A Preview
Poco  Piano For  Young  Children  Series: A PreviewPoco  Piano For  Young  Children  Series: A Preview
Poco Piano For Young Children Series: A PreviewDavid Ngo
 
Livro de exercícios para clarinete e sax tenor Bb
Livro de exercícios para clarinete e sax tenor BbLivro de exercícios para clarinete e sax tenor Bb
Livro de exercícios para clarinete e sax tenor BbPartitura de Banda
 

What's hot (14)

Quinteto de vientos (material complementario y bibliografía) P.M.Freiberg 3 de 3
Quinteto de vientos (material complementario y bibliografía) P.M.Freiberg 3 de 3Quinteto de vientos (material complementario y bibliografía) P.M.Freiberg 3 de 3
Quinteto de vientos (material complementario y bibliografía) P.M.Freiberg 3 de 3
 
Cifra e melodia-nelson_faria
Cifra e melodia-nelson_fariaCifra e melodia-nelson_faria
Cifra e melodia-nelson_faria
 
Beyond the i iv-v
Beyond the i iv-vBeyond the i iv-v
Beyond the i iv-v
 
Apostila batidas de violao vol 3
Apostila batidas de violao   vol 3Apostila batidas de violao   vol 3
Apostila batidas de violao vol 3
 
Cifras para teclado gospel
Cifras para teclado gospelCifras para teclado gospel
Cifras para teclado gospel
 
Plano de estudo para Guitarristas
Plano de estudo para GuitarristasPlano de estudo para Guitarristas
Plano de estudo para Guitarristas
 
Quinteto de vientos (generalidades) P.M.Freiberg 1 de 3
Quinteto de vientos (generalidades) P.M.Freiberg 1 de 3Quinteto de vientos (generalidades) P.M.Freiberg 1 de 3
Quinteto de vientos (generalidades) P.M.Freiberg 1 de 3
 
Divisor
DivisorDivisor
Divisor
 
Poco Piano For Young Children Series: A Preview
Poco  Piano For  Young  Children  Series: A PreviewPoco  Piano For  Young  Children  Series: A Preview
Poco Piano For Young Children Series: A Preview
 
Curso teclado
Curso tecladoCurso teclado
Curso teclado
 
Abc musical
Abc musicalAbc musical
Abc musical
 
Livro de exercícios para clarinete e sax tenor Bb
Livro de exercícios para clarinete e sax tenor BbLivro de exercícios para clarinete e sax tenor Bb
Livro de exercícios para clarinete e sax tenor Bb
 
04 - BAIXO
04 - BAIXO04 - BAIXO
04 - BAIXO
 
Acordes de piano
Acordes de pianoAcordes de piano
Acordes de piano
 

Similar to PopMAG: Pop Music Accompaniment Generation

MusicBERT: Symbolic Music Understanding with Large-Scale Pre-Training
MusicBERT: Symbolic Music Understanding with Large-Scale Pre-Training  MusicBERT: Symbolic Music Understanding with Large-Scale Pre-Training
MusicBERT: Symbolic Music Understanding with Large-Scale Pre-Training ivaderivader
 
A system to generate rhythms automatically for songs in rhythm game
A system to generate rhythms automatically for songs in rhythm gameA system to generate rhythms automatically for songs in rhythm game
A system to generate rhythms automatically for songs in rhythm gameKuan Ting Chen
 
Learning to Groove with Inverse Sequence Transformations
Learning to Groove with Inverse Sequence TransformationsLearning to Groove with Inverse Sequence Transformations
Learning to Groove with Inverse Sequence Transformationsivaderivader
 
Audio chord recognition using deep neural networks
Audio chord recognition using deep neural networksAudio chord recognition using deep neural networks
Audio chord recognition using deep neural networksbzamecnik
 
Music Creation by Example
Music Creation by ExampleMusic Creation by Example
Music Creation by Exampleivaderivader
 
Video Background Music Generation with Controllable Music Transformer
Video Background Music Generation with Controllable Music TransformerVideo Background Music Generation with Controllable Music Transformer
Video Background Music Generation with Controllable Music Transformerivaderivader
 
Automatic Set List Identification and Song Segmentation of Full-Length Concer...
Automatic Set List Identification and Song Segmentation of Full-Length Concer...Automatic Set List Identification and Song Segmentation of Full-Length Concer...
Automatic Set List Identification and Song Segmentation of Full-Length Concer...Ju-Chiang Wang
 
Deep Learning Meetup #5
Deep Learning Meetup #5Deep Learning Meetup #5
Deep Learning Meetup #5Aloïs Gruson
 
MOODetector: A System for Mood-based Classification and Retrieval of Audio Mu...
MOODetector: A System for Mood-based Classification and Retrieval of Audio Mu...MOODetector: A System for Mood-based Classification and Retrieval of Audio Mu...
MOODetector: A System for Mood-based Classification and Retrieval of Audio Mu...Rui Pedro Paiva
 
Algorithmic Music Design Using Max/Msp
Algorithmic Music Design Using Max/MspAlgorithmic Music Design Using Max/Msp
Algorithmic Music Design Using Max/Msptuleyb
 
Two-step Melody Harmonious Generator
Two-step Melody Harmonious GeneratorTwo-step Melody Harmonious Generator
Two-step Melody Harmonious GeneratorSofya Latkina
 
Trends in Answer Set-Programming - Focus Musik - Presentation
Trends in Answer Set-Programming - Focus Musik - PresentationTrends in Answer Set-Programming - Focus Musik - Presentation
Trends in Answer Set-Programming - Focus Musik - PresentationErhard Dinhobl
 
Recsys Challenge 2018 - Creamy Fireflies - Artist-driven layering and user’s...
Recsys Challenge 2018 - Creamy Fireflies -  Artist-driven layering and user’s...Recsys Challenge 2018 - Creamy Fireflies -  Artist-driven layering and user’s...
Recsys Challenge 2018 - Creamy Fireflies - Artist-driven layering and user’s...Emanuele Chioso
 
Camerata at MediaEval 2014 - Extracting Answer Passages from Classical Music ...
Camerata at MediaEval 2014 - Extracting Answer Passages from Classical Music ...Camerata at MediaEval 2014 - Extracting Answer Passages from Classical Music ...
Camerata at MediaEval 2014 - Extracting Answer Passages from Classical Music ...multimediaeval
 
Foley Music: Learning to Generate Music from Videos
Foley Music: Learning to Generate Music from VideosFoley Music: Learning to Generate Music from Videos
Foley Music: Learning to Generate Music from Videosivaderivader
 
Web audio, Trackers and Making Music
Web audio, Trackers and Making MusicWeb audio, Trackers and Making Music
Web audio, Trackers and Making MusicGibDevs
 
Audio Mastering
Audio MasteringAudio Mastering
Audio MasteringJoe Nasr
 
Data science-2013-heekim
Data science-2013-heekimData science-2013-heekim
Data science-2013-heekimHaklae Kim
 
A Unified Music Recommender System Using Listening Habits and Semantics of Tags
A Unified Music Recommender System Using Listening Habits and Semantics of TagsA Unified Music Recommender System Using Listening Habits and Semantics of Tags
A Unified Music Recommender System Using Listening Habits and Semantics of Tagsdatasciencekorea
 

Similar to PopMAG: Pop Music Accompaniment Generation (20)

MusicBERT: Symbolic Music Understanding with Large-Scale Pre-Training
MusicBERT: Symbolic Music Understanding with Large-Scale Pre-Training  MusicBERT: Symbolic Music Understanding with Large-Scale Pre-Training
MusicBERT: Symbolic Music Understanding with Large-Scale Pre-Training
 
A system to generate rhythms automatically for songs in rhythm game
A system to generate rhythms automatically for songs in rhythm gameA system to generate rhythms automatically for songs in rhythm game
A system to generate rhythms automatically for songs in rhythm game
 
Learning to Groove with Inverse Sequence Transformations
Learning to Groove with Inverse Sequence TransformationsLearning to Groove with Inverse Sequence Transformations
Learning to Groove with Inverse Sequence Transformations
 
Audio chord recognition using deep neural networks
Audio chord recognition using deep neural networksAudio chord recognition using deep neural networks
Audio chord recognition using deep neural networks
 
Music Creation by Example
Music Creation by ExampleMusic Creation by Example
Music Creation by Example
 
Video Background Music Generation with Controllable Music Transformer
Video Background Music Generation with Controllable Music TransformerVideo Background Music Generation with Controllable Music Transformer
Video Background Music Generation with Controllable Music Transformer
 
Automatic Set List Identification and Song Segmentation of Full-Length Concer...
Automatic Set List Identification and Song Segmentation of Full-Length Concer...Automatic Set List Identification and Song Segmentation of Full-Length Concer...
Automatic Set List Identification and Song Segmentation of Full-Length Concer...
 
Deep Learning Meetup #5
Deep Learning Meetup #5Deep Learning Meetup #5
Deep Learning Meetup #5
 
MOODetector: A System for Mood-based Classification and Retrieval of Audio Mu...
MOODetector: A System for Mood-based Classification and Retrieval of Audio Mu...MOODetector: A System for Mood-based Classification and Retrieval of Audio Mu...
MOODetector: A System for Mood-based Classification and Retrieval of Audio Mu...
 
Algorithmic Music Design Using Max/Msp
Algorithmic Music Design Using Max/MspAlgorithmic Music Design Using Max/Msp
Algorithmic Music Design Using Max/Msp
 
Two-step Melody Harmonious Generator
Two-step Melody Harmonious GeneratorTwo-step Melody Harmonious Generator
Two-step Melody Harmonious Generator
 
Trends in Answer Set-Programming - Focus Musik - Presentation
Trends in Answer Set-Programming - Focus Musik - PresentationTrends in Answer Set-Programming - Focus Musik - Presentation
Trends in Answer Set-Programming - Focus Musik - Presentation
 
Recsys Challenge 2018 - Creamy Fireflies - Artist-driven layering and user’s...
Recsys Challenge 2018 - Creamy Fireflies -  Artist-driven layering and user’s...Recsys Challenge 2018 - Creamy Fireflies -  Artist-driven layering and user’s...
Recsys Challenge 2018 - Creamy Fireflies - Artist-driven layering and user’s...
 
Camerata at MediaEval 2014 - Extracting Answer Passages from Classical Music ...
Camerata at MediaEval 2014 - Extracting Answer Passages from Classical Music ...Camerata at MediaEval 2014 - Extracting Answer Passages from Classical Music ...
Camerata at MediaEval 2014 - Extracting Answer Passages from Classical Music ...
 
AMT overview
AMT overviewAMT overview
AMT overview
 
Foley Music: Learning to Generate Music from Videos
Foley Music: Learning to Generate Music from VideosFoley Music: Learning to Generate Music from Videos
Foley Music: Learning to Generate Music from Videos
 
Web audio, Trackers and Making Music
Web audio, Trackers and Making MusicWeb audio, Trackers and Making Music
Web audio, Trackers and Making Music
 
Audio Mastering
Audio MasteringAudio Mastering
Audio Mastering
 
Data science-2013-heekim
Data science-2013-heekimData science-2013-heekim
Data science-2013-heekim
 
A Unified Music Recommender System Using Listening Habits and Semantics of Tags
A Unified Music Recommender System Using Listening Habits and Semantics of TagsA Unified Music Recommender System Using Listening Habits and Semantics of Tags
A Unified Music Recommender System Using Listening Habits and Semantics of Tags
 

More from ivaderivader

DDGK: Learning Graph Representations for Deep Divergence Graph Kernels
DDGK: Learning Graph Representations for Deep Divergence Graph KernelsDDGK: Learning Graph Representations for Deep Divergence Graph Kernels
DDGK: Learning Graph Representations for Deep Divergence Graph Kernelsivaderivader
 
So Predictable! Continuous 3D Hand Trajectory Prediction in Virtual Reality
So Predictable! Continuous 3D Hand Trajectory Prediction in Virtual Reality So Predictable! Continuous 3D Hand Trajectory Prediction in Virtual Reality
So Predictable! Continuous 3D Hand Trajectory Prediction in Virtual Reality ivaderivader
 
Reinforcement Learning-based Placement of Charging Stations in Urban Road Net...
Reinforcement Learning-based Placement of Charging Stations in Urban Road Net...Reinforcement Learning-based Placement of Charging Stations in Urban Road Net...
Reinforcement Learning-based Placement of Charging Stations in Urban Road Net...ivaderivader
 
Prediction for Retrospection: Integrating Algorithmic Stress Prediction into ...
Prediction for Retrospection: Integrating Algorithmic Stress Prediction into ...Prediction for Retrospection: Integrating Algorithmic Stress Prediction into ...
Prediction for Retrospection: Integrating Algorithmic Stress Prediction into ...ivaderivader
 
Mem2Seq: Effectively Incorporating Knowledge Bases into End-to-End Task-Orien...
Mem2Seq: Effectively Incorporating Knowledge Bases into End-to-End Task-Orien...Mem2Seq: Effectively Incorporating Knowledge Bases into End-to-End Task-Orien...
Mem2Seq: Effectively Incorporating Knowledge Bases into End-to-End Task-Orien...ivaderivader
 
A Style-Based Generator Architecture for Generative Adversarial Networks
A Style-Based Generator Architecture for Generative Adversarial NetworksA Style-Based Generator Architecture for Generative Adversarial Networks
A Style-Based Generator Architecture for Generative Adversarial Networksivaderivader
 
CatchLIve: Real-time Summarization of Live Streams with Stream Content and In...
CatchLIve: Real-time Summarization of Live Streams with Stream Content and In...CatchLIve: Real-time Summarization of Live Streams with Stream Content and In...
CatchLIve: Real-time Summarization of Live Streams with Stream Content and In...ivaderivader
 
Perception! Immersion! Empowerment! Superpowers as Inspiration for Visualization
Perception! Immersion! Empowerment! Superpowers as Inspiration for VisualizationPerception! Immersion! Empowerment! Superpowers as Inspiration for Visualization
Perception! Immersion! Empowerment! Superpowers as Inspiration for Visualizationivaderivader
 
Learning to Remember Patterns: Pattern Matching Memory Networks for Traffic F...
Learning to Remember Patterns: Pattern Matching Memory Networks for Traffic F...Learning to Remember Patterns: Pattern Matching Memory Networks for Traffic F...
Learning to Remember Patterns: Pattern Matching Memory Networks for Traffic F...ivaderivader
 
Neural Approximate Dynamic Programming for On-Demand Ride-Pooling
Neural Approximate Dynamic Programming for On-Demand Ride-PoolingNeural Approximate Dynamic Programming for On-Demand Ride-Pooling
Neural Approximate Dynamic Programming for On-Demand Ride-Poolingivaderivader
 
StoryMap: Using Social Modeling and Self-Modeling to Support Physical Activit...
StoryMap: Using Social Modeling and Self-Modeling to Support Physical Activit...StoryMap: Using Social Modeling and Self-Modeling to Support Physical Activit...
StoryMap: Using Social Modeling and Self-Modeling to Support Physical Activit...ivaderivader
 
Bad Breakdowns, Useful Seams, and Face Slapping: Analysis of VR Fails on YouTube
Bad Breakdowns, Useful Seams, and Face Slapping: Analysis of VR Fails on YouTubeBad Breakdowns, Useful Seams, and Face Slapping: Analysis of VR Fails on YouTube
Bad Breakdowns, Useful Seams, and Face Slapping: Analysis of VR Fails on YouTubeivaderivader
 
Invertible Denoising Network: A Light Solution for Real Noise Removal
Invertible Denoising Network: A Light Solution for Real Noise RemovalInvertible Denoising Network: A Light Solution for Real Noise Removal
Invertible Denoising Network: A Light Solution for Real Noise Removalivaderivader
 
Traffic Demand Prediction Based Dynamic Transition Convolutional Neural Network
Traffic Demand Prediction Based Dynamic Transition Convolutional Neural NetworkTraffic Demand Prediction Based Dynamic Transition Convolutional Neural Network
Traffic Demand Prediction Based Dynamic Transition Convolutional Neural Networkivaderivader
 
Screen2Vec: Semantic Embedding of GUI Screens and GUI Components
Screen2Vec: Semantic Embedding of GUI Screens and GUI ComponentsScreen2Vec: Semantic Embedding of GUI Screens and GUI Components
Screen2Vec: Semantic Embedding of GUI Screens and GUI Componentsivaderivader
 
Augmenting Decisions of Taxi Drivers through Reinforcement Learning for Impro...
Augmenting Decisions of Taxi Drivers through Reinforcement Learning for Impro...Augmenting Decisions of Taxi Drivers through Reinforcement Learning for Impro...
Augmenting Decisions of Taxi Drivers through Reinforcement Learning for Impro...ivaderivader
 
Natural Language to Visualization by Neural Machine Translation
Natural Language to Visualization by Neural Machine TranslationNatural Language to Visualization by Neural Machine Translation
Natural Language to Visualization by Neural Machine Translationivaderivader
 
Recommending What Video to Watch Next: A Multitask Ranking System
Recommending What Video to Watch Next: A Multitask Ranking SystemRecommending What Video to Watch Next: A Multitask Ranking System
Recommending What Video to Watch Next: A Multitask Ranking Systemivaderivader
 

PopMAG: Pop Music Accompaniment Generation

  • 1. PopMAG: Pop Music Accompaniment Generation
      Hyeshin Chu, 2021. 08. 20
      MM 2020: Yi Ren, Jinzheng He, Xu Tan, Tao Qin, Zhou Zhao, Tie-Yan Liu
  • 2. Contents
      • Overview of the Paper
      • Introduction
      • Multi-Track MIDI Representation
      • Multi-Track Modeling
      • Experimental Setup
      • Results and Analyses
      • Conclusion & Future Work
  • 3. Overview of the Paper (https://music-popmag.github.io/popmag/)
      • Goal
        - Improve the harmony of the accompaniment (usually played by multiple instruments)
      • Previous works
        - Generate multiple tracks separately
        - Notes from different tracks do NOT explicitly depend on each other
      • MuMIDI (MUlti-track MIDI representation)
        - Simultaneous multi-track generation in a single sequence
        - Explicitly models the dependency between notes from different tracks
        - Challenge: the single sequence is longer ⇒ long-term structure is difficult to model
        - How to solve?
          - 1) Model multiple attributes of a musical note (e.g., pitch, duration, velocity) in one step (NOT multiple steps) ⇒ shortens the MuMIDI sequence
          - 2) Introduce extra long context as memory to capture long-term dependency in music
  • 4. Introduction
      • Music sequence modeling using deep learning techniques
        - CNN [30], RNN [26], Transformer [5, 14, 15], VAE [25], GAN [11]
      • Pop music generation consists of two parts
        - Chord and melody generation
        - Accompaniment generation
      • Accompaniment generation ⇒ multi-track generation
        - MuseGAN [10-12]
        - MIDI-Sandwich2 [18]
        - XiaoIce Band [34]
        - LakhNES [9]
      • MuMIDI
        - Encodes multi-track MIDI events into one sequence of tokens ⇒ better captures the dependency among notes in different tracks
        - Models multiple attributes in one sequence step instead of multiple steps ⇒ shortens the sequence length
        - Addresses long-term music modeling challenges
  • 5. Multi-Track Representation
      • Example: a 2-track musical piece (piano track, bass track)
        - Piano track: 10 notes
        - Bass track: 5 notes
      • Symbol types: Bar and Position, Track, Note, Chord, Meta Symbol
  • 6. Multi-Track Representation: Bar and Position
      • Mark the beginning of a bar and the different positions within a bar
        - <Bar> ∋ <Pos>, <Track>, <Note>, <Chord>
        - Each <Bar> has a total of 32 timesteps, called <Pos>
      • [Figure: example sequence with positions Pos 1, Pos 8, Pos 12, Pos 16, Pos 24]
  • 7. Multi-Track Representation: Track
      • Six <Track> symbols:
        - <Track_Melody>, <Track_Drum>, <Track_Piano>, <Track_String>, <Track_Guitar>, <Track_Bass>
      • [Figure: example sequence with positions Pos 1, Pos 8, Pos 12, Pos 16, Pos 24]
  • 8. Multi-Track Representation: Note
      • <Note> includes four attributes:
        - Pitch: Pitch_1 (C-1) to Pitch_128 (G9) for all tracks except drum
        - Velocity: quantized into 32 levels (how hard the key was struck)
        - Duration: the duration of the note, from 1 timestep to 32 timesteps
  • 9. Multi-Track Representation: Chord
      • Guides the pitch range of the notes and the emotion
      • 84 possible chord symbols in total
        - 12 chord roots (C, C#, D, D#, E, F, …) × 7 chord qualities (major, minor, diminished, …)
  • 10. Multi-Track Representation: Meta Symbol
      • Encodes the metadata of the whole musical piece
        - Includes tempo, tonality, style, and emotion
        - Usually unchanged throughout the whole musical piece
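The symbol types on slides 5 to 10 can be made concrete with a small sketch that flattens the notes of one bar into a single token list. The 32 positions per bar and the 32 velocity levels come from the slides. The token spelling, the sort order, the uniform velocity bins, and the use of raw MIDI pitch numbers (0 to 127) are illustrative assumptions, not the paper's exact format.

```python
from collections import defaultdict

POSITIONS_PER_BAR = 32  # from the slides: each <Bar> has 32 <Pos> timesteps
VELOCITY_LEVELS = 32    # from the slides: velocity is quantized into 32 levels


def quantize_velocity(midi_velocity):
    """Map a MIDI velocity (0-127) to one of 32 levels (assumed uniform bins)."""
    return midi_velocity * VELOCITY_LEVELS // 128


def encode_bar(notes):
    """Flatten the notes of one bar, from all tracks, into one token list.

    `notes` holds (position, track, pitch, velocity, duration) tuples, with
    position and duration counted in timesteps (1..32). Token names and
    ordering are hypothetical.
    """
    by_pos = defaultdict(lambda: defaultdict(list))
    for pos, track, pitch, vel, dur in notes:
        by_pos[pos][track].append((pitch, quantize_velocity(vel), dur))

    tokens = ["<Bar>"]
    for pos in sorted(by_pos):
        tokens.append(f"<Pos_{pos}>")
        for track in sorted(by_pos[pos]):
            tokens.append(f"<Track_{track}>")
            for pitch, vel, dur in sorted(by_pos[pos][track]):
                # One note occupies one sequence step carrying three attributes.
                tokens.append(("<Note>", pitch, vel, dur))
    return tokens


bar = [
    (1, "Piano", 60, 100, 8),
    (1, "Piano", 64, 100, 8),
    (1, "Bass", 36, 90, 16),
    (16, "Piano", 67, 80, 8),
]
tokens = encode_bar(bar)
```

Because notes from both tracks at the same position sit next to each other in `tokens`, a sequence model reading the list sees the piano and bass notes in one shared context.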
  • 11. Multi-Track Modeling
      • Subsections: Modeling One Note in One Step, Modeling Long-Term Structure, Modeling Implementation
      • MuMIDI
        - Encodes multi-track MIDI events into a single sequence ⇒ long sequence ⇒ long-term structure is difficult to model
      • Two ways to better model a long sequence
        - (1) Shorten the sequence length:
          - Model multiple attributes of a note (e.g., pitch, duration, velocity) in one sequence step (NOT in multiple steps)
        - (2) Adopt extra long context to capture long-term dependencies:
          - In the encoder and decoder of the seq-to-seq model
  • 12. Multi-Track Modeling: Modeling One Note in One Step
      • Why?
        - To let the model learn from longer music structure
      • How?
        - Apply note-level modeling: model multiple attributes of one note in one sequence step
        - Regard each attribute of a note (pitch, velocity, duration) as an embedding
        - The sum of all attribute embeddings represents one note ⇒ this sum is the input to the encoder and decoder of the seq-to-seq model at each time step
      • Result
        - Shorter input and output sequences
        - Better capture of long-term dependency
        - Faster training and inference
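A minimal sketch of the note-level input described on slide 12, assuming three lookup tables (one per attribute) whose rows are summed. The random tables stand in for learned embeddings, and the toy size `HIDDEN = 8` is an assumption for readability (the slides report 512 for the real model).

```python
import random

HIDDEN = 8  # toy size; the slides report 512 for the real model


def make_table(num_rows, dim, rng):
    """Random lookup table standing in for a learned embedding matrix."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(num_rows)]


rng = random.Random(0)
pitch_table = make_table(128, HIDDEN, rng)    # 128 pitches
velocity_table = make_table(32, HIDDEN, rng)  # 32 velocity levels
duration_table = make_table(32, HIDDEN, rng)  # durations of 1..32 timesteps


def note_embedding(pitch, velocity_level, duration):
    """Sum the three attribute embeddings into one vector for one step."""
    p = pitch_table[pitch]
    v = velocity_table[velocity_level]
    d = duration_table[duration - 1]  # durations are 1-based
    return [a + b + c for a, b, c in zip(p, v, d)]


notes = [(60, 25, 8), (64, 25, 8), (36, 22, 16)]
step_inputs = [note_embedding(*n) for n in notes]

# An encoding that spends one step per attribute needs three steps per note;
# the note-level encoding needs one.
steps_attribute_level = 3 * len(notes)
steps_note_level = len(step_inputs)
```

For these three notes the attribute-per-step layout would use 9 sequence steps and the note-level layout uses 3, which is the shortening the slide refers to.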
  • 13. Multi-Track Modeling: Modeling Long-Term Structure
      • Why?
        - To capture and exploit the long-range context
      • How?
        - Recurrence Transformer Encoder
          - Encodes each token x_i of the conditional tracks (in sequence step i)
          - Encoder outputs are fed into the decoder as condition context
        - Recurrence Transformer Decoder
          - Generates token y_j conditioned on 1) the previously generated tokens y_t (t < j) and 2) the context from the encoder
          - Each token in the decoder only sees the condition context of the same bar
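The rule that a decoder token only sees the condition context of its own bar can be pictured as a boolean cross-attention mask. The sketch below covers only that same-bar rule, not the recurrence memory. The function name and the per-step bar-index lists are hypothetical.

```python
def same_bar_mask(decoder_bars, encoder_bars):
    """mask[j][i] is True when decoder step j may attend to encoder step i.

    Both arguments list the bar index of every sequence step.
    """
    return [[d == e for e in encoder_bars] for d in decoder_bars]


# Toy example: the encoder holds 3 steps in bar 1 and 2 steps in bar 2.
encoder_bars = [1, 1, 1, 2, 2]
decoder_bars = [1, 1, 2, 2, 2, 2]
mask = same_bar_mask(decoder_bars, encoder_bars)
```

In an attention layer, positions where the mask is False would typically be excluded before the softmax, so a decoder step in bar 2 draws only on the melody and chord tokens of bar 2.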
  • 14. Multi-Track Modeling: Modeling Implementation
      • Input module
        - The input embedding in each timestep is the sum of the token, meta, position, and bar embeddings of that timestep
        - Token embeddings
          - Cover <Note>, <Bar>, <Pos>, <Track>, <Chord>, etc.
          - <Note>: (1) all attributes (pitch, duration, velocity) of one note form one token; (2) the embeddings of all attributes are summed into one sequence step
        - Bar embeddings
          - Indicate which bar the input token is located in
          - B_1, …, B_m (m: maximum number of bars in a musical piece)
        - Position embeddings
          - Indicate the timestep the current input token is located in
          - Within a <Bar>: O (empty), P_1, …, P_32
        - Meta embeddings
          - Meta symbols: Tempo_low, Tempo_mid, Tempo_high
      • Output module
        - Predicts a note symbol or a non-note symbol
      • <Figure 3> Input module of MuMIDI
      • <Figure 4> Output module of MuMIDI (predict a note symbol / predict a non-note symbol)
  • 15. Experimental Setup: Datasets
      • Subsections: Datasets, Model Configurations, Training & Evaluation Setup, Evaluation Metrics
      • Three music datasets:
        - LMD [23]: get the meta info ⇒ keep MIDIs with the ‘pop’ style tag
        - FreeMidi: crawl all MIDIs in the pop genre from the FreeMidi website
        - CPMD
      • Data processing
        - (1) Melody extraction: use MIDI Miner [13] to recognize the melody track, or use the flute track as the melody
        - (2) Track compression: compress the other tracks into five tracks: bass, drum, guitar, piano, and string [11]
        - (3) Data filtration: filter out tracks that contain fewer than 20 notes ⇒ keep MIDIs that (1) contain at least 3 tracks and (2) contain a melody track and at least one other track
        - (4) Data segmentation: only consider the 4/4 time signature
        - (5) Chord recognition: infer two chords for each bar
      • Split: Val: 100 samples / Test: 100 samples / Train: the remaining samples
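The filtration and split rules on slide 15 can be sketched as follows. The thresholds (20 notes, 3 tracks, 100 validation and 100 test samples) come from the slide. The dict-of-lists data layout, the function names, and the choice to take the first samples for validation and test are assumptions, since the slides do not say how the samples were picked.

```python
MIN_NOTES_PER_TRACK = 20  # from the slide: drop tracks with fewer than 20 notes
MIN_TRACKS = 3            # from the slide: keep MIDIs with at least 3 tracks


def filter_piece(tracks):
    """Return the surviving tracks of one piece, or None if the piece is dropped.

    `tracks` maps a track name (e.g. "melody", "piano") to its list of notes.
    """
    kept = {name: notes for name, notes in tracks.items()
            if len(notes) >= MIN_NOTES_PER_TRACK}
    if "melody" not in kept or len(kept) < MIN_TRACKS:
        return None
    return kept


def split_dataset(pieces, n_val=100, n_test=100):
    """Split into (train, val, test); taking the first items is an assumption."""
    val = pieces[:n_val]
    test = pieces[n_val:n_val + n_test]
    train = pieces[n_val + n_test:]
    return train, val, test


piece = {"melody": [0] * 25, "piano": [0] * 30, "bass": [0] * 10, "drum": [0] * 40}
kept = filter_piece(piece)  # bass is dropped; melody, piano, and drum remain
```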
  • 16. Experimental Setup: Model Configurations, Training & Evaluation Setup
      • Model configurations
        - Model: Recurrent Transformer Encoder + Recurrent Transformer Decoder
        - More details:
          - Encoder layers (4), decoder layers (8), encoder heads (8), decoder heads (8)
          - Hidden size of all layers and dimension of the token, bar, and position embeddings: 512
      • Training and evaluation setup
        - Default task: generate five tracks (bass, piano, guitar, string, and drum) conditioned on melody and chord
        - Maximum number of generated bars: 32
        - Inference: stochastic sampling, as in most music generation systems [14, 15]
  • 17. Experimental Setup: Evaluation Metrics (Subjective)
      • Subjective evaluation
        - What: choose the musical piece you prefer in terms of overall harmony
        - Who: 15 participants in total (5 of whom understand basic music theory)
        - How:
          - Each participant listens to a total of 100 listening sets (100 test musical pieces)
          - Each set contains musical pieces from several settings (e.g., generated, ground truth)
  • 18. Experimental Setup: Evaluation Metrics (Objective)
      • Objective evaluation
        - Chord Accuracy (CA)
          - Measures harmony (higher score ⇒ better harmony)
          - Whether the chords of the generated tracks match the conditional chord sequence
        - Perplexity (PPL)
          - How well a model fits the sequence (lower perplexity ⇒ the model fits the sequence better)
        - Pitch (P), Velocity (V), Duration (D), and Inter-Onset Interval (IOI)
          - Measure the difference between the generated musical piece and the ground-truth (GT) musical piece by computing the average OA (Overlapped Area) of the distributions of P, V, D, or IOI. High OA = high similarity
          - Pitch (P): compute the distribution of pitch classes (higher score ⇒ more similar to GT)
          - Velocity (V): quantize the note velocity into 32 classes (higher score ⇒ more similar to GT)
          - Duration (D): quantize the duration into 32 classes (higher score ⇒ more similar to GT)
          - Inter-Onset Interval (IOI): the time between the beginning of one note and that of the next one; quantize the intervals into 32 classes ⇒ compute the distribution of interval classes (higher score ⇒ more similar to GT)
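A rough sketch of the three kinds of objective metric on slide 18. Perplexity is written in its standard form, the exponential of the average negative log-likelihood. The overlapped area is computed here on discrete normalized histograms and chord accuracy as a simple per-slot match rate. Both are simplifying assumptions, and the paper may compute them differently (for example on smoothed densities, or after running chord recognition on the generated tracks).

```python
import math


def histogram(values, num_classes):
    """Normalized histogram of integer class indices in [0, num_classes)."""
    counts = [0] * num_classes
    for v in values:
        counts[v] += 1
    return [c / len(values) for c in counts]


def overlapped_area(generated, ground_truth, num_classes=32):
    """Overlap of two class distributions: 1.0 = identical, 0.0 = disjoint."""
    p = histogram(generated, num_classes)
    q = histogram(ground_truth, num_classes)
    return sum(min(a, b) for a, b in zip(p, q))


def chord_accuracy(generated_chords, conditional_chords):
    """Fraction of chord slots where the generated chord matches the condition."""
    matches = sum(g == c for g, c in zip(generated_chords, conditional_chords))
    return matches / len(conditional_chords)


def perplexity(token_probs):
    """exp of the average negative log-probability the model gave each token."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)


oa = overlapped_area([0, 0, 1, 1], [0, 1, 1, 1], num_classes=4)
ca = chord_accuracy(["C", "G", "Am", "F"], ["C", "G", "Am", "G"])
ppl = perplexity([0.5, 0.5])
```

In this toy run the two histograms are [0.5, 0.5, 0, 0] and [0.25, 0.75, 0, 0], so the overlap is 0.25 + 0.5 = 0.75. Three of four chords match, and a model that assigns probability 0.5 to every token has perplexity 2.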
  • 19. Results and Analyses: Overall Quality
      • Subsections: Overall Quality, Comparison with Previous Works, Method Analyses, Extension
      • Goal
        - Evaluate the overall harmony and quality of the musical pieces generated by PopMAG
      • How
        - GT vs. PopMAG, on three datasets
      • Results
        - On the three datasets, 42%, 38%, and 40% of the PopMAG-generated musical pieces reached the quality of GT
  • 20. Results and Analyses: Comparison with Previous Works
      • MuseGAN [11] vs. PopMAG
      • What & how
        - Generate four tracks (guitar, drum, string, and bass) conditioned on the piano track
        - 4 bars of notes in the target tracks / chords NOT used / velocity 100
      • Result
        - PopMAG wins on all subjective and objective metrics
        - PopMAG can generate long musical pieces
      • <Figure 6> Subjective evaluations of several settings
  • 21. Results and Analyses: Method Analyses
      • Comparison with other MIDI representations
      • Goal
        - Analyze the effectiveness of the MuMIDI representation
      • How
        - PopMAG vs. REMI [15], MIDI-Like [14]
      • Result
        - PopMAG: better scores (more harmonious musical pieces)
      • <Table 5> Comparison of results among different settings of PopMAG on the LMD dataset
      • <Figure 6> Subjective evaluations of several settings
  • 22. Results and Analyses: Method Analyses
      • Analyses on note-level modeling
      • Goal
        - Verify the effectiveness of the note-level modeling method (modeling one note in one step)
      • How
        - PopMAG vs. MIDI-Like [14], REMI [15]
      • Result
        - PopMAG is faster
          - Shorter target token length
          - Shorter training time and lower latency
  • 23. Results and Analyses: Method Analyses
      • Analyses on memory in the encoder and decoder
      • Goal
        - Investigate the effectiveness of the context memory in the encoder and decoder
      • How
        - PopMAG vs.
          - PopMAG – DM – EM (#4): removes the memory in the encoder and decoder
          - PopMAG – DM (#5): removes the memory in the decoder
          - PopMAG – EM (#6): removes the memory in the encoder
      • Results
        - PopMAG (#1) outperforms the others on all metrics ⇒ context memory in the encoder and decoder improves performance
        - PopMAG – EM (#6) is better than PopMAG – DM (#5) ⇒ memory in the decoder is more important
      • <Figure 6> Subjective evaluations of several settings
      • <Table 5> Comparison of results among different settings of PopMAG on the LMD dataset in the melody-to-others task
  • 24. Results and Analyses: Method Analyses
      • Analyses on bar and position embeddings
      • Goal
        - Show the effectiveness of the bar and position embeddings
      • How
        - PopMAG (#1) vs. PopMAG – POS – BAR (#7), + Sinusoidal (#8), + Relative Position Encoding (#9)
      • Results
        - PopMAG outperforms Sinusoidal (#8) and Relative (#9) ⇒ bar and position embeddings help the model better capture the music structure
      • <Figure 6> Subjective evaluations of several settings
      • <Table 5> Comparison of results among different settings of PopMAG on the LMD dataset
  • 25. Results and Analyses: Extension
      • Possible future extensions
        - (1) Generate multi-track accompaniments conditioned only on melody and chord
        - (2) Generate more tracks conditioned on other tracks (e.g., melody, chord, etc.)
        - (3) Recompose a song (remove or generate some tracks)
  • 26. Conclusion & Future Work
      • Main contributions
        - Proposes a novel multi-track MIDI representation (MuMIDI)
          - Enables simultaneous multi-track generation in a single sequence
          - Explicitly models the dependency between notes from different tracks
        - Conducts experiments
          - Three datasets
          - Comparison with previous work, plus several ablation studies
        - Shows superior performance
      • Opinions
        - Provides detailed explanations of the terms and concepts needed to understand the musical representation
        - Remaining question about the objective evaluation metrics: are they valid?
        - Effort on subjective evaluation
          - Only one question was asked about the quality (harmony) of the generated songs
        - Suggesting usage scenarios (how MuMIDI can help end users) would make the motivation more persuasive