SlideShare a Scribd company logo
1 of 33
Download to read offline
Convolutional Generative Adversarial
Networks with Binary Neurons for
Polyphonic Music Generation
Hao-Wen Dong and Yi-Hsuan Yang
Research Center of IT Innovation, Academia Sinica
Outlines
• Introduction
• Binary Neurons
• Proposed Model
• Data
• Results
• Future Works
Source Code https://github.com/salu133445/bmusegan
Demo Page https://salu133445.github.io/bmusegan/
Introduction
Introduction
• MuseGAN
。can only generate real-valued predictions
。require postprocessing at test time
(e.g., hard thresholding or Bernoulli sampling)
zzzzz
z
Bar Generator
Gz
GGGGG
zzzz
zzzz
zzzzz
z
z
z
z
GGGGG
Introduction
• Naïve binarization methods can lead to overly-fragmented notes
raw
Bernoulli
sampling
hard
thresholding
Introduction
• Real-valued predictions can lead to training difficulties of the discriminator
Decision boundaries to learn for the discriminator
the generator output
real values
the generator output
binary values
• real samples
• fake samples
--- decision boundaries
Binary Neurons
Binary Neurons
• Neurons that output binary-valued predictions
• In this work, we consider
。deterministic binary neurons (DBNs)
𝐷𝐷𝐷𝐷𝐷𝐷 𝑥𝑥 = �
1, 𝑖𝑖𝑖𝑖 𝜎𝜎 𝑥𝑥 > 0.5
0, 𝑜𝑜𝑜𝑜 𝑜𝑜𝑜𝑜𝑜𝑜𝑜𝑜𝑜𝑜𝑜𝑜𝑜
。stochastic binary neurons (SBNs)
𝑆𝑆𝐵𝐵𝐵𝐵 𝑥𝑥 = �
1, 𝑖𝑖𝑖𝑖 𝑧𝑧 < 𝜎𝜎 𝑥𝑥
0, 𝑜𝑜𝑜𝑜 𝑜𝑜𝑜𝑜𝑜𝑜𝑜𝑜𝑜𝑜𝑜𝑜𝑜
, 𝑧𝑧~𝑈𝑈 0, 1
Gradient Estimators
• Computing the exact gradients for binary neurons is intractable
• Straight-through (ST) estimator
。treat BNs as identity functions in the backward pass
𝜕𝜕𝐵𝐵𝐵𝐵 𝑥𝑥
𝜕𝜕𝑥𝑥
= 1
• Sigmoid-adjusted ST estimator
。treat BNs as identity functions multiplied by the derivative of the sigmoid
function in the backward pass
𝜕𝜕𝐵𝐵𝐵𝐵 𝑥𝑥
𝜕𝜕𝑥𝑥
= 𝜎𝜎 𝑥𝑥
Proposed Model
D 1/0
Generator
Make G(z) indistinguishable
from real data for D Discriminator
Tell G(z) as fake data
from X being real ones
real samples
Gz ̴ pz G(z)
random noise fake samples
Generative Adversarial Networks
X ̴ pX
Generator
• One single input random vector
• Shared/private design
。Different tracks have their own musical properties (e.g. textures, patterns, techniques)
。Jointly all tracks follow a common, high-level musical idea
Gsz
Gp
Gp
Gp
…
noise
1
2
M
x1
x2
xM
…
ˆ
ˆ
ˆ
shared
private
Refiner
• Refine the real-valued outputs of the generator into binary ones
• Composed of a number of residual units
Residual
Block
Residual
Block
…xi xi
BNsˆ
(1, 3, 12)
× 64
+
add
(1, 3, 12)
× 1
DBNs — deterministic binary neurons
SBNs — stochastic binary neurons
Refiner
• Refine the real-valued outputs of the generator into binary ones
• Composed of a number of residual units
Gsz
Gp
Gp
Gp
…
noise
1
2
M
x1
x2
xM
…
R
R
R
1
2
M
Synth Pad
x1
x2
xM
Drums
Piano
…
…
ˆ
ˆ
ˆ ~
~
~
real-valued binary-valued
Discriminator
• Shared/private design (similar to the generator)
• Additional onset/offset stream and chroma stream
Dp
Dp
Dp
…
Ds
True/Fake
Synth Pad
x1 or x1
x2 or x2
xM or xM
Drums
Piano
…
…
+
concat
Do
Dm
onset/offset
extractor
chroma
extractor Dc
1
2
M~
~
~ shared
private
onset/offset stream
chroma stream
Two-stage Training
• First stage — pretrain the generator and discriminator
• Second stage — train the refiner and discriminator (with G fixed)
Gsz
Gp
Gp
Gp
…
noise
1
2
M
x1
x2
xM
… R
R
R
1
2
M
x1
x2
xM
…
…
ˆ
ˆ
ˆ ~
~
~
Dp
Dp
Dp
…
Ds
True/Fake
Synth Pad
x1 or x1
x2 or x2
xM or xM
Drums
Piano
…
…
+
concat
Do
Dm
onset/offset
extractor
chroma
extractor Dc
1
2
M
…
~
~
~
training data
Data
Data Representation
• Multi-track piano-roll
• 8 tracks
。Drums, Piano, Guitar, Bass, Ensemble, Reed, Synth Lead and Synth Pad
96 time steps
84 pitches
8 tracks
4 bars
a 4×96×84×8 tensor
Training Data
• Lakh Pianoroll Dataset (LPD)
• 13746 four-bar phrases from 2291 songs (six for each)
。Pick only songs in 4/4 time and with an alternative tag
Drums
Piano
Guitar
Bass
Ensemble
Reed
Synth Lead
Synth Pad
Results
Qualitative Comparison
raw
pretrained
(+BS)
pretrained
(+HT)
proposed
(+SBNs)
proposed
(+DBNs)
raw
proposed (+SBNs) proposed (+DBNs)
pretrained (+HT)pretrained (+BS)
Audio Samples
• proposed (+DBNs) — fewer overly-fragmented notes; more out-of-scale notes
• proposed (+SBNs) — more overly-fragmented notes ; lots of artifacts
pretrained
(+BS)
pretrained
(+HT)
proposed
(+SBNs)
proposed
(+DBNs)
Sample 1 Sample 2 Sample 1 Sample 2
More samples available on demo page
https://salu133445.github.io/bmusegan/
Evaluation Metrics
• Qualified note rate (QN)
QN =
# of notes no shorter than 3 time steps (i.e., a 32th note)
# of notes
• Polyphonicity (PP)
PP =
# of time steps where more than two pitches are played
# of time steps
• Tonal distance (TD)
。measure the distance between two chroma features in a tonal space
Comparisons of Training Strategies
• Two-stage training (proposed) — [stage 1] pretrain G and D [stage 2] train R and D
• Joint training (joint) — [stage 1] pretrain G and D [stage 2] train G, R and D
• End-to-end training (end-to-end) — train G, R and D in one stage
(values closer to that of the training data is better; underline: closest; bold: top 3 closest)
Comparisons of Training Strategies
QN
PP
Comparisons of Training Strategies
QN
PP
End-to-end Models
• First attempt, to our best knowledge, to generate such high-dimensional data with
binary neurons from scratch
DBNsSBNs
chords
bass line
End-to-end Models
• First attempt, to our best knowledge, to generate such high-dimensional data with
binary neurons from scratch
DBNsSBNs
chords
bass line
drum patterns
Effects of the Discriminator Design
• pretrained — shared/private design + offset/onset stream + chroma stream
• ablated — shared/private design
• baseline — only one shared discriminator
Future Works
Summary
• A convolutional GAN for binary-valued multi-track piano-rolls
。CNNs + residual units + binary neurons
。Shared/private design in both the generator and discriminator (proved effective)
。Onset/offset and chroma streams in the discriminator (proved effective)
。Two-stage training (proved effective)
• Proposed model with deterministic binary neurons (DBNs) features fewer overly-
fragmented notes as compared with existing methods.
Future Works
• Tradeoff
。easy-to-train generator + hard-to-train discriminator
。hard-to-train generator + easy-to-train discriminator
• Longer music
。RNNs (LSTMs or GRUs)
。How to generate high-level/long-term structure?
• More tracks
。Symphony/orchestra compositions
。(hierarchical) sections  sub-sections  instruments
Thank you for your attention

More Related Content

What's hot

Lec 07 image enhancement in frequency domain i
Lec 07 image enhancement in frequency domain iLec 07 image enhancement in frequency domain i
Lec 07 image enhancement in frequency domain i
Ali Hassan
 
Lect2 up390 (100329)
Lect2 up390 (100329)Lect2 up390 (100329)
Lect2 up390 (100329)
aicdesign
 
Satellite communications by dennis roddy (4th edition)
Satellite communications by dennis roddy (4th edition)Satellite communications by dennis roddy (4th edition)
Satellite communications by dennis roddy (4th edition)
Adam Năm
 
Satellite communications by dennis roddy (4th edition)
Satellite communications by dennis roddy (4th edition)Satellite communications by dennis roddy (4th edition)
Satellite communications by dennis roddy (4th edition)
Adam Năm
 

What's hot (20)

Final presentation
Final presentationFinal presentation
Final presentation
 
Lect5 v2
Lect5 v2Lect5 v2
Lect5 v2
 
07 frequency domain DIP
07 frequency domain DIP07 frequency domain DIP
07 frequency domain DIP
 
Pulse Code Modulation
Pulse Code Modulation Pulse Code Modulation
Pulse Code Modulation
 
Lec 07 image enhancement in frequency domain i
Lec 07 image enhancement in frequency domain iLec 07 image enhancement in frequency domain i
Lec 07 image enhancement in frequency domain i
 
Audio Signal Processing
Audio Signal Processing Audio Signal Processing
Audio Signal Processing
 
Differential pulse code modulation
Differential pulse code modulationDifferential pulse code modulation
Differential pulse code modulation
 
Dpcm ( Differential Pulse Code Modulation )
Dpcm ( Differential Pulse Code Modulation )Dpcm ( Differential Pulse Code Modulation )
Dpcm ( Differential Pulse Code Modulation )
 
Sound Source Localization with microphone arrays
Sound Source Localization with microphone arraysSound Source Localization with microphone arrays
Sound Source Localization with microphone arrays
 
Lecture 10
Lecture 10Lecture 10
Lecture 10
 
Demixing Commercial Music Productions via Human-Assisted Time-Frequency Masking
Demixing Commercial Music Productions via Human-Assisted Time-Frequency MaskingDemixing Commercial Music Productions via Human-Assisted Time-Frequency Masking
Demixing Commercial Music Productions via Human-Assisted Time-Frequency Masking
 
Noise Performance of CW system
Noise Performance of CW systemNoise Performance of CW system
Noise Performance of CW system
 
04 1 - frequency domain filtering fundamentals
04 1 - frequency domain filtering fundamentals04 1 - frequency domain filtering fundamentals
04 1 - frequency domain filtering fundamentals
 
Lect2 up390 (100329)
Lect2 up390 (100329)Lect2 up390 (100329)
Lect2 up390 (100329)
 
Frequency domain methods
Frequency domain methods Frequency domain methods
Frequency domain methods
 
Missing Component Restoration for Masked Speech Signals based on Time-Domain ...
Missing Component Restoration for Masked Speech Signals based on Time-Domain ...Missing Component Restoration for Masked Speech Signals based on Time-Domain ...
Missing Component Restoration for Masked Speech Signals based on Time-Domain ...
 
Satellite communications by dennis roddy (4th edition)
Satellite communications by dennis roddy (4th edition)Satellite communications by dennis roddy (4th edition)
Satellite communications by dennis roddy (4th edition)
 
Digital signal processing by YEASIN NEWAJ
Digital signal processing by YEASIN NEWAJDigital signal processing by YEASIN NEWAJ
Digital signal processing by YEASIN NEWAJ
 
Satellite communications by dennis roddy (4th edition)
Satellite communications by dennis roddy (4th edition)Satellite communications by dennis roddy (4th edition)
Satellite communications by dennis roddy (4th edition)
 
Companding and DPCM and ADPCM
Companding and DPCM and ADPCMCompanding and DPCM and ADPCM
Companding and DPCM and ADPCM
 

Similar to Convolutional Generative Adversarial Networks with Binary Neurons for Polyphonic Music Generation

Super resolution in deep learning era - Jaejun Yoo
Super resolution in deep learning era - Jaejun YooSuper resolution in deep learning era - Jaejun Yoo
Super resolution in deep learning era - Jaejun Yoo
JaeJun Yoo
 
Tensorflow London 13: Zbigniew Wojna 'Deep Learning for Big Scale 2D Imagery'
Tensorflow London 13: Zbigniew Wojna 'Deep Learning for Big Scale 2D Imagery'Tensorflow London 13: Zbigniew Wojna 'Deep Learning for Big Scale 2D Imagery'
Tensorflow London 13: Zbigniew Wojna 'Deep Learning for Big Scale 2D Imagery'
Seldon
 
365 digital basics after
365 digital basics after365 digital basics after
365 digital basics after
Jeff Francis
 

Similar to Convolutional Generative Adversarial Networks with Binary Neurons for Polyphonic Music Generation (20)

GBM package in r
GBM package in rGBM package in r
GBM package in r
 
Deep Learning with Audio Signals: Prepare, Process, Design, Expect
Deep Learning with Audio Signals: Prepare, Process, Design, ExpectDeep Learning with Audio Signals: Prepare, Process, Design, Expect
Deep Learning with Audio Signals: Prepare, Process, Design, Expect
 
Super resolution in deep learning era - Jaejun Yoo
Super resolution in deep learning era - Jaejun YooSuper resolution in deep learning era - Jaejun Yoo
Super resolution in deep learning era - Jaejun Yoo
 
Tensorflow London 13: Zbigniew Wojna 'Deep Learning for Big Scale 2D Imagery'
Tensorflow London 13: Zbigniew Wojna 'Deep Learning for Big Scale 2D Imagery'Tensorflow London 13: Zbigniew Wojna 'Deep Learning for Big Scale 2D Imagery'
Tensorflow London 13: Zbigniew Wojna 'Deep Learning for Big Scale 2D Imagery'
 
Generative adversarial networks
Generative adversarial networksGenerative adversarial networks
Generative adversarial networks
 
Lecture 12
Lecture 12Lecture 12
Lecture 12
 
DiffusionMBIR_presentation_slide.pptx
DiffusionMBIR_presentation_slide.pptxDiffusionMBIR_presentation_slide.pptx
DiffusionMBIR_presentation_slide.pptx
 
Rethinking Perturbations in Encoder-Decoders for Fast Training
Rethinking Perturbations in Encoder-Decoders for Fast TrainingRethinking Perturbations in Encoder-Decoders for Fast Training
Rethinking Perturbations in Encoder-Decoders for Fast Training
 
GANs Deep Learning Summer School
GANs Deep Learning Summer SchoolGANs Deep Learning Summer School
GANs Deep Learning Summer School
 
Audio chord recognition using deep neural networks
Audio chord recognition using deep neural networksAudio chord recognition using deep neural networks
Audio chord recognition using deep neural networks
 
DNN-based frequency-domain permutation solver for multichannel audio source s...
DNN-based frequency-domain permutation solver for multichannel audio source s...DNN-based frequency-domain permutation solver for multichannel audio source s...
DNN-based frequency-domain permutation solver for multichannel audio source s...
 
Md2k 0219 shang
Md2k 0219 shangMd2k 0219 shang
Md2k 0219 shang
 
365 digital basics after
365 digital basics after365 digital basics after
365 digital basics after
 
DeepLearningLecture.pptx
DeepLearningLecture.pptxDeepLearningLecture.pptx
DeepLearningLecture.pptx
 
Generative Adversarial Networks (GAN)
Generative Adversarial Networks (GAN)Generative Adversarial Networks (GAN)
Generative Adversarial Networks (GAN)
 
Audio Separation Comparison: Clustering Repeating Period and Hidden Markov Model
Audio Separation Comparison: Clustering Repeating Period and Hidden Markov ModelAudio Separation Comparison: Clustering Repeating Period and Hidden Markov Model
Audio Separation Comparison: Clustering Repeating Period and Hidden Markov Model
 
VAE-type Deep Generative Models
VAE-type Deep Generative ModelsVAE-type Deep Generative Models
VAE-type Deep Generative Models
 
Paris Data Geeks
Paris Data GeeksParis Data Geeks
Paris Data Geeks
 
Vlsi physical design automation on partitioning
Vlsi physical design automation on partitioningVlsi physical design automation on partitioning
Vlsi physical design automation on partitioning
 
Efficient Parallel Set-Similarity Joins Using MapReduce
 Efficient Parallel Set-Similarity Joins Using MapReduce Efficient Parallel Set-Similarity Joins Using MapReduce
Efficient Parallel Set-Similarity Joins Using MapReduce
 

More from Hao-Wen (Herman) Dong

More from Hao-Wen (Herman) Dong (6)

Training Generative Adversarial Networks with Binary Neurons by End-to-end Ba...
Training Generative Adversarial Networks with Binary Neurons by End-to-end Ba...Training Generative Adversarial Networks with Binary Neurons by End-to-end Ba...
Training Generative Adversarial Networks with Binary Neurons by End-to-end Ba...
 
Organizing Machine Learning Projects - Repository Organization
Organizing Machine Learning Projects - Repository OrganizationOrganizing Machine Learning Projects - Repository Organization
Organizing Machine Learning Projects - Repository Organization
 
What is Critical in GAN Training?
What is Critical in GAN Training?What is Critical in GAN Training?
What is Critical in GAN Training?
 
MuseGAN: Multi-track Sequential Generative Adversarial Networks for Symbolic ...
MuseGAN: Multi-track Sequential Generative Adversarial Networks for Symbolic ...MuseGAN: Multi-track Sequential Generative Adversarial Networks for Symbolic ...
MuseGAN: Multi-track Sequential Generative Adversarial Networks for Symbolic ...
 
Recent Progress on Utilizing Tag Information with GANs - StarGAN & TD-GAN
Recent Progress on Utilizing Tag Information with GANs - StarGAN & TD-GANRecent Progress on Utilizing Tag Information with GANs - StarGAN & TD-GAN
Recent Progress on Utilizing Tag Information with GANs - StarGAN & TD-GAN
 
Introduction to Deep Generative Models
Introduction to Deep Generative ModelsIntroduction to Deep Generative Models
Introduction to Deep Generative Models
 

Recently uploaded

development of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusdevelopment of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virus
NazaninKarimi6
 
Human genetics..........................pptx
Human genetics..........................pptxHuman genetics..........................pptx
Human genetics..........................pptx
Silpa
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Sérgio Sacani
 
CYTOGENETIC MAP................ ppt.pptx
CYTOGENETIC MAP................ ppt.pptxCYTOGENETIC MAP................ ppt.pptx
CYTOGENETIC MAP................ ppt.pptx
Silpa
 
THE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptx
THE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptxTHE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptx
THE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptx
ANSARKHAN96
 
Phenolics: types, biosynthesis and functions.
Phenolics: types, biosynthesis and functions.Phenolics: types, biosynthesis and functions.
Phenolics: types, biosynthesis and functions.
Silpa
 
Reboulia: features, anatomy, morphology etc.
Reboulia: features, anatomy, morphology etc.Reboulia: features, anatomy, morphology etc.
Reboulia: features, anatomy, morphology etc.
Silpa
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
Scintica Instrumentation
 

Recently uploaded (20)

development of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusdevelopment of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virus
 
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRings
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRingsTransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRings
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRings
 
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryFAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
 
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptxClimate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
 
Human genetics..........................pptx
Human genetics..........................pptxHuman genetics..........................pptx
Human genetics..........................pptx
 
Zoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfZoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdf
 
Clean In Place(CIP).pptx .
Clean In Place(CIP).pptx                 .Clean In Place(CIP).pptx                 .
Clean In Place(CIP).pptx .
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
 
CYTOGENETIC MAP................ ppt.pptx
CYTOGENETIC MAP................ ppt.pptxCYTOGENETIC MAP................ ppt.pptx
CYTOGENETIC MAP................ ppt.pptx
 
Selaginella: features, morphology ,anatomy and reproduction.
Selaginella: features, morphology ,anatomy and reproduction.Selaginella: features, morphology ,anatomy and reproduction.
Selaginella: features, morphology ,anatomy and reproduction.
 
Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.
 
THE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptx
THE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptxTHE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptx
THE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptx
 
Cyanide resistant respiration pathway.pptx
Cyanide resistant respiration pathway.pptxCyanide resistant respiration pathway.pptx
Cyanide resistant respiration pathway.pptx
 
Phenolics: types, biosynthesis and functions.
Phenolics: types, biosynthesis and functions.Phenolics: types, biosynthesis and functions.
Phenolics: types, biosynthesis and functions.
 
Reboulia: features, anatomy, morphology etc.
Reboulia: features, anatomy, morphology etc.Reboulia: features, anatomy, morphology etc.
Reboulia: features, anatomy, morphology etc.
 
GBSN - Biochemistry (Unit 2) Basic concept of organic chemistry
GBSN - Biochemistry (Unit 2) Basic concept of organic chemistry GBSN - Biochemistry (Unit 2) Basic concept of organic chemistry
GBSN - Biochemistry (Unit 2) Basic concept of organic chemistry
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Grade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its FunctionsGrade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its Functions
 
Dr. E. Muralinath_ Blood indices_clinical aspects
Dr. E. Muralinath_ Blood indices_clinical  aspectsDr. E. Muralinath_ Blood indices_clinical  aspects
Dr. E. Muralinath_ Blood indices_clinical aspects
 
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
 

Convolutional Generative Adversarial Networks with Binary Neurons for Polyphonic Music Generation

  • 1. Convolutional Generative Adversarial Networks with Binary Neurons for Polyphonic Music Generation Hao-Wen Dong and Yi-Hsuan Yang Research Center of IT Innovation, Academia Sinica
  • 2. Outlines • Introduction • Binary Neurons • Proposed Model • Data • Results • Future Works Source Code https://github.com/salu133445/bmusegan Demo Page https://salu133445.github.io/bmusegan/
  • 4. Introduction • MuseGAN 。can only generate real-valued predictions 。require postprocessing at test time (e.g., hard thresholding or Bernoulli sampling) zzzzz z Bar Generator Gz GGGGG zzzz zzzz zzzzz z z z z GGGGG
  • 5. Introduction • Naïve binarization methods can lead to overly-fragmented notes raw Bernoulli sampling hard thresholding
  • 6. Introduction • Real-valued predictions can lead to training difficulties of the discriminator Decision boundaries to learn for the discriminator the generator output real values the generator output binary values • real samples • fake samples --- decision boundaries
  • 8. Binary Neurons • Neurons that output binary-valued predictions • In this work, we consider 。deterministic binary neurons (DBNs) 𝐷𝐷𝐷𝐷𝐷𝐷 𝑥𝑥 = � 1, 𝑖𝑖𝑖𝑖 𝜎𝜎 𝑥𝑥 > 0.5 0, 𝑜𝑜𝑜𝑜 𝑜𝑜𝑜𝑜𝑜𝑜𝑜𝑜𝑜𝑜𝑜𝑜𝑜 。stochastic binary neurons (SBNs) 𝑆𝑆𝐵𝐵𝐵𝐵 𝑥𝑥 = � 1, 𝑖𝑖𝑖𝑖 𝑧𝑧 < 𝜎𝜎 𝑥𝑥 0, 𝑜𝑜𝑜𝑜 𝑜𝑜𝑜𝑜𝑜𝑜𝑜𝑜𝑜𝑜𝑜𝑜𝑜 , 𝑧𝑧~𝑈𝑈 0, 1
  • 9. Gradient Estimators • Computing the exact gradients for binary neurons is intractable • Straight-through (ST) estimator 。treat BNs as identity functions in the backward pass 𝜕𝜕𝐵𝐵𝐵𝐵 𝑥𝑥 𝜕𝜕𝑥𝑥 = 1 • Sigmoid-adjusted ST estimator 。treat BNs as identity functions multiplied by the derivative of the sigmoid function in the backward pass 𝜕𝜕𝐵𝐵𝐵𝐵 𝑥𝑥 𝜕𝜕𝑥𝑥 = 𝜎𝜎 𝑥𝑥
  • 11. D 1/0 Generator Make G(z) indistinguishable from real data for D Discriminator Tell G(z) as fake data from X being real ones real samples Gz ̴ pz G(z) random noise fake samples Generative Adversarial Networks X ̴ pX
  • 12. Generator • One single input random vector • Shared/private design 。Different tracks have their own musical properties (e.g. textures, patterns, techniques) 。Jointly all tracks follow a common, high-level musical idea Gsz Gp Gp Gp … noise 1 2 M x1 x2 xM … ˆ ˆ ˆ shared private
  • 13. Refiner • Refine the real-valued outputs of the generator into binary ones • Composed of a number of residual units Residual Block Residual Block …xi xi BNsˆ (1, 3, 12) × 64 + add (1, 3, 12) × 1 DBNs — deterministic binary neurons SBNs — stochastic binary neurons
  • 14. Refiner • Refine the real-valued outputs of the generator into binary ones • Composed of a number of residual units Gsz Gp Gp Gp … noise 1 2 M x1 x2 xM … R R R 1 2 M Synth Pad x1 x2 xM Drums Piano … … ˆ ˆ ˆ ~ ~ ~ real-valued binary-valued
  • 15. Discriminator • Shared/private design (similar to the generator) • Additional onset/offset stream and chroma stream Dp Dp Dp … Ds True/Fake Synth Pad x1 or x1 x2 or x2 xM or xM Drums Piano … … + concat Do Dm onset/offset extractor chroma extractor Dc 1 2 M~ ~ ~ shared private onset/offset stream chroma stream
  • 16. Two-stage Training • First stage — pretrain the generator and discriminator • Second stage — train the refiner and discriminator (with G fixed) Gsz Gp Gp Gp … noise 1 2 M x1 x2 xM … R R R 1 2 M x1 x2 xM … … ˆ ˆ ˆ ~ ~ ~ Dp Dp Dp … Ds True/Fake Synth Pad x1 or x1 x2 or x2 xM or xM Drums Piano … … + concat Do Dm onset/offset extractor chroma extractor Dc 1 2 M … ~ ~ ~ training data
  • 17. Data
  • 18. Data Representation • Multi-track piano-roll • 8 tracks 。Drums, Piano, Guitar, Bass, Ensemble, Reed, Synth Lead and Synth Pad 96 time steps 84 pitches 8 tracks 4 bars a 4×96×84×8 tensor
  • 19. Training Data • Lakh Pianoroll Dataset (LPD) • 13746 four-bar phrases from 2291 songs (six for each) 。Pick only songs in 4/4 time and with an alternative tag Drums Piano Guitar Bass Ensemble Reed Synth Lead Synth Pad
  • 22. Audio Samples • proposed (+DBNs) — fewer overly-fragmented notes; more out-of-scale notes • proposed (+SBNs) — more overly-fragmented notes ; lots of artifacts pretrained (+BS) pretrained (+HT) proposed (+SBNs) proposed (+DBNs) Sample 1 Sample 2 Sample 1 Sample 2 More samples available on demo page https://salu133445.github.io/bmusegan/
  • 23. Evaluation Metrics • Qualified note rate (QN) QN = # of notes no shorter than 3 time steps (i.e., a 32th note) # of notes • Polyphonicity (PP) PP = # of time steps where more than two pitches are played # of time steps • Tonal distance (TD) 。measure the distance between two chroma features in a tonal space
  • 24. Comparisons of Training Strategies • Two-stage training (proposed) — [stage 1] pretrain G and D [stage 2] train R and D • Joint training (joint) — [stage 1] pretrain G and D [stage 2] train G, R and D • End-to-end training (end-to-end) — train G, R and D in one stage (values closer to that of the training data is better; underline: closest; bold: top 3 closest)
  • 25. Comparisons of Training Strategies QN PP
  • 26. Comparisons of Training Strategies QN PP
  • 27. End-to-end Models • First attempt, to our best knowledge, to generate such high-dimensional data with binary neurons from scratch DBNsSBNs chords bass line
  • 28. End-to-end Models • First attempt, to our best knowledge, to generate such high-dimensional data with binary neurons from scratch DBNsSBNs chords bass line drum patterns
  • 29. Effects of the Discriminator Design • pretrained — shared/private design + offset/onset stream + chroma stream • ablated — shared/private design • baseline — only one shared discriminator
  • 31. Summary • A convolutional GAN for binary-valued multi-track piano-rolls 。CNNs + residual units + binary neurons 。Shared/private design in both the generator and discriminator (proved effective) 。Onset/offset and chroma streams in the discriminator (proved effective) 。Two-stage training (proved effective) • Proposed model with deterministic binary neurons (DBNs) features fewer overly- fragmented notes as compared with existing methods.
  • 32. Future Works • Tradeoff 。easy-to-train generator + hard-to-train discriminator 。hard-to-train generator + easy-to-train discriminator • Longer music 。RNNs (LSTMs or GRUs) 。How to generate high-level/long-term structure? • More tracks 。Symphony/orchestra compositions 。(hierarchical) sections  sub-sections  instruments
  • 33. Thank you for your attention