WHAT IS CRITICAL IN GAN TRAINING?
Hao-Wen Dong
Music and Computing (MAC) Lab, CITI, Academia Sinica
Outline
• Generative Adversarial Networks (GAN[1])
• Wasserstein GANs (WGAN[2])
• Lipschitz Regularization
• Spectral Normalization (SN-GAN[3])
• Gradient Penalties (WGAN-GP[4], DRAGAN[5], GAN-GP[6])
• What is critical in GAN training?
2
Generative Adversarial Networks
Generative Adversarial Networks (GANs)
• Two-player game between the discriminator D and the generator G:

J^(D)(D, G) = −𝔼_{x∼p_data}[log D(x)] − 𝔼_{z∼p_z}[log(1 − D(G(z)))]
(first term: assign real data a 1; second term: assign fake data a 0)

J^(G)(G) = 𝔼_{z∼p_z}[log(1 − D(G(z)))]
(to make D assign generated data a 1)

[Figure: G maps a prior distribution to fake data, which D compares against the data distribution]
4
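The two objectives above can be evaluated directly. Below is a minimal pure-Python sketch on toy discriminator outputs; the function names and probability values are illustrative, not from the slides:

```python
import math

def d_loss(d_real, d_fake):
    # J_D = -E[log D(x)] - E[log(1 - D(G(z)))]
    real_term = -sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def g_loss_minimax(d_fake):
    # J_G = E[log(1 - D(G(z)))]
    return sum(math.log(1 - p) for p in d_fake) / len(d_fake)

# A near-perfect discriminator: D(x) close to 1 on real data, close to 0 on fakes.
print(d_loss([0.99, 0.99], [0.01, 0.01]))  # small: D is winning the game
print(g_loss_minimax([0.01, 0.01]))        # near 0: the minimax generator loss saturates
```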
Original Algorithm
[Algorithm of Goodfellow et al. [1]: alternate k gradient steps on the discriminator with one gradient step on the generator]
5
(Goodfellow et al. [1])
Original Convergence Proof
For a finite data set 𝑿, we only have the empirical distribution
p_data(x) = 1 if x ∈ X, 0 otherwise
(may need density estimation; this point-mass form is hard to optimize against)
The proof also assumes, among other things, that the criterion can be easily improved at every step, which is usually not the case in practice.
6
Minimax and Non-saturating GANs

J^(D)(D, G) = −𝔼_{x∼p_data}[log D(x)] − 𝔼_{z∼p_z}[log(1 − D(G(z)))]

Minimax: J^(G)(G) = 𝔼_{z∼p_z}[log(1 − D(G(z)))]
Non-saturating: J^(G)(G) = −𝔼_{z∼p_z}[log D(G(z))]
(used in practice; won't stop training when D is stronger)
7
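The saturation argument can be checked directly: write D(G(z)) = sigmoid(a) in terms of the discriminator logit a and compare the gradients of the two generator losses when D confidently rejects fakes. A toy calculation, not from the slides:

```python
import math

def sigmoid(a):
    return 1 / (1 + math.exp(-a))

# Generator losses as a function of the discriminator logit a, with D(G(z)) = sigmoid(a).
def grad_minimax(a):
    # d/da log(1 - sigmoid(a)) = -sigmoid(a)
    return -sigmoid(a)

def grad_non_saturating(a):
    # d/da [-log sigmoid(a)] = sigmoid(a) - 1
    return sigmoid(a) - 1

# Early in training, D easily rejects fakes: a is very negative.
a = -8.0
print(abs(grad_minimax(a)))         # tiny: the minimax loss saturates
print(abs(grad_non_saturating(a)))  # near 1: the non-saturating loss keeps learning
```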
Comparisons of GANs
[Figure: generator loss curves from "D is not fooled" to "D is fooled", for the minimax and non-saturating GANs]
The non-saturating loss causes fewer training difficulties at the initial stage, when G can hardly fool D.
8
(Goodfellow [7])
Wasserstein GANs
Wasserstein Distance (Earth-Mover Distance)

W(ℙ_r, ℙ_θ) = inf_{γ ∈ Π(ℙ_r, ℙ_θ)} 𝔼_{(x,y)∼γ}[‖x − y‖]

(can be optimized more easily)
10
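In one dimension, with two empirical distributions of equal sample count, the infimum over couplings is attained by matching sorted samples, so the earth-mover distance can be computed exactly in a few lines (the helper name is hypothetical):

```python
# For 1-D empirical distributions with equally many samples, the earth-mover
# distance reduces to: sort both sample sets, match them in order, and average
# the distances moved.
def wasserstein_1d(xs, ys):
    return sum(abs(x - y) for x, y in zip(sorted(xs), sorted(ys))) / len(xs)

print(wasserstein_1d([0, 1, 2], [0, 1, 2]))  # 0.0: identical distributions
print(wasserstein_1d([0, 1, 2], [3, 4, 5]))  # 3.0: every unit of mass moves by 3
```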
Kantorovich-Rubinstein Duality

W(ℙ_r, ℙ_θ) = sup_{‖f‖_L ≤ 1} 𝔼_{x∼ℙ_r}[f(x)] − 𝔼_{x∼ℙ_θ}[f(x)]

(the supremum is taken over all the 1-Lipschitz functions f: X → ℝ)
• Definition: a function f: ℝ → ℝ is called Lipschitz continuous if
∃K ∈ ℝ s.t. ∀x₁, x₂ ∈ ℝ, |f(x₁) − f(x₂)| ≤ K|x₁ − x₂|
11
Wasserstein GAN
• Key: use a neural network to estimate the Wasserstein distance (and use it as a critic for G)

J^(D)(D, G) = −𝔼_{x∼p_data}[D(x)] + 𝔼_{z∼p_z}[D(G(z))]
J^(G)(G) = −𝔼_{z∼p_z}[D(G(z))]

• Original GAN (non-saturating)

J^(D)(D, G) = −𝔼_{x∼p_data}[log D(x)] − 𝔼_{z∼p_z}[log(1 − D(G(z)))]
J^(G)(G) = −𝔼_{z∼p_z}[log D(G(z))]
12
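Under one common sign convention (the critic scores real data high), the WGAN objectives can be sketched as follows; note the absence of log and sigmoid, since the critic outputs unbounded scores:

```python
def critic_loss(d_real, d_fake):
    # J_D = -E[D(x)] + E[D(G(z))]: minimizing widens the real/fake score gap
    return -sum(d_real) / len(d_real) + sum(d_fake) / len(d_fake)

def generator_loss(d_fake):
    # J_G = -E[D(G(z))]: minimizing raises the critic's score on fakes
    return -sum(d_fake) / len(d_fake)

# Toy critic scores: real data scored 2.0, fakes scored -1.0.
print(critic_loss([2.0, 2.0], [-1.0, -1.0]))  # -3.0
print(generator_loss([-1.0, -1.0]))           # 1.0
```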
Wasserstein GAN
• Problem: the critic network needs to satisfy a Lipschitz constraint
• Global regularization
  • weight clipping → original WGAN
  • spectral normalization → SN-GAN
• Local regularization
  • gradient penalties → WGAN-GP
13
Lipschitz Regularization
Weight Clipping
• Key: clip the weights of the critic into [−c, c]
[Figure: with weight clipping, the critic's gradients (output w.r.t. input) tend to either explode or vanish, causing training difficulties]
15
(Gulrajani et al. [4])
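The clipping step itself is trivial; a one-line sketch with a flattened parameter list (names and the value of c are illustrative; the WGAN paper uses c = 0.01 by default):

```python
def clip_weights(weights, c=0.01):
    # WGAN weight clipping: force every critic parameter into [-c, c] after each update.
    return [max(-c, min(c, w)) for w in weights]

print(clip_weights([0.5, -0.003, -2.0]))  # [0.01, -0.003, -0.01]
```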
Spectral Normalization
• Key: constrain the spectral norm of each layer
• For each layer g: h_in → h_out, by definition we have
‖g‖_Lip = sup_h σ(∇g(h)),
where σ(A) denotes the spectral norm of A (the largest singular value of A):
σ(A) := max_{h≠0} ‖Ah‖₂ / ‖h‖₂ = max_{‖h‖₂≤1} ‖Ah‖₂
16
Spectral Normalization
• For a linear layer g(h) = Wh:
‖g‖_Lip = sup_h σ(∇g(h)) = sup_h σ(W) = σ(W)
• For typical activation layers a(h):
‖a‖_Lip = 1 for ReLU and LeakyReLU
‖a‖_Lip = K for other common activation layers (e.g. sigmoid, tanh)
• With the inequality ‖f₁ ∘ f₂‖_Lip ≤ ‖f₁‖_Lip ⋅ ‖f₂‖_Lip, for a network f of L linear layers W_l interleaved with 1-Lipschitz activations a_l we now have
‖f‖_Lip ≤ ‖W₁‖_Lip ⋅ ‖a₁‖_Lip ⋯ ‖W_L‖_Lip ⋅ ‖a_L‖_Lip = ∏_{l=1}^{L} σ(W_l)
17
Spectral Normalization
• Spectral normalization:
W̄_SN(W) := W / σ(W)   (W: weight matrix)
• Now we have ‖f‖_Lip ≤ 1 everywhere
• σ(W) can be approximated quickly with the power iteration method (see the paper)
18
(Miyato et al. [3])
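The power iteration mentioned above fits in a few lines of pure Python. This is a didactic sketch: the paper amortizes the cost by reusing a single u vector across training steps with one iteration each, whereas here we iterate to convergence on a fixed matrix:

```python
def spectral_norm(W, n_iters=50):
    # Power iteration for the largest singular value of W (a list of rows):
    #   v <- normalize(W^T u),  u <- normalize(W v),  sigma(W) ~ u^T W v
    rows, cols = len(W), len(W[0])
    u = [1.0] * rows
    v = [0.0] * cols
    for _ in range(n_iters):
        v = [sum(W[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        norm_v = sum(x * x for x in v) ** 0.5
        v = [x / norm_v for x in v]
        u = [sum(W[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        norm_u = sum(x * x for x in u) ** 0.5
        u = [x / norm_u for x in u]
    return sum(u[i] * sum(W[i][j] * v[j] for j in range(cols)) for i in range(rows))

W = [[3.0, 0.0], [0.0, 1.0]]                    # singular values 3 and 1
sigma = spectral_norm(W)
W_sn = [[w / sigma for w in row] for row in W]  # normalized weight: sigma(W_sn) ~ 1
print(round(sigma, 6))  # 3.0
```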
Gradient Penalties
• Key: penalize the critic when it violates the Lipschitz constraint
• But it is impossible to enforce the penalty everywhere, so penalize at sampled points:

𝔼_{x̂∼ℙ_x̂}[(‖∇_x̂ D(x̂)‖₂ − 1)²]

(penalizes the critic when the gradient norm moves away from 1, keeping it close to 1)
• Two common sampling approaches for x̂, with α ∼ U(0, 1):
WGAN-GP: ℙ_x̂ = αℙ_x + (1 − α)ℙ_g   (between data and model distribution)
DRAGAN: ℙ_x̂ = αℙ_x + (1 − α)ℙ_noise   (around the data distribution)
19
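As a toy check of the penalty term, consider a linear critic D(x) = w·x, whose input gradient is w everywhere, so the penalty has a closed form regardless of where the interpolates x̂ land. All names here are hypothetical; a real implementation would differentiate through the critic at each x̂:

```python
import random

def gradient_penalty_linear(w, xs_real, xs_fake):
    # For the linear critic D(x) = w . x, grad_x D = w at every point, so the
    # penalty E[(||grad D||_2 - 1)^2] is the same at every WGAN-GP interpolate
    # x_hat = a*x + (1-a)*x_g with a ~ U(0, 1).
    grad_norm = sum(wi * wi for wi in w) ** 0.5
    penalties = []
    for x, x_g in zip(xs_real, xs_fake):
        a = random.random()
        x_hat = [a * xi + (1 - a) * gi for xi, gi in zip(x, x_g)]  # sampled, but unused by a linear critic
        penalties.append((grad_norm - 1.0) ** 2)
    return sum(penalties) / len(penalties)

print(gradient_penalty_linear([0.6, 0.8], [[0, 0]], [[1, 1]]))  # ~0: ||w|| = 1, already 1-Lipschitz
print(gradient_penalty_linear([3.0, 4.0], [[0, 0]], [[1, 1]]))  # 16.0: ||w|| = 5 violates the constraint
```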
WGAN-GP
ℙ_x̂ = αℙ_x + (1 − α)ℙ_g   (between data and model distribution)
[Figure: with the gradient penalty, the critic's gradients (output w.r.t. input) neither explode nor vanish]
20
(Gulrajani et al. [4])
DRAGAN
ℙ_x̂ = αℙ_x + (1 − α)ℙ_noise   (around the data distribution)
[Figure: sample comparisons among GAN, WGAN-GP, and DRAGAN]
21
(Fedus et al. [6])
What is critical in GAN training?
Why is WGAN more stable?
Is it the properties of the underlying divergence being optimized, or the Lipschitz constraint?
(Theoretically, only the minimax GAN should suffer from gradient vanishing.)
23
(Goodfellow [7])
Comparisons
[Figure: samples from GAN, GAN-GP, DRAGAN, and WGAN-GP, where GAN-GP is a non-saturating GAN trained with gradient penalties]
24
(Kodali et al. [5])
Why is WGAN more stable?
Is it the properties of the underlying divergence being optimized, or the Lipschitz constraint?
25
(Recap) Original Convergence Proof
For a finite data set 𝑿, we only have the empirical distribution
p_data(x) = 1 if x ∈ X, 0 otherwise
(may need density estimation; hard to optimize)
26
From a Distribution Estimation Viewpoint
[Figure: critics learned with no regularization, local regularization, and global regularization]
Smoother critics:
• give more stable guidance to the generator
• alleviate the mode collapse issue
(my thoughts)
27
Open Questions
• Gradient penalties
  • are usually too strong in WGAN-GP
  • may create spurious local optima
  • see improved-improved-WGAN [8]
• Spectral normalization
  • does it impact the optimization procedure?
  • can it be used as a general regularization tool for any NN?
28
References
[1] Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville,
and Yoshua Bengio, “Generative Adversarial Networks,” in Proc. NIPS, 2014.
[2] Martin Arjovsky, Soumith Chintala, and Léon Bottou, “Wasserstein Generative Adversarial Networks,” in Proc.
ICML, 2017.
[3] Takeru Miyato, Toshiki Kataoka, Masanori Koyama, and Yuichi Yoshida, “Spectral Normalization for Generative
Adversarial Networks,” in Proc. ICLR, 2018.
[4] Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, and Aaron Courville, “Improved training of
Wasserstein GANs,” in Proc. NIPS, 2017.
[5] Naveen Kodali, Jacob Abernethy, James Hays, and Zsolt Kira, “On Convergence and Stability of GANs,” arXiv
preprint arXiv:1705.07215, 2017.
[6] William Fedus, Mihaela Rosca, Balaji Lakshminarayanan, Andrew M. Dai, Shakir Mohamed, and Ian Goodfellow,
“Many Paths to Equilibrium: GANs Do Not Need to Decrease a Divergence At Every Step,” in Proc. ICLR, 2018.
[7] Ian J. Goodfellow, “On distinguishability criteria for estimating generative models,” in Proc. ICLR, Workshop
Track, 2015.
[8] Xiang Wei, Boqing Gong, Zixia Liu, Wei Lu, and Liqiang Wang, “Improving the Improved Training of Wasserstein
GANs: A Consistency Term and Its Dual Effect,” in Proc. ICLR, 2018.
29
Thank you for your attention!