WHAT IS CRITICAL IN GAN TRAINING?
Hao-Wen Dong
Music and Computing (MAC) Lab, CITI, Academia Sinica
Outline
• Generative Adversarial Networks (GAN[1])
• Wasserstein GANs (WGAN[2])
• Lipschitz Regularization
• Spectral Normalization (SN-GAN[3])
• Gradient Penalties (WGAN-GP[4], DRAGAN[5], GAN-GP[6])
• What is critical in GAN training?
Generative Adversarial Networks
Generative Adversarial Networks (GANs)
• Two-player game between the discriminator D and the generator G:

$$J^{(D)}(D, G) = -\mathbb{E}_{x \sim p_{data}}[\log D(x)] - \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]$$

$$J^{(G)}(G) = \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]$$

D is trained to assign real data a 1 and fake data a 0; G is trained to make D assign its fake data a 1. Here $x$ is drawn from the data distribution $p_{data}$, and $z$ from the prior distribution $p_z$.
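To make the objectives concrete, both can be estimated by Monte Carlo from samples. Everything in this sketch (the logistic D, the affine G, the 1-D distributions) is an illustrative stand-in, not the setup of [1]:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D discriminator: a fixed logistic model (a hypothetical stand-in
# for a trained network; the weights are arbitrary).
def D(x, w=2.0, b=0.0):
    return 1.0 / (1.0 + np.exp(-(w * x + b)))  # output in (0, 1)

x_real = rng.normal(1.0, 0.5, size=1000)  # samples from the data distribution p_data
z = rng.normal(0.0, 1.0, size=1000)       # samples from the prior p_z
G = lambda z: 0.1 * z - 1.0               # a poor generator: fakes land far from p_data

# J_D = -E_x[log D(x)] - E_z[log(1 - D(G(z)))]
J_D = -np.mean(np.log(D(x_real))) - np.mean(np.log(1.0 - D(G(z))))
# J_G = E_z[log(1 - D(G(z)))]  (minimax form)
J_G = np.mean(np.log(1.0 - D(G(z))))
```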
Original Algorithm
(Algorithm pseudocode from Goodfellow et al. [1].)
Original Convergence Proof
For a finite data set $X$, we only have the empirical distribution
$$p_{data}(x) = \begin{cases} 1, & x \in X \\ 0, & \text{otherwise} \end{cases}$$
which is hard to optimize directly and may require density estimation. The proof also assumes the criterion can be easily improved at each step, which is usually not the case in practice.
Minimax and Non-saturating GANs
$$J^{(D)}(D, G) = -\mathbb{E}_{x \sim p_{data}}[\log D(x)] - \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]$$
Minimax: $J^{(G)}(G) = \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]$
Non-saturating: $J^{(G)}(G) = -\mathbb{E}_{z \sim p_z}[\log D(G(z))]$ (used in practice; training does not stall when D is stronger)
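The "does not stall" remark can be made quantitative. Near $d = D(G(z)) \approx 0$ (early training, when D easily rejects fakes), the minimax loss has per-sample gradient magnitude $1/(1-d) \approx 1$, while the non-saturating loss has $1/d$, which is large. A small sketch with illustrative values:

```python
import numpy as np

# D(G(z)) early in training: the discriminator easily rejects fakes, so d is near 0.
d = np.array([1e-4, 1e-3, 1e-2])

# |d/dd log(1 - d)|: gradient magnitude of the minimax generator loss.
grad_minimax = 1.0 / (1.0 - d)
# |d/dd (-log d)|: gradient magnitude of the non-saturating generator loss.
grad_nonsat = 1.0 / d

print(grad_minimax)  # ≈ [1.0001, 1.001, 1.0101] — tiny, learning stalls
print(grad_nonsat)   # [10000., 1000., 100.] — large, learning proceeds
```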
Comparisons of GANs
(Figure from Goodfellow [7]: the minimax and non-saturating generator losses plotted against the discriminator's output, ranging from "D is not fooled" to "D is fooled".)
The non-saturating GAN causes fewer training difficulties at the initial stage, when G can hardly fool D.
Wasserstein GANs
Wasserstein Distance (Earth-Mover Distance)
$$W(\mathbb{P}_r, \mathbb{P}_\theta) = \inf_{\gamma \in \Pi(\mathbb{P}_r, \mathbb{P}_\theta)} \mathbb{E}_{(x, y) \sim \gamma}[\lVert x - y \rVert]$$
This distance can be optimized more easily.
Kantorovich-Rubinstein Duality
• Kantorovich-Rubinstein duality:
$$W(\mathbb{P}_r, \mathbb{P}_\theta) = \sup_{\lVert f \rVert_L \le 1} \mathbb{E}_{x \sim \mathbb{P}_r}[f(x)] - \mathbb{E}_{x \sim \mathbb{P}_\theta}[f(x)]$$
where the supremum is taken over all the 1-Lipschitz functions $f: X \to \mathbb{R}$.
• Definition: a function $f: \mathbb{R} \to \mathbb{R}$ is called Lipschitz continuous if
$$\exists K \in \mathbb{R} \ \text{s.t.} \ \forall x_1, x_2 \in \mathbb{R}, \ |f(x_1) - f(x_2)| \le K\,|x_1 - x_2|$$
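As a sanity check on the primal form, the 1-D case is easy: the optimal coupling $\gamma$ simply matches sorted samples, so the earth-mover distance reduces to the mean gap between order statistics. The Gaussians here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=10000)  # samples from P_r
y = rng.normal(3.0, 1.0, size=10000)  # samples from P_theta

# In 1-D with equal sample counts, the optimal coupling matches sorted samples,
# so the earth-mover distance is the mean gap between order statistics.
w1 = np.mean(np.abs(np.sort(x) - np.sort(y)))
print(w1)  # ≈ 3.0: the cost of moving N(0, 1) onto N(3, 1)
```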
Wasserstein GAN
• Key: use a neural network to estimate the Wasserstein distance (and use it as a critic for G):
$$J^{(D)}(D, G) = -\mathbb{E}_{x \sim p_{data}}[D(x)] + \mathbb{E}_{z \sim p_z}[D(G(z))]$$
$$J^{(G)}(G) = -\mathbb{E}_{z \sim p_z}[D(G(z))]$$
• Original GAN (non-saturating):
$$J^{(D)}(D, G) = -\mathbb{E}_{x \sim p_{data}}[\log D(x)] - \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]$$
$$J^{(G)}(G) = -\mathbb{E}_{z \sim p_z}[\log D(G(z))]$$
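A minimal sample-based sketch of the two WGAN objectives; the linear critic and the toy distributions are hypothetical stand-ins for real networks:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy critic: an unbounded linear score (no sigmoid), as in WGAN.
# The weight is arbitrary, a stand-in for a trained critic.
def critic(x, w=1.0):
    return w * x

x_real = rng.normal(1.0, 0.5, size=1000)  # samples from p_data
z = rng.normal(0.0, 1.0, size=1000)       # samples from p_z
G = lambda z: 0.1 * z - 1.0               # a poor generator: fakes far from p_data

# J_D = -E_x[D(x)] + E_z[D(G(z))]: minimizing it widens the score gap,
# i.e. sharpens the Wasserstein estimate E_x[D(x)] - E_z[D(G(z))].
J_D = -np.mean(critic(x_real)) + np.mean(critic(G(z)))
# J_G = -E_z[D(G(z))]: the generator raises the critic's score on fakes.
J_G = -np.mean(critic(G(z)))
```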
Wasserstein GAN
• Problem: such a network needs to satisfy a Lipschitz constraint
• Global regularization
• weight clipping → the original WGAN
• spectral normalization → SN-GAN
• Local regularization
• gradient penalties → WGAN-GP
Lipschitz Regularization
Weight Clipping
• Key: clip the weights of the critic into $[-c, c]$
• Problem: the critic's gradients vanish or explode, causing training difficulties
(Figure from Gulrajani et al. [4]: the critic's output plotted against its input under weight clipping.)
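Weight clipping is mechanically trivial: after every critic update, each parameter is forced into $[-c, c]$. A sketch with hypothetical NumPy weight matrices ($c = 0.01$ is the value used in the original WGAN paper):

```python
import numpy as np

rng = np.random.default_rng(0)
c = 0.01  # clipping threshold from the original WGAN paper

# Hypothetical critic weights after a gradient step.
weights = [rng.normal(0.0, 0.1, size=(4, 4)) for _ in range(3)]

# Weight clipping: force every parameter into [-c, c] after each update.
clipped = [np.clip(W, -c, c) for W in weights]

assert all(np.all(np.abs(W) <= c) for W in clipped)
```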
Spectral Normalization
• Key: constrain the spectral norm of each layer
• For each layer $g: h_{in} \mapsto h_{out}$, by definition we have
$$\lVert g \rVert_{Lip} = \sup_{h} \sigma(\nabla g(h)),$$
where the spectral norm
$$\sigma(A) := \max_{h \ne 0} \frac{\lVert Ah \rVert_2}{\lVert h \rVert_2} = \max_{\lVert h \rVert_2 \le 1} \lVert Ah \rVert_2$$
is the largest singular value of $A$.
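The two characterizations of $\sigma(A)$ agree, which is easy to check numerically: a brute-force search for $\max \lVert Ah \rVert_2$ over unit vectors approaches the largest singular value returned by an SVD (the matrix and search budget are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3))

# sigma(A) via the definition: max ||A h||_2 over unit vectors h
# (approximated by random search over the unit sphere).
hs = rng.normal(size=(3, 100000))
hs /= np.linalg.norm(hs, axis=0)
sigma_def = np.max(np.linalg.norm(A @ hs, axis=0))

# The same quantity is the largest singular value of A.
sigma_svd = np.linalg.svd(A, compute_uv=False)[0]
```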
Spectral Normalization
• For a linear layer $g(h) = Wh$: $\lVert g \rVert_{Lip} = \sup_h \sigma(\nabla g(h)) = \sup_h \sigma(W) = \sigma(W)$
• For typical activation layers $a(h)$:
$\lVert a \rVert_{Lip} = 1$ for ReLU and LeakyReLU
$\lVert a \rVert_{Lip} = K$ for other common activations (e.g. sigmoid, tanh)
• With the inequality
$$\lVert f_1 \circ f_2 \rVert_{Lip} \le \lVert f_1 \rVert_{Lip} \cdot \lVert f_2 \rVert_{Lip},$$
we now have, for a network $f$ of linear layers $W_1, \dots, W_L$ interleaved with 1-Lipschitz activations $a_1, \dots, a_L$,
$$\lVert f \rVert_{Lip} \le \lVert W_1 \rVert_{Lip} \cdot \lVert a_1 \rVert_{Lip} \cdots \lVert W_L \rVert_{Lip} \cdot \lVert a_L \rVert_{Lip} = \prod_{l=1}^{L} \sigma(W_l)$$
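The product bound can be verified empirically on a tiny ReLU network with random weights (an illustrative check, not a proof):

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(8, 4))
W2 = rng.normal(size=(1, 8))
relu = lambda h: np.maximum(h, 0.0)
f = lambda x: W2 @ relu(W1 @ x)  # f = W2 ∘ ReLU ∘ W1

# Product of the layers' spectral norms (ReLU itself is 1-Lipschitz).
bound = (np.linalg.svd(W1, compute_uv=False)[0]
         * np.linalg.svd(W2, compute_uv=False)[0])

# Empirical Lipschitz ratios |f(x1) - f(x2)| / ||x1 - x2|| never exceed the bound.
x1 = rng.normal(size=(4, 1000))
x2 = rng.normal(size=(4, 1000))
ratios = np.abs(f(x1) - f(x2)) / np.linalg.norm(x1 - x2, axis=0)
```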
Spectral Normalization
• Spectral normalization: $\bar{W}_{SN}(W) := W / \sigma(W)$ ($W$: weight matrix)
• Now we have $\lVert f \rVert_{Lip} \le 1$ everywhere
• $\sigma(W)$ can be approximated quickly with the power iteration method (see the paper)
(Miyato et al. [3])
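A sketch of spectral normalization with the power iteration method on a plain NumPy matrix; the function name and the fixed iteration count are mine, not from [3] (which reuses the iterate across training steps, running only one iteration per step):

```python
import numpy as np

def spectral_norm_power_iter(W, n_iters=50, seed=0):
    """Approximate sigma(W), the largest singular value, by power iteration."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=W.shape[0])
    for _ in range(n_iters):
        v = W.T @ u
        v /= np.linalg.norm(v)
        u = W @ v
        u /= np.linalg.norm(u)
    return float(u @ W @ v)

rng = np.random.default_rng(1)
W = rng.normal(size=(16, 8))
sigma = spectral_norm_power_iter(W)
W_sn = W / sigma  # the spectrally normalized weight

# After normalization the layer's spectral norm is (approximately) 1.
print(np.linalg.svd(W_sn, compute_uv=False)[0])  # ≈ 1.0
```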
Gradient Penalties
• Key: penalize the critic (discriminator) when it violates the Lipschitz constraint
• But it is impossible to enforce the penalty everywhere, so it is applied at sampled points $\hat{x}$:
$$\mathbb{E}_{\hat{x} \sim \mathbb{P}_{\hat{x}}}[(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1)^2]$$
This penalizes the critic whenever the gradient norm moves away from 1, keeping it close to 1.
• Two common sampling approaches for $\hat{x}$, with $\alpha \sim U(0, 1)$:
WGAN-GP: $\mathbb{P}_{\hat{x}} = \alpha \mathbb{P}_x + (1 - \alpha)\mathbb{P}_g$ (between the data and model distributions)
DRAGAN: $\mathbb{P}_{\hat{x}} = \alpha \mathbb{P}_x + (1 - \alpha)\mathbb{P}_{noise}$ (around the data distribution)
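The WGAN-GP sampling and penalty can be sketched with a toy linear critic whose gradient is known in closed form (a stand-in for backpropagating through a real critic; the penalty weight $\lambda = 10$ follows [4], while all shapes and distributions are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 10.0  # penalty weight lambda used in WGAN-GP

x_real = rng.normal(1.0, 0.5, size=(64, 2))   # samples from P_x
x_fake = rng.normal(-1.0, 0.5, size=(64, 2))  # samples from P_g

# WGAN-GP sampling: x_hat lies on lines between real and fake points.
alpha = rng.uniform(0.0, 1.0, size=(64, 1))   # alpha ~ U(0, 1)
x_hat = alpha * x_real + (1.0 - alpha) * x_fake

# Toy linear critic D(x) = w . x, whose gradient at every x_hat is simply w
# (a hypothetical stand-in for autodiff through a real network).
w = np.array([3.0, 4.0])
grad_norms = np.full(len(x_hat), np.linalg.norm(w))  # ||grad D|| = ||w|| = 5

penalty = lam * np.mean((grad_norms - 1.0) ** 2)
print(penalty)  # 10 * (5 - 1)^2 = 160.0
```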
WGAN-GP
$$\mathbb{P}_{\hat{x}} = \alpha \mathbb{P}_x + (1 - \alpha)\mathbb{P}_g \qquad \text{(between the data and model distributions)}$$
(Figure from Gulrajani et al. [4]: with the gradient penalty, the critic's gradients neither vanish nor explode.)
DRAGAN
$$\mathbb{P}_{\hat{x}} = \alpha \mathbb{P}_x + (1 - \alpha)\mathbb{P}_{noise} \qquad \text{(around the data distribution)}$$
(Figure from Fedus et al. [6]: comparison of GAN, WGAN-GP, and DRAGAN.)
What is critical in GAN training?
Why is WGAN more stable?
Is it the properties of the underlying divergence being optimized, or the Lipschitz constraint?
Theoretically, only the minimax GAN may suffer from gradient vanishing (Goodfellow [7]).
Comparisons
GAN-GP: a non-saturating GAN with gradient penalties.
(Figure from Kodali et al. [5]: comparison of GAN, GAN-GP, DRAGAN, and WGAN-GP.)
Why is WGAN more stable?
Is it the properties of the underlying divergence being optimized, or the Lipschitz constraint?
(Recap) Original Convergence Proof
For a finite data set $X$, we only have the empirical distribution
$$p_{data}(x) = \begin{cases} 1, & x \in X \\ 0, & \text{otherwise} \end{cases}$$
which is hard to optimize directly and may require density estimation.
From a Distribution Estimation Viewpoint
(Figure: critics that are unregularized, locally regularized, and globally regularized.)
Smoother critics:
• give more stable guidance to the generator
• alleviate the mode collapse issue
(my thoughts)
Open Questions
• Gradient penalties
• are usually too strong in WGAN-GP
• may create spurious local optima
• see the improved-improved-WGAN [8]
• Spectral normalization
• does it impact the optimization procedure?
• can it be used as a general regularization tool for any neural network?
References
[1] Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville,
and Yoshua Bengio, “Generative Adversarial Networks,” in Proc. NIPS, 2014.
[2] Martin Arjovsky, Soumith Chintala, and Léon Bottou, “Wasserstein Generative Adversarial Networks,” in Proc.
ICML, 2017.
[3] Takeru Miyato, Toshiki Kataoka, Masanori Koyama, and Yuichi Yoshida, “Spectral Normalization for Generative
Adversarial Networks,” in Proc. ICLR, 2018.
[4] Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, and Aaron Courville, “Improved training of
Wasserstein GANs,” in Proc. NIPS, 2017.
[5] Naveen Kodali, Jacob Abernethy, James Hays, and Zsolt Kira, “On Convergence and Stability of GANs,” arXiv
preprint arXiv:1705.07215, 2017.
[6] William Fedus, Mihaela Rosca, Balaji Lakshminarayanan, Andrew M. Dai, Shakir Mohamed, and Ian Goodfellow,
“Many Paths to Equilibrium: GANs Do Not Need to Decrease a Divergence At Every Step,” in Proc. ICLR, 2018.
[7] Ian J. Goodfellow, “On distinguishability criteria for estimating generative models,” in Proc. ICLR, Workshop
Track, 2015.
[8] Xiang Wei, Boqing Gong, Zixia Liu, Wei Lu, and Liqiang Wang, “Improving the Improved Training of Wasserstein
GANs: A Consistency Term and Its Dual Effect,” in Proc. ICLR, 2018.
Thank you for your attention!