Improved Training of Wasserstein GANs
Sangwoo Mo
KAIST ALIN Lab.
July 04, 2018
Table of Contents
Review for GANs
Improved Training of WGANs
Review for GANs
Generative Adversarial Networks (GANs)
A generative model aims to learn a model distribution pθ(x) that matches the target distribution p(x)
Usually we assume that x ∼ pθ(x) is given by a deterministic mapping x = Gθ(z) of simple noise z ∼ p(z)
* Figure from OpenAI blog.
Generative Adversarial Networks (GANs)
Q. How to train a generative model?
Explicit model: directly optimize the objective (e.g. MLE)
For example, PixelCNN maximizes
$$\log p_\theta(x) = \sum_{i=1}^{n} \log p_\theta(x_i \mid x_{1:i-1})$$
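As a minimal sketch of the chain-rule factorization above, the toy model below scores binary sequences autoregressively. The conditional `cond_prob` is a hypothetical stand-in for a PixelCNN output head (PixelCNN itself conditions on previous pixels through masked convolutions); only the sum of log-conditionals is the point here.

```python
import itertools
import math


def cond_prob(prefix):
    """Hypothetical conditional p(x_i = 1 | x_{1:i-1}), a stand-in for a PixelCNN head.
    Here the probability of a 1 grows with the fraction of ones seen so far."""
    if not prefix:
        return 0.5
    return 0.2 + 0.6 * sum(prefix) / len(prefix)


def log_likelihood(x):
    """log p(x) = sum_i log p(x_i | x_{1:i-1}) (chain rule of probability)."""
    total = 0.0
    for i, xi in enumerate(x):
        p1 = cond_prob(x[:i])
        total += math.log(p1 if xi == 1 else 1.0 - p1)
    return total


# Sanity check: the factorized model is a normalized distribution over {0, 1}^n.
n = 4
mass = sum(math.exp(log_likelihood(list(bits)))
           for bits in itertools.product([0, 1], repeat=n))
print(mass)
```

Because every conditional is a proper Bernoulli, the probabilities of all 2^n sequences sum to one, which is what makes exact maximum likelihood possible for explicit models.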
Implicit model (pθ(x) is unknown; we can only sample from it): learning by comparison
Idea of GAN
Train a discriminator D that compares p(x) and pθ(x)
Train a generator G using the signal from D
Generative Adversarial Networks (GANs)
What is happening in GAN?
GAN plays a minimax game between G and D:
$$\min_G \max_D V(G, D), \quad V(G, D) = \mathbb{E}_{x \sim p(x)}[\log D(x)] + \mathbb{E}_{z \sim p(z)}[\log(1 - D(G(z)))]$$
For a given G, the optimal D* is
$$D^*(x) = \frac{p(x)}{p(x) + p_\theta(x)}$$
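The optimal discriminator follows pointwise: for a fixed x, p(x) log D + pθ(x) log(1 − D) is maximized at D = p(x)/(p(x) + pθ(x)). The sketch below checks this by grid search, assuming two 1D Gaussians, N(0, 1) for p and N(1, 1) for pθ, chosen only for illustration.

```python
import math


def gauss_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def optimal_discriminator(p_x, q_x):
    """D*(x) = p(x) / (p(x) + p_theta(x))."""
    return p_x / (p_x + q_x)


def pointwise_value(d, p_x, q_x):
    """Integrand of V(G, D) at a single x: p(x) log D + p_theta(x) log(1 - D)."""
    return p_x * math.log(d) + q_x * math.log(1.0 - d)


# Assumed toy densities: p = N(0, 1) (data), p_theta = N(1, 1) (generator).
x = 0.3
p_x, q_x = gauss_pdf(x, 0.0, 1.0), gauss_pdf(x, 1.0, 1.0)
d_star = optimal_discriminator(p_x, q_x)

# Brute-force maximization over a grid of D values in (0, 1).
grid = [k / 1000 for k in range(1, 1000)]
d_best = max(grid, key=lambda d: pointwise_value(d, p_x, q_x))
print(d_star, d_best)  # agree up to the grid resolution
```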
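Substituting D* into the objective, we have
$$C(G) = \max_D V(G, D) = \mathrm{KL}\!\left(p \,\middle\|\, \frac{p + p_\theta}{2}\right) + \mathrm{KL}\!\left(p_\theta \,\middle\|\, \frac{p + p_\theta}{2}\right) - \log 4 = 2 \cdot \mathrm{JSD}(p \,\|\, p_\theta) - \log 4$$
Hence, against the optimal D, GAN minimizes the JSD; with a suboptimal D, V(G, D) ≤ C(G), so it minimizes a lower bound of the JSD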
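A small numerical check of this identity, assuming discrete distributions p and pθ on three outcomes (arbitrary values chosen for illustration): plugging D* into V gives exactly 2·JSD − log 4.

```python
import math


def kl(a, b):
    """KL(a || b) for discrete distributions, with the convention 0 log 0 = 0."""
    return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)


def jsd(a, b):
    m = [(ai + bi) / 2 for ai, bi in zip(a, b)]
    return 0.5 * kl(a, m) + 0.5 * kl(b, m)


def value_at_optimal_d(p, q):
    """C(G) = V(G, D*) with D* = p / (p + q), evaluated on a finite outcome space."""
    v = 0.0
    for pi, qi in zip(p, q):
        d = pi / (pi + qi) if pi + qi > 0 else 0.5  # D* is arbitrary off both supports
        if pi > 0:
            v += pi * math.log(d)
        if qi > 0:
            v += qi * math.log(1.0 - d)
    return v


p = [0.1, 0.4, 0.5]
q = [0.3, 0.3, 0.4]
print(value_at_optimal_d(p, q), 2 * jsd(p, q) - math.log(4))
```

The guards on `pi > 0` and `qi > 0` also make the disjoint-support case well defined, where D* is exactly 0 or 1.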
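In practice, GAN suffers from vanishing gradients: when D confidently rejects generated samples, log(1 − D(G(z))) saturates
To avoid this problem, G minimizes −log D(G(z)) instead
With D = D*, the generator's gradient is then the gradient of
$$\mathrm{KL}(p_\theta \,\|\, p) - 2 \cdot \mathrm{JSD}(p \,\|\, p_\theta) + \text{const}$$
Hence, it minimizes the reverse KL minus a JSD term, i.e. a lower bound of the reverse KL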
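To see why the original loss saturates, assume D(x) = σ(a(x)) with a logit a. The derivative of log(1 − σ(a)) with respect to a is −σ(a), which vanishes as D(G(z)) → 0, while the derivative of −log σ(a) is −(1 − σ(a)), which stays near −1. A minimal sketch:

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def grad_saturating(a):
    """d/da log(1 - sigmoid(a)) = -sigmoid(a): gradient of the minimax generator loss."""
    return -sigmoid(a)


def grad_non_saturating(a):
    """d/da [-log sigmoid(a)] = -(1 - sigmoid(a)): gradient of the -log D(G(z)) loss."""
    return -(1.0 - sigmoid(a))


# When D confidently rejects fakes (logit a << 0, so D(G(z)) is close to 0),
# the minimax loss gives almost no gradient while the non-saturating loss does.
for a in (-10.0, -5.0, 0.0):
    print(a, grad_saturating(a), grad_non_saturating(a))
```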
Wasserstein GANs (WGANs)
Why is GAN unstable?
The supports of p(x) and pθ(x) are disjoint almost surely (both lie on low-dimensional manifolds)
Then
JSD(p‖pθ) = log 2
KL(p‖pθ) = KL(pθ‖p) = +∞
The loss is constant or infinite, so it gives G no useful signal
Solution
1. Add noise so that the supports overlap
2. Use a better divergence
Wasserstein GANs (WGANs)
Toy example
Let z ∼ U[0, 1] and x = (0, z) ∼ p(x)
Let Gθ(z) = (θ, z), so pθ(x) = p(x) if and only if θ = 0
* Figure from Lilian Weng’s blog.
Here, the Wasserstein distance is
$$W(p, p_\theta) = |\theta|$$
whereas for every θ ≠ 0, JSD(p‖pθ) = log 2 and KL = +∞
Unlike JSD and KL, W reflects how close pθ is to p
* Figure from WGAN paper.
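The toy example can be checked on a discretized version, assuming z takes n = 5 evenly spaced values instead of being continuous. Both empirical measures are uniform over the same number of atoms, so an optimal coupling is a permutation and brute force over permutations gives the exact W (feasible only for tiny n).

```python
import itertools
import math


def empirical_toy(theta, n):
    """n evenly spaced values of z in [0, 1]; returns the atoms of p and p_theta."""
    zs = [(k + 0.5) / n for k in range(n)]
    return [(0.0, z) for z in zs], [(theta, z) for z in zs]


def jsd_empirical(xs, ys):
    """JSD between two uniform empirical distributions (atoms compared exactly)."""
    atoms = set(xs) | set(ys)
    p = {a: xs.count(a) / len(xs) for a in atoms}
    q = {a: ys.count(a) / len(ys) for a in atoms}
    total = 0.0
    for a in atoms:
        m = (p[a] + q[a]) / 2
        if p[a] > 0:
            total += 0.5 * p[a] * math.log(p[a] / m)
        if q[a] > 0:
            total += 0.5 * q[a] * math.log(q[a] / m)
    return total


def wasserstein_empirical(xs, ys):
    """Exact W1 (Euclidean cost) between uniform empirical measures of equal size,
    by brute force over permutations."""
    n = len(xs)
    return min(sum(math.dist(xs[i], ys[s[i]]) for i in range(n)) / n
               for s in itertools.permutations(range(n)))


for theta in (0.0, 0.1, -0.5, 1.0):
    xs, ys = empirical_toy(theta, n=5)
    print(theta, jsd_empirical(xs, ys), wasserstein_empirical(xs, ys))
```

For every θ ≠ 0 the JSD stays at log 2, while W tracks |θ|.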
Wasserstein GANs (WGANs)
Wasserstein distance
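The Wasserstein-1 distance is
$$W(p, q) = \inf_{\gamma \in \Pi(p, q)} \mathbb{E}_{(x, y) \sim \gamma}\left[\lVert x - y \rVert\right]$$
where Π(p, q) is the set of couplings, i.e. joint distributions with marginals p and q
Relation between divergences (in terms of convergence)
convergence in KL ⇒ convergence in JSD ⇔ convergence in TV ⇒ convergence in W ⇔ convergence in distribution
so W is the weakest of these, and more sequences of distributions converge under it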
Wasserstein GANs (WGANs)
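How to minimize the Wasserstein distance?
The Wasserstein-1 distance has a dual form (Kantorovich–Rubinstein duality):
$$W(p, q) = \sup_{f \in \mathcal{F}} \mathbb{E}_{x \sim p(x)}[f(x)] - \mathbb{E}_{x \sim q(x)}[f(x)]$$
where 𝓕 is the set of 1-Lipschitz functions
Hence, the objective of WGAN is
$$\min_G \max_{D \in \mathcal{D}} \mathbb{E}_{x \sim p(x)}[D(x)] - \mathbb{E}_{z \sim p(z)}[D(G(z))]$$
where 𝒟 is the set of 1-Lipschitz critics
To enforce the Lipschitz constraint, WGAN clips the weights of D to a box [−c, c]; this keeps D K-Lipschitz for some K, so the critic estimates W only up to the factor K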
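A sketch of weight clipping on the toy example, assuming a hypothetical linear critic D(x) = w·x trained by gradient ascent on the WGAN objective; c = 0.01, the learning rate, and the batch size are illustrative choices. For a linear critic, the gradient of the objective with respect to w is the difference of sample means, so no autodiff is needed.

```python
import random


def train_clipped_linear_critic(theta, c=0.01, lr=0.005, steps=200, batch=64, seed=0):
    """Gradient ascent on E_p[D(x)] - E_{p_theta}[D(x)] for a linear critic D(x) = w . x,
    clipping every weight to [-c, c] after each step (the WGAN weight-clipping recipe)."""
    rng = random.Random(seed)
    w = [0.0, 0.0]
    for _ in range(steps):
        real = [(0.0, rng.random()) for _ in range(batch)]    # x = (0, z) ~ p
        fake = [(theta, rng.random()) for _ in range(batch)]  # G_theta(z) = (theta, z)
        # For a linear critic, the gradient w.r.t. w is mean(real) - mean(fake).
        grad = [sum(r[i] for r in real) / batch - sum(f[i] for f in fake) / batch
                for i in range(2)]
        w = [max(-c, min(c, wi + lr * gi)) for wi, gi in zip(w, grad)]
    return w


theta = 0.5
w = train_clipped_linear_critic(theta)
# For this toy, E_p[D] - E_{p_theta}[D] = -w[0] * theta exactly (the z terms cancel).
estimate = -w[0] * theta
print(w, estimate)
```

Here w[0] is pinned at −c, so the critic's value is c·|θ|: the clipped critic recovers W only up to the scale set by c. The second weight merely drifts with sampling noise because the second coordinate carries no signal.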
Improved Training of WGANs
Motivation
Motivation: weight clipping leads to optimization difficulties
1. It restricts the critic to overly simple functions
2. It causes exploding or vanishing gradients
Observation
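Theorem 1
Let (x, y) ∼ γ*, where γ* is the optimal coupling and f* is the optimal dual (critic) function; assume f* is differentiable and γ*(x = y) = 0. Let x_t = ty + (1 − t)x with 0 ≤ t ≤ 1. Then
$$\mathbb{P}_{(x, y) \sim \gamma^*}\left[\nabla f^*(x_t) = \frac{y - x_t}{\lVert y - x_t \rVert}\right] = 1$$
Corollary 2
f* has gradient norm 1 almost everywhere on the line segments between x and y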
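Theorem 1 can be checked on the toy example. We assume f*(u) = −u₁ (minus the first coordinate) is an optimal critic for θ > 0, since it attains E_p[f*] − E_pθ[f*] = θ = W, and pair each real sample y = (0, z) with x = (θ, z), an optimal coupling because every pair is at distance θ. With y as the real sample, f*(y) − f*(x) = ‖y − x‖, matching the sign convention of the proof. The finite-difference gradient at x_t should then equal (y − x_t)/‖y − x_t‖.

```python
import math
import random


def f_star(u):
    """Assumed optimal critic for the toy example with theta > 0: f*(u) = -u[0].
    Check: E_p[f*] - E_{p_theta}[f*] = 0 - (-theta) = theta = W(p, p_theta)."""
    return -u[0]


def num_grad(f, u, h=1e-6):
    """Central finite-difference gradient of f at point u."""
    grad = []
    for i in range(len(u)):
        up, dn = list(u), list(u)
        up[i] += h
        dn[i] -= h
        grad.append((f(up) - f(dn)) / (2 * h))
    return grad


def check_theorem1(theta, n=3, seed=0):
    """For n coupled pairs, return (numerical grad of f* at x_t, (y - x_t) / ||y - x_t||)."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        z = rng.random()
        y = (0.0, z)      # real sample, y ~ p
        x = (theta, z)    # its partner under the optimal coupling, x ~ p_theta
        t = rng.random()  # t in [0, 1), so x_t != y
        xt = [t * yi + (1 - t) * xi for xi, yi in zip(x, y)]
        diff = [yi - xti for yi, xti in zip(y, xt)]
        norm = math.hypot(*diff)
        out.append((num_grad(f_star, xt), [d / norm for d in diff]))
    return out


for grad, direction in check_theorem1(theta=0.7):
    print(grad, direction)  # both close to [-1.0, 0.0]
```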
Observation
Proof.
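For (x, y) ∼ γ*, f*(y) − f*(x) = ‖y − x‖ a.s. (the primal and dual optima coincide, and f* is 1-Lipschitz, so the inequality f*(y) − f*(x) ≤ ‖y − x‖ must be tight almost surely)
Let ψ(t) = f*(x_t) − f*(x). Then
$$|\psi(t) - \psi(t')| = |f^*(x_t) - f^*(x_{t'})| \le \lVert x_t - x_{t'} \rVert = \lVert x - y \rVert \, |t - t'|,$$
hence ψ is ‖x − y‖-Lipschitz. Using this,
$$\psi(1) - \psi(0) = (\psi(1) - \psi(t)) + (\psi(t) - \psi(0)) \le (1 - t)\lVert x - y \rVert + t\lVert x - y \rVert = \lVert x - y \rVert,$$
and equality holds since
$$\psi(1) - \psi(0) = f^*(y) - f^*(x) = \lVert y - x \rVert$$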
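Since the sum of the two bounds is attained, each bound is tight: ψ(t) − ψ(0) = t‖x − y‖, and since ψ(0) = 0, ψ(t) = t‖x − y‖.
Hence, f*(x_t) = f*(x) + t‖y − x‖.
Let v = (y − x)/‖y − x‖. Since x_t + hv = x_{t + h/‖y − x‖},
$$\frac{\partial}{\partial v} f^*(x_t) = \lim_{h \to 0} \frac{f^*(x_t + hv) - f^*(x_t)}{h} = 1$$
Since ‖∇f*(x_t)‖ ≤ 1 (f* is 1-Lipschitz) and ⟨∇f*(x_t), v⟩ = 1 with ‖v‖ = 1, the Cauchy–Schwarz inequality forces ∇f*(x_t) = v = (y − x_t)/‖y − x_t‖.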
Gradient Penalty (WGAN-GP)
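From this observation, we define the gradient penalty
$$\lambda \, \mathbb{E}_{\hat{x} \sim \hat{p}(x)}\left[\left(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\right)^2\right]$$
where x̂ ∼ p̂(x) is sampled uniformly along the line segment between x and y
No batch normalization in the critic: the penalty constrains each sample's gradient norm independently, whereas batch normalization makes each output depend on the whole batch
Two-sided penalty: a one-sided penalty max(0, ‖∇x̂ D(x̂)‖₂ − 1)² was also tried, but it made little difference empirically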
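A dependency-free sketch of the penalty, assuming a hypothetical one-hidden-layer tanh critic whose input gradient is written out by hand; a real implementation would obtain ∇D(x̂) from autodiff. λ = 10 follows the value used by Gulrajani et al., and the `one_sided` flag switches to the one-sided variant.

```python
import math
import random


def critic_and_input_grad(params, x):
    """Hypothetical critic D(x) = sum_j w2_j * tanh(W1_j . x + b1_j).
    Returns D(x) and its gradient with respect to the input x (computed analytically)."""
    W1, b1, w2 = params
    pre = [sum(wji * xi for wji, xi in zip(row, x)) + bj for row, bj in zip(W1, b1)]
    h = [math.tanh(a) for a in pre]
    d = sum(w * hj for w, hj in zip(w2, h))
    # dD/dx_i = sum_j w2_j * (1 - tanh(pre_j)^2) * W1_ji
    grad = [sum(w * (1.0 - hj * hj) * row[i] for w, hj, row in zip(w2, h, W1))
            for i in range(len(x))]
    return d, grad


def gradient_penalty(params, real, fake, lam=10.0, one_sided=False, rng=None):
    """lam * E[(||grad D(x_hat)||_2 - 1)^2], with x_hat = t*y + (1-t)*x, t ~ U[0, 1],
    taken along segments between paired real samples x and fake samples y."""
    rng = rng if rng is not None else random.Random(0)
    total = 0.0
    for x, y in zip(real, fake):
        t = rng.random()
        x_hat = [t * yi + (1.0 - t) * xi for xi, yi in zip(x, y)]
        _, g = critic_and_input_grad(params, x_hat)
        gap = math.sqrt(sum(gi * gi for gi in g)) - 1.0
        total += max(0.0, gap) ** 2 if one_sided else gap ** 2
    return lam * total / len(real)


params = ([[0.3, -0.8], [1.1, 0.4]], [0.1, -0.2], [0.7, -1.3])  # hypothetical weights
real = [[0.0, 0.2], [0.0, 0.7]]
fake = [[0.5, 0.2], [0.5, 0.7]]
print(gradient_penalty(params, real, fake))
```

In WGAN-GP this term is added to the critic loss E[D(x̃)] − E[D(x)] (x̃ generated, x real). In a framework such as PyTorch, the input gradient would come from `torch.autograd.grad` with `create_graph=True`, so that the penalty itself can be backpropagated into the critic's weights.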
Possible Improvement
WGAN-GP does not sample (x, y) from the optimal coupling γ*
Instead, it samples x ∼ p(x) and y ∼ pθ(x) independently
This does not match the theory (Theorem 1)
Idea: (x, G(E(x))) could be a better approximation of a sample from γ*
E is an additionally trained encoder x → z
G(E(x)) is the projection of x onto the generator's manifold
Experiments
WGAN-GP improves the stability
Number of successful runs for GAN and WGAN-GP (a run counts as successful if its Inception score exceeds a threshold; experiments on 32×32 ImageNet)
Experiments
WGAN-GP improves the performance
Inception score on CIFAR-10.
References
Goodfellow et al. Generative Adversarial Nets. NIPS 2014.
Arjovsky and Bottou. Towards Principled Methods for Training GANs. ICLR 2017.
Arjovsky et al. Wasserstein GAN. ICML 2017.
Gulrajani et al. Improved Training of Wasserstein GANs. NIPS 2017.
FIDO Alliance Osaka Seminar: Overview.pdf
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 

Improved Training of Wasserstein GANs (WGAN-GP)

  • 1. Improved Training of Wasserstein GANs. Sangwoo Mo, KAIST ALIN Lab., July 04, 2018
  • 2. Table of Contents: Review for GANs; Improved Training of WGANs
  • 4. Generative Adversarial Networks (GANs). A generative model aims to learn a model distribution $p_\theta(x)$ that matches the target distribution $p(x)$. Usually we assume that $x \sim p_\theta(x)$ is a deterministic mapping $x = G_\theta(z)$ of simple noise $z \sim p(z)$. (Figure from the OpenAI blog.)
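As a minimal sketch of this setup, the block below pushes Gaussian noise through a hypothetical affine generator `generator(z) = MU + SIGMA * z` (the names and parameter values are illustrative assumptions, not from the slides). The model distribution is only accessed through samples; for this simple G it happens to be $N(\mu, \sigma^2)$, so the sample statistics can be checked.

```python
import random
import statistics

MU, SIGMA = 2.0, 0.5   # hypothetical generator parameters theta = (mu, sigma)

def generator(z):
    """Hypothetical deterministic generator G_theta(z) = mu + sigma * z."""
    return MU + SIGMA * z

def sample_model(n, seed=0):
    """Draw x ~ p_theta(x) by pushing noise z ~ N(0, 1) through G_theta."""
    rng = random.Random(seed)
    return [generator(rng.gauss(0.0, 1.0)) for _ in range(n)]

xs = sample_model(100_000)
print(statistics.mean(xs), statistics.stdev(xs))   # close to MU and SIGMA
```

Real GANs replace the affine map with a neural network, in which case $p_\theta(x)$ has no closed form and sampling is the only access.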
  • 5. Generative Adversarial Networks (GANs). Q. How do we train a generative model? Explicit model: directly optimize the objective (e.g., MLE). For example, PixelCNN maximizes $\log p_\theta(x) = \sum_{i=1}^{n} \log p_\theta(x_i \mid x_{1:i-1})$.
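The autoregressive factorization can be sketched on binary sequences. Here `cond_prob_one` is a hypothetical hand-written conditional standing in for a PixelCNN (it only looks at the previous bit); the log-likelihood is the sum of per-position conditional log-probabilities, and the joint distribution is normalized automatically.

```python
import itertools
import math

def cond_prob_one(prefix):
    """Hypothetical conditional p(x_i = 1 | x_{1:i-1}); a stand-in for a PixelCNN."""
    if not prefix:
        return 0.5
    return 0.8 if prefix[-1] == 1 else 0.3

def log_likelihood(x):
    """log p(x) = sum_i log p(x_i | x_{1:i-1}) (chain rule of probability)."""
    total = 0.0
    for i, xi in enumerate(x):
        p1 = cond_prob_one(x[:i])
        total += math.log(p1 if xi == 1 else 1.0 - p1)
    return total

# Because each factor is a normalized conditional, the joint sums to 1.
n = 4
mass = sum(math.exp(log_likelihood(list(x))) for x in itertools.product([0, 1], repeat=n))
print(mass)
```

Maximizing this quantity over a dataset is exactly MLE, which is possible only because every conditional can be evaluated.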
  • 6. Generative Adversarial Networks (GANs). Q. How do we train a generative model? Explicit model: directly optimize the objective (e.g., MLE). Implicit model (we do not know $p_\theta(x)$ and can only sample from it): learning by comparison. Idea of GAN: train a discriminator $D$ that compares $p(x)$ and $p_\theta(x)$, and train a generator $G$ using the signal from $D$.
  • 7. Generative Adversarial Networks (GANs). What is happening in a GAN? A GAN plays a minimax game between $G$ and $D$: $\min_G \max_D V(G, D)$, where $V(G, D) = \mathbb{E}_{x \sim p(x)}[\log D(x)] + \mathbb{E}_{z \sim p(z)}[\log(1 - D(G(z)))]$. For a given $G$, the optimal discriminator is $D^*(x) = \frac{p(x)}{p(x) + p_\theta(x)}$.
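On a finite sample space the claim about $D^*$ can be checked directly. This sketch uses illustrative distributions `p` and `q` (standing for $p$ and $p_\theta$), computes $D^* = p/(p + p_\theta)$, and verifies that nudging $D$ away from $D^*$ at any point lowers $V$, since $V$ is concave in each value $D(x)$.

```python
import math

def value(D, p, q):
    """V(G, D) = E_{x~p}[log D(x)] + E_{x~p_theta}[log(1 - D(x))] on a finite space."""
    total = 0.0
    for pi, qi, d in zip(p, q, D):
        if pi > 0:
            total += pi * math.log(d)
        if qi > 0:
            total += qi * math.log(1.0 - d)
    return total

def optimal_discriminator(p, q):
    """D*(x) = p(x) / (p(x) + p_theta(x)) wherever p + p_theta > 0."""
    return [pi / (pi + qi) for pi, qi in zip(p, q)]

p = [0.5, 0.3, 0.2]   # target distribution p(x) (illustrative numbers)
q = [0.1, 0.3, 0.6]   # model distribution p_theta(x) (illustrative numbers)
d_star = optimal_discriminator(p, q)

best = value(d_star, p, q)
nudges = []
for i in range(len(p)):
    for eps in (-0.05, 0.05):
        d = list(d_star)
        d[i] += eps          # perturb D at a single point x_i
        nudges.append(value(d, p, q))
print(best, max(nudges))
```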
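  • 8. Generative Adversarial Networks (GANs). What is happening in a GAN? Substituting $D^*$ into the objective gives $C(G) = \max_D V(G, D) = \mathrm{KL}\big(p \,\|\, \tfrac{p + p_\theta}{2}\big) + \mathrm{KL}\big(p_\theta \,\|\, \tfrac{p + p_\theta}{2}\big) - \log 4 = 2 \cdot \mathrm{JSD}(p \,\|\, p_\theta) - \log 4$. Hence, a GAN minimizes a lower bound of the JSD (the bound is tight when $D$ is optimal).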
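A quick numerical check of $C(G) = 2 \cdot \mathrm{JSD}(p \,\|\, p_\theta) - \log 4$ on illustrative finite distributions; `kl`, `jsd`, and `value_at_optimum` are helper names introduced for this sketch.

```python
import math

def kl(a, b):
    """KL(a || b) for finite distributions; assumes b > 0 wherever a > 0."""
    return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)

def jsd(p, q):
    """Jensen-Shannon divergence with the mixture m = (p + q) / 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def value_at_optimum(p, q):
    """C(G) = V(G, D*) with D* = p / (p + p_theta)."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            total += pi * math.log(pi / (pi + qi))
        if qi > 0:
            total += qi * math.log(qi / (pi + qi))
    return total

p = [0.5, 0.3, 0.2]
q = [0.1, 0.3, 0.6]
lhs = value_at_optimum(p, q)
rhs = 2 * jsd(p, q) - math.log(4)
print(lhs, rhs)
```

When $p_\theta = p$, the JSD vanishes and $C(G)$ reaches its minimum $-\log 4$.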
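  • 9. Generative Adversarial Networks (GANs). What is happening in a GAN? In practice, the original generator loss suffers from vanishing gradients: when $D$ confidently rejects generated samples, $\log(1 - D(G(z)))$ saturates. To avoid this problem, we minimize $-\log D(G(z))$ instead. Substituting $D^*$, we have, up to terms that do not affect the generator's gradient, $C(G) = \mathrm{KL}(p_\theta \,\|\, p) - 2 \cdot \mathrm{JSD}(p \,\|\, p_\theta) + \text{const}$ (Arjovsky and Bottou, 2017). Since $\mathrm{JSD} \ge 0$, it minimizes a lower bound of the reverse KL.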
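The identity above holds at the level of gradients: with the discriminator frozen at $D^*$ for the current parameters $\theta_0$, the gradient of $\mathbb{E}_{p_\theta}[-\log D^*(x)]$ at $\theta_0$ equals the gradient of $\mathrm{KL}(p_\theta \,\|\, p) - 2 \cdot \mathrm{JSD}(p \,\|\, p_\theta)$. This sketch checks that with finite differences, assuming a hypothetical model $p_\theta = \mathrm{softmax}(\theta)$ over three outcomes.

```python
import math

def softmax(theta):
    m = max(theta)
    e = [math.exp(t - m) for t in theta]
    s = sum(e)
    return [ei / s for ei in e]

def kl(a, b):
    return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)

def jsd(p, q):
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p = [0.5, 0.3, 0.2]          # target distribution (illustrative)
theta0 = [0.1, -0.4, 0.7]    # current generator parameters (illustrative)
q0 = softmax(theta0)
d0 = [pi / (pi + qi) for pi, qi in zip(p, q0)]   # D* for theta0, then held fixed

def nonsaturating_loss(theta):
    """E_{x ~ p_theta}[-log D0(x)] with the discriminator frozen at D*(theta0)."""
    return sum(qi * -math.log(di) for qi, di in zip(softmax(theta), d0))

def kl_minus_2jsd(theta):
    q = softmax(theta)
    return kl(q, p) - 2 * jsd(p, q)

def num_grad(f, theta, h=1e-5):
    """Central-difference gradient of f at theta."""
    g = []
    for i in range(len(theta)):
        up, dn = list(theta), list(theta)
        up[i] += h
        dn[i] -= h
        g.append((f(up) - f(dn)) / (2 * h))
    return g

g_loss = num_grad(nonsaturating_loss, theta0)
g_target = num_grad(kl_minus_2jsd, theta0)
print(g_loss, g_target)
```

The two function values differ, but their gradients agree at $\theta_0$, which is all that matters for a gradient step.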
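  • 10. Wasserstein GANs (WGANs). Why are GANs unstable? The supports of $p(x)$ and $p_\theta(x)$ are disjoint almost surely, since both lie on low-dimensional manifolds. Then $\mathrm{JSD}(p \,\|\, p_\theta) = \log 2$ and $\mathrm{KL}(p \,\|\, p_\theta) = \mathrm{KL}(p_\theta \,\|\, p) = +\infty$, so the loss carries no useful information about how far apart the distributions are. Solutions: 1. add noise so that the supports overlap; 2. use a better divergence.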
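A small finite-space illustration, with made-up distributions: once the supports are disjoint, the JSD is $\log 2$ no matter how the mass is arranged, and both KL directions are infinite.

```python
import math

def kl(a, b):
    """KL(a || b) on a finite space; +inf when a puts mass where b has none."""
    total = 0.0
    for ai, bi in zip(a, b):
        if ai == 0:
            continue
        if bi == 0:
            return math.inf
        total += ai * math.log(ai / bi)
    return total

def jsd(p, q):
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p = [0.5, 0.5, 0.0, 0.0]     # support {0, 1}
q_a = [0.0, 0.0, 0.3, 0.7]   # support {2, 3}: disjoint from p
q_b = [0.0, 0.0, 0.9, 0.1]   # also disjoint, with a different arrangement
print(jsd(p, q_a), jsd(p, q_b), math.log(2))
print(kl(p, q_a), kl(q_a, p))
```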
  • 11. Wasserstein GANs (WGANs). Toy example: let $z \sim U[0, 1]$ and $x = (0, z) \sim p(x)$. Let $G_\theta(z) = (\theta, z)$, so $p_\theta(x) = p(x)$ only when $\theta = 0$. (Figure from Lilian Weng's blog.)
  • 12. Wasserstein GANs (WGANs). Toy example: here the Wasserstein distance is $W(p, p_\theta) = |\theta|$. Unlike JSD and KL, which stay at $\log 2$ and $+\infty$ for every $\theta \neq 0$, it tells us how close the distributions are. (Figure from the WGAN paper.)
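For the toy example, $W = |\theta|$ can be confirmed from both sides on a sample of $z$ values. Pairing $(0, z)$ with $(\theta, z)$ is one valid coupling, so its cost upper-bounds $W$. The test function $f(x) = -\mathrm{sign}(\theta)\, x_1$ is 1-Lipschitz, so its value in the dual form of slide 14 lower-bounds $W$. Both equal $|\theta|$ and shrink smoothly as $\theta \to 0$, while the JSD would stay at $\log 2$. The function names are illustrative.

```python
import math
import random

def coupling_cost(theta, zs):
    """Average ||x - y|| under the coupling that pairs (0, z) with (theta, z)."""
    return sum(math.dist((0.0, z), (theta, z)) for z in zs) / len(zs)

def dual_value(theta, zs):
    """E_p[f] - E_{p_theta}[f] for the 1-Lipschitz test function f(x) = -sign(theta) * x[0]."""
    s = math.copysign(1.0, theta)
    f = lambda x: -s * x[0]
    return sum(f((0.0, z)) - f((theta, z)) for z in zs) / len(zs)

rng = random.Random(0)
zs = [rng.random() for _ in range(1000)]   # z ~ U[0, 1]
for theta in (1.0, 0.5, 0.1, -0.1):
    print(theta, coupling_cost(theta, zs), dual_value(theta, zs))
```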
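  • 13. Wasserstein GANs (WGANs). Wasserstein distance: the Wasserstein-1 distance is $W(p, q) = \inf_{\gamma \in \Pi(p, q)} \mathbb{E}_{(x, y) \sim \gamma}[\lVert x - y \rVert]$, where $\Pi(p, q)$ is the set of couplings (joint distributions) with marginals $p$ and $q$. Relation between divergences, from weakest to strongest: $W$ (equivalent to convergence in distribution) < JSD (equivalent to TV) < KL.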
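  • 14. Wasserstein GANs (WGANs). How do we minimize the Wasserstein distance? The Wasserstein-1 distance has a dual form: $W(p, q) = \sup_{f \in \mathcal{F}} \mathbb{E}_{x \sim p(x)}[f(x)] - \mathbb{E}_{x \sim q(x)}[f(x)]$, where $\mathcal{F}$ is the set of 1-Lipschitz functions. Hence, the objective of WGAN is $\min_G \max_{D \in \mathcal{D}} \mathbb{E}_{x \sim p(x)}[D(x)] - \mathbb{E}_{z \sim p(z)}[D(G(z))]$, where $\mathcal{D}$ is the set of 1-Lipschitz critics. To enforce the Lipschitz constraint, WGAN clips the critic's weights to a fixed range.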
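A minimal sketch of the clipped critic update on the toy example, assuming a hypothetical linear critic $D_w(x) = \langle w, x \rangle$ (its Lipschitz constant is $\lVert w \rVert_2$) and an illustrative clipping threshold $c = 0.01$. After clipping, the critic's objective settles at $c \cdot |\theta|$ rather than $W = |\theta|$. Clipping bounds the Lipschitz constant by $c\sqrt{d}$ instead of 1, so the critic estimates $W$ only up to a scale factor.

```python
import random
from statistics import mean

def critic(w, x):
    """Hypothetical linear critic D_w(x) = <w, x>; its Lipschitz constant is ||w||_2."""
    return sum(wi * xi for wi, xi in zip(w, x))

def wgan_objective(w, real, fake):
    """Empirical E_{x~p}[D(x)] - E_{x~p_theta}[D(x)]; the critic maximizes this."""
    return mean(critic(w, x) for x in real) - mean(critic(w, x) for x in fake)

def clip_weights(w, c=0.01):
    """Weight clipping: clamp every parameter to [-c, c]."""
    return [max(-c, min(c, wi)) for wi in w]

def critic_step(w, real, fake, lr=0.1, c=0.01):
    """One gradient-ascent step, then clipping (the gradient is exact for a linear critic)."""
    grad = [mean(x[i] for x in real) - mean(x[i] for x in fake) for i in range(len(w))]
    return clip_weights([wi + lr * gi for wi, gi in zip(w, grad)], c)

theta = 0.5
rng = random.Random(0)
zs = [rng.random() for _ in range(1000)]
real = [(0.0, z) for z in zs]      # p: x = (0, z)
fake = [(theta, z) for z in zs]    # p_theta: G_theta(z) = (theta, z)

w = [0.0, 0.0]
for _ in range(5):
    w = critic_step(w, real, fake)
print(w, wgan_objective(w, real, fake))
```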
  • 15. Table of Contents: Review for GANs; Improved Training of WGANs
  • 16. Motivation. Weight clipping leads to optimization difficulties: 1. it restricts the critic to an overly simple function space; 2. it causes exploding or vanishing gradients.
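  • 17. Observation. Theorem 1: Let $\gamma^*$ be the optimal coupling and $f^*$ the optimal function, and assume $f^*$ is differentiable and $\gamma^*(x = y) = 0$. For $(x, y) \sim \gamma^*$, let $x_t = t y + (1 - t) x$ with $0 \le t \le 1$. Then $P_{(x, y) \sim \gamma^*}\Big[\nabla f^*(x_t) = \frac{y - x_t}{\lVert y - x_t \rVert}\Big] = 1$. Corollary 2: $f^*$ has gradient norm 1 almost everywhere on the line segments between coupled points $x$ and $y$.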
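  • 18. Observation. Proof. For $(x, y) \sim \gamma^*$, $f^*(y) - f^*(x) = \lVert y - x \rVert$ almost surely, since $f^*$ is 1-Lipschitz and $\mathbb{E}_{\gamma^*}[f^*(y) - f^*(x)] = W = \mathbb{E}_{\gamma^*}[\lVert y - x \rVert]$. Let $\psi(t) = f^*(x_t) - f^*(x)$. Then $|\psi(t) - \psi(t')| = |f^*(x_t) - f^*(x_{t'})| \le \lVert x_t - x_{t'} \rVert = \lVert x - y \rVert \, |t - t'|$, hence $\psi$ is $\lVert x - y \rVert$-Lipschitz. Using this, $\psi(1) - \psi(0) = (\psi(1) - \psi(t)) + (\psi(t) - \psi(0)) \le (1 - t) \lVert x - y \rVert + t \lVert x - y \rVert = \lVert x - y \rVert$, and equality holds since $|\psi(1) - \psi(0)| = |f^*(y) - f^*(x)| = \lVert y - x \rVert$.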
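  • 19. Observation. Proof (continued). Since the sum attains its upper bound, each part does too, so $\psi(t) - \psi(0) = t \lVert x - y \rVert$, and since $\psi(0) = 0$, $\psi(t) = t \lVert x - y \rVert$. Hence, $f^*(x_t) = f^*(x) + t \lVert y - x \rVert$. Let $v = (y - x)/\lVert y - x \rVert$, which equals $(y - x_t)/\lVert y - x_t \rVert$ for $t < 1$. Since $x_t + h v = x_{t + h/\lVert y - x \rVert}$, we get $\frac{\partial}{\partial v} f^*(x_t) = \lim_{h \to 0} \frac{f^*(x_t + h v) - f^*(x_t)}{h} = 1$. Since $\lVert \nabla f^*(x_t) \rVert \le 1$ and $\langle \nabla f^*(x_t), v \rangle = 1$ with $\lVert v \rVert = 1$, equality in Cauchy-Schwarz forces $\nabla f^*(x_t) = v$.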
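The theorem can be checked numerically in the simplest case, assuming point masses $p = \delta_y$ and $p_\theta = \delta_x$ (an illustrative choice). Then $f^*(u) = -\lVert u - y \rVert$ is 1-Lipschitz and attains $f^*(y) - f^*(x) = \lVert y - x \rVert = W$. Its finite-difference gradient at points $x_t$ on the segment should equal the unit vector $(y - x)/\lVert y - x \rVert$.

```python
import math

x = (0.0, 0.0)   # one end of a coupled pair (illustrative)
y = (3.0, 4.0)   # the other end; ||y - x|| = 5

def f_star(u):
    """f*(u) = -||u - y||: 1-Lipschitz and f*(y) - f*(x) = ||y - x||."""
    return -math.dist(u, y)

def num_grad(f, u, h=1e-6):
    """Central-difference gradient of a scalar function of a point."""
    g = []
    for i in range(len(u)):
        up, dn = list(u), list(u)
        up[i] += h
        dn[i] -= h
        g.append((f(up) - f(dn)) / (2 * h))
    return g

grads = {}
for t in (0.1, 0.5, 0.9):
    xt = [t * yi + (1 - t) * xi for xi, yi in zip(x, y)]   # x_t = t y + (1 - t) x
    grads[t] = num_grad(f_star, xt)
    print(t, grads[t], math.hypot(*grads[t]))
```

Each gradient is approximately $(0.6, 0.8)$, the unit direction from $x$ to $y$, with norm 1.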
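  • 20. Gradient Penalty (WGAN-GP). From the observation, we define the gradient penalty $\lambda \, \mathbb{E}_{\hat{x} \sim \hat{p}(x)}\big[(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1)^2\big]$, where $\hat{x} \sim \hat{p}(x)$ is sampled uniformly from the line segment between $x \sim p(x)$ and $y \sim p_\theta(x)$. No batch normalization in the critic: the gradient norm is penalized for each input independently, and batch normalization would break this by making each output depend on the whole batch. Two-sided penalty: the authors also tried a one-sided penalty $\max(0, \lVert \nabla D(\hat{x}) \rVert_2 - 1)^2$, but it made little empirical difference.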
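A minimal sketch of the penalty and the resulting critic loss. `num_grad` is a central-difference stand-in for the automatic differentiation a real implementation would use, and the critics, data, and default `lam=10.0` are illustrative assumptions. Each real sample is paired with a fake one, and $\hat{x} = t x + (1 - t) y$ with $t \sim U[0, 1]$.

```python
import math
import random

def num_grad(critic, u, h=1e-5):
    """Central-difference gradient; stands in for autodiff in this sketch."""
    g = []
    for i in range(len(u)):
        up, dn = list(u), list(u)
        up[i] += h
        dn[i] -= h
        g.append((critic(up) - critic(dn)) / (2 * h))
    return g

def gradient_penalty(critic, real, fake, lam=10.0, one_sided=False, seed=0):
    """lam * E[(||grad D(x_hat)||_2 - 1)^2], x_hat uniform on the segment between paired samples."""
    rng = random.Random(seed)
    total = 0.0
    for x, y in zip(real, fake):
        t = rng.random()                                          # t ~ U[0, 1]
        x_hat = [t * xi + (1 - t) * yi for xi, yi in zip(x, y)]   # point on segment xy
        norm = math.hypot(*num_grad(critic, x_hat))
        gap = max(0.0, norm - 1.0) if one_sided else norm - 1.0
        total += gap ** 2
    return lam * total / len(real)

def critic_loss(critic, real, fake, **gp_kwargs):
    """WGAN-GP critic loss to minimize: E[D(fake)] - E[D(real)] + gradient penalty."""
    d_real = sum(critic(x) for x in real) / len(real)
    d_fake = sum(critic(y) for y in fake) / len(fake)
    return d_fake - d_real + gradient_penalty(critic, real, fake, **gp_kwargs)

rng = random.Random(1)
real = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(256)]
fake = [(rng.gauss(2, 1), rng.gauss(2, 1)) for _ in range(256)]
steep = lambda u: 3 * u[0] + 4 * u[1]      # ||grad|| = 5 everywhere
unit = lambda u: 0.6 * u[0] + 0.8 * u[1]   # ||grad|| = 1 everywhere
print(gradient_penalty(steep, real, fake))  # about 10 * (5 - 1)**2 = 160
print(gradient_penalty(unit, real, fake))   # about 0
print(critic_loss(unit, real, fake))
```

With the one-sided variant, a critic whose gradient norm is below 1 is not penalized at all, whereas the two-sided version pushes the norm up toward 1.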
  • 21. Possible Improvement. WGAN-GP does not sample $(x, y)$ from the optimal coupling $\gamma^*$; instead, it samples $x \sim p(x)$ and $y \sim p_\theta(x)$ independently. This does not match the theory (Theorem 1). Idea: $(x, G(E(x)))$ would be a better approximation of $\gamma^*$, where $E$ is an additionally trained encoder $x \to z$, so that $G(E(x))$ is the projection of $x$ onto the manifold of $G$.
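The idea can be illustrated on the toy example from slides 11 and 12, assuming a hypothetical encoder `E(x) = x[1]` that inverts the toy generator exactly (a trained encoder would only approximate this). Independent pairing produces long, slanted segments. Pairing each $x$ with $G(E(x))$ recovers the horizontal optimal coupling, whose average length is exactly $|\theta| = W(p, p_\theta)$.

```python
import math
import random

THETA = 0.5   # generator parameter of the toy example (illustrative value)

def G(z):
    """Toy generator from the slides: G_theta(z) = (theta, z)."""
    return (THETA, z)

def E(x):
    """Hypothetical encoder x -> z; for this toy G it recovers z exactly."""
    return x[1]

rng = random.Random(0)
n = 10_000
real = [(0.0, rng.random()) for _ in range(n)]   # x = (0, z), z ~ U[0, 1]
fake = [G(rng.random()) for _ in range(n)]       # y = G(z'), independent z'

# WGAN-GP pairing: x and y drawn independently.
independent = sum(math.dist(x, y) for x, y in zip(real, fake)) / n
# Encoder pairing: each x is matched with its projection G(E(x)).
projected = sum(math.dist(x, G(E(x))) for x in real) / n
print(independent, projected)   # projected equals |THETA| = W(p, p_theta)
```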
  • 22. Experiments. WGAN-GP improves stability: number of successful runs for GAN and WGAN-GP, where a run counts as a success if its Inception score exceeds a threshold. Experiments on 32×32 ImageNet.
  • 24. Experiments. WGAN-GP improves performance: Inception score on CIFAR-10.
  • 25. References: Goodfellow et al. Generative Adversarial Nets. NIPS 2014. Arjovsky and Bottou. Towards Principled Methods for Training GANs. ICLR 2017. Arjovsky et al. Wasserstein GAN. ICML 2017. Gulrajani et al. Improved Training of Wasserstein GANs. NIPS 2017.