Metrics for distributions and their applications for generative models (part 1)
Dai Hai Nguyen
Kyoto University
Learning generative models?
Given a model distribution $P_\theta$ and the empirical distribution of the data $Q = \frac{1}{n}\sum_{i=1}^{n} \delta_{x_i}$, how do we measure $\mathrm{Distance}(P_\theta, Q)$?
Learning generative models?
• Maximum Likelihood Estimation (MLE): given training samples $x_1, x_2, \ldots, x_n$, learn a model $p_{model}(x;\theta)$ from which the training samples are likely to have been generated:
$\theta^* = \mathrm{argmax}_\theta \sum_{i=1}^{n} \log p_{model}(x_i; \theta)$
Learning generative models?
• Likelihood-free model: a neural network generator maps random input $z \sim \mathrm{Uniform}$ to an output sample.
How to measure similarity between $p$ and $q$?
§ Kullback-Leibler (KL) divergence: asymmetric, i.e., $D_{KL}(p\|q) \neq D_{KL}(q\|p)$
$D_{KL}(p\|q) = \int p(x) \log \frac{p(x)}{q(x)} \, dx$
§ Jensen-Shannon (JS) divergence: symmetric
$D_{JS}(p\|q) = \frac{1}{2} D_{KL}\!\left(p \,\Big\|\, \frac{p+q}{2}\right) + \frac{1}{2} D_{KL}\!\left(q \,\Big\|\, \frac{p+q}{2}\right)$
§ Optimal transport (OT):
$\mathcal{W}_c(p, q) = \inf_{\gamma \in \Pi(p,q)} E_{(x,y)\sim\gamma}[\|x - y\|]$
where $\Pi(p, q)$ is the set of all joint distributions of $(X, Y)$ with marginals $p$ and $q$.
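For discrete distributions these measures are straightforward to evaluate. A minimal sketch in Python (NumPy assumed; the function names are illustrative), showing the asymmetry of KL and the symmetry of JS:

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for discrete distributions given as probability vectors."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def js_divergence(p, q):
    """Jensen-Shannon divergence: average of two KL terms to the mixture."""
    m = 0.5 * (np.asarray(p, dtype=float) + np.asarray(q, dtype=float))
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

p = np.array([0.5, 0.5])
q = np.array([0.9, 0.1])
print(kl_divergence(p, q), kl_divergence(q, p))  # two different values
print(js_divergence(p, q), js_divergence(q, p))  # two equal values
```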
Many fundamental problems can be cast as quantifying similarity between two distributions
§ Maximum likelihood estimation (MLE) is equivalent to minimizing KL divergence.
Suppose we draw $N$ samples $x \sim p(x|\theta^*)$. The MLE of $\theta$ is
$\hat{\theta} = \mathrm{argmin}_\theta \; -\frac{1}{N} \sum_{i=1}^{N} \log p(x_i|\theta) \approx \mathrm{argmin}_\theta \; -E_{x\sim p(x|\theta^*)}[\log p(x|\theta)]$
By definition of the KL divergence:
$D_{KL}(p(x|\theta^*) \| p(x|\theta)) = E_{x\sim p(x|\theta^*)}\!\left[\log \frac{p(x|\theta^*)}{p(x|\theta)}\right] = E_{x\sim p(x|\theta^*)}[\log p(x|\theta^*)] - E_{x\sim p(x|\theta^*)}[\log p(x|\theta)]$
The first term does not depend on $\theta$ and the second is the MLE objective, so maximizing the likelihood minimizes the KL divergence to the generating distribution.
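The equivalence can be checked numerically on a toy Bernoulli model; a sketch assuming NumPy, where the MLE (the sample mean) coincides with the $\theta$ that minimizes the KL divergence over a grid:

```python
import numpy as np

rng = np.random.default_rng(0)
theta_true = 0.7
x = rng.random(100_000) < theta_true  # samples from Bernoulli(theta_true)

# The MLE for a Bernoulli parameter is simply the sample mean.
theta_mle = x.mean()

def kl_bernoulli(a, b):
    """KL divergence between Bernoulli(a) and Bernoulli(b)."""
    return a * np.log(a / b) + (1 - a) * np.log((1 - a) / (1 - b))

# Minimize KL(Bernoulli(theta_mle) || Bernoulli(theta)) over a grid of theta.
grid = np.linspace(0.01, 0.99, 99)
theta_kl = grid[np.argmin(kl_bernoulli(theta_mle, grid))]
print(theta_mle, theta_kl)  # both close to 0.7
```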
Training GAN is equivalent to minimizing JS divergence
§ GAN has two networks, D and G, which play a minimax game:
$\min_G \max_D L(D, G) = E_{x\sim q(x)}[\log D(x)] + E_{z\sim r(z)}[\log(1 - D(G(z)))] = E_{x\sim q(x)}[\log D(x)] + E_{x\sim p(x)}[\log(1 - D(x))]$
where $p(x)$ and $q(x)$ are the distributions of fake images and real images, respectively.
§ Fixing G, the optimal D is easily obtained:
$D(x) = \frac{q(x)}{p(x) + q(x)}$
Training GAN is equivalent to minimizing JS divergence
§ GAN has two networks, D and G, which play a minimax game:
$\min_G \max_D L(D, G) = E_{x\sim q(x)}[\log D(x)] + E_{x\sim p(x)}[\log(1 - D(x))]$
where $p(x)$ and $q(x)$ are the distributions of fake and real images, respectively.
§ Fixing G, the optimal D is $D(x) = \frac{q(x)}{p(x) + q(x)}$, and substituting it back:
$L(D^*, G) = \int q(x) \log \frac{q(x)}{p(x) + q(x)} \, dx + \int p(x) \log \frac{p(x)}{p(x) + q(x)} \, dx = 2 D_{JS}(p\|q) - \log 4$
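The identity $L(D^*, G) = 2 D_{JS}(p\|q) - \log 4$ can be verified numerically on discrete distributions; a sketch assuming NumPy:

```python
import numpy as np

p = np.array([0.2, 0.5, 0.3])  # "fake" distribution
q = np.array([0.4, 0.4, 0.2])  # "real" distribution

# Optimal discriminator for a fixed generator.
D_star = q / (p + q)

# Value of the inner game at the optimal discriminator.
L = np.sum(q * np.log(D_star)) + np.sum(p * np.log(1 - D_star))

# Jensen-Shannon divergence via its mixture definition.
m = 0.5 * (p + q)
kl = lambda a, b: float(np.sum(a * np.log(a / b)))
js = 0.5 * kl(p, m) + 0.5 * kl(q, m)
print(L, 2 * js - np.log(4))  # identical up to rounding
```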
f-divergences
• Divergence between two distributions:
$D_f(q\|p) = \int p(x) \, f\!\left(\frac{q(x)}{p(x)}\right) dx$
• $f$: generator function, convex with $f(1) = 0$
• Every convex function $f$ has a convex conjugate $f^*$ such that:
$f(x) = \sup_{y \in \mathrm{dom}(f^*)} \{xy - f^*(y)\}$
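The conjugate can be checked by brute force on a grid. For example, the KL generator $f(t) = t\log t$ has conjugate $f^*(y) = e^{y-1}$; a sketch assuming NumPy (grid bounds chosen to contain the maximizer $t = e^{y-1}$ for the tested $y$):

```python
import numpy as np

def f(t):
    """KL generator f(t) = t log t."""
    return t * np.log(t)

ts = np.linspace(1e-6, 10.0, 1_000_000)  # grid over dom(f)

def f_star_numeric(y):
    """Brute-force conjugate: sup_t { t*y - f(t) } over the grid."""
    return float(np.max(ts * y - f(ts)))

for y in [0.0, 0.5, 1.0]:
    print(f_star_numeric(y), np.exp(y - 1.0))  # agree closely
```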
f-divergences
• Different generator functions $f$ give different divergences (e.g., $f(t) = t\log t$ yields the KL divergence).
Estimating f-divergences from samples
Conjugate function of $f$: $f^*(x) = \sup_{t \in \mathrm{dom}(f)} \{tx - f(t)\}$, with some properties:
• $f(x) = \sup_{t \in \mathrm{dom}(f^*)} \{tx - f^*(t)\}$
• $f^{**}(x) = f(x)$
• $f^*(x)$ is always convex
Using the conjugate, the divergence admits a variational lower bound:
$D_f(q\|p) = \int p(x) \, f\!\left(\frac{q(x)}{p(x)}\right) dx = \int p(x) \sup_{t \in \mathrm{dom}(f^*)} \left\{ t \, \frac{q(x)}{p(x)} - f^*(t) \right\} dx$
$\geq \sup_{T \in \mathcal{T}} \left\{ \int q(x) \, T(x) \, dx - \int p(x) \, f^*(T(x)) \, dx \right\} = \sup_{T \in \mathcal{T}} \left\{ E_{x\sim Q}[T(x)] - E_{x\sim P}[f^*(T(x))] \right\}$
The first expectation is estimated with samples from $Q$, the second with samples from $P$.
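A Monte Carlo sketch of this lower bound for two Gaussians, assuming NumPy. For $f(t) = t\log t$ the optimal critic is $T(x) = 1 + \log\frac{q(x)}{p(x)}$, so the bound is tight and recovers $D_{KL}(q\|p) = 0.5$ for $q = \mathcal{N}(1,1)$, $p = \mathcal{N}(0,1)$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

xs_q = rng.normal(1.0, 1.0, n)  # samples from Q = N(1, 1)
xs_p = rng.normal(0.0, 1.0, n)  # samples from P = N(0, 1)

log_ratio = lambda x: x - 0.5        # log q(x)/p(x) for these Gaussians
T = lambda x: 1.0 + log_ratio(x)     # optimal critic for f(t) = t log t
f_star = lambda y: np.exp(y - 1.0)   # conjugate of t log t

# Variational lower bound, estimated from the two sample sets.
lower_bound = np.mean(T(xs_q)) - np.mean(f_star(T(xs_p)))
print(lower_bound)  # close to 0.5 = KL(q || p)
```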
Training f-divergence GAN
• f-GAN: parameterize the critic $T_w$ and the generator distribution $P_\theta$, then optimize
$\min_\theta \max_w F(\theta, w) = E_{x\sim Q}[T_w(x)] - E_{x\sim P_\theta}[f^*(T_w(x))]$
Ref: f-GAN: Training Generative Neural Samplers using Variational Divergence Minimization, NIPS 2016
Turns out: GAN is a specific case of f-divergence
• GAN:
$\min_\theta \max_w E_{x\sim Q}[\log D_w(x)] + E_{x\sim P_\theta}[\log(1 - D_w(x))]$
• f-GAN:
$\min_\theta \max_w E_{x\sim Q}[T_w(x)] - E_{x\sim P_\theta}[f^*(T_w(x))]$
By choosing suitable $T$ and $f$, f-GAN turns into the original GAN.
1-Wasserstein distance (another option)
§ It seeks a probabilistic coupling $\gamma$:
$W_1 = \min_{\gamma \in \mathbb{P}} \int_{\mathcal{X}\times\mathcal{Y}} c(x, y) \, \gamma(x, y) \, dx \, dy = E_{(x,y)\sim\gamma}[c(x, y)]$
where $\mathbb{P} = \{\gamma \geq 0, \; \int \gamma(x, y) \, dy = p(x), \; \int \gamma(x, y) \, dx = q(y)\}$ and $c(x, y)$ is the displacement cost from $x$ to $y$ (e.g., the Euclidean distance).
§ a.k.a. the Earth Mover's distance
§ Can be formulated as a linear program (convex)
Kantorovich's formulation of OT
§ In the case of discrete inputs:
$p = \sum_{i=1}^{m} a_i \delta_{x_i}, \qquad q = \sum_{j=1}^{n} b_j \delta_{y_j}$
§ Couplings:
$\mathbb{P} = \{P \geq 0, \; P \in \mathbb{R}^{m\times n}, \; P 1_n = a, \; P^\top 1_m = b\}$
§ LP problem: find $P$
$P = \mathrm{argmin}_{P \in \mathbb{P}} \langle P, C \rangle$
where $C$ is the cost matrix, i.e., $C_{ij} = c(x_i, y_j)$.
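This LP can be solved directly with an off-the-shelf solver; a sketch assuming SciPy's `linprog`, on a small 1-D example (the support points and weights are illustrative) whose optimal value, 0.75, can be checked by hand from the two CDFs:

```python
import numpy as np
from scipy.optimize import linprog

# Discrete OT as an LP: minimize <P, C> s.t. P 1 = a, P^T 1 = b, P >= 0.
a = np.array([0.5, 0.5])            # source weights (m = 2)
b = np.array([0.25, 0.25, 0.5])     # target weights (n = 3)
x = np.array([0.0, 1.0])            # source support points
y = np.array([0.0, 1.0, 2.0])       # target support points
C = np.abs(x[:, None] - y[None, :])  # cost matrix C_ij = |x_i - y_j|

m, n = C.shape
# Marginal constraints on vec(P) (row-major): row sums = a, column sums = b.
A_eq = np.zeros((m + n, m * n))
for i in range(m):
    A_eq[i, i * n:(i + 1) * n] = 1.0  # sum_j P_ij = a_i
for j in range(n):
    A_eq[m + j, j::n] = 1.0           # sum_i P_ij = b_j
b_eq = np.concatenate([a, b])

res = linprog(C.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
print(res.fun)  # the 1-Wasserstein distance between p and q (0.75 here)
```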
Why is OT better than KL and JS divergences?
§ OT provides a smooth measure, which is more useful than KL and JS when the supports of the two distributions do not overlap.
§ Example: for two distributions with disjoint supports, KL is infinite and JS saturates at $\log 2$, while OT varies smoothly with the distance between the supports. (figure omitted)
How to apply the 1-Wasserstein distance to GAN?
$\mathcal{W}_1(p, q) = \inf_{\gamma \in \Pi(p,q)} E_{(x,y)\sim\gamma}[\|x - y\|] = \inf_\gamma \langle C, \gamma \rangle$
s.t. $\sum_{j=1}^{n} \gamma_{ij} = p_i, \; i = 1,\ldots,m$ and $\sum_{i=1}^{m} \gamma_{ij} = q_j, \; j = 1,\ldots,n$
This is a linear program, so we can pass to its dual:
Primal: $\min c^\top x$ s.t. $Ax = b, \; x \geq 0$, with $c = \mathrm{vec}(C) \in \mathbb{R}^{mn}$, $x = \mathrm{vec}(\gamma) \in \mathbb{R}^{mn}$, $b = [p^\top, q^\top]^\top \in \mathbb{R}^{m+n}$
Dual: $\max b^\top y$ s.t. $A^\top y \leq c$, i.e., with $y = [f^\top, g^\top]^\top$:
$\max f^\top p + g^\top q \quad \text{s.t.} \quad f_i + g_j \leq C_{ij}, \; i = 1,\ldots,m; \; j = 1,\ldots,n$
At the optimum $f_i = -g_i$, so $|f_i - f_j| \leq |x_i - y_j|$, i.e., $f$ is 1-Lipschitz:
$\mathcal{W}_1(p, q) = \sup_{\|f\|_L \leq 1} E_{x\sim p}[f(x)] - E_{x\sim q}[f(x)]$
(Kantorovich-Rubinstein duality)
Training WGAN
In WGAN, replace the discriminator with a 1-Lipschitz critic $f$ and minimize the 1-Wasserstein distance:
$\min_\theta \mathcal{W}_1(p, q_\theta) = \min_\theta \sup_{\|f_w\|_L \leq 1} E_{x\sim p}[f_w(x)] - E_{z\sim r(z)}[f_w(g_\theta(z))]$
Training alternates between finding $w$ (the critic step) and updating $\theta$ (the generator step).
Ref: Wasserstein GAN, ICML 2017
Thank you for listening
