Generative models : VAE and GANs
Jinhwan Suk
Department of Mathematical Science, KAIST
May 7, 2020
Contents
Introduction to Information Theory
What is a Generative Model?
Example 1 : VAE
Example 2 : GANs
Introduction to Information Theory
Introduction
Information theory is a branch of applied mathematics.
Originally proposed by Claude Shannon in 1948.
A key measure in information theory is entropy.
Basic Intuition
Learning that an unlikely event has occurred is more informative than
learning that a likely event has occurred.
• Message 1: "the sun rose this morning"
• Message 2: "there was a solar eclipse this morning"
Message 2 is much more informative than Message 1.
Introduction to Information Theory
Formalization of Intuitions
Likely events should have low information content; events that are
guaranteed to happen should have no information content whatsoever.
Less likely events should have higher information content.
Independent events should have additive information, e.g., a tossed
coin coming up heads twice.
Properties of the information function I(x) = I_X(x)
I(x) is a function of P(x).
I(x) decreases as P(x) increases.
I(x) = 0 if P(x) = 1.
I(X1 = x1, X2 = x2) = I(X1 = x1) + I(X2 = x2) for independent X1, X2.
Introduction to Information Theory
Formalization of Intuitions
Let X1 and X2 be independent random variables with
P(X1 = x1) = p1 and P(X2 = x2) = p2. Then we have
\[
I(X_1 = x_1, X_2 = x_2) = I(P(x_1, x_2)) = I(P(X_1 = x_1)\,P(X_2 = x_2)) = I(p_1 p_2) = I(p_1) + I(p_2).
\]
So, viewed as a function of the probability, I satisfies I(p1 p2) = I(p1) + I(p2); the continuous, decreasing solutions of this functional equation are exactly I(p) = k log p for some k < 0.
Introduction to Information Theory
Measure of Information
Definition (Self-Information)
The self-information of an event X = x is
I(x) = − log P(x).
Self-information measures the information (or uncertainty, surprise) of a
single event.
Definition (Shannon entropy)
Shannon entropy is the expected amount of information in an entire
probability distribution, defined by
\[
H(X) = \mathbb{E}_{X\sim P}[I(X)] = -\mathbb{E}_{X\sim P}[\log P(X)].
\]
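As a quick numerical illustration (a minimal sketch, not part of the slides), self-information and Shannon entropy for a discrete distribution:

```python
import math

def self_information(p):
    """Self-information I(x) = -log P(x), in nats."""
    return -math.log(p)

def entropy(dist):
    """Shannon entropy H(X) = -sum_x P(x) log P(x), in nats."""
    return sum(-p * math.log(p) for p in dist if p > 0)

print(self_information(1.0))   # 0.0: a certain event carries no information
print(entropy([0.5, 0.5]))     # log 2 ≈ 0.693: a fair coin
print(entropy([0.99, 0.01]))   # ≈ 0.056: a nearly deterministic coin
```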
Introduction to Information Theory
Density Estimation
In a classification problem, we usually want to describe P(Y|X) for
each input X.
Many models c_θ aim to estimate this conditional probability
distribution by choosing an optimal θ̂ such that
\[
c_{\hat\theta}(x)[i] \approx P(Y = y_i \mid X = x),
\]
as in a softmax classifier or a logistic regressor.
So we can regard the classification problem as a regression problem
that minimizes
\[
R(c_\theta) = \mathbb{E}_X[L(c_\theta(X), P(Y \mid X))],
\]
where L measures the distance between two probability distributions.
Introduction to Information Theory
Two ways of measuring distance between probability distributions
Definition (Total variation)
The total variation distance between two probability measures P_θ and
P_{θ*} is defined by
\[
TV(P_\theta, P_{\theta^*}) = \max_{A:\,\text{events}} |P_\theta(A) - P_{\theta^*}(A)|.
\]
Definition (Kullback-Leibler divergence)
The KL divergence between two probability measures P_θ and P_{θ*} is
defined by
\[
D_{KL}(P_\theta \,\|\, P_{\theta^*}) = \mathbb{E}_{X\sim P_\theta}[\log P_\theta(X) - \log P_{\theta^*}(X)].
\]
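A small numerical sketch of both distances (assuming discrete distributions given as probability vectors; for discrete distributions the maximum over events equals half the L1 distance):

```python
import math

def tv(p, q):
    """Total variation: max_A |P(A) - Q(A)| = (1/2) sum_x |p(x) - q(x)|."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

def kl(p, q):
    """KL divergence D_KL(P || Q) = sum_x p(x) (log p(x) - log q(x))."""
    return sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
print(tv(p, q))            # 0.1
print(kl(p, q), kl(q, p))  # KL is asymmetric, unlike TV
```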
Introduction to Information Theory
Cross-Entropy
We usually use the KL divergence because finding an estimator of θ is
much easier with it.
\[
\begin{aligned}
D_{KL}(P_\theta \,\|\, P_{\theta^*}) &= \mathbb{E}_{X\sim P_\theta}[\log P_\theta(X) - \log P_{\theta^*}(X)]\\
&= \mathbb{E}_{X\sim P_\theta}[\log P_\theta(X)] - \mathbb{E}_{X\sim P_\theta}[\log P_{\theta^*}(X)]\\
&= \text{constant} - \mathbb{E}_{X\sim P_\theta}[\log P_{\theta^*}(X)]
\end{aligned}
\]
Hence, minimizing the KL divergence is equivalent to minimizing
−E_{X∼P_θ}[log P_{θ*}(X)], which is called the cross-entropy. Estimation
with the estimator that minimizes the KL divergence (or cross-entropy) is
called the maximum likelihood principle.
Introduction to Information Theory
Maximum Likelihood Estimation
P_{θ*} is the distribution of the population, and we want to choose a proper
estimator θ̂ by minimizing the distance between P_{θ*} and P_{θ̂}:
\[
D_{KL}(P_{\theta^*} \,\|\, P_{\hat\theta}) = \text{const} - \mathbb{E}_{X\sim P_{\theta^*}}[\log P_{\hat\theta}(X)]
\]
If X_1, X_2, \dots, X_n are random samples, then by the LLN,
\[
\mathbb{E}_{X\sim P_{\theta^*}}[\log P_{\hat\theta}(X)] \approx \frac{1}{n}\sum_{i=1}^{n}\log P_{\hat\theta}(X_i)
\]
\[
\therefore\; D_{KL}(P_{\theta^*} \,\|\, P_{\hat\theta}) \approx \text{const} - \frac{1}{n}\sum_{i=1}^{n}\log P_{\hat\theta}(X_i)
\]
Introduction to Information Theory
Maximum Likelihood Estimation
\[
\begin{aligned}
\min_{\theta\in\Theta} D_{KL}(P_{\theta^*} \,\|\, P_\theta)
&\iff \min_{\theta\in\Theta} -\frac{1}{n}\sum_{i=1}^{n}\log P_\theta(X_i)\\
&\iff \max_{\theta\in\Theta} \frac{1}{n}\sum_{i=1}^{n}\log P_\theta(X_i)\\
&\iff \max_{\theta\in\Theta} \sum_{i=1}^{n}\log P_\theta(X_i)\\
&\iff \max_{\theta\in\Theta} \prod_{i=1}^{n} P_\theta(X_i)
\end{aligned}
\]
This is the maximum likelihood principle.
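As a toy illustration of the principle (a sketch assuming a Bernoulli(θ) model; the helper name avg_log_lik is made up for this example), the θ maximizing the average log-likelihood matches the sample mean:

```python
import math
import random

random.seed(0)
theta_true = 0.7
xs = [1 if random.random() < theta_true else 0 for _ in range(1000)]

def avg_log_lik(theta, xs):
    """(1/n) * sum_i log P_theta(X_i) for a Bernoulli(theta) model."""
    return sum(math.log(theta if x == 1 else 1 - theta) for x in xs) / len(xs)

# Grid-search the maximizer of the average log-likelihood.
grid = [i / 100 for i in range(1, 100)]
theta_hat = max(grid, key=lambda t: avg_log_lik(t, xs))
print(theta_hat, sum(xs) / len(xs))  # both close to the true 0.7
```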
Introduction to Information Theory
Return to the main goal: find an estimator θ̂ that minimizes
R(c_θ) = E_X[L(c_θ(X), P(Y|X))].
Suppose that X_1, X_2, \dots, X_n are i.i.d. and cross-entropy is used for L. Then
\[
\begin{aligned}
\mathbb{E}_X[L(c_\theta(X), P(Y \mid X))]
&\approx \frac{1}{n}\sum_{i=1}^{n} L(c_\theta(X_i), P(Y \mid X_i))\\
&= \frac{1}{n}\sum_{i=1}^{n} -\mathbb{E}_{Y\mid X_i \sim P_{Y_{emp}\mid X_i}}[\log c_\theta(X_i)[Y]]\\
&= \frac{1}{n}\sum_{i=1}^{n} -\log\{c_\theta(X_i)[Y_{i,\text{true}}]\}.
\end{aligned}
\]
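A sketch of this empirical risk (the class probabilities below are hypothetical, not tied to any particular model): the risk is just the average negative log-probability the classifier assigns to the true label.

```python
import math

def cross_entropy_risk(probs, y_true):
    """(1/n) * sum_i -log c_theta(X_i)[Y_i,true]; probs[i] is the model's
    predicted class distribution for sample i, y_true[i] the true class index."""
    n = len(probs)
    return sum(-math.log(probs[i][y_true[i]]) for i in range(n)) / n

# Hypothetical predictions for three samples over three classes.
probs = [[0.7, 0.2, 0.1],
         [0.1, 0.8, 0.1],
         [0.3, 0.3, 0.4]]
y_true = [0, 1, 2]
print(cross_entropy_risk(probs, y_true))  # lower is better
```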
What is Generative Model?
Generative Model vs Discriminative model
A generative model is a statistical model of the joint distribution
P(X, Y) on X × Y.
A discriminative model is a model of the conditional probability of
the target given an observation x, P(Y|X = x).
In unsupervised learning, a generative model usually means a
statistical model of P(X).
How can we estimate the joint (conditional) distribution?
What do we obtain while estimating the probability distribution?
What can we do with a generative model?
Example of Discriminative Model
Simple Linear Regression
Assumption: P(y|x) = N(α + βx, σ²), where σ > 0 is known.
Concept of VAE
Goal: estimate the population distribution from the given observations.
Strong assumption on the existence of latent variables: Z ∼ N(0, I), with
X|Z ∼ N(f(Z; θ), σ²·I) for continuous data, or
X|Z ∼ Bernoulli(f(Z; θ)) for binary data.
Let P_emp be the empirical distribution (assumption: P_emp ≈ P_pop).
\[
\begin{aligned}
\arg\min_\theta D_{KL}(P_{emp}(X) \,\|\, P_\theta(X))
&= \arg\min_\theta\; \text{const} - \mathbb{E}_{X\sim P_{emp}}[\log P_\theta(X)]\\
&= \arg\max_\theta\; \mathbb{E}_{X\sim P_{emp}}[\log P_\theta(X)]\\
&= \arg\max_\theta\; \frac{1}{N}\sum_{i=1}^{N}\log P_\theta(X_i)
\end{aligned}
\]
Concept of VAE
Naive approach
Maximize P_θ(X_i) w.r.t. θ for each sample X_1, X_2, \dots, X_n.
=⇒ But P_θ(X_i) is intractable:
\[
P_\theta(X_i) = \int_Z P_\theta(X_i, z)\,dz = \int_Z P_\theta(X_i \mid z)\,P(z)\,dz \approx \frac{1}{n}\sum_{j=1}^{n} P_\theta(X_i \mid Z_j), \quad Z_j \sim P(Z).
\]
If n is large, the approximation is quite accurate.
But for efficiency, we look for some other way to keep n small.
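A minimal sketch of this naive estimator (assumptions: a 1-D latent variable, a toy decoder f, and z_j drawn from the prior N(0, 1)):

```python
import math
import random

random.seed(0)
sigma = 0.5

def f(z, theta):
    """Toy decoder f(z; theta), purely for illustration."""
    return theta * z

def gauss_pdf(x, mu, sd):
    return math.exp(-((x - mu) ** 2) / (2 * sd ** 2)) / math.sqrt(2 * math.pi * sd ** 2)

def p_theta_mc(x_i, theta, n):
    """Monte Carlo estimate of P_theta(x_i) = E_{z ~ N(0,1)}[P_theta(x_i | z)]."""
    zs = [random.gauss(0, 1) for _ in range(n)]
    return sum(gauss_pdf(x_i, f(z, theta), sigma) for z in zs) / n

# The estimate only stabilizes for large n -- the inefficiency noted above.
for n in (10, 100, 10000):
    print(n, p_theta_mc(1.0, theta=1.0, n=n))
```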
Concept of VAE
ELBO
Use the Monte Carlo method on
\[
P_\theta(X_i) = \int_Z P_\theta(X_i, z)\,dz = \int_Z P_\theta(z \mid X_i)\,P_\theta(X_i)\,dz \approx \frac{1}{n}\sum_{j=1}^{n} P_\theta(Z_j \mid X_i)\,P_\theta(X_i)
\]
Pick Z_j where P_θ(Z_j | X_i) is high ⇒ intractable.
Set Q_φ(Z|X) = N(µ_φ(X), σ_φ(X)²) to approximate P_θ(Z | X_i).
Concept of VAE
\[
\begin{aligned}
D_{KL}(Q_\phi(Z|X) \,\|\, P_\theta(Z|X))
&= \mathbb{E}_{Z\sim Q_\phi(\cdot|X)}[\log Q_\phi(Z|X) - \log P_\theta(Z|X)]\\
&= \mathbb{E}_{Z\sim Q_\phi(\cdot|X)}[\log Q_\phi(Z|X) - \log P_\theta(X, Z)] + \log P_\theta(X)
\end{aligned}
\]
We want to maximize log P_θ(X) and minimize
D_{KL}(Q_φ(Z|X) || P_θ(Z|X)) at once.
Define L(θ, φ, X) = E_{Z∼Q_φ(·|X)}[log P_θ(X, Z) − log Q_φ(Z|X)]. Then
\[
\log P_\theta(X) - D_{KL}(Q_\phi(Z|X) \,\|\, P_\theta(Z|X)) = \mathcal{L}(\theta, \phi, X).
\]
Concept of VAE
ELBO
\[
\begin{aligned}
\mathcal{L}(\theta, \phi, X) &= \mathbb{E}_{Z\sim Q_\phi(\cdot|X)}[\log P_\theta(X, Z) - \log Q_\phi(Z|X)]\\
&= \mathbb{E}_{Z\sim Q_\phi(\cdot|X)}[\log P_\theta(X|Z) + \log P_\theta(Z) - \log Q_\phi(Z|X)]\\
&= \mathbb{E}_{Z\sim Q_\phi(\cdot|X)}[\log P_\theta(X|Z)] - D_{KL}(Q_\phi(Z|X) \,\|\, P_\theta(Z))
\end{aligned}
\]
D_{KL}(Q_φ(Z|X) || P_θ(Z)) can be integrated analytically (Gaussian Q_φ
against the standard normal prior):
\[
D_{KL}(Q_\phi(Z|X) \,\|\, P_\theta(Z)) = -\frac{1}{2}\sum_{j}\left(1 + \log \sigma_{\phi,j}(X)^2 - \mu_{\phi,j}(X)^2 - \sigma_{\phi,j}(X)^2\right)
\]
E_{Z∼Q|X}[log P_θ(X|Z)] requires estimation by sampling:
\[
\mathbb{E}_{Z\sim Q_\phi(\cdot|X)}[\log P_\theta(X|Z)] \approx \frac{1}{n}\sum_{i=1}^{n}\log P_\theta(X \mid z_i) = \frac{1}{n}\sum_{i=1}^{n}\left[-\frac{(X - f(z_i; \theta))^2}{2\sigma^2} - \log\sqrt{2\pi\sigma^2}\right]
\]
Concept of VAE
ELBO
Maximizing L(θ, φ, X) is therefore equivalent to minimizing
\[
\frac{1}{n}\sum_{i=1}^{n}\frac{(X - f(z_i; \theta))^2}{2\sigma^2} \;-\; \frac{1}{2}\sum_{j}\left(1 + \log \sigma_{\phi,j}(X)^2 - \mu_{\phi,j}(X)^2 - \sigma_{\phi,j}(X)^2\right),
\]
a reconstruction error plus a KL regularizer on the approximate posterior.
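A minimal PyTorch sketch of this objective (the architecture sizes are assumptions, and z is drawn from Q_φ via the standard reparameterization z = µ + σ·ε, a detail not covered on the slide, so that the expectation stays differentiable):

```python
import torch
import torch.nn as nn

# Hypothetical sizes: 784-dim data, 20-dim latent, one z sample (n = 1), sigma = 1.
enc = nn.Sequential(nn.Linear(784, 200), nn.ReLU(), nn.Linear(200, 2 * 20))  # -> (mu, log sigma^2)
dec = nn.Sequential(nn.Linear(20, 200), nn.ReLU(), nn.Linear(200, 784))      # f(z; theta)

def neg_elbo(x):
    mu, logvar = enc(x).chunk(2, dim=1)
    z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)       # z ~ Q_phi(Z|X)
    recon = 0.5 * ((x - dec(z)) ** 2).sum(dim=1)                  # (X - f(z))^2 / (2 sigma^2)
    kl = -0.5 * (1 + logvar - mu ** 2 - logvar.exp()).sum(dim=1)  # closed-form KL term
    return (recon + kl).mean()

x = torch.rand(16, 784)  # a fake minibatch
loss = neg_elbo(x)
loss.backward()          # gradients flow through z thanks to reparameterization
```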
Concept of VAE
Problem of the above formulation
Since P_θ(X_i) ≈ (1/n) Σ_{j=1}^{n} P_θ(X_i | z_j) and we use n = 1,
\[
\begin{aligned}
\log P_\theta(X_i) &\approx \log P_\theta(X_i \mid z_1)\\
&= \log \frac{1}{\sqrt{2\pi\sigma^2}}\exp\!\left(-\frac{(X_i - f(z_1; \theta))^2}{2\sigma^2}\right)\\
&= -\frac{(X_i - f(z_1; \theta))^2}{2\sigma^2} + \text{const}.
\end{aligned}
\]
Therefore, maximizing log P_θ(X_i) reduces to minimizing the squared
reconstruction error (X_i − f(z_1; θ))²/(2σ²) for a single randomly drawn z_1.
Concept of VAE
Problem of the above formulation
To address this problem, we should set σ very small.
Concept of GANs
Introduction
Goal: estimate the population distribution from the given observations.
Strong assumption on the existence of latent variables: Z ∼ P_Z.
Define G(z; θ_g), a mapping into data space, which induces
P_g(X = x) = P_Z(G(Z) = x).
Define D(x; θ_d), which represents the probability that x is real.
\[
\min_G \max_D V(D, G) = \mathbb{E}_{x\sim P_{emp}}[\log D(x)] + \mathbb{E}_{z\sim P_Z}[\log(1 - D(G(z)))]
\]
What is the difference between VAEs and GANs?
⇒ GANs do not formulate P(X) explicitly.
⇒ But we can show the game has a global optimum at P_g = P_emp.
⇒ So we can say that GANs are generative models.
Concept of GANs
Algorithm
\[
V(D, G) = \mathbb{E}_{x\sim P_{emp}}[\log D(x)] + \mathbb{E}_{z\sim P_Z}[\log(1 - D(G(z)))] \approx \frac{1}{m}\sum_{i=1}^{m}\log D(x_i) + \frac{1}{m}\sum_{j=1}^{m}\log(1 - D(G(z_j)))
\]
1. Sample a minibatch of m noise samples and a minibatch of m examples.
2. Update the discriminator by ascending its stochastic gradient:
\[
\nabla_{\theta_d}\,\frac{1}{m}\sum_{i=1}^{m}\left[\log D(x_i) + \log(1 - D(G(z_i)))\right]
\]
3. Sample a minibatch of m noise samples.
4. Update the generator by descending its stochastic gradient:
\[
\nabla_{\theta_g}\,\frac{1}{m}\sum_{i=1}^{m}\log(1 - D(G(z_i)))
\]
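A minimal PyTorch sketch of this training loop (the toy 1-D data, network sizes, and learning rates are all assumptions; the eps term is added only for numerical stability when D saturates):

```python
import torch
import torch.nn as nn

# Toy setup: 1-D "real" data from N(2, 0.5^2), 8-dim noise, tiny MLPs, m = 64.
G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))
D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
m, eps = 64, 1e-8

for step in range(1000):
    # Steps 1-2: ascend the discriminator objective (descend its negation).
    x = torch.randn(m, 1) * 0.5 + 2.0
    z = torch.randn(m, 8)
    d_loss = -(torch.log(D(x) + eps) + torch.log(1 - D(G(z)) + eps)).mean()
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Steps 3-4: descend the generator objective.
    z = torch.randn(m, 8)
    g_loss = torch.log(1 - D(G(z)) + eps).mean()
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```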
Concept of GANs
Global optimality of Pg = Pemp
Proposition 1
For G fixed, the optimal discriminator D is
\[
D^*_G(x) = \frac{P_{emp}(x)}{P_{emp}(x) + P_g(x)}.
\]
Proposition 2
The global minimum of the virtual training criterion is achieved if and only
if Pg = Pemp.
Proof of Proposition 1:
Let the generator G be fixed and define A_x = {z ∈ Z : G(z) = x}.
\[
\begin{aligned}
V(G, D_\theta) &= \int_{x\in X} \log(D_\theta(x))\,P_{emp}(x)\,dx + \int_{z\in Z} \log(1 - D_\theta(G(z)))\,P_Z(z)\,dz\\
&= \int_{x\in X} \log(D_\theta(x))\,P_{emp}(x)\,dx + \int_{x\in X}\int_{z\in A_x} \log(1 - D_\theta(G(z)))\,P_Z(z)\,dz\,dx\\
&= \int_{x\in X} \log(D_\theta(x))\,P_{emp}(x)\,dx + \int_{x\in X} \log(1 - D_\theta(x)) \int_{z\in A_x} P_Z(z)\,dz\,dx\\
&= \int_{x\in X} \log(D_\theta(x))\,P_{emp}(x)\,dx + \int_{x\in X} \log(1 - D_\theta(x))\,P_g(x)\,dx\\
&= \int_{x\in X} \Big[\log(D_\theta(x))\,P_{emp}(x) + \log(1 - D_\theta(x))\,P_g(x)\Big]\,dx
\end{aligned}
\]
Proof of Proposition 1 (continued):
V(G, D_θ) achieves its maximum over D when
\[
\frac{\partial}{\partial\theta}\Big[\log(D_\theta(x))\,P_{emp}(x) + \log(1 - D_\theta(x))\,P_g(x)\Big] = 0 \quad \forall x \in X
\]
\[
\iff \frac{\partial_\theta D_\theta(x)}{D_\theta(x)}\,P_{emp}(x) - \frac{\partial_\theta D_\theta(x)}{1 - D_\theta(x)}\,P_g(x) = 0 \quad \forall x \in X
\]
\[
\iff D_{\hat\theta}(x) = \frac{P_{emp}(x)}{P_{emp}(x) + P_g(x)} \quad \forall x \in X
\]
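A quick numeric sanity check of this pointwise optimum (the densities a, b below are hypothetical values of P_emp(x) and P_g(x) at a fixed x): the function a·log(y) + b·log(1 − y) is maximized over y ∈ (0, 1) at y = a/(a + b).

```python
import math

# Hypothetical densities at a fixed x: a = P_emp(x), b = P_g(x).
a, b = 0.3, 0.7
grid = [i / 1000 for i in range(1, 1000)]
y_star = max(grid, key=lambda y: a * math.log(y) + b * math.log(1 - y))
print(y_star, a / (a + b))  # both ≈ 0.3
```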
Proof of Proposition 2:
\[
\min_G \max_D V(G, D) = \min_G V(G, D^*_G),
\]
and, for the optimal discriminator,
\[
\begin{aligned}
V(G, D^*_G) &= \mathbb{E}_{x\sim P_{emp}}[\log D^*_G(x)] + \mathbb{E}_{z\sim P_Z}[\log(1 - D^*_G(G(z)))]\\
&= \mathbb{E}_{x\sim P_{emp}}[\log D^*_G(x)] + \mathbb{E}_{x\sim P_g}[\log(1 - D^*_G(x))]\\
&= \mathbb{E}_{x\sim P_{emp}}\left[\log \frac{P_{emp}(x)}{P_{emp}(x) + P_g(x)}\right] + \mathbb{E}_{x\sim P_g}\left[\log \frac{P_g(x)}{P_{emp}(x) + P_g(x)}\right]\\
&= \mathbb{E}_{x\sim P_{emp}}\left[\log P_{emp}(x) - \log \frac{P_{emp}(x) + P_g(x)}{2}\right] - \log 2\\
&\quad + \mathbb{E}_{x\sim P_g}\left[\log P_g(x) - \log \frac{P_{emp}(x) + P_g(x)}{2}\right] - \log 2\\
&= D_{KL}\!\left(P_{emp} \,\Big\|\, \frac{P_{emp} + P_g}{2}\right) + D_{KL}\!\left(P_g \,\Big\|\, \frac{P_{emp} + P_g}{2}\right) - 2\log 2\\
&\ge -2\log 2.
\end{aligned}
\]
Equality holds if and only if P_emp = (P_emp + P_g)/2 and
P_g = (P_emp + P_g)/2, i.e., if and only if P_g = P_emp. ∎
Thank you