1. Introduction to Deep Generative Models
Herman Dong
Music and Audio Computing Lab (MACLab),
Research Center for Information Technology Innovation,
Academia Sinica
2. MuseGAN
Learn about our recent work on using GANs to compose pop songs at https://salu133445.github.io/musegan/
Hao-Wen Dong, Wen-Yi Hsiao, Li-Chia Yang and Yi-Hsuan Yang. 2017. MuseGAN: Symbolic-domain Music Generation and Accompaniment with Multi-track Sequential Generative Adversarial Networks. arXiv preprint arXiv:1709.06298.
3. Outline
• Brief introduction to deep generative models
• AE (Autoencoder)
• VAE (Variational Autoencoder)
• GAN (Generative Adversarial Network)
• AAE (Adversarial Autoencoder)
• VAE/GAN
• ADA (Adversarial Domain Adaptation)
• Reformulation
• Graphical model representation
• Connection to the Wake-sleep Algorithm
7–12. VAE (Variational Autoencoder)
• minimize reconstruction loss
• minimize distance between encoded latent distribution and prior distribution
[Diagram: encoder Q maps data X to a latent distribution Q(X) ≈ prior p(z); the KL divergence between Q(X) and p(z) is one loss term. The encoder outputs μ and σ, and the latent code is sampled as z = μ + σ ∙ ε with ε drawn from a standard normal. Decoder P reconstructs the data from z, giving the reconstruction loss. At testing (generation) time, z is sampled from the prior and decoded by P.]
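Both loss terms on this slide have a simple closed form for a diagonal-Gaussian encoder and a standard-normal prior. A minimal NumPy sketch of the reparameterization trick z = μ + σ ∙ ε and the two losses (function names are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var):
    """z = mu + sigma * eps keeps sampling differentiable w.r.t. (mu, log_var)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def vae_losses(x, x_recon, mu, log_var):
    """Reconstruction loss + KL(Q(z|X) || p(z)) with p(z) = N(0, I)."""
    recon = np.sum((x - x_recon) ** 2)                           # pixel-wise reconstruction
    kl = 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)   # closed-form Gaussian KL
    return recon, kl

mu, log_var = np.zeros(4), np.zeros(4)   # toy encoder output for one example
z = reparameterize(mu, log_var)
recon, kl = vae_losses(np.ones(4), np.ones(4), mu, log_var)
# KL is exactly 0 when Q(z|X) already equals the prior N(0, I)
```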
13–17. GAN (Generative Adversarial Network)
• minimize distance between the distribution of real data and generated samples
[Diagram: generator G maps z ~ p(z) to G(z); discriminator D outputs 1/0 given real data X or generated data G(z). Adversarial training: D maximizes log(D(X)) + log(1 − D(G(z))) to distinguish G(z) being fake from X being real, while G minimizes log(1 − D(G(z))) to make G(z) indistinguishable from real data for D. At testing (generation) time, only G is used.]
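The adversarial objectives above can be written down directly. A small NumPy sketch (the non-saturating generator loss −log D(G(z)) is the variant commonly used in practice; names are illustrative):

```python
import numpy as np

def discriminator_loss(d_real, d_fake):
    """D maximizes log D(X) + log(1 - D(G(z))); we minimize the negative."""
    return -np.mean(np.log(d_real) + np.log(1.0 - d_fake))

def generator_loss(d_fake):
    """Non-saturating form: G maximizes log D(G(z)) instead of
    minimizing log(1 - D(G(z)))."""
    return -np.mean(np.log(d_fake))

# A perfect discriminator (D(X)=1 on real, D(G(z))=0 on fake) gets zero loss ...
d_loss = discriminator_loss(np.array([1.0]), np.array([0.0]))
# ... while an undecided one (0.5 everywhere) pays 2*log 2
d_loss_chance = discriminator_loss(np.array([0.5]), np.array([0.5]))
g_loss = generator_loss(np.array([0.5]))
```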
18–19. GAN vs VAE
• GAN
• Generator aims to fool the discriminator
• Discriminator aims to distinguish generated data from real data
• output images are sharper
• higher diversity, lower stability
• VAE
• Objective: reconstruct real data
• uses a pixel-wise loss
• output images are more blurred
• lower diversity, higher stability
A. B. L. Larsen, S. K. Sønderby, and O. Winther. Autoencoding beyond pixels using a learned similarity metric. arXiv preprint arXiv:1512.09300, 2015.
20–22. AAE (Adversarial Autoencoder)
• minimize reconstruction loss
• minimize distance between encoded latent distribution and prior distribution
[Diagram: as in the VAE, encoder Q maps data X to latent codes Q(X) ≈ p(z), and decoder P minimizes the reconstruction loss. The KL-divergence term is replaced by adversarial training on the latent space: a discriminator D (1/0) distinguishes Q(X) being fake from samples of the prior p(z) being real, while Q tries to make Q(X) indistinguishable from p(z) for D.]
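The key change from the VAE is where the discriminator looks: it sees latent codes, not images. A sketch of how an AAE training batch for D might be assembled (the encoder here is a hypothetical stand-in):

```python
import numpy as np

rng = np.random.default_rng(1)

def latent_discriminator_batch(encoder, x_batch, latent_dim):
    """AAE moves the adversarial game to the latent space: D sees encoder
    outputs Q(X) as 'fake' (label 0) and draws from the prior p(z) as
    'real' (label 1)."""
    fake = encoder(x_batch)                                  # Q(X)
    real = rng.standard_normal((len(x_batch), latent_dim))   # z ~ p(z) = N(0, I)
    codes = np.concatenate([real, fake])
    labels = np.concatenate([np.ones(len(real)), np.zeros(len(fake))])
    return codes, labels

toy_encoder = lambda xs: 0.1 * xs[:, :2]   # hypothetical 2-D encoder
codes, labels = latent_discriminator_batch(
    toy_encoder, rng.standard_normal((8, 5)), latent_dim=2)
```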
27–32. VAE/GAN
[Diagram: encoder Q maps data X to Q(X) ≈ p(z), trained with a KL-divergence loss 𝓛prior; generator/decoder G maps latent codes to G(z). A discriminator D (1/0) is trained adversarially: D distinguishes the prior samples G(z) and the reconstructions P(z) being fake from X being real, while G tries to make them indistinguishable from real data (𝓛GAN). The pixel-wise reconstruction loss is replaced by a reconstruction loss on a hidden representation of D (𝓛DIS). Total loss: 𝓛 = 𝓛prior + 𝓛DIS + 𝓛GAN.]
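The three terms of the VAE/GAN objective combine additively; the distinctive one is 𝓛DIS, measured on discriminator features rather than pixels. A minimal sketch (names illustrative):

```python
import numpy as np

def feature_reconstruction_loss(d_feat_real, d_feat_recon):
    """L_DIS: reconstruction error measured on a hidden representation of D,
    replacing the pixel-wise loss of the plain VAE."""
    return np.mean((d_feat_real - d_feat_recon) ** 2)

def vaegan_loss(l_prior, l_dis, l_gan):
    """Total VAE/GAN objective from the slide: L = L_prior + L_DIS + L_GAN."""
    return l_prior + l_dis + l_gan

# toy discriminator features of X and of its reconstruction
l_dis = feature_reconstruction_loss(np.array([1.0, 2.0]), np.array([1.0, 0.0]))
total = vaegan_loss(0.5, l_dis, 1.0)
```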
39–43. ADA (Adversarial Domain Adaptation)
• Goal: given labeled data in the source domain, classify unlabeled data in the target domain.
[Diagram: a feature extractor G maps source data Xsrc to features G(Xsrc), on which a classifier C predicts the class. Without adaptation, the target features G(Xtgt) are bad features for C. Adversarial training adds a discriminator D (1/0): D distinguishes G(Xtgt) as target domain from G(Xsrc) as source domain, while G tries to make G(Xtgt) and G(Xsrc) indistinguishable for D. At testing (classification) time, target features G(Xtgt) are fed to C.]
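One ADA training step therefore combines three losses: source-only classification, domain discrimination, and the confusion term that updates G. A toy sketch with hypothetical callables g, c, d standing in for the three networks:

```python
import numpy as np

def ada_losses(g, c, d, x_src, y_src, x_tgt):
    """One ADA step, as on the slides:
    - C is trained on labeled source features only,
    - D separates source features (real, 1) from target features (fake, 0),
    - G is rewarded when D mistakes target features for source ones."""
    f_src, f_tgt = g(x_src), g(x_tgt)
    probs = c(f_src)                                       # per-class probabilities
    cls_loss = -np.mean(np.log(probs[np.arange(len(y_src)), y_src]))
    dom_loss = -np.mean(np.log(d(f_src)) + np.log(1.0 - d(f_tgt)))
    conf_loss = -np.mean(np.log(d(f_tgt)))                 # G's adversarial objective
    return cls_loss, dom_loss, conf_loss

# undecided toy networks: identity features, uniform classifier, 0.5 discriminator
g = lambda x: x
c = lambda f: np.full((len(f), 2), 0.5)
d = lambda f: np.full(len(f), 0.5)
cls, dom, conf = ada_losses(g, c, d,
                            np.zeros((4, 3)), np.zeros(4, dtype=int),
                            np.zeros((4, 3)))
```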
44–48. On Unifying Deep Generative Models
• G_θ – θ are the parameters of the generator
• D_φ – φ are the parameters of the discriminator
• Solid line – generative process
• Dashed line – inference process
• Hollow arrow – deterministic transformation
• Red arrow – adversarial mechanism
• q_φ^r(y|x) denotes q_φ(y|x) and q_φ(1 − y|x)
[Graphical model: latent z (data) → data x (feature), with label y. Generative process: z ~ p(z) and x = G_θ(z), i.e. p_θ(x|y) = p(G_θ(z)|y); inference process: q_φ^r(y|x).]
GAN: y = 1 if x is real, 0 if x is fake.
ADA: y = 1 if x is in the source domain, 0 if x is in the target domain.
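Under one common reading of the notation above, the superscript r marks the label-reversed discriminator distribution, q_φ^r(y|x) = q_φ(1 − y|x). A tiny sketch of that reversal for a binary label (names are illustrative):

```python
def q_phi(y, d_out):
    """q_phi(y|x): probability the discriminator assigns to label y,
    where d_out = D(x) is its probability that x is real (y = 1)."""
    return d_out if y == 1 else 1.0 - d_out

def q_phi_r(y, d_out):
    """Reversed distribution: q_phi^r(y|x) = q_phi(1 - y|x).
    Asking 'is this real?' of the reversed distribution is asking
    'is this fake?' of the original one."""
    return q_phi(1 - y, d_out)
```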
55–57. On Unifying Deep Generative Models
[Two graphical models compared. GAN: latent z ~ p(z) → data x = G_θ(z), with p_θ(x|y) = p(G_θ(z)|y) and inference q_φ^r(y|x) over the label y. InfoGAN: the same model with an extra inference arm q_η(z|x, y) that recovers the latent code; the plain GAN leaves the code space degenerated.]
58–59. On Unifying Deep Generative Models
[Swapping the roles of x and z in the GAN graph yields AAE: data x ~ p(x) → latent z = G_θ(x), with p_θ(z|y) = p(G_θ(x)|y) and inference q_φ^r(y|z) over the label y.]
60. GAN vs VAE
[GAN: latent z ~ p(z) → data x = G_θ(z), with p_θ(x|y) = p(G_θ(z)|y) and inference q_φ^r(y|x). VAE: generative p_θ(x|z, y) with latent inference q_η(z|x, y) and a degenerate discriminator distribution q*^r(y|x).]
61. InfoGAN vs VAE
[InfoGAN (generative p_θ(x|y) with x = G_θ(z), inference q_φ^r(y|x) and q_η(z|x, y)) and VAE (generative p_θ(x|z, y), inference q*^r(y|x) and q_η(z|x, y)) share the same graphical skeleton; they differ in whether the label distribution is the learned discriminator q_φ or the fixed q*.]
62. InfoGAN vs AAE
[InfoGAN (z → x: x = G_θ(z), p_θ(x|y) = p(G_θ(z)|y), inference q_φ^r(y|x) and q_η(z|x, y)) and AAE (x → z: z = G_θ(x), p_θ(z|y) = p(G_θ(x)|y), inference q_φ^r(y|z) and q_η(x|z, y)) are symmetric under exchanging data x and latent z.]
63. Wake-sleep Algorithm
• h – general latent variables
• λ – general parameters
• θ – generator parameters
• In the wake phase, update θ by fitting p_θ(x|h) to x and the h inferred by q_λ(h|x).
• In the sleep phase, update λ based on generated samples.
• VAE: h → z, λ → η
• GAN: h → y, λ → φ
Wake: max_θ E_{q_λ(h|x) p_data(x)} [log p_θ(x|h)]
Sleep: max_λ E_{p_θ(x|h) p(h)} [log q_λ(h|x)]
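The two phases can be illustrated on a toy one-dimensional linear-Gaussian model (entirely hypothetical, chosen only because both phase updates have closed forms). Wake-sleep is approximate, so the sketch converges to a self-consistent (θ, λ) pair rather than the exact maximum-likelihood θ:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model:
#   generative:  h ~ p(h) = N(0, 1),  x | h ~ N(theta * h, 1)
#   recognition: q_lambda(h | x) = N(lam * x, 1)
n = 5000
x_data = np.sqrt(2.0) * rng.standard_normal(n)   # data drawn with true theta = 1

theta, lam = 0.0, 0.5
for _ in range(30):
    # Wake phase: infer h from real data via q_lambda, then update theta
    # by fitting p_theta(x | h) to the pairs (x, h)  (least squares in theta).
    h = lam * x_data + rng.standard_normal(n)
    theta = np.sum(h * x_data) / np.sum(h * h)
    # Sleep phase: dream (h, x) pairs from the generative model, then update
    # lambda by fitting q_lambda(h | x) to them  (least squares in lambda).
    h_dream = rng.standard_normal(n)
    x_dream = theta * h_dream + rng.standard_normal(n)
    lam = np.sum(h_dream * x_dream) / np.sum(x_dream * x_dream)
```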
64. References
• D. P. Kingma and M. Welling. Auto-Encoding Variational Bayes. arXiv preprint arXiv:1312.6114, 2013.
• I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative Adversarial Nets. arXiv preprint arXiv:1406.2661, 2014.
• A. B. L. Larsen, S. K. Sønderby, and O. Winther. Autoencoding beyond pixels using a learned similarity metric. arXiv preprint arXiv:1512.09300, 2015.
• A. Makhzani, J. Shlens, N. Jaitly, I. Goodfellow, and B. Frey. Adversarial Autoencoders. arXiv preprint arXiv:1511.05644, 2015.
• Z. Hu, Z. Yang, R. Salakhutdinov, and E. P. Xing. On unifying deep generative models. arXiv preprint arXiv:1706.00550, 2017.