Generative Adversarial Nets
18.05.18 You Sung Min
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., & Bengio, Y. (2014). Generative adversarial nets. In Advances in Neural Information Processing Systems (pp. 2672-2680).
Paper review
Generative Adversarial Nets
 Image transformation (CycleGAN; Zhu et al., 2017, arXiv:1703.10593)
Generative Adversarial Nets
 Super-resolution image generation (SRGAN; Ledig et al., 2016)
Generative Adversarial Nets
 Generating Pokemon
 Generating scenes from semantic maps
(Image credits: Ian Goodfellow; Yota Ishida)
Generative Model
 Probability density estimation
 Data (sample) generation
(Goodfellow, 2016)
[Figure: training samples vs. model samples]
Generative Adversarial Nets
 Competitive game model: the forger (generator) and the discriminator advance together
[Figure: a forger producing fake currency vs. a discriminator telling it apart from real currency]
Generative Adversarial Nets
Learning of Adversarial nets
 Learn the generator's distribution $p_g$ over data $x$
[Figure: an input noise variable $z \sim p_z(z)$ is mapped to data space by $G(z; \theta_g)$; the discriminator $D(x; \theta_d)$ outputs $D(x)$, the probability that $x$ came from the (real) data rather than from $G$]
 Train $D$ to maximize the probability of assigning the correct label to both training samples and samples from $G$ (generated samples)
 Train $G$ to minimize $\log(1 - D(G(z)))$

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$
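To make the two-player game concrete, below is a minimal PyTorch sketch of the alternating updates (an illustrative toy setup, not the authors' code: the 1-D data distribution, network sizes, and learning rates are all assumptions). $D$ is updated to ascend $\log D(x) + \log(1 - D(G(z)))$; $G$ is updated to descend $\log(1 - D(G(z)))$.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# G(z; θg): maps 8-D noise to 1-D "data" space; D(x; θd): outputs D(x) in (0, 1)
G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())
opt_g = torch.optim.SGD(G.parameters(), lr=0.01)
opt_d = torch.optim.SGD(D.parameters(), lr=0.01)
eps = 1e-8  # numerical guard inside the logs

for step in range(1000):
    x = 2.0 + 0.5 * torch.randn(64, 1)   # toy p_data: N(2, 0.5^2), an assumption
    z = torch.randn(64, 8)               # input noise z ~ p_z(z)

    # Discriminator step: ascend log D(x) + log(1 - D(G(z)))
    d_loss = -(torch.log(D(x) + eps) + torch.log(1 - D(G(z).detach()) + eps)).mean()
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step: descend log(1 - D(G(z)))
    g_loss = torch.log(1 - D(G(z)) + eps).mean()
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```

The paper also notes that, early in training, maximizing $\log D(G(z))$ for $G$ gives much stronger gradients than minimizing $\log(1 - D(G(z)))$, which saturates when $D$ easily rejects $G$'s samples.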
Learning of Adversarial nets
 Learn the generator's distribution $p_g$ over data $x$

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$

 Global optimum: $p_g = p_{data}$
Learning of Adversarial nets
 Global Optimality

$$\max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$
$$= \int_x p_{data}(x) \log D(x)\, dx + \int_z p_z(z) \log(1 - D(g(z)))\, dz$$
$$= \int_x p_{data}(x) \log D(x) + p_g(x) \log(1 - D(x))\, dx \qquad (\because\; g(z) = x)$$

For fixed $G$, the integrand has the form $f(y) = a \log y + b \log(1 - y)$ with $a = p_{data}(x)$ and $b = p_g(x)$:
$$f'(y) = \frac{a}{y} - \frac{b}{1 - y} = 0 \;\Rightarrow\; y = \frac{a}{a + b}$$
$$f''\!\left(\frac{a}{a+b}\right) = -\frac{a}{\left(\frac{a}{a+b}\right)^{2}} - \frac{b}{\left(1 - \frac{a}{a+b}\right)^{2}} < 0 \quad \text{when } a, b > 0$$

 The optimal (maximizing) discriminator $D$ for fixed $G$:
$$D^*_G(x) = \frac{p_{data}(x)}{p_{data}(x) + p_g(x)}$$
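As a quick aside (not from the paper), the maximum of $f(y) = a \log y + b \log(1 - y)$ can be verified symbolically; a minimal SymPy sketch, assuming $a, b > 0$:

```python
import sympy as sp

a, b, y = sp.symbols('a b y', positive=True)
f = a * sp.log(y) + b * sp.log(1 - y)

critical = sp.solve(sp.diff(f, y), y)            # f'(y) = a/y - b/(1-y) = 0
print(critical)                                  # [a/(a + b)]

f2 = sp.diff(f, y, 2)                            # -a/y**2 - b/(1 - y)**2
print(sp.simplify(f2.subs(y, a / (a + b))))      # strictly negative, so a maximum
```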
Learning of Adversarial nets
 Global Optimality

Substituting the optimal discriminator $D^*_G(x) = \frac{p_{data}(x)}{p_{data}(x) + p_g(x)}$ and using $g(z) = x$:

$$C(G) = \max_D V(D, G)$$
$$= \mathbb{E}_{x \sim p_{data}(x)}[\log D^*_G(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D^*_G(G(z)))]$$
$$= \mathbb{E}_{x \sim p_{data}(x)}[\log D^*_G(x)] + \mathbb{E}_{x \sim p_g}[\log(1 - D^*_G(x))]$$
$$= \mathbb{E}_{x \sim p_{data}(x)}\left[\log \frac{p_{data}(x)}{p_{data}(x) + p_g(x)}\right] + \mathbb{E}_{x \sim p_g}\left[\log \frac{p_g(x)}{p_{data}(x) + p_g(x)}\right]$$
$$= \int_x p_{data}(x) \log \frac{p_{data}(x)}{p_{data}(x) + p_g(x)} + p_g(x) \log \frac{p_g(x)}{p_{data}(x) + p_g(x)}\, dx$$

 If $p_g = p_{data}$, then $C(G) = -\log 4$, the global minimum
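A tiny numeric sanity check of this value (an illustrative discrete stand-in for the integral, not from the paper): with $p_g = p_{data}$ the optimal discriminator is $1/2$ everywhere, so both expectations equal $\log \frac{1}{2}$.

```python
import numpy as np

p_data = np.array([0.1, 0.2, 0.3, 0.4])   # a toy discrete "data" distribution
p_g = p_data.copy()                        # generator at the global optimum

d_star = p_data / (p_data + p_g)           # D*(x) = 1/2 everywhere
c_g = np.sum(p_data * np.log(d_star)) + np.sum(p_g * np.log(1 - d_star))
print(c_g, -np.log(4))                     # both ≈ -1.3863
```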
Learning of Adversarial nets
 Global Optimality

$$C(G) = \int_x p_{data}(x) \log \frac{p_{data}(x)}{p_{data}(x) + p_g(x)} + p_g(x) \log \frac{p_g(x)}{p_{data}(x) + p_g(x)}\, dx$$
Learning of Adversarial nets
 Global Optimality

Adding and subtracting $\log 4$ (a $\log 2$ inside each term):

$$C(G) = -\log 4 + \int_x p_{data}(x) \log \frac{p_{data}(x)}{\left(p_{data}(x) + p_g(x)\right)/2}\, dx + \int_x p_g(x) \log \frac{p_g(x)}{\left(p_{data}(x) + p_g(x)\right)/2}\, dx$$

Kullback-Leibler divergence: $D_{KL}(P \,\|\, Q) = \int_x p(x) \log \frac{p(x)}{q(x)}\, dx$

$$C(G) = -\log 4 + D_{KL}\!\left(p_{data} \,\middle\|\, \frac{p_{data} + p_g}{2}\right) + D_{KL}\!\left(p_g \,\middle\|\, \frac{p_{data} + p_g}{2}\right)$$

KL divergence is always non-negative.

Jensen-Shannon divergence: $JSD(P \,\|\, Q) = \frac{1}{2} D_{KL}\!\left(P \,\middle\|\, \frac{P + Q}{2}\right) + \frac{1}{2} D_{KL}\!\left(Q \,\middle\|\, \frac{Q + P}{2}\right)$

$$C(G) = -\log 4 + 2\, JSD(p_{data} \,\|\, p_g)$$

$JSD(p_{data} \,\|\, p_g)$ is $0$ only when $p_{data} = p_g$, so the global minimum $C(G) = -\log 4$ is attained exactly at $p_g = p_{data}$.
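The identity above is easy to confirm numerically on discrete distributions (again an illustrative stand-in, with the integral replaced by a sum):

```python
import numpy as np

def kl(p, q):
    """Discrete KL divergence D_KL(P || Q)."""
    return np.sum(p * np.log(p / q))

def jsd(p, q):
    """Jensen-Shannon divergence via its two-KL definition."""
    m = (p + q) / 2
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p_data = np.array([0.1, 0.2, 0.3, 0.4])
p_g    = np.array([0.25, 0.25, 0.25, 0.25])

# C(G) computed directly from the sum form with D*(x) substituted
d_star = p_data / (p_data + p_g)
c_g = np.sum(p_data * np.log(d_star)) + np.sum(p_g * np.log(1 - d_star))

print(np.isclose(c_g, -np.log(4) + 2 * jsd(p_data, p_g)))   # True
print(jsd(p_data, p_data))                                   # 0.0, only at p_g = p_data
```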
Learning of Adversarial nets
 Global Optimality
[Figure: the discriminative distribution $D$, the data distribution $p_x$, and the generative distribution $p_g$ as training progresses]
Experiments
[Generated samples: MNIST, Toronto Face Database, CIFAR-10 (fully connected model), and CIFAR-10 (convolutional discriminator and deconvolutional generator)]
Experiments
 Mean log-likelihood of test data under a Gaussian Parzen window fit to generated samples (MNIST and TFD)
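These numbers come from a Gaussian Parzen-window estimate: a kernel density is fit to samples drawn from $G$, and the mean log-likelihood of the test set under that density is reported (the paper selects the bandwidth $\sigma$ by cross-validation on a validation set). Below is a minimal sketch of the estimator on toy data; the function name, dimensions, and $\sigma$ value are illustrative assumptions.

```python
import numpy as np
from scipy.special import logsumexp

def parzen_log_likelihood(test, samples, sigma):
    """Per-point log-likelihood under an isotropic Gaussian
    Parzen window fit to `samples`."""
    diff = test[:, None, :] - samples[None, :, :]        # (n_test, n_gen, dim)
    log_kernel = -np.sum(diff ** 2, axis=2) / (2 * sigma ** 2)
    dim = test.shape[1]
    log_norm = np.log(len(samples)) + dim * np.log(sigma * np.sqrt(2 * np.pi))
    return logsumexp(log_kernel, axis=1) - log_norm

gen = np.random.randn(500, 2)    # stand-in for samples drawn from G
test = np.random.randn(100, 2)   # stand-in for held-out test data
print(parzen_log_likelihood(test, gen, sigma=0.3).mean())
```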
Comparison with other generative models (challenges)
 Sigmoid belief nets
 Restricted Boltzmann machines
 Generative autoencoders
 Generative adversarial nets
 Review of generative adversarial nets