Spectral Normalization for
Generative Adversarial Networks
Understanding it together with PR12
Jaejun Yoo
Clova ML / NAVER
PR12
13th May, 2018
Today’s contents
Spectral Normalization for Generative Adversarial Networks
by Takeru Miyato, Toshiki Kataoka, Masanori Koyama, Yuichi Yoshida
Feb. 2018: https://arxiv.org/abs/1802.05957
Accepted (Oral)
Rating: 8-8-8
ICLR 2018
Motivation & Contribution
“One of the challenges in the study of generative
adversarial networks is the instability of its training.”
: Proposed a novel weight normalization technique called
spectral normalization to stabilize the training of the
discriminator of GANs.
• The Lipschitz constant is the only hyper-parameter to be tuned,
and the algorithm does not require intensive tuning of it for
satisfactory performance.
• Implementation is simple and the additional computational
cost is small.
GANs are hard to train… why?
WGAN, WGAN-GP
While input-based regularizations allow for relatively easy formulations
based on samples, they also suffer from the fact that they cannot
impose regularization on the space outside of the supports of the
generator and data distributions without introducing somewhat
heuristic means.
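For context, this criticism targets sample-based penalties such as the WGAN-GP regularizer (Gulrajani et al., 2017), which is evaluated only at interpolates between real and generated samples, i.e. only near the two supports:

```latex
\[
  \lambda \,\mathbb{E}_{\hat{x} \sim p_{\hat{x}}}
    \left[ \left( \lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1 \right)^2 \right],
  \qquad
  \hat{x} = \epsilon\, x + (1 - \epsilon)\, G(z),
  \quad \epsilon \sim U[0, 1].
\]
```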
Spectral Normalization: Prerequisites
“Matrix Norm”
from Wikipedia
https://math.stackexchange.com/questions/586663/why-does-the-spectral-norm-equal-the-largest-singular-value
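Concretely, the matrix norm in question is the spectral norm (operator 2-norm), and the linked answer shows it equals the largest singular value:

```latex
\[
  \sigma(A)
  \;=\; \max_{h \neq 0} \frac{\lVert A h \rVert_2}{\lVert h \rVert_2}
  \;=\; \max_{\lVert h \rVert_2 \le 1} \lVert A h \rVert_2
  \;=\; \sigma_{\max}(A),
\]
where $\sigma_{\max}(A)$ denotes the largest singular value of $A$.
```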
Spectral Normalization
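As a rough illustration, here is a minimal NumPy sketch of the operation the method performs: estimate σ(W) by power iteration (the paper's Algorithm 1 reuses the vector u across SGD steps and runs a single iteration per step) and divide the weight by the estimate. Function and variable names here are my own.

```python
import numpy as np

def spectral_normalize(W, u, n_iters=1):
    """Return W / sigma(W), with sigma estimated by power iteration.

    W: (d_out, d_in) weight matrix.
    u: persistent (d_out,) vector, carried over between calls so a
       single iteration per training step is enough in practice.
    """
    for _ in range(n_iters):
        v = W.T @ u
        v /= np.linalg.norm(v) + 1e-12   # approx. first right singular vector
        u = W @ v
        u /= np.linalg.norm(u) + 1e-12   # approx. first left singular vector
    sigma = u @ W @ v                    # approx. largest singular value
    return W / sigma, u

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 128))
u = rng.standard_normal(64)
for _ in range(5):                       # a few steps stand in for SGD updates
    W_sn, u = spectral_normalize(W, u)
print(np.linalg.svd(W_sn, compute_uv=False).max())  # close to 1.0
```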
Gradient Analysis of Spectrally Normalized Weights *
* together with Eq. (10) of the paper
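For reference, the gradient analyzed on this slide can be written as follows (paper notation: $E_{ij}$ is the matrix with a one in entry $(i,j)$ and zeros elsewhere, and $u_1$, $v_1$ are the first left and right singular vectors of $W$):

```latex
\[
  \frac{\partial \bar{W}_{\mathrm{SN}}(W)}{\partial W_{ij}}
  \;=\; \frac{1}{\sigma(W)}\, E_{ij}
        \;-\; \frac{[u_1 v_1^{\mathsf{T}}]_{ij}}{\sigma(W)^2}\, W
  \;=\; \frac{1}{\sigma(W)}
        \left( E_{ij} - [u_1 v_1^{\mathsf{T}}]_{ij}\, \bar{W}_{\mathrm{SN}} \right).
\]
```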
Why SN is better than…
… Weight Normalization
Why SN is better than…
… Gradient Penalty
• The approach has an obvious weakness of being heavily dependent on the support of
the current generative distribution. As a matter of course, the generative distribution
and its support gradually change in the course of the training, and this can
destabilize the effect of such regularization.
… Orthonormal Regularization
• While this seems to serve the same purpose as spectral normalization, orthonormal
regularization is mathematically quite different from spectral normalization,
because orthonormal regularization destroys the information about the spectrum
by setting all the singular values to one. Spectral normalization, on the other hand,
only scales the spectrum so that its maximum is one.
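A toy illustration of that distinction (my own sketch, not from the slides): spectral normalization rescales all singular values by the same factor, while orthonormalization flattens them all to one.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((6, 6))
s = np.linalg.svd(W, compute_uv=False)   # singular values, descending

s_sn = s / s[0]            # SN: spectrum shape preserved, max scaled to 1
s_ortho = np.ones_like(s)  # orthonormal limit: all singular values forced to 1

print("original :", np.round(s, 2))
print("SN       :", np.round(s_sn, 2))
print("orthonorm:", s_ortho)
```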
Results
Squared singular values of weight matrices trained with different methods
> “AS EXPECTED!”
Results
Comparison between SN and orthonormal regularization
> “SN is more stable across various network architectures”
Results
SN remains effective on a large, high-dimensional dataset!
Summary
They proposed a novel weight normalization technique
called spectral normalization to stabilize the training of the
discriminator of GANs
• in various network architectures
• in various hyperparameter settings
• in various datasets
• with an intuitive and straightforward idea
• using only relatively basic linear algebra
Practicality
Principled way
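On the practicality point, the technique later landed as a one-line wrapper in PyTorch (torch.nn.utils.spectral_norm); a small usage sketch, with a discriminator architecture chosen purely for illustration:

```python
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

# Wrapping a layer registers a hook that runs one power-iteration step per
# forward pass and divides the weight by the estimated spectral norm.
D = nn.Sequential(
    spectral_norm(nn.Conv2d(3, 64, 4, stride=2, padding=1)),   # 32 -> 16
    nn.LeakyReLU(0.1),
    spectral_norm(nn.Conv2d(64, 128, 4, stride=2, padding=1)), # 16 -> 8
    nn.LeakyReLU(0.1),
    nn.Flatten(),
    spectral_norm(nn.Linear(128 * 8 * 8, 1)),
)

x = torch.randn(2, 3, 32, 32)
print(D(x).shape)  # torch.Size([2, 1])
```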
Appendix
