
[PR12] Spectral Normalization for Generative Adversarial Networks

Introduction to Spectral Normalization for Generative Adversarial Networks (ICLR 2018 Oral)
video: https://youtu.be/iXSYqohGQhM
paper: https://openreview.net/forum?id=B1QRgziT-



  1. Spectral Normalization for Generative Adversarial Networks. Understanding the paper together with PR12. Jaejun Yoo, Clova ML / NAVER. PR12, 13th May 2018
  2. Today’s contents: Spectral Normalization for Generative Adversarial Networks, by Takeru Miyato, Toshiki Kataoka, Masanori Koyama, Yuichi Yoshida. Feb. 2018: https://arxiv.org/abs/1802.05957 Accepted as an oral presentation at ICLR 2018 (review ratings: 8-8-8)
  3. Motivation & Contribution. “One of the challenges in the study of generative adversarial networks is the instability of its training.” The authors propose a novel weight normalization technique called spectral normalization to stabilize the training of the discriminator of GANs. • The Lipschitz constant is the only hyper-parameter to be tuned, and the algorithm does not require intensive tuning of it for satisfactory performance. • The implementation is simple and the additional computational cost is small.
  4. GANs are hard to train… why?
  5. GANs are hard to train… why? WGAN, WGAN-GP
  6. GANs are hard to train… why? WGAN, WGAN-GP
  7. GANs are hard to train… why? WGAN, WGAN-GP
  8. GANs are hard to train… why? WGAN, WGAN-GP. While input-based regularizations allow for relatively easy formulations based on samples, they also suffer from the fact that they cannot impose regularization on the space outside the supports of the generator and data distributions without introducing somewhat heuristic means.
  9. Spectral Normalization: Prerequisites. “Matrix norm” (Wikipedia); see also https://math.stackexchange.com/questions/586663/why-does-the-spectral-norm-equal-the-largest-singular-value
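A quick numerical check of the prerequisite above (a sketch, not from the slides; uses NumPy): the spectral norm of a matrix, the maximum of ||Wh|| / ||h|| over nonzero h, equals its largest singular value.

```python
import numpy as np

# Illustrative check: the spectral norm ||W||_2 equals the largest
# singular value of W.
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))

# Largest singular value from the SVD (singular values come sorted).
sigma_max = np.linalg.svd(W, compute_uv=False)[0]

# np.linalg.norm with ord=2 computes the spectral norm directly.
print(np.linalg.norm(W, ord=2), sigma_max)  # the two values agree
```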
  10. Spectral Normalization
  11. Spectral Normalization
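The paper estimates σ(W) with power iteration, running a single iteration per SGD update and reusing the vector u across updates (its Algorithm 1). A minimal NumPy sketch of the idea (the function name `spectral_normalize` is mine; here many iterations are run at once so the estimate converges):

```python
import numpy as np

# Hypothetical sketch of the power-iteration estimate behind spectral
# normalization; the paper performs one iteration per SGD step, carrying
# u over between steps.
def spectral_normalize(W, u, n_iters=1):
    for _ in range(n_iters):
        v = W.T @ u
        v = v / np.linalg.norm(v)
        u = W @ v
        u = u / np.linalg.norm(u)
    sigma = u @ W @ v          # estimate of the largest singular value
    return W / sigma, u        # normalized weight and the carried-over u

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))
u = rng.standard_normal(8)
W_sn, u = spectral_normalize(W, u, n_iters=200)
print(np.linalg.norm(W_sn, ord=2))  # close to 1 after normalization
```

The point of reusing u is that W changes only slightly per SGD step, so the previous estimate is already near the top singular vector and one iteration per step suffices in practice.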
  12. Spectral Normalization: Gradient Analysis of Spectrally Normalized Weights
  13. Spectral Normalization: Gradient Analysis of Spectrally Normalized Weights. The paper’s Eq. (10) gives the gradient of the normalized weight with respect to each entry: $\frac{\partial \bar{W}_{\mathrm{SN}}(W)}{\partial W_{ab}} = \frac{1}{\sigma(W)} E_{ab} - \frac{[u_1 v_1^{\mathrm{T}}]_{ab}}{\sigma(W)^2} W = \frac{1}{\sigma(W)} \left( E_{ab} - [u_1 v_1^{\mathrm{T}}]_{ab}\, \bar{W}_{\mathrm{SN}} \right)$, where $E_{ab}$ is the matrix that is 1 at entry $(a,b)$ and 0 elsewhere, and $u_1$, $v_1$ are the first left and right singular vectors of $W$.
  14. Why SN is better than… …Weight Normalization
  15. Why SN is better than… …Gradient Penalty • The approach has an obvious weakness of being heavily dependent on the support of the current generative distribution. As a matter of course, the generative distribution and its support gradually change over the course of training, and this can destabilize the effect of such regularization. …Orthonormal Regularization • While this seems to serve the same purpose as spectral normalization, orthonormal regularization is mathematically quite different from spectral normalization because it destroys the information about the spectrum by setting all the singular values to one. Spectral normalization, on the other hand, only scales the spectrum so that its maximum will be one.
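The contrast in the orthonormal-regularization bullet can be seen numerically. In this hypothetical NumPy sketch, spectral normalization divides the whole spectrum by the largest singular value (preserving the ratios between singular values), while replacing W with the nearest orthonormal matrix (U Vᵀ from its SVD) sets every singular value to one:

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.standard_normal((5, 5))
U, s, Vt = np.linalg.svd(W)

W_sn = W / s[0]     # spectral normalization: divide by sigma_max
W_orth = U @ Vt     # nearest orthonormal matrix: all singular values -> 1

print(np.linalg.svd(W_sn, compute_uv=False))    # s / s[0]: ratios preserved
print(np.linalg.svd(W_orth, compute_uv=False))  # all ones: spectrum flattened
```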
  16. Results
  17. Results: Squared singular values of weight matrices trained with different methods
  18. Results: Squared singular values of weight matrices trained with different methods > “AS EXPECTED!”
  19. Results: Comparison between SN and orthonormal regularization
  20. Results: Comparison between SN and orthonormal regularization > “SN is more stable across various network architectures”
  21. Results: SN remains effective on a large, high-dimensional dataset!
  22. Summary. They proposed a novel weight normalization technique called spectral normalization to stabilize the training of the discriminator of GANs • in various network architectures • in various hyper-parameter settings • on various datasets • with an intuitive and straightforward idea • using only basic linear algebra. Practical and principled.
  23. Appendix
