
Modeling documents with Generative Adversarial Networks - John Glover


  1. Modeling documents with Generative Adversarial Networks (John Glover)
  2. Overview
     - Learning representations of natural language documents
     - A brief introduction to Generative Adversarial Networks
     - Energy-based Generative Adversarial Networks
     - An adversarial document model
     - Future work and conclusion
  3. Representation learning
     The ability to learn robust, reusable feature representations from unlabelled data has potential applications in a wide variety of machine learning tasks, such as data retrieval and classification. One way to create such representations is to train deep generative models that can learn to capture the complex distributions of real-world data.
  4. Representation learning [figure slide]
  5. Document representations: LDA
     The traditional approach is Latent Dirichlet Allocation (LDA). In LDA, documents consist of a mixture of topics, with each topic defining a probability distribution over the words in the vocabulary. A document is then represented by a vector of mixture weights over its associated topics.
  6. Document representations: LDA
     [LDA plate diagram] α is the parameter of the Dirichlet prior on the per-document topic distributions, β is the parameter of the Dirichlet prior on the per-topic word distributions, θ_m is the topic distribution for document m, z_mn is the topic for the n-th word in document m, and w_mn is the specific word.
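     As a concrete illustration of the LDA representation described above (not from the slides), here is a minimal sketch using gensim; the toy corpus, the number of topics, and the training settings are assumptions for the example:

```python
from gensim import corpora, models

# Toy corpus: each document is a list of tokens (hypothetical data).
texts = [
    ["graphics", "image", "rendering", "screen"],
    ["hockey", "team", "season", "game"],
    ["space", "orbit", "nasa", "launch"],
]

dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(text) for text in texts]

# Fit LDA; alpha corresponds to the Dirichlet prior on the
# per-document topic distributions described on the slide.
lda = models.LdaModel(corpus, id2word=dictionary, num_topics=2,
                      alpha="auto", passes=10)

# A document's representation is its vector of topic mixture weights.
doc_representation = lda[corpus[0]]  # list of (topic_id, weight) pairs
print(doc_representation)
```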
  7. Document representations: beyond LDA
     Replicated Softmax (Salakhutdinov and Hinton, 2009). DocNADE (Larochelle and Lauly, 2012).
  8. Generative models: recent trends
     Variational inference: Neural Variational Inference (Miao, Yu, and Blunsom, 2016). Generative Adversarial Networks: ?
  9. Generative Adversarial Networks
     Generative Adversarial Networks (GANs) involve a min-max adversarial game between a generative model G and a discriminative model D. G(z) is a neural network that is trained to map samples z from a prior noise distribution p(z) to the data space. D(x) is another neural network that takes a data sample x as input and outputs a single scalar value representing the probability that x came from the data distribution rather than from G(z).
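     To make the setup concrete, here is a minimal PyTorch sketch of the two networks (not from the slides; the layer sizes, the 100-dimensional noise prior, and the data dimensionality are illustrative assumptions):

```python
import torch
import torch.nn as nn

NOISE_DIM = 100   # dimensionality of the prior p(z); an assumption
DATA_DIM = 784    # dimensionality of the data space; an assumption

# G maps noise samples z ~ p(z) to the data space.
G = nn.Sequential(
    nn.Linear(NOISE_DIM, 256),
    nn.ReLU(),
    nn.Linear(256, DATA_DIM),
    nn.Sigmoid(),  # data assumed scaled to [0, 1]
)

# D maps a data sample x to the probability that x is real.
D = nn.Sequential(
    nn.Linear(DATA_DIM, 256),
    nn.ReLU(),
    nn.Linear(256, 1),
    nn.Sigmoid(),
)

z = torch.randn(64, NOISE_DIM)  # a batch of noise samples
fake = G(z)                     # generated samples in data space
p_real = D(fake)                # probability that each sample is real
```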
  10. Generative Adversarial Networks [figure] (source: https://ishmaelbelghazi.github.io/ALI)
  11. Generative Adversarial Networks
      D is trained to maximise the probability of assigning the correct label to the input x. G is trained to maximally confuse D, using the gradient of D's output with respect to its input to update G's parameters:
      min_G max_D E_{x∼p_data(x)}[log D(x)] + E_{z∼p(z)}[log(1 − D(G(z)))]
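      A minimal training step implementing this min-max objective might look as follows (a sketch assuming the G, D, and NOISE_DIM defined in the previous example; the optimiser settings and the non-saturating generator loss are common practice but not stated on the slide):

```python
import torch
import torch.nn.functional as F

opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)
opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)

def train_step(real):
    batch = real.size(0)
    ones, zeros = torch.ones(batch, 1), torch.zeros(batch, 1)

    # D step: maximise log D(x) + log(1 - D(G(z))).
    z = torch.randn(batch, NOISE_DIM)
    fake = G(z).detach()  # do not backprop into G on the D step
    loss_D = F.binary_cross_entropy(D(real), ones) + \
             F.binary_cross_entropy(D(fake), zeros)
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # G step: in place of minimising log(1 - D(G(z))) directly, use the
    # standard non-saturating form, i.e. maximise log D(G(z)).
    z = torch.randn(batch, NOISE_DIM)
    loss_G = F.binary_cross_entropy(D(G(z)), ones)
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
    return loss_D.item(), loss_G.item()
```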
  12. GAN samples [figure]. Source: "Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks", https://arxiv.org/abs/1511.06434v2
  13. GAN samples [figure]. Source: "Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network", https://arxiv.org/abs/1609.04802
  14. Energy-based Generative Adversarial Networks
      [figure] Source: Yann LeCun's slides on energy-based GANs, NIPS 2016. Energy function: outputs low values on the data manifold, higher values everywhere else.
  15. Energy-based Generative Adversarial Networks
      [figure] Source: Yann LeCun's slides on energy-based GANs, NIPS 2016. It is easy to push down the energy of observed data via SGD, but how do we choose where to push energy up?
  16. Energy-based Generative Adversarial Networks
      [figure] Source: Yann LeCun's slides on energy-based GANs, NIPS 2016. The generator learns to pick the points where the energy should be increased, so D can be viewed as a learned objective function.
  17. Energy-based Generative Adversarial Networks
      The energy function is trained to push down on the energy of real samples x and to push up on the energy of generated samples x̂. f_D is the value to be minimised at each iteration, where m is a margin between positive and negative energies:
      f_D(x, z) = D(x) + max(0, m − D(G(z)))
      At each iteration, the generator G is trained adversarially against D to minimise f_G:
      f_G(z) = D(G(z))
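      A sketch of these two losses in PyTorch (here D is an energy function returning a non-negative scalar per sample, unlike the probability-output D in the earlier example; the margin value is an assumption):

```python
import torch
import torch.nn.functional as F

MARGIN = 10.0  # margin m between positive and negative energies (assumed)

def loss_D(D, G, x, z):
    # f_D(x, z) = D(x) + max(0, m - D(G(z)))
    energy_real = D(x)
    energy_fake = D(G(z).detach())  # no gradient into G on the D step
    return (energy_real + F.relu(MARGIN - energy_fake)).mean()

def loss_G(D, G, z):
    # f_G(z) = D(G(z)): the generator picks points whose energy it
    # tries to lower, i.e. where D must learn to push energy up.
    return D(G(z)).mean()
```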
  18. Energy-based Generative Adversarial Networks
      In practice, the energy-based GAN formulation seems to be easier to train. See the empirical results in "Energy-based Generative Adversarial Network" (https://arxiv.org/abs/1609.03126), covering more than 6500 experiments.
  19. An adversarial document model
      Can we use the GAN formulation to learn representations of natural language documents? Two questions arise:
      1. How do we represent documents? GANs require everything to be differentiable, but we need to deal with discrete text.
      2. How do we get a representation? There is no explicit mapping back to the latent (z) space.
  20. An adversarial document model
      [architecture diagram: z, G, x, C, Enc, h, Dec, MSE, D] An Energy-Based GAN is used to learn document representations: G is the generator, Enc and Dec are the encoder and decoder networks of a Denoising Autoencoder (DAE), C is a corruption process (bypassed at test time), and D is the discriminator. The input to the discriminator is the binary bag-of-words representation of a document, x ∈ {0, 1}^V. In short, this is an Energy-Based GAN with a Denoising Autoencoder as the discriminator.
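      A minimal sketch of such a DAE discriminator (illustrative, not the authors' code; the vocabulary size, hidden size, corruption rate, and the use of mean squared reconstruction error as the energy are assumptions based on the description above):

```python
import torch
import torch.nn as nn

VOCAB = 2000   # vocabulary size V (assumed)
HIDDEN = 100   # size of the representation h (assumed)

class DAEDiscriminator(nn.Module):
    """Energy = MSE between the binary bag-of-words input and its
    reconstruction from the corrupted input."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(VOCAB, HIDDEN), nn.Sigmoid())
        self.dec = nn.Sequential(nn.Linear(HIDDEN, VOCAB), nn.Sigmoid())

    def forward(self, x, corrupt=0.3):
        # Corruption process C: randomly zero out entries; bypassed
        # at test time by calling with corrupt=0.0.
        if corrupt > 0:
            x_in = x * (torch.rand_like(x) > corrupt).float()
        else:
            x_in = x
        h = self.enc(x_in)                     # document representation
        recon = self.dec(h)
        return ((recon - x) ** 2).mean(dim=1)  # per-sample energy

D = DAEDiscriminator()
x = torch.bernoulli(torch.full((8, VOCAB), 0.05))  # toy binary BoW batch
energy = D(x)   # training-time energy, with corruption
h = D.enc(x)    # representation used at test time (C bypassed)
```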
  21. Document retrieval evaluation
      [Figure: precision-recall curves for the document retrieval task on the 20 Newsgroups dataset; recall on the x-axis, precision on the y-axis.] DocNADE is described in (Larochelle and Lauly, 2012), ADM is the adversarial document model, ADM (AE) is the adversarial document model with a standard Autoencoder as the discriminator (and so is similar to the Energy-Based GAN), and DAE is a Denoising Autoencoder.
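      The slide does not spell out the retrieval protocol; a common setup for this dataset, sketched below as an assumption, is to use each held-out document as a query, rank the training documents by cosine similarity of their representations, and count a retrieved document as relevant when it shares the query's newsgroup label:

```python
import numpy as np

def precision_at_recall(query_vecs, query_labels, db_vecs, db_labels,
                        recall_levels=(0.0001, 0.001, 0.01, 0.1, 1.0)):
    # Normalise so the dot product is cosine similarity.
    q = query_vecs / np.linalg.norm(query_vecs, axis=1, keepdims=True)
    d = db_vecs / np.linalg.norm(db_vecs, axis=1, keepdims=True)
    sims = q @ d.T
    precisions = {r: [] for r in recall_levels}
    for i in range(len(q)):
        order = np.argsort(-sims[i])               # best matches first
        relevant = (db_labels[order] == query_labels[i])
        n_rel = relevant.sum()
        if n_rel == 0:
            continue
        cum_rel = np.cumsum(relevant)
        for r in recall_levels:
            # Smallest cutoff k reaching recall r, then precision at k.
            k = int(np.searchsorted(cum_rel, r * n_rel)) + 1
            precisions[r].append(cum_rel[k - 1] / k)
    return {r: float(np.mean(v)) for r, v in precisions.items()}
```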
  22. Qualitative evaluation: t-SNE plot
      [Figure] t-SNE visualizations of the document representations learned by the adversarial document model on the held-out test dataset of 20 Newsgroups. The documents belong to 20 different topics, which correspond to the different coloured points in the figure.
  23. Future work
      - Understanding why the DAE in the GAN discriminator appears to produce significantly better representations than a standalone DAE.
      - Exploring the impact of applying additional constraints to the representation layer.
  24. Conclusion
      Showed that a variation on the recently proposed Energy-Based GAN can be used to learn document representations in an unsupervised setting. The current formulation is still short of state-of-the-art, but it is very early days for this line of research, so it is likely that these results can be pushed a lot further. Suggested some interesting areas for future research.
  25. More information
      Introduction to GANs: http://blog.aylien.com/introduction-generative-adversarial-networks-code-tensorflow
      Paper: https://sites.google.com/site/nips2016adversarial/home/accepted-papers
