16. Explicit Density Models
• Explicitly represent p_model(x; θ)
• Advantages:
– Easy to optimize: just plug p_model into the maximum-likelihood (ML) objective (restated after this list)
– Can evaluate the likelihood of any sample, if needed
• Disadvantages:
– p_model must be complex enough to capture the data => tractability issues
• Solution 1: restrict p_model to a tractable, but relatively strong, family (FVBN, nonlinear ICA)
• Solution 2: approximate p_model (VAEs, Boltzmann Machines)
– Hard to generate new samples
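For concreteness, here is the maximum-likelihood objective that p_model is plugged into; this is the textbook formulation, not a detail taken from these slides:

```latex
% Maximum-likelihood (ML) objective: pick the parameters that maximize
% the expected log-likelihood of the data under the model.
\theta^{*} \;=\; \arg\max_{\theta}\;
  \mathbb{E}_{x \sim p_{\mathrm{data}}}\!\left[\log p_{\mathrm{model}}(x;\theta)\right]
```

For a tractable family (Solution 1) this can be optimized directly on the training set; for approximate models (Solution 2), one optimizes a surrogate such as a variational lower bound (VAEs).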
17. Implicit Density Models
• Interact indirectly with p_model(x; θ) by sampling
• Advantages:
– Sampling is straightforward
• Disadvantages:
– Likelihood is expensive to compute
• Sampling procedures
– Iterative (GSNs):
• Learn the denoising distribution (often unimodal) via ML
• Pick a training sample, then apply noise and denoise repeatedly
• After enough iterations, we get a sample from p_data(x)
– Direct (GANs)
• Sample in a single step
• Objective functions
18. Generative Stochastic Networks
• How do we sample?
– Pick a random training example
– Apply noise and denoise repeatedly (sketched after this list)
– After enough iterations, we get a sample from p_data(x)
• What do we learn?
– The denoising distribution p(x | x̃) via ML, where x̃ is the corrupted sample
• Advantages:
– Learning is cast as an optimization problem
– p(x | x̃) is known to be easy to learn (often unimodal)
• Disadvantage:
– Sampling is expensive
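A minimal sketch of the corrupt-then-denoise chain described above. The `denoiser` callable is hypothetical (it stands in for a trained model of p(x | x̃)), and the Gaussian corruption and step count are illustrative assumptions:

```python
import numpy as np

def gsn_sample(x0, denoiser, noise_std=0.1, n_steps=1000, seed=None):
    """Run a GSN-style Markov chain: corrupt x, then denoise, repeatedly.

    `denoiser(x_noisy)` is assumed to return one sample from the learned
    denoising distribution p(x | x_noisy).
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        x_noisy = x + noise_std * rng.standard_normal(x.shape)  # apply noise
        x = denoiser(x_noisy)                                    # denoise
    return x  # after enough iterations, approximately a draw from p_data

# Usage (given a trained denoiser and a training example x0):
#   sample = gsn_sample(x0, denoiser)
```

Each sample costs n_steps network evaluations, which is exactly the "sampling is expensive" disadvantage.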
19. Generative Adversarial Networks
p_g(x; θ): the distribution of the samples x = G(z; θ) with z ~ p_z
[Figure: a latent z ~ p_z is passed through the generator network G(z; θ) to produce a sample x ~ p_g]
• How do we sample?
– Sample a latent variable z from a fixed distribution p_z (e.g. a Gaussian)
– Pass z through a trained generator network G(z; θ) that produces the sample (see the sketch after this list)
• What do we learn?
– The generator G(z; θ)
• Advantages:
– Sampling is trivial (forward prop) and
efficient
• Disadvantage:
– We need to cast learning G(z; θ) as finding the Nash equilibrium of a game => more difficult than an optimization!
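A minimal PyTorch sketch of direct sampling: one forward pass through G. The architecture, latent dimension, and batch size are illustrative assumptions, and G is shown untrained:

```python
import torch
import torch.nn as nn

# Hypothetical generator G(z; theta); the layer sizes and latent_dim
# are illustrative choices, not taken from the slides.
latent_dim, data_dim = 100, 784
G = nn.Sequential(
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, data_dim), nn.Tanh(),
)

# Direct sampling: z ~ p_z (a fixed Gaussian), then x = G(z) in a single step.
z = torch.randn(64, latent_dim)
with torch.no_grad():
    x = G(z)  # a batch of 64 samples from p_g
```

Contrast with the GSN chain above: one forward pass per sample instead of many chain iterations.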
31. Conclusions
• Contribution: GANs completely break away from the ML approach by switching to an adversarial minimax game formulation (restated after this list)
• Strengths:
– Easy and efficient sample generation process
– Simple training algorithm
– No need for a noise model
– State-of-the-art results (qualitatively the best)
• Weaknesses:
– No explicit likelihood representation
– Convergence problems (the Helvetica scenario, i.e. mode collapse)
– Model comparison issues (Parzen-window estimates have high variance)
– We don’t know why they work (no theoretical guarantees)
• But see Arjovsky et al. (2017) for a recent and elegant solution to the former two!
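For reference, the minimax game mentioned in the contribution above, as given in the original GAN formulation (Goodfellow et al., 2014), with discriminator D and generator G:

```latex
% GAN value function: D maximizes it, G minimizes it.
\min_{G}\max_{D}\; V(D,G) \;=\;
  \mathbb{E}_{x \sim p_{\mathrm{data}}}\!\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_{z}}\!\left[\log\bigl(1 - D(G(z))\bigr)\right]
```

At the equilibrium of this game, p_g = p_data and D outputs 1/2 everywhere, which is why training is a game between two players rather than a single optimization.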