Deep Unsupervised Learning using
Nonequilibrium Thermodynamics
Tran Quoc Hoan
@k09hthaduonght.wordpress.com/
14 December 2015, Paper Alert, Hasegawa lab., Tokyo
The University of Tokyo
Jascha Sohl-Dickstein, Eric A. Weiss, Niru Maheswaranathan, Surya Ganguli
Proceedings of the 32nd International Conference on Machine Learning, 2015
Abstract
“…The essential idea, inspired by non-equilibrium statistical
physics, is to systematically and slowly destroy structure in
a data distribution through an iterative forward diffusion
process. We then learn a reverse diffusion process
that restores structure in data, yielding a highly flexible
and tractable generative model of the data…”
Outline
• Motivation
- The promise of deep unsupervised learning
• Physical intuition
- Diffusion processes and time reversal
• Diffusion probabilistic model
- Derivation and experimental results
Deep Unsupervised Learning
• Unknown features/labels
- Novel modalities
- Ex. diseased region in a medical image
• Expensive labels
• Unpredictable tasks / one shot learning
- Exploratory data analysis
https://www.ceessentials.net/article40.html
Physical Intuition
• Diffusion processes and time reversal
- Destroy structure in data
- Carefully characterize the destruction
- Learn how to reverse time
Observation 1: Diffusion Destroys Structure
(Observation) Diffusion destroys structure: data distribution → uniform distribution
(Recover structure) Recover the data distribution by starting from the uniform distribution and running the dynamics backwards: uniform distribution → data distribution
Observation 2: Microscopic Diffusion
• Time reversible
https://www.youtube.com/watch?v=cDcprgWiQEY
• Brownian motion
• Position updates are small Gaussians (both forwards and backwards in time)
Diffusion-based Probabilistic Models
• Destroy all structure in the data distribution using a diffusion process
• Learn the reversal of the diffusion process
- Estimate functions for the mean and covariance of each step in the reverse diffusion process (for binary data, the binomial rate)
• The reverse diffusion process is the model of the data
Diffusion-based Probabilistic Models
• Algorithm
• Deep convolutional network: universal function approximator
• Multiplying distributions: imputation, denoising, computing posteriors
Destroy by Diffusion Process
Data distribution → Forward diffusion → Noise distribution
Temporal diffusion rate
Destroy by Gaussian Process
Data distribution → Forward diffusion → Noise distribution
Forward diffusion step: decay towards origin, add small noise
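The two labels on this slide ("decay towards origin" and "add small noise") can be sketched as a single update, x_t = sqrt(1 - beta_t) * x_{t-1} + sqrt(beta_t) * noise, which is the Gaussian diffusion kernel used in the paper. The ring-shaped data, the constant rate of 0.05, and the number of steps below are assumptions chosen for illustration, not values from the slides.

```python
import numpy as np

def forward_gaussian_diffusion(x0, betas, rng):
    """Apply x_t = sqrt(1 - beta_t) * x_{t-1} + sqrt(beta_t) * noise for each beta_t.

    Returns an array of shape (T + 1, *x0.shape) holding x_0 ... x_T.
    """
    xs = [np.asarray(x0, dtype=float)]
    for beta in betas:
        noise = rng.standard_normal(xs[-1].shape)
        # Decay towards the origin, then add a small amount of Gaussian noise.
        xs.append(np.sqrt(1.0 - beta) * xs[-1] + np.sqrt(beta) * noise)
    return np.stack(xs)

rng = np.random.default_rng(0)
# Hypothetical structured data: 5000 points on a ring of radius 3.
angles = rng.uniform(0.0, 2.0 * np.pi, size=5000)
x0 = 3.0 * np.stack([np.cos(angles), np.sin(angles)], axis=1)

betas = np.full(200, 0.05)  # assumed constant diffusion rate
trajectory = forward_gaussian_diffusion(x0, betas, rng)
x_final = trajectory[-1]

# After many steps the ring is gone: mean near 0, variance near 1 per coordinate.
print(x_final.mean(axis=0), x_final.var(axis=0))
```

With these settings the final samples are close to a unit Gaussian, which plays the role of the noise distribution.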
Reverse Gaussian Diffusion Process
Noise distribution → Reverse diffusion → Data distribution
Learned drift and covariance functions
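A minimal sketch of how sampling with learned drift and covariance functions might look: start from the noise distribution and repeatedly draw x_{t-1} from a Gaussian whose mean and standard deviation are functions of x_t and t. The functions `toy_mean` and `toy_sigma` are hypothetical stand-ins for trained networks (they simply pull samples towards a fixed point), and a diagonal covariance is assumed.

```python
import numpy as np

def reverse_diffusion_sample(x_T, mean_fn, sigma_fn, num_steps, rng):
    """Draw x_{t-1} ~ N(mean_fn(x_t, t), sigma_fn(x_t, t)^2) for t = T, ..., 1."""
    x = np.asarray(x_T, dtype=float)
    for t in range(num_steps, 0, -1):
        noise = rng.standard_normal(x.shape)
        x = mean_fn(x, t) + sigma_fn(x, t) * noise
    return x

# Hypothetical placeholders for the learned functions (not from the paper).
target = np.array([1.0, -1.0])

def toy_mean(x, t):
    # Drift: move halfway towards the target at every step.
    return x + 0.5 * (target - x)

def toy_sigma(x, t):
    # Small constant per-coordinate standard deviation.
    return np.full_like(x, 0.01)

rng = np.random.default_rng(0)
x_T = rng.standard_normal((4, 2))  # samples from the noise distribution
x_0 = reverse_diffusion_sample(x_T, toy_mean, toy_sigma, num_steps=10, rng=rng)
print(x_0)
```

In the actual model the two functions would be outputs of the trained network; only the sampling loop is illustrated here.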
Case Study: Swiss Roll
True model
Inference model
Training the reverse diffusion
Model probability
Annealed importance sampling
Training the reverse diffusion
Log likelihood
Jensen’s inequality
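The equations for these training slides did not survive in the transcript; only the labels (model probability, annealed importance sampling, log likelihood, Jensen's inequality) remain. As a sketch in the paper's notation, with q the forward process and p the reverse process, the steps the labels point to are roughly the following. Treat the exact form as an assumption to check against the paper.

```latex
\begin{align}
p(x^{(0)}) &= \int dx^{(1\cdots T)}\, p(x^{(0\cdots T)}) \\
 &= \int dx^{(1\cdots T)}\, q(x^{(1\cdots T)} \mid x^{(0)})\,
    p(x^{(T)}) \prod_{t=1}^{T}
    \frac{p(x^{(t-1)} \mid x^{(t)})}{q(x^{(t)} \mid x^{(t-1)})} \\
L &= \int dx^{(0)}\, q(x^{(0)}) \log p(x^{(0)}) \\
 &\ge \int dx^{(0\cdots T)}\, q(x^{(0\cdots T)})
    \log\!\left[ p(x^{(T)}) \prod_{t=1}^{T}
    \frac{p(x^{(t-1)} \mid x^{(t)})}{q(x^{(t)} \mid x^{(t-1)})} \right] = K
\end{align}
```

The second line writes the model probability as an average over forward trajectories, in the spirit of annealed importance sampling. The last line applies Jensen's inequality to move the logarithm inside the average, giving a lower bound K on the log likelihood L.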
Training the reverse diffusion
…do some algebra…
Training the reverse diffusion
…for Gaussian diffusion process…
Training: unsupervised learning becomes a regression problem
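One way to see why training turns into regression: in the paper, the bound for Gaussian diffusion is built from KL divergences between Gaussians, and when the two variances match, such a KL divergence is a scaled squared error between the means. The sketch below checks this for 1-D Gaussians. The numbers are arbitrary illustrative values, and the slide's own equation is not reproduced.

```python
import numpy as np

def gaussian_kl(mean_q, std_q, mean_p, std_p):
    """KL( N(mean_q, std_q^2) || N(mean_p, std_p^2) ) for 1-D Gaussians, in nats."""
    return (np.log(std_p / std_q)
            + (std_q**2 + (mean_q - mean_p)**2) / (2.0 * std_p**2)
            - 0.5)

# Arbitrary illustrative values: a target mean, a predicted mean, a shared std.
target_mean, predicted_mean, shared_std = 0.3, 0.5, 0.1

kl = gaussian_kl(target_mean, shared_std, predicted_mean, shared_std)
# With equal variances the KL reduces to a squared error on the means.
squared_error_term = (target_mean - predicted_mean)**2 / (2.0 * shared_std**2)
print(kl, squared_error_term)
```

Minimizing this quantity over the predicted mean is ordinary least-squares regression onto the target mean.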
Training the reverse diffusion
Setting the diffusion rate
• For binomial diffusion (erase constant fraction of stimulus variance each step): $\beta_t = (T - t + 1)^{-1}$
• For Gaussian diffusion: $\beta_1$ = small constant (prevents over-fitting); the remaining $\beta_t$ are trained
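Reading the binomial rate on this slide as beta_t = 1 / (T - t + 1), as in the paper, a short check shows what "erase a constant fraction each step" means. The fraction of the original signal that survives after t steps is the product of (1 - beta_s) for s up to t, and it falls linearly as (T - t) / T. The function name and T = 10 are assumptions for illustration.

```python
import numpy as np

def binomial_diffusion_schedule(num_steps):
    """Return beta_t = 1 / (T - t + 1) for t = 1, ..., T."""
    t = np.arange(1, num_steps + 1)
    return 1.0 / (num_steps - t + 1)

T = 10
betas = binomial_diffusion_schedule(T)

# Fraction of the original signal remaining after each step.
surviving = np.cumprod(1.0 - betas)
print(betas)
print(surviving)
```

Each step removes 1/T of the original signal, and the final step (beta_T = 1) removes whatever is left.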
Multiplying Distributions
• Required to compute posterior distribution
- Missing data (inpainting)
- Corrupted data (denoising)
• Difficult and expensive using competing techniques
- Ex. VAE, GSNs, NADEs, most graphical models
Interested in
Acts as small perturbation to diffusion process
Multiplying Distributions
• Modified marginal distributions
Interested in
Acts as small perturbation to diffusion process
Multiplying Distributions
• Modified diffusion steps
Equilibrium condition
Normalized
Multiplying Distributions
Reverse Gaussian diffusion process
Interested in
Acts as small perturbation to diffusion process
Small perturbation affects only mean
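The claim that a small perturbation affects only the mean can be checked in one dimension. Multiplying a narrow Gaussian reverse step N(mu, sigma^2) by a smooth factor r(x) shifts the mean by roughly sigma^2 times the gradient of log r at mu and leaves the variance almost unchanged. The sketch below compares this first-order shift with the exact product for a Gaussian r. All numbers and the Gaussian form of r are assumptions for illustration.

```python
import numpy as np

def perturbed_reverse_mean(mu, sigma, grad_log_r):
    """First-order mean shift: mu + sigma^2 * d/dx log r(x), evaluated at x = mu."""
    return mu + sigma**2 * grad_log_r(mu)

# Hypothetical setup: a narrow reverse step N(mu, sigma^2) multiplied by a
# broad factor r(x) proportional to exp(-(x - a)^2 / (2 * s^2)).
mu, sigma = 0.2, 0.1
a, s = 1.0, 2.0

def grad_log_r(x):
    return (a - x) / s**2

approx_mean = perturbed_reverse_mean(mu, sigma, grad_log_r)

# Exact product of the two Gaussians, for comparison.
exact_precision = 1.0 / sigma**2 + 1.0 / s**2
exact_mean = (mu / sigma**2 + a / s**2) / exact_precision
exact_std = exact_precision ** -0.5

print(approx_mean, exact_mean)  # nearly identical means
print(sigma, exact_std)         # standard deviation barely changes
```

The approximation holds because r varies slowly on the scale of one diffusion step (s is much larger than sigma here).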
Deep Network as Approximator for Images
Multi-scale convolution
Downsample
Convolve
Upsample
Sum
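The four steps above (downsample, convolve, upsample, sum) can be sketched with NumPy alone. This is a minimal illustration, assuming mean pooling for downsampling, nearest-neighbour upsampling, scales at powers of two, and one 2-D kernel per scale. The helper names are hypothetical, and the paper's network also uses multiple channels and a nonlinearity that are omitted here.

```python
import numpy as np

def downsample(image, factor):
    """Mean-pool a 2-D image by an integer factor (sides must be divisible by it)."""
    h, w = image.shape
    return image.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def upsample(image, factor):
    """Nearest-neighbour upsampling by an integer factor."""
    return np.repeat(np.repeat(image, factor, axis=0), factor, axis=1)

def convolve_same(image, kernel):
    """Zero-padded 'same' cross-correlation with an odd-sized kernel."""
    kh, kw = kernel.shape
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros(image.shape, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[i:i + image.shape[0], j:j + image.shape[1]]
    return out

def multi_scale_convolution(image, kernels):
    """Downsample by 2**s, convolve with kernels[s], upsample, and sum over scales s."""
    total = np.zeros(image.shape, dtype=float)
    for scale, kernel in enumerate(kernels):
        factor = 2 ** scale
        coarse = downsample(image, factor)
        total += upsample(convolve_same(coarse, kernel), factor)
    return total

rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))
identity = np.zeros((3, 3))
identity[1, 1] = 1.0  # kernel that passes each scale through unchanged
output = multi_scale_convolution(image, [identity, identity, identity])
print(output.shape)
```

Summing over scales lets small kernels capture long-range structure, since a 3x3 kernel at the coarsest scale covers a much larger patch of the original image.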
Applied to CIFAR-10
Training data
Samples from Generative Adversarial [Goodfellow et al, 2014]
Samples from diffusion model
Applied to CIFAR-10
Samples from DRAW [Gregor et al, 2015]
Samples from Generative Adversarial [Goodfellow et al, 2014]
Samples from diffusion model
Applied to Dead Leaves
Training data
Samples from [Theis et al, 2012]: log likelihood 1.24 bits/pixel
Samples from diffusion model: log likelihood 1.49 bits/pixel
Applied to Inpainting
Table App.1
References
http://jmlr.org/proceedings/papers/v37/sohl-dickstein15.html
http://videolectures.net/icml2015_sohl_dickstein_deep_unsupervised_learning/
http://www.inference.vc/icml-paper-unsupervised-learning-by-inverting-diffusion-processes/
