Pixel Recurrent Neural Networks
Google DeepMind
Presented by Osman Tursun
METU, CENG, KOVAN Lab.
Outline
1. Generative model
2. Proposed models
3. Optimization
4. Experiment and results
5. Conclusion
Generative model
Generative model
What I cannot create, I do not understand.
Richard Feynman
Why generative models?
• Unsupervised learning is the future
• Many applications: image compression, deblurring, synthesizing images and video frames, text-to-image generation, and more
Challenges of generative models
• Modeling the probabilistic dependencies on previously generated content, e.g. earlier pixels
• Complex, high-dimensional structures such as images
• Difficulty of training models that are expressive, tractable, and scalable at the same time
Generative models
• Latent variable models (VAE, DRAW¹)
• Adversarial models (GAN²)
• Autoregressive models (NADE³, MADE⁴, RIDE⁵)
1 Karol Gregor et al. “DRAW: A recurrent neural network for image generation”. In: arXiv preprint arXiv:1502.04623 (2015).
2 Ian Goodfellow et al. “Generative adversarial nets”. In: NIPS. 2014.
3 Hugo Larochelle and Iain Murray. “The Neural Autoregressive Distribution Estimator.” In: AISTATS. vol. 1. 2011, p. 2.
4 Mathieu Germain et al. “MADE: Masked Autoencoder for Distribution Estimation.” In: ICML. 2015.
5 Lucas Theis and Matthias Bethge. “Generative Image Modeling Using Spatial LSTMs”. In: NIPS. 2015.
Comparison of generative models
Image Generation Models
Three image generation approaches dominate the field: Variational Autoencoders (VAE), Generative Adversarial Networks (GAN), and autoregressive models.
Figure: schematics of a VAE (encoder q_φ(z|x), latent z ~ p_θ(z), decoder x ~ p_θ(x|z)) and a GAN (generator G maps z to a fake sample, discriminator D decides real vs. fake), alongside autoregressive models (cf. https://openai.com/blog/generative-models/).
VAE
  Pros: efficient inference with approximate latent variables.
  Cons: generated samples tend to be blurry.
GAN
  Pros: generates sharp images; no Markov chain or approximate inference networks needed during sampling.
  Cons: difficult to optimize due to unstable training dynamics.
Autoregressive models
  Pros: very simple and stable training process; tractable likelihood; currently gives the best log-likelihoods.
  Cons: relatively inefficient during sampling.
This slide is from Yohei Sugawara
Proposed models
Auto-regressive image modeling
The joint distribution over the image pixels is factorized into a product of per-pixel conditional distributions (in raster-scan order), and each pixel's distribution is further factorized over its three colour channels:

p(\mathbf{x}) = \prod_{i=1}^{n^2} p(x_i \mid x_1, \ldots, x_{i-1})

p(x_i \mid \mathbf{x}_{<i}) = p(x_{i,R} \mid \mathbf{x}_{<i}) \; p(x_{i,G} \mid \mathbf{x}_{<i}, x_{i,R}) \; p(x_{i,B} \mid \mathbf{x}_{<i}, x_{i,R}, x_{i,G})
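To make the factorization concrete, here is a minimal sampling sketch in Python; the `model` callable is a hypothetical stand-in for a trained PixelRNN/PixelCNN that returns a 256-way distribution for one colour channel of one pixel, conditioned on everything generated so far.

```python
import numpy as np

def sample_image(model, n, rng=None):
    """Sample an n x n RGB image pixel by pixel, channel by channel.

    `model(image, i, c)` is a hypothetical callable that returns a length-256
    probability vector for channel c of pixel i, conditioned on pixels 0..i-1
    and on the earlier channels of pixel i already written into `image`.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    image = np.zeros((n, n, 3), dtype=np.uint8)
    for i in range(n * n):                 # raster-scan order
        row, col = divmod(i, n)
        for c in range(3):                 # R, then G, then B
            probs = model(image, i, c)     # p(x_{i,c} | x_<i, earlier channels)
            image[row, col, c] = rng.choice(256, p=probs)
    return image

# Dummy usage with a uniform "model", just to show the calling convention:
uniform = lambda image, i, c: np.full(256, 1.0 / 256)
sample = sample_image(uniform, n=8)
```

Sampling therefore requires n² sequential evaluations of the conditional distribution; the architectures below differ mainly in how cheaply and with how much context that conditional can be computed.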
Proposed models
• PixelRNN: Row LSTM and Diagonal BiLSTM variants
• PixelCNN
• Multi-Scale PixelRNN
Generative image modeling with Spatial LSTM
MCGSM: mixture of conditional Gaussian scale mixtures⁶
The figure is from RIDE⁷
6 Lucas Theis, Reshad Hosseini, and Matthias Bethge. “Mixtures of conditional Gaussian scale mixtures applied to multiscale image representations”. In: PloS one (2012).
7 Lucas Theis and Matthias Bethge. “Generative Image Modeling Using Spatial LSTMs”. In: NIPS. 2015.
Row LSTM
• Captures a roughly triangular context above each pixel
• Uses 1-D convolutions along the rows with kernel size k × 1, k ≥ 3
• The convolution is masked to exclude future pixels
• The input-to-state component is computed in parallel over the whole image, producing a 4h × n × n feature map for the four LSTM gates (see the sketch below)
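A sketch of that parallel input-to-state computation, assuming PyTorch and a hypothetical `RowLSTMInputToState` module; the masking of the kernel against future pixels is omitted for brevity, so this only illustrates how the 4h gate features for every position come out of a single row-wise convolution.

```python
import torch
import torch.nn as nn

class RowLSTMInputToState(nn.Module):
    """Input-to-state term of a Row LSTM layer (illustrative sketch only).

    A 1 x k convolution along each row maps h input channels to 4h channels,
    which are later split into the i, f, o, g gates of the LSTM.
    """
    def __init__(self, h, k=3):
        super().__init__()
        self.conv = nn.Conv2d(h, 4 * h, kernel_size=(1, k), padding=(0, k // 2))

    def forward(self, x):                  # x: (batch, h, n, n)
        return self.conv(x)                # (batch, 4h, n, n), all rows in parallel

gates = RowLSTMInputToState(h=16)(torch.zeros(1, 16, 32, 32))
i, f, o, g = gates.chunk(4, dim=1)          # four LSTM gate pre-activations
```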
Diagonal BiLSTM
• Captures the entire available context
• Scans the image along its diagonals, in both directions
Diagonal BiLSTM Skew Operation
• Parallelized via a skew operation (see the sketch below)
• n × n ←→ n × (2n − 1)
• The state-to-state convolutional kernel is 2 × 1
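A minimal NumPy sketch of the skew map for a single channel (batch and feature dimensions omitted): each row i is shifted right by i positions so that the image diagonals line up as columns, which is what lets the diagonal LSTM be computed column by column.

```python
import numpy as np

def skew(x):
    """Map an (n, n) array to (n, 2n-1) by shifting row i right by i positions."""
    n = x.shape[0]
    out = np.zeros((n, 2 * n - 1), dtype=x.dtype)
    for i in range(n):
        out[i, i:i + n] = x[i]
    return out

def unskew(y):
    """Inverse of skew: recover the original (n, n) array."""
    n = y.shape[0]
    return np.stack([y[i, i:i + n] for i in range(n)])

x = np.arange(16).reshape(4, 4)
assert np.array_equal(unskew(skew(x)), x)   # round trip recovers the input
```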
PixelCNN
• A large but bounded receptive field replaces the PixelRNN’s unbounded dependency range
• Turns generation into a pixel-level classification problem
• Training is fully parallel, but test-time generation is still sequential, pixel by pixel (see the sketch below)
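A toy PixelCNN-style stack, written as a rough PyTorch sketch rather than the paper's exact architecture: a mask-'A' first layer, a few mask-'B' layers, and a 1 × 1 convolution producing 256-way logits per pixel. The masks here are purely spatial; the R/G/B channel-group rule is sketched separately in the Masked Convolution section below.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedConv2d(nn.Conv2d):
    """Conv2d whose kernel is zeroed at positions that would see future pixels.

    Mask 'A' also zeroes the centre weight (used only in the first layer);
    mask 'B' keeps it. Channel-group masking for R/G/B is omitted here.
    """
    def __init__(self, mask_type, *args, **kwargs):
        super().__init__(*args, **kwargs)
        kh, kw = self.kernel_size
        mask = torch.ones(kh, kw)
        mask[kh // 2, kw // 2 + (mask_type == 'B'):] = 0   # centre (A only) and right of it
        mask[kh // 2 + 1:, :] = 0                          # every row below the centre
        self.register_buffer('mask', mask)

    def forward(self, x):
        return F.conv2d(x, self.weight * self.mask, self.bias,
                        self.stride, self.padding)

def tiny_pixelcnn(h=32, layers=3):
    """Mask-A 7x7 input layer, mask-B 3x3 stack, then 256 logits per pixel."""
    blocks = [MaskedConv2d('A', 1, h, kernel_size=7, padding=3), nn.ReLU()]
    for _ in range(layers):
        blocks += [MaskedConv2d('B', h, h, kernel_size=3, padding=1), nn.ReLU()]
    blocks.append(nn.Conv2d(h, 256, kernel_size=1))
    return nn.Sequential(*blocks)

logits = tiny_pixelcnn()(torch.zeros(1, 1, 28, 28))   # (1, 256, 28, 28)
```

At training time all positions are evaluated in one parallel pass; at sampling time the network still has to be run once per pixel (and channel), which is why generation remains sequential.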
PixelRNN vs PixelCNN
“Pixel Recurrent Neural Networks” received the Best Paper Award at ICML 2016. It proposes two families of models, PixelRNN and PixelCNN (and two types of LSTM layers for PixelRNN: Row LSTM and Diagonal BiLSTM).

PixelRNN (Row LSTM, Diagonal BiLSTM)
  Pros: effectively handles long-range dependencies ⇒ good performance.
  Cons: each state must be computed sequentially ⇒ computationally expensive.
PixelCNN (masked convolution)
  Pros: convolutions are easy to parallelize ⇒ much faster to train.
  Cons: bounded receptive field ⇒ inferior performance; the blind-spot problem caused by the masked convolution needs to be eliminated.

LSTM-based models are a natural choice for the autoregressive dependencies, while the CNN-based model uses masked convolutions (zeroing part of the 3 × 3 kernel weights w11 … w33) to keep the model causal.

This slide is from Yohei Sugawara
Multi-scale PixelRNN
• One unconditional PixelRNN plus one or more conditional PixelRNNs
• The unconditional network models a small, subsampled version of the original image
• Each conditional network is similar to a PixelRNN, but its layers are biased by an upsampled version of the smaller image (see the sketch below)
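A rough sketch of the biasing step, with hypothetical module and parameter names: the small image is mapped to a full-resolution conditioning map by a transposed convolution, and that map would be added as an extra bias to each layer's input-to-state term in the conditional network.

```python
import torch
import torch.nn as nn

class UpsampleCondition(nn.Module):
    """Turn a small s x s image into a (hidden, s*factor, s*factor) bias map.

    A single transposed convolution stands in for the paper's upsampling
    network; `factor` is the ratio between the large and small image sizes.
    """
    def __init__(self, channels, hidden, factor):
        super().__init__()
        self.deconv = nn.ConvTranspose2d(channels, hidden,
                                         kernel_size=factor, stride=factor)

    def forward(self, small):              # small: (batch, channels, s, s)
        return self.deconv(small)          # (batch, hidden, s*factor, s*factor)

cond = UpsampleCondition(channels=3, hidden=16, factor=4)
bias_map = cond(torch.zeros(1, 3, 8, 8))   # (1, 16, 32, 32), added to each layer's gates
```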
Optimization
Residual Connections
• Deep network: PixelRNN 12 layers, PixelCNN 15 layers
• Residual connections increase convergence speed and help propagate the signal through many layers (see the sketch below)
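A minimal residual wrapper, assuming (as the paper's blocks do) that the wrapped layer preserves the feature-map shape; the exact block layout in the paper differs, so treat this purely as an illustration of the skip connection.

```python
import torch
import torch.nn as nn

class Residual(nn.Module):
    """Compute x + f(x) around any shape-preserving layer f."""
    def __init__(self, layer):
        super().__init__()
        self.layer = layer

    def forward(self, x):
        return x + self.layer(x)   # the identity path lets gradients flow directly

# e.g. a 12-layer stack of residual blocks around some hypothetical make_layer(h):
# net = nn.Sequential(*[Residual(make_layer(h)) for _ in range(12)])
```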
Masked Convolution
• Masks are applied to the convolution kernels so the model cannot see future context.
• Mask A is used only at the first convolutional layer; mask B is used in all subsequent input-to-state convolutions (a sketch of the mask construction follows the citation below).
MADE: Masked Autoencoder for Distribution Estimation⁸
8 Mathieu Germain et al. “MADE: Masked Autoencoder for Distribution Estimation.” In: ICML. 2015.
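A NumPy sketch of how such a mask can be built, following the paper's description (exact implementations vary): everything strictly before the centre position is kept for all channels, and at the centre the channels are split into R/G/B groups so that mask A sees only strictly earlier groups while mask B may also see its own group.

```python
import numpy as np

def build_mask(mask_type, out_ch, in_ch, k):
    """Return an (out_ch, in_ch, k, k) {0,1} mask for a masked convolution."""
    mask = np.ones((out_ch, in_ch, k, k), dtype=np.float32)
    c = k // 2
    mask[:, :, c, c + 1:] = 0              # right of the centre pixel
    mask[:, :, c + 1:, :] = 0              # rows below the centre pixel

    def group(ch, total):                  # assign each channel to the R, G or B group
        return ch * 3 // total

    for o in range(out_ch):
        for i in range(in_ch):
            if mask_type == 'A':           # first layer: only strictly earlier groups
                keep = group(i, in_ch) < group(o, out_ch)
            else:                          # mask 'B': a group may also see its own features
                keep = group(i, in_ch) <= group(o, out_ch)
            mask[o, i, c, c] = 1.0 if keep else 0.0
    return mask

mask_A = build_mask('A', out_ch=3, in_ch=3, k=5)   # e.g. first layer on RGB input
mask_B = build_mask('B', out_ch=6, in_ch=6, k=3)
```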
Discrete Softmax Distribution
• Turns a regression problem into a 256-way classification problem per colour channel
• Simple to implement, yet gives better results (see the sketch below)
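A sketch of the resulting training objective, assuming per-channel logits of shape (batch, 256, H, W): every pixel value in {0, …, 255} is a class, and the loss is ordinary cross-entropy, whose value is the negative log-likelihood used for evaluation.

```python
import torch
import torch.nn.functional as F

def pixel_nll(logits, targets):
    """Mean negative log-likelihood over the 256 discrete pixel values.

    logits:  (batch, 256, H, W) unnormalised scores for one colour channel
    targets: (batch, H, W) integer pixel values in [0, 255]
    """
    return F.cross_entropy(logits, targets)

logits = torch.randn(2, 256, 8, 8)
targets = torch.randint(0, 256, (2, 8, 8))
loss = pixel_nll(logits, targets)          # nats per pixel; divide by log(2) for bits
```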
Experiment and results
Specification of Models
Evaluation
• Datasets: MNIST, CIFAR-10, and ImageNet
• Metric: log-likelihood (reported in nats for MNIST and in bits per dimension for CIFAR-10 and ImageNet)
Quantitative results
Image completions
Conclusion
Summary
• Row LSTM, Diagonal BiLSTM, and PixelCNN models
• A discrete softmax output distribution
• Masked convolutions
• Residual connections
• New state of the art on MNIST and CIFAR-10, with results also reported on ImageNet
Useful resources
• Sergei Turukin PixelCNN post and implementation
• PixelRNN conference presentation
• PixelRNN review by Kyle Kastner
• Blog post on DRAW
Questions?
