Pixel Recurrent Neural Networks
Google DeepMind
Presented by Osman Tursun
METU, CENG, KOVAN Lab.
Outline
1. Generative model
2. Proposed models
3. Optimization
4. Experiment and results
5. Conclusion
1
Generative model
Generative model
What I cannot create, I do not understand.
Richard Feynman
2
Why generative model?
• Unsupervised learning is the future
• Many applications: image compression, deblurring, generating synthetic
images and frames, text-to-image synthesis, and so on.
3
Challenges of generative model
• Probabilistic dependencies on previous content, such as earlier pixels
• Complex, high-dimensional structures such as images
• Difficulty of building models that are expressive, tractable, and scalable
at the same time
4
Generative models
• Latent variable models (VAEs, DRAW [1])
• Adversarial models (GAN [2])
• Autoregressive models (NADE [3], MADE [4], RIDE [5])
[1] Karol Gregor et al. “DRAW: A recurrent neural network for image generation”. In: arXiv preprint arXiv:1502.04623 (2015).
[2] Ian Goodfellow et al. “Generative adversarial nets”. In: NIPS. 2014.
[3] Hugo Larochelle and Iain Murray. “The Neural Autoregressive Distribution Estimator”. In: AISTATS. Vol. 1. 2011, p. 2.
[4] Mathieu Germain et al. “MADE: Masked Autoencoder for Distribution Estimation”. In: ICML. 2015.
[5] Lucas Theis and Matthias Bethge. “Generative Image Modeling Using Spatial LSTMs”. In: NIPS. 2015.
5
Comparison of generative models
Image Generation Models
Three image generation approaches are dominating the field: Variational AutoEncoders (VAE), Generative Adversarial Networks (GAN), and Autoregressive Models (cf. https://openai.com/blog/generative-models/).
[Figure: VAE with prior z ~ p_θ(z), decoder x ~ p_θ(x|z), and encoder q_φ(z|x); GAN with generator G producing fake samples from z and discriminator D deciding real/fake.]
VAE
• Pros: efficient inference with approximate latent variables.
• Cons: generated samples tend to be blurry.
GAN
• Pros: generates sharp images; no need for any Markov chain or approximation networks during sampling.
• Cons: difficult to optimize due to unstable training dynamics.
Autoregressive Models
• Pros: very simple and stable training process; currently gives the best log-likelihood; tractable likelihood.
• Cons: relatively inefficient during sampling.
This slide is from Yohei Sugawara
6
Proposed models
Auto-regressive image modeling
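The joint distribution over the image pixels is factorized into a product of
conditional distributions:
$$p(\mathbf{x}) = \prod_{i=1}^{n^2} p(x_i \mid x_1, \ldots, x_{i-1})$$
Each pixel is in turn factorized over its three color channels:
$$p(x_i \mid \mathbf{x}_{<i}) = p(x_{i,R} \mid \mathbf{x}_{<i})\, p(x_{i,G} \mid \mathbf{x}_{<i}, x_{i,R})\, p(x_{i,B} \mid \mathbf{x}_{<i}, x_{i,R}, x_{i,G})$$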
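The factorization on this slide can be illustrated with a small sketch. The conditional used here, `toy_conditional`, is a hypothetical stand-in and not the PixelRNN network: it is only a smoothed histogram of the values seen so far, for a single channel with 4 possible values. The sketch shows that log p(x) is the sum of the per-pixel conditional log-probabilities taken in raster-scan order.

```python
import math

def toy_conditional(previous, num_values=4):
    """Hypothetical stand-in for p(x_i | x_<i): an add-one smoothed
    histogram of the pixel values seen so far (not a neural network)."""
    counts = [1.0] * num_values
    for value in previous:
        counts[value] += 1.0
    total = sum(counts)
    return [c / total for c in counts]

def log_likelihood(image, num_values=4):
    """log p(x) = sum_i log p(x_i | x_1, ..., x_{i-1}) in raster-scan order."""
    pixels = [v for row in image for v in row]  # row by row, left to right
    total = 0.0
    for i, value in enumerate(pixels):
        probs = toy_conditional(pixels[:i], num_values)
        total += math.log(probs[value])
    return total

image = [[0, 1],
         [1, 1]]  # a 2 x 2 single-channel image
ll = log_likelihood(image)  # log(1/4 * 1/5 * 2/6 * 3/7)
```

Replacing `toy_conditional` with a learned model leaves the summation unchanged, which is why the likelihood of an autoregressive model is tractable.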
7
Proposed models
• PixelRNN: Row LSTM, Diagonal BiLSTM
• PixelCNN
• Multi-Scale PixelRNN
8
Generative image modeling with Spatial LSTM
MCGSM: mixtures of conditional Gaussian scale mixtures [6]
The figure is from RIDE [7]
[6] Lucas Theis, Reshad Hosseini, and Matthias Bethge. “Mixtures of conditional Gaussian scale mixtures applied to multiscale image representations”. In: PLoS ONE (2012).
[7] Lucas Theis and Matthias Bethge. “Generative Image Modeling Using Spatial LSTMs”. In: NIPS. 2015.
9
Row LSTM
• Captures a roughly triangular context above the pixel.
• Uses a 1-D convolution with kernel size k ≥ 3
• The input-to-state convolution is masked
• The input-to-state component is computed in parallel for the whole
image (output feature size is 4h × n × n)
10
Diagonal BiLSTM
• Captures the entire available context
• Scans the image along its diagonals
11
Diagonal BiLSTM Skew Operation
• Computation along diagonals is parallelized by a skew operation
• n × n ←→ n × (2n − 1)
• The state-to-state convolutional kernel is 2 × 1
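The skew operation can be sketched in a few lines of plain Python. This is a minimal illustration, assuming each row is shifted one position to the right relative to the row above it and padded with a `pad` value; the helper names `skew` and `unskew` are hypothetical. After skewing, every column of the n × (2n − 1) map holds one diagonal of the original image, so a column-wise pass processes a whole diagonal at once.

```python
def skew(image, pad=0):
    """Offset row r by r positions to the right: n x n -> n x (2n - 1)."""
    n = len(image)
    width = 2 * n - 1
    return [[pad] * r + list(row) + [pad] * (width - n - r)
            for r, row in enumerate(image)]

def unskew(skewed):
    """Inverse of skew: drop the padding, n x (2n - 1) -> n x n."""
    n = len(skewed)
    return [row[r:r + n] for r, row in enumerate(skewed)]

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
skewed = skew(image)

# Collect each column of the skewed map, skipping padding. This works here
# only because the example image contains no zeros.
columns = [[row[c] for row in skewed if row[c] != 0]
           for c in range(len(skewed[0]))]
```

Here `columns` lists the anti-diagonals of `image` in scan order, and `unskew(skewed)` recovers the original n × n map.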
12
PixelCNN
• A large but bounded receptive field replaces
the PixelRNN’s unbounded dependency range
• Turns the problem into a pixel-level
classification problem
• Parallel over pixels at training time, but
sequential at generation (test) time
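The asymmetry between training and generation can be seen in a small sampling loop. This is a sketch under stated assumptions: `toy_predict` is a hypothetical stand-in for one forward pass of the network (it merely favors repeating the left neighbor), and the image has a single channel with 4 possible values. Each pixel has to be sampled before the next one can be predicted, so an n × n image needs n² sequential predictions.

```python
import random

def toy_predict(image, row, col, num_values=4):
    """Hypothetical stand-in for a network forward pass: returns a
    distribution over the value at (row, col) given earlier pixels."""
    probs = [1.0] * num_values
    if col > 0:
        probs[image[row][col - 1]] += 2.0  # favor repeating the left neighbor
    total = sum(probs)
    return [p / total for p in probs]

def sample_image(n, predict, rng):
    """Generate an n x n image pixel by pixel in raster-scan order."""
    image = [[0] * n for _ in range(n)]
    calls = 0
    for row in range(n):
        for col in range(n):
            probs = predict(image, row, col)  # depends on pixels sampled so far
            calls += 1
            image[row][col] = rng.choices(range(len(probs)), weights=probs)[0]
    return image, calls

image, calls = sample_image(4, toy_predict, random.Random(0))
```

During training all pixels of the image are known, so the same conditionals can be evaluated for every position at once instead of in a loop.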
13
PixelRNN vs PixelCNN
Previous work: Pixel Recurrent Neural Networks.
• “Pixel Recurrent Neural Networks” got the best paper award at ICML 2016.
• They proposed two types of models, PixelRNN and PixelCNN
(two types of LSTM layers are proposed for PixelRNN: Row LSTM and Diagonal BiLSTM).
• LSTM-based models are a natural choice for
dealing with the autoregressive dependencies.
• The CNN-based model uses masked convolution
to ensure the model is causal.
[Figure: 3 × 3 masked convolution kernel with weights
w11 w12 w13
w21 w22 w23
w31 w32 w33]
PixelRNN
• Pros: effectively handles long-range dependencies ⇒ good performance
• Cons: each state needs to be computed sequentially ⇒ computationally expensive
PixelCNN
• Pros: convolutions are easier to parallelize ⇒ much faster to train
• Cons: bounded receptive field ⇒ inferior performance; the blind spot problem (due to the masked convolution) needs to be eliminated.
This slide is from Yohei Sugawara
14
Multi-scale PixelRNN
• An unconditional PixelRNN and one or more
conditional PixelRNNs
• The unconditional network first generates a small
image, subsampled from the original.
• The conditional network is similar to a standard
PixelRNN, but is biased by an up-sampled
version of the given small image.
15
Optimization
Residual Connections
• Deep networks: PixelRNN with up to 12 layers, PixelCNN with 15 layers
• Residual connections increase convergence speed and propagate signals
more directly through the network
16
Masked Convolution
• Masks are adopted to avoid capturing future context.
• Mask A is applied only to the first convolutional layer; mask B is applied
to all the subsequent input-to-state convolutional transitions.
Related work: MADE: Masked Autoencoder for Distribution Estimation [8]
[8] Mathieu Germain et al. “MADE: Masked Autoencoder for Distribution Estimation”. In: ICML. 2015.
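The two masks can be sketched for the single-channel case. This is a simplified illustration: the function name `causal_mask` is hypothetical, and the per-color (R, G, B) ordering that the full model also needs is left out. Both masks zero the kernel weights below the center and to the right of it; as sketched here, mask A also zeroes the center itself, while mask B keeps it.

```python
def causal_mask(kernel_size, mask_type):
    """Spatial mask for a kernel_size x kernel_size convolution kernel.
    1 = weight kept, 0 = weight zeroed. Single-channel simplification."""
    if mask_type not in ("A", "B"):
        raise ValueError("mask_type must be 'A' or 'B'")
    center = kernel_size // 2
    mask = []
    for row in range(kernel_size):
        mask_row = []
        for col in range(kernel_size):
            above = row < center
            left_in_same_row = row == center and col < center
            is_center = row == center and col == center
            keep = above or left_in_same_row or (is_center and mask_type == "B")
            mask_row.append(1 if keep else 0)
        mask.append(mask_row)
    return mask

mask_a = causal_mask(3, "A")  # first layer: current pixel hidden
mask_b = causal_mask(3, "B")  # later layers: center position allowed
```

Multiplying the kernel weights elementwise by the mask before each convolution keeps every output position from seeing pixels that come later in raster-scan order.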
17
Discrete Softmax Distribution
• Turns a regression problem into a classification problem: each channel
value is one of 256 discrete values, modeled with a softmax
• Easy to implement, and gives better results
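A minimal sketch of the per-channel classification loss follows, assuming the network emits 256 logits for one channel value. The logits below are a hypothetical placeholder, not real network output. The loss is the negative log-probability that the softmax assigns to the true value.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def pixel_nll(logits, target):
    """Negative log-likelihood (in nats) of one channel value in 0..255."""
    return -math.log(softmax(logits)[target])

uniform_logits = [0.0] * 256          # hypothetical output with no preference
nll_uniform = pixel_nll(uniform_logits, target=128)   # equals log(256)

peaked_logits = [0.0] * 256
peaked_logits[128] = 5.0              # the model favors the true value
nll_peaked = pixel_nll(peaked_logits, target=128)     # lower than nll_uniform
```

Because the output is a full distribution over the 256 values, it can be multimodal, which a single regression output cannot express.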
18
Experiment and results
Specification of Models
19
Evaluation
• Datasets: MNIST, CIFAR-10, and ImageNet
• Metric: log-likelihood
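Log-likelihoods of image models are often reported either in nats per image or in bits per dimension. The slide does not state the unit, so the conversion below is only a sketch of the arithmetic, with a hypothetical helper name `bits_per_dim`. It uses a 32 × 32 × 3 image (the CIFAR-10 size) and a uniform model over 256 values as a sanity check.

```python
import math

def bits_per_dim(nll_nats, num_dims):
    """Convert a negative log-likelihood in nats for a whole image
    into bits per dimension (per channel value)."""
    return nll_nats / (num_dims * math.log(2))

cifar_dims = 32 * 32 * 3                      # pixels times color channels
uniform_nll = cifar_dims * math.log(256)      # uniform model over 256 values
bpd = bits_per_dim(uniform_nll, cifar_dims)   # 8 bits per dimension
```

A uniform model gives 8 bits per dimension, so any useful model should score below that.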
20
Quantitative results
21
Image completions
22
Conclusion
Summary
• Row LSTM and Diagonal BiLSTM, PixelCNN
• Using a softmax layer
• Using masked convolution
• Using residual connections
• New state of the art on MNIST and CIFAR-10; also tested on ImageNet
23
Useful resources
• Sergei Turukin's PixelCNN post and implementation
• PixelRNN conference presentation
• PixelRNN review by Kyle Kastner
• Post on DRAW
24
Questions?
24
