Abhishek Koirala Neural Discrete Representation Learning
Neural Discrete Representation Learning
(DeepMind Research)
Paper Review
Authors:
● Aäron van den Oord
● Oriol Vinyals
● Koray Kavukcuoglu
What does the paper introduce?
The concept of VQ-VAEs
They differ from VAEs in two ways:
1) The encoder network outputs discrete rather than continuous codes
2) The prior is learnt rather than static
AutoEncoders
Limitations of AutoEncoders
● Fixed dimensional latent space
● Cannot generate new samples directly from the latent space
○ The latent space is unstructured, so an arbitrary latent point need not decode to a valid sample
● Difficulty in generating a new variety of samples
○ Autoencoders rely only on a reconstruction loss, which limits variability
Variational AutoEncoders (VAE)
● Probabilistic modelling of latent space
● Generally enforce a prior, typically a standard normal distribution
● Structured latent space
○ VAEs enforce a probabilistic prior, resulting in a structured latent space
● Regularization and control over latent space
● Sampling based generation
○ Enables generation of new samples by sampling from the
learned latent space distribution
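A minimal sketch of the two ingredients above, assuming a diagonal Gaussian posterior and a standard normal prior: the closed-form KL term that regularizes the latent space, and sampling via z = mu + sigma * eps. The function names and the example encoder outputs are hypothetical, and plain Python lists are used to keep the sketch dependency-free.

```python
import math
import random

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) in nats, closed form."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))

def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]

rng = random.Random(0)

# Hypothetical encoder outputs for one input (illustrative values only).
mu, log_var = [0.5, -1.0], [0.0, -0.5]

z = reparameterize(mu, log_var, rng)       # latent used during training
kl = kl_to_standard_normal(mu, log_var)    # regularization term

# Generation: sample directly from the prior N(0, I), then decode.
z_new = [rng.gauss(0.0, 1.0) for _ in mu]
```

The KL term is zero only when the posterior equals the prior, which is the pressure that keeps the latent space structured.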
Limitations of VAEs
● Latent features are hard to disentangle
● Static prior
● Posterior collapse
Joseph Rocca, Understanding Variational AutoEncoders
Vector Quantized Variational AutoEncoders (VQ-VAE)
● Discrete latent representation
● The prior is learnt from the data rather than assumed to be static
● Avoids posterior collapse
Additional Contributions
● The discrete latent model performs as well as its continuous counterpart in log-likelihood
● When paired with a powerful prior, the samples are of high quality across a wide range of applications such as image, speech and video generation
Vector Quantized Variational AutoEncoders (VQ-VAE)
[Figure: VAE with posterior distribution and prior distribution p(z)]
In VAEs
● Posterior and prior are assumed to be normally distributed
In VQ-VAE
● Discrete latent variables
● Novel training method inspired by vector quantization
● Posterior and prior distributions are categorical
Yang, Y. et al. (2019). Improving the classification effectiveness of intrusion detection by using improved conditional variational autoencoder and deep neural network. Sensors, 19(11), 2528.
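One consequence of categorical distributions can be checked numerically. Assuming a deterministic (one-hot) posterior and a uniform prior over K codes, which is the paper's training setup, the KL divergence is the constant log K and so does not affect the gradients. The sketch below uses K = 512 as an illustrative codebook size; the function name is hypothetical.

```python
import math

def kl_one_hot_to_uniform(num_codes, chosen_index):
    """KL(q || p) in nats for a one-hot q and a uniform p over num_codes."""
    q = [1.0 if k == chosen_index else 0.0 for k in range(num_codes)]
    p = 1.0 / num_codes
    # Terms with q_k = 0 contribute nothing (0 * log 0 is taken as 0).
    return sum(qk * math.log(qk / p) for qk in q if qk > 0.0)

K = 512  # illustrative codebook size
kl_nats = kl_one_hot_to_uniform(K, chosen_index=7)
kl_bits = kl_nats / math.log(2.0)  # log2(512) = 9 bits per latent
```

The result is the same whichever index is chosen, which is why this term can be dropped from the training objective.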
Vector Quantization
● Codebook initialization: Create an initial set of representative codewords that serve as prototypes for the discrete latent variables
● Encoding: Map the input data to the nearest codeword in the codebook, quantizing the continuous values into discrete latent variables
● Discrete latent variables: Represent the input data using the discrete latent variables obtained from the encoding step
● Decoding: Reconstruct the data from the discrete latent variables by mapping them back to the corresponding codewords in the codebook
● Training
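The encoding and decoding steps above can be sketched as a nearest-neighbour lookup. This is a minimal illustration with a hand-written 2-D codebook and squared Euclidean distance; the names `quantize` and `squared_distance` and all values are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def quantize(vectors, codebook):
    """Map each vector to the index of its nearest codeword (encoding),
    then look the codewords back up (decoding)."""
    indices = [
        min(range(len(codebook)), key=lambda k: squared_distance(v, codebook[k]))
        for v in vectors
    ]
    quantized = [list(codebook[k]) for k in indices]
    return indices, quantized

# Hypothetical codebook with K = 3 codewords of dimension 2.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]
encoder_outputs = [[0.9, 1.2], [-0.2, 0.1], [-0.8, 1.7]]

indices, quantized = quantize(encoder_outputs, codebook)
print(indices)  # → [1, 0, 2]
```

Only the integer indices need to be stored or modelled; the codebook turns them back into continuous vectors for the decoder.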
Training VQ-VAE
Forward Pass
● Find the index of the codebook vector closest to the encoder output vector
● Set q(z|x) to a one-hot distribution on that index → Posterior
● Build the decoder input by looking up the codebook vectors at the posterior indices
Backpropagation
● The nearest-neighbour lookup that produces the discrete latent variables is non-differentiable, so gradients cannot be computed through it
● Copy the gradients from the decoder input onto the encoder output using the straight-through estimator
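A minimal sketch of one forward and backward step with the straight-through estimator, written with manual gradients so that the copy is explicit. The "decoder" here is an assumption made for illustration: an identity map with a squared-error loss against a hypothetical target. In an autodiff framework the same effect is usually obtained by feeding the decoder z_e + sg(z_q - z_e).

```python
def nearest_code(z_e, codebook):
    """Index of the codeword closest to the encoder output z_e."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(z_e, codebook[k])),
    )

def forward_backward(z_e, codebook, target):
    # Forward pass: quantize, then feed the codeword to the toy decoder.
    k = nearest_code(z_e, codebook)
    z_q = codebook[k]
    loss = sum((q - t) ** 2 for q, t in zip(z_q, target))

    # Backward pass: gradient of the loss with respect to the decoder input.
    grad_z_q = [2.0 * (q - t) for q, t in zip(z_q, target)]

    # Straight-through estimator: the argmin has no useful gradient, so the
    # decoder-input gradient is copied unchanged onto the encoder output.
    grad_z_e = list(grad_z_q)
    return k, loss, grad_z_e

codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]  # hypothetical values
k, loss, grad_z_e = forward_backward([0.9, 1.2], codebook, target=[0.5, 0.5])
```

The copied gradient is informative because the encoder output and the decoder input share the same space, so it still points in a direction that lowers the reconstruction loss.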
Training VQ-VAE
Loss Function
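$L = \log p(x \mid z_q(x)) + \lVert \mathrm{sg}[z_e(x)] - e \rVert_2^2 + \beta \lVert z_e(x) - \mathrm{sg}[e] \rVert_2^2$
● Reconstruction loss, $\log p(x \mid z_q(x))$ as written in the paper: enforces a reconstruction close to the input
● VQ loss, $\lVert \mathrm{sg}[z_e(x)] - e \rVert_2^2$: moves the embedding space close to the encoder output
● Commitment loss, $\beta \lVert z_e(x) - \mathrm{sg}[e] \rVert_2^2$: keeps the encoder output close to the embedding space
Stop gradients (sg) are used in both the VQ loss and the commitment loss, so that the VQ loss updates only the embedding space parameters and the commitment loss updates only the encoder parameters.
Decoder → optimizes the reconstruction loss
Encoder → optimizes the reconstruction loss and the commitment loss
Embeddings/Codebook → optimized by the VQ loss
The β used in the commitment loss depends on the scale of the reconstruction loss: the higher the reconstruction loss, the higher the β needed to amplify the impact of the commitment loss. The paper uses β = 0.25 and reports that results are robust for β between 0.1 and 2.0.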
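A small numerical sketch of the three loss terms for a single latent vector. It assumes a squared-error stand-in for the reconstruction term and uses β = 0.25; the function name and the input values are hypothetical. Since stop-gradient only changes where gradients flow, the VQ and commitment terms have the same squared distance as their value and differ only by the factor β.

```python
def sq_norm(a, b):
    """Squared L2 distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def vq_vae_loss_terms(x, x_recon, z_e, e, beta=0.25):
    # Squared-error stand-in for the reconstruction term (an assumption).
    recon = sq_norm(x, x_recon)
    # ||sg[z_e] - e||^2: in training, its gradient reaches only the codebook.
    vq = sq_norm(z_e, e)
    # beta * ||z_e - sg[e]||^2: its gradient reaches only the encoder.
    commit = beta * sq_norm(z_e, e)
    return recon, vq, commit, recon + vq + commit

recon, vq, commit, total = vq_vae_loss_terms(
    x=[1.0, 0.0], x_recon=[0.5, 0.0],  # hypothetical input and reconstruction
    z_e=[0.9, 1.2], e=[1.0, 1.0],      # encoder output and nearest codeword
)
```

With these values the commitment term is a quarter of the VQ term, which shows how β scales the pull on the encoder relative to the pull on the codebook.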
Prior
● While the VQ-VAE is being trained, the prior is kept constant and uniform.
● After training, an autoregressive distribution p(z) is fit over z, so that x can be generated via ancestral sampling
● Two autoregressive models are discussed:
○ PixelCNN over the discrete latents for images
○ WaveNet for raw audio
Joint training of the prior and the VQ-VAE is left as future research
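Ancestral sampling from an autoregressive prior can be sketched with a toy model. The example below assumes a first-order Markov table over code indices as a stand-in for PixelCNN or WaveNet, which condition on far more context; the probabilities, the codebook and the function names are all hypothetical. A real system would pass the looked-up codewords through the trained decoder.

```python
import random

def sample_latents(length, initial, transition, rng):
    """Ancestral sampling: draw z_1 ~ p(z_1), then z_t ~ p(z_t | z_{t-1})."""
    codes = [rng.choices(range(len(initial)), weights=initial)[0]]
    for _ in range(length - 1):
        row = transition[codes[-1]]
        codes.append(rng.choices(range(len(row)), weights=row)[0])
    return codes

def lookup(codes, codebook):
    """Map sampled indices to codewords, i.e. the decoder input."""
    return [codebook[k] for k in codes]

rng = random.Random(0)
initial = [0.5, 0.3, 0.2]            # p(z_1), hypothetical
transition = [[0.8, 0.1, 0.1],       # p(z_t | z_{t-1} = 0)
              [0.1, 0.8, 0.1],       # p(z_t | z_{t-1} = 1)
              [0.1, 0.1, 0.8]]       # p(z_t | z_{t-1} = 2)
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]

codes = sample_latents(6, initial, transition, rng)
decoder_input = lookup(codes, codebook)
```

Each index is drawn conditioned on those already sampled, which is what lets the prior capture structure across latent positions.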
Experiments
1) Comparison with continuous variables
Setup: CIFAR-10 dataset, Adam optimizer, 50 samples used in the training objective for VIMCO
Model     Features                          Result (bits/dim)
VQ-VAE    Discrete latent representation    4.67
VIMCO     Gaussian or categorical priors    5.14
VAE       Continuous variables              4.51
Note: bits/dim measures the average number of bits required to represent each dimension of the input data. A lower value indicates a better likelihood and therefore better compression.
VQ-VAE is the first discrete latent variable model that challenges the performance of continuous VAEs
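For reference, bits/dim is a negative log-likelihood in nats divided by the number of input dimensions and by ln 2. The sketch below applies this to a CIFAR-10 sized input (32x32x3 = 3072 dimensions); the NLL value is hypothetical, back-computed to match the 4.67 in the table rather than taken from the paper.

```python
import math

def bits_per_dim(nll_nats, num_dims):
    """Convert a per-example negative log-likelihood in nats to bits/dim."""
    return nll_nats / (num_dims * math.log(2.0))

num_dims = 32 * 32 * 3  # one CIFAR-10 image

# Hypothetical per-image NLL in nats, chosen to correspond to 4.67 bits/dim.
nll_nats = 4.67 * num_dims * math.log(2.0)

result = bits_per_dim(nll_nats, num_dims)
```

Normalizing by the dimension count makes the numbers comparable across image sizes.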
Experiments
1) Images
Experiment 1
● The model compresses 128x128x3 ImageNet images into a 32x32x1 discrete latent space with K=512, reducing the number of bits by a factor of approximately 42.6
● A powerful prior model, PixelCNN, is trained over the discrete latent space to capture global structure instead of low-level image statistics
● Reconstructions are slightly blurrier than the originals but retain the overall content
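The 42.6 figure is a ratio of bit counts. It can be reproduced from the sizes reported in the paper: 128x128x3 images at 8 bits per value, and a 32x32 grid of latents drawn from K = 512 codes, so 9 bits per latent. The helper names below are hypothetical.

```python
import math

def bits_for_image(height, width, channels, bits_per_value=8):
    """Raw storage cost of an image in bits."""
    return height * width * channels * bits_per_value

def bits_for_latents(height, width, num_codes):
    """Cost of a grid of code indices, each needing ceil(log2(K)) bits."""
    return height * width * math.ceil(math.log2(num_codes))

image_bits = bits_for_image(128, 128, 3)    # 393216 bits
latent_bits = bits_for_latents(32, 32, 512) # 9216 bits
ratio = image_bits / latent_bits            # about 42.67
```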
Experiments
1) Images
Experiment 2
● A PixelCNN prior was trained on the 32x32x1 latent space; with a single channel, only spatial masking is needed
● Samples drawn from the PixelCNN were mapped to pixel space with the VQ-VAE decoder
Experiments
1) Images
Experiment 3
● Same setup as Experiment 2, applied to 84x84x3 frames drawn from the DeepMind Lab environment
● Reconstructions looked nearly identical to their originals
Experiments
1) Images
Experiment 4
● A second VQ-VAE with a PixelCNN decoder was trained on top of the 21x21x1 latent space obtained from the first VQ-VAE, which was trained on DM-LAB frames
● This is an interesting setup, because a standard VAE would suffer from "posterior collapse": the decoder is powerful enough to model the input data perfectly without using the latents
● Posterior collapse was not observed
Experiments
2) Audio
[Figure: audio reconstructions and samples from the prior. Source: Aäron van den Oord, Neural Discrete Representation Learning]
Although the reconstructed waveforms differ from the originals, the semantic content of the audio is retained, even though the model receives no linguistic supervision
Experiments
3) Video
Conclusion
● Introduced VQ-VAE: combines VAEs with vector quantization.
● VQ-VAEs capture long-term dependencies in data.
● Successful experiments: generate images, video sequences, and meaningful speech.
● Discrete latent space learns important features without supervision.
● VQ-VAEs achieve comparable likelihoods to continuous latent variable models, model long-range
sequences, and learn speech descriptors related to phonemes in an unsupervised fashion.
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
Decoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in ActionDecoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in Action
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts Service
 

Neural Discrete Representation Learning - A paper review

  • 1. Abhishek Koirala: Neural Discrete Representation Learning
    Neural Discrete Representation Learning (DeepMind Research)
    Paper Review
    Authors:
    ● Aaron van den Oord
    ● Oriol Vinyals
    ● Koray Kavukcuoglu
  • 2. What does the paper introduce?
    The concept of VQ-VAEs, which differ from VAEs in two ways:
    1) The encoder network outputs discrete rather than continuous codes
    2) The prior is learnt from the latent distribution rather than being static
  • 3. AutoEncoders
    Limitations of AutoEncoders
    ● Fixed-dimensional latent space
    ● Cannot generate new samples directly from the latent space
      ○ Due to an unstructured and messy latent space
    ● Difficulty in generating a new variety of samples
      ○ Autoencoders rely only on reconstruction loss, which limits variability
  • 4. Variational AutoEncoders (VAE)
    ● Probabilistic modelling of the latent space
    ● Generally enforce a prior, which is a standard normal distribution
    ● Structured latent space
      ○ VAEs enforce a probabilistic prior, resulting in a structured latent space
    ● Regularization and control over the latent space
    ● Sampling-based generation
      ○ Enables generation of new samples by sampling from the learned latent space distribution
    Limitations of VAEs
    ● Disentanglement of latent features
    ● Static prior
    ● Posterior collapse
    Joseph Rocca, Understanding Variational AutoEncoders
  • 5. Vector Quantization - Variational AutoEncoders (VQ-VAE)
    ● Discrete latent representation
    ● Prior is learnt from the data rather than assumed to be static
    ● Avoids posterior collapse
    Additional Contributions
    ● The discrete latent model performs as well as its continuous counterpart
    ● When paired with a powerful prior, the samples are high quality on a wide range of applications such as image, speech and video generation
  • 6. Vector Quantization - Variational AutoEncoders (VQ-VAE)
    In VAEs
    ● The posterior distribution q(z|x) and the prior distribution p(z) are assumed to be normally distributed
    In VQ-VAE
    ● Discrete latent variables
    ● Novel training method inspired by Vector Quantization
    ● Posterior and prior distributions are categorical
    Yang, Y. et al. (2019). Improving the classification effectiveness of intrusion detection by using improved conditional variational autoencoder and deep neural network. Sensors, 19(11), 2528.
  • 7. Vector Quantization
    ● Codebook initialization: create an initial set of representative codewords that serve as prototypes for the discrete latent variables
    ● Encoding: map the input data to the nearest codeword in the codebook, quantizing the continuous values into discrete latent variables
    ● Discrete latent variables: represent the input data using the discrete latent variables obtained from the encoding step
    ● Decoding: reconstruct the data from the discrete latent variables by mapping them back to the corresponding codewords in the codebook
    ● Training
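The encoding and decoding steps of slide 7 can be sketched in a few lines of plain Python. This is only an illustration: the codebook entries, the input vectors and the helper names (`squared_distance`, `encode`, `decode`) are made up for the example and do not come from the slides or the paper.

```python
# Minimal vector-quantization sketch in pure Python (no external libraries).
# The codebook values and input vectors are hypothetical illustrations.

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def encode(vectors, codebook):
    """Map each continuous vector to the index of its nearest codeword."""
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(v, codebook[k]))
        for v in vectors
    ]

def decode(indices, codebook):
    """Map discrete indices back to their codewords."""
    return [codebook[k] for k in indices]

codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]]   # K = 3 codewords, D = 2
inputs = [[0.2, -0.1], [0.9, 1.3], [3.5, 0.4]]    # continuous vectors

indices = encode(inputs, codebook)       # discrete latent variables
quantized = decode(indices, codebook)    # nearest codewords, one per input
print(indices, quantized)
```

Each input is replaced by a single integer, so the information that reaches the decoder is limited to which codeword was chosen.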
  • 8. Training VQ-VAE
    Forward pass
    ● Find the index of the codebook vector with the least distance from the encoding vector
    ● Set q(z|x) to a one-hot distribution at that index → posterior
    ● Build the decoder input from the codebook vectors at the posterior indices
    Backpropagation
    ● Gradients cannot be computed through the quantization step, because selecting a discrete latent variable is non-differentiable
    ● Copy the gradients from the decoder input onto the encoder output using the straight-through estimator
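The forward pass and the gradient copy of slide 8 can be traced by hand on a toy example. The sketch below assumes a hypothetical one-layer linear decoder with weights `w` and a squared-error loss, so that the gradient at the decoder input can be written out explicitly. A real implementation would rely on an autodiff framework instead.

```python
# Straight-through estimator traced by hand (pure Python, toy numbers).
# The linear "decoder" w and the scalar target are assumptions for illustration.

def nearest_index(z_e, codebook):
    """Index of the codebook vector closest to the encoder output z_e."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(z_e, codebook[k])),
    )

codebook = [[0.0, 0.0], [1.0, 1.0]]
z_e = [0.8, 0.7]      # encoder output (continuous)
w = [1.0, 2.0]        # toy decoder: recon = w . z_q
target = 2.0

# Forward pass: quantize, then decode from the selected codebook vector.
k = nearest_index(z_e, codebook)
z_q = codebook[k]
recon = sum(wi * zi for wi, zi in zip(w, z_q))
loss = (recon - target) ** 2

# Backward pass: gradient of the loss at the decoder input z_q ...
grad_z_q = [2.0 * (recon - target) * wi for wi in w]
# ... is copied unchanged to the encoder output, skipping the argmin.
grad_z_e = list(grad_z_q)
print(k, loss, grad_z_e)
```

The argmin itself contributes nothing to the gradient. The encoder is updated as if quantization were the identity function.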
  • 9. Training VQ-VAE
    Loss function
    ● Reconstruction loss: enforces a reconstruction closer to the input
    ● VQ loss: enforces an embedding space close to the encoder output
    ● Commitment loss: enforces an encoder output close to the embedding space
    Stop gradients (sg) are used in both the VQ loss and the commitment loss, so that they update only the embedding space parameters and only the encoder parameters respectively.
    ● Decoder → optimizes the reconstruction loss
    ● Encoder → optimizes the reconstruction loss and the commitment loss
    ● Embeddings/codebook → optimized by the VQ loss
    The term β used in the commitment loss depends on the scale of the reconstruction loss: the higher the reconstruction loss, the higher the β value needed to amplify the impact of the commitment loss.
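The three loss terms of slide 9 and the effect of the stop gradients can be made concrete with a small hand computation. The sketch below uses a squared error as a stand-in for the reconstruction term and β = 0.25, the value the paper reports using. The function name `vq_vae_terms` and all numbers are hypothetical. Since plain Python has no autodiff, the stop gradient is expressed by writing out, for each term, only the gradient that is allowed to flow.

```python
# Loss terms of a VQ-VAE for a single latent vector (pure Python sketch).
# Squared error stands in for the reconstruction term; numbers are made up.

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def vq_vae_terms(x, x_recon, z_e, e, beta=0.25):
    recon = sq_dist(x, x_recon)        # reconstruction loss
    vq = sq_dist(z_e, e)               # ||sg[z_e] - e||^2, moves e only
    commit = beta * sq_dist(z_e, e)    # beta * ||z_e - sg[e]||^2, moves z_e only
    # Gradient of the VQ loss with respect to the embedding e (z_e is frozen).
    grad_e = [2.0 * (ei - zi) for zi, ei in zip(z_e, e)]
    # Gradient of the commitment loss with respect to z_e (e is frozen).
    grad_z_e_commit = [2.0 * beta * (zi - ei) for zi, ei in zip(z_e, e)]
    return {
        "recon": recon, "vq": vq, "commit": commit,
        "total": recon + vq + commit,
        "grad_e": grad_e, "grad_z_e_commit": grad_z_e_commit,
    }

terms = vq_vae_terms(x=[1.0, 2.0], x_recon=[1.0, 1.0], z_e=[0.5, 1.5], e=[1.0, 1.0])
print(terms)
```

The VQ and commitment terms have the same value up to the factor β. They differ only in which side receives the gradient: `grad_e` pulls the embedding toward the encoder output, and `grad_z_e_commit` pulls the encoder output toward the embedding.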
  • 10. Prior
    ● While the VQ-VAE is being trained, the prior is kept constant and uniform
    ● After training, an autoregressive distribution p(z) is fit over z, so that x can be generated via ancestral sampling
    ● 2 autoregressive models are discussed here
      ○ PixelCNN over the discrete latents for images
      ○ WaveNet for raw audio
    Joint training of the prior and the VQ-VAE is left as future research
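Ancestral sampling from an autoregressive prior over discrete latents can be illustrated with a deliberately tiny stand-in. The sketch below fits a count-based first-order model, p(z_1) and p(z_t | z_{t-1}), to a few hypothetical sequences of codebook indices and then samples one index at a time. This is not PixelCNN or WaveNet, which condition on far more context. It only shows the sampling pattern, and every name and number in it is an assumption.

```python
# Toy autoregressive prior over codebook indices (pure Python sketch).
import random
from collections import Counter, defaultdict

def fit_bigram_prior(sequences, num_codes):
    """Estimate p(z_1) and p(z_t | z_{t-1}) by counting, with add-one smoothing."""
    first = Counter(seq[0] for seq in sequences)
    trans = defaultdict(Counter)
    for seq in sequences:
        for prev, cur in zip(seq, seq[1:]):
            trans[prev][cur] += 1
    p_first = [(first[k] + 1) / (len(sequences) + num_codes) for k in range(num_codes)]
    p_trans = []
    for prev in range(num_codes):
        total = sum(trans[prev].values()) + num_codes
        p_trans.append([(trans[prev][k] + 1) / total for k in range(num_codes)])
    return p_first, p_trans

def ancestral_sample(p_first, p_trans, length, rng):
    """Sample z_1, then each z_t conditioned on the previously sampled index."""
    codes = list(range(len(p_first)))
    z = [rng.choices(codes, weights=p_first)[0]]
    while len(z) < length:
        z.append(rng.choices(codes, weights=p_trans[z[-1]])[0])
    return z

# Hypothetical latent index sequences produced by a trained encoder.
latents = [[0, 1, 2, 0, 1, 2], [1, 2, 0, 1, 2, 0], [0, 1, 2, 0, 1, 1]]
p_first, p_trans = fit_bigram_prior(latents, num_codes=3)
sample = ancestral_sample(p_first, p_trans, length=6, rng=random.Random(0))
print(sample)
```

In the full pipeline the sampled indices would be looked up in the codebook and passed through the VQ-VAE decoder to produce x.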
  • 11. Experiments 1) Comparison with continuous variables
    CIFAR-10 dataset, ADAM optimizer, 50 samples used in the training objective for VIMCO

    Model    Features                          Result (bits/dim)
    VQ-VAE   Discrete latent representation    4.67
    VIMCO    Gaussian or categorical priors    5.14
    VAE      Continuous variables              4.51

    Note: bits/dim measures the average number of bits required to represent each dimension of the input data. A lower value indicates better compression and reconstruction performance.
    VQ-VAE is presented as the first discrete latent variable model to rival the performance of continuous VAEs
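To give the bits/dim figures of slide 11 a sense of scale, the sketch below converts them to total bits per CIFAR-10 image (32×32×3 = 3072 dimensions) and shows the usual conversion from a negative log-likelihood in nats. The helper name `nats_to_bits_per_dim` is an assumption. Only the three bits/dim values come from the slide.

```python
# bits/dim arithmetic for the CIFAR-10 comparison (pure Python sketch).
import math

def nats_to_bits_per_dim(nll_nats, num_dims):
    """Convert a per-example negative log-likelihood in nats to bits/dim."""
    return nll_nats / (num_dims * math.log(2))

cifar_dims = 32 * 32 * 3    # 3072 dimensions per CIFAR-10 image
results = {"VAE": 4.51, "VQ-VAE": 4.67, "VIMCO": 5.14}   # values from the slide

# Total bits needed per image under each model.
total_bits = {name: bpd * cifar_dims for name, bpd in results.items()}
# Extra bits per image that VQ-VAE needs compared with the continuous VAE.
gap_bits = total_bits["VQ-VAE"] - total_bits["VAE"]
print(total_bits, gap_bits)
```

The 0.16 bits/dim gap between VQ-VAE and the continuous VAE corresponds to roughly 490 bits per image, against a total of about 14,000.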
  • 12. Experiments 1) Images
    Experiment 1
    ● The discrete representation is approximately 42.6 times smaller in bits than the original image
    ● A powerful prior model, PixelCNN, is trained over the discrete latent space to capture global structure instead of low-level image statistics
    ● Results are slightly blurry but still retain the overall content
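The figure of roughly 42.6 can be reproduced with a short calculation. The sketch below assumes the setup described in the paper for this experiment, which the slide does not spell out: 128×128×3 images at 8 bits per value, compressed to a 32×32 grid of latents drawn from a codebook of 512 entries (9 bits per latent).

```python
# Compression ratio of the discrete latent grid (pure Python sketch).
# Image size, latent grid size and codebook size follow the paper's
# ImageNet experiment; they are not stated on the slide.
import math

image_bits = 128 * 128 * 3 * 8             # raw image: 8 bits per channel value
codebook_size = 512
bits_per_code = math.log2(codebook_size)   # 9 bits to index one of 512 codewords
latent_bits = 32 * 32 * bits_per_code      # 32x32 grid of discrete latents

reduction = image_bits / latent_bits
print(reduction)
```

The result is a ratio of raw image bits to latent bits, not a number of bits per image.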
  • 13. Experiments 1) Images
    Experiment 2
    ● Trained a PixelCNN prior on the 32×32×1 latent space using spatial masking
    ● Samples drawn from the PixelCNN were mapped to pixel space with the decoder of the VQ-VAE
  • 14. Experiments 1) Images
    Experiment 3
    ● Same experiment as Experiment 2, for 84×84×3 frames drawn from the DeepMind Lab environment
    ● Reconstructions looked nearly identical to their originals
  • 15. Experiments 1) Images
    Experiment 4
    ● Trained a second VQ-VAE with a PixelCNN decoder on top of the 21×21×1 latent space obtained from the first VQ-VAE, which was trained on DM-LAB frames
    ● Interesting setup, because a VAE would suffer from “posterior collapse” when the decoder is powerful enough to model the input data perfectly
    ● Posterior collapse was not observed
  • 16. Experiments 2) Audio
    Figures: reconstructions; samples from the prior (Aäron van den Oord, Neural Discrete Representation Learning)
    Although the reconstructed waveforms differ from the originals, the semantic meaning of the audio is retained, even though the model is given no information about the language or speaker details
  • 17. Experiments 3) Video
  • 18. Conclusion
    ● Introduced VQ-VAE: combines VAEs with vector quantization.
    ● VQ-VAEs capture long-term dependencies in data.
    ● Successful experiments: generate images, video sequences, and meaningful speech.
    ● Discrete latent space learns important features without supervision.
    ● VQ-VAEs achieve comparable likelihoods to continuous latent variable models, model long-range sequences, and learn speech descriptors related to phonemes in an unsupervised fashion.