The Reversible Residual Network:
Backpropagation Without Storing Activations
Aidan N. Gomez, Mengye Ren, Raquel Urtasun, Roger B. Grosse
presentation by Jiaqi Yang
LAMDA Group
Idea
Deep residual networks (ResNets) are the state-of-the-art
architecture across multiple computer vision tasks. The key
architectural innovation behind ResNets was the residual block.
Memory consumption is a bottleneck of deep neural networks, as
one needs to store the activations in order to calculate gradients
using backpropagation.
If each layer's activations can be reconstructed from its outputs, then
backpropagation can be as memory-efficient as the forward pass.
Related work
Trade computation for memory.
Checkpointing: divide the $n$ layers into $O(\sqrt{n})$ blocks, reducing memory to $O(\sqrt{n})$.
Apply the idea of checkpointing recursively:
$g(n) = k + g(n/(k+1)) \implies g(n) = k \log_{k+1}(n)$.
$k = 1 \implies g(n) = \log_2(n)$.
Computational complexity: $O(n \log n)$.
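The recursion above can be checked numerically. The sketch below is only an illustration: the function name `checkpoint_memory`, the base case $g(1) = 0$, and the restriction to $n$ being a power of $k + 1$ are assumptions that the slide does not state.

```python
def checkpoint_memory(n: int, k: int) -> int:
    """Evaluate g(n) = k + g(n / (k + 1)) with the assumed base case g(1) = 0.

    n is assumed to be an exact power of (k + 1) so that every split is even.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    stored = 0
    while n > 1:
        if n % (k + 1) != 0:
            raise ValueError("n must be a power of k + 1")
        stored += k      # keep k checkpoints at this level of the recursion
        n //= k + 1      # recurse into one of the k + 1 segments
    return stored


# With k = 1 the count matches log2(n); with k = 2 it matches 2 * log3(n).
print(checkpoint_memory(1024, 1))
print(checkpoint_memory(81, 2))
```

Under these assumptions the loop runs $\log_{k+1}(n)$ times and adds $k$ each time, which is the closed form $k \log_{k+1}(n)$ given above.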
ResNet
One of the main difficulties in training very deep networks is the
problem of exploding and vanishing gradients.
Residual blocks:
$y = x + f(x)$
The basic and bottleneck residual blocks:
$a(x) = \mathrm{ReLU}(\mathrm{BN}(x))$
$c_k = \mathrm{Conv}_{k \times k}(a(x))$
$\mathrm{Basic}(x) = c_3(c_3(x))$
$\mathrm{Bottleneck}(x) = c_1(c_3(c_1(x)))$
Reversible Residual Blocks
Partition the units in each layer into two groups, denoted $x_1$ and $x_2$.
In practice, this is done by partitioning the channels.
Each reversible block takes inputs $(x_1, x_2)$ and produces outputs
$(y_1, y_2)$:
$y_1 = x_1 + f(x_2)$
$y_2 = x_2 + g(y_1)$
Each layer’s activations can be reconstructed from the next layer’s
activations:
$x_2 = y_2 - g(y_1)$
$x_1 = y_1 - f(x_2)$
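The forward and inverse equations can be exercised on small vectors. This is a minimal sketch in plain Python; the residual functions `f` and `g` below are hypothetical stand-ins (any deterministic functions work, since the inverse never needs to invert them), and the helper names are not from the slides.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]
Residual = Callable[[Vector], Vector]


def add(a: Vector, b: Vector) -> Vector:
    return [p + q for p, q in zip(a, b)]


def sub(a: Vector, b: Vector) -> Vector:
    return [p - q for p, q in zip(a, b)]


def rev_forward(x1: Vector, x2: Vector, f: Residual, g: Residual) -> Tuple[Vector, Vector]:
    y1 = add(x1, f(x2))   # y1 = x1 + f(x2)
    y2 = add(x2, g(y1))   # y2 = x2 + g(y1)
    return y1, y2


def rev_inverse(y1: Vector, y2: Vector, f: Residual, g: Residual) -> Tuple[Vector, Vector]:
    x2 = sub(y2, g(y1))   # x2 = y2 - g(y1)
    x1 = sub(y1, f(x2))   # x1 = y1 - f(x2)
    return x1, x2


# Hypothetical residual functions, chosen only for illustration.
def f(v: Vector) -> Vector:
    return [math.tanh(2.0 * t) for t in v]


def g(v: Vector) -> Vector:
    return [math.sin(t) + 0.5 for t in v]


x1, x2 = [0.5, -1.0], [2.0, 0.25]
y1, y2 = rev_forward(x1, x2, f, g)
r1, r2 = rev_inverse(y1, y2, f, g)
print(r1, r2)  # matches x1, x2 up to floating-point rounding
```

Note that `f` and `g` are only ever evaluated, never inverted, which is why arbitrary residual functions can be used inside the block.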
Algorithm
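The algorithm figure is not reproduced in this text, so the following is only a sketch of how gradients could be propagated through one reversible block using just its outputs, following the reconstruction equations of the previous slide. It assumes elementwise residual functions with hypothetical scalar weights `A` and `B`, so that Jacobian-transpose products reduce to elementwise products; a real convolutional block would rely on automatic differentiation for these products and for the weight gradients.

```python
import math
from typing import List, Tuple

Vector = List[float]
A, B = 0.7, -1.3  # hypothetical scalar weights for f and g


def f(v: Vector) -> Vector:
    return [math.tanh(A * t) for t in v]


def g(v: Vector) -> Vector:
    return [math.tanh(B * t) for t in v]


def f_prime(v: Vector) -> Vector:
    return [A * (1.0 - math.tanh(A * t) ** 2) for t in v]


def g_prime(v: Vector) -> Vector:
    return [B * (1.0 - math.tanh(B * t) ** 2) for t in v]


def forward(x1: Vector, x2: Vector) -> Tuple[Vector, Vector]:
    y1 = [a + b for a, b in zip(x1, f(x2))]
    y2 = [a + b for a, b in zip(x2, g(y1))]
    return y1, y2


def backward(y1: Vector, y2: Vector, dy1: Vector, dy2: Vector):
    """Return (x1, x2, dx1, dx2) from the block outputs and their gradients."""
    x2 = [a - b for a, b in zip(y2, g(y1))]   # reconstruct inputs first
    x1 = [a - b for a, b in zip(y1, f(x2))]
    # y1 reaches the loss directly and through g(y1) inside y2.
    dz1 = [d1 + gp * d2 for d1, gp, d2 in zip(dy1, g_prime(y1), dy2)]
    # x2 reaches the loss through y2 directly and through f(x2) inside y1.
    dx2 = [d2 + fp * dz for d2, fp, dz in zip(dy2, f_prime(x2), dz1)]
    dx1 = dz1
    return x1, x2, dx1, dx2


def loss(x1: Vector, x2: Vector, c1: Vector, c2: Vector) -> float:
    y1, y2 = forward(x1, x2)
    return sum(c * y for c, y in zip(c1 + c2, y1 + y2))


x1, x2 = [0.3, -0.8], [1.1, 0.4]
c1, c2 = [1.0, -2.0], [0.5, 3.0]  # dL/dy1 and dL/dy2 for the linear loss
y1, y2 = forward(x1, x2)
r1, r2, dx1, dx2 = backward(y1, y2, c1, c2)

# Central finite difference on the first entry of x2 as a sanity check.
eps = 1e-6
numeric = (loss(x1, [x2[0] + eps, x2[1]], c1, c2)
           - loss(x1, [x2[0] - eps, x2[1]], c1, c2)) / (2 * eps)
print(abs(numeric - dx2[0]) < 1e-6)
```

The point of the sketch is that `backward` never receives `x1` or `x2`: both are recomputed from `(y1, y2)`, so no activations need to be stored between the forward and backward passes.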
Extend to RNN
Reversible Recurrent Neural Networks (NIPS 2018).
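Trouble: the forget gate.
$h_t = z_t \odot h_{t-1} + (1 - z_t) \odot g_t$
The forget gate makes it hard to apply the same idea directly: multiplying
by $z_t$ discards information in finite-precision arithmetic, so $h_{t-1}$
cannot be recovered exactly.
Drop the forget gate?
$h_t = h_{t-1} + (1 - z_t) \odot g_t$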
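The difficulty with the forget gate can be seen in ordinary floating-point arithmetic. The sketch below is a toy illustration, assuming a scalar hidden state, a constant gate value `z`, and a constant candidate `g` (all hypothetical simplifications): it runs each update forward for many steps, then tries to undo it algebraically.

```python
def run_with_forget(h: float, z: float, g: float, steps: int) -> float:
    for _ in range(steps):
        h = z * h + (1.0 - z) * g      # update with a forget gate
    return h


def invert_with_forget(h: float, z: float, g: float, steps: int) -> float:
    for _ in range(steps):
        h = (h - (1.0 - z) * g) / z    # algebraic inverse, needs division by z
    return h


def run_without_forget(h: float, z: float, g: float, steps: int) -> float:
    for _ in range(steps):
        h = h + (1.0 - z) * g          # purely additive update
    return h


def invert_without_forget(h: float, z: float, g: float, steps: int) -> float:
    for _ in range(steps):
        h = h - (1.0 - z) * g
    return h


h0, z, g, steps = 0.1, 0.5, 1.0, 200

recovered_forget = invert_with_forget(run_with_forget(h0, z, g, steps), z, g, steps)
recovered_additive = invert_without_forget(run_without_forget(h0, z, g, steps), z, g, steps)

print(abs(recovered_forget - h0))    # large: the initial state has been forgotten
print(abs(recovered_additive - h0))  # tiny: only rounding error remains
```

With the forget gate, the contribution of the initial state shrinks by a factor of `z` at every step until it falls below floating-point resolution, so the inverse has nothing left to recover. The additive update keeps the initial state up to rounding error.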
Simply dropping the forget gate harms performance (the authors call this the
Impossibility of No Forgetting), as shown with a repeat task.
Instead, deal with the fixed-point arithmetic explicitly (some loss must still
be tolerated), following Gradient-based Hyperparameter Optimization through
Reversible Learning (ICML 2015).
Attention mechanism: crop the cell state to a fraction.
Architecture
Experiments
