Multi-Dimensional RNNs
Grigory Sapunov
Moscow Computer Vision Group
Moscow, Yandex, 27.04.2016
gs@inten.to
Outline
● RNN intro
○ RNN vs. FFNN
○ LSTM/GRU
○ CTC
● Some architectural issues
○ BRNN
○ MDRNN
○ MDMDRNN
○ HSRNN
● Some recent results
○ (same ideas) 2D LSTM, CLSTM, C-RNN, C-HRNN
○ (new ideas) ReNet, PyraMiD-LSTM, Grid LSTM
● (Discussion) RNN vs CNN for Computer Vision
● Resources
A tiny intro into RNNs
Feedforward NN vs. Recurrent NN
Recurrent neural networks (RNNs) allow cyclical connections.
Unfolding the RNN and training using BPTT
Can do backprop on the unfolded network: Backpropagation through time (BPTT)
http://ir.hit.edu.cn/~jguo/docs/notes/bptt.pdf
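To make the unfolding concrete, here is a minimal sketch (toy NumPy code, not from the slides; all names and sizes are illustrative) of a simple RNN forward pass. The same weights are reused at every time step, and BPTT backpropagates through exactly this unrolled loop.

```python
import numpy as np

def rnn_forward(x_seq, W_xh, W_hh, b_h, h0):
    """x_seq: (T, input_dim). Returns the hidden states h_1..h_T."""
    h, states = h0, []
    for x_t in x_seq:                                # one copy of the cell per time step
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)     # recurrence: h_t depends on h_{t-1}
        states.append(h)
    return states

# toy dimensions, purely illustrative
rng = np.random.default_rng(0)
T, n_in, n_hid = 5, 3, 4
x_seq = rng.normal(size=(T, n_in))
W_xh = rng.normal(size=(n_hid, n_in)) * 0.1
W_hh = rng.normal(size=(n_hid, n_hid)) * 0.1
states = rnn_forward(x_seq, W_xh, W_hh, np.zeros(n_hid), np.zeros(n_hid))
print(len(states), states[-1].shape)                 # 5 (4,)
```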
Neural Network properties
Feedforward NN (FFNN):
● FFNN is a universal approximator: a feed-forward network with a single hidden layer containing a finite number of hidden neurons can approximate continuous functions on compact subsets of R^n, under mild assumptions on the activation function.
● Typical FFNNs have no inherent notion of order in time. They remember only what was learned during training.
Recurrent NN (RNN):
● RNNs are Turing-complete: they can compute anything that can be computed and
have the capacity to simulate arbitrary procedures.
● RNNs possess a certain type of memory. They are much better suited to dealing with
sequences, context modeling and time dependencies.
RNN problem: Vanishing gradients
Solution: Long short-term memory (LSTM, Hochreiter, Schmidhuber, 1997)
LSTM cell
LSTM network
LSTM: Fixing vanishing gradient problem
Comparing LSTM and Simple RNN
More on LSTMs: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
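As a companion to the cell diagrams, here is a minimal sketch of a single LSTM step (an illustrative NumPy version with one fused weight matrix and no peepholes; the names and layout are assumptions, not the original formulation). The additive cell-state update c = f*c_prev + i*g is the path that keeps gradients from vanishing.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """W: (4*n_hid, n_in + n_hid), b: (4*n_hid,). Returns (h, c)."""
    z = W @ np.concatenate([x, h_prev]) + b
    n = h_prev.size
    i = sigmoid(z[0*n:1*n])            # input gate
    f = sigmoid(z[1*n:2*n])            # forget gate
    o = sigmoid(z[2*n:3*n])            # output gate
    g = np.tanh(z[3*n:4*n])            # candidate update
    c = f * c_prev + i * g             # additive cell state: gradients flow through it largely undamped
    h = o * np.tanh(c)
    return h, c

# shape check with zero weights (n_in=3, n_hid=4)
h, c = lstm_step(np.ones(3), np.zeros(4), np.zeros(4), np.zeros((16, 7)), np.zeros(16))
print(h.shape, c.shape)                # (4,) (4,)
```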
Another solution: Gated Recurrent Unit (GRU)
GRU (Cho et al., 2014) is a bit simpler than LSTM (fewer weights)
Another useful thing: CTC Output Layer
CTC (Connectionist Temporal Classification; Graves, Fernández, Gomez,
Schmidhuber, 2006) was specifically designed for temporal classification tasks; that
is, for sequence labelling problems where the alignment between the inputs and the
target labels is unknown.
CTC models all aspects of the sequence with a single neural network, and does not
require the network to be combined with a hidden Markov model. It also does not
require presegmented training data, or external post-processing to extract the
label sequence from the network outputs.
The CTC network predicts only the sequence of phonemes (typically as a series
of spikes, separated by ‘blanks’, or null predictions), while the framewise network
attempts to align them with the manual segmentation.
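As a hedged illustration of the decoding side (a toy greedy decoder, not the slides' code and not the full forward-backward CTC loss): take the most likely label per frame, collapse consecutive repeats, then drop the blank symbol.

```python
import numpy as np

def ctc_greedy_decode(frame_probs, blank=0):
    """frame_probs: (T, n_labels) per-frame label distributions."""
    best = frame_probs.argmax(axis=1)
    decoded, prev = [], None
    for label in best:
        if label != prev and label != blank:   # collapse repeats, skip blanks
            decoded.append(int(label))
        prev = label
    return decoded

# frames predicting [blank, 3, 3, blank, 5] decode to the label sequence [3, 5]
probs = np.eye(6)[[0, 3, 3, 0, 5]]
print(ctc_greedy_decode(probs))                # [3, 5]
```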
Example: CTC vs. Framewise classification
End of Intro
So from here on we will not distinguish between RNN/GRU/LSTM, and will usually use the word RNN for any kind of recurrent block. In practice, most RNNs today are actually LSTMs.
A significant part of the presentation is based on the works of Alex Graves et al.
Some interesting generalizations
of simple RNN architecture
#1 Directionality (BRNN/BLSTM)
Bidirectional RNN/LSTM
There are many situations when you see the whole sequence at once (OCR,
speech recognition, translation, caption generation, …).
So you can scan the [1-d] sequence in both directions, forward and backward.
Here comes BLSTM (Graves, Schmidhuber, 2005).
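A minimal sketch of the bidirectional idea (toy tanh cells instead of LSTM; all names are illustrative): one scan runs left to right, another right to left, and the two hidden states are concatenated at each time step, so every position sees both past and future context.

```python
import numpy as np

def scan(x_seq, W_xh, W_hh):
    """Unidirectional scan with a toy tanh cell; returns a list of hidden states."""
    h, out = np.zeros(W_hh.shape[0]), []
    for x_t in x_seq:
        h = np.tanh(W_xh @ x_t + W_hh @ h)
        out.append(h)
    return out

def bidirectional(x_seq, fwd_params, bwd_params):
    h_fwd = scan(x_seq, *fwd_params)                  # forward in time
    h_bwd = scan(x_seq[::-1], *bwd_params)[::-1]      # backward, re-aligned in time
    return [np.concatenate(pair) for pair in zip(h_fwd, h_bwd)]

rng = np.random.default_rng(1)
x = rng.normal(size=(6, 3))
make_params = lambda: (rng.normal(size=(4, 3)) * 0.1, rng.normal(size=(4, 4)) * 0.1)
print(bidirectional(x, make_params(), make_params())[0].shape)   # (8,) = forward 4 + backward 4
```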
BLSTM
Typical result: BRNN>RNN, LSTM>RNN, BLSTM>BRNN
Typical result: BRNN>RNN, LSTM>RNN, BLSTM>BRNN
Typical result: BRNN>RNN, LSTM>RNN, BLSTM>BRNN
Example: BLSTM classifying the utterance “one oh five”
#2 Dimensionality (MDRNN/MDLSTM)
Multidimensional RNN/LSTM
Standard RNNs are inherently one dimensional, and therefore poorly suited to
multidimensional data (e.g. images).
The basic idea of MDRNNs (Graves, Fernandez, Schmidhuber, 2007) is to replace
the single recurrent connection found in standard RNNs with as many recurrent
connections as there are dimensions in the data.
It assumes some ordering on the multidimensional data.
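To make the idea concrete, here is a minimal sketch of a unidirectional 2D MDRNN sweep (a toy tanh cell instead of LSTM; sizes and names are assumptions): under the raster ordering, the hidden state at (i, j) receives recurrent input from the two already-visited neighbours (i-1, j) and (i, j-1).

```python
import numpy as np

def mdrnn_2d(image, W_x, W_h1, W_h2):
    """image: (H, W, C); returns hidden states (H, W, n_hid) for one raster-order sweep."""
    H, W_img, _ = image.shape
    n_hid = W_h1.shape[0]
    h = np.zeros((H, W_img, n_hid))
    for i in range(H):                     # raster scan: top-left to bottom-right
        for j in range(W_img):
            up   = h[i - 1, j] if i > 0 else np.zeros(n_hid)
            left = h[i, j - 1] if j > 0 else np.zeros(n_hid)
            h[i, j] = np.tanh(W_x @ image[i, j] + W_h1 @ up + W_h2 @ left)
    return h

rng = np.random.default_rng(2)
img = rng.normal(size=(8, 8, 3))           # toy 8x8 "RGB" patch
W_x, W_h1, W_h2 = (rng.normal(size=s) * 0.1 for s in [(5, 3), (5, 5), (5, 5)])
print(mdrnn_2d(img, W_x, W_h1, W_h2).shape)   # (8, 8, 5)
```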
MDRNN
The basic idea of MDRNNs is to replace the single recurrent connection found
in standard RNNs with as many recurrent connections as there are dimensions
in the data.
Uni-directionality
MDRNN assumes some ordering on the multidimensional data. And it’s not the
only possible one.
Uni-directionality
#3 Directionality + Dimensionality
(MDMDRNN?)
Multidirectional multidimensional RNN (MDMDRNN?)
The previously mentioned ordering is not the only possible one. It might be OK for
some tasks, but it is usually preferable for the network to have access to the
surrounding context in all directions. This is particularly true for tasks where precise
localisation is required, such as image segmentation.
For one dimensional RNNs, the problem of multidirectional context was solved by
the introduction of bidirectional recurrent neural networks (BRNNs). BRNNs contain
two separate hidden layers that process the input sequence in the forward and
reverse directions.
BRNNs can be extended to n-dimensional data by using 2^n separate hidden layers, each of which processes the sequence using the ordering defined above, but with a different choice of axes.
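A minimal sketch of how the 2^n = 4 orderings can be realised for 2D data (an implementation assumption, not the paper's code): run the same unidirectional scan on flipped copies of the input, flip the results back, and stack them, so each pixel's combined feature vector sees context from all four quadrants. Here scan_fn stands for any unidirectional 2D sweep, e.g. the toy MDRNN sweep sketched earlier.

```python
import numpy as np

def multidirectional_2d(image, scan_fn):
    """image: (H, W, C); scan_fn: unidirectional 2D scan returning (H, W, n_hid)."""
    outputs = []
    for flip_rows in (False, True):
        for flip_cols in (False, True):
            x = image[::-1] if flip_rows else image
            x = x[:, ::-1] if flip_cols else x
            h = scan_fn(x)                          # one of the four raster orderings
            h = h[::-1] if flip_rows else h         # undo the flips so all layers align
            h = h[:, ::-1] if flip_cols else h
            outputs.append(h)
    return np.concatenate(outputs, axis=-1)         # (H, W, 4 * n_hid)

# demo with a stand-in "scan" that just accumulates context down and to the right
demo_scan = lambda x: np.cumsum(np.cumsum(x, axis=0), axis=1)
print(multidirectional_2d(np.ones((4, 4, 1)), demo_scan).shape)   # (4, 4, 4)
```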
Multi-directionality
Multi-directionality
As before, the hidden layers are connected to a single output layer, which now
has access to all surrounding context.
MDMDRNN example: Air Freight database (2007)
A ray-traced colour image sequence that comes with a ground truth segmentation
into the different textures mapped onto the 3-d models. The sequence is 455
frames (160x120 px) long and contains 155 distinct textures.
MDMDRNN example: Air Freight database (2007)
Network structure:
● Multidirectional 2D LSTM.
● 4 layers (not levels! just 4 directional layers on a single level), each consisting of 25
memory blocks, with each block containing 1 cell, 2 forget gates, 1 input gate, 1 output
gate and 5 peephole weights.
● The input and output activation function of the cells was tanh, and the
activation function for the gates was the logistic sigmoid.
● The input layer was size 3 (RGB) and the output layer (softmax) was size 155
(one unit for each texture).
● The network contained 43,257 trainable weights in total.
● The final pixel classification error rate, after 330 training epochs, was 7.1% on
the test set.
MDMDRNN example: Air Freight database (2007)
MDMDRNN example: MNIST (2007)
Additional evaluation on the warped dataset (not used in training at all)
MDMDRNN example: MNIST (2007)
#4 Hierarchical subsampling (HSRNN)
Hierarchical Subsampling Networks (HSRNN)
So-called hierarchical subsampling is commonly used in fields such as computer
vision where the volume of data is too great to be processed by a ‘flat’
architecture. As well as reducing computational cost, it also reduces the effective
dispersal of the data, since inputs that are widely separated at the bottom of the
hierarchy are transformed to features that are close together at the top.
A hierarchical subsampling recurrent neural network (HSRNN, Graves and
Schmidhuber, 2009) consists of an input layer, an output layer and multiple levels of
recurrently connected hidden layers. The output sequence of each level in the
hierarchy is used as the input sequence for the next level up. All input sequences
are subsampled using subsampling windows of predetermined width. The structure
is similar to that used by convolutional networks, except with recurrent, rather
than feedforward, hidden layers.
HSRNN
HSRNN
For each layer in the hierarchy, the forward pass equations are identical to those for
a standard RNN, except that the sum over input units is replaced by a sum of sums
over the subsampling window.
A good rule of thumb is to choose the layer sizes so that each level consumes
roughly half the processing time of the level below.
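A minimal sketch of the subsampling step between levels (toy code under the assumption of non-overlapping windows; the "sum of sums" in the forward pass then amounts to summing each window before the next level's recurrence):

```python
import numpy as np

def subsample(seq, width):
    """seq: (T, n_features). Sum non-overlapping windows; a trailing partial window is dropped."""
    T = (len(seq) // width) * width
    return seq[:T].reshape(-1, width, seq.shape[1]).sum(axis=1)

seq = np.arange(10, dtype=float).reshape(10, 1)
print(subsample(seq, 2).ravel())   # [ 1.  5.  9. 13. 17.] -- half the length of the input
```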
HSRNN
HSRNN
Can be easily extended into multidimensional and multidirectional case.
The problem is that each level of the hierarchy requires 2^n hidden layers instead of
one. To connect every layer at one level to every layer at the next therefore requires
O(2^(2n)) weights.
One way to reduce the number of weights is to separate the levels with nonlinear
feedforward layers, which reduces the number of weights between the levels to
O(2^n), the same as standard MDRNNs.
As a rule of thumb, giving each feedforward layer between half and one times as
many units as the combined hidden layers in the level below appears to work well in
practice.
HSRNN
HSRNN example: Arabic handwriting recognition
Network structure:
● The hierarchy contained three levels, multidirectional MDLSTM (so 4 hidden
layers for 2D data).
● The three levels were separated by two feedforward layers with the tanh
activation function.
● Subsampling windows were applied in three places: to the input sequence, to
the output sequence of the first hidden level, and to the output sequence of
the second hidden level.
HSRNN
Offline Arabic handwriting recognition (2009)
● 32,492 black-and-white images of individual handwritten Tunisian town and
village names, of which we used 30,000 for training, and 2,492 for validation
● Each image was supplied with a manual transcription for the individual
characters, and the postcode of the corresponding town. There were 120
distinct characters in total
● The task was to identify the postcode, from a list of 937 town names and
corresponding postcodes. Many of the town names had transcription variants,
giving a total of 1,518 entries in the complete postcode lexicon.
● The test data (which is not published) was divided into sets ‘f’ and ‘s’. The
main competition results were based on set ‘f’. Set ‘s’ contains data collected
in the United Arab Emirates using the same forms; its purpose was to test the
robustness of the recognisers to regional writing variations
Offline Arabic handwriting recognition (2009)
Offline Arabic handwriting recognition (2009)
Some more recent examples
using the same ideas
Example #1:
Scene Labeling with 2D LSTM
(CVPR 2015)
http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Byeon_Scene_Labeling_With_2015_CVPR_paper.pdf
2D LSTM for Scene Labeling (2015)
2D LSTM for Scene Labeling (2015)
Scene Labeling with LSTM Recurrent Neural Networks / CVPR 2015,
http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Byeon_Scene_Labeling_With_2015_CVPR_paper.pdf
2D LSTM for Scene Labeling (2015)
Scene Labeling with LSTM Recurrent Neural Networks / CVPR 2015
http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Byeon_Scene_Labeling_With_2015_CVPR_paper.pdf
Example #2:
Convolutional LSTM (CLSTM)
(ILSVRC 2015)
http://image-net.org/challenges/posters/ILSVRC2015_Poster_VUNO.pdf
Convolutional LSTM (CLSTM) (2015)
Actually an LSTM over the last layers of a CNN.
“Among various models, multi-dimensional recurrent neural network, specifically
multi-dimensional long-short term memory (MD-LSTM) has shown promising
results and can be naturally integrated and trained ‘end-to-end’ fashion. However,
when we try to learn the structure with very low level representation such as input
pixel level, the dependency structure can be too noisy or spatially long-term
dependency information can be vanished while training. Therefore, we propose to
use 2D-LSTM layer on top of convolutional layers by taking advantage of
convolution layers to extract high level representation of the image and 2D-LSTM
layer to learn global spatial dependencies. We call this network as convolutional
LSTM (CLSTM)”
Convolutional LSTM (CLSTM) (2015)
“Our CLSTM models are constructed by replacing the last
two convolution layer of CNN with two 2D LSTM layers.
Since we used multidirectional 2D LSTM, there are 2^2
directional nodes for each location of feature map.”
Example #3:
Convolutional RNN (C-RNN)
(CVPR 2015)
http://www.cv-foundation.org/openaccess/content_cvpr_workshops_2015/W03/papers/Zuo_Convolutional_Recurrent_Neural_2015_CVPR_paper.pdf
Convolutional RNN (C-RNN) (2015)
“The C-RNN is trained in an end-to-end manner from raw pixel images. CNN
layers are firstly processed to generate middle level features. RNN layer is then
learned to encode spatial dependencies.”
“In [13], MDLSTM was proposed to solve the handwriting recognition problem by
using RNN. Different from this work, we utilize quad-directional 1D RNN
instead of their 2D RNN, our RNN is simpler and it has fewer parameters, but it
can already cover the context from all directions. Moreover, our C-RNN make both
use of the discriminative representation power of CNN and contextual information
modeling capability of RNN, which is more powerful for solving large scale image
classification problem.”
Interestingly, it’s not an LSTM, just a simple RNN.
Convolutional RNN (C-RNN) (2015)
Convolutional RNN (C-RNN) (2015)
“Our C-RNN had the same settings as Alex-net, except that Alex-net directly connects
the output of the fifth convolutional layer to the sixth fully connected layer, while
our C-RNN uses the RNN to connect the fifth convolutional layer and the fully
connected layers”
Example #3B:
Convolutional Hierarchical RNN (C-HRNN) (2015)
http://arxiv.org/abs/1509.03877
Convolutional hierarchical RNN (C-HRNN) (2015)
“In Hierarchical RNNs (HRNNs), each RNN layer focuses on modeling spatial
dependencies among image regions from the same scale but different locations.
While the cross RNN scale connections target on modeling scale dependencies
among regions from the same location but different scales.”
Finally with LSTM:
“Specifically, we propose two recurrent neural network models: 1) hierarchical
simple recurrent network (HSRN), which is fast and has low computational cost;
and 2) hierarchical long-short term memory recurrent network (HLSTM), which
performs better than HSRN with the price of more computational cost.”
Convolutional hierarchical RNN (C-HRNN) (2015)
Convolutional hierarchical RNN (C-HRNN) (2015)
“Thus, inspired by [22], we generate “2D sequences” for images, and each element
simultaneously receives spatial contextual references from its 2D neighborhood
elements.”
Convolutional hierarchical RNN (C-HRNN) (2015)
Some more recent examples
using new ideas
Example #4:
ReNet (2015)
[Francesco Visin, Kyle Kastner, Kyunghyun Cho, Matteo Matteucci, Aaron Courville, Yoshua Bengio]
http://arxiv.org/abs/1505.00393
ReNet (2015)
“Our model relies on purely uni-dimensional RNNs coupled in a novel way, rather
than on a multi-dimensional RNN. The basic idea behind the proposed ReNet
architecture is to replace each convolutional layer (with convolution+pooling
making up a layer) in the CNN with four RNNs that sweep over lower-layer
features in different directions: (1) bottom to top, (2) top to bottom, (3) left to right
and (4) right to left.”
“The main difference between ReNet and the model of Graves and Schmidhuber
[2009] is that we use the usual sequence RNN, instead of the multidimensional
RNN.“
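A minimal sketch of one ReNet layer under these assumptions (toy tanh cells instead of the paper's RNN units, patch extraction omitted, names illustrative): two 1D sweeps over the columns (downward and upward), then two 1D sweeps over the rows of the result (left-to-right and right-to-left), i.e. four ordinary 1D RNNs rather than one multidimensional RNN.

```python
import numpy as np

def scan_1d(seq, W_x, W_h):
    h, out = np.zeros(W_h.shape[0]), []
    for x in seq:
        h = np.tanh(W_x @ x + W_h @ h)
        out.append(h)
    return np.stack(out)

def vertical_sweep(feat, p_down, p_up):
    """feat: (H, W, C) -> (H, W, 2*n_hid): sweep every column in both directions."""
    cols = []
    for j in range(feat.shape[1]):
        col = feat[:, j]
        d = scan_1d(col, *p_down)
        u = scan_1d(col[::-1], *p_up)[::-1]
        cols.append(np.concatenate([d, u], axis=-1))
    return np.stack(cols, axis=1)

def renet_layer(feat, p_down, p_up, p_right, p_left):
    v = vertical_sweep(feat, p_down, p_up)                       # vertical context
    h = vertical_sweep(v.transpose(1, 0, 2), p_right, p_left)    # horizontal = vertical sweep of the transpose
    return h.transpose(1, 0, 2)

rng = np.random.default_rng(3)
params = lambda n_in, n_hid: (rng.normal(size=(n_hid, n_in)) * 0.1,
                              rng.normal(size=(n_hid, n_hid)) * 0.1)
feat = rng.normal(size=(6, 6, 3))                                # e.g. a 6x6 grid of patch features
out = renet_layer(feat, params(3, 4), params(3, 4), params(8, 4), params(8, 4))
print(out.shape)                                                  # (6, 6, 8)
```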
ReNet (2015)
“One important consequence of the proposed approach compared to the multidimensional
RNN is that the number of RNNs at each layer scales now linearly with respect to the
number of dimensions d of the input image (2d). A multidimensional RNN, on the other
hand, requires the exponential number of RNNs at each layer (2^d). Furthermore, the
proposed variant is more easily parallelizable, as each RNN is dependent only along a
horizontal or vertical sequence of patches. This architectural distinction results in our
model being much more amenable to distributed computing than that of Graves and
Schmidhuber [2009]”.
… But for d=2, 2d == 2^d (both equal 4).
ReNet (2015)
ReNet (2015)
Example #5: “The Empire Strikes Back”
PyraMiD-LSTM (2015)
[Marijn F. Stollenga, Wonmin Byeon, Marcus Liwicki, Juergen Schmidhuber]
http://arxiv.org/abs/1506.07452
PyraMiD-LSTM (2015)
“Multi-Dimensional Recurrent NNs (MD-RNNs) can perceive the entire spatio-temporal
context of each pixel in a few sweeps through all pixels, especially when
the RNN is a Long Short-Term Memory (LSTM). Despite these theoretical
advantages, however, unlike CNNs, previous MD-LSTM variants were hard to
parallelize on GPUs. Here we re-arrange the traditional cuboid order of
computations in MD-LSTM in pyramidal fashion. The resulting PyraMiD-LSTM is
easy to parallelize, especially for 3D data such as stacks of brain slice images.”
PyraMiD-LSTM (2015)
PyraMiD-LSTM (2015)
“One of the striking differences between PyraMiD-LSTM and MD-LSTM is the
shape of the scanned contexts. Each LSTM of an MD-LSTM scans rectangle-like
contexts in 2D or cuboids in 3D. Each LSTM of a PyraMiD-LSTM scans
triangles in 2D and pyramids in 3D. An MD-LSTM needs 8 LSTMs to scan a
volume, while a PyraMiD-LSTM needs only 6, since it takes 8 cubes or 6 pyramids
to fill a volume. Given dimension d, the number of LSTMs grows as 2^d for an
MD-LSTM (exponentially) and 2 × d for a PyraMiD-LSTM (linearly).”
PyraMiD-LSTM (2015)
PyraMiD-LSTM (2015)
PyraMiD-LSTM (2015)
PyraMiD-LSTM (2015)
PyraMiD-LSTM (2015)
PyraMiD-LSTM (2015)
On the MR brain dataset, training took around three days, and testing per volume
took around 2 minutes. The networks contain three PyraMiD-LSTM layers:
1. 16 hidden units + fully-connected layer with 25 hidden units;
2. 32 hidden units + fully-connected layer with 45 hidden units;
3. 64 hidden units + fully-connected output layer whose size equals the number of classes.
“Previous MD-LSTM implementations, however, could not exploit the parallelism
of modern GPU hardware. This has changed through our work presented here.
Although our novel highly parallel PyraMiD-LSTM has already achieved
state-of-the-art segmentation results in challenging benchmarks, we feel we have only
scratched the surface of what will become possible with such PyraMiD-LSTM
and other MD-RNNs.”
Example #6: Grid LSTM (ICLR 2016)
(Graves again!)
[Nal Kalchbrenner, Ivo Danihelka, Alex Graves]
http://arxiv.org/abs/1507.01526
Grid LSTM (2016)
“This paper introduces Grid Long Short-Term Memory, a network of LSTM cells
arranged in a multidimensional grid that can be applied to vectors, sequences
or higher dimensional data such as images. The network differs from existing
deep LSTM architectures in that the cells are connected between network layers
as well as along the spatiotemporal dimensions of the data. The network provides
a unified way of using LSTM for both deep and sequential computation.”
Grid LSTM (2016)
“Deep networks suffer from exactly the same problems as recurrent networks
applied to long sequences: namely that information from past computations rapidly
attenuates as it progresses through the chain – the vanishing gradient problem
(Hochreiter, 1991) – and that each layer cannot dynamically select or ignore its
inputs. It therefore seems attractive to generalise the advantages of LSTM to deep
computation.”
Can be N-dimensional. N-dimensional Grid LSTM is called N-LSTM for short.
Grid LSTM (2016)
One-dimensional Grid LSTM corresponds to a feed-forward network that uses
LSTM cells in place of transfer functions such as tanh and ReLU. These networks
are related to Highway Networks (Srivastava et al., 2015) where a gated transfer
function is used to successfully train feed-forward networks with up to 900 layers
of depth.
Grid LSTM with two dimensions is analogous to the Stacked LSTM, but it adds
cells along the depth dimension too.
Grid LSTM with three or more dimensions is analogous to Multidimensional
LSTM, but differs from it not just by having the cells along the depth dimension,
but also by using the proposed mechanism for modulating the N-way interaction,
which is not prone to the instability present in Multidimensional LSTM.
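A hedged sketch of the one-dimensional case (a toy reading of "a feed-forward network with LSTM cells in place of transfer functions", an assumption rather than the paper's code): depth itself is treated as the recurrent axis, so a stack of layers is just an LSTM cell applied repeatedly to its own (h, c) pair, much like the Highway Network analogy above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell(h_prev, c_prev, W, b):
    """One step along the depth dimension; W: (4*n_hid, n_hid)."""
    z = W @ h_prev + b
    n = h_prev.size
    i, f, o = (sigmoid(z[k*n:(k+1)*n]) for k in range(3))
    g = np.tanh(z[3*n:4*n])
    c = f * c_prev + i * g                 # memory carried through depth, not time
    return o * np.tanh(c), c

def grid_lstm_1d(x, W_in, layers):
    h, c = np.tanh(W_in @ x), np.zeros(W_in.shape[0])   # project the input into the depth "memory"
    for W, b in layers:                                  # each layer is one step along depth
        h, c = lstm_cell(h, c, W, b)
    return h

rng = np.random.default_rng(4)
n_in, n_hid, depth = 5, 8, 6
layers = [(rng.normal(size=(4 * n_hid, n_hid)) * 0.1, np.zeros(4 * n_hid)) for _ in range(depth)]
print(grid_lstm_1d(rng.normal(size=n_in), rng.normal(size=(n_hid, n_in)) * 0.1, layers).shape)   # (8,)
```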
Grid LSTM (2016)
Grid LSTM (2016)
Grid LSTM (2016)
Grid LSTM (2016)
Grid LSTM (2016)
The difference with the Multidimensional LSTM is that we apply multiple layers
of depth to the image, use three-dimensional blocks and concatenate the top
output vectors before classification.
The difference with the ReNet architecture is that the 3-LSTM processes the
image according to the two inherent spatial dimensions; instead of stacking hidden
layers as in the ReNet, the block also modulates directly what information is
passed along the depth dimension.
Time for Discussion:
RNN vs. CNN for Computer Vision
Resources
Resources (The Classic)
- Multi-Dimensional Recurrent Neural Networks,
Alex Graves, Santiago Fernandez, Juergen Schmidhuber
https://arxiv.org/abs/0705.2011
- Offline Handwriting Recognition with Multidimensional Recurrent Neural
Networks / NIPS 2009,
Alex Graves, Juergen Schmidhuber
http://papers.nips.cc/paper/3449-offline-handwriting-recognition-with-multidimensional-recurrent-neural-networks
Resources (The Classic)
- Supervised Sequence Labelling with Recurrent Neural Networks
Alex Graves, Springer, 2012
http://www.springer.com/us/book/9783642247965
https://www.cs.toronto.edu/~graves/preprint.pdf
- RNNLIB https://sourceforge.net/projects/rnnl/
- http://deeplearning.cs.cmu.edu/slides.2015/20.graves.pdf
Resources (more recent)
- Scene Labeling with LSTM Recurrent Neural Networks / CVPR 2015,
Wonmin Byeon, Thomas M. Breuel, Federico Raue, Marcus Liwicki
http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Byeon_Scene_Labeling_With_2015_CVPR_paper.pdf
- Deep Convolutional and Recurrent Neural Network for Object Classification
and Localization / ILSVRC 2015
http://image-net.org/challenges/posters/ILSVRC2015_Poster_VUNO.pdf
- Convolutional Recurrent Neural Networks: Learning Spatial Dependencies
for Image Representation / CVPR 2015
http://www.cv-foundation.org/openaccess/content_cvpr_workshops_2015/W03/papers/Zuo_Convolutional_Recurrent_Neural_2015_CVPR_paper.pdf
Resources (more recent)
- ReNet: A Recurrent Neural Network Based Alternative to Convolutional
Networks
Francesco Visin, Kyle Kastner, Kyunghyun Cho, Matteo Matteucci, Aaron Courville,
Yoshua Bengio
http://arxiv.org/abs/1505.00393
- Parallel Multi-Dimensional LSTM, With Application to Fast Biomedical
Volumetric Image Segmentation
Marijn F. Stollenga, Wonmin Byeon, Marcus Liwicki, Juergen Schmidhuber
http://arxiv.org/abs/1506.07452
- Grid Long Short-Term Memory
Nal Kalchbrenner, Ivo Danihelka, Alex Graves
http://arxiv.org/abs/1507.01526
Resources (more recent)
- OCRopus
https://github.com/tmbdev/ocropy
https://github.com/tmbdev/clstm
The problem is that MDRNNs are mostly unsupported: few modern libraries implement them.
More to come
- Graph LSTM http://arxiv.org/pdf/1603.07063v1.pdf
- Local-Global LSTM http://arxiv.org/pdf/1511.04510v1.pdf
- …
https://ru.linkedin.com/in/grigorysapunov
gs@inten.to
Thanks!
More Related Content

What's hot

What Is A Neural Network? | How Deep Neural Networks Work | Neural Network Tu...
What Is A Neural Network? | How Deep Neural Networks Work | Neural Network Tu...What Is A Neural Network? | How Deep Neural Networks Work | Neural Network Tu...
What Is A Neural Network? | How Deep Neural Networks Work | Neural Network Tu...Simplilearn
 
Feature Engineering
Feature Engineering Feature Engineering
Feature Engineering odsc
 
Trends of ICASSP 2022
Trends of ICASSP 2022Trends of ICASSP 2022
Trends of ICASSP 2022Kwanghee Choi
 
Deep neural networks
Deep neural networksDeep neural networks
Deep neural networksSi Haem
 
Introduction to Deep learning
Introduction to Deep learningIntroduction to Deep learning
Introduction to Deep learningleopauly
 
Fraud detection ML
Fraud detection MLFraud detection ML
Fraud detection MLMaatougSelim
 
Transformers In Vision From Zero to Hero (DLI).pptx
Transformers In Vision From Zero to Hero (DLI).pptxTransformers In Vision From Zero to Hero (DLI).pptx
Transformers In Vision From Zero to Hero (DLI).pptxDeep Learning Italia
 
Introduction to CNN
Introduction to CNNIntroduction to CNN
Introduction to CNNShuai Zhang
 
“How Transformers are Changing the Direction of Deep Learning Architectures,”...
“How Transformers are Changing the Direction of Deep Learning Architectures,”...“How Transformers are Changing the Direction of Deep Learning Architectures,”...
“How Transformers are Changing the Direction of Deep Learning Architectures,”...Edge AI and Vision Alliance
 
Shor’s algorithm the ppt
Shor’s algorithm the pptShor’s algorithm the ppt
Shor’s algorithm the pptMrinal Mondal
 
Grovers Algorithm
Grovers Algorithm Grovers Algorithm
Grovers Algorithm CaseyHaaland
 
Attention is All You Need (Transformer)
Attention is All You Need (Transformer)Attention is All You Need (Transformer)
Attention is All You Need (Transformer)Jeong-Gwan Lee
 
History of deep learning
History of deep learningHistory of deep learning
History of deep learningayatan2
 
Quantum Algorithms @ work - Short introduction to Quantum Annealing and opera...
Quantum Algorithms @ work - Short introduction to Quantum Annealing and opera...Quantum Algorithms @ work - Short introduction to Quantum Annealing and opera...
Quantum Algorithms @ work - Short introduction to Quantum Annealing and opera...Fujitsu Central Europe
 
Neural networks and deep learning
Neural networks and deep learningNeural networks and deep learning
Neural networks and deep learningJörgen Sandig
 
Introduction to Visual transformers
Introduction to Visual transformers Introduction to Visual transformers
Introduction to Visual transformers leopauly
 

What's hot (20)

Lecture11 - neural networks
Lecture11 - neural networksLecture11 - neural networks
Lecture11 - neural networks
 
LSTM
LSTMLSTM
LSTM
 
What Is A Neural Network? | How Deep Neural Networks Work | Neural Network Tu...
What Is A Neural Network? | How Deep Neural Networks Work | Neural Network Tu...What Is A Neural Network? | How Deep Neural Networks Work | Neural Network Tu...
What Is A Neural Network? | How Deep Neural Networks Work | Neural Network Tu...
 
Feature Engineering
Feature Engineering Feature Engineering
Feature Engineering
 
Trends of ICASSP 2022
Trends of ICASSP 2022Trends of ICASSP 2022
Trends of ICASSP 2022
 
Attention Is All You Need
Attention Is All You NeedAttention Is All You Need
Attention Is All You Need
 
Deep neural networks
Deep neural networksDeep neural networks
Deep neural networks
 
Introduction to Deep learning
Introduction to Deep learningIntroduction to Deep learning
Introduction to Deep learning
 
Fraud detection ML
Fraud detection MLFraud detection ML
Fraud detection ML
 
Transformers In Vision From Zero to Hero (DLI).pptx
Transformers In Vision From Zero to Hero (DLI).pptxTransformers In Vision From Zero to Hero (DLI).pptx
Transformers In Vision From Zero to Hero (DLI).pptx
 
Introduction to CNN
Introduction to CNNIntroduction to CNN
Introduction to CNN
 
“How Transformers are Changing the Direction of Deep Learning Architectures,”...
“How Transformers are Changing the Direction of Deep Learning Architectures,”...“How Transformers are Changing the Direction of Deep Learning Architectures,”...
“How Transformers are Changing the Direction of Deep Learning Architectures,”...
 
Shor’s algorithm the ppt
Shor’s algorithm the pptShor’s algorithm the ppt
Shor’s algorithm the ppt
 
Grovers Algorithm
Grovers Algorithm Grovers Algorithm
Grovers Algorithm
 
Attention is All You Need (Transformer)
Attention is All You Need (Transformer)Attention is All You Need (Transformer)
Attention is All You Need (Transformer)
 
History of deep learning
History of deep learningHistory of deep learning
History of deep learning
 
Quantum Algorithms @ work - Short introduction to Quantum Annealing and opera...
Quantum Algorithms @ work - Short introduction to Quantum Annealing and opera...Quantum Algorithms @ work - Short introduction to Quantum Annealing and opera...
Quantum Algorithms @ work - Short introduction to Quantum Annealing and opera...
 
Neural networks and deep learning
Neural networks and deep learningNeural networks and deep learning
Neural networks and deep learning
 
Introduction to Visual transformers
Introduction to Visual transformers Introduction to Visual transformers
Introduction to Visual transformers
 
"Attention Is All You Need" presented by Maroua Maachou (Veepee)
"Attention Is All You Need" presented by Maroua Maachou (Veepee)"Attention Is All You Need" presented by Maroua Maachou (Veepee)
"Attention Is All You Need" presented by Maroua Maachou (Veepee)
 

Similar to Multidimensional RNN

Introduction to deep learning
Introduction to deep learningIntroduction to deep learning
Introduction to deep learningJunaid Bhat
 
Sequence learning and modern RNNs
Sequence learning and modern RNNsSequence learning and modern RNNs
Sequence learning and modern RNNsGrigory Sapunov
 
Adams_SIAMCSE15
Adams_SIAMCSE15Adams_SIAMCSE15
Adams_SIAMCSE15Karen Pao
 
Standardising the compressed representation of neural networks
Standardising the compressed representation of neural networksStandardising the compressed representation of neural networks
Standardising the compressed representation of neural networksFörderverein Technische Fakultät
 
deeplearning
deeplearningdeeplearning
deeplearninghuda2018
 
NS-CUK Seminar: S.T.Nguyen, Review on "Improving Graph Neural Network Express...
NS-CUK Seminar: S.T.Nguyen, Review on "Improving Graph Neural Network Express...NS-CUK Seminar: S.T.Nguyen, Review on "Improving Graph Neural Network Express...
NS-CUK Seminar: S.T.Nguyen, Review on "Improving Graph Neural Network Express...ssuser4b1f48
 
Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)Gaurav Mittal
 
Machine Vision on Embedded Hardware
Machine Vision on Embedded HardwareMachine Vision on Embedded Hardware
Machine Vision on Embedded HardwareJash Shah
 
UNET: Massive Scale DNN on Spark
UNET: Massive Scale DNN on SparkUNET: Massive Scale DNN on Spark
UNET: Massive Scale DNN on SparkZhan Zhang
 
Trackster Pruning at the CMS High-Granularity Calorimeter
Trackster Pruning at the CMS High-Granularity CalorimeterTrackster Pruning at the CMS High-Granularity Calorimeter
Trackster Pruning at the CMS High-Granularity CalorimeterYousef Fadila
 
Recurrent Neural Networks, LSTM and GRU
Recurrent Neural Networks, LSTM and GRURecurrent Neural Networks, LSTM and GRU
Recurrent Neural Networks, LSTM and GRUananth
 
SummaRuNNer: A Recurrent Neural Network based Sequence Model for Extractive S...
SummaRuNNer: A Recurrent Neural Network based Sequence Model for Extractive S...SummaRuNNer: A Recurrent Neural Network based Sequence Model for Extractive S...
SummaRuNNer: A Recurrent Neural Network based Sequence Model for Extractive S...Sharath TS
 
intro-to-cnn-April_2020.pptx
intro-to-cnn-April_2020.pptxintro-to-cnn-April_2020.pptx
intro-to-cnn-April_2020.pptxssuser3aa461
 
Convolutional Neural Networks for Natural Language Processing / Stanford cs22...
Convolutional Neural Networks for Natural Language Processing / Stanford cs22...Convolutional Neural Networks for Natural Language Processing / Stanford cs22...
Convolutional Neural Networks for Natural Language Processing / Stanford cs22...changedaeoh
 
STATE-CLUSTERING BASED MULTIPLE DEEP NEURAL NETWORKS MODELING APPROACH FOR SP...
STATE-CLUSTERING BASED MULTIPLE DEEP NEURAL NETWORKS MODELING APPROACH FOR SP...STATE-CLUSTERING BASED MULTIPLE DEEP NEURAL NETWORKS MODELING APPROACH FOR SP...
STATE-CLUSTERING BASED MULTIPLE DEEP NEURAL NETWORKS MODELING APPROACH FOR SP...I3E Technologies
 

Similar to Multidimensional RNN (20)

Introduction to deep learning
Introduction to deep learningIntroduction to deep learning
Introduction to deep learning
 
Sequence learning and modern RNNs
Sequence learning and modern RNNsSequence learning and modern RNNs
Sequence learning and modern RNNs
 
Adams_SIAMCSE15
Adams_SIAMCSE15Adams_SIAMCSE15
Adams_SIAMCSE15
 
Standardising the compressed representation of neural networks
Standardising the compressed representation of neural networksStandardising the compressed representation of neural networks
Standardising the compressed representation of neural networks
 
deeplearning
deeplearningdeeplearning
deeplearning
 
NS-CUK Seminar: S.T.Nguyen, Review on "Improving Graph Neural Network Express...
NS-CUK Seminar: S.T.Nguyen, Review on "Improving Graph Neural Network Express...NS-CUK Seminar: S.T.Nguyen, Review on "Improving Graph Neural Network Express...
NS-CUK Seminar: S.T.Nguyen, Review on "Improving Graph Neural Network Express...
 
Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)
 
Bh36352357
Bh36352357Bh36352357
Bh36352357
 
Chapter 4 better.pptx
Chapter 4 better.pptxChapter 4 better.pptx
Chapter 4 better.pptx
 
Machine Vision on Embedded Hardware
Machine Vision on Embedded HardwareMachine Vision on Embedded Hardware
Machine Vision on Embedded Hardware
 
UNET: Massive Scale DNN on Spark
UNET: Massive Scale DNN on SparkUNET: Massive Scale DNN on Spark
UNET: Massive Scale DNN on Spark
 
Trackster Pruning at the CMS High-Granularity Calorimeter
Trackster Pruning at the CMS High-Granularity CalorimeterTrackster Pruning at the CMS High-Granularity Calorimeter
Trackster Pruning at the CMS High-Granularity Calorimeter
 
Recurrent Neural Networks, LSTM and GRU
Recurrent Neural Networks, LSTM and GRURecurrent Neural Networks, LSTM and GRU
Recurrent Neural Networks, LSTM and GRU
 
Et25897899
Et25897899Et25897899
Et25897899
 
SummaRuNNer: A Recurrent Neural Network based Sequence Model for Extractive S...
SummaRuNNer: A Recurrent Neural Network based Sequence Model for Extractive S...SummaRuNNer: A Recurrent Neural Network based Sequence Model for Extractive S...
SummaRuNNer: A Recurrent Neural Network based Sequence Model for Extractive S...
 
intro-to-cnn-April_2020.pptx
intro-to-cnn-April_2020.pptxintro-to-cnn-April_2020.pptx
intro-to-cnn-April_2020.pptx
 
Convolutional Neural Networks for Natural Language Processing / Stanford cs22...
Convolutional Neural Networks for Natural Language Processing / Stanford cs22...Convolutional Neural Networks for Natural Language Processing / Stanford cs22...
Convolutional Neural Networks for Natural Language Processing / Stanford cs22...
 
STATE-CLUSTERING BASED MULTIPLE DEEP NEURAL NETWORKS MODELING APPROACH FOR SP...
STATE-CLUSTERING BASED MULTIPLE DEEP NEURAL NETWORKS MODELING APPROACH FOR SP...STATE-CLUSTERING BASED MULTIPLE DEEP NEURAL NETWORKS MODELING APPROACH FOR SP...
STATE-CLUSTERING BASED MULTIPLE DEEP NEURAL NETWORKS MODELING APPROACH FOR SP...
 
Scene understanding
Scene understandingScene understanding
Scene understanding
 
B.tech_project_ppt.pptx
B.tech_project_ppt.pptxB.tech_project_ppt.pptx
B.tech_project_ppt.pptx
 

More from Grigory Sapunov

AI Hardware Landscape 2021
AI Hardware Landscape 2021AI Hardware Landscape 2021
AI Hardware Landscape 2021Grigory Sapunov
 
What's new in AI in 2020 (very short)
What's new in AI in 2020 (very short)What's new in AI in 2020 (very short)
What's new in AI in 2020 (very short)Grigory Sapunov
 
Artificial Intelligence (lecture for schoolchildren) [rus]
Artificial Intelligence (lecture for schoolchildren) [rus]Artificial Intelligence (lecture for schoolchildren) [rus]
Artificial Intelligence (lecture for schoolchildren) [rus]Grigory Sapunov
 
Transformer Zoo (a deeper dive)
Transformer Zoo (a deeper dive)Transformer Zoo (a deeper dive)
Transformer Zoo (a deeper dive)Grigory Sapunov
 
Deep learning: Hardware Landscape
Deep learning: Hardware LandscapeDeep learning: Hardware Landscape
Deep learning: Hardware LandscapeGrigory Sapunov
 
Modern neural net architectures - Year 2019 version
Modern neural net architectures - Year 2019 versionModern neural net architectures - Year 2019 version
Modern neural net architectures - Year 2019 versionGrigory Sapunov
 
AI - Last Year Progress (2018-2019)
AI - Last Year Progress (2018-2019)AI - Last Year Progress (2018-2019)
AI - Last Year Progress (2018-2019)Grigory Sapunov
 
Практический подход к выбору доменно-адаптивного NMT​
Практический подход к выбору доменно-адаптивного NMT​Практический подход к выбору доменно-адаптивного NMT​
Практический подход к выбору доменно-адаптивного NMT​Grigory Sapunov
 
Deep Learning: Application Landscape - March 2018
Deep Learning: Application Landscape - March 2018Deep Learning: Application Landscape - March 2018
Deep Learning: Application Landscape - March 2018Grigory Sapunov
 
Введение в Deep Learning
Введение в Deep LearningВведение в Deep Learning
Введение в Deep LearningGrigory Sapunov
 
Введение в машинное обучение
Введение в машинное обучениеВведение в машинное обучение
Введение в машинное обучениеGrigory Sapunov
 
Введение в архитектуры нейронных сетей / HighLoad++ 2016
Введение в архитектуры нейронных сетей / HighLoad++ 2016Введение в архитектуры нейронных сетей / HighLoad++ 2016
Введение в архитектуры нейронных сетей / HighLoad++ 2016Grigory Sapunov
 
Artificial Intelligence - Past, Present and Future
Artificial Intelligence - Past, Present and FutureArtificial Intelligence - Past, Present and Future
Artificial Intelligence - Past, Present and FutureGrigory Sapunov
 
Deep Learning and the state of AI / 2016
Deep Learning and the state of AI / 2016Deep Learning and the state of AI / 2016
Deep Learning and the state of AI / 2016Grigory Sapunov
 
Deep Learning Cases: Text and Image Processing
Deep Learning Cases: Text and Image ProcessingDeep Learning Cases: Text and Image Processing
Deep Learning Cases: Text and Image ProcessingGrigory Sapunov
 
Computer Vision and Deep Learning
Computer Vision and Deep LearningComputer Vision and Deep Learning
Computer Vision and Deep LearningGrigory Sapunov
 

More from Grigory Sapunov (20)

Transformers in 2021
Transformers in 2021Transformers in 2021
Transformers in 2021
 
AI Hardware Landscape 2021
AI Hardware Landscape 2021AI Hardware Landscape 2021
AI Hardware Landscape 2021
 
NLP in 2020
NLP in 2020NLP in 2020
NLP in 2020
 
What's new in AI in 2020 (very short)
What's new in AI in 2020 (very short)What's new in AI in 2020 (very short)
What's new in AI in 2020 (very short)
 
Artificial Intelligence (lecture for schoolchildren) [rus]
Artificial Intelligence (lecture for schoolchildren) [rus]Artificial Intelligence (lecture for schoolchildren) [rus]
Artificial Intelligence (lecture for schoolchildren) [rus]
 
Transformer Zoo (a deeper dive)
Transformer Zoo (a deeper dive)Transformer Zoo (a deeper dive)
Transformer Zoo (a deeper dive)
 
Transformer Zoo
Transformer ZooTransformer Zoo
Transformer Zoo
 
BERTology meets Biology
BERTology meets BiologyBERTology meets Biology
BERTology meets Biology
 
Deep learning: Hardware Landscape
Deep learning: Hardware LandscapeDeep learning: Hardware Landscape
Deep learning: Hardware Landscape
 
Modern neural net architectures - Year 2019 version
Modern neural net architectures - Year 2019 versionModern neural net architectures - Year 2019 version
Modern neural net architectures - Year 2019 version
 
AI - Last Year Progress (2018-2019)
AI - Last Year Progress (2018-2019)AI - Last Year Progress (2018-2019)
AI - Last Year Progress (2018-2019)
 
Практический подход к выбору доменно-адаптивного NMT​
Практический подход к выбору доменно-адаптивного NMT​Практический подход к выбору доменно-адаптивного NMT​
Практический подход к выбору доменно-адаптивного NMT​
 
Deep Learning: Application Landscape - March 2018
Deep Learning: Application Landscape - March 2018Deep Learning: Application Landscape - March 2018
Deep Learning: Application Landscape - March 2018
 
Введение в Deep Learning
Введение в Deep LearningВведение в Deep Learning
Введение в Deep Learning
 
Введение в машинное обучение
Введение в машинное обучениеВведение в машинное обучение
Введение в машинное обучение
 
Введение в архитектуры нейронных сетей / HighLoad++ 2016
Введение в архитектуры нейронных сетей / HighLoad++ 2016Введение в архитектуры нейронных сетей / HighLoad++ 2016
Введение в архитектуры нейронных сетей / HighLoad++ 2016
 
Artificial Intelligence - Past, Present and Future
Artificial Intelligence - Past, Present and FutureArtificial Intelligence - Past, Present and Future
Artificial Intelligence - Past, Present and Future
 
Deep Learning and the state of AI / 2016
Deep Learning and the state of AI / 2016Deep Learning and the state of AI / 2016
Deep Learning and the state of AI / 2016
 
Deep Learning Cases: Text and Image Processing
Deep Learning Cases: Text and Image ProcessingDeep Learning Cases: Text and Image Processing
Deep Learning Cases: Text and Image Processing
 
Computer Vision and Deep Learning
Computer Vision and Deep LearningComputer Vision and Deep Learning
Computer Vision and Deep Learning
 

Recently uploaded

Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsHyundai Motor Group
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Hyundai Motor Group
 

Recently uploaded (20)

Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2
 

Multidimensional RNN

  • 1. Multi-Dimensional RNNs Grigory Sapunov Moscow Computer Vision Group Moscow, Yandex, 27.04.2016 gs@inten.to
  • 2. Outline ● RNN intro ○ RNN vs. FFNN ○ LSTM/GRU ○ CTC ● Some architectural issues ○ BRNN ○ MDRNN ○ MDMDRNN ○ HSRNN ● Some recent results ○ (same ideas) 2D LSTM, CLSTM, C-RNN, C-HRNN ○ (new ideas) ReNet, PyraMiD-LSTM, Grid LSTM ● (Discussion) RNN vs CNN for Computer Vision ● Resources
  • 3. A tiny intro into RNNs
  • 4. Feedforward NN vs. Recurrent NN Recurrent neural networks (RNNs) allow cyclical connections.
  • 5. Unfolding the RNN and training using BPTT Can do backprop on the unfolded network: Backpropagation through time (BPTT) http://ir.hit.edu.cn/~jguo/docs/notes/bptt.pdf
  • 6. Neural Network properties Feedforward NN (FFNN): ● FFNN is a universal approximator: feed-forward network with a single hidden layer, which contains finite number of hidden neurons, can approximate continuous functions on compact subsets of Rn , under mild assumptions on the activation function. ● Typical FFNNs have no inherent notion of order in time. They remember only training. Recurrent NN (RNN): ● RNNs are Turing-complete: they can compute anything that can be computed and have the capacity to simulate arbitrary procedures. ● RNNs possess a certain type of memory. They are much better suited to dealing with sequences, context modeling and time dependencies.
  • 7. RNN problem: Vanishing gradients Solution: Long short-term memory (LSTM, Hochreiter, Schmidhuber, 1997)
  • 10. LSTM: Fixing vanishing gradient problem
  • 11. Comparing LSTM and Simple RNN More on LSTMs: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
  • 12. Another solution: Gated Recurrent Unit (GRU) GRU (Cho et al., 2014) is a bit simpler than LSTM (less weights)
  • 13. Another useful thing: CTC Output Layer CTC (Connectionist Temporal Classification; Graves, Fernández, Gomez, Schmidhuber, 2006) was specifically designed for temporal classification tasks; that is, for sequence labelling problems where the alignment between the inputs and the target labels is unknown. CTC models all aspects of the sequence with a single neural network, and does not require the network to be combined with a hidden Markov model. It also does not require presegmented training data, or external post-processing to extract the label sequence from the network outputs. The CTC network predicts only the sequence of phonemes (typically as a series of spikes, separated by ‘blanks’, or null predictions), while the framewise network attempts to align them with the manual segmentation.
  • 14. Example: CTC vs. Framewise classification
  • 15. End of Intro So, further we will not make a distinction between RNN/GRU/LSTM, and will usually be using the word RNN for any kind of internal block. Typically most RNNs now are actually LSTMs. Significant part of the presentation is based on works of Alex Graves et al.
  • 16. Some interesting generalizations of simple RNN architecture
  • 18. Bidirectional RNN/LSTM There are many situations when you see the whole sequence at once (OCR, speech recognition, translation, caption generation, …). So you can scan the [1-d] sequence in both directions, forward and backward. Here comes BLSTM (Graves, Schmidhuber, 2005).
  • 19. BLSTM
  • 20. Typical result: BRNN>RNN, LSTM>RNN, BLSTM>BRNN
  • 21. Typical result: BRNN>RNN, LSTM>RNN, BLSTM>BRNN
  • 22. Typical result: BRNN>RNN, LSTM>RNN, BLSTM>BRNN
  • 23. Example: BLSTM classifying the utterance “one oh five”
  • 25. Multidimensional RNN/LSTM Standard RNNs are inherently one dimensional, and therefore poorly suited to multidimensional data (e.g. images). The basic idea of MDRNNs (Graves, Fernandez, Schmidhuber, 2007) is to replace the single recurrent connection found in standard RNNs with as many recurrent connections as there are dimensions in the data. It assumes some ordering on the multidimensional data.
  • 26. MDRNN The basic idea of MDRNNs is to replace the single recurrent connection found in standard RNNs with as many recurrent connections as there are dimensions in the data.
  • 27. Uni-directionality MDRNN assumes some ordering on the multidimensional data. And it’s not the only possible one.
  • 29. #3 Directionality + Dimensionality (MDMDRNN?)
  • 30. Multidirectional multidimensional RNN (MDMDRNN?) The previously mentioned ordering is not the only possible one. It might be OK for some tasks, but it is usually preferable for the network to have access to the surrounding context in all directions. This is particularly true for tasks where precise localisation is required, such as image segmentation. For one dimensional RNNs, the problem of multidirectional context was solved by the introduction of bidirectional recurrent neural networks (BRNNs). BRNNs contain two separate hidden layers that process the input sequence in the forward and reverse directions. BRNNs can be extended to n-dimensional data by using 2n separate hidden layers, each of which processes the sequence using the ordering defined above, but with a different choice of axes.
  • 32. Multi-directionality As before, the hidden layers are connected to a single output layer, which now has access to all surrounding context
  • 33. MDMDRNN example: Air Freight database (2007) A ray-traced colour image sequence that comes with a ground truth segmentation into the different textures mapped onto the 3-d models. The sequence is 455 frames (160x120 px) long and contains 155 distinct textures.
  • 34. MDMDRNN example: Air Freight database (2007) Network structure: ● Multidirectional 2D LSTM. ● 4 layers (not levels! just 4 directional layers on a single level) consisted of 25 memory blocks, each containing 1 cell, 2 forget gates, 1 input gate, 1 output gate and 5 peephole weights. ● The input and output activation function of the cells was tanh, and the activation function for the gates was the logistic sigmoid. ● The input layer was size 3 (RGB) and the output layer (softmax) was size 155 (one unit for each texture). ● The network contained 43,257 trainable weights in total. ● The final pixel classification error rate, after 330 training epochs, was 7.1% on the test set.
  • 35. MDMDRNN example: Air Freight database (2007)
  • 36. MDMDRNN example: MNIST (2007) Additional evaluation on the warped dataset (not used in training at all)
  • 39. Hierarchical Subsampling Networks (HSRNN) So-called hierarchical subsampling is commonly used in fields such as computer vision where the volume of data is too great to be processed by a ‘flat’ architecture. As well as reducing computational cost, it also reduces the effective dispersal of the data, since inputs that are widely separated at the bottom of the hierarchy are transformed to features that are close together at the top. A hierarchical subsampling recurrent neural network (HSRNN, Graves and Schmidhuber, 2009) consists of an input layer, an output layer and multiple levels of recurrently connected hidden layers. The output sequence of each level in the hierarchy is used as the input sequence for the next level up. All input sequences are subsampled using subsampling windows of predetermined width. The structure is similar to that used by convolutional networks, except with recurrent, rather than feedforward, hidden layers.
  • 40. HSRNN
  • 41. HSRNN For each layer in the hierarchy, the forward pass equations are identical to those for a standard RNN, except that the sum over input units is replaced by a sum of sums over the subsampling window. A good rule of thumb is to choose the layer sizes so that each level consumes roughly half the processing time of the level below.
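A rough 1D sketch (hypothetical names, recurrent connections omitted for brevity) of what the subsampling window does between levels: every frame in a window of predetermined width contributes through its own weight matrix, and the contributions are added up, so the next level sees a sequence roughly `width` times shorter.

import numpy as np

def subsample_window_sums(seq, W_window, b):
    """seq: (T, in_dim) output of the level below; W_window: (width, in_dim, hid),
    one weight matrix per position in the window; b: (hid,).
    Returns the pre-activations of the next level, one per window."""
    width = W_window.shape[0]
    out = []
    for start in range(0, seq.shape[0], width):
        a = b.copy()
        for k, frame in enumerate(seq[start:start + width]):
            a += frame @ W_window[k]          # the "sum of sums" over the window
        out.append(a)
    return np.stack(out)                      # shorter sequence for the next RNN level

In the full HSRNN each of these pre-activations would also receive the usual recurrent input from the previous step of its own level before the nonlinearity is applied.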
  • 42. HSRNN
  • 43. HSRNN Can be easily extended into the multidimensional and multidirectional case. The problem is that each level of the hierarchy then requires 2^n hidden layers instead of one. To connect every layer at one level to every layer at the next therefore requires O(2^(2n)) weights. One way to reduce the number of weights is to separate the levels with nonlinear feedforward layers, which reduces the number of weights between the levels to O(2^n), the same as for standard MDRNNs. As a rule of thumb, giving each feedforward layer between half and one times as many units as the combined hidden layers in the level below appears to work well in practice.
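A quick back-of-the-envelope check of the scaling argument above (counting only layer-to-layer connection groups between two adjacent levels, ignoring layer sizes):

for n in (1, 2, 3):
    layers_per_level = 2 ** n                  # multidirectional: one layer per scan direction
    direct = layers_per_level ** 2             # every layer to every layer: O(2^(2n))
    via_ff = 2 * layers_per_level              # all layers -> one feedforward layer -> all layers: O(2^n)
    print(f"n={n}: {layers_per_level} layers/level, direct {direct}, via feedforward {via_ff}")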
  • 44. HSRNN
  • 45. HSRNN example: Arabic handwriting recognition Network structure: ● The hierarchy contained three levels, multidirectional MDLSTM (so 4 hidden layers for 2D data). ● The three levels were separated by two feedforward layers with the tanh activation function. ● Subsampling windows were applied in three places: to the input sequence, to the output sequence of the first hidden level, and to the output sequence of the second hidden level.
  • 46. HSRNN
  • 47. Offline arabic handwriting recognition (2009) ● 32,492 black-and-white images of individual handwritten Tunisian town and village names, of which we used 30,000 for training, and 2,492 for validation ● Each image was supplied with a manual transcription for the individual characters, and the postcode of the corresponding town. There were 120 distinct characters in total ● The task was to identify the postcode, from a list of 937 town names and corresponding postcodes. Many of the town names had transcription variants, giving a total of 1,518 entries in the complete postcode lexicon. ● The test data (which is not published) was divided into sets ‘f’ and ‘s’. The main competition results were based on set ‘f’. Set ‘s’ contains data collected in the United Arab Emirates using the same forms; its purpose was to test the robustness of the recognisers to regional writing variations
  • 48. Offline arabic handwriting recognition (2009)
  • 49. Offline arabic handwriting recognition (2009)
  • 50. Some more recent examples using the same ideas
  • 51. Example #1: Scene Labeling with 2D LSTM (CVPR 2015) http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Byeon_Scene_Labeling_With_2015_CVPR_paper.pdf
  • 52. 2D LSTM for Scene Labeling (2015)
  • 53. 2D LSTM for Scene Labeling (2015) Scene Labeling with LSTM Recurrent Neural Networks / CVPR 2015, http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Byeon_Scene_Labeling_With_2015_CVPR_paper.pdf
  • 54. 2D LSTM for Scene Labeling (2015) Scene Labeling with LSTM Recurrent Neural Networks / CVPR 2015 http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Byeon_Scene_Labeling_With_2015_CVPR_paper.pdf
  • 55. Example #2: Convolutional LSTM (CLSTM) (ILSVRC 2015) http://image-net.org/challenges/posters/ILSVRC2015_Poster_VUNO.pdf
  • 56. Convolutional LSTM (CLSTM) (2015) Actually an LSTM over the last layers of CNN. “Among various models, multi-dimensional recurrent neural network, specifically multi-dimensional long-short term memory (MD-LSTM) has shown promising results and can be naturally integrated and trained ‘end-to-end’ fashion. However, when we try to learn the structure with very low level representation such as input pixel level, the dependency structure can be too noisy or spatially long-term dependency information can be vanished while training. Therefore, we propose to use 2D-LSTM layer on top of convolutional layers by taking advantage of convolution layers to extract high level representation of the image and 2D-LSTM layer to learn global spatial dependencies. We call this network as convolutional LSTM (CLSTM)”
  • 57. Convolutional LSTM (CLSTM) (2015) “Our CLSTM models are constructed by replacing the last two convolution layer of CNN with two 2D LSTM layers. Since we used multidirectional 2D LSTM, there are 2^2 directional nodes for each location of feature map.”
  • 58. Example #3: Convolutional RNN (C-RNN) (CVPR 2015) http://www.cv-foundation.org/openaccess/content_cvpr_workshops_2015/W03/papers/Zuo_Convolutional_Recurrent_Neural_2015_CVPR_paper.pdf
  • 59. Convolutional RNN (C-RNN) (2015) “The C-RNN is trained in an end-to-end manner from raw pixel images. CNN layers are firstly processed to generate middle level features. RNN layer is then learned to encode spatial dependencies.” “In [13], MDLSTM was proposed to solve the handwriting recognition problem by using RNN. Different from this work, we utilize quad-directional 1D RNN instead of their 2D RNN, our RNN is simpler and it has fewer parameters, but it can already cover the context from all directions. Moreover, our C-RNN make both use of the discriminative representation power of CNN and contextual information modeling capability of RNN, which is more powerful for solving large scale image classification problem.” Funny, it’s not an LSTM, just a simple RNN.
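Below is a rough sketch of the quad-directional 1D RNN idea (not the authors' code; plain NumPy, simple tanh units, all names hypothetical): each row of the convolutional feature map is scanned left-to-right and right-to-left, each column top-to-bottom and bottom-to-top, and the four pixel-aligned hidden maps are concatenated before the fully connected layers.

import numpy as np

def scan_1d(seq, W_in, W_rec, b):
    """Plain tanh RNN over a sequence of feature vectors: (T, in_dim) -> (T, hid)."""
    h, prev = [], np.zeros(b.shape[0])
    for x_t in seq:
        prev = np.tanh(x_t @ W_in + prev @ W_rec + b)
        h.append(prev)
    return np.stack(h)

def quad_directional_rnn(fmap, params):
    """fmap: (H, W, C) conv feature map; params: four (W_in, W_rec, b) tuples,
    one per direction. Returns an (H, W, 4*hid) map of concatenated contexts."""
    H, W, _ = fmap.shape
    outs = []
    for d, flip in enumerate([False, True]):               # left->right, right->left
        rows = fmap[:, ::-1] if flip else fmap
        h = np.stack([scan_1d(rows[i], *params[d]) for i in range(H)])
        outs.append(h[:, ::-1] if flip else h)
    for d, flip in enumerate([False, True], start=2):      # top->bottom, bottom->top
        cols = fmap[::-1] if flip else fmap
        h = np.stack([scan_1d(cols[:, j], *params[d]) for j in range(W)], axis=1)
        outs.append(h[::-1] if flip else h)
    return np.concatenate(outs, axis=-1)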
  • 61. Convolutional RNN (C-RNN) (2015) “Our C-RNN had the same settings with Alex-net, except that Alex-net directly connects the output of the fifth convolutional layer to the sixth fully connected layer, while our C-RNN uses the RNN to connect the fifth convolutional layer and the fully connected layers”
  • 62. Example #3B: Convolutional Hierarchical RNN (C-HRNN) (2015) http://arxiv.org/abs/1509.03877
  • 63. Convolutional hierarchical RNN (C-HRNN) (2015) “In Hierarchical RNNs (HRNNs), each RNN layer focuses on modeling spatial dependencies among image regions from the same scale but different locations. While the cross RNN scale connections target on modeling scale dependencies among regions from the same location but different scales.” Finally with LSTM: “Specifically, we propose two recurrent neural network models: 1) hierarchical simple recurrent network (HSRN), which is fast and has low computational cost; and 2) hierarchical long-short term memory recurrent network (HLSTM), which performs better than HSRN with the price of more computational cost.”
  • 65. Convolutional hierarchical RNN (C-HRNN) (2015) “Thus, inspired by [22], we generate “2D sequences” for images, and each element simultaneously receives spatial contextual references from its 2D neighborhood elements.”
  • 67. Some more recent examples using new ideas
  • 68. Example #4: ReNet (2015) [Francesco Visin, Kyle Kastner, Kyunghyun Cho, Matteo Matteucci, Aaron Courville, Yoshua Bengio] http://arxiv.org/abs/1505.00393
  • 69. ReNet (2015) “Our model relies on purely uni-dimensional RNNs coupled in a novel way, rather than on a multi-dimensional RNN. The basic idea behind the proposed ReNet architecture is to replace each convolutional layer (with convolution+pooling making up a layer) in the CNN with four RNNs that sweep over lower-layer features in different directions: (1) bottom to top, (2) top to bottom, (3) left to right and (4) right to left.” “The main difference between ReNet and the model of Graves and Schmidhuber [2009] is that we use the usual sequence RNN, instead of the multidimensional RNN.“
  • 70. ReNet (2015) “One important consequence of the proposed approach compared to the multidimensional RNN is that the number of RNNs at each layer scales now linearly with respect to the number of dimensions d of the input image (2d). A multidimensional RNN, on the other hand, requires the exponential number of RNNs at each layer (2^d). Furthermore, the proposed variant is more easily parallelizable, as each RNN is dependent only along a horizontal or vertical sequence of patches. This architectural distinction results in our model being much more amenable to distributed computing than that of Graves and Schmidhuber [2009]”. … But for d = 2, 2d == 2^d, so the two counts coincide.
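A hedged sketch of one ReNet layer (plain NumPy with tanh units instead of the GRU/LSTM units used in the paper; all names hypothetical): the image is split into non-overlapping patches, the patch grid is swept vertically by two RNNs, and the resulting composite feature map is swept horizontally by two more, i.e. 2d = 4 one-dimensional RNNs per layer.

import numpy as np

def renet_layer(x, wp, params, hid):
    """x: (H, W, C) with H, W divisible by wp; params: dict mapping
    'down'/'up'/'right'/'left' to (W_in, W_rec, b). Returns (H/wp, W/wp, 2*hid)."""
    H, W, C = x.shape
    # 1) flatten non-overlapping wp x wp patches into vectors: (H/wp, W/wp, wp*wp*C)
    p = x.reshape(H // wp, wp, W // wp, wp, C).transpose(0, 2, 1, 3, 4)
    p = p.reshape(H // wp, W // wp, wp * wp * C)

    def sweep(seq, which):                 # plain tanh RNN along axis 0 of seq
        W_in, W_rec, b = params[which]
        h, prev = [], np.zeros(hid)
        for v in seq:
            prev = np.tanh(v @ W_in + prev @ W_rec + b)
            h.append(prev)
        return np.stack(h)

    # 2) vertical sweeps (down and up), column by column over the patch grid
    v = np.stack([np.concatenate([sweep(p[:, j], 'down'),
                                  sweep(p[::-1, j], 'up')[::-1]], axis=-1)
                  for j in range(p.shape[1])], axis=1)
    # 3) horizontal sweeps (right and left) over the composite map, row by row
    return np.stack([np.concatenate([sweep(v[i], 'right'),
                                     sweep(v[i, ::-1], 'left')[::-1]], axis=-1)
                     for i in range(v.shape[0])])

Note the assumed shapes: W_in for 'down'/'up' acts on the flattened wp*wp*C patches, while W_in for 'right'/'left' acts on the 2*hid-dimensional composite features. Every sweep is an ordinary 1D sequence, which is what makes the layer easy to parallelize over rows or columns.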
  • 73. Example #5: “The Empire Strikes Back” PyraMiD-LSTM (2015) [Marijn F. Stollenga, Wonmin Byeon, Marcus Liwicki, Juergen Schmidhuber] http://arxiv.org/abs/1506.07452
  • 74. PyraMiD-LSTM (2015) “Multi-Dimensional Recurrent NNs (MD-RNNs) can perceive the entire spatio-temporal context of each pixel in a few sweeps through all pixels, especially when the RNN is a Long Short-Term Memory (LSTM). Despite these theoretical advantages, however, unlike CNNs, previous MD-LSTM variants were hard to parallelize on GPUs. Here we re-arrange the traditional cuboid order of computations in MD-LSTM in pyramidal fashion. The resulting PyraMiD-LSTM is easy to parallelize, especially for 3D data such as stacks of brain slice images.”
  • 76. PyraMiD-LSTM (2015) “One of the striking differences between PyraMiD-LSTM and MD-LSTM is the shape of the scanned contexts. Each LSTM of an MD-LSTM scans rectangle-like contexts in 2D or cuboids in 3D. Each LSTM of a PyraMiD-LSTM scans triangles in 2D and pyramids in 3D. An MD-LSTM needs 8 LSTMs to scan a volume, while a PyraMiD-LSTM needs only 6, since it takes 8 cubes or 6 pyramids to fill a volume. Given dimension d, the number of LSTMs grows as 2^d for an MD-LSTM (exponentially) and 2 × d for a PyraMiD-LSTM (linearly).”
  • 82. PyraMiD-LSTM (2015) On the MR brain dataset, training took around three days, and testing per volume took around 2 minutes. Networks contain three PyraMiD-LSTM layers: 1. 16 hidden units + fully-connected layer with 25 hidden units; 2. 32 hidden units + fully-connected layer with 45 hidden units; 3. 64 hidden units + fully-connected output layer whose size equals the number of classes. “Previous MD-LSTM implementations, however, could not exploit the parallelism of modern GPU hardware. This has changed through our work presented here. Although our novel highly parallel PyraMiD-LSTM has already achieved state-of-the-art segmentation results in challenging benchmarks, we feel we have only scratched the surface of what will become possible with such PyraMiD-LSTM and other MD-RNNs.”
  • 83. Example #6: Grid LSTM (ICLR 2016) (Graves again!) [Nal Kalchbrenner, Ivo Danihelka, Alex Graves] http://arxiv.org/abs/1507.01526
  • 84. Grid LSTM (2016) “This paper introduces Grid Long Short-Term Memory, a network of LSTM cells arranged in a multidimensional grid that can be applied to vectors, sequences or higher dimensional data such as images. The network differs from existing deep LSTM architectures in that the cells are connected between network layers as well as along the spatiotemporal dimensions of the data. The network provides a unified way of using LSTM for both deep and sequential computation.”
  • 85. Grid LSTM (2016) “Deep networks suffer from exactly the same problems as recurrent networks applied to long sequences: namely that information from past computations rapidly attenuates as it progresses through the chain – the vanishing gradient problem (Hochreiter, 1991) – and that each layer cannot dynamically select or ignore its inputs. It therefore seems attractive to generalise the advantages of LSTM to deep computation.” Can be N-dimensional. N-dimensional Grid LSTM is called N-LSTM for short.
  • 86. Grid LSTM (2016) One-dimensional Grid LSTM corresponds to a feed-forward network that uses LSTM cells in place of transfer functions such as tanh and ReLU. These networks are related to Highway Networks (Srivastava et al., 2015) where a gated transfer function is used to successfully train feed-forward networks with up to 900 layers of depth. Grid LSTM with two dimensions is analogous to the Stacked LSTM, but it adds cells along the depth dimension too. Grid LSTM with three or more dimensions is analogous to Multidimensional LSTM, but differs from it not just by having the cells along the depth dimension, but also by using the proposed mechanism for modulating the N-way interaction that is not prone to the instability present in Multidimensional LSTM.
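A loose sketch of the one-dimensional case (plain NumPy, hypothetical names, simplified relative to the paper's exact formulation): a feed-forward stack where every layer is an LSTM transform, so a memory vector m travels along the depth axis next to the hidden vector h.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_transform(h, m, W):
    """One LSTM transform along the depth dimension: the incoming hidden vector
    plays the role of the input, and m is the memory carried across layers."""
    i, f, o, g = np.split(W['x'] @ h + W['b'], 4)
    m_new = sigmoid(f) * m + sigmoid(i) * np.tanh(g)
    return sigmoid(o) * np.tanh(m_new), m_new

def grid_lstm_1d(x, layers, W_embed):
    """1D Grid LSTM sketch: LSTM cells replace the usual tanh/ReLU transfer
    functions, which is what relates it to Highway Networks."""
    h = np.tanh(W_embed @ x)        # project the input into the hidden size
    m = np.zeros_like(h)
    for W in layers:                # the depth dimension
        h, m = lstm_transform(h, m, W)
    return h

# Toy usage: 10-dim input, 8 hidden units, 5 layers of depth.
rng = np.random.default_rng(0)
hid = 8
layers = [{'x': rng.normal(scale=0.3, size=(4 * hid, hid)), 'b': np.zeros(4 * hid)}
          for _ in range(5)]
print(grid_lstm_1d(rng.normal(size=10), layers,
                   rng.normal(scale=0.3, size=(hid, 10))).shape)   # (8,)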
  • 91. Grid LSTM (2016) The difference with the Multidimensional LSTM is that we apply multiple layers of depth to the image, use three-dimensional blocks and concatenate the top output vectors before classification. The difference with the ReNet architecture is that the 3-LSTM processes the image according to the two inherent spatial dimensions; instead of stacking hidden layers as in the ReNet, the block also modulates directly what information is passed along the depth dimension.
  • 92. Time for Discussion: RNN vs. CNN for Computer Vision
  • 94. Resources (The Classic) - Multi-Dimensional Recurrent Neural Networks, Alex Graves, Santiago Fernandez, Juergen Schmidhuber https://arxiv.org/abs/0705.2011 - Offline Handwriting Recognition with Multidimensional Recurrent Neural Networks / NIPS 2009, Alex Graves, Juergen Schmidhuber http://papers.nips.cc/paper/3449-offline-handwriting-recognition-with-multidimensional-recurrent-neural-networks
  • 95. Resources (The Classic) - Supervised Sequence Labelling with Recurrent Neural Networks Alex Graves, Springer, 2012 http://www.springer.com/us/book/9783642247965 https://www.cs.toronto.edu/~graves/preprint.pdf - RNNLIB https://sourceforge.net/projects/rnnl/ - http://deeplearning.cs.cmu.edu/slides.2015/20.graves.pdf
  • 96. Resources (more recent) - Scene Labeling with LSTM Recurrent Neural Networks / CVPR 2015, Wonmin Byeon, Thomas M. Breuel, Federico Raue, Marcus Liwicki http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Byeon_Scene_Labeling_With_2015_CVPR_paper.pdf - Deep Convolutional and Recurrent Neural Network for Object Classification and Localization / ILSVRC 2015 http://image-net.org/challenges/posters/ILSVRC2015_Poster_VUNO.pdf - Convolutional Recurrent Neural Networks: Learning Spatial Dependencies for Image Representation / CVPR 2015 http://www.cv-foundation.org/openaccess/content_cvpr_workshops_2015/W03/papers/Zuo_Convolutional_Recurrent_Neural_2015_CVPR_paper.pdf
  • 97. Resources (more recent) - ReNet: A Recurrent Neural Network Based Alternative to Convolutional Networks Francesco Visin, Kyle Kastner, Kyunghyun Cho, Matteo Matteucci, Aaron Courville, Yoshua Bengio http://arxiv.org/abs/1505.00393 - Parallel Multi-Dimensional LSTM, With Application to Fast Biomedical Volumetric Image Segmentation Marijn F. Stollenga, Wonmin Byeon, Marcus Liwicki, Juergen Schmidhuber http://arxiv.org/abs/1506.07452 - Grid Long Short-Term Memory Nal Kalchbrenner, Ivo Danihelka, Alex Graves http://arxiv.org/abs/1507.01526
  • 98. Resources (more recent) - OCRopus https://github.com/tmbdev/ocropy https://github.com/tmbdev/clstm The problem is that MDRNNs are mostly unsupported: not many modern libraries implement them.
  • 99. More to come - Graph LSTM http://arxiv.org/pdf/1603.07063v1.pdf - Local-Global LSTM http://arxiv.org/pdf/1511.04510v1.pdf - …