Skip, residual and densely connected RNN architectures
Fréderic Godin - Ph.D. Researcher
Department of Electronics and Information Systems
IDLab
Fréderic Godin - Skip, residual and densely connected RNN architectures
Who is Fréderic?
Ph.D. Researcher in Deep Learning @ IDLab
Main interests:
- Sequence models
- Hybrid RNN/CNN models
Major application domain: Natural Language Processing
- Noisy data (e.g., Twitter data)
- Parsing tasks (e.g., Named Entity Recognition)
Minor application domain: Computer Vision
- Lung cancer detection (Kaggle Data Science Bowl 2017, ranked 7th out of 1972)
  (http://blog.kaggle.com/2017/05/16/data-science-bowl-2017-predicting-lung-cancer-solution-write-up-team-deep-breath/)
Agenda
1. Recurrent neural networks
2. Skip, residual and dense connections
3. Dense connections in practice
Recurrent neural networks
Recurrent neural networks
- Neural network with a cyclic connection
- Has memory: a hidden state that is carried from one time step to the next
- Models variable-length sequences
Unfolded recurrent neural network
[Figure: the RNN unfolded over time steps t=1 to t=4, reading one input per step, e.g., word1 to word4]
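To make the unfolding concrete, here is a minimal sketch in plain Python (no framework) of a vanilla tanh RNN, h_t = tanh(W_x x_t + W_h h_{t-1} + b). The cyclic connection is the reuse of h at the next step, and the same weights are applied at every time step, which is what lets the network process sequences of any length. The weights and the 3-dimensional "word embeddings" are random placeholders, not values from the talk.

```python
import math
import random

def rnn_step(x, h_prev, W_x, W_h, b):
    """One vanilla RNN step: h_t = tanh(W_x x_t + W_h h_{t-1} + b)."""
    return [
        math.tanh(
            sum(w * xi for w, xi in zip(W_x[j], x))
            + sum(u * hi for u, hi in zip(W_h[j], h_prev))
            + b[j]
        )
        for j in range(len(b))
    ]

def run_rnn(inputs, W_x, W_h, b):
    """Unfold the RNN over a sequence of any length, reusing the same weights
    at every time step, and return the hidden state after each step."""
    h = [0.0] * len(b)          # initial memory
    states = []
    for x in inputs:
        h = rnn_step(x, h, W_x, W_h, b)
        states.append(h)
    return states

rng = random.Random(0)
emb, hidden = 3, 4
W_x = [[rng.uniform(-0.5, 0.5) for _ in range(emb)] for _ in range(hidden)]
W_h = [[rng.uniform(-0.5, 0.5) for _ in range(hidden)] for _ in range(hidden)]
b = [0.0] * hidden

# hypothetical 3-d embeddings for word1..word4
sentence = [[rng.uniform(-1, 1) for _ in range(emb)] for _ in range(4)]
states = run_rnn(sentence, W_x, W_h, b)
print(len(states), len(states[-1]))  # 4 time steps, hidden size 4
```

Because the loop only depends on the inputs it is given, running it on the first two words yields exactly the first two states of the full run.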
Stacking recurrent neural networks
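[Figure: several RNN layers stacked on top of each other over time steps t=1 to t=4 (inputs word1 to word4), making the network deep in time (horizontally) and deep in height (vertically)]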
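A hedged sketch of stacking, using a plain tanh RNN layer as a stand-in for any recurrent layer: each layer loops over the time steps (deep in time), and layer k consumes the whole output sequence of layer k-1 (deep in height). The sizes and random weights are illustrative assumptions.

```python
import math
import random

def rnn_layer(inputs, W_x, W_h, b):
    """Run one tanh RNN layer over a whole sequence and return its outputs."""
    h = [0.0] * len(b)
    outputs = []
    for x in inputs:
        # the right-hand side uses the previous h before it is reassigned
        h = [
            math.tanh(sum(w * xi for w, xi in zip(W_x[j], x))
                      + sum(u * hk for u, hk in zip(W_h[j], h)) + b[j])
            for j in range(len(b))
        ]
        outputs.append(h)
    return outputs

def make_layer(in_size, hidden, rng):
    """Random weights for one layer (illustrative values only)."""
    W_x = [[rng.uniform(-0.1, 0.1) for _ in range(in_size)] for _ in range(hidden)]
    W_h = [[rng.uniform(-0.1, 0.1) for _ in range(hidden)] for _ in range(hidden)]
    return W_x, W_h, [0.0] * hidden

def stacked_rnn(inputs, layers):
    """Deep in time: every layer loops over the sequence.
    Deep in height: layer k reads the output sequence of layer k-1."""
    seq = inputs
    for W_x, W_h, b in layers:
        seq = rnn_layer(seq, W_x, W_h, b)
    return seq

rng = random.Random(1)
emb, hidden, depth = 3, 5, 3
layers = [make_layer(emb if k == 0 else hidden, hidden, rng) for k in range(depth)]
sentence = [[rng.uniform(-1, 1) for _ in range(emb)] for _ in range(4)]
top = stacked_rnn(sentence, layers)
print(len(top), len(top[0]))  # 4 time steps, hidden size 5
```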
Vanishing gradients
- When the weights are updated using backpropagation, the gradient tends to shrink with every neuron it crosses, until it vanishes
- Often caused by the activation function (e.g., the derivative of the sigmoid is at most 0.25)
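A toy illustration of the effect, assuming a hypothetical chain of scalar sigmoid neurons with weight 1: by the chain rule, the gradient at the input is the product of every neuron's local derivative, and since the sigmoid's derivative never exceeds 0.25, the product shrinks geometrically with depth.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def chain_gradient(depth, w=1.0, x=0.5):
    """Forward a scalar through `depth` neurons a_k = sigmoid(w * a_{k-1})
    and return d a_depth / d x, accumulated with the chain rule."""
    a, grad = x, 1.0
    for _ in range(depth):
        z = w * a
        a = sigmoid(z)
        grad *= a * (1.0 - a) * w   # local derivative sigmoid'(z) * w, at most 0.25 * w
    return grad

for depth in (1, 5, 10, 20):
    print(depth, chain_gradient(depth))
```

The printed gradients fall by roughly a factor of four or more per extra neuron, which is why deep stacks of such units are hard to train without extra paths for the gradient.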
Backpropagating through stacked RNNs
[Figure: the stacked RNN from the previous slide, with gradients flowing backwards through time (backpropagation in time) and downwards through the layers (backpropagation in height)]
Mitigating the vanishing gradient problem
- In time: Long Short-Term Memory (LSTM)
  [Figure: the LSTM's key equation to model depth in time]
- In height:
  - Many techniques exist in convolutional neural networks
  - This talk: can we apply them to RNNs?
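The equation itself did not survive extraction. As a hedged illustration, the sketch below uses the standard LSTM cell with a forget gate, whose additive cell-state update c_t = f_t * c_{t-1} + i_t * g_t is the usual explanation for how LSTMs handle depth in time: along the direct cell-to-cell path, the gradient is just the product of the forget gates. The single-unit size, parameter names and values are assumptions chosen for readability.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One step of a single-unit LSTM (hidden size 1 keeps the arithmetic visible).

    p maps hypothetical names to scalar weights: w_* (input), u_* (recurrent)
    and b_* (bias) for the input gate i, forget gate f, output gate o and
    candidate g.
    """
    i = sigmoid(p["w_i"] * x + p["u_i"] * h_prev + p["b_i"])
    f = sigmoid(p["w_f"] * x + p["u_f"] * h_prev + p["b_f"])
    o = sigmoid(p["w_o"] * x + p["u_o"] * h_prev + p["b_o"])
    g = math.tanh(p["w_g"] * x + p["u_g"] * h_prev + p["b_g"])
    c = f * c_prev + i * g        # additive cell-state update
    h = o * math.tanh(c)
    return h, c, f

params = {name: 0.5 for name in ("w_i", "u_i", "w_f", "u_f", "w_o", "u_o", "w_g", "u_g")}
params.update(b_i=0.0, b_f=3.0, b_o=0.0, b_g=0.0)   # large forget bias keeps f near 1

h, c = 0.0, 0.0
cell_path_grad = 1.0   # d c_T / d c_0 along the direct cell path = product of f_t
for x in [0.1] * 50:
    h, c, f = lstm_step(x, h, c, params)
    cell_path_grad *= f

print(cell_path_grad)  # stays far from zero after 50 steps
print(0.25 ** 50)      # upper bound for a chain of 50 sigmoid derivatives
```

The comparison with 0.25 ** 50 shows the design point: an additive update lets the gradient pass through time multiplied by gates that can stay close to 1, instead of by activation derivatives.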
Skip, residual and dense connections
Skip connection
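A direct connection between two non-consecutive layers
- No vanishing gradient along the direct connection
- Two main flavors:
  - Concatenative skip connections
  - Additive skip connections
[Figure: Layer 1 -> Layer 2 -> Layer 3, where Out 1 also skips Layer 2 and is merged with the output of Layer 2 ("Merge 1,2") before entering Layer 3]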
(Concatenative) skip connection
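Concatenate the output of the previous layer and the skip connection
Advantage:
- Provides the output of the first layer to the third layer without altering it
Disadvantage:
- Doubles the input size (when both outputs have the same size)
[Figure: Layer 1 -> Layer 2 -> Layer 3, where Layer 3 receives the concatenation of Out 2 and Out 1]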
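A minimal sketch of a concatenative skip connection, assuming three hypothetical feed-forward tanh layers of equal width (recurrence is left out so that only the skip is visible): Layer 3 receives [Out 2; Out 1], so its input is twice as wide and the second half is Out 1, bit for bit.

```python
import math
import random

def dense_layer(W, x):
    """Hypothetical feed-forward layer: tanh(W x), one output per row of W."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W]

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(3)
size = 4
x = [rng.uniform(-1, 1) for _ in range(size)]

W1 = random_matrix(size, size, rng)
W2 = random_matrix(size, size, rng)
W3 = random_matrix(size, 2 * size, rng)   # Layer 3 must accept the doubled input

out1 = dense_layer(W1, x)
out2 = dense_layer(W2, out1)
layer3_input = out2 + out1                # list concatenation = concatenative skip
out3 = dense_layer(W3, layer3_input)

print(len(layer3_input))                  # 8: twice the size of Out 2
print(layer3_input[size:] == out1)        # True: Out 1 reaches Layer 3 unaltered
```

The doubled width of W3 is exactly the disadvantage listed on the slide: every concatenated skip adds input weights to the receiving layer.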
Additive skip connection (Residual connection)
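Originates from the image classification domain
A residual connection is defined as:
$\text{Out}_{1+2} = \text{Layer}_2(\text{Out}_1) + \text{Out}_1$
where $\text{Layer}_2(\text{Out}_1)$ is the “residue” learned on top of $\text{Out}_1$
[Figure: Layer 1 -> Layer 2 -> Layer 3, where Out 1 is added to the output of Layer 2 and the sum Out 1 + 2 enters Layer 3]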
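A toy comparison, assuming hypothetical scalar sigmoid layers with weight 1: with a residual connection each layer computes a + F(a), so its local derivative is F'(a) + 1. The identity term keeps the gradient from shrinking, whereas the plain chain multiplies derivatives that are at most 0.25.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def input_gradient(depth, residual, w=1.0, x=0.5):
    """d(output)/d(x) through `depth` scalar layers F(a) = sigmoid(w * a),
    optionally with a residual connection around every layer (a + F(a))."""
    a, grad = x, 1.0
    for _ in range(depth):
        f = sigmoid(w * a)
        local = f * (1.0 - f) * w             # F'(a)
        if residual:
            a, grad = f + a, grad * (local + 1.0)   # identity path contributes +1
        else:
            a, grad = f, grad * local
    return grad

print(input_gradient(20, residual=False))  # vanishingly small
print(input_gradient(20, residual=True))   # at least 1
```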
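Additive skip connection (Residual connection) in RNN
Residual connections do not make sense in RNNs: Layer 2 also depends on $h_{t-1}$, so it does not learn a residue of $\text{Out}_1$ alone
Additive skip connection:
$\text{Out}_{1+2} = \text{Layer}_2(\text{Out}_1, h_{t-1}) + \text{Out}_1$
[Figure: the residual block from the previous slide, with Layer 2 drawn as a recurrent cell with input x, previous state h(t-1), new state h(t) and output y]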
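A hedged sketch of the formula above, with Layer 2 as a hypothetical tanh RNN cell: the summed output keeps the size of Out_1, but for the same Out_1 it changes with h_{t-1}, which is why the added term is not a residue of Out_1 alone. Sizes, weights and state values are illustrative.

```python
import math
import random

def rnn_cell(x, h_prev, W_x, W_h, b):
    """tanh RNN cell: the new state depends on the input AND the previous state."""
    return [
        math.tanh(sum(w * xi for w, xi in zip(W_x[j], x))
                  + sum(u * hj for u, hj in zip(W_h[j], h_prev)) + b[j])
        for j in range(len(b))
    ]

def additive_skip_step(out1, h2_prev, W_x, W_h, b):
    """Out_{1+2} = Layer2(Out_1, h_{t-1}) + Out_1.
    Requires Layer 2's hidden size to equal the size of Out_1."""
    h2 = rnn_cell(out1, h2_prev, W_x, W_h, b)
    summed = [a + c for a, c in zip(h2, out1)]
    return summed, h2

rng = random.Random(2)
size = 4
W_x = [[rng.uniform(-0.5, 0.5) for _ in range(size)] for _ in range(size)]
W_h = [[rng.uniform(-0.5, 0.5) for _ in range(size)] for _ in range(size)]
b = [0.0] * size

out1 = [rng.uniform(-1, 1) for _ in range(size)]  # output of Layer 1 at time t
sum_a, _ = additive_skip_step(out1, [0.0] * size, W_x, W_h, b)
sum_b, _ = additive_skip_step(out1, [0.9] * size, W_x, W_h, b)

print(len(sum_a) == len(out1))  # True: the input to Layer 3 keeps the same size
print(sum_a != sum_b)           # True: same Out_1, different h_{t-1}, different sum
```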
Additive skip connection
Sum the output of the previous layer and the skip connection
Advantage:
- The input size to the next layer does not increase
Disadvantage:
- Can create a noisy input to the next layer
[Figure: Layer 1 -> Layer 2 -> Layer 3, where Layer 3 receives the sum Out 1 + 2 of Out 1 and Out 2]
Densely connecting layers
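Add a skip connection from the output of every layer to the input of every subsequent layer
Advantage:
- Direct paths between every pair of layers
- A hierarchy of features as input to every layer
Disadvantage:
- The number of connections grows quadratically: L(L-1)/2 for L layers
[Figure: Layers 1 to 4, where Layer 2 receives Out 1, Layer 3 receives Out 1 and Out 2, and Layer 4 receives Out 1, Out 2 and Out 3]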
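A minimal sketch of dense connections in a stacked RNN, under stated assumptions: plain tanh RNN layers stand in for LSTMs, and every layer (and the returned feature vector a classifier would read) also sees the input embedding; check the linked paper for the exact setup. It shows how each layer's input width grows and counts the direct connections in the four-layer figure.

```python
import math
import random

def rnn_layer(inputs, W_x, W_h, b):
    """Plain tanh RNN layer over a whole sequence (a stand-in for an LSTM layer)."""
    h = [0.0] * len(b)
    outputs = []
    for x in inputs:
        h = [
            math.tanh(sum(w * xi for w, xi in zip(W_x[j], x))
                      + sum(u * hk for u, hk in zip(W_h[j], h)) + b[j])
            for j in range(len(b))
        ]
        outputs.append(h)
    return outputs

def make_layer(in_size, hidden, rng):
    """Random weights for one layer; the input width depends on the layer's depth."""
    W_x = [[rng.uniform(-0.05, 0.05) for _ in range(in_size)] for _ in range(hidden)]
    W_h = [[rng.uniform(-0.05, 0.05) for _ in range(hidden)] for _ in range(hidden)]
    return W_x, W_h, [0.0] * hidden

def densely_connected_rnn(embeddings, hidden, num_layers, rng):
    """Layer k reads, at every time step, the concatenation of the embedding and
    the outputs of all layers below it. Returns the final per-step features
    (embedding + every layer's output) and the input width of each layer."""
    features = [list(e) for e in embeddings]
    input_sizes = []
    for _ in range(num_layers):
        in_size = len(features[0])
        input_sizes.append(in_size)
        out = rnn_layer(features, *make_layer(in_size, hidden, rng))
        features = [f + o for f, o in zip(features, out)]  # concatenate per time step
    return features, input_sizes

def num_direct_connections(num_layers):
    """One direct connection per (lower layer, higher layer) pair: L(L-1)/2."""
    return num_layers * (num_layers - 1) // 2

rng = random.Random(4)
emb_size, hidden = 3, 2
sentence = [[rng.uniform(-1, 1) for _ in range(emb_size)] for _ in range(4)]
features, sizes = densely_connected_rnn(sentence, hidden, 3, rng)
print(sizes)                      # [3, 5, 7]: each layer's input grows by `hidden`
print(len(features[0]))           # 9 = 3 + 3 * 2
print(num_direct_connections(4))  # 6, matching the four-layer figure
```

Because each layer only adds `hidden` new features, the widths grow linearly per layer even though the number of connections grows quadratically.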
Densely connected layers in practice
Language modeling
Building a model that captures the statistical characteristics of a language
In practice: predicting the next word in a sentence
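A language model that predicts the next word implicitly factors the probability of a sentence as P(w_1, ..., w_n) = P(w_1) * P(w_2 | w_1) * ... * P(w_n | w_1, ..., w_{n-1}). The results below are reported in perplexity, the exponential of the average negative log-probability assigned to each true next word. A minimal sketch, with hypothetical probabilities:

```python
import math

def perplexity(next_word_probs):
    """exp of the mean negative log-probability the model assigned
    to each actual next word (lower is better)."""
    nll = -sum(math.log(p) for p in next_word_probs) / len(next_word_probs)
    return math.exp(nll)

# hypothetical probabilities assigned to the true next words word2..word5
probs = [0.1, 0.02, 0.05, 0.01]
print(perplexity(probs))  # about 31.6
```

An intuition for the numbers: a model that spreads its probability uniformly over k words at every step has perplexity k, so a perplexity of 78.4 is roughly as uncertain as choosing among 78 equally likely words.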
Example architecture
[Figure: the input words word1 to word4 pass through an embedding layer, two stacked LSTM layers and a classification layer, which predicts the next words word2 to word5]
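The figure's inputs and targets are the same sentence shifted by one position. A small sketch of how such training pairs could be built (the token list is a placeholder):

```python
def next_word_pairs(tokens):
    """Inputs are word_1..word_{n-1}; the target at each position is the next word."""
    return list(zip(tokens[:-1], tokens[1:]))

sentence = ["word1", "word2", "word3", "word4", "word5"]
for inp, target in next_word_pairs(sentence):
    print(inp, "->", target)
```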
Training details
Stochastic Gradient Descent with a learning rate schedule
Uniform initialization in [-0.05, 0.05]
Dropout with probability 0.6
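A sketch of these settings in plain Python. The slide does not specify the learning-rate schedule, so only a plain SGD update is shown; it is also assumed that 0.6 is the probability of dropping a unit, and the common "inverted dropout" variant is used. Helper names are hypothetical.

```python
import random

INIT_RANGE = 0.05   # uniform initialization in [-0.05, 0.05]
DROPOUT_P = 0.6     # assumed: probability of zeroing a unit

def init_matrix(rows, cols, rng):
    """Sample every weight uniformly from [-INIT_RANGE, INIT_RANGE]."""
    return [[rng.uniform(-INIT_RANGE, INIT_RANGE) for _ in range(cols)]
            for _ in range(rows)]

def dropout(values, p, rng, training=True):
    """Inverted dropout: zero each value with probability p and scale the
    survivors by 1 / (1 - p) so the expected activation is unchanged."""
    if not training or p == 0.0:
        return list(values)
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in values]

def sgd_step(weights, grads, lr):
    """Plain SGD update w <- w - lr * grad (no learning-rate schedule shown)."""
    return [[w - lr * g for w, g in zip(w_row, g_row)]
            for w_row, g_row in zip(weights, grads)]

rng = random.Random(5)
W = init_matrix(3, 4, rng)
h = [0.2, -0.4, 0.6, 0.8]
print(dropout(h, DROPOUT_P, rng))                  # roughly 60% of entries zeroed
print(dropout(h, DROPOUT_P, rng, training=False))  # unchanged at evaluation time
```

Scaling at training time means nothing has to change at evaluation time, which is why inverted dropout is the usual choice in practice.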
Experimental results
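Model | Hidden state size | # Layers | # Params | Perplexity
Stacked LSTM (Zaremba et al., 2014) | 650 | 2 | 20M | 82.7
Stacked LSTM (Zaremba et al., 2014) | 1500 | 2 | 66M | 78.4
Stacked LSTM | 200 | 2 | 5M | 100.9
Stacked LSTM | 200 | 3 | 5M | 108.8
Stacked LSTM | 350 | 2 | 9M | 87.9
Densely Connected LSTM | 200 | 2 | 9M | 80.4
Densely Connected LSTM | 200 | 3 | 11M | 78.5
Densely Connected LSTM | 200 | 4 | 14M | 76.9
Lower perplexity is better.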
Character-to-word language modeling
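[Figure: the same architecture, but each input word is built from its characters: a character embedding layer, a ConvNet and a highway layer produce the word representation, followed by two LSTM layers and a classification layer that predicts word2 to word5]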
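The highway layer between the ConvNet and the LSTMs mixes a transformed and an untransformed copy of its input. A minimal sketch of the standard highway formulation, y = t * relu(W_h x + b_h) + (1 - t) * x with transform gate t = sigmoid(W_t x + b_t); the weights below are hypothetical, chosen to show the two extremes of the gate.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, x, b):
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def highway(x, W_h, b_h, W_t, b_t):
    """One highway layer; input and output have the same size."""
    h = [max(0.0, v) for v in affine(W_h, x, b_h)]   # candidate transform (ReLU)
    t = [sigmoid(v) for v in affine(W_t, x, b_t)]    # transform gate in (0, 1)
    return [ti * hi + (1.0 - ti) * xi for ti, hi, xi in zip(t, h, x)]

x = [0.5, -1.0, 2.0]   # e.g., a word vector produced by the character ConvNet
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
zeros = [[0.0] * 3 for _ in range(3)]

carry = highway(x, identity, [0.0] * 3, zeros, [-20.0] * 3)      # gate near 0: y close to x
transform = highway(x, identity, [0.0] * 3, zeros, [20.0] * 3)   # gate near 1: y close to relu(x)
print(carry)
print(transform)
```

The carry path (1 - t) * x is another additive shortcut, in the same spirit as the skip connections discussed earlier.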
Experimental results
Model | Hidden state size | # Layers | # Params | Perplexity
Stacked LSTM (Zaremba et al., 2014) | 650 | 2 | 20M | 82.7
Stacked LSTM (Zaremba et al., 2014) | 1500 | 2 | 66M | 78.4
CharCNN (Kim et al., 2016) | 650 | 2 | 19M | 78.9
Densely Connected LSTM | 200 | 3 | 11M | 78.5
Densely Connected LSTM | 200 | 4 | 14M | 76.9
Densely Connected CharCNN* | 200 | 4 | 20M | 74.6
*Not published
Lower perplexity is better.
Conclusion
Conclusion
Densely connecting all layers improves language modeling performance:
- Mitigates vanishing gradients
- Creates a hierarchy of features, available to each layer
We use six times fewer parameters to obtain the same result as a stacked LSTM (11M vs. 66M parameters for a perplexity of 78.5 vs. 78.4)
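The "six times fewer parameters" figure can be checked against the first results table: the three-layer densely connected LSTM with 200 hidden units (11M parameters) is compared with the two-layer stacked LSTM of Zaremba et al. (2014) with 1500 hidden units (66M parameters).

```python
# (model, hidden state size, layers, params in millions, perplexity),
# copied from the first experimental results table
stacked_lstm_large = ("Stacked LSTM (Zaremba et al., 2014)", 1500, 2, 66, 78.4)
dense_lstm = ("Densely Connected LSTM", 200, 3, 11, 78.5)

ratio = stacked_lstm_large[3] / dense_lstm[3]
gap = dense_lstm[4] - stacked_lstm_large[4]
print(f"{ratio:.0f}x fewer parameters, perplexity difference {gap:.1f}")
# prints: 6x fewer parameters, perplexity difference 0.1
```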
Q&A
More details can be found in our publication:
Fréderic Godin, Joni Dambre & Wesley De Neve
“Improving Language Modeling using Densely Connected
Recurrent Neural Networks”
https://arxiv.org/abs/1707.06130
Fréderic Godin
Ph.D. Researcher Deep Learning
IDLab
E-mail: frederic.godin@ugent.be
@frederic_godin
www.fredericgodin.com
idlab.technology / idlab.ugent.be
