Natural Language Processing Labs. By Daanv
2019.11.18
Can Recurrent Neural
Networks Warp Time?
Abstract
Gating Mechanisms
> LSTMs (Long Short-Term Memory networks)
> GRUs (Gated Recurrent Units)
: improve the learning of medium- to long-term temporal dependencies
and help mitigate vanishing gradients
Proposal
> Prove that the learnable gates in a recurrent model formally provide quasi-invariance
to general time transformations in the input data
Result
> Leads to a new way of initializing gate biases in LSTMs and GRUs (chrono initialization)
> Improves learning of long-term dependencies, with minimal implementation effort
Introduction
RNN (Recurrent Neural Network)
> Vanishing gradients
> Long-term dependencies
Introduction
LSTM (Long Short-Term Memory) and GRU (Gated Recurrent Unit)
Introduction
[Section 1]
- Invariance to time transformations in the data naturally leads
to a gating mechanism in recurrent models
- Gate values appear as time contraction or time dilation coefficients
(similar in spirit to the notion of a time constant)
[Section 2]
- How to initialize gate biases depending on the range of time dependencies
- Related: (Gers & Schmidhuber, 2000) set the bias of the LSTM forget gate to 1 or 2
[Section 3]
- Test the empirical benefits of the new initialization (synthetic and real-world data)
From Time Warping Invariance to Gating
In sequential learning problems, being resilient to a change in time scale is crucial
RNNs are not resilient to time rescaling
> That is, the class of functions representable by an RNN is not invariant to time rescaling
- Input data x(t) -> time-warped input data x(c(t)) (assuming the time warping c(t) is not overly complex)
- The change of time c covers time rescaling as well as accelerations or decelerations
of the phenomena in the input data
Invariance to time warping
: the class contains another (or the same) model that works on x(c(t)) in the same way that the original
model works on x(t)
> This requirement leads to a gating mechanism
From Time Warping Invariance to Gating
Invariance to time rescaling
> Linear transformation of time: c(t) = at (a > 0) [e.g., data 10 times slower -> a = 0.1]
(the continuous-time setting makes time transformations easy to handle)
> RNN hidden state h(t)
discrete-time equation: h_{t+1} = tanh(Wx x_{t+1} + Wh h_t + b)
continuous-time equation: dh(t)/dt = tanh(Wx x(t) + Wh h(t) + b) - h(t)
Under rescaling:
- t : c(t) = at
- x(t) : x(at)
- h(t) : h(at)
From Time Warping Invariance to Gating
Invariance to time rescaling
Translating back to a discrete-time model gives the leaky RNN (a > 0):
h_{t+1} = a tanh(Wx x_{t+1} + Wh h_t + b) + (1 - a) h_t
: the class contains another (or the same) model that works on x(c(t)) in the same way that the original
model works on x(t) > the leaky RNN
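The leaky update above can be sketched in a few lines of NumPy; this is a minimal illustration, not the paper's implementation, and the function name, shapes, and parameter values are chosen for the example.

```python
import numpy as np

def leaky_rnn_step(h, x, Wx, Wh, b, alpha):
    """One leaky-RNN update, discretizing the time-rescaled
    continuous-time equation; alpha in (0, 1] is the leak rate."""
    return alpha * np.tanh(Wx @ x + Wh @ h + b) + (1 - alpha) * h

# Tiny usage example: 3 hidden units, 2 inputs (shapes are illustrative).
rng = np.random.default_rng(0)
Wx, Wh, b = rng.normal(size=(3, 2)), rng.normal(size=(3, 3)), np.zeros(3)
h = leaky_rnn_step(np.zeros(3), rng.normal(size=2), Wx, Wh, b, alpha=0.1)
```

With alpha = 1 this reduces to the plain RNN update; smaller alpha corresponds to slower time.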
From Time Warping Invariance to Gating
Invariance to time warpings
For general (non-linear) time warpings, the constant a must become a learnable function g of the data
> quasi-invariance to time warpings
> Input gate: g_t
> Forget gate: (1 - g_t)
> Gated RNN: h_{t+1} = g_t * tanh(Wx x_{t+1} + Wh h_t + b) + (1 - g_t) * h_t
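The gated update can be sketched as follows, replacing the fixed leak with a sigmoid gate computed from the input and hidden state. This is a minimal sketch under assumed shapes and names (Wgx, Wgh, bg are the hypothetical gate parameters), not the paper's code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_rnn_step(h, x, Wx, Wh, b, Wgx, Wgh, bg):
    """Gated RNN update: the learnable gate g replaces the fixed leak
    alpha; g acts as an input gate and (1 - g) as a forget gate."""
    g = sigmoid(Wgx @ x + Wgh @ h + bg)   # per-unit time-warping rate
    return g * np.tanh(Wx @ x + Wh @ h + b) + (1 - g) * h

# Usage with 3 hidden units and 2 inputs.
rng = np.random.default_rng(1)
Wx, Wh = rng.normal(size=(3, 2)), rng.normal(size=(3, 3))
Wgx, Wgh = rng.normal(size=(3, 2)), rng.normal(size=(3, 3))
h = gated_rnn_step(np.zeros(3), rng.normal(size=2),
                   Wx, Wh, np.zeros(3), Wgx, Wgh, np.zeros(3))
```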
Time Warpings and Gate Initialization
If the sequential data have temporal dependencies in an approximate range [T_min, T_max]
> use a model whose memory (forgetting time) falls in that range
This amounts to having values of g in the range [1/T_max, 1/T_min]
If the values of both inputs and hidden layers are centered over time,
g(t) will typically take values centered around σ(b_g)
Values of σ(b_g) in the desired range [1/T_max, 1/T_min] are obtained by choosing the biases b_g
between -log(T_max - 1) and -log(T_min - 1)
> This controls the order of magnitude of the memory range of the network
> Initialize the biases of g as (𝒰: the uniform distribution)
b_g = -log(𝒰([T_min, T_max]) - 1)
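For LSTMs, the paper's chrono initialization draws the forget-gate biases as log(U([1, T_max - 1])) and sets the input-gate biases to their negation. A minimal sketch, with an assumed helper name and using NumPy's default generator:

```python
import numpy as np

def chrono_biases(hidden_size, t_max, rng=None):
    """Chrono initialization for LSTM gate biases: forget biases are
    drawn as log(U([1, t_max - 1])) and input biases as their negation,
    so initial forgetting times spread over roughly [1, t_max]."""
    rng = rng or np.random.default_rng()
    b_f = np.log(rng.uniform(1.0, t_max - 1.0, size=hidden_size))
    b_i = -b_f
    return b_f, b_i

# E.g., a 128-unit LSTM expected to face dependencies up to ~750 steps.
b_f, b_i = chrono_biases(128, t_max=750)
```

A positive forget bias pushes the forget gate toward 1 at initialization, so memory persists from the first updates.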
Experiments
> Test robustness by introducing random time warpings into some data,
comparing gated and ungated architectures
> Chrono LSTM initialization vs. standard initialization
on synthetic and real-world tasks
- Synthetic tasks: chrono LSTM > standard LSTM
- Real-world tasks: chrono LSTM >= standard LSTM
> Real-world benchmarks
- Text8 (Mahoney, 2011): next-character prediction
- Penn Treebank (Mikolov et al., 2012): next-word prediction
Experiments
Pure warpings and paddings
> Uniform warping vs. variable warping
The unwarped task (without time warping or padding):
- remember the previous character of a character sequence (too easy on its own)
so the only difficulty comes from the warping
> This tests the robustness of various architectures to time warping
A more difficult variant uses padded sequences
> Each element of the input sequence is followed by a fixed or variable number of 0's
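The padding transform described above can be sketched as a small helper; the function name and zero-token convention are assumptions for illustration.

```python
import random

def pad_sequence(seq, max_pad, variable=False, seed=None):
    """Insert zeros after each element: a fixed number (uniform padding)
    or a per-element random count in [0, max_pad] (variable padding)."""
    rng = random.Random(seed)
    out = []
    for tok in seq:
        out.append(tok)
        n = rng.randint(0, max_pad) if variable else max_pad
        out.extend([0] * n)
    return out

pad_sequence([1, 2, 3], max_pad=2)   # uniform: [1, 0, 0, 2, 0, 0, 3, 0, 0]
```

Uniform padding stretches time by a constant factor; variable padding corresponds to a random, non-linear warping.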
Experiments
Compare three recurrent architectures (all with 64 recurrent units):
- RNN (ungated recurrent network)
- Leaky RNN (each unit has a constant learnable "gate" between 0 and 1)
- Gated RNN (one learnable gate per unit)
Experiments
[Figure: results under uniform time warping, uniform padding, variable time warping, and variable padding]
Experiments
> Copy task
: checks whether a model can remember information for arbitrarily long durations
(Hochreiter & Schmidhuber, 1997)
- For a given T, input sequences consist of T+20 characters
- The first 10 characters are drawn uniformly at random from the first 8 letters
- They are followed by T-1 dummy characters, a signal character, and 10 more dummy characters
- Target sequences consist of T+10 dummy characters followed by the first 10 input characters
A memoryless model that predicts at random among the possible characters
gives the baseline loss (Arjovsky et al., 2016)
LSTM (128 hidden units)
> Standard initialization (baseline): forget biases = 1
> Chrono initialization: T_max = T
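A generator for the copy-task data described above can be sketched as follows; the symbol encoding (0 = blank, 1..8 = letters, 9 = "go" marker) and the function name are assumptions for illustration.

```python
import numpy as np

def copy_task_sample(T, n_letters=8, n_memorize=10, rng=None):
    """One copy-task pair: the input holds 10 random letters, T-1 blanks,
    a 'go' marker, and 10 blanks; the target holds T+10 blanks followed
    by the 10 letters. Both sequences have length T+20."""
    rng = rng or np.random.default_rng()
    blank, go = 0, n_letters + 1          # symbols 1..n_letters are letters
    letters = rng.integers(1, n_letters + 1, size=n_memorize)
    x = np.concatenate([letters, np.full(T - 1, blank), [go],
                        np.full(n_memorize, blank)])
    y = np.concatenate([np.full(T + n_memorize, blank), letters])
    return x, y

x, y = copy_task_sample(T=100)   # sequences of length 120
```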
Experiments
Chrono initialization (red) vs. standard initialization (blue)
Standard initialization
> forget gate biases = 1
Chrono initialization
> forget and input gate biases are chosen
according to the chrono initialization
Experiments
> Adding task
- Training input: two sequences of length T;
the first contains numbers drawn from 𝒰([0,1]), the second is zero everywhere except at two marked positions
- Target: the sum of the numbers in the first sequence
at the positions marked in the second sequence
A model that always predicts the mean of the sum of two 𝒰([0,1]) draws, i.e. 1,
achieves MSE ≈ 0.167 (Arjovsky et al., 2016)
LSTM (128 hidden units)
> Standard initialization (baseline): forget biases = 1
> Chrono initialization: T_max = T
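A generator for the adding-task data can be sketched in the same spirit; the function name and the two-row encoding are assumptions for illustration.

```python
import numpy as np

def adding_task_sample(T, rng=None):
    """One adding-task pair: row 0 holds T numbers from U([0,1]); row 1
    is zero except for two marker 1's; the target is the sum of the two
    marked numbers (always predicting their mean, 1, gives MSE ~ 0.167)."""
    rng = rng or np.random.default_rng()
    values = rng.uniform(0.0, 1.0, size=T)
    marks = np.zeros(T)
    i, j = rng.choice(T, size=2, replace=False)
    marks[i] = marks[j] = 1.0
    return np.stack([values, marks]), values[i] + values[j]

x, y = adding_task_sample(T=50)
```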
Conclusion
> Gated connections adjust the local time constant of the recurrent model.
> Chrono initialization, a simple way of initializing gate biases in LSTMs and GRUs, is introduced.
> Chrono initialization provides notable benefits
when faced with long-term dependencies.
THANK YOU.

Advanced Flow Concepts Every Developer Should Know
 

Can recurrent neural networks warp time

  • 1. Natural Language Processing Labs. By Daanv 2019.11.18 Can Recurrent Neural Networks Warp Time?
  • 2. Abstract
Gating mechanisms
> LSTMs (Long Short-Term Memory)
> GRUs (Gated Recurrent Units)
: improve the learning of medium- to long-term temporal dependencies and help with vanishing gradient issues
Proposal
> Prove that learnable gates in a recurrent model formally provide quasi-invariance to general time transformations in the input data
Result
> Leads to a new way of initializing gate biases in LSTMs and GRUs (chrono initialization)
> Improves learning of long-term dependencies, with minimal implementation effort
  • 3. Introduction
RNN (Recurrent Neural Network)
> Vanishing gradients
> Long-term dependencies
  • 4. Introduction
LSTM (Long Short-Term Memory) / GRU (Gated Recurrent Unit)
  • 5. Introduction
[Section 1]
- Invariance to time transformations in the data leads to a gating mechanism in recurrent models
- Gate values appear as time contraction or time dilation coefficients (similar in spirit to the notion of a time constant)
[Section 2]
- How to initialize gate biases depending on the range of time dependencies
- cf. (Gers & Schmidhuber, 2000): setting the bias of the forget gate of LSTMs to 1 or 2
[Section 3]
- Test the empirical benefits of the new initialization (synthetic and real-world data)
  • 6. From Time Warping Invariance to Gating
Sequential learning problems > being resilient to a change in time scale is crucial
RNN: not resilient to time rescaling
> That is, the class of functions representable by an RNN is not invariant under time rescaling
- Input data x(t) -> time-warped input data x(c(t)) (assuming the time warping c(t) is not overly complex)
- The change of time c covers time rescaling as well as accelerations or decelerations of the phenomena in the input data
Invariant to time warping
: the class contains another (or the same) model that works on x(c(t)) in the same way that the original model works on x(t)
> Gating mechanism
  • 7. From Time Warping Invariance to Gating
Invariance to time rescaling
> Linear transformation of time, c(t) = at (a > 0) [e.g., 10 steps -> a = 0.1]
(the continuous-time setting makes time transformations easy to express)
> RNN hidden state h(t)
discrete-time equation: h_{t+1} = tanh(W x_{t+1} + U h_t + b)
continuous-time equation: dh(t)/dt = tanh(W x(t) + U h(t) + b) - h(t)
Substituting: t -> c(t) = at, x(t) -> x(at), h(t) -> h(at)
  • 8. From Time Warping Invariance to Gating
Invariance to time rescaling
Translating back to a discrete-time model yields the leaky RNN (0 < a <= 1):
h_{t+1} = a tanh(W x_{t+1} + U h_t + b) + (1 - a) h_t
: the class contains another (or the same) model that works on x(c(t)) in the same way that the original model works on x(t)
> leaky RNN
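The leaky-RNN update above can be sketched in a few lines of NumPy; the names (`leaky_rnn_step`, `W`, `U`, `b`) and the dimensions are illustrative, not from the slides.

```python
import numpy as np

def leaky_rnn_step(x_t, h_t, W, U, b, alpha):
    """One leaky-RNN update: a fixed leak rate alpha in (0, 1] blends the
    new candidate state with the previous state. alpha = 1 recovers a
    plain RNN; a small alpha slows the unit's effective time scale by
    roughly a factor of 1/alpha."""
    candidate = np.tanh(W @ x_t + U @ h_t + b)
    return alpha * candidate + (1.0 - alpha) * h_t

rng = np.random.default_rng(0)
d_in, d_h = 3, 4
W = rng.standard_normal((d_h, d_in)) * 0.1
U = rng.standard_normal((d_h, d_h)) * 0.1
b = np.zeros(d_h)
x = rng.standard_normal(d_in)
h = np.zeros(d_h)
h_next = leaky_rnn_step(x, h, W, U, b, alpha=0.1)
```

With alpha = 1 the update collapses to the ordinary tanh RNN, which makes the correspondence between the two model classes explicit.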
  • 9. From Time Warping Invariance to Gating
Invariance to time warpings
Invariance to general (non-uniform) time warpings requires making the leak rate a learnable function g_t of the input and state
> quasi-invariant to time warpings
> Input gate: g_t
> Forget gate: (1 - g_t)
> Gated RNN
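A minimal sketch of the gated RNN described here, where a learnable per-unit gate g_t replaces the fixed leak rate; all parameter names (`Wg`, `Ug`, `bg`, ...) are my own, not from the slides.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_rnn_step(x_t, h_t, W, U, b, Wg, Ug, bg):
    """One step of a gated RNN: g_t acts as the input gate and (1 - g_t)
    as the forget gate, so g_t is a learned, per-unit, per-time-step
    time contraction/dilation coefficient."""
    g_t = sigmoid(Wg @ x_t + Ug @ h_t + bg)        # per-unit gate in (0, 1)
    candidate = np.tanh(W @ x_t + U @ h_t + b)
    return g_t * candidate + (1.0 - g_t) * h_t

rng = np.random.default_rng(0)
d_in, d_h = 3, 4
W  = rng.standard_normal((d_h, d_in)) * 0.1
U  = rng.standard_normal((d_h, d_h)) * 0.1
b  = np.zeros(d_h)
Wg = rng.standard_normal((d_h, d_in)) * 0.1
Ug = rng.standard_normal((d_h, d_h)) * 0.1
bg = np.zeros(d_h)
h_next = gated_rnn_step(rng.standard_normal(d_in), np.zeros(d_h), W, U, b, Wg, Ug, bg)
```

The only change from the leaky RNN is that the scalar leak rate becomes a sigmoid of the current input and state, which is what makes quasi-invariance to non-uniform warpings possible.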
  • 10. Time Warpings and Gate Initialization
Suppose the sequential data have temporal dependencies in an approximate range [T_min, T_max]
> Use a model with memory (forgetting time) in that range
This amounts to having values of g in the range [1/T_max, 1/T_min]
If the values of both inputs and hidden layers are centered over time, g_t will typically take values centered around 𝜎(𝑏_𝑔)
Values of 𝜎(𝑏_𝑔) in the desired range [1/T_max, 1/T_min] are obtained by choosing the biases 𝑏_𝑔 between −log(T_max − 1) and −log(T_min − 1)
> To control the order of magnitude of the memory range of the network
> Initialize the biases of g as −log(𝒰([T_min, T_max]) − 1) (𝒰: the uniform distribution)
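The chrono initialization above can be sketched directly in NumPy (function names are my own). The key identity is 𝜎(−log(T − 1)) = 1/T, so drawing T ~ 𝒰([T_min, T_max]) puts each unit's characteristic forgetting time in the desired range.

```python
import numpy as np

def chrono_gate_bias(t_min, t_max, n_units, rng):
    """Draw a memory horizon T ~ U([t_min, t_max]) per unit and set
    b_g = -log(T - 1), so that sigmoid(b_g) = 1/T lies in
    [1/t_max, 1/t_min]."""
    T = rng.uniform(t_min, t_max, size=n_units)
    return -np.log(T - 1.0)

def chrono_lstm_biases(t_max, n_units, rng):
    """For an LSTM, the same recipe sets the forget bias to
    b_f = log(U([1, t_max - 1])) and the input bias to b_i = -b_f,
    so forget gates start close to 1 (long memory)."""
    b_f = np.log(rng.uniform(1.0, t_max - 1.0, size=n_units))
    return b_f, -b_f

rng = np.random.default_rng(0)
b_g = chrono_gate_bias(10, 100, 128, rng)       # gated-RNN variant
b_f, b_i = chrono_lstm_biases(1000, 128, rng)   # LSTM variant
```

Only the bias vectors change at initialization; the rest of the model and training loop are untouched, which is why the slides call the implementation effort minimal.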
  • 11. Experiments
> Test by introducing random time warpings in some data and comparing the robustness of gated and ungated architectures
> Chrono LSTM initialization vs standard initialization on synthetic and real tasks
- Synthetic tasks: chrono LSTM > standard LSTM
- Real tasks: chrono LSTM >= standard LSTM
> Additional tests
- Text8 (Mahoney, 2011): next-character prediction
- Penn Treebank (Mikolov et al., 2012): next-word prediction
  • 12. Experiments
Pure warpings and paddings
> Uniform warping vs variable warping
The unwarped task (without time warping or padding): remember the previous character of a character sequence (too easy)
The only difficulty comes from the warping > we test the robustness of various architectures to time warping
A more difficult task uses padded sequences
> Each element of the input sequence is followed by a fixed or variable number of 0s
  • 13. Experiments
  • 14. Experiments
Compare three recurrent architectures (all networks contain 64 recurrent units):
- RNN (ungated recurrent network)
- Leaky RNN (each unit has a constant learnable "gate" between 0 and 1)
- Gated RNN (with one gate per unit)
  • 15. Experiments
Results under uniform time warping, uniform padding, variable time warping, and variable padding
  • 16. Experiments
> Copy task: checks whether a model can remember information for arbitrarily long durations (Hochreiter & Schmidhuber, 1997)
- For a given T, input sequences consist of T+20 characters
- The first 10 characters are drawn uniformly at random from the first 8 letters of the alphabet
- They are followed by T−1 dummy characters, a trigger character, and 10 final dummy characters
- Target sequences consist of T+10 dummy characters followed by the first 10 input characters
Baseline: predicting at random from among the possible characters (Arjovsky et al., 2016)
LSTM (128 hidden units)
> Standard initialization (baseline): forget biases = 1
> Chrono initialization: T_max = T
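A data generator matching this description can be sketched as follows (my own sketch; the integer symbol encoding is arbitrary):

```python
import numpy as np

# Symbols: 0..7 = the 8 letters, 8 = dummy, 9 = trigger marker.
DUMMY, TRIGGER = 8, 9

def copy_task_sample(T, rng):
    """One copy-task example: the model must reproduce the first
    10 letters after a delay of roughly T steps.
    Input : 10 letters, T-1 dummies, trigger, 10 dummies  (T+20 symbols)
    Target: T+10 dummies, then the 10 letters             (T+20 symbols)"""
    letters = rng.integers(0, 8, size=10)
    x = np.concatenate([letters, np.full(T - 1, DUMMY), [TRIGGER], np.full(10, DUMMY)])
    y = np.concatenate([np.full(T + 10, DUMMY), letters])
    return x, y

x, y = copy_task_sample(100, np.random.default_rng(0))
```

The trigger character tells the model when to start reproducing the letters, so the required memory span grows linearly with T.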
  • 17. Experiments
Chrono initialization (red) vs standard initialization (blue)
Standard initialization > forget gate biases = 1
New initialization > forget gate and input gate biases chosen according to the chrono initialization
  • 18. Experiments
> Adding task
- Training: input sequences of length T, consisting of a first sequence of numbers drawn from 𝒰([0,1]) and a second sequence that is zero everywhere except at two marked positions
- Target: the sum of the numbers in the first sequence at the positions marked in the second sequence
Baseline: always predicting the mean of the sum of two 𝒰([0,1]) draws, i.e. 1 > MSE = Var = 1/6 ≈ 0.167 (Arjovsky et al., 2016)
LSTM (128 hidden units)
> Standard initialization (baseline): forget biases = 1
> Chrono initialization: T_max = T
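The adding task and its memoryless baseline can be reproduced in a few lines (my own sketch; names are illustrative). The baseline MSE of 0.167 is just the variance of the sum of two independent 𝒰([0,1]) draws, 2 × 1/12 = 1/6.

```python
import numpy as np

def adding_task_sample(T, rng):
    """One adding-task example: a row of values ~ U([0,1]) and a 0/1
    marker row selecting two positions; the target is the sum of the
    two marked values."""
    values = rng.uniform(0.0, 1.0, size=T)
    markers = np.zeros(T)
    i, j = rng.choice(T, size=2, replace=False)
    markers[i] = markers[j] = 1.0
    return np.stack([values, markers]), values[i] + values[j]

# A memoryless model can do no better than always predicting the mean
# of the target, E[target] = 1; its MSE is then Var(target) = 2/12.
rng = np.random.default_rng(0)
targets = np.array([adding_task_sample(50, rng)[1] for _ in range(20000)])
baseline_mse = np.mean((targets - 1.0) ** 2)   # close to 1/6
```

Beating 0.167 by a clear margin is therefore the signal that a model has actually learned to use the markers rather than ignore the sequence.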
  • 19. Experiments
  • 20. Conclusion
> Gated connections appear to adjust the local time constant of the recurrent model
> Chrono initialization is introduced, a principled way of initializing gate biases in LSTMs and GRUs
> Chrono initialization has been shown to provide notable benefits when faced with long-term dependencies
  • 21. THANK YOU.