This document discusses recurrent neural networks (RNNs) and some of their applications and design patterns. RNNs are able to process sequential data like text or time series due to their ability to maintain an internal state that captures information about what has been observed in the past. The key challenges with training RNNs are vanishing and exploding gradients, which various techniques like LSTMs and GRUs aim to address. RNNs have been successfully applied to tasks involving sequential input and/or output like machine translation, image captioning, and language modeling. Memory networks extend RNNs with an external memory component that can be explicitly written to and retrieved from.
Deep Learning: Recurrent Neural Network (Chapter 10) Larry Guo
This material is an in-depth study report on Recurrent Neural Networks (RNNs).
Material mainly from Deep Learning Book Bible, http://www.deeplearningbook.org/
Topics: Briefing, Theory Proof, Variation, Gated RNN Intuition, Real World Application
Application (CNN+RNN on SVHN)
Also a video (In Chinese)
https://www.youtube.com/watch?v=p6xzPqRd46w
Recurrent Neural Networks have been shown to be very powerful models, as they can propagate context over several time steps. Because of this, they can be applied effectively to several problems in Natural Language Processing, such as Language Modelling, Tagging problems, and Speech Recognition. In this presentation we introduce the basic RNN model and discuss the vanishing gradient problem. We describe LSTM (Long Short Term Memory) and Gated Recurrent Units (GRU). We also discuss Bidirectional RNN with an example. RNN architectures can be considered deep learning systems in which the number of time steps plays the role of the depth of the network. It is also possible to build an RNN with multiple hidden layers, each having recurrent connections from the previous time steps, so that the network represents abstraction in both time and space.
Recurrent Neural Network
ACRRL
Applied Control & Robotics Research Laboratory of Shiraz University
Department of Power and Control Engineering, Shiraz University, Fars, Iran.
Mohammad Sabouri
https://sites.google.com/view/acrrl/
This Edureka Recurrent Neural Networks tutorial will help you understand why we need Recurrent Neural Networks (RNNs) and what exactly they are. It also explains a few issues with training a Recurrent Neural Network and how to overcome those challenges using LSTMs. The last section includes a use-case of LSTM to predict the next word using a sample short story.
Below are the topics covered in this tutorial:
1. Why Not Feedforward Networks?
2. What Are Recurrent Neural Networks?
3. Training A Recurrent Neural Network
4. Issues With Recurrent Neural Networks - Vanishing And Exploding Gradient
5. Long Short-Term Memory Networks (LSTMs)
6. LSTM Use-Case
Basics of RNNs and their applications, with the following papers:
- Generating Sequences With Recurrent Neural Networks, 2013
- Show and Tell: A Neural Image Caption Generator, 2014
- Show, Attend and Tell: Neural Image Caption Generation with Visual Attention, 2015
- DenseCap: Fully Convolutional Localization Networks for Dense Captioning, 2015
- Deep Tracking: Seeing Beyond Seeing Using Recurrent Neural Networks, 2016
- Robust Modeling and Prediction in Dynamic Environments Using Recurrent Flow Networks, 2016
- Social LSTM: Human Trajectory Prediction in Crowded Spaces, 2016
- DESIRE: Distant Future Prediction in Dynamic Scenes with Interacting Agents, 2017
- Predictive State Recurrent Neural Networks, 2017
This presentation on recurrent neural networks will help you understand what a neural network is, which neural networks are popular, why we need recurrent neural networks, what a recurrent neural network is, how an RNN works, what the vanishing and exploding gradient problems are, and what an LSTM is. You will also see a use-case implementation of an LSTM (long short-term memory). Neural networks used in deep learning consist of different layers connected to each other and are modeled on the structure and function of the human brain. They learn from huge volumes of data and use complex algorithms to train. A recurrent neural network works on the principle of saving the output of a layer and feeding it back to the input in order to predict the output of the layer. Now let's dive into this presentation and understand what an RNN is and how it actually works.
The topics below are explained in this recurrent neural networks tutorial:
1. What is a neural network?
2. Popular neural networks?
3. Why recurrent neural network?
4. What is a recurrent neural network?
5. How does an RNN work?
6. Vanishing and exploding gradient problem
7. Long short term memory (LSTM)
8. Use case implementation of LSTM
Recurrent Neural Networks are popular deep learning models that have shown great promise for achieving state-of-the-art results in many areas, such as computer vision, NLP, and finance. Although these models were proposed several years ago, RNNs have gained popularity only recently. In this talk, we will review how these models evolved over the years, dissect the RNN, and discuss its current applications and its future.
This is a presentation I gave as a short overview of LSTMs. The slides are accompanied by two examples which apply LSTMs to Time Series data. Examples were implemented using Keras. See links in slide pack.
Deep learning (also known as deep structured learning or hierarchical learning) is the application of artificial neural networks (ANNs) that contain more than one hidden layer to learning tasks. Deep learning is part of a broader family of machine learning methods based on learning data representations, as opposed to task-specific algorithms. Learning can be supervised, partially supervised or unsupervised.
"Mainstream access to deep learning technology will greatly impact most industries over the next three to five years."
So what exactly is deep learning? How does it work? And most importantly, why should you even care?
Deep learning is used in the research community and in industry to help solve many big data problems such as computer vision, speech recognition, and natural language processing.
Practical examples include:
-Vehicle, pedestrian and landmark identification for driver assistance
-Image recognition
-Speech recognition and translation
-Natural language processing
-Life sciences
-What You Will Learn
-Understand the intuition behind Artificial Neural Networks
-Apply Artificial Neural Networks in practice
-Understand the intuition behind Convolutional Neural Networks
-Apply Convolutional Neural Networks in practice
-Understand the intuition behind Recurrent Neural Networks
-Apply Recurrent Neural Networks in practice
-Understand the intuition behind Self-Organizing Maps
-Apply Self-Organizing Maps in practice
-Understand the intuition behind Boltzmann Machines
-Apply Boltzmann Machines in practice
-Understand the intuition behind AutoEncoders
-Apply AutoEncoders in practice
https://telecombcn-dl.github.io/2017-dlsl/
Winter School on Deep Learning for Speech and Language. UPC BarcelonaTech ETSETB TelecomBCN.
The aim of this course is to train students in methods of deep learning for speech and language. Recurrent Neural Networks (RNN) will be presented and analyzed in detail to understand the potential of these state of the art tools for time series processing. Engineering tips and scalability issues will be addressed to solve tasks such as machine translation, speech recognition, speech synthesis or question answering. Hands-on sessions will provide development skills so that attendees can become competent in contemporary data analytics tools.
The slides include an introduction to Long Short-Term Memory (LSTM), a novel approach to dealing with vanishing gradients in deep neural networks. Made for students and anyone who would love to learn about recurrent artificial neural networks, specifically the LSTM architecture.
Reference material has been attached to further your reading.
Language translation with Deep Learning (RNN) with TensorFlowS N
The author is going to take you into the realm of Recurrent Neural Network (RNN). He will be training a sequence to sequence model on a dataset of English and French sentences that can translate new (unseen) sentences from English to French.
This will be a walkthrough of an end to end technique to train a Deep RNN model. You will learn to build various components necessary to build a Sequence-to-Sequence model.
You will learn about the fundamentals of Deep Learning, mainly RNNs, and the concepts required for this solution. Familiarity with Deep Learning concepts would be handy, but most of the concepts used in this example will be covered during the demo.
Technologies to be used:
Python, Jupyter, TensorFlow, FloydHub
Source code: https://github.com/syednasar/deeplearning/blob/master/language-translation/dlnd_language_translation.ipynb
...
A Threshold Logic Unit (TLU) is a mathematical function conceived as a crude model, or abstraction of biological neurons. Threshold logic units are the constitutive units in an artificial neural network. In this paper a positive clock-edge triggered T flip-flop is designed using Perceptron Learning Algorithm, which is a basic design algorithm of threshold logic units. Then this T flip-flop is used to design a two-bit up-counter that goes through the states 0, 1, 2, 3, 0, 1… Ultimately, the goal is to show how to design simple logic units based on threshold logic based perceptron concepts.
Foundation of Generative AI: Study Materials Connecting the Dots by Delving i...Fordham University
In recent years, the field of artificial intelligence (AI) has witnessed remarkable advancements, particularly in the domain of Generative models. Generative AI, a subset of machine learning, focuses on developing systems that can create novel and realistic content, ranging from text, speech, images to the multimodal content. This burgeoning field has demonstrated unprecedented potential to revolutionize various industries, making it imperative to introduce dedicated study materials on the foundation of Generative AI. With the increasing integration of Generative AI in various industries, professionals with expertise in this field are in high demand, and thus we believe that the publication of the slides are extremely important to meet the current need. The proposed outline aims to equip students with the knowledge and skills required to harness the creative power of AI and navigate the ethical implications associated with Generative technologies. * Materials used in this PPT were collected from Wikipedia, Google Image, and OpenAI GPT. No copyright is claimed by the author.
4. Computation Graphs
● Different sources use different systems. This is the system the book uses.
● Nodes are variables
● Connections are functions
● A variable is computed using all of the connections pointing towards it
● Can compute derivatives by applying the chain rule, working backwards through the graph.
NOT ALL GRAPHS FOLLOW THESE RULES! >:(
[Figure: computation graph with nodes x, h, y, L, yt and W]
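The chain-rule bullet above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the graph x → h → y → L uses a tanh hidden layer and a squared-error loss, and the names `V`, `forward` and `backward` are hypothetical and do not come from the slides.

```python
import numpy as np

def forward(x, W, V, y_true):
    """Compute each node from the nodes whose connections point into it."""
    h = np.tanh(W @ x)                     # hidden node
    y = V @ h                              # output node
    L = 0.5 * np.sum((y - y_true) ** 2)    # loss node (squared error is an assumption)
    return h, y, L

def backward(x, W, V, y_true):
    """Apply the chain rule, working backwards from L to the weights."""
    h, y, _ = forward(x, W, V, y_true)
    dy = y - y_true                        # dL/dy
    dV = np.outer(dy, h)                   # dL/dV
    dh = V.T @ dy                          # dL/dh
    da = (1.0 - h ** 2) * dh               # back through tanh
    dW = np.outer(da, x)                   # dL/dW
    return dW, dV

rng = np.random.default_rng(0)
x = rng.normal(size=3)
W = rng.normal(size=(4, 3))
V = rng.normal(size=(2, 4))
y_true = rng.normal(size=2)
dW, dV = backward(x, W, V, y_true)

# Finite-difference check of a single entry of dL/dW
eps = 1e-6
W_plus = W.copy()
W_plus[1, 2] += eps
W_minus = W.copy()
W_minus[1, 2] -= eps
numeric = (forward(x, W_plus, V, y_true)[2]
           - forward(x, W_minus, V, y_true)[2]) / (2 * eps)
```

The finite-difference value `numeric` should agree closely with `dW[1, 2]`, which is a quick way to check a hand-written backward pass.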
5. RNN Computation Graphs - Unfolding
[Figure: folded and unfolded RNN computation graphs. Row labels: Output (y), Loss func. (L), Truth (yt), Hidden L. (h), Input (x). A square on a connection means it connects to the next timestep. The unfolded graph shows inputs x(0), x(1), x(2), ..., x(t).]
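Unfolding can be read as running one folded cell once per timestep with the same weights. The sketch below assumes a tanh hidden layer and a linear output; the function name `rnn_unfold` and the weight names `U`, `W`, `V` are illustrative choices, not taken from the slides.

```python
import numpy as np

def rnn_unfold(xs, h0, U, W, V):
    """Unfold h(t) = tanh(U x(t) + W h(t-1)), y(t) = V h(t) over all timesteps."""
    h = h0
    hs, ys = [], []
    for x in xs:                        # one copy of the folded cell per timestep
        h = np.tanh(U @ x + W @ h)      # the same U and W are reused at every step
        hs.append(h)
        ys.append(V @ h)
    return np.stack(hs), np.stack(ys)

rng = np.random.default_rng(0)
T, n_in, n_hid, n_out = 5, 3, 4, 2
U = rng.normal(scale=0.5, size=(n_hid, n_in))
W = rng.normal(scale=0.5, size=(n_hid, n_hid))
V = rng.normal(scale=0.5, size=(n_out, n_hid))
xs = rng.normal(size=(T, n_in))
hs, ys = rnn_unfold(xs, np.zeros(n_hid), U, W, V)
```

Each row of `hs` is the hidden state at one timestep, so `hs[t]` depends on `hs[t-1]` exactly as the square connection in the folded graph indicates.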
6. Common Design Patterns
● Standard, output every step
● Output is the only recurrence information
● Output computed at the end
[Figure: three computation graphs, one per pattern, with nodes x, h, y, L and yt; the third runs over inputs x(1), x(2), ..., x(t), ..., x(n)]
7. Training
● Compute forward in time, work backwards for gradient.
○ Exploding/vanishing gradient problem
● Teacher forcing:
○ Pipe true outputs into the hidden layer instead of model outputs
[Figure: two-timestep graphs (t and t+1) with nodes x, h, y, L and yt, in two panels: Training Time and Test Time]
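A minimal sketch of teacher forcing, assuming a network whose only recurrence is from the previous output into the current hidden layer. The function `run_output_recurrent` and its parameters are hypothetical names used for illustration only.

```python
import numpy as np

def run_output_recurrent(xs, U, W, V, targets=None):
    """Run a net whose only recurrence is output -> next hidden layer.

    With targets (training time), the true output of step t-1 is fed into the
    hidden layer at step t (teacher forcing). Without targets (test time), the
    model's own previous output is fed back instead.
    """
    prev = np.zeros(V.shape[0])
    ys = []
    for t, x in enumerate(xs):
        h = np.tanh(U @ x + W @ prev)
        y = V @ h
        ys.append(y)
        prev = targets[t] if targets is not None else y
    return np.stack(ys)

rng = np.random.default_rng(0)
T, n_in, n_hid, n_out = 4, 3, 5, 2
U = rng.normal(size=(n_hid, n_in))
W = rng.normal(size=(n_hid, n_out))
V = rng.normal(size=(n_out, n_hid))
xs = rng.normal(size=(T, n_in))
targets = rng.normal(size=(T, n_out))

train_outputs = run_output_recurrent(xs, U, W, V, targets=targets)  # training time
test_outputs = run_output_recurrent(xs, U, W, V)                    # test time
```

The first output is the same in both modes because nothing has been fed back yet; later outputs differ whenever the true outputs differ from the model's own.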
8. Recursive Neural Nets
● Map a sequence to a tree, and reduce the tree one layer at a time until you reach a single point, your output
● Many choices of how to arrange the tree.
[Figure: tree over inputs x(1), x(2), x(3), x(4), combined with weights U and W at each level, ending in y, yt and L]
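One possible arrangement is a balanced binary tree in which neighbouring nodes are merged pairwise with shared weights. The sketch below assumes that arrangement, a tanh merge, and that all nodes have the same dimension; `reduce_tree` is a hypothetical helper name.

```python
import numpy as np

def reduce_tree(nodes, U, W):
    """Reduce a list of vectors one tree layer at a time until one remains."""
    nodes = list(nodes)
    while len(nodes) > 1:
        parents = []
        for i in range(0, len(nodes) - 1, 2):
            # left child goes through U, right child through W (shared at every level)
            parents.append(np.tanh(U @ nodes[i] + W @ nodes[i + 1]))
        if len(nodes) % 2 == 1:
            parents.append(nodes[-1])   # an unpaired node is carried up unchanged
        nodes = parents
    return nodes[0]

rng = np.random.default_rng(0)
d = 3
U = rng.normal(size=(d, d))
W = rng.normal(size=(d, d))
xs = [rng.normal(size=d) for _ in range(4)]
root = reduce_tree(xs, U, W)
```

For four inputs this performs two merges at the first level and one at the second, so the depth grows with the logarithm of the sequence length rather than linearly.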
9. Deep Recurrent Neural Nets
Can add depth to any of the stages mentioned:
● Multiple Recurrent Layers
● Extra input, output, and hidden layer processing
● +Direct hidden layer
[Figure: three deep RNN variants, one per item, with nodes x, h, h1, h2 and yt]
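The first variant, multiple recurrent layers, can be sketched by feeding the hidden-state sequence of one layer into the next as its input. The layer sizes, the tanh nonlinearity and the name `stacked_rnn` are assumptions made for this example.

```python
import numpy as np

def stacked_rnn(xs, layers):
    """Run stacked recurrent layers; layers is a list of (U, W) weight pairs.

    Layer k reads the hidden-state sequence produced by layer k-1.
    """
    seq = xs
    for U, W in layers:
        h = np.zeros(W.shape[0])
        out = []
        for x in seq:
            h = np.tanh(U @ x + W @ h)
            out.append(h)
        seq = np.stack(out)
    return seq

rng = np.random.default_rng(0)
T, n_in, n_h1, n_h2 = 6, 3, 4, 5
U1 = rng.normal(size=(n_h1, n_in))
W1 = rng.normal(size=(n_h1, n_h1))
U2 = rng.normal(size=(n_h2, n_h1))
W2 = rng.normal(size=(n_h2, n_h2))
xs = rng.normal(size=(T, n_in))
top = stacked_rnn(xs, [(U1, W1), (U2, W2)])
```

`top` holds the second layer's hidden state at every timestep; an output layer could be attached to it in the same way as for a single-layer RNN.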
10. What makes Recurrent Networks so special? Sequences!
(1) Vanilla mode without RNN (e.g. image classification).
(2) Sequence output (e.g. image captioning).
(3) Sequence input (e.g. sentiment analysis).
(4) Sequence input and sequence output (e.g. Machine Translation).
(5) Synced sequence input and output (e.g. video classification).
11. The unreasonable effectiveness of RNNs
- Character level language model
- LSTM trained on Leo Tolstoy’s War and Peace
- Sample outputs after 100, 300, 700 and 2000 iterations
12. Challenges of Vanishing and Exploding Gradients
- Hidden state recurrence relation, analysed using the power method
- The spectral radius of the recurrent weight matrix decides whether the gradient explodes (radius above 1) or vanishes (radius below 1)
- Variance multiplies at every cell (or timestep)
- For feed-forward networks of fixed depth $n$:
  - to obtain some desired variance $v^*$, choose the individual weights with variance $v = \sqrt[n]{v^*}$.
  - carefully chosen scaling can avoid the vanishing and exploding gradient problem
- For RNNs, the same weights are reused at every timestep, so we cannot effectively capture long-term dependencies.
  - The gradient of a long-term interaction has exponentially smaller magnitude than that of a short-term interaction
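The effect of the spectral radius can be checked numerically. This is a sketch under simplifying assumptions: a purely linear recurrence h(t) = Wᵀ h(t-1) with no input and no nonlinearity, and a symmetric W so that its eigenvalues are real. The helper names `scaled_to_radius` and `norm_after` are hypothetical.

```python
import numpy as np

def scaled_to_radius(A, rho):
    """Rescale A so that its spectral radius (largest |eigenvalue|) equals rho."""
    return A * (rho / np.max(np.abs(np.linalg.eigvals(A))))

def norm_after(W, v, steps):
    """Norm of v after applying the linear recurrence v <- W^T v `steps` times."""
    for _ in range(steps):
        v = W.T @ v
    return np.linalg.norm(v)

rng = np.random.default_rng(0)
A = rng.normal(size=(8, 8))
A = A + A.T                      # symmetric, so all eigenvalues are real
v0 = rng.normal(size=8)

vanished = norm_after(scaled_to_radius(A, 0.9), v0, 200)   # radius below 1
exploded = norm_after(scaled_to_radius(A, 1.1), v0, 200)   # radius above 1
```

After 200 steps the first norm has shrunk towards zero and the second has grown by many orders of magnitude, which is the same mechanism that makes gradients of long-term interactions vanish or explode.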
13. - After a forward pass, the gradients of the non-linearities are fixed.
- Back propagation is like going forwards through a linear system in which the slope of the non-linearity has been fixed.
14. Loss function of a char-level RNN
import numpy as np

# Assumes vocab_size and the parameters Wxh, Whh, Why, bh, by are defined at module level.
def lossFun(inputs, targets, hprev):
    """
    inputs, targets are both lists of integers.
    hprev is an Hx1 array of the initial hidden state.
    Returns the loss, gradients on model parameters, and last hidden state.
    """
    xs, hs, ys, ps = {}, {}, {}, {}
    hs[-1] = np.copy(hprev)
    loss = 0
    # forward pass
    for t in range(len(inputs)):
        xs[t] = np.zeros((vocab_size, 1))  # encode in 1-of-k representation
        xs[t][inputs[t]] = 1
        hs[t] = np.tanh(np.dot(Wxh, xs[t]) + np.dot(Whh, hs[t-1]) + bh)  # hidden state
        ys[t] = np.dot(Why, hs[t]) + by  # unnormalized log probabilities for next chars
        ps[t] = np.exp(ys[t]) / np.sum(np.exp(ys[t]))  # probabilities for next chars
        loss += -np.log(ps[t][targets[t], 0])  # softmax (cross-entropy loss)
    # backward pass: compute gradients going backwards
    dWxh, dWhh, dWhy = np.zeros_like(Wxh), np.zeros_like(Whh), np.zeros_like(Why)
    dbh, dby = np.zeros_like(bh), np.zeros_like(by)
    dhnext = np.zeros_like(hs[0])
    for t in reversed(range(len(inputs))):
        dy = np.copy(ps[t])
        # backprop into y; see http://cs231n.github.io/neural-networks-case-study/#grad if confused here
        dy[targets[t]] -= 1
        dWhy += np.dot(dy, hs[t].T)
        dby += dy
        dh = np.dot(Why.T, dy) + dhnext  # backprop into h
        dhraw = (1 - hs[t] * hs[t]) * dh  # backprop through tanh nonlinearity
        dbh += dhraw
        dWxh += np.dot(dhraw, xs[t].T)
        dWhh += np.dot(dhraw, hs[t-1].T)
        dhnext = np.dot(Whh.T, dhraw)
    for dparam in [dWxh, dWhh, dWhy, dbh, dby]:
        np.clip(dparam, -5, 5, out=dparam)  # clip to mitigate exploding gradients
    return loss, dWxh, dWhh, dWhy, dbh, dby, hs[len(inputs)-1]
17. Remedial strategies #1
- Gradient Clipping for Exploding Gradient
- Skip connections
- Integer-valued skip length
- Example: ResNet
- Leaky Units
- Linear self-connections allow the balance between remembering and forgetting to
be adapted more smoothly and flexibly by adjusting the real-valued α rather than by
adjusting the integer-valued skip length.
- α can be sampled from a distribution or learned.
- Removing connections
- Learns to interact with far off and nearby connections
- Have explicit and discrete updates taking place at different times, with a different
frequency for different groups of units
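A leaky unit can be sketched as a running average, mu_t = α · mu_{t-1} + (1 − α) · v_t, the form used in the Deep Learning book. The code below is a minimal illustration with a hypothetical helper `leaky_average` and an assumed impulse input; it only shows how α controls how long a past value lingers.

```python
import numpy as np

def leaky_average(values, alpha):
    """Running average mu_t = alpha * mu_{t-1} + (1 - alpha) * v_t, starting from mu = 0."""
    mu = 0.0
    out = []
    for v in values:
        mu = alpha * mu + (1.0 - alpha) * v
        out.append(mu)
    return np.array(out)

signal = np.zeros(50)
signal[0] = 1.0  # a single impulse at t = 0 (assumed test input)

for alpha in (0.5, 0.95):
    trace = leaky_average(signal, alpha)
    # fraction of the initial response that is still present 20 steps later
    print(f"alpha={alpha}: retained after 20 steps = {trace[20] / trace[0]:.4f}")
```

For an impulse the retained fraction after k steps is α^k, so α close to 1 remembers over long horizons while a small α forgets quickly, without having to pick an integer skip length.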
18. Remedial strategies #2
- Regularization to maintain information flow
- Require the gradient at any time step t to be similar in magnitude to the gradient of the
loss at the very last layer.
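$\Omega = \sum_t \left( \frac{\left\| (\nabla_{h^{(t)}} L) \, \frac{\partial h^{(t)}}{\partial h^{(t-1)}} \right\|}{\left\| \nabla_{h^{(t)}} L \right\|} - 1 \right)^2$
- For easy gradient computation, $\nabla_{h^{(t)}} L$ is treated as a constant
- Doesn’t perform as well as leaky units with abundant data
- Perhaps because the constant gradient assumption doesn’t scale well.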
19. Echo State Networks
- Recurrent and input weights are fixed. Only output weights are learnable.
- Relies on the idea that a big, random expansion of the input vector, can often make it easy for a
linear model to fit the data.
- Fix the recurrent weights to have some spectral radius such as 3; information is carried forward through time
but does not explode, thanks to the stabilizing effect of saturating nonlinearities like tanh.
- Sparse connectivity - Very few non zero values in hidden to hidden weights
- Creates loosely coupled oscillators, information can hang around in a particular part of
the network.
- Important to choose the scale of the input to hidden connections. They need to drive the
states of the loosely coupled oscillators but, they mustn't wipe out information that those
oscillators contain about the recent history.
- The same weight-setting techniques can be used to initialize the weights in a fully trainable recurrent
network (Sutskever, 2012; Sutskever et al., 2013).
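The recipe above (fixed sparse reservoir, only the readout trained) can be sketched in a few lines. This is an illustration under assumptions, not the slides' implementation: the helper names, the sine-wave toy task, the ridge-regression readout, and the conservative spectral radius of 0.9 (rather than the 3 mentioned above) are all choices made for this example.

```python
import numpy as np

def make_reservoir(n_in, n_res, spectral_radius, density=0.1, input_scale=0.5, seed=0):
    """Fixed random input and recurrent weights; the recurrent matrix is sparse."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((n_res, n_res))
    W *= rng.random((n_res, n_res)) < density  # keep roughly `density` of the entries
    W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))  # rescale spectral radius
    W_in = input_scale * rng.uniform(-1.0, 1.0, (n_res, n_in))
    return W_in, W

def run_reservoir(W_in, W, inputs):
    """Collect hidden states h_t = tanh(W_in x_t + W h_{t-1}); nothing here is trained."""
    h = np.zeros(W.shape[0])
    states = []
    for x in inputs:
        h = np.tanh(W_in @ x + W @ h)
        states.append(h)
    return np.array(states)

def fit_readout(states, targets, ridge=1e-6):
    """Ridge regression for the only learnable weights, the linear readout."""
    A = states.T @ states + ridge * np.eye(states.shape[1])
    return np.linalg.solve(A, states.T @ targets)

# Toy task (assumed): predict the next value of a sine wave.
t = np.arange(501)
u = np.sin(0.2 * t)
inputs, targets = u[:-1, None], u[1:, None]

W_in, W = make_reservoir(n_in=1, n_res=100, spectral_radius=0.9)
states = run_reservoir(W_in, W, inputs)
washout = 50  # discard the initial transient before fitting
W_out = fit_readout(states[washout:], targets[washout:])
mse = np.mean((states[washout:] @ W_out - targets[washout:]) ** 2)
print(f"training MSE: {mse:.2e}")
```

The `input_scale` argument corresponds to the point about input-to-hidden scaling: too small and the reservoir ignores the input, too large and each new input overwrites the recent history held in the state.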
23. LSTMs
- Adding even more structure
- LSTM : RNN cell with 4 gates that control how information is retained
- Input value can be accumulated into the state if the sigmoidal input gate allows it.
- The cell state unit has a linear self-loop whose weight is controlled by the forget gate.
- The output of the cell can be shut off by the output gate.
- All the gating units have a sigmoid nonlinearity, while the ‘g’ gate can have any squashing
nonlinearity.
- i and g gates - multiplicative interaction
- g - what value between -1 and 1 should I add to the cell state?
- i - should I go through with the operation?
- Forget gate - Can kill gradients in the LSTM if it is driven to zero. Initialize its bias to 1 at the start so gradients
flow nicely and the LSTM learns to shut or open the gate whenever it wants.
- The state unit can also be used as an extra input to the gating units(Peephole connections).
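The gate interactions listed above can be written as a single forward step. The sketch below assumes the i, f, o, g naming used on this slide, stacks the four weight blocks in that order in one matrix, and omits peephole connections; `init_lstm` and `lstm_step` are hypothetical names chosen for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_lstm(D, H, seed=0):
    """Weights of shape (4H, D+H); row blocks are ordered i, f, o, g (an assumed layout)."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((4 * H, D + H)) / np.sqrt(D + H)
    b = np.zeros(4 * H)
    b[H:2 * H] = 1.0  # forget-gate bias starts at 1 so the gate begins mostly open
    return W, b

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step without peepholes."""
    H = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    i = sigmoid(z[0:H])            # input gate: should the update go through?
    f = sigmoid(z[H:2 * H])        # forget gate: weight of the linear self-loop
    o = sigmoid(z[2 * H:3 * H])    # output gate: can shut off the cell output
    g = np.tanh(z[3 * H:4 * H])    # candidate value in (-1, 1) to add to the cell
    c = f * c_prev + i * g         # cell state update
    h = o * np.tanh(c)             # exposed hidden state
    return h, c

D, H = 2, 3
W, b = init_lstm(D, H)
h, c = np.zeros(H), np.zeros(H)
rng = np.random.default_rng(1)
for _ in range(5):
    h, c = lstm_step(rng.standard_normal(D), h, c, W, b)
print("h after 5 steps:", h)
```

Because the cell update is additive and scaled only by f, a forget gate near 1 lets gradients pass through c almost unchanged, which is the motivation for the bias initialization.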
26. LSTM: A Search Space Odyssey
- 2015 paper by Greff et al.
- Compares 8 different configurations of the LSTM architecture
- GRUs
- Without Peephole connections
- Without output gate
- Without non-linearities at output and forget gate etc
- 5400 experimental runs, about 15 CPU years
- Did not see any major improvement in results; the classic LSTM architecture
works as well as the other versions
29. Explicit Memory
● Motivation
○ Some knowledge can be implicit, subconscious, and difficult to verbalize
■ Ex - how a dog looks different from a cat.
○ It can also be explicit, declarative and straightforward to put into words
■ Ex - everyday commonsense knowledge -> a cat is a kind of animal
■ Ex - Very specific facts -> “the meeting with the sales team is at 3:00 PM, room 141.”
○ Neural networks excel at storing implicit knowledge but struggle to memorize facts
■ SGD requires a sample to be presented many times before a NN memorizes it, and even then
not precisely. (Graves et al., 2014b)
○ Such explicit memory allows systems to rapidly and intentionally store and retrieve
specific facts and to sequentially reason with them.
30. Memory Networks
● Memory networks include a set of memory cells that can be accessed via an addressing
mechanism.
○ Originally required a supervision signal instructing them how to use their memory cells
(Weston et al., 2014)
○ Graves et al. (2014b) introduced Neural Turing Machines (NTMs)
■ able to learn to read from and write arbitrary content to memory cells without
explicit supervision about which actions to undertake
■ allow end-to-end training using a content-based soft attention mechanism
(Bahdanau et al., 2015)
31. Memory Networks
● Soft Addressing - (Content based)
○ Cell state is a vector - the weight used to read from or write to a cell is a function of that cell.
■ Weights can be produced using a softmax across all cells.
○ Can retrieve a complete vector-valued memory if we are able to produce a pattern that
matches some but not all of its elements
● Hard addressing - (Location based)
○ Output a discrete memory location/Treat weights as probabilities and choose a particular
cell to read or write from
○ Requires specialized optimization algorithms
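Content-based soft addressing can be sketched as a softmax over similarities between a query and every memory cell. The example below is an assumption-laden illustration: cosine similarity, the `sharpness` factor, and the helper names `content_address` and `soft_read` are choices made here, not details given in the slides.

```python
import numpy as np

def content_address(memory, key, sharpness=1.0):
    """Softmax over cosine similarity between the key and each memory row."""
    sims = memory @ key / (np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-8)
    scores = sharpness * sims
    scores = scores - scores.max()  # subtract the max for numerical stability
    weights = np.exp(scores)
    return weights / weights.sum()

def soft_read(memory, weights):
    """Read vector is the weighted sum of all cells, so the operation stays differentiable."""
    return weights @ memory

memory = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
])
key = np.array([0.0, 1.0, 0.0, 0.0])  # matches only part of the second cell

weights = content_address(memory, key, sharpness=20.0)
print("addressing weights:", np.round(weights, 4))
print("read vector:", np.round(soft_read(memory, weights), 4))
```

The partial key still puts almost all weight on the second cell, so the read returns (approximately) the full stored vector. Hard addressing would instead sample or pick a single index from `weights`, which is not differentiable and is why it needs specialized optimization.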
35. Optimisation for Long term dependencies
- Problem
- Specifically, whenever the model is able to represent long term dependencies, the
gradient of a long term interaction has exponentially smaller magnitude than the gradient
of a short term interaction.
- It does not mean that it is impossible to learn, but that it might take a very long time to
learn long-term dependencies.
- gradient-based optimization becomes increasingly difficult with the probability of
successful training reaching 0 for sequences of only length 10 or 20
- Leaky units & multiple time scales
- Skip connections through time
- Leaky units - The linear self-connection approach allows this effect to be adapted more
smoothly and flexibly by adjusting the real-valued α rather than by adjusting the
integer-valued skip length.
- Remove connections -
- Gradient Clipping
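Gradient clipping is often done on the norm of the whole gradient rather than element by element: if ||g|| exceeds a threshold v, rescale g to g · v / ||g||. A minimal sketch, assuming the gradients arrive as a list of NumPy arrays and using the hypothetical helper name `clip_by_global_norm`:

```python
import numpy as np

def clip_by_global_norm(grads, max_norm):
    """Rescale all gradients together so their joint L2 norm is at most max_norm."""
    total = np.sqrt(sum(np.sum(g * g) for g in grads))
    if total > max_norm:
        scale = max_norm / total
        grads = [g * scale for g in grads]
    return grads, total

grads = [np.array([3.0, 4.0]), np.array([[12.0]])]
clipped, norm_before = clip_by_global_norm(grads, max_norm=1.0)
norm_after = np.sqrt(sum(np.sum(g * g) for g in clipped))
print(f"norm before: {norm_before:.1f}, norm after: {norm_after:.1f}")
```

Unlike element-wise clipping, rescaling by the norm keeps the direction of the update unchanged and only limits the step size.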