Recurrent neural networks (RNNs) and long short-term memory (LSTM) networks can process sequential data such as text and time series. RNNs have memory and can perform the same task for every element of a sequence, but they struggle with long-term dependencies. LSTMs address this issue with a cell state and gating mechanisms that let them learn long-term dependencies. Each LSTM module has four interacting layers - a forget gate, an input gate, a tanh candidate layer, and an output gate - that together control how information is stored in and read from the cell state over long periods of time. RNNs and LSTMs are applied to tasks like language modeling, machine translation, speech recognition, and image caption generation.
Overview of RNN and LSTM as neural networks designed for processing sequential data.
Categorization of machine learning: Supervised, Unsupervised, and Reinforcement Learning with various examples.
Introduction to the K-Nearest Neighbor algorithm for classification.
Discussion on how neural networks can reveal hidden relationships within features.
Intro to RNNs, their sequence-processing capabilities, and applications in various fields.
Method of training RNNs using Backpropagation Through Time (BPTT).
Different types of RNNs including bidirectional and LSTM networks. Detailed explanation of LSTM architecture, including cell state and gated mechanisms.
Stepwise explanation of LSTM gate operations, including forget, input, and output decisions.
Different variations of LSTM including peephole connections, coupled gates, and GRUs.
Discussion on the effectiveness of RNN and LSTM models.
Examples of multiple applications of RNN models.
Concept of Turing-Completeness in relation to RNNs and their capability to simulate programs.
Application of RNNs on non-sequential data through sequential processing.
Resources showcasing interesting applications of RNN and LSTM architectures.
Compilation of references and resources for deeper understanding of RNNs, LSTMs, and ML.
Supervised Machine Learning
• Training Set: Inputs + Outputs
• Learn a link between the inputs and the outputs
• Linear and logistic regression
• Support vector machine
• K-nearest neighbors (k-NN)
• Naive Bayes
• Neural network
• Gradient boosting
• Classification trees and random forest
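To make one item from the list above concrete, here is a minimal k-nearest-neighbors classifier in plain Python. The toy data, the choice of k, and the function name are all illustrative assumptions, not part of the original slides:

```python
from collections import Counter
import math

def knn_predict(train, query, k=3):
    # classify `query` by majority vote among the k closest
    # training points (Euclidean distance)
    nearest = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# toy 2-D data: two clusters labeled "a" and "b"
train = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
         ((5, 5), "b"), ((6, 5), "b"), ((5, 6), "b")]
knn_predict(train, (0.5, 0.5))  # nearest neighbors are all "a"
```

Note that k-NN has no training step in the usual sense: the "model" is simply the stored training set.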
Step 1: Forget Gate Layer
• Decide what info to throw away
• Look at h[t-1] and x[t] and output a number between 0 and 1 that decides how much of the cell state C[t-1] to keep
• E.g. when we see a new subject, we want to forget the gender of the old subject
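The forget decision above can be sketched in plain Python; the weights W_f, bias b_f, and the tiny 1-dimensional example are illustrative assumptions:

```python
import math

def sigmoid(z):
    # squashes any real number into the range (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def forget_gate(h_prev, x_t, W_f, b_f):
    # look at [h[t-1], x[t]] and emit one value in (0, 1) per
    # cell-state component: 1 = "keep fully", 0 = "forget"
    v = list(h_prev) + list(x_t)
    return [sigmoid(sum(w * vi for w, vi in zip(row, v)) + b)
            for row, b in zip(W_f, b_f)]

# 1-dim example: f[t] = sigmoid(2.0*0.5 + (-1.0)*1.0 + 0.0) = sigmoid(0) = 0.5
f_t = forget_gate(h_prev=[0.5], x_t=[1.0], W_f=[[2.0, -1.0]], b_f=[0.0])
```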
Step 2: Input Gate Layer
• Decide what info to add
• A sigmoid layer: decide which values to update
• A tanh layer: create new candidate values C~[t]
• E.g. add the gender of the new subject
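The two layers of this step can be sketched the same way; again the weights and the 1-dimensional example are assumptions for illustration:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def input_step(h_prev, x_t, W_i, b_i, W_c, b_c):
    v = list(h_prev) + list(x_t)
    dot = lambda row, b: sum(w * vi for w, vi in zip(row, v)) + b
    # i[t]: sigmoid layer, decides which components to update (0..1)
    i_t = [sigmoid(dot(row, b)) for row, b in zip(W_i, b_i)]
    # C~[t]: tanh layer, proposes new candidate values (-1..1)
    c_tilde = [math.tanh(dot(row, b)) for row, b in zip(W_c, b_c)]
    return i_t, c_tilde

i_t, c_tilde = input_step([0.5], [1.0],
                          W_i=[[1.0, 1.0]], b_i=[0.0],
                          W_c=[[1.0, -1.0]], b_c=[0.0])
```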
Step 3: Combine Steps 1 & 2
• Update the cell state: C[t] = f[t] * C[t-1] + i[t] * C~[t]
• Multiply the old state by f[t]: to forget the things we decided to forget
• Add i[t] * C~[t]: to add the new candidate values (scaled by how much we decided to update)
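The update above is a simple elementwise expression; a minimal sketch (the example values are assumptions):

```python
def update_cell(c_prev, f_t, i_t, c_tilde):
    # C[t] = f[t] * C[t-1] + i[t] * C~[t], elementwise:
    # forget part of the old state, then add the scaled candidate
    return [f * c + i * ct for f, c, i, ct in zip(f_t, c_prev, i_t, c_tilde)]

# forget the old value entirely (f=0) and admit the candidate fully (i=1)
update_cell(c_prev=[2.0], f_t=[0.0], i_t=[1.0], c_tilde=[0.7])  # -> [0.7]
```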
Step 4: Filter/Output the Cell State
• Decide what to output
• A sigmoid layer: decide which parts of the cell state to output
• tanh: push the values to be between -1 and 1
• Multiply them to output only the parts we decided to
• E.g. output info relevant to a verb
• E.g. output whether the subject is singular or plural
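The output step above, given the gate values o[t] (the example values are assumptions):

```python
import math

def output_step(c_t, o_t):
    # h[t] = o[t] * tanh(C[t]): tanh pushes the cell state into (-1, 1),
    # and the sigmoid gate values o[t] select which parts to emit
    return [o * math.tanh(c) for o, c in zip(o_t, c_t)]

# an open gate (o=1) emits tanh of that component;
# a closed gate (o=0) emits nothing for that component
output_step(c_t=[0.0, 5.0], o_t=[1.0, 0.0])  # -> [0.0, 0.0]
```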
Variants on LSTM (1)
• Peephole connections:
let the gate layers look at the cell state (entirely or partially)
Variants on LSTM (2)
• Coupled forget and input gates:
the two decisions are made together rather than separately
C[t] = f[t] * C[t-1] + (1 - f[t]) * C~[t]
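In code, the coupled variant drops the separate input gate and reuses 1 - f[t] in its place (example values are assumptions):

```python
def coupled_update(c_prev, f_t, c_tilde):
    # C[t] = f[t] * C[t-1] + (1 - f[t]) * C~[t]:
    # new information enters exactly where old information is forgotten
    return [f * c + (1 - f) * ct for f, c, ct in zip(f_t, c_prev, c_tilde)]

coupled_update(c_prev=[2.0], f_t=[0.25], c_tilde=[1.0])  # keeps 25% old, 75% new
```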
Variants on LSTM (3)
• Gated Recurrent Unit (GRU):
combine the forget and input gates into a single “update gate”
merge the cell state and the hidden state
simpler than the standard LSTM, and increasingly popular
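The GRU's state update can be sketched as a single interpolation; this shows only the blending step, with the gate value z[t] assumed given (the names and example values are illustrative):

```python
def gru_blend(h_prev, z_t, h_tilde):
    # h[t] = (1 - z[t]) * h[t-1] + z[t] * h~[t]:
    # one update gate z[t] both forgets and admits, and the cell
    # state and hidden state are merged into a single vector h
    return [(1 - z) * h + z * ht for z, h, ht in zip(z_t, h_prev, h_tilde)]

gru_blend(h_prev=[1.0], z_t=[0.5], h_tilde=[0.0])  # halfway between old and new
```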
Turing-Complete
• Running a fixed program with certain inputs and some internal variables (RNNs can simulate arbitrary programs)
• Andrej Karpathy (Ph.D. @ Stanford)
Non-sequential data
• Though the data is not in the form of sequences, we can still use an RNN by processing it sequentially.
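A toy sketch of that idea: flatten a fixed-size input (e.g. a tiny image) into a sequence and feed it to a recurrent step one element at a time. The one-unit step function, weights, and pixel values are all assumptions for illustration:

```python
import math

def rnn_step(h, x, w_h=0.5, w_x=0.5):
    # a toy one-unit recurrent step: the new state mixes the
    # previous state with the current input
    return math.tanh(w_h * h + w_x * x)

# a 2x2 "image" flattened into a sequence of pixels,
# fed to the RNN one pixel per time step
pixels = [0.1, 0.9, 0.4, 0.2]
h = 0.0
for x in pixels:
    h = rnn_step(h, x)
# h now summarizes the whole "image"
```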
Some cool RNN/LSTM applications
• http://karpathy.github.io/2015/05/21/rnn-effectiveness/