Recursive Neural Networks
2018.06.27.
Sangwoo Mo
Recursive Neural Network (RNN) - Motivation
• Motivation: Many real-world objects have a recursive structure,
e.g. images are composed of segments, and sentences are composed of words
Socher et al. Parsing Natural Scenes and Natural Language with Recursive Neural Networks. ICML 2011.
Image from Stanford CS224N Lecture Note 14.
Recursive Neural Network (RNN) - Motivation
• Motivation: Can we learn a good representation for the recursive structures?
• Recursive structures (phrases) and their components (words) should lie in the same space,
e.g. the country of my birth ≃ Germany, France, etc.
Socher et al. Parsing Natural Scenes and Natural Language with Recursive Neural Networks. ICML 2011.
Image from Stanford CS224N Lecture Note 14.
Recursive Neural Network (RNN) - Model
• Goal: Design a neural network whose features are recursively constructed
• Each module maps two children to one parent, all lying in the same vector space
• To determine the order of recursion, we assign a score (plausibility) to each node
• Hence, the neural network module outputs (representation, score) pairs
Socher et al. Parsing Natural Scenes and Natural Language with Recursive Neural Networks. ICML 2011.
Image from Stanford CS224N Lecture Note 14.
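A minimal NumPy sketch of one composition module (illustrative names and shapes, not the authors' code): two child vectors in ℝ^n are mapped to a parent in ℝ^n plus a plausibility score.

import numpy as np

rng = np.random.default_rng(0)
n = 8                                        # representation dimension (assumed)
W = rng.normal(scale=0.1, size=(n, 2 * n))   # composition weights
b = np.zeros(n)
w_score = rng.normal(scale=0.1, size=n)      # scoring weights

def compose(c1, c2):
    # map two child vectors to (parent vector, plausibility score)
    parent = np.tanh(W @ np.concatenate([c1, c2]) + b)
    score = float(w_score @ parent)
    return parent, score

c1, c2 = rng.normal(size=n), rng.normal(size=n)
p, s = compose(c1, c2)                       # p lies in the same space as c1 and c2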
Recursive Neural Network (RNN) - Model
• cf. Note that a recurrent neural network is a special case of a recursive neural network (recursion along a chain)
Socher et al. Parsing Natural Scenes and Natural Language with Recursive Neural Networks. ICML 2011.
Image from Ratsgo’s blog for text mining.
Recursive Neural Network (RNN) - Inference
• At each step, merge two adjacent nodes (greedily, the pair with the highest score)
• With a greedy algorithm, inference requires only O(N) time
Socher et al. Parsing Natural Scenes and Natural Language with Recursive Neural Networks. ICML 2011.
Image from Stanford CS224N Lecture Note 14.
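A greedy-inference sketch in NumPy, assuming the compose helper from the sketch above: repeatedly merge the adjacent pair with the highest score until a single root remains.

def greedy_parse(leaves):
    nodes = list(leaves)                 # current frontier of vectors
    merges = []                          # records the merge order
    while len(nodes) > 1:
        # score every adjacent pair and keep the best one
        cands = [compose(nodes[i], nodes[i + 1]) for i in range(len(nodes) - 1)]
        best = max(range(len(cands)), key=lambda i: cands[i][1])
        parent, score = cands[best]
        merges.append((best, best + 1, score))
        nodes[best:best + 2] = [parent]  # replace the pair with its parent
    return nodes[0], merges

leaves = [rng.normal(size=n) for _ in range(5)]   # e.g. five word vectors
root, merges = greedy_parse(leaves)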
Recursive Neural Network (RNN) - Inference
• We can apply beam search to improve the performance
• Beam search: keep the top-k candidates at each step (greedy = beam search with k = 1)
Socher et al. Parsing Natural Scenes and Natural Language with Recursive Neural Networks. ICML 2011.
Image from Ratsgo’s blog for text mining.
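A beam-search sketch (again assuming the compose helper above): instead of keeping only the single best partial tree, keep the k highest-scoring ones at every step.

def beam_parse(leaves, k=3):
    beams = [(0.0, list(leaves))]                # (total score, frontier)
    while len(beams[0][1]) > 1:
        expanded = []
        for total, nodes in beams:
            for i in range(len(nodes) - 1):
                parent, score = compose(nodes[i], nodes[i + 1])
                expanded.append((total + score, nodes[:i] + [parent] + nodes[i + 2:]))
        # keep only the k highest-scoring partial trees
        beams = sorted(expanded, key=lambda b: b[0], reverse=True)[:k]
    return beams[0]                              # (best total score, [root])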
Recursive Neural Network (RNN) - Training
• Suppose (sentence, tree) pairs (x_i, y_i) are given
• Let s(x_i, y) be the score of tree y, the sum of the scores over all non-leaf nodes
• Let A(x_i) be the set of candidate trees (approximated by beam search)
• Then the max-margin objective (to be maximized) is
  J = Σ_i [ s(x_i, y_i) − max_{y ∈ A(x_i)} ( s(x_i, y) + Δ(y, y_i) ) ],
  where Δ(y, y_i) is the number of wrong subtrees
• Intuitively, it increases s(x_i, y_i) and decreases s(x_i, y) whenever s(x_i, y) + Δ(y, y_i) > s(x_i, y_i)
• We can also add a classification loss on each node
(use the node's feature as input to the classifier)
Socher et al. Parsing Natural Scenes and Natural Language with Recursive Neural Networks. ICML 2011.
Image from Stanford CS224N Lecture Note 14.
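A toy computation of the margin term (placeholder numbers; gradient updates omitted): s_gold stands for s(x_i, y_i), s_cand[j] for s(x_i, y) of each candidate tree in A(x_i), and delta[j] for its number of wrong subtrees.

def max_margin_objective(s_gold, s_cand, delta):
    # s(x_i, y_i) - max_y [ s(x_i, y) + Δ(y, y_i) ]; training maximizes this
    augmented = [s + d for s, d in zip(s_cand, delta)]
    return s_gold - max(augmented)

# the gold tree scores 5.0; a candidate scoring 3.0 with 4 wrong subtrees
# still violates the margin, so the objective is negative
print(max_margin_objective(5.0, [4.5, 3.0], [2, 4]))   # -> -2.0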
Recursive Neural Network (RNN) - Experiments
• After training, both leaf and higher-level nodes learn valid representations
• Image segmentation: infer classes for segments (the feature extractor is jointly trained)
• Phrase clustering: nearest neighbors on phrase features
Socher et al. Parsing Natural Scenes and Natural Language with Recursive Neural Networks. ICML 2011.
Recursive Neural Network (RNN) - Appendix
• Preprocessing: How do we map segments/words into the representation space ℝ^n?
• Word: use a pretrained word2vec model (V → ℝ^n)
• Image: extract hand-crafted features in ℝ^m, and jointly train a network F: ℝ^m → ℝ^n
• Extension to image segmentation
• A segment can have multiple adjacent segments
• Hence, there are multiple true tree structures
• Hence, Δ(y, y_i) checks whether each subtree is
included in the set of true tree structures
Socher et al. Parsing Natural Scenes and Natural Language with Recursive Neural Networks. ICML 2011.
Recursive Autoencoder (RAE) - Motivation & Idea
• Motivation: The recursive neural network (RNN) requires true tree structures for training
• The recursive autoencoder (RAE) extends the RNN to the un-/semi-supervised setting
• If the tree structure y is given, we can train a local autoencoder (c1, c2) → p → (c1', c2')
on each node, with reconstruction loss L(y) = Σ_{(c1,c2,p) ∈ y} ‖[c1; c2] − [c1'; c2']‖²
Socher et al. Semi-Supervised Recursive Autoencoders for Predicting Sentiment Distributions. EMNLP 2011.
Recursive Autoencoder (RAE) - Model
• If the tree structure y is given, we can train a local autoencoder (c1, c2) → p → (c1', c2')
on each node, with reconstruction loss L(y) = Σ_{(c1,c2,p) ∈ y} ‖[c1; c2] − [c1'; c2']‖²
• If the tree structure is not given, we take the minimum over all candidate trees A(x_i):
  argmin_{y ∈ A(x_i)} L(y) = argmin_{y ∈ A(x_i)} Σ_{(c1,c2,p) ∈ y} ‖[c1; c2] − [c1'; c2']‖²
• Here, A(x_i) is approximated by greedy search, using the reconstruction loss as the score
• Length normalization: minimizing the reconstruction loss drives the scale of hidden nodes toward 0;
to prevent this, normalize hidden nodes to unit length: p/‖p‖
• The resulting tree captures the information of the words, but does not follow the syntax
• However, the learned representation is still useful
Socher et al. Semi-Supervised Recursive Autoencoders for Predicting Sentiment Distributions. EMNLP 2011.
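A NumPy sketch of the unsupervised case under assumed shapes (not the paper's code; rng and n as in the earlier sketch): one local autoencoder per merge, a greedy tree built by picking the pair with the lowest reconstruction loss, and length-normalized parents.

W_enc = rng.normal(scale=0.1, size=(n, 2 * n))   # encoder: [c1; c2] -> p
W_dec = rng.normal(scale=0.1, size=(2 * n, n))   # decoder: p -> [c1'; c2']

def rae_node(c1, c2):
    children = np.concatenate([c1, c2])
    p = np.tanh(W_enc @ children)
    p = p / np.linalg.norm(p)                    # length normalization
    recon = W_dec @ p
    return p, float(np.sum((children - recon) ** 2))

def greedy_rae_parse(leaves):
    nodes, total = list(leaves), 0.0
    while len(nodes) > 1:
        cands = [rae_node(nodes[i], nodes[i + 1]) for i in range(len(nodes) - 1)]
        i = min(range(len(cands)), key=lambda j: cands[j][1])   # lowest recon loss
        p, loss = cands[i]
        total += loss
        nodes[i:i + 2] = [p]
    return nodes[0], total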
Recursive Autoencoder (RAE) - Experiments
• For each paragraph, votes over 5 sentiment categories are labeled (multiple votes per paragraph)
• Train a logistic regression model on the learned representation
• The learned representation outperformed baseline models,
e.g. binary bag-of-words, hand-crafted features, and the average of word vectors
Socher et al. Semi-Supervised Recursive Autoencoders for Predicting Sentiment Distributions. EMNLP 2011.
Unfolding RAE & Dynamic Pooling - Model
• Unfolding RAE is a global-autoencoder version of RAE (more expensive, but potentially better)
• In some tasks, e.g. paraphrase detection, we should compare the features of two sentences
• Comparing all node features would be better than comparing only the root features, but the sizes do not match
• Dynamic pooling converts the variable-sized similarity matrix into a fixed-size matrix
Socher et al. Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection. NIPS 2011.
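A minimal dynamic-pooling sketch (assuming min pooling over a fixed grid, as the paper describes; sizes are illustrative): the variable-sized similarity matrix is split into a p × p grid of regions and each region is pooled to one value.

import numpy as np

def dynamic_pool(sim, p=4):
    rows = np.array_split(np.arange(sim.shape[0]), p)
    cols = np.array_split(np.arange(sim.shape[1]), p)
    pooled = np.zeros((p, p))
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            pooled[i, j] = sim[np.ix_(r, c)].min()   # pool each region
    return pooled

sim = np.random.default_rng(0).random((7, 11))   # node similarities of two sentences
print(dynamic_pool(sim).shape)                   # -> (4, 4), regardless of input size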
Unfolding RAE & Dynamic Pooling - Experiments
• Unfolding RAE learns better representations than RAE
• Unfolding RAE + dynamic pooling gives the best representation for similarity
Socher et al. Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection. NIPS 2011.
(Figures: nearest-neighbor examples and similarity-classification results)
Matrix-Vector RNN (MV-RNN)
• Motivation: Different word pairs have different composition rules
• Idea: represent the composition rule of a word ∈ ℝ^n by a matrix ∈ ℝ^{n×n}
• Hence, each word is represented by a matrix-vector pair (a, A) ∈ ℝ^n × ℝ^{n×n}
• For two words (a, A) and (b, B), the parent node (p, P) is given by
  p = f_V(a, b, A, B) = f̃_V(Ba, Ab) and P = f_M(A, B) = W_M · [A; B]
• We would need to store ℝ^{n×n×|V|} matrices, hence the authors use
a low-rank approximation to reduce the number of parameters
• MV-RNN shows better performance than vanilla RNN
Socher et al. Semantic Compositionality through Recursive Matrix-Vector Spaces. EMNLP 2012.
(Figure: semantic classification results)
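An MV-RNN composition sketch under assumed shapes (rng and n as above): each word carries a vector a ∈ ℝ^n and a matrix A ∈ ℝ^{n×n}; the children first modify each other through their matrices, then compose as usual.

W_v = rng.normal(scale=0.1, size=(n, 2 * n))   # vector composition weights
W_m = rng.normal(scale=0.1, size=(n, 2 * n))   # matrix composition weights

def mv_compose(a, A, b, B):
    p = np.tanh(W_v @ np.concatenate([B @ a, A @ b]))   # p = f~_V(Ba, Ab)
    P = W_m @ np.vstack([A, B])                          # P = W_M · [A; B]
    return p, P

a, A = rng.normal(size=n), rng.normal(scale=0.1, size=(n, n))
b, B = rng.normal(size=n), rng.normal(scale=0.1, size=(n, n))
p, P = mv_compose(a, A, b, B)   # p ∈ ℝ^n, P ∈ ℝ^{n×n}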
Recursive Neural Tensor Network (RNTN)
• Motivation: Considering composition is cool, but MV-RNN uses too many parameters
• Instead of using one matrix for each word, use a single tensor to represent composition
• Formally, let V^[1:n] ∈ ℝ^{2n×2n×n}, where each slice V^[i] ∈ ℝ^{2n×2n}
• Then the composition term h ∈ ℝ^n for children (a, b) is given by
  h_i = [a b] · V^[i] · [a b]^T,
  and the parent p ∈ ℝ^n is
  p = f(a, b, h) = f̃(h + W · [a b]^T)
• This reduces the number of parameters from n × n × |V| to 2n × 2n × n
• RNTN also shows better performance than MV-RNN
Socher et al. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank. EMNLP 2013.
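An RNTN composition sketch under assumed shapes (rng and n as above): one shared tensor V replaces MV-RNN's per-word matrices.

V = rng.normal(scale=0.1, size=(n, 2 * n, 2 * n))   # n slices of size 2n × 2n
W_lin = rng.normal(scale=0.1, size=(n, 2 * n))      # standard linear composition

def rntn_compose(a, b):
    ab = np.concatenate([a, b])                      # [a; b] ∈ ℝ^{2n}
    h = np.einsum('i,kij,j->k', ab, V, ab)           # h_k = [a b] · V^[k] · [a b]^T
    return np.tanh(h + W_lin @ ab)                   # p = f̃(h + W · [a b]^T)

p = rntn_compose(rng.normal(size=n), rng.normal(size=n))   # p ∈ ℝ^n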
Reference
• Recursive Neural Network (RNN): Socher et al. Parsing Natural Scenes and Natural
Language with Recursive Neural Networks. ICML 2011.
• Recursive Autoencoder (RAE): Socher et al. Semi-Supervised Recursive Autoencoders for
Predicting Sentiment Distributions. EMNLP 2011.
• Unfolding RAE & Dynamic Pooling: Socher et al. Dynamic Pooling and Unfolding Recursive
Autoencoders for Paraphrase Detection. NIPS 2011.
• Matrix-Vector RNN (MV-RNN): Socher et al. Semantic Compositionality through Recursive
Matrix-Vector Spaces. EMNLP 2012.
• Recursive Neural Tensor Network (RNTN): Socher et al. Recursive Deep Models for
Semantic Compositionality Over a Sentiment Treebank. EMNLP 2013.