Dynamic pooling and unfolding recursive autoencoders for paraphrase detection
1. Dynamic Pooling and Unfolding Recursive
Autoencoders for Paraphrase Detection
R. Socher et al., 2011
Presenter: Shun Yoshida
2. Purpose of This Paper
Objective: To detect paraphrases
S1 The judge also refused to postpone the trial date of
Sept. 29.
S2 Obus also denied a defense motion to postpone the
September trial date.
➔Identifying paraphrases is an important task for
information retrieval, text summarization,
evaluation of machine translation, etc.
Relevance to My Research:
This can help me classify sentiment more precisely.
3. Word Representation
In general, words are represented as vectors.
1. One-hot representation
This assigns a unique ID to each word.
[0, 0, …, 1, 0, …, 0]
Problem:
• Very sparse
• High dimension
• Unable to measure the similarity between words
Vocabulary:
1: apple
2: book
⋮
200: zoo
⋮
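The one-hot scheme above can be sketched in a few lines; the tiny three-word vocabulary is an illustrative assumption (the slide's vocabulary has at least 200 entries):

```python
import numpy as np

# Toy vocabulary (illustrative): word -> ID, as on the slide.
vocab = {"apple": 0, "book": 1, "zoo": 2}

def one_hot(word, vocab):
    """Return the one-hot vector for `word`: all zeros except a 1 at its ID."""
    v = np.zeros(len(vocab))
    v[vocab[word]] = 1.0
    return v

print(one_hot("book", vocab))  # [0. 1. 0.]

# Every pair of distinct words is orthogonal, so one-hot vectors
# cannot measure similarity between words:
print(np.dot(one_hot("apple", vocab), one_hot("zoo", vocab)))  # 0.0
```

With a realistic vocabulary the dimension equals the vocabulary size, which illustrates the sparsity and high-dimension problems listed above.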
4. Word Representation
2. Distributed Representation (word embedding)
This method aims to learn such a representation.
Merit:
• Low dimension
• Similar words have similar vectors
zoo [1.5, 1.8, 0.3, 4]
This vector represents semantic and syntactic information.
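A small sketch of the merit listed above. The "zoo" vector is the one from the slide; the "animal" and "book" vectors are made-up values chosen only to illustrate that related words can sit close together in the embedding space:

```python
import numpy as np

# Hypothetical low-dimensional embeddings (only "zoo" is from the slide).
emb = {
    "zoo":    np.array([1.5, 1.8, 0.3, 4.0]),
    "animal": np.array([1.4, 1.6, 0.5, 3.8]),  # made-up "similar" word
    "book":   np.array([4.0, 0.2, 3.1, 0.1]),  # made-up "dissimilar" word
}

def dist(a, b):
    """Euclidean distance between two embedding vectors."""
    return float(np.linalg.norm(emb[a] - emb[b]))

# Unlike one-hot vectors, distances now reflect similarity:
assert dist("zoo", "animal") < dist("zoo", "book")
```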
5. Autoencoder
A kind of neural network.
The hidden layer has fewer units than the input layer.
Trained to reconstruct its own input.
➔Enables learning low-dimensional representations
that capture the information well
6. Autoencoder
x ∈ R^n: word embedding (initialized by a neural language model)
W_e: encoding weight, W_d: decoding weight
Considered as a binary tree;
Input: 2 children [c1; c2] ∈ R^(2n), Hidden: p ∈ R^n

children to parent:    p = f(W_e [c1; c2] + b_e)   (f: element-wise nonlinearity, e.g. tanh)
reconstruction:        [c1'; c2'] = W_d p + b_d
reconstruction error:  E_rec([c1; c2]) = ‖[c1; c2] − [c1'; c2']‖²
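A minimal NumPy sketch of one encode/decode step. The random weights, the toy dimension n = 4, and the tanh activation are illustrative assumptions; in the paper the weights are trained:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4  # embedding dimension (toy value)

# Randomly initialized weights; here we only illustrate the forward
# pass and the reconstruction error, not training.
W_e, b_e = rng.standard_normal((n, 2 * n)), np.zeros(n)
W_d, b_d = rng.standard_normal((2 * n, n)), np.zeros(2 * n)

def encode(c1, c2):
    """Children to parent: p = tanh(W_e [c1; c2] + b_e)."""
    return np.tanh(W_e @ np.concatenate([c1, c2]) + b_e)

def reconstruct(p):
    """Parent back to children: [c1'; c2'] = W_d p + b_d."""
    c = W_d @ p + b_d
    return c[:n], c[n:]

def e_rec(c1, c2):
    """Reconstruction error ||[c1; c2] - [c1'; c2']||^2."""
    c1r, c2r = reconstruct(encode(c1, c2))
    return float(np.sum((c1 - c1r) ** 2) + np.sum((c2 - c2r) ** 2))

c1, c2 = rng.standard_normal(n), rng.standard_normal(n)
p = encode(c1, c2)
assert p.shape == (n,)  # the parent has the same dimension as one child
```

Because p lives in R^n like each child, the same step can be applied recursively, which is exactly what the next slide exploits.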
7. Recursive Autoencoders
The child and parent vectors have the same dimension,
so we can repeat the same step until the full tree is
constructed.
(Figure: word embeddings at the leaves, phrase vectors at the inner nodes)
reconstruction error of the tree:  E_rec(tree) = Σ over all inner nodes s of E_rec(s)
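The recursion can be sketched as follows. The tree shape, dimensions, and random weights are illustrative assumptions, and the paper's procedure for building the tree itself is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
W_e, b_e = rng.standard_normal((n, 2 * n)), np.zeros(n)
W_d, b_d = rng.standard_normal((2 * n, n)), np.zeros(2 * n)

def encode(c1, c2):
    return np.tanh(W_e @ np.concatenate([c1, c2]) + b_e)

def e_rec(c1, c2, p):
    c = W_d @ p + b_d  # reconstructed [c1'; c2']
    return float(np.sum((np.concatenate([c1, c2]) - c) ** 2))

def rae(tree):
    """Recursively encode a binary tree.
    A leaf is a word embedding (np.ndarray); an inner node is a pair
    (left, right).  Returns (phrase vector, summed reconstruction error)."""
    if isinstance(tree, np.ndarray):
        return tree, 0.0
    left, el = rae(tree[0])
    right, er = rae(tree[1])
    p = encode(left, right)
    return p, el + er + e_rec(left, right, p)

# Toy 3-word sentence ((w1 w2) w3) with random "embeddings":
w1, w2, w3 = (rng.standard_normal(n) for _ in range(3))
root, total_error = rae(((w1, w2), w3))
assert root.shape == (n,)  # the sentence vector has dimension n
```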
8. Unfolding RAE
The unfolding RAE encodes each hidden layer such that it
best reconstructs its entire subtree, down to the leaf
nodes.
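A sketch of the unfolding decode, which reconstructs a whole subtree from a single phrase vector. The shape encoding and random untrained weights are assumptions; in the paper the reconstructed leaves are compared against the original word embeddings to form the error:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
W_d, b_d = rng.standard_normal((2 * n, n)), np.zeros(2 * n)

def unfold(p, shape):
    """Decode a phrase vector p down to the leaves of its subtree.
    `shape` mirrors the tree: a leaf is None, an inner node is a pair
    (left_shape, right_shape).  Returns the reconstructed leaf vectors."""
    if shape is None:
        return [p]
    c = W_d @ p + b_d  # reconstruct both children at once
    return unfold(c[:n], shape[0]) + unfold(c[n:], shape[1])

# Unfolding the root of the subtree ((w1 w2) w3) yields 3 leaf vectors:
leaves = unfold(rng.standard_normal(n), ((None, None), None))
assert len(leaves) == 3
```

Because the error is summed over all reconstructed leaves, a child covering three words contributes three terms while a one-word child contributes one, which is the re-weighting discussed on the next slide.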
9. Why Unfolding RAE?
Problems of the RAE:
• Equal weight is given to both children,
though each child could represent a
different number of words
• E_rec can be lowered simply by shrinking
the magnitude of the hidden layer
➔The unfolding RAE solves these problems.
(Figure: one child represents 1 word, the other 3 words)
10. RAE Training
Training minimizes the sum of the
reconstruction errors over all nodes and all trees.
E_rec(total) is a function of x (the word embeddings)
and W_d, W_e (the network weights)
➔Word embeddings and phrase vectors are both
obtained after training
11. Similarity Matrix
After training, we compute the similarities (Euclidean
distances) between all word and phrase vectors of the
two sentences.
These distances fill a similarity matrix S.
S[3,4] represents the similarity between node 4 of
sentence 1 ("mice") and node 3 of sentence 2 ("mice")
➔zero distance
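Computing S can be sketched directly; the toy vectors below are random stand-ins, whereas in the paper the node vectors come from the trained unfolding RAE:

```python
import numpy as np

def similarity_matrix(nodes1, nodes2):
    """Pairwise Euclidean distances between all word/phrase vectors of
    sentence 1 (rows) and sentence 2 (columns)."""
    return np.array([[np.linalg.norm(a - b) for b in nodes2] for a in nodes1])

# A sentence with n words has 2n-1 nodes (words plus phrases); toy vectors:
rng = np.random.default_rng(0)
nodes1 = [rng.standard_normal(4) for _ in range(3)]  # n = 2 -> 3 nodes
nodes2 = [rng.standard_normal(4) for _ in range(5)]  # m = 3 -> 5 nodes
nodes2[1] = nodes1[2].copy()  # pretend both sentences share a word, e.g. "mice"

S = similarity_matrix(nodes1, nodes2)
assert S.shape == (3, 5)  # S is (2n-1) x (2m-1)
assert S[2, 1] == 0.0     # identical vectors -> zero distance
```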
12. Why Dynamic Pooling?
Classifying from the average distance or a histogram of
the distances in S does not give good performance.
➔Need to feed S into a classifier.
Problem:
The matrix dimensions vary with the sentence lengths:
S ∈ R^((2n−1)×(2m−1))
Solution:
Map S into a matrix S_pool of fixed size:
S_pool ∈ R^(n_p×n_p)
➔Dynamic Pooling
13. Dynamic Pooling
Example:
n_p = 3 (2n−1 and 2m−1 are divisible by n_p)
2n−1 = 3
2m−1 = 9
1. Produce an n_p × n_p grid
   grid window size: (2n−1)/n_p × (2m−1)/n_p = 1×3
2. Define each element of S_pool to be the minimum value
   of its grid window
   (a small value means that there are similar words or phrases in
   both sentences, so we take the minimum to keep this information)
14. Dynamic Pooling
Example:
n_p = 2 (2n−1 and 2m−1 are NOT divisible by n_p)
2n−1 = 3
2m−1 = 9
1. Produce an n_p × n_p grid
   grid window size: ⌊(2n−1)/n_p⌋ × ⌊(2m−1)/n_p⌋ = 1×4
2. Distribute the remaining rows/columns to the last
   grid windows, then take the minimum of each window.
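Both examples can be handled by one pooling routine. The remainder policy below (folding all leftover rows/columns into the last window) is one plausible reading of step 2, not necessarily the paper's exact scheme:

```python
import numpy as np

def dynamic_min_pool(S, n_p):
    """Min-pool a variable-size similarity matrix S into a fixed
    n_p x n_p matrix S_pool.  Leftover rows/columns that do not divide
    evenly are folded into the last windows (assumed remainder policy)."""
    rows, cols = S.shape
    # Window boundaries: equal-size windows, remainder goes to the last.
    r_edges = [i * (rows // n_p) for i in range(n_p)] + [rows]
    c_edges = [j * (cols // n_p) for j in range(n_p)] + [cols]
    pooled = np.empty((n_p, n_p))
    for i in range(n_p):
        for j in range(n_p):
            # Keep the minimum: small distances signal shared
            # words/phrases, and pooling must preserve that signal.
            pooled[i, j] = S[r_edges[i]:r_edges[i + 1],
                             c_edges[j]:c_edges[j + 1]].min()
    return pooled

S = np.arange(27, dtype=float).reshape(3, 9)   # (2n-1) x (2m-1) = 3 x 9
assert dynamic_min_pool(S, 3).shape == (3, 3)  # divisible case
assert dynamic_min_pool(S, 2).shape == (2, 2)  # non-divisible case
```

The fixed-size S_pool can then be fed to any standard classifier, as slide 12 requires.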
15. Experiments
1. Do autoencoders capture phrase information?
➔Unfolding RAE is better.
16. Experiments
2. Does the unfolding RAE really decode to the leaf nodes?
➔Unfolding RAE is better
It can reconstruct phrases of up to five words very well
17. Experiments
3. How well does the proposed method detect paraphrases?
➔The proposed method achieves state-of-the-art performance