Dynamic Pooling and Unfolding Recursive 
Autoencoders for Paraphrase Detection 
R. Socher et al., 2011 
Presenter: Shun Yoshida
Purpose of This Paper 
Objective: To detect paraphrases 
S1 The judge also refused to postpone the trial date of 
Sept. 29. 
S2 Obus also denied a defense motion to postpone the 
September trial date. 
➔ Identifying paraphrases is an important task for information retrieval, text summarization, evaluation of machine translation, etc. 
Relevance to My Research: 
This can help me to classify sentiment more precisely 
Word Representation 
In general, words are represented as vectors. 
1. One-hot representation 
Each word in the vocabulary is assigned its own ID; its vector is 1 at that position and 0 everywhere else. 
Vocabulary: 
1: apple 
2: book 
⋮ 
200: zoo 
⋮ 
[ 0, 0, …, 1, 0, …, 0 ] 
Problems: 
• Very sparse 
• High dimension (one dimension per vocabulary word) 
• Unable to measure the similarity between words 
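The similarity problem can be checked directly. The sketch below is only an illustration: it uses a hypothetical three-word vocabulary (0-based IDs, unlike the 1-based IDs on the slide) and shows that any two distinct one-hot vectors are orthogonal and equally far apart.

```python
import numpy as np

# Toy vocabulary (hypothetical; a real vocabulary has many thousands of words).
vocabulary = ["apple", "book", "zoo"]
word_to_id = {word: i for i, word in enumerate(vocabulary)}

def one_hot(word):
    """Return a vector that is 1 at the word's ID and 0 elsewhere."""
    vec = np.zeros(len(vocabulary))
    vec[word_to_id[word]] = 1.0
    return vec

apple, book, zoo = (one_hot(w) for w in vocabulary)

# Distinct words always have dot product 0 and the same Euclidean distance,
# so the representation carries no notion of "more similar" or "less similar".
dot_apple_book = float(apple @ book)
dist_apple_book = float(np.linalg.norm(apple - book))
dist_apple_zoo = float(np.linalg.norm(apple - zoo))
```

Every pair of distinct words ends up at distance sqrt(2), whatever the words mean.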
Word Representation 
2. Distributed representation (word embedding) 
zoo → [ 1.5, 1.8, 0.3, 4 ] 
The vector encodes semantic and syntactic information. 
The method in this paper learns this representation. 
Merits: 
• Low dimension 
• Similar words have similar vectors 
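To make "similar words have similar vectors" concrete, here is a small sketch with hand-made 4-dimensional vectors. The numbers are invented for illustration and are not learned embeddings; the words are chosen only as an example of a related pair and an unrelated pair.

```python
import numpy as np

# Hand-made vectors for illustration only (not taken from any trained model).
embeddings = {
    "apple":  np.array([1.0, 0.9, 0.1, 0.0]),
    "orange": np.array([0.9, 1.0, 0.2, 0.1]),
    "soccer": np.array([0.0, 0.1, 1.0, 0.9]),
}

def distance(word_a, word_b):
    """Euclidean distance between two word vectors."""
    return float(np.linalg.norm(embeddings[word_a] - embeddings[word_b]))

fruit_pair = distance("apple", "orange")
mixed_pair = distance("apple", "soccer")
```

With these toy values the two fruits are much closer to each other than "apple" is to "soccer", which is the property a one-hot encoding cannot express.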
Autoencoder 
• A kind of neural network 
• The hidden layer has fewer units than the input layer 
• Trained to reconstruct its own input 
➔ Makes it possible to learn low-dimensional representations that capture the information in the input well
Autoencoder 
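Considered as a binary tree: 
Input: two children $[c_1; c_2] \in \mathbb{R}^{2n}$   Hidden: $p \in \mathbb{R}^{n}$ 
$x \in \mathbb{R}^{n}$: word embedding (initialized by a neural language model) 
$W_e$: encoding weights 
$W_d$: decoding weights 
Children to parent: $p = f(W_e [c_1; c_2] + b)$ 
Reconstruction: $[c_1'; c_2'] = f(W_d\, p + b_d)$ 
Reconstruction error: $E_{rec}(p) = \lVert [c_1; c_2] - [c_1'; c_2'] \rVert^2$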
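The three steps on this slide (children to parent, reconstruction, reconstruction error) can be sketched in NumPy. This is a minimal sketch under assumptions: the dimension `n = 4`, the random untrained weights, the zero biases `b_e` and `b_d`, and the choice of `tanh` as the nonlinearity are all illustrative, not values from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4  # embedding dimension (hypothetical)

# Randomly initialised, untrained parameters (illustrative only).
W_e = rng.normal(scale=0.1, size=(n, 2 * n))   # encoder: R^{2n} -> R^n
b_e = np.zeros(n)
W_d = rng.normal(scale=0.1, size=(2 * n, n))   # decoder: R^n -> R^{2n}
b_d = np.zeros(2 * n)

def encode(c1, c2):
    """Children to parent: compress two n-dim children into one n-dim parent."""
    return np.tanh(W_e @ np.concatenate([c1, c2]) + b_e)

def decode(p):
    """Reconstruction: expand a parent back into two n-dim children."""
    recon = np.tanh(W_d @ p + b_d)
    return recon[:n], recon[n:]

def reconstruction_error(c1, c2):
    """Squared Euclidean distance between the children and their reconstruction."""
    c1_rec, c2_rec = decode(encode(c1, c2))
    return float(np.sum((c1 - c1_rec) ** 2) + np.sum((c2 - c2_rec) ** 2))

c1 = rng.normal(size=n)
c2 = rng.normal(size=n)
parent = encode(c1, c2)
error = reconstruction_error(c1, c2)
```

Because the parent has the same dimension as each child, it can itself be used as a child in a later step.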
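Recursive Autoencoders (RAE) 
The children and the parent have the same dimension, so the same step can be repeated until the full tree is constructed. 
(Figure: binary tree with word embeddings at the leaves and phrase vectors at the internal nodes.) 
Reconstruction error of a tree: the sum of the reconstruction errors at all of its internal nodes, $E_{rec}(T) = \sum_{p \in T} E_{rec}(p)$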
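Repeating the same encoding step up a tree can be sketched as a short recursion. The sketch assumes a tree given as nested tuples with NumPy vectors at the leaves, random untrained weights, no biases, and `tanh` as the nonlinearity; all of these are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4  # embedding dimension (hypothetical)
W_e = rng.normal(scale=0.1, size=(n, 2 * n))  # untrained encoder weights
W_d = rng.normal(scale=0.1, size=(2 * n, n))  # untrained decoder weights

def encode_tree(tree):
    """Return (vector, summed reconstruction error) for a nested-tuple tree.

    A leaf is an n-dim array (word embedding); an internal node is a pair
    (left_subtree, right_subtree) and yields a phrase vector.
    """
    if isinstance(tree, np.ndarray):
        return tree, 0.0                      # leaves contribute no error
    left, right = tree
    c1, err_left = encode_tree(left)
    c2, err_right = encode_tree(right)
    children = np.concatenate([c1, c2])
    parent = np.tanh(W_e @ children)
    recon = np.tanh(W_d @ parent)
    node_error = float(np.sum((children - recon) ** 2))
    return parent, err_left + err_right + node_error

x1, x2, x3 = (rng.normal(size=n) for _ in range(3))
sentence_vector, tree_error = encode_tree((x1, (x2, x3)))
```

Here `(x1, (x2, x3))` stands for a three-word sentence whose last two words are merged first; the tree error adds up the errors of both internal nodes.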
Unfolding RAE 
The unfolding RAE encodes each hidden layer so that it best reconstructs its entire subtree, all the way down to the leaf nodes. 
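One way to picture the unfolding objective: decode a node repeatedly along the shape of its subtree, then compare the reconstructed leaves with the original word embeddings. The sketch below assumes nested-tuple trees, random untrained weights, no biases, and `tanh`; it illustrates the idea and is not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4  # embedding dimension (hypothetical)
W_e = rng.normal(scale=0.5, size=(n, 2 * n))  # untrained encoder weights
W_d = rng.normal(scale=0.5, size=(2 * n, n))  # untrained decoder weights

def encode(tree):
    """Encode a nested-tuple tree bottom-up into a single n-dim vector."""
    if isinstance(tree, np.ndarray):
        return tree
    left, right = tree
    return np.tanh(W_e @ np.concatenate([encode(left), encode(right)]))

def leaves(tree):
    """List the word embeddings at the leaves, left to right."""
    if isinstance(tree, np.ndarray):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])

def unfold(vec, tree):
    """Decode vec along the shape of tree and return the reconstructed leaves."""
    if isinstance(tree, np.ndarray):
        return [vec]
    recon = np.tanh(W_d @ vec)
    return unfold(recon[:n], tree[0]) + unfold(recon[n:], tree[1])

def unfolding_error(tree):
    """Squared distance between all original leaves and their reconstructions."""
    original = np.concatenate(leaves(tree))
    rebuilt = np.concatenate(unfold(encode(tree), tree))
    return float(np.sum((original - rebuilt) ** 2))

x1, x2, x3 = (rng.normal(size=n) for _ in range(3))
tree = (x1, (x2, x3))
error = unfolding_error(tree)
```

In contrast to the standard RAE, the error at the root is measured against all three words, not only against its two direct children.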
Why Unfolding RAE? 
Problems with the standard RAE: 
• It gives equal weight to both children, even though each child may represent a different number of words (e.g. 1 word vs. 3 words) 
• It can lower $E_{rec}$ simply by making the hidden layer very small in magnitude 
➔ The unfolding RAE addresses these problems. 
RAE Training 
Training minimizes the sum of the reconstruction errors over all nodes of all trees. 
$E_{rec}(total)$ is a function of $x$ (the word embeddings) and $W_d$, $W_e$ (the neural network weights). 
➔ After training, we obtain word embeddings and phrase vectors 
Similarity Matrix 
After training, we compute the similarities (Euclidean distances) between all word and phrase vectors of the two sentences. 
These distances fill a similarity matrix $\mathcal{S}$. 
S[3,4] holds the distance between node 4 of sentence 1 ("mice") and node 3 of sentence 2 ("mice"). 
➔ Zero distance, because both nodes are the same word
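Filling the matrix amounts to computing all pairwise Euclidean distances between the node vectors of the two trees. A minimal sketch, assuming each sentence's node vectors are stacked as the rows of an array (the sizes and random vectors below are placeholders):

```python
import numpy as np

def similarity_matrix(nodes_a, nodes_b):
    """Pairwise Euclidean distances; entry [i, j] compares row i of A with row j of B."""
    diff = nodes_a[:, None, :] - nodes_b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

rng = np.random.default_rng(0)
n = 4                               # vector dimension (hypothetical)
nodes_a = rng.normal(size=(3, n))   # e.g. a 2-word sentence: 2*2 - 1 = 3 nodes
nodes_b = rng.normal(size=(5, n))   # e.g. a 3-word sentence: 2*3 - 1 = 5 nodes
nodes_b[1] = nodes_a[2]             # pretend both sentences share one word

S = similarity_matrix(nodes_a, nodes_b)
```

The shared vector produces an exact zero at `S[2, 1]`, and the shape of `S` depends on the lengths of both sentences.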
Why Dynamic Pooling? 
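Classifying from the average distance or from a histogram of the distances in $\mathcal{S}$ does not perform well. 
➔ Need to feed $\mathcal{S}$ itself into a classifier. 
Problem: 
The matrix dimensions vary with the sentence lengths. For sentences of $n$ and $m$ words, the binary trees have $2n-1$ and $2m-1$ nodes, so 
$\mathcal{S} \in \mathbb{R}^{(2n-1) \times (2m-1)}$ 
Solution: 
Map $\mathcal{S}$ into a matrix $\mathcal{S}_{pool}$ of fixed size 
$\mathcal{S}_{pool} \in \mathbb{R}^{n_p \times n_p}$ 
➔ Dynamic Pooling 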
Dynamic Pooling 
Example: 
$n_p = 3$ ($2n-1$ and $2m-1$ are divisible by $n_p$) 
$2n-1 = 3$, $2m-1 = 9$ 
1. Produce an $n_p \times n_p$ grid. 
Grid window size: $\frac{2n-1}{n_p} \times \frac{2m-1}{n_p} = 1 \times 3$ 
2. Define each element of $\mathcal{S}_{pool}$ to be the minimum value of the corresponding grid window. 
(A small value means that there are similar words or phrases in both sentences, so we take the minimum to keep this information.)
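For the divisible case, the grid and the per-window minimum can be written as a single reshape. The matrix below is a made-up 3 × 9 example filled with 0..26 so the result is easy to check by hand; it is not data from the paper.

```python
import numpy as np

# Made-up 3 x 9 "distance" matrix: row 0 is 0..8, row 1 is 9..17, row 2 is 18..26.
S = np.arange(27, dtype=float).reshape(3, 9)
n_p = 3

rows, cols = S.shape
h, w = rows // n_p, cols // n_p          # window size: 1 x 3

# Split rows into n_p blocks of height h and columns into n_p blocks of
# width w, then take the minimum inside every block.
S_pool = S.reshape(n_p, h, n_p, w).min(axis=(1, 3))
# S_pool == [[0, 3, 6], [9, 12, 15], [18, 21, 24]]
```

Each output entry is the smallest value of one 1 × 3 window, so the pooled matrix is always 3 × 3 regardless of how wide the input was.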
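Dynamic Pooling 
Example: 
$n_p = 2$ ($2n-1$ and $2m-1$ are NOT divisible by $n_p$) 
$2n-1 = 3$, $2m-1 = 9$ 
1. Produce an $n_p \times n_p$ grid. 
Grid window size: $\lfloor \frac{2n-1}{n_p} \rfloor \times \lfloor \frac{2m-1}{n_p} \rfloor = 1 \times 4$ 
2. Distribute the $M$ remaining rows/columns over the last $M$ windows, one each. Here one row and one column remain, so the last row window has 2 rows and the last column window has 5 columns. 
3. Take the minimum of each window.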
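The general case, including the leftover rows and columns, can be sketched as follows. The helper names `window_bounds` and `dynamic_min_pool` are hypothetical, and the sketch assumes the matrix has at least `n_p` rows and columns (smaller matrices would need separate handling, which is not covered here).

```python
import numpy as np

def window_bounds(length, n_p):
    """Split range(length) into n_p windows; the last `rem` windows get one extra item."""
    base, rem = divmod(length, n_p)
    sizes = [base] * (n_p - rem) + [base + 1] * rem
    bounds, start = [], 0
    for size in sizes:
        bounds.append((start, start + size))
        start += size
    return bounds

def dynamic_min_pool(S, n_p):
    """Map a matrix of any size (>= n_p x n_p) to a fixed n_p x n_p matrix of window minima."""
    rows, cols = S.shape
    if rows < n_p or cols < n_p:
        raise ValueError("this sketch assumes at least n_p rows and columns")
    pooled = np.empty((n_p, n_p))
    for i, (r0, r1) in enumerate(window_bounds(rows, n_p)):
        for j, (c0, c1) in enumerate(window_bounds(cols, n_p)):
            pooled[i, j] = S[r0:r1, c0:c1].min()
    return pooled

# Made-up 3 x 9 matrix filled with 0..26, pooled to 2 x 2.
S = np.arange(27, dtype=float).reshape(3, 9)
S_pool = dynamic_min_pool(S, n_p=2)
# Row windows: [0, 1) and [1, 3); column windows: [0, 4) and [4, 9).
# S_pool == [[0, 4], [9, 13]]
```

With `n_p = 3` the same function reproduces the evenly divisible case, since no remainder is left to distribute.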
Experiments 
1. Do the autoencoders capture phrase information? 
➔ The unfolding RAE is better. 
Experiments 
2. Does the unfolding RAE really decode the leaf nodes? 
➔ The unfolding RAE is better. 
It can reconstruct phrases up to length five very well. 
Experiments 
3. How well does the proposed method detect paraphrases? 
➔ The proposed method achieves state-of-the-art performance
Experiments 
4. Examples of classified data 
The End

Editor's Notes

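  1. What this paper sets out to do: detect paraphrases. For example, S1 "The judge also refused to postpone the trial date of Sept. 29." and S2 "Obus also denied a defense motion to postpone the September trial date." have the same meaning, i.e. they are paraphrases, and we want to detect that. If this can be done, it is useful for search, text summarization, evaluation of machine translation, and so on. As for my own research, being able to detect paraphrases of negative sentences might allow positive/negative classification with better accuracy; that is the connection to my research.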
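  2. To handle words on a computer, words are generally converted into vectors. One way to do this is the one-hot representation: first build a vocabulary, give each word an ID, and represent the word by a vector whose ID-th element is nonzero and whose other elements are all 0. Problems: it is very sparse; it is high-dimensional, because the dimension equals the number of words registered in the vocabulary; and it cannot handle unknown words that are not in the dictionary.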
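  3. A representation in which a vector carries abstract semantic and syntactic information. With this representation, words can be expressed in a low dimension. This paper performs paraphrase detection while learning this representation at the same time. In addition, words that are semantically or syntactically similar take nearby vector values. For example, "apple" and "orange" are close in the sense that both are fruits, so their vectors are fairly close, whereas "apple" and "soccer" get completely different vectors.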
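  4. A kind of neural network. The hidden layer has fewer units than the input layer. It is trained so that the output reproduces the input. This makes it possible to learn a low-dimensional representation (word embedding) that captures the features well.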
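  5. The input to the autoencoder is the word embedding x. Its initial value is a word embedding computed with a model called a neural language model. The word embedding x is n-dimensional. The 2n-dimensional concatenation of two word embeddings is projected down to n dimensions, and the 2n-dimensional input is then reconstructed from it. Viewing the autoencoder as a binary tree, with the lower part as the children and the upper part as the parent, this is the equation from children to parent, and this is the equation for reconstructing the children from the parent. Because the input itself is being reconstructed, i.e. the target vector is the input itself, the reconstruction error is the squared difference between the target vector and the reconstructed vector.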
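  6. What we looked at a moment ago is the region enclosed by the blue box. The input vectors and the parent vector are all n-dimensional, so the same operation can be repeated to build a binary tree over an input of any length. The reconstruction error of the binary tree for the whole sentence is the sum of the errors at each position. As for the order in which the binary tree is built, the paper says that a syntactic parser is used...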
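  7. The RAE explained earlier only reconstructs the two children it took as input. For example, in the left figure the parent y2, whose children are x1 and y1, tries to reconstruct x1 and y1. The unfolding RAE presented here reconstructs x1 and y1, and then goes on to reconstruct x2 and x3 from y1. In other words, instead of reconstructing phrase vectors, it tries to reconstruct all of the word embeddings.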
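  8. One problem with the RAE is that it treats the nodes with equal weight even though each node can carry information about a different number of words. For example, in the figure this node carries the information of three words and this one the information of one word, yet both are reconstructed with equal weight. Second, the reconstruction error is the squared error between vectors, and from the second layer upward there is a tendency to reduce the error by shrinking the magnitude of the vectors themselves. In other words, the hidden layer is made small. With unfolding, a node that carries more information has to reconstruct a larger subtree, so it necessarily receives more weight, and because reconstruction runs from the hidden layer down to the leaves, the problem of shrinking the hidden layer no longer occurs.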
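  9. One tree is built per training example, and the reconstruction errors at every node of those trees are added up. Training is done by minimizing this total reconstruction error. The reconstruction error is a function of the word embeddings x and the phrase vectors y, so by training with a gradient method we obtain word embeddings and phrase vectors that express semantic and syntactic information.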
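  10. After training, trees are built by the same procedure for the two sentences to be judged as paraphrases or not (left of the figure), the similarity between each pair of vectors in the two trees is measured by Euclidean distance, and the results are stored in the similarity matrix. The [3,2] element of the matrix is the Euclidean distance between "mice", node 4 of the left sentence, and "mice", node 3 of the right sentence.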
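  11. Classifying from the average distance or a histogram of the similarity matrix does not give good performance. So we want to feed the values of S into a separate classifier. However, the dimensions of the similarity matrix change with the input sentences, so it cannot be passed to a classifier as it is. The approach taken is therefore to map the similarity matrix to a fixed-size S_pool and feed S_pool into the classifier. The classifier can be a neural network or whatever conventional methods use, so here I explain how S_pool is constructed.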
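  12. Suppose the similarity matrix is 3 × 9 and we want to map it to 3 × 3, i.e. $n_p = 3$ ($2n-1$ and $2m-1$ are divisible by $n_p$). S is divided into a 3 × 3 grid. The size of each grid window is given by this formula. The minimum value of each window is stored at the corresponding position of S_pool. A small value in S means a small Euclidean distance, i.e. that words or phrases with similar meaning exist, so the minimum is taken in order to preserve this information.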
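  13. Suppose the similarity matrix is 3 × 9 and we want to map it to 2 × 2. S is divided into a 2 × 2 grid. The size of each grid window is given by this formula, but this leaves a remainder. The remainder here is 1 for both rows and columns, so only the last window has its row and column size increased by 1. Note: the upper-left part of the matrix holds similarities between words rather than phrases, so this scheme puts weight on preserving the word-to-word similarity information. This is because word overlap is assumed to be important for paraphrase detection.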
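  14. Experimental results. First: do the autoencoders really learn phrase vectors properly? Table 1: for each phrase in the left column, extracted from one sentence, the phrase with the nearest vector was extracted from other sentences. The result was that the unfolding RAE works well.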
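  15. The second result. The unfolding RAE is supposed to reconstruct the leaf nodes, i.e. the word vectors, but does it really manage to reconstruct them? Table 2: for phrases of five words or fewer it does quite well, and for three words or fewer it apparently reconstructs them perfectly.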
  16. Finally, how effective is the proposed method for paraphrase detection? The proposed method improved on the existing methods.