Beyond Word2Vec: Embedding Words and Phrases in Same Vector Space
Vijay Prakash Dwivedi (CSED, MNNIT Allahabad; mail@vijaydwivedi.com.np)
Manish Shrivastava (LTRC, IIIT Hyderabad; m.shrivastava@iiit.ac.in)

Abstract
Word embeddings are used in several linguistic problems and NLP applications, and their quality has improved considerably over the past few years. However, embedding phrases in the same vector space as words, with semantics kept intact, has remained challenging. We present a novel methodology that uses Siamese deep neural networks to embed multi-word units and fine-tune current state-of-the-art word embeddings, keeping both in the same vector space.
Introduction
Current compositional models build phrase embeddings largely through hand-crafted composition functions, with little emphasis on learning the composition itself with deep learning.
Our major contribution in this work is employing deep neural architectures to transform the constituent word embeddings of a multi-word unit into its vector representation.
Contribution
We develop a model with the objective of training a system to predict the similarity between a word and a phrase, leveraging a similarity dataset.
The model captures both the sequence of words in a phrase and their interactions.
It learns to generate phrase representations according to their closest single-word meaning.
The greater the semantic similarity between the word and the phrase, the closer the similarity metric output is to 0 (and to 1 otherwise).
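This 0/1 convention corresponds to a contrastive-style objective. As a sketch (the poster does not spell out the exact loss), with d(w, p) the learned distance between word w and phrase p, y the 0/1 label, and m a margin:

L(w, p, y) = (1 − y) · d(w, p)² + y · max(0, m − d(w, p))²

so similar pairs (y = 0) are pulled together and dissimilar pairs (y = 1) are pushed at least a margin m apart.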
Approach
Input Word   Input Phrase     Output
remorse      deep regret      0
athletes     bring up         1
suez         the suez canal   0
payment      earth orbit      1

Table 1. Input-output samples. The output is the similarity metric, i.e. 0 for similar and 1 for dissimilar.
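A minimal sketch of how such labeled pairs might be assembled from a paraphrase resource; negative sampling by random mismatching is our assumption for illustration, not a detail stated on the poster:

```python
import random

def make_pairs(word_phrase_pairs):
    """Build (word, phrase, label) triples; label 0 = similar, 1 = dissimilar."""
    words = [w for w, _ in word_phrase_pairs]
    data = []
    for w, p in word_phrase_pairs:
        data.append((w, p, 0))          # true paraphrase pair -> label 0
        neg = random.choice(words)
        while neg == w:                 # avoid an accidental true match
            neg = random.choice(words)
        data.append((neg, p, 1))        # random mismatch -> label 1
    return data

pairs = make_pairs([("remorse", "deep regret"), ("suez", "the suez canal")])
```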
Model Architecture: Siamese LSTM
Fig 1. Model architecture; the first input Xa is a single-word unit and the second input Xb is a multi-word unit
We use three layers of stacked LSTMs in the Siamese network. The number of hidden layers and neurons was selected through repeated experiments; this configuration gave the best results within our computational constraints.
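A minimal PyTorch sketch of the architecture in Fig 1, assuming 200-dim GloVe inputs; the hidden size, the distance function, and the use of the contrastive objective sketched earlier are our assumptions:

```python
import torch
import torch.nn as nn

class SiameseLSTM(nn.Module):
    def __init__(self, embed_dim=200, hidden_dim=128, num_layers=3):
        super().__init__()
        # A single encoder shared by both branches (Siamese weight tying).
        self.encoder = nn.LSTM(embed_dim, hidden_dim,
                               num_layers=num_layers, batch_first=True)

    def encode(self, x):
        # x: (batch, seq_len, embed_dim) sequence of word vectors.
        _, (h, _) = self.encoder(x)
        return h[-1]                      # top layer's final hidden state

    def forward(self, word, phrase):
        # word: (batch, 1, embed_dim); phrase: (batch, n, embed_dim)
        return torch.norm(self.encode(word) - self.encode(phrase), dim=1)

def contrastive_loss(dist, y, margin=1.0):
    # Implements the objective sketched above: y = 0 pulls pairs together,
    # y = 1 pushes them at least `margin` apart.
    return torch.mean((1 - y) * dist.pow(2) +
                      y * torch.clamp(margin - dist, min=0).pow(2))
```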
Data set and Baseline Word Vectors
We use the PPDB dataset (Ganitkevitch et al., 2013) of size XL.

N-gram   Count     Percentage
2-gram   298,536   79.40 %
3-gram   60,657    16.13 %
4-gram   13,132    3.49 %
5-gram   2,993     0.80 %
6-gram   682       0.18 %
Total    376,000   100.00 %

Table 2. Composition of the PPDB dataset of size XL, showing that only 0.18% of the phrases are longer than 5-grams; the corresponding figures are 0.15% and 0.12% for the XXL and XXXL sizes respectively.

As base word embeddings for the input layers, we use GloVe word vectors (Pennington et al., 2014) of dimension 200.
GloVe: https://nlp.stanford.edu/projects/glove/
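A minimal sketch of turning a phrase into the model's input sequence of 200-dim GloVe vectors; the file name and whitespace tokenization are assumptions for illustration:

```python
import numpy as np

def load_glove(path="glove.6B.200d.txt"):   # hypothetical local file
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            word, *vals = line.rstrip().split(" ")
            vectors[word] = np.asarray(vals, dtype=np.float32)
    return vectors

glove = load_glove()
phrase = "deep regret"
# A phrase enters the network as a (seq_len, 200) stack of word vectors.
seq = np.stack([glove[w] for w in phrase.split()])
```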
Experiments and Results
1. Word-Phrase Similarity Task
Fig 2. Accuracy of similarity over dataset size

2. SemEval2013 5(a) Task

Model        SemEval2013 Test
RAE          51.75
FCT-LM       67.22
FCT-Joint    70.64
ReNN         72.22
Our System   72.14

Table 3. Performance comparison; RAE: Recursive Autoencoder; FCT: Feature-rich Compositional Transformation models; ReNN: Recursive Neural Networks.
3. Nearest Words and Phrases

Word       | Nearest Words                             | Nearest Phrases
viewpoints | perspectives, opinions, viewpoint, views  | differing views, the viewpoints, different opinions, different perspectives
upbeat     | optimistic, cautious, outlook, gloomy     | optimistic about, overly optimistic, more cautious, cautious
1600       | 1400, 1700, 1300, 1500                    | 1600 hours, 1300 hours, 1700 hours, 1100 hours
asem       | apec, asean, g20, summit                  | the asem process, apec leaders, apec economies, summit meetings

Phrase                  | Nearest Phrases
are crucial             | are important, are needed, are essential, are necessary
2005 to 2006            | 2005 to 2007, 2005 to 2008, 2003 to 2006, 2004 to 2005
president hosni mubarak | president mubarak, egyptian president hosni mubarak, hosni mubarak, the egyptian president hosni mubarak
the violence            | the acts of violence, the violent, the prevalence of violence, the cycle of violence
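These neighbors can be read off the shared vector space by similarity search; a sketch, assuming cosine similarity as the retrieval metric (the poster does not state which metric was used):

```python
import numpy as np

def nearest(query_vec, matrix, keys, k=4):
    """Return the k keys whose embeddings are most cosine-similar to the query.
    matrix: (V, d) array of word/phrase embeddings; keys: their labels."""
    m = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    q = query_vec / np.linalg.norm(query_vec)
    top = np.argsort(-(m @ q))[:k]
    return [keys[i] for i in top]
```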
References
1. Juri Ganitkevitch, Benjamin Van Durme, and Chris Callison-Burch. 2013. PPDB: The Paraphrase Database. NAACL-HLT, 758-764.
2. Richard Socher, Jeffrey Pennington, Eric H. Huang, Andrew Y. Ng, and Christopher D. Manning. 2011. Semi-supervised Recursive Autoencoders for Predicting Sentiment Distributions. EMNLP, 151-161.
3. Richard Socher, John Bauer, Christopher D. Manning, and Andrew Ng. 2013. Parsing with Compositional Vector Grammars. ACL, 455-465.
4. Mo Yu and Mark Dredze. 2015. Learning Composition Models for Phrase Embeddings. TACL, 227-242.
5. Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global Vectors for Word Representation. EMNLP, 1532-1543.