Vector space word representations
Rani Nelken, PhD 
Director of Research, Outbrain 
@RaniNelken
https://www.flickr.com/photos/hyku/295930906/in/photolist-EbXgJ-ajDBs8-9hevWb-s9HX1-5hZqnb-a1Jk8H-a1Mcx7-7QiUWL-6AFs53-9TRtkz-bqt2GQ-F574u-F56EA-3imqK7/
Words = atoms?
That would be crazy for numbers 
https://www.flickr.com/photos/proimos/4199675334/
The distributional hypothesis 
What is a word? 
Wittgenstein (1953): The meaning of a word is its use in the language
Firth (1957): You shall know a word by the company it keeps
From atomic symbols to vectors
• Map words to dense numerical vectors “representing” their contexts
• Map words with similar contexts to vectors separated by a small angle
History
• Hard clustering: Brown clustering
• Soft clustering: LSA, random projections, LDA
• Neural nets
Feedforward Neural Net Language Model
Training
• Input is the one-hot vectors of the context words (0…0,1,0…0)
• We’re trying to learn a vector for each word (the “projection”)
• Such that the output is close to the one-hot vector of the target word w(t)
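A minimal numpy sketch of this setup (toy sizes and randomly initialized weights, purely illustrative; this shows the shape of the computation, not the exact architecture from the talk):

```python
import numpy as np

# Toy sizes, purely illustrative
vocab_size, dim, n_context = 10, 5, 2

rng = np.random.default_rng(0)
P = rng.normal(scale=0.1, size=(vocab_size, dim))              # one "projection" vector per word
W = rng.normal(scale=0.1, size=(dim * n_context, vocab_size))  # output layer

def forward(context_ids):
    # Multiplying a one-hot vector (0...0,1,0...0) by P just selects a row,
    # so the projection step is an embedding lookup.
    h = np.concatenate([P[i] for i in context_ids])
    scores = h @ W
    e = np.exp(scores - scores.max())
    return e / e.sum()  # softmax: training pushes its mass toward the index of w(t)

print(forward([3, 7]).argmax())  # predicted id for the center word w(t)
```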
Simpler model: Word2Vec
What can we do with these representations?
• Plug them into your existing classifier (see the sketch below)
• Plug them into further neural nets – better!
• Improves accuracy on many NLP tasks
  – Named entity recognition
  – POS tagging
  – Sentiment analysis
  – Semantic role labeling
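One simple way to “plug them in”: average the word vectors of a text into a fixed-length feature vector and hand it to an off-the-shelf classifier. A sketch with made-up 2-d vectors standing in for real embeddings:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up 2-d "embeddings"; in practice, load trained word2vec vectors
vecs = {"great": np.array([0.9, 0.1]), "awful": np.array([-0.8, 0.2]),
        "movie": np.array([0.1, 0.7]), "plot": np.array([0.0, 0.6])}

def featurize(text):
    # Average the word vectors: a crude but fixed-length document feature
    words = [w for w in text.lower().split() if w in vecs]
    return np.mean([vecs[w] for w in words], axis=0)

X = np.stack([featurize(t) for t in ["great movie", "awful plot"]])
y = [1, 0]  # toy sentiment labels
clf = LogisticRegression().fit(X, y)
print(clf.predict([featurize("great plot")]))
```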
Back to cheese… 
• cos(crumbled, cheese) = 0.042 
• cos(crumpled, cheese) = 0.203
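For reference, the cosine similarity behind these numbers is just the normalized dot product; `model` here is a hypothetical set of trained embeddings:

```python
import numpy as np

def cosine(u, v):
    # Cosine similarity: dot product of the two vectors, normalized
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# With trained embeddings (see the gensim examples below), e.g.:
# cosine(model["crumbled"], model["cheese"])
# or equivalently: model.similarity("crumbled", "cheese")
```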
And now for the magic 
http://en.wikipedia.org/wiki/Penn_%26_Teller#mediaviewer/File:Penn_and_Teller_(1988).jpg
“Magical” property
• [Paris] - [France] + [Italy] ≈ [Rome]
• [king] - [man] + [woman] ≈ [queen]
• We can use it to solve word analogy problems: Boston : Red_Sox = New_York : ?
Demo
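In gensim this arithmetic is one call (assuming the pre-trained Google News vectors distributed with word2vec; it is a multi-gigabyte download, and results depend on which embeddings you load):

```python
from gensim.models import KeyedVectors

# Pre-trained embeddings distributed with word2vec
model = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)

# [king] - [man] + [woman] ≈ ?
print(model.most_similar(positive=["king", "woman"], negative=["man"], topn=1))

# Boston : Red_Sox = New_York : ?
print(model.most_similar(positive=["Red_Sox", "New_York"],
                         negative=["Boston"], topn=1))
```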
Why does it work?
[king] - [man] + [woman] ≈ [queen]
If all vectors are unit-normalized, cosine is just a dot product, and the dot product distributes over the sum:
cos(x, [king] – [man] + [woman]) ∝ cos(x, [king]) – cos(x, [man]) + cos(x, [woman])
[queen] is a good candidate
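Spelled out (the norm of the combined vector is a constant in x, so dropping it doesn’t change which word wins):

```latex
\operatorname*{arg\,max}_{x}\; \cos\!\bigl(x,\ \mathit{king} - \mathit{man} + \mathit{woman}\bigr)
  = \operatorname*{arg\,max}_{x}\; x \cdot \bigl(\mathit{king} - \mathit{man} + \mathit{woman}\bigr)
  = \operatorname*{arg\,max}_{x}\; \bigl[\cos(x,\mathit{king}) - \cos(x,\mathit{man}) + \cos(x,\mathit{woman})\bigr]
```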
It doesn’t always work
• London : England = Baghdad : ?
• We expect Iraq, but get Mosul
• We’re looking for a word that is close to Baghdad and to England, but not to London
Why did it fail? 
• London : England = Baghdad : ? 
• cos(Mosul, Baghdad) >> cos(Iraq, London) 
• Instead of adding the cosines, multiply them 
• Improves accuracy
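This multiplicative objective (3CosMul, from the Levy & Goldberg paper in the references) is available in gensim as most_similar_cosmul; a sketch reusing the model loaded above:

```python
# Additive scoring: the large cos(x, Baghdad) term dominates the sum
print(model.most_similar(positive=["England", "Baghdad"],
                         negative=["London"], topn=1))

# Multiplicative scoring: balances the three terms, favoring Iraq
print(model.most_similar_cosmul(positive=["England", "Baghdad"],
                                negative=["London"], topn=1))
```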
Word2Vec
• Open-source C implementation from Google
• Comes with pre-trained embeddings
• Gensim: fast Python implementation
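Training your own embeddings with gensim takes a few lines (a toy sketch; in recent gensim releases the dimensionality parameter is vector_size, formerly size):

```python
from gensim.models import Word2Vec

# Toy corpus; in practice, millions of tokenized sentences
sentences = [["the", "cheese", "crumbled", "nicely"],
             ["the", "paper", "crumpled", "loudly"]]

model = Word2Vec(sentences, vector_size=100, window=5, min_count=1)
print(model.wv.similarity("cheese", "crumbled"))
```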
Active field of research 
• Bilingual embeddings 
• Joint word and image embeddings 
• Embeddings for sentiment 
• Phrase and document embeddings
Bigger picture: how can we make NLP less fragile?
• ’90s: Linguistic engineering
• ’00s: Feature engineering
• ’10s: Unsupervised preprocessing
References
• https://code.google.com/p/word2vec/
• http://www.cs.bgu.ac.il/~yoavg/publications/conll2014analogies.pdf
• http://radimrehurek.com/2014/02/word2vec-tutorial/
Thanks 
@RaniNelken 
We’re hiring for NLP positions
