WORD2VEC
M.Javad Hasani
Outline
• Goal
• History
• Word Embedding
• Introduction to Word2Vec
• CBOW
• Skip-Gram
• Parameters
• Implementations
• Other use cases
When? Who?
• Word2vec was created in 2013 by a team of researchers led by Tomas Mikolov at Google.
• Embedding vectors created with the Word2vec algorithm have many advantages compared to earlier algorithms such as latent semantic analysis.
Goal: Reconstruct linguistic contexts of words
• CBOW: context words → Word2Vec → target word
• Skip-gram: target word → Word2Vec → context words
Tasks
(WATER − WET) + FIRE = FLAMES
(PARIS − FRANCE) + ITALY = ROME
(WINTER − COLD) + SUMMER = WARM
(KING − MAN) + WOMAN = QUEEN
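The analogies above can be sketched as vector arithmetic followed by a nearest-neighbour search. The toy 2-D embeddings below are hand-picked assumptions for illustration; real word2vec vectors are learned from a corpus:

```python
import numpy as np

# Toy 2-D embeddings, chosen by hand so the analogy works
# (these vectors are assumptions, not learned values).
emb = {
    "king":  np.array([0.9, 0.8]),
    "man":   np.array([0.5, 0.9]),
    "woman": np.array([0.5, 0.1]),
    "queen": np.array([0.9, 0.0]),
}

def analogy(a, b, c, emb):
    """Return the word closest to (a - b) + c by cosine similarity."""
    target = emb[a] - emb[b] + emb[c]
    best, best_sim = None, -2.0
    for word, vec in emb.items():
        if word in (a, b, c):
            continue  # exclude the query words themselves
        sim = vec @ target / (np.linalg.norm(vec) * np.linalg.norm(target) + 1e-12)
        if sim > best_sim:
            best, best_sim = word, sim
    return best

print(analogy("king", "man", "woman", emb))  # queen
```

With trained embeddings the same arithmetic recovers the other analogies on this slide.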
Why vector space?
Words with similar distributions have similar meanings (the distributional hypothesis).
Vector space: word embeddings
Word embedding
A technique for turning words into numbers that can be used by many machine learning algorithms.
One-hot vector
A simple word representation:
• Vector length is equal to the dictionary size
• Each vector has exactly one non-zero element
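A minimal sketch of one-hot encoding, assuming a hypothetical five-word dictionary:

```python
import numpy as np

# Hypothetical 5-word dictionary for illustration.
vocab = ["fire", "water", "king", "queen", "rome"]
word_to_index = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    """Vector length equals the dictionary size; exactly one element is 1."""
    v = np.zeros(len(vocab))
    v[word_to_index[word]] = 1.0
    return v

print(one_hot("king"))  # [0. 0. 1. 0. 0.]
```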
Types of Word Embeddings
• Frequency based
  – Count Vector
  – TF-IDF Vector
  – Co-Occurrence Vector
• Prediction based
  – CBOW
  – Skip-Gram
What is word2vec?
• Word2vec is a combination of two techniques:
  – CBOW (Continuous Bag of Words)
  – Skip-gram
• Both of these map word(s) to word(s).
• Both learn weights which act as word vector representations.
How it works?
1. Both the input word wi and the output word wj are one-hot encoded into binary vectors x and y of size V.
2. First, multiplying the binary vector x by the word embedding matrix W of size V×N gives us the embedding vector of the input word wi: the i-th row of the matrix W.
3. Multiplying the hidden layer by the word context matrix W′ of size N×V produces a score vector of size V; a softmax over these scores gives the predicted output y.
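The three steps above can be sketched in numpy. The sizes V = 5 and N = 3 and the random matrices are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
V, N = 5, 3                        # vocabulary size and embedding dim (assumptions)
W  = rng.standard_normal((V, N))   # word embedding matrix, V x N
Wp = rng.standard_normal((N, V))   # word context matrix W', N x V

i = 2                              # index of the input word w_i
x = np.zeros(V); x[i] = 1.0        # step 1: one-hot encode the input

h = x @ W                          # step 2: equals the i-th row of W
assert np.allclose(h, W[i])

scores = h @ Wp                    # step 3: scores over the vocabulary
y = np.exp(scores) / np.exp(scores).sum()  # softmax -> probabilities
print(y.sum())                     # approximately 1
```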
Embedding matrix
x × W = v: the one-hot vector x selects one row of W, which is the embedding vector v.
Training Samples
Generated by a sliding window over the text.
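A sketch of generating skip-gram style (target, context) training pairs with a sliding window; the sentence and window size are illustrative assumptions:

```python
def training_pairs(tokens, window=2):
    """Skip-gram style (target, context) pairs from a sliding window."""
    pairs = []
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:                       # skip the target itself
                pairs.append((target, tokens[j]))
    return pairs

sent = "the quick brown fox".split()
print(training_pairs(sent, window=1))
# [('the', 'quick'), ('quick', 'the'), ('quick', 'brown'),
#  ('brown', 'quick'), ('brown', 'fox'), ('fox', 'brown')]
```

For CBOW the same pairs are grouped the other way: all context words in the window predict the single target.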
CBOW (Continuous Bag of Words) vs. Skip-gram
CBOW captures syntactic relations better; skip-gram captures semantic relations better.
Loss Functions
• Full Softmax
• Hierarchical Softmax
• Cross Entropy
• Noise Contrastive Estimation (NCE)
• Negative Sampling (NEG)
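As a sketch of the last item, negative sampling replaces the full softmax with a few binary classifications: maximize σ(v_pos·h) for the true context vector and σ(−v_k·h) for k sampled noise vectors. The vectors below are random assumptions for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neg_loss(h, v_pos, v_negs):
    """Negative-sampling loss for one (target, context) pair:
    -log sigma(v_pos . h) - sum_k log sigma(-v_k . h)."""
    loss = -np.log(sigmoid(v_pos @ h))
    for v_k in v_negs:
        loss -= np.log(sigmoid(-(v_k @ h)))
    return loss

rng = np.random.default_rng(1)
h = rng.standard_normal(3)            # hidden (embedding) vector
v_pos = rng.standard_normal(3)        # true context vector
v_negs = rng.standard_normal((5, 3))  # 5 sampled noise vectors
print(neg_loss(h, v_pos, v_negs))     # a positive scalar
```

Only the positive word and the k noise words get gradient updates, instead of the whole vocabulary.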
Softmax: Full vs. Hierarchical
Parametrization
• Sub-sampling
  – High-frequency words often provide little information.
• Dimensionality
  – Quality of word embeddings increases with higher dimensionality.
  – But after reaching some point, the marginal gain diminishes.
  – Typically, the dimensionality of the vectors is set between 100 and 1,000.
• Context window
  – The recommended value is 10 for skip-gram and 5 for CBOW.
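The sub-sampling bullet can be made concrete with Mikolov et al.'s discard formula, P(w) = 1 − sqrt(t / f(w)), where f(w) is the word's relative frequency and t is a threshold (commonly around 1e-5):

```python
import math

def discard_prob(freq, t=1e-5):
    """Probability of discarding a word with relative frequency `freq`
    (sub-sampling formula: P = 1 - sqrt(t / f), clipped at 0)."""
    return max(0.0, 1.0 - math.sqrt(t / freq))

print(discard_prob(0.01))   # a very frequent word: discarded ~96.8% of the time
print(discard_prob(1e-6))   # a rare word: never discarded (0.0)
```

Frequent words like "the" are aggressively dropped, while rare words are always kept.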
Result
https://ronxin.github.io/wevi/
Variant model classes
• Doc2vec – maps documents to vector space.
• Tweet2vec – handles noisy text and informal language structure (e.g., tweets).
• Item2vec – dealing with item and user similarity is at the heart of many recommendation algorithms.
• Lda2vec – tries to marry the best of both worlds: word2vec and LDA.
Implementations
Thanks