How Does word2vec Work? 
Andrew Koo - Insight Data Science
word2vec (Google, 2013) 
• Use documents to train a neural network model that maximizes the conditional probability of the context given the word 
• Apply the trained model to each word to get its corresponding vector 
• Calculate the vector of each sentence by averaging the vectors of its words 
• Construct the similarity matrix between sentences 
• Use PageRank to score the sentences in the graph
1. Use documents to train a neural network model maximizing the conditional probability of context given the word 

The goal is to optimize the parameters (Θ) that maximize the conditional probability of the context (c) given the word (w), where D is the set of all (w, c) pairs.
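Written out, this is the standard skip-gram objective with a softmax over contexts (a conventional formulation added here for clarity, using the slide's Θ, w, c, D notation; v_w and v_c denote the word and context vectors, and C the set of all contexts):

\[
\arg\max_{\Theta} \prod_{(w,c)\in D} p(c \mid w;\, \Theta),
\qquad
p(c \mid w;\, \Theta) = \frac{e^{v_c \cdot v_w}}{\sum_{c' \in C} e^{v_{c'} \cdot v_w}}
\]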
For example: the context “I ate a ???? at McDonald's last night” is more likely given the word “Big Mac”.
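As a sketch, the set D of (word, context) training pairs can be built with a symmetric sliding window over each tokenized document (the window size and the whitespace tokenization here are illustrative assumptions, not part of the slide):

```python
def context_pairs(tokens, window=2):
    """Build the set D of (word, context) training pairs
    using a symmetric context window around each word."""
    pairs = []
    for i, w in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:  # the word itself is not its own context
                pairs.append((w, tokens[j]))
    return pairs

tokens = "I ate a Big_Mac at McDonalds last night".split()
D = context_pairs(tokens, window=2)
```

Each pair such as ("Big_Mac", "at") is one training example: the model is pushed to assign high probability to "at" appearing near "Big_Mac".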
2. Apply the model to each word to get its corresponding vector 

word       vector 
The        (0.12, 0.23, 0.56) 
Cardinals  (0.24, 0.65, 0.72) 
will       (0.38, 0.42, 0.12) 
win        (0.57, 0.01, 0.02) 
the        (0.53, 0.68, 0.91) 
world      (0.11, 0.27, 0.45) 
series     (0.01, 0.05, 0.62)
3. Calculate the vector of sentences by averaging the vectors of their words 

word       vector 
The        (0.12, 0.23, 0.56) 
Cardinals  (0.24, 0.65, 0.72) 
will       (0.38, 0.42, 0.12) 
win        (0.57, 0.01, 0.02) 
the        (0.53, 0.68, 0.91) 
world      (0.11, 0.27, 0.45) 
series     (0.01, 0.05, 0.62) 

sentence vector: (0.28, 0.33, 0.49)
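This averaging step can be reproduced directly with NumPy, using the word vectors from the slide:

```python
import numpy as np

# Word vectors for "The Cardinals will win the world series" (from the slide)
word_vectors = np.array([
    [0.12, 0.23, 0.56],  # The
    [0.24, 0.65, 0.72],  # Cardinals
    [0.38, 0.42, 0.12],  # will
    [0.57, 0.01, 0.02],  # win
    [0.53, 0.68, 0.91],  # the
    [0.11, 0.27, 0.45],  # world
    [0.01, 0.05, 0.62],  # series
])

# Sentence vector = component-wise mean of the sentence's word vectors
sentence_vector = word_vectors.mean(axis=0)
print(np.round(sentence_vector, 2))  # -> [0.28 0.33 0.49]
```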
4. Construct the similarity matrix between sentences 

similarity matrix = matrix * matrix.T (rows of matrix are the sentence vectors S’1–S’5): 

       S’1    S’2    S’3    S’4    S’5 
S’1    1      0.366  0.243  0.564  0.720 
S’2    0.366  1      0.623  0.132  0.189 
S’3    0.243  0.623  1      0.014  0.523 
S’4    0.564  0.132  0.014  1      0.002 
S’5    0.720  0.189  0.523  0.002  1
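A minimal sketch of this step (the sentence vectors below are illustrative placeholders; the rows are L2-normalized first so that matrix @ matrix.T yields cosine similarities, which gives the 1s on the diagonal seen in the slide):

```python
import numpy as np

# Illustrative sentence vectors, one row per sentence S'1..S'5
rng = np.random.default_rng(0)
sentence_vectors = rng.random((5, 3))

# L2-normalize each row so the dot products below are cosine similarities
norms = np.linalg.norm(sentence_vectors, axis=1, keepdims=True)
matrix = sentence_vectors / norms

# similarity matrix = matrix * matrix.T
similarity = matrix @ matrix.T
```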
5. Use PageRank to score the sentences in the graph 
• Rank the sentences under the assumption that “summary sentences” are similar to most other sentences
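The scoring step can be sketched as PageRank power iteration over the similarity matrix from the slide (the damping factor 0.85 is the conventional PageRank default, assumed here rather than taken from the slide):

```python
import numpy as np

# Similarity matrix between sentences S'1..S'5 (from the slide)
S = np.array([
    [1.000, 0.366, 0.243, 0.564, 0.720],
    [0.366, 1.000, 0.623, 0.132, 0.189],
    [0.243, 0.623, 1.000, 0.014, 0.523],
    [0.564, 0.132, 0.014, 1.000, 0.002],
    [0.720, 0.189, 0.523, 0.002, 1.000],
])

def pagerank(sim, damping=0.85, iters=100):
    """Score the nodes of a weighted similarity graph by power iteration."""
    n = sim.shape[0]
    # Column-stochastic transition matrix: each column sums to 1
    M = sim / sim.sum(axis=0, keepdims=True)
    r = np.full(n, 1.0 / n)  # start from a uniform distribution
    for _ in range(iters):
        r = (1 - damping) / n + damping * (M @ r)
    return r

scores = pagerank(S)
# Highest-scoring sentences are the summary candidates
ranking = np.argsort(-scores)
```

Sentences that are similar to many others accumulate score from their neighbors, which is exactly the "summary sentences are similar to most other sentences" assumption above.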