25. Idea: instead of capturing co-occurrence counts, predict the surrounding words.
26. Two models:
CBOW: predicting the word given its context
Skip-gram: predicting the context given a word
Explained in great detail here, so we'll skip it for now. Also see: "word2vec Parameter Learning Explained" (Rong, 2014).
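A minimal sketch of training both variants, assuming the gensim library (4.x API) and a toy corpus made up for illustration; the slides themselves don't prescribe a particular implementation:

```python
# Minimal sketch of both word2vec variants using gensim 4.x
# (assumed dependency; the corpus here is a toy example).
from gensim.models import Word2Vec

sentences = [
    ["the", "quick", "brown", "fox", "jumps"],
    ["the", "lazy", "dog", "sleeps"],
]

# CBOW (sg=0): predict the center word from its averaged context.
cbow = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=0)

# Skip-gram (sg=1): predict each context word from the center word.
skipgram = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=1)

print(cbow.wv["fox"][:5])      # first 5 dimensions of the CBOW vector
print(skipgram.wv["fox"][:5])  # same word under skip-gram
```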
28. CBOW: several times faster to train than skip-gram, with slightly better accuracy for frequent words.
Skip-gram: works well with small amounts of training data and represents rare words or phrases well.
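One way to see this tradeoff: CBOW produces one training example per center word (its context is averaged into a single input), while skip-gram produces one example per (center, context) pair, so each rare word participates in many more updates. A plain-Python sketch of the example generation, illustrative only and not tied to any particular library:

```python
# Illustrative sketch of how the two models slice a token window
# into training examples (not from the slides).
def cbow_examples(tokens, window=2):
    """One example per position: (context words, center word)."""
    for i, center in enumerate(tokens):
        context = [tokens[j]
                   for j in range(max(0, i - window),
                                  min(len(tokens), i + window + 1))
                   if j != i]
        if context:
            yield context, center  # context is averaged into one input

def skipgram_examples(tokens, window=2):
    """One example per (center, context) pair: more updates per word."""
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                yield center, tokens[j]

tokens = ["the", "quick", "brown", "fox"]
print(list(cbow_examples(tokens)))      # 4 examples, one per center word
print(list(skipgram_examples(tokens)))  # 10 (center, context) pairs
```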