Word2vec

word2vec basic concept!

Published in: Data & Analytics
1. Word2Vec
JUNLIN WU (Jim)
2. Understanding Word2vec
• A new representation: each word becomes a vector.
• The vectors carry semantic meaning.
• Ex: vector(King) – vector(Man) + vector(Woman) ≈ vector(Queen)
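The analogy above can be made concrete with vector arithmetic. The 2-D vectors below are invented purely to make the arithmetic visible; they are not real word2vec output.

```python
import numpy as np

# Toy vectors: the second coordinate loosely encodes "royalty",
# the first loosely encodes "male". Values are illustrative only.
vecs = {
    "king":  np.array([0.9, 0.8]),
    "man":   np.array([0.9, 0.1]),
    "woman": np.array([0.1, 0.1]),
    "queen": np.array([0.1, 0.8]),
}
result = vecs["king"] - vecs["man"] + vecs["woman"]

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# The word whose vector is nearest to the result is "queen"
nearest = max(vecs, key=lambda w: cosine(vecs[w], result))
```

With real word2vec embeddings the match is approximate rather than exact, which is why the slide writes "≈".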
3. Original word representation
• Bag of words (one-hot encoding)
• Ex: "I am eating pizza now" → I = 00001, am = 00010, eating = 00100, pizza = 01000, now = 10000
• Why one-hot encoding? It lets the machine treat words as "nominal" instead of "ordinal".
• Ex: if we set "I" = 1 and "am" = 2, the machine will think "am" = "I" * 2, but it is not!
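One-hot encoding can be sketched in a few lines. The index order of the vocabulary here is an illustrative choice, not the exact bit layout on the slide:

```python
import numpy as np

# Build a word -> index vocabulary from the example sentence
sentence = "I am eating pizza now".split()
vocab = {word: idx for idx, word in enumerate(sentence)}

def one_hot(word, vocab):
    """Return a vector of zeros with a single 1 at the word's index."""
    vec = np.zeros(len(vocab), dtype=int)
    vec[vocab[word]] = 1
    return vec

print(one_hot("eating", vocab))  # [0 0 1 0 0]
```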
4. Main idea of Word2Vec
• Two different algorithms:
  1. Continuous Bag-of-Words (CBOW) model
  2. Continuous Skip-gram model
• Either one is used to train a neural network on a fake task.
• Why a fake task? Because we do not actually care about that network's predictions. Instead, the goal is to learn the weights of the hidden layer (the word vectors), W and W'.
(Figure: example weight matrices W and W'.)
5. Word2Vec: Continuous Bag-of-Words model
• Predict the target word from the context words.
• Ex: given the sentence "I am eating pizza now" and window size 1, the window slides along the sentence:
  - target "am", context "I" and "eating"
  - target "eating", context "am" and "pizza"
  - and so on; here we only show one snapshot as an example!
6. Word2Vec: Continuous Bag-of-Words model
• Predict the target word from the context words.
• The middle word is the label; the context words are the features.
• Ex: with "eating" as the target in "I am eating pizza now", the context words "I, am, pizza, now" are the features (a window of size 2 covers all four surrounding words) and "eating" is the label.
• Before the next slide, we one-hot encode each word: I = 00001, am = 00010, eating = 00100, pizza = 01000, now = 10000.
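The (features, label) pairs described above can be generated by sliding a window over the tokens. This is a sketch; the function name and `window` parameter are illustrative:

```python
# Build CBOW training pairs: (context words, target word)
def cbow_pairs(tokens, window=1):
    pairs = []
    for i, target in enumerate(tokens):
        # Collect up to `window` words on each side of the target
        context = [tokens[j]
                   for j in range(max(0, i - window),
                                  min(len(tokens), i + window + 1))
                   if j != i]
        pairs.append((context, target))
    return pairs

tokens = "I am eating pizza now".split()
pairs = cbow_pairs(tokens)
# e.g. (["am", "pizza"], "eating") is one (features, label) pair
```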
7. Word2Vec: Continuous Bag-of-Words model
• We train a one-hidden-layer neural network with hidden-layer width 3.
• Input layer: a 1x5 one-hot vector (I = 00001, am = 00010, eating = 00100, pizza = 01000, now = 10000).
• W (5x3) maps the input to the hidden layer; W' (3x5) maps the hidden layer to a softmax layer, e.g. output probabilities 0.1, 0.2, 0.5, 0.1, 0.1.
• The softmax output is compared with the actual one-hot label, and the weights are updated by backward propagation.
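The forward pass of this network can be sketched with the dimensions from the slide (5-word vocabulary, hidden width 3). The random weights are placeholders standing in for trained values:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((5, 3))        # input -> hidden, 5x3
W_prime = rng.standard_normal((3, 5))  # hidden -> output, 3x5

def softmax(z):
    e = np.exp(z - z.max())            # shift for numerical stability
    return e / e.sum()

x = np.zeros(5); x[2] = 1.0            # one-hot input, e.g. "eating"
hidden = x @ W                         # 1x3 hidden activations (no nonlinearity)
probs = softmax(hidden @ W_prime)      # 1x5 distribution over the vocabulary
```

Note that the hidden layer is purely linear; word2vec uses no activation function there.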
8. Word2Vec: Continuous Bag-of-Words model
• Our goal is to extract the word representation vectors, not the probability of predicting the target word. So let's look into the hidden layer (the 5x3 weight matrix W).
9. Word2Vec: Continuous Bag-of-Words model
• The hidden layer operates as a lookup table: multiplying a one-hot input by W simply selects one row of W.
• Ex (with the 5x3 weight matrix W from the figure):
  W = [[.7, .2, .8], [.19, .28, .22], [.22, .23, .21], [.1, .5, .3], [.2, .13, .23]]
  [0 0 0 0 1] x W = [.2 .13 .23]
• The output of the hidden layer is just the "word vector" for the input word.
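The lookup-table behaviour is easy to verify numerically, using the same 5x3 matrix W shown on the slide:

```python
import numpy as np

# The 5x3 weight matrix W from the slide
W = np.array([[0.7,  0.2,  0.8 ],
              [0.19, 0.28, 0.22],
              [0.22, 0.23, 0.21],
              [0.1,  0.5,  0.3 ],
              [0.2,  0.13, 0.23]])

x = np.array([0, 0, 0, 0, 1])  # one-hot vector for the fifth vocabulary word
hidden = x @ W                 # the matrix product just selects the fifth row
```

Because the input has a single 1, the multiplication never mixes rows: the hidden activations are exactly one row of W, i.e. that word's vector.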
10. Word2Vec: Continuous Bag-of-Words model
• Goal: after training, we keep W (the word-vector matrix) instead of using the network to predict target words.
(Figure: the trained network with the 5x3 matrix W highlighted.)
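One step of the fake task's training loop can be sketched end to end. The shapes follow the slides (vocabulary 5, hidden width 3); the learning rate, random initialization, and single-word input are simplifying assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
V, H, lr = 5, 3, 0.05
W = rng.standard_normal((V, H))        # input -> hidden (the word vectors)
W_prime = rng.standard_normal((H, V))  # hidden -> softmax output

x = np.zeros(V); x[1] = 1.0            # one-hot input word
y = np.zeros(V); y[2] = 1.0            # one-hot target label

def forward(W, W_prime):
    h = x @ W                          # linear hidden layer (a row lookup)
    z = h @ W_prime
    p = np.exp(z - z.max()); p /= p.sum()   # softmax
    return h, p

h, p = forward(W, W_prime)
loss_before = -np.log(p[2])            # cross-entropy on the true label

# Backward propagation through softmax + cross-entropy, then one update
dz = p - y                             # gradient at the output logits
dh = dz @ W_prime.T                    # gradient at the hidden layer
W_prime -= lr * np.outer(h, dz)
W -= lr * np.outer(x, dh)              # only the input word's row changes

_, p2 = forward(W, W_prime)
loss_after = -np.log(p2[2])
```

After many such steps over a corpus, the rows of W are the word vectors we keep; W' and the softmax outputs are discarded.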
11. Word2Vec: Continuous Bag-of-Words model
• Disadvantage: it does not capture rare words well.
• Ex: in "This is a ___ movie theater", both the common "good" and the rare "marvelous" fit the context, but because CBOW averages the context to predict the target, the rare word gets smoothed over.
• So, the skip-gram algorithm comes in.
12. Word2Vec: Continuous Skip-gram Model
• Predict the context words from the target word (the opposite of CBOW).
• Ex: given the sentence "I am eating pizza now" and window size 1, the window slides along the sentence:
  - target "am", context "I" and "eating"
  - target "eating", context "am" and "pizza"
  - and so on; here we only show one snapshot as an example!
13. Word2Vec: Continuous Skip-gram Model
• Predict the context words from the target word (the opposite of CBOW).
• The middle word is the feature; the context words are the labels.
• Ex: with "eating" as the target, "eating" is the feature and the context words "I, am, pizza, now" are the labels.
• Before the next slide, we one-hot encode each word: I = 00001, am = 00010, eating = 00100, pizza = 01000, now = 10000.
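Skip-gram pair generation is the mirror image of CBOW: each (target, context) pair becomes its own training example. Again a sketch, with illustrative names:

```python
# Build skip-gram training pairs: (target word, one context word)
def skipgram_pairs(tokens, window=1):
    pairs = []
    for i, target in enumerate(tokens):
        for j in range(max(0, i - window),
                       min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs

tokens = "I am eating pizza now".split()
pairs = skipgram_pairs(tokens)
# e.g. ("eating", "am") and ("eating", "pizza") are two (feature, label) pairs
```

Unlike CBOW, nothing is averaged: every pair trains the network separately, which is why rare words still get their own updates.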
14. Word2Vec: Continuous Skip-gram Model
• We train a one-hidden-layer neural network with hidden-layer width 3.
• Input layer: a 1x5 one-hot vector for the target word (I = 00001, am = 00010, eating = 00100, pizza = 01000, now = 10000).
• W (5x3) maps the input to the hidden layer; W' (3x5) maps the hidden layer to a softmax layer, producing one distribution per context position (e.g. 0.1, 0.2, 0.1, 0.1, 0.5 and 0.3, 0.2, 0.2, 0.1, 0.2).
• Each output is compared with the actual one-hot context label, and the weights are updated by backward propagation.
15. Word2Vec: Continuous Skip-gram Model
• After training, we keep W (the word-vector matrix) instead of using the network to predict context words.
(Figure: the trained network with the 5x3 matrix W highlighted.)
16. Word2Vec: Continuous Skip-gram Model
• Advantage: it can capture rare words.
• Ex: in "This is a ___ movie theater", each (target, context) pair is its own training example, so even a rare target like "marvelous" gets its own weight updates.
• It is also good at capturing semantic similarity: synonyms like "intelligent" and "smart" occur in very similar contexts, so they end up with very similar vectors.
17. References
• Stanford NLP (YouTube)
• https://arxiv.org/pdf/1310.4546.pdf
• https://www.youtube.com/watch?v=D-ekE-Wlcds
• https://iksinc.wordpress.com/tag/continuous-bag-of-words-cbow/
• http://mccormickml.com/2016/04/19/word2vec-tutorial-the-skip-gram-model/
• http://mccormickml.com/2017/01/11/word2vec-tutorial-part-2-negative-sampling/
• http://www.1-4-5.net/~dmm/ml/how_does_word2vec_work.pdf
• https://ricardokleinklein.github.io/2017/09/25/word2vec.html
18. Thank you!