# Word2Vec: Learning of word representations in a vector space - Di Mitri & Hermans

Student lecture for the Master course in Data Mining of the University of Maastricht.

authors: Daniele Di Mitri and Joeri Hermans

Published in: Data & Analytics

1. Word2Vec: Learning of word representations in a vector space. Daniele Di Mitri and Joeri Hermans, 23 March 2015
2. Student Lecture - Di Mitri & Hermans. Outline: 1. Limitations of classic NLP techniques 2. Skip-gram 3. Negative sampling 4. Learning of word representations 5. Applications 6. References
3. Limitations of classic NLP techniques. N-grams, bag of words: • words as atomic units, or in vector space as one-hot vectors [0,0,0,0,1,0,…,0] • simple and robust models, even when trained on huge amounts of data. BUT: • no semantic relationships between words - these models are not designed to capture linguistic knowledge • data is extremely sparse due to the high number of dimensions • scaling up will not result in significant progress. (Example words: "love", "candy", "store".)
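The orthogonality problem on this slide can be sketched in a few lines; the three-word vocabulary is a toy assumption made here for illustration:

```python
# Minimal sketch of one-hot word vectors over a toy vocabulary.
vocab = ["love", "candy", "store"]

def one_hot(word):
    """Return the one-hot vector of `word` over the toy vocabulary."""
    return [1 if w == word else 0 for w in vocab]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Any two distinct words are orthogonal: the representation encodes
# no notion of similarity, related or not.
print(dot(one_hot("love"), one_hot("candy")))   # 0
print(dot(one_hot("candy"), one_hot("candy")))  # 1
```

The same holds for any vocabulary size: only the diagonal self-similarity is nonzero, which is exactly the "no semantic relationships" limitation above.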
4. A word's context. Successful intuition: the context represents the semantics. (Figure: the surrounding words represent "banking".)
5. Feature vectors. • The one-hot problem: [0,0,1] AND [1,0,0] = 0! • Bengio et al. (2003) introduce word features (feature vectors) learned using a neural architecture: P(w_t | w_{t-(n-1)}, …, w_{t-1}), e.g. candy = {0.124, -0.553, 0.923, 0.345, -0.009} • dimensionality reduction using word vectors • data sparsity is no longer a problem • but: not computationally efficient.
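With dense feature vectors, similarity becomes a graded score rather than 0/1. A minimal sketch using the `candy` vector quoted on the slide; `sweets` is a hypothetical nearby word invented here for illustration:

```python
import math

# The `candy` vector is the one quoted on the slide; `sweets` is a
# hypothetical similar word, invented for this sketch.
candy  = [0.124, -0.553, 0.923, 0.345, -0.009]
sweets = [0.110, -0.510, 0.870, 0.300,  0.020]

def cosine(u, v):
    """Cosine similarity: dot product normalised by both vector lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Unlike one-hot vectors, dense vectors give graded similarity scores
# close to 1 for near-parallel vectors.
print(round(cosine(candy, sweets), 3))
```

Cosine similarity is also the evaluation measure mentioned on the next slide.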
6. Importance of efficiency. • Mikolov et al. (2013) introduce more computationally efficient neural architectures: skip-gram and continuous bag of words (CBOW) • hypothesis: simpler models trained on (a lot) more data will result in better word representations • how to evaluate these word representations? Semantic similarity (cosine similarity)!
7. Example: vec("king") - vec("man") + vec("woman") ≈ vec("queen")
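The analogy above can be sketched with hand-made 2-d vectors (invented for illustration, not trained): one axis loosely encodes "royalty", the other "gender".

```python
# Toy 2-d word vectors, invented for this sketch:
# dimension 0 ≈ royalty, dimension 1 ≈ (fe)maleness.
words = {
    "king":  [0.9, 0.8],
    "man":   [0.9, 0.1],
    "woman": [0.1, 0.1],
    "queen": [0.1, 0.8],
    "apple": [0.5, 0.0],
}

# The analogy arithmetic: king - man + woman.
target = [k - m + w for k, m, w in zip(words["king"], words["man"], words["woman"])]

def sqdist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

# Nearest remaining word to the target vector (query words excluded):
nearest = min((w for w in words if w not in ("king", "man", "woman")),
              key=lambda w: sqdist(words[w], target))
print(nearest)  # queen
```

Subtracting "man" removes the gender component of "king"; adding "woman" puts it back with the opposite sign, landing near "queen".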
8. Skip-gram. A feedforward NN for classification. Classification task: predict the next and previous words (the context) given the current word. The features learned in the weight matrix to the hidden layer are our word vectors. Supervised learning with unlabeled input data!
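"Supervised learning with unlabeled data" works because the training pairs come from the text itself. A sketch of how skip-gram turns a raw sentence into (center, context) examples, assuming a simple symmetric window:

```python
# Generate skip-gram training pairs: for each center word, every word
# within `window` positions becomes a (center, context) example.
def skipgram_pairs(tokens, window=2):
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # the center word is not its own context
                pairs.append((center, tokens[j]))
    return pairs

sentence = "the cat sat on the mat".split()
for center, context in skipgram_pairs(sentence, window=1):
    print(center, "->", context)
```

Each pair is one classification example: given the center word, predict the context word. No human labels are needed; the corpus provides them for free.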
9. Negative sampling. • Computing similarity against every word in the vocabulary is very expensive. • Alongside the correct context, select multiple incorrect contexts at random. • Faster training: only a few words are updated instead of all words in the language.
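One update step of skip-gram with negative sampling can be sketched as below. The tiny vocabulary, random initialisation, and hyperparameters are assumptions made for this sketch; only the true context word and k sampled negatives get updated, not the whole vocabulary:

```python
import math
import random

random.seed(0)
DIM, LR = 5, 0.5  # toy embedding size and learning rate (assumptions)
vocab = ["the", "cat", "sat", "on", "mat"]
W_in  = {w: [random.uniform(-0.5, 0.5) for _ in range(DIM)] for w in vocab}
W_out = {w: [random.uniform(-0.5, 0.5) for _ in range(DIM)] for w in vocab}

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sgns_step(center, context, k=2):
    """One SGD step: label the true context 1, k random negatives 0."""
    v = W_in[center]
    negatives = random.sample([w for w in vocab if w != context], k)
    for word, label in [(context, 1.0)] + [(n, 0.0) for n in negatives]:
        u = W_out[word]
        g = sigmoid(dot(v, u)) - label  # gradient of the logistic loss
        for d in range(DIM):            # only these k+1 rows change
            u_d = u[d]
            u[d] -= LR * g * v[d]
            v[d] -= LR * g * u_d

before = sigmoid(dot(W_in["cat"], W_out["sat"]))
for _ in range(50):
    sgns_step("cat", "sat")
after = sigmoid(dot(W_in["cat"], W_out["sat"]))
print(before, "->", after)  # the positive pair's score rises toward 1
```

Each step touches k+1 output rows instead of all |V|, which is the speed-up the slide refers to.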
10. (Figure slide.)
11. Example applications. • In machine learning: machine translation. • In data mining: dimensionality reduction.
12. References. 1. Yoshua Bengio, Rejean Ducharme, Pascal Vincent, and Christian Janvin. A neural probabilistic language model. 2. Ronan Collobert and Jason Weston. A unified architecture for natural language processing: Deep neural networks with multitask learning. 3. Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. 4. Tomas Mikolov, Wen-tau Yih, and Geoffrey Zweig. Linguistic regularities in continuous space word representations. • Try the code: word2vec.googlecode.com
13. Questions? Thank you for your attention!