Word2Vec: Learning of word representations in a vector space - Di Mitri & Hermans
Student lecture for the Master course in Data Mining at the University of Maastricht.

Authors: Daniele Di Mitri and Joeri Hermans

  1. Word2Vec: Learning of word representations in a vector space
     Daniele Di Mitri - Joeri Hermans
     23 March 2015
  2. Outline
     1. Limitations of classic NLP techniques
     2. Skip-gram
     3. Negative sampling
     4. Learning of word representations
     5. Applications
     6. References
  3. Limitations of classic NLP techniques
     N-grams, bag of words:
     • words treated as atomic units
     • or as vectors in a vector space, [0,0,0,0,1,0,…,0], also known as one-hot
     • simple and robust models, even when trained on huge amounts of data
     BUT
     • no semantic relationships between words: these models are not designed to capture linguistic knowledge
     • data is extremely sparse because of the high number of dimensions
     • scaling up will not bring significant progress
     (slide figure: one-hot vectors for the example words "love", "candy", "store"; a short sketch follows below)
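The orthogonality problem named on the slide can be seen directly in code. A minimal sketch, reusing the slide's toy words "love", "candy", "store"; everything else is illustrative:

    # One-hot representation: a |V|-dimensional vector with a single 1.
    # Any two distinct one-hot vectors are orthogonal, so they encode no similarity.
    import numpy as np

    vocab = ["love", "candy", "store"]          # toy vocabulary from the slide
    index = {w: i for i, w in enumerate(vocab)}

    def one_hot(word):
        v = np.zeros(len(vocab))
        v[index[word]] = 1.0
        return v

    print(one_hot("candy"))                     # [0. 1. 0.]
    print(one_hot("candy") @ one_hot("store"))  # 0.0 -> no semantic relationship

With a real vocabulary these vectors have hundreds of thousands of dimensions, which is the sparsity problem the slide points out.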
  4. Word's context
     Successful intuition: the context represents the semantics.
     (slide figure: the words surrounding "banking" in a sentence represent its meaning)
  5. Feature vectors
     • The one-hot problem: [0,0,1] AND [1,0,0] = 0, so any two distinct words look completely unrelated.
     • Bengio et al. (2003) introduce word features (feature vectors) learned with a neural architecture that models P(w_t | w_{t-(n-1)}, …, w_{t-1}),
       e.g. candy = {0.124, -0.553, 0.923, 0.345, -0.009}
     • Dimensionality reduction using word vectors.
     • Data sparsity is no longer a problem.
     • But the approach is not computationally efficient.
     (a sketch of a dense feature-vector lookup follows below)
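To make the contrast with one-hot vectors concrete, here is an illustrative sketch of a dense feature-vector lookup. The embedding matrix and its values are made up (randomly initialised), not the vectors learned by Bengio et al.; in a trained model they come out of backpropagation:

    import numpy as np

    vocab = ["love", "candy", "store"]
    dim = 5                                   # d << |V|; real models use roughly 50-300 dimensions
    rng = np.random.default_rng(0)
    E = rng.normal(scale=0.1, size=(len(vocab), dim))   # embedding matrix (learned in practice)

    candy_vec = E[vocab.index("candy")]       # dense feature vector for "candy"
    print(candy_vec)                          # a small dense vector, analogous to the slide's
                                              # candy = {0.124, -0.553, ...} example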
  6. Importance of efficiency
     • Mikolov et al. (2013) introduce more computationally efficient neural architectures: skip-gram and continuous bag of words (CBOW).
     • Hypothesis: simpler models trained on (a lot) more data will result in better word representations.
     • How to evaluate these word representations? Semantic similarity (cosine similarity); see the sketch below.
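Cosine similarity, the evaluation measure mentioned on the slide, is just the cosine of the angle between two word vectors. A minimal sketch with made-up vectors:

    import numpy as np

    def cosine_similarity(a, b):
        # 1.0 for vectors pointing the same way, 0.0 for orthogonal vectors
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    a = np.array([0.2, -0.5, 0.9])   # illustrative word vectors
    b = np.array([0.1, -0.4, 1.0])
    print(cosine_similarity(a, b))   # close to 1.0: similar directions, similar words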
  7. Example
     vec("king") - vec("man") + vec("woman") ≈ vec("queen")
     (an example query against pretrained vectors follows below)
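The analogy can be reproduced with pretrained vectors. A sketch using gensim's downloader and the public Google News word2vec model; the model name and the large download are gensim specifics, not part of the lecture:

    import gensim.downloader as api

    kv = api.load("word2vec-google-news-300")   # pretrained KeyedVectors (large download)

    # vec("king") - vec("man") + vec("woman") should land near vec("queen")
    print(kv.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
    # expected result: [('queen', ...)]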
  8. Skip-gram
     • Feedforward neural network for classification.
     • Classification task: predict the next and previous words (the context).
     • The features learned in the input-to-hidden weight matrix are our word vectors.
     • Supervised learning with unlabeled input data!
     (the sketch below shows how the training pairs are formed)
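The classification task can be sketched by listing the (centre word, context word) pairs that skip-gram trains on. Assumptions here: a toy sentence and a context window of two words on each side:

    def skipgram_pairs(tokens, window=2):
        # every word predicts each word within `window` positions of it
        pairs = []
        for i, center in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    pairs.append((center, tokens[j]))   # (input word, word to predict)
        return pairs

    sentence = "the cat sat on the mat".split()
    print(skipgram_pairs(sentence)[:5])
    # [('the', 'cat'), ('the', 'sat'), ('cat', 'the'), ('cat', 'sat'), ('cat', 'on')]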
  9. Negative sampling
     • Computing the similarity to every word in the vocabulary is very expensive.
     • Besides the correct context, select a few incorrect contexts at random.
     • Faster training: only a few word vectors change per step instead of all words in the language.
     (a sketch of the objective follows below)
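An illustrative numpy sketch of the negative-sampling objective (random vectors, not the lecture's code): the true context word is pulled towards the centre word while k randomly sampled words are pushed away, so only k+1 output vectors are touched per step instead of the whole vocabulary:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def neg_sampling_loss(v_center, u_context, u_negatives):
        # log-likelihood of the true context plus k sampled "incorrect contexts"
        positive = np.log(sigmoid(u_context @ v_center))
        negative = np.sum(np.log(sigmoid(-(u_negatives @ v_center))))
        return -(positive + negative)

    rng = np.random.default_rng(0)
    dim, k = 50, 5
    v_c   = rng.normal(scale=0.1, size=dim)        # input vector of the centre word
    u_o   = rng.normal(scale=0.1, size=dim)        # output vector of the true context word
    u_neg = rng.normal(scale=0.1, size=(k, dim))   # output vectors of k negative samples
    print(neg_sampling_loss(v_c, u_o, u_neg))      # scalar loss; gradients touch k+1 words only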
 10. (image-only slide, no transcript text)
 11. Example applications
     • In machine learning: machine translation.
     • In data mining: dimensionality reduction.
 12. References
     1. Yoshua Bengio, Réjean Ducharme, Pascal Vincent, and Christian Janvin. A neural probabilistic language model.
     2. Ronan Collobert and Jason Weston. A unified architecture for natural language processing: Deep neural networks with multitask learning.
     3. Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space.
     4. Tomas Mikolov, Wen-tau Yih, and Geoffrey Zweig. Linguistic regularities in continuous space word representations.
     • Try the code: word2vec.googlecode.com
 13. Questions? Thank you for your attention!
