Ashis Kumar Chanda
PhD Candidate
Understanding Word2Vec
Authors: Tomas Mikolov et al. 2013
Contents
• Problem description
• Motivation
• Proposed Method
• Experiments
• Conclusion
• Criticism
Problem description
• Every word has a meaning
• But how can we learn a new word?
• We can check a dictionary for its meaning
– It takes time, and a dictionary is not always at hand
• Otherwise, we can guess the meaning of a new word from its
context
Her limpid prose made even the most difficult subjects accessible to all.
The rest of the sentence helps us guess the meaning of “limpid”
It should mean something like “pleasant” or “clear”
Problem description
• How can a machine understand a word’s meaning?
• It can translate from a dictionary or word library
– such a library is difficult to create and maintain
• Moreover, a word can have different meanings
– neighboring / context words can help to suggest the right one
• The machine should learn word representations itself
Word embeddings
• There are many methods to find word embeddings
– Frequency-based embeddings: count vectors, TF-IDF, co-occurrence matrix
– Prediction-based embeddings: Skip-gram model, CBOW
• We are going to discuss the last two methods
https://www.analyticsvidhya.com/blog/2017/06/word-embeddings-count-word2veec/
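As a small sketch of the frequency-based idea (my own illustration, not from the slides or the linked post), the snippet below counts how often pairs of words co-occur within a fixed window — the raw material of a co-occurrence-matrix embedding:

```python
from collections import Counter

def cooccurrence_counts(tokens, window=2):
    """Count how often each ordered pair of words appears within
    `window` positions of each other."""
    counts = Counter()
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[(center, tokens[j])] += 1
    return counts

tokens = "the cat sat on the mat".split()
counts = cooccurrence_counts(tokens, window=2)
print(counts[("cat", "sat")])  # 1
print(counts[("the", "sat")])  # 2 -- "sat" is near both occurrences of "the"
```

Each row of the resulting matrix (all counts for one center word) is a high-dimensional, sparse vector — exactly the kind of representation the prediction-based methods below compress into dense, low-dimensional vectors.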
Motivation
• Finding the semantic meaning of words
• Learning a word from its context words
• Representing a word as a low-dimensional vector
• Easy to compare two words in vector space
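To make the last point concrete: once words are vectors, similarity is just geometry. The sketch below uses cosine similarity with made-up 3-dimensional vectors (these are illustrative values, not real Word2Vec output):

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: close to 1.0 for
    similar directions, close to 0.0 for unrelated ones."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# hypothetical embeddings for illustration only
cat = [0.9, 0.1, 0.3]
dog = [0.8, 0.2, 0.3]
car = [0.1, 0.9, 0.0]
print(cosine_similarity(cat, dog) > cosine_similarity(cat, car))  # True
```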
Proposed method
• Representing a word as a vector
• How should we learn these vector values?
• There are two methods
– 1. Continuous Bag of Words (CBOW)
– 2. Skip-gram model (SG)
Fig: example sparse vectors for the words “cat” and “dog”
Proposed method
• CBOW: use a fixed-length set of surrounding words (a window) to
predict the middle word
• SG: use a word to predict the surrounding words within a fixed
distance (window)
Proposed method
• Scan words in a window over an article
• Word order is not important within a window
• E.g.: Many days ago, there was a king who had ……
Here, “king” is our target word, Wt
Window = 5: Wt-2 Wt-1 Wt Wt+1 Wt+2 (then slide to the next window)
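The windowing above can be sketched as follows. The sentence and the 5-word window come from the slide; the function name is my own, and `window=2` is the half-width (two context words on each side of the target):

```python
def training_pairs(tokens, window=2):
    """Slide a window over the text; for each target word, collect
    its context words. CBOW predicts the target from the context;
    skip-gram predicts each context word from the target."""
    pairs = []
    for t, target in enumerate(tokens):
        context = [tokens[j]
                   for j in range(max(0, t - window),
                                  min(len(tokens), t + window + 1))
                   if j != t]
        pairs.append((target, context))
    return pairs

tokens = "many days ago there was a king who had".split()
target, context = training_pairs(tokens)[6]  # position of "king"
print(target, context)  # king ['was', 'a', 'who', 'had']
```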
Proposed method
• Uses a two-layer neural network
• The first layer is fully connected
• The final layer uses a softmax function to obtain the probability of one
word with respect to the others
• Stochastic gradient descent is used to learn the parameters during
backpropagation
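A minimal sketch of that forward pass, with a toy 5-word vocabulary and random weights (this illustrates the architecture only; it is not the paper's implementation, which also uses tricks like hierarchical softmax):

```python
import math
import random

random.seed(0)
V, D = 5, 3  # vocabulary size, embedding dimension
W1 = [[random.uniform(-0.5, 0.5) for _ in range(D)] for _ in range(V)]  # input -> hidden
W2 = [[random.uniform(-0.5, 0.5) for _ in range(V)] for _ in range(D)]  # hidden -> output

def forward(word_index):
    """One forward pass: for a one-hot input, the fully connected
    first layer reduces to a row lookup; softmax then turns the
    output scores into a probability distribution over the vocabulary."""
    hidden = W1[word_index]  # the input word's embedding
    scores = [sum(hidden[d] * W2[d][o] for d in range(D)) for o in range(V)]
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = forward(2)
print(round(sum(probs), 6))  # 1.0 -- a valid probability distribution
```

Training with SGD would then nudge `W1` and `W2` so the probability of the observed context (or target) word increases; the learned rows of `W1` are the word vectors.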
Proposed method
• Representing a word as a vector
Fig: word vs. feature vector representations of “cat” (collected from Andrew Ng’s Coursera course)
Conclusions
• Introduced a new state of the art in natural language processing
• Big data is needed to find a good embedding
• The training process takes a long time
• A Word2Vec model trained on Wikipedia documents is publicly
available
• Successfully used in many applications
Project Links
• https://code.google.com/archive/p/word2vec/
• https://radimrehurek.com/gensim/models/word2vec.html
Application on Medical Data
• Medical data contains notes and codes
• A note is a description of a patient’s condition and treatments
• Codes are unique values used to represent diagnoses and
medicines
• There are many standard coding systems, like ICD-9, CPT …
• Word2Vec can be used on medical datasets to learn medical
code embeddings
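The key step is treating each visit's codes as a "sentence" so that codes appearing in the same visit play the role of co-occurring words. A hedged sketch of that data shaping (the ICD-9 codes below are only illustrative, not from the cited paper):

```python
# Each patient visit becomes a "sentence" of codes, so a Word2Vec-style
# model can learn code embeddings from co-occurrence within visits.
visits = [
    ["250.00", "401.9", "272.4"],  # hypothetical example visit
    ["250.00", "272.4"],
    ["401.9", "428.0"],
]

def code_sentences(visits):
    """Turn visit records into token sequences -- the same input shape
    that text corpora give a word-embedding model."""
    return [[str(code) for code in visit] for visit in visits]

sentences = code_sentences(visits)
print(sentences[0])  # ['250.00', '401.9', '272.4']
```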
T. Bai, A. K. Chanda, S. Vucetic, B. L. Egleston. "Joint learning of
representations of medical concepts and words from EHR data". In the
BIBM conference, 2017
References
• T. Mikolov, K. Chen, G. Corrado, J. Dean. Efficient estimation of word representations in vector space. CoRR
abs/1301.3781. arXiv:1301.3781. URL http://arxiv.org/abs/1301.3781
• X. Rong. word2vec parameter learning explained. CoRR abs/1411.2738. arXiv:1411.2738. URL
http://arxiv.org/abs/1411.2738
• T. Bai, A. K. Chanda, B. L. Egleston, S. Vucetic. Joint learning of representations of medical concepts and words from
EHR data. In: 2017 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2017, Kansas City, MO, USA,
November 13-16, 2017, pp. 764–769. doi:10.1109/BIBM.2017.8217752. URL
https://doi.org/10.1109/BIBM.2017.8217752
