Efficient Estimation of Word Representations in Vector Space, by T. Mikolov et al. (2013). Continuous vector representations of words learned from their context words.
3. Problem description
• Every word has a meaning
• But, how can we learn a new word?
• We can check a dictionary for its meaning
– It takes time, and a dictionary is not always at hand
• Otherwise, we can guess the meaning of a new word from its context
Her limpid prose made even the most difficult subjects accessible to all.
This part of the sentence helps us guess the meaning of “limpid”
It would be “pleasant” or “clear”
4. Problem description
• How can a machine understand a word’s meaning?
• It can translate using a dictionary or word library
– but it is difficult to create and maintain such a library
• Moreover, a word can have different meanings
– neighboring / context words can help suggest the right one
• The machine should learn word representations itself
5. Word embeddings
• There are many methods to find word embeddings
– Frequency-based embeddings
– Count vectors
– TF-IDF
– Co-occurrence matrix
– Prediction-based embeddings
– Skip-gram model
– CBOW
• We are going to discuss the last two methods
https://www.analyticsvidhya.com/blog/2017/06/word-embeddings-count-word2veec/
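The frequency-based methods above can be illustrated with the simplest one, count vectors: each document becomes a vector of raw term counts over a shared vocabulary. A minimal sketch on a toy two-document corpus (the documents and vocabulary are invented for illustration):

```python
from collections import Counter

# Toy corpus: each document becomes a "count vector" of raw term
# counts over the shared vocabulary of both documents.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
]

tokenized = [d.split() for d in docs]
vocab = sorted({w for doc in tokenized for w in doc})

def count_vector(tokens, vocab):
    counts = Counter(tokens)
    return [counts[w] for w in vocab]

vectors = [count_vector(doc, vocab) for doc in tokenized]
print(vocab)    # shared vocabulary, sorted
print(vectors)  # one count vector per document
```

TF-IDF and co-occurrence matrices refine this same idea by reweighting or re-scoping the counts; CBOW and skip-gram, discussed next, instead learn dense vectors by prediction.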
6. Motivation
• Finding the semantic meaning of words
• Learning a word from its context words
• Representing a word as a low-dimensional vector
• Easy to compare two words in vector space
7. Proposed method
• Representing a word as a vector
• How should we learn these vector values?
• There are two methods
– 1. Continuous Bag of Words (CBOW)
– 2. Skip-gram model (SG)
[Figure: example word vectors: “cat” = (0.2, 0, 0, 0.7, 0, 0, 0, 0, …, 0), “dog” = (0.1, 0.3, 0.9, 0, 0, 0, 0, 0, …, 0)]
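Comparing two words in vector space typically means cosine similarity between their vectors. A minimal sketch using the illustrative “cat” and “dog” values from the figure above (the values are from the slide, not a trained model):

```python
import math

# Illustrative vectors from the slide; real word2vec vectors
# are dense values learned from data.
cat = [0.2, 0.0, 0.0, 0.7, 0.0, 0.0, 0.0, 0.0]
dog = [0.1, 0.3, 0.9, 0.0, 0.0, 0.0, 0.0, 0.0]

def cosine(u, v):
    """Cosine similarity: dot product over the product of norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

print(round(cosine(cat, dog), 3))
```

A value near 1 means similar directions (similar meanings under the model); near 0 means unrelated.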
8. Proposed method
• CBOW: use a set of words in fixed length (window) to predict
the middle word
• SG: use a word to predict the surrounding words in a fixed
distance (window)
9. Proposed method
• Scanning words in a window over an article
• Word order is not important within the window
• E.g.: Many days ago, there was a king who had ……
Here, “king” is our target word = Wt
[Diagram: Wt-2 Wt-1 Wt Wt+1 Wt+2, window = 5; the window then slides to the next position]
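The sliding-window scan above can be sketched directly: for each position, collect the two neighbors on each side (window = 5) as the unordered context of the target word. CBOW reads each pair as context → target; skip-gram reads it as target → each context word.

```python
# Sketch: extract (context, target) pairs with a window of 5
# (the target word plus two neighbors on each side).
sentence = "many days ago there was a king who had".split()

def windows(tokens, half=2):
    pairs = []
    for t in range(len(tokens)):
        # Unordered context: word order inside the window is ignored.
        context = [tokens[i]
                   for i in range(max(0, t - half),
                                  min(len(tokens), t + half + 1))
                   if i != t]
        pairs.append((context, tokens[t]))  # CBOW: context -> target
    return pairs

pairs = windows(sentence)
print(pairs[6])  # the window centred on "king"
```

At the edges of the text the window is simply truncated, so the first and last words have smaller contexts.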
10. Proposed method
• Uses a two-layer neural network
• The first layer is fully connected
• The final layer uses a softmax function to obtain the probability of one word with respect to the others
• Stochastic gradient descent is used to learn the parameters via back-propagation
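The two-layer architecture can be sketched as a forward pass for CBOW: average the context words’ embeddings from the first layer, then apply the output layer and a softmax over the vocabulary. The vocabulary size, embedding dimension, and random weights below are assumptions for illustration; the real model learns the weights by SGD.

```python
import numpy as np

rng = np.random.default_rng(0)
V, D = 10, 4  # vocabulary size and embedding dimension (assumed)

# First (fully connected) layer: one D-dim embedding row per word.
W_in = rng.normal(scale=0.1, size=(V, D))
# Output layer, followed by softmax over the vocabulary.
W_out = rng.normal(scale=0.1, size=(D, V))

def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

def cbow_forward(context_ids):
    h = W_in[context_ids].mean(axis=0)  # average of context embeddings
    return softmax(h @ W_out)           # probability of each target word

p = cbow_forward([1, 2, 4, 5])  # four context word ids (assumed)
print(p.sum())  # probabilities over the vocabulary sum to 1
```

Training would compare `p` against the one-hot target word and back-propagate the cross-entropy loss through `W_out` and `W_in`.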
11. Proposed method
• Representing a word as a vector
[Figure: word-feature matrix, e.g. “cat” represented by feature values (0.2, 0, 0, 0.7, 0, …, 0); collected from Andrew Ng’s Coursera course]
12. Conclusions
• Introduces a new state of the art in natural language processing
• Big data is needed to find a good embedding
• The training process takes a long time
• A W2V model learned on Wikipedia documents is publicly available
• Used successfully in many applications
14. Application on Medical Data
• Medical data contains notes and codes
• A note is a description of a patient’s condition and treatments
• Codes are unique values used to represent diagnoses and medicines
• There are many standard coding systems, like ICD-9, CPT …
• W2V can be used on a medical dataset to learn medical code embeddings
T. Bai, A. K. Chanda, S. Vucetic, B. L. Egleston. "Joint learning of representations of
medical concepts and words from EHR data". In the BIBM conference, 2017
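Once code embeddings are learned, related codes can be found by nearest-neighbor search in the vector space. A minimal sketch with hypothetical toy vectors for a few ICD-9 codes (both the codes’ vectors and their groupings are invented for illustration, not taken from the cited work):

```python
import math

# Hypothetical toy embeddings for a few ICD-9 codes; a real model
# would learn these from EHR notes as in the cited paper.
code_vecs = {
    "250.00": [0.9, 0.1, 0.0],  # diabetes mellitus
    "401.9":  [0.8, 0.2, 0.1],  # essential hypertension
    "487.1":  [0.0, 0.1, 0.9],  # influenza
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def nearest(code):
    """Return the other code whose embedding is most similar."""
    query = code_vecs[code]
    return max((c for c in code_vecs if c != code),
               key=lambda c: cosine(query, code_vecs[c]))

print(nearest("250.00"))  # nearest neighbor of the diabetes code
```

In this toy space the two chronic-condition codes sit near each other while influenza is far away, which is the kind of structure such embeddings are expected to capture.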
15. References
• T. Mikolov, K. Chen, G. Corrado, J. Dean, Efficient estimation of word representations in vector space, CoRR abs/1301.3781. arXiv:1301.3781. URL http://arxiv.org/abs/1301.3781
• X. Rong, word2vec parameter learning explained, CoRR abs/1411.2738. arXiv:1411.2738. URL http://arxiv.org/abs/1411.2738
• T. Bai, A. K. Chanda, B. L. Egleston, S. Vucetic, Joint learning of representations of medical concepts and words from EHR data, in: 2017 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2017, Kansas City, MO, USA, November 13-16, 2017, pp. 764-769. doi:10.1109/BIBM.2017.8217752. URL https://doi.org/10.1109/BIBM.2017.8217752
Editor's Notes
• Problem description, Motivation, Proposal, Experiments, Conclusion, Criticism
• A multiple choice selection panel
• No link for their application