Hyunyoung Lee
Seminar for NLP labs
Word2Vec
Agenda
1. Word Embedding
- Vectorization of Image and Text
2. Word2Vec
- One-hot vector and Co-occurrence matrix for word vector
3. Fundamental
- Basic components of word embedding in a neural net
4. Word Vector in a neural net
5. Word2Vec, CBOW and skip-gram
6. GloVe

1. Word Embedding
- Comparing image processing with word vectors in terms of vector representation
- Image vector representation
- The RGB values of every pixel (Height * Width * RGB values) are laid out as the values of a single row.
So it is easy to turn an image into a vector in some space, i.e. RGB space.
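A minimal sketch of this flattening, assuming NumPy and a hypothetical 2 × 3 pixel RGB image (the sizes and pixel values are made up for illustration):

```python
import numpy as np

# Hypothetical 2x3 RGB image: shape (Height, Width, 3), one value per channel.
height, width = 2, 3
image = np.arange(height * width * 3, dtype=np.uint8).reshape(height, width, 3)

# Lay out every pixel's R, G, B values in a single row of length H * W * 3.
image_vector = image.reshape(-1)

print(image_vector.shape)  # (18,)
```

The image needs no learning step to become a vector: its raw pixel values already live in a numeric space. Words have no such natural numeric form, which is what word embedding tries to provide.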
- What is word embedding?
- In NLP tasks before neural nets, word vectors were represented by word frequencies, such as TF-IDF.
In a neural net, there have been multiple approaches to word vector representation:
- Language modeling and word embedding modeling
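For comparison, a frequency-based vector can be sketched in pure Python. This assumes one common TF-IDF variant (tf = count / document length, idf = log(N / df)); other weighting schemes exist, and the three documents are just the toy sentences used later in this deck:

```python
import math
from collections import Counter

docs = [
    "I like deep learning".split(),
    "I like NLP".split(),
    "I enjoy flying".split(),
]

def tf_idf(docs):
    """Return one {word: weight} dict per document (one common TF-IDF variant)."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each word.
    df = Counter(word for doc in docs for word in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            word: (count / len(doc)) * math.log(n_docs / df[word])
            for word, count in counts.items()
        })
    return vectors

vectors = tf_idf(docs)
print(vectors[0])
```

A word that appears in every document ("I") gets weight 0, while a rarer word ("deep") gets a higher weight. These weights only reflect counts, not meaning.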
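2. Word2Vec
One-hot representation
- Dim = |V| (|V| is the size of the vocabulary)
- motel: a vector with a single 1 at the index of "motel" and 0 everywhere else
- hotel: a vector with a single 1 at the index of "hotel" and 0 everywhere else
If you search for the keyword [Seattle motel], we want the search engine to also match web pages containing "Seattle hotel".
However, Similarity(motel, hotel) = $\text{motel}^{T}\,\text{hotel} = 0$
Any two different one-hot vectors are orthogonal, so their inner product is always 0 and we cannot find out the similarity between words.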
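A minimal NumPy sketch of the problem, assuming a hypothetical three-word vocabulary:

```python
import numpy as np

vocab = ["hotel", "motel", "seattle"]  # hypothetical toy vocabulary, |V| = 3
index = {word: i for i, word in enumerate(vocab)}

def one_hot(word):
    """|V|-dimensional vector with a single 1 at the word's index."""
    vec = np.zeros(len(vocab))
    vec[index[word]] = 1.0
    return vec

motel, hotel = one_hot("motel"), one_hot("hotel")
print(motel @ hotel)  # 0.0: different words are always orthogonal
print(motel @ motel)  # 1.0: a word is only similar to itself
```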
Co-occurrence matrix
Let's look at a window-based co-occurrence matrix (window size 1).
- Example corpus:
- I like deep learning.
- I like NLP.
- I enjoy flying.
Total vocabulary size |V| = 8 (I, like, enjoy, deep, learning, NLP, flying, ".")
Vector("I") = [0, 2, 1, 0, 0, 0, 0, 0]
Vector("like") = [2, 0, 0, 1, 0, 1, 0, 0] …
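The counting can be sketched as follows. The window size of 1, the vocabulary order, and treating "." as a token are assumptions chosen so that the rows reproduce the vectors above; counts do not cross sentence boundaries:

```python
import numpy as np

corpus = [
    ["I", "like", "deep", "learning", "."],
    ["I", "like", "NLP", "."],
    ["I", "enjoy", "flying", "."],
]
# Assumed vocabulary order that matches Vector("I") and Vector("like") above.
vocab = ["I", "like", "enjoy", "deep", "learning", "NLP", "flying", "."]
index = {w: i for i, w in enumerate(vocab)}

def cooccurrence(corpus, window=1):
    X = np.zeros((len(vocab), len(vocab)), dtype=int)
    for sentence in corpus:
        for i, word in enumerate(sentence):
            # Count every neighbor within `window` positions on either side.
            for j in range(max(0, i - window), min(len(sentence), i + window + 1)):
                if j != i:
                    X[index[word], index[sentence[j]]] += 1
    return X

X = cooccurrence(corpus)
print(X[index["I"]])     # [0 2 1 0 0 0 0 0]
print(X[index["like"]])  # [2 0 0 1 0 1 0 0]
```

Each row is a |V|-dimensional word vector, so "like" and "enjoy" now share nonzero entries (both co-occur with "I"), unlike one-hot vectors. The matrix is symmetric because the window is symmetric.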
Co-occurrence with SVD
With SVD (Singular Value Decomposition):
- This computation is expensive: for an m × n matrix, SVD costs $O(mn^2)$.
- SVD-based methods don't scale well to big matrices, and it is hard to incorporate new words or documents (the decomposition has to be recomputed).
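A minimal sketch of SVD-based word vectors, assuming NumPy and the window-1 co-occurrence matrix of the example corpus. Scaling the left singular vectors by the singular values is one common choice; using `U[:, :k]` alone is another:

```python
import numpy as np

# Window-1 co-occurrence matrix of the example corpus.
# Row/column order: I, like, enjoy, deep, learning, NLP, flying, "."
X = np.array([
    [0, 2, 1, 0, 0, 0, 0, 0],  # I
    [2, 0, 0, 1, 0, 1, 0, 0],  # like
    [1, 0, 0, 0, 0, 0, 1, 0],  # enjoy
    [0, 1, 0, 0, 1, 0, 0, 0],  # deep
    [0, 0, 0, 1, 0, 0, 0, 1],  # learning
    [0, 1, 0, 0, 0, 0, 0, 1],  # NLP
    [0, 0, 1, 0, 0, 0, 0, 1],  # flying
    [0, 0, 0, 0, 1, 1, 1, 0],  # .
], dtype=float)

# Thin SVD: X = U @ diag(S) @ Vt, singular values in descending order.
U, S, Vt = np.linalg.svd(X, full_matrices=False)

k = 2  # keep only the k largest singular values
word_vectors = U[:, :k] * S[:k]  # one dense k-dimensional vector per word (row)
print(word_vectors.shape)  # (8, 2)
```

On a toy 8 × 8 matrix this is instant, but the full decomposition must be recomputed whenever the vocabulary or corpus changes, which is the scaling problem listed above.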
3. Fundamental
Feedforward Neural Network (Basic Neural Network)
- The output layer's values are regarded as:
- a score
- a probability
- Backpropagation adjusts the weights so that those values are maximized or minimized (according to the objective).
Basic components of word embedding in a neural net
- Embedding Layer (inner product)
- Intermediate Layer(s): one or more layers that produce an intermediate representation of the input. For example, a hidden layer with a tanh or sigmoid activation function, or an RNN (LSTM, GRU), as used in state-of-the-art neural language models.
- Softmax Layer: the final layer, which computes the probability distribution over all words in the vocabulary.
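A minimal NumPy sketch of the embedding layer and the softmax layer. The sizes and random weights are hypothetical, and the intermediate layer is omitted to keep the example short:

```python
import numpy as np

rng = np.random.default_rng(0)
V, N = 5, 3                      # hypothetical vocabulary size and embedding dimension
W_in = rng.normal(size=(V, N))   # embedding layer weights
W_out = rng.normal(size=(N, V))  # output (softmax) layer weights

def softmax(z):
    z = z - z.max()              # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

word_id = 2
x = np.zeros(V)
x[word_id] = 1.0                 # one-hot input

h = x @ W_in                     # inner product of a one-hot vector with W_in ...
assert np.allclose(h, W_in[word_id])  # ... is just a row lookup
scores = h @ W_out               # one score per vocabulary word
probs = softmax(scores)          # probability distribution over the vocabulary
print(probs)
```

This is why the embedding layer is described as an inner product: in practice it is implemented as a table lookup, and the rows of `W_in` are the word vectors.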
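4. Word Vector in a neural net
Language model and Word embedding model with a neural net
- The main purpose of a language model is to compute the probability of a sentence or sequence of words, and the probability of an upcoming word.
The probability of a sequence of m words {w_1, …, w_m} is denoted as P(w_1, …, w_m).
In an n-gram model, each word is conditioned on a window of the n-1 previous words: $P(w_t \mid w_{t-1}, \dots, w_{t-n+1})$
i.e. the probability of a sentence or sequence of words: $P(W) = P(w_1, w_2, \dots, w_m)$
The probability of an upcoming word: $P(w_m \mid w_1, \dots, w_{m-1})$
- So a model that computes either of the probabilities above is called a language model (LM).
- The chain rule is applied to compute the joint probability of the words in a sentence:
$P(w_1, \dots, w_m) = \prod_{i=1}^{m} P(w_i \mid w_1, \dots, w_{i-1})$
For example, P("its water is so transparent") =
P(its) * P(water | its) * P(is | its water) * P(so | its water is) * P(transparent | its water is so)
Markov assumption: condition only on the last few words,
$P(w_i \mid w_1, \dots, w_{i-1}) \approx P(w_i \mid w_{i-n+1}, \dots, w_{i-1})$
By the Markov assumption, the last factor of the sentence above becomes:
$P(\text{transparent} \mid \text{its water is so}) \approx P(\text{transparent} \mid \text{so})$
OR
$P(\text{transparent} \mid \text{its water is so}) \approx P(\text{transparent} \mid \text{is so})$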
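Language model and Word embedding model with a neural net
- How to estimate these probabilities
In an n-gram based language model, by maximum likelihood (counting):
For example, bigram: $P(w_i \mid w_{i-1}) = \frac{\text{count}(w_{i-1}, w_i)}{\text{count}(w_{i-1})}$
trigram: $P(w_i \mid w_{i-2}, w_{i-1}) = \frac{\text{count}(w_{i-2}, w_{i-1}, w_i)}{\text{count}(w_{i-2}, w_{i-1})}$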
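A minimal sketch of bigram estimation by counting, combined with the chain rule under a first-order Markov assumption. The toy corpus and the `<s>`/`</s>` boundary markers are assumptions for illustration, and no smoothing is applied, so unseen bigrams get probability 0:

```python
from collections import Counter

# Hypothetical toy corpus; <s> and </s> mark sentence boundaries.
corpus = [
    "<s> I like deep learning </s>",
    "<s> I like NLP </s>",
    "<s> I enjoy flying </s>",
]
sentences = [s.split() for s in corpus]

unigram = Counter(w for s in sentences for w in s)
bigram = Counter((s[i], s[i + 1]) for s in sentences for i in range(len(s) - 1))

def p_bigram(word, prev):
    """MLE estimate: P(word | prev) = count(prev, word) / count(prev)."""
    return bigram[(prev, word)] / unigram[prev]

def sentence_prob(sentence):
    """Chain rule with a first-order Markov assumption: product of P(w_i | w_{i-1})."""
    words = sentence.split()
    prob = 1.0
    for prev, word in zip(words, words[1:]):
        prob *= p_bigram(word, prev)
    return prob

print(p_bigram("like", "I"))                 # P(like | I) = 2/3
print(sentence_prob("<s> I like NLP </s>"))  # 1 * 2/3 * 1/2 * 1 = 1/3
```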
Language model and Word embedding model with a neural net
- The first deep neural network architecture for NLP was presented by Bengio et al. (2003) to predict $P(w_t \mid w_{t-1}, \dots, w_{t-n+1})$.
- This model is the prototype of what we now refer to as a word embedding.
There are some issues:
- the softmax layer: normalizing over the whole vocabulary is expensive
- computing power
Classic neural language model (Bengio et al. 2003)
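A simplified forward pass in the spirit of this model, assuming NumPy, hypothetical sizes, and random weights: the n-1 context embeddings are concatenated, passed through a tanh hidden layer, and a softmax over the whole vocabulary gives the next-word distribution. The optional direct input-to-output connections of the original model are omitted:

```python
import numpy as np

rng = np.random.default_rng(0)
V, m, h, n = 10, 4, 8, 3               # hypothetical: vocab, embedding dim, hidden units, n-gram order
C = rng.normal(size=(V, m))            # shared word feature matrix (the word embeddings)
H = rng.normal(size=((n - 1) * m, h))  # context -> hidden weights
d = np.zeros(h)                        # hidden bias
U = rng.normal(size=(h, V))            # hidden -> output weights
b = np.zeros(V)                        # output bias

def next_word_probs(context_ids):
    """P(w_t | w_{t-1}, ..., w_{t-n+1}) for every w_t in the vocabulary."""
    x = np.concatenate([C[i] for i in context_ids])  # concatenate the n-1 context embeddings
    hidden = np.tanh(x @ H + d)
    scores = hidden @ U + b
    scores -= scores.max()             # numerical stability
    e = np.exp(scores)
    return e / e.sum()                 # softmax over all V words: costly when V is large

probs = next_word_probs([3, 7])        # n - 1 = 2 context word ids
```

The last line of `next_word_probs` is the softmax issue noted above: its cost grows linearly with the vocabulary size for every single prediction.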
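Language model and Word embedding model with a neural net
- A model that goes a little further than Bengio et al. is the C&W model (Collobert et al. 2011).
The main variation:
- The cost function is changed: instead of a softmax over the vocabulary, a pairwise ranking objective scores a real window x above a corrupted window $x^{(w)}$ whose centre word is replaced by w:
$J = \sum_{x \in X} \sum_{w \in V} \max\{0,\, 1 - f(x) + f(x^{(w)})\}$
The C&W model without ranking objective (Collobert et al. 2011)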
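The ranking objective for a single window can be sketched in a few lines. The scores passed in are hypothetical numbers standing in for the network's output f(·):

```python
def ranking_loss(score_true, scores_corrupted, margin=1.0):
    """Pairwise ranking (hinge) loss in the style of C&W:
    sum over corrupted windows of max(0, margin - f(x) + f(x^(w)))."""
    return sum(max(0.0, margin - score_true + s) for s in scores_corrupted)

# Hypothetical scores: f(x) for a real window, f(x^(w)) for windows whose
# centre word was replaced by a random word w.
print(ranking_loss(2.0, [0.5, 1.5, 3.0]))  # 0.0 + 0.5 + 2.0 = 2.5
```

A corrupted window only contributes to the loss when it scores within the margin of the real window, so no normalization over the whole vocabulary is needed.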
Language model and Word embedding model with a neural net
- Another way to obtain word vectors with a neural net
- In NLP, pretrained word2vec vectors are typically used for transfer learning, but sometimes we can instead learn word vectors for a specific task inside that task's neural net.
5. Word2Vec, CBOW, skip-gram
Distributional similarity based representations
- You can get a lot of value by representing a word by means of its neighbors.
- One of the most successful ideas of modern statistical NLP
- Example: the words that appear around "banking" are used to represent "banking".
Google's Word2Vec: CBOW, skip-gram
- Goal: a simple (shallow) neural network model
- Learning from a corpus on the scale of billions of words
- Predict the middle word from its neighbors (CBOW), or the neighbors from the middle word (skip-gram), within a fixed-size context window
1. Skip-gram
2. CBOW (continuous bag-of-words)
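Skip-gram
Method: predict the neighbors $w_{t+j}$ given the word $w_t$
It maximizes the following average log probability over a training sequence $w_1, \dots, w_T$ with context size $c$:
$\frac{1}{T}\sum_{t=1}^{T}\sum_{-c \le j \le c,\, j \ne 0} \log p(w_{t+j} \mid w_t)$
where $p(w_O \mid w_I) = \frac{\exp({v'_{w_O}}^{\top} v_{w_I})}{\sum_{w=1}^{W} \exp({v'_{w}}^{\top} v_{w_I})}$, $v_w$ and $v'_w$ are the input and output vectors of word w, and W is the number of words in the vocabulary.
Skip-gram (Mikolov et al. 2013)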
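A minimal NumPy sketch of this objective with the full softmax, assuming hypothetical sizes, random vectors, and a sequence of word ids. Positions outside the sequence are simply skipped at the boundaries:

```python
import numpy as np

rng = np.random.default_rng(0)
W, N = 6, 4                      # hypothetical vocabulary size and vector dimension
v = rng.normal(size=(W, N))      # input vectors v_w (rows of W_IN)
v_out = rng.normal(size=(W, N))  # output vectors v'_w (columns of W'_OUT, stored as rows)

def log_p(w_o, w_i):
    """log p(w_O | w_I) with the full softmax over the vocabulary."""
    scores = v_out @ v[w_i]      # v'_w^T v_{w_I} for every word w
    m = scores.max()
    return scores[w_o] - (m + np.log(np.exp(scores - m).sum()))

def average_log_prob(sentence_ids, c=2):
    """(1/T) * sum_t sum_{-c<=j<=c, j!=0} log p(w_{t+j} | w_t)."""
    T = len(sentence_ids)
    total = 0.0
    for t in range(T):
        for j in range(-c, c + 1):
            if j != 0 and 0 <= t + j < T:
                total += log_p(sentence_ids[t + j], sentence_ids[t])
    return total / T

print(average_log_prob([0, 1, 2, 3, 4, 5]))
```

Training adjusts `v` and `v_out` by gradient ascent on this value. The paper replaces the full softmax with cheaper approximations (hierarchical softmax or negative sampling) for large vocabularies.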
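CBOW
Method: predict the word given a bag of neighbors
Loss function, with $v$ and $v'$ the input and output word vectors and $\hat{v}$ the average of the context word vectors:
$\hat{v} = \frac{1}{2c}\sum_{-c \le j \le c,\, j \ne 0} v_{w_{t+j}}$
$J = -\log p(w_t \mid \text{context}) = -{v'_{w_t}}^{\top}\hat{v} + \log \sum_{w=1}^{W} \exp({v'_{w}}^{\top}\hat{v})$
CBOW (Mikolov et al. 2013)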
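A minimal NumPy sketch of this loss, assuming hypothetical sizes, random vectors, and made-up word ids for the target and its four context words (c = 2):

```python
import numpy as np

rng = np.random.default_rng(1)
W, N = 6, 4                      # hypothetical vocabulary size and vector dimension
v = rng.normal(size=(W, N))      # input (context) vectors
v_out = rng.normal(size=(W, N))  # output vectors

def cbow_loss(context_ids, target_id):
    """J = -v'_{w_t}^T v_hat + log sum_w exp(v'_w^T v_hat), v_hat = mean of context vectors."""
    v_hat = v[context_ids].mean(axis=0)  # bag of neighbors: word order is ignored
    scores = v_out @ v_hat
    m = scores.max()
    log_sum_exp = m + np.log(np.exp(scores - m).sum())
    return -scores[target_id] + log_sum_exp

# Context window c = 2 around the hypothetical target word id 2.
loss = cbow_loss([0, 1, 3, 4], target_id=2)
print(loss)
```

Because the context vectors are averaged, shuffling the neighbors gives the same loss, which is what "bag of neighbors" means.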
Skip-gram & CBOW
$W_{V \times N}$ ($W_{IN}$) and $W'_{N \times V}$ ($W_{OUT}$) are the embedding layers.
N, the inner dimension of these embedding layers, is the dimension of the word2vec vectors.
Let’s see an example of skip-gram
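The training pairs for skip-gram can be generated with a sliding window. The sentence and the window size of 2 below are hypothetical:

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) training pairs for skip-gram."""
    pairs = []
    for t, center in enumerate(tokens):
        for j in range(-window, window + 1):
            if j != 0 and 0 <= t + j < len(tokens):
                pairs.append((center, tokens[t + j]))
    return pairs

pairs = skipgram_pairs("the quick brown fox jumps".split(), window=2)
print(pairs[:3])  # [('the', 'quick'), ('the', 'brown'), ('quick', 'the')]
```

Each pair is one training example: the model sees the center word and is trained to predict the context word. CBOW uses the same windows the other way around, with all context words of a window as one input and the center word as the target.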
Word Analogies with Word2Vec
[king] – [man] + [woman] ≈ [queen]
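The analogy is answered by vector arithmetic followed by a nearest-neighbor search with cosine similarity. The 3-dimensional vectors below are hand-made hypothetical values, not trained word2vec vectors:

```python
import numpy as np

# Hypothetical toy vectors (dimensions loosely: royalty, male, female).
vectors = {
    "king":  np.array([1.0, 1.0, 0.0]),
    "queen": np.array([1.0, 0.0, 1.0]),
    "man":   np.array([0.0, 1.0, 0.0]),
    "woman": np.array([0.0, 0.0, 1.0]),
    "apple": np.array([0.1, 0.2, 0.1]),
}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def analogy(a, b, c):
    """Word closest (by cosine) to vec(a) - vec(b) + vec(c), excluding a, b, c."""
    target = vectors[a] - vectors[b] + vectors[c]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

print(analogy("king", "man", "woman"))  # queen
```

The input words are excluded from the candidates because, with real embeddings, the nearest vector to the result is often one of the inputs themselves.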
6. GloVe
Global statistics of co-occurrence probability
GloVe visualization: Company - CEO, Superlatives
Word2Vec vs GloVe
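GloVe fits word vectors directly to these global co-occurrence counts. A minimal sketch of the weighted least-squares objective from the GloVe paper (Pennington et al., 2014), assuming NumPy, the paper's suggested x_max = 100 and α = 3/4, and hypothetical random counts and vectors:

```python
import numpy as np

def glove_weight(x, x_max=100.0, alpha=0.75):
    """f(X_ij): down-weights rare co-occurrences and caps frequent ones at 1."""
    return np.where(x < x_max, (x / x_max) ** alpha, 1.0)

def glove_loss(X, W, W_tilde, b, b_tilde):
    """J = sum over X_ij > 0 of f(X_ij) * (w_i^T w~_j + b_i + b~_j - log X_ij)^2."""
    i, j = np.nonzero(X)              # f(0) = 0, so zero counts contribute nothing
    x = X[i, j].astype(float)
    pred = np.sum(W[i] * W_tilde[j], axis=1) + b[i] + b_tilde[j]
    return np.sum(glove_weight(x) * (pred - np.log(x)) ** 2)

rng = np.random.default_rng(0)
V, d = 8, 2                           # hypothetical vocabulary size and vector dimension
X = rng.integers(0, 5, size=(V, V))   # hypothetical co-occurrence counts
W, W_tilde = rng.normal(size=(V, d)), rng.normal(size=(V, d))
b, b_tilde = np.zeros(V), np.zeros(V)
print(glove_loss(X, W, W_tilde, b, b_tilde))
```

Unlike skip-gram and CBOW, which stream over local windows, this loss is computed from the whole co-occurrence matrix at once, which is what "global statistics" refers to.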
Reference
Stanford lecture (Online)
CS224n: Natural Language Processing with Deep Learning
- lecture note 1: http://web.stanford.edu/class/cs224n/lecture_notes/cs224n-2017-notes1.pdf
- lecture note 2: http://web.stanford.edu/class/cs224n/lecture_notes/cs224n-2017-notes2.pdf
- lecture note 5: http://web.stanford.edu/class/cs224n/lecture_notes/cs224n-2017-notes5.pdf
- lecture slide 2: http://web.stanford.edu/class/cs224n/lectures/cs224n-2017-lecture2.pdf
CS231n: Convolutional Neural Networks for Visual Recognition
- lecture note: Neural Networks Part 1: Setting up the Architecture, http://cs231n.github.io/neural-networks-1/
- lecture note: Linear classification: Support Vector Machine, Softmax, http://cs231n.github.io/linear-classify/
Private Blog
- Sebastian Ruder's blog: http://ruder.io/word-embeddings-1/index.html#fn:2
- Colah's blog: http://colah.github.io/posts/2014-07-NLP-RNNs-Representations/
ACM International Conference on Web Search and Data Mining
- Neural Text Embeddings for Information Retrieval (WSDM 2017) by Microsoft
Paper
- Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient Estimation of Word Representations in Vector Space. Proceedings of the International Conference on Learning Representations (ICLR 2013), 1–12.
- Mikolov, T., Sutskever, I., Chen, K., Corrado, G., & Dean, J. (2013). Distributed Representations of Words and Phrases and their Compositionality. NIPS, 1–9.

Vitthal Shirke
 
GlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote sessionGlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote session
Globus
 
May Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdfMay Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdf
Adele Miller
 
Cyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdfCyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdf
Cyanic lab
 
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Mind IT Systems
 
Into the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdfInto the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdf
Ortus Solutions, Corp
 
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.ILBeyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Natan Silnitsky
 
Lecture 1 Introduction to games development
Lecture 1 Introduction to games developmentLecture 1 Introduction to games development
Lecture 1 Introduction to games development
abdulrafaychaudhry
 
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Globus
 
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus
 
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
Anthony Dahanne
 
RISE with SAP and Journey to the Intelligent Enterprise
RISE with SAP and Journey to the Intelligent EnterpriseRISE with SAP and Journey to the Intelligent Enterprise
RISE with SAP and Journey to the Intelligent Enterprise
Srikant77
 
Graphic Design Crash Course for beginners
Graphic Design Crash Course for beginnersGraphic Design Crash Course for beginners
Graphic Design Crash Course for beginners
e20449
 

Recently uploaded (20)

Accelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with PlatformlessAccelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with Platformless
 
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
 
Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604
 
Large Language Models and the End of Programming
Large Language Models and the End of ProgrammingLarge Language Models and the End of Programming
Large Language Models and the End of Programming
 
A Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of PassageA Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of Passage
 
A Comprehensive Look at Generative AI in Retail App Testing.pdf
A Comprehensive Look at Generative AI in Retail App Testing.pdfA Comprehensive Look at Generative AI in Retail App Testing.pdf
A Comprehensive Look at Generative AI in Retail App Testing.pdf
 
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisProviding Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
 
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume MontevideoVitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume Montevideo
 
GlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote sessionGlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote session
 
May Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdfMay Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdf
 
Cyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdfCyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdf
 
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
 
Into the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdfInto the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdf
 
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.ILBeyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
 
Lecture 1 Introduction to games development
Lecture 1 Introduction to games developmentLecture 1 Introduction to games development
Lecture 1 Introduction to games development
 
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
 
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024
 
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
 
RISE with SAP and Journey to the Intelligent Enterprise
RISE with SAP and Journey to the Intelligent EnterpriseRISE with SAP and Journey to the Intelligent Enterprise
RISE with SAP and Journey to the Intelligent Enterprise
 
Graphic Design Crash Course for beginners
Graphic Design Crash Course for beginnersGraphic Design Crash Course for beginners
Graphic Design Crash Course for beginners
 

Word2Vec

  • 1. Hyunyoung Lee Seminar for NLP labs Word2Vec
  • 2. Agenda: 1. Word Embedding (vectorization of images and text); 2. Word2Vec (one-hot vectors and co-occurrence matrices as word vectors); 3. Fundamentals (basic components of word embedding in a neural net); 4. Word Vectors in a Neural Net; 5. Word2Vec, CBOW and Skip-gram (comparing image processing with word vectors in terms of vector representation); 6. GloVe
  • 3. Image vector representation (1. Word Embedding): an image can be represented by the RGB values of every pixel, i.e. Height × Width × 3 (R, G, B) values laid out in a single row. So it is easy to turn an image into a vector in some space, i.e. RGB space.
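As a minimal sketch of the flattening described on slide 3, the function below turns an image into one row vector of length Height × Width × 3. It assumes the image is a nested list of (R, G, B) tuples. The 2 × 2 image is hypothetical.

```python
def flatten_image(pixels):
    """Flatten an H x W image of (R, G, B) tuples into one row vector of length H*W*3."""
    return [channel for row in pixels for pixel in row for channel in pixel]

# A hypothetical 2 x 2 image: each pixel is an (R, G, B) triple in 0..255.
image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]
vector = flatten_image(image)
print(len(vector))  # 12, i.e. 2 * 2 * 3
print(vector[:6])   # [255, 0, 0, 0, 255, 0]
```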
  • 4. What is word embedding? (1. Word Embedding) In NLP tasks before neural nets, a word vector was represented by word frequencies, e.g. TF-IDF. With neural nets, there have been multiple approaches to word vector representation: language modeling and word embedding modeling.
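The frequency-based representation mentioned on slide 4 can be sketched as TF-IDF. This assumes one common variant, tf = count / document length and idf = log(N / df). Many other weightings exist. The three-document corpus is hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document, with tf = count/len and idf = log(N/df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # df: the number of documents that contain each term at least once
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({t: (c / len(tokens)) * math.log(n_docs / df[t]) for t, c in counts.items()})
    return weights

docs = ["the cat sat", "the dog sat", "the cat ran"]
w = tf_idf(docs)
print(w[0]["the"])           # 0.0, because "the" occurs in every document
print(round(w[1]["dog"], 4))  # 0.3662, i.e. (1/3) * log(3)
```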
  • 5. One-hot representation (2. Word2Vec): the dimension is |V|, where |V| is the size of the vocabulary. Each word is a vector with a single 1 (e.g. motel and hotel each have their 1 at a different position). If a user searches for [Seattle motel], we want the search engine to also match web pages containing “Seattle hotel”. However, Similarity(motel, hotel) = motel^T hotel = 0. The inner product of two different one-hot vectors is always 0, so one-hot vectors cannot capture similarity between words.
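A minimal sketch of the motel/hotel problem on slide 5: any two different one-hot vectors have an inner product of 0. The four-word vocabulary is a hypothetical stand-in for a real |V|.

```python
def one_hot(word, vocab):
    """Return a |V|-dimensional vector with a single 1 at the word's index."""
    vec = [0] * len(vocab)
    vec[vocab.index(word)] = 1
    return vec

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

vocab = ["seattle", "motel", "hotel", "flying"]  # hypothetical toy vocabulary
motel, hotel = one_hot("motel", vocab), one_hot("hotel", vocab)
print(motel)              # [0, 1, 0, 0]
print(dot(motel, hotel))  # 0, because distinct one-hot vectors are always orthogonal
print(dot(motel, motel))  # 1
```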
  • 6. Co-occurrence matrix (2. Word2Vec): let's look at a window-based co-occurrence matrix with a window size of 1. Example corpus: "I like deep learning." "I like NLP." "I enjoy flying." The total vocabulary size |V| is 8, with the vocabulary ordered as [I, like, enjoy, deep, learning, NLP, flying, .]. Then Vector(“I”) = [0, 2, 1, 0, 0, 0, 0, 0], Vector(“like”) = [2, 0, 0, 1, 0, 1, 0, 0], …
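The vectors on slide 6 can be reproduced with a small counting sketch. It assumes a symmetric window of 1 word, counts only within each sentence, and treats "." as a token. These choices match the counts shown on the slide.

```python
from collections import defaultdict

def cooccurrence(sentences, window=1):
    """Count how often each ordered pair of tokens appears within `window` positions in one sentence."""
    counts = defaultdict(int)
    for sent in sentences:
        for i, w in enumerate(sent):
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[(w, sent[j])] += 1
    return counts

corpus = [s.split() for s in ["I like deep learning .", "I like NLP .", "I enjoy flying ."]]
vocab = ["I", "like", "enjoy", "deep", "learning", "NLP", "flying", "."]
counts = cooccurrence(corpus, window=1)

def vector(word):
    """One row of the |V| x |V| co-occurrence matrix."""
    return [counts[(word, other)] for other in vocab]

print(vector("I"))     # [0, 2, 1, 0, 0, 0, 0, 0]
print(vector("like"))  # [2, 0, 0, 1, 0, 1, 0, 0]
```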
  • 7. Co-occurrence with SVD (2. Word2Vec): the co-occurrence matrix can be reduced to low-dimensional word vectors with SVD (Singular Value Decomposition), but: - the computation is expensive: for an m × n matrix it costs O(mn²) (when n < m); - SVD-based methods don't scale well to big matrices, and it is hard to incorporate new words or documents.
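Below is a dependency-free sketch of truncated SVD. Power iteration with deflation on G = XᵀX stands in for a library routine such as `numpy.linalg.svd`. It assumes only the top k singular values and vectors are needed, and that these singular values are distinct. The reduced word vectors are the rows of X V_k (equal to U_k Σ_k).

```python
import math
import random

def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def truncated_svd(x, k, iters=500, seed=0):
    """Top-k singular values and right singular vectors of x, by power iteration on G = X^T X.

    Forming G alone takes O(m * n^2) work for an m x n matrix, the same order as a full SVD.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(x), len(x[0])
    gram = [[sum(x[r][i] * x[r][j] for r in range(n_rows)) for j in range(n_cols)]
            for i in range(n_cols)]
    sigmas, vectors = [], []
    for _ in range(k):
        v = [rng.random() for _ in range(n_cols)]
        for _ in range(iters):
            w = matvec(gram, v)
            norm = math.sqrt(sum(a * a for a in w))
            if norm == 0.0:  # the remaining rank is exhausted
                break
            v = [a / norm for a in w]
        lam = sum(a * b for a, b in zip(v, matvec(gram, v)))  # eigenvalue of G = sigma^2
        sigmas.append(math.sqrt(max(lam, 0.0)))
        vectors.append(v)
        # Deflation: remove the component just found so the next pass finds the next one.
        gram = [[gram[i][j] - lam * v[i] * v[j] for j in range(n_cols)] for i in range(n_cols)]
    return sigmas, vectors

def reduced_word_vectors(x, k):
    """Row i of X V_k (= U_k Sigma_k) is a k-dimensional vector for word i."""
    _, vs = truncated_svd(x, k)
    return [[sum(a * b for a, b in zip(row, v)) for v in vs] for row in x]

sigmas, _ = truncated_svd([[3.0, 0.0], [0.0, 1.0]], k=2)
print([round(s, 6) for s in sigmas])  # [3.0, 1.0]
```

Each new word or document changes X and forces the whole decomposition to be recomputed, which is the scaling problem the slide points out.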
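  • 8. Feedforward neural network (basic neural network) (3. Fundamentals): the output layer's values are interpreted either as scores or as probabilities. Backpropagation adjusts the weights so that an objective computed from these values is maximized or minimized (e.g. maximizing the probability of the correct output).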
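To make "backpropagation maximizes or minimizes those values" concrete, here is a minimal sketch of one linear output layer with softmax and cross-entropy. It uses the standard gradient dL/dW[k][i] = (p_k − 1[k = target]) · x[i]. The weights, input and learning rate are hypothetical.

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def forward(weights, x):
    """Output-layer scores = W x, turned into probabilities by softmax."""
    return softmax([sum(w * xi for w, xi in zip(row, x)) for row in weights])

def train_step(weights, x, target, lr=0.5):
    """One gradient-descent step on loss = -log p[target]; updates weights in place."""
    probs = forward(weights, x)
    loss = -math.log(probs[target])
    # Backpropagation through softmax + cross-entropy: dL/dscore_k = p_k - 1[k == target],
    # and dL/dW[k][i] = (p_k - 1[k == target]) * x[i].
    for k, row in enumerate(weights):
        g = probs[k] - (1.0 if k == target else 0.0)
        for i in range(len(row)):
            row[i] -= lr * g * x[i]
    return loss

weights = [[0.1, -0.2], [0.0, 0.3], [-0.1, 0.1]]  # hypothetical: 3 outputs x 2 inputs
x, target = [1.0, 2.0], 0
losses = [train_step(weights, x, target) for _ in range(20)]
print(losses[0] > losses[-1])  # True: the probability of the target output goes up
```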
  • 9. Components (3. Fundamentals): - Embedding layer (inner product): maps each input word to its vector. - Intermediate layer(s): one or more layers that produce an intermediate representation of the input, for example a hidden layer with a tanh or sigmoid activation, or an RNN (LSTM, GRU) as used in state-of-the-art neural language models. - Softmax layer: the final layer, which computes a probability distribution over all words in the vocabulary.
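The three components on slide 9 can be sketched as one forward pass. It shows that the embedding layer's inner product with a one-hot vector is just a row lookup. It assumes a single tanh hidden layer and no biases. All sizes and random weights are hypothetical.

```python
import math
import random

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def matvec_t(matrix, vec):
    """Compute vec^T @ matrix, where vec has len(matrix) entries."""
    return [sum(vec[r] * matrix[r][c] for r in range(len(matrix))) for c in range(len(matrix[0]))]

rng = random.Random(0)
V, N, H = 5, 3, 4  # hypothetical vocabulary size, embedding size, hidden size
E = [[rng.uniform(-1, 1) for _ in range(N)] for _ in range(V)]    # embedding layer, V x N
W_h = [[rng.uniform(-1, 1) for _ in range(H)] for _ in range(N)]  # intermediate layer, N x H
W_o = [[rng.uniform(-1, 1) for _ in range(V)] for _ in range(H)]  # softmax layer, H x V

word_id = 2
one_hot = [1.0 if i == word_id else 0.0 for i in range(V)]
embedding = matvec_t(E, one_hot)          # inner product with a one-hot vector...
print(embedding == E[word_id])            # True: ...is just a row lookup
hidden = [math.tanh(z) for z in matvec_t(W_h, embedding)]
probs = softmax(matvec_t(W_o, hidden))    # distribution over all V words
print(len(probs), round(sum(probs), 6))   # 5 1.0
```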
  • 10. Language model and word embedding model with a neural net (4. Word Vector in a Neural Net):
    - The main purpose of a language model is to compute the probability of a sentence or sequence of words, $P(W) = P(w_1, \dots, w_m)$, or the probability of an upcoming word, $P(w_m \mid w_1, \dots, w_{m-1})$. A model that computes either of these is called a language model (LM).
    - The chain rule gives the joint probability of the words in a sentence: $P(w_1, \dots, w_m) = \prod_{t=1}^{m} P(w_t \mid w_1, \dots, w_{t-1})$. For example, P("its water is so transparent") = P(its) * P(water | its) * P(is | its water) * P(so | its water is) * P(transparent | its water is so).
    - Markov assumption: each word is conditioned only on a window of the n−1 previous words, $P(w_t \mid w_1, \dots, w_{t-1}) \approx P(w_t \mid w_{t-1}, \dots, w_{t-n+1})$. By the Markov assumption, the probability of the above sentence is approximately P(its) * P(water | its) * P(is | water) * P(so | is) * P(transparent | so) (bigram) OR P(its) * P(water | its) * P(is | its water) * P(so | water is) * P(transparent | is so) (trigram).
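A small sketch of the factorizations on slide 10. It prints which conditional probabilities are multiplied under the full chain rule and under an n-gram Markov assumption. The `factors` helper is hypothetical and only builds the strings; it does not estimate anything.

```python
def factors(words, n=None):
    """List the factors P(w_i | context) whose product gives P(w_1..w_m).

    n=None keeps the full history (chain rule); an integer n applies the Markov
    assumption and keeps only the previous n-1 words (n=2: bigram, n=3: trigram).
    """
    out = []
    for i, w in enumerate(words):
        start = 0 if n is None else max(0, i - (n - 1))
        context = " ".join(words[start:i])
        out.append(f"P({w} | {context})" if context else f"P({w})")
    return out

sentence = "its water is so transparent".split()
print(" * ".join(factors(sentence)))       # full chain rule
print(" * ".join(factors(sentence, n=2)))  # bigram approximation
print(" * ".join(factors(sentence, n=3)))  # trigram approximation
```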
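  • 11. Language model and word embedding model with a neural net (4. Word Vector in a Neural Net): how do we estimate these probabilities? In an N-gram based language model they are estimated from counts, for example bigram: $P(w_t \mid w_{t-1}) = \frac{\mathrm{count}(w_{t-1}, w_t)}{\mathrm{count}(w_{t-1})}$; trigram: $P(w_t \mid w_{t-2}, w_{t-1}) = \frac{\mathrm{count}(w_{t-2}, w_{t-1}, w_t)}{\mathrm{count}(w_{t-2}, w_{t-1})}$.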
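The bigram estimate on slide 11 can be sketched as maximum-likelihood counting. The sentence-boundary markers `<s>` and `</s>` and the three-sentence toy corpus are assumptions, not part of the slide. Unseen bigrams get probability 0 because no smoothing is applied.

```python
from collections import Counter

def train_bigram(sentences):
    """MLE bigram model: P(w | prev) = count(prev, w) / count(prev)."""
    history, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        history.update(tokens[:-1])              # counts of each word used as a history
        bigrams.update(zip(tokens, tokens[1:]))

    def prob(prev, w):
        return bigrams[(prev, w)] / history[prev] if history[prev] else 0.0
    return prob

def sentence_prob(p, sentence):
    """Chain rule under the bigram Markov assumption."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    result = 1.0
    for prev, w in zip(tokens, tokens[1:]):
        result *= p(prev, w)
    return result

corpus = ["I am Sam", "Sam I am", "I do not like green eggs and ham"]  # hypothetical toy corpus
p = train_bigram(corpus)
print(p("<s>", "I"))                  # 2/3
print(p("I", "am"))                   # 2/3
print(sentence_prob(p, "I am Sam"))   # 1/9, about 0.111
```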
  • 12. Classic neural language model (Bengio et al. 2003) (4. Word Vector in a Neural Net): the first deep neural network architecture for NLP, presented by Bengio et al. (2003) to predict $P(w_t \mid w_{t-1}, \dots, w_{t-n+1})$. This model is the prototype of what we now refer to as a word embedding. Its issues: the softmax layer (normalizing over the whole vocabulary is expensive) and the computing power it requires.
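A simplified sketch of the Bengio-style forward pass. The embeddings of the n−1 previous words are concatenated, passed through a tanh hidden layer, and fed to a softmax over the whole vocabulary. Biases and the optional direct input-to-output connections of the original model are left out. All sizes and weights are hypothetical. The output layer does O(|V| · h) work per prediction, which is the softmax cost noted on the slide.

```python
import math
import random

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def nplm_forward(context_ids, C, H, U):
    """Concatenate embeddings C[w] of the n-1 previous words, apply tanh layer H, then softmax layer U."""
    x = [value for w in context_ids for value in C[w]]              # (n-1)*m inputs
    h = [math.tanh(sum(hw * xi for hw, xi in zip(row, x))) for row in H]
    scores = [sum(uw * hi for uw, hi in zip(row, h)) for row in U]  # one score per word: O(|V| * h)
    return softmax(scores)

rng = random.Random(1)
V, m, n, hidden = 10, 4, 3, 8  # hypothetical: vocab size, embedding dim, n-gram order, hidden units
C = [[rng.gauss(0, 0.1) for _ in range(m)] for _ in range(V)]
H = [[rng.gauss(0, 0.1) for _ in range((n - 1) * m)] for _ in range(hidden)]
U = [[rng.gauss(0, 0.1) for _ in range(hidden)] for _ in range(V)]

probs = nplm_forward([3, 7], C, H, U)  # P(w_t | w_{t-2}=3, w_{t-1}=7)
print(len(probs), abs(sum(probs) - 1) < 1e-9)  # 10 True
```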
  • 13. The C&W model without ranking objective (Collobert et al. 2011) (4. Word Vector in a Neural Net): a slightly more advanced model than Bengio et al.'s is the C&W model (2011). The main variation is that it changes the cost function, as shown in the figure.
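The C&W change to the cost function replaces the softmax with a pairwise ranking (hinge) criterion. A correct text window should score at least 1 higher than the same window with its middle word replaced. The sketch below assumes a hypothetical linear scorer and loops over the whole tiny vocabulary. Collobert et al. use a deeper scorer and sample corrupt words instead.

```python
import random

def window_score(window_ids, C, w):
    """Hypothetical linear scorer f(x): weights dotted with the concatenated window embeddings."""
    x = [v for i in window_ids for v in C[i]]
    return sum(a * b for a, b in zip(w, x))

def ranking_loss(window_ids, C, w, vocab_size):
    """Sum over corrupt middle words c of max(0, 1 - f(x) + f(x_c))."""
    center = len(window_ids) // 2
    good = window_score(window_ids, C, w)
    loss = 0.0
    for c in range(vocab_size):
        if c == window_ids[center]:
            continue
        corrupted = list(window_ids)
        corrupted[center] = c  # replace only the middle word
        loss += max(0.0, 1.0 - good + window_score(corrupted, C, w))
    return loss

rng = random.Random(0)
V, dim, win = 6, 2, 3  # hypothetical sizes
C = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(V)]
w = [rng.uniform(-1, 1) for _ in range(dim * win)]
print(ranking_loss([1, 4, 2], C, w, V) >= 0)  # True
```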
  • 14. Another way to build word vectors with a neural net (4. Word Vector in a Neural Net): in NLP, transfer learning usually means reusing pre-trained word2vec vectors, but we can also learn word vectors for a specific task by training a neural net on that task.
  • 15. Distributional similarity based representations (5. Word2Vec, CBOW, Skip-gram): you can get a lot of value by representing a word by means of its neighbors. This is one of the most successful ideas of modern statistical NLP. (Example word in the figure: banking.)
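A minimal sketch of "representing a word by means of its neighbors". Each word becomes a bag of context-word counts within a window of 2, and words are compared by cosine similarity. Unlike one-hot vectors, motel and hotel now come out as similar because they share neighbors. The toy corpus is hypothetical.

```python
import math
from collections import Counter

def context_vectors(sentences, window=2):
    """Represent each word by the counts of words appearing within `window` positions of it."""
    vecs = {}
    for sent in sentences:
        tokens = sent.split()
        for i, w in enumerate(tokens):
            ctx = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            vecs.setdefault(w, Counter()).update(ctx)
    return vecs

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a)  # Counter returns 0 for missing keys
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb)

corpus = [  # hypothetical toy corpus
    "we booked a cheap motel in seattle",
    "we booked a cheap hotel in seattle",
    "birds enjoy flying over the sea",
]
vecs = context_vectors(corpus)
print(cosine(vecs["motel"], vecs["hotel"]))   # 1.0: identical neighbors
print(cosine(vecs["motel"], vecs["flying"]))  # 0.0: no shared neighbors
```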
  • 16. Google’s Word2Vec: CBOW and skip-gram (5. Word2Vec, CBOW, Skip-gram). Goal: a simple (shallow) neural network model that learns from a billion-word-scale corpus by predicting the middle word from its neighbors (CBOW), or the neighbors from the middle word (skip-gram), within a fixed-size context window. 1. Skip-gram 2. CBOW (continuous bag-of-words)
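The two training setups on slide 16 differ only in what is predicted from what. This sketch builds training examples from one tokenized sentence, assuming a symmetric window. Skip-gram yields (center, neighbor) pairs, and CBOW yields (bag of neighbors, center) examples. The sentence is hypothetical.

```python
def skipgram_pairs(tokens, window=2):
    """Skip-gram examples: (center word, one neighbor) for each neighbor in the window."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

def cbow_examples(tokens, window=2):
    """CBOW examples: (bag of neighbors, center word), i.e. predict the middle word."""
    examples = []
    for i, center in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        examples.append((context, center))
    return examples

tokens = "the quick brown fox jumps".split()
print(skipgram_pairs(tokens, window=1)[:3])  # [('the', 'quick'), ('quick', 'the'), ('quick', 'brown')]
print(cbow_examples(tokens, window=1)[2])    # (['quick', 'fox'], 'brown')
```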
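  • 17. Skip-gram (Mikolov et al. 2013) (5. Word2Vec, CBOW, Skip-gram). Method: predict each neighbor $w_{t+j}$ given the word $w_t$. Skip-gram maximizes the following average log probability: $\frac{1}{T} \sum_{t=1}^{T} \sum_{-c \le j \le c,\, j \ne 0} \log p(w_{t+j} \mid w_t)$, where c is the context window size and $p(w_O \mid w_I) = \frac{\exp({v'_{w_O}}^{\top} v_{w_I})}{\sum_{w=1}^{W} \exp({v'_{w}}^{\top} v_{w_I})}$, with W the vocabulary size.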
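A direct sketch of the skip-gram objective with the full softmax. It assumes that window positions falling outside the sequence are skipped. `in_vecs` holds the v vectors and `out_vecs` the v′ vectors. Both are hypothetical random values here.

```python
import math
import random

def log_softmax_prob(out_vecs, in_vec, target):
    """log p(w_O | w_I) = v'_{w_O} . v_{w_I} - log sum_w exp(v'_w . v_{w_I})  (full softmax)."""
    scores = [sum(a * b for a, b in zip(u, in_vec)) for u in out_vecs]
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return scores[target] - log_z

def skipgram_objective(token_ids, in_vecs, out_vecs, c=2):
    """(1/T) * sum_t sum_{-c<=j<=c, j!=0} log p(w_{t+j} | w_t)."""
    T = len(token_ids)
    total = 0.0
    for t in range(T):
        for j in range(-c, c + 1):
            if j != 0 and 0 <= t + j < T:
                total += log_softmax_prob(out_vecs, in_vecs[token_ids[t]], token_ids[t + j])
    return total / T

rng = random.Random(0)
V, dim = 6, 3  # hypothetical vocabulary size and vector dimension
in_vecs = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(V)]
out_vecs = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(V)]
print(skipgram_objective([0, 1, 2, 3], in_vecs, out_vecs, c=1) <= 0)  # True: a sum of log probabilities
```

Training maximizes this value. Because the denominator sums over all W words, practical implementations replace it with hierarchical softmax or negative sampling.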
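  • 18. CBOW (Mikolov et al. 2013) (5. Word2Vec, CBOW, Skip-gram). Method: predict the word given its bag of neighbors. Loss function: $J = -\log p(w_t \mid w_{t-c}, \dots, w_{t+c}) = -{v'_{w_t}}^{\top} \hat{v} + \log \sum_{w=1}^{W} \exp({v'_{w}}^{\top} \hat{v})$, where $\hat{v} = \frac{1}{2c} \sum_{-c \le j \le c,\, j \ne 0} v_{w_{t+j}}$ is the average of the input vectors of the context words.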
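A minimal sketch of the CBOW loss for a single example. The input vectors of the context words are averaged into v̂, and the loss is the negative log softmax probability of the center word. `V_in` holds the v vectors and `U_out` the v′ vectors. Their values are hypothetical.

```python
import math

def cbow_loss(context_ids, center_id, V_in, U_out):
    """J = -v'_center . v_hat + log sum_w exp(v'_w . v_hat), with v_hat the mean context input vector."""
    dim = len(V_in[0])
    v_hat = [sum(V_in[i][d] for i in context_ids) / len(context_ids) for d in range(dim)]
    scores = [sum(u[d] * v_hat[d] for d in range(dim)) for u in U_out]
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return -scores[center_id] + log_z

# Hypothetical 1-dimensional vectors for a 2-word vocabulary.
V_in = [[1.0], [3.0]]
U_out = [[1.0], [-1.0]]
print(cbow_loss([0, 1], 0, V_in, U_out))  # log(1 + e^-4), about 0.018
```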
  • 19. Skip-gram & CBOW (5. Word2Vec, CBOW, Skip-gram): $W_{V \times N}$ ($W_{in}$) and $W'_{N \times V}$ ($W_{out}$) are the embedding layers, where V is the vocabulary size. N, the size of these embedding layers, is the dimension of the word2vec vectors.
  • 20. Let’s see an example of skip-gram (5. Word2Vec, CBOW, Skip-gram).
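A small end-to-end skip-gram example, as a minimal sketch. It uses W_in (V × N, whose rows become the word vectors) and W_out (N × V), a full softmax, and plain SGD on the corpus from slide 6 without the "." token. The hyperparameters are hypothetical. Real word2vec uses much larger corpora and negative sampling or hierarchical softmax.

```python
import math
import random

def train_skipgram(sentences, dim=5, window=1, epochs=200, lr=0.1, seed=0):
    """Tiny skip-gram with a full softmax, trained by SGD. W_in is |V| x N, W_out is N x |V|."""
    rng = random.Random(seed)
    vocab = sorted({w for s in sentences for w in s.split()})
    idx = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    W_in = [[rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for _ in range(V)]
    W_out = [[rng.uniform(-0.5, 0.5) / dim for _ in range(V)] for _ in range(dim)]
    pairs = []  # (center id, neighbor id) training pairs
    for s in sentences:
        ids = [idx[w] for w in s.split()]
        for t, center in enumerate(ids):
            for j in range(max(0, t - window), min(len(ids), t + window + 1)):
                if j != t:
                    pairs.append((center, ids[j]))
    history = []  # average loss per epoch
    for _ in range(epochs):
        total = 0.0
        for center, ctx in pairs:
            h = W_in[center]  # hidden layer = row lookup of the center word
            scores = [sum(h[d] * W_out[d][k] for d in range(dim)) for k in range(V)]
            m = max(scores)
            exps = [math.exp(s - m) for s in scores]
            z = sum(exps)
            probs = [e / z for e in exps]
            total -= math.log(probs[ctx])
            err = [p - (1.0 if k == ctx else 0.0) for k, p in enumerate(probs)]  # dL/dscores
            grad_h = [sum(W_out[d][k] * err[k] for k in range(V)) for d in range(dim)]  # uses old W_out
            for d in range(dim):
                for k in range(V):
                    W_out[d][k] -= lr * h[d] * err[k]
            for d in range(dim):
                W_in[center][d] -= lr * grad_h[d]
        history.append(total / len(pairs))
    return vocab, W_in, history

vocab, W_in, history = train_skipgram(["I like deep learning", "I like NLP", "I enjoy flying"])
print(round(history[0], 3), round(history[-1], 3))  # average loss in the first and last epoch
```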
  • 21. Word analogies with Word2Vec (5. Word2Vec, CBOW, Skip-gram): [king] – [man] + [woman] ≈ [queen]
  • 22. Word analogies with Word2Vec, continued (5. Word2Vec, CBOW, Skip-gram): [king] – [man] + [woman] ≈ [queen]
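The analogy [king] – [man] + [woman] ≈ [queen] is usually answered by vector arithmetic followed by a cosine nearest-neighbor search that excludes the three query words. This is a minimal sketch. The 3-dimensional vectors are hand-made and hypothetical, not trained word2vec output.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def analogy(a, b, c, vectors):
    """Return the word d maximizing cos(v_d, v_b - v_a + v_c), excluding a, b and c."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(vectors[w], target))

# Hypothetical 3-d vectors: the dimensions loosely mean (royalty, male, female).
vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.2, 0.2],
}
print(analogy("man", "king", "woman", vectors))  # queen
```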
  • 23. Global statistics of co-occurrence probability (6. GloVe)
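The slide gives no formula, so the following is taken from the GloVe paper (Pennington et al., 2014). GloVe fits word vectors to global co-occurrence counts by minimizing J = Σ f(X_ij)(w_iᵀw̃_j + b_i + b̃_j − log X_ij)² over nonzero X_ij, with f(x) = (x / x_max)^α below x_max and 1 otherwise. The paper uses x_max = 100 and α = 3/4. This sketch only evaluates J and does not train anything. The toy matrix and zero parameters are hypothetical.

```python
import math

def glove_weight(x, x_max=100.0, alpha=0.75):
    """f(x) = (x / x_max)^alpha if x < x_max else 1."""
    return (x / x_max) ** alpha if x < x_max else 1.0

def glove_loss(X, W, W_tilde, b, b_tilde):
    """J = sum over nonzero X_ij of f(X_ij) * (w_i . w~_j + b_i + b~_j - log X_ij)^2."""
    J = 0.0
    for i, row in enumerate(X):
        for j, x in enumerate(row):
            if x > 0:  # log X_ij is undefined for zero counts, so those pairs are skipped
                pred = sum(p * q for p, q in zip(W[i], W_tilde[j])) + b[i] + b_tilde[j]
                J += glove_weight(x) * (pred - math.log(x)) ** 2
    return J

X = [[0.0, 2.0], [2.0, 0.0]]  # hypothetical co-occurrence counts
zeros = [[0.0, 0.0], [0.0, 0.0]]
print(glove_loss(X, zeros, zeros, [0.0, 0.0], [0.0, 0.0]))  # 2 * f(2) * (log 2)^2
```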
  • 24. Global statistics of co-occurrence probability (6. GloVe). GloVe visualizations: Company – CEO, Superlatives.
  • 25. Word2Vec vs GloVe (6. GloVe)
  • 26. References
    Stanford lecture (online) CS224n: Natural Language Processing with Deep Learning
    - lecture note 1: http://web.stanford.edu/class/cs224n/lecture_notes/cs224n-2017-notes1.pdf
    - lecture note 2: http://web.stanford.edu/class/cs224n/lecture_notes/cs224n-2017-notes2.pdf
    - lecture note 5: http://web.stanford.edu/class/cs224n/lecture_notes/cs224n-2017-notes5.pdf
    - lecture slide 2: http://web.stanford.edu/class/cs224n/lectures/cs224n-2017-lecture2.pdf
    CS231n: Convolutional Neural Networks for Visual Recognition
    - lecture note: Neural Networks Part 1: Setting up the Architecture, http://cs231n.github.io/neural-networks-1/
    - lecture note: Linear classification: Support Vector Machine, Softmax, http://cs231n.github.io/linear-classify/
    Private blogs
    - Sebastian Ruder's blog: http://ruder.io/word-embeddings-1/index.html#fn:2
    - Colah's blog: http://colah.github.io/posts/2014-07-NLP-RNNs-Representations/
    ACM International Conference on Web Search and Data Mining
    - Neural Text Embeddings for Information Retrieval (WSDM 2017), by Microsoft
    Papers
    - Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient Estimation of Word Representations in Vector Space. Proceedings of the International Conference on Learning Representations (ICLR 2013), 1–12.
    - Mikolov, T., Sutskever, I., Chen, K., Corrado, G., & Dean, J. (2013). Distributed Representations of Words and Phrases and their Compositionality. NIPS, 1–9.