Distributed representation of sentences and documents

Distributed Representations of Words and
Phrases and their Compositionality
Abdullah Khan Zehady

Neural Word Embedding
● Continuous vector space representation
o Words are represented as dense real-valued vectors in R^d
● Distributed word representation ↔ Word Embedding
o Embed an entire vocabulary into a relatively low-dimensional linear
space where dimensions are latent continuous features.
● Classical n-gram models work in terms of discrete units
o Words are atomic symbols, so an n-gram model encodes no inherent relationship between them.
● In contrast, word embeddings capture regularities and relationships
between words.

Syntactic & Semantic Relationship
Regularities are observed as a roughly constant offset vector between
pairs of words that share some relationship.
Gender Relation
KING - QUEEN ~ MAN - WOMAN
Singular/Plural Relation
KING - KINGS ~ QUEEN - QUEENS
Other Relations:

Language
France - French ~ Spain - Spanish

Past Tense
Go - Went ~ Capture - Captured
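
The constant-offset idea can be sketched in a few lines of Python. The vectors below are hand-built 3-dimensional toys (assumed axes: royalty, gender, plural), not learned embeddings, and the helper name `analogy` is hypothetical. It answers "a is to b as c is to ?" by looking for the word closest to b - a + c under cosine similarity.

```python
import math

# Toy vectors, hand-built for illustration only (not learned embeddings).
# Assumed axes: [royalty, gender, plural].
vectors = {
    "king":   [1.0,  1.0, 0.0],
    "queen":  [1.0, -1.0, 0.0],
    "man":    [0.0,  1.0, 0.0],
    "woman":  [0.0, -1.0, 0.0],
    "kings":  [1.0,  1.0, 1.0],
    "queens": [1.0, -1.0, 1.0],
}

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def analogy(a, b, c):
    """Return the word whose vector is closest to vec(b) - vec(a) + vec(c)."""
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    # Exclude the three query words, as is common in analogy evaluation.
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

print(analogy("man", "king", "woman"))   # queen
print(analogy("king", "kings", "queen")) # queens
```

With real embeddings the offsets are only approximately equal, so the nearest neighbour is a best guess rather than an exact match.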

Language Model (LM)

Different models for estimating continuous representations of words.

Latent Semantic Analysis (LSA)

Latent Dirichlet Allocation (LDA)

Neural Network Language Model (NNLM)

Feed Forward NNLM

Consists of input, projection, hidden and output layers.

The N previous words are encoded using 1-of-V coding, where V is the size of the
vocabulary. Ex: for a 26-symbol vocabulary, A = (1,0,...,0), B = (0,1,...,0), ..., Z = (0,0,...,1) in R^26
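
A minimal sketch of 1-of-V coding for the 26-letter example, assuming a vocabulary stored as a plain Python list; the helper name `one_hot` is hypothetical. It also shows that any two distinct one-hot vectors have a dot product of zero, which is why this encoding by itself carries no notion of similarity.

```python
def one_hot(word, vocab):
    """Return the 1-of-V vector for `word`, where V = len(vocab)."""
    vec = [0] * len(vocab)
    vec[vocab.index(word)] = 1
    return vec

# Vocabulary of the 26 letters A..Z, so V = 26.
vocab = [chr(ord("A") + i) for i in range(26)]

a = one_hot("A", vocab)
b = one_hot("B", vocab)
z = one_hot("Z", vocab)

print(a[:3], z[-3:])                       # [1, 0, 0] [0, 0, 1]
print(sum(x * y for x, y in zip(a, b)))    # 0
```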

The NNLM becomes computationally expensive between the projection (P) and
hidden (H) layers, because the values in the projection layer are dense.

For N = 10, the size of P is typically 500-2000 and the size of H is 500-1000.

The hidden layer is used to compute a probability distribution over all the
words in the vocabulary, so the output layer has dimensionality V.

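Hierarchical softmax to the rescue: the vocabulary is represented as a binary tree, so only about log2(V) output units need to be evaluated.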
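
To see where the cost goes, the per-example complexity of the feed-forward NNLM is usually written as Q = N×D + N×D×H + H×V, where D is the embedding dimensionality (so P = N×D). The sketch below evaluates the three terms with and without hierarchical softmax. The concrete values D = 100 and V = 1,000,000 are assumptions chosen for illustration, and the function name `nnlm_cost` is hypothetical.

```python
import math

def nnlm_cost(n, d, h, v, hierarchical=False):
    """Return the (projection, hidden, output) cost terms per training example."""
    projection = n * d                     # N x D
    hidden = n * d * h                     # N x D x H
    # A full softmax evaluates V output units; hierarchical softmax about log2(V).
    output_units = math.ceil(math.log2(v)) if hierarchical else v
    output = h * output_units              # H x V, or H x log2(V)
    return projection, hidden, output

# Assumed sizes: N = 10, D = 100 (so P = 1000), H = 500, V = 1,000,000.
print(nnlm_cost(10, 100, 500, 1_000_000))
# (1000, 500000, 500000000)
print(nnlm_cost(10, 100, 500, 1_000_000, hierarchical=True))
# (1000, 500000, 10000)
```

Under these assumed sizes the output term dominates with a full softmax, and once hierarchical softmax is applied the projection-to-hidden term N×D×H becomes the bottleneck.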

Recurrent NNLM

No projection layer; consists of input, hidden and output layers only.

No need to specify the context length, unlike the feed-forward NNLM.

What is special in RNN model?

A recurrent matrix that connects the hidden
layer to itself, giving the model a form of short-term memory

Recurrent NNLM
w(t): Input word at time t
y(t): Output layer; produces a probability distribution
over words.
s(t): Hidden layer (state)
U: Input weight matrix; each column represents a word
The RNN is trained with backpropagation
to maximize the log-likelihood.
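
One forward step of such a recurrent model can be sketched as s(t) = sigmoid(U w(t) + W s(t-1)) and y(t) = softmax(V s(t)). Only U is named on the slide; the recurrent matrix `W`, the output matrix `V_out`, the choice of sigmoid, and the tiny weights below are assumptions for illustration. This shows the forward pass only, not training.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]

def rnn_step(word_index, s_prev, U, W, V_out):
    """Compute s(t) and y(t) from the input word index and s(t-1)."""
    # w(t) is one-hot, so U w(t) is simply column `word_index` of U.
    u_col = [row[word_index] for row in U]
    recurrent = matvec(W, s_prev)
    s_t = [sigmoid(a + b) for a, b in zip(u_col, recurrent)]
    y_t = softmax(matvec(V_out, s_t))
    return s_t, y_t

# Toy sizes (assumed): vocabulary of 3 words, hidden layer of 2 units.
U = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]        # hidden x vocab
W = [[0.0, 0.1], [0.1, 0.0]]                  # hidden x hidden (recurrent)
V_out = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # vocab x hidden

s = [0.0, 0.0]          # initial hidden state
log_likelihood = 0.0
sentence = [0, 2, 1]    # word indices
for current, nxt in zip(sentence, sentence[1:]):
    s, y = rnn_step(current, s, U, W, V_out)
    log_likelihood += math.log(y[nxt])   # the quantity training would maximize
```

Because s(t) depends on s(t-1), the hidden state carries information about all earlier words, which is why no fixed context length is needed.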

Compare with published
word representations
