Paper Presentation
Word Representations
in Vector Space
Abdullah Khan Zehady
Department of Computer Science,
Purdue University.
E-mail: azehady@purdue.edu
Word Representation
Neural Word Embedding
● Continuous vector space representation
o Words are represented as dense real-valued vectors in R^d
● Distributed word representation ↔ Word Embedding
o Embed an entire vocabulary into a relatively low-dimensional linear
space where dimensions are latent continuous features.
● Classical n-gram models work in terms of discrete units
o No inherent relationship between the units of an n-gram.
● In contrast, word embeddings capture regularities and relationships
between words.
Syntactic & Semantic Relationship
Regularities are observed as a constant offset vector between
pairs of words sharing some relationship.
Gender Relation
KING - QUEEN ~ MAN - WOMAN
Singular/Plural Relation
KING - KINGS ~ QUEEN - QUEENS
Other Relations:
● Language
France - French
~
Spain - Spanish
● Past Tense
Go – Went
~
Capture - Captured
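These offsets can be probed directly with trained vectors. A minimal sketch using gensim (the vector file name is a placeholder, not from the slides):

  # assumes gensim and a pre-trained word2vec-format file; "vectors.bin" is a hypothetical path
  from gensim.models import KeyedVectors
  wv = KeyedVectors.load_word2vec_format("vectors.bin", binary=True)

  # vec("king") - vec("man") + vec("woman") should land near "queen"
  print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=1))

  # language relation: French - France + Spain should land near "Spanish"
  print(wv.most_similar(positive=["French", "Spain"], negative=["France"], topn=1))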
Vector Space Model
Language 1: English
Language 2: Estonian
Neural Net (diagram: input layer, hidden layer, output layer)
Language Model (LM)
● Different models for estimating continuous representations of
words.
○ Latent Semantic Analysis (LSA)
○ Latent Dirichlet Allocation (LDA)
○ Neural Network Language Model (NNLM)
Feed Forward NNLM
● Consists of input, projection, hidden and output layers.
● The N previous words are encoded using 1-of-V coding, where V is the size of the
vocabulary. Ex: A = (1,0,...,0), B = (0,1,...,0), … , Z = (0,0,...,1) in R^26
● The NNLM becomes computationally complex between the projection (P) and
hidden (H) layers
○ For N=10, size of P = 500-2000, size of H = 500-1000
○ The hidden layer is used to compute a probability distribution over all the words in
the vocabulary V
● Hierarchical softmax to the rescue.
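A minimal sketch of the 1-of-V coding described above, with a toy vocabulary (the vocabulary itself is illustrative, not from the paper):

  import numpy as np

  vocab = ["A", "B", "C", "Z"]                 # toy vocabulary, V = 4
  index = {w: i for i, w in enumerate(vocab)}

  def one_hot(word):
      # 1-of-V coding: a vector in R^V with a single 1 at the word's index
      v = np.zeros(len(vocab))
      v[index[word]] = 1.0
      return v

  print(one_hot("B"))   # [0. 1. 0. 0.]  -- each of the N previous words is encoded this way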
Recurrent NNLM
● No projection layer; consists of input, hidden and output layers only.
● No need to specify the context length as in the feed-forward NNLM
● What is special in the RNN model?
○ A recurrent matrix that connects the hidden layer to itself.
○ Allows the model to form a short-term memory
■ Information from the past is represented by the hidden layer
● RNN-based word vectors achieved state-of-the-art
results in a relational similarity identification task.
RNN Model
Recurrent NNLM
w(t): Input word at time t
y(t): Output layer; produces a probability distribution
over words.
s(t): Hidden layer
U: Each column represents a word
● Four-gram neural net language model architecture (Bengio 2001)
● The RNN is trained with SGD and backpropagation to maximize the
log likelihood.
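In terms of these quantities, the standard RNN LM recurrence (not spelled out on the slide) is:

  s(t) = f( U·w(t) + R·s(t-1) )     (hidden state; f is a sigmoid nonlinearity)
  y(t) = softmax( V·s(t) )          (probability distribution over the vocabulary)

Here R is the recurrent matrix connecting the hidden layer to itself and V maps the hidden state to output scores; R and V are assumed names, since only U appears on the slide.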
Bringing efficiency..
● The computational complexity of NNLMs is high.
● We can remove the hidden layer and speed up training ~1000x
○ Continuous bag-of-words model
○ Continuous skip-gram model
● The full softmax can be replaced by:
○ Hierarchical softmax (Morin and Bengio)
○ Hinge loss (Collobert and Weston)
○ Noise contrastive estimation (Mnih et al.)
Continuous Bag-of-Words Model (CBOW)
● The non-linear hidden layer is removed
● The projection layer is shared for all words (not
just the projection matrix).
● All words get projected into the same
position (vectors are averaged).
● Naming reason: the order of words in the
history does not influence the projection.
● Best performance obtained by a log-
linear classifier with four future and
four history words at the input
Predicts the current word based on
the context.
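A minimal training sketch with gensim, assuming a toy corpus (corpus and hyperparameters are illustrative, not from the paper):

  from gensim.models import Word2Vec

  sentences = [["the", "king", "rules", "the", "kingdom"],
               ["the", "queen", "rules", "the", "kingdom"]]

  # sg=0 selects CBOW (predict the current word from the averaged context vectors);
  # sg=1 would select the skip-gram model described on the next slide
  cbow = Word2Vec(sentences, vector_size=10, window=4, min_count=1, sg=0)
  print(cbow.wv["king"])   # the learned 10-dimensional vector for "king"

(vector_size is the gensim 4.x parameter name; older versions call it size.)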
Continuous Skip-gram Model
● Objective: maximize classification of a word
based on another word in the same sentence,
i.e., maximize the average log probability
(shown below)
● Defines p(w_{t+j} | w_t) using the softmax
function (also shown below)
Predicts surrounding words given
the current word.
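For reference, the average log probability objective and the softmax definition from the paper are:

  (1/T) Σ_{t=1..T} Σ_{-c ≤ j ≤ c, j ≠ 0} log p(w_{t+j} | w_t)

  p(w_O | w_I) = exp(v'_{w_O}ᵀ v_{w_I}) / Σ_{w=1..W} exp(v'_wᵀ v_{w_I})

where v_w and v'_w are the input and output vector representations of w, c is the context window size, and W is the vocabulary size.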
Bringing efficiency..
● The computational complexity of NNLMs is high.
● We can remove the hidden layer and speed up training ~1000x
○ Continuous bag-of-words model
○ Continuous skip-gram model
● The full softmax can be replaced by:
○ Hierarchical softmax (Morin and Bengio)
○ Hinge loss (Collobert and Weston)
○ Noise contrastive estimation (Mnih et al.)
Hierarchical Softmax for efficient computation
● This formulation is impractical because the cost of computing ∇ log p(w_O | w_I)
is proportional to W, which is often large (10^5–10^7 terms).
● With hierarchical softmax, the cost is reduced to be proportional to log2(W).
Hierarchical Softmax
● Uses a binary tree (Huffman code) representation of the output layer with the W
words as its leaves.
o A random walk that assigns probabilities to words.
● Instead of evaluating W output nodes, evaluate only about log2(W) nodes to calculate the prob. dist.
● Each word w can be reached by an appropriate path from the root of the tree
● n(w, j): j-th node on the path from the root to w
● L(w): The length of this path
● n(w, 1) = root and n(w, L(w)) = w
● ch(n): An arbitrary fixed child of an inner node n
● [x] = 1 if x is true and [x] = -1 otherwise
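With these definitions, the hierarchical softmax probability from the paper is:

  p(w | w_I) = Π_{j=1..L(w)-1} σ( [n(w, j+1) = ch(n(w, j))] · v'_{n(w,j)}ᵀ v_{w_I} )

where σ(x) = 1 / (1 + exp(−x)). The cost of evaluating this product is proportional to L(w), which is on the order of log2(W).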
Negative Sampling
● Noise Contrastive Estimation (NCE)
o A good model should be able to differentiate data from noise by means of
logistic regression.
o Alternative to the hierarchical softmax.
o Introduced by Gutmann and Hyvarinen and applied to language modeling by
Mnih and Teh.
● NCE approximately maximizes the log probability of the softmax
● Negative Sampling is defined by an objective that replaces log P(w_O|w_I) in the
skip-gram objective (shown below).
● Task: Distinguish the target word wO from draws from the noise distribution
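The Negative Sampling (NEG) objective, as defined in the paper, is:

  log σ(v'_{w_O}ᵀ v_{w_I}) + Σ_{i=1..k} E_{w_i ∼ P_n(w)} [ log σ(−v'_{w_i}ᵀ v_{w_I}) ]

i.e., distinguish w_O from k negative samples drawn from a noise distribution P_n(w); the paper finds the unigram distribution raised to the 3/4 power works well as P_n(w).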
Subsampling of Frequent words
● The most frequent words provide less information than rare words.
o Co-occurrence of “France” and “Paris” is informative
o Co-occurrence of “France” and “the” is less informative
● A simple subsampling approach to counter the imbalance
o Each word w_i in the training set is discarded with the probability shown below,
where f(w_i) is the frequency of word w_i and t is a chosen threshold,
typically around 10^−5
● Aggressive subsampling of words whose frequency is greater than
t while preserving the ranking of the frequencies.
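The discard probability, as defined in the paper, is:

  P(w_i) = 1 − sqrt( t / f(w_i) )

so the more a word's frequency f(w_i) exceeds the threshold t, the more aggressively it is subsampled, while the relative ranking of the frequencies is preserved.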
Empirical Results
Automatic learning by skip-gram model
● No supervised information about what a capital city means.
● But the model is still capable of
o Automatic organization of concepts
o Learning implicit relationships
PCA projection of 100-dimensional skip-gram vectors
Analogical Reasoning Performance
● Analogical reasoning task introduced by Mikolov et al.
o Syntactic analogies: “quick” : “quickly” :: “slow” : ? (answer: “slowly”)
o Semantic analogies: “Germany” : “Berlin” :: “France” : ? (answer: “Paris”)
Learning Phrases
● To learn phrase vectors
o First find words that appear frequently together, and infrequently in
other contexts.
o Replace them with unique tokens. Ex: “New York Times” ->
New_York_Times
● Phrases are formed based on the unigram and bigram counts; a discounting
coefficient δ prevents forming too many phrases consisting of very infrequent
words (scoring function shown below).
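For reference, the phrase scoring function from the paper is:

  score(w_i, w_j) = ( count(w_i w_j) − δ ) / ( count(w_i) × count(w_j) )

Bigrams whose score exceeds a chosen threshold are merged into a single phrase token; running the procedure repeatedly yields longer phrases.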
Learning Phrases
Goal: Compute the fourth phrase using the first three.
(Best model accuracy: 72%)
Phrase Skip-gram Results
● Accuracies of the Skip-gram models on the phrase analogy dataset
o Using different hyperparameters
o Models trained on approximately one billion words from the news
dataset
● Size of the training data matters.
o HS-Huffman (dimensionality = 1000) trained on 33 billion words
reaches an accuracy of 72%
Additive compositionality
● Possible to meaningfully combine words by an element-wise addition of their
vector representations.
○ A word vector represents the distribution of the contexts in which the word appears.
● Vector values are related logarithmically to the probabilities computed by the output layer.
○ The sum of two word vectors is therefore related to the product of the two context
distributions.
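A minimal sketch of this element-wise composition, assuming trained vectors are already loaded into a gensim KeyedVectors object named wv (names and phrase tokens are illustrative):

  # Summing word vectors acts like an AND over their contexts.
  # The presenter's note gives the example: "Russian" + "river" ends up close to "Volga_River".
  composed = wv["Russian"] + wv["river"]
  print(wv.similar_by_vector(composed, topn=3))

  # most_similar with two positive words (which averages them) gives the same ranking,
  # since cosine similarity ignores the vector's scale
  print(wv.most_similar(positive=["Russian", "river"], topn=3))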
Closest Entities
Closest entity search using two methods: negative sampling and hierarchical softmax.
Compare with published word representations
Comments
● The reduction in computational complexity is impressive.
● Works with unsupervised/unlabelled data
● The vector representation can be extended to larger pieces of text:
Paragraph Vector (Le and Mikolov, 2014)
● Applicable to many NLP tasks
o Tagging
o Named Entity Recognition
o Translation
o Paraphrasing
Thank you.

Editor's Notes

  1. Words are represented as dense real-valued vectors in R^d
  2. Words are represented as dense real-valued vectors in R^d
  3. The basic skip-gram formulation defines p(w_{t+j} | w_t) using the softmax function
  4. This formulation is impractical because the cost of computing ∇ log p(w_O | w_I) is proportional to W, which is often large (10^5–10^7 terms).
  5. Each word w can be reached by an appropriate path from the root of the tree
  6. Neg-k : Negative sampling with k negative samples
  7. The vectors can be seen as representing the distribution of the contexts in which a word appears. These values are related logarithmically to the probabilities computed by the output layer, so the sum of two word vectors is related to the product of the two context distributions. The product works here as an AND function: words that are assigned high probabilities by both word vectors will have high probability, and the other words will have low probability. Thus, if “Volga River” appears frequently in the same sentences as the words “Russian” and “river”, the sum of these two word vectors results in a feature vector that is close to the vector of “Volga River”.