SEMINAR
Auto-Encoding Dictionary Definitions into Consistent Word Embeddings
Van-Tan Bui
Hanoi, 8/2020
Outline
1. Learning Word Embeddings using Lexical Dictionaries
2. Auto-Encoding Dictionary Definitions into Consistent Word Embeddings
3. Definition Auto-encoder with Semantic Injection
4. Our Conducted Work
5. Discussion
Skip-Gram
Parameterization of the skip-gram model
Context-Independent Word Embeddings
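The parameterization itself was shown as an image; the standard skip-gram formulation (Mikolov et al., 2013), which the slide's figure presumably showed, is:

J = \frac{1}{T} \sum_{t=1}^{T} \sum_{-c \le j \le c,\, j \ne 0} \log p(w_{t+j} \mid w_t),
\qquad
p(w_O \mid w_I) = \frac{\exp({v'_{w_O}}^{\top} v_{w_I})}{\sum_{w=1}^{|V|} \exp({v'_w}^{\top} v_{w_I})}

where v and v' are the input and output embeddings. Once trained, v_w is the same vector in every context, hence "context-independent".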
1. Learning Word Embeddings using Lexical Dictionaries
The definition of a word:
car: a road vehicle, typically with four wheels, powered by an internal combustion engine and able to carry a small number of people
xe đạp (bicycle): a means of transport with two or three wheels, whose handlebars are connected to the front wheel, propelled by human force applied to the pedals
Strong Word Pairs and Weak Word Pairs [Tissier et al., 2017]
In a definition, not every word has the same semantic relevance. In the definition of “car”, the words “internal” or “number” are less relevant than “vehicle”.
If the word wa is in the definition of the word wb and wb is in the definition of wa, they form a strong pair; likewise, each of the K closest words to wa (resp. wb) forms a strong pair with wb (resp. wa).
If the word wa is in the definition of wb but wb is not in the definition of wa, they form a weak pair.
Some weak pairs can be promoted to strong pairs if the two words are among the K closest neighbours of each other.
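A minimal sketch of this pair-construction rule, assuming definitions are available as a dict mapping each word to the set of words in its definition, and that neighbours maps each word to its closest words under some pre-trained vectors (defs and neighbours are placeholder names, not from the paper):

def build_pairs(defs, neighbours, K=5):
    # defs: word -> set of words appearing in its definition
    # neighbours: word -> list of closest words, nearest first
    strong, weak = set(), set()
    for wa, words_a in defs.items():
        for wb in words_a:
            if wb not in defs or wb == wa:
                continue
            if wa in defs[wb]:  # mutual inclusion -> strong pair
                strong.add(frozenset((wa, wb)))
                # the K closest words to each member also pair strongly with the other
                for wn in neighbours.get(wa, [])[:K]:
                    if wn != wb:
                        strong.add(frozenset((wn, wb)))
                for wn in neighbours.get(wb, [])[:K]:
                    if wn != wa:
                        strong.add(frozenset((wn, wa)))
            else:  # one-way inclusion -> weak pair
                weak.add(frozenset((wa, wb)))
    # promotion: a weak pair becomes strong if the two words are mutual K-NN
    for pair in list(weak - strong):
        wa, wb = tuple(pair)
        if wb in neighbours.get(wa, [])[:K] and wa in neighbours.get(wb, [])[:K]:
            strong.add(pair)
    weak -= strong
    return strong, weak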
Positive sampling
Let S(w) be the set of all words forming a strong pair with the word w, and W(w) the set of all words forming a weak pair with w. For each target wt from the corpus, we build Vs(wt), a random set of ns words drawn with replacement from S(wt), and Vw(wt), a random set of nw words drawn with replacement from W(wt).
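The associated cost was not reproduced on the slide; in Tissier et al. it takes roughly the following form, where \ell(x) = \log(1 + e^{-x}) is the logistic loss and \beta_s, \beta_w weight the strong and weak contributions:

J_{pos}(w_t) = \beta_s \sum_{w_i \in V_s(w_t)} \ell(v_{w_t} \cdot v_{w_i}) \; + \; \beta_w \sum_{w_j \in V_w(w_t)} \ell(v_{w_t} \cdot v_{w_j})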
Negative sampling replaces the softmax with binary classifiers.
1. The unigram distribution only takes word frequency into account, and provides the same noise distribution when selecting negative examples for different target words.
2. Labeau and Allauzen (2017) already showed that a context-dependent noise distribution could be a better solution for learning a language model.
3. Unlike the positive target words, the meaning of negative examples remains unclear: for a training word, we do not know what a good noise distribution should be, while we do know what a good target word is (one of its surrounding words).
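For reference, the standard negative-sampling objective these points criticize (Mikolov et al., 2013), with \sigma the sigmoid, k the number of negatives, and P_n(w) \propto U(w)^{3/4} the unigram noise distribution:

\log \sigma({v'_{w_O}}^{\top} v_{w_I}) \; + \; \sum_{i=1}^{k} \mathbb{E}_{w_i \sim P_n(w)} \big[ \log \sigma(-{v'_{w_i}}^{\top} v_{w_I}) \big]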
Controlled negative sampling
Negative sampling consists in treating two random words from the vocabulary V as unrelated. For each word wt from the vocabulary, we generate a set F(wt) of k words selected at random from the vocabulary. In our experiments, we noticed this method discards around 2% of generated negative pairs.
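The slide does not spell out the rejection rule; our reading of Tissier et al. is that a sampled candidate is discarded when it forms a strong pair with the target, which is where the ~2% figure comes from. A minimal sketch under that assumption (vocab is a list of words, strong_of maps each word to its strong-pair partners; both are placeholder names):

import random

def controlled_negatives(wt, vocab, strong_of, k=5):
    # draw k negatives for target wt, rejecting wt itself and any word
    # that forms a strong pair with wt (the "controlled" filter)
    negatives = set()
    while len(negatives) < k:
        w = random.choice(vocab)
        if w != wt and w not in strong_of.get(wt, set()):
            negatives.add(w)
    return negatives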
Global objective function
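The objective itself appeared as an image; in Tissier et al. it combines, for each target w_t and each context word w_c around it, the usual positive and negative terms with the dictionary-based positive-sampling cost above (a hedged reconstruction, not the verbatim formula):

J = \sum_{t=1}^{T} \sum_{w_c \in C(w_t)} \Big[ \ell(v_{w_t} \cdot v_{w_c}) + \sum_{w_n \in F(w_t)} \ell(-v_{w_t} \cdot v_{w_n}) \Big] \; + \; \sum_{t=1}^{T} J_{pos}(w_t)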
Fetching online definitions
• We extract all unique words with more than 5 occurrences from a full Wikipedia dump, representing around 2.2M words.
• We use the English versions of Cambridge, Oxford, Collins and dictionary.com. For each word, we download the 4 different webpages and use regexes to extract the definitions from the HTML template specific to each website, making the process fully accurate.
• Our approach does not focus on polysemy, so we concatenate all definitions for each word. Then we concatenate the results from all dictionaries, remove stop words and punctuation, and lowercase all words (a sketch of this step follows the list).
• Among the 2.2M unique words, only 200K have a definition. We generate strong and weak pairs from the downloaded definitions according to the rule described on the earlier slides, leading to 417K strong pairs (when the parameter K is set to 5) and 3.9M weak pairs.
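A minimal sketch of that normalization step, assuming the per-dictionary definitions for one word are already fetched; STOPWORDS is a placeholder set, not the list used in the paper:

import re
import string

STOPWORDS = {"a", "an", "the", "of", "and", "or", "to", "by", "with"}  # placeholder

def merge_definitions(defs_by_dict):
    # defs_by_dict: dictionary name -> list of definition strings for one word
    # concatenate everything, lowercase, strip punctuation, drop stop words
    text = " ".join(d for defs in defs_by_dict.values() for d in defs).lower()
    text = re.sub("[%s]" % re.escape(string.punctuation), " ", text)
    return [tok for tok in text.split() if tok not in STOPWORDS]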
Two training corpora are compared:
+ containing only data from Wikipedia (corpus A);
+ data from Wikipedia concatenated with the extracted definitions (corpus B).
2. Consistency Penalized Auto-Encoder (CPAE)
Overview of the CPAE model [Bosc & Vincent, 2018].
Auto-encoder model
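The model was shown as a figure; in outline (a hedged reconstruction from Bosc & Vincent): the definition d(w) = (w_1, ..., w_n) of a word w is encoded, with an LSTM, into a definition embedding h, and the decoder scores each definition word against h using the output embeddings E':

p(w_i \mid h) = \frac{\exp(E'_{w_i} \cdot h)}{\sum_{w \in V} \exp(E'_w \cdot h)},
\qquad
L_{rec}(w) = - \sum_{i=1}^{n} \log p(w_i \mid h)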
Consistency penalty
Three different embeddings:
a) definition embeddings h, produced by the definition encoder, are the
embeddings we are ultimately interested in computing;
b) input embeddings E are used by the encoder as inputs;
c) output embeddings E’ are compared to definition embeddings to yield a
probability distribution over the words in the definition.
A soft weight-tying scheme brings the input embeddings closer to the definition embeddings. We call this term a consistency penalty because its goal is to ensure that the embeddings used by the encoder (input embeddings) and the embeddings produced by the encoder (definition embeddings) are consistent with each other.
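Written out (our notation, with \lambda the penalty weight and E[w] the input embedding of the defined word w):

L_{pen}(w) = \lambda \, \lVert h(d(w)) - E[w] \rVert_2^2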
Complete objective
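The complete objective appeared as an equation image; combining the two terms above gives (a reconstruction, not verbatim):

J = \sum_{w \in D} \Big( - \sum_{i=1}^{n_w} \log p(w_i \mid h(d(w))) \; + \; \lambda \, \lVert h(d(w)) - E[w] \rVert_2^2 \Big)

where D is the set of dictionary entries and n_w the length of the definition of w.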
3. Definition Auto-encoder with Semantic Injection
4. Our Conducted Work
I. A Word Representation Learning Method for Separating Synonymy from Relatedness
II. A Word Similarity Measurement Method
5. Discussion
+ Two (or more) words are synonyms if they appear in the same contexts.
+ The context is precisely the information that describes a word.
Thank you for your attention

