Contextual Definition Generation
Jeffrey Yarbro, Andrew Olney
Institute for Intelligent Systems
University of Memphis
Introduction
• This paper explores generating contextual definitions for words using a
deep-learning model. The model accepts a word and a context for that word,
then autoregressively generates a definition that matches the specific
context.
Overview
• Created a new dataset of definition–context pairs
• Trained a GPT-2 model on the dataset
• Evaluated the model with human raters
Motivation for work
• Approximately 98% of words must be within a reader’s vocabulary for
optimal reading comprehension to occur.
• Textbooks often attempt to make up for potential vocabulary gaps by
defining key terms.
• Problems:
• The reader is required to stop reading and look up the definition
• Only a limited number of terms are defined
• A term may have multiple definitions
Motivation for work (cont.)
• Modern software can make the process easier.
• Readers can use a search engine
• Newer tools allow the reader to highlight a word and have a definition appear in
a pop-up.
• Problems:
• Definitions may be vague and not adequately fit the context.
• A word may have a long list of definitions.
• If a word has multiple definitions, the reader must pick the most appropriate one.
Data Collection
• All data was required to pair each definition with a labeled context for
that definition.
• With this in mind, we collected data from the following sources:
• Lexico
• Wikipedia
• Wiktionary
• WordNet
Data Collection (cont.)
[Table: “Source” and resulting “Dataset” examples]
Definition Modification
• Some definitions contained little information.
• We attempt to expand these definitions by using regular expressions,
part-of-speech tags, and word frequency to find the key reference word.
• We then choose the most fitting definition using word vectors: each
definition of the reference word (e.g., “country”) is compared with the
context, and the most similar one is chosen by cosine similarity (see the
sketch below).
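Below is a minimal sketch of this selection step. It assumes spaCy-style word vectors; the paper does not name its embedding library, and the model name, pick_definition, and the candidate definitions are illustrative.

```python
# Sketch of the definition-selection step described above. Assumes
# spaCy-style word vectors; the paper does not name its embedding
# library, and the model name and candidates here are illustrative.
import numpy as np
import spacy

nlp = spacy.load("en_core_web_md")  # medium English model ships with vectors

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def pick_definition(context, candidates):
    """Return the candidate definition most similar to the context,
    comparing averaged word vectors by cosine similarity."""
    ctx_vec = nlp(context).vector  # average of the context's token vectors
    return max(candidates, key=lambda d: cosine(ctx_vec, nlp(d).vector))

# Hypothetical candidate definitions for the reference word "country":
candidates = [
    "A nation with its own government, occupying a particular territory.",
    "An area outside of towns and cities; the countryside.",
]
print(pick_definition("Chad borders Libya to the north.", candidates))
```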
Model
• GPT-2 is an autoregressive model built from the decoder blocks of the
transformer architecture (see the decoding sketch below).
[Animation sourced from The Illustrated GPT-2 by Jay Alammar]
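As a generic illustration of what “autoregressive” means here (a sketch, not the authors' code), the loop below repeatedly feeds the tokens generated so far back into the model and appends the most likely next token:

```python
# Generic sketch of autoregressive (greedy) decoding: each step re-feeds
# everything generated so far and appends the most likely next token.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
lm = GPT2LMHeadModel.from_pretrained("gpt2").eval()

ids = tok("A contextual definition is", return_tensors="pt").input_ids
with torch.no_grad():
    for _ in range(15):
        logits = lm(ids).logits            # (1, seq_len, vocab_size)
        next_id = logits[0, -1].argmax()   # greedy pick of the next token
        ids = torch.cat([ids, next_id.view(1, 1)], dim=1)
print(tok.decode(ids[0]))
```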
Model (cont.)
• Trained the model for 1 epoch
• Used GPT-2 Large: the 774M-parameter model.
• Added two special tokens: <CONTEXT> and <DEFINITION> (input format
sketched below)
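A minimal sketch of this setup with Hugging Face transformers follows. The prompt layout (how the target word is marked next to its context) and the example text are assumptions, not the authors' released format:

```python
# Sketch of the fine-tuning setup with Hugging Face transformers. The
# prompt layout (how the target word is marked next to its context) is
# an assumption, not the authors' released format.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2-large")  # 774M parameters
model = GPT2LMHeadModel.from_pretrained("gpt2-large")

# Register the two special tokens and grow the embeddings to match.
tokenizer.add_special_tokens(
    {"additional_special_tokens": ["<CONTEXT>", "<DEFINITION>"]}
)
model.resize_token_embeddings(len(tokenizer))

# One hypothetical training example: word and context, then the target
# definition the model learns to complete.
example = (
    "mitochondrion <CONTEXT> The mitochondrion is the powerhouse of the "
    "cell. <DEFINITION> An organelle that produces most of a cell's energy."
)
batch = tokenizer(example, return_tensors="pt")
loss = model(**batch, labels=batch["input_ids"]).loss  # standard LM loss

# At inference time, prompt up to <DEFINITION> and let the model
# autoregressively complete the definition.
prompt = tokenizer(
    "mitochondrion <CONTEXT> The mitochondrion is the powerhouse of the "
    "cell. <DEFINITION>",
    return_tensors="pt",
)
out = model.generate(**prompt, max_new_tokens=40,
                     pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(out[0]))
```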
Human Evaluation
• Posted the survey on CloudResearch, which sources high-quality
participants on Mechanical Turk.
• Allowed participants to choose which topic they wanted to evaluate.
The available topics came from the following subjects:
• American Government
• Anatomy and Physiology
• Astronomy
• Psychology
• Three different surveys for the following context types:
1. Model-generated Short-context: Term used in a sentence
2. Model-generated Long-context: Term used in a sentence, along with both
the preceding and following sentences.
3. Human-generated: Definitions from the training dataset.
• Raters evaluated 50 questions each.
Survey Format
[Figure: survey question format]
Results
• Short-context performed significantly better than long-context in terms of accuracy (p = 0.045). We
speculate this is because the training data contains far more short contexts than
long ones.
• Real (human-generated) definitions performed significantly better than both model-generated context types (p < 0.001).
• There were no significant differences in fluency.
• The effect of topic trended toward significance but did not reach it.
Short-Context vs Human-Generated Density Plots
[Figure: density plots of ratings for short-context vs human-generated definitions]
Problems with model
• Too much fluctuation in output quality depending on the context.
• Trouble interpreting some contexts.
• Some tendency to memorize definitions from the training data.
Q&A
