2. Introduction
• This paper explores the idea of generating contextual definitions
for words using a deep-learning model. It does this by accepting a
word and a context for that word and then autoregressively
generating a definition to match the specific context.
Overview
• Created a new dataset with definition and context pairs.
• Trained a GPT-2 model on the dataset
• Evaluated the model with human raters
3. Motivation for work
• Approximately 98% of words must be within a reader’s vocabulary for
optimal reading comprehension to occur.
• Textbooks often attempt to make up for potential vocabulary gaps by
defining key terms.
• Problems:
• Reader is required to stop reading and look up the definition
• Limited number of terms defined
• Term may have multiple definitions
4. Motivation for work (cont.)
• Modern software can make the process easier.
• Can use search engine
• Newer tools allow reader to highlight word and have the definition appear in
a pop-up.
• Problems:
• Definitions may be vague and not adequately fit the context.
• Word may have a long list of definitions, and the reader must pick the most
appropriate one.
5. Data Collection
• All data was required to pair each definition with a labeled context.
• With this in mind, we collected data from the following sources:
• Lexico
• Wikipedia
• Wiktionary
• Wordnet
7. Definition Modification
• Some definitions contained little information.
• We attempt to expand these definitions by using regular expressions, part-of-speech
tags, and word frequency to find the key reference word.
• We then choose the most fitting definition by comparing word vectors for each
definition of the reference word (e.g., “country”) against the context and
selecting the most similar one by cosine similarity.
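The selection step above can be sketched as follows. The plain-list vectors and the `pick_definition` helper are illustrative stand-ins; the slides do not name the embedding library the authors actually used.

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two dense vectors given as plain lists.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def pick_definition(context_vec, definition_vecs):
    # Return the index of the candidate definition whose vector is most
    # similar to the context vector.
    sims = [cosine_similarity(context_vec, d) for d in definition_vecs]
    return max(range(len(sims)), key=sims.__getitem__)
```

With real embeddings, `context_vec` would typically be an averaged (or otherwise pooled) vector over the context's words, and each entry in `definition_vecs` a pooled vector over one candidate definition of the reference word.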
8. Model
• GPT-2 is an autoregressive model that uses the decoding blocks of the
transformer architecture.
1. Animation sourced from The Illustrated GPT-2 written by Jay Alammar
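The autoregressive decoding described above can be illustrated with a toy sketch: at each step the model predicts the next token from everything generated so far, and the chosen token is appended to the input for the following step. `TOY_MODEL` and `toy_next_token` are hypothetical stand-ins for a real GPT-2 forward pass.

```python
# Hypothetical bigram "model": maps the current token to the next token.
TOY_MODEL = {
    "<DEFINITION>": "a", "a": "celestial", "celestial": "body", "body": "<END>",
}

def toy_next_token(tokens):
    # Stand-in for a model forward pass: predict the next token
    # from the sequence generated so far.
    return TOY_MODEL.get(tokens[-1], "<END>")

def generate(prompt_tokens, max_len=10):
    # Greedy autoregressive loop: append each predicted token and
    # feed the extended sequence back in, until <END> or max_len.
    tokens = list(prompt_tokens)
    for _ in range(max_len):
        nxt = toy_next_token(tokens)
        tokens.append(nxt)
        if nxt == "<END>":
            break
    return tokens
```

Here `generate(["<DEFINITION>"])` walks the toy chain to produce `["<DEFINITION>", "a", "celestial", "body", "<END>"]`; a real GPT-2 replaces the lookup with a learned distribution over the vocabulary.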
9. Model (cont.)
• Trained the model for 1 epoch
• Used GPT-2 Large: the 774M-parameter model.
• Two special tokens: <CONTEXT> and <DEFINITION>
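A minimal sketch of how training examples might be laid out with these special tokens; the exact ordering and separators are an assumption, as the slides only name the two tokens.

```python
CONTEXT_TOKEN = "<CONTEXT>"
DEFINITION_TOKEN = "<DEFINITION>"

def format_example(word, context, definition):
    # One plausible layout for a fine-tuning example: word and context
    # precede the definition, so at inference time the model continues
    # after <DEFINITION> with a context-appropriate definition.
    return f"{word} {CONTEXT_TOKEN} {context} {DEFINITION_TOKEN} {definition}"
```

With Hugging Face `transformers`, such tokens would typically be registered via `tokenizer.add_special_tokens(...)` followed by `model.resize_token_embeddings(len(tokenizer))` so the embedding matrix covers the enlarged vocabulary.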
10. Human Evaluation
• Posted the survey on CloudResearch, which sources high-quality
participants from Mechanical Turk.
• Allowed participants to choose what topic they wanted to evaluate.
The topics available were from the following subjects:
• American Government
• Anatomy and Physiology
• Astronomy
• Psychology
• Three different surveys for the following context types:
1. Model-generated Short-context: Term used in a sentence
2. Model-generated Long-context: Term used in a sentence along with both
the prior and following sentence.
3. Human-generated: Definitions from the training dataset.
• Raters evaluated 50 questions each.
12. Results
• Short-context performed significantly better than long-context in terms of accuracy (𝑝 = 0.045). We
speculate this is because the training data contained far more short contexts than
long ones.
• Real definitions performed significantly better than both model-generated context types (𝑝 < 0.001).
• There were no significant differences in fluency.
• The effect of topic was trending toward significance but did not reach it.
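The slides do not state which significance test produced these 𝑝-values; as one plausible choice, a Welch's t statistic for comparing two independent groups of ratings can be sketched as follows (the sample ratings below are invented for illustration).

```python
import math
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    # Welch's t statistic for two independent samples with possibly
    # unequal variances: difference of means over the combined
    # standard error. (Hypothetical choice of test; not confirmed
    # by the slides.)
    va, vb = variance(sample_a), variance(sample_b)
    na, nb = len(sample_a), len(sample_b)
    se = math.sqrt(va / na + vb / nb)
    return (mean(sample_a) - mean(sample_b)) / se
```

A positive statistic indicates the first group's mean rating is higher; the corresponding 𝑝-value would come from the t distribution with Welch–Satterthwaite degrees of freedom (e.g., via `scipy.stats.ttest_ind(..., equal_var=False)`).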