Generating Adequate Distractors for Multiple-Choice Questions
Learning with Purpose
Authors: Cheng Zhang, Yicheng Sun, Hejia Chen, and Jie Wang
Presenter: Cheng Zhang
University of Massachusetts Lowell, USA
Introduction: Abstract
• An approach to the automatic generation of adequate distractors for a given question-answer pair (QAP) to form an adequate multiple-choice question (MCQ).
• Combines part-of-speech tagging, named-entity tagging, semantic-role labeling, regular expressions, domain knowledge bases, word embeddings, word edit distance, WordNet, and other algorithms.
• In evaluations by human judges, each MCQ has at least one adequate distractor, and 84% of MCQs have three adequate distractors.
Introduction: Background
Methods of generating adequate distractors typically follow two directions (Pho et al., 2014; Rao and Saha, 2018):
1. Domain-specific knowledge bases
2. Semantic similarity
Previous efforts have focused on finding some form of distractors rather than on making them more distracting.
Introduction: Our Goals
A generated adequate distractor must satisfy the following requirements:
• It is an incorrect answer to the question.
• It is grammatically correct.
• It is semantically related to the correct answer.
• It provides sufficient distraction.
Distractor Generation
Input:
• Original article
• Answer in the QAP
Output:
• Distractors
The fixed order of distractor generation for each target word:
1. Subjects
2. Objects
3. Adjectives for subjects
4. Adjectives for objects
5. Predicates
6. Adverbs
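The fixed ordering above can be sketched as a simple sort over role-tagged target words. A minimal sketch follows; the role names (`"subject"`, `"adj_subject"`, etc.) are hypothetical, since the deck only lists the ordering itself, not a tag set.

```python
# Hypothetical role labels; the deck specifies only the ordering.
ROLE_ORDER = ["subject", "object", "adj_subject",
              "adj_object", "predicate", "adverb"]

def order_targets(tagged):
    """tagged: list of (word, role) pairs; returns the words in the
    fixed distractor-generation order given by ROLE_ORDER."""
    rank = {role: i for i, role in enumerate(ROLE_ORDER)}
    return [word for word, role in sorted(tagged, key=lambda p: rank[p[1]])]
```

For example, given role-tagged words from one sentence, subjects come out first and adverbs last, matching the list above.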
Distractor Generation
Three types of target words:
• Type-1: time point, time range, numerical number, ordinal number.
• Type-2: person, location, organization.
• Type-3: others.
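One way to realize this three-way split is to map named-entity labels onto the types. The sketch below assumes spaCy-style entity labels; the deck does not name a specific tagger or label set, so both are assumptions.

```python
# Assumed spaCy-style NE labels; the deck does not specify a tag set.
TYPE1_LABELS = {"DATE", "TIME", "CARDINAL", "ORDINAL", "QUANTITY", "PERCENT"}
TYPE2_LABELS = {"PERSON", "GPE", "LOC", "ORG", "FAC"}

def target_word_type(ent_label: str) -> int:
    """Classify a target word into Type-1 (times/numbers),
    Type-2 (person/location/organization), or Type-3 (others)."""
    if ent_label in TYPE1_LABELS:
        return 1
    if ent_label in TYPE2_LABELS:
        return 2
    return 3
```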
Distractor Generation: Target Words in Type-3
Distractor candidates for Type-3:
• Semantically similar words
• Hypernyms
• Antonyms
Filter out unsuitable candidates:
• Candidates that contain the target word.
• Candidates that share the target word's prefix and have an edit distance of less than three.
  • E.g., misspellings: “knowledge” vs. “knowladge”
  • Different tense: “try” vs. “tries”
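The two filtering rules can be sketched with a standard Levenshtein edit distance. The shared-prefix length of two characters below is an assumption, since the deck does not specify how much of the prefix must match.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via a single-row dynamic program."""
    dp = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, len(b) + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                          # deletion
                        dp[j - 1] + 1,                      # insertion
                        prev + (a[i - 1] != b[j - 1]))      # substitution
            prev = cur
    return dp[-1]

def is_unsuitable(candidate: str, target: str, prefix_len: int = 2) -> bool:
    """Filter-rule sketch: drop candidates that contain the target
    word, or that share its prefix (assumed length 2) while lying
    within edit distance 3 (likely misspellings or inflections)."""
    if target in candidate:
        return True
    return (candidate[:prefix_len] == target[:prefix_len]
            and edit_distance(candidate, target) < 3)
```

For instance, "knowladge" shares the prefix of "knowledge" at edit distance 1 and is filtered out, while an unrelated word such as "wisdom" passes.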
Distractor Generation: Ranking Algorithm
For each distractor candidate W_c with target word W_t:
• S_v = word-embedding cosine similarity score.
• S_n = WordNet WUP (Wu and Palmer, 1994) similarity score.
• S_d = edit-distance score, where E is the edit distance.
Distractor Generation: Ranking Algorithm (cont.)
R = ranking score, computed from R'(W_c, W_t), with one case when W_c is an antonym of W_t and another case otherwise.
Note that S_v, S_n, and S_d each lie between 0 and 1, so R'(W_c, W_t) is between 0 and 1, which implies that -log R'(W_c, W_t) > 0.
Also note that we give more weight to antonyms.
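The deck does not show the exact formulas for S_d or R', so the combination below is only an illustrative sketch under stated assumptions: S_d is assumed to decay with the edit distance E as 1/(1+E), R' is taken as the mean of the three scores, and antonym candidates receive a heavier weight, consistent with the notes above.

```python
def ranking_score(s_v: float, s_n: float, e: int,
                  is_antonym: bool, antonym_weight: float = 2.0) -> float:
    """Illustrative ranking sketch (exact formulas are assumed, not
    taken from the deck): S_d = 1 / (1 + E), R' is the mean of
    S_v, S_n, S_d (so 0 < R' < 1 and -log R' > 0 when the inputs
    are strictly between 0 and 1), and antonyms are up-weighted."""
    s_d = 1.0 / (1.0 + e)
    r_prime = (s_v + s_n + s_d) / 3.0
    return r_prime * (antonym_weight if is_antonym else 1.0)
```

With this sketch, a candidate at edit distance 4 with S_v = 0.8 and S_n = 0.6 gets S_d = 0.2 and R' = 1.6/3; marking it as an antonym doubles its score.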
Evaluation
• Dataset: U.S. SAT practice reading tests.
• A total of 303 distractors were evaluated by human judges.
Evaluation results:
• All distractors generated by our method are grammatically correct.
• 98% of distractors are relevant to the QAP.
• 96% of distractors provide sufficient distraction.
• 84% of MCQs are adequate.
• All MCQs are acceptable (i.e., each has at least one adequate distractor).
Evaluation: Examples
What did Chie hear? (SAT practice test 1, article 1)
• her soft scuttling footsteps, the creak of the driveway.
• her soft scuttling footsteps, the creak of the stairwell.
• her soft scuttling footsteps, the knock of the door.
• her soft scuttling footsteps, the creak of the door. (Correct answer)
When should ethics apply? (SAT practice test 2, article 2)
• when someone makes an economic request.
• when someone makes an economic proposition.
• when someone makes a political decision.
• when someone makes an economic decision. (Correct answer)
Conclusions
• We presented a novel method that uses various NLP tools to generate adequate distractors.
Future work:
• Improve the ranking measure to select a better distractor for a target word from a list of candidates.
• Explore producing generative distractors with neural networks, instead of only replacing a few target words in a given answer.