This document proposes three methods for generating reliable and valid distractors for fill-in-the-blank language learning quizzes: 1) A confusion matrix method using an ESL corpus, 2) A discriminative ESL method using classifiers trained on an ESL corpus, and 3) A discriminative simulated-ESL method using classifiers trained on pseudo-ESL data. An experiment compares the three proposed methods to existing thesaurus- and roundtrip translation-based methods. The discriminative simulated-ESL method performed best in terms of distractor appropriateness and ability to discriminate learner proficiency levels.
Generate Reliable Distractors for Language Learners
Discriminative Approach to Fill-in-the-Blank Quiz Generation for Language Learners
Keisuke Sakaguchi (1), Yuki Arase (2), Mamoru Komachi (1)
(1) Nara Institute of Science and Technology (NAIST), Japan
(2) Microsoft Research Asia, China
keisuke-sa@is.naist.jp, yukiar@microsoft.com, komachi@tmu.ac.jp
Prior work
- Thesaurus (Sumita et al.)
- Roundtrip translation (Dahlmeier and Ng)
→ re-ranking by generative LMs
References
- Charles Alderson, Caroline Clapham, and Dianne Wall. 1995. Language Test Construction and Evaluation. Cambridge University Press.
- Daniel Dahlmeier and Hwee Tou Ng. 2011. Correcting semantic collocation errors with L1-induced paraphrases. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 107–117, Edinburgh, Scotland, UK, July.
- Eiichiro Sumita, Fumiaki Sugaya, and Seiichi Yamamoto. 2005. Measuring Non-native Speakers' Proficiency of English by Using a Test with Automatically-Generated Fill-in-the-Blank Questions. In Proceedings of the 2nd Workshop on Building Educational Applications Using NLP, pages 61–68, Ann Arbor, June.

Summary
Generate more reliable and valid distractors using
1. Large-scale ESL corpus
2. Discriminative models
- Fill-in-the-blank quiz for ESL learners.
- Good (semantic) distractors (Alderson et al.)
  - reliable: exclusive against the correct answer
  - valid: discriminate learners' proficiency
- Reliability: 3 native speakers
  1. Ratio of Appropriate Distractors (RAD) = NAD / (total # of quizzes)
     ※ NAD: # of quizzes that 2+ participants agree on
  2. Inter-rater agreement κ
- Validity: 23 Japanese ESL learners
  1. Correlation Coefficient (r)
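
The two numeric measures above can be illustrated with a minimal Python sketch. It rests on assumptions not stated on the poster: RAD is taken as the share of quizzes that at least two raters mark as appropriate, and r is taken as a Pearson correlation between learners' quiz scores and some external proficiency score. The function names and the toy data are hypothetical.

```python
from math import sqrt


def ratio_of_appropriate_distractors(ratings, min_agree=2):
    """Share of quizzes judged appropriate by at least `min_agree` raters.

    `ratings` holds one list per quiz with one boolean per rater
    (True = "distractors are appropriate"). Assumed reading of RAD.
    """
    n_ad = sum(1 for votes in ratings if sum(votes) >= min_agree)
    return n_ad / len(ratings)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Toy data: 4 quizzes, each rated by 3 (hypothetical) native speakers.
toy_ratings = [
    [True, True, False],
    [True, False, False],
    [True, True, True],
    [False, False, False],
]
rad = ratio_of_appropriate_distractors(toy_ratings)

# Toy data: quiz scores vs. an assumed external proficiency score.
r = pearson_r([1, 2, 3], [2, 4, 6])
```

With the toy data, two of the four quizzes reach two or more agreeing raters, so `rad` is 0.5. Inter-rater agreement κ is not computed here.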
Proposed Method
- Confusion Matrix Method
  Confusion matrix from ESL corpus (Lang-8)
- Discriminative ESL Method
  Classifier for each target (trained on ESL corpus)
  Features: ±1 lemma, ±2 lemma, dependency
  Label: original incorrect verb in Lang-8
- Discriminative Simulated-ESL Method
  Classifier for each target (trained on Pseudo-ESL corpus)
  Features: ±1 lemma, ±2 lemma, dependency
  Label: generated from confusion matrix
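
As a rough illustration of the ideas above, the following Python sketch builds a confusion matrix from (learner word, corrected word) pairs, picks the most frequent confusions as distractor candidates, and extracts the ±1 and ±2 lemma context features. It is a simplified sketch under assumptions: the input pair format, the function names, the `<PAD>` token, and the toy data are hypothetical, the ±1/±2 features are read as the lemmas at those offsets, and dependency features and classifier training are left out.

```python
from collections import Counter, defaultdict


def build_confusion_matrix(corrections):
    """Count, for each corrected word, which words learners wrote instead.

    `corrections` is an iterable of (learner_word, corrected_word) pairs,
    an assumed format for error annotations from an ESL corpus.
    """
    matrix = defaultdict(Counter)
    for wrong, right in corrections:
        if wrong != right:  # skip pairs that are not errors
            matrix[right][wrong] += 1
    return matrix


def top_distractors(matrix, target, k=3):
    """Return the k words most often confused with `target`."""
    confusions = matrix.get(target, Counter())
    return [word for word, _ in confusions.most_common(k)]


def context_features(lemmas, i):
    """Lemmas at offsets -2, -1, +1, +2 around position i (padded at edges)."""
    feats = {}
    for off in (-2, -1, 1, 2):
        j = i + off
        feats[f"lemma[{off:+d}]"] = lemmas[j] if 0 <= j < len(lemmas) else "<PAD>"
    return feats


# Toy correction pairs (hypothetical, not taken from Lang-8).
pairs = [
    ("say", "tell"), ("say", "tell"),
    ("speak", "tell"),
    ("talk", "tell"), ("talk", "tell"), ("talk", "tell"),
    ("tell", "tell"),
]
matrix = build_confusion_matrix(pairs)
candidates = top_distractors(matrix, "tell", k=2)
feats = context_features(["i", "tell", "he", "the", "truth"], 1)
```

In the discriminative variants, a feature dictionary like `feats` would be the input to a per-target classifier whose output label is a distractor candidate. The confusion matrix alone ignores context.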
Experiment and Result
Method           Corpus       Model            RAD (%)   κ      r
Confusion Mat.   ESL          Generative       94.5      0.55   0.71
Disc. ESL        ESL          Discriminative   95.0      0.73   0.48
Disc. Sim-ESL    Pseudo-ESL   Discriminative   98.3      0.69   0.76
Thesaurus        Native       Generative       89.3      0.57   0.68
Roundtrip        Native       Generative       93.6      0.53   0.67