1. Reassessing the Goals of Grammatical Error Correction: Fluency instead of Grammaticality
Keisuke Sakaguchi¹, Courtney Napoles¹, Matt Post¹, and Joel Tetreault²
¹Johns Hopkins University
²Yahoo! (now at Grammarly)
2. Grammaticality and Fluency
Original: From this scope, social media has shorten our distance.
Grammatical (minimal edit): From this scope, social media has shortened our distance.
Fluent: From this perspective, social media has shortened the distance between us.
3. Overview (High-level)
1. The GEC community has not clearly distinguished grammaticality from fluency.
2. A fluency-oriented annotation scheme and metric:
- Preferred by native speakers
- Higher correlation with human rankings
- Easier and cheaper to collect new datasets
New corpora should be produced regularly (as in SMT), rather than relying too heavily on a single annotated corpus.
4. History of GEC: Grammaticality or Fluency?
Are we trying to help learners correct small mistakes, or also trying to help them sound more fluent?
Shared Task | Target Errors | Metric
HOO 11 | All error types (e.g., prepositions, punctuation, word choice …) | F-score
HOO 12 | Limited error types: prepositions, determiners | F-score
CoNLL 13 | Limited error types: HOO 12 + noun number, verb form, subject-verb agreement (SVA) | M2 (≈ F0.5)
CoNLL 14 | All error types (e.g., CoNLL 13 + redundancy, word choice …) | M2 (≈ F0.5)
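The "M2 (≈ F0.5)" entries refer to an F-score with β = 0.5, which weights precision more heavily than recall, so proposing a wrong correction costs more than missing one. A minimal sketch of that final step, assuming the counts of correct, spurious, and missed edits are already known; the real M2 scorer's edit extraction and matching are not reproduced, and the function name and zero-division conventions are our own assumptions:

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 0.5) -> float:
    """F-beta from edit counts.

    tp: system edits that match a gold edit
    fp: system edits with no matching gold edit
    fn: gold edits the system missed
    Assumption: an empty denominator counts as perfect precision/recall.
    """
    precision = tp / (tp + fp) if tp + fp > 0 else 1.0
    recall = tp / (tp + fn) if tp + fn > 0 else 1.0
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # Hypothetical counts: 6 correct edits, 2 spurious, 6 missed.
    # Precision = 0.75, recall = 0.5.
    print(round(f_beta(6, 2, 6), 4))            # 0.6818 (F0.5)
    print(round(f_beta(6, 2, 6, beta=1.0), 4))  # 0.6 (F1, for comparison)
```

With these counts precision exceeds recall, so F0.5 comes out higher than F1.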
6. Existing Annotation Scheme
Fine-grained error type coding (CLC: 80 types, NUCLE: 27 types)
CLC-style markup (i = learner's text, c = correction, type = error code):
<p>However, to have
<NS type="RD"><i>his</i><c>your</c></NS>
<NS type="FN"><i>photos</i><c>photo</c></NS>
taken and
<NS type="TV"><i>showed</i><c>shown</c></NS>
on television and
<NS type="MT"><c>in</c></NS>
<NS type="MD"><c>the</c></NS>
newspapers increases your popularity.</p>
NUCLE-style markup (character offsets into the paragraph text):
<MISTAKE start_par="1" start_off="387" end_par="1" end_off="389"><TYPE>Prep</TYPE><CORRECTION>of</CORRECTION></MISTAKE>
<MISTAKE start_par="1" start_off="396" end_par="1" end_off="413"><TYPE>V0</TYPE><CORRECTION>that are inhospitable</CORRECTION></MISTAKE>
<MISTAKE start_par="1" start_off="422" end_par="1" end_off="430"><TYPE>Mec</TYPE><CORRECTION>deserts</CORRECTION></MISTAKE>
7. Existing Annotation Scheme
1. Training annotators for error coding is expensive.
2. Inter-annotator agreement (IAA) is very low.
3. Annotators face downward pressure to make sentences merely grammatical, not fluent.
9. New annotation: Fluency edits
Simply ask native speakers to rewrite each sentence so that it sounds natural to them.
1. Low cost: no error tags and no annotator training required.
2. Scalability: any native speaker can annotate.
3. Fluency is taken into account.
10. New annotation: Fluency edits
- NUCLE 3.2 dataset (test set: 1,312 sentences)
- Two annotators per sentence
- Fluency edits vs. grammatically minimal edits
Minimal edit: From this scope, social media has shortened our distance.
Fluency edit: From this perspective, social media has shortened the distance between us.
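The difference between the two kinds of edits can be made concrete by diffing each rewrite against the original learner sentence from slide 2. A minimal sketch using Python's standard difflib; whitespace tokenization and the helper name `token_edits` are illustrative assumptions, not part of the annotation procedure.

```python
from difflib import SequenceMatcher

ORIGINAL = "From this scope, social media has shorten our distance."
MINIMAL = "From this scope, social media has shortened our distance."
FLUENT = "From this perspective, social media has shortened the distance between us."


def token_edits(source: str, target: str) -> list[tuple[str, list[str], list[str]]]:
    """Return the non-matching token spans between two sentences (whitespace tokens)."""
    src, tgt = source.split(), target.split()
    matcher = SequenceMatcher(a=src, b=tgt, autojunk=False)
    return [
        (tag, src[i1:i2], tgt[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


for label, rewrite in (("minimal", MINIMAL), ("fluency", FLUENT)):
    edits = token_edits(ORIGINAL, rewrite)
    changed = sum(len(src_span) for _, src_span, _ in edits)
    print(f"{label}: {len(edits)} edit(s), {changed} source token(s) changed")
    for edit in edits:
        print("   ", edit)
```

The minimal edit touches a single token, while the fluency edit rewrites two spans covering four source tokens.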
11. Data examples
([] marks a deleted token.)
Original: Some family may feel hurt , with regards to their family pride or reputation , on having the knowledge of such genetic disorder running in their family .
Minimal: Some families may feel hurt [] with regards to their family pride or reputation , on having [] knowledge of such a genetic disorder running in their family .
Fluency: On [] learning of such a genetic disorder running in their family , some family members may feel hurt [] regarding their family pride or reputation .
NUCLE: Some family members may feel hurt [] with regards to their family pride or reputation [] on having knowledge of a genetic disorder running in their family .
12. Preference by Native Speakers
Scored and ranked with TrueSkill (as used in WMT)
(Differences between rank groups are statistically significant.)
Rank | Score | Annotation scheme
1 | 1.16 | Fluency edits
2 | 0.54 | NUCLE annotation
3 | 0.26 | Minimal edits
4 | -2.9 | Original sentence
14. Metrics
MaxMatch (M2) (Dahlmeier and Ng, 2012): grammaticality-oriented
- Phrase-level F-measure
- Designed for error-coded GEC corpora
I-measure (Felice and Briscoe, 2015): grammaticality-oriented
- Token-level accuracy
- Designed for error-coded GEC corpora
GLEU (Napoles et al., 2015): fluency-oriented
- Similar to BLEU, but also takes the source sentence into account
- N-gram precision with a penalty term
- Suitable for corpora without error coding
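The GLEU entry above describes the metric only in outline. The sketch below illustrates the idea of n-gram precision with a source-aware penalty. It is a simplified, single-sentence, single-reference variant written for illustration; its exact counting rule is our assumption, not the formula of Napoles et al. (2015) or any released GLEU implementation.

```python
import math
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def gleu_sketch(source: str, hypothesis: str, reference: str, max_n: int = 4) -> float:
    """Simplified GLEU-style score (NOT the official formula).

    For each n: hypothesis n-grams matching the reference, minus hypothesis
    n-grams that the reference removed from the source (errors left
    uncorrected), divided by the hypothesis n-gram count. Precisions are
    combined by geometric mean with a BLEU-style brevity penalty.
    """
    src, hyp, ref = source.split(), hypothesis.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        h, r, s = ngrams(hyp, n), ngrams(ref, n), ngrams(src, n)
        total = sum(h.values())
        if total == 0:
            return 0.0
        matches = sum(min(count, r[g]) for g, count in h.items())
        # Source n-grams kept beyond what the reference allows.
        penalty = sum(
            min(max(count - r[g], 0), max(s[g] - r[g], 0)) for g, count in h.items()
        )
        p_n = max(matches - penalty, 0) / total
        if p_n == 0:
            return 0.0
        log_precisions.append(math.log(p_n))
    brevity = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    return brevity * math.exp(sum(log_precisions) / max_n)


SRC = "From this scope , social media has shorten our distance ."
REF = "From this scope , social media has shortened our distance ."
print(gleu_sketch(SRC, REF, REF))  # 1.0: identical to the reference
print(gleu_sketch(SRC, SRC, REF))  # 0.0: the uncorrected error zeroes 4-gram precision
```

Because this sketch scores a single sentence without smoothing, one uncorrected error is enough to drive the score to zero.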
15. Annotation Scheme & Metrics
Annotation schemes (= references): Fluency edits; NUCLE (error-coded, ≈ minimal edits)
Automated metrics: M2, I-measure, GLEU
Q: Which combination of reference and metric is best for evaluation?
Oracle ranking (human judgments): Grundkiewicz et al. (2015).
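One common way to answer this question is to rank a set of systems with each reference/metric combination and correlate that ranking with the human ranking. A minimal sketch of Spearman's rank correlation using the tie-free formula; the system scores below are made-up placeholders, not data from the paper or from Grundkiewicz et al. (2015).

```python
def ranks(values: list[float]) -> list[int]:
    """Rank positions (1 = highest value); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def spearman_rho(xs: list[float], ys: list[float]) -> float:
    """Spearman's rho for tie-free data: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(xs)
    rx, ry = ranks(xs), ranks(ys)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical scores for four hypothetical systems (not results from the paper).
human_scores = [0.9, 0.4, 0.1, -0.5]      # e.g., human TrueSkill scores
metric_scores = [62.0, 58.5, 60.1, 55.2]  # e.g., one metric with one reference set
print(spearman_rho(human_scores, metric_scores))
```

Here the metric swaps the second and third systems, so rho falls from 1.0 to about 0.8. The combination with the highest correlation agrees best with human judgments.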
19. New annotation: Fluency edits
- NUCLE 3.2 dataset (1,312 sentences)
- Two crowdsourced annotators per sentence
- Qualified by expert editors
- Reward: $0.07–$0.10 per sentence
- Total: approx. $240 in 24 hours
Q: What is the quality of their edits?
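As a quick consistency check on these figures: two annotators per sentence over 1,312 sentences gives 2,624 rewritten sentences. A small Python sketch of the arithmetic, working in integer cents to avoid floating-point rounding. It assumes every rewrite earns the same reward and ignores any platform fees or bonuses, which the slide does not break down.

```python
SENTENCES = 1312          # NUCLE 3.2 sentences (from the slide)
ANNOTATORS_PER_SENTENCE = 2
REWARD_CENTS = (7, 10)    # $0.07 to $0.10 per sentence


def total_cents(reward_cents: int) -> int:
    """Total payout in cents if every rewritten sentence earns the given reward."""
    return SENTENCES * ANNOTATORS_PER_SENTENCE * reward_cents


low, high = (total_cents(r) for r in REWARD_CENTS)
print(f"${low / 100:.2f} to ${high / 100:.2f}")  # $183.68 to $262.40
```

The reported total of roughly $240 falls inside this range.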
20. Data examples
Original: Some family may feel hurt , with regards to their family pride or reputation , on having the knowledge of such genetic disorder running in their family .
Fluency (expert): On [] learning of such a genetic disorder running in their family , some family members may feel hurt [] regarding their family pride or reputation .
Fluency (crowdsourced): Some relatives may [] be concerned about the family’s [] reputation – not to mention their own pride – in relation to this news of [] familial genetic defectiveness [] .
NUCLE: Some family members may feel hurt [] with regards to their family pride or reputation [] on having knowledge of a genetic disorder running in their family .
21. Preference by Native Speakers
Scored and ranked with TrueSkill (as used in WMT)
(Differences between rank groups are statistically significant.)
Rank | Score | Annotation scheme
1 | 1.16 | Fluency edits (expert)
1 | 0.97 | Fluency edits (crowdsourced)
2 | 0.26 | NUCLE reference
3 | -2.9 | Original sentence
23. Summary
1. The GEC community has not clearly distinguished grammaticality from fluency.
2. A fluency-oriented annotation scheme and metric:
- Preferred by native speakers
- Higher correlation with human rankings
- Easier and cheaper to collect new datasets
New corpora should be produced regularly (as in SMT), rather than relying too heavily on a single annotated corpus.
24. All fluency references are available at
https://github.com/keisks/reassess-gec.git
For full experiments and analysis:
http://aclweb.org/anthology/Q/Q16/Q16-1013.pdf
Thank you!