Grammatical Error Correction
for ESL-Learners by SMT
Getting it right
Marcin Junczys-Dowmunt Roman Grundkiewicz
Information Systems Laboratory
Faculty of Mathematics and Computer Science
Adam Mickiewicz University in Poznań, Poland
January 5th, 2015
Part I
What's been going on with SMT in
Grammatical Error Correction (GEC)?
The CoNLL-2014 Shared Task
Focus: English as a second language (ESL);
Preceded by the HOO 2011 and 2012 (Helping Our Own) shared tasks: detection and correction of determiner and preposition errors in scientific papers;
Preceded by the CoNLL-2013 shared task on Grammatical Error Correction: five error types (determiners, prepositions, noun number, subject-verb agreement, verb forms); 16 teams participated.
This year: 28 error categories; 13 teams participated.
NUCLE: NUS Corpus of Learner English
Dahlmeier et al. 2013
National University of Singapore (NUS) Natural Language
Processing Group led by Prof. Hwee Tou Ng and the NUS
Centre for English Language Communication led by Prof.
Siew Mei Wu;
1,400 essays, written by NUS students, ca. 1 million words,
ca. 51,000 sentences;
Topics: environmental pollution, healthcare, etc.;
Annotated with 28 error categories by NUS English
instructors.
NUCLE: NUS Corpus of Learner English
Example
<DOC nid="840">
<TEXT>
<P>Engineering design process can be defined ... </P>
<P>Firstly, engineering design ... </P>
...
</TEXT>
<ANNOTATION teacher_id="173">
<MISTAKE start_par="0" start_off="0" end_par="0" end_off="26">
<TYPE>ArtOrDet</TYPE>
<CORRECTION>The engineering design process</CORRECTION>
</MISTAKE>
...
</ANNOTATION>
</DOC>
NUCLE: NUS Corpus of Learner English
NUCLE Error Types I
* 6647 ArtOrDet Article or Determiner
5300 Wci Wrong collocation/idiom
4668 Rloc- Local redundancy
* 3770 Nn Noun number
3200 Vt Verb tense
3054 Mec Punctuation, spelling
* 2412 Prep Preposition
2160 Wform Word form
* 1527 SVA Subject-verb-agreement
1452 Others Other errors
* 1444 Vform Verb form
1349 Trans Link word/phrases
1073 Um Unclear meaning
925 Pref Pronoun reference
NUCLE: NUS Corpus of Learner English
NUCLE Error Types II
861 Srun Run-ons, comma splice
674 WOinc Incorrect sentence form
571 Wtone Tone
544 Cit Citation
515 Spar Parallelism
431 Vm Verb modal
411 V0 Missing verb
353 Ssub Subordinate clause
345 WOadv Adverb/adjective position
242 Npos Noun possessive
186 Pform Pronoun form
174 Sfrag Fragment
50 Wa Acronyms
47 Smod Dangling modifier
Evaluation Metric: MaxMatch (M2)
Dahlmeier and Ng 2012
An algorithm for efficiently computing the sequence of
phrase-level edits between a source sentence and a hypothesis.
Finds the maximum overlap with the gold-standard.
Based on Levenshtein distance matrix for candidate and
reference sentence.
The optimal edit sequence is scored using the F1.0 measure (CoNLL-2013)
or the F0.5 measure (CoNLL-2014).
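For reference, the F-measure arithmetic over phrase-level edits can be sketched as follows (the edit counts passed in are illustrative, not numbers from the shared task):

```python
def f_beta(matched, proposed, gold, beta=0.5):
    """F_beta over phrase-level edits: matched = system edits that
    overlap the gold standard, proposed = all system edits,
    gold = all gold-standard edits."""
    p = matched / proposed if proposed else 0.0
    r = matched / gold if gold else 0.0
    if p + r == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * p * r / (b2 * p + r)

# CoNLL-2013 scored with F1.0; CoNLL-2014 with F0.5,
# which weights precision more heavily than recall.
print(f_beta(3, 5, 6, beta=1.0))
print(f_beta(3, 5, 6, beta=0.5))
```

With beta = 0.5, a precision-heavy system scores better than a recall-heavy one at the same F1, which matters for the ranking discussion later in the talk.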
Evaluation Metric: MaxMatch (M2)
Example calculation
S The cat sat at mat .
A 3 4|||Prep|||on|||REQUIRED|||-NONE-|||0
A 4 4|||ArtOrDet|||the|||REQUIRED|||-NONE-|||0
S The dog .
A 1 2|||NN|||dogs|||REQUIRED|||-NONE-|||0
S Giant otters is an apex predator .
A 2 3|||SVA|||are|||REQUIRED|||-NONE-|||0
A 3 4|||ArtOrDet|||-NONE-|||REQUIRED|||-NONE-|||0
A 5 6|||NN|||predators|||REQUIRED|||-NONE-|||0
A 1 2|||NN|||otter|||REQUIRED|||-NONE-|||1
A cat sat on the mat .
The dog .
Giant otters are apex predator .
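A minimal sketch of reading this annotation format and applying the edits; it ignores the annotator field and alternative annotations, which is an intentional simplification:

```python
def parse_m2(block):
    """Parse one sentence block in the M2 format shown above:
    an S line with the tokenized source, then A lines of the form
    'A start end|||type|||correction|||REQUIRED|||-NONE-|||annotator'."""
    lines = block.strip().split("\n")
    tokens = lines[0].split()[1:]          # drop the leading 'S'
    edits = []
    for line in lines[1:]:
        span, etype, corr = line[2:].split("|||")[:3]
        start, end = map(int, span.split())
        edits.append((start, end, etype, corr))
    return tokens, edits

def apply_edits(tokens, edits):
    """Apply edits right-to-left so earlier token offsets stay valid;
    a correction of '-NONE-' is a deletion."""
    out = list(tokens)
    for start, end, _etype, corr in sorted(edits, reverse=True):
        repl = [] if corr == "-NONE-" else corr.split()
        out[start:end] = repl
    return " ".join(out)

block = """S The dog .
A 1 2|||NN|||dogs|||REQUIRED|||-NONE-|||0"""
tokens, edits = parse_m2(block)
print(apply_edits(tokens, edits))
```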
The CoNLL-2014 Shared Task
Final Results and Ranking
Team P R M2_0.5
1 CAMB 39.71 30.10 37.33
2 CUUI 41.78 24.88 36.79
3 AMU 41.62 21.40 35.01
4 POST 34.51 21.73 30.88
5 NTHU 35.08 18.85 29.92
6 RAC 33.14 14.99 26.68
7 UMC 31.27 14.46 25.37
8 PKU 32.21 13.65 25.32
9 NARA 21.57 29.38 22.78
10 SJTU 30.11 5.10 15.19
11 UFC 70.00 1.72 7.84
12 IPN 11.28 2.85 7.09
13 IITB 30.77 1.39 5.90
Grammatical Error Correction by Data-Intensive and
Feature-Rich Statistical Machine Translation
Marcin Junczys-Dowmunt, Roman Grundkiewicz. CoNLL-2014 Shared Task
Grammatical error correction seen as translation from
erroneous into corrected sentences.
Baseline is a standard phrase-based Moses set-up.
Explore the interaction of:
Parameter optimization
Web-scale language models
Larger error-annotated resources as parallel data
Task-specific dense and sparse features
Data used:
TM: NUCLE, Lang-8 (scraped)
LM: TM data, Wikipedia, CommonCrawl (Buck et al. 2014)
System Combination for Grammatical Error Correction
Raymond Hendy Susanto, Peter Phandi, Hwee Tou Ng. EMNLP 2014
Uses MEMT to combine several GEC systems:
Two classifier-based systems and two SMT-based systems.
Various n-top combinations from the CoNLL-2014 shared task.
Data used:
Classifiers: POS-Taggers, Chunkers, etc.
SMT-TM: NUCLE, Lang-8 (Mizumoto et al. 2012)
SMT-LM: TM data, Wikipedia
System Combination: all CoNLL-2014 submissions
Lang-8.com
A social network for language learners
Lang-8.com
Examples
I thought/think that they are/there is a good combination between
winter and reading .
Today I have/had a bad began/beginning .
I wanna improve my skill/skills !
Is there somebody/anybody who wants to be friend/friends with me ?
If you need more information , please don ’t hesitate to tell/contact me
please .
Language model data
Corpus Sentences Tokens
NUCLE 57.15 K 1.15 M
Lang-8 2.23 M 30.03 M
Wikipedia 213.08 M 3.37 G
CommonCrawl 59.13 G 975.63 G
Performance on the CoNLL-2014 Shared Task test set
[Bar chart: M2_0.5 (30-50) for the CoNLL-2014 top 5 (unrestricted), Susanto et al. 2014 (restricted and unrestricted), and this talk (restricted and unrestricted).]
Part II
10 Lessons from the
CoNLL-2014 Shared Task
Lesson 1: Implement task-specific features
Add features that seem to be relevant for GEC: Levenshtein distance
source phrase (s) target phrase (t) LD D I S
a short time . short term only . 3 1 1 1
a situation into a situation 1 0 1 0
a supermarket . a supermarket . 0 0 0 0
a supermarket . at a supermarket 1 1 0 0
able unable 1 0 0 1
LD(s, t) ≡ Levenshtein distance between source phrase (s) and target phrase (t), counted in words;
The feature fires as e^LD(s,t), so in the log-linear model it sums to the total number of edits applied to the sentence.
From the Levenshtein distance matrix, compute counts for deletions (D), insertions (I), and substitutions (S). The corresponding features e^D, e^I, and e^S are likewise additive in the log-linear model.
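A sketch of computing LD and the D/I/S counts from the Levenshtein matrix. Note that when several edit paths are equally cheap, the D/I/S split is not unique; this backtracker prefers substitutions:

```python
def edit_ops(src, tgt):
    """Word-level Levenshtein distance between two phrases, then
    backtrack the DP matrix to count deletions (D), insertions (I),
    and substitutions (S)."""
    s, t = src.split(), tgt.split()
    n, m = len(s), len(t)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # match/substitution
    # backtrack to recover operation counts
    i, j, D, I, S = n, m, 0, 0, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (s[i - 1] != t[j - 1]):
            S += s[i - 1] != t[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            D += 1
            i -= 1
        else:
            I += 1
            j -= 1
    return d[n][m], D, I, S

print(edit_ops("a situation", "into a situation"))  # LD=1: one insertion
```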
Lesson 1: Implement task-specific features
A stateful feature: Operation Sequence Model
Target: Hence , a new problem surfaces .
Source: Then a new problem comes out .
TRANS Hence TO Then DEL , TRANS a TO a
TRANS new TO new TRANS problem TO problem
TRANS surfaces TO comes INS out TRANS . TO .
Create operation sequences for all sentence pairs.
Compute an n-gram language model over the operation sequences.
The OSM usually contains reordering operations; here it degenerates to a stateful model of edit sequences.
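The operation stream above can be reproduced mechanically. A minimal sketch, using a plain difflib LCS diff in place of the real word alignment (an assumption), that emits the slide's TRANS/DEL/INS sequence:

```python
import difflib

def operation_sequence(source, target):
    """Recreate the edit-operation stream from the slides: read the
    target against the source, emitting 'TRANS t TO s' for aligned
    words, 'DEL t' for target-only words, and 'INS s' for source-only
    words. The alignment is a simple LCS diff, a stand-in for the
    real word alignment."""
    s, t = source.split(), target.split()
    ops = []
    sm = difflib.SequenceMatcher(a=t, b=s)
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag == "equal":
            ops += [f"TRANS {w} TO {w}" for w in t[i1:i2]]
        elif tag == "replace":
            ops += [f"TRANS {tw} TO {sw}" for tw, sw in zip(t[i1:i2], s[j1:j2])]
            # length mismatch: leftover words become DEL/INS
            ops += [f"DEL {w}" for w in t[i1 + (j2 - j1):i2]]
            ops += [f"INS {w}" for w in s[j1 + (i2 - i1):j2]]
        elif tag == "delete":      # word only in the target
            ops += [f"DEL {w}" for w in t[i1:i2]]
        elif tag == "insert":      # word only in the source
            ops += [f"INS {w}" for w in s[j1:j2]]
    return ops

src = "Then a new problem comes out ."
tgt = "Hence , a new problem surfaces ."
print(operation_sequence(src, tgt))
```

On this sentence pair the sketch reproduces exactly the sequence built up on the slide.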
Lesson 1: Implement task-specific features
Results, optimized with BLEU
[Bar chart: M2_0.5 (36-42) for configurations B, L, E, O, optimized with BLEU.]
• (B)aseline: vanilla Moses with reordering disabled.
• (L)evenshtein distance: word-based, as a TM score; sums to the number of edits.
• (E)dits: counts insertions, deletions, and substitutions.
• (O)peration Sequence Model: without reordering, degenerates to a stateful sequence model of edits.
Lesson 2: Optimize the right metric
Implement an optimizer for the metric that you have to use in the end
[Two bar charts: M2_0.5 (36-42) for B, L, E, O, optimized with BLEU vs. optimized with M2_0.5.]
Lesson 3: Account for optimizer instability
Or don’t count on good luck (Clark et al. 2011)
[Two bar charts: M2_0.5 (36-42) for B, L, E, O, optimized with BLEU vs. optimized with M2_0.5.]
Lesson 3: Account for optimizer instability
Averaged weights beat mean scores (Cettolo et al. 2011)
[Two bar charts: M2_0.5 (36-42) for B, L, E, O, optimized with BLEU vs. optimized with M2_0.5.]
Performance on the CoNLL-2014 Shared Task test set
[Bar chart: M2_0.5 (30-50) for the CoNLL-2014 top 5 (unrestricted), Susanto et al. 2014 (restricted and unrestricted), and this talk (restricted and unrestricted).]
Lesson 4: Bigger development sets are better
Maybe a little too big
Until now, the CoNLL-2013 test set was used as the dev set:
Consists of only 1,381 sentences with 3,461 error annotations.
Rate of erroneous tokens: 14.97%.
But there is also NUCLE, currently used as training data:
Consists of 57,151 sentences with 44,385 error annotations.
Rate of erroneous tokens: 6.23%.
We are not supposed to look at the error rate of the CoNLL-2014 test set (ca. 10% for Annotator 1, ca. 12% for Annotator 2).
Lesson 4: Bigger development sets are better
Turning NUCLE into a development set via cross validation
1. Greedily remove sentences until error rate 0.15 is reached.
Leaves: 23381 sentences, 39707 annotations.
2. Divide into 4 equal-sized subsets.
3. Tune on 1 subset, add remaining 3 subsets to training data,
repeat for each subset.
4. Average weights for all subsets into single weight vector.
5. Repeat steps 3-4 a couple of times (here: 5) to average weights across iterations.
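Step 1 can be sketched as a greedy filter. The exact removal criterion is not specified on the slide, so dropping the cleanest sentences first is an assumption:

```python
def raise_error_rate(sentences, target_rate=0.15):
    """Greedy sketch of step 1: drop the sentences with the lowest
    error rates until the corpus-level rate of erroneous tokens
    reaches the target. 'sentences' is a list of
    (num_tokens, num_erroneous_tokens) pairs."""
    kept = list(sentences)
    # dropping low-error sentences first raises the global rate fastest
    kept.sort(key=lambda x: x[1] / x[0])
    while kept:
        toks = sum(n for n, _ in kept)
        errs = sum(e for _, e in kept)
        if errs / toks >= target_rate:
            break
        kept.pop(0)               # remove the cleanest sentence
    return kept

corpus = [(10, 0), (10, 0), (10, 2), (10, 3)]
print(raise_error_rate(corpus, 0.15))
```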
Lesson 4: Bigger development sets are better
Turning NUCLE into a development set via cross validation
We are not using the previous dev set, though it could just be added to all subsets.
It is nice to have a second test set (although slightly cheating due to the matching error rate).
Effect of the error rate on tuning:
Low error rate: high precision.
High error rate: high recall.
Future work: can the error rate be used to adjust the training data, too?
Lesson 4: Bigger development sets are better
Performance jump from switching dev sets
[Bar chart: M2_0.5 (38-44) for B, L, E, O after switching dev sets.]
Performance on the CoNLL-2014 Shared Task test set
[Bar chart: M2_0.5 (30-50) for the CoNLL-2014 top 5 (unrestricted), Susanto et al. 2014 (restricted and unrestricted), and this talk (restricted and unrestricted).]
Lesson 5: Expanding the search space barely helps
[Plot: M2_0.5 (42.86-42.90) as a function of cube-pruning stack size and pop-limit (10^3 to 10^4).]
Lesson 6: Self-composition helps (once)
[Plot: M2_0.5 (42.9-43.2) as a function of the number of self-composition iterations (0-3).]
Lesson 7: There's something about sparse features
But tuning them correctly is hard
In our CoNLL-2014 shared task submission, we reported that sparse features help.
Correlation without causation: what actually helped was the tuning scheme.
Still, we strongly believe that sparse features are worth exploring for GEC.
Problems:
kbMIRA does not seem to work well with M2_0.5.
Both PRO and kbMIRA give worse results than MERT for M2_0.5, even without sparse features.
PRO seems to even out with sparse features.
Lesson 7: There's something about sparse features
Sparse feature examples
Source: Then a new problem comes out .
Target: Hence , a new problem surfaces .
subst(Then,Hence)=1        <s> subst(Then,Hence) a=1
insert(,)=1                Hence insert(,) a=1
subst(comes,surfaces)=1    problem subst(comes,surfaces) out=1
del(out)=1                 comes del(out) .=1
(Left column: plain edit features; right column: the same edits with surrounding context words.)
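A sketch of extracting such features. The difflib alignment and the source-side context scheme are simplifying assumptions, not the exact definitions used in our system:

```python
import difflib
from collections import Counter

def sparse_edit_features(source, target):
    """Sketch of sparse edit features: one plain feature per edit
    (subst/insert/del) plus a context variant wrapping the edit in
    neighbouring source words ('<s>'/'</s>' at the margins)."""
    s, t = source.split(), target.split()
    pad = ["<s>"] + s + ["</s>"]            # pad[i + 1] == s[i]
    feats = Counter()
    for tag, i1, i2, j1, j2 in difflib.SequenceMatcher(a=s, b=t).get_opcodes():
        if tag == "equal":
            continue
        n = min(i2 - i1, j2 - j1)           # 1:1 substitutions first
        core = [f"subst({sw},{tw})" for sw, tw in zip(s[i1:i2], t[j1:j2])]
        core += [f"del({w})" for w in s[i1 + n:i2]]     # leftover source
        core += [f"insert({w})" for w in t[j1 + n:j2]]  # leftover target
        left, right = pad[i1], pad[i2 + 1]  # source words around the edit
        for f in core:
            feats[f] += 1
            feats[f"{left} {f} {right}"] += 1
    return feats

feats = sparse_edit_features("Then a new problem comes out .",
                             "Hence , a new problem surfaces .")
print(sorted(feats))
```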
Lesson 8: Nothing beats the language model
Which is actually quite frustrating
Using Kenneth Heafield's CommonCrawl LM is near impossible due to hardware bottlenecks.
Still working on a way to find the upper bound.
Instead: sampled a subset from the raw text data with cross-entropy difference filtering.
[Bar chart: M2_0.5 (42-50) with the Wikipedia LM vs. the CommonCrawl LM.]
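Cross-entropy difference filtering (Moore-Lewis style) scores each sentence by H_in(s) - H_out(s) and keeps the lowest-scoring part of the corpus. A toy sketch with add-alpha unigram LMs standing in for the real n-gram models:

```python
import math
from collections import Counter

def unigram_nll(tokens, vocab_counts, total, alpha=1.0):
    """Per-token negative log-probability under an add-alpha unigram
    LM over the seen vocabulary; a stand-in for a real n-gram model."""
    V = len(vocab_counts)
    nll = -sum(math.log((vocab_counts[w] + alpha) / (total + alpha * V))
               for w in tokens)
    return nll / len(tokens)

def cross_entropy_difference(sentence, in_counts, in_total,
                             out_counts, out_total):
    """Moore-Lewis score H_in(s) - H_out(s): lower means more like the
    in-domain data; keep the lowest-scoring fraction of the corpus."""
    toks = sentence.split()
    return (unigram_nll(toks, in_counts, in_total)
            - unigram_nll(toks, out_counts, out_total))

in_domain = "i think that it is a good idea".split()
general = "the stock market fell sharply today".split()
in_c, out_c = Counter(in_domain), Counter(general)
s1 = "it is a good idea"
s2 = "the market fell"
print(cross_entropy_difference(s1, in_c, len(in_domain), out_c, len(general)))
print(cross_entropy_difference(s2, in_c, len(in_domain), out_c, len(general)))
```

The learner-like sentence gets a lower (better) score than the newswire-like one, which is exactly the signal used to sample a CommonCrawl subset.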
Performance on the CoNLL-2014 Shared Task test set
[Bar chart: M2_0.5 (30-50) for the CoNLL-2014 top 5 (unrestricted), Susanto et al. 2014 (restricted and unrestricted), and this talk (restricted and unrestricted).]
Lesson 9: Collecting parallel monolingual data is hard
For the 2014 Shared Task we scraped more data from Lang-8.
Collected twice as many sentences (ca. 4M) compared to the free release with 2M.
Legally and morally dubious.
[Bar chart: M2_0.5 (48-52) for Lang-8 free + CommonCrawl vs. Lang-8 scraped + CommonCrawl.]
Performance on the CoNLL-2014 Shared Task test set
[Bar chart: M2_0.5 (30-50) for the CoNLL-2014 top 5 (unrestricted), Susanto et al. 2014 (restricted and unrestricted), and this talk (restricted and unrestricted).]
Lesson 9: Collecting parallel monolingual data is hard
Other less obvious but legally safe sources
Wikipedia edit histories (Done that)
Over 14 million sentences with small edits.
Publicly available as the WikEd corpus.
Currently thinking about what to do with it.
Other wikis: over 400,000 Wikia wikis (Working on it)
The same format as Wikipedia dumps.
Our scripts can already work with that.
Wikia has a local branch in Poznań!
And one of my students is an intern.
Diff between two CommonCrawl revisions (Dreading it)
Take revisions maybe one year apart.
Group documents by URL.
Sentence-align and keep pairs with small differences.
. . .
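The last step can be sketched as follows; the difflib sentence pairing and the similarity threshold are assumptions for illustration:

```python
import difflib

def keep_edit_pairs(old_sents, new_sents, max_rel_dist=0.3):
    """Pair up sentences from two document revisions with a diff and
    keep pairs that changed, but only a little (word-level similarity
    above a threshold, below identity)."""
    pairs = []
    sm = difflib.SequenceMatcher(a=old_sents, b=new_sents)
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag != "replace" or i2 - i1 != j2 - j1:
            continue                       # only 1:1 replacements here
        for o, n in zip(old_sents[i1:i2], new_sents[j1:j2]):
            ow, nw = o.split(), n.split()
            ratio = difflib.SequenceMatcher(a=ow, b=nw).ratio()
            # ratio 1.0 = identical; keep small, non-zero changes
            if 1.0 - max_rel_dist <= ratio < 1.0:
                pairs.append((o, n))
    return pairs

old = ["He go to school .", "Completely different sentence here ."]
new = ["He goes to school .", "Nothing in common with the above ."]
print(keep_edit_pairs(old, new))
```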
Lesson 10: GEC needs a better metric
Maybe tuning with BLEU is actually smarter
Team P R M2_0.5 BLEU
1 CAMB 39.71 30.10 37.33 81.77
2 CUUI 41.78 24.88 36.79 83.46
3 AMU 41.62 21.40 35.01 83.42
4 POST 34.51 21.73 30.88 81.61
5 NTHU 35.08 18.85 29.92 82.42
6 RAC 33.14 14.99 26.68 81.91
7 UMC 31.27 14.46 25.37 83.66
8 PKU 32.21 13.65 25.32 83.71
9 NARA 21.57 29.38 22.78 –
10 SJTU 30.11 5.10 15.19 85.96
11 UFC 70.00 1.72 7.84 86.82
12 IPN 11.28 2.85 7.09 83.39
13 IITB 30.77 1.39 5.90 86.50
14 Input 0.00 0.00 0.00 86.79
Lesson 10: GEC needs a better metric
Maybe tuning with BLEU is actually smarter
Team P R M2_0.5 BLEU
11 UFC 70.00 1.72 7.84 86.82
14 Input 0.00 0.00 0.00 86.79
13 IITB 30.77 1.39 5.90 86.50
10 SJTU 30.11 5.10 15.19 85.96
8 PKU 32.21 13.65 25.32 83.71
7 UMC 31.27 14.46 25.37 83.66
2 CUUI 41.78 24.88 36.79 83.46
3 AMU 41.62 21.40 35.01 83.42
12 IPN 11.28 2.85 7.09 83.39
5 NTHU 35.08 18.85 29.92 82.42
6 RAC 33.14 14.99 26.68 81.91
1 CAMB 39.71 30.10 37.33 81.77
4 POST 34.51 21.73 30.88 81.61
9 NARA 21.57 29.38 22.78 –
Summing up
Proper parameter tuning is crucial.
The language model is the strongest feature.
With proper tuning, vanilla Moses beats the published top results on the CoNLL-2014 shared task (39.39% vs. 41.63%) when using restricted training data.
New top result with restricted data (1-composition): 43.18%.
With additional LM and TM data and task-specific features, Moses beats all unrestricted systems and previously published system combinations (partially tuned on test data).
New top result with unrestricted data: 50.80% (51.00% with 1-composition).
But: the M2 metric is highly dubious.
Future Work
There is still a lot to learn:
How do I properly tune sparse features with M2 and PRO or
kbMIRA?
Is M2 a sensible metric at all?
Where and how can I collect more data?
What's the effect of other LMs, e.g. neural language models?
Can I fully integrate ML methods (Vowpal Wabbit)?
. . .
References
Christian Buck, Kenneth Heafield, and Bas van Ooyen. 2014.
N-gram Counts and Language Models from the Common
Crawl. LREC 2014.
Daniel Dahlmeier, Hwee Tou Ng, and Siew Mei Wu. 2013.
Building a Large Annotated Corpus of Learner English: The
NUS Corpus of Learner English. BEA 2013.
Daniel Dahlmeier and Hwee Tou Ng. 2012. Better Evaluation
for Grammatical Error Correction. NAACL 2012.
Hwee Tou Ng, Siew Mei Wu, Ted Briscoe, Christian
Hadiwinoto, Raymond Hendy Susanto, and Christopher
Bryant. 2014. The CoNLL-2014 shared task on grammatical
error correction. CoNLL 2014.
References
Marcin Junczys-Dowmunt and Roman Grundkiewicz. 2014.
The AMU system in the CoNLL-2014 shared task:
Grammatical error correction by data-intensive and
feature-rich statistical machine translation. CoNLL 2014.
Tomoya Mizumoto, Yuta Hayashibe, Mamoru Komachi,
Masaaki Nagata, and Yuji Matsumoto. 2012. The effect of
learner corpus size in grammatical error correction of ESL
writings. COLING 2012.
Jonathan H. Clark, Chris Dyer, Alon Lavie, and Noah A.
Smith. 2011. Better hypothesis testing for statistical machine
translation: Controlling for optimizer instability. HLT 2011.
Mauro Cettolo, Nicola Bertoldi, and Marcello Federico. 2011.
Methods for smoothing the optimizer instability in SMT. MT
Summit 2011.
The Pupil Has Become the Master: Teacher-Student Model-Based Word Embedding D...The Pupil Has Become the Master: Teacher-Student Model-Based Word Embedding D...
The Pupil Has Become the Master: Teacher-Student Model-Based Word Embedding D...
Jinho Choi
 

What's hot (19)

MACHINE-DRIVEN TEXT ANALYSIS
MACHINE-DRIVEN TEXT ANALYSISMACHINE-DRIVEN TEXT ANALYSIS
MACHINE-DRIVEN TEXT ANALYSIS
 
5. manuel arcedillo & juanjo arevalillo (hermes) translation memories
5. manuel arcedillo & juanjo arevalillo (hermes) translation memories5. manuel arcedillo & juanjo arevalillo (hermes) translation memories
5. manuel arcedillo & juanjo arevalillo (hermes) translation memories
 
pptphrase-tagset-mapping-for-french-and-english-treebanks-and-its-application...
pptphrase-tagset-mapping-for-french-and-english-treebanks-and-its-application...pptphrase-tagset-mapping-for-french-and-english-treebanks-and-its-application...
pptphrase-tagset-mapping-for-french-and-english-treebanks-and-its-application...
 
7. Trevor Cohn (usfd) Statistical Machine Translation
7. Trevor Cohn (usfd) Statistical Machine Translation7. Trevor Cohn (usfd) Statistical Machine Translation
7. Trevor Cohn (usfd) Statistical Machine Translation
 
Master defence 2020 - Anastasiia Khaburska - Statistical and Neural Language ...
Master defence 2020 - Anastasiia Khaburska - Statistical and Neural Language ...Master defence 2020 - Anastasiia Khaburska - Statistical and Neural Language ...
Master defence 2020 - Anastasiia Khaburska - Statistical and Neural Language ...
 
13. Constantin Orasan (UoW) Natural Language Processing for Translation
13. Constantin Orasan (UoW) Natural Language Processing for Translation13. Constantin Orasan (UoW) Natural Language Processing for Translation
13. Constantin Orasan (UoW) Natural Language Processing for Translation
 
Miguel Rios - 2015 - Obtaining SMT dictionaries for related languages
Miguel Rios - 2015 - Obtaining SMT dictionaries for related languagesMiguel Rios - 2015 - Obtaining SMT dictionaries for related languages
Miguel Rios - 2015 - Obtaining SMT dictionaries for related languages
 
ESR10 Joachim Daiber - EXPERT Summer School - Malaga 2015
ESR10 Joachim Daiber - EXPERT Summer School - Malaga 2015ESR10 Joachim Daiber - EXPERT Summer School - Malaga 2015
ESR10 Joachim Daiber - EXPERT Summer School - Malaga 2015
 
[Paper Introduction] Translating into Morphologically Rich Languages with Syn...
[Paper Introduction] Translating into Morphologically Rich Languages with Syn...[Paper Introduction] Translating into Morphologically Rich Languages with Syn...
[Paper Introduction] Translating into Morphologically Rich Languages with Syn...
 
Kypros ma lotta eeva 31 8 iltalotta2
Kypros ma  lotta eeva 31 8 iltalotta2Kypros ma  lotta eeva 31 8 iltalotta2
Kypros ma lotta eeva 31 8 iltalotta2
 
Analysis of Similarity Measures between Short Text for the NTCIR-12 Short Tex...
Analysis of Similarity Measures between Short Text for the NTCIR-12 Short Tex...Analysis of Similarity Measures between Short Text for the NTCIR-12 Short Tex...
Analysis of Similarity Measures between Short Text for the NTCIR-12 Short Tex...
 
Statistical machine translation
Statistical machine translationStatistical machine translation
Statistical machine translation
 
Fast evaluation of Connectionist Language Models
Fast evaluation of Connectionist Language ModelsFast evaluation of Connectionist Language Models
Fast evaluation of Connectionist Language Models
 
Contemporary Models of Natural Language Processing
Contemporary Models of Natural Language ProcessingContemporary Models of Natural Language Processing
Contemporary Models of Natural Language Processing
 
Word2Vec on Italian language
Word2Vec on Italian languageWord2Vec on Italian language
Word2Vec on Italian language
 
Word2vec on the italian language: first experiments
Word2vec on the italian language: first experimentsWord2vec on the italian language: first experiments
Word2vec on the italian language: first experiments
 
French machine reading for question answering
French machine reading for question answeringFrench machine reading for question answering
French machine reading for question answering
 
4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Tran...
4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Tran...4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Tran...
4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Tran...
 
The Pupil Has Become the Master: Teacher-Student Model-Based Word Embedding D...
The Pupil Has Become the Master: Teacher-Student Model-Based Word Embedding D...The Pupil Has Become the Master: Teacher-Student Model-Based Word Embedding D...
The Pupil Has Become the Master: Teacher-Student Model-Based Word Embedding D...
 

Viewers also liked

SURVEY OF ERROR ANALYSIS
SURVEY OF ERROR ANALYSISSURVEY OF ERROR ANALYSIS
SURVEY OF ERROR ANALYSIS
Ranjanvelari
 
SAT Preparation & Tutoring Services in MA
SAT Preparation & Tutoring Services in MASAT Preparation & Tutoring Services in MA
SAT Preparation & Tutoring Services in MA
PreppedPolished
 
Paragraph Basics
Paragraph BasicsParagraph Basics
Paragraph Basics
Prof_K
 
Mining Revision Log of Language Learning SNS for Automated Japanese Error Cor...
Mining Revision Log of Language Learning SNS for Automated Japanese Error Cor...Mining Revision Log of Language Learning SNS for Automated Japanese Error Cor...
Mining Revision Log of Language Learning SNS for Automated Japanese Error Cor...
Tomoya Mizumoto
 
The fourth step in writing 1
The fourth step in writing 1The fourth step in writing 1
The fourth step in writing 1
cadavis78
 
Types of sentences group project
Types of sentences group projectTypes of sentences group project
Types of sentences group project
shannonwheeler11
 
The Third and Fourth Steps in Writing
The Third and Fourth Steps in WritingThe Third and Fourth Steps in Writing
The Third and Fourth Steps in Writing
Prof_K
 
19.4 - Error Analysis for Sentences
19.4 - Error Analysis for Sentences19.4 - Error Analysis for Sentences
19.4 - Error Analysis for Sentences
Designlab Innovation
 
SAT Prep - Improving Sentences
SAT  Prep - Improving SentencesSAT  Prep - Improving Sentences
SAT Prep - Improving Sentences
jpalmertree
 
Apostrophes
ApostrophesApostrophes
Apostrophes
Prof_K
 
The Effect of Learner Corpus Size in Grammatical Error Correction of ESL Writ...
The Effect of Learner Corpus Size in Grammatical Error Correction of ESL Writ...The Effect of Learner Corpus Size in Grammatical Error Correction of ESL Writ...
The Effect of Learner Corpus Size in Grammatical Error Correction of ESL Writ...
Tomoya Mizumoto
 
Detection of semantic errors from simple bangla sentences
Detection of semantic errors from simple bangla sentencesDetection of semantic errors from simple bangla sentences
Detection of semantic errors from simple bangla sentences
Hozaifa Moaj
 
Corpus based digital dictionary construction for ESP
Corpus based digital dictionary construction for ESPCorpus based digital dictionary construction for ESP
Corpus based digital dictionary construction for ESP
Julio Palma
 
Commas
CommasCommas
Commas
Prof_K
 
Error correction using redundant residue number system
Error correction using redundant residue number systemError correction using redundant residue number system
Error correction using redundant residue number system
Nimi T
 
Reshaping the Value of Grammatical Feedback on Writing Using Colors
Reshaping the Value of Grammatical Feedback on Writing Using ColorsReshaping the Value of Grammatical Feedback on Writing Using Colors
Reshaping the Value of Grammatical Feedback on Writing Using Colors
Toyo University
 
Sentences Types: Simple, Compound, Complex, Compound-Complex
Sentences Types: Simple, Compound, Complex, Compound-ComplexSentences Types: Simple, Compound, Complex, Compound-Complex
Sentences Types: Simple, Compound, Complex, Compound-Complex
Belachew Weldegebriel
 
Resolving Grammer Error
Resolving Grammer ErrorResolving Grammer Error
Resolving Grammer Error
CC Undertree
 
First structures practice 1
First structures   practice 1First structures   practice 1
First structures practice 1
Melvin Duke
 

Viewers also liked (20)

SURVEY OF ERROR ANALYSIS
SURVEY OF ERROR ANALYSISSURVEY OF ERROR ANALYSIS
SURVEY OF ERROR ANALYSIS
 
SAT Preparation & Tutoring Services in MA
SAT Preparation & Tutoring Services in MASAT Preparation & Tutoring Services in MA
SAT Preparation & Tutoring Services in MA
 
Paragraph Basics
Paragraph BasicsParagraph Basics
Paragraph Basics
 
Mining Revision Log of Language Learning SNS for Automated Japanese Error Cor...
Mining Revision Log of Language Learning SNS for Automated Japanese Error Cor...Mining Revision Log of Language Learning SNS for Automated Japanese Error Cor...
Mining Revision Log of Language Learning SNS for Automated Japanese Error Cor...
 
The fourth step in writing 1
The fourth step in writing 1The fourth step in writing 1
The fourth step in writing 1
 
Focus on writing ch. 20
Focus on writing ch. 20Focus on writing ch. 20
Focus on writing ch. 20
 
Types of sentences group project
Types of sentences group projectTypes of sentences group project
Types of sentences group project
 
The Third and Fourth Steps in Writing
The Third and Fourth Steps in WritingThe Third and Fourth Steps in Writing
The Third and Fourth Steps in Writing
 
19.4 - Error Analysis for Sentences
19.4 - Error Analysis for Sentences19.4 - Error Analysis for Sentences
19.4 - Error Analysis for Sentences
 
SAT Prep - Improving Sentences
SAT  Prep - Improving SentencesSAT  Prep - Improving Sentences
SAT Prep - Improving Sentences
 
Apostrophes
ApostrophesApostrophes
Apostrophes
 
The Effect of Learner Corpus Size in Grammatical Error Correction of ESL Writ...
The Effect of Learner Corpus Size in Grammatical Error Correction of ESL Writ...The Effect of Learner Corpus Size in Grammatical Error Correction of ESL Writ...
The Effect of Learner Corpus Size in Grammatical Error Correction of ESL Writ...
 
Detection of semantic errors from simple bangla sentences
Detection of semantic errors from simple bangla sentencesDetection of semantic errors from simple bangla sentences
Detection of semantic errors from simple bangla sentences
 
Corpus based digital dictionary construction for ESP
Corpus based digital dictionary construction for ESPCorpus based digital dictionary construction for ESP
Corpus based digital dictionary construction for ESP
 
Commas
CommasCommas
Commas
 
Error correction using redundant residue number system
Error correction using redundant residue number systemError correction using redundant residue number system
Error correction using redundant residue number system
 
Reshaping the Value of Grammatical Feedback on Writing Using Colors
Reshaping the Value of Grammatical Feedback on Writing Using ColorsReshaping the Value of Grammatical Feedback on Writing Using Colors
Reshaping the Value of Grammatical Feedback on Writing Using Colors
 
Sentences Types: Simple, Compound, Complex, Compound-Complex
Sentences Types: Simple, Compound, Complex, Compound-ComplexSentences Types: Simple, Compound, Complex, Compound-Complex
Sentences Types: Simple, Compound, Complex, Compound-Complex
 
Resolving Grammer Error
Resolving Grammer ErrorResolving Grammer Error
Resolving Grammer Error
 
First structures practice 1
First structures   practice 1First structures   practice 1
First structures practice 1
 

Similar to Automatic Grammatical Error Correction for ESL-Learners by SMT - Getting it right

A single stage single constraints linear fractional programming problem an ap...
A single stage single constraints linear fractional programming problem an ap...A single stage single constraints linear fractional programming problem an ap...
A single stage single constraints linear fractional programming problem an ap...
orajjournal
 
Natural language processing for requirements engineering: ICSE 2021 Technical...
Natural language processing for requirements engineering: ICSE 2021 Technical...Natural language processing for requirements engineering: ICSE 2021 Technical...
Natural language processing for requirements engineering: ICSE 2021 Technical...
alessio_ferrari
 
Thomas Wolf "Transfer learning in NLP"
Thomas Wolf "Transfer learning in NLP"Thomas Wolf "Transfer learning in NLP"
Thomas Wolf "Transfer learning in NLP"
Fwdays
 
1066_multitask_prompted_training_en.pdf
1066_multitask_prompted_training_en.pdf1066_multitask_prompted_training_en.pdf
1066_multitask_prompted_training_en.pdf
ssusere320ca
 
Yves Peirsman - Deep Learning for NLP
Yves Peirsman - Deep Learning for NLPYves Peirsman - Deep Learning for NLP
Yves Peirsman - Deep Learning for NLP
Hendrik D'Oosterlinck
 
The NLP Muppets revolution!
The NLP Muppets revolution!The NLP Muppets revolution!
The NLP Muppets revolution!
Fabio Petroni, PhD
 
Lepor: augmented automatic MT evaluation metric
Lepor: augmented automatic MT evaluation metricLepor: augmented automatic MT evaluation metric
Lepor: augmented automatic MT evaluation metricLifeng (Aaron) Han
 
LEPOR: an augmented machine translation evaluation metric - Thesis PPT
LEPOR: an augmented machine translation evaluation metric - Thesis PPT LEPOR: an augmented machine translation evaluation metric - Thesis PPT
LEPOR: an augmented machine translation evaluation metric - Thesis PPT
Lifeng (Aaron) Han
 
[Paper Reading] Unsupervised Learning of Sentence Embeddings using Compositi...
[Paper Reading]  Unsupervised Learning of Sentence Embeddings using Compositi...[Paper Reading]  Unsupervised Learning of Sentence Embeddings using Compositi...
[Paper Reading] Unsupervised Learning of Sentence Embeddings using Compositi...
Hiroki Shimanaka
 
PL Lecture 01 - preliminaries
PL Lecture 01 - preliminariesPL Lecture 01 - preliminaries
PL Lecture 01 - preliminaries
Schwannden Kuo
 
Deep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & OpportunitiesDeep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & Opportunities
Matthew Lease
 
Experimenting with eXtreme Design (EKAW2010)
Experimenting with eXtreme Design (EKAW2010)Experimenting with eXtreme Design (EKAW2010)
Experimenting with eXtreme Design (EKAW2010)
evabl444
 
Open vocabulary problem
Open vocabulary problemOpen vocabulary problem
Open vocabulary problem
JaeHo Jang
 
Pptphrase tagset mapping for french and english treebanks and its application...
Pptphrase tagset mapping for french and english treebanks and its application...Pptphrase tagset mapping for french and english treebanks and its application...
Pptphrase tagset mapping for french and english treebanks and its application...Lifeng (Aaron) Han
 
Moore_slides.ppt
Moore_slides.pptMoore_slides.ppt
Moore_slides.pptbutest
 
Training at AI Frontiers 2018 - Lukasz Kaiser: Sequence to Sequence Learning ...
Training at AI Frontiers 2018 - Lukasz Kaiser: Sequence to Sequence Learning ...Training at AI Frontiers 2018 - Lukasz Kaiser: Sequence to Sequence Learning ...
Training at AI Frontiers 2018 - Lukasz Kaiser: Sequence to Sequence Learning ...
AI Frontiers
 
Cd project
Cd projectCd project
Cd project
divikmittal
 
Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"
Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"
Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"
Fwdays
 
Pptphrase tagset mapping for french and english treebanks and its application...
Pptphrase tagset mapping for french and english treebanks and its application...Pptphrase tagset mapping for french and english treebanks and its application...
Pptphrase tagset mapping for french and english treebanks and its application...Lifeng (Aaron) Han
 
Investigating teachers' understanding of IMS Learning Design: Yes they can!
Investigating teachers' understanding of IMS Learning Design: Yes they can!Investigating teachers' understanding of IMS Learning Design: Yes they can!
Investigating teachers' understanding of IMS Learning Design: Yes they can!
Michael Derntl
 

Similar to Automatic Grammatical Error Correction for ESL-Learners by SMT - Getting it right (20)

A single stage single constraints linear fractional programming problem an ap...
A single stage single constraints linear fractional programming problem an ap...A single stage single constraints linear fractional programming problem an ap...
A single stage single constraints linear fractional programming problem an ap...
 
Natural language processing for requirements engineering: ICSE 2021 Technical...
Natural language processing for requirements engineering: ICSE 2021 Technical...Natural language processing for requirements engineering: ICSE 2021 Technical...
Natural language processing for requirements engineering: ICSE 2021 Technical...
 
Thomas Wolf "Transfer learning in NLP"
Thomas Wolf "Transfer learning in NLP"Thomas Wolf "Transfer learning in NLP"
Thomas Wolf "Transfer learning in NLP"
 
1066_multitask_prompted_training_en.pdf
1066_multitask_prompted_training_en.pdf1066_multitask_prompted_training_en.pdf
1066_multitask_prompted_training_en.pdf
 
Yves Peirsman - Deep Learning for NLP
Yves Peirsman - Deep Learning for NLPYves Peirsman - Deep Learning for NLP
Yves Peirsman - Deep Learning for NLP
 
The NLP Muppets revolution!
The NLP Muppets revolution!The NLP Muppets revolution!
The NLP Muppets revolution!
 
Lepor: augmented automatic MT evaluation metric
Lepor: augmented automatic MT evaluation metricLepor: augmented automatic MT evaluation metric
Lepor: augmented automatic MT evaluation metric
 
LEPOR: an augmented machine translation evaluation metric - Thesis PPT
LEPOR: an augmented machine translation evaluation metric - Thesis PPT LEPOR: an augmented machine translation evaluation metric - Thesis PPT
LEPOR: an augmented machine translation evaluation metric - Thesis PPT
 
[Paper Reading] Unsupervised Learning of Sentence Embeddings using Compositi...
[Paper Reading]  Unsupervised Learning of Sentence Embeddings using Compositi...[Paper Reading]  Unsupervised Learning of Sentence Embeddings using Compositi...
[Paper Reading] Unsupervised Learning of Sentence Embeddings using Compositi...
 
PL Lecture 01 - preliminaries
PL Lecture 01 - preliminariesPL Lecture 01 - preliminaries
PL Lecture 01 - preliminaries
 
Deep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & OpportunitiesDeep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & Opportunities
 
Experimenting with eXtreme Design (EKAW2010)
Experimenting with eXtreme Design (EKAW2010)Experimenting with eXtreme Design (EKAW2010)
Experimenting with eXtreme Design (EKAW2010)
 
Open vocabulary problem
Open vocabulary problemOpen vocabulary problem
Open vocabulary problem
 
Pptphrase tagset mapping for french and english treebanks and its application...
Pptphrase tagset mapping for french and english treebanks and its application...Pptphrase tagset mapping for french and english treebanks and its application...
Pptphrase tagset mapping for french and english treebanks and its application...
 
Moore_slides.ppt
Moore_slides.pptMoore_slides.ppt
Moore_slides.ppt
 
Training at AI Frontiers 2018 - Lukasz Kaiser: Sequence to Sequence Learning ...
Training at AI Frontiers 2018 - Lukasz Kaiser: Sequence to Sequence Learning ...Training at AI Frontiers 2018 - Lukasz Kaiser: Sequence to Sequence Learning ...
Training at AI Frontiers 2018 - Lukasz Kaiser: Sequence to Sequence Learning ...
 
Cd project
Cd projectCd project
Cd project
 
Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"
Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"
Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"
 
Pptphrase tagset mapping for french and english treebanks and its application...
Pptphrase tagset mapping for french and english treebanks and its application...Pptphrase tagset mapping for french and english treebanks and its application...
Pptphrase tagset mapping for french and english treebanks and its application...
 
Investigating teachers' understanding of IMS Learning Design: Yes they can!
Investigating teachers' understanding of IMS Learning Design: Yes they can!Investigating teachers' understanding of IMS Learning Design: Yes they can!
Investigating teachers' understanding of IMS Learning Design: Yes they can!
 

Recently uploaded

Circulatory system_ Laplace law. Ohms law.reynaults law,baro-chemo-receptors-...
Circulatory system_ Laplace law. Ohms law.reynaults law,baro-chemo-receptors-...Circulatory system_ Laplace law. Ohms law.reynaults law,baro-chemo-receptors-...
Circulatory system_ Laplace law. Ohms law.reynaults law,baro-chemo-receptors-...
muralinath2
 
RNA INTERFERENCE: UNRAVELING GENETIC SILENCING
RNA INTERFERENCE: UNRAVELING GENETIC SILENCINGRNA INTERFERENCE: UNRAVELING GENETIC SILENCING
RNA INTERFERENCE: UNRAVELING GENETIC SILENCING
AADYARAJPANDEY1
 
Orion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWSOrion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWS
Columbia Weather Systems
 
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Sérgio Sacani
 
Structures and textures of metamorphic rocks
Structures and textures of metamorphic rocksStructures and textures of metamorphic rocks
Structures and textures of metamorphic rocks
kumarmathi863
 
Structural Classification Of Protein (SCOP)
Structural Classification Of Protein  (SCOP)Structural Classification Of Protein  (SCOP)
Structural Classification Of Protein (SCOP)
aishnasrivastava
 
Seminar of U.V. Spectroscopy by SAMIR PANDA
 Seminar of U.V. Spectroscopy by SAMIR PANDA Seminar of U.V. Spectroscopy by SAMIR PANDA
Seminar of U.V. Spectroscopy by SAMIR PANDA
SAMIR PANDA
 
Large scale production of streptomycin.pptx
Large scale production of streptomycin.pptxLarge scale production of streptomycin.pptx
Large scale production of streptomycin.pptx
Cherry
 
Predicting property prices with machine learning algorithms.pdf
Predicting property prices with machine learning algorithms.pdfPredicting property prices with machine learning algorithms.pdf
Predicting property prices with machine learning algorithms.pdf
binhminhvu04
 
Citrus Greening Disease and its Management
Citrus Greening Disease and its ManagementCitrus Greening Disease and its Management
Citrus Greening Disease and its Management
subedisuryaofficial
 
extra-chromosomal-inheritance[1].pptx.pdfpdf
extra-chromosomal-inheritance[1].pptx.pdfpdfextra-chromosomal-inheritance[1].pptx.pdfpdf
extra-chromosomal-inheritance[1].pptx.pdfpdf
DiyaBiswas10
 
Lab report on liquid viscosity of glycerin
Lab report on liquid viscosity of glycerinLab report on liquid viscosity of glycerin
Lab report on liquid viscosity of glycerin
ossaicprecious19
 
The ASGCT Annual Meeting was packed with exciting progress in the field advan...
The ASGCT Annual Meeting was packed with exciting progress in the field advan...The ASGCT Annual Meeting was packed with exciting progress in the field advan...
The ASGCT Annual Meeting was packed with exciting progress in the field advan...
Health Advances
 
EY - Supply Chain Services 2018_template.pptx
EY - Supply Chain Services 2018_template.pptxEY - Supply Chain Services 2018_template.pptx
EY - Supply Chain Services 2018_template.pptx
AlguinaldoKong
 
Hemostasis_importance& clinical significance.pptx
Hemostasis_importance& clinical significance.pptxHemostasis_importance& clinical significance.pptx
Hemostasis_importance& clinical significance.pptx
muralinath2
 
Richard's entangled aventures in wonderland
Richard's entangled aventures in wonderlandRichard's entangled aventures in wonderland
Richard's entangled aventures in wonderland
Richard Gill
 
erythropoiesis-I_mechanism& clinical significance.pptx
erythropoiesis-I_mechanism& clinical significance.pptxerythropoiesis-I_mechanism& clinical significance.pptx
erythropoiesis-I_mechanism& clinical significance.pptx
muralinath2
 
Unveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdfUnveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdf
Erdal Coalmaker
 
insect taxonomy importance systematics and classification
insect taxonomy importance systematics and classificationinsect taxonomy importance systematics and classification
insect taxonomy importance systematics and classification
anitaento25
 
GBSN- Microbiology (Lab 3) Gram Staining
GBSN- Microbiology (Lab 3) Gram StainingGBSN- Microbiology (Lab 3) Gram Staining
GBSN- Microbiology (Lab 3) Gram Staining
Areesha Ahmad
 

Recently uploaded (20)

Circulatory system_ Laplace law. Ohms law.reynaults law,baro-chemo-receptors-...
Circulatory system_ Laplace law. Ohms law.reynaults law,baro-chemo-receptors-...Circulatory system_ Laplace law. Ohms law.reynaults law,baro-chemo-receptors-...
Circulatory system_ Laplace law. Ohms law.reynaults law,baro-chemo-receptors-...
 
RNA INTERFERENCE: UNRAVELING GENETIC SILENCING
RNA INTERFERENCE: UNRAVELING GENETIC SILENCINGRNA INTERFERENCE: UNRAVELING GENETIC SILENCING
RNA INTERFERENCE: UNRAVELING GENETIC SILENCING
 
Orion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWSOrion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWS
 
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
 
Structures and textures of metamorphic rocks
Structures and textures of metamorphic rocksStructures and textures of metamorphic rocks
Structures and textures of metamorphic rocks
 
Structural Classification Of Protein (SCOP)
Structural Classification Of Protein  (SCOP)Structural Classification Of Protein  (SCOP)
Structural Classification Of Protein (SCOP)
 
Seminar of U.V. Spectroscopy by SAMIR PANDA
 Seminar of U.V. Spectroscopy by SAMIR PANDA Seminar of U.V. Spectroscopy by SAMIR PANDA
Seminar of U.V. Spectroscopy by SAMIR PANDA
 
Large scale production of streptomycin.pptx
Large scale production of streptomycin.pptxLarge scale production of streptomycin.pptx
Large scale production of streptomycin.pptx
 
Predicting property prices with machine learning algorithms.pdf
Predicting property prices with machine learning algorithms.pdfPredicting property prices with machine learning algorithms.pdf
Predicting property prices with machine learning algorithms.pdf
 
Citrus Greening Disease and its Management
Citrus Greening Disease and its ManagementCitrus Greening Disease and its Management
Citrus Greening Disease and its Management
 
extra-chromosomal-inheritance[1].pptx.pdfpdf
extra-chromosomal-inheritance[1].pptx.pdfpdfextra-chromosomal-inheritance[1].pptx.pdfpdf
extra-chromosomal-inheritance[1].pptx.pdfpdf
 
Lab report on liquid viscosity of glycerin
Lab report on liquid viscosity of glycerinLab report on liquid viscosity of glycerin
Lab report on liquid viscosity of glycerin
 
The ASGCT Annual Meeting was packed with exciting progress in the field advan...
The ASGCT Annual Meeting was packed with exciting progress in the field advan...The ASGCT Annual Meeting was packed with exciting progress in the field advan...
The ASGCT Annual Meeting was packed with exciting progress in the field advan...
 
EY - Supply Chain Services 2018_template.pptx
EY - Supply Chain Services 2018_template.pptxEY - Supply Chain Services 2018_template.pptx
EY - Supply Chain Services 2018_template.pptx
 
Hemostasis_importance& clinical significance.pptx
Hemostasis_importance& clinical significance.pptxHemostasis_importance& clinical significance.pptx
Hemostasis_importance& clinical significance.pptx
 
Richard's entangled aventures in wonderland
Richard's entangled aventures in wonderlandRichard's entangled aventures in wonderland
Richard's entangled aventures in wonderland
 
erythropoiesis-I_mechanism& clinical significance.pptx
erythropoiesis-I_mechanism& clinical significance.pptxerythropoiesis-I_mechanism& clinical significance.pptx
erythropoiesis-I_mechanism& clinical significance.pptx
 
Unveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdfUnveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdf
 
insect taxonomy importance systematics and classification
insect taxonomy importance systematics and classificationinsect taxonomy importance systematics and classification
insect taxonomy importance systematics and classification
 
GBSN- Microbiology (Lab 3) Gram Staining
GBSN- Microbiology (Lab 3) Gram StainingGBSN- Microbiology (Lab 3) Gram Staining
GBSN- Microbiology (Lab 3) Gram Staining
 

Automatic Grammatical Error Correction for ESL-Learners by SMT - Getting it right
  • 1. Grammatical Error Correction for ESL-Learners by SMT
       Getting it right
       Marcin Junczys-Dowmunt, Roman Grundkiewicz
       Information System Laboratory
       Faculty of Mathematics and Computer Science
       Adam Mickiewicz University in Poznań, Poland
       January 5th, 2015
  • 2. Part I What’s been going on with SMT in Grammatical Error Correction (GEC)?
  • 3. The CoNLL-2014 Shared Task Focus: English as a second language (ESL); Preceded by the HOO 2011 and 2012 (Helping Our Own) shared tasks: detection and correction of determiner and preposition errors in scientific papers; Preceded by the CoNLL-2013 shared task on Grammatical Error Correction: correct five error types (determiners, prepositions, noun number, subject-verb agreement, verb forms); 16 teams. This year: 28 error categories, 13 teams participated.
  • 4. NUCLE: NUS Corpus of Learner English Dahlmeier et al. 2013 National University of Singapore (NUS) Natural Language Processing Group led by Prof. Hwee Tou Ng and the NUS Centre for English Language Communication led by Prof. Siew Mei Wu; 1,400 essays, written by NUS students, ca. 1 million words, ca. 51,000 sentences; Topics: environmental pollution, healthcare, etc.; Annotated with 28 error categories by NUS English instructors.
  • 5. NUCLE: NUS Corpus of Learner English
Example
<DOC nid="840">
<TEXT>
<P>Engineering design process can be defined ... </P>
<P>Firstly, engineering design ... </P>
...
</TEXT>
<ANNOTATION teacher_id="173">
<MISTAKE start_par="0" start_off="0" end_par="0" end_off="26">
<TYPE>ArtOrDet</TYPE>
<CORRECTION>The engineering design process</CORRECTION>
</MISTAKE>
...
</ANNOTATION>
</DOC>
  • 6. NUCLE: NUS Corpus of Learner English
NUCLE Error Types I
* 6647 ArtOrDet  Article or Determiner
  5300 Wci       Wrong collocation/idiom
  4668 Rloc-     Local redundancy
* 3770 Nn        Noun number
  3200 Vt        Verb tense
  3054 Mec       Punctuation, spelling
* 2412 Prep      Preposition
  2160 Wform     Word form
* 1527 SVA       Subject-verb-agreement
  1452 Others    Other errors
* 1444 Vform     Verb form
  1349 Trans     Link word/phrases
  1073 Um        Unclear meaning
   925 Pref      Pronoun reference
  • 7. NUCLE: NUS Corpus of Learner English
NUCLE Error Types II
  861 Srun   Run-ons, comma splice
  674 WOinc  Incorrect sentence form
  571 Wtone  Tone
  544 Cit    Citation
  515 Spar   Parallelism
  431 Vm     Verb modal
  411 V0     Missing verb
  353 Ssub   Subordinate clause
  345 WOadv  Adverb/adjective position
  242 Npos   Noun possessive
  186 Pform  Pronoun form
  174 Sfrag  Fragment
   50 Wa     Acronyms
   47 Smod   Dangling modifier
  • 8. Evaluation Metric: MaxMatch (M2) Dahlmeier and Ng 2012 An algorithm for efficiently computing the sequence of phrase-level edits between a source sentence and a hypothesis. Finds the maximum overlap with the gold standard. Based on the Levenshtein distance matrix for candidate and reference sentence. The optimal edit sequence is scored using the F1.0 (CoNLL-2013) or F0.5 measure (CoNLL-2014).
  • 9. Evaluation Metric: MaxMatch (M2)
Example calculation
S The cat sat at mat .
A 3 4|||Prep|||on|||REQUIRED|||-NONE-|||0
A 4 4|||ArtOrDet|||the|||REQUIRED|||-NONE-|||0
S The dog .
A 1 2|||NN|||dogs|||REQUIRED|||-NONE-|||0
S Giant otters is an apex predator .
A 2 3|||SVA|||are|||REQUIRED|||-NONE-|||0
A 3 4|||ArtOrDet|||-NONE-|||REQUIRED|||-NONE-|||0
A 5 6|||NN|||predators|||REQUIRED|||-NONE-|||0
A 1 2|||NN|||otter|||REQUIRED|||-NONE-|||1
Corrected: A cat sat on the mat . / The dog . / Giant otters are apex predators .
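The S/A lines above pair a tokenized source sentence with token-span edits. A minimal sketch of applying a set of gold edits to recover the corrected sentence (the function name `apply_edits` is illustrative, not part of the M2 scorer):

```python
def apply_edits(tokens, edits):
    """Apply M2-style token-span edits (start, end, correction) to a sentence.
    Edits are applied right-to-left so earlier offsets stay valid; the
    correction string '-NONE-' marks a deletion."""
    out = list(tokens)
    for start, end, corr in sorted(edits, reverse=True):
        out[start:end] = corr.split() if corr != "-NONE-" else []
    return " ".join(out)

# Third example from the slide, annotator 0's edits:
src = "Giant otters is an apex predator .".split()
edits = [(2, 3, "are"), (3, 4, "-NONE-"), (5, 6, "predators")]
print(apply_edits(src, edits))  # Giant otters are apex predators .
```

The M2 scorer itself goes further: it searches the system output's edit lattice for the edit sequence that maximally overlaps these gold edits.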
  • 10. The CoNLL-2014 Shared Task
Final Results and Ranking
     Team     P      R     M2_0.5
 1   CAMB   39.71  30.10   37.33
 2   CUUI   41.78  24.88   36.79
 3   AMU    41.62  21.40   35.01
 4   POST   34.51  21.73   30.88
 5   NTHU   35.08  18.85   29.92
 6   RAC    33.14  14.99   26.68
 7   UMC    31.27  14.46   25.37
 8   PKU    32.21  13.65   25.32
 9   NARA   21.57  29.38   22.78
10   SJTU   30.11   5.10   15.19
11   UFC    70.00   1.72    7.84
12   IPN    11.28   2.85    7.09
13   IITB   30.77   1.39    5.90
  • 11. Grammatical Error Correction by Data-Intensive and Feature-Rich Statistical Machine Translation
Marcin Junczys-Dowmunt, Roman Grundkiewicz. CoNLL-2014 Shared Task
Grammatical error correction seen as translation from erroneous into corrected sentences. Baseline is a standard phrase-based Moses set-up.
Explore the interaction of: parameter optimization, web-scale language models, larger error-annotated resources as parallel data, task-specific dense and sparse features.
Data used: TM: NUCLE, Lang-8 (scraped); LM: TM data, Wikipedia, CommonCrawl (Buck et al. 2014)
  • 12. System Combination for Grammatical Error Correction
Raymond Hendy Susanto, Peter Phandi, Hwee Tou Ng. EMNLP 2014
Uses MEMT to combine several GEC systems: two classifier-based systems and two SMT-based systems. Various n-top combinations from the CoNLL-2014 shared task.
Data used: Classifiers: POS-taggers, chunkers, etc.; SMT-TM: NUCLE, Lang-8 (Mizumoto et al. 2012); SMT-LM: TM data, Wikipedia; System combination: all CoNLL-2014 submissions
  • 13. Lang-8.com A social network for language learners
  • 14. Lang-8.com Examples I thought/think that they are/there is a good combination between winter and reading . Today I have/had a bad began/beginning . I wanna improve my skill/skills ! Is there somebody/anybody who wants to be friend/friends with me ? If you need more information , please don ’t hesitate to tell/contact me please .
  • 15. Language model data
Corpus        Sentences   Tokens
NUCLE          57.15 K     1.15 M
Lang-8          2.23 M    30.03 M
Wikipedia     213.08 M     3.37 G
CommonCrawl    59.13 G   975.63 G
  • 16. Performance on the CoNLL-2014 Shared Task test set
[Chart: M2_0.5 (30.0–50.0) for CoNLL-2014 Top 5 (Unrestricted), Susanto et al. 2014 (Restricted/Unrestricted), and this talk (Restricted/Unrestricted)]
  • 20. Part II 10 Lessons from the CoNLL-2014 Shared Task
  • 21. Lesson 1: Implement task-specific features
Add features that seem to be relevant for GEC: Levenshtein distance
source phrase (s)   target phrase (t)    LD
a short time .      short term only .     3
a situation         into a situation      1
a supermarket .     a supermarket .       0
a supermarket .     at a supermarket      1
able                unable                1
LD(s, t) ≡ Levenshtein distance between source phrase (s) and target phrase (t) in words.
The feature computes e^LD(s,t); it sums to the total number of edits applied to the sentence in the log-linear model.
  • 22. Lesson 1: Implement task-specific features
Add features that seem to be relevant for GEC: Levenshtein distance
source phrase (s)   target phrase (t)    LD  D  I  S
a short time .      short term only .     3  1  1  1
a situation         into a situation      1  0  1  0
a supermarket .     a supermarket .       0  0  0  0
a supermarket .     at a supermarket      1  1  0  0
able                unable                1  0  0  1
LD(s, t) ≡ Levenshtein distance between source phrase (s) and target phrase (t) in words.
The feature computes e^LD(s,t); it sums to the total number of edits applied to the sentence in the log-linear model.
From the Levenshtein distance matrix, compute counts for deletions (D), insertions (I), and substitutions (S). e^D, e^I, and e^S are additive in log-linear models.
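A sketch of how the LD and D/I/S values in the table can be computed: a word-level Levenshtein DP plus a backtrace over the matrix. When several optimal alignments exist, the backtrace picks one of them, so the D/I/S split may differ from a hand alignment while the total LD agrees:

```python
def edit_counts(src, tgt):
    """Word-level Levenshtein distance with a backtrace,
    returning (distance, deletions, insertions, substitutions)."""
    n, m = len(src), len(tgt)
    # d[i][j] = edit distance between src[:i] and tgt[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if src[i - 1] == tgt[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # match / substitution
    # Backtrace one optimal path to count operation types
    i, j, dels, ins, subs = n, m, 0, 0, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (src[i - 1] != tgt[j - 1]):
            subs += src[i - 1] != tgt[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return d[n][m], dels, ins, subs
```

For example, `edit_counts("a situation".split(), "into a situation".split())` yields distance 1 with one insertion, matching the table row.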
  • 33. Lesson 1: Implement task-specific features
A stateful feature: Operation Sequence Model
Target: Hence , a new problem surfaces .
Source: Then a new problem comes out .
TRANS Hence TO Then
DEL ,
TRANS a TO a
TRANS new TO new
TRANS problem TO problem
TRANS surfaces TO comes
INS out
TRANS . TO .
Create sequences for all sentence pairs. Compute an n-gram language model over operation sequences. The OSM usually contains reordering operations; here it degenerates to edit sequences.
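The operation stream above can be reproduced mechanically. A minimal sketch using difflib for the monotone word alignment (the real OSM is trained from the translation model's word alignments, so this is only illustrative):

```python
from difflib import SequenceMatcher

def operation_sequence(src, tgt):
    """Edit-operation stream in the slide's convention: walk the corrected
    (target) side and emit TRANS <tgt> TO <src>, DEL for target-only tokens,
    INS for source-only tokens. With reordering disabled, the OSM degenerates
    to exactly this kind of sequence."""
    ops = []
    # align target (a) against source (b); opcodes are monotone spans
    for tag, i1, i2, j1, j2 in SequenceMatcher(None, tgt, src).get_opcodes():
        for k in range(max(i2 - i1, j2 - j1)):
            t = tgt[i1 + k] if i1 + k < i2 else None
            s = src[j1 + k] if j1 + k < j2 else None
            if t is not None and s is not None:
                ops.append(f"TRANS {t} TO {s}")
            elif t is not None:
                ops.append(f"DEL {t}")
            else:
                ops.append(f"INS {s}")
    return ops
```

Running it on the slide's sentence pair reproduces the eight operations above; an n-gram LM over such streams then scores whole edit sequences in context.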
  • 34. Lesson 1: Implement task-specific features
Results optimized with BLEU
[Chart: M2_0.5 (36.0–42.0) for systems B, L, E, O, optimized with BLEU]
• (B)aseline: vanilla Moses, disabled reordering.
• (L)evenshtein distance: word-based, as TM score. Sums to number of edits.
• (E)dits: counts insertions, deletions, replacements.
• (O)peration Sequence Model: without reordering, degenerates to a stateful sequence model of edits.
  • 38. Lesson 2: Optimize the right metric
Implement an optimizer for the metric that you have to use in the end
[Charts: M2_0.5 (36.0–42.0) for systems B, L, E, O, optimized with BLEU vs. optimized with M2]
  • 40. Lesson 3: Account for optimizer instability
Or don’t count on good luck (Clark et al. 2011)
[Charts: M2_0.5 (36.0–42.0) for systems B, L, E, O, optimized with BLEU vs. optimized with M2]
  • 41. Lesson 3: Account for optimizer instability
Averaged weights beat mean scores (Cettolo et al. 2011)
[Charts: M2_0.5 (36.0–42.0) for systems B, L, E, O, optimized with BLEU vs. optimized with M2]
  • 42. Performance on the CoNLL-2014 Shared Task test set
[Chart: M2_0.5 (30.0–50.0) for CoNLL-2014 Top 5 (Unrestricted), Susanto et al. 2014 (Restricted/Unrestricted), and this talk (Restricted/Unrestricted)]
  • 47. Lesson 4: Bigger development sets are better
Maybe a little too big
Until now, the CoNLL-2013 test set was used as dev set: only 1381 sentences with 3461 error annotations; rate of erroneous tokens: 14.97%.
But there is also NUCLE, currently used as training data: 57151 sentences with 44385 error annotations; rate of erroneous tokens: 6.23%.
We are not supposed to look at the error rate of the CoNLL-2014 test set (ca. 10% for Annotator 1, ca. 12% for Annotator 2).
  • 48. Lesson 4: Bigger development sets are better
Turning NUCLE into a development set via cross-validation
1. Greedily remove sentences until an error rate of 0.15 is reached. Leaves 23381 sentences with 39707 annotations.
2. Divide into 4 equal-sized subsets.
3. Tune on 1 subset, add the remaining 3 subsets to the training data; repeat for each subset.
4. Average the weights for all subsets into a single weight vector.
5. Repeat steps 3–4 a couple of times (here: 5) to average weights across iterations.
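Step 1 can be sketched as follows. The exact greedy criterion is an assumption (drop the cleanest sentences first), with each sentence reduced to a (token count, erroneous-token count) pair:

```python
def select_dev(sentences, target_rate=0.15):
    """Greedy sketch: repeatedly drop the sentence with the lowest error rate
    until the corpus-level rate of erroneous tokens reaches target_rate.
    Each sentence is a (token_count, erroneous_token_count) pair."""
    pool = sorted(sentences, key=lambda s: s[1] / s[0])  # cleanest first
    tok = sum(t for t, _ in pool)
    err = sum(e for _, e in pool)
    i = 0
    while i < len(pool) - 1 and err / tok < target_rate:
        t, e = pool[i]
        tok, err = tok - t, err - e
        i += 1
    return pool[i:]
```

The same knob works in reverse: tuning on a lower-error-rate dev set pushes the system toward precision, a higher one toward recall, as the next slide notes.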
  • 49. Lesson 4: Bigger development sets are better
Turning NUCLE into a development set via cross-validation
We are not using the previous dev set, though it could just be added to all subsets. It is nice to have a second test set (although slightly cheating due to the matching error rate).
Effect of the error rate on tuning: low error rate: high precision; high error rate: high recall.
Future work: can the error rate be used to adjust the training data, too?
  • 50. Lesson 4: Bigger development sets are better
Performance jump from switching dev sets
[Chart: M2_0.5 (38.0–44.0) for systems B, L, E, O]
  • 52. Performance on the CoNLL-2014 Shared Task test set
[Chart: M2_0.5 (30.0–50.0) for CoNLL-2014 Top 5 (Unrestricted), Susanto et al. 2014 (Restricted/Unrestricted), and this talk (Restricted/Unrestricted)]
  • 54. Lesson 5: Expanding the search space barely helps
[Chart: M2_0.5 (42.86–42.90) vs. cube-pruning stack size & pop-limit (10^3–10^4)]
  • 55. Lesson 6: Self-composition helps (once)
[Chart: M2_0.5 (42.9–43.2) vs. number of iterations (0–3)]
  • 56. Lesson 7: There’s something about sparse features
But tuning them correctly is hard
In our CoNLL-2014 shared task submission, we reported that sparse features help. Correlation without causation: what actually helped was the tuning scheme. Still, we strongly believe that sparse features are worth exploring for GEC.
Problems: kbMIRA does not seem to work well with M2. Both PRO and kbMIRA give worse results than MERT for M2, even without sparse features. PRO seems to even out with sparse features.
  • 67. Lesson 7: There’s something about sparse features
Sparse feature examples
Source: Then a new problem comes out .
Target: Hence , a new problem surfaces .
subst(Then,Hence)=1       <s> subst(Then,Hence) a=1
insert(,)=1               Hence insert(,) a=1
subst(comes,surfaces)=1   problem subst(comes,surfaces) out=1
del(out)=1                comes del(out) .=1
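The plain (context-free) features in the left column can be extracted from a sentence pair like this; difflib stands in for the system's internal phrase alignment, and the context-wrapped variants in the right column are omitted for brevity:

```python
from collections import Counter
from difflib import SequenceMatcher

def sparse_edit_features(src, tgt):
    """Sparse edit features (feature name -> count) for a sentence pair:
    unchanged tokens contribute nothing; edits become subst(s,t),
    del(s) for dropped source tokens, insert(t) for added target tokens."""
    feats = Counter()
    for tag, i1, i2, j1, j2 in SequenceMatcher(None, src, tgt).get_opcodes():
        if tag == "equal":
            continue
        for k in range(max(i2 - i1, j2 - j1)):
            s = src[i1 + k] if i1 + k < i2 else None
            t = tgt[j1 + k] if j1 + k < j2 else None
            if s is not None and t is not None:
                feats[f"subst({s},{t})"] += 1
            elif s is not None:
                feats[f"del({s})"] += 1
            else:
                feats[f"insert({t})"] += 1
    return feats
```

Because identity pairs fire no features, the weights learned for these indicators directly reward or penalize individual correction patterns.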
  • 68. Lesson 8: Nothing beats the language model
Which is actually quite frustrating
Using Kenneth Heafield’s CommonCrawl LM is near impossible due to a hardware bottleneck.
Still working on a way to find out the upper bound.
Instead: sampled a subset from the raw text data with cross-entropy difference filtering.
[Chart: M2_0.5 (42.0–50.0), Wikipedia LM vs. CommonCrawl LM]
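Cross-entropy difference filtering (Moore and Lewis, 2010) scores each candidate sentence by H_in(s) − H_out(s), its per-word cross-entropy under an in-domain LM minus that under an out-of-domain LM, and keeps the lowest-scoring sentences. A toy sketch with add-one unigram models standing in for the real n-gram models:

```python
import math
from collections import Counter

def train_unigram(corpus):
    """Tiny add-one unigram LM returning per-word cross-entropy in nats;
    a stand-in for the real n-gram models."""
    counts = Counter(tok for sent in corpus for tok in sent)
    total, vocab = sum(counts.values()), len(counts) + 1
    return lambda sent: -sum(
        math.log((counts[t] + 1) / (total + vocab)) for t in sent) / max(len(sent), 1)

def moore_lewis_rank(candidates, in_domain, out_domain):
    """Rank candidate sentences by cross-entropy difference H_in(s) - H_out(s);
    lower means more in-domain-like."""
    h_in, h_out = train_unigram(in_domain), train_unigram(out_domain)
    return sorted(candidates, key=lambda s: h_in(s) - h_out(s))
```

Sampling the top-ranked CommonCrawl sentences this way yields an LM corpus far smaller than the full crawl while keeping most of its benefit.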
  • 73. Performance on the CoNLL-2014 Shared Task test set
[Chart: M2_0.5 (30.0–50.0) for CoNLL-2014 Top 5 (Unrestricted), Susanto et al. 2014 (Restricted/Unrestricted), and this talk (Restricted/Unrestricted)]
  • 75. Lesson 9: Collecting parallel monolingual data is hard
For the Shared Task 2014 we scraped more data from Lang-8.
Collected twice as many sentences (ca. 4M) compared to the free release with 2M.
Legally and morally dubious.
[Chart: M2_0.5 (48.0–52.0), Lang-8 free + CommonCrawl vs. Lang-8 scraped + CommonCrawl]
  • 80. Performance on the CoNLL-2014 Shared Task test set
[Chart: M2_0.5 (30.0–50.0) for CoNLL-2014 Top 5 (Unrestricted), Susanto et al. 2014 (Restricted/Unrestricted), and this talk (Restricted/Unrestricted)]
  • 82. Lesson 9: Collecting parallel monolingual data is hard
Other less obvious but legally safe sources
Wikipedia edit histories (Done that): over 14 million sentences with small edits. Publicly available as the WikEd corpus. Currently thinking about what to do with that.
Other wikis: over 400,000 Wikia wikis (Working on it): the same format as Wikipedia dumps; our scripts can already work with that. Wikia has a local branch in Poznań! And one of my students is an intern.
Diff between two CommonCrawl revisions (Dreading it): take revisions maybe one year apart. Group documents by URL. Sentence-align and keep pairs with small differences.
. . .
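The proposed CommonCrawl-revision mining (group by URL, sentence-align, keep near-identical pairs) could be sketched as below; the similarity thresholds and the difflib-based alignment are assumptions:

```python
from difflib import SequenceMatcher

def mine_edit_pairs(old_docs, new_docs, min_sim=0.7, max_sim=0.999):
    """Sketch: pair documents by URL across two crawl snapshots, align their
    sentence lists monotonically, and keep 1:1 replaced sentences that differ
    only slightly (likely corrections), discarding identical or unrelated ones.
    old_docs/new_docs: dict mapping url -> list of sentences."""
    pairs = []
    for url in old_docs.keys() & new_docs.keys():
        old, new = old_docs[url], new_docs[url]
        for tag, i1, i2, j1, j2 in SequenceMatcher(None, old, new).get_opcodes():
            if tag != "replace" or i2 - i1 != j2 - j1:
                continue  # only 1:1 replaced sentences are candidate pairs
            for s, t in zip(old[i1:i2], new[j1:j2]):
                sim = SequenceMatcher(None, s, t).ratio()
                if min_sim <= sim <= max_sim:
                    pairs.append((s, t))
    return pairs
```

The upper similarity bound drops unchanged sentences; the lower bound drops rewrites too heavy to be useful as error-correction training pairs.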
  • 88. Lesson 10: GEC needs a better metric
Maybe tuning with BLEU is actually smarter
     Team     P      R     M2_0.5   BLEU
 1   CAMB   39.71  30.10   37.33   81.77
 2   CUUI   41.78  24.88   36.79   83.46
 3   AMU    41.62  21.40   35.01   83.42
 4   POST   34.51  21.73   30.88   81.61
 5   NTHU   35.08  18.85   29.92   82.42
 6   RAC    33.14  14.99   26.68   81.91
 7   UMC    31.27  14.46   25.37   83.66
 8   PKU    32.21  13.65   25.32   83.71
 9   NARA   21.57  29.38   22.78    –
10   SJTU   30.11   5.10   15.19   85.96
11   UFC    70.00   1.72    7.84   86.82
12   IPN    11.28   2.85    7.09   83.39
13   IITB   30.77   1.39    5.90   86.50
14   Input   0.00   0.00    0.00   86.79
  • 89. Lesson 10: GEC needs a better metric
Maybe tuning with BLEU is actually smarter (same table, ranked by BLEU)
     Team     P      R     M2_0.5   BLEU
11   UFC    70.00   1.72    7.84   86.82
14   Input   0.00   0.00    0.00   86.79
13   IITB   30.77   1.39    5.90   86.50
10   SJTU   30.11   5.10   15.19   85.96
 8   PKU    32.21  13.65   25.32   83.71
 7   UMC    31.27  14.46   25.37   83.66
 2   CUUI   41.78  24.88   36.79   83.46
 3   AMU    41.62  21.40   35.01   83.42
12   IPN    11.28   2.85    7.09   83.39
 5   NTHU   35.08  18.85   29.92   82.42
 6   RAC    33.14  14.99   26.68   81.91
 1   CAMB   39.71  30.10   37.33   81.77
 4   POST   34.51  21.73   30.88   81.61
 9   NARA   21.57  29.38   22.78    –
  • 96. Summing up
Proper parameter tuning is crucial.
The language model is the strongest feature.
With proper tuning, vanilla Moses beats the published top results on the CoNLL-2014 shared task (39.39% vs. 41.63%) when using restricted training data.
New top result with restricted data (1-composition): 43.18%
With additional LM and TM data and task-specific features, Moses beats all unrestricted systems and previously published system combinations (partially tuned on test data).
New top result with unrestricted data: 50.80% (51.00% with 1-composition)
But: the M2 metric is highly dubious.
  • 97. Future Work
There is still a lot to learn:
How do I properly tune sparse features with M2 and PRO or kbMIRA?
Is M2 a sensible metric at all?
Where and how can I collect more data?
What’s the effect of other LMs, like neural language models?
Can I fully integrate ML methods (Vowpal Wabbit)?
. . .
  • 98. References
Christian Buck, Kenneth Heafield, and Bas van Ooyen. 2014. N-gram Counts and Language Models from the Common Crawl. LREC 2014.
Daniel Dahlmeier, Hwee Tou Ng, and Siew Mei Wu. 2013. Building a Large Annotated Corpus of Learner English: The NUS Corpus of Learner English. BEA 2013.
Daniel Dahlmeier and Hwee Tou Ng. 2012. Better Evaluation for Grammatical Error Correction. NAACL 2012.
Hwee Tou Ng, Siew Mei Wu, Ted Briscoe, Christian Hadiwinoto, Raymond Hendy Susanto, and Christopher Bryant. 2014. The CoNLL-2014 Shared Task on Grammatical Error Correction. CoNLL 2014.
  • 99. References
Marcin Junczys-Dowmunt and Roman Grundkiewicz. 2014. The AMU System in the CoNLL-2014 Shared Task: Grammatical Error Correction by Data-Intensive and Feature-Rich Statistical Machine Translation. CoNLL 2014.
Tomoya Mizumoto, Yuta Hayashibe, Mamoru Komachi, Masaaki Nagata, and Yuji Matsumoto. 2012. The Effect of Learner Corpus Size in Grammatical Error Correction of ESL Writings. COLING 2012.
Jonathan H. Clark, Chris Dyer, Alon Lavie, and Noah A. Smith. 2011. Better Hypothesis Testing for Statistical Machine Translation: Controlling for Optimizer Instability. HLT 2011.
Mauro Cettolo, Nicola Bertoldi, and Marcello Federico. 2011. Methods for Smoothing the Optimizer Instability in SMT. MT Summit 2011.