Correct Me If I’m Wrong: Fixing Grammatical Errors by Preposition Ranking 
Roman Prokofyev, Ruslan Mavlyutov, Gianluca Demartini and 
Philippe Cudre-Mauroux 
eXascale Infolab 
University of Fribourg, Switzerland 
November 4th, CIKM’14 
Shanghai, China 
1
Motivations and Task Overview 
2 
• Grammatical correction is important in its own right 
• It is also a component of Machine Translation and Speech Recognition systems 
Correction of textual content written by English Learners. 
I am new in android programming. 
[to, at, for, …] 
⇒ Rank candidate prepositions by their likelihood of being 
correct in order to potentially replace the original.
3 
State of the Art in Grammar Correction 
Approaches: 
• Multi-class classification over a pre-defined candidate set 
• Statistical Machine Translation 
• Language modeling 
Features: 
• Lexical features, POS tags 
• Head verbs/nouns, word dependencies 
• N-gram counts from large corpora (Google Web 1-T) 
• Confusion matrix
Key Ideas 
• Usage of a particular preposition is governed by a 
particular word/n-gram; 
⇒ Task: select/aggregate n-grams that influence 
preposition usage; 
⇒ Use n-gram association measures to score each 
preposition. 
4
Processing Pipeline 
5 
[Pipeline diagram] Document → (foreach sentence) Sentence Extraction → Tokenization → n-gram Construction → Feature Extraction (using background N-gram Statistics) → Supervised Classifier → Corrected Sentence → Corrected Document. 
The n-gram construction step emits a list of n-grams; the N-gram Statistics lookup yields a ranked list of prepositions for each n-gram. 
6 
Tokenization and n-gram distance 
N-gram Type Distance 
the force PREP 3gram -2 
force be PREP 3gram -1 
be PREP you 3gram 0 
PREP you . 3gram 1 
N-gram Type Distance 
be PREP 2gram -1 
PREP you 2gram 1 
PREP . 2gram 2
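The n-gram/distance scheme in the tables above can be sketched as follows: each n-gram is built from the PREP placeholder plus a contiguous run of context tokens, and the signed distance is that of the run's nearest token to the preposition (0 when PREP falls inside the run). The function name and the `max_dist` cutoff are illustrative choices, not taken from the paper:

```python
def ngrams_with_prep(tokens, prep_idx, n, max_dist=2):
    """Enumerate n-grams made of PREP plus a contiguous run of n-1
    context tokens, together with the signed distance from the run's
    nearest token to the preposition (0 if PREP is inside the run)."""
    # Context = sentence tokens with the preposition itself removed.
    context = [(i, t) for i, t in enumerate(tokens) if i != prep_idx]
    results = []
    for start in range(len(context) - (n - 2)):
        run = context[start:start + n - 1]
        idxs = [i for i, _ in run]
        if idxs[0] < prep_idx < idxs[-1]:      # PREP interior to the run
            dist = 0
        elif idxs[-1] < prep_idx:              # run entirely left of PREP
            dist = idxs[-1] - prep_idx
        else:                                  # run entirely right of PREP
            dist = idxs[0] - prep_idx
        if abs(dist) > max_dist:
            continue
        # Splice the PREP placeholder into its original position.
        words = [t for _, t in run]
        pos = sum(1 for i in idxs if i < prep_idx)
        results.append((" ".join(words[:pos] + ["PREP"] + words[pos:]), dist))
    return results
```

Run on the example sentence, this reproduces the rows of both tables, e.g. `the force PREP` at distance -2 and `PREP .` at distance 2.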
N-gram association measures 
7 
Motivation: 
use association measures to compute a score that will be 
proportional to the likelihood of an n-gram appearing 
together with a preposition. 
N-gram PMI scores by preposition 
force be PREP (with: -4.9), (under: -7.86), (at: -9.26), (in: -9.93), … 
be PREP you (with: -1.86), (amongst: -1.99), (beside: -2.26), … 
PREP you . (behind: -0.71), (beside: -0.82), (around: -0.84), … 
Background N-gram collection: Google Books N-grams.
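A minimal PMI sketch, with toy count tables standing in for the Google Books N-gram statistics; the function names and the floor count of 1 for unseen combinations are assumptions made here for illustration:

```python
import math

def pmi(count_joint, count_context, count_prep, total):
    """Pointwise mutual information between a context n-gram and a
    preposition: log p(context, prep) / (p(context) * p(prep))."""
    p_joint = count_joint / total
    p_context = count_context / total
    p_prep = count_prep / total
    return math.log(p_joint / (p_context * p_prep))

def rank_prepositions(context, candidates, joint_counts,
                      context_count, prep_counts, total):
    """Score each candidate preposition for one context n-gram and
    sort by descending PMI (most strongly associated first)."""
    scores = {p: pmi(joint_counts.get((context, p), 1),  # floor for unseen pairs
                     context_count, prep_counts[p], total)
              for p in candidates}
    return sorted(scores.items(), key=lambda kv: -kv[1])
```

With counts that favor "with" for the context "be PREP you", the ranking puts "with" first, mirroring the table above.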
N-grams with determiner skips 
8 
Around 30% of prepositions are used in proximity to 
determiners (“a”, “the”, etc.) 
[Figure: correct preposition ranks for n-grams with determiners vs. nouns skipped] 
“one of the most” → “one of most” 
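Skip n-grams of this kind can be produced by simply dropping determiner tokens from an n-gram, so a sparse full form can back off to a denser skip variant. The determiner list below is a small illustrative subset, and the function name is an assumption:

```python
# Illustrative subset of English determiners to skip.
DETERMINERS = {"a", "an", "the", "this", "that", "these", "those"}

def skip_determiners(ngram_tokens):
    """Return the skip variant of an n-gram with determiners removed,
    e.g. 'one of the most' -> 'one of most'."""
    return [t for t in ngram_tokens if t.lower() not in DETERMINERS]
```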
PMI-based Features 
9 
• Average rank of a preposition among the ranks of the 
considered n-grams; 
• Average PMI score of a preposition among the PMI 
scores of the considered n-grams; 
• Total number of times a given preposition appears in the 
first position of the rankings of the considered n-grams. 
Calculated across two logical groups of considered n-grams: 
• N-gram size; 
• N-gram distances.
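A minimal sketch of how the three features above could be aggregated from per-n-gram PMI rankings; the function name and the ranking representation (a sorted list of (preposition, score) pairs per considered n-gram) are assumptions made for illustration:

```python
from statistics import mean

def pmi_features(rankings, prep):
    """Aggregate per-n-gram PMI rankings into the three features from
    the slide: average rank of `prep`, its average PMI score, and the
    number of times it is ranked first."""
    ranks, scores, top = [], [], 0
    for ranking in rankings:            # one sorted ranking per n-gram
        for r, (p, score) in enumerate(ranking):
            if p == prep:
                ranks.append(r)
                scores.append(score)
                top += (r == 0)         # counts first-place finishes
                break
    return {"avg_rank": mean(ranks), "avg_pmi": mean(scores), "top_count": top}
```

In the paper these aggregates are computed separately per logical group (n-gram size, n-gram distance); the sketch shows a single group.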
Central N-grams 
10 
[Figure] Distribution of correct preposition counts at the top of the 
PMI rankings, with respect to n-gram distance. 
Other features 
• Confusion matrix values 
• POS tags: 5 most frequent tags + “OTHER” catch-all tag; 
• Preposition itself: sparse vector of the size of the 
candidate preposition set. 
11 
        to      in      of      for     on      at      with    from 
to      0.958   0.007   0.002   0.011   0.004   0.003   0.005   0.002 
in      0.037   0.79    0.01    0.009   0.066   0.036   0.015   0.008 
Preposition selection 
A supervised learning algorithm: 
• Two-class classification with a confidence score for every 
preposition from the candidate set; 
• Every candidate preposition receives its own set of 
feature values. 
Classifier: random forest. 
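The two-class reformulation can be sketched as follows: each candidate preposition gets its own feature vector with a binary label (1 only for the preposition actually used), and at prediction time the candidate with the highest classifier confidence wins. `make_instances`, `select_preposition`, and `score_fn` are hypothetical names; `score_fn` stands in for the confidence of a trained random forest:

```python
def make_instances(features_by_prep, correct_prep):
    """Turn one preposition slot into N two-class training instances:
    one feature vector per candidate, labeled 1 only for the
    preposition the writer should have used."""
    X, y = [], []
    for prep, feats in features_by_prep.items():
        X.append(feats)
        y.append(1 if prep == correct_prep else 0)
    return X, y

def select_preposition(features_by_prep, score_fn):
    """Pick the candidate whose feature vector receives the highest
    confidence from the trained classifier."""
    return max(features_by_prep, key=lambda p: score_fn(features_by_prep[p]))
```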
12
Training/Test Collections 
Training collection: 
• First Certificate of English (Cambridge exams) 
Test collections: 
• CoNLL-2013 (50 essays written by NUS students) 
• StackExchange (historical edits) 
              Cambridge FCE   CoNLL-2013   StackExchange 
N# sentences  27k             1.4k         6k 
13
Experiments: overview 
1. Feature importance scores 
2. N-gram sizes 
3. N-gram distances 
4. CoNLL and StackExchange evaluation 
14
15 
Experiments: Feature Importances 
Feature name Importance score 
Confusion matrix probability 0.28 
Top preposition counts (3grams) 0.13 
Average rank (distance=0) 0.06 
Central n-gram rank 0.06 
Average rank (distance=1) 0.05 
All top features except “confusion matrix” are based on the 
PMI scores.
N-gram Sizes and Distances 
16 
Distance restriction Precision Recall F1 score 
(0) 0.3077* 0.3908* 0.3442* 
(-1,1) 0.3231 0.4166 0.3637 
(-2,2) 0.3214 0.4222 0.3648 
(-5,5) 0.3223 0.4028 0.3577 
None 0.3214 0.3924* 0.3532 
N-gram sizes Precision Recall F1 score 
{2,3}-grams 0.3005* 0.3879* 0.3385* 
{3}-grams + skip n-grams (4) 0.2931* 0.4187 0.3447 
{2,3}-grams + skip n-grams 0.3231 0.4166 0.3637 
* indicates a statistically significant difference to the best performing 
approach (in bold)
Test Collection Evaluation 
17 
Collection      Approach                                         Precision  Recall   F1 score 
CoNLL-2013      NARA Team @CoNLL2013                             0.2910     0.1254   0.1753 
CoNLL-2013      N-gram-based classification                      0.2592     0.3611   0.3017 
StackExchange   N-gram-based classification                      0.1585     0.2185   0.1837 
StackExchange   N-gram-based classification (cross-validation)   0.2704     0.2961   0.2824 
Takeaways 
18 
• PMI association measures 
• + two-class preposition ranking 
⇒ together these significantly outperform the state of the art. 
• Skip n-grams contribute significantly to the final result. 
Roman Prokofyev (@rprokofyev) 
eXascale Infolab (exascale.info), University of Fribourg, Switzerland 
http://www.slideshare.net/eXascaleInfolab/
Two-class vs. Multi-class classification 
Multi-class: 
• input: one feature vector 
• output: one of N classes (1-N) 
Two-class (still with N possible outcomes): 
• input: N feature vectors 
• output: 1/0 for each of N feature vectors 
19
Training/Test Collections 
20 
Training collection: 
• Cambridge Learner Corpus: First Certificate of English 
(exams of 2000-2001), CLC FCE 
Test collections: 
• CoNLL-2013 (50 essays written by NUS students) 
• StackExchange (StackOverflow + Superuser) 
                          CLC FCE   CoNLL-2013   SE 
N# sentences              27k       1.4k         6k 
N# prepositions           60k       3.2k         15.8k 
N# prepositional errors   2.9k      152          6k 
% errors                  4.8       4.7          38.2 
Evaluation Metrics 
21 
Precision = valid_suggested_corrections / total_suggested_corrections 
Recall = valid_suggested_corrections / total_valid_corrections 
F1 = 2 · Precision · Recall / (Precision + Recall)
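A small sketch of the three metrics, assuming corrections are represented as (position, preposition) pairs; the representation and function name are illustrative choices:

```python
def grammar_metrics(suggested, gold):
    """Precision, recall, and F1 over suggested corrections, following
    the slide's definitions. `suggested` and `gold` are sets of
    (position, preposition) pairs; valid corrections are those the
    system suggested that also appear in the gold annotations."""
    valid = len(suggested & gold)
    precision = valid / len(suggested) if suggested else 0.0
    recall = valid / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```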

Editor's Notes

  • #2 Welcome everyone, my name is Roman Prokofyev, and I’m a PhD student at the eXascale Infolab, here at the University of Fribourg. Today I will talk about Fixing Grammatical Errors by Preposition Ranking.
  • #3 I’m going to start directly with a problem and task overview, using an example sentence…
  • #4 LM – language modeling assigns different probabilities to the candidate prepositions.
  • #5 Particular word/phrase – this may not always hold, especially when the governing word is referred to by a pronoun (whose referent is determined by another sentence).
  • #7 Distance: between the preposition and the closest word to it in the n-gram under consideration.
  • #8 A large corpus helps to overcome the data sparsity of PMI.
  • #9 The Google n-gram corpus does not contain skip n-grams, so we built them ourselves.
  • #11 Let’s continue with the feature types.
  • #12 Next feature family
  • #14 We used 2 datasets for our evaluation.
  • #21 We used 2 datasets for our evaluation.