SlideShare a Scribd company logo
Correct Me If I’m Wrong: Fixing Grammatical 
Errors by Preposition Ranking 
Roman Prokofyev, Ruslan Mavlyutov, Gianluca Demartini and 
Philippe Cudre-Mauroux 
eXascale Infolab 
University of Fribourg, Switzerland 
November 4th, CIKM’14 
Shanghai, China 
1
Motivations and Task Overview 
2 
• Grammatical correction is important by itself 
• Also as a part of Machine Translation or Speech Recognition 
Correction of textual content written by English Learners. 
I am new in android programming. 
[to, at, for, …] 
⇒ Rank candidate prepositions by their likelihood of being 
correct in order to potentially replace the original.
3 
State-of-Art in Grammar Correction 
Approaches: 
• Multi-class classification using pre-defined candidate set 
• Statistical Machine Translation 
• Language modeling 
Features: 
• Lexical features, POS tags 
• Head verbs/nouns, word dependencies 
• N-gram counts from large corpora (Google Web 1-T) 
• Confusion matrix
Key Ideas 
• Usage of a particular preposition is governed by a 
particular word/n-gram; 
⇒ Task: select/aggregate n-grams that influence 
preposition usage; 
⇒ Use n-gram association measures to score each 
preposition. 
4
Processing Pipeline 
5 
1 ... m 
n-gram 
construction 
Sentence 
Tokenization 
Supervised 
Classifier 
Feature 
extrFaecatitounre 
extFraecattiuornes N-gram 
Statistics 
Document 
foreach 
foreach 
Sentence 
Extraction 
Corrected 
Sentence 
Corrected 
Document 
List of 
n-grams 
Ranked list 
of 
prepositions 
for n-gram
6 
Tokenization and n-gram distance 
N-gram Type Distance 
the force PREP 3gram -2 
force be PREP 3gram -1 
be PREP you 3gram 0 
PREP you . 3gram 1 
N-gram Type Distance 
be PREP 2gram -1 
PREP you 2gram 1 
PREP . 2gram 2
N-gram association measures 
7 
Motivation: 
use association measures to compute a score that will be 
proportional to the likelihood of an n-gram appearing 
together with a preposition. 
N-gram PMI scores by preposition 
force be PREP (with: -4.9), (under: -7.86), (at: -9.26), (in: -9.93), … 
be PREP you (with: -1.86), (amongst: -1.99), (beside: -2.26), … 
PREP you . (behind: -0.71), (beside: -0.82), (around: -0.84), … 
Background N-gram collection: Google Books N-grams.
N-grams with determiner skips 
8 
Around 30% prepositions are used in proximity to 
determiners (“a”, “the”, etc.) 
Determiners Nouns 
Correct preposition ranks 
“one of the most”  “one of most”
PMI-based Features 
9 
• Average rank of a preposition among the ranks of the 
considered n-grams; 
• Average PMI score of a preposition among the PMI 
scores of the considered n-grams; 
• Total number of occurrences of a certain preposition on 
the first position in the ranking among the ranks of the 
considered n-grams. 
Calculated across 2 logical groups (considered n-grams): 
• N-gram size; 
• N-gram distances.
Central N-grams 
10 
Distribution of correct preposition counts on top of PMI 
rankings with respect to n-gram distance.
Other features 
• Confusion matrix values 
• POS tags: 5 most frequent tags + “OTHER” catch-all tag; 
• Preposition itself: sparse vector of the size of the 
candidate preposition set. 
11 
to in of for on at with from 
to 0.958 0.007 0.002 0.011 0.004 0.003 0.005 0.002 
in 0.037 0.79 0.01 0.009 0.066 0.036 0.015 0.008
Preposition selection 
Supervised Learning algorithm. 
• Two-class classification with a confidence score for every 
preposition from the candidate set; 
• Every candidate preposition will receive its own set of 
feature values; 
Classifier: random forest. 
12
Training/Test Collections 
Training collection: 
• First Certificate of English (Cambridge exams) 
Test collections: 
• CoNLL-2013 (50 essays written by NUS students) 
• StackExchange (historical edits) 
Cambridge FCE CoNLL-2013 StackExhange 
N# sentences 27k 1.4k 6k 
13
Experiments: overview 
1. Feature importance scores 
2. N-gram sizes 
3. N-gram distances 
4. CoNLL and StackExchange evaluation 
14
15 
Experiments: Feature Importances 
Feature name Importance score 
Confusion matrix probability 0.28 
Top preposition counts (3grams) 0.13 
Average rank (distance=0) 0.06 
Central n-gram rank 0.06 
Average rank (distance=1) 0.05 
All top features except “confusion matrix” are based on the 
PMI scores.
N-gram Sizes and Distances 
16 
Distance restriction Precision Recall F1 score 
(0) 0.3077* 0.3908* 0.3442* 
(-1,1) 0.3231 0.4166 0.3637 
(-2,2) 0.3214 0.4222 0.3648 
(-5,5) 0.3223 0.4028 0.3577 
None 0.3214 0.3924* 0.3532 
N-gram sizes Precision Recall F1 score 
{2,3}-grams 0.3005* 0.3879* 0.3385* 
{3}-grams + skip n-grams (4) 0.2931* 0.4187 0.3447 
{2,3}-grams + skip n-grams 0.3231 0.4166 0.3637 
* indicates a statistically significant difference to the best performing 
approach (in bold)
Test Collection Evaluation 
17 
Collection Approach Precision Recall F1 
score 
CoNLL-2013 
NARA Team @CoNLL2013 0.2910 0.1254 0.1753 
N-gram-based classification 0.2592 0.3611 0.3017 
StackExchange 
N-gram-based classification 0.1585 0.2185 0.1837 
N-gram-based classification 
(cross-validation) 
0.2704 0.2961 0.2824
Takeaways 
18 
• PMI association measures 
• + Two-class preposition ranking 
⇒ allow to significantly outperform the state of the art. 
• Skip n-grams contribute significantly to the final result. 
Roman Prokofyev (@rprokofyev) 
eXascale Infolab (exascale.info), University of Fribourg, Switzerland 
http://www.slideshare.net/eXascaleInfolab/
Two-class vs. Multi-class classification 
Multi-class: 
• input: one feature vector 
• output: one of N classes (1-N) 
Two-class (still with N possible outcomes) 
• input: N feature vectors 
• output: 1/0 for each of N feature vectors 
19
Training/Test Collections 
20 
Training collection: 
• Cambridge Learner Corpus: First Certificate of English 
(exams of 2000-2001), CLC FCE 
Test collections: 
• CoNLL-2013 (50 essays written by NUS students) 
• StackExchange (StackOverflow + Superuser) 
CLC FCE CoNLL-2013 SE 
N# sentences 27k 1.4k 6k 
N# prepositions 60k 3.2k 15.8k 
N# prepositional errors 2.9k 152 6k 
% errors 4.8 4.7 38.2
Evaluation Metrics 
Precision = 
Recall = 
F1 = 
21 
valid _ suggested _ corrections 
total _ suggested _ corrections 
valid _ suggested _corrections 
total _ valid _ corrections 
2·Precision·Recall 
Precision+ Recall

More Related Content

Viewers also liked

Preposition
PrepositionPreposition
Preposition
johnnica gaupo
 
Adverb or preposition
Adverb or prepositionAdverb or preposition
Adverb or preposition
LHunter202
 
Preposition
PrepositionPreposition
Preposition
Karan Rupani
 
Power Point Presentation About Preposition
Power Point Presentation About PrepositionPower Point Presentation About Preposition
Power Point Presentation About Preposition
Aji Subekti
 
Preposition
PrepositionPreposition
Preposition
younes Anas
 
Belajar Preposition of Time Bahasa Inggris
Belajar Preposition of Time Bahasa InggrisBelajar Preposition of Time Bahasa Inggris
07 Prepositions
07 Prepositions07 Prepositions
07 Prepositions
guest421523a3
 
preposition 1
preposition 1preposition 1
preposition 1
marisolconner
 
Prepositions
PrepositionsPrepositions
Prepositions
EDLYN JOVEN
 
Prepositions
PrepositionsPrepositions
Preposition(1)
Preposition(1)Preposition(1)
Preposition(1)
Hatice Kübra Aldatmaz
 
Preposition, adjective, adverb
Preposition, adjective, adverbPreposition, adjective, adverb
Preposition, adjective, adverb
Breeze Brinks
 
Lesson Plan for Preposition – Std. II by Marina Corda
Lesson Plan for Preposition – Std. II by Marina CordaLesson Plan for Preposition – Std. II by Marina Corda
Lesson Plan for Preposition – Std. II by Marina Corda
Marina Corda
 
Preposition of place
Preposition of placePreposition of place
Preposition of place
Cate Atehortua
 
Year 5 English Unit 9 Preposition
Year 5 English Unit 9 PrepositionYear 5 English Unit 9 Preposition
Year 5 English Unit 9 Preposition
Grace Ng
 
Preposition
PrepositionPreposition
Preposition
Shakira Fonseca
 
Preposition of place
Preposition of placePreposition of place
Preposition of place
Bunda Azzam Elkarim
 
Presentación1 preposition of place by teacher iexi arauz
Presentación1 preposition of place by teacher iexi arauzPresentación1 preposition of place by teacher iexi arauz
Presentación1 preposition of place by teacher iexi arauz
iexi
 
Presentation of english (parts of speech)
Presentation of english (parts of speech)Presentation of english (parts of speech)
Presentation of english (parts of speech)
Kunnu Aggarwal
 
Learning about prepositions
Learning about prepositions Learning about prepositions
Learning about prepositions
BMaram
 

Viewers also liked (20)

Preposition
PrepositionPreposition
Preposition
 
Adverb or preposition
Adverb or prepositionAdverb or preposition
Adverb or preposition
 
Preposition
PrepositionPreposition
Preposition
 
Power Point Presentation About Preposition
Power Point Presentation About PrepositionPower Point Presentation About Preposition
Power Point Presentation About Preposition
 
Preposition
PrepositionPreposition
Preposition
 
Belajar Preposition of Time Bahasa Inggris
Belajar Preposition of Time Bahasa InggrisBelajar Preposition of Time Bahasa Inggris
Belajar Preposition of Time Bahasa Inggris
 
07 Prepositions
07 Prepositions07 Prepositions
07 Prepositions
 
preposition 1
preposition 1preposition 1
preposition 1
 
Prepositions
PrepositionsPrepositions
Prepositions
 
Prepositions
PrepositionsPrepositions
Prepositions
 
Preposition(1)
Preposition(1)Preposition(1)
Preposition(1)
 
Preposition, adjective, adverb
Preposition, adjective, adverbPreposition, adjective, adverb
Preposition, adjective, adverb
 
Lesson Plan for Preposition – Std. II by Marina Corda
Lesson Plan for Preposition – Std. II by Marina CordaLesson Plan for Preposition – Std. II by Marina Corda
Lesson Plan for Preposition – Std. II by Marina Corda
 
Preposition of place
Preposition of placePreposition of place
Preposition of place
 
Year 5 English Unit 9 Preposition
Year 5 English Unit 9 PrepositionYear 5 English Unit 9 Preposition
Year 5 English Unit 9 Preposition
 
Preposition
PrepositionPreposition
Preposition
 
Preposition of place
Preposition of placePreposition of place
Preposition of place
 
Presentación1 preposition of place by teacher iexi arauz
Presentación1 preposition of place by teacher iexi arauzPresentación1 preposition of place by teacher iexi arauz
Presentación1 preposition of place by teacher iexi arauz
 
Presentation of english (parts of speech)
Presentation of english (parts of speech)Presentation of english (parts of speech)
Presentation of english (parts of speech)
 
Learning about prepositions
Learning about prepositions Learning about prepositions
Learning about prepositions
 

Similar to CIKM14: Fixing grammatical errors by preposition ranking

[poster] A Compare-Aggregate Model with Latent Clustering for Answer Selection
[poster] A Compare-Aggregate Model with Latent Clustering for Answer Selection[poster] A Compare-Aggregate Model with Latent Clustering for Answer Selection
[poster] A Compare-Aggregate Model with Latent Clustering for Answer Selection
Seoul National University
 
Kaggle Gold Medal Case Study
Kaggle Gold Medal Case StudyKaggle Gold Medal Case Study
Kaggle Gold Medal Case Study
Alon Bochman, CFA
 
2019 dynamically composing_domain-data_selection_with_clean-data_selection_by...
2019 dynamically composing_domain-data_selection_with_clean-data_selection_by...2019 dynamically composing_domain-data_selection_with_clean-data_selection_by...
2019 dynamically composing_domain-data_selection_with_clean-data_selection_by...
広樹 本間
 
A pilot on Semantic Textual Similarity
A pilot on Semantic Textual SimilarityA pilot on Semantic Textual Similarity
A pilot on Semantic Textual Similarity
pathsproject
 
Machine learning - session 3
Machine learning - session 3Machine learning - session 3
Machine learning - session 3
Luis Borbon
 
Transfer defect learning
Transfer defect learningTransfer defect learning
Transfer defect learning
Sung Kim
 
IMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHES
IMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHESIMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHES
IMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHES
Vikash Kumar
 
Sentiment analysis of tweets using Neural Networks
Sentiment analysis of tweets using Neural NetworksSentiment analysis of tweets using Neural Networks
Sentiment analysis of tweets using Neural Networks
Adrián Palacios Corella
 
deepnet-lourentzou.ppt
deepnet-lourentzou.pptdeepnet-lourentzou.ppt
deepnet-lourentzou.ppt
yang947066
 
Introduction to Deep Learning presentation
Introduction to Deep Learning presentationIntroduction to Deep Learning presentation
Introduction to Deep Learning presentation
johanericka2
 
IRE Semantic Annotation of Documents
IRE Semantic Annotation of Documents IRE Semantic Annotation of Documents
IRE Semantic Annotation of Documents
Sharvil Katariya
 
Customer Churn Analytics using Microsoft R Open
Customer Churn Analytics using Microsoft R OpenCustomer Churn Analytics using Microsoft R Open
Customer Churn Analytics using Microsoft R Open
Poo Kuan Hoong
 
Strong Baselines for Neural Semi-supervised Learning under Domain Shift
Strong Baselines for Neural Semi-supervised Learning under Domain ShiftStrong Baselines for Neural Semi-supervised Learning under Domain Shift
Strong Baselines for Neural Semi-supervised Learning under Domain Shift
Sebastian Ruder
 
L05 language model_part2
L05 language model_part2L05 language model_part2
L05 language model_part2
ananth
 
LNCS 5050 - Bilevel Optimization and Machine Learning
LNCS 5050 - Bilevel Optimization and Machine LearningLNCS 5050 - Bilevel Optimization and Machine Learning
LNCS 5050 - Bilevel Optimization and Machine Learning
butest
 
Numerical and experimental investigation on existing structures: two seminars
Numerical and experimental investigation on existing structures: two seminarsNumerical and experimental investigation on existing structures: two seminars
Numerical and experimental investigation on existing structures: two seminars
PhD ISG, Sapienza University of Rome
 
The Status of ML Algorithms for Structure-property Relationships Using Matb...
The Status of ML Algorithms for Structure-property Relationships Using Matb...The Status of ML Algorithms for Structure-property Relationships Using Matb...
The Status of ML Algorithms for Structure-property Relationships Using Matb...
Anubhav Jain
 
Pegasus
PegasusPegasus
Pegasus
Hangil Kim
 
[SEKE 2014] Practical Human Resource Allocation in Software Projects Using Ge...
[SEKE 2014] Practical Human Resource Allocation in Software Projects Using Ge...[SEKE 2014] Practical Human Resource Allocation in Software Projects Using Ge...
[SEKE 2014] Practical Human Resource Allocation in Software Projects Using Ge...
Jihun Park
 
NEURAL Network Design Training
NEURAL Network Design  TrainingNEURAL Network Design  Training
NEURAL Network Design Training
ESCOM
 

Similar to CIKM14: Fixing grammatical errors by preposition ranking (20)

[poster] A Compare-Aggregate Model with Latent Clustering for Answer Selection
[poster] A Compare-Aggregate Model with Latent Clustering for Answer Selection[poster] A Compare-Aggregate Model with Latent Clustering for Answer Selection
[poster] A Compare-Aggregate Model with Latent Clustering for Answer Selection
 
Kaggle Gold Medal Case Study
Kaggle Gold Medal Case StudyKaggle Gold Medal Case Study
Kaggle Gold Medal Case Study
 
2019 dynamically composing_domain-data_selection_with_clean-data_selection_by...
2019 dynamically composing_domain-data_selection_with_clean-data_selection_by...2019 dynamically composing_domain-data_selection_with_clean-data_selection_by...
2019 dynamically composing_domain-data_selection_with_clean-data_selection_by...
 
A pilot on Semantic Textual Similarity
A pilot on Semantic Textual SimilarityA pilot on Semantic Textual Similarity
A pilot on Semantic Textual Similarity
 
Machine learning - session 3
Machine learning - session 3Machine learning - session 3
Machine learning - session 3
 
Transfer defect learning
Transfer defect learningTransfer defect learning
Transfer defect learning
 
IMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHES
IMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHESIMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHES
IMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHES
 
Sentiment analysis of tweets using Neural Networks
Sentiment analysis of tweets using Neural NetworksSentiment analysis of tweets using Neural Networks
Sentiment analysis of tweets using Neural Networks
 
deepnet-lourentzou.ppt
deepnet-lourentzou.pptdeepnet-lourentzou.ppt
deepnet-lourentzou.ppt
 
Introduction to Deep Learning presentation
Introduction to Deep Learning presentationIntroduction to Deep Learning presentation
Introduction to Deep Learning presentation
 
IRE Semantic Annotation of Documents
IRE Semantic Annotation of Documents IRE Semantic Annotation of Documents
IRE Semantic Annotation of Documents
 
Customer Churn Analytics using Microsoft R Open
Customer Churn Analytics using Microsoft R OpenCustomer Churn Analytics using Microsoft R Open
Customer Churn Analytics using Microsoft R Open
 
Strong Baselines for Neural Semi-supervised Learning under Domain Shift
Strong Baselines for Neural Semi-supervised Learning under Domain ShiftStrong Baselines for Neural Semi-supervised Learning under Domain Shift
Strong Baselines for Neural Semi-supervised Learning under Domain Shift
 
L05 language model_part2
L05 language model_part2L05 language model_part2
L05 language model_part2
 
LNCS 5050 - Bilevel Optimization and Machine Learning
LNCS 5050 - Bilevel Optimization and Machine LearningLNCS 5050 - Bilevel Optimization and Machine Learning
LNCS 5050 - Bilevel Optimization and Machine Learning
 
Numerical and experimental investigation on existing structures: two seminars
Numerical and experimental investigation on existing structures: two seminarsNumerical and experimental investigation on existing structures: two seminars
Numerical and experimental investigation on existing structures: two seminars
 
The Status of ML Algorithms for Structure-property Relationships Using Matb...
The Status of ML Algorithms for Structure-property Relationships Using Matb...The Status of ML Algorithms for Structure-property Relationships Using Matb...
The Status of ML Algorithms for Structure-property Relationships Using Matb...
 
Pegasus
PegasusPegasus
Pegasus
 
[SEKE 2014] Practical Human Resource Allocation in Software Projects Using Ge...
[SEKE 2014] Practical Human Resource Allocation in Software Projects Using Ge...[SEKE 2014] Practical Human Resource Allocation in Software Projects Using Ge...
[SEKE 2014] Practical Human Resource Allocation in Software Projects Using Ge...
 
NEURAL Network Design Training
NEURAL Network Design  TrainingNEURAL Network Design  Training
NEURAL Network Design Training
 

More from eXascale Infolab

Beyond Triplets: Hyper-Relational Knowledge Graph Embedding for Link Prediction
Beyond Triplets: Hyper-Relational Knowledge Graph Embedding for Link PredictionBeyond Triplets: Hyper-Relational Knowledge Graph Embedding for Link Prediction
Beyond Triplets: Hyper-Relational Knowledge Graph Embedding for Link Prediction
eXascale Infolab
 
It Takes Two: Instrumenting the Interaction between In-Memory Databases and S...
It Takes Two: Instrumenting the Interaction between In-Memory Databases and S...It Takes Two: Instrumenting the Interaction between In-Memory Databases and S...
It Takes Two: Instrumenting the Interaction between In-Memory Databases and S...
eXascale Infolab
 
Representation Learning on Complex Graphs
Representation Learning on Complex GraphsRepresentation Learning on Complex Graphs
Representation Learning on Complex Graphs
eXascale Infolab
 
A force directed approach for offline gps trajectory map
A force directed approach for offline gps trajectory mapA force directed approach for offline gps trajectory map
A force directed approach for offline gps trajectory map
eXascale Infolab
 
Cikm 2018
Cikm 2018Cikm 2018
Cikm 2018
eXascale Infolab
 
HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...
HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...
HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...
eXascale Infolab
 
SwissLink: High-Precision, Context-Free Entity Linking Exploiting Unambiguous...
SwissLink: High-Precision, Context-Free Entity Linking Exploiting Unambiguous...SwissLink: High-Precision, Context-Free Entity Linking Exploiting Unambiguous...
SwissLink: High-Precision, Context-Free Entity Linking Exploiting Unambiguous...
eXascale Infolab
 
Dependency-Driven Analytics: A Compass for Uncharted Data Oceans
Dependency-Driven Analytics: A Compass for Uncharted Data OceansDependency-Driven Analytics: A Compass for Uncharted Data Oceans
Dependency-Driven Analytics: A Compass for Uncharted Data Oceans
eXascale Infolab
 
Crowd scheduling www2016
Crowd scheduling www2016Crowd scheduling www2016
Crowd scheduling www2016
eXascale Infolab
 
SANAPHOR: Ontology-based Coreference Resolution
SANAPHOR: Ontology-based Coreference ResolutionSANAPHOR: Ontology-based Coreference Resolution
SANAPHOR: Ontology-based Coreference Resolution
eXascale Infolab
 
Efficient, Scalable, and Provenance-Aware Management of Linked Data
Efficient, Scalable, and Provenance-Aware Management of Linked DataEfficient, Scalable, and Provenance-Aware Management of Linked Data
Efficient, Scalable, and Provenance-Aware Management of Linked Data
eXascale Infolab
 
Entity-Centric Data Management
Entity-Centric Data ManagementEntity-Centric Data Management
Entity-Centric Data Management
eXascale Infolab
 
SSSW 2015 Sense Making
SSSW 2015 Sense MakingSSSW 2015 Sense Making
SSSW 2015 Sense Making
eXascale Infolab
 
LDOW2015 - Uduvudu: a Graph-Aware and Adaptive UI Engine for Linked Data
LDOW2015 - Uduvudu: a Graph-Aware and Adaptive UI Engine for Linked DataLDOW2015 - Uduvudu: a Graph-Aware and Adaptive UI Engine for Linked Data
LDOW2015 - Uduvudu: a Graph-Aware and Adaptive UI Engine for Linked Data
eXascale Infolab
 
Executing Provenance-Enabled Queries over Web Data
Executing Provenance-Enabled Queries over Web DataExecuting Provenance-Enabled Queries over Web Data
Executing Provenance-Enabled Queries over Web Data
eXascale Infolab
 
The Dynamics of Micro-Task Crowdsourcing
The Dynamics of Micro-Task CrowdsourcingThe Dynamics of Micro-Task Crowdsourcing
The Dynamics of Micro-Task Crowdsourcing
eXascale Infolab
 
Fixing the Domain and Range of Properties in Linked Data by Context Disambigu...
Fixing the Domain and Range of Properties in Linked Data by Context Disambigu...Fixing the Domain and Range of Properties in Linked Data by Context Disambigu...
Fixing the Domain and Range of Properties in Linked Data by Context Disambigu...
eXascale Infolab
 
OLTP-Bench
OLTP-BenchOLTP-Bench
OLTP-Bench
eXascale Infolab
 
An Introduction to Big Data
An Introduction to Big DataAn Introduction to Big Data
An Introduction to Big Data
eXascale Infolab
 
Internet Infrastructures for Big Data (Verisign's Distinguished Speaker Series)
Internet Infrastructures for Big Data (Verisign's Distinguished Speaker Series)Internet Infrastructures for Big Data (Verisign's Distinguished Speaker Series)
Internet Infrastructures for Big Data (Verisign's Distinguished Speaker Series)
eXascale Infolab
 

More from eXascale Infolab (20)

Beyond Triplets: Hyper-Relational Knowledge Graph Embedding for Link Prediction
Beyond Triplets: Hyper-Relational Knowledge Graph Embedding for Link PredictionBeyond Triplets: Hyper-Relational Knowledge Graph Embedding for Link Prediction
Beyond Triplets: Hyper-Relational Knowledge Graph Embedding for Link Prediction
 
It Takes Two: Instrumenting the Interaction between In-Memory Databases and S...
It Takes Two: Instrumenting the Interaction between In-Memory Databases and S...It Takes Two: Instrumenting the Interaction between In-Memory Databases and S...
It Takes Two: Instrumenting the Interaction between In-Memory Databases and S...
 
Representation Learning on Complex Graphs
Representation Learning on Complex GraphsRepresentation Learning on Complex Graphs
Representation Learning on Complex Graphs
 
A force directed approach for offline gps trajectory map
A force directed approach for offline gps trajectory mapA force directed approach for offline gps trajectory map
A force directed approach for offline gps trajectory map
 
Cikm 2018
Cikm 2018Cikm 2018
Cikm 2018
 
HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...
HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...
HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...
 
SwissLink: High-Precision, Context-Free Entity Linking Exploiting Unambiguous...
SwissLink: High-Precision, Context-Free Entity Linking Exploiting Unambiguous...SwissLink: High-Precision, Context-Free Entity Linking Exploiting Unambiguous...
SwissLink: High-Precision, Context-Free Entity Linking Exploiting Unambiguous...
 
Dependency-Driven Analytics: A Compass for Uncharted Data Oceans
Dependency-Driven Analytics: A Compass for Uncharted Data OceansDependency-Driven Analytics: A Compass for Uncharted Data Oceans
Dependency-Driven Analytics: A Compass for Uncharted Data Oceans
 
Crowd scheduling www2016
Crowd scheduling www2016Crowd scheduling www2016
Crowd scheduling www2016
 
SANAPHOR: Ontology-based Coreference Resolution
SANAPHOR: Ontology-based Coreference ResolutionSANAPHOR: Ontology-based Coreference Resolution
SANAPHOR: Ontology-based Coreference Resolution
 
Efficient, Scalable, and Provenance-Aware Management of Linked Data
Efficient, Scalable, and Provenance-Aware Management of Linked DataEfficient, Scalable, and Provenance-Aware Management of Linked Data
Efficient, Scalable, and Provenance-Aware Management of Linked Data
 
Entity-Centric Data Management
Entity-Centric Data ManagementEntity-Centric Data Management
Entity-Centric Data Management
 
SSSW 2015 Sense Making
SSSW 2015 Sense MakingSSSW 2015 Sense Making
SSSW 2015 Sense Making
 
LDOW2015 - Uduvudu: a Graph-Aware and Adaptive UI Engine for Linked Data
LDOW2015 - Uduvudu: a Graph-Aware and Adaptive UI Engine for Linked DataLDOW2015 - Uduvudu: a Graph-Aware and Adaptive UI Engine for Linked Data
LDOW2015 - Uduvudu: a Graph-Aware and Adaptive UI Engine for Linked Data
 
Executing Provenance-Enabled Queries over Web Data
Executing Provenance-Enabled Queries over Web DataExecuting Provenance-Enabled Queries over Web Data
Executing Provenance-Enabled Queries over Web Data
 
The Dynamics of Micro-Task Crowdsourcing
The Dynamics of Micro-Task CrowdsourcingThe Dynamics of Micro-Task Crowdsourcing
The Dynamics of Micro-Task Crowdsourcing
 
Fixing the Domain and Range of Properties in Linked Data by Context Disambigu...
Fixing the Domain and Range of Properties in Linked Data by Context Disambigu...Fixing the Domain and Range of Properties in Linked Data by Context Disambigu...
Fixing the Domain and Range of Properties in Linked Data by Context Disambigu...
 
OLTP-Bench
OLTP-BenchOLTP-Bench
OLTP-Bench
 
An Introduction to Big Data
An Introduction to Big DataAn Introduction to Big Data
An Introduction to Big Data
 
Internet Infrastructures for Big Data (Verisign's Distinguished Speaker Series)
Internet Infrastructures for Big Data (Verisign's Distinguished Speaker Series)Internet Infrastructures for Big Data (Verisign's Distinguished Speaker Series)
Internet Infrastructures for Big Data (Verisign's Distinguished Speaker Series)
 

Recently uploaded

Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
Pixlogix Infotech
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
Jakub Marek
 
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Jeffrey Haguewood
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
ssuserfac0301
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
kumardaparthi1024
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
Quotidiano Piemontese
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
Matthew Sinclair
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
MichaelKnudsen27
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
innovationoecd
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Speck&Tech
 
Recommendation System using RAG Architecture
Recommendation System using RAG ArchitectureRecommendation System using RAG Architecture
Recommendation System using RAG Architecture
fredae14
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
Tomaz Bratanic
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
shyamraj55
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
Octavian Nadolu
 
WeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation TechniquesWeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation Techniques
Postman
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
akankshawande
 
UI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentationUI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentation
Wouter Lemaire
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Malak Abu Hammad
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
DanBrown980551
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
panagenda
 

Recently uploaded (20)

Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
 
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
 
Recommendation System using RAG Architecture
Recommendation System using RAG ArchitectureRecommendation System using RAG Architecture
Recommendation System using RAG Architecture
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
WeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation TechniquesWeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation Techniques
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
 
UI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentationUI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentation
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
 

CIKM14: Fixing grammatical errors by preposition ranking

  • 1. Correct Me If I’m Wrong: Fixing Grammatical Errors by Preposition Ranking Roman Prokofyev, Ruslan Mavlyutov, Gianluca Demartini and Philippe Cudre-Mauroux eXascale Infolab University of Fribourg, Switzerland November 4th, CIKM’14 Shanghai, China 1
  • 2. Motivations and Task Overview 2 • Grammatical correction is important by itself • Also as a part of Machine Translation or Speech Recognition Correction of textual content written by English Learners. I am new in android programming. [to, at, for, …] ⇒ Rank candidate prepositions by their likelihood of being correct in order to potentially replace the original.
  • 3. 3 State-of-Art in Grammar Correction Approaches: • Multi-class classification using pre-defined candidate set • Statistical Machine Translation • Language modeling Features: • Lexical features, POS tags • Head verbs/nouns, word dependencies • N-gram counts from large corpora (Google Web 1-T) • Confusion matrix
  • 4. Key Ideas • Usage of a particular preposition is governed by a particular word/n-gram; ⇒ Task: select/aggregate n-grams that influence preposition usage; ⇒ Use n-gram association measures to score each preposition. 4
  • 5. Processing Pipeline 5 1 ... m n-gram construction Sentence Tokenization Supervised Classifier Feature extrFaecatitounre extFraecattiuornes N-gram Statistics Document foreach foreach Sentence Extraction Corrected Sentence Corrected Document List of n-grams Ranked list of prepositions for n-gram
  • 6. 6 Tokenization and n-gram distance N-gram Type Distance the force PREP 3gram -2 force be PREP 3gram -1 be PREP you 3gram 0 PREP you . 3gram 1 N-gram Type Distance be PREP 2gram -1 PREP you 2gram 1 PREP . 2gram 2
  • 7. N-gram association measures 7 Motivation: use association measures to compute a score that will be proportional to the likelihood of an n-gram appearing together with a preposition. N-gram PMI scores by preposition force be PREP (with: -4.9), (under: -7.86), (at: -9.26), (in: -9.93), … be PREP you (with: -1.86), (amongst: -1.99), (beside: -2.26), … PREP you . (behind: -0.71), (beside: -0.82), (around: -0.84), … Background N-gram collection: Google Books N-grams.
  • 8. N-grams with determiner skips 8 Around 30% prepositions are used in proximity to determiners (“a”, “the”, etc.) Determiners Nouns Correct preposition ranks “one of the most”  “one of most”
  • 9. PMI-based Features 9 • Average rank of a preposition among the ranks of the considered n-grams; • Average PMI score of a preposition among the PMI scores of the considered n-grams; • Total number of occurrences of a certain preposition on the first position in the ranking among the ranks of the considered n-grams. Calculated across 2 logical groups (considered n-grams): • N-gram size; • N-gram distances.
  • 10. Central N-grams 10 Distribution of correct preposition counts on top of PMI rankings with respect to n-gram distance.
  • 11. Other features • Confusion matrix values • POS tags: 5 most frequent tags + “OTHER” catch-all tag; • Preposition itself: sparse vector of the size of the candidate preposition set. 11 to in of for on at with from to 0.958 0.007 0.002 0.011 0.004 0.003 0.005 0.002 in 0.037 0.79 0.01 0.009 0.066 0.036 0.015 0.008
  • 12. Preposition selection Supervised Learning algorithm. • Two-class classification with a confidence score for every preposition from the candidate set; • Every candidate preposition will receive its own set of feature values; Classifier: random forest. 12
  • 13. Training/Test Collections Training collection: • First Certificate of English (Cambridge exams) Test collections: • CoNLL-2013 (50 essays written by NUS students) • StackExchange (historical edits) Cambridge FCE CoNLL-2013 StackExhange N# sentences 27k 1.4k 6k 13
  • 14. Experiments: overview 1. Feature importance scores 2. N-gram sizes 3. N-gram distances 4. CoNLL and StackExchange evaluation 14
  • 15. 15 Experiments: Feature Importances Feature name Importance score Confusion matrix probability 0.28 Top preposition counts (3grams) 0.13 Average rank (distance=0) 0.06 Central n-gram rank 0.06 Average rank (distance=1) 0.05 All top features except “confusion matrix” are based on the PMI scores.
  • 16. N-gram Sizes and Distances 16 Distance restriction Precision Recall F1 score (0) 0.3077* 0.3908* 0.3442* (-1,1) 0.3231 0.4166 0.3637 (-2,2) 0.3214 0.4222 0.3648 (-5,5) 0.3223 0.4028 0.3577 None 0.3214 0.3924* 0.3532 N-gram sizes Precision Recall F1 score {2,3}-grams 0.3005* 0.3879* 0.3385* {3}-grams + skip n-grams (4) 0.2931* 0.4187 0.3447 {2,3}-grams + skip n-grams 0.3231 0.4166 0.3637 * indicates a statistically significant difference to the best performing approach (in bold)
  • 17. Test Collection Evaluation 17 Collection Approach Precision Recall F1 score CoNLL-2013 NARA Team @CoNLL2013 0.2910 0.1254 0.1753 N-gram-based classification 0.2592 0.3611 0.3017 StackExchange N-gram-based classification 0.1585 0.2185 0.1837 N-gram-based classification (cross-validation) 0.2704 0.2961 0.2824
  • 18. Takeaways 18 • PMI association measures • + Two-class preposition ranking ⇒ allow to significantly outperform the state of the art. • Skip n-grams contribute significantly to the final result. Roman Prokofyev (@rprokofyev) eXascale Infolab (exascale.info), University of Fribourg, Switzerland http://www.slideshare.net/eXascaleInfolab/
  • 19. Two-class vs. Multi-class classification Multi-class: • input: one feature vector • output: one of N classes (1-N) Two-class (still with N possible outcomes) • input: N feature vectors • output: 1/0 for each of N feature vectors 19
  • 20. Training/Test Collections 20 Training collection: • Cambridge Learner Corpus: First Certificate of English (exams of 2000-2001), CLC FCE Test collections: • CoNLL-2013 (50 essays written by NUS students) • StackExchange (StackOverflow + Superuser) CLC FCE CoNLL-2013 SE N# sentences 27k 1.4k 6k N# prepositions 60k 3.2k 15.8k N# prepositional errors 2.9k 152 6k % errors 4.8 4.7 38.2
  • 21. Evaluation Metrics Precision = Recall = F1 = 21 valid _ suggested _ corrections total _ suggested _ corrections valid _ suggested _corrections total _ valid _ corrections 2·Precision·Recall Precision+ Recall

Editor's Notes

  1. Welcome everyone, my name is Roman Prokofyev, and I’m a PhD student at the eXascale Infolab, here at the University of Fribourg. And today I will talk Fixing Grammatical Errors by Preposition Ranking.
  2. I’m going to start directly with a problem and task overview, using an example sentence…
  3. LM – assigning different probability
  4. particulate word/phrase - May be it’s not the case, specially when it’s reference to as a pronoun (determined by another sentence)
  5. Distance between a preposition and a closest word to a preposition in the n-gram we are considering.
  6. Large corpus helps to overcome data sparsity of PMI
  7. Google n-gram corpus does not contain skip n-gram, so we made them ourselves.
  8. Let’s continue to feature types..
  9. Next feature family
  10. We used 2 datasets for our evaluation.
  11. We used 2 datasets for our evaluation.