Reassessing the Goals of Grammatical Error Correction:
Fluency instead of Grammaticality
Keisuke Sakaguchi (1), Courtney Napoles (1), Matt Post (1), and Joel Tetreault (2)
(1) Johns Hopkins University
(2) Yahoo! (now at Grammarly)
Grammaticality and Fluency
Original: From this scope, social media has shorten our distance.
Grammatical: From this scope, social media has shortened our distance.
Fluent: From this perspective, social media has shortened the distance between us.
Overview (High-level)
1. The GEC community has not clearly distinguished grammaticality from fluency.
2. Fluency-oriented annotations and metrics:
- Preferred by native speakers
- Higher correlation with human ranking
- Easier and cheaper to collect new datasets
New corpora should be produced regularly, as is done in statistical machine translation (SMT), to avoid over-reliance on a single annotated corpus.
History of GEC: Grammaticality or Fluency?
Is GEC trying to help learners correct small mistakes, or also trying to help them sound more fluent?
Shared Task   Target Errors                                                      Metric
HOO 11        All error types (e.g. prepositions, punctuation, word choice, …)   F-score
HOO 12        Limited error types: prepositions, determiners                     F-score
CoNLL 13      Limited error types: HOO 12 + noun number, verb form, SVA          M2 (≈ F0.5)
CoNLL 14      All error types (e.g. CoNLL 13 + redundancy, word choice, …)       M2 (≈ F0.5)
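The M2 scores in the table are approximately F0.5, which weights precision twice as much as recall. A minimal sketch of that computation follows; the function name `f_beta` and the example counts are illustrative assumptions, not figures from the shared tasks.

```python
def f_beta(true_pos: int, proposed: int, gold: int, beta: float = 0.5) -> float:
    """F-beta over edits.

    true_pos: system edits that match a gold edit
    proposed: all edits proposed by the system
    gold:     all edits in the reference annotation
    """
    if proposed == 0 or gold == 0:
        return 0.0
    precision = true_pos / proposed
    recall = true_pos / gold
    if precision + recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts: 6 of 10 proposed edits are correct, out of 20 gold edits.
print(round(f_beta(6, 10, 20), 3))            # F0.5
print(round(f_beta(6, 10, 20, beta=1.0), 3))  # F1, for comparison
```

With precision higher than recall, as in these made-up counts, F0.5 comes out higher than F1, which is why it favours cautious systems.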
Existing Annotation Scheme
Fine-grained error type coding (CLC: 80 types, NUCLE: 27 types)
CLC example:
<p>However, to have <NS type="RD"><i>his</i><c>your</c></NS> <NS type="FN"><i>photos</i><c>photo</c></NS> taken and <NS type="TV"><i>showed</i><c>shown</c></NS> on television and <NS type="MT"><c>in</c></NS> <NS type="MD"><c>the</c></NS> newspapers increases your popularity.</p>
NUCLE example:
<MISTAKE start_par="1" start_off="387" end_par="1" end_off="389"><TYPE>Prep</TYPE><CORRECTION>of</CORRECTION></MISTAKE>
<MISTAKE start_par="1" start_off="396" end_par="1" end_off="413"><TYPE>V0</TYPE><CORRECTION>that are inhospitable</CORRECTION></MISTAKE>
<MISTAKE start_par="1" start_off="422" end_par="1" end_off="430"><TYPE>Mec</TYPE><CORRECTION>deserts</CORRECTION></MISTAKE>
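Stand-off annotations like the NUCLE example above can be read with a standard XML parser. The sketch below assumes the fragment is well-formed once wrapped in a dummy root element; `parse_mistakes` and the `ROOT` wrapper are illustrative names, not part of any NUCLE tooling.

```python
import xml.etree.ElementTree as ET

# First two annotations from the slide, joined into one string.
SNIPPET = (
    '<MISTAKE start_par="1" start_off="387" end_par="1" end_off="389">'
    '<TYPE>Prep</TYPE><CORRECTION>of</CORRECTION></MISTAKE>'
    '<MISTAKE start_par="1" start_off="396" end_par="1" end_off="413">'
    '<TYPE>V0</TYPE><CORRECTION>that are inhospitable</CORRECTION></MISTAKE>'
)


def parse_mistakes(snippet: str):
    """Return (start_off, end_off, error_type, correction) tuples."""
    # Wrap in a dummy root so the fragment parses as a single XML document.
    root = ET.fromstring(f"<ROOT>{snippet}</ROOT>")
    edits = []
    for mistake in root.iter("MISTAKE"):
        edits.append((
            int(mistake.get("start_off")),
            int(mistake.get("end_off")),
            mistake.findtext("TYPE").strip(),
            mistake.findtext("CORRECTION").strip(),
        ))
    return edits


for edit in parse_mistakes(SNIPPET):
    print(edit)
```

Each tuple carries a character span, an error code, and a replacement string, which is the information an annotator has to produce for every single edit under this scheme.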
Problems with fine-grained error coding:
1. Training annotators for error coding is costly.
2. Inter-annotator agreement (IAA) is very low.
3. Annotators are under downward pressure to make sentences merely grammatical rather than fluent.
New annotation: Fluency edits
Simply ask native speakers to rewrite the sentence so that it sounds natural to them.
1. Low cost: no error tags and no annotator training are required.
2. Scalability: any native speaker can annotate.
3. Fluency is taken into account.
New annotation: Fluency edits
- NUCLE 3.2 dataset (test set: 1,312 sentences)
- Two annotators per sentence
- Fluency edits vs. grammatically minimal edits:
Minimal edit: From this scope, social media has shortened our distance.
Fluency edit: From this perspective, social media has shortened the distance between us.
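To make the contrast between the two kinds of edits concrete, the following sketch counts how many tokens each rewrite touches, using Python's standard `difflib`. The helper `changed_tokens` and the whitespace tokenization are assumptions made for illustration; this is a rough proxy, not a measure used in the talk.

```python
from difflib import SequenceMatcher


def changed_tokens(source: str, edited: str) -> int:
    """Count tokens touched by replace/insert/delete operations."""
    src, hyp = source.split(), edited.split()
    matcher = SequenceMatcher(a=src, b=hyp, autojunk=False)
    return sum(
        max(i2 - i1, j2 - j1)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    )


# Sentences from the slide, tokenized by hand for this example.
original = "From this scope , social media has shorten our distance ."
minimal = "From this scope , social media has shortened our distance ."
fluency = "From this perspective , social media has shortened the distance between us ."

print("minimal edit:", changed_tokens(original, minimal))
print("fluency edit:", changed_tokens(original, fluency))
```

The minimal edit changes a single token, while the fluency edit rewrites several, which is the kind of change an error-coded scheme tends to discourage.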
Data examples
Original: Some family may feel hurt , with regards to their family pride or reputation , on having the knowledge of such genetic disorder running in their family .
Minimal: Some families may feel hurt [] with regards to their family pride or reputation , on having [] knowledge of such a genetic disorder running in their family .
Fluency: On [] learning of such a genetic disorder running in their family , some family members may feel hurt [] regarding their family pride or reputation .
NUCLE: Some family members may feel hurt [] with regards to their family pride or reputation [] on having knowledge of a genetic disorder running in their family .
Preference by Native Speakers
Scored and ranked by TrueSkill (as used in WMT).
(Differences between rank groups are statistically significant.)
Rank  Score  Annotation scheme
1     1.16   Fluency edits
2     0.54   NUCLE annotation
3     0.26   Minimal edits
4     -2.9   Original sentence
Metrics
MaxMatch (M2) (Dahlmeier and Ng, 2012): targets grammaticality
- Phrase-level F-measure
- Designed for error-coded GEC corpora
I-measure (Felice and Briscoe, 2015): targets grammaticality
- Token-level accuracy
- Designed for error-coded GEC corpora
GLEU (Napoles et al., 2015): targets fluency
- Similar to BLEU, but takes the source sentence into account
- N-gram precision with a penalty term
- Suitable for non-error-coded corpora
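The idea behind a source-aware n-gram precision can be sketched as follows. This is a deliberately simplified, single-reference illustration and not the official GLEU implementation: the function name `gleu_like`, the exact form of the penalty, and the BLEU-style brevity penalty are all assumptions made for this example.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def gleu_like(source: str, candidate: str, reference: str, max_n: int = 4) -> float:
    """Simplified GLEU-style score (illustrative only, not the official metric)."""
    src, cand, ref = source.split(), candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        s, c, r = ngrams(src, n), ngrams(cand, n), ngrams(ref, n)
        total = sum(c.values())
        if total == 0:
            return 0.0
        # Reward: candidate n-grams that also occur in the reference (clipped).
        matched = sum((c & r).values())
        # Penalty: n-grams kept from the source that the reference does not contain.
        penalized = sum(count for gram, count in (c & s).items() if gram not in r)
        score = max(matched - penalized, 0)
        if score == 0:
            return 0.0
        log_precisions.append(math.log(score / total))
    # BLEU-style brevity penalty for candidates shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / max_n)


source = "From this scope , social media has shorten our distance ."
reference = "From this scope , social media has shortened our distance ."

print("unchanged output:", gleu_like(source, source, reference))
print("corrected output:", gleu_like(source, reference, reference))
```

Unlike plain BLEU, a system that simply copies the erroneous source is penalized for every n-gram it keeps that the reference changed, so leaving errors untouched is not rewarded.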
Annotation Scheme & Metrics
Annotation scheme (= reference): Fluency edits; NUCLE (error-coded, ≈ minimal edits)
Automated metrics: M2, I-measure, GLEU
Q: Which combination of reference and metric is best for evaluation?
Oracle ranking (by humans): Grundkiewicz et al. (2015).
Correlation with human ranking (Spearman's ρ)
Reference   GLEU (fluency)   M2 (grammaticality)
Fluency     0.819            0.758
NUCLE       0.626            0.725
N.B. I-measure showed weakly negative correlations (omitted).
Correlation with human ranking (Pearson's r)
Reference   GLEU (fluency)   M2 (grammaticality)
Fluency     0.731            0.665
NUCLE       0.646            0.677
N.B. I-measure showed weakly negative correlations (omitted).
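The two correlation coefficients answer slightly different questions: Pearson's r measures linear agreement between metric scores and human scores, while Spearman's coefficient compares only the rank order. The sketch below implements both in plain Python; the function names and the toy data are hypothetical and unrelated to the systems evaluated in the talk.

```python
import math


def pearson(x, y):
    """Pearson's r: linear correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's coefficient: Pearson's r computed on the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical data: human scores and metric scores for five systems.
human = [1.0, 2.0, 3.0, 4.0, 5.0]
metric = [1.0, 4.0, 9.0, 16.0, 25.0]
print(round(spearman(human, metric), 3), round(pearson(human, metric), 3))
```

In this toy case the metric orders the systems exactly as the humans do, so the rank correlation is perfect even though the relationship is not linear and Pearson's r is lower.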
New annotation: Fluency edits
- NUCLE 3.2 dataset (1,312 sentences)
- Two crowdsourced annotators per sentence
- Annotators qualified by expert editors
- Reward: $0.07 to $0.10 per sentence
- Total: approx. $240 in 24 hours
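As a quick sanity check on the reported cost, the reward range can be multiplied out. The sketch assumes every sentence was rewritten by exactly two workers and ignores platform fees or rejected work; both are assumptions, since the slide does not break the total down.

```python
SENTENCES = 1312         # NUCLE 3.2 test sentences (from the slide)
ANNOTATORS = 2           # crowdsourced annotators per sentence (from the slide)
REWARD_CENTS = (7, 10)   # reward range per sentence, in cents (from the slide)

# Work in integer cents to avoid floating-point rounding noise.
annotations = SENTENCES * ANNOTATORS
low, high = (annotations * cents for cents in REWARD_CENTS)
print(f"{annotations} annotations: ${low / 100:.2f} to ${high / 100:.2f}")
```

The reported total of roughly $240 falls inside this range.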
Q: How good are their edits?
Data examples
Original: Some family may feel hurt , with regards to their family pride or reputation , on having the knowledge of such genetic disorder running in their family .
Fluency: On [] learning of such a genetic disorder running in their family , some family members may feel hurt [] regarding their family pride or reputation .
Fluency (crowd): Some relatives may [] be concerned about the family’s [] reputation – not to mention their own pride – in relation to this news of [] familial genetic defectiveness [] .
NUCLE: Some family members may feel hurt [] with regards to their family pride or reputation [] on having knowledge of a genetic disorder running in their family .
Preference by Native Speakers
Scored and ranked by TrueSkill (as used in WMT).
(Differences between rank groups are statistically significant.)
Rank  Score  Annotation scheme
1     1.16   Fluency edits
1     0.97   Fluency edits (crowd)
2     0.26   NUCLE reference
3     -2.9   Original sentence
Correlation with human ranking (GLEU metric)
Reference         Spearman   Pearson
Fluency           0.819      0.731
Fluency (crowd)   0.676      0.668
NUCLE             0.626      0.646
Summary
New corpora should be produced regularly (as in SMT), to avoid over-reliance on a single annotated corpus.
1. The GEC community has not clearly distinguished grammaticality from fluency.
2. Fluency-oriented annotations and metrics:
- Preferred by native speakers
- Higher correlation with human ranking
- Easier and cheaper to collect new datasets
All fluency references are available at
https://github.com/keisks/reassess-gec.git
For exhaustive experiments and analysis:
http://aclweb.org/anthology/Q/Q16/Q16-1013.pdf
Thank you!
