Grammatical Error Correction
with Improved Real-world Applicability
(Japanese title: Grammatical Error Correction Oriented toward Real-world Applicability)
Masato Mita
Inui Laboratory, Department of System Information Sciences, Graduate School of Information Sciences
Doctoral Thesis Defense
July 20, 2021 @ Online
Background
2
• Millions of people are learning English as a Second Language (ESL)
→ According to a report published by the British Council in 2013, English is spoken
at a useful level by 1.75 billion people worldwide
• Due to the difficulty of learning a new language, their written texts
may contain grammatical errors [Nagata et al.,2011; Dahlmeier et al.,2013]
e.g.) KJ corpus
Interest in automatic error correction
3
Commercial perspective:
• Great potential for many real-world applications, such as:
• Writing support tools that assist writers with their writing without
human intervention
• Education tools, since it can provide real-time feedback
Research perspective:
• Interesting and challenging language generation task
→ language modeling, syntax and semantics in noisy text
• Actively studied as Grammatical Error Correction (GEC) task
Grammatical Error Correction (GEC)
4
• A task of correcting different kinds of errors in text such as spelling,
punctuation, grammatical, and word choice errors
Machine is design to help people.
Machines are designed to help people.
Mainstream approaches:
• Encoder-Decoder model based on Deep Neural Networks (DNN):
Ø Machine translation (MT) task: an ungrammatical text → a grammatical text
✓ It can theoretically correct all error types without expert knowledge
✓ It allows cutting-edge neural MT models to be adopted
Systems achieved human-level performance…
5
From [Ge et al.,2018]
→ From a commercial perspective, three major issues in the current GEC:
1. Evaluation
2. Data Noise
3. Low Resource
Issue1: Evaluation
6
GEC community tends to evaluate systems on a particular corpus written
by relatively proficient learners (e.g., CoNLL-2014)
Research (GEC community):
CoNLL-2014 [Ng et al., 2014]
(Figure: systems X, Y, Z ranked by performance)
GEC systems are expected to be able to
robustly correct errors in any written text
Real-world scenarios:
(Figure: a GEC system takes input from writers at Basic, Independent, and Proficient levels)
Question:
Can we realize a reliable enough evaluation to be applied in real-world scenarios?
Issue2: Data Noise
7
We will [discuss about → discuss] this with you.
I want to [discuss about → discuss of] the education.
We [discuss about → discuss about] our sales target.
Inconsistent annotations in GEC corpus: [Lo et al., 2018]
Research (GEC community):
Little focus on verifying and ensuring:
✓ the quality of the datasets
✓ how lower-quality data might affect GEC performance
Real-world scenarios:
✓ Limited available data
✓ Not always possible to use high-quality data
Question:
Can a better GEC model be built by reducing noise in GEC corpora?
Issue3: Low Resource
8
Question:
How to build lightweight models requiring fewer resources?
Figure from [Kiyono+2019]
Current de facto standard: incorporating pseudo-data into GEC systems
→ Tends to require more resources to develop GEC systems (e.g.,
GPUs and training time)
Real-world perspective (checklist):
✓ Performance
✓ Low resources
✓ Inference speed
…etc.
Three issues and goal
9
1. Evaluation
◆ No reliable and robust evaluation methodologies
2. Data noise
◆ No data denoising methodologies
3. Low resource
◆ Increased resources required for model development
Underlying Motivation & Goal
• Provide the foundation and research direction for GEC with Improved
Real-world Applicability
• Contribute to making GEC research more meaningful in real-world
scenarios
Grammatical Error Correction
with Improved Real-world Applicability
Overview
10
Evaluation Data Noise Low Resource
§1,§2
§3 §4 §5
◆ How to realize a reliable evaluation?
→ Cross-sectional evaluation
(NAACL 2019, Journal of NLP 2021)
◆ How to design a denoising method?
→ A self-refinement strategy
(EMNLP 2020)
◆ How to build lightweight models requiring fewer resources?
→ Grammatical generalization ability
(ACL 2021)
Background: Evaluation
11
• Most previous works conduct evaluation on CoNLL-2014
• Recently, more and more works have additionally used JFLEG,
but (customarily) evaluate it independently with different metrics
Essays written by students at the
National University of Singapore
In real-world scenarios
12
• Real-world applications assume a wide variety of writing as input
• The difficulty varies under different conditions
Proficient
Independent
Basic
GEC system
GEC systems are expected to be able to
robustly correct errors in any written text
Error tendencies vary depending on
the learner's proficiency level
e.g.) proficiency
Chapter3: Cross-sectional Evaluation of GEC Models
(NAACL 2019, Journal of NLP 2021)
13
What we did in this chapter:
1. Check if the current evaluation is reliable (NAACL 2019)
2. Explore an evaluation methodology with improved real-world applicability
(Journal of NLP 2021)
Chapter3: Cross-sectional Evaluation of GEC Models
(NAACL 2019, Journal of NLP 2021)
14
What we did in this chapter:
1. Check if the current evaluation is reliable (NAACL 2019)
2. Explore an evaluation methodology with improved real-world applicability
(Journal of NLP 2021)
Current benchmark
Are there variations in
the evaluation results?
CoNLL-2014 [Ng et al., 2014]
(Figure: rankings of systems X, Y, Z compared across Corpus A, Corpus B, and Corpus C)
GEC systems
15
Requirements:
• The systems must be based on machine translation
• Each system must be implemented to achieve competitive performance
on CoNLL-2014
• LSTM: LSTM based system [Luong et al., 2015]
• CNN: CNN based system [Chollampatt et al., 2017]
• Transformer: Transformer based system [Vaswani et al., 2017]
• SMT: Statistical Machine Translation based system [Junczys-Dowmunt et al., 2017]
Cross-corpora Evaluation (NAACL 2019)
16
• Systems' rankings vary considerably depending on the corpus
→ Single-corpus evaluation is not reliable for GEC
Analysis
17
• Performance evaluation by error type (CoNLL-2014)
Determiner
e.g. [this → these]
Preposition
e.g. [for → with]
Punctuation
e.g. [. Because → , because]
Verb
e.g. [grow → bring]
Noun Number
e.g. [cat → cats]
Verb Tense
e.g. [eat → has eaten]
→ Each system has different strengths and weaknesses
Analysis
18
• Performance evaluation by error type (cross-corpora)
• The best-performing models for each error type in each corpus
Error type   CoNLL-2014    CoNLL-2013    FCE           JFLEG         KJ      BEA-2019
Det.         LSTM          LSTM          LSTM          SMT           CNN     LSTM
Prep.        SMT           Transformer   SMT           Transformer   LSTM    Transformer
Punct.       Transformer   Transformer   Transformer   SMT           LSTM    SMT
Verb         LSTM          CNN           SMT           LSTM          LSTM    Transformer
Noun Num.    LSTM          Transformer   CNN           LSTM          CNN     LSTM
Verb Form    Transformer   Transformer   Transformer   LSTM          CNN     Transformer
→ Each corpus has different error tendencies
Cross-sectional Evaluation (Journal of NLP 2021)
19
Ideas:
• The evaluation segment need not be a whole corpus
→ Cross-sectional evaluation
✓ Makes it possible to investigate model behavior more precisely
along the evaluation segments (perspectives) we want to focus on
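The idea above can be sketched in a few lines: group the evaluation data by any metadata field (proficiency, error type, etc.) and score each segment separately instead of reporting a single corpus-level number. The field names and the `score_fn` below are illustrative, not the thesis's actual implementation.

```python
from collections import defaultdict

def cross_sectional_eval(examples, score_fn, segment_key):
    """Score a GEC system per evaluation segment instead of per corpus.

    examples: dicts with 'hypothesis', 'reference' and metadata fields
              (e.g., 'proficiency': 'A' / 'B' / 'C' / 'N').
    score_fn: corpus-level metric over parallel hypothesis/reference lists.
    segment_key: metadata field defining the segments (the "perspective").
    """
    segments = defaultdict(list)
    for ex in examples:
        segments[ex[segment_key]].append(ex)
    # One score per segment instead of one corpus-level score
    return {seg: score_fn([e["hypothesis"] for e in exs],
                          [e["reference"] for e in exs])
            for seg, exs in segments.items()}
```

With `segment_key="proficiency"` this yields exactly the cross-proficiency evaluation used later; any other metadata field gives another cross-sectional view of the same data.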
Proficiency-wise dataset: BEA-2019
20
BEA-2019 contains CEFR-compliant proficiency information for writers
Basic (A1, A2) · Independent (B1, B2) · Proficient (C1, C2)
CEFR: Common European Framework of Reference for Languages
※ N: Native
※ WER: Word Edit Rate
As proficiency increases: average sentence length ⬆, word edit rate ⬇, vocabulary size ⬆
Result: Cross-proficiency evaluation
21
• At the basic-intermediate levels (A, B), the performance of Transformer
is higher than that of the others
basic-intermediate level
Result: Cross-proficiency evaluation
22
• In the advanced level (C,N), SMT achieved the highest performance
advanced level
Summary of Chapter 3
23
Observations
• The system rankings considerably vary depending on the corpus
→ Current single corpus evaluation is not reliable
• A large divergence in the evaluation between the basic-intermediate and advanced
levels of writer's proficiency
Research Question and Contribution:
Q: How to realize a reliable evaluation?
A: Evaluation from multiple perspectives by appropriately segmenting the data according
to the purpose (e.g., cross-proficiency evaluation)
→ Provides a more reliable evaluation foundation for GEC
Limitations (Future work):
• Detailed factor analysis of ranking changes
• New metrics appropriate for cross-sectional evaluation
Grammatical Error Correction
with Improved Real-world Applicability
Overview
24
Evaluation Data Noise Low Resource
§1,§2
§3 §4 §5
◆ How to realize a reliable evaluation?
→ Cross-sectional evaluation
(NAACL 2019, Journal of NLP 2021)
◆ How to design a denoising method?
→ A self-refinement strategy
(EMNLP 2020)
◆ How to build lightweight models requiring fewer resources?
→ Grammatical generalization ability
(ACL 2021)
Background
25
• Manually created GEC data has implicitly been treated as the cleanest data available
→ the data are usually built manually by experts
e.g.) KJ Corpus [Nagata et al.,2011]
Now, I live <prp crr=“in”></prp> my home alone.
Original: Now, I live my home alone .
Corrected: Now, I live in my home alone .
Issues and motivation
26
Lo et al. (2018)ʼs report:
• A GEC model trained on EFCamDat [Geertzen et al.,2013], the largest publicly available learner
corpus as of today (2M sent pairs), was outperformed by a model trained on a smaller
dataset (720K sent pairs)
• This may be due to the “inconsistent annotations”
We will [discuss about → discuss] this with you.
I want to [discuss about → discuss of] the education.
We [discuss about → discuss about] our sales target.
Motivation:
In real-world scenarios, it may not always be possible to use high-quality data
→ Need to develop training strategies for low-quality data without sacrificing performance
Chapter 4: A Self-refinement Strategy for Noise Reduction
(EMNLP 2020)
27
What we did in this chapter:
1. Reveal the amount of noise in existing GEC data
2. Propose a data denoising method which improves GEC performance
3. Analyze how the method affects both performance and the data itself
Presence of noise in GEC data
28
1. For 300 target sentences (Y) from each dataset, one expert reviewed
them and we obtained denoised ones (Yʼ)
2. Calculated the averaged Levenshtein distance between the original target
sentences (Y) and the denoised target sentences (Yʼ)
(Chart: % noise per dataset. BEA-train: 37.1, EF: 42.1, Lang-8: 34.6)
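The measurement procedure above can be sketched as follows. The edit-distance part is standard; how the raw distance is normalized into a percentage is an assumption here (per-sentence length normalization), not necessarily the exact protocol used.

```python
def levenshtein(a, b):
    # Classic DP edit distance (insertions, deletions, substitutions; cost 1 each)
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def avg_noise_pct(targets, denoised):
    """Average noise between original targets Y and expert-denoised Y',
    expressed as a percentage of the original length (assumed normalization)."""
    rates = [levenshtein(y, y2) / max(len(y), 1) * 100
             for y, y2 in zip(targets, denoised)]
    return sum(rates) / len(rates)
```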
Filtering ?
29
• A straightforward solution is to apply a filtering approach
→ Noisy data are filtered out and a smaller subset of high-quality sentence pairs is
retained (cf. MT)
Filtering
Intuition: Filtering approaches may not be the best choice for GEC:
1. GEC is a low-resource task compared to MT, thus further reducing data size by
filtering may be critically ineffective;
2. Even noisy instances may still be useful for training since they might contain
some correct edits as well
We will discuss about this with you → We will discuss this with you (corrected)
We discuss about our sales target → We discuss about our sales target (left uncorrected)
I need to discuss about the education → I need to discuss of the education (mis-corrected)
Proposed method: Self-refinement
30
Key Ideas:
Denoising datasets by leveraging the prediction consistency of existing models
Correction (Human):
We will discuss about this with you → We will discuss this with you
We discuss about our sales target → We discuss about our sales target
I need to discuss about the education → I need to discuss of the education
Re-correction (Model):
We will discuss about this with you → We will discuss this with you
We discuss about our sales target → We discuss our sales target
I need to discuss about the education → I need to discuss the education
Self-refinement: Algorithm
31
Noisy parallel data: D̃ = (X, Y)
Denoised parallel data: D̂ = {}
All trainable parameters: θ
① Train a base model θ on D̃
② Apply the base model to X and obtain system outputs Y'
③ Selection (fail-safe mechanism using a language model):
Ŷ = Y'  if PPL(Y) - PPL(Y') ≥ τ
Ŷ = Y   if PPL(Y) - PPL(Y') < τ
④ Add (X, Ŷ) to D̂
⑤ Train a denoised new model θ̂ on D̂
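The selection step ③ can be sketched as below. `ppl` stands in for the language model's perplexity function, `correct` for the trained base model, and `tau` for the fail-safe threshold; the function names and the toy usage in the test are illustrative, not the paper's code.

```python
def select(y_orig, y_pred, ppl, tau):
    # Fail-safe: adopt the model's re-correction only when the LM judges it
    # at least tau better (lower perplexity) than the original annotation.
    return y_pred if ppl(y_orig) - ppl(y_pred) >= tau else y_orig

def refine_dataset(pairs, correct, ppl, tau):
    # One self-refinement pass: re-correct each source x with the base
    # model, then apply the per-sentence fail-safe selection to the target.
    return [(x, select(y, correct(x), ppl, tau)) for x, y in pairs]
```

Because the original target is kept whenever the language model is not clearly in favor of the re-correction, the method never shrinks the dataset, unlike filtering.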
Result
32
• Significantly improved performance
across all training/test sets
Result
33
• Filtering approaches can be useful for
corpora with large data size
Result
34
• Not useful with small data
→ Suggests the possibility of excluding
even instances that were partially useful
for training the model
Precision vs. Recall
35
• Recall significantly increased, while precision was mostly maintained
→ Due to the correction of "inconsistent annotations”
Analysis: Noise reduction
36
• Manually evaluated 500 triples of source
sentences (X), original target sentences (Y), and
generated target sentences (Yʼ)
→ 73.6% of the replaced samples were determined to
be appropriate corrections, including cases where
both were correct
Summary of Chapter 4
37
Observations
• A non-negligible amount of noise in the most commonly used training data for GEC
• Significantly improved performance by removing noise
Research Question and Contribution:
Q: How to design a denoising method?
A: Developed a simple but effective denoising method based on a self-refinement strategy
→ Enables developing accurate GEC systems from low-quality data
Limitations:
• Boundary conditions under which noise reduction works effectively are unclear
Grammatical Error Correction
with Improved Real-world Applicability
Overview
38
Evaluation Data Noise Low Resource
§1,§2
§3 §4 §5
◆ How to realize a reliable evaluation?
→ Cross-sectional evaluation
(NAACL 2019, Journal of NLP 2021)
◆ How to design a denoising method?
→ A self-refinement strategy
(EMNLP 2020)
◆ How to build lightweight models requiring fewer resources?
→ Grammatical generalization ability
(ACL 2021)
Issues: Larger data, bigger model
39
• Pseudo-data generation is popular
− Generate pseudo-errors from grammatical sentence sets (e.g., Wikipedia)
• Increased training data
− Increased resources required for model development (GPUs, training time, etc.)
− Need to add about 60 million samples of pseudo-data to improve a standard
measure of GEC, F0.5 score, by only two points [Kiyono et al.,2019]
Figure from [Kiyono et al.,2019]
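For reference, the F0.5 measure mentioned above is the general F-beta formula with beta = 0.5, which weights precision twice as heavily as recall. A minimal sketch from edit-level true positives, false positives, and false negatives:

```python
def f_beta(tp, fp, fn, beta=0.5):
    # F0.5, the standard GEC measure: precision is weighted over recall
    p = tp / (tp + fp) if tp + fp else 0.0   # precision
    r = tp / (tp + fn) if tp + fn else 0.0   # recall
    if p == 0.0 or r == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * p * r / (b2 * p + r)
```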
Research Question
40
Two types of errors covered by GEC:
Type 1: Errors not based on a grammatical rule (e.g., collocation)
I listen [in → to] his speech carefully
Type 2: Errors based on a grammatical rule (e.g., subject-verb agreement)
Every dog [run → runs] quickly
Intuition:
No need to memorize individual patterns if the rules have been learned
Q. Do GEC models realize grammatical generalization?
A. Yes → No need for large amounts of data (at least for Type 2)
A. No → Need to incorporate grammatical knowledge as rules into the models
Chapter5: Do GEC Models Realize Grammatical Generalization?
(ACL 2021)
41
What we did in this chapter:
1. Introduce an analysis method to evaluate whether models can generalize
to unseen errors
2. Add new depth to the study of GEC beyond just improving the scores
Proposed method
42
• Automatically build datasets that control which vocabulary appears in error
locations in the training and test sets
• Compare performance when correcting previously seen error correction
patterns (Known setting) with correcting unseen patterns of the same error type
(Unknown setting)
Train:
Every dog *run / runs quickly
That slimy duck *smile / smiles awkwardly
Some slimy cows *smiles / smile dramatically
Test 1 (Known setting): Every polite cow *smile / smiles awkwardly
Test 2 (Unknown setting): Every white fox *run / runs quickly
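A minimal sketch of the vocabulary-controlled split behind the two settings: hold out half of the words that can appear in error locations, so the Unknown test set shares the error type but none of the error-correction patterns seen in training. Function and variable names are my own, not the paper's.

```python
import random

def known_unknown_split(patterns, seed=0):
    """Split error-correction patterns into training vocabulary and a
    held-out vocabulary, so the Unknown test set contains patterns never
    seen in training.

    patterns: dict mapping a word to its (error, correction) pair,
              e.g. {"run": ("run", "runs")}.
    """
    words = sorted(patterns)
    rng = random.Random(seed)   # deterministic split
    rng.shuffle(words)
    half = len(words) // 2
    train_vocab, unseen_vocab = set(words[:half]), set(words[half:])
    train = {w: patterns[w] for w in train_vocab}
    known_test = {w: patterns[w] for w in train_vocab}     # same patterns as training
    unknown_test = {w: patterns[w] for w in unseen_vocab}  # same error type, new words
    return train, known_test, unknown_test
```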
Two types of data: synthetic and real data
43
                          Synthetic data                      Real data
Method                    Synthesizing using a context-free   Sampling from existing GEC datasets
                          grammar (CFG)
① control of patterns     ✔                                   ✔
② control of vocabulary   ✔
• Investigate the five standard error types defined by Bryant et al.
(2017), which are errors based on grammatical rules:
• Subject-verb agreement errors (VERB:SVA)
• Verb form errors (VERB:FORM)
• Word order errors (WO)
• Morphological errors (MORPH)
• Noun number errors (NOUN:NUM)
Examples of automatically constructed data
44
Synthetic data: Sentences with limited vocabulary and syntax
Real data: Sentences with diverse vocabulary and syntax
Result: Synthetic data
45
• The model's performance drops significantly in the unknown setting
compared to the known setting, except for WO
→ It lacks the generalization ability required to correct errors from
provided training examples
Dataset             VERB:SVA   VERB:FORM   WO       MORPH    NOUN:NUM
Synthetic Known       99.61      99.17     99.09    98.44     97.47
Synthetic Unknown     46.05      56.93     84.00    29.35     65.55
Δ                    -53.56     -42.24    -15.09   -69.09    -31.92
Real Known            87.84      86.36     74.89    87.77     83.75
Real Unknown           6.28       6.28      9.25     3.83     12.49
Δ                    -81.56     -80.08    -65.64   -83.94    -71.26
Table 2: Generalization performance for unseen errors. Each number represents an F0.5 score.
Result: Real data
46
The model's performance drops significantly on all errors
→ Generalization is more difficult in more practical settings where the
vocabulary and syntax are diverse
Dataset             VERB:SVA   VERB:FORM   WO       MORPH    NOUN:NUM
Synthetic Known       99.61      99.17     99.09    98.44     97.47
Synthetic Unknown     46.05      56.93     84.00    29.35     65.55
Δ                    -53.56     -42.24    -15.09   -69.09    -31.92
Real Known            87.84      86.36     74.89    87.77     83.75
Real Unknown           6.28       6.28      9.25     3.83     12.49
Δ                    -81.56     -80.08    -65.64   -83.94    -71.26
Detection vs. Correction
47
Q. Which factor is responsible for the failure to generalize grammatical knowledge?
1. An inability to detect errors
2. An inability to predict the correct words
(Chart: F0.5 by error type (VERB:SVA, VERB:FORM, WO, MORPH, NOUN:NUM) for Correction (known), Detection (unknown), and Correction (unknown))
Complexity in real data
48
              noiseless   noisy
VERB:SVA         9.95      5.78
VERB:FORM       12.33      5.47
WO               7.89      9.35
MORPH            6.32      3.90
NOUN:NUM        24.16     12.49
We observed the effect of two contributing factors of complexity in real data:
1. Error complexity
2. Sentence length
• WO is robust to the complexity of input sentences
→ This is why WO's drop was relatively small compared to the other types, even on real data
(Figure: WO does not depend on sentence length or on error complexity. In the table, "noiseless" means the target error is the only error, and "noisy" means the sentence contains other errors besides the target error.)
Can a few correction patterns improve model performance?
49
• Performance change when we expose the model to a few error
correction patterns
• Adding even just one or two samples to the training data can
significantly improve the modelʼs performance
→ Important to include a few seen patterns for each word when building training data
Summary of Chapter 5
50
Observations:
• A current standard Transformer-based GEC model fails to realize grammatical
generalization even in simple settings with limited vocabulary and syntax
Research Question and Contribution:
Q: How to build lightweight models requiring fewer resources?
A: A combination of rule-based and DNN-based methods is necessary
→ Provides a research direction for implementing lightweight GEC models
Limitations:
• No real solutions based on our findings
Grammatical Error Correction
with Improved Real-world Applicability
Overview
51
Evaluation Data Noise Low Resource
§1,§2
§3 §4 §5
◆ How to realize a reliable evaluation?
→ Cross-sectional evaluation
(NAACL 2019, Journal of NLP 2021)
◆ How to design a denoising method?
→ A self-refinement strategy
(EMNLP 2020)
◆ How to build lightweight models requiring fewer resources?
→ Grammatical generalization ability
(ACL 2021)
Contributions
52
§3. How to realize a reliable evaluation?
❖ Demonstrated that current single-corpus evaluation is not reliable and
proposed cross-sectional evaluation as an alternative
→ Provides a more reliable evaluation foundation for GEC
§4. How to design a denoising method?
❖ Developed a simple but effective denoising method
→ Enables developing accurate GEC systems from low-quality data
§5. How to build lightweight models requiring fewer resources?
❖ Showed that a combination of rule-based and DNN-based methods is
necessary
→ Provides a research direction for implementing lightweight GEC models
Summary of the thesis
53
• This thesis focuses on the three major issues that arise when trying to
apply GEC systems to the real world: Evaluation, Data Noise, and Low
Resource
→ It will facilitate discussions on systems oriented toward real-world
applicability, bridging the gaps between GEC research and real-world
settings
(Figure: gaps between Research (accuracy first, narrow domain, clean data) and the Real world (noisy data, wide range of domains, low resource))
Appendix:
List of Publications/Presentations
54
Journal Papers
55
1. Ryo Fujii, Masato Mita, Kaori Abe, Kazuaki Hanawa, Makoto Morishita, Jun Suzuki, Kentaro Inui.
Phenomenon-wise Evaluation Dataset Towards Analyzing Robustness of Machine
Translation Models. (in Japanese). In Journal of Natural Language Processing, Volume 28,
Number 2, pp. 450-478.
2. Masato Mita, Tomoya Mizumoto, Masahiro Kaneko, Ryo Nagata, Kentaro Inui. Cross-
Sectional Evaluation of Grammatical Error Correction Models. (in Japanese). In Journal of
Natural Language Processing, Volume 28, Number 1, pp.160-182, March 2021.
International Conferences (Refereed) 1/3
56
1. Masato Mita, Hitomi Yanaka. Do Grammatical Error Correction Models Realize Grammatical
Generalization?. In Findings of the Joint Conference of the 59th Annual Meeting of the
Association for Computational Linguistics and the 11th International Joint Conference on
Natural Language Processing (ACL-IJCNLP 2021) (To appear).
2. Takumi Gotou, Ryo Nagata, Masato Mita, Kazuaki Hanawa. Taking the Correction Difficulty
into Account in Grammatical Error Correction Evaluation. In Proceedings of the 28th
International Conference on Computational Linguistics (COLING 2020), pages 2085-2095,
December 2020.
3. Ryo Fujii, Masato Mita, Kaori Abe, Kazuaki Hanawa, Makoto Morishita, Jun Suzuki, Kentaro Inui.
PheMT: A Phenomenon-wise Dataset for Machine Translation Robustness on User-
Generated Contents. In Proceedings of the 28th International Conference on Computational
Linguistics (COLING 2020), pages 2085-2095, December 2020.
4. Masato Mita, Shun Kiyono, Masahiro Kaneko, Jun Suzuki, Kentaro Inui. A Self-Refinement
Strategy for Noise Reduction in Grammatical Error Correction. In Findings of the 2020
Conference on Empirical Methods in Natural Language Processing (EMNLP 2020), pp.267‒280,
November 2020.
International Conferences (Refereed) 2/3
57
1. Hiroaki Funayama, Shota Sasaki, Yuichiro Matsubayashi, Tomoya Mizumoto, Jun Suzuki,
Masato Mita, Kentaro Inui. Preventing Critical Scoring Errors in Short Answer Scoring with
Confidence Estimation. In Proceedings of the 58th Annual Meeting of the Association for
Computational Linguistics: Student Research Workshop, pages 237-243, July 2020.
2. Masahiro Kaneko, Masato Mita, Shun Kiyono, Jun Suzuki, Kentaro Inui. Can Encoder- decoder
Models Benefit from Pre-trained Language Representation in Grammatical Error
Correction? In Proceedings of the 58th Annual Meeting of the Association for Computational
Linguistics (ACL 2020), pages 4248-4254, July 2020.
3. Masato Hagiwara and Masato Mita. GitHub Typo Corpus: A Large-Scale Multilingual
Dataset of Misspellings and Grammatical Errors. In Proceedings of the 12th Conference on
Language Resources and Evaluation (LREC 2020), pages 6761‒6768, May 2020.
4. Shun Kiyono, Jun Suzuki, Masato Mita, Tomoya Mizumoto, Kentaro Inui. An Empirical Study of
Incorporating Pseudo Data to Grammatical Error Correction. In Proceedings of the 2019
Conference on Empirical Methods in Natural Language Processing and the 9th International
Joint Conference on Natural Language Processing (EMNLP-IJCNLP 2019), pages 1236-1242,
November 2019.
International Conferences (Refereed) 3/3
58
1. Hiroki Asano, Masato Mita, Tomoya Mizumoto, Jun Suzuki. The AIP-Tohoku System at the
BEA-2019 Shared Task. In Proceedings of the Fourteenth Workshop on Innovative Use of NLP
for Building Educational Applications, pages 176-182, August 2019.
2. Masato Mita, Tomoya Mizumoto, Masahiro Kaneko, Ryo Nagata, Kentaro Inui. Cross-Corpora
Evaluation and Analysis of Grammatical Error Correction Models: Is Single-Corpus
Evaluation Enough?. In Proceedings of the 17th Annual Conference of the North American
Chapter of the Association for Computational Linguistics: Human Language Technologies
(NAACL-HLT), pages 1309-1314, May 2019.
Domestic conference (Not refereed) 1/3
59
1. Masato Mita, Masato Hagiwara, Keisuke Sakaguchi, Tomoya Mizumoto, Jun Suzuki, Kentaro Inui. Toward a Proposal of an Essay Rewriting Task and Its Automatic Evaluation. (in Japanese). Workshop at the 27th Annual Meeting of the Association for Natural Language Processing, March 2021.
2. Masato Mita, Hitomi Yanaka. Do Grammatical Error Correction Models Learn the Grammar Required for Correction? (in Japanese). The 27th Annual Meeting of the Association for Natural Language Processing, March 2021.
3. Masato Mita, Hitomi Yanaka. Do Grammatical Error Correction Models Generalize Grammatical Knowledge? (in Japanese). The 15th YANS Symposium, September 2020.
4. Yuta Matsumoto, Ryo Fujii, Kaori Abe, Hiroaki Funayama, Masato Mita. A Comparative Study of Neural Kanji Creation Systems Considering the Semantic Structure of Kanji. (in Japanese). The 15th YANS Symposium, September 2020.
5. Ryo Fujii, Masato Mita, Kaori Abe, Kazuaki Hanawa, Makoto Morishita, Jun Suzuki, Kentaro Inui. A Systematic Analysis of Linguistic Phenomena toward High-quality Machine Translation of User-generated Content. (in Japanese). The 34th Annual Conference of the Japanese Society for Artificial Intelligence, June 2020.
6. Hiroaki Funayama, Shota Sasaki, Tomoya Mizumoto, Masato Mita, Jun Suzuki, Yuichiro Matsubayashi, Kentaro Inui. A Study of Confidence Estimation Methods for Automated Scoring of Written Answers. (in Japanese). The 26th Annual Meeting of the Association for Natural Language Processing, March 2020.
Domestic conference (Not refereed) 2/3
60
1. Takumi Gotou, Ryo Nagata, Masato Mita, Kazuaki Hanawa. An Evaluation Metric for Grammatical Error Correction Considering Correction Difficulty. (in Japanese). The 26th Annual Meeting of the Association for Natural Language Processing, March 2020.
2. Shun Kiyono, Jun Suzuki, Masato Mita, Tomoya Mizumoto, Kentaro Inui. Building High-performance Grammatical Error Correction Models Using Large-scale Pseudo Data. (in Japanese). The 26th Annual Meeting of the Association for Natural Language Processing, March 2020. Outstanding Paper Award.
3. Masato Mita, Shun Kiyono, Masahiro Kaneko, Jun Suzuki, Kentaro Inui. Noise Reduction Based on a Self-refinement Strategy for Grammatical Error Correction. (in Japanese). The 26th Annual Meeting of the Association for Natural Language Processing, March 2020. Young Researcher Encouragement Award.
4. Masato Mita, Masato Hagiwara, Keisuke Sakaguchi, Tomoya Mizumoto, Jun Suzuki, Kentaro Inui. Automated Essay Rewriting (AER): Grammatical Error Correction, Fluency Edits, and Beyond. The 241st Meeting of IPSJ SIG Natural Language Processing, August 2019.
5. Hiroaki Funayama, Shota Sasaki, Tomoya Mizumoto, Masato Mita, Jun Suzuki, Kentaro Inui. Confidence Estimation Methods for Automated Scoring. (in Japanese). The 14th YANS Symposium, August 2019.
6. Takumi Gotou, Ryo Nagata, Masato Mita, Kazuaki Hanawa, Tomoya Mizumoto. A Proposal of an Evaluation Metric for Grammatical Error Correction Considering Problem Difficulty. (in Japanese). The 14th YANS Symposium, August 2019. Emerging Research Award.
Domestic conference (Not refereed) 3/3
61
1. Ryo Fujii, Hiroaki Funayama, Kotaro Kitayama, Kaori Abe, Ana Brassard, Masato Mita, Hiroki Ouchi. A Radical-aware Neural Kanji Generation System Based on seq2seq. (in Japanese). The 14th YANS Symposium, August 2019.
2. Masahiro Kaneko, Masato Mita, Jun Suzuki, Kentaro Inui. Pseudo-data Generation for Grammatical Error Correction Considering Collocation and Idiom Errors. (in Japanese). The 14th YANS Symposium, August 2019.
3. Ryo Fujii, Kaori Abe, Kazuaki Hanawa, Masato Mita, Jun Suzuki, Kentaro Inui. A Study of Adversarial Noise toward Machine Translation Systems Robust to Grammatical Errors. (in Japanese). The 14th YANS Symposium, August 2019.
4. Masato Mita, Masato Hagiwara, Keisuke Sakaguchi, Tomoya Mizumoto, Jun Suzuki, Kentaro Inui. A Proposal of a New Task Extending Grammatical Error Correction. (in Japanese). The 14th YANS Symposium, August 2019. Encouragement Award.
5. Masato Mita, Tomoya Mizumoto, Masahiro Kaneko, Ryo Nagata, Kentaro Inui. Cross-corpora Evaluation of Grammatical Error Correction: Is a Single Corpus Enough? (in Japanese). The 25th Annual Meeting of the Association for Natural Language Processing, March 2019.
6. Masato Mita, Tomoya Mizumoto, Hiroki Ouchi, Ryo Nagata, Kentaro Inui. An Unsupervised Interpretability Mechanism for Grammatical Error Correction. (in Japanese). The 13th YANS Symposium, August 2018.
Awards
62
1. Young Researcher Encouragement Award, the 26th Annual Meeting of the Association for Natural Language Processing, March 2020.
2. Outstanding Paper Award, the 26th Annual Meeting of the Association for Natural Language Processing, March 2020.
3. Encouragement Award, the 14th YANS Symposium, August 2019.
4. Emerging Research Award, the 14th YANS Symposium, August 2019.
Critiquing CS Assessment from a CS for All lens: Dagstuhl Seminar Poster
Mark Guzdial
 
Deep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & OpportunitiesDeep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & Opportunities
Matthew Lease
 
IRJET- Survey on Deep Learning Approaches for Phrase Structure Identification...
IRJET- Survey on Deep Learning Approaches for Phrase Structure Identification...IRJET- Survey on Deep Learning Approaches for Phrase Structure Identification...
IRJET- Survey on Deep Learning Approaches for Phrase Structure Identification...
IRJET Journal
 
Re2018 Semios for Requirements
Re2018 Semios for RequirementsRe2018 Semios for Requirements
Re2018 Semios for Requirements
Clément Portet
 
Tomáš Mikolov - Distributed Representations for NLP
Tomáš Mikolov - Distributed Representations for NLPTomáš Mikolov - Distributed Representations for NLP
Tomáš Mikolov - Distributed Representations for NLP
Machine Learning Prague
 
Dealing with Data Scarcity in Natural Language Processing - Belgium NLP Meetup
Dealing with Data Scarcity in Natural Language Processing - Belgium NLP MeetupDealing with Data Scarcity in Natural Language Processing - Belgium NLP Meetup
Dealing with Data Scarcity in Natural Language Processing - Belgium NLP Meetup
Yves Peirsman
 
Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"
Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"
Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"
Fwdays
 
ICML UDL Evaluating Deep Learning Models Applications to NLP Nazneen Rajani.pdf
ICML UDL  Evaluating Deep Learning Models Applications to NLP Nazneen Rajani.pdfICML UDL  Evaluating Deep Learning Models Applications to NLP Nazneen Rajani.pdf
ICML UDL Evaluating Deep Learning Models Applications to NLP Nazneen Rajani.pdf
ManojAcharya52
 
IRJET- Querying Database using Natural Language Interface
IRJET-  	  Querying Database using Natural Language InterfaceIRJET-  	  Querying Database using Natural Language Interface
IRJET- Querying Database using Natural Language Interface
IRJET Journal
 
An exploratory research on grammar checking of Bangla sentences using statist...
An exploratory research on grammar checking of Bangla sentences using statist...An exploratory research on grammar checking of Bangla sentences using statist...
An exploratory research on grammar checking of Bangla sentences using statist...
IJECEIAES
 
Nlp 2020 global ai conf -jeff_shomaker_final
Nlp 2020 global ai conf -jeff_shomaker_finalNlp 2020 global ai conf -jeff_shomaker_final
Nlp 2020 global ai conf -jeff_shomaker_final
Jeffrey Shomaker
 
How can text-mining leverage developments in Deep Learning? Presentation at ...
How can text-mining leverage developments in Deep Learning?  Presentation at ...How can text-mining leverage developments in Deep Learning?  Presentation at ...
How can text-mining leverage developments in Deep Learning? Presentation at ...
jcscholtes
 
Determining the Credibility of Science Communication
Determining the Credibility of Science CommunicationDetermining the Credibility of Science Communication
Determining the Credibility of Science Communication
Isabelle Augenstein
 
Learning context is all you need for task general artificial intelligence
Learning context is all you need for task general artificial intelligenceLearning context is all you need for task general artificial intelligence
Learning context is all you need for task general artificial intelligence
LibgirlTeam
 
The Value and Benefits of Data-to-Text Technologies
The Value and Benefits of Data-to-Text TechnologiesThe Value and Benefits of Data-to-Text Technologies
The Value and Benefits of Data-to-Text Technologies
International Journal of Modern Research in Engineering and Technology
 

Similar to Grammatical Error Correction with Improved Real-world Applicability (20)

NLG, Training, Inference & Evaluation
NLG, Training, Inference & Evaluation NLG, Training, Inference & Evaluation
NLG, Training, Inference & Evaluation
 
Review On In-Context Leaning.pptx
Review On In-Context Leaning.pptxReview On In-Context Leaning.pptx
Review On In-Context Leaning.pptx
 
Nlp research presentation
Nlp research presentationNlp research presentation
Nlp research presentation
 
Model-Driven Spreadsheet Development
Model-Driven Spreadsheet DevelopmentModel-Driven Spreadsheet Development
Model-Driven Spreadsheet Development
 
Landscape of AI/ML in 2023
Landscape of AI/ML in 2023Landscape of AI/ML in 2023
Landscape of AI/ML in 2023
 
Critiquing CS Assessment from a CS for All lens: Dagstuhl Seminar Poster
Critiquing CS Assessment from a CS for All lens: Dagstuhl Seminar PosterCritiquing CS Assessment from a CS for All lens: Dagstuhl Seminar Poster
Critiquing CS Assessment from a CS for All lens: Dagstuhl Seminar Poster
 
Deep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & OpportunitiesDeep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & Opportunities
 
IRJET- Survey on Deep Learning Approaches for Phrase Structure Identification...
IRJET- Survey on Deep Learning Approaches for Phrase Structure Identification...IRJET- Survey on Deep Learning Approaches for Phrase Structure Identification...
IRJET- Survey on Deep Learning Approaches for Phrase Structure Identification...
 
Re2018 Semios for Requirements
Re2018 Semios for RequirementsRe2018 Semios for Requirements
Re2018 Semios for Requirements
 
Tomáš Mikolov - Distributed Representations for NLP
Tomáš Mikolov - Distributed Representations for NLPTomáš Mikolov - Distributed Representations for NLP
Tomáš Mikolov - Distributed Representations for NLP
 
Dealing with Data Scarcity in Natural Language Processing - Belgium NLP Meetup
Dealing with Data Scarcity in Natural Language Processing - Belgium NLP MeetupDealing with Data Scarcity in Natural Language Processing - Belgium NLP Meetup
Dealing with Data Scarcity in Natural Language Processing - Belgium NLP Meetup
 
Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"
Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"
Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"
 
ICML UDL Evaluating Deep Learning Models Applications to NLP Nazneen Rajani.pdf
ICML UDL  Evaluating Deep Learning Models Applications to NLP Nazneen Rajani.pdfICML UDL  Evaluating Deep Learning Models Applications to NLP Nazneen Rajani.pdf
ICML UDL Evaluating Deep Learning Models Applications to NLP Nazneen Rajani.pdf
 
IRJET- Querying Database using Natural Language Interface
IRJET-  	  Querying Database using Natural Language InterfaceIRJET-  	  Querying Database using Natural Language Interface
IRJET- Querying Database using Natural Language Interface
 
An exploratory research on grammar checking of Bangla sentences using statist...
An exploratory research on grammar checking of Bangla sentences using statist...An exploratory research on grammar checking of Bangla sentences using statist...
An exploratory research on grammar checking of Bangla sentences using statist...
 
Nlp 2020 global ai conf -jeff_shomaker_final
Nlp 2020 global ai conf -jeff_shomaker_finalNlp 2020 global ai conf -jeff_shomaker_final
Nlp 2020 global ai conf -jeff_shomaker_final
 
How can text-mining leverage developments in Deep Learning? Presentation at ...
How can text-mining leverage developments in Deep Learning?  Presentation at ...How can text-mining leverage developments in Deep Learning?  Presentation at ...
How can text-mining leverage developments in Deep Learning? Presentation at ...
 
Determining the Credibility of Science Communication
Determining the Credibility of Science CommunicationDetermining the Credibility of Science Communication
Determining the Credibility of Science Communication
 
Learning context is all you need for task general artificial intelligence
Learning context is all you need for task general artificial intelligenceLearning context is all you need for task general artificial intelligence
Learning context is all you need for task general artificial intelligence
 
The Value and Benefits of Data-to-Text Technologies
The Value and Benefits of Data-to-Text TechnologiesThe Value and Benefits of Data-to-Text Technologies
The Value and Benefits of Data-to-Text Technologies
 

Recently uploaded

BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
Nguyen Thanh Tu Collection
 
The simplified electron and muon model, Oscillating Spacetime: The Foundation...
The simplified electron and muon model, Oscillating Spacetime: The Foundation...The simplified electron and muon model, Oscillating Spacetime: The Foundation...
The simplified electron and muon model, Oscillating Spacetime: The Foundation...
RitikBhardwaj56
 
South African Journal of Science: Writing with integrity workshop (2024)
South African Journal of Science: Writing with integrity workshop (2024)South African Journal of Science: Writing with integrity workshop (2024)
South African Journal of Science: Writing with integrity workshop (2024)
Academy of Science of South Africa
 
CACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdfCACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdf
camakaiclarkmusic
 
How to Add Chatter in the odoo 17 ERP Module
How to Add Chatter in the odoo 17 ERP ModuleHow to Add Chatter in the odoo 17 ERP Module
How to Add Chatter in the odoo 17 ERP Module
Celine George
 
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
PECB
 
Your Skill Boost Masterclass: Strategies for Effective Upskilling
Your Skill Boost Masterclass: Strategies for Effective UpskillingYour Skill Boost Masterclass: Strategies for Effective Upskilling
Your Skill Boost Masterclass: Strategies for Effective Upskilling
Excellence Foundation for South Sudan
 
Advanced Java[Extra Concepts, Not Difficult].docx
Advanced Java[Extra Concepts, Not Difficult].docxAdvanced Java[Extra Concepts, Not Difficult].docx
Advanced Java[Extra Concepts, Not Difficult].docx
adhitya5119
 
How to Manage Your Lost Opportunities in Odoo 17 CRM
How to Manage Your Lost Opportunities in Odoo 17 CRMHow to Manage Your Lost Opportunities in Odoo 17 CRM
How to Manage Your Lost Opportunities in Odoo 17 CRM
Celine George
 
Film vocab for eal 3 students: Australia the movie
Film vocab for eal 3 students: Australia the movieFilm vocab for eal 3 students: Australia the movie
Film vocab for eal 3 students: Australia the movie
Nicholas Montgomery
 
A Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in EducationA Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in Education
Peter Windle
 
PIMS Job Advertisement 2024.pdf Islamabad
PIMS Job Advertisement 2024.pdf IslamabadPIMS Job Advertisement 2024.pdf Islamabad
PIMS Job Advertisement 2024.pdf Islamabad
AyyanKhan40
 
clinical examination of hip joint (1).pdf
clinical examination of hip joint (1).pdfclinical examination of hip joint (1).pdf
clinical examination of hip joint (1).pdf
Priyankaranawat4
 
Types of Herbal Cosmetics its standardization.
Types of Herbal Cosmetics its standardization.Types of Herbal Cosmetics its standardization.
Types of Herbal Cosmetics its standardization.
Ashokrao Mane college of Pharmacy Peth-Vadgaon
 
writing about opinions about Australia the movie
writing about opinions about Australia the moviewriting about opinions about Australia the movie
writing about opinions about Australia the movie
Nicholas Montgomery
 
DRUGS AND ITS classification slide share
DRUGS AND ITS classification slide shareDRUGS AND ITS classification slide share
DRUGS AND ITS classification slide share
taiba qazi
 
Main Java[All of the Base Concepts}.docx
Main Java[All of the Base Concepts}.docxMain Java[All of the Base Concepts}.docx
Main Java[All of the Base Concepts}.docx
adhitya5119
 
Assessment and Planning in Educational technology.pptx
Assessment and Planning in Educational technology.pptxAssessment and Planning in Educational technology.pptx
Assessment and Planning in Educational technology.pptx
Kavitha Krishnan
 
PCOS corelations and management through Ayurveda.
PCOS corelations and management through Ayurveda.PCOS corelations and management through Ayurveda.
PCOS corelations and management through Ayurveda.
Dr. Shivangi Singh Parihar
 
Pride Month Slides 2024 David Douglas School District
Pride Month Slides 2024 David Douglas School DistrictPride Month Slides 2024 David Douglas School District
Pride Month Slides 2024 David Douglas School District
David Douglas School District
 

Recently uploaded (20)

BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
 
The simplified electron and muon model, Oscillating Spacetime: The Foundation...
The simplified electron and muon model, Oscillating Spacetime: The Foundation...The simplified electron and muon model, Oscillating Spacetime: The Foundation...
The simplified electron and muon model, Oscillating Spacetime: The Foundation...
 
South African Journal of Science: Writing with integrity workshop (2024)
South African Journal of Science: Writing with integrity workshop (2024)South African Journal of Science: Writing with integrity workshop (2024)
South African Journal of Science: Writing with integrity workshop (2024)
 
CACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdfCACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdf
 
How to Add Chatter in the odoo 17 ERP Module
How to Add Chatter in the odoo 17 ERP ModuleHow to Add Chatter in the odoo 17 ERP Module
How to Add Chatter in the odoo 17 ERP Module
 
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
 
Your Skill Boost Masterclass: Strategies for Effective Upskilling
Your Skill Boost Masterclass: Strategies for Effective UpskillingYour Skill Boost Masterclass: Strategies for Effective Upskilling
Your Skill Boost Masterclass: Strategies for Effective Upskilling
 
Advanced Java[Extra Concepts, Not Difficult].docx
Advanced Java[Extra Concepts, Not Difficult].docxAdvanced Java[Extra Concepts, Not Difficult].docx
Advanced Java[Extra Concepts, Not Difficult].docx
 
How to Manage Your Lost Opportunities in Odoo 17 CRM
How to Manage Your Lost Opportunities in Odoo 17 CRMHow to Manage Your Lost Opportunities in Odoo 17 CRM
How to Manage Your Lost Opportunities in Odoo 17 CRM
 
Film vocab for eal 3 students: Australia the movie
Film vocab for eal 3 students: Australia the movieFilm vocab for eal 3 students: Australia the movie
Film vocab for eal 3 students: Australia the movie
 
A Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in EducationA Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in Education
 
PIMS Job Advertisement 2024.pdf Islamabad
PIMS Job Advertisement 2024.pdf IslamabadPIMS Job Advertisement 2024.pdf Islamabad
PIMS Job Advertisement 2024.pdf Islamabad
 
clinical examination of hip joint (1).pdf
clinical examination of hip joint (1).pdfclinical examination of hip joint (1).pdf
clinical examination of hip joint (1).pdf
 
Types of Herbal Cosmetics its standardization.
Types of Herbal Cosmetics its standardization.Types of Herbal Cosmetics its standardization.
Types of Herbal Cosmetics its standardization.
 
writing about opinions about Australia the movie
writing about opinions about Australia the moviewriting about opinions about Australia the movie
writing about opinions about Australia the movie
 
DRUGS AND ITS classification slide share
DRUGS AND ITS classification slide shareDRUGS AND ITS classification slide share
DRUGS AND ITS classification slide share
 
Main Java[All of the Base Concepts}.docx
Main Java[All of the Base Concepts}.docxMain Java[All of the Base Concepts}.docx
Main Java[All of the Base Concepts}.docx
 
Assessment and Planning in Educational technology.pptx
Assessment and Planning in Educational technology.pptxAssessment and Planning in Educational technology.pptx
Assessment and Planning in Educational technology.pptx
 
PCOS corelations and management through Ayurveda.
PCOS corelations and management through Ayurveda.PCOS corelations and management through Ayurveda.
PCOS corelations and management through Ayurveda.
 
Pride Month Slides 2024 David Douglas School District
Pride Month Slides 2024 David Douglas School DistrictPride Month Slides 2024 David Douglas School District
Pride Month Slides 2024 David Douglas School District
 

Grammatical Error Correction with Improved Real-world Applicability

  • 1. Grammatical Error Correction with Improved Real-world Applicability (実世界への適⽤性を指向した⽂法誤り訂正). Masato Mita, Inui Laboratory, Department of System Information Sciences, Graduate School of Information Sciences. Doctoral thesis defense, July 20, 2021 (online).
  • 2. Background
  • Millions of people are learning English as a Second Language (ESL)
  → According to a report published by the British Council in 2013, English is spoken at a useful level by 1.75 billion people worldwide
  • Due to the difficulty of learning a new language, their written texts may contain grammatical errors [Nagata et al., 2011; Dahlmeier et al., 2013]
  e.g.) KJ corpus
  • 3. Interests in automatic error correction
  Commercial perspective:
  • Great potential for many real-world applications, such as:
  • Writing support tools that assist writers with their writing without human intervention
  • Educational tools, since it can provide real-time feedback
  Research perspective:
  • An interesting and challenging language generation task
  → language modeling, syntax and semantics in noisy text
  • Actively studied as the Grammatical Error Correction (GEC) task
  • 4. Grammatical Error Correction (GEC)
  • The task of correcting different kinds of errors in text, such as spelling, punctuation, grammatical, and word choice errors
  e.g.) Machine is design to help people. → Machines are design to help people.
  Mainstream approaches:
  • Encoder-Decoder models based on Deep Neural Networks (DNN):
  Ø Framed as a machine translation (MT) task: ungrammatical text → grammatical text
  + It can theoretically correct all error types without expert knowledge
  + It allows cutting-edge neural MT models to be adopted
  • 5. Systems achieved human-level performance…
  (Figure from [Ge et al., 2018])
  → From a commercial perspective, three major issues remain in current GEC:
  1. Evaluation
  2. Data Noise
  3. Low Resource
  • 6. Issue 1: Evaluation
  The GEC community tends to evaluate systems on a particular corpus written by relatively proficient learners (e.g., CoNLL-2014)
  • Research (GEC community): evaluation on CoNLL-2014 [Ng et al., 2014]
  • Real-world scenarios: GEC systems are expected to robustly correct errors in any written text, across basic, independent, and proficient writers
  Question: Can we realize an evaluation reliable enough to be applied in real-world scenarios?
  • 7. Issue 2: Data Noise
  Inconsistent annotations in GEC corpora [Lo et al., 2018] (brackets reconstruct the strikethrough edits on the slide):
  • We will [discuss about → discuss] this with you.
  • I want to [discuss about → discuss of] the education.
  • We [discuss about → discuss about] our sales target. (left unchanged)
  Research (GEC community): little focus on verifying and ensuring the quality of the datasets, or how lower-quality data might affect GEC performance
  Real-world scenarios: limited available data; not always possible to use high-quality data
  Question: Can a better GEC model be built by reducing noise in GEC corpora?
  • 8. Issue 3: Low Resource
  Current de facto standard: training GEC systems with pseudo-data (Figure from [Kiyono et al., 2019])
  → Tendency to require more resources to develop GEC systems (e.g., GPUs and training time)
  Real-world perspective (checklist): performance, low resource requirements, inference speed, etc.
  Question: How can we build lightweight models that require fewer resources?
  • 9. Three issues and goal
  1. Evaluation: no reliable and robust evaluation methodologies
  2. Data noise: no data denoising methodologies
  3. Low resource: increased resources required for model development
  Underlying Motivation & Goal
  • Provide a foundation and research direction for GEC with improved real-world applicability
  • Contribute to making GEC research more meaningful in real-world scenarios
  • 10. Grammatical Error Correction with Improved Real-world Applicability: Overview (§1, §2)
  • §3 Evaluation: How to realize a reliable evaluation? → Cross-sectional evaluation (NAACL 2019, Journal of NLP 2021)
  • §4 Data Noise: How to design a denoising method? → A self-refinement strategy (EMNLP 2020)
  • §5 Low Resource: How to build lightweight models requiring fewer resources? → Grammatical generalization ability (ACL 2021)
  • 11. Background: Evaluation
  • Most previous work evaluates on CoNLL-2014 (essays written by students at the National University of Singapore)
  • Recently, more and more works have additionally used JFLEG, but (customarily) evaluate it independently with a different metric
  • 12. In real-world scenarios
  • Real-world applications assume a wide variety of writing as input
  • The difficulty varies under different conditions (e.g., proficiency: basic, independent, proficient)
  • GEC systems are expected to robustly correct errors in any written text
  • Error tendencies vary depending on the learner's proficiency level
  • 13. Chapter 3: Cross-sectional Evaluation of GEC Models (NAACL 2019, Journal of NLP 2021)
  What we did in this chapter:
  1. Check whether the current evaluation is reliable (NAACL 2019)
  2. Explore an evaluation methodology with improved real-world applicability (Journal of NLP 2021)
  • 14. Chapter 3: Cross-sectional Evaluation of GEC Models (NAACL 2019, Journal of NLP 2021)
  What we did in this chapter:
  1. Check whether the current evaluation is reliable (NAACL 2019)
  2. Explore an evaluation methodology with improved real-world applicability (Journal of NLP 2021)
  Current benchmark: CoNLL-2014 [Ng et al., 2014]. Are there variations in the evaluation results across corpora A, B, and C?
  • 15. GEC systems
  Requirements:
  • The systems must be based on machine translation
  • Each system must be implemented to have competitive performance on CoNLL-2014
  Systems:
  • LSTM: LSTM-based system [Luong et al., 2015]
  • CNN: CNN-based system [Chollampatt et al., 2017]
  • Transformer: Transformer-based system [Vaswani et al., 2017]
  • SMT: Statistical Machine Translation-based system [Junczys-Dowmunt et al., 2017]
  • 16. Cross-corpora Evaluation (NAACL 2019)
  • Systems' rankings vary considerably depending on the corpus
  → Single-corpus evaluation is not reliable for GEC
  • 17. Analysis
  • Performance evaluation by error type (CoNLL-2014):
  Determiner e.g. [this → these]; Preposition e.g. [for → with]; Punctuation e.g. [. Because → , because]; Verb e.g. [grow → bring]; Noun Number e.g. [cat → cats]; Verb Tense e.g. [eat → has eaten]
  → Each system has different strengths and weaknesses
  • 18. Analysis
  • Performance evaluation by error type (cross-corpora)
  • The best-performing model for each error type in each corpus:

    Error type | CoNLL-2014  | CoNLL-2013  | FCE         | JFLEG       | KJ   | BEA-2019
    Det.       | LSTM        | LSTM        | LSTM        | SMT         | CNN  | LSTM
    Prep.      | SMT         | Transformer | SMT         | Transformer | LSTM | Transformer
    Punct.     | Transformer | Transformer | Transformer | SMT         | LSTM | SMT
    Verb       | LSTM        | CNN         | SMT         | LSTM        | LSTM | Transformer
    Noun Num.  | LSTM        | Transformer | CNN         | LSTM        | CNN  | LSTM
    Verb Form  | Transformer | Transformer | Transformer | LSTM        | CNN  | Transformer

  → Each corpus has different error tendencies
  • 19. Cross-sectional Evaluation (Journal of NLP 2021)
  Ideas:
  • The evaluation unit need not be a whole corpus
  → Cross-sectional evaluation
  + Makes it possible to investigate model behavior more precisely, using the evaluation segments (perspectives) we want to focus on
  • 20. Proficiency-wise dataset: BEA-2019
  BEA-2019 contains CEFR-compliant proficiency information for writers: Basic (A1, A2), Independent (B1, B2), Proficient (C1, C2), plus N (native)
  CEFR: Common European Framework of Reference for Languages
  As proficiency increases: average sentence length ⬆, word edit rate (WER) ⬇, vocabulary size ⬆
  • 21. Result: Cross-proficiency evaluation
  • At the basic-intermediate levels (A, B), Transformer performs better than the other systems
  • 22. Result: Cross-proficiency evaluation
  • At the advanced levels (C, N), SMT achieved the highest performance
  • 23. Summary of Chapter 3
  Observations:
  • System rankings vary considerably depending on the corpus → current single-corpus evaluation is not reliable
  • A large divergence in evaluation results between the basic-intermediate and advanced levels of writers' proficiency
  Research Question and Contribution:
  Q: How to realize a reliable evaluation?
  A: Evaluate from multiple perspectives by appropriately partitioning the data according to the purpose (e.g., cross-proficiency evaluation)
  → Provides a more reliable evaluation foundation for GEC
  Limitations (future work):
  • Detailed factor analysis of ranking changes
  • New metrics appropriate for cross-sectional evaluation
  • 24. Grammatical Error Correction with Improved Real-world Applicability: Overview (§1, §2)
  • §3 Evaluation: How to realize a reliable evaluation? → Cross-sectional evaluation (NAACL 2019, Journal of NLP 2021)
  • §4 Data Noise: How to design a denoising method? → A self-refinement strategy (EMNLP 2020)
  • §5 Low Resource: How to build lightweight models requiring fewer resources? → Grammatical generalization ability (ACL 2021)
  • 25. Background
  • Manually created GEC data has implicitly been treated as the cleanest data available
  → such data are usually built manually by experts
  e.g.) KJ Corpus [Nagata et al., 2011]:
  Annotated: Now, I live <prp crr="in"></prp> my home alone.
  Original: Now, I live my home alone.
  Corrected: Now, I live in my home alone.
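The inline-annotation format above can be expanded mechanically into (original, corrected) training pairs. A minimal sketch, assuming a simplified `<TAG crr="…">…</TAG>` format; the regex and function name are illustrative, and real KJ annotations may carry additional attributes:

```python
import re

# Matches a simplified KJ-style edit: <TAG crr="CORRECTION">LEARNER_TEXT</TAG>
# (either the correction or the learner text may be empty).
EDIT = re.compile(r'<(\w+) crr="([^"]*)">([^<]*)</\1>')

def expand_annotation(annotated: str):
    """Expand an annotated sentence into (original, corrected) plain text."""
    original = EDIT.sub(lambda m: m.group(3), annotated)   # keep learner text
    corrected = EDIT.sub(lambda m: m.group(2), annotated)  # keep correction
    # collapse doubled spaces left by empty insertions/deletions
    squeeze = lambda s: " ".join(s.split())
    return squeeze(original), squeeze(corrected)
```

Applied to the slide's example, this yields the Original/Corrected pair shown above.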
  • 26. Issues and motivation
  Lo et al. (2018)'s report:
  • A GEC model trained on EFCamDat [Geertzen et al., 2013], the largest publicly available learner corpus to date (2M sentence pairs), was outperformed by a model trained on a smaller dataset (720K sentence pairs)
  • This may be due to "inconsistent annotations", e.g., [discuss about → discuss], [discuss about → discuss of], [discuss about → discuss about]
  Motivation: in real-world scenarios, it may not always be possible to use high-quality data
  → Need to develop a training strategy for low-quality data that does not sacrifice performance
  • 27. Chapter 4: A Self-refinement Strategy for Noise Reduction (EMNLP 2020)
  What we did in this chapter:
  1. Reveal the amount of noise in existing GEC data
  2. Propose a data denoising method which improves GEC performance
  3. Analyze how the method affects both performance and the data itself
  • 28. Presence of noise in GEC data
  1. For 300 target sentences (Y) from each dataset, one expert reviewed them and we obtained denoised versions (Y')
  2. Calculated the averaged Levenshtein distance between the original target sentences (Y) and the denoised target sentences (Y')
  Resulting noise rates: BEA-train 37.1%, EF 42.1%, Lang-8 34.6%
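The noise estimate can be reproduced in outline as an averaged, length-normalized Levenshtein distance between each target sentence and its expert-denoised version. A sketch; the exact normalization used in the thesis may differ:

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance via the standard dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def noise_rate(targets, denoised) -> float:
    """Average normalized edit distance between original and denoised targets, in %."""
    dists = [levenshtein(y, y2) / max(len(y), len(y2), 1)
             for y, y2 in zip(targets, denoised)]
    return 100.0 * sum(dists) / len(dists)
```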
  • 29. Filtering?
  • A straightforward solution is to apply a filtering approach
  → noisy pairs are filtered out and a smaller subset of high-quality sentence pairs is retained (cf. MT), e.g.:
    We will discuss about this with you → We will discuss this with you
    We discuss about our sales target → We discuss about our sales target
    I need to discuss about the education → I need to discuss of the education
  Intuition: filtering approaches may not be the best choice in GEC:
  1. GEC is a low-resource task compared to MT, so further reducing data size by filtering may be critically ineffective
  2. Even noisy instances may still be useful for training, since they might contain some correct edits as well
  • 30. Proposed method: Self-refinement
  Key idea: denoise datasets by leveraging the prediction consistency of existing models
  Human correction:
    We will discuss about this with you → We will discuss this with you
    We discuss about our sales target → We discuss about our sales target
    I need to discuss about the education → I need to discuss of the education
  Model re-correction:
    We will discuss about this with you → We will discuss this with you
    We discuss about our sales target → We discuss our sales target
    I need to discuss about the education → I need to discuss the education
  • 31. Self-refinement: Algorithm
  Noisy parallel data: D = (X, Y); denoised parallel data: D' = {}; all trainable parameters: θ
  ① Train a base model on D
  ② Apply the base model to X and obtain system outputs Y'
  ③ Selection (fail-safe mechanism using a language model):
     Ŷ = Y'  if PPL(Y) − PPL(Y') ≥ τ
     Ŷ = Y   if PPL(Y) − PPL(Y') < τ
  ④ Add (X, Ŷ) to D'
  ⑤ Train a denoised new model on D'
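The selection step ③ can be sketched per sentence pair as follows, with `base_model` and `lm_ppl` as stand-ins for the trained base GEC model and a language-model perplexity function (names and the toy stubs below are illustrative, not from the thesis):

```python
def refine_pair(x, y, base_model, lm_ppl, tau):
    """One selection step of the self-refinement strategy (sketch).

    Keep the model's re-correction y' only when it lowers language-model
    perplexity over the human target y by at least tau; this is the
    fail-safe that otherwise preserves the original annotation.
    """
    y_prime = base_model(x)
    if lm_ppl(y) - lm_ppl(y_prime) >= tau:
        return (x, y_prime)  # replace the noisy target
    return (x, y)            # fail-safe: keep the human annotation
```

A larger τ makes the procedure more conservative, keeping more of the human annotations.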
  • 32. Result 32 • Significantly improved performance across all training/test sets
  • 33. Result • Filtering approaches can be useful for corpora with a large data size
  • 34. Result • Not useful with small data → Suggests that filtering may exclude even instances that were partially useful for training the model
  • 35. Precision vs. Recall • Recall significantly increased, while precision was mostly maintained → Due to the correction of "inconsistent annotations"
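This trade-off matters because GEC reports F0.5, which weights precision twice as heavily as recall; that is why "recall up, precision maintained" raises the overall score. A minimal sketch of the formula over edit counts (the thesis evaluates with span-based scorers such as M2/ERRANT rather than this bare formula):

```python
# F-beta over true positives, false positives, and false negatives.
# beta=0.5 (the GEC standard) emphasizes precision over recall.

def f_beta(tp: int, fp: int, fn: int, beta: float = 0.5) -> float:
    if tp == 0:
        return 0.0
    p = tp / (tp + fp)          # precision
    r = tp / (tp + fn)          # recall
    b2 = beta * beta
    return (1 + b2) * p * r / (b2 * p + r)
```

With the same total errors, a precision-heavy system scores higher under F0.5 than a recall-heavy one, e.g. `f_beta(8, 2, 8) > f_beta(8, 8, 2)`.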
  • 36. Analysis: Noise reduction 36 • Manually evaluated 500 triples of source sentences (X), original target sentences (Y), and generated target sentences (Yʼ) → 73.6% of the replaced samples were determined to be appropriate corrections, including cases where both were correct
  • 37. Summary of Chapter 4
Observations:
• A non-negligible amount of noise exists in the most commonly used training data for GEC
• Removing this noise significantly improved performance
Research Question and Contribution:
Q: How to design a denoising method?
A: Developed a simple but effective denoising method based on a self-refinement strategy
→ Enables developing accurate GEC systems from low-quality data
Limitations:
• The boundary conditions under which noise reduction works effectively are unclear
  • 38. Grammatical Error Correction with Improved Real-world Applicability Overview: Evaluation, Data Noise, Low Resource (§1,§2 §3 §4 §5) u How to realize a reliable evaluation? → Cross-sectional evaluation (NAACL 2019, Journal of NLP 2021) u How to design a denoising method? → A self-refinement strategy (EMNLP 2020) u How to build lightweight models that require fewer resources? → Grammatical generalization ability (ACL 2021)
  • 39. Issues: Larger data, bigger models
• Pseudo-data generation is popular
− Generate pseudo-errors from sets of grammatical sentences (e.g., Wikipedia)
• Increased training data
− Increases the resources required for model development (GPUs, training time, etc.)
− About 60 million pseudo-data samples are needed to improve F0.5, a standard GEC measure, by only two points [Kiyono et al., 2019]
Figure from [Kiyono et al., 2019]
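The pseudo-data idea above can be sketched minimally: inject synthetic errors into grammatical sentences to form (noisy, clean) training pairs. The drop/swap noiser below is an illustrative assumption; Kiyono et al. (2019) study richer generation methods such as backtranslation:

```python
# Minimal pseudo-data sketch: corrupt clean sentences with random token drops
# and adjacent-token swaps, pairing each corrupted sentence with its original.
import random

def inject_errors(sentence: str, rng: random.Random,
                  p_drop: float = 0.1, p_swap: float = 0.1) -> str:
    tokens = sentence.split()
    noisy, i = [], 0
    while i < len(tokens):
        r = rng.random()
        if r < p_drop:
            i += 1                                    # drop this token
            continue
        if r < p_drop + p_swap and i + 1 < len(tokens):
            noisy += [tokens[i + 1], tokens[i]]       # swap adjacent tokens
            i += 2
            continue
        noisy.append(tokens[i])
        i += 1
    return " ".join(noisy)

def make_pseudo_pairs(corpus, seed=0):
    """Build (pseudo-erroneous source, clean target) training pairs."""
    rng = random.Random(seed)
    return [(inject_errors(s, rng), s) for s in corpus]
```

The slide's point is the cost side: scaling this to tens of millions of pairs inflates GPU and training-time requirements for small score gains.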
  • 40. Research Question
Two types of errors covered by GEC:
Type 1: Errors not governed by a grammatical rule (e.g., collocation): I listen [in → to] his speech carefully
Type 2: Errors governed by a grammatical rule (e.g., subject-verb agreement): Every dog [run → runs] quickly
Q. Do GEC models realize grammatical generalization?
Intuition: a model should not need to memorize individual patterns if it has learned the rules
A. Yes → No need for large amounts of data (at least for Type 2)
A. No → Need to incorporate grammatical knowledge as rules into the models
  • 41. Chapter 5: Do GEC Models Realize Grammatical Generalization? (ACL 2021)
What we did in this chapter:
1. Introduce an analysis method to evaluate whether models can generalize to unseen errors
2. Add new depth to the study of GEC beyond just improving scores
  • 42. Proposed method
• Automatically build datasets that control which vocabulary appears at error locations in the training and test sets
• Compare performance on previously seen error correction patterns (Known setting) with performance on unseen patterns of the same error type (Unknown setting)
Train: Every dog *run / runs quickly; That slimy duck *smile / smiles awkwardly; Some slimy cows *smiles / smile dramatically
Test 1 (Known setting): Every polite cow *smile / smiles awkwardly
Test 2 (Unknown setting): Every white fox *run / runs quickly
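The known/unknown split can be sketched as below for a toy subject-verb-agreement setting. `split_patterns` and `make_sva_pair` are hypothetical helpers of our own, not the thesis's actual dataset-construction pipeline:

```python
# Controlled-vocabulary split: words at error locations are partitioned so the
# "known" test reuses training verbs in new contexts, while the "unknown" test
# uses held-out verbs of the same error type.

def split_patterns(verbs, n_known):
    """Partition the error-site vocabulary into seen and unseen subsets."""
    return set(verbs[:n_known]), set(verbs[n_known:])

def make_sva_pair(subject, verb):
    """Toy SVA pair: the source carries the agreement error, the target fixes it."""
    return (f"Every {subject} {verb} quickly", f"Every {subject} {verb}s quickly")

known, unknown = split_patterns(["run", "smile", "jump", "walk"], n_known=2)
train_pairs = [make_sva_pair("dog", v) for v in known]
test_known = [make_sva_pair("cow", v) for v in known]      # seen verbs, new context
test_unknown = [make_sva_pair("fox", v) for v in unknown]  # unseen verbs
```

The key design point: known and unknown test sets share the error *type*, so any score gap isolates the model's failure to generalize beyond memorized word patterns.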
  • 43. Two types of data: synthetic and real
Synthetic data: synthesized using a context-free grammar (CFG) — controls ① patterns and ② vocabulary
Real data: sampled from existing GEC datasets — controls ① patterns only
• Investigate five standard error types defined by Bryant et al. (2017), all governed by grammatical rules:
• Subject-verb agreement errors (VERB:SVA)
• Verb form errors (VERB:FORM)
• Word order errors (WO)
• Morphological errors (MORPH)
• Noun number errors (NOUN:NUM)
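A toy version of the CFG-based synthesis for the VERB:SVA type can be sketched as follows. The grammar is our illustration, far smaller than the thesis's rule sets, which pair one rule generating grammatical sentences with one generating ungrammatical counterparts:

```python
# Tiny CFG-style generator: expand S -> Det Adj N V Adv over a closed
# vocabulary, emitting (ungrammatical, grammatical) SVA pairs.
import itertools

DETS = ["Every", "That"]
ADJS = ["polite", "white", "slimy"]
NOUNS = ["dog", "cow", "fox"]
VERBS = ["runs", "smiles", "jumps"]   # correct 3rd-person singular forms
ADVS = ["quickly", "awkwardly"]

def generate_pairs():
    """Yield (bad, good) pairs; the error strips the agreement -s from the verb."""
    for det, adj, n, v, adv in itertools.product(DETS, ADJS, NOUNS, VERBS, ADVS):
        bad = f"{det} {adj} {n} {v[:-1]} {adv}"    # "runs" -> "run"
        good = f"{det} {adj} {n} {v} {adv}"
        yield bad, good
```

Because the vocabulary is closed, which words appear at error sites in train vs. test can be controlled exactly, which is what real data cannot offer.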
  • 44. Examples of automatically constructed data
Synthetic data: sentences with limited vocabulary and syntax
Real data: sentences with a diversity of vocabulary and syntax
  • 45. Result: Synthetic data
• The model's performance drops significantly in the unknown setting compared to the known setting, except for WO
→ It lacks the generalization ability required to correct errors beyond the provided training examples
Table 2: Generalization performance for unseen errors. Each number represents an F0.5 score.
                    VERB:SVA  VERB:FORM    WO     MORPH   NOUN:NUM
Synthetic  Known      99.61     99.17    99.09    98.44     97.47
           Unknown    46.05     56.93    84.00    29.35     65.55
           (Δ)       -53.56    -42.24   -15.09   -69.09    -31.92
Real       Known      87.84     86.36    74.89    87.77     83.75
           Unknown     6.28      6.28     9.25     3.83     12.49
           (Δ)       -81.56    -80.08   -65.64   -83.94    -71.26
  • 46. Result: Real data
• The model's performance drops significantly on all error types
→ Generalization is more difficult in more practical settings where the vocabulary and syntax are diverse
Real data (F0.5; same table as the previous slide): Known 87.84 / 86.36 / 74.89 / 87.77 / 83.75 vs. Unknown 6.28 / 6.28 / 9.25 / 3.83 / 12.49 (VERB:SVA / VERB:FORM / WO / MORPH / NOUN:NUM)
  • 47. Detection vs. Correction
Q. Which factor is responsible for the failure to generalize grammatical knowledge?
1. An inability to detect errors
2. An inability to predict the correct words
[Bar chart: F0.5 per error type (VERB:SVA, VERB:FORM, WO, MORPH, NOUN:NUM) for Correction (known), Detection (unknown), and Correction (unknown)]
  • 48. Complexity in real data
We examined two contributing factors of complexity in real data: 1. error complexity; 2. sentence length
F0.5 by error complexity (noiseless = the target error is the only error; noisy = the sentence contains other errors besides the target error):
              noiseless   noisy
VERB:SVA         9.95      5.78
VERB:FORM       12.33      5.47
WO               7.89      9.35
MORPH            6.32      3.90
NOUN:NUM        24.16     12.49
• WO is robust to the complexity of input sentences: it depends on neither sentence length nor error complexity
→ This explains why the WO drop was relatively small compared to the other types, even with real data
  • 49. Can a few correction patterns improve model performance?
• Measured the performance change when the model is exposed to a few error correction patterns
• Adding even just one or two samples to the training data can significantly improve the model's performance
→ When building training data, it is important to include a few seen patterns for each word
  • 50. Summary of Chapter 5
Observations:
• A current standard Transformer-based GEC model fails to achieve grammatical generalization even in simple settings with limited vocabulary and syntax
Research Question and Contribution:
Q: How to build lightweight models that require fewer resources?
A: A combination of rule-based and DNN-based methods is necessary
→ Provides a research direction for implementing lightweight GEC models
Limitations:
• No concrete solutions are proposed based on these findings
  • 51. Grammatical Error Correction with Improved Real-world Applicability Overview: Evaluation, Data Noise, Low Resource (§1,§2 §3 §4 §5) u How to realize a reliable evaluation? → Cross-sectional evaluation (NAACL 2019, Journal of NLP 2021) u How to design a denoising method? → A self-refinement strategy (EMNLP 2020) u How to build lightweight models that require fewer resources? → Grammatical generalization ability (ACL 2021)
  • 52. Contributions
§3. How to realize a reliable evaluation?
❖ Demonstrated that the current single-corpus evaluation is not reliable and proposed cross-sectional evaluation as an alternative
→ Provides a more reliable evaluation foundation for GEC
§4. How to design a denoising method?
❖ Developed a simple but effective denoising method
→ Enables developing accurate GEC systems from low-quality data
§5. How to build lightweight models that require fewer resources?
❖ Showed that a combination of rule-based and DNN-based methods is necessary
→ Provides a research direction for implementing lightweight GEC models
  • 53. Summary of the thesis
• This thesis focuses on the three major issues that arise when applying GEC systems to the real world: Evaluation, Data Noise, and Low Resource
→ It will facilitate discussion of systems oriented toward real-world applicability, bridging the gaps between GEC research and real-world settings
[Diagram] Research (accuracy first, narrow domain, clean data) ⇄ Gaps ⇄ Real world (noisy data, wide range of domains, low resource)
  • 55. Journal Papers
1. Ryo Fujii, Masato Mita, Kaori Abe, Kazuaki Hanawa, Makoto Morishita, Jun Suzuki, Kentaro Inui. Phenomenon-wise Evaluation Dataset Towards Analyzing Robustness of Machine Translation Models (in Japanese). Journal of Natural Language Processing, Volume 28, Number 2, pp. 450-478.
2. Masato Mita, Tomoya Mizumoto, Masahiro Kaneko, Ryo Nagata, Kentaro Inui. Cross-Sectional Evaluation of Grammatical Error Correction Models (in Japanese). Journal of Natural Language Processing, Volume 28, Number 1, pp. 160-182, March 2021.
  • 56. International Conferences (Refereed) 1/3
1. Masato Mita, Hitomi Yanaka. Do Grammatical Error Correction Models Realize Grammatical Generalization?. In Findings of the Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2021) (to appear).
2. Takumi Gotou, Ryo Nagata, Masato Mita, Kazuaki Hanawa. Taking the Correction Difficulty into Account in Grammatical Error Correction Evaluation. In Proceedings of the 28th International Conference on Computational Linguistics (COLING 2020), pages 2085-2095, December 2020.
3. Ryo Fujii, Masato Mita, Kaori Abe, Kazuaki Hanawa, Makoto Morishita, Jun Suzuki, Kentaro Inui. PheMT: A Phenomenon-wise Dataset for Machine Translation Robustness on User-Generated Contents. In Proceedings of the 28th International Conference on Computational Linguistics (COLING 2020), December 2020.
4. Masato Mita, Shun Kiyono, Masahiro Kaneko, Jun Suzuki, Kentaro Inui. A Self-Refinement Strategy for Noise Reduction in Grammatical Error Correction. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 267-280, November 2020.
  • 57. International Conferences (Refereed) 2/3 57 1. Hiroaki Funayama, Shota Sasaki, Yuichiro Matsubayashi, Tomoya Mizumoto, Jun Suzuki, Masato Mita, Kentaro Inui. Preventing Critical Scoring Errors in Short Answer Scoring with Confidence Estimation. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop, pages 237-243, July 2020. 2. Masahiro Kaneko, Masato Mita, Shun Kiyono, Jun Suzuki, Kentaro Inui. Can Encoder- decoder Models Benefit from Pre-trained Language Representation in Grammatical Error Correction? In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020), pages 4248-4254, July 2020. 3. Masato Hagiwara and Masato Mita. GitHub Typo Corpus: A Large-Scale Multilingual Dataset of Misspellings and Grammatical Errors. In Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), pages 6761‒6768, May 2020. 4. Shun Kiyono, Jun Suzuki, Masato Mita, Tomoya Mizumoto, Kentaro Inui. An Empirical Study of Incorporating Pseudo Data to Grammatical Error Correction. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP 2019), pages 1236-1242, November 2019.
  • 58. International Conferences (Refereed) 3/3
1. Hiroki Asano, Masato Mita, Tomoya Mizumoto, Jun Suzuki. The AIP-Tohoku System at the BEA-2019 Shared Task. In Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications, pages 176-182, August 2019.
2. Masato Mita, Tomoya Mizumoto, Masahiro Kaneko, Ryo Nagata, Kentaro Inui. Cross-Corpora Evaluation and Analysis of Grammatical Error Correction Models: Is Single-Corpus Evaluation Enough?. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2019), pages 1309-1314, May 2019.
  • 59. Domestic Conferences (Not refereed) 1/3
1. Masato Mita, Masato Hagiwara, Keisuke Sakaguchi, Tomoya Mizumoto, Jun Suzuki, Kentaro Inui. Proposing an Essay Rewriting Task and Toward Its Automatic Evaluation (in Japanese). Workshop at the 27th Annual Meeting of the Association for Natural Language Processing, March 2021.
2. Masato Mita, Hitomi Yanaka. Do Grammatical Error Correction Models Learn the Grammar Needed for Correction? (in Japanese). 27th Annual Meeting of the Association for Natural Language Processing, March 2021.
3. Masato Mita, Hitomi Yanaka. Do Grammatical Error Correction Models Generalize Grammatical Knowledge? (in Japanese). 15th YANS Symposium, September 2020.
4. Yuta Matsumoto, Ryo Fujii, Kaori Abe, Hiroaki Funayama, Masato Mita. A Comparative Study of Multiple Neural Kanji-Creation Systems Considering the Semantic Structure of Kanji (in Japanese). 15th YANS Symposium, September 2020.
5. Ryo Fujii, Masato Mita, Kaori Abe, Kazuaki Hanawa, Makoto Morishita, Jun Suzuki, Kentaro Inui. A Systematic Analysis of Linguistic Phenomena toward High-Quality Machine Translation of User-Generated Content (in Japanese). 34th Annual Conference of the Japanese Society for Artificial Intelligence (JSAI), June 2020.
6. Hiroaki Funayama, Shota Sasaki, Tomoya Mizumoto, Masato Mita, Jun Suzuki, Yuichiro Matsubayashi, Kentaro Inui. A Study of Confidence Estimation Methods for Automated Short-Answer Scoring (in Japanese). 26th Annual Meeting of the Association for Natural Language Processing, March 2020.
  • 60. Domestic Conferences (Not refereed) 2/3
1. Takumi Gotou, Ryo Nagata, Masato Mita, Kazuaki Hanawa. An Evaluation Metric for Grammatical Error Correction Considering Correction Difficulty (in Japanese). 26th Annual Meeting of the Association for Natural Language Processing, March 2020.
2. Shun Kiyono, Jun Suzuki, Masato Mita, Tomoya Mizumoto, Kentaro Inui. Building High-Performance Grammatical Error Correction Models with Large-Scale Pseudo Data (in Japanese). 26th Annual Meeting of the Association for Natural Language Processing, March 2020. Outstanding Paper Award.
3. Masato Mita, Shun Kiyono, Masahiro Kaneko, Jun Suzuki, Kentaro Inui. Noise Reduction Based on a Self-Refinement Strategy for Grammatical Error Correction (in Japanese). 26th Annual Meeting of the Association for Natural Language Processing, March 2020. Young Researcher Encouragement Award.
4. Masato Mita, Masato Hagiwara, Keisuke Sakaguchi, Tomoya Mizumoto, Jun Suzuki, Kentaro Inui. Automated Essay Rewriting (AER): Grammatical Error Correction, Fluency Edits, and Beyond. 241st Meeting of the IPSJ Special Interest Group on Natural Language Processing (SIG-NL), August 2019.
5. Hiroaki Funayama, Shota Sasaki, Tomoya Mizumoto, Masato Mita, Jun Suzuki, Kentaro Inui. Confidence Estimation Methods for Automated Scoring (in Japanese). 14th YANS Symposium, August 2019.
6. Takumi Gotou, Ryo Nagata, Masato Mita, Kazuaki Hanawa, Tomoya Mizumoto. A Performance Evaluation Metric for Grammatical Error Correction Considering Problem Difficulty (in Japanese). 14th YANS Symposium, August 2019. Emerging Research Award.
  • 61. Domestic Conferences (Not refereed) 3/3
1. Ryo Fujii, Hiroaki Funayama, Kotaro Kitayama, Kaori Abe, Ana Brassard, Masato Mita, Hiroki Ouchi. A Neural Kanji Generation System Considering Radicals Using seq2seq (in Japanese). 14th YANS Symposium, August 2019.
2. Masahiro Kaneko, Masato Mita, Jun Suzuki, Kentaro Inui. Pseudo-Data Generation for Grammatical Error Correction Considering Collocation and Idiom Errors (in Japanese). 14th YANS Symposium, August 2019.
3. Ryo Fujii, Kaori Abe, Kazuaki Hanawa, Masato Mita, Jun Suzuki, Kentaro Inui. A Study of Adversarial Noise toward Machine Translation Systems Robust to Grammatical Errors (in Japanese). 14th YANS Symposium, August 2019.
4. Masato Mita, Masato Hagiwara, Keisuke Sakaguchi, Tomoya Mizumoto, Jun Suzuki, Kentaro Inui. Proposing a New Task Extending Grammatical Error Correction (in Japanese). 14th YANS Symposium, August 2019. Encouragement Award.
5. Masato Mita, Tomoya Mizumoto, Masahiro Kaneko, Ryo Nagata, Kentaro Inui. Cross-Corpora Evaluation of Grammatical Error Correction: Is a Single Corpus Enough? (in Japanese). 25th Annual Meeting of the Association for Natural Language Processing, March 2019.
6. Masato Mita, Tomoya Mizumoto, Hiroki Ouchi, Ryo Nagata, Kentaro Inui. An Unsupervised Interpretability Mechanism for Grammatical Error Correction (in Japanese). 13th YANS Symposium, August 2018.
  • 62. Awards
1. Young Researcher Encouragement Award, 26th Annual Meeting of the Association for Natural Language Processing, March 2020.
2. Outstanding Paper Award, 26th Annual Meeting of the Association for Natural Language Processing, March 2020.
3. Encouragement Award, 14th YANS Symposium, August 2019.
4. Emerging Research Award, 14th YANS Symposium, August 2019.