Grammatical Error Correction
with Improved Real-world Applicability
(Japanese title: Grammatical Error Correction Oriented toward Real-world Applicability)
Masato Mita
Inui Laboratory, Department of System Information Sciences, Graduate School of Information Sciences
Doctoral Thesis Defense
July 20, 2021 @ Online
Background
2
• Millions of people are learning English as a Second Language (ESL)
→ According to a report published by the British Council in 2013, English is spoken
at a useful level by 1.75 billion people worldwide
• Due to the difficulty of learning a new language, their written texts
may contain grammatical errors [Nagata et al.,2011; Dahlmeier et al.,2013]
e.g.) KJ corpus
Interest in automatic error correction
3
Commercial perspective:
• Great potential for many real-world applications, such as:
• Writing support tools that assist writers with their writing without
human intervention
• Education tools, since it can provide real-time feedback
Research perspective:
• Interesting and challenging language generation task
→ language modeling, syntax and semantics in noisy text
• Actively studied as Grammatical Error Correction (GEC) task
Grammatical Error Correction (GEC)
4
• A task of correcting different kinds of errors in text such as spelling,
punctuation, grammatical, and word choice errors
Machine is design to help people.
Machines are designed to help people.
Mainstream approaches:
• Encoder-Decoder model based on Deep Neural Networks (DNN):
Ø Machine translation (MT) task: an ungrammatical text → a grammatical text
✓ It can theoretically correct all error types without expert knowledge
✓ It allows cutting-edge neural MT models to be adopted
Systems achieved human-level performance…
5
From [Ge et al.,2018]
→ From a commercial perspective, three major issues in the current GEC:
1. Evaluation
2. Data Noise
3. Low Resource
Issue1: Evaluation
6
GEC community tends to evaluate systems on a particular corpus written
by relatively proficient learners (e.g., CoNLL-2014)
Research (GEC community):
CoNLL-2014 [Ng et al., 2014]
(Figure: systems X, Y, Z ranked by performance)
GEC systems are expected to be able to
robustly correct errors in any written text
Real-world scenarios:
(Figure: a GEC system takes input from writers at Basic, Independent, and Proficient levels)
Question:
Can we realize a reliable enough evaluation to be applied in real-world scenarios?
Issue2: Data Noise
7
We will [discuss about → discuss] this with you.
I want to [discuss about → discuss of] the education.
We [discuss about → discuss about] our sales target.
Inconsistent annotations in GEC corpus: [Lo et al., 2018]
Research (GEC community):
Little focus on verifying and ensuring:
✓ the quality of the datasets
✓ how lower-quality data might affect GEC performance
Real-world scenarios:
✓ Limited available data
✓ Not always possible to use high-quality data
Question:
Can a better GEC model be built by reducing noise in GEC corpora?
Issue3: Low Resource
8
Question:
How to build lightweight models requiring fewer resources?
Figure from [Kiyono+2019]
Current de facto standard: incorporating pseudo-data into GEC systems
→ Tends to require more resources to develop GEC systems (e.g.,
GPUs and training time)
Real-world perspective (checklist):
✓ Performance
✓ Low resources
✓ Inference speed
…etc.
Three issues and goal
9
1. Evaluation
◆ No reliable and robust evaluation methodologies
2. Data noise
◆ No data denoising methodologies
3. Low resource
◆ Increased resources required for model development
Underlying Motivation & Goal
• Provide the foundation and research direction for GEC with Improved
Real-world Applicability
• Contribute to making GEC research more meaningful in real-world
scenarios
Grammatical Error Correction
with Improved Real-world Applicability
Overview
10
Evaluation Data Noise Low Resource
§1,§2
§3 §4 §5
◆ How to realize a reliable evaluation?
→ Cross-sectional evaluation
(NAACL 2019, Journal of NLP 2021)
◆ How to design a denoising method?
→ A self-refinement strategy
(EMNLP 2020)
◆ How to build lightweight models requiring fewer resources?
→ Grammatical generalization ability
(ACL 2021)
Background: Evaluation
11
• Most previous works conduct evaluation on CoNLL-2014
• Recently, more and more works have additionally used JFLEG,
but (customarily) evaluate it independently with different metrics
Essays written by students at the
National University of Singapore
In real-world scenarios
12
• Real-world applications assume a wide variety of writing as input
• The difficulty varies under different conditions
Proficient
Independent
Basic
GEC system
GEC systems are expected to be able to
robustly correct errors in any written text
Error tendencies vary depending on
the learner's proficiency level
e.g.) proficiency
Chapter3: Cross-sectional Evaluation of GEC Models
(NAACL 2019, Journal of NLP 2021)
13
What we did in this chapter:
1. Check if the current evaluation is reliable (NAACL 2019)
2. Explore an evaluation methodology with improved real-world applicability
(Journal of NLP 2021)
Chapter3: Cross-sectional Evaluation of GEC Models
(NAACL 2019, Journal of NLP 2021)
14
What we did in this chapter:
1. Check if the current evaluation is reliable (NAACL 2019)
2. Explore an evaluation methodology with improved real-world applicability
(Journal of NLP 2021)
Current benchmark
Are there variations in
the evaluation results?
CoNLL-2014 [Ng et al., 2014]
(Figure: rankings of systems X, Y, Z compared across Corpus A, Corpus B, and Corpus C)
GEC systems
15
Requirements:
• The systems must be based on machine translation
• Each system must be implemented to achieve competitive performance
on CoNLL-2014
• LSTM: LSTM based system [Luong et al., 2015]
• CNN: CNN based system [Chollampatt et al., 2017]
• Transformer: Transformer based system [Vaswani et al., 2017]
• SMT: Statistical Machine Translation based system [Junczys-Dowmunt et al., 2017]
Cross-corpora Evaluation (NAACL 2019)
16
• Systems' rankings vary considerably depending on the corpus
→ Single-corpus evaluation is not reliable for GEC
Analysis
17
• Performance evaluation by error type (CoNLL-2014)
Determiner
e.g. [this → these]
Preposition
e.g. [for → with]
Punctuation
e.g. [. Because → , because]
Verb
e.g. [grow → bring]
Noun Number
e.g. [cat → cats]
Verb Tense
e.g. [eat → has eaten]
→ Each system has different strengths and weaknesses
Analysis
18
• Performance evaluation by error type (cross-corpora)
• The best-performing models for each error type in each corpus
Error type   CoNLL-2014    CoNLL-2013    FCE           JFLEG         KJ      BEA-2019
Det.         LSTM          LSTM          LSTM          SMT           CNN     LSTM
Prep.        SMT           Transformer   SMT           Transformer   LSTM    Transformer
Punct.       Transformer   Transformer   Transformer   SMT           LSTM    SMT
Verb         LSTM          CNN           SMT           LSTM          LSTM    Transformer
Noun Num.    LSTM          Transformer   CNN           LSTM          CNN     LSTM
Verb Form    Transformer   Transformer   Transformer   LSTM          CNN     Transformer
→ Each corpus has different error tendencies
Cross-sectional Evaluation (Journal of NLP 2021)
19
Ideas:
• The evaluation segment need not be a whole corpus
→ Cross-sectional evaluation
✓ Makes it possible to investigate model behavior more precisely
along the evaluation segments (perspectives) we want to focus on
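The idea above can be sketched in a few lines: group the evaluation data by any metadata field (proficiency, error type, etc.) and score each segment separately instead of reporting a single corpus-level number. The field names and the `score_fn` below are illustrative, not the thesis's actual implementation.

```python
from collections import defaultdict

def cross_sectional_eval(examples, score_fn, segment_key):
    """Score a GEC system per evaluation segment instead of per corpus.

    examples: dicts with 'hypothesis', 'reference' and metadata fields
              (e.g., 'proficiency': 'A' / 'B' / 'C' / 'N').
    score_fn: corpus-level metric over parallel hypothesis/reference lists.
    segment_key: metadata field defining the segments (the "perspective").
    """
    segments = defaultdict(list)
    for ex in examples:
        segments[ex[segment_key]].append(ex)
    # One score per segment instead of one corpus-level score
    return {seg: score_fn([e["hypothesis"] for e in exs],
                          [e["reference"] for e in exs])
            for seg, exs in segments.items()}
```

With `segment_key="proficiency"` this yields exactly the cross-proficiency evaluation used later; any other metadata field gives another cross-sectional view of the same data.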
Proficiency-wise dataset: BEA-2019
20
BEA-2019 contains CEFR-compliant proficiency information for writers
Basic (A1, A2) · Independent (B1, B2) · Proficient (C1, C2)
CEFR: Common European Framework of Reference for Languages
※ N: Native
※ WER: Word Edit Rate
As proficiency increases: average sentence length ⬆, word edit rate ⬇, vocabulary size ⬆
Result: Cross-proficiency evaluation
21
• At the basic-intermediate levels (A, B), the performance of Transformer
is higher than that of the others
basic-intermediate level
Result: Cross-proficiency evaluation
22
• In the advanced level (C,N), SMT achieved the highest performance
advanced level
Summary of Chapter 3
23
Observations
• The system rankings considerably vary depending on the corpus
→ Current single corpus evaluation is not reliable
• A large divergence in the evaluation between the basic-intermediate and advanced
levels of writer's proficiency
Research Question and Contribution:
Q: How to realize a reliable evaluation?
A: Evaluation from multiple perspectives by appropriately segmenting the data according
to the purpose (e.g., cross-proficiency evaluation)
→ Provides a more reliable evaluation foundation for GEC
Limitations (Future work):
• Detailed factor analysis of ranking changes
• New metrics appropriate for cross-sectional evaluation
Grammatical Error Correction
with Improved Real-world Applicability
Overview
24
Evaluation Data Noise Low Resource
§1,§2
§3 §4 §5
◆ How to realize a reliable evaluation?
→ Cross-sectional evaluation
(NAACL 2019, Journal of NLP 2021)
◆ How to design a denoising method?
→ A self-refinement strategy
(EMNLP 2020)
◆ How to build lightweight models requiring fewer resources?
→ Grammatical generalization ability
(ACL 2021)
Background
25
• Manually created GEC data has implicitly been treated as the cleanest data available
→ the data are usually built manually by experts
e.g.) KJ Corpus [Nagata et al.,2011]
Now, I live <prp crr=“in”></prp> my home alone.
Original: Now, I live my home alone .
Corrected: Now, I live in my home alone .
Issues and motivation
26
Lo et al. (2018)ʼs report:
• A GEC model trained on EFCamDat [Geertzen et al.,2013], the largest publicly available learner
corpus as of today (2M sent pairs), was outperformed by a model trained on a smaller
dataset (720K sent pairs)
• This may be due to the “inconsistent annotations”
We will [discuss about → discuss] this with you.
I want to [discuss about → discuss of] the education.
We [discuss about → discuss about] our sales target.
Motivation:
In real-world scenarios, it may not always be possible to use high-quality data
→ Need to develop training strategies for low-quality data without sacrificing performance
Chapter 4: A Self-refinement Strategy for Noise Reduction
(EMNLP 2020)
27
What we did in this chapter:
1. Reveal the amount of noise in existing GEC data
2. Propose a data denoising method which improves GEC performance
3. Analyze how the method affects both performance and the data itself
Presence of noise in GEC data
28
1. For 300 target sentences (Y) from each dataset, one expert reviewed
them and we obtained denoised ones (Yʼ)
2. Calculated the averaged Levenshtein distance between the original target
sentences (Y) and the denoised target sentences (Yʼ)
(Chart: % noise per dataset. BEA-train: 37.1, EF: 42.1, Lang-8: 34.6)
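The measurement procedure above can be sketched as follows. The edit-distance part is standard; how the raw distance is normalized into a percentage is an assumption here (per-sentence length normalization), not necessarily the exact protocol used.

```python
def levenshtein(a, b):
    # Classic DP edit distance (insertions, deletions, substitutions; cost 1 each)
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def avg_noise_pct(targets, denoised):
    """Average noise between original targets Y and expert-denoised Y',
    expressed as a percentage of the original length (assumed normalization)."""
    rates = [levenshtein(y, y2) / max(len(y), 1) * 100
             for y, y2 in zip(targets, denoised)]
    return sum(rates) / len(rates)
```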
Filtering ?
29
• A straightforward solution is to apply a filtering approach
→ Noisy data are filtered out and a smaller subset of high-quality sentence pairs is
retained (cf. MT)
Filtering
Intuition: Filtering approaches may not be the best choice for GEC:
1. GEC is a low-resource task compared to MT, thus further reducing data size by
filtering may be critically ineffective;
2. Even noisy instances may still be useful for training since they might contain
some correct edits as well
We will discuss about this with you → We will discuss this with you (corrected)
We discuss about our sales target → We discuss about our sales target (left uncorrected)
I need to discuss about the education → I need to discuss of the education (mis-corrected)
Proposed method: Self-refinement
30
Key Ideas:
Denoising datasets by leveraging the prediction consistency of existing models
Correction (Human):
We will discuss about this with you → We will discuss this with you
We discuss about our sales target → We discuss about our sales target
I need to discuss about the education → I need to discuss of the education
Re-correction (Model):
We will discuss about this with you → We will discuss this with you
We discuss about our sales target → We discuss our sales target
I need to discuss about the education → I need to discuss the education
Self-refinement: Algorithm
31
Noisy parallel data: D̃ = (X, Y)
Denoised parallel data: D̂ = {}
All trainable parameters: θ
① Train a base model θ on D̃
② Apply the base model to X and obtain system outputs Y'
③ Selection (fail-safe mechanism using a language model):
Ŷ = Y'  if PPL(Y) - PPL(Y') ≥ τ
Ŷ = Y   if PPL(Y) - PPL(Y') < τ
④ Add (X, Ŷ) to D̂
⑤ Train a denoised new model θ̂ on D̂
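The selection step ③ can be sketched as below. `ppl` stands in for the language model's perplexity function, `correct` for the trained base model, and `tau` for the fail-safe threshold; the function names and the toy usage in the test are illustrative, not the paper's code.

```python
def select(y_orig, y_pred, ppl, tau):
    # Fail-safe: adopt the model's re-correction only when the LM judges it
    # at least tau better (lower perplexity) than the original annotation.
    return y_pred if ppl(y_orig) - ppl(y_pred) >= tau else y_orig

def refine_dataset(pairs, correct, ppl, tau):
    # One self-refinement pass: re-correct each source x with the base
    # model, then apply the per-sentence fail-safe selection to the target.
    return [(x, select(y, correct(x), ppl, tau)) for x, y in pairs]
```

Because the original target is kept whenever the language model is not clearly in favor of the re-correction, the method never shrinks the dataset, unlike filtering.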
Result
32
• Significantly improved performance
across all training/test sets
Result
33
• Filtering approaches can be useful for
corpora with large data size
Result
34
• Not useful with small data
→ Suggests the possibility of excluding
even instances that were partially useful
for training the model
Precision vs. Recall
35
• Recall significantly increased, while precision was mostly maintained
→ Due to the correction of "inconsistent annotations”
Analysis: Noise reduction
36
• Manually evaluated 500 triples of source
sentences (X), original target sentences (Y), and
generated target sentences (Yʼ)
→ 73.6% of the replaced samples were determined to
be appropriate corrections, including cases where
both were correct
Summary of Chapter 4
37
Observations
• A non-negligible amount of noise in the most commonly used training data for GEC
• Significantly improved performance by removing noise
Research Question and Contribution:
Q: How to design a denoising method?
A: Developed a simple but effective denoising method based on a self-refinement strategy
→ Enables developing accurate GEC systems from low-quality data
Limitations:
• Boundary conditions under which noise reduction works effectively are unclear
Grammatical Error Correction
with Improved Real-world Applicability
Overview
38
Evaluation Data Noise Low Resource
§1,§2
§3 §4 §5
◆ How to realize a reliable evaluation?
→ Cross-sectional evaluation
(NAACL 2019, Journal of NLP 2021)
◆ How to design a denoising method?
→ A self-refinement strategy
(EMNLP 2020)
◆ How to build lightweight models requiring fewer resources?
→ Grammatical generalization ability
(ACL 2021)
Issues: Larger data, bigger model
39
• Pseudo-data generation is popular
− Generate pseudo-errors from grammatical sentence sets (e.g., Wikipedia)
• Increased training data
− Increased resources required for model development (GPUs, training time, etc.)
− Need to add about 60 million samples of pseudo-data to improve a standard
measure of GEC, F0.5 score, by only two points [Kiyono et al.,2019]
Figure from [Kiyono et al.,2019]
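For reference, the F0.5 measure mentioned above is the general F-beta formula with beta = 0.5, which weights precision twice as heavily as recall. A minimal sketch from edit-level true positives, false positives, and false negatives:

```python
def f_beta(tp, fp, fn, beta=0.5):
    # F0.5, the standard GEC measure: precision is weighted over recall
    p = tp / (tp + fp) if tp + fp else 0.0   # precision
    r = tp / (tp + fn) if tp + fn else 0.0   # recall
    if p == 0.0 or r == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * p * r / (b2 * p + r)
```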
Research Question
40
Two types of errors covered by GEC:
Type 1: Errors not based on a grammatical rule (e.g., collocation)
I listen [in → to] his speech carefully
Type 2: Errors based on a grammatical rule (e.g., subject-verb agreement)
Every dog [run → runs] quickly
Intuition:
No need to memorize individual patterns if the rules have been learned
Q. Do GEC models realize grammatical generalization?
A. Yes → No need for large amounts of data (at least for Type 2)
A. No → Need to incorporate grammatical knowledge as rules into the models
Chapter5: Do GEC Models Realize Grammatical Generalization?
(ACL 2021)
41
What we did in this chapter:
1. Introduce an analysis method to evaluate whether models can generalize
to unseen errors
2. Add new depth to the study of GEC beyond just improving the scores
Proposed method
42
• Automatically build datasets that control which vocabulary appears in error
locations in the training and test sets
• Compare performance when correcting previously seen error correction
patterns (Known setting) with correcting unseen patterns of the same error type
(Unknown setting)
Train:
Every dog *run / runs quickly
That slimy duck *smile / smiles awkwardly
Some slimy cows *smiles / smile dramatically
Test 1 (Known setting): Every polite cow *smile / smiles awkwardly
Test 2 (Unknown setting): Every white fox *run / runs quickly
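A minimal sketch of the vocabulary-controlled split behind the two settings: hold out half of the words that can appear in error locations, so the Unknown test set shares the error type but none of the error-correction patterns seen in training. Function and variable names are my own, not the paper's.

```python
import random

def known_unknown_split(patterns, seed=0):
    """Split error-correction patterns into training vocabulary and a
    held-out vocabulary, so the Unknown test set contains patterns never
    seen in training.

    patterns: dict mapping a word to its (error, correction) pair,
              e.g. {"run": ("run", "runs")}.
    """
    words = sorted(patterns)
    rng = random.Random(seed)   # deterministic split
    rng.shuffle(words)
    half = len(words) // 2
    train_vocab, unseen_vocab = set(words[:half]), set(words[half:])
    train = {w: patterns[w] for w in train_vocab}
    known_test = {w: patterns[w] for w in train_vocab}     # same patterns as training
    unknown_test = {w: patterns[w] for w in unseen_vocab}  # same error type, new words
    return train, known_test, unknown_test
```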
Two types of data: synthetic and real data
43
                          Synthetic data                      Real data
Method                    Synthesizing using a context-free   Sampling from existing GEC datasets
                          grammar (CFG)
① control of patterns     ✔                                   ✔
② control of vocabulary   ✔
• Investigate the five standard error types defined by Bryant et al.
(2017), which are errors based on grammatical rules:
• Subject-verb agreement errors (VERB:SVA)
• Verb form errors (VERB:FORM)
• Word order errors (WO)
• Morphological errors (MORPH)
• Noun number errors (NOUN:NUM)
Examples of automatically constructed data
44
Synthetic data: Sentences with limited vocabulary and syntax
Real data: Sentences with diverse vocabulary and syntax
Result: Synthetic data
45
• The model's performance drops significantly in the unknown setting
compared to the known setting, except for WO
→ It lacks the generalization ability required to correct errors from
provided training examples
Dataset             VERB:SVA   VERB:FORM   WO       MORPH    NOUN:NUM
Synthetic Known       99.61      99.17     99.09    98.44     97.47
Synthetic Unknown     46.05      56.93     84.00    29.35     65.55
Δ                    -53.56     -42.24    -15.09   -69.09    -31.92
Real Known            87.84      86.36     74.89    87.77     83.75
Real Unknown           6.28       6.28      9.25     3.83     12.49
Δ                    -81.56     -80.08    -65.64   -83.94    -71.26
Table 2: Generalization performance for unseen errors. Each number represents an F0.5 score.
Result: Real data
46
The model's performance drops significantly on all errors
→ Generalization is more difficult in more practical settings where the
vocabulary and syntax are diverse
Dataset             VERB:SVA   VERB:FORM   WO       MORPH    NOUN:NUM
Synthetic Known       99.61      99.17     99.09    98.44     97.47
Synthetic Unknown     46.05      56.93     84.00    29.35     65.55
Δ                    -53.56     -42.24    -15.09   -69.09    -31.92
Real Known            87.84      86.36     74.89    87.77     83.75
Real Unknown           6.28       6.28      9.25     3.83     12.49
Δ                    -81.56     -80.08    -65.64   -83.94    -71.26
Detection vs. Correction
47
Q. Which factor is responsible for the failure to generalize grammatical knowledge?
1. An inability to detect errors
2. An inability to predict the correct words
(Chart: F0.5 by error type (VERB:SVA, VERB:FORM, WO, MORPH, NOUN:NUM) for Correction (known), Detection (unknown), and Correction (unknown))
Complexity in real data
48
              noiseless   noisy
VERB:SVA         9.95      5.78
VERB:FORM       12.33      5.47
WO               7.89      9.35
MORPH            6.32      3.90
NOUN:NUM        24.16     12.49
We observed the effect of two contributing factors of complexity in real data:
1. Error complexity
2. Sentence length
• WO is robust to the complexity of input sentences
→ This is why WO's drop was relatively small compared to the other types, even on real data
(Figure: WO does not depend on sentence length or on error complexity. In the table, "noiseless" means the target error is the only error, and "noisy" means the sentence contains other errors besides the target error.)
Can a few correction patterns improve model performance?
49
• Performance change when we expose the model to a few error
correction patterns
• Adding even just one or two samples to the training data can
significantly improve the modelʼs performance
→ Important to include a few seen patterns for each word when building training data
Summary of Chapter 5
50
Observations:
• A current standard Transformer-based GEC model fails to realize grammatical
generalization even in simple settings with limited vocabulary and syntax
Research Question and Contribution:
Q: How to build lightweight models requiring fewer resources?
A: A combination of rule-based and DNN-based methods is necessary
→ Provides a research direction for implementing lightweight GEC models
Limitations:
• No real solutions based on our findings
Grammatical Error Correction
with Improved Real-world Applicability
Overview
51
Evaluation Data Noise Low Resource
§1,§2
§3 §4 §5
◆ How to realize a reliable evaluation?
→ Cross-sectional evaluation
(NAACL 2019, Journal of NLP 2021)
◆ How to design a denoising method?
→ A self-refinement strategy
(EMNLP 2020)
◆ How to build lightweight models requiring fewer resources?
→ Grammatical generalization ability
(ACL 2021)
Contributions
52
§3. How to realize a reliable evaluation?
❖ Demonstrated that current single-corpus evaluation is not reliable and
proposed cross-sectional evaluation as an alternative
→ Provides a more reliable evaluation foundation for GEC
§4. How to design a denoising method?
❖ Developed a simple but effective denoising method
→ Enables developing accurate GEC systems from low-quality data
§5. How to build lightweight models requiring fewer resources?
❖ Showed that a combination of rule-based and DNN-based methods is
necessary
→ Provides a research direction for implementing lightweight GEC models
Summary of the thesis
53
• This thesis focuses on the three major issues that arise when trying to
apply GEC systems to the real world: Evaluation, Data Noise, and Low
Resource
→ It will facilitate discussions on systems oriented toward real-world
applicability, bridging the gaps between GEC research and real-world
settings
(Figure: gaps between Research (accuracy first, narrow domain, clean data) and the Real world (noisy data, wide range of domains, low resource))
Appendix:
List of Publications/Presentations
54
Journal Papers
55
1. Ryo Fujii, Masato Mita, Kaori Abe, Kazuaki Hanawa, Makoto Morishita, Jun Suzuki, Kentaro Inui.
Phenomenon-wise Evaluation Dataset Towards Analyzing Robustness of Machine
Translation Models. (in Japanese). In Journal of Natural Language Processing, Volume 28,
Number 2, pp. 450-478.
2. Masato Mita, Tomoya Mizumoto, Masahiro Kaneko, Ryo Nagata, Kentaro Inui. Cross-
Sectional Evaluation of Grammatical Error Correction Models. (in Japanese). In Journal of
Natural Language Processing, Volume 28, Number 1, pp.160-182, March 2021.
International Conferences (Refereed) 1/3
56
1. Masato Mita, Hitomi Yanaka. Do Grammatical Error Correction Models Realize Grammatical
Generalization?. In Findings of the Joint Conference of the 59th Annual Meeting of the
Association for Computational Linguistics and the 11th International Joint Conference on
Natural Language Processing (ACL-IJCNLP 2021) (To appear).
2. Takumi Gotou, Ryo Nagata, Masato Mita, Kazuaki Hanawa. Taking the Correction Difficulty
into Account in Grammatical Error Correction Evaluation. In Proceedings of the 28th
International Conference on Computational Linguistics (COLING 2020), pages 2085-2095,
December 2020.
3. Ryo Fujii, Masato Mita, Kaori Abe, Kazuaki Hanawa, Makoto Morishita, Jun Suzuki, Kentaro Inui.
PheMT: A Phenomenon-wise Dataset for Machine Translation Robustness on User-
Generated Contents. In Proceedings of the 28th International Conference on Computational
Linguistics (COLING 2020), pages 2085-2095, December 2020.
4. Masato Mita, Shun Kiyono, Masahiro Kaneko, Jun Suzuki, Kentaro Inui. A Self-Refinement
Strategy for Noise Reduction in Grammatical Error Correction. In Findings of the 2020
Conference on Empirical Methods in Natural Language Processing (EMNLP 2020), pp.267‒280,
November 2020.
International Conferences (Refereed) 2/3
57
1. Hiroaki Funayama, Shota Sasaki, Yuichiro Matsubayashi, Tomoya Mizumoto, Jun Suzuki,
Masato Mita, Kentaro Inui. Preventing Critical Scoring Errors in Short Answer Scoring with
Confidence Estimation. In Proceedings of the 58th Annual Meeting of the Association for
Computational Linguistics: Student Research Workshop, pages 237-243, July 2020.
2. Masahiro Kaneko, Masato Mita, Shun Kiyono, Jun Suzuki, Kentaro Inui. Can Encoder- decoder
Models Benefit from Pre-trained Language Representation in Grammatical Error
Correction? In Proceedings of the 58th Annual Meeting of the Association for Computational
Linguistics (ACL 2020), pages 4248-4254, July 2020.
3. Masato Hagiwara and Masato Mita. GitHub Typo Corpus: A Large-Scale Multilingual
Dataset of Misspellings and Grammatical Errors. In Proceedings of the 12th Conference on
Language Resources and Evaluation (LREC 2020), pages 6761‒6768, May 2020.
4. Shun Kiyono, Jun Suzuki, Masato Mita, Tomoya Mizumoto, Kentaro Inui. An Empirical Study of
Incorporating Pseudo Data to Grammatical Error Correction. In Proceedings of the 2019
Conference on Empirical Methods in Natural Language Processing and the 9th International
Joint Conference on Natural Language Processing (EMNLP-IJCNLP 2019), pages 1236-1242,
November 2019.
International Conferences (Refereed) 3/3
58
1. Hiroki Asano, Masato Mita, Tomoya Mizumoto, Jun Suzuki. The AIP-Tohoku System at the
BEA-2019 Shared Task. In Proceedings of the Fourteenth Workshop on Innovative Use of NLP
for Building Educational Applications, pages 176-182, August 2019.
2. Masato Mita, Tomoya Mizumoto, Masahiro Kaneko, Ryo Nagata, Kentaro Inui. Cross-Corpora
Evaluation and Analysis of Grammatical Error Correction Models: Is Single-Corpus
Evaluation Enough?. In Proceedings of the 17th Annual Conference of the North American
Chapter of the Association for Computational Linguistics: Human Language Technologies
(NAACL-HLT), pages 1309-1314, May 2019.
Domestic conference (Not refereed) 1/3
59
1. Masato Mita, Masato Hagiwara, Keisuke Sakaguchi, Tomoya Mizumoto, Jun Suzuki, Kentaro Inui. Toward a Proposal of an Essay Rewriting Task and Its Automatic Evaluation. (in Japanese). Workshop at the 27th Annual Meeting of the Association for Natural Language Processing, March 2021.
2. Masato Mita, Hitomi Yanaka. Do Grammatical Error Correction Models Learn the Grammar Required for Correction? (in Japanese). The 27th Annual Meeting of the Association for Natural Language Processing, March 2021.
3. Masato Mita, Hitomi Yanaka. Do Grammatical Error Correction Models Generalize Grammatical Knowledge? (in Japanese). The 15th YANS Symposium, September 2020.
4. Yuta Matsumoto, Ryo Fujii, Kaori Abe, Hiroaki Funayama, Masato Mita. A Comparative Study of Neural Kanji Creation Systems Considering the Semantic Structure of Kanji. (in Japanese). The 15th YANS Symposium, September 2020.
5. Ryo Fujii, Masato Mita, Kaori Abe, Kazuaki Hanawa, Makoto Morishita, Jun Suzuki, Kentaro Inui. A Systematic Analysis of Linguistic Phenomena toward High-quality Machine Translation of User-generated Content. (in Japanese). The 34th Annual Conference of the Japanese Society for Artificial Intelligence, June 2020.
6. Hiroaki Funayama, Shota Sasaki, Tomoya Mizumoto, Masato Mita, Jun Suzuki, Yuichiro Matsubayashi, Kentaro Inui. A Study of Confidence Estimation Methods for Automated Scoring of Written Answers. (in Japanese). The 26th Annual Meeting of the Association for Natural Language Processing, March 2020.
Domestic conference (Not refereed) 2/3
60
1. Takumi Gotou, Ryo Nagata, Masato Mita, Kazuaki Hanawa. An Evaluation Metric for Grammatical Error Correction Considering Correction Difficulty. (in Japanese). The 26th Annual Meeting of the Association for Natural Language Processing, March 2020.
2. Shun Kiyono, Jun Suzuki, Masato Mita, Tomoya Mizumoto, Kentaro Inui. Building High-performance Grammatical Error Correction Models Using Large-scale Pseudo Data. (in Japanese). The 26th Annual Meeting of the Association for Natural Language Processing, March 2020. Outstanding Paper Award.
3. Masato Mita, Shun Kiyono, Masahiro Kaneko, Jun Suzuki, Kentaro Inui. Noise Reduction Based on a Self-refinement Strategy for Grammatical Error Correction. (in Japanese). The 26th Annual Meeting of the Association for Natural Language Processing, March 2020. Young Researcher Encouragement Award.
4. Masato Mita, Masato Hagiwara, Keisuke Sakaguchi, Tomoya Mizumoto, Jun Suzuki, Kentaro Inui. Automated Essay Rewriting (AER): Grammatical Error Correction, Fluency Edits, and Beyond. The 241st Meeting of IPSJ SIG Natural Language Processing, August 2019.
5. Hiroaki Funayama, Shota Sasaki, Tomoya Mizumoto, Masato Mita, Jun Suzuki, Kentaro Inui. Confidence Estimation Methods for Automated Scoring. (in Japanese). The 14th YANS Symposium, August 2019.
6. Takumi Gotou, Ryo Nagata, Masato Mita, Kazuaki Hanawa, Tomoya Mizumoto. A Proposal of an Evaluation Metric for Grammatical Error Correction Considering Problem Difficulty. (in Japanese). The 14th YANS Symposium, August 2019. Emerging Research Award.
Domestic conference (Not refereed) 3/3
61
1. Ryo Fujii, Hiroaki Funayama, Kotaro Kitayama, Kaori Abe, Ana Brassard, Masato Mita, Hiroki Ouchi. A Radical-aware Neural Kanji Generation System Based on seq2seq. (in Japanese). The 14th YANS Symposium, August 2019.
2. Masahiro Kaneko, Masato Mita, Jun Suzuki, Kentaro Inui. Pseudo-data Generation for Grammatical Error Correction Considering Collocation and Idiom Errors. (in Japanese). The 14th YANS Symposium, August 2019.
3. Ryo Fujii, Kaori Abe, Kazuaki Hanawa, Masato Mita, Jun Suzuki, Kentaro Inui. A Study of Adversarial Noise toward Machine Translation Systems Robust to Grammatical Errors. (in Japanese). The 14th YANS Symposium, August 2019.
4. Masato Mita, Masato Hagiwara, Keisuke Sakaguchi, Tomoya Mizumoto, Jun Suzuki, Kentaro Inui. A Proposal of a New Task Extending Grammatical Error Correction. (in Japanese). The 14th YANS Symposium, August 2019. Encouragement Award.
5. Masato Mita, Tomoya Mizumoto, Masahiro Kaneko, Ryo Nagata, Kentaro Inui. Cross-corpora Evaluation of Grammatical Error Correction: Is a Single Corpus Enough? (in Japanese). The 25th Annual Meeting of the Association for Natural Language Processing, March 2019.
6. Masato Mita, Tomoya Mizumoto, Hiroki Ouchi, Ryo Nagata, Kentaro Inui. An Unsupervised Interpretability Mechanism for Grammatical Error Correction. (in Japanese). The 13th YANS Symposium, August 2018.
Awards
62
1. Young Researcher Encouragement Award, the 26th Annual Meeting of the Association for Natural Language Processing, March 2020.
2. Outstanding Paper Award, the 26th Annual Meeting of the Association for Natural Language Processing, March 2020.
3. Encouragement Award, the 14th YANS Symposium, August 2019.
4. Emerging Research Award, the 14th YANS Symposium, August 2019.
Critiquing CS Assessment from a CS for All lens: Dagstuhl Seminar Poster
Mark Guzdial
 
Deep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & OpportunitiesDeep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & Opportunities
Matthew Lease
 
IRJET- Survey on Deep Learning Approaches for Phrase Structure Identification...
IRJET- Survey on Deep Learning Approaches for Phrase Structure Identification...IRJET- Survey on Deep Learning Approaches for Phrase Structure Identification...
IRJET- Survey on Deep Learning Approaches for Phrase Structure Identification...
IRJET Journal
 
Re2018 Semios for Requirements
Re2018 Semios for RequirementsRe2018 Semios for Requirements
Re2018 Semios for Requirements
Clément Portet
 
Tomáš Mikolov - Distributed Representations for NLP
Tomáš Mikolov - Distributed Representations for NLPTomáš Mikolov - Distributed Representations for NLP
Tomáš Mikolov - Distributed Representations for NLP
Machine Learning Prague
 
Dealing with Data Scarcity in Natural Language Processing - Belgium NLP Meetup
Dealing with Data Scarcity in Natural Language Processing - Belgium NLP MeetupDealing with Data Scarcity in Natural Language Processing - Belgium NLP Meetup
Dealing with Data Scarcity in Natural Language Processing - Belgium NLP Meetup
Yves Peirsman
 
Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"
Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"
Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"
Fwdays
 
ICML UDL Evaluating Deep Learning Models Applications to NLP Nazneen Rajani.pdf
ICML UDL  Evaluating Deep Learning Models Applications to NLP Nazneen Rajani.pdfICML UDL  Evaluating Deep Learning Models Applications to NLP Nazneen Rajani.pdf
ICML UDL Evaluating Deep Learning Models Applications to NLP Nazneen Rajani.pdf
ManojAcharya52
 
IRJET- Querying Database using Natural Language Interface
IRJET-  	  Querying Database using Natural Language InterfaceIRJET-  	  Querying Database using Natural Language Interface
IRJET- Querying Database using Natural Language Interface
IRJET Journal
 
An exploratory research on grammar checking of Bangla sentences using statist...
An exploratory research on grammar checking of Bangla sentences using statist...An exploratory research on grammar checking of Bangla sentences using statist...
An exploratory research on grammar checking of Bangla sentences using statist...
IJECEIAES
 
Nlp 2020 global ai conf -jeff_shomaker_final
Nlp 2020 global ai conf -jeff_shomaker_finalNlp 2020 global ai conf -jeff_shomaker_final
Nlp 2020 global ai conf -jeff_shomaker_final
Jeffrey Shomaker
 
How can text-mining leverage developments in Deep Learning? Presentation at ...
How can text-mining leverage developments in Deep Learning?  Presentation at ...How can text-mining leverage developments in Deep Learning?  Presentation at ...
How can text-mining leverage developments in Deep Learning? Presentation at ...
jcscholtes
 
Determining the Credibility of Science Communication
Determining the Credibility of Science CommunicationDetermining the Credibility of Science Communication
Determining the Credibility of Science Communication
Isabelle Augenstein
 
Learning context is all you need for task general artificial intelligence
Learning context is all you need for task general artificial intelligenceLearning context is all you need for task general artificial intelligence
Learning context is all you need for task general artificial intelligence
LibgirlTeam
 
The Value and Benefits of Data-to-Text Technologies
The Value and Benefits of Data-to-Text TechnologiesThe Value and Benefits of Data-to-Text Technologies
The Value and Benefits of Data-to-Text Technologies
International Journal of Modern Research in Engineering and Technology
 

Similar to Grammatical Error Correction with Improved Real-world Applicability (20)

NLG, Training, Inference & Evaluation
NLG, Training, Inference & Evaluation NLG, Training, Inference & Evaluation
NLG, Training, Inference & Evaluation
 
Review On In-Context Leaning.pptx
Review On In-Context Leaning.pptxReview On In-Context Leaning.pptx
Review On In-Context Leaning.pptx
 
Nlp research presentation
Nlp research presentationNlp research presentation
Nlp research presentation
 
Model-Driven Spreadsheet Development
Model-Driven Spreadsheet DevelopmentModel-Driven Spreadsheet Development
Model-Driven Spreadsheet Development
 
Landscape of AI/ML in 2023
Landscape of AI/ML in 2023Landscape of AI/ML in 2023
Landscape of AI/ML in 2023
 
Critiquing CS Assessment from a CS for All lens: Dagstuhl Seminar Poster
Critiquing CS Assessment from a CS for All lens: Dagstuhl Seminar PosterCritiquing CS Assessment from a CS for All lens: Dagstuhl Seminar Poster
Critiquing CS Assessment from a CS for All lens: Dagstuhl Seminar Poster
 
Deep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & OpportunitiesDeep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & Opportunities
 
IRJET- Survey on Deep Learning Approaches for Phrase Structure Identification...
IRJET- Survey on Deep Learning Approaches for Phrase Structure Identification...IRJET- Survey on Deep Learning Approaches for Phrase Structure Identification...
IRJET- Survey on Deep Learning Approaches for Phrase Structure Identification...
 
Re2018 Semios for Requirements
Re2018 Semios for RequirementsRe2018 Semios for Requirements
Re2018 Semios for Requirements
 
Tomáš Mikolov - Distributed Representations for NLP
Tomáš Mikolov - Distributed Representations for NLPTomáš Mikolov - Distributed Representations for NLP
Tomáš Mikolov - Distributed Representations for NLP
 
Dealing with Data Scarcity in Natural Language Processing - Belgium NLP Meetup
Dealing with Data Scarcity in Natural Language Processing - Belgium NLP MeetupDealing with Data Scarcity in Natural Language Processing - Belgium NLP Meetup
Dealing with Data Scarcity in Natural Language Processing - Belgium NLP Meetup
 
Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"
Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"
Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"
 
ICML UDL Evaluating Deep Learning Models Applications to NLP Nazneen Rajani.pdf
ICML UDL  Evaluating Deep Learning Models Applications to NLP Nazneen Rajani.pdfICML UDL  Evaluating Deep Learning Models Applications to NLP Nazneen Rajani.pdf
ICML UDL Evaluating Deep Learning Models Applications to NLP Nazneen Rajani.pdf
 
IRJET- Querying Database using Natural Language Interface
IRJET-  	  Querying Database using Natural Language InterfaceIRJET-  	  Querying Database using Natural Language Interface
IRJET- Querying Database using Natural Language Interface
 
An exploratory research on grammar checking of Bangla sentences using statist...
An exploratory research on grammar checking of Bangla sentences using statist...An exploratory research on grammar checking of Bangla sentences using statist...
An exploratory research on grammar checking of Bangla sentences using statist...
 
Nlp 2020 global ai conf -jeff_shomaker_final
Nlp 2020 global ai conf -jeff_shomaker_finalNlp 2020 global ai conf -jeff_shomaker_final
Nlp 2020 global ai conf -jeff_shomaker_final
 
How can text-mining leverage developments in Deep Learning? Presentation at ...
How can text-mining leverage developments in Deep Learning?  Presentation at ...How can text-mining leverage developments in Deep Learning?  Presentation at ...
How can text-mining leverage developments in Deep Learning? Presentation at ...
 
Determining the Credibility of Science Communication
Determining the Credibility of Science CommunicationDetermining the Credibility of Science Communication
Determining the Credibility of Science Communication
 
Learning context is all you need for task general artificial intelligence
Learning context is all you need for task general artificial intelligenceLearning context is all you need for task general artificial intelligence
Learning context is all you need for task general artificial intelligence
 
The Value and Benefits of Data-to-Text Technologies
The Value and Benefits of Data-to-Text TechnologiesThe Value and Benefits of Data-to-Text Technologies
The Value and Benefits of Data-to-Text Technologies
 

Recently uploaded

BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
Nguyen Thanh Tu Collection
 
The simplified electron and muon model, Oscillating Spacetime: The Foundation...
The simplified electron and muon model, Oscillating Spacetime: The Foundation...The simplified electron and muon model, Oscillating Spacetime: The Foundation...
The simplified electron and muon model, Oscillating Spacetime: The Foundation...
RitikBhardwaj56
 
South African Journal of Science: Writing with integrity workshop (2024)
South African Journal of Science: Writing with integrity workshop (2024)South African Journal of Science: Writing with integrity workshop (2024)
South African Journal of Science: Writing with integrity workshop (2024)
Academy of Science of South Africa
 
CACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdfCACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdf
camakaiclarkmusic
 
How to Add Chatter in the odoo 17 ERP Module
How to Add Chatter in the odoo 17 ERP ModuleHow to Add Chatter in the odoo 17 ERP Module
How to Add Chatter in the odoo 17 ERP Module
Celine George
 
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
PECB
 
Your Skill Boost Masterclass: Strategies for Effective Upskilling
Your Skill Boost Masterclass: Strategies for Effective UpskillingYour Skill Boost Masterclass: Strategies for Effective Upskilling
Your Skill Boost Masterclass: Strategies for Effective Upskilling
Excellence Foundation for South Sudan
 
Advanced Java[Extra Concepts, Not Difficult].docx
Advanced Java[Extra Concepts, Not Difficult].docxAdvanced Java[Extra Concepts, Not Difficult].docx
Advanced Java[Extra Concepts, Not Difficult].docx
adhitya5119
 
How to Manage Your Lost Opportunities in Odoo 17 CRM
How to Manage Your Lost Opportunities in Odoo 17 CRMHow to Manage Your Lost Opportunities in Odoo 17 CRM
How to Manage Your Lost Opportunities in Odoo 17 CRM
Celine George
 
Film vocab for eal 3 students: Australia the movie
Film vocab for eal 3 students: Australia the movieFilm vocab for eal 3 students: Australia the movie
Film vocab for eal 3 students: Australia the movie
Nicholas Montgomery
 
A Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in EducationA Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in Education
Peter Windle
 
PIMS Job Advertisement 2024.pdf Islamabad
PIMS Job Advertisement 2024.pdf IslamabadPIMS Job Advertisement 2024.pdf Islamabad
PIMS Job Advertisement 2024.pdf Islamabad
AyyanKhan40
 
clinical examination of hip joint (1).pdf
clinical examination of hip joint (1).pdfclinical examination of hip joint (1).pdf
clinical examination of hip joint (1).pdf
Priyankaranawat4
 
Types of Herbal Cosmetics its standardization.
Types of Herbal Cosmetics its standardization.Types of Herbal Cosmetics its standardization.
Types of Herbal Cosmetics its standardization.
Ashokrao Mane college of Pharmacy Peth-Vadgaon
 
writing about opinions about Australia the movie
writing about opinions about Australia the moviewriting about opinions about Australia the movie
writing about opinions about Australia the movie
Nicholas Montgomery
 
DRUGS AND ITS classification slide share
DRUGS AND ITS classification slide shareDRUGS AND ITS classification slide share
DRUGS AND ITS classification slide share
taiba qazi
 
Main Java[All of the Base Concepts}.docx
Main Java[All of the Base Concepts}.docxMain Java[All of the Base Concepts}.docx
Main Java[All of the Base Concepts}.docx
adhitya5119
 
Assessment and Planning in Educational technology.pptx
Assessment and Planning in Educational technology.pptxAssessment and Planning in Educational technology.pptx
Assessment and Planning in Educational technology.pptx
Kavitha Krishnan
 
PCOS corelations and management through Ayurveda.
PCOS corelations and management through Ayurveda.PCOS corelations and management through Ayurveda.
PCOS corelations and management through Ayurveda.
Dr. Shivangi Singh Parihar
 
Pride Month Slides 2024 David Douglas School District
Pride Month Slides 2024 David Douglas School DistrictPride Month Slides 2024 David Douglas School District
Pride Month Slides 2024 David Douglas School District
David Douglas School District
 

Recently uploaded (20)

BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
 
The simplified electron and muon model, Oscillating Spacetime: The Foundation...
The simplified electron and muon model, Oscillating Spacetime: The Foundation...The simplified electron and muon model, Oscillating Spacetime: The Foundation...
The simplified electron and muon model, Oscillating Spacetime: The Foundation...
 
South African Journal of Science: Writing with integrity workshop (2024)
South African Journal of Science: Writing with integrity workshop (2024)South African Journal of Science: Writing with integrity workshop (2024)
South African Journal of Science: Writing with integrity workshop (2024)
 
CACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdfCACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdf
 
How to Add Chatter in the odoo 17 ERP Module
How to Add Chatter in the odoo 17 ERP ModuleHow to Add Chatter in the odoo 17 ERP Module
How to Add Chatter in the odoo 17 ERP Module
 
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
 
Your Skill Boost Masterclass: Strategies for Effective Upskilling
Your Skill Boost Masterclass: Strategies for Effective UpskillingYour Skill Boost Masterclass: Strategies for Effective Upskilling
Your Skill Boost Masterclass: Strategies for Effective Upskilling
 
Advanced Java[Extra Concepts, Not Difficult].docx
Advanced Java[Extra Concepts, Not Difficult].docxAdvanced Java[Extra Concepts, Not Difficult].docx
Advanced Java[Extra Concepts, Not Difficult].docx
 
How to Manage Your Lost Opportunities in Odoo 17 CRM
How to Manage Your Lost Opportunities in Odoo 17 CRMHow to Manage Your Lost Opportunities in Odoo 17 CRM
How to Manage Your Lost Opportunities in Odoo 17 CRM
 
Film vocab for eal 3 students: Australia the movie
Film vocab for eal 3 students: Australia the movieFilm vocab for eal 3 students: Australia the movie
Film vocab for eal 3 students: Australia the movie
 
A Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in EducationA Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in Education
 
PIMS Job Advertisement 2024.pdf Islamabad
PIMS Job Advertisement 2024.pdf IslamabadPIMS Job Advertisement 2024.pdf Islamabad
PIMS Job Advertisement 2024.pdf Islamabad
 
clinical examination of hip joint (1).pdf
clinical examination of hip joint (1).pdfclinical examination of hip joint (1).pdf
clinical examination of hip joint (1).pdf
 
Types of Herbal Cosmetics its standardization.
Types of Herbal Cosmetics its standardization.Types of Herbal Cosmetics its standardization.
Types of Herbal Cosmetics its standardization.
 
writing about opinions about Australia the movie
writing about opinions about Australia the moviewriting about opinions about Australia the movie
writing about opinions about Australia the movie
 
DRUGS AND ITS classification slide share
DRUGS AND ITS classification slide shareDRUGS AND ITS classification slide share
DRUGS AND ITS classification slide share
 
Main Java[All of the Base Concepts}.docx
Main Java[All of the Base Concepts}.docxMain Java[All of the Base Concepts}.docx
Main Java[All of the Base Concepts}.docx
 
Assessment and Planning in Educational technology.pptx
Assessment and Planning in Educational technology.pptxAssessment and Planning in Educational technology.pptx
Assessment and Planning in Educational technology.pptx
 
PCOS corelations and management through Ayurveda.
PCOS corelations and management through Ayurveda.PCOS corelations and management through Ayurveda.
PCOS corelations and management through Ayurveda.
 
Pride Month Slides 2024 David Douglas School District
Pride Month Slides 2024 David Douglas School DistrictPride Month Slides 2024 David Douglas School District
Pride Month Slides 2024 David Douglas School District
 

Grammatical Error Correction with Improved Real-world Applicability

  • 1. Grammatical Error Correction with Improved Real-world Applicability (実世界への適⽤性を指向した⽂法誤り訂正). Masato Mita, Inui Laboratory, Department of System Information Sciences, Graduate School of Information Sciences. Doctoral thesis defense, July 20, 2021 (online).
  • 2. Background
  • Millions of people are learning English as a Second Language (ESL)
  → According to a report published by the British Council in 2013, English is spoken at a useful level by 1.75 billion people worldwide
  • Due to the difficulty of learning a new language, their written texts may contain grammatical errors [Nagata et al., 2011; Dahlmeier et al., 2013]
  e.g.) KJ corpus
  • 3. Interests in automatic error correction
  Commercial perspective:
  • Great potential for many real-world applications, such as:
  • Writing support tools that assist writers with their writing without human intervention
  • Educational tools, since it can provide real-time feedback
  Research perspective:
  • An interesting and challenging language generation task
  → language modeling, syntax and semantics in noisy text
  • Actively studied as the Grammatical Error Correction (GEC) task
  • 4. Grammatical Error Correction (GEC)
  • The task of correcting different kinds of errors in text, such as spelling, punctuation, grammatical, and word choice errors
  e.g.) Machine is design to help people. → Machines are design to help people.
  Mainstream approaches:
  • Encoder-Decoder models based on Deep Neural Networks (DNN):
  Ø Framed as a machine translation (MT) task: ungrammatical text → grammatical text
  + It can theoretically correct all error types without expert knowledge
  + It allows cutting-edge neural MT models to be adopted
  • 5. Systems achieved human-level performance…
  (Figure from [Ge et al., 2018])
  → From a commercial perspective, three major issues remain in current GEC:
  1. Evaluation
  2. Data Noise
  3. Low Resource
  • 6. Issue 1: Evaluation
  The GEC community tends to evaluate systems on a particular corpus written by relatively proficient learners (e.g., CoNLL-2014)
  • Research (GEC community): evaluation on CoNLL-2014 [Ng et al., 2014]
  • Real-world scenarios: GEC systems are expected to robustly correct errors in any written text, across basic, independent, and proficient writers
  Question: Can we realize an evaluation reliable enough to be applied in real-world scenarios?
  • 7. Issue 2: Data Noise
  Inconsistent annotations in GEC corpora [Lo et al., 2018] (brackets reconstruct the strikethrough edits on the slide):
  • We will [discuss about → discuss] this with you.
  • I want to [discuss about → discuss of] the education.
  • We [discuss about → discuss about] our sales target. (left unchanged)
  Research (GEC community): little focus on verifying and ensuring the quality of the datasets, or how lower-quality data might affect GEC performance
  Real-world scenarios: limited available data; not always possible to use high-quality data
  Question: Can a better GEC model be built by reducing noise in GEC corpora?
  • 8. Issue 3: Low Resource
  Current de facto standard: training GEC systems with pseudo-data (Figure from [Kiyono et al., 2019])
  → Tendency to require more resources to develop GEC systems (e.g., GPUs and training time)
  Real-world perspective (checklist): performance, low resource requirements, inference speed, etc.
  Question: How can we build lightweight models that require fewer resources?
  • 9. Three issues and goal
  1. Evaluation: no reliable and robust evaluation methodologies
  2. Data noise: no data denoising methodologies
  3. Low resource: increased resources required for model development
  Underlying Motivation & Goal
  • Provide a foundation and research direction for GEC with improved real-world applicability
  • Contribute to making GEC research more meaningful in real-world scenarios
  • 10. Grammatical Error Correction with Improved Real-world Applicability: Overview (§1, §2)
  • §3 Evaluation: How to realize a reliable evaluation? → Cross-sectional evaluation (NAACL 2019, Journal of NLP 2021)
  • §4 Data Noise: How to design a denoising method? → A self-refinement strategy (EMNLP 2020)
  • §5 Low Resource: How to build lightweight models requiring fewer resources? → Grammatical generalization ability (ACL 2021)
  • 11. Background: Evaluation
  • Most previous work evaluates on CoNLL-2014 (essays written by students at the National University of Singapore)
  • Recently, more and more works have additionally used JFLEG, but (customarily) evaluate it independently with a different metric
  • 12. In real-world scenarios
  • Real-world applications assume a wide variety of writing as input
  • The difficulty varies under different conditions (e.g., proficiency: basic, independent, proficient)
  • GEC systems are expected to robustly correct errors in any written text
  • Error tendencies vary depending on the learner's proficiency level
  • 13. Chapter 3: Cross-sectional Evaluation of GEC Models (NAACL 2019, Journal of NLP 2021)
  What we did in this chapter:
  1. Check whether the current evaluation is reliable (NAACL 2019)
  2. Explore an evaluation methodology with improved real-world applicability (Journal of NLP 2021)
  • 14. Chapter 3: Cross-sectional Evaluation of GEC Models (NAACL 2019, Journal of NLP 2021)
  What we did in this chapter:
  1. Check whether the current evaluation is reliable (NAACL 2019)
  2. Explore an evaluation methodology with improved real-world applicability (Journal of NLP 2021)
  Current benchmark: CoNLL-2014 [Ng et al., 2014]. Are there variations in the evaluation results across corpora A, B, and C?
  • 15. GEC systems
  Requirements:
  • The systems must be based on machine translation
  • Each system must be implemented to have competitive performance on CoNLL-2014
  Systems:
  • LSTM: LSTM-based system [Luong et al., 2015]
  • CNN: CNN-based system [Chollampatt et al., 2017]
  • Transformer: Transformer-based system [Vaswani et al., 2017]
  • SMT: Statistical Machine Translation-based system [Junczys-Dowmunt et al., 2017]
  • 16. Cross-corpora Evaluation (NAACL 2019)
  • Systems' rankings vary considerably depending on the corpus
  → Single-corpus evaluation is not reliable for GEC
  • 17. Analysis
  • Performance evaluation by error type (CoNLL-2014):
  Determiner e.g. [this → these]; Preposition e.g. [for → with]; Punctuation e.g. [. Because → , because]; Verb e.g. [grow → bring]; Noun Number e.g. [cat → cats]; Verb Tense e.g. [eat → has eaten]
  → Each system has different strengths and weaknesses
  • 18. Analysis
  • Performance evaluation by error type (cross-corpora)
  • The best-performing model for each error type in each corpus:

    Error type | CoNLL-2014  | CoNLL-2013  | FCE         | JFLEG       | KJ   | BEA-2019
    Det.       | LSTM        | LSTM        | LSTM        | SMT         | CNN  | LSTM
    Prep.      | SMT         | Transformer | SMT         | Transformer | LSTM | Transformer
    Punct.     | Transformer | Transformer | Transformer | SMT         | LSTM | SMT
    Verb       | LSTM        | CNN         | SMT         | LSTM        | LSTM | Transformer
    Noun Num.  | LSTM        | Transformer | CNN         | LSTM        | CNN  | LSTM
    Verb Form  | Transformer | Transformer | Transformer | LSTM        | CNN  | Transformer

  → Each corpus has different error tendencies
  • 19. Cross-sectional Evaluation (Journal of NLP 2021)
  Ideas:
  • The evaluation unit need not be a whole corpus
  → Cross-sectional evaluation
  + Makes it possible to investigate model behavior more precisely, using the evaluation segments (perspectives) we want to focus on
  • 20. Proficiency-wise dataset: BEA-2019
  BEA-2019 contains CEFR-compliant proficiency information for writers: Basic (A1, A2), Independent (B1, B2), Proficient (C1, C2), plus N (native)
  CEFR: Common European Framework of Reference for Languages
  As proficiency increases: average sentence length ⬆, word edit rate (WER) ⬇, vocabulary size ⬆
  • 21. Result: Cross-proficiency evaluation
  • At the basic-intermediate levels (A, B), Transformer performs better than the other systems
  • 22. Result: Cross-proficiency evaluation
  • At the advanced levels (C, N), SMT achieved the highest performance
  • 23. Summary of Chapter 3
  Observations:
  • System rankings vary considerably depending on the corpus → current single-corpus evaluation is not reliable
  • A large divergence in evaluation results between the basic-intermediate and advanced levels of writers' proficiency
  Research Question and Contribution:
  Q: How to realize a reliable evaluation?
  A: Evaluate from multiple perspectives by appropriately partitioning the data according to the purpose (e.g., cross-proficiency evaluation)
  → Provides a more reliable evaluation foundation for GEC
  Limitations (future work):
  • Detailed factor analysis of ranking changes
  • New metrics appropriate for cross-sectional evaluation
  • 24. Grammatical Error Correction with Improved Real-world Applicability: Overview (§1, §2)
  • §3 Evaluation: How to realize a reliable evaluation? → Cross-sectional evaluation (NAACL 2019, Journal of NLP 2021)
  • §4 Data Noise: How to design a denoising method? → A self-refinement strategy (EMNLP 2020)
  • §5 Low Resource: How to build lightweight models requiring fewer resources? → Grammatical generalization ability (ACL 2021)
  • 25. Background
  • Manually created GEC data has implicitly been treated as the cleanest data available
  → such data are usually built manually by experts
  e.g.) KJ Corpus [Nagata et al., 2011]:
  Annotated: Now, I live <prp crr="in"></prp> my home alone.
  Original: Now, I live my home alone.
  Corrected: Now, I live in my home alone.
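The inline-annotation format above can be expanded mechanically into (original, corrected) training pairs. A minimal sketch, assuming a simplified `<TAG crr="…">…</TAG>` format; the regex and function name are illustrative, and real KJ annotations may carry additional attributes:

```python
import re

# Matches a simplified KJ-style edit: <TAG crr="CORRECTION">LEARNER_TEXT</TAG>
# (either the correction or the learner text may be empty).
EDIT = re.compile(r'<(\w+) crr="([^"]*)">([^<]*)</\1>')

def expand_annotation(annotated: str):
    """Expand an annotated sentence into (original, corrected) plain text."""
    original = EDIT.sub(lambda m: m.group(3), annotated)   # keep learner text
    corrected = EDIT.sub(lambda m: m.group(2), annotated)  # keep correction
    # collapse doubled spaces left by empty insertions/deletions
    squeeze = lambda s: " ".join(s.split())
    return squeeze(original), squeeze(corrected)
```

Applied to the slide's example, this yields the Original/Corrected pair shown above.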
  • 26. Issues and motivation
  Lo et al. (2018)'s report:
  • A GEC model trained on EFCamDat [Geertzen et al., 2013], the largest publicly available learner corpus to date (2M sentence pairs), was outperformed by a model trained on a smaller dataset (720K sentence pairs)
  • This may be due to "inconsistent annotations", e.g., [discuss about → discuss], [discuss about → discuss of], [discuss about → discuss about]
  Motivation: in real-world scenarios, it may not always be possible to use high-quality data
  → Need to develop a training strategy for low-quality data that does not sacrifice performance
  • 27. Chapter 4: A Self-refinement Strategy for Noise Reduction (EMNLP 2020)
  What we did in this chapter:
  1. Reveal the amount of noise in existing GEC data
  2. Propose a data denoising method which improves GEC performance
  3. Analyze how the method affects both performance and the data itself
  • 28. Presence of noise in GEC data
  1. For 300 target sentences (Y) from each dataset, one expert reviewed them and we obtained denoised versions (Y')
  2. Calculated the averaged Levenshtein distance between the original target sentences (Y) and the denoised target sentences (Y')
  Resulting noise rates: BEA-train 37.1%, EF 42.1%, Lang-8 34.6%
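The noise estimate can be reproduced in outline as an averaged, length-normalized Levenshtein distance between each target sentence and its expert-denoised version. A sketch; the exact normalization used in the thesis may differ:

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance via the standard dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def noise_rate(targets, denoised) -> float:
    """Average normalized edit distance between original and denoised targets, in %."""
    dists = [levenshtein(y, y2) / max(len(y), len(y2), 1)
             for y, y2 in zip(targets, denoised)]
    return 100.0 * sum(dists) / len(dists)
```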
  • 29. Filtering?
  • A straightforward solution is to apply a filtering approach
  → noisy pairs are filtered out and a smaller subset of high-quality sentence pairs is retained (cf. MT), e.g.:
    We will discuss about this with you → We will discuss this with you
    We discuss about our sales target → We discuss about our sales target
    I need to discuss about the education → I need to discuss of the education
  Intuition: filtering approaches may not be the best choice in GEC:
  1. GEC is a low-resource task compared to MT, so further reducing data size by filtering may be critically ineffective
  2. Even noisy instances may still be useful for training, since they might contain some correct edits as well
  • 30. Proposed method: Self-refinement
  Key idea: denoise datasets by leveraging the prediction consistency of existing models
  Human correction:
    We will discuss about this with you → We will discuss this with you
    We discuss about our sales target → We discuss about our sales target
    I need to discuss about the education → I need to discuss of the education
  Model re-correction:
    We will discuss about this with you → We will discuss this with you
    We discuss about our sales target → We discuss our sales target
    I need to discuss about the education → I need to discuss the education
  • 31. Self-refinement: Algorithm
  Noisy parallel data: D = (X, Y); denoised parallel data: D' = {}; all trainable parameters: θ
  ① Train a base model on D
  ② Apply the base model to X and obtain system outputs Y'
  ③ Selection (fail-safe mechanism using a language model):
     Ŷ = Y'  if PPL(Y) − PPL(Y') ≥ τ
     Ŷ = Y   if PPL(Y) − PPL(Y') < τ
  ④ Add (X, Ŷ) to D'
  ⑤ Train a denoised new model on D'
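The selection step ③ can be sketched per sentence pair as follows, with `base_model` and `lm_ppl` as stand-ins for the trained base GEC model and a language-model perplexity function (names and the toy stubs below are illustrative, not from the thesis):

```python
def refine_pair(x, y, base_model, lm_ppl, tau):
    """One selection step of the self-refinement strategy (sketch).

    Keep the model's re-correction y' only when it lowers language-model
    perplexity over the human target y by at least tau; this is the
    fail-safe that otherwise preserves the original annotation.
    """
    y_prime = base_model(x)
    if lm_ppl(y) - lm_ppl(y_prime) >= tau:
        return (x, y_prime)  # replace the noisy target
    return (x, y)            # fail-safe: keep the human annotation
```

A larger τ makes the procedure more conservative, keeping more of the human annotations.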
  • 32. Result 32 • Significantly improved performance across all training/test sets
  • 33. Result • Filtering approaches can be useful for corpora with a large data size
  • 34. Result • Not useful with small data → Suggests that filtering may exclude even instances that were partially useful for training the model
  • 35. Precision vs. Recall • Recall significantly increased, while precision was mostly maintained → Due to the correction of "inconsistent annotations"
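This trade-off matters because GEC reports F0.5, which weights precision twice as heavily as recall; that is why "recall up, precision maintained" raises the overall score. A minimal sketch of the formula over edit counts (the thesis evaluates with span-based scorers such as M2/ERRANT rather than this bare formula):

```python
# F-beta over true positives, false positives, and false negatives.
# beta=0.5 (the GEC standard) emphasizes precision over recall.

def f_beta(tp: int, fp: int, fn: int, beta: float = 0.5) -> float:
    if tp == 0:
        return 0.0
    p = tp / (tp + fp)          # precision
    r = tp / (tp + fn)          # recall
    b2 = beta * beta
    return (1 + b2) * p * r / (b2 * p + r)
```

With the same total errors, a precision-heavy system scores higher under F0.5 than a recall-heavy one, e.g. `f_beta(8, 2, 8) > f_beta(8, 8, 2)`.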
  • 36. Analysis: Noise reduction 36 • Manually evaluated 500 triples of source sentences (X), original target sentences (Y), and generated target sentences (Yʼ) → 73.6% of the replaced samples were determined to be appropriate corrections, including cases where both were correct
  • 37. Summary of Chapter 4
Observations:
• A non-negligible amount of noise exists in the most commonly used training data for GEC
• Removing this noise significantly improved performance
Research Question and Contribution:
Q: How to design a denoising method?
A: Developed a simple but effective denoising method based on a self-refinement strategy
→ Enables developing accurate GEC systems from low-quality data
Limitations:
• The boundary conditions under which noise reduction works effectively are unclear
  • 38. Grammatical Error Correction with Improved Real-world Applicability Overview: Evaluation, Data Noise, Low Resource (§1,§2 §3 §4 §5) u How to realize a reliable evaluation? → Cross-sectional evaluation (NAACL 2019, Journal of NLP 2021) u How to design a denoising method? → A self-refinement strategy (EMNLP 2020) u How to build lightweight models that require fewer resources? → Grammatical generalization ability (ACL 2021)
  • 39. Issues: Larger data, bigger models
• Pseudo-data generation is popular
− Generate pseudo-errors from sets of grammatical sentences (e.g., Wikipedia)
• Increased training data
− Increases the resources required for model development (GPUs, training time, etc.)
− About 60 million pseudo-data samples are needed to improve F0.5, a standard GEC measure, by only two points [Kiyono et al., 2019]
Figure from [Kiyono et al., 2019]
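The pseudo-data idea above can be sketched minimally: inject synthetic errors into grammatical sentences to form (noisy, clean) training pairs. The drop/swap noiser below is an illustrative assumption; Kiyono et al. (2019) study richer generation methods such as backtranslation:

```python
# Minimal pseudo-data sketch: corrupt clean sentences with random token drops
# and adjacent-token swaps, pairing each corrupted sentence with its original.
import random

def inject_errors(sentence: str, rng: random.Random,
                  p_drop: float = 0.1, p_swap: float = 0.1) -> str:
    tokens = sentence.split()
    noisy, i = [], 0
    while i < len(tokens):
        r = rng.random()
        if r < p_drop:
            i += 1                                    # drop this token
            continue
        if r < p_drop + p_swap and i + 1 < len(tokens):
            noisy += [tokens[i + 1], tokens[i]]       # swap adjacent tokens
            i += 2
            continue
        noisy.append(tokens[i])
        i += 1
    return " ".join(noisy)

def make_pseudo_pairs(corpus, seed=0):
    """Build (pseudo-erroneous source, clean target) training pairs."""
    rng = random.Random(seed)
    return [(inject_errors(s, rng), s) for s in corpus]
```

The slide's point is the cost side: scaling this to tens of millions of pairs inflates GPU and training-time requirements for small score gains.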
  • 40. Research Question
Two types of errors covered by GEC:
Type 1: Errors not governed by a grammatical rule (e.g., collocation): I listen [in → to] his speech carefully
Type 2: Errors governed by a grammatical rule (e.g., subject-verb agreement): Every dog [run → runs] quickly
Q. Do GEC models realize grammatical generalization?
Intuition: a model should not need to memorize individual patterns if it has learned the rules
A. Yes → No need for large amounts of data (at least for Type 2)
A. No → Need to incorporate grammatical knowledge as rules into the models
  • 41. Chapter 5: Do GEC Models Realize Grammatical Generalization? (ACL 2021)
What we did in this chapter:
1. Introduce an analysis method to evaluate whether models can generalize to unseen errors
2. Add new depth to the study of GEC beyond just improving scores
  • 42. Proposed method
• Automatically build datasets that control which vocabulary appears at error locations in the training and test sets
• Compare performance on previously seen error correction patterns (Known setting) with performance on unseen patterns of the same error type (Unknown setting)
Train: Every dog *run / runs quickly; That slimy duck *smile / smiles awkwardly; Some slimy cows *smiles / smile dramatically
Test 1 (Known setting): Every polite cow *smile / smiles awkwardly
Test 2 (Unknown setting): Every white fox *run / runs quickly
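The known/unknown split can be sketched as below for a toy subject-verb-agreement setting. `split_patterns` and `make_sva_pair` are hypothetical helpers of our own, not the thesis's actual dataset-construction pipeline:

```python
# Controlled-vocabulary split: words at error locations are partitioned so the
# "known" test reuses training verbs in new contexts, while the "unknown" test
# uses held-out verbs of the same error type.

def split_patterns(verbs, n_known):
    """Partition the error-site vocabulary into seen and unseen subsets."""
    return set(verbs[:n_known]), set(verbs[n_known:])

def make_sva_pair(subject, verb):
    """Toy SVA pair: the source carries the agreement error, the target fixes it."""
    return (f"Every {subject} {verb} quickly", f"Every {subject} {verb}s quickly")

known, unknown = split_patterns(["run", "smile", "jump", "walk"], n_known=2)
train_pairs = [make_sva_pair("dog", v) for v in known]
test_known = [make_sva_pair("cow", v) for v in known]      # seen verbs, new context
test_unknown = [make_sva_pair("fox", v) for v in unknown]  # unseen verbs
```

The key design point: known and unknown test sets share the error *type*, so any score gap isolates the model's failure to generalize beyond memorized word patterns.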
  • 43. Two types of data: synthetic and real
Synthetic data: synthesized using a context-free grammar (CFG) — controls ① patterns and ② vocabulary
Real data: sampled from existing GEC datasets — controls ① patterns only
• Investigate five standard error types defined by Bryant et al. (2017), all governed by grammatical rules:
• Subject-verb agreement errors (VERB:SVA)
• Verb form errors (VERB:FORM)
• Word order errors (WO)
• Morphological errors (MORPH)
• Noun number errors (NOUN:NUM)
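A toy version of the CFG-based synthesis for the VERB:SVA type can be sketched as follows. The grammar is our illustration, far smaller than the thesis's rule sets, which pair one rule generating grammatical sentences with one generating ungrammatical counterparts:

```python
# Tiny CFG-style generator: expand S -> Det Adj N V Adv over a closed
# vocabulary, emitting (ungrammatical, grammatical) SVA pairs.
import itertools

DETS = ["Every", "That"]
ADJS = ["polite", "white", "slimy"]
NOUNS = ["dog", "cow", "fox"]
VERBS = ["runs", "smiles", "jumps"]   # correct 3rd-person singular forms
ADVS = ["quickly", "awkwardly"]

def generate_pairs():
    """Yield (bad, good) pairs; the error strips the agreement -s from the verb."""
    for det, adj, n, v, adv in itertools.product(DETS, ADJS, NOUNS, VERBS, ADVS):
        bad = f"{det} {adj} {n} {v[:-1]} {adv}"    # "runs" -> "run"
        good = f"{det} {adj} {n} {v} {adv}"
        yield bad, good
```

Because the vocabulary is closed, which words appear at error sites in train vs. test can be controlled exactly, which is what real data cannot offer.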
  • 44. Examples of automatically constructed data
Synthetic data: sentences with limited vocabulary and syntax
Real data: sentences with a diversity of vocabulary and syntax
  • 45. Result: Synthetic data
• The model's performance drops significantly in the unknown setting compared to the known setting, except for WO
→ It lacks the generalization ability required to correct errors beyond the provided training examples
Table 2: Generalization performance for unseen errors. Each number represents an F0.5 score.
                    VERB:SVA  VERB:FORM    WO     MORPH   NOUN:NUM
Synthetic  Known      99.61     99.17    99.09    98.44     97.47
           Unknown    46.05     56.93    84.00    29.35     65.55
           (Δ)       -53.56    -42.24   -15.09   -69.09    -31.92
Real       Known      87.84     86.36    74.89    87.77     83.75
           Unknown     6.28      6.28     9.25     3.83     12.49
           (Δ)       -81.56    -80.08   -65.64   -83.94    -71.26
  • 46. Result: Real data
• The model's performance drops significantly on all error types
→ Generalization is more difficult in more practical settings where the vocabulary and syntax are diverse
Real data (F0.5; same table as the previous slide): Known 87.84 / 86.36 / 74.89 / 87.77 / 83.75 vs. Unknown 6.28 / 6.28 / 9.25 / 3.83 / 12.49 (VERB:SVA / VERB:FORM / WO / MORPH / NOUN:NUM)
  • 47. Detection vs. Correction
Q. Which factor is responsible for the failure to generalize grammatical knowledge?
1. An inability to detect errors
2. An inability to predict the correct words
[Bar chart: F0.5 per error type (VERB:SVA, VERB:FORM, WO, MORPH, NOUN:NUM) for Correction (known), Detection (unknown), and Correction (unknown)]
  • 48. Complexity in real data
We examined two contributing factors of complexity in real data: 1. error complexity; 2. sentence length
F0.5 by error complexity (noiseless = the target error is the only error; noisy = the sentence contains other errors besides the target error):
              noiseless   noisy
VERB:SVA         9.95      5.78
VERB:FORM       12.33      5.47
WO               7.89      9.35
MORPH            6.32      3.90
NOUN:NUM        24.16     12.49
• WO is robust to the complexity of input sentences: it depends on neither sentence length nor error complexity
→ This explains why the WO drop was relatively small compared to the other types, even with real data
  • 49. Can a few correction patterns improve model performance?
• Measured the performance change when the model is exposed to a few error correction patterns
• Adding even just one or two samples to the training data can significantly improve the model's performance
→ When building training data, it is important to include a few seen patterns for each word
  • 50. Summary of Chapter 5
Observations:
• A current standard Transformer-based GEC model fails to achieve grammatical generalization even in simple settings with limited vocabulary and syntax
Research Question and Contribution:
Q: How to build lightweight models that require fewer resources?
A: A combination of rule-based and DNN-based methods is necessary
→ Provides a research direction for implementing lightweight GEC models
Limitations:
• No concrete solutions are proposed based on these findings
  • 51. Grammatical Error Correction with Improved Real-world Applicability Overview: Evaluation, Data Noise, Low Resource (§1,§2 §3 §4 §5) u How to realize a reliable evaluation? → Cross-sectional evaluation (NAACL 2019, Journal of NLP 2021) u How to design a denoising method? → A self-refinement strategy (EMNLP 2020) u How to build lightweight models that require fewer resources? → Grammatical generalization ability (ACL 2021)
  • 52. Contributions
§3. How to realize a reliable evaluation?
❖ Demonstrated that the current single-corpus evaluation is not reliable and proposed cross-sectional evaluation as an alternative
→ Provides a more reliable evaluation foundation for GEC
§4. How to design a denoising method?
❖ Developed a simple but effective denoising method
→ Enables developing accurate GEC systems from low-quality data
§5. How to build lightweight models that require fewer resources?
❖ Showed that a combination of rule-based and DNN-based methods is necessary
→ Provides a research direction for implementing lightweight GEC models
  • 53. Summary of the thesis
• This thesis focuses on the three major issues that arise when applying GEC systems to the real world: Evaluation, Data Noise, and Low Resource
→ It will facilitate discussion of systems oriented toward real-world applicability, bridging the gaps between GEC research and real-world settings
[Diagram] Research (accuracy first, narrow domain, clean data) ⇄ Gaps ⇄ Real world (noisy data, wide range of domains, low resource)
  • 55. Journal Papers
1. Ryo Fujii, Masato Mita, Kaori Abe, Kazuaki Hanawa, Makoto Morishita, Jun Suzuki, Kentaro Inui. Phenomenon-wise Evaluation Dataset Towards Analyzing Robustness of Machine Translation Models (in Japanese). Journal of Natural Language Processing, Volume 28, Number 2, pp. 450-478.
2. Masato Mita, Tomoya Mizumoto, Masahiro Kaneko, Ryo Nagata, Kentaro Inui. Cross-Sectional Evaluation of Grammatical Error Correction Models (in Japanese). Journal of Natural Language Processing, Volume 28, Number 1, pp. 160-182, March 2021.
  • 56. International Conferences (Refereed) 1/3
1. Masato Mita, Hitomi Yanaka. Do Grammatical Error Correction Models Realize Grammatical Generalization?. In Findings of the Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2021) (to appear).
2. Takumi Gotou, Ryo Nagata, Masato Mita, Kazuaki Hanawa. Taking the Correction Difficulty into Account in Grammatical Error Correction Evaluation. In Proceedings of the 28th International Conference on Computational Linguistics (COLING 2020), pages 2085-2095, December 2020.
3. Ryo Fujii, Masato Mita, Kaori Abe, Kazuaki Hanawa, Makoto Morishita, Jun Suzuki, Kentaro Inui. PheMT: A Phenomenon-wise Dataset for Machine Translation Robustness on User-Generated Contents. In Proceedings of the 28th International Conference on Computational Linguistics (COLING 2020), December 2020.
4. Masato Mita, Shun Kiyono, Masahiro Kaneko, Jun Suzuki, Kentaro Inui. A Self-Refinement Strategy for Noise Reduction in Grammatical Error Correction. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 267-280, November 2020.
  • 57. International Conferences (Refereed) 2/3 57 1. Hiroaki Funayama, Shota Sasaki, Yuichiro Matsubayashi, Tomoya Mizumoto, Jun Suzuki, Masato Mita, Kentaro Inui. Preventing Critical Scoring Errors in Short Answer Scoring with Confidence Estimation. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop, pages 237-243, July 2020. 2. Masahiro Kaneko, Masato Mita, Shun Kiyono, Jun Suzuki, Kentaro Inui. Can Encoder- decoder Models Benefit from Pre-trained Language Representation in Grammatical Error Correction? In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020), pages 4248-4254, July 2020. 3. Masato Hagiwara and Masato Mita. GitHub Typo Corpus: A Large-Scale Multilingual Dataset of Misspellings and Grammatical Errors. In Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), pages 6761‒6768, May 2020. 4. Shun Kiyono, Jun Suzuki, Masato Mita, Tomoya Mizumoto, Kentaro Inui. An Empirical Study of Incorporating Pseudo Data to Grammatical Error Correction. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP 2019), pages 1236-1242, November 2019.
  • 58. International Conferences (Refereed) 3/3
1. Hiroki Asano, Masato Mita, Tomoya Mizumoto, Jun Suzuki. The AIP-Tohoku System at the BEA-2019 Shared Task. In Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications, pages 176-182, August 2019.
2. Masato Mita, Tomoya Mizumoto, Masahiro Kaneko, Ryo Nagata, Kentaro Inui. Cross-Corpora Evaluation and Analysis of Grammatical Error Correction Models: Is Single-Corpus Evaluation Enough?. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2019), pages 1309-1314, May 2019.
  • 59. Domestic Conferences (Not refereed) 1/3
1. Masato Mita, Masato Hagiwara, Keisuke Sakaguchi, Tomoya Mizumoto, Jun Suzuki, Kentaro Inui. Proposing an Essay Rewriting Task and Toward Its Automatic Evaluation (in Japanese). Workshop at the 27th Annual Meeting of the Association for Natural Language Processing, March 2021.
2. Masato Mita, Hitomi Yanaka. Do Grammatical Error Correction Models Learn the Grammar Needed for Correction? (in Japanese). 27th Annual Meeting of the Association for Natural Language Processing, March 2021.
3. Masato Mita, Hitomi Yanaka. Do Grammatical Error Correction Models Generalize Grammatical Knowledge? (in Japanese). 15th YANS Symposium, September 2020.
4. Yuta Matsumoto, Ryo Fujii, Kaori Abe, Hiroaki Funayama, Masato Mita. A Comparative Study of Multiple Neural Kanji-Creation Systems Considering the Semantic Structure of Kanji (in Japanese). 15th YANS Symposium, September 2020.
5. Ryo Fujii, Masato Mita, Kaori Abe, Kazuaki Hanawa, Makoto Morishita, Jun Suzuki, Kentaro Inui. A Systematic Analysis of Linguistic Phenomena toward High-Quality Machine Translation of User-Generated Content (in Japanese). 34th Annual Conference of the Japanese Society for Artificial Intelligence (JSAI), June 2020.
6. Hiroaki Funayama, Shota Sasaki, Tomoya Mizumoto, Masato Mita, Jun Suzuki, Yuichiro Matsubayashi, Kentaro Inui. A Study of Confidence Estimation Methods for Automated Short-Answer Scoring (in Japanese). 26th Annual Meeting of the Association for Natural Language Processing, March 2020.
  • 60. Domestic Conferences (Not refereed) 2/3
1. Takumi Gotou, Ryo Nagata, Masato Mita, Kazuaki Hanawa. An Evaluation Metric for Grammatical Error Correction Considering Correction Difficulty (in Japanese). 26th Annual Meeting of the Association for Natural Language Processing, March 2020.
2. Shun Kiyono, Jun Suzuki, Masato Mita, Tomoya Mizumoto, Kentaro Inui. Building High-Performance Grammatical Error Correction Models with Large-Scale Pseudo Data (in Japanese). 26th Annual Meeting of the Association for Natural Language Processing, March 2020. Outstanding Paper Award.
3. Masato Mita, Shun Kiyono, Masahiro Kaneko, Jun Suzuki, Kentaro Inui. Noise Reduction Based on a Self-Refinement Strategy for Grammatical Error Correction (in Japanese). 26th Annual Meeting of the Association for Natural Language Processing, March 2020. Young Researcher Encouragement Award.
4. Masato Mita, Masato Hagiwara, Keisuke Sakaguchi, Tomoya Mizumoto, Jun Suzuki, Kentaro Inui. Automated Essay Rewriting (AER): Grammatical Error Correction, Fluency Edits, and Beyond. 241st Meeting of the IPSJ Special Interest Group on Natural Language Processing (SIG-NL), August 2019.
5. Hiroaki Funayama, Shota Sasaki, Tomoya Mizumoto, Masato Mita, Jun Suzuki, Kentaro Inui. Confidence Estimation Methods for Automated Scoring (in Japanese). 14th YANS Symposium, August 2019.
6. Takumi Gotou, Ryo Nagata, Masato Mita, Kazuaki Hanawa, Tomoya Mizumoto. A Performance Evaluation Metric for Grammatical Error Correction Considering Problem Difficulty (in Japanese). 14th YANS Symposium, August 2019. Emerging Research Award.
  • 61. Domestic Conferences (Not refereed) 3/3
1. Ryo Fujii, Hiroaki Funayama, Kotaro Kitayama, Kaori Abe, Ana Brassard, Masato Mita, Hiroki Ouchi. A Neural Kanji Generation System Considering Radicals Using seq2seq (in Japanese). 14th YANS Symposium, August 2019.
2. Masahiro Kaneko, Masato Mita, Jun Suzuki, Kentaro Inui. Pseudo-Data Generation for Grammatical Error Correction Considering Collocation and Idiom Errors (in Japanese). 14th YANS Symposium, August 2019.
3. Ryo Fujii, Kaori Abe, Kazuaki Hanawa, Masato Mita, Jun Suzuki, Kentaro Inui. A Study of Adversarial Noise toward Machine Translation Systems Robust to Grammatical Errors (in Japanese). 14th YANS Symposium, August 2019.
4. Masato Mita, Masato Hagiwara, Keisuke Sakaguchi, Tomoya Mizumoto, Jun Suzuki, Kentaro Inui. Proposing a New Task Extending Grammatical Error Correction (in Japanese). 14th YANS Symposium, August 2019. Encouragement Award.
5. Masato Mita, Tomoya Mizumoto, Masahiro Kaneko, Ryo Nagata, Kentaro Inui. Cross-Corpora Evaluation of Grammatical Error Correction: Is a Single Corpus Enough? (in Japanese). 25th Annual Meeting of the Association for Natural Language Processing, March 2019.
6. Masato Mita, Tomoya Mizumoto, Hiroki Ouchi, Ryo Nagata, Kentaro Inui. An Unsupervised Interpretability Mechanism for Grammatical Error Correction (in Japanese). 13th YANS Symposium, August 2018.
  • 62. Awards
1. Young Researcher Encouragement Award, 26th Annual Meeting of the Association for Natural Language Processing, March 2020.
2. Outstanding Paper Award, 26th Annual Meeting of the Association for Natural Language Processing, March 2020.
3. Encouragement Award, 14th YANS Symposium, August 2019.
4. Emerging Research Award, 14th YANS Symposium, August 2019.