
A Neural Grammatical Error Correction built on Better Pre-training and Sequential Transfer Learning


Published in: Technology

  1. A Neural Grammatical Error Correction built on Better Pre-training and Sequential Transfer Learning
     2019-07-24, Jiyeon Ham, Kakao Brain
  2. Contents
     1. What is GEC?
     2. Previous work
     3. Our approach
     4. Results
  3. Section 1: What is GEC?
  4. Grammatical Error Correction
     • Input: Travel by bus is exspensive, bored and annoying.
     • Output: Travelling by bus is expensive, boring and annoying.
  5. ACL 2019 BEA Challenge
     • Building Educational Applications (BEA) 2019 Shared Task
     • Restricted Track: public data only
     • Low Resource Track: WI+Locness dev set (4K sentences) only
  6. Data
     • Data sources for each track

       Corpus       Description                            Size (sentences)             Quality
       Lang8        Online English learning site           570K                         Relatively poor
       NUCLE        College student essays                 21K                          Good
       FCE          ESL exam questions                     33K                          Good
       WI+Locness   English essays (native & non-native)   33K train, 4K dev, 4K test   Good

     • Data splits for each track

       Split        Restricted Track               Low Resource Track
       Train        Lang8, NUCLE, FCE, WI-train    WI-dev-3k
       Template     WI-train                       WI-dev-3k
       Fine-tuning  WI-train                       WI-dev-3k
       Validation   WI-dev                         WI-dev-1k
  7. ERRANT
     • ERRor ANnotation Toolkit (Bryant et al., 2017)*
     • Automatically annotates parallel English sentences with error type information
     • Extracts the edits, then classifies them according to a rule-based error type framework
     * Christopher Bryant, Mariano Felice, and Ted Briscoe. 2017. Automatic Annotation and Evaluation of Error Types for Grammatical Error Correction. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vancouver, Canada.
  8. ERRANT example
     • Input: Travel by bus is exspensive, bored and annoying.
     • Output: [Travel→Travelling] by bus is [exspensive→expensive], [bored→boring] and annoying.
     • Error types: exspensive→expensive is R:SPELL; Travel→Travelling and bored→boring are R:VERB:FORM
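The first half of what ERRANT does, pulling edits out of a parallel sentence pair, can be approximated with a plain token alignment. The sketch below is not ERRANT's algorithm: ERRANT uses a linguistically informed alignment and a rule-based classifier, whereas this assumes a hypothetical regex tokenizer and Python's `difflib.SequenceMatcher`, and the function names `tokenize` and `extract_edits` are illustrative only. It recovers the three edits from the example above but does not assign error types.

```python
import difflib
import re


def tokenize(sentence: str) -> list[str]:
    # Hypothetical tokenizer: runs of word characters and single punctuation marks.
    return re.findall(r"\w+|[^\w\s]", sentence)


def extract_edits(source: str, target: str) -> list[tuple[str, str]]:
    """Return (original, corrected) token spans wherever source and target differ.

    Simplified stand-in for ERRANT's edit extraction: plain difflib alignment,
    no linguistic alignment costs and no error-type classification.
    """
    src, tgt = tokenize(source), tokenize(target)
    matcher = difflib.SequenceMatcher(a=src, b=tgt, autojunk=False)
    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # "replace", "delete", or "insert"
            edits.append((" ".join(src[i1:i2]), " ".join(tgt[j1:j2])))
    return edits


for orig, corr in extract_edits(
    "Travel by bus is exspensive, bored and annoying.",
    "Travelling by bus is expensive, boring and annoying.",
):
    print(f"[{orig}→{corr}]")
```

Run on the slide's example, this prints the three bracketed edits [Travel→Travelling], [exspensive→expensive], and [bored→boring]. An insertion shows up with an empty original span, e.g. ("", "to").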
  9. Section 2: Previous work
  10. GEC as Low-Resource Machine Translation*
      • Translates erroneous text into correct text
      • Techniques proposed for low-resource MT are applicable to improving neural GEC
      * M. Junczys-Dowmunt, R. Grundkiewicz, S. Guha, and K. Heafield. 2018. Approaching Neural Grammatical Error Correction as a Low-Resource Machine Translation Task. NAACL 2018.
  11. Denoising Autoencoder
      • Learns to reconstruct the original input from a noisy version of it
      • Given an input $x$ and a noising function $f(x) = \tilde{x}$, minimizes the reconstruction loss $L(x, \mathrm{dec}(\mathrm{enc}(\tilde{x})))$
  12. Copy-Augmented Transformer*
      • Combines the Transformer with copy scores
      • Copy scores: softmax outputs of the encoder-decoder attention
      • Pre-trained on a denoising autoencoding task
      • Auxiliary losses:
        • Token-level labeling
        • Sentence-level copying
      * Wei Zhao et al. 2019. Improving Grammatical Error Correction via Pre-Training a Copy-Augmented Architecture with Unlabeled Data. NAACL 2019.
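The slide compresses copy augmentation into one line. As we understand Zhao et al. (2019), the output distribution at each decoding step mixes a generation distribution with a copy distribution, p(w) = (1 − α)·p_gen(w) + α·p_copy(w), where p_copy is the softmax-normalized encoder-decoder attention over source positions, pooled per word. The sketch below shows only that mixing step for one decoding step; the vocabulary, logits, attention scores, and the fixed `alpha_copy` are made-up values (in the real model α is predicted per step), and the function names are hypothetical.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def copy_augmented_distribution(
    vocab: list[str],
    gen_logits: list[float],
    source_tokens: list[str],
    attn_scores: list[float],
    alpha_copy: float,
) -> dict[str, float]:
    """Return p(w) = (1 - alpha_copy) * p_gen(w) + alpha_copy * p_copy(w)."""
    p_gen = dict(zip(vocab, softmax(gen_logits)))
    # Copy distribution: softmax-normalized attention over source positions,
    # summed per word so repeated source words pool their weight.
    p_copy: dict[str, float] = {}
    for token, weight in zip(source_tokens, softmax(attn_scores)):
        p_copy[token] = p_copy.get(token, 0.0) + weight
    # Copying can yield source words that are missing from the output vocabulary.
    words = set(p_gen) | set(p_copy)
    return {
        w: (1 - alpha_copy) * p_gen.get(w, 0.0) + alpha_copy * p_copy.get(w, 0.0)
        for w in words
    }


# Made-up numbers for one decoding step.
dist = copy_augmented_distribution(
    vocab=["the", "bus", "is", "expensive"],
    gen_logits=[0.1, 0.2, 0.3, 2.0],
    source_tokens=["travel", "by", "bus", "is", "exspensive"],
    attn_scores=[0.0, 0.0, 0.0, 0.0, 3.0],
    alpha_copy=0.3,
)
print(max(dist, key=dist.get))  # most probable next token: expensive
```

Because both components are proper distributions, the mixture still sums to 1, and it can assign probability to a source word (here "exspensive") that the generator's vocabulary lacks.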
  13. Section 3: Our approach
  14. Pipeline
      • Preprocessing: context-aware spellchecker, BPE segmentation
      • Pre-training: error extraction, perturbation
      • Training
      • Fine-tuning
      • Postprocessing: <unk> edit removal, re-ranking, error type control
      • Sequential transfer learning: pre-training → training → fine-tuning
  15. Preprocessing
      • Context-aware spellchecker
        • Example: "esay" should become "essay" in "This is an esay about my favorite sport." but "easy" in "This is an esay question."
        • Incorporates context using a pre-trained neural language model (LM)
        • Fixes casing errors with a list of proper nouns
      • Byte pair encoding (BPE) segmentation
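The "esay" example shows why context matters: "essay" and "easy" are both one edit away, so only the surrounding words can decide. A minimal sketch of the idea follows, under loud assumptions: candidates come from a Norvig-style edit-distance-1 generator (not hunspell), the dictionary is a tiny hypothetical word set, and a toy bigram count table (`BIGRAM_COUNTS`, invented for illustration) stands in for the pre-trained neural LM. Casing correction with a proper-noun list is not shown.

```python
import math

# Hypothetical dictionary and bigram counts; they stand in for hunspell's word
# list and for the pre-trained neural LM used on the slide.
VOCAB = {"this", "is", "an", "essay", "easy", "about", "my", "favorite",
         "sport", "question", "."}
BIGRAM_COUNTS = {
    ("an", "essay"): 5, ("an", "easy"): 5,
    ("essay", "about"): 8, ("easy", "about"): 1,
    ("essay", "question"): 1, ("easy", "question"): 8,
}
LETTERS = "abcdefghijklmnopqrstuvwxyz"


def edits1(word: str) -> set[str]:
    """Strings one deletion, transposition, substitution, or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    transposes = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    replaces = {a + c + b[1:] for a, b in splits if b for c in LETTERS}
    inserts = {a + c + b for a, b in splits for c in LETTERS}
    return deletes | transposes | replaces | inserts


def fluency(tokens: list[str]) -> float:
    """Toy context score (sum of log add-one bigram counts); higher is better."""
    lowered = [t.lower() for t in tokens]
    return sum(math.log(BIGRAM_COUNTS.get(pair, 0) + 1)
               for pair in zip(lowered, lowered[1:]))


def spellcheck(sentence: str) -> str:
    tokens = sentence.replace(".", " .").split()
    for i, token in enumerate(tokens):
        if token.lower() in VOCAB:
            continue
        # Sorted so that ties are broken deterministically.
        candidates = sorted(edits1(token.lower()) & VOCAB)
        if candidates:
            # Pick the candidate that reads best in its sentence context.
            tokens[i] = max(candidates,
                            key=lambda c: fluency(tokens[:i] + [c] + tokens[i + 1:]))
    return " ".join(tokens).replace(" .", ".")


print(spellcheck("This is an esay about my favorite sport."))
print(spellcheck("This is an esay question."))
```

With these toy counts the first sentence gets "essay" and the second gets "easy". A context-free spellchecker would have to return the same suggestion for both.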
  16. Pre-training
      • Pre-trains a seq2seq model on a denoising task
      • Uses realistic noising scenarios
        • Token-based approach
          • Extracts human edits from annotated GEC corpora
          • e.g., missing punctuation (adding a comma), preposition errors (of→at), verb tenses (has→have)
        • Type-based approach
          • Uses a priori knowledge
          • Replaces prepositions with other prepositions, nouns with their singular/plural versions, and verbs with one of their inflected versions
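One plausible way to build the token-based approach's dictionary is to align each annotated (erroneous, corrected) pair and record every edit in reverse, correct → erroneous, so the entries can later be used to corrupt clean text. The slide does not say how the authors extract edits; this sketch assumes whitespace-tokenized input and a `difflib` alignment, and both `build_token_edit_dict` and the example pairs are hypothetical.

```python
import difflib
from collections import Counter, defaultdict


def build_token_edit_dict(pairs: list[tuple[str, str]]) -> dict[str, Counter]:
    """Map each corrected span to a Counter of the erroneous spans it replaced.

    The direction is reversed (correct -> erroneous) so the dictionary can later
    inject realistic errors into clean text. Inputs are whitespace-tokenized.
    """
    edit_dict = defaultdict(Counter)
    for erroneous, corrected in pairs:
        src, tgt = erroneous.split(), corrected.split()
        matcher = difflib.SequenceMatcher(a=src, b=tgt, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag != "equal":
                # An empty erroneous span means the human inserted the token,
                # so deleting it from clean text recreates the error.
                edit_dict[" ".join(tgt[j1:j2])][" ".join(src[i1:i2])] += 1
    return edit_dict


pairs = [  # hypothetical (erroneous, corrected) sentence pairs
    ("I am good of math .", "I am good at math ."),
    ("She have a dog .", "She has a dog ."),
    ("However I stayed home .", "However , I stayed home ."),
]
for correct, variants in build_token_edit_dict(pairs).items():
    print(repr(correct), dict(variants))
```

The three pairs mirror the slide's categories: a preposition error (at → of), an agreement error (has → have), and missing punctuation (a comma that can be dropped).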
  17. Pre-training: generating pre-training data
      • Generates erroneous sentences from high-quality English corpora
      • If a token exists in the dictionary of token edits, a token-based error is generated with probability 0.9
      • If a token is not processed by the token-based step, a type-based error is applied

        Source          Size (sentences)
        Gutenberg       11.6M
        Tatoeba         1.17M
        WikiText-103    3.93M
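The generation rule above can be sketched as a per-token loop: try a token-based error with probability 0.9 when the token is in the edit dictionary, and otherwise fall back to a type-based error. Only the 0.9 comes from the slide. The edit dictionary, the word lists, the type-based probability `p_type`, and the function names are hypothetical, and the input is assumed to be whitespace-tokenized.

```python
import random
from typing import Optional

# Hypothetical token-edit dictionary (correct -> erroneous variants with counts),
# of the kind extracted from annotated GEC corpora. "" means "delete the token".
TOKEN_EDITS = {"at": {"of": 3, "in": 1}, "has": {"have": 2}, ",": {"": 5}}

# Hypothetical a priori word lists for the type-based approach.
PREPOSITIONS = ["at", "in", "on", "of", "for", "to"]
NOUN_NUMBER = {"dog": "dogs", "dogs": "dog", "book": "books", "books": "book"}
VERB_FORMS = {"go": ["goes", "went", "going", "gone"],
              "has": ["have", "had", "having"]}


def type_based_error(token: str, rng: random.Random) -> Optional[str]:
    """Corrupt a token using a priori knowledge, or return None if no rule applies."""
    if token in PREPOSITIONS:
        return rng.choice([p for p in PREPOSITIONS if p != token])
    if token in NOUN_NUMBER:
        return NOUN_NUMBER[token]
    if token in VERB_FORMS:
        return rng.choice(VERB_FORMS[token])
    return None


def corrupt(sentence: str, rng: random.Random,
            p_token: float = 0.9, p_type: float = 0.1) -> str:
    """Inject errors into a clean, whitespace-tokenized sentence.

    p_token = 0.9 follows the slide; the default p_type is an assumption.
    """
    out = []
    for token in sentence.split():
        if token in TOKEN_EDITS and rng.random() < p_token:
            # Token-based error: sample a human-like variant, weighted by counts.
            variants = TOKEN_EDITS[token]
            token = rng.choices(list(variants), weights=list(variants.values()))[0]
        elif rng.random() < p_type:
            # Token not processed by the token-based step: try a type-based error.
            replacement = type_based_error(token, rng)
            if replacement is not None:
                token = replacement
        if token:  # an empty variant deletes the token
            out.append(token)
    return " ".join(out)


rng = random.Random(0)
print(corrupt("She has been at home , reading a book .", rng))
```

Passing a seeded `random.Random` keeps the generated corpus reproducible. Sampling variants by their corpus counts makes frequent human errors appear more often in the synthetic data.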
  18. Training and Fine-tuning
      • Models
        • Transformer*
        • Copy-augmented Transformer
      • Fine-tuning
        • The development and test sets both come from the same source (WI+Locness)
        • Uses smaller learning rates
      * Ashish Vaswani et al. 2017. Attention Is All You Need. Advances in Neural Information Processing Systems.
  19. Postprocessing
      • <unk> recovery
        • BPE tokenization changes infrequent tokens to <unk>
      • LM re-ranking
        • For each changed place, generates sentences with and without the correction, and computes their perplexity
      • Error type control
        • Randomly chooses some error categories to drop and computes the ERRANT F0.5 score on the validation set
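LM re-ranking can be sketched as an exhaustive search: align the source with the model's output, then score every accept/reject combination of the changed places by perplexity and keep the most fluent one. The slide does not name the language model, so this assumes a toy add-one-smoothed bigram LM trained on a tiny hypothetical corpus. The 2^k enumeration is practical only when a sentence has few edits, and all function names are illustrative.

```python
import difflib
import itertools
import math
from collections import Counter

# Tiny hypothetical corpus; a toy bigram LM trained on it stands in for the
# language model used for re-ranking (the slide does not specify one).
CORPUS = [
    "travelling by bus is expensive and boring .",
    "the bus is expensive .",
    "the film was boring .",
    "i am interested in travelling .",
]


def train_bigram_lm(corpus: list[str]):
    unigrams, bigrams = Counter(), Counter()
    for line in corpus:
        tokens = ["<s>"] + line.split() + ["</s>"]
        unigrams.update(tokens[:-1])  # counts of tokens used as bigram contexts
        bigrams.update(zip(tokens, tokens[1:]))
    vocab_size = len(set(unigrams) | {"</s>"})
    return unigrams, bigrams, vocab_size


def perplexity(tokens: list[str], lm) -> float:
    """Add-one-smoothed bigram perplexity; lower means more fluent."""
    unigrams, bigrams, vocab_size = lm
    seq = ["<s>"] + tokens + ["</s>"]
    log_prob = sum(math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab_size))
                   for a, b in zip(seq, seq[1:]))
    return math.exp(-log_prob / (len(seq) - 1))


def rerank(source: str, hypothesis: str, lm) -> str:
    """Try every accept/reject combination of the changed spans; keep the most fluent."""
    src, hyp = source.split(), hypothesis.split()
    ops = difflib.SequenceMatcher(a=src, b=hyp, autojunk=False).get_opcodes()
    changed = [k for k, op in enumerate(ops) if op[0] != "equal"]
    best_ppl, best_tokens = math.inf, hyp
    for mask in itertools.product([False, True], repeat=len(changed)):
        accepted = dict(zip(changed, mask))
        tokens: list[str] = []
        for k, (tag, i1, i2, j1, j2) in enumerate(ops):
            if tag == "equal" or accepted[k]:
                tokens.extend(hyp[j1:j2])  # unchanged text or an accepted correction
            else:
                tokens.extend(src[i1:i2])  # rejected correction: keep the source
        ppl = perplexity(tokens, lm)
        if ppl < best_ppl:
            best_ppl, best_tokens = ppl, tokens
    return " ".join(best_tokens)


lm = train_bigram_lm(CORPUS)
print(rerank("travel by bus is exspensive .", "travelling by bus is expensive .", lm))
print(rerank("the bus is expensive .", "the buses is expensive .", lm))
```

In the first call both corrections lower perplexity and are kept. In the second, the unnecessary "bus → buses" change makes the sentence less fluent under the toy LM, so re-ranking rejects it and restores the source token.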
  20. Section 4: Results
  21. Results
  22. Context-Aware Spellchecking
      • Our spellchecker adds context to hunspell using a pre-trained neural language model (LM)
      • [Figure: effect of adding the LM-based approach and of fixing casing issues]
  23. Comparison of Error Generation
      • The performance gap narrows on the Restricted Track
      • Our pre-training functions as a proxy for training
  24. Results by Error Type
      • Token-based error generation
      • Type-based error generation
      • Context-aware spellchecker
      • It is challenging to match human annotators' "naturalness" edits
  25. Questions