NAIST at the HOO 2012 Shared Task
  
Keisuke Sakaguchi, Yuta Hayashibe, Shuhei Kondo, Lis Kanashiro, Tomoya Mizumoto, Mamoru Komachi, Yuji Matsumoto
  
Computational Linguistics Lab., Graduate School of Information Science, Nara Institute of Science and Technology (NAIST), Japan

Configurations of the system
Input → [Spelling correction, Determiner correction, Preposition correction] → Output
System Architecture for Spelling Error Correction
System Architecture for Preposition Error Correction
System Architecture for Determiner Error Correction
Experiment and Result
Spelling error correction
  • Targets unknown words
  • Open-source spelling checker: GNU Aspell
  • Candidates ranked by the Google Web 1T 5-gram language model
  • Preliminary experiment: precision 52.4%, recall 72.2%, F-score 60.7%
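The ranking step above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the tiny `NGRAM_COUNTS` table is a hypothetical stand-in for the Web 1T counts, the candidate list stands in for spell-checker suggestions, the scoring function (sum of log counts over the n-grams in the context window) is an assumption, and trigrams are used instead of 5-grams to keep the toy data small.

```python
import math

# Toy n-gram counts standing in for the Google Web 1T data (hypothetical values).
NGRAM_COUNTS = {
    ("i", "received", "your"): 1200,
    ("received", "your", "letter"): 900,
    ("i", "relieved", "your"): 3,
    ("relieved", "your", "letter"): 1,
}

def ngram_score(tokens, counts, n=3):
    """Sum of log(count + 1) over all n-grams in the token sequence."""
    score = 0.0
    for i in range(len(tokens) - n + 1):
        score += math.log(counts.get(tuple(tokens[i:i + n]), 0) + 1)
    return score

def rank_candidates(left, candidates, right, counts, n=3):
    """Order candidate corrections by the n-gram score of their context window."""
    scored = []
    for cand in candidates:
        # Keep at most n-1 tokens of context on each side of the candidate.
        window = left[-(n - 1):] + [cand] + right[:n - 1]
        scored.append((ngram_score(window, counts, n), cand))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [cand for _, cand in scored]

# Suggestions for the unknown word "recieved", as a spell checker might return them.
suggestions = ["relieved", "received"]
ranked = rank_candidates(["i"], suggestions, ["your", "letter"], NGRAM_COUNTS)
print(ranked)  # ['received', 'relieved']
```

The add-one inside the logarithm keeps unseen n-grams from producing `log(0)`; a real system would more likely use a smoothed language model probability.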
Preposition error correction
  • 12 target prepositions (Chodorow et al., 2010), including of, in, for, to, by, at, on, from, as, about (covering 91%)
  • Replacement and insertion errors: a single model for detection and correction
  • Deletion errors: focus on whether direct objects of verbs need prepositions
  • Syntactic and semantic features described in Tetreault et al. (2010)
  • Classifier: Maximum Entropy modeling
  • Trained on 2 versions of the FCE corpus:
      • “Gold” (corrected except for preposition errors)
      • “Original” (FCE plain texts)
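To make the "single model for detection and correction" idea concrete, here is a small sketch of a Maximum Entropy (multinomial logistic) classifier in plain Python. Everything specific in it is an assumption for illustration: the `head=...` and `obj=...` features are hypothetical and far simpler than the features of Tetreault et al. (2010), training uses plain stochastic gradient ascent, and the detection rule (flag an error whenever the model's best preposition differs from the written one) is only one possible choice.

```python
import math
from collections import defaultdict

def predict_proba(weights, labels, feats):
    """Softmax over per-label sums of feature weights (the MaxEnt model form)."""
    scores = {y: sum(weights[(y, f)] for f in feats) for y in labels}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    total = sum(exps.values())
    return {y: e / total for y, e in exps.items()}

def train_maxent(data, labels, epochs=200, lr=0.5):
    """Fit weights by stochastic gradient ascent on the conditional log-likelihood."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for feats, gold in data:
            probs = predict_proba(weights, labels, feats)
            for y in labels:
                # Gradient for binary features: observed minus expected count.
                grad = (1.0 if y == gold else 0.0) - probs[y]
                for f in feats:
                    weights[(y, f)] += lr * grad
    return weights

def check_preposition(weights, labels, feats, written):
    """Return (is_error, suggestion): detection and correction from one model."""
    probs = predict_proba(weights, labels, feats)
    best = max(labels, key=lambda y: probs[y])
    return best != written, best

# Hypothetical toy training data: (context features, correct preposition).
LABELS = ["in", "on", "at"]
TRAIN = [
    (["head=live", "obj=city"], "in"),
    (["head=live", "obj=london"], "in"),
    (["head=depend", "obj=weather"], "on"),
    (["head=rely", "obj=friend"], "on"),
    (["head=arrive", "obj=station"], "at"),
    (["head=look", "obj=picture"], "at"),
]
weights = train_maxent(TRAIN, LABELS)

# A learner wrote "depend in the weather"; the model flags it and proposes "on".
print(check_preposition(weights, LABELS, ["head=depend", "obj=weather"], "in"))  # (True, 'on')
```

Because one classifier yields both the flag and the replacement, no separate detection stage is needed. In practice a confidence threshold on the predicted probability would likely be added to protect precision.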
Determiner error correction
  • 3 target determiners: a, an, the (“an” was normalized to “a” in training and test)
  • Checks determiners at the left boundary of a noun phrase
  • 2 parser models: “normal” (trained on the normal treebank) vs. “mixed” (trained on the treebank plus a modified version in which articles at the left boundary of NPs were removed)
  • Feature vector representation for each NP, using syntax-based feature templates inspired by De Felice (2008)
  • Classifier: Passive-Aggressive algorithm
  • Training corpus: the CLC FCE dataset and BNC data, with the feature augmentation approach of Daumé III (2007)
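The last two items can be illustrated together. The sketch below combines the feature augmentation of Daumé III (2007), in which every feature is copied into a shared version and a domain-specific version, with a multiclass Passive-Aggressive (PA-I) update. It is a toy under stated assumptions: the feature names, the `none` label for "no determiner", the domain tags `fce` and `bnc`, and the aggressiveness parameter `C` are all hypothetical, and the poster does not say which PA variant was used.

```python
from collections import defaultdict

def augment(feats, domain):
    """Copy each feature into a shared version and a domain-specific version."""
    return [f"general:{f}" for f in feats] + [f"{domain}:{f}" for f in feats]

def score(weights, label, feats):
    """Linear score of a label for a set of binary features."""
    return sum(weights[(label, f)] for f in feats)

def pa_update(weights, labels, feats, gold, C=1.0):
    """One multiclass PA-I step; feats must be non-empty, unique binary features."""
    rival = max((y for y in labels if y != gold),
                key=lambda y: score(weights, y, feats))
    margin = score(weights, gold, feats) - score(weights, rival, feats)
    loss = max(0.0, 1.0 - margin)  # hinge loss against the best wrong label
    if loss > 0.0:
        # The step touches the gold and rival rows, so its squared norm is 2 * |feats|.
        tau = min(C, loss / (2.0 * len(feats)))
        for f in feats:
            weights[(gold, f)] += tau
            weights[(rival, f)] -= tau
    return loss

def predict(weights, labels, feats, domain):
    """Pick the highest-scoring label for the augmented features."""
    x = augment(feats, domain)
    return max(labels, key=lambda y: score(weights, y, x))

# Hypothetical toy data: (NP features, correct determiner, source corpus).
LABELS = ["a", "the", "none"]  # "an" is folded into "a", as on the poster
TRAIN = [
    (["head=book", "number=sg", "first_mention=yes"], "a", "fce"),
    (["head=sun", "number=sg", "first_mention=no"], "the", "bnc"),
    (["head=books", "number=pl", "first_mention=yes"], "none", "bnc"),
]
weights = defaultdict(float)
for _ in range(5):
    for feats, gold, domain in TRAIN:
        pa_update(weights, LABELS, augment(feats, domain), gold)

print(predict(weights, LABELS, ["head=book", "number=sg", "first_mention=yes"], "fce"))  # a
```

With augmentation, evidence that holds in both corpora accumulates on the `general:` copies, while corpus-specific behavior is absorbed by the `fce:` and `bnc:` copies, so the learner text and the native text can be trained on jointly.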
Future Work
  • Correcting spelling errors that result in existing words (e.g. *the → then)
  • Getting rich knowledge about verbs from VerbNet and FrameNet
  • Adding target determiners (this, my, etc.)
References
  • Martin Chodorow, Michael Gamon, and Joel Tetreault. 2010. The Utility of Article and Preposition Error Correction Systems for English Language Learners: Feedback and Assessment. Language Testing, 27(3):419–436.
  • Joel Tetreault, Jennifer Foster, and Martin Chodorow. 2010. Using Parse Features for Preposition Selection and Error Detection. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics: Short Papers, pages 353–358, Uppsala, Sweden.
  • Rachele De Felice. 2008. Automatic Error Detection in Non-native English. Ph.D. thesis, University of Oxford.
  • Hal Daumé III. 2007. Frustratingly Easy Domain Adaptation. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, pages 256–263, Prague, Czech Republic.
http://cl.naist.jp/en/ {keisuke-sa, yuta-h, shuhei-k, lis-k, tomoya-m, komachi, matsu}@is.naist.jp
Combined after revision
Summary
  • Spelling: spelling correction improved preposition error correction
  • Prepositions: performed better when trained on the “original” set
  • Determiners: the “mixed” model improved performance
[Results tables for the 8 different configurations (Runs): Preposition, Determiner]
*We re-evaluated Run2 because we submitted it under the same condition as Run0.
