The document discusses shortcomings of maximum likelihood estimation (MLE) for training neural encoder-decoder models for grammatical error correction (GEC). MLE maximizes word-level likelihood against the reference, so the training objective is mismatched with sentence-level correction quality, and at inference time the model cannot recover once it has predicted an incorrect word. To address this, the author proposes a neural reinforcement learning (NRL) approach in which the model directly optimizes the expected reward over the training data rather than word-level accuracy: parameter updates are driven by rewards computed on the model's own predicted outputs rather than by the references alone. Evaluation shows that the NRL model outperforms MLE and other baselines in both automatic and human evaluations.
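The expected-reward objective described above is typically optimized with a REINFORCE-style policy gradient: sample outputs from the model, score each sample with a sentence-level reward, and weight the log-likelihood gradient by the reward minus a baseline. The following is a minimal toy sketch of that update, not the paper's implementation; the candidate sentences, the token-overlap reward, and the mean-reward baseline are all illustrative assumptions (the actual work uses a GEC evaluation metric as the reward and a full encoder-decoder as the policy).

```python
import math
import random

random.seed(0)

# Toy "policy": a categorical distribution over candidate corrections,
# parameterized by logits. This stands in for the decoder's output
# distribution over full sentences.
candidates = ["he goes home", "he go home", "him goes home"]
reference = "he goes home"

def sentence_reward(hyp, ref):
    # Hypothetical sentence-level reward: fraction of positions that
    # match the reference (the real reward would be a GEC metric).
    h, r = hyp.split(), ref.split()
    return sum(a == b for a, b in zip(h, r)) / len(r)

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def reinforce_step(logits, lr=0.5, n_samples=20):
    # One REINFORCE update: sample outputs from the current policy,
    # then weight the log-likelihood gradient of each sample by
    # (reward - baseline). The mean sampled reward serves as a
    # simple variance-reducing baseline.
    probs = softmax(logits)
    idxs = random.choices(range(len(candidates)), weights=probs, k=n_samples)
    rewards = [sentence_reward(candidates[i], reference) for i in idxs]
    baseline = sum(rewards) / len(rewards)
    grad = [0.0] * len(logits)
    for i, r in zip(idxs, rewards):
        adv = r - baseline
        # Gradient of log softmax w.r.t. logits: one_hot(i) - probs.
        for j in range(len(logits)):
            grad[j] += adv * ((1.0 if j == i else 0.0) - probs[j]) / n_samples
    # Gradient ascent on expected reward.
    return [l + lr * g for l, g in zip(logits, grad)]

logits = [0.0, 0.0, 0.0]
for _ in range(200):
    logits = reinforce_step(logits)
print(softmax(logits))
```

After training, the policy concentrates probability on the fully correct candidate, because updates are driven by the sampled outputs' rewards rather than by per-word likelihood against the reference.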