The document applies neural reinforcement learning to grammatical error correction, using an encoder-decoder model with attention. Such models are typically trained with maximum likelihood estimation, which has two drawbacks: it optimizes at the word level rather than the sentence level, and it suffers from exposure bias, a mismatch between training, where the model conditions on reference prefixes, and testing, where it conditions on its own predictions. The document proposes using reinforcement learning to directly optimize the expected sentence-level reward of an evaluation metric. In the experiment, a model trained this way on a grammatical error correction task achieves a higher GLEU score than a model trained with maximum likelihood estimation.
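
The idea of optimizing an expected sentence-level reward can be illustrated with a minimal policy-gradient (REINFORCE-style) sketch. This is not the document's model or training procedure. It rests on several simplifying assumptions: the policy is a softmax over three fixed candidate corrections rather than an encoder-decoder, `toy_reward` is a hypothetical position-matching score standing in for GLEU, and the baseline is simply the mean reward of the sampled batch. All names, sentences, and hyperparameters here are illustrative.

```python
import math
import random

# Illustrative data (not from the document): one reference and three candidates.
REFERENCE = "She goes to school every day"
CANDIDATES = [
    "She go to school every day",
    "She goes to school every day",
    "She going to school every days",
]


def softmax(logits):
    """Turn unnormalized scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_reward(hypothesis, reference):
    """Hypothetical sentence-level reward: fraction of matching positions.

    A stand-in for a real metric such as GLEU, scored on the whole sentence.
    """
    hyp, ref = hypothesis.split(), reference.split()
    matches = sum(h == r for h, r in zip(hyp, ref))
    return matches / max(len(hyp), len(ref))


def expected_reward(logits, rewards):
    """Exact expected reward under the current policy."""
    return sum(p * r for p, r in zip(softmax(logits), rewards))


def reinforce_step(logits, rewards, rng, num_samples=8, lr=0.5):
    """One sampled policy-gradient update on the candidate logits."""
    probs = softmax(logits)
    # Sample whole sentences from the model's own distribution.
    samples = rng.choices(range(len(logits)), weights=probs, k=num_samples)
    # Mean sampled reward as a simple baseline (an assumption of this sketch).
    baseline = sum(rewards[i] for i in samples) / num_samples
    grad = [0.0] * len(logits)
    for i in samples:
        advantage = rewards[i] - baseline
        for j in range(len(logits)):
            # d log p(i) / d logit_j = 1[i == j] - p(j)
            indicator = 1.0 if i == j else 0.0
            grad[j] += advantage * (indicator - probs[j]) / num_samples
    # Gradient ascent: raise the probability of above-baseline sentences.
    return [l + lr * g for l, g in zip(logits, grad)]


def train(steps=200, seed=0):
    """Run the toy loop; return expected reward before and after, and final probabilities."""
    rng = random.Random(seed)
    rewards = [toy_reward(c, REFERENCE) for c in CANDIDATES]
    logits = [0.0] * len(CANDIDATES)
    before = expected_reward(logits, rewards)
    for _ in range(steps):
        logits = reinforce_step(logits, rewards, rng)
    return before, expected_reward(logits, rewards), softmax(logits)


if __name__ == "__main__":
    before, after, probs = train()
    print(f"expected reward before: {before:.3f}, after: {after:.3f}")
    print("final candidate probabilities:", [round(p, 3) for p in probs])
```

The update never compares individual words with the reference inside the loss. Each sampled sentence is scored as a whole, and the samples come from the model's own distribution, which is how this kind of training addresses the two drawbacks named above.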