The document discusses why neural machine translation (NMT) models produce translations of more appropriate length than statistical machine translation (SMT) models. It argues that SMT models tended to generate shorter translations because they were tuned to optimize BLEU, whereas NMT models are trained to maximize likelihood and therefore produce translations whose length better matches the source text. The document then demonstrates this with a toy copying task, in which an NMT model matches the length of its input strings more accurately than SMT models do.
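To make the copy-task setup concrete, here is a minimal sketch of how such an experiment might be framed. All names and the "truncating baseline" are hypothetical illustrations, not taken from the source: the dataset pairs each random string with an exact copy of itself, and a length ratio of 1.0 means output lengths match input lengths perfectly.

```python
import random

random.seed(0)  # deterministic toy data

def make_copy_dataset(n_examples=1000, max_len=20, vocab="abcdefghij"):
    """Generate (source, target) pairs where the target is an exact copy."""
    data = []
    for _ in range(n_examples):
        length = random.randint(1, max_len)
        seq = "".join(random.choice(vocab) for _ in range(length))
        data.append((seq, seq))
    return data

def length_ratio(hypotheses, sources):
    """Average output/input length ratio; 1.0 means lengths match exactly."""
    return sum(len(h) for h in hypotheses) / sum(len(s) for s in sources)

data = make_copy_dataset()
sources = [s for s, _ in data]

# A perfect copier (stand-in for a well-trained model) matches length exactly.
perfect = list(sources)

# A truncating baseline (stand-in for a length-biased system) drops tokens.
truncated = [s[: max(1, len(s) - 2)] for s in sources]

print(round(length_ratio(perfect, sources), 3))    # 1.0
print(round(length_ratio(truncated, sources), 3))  # below 1.0
```

The length ratio is a simple diagnostic: a system biased toward short outputs scores below 1.0, while one that reproduces input length scores 1.0.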