This document summarizes experiments comparing large pre-trained language models for machine translation. In a machine translation challenge, a smaller Marian model matched or outperformed much larger pre-trained models, contrary to expectations. This suggests that sheer model size does not by itself improve translation quality, and that current automatic evaluation metrics have limited discriminative power. Human evaluation therefore remains essential for a full assessment of machine translation quality.
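One reason automatic metrics are limited is that most of them reward surface overlap with a single reference translation, so an adequate paraphrase can score poorly. The toy metric below (a clipped unigram precision, loosely modeled on BLEU's unigram component; it is an illustrative sketch, not the metric used in the experiments) makes this failure mode concrete:

```python
from collections import Counter

def unigram_precision(hypothesis: str, reference: str) -> float:
    """Fraction of hypothesis tokens that also appear in the reference,
    with counts clipped to the reference (as in BLEU's unigram term)."""
    hyp_tokens = hypothesis.lower().split()
    ref_counts = Counter(reference.lower().split())
    matched = sum(min(count, ref_counts[tok])
                  for tok, count in Counter(hyp_tokens).items())
    return matched / len(hyp_tokens) if hyp_tokens else 0.0

reference  = "the cat sat on the mat"
literal    = "the cat sat on the mat"         # identical to the reference
paraphrase = "a feline was sitting on a rug"  # adequate, but different wording

print(unigram_precision(literal, reference))     # 1.0
print(unigram_precision(paraphrase, reference))  # ~0.14, despite being adequate
```

An identical hypothesis scores perfectly, while a perfectly acceptable paraphrase scores near zero; this is why surface-overlap metrics can fail to separate systems of similar quality, and why human judgment is still needed.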