This document discusses several automatic evaluation metrics for machine translation:

- BLEU counts matching n-grams between the reference and the translation. It ignores word position, and because n-gram precision alone favors shorter outputs, BLEU multiplies the score by a brevity penalty (a minimal sketch follows this list).
- METEOR explicitly aligns words between the translation and the reference, allowing stem, synonym, and paraphrase matches, and combines unigram precision and recall into a single score (see the library example below).
- RIBES computes a rank correlation coefficient between the word order of the reference and that of the translation, which makes it suitable for language pairs with very different word order, where word-for-word matching is difficult (a simplified sketch follows).
- Statistical testing such as bootstrap resampling is used to determine whether a difference in evaluation scores between two systems is statistically significant (see the paired-bootstrap sketch at the end).
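
A minimal from-scratch sketch of sentence-level BLEU, assuming whitespace-tokenized input and omitting the smoothing that real implementations (e.g. sacreBLEU or NLTK) apply when an n-gram precision is zero:

```python
from collections import Counter
import math

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def sentence_bleu(reference, hypothesis, max_n=4):
    """Geometric mean of clipped n-gram precisions times a brevity penalty.
    `reference` and `hypothesis` are lists of tokens."""
    precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = Counter(ngrams(hypothesis, n))
        ref_counts = Counter(ngrams(reference, n))
        # Clip each hypothesis n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        precisions.append(overlap / max(sum(hyp_counts.values()), 1))
    if min(precisions) == 0:
        return 0.0  # no smoothing: any zero n-gram precision zeroes the score
    # Brevity penalty: penalize hypotheses shorter than the reference.
    bp = 1.0 if len(hypothesis) >= len(reference) else math.exp(
        1 - len(reference) / len(hypothesis))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

ref = "it is a guide to action that ensures that the military will heed party commands".split()
hyp = "it is a guide to action which ensures that the military obeys the commands of the party".split()
print(round(sentence_bleu(ref, hyp), 3))
```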
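
METEOR is usually taken from a library rather than reimplemented, since it depends on stemming and WordNet synonym lookup. The sketch below assumes NLTK is installed with its WordNet data (`nltk.download('wordnet')`) and that the inputs are pre-tokenized, as recent NLTK versions require:

```python
# Requires: pip install nltk ; then nltk.download('wordnet') once.
from nltk.translate.meteor_score import meteor_score

reference = "the cat sat on the mat".split()
hypothesis = "a cat was sitting on the mat".split()

# meteor_score takes a list of tokenized references and one tokenized hypothesis.
print(meteor_score([reference], hypothesis))
```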
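
The following is a deliberately simplified illustration of the idea behind RIBES: normalized Kendall's tau over the reference positions of aligned words, scaled by unigram precision. The real metric additionally disambiguates repeated words using their context and applies a brevity penalty:

```python
from itertools import combinations

def simple_ribes(reference, hypothesis, alpha=0.25):
    """Simplified RIBES-style score for two token lists.
    Words are aligned only when they occur exactly once in both sentences."""
    ref_pos = {w: i for i, w in enumerate(reference) if reference.count(w) == 1}
    # Reference positions of aligned hypothesis words, in hypothesis order.
    order = [ref_pos[w] for w in hypothesis
             if hypothesis.count(w) == 1 and w in ref_pos]
    if len(order) < 2:
        return 0.0
    pairs = list(combinations(order, 2))
    concordant = sum(1 for a, b in pairs if a < b)
    nkt = concordant / len(pairs)            # normalized Kendall's tau in [0, 1]
    precision = len(order) / len(hypothesis)  # fraction of aligned hypothesis words
    return nkt * precision ** alpha

print(simple_ribes("the cat sat on the mat".split(),
                   "on the mat the cat sat".split()))
```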
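
Finally, a sketch of paired bootstrap resampling in the style of Koehn (2004). For simplicity it assumes each system is summarized by per-sentence scores (the numbers in the usage example are purely illustrative); the standard test instead resamples sentences and recomputes the corpus-level metric, such as BLEU, on each resampled test set:

```python
import random

def paired_bootstrap(scores_a, scores_b, n_samples=10_000, seed=0):
    """Resample the test set with replacement and count how often
    system A's total score beats system B's on the resampled sets."""
    assert len(scores_a) == len(scores_b)
    rng = random.Random(seed)
    n = len(scores_a)
    wins = 0
    for _ in range(n_samples):
        idx = [rng.randrange(n) for _ in range(n)]
        if sum(scores_a[i] for i in idx) > sum(scores_b[i] for i in idx):
            wins += 1
    # Approximate p-value for "A is not better than B":
    # small values suggest a statistically significant improvement of A.
    return 1.0 - wins / n_samples

# Hypothetical per-sentence scores for two systems on the same test set.
a = [0.31, 0.42, 0.28, 0.55, 0.47, 0.39]
b = [0.30, 0.35, 0.29, 0.50, 0.41, 0.38]
print(paired_bootstrap(a, b))
```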