BLEU evaluates machine translations by comparing their n-grams to those of reference human translations, while ROUGE evaluates machine summaries by measuring n-gram co-occurrence with human-written reference summaries. ROUGE is recall-oriented: it counts the fraction of the reference's n-grams that appear in the machine output. BLEU is precision-oriented: it counts the fraction of the machine output's n-grams that appear in the references. The two metrics are therefore complementary: a high BLEU score indicates that most of the words in the machine output appear in the references, while a high ROUGE score indicates that most of the words in the references appear in the machine output.
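
To make the precision/recall distinction concrete, here is a minimal sketch of the two counting schemes at the unigram level. The function names are illustrative, not from any library; full BLEU additionally uses clipped counts across multiple n-gram orders plus a brevity penalty, and ROUGE comes in several variants (ROUGE-N, ROUGE-L, and others), so treat this only as an illustration of the core counting idea:

```python
from collections import Counter

def ngram_counts(tokens, n=1):
    """Count the n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def overlap(candidate, reference, n=1):
    """Clipped count of n-grams shared between candidate and reference."""
    cand, ref = ngram_counts(candidate, n), ngram_counts(reference, n)
    return sum(min(count, ref[gram]) for gram, count in cand.items())

def bleu_style_precision(candidate, reference, n=1):
    """Matched n-grams divided by the number of n-grams in the candidate."""
    return overlap(candidate, reference, n) / max(len(candidate) - n + 1, 1)

def rouge_style_recall(candidate, reference, n=1):
    """Matched n-grams divided by the number of n-grams in the reference."""
    return overlap(candidate, reference, n) / max(len(reference) - n + 1, 1)

reference = "the cat sat on the mat".split()  # 6 unigrams
candidate = "the cat sat".split()             # 3 unigrams, all in the reference
print(f"precision: {bleu_style_precision(candidate, reference):.2f}")  # 1.00
print(f"recall:    {rouge_style_recall(candidate, reference):.2f}")    # 0.50
```

The example shows why the two views complement each other: the short candidate scores perfect precision because every word it produces appears in the reference, but only 0.50 recall because it covers half of the reference's words.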