LEPOR: A Robust Evaluation Metric for Machine Translation with Augmented Factors
Aaron L. F. HAN

Derek F. WONG

Lidia S. CHAO

Natural Language Processing & Portuguese-Chinese Machine Translation Laboratory (NLP2CT)

Department of Computer and Information Science, University of Macau

Summary

Conventional MT evaluation metrics tend either to consider few information factors about the translation content or to depend on many external linguistic resources and tools. The former results in low correlation with human judgments, while the latter makes the metrics complicated. This paper proposes a novel evaluation metric that employs rich factors without relying on any additional resource or tool. Experiments show state-of-the-art correlation with human judgments compared with BLEU, TER and Meteor, demonstrating that LEPOR is a robust and concise metric.

Methodology

LEPOR is designed to take a comprehensive set of variables into account while requiring no extra dataset or tool, with the aim of improving both the practical performance of the automatic metric and its ease of use. LEPOR combines two modified factors (sentence length penalty and n-gram position difference penalty) with two classic measures (precision and recall), using the main formula below:
$$\mathrm{LEPOR} = LP \times NPosPenal \times Harmonic(\alpha R, \beta P),$$

where

$$LP = \begin{cases} e^{1 - r/c} & \text{if } c < r \\ 1 & \text{if } c = r \\ e^{1 - c/r} & \text{if } c > r \end{cases}$$

$$NPosPenal = e^{-NPD}, \qquad NPD = \frac{1}{Length_{output}} \sum_{i=1}^{Length_{output}} |PD_i|,$$

$$Harmonic(\alpha R, \beta P) = \frac{\alpha + \beta}{\alpha/R + \beta/P}.$$
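To make the combination concrete, here is a minimal Python sketch of the sentence-level score, assuming the number of matched words (common_num), the output and reference lengths, and the per-word position differences |PD_i| have already been obtained from the word alignment; P and R are the unigram precision and recall defined later on this poster. It is an illustration of the formulas above, not the official implementation (see the GitHub link at the bottom of the poster).

```python
import math

def sentence_lepor(common_num, out_len, ref_len, pos_diffs, alpha=1.0, beta=1.0):
    """Illustrative sentence-level LEPOR score.

    common_num : number of aligned (matched) words between output and reference
    out_len    : length c of the system output sentence
    ref_len    : length r of the reference sentence
    pos_diffs  : list of |PD_i| values from the word alignment
    alpha,beta : weights of recall and precision in the harmonic mean
    """
    c, r = out_len, ref_len

    # Length penalty LP: penalise both shorter and longer outputs.
    if c < r:
        lp = math.exp(1.0 - r / c)
    elif c == r:
        lp = 1.0
    else:
        lp = math.exp(1.0 - c / r)

    # N-gram position difference penalty: NPosPenal = exp(-NPD),
    # with NPD the mean |PD_i| over the output length.
    npd = sum(abs(d) for d in pos_diffs) / c
    npos_penal = math.exp(-npd)

    # Unigram precision and recall, combined by a weighted harmonic mean.
    precision = common_num / c
    recall = common_num / r
    if precision == 0.0 or recall == 0.0:
        return 0.0
    harmonic = (alpha + beta) / (alpha / recall + beta / precision)

    return lp * npos_penal * harmonic

# Toy usage: 5-word output, 6-word reference, 4 matches, small position shifts.
print(sentence_lepor(4, 5, 6, [0.0, 0.2, 0.0, 0.4, 0.0]))
```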

Introduction
Since IBM proposed BLEU as an automatic metric for Machine Translation (MT) evaluation, many other methods have been proposed to improve on it. NIST added information weights to the evaluation factors. Meteor proposed an alternative way of calculating matched chunks to describe the degree of n-gram matching. Wong and Kit introduced position difference into the evaluation metric. Other evaluation metrics include TER and AMBER, a modified version of BLEU that attaches more kinds of penalty coefficients and combines n-gram precision and recall through the arithmetic average of F-measure. To compare the reliability of different MT evaluation metrics, the Spearman correlation coefficient with human judgments is commonly used.
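As an illustration of this meta-evaluation step, the short sketch below computes the system-level Spearman rank correlation between a metric's scores and human judgments for a set of MT systems; the scores are invented for the example.

```python
from scipy.stats import spearmanr

# Hypothetical system-level scores for five MT systems on one language pair.
metric_scores = [0.31, 0.27, 0.35, 0.22, 0.29]   # scores from an automatic metric
human_scores  = [0.52, 0.45, 0.60, 0.38, 0.47]   # human judgment scores

rho, _ = spearmanr(metric_scores, human_scores)
print(f"Spearman correlation with human judgment: {rho:.2f}")
```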
Some MT evaluation metrics take part-of-speech (POS) information into consideration using linguistic tools such as parsers or POS taggers. For example, Machacek and Bojar proposed the SemPOS metric, focusing on English and Czech words.
To reduce the human effort required for evaluation, methodologies that do not need reference translations are also emerging. For example, MP4IBM1 uses the IBM1 model to calculate scores based on morphemes, POS 4-grams and lexicon probabilities; however, it requires a large parallel bilingual corpus, POS taggers, etc.
Since different words or phrases can express the same meaning, it is common in the literature to consult auxiliary synonym libraries during evaluation. Meteor, for example, matches unigrams on surface words and their stems and additionally uses a synonym database.
Previous MT evaluation metrics thus suffer from two main problems: they either cover an incomplete set of factors (e.g. BLEU focuses on precision) or rely on many external tools and databases. The first weakness leads to unreasonable judgments; the second makes the metric complicated, time-consuming and not portable across languages. To address these weaknesses, the novel metric LEPOR was designed to employ rich factors without relying on any additional resource.

Precision and recall are computed on the aligned words:

$$P = \frac{common\_num}{system\_length}, \qquad R = \frac{common\_num}{reference\_length}.$$

Two system-level scores are defined:

$$\mathrm{LEPOR}_A = \frac{1}{SentNum} \sum_{i=1}^{SentNum} \mathrm{LEPOR}_i, \qquad \mathrm{LEPOR}_B = LP \times NPosPenal \times Harmonic(\alpha R, \beta P),$$

i.e. LEPOR-A averages the sentence-level LEPOR scores, while LEPOR-B combines the factor values computed at the system level.

LP is the length penalty, defined to penalise system outputs that are either longer or shorter (output length c) than the reference translation (reference length r).
NPD is the n-gram position difference penalty. The NPosPenal value compares the word order of the hypothesis translation against the reference translation, and is normalized.
Length_output is the length of the system output sentence, and PD_i is the n-gram position difference value of the i-th aligned word between the output and reference sentences.
Computing the NPD value involves two steps: aligning and calculating. First, the context-dependent n-gram word alignment: we take the context into account and give it higher priority, meaning that the surrounding context (neighbouring words) of a candidate word is used to select the better matching pair between the output and the reference. If several candidates have matching context, or none of them does, the nearest match is chosen as a backup. The alignment direction is from the output sentence to the reference translation. A detailed description of the algorithm is given in our paper; a simplified sketch follows below.
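The sketch aligns each output word to a reference occurrence of the same token, preferring the candidate whose neighbouring words also match and falling back to the nearest occurrence. Taking |PD_i| as the gap between length-normalized positions and skipping unmatched words are assumptions of this sketch, not necessarily the exact choices of the paper.

```python
def align_and_npd(output, reference, window=2):
    """Simplified context-dependent unigram alignment plus NPD computation.

    output, reference : lists of tokens (already tokenized)
    window            : number of neighbouring words considered as context
    Returns (alignment, npd) where alignment maps output index -> reference index.
    """
    def context_matches(i, j):
        # Count matching neighbours within the window on both sides.
        score = 0
        for k in range(1, window + 1):
            if i - k >= 0 and j - k >= 0 and output[i - k] == reference[j - k]:
                score += 1
            if i + k < len(output) and j + k < len(reference) and output[i + k] == reference[j + k]:
                score += 1
        return score

    alignment = {}
    used = set()
    for i, word in enumerate(output):
        candidates = [j for j, w in enumerate(reference) if w == word and j not in used]
        if not candidates:
            continue  # unmatched output word
        # Prefer the candidate with the most matching context; break ties by nearness.
        best = max(candidates, key=lambda j: (context_matches(i, j), -abs(i - j)))
        alignment[i] = best
        used.add(best)

    # NPD: mean absolute difference of length-normalized positions over the output.
    diffs = [abs((i + 1) / len(output) - (j + 1) / len(reference)) for i, j in alignment.items()]
    npd = sum(diffs) / len(output)
    return alignment, npd

out = "the cat sat on mat".split()
ref = "the cat sat on the mat".split()
print(align_and_npd(out, ref))
```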
Harmonic(αR, βP) denotes the harmonic combination of recall and precision, where α and β are two parameters designed to adjust the relative weight of R (recall) and P (precision).
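For completeness, a brief sketch of the two system-level strategies defined earlier, assuming the per-sentence scores and factor values are already available; averaging each factor over sentences for LEPOR-B is an assumption of this sketch, not necessarily the paper's exact aggregation.

```python
def lepor_system_a(sentence_scores):
    # LEPOR-A: arithmetic mean of the sentence-level LEPOR scores.
    return sum(sentence_scores) / len(sentence_scores)

def lepor_system_b(lp_values, npos_values, recalls, precisions, alpha=1.0, beta=1.0):
    # LEPOR-B: combine the factors at the system level (here each factor is
    # first averaged over sentences, an assumption of this sketch).
    mean = lambda xs: sum(xs) / len(xs)
    lp, npos = mean(lp_values), mean(npos_values)
    r, p = mean(recalls), mean(precisions)
    harmonic = (alpha + beta) / (alpha / r + beta / p)
    return lp * npos * harmonic
```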

Results
We trained LEPOR (including the parameters α and β) on the ACL WMT 2008 data (EN: English, ES: Spanish, DE: German, FR: French, CZ: Czech). For the context-dependent n-gram word alignment, we set n to 2 on all corpora, meaning that the two preceding and the two following words are considered as context information. We used the ACL WMT 2011 MT evaluation corpora for testing.
We compared the scores produced by LEPOR against the three "gold standard" metrics BLEU, TER and Meteor (version 1.3). In addition, we selected the latest AMBER (a modified version of BLEU) and MP4IBM1 as further representatives against which to examine the quality of LEPOR. The correlation results are shown in the table below:
Correlation Score with Human Judgment

Evaluation system  |      other-to-English       |      English-to-other       | Mean
                   | CZ-EN  DE-EN  ES-EN  FR-EN  | EN-CZ  EN-DE  EN-ES  EN-FR  | score
-------------------+-----------------------------+-----------------------------+------
LEPOR-B            |  0.93   0.62   0.96   0.89  |  0.71   0.36   0.88   0.84  | 0.77
LEPOR-A            |  0.95   0.61   0.96   0.88  |  0.68   0.35   0.89   0.83  | 0.77
AMBER              |  0.88   0.59   0.86   0.95  |  0.56   0.53   0.87   0.84  | 0.76
Meteor-1.3-RANK    |  0.91   0.71   0.88   0.93  |  0.65   0.30   0.74   0.85  | 0.75
BLEU               |  0.88   0.48   0.90   0.85  |  0.65   0.44   0.87   0.86  | 0.74
TER                |  0.83   0.33   0.89   0.77  |  0.50   0.12   0.81   0.84  | 0.64
MP4IBM1            |  0.91   0.56   0.12   0.08  |  0.76   0.91   0.71   0.61  | 0.58

Conclusion and further work

The results show that LEPOR obtained the highest mean scores among the metrics. BLEU, AMBER and Meteor-1.3 performed unevenly, with better correlation on some language pairs and worse on others, resulting in mid-level mean scores overall. TER and MP4IBM1 obtained the worst mean correlations. The results indicate that LEPOR is a robust metric built from augmented features, and also a concise model that uses no external tool. MP4IBM1 does not need reference translations, but its correlation was low. The results also reveal that, although the tested metrics yielded high system-level correlations with human judgments on certain language pairs, their mean scores over the eight corpora (ranging from 0.58 to 0.77 only) are still far from satisfactory.
Several directions for further work are worth pursuing: testing with a synonym thesaurus (the translation of a word can be re-expressed in different ways, such as multi-word expressions or paraphrases), evaluating the effectiveness on languages of different typologies, and employing multiple references (an alternative to synonyms is the use of multiple references during evaluation).

Selected references
Papineni, K., Roukos, S., Ward, T. and Zhu, W. J. (2002).
BLEU: a method for automatic evaluation of machine
translation. In Proceedings of the 40th Annual Meeting of the
Association for Computational Linguistics (ACL 2002),
Philadelphia, PA, USA.
Doddington, G. (2002). Automatic evaluation of machine
translation quality using n-gram co-occurrence statistics. In
Proceedings of the second international conference on
Human Language Technology Research (HLT 2002),
California, USA.
Denkowski, M. and Lavie, A. (2011). Meteor 1.3: Automatic
metric for reliable optimization and evaluation of machine
translation systems. In Proceedings of the Sixth Workshop on
Statistical Machine Translation of the Association for
Computational Linguistics, Edinburgh, Scotland, UK.
Wong, B. T-M and Kit, C. (2008). Word choice and word
position for automatic MT evaluation. In Workshop:
MetricsMATR of the Association for Machine Translation in
the Americas (AMTA), Waikiki, USA.
Snover, M., Dorr, B., Schwartz, R., Micciulla, L. and Makhoul,
J. (2006). A study of translation edit rate with targeted
human annotation. In Proceedings of the Conference of the
Association for Machine Translation in the Americas
(AMTA), Boston, USA.


Acknowledgments: This work is partially supported by the Research Committee of University of Macau and Science and Technology Development Fund of Macau under the grants UL019B/09-Y3/EEE/LYP01/FST and 057/2009/A2.

Presented at the 24th International Conference on Computational Linguistics (COLING 2012). Open-source code: https://github.com/aaronlifenghan/aaron-project-lepor

More Related Content

What's hot

IRJET- Automatic Language Identification using Hybrid Approach and Classifica...
IRJET- Automatic Language Identification using Hybrid Approach and Classifica...IRJET- Automatic Language Identification using Hybrid Approach and Classifica...
IRJET- Automatic Language Identification using Hybrid Approach and Classifica...IRJET Journal
 
A Study Of Statistical Models For Query Translation :Finding A Good Unit Of T...
A Study Of Statistical Models For Query Translation :Finding A Good Unit Of T...A Study Of Statistical Models For Query Translation :Finding A Good Unit Of T...
A Study Of Statistical Models For Query Translation :Finding A Good Unit Of T...iyo
 
Doppl Development Introduction
Doppl Development IntroductionDoppl Development Introduction
Doppl Development IntroductionDiego Perini
 
Indexing of Arabic documents automatically based on lexical analysis
Indexing of Arabic documents automatically based on lexical analysis Indexing of Arabic documents automatically based on lexical analysis
Indexing of Arabic documents automatically based on lexical analysis kevig
 
MULTI-WORD TERM EXTRACTION BASED ON NEW HYBRID APPROACH FOR ARABIC LANGUAGE
MULTI-WORD TERM EXTRACTION BASED ON NEW HYBRID APPROACH FOR ARABIC LANGUAGEMULTI-WORD TERM EXTRACTION BASED ON NEW HYBRID APPROACH FOR ARABIC LANGUAGE
MULTI-WORD TERM EXTRACTION BASED ON NEW HYBRID APPROACH FOR ARABIC LANGUAGEcsandit
 
Comparative analysis of algorithms_MADI
Comparative analysis of algorithms_MADIComparative analysis of algorithms_MADI
Comparative analysis of algorithms_MADISayed Rahman
 
Two Level Disambiguation Model for Query Translation
Two Level Disambiguation Model for Query TranslationTwo Level Disambiguation Model for Query Translation
Two Level Disambiguation Model for Query TranslationIJECEIAES
 
CLUSTER PRIORITY BASED SENTENCE RANKING FOR EFFICIENT EXTRACTIVE TEXT SUMMARIES
CLUSTER PRIORITY BASED SENTENCE RANKING FOR EFFICIENT EXTRACTIVE TEXT SUMMARIESCLUSTER PRIORITY BASED SENTENCE RANKING FOR EFFICIENT EXTRACTIVE TEXT SUMMARIES
CLUSTER PRIORITY BASED SENTENCE RANKING FOR EFFICIENT EXTRACTIVE TEXT SUMMARIESecij
 
Argument extraction from news, blog and social media.
Argument extraction  from news, blog and social media.Argument extraction  from news, blog and social media.
Argument extraction from news, blog and social media.Sharath TS
 
role of lexical anaysis
role of lexical anaysisrole of lexical anaysis
role of lexical anaysisSudhaa Ravi
 
Token, Pattern and Lexeme
Token, Pattern and LexemeToken, Pattern and Lexeme
Token, Pattern and LexemeA. S. M. Shafi
 
Lexical Analysis
Lexical AnalysisLexical Analysis
Lexical AnalysisNayemid4676
 
Benchmarking nlp toolkits for enterprise application
Benchmarking nlp toolkits for enterprise applicationBenchmarking nlp toolkits for enterprise application
Benchmarking nlp toolkits for enterprise applicationConference Papers
 
Conceptual framework for abstractive text summarization
Conceptual framework for abstractive text summarizationConceptual framework for abstractive text summarization
Conceptual framework for abstractive text summarizationijnlc
 
Deep Reinforcement Learning with Distributional Semantic Rewards for Abstract...
Deep Reinforcement Learning with Distributional Semantic Rewards for Abstract...Deep Reinforcement Learning with Distributional Semantic Rewards for Abstract...
Deep Reinforcement Learning with Distributional Semantic Rewards for Abstract...Deren Lei
 
13. Constantin Orasan (UoW) Natural Language Processing for Translation
13. Constantin Orasan (UoW) Natural Language Processing for Translation13. Constantin Orasan (UoW) Natural Language Processing for Translation
13. Constantin Orasan (UoW) Natural Language Processing for TranslationRIILP
 
Modern Programming Languages classification Poster
Modern Programming Languages classification PosterModern Programming Languages classification Poster
Modern Programming Languages classification PosterSaulo Aguiar
 
Classification of Arabic Texts using Four Classifiers
Classification of Arabic Texts using Four ClassifiersClassification of Arabic Texts using Four Classifiers
Classification of Arabic Texts using Four ClassifiersIJCSIS Research Publications
 

What's hot (19)

IRJET- Automatic Language Identification using Hybrid Approach and Classifica...
IRJET- Automatic Language Identification using Hybrid Approach and Classifica...IRJET- Automatic Language Identification using Hybrid Approach and Classifica...
IRJET- Automatic Language Identification using Hybrid Approach and Classifica...
 
A Study Of Statistical Models For Query Translation :Finding A Good Unit Of T...
A Study Of Statistical Models For Query Translation :Finding A Good Unit Of T...A Study Of Statistical Models For Query Translation :Finding A Good Unit Of T...
A Study Of Statistical Models For Query Translation :Finding A Good Unit Of T...
 
Doppl Development Introduction
Doppl Development IntroductionDoppl Development Introduction
Doppl Development Introduction
 
Indexing of Arabic documents automatically based on lexical analysis
Indexing of Arabic documents automatically based on lexical analysis Indexing of Arabic documents automatically based on lexical analysis
Indexing of Arabic documents automatically based on lexical analysis
 
MULTI-WORD TERM EXTRACTION BASED ON NEW HYBRID APPROACH FOR ARABIC LANGUAGE
MULTI-WORD TERM EXTRACTION BASED ON NEW HYBRID APPROACH FOR ARABIC LANGUAGEMULTI-WORD TERM EXTRACTION BASED ON NEW HYBRID APPROACH FOR ARABIC LANGUAGE
MULTI-WORD TERM EXTRACTION BASED ON NEW HYBRID APPROACH FOR ARABIC LANGUAGE
 
Comparative analysis of algorithms_MADI
Comparative analysis of algorithms_MADIComparative analysis of algorithms_MADI
Comparative analysis of algorithms_MADI
 
Two Level Disambiguation Model for Query Translation
Two Level Disambiguation Model for Query TranslationTwo Level Disambiguation Model for Query Translation
Two Level Disambiguation Model for Query Translation
 
CLUSTER PRIORITY BASED SENTENCE RANKING FOR EFFICIENT EXTRACTIVE TEXT SUMMARIES
CLUSTER PRIORITY BASED SENTENCE RANKING FOR EFFICIENT EXTRACTIVE TEXT SUMMARIESCLUSTER PRIORITY BASED SENTENCE RANKING FOR EFFICIENT EXTRACTIVE TEXT SUMMARIES
CLUSTER PRIORITY BASED SENTENCE RANKING FOR EFFICIENT EXTRACTIVE TEXT SUMMARIES
 
Argument extraction from news, blog and social media.
Argument extraction  from news, blog and social media.Argument extraction  from news, blog and social media.
Argument extraction from news, blog and social media.
 
Compiler design Project
Compiler design ProjectCompiler design Project
Compiler design Project
 
role of lexical anaysis
role of lexical anaysisrole of lexical anaysis
role of lexical anaysis
 
Token, Pattern and Lexeme
Token, Pattern and LexemeToken, Pattern and Lexeme
Token, Pattern and Lexeme
 
Lexical Analysis
Lexical AnalysisLexical Analysis
Lexical Analysis
 
Benchmarking nlp toolkits for enterprise application
Benchmarking nlp toolkits for enterprise applicationBenchmarking nlp toolkits for enterprise application
Benchmarking nlp toolkits for enterprise application
 
Conceptual framework for abstractive text summarization
Conceptual framework for abstractive text summarizationConceptual framework for abstractive text summarization
Conceptual framework for abstractive text summarization
 
Deep Reinforcement Learning with Distributional Semantic Rewards for Abstract...
Deep Reinforcement Learning with Distributional Semantic Rewards for Abstract...Deep Reinforcement Learning with Distributional Semantic Rewards for Abstract...
Deep Reinforcement Learning with Distributional Semantic Rewards for Abstract...
 
13. Constantin Orasan (UoW) Natural Language Processing for Translation
13. Constantin Orasan (UoW) Natural Language Processing for Translation13. Constantin Orasan (UoW) Natural Language Processing for Translation
13. Constantin Orasan (UoW) Natural Language Processing for Translation
 
Modern Programming Languages classification Poster
Modern Programming Languages classification PosterModern Programming Languages classification Poster
Modern Programming Languages classification Poster
 
Classification of Arabic Texts using Four Classifiers
Classification of Arabic Texts using Four ClassifiersClassification of Arabic Texts using Four Classifiers
Classification of Arabic Texts using Four Classifiers
 

Viewers also liked

ACL-WMT Poster.A Description of Tunable Machine Translation Evaluation System...
ACL-WMT Poster.A Description of Tunable Machine Translation Evaluation System...ACL-WMT Poster.A Description of Tunable Machine Translation Evaluation System...
ACL-WMT Poster.A Description of Tunable Machine Translation Evaluation System...Lifeng (Aaron) Han
 
ACL-WMT13 poster.Quality Estimation for Machine Translation Using the Joint M...
ACL-WMT13 poster.Quality Estimation for Machine Translation Using the Joint M...ACL-WMT13 poster.Quality Estimation for Machine Translation Using the Joint M...
ACL-WMT13 poster.Quality Estimation for Machine Translation Using the Joint M...Lifeng (Aaron) Han
 
GSCL2013 Poster.A Study of Chinese Word Segmentation Based on the Characteris...
GSCL2013 Poster.A Study of Chinese Word Segmentation Based on the Characteris...GSCL2013 Poster.A Study of Chinese Word Segmentation Based on the Characteris...
GSCL2013 Poster.A Study of Chinese Word Segmentation Based on the Characteris...Lifeng (Aaron) Han
 
pptphrase-tagset-mapping-for-french-and-english-treebanks-and-its-application...
pptphrase-tagset-mapping-for-french-and-english-treebanks-and-its-application...pptphrase-tagset-mapping-for-french-and-english-treebanks-and-its-application...
pptphrase-tagset-mapping-for-french-and-english-treebanks-and-its-application...Lifeng (Aaron) Han
 
TSD2013 PPT.AUTOMATIC MACHINE TRANSLATION EVALUATION WITH PART-OF-SPEECH INFO...
TSD2013 PPT.AUTOMATIC MACHINE TRANSLATION EVALUATION WITH PART-OF-SPEECH INFO...TSD2013 PPT.AUTOMATIC MACHINE TRANSLATION EVALUATION WITH PART-OF-SPEECH INFO...
TSD2013 PPT.AUTOMATIC MACHINE TRANSLATION EVALUATION WITH PART-OF-SPEECH INFO...Lifeng (Aaron) Han
 
MT SUMMIT PPT: Language-independent Model for Machine Translation Evaluation ...
MT SUMMIT PPT: Language-independent Model for Machine Translation Evaluation ...MT SUMMIT PPT: Language-independent Model for Machine Translation Evaluation ...
MT SUMMIT PPT: Language-independent Model for Machine Translation Evaluation ...Lifeng (Aaron) Han
 
LEPOR: an augmented machine translation evaluation metric
LEPOR: an augmented machine translation evaluation metric LEPOR: an augmented machine translation evaluation metric
LEPOR: an augmented machine translation evaluation metric Lifeng (Aaron) Han
 
LP&IIS2013 PPT. Chinese Named Entity Recognition with Conditional Random Fiel...
LP&IIS2013 PPT. Chinese Named Entity Recognition with Conditional Random Fiel...LP&IIS2013 PPT. Chinese Named Entity Recognition with Conditional Random Fiel...
LP&IIS2013 PPT. Chinese Named Entity Recognition with Conditional Random Fiel...Lifeng (Aaron) Han
 
LEPOR: an augmented machine translation evaluation metric - Thesis PPT
LEPOR: an augmented machine translation evaluation metric - Thesis PPT LEPOR: an augmented machine translation evaluation metric - Thesis PPT
LEPOR: an augmented machine translation evaluation metric - Thesis PPT Lifeng (Aaron) Han
 

Viewers also liked (9)

ACL-WMT Poster.A Description of Tunable Machine Translation Evaluation System...
ACL-WMT Poster.A Description of Tunable Machine Translation Evaluation System...ACL-WMT Poster.A Description of Tunable Machine Translation Evaluation System...
ACL-WMT Poster.A Description of Tunable Machine Translation Evaluation System...
 
ACL-WMT13 poster.Quality Estimation for Machine Translation Using the Joint M...
ACL-WMT13 poster.Quality Estimation for Machine Translation Using the Joint M...ACL-WMT13 poster.Quality Estimation for Machine Translation Using the Joint M...
ACL-WMT13 poster.Quality Estimation for Machine Translation Using the Joint M...
 
GSCL2013 Poster.A Study of Chinese Word Segmentation Based on the Characteris...
GSCL2013 Poster.A Study of Chinese Word Segmentation Based on the Characteris...GSCL2013 Poster.A Study of Chinese Word Segmentation Based on the Characteris...
GSCL2013 Poster.A Study of Chinese Word Segmentation Based on the Characteris...
 
pptphrase-tagset-mapping-for-french-and-english-treebanks-and-its-application...
pptphrase-tagset-mapping-for-french-and-english-treebanks-and-its-application...pptphrase-tagset-mapping-for-french-and-english-treebanks-and-its-application...
pptphrase-tagset-mapping-for-french-and-english-treebanks-and-its-application...
 
TSD2013 PPT.AUTOMATIC MACHINE TRANSLATION EVALUATION WITH PART-OF-SPEECH INFO...
TSD2013 PPT.AUTOMATIC MACHINE TRANSLATION EVALUATION WITH PART-OF-SPEECH INFO...TSD2013 PPT.AUTOMATIC MACHINE TRANSLATION EVALUATION WITH PART-OF-SPEECH INFO...
TSD2013 PPT.AUTOMATIC MACHINE TRANSLATION EVALUATION WITH PART-OF-SPEECH INFO...
 
MT SUMMIT PPT: Language-independent Model for Machine Translation Evaluation ...
MT SUMMIT PPT: Language-independent Model for Machine Translation Evaluation ...MT SUMMIT PPT: Language-independent Model for Machine Translation Evaluation ...
MT SUMMIT PPT: Language-independent Model for Machine Translation Evaluation ...
 
LEPOR: an augmented machine translation evaluation metric
LEPOR: an augmented machine translation evaluation metric LEPOR: an augmented machine translation evaluation metric
LEPOR: an augmented machine translation evaluation metric
 
LP&IIS2013 PPT. Chinese Named Entity Recognition with Conditional Random Fiel...
LP&IIS2013 PPT. Chinese Named Entity Recognition with Conditional Random Fiel...LP&IIS2013 PPT. Chinese Named Entity Recognition with Conditional Random Fiel...
LP&IIS2013 PPT. Chinese Named Entity Recognition with Conditional Random Fiel...
 
LEPOR: an augmented machine translation evaluation metric - Thesis PPT
LEPOR: an augmented machine translation evaluation metric - Thesis PPT LEPOR: an augmented machine translation evaluation metric - Thesis PPT
LEPOR: an augmented machine translation evaluation metric - Thesis PPT
 

Similar to LEPOR Metric Evaluates Machine Translation with Augmented Factors

MT SUMMIT13.Language-independent Model for Machine Translation Evaluation wit...
MT SUMMIT13.Language-independent Model for Machine Translation Evaluation wit...MT SUMMIT13.Language-independent Model for Machine Translation Evaluation wit...
MT SUMMIT13.Language-independent Model for Machine Translation Evaluation wit...Lifeng (Aaron) Han
 
Unsupervised Quality Estimation Model for English to German Translation and I...
Unsupervised Quality Estimation Model for English to German Translation and I...Unsupervised Quality Estimation Model for English to German Translation and I...
Unsupervised Quality Estimation Model for English to German Translation and I...Lifeng (Aaron) Han
 
HOPE: A Task-Oriented and Human-Centric Evaluation Framework Using Professio...
 HOPE: A Task-Oriented and Human-Centric Evaluation Framework Using Professio... HOPE: A Task-Oriented and Human-Centric Evaluation Framework Using Professio...
HOPE: A Task-Oriented and Human-Centric Evaluation Framework Using Professio...Lifeng (Aaron) Han
 
Measuring word alignment_quality_for_statistical_machine_translation_tcm17-29663
Measuring word alignment_quality_for_statistical_machine_translation_tcm17-29663Measuring word alignment_quality_for_statistical_machine_translation_tcm17-29663
Measuring word alignment_quality_for_statistical_machine_translation_tcm17-29663Yafi Azhari
 
machine translation evaluation resources and methods: a survey
machine translation evaluation resources and methods: a surveymachine translation evaluation resources and methods: a survey
machine translation evaluation resources and methods: a surveyLifeng (Aaron) Han
 
A COMPARISON OF DOCUMENT SIMILARITY ALGORITHMS
A COMPARISON OF DOCUMENT SIMILARITY ALGORITHMSA COMPARISON OF DOCUMENT SIMILARITY ALGORITHMS
A COMPARISON OF DOCUMENT SIMILARITY ALGORITHMSgerogepatton
 
A COMPARISON OF DOCUMENT SIMILARITY ALGORITHMS
A COMPARISON OF DOCUMENT SIMILARITY ALGORITHMSA COMPARISON OF DOCUMENT SIMILARITY ALGORITHMS
A COMPARISON OF DOCUMENT SIMILARITY ALGORITHMSgerogepatton
 
Work in progress: ChatGPT as an Assistant in Paper Writing
Work in progress: ChatGPT as an Assistant in Paper WritingWork in progress: ChatGPT as an Assistant in Paper Writing
Work in progress: ChatGPT as an Assistant in Paper WritingManuel Castro
 
Automated evaluation of coherence in student essays.pdf
Automated evaluation of coherence in student essays.pdfAutomated evaluation of coherence in student essays.pdf
Automated evaluation of coherence in student essays.pdfSarah Marie
 
Lepor: augmented automatic MT evaluation metric
Lepor: augmented automatic MT evaluation metricLepor: augmented automatic MT evaluation metric
Lepor: augmented automatic MT evaluation metricLifeng (Aaron) Han
 
IRJET- Short-Text Semantic Similarity using Glove Word Embedding
IRJET- Short-Text Semantic Similarity using Glove Word EmbeddingIRJET- Short-Text Semantic Similarity using Glove Word Embedding
IRJET- Short-Text Semantic Similarity using Glove Word EmbeddingIRJET Journal
 
SEMI-AUTOMATIC SIMULTANEOUS INTERPRETING QUALITY EVALUATION
SEMI-AUTOMATIC SIMULTANEOUS INTERPRETING QUALITY EVALUATIONSEMI-AUTOMATIC SIMULTANEOUS INTERPRETING QUALITY EVALUATION
SEMI-AUTOMATIC SIMULTANEOUS INTERPRETING QUALITY EVALUATIONijnlc
 
HOPE: A Task-Oriented and Human-Centric Evaluation Framework Using Profession...
HOPE: A Task-Oriented and Human-Centric Evaluation Framework Using Profession...HOPE: A Task-Oriented and Human-Centric Evaluation Framework Using Profession...
HOPE: A Task-Oriented and Human-Centric Evaluation Framework Using Profession...Lifeng (Aaron) Han
 
SEMI-AUTOMATIC SIMULTANEOUS INTERPRETING QUALITY EVALUATION
SEMI-AUTOMATIC SIMULTANEOUS INTERPRETING QUALITY EVALUATIONSEMI-AUTOMATIC SIMULTANEOUS INTERPRETING QUALITY EVALUATION
SEMI-AUTOMATIC SIMULTANEOUS INTERPRETING QUALITY EVALUATIONkevig
 
Machine Learning Techniques with Ontology for Subjective Answer Evaluation
Machine Learning Techniques with Ontology for Subjective Answer EvaluationMachine Learning Techniques with Ontology for Subjective Answer Evaluation
Machine Learning Techniques with Ontology for Subjective Answer Evaluationijnlc
 
Classification of MT-Output Using Hybrid MT-Evaluation Metrics for Post-Editi...
Classification of MT-Output Using Hybrid MT-Evaluation Metrics for Post-Editi...Classification of MT-Output Using Hybrid MT-Evaluation Metrics for Post-Editi...
Classification of MT-Output Using Hybrid MT-Evaluation Metrics for Post-Editi...aciijournal
 
Classification of MT-Output Using Hybrid MT-Evaluation Metrics for Post-Editi...
Classification of MT-Output Using Hybrid MT-Evaluation Metrics for Post-Editi...Classification of MT-Output Using Hybrid MT-Evaluation Metrics for Post-Editi...
Classification of MT-Output Using Hybrid MT-Evaluation Metrics for Post-Editi...aciijournal
 
EVALUATION OF SEMANTIC ANSWER SIMILARITY METRICS
EVALUATION OF SEMANTIC ANSWER SIMILARITY METRICSEVALUATION OF SEMANTIC ANSWER SIMILARITY METRICS
EVALUATION OF SEMANTIC ANSWER SIMILARITY METRICSkevig
 
EVALUATION OF SEMANTIC ANSWER SIMILARITY METRICS
EVALUATION OF SEMANTIC ANSWER SIMILARITY METRICSEVALUATION OF SEMANTIC ANSWER SIMILARITY METRICS
EVALUATION OF SEMANTIC ANSWER SIMILARITY METRICSkevig
 

Similar to LEPOR Metric Evaluates Machine Translation with Augmented Factors (20)

MT SUMMIT13.Language-independent Model for Machine Translation Evaluation wit...
MT SUMMIT13.Language-independent Model for Machine Translation Evaluation wit...MT SUMMIT13.Language-independent Model for Machine Translation Evaluation wit...
MT SUMMIT13.Language-independent Model for Machine Translation Evaluation wit...
 
Unsupervised Quality Estimation Model for English to German Translation and I...
Unsupervised Quality Estimation Model for English to German Translation and I...Unsupervised Quality Estimation Model for English to German Translation and I...
Unsupervised Quality Estimation Model for English to German Translation and I...
 
HOPE: A Task-Oriented and Human-Centric Evaluation Framework Using Professio...
 HOPE: A Task-Oriented and Human-Centric Evaluation Framework Using Professio... HOPE: A Task-Oriented and Human-Centric Evaluation Framework Using Professio...
HOPE: A Task-Oriented and Human-Centric Evaluation Framework Using Professio...
 
Measuring word alignment_quality_for_statistical_machine_translation_tcm17-29663
Measuring word alignment_quality_for_statistical_machine_translation_tcm17-29663Measuring word alignment_quality_for_statistical_machine_translation_tcm17-29663
Measuring word alignment_quality_for_statistical_machine_translation_tcm17-29663
 
machine translation evaluation resources and methods: a survey
machine translation evaluation resources and methods: a surveymachine translation evaluation resources and methods: a survey
machine translation evaluation resources and methods: a survey
 
A COMPARISON OF DOCUMENT SIMILARITY ALGORITHMS
A COMPARISON OF DOCUMENT SIMILARITY ALGORITHMSA COMPARISON OF DOCUMENT SIMILARITY ALGORITHMS
A COMPARISON OF DOCUMENT SIMILARITY ALGORITHMS
 
A COMPARISON OF DOCUMENT SIMILARITY ALGORITHMS
A COMPARISON OF DOCUMENT SIMILARITY ALGORITHMSA COMPARISON OF DOCUMENT SIMILARITY ALGORITHMS
A COMPARISON OF DOCUMENT SIMILARITY ALGORITHMS
 
Work in progress: ChatGPT as an Assistant in Paper Writing
Work in progress: ChatGPT as an Assistant in Paper WritingWork in progress: ChatGPT as an Assistant in Paper Writing
Work in progress: ChatGPT as an Assistant in Paper Writing
 
Automated evaluation of coherence in student essays.pdf
Automated evaluation of coherence in student essays.pdfAutomated evaluation of coherence in student essays.pdf
Automated evaluation of coherence in student essays.pdf
 
Lepor: augmented automatic MT evaluation metric
Lepor: augmented automatic MT evaluation metricLepor: augmented automatic MT evaluation metric
Lepor: augmented automatic MT evaluation metric
 
IRJET- Short-Text Semantic Similarity using Glove Word Embedding
IRJET- Short-Text Semantic Similarity using Glove Word EmbeddingIRJET- Short-Text Semantic Similarity using Glove Word Embedding
IRJET- Short-Text Semantic Similarity using Glove Word Embedding
 
SEMI-AUTOMATIC SIMULTANEOUS INTERPRETING QUALITY EVALUATION
SEMI-AUTOMATIC SIMULTANEOUS INTERPRETING QUALITY EVALUATIONSEMI-AUTOMATIC SIMULTANEOUS INTERPRETING QUALITY EVALUATION
SEMI-AUTOMATIC SIMULTANEOUS INTERPRETING QUALITY EVALUATION
 
HOPE: A Task-Oriented and Human-Centric Evaluation Framework Using Profession...
HOPE: A Task-Oriented and Human-Centric Evaluation Framework Using Profession...HOPE: A Task-Oriented and Human-Centric Evaluation Framework Using Profession...
HOPE: A Task-Oriented and Human-Centric Evaluation Framework Using Profession...
 
SEMI-AUTOMATIC SIMULTANEOUS INTERPRETING QUALITY EVALUATION
SEMI-AUTOMATIC SIMULTANEOUS INTERPRETING QUALITY EVALUATIONSEMI-AUTOMATIC SIMULTANEOUS INTERPRETING QUALITY EVALUATION
SEMI-AUTOMATIC SIMULTANEOUS INTERPRETING QUALITY EVALUATION
 
Machine Learning Techniques with Ontology for Subjective Answer Evaluation
Machine Learning Techniques with Ontology for Subjective Answer EvaluationMachine Learning Techniques with Ontology for Subjective Answer Evaluation
Machine Learning Techniques with Ontology for Subjective Answer Evaluation
 
Classification of MT-Output Using Hybrid MT-Evaluation Metrics for Post-Editi...
Classification of MT-Output Using Hybrid MT-Evaluation Metrics for Post-Editi...Classification of MT-Output Using Hybrid MT-Evaluation Metrics for Post-Editi...
Classification of MT-Output Using Hybrid MT-Evaluation Metrics for Post-Editi...
 
Classification of MT-Output Using Hybrid MT-Evaluation Metrics for Post-Editi...
Classification of MT-Output Using Hybrid MT-Evaluation Metrics for Post-Editi...Classification of MT-Output Using Hybrid MT-Evaluation Metrics for Post-Editi...
Classification of MT-Output Using Hybrid MT-Evaluation Metrics for Post-Editi...
 
semeval2016
semeval2016semeval2016
semeval2016
 
EVALUATION OF SEMANTIC ANSWER SIMILARITY METRICS
EVALUATION OF SEMANTIC ANSWER SIMILARITY METRICSEVALUATION OF SEMANTIC ANSWER SIMILARITY METRICS
EVALUATION OF SEMANTIC ANSWER SIMILARITY METRICS
 
EVALUATION OF SEMANTIC ANSWER SIMILARITY METRICS
EVALUATION OF SEMANTIC ANSWER SIMILARITY METRICSEVALUATION OF SEMANTIC ANSWER SIMILARITY METRICS
EVALUATION OF SEMANTIC ANSWER SIMILARITY METRICS
 

More from Lifeng (Aaron) Han

WMT2022 Biomedical MT PPT: Logrus Global and Uni Manchester
WMT2022 Biomedical MT PPT: Logrus Global and Uni ManchesterWMT2022 Biomedical MT PPT: Logrus Global and Uni Manchester
WMT2022 Biomedical MT PPT: Logrus Global and Uni ManchesterLifeng (Aaron) Han
 
Measuring Uncertainty in Translation Quality Evaluation (TQE)
Measuring Uncertainty in Translation Quality Evaluation (TQE)Measuring Uncertainty in Translation Quality Evaluation (TQE)
Measuring Uncertainty in Translation Quality Evaluation (TQE)Lifeng (Aaron) Han
 
Meta-Evaluation of Translation Evaluation Methods: a systematic up-to-date ov...
Meta-Evaluation of Translation Evaluation Methods: a systematic up-to-date ov...Meta-Evaluation of Translation Evaluation Methods: a systematic up-to-date ov...
Meta-Evaluation of Translation Evaluation Methods: a systematic up-to-date ov...Lifeng (Aaron) Han
 
Meta-evaluation of machine translation evaluation methods
Meta-evaluation of machine translation evaluation methodsMeta-evaluation of machine translation evaluation methods
Meta-evaluation of machine translation evaluation methodsLifeng (Aaron) Han
 
Monte Carlo Modelling of Confidence Intervals in Translation Quality Evaluati...
Monte Carlo Modelling of Confidence Intervals in Translation Quality Evaluati...Monte Carlo Modelling of Confidence Intervals in Translation Quality Evaluati...
Monte Carlo Modelling of Confidence Intervals in Translation Quality Evaluati...Lifeng (Aaron) Han
 
Apply chinese radicals into neural machine translation: deeper than character...
Apply chinese radicals into neural machine translation: deeper than character...Apply chinese radicals into neural machine translation: deeper than character...
Apply chinese radicals into neural machine translation: deeper than character...Lifeng (Aaron) Han
 
cushLEPOR uses LABSE distilled knowledge to improve correlation with human tr...
cushLEPOR uses LABSE distilled knowledge to improve correlation with human tr...cushLEPOR uses LABSE distilled knowledge to improve correlation with human tr...
cushLEPOR uses LABSE distilled knowledge to improve correlation with human tr...Lifeng (Aaron) Han
 
Chinese Character Decomposition for Neural MT with Multi-Word Expressions
Chinese Character Decomposition for  Neural MT with Multi-Word ExpressionsChinese Character Decomposition for  Neural MT with Multi-Word Expressions
Chinese Character Decomposition for Neural MT with Multi-Word ExpressionsLifeng (Aaron) Han
 
Build moses on ubuntu (64 bit) system in virtubox recorded by aaron _v2longer
Build moses on ubuntu (64 bit) system in virtubox recorded by aaron _v2longerBuild moses on ubuntu (64 bit) system in virtubox recorded by aaron _v2longer
Build moses on ubuntu (64 bit) system in virtubox recorded by aaron _v2longerLifeng (Aaron) Han
 
Detection of Verbal Multi-Word Expressions via Conditional Random Fields with...
Detection of Verbal Multi-Word Expressions via Conditional Random Fields with...Detection of Verbal Multi-Word Expressions via Conditional Random Fields with...
Detection of Verbal Multi-Word Expressions via Conditional Random Fields with...Lifeng (Aaron) Han
 
AlphaMWE: Construction of Multilingual Parallel Corpora with MWE Annotations ...
AlphaMWE: Construction of Multilingual Parallel Corpora with MWE Annotations ...AlphaMWE: Construction of Multilingual Parallel Corpora with MWE Annotations ...
AlphaMWE: Construction of Multilingual Parallel Corpora with MWE Annotations ...Lifeng (Aaron) Han
 
MultiMWE: Building a Multi-lingual Multi-Word Expression (MWE) Parallel Corpora
MultiMWE: Building a Multi-lingual Multi-Word Expression (MWE) Parallel CorporaMultiMWE: Building a Multi-lingual Multi-Word Expression (MWE) Parallel Corpora
MultiMWE: Building a Multi-lingual Multi-Word Expression (MWE) Parallel CorporaLifeng (Aaron) Han
 
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.Lifeng (Aaron) Han
 
A deep analysis of Multi-word Expression and Machine Translation
A deep analysis of Multi-word Expression and Machine TranslationA deep analysis of Multi-word Expression and Machine Translation
A deep analysis of Multi-word Expression and Machine TranslationLifeng (Aaron) Han
 
Incorporating Chinese Radicals Into Neural Machine Translation: Deeper Than C...
Incorporating Chinese Radicals Into Neural Machine Translation: Deeper Than C...Incorporating Chinese Radicals Into Neural Machine Translation: Deeper Than C...
Incorporating Chinese Radicals Into Neural Machine Translation: Deeper Than C...Lifeng (Aaron) Han
 
Chinese Named Entity Recognition with Graph-based Semi-supervised Learning Model
Chinese Named Entity Recognition with Graph-based Semi-supervised Learning ModelChinese Named Entity Recognition with Graph-based Semi-supervised Learning Model
Chinese Named Entity Recognition with Graph-based Semi-supervised Learning ModelLifeng (Aaron) Han
 
Quality Estimation for Machine Translation Using the Joint Method of Evaluati...
Quality Estimation for Machine Translation Using the Joint Method of Evaluati...Quality Estimation for Machine Translation Using the Joint Method of Evaluati...
Quality Estimation for Machine Translation Using the Joint Method of Evaluati...Lifeng (Aaron) Han
 
PubhD talk: MT serving the society
PubhD talk: MT serving the societyPubhD talk: MT serving the society
PubhD talk: MT serving the societyLifeng (Aaron) Han
 
Machine translation evaluation: a survey
Machine translation evaluation: a surveyMachine translation evaluation: a survey
Machine translation evaluation: a surveyLifeng (Aaron) Han
 

More from Lifeng (Aaron) Han (20)

WMT2022 Biomedical MT PPT: Logrus Global and Uni Manchester
WMT2022 Biomedical MT PPT: Logrus Global and Uni ManchesterWMT2022 Biomedical MT PPT: Logrus Global and Uni Manchester
WMT2022 Biomedical MT PPT: Logrus Global and Uni Manchester
 
Measuring Uncertainty in Translation Quality Evaluation (TQE)
Measuring Uncertainty in Translation Quality Evaluation (TQE)Measuring Uncertainty in Translation Quality Evaluation (TQE)
Measuring Uncertainty in Translation Quality Evaluation (TQE)
 
Meta-Evaluation of Translation Evaluation Methods: a systematic up-to-date ov...
Meta-Evaluation of Translation Evaluation Methods: a systematic up-to-date ov...Meta-Evaluation of Translation Evaluation Methods: a systematic up-to-date ov...
Meta-Evaluation of Translation Evaluation Methods: a systematic up-to-date ov...
 
Meta-evaluation of machine translation evaluation methods
Meta-evaluation of machine translation evaluation methodsMeta-evaluation of machine translation evaluation methods
Meta-evaluation of machine translation evaluation methods
 
Monte Carlo Modelling of Confidence Intervals in Translation Quality Evaluati...
Monte Carlo Modelling of Confidence Intervals in Translation Quality Evaluati...Monte Carlo Modelling of Confidence Intervals in Translation Quality Evaluati...
Monte Carlo Modelling of Confidence Intervals in Translation Quality Evaluati...
 
Apply chinese radicals into neural machine translation: deeper than character...
Apply chinese radicals into neural machine translation: deeper than character...Apply chinese radicals into neural machine translation: deeper than character...
Apply chinese radicals into neural machine translation: deeper than character...
 
cushLEPOR uses LABSE distilled knowledge to improve correlation with human tr...
cushLEPOR uses LABSE distilled knowledge to improve correlation with human tr...cushLEPOR uses LABSE distilled knowledge to improve correlation with human tr...
cushLEPOR uses LABSE distilled knowledge to improve correlation with human tr...
 
Chinese Character Decomposition for Neural MT with Multi-Word Expressions
Chinese Character Decomposition for  Neural MT with Multi-Word ExpressionsChinese Character Decomposition for  Neural MT with Multi-Word Expressions
Chinese Character Decomposition for Neural MT with Multi-Word Expressions
 
Build moses on ubuntu (64 bit) system in virtubox recorded by aaron _v2longer
Build moses on ubuntu (64 bit) system in virtubox recorded by aaron _v2longerBuild moses on ubuntu (64 bit) system in virtubox recorded by aaron _v2longer
Build moses on ubuntu (64 bit) system in virtubox recorded by aaron _v2longer
 
Detection of Verbal Multi-Word Expressions via Conditional Random Fields with...
Detection of Verbal Multi-Word Expressions via Conditional Random Fields with...Detection of Verbal Multi-Word Expressions via Conditional Random Fields with...
Detection of Verbal Multi-Word Expressions via Conditional Random Fields with...
 
AlphaMWE: Construction of Multilingual Parallel Corpora with MWE Annotations ...
AlphaMWE: Construction of Multilingual Parallel Corpora with MWE Annotations ...AlphaMWE: Construction of Multilingual Parallel Corpora with MWE Annotations ...
AlphaMWE: Construction of Multilingual Parallel Corpora with MWE Annotations ...
 
MultiMWE: Building a Multi-lingual Multi-Word Expression (MWE) Parallel Corpora
MultiMWE: Building a Multi-lingual Multi-Word Expression (MWE) Parallel CorporaMultiMWE: Building a Multi-lingual Multi-Word Expression (MWE) Parallel Corpora
MultiMWE: Building a Multi-lingual Multi-Word Expression (MWE) Parallel Corpora
 
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.
 
A deep analysis of Multi-word Expression and Machine Translation
A deep analysis of Multi-word Expression and Machine TranslationA deep analysis of Multi-word Expression and Machine Translation
A deep analysis of Multi-word Expression and Machine Translation
 
Incorporating Chinese Radicals Into Neural Machine Translation: Deeper Than C...
Incorporating Chinese Radicals Into Neural Machine Translation: Deeper Than C...Incorporating Chinese Radicals Into Neural Machine Translation: Deeper Than C...
Incorporating Chinese Radicals Into Neural Machine Translation: Deeper Than C...
 
Chinese Named Entity Recognition with Graph-based Semi-supervised Learning Model
Chinese Named Entity Recognition with Graph-based Semi-supervised Learning ModelChinese Named Entity Recognition with Graph-based Semi-supervised Learning Model
Chinese Named Entity Recognition with Graph-based Semi-supervised Learning Model
 
Quality Estimation for Machine Translation Using the Joint Method of Evaluati...
Quality Estimation for Machine Translation Using the Joint Method of Evaluati...Quality Estimation for Machine Translation Using the Joint Method of Evaluati...
Quality Estimation for Machine Translation Using the Joint Method of Evaluati...
 
PubhD talk: MT serving the society
PubhD talk: MT serving the societyPubhD talk: MT serving the society
PubhD talk: MT serving the society
 
Thesis-Master-MTE-Aaron
Thesis-Master-MTE-AaronThesis-Master-MTE-Aaron
Thesis-Master-MTE-Aaron
 
Machine translation evaluation: a survey
Machine translation evaluation: a surveyMachine translation evaluation: a survey
Machine translation evaluation: a survey
 

Recently uploaded

The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...ICS
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideChristina Lin
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...soniya singh
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)OPEN KNOWLEDGE GmbH
 
cybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningcybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningVitsRangannavar
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...MyIntelliSource, Inc.
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - InfographicHr365.us smith
 
chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptkotipi9215
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...stazi3110
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software DevelopersVinodh Ram
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfjoe51371421
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number SystemsJheuzeDellosa
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEOrtus Solutions, Corp
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWave PLM
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio, Inc.
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...gurkirankumar98700
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxbodapatigopi8531
 

Recently uploaded (20)

The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)
 
cybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningcybersecurity notes for mca students for learning
cybersecurity notes for mca students for learning
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - Infographic
 
chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.ppt
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software Developers
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdf
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number Systems
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need It
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 

LEPOR Metric Evaluates Machine Translation with Augmented Factors

  • 1. LEPOR: A Robust Evaluation Metric for Machine Translation with Augmented Factors Aaron L. F. HAN Derek F. WONG Lidia S. CHAO Natural Language Processing & Portuguese-Chinese Machine Translation Laboratory 2CT) (NLP Department of Computer and Information Science, University of Macau Conclusion and further work Summary Methodology The conventional MT evaluation metrics tend to consider few information factors about translation content or employ many external linguistic resources and tools. The former will result in low correlation with human judgments while the later will make the metrics complicated. This paper proposed a novel evaluation metric employing rich factors without relying on any additional resource or tool, experiments showed the state-ofthe-art level compared with BLEU, TER and Meteor by the correlation scores with human judgments, proving it a robust and concise metric. LEPOR was designed to take thorough variables into account and does not need any extra dataset or tool, which were aimed at both improving the practical performance of the automatic metric and the easily operating. LEPOR focuses on combining two modified factor (sentence length penalty, ngram position difference penalty) and two classic methodologies (precision and recall), containing the main formula as below: 𝐿𝐸𝑃𝑂𝑅 = 𝐿𝑃 × 𝑁𝑃𝑜𝑠𝑃𝑒𝑛𝑎𝑙 × 𝐻𝑎𝑟𝑚𝑜𝑛𝑖𝑐(𝛼𝑅, 𝛽𝑃), 𝑟 1− 𝑐 𝑒 where 𝐿𝑃 = 1 𝑒 𝑐 1− 𝑟 𝑁𝑃𝑜𝑠𝑃𝑒𝑛𝑎𝑙 = 𝑒 𝑖𝑓 𝑐 < 𝑟 𝑖𝑓 𝑐 = 𝑟 , 𝑖𝑓 𝑐 > 𝑟 −𝑁𝑃𝐷 , 𝑁𝑃𝐷 = 1 𝐿𝑒𝑛𝑔𝑡ℎ 𝑜𝑢𝑡𝑝𝑢𝑡 𝛼 𝑅 𝐻𝑎𝑟𝑚𝑜𝑛𝑖𝑐 𝛼𝑅, 𝛽𝑃 = (𝛼 + 𝛽)/( + Introduction Since IBM proposed the system of BLEU as the automatic metric for Machine Translation (MT) evaluation, many other methods have been proposed to improve it. NIST added the information weight into evaluation factors. Meteor proposed an alternative way of calculating matched chunks to describe the n-gram matching degree. Wong and Kit introduced position difference in the evaluation metric. Other evaluation metrics include TER and AMBER a modified version of BLEU attaching more kinds of penalty coefficients combining the ngram precision and recall with the arithmetic average of Fmeasure etc. In order to distinguish the reliability of different MT evaluation metrics, people used to apply the Spearman correlation coefficient for evaluation. Some MT evaluation metrics were designed with the part-ofspeech (POS) consideration using the linguistic tools, parser or POS tagger. E.g. Machacek and Bojar proposed SemPOS metric focusing on the English and Czech words. To reduce the human tasks during the evaluation, the methodologies that do not need reference translations are growing up. E.g. MP4IBM1 used IBM1 model to calculate scores based on morphemes, POS (4-grams) and lexicon probabilities. MP4IBM1 needed large parallel bilingual corpus, POS taggers etc. Different words or phrases can express the same meanings, so it is considered commonly in the literature to refer auxiliary synonyms libraries during the evaluation task. Meteor was based on unigram match on the words and their stems also with additional synonyms database. In the previous MT evaluation metrics, there are mainly two problems: either presenting incomprehensive factors (e.g. BLEU focus on precision) or relying on many external tools and databases. The first aspect makes the metrics result in unreasonable judgments. The second weakness makes the MT evaluation metric complicated, time-consuming and not universal for different languages. 
To address these weaknesses, the novel metric LEPOR was designed to employ rich factors without relying on any additional resource or tool. The system-level score is computed in two ways:

$LEPOR_A = \frac{1}{SentNum}\sum_{i=1}^{SentNum} LEPOR_i$,
$LEPOR_B = LP \times NPosPenal \times Harmonic(\alpha R, \beta P)$,

with the component factors

$Harmonic(\alpha R, \beta P) = (\alpha + \beta)\,/\,(\frac{\alpha}{R} + \frac{\beta}{P})$,
$P = \frac{common\_num}{system\_length}$, $R = \frac{common\_num}{reference\_length}$,
$NPD = \frac{1}{Length_{output}}\sum_{i=1}^{Length_{output}} |PD_i|$.

$LP$ is the length penalty, defined to penalize both longer and shorter system outputs (output length $c$) compared with the reference translation (reference length $r$). $NPD$ is the n-gram position difference penalty; the normalized $NPosPenal$ value compares the word order of the hypothesis translation against the reference translation. $Length_{output}$ is the length of the system output sentence, and $PD_i$ is the n-gram position difference value of the aligned words between the output and reference sentences.

Computing $NPD$ involves two steps: alignment and calculation. First, the context-dependent n-gram word alignment: the surrounding context (neighbouring words) of a candidate word is taken into account, with higher priority, when selecting the better matching pair between the output and the reference. If competing candidates both have matching context, or none of them does, the nearest match is taken as a backup choice. The alignment direction is from the output sentence to the reference translation; a detailed description of the algorithm is given in the paper. $Harmonic(\alpha R, \beta P)$ is the harmonic mean of $\alpha R$ and $\beta P$, where $\alpha$ and $\beta$ are tunable parameters that adjust the weights of recall $R$ and precision $P$. (An illustrative sketch of these computations is given after the results table below.)

Results

We trained LEPOR (including the parameters $\alpha$ and $\beta$) on the ACL WMT 2008 data (EN: English, ES: Spanish, DE: German, FR: French, CZ: Czech). For the context-dependent n-gram word alignment we set n = 2 on all corpora, meaning that both the preceding and following two words were considered as context information. We used the MT evaluation corpora of the 2011 ACL WMT for testing and compared the scores produced by LEPOR against the three "gold standard" metrics BLEU, TER and Meteor (version 1.3). In addition, we selected the latest AMBER (a modified version of BLEU) and MP4IBM1 as further representatives to examine the quality of LEPOR. The correlation scores with human judgments are shown in the table below (the first four columns are other-to-English pairs, the last four English-to-other):

Evaluation system    CZ-EN  DE-EN  ES-EN  FR-EN  EN-CZ  EN-DE  EN-ES  EN-FR  Mean
LEPOR-B              0.93   0.62   0.96   0.89   0.71   0.36   0.88   0.84   0.77
LEPOR-A              0.95   0.61   0.96   0.88   0.68   0.35   0.89   0.83   0.77
AMBER                0.88   0.59   0.86   0.95   0.56   0.53   0.87   0.84   0.76
Meteor-1.3-RANK      0.91   0.71   0.88   0.93   0.65   0.30   0.74   0.85   0.75
BLEU                 0.88   0.48   0.90   0.85   0.65   0.44   0.87   0.86   0.74
TER                  0.83   0.33   0.89   0.77   0.50   0.12   0.81   0.84   0.64
MP4IBM1              0.91   0.56   0.12   0.08   0.76   0.91   0.71   0.61   0.58

The results show that LEPOR obtained the highest mean correlation among the tested metrics. BLEU, AMBER and Meteor-1.3 performed unevenly, correlating better on some language pairs and worse on others, and thus reached only a medium level overall. TER and MP4IBM1 obtained the lowest mean correlations; MP4IBM1 does not need reference translations, but its correlation was low. These results indicate that LEPOR is a robust metric built from augmented features, and a concise model that uses no external tool.
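To make the formulas above concrete, the following is a minimal Python sketch of a sentence-level LEPOR computation and the LEPOR-A system-level average. It is a reader's reconstruction under stated assumptions, not the authors' released implementation (see the open-source link at the end of this page): in particular, the word alignment is simplified to nearest-position matching, whereas the paper's context-dependent alignment first prefers candidates whose neighbouring words also match.

import math

def lepor_sentence(output_tokens, reference_tokens, alpha=1.0, beta=1.0):
    # Illustrative, simplified sentence-level LEPOR score (a sketch, not the
    # authors' implementation). Alignment here is plain nearest-position
    # matching; the paper's context-dependent alignment additionally gives
    # priority to candidates whose surrounding words also match.
    c, r = len(output_tokens), len(reference_tokens)
    if c == 0 or r == 0:
        return 0.0

    # Length penalty LP: penalizes outputs that are longer or shorter than
    # the reference (c = output length, r = reference length).
    if c < r:
        lp = math.exp(1.0 - r / c)
    elif c > r:
        lp = math.exp(1.0 - c / r)
    else:
        lp = 1.0

    # Align output words to reference positions (output -> reference
    # direction), using each reference position at most once.
    used_ref = set()
    position_diffs = []
    matched = 0
    for i, tok in enumerate(output_tokens):
        candidates = [j for j, ref_tok in enumerate(reference_tokens)
                      if ref_tok == tok and j not in used_ref]
        if not candidates:
            continue
        # Backup choice from the paper: take the nearest match, measured here
        # on positions normalized by sentence length.
        j = min(candidates, key=lambda jj: abs((i + 1) / c - (jj + 1) / r))
        used_ref.add(j)
        matched += 1
        # |PD_i|: position difference of the aligned word pair.
        position_diffs.append(abs((i + 1) / c - (j + 1) / r))

    # N-gram position difference penalty: NPD is averaged over the output
    # length and turned into the multiplicative factor exp(-NPD).
    npd = sum(position_diffs) / c
    npos_penal = math.exp(-npd)

    # Precision and recall over the matched (common) words.
    precision = matched / c          # common_num / system_length
    recall = matched / r             # common_num / reference_length
    if precision == 0.0 or recall == 0.0:
        return 0.0

    # Weighted harmonic mean: (alpha + beta) / (alpha/R + beta/P).
    harmonic = (alpha + beta) / (alpha / recall + beta / precision)

    return lp * npos_penal * harmonic


def lepor_a(output_sentences, reference_sentences, alpha=1.0, beta=1.0):
    # System-level LEPOR-A: the mean of the sentence-level scores.
    scores = [lepor_sentence(out, ref, alpha, beta)
              for out, ref in zip(output_sentences, reference_sentences)]
    return sum(scores) / len(scores)

For example, lepor_sentence("the cat sat on the mat".split(), "a cat sat on the mat".split()) returns a value between 0 and 1, with higher scores for hypotheses that match the reference more closely in content, length and word order; an identical hypothesis scores exactly 1.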
The results also show that although the tested metrics yielded high system-level correlations with human judgments on certain language pairs, their mean scores across the eight corpora were far from satisfactory (ranging only from 0.58 to 0.77). Several further directions are worth pursuing: testing with a synonym thesaurus (the translation of a word can be re-expressed in different ways, such as multi-word expressions or paraphrases), evaluating the effectiveness on languages of different typologies, and employing multiple references (another way, besides synonyms, to cover acceptable variation in the evaluation).

Selected references

Papineni, K., Roukos, S., Ward, T. and Zhu, W. J. (2002). BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL 2002), Philadelphia, PA, USA.

Doddington, G. (2002). Automatic evaluation of machine translation quality using n-gram co-occurrence statistics. In Proceedings of the Second International Conference on Human Language Technology Research (HLT 2002), California, USA.

Denkowski, M. and Lavie, A. (2011). Meteor 1.3: Automatic metric for reliable optimization and evaluation of machine translation systems. In Proceedings of the Sixth Workshop on Statistical Machine Translation of the Association for Computational Linguistics, Edinburgh, Scotland, UK.

Wong, B. T-M and Kit, C. (2008). Word choice and word position for automatic MT evaluation. In Workshop: MetricsMATR of the Association for Machine Translation in the Americas (AMTA), Waikiki, USA.

Snover, M., Dorr, B., Schwartz, R., Micciulla, L. and Makhoul, J. (2006). A study of translation edit rate with targeted human annotation. In Proceedings of the Conference of the Association for Machine Translation in the Americas (AMTA), Boston, USA.

Acknowledgments: This work is partially supported by the Research Committee of the University of Macau and the Science and Technology Development Fund of Macau under grants UL019B/09-Y3/EEE/LYP01/FST and 057/2009/A2.

24th International Conference on Computational Linguistics (COLING 2012).
Open source: https://github.com/aaronlifenghan/aaron-project-lepor