SlideShare a Scribd company logo
An Awkward Disparity between BLEU / RIBES Scores
and Human Judgements in Machine Translation
Liling Tan, Jon Dehdari1
and Josef van Genabith1
Universität des Saarlandes
Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI)1
liling.tan@uni-saarland.de, {first.last_name}@dfki.de
Introduction
•MT metrics criticized for various reasons
(Babych and Hartley, 2004; Smith et al. 2014; Graham et al. 2015)
•Low BLEU != Bad MT (Callison-Burch et al. 2006)
•Higher BLEU -> Better MT (c.f. WMT, WAT, IWSLT, OpenMT)
BLEU: (Papineni et al. 2002)
•Precision based
•Weak recall penalty
•Disregards order
•Crude ngram weights
•Over-sensitive to
minor difference
Acknowledgements: The research leading to these results has received funding from the People Pro-
gramme(MarieCurieActions)oftheEuropeanUnion'sSeventhFrameworkProgrammeFP7/2007-2013/un-
der REA grant agreement n ◦ 317471.
RIBES (Isozaki et al. 2014)
•Kendall Tau prior on unigram
•Overcomes reordering
•Adequacy not measured
•Correlates with BLEU (naturally)
System Setup + Results
•Organizers:
RIBES = 94.13 ; BLEU = 69.22 ; HUMAN = 0.0
•Ours :
RIBES = 95.15 ; BLEU = 85.23 ; HUMAN = -17.75 !!!
Note: This is our Unicode2String submission for KO->JA patent subtask in WAT 2015; the other
results of other subtasks are presented in Tan and Bond (2014) and Tan et al. (2015)
Meta-Evaluation
Bubble Graph of Diff. Hypothesis - Baseline BLEU against RIBES for Positive HUMAN Scores
Bubble Graph of Diff. Hypothesis - Baseline BLEU against RIBES for Negative HUMAN Scores
Conclusion
• HigherBLEU/RIBEScorrelateswith+ve
HUMAN, not -ve HUMAN
• Minor lexical diff. cause huge diff. in
BLEU ; RIBES mostly measures fluency
• Minor metric score diff. not reflecting
major translation inadequacy
• Higher BLEU != Better MT
References
• Bogdan Babych and Anthony Hartley. 2004. Ex- tending the BLEU MT evaluation method with fre-
quency weightings. In ACL.
• OndËĞrej Bojar, Rajen Chatterjee, Christian Federmann, Barry Haddow, Matthias Huck, Chris Hokamp,
PhilippKoehn,VarvaraLogacheva,ChristofMonz,MatteoNegri,MattPost,CarolinaScarton,LuciaSpe-
cia, and Marco Turchi. 2015. Findings of the 2015 workshop on statistical machine translation. In WMT.
• Chris Callison-Burch, Miles Osborne, and Philipp Koehn. 2006. Re-evaluation the role of Bleu in ma-
chine translation research. In EACL.
• Mauro Cettolo, Jan Niehues, Sebastian StÃijker, Luisa Bentivogli, and Marcello Federico. 2014. Report
on the 11th IWSLT evaluation campaign, IWSLT 2014. In IWSLT.
• Yvette Graham, Timothy Baldwin, and Nitika Mathur. 2015. Accurate evaluation of segment-level ma-
chine translation metrics. In ACL.
• Hideki Isozaki, Tsutomu Hirao, Kevin Duh, Katsuhito Sudoh, and Hajime Tsukada. 2010. Automatic
evaluation of translation quality for distant language pairs. In EMNLP.
• Toshiaki Nakazawa, Hideya Mino, Isao Goto, Graham Neubig, Sadao Kurohashi, and Eiichiro Sumita.
2015. Overview of the 2nd workshop on Asian translation. In WAT.
• Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: a method for automatic
evaluation of machine translation. In ACL.
• Liling Tan and Francis Bond. 2014. Manipulating in- put data in machine translation. In WAT.
• LilingTan,JosefvanGenabith,andFrancisBond. 2015. Passiveandpervasiveuseofbilingualdictio-nary
in statistical machine translation. In HyTra.

More Related Content

Similar to Liling Tan - 2015 - An Awkward Disparity between BLEU / RIBES Scores and Human Judgements in Machine Translation

Professor Peter Fonagy - CYP IAPT National Clinical Lead
Professor Peter Fonagy - CYP IAPT National Clinical LeadProfessor Peter Fonagy - CYP IAPT National Clinical Lead
Professor Peter Fonagy - CYP IAPT National Clinical Lead
CYP MH
 
Innovations conference 2014 dr tracy robinson using a multidisciplinary pro...
Innovations conference 2014   dr tracy robinson using a multidisciplinary pro...Innovations conference 2014   dr tracy robinson using a multidisciplinary pro...
Innovations conference 2014 dr tracy robinson using a multidisciplinary pro...
Cancer Institute NSW
 
Liling Tan - 2015 - An Awkward Disparity between BLEU / RIBES Scores and Huma...
Liling Tan - 2015 - An Awkward Disparity between BLEU / RIBES Scores and Huma...Liling Tan - 2015 - An Awkward Disparity between BLEU / RIBES Scores and Huma...
Liling Tan - 2015 - An Awkward Disparity between BLEU / RIBES Scores and Huma...
Association for Computational Linguistics
 
Will Biomedical Research Fundamentally Change in the Era of Big Data?
Will Biomedical Research Fundamentally Change in the Era of Big Data?Will Biomedical Research Fundamentally Change in the Era of Big Data?
Will Biomedical Research Fundamentally Change in the Era of Big Data?
Philip Bourne
 
Oral defense b. henry
Oral defense   b. henryOral defense   b. henry
Oral defense b. henry
Dr. Bernard C. Henry
 
Running Head REPLY TO OPINION 5.1 FOR KIMBRILEE SCHMITZ 1REPLY.docx
Running Head REPLY TO OPINION 5.1 FOR KIMBRILEE SCHMITZ 1REPLY.docxRunning Head REPLY TO OPINION 5.1 FOR KIMBRILEE SCHMITZ 1REPLY.docx
Running Head REPLY TO OPINION 5.1 FOR KIMBRILEE SCHMITZ 1REPLY.docx
toltonkendal
 
Developing a Critical Interventions Framework
Developing a Critical Interventions FrameworkDeveloping a Critical Interventions Framework
Developing a Critical Interventions Framework
Australian Centre for Student Equity and Success
 
Evaluating the cost-effectiveness of development investments
Evaluating the cost-effectiveness of development investmentsEvaluating the cost-effectiveness of development investments
Evaluating the cost-effectiveness of development investments
CCAFS | CGIAR Research Program on Climate Change, Agriculture and Food Security
 
HIT Use in Primary Care Practices: Understanding and Eliminating Disparity
HIT Use in Primary Care Practices: Understanding and Eliminating Disparity HIT Use in Primary Care Practices: Understanding and Eliminating Disparity
HIT Use in Primary Care Practices: Understanding and Eliminating Disparity
UCLA CTSI
 
Publishing in Public Health
Publishing in Public HealthPublishing in Public Health
Publishing in Public Health
Houkje
 
Rapid Qaulitative Inquiry, extended presentation
Rapid Qaulitative Inquiry, extended presentationRapid Qaulitative Inquiry, extended presentation
Rapid Qaulitative Inquiry, extended presentation
James Beebe
 
Evaluation of Gender Aware Health Interventions in South Asia: What do we kn...
Evaluation of Gender Aware Health Interventions in South Asia: What do we kn...Evaluation of Gender Aware Health Interventions in South Asia: What do we kn...
Evaluation of Gender Aware Health Interventions in South Asia: What do we kn...
MEASURE Evaluation
 
PhD Confirmation talk
PhD Confirmation talkPhD Confirmation talk
PhD Confirmation talk
University of Melbourne, Australia
 
Reducing sitting time at work: What's the evidence?
Reducing sitting time at work: What's the evidence?Reducing sitting time at work: What's the evidence?
Reducing sitting time at work: What's the evidence?
Health Evidence™
 
Fostering research for policy and practitioners lessons and opportunities
Fostering research for policy and practitioners lessons and opportunitiesFostering research for policy and practitioners lessons and opportunities
Fostering research for policy and practitioners lessons and opportunities
UNICEF Office of Research - Innocenti
 
Healthy Research Partnerships Promote Healthy Communities
Healthy Research Partnerships Promote Healthy CommunitiesHealthy Research Partnerships Promote Healthy Communities
Healthy Research Partnerships Promote Healthy Communities
Mary Nacionales
 
Evidence-Informed Public Health Decisions Made Easier: Take it one Step at a ...
Evidence-Informed Public Health Decisions Made Easier: Take it one Step at a ...Evidence-Informed Public Health Decisions Made Easier: Take it one Step at a ...
Evidence-Informed Public Health Decisions Made Easier: Take it one Step at a ...
Health Evidence™
 
Using Comparative Data to Enhance Learning Abroad Strategies (NAFSA 2018)
Using Comparative Data to Enhance Learning Abroad Strategies (NAFSA 2018)Using Comparative Data to Enhance Learning Abroad Strategies (NAFSA 2018)
Using Comparative Data to Enhance Learning Abroad Strategies (NAFSA 2018)
Keri Ramirez
 
Cadth 2015 d7 burgess cadth 2015 20150414
Cadth 2015 d7 burgess cadth 2015 20150414Cadth 2015 d7 burgess cadth 2015 20150414
Cadth 2015 d7 burgess cadth 2015 20150414
CADTH Symposium
 
Quality in hospital
Quality in hospitalQuality in hospital
Quality in hospital
Mmedsc Hahm
 

Similar to Liling Tan - 2015 - An Awkward Disparity between BLEU / RIBES Scores and Human Judgements in Machine Translation (20)

Professor Peter Fonagy - CYP IAPT National Clinical Lead
Professor Peter Fonagy - CYP IAPT National Clinical LeadProfessor Peter Fonagy - CYP IAPT National Clinical Lead
Professor Peter Fonagy - CYP IAPT National Clinical Lead
 
Innovations conference 2014 dr tracy robinson using a multidisciplinary pro...
Innovations conference 2014   dr tracy robinson using a multidisciplinary pro...Innovations conference 2014   dr tracy robinson using a multidisciplinary pro...
Innovations conference 2014 dr tracy robinson using a multidisciplinary pro...
 
Liling Tan - 2015 - An Awkward Disparity between BLEU / RIBES Scores and Huma...
Liling Tan - 2015 - An Awkward Disparity between BLEU / RIBES Scores and Huma...Liling Tan - 2015 - An Awkward Disparity between BLEU / RIBES Scores and Huma...
Liling Tan - 2015 - An Awkward Disparity between BLEU / RIBES Scores and Huma...
 
Will Biomedical Research Fundamentally Change in the Era of Big Data?
Will Biomedical Research Fundamentally Change in the Era of Big Data?Will Biomedical Research Fundamentally Change in the Era of Big Data?
Will Biomedical Research Fundamentally Change in the Era of Big Data?
 
Oral defense b. henry
Oral defense   b. henryOral defense   b. henry
Oral defense b. henry
 
Running Head REPLY TO OPINION 5.1 FOR KIMBRILEE SCHMITZ 1REPLY.docx
Running Head REPLY TO OPINION 5.1 FOR KIMBRILEE SCHMITZ 1REPLY.docxRunning Head REPLY TO OPINION 5.1 FOR KIMBRILEE SCHMITZ 1REPLY.docx
Running Head REPLY TO OPINION 5.1 FOR KIMBRILEE SCHMITZ 1REPLY.docx
 
Developing a Critical Interventions Framework
Developing a Critical Interventions FrameworkDeveloping a Critical Interventions Framework
Developing a Critical Interventions Framework
 
Evaluating the cost-effectiveness of development investments
Evaluating the cost-effectiveness of development investmentsEvaluating the cost-effectiveness of development investments
Evaluating the cost-effectiveness of development investments
 
HIT Use in Primary Care Practices: Understanding and Eliminating Disparity
HIT Use in Primary Care Practices: Understanding and Eliminating Disparity HIT Use in Primary Care Practices: Understanding and Eliminating Disparity
HIT Use in Primary Care Practices: Understanding and Eliminating Disparity
 
Publishing in Public Health
Publishing in Public HealthPublishing in Public Health
Publishing in Public Health
 
Rapid Qaulitative Inquiry, extended presentation
Rapid Qaulitative Inquiry, extended presentationRapid Qaulitative Inquiry, extended presentation
Rapid Qaulitative Inquiry, extended presentation
 
Evaluation of Gender Aware Health Interventions in South Asia: What do we kn...
Evaluation of Gender Aware Health Interventions in South Asia: What do we kn...Evaluation of Gender Aware Health Interventions in South Asia: What do we kn...
Evaluation of Gender Aware Health Interventions in South Asia: What do we kn...
 
PhD Confirmation talk
PhD Confirmation talkPhD Confirmation talk
PhD Confirmation talk
 
Reducing sitting time at work: What's the evidence?
Reducing sitting time at work: What's the evidence?Reducing sitting time at work: What's the evidence?
Reducing sitting time at work: What's the evidence?
 
Fostering research for policy and practitioners lessons and opportunities
Fostering research for policy and practitioners lessons and opportunitiesFostering research for policy and practitioners lessons and opportunities
Fostering research for policy and practitioners lessons and opportunities
 
Healthy Research Partnerships Promote Healthy Communities
Healthy Research Partnerships Promote Healthy CommunitiesHealthy Research Partnerships Promote Healthy Communities
Healthy Research Partnerships Promote Healthy Communities
 
Evidence-Informed Public Health Decisions Made Easier: Take it one Step at a ...
Evidence-Informed Public Health Decisions Made Easier: Take it one Step at a ...Evidence-Informed Public Health Decisions Made Easier: Take it one Step at a ...
Evidence-Informed Public Health Decisions Made Easier: Take it one Step at a ...
 
Using Comparative Data to Enhance Learning Abroad Strategies (NAFSA 2018)
Using Comparative Data to Enhance Learning Abroad Strategies (NAFSA 2018)Using Comparative Data to Enhance Learning Abroad Strategies (NAFSA 2018)
Using Comparative Data to Enhance Learning Abroad Strategies (NAFSA 2018)
 
Cadth 2015 d7 burgess cadth 2015 20150414
Cadth 2015 d7 burgess cadth 2015 20150414Cadth 2015 d7 burgess cadth 2015 20150414
Cadth 2015 d7 burgess cadth 2015 20150414
 
Quality in hospital
Quality in hospitalQuality in hospital
Quality in hospital
 

More from Association for Computational Linguistics

Muis - 2016 - Weak Semi-Markov CRFs for NP Chunking in Informal Text
Muis - 2016 - Weak Semi-Markov CRFs for NP Chunking in Informal TextMuis - 2016 - Weak Semi-Markov CRFs for NP Chunking in Informal Text
Muis - 2016 - Weak Semi-Markov CRFs for NP Chunking in Informal Text
Association for Computational Linguistics
 
Castro - 2018 - A High Coverage Method for Automatic False Friends Detection ...
Castro - 2018 - A High Coverage Method for Automatic False Friends Detection ...Castro - 2018 - A High Coverage Method for Automatic False Friends Detection ...
Castro - 2018 - A High Coverage Method for Automatic False Friends Detection ...
Association for Computational Linguistics
 
Castro - 2018 - A Crowd-Annotated Spanish Corpus for Humour Analysis
Castro - 2018 - A Crowd-Annotated Spanish Corpus for Humour AnalysisCastro - 2018 - A Crowd-Annotated Spanish Corpus for Humour Analysis
Castro - 2018 - A Crowd-Annotated Spanish Corpus for Humour Analysis
Association for Computational Linguistics
 
Muthu Kumar Chandrasekaran - 2018 - Countering Position Bias in Instructor In...
Muthu Kumar Chandrasekaran - 2018 - Countering Position Bias in Instructor In...Muthu Kumar Chandrasekaran - 2018 - Countering Position Bias in Instructor In...
Muthu Kumar Chandrasekaran - 2018 - Countering Position Bias in Instructor In...
Association for Computational Linguistics
 
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future DirectionsDaniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Association for Computational Linguistics
 
Elior Sulem - 2018 - Semantic Structural Evaluation for Text Simplification
Elior Sulem - 2018 - Semantic Structural Evaluation for Text SimplificationElior Sulem - 2018 - Semantic Structural Evaluation for Text Simplification
Elior Sulem - 2018 - Semantic Structural Evaluation for Text Simplification
Association for Computational Linguistics
 
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future DirectionsDaniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Association for Computational Linguistics
 
Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...
Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...
Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...
Association for Computational Linguistics
 
Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...
Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...
Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...
Association for Computational Linguistics
 
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
Association for Computational Linguistics
 
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 WorkshopSatoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Association for Computational Linguistics
 
Chenchen Ding - 2015 - NICT at WAT 2015
Chenchen Ding - 2015 - NICT at WAT 2015Chenchen Ding - 2015 - NICT at WAT 2015
Chenchen Ding - 2015 - NICT at WAT 2015
Association for Computational Linguistics
 
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
Association for Computational Linguistics
 
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
Association for Computational Linguistics
 
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Association for Computational Linguistics
 
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Association for Computational Linguistics
 
Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015
Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015
Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015
Association for Computational Linguistics
 
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 WorkshopSatoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Association for Computational Linguistics
 
Chenchen Ding - 2015 - NICT at WAT 2015
Chenchen Ding - 2015 - NICT at WAT 2015Chenchen Ding - 2015 - NICT at WAT 2015
Chenchen Ding - 2015 - NICT at WAT 2015
Association for Computational Linguistics
 
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...
Association for Computational Linguistics
 

More from Association for Computational Linguistics (20)

Muis - 2016 - Weak Semi-Markov CRFs for NP Chunking in Informal Text
Muis - 2016 - Weak Semi-Markov CRFs for NP Chunking in Informal TextMuis - 2016 - Weak Semi-Markov CRFs for NP Chunking in Informal Text
Muis - 2016 - Weak Semi-Markov CRFs for NP Chunking in Informal Text
 
Castro - 2018 - A High Coverage Method for Automatic False Friends Detection ...
Castro - 2018 - A High Coverage Method for Automatic False Friends Detection ...Castro - 2018 - A High Coverage Method for Automatic False Friends Detection ...
Castro - 2018 - A High Coverage Method for Automatic False Friends Detection ...
 
Castro - 2018 - A Crowd-Annotated Spanish Corpus for Humour Analysis
Castro - 2018 - A Crowd-Annotated Spanish Corpus for Humour AnalysisCastro - 2018 - A Crowd-Annotated Spanish Corpus for Humour Analysis
Castro - 2018 - A Crowd-Annotated Spanish Corpus for Humour Analysis
 
Muthu Kumar Chandrasekaran - 2018 - Countering Position Bias in Instructor In...
Muthu Kumar Chandrasekaran - 2018 - Countering Position Bias in Instructor In...Muthu Kumar Chandrasekaran - 2018 - Countering Position Bias in Instructor In...
Muthu Kumar Chandrasekaran - 2018 - Countering Position Bias in Instructor In...
 
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future DirectionsDaniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
 
Elior Sulem - 2018 - Semantic Structural Evaluation for Text Simplification
Elior Sulem - 2018 - Semantic Structural Evaluation for Text SimplificationElior Sulem - 2018 - Semantic Structural Evaluation for Text Simplification
Elior Sulem - 2018 - Semantic Structural Evaluation for Text Simplification
 
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future DirectionsDaniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
 
Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...
Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...
Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...
 
Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...
Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...
Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...
 
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
 
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 WorkshopSatoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
 
Chenchen Ding - 2015 - NICT at WAT 2015
Chenchen Ding - 2015 - NICT at WAT 2015Chenchen Ding - 2015 - NICT at WAT 2015
Chenchen Ding - 2015 - NICT at WAT 2015
 
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
 
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
 
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
 
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
 
Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015
Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015
Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015
 
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 WorkshopSatoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
 
Chenchen Ding - 2015 - NICT at WAT 2015
Chenchen Ding - 2015 - NICT at WAT 2015Chenchen Ding - 2015 - NICT at WAT 2015
Chenchen Ding - 2015 - NICT at WAT 2015
 
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...
 

Recently uploaded

BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...
BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...
BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...
Nguyen Thanh Tu Collection
 
South African Journal of Science: Writing with integrity workshop (2024)
South African Journal of Science: Writing with integrity workshop (2024)South African Journal of Science: Writing with integrity workshop (2024)
South African Journal of Science: Writing with integrity workshop (2024)
Academy of Science of South Africa
 
UGC NET Exam Paper 1- Unit 1:Teaching Aptitude
UGC NET Exam Paper 1- Unit 1:Teaching AptitudeUGC NET Exam Paper 1- Unit 1:Teaching Aptitude
UGC NET Exam Paper 1- Unit 1:Teaching Aptitude
S. Raj Kumar
 
LAND USE LAND COVER AND NDVI OF MIRZAPUR DISTRICT, UP
LAND USE LAND COVER AND NDVI OF MIRZAPUR DISTRICT, UPLAND USE LAND COVER AND NDVI OF MIRZAPUR DISTRICT, UP
LAND USE LAND COVER AND NDVI OF MIRZAPUR DISTRICT, UP
RAHUL
 
Walmart Business+ and Spark Good for Nonprofits.pdf
Walmart Business+ and Spark Good for Nonprofits.pdfWalmart Business+ and Spark Good for Nonprofits.pdf
Walmart Business+ and Spark Good for Nonprofits.pdf
TechSoup
 
A Independência da América Espanhola LAPBOOK.pdf
A Independência da América Espanhola LAPBOOK.pdfA Independência da América Espanhola LAPBOOK.pdf
A Independência da América Espanhola LAPBOOK.pdf
Jean Carlos Nunes Paixão
 
PCOS corelations and management through Ayurveda.
PCOS corelations and management through Ayurveda.PCOS corelations and management through Ayurveda.
PCOS corelations and management through Ayurveda.
Dr. Shivangi Singh Parihar
 
Your Skill Boost Masterclass: Strategies for Effective Upskilling
Your Skill Boost Masterclass: Strategies for Effective UpskillingYour Skill Boost Masterclass: Strategies for Effective Upskilling
Your Skill Boost Masterclass: Strategies for Effective Upskilling
Excellence Foundation for South Sudan
 
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptxC1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
mulvey2
 
BBR 2024 Summer Sessions Interview Training
BBR  2024 Summer Sessions Interview TrainingBBR  2024 Summer Sessions Interview Training
BBR 2024 Summer Sessions Interview Training
Katrina Pritchard
 
বাংলাদেশ অর্থনৈতিক সমীক্ষা (Economic Review) ২০২৪ UJS App.pdf
বাংলাদেশ অর্থনৈতিক সমীক্ষা (Economic Review) ২০২৪ UJS App.pdfবাংলাদেশ অর্থনৈতিক সমীক্ষা (Economic Review) ২০২৪ UJS App.pdf
বাংলাদেশ অর্থনৈতিক সমীক্ষা (Economic Review) ২০২৪ UJS App.pdf
eBook.com.bd (প্রয়োজনীয় বাংলা বই)
 
Digital Artefact 1 - Tiny Home Environmental Design
Digital Artefact 1 - Tiny Home Environmental DesignDigital Artefact 1 - Tiny Home Environmental Design
Digital Artefact 1 - Tiny Home Environmental Design
amberjdewit93
 
Liberal Approach to the Study of Indian Politics.pdf
Liberal Approach to the Study of Indian Politics.pdfLiberal Approach to the Study of Indian Politics.pdf
Liberal Approach to the Study of Indian Politics.pdf
WaniBasim
 
Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
Chapter 4 - Islamic Financial Institutions in Malaysia.pptxChapter 4 - Islamic Financial Institutions in Malaysia.pptx
Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
Mohd Adib Abd Muin, Senior Lecturer at Universiti Utara Malaysia
 
RHEOLOGY Physical pharmaceutics-II notes for B.pharm 4th sem students
RHEOLOGY Physical pharmaceutics-II notes for B.pharm 4th sem studentsRHEOLOGY Physical pharmaceutics-II notes for B.pharm 4th sem students
RHEOLOGY Physical pharmaceutics-II notes for B.pharm 4th sem students
Himanshu Rai
 
How to Make a Field Mandatory in Odoo 17
How to Make a Field Mandatory in Odoo 17How to Make a Field Mandatory in Odoo 17
How to Make a Field Mandatory in Odoo 17
Celine George
 
PIMS Job Advertisement 2024.pdf Islamabad
PIMS Job Advertisement 2024.pdf IslamabadPIMS Job Advertisement 2024.pdf Islamabad
PIMS Job Advertisement 2024.pdf Islamabad
AyyanKhan40
 
spot a liar (Haiqa 146).pptx Technical writhing and presentation skills
spot a liar (Haiqa 146).pptx Technical writhing and presentation skillsspot a liar (Haiqa 146).pptx Technical writhing and presentation skills
spot a liar (Haiqa 146).pptx Technical writhing and presentation skills
haiqairshad
 
Wound healing PPT
Wound healing PPTWound healing PPT
Wound healing PPT
Jyoti Chand
 
writing about opinions about Australia the movie
writing about opinions about Australia the moviewriting about opinions about Australia the movie
writing about opinions about Australia the movie
Nicholas Montgomery
 

Recently uploaded (20)

BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...
BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...
BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...
 
South African Journal of Science: Writing with integrity workshop (2024)
South African Journal of Science: Writing with integrity workshop (2024)South African Journal of Science: Writing with integrity workshop (2024)
South African Journal of Science: Writing with integrity workshop (2024)
 
UGC NET Exam Paper 1- Unit 1:Teaching Aptitude
UGC NET Exam Paper 1- Unit 1:Teaching AptitudeUGC NET Exam Paper 1- Unit 1:Teaching Aptitude
UGC NET Exam Paper 1- Unit 1:Teaching Aptitude
 
LAND USE LAND COVER AND NDVI OF MIRZAPUR DISTRICT, UP
LAND USE LAND COVER AND NDVI OF MIRZAPUR DISTRICT, UPLAND USE LAND COVER AND NDVI OF MIRZAPUR DISTRICT, UP
LAND USE LAND COVER AND NDVI OF MIRZAPUR DISTRICT, UP
 
Walmart Business+ and Spark Good for Nonprofits.pdf
Walmart Business+ and Spark Good for Nonprofits.pdfWalmart Business+ and Spark Good for Nonprofits.pdf
Walmart Business+ and Spark Good for Nonprofits.pdf
 
A Independência da América Espanhola LAPBOOK.pdf
A Independência da América Espanhola LAPBOOK.pdfA Independência da América Espanhola LAPBOOK.pdf
A Independência da América Espanhola LAPBOOK.pdf
 
PCOS corelations and management through Ayurveda.
PCOS corelations and management through Ayurveda.PCOS corelations and management through Ayurveda.
PCOS corelations and management through Ayurveda.
 
Your Skill Boost Masterclass: Strategies for Effective Upskilling
Your Skill Boost Masterclass: Strategies for Effective UpskillingYour Skill Boost Masterclass: Strategies for Effective Upskilling
Your Skill Boost Masterclass: Strategies for Effective Upskilling
 
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptxC1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
 
BBR 2024 Summer Sessions Interview Training
BBR  2024 Summer Sessions Interview TrainingBBR  2024 Summer Sessions Interview Training
BBR 2024 Summer Sessions Interview Training
 
বাংলাদেশ অর্থনৈতিক সমীক্ষা (Economic Review) ২০২৪ UJS App.pdf
বাংলাদেশ অর্থনৈতিক সমীক্ষা (Economic Review) ২০২৪ UJS App.pdfবাংলাদেশ অর্থনৈতিক সমীক্ষা (Economic Review) ২০২৪ UJS App.pdf
বাংলাদেশ অর্থনৈতিক সমীক্ষা (Economic Review) ২০২৪ UJS App.pdf
 
Digital Artefact 1 - Tiny Home Environmental Design
Digital Artefact 1 - Tiny Home Environmental DesignDigital Artefact 1 - Tiny Home Environmental Design
Digital Artefact 1 - Tiny Home Environmental Design
 
Liberal Approach to the Study of Indian Politics.pdf
Liberal Approach to the Study of Indian Politics.pdfLiberal Approach to the Study of Indian Politics.pdf
Liberal Approach to the Study of Indian Politics.pdf
 
Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
Chapter 4 - Islamic Financial Institutions in Malaysia.pptxChapter 4 - Islamic Financial Institutions in Malaysia.pptx
Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
 
RHEOLOGY Physical pharmaceutics-II notes for B.pharm 4th sem students
RHEOLOGY Physical pharmaceutics-II notes for B.pharm 4th sem studentsRHEOLOGY Physical pharmaceutics-II notes for B.pharm 4th sem students
RHEOLOGY Physical pharmaceutics-II notes for B.pharm 4th sem students
 
How to Make a Field Mandatory in Odoo 17
How to Make a Field Mandatory in Odoo 17How to Make a Field Mandatory in Odoo 17
How to Make a Field Mandatory in Odoo 17
 
PIMS Job Advertisement 2024.pdf Islamabad
PIMS Job Advertisement 2024.pdf IslamabadPIMS Job Advertisement 2024.pdf Islamabad
PIMS Job Advertisement 2024.pdf Islamabad
 
spot a liar (Haiqa 146).pptx Technical writhing and presentation skills
spot a liar (Haiqa 146).pptx Technical writhing and presentation skillsspot a liar (Haiqa 146).pptx Technical writhing and presentation skills
spot a liar (Haiqa 146).pptx Technical writhing and presentation skills
 
Wound healing PPT
Wound healing PPTWound healing PPT
Wound healing PPT
 
writing about opinions about Australia the movie
writing about opinions about Australia the moviewriting about opinions about Australia the movie
writing about opinions about Australia the movie
 

Liling Tan - 2015 - An Awkward Disparity between BLEU / RIBES Scores and Human Judgements in Machine Translation

  • 1. An Awkward Disparity between BLEU / RIBES Scores and Human Judgements in Machine Translation Liling Tan, Jon Dehdari1 and Josef van Genabith1 Universität des Saarlandes Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI)1 liling.tan@uni-saarland.de, {first.last_name}@dfki.de Introduction •MT metrics criticized for various reasons (Babych and Hartley, 2004; Smith et al. 2014; Graham et al. 2015) •Low BLEU != Bad MT (Callison-Burch et al. 2006) •Higher BLEU -> Better MT (c.f. WMT, WAT, IWSLT, OpenMT) BLEU: (Papineni et al. 2002) •Precision based •Weak recall penalty •Disregards order •Crude ngram weights •Over-sensitive to minor difference Acknowledgements: The research leading to these results has received funding from the People Pro- gramme(MarieCurieActions)oftheEuropeanUnion'sSeventhFrameworkProgrammeFP7/2007-2013/un- der REA grant agreement n ◦ 317471. RIBES (Isozaki et al. 2014) •Kendall Tau prior on unigram •Overcomes reordering •Adequacy not measured •Correlates with BLEU (naturally) System Setup + Results •Organizers: RIBES = 94.13 ; BLEU = 69.22 ; HUMAN = 0.0 •Ours : RIBES = 95.15 ; BLEU = 85.23 ; HUMAN = -17.75 !!! Note: This is our Unicode2String submission for KO->JA patent subtask in WAT 2015; the other results of other subtasks are presented in Tan and Bond (2014) and Tan et al. (2015) Meta-Evaluation Bubble Graph of Diff. Hypothesis - Baseline BLEU against RIBES for Positive HUMAN Scores Bubble Graph of Diff. Hypothesis - Baseline BLEU against RIBES for Negative HUMAN Scores Conclusion • HigherBLEU/RIBEScorrelateswith+ve HUMAN, not -ve HUMAN • Minor lexical diff. cause huge diff. in BLEU ; RIBES mostly measures fluency • Minor metric score diff. not reflecting major translation inadequacy • Higher BLEU != Better MT References • Bogdan Babych and Anthony Hartley. 2004. Ex- tending the BLEU MT evaluation method with fre- quency weightings. In ACL. • OndËĞrej Bojar, Rajen Chatterjee, Christian Federmann, Barry Haddow, Matthias Huck, Chris Hokamp, PhilippKoehn,VarvaraLogacheva,ChristofMonz,MatteoNegri,MattPost,CarolinaScarton,LuciaSpe- cia, and Marco Turchi. 2015. Findings of the 2015 workshop on statistical machine translation. In WMT. • Chris Callison-Burch, Miles Osborne, and Philipp Koehn. 2006. Re-evaluation the role of Bleu in ma- chine translation research. In EACL. • Mauro Cettolo, Jan Niehues, Sebastian StÃijker, Luisa Bentivogli, and Marcello Federico. 2014. Report on the 11th IWSLT evaluation campaign, IWSLT 2014. In IWSLT. • Yvette Graham, Timothy Baldwin, and Nitika Mathur. 2015. Accurate evaluation of segment-level ma- chine translation metrics. In ACL. • Hideki Isozaki, Tsutomu Hirao, Kevin Duh, Katsuhito Sudoh, and Hajime Tsukada. 2010. Automatic evaluation of translation quality for distant language pairs. In EMNLP. • Toshiaki Nakazawa, Hideya Mino, Isao Goto, Graham Neubig, Sadao Kurohashi, and Eiichiro Sumita. 2015. Overview of the 2nd workshop on Asian translation. In WAT. • Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: a method for automatic evaluation of machine translation. In ACL. • Liling Tan and Francis Bond. 2014. Manipulating in- put data in machine translation. In WAT. • LilingTan,JosefvanGenabith,andFrancisBond. 2015. Passiveandpervasiveuseofbilingualdictio-nary in statistical machine translation. In HyTra.