© 2015
TAUS QE Summit San Jose 2015
12:00 / Topic 4:
Benchmarking Machine Translation engines
Benchmarking Machine Translation engines
Session leader: JP Barraza (SYSTRAN)
Discussants: Julie Chang (Intel), Karin Berghoefer (Appen),
Tony O’Dowd (KantanMT)
• Towards industry benchmarking of MT engines and a database of MT use
cases
• One of the main problems in the translation industry today is the lack of
benchmarking. The output of MT engines cannot be compared to industry
averages or standards because these are not yet available. Automated scores
are meaningless outside the “laboratory”. At the same time, buyers of translation
services are increasingly interested in translated content of different quality
levels. They want to save on some content and invest more in other content. They
also want to know how the different engines perform on different content in
different language pairs. How can we be sure MT providers deliver what they are
paid for? Benchmarking MT engines and creating a library of MT use cases is
one way to move forward. Using industry benchmarking based on evaluation and
productivity data is another option. One way or another, buyers need to be able
to compare and benchmark MT solutions to make informed decisions.
Benchmarking Machine Translation engines
Session Goals:
• Present Panelists’ Proposed Draft Solution
• Solicit feedback, requirements and desired outcomes from TAUS
members (survey)
• Establish a Working Group to move the Project forward
Panelists’ Proposed Draft Solution
TAUS MT Benchmarking Site
• Use Cases
• Success Stories
• Best Practices Learned
• Benchmarks
• Domain/Industry Expectations
• MT Vendor Benchmarks
MT Evaluation/Benchmarking:
Learning from the Academics…
Academic MT Competition Framework:
• Datasets for training/tuning are selected by the MT Evaluator
• The disclosed data typically includes the following 3 sets:
• Training set
• Tuning set
• Dev-testing set
• Participants train models with the training set, tune models with the tuning set, and test models using the dev-test
set. Depending on the competition, they receive multiple references for the tuning and dev-testing sets.
• Gold Test Sets (multiple) are set aside by the MT Evaluator and never seen by any competitors
• This is the “testing set”, a Gold test set never seen by the competitors.
• So in sum, we have 3 sets disclosed to participants, and 1 hidden for official scoring and human evaluation.
• The organizer sends out the Gold test set and each participant translates it and submits their best
translations.
• Typically two translations are allowed to be submitted (primary and secondary)
• The scores of each submitted translation are published (primary and secondary)
• Most often used scores (sorted by use frequency):
• BLEU
• NIST
• TER
• METEOR
• WER
• RIBES
• Human Evaluation has been utilized in WMT, WAT, and a few others.
• In the GALE competition, HTER was the official score, measuring the minimum edits against the post-edited MT (MT-PE’d) reference
• Typical HE criteria: Fluency and Adequacy
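Several of the scores listed above (WER, and TER/HTER in simplified form) are built on word-level edit distance. The following is a minimal Python sketch of WER only, assuming whitespace tokenization and a single reference. The function name `word_error_rate` is illustrative and not taken from any evaluation toolkit. TER additionally allows block shifts, which this sketch does not model.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumes whitespace tokenization and one reference translation.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                 # reference word missing from MT output
                curr[j - 1] + 1,             # extra word in MT output
                prev[j - 1] + substitution,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One word of a six-word reference is missing from the MT output.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Lower is better: a score of 0 means the MT output matches the reference word for word.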
Proposed TAUS MT Benchmarking Workflow…
[Diagram: TAUS Benchmarking Data → MT Engines → Automatic Scores → Human Evaluation → Published Results]
Proposed TAUS
MT Benchmarking Process…
STEP 1: Test Data
• TAUS selects industry/domain specific corpora as GOLD Test Sets to be used in evaluating MT engines across vendors.
• Possible Domains:
• Colloquial & Dialog
• IT / Technical Support
• Finance/Economics
• Pharmaceuticals & Life Sciences
• eCommerce
STEP 2: MT Vendors
• Vendors submit their best, commercially available MT engines per domain per LP:
• Vendors make MT engines available via API, so the TAUS MT Benchmarking system connects via API
• The Automatic scores (BLEU, TER,…) are stored
STEP 3: TAUS HE
• Human Evaluation of each tested system:
• TAUS will provide access to a subset of the translated Gold Test Sets to 3 Human Evaluators:
• Preferably, Professional Human Translators and/or Post-Editors.
• TAUS DQF system will be used for this Human Evaluation portion:
• Quality Evaluation using Adequacy and/or Fluency Approaches
• Quality Evaluation using an Error Typology Approach
STEP 4: Publishing Results
• The results of both Automatic Scoring and Human Evaluation would then be published, per domain and per language pair tested, on a TAUS MT
Benchmarking site.
• There should be new GOLD Test Sets for each Benchmarking cycle.
• TAUS MT Benchmarking Frequency:
• Twice a Year (?)
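As a rough illustration of STEP 4, the sketch below groups results per domain and per language pair and averages the scores given by the human evaluators. All vendor names, scores, and field names are hypothetical placeholders chosen for the example. They are not TAUS data and do not describe an actual TAUS or DQF API.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (vendor, domain, language pair, BLEU,
# adequacy scores from 3 human evaluators on a 1-5 scale).
RESULTS = [
    ("vendor_a", "IT / Technical Support", "en-es", 30.0, [3, 4, 3]),
    ("vendor_b", "IT / Technical Support", "en-es", 28.0, [4, 4, 5]),
    ("vendor_a", "eCommerce", "en-de", 25.0, [3, 3, 3]),
]


def build_report(results):
    """Group results by (domain, language pair), best mean adequacy first."""
    report = defaultdict(list)
    for vendor, domain, language_pair, bleu, adequacy_scores in results:
        report[(domain, language_pair)].append(
            {
                "vendor": vendor,
                "bleu": bleu,
                "adequacy": mean(adequacy_scores),
            }
        )
    for rows in report.values():
        rows.sort(key=lambda row: row["adequacy"], reverse=True)
    return dict(report)


if __name__ == "__main__":
    for (domain, language_pair), rows in build_report(RESULTS).items():
        print(f"{domain} / {language_pair}")
        for row in rows:
            print(f"  {row['vendor']}: BLEU {row['bleu']:.1f}, "
                  f"adequacy {row['adequacy']:.2f}")
```

Keeping the automatic score and the human score side by side in each row makes it visible when the two rankings disagree, as they do for the two placeholder vendors in the first group.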
Visual Workflow
[Diagram labels: GOLD Test Set, TAUS Benchmarking System, MT Vendor, Prepare (Training, Tuning, Testing), MT Engine, Translation, Translation Output, Automatic Scores, MT Benchmarking DQF Project, MT Evaluators, Human Evaluation, Results, Web UI Dashboard]
MT Dashboard Mockup
TAUS Member Survey
Please help us establish requirements and desired
outcomes for a TAUS MT dashboard!
Take our Online Survey and Join our Discussion:
https://goo.gl/sIggG5

Smart Translation Resource Management: Semantic Matching, Kirk Zhang (Wiitran...
 
The Theory and Practice of Computer Aided Translation Training System, Liu Q...
 The Theory and Practice of Computer Aided Translation Training System, Liu Q... The Theory and Practice of Computer Aided Translation Training System, Liu Q...
The Theory and Practice of Computer Aided Translation Training System, Liu Q...
 
Translation Technology Showcase in Shenzhen
Translation Technology Showcase in ShenzhenTranslation Technology Showcase in Shenzhen
Translation Technology Showcase in Shenzhen
 
How to efficiently use large-scale TMs in translation, Jing Zhang (Tmxmall)
How to efficiently use large-scale TMs in translation, Jing Zhang (Tmxmall)How to efficiently use large-scale TMs in translation, Jing Zhang (Tmxmall)
How to efficiently use large-scale TMs in translation, Jing Zhang (Tmxmall)
 
SDL Trados Studio 2017, Jocelyn He (SDL)
SDL Trados Studio 2017, Jocelyn He (SDL)SDL Trados Studio 2017, Jocelyn He (SDL)
SDL Trados Studio 2017, Jocelyn He (SDL)
 
How we train post-editors - Yongpeng Wei (Lingosail)
How we train post-editors - Yongpeng Wei (Lingosail)How we train post-editors - Yongpeng Wei (Lingosail)
How we train post-editors - Yongpeng Wei (Lingosail)
 
A use-case for getting MT into your company, Kerstin Berns (berns language c...
 A use-case for getting MT into your company, Kerstin Berns (berns language c... A use-case for getting MT into your company, Kerstin Berns (berns language c...
A use-case for getting MT into your company, Kerstin Berns (berns language c...
 
QE integrated in XTM, by Bob Willans (XTM)
QE integrated in XTM, by Bob Willans (XTM)QE integrated in XTM, by Bob Willans (XTM)
QE integrated in XTM, by Bob Willans (XTM)
 

Towards industry benchmarking of MT engines and a database of MT use cases (JP Barraza, Systran)

  • 1. © 2015 TAUS QE Summit San Jose 2015 12:00 / Topic 4: Benchmarking Machine Translation engines
  • 2. Benchmarking Machine Translation engines. Session leader: JP Barraza (SYSTRAN). Discussants: Julie Chang (Intel), Karin Berghoefer (Appen), Tony O’Dowd (KantanMT).
      • Towards industry benchmarking of MT engines and a database of MT use cases
      • One of the main problems in the translation industry today is the lack of benchmarking. The output of MT engines cannot be compared to industry averages or standards because these are not yet available. Automated scores are meaningless outside the “laboratory”. At the same time, buyers of translation services are increasingly interested in translated content of different quality levels. They want to save on some content and invest more in other content. They also want to know how the different engines are performing on different content in different language pairs. How can we be sure MT providers deliver what they are paid for? Benchmarking MT engines and creating a library of MT use cases is one way to move forward. Using industry benchmarking based on evaluation and productivity data is another option. One way or another, buyers need to be able to compare and benchmark MT solutions to make informed decisions.
  • 3. Benchmarking Machine Translation engines. Session Goals:
      • Present Panelists’ Proposed Draft Solution
      • Solicit feedback, requirements and desired outcomes from TAUS members (survey)
      • Establish a Working Group to move the Project forward
  • 4. Panelists’ Proposed Draft Solution: TAUS MT Benchmarking Site (diagram)
      • Use Cases
          • Success Stories
          • Best Practices Learned
      • Benchmarks
          • Domain/Industry Expectations
          • MT Vendor Benchmarks
  • 5. MT Evaluation/Benchmarking: Learning from the Academics… Academic MT Competition Framework:
      • The dataset for training/tuning is selected by the MT Evaluator.
      • The training dataset typically includes the following 3 sets:
          • Training set
          • Tuning set
          • Dev-testing set
      • Participants train models with the training set, tune them with the tuning set, and test them using the dev-test set. Depending on the competition, they receive multiple references for the tuning and dev-testing sets.
      • Gold Test Sets (multiple) are set aside by the MT Evaluator and never seen by any competitors.
          • This is the “testing set”, a Gold test set never seen by the competitors.
      • So in sum, we have 3 sets disclosed to participants, and 1 hidden for official scoring and human evaluation.
      • The organizer sends out the Gold test set, and each participant translates it and submits their best translations.
          • Typically two translations are allowed to be submitted (primary and secondary).
      • The scores of each submitted translation are published (primary and secondary).
      • Most often used scores (sorted by use frequency):
          • BLEU
          • NIST
          • TER
          • METEOR
          • WER
          • RIBES*
      • Human Evaluation has been utilized in WMT, WAT, and a few others.
          • In the GALE competition, HTER was the official score, measuring the minimum edits against the MT-PE’d (post-edited) reference.**
          • Typical HE criteria: Fluency and Adequacy
  • 6. Proposed TAUS MT Benchmarking Workflow… (diagram). Diagram elements: TAUS Benchmarking Data, MT Engines, Automatic Scores, Human Evaluation, Published Results, MT dashboard.
  • 7. Proposed TAUS MT Benchmarking Process…
      STEP 1: Test Data
      • TAUS selects industry/domain-specific corpora as GOLD Test Sets to be used in evaluating MT engines across vendors.
      • Possible Domains:
          • Colloquial & Dialog
          • IT / Technical Support
          • Finance/Economics
          • Pharmaceuticals & Life Sciences
          • eCommerce
      STEP 2: MT Vendors
      • Vendors submit their best, commercially available MT engines per domain per language pair (LP):
          • Vendors make MT engines available via API, so the TAUS MT Benchmarking system connects via API.
          • The automatic scores (BLEU, TER,…) are stored.
      STEP 3: TAUS HE
      • Human Evaluation of each tested system:
          • TAUS will provide access to a subset of the translated Gold Test Sets to 3 Human Evaluators:
              • Preferably Professional Human Translators and/or Post-Editors.
          • The TAUS DQF system will be used for this Human Evaluation portion:
              • Quality Evaluation using Adequacy and/or Fluency Approaches
              • Quality Evaluation using an Error Typology Approach
      STEP 4: Publishing Results
      • The results of both Automatic Scoring and Human Evaluation would then be published, per domain and per language pair tested, on a TAUS MT Benchmarking site.
      • There should be new GOLD Test Sets for each Benchmarking cycle.
      • TAUS MT Benchmarking Frequency:
          • Twice a Year (?)
  • 8. Visual Workflow (diagram of the TAUS Benchmarking System). Diagram labels: GOLD Test Set, MT Vendor, Training, Tuning, Testing, MT Engine, Translation, Translation Output, Automatic Scores, MT Benchmarking DQF Project, MT Evaluators, Human Evaluation, Results, Prepare, Web UI Dashboard.
  • 9. MT Dashboard Mockup (image)
  • 10. TAUS Member Survey: Please help us establish requirements and desired outcomes for a TAUS MT dashboard! Take our Online Survey and Join our Discussion: https://goo.gl/sIggG5
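
Step 4 on slide 7 proposes publishing automatic scores and human evaluation results per domain and per language pair. The following is a minimal sketch of how such a results table might be assembled. It is not a TAUS or DQF API: the `EngineResult` record, its field names, the rating scale, and the choice to rank by BLEU are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class EngineResult:
    """One benchmarked engine (hypothetical record layout)."""
    vendor: str
    domain: str
    language_pair: str
    bleu: float                          # automatic score stored in Step 2
    adequacy_ratings: Tuple[float, ...]  # one rating per human evaluator (Step 3)


def build_results_table(results: List[EngineResult]) -> Dict[Tuple[str, str], List[dict]]:
    """Group results by (domain, language pair), as proposed for Step 4."""
    table = defaultdict(list)
    for r in results:
        table[(r.domain, r.language_pair)].append({
            "vendor": r.vendor,
            "bleu": r.bleu,
            # Average the human ratings so each engine gets one adequacy figure.
            "adequacy": mean(r.adequacy_ratings),
        })
    # Assumption: rows are listed by descending BLEU within each group.
    for rows in table.values():
        rows.sort(key=lambda row: row["bleu"], reverse=True)
    return dict(table)


if __name__ == "__main__":
    sample = [
        EngineResult("Vendor A", "eCommerce", "en-de", 31.5, (3, 3, 4)),
        EngineResult("Vendor B", "eCommerce", "en-de", 34.0, (4, 3, 4)),
    ]
    for key, rows in build_results_table(sample).items():
        print(key, rows)
```

Keeping the automatic score and the averaged human rating side by side in each row reflects the proposal to publish both, rather than collapsing them into a single ranking number.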

Editor's Notes

  1. Major competitions include WMT (Europe), OpenMT (US), CWMT (China), and WAT (Japan, started in 2014).
     * RIBES: only used in WAT in Japan, but it correlates significantly better with human judgment than BLEU, especially when word order differs significantly between the two languages.
     ** HTER: actually measures how much effort is needed to correct the actual MT output.
  2. Step 1: The possible domains are commercially interesting domains for MT buyers; TAUS can survey its users to come up with a short list.
     Step 3: Guidelines for the two evaluation approaches: adequacy/fluency (https://evaluate.taus.net/academy/best-practices/evaluate-best-practices/adequacy-fluency-guidelines) and error typology (https://evaluate.taus.net/academy/best-practices/evaluate-best-practices/error-typology-guidelines). This step requires resource time, and therefore there is a cost associated. Two potential solutions for ensuring quality Human Evaluations:
       • Human Evaluation costs are paid by participating MT Vendors.
       • Human Evaluation costs are offset by paid access to the results of MT Benchmarking for TAUS Members/Non-Members.
     Submission Period: Much like an academic competition, I think we should suggest an open period for submissions. Once that submission period has ended, only the submitted engines will be benchmarked and published. Others will have to wait until the next open submission period to submit engines for benchmarking.
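
Note 1 describes HTER as measuring how much effort is needed to correct the actual MT output. As a rough illustration of that idea, the sketch below computes a word-level edit rate between an MT output and its post-edited version. It is a simplification under stated assumptions: real TER/HTER also counts block shifts as single edits, which this sketch omits, so the result is closer to WER computed against the post-edited text. The function names and whitespace tokenization are illustrative choices, not part of any official scorer.

```python
def word_edit_distance(hyp_tokens, ref_tokens):
    """Minimum number of word insertions, deletions and substitutions
    needed to turn hyp_tokens into ref_tokens (Levenshtein distance)."""
    prev = list(range(len(ref_tokens) + 1))
    for i, hyp_word in enumerate(hyp_tokens, start=1):
        curr = [i]
        for j, ref_word in enumerate(ref_tokens, start=1):
            cost = 0 if hyp_word == ref_word else 1
            curr.append(min(
                prev[j] + 1,         # delete a word from the MT output
                curr[j - 1] + 1,     # insert a word from the post-edited text
                prev[j - 1] + cost,  # keep or substitute a word
            ))
        prev = curr
    return prev[-1]


def simplified_edit_rate(mt_output, post_edited):
    """Edits per post-edited word; no shift operation, unlike real TER/HTER."""
    hyp = mt_output.split()
    ref = post_edited.split()
    if not ref:
        raise ValueError("post-edited text must contain at least one word")
    return word_edit_distance(hyp, ref) / len(ref)


if __name__ == "__main__":
    mt = "the engine translate the text good"
    pe = "the engine translates the text well"
    print(f"{simplified_edit_rate(mt, pe):.3f}")
```

A lower rate means the post-editor changed less of the MT output, which is why this family of scores is read as a proxy for post-editing effort.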