MT Evaluation
Seeing the Wood for the Trees
John Tinsley
CEO and Co-founder
TAUS QE Summit. Dublin. 28th May 2015
We need to marry data that we know from operations with data
we produce during MT evaluations to create intelligence
Let’s look at how we can find that out and what it means…
Making the business case for MT
KNOWNS
•  Revenue from translation
•  Costs (internal, outsourced)
•  Variations of this information across content and languages
UNKNOWNS
•  MT performance
•  Cost of MT
•  Variations of this information across content and languages
Calculating potential ROI
Parameters

Per word rate (LSP)   Vendor Rate   Productivity Gain   Project Word Count   MT Cost   MT Weighted Word Count
€0.10                 €0.08                             5,000,000

                      No Machine Translation   With Machine Translation
LSP Revenue           €500,000                 €500,000
Vendor Cost           €400,000
MT Cost               0
Gross Profit          €100,000
Gross Profit Margin   20.0%

Gross Profit Increase when using MT: ???%
**These numbers are for illustrative purposes only and not related to the case study
Problem
Large Chinese to English patent translation project. Challenging
content and language
Question
What if any efficiencies can machine translation add to the workflow of
RWS translators?
How we applied different types of MT evaluation at different stages
in the process, at various go/no-go stages, to help RWS assess whether
MT is viable for this project
Client Case Study – RWS
- UK headquartered public company
- Founded 1958
- 9th largest LSP (CSA 2013 report)
- Leader in specialist IP translations
Lots of different ways to do evaluation
–  automatic scores
•  BLEU, METEOR, GTM, TER
–  fluency, adequacy, comparative ranking
–  task-based evaluation
•  error analysis, post-edit productivity
Different metrics, different intelligence
–  what does each type of metric tell us?
–  which ones are usable at which stage of evaluation?
e.g. can we really use automatic scores to assess productivity?
e.g. does productivity delta really tell us how good the output is?
MT Evaluation – where do we start!?
Can we improve our baseline engines through customisation?
Step 1: Baseline and Customisation
[Bar chart: BLEU and TER scores (axis 0 to 0.8) for Iconic Baseline vs. Iconic Customised]
What next?
How good is the output relative to the task, i.e. post-editing?
- fluency/adequacy are not going to tell us
- let’s start with segment-level TER
- Huge improvement
- Intuitively, scores reflect well but don’t really say anything
- Let’s dig deeper
Translation Edit Rate: correlates well with practical evaluations
If we look deeper, what can we learn?
INTELLIGENCE
• Proportion of full matches (i.e. big savings)
• Proportion of close matches (i.e. faster than fuzzy matches)
• Proportion of poor matches
ACTIONABLE INFORMATION
• Type of sentence with high/low matches
• Weaknesses and gaps
• Segments to compare and analyse in translation memory
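The bucketing described above can be sketched roughly as follows. This is a minimal illustration, not the procedure used in the case study: `approx_ter` is a simplified stand-in for TER (word-level edit distance divided by reference length, without the block-shift operation of full TER), and the 0.3 cut-off between "close" and "poor" matches is a hypothetical threshold chosen only for the example.

```python
from collections import Counter


def word_edit_distance(hyp, ref):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # delete a hypothesis word
                            curr[j - 1] + 1,      # insert a reference word
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def approx_ter(hypothesis, reference):
    """Simplified TER: edits / reference length (no shifts; an assumption)."""
    hyp, ref = hypothesis.split(), reference.split()
    if not ref:
        return 0.0 if not hyp else 1.0
    return word_edit_distance(hyp, ref) / len(ref)


def bucket(score, close_threshold=0.3):
    """Map a segment score to full / close / poor (threshold is hypothetical)."""
    if score == 0.0:
        return "full"
    if score <= close_threshold:
        return "close"
    return "poor"


def bucket_proportions(pairs, close_threshold=0.3):
    """Proportion of segments per bucket for (MT output, reference) pairs."""
    counts = Counter(bucket(approx_ter(h, r), close_threshold) for h, r in pairs)
    total = sum(counts.values())
    if total == 0:
        return {"full": 0.0, "close": 0.0, "poor": 0.0}
    return {name: counts[name] / total for name in ("full", "close", "poor")}


# Made-up segments, purely for illustration.
pairs = [
    ("the device comprises a sensor", "the device comprises a sensor"),
    ("the device include a sensor", "the device comprises a sensor"),
    ("sensor device", "the device comprises a sensor"),
]
print(bucket_proportions(pairs))
```

Listing the segments that fall into each bucket, rather than only counting them, gives the actionable side: the sentence types with high or low matches and the segments worth comparing against translation memory.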
Step 2: Segment-level automatic analysis
[Figure: distribution of segment-level TER scores; axes: TER score and segment length]
This represents a 24% potential productivity gain
With MT experience and previous MT integration, productivity
testing can be run in the production environment. In this case, we
used the Dynamic Quality Framework
Beware the variables!
•  Translators: different experience, speed, perceptions of MT
–  24 translators: senior, staff, and interns
•  Test sets: not representative; particularly difficult
–  2 test sets, comprising 5 documents, and cross-fold validation
•  Environment and task: inexperience and unfamiliarity
–  Training materials, videos, and “dummy” segments
Step 3: Productivity testing
Findings and Learnings
Overall average: 25% productivity gain
By Translator Profile
•  Experienced: 22%
•  Staff: 23%
•  Interns: 30%
By Test Set
•  Test set 1.1: 25%
•  Test set 1.2: 35%
•  Test set 2.1: 6%
•  Test set 2.2: 35%
What it tells us
•  Correlates with TER
•  Rollout with junior staff for more immediate impact on bottom line?
•  Don’t be over-concerned by outliers.
•  Use data to facilitate source content profiling?
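A breakdown like the one above could be computed along these lines. The slides do not say how productivity gain was defined, so this sketch assumes it is the relative reduction in time per word when post-editing compared with translating from scratch. That reading is consistent with the ROI slides, where a 25% gain discounts the word count by 25%, but the relative increase in words per hour is another common definition. The record layout and every number are hypothetical.

```python
from collections import defaultdict


def productivity_gain(scratch_minutes, scratch_words, pe_minutes, pe_words):
    """Relative reduction in minutes per word (assumed definition)."""
    scratch_rate = scratch_minutes / scratch_words
    pe_rate = pe_minutes / pe_words
    return 1.0 - pe_rate / scratch_rate


def gain_by_group(records, key):
    """Aggregate timings per group (e.g. translator profile or test set)."""
    # Per group: [scratch minutes, scratch words, post-edit minutes, post-edit words]
    totals = defaultdict(lambda: [0.0, 0, 0.0, 0])
    for rec in records:
        offset = 0 if rec["mode"] == "scratch" else 2
        totals[rec[key]][offset] += rec["minutes"]
        totals[rec[key]][offset + 1] += rec["words"]
    # Skip groups that lack data for either mode.
    return {group: productivity_gain(*t)
            for group, t in totals.items() if t[1] > 0 and t[3] > 0}


# Hypothetical timing records, not data from the case study.
records = [
    {"profile": "staff", "mode": "scratch", "words": 1000, "minutes": 200},
    {"profile": "staff", "mode": "post-edit", "words": 1000, "minutes": 150},
    {"profile": "intern", "mode": "scratch", "words": 500, "minutes": 200},
    {"profile": "intern", "mode": "post-edit", "words": 1000, "minutes": 280},
]
for profile, gain in gain_by_group(records, "profile").items():
    print(f"{profile}: {gain:.0%}")
```

Passing a different `key` (for example a hypothetical `"test_set"` field) gives the by-test-set view from the same records.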
Look out for anomalies
–  segments with long timings (above average ratio words/minute)
–  sentences that don’t change much from MT to post-edit
–  segments with unusually short timings
In this case, the next step is production roll-out to validate these findings
in the actual translator workflow over an extended period.
Warnings, Tips, and Next Steps

Now would be the right time to do fluency/adequacy if you need to
verify that post-editing is producing, at least, similar quality output
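The anomaly checks listed on this slide could be automated with something like the sketch below. It is only an illustration under stated assumptions: speed is measured in post-edited words per minute and compared with the median, the `slow_factor`, `fast_factor` and `unchanged_ratio` thresholds are hypothetical, and word-level similarity from Python's `difflib` stands in for whatever edit measure a real productivity tool records.

```python
import difflib
import statistics


def flag_anomalies(segments, slow_factor=0.5, fast_factor=2.0, unchanged_ratio=0.95):
    """Return (segment id, reasons) for segments worth a manual look."""
    speeds = [len(s["post_edit"].split()) / s["minutes"] for s in segments]
    median_speed = statistics.median(speeds)
    flagged = []
    for seg, speed in zip(segments, speeds):
        reasons = []
        if speed < slow_factor * median_speed:
            reasons.append("long timing")
        if speed > fast_factor * median_speed:
            reasons.append("short timing")
        similarity = difflib.SequenceMatcher(
            None, seg["mt"].split(), seg["post_edit"].split()).ratio()
        if similarity >= unchanged_ratio:
            reasons.append("little change from MT")
        if reasons:
            flagged.append((seg["id"], reasons))
    return flagged


# Made-up segments; "minutes" is the recorded post-editing time.
segments = [
    {"id": 1, "minutes": 0.7,
     "mt": "the valve is connected to the pump",
     "post_edit": "the valve is coupled to the pump"},
    {"id": 2, "minutes": 5.0,
     "mt": "said layer form on substrate surface",
     "post_edit": "the layer is formed on the surface of the substrate"},
    {"id": 3, "minutes": 0.08,
     "mt": "the housing includes an inlet and an outlet",
     "post_edit": "the housing includes an inlet and an outlet"},
    {"id": 4, "minutes": 1.0,
     "mt": "the sensor detect the temperature of the fluid",
     "post_edit": "the sensor detects the temperature of the fluid"},
    {"id": 5, "minutes": 0.5,
     "mt": "a controller is link to the motor",
     "post_edit": "a controller is linked to the motor"},
]
for seg_id, reasons in flag_anomalies(segments):
    print(seg_id, ", ".join(reasons))
```

A flag is a prompt for inspection rather than a verdict: an unchanged segment with a very short timing may mean the MT output was already good, or that the segment was skipped.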
Calculating the ROI - revisited
Parameters

Per word rate (LSP)   Vendor Rate   Productivity Gain   Project Word Count   MT Cost   MT Weighted Word Count
€0.10                 €0.08                             5,000,000

                      No Machine Translation   With Machine Translation
LSP Revenue           €500,000                 €500,000
Vendor Cost           €400,000
MT Cost               0
Gross Profit          €100,000
Gross Profit Margin   20.0%

Gross Profit Increase when using MT: ???%
**These numbers are for illustrative purposes only and not related to the case study
Calculating the ROI – plugging in the numbers
Parameters

Per word rate (LSP)   Vendor Rate   Productivity Gain   Project Word Count   MT Cost   MT Weighted Word Count
€0.10                 €0.08         25%                 5,000,000            €0.008    3,750,000

                      No Machine Translation   With Machine Translation
LSP Revenue           €500,000                 €500,000
Vendor Cost           €400,000                 €300,000
MT Cost               0                        €40,000
Gross Profit          €100,000                 €160,000
Gross Profit Margin   20.0%                    32%

Gross Profit Increase when using MT: 60%
**These numbers are for illustrative purposes only and not related to the case study
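The arithmetic behind the table can be reproduced with a short sketch. The function and field names are invented for illustration. It assumes, as the table's figures imply, that the productivity gain discounts the word count paid to vendors and that the MT cost is a per-word rate charged on the full project word count (€0.008 × 5,000,000 = €40,000).

```python
def mt_roi(lsp_rate, vendor_rate, word_count, productivity_gain, mt_rate):
    """Compare gross profit with and without MT for one project."""
    revenue = lsp_rate * word_count

    # Without MT: vendors are paid for every word.
    baseline_vendor_cost = vendor_rate * word_count
    baseline_profit = revenue - baseline_vendor_cost

    # With MT: vendors are paid on the MT-weighted word count,
    # and MT is charged per word on the full project (assumption).
    weighted_words = word_count * (1.0 - productivity_gain)
    vendor_cost = vendor_rate * weighted_words
    mt_cost = mt_rate * word_count
    profit = revenue - vendor_cost - mt_cost

    return {
        "revenue": revenue,
        "weighted_words": weighted_words,
        "baseline_profit": baseline_profit,
        "baseline_margin": baseline_profit / revenue,
        "vendor_cost": vendor_cost,
        "mt_cost": mt_cost,
        "profit": profit,
        "margin": profit / revenue,
        "profit_increase": (profit - baseline_profit) / baseline_profit,
    }


# The illustrative parameters from the slide.
result = mt_roi(lsp_rate=0.10, vendor_rate=0.08, word_count=5_000_000,
                productivity_gain=0.25, mt_rate=0.008)
print(f"Gross profit with MT: €{result['profit']:,.0f}")
print(f"Margin: {result['margin']:.0%}, increase: {result['profit_increase']:.0%}")
```

Re-running the function with the per-profile or per-test-set gains from productivity testing shows how sensitive the margin is to the measured gain.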
3 take-home messages
•  Identify the gaps in your data
•  Understand the process to collect the right information
•  Continuous assessment
Thank You!
john@iptranslator.com
@IconicTrans
Iconic Translation Machines
•  Machine Translation with Subject Matter Expertise
•  Headquartered here in Dublin
•  Strong tradition of MT research and development underpinning the company and its technologies
This presentation
•  MT evaluation: what, how, when, why?
–  What ways can we evaluate MT?
–  How do we carry out the evaluation?
–  When in the process do we carry out certain types of evaluation?
–  Why do we do certain evaluations and what do they tell us?
By way of introduction…
Step 2: Segment-level automatic analysis
[Figure: plot of TER scores by length, with a productivity threshold marked]
MT Evaluation: Seeing the Wood for the Trees

  • 1. MT Evaluation Seeing the Wood for the Trees John Tinsley CEO and Co-founder TAUS QE Summit. Dublin. 28th May 2015
  • 2. We need to marry data that we know from operations with data we product during MT evaluations to create intelligence Let’s look at how we can find that out and what it means… Making the business case for MT KNOWNS •  Revenue from translation •  Costs (internal, outsourced) •  Variations of this information across content and languages UNKNOWNS •  MT performance •  Cost of MT •  Variations of this information across content and languages
  • 3. Calculating potential ROI Parameters   Per  word  rate  (LSP)   Vendor  Rate   Produc3vity  Gain   Project  Word  Count   MT  Cost   €0.10   €0.08   5,000,000   MT  Weighted  Word  Count   No  Machine  Transla3on   With  Machine  Transla3on   LSP  Revenue   €500,000   LSP  Revenue   €500,000   Vendor  Cost   €400,000   Vendor  Cost   MT  Cost   0   MT  Cost   Gross  Profit   €100,000   Gross  Profit   Gross  Profit  Margin   20.0%   Gross  Profit  Margin   Gross  Profit   Increase  when  using   MT   ???%   **These numbers are for illustrative purposes only and not related to the case study
  • 4. Problem Large Chinese to English patent translation project. Challenging content and language Question What if any efficiencies can machine translation add to the workflow of RWS translators? How we applied different types of MT evaluation and different stages in the process, at various go/no stages, to help RWS to assess whether MT is viable for this project Client Case Study – RWS - UK headquartered public company - Founded 1958 - 9th largest LSP (CSA 2013 report) - Leader in specialist IP translations
  • 5. Lots of different ways to do evaluation** –  automatic scores •  BLEU, METEOR, GTM, TER –  fluency, adequacy, comparative ranking –  task-based evaluation •  error analysis, post-edit productivity Different metrics, different intelligence –  what does each type of metric tell us? –  which ones are usable at which stage of evaluation? e.g. can we really use automatic scores to assess productivity? e.g. does productivity delta really tell us how good the output is? MT Evaluation – where do we start!?
  • 6. Can we improve our baseline engines through customisation? Step 1: Baseline and Customisation 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 BLEU TER Iconic Baseline Iconic Customised What next? How good is the output relative to the task, i.e. post-editing? - fluency/adequacy not going to tell us - let’s start with segment level TER -  Huge improvement -  Intuitively, scores reflect well but don’t really say anything -  Let’s dig deeper
  • 7. Translation Edit Rate: correlates well with practical evaluations If we look deeper, what can we learn? INTELLIGENCE • Proportion of full matches (i.e. big savings) • Proportion of close matches (i.e. faster that fuzzy matches) • Proportion of poor matches ACTIONABLE INFORMATION • Type of sentence with high/low matches • Weaknesses and gaps • Segments to compare and analyse in translation memory
  • 8. TERscore Step 2: Segment-level automatic analysis Distribution of segment-level TER scores This represents a 24% potential productivity gain** segment length
  • 9. With MT experience and previous MT integration, productivity testing can be run in the production environment. In this case we used, the Dynamic Quality Framework Beware the variables**! •  Translators: different experience, speed, perceptions of MT –  24 translators: senior, staff, and interns •  Test sets: not representative; particularly difficult –  2 tests sets, comprising 5 documents, and cross-fold validation •  Environment and task: inexperience and unfamiliarity –  Training materials, videos, and “dummy” segments Step 3: Productivity testing
  • 10. Overall average Findings and Learnings 25% productivity gain Experienced: 22% Staff: 23% Interns: 30% Test set 1.1: 25% Test set 1.2: 35% Test set 2.1: 06% Test set 2.2: 35% Correlates with TER Rollout with junior staff for more immediate impact on bottom line? Don’t be over concerned by outliers. Use data to facilitate source content profiling? What it tells us By Translator Profile By Test Set
  • 11. Warnings, Tips, and Next Steps Look out for anomalies** –  segments with long timings (above average ratio words/minute) –  sentences that don’t change much from MT to post-edit* –  segments with unusually short timings In this case, the next step is production roll-out to validate these in the actual translator workflow over an extended period. Now would be the right time to do fluency/adequacy if you need to verify that post-editing is producing, at least, similar quality output
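The three anomaly checks listed on this slide could be automated along the following lines. This is only a sketch under stated assumptions: the `Segment` record, the factor-of-three timing thresholds, the 0.95 similarity cut-off, and the sample segments are all hypothetical, and word-level similarity from Python's `difflib` stands in for whatever edit measure the productivity tool actually reports.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from statistics import median


@dataclass
class Segment:
    mt_output: str
    post_edit: str
    seconds: float  # time the translator spent on the segment


def seconds_per_word(seg: Segment) -> float:
    words = max(len(seg.post_edit.split()), 1)
    return seg.seconds / words


def flag_anomalies(segments, slow_factor=3.0, fast_factor=3.0, unchanged_ratio=0.95):
    """Return {segment index: [reasons]} for segments worth a manual look."""
    typical = median(seconds_per_word(s) for s in segments)
    flags = {}
    for i, seg in enumerate(segments):
        reasons = []
        pace = seconds_per_word(seg)
        if pace > typical * slow_factor:
            reasons.append("long timing")
        if pace < typical / fast_factor:
            reasons.append("short timing")
        # Word-level similarity between the MT output and the post-edit.
        similarity = SequenceMatcher(
            None, seg.mt_output.split(), seg.post_edit.split()
        ).ratio()
        if similarity >= unchanged_ratio:
            reasons.append("little change from MT")
        if reasons:
            flags[i] = reasons
    return flags


# Invented segments for illustration only.
segments = [
    Segment("the device comprises a housing", "the device comprises a housing", 10.0),
    Segment("a method for control the valve", "a method for controlling the valve", 12.0),
    Segment("the signal is send to receiver", "the signal is sent to the receiver", 98.0),
    Segment("said module is connect with base", "said module is connected to the base", 1.4),
    Segment("the layer be formed on substrate", "the layer is formed on the substrate", 14.0),
]
flags = flag_anomalies(segments)
print(flags)
```

Comparing each segment against the median pace rather than the mean keeps a handful of extreme timings from shifting the baseline they are being judged against.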
  • 12. Calculating the ROI - revisited
      Parameters
        Per word rate (LSP):      €0.10
        Vendor Rate:              €0.08
        Productivity Gain:        [blank]
        Project Word Count:       5,000,000
        MT Cost:                  [blank]
        MT Weighted Word Count:   [blank]

                              No Machine Translation    With Machine Translation
        LSP Revenue           €500,000                  €500,000
        Vendor Cost           €400,000                  [blank]
        MT Cost               0                         [blank]
        Gross Profit          €100,000                  [blank]
        Gross Profit Margin   20.0%                     [blank]

        Gross Profit Increase when using MT: ???%
      **These numbers are for illustrative purposes only and not related to the case study
  • 13. Calculating the ROI – plugging in the numbers
      Parameters
        Per word rate (LSP):      €0.10
        Vendor Rate:              €0.08
        Productivity Gain:        25%
        Project Word Count:       5,000,000
        MT Cost:                  €0.008
        MT Weighted Word Count:   3,750,000

                              No Machine Translation    With Machine Translation
        LSP Revenue           €500,000                  €500,000
        Vendor Cost           €400,000                  €300,000
        MT Cost               0                         €40,000
        Gross Profit          €100,000                  €160,000
        Gross Profit Margin   20.0%                     32%

        Gross Profit Increase when using MT: 60%
      **These numbers are for illustrative purposes only and not related to the case study
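The arithmetic behind this slide can be reproduced with a short script. The formulas are inferred from the figures shown rather than stated in the slides: the MT cost is read as a per-word rate applied to the full word count, and the MT weighted word count is read as the word count reduced by the productivity gain. The function and key names are invented for this sketch.

```python
from decimal import Decimal


def mt_roi(lsp_rate, vendor_rate, productivity_gain, word_count, mt_cost_per_word):
    """Compare gross profit with and without MT for one project."""
    revenue = lsp_rate * word_count
    baseline_profit = revenue - vendor_rate * word_count

    # Assumption: vendors are paid for a word count discounted by the gain.
    weighted_words = word_count * (1 - productivity_gain)
    vendor_cost = vendor_rate * weighted_words
    # Assumption: MT is charged per source word on the full project.
    mt_cost = mt_cost_per_word * word_count
    profit = revenue - vendor_cost - mt_cost

    return {
        "weighted_words": weighted_words,
        "vendor_cost": vendor_cost,
        "mt_cost": mt_cost,
        "gross_profit": profit,
        "gross_margin": profit / revenue,
        "profit_increase": (profit - baseline_profit) / baseline_profit,
    }


# The illustrative parameters from the slide (not case-study figures).
result = mt_roi(
    lsp_rate=Decimal("0.10"),
    vendor_rate=Decimal("0.08"),
    productivity_gain=Decimal("0.25"),
    word_count=5_000_000,
    mt_cost_per_word=Decimal("0.008"),
)
for name, value in result.items():
    print(f"{name}: {value}")
```

`Decimal` is used instead of floats so that the currency amounts match the slide exactly, with no rounding drift.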
  • 14. 3 take home messages: Identify the gaps in your data. Understand the process to collect the right information. Continuous assessment.
  • 16. Iconic Translation Machines •  Machine Translation with Subject Matter Expertise •  Headquartered here in Dublin •  Strong tradition of MT research and development underpinning the company and its technologies This presentation •  MT evaluation: what, how, when, why? –  What ways can we evaluate MT? –  How do we carry out the evaluation? –  When in the process do we carry out certain types of evaluation? –  Why do we do certain evaluations and what do they tell us? By way of introduction…
  • 17. Step 2: Segment-level automatic analysis [Chart: plot of TER scores by length, with productivity threshold marked]
  • 18. Step 2: Segment-level automatic analysis Distribution of segment-level TER scores

Editor's Notes

  1. One of the biggest challenges as an MT provider is helping the LSP client make the business case for MT. In order to do this, we need to look at what data we HAVE and COMBINE that with data we collect through MT evaluations, to create the business intelligence around making the decision (possibly also TM leverage and translator speeds).
  2. I won’t dwell on this but I’ll refer to it. It helps visualise what information we may HAVE and what information we NEED in order to complete the picture
  3. I’ll talk about how we collected this information through MT evaluations via a case study with RWS. What I’ll focus on is WHAT MT evaluation we carried out and at what STAGES, to give us the information we needed to know.
  4. This is the part where we’re looking into the forest and trying to pick out the right approach. Different metrics tell us different things but, perhaps more to the point, there is what the metrics don’t tell us. There are lots of them out there, and you need to know which ones to use and when. We’ve obviously got a lot of experience in this area given our background. I won’t focus on this now, but maybe we can leave a more detailed discussion for the breakout sessions.
  5. First step is: can we improve our engines through customisation? These automatic scores tell us CONCLUSIVELY: yes. But they don’t really tell us anything about QUALITY, or SUITABILITY for the TASK. We need to dig deeper on a segment level and for this, we use TER. WHY?
  6. TER has correlated well with practical evaluations for us. It gives us practical information which we can correlate with the bottom line. It also gives us practicable (actionable) information which we can use to improve MT and do further analysis. **If you do this over a variety of test documents like we did with RWS, where we used 10, you’ll get a sense of what the MT can bring**
  7. For example, here we see, FOR EACH SEGMENT, the TER range and how long the segments are within those ranges. This allows us to do some calculations, which I won’t detail now but can discuss in the breakout session, and which resulted in a 24% potential productivity gain.
  8. Experience is crucial here. Lots of variables and things to look out for, like TRANSLATORS, TEST SETS, and the ENVIRONMENT, as I’m sure people here can attest to. I won’t go into detail but here’s a high level look at what we did to try to find out different information. Again, I can go into details in the breakout session.
  9. In terms of analysing information, there are a number of things to look out for to make sure we’re getting more accurate results. Suffice to say, now would be the right time to look at quality evaluation and make sure post-editing is not affecting things.
  10. To revisit this (just pointing out these aren’t real numbers… ;-) If we plug it in…
  11. You need to know what information to collect before you can set up the evaluation. You need to understand the different evaluation processes, or work with someone who does, to make sure you collect that information. This is just the start. Engines will improve. Productivity assessment is ongoing too, to ensure at least the same gain, and improvements.