
Seeing the Wood for the Trees in MT Evaluation: an LSP success story from RWS


Delivered at the 29th LocWorld conference, October 16th 2015, Santa Clara, CA, USA.

In this talk, we describe how we carried out a successful large-scale evaluation and deployment of machine translation at RWS.



  1. Seeing the Wood for the Trees in MT Evaluation: an LSP success story from RWS. John Tinsley, CEO and Co-founder. LocWorld, Silicon Valley, 16th October 2015.
  2. This is a piece of string.
  3. We provide Machine Translation solutions with Subject Matter Expertise.
  4. But how does that work? Input → pre-processing → post-processing → output, underpinned by training data and data engineering.
  5. The Ensemble Architecture™: combining linguistics, statistics, and MT expertise. Between input and output sit components such as a patent input classifier, Chinese pre-ordering rules, Japanese script normalisation, German compounding rules, a Korean pharma tokenizer, a Spanish med-device entity recognizer, client TM/terminology (optional), multiple engines (Moses, RBMT), statistical post-editing, and multi-output combination.
  6. Where does evaluation fit in? The MT Adoption Cycle.
  7. Wait! Let's take a step back. Why? To improve translator productivity. What? How much faster MT makes them. How? Measure gains in speed. When? Perpetually.
  8. MT Evaluation – where do we start!? There are lots of different ways to do evaluation: automatic scores (BLEU, METEOR, GTM, TER); fluency, adequacy, and comparative ranking; and task-based evaluation (error analysis, post-edit productivity). Different metrics give different intelligence: what does each type of metric tell us, and which ones are usable at which stage of evaluation? E.g. can we really use automatic scores to assess productivity? Does a productivity delta really tell us how good the output is? (A scoring sketch follows below.)
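To make the automatic scores concrete, here is a minimal sketch of corpus-level BLEU and TER scoring using the sacrebleu Python library. The choice of sacrebleu and the example sentences are our own assumptions; the talk does not name specific tooling.

```python
# Minimal sketch: corpus-level BLEU and TER with sacrebleu
# (pip install sacrebleu). Sentences are invented examples.
from sacrebleu.metrics import BLEU, TER

mt_output = [
    "the device comprises a housing and a sensor",
    "a method for the treatment of data",
]
references = [[  # one reference translation per MT segment
    "the device comprises a casing and a sensor",
    "a method of processing data",
]]

bleu = BLEU()
ter = TER()
print(bleu.corpus_score(mt_output, references))  # higher BLEU = closer to reference
print(ter.corpus_score(mt_output, references))   # lower TER = fewer edits needed
```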
  9. Client Case Study – RWS: a UK-headquartered public company, founded in 1958, the 9th largest LSP (CSA 2013 report), and a leader in specialist IP translations. Problem: a large Chinese-to-English patent translation project with challenging content and language. Question: what efficiencies, if any, can machine translation add to the workflow of RWS translators? We applied different types of MT evaluation at different stages in the process, with various go/no-go points, to help RWS assess whether MT was viable for this project.
  10. Step 1: Baseline and Customisation. Can we improve our baseline engines through customisation? [Chart: BLEU and TER scores (0–0.8) for the Iconic baseline vs. the Iconic customised engine – a huge improvement.] Intuitively, the scores reflect well, but they don't really say anything on their own. What next? How good is the output relative to the task, i.e. post-editing? Fluency/adequacy are not going to tell us, so let's start with segment-level TER and dig deeper.
  11. Translation Edit Rate correlates well with practical evaluations. If we look deeper, what can we learn? Intelligence: the proportion of full matches (i.e. big savings), of close matches (i.e. faster than fuzzy matches), and of poor matches. Actionable information: the types of sentence with high/low matches, weaknesses and gaps, and segments to compare and analyse in the translation memory.
  12. Step 2: Segment-level automatic analysis. [Chart: distribution of segment-level TER scores against segment length.] This represents a 24% potential productivity gain. (A sketch of this banding analysis follows below.)
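A minimal sketch of banding segment-level TER scores into the match categories from slide 11, again assuming sacrebleu. The band thresholds and example segments are our own illustrative assumptions, not figures from the talk.

```python
# Band segment-level TER scores into match categories.
# Thresholds below are illustrative assumptions only.
from sacrebleu.metrics import TER

def band(score: float) -> str:
    if score == 0.0:
        return "full match"    # no edits needed: big savings
    if score <= 0.3:
        return "close match"   # faster than fuzzy matches
    return "poor match"        # may be cheaper to translate from scratch

ter = TER()
mt_output = ["the device comprises a housing", "a method for treating data"]
post_edits = ["the device comprises a casing", "a method of processing data"]

counts = {"full match": 0, "close match": 0, "poor match": 0}
for hyp, ref in zip(mt_output, post_edits):
    # sacrebleu reports TER on a 0-100 scale; normalise to 0-1
    score = ter.sentence_score(hyp, [ref]).score / 100.0
    counts[band(score)] += 1

print(counts)
```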
  13. Step 3: Productivity testing. With MT experience and previous MT integration, productivity testing can be run in the production environment. In this case we used the TAUS Dynamic Quality Framework.
  14. Productivity Test (screenshot).
  15. Step 3: Productivity testing – beware the variables! Translators differ in experience, speed, and perceptions of MT: we used 24 translators (senior, staff, and interns). Test sets can be unrepresentative or particularly difficult: we used 2 test sets, comprising 5 documents, with cross-fold validation. Inexperience and unfamiliarity with the environment and task can skew results: we provided training materials, videos, and "dummy" segments.
  16. Findings and Learnings. Overall average: a 25% productivity gain, which correlates with TER. By translator profile: experienced 22%, staff 23%, interns 30% – roll out with junior staff for a more immediate impact on the bottom line? By test set: 1.1: 25%, 1.2: 35%, 2.1: 6%, 2.2: 35% – don't be overly concerned by outliers; use the data to facilitate source content profiling?
  17. Warnings, Tips, and Next Steps. Look out for anomalies: segments with long timings (a below-average words-per-minute rate), sentences that don't change much from MT to post-edit, and segments with unusually short timings (see the sketch below). In this case, the next step is production roll-out to validate these findings in the actual translator workflow over an extended period. Now would also be the right time to do fluency/adequacy evaluation if you need to verify that post-editing is producing, at least, similar-quality output.
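A minimal, hypothetical sketch of the anomaly checks described on the slide; the field names, thresholds, and example data are invented for illustration and are not RWS's actual tooling.

```python
# Flag post-editing segments worth a closer look, per the slide's
# three anomaly types. Thresholds are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Segment:
    mt: str          # raw MT output
    post_edit: str   # translator's post-edited text
    minutes: float   # time spent post-editing

def flag_anomalies(segments, slow_wpm=5.0, fast_wpm=60.0):
    """Yield (index, reason) pairs for anomalous segments."""
    for i, seg in enumerate(segments):
        wpm = len(seg.post_edit.split()) / max(seg.minutes, 1e-6)
        if wpm < slow_wpm:
            yield i, "unusually long timing"
        elif wpm > fast_wpm:
            yield i, "unusually short timing"
        if seg.mt.strip() == seg.post_edit.strip():
            yield i, "no change from MT to post-edit"

segments = [
    Segment("the device has a housing", "the device has a housing", 0.1),
    Segment("method of data treat", "a method of treating data", 6.0),
]
for idx, reason in flag_anomalies(segments):
    print(f"segment {idx}: {reason}")
```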
  18. Making the business case for MT. We need to marry the data we know from operations with the data we produce during MT evaluations to create business intelligence. Knowns: revenue from translation; costs (internal, outsourced); variations of this information across content and languages. Unknowns: MT performance; cost of MT; variations of this information across content and languages. Let's look at how we can find that out and what it means…
  19. Calculating the ROI on MT. Parameters: per-word rate (LSP) €0.10; vendor rate €0.08; project word count 5,000,000; productivity gain, MT cost, and MT-weighted word count still to be determined. Without machine translation: LSP revenue €500,000; vendor cost €400,000; MT cost €0; gross profit €100,000; gross profit margin 20.0%. With machine translation: LSP revenue €500,000; the remaining figures, and the gross profit increase when using MT (???%), are filled in on the next slide. **These numbers are for illustrative purposes only and not related to the case study.
  20. Calculating the ROI – plugging in the numbers. Parameters: per-word rate (LSP) €0.10; vendor rate €0.08; productivity gain 25%; project word count 5,000,000; MT cost €0.008 per word; MT-weighted word count 3,750,000. Without machine translation: LSP revenue €500,000; vendor cost €400,000; MT cost €0; gross profit €100,000; gross profit margin 20.0%. With machine translation: LSP revenue €500,000; vendor cost €300,000; MT cost €40,000; gross profit €160,000; gross profit margin 32%. Gross profit increase when using MT: 60%. (A worked version of this arithmetic follows below.) **These numbers are for illustrative purposes only and not related to the case study.
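A worked version of the slide's ROI arithmetic, using the illustrative numbers from the slides; the function and parameter names are our own.

```python
# Reproduce the slide's ROI calculation. The productivity gain
# discounts the word count the vendor is paid for, while the MT
# cost applies to every source word.
def mt_roi(per_word_rate, vendor_rate, productivity_gain,
           word_count, mt_cost_per_word):
    revenue = per_word_rate * word_count
    weighted_words = word_count * (1 - productivity_gain)  # MT-weighted word count
    vendor_cost = vendor_rate * weighted_words
    mt_cost = mt_cost_per_word * word_count
    gross_profit = revenue - vendor_cost - mt_cost
    return gross_profit, gross_profit / revenue

profit_no_mt = 0.10 * 5_000_000 - 0.08 * 5_000_000       # €100,000
profit_mt, margin = mt_roi(0.10, 0.08, 0.25, 5_000_000, 0.008)

print(f"gross profit with MT: €{profit_mt:,.0f}")                  # €160,000
print(f"gross profit margin: {margin:.0%}")                        # 32%
print(f"increase vs. no MT: {profit_mt / profit_no_mt - 1:.0%}")   # 60%
```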
  21. Fit for Purpose Evaluation.
  22. Thank You! john@iconictranslation.com @IconicTrans
