
Learn the different approaches to machine translation and how to improve the quality of your global strategy with machine translation


Learn the different approaches to machine translation and how to improve the quality of your global strategy with machine translation. Delivered at the SDL Customer Success Summit Montreal 2016.


  1. How to Attain Maximum Machine Translation Quality
     Rodrigo Fuentes Corradi, MT Consultant
     SDL Language Customer Success Summit | June 7, 2016
     SDL Proprietary and Confidential
  2. Overview: The SDL MT Team
     Who we are
     First to commercialize Statistical Machine Translation
     o 50+ Professionals
     o Over 10 Nationalities
     o Across 5 Time Zones
     o 8 Locations
     Widespread team of language lovers, all gathered from the four corners of SDL:
     o Computational Linguists
     o Project Managers
     o Data Specialists
     o Post-Editors
     What we do
     Drive MT Adoption: Educate, promote and support MT usage in existing SDL accounts & new opportunities
     Custom Engine Builds: Design, create, test, implement and monitor custom Statistical Machine Translation engines
     Linguistic Projects: Semantic annotation projects for US Government bodies & academic institutions
     How we do it
     Two Research Labs:
     o Los Angeles, CA
     o Cambridge, UK
     PE Production offices:
     o 30+ Production offices resourcing MTPE
     o Custom Training for MTPE resources
     o Investment in Universities and future supply chain
     We’re Evangelists about Machine Translation, using automation to accelerate productivity
  3. SDL’s Intelligent Machine Translation (iMT): Key steps in MT life cycle
     SDL Approach (cycle diagram; steps shown): Evaluate, Train MT, Test, Post-Edit, Refine
     Process Workflow
     Resource Pool: Engineers, Developers, Scientists, Post-Editor, Computational Linguists
  4. Teamwork for MT success
     ○ The MT market is undergoing radical transformation
     ○ Scepticism remains in terms of what benefit MT can bring to business
     ○ Increasing numbers of mature MT players opt for a structured MT approach to match current communication demands
     ○ The secret of MTPE success lies in a step-by-step, resource-by-resource approach to Enterprise scale Post-Editing
     SDL MT Team Roles
     Account Managers & Consultants
     o Technical consulting
     o Research & implement specific solutions
     o Sales support
     PJMs
     o Communications
     o Project coordination
     o Reporting
     o Support for consulting
     Linguists
     o Prepare customized material
     o Give trainings online or on-site
     Linguists
     o Data cleaning
     o Expert training
     o Engine testing
     o Maintenance
     Engineers
     o Data evaluation
     o Alignment
     o Conversion
     Translation Manager
     o Consolidate feedback on quality
     o Run PE Certification to improve quality
     Functions shown: Post-Edit Training, Engine Building & Testing, Data Analysis & Management, Quality Management, Project Management
  5. Just Starting: Content, Use Case & Solutions
  6. Why companies use MT post-editing
     ○ Faster throughput without sacrificing quality
     ○ To meet aggressive turnarounds
     ○ Ability to handle increasing content volume / volume fluctuation
     ○ Lower production costs
     ○ For high volume, MT can be more consistent
     The demand for MT solutions is growing quickly & post-editing is rapidly becoming a basic skill for translators
  7. Right translation method, right price, right time
     Chart axes: Quality, Volume
     Translation methods shown: Human Translation, Post-Edit, Light Post-Edit, Machine Translation
     Content types shown: Blogs, User Forums, Reviews, Chat, Email Support, FAQ, Websites, Wikis, Knowledge Base, Alerts/Notifications, Help, User Guides, Documentation, Newsletters, Advertising, Marketing, Legal
  8. SDL’s solutions for increasing MT quality
     o Customized Engines
     o Domain Verticals
     o Baselines
     o Language Verticals
  9. Engine Creation & Data Best Practices
  10. Good data for customized engines
      How much?
      o More is better. The statistical algorithms work better with many words to analyse. Upwards of one million words for best success. For very consistent, clean data, half of that may work.
      What content?
      o Content should all be from one content type, using similar terminology. A mix of content types (e.g., technical manuals, advertising, etc.) may produce poor results.
      What style?
      o Style should be consistent. The algorithms learn patterns from similarities, and perform better if data is in similar form. Very long sentences, or creative and varied styles, can negatively affect trainings.
      Vertical engines or baselines may work better if you don’t have enough or the right type of content
      Roles shown: Engineers, Computational Linguists
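The volume guidance on this slide can be turned into a quick pre-check before requesting a customization. The following is only a minimal sketch: the function names, the representation of a translation memory as (source, target) pairs, whitespace word counting, and the two returned labels are all assumptions made for illustration. Only the one-million-word figure and the "half of that" figure for clean, consistent data come from the slide.

```python
# Hypothetical pre-check for training data volume (names are illustrative).
FULL_TARGET_WORDS = 1_000_000            # "upwards of one million words"
CLEAN_TARGET_WORDS = FULL_TARGET_WORDS // 2  # "half of that may work"


def count_source_words(segments):
    """Count whitespace-separated source words in (source, target) pairs."""
    return sum(len(source.split()) for source, _target in segments)


def assess_corpus_size(word_count, clean_and_consistent=False):
    """Return a rough recommendation based on the slide's volume guidance."""
    needed = CLEAN_TARGET_WORDS if clean_and_consistent else FULL_TARGET_WORDS
    if word_count >= needed:
        return "customization candidate"
    return "consider a vertical engine or baseline"


if __name__ == "__main__":
    sample = [("Press the power button", "Appuyez sur le bouton"),
              ("Close the lid", "Fermez le couvercle")]
    print(count_source_words(sample))
    print(assess_corpus_size(600_000, clean_and_consistent=True))
```

A real assessment would also look at content type and style consistency, which a word count cannot capture.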
  11. Types of training data
      Bilingual
      o Core training data: translated content, usually in a translation memory. This is the content that works best and can be processed the fastest.
      Parallel
      o Translated content, but in separate files. This can be used if the content has been translated exactly, and the format is the same. If, for example, the document has extra tables in one language, or has been rewritten substantially to fit a different market, it is hard to find matching sentences.
      Terminology
      o Added to the training data to ensure corporate terms and brands are translated consistently. This can be a termbase or a simple bilingual word list.
      Source Only
      o Representative documents of the content that will be translated. They are used in initial evaluations of suitability for MT and to test the quality of the engine. Depending on their size, some 50-100 documents are ideal.
      Target Only
      o Representative documents in the translated language. They are used during the training and contribute to the fluency of the output. To have an effect, large numbers are needed; several million words are ideal.
  12. New MT customization workflow
      Goals:
      o Enable volume translations
      o Migrate content from HT to PE
      o Provide accuracy and term consistency
      o Provide productivity increases
      Method
      o Iterative engine trainings, with several engines created and the best being deployed
      o Output matches your style and terminology
      o Engines “learn” from your Translation Memories and terminology
      o Work in combination with Baseline language engines
      Workflow steps shown (in extracted order): Feedback, Utility and / or Productivity Testing, SDL Assessment, Client Request, Engine Trainings, Auto Eval Metrics, Data Intake & Processing, Blind Human Evaluation, Deploy Engine
      Roles shown: Post-Editor, Computational Linguists
  13. How Good is Your MT Engine?
  14. MT testing approaches
      MT evaluations should be relevant to your content, from the method of testing (Automatic vs. Human Evaluation) to the testbed. The testbed should represent true-life scenarios, taking the available science and applying it commercially.
      Automated Measures
      o Useful to compare competing engines and identify the best engine with high reliability
      o No predictive value for post-editing productivity, but can validate post-editors’ feedback on MT output
      o All automated measures have their flaws, but SDL has found a weighted combination of measures that gives significant insights.
      Human - Quality Scoring
      o Resources are asked to score the MT output according to instructions, with a focus on understandability.
      o Advantage of method: Human evaluation is considered more robust to alternative, but also valid, translations.
      Note: Human evaluations are prone to subjectivity, so you need multiple test subjects. Performing this kind of test is more expensive and time consuming than an automated approach, but can give an absolute value for one engine, not just a comparison.
      Human - Productivity Testing
      o Productivity gain for MT is calculated by comparing post-editing speed with conventional translation speed, so evaluators can assess how much value post-editing would add in a production environment.
      o Advantage of method: For post-editing, results are a good indicator of the suitability of the MT output.
      Note: Productivity increase is difficult to predict for all cases, and this is also the most expensive and time-consuming test of the three.
      Roles shown: Engineers, Developers, Computational Linguists
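The productivity comparison described on this slide can be sketched as a relative speed increase. This is an illustration only: the slide says the gain comes from comparing post-editing speed with conventional translation speed but does not give a formula, so the relative-difference formula, the function name, and the sample figures below are assumptions.

```python
def productivity_gain(ht_words_per_hour, pe_words_per_hour):
    """Relative gain of post-editing (PE) over human translation (HT).

    Assumed formula: (PE speed - HT speed) / HT speed.
    A result of 0.5 means post-editing was 50% faster.
    """
    if ht_words_per_hour <= 0:
        raise ValueError("HT speed must be positive")
    return (pe_words_per_hour - ht_words_per_hour) / ht_words_per_hour


if __name__ == "__main__":
    # Hypothetical speeds in words per hour, not taken from the deck.
    print(f"{productivity_gain(1000, 1500):.0%}")
```

Because individual translators differ widely, such a figure is more meaningful when averaged over several test subjects working on the same content.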
  15. SDL’s custom MT evaluation platform
      Can evaluate both sentence-level quality & post-edit productivity gain via a custom testing platform and ensure the validity of results
      ○ Data is presented to evaluators in a blind test scenario in order to safeguard validity of results
      ○ Evaluation speed is recorded per segment
      ○ Multiple evaluators assess the same set of sentences
      ○ Each individual performance is compared to ensure consistency
      Additional measures for productivity tests:
      ○ Productivity increase from HT to PE
      ○ Translator’s editing actions (insert, copy-paste, pause)
      ○ Percentage of MT segments that do not require editing
      ○ Levenshtein edit distance from MT to final translation
      Chart: Speed (WPH) for Client resource, SDL resource 1, SDL resource 2 and Total; series labelled Human and Baseline. Values as extracted: 1,127 1,510 1,026 1,188 1,123 1,816 1,470 1,414
      Chart: Customization-Baseline: Average scores
                       evaluator1   evaluator2   Average total
      Customization    3.15         3.04         3.09
      Baseline         3.01         2.92         2.97
      Delta            0.13         0.12         0.13
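Two of the productivity measures listed above, the Levenshtein edit distance from MT to final translation and the share of MT segments that need no editing, can be sketched in a few lines. This is a generic textbook implementation, not SDL's platform code; the function names are illustrative, and comparing word tokens rather than characters is a choice made here for readability.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (strings or lists of tokens)."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (item_a != item_b),  # substitution or match
            ))
        previous = current
    return previous[-1]


def unedited_share(pairs):
    """Fraction of (mt_output, final_translation) pairs left unchanged."""
    if not pairs:
        return 0.0
    return sum(1 for mt, final in pairs if mt == final) / len(pairs)


if __name__ == "__main__":
    mt = "Les lits étaient très à l'aise".split()
    final = "Les lits étaient très confortables".split()
    print(levenshtein(mt, final))  # word-level distance
```

Passing strings gives a character-level distance, and passing token lists gives a word-level one; which is more informative depends on the language pair and on how the result is normalized.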
  16. How to Deploy MT Post-Edit
  17. Achieving effective post-editing process
      o Raw output: Building blocks are in place
      o Linguists focus on refining the output
      o Terminology & style are applied
      o At high volume, MT can deliver greater consistency
      o Trained linguists certified in MT post-editing
      Role shown: Post-Editor
  18. Post-editing quality guidelines
      When post-editing to publishable quality, the following basic principles still apply:
      o The same references must be used as for conventional translation (project-specific guidelines, TMs, glossaries, termbases, etc.)
      o Grammar, spelling and punctuation must be correct
      o Appropriate style & correct terminology must be used consistently
      o The translation must read well and be suitable for its intended purpose
      Labels shown: Customer, User Guide
  19. What is your quality requirement?
      Error Category   Specific Issue        Translation ($$$)   Publishable PE ($$)   Light PE ($)
      Mistranslation   Error                 ✓                   ✓                     ✓
      Terminology      Glossary adherence    ✓                   ✓                     ✓
                       Consistency           ✓                   ✓                     x
      Accuracy         Omissions/Additions   ✓                   ✓                     ✓
      Language         Grammar               ✓                   ✓                     x
                       Spelling              ✓                   ✓                     x
                       Punctuation           ✓                   ✓                     x
      Style            General Style         ✓                   ✓                     x
      Country          Country Standards     ✓                   ✓                     x
                       Register & Tone       ✓                   ✓                     x
  20. How to Maintain & Improve Future Performance
  21. The effects of post-editor feedback
      Flowchart residue; the pairing of issues to responses is not recoverable from the extracted text.
      Feedback types: Linguistic, Technical
      Issues shown:
      o Expected MT behaviour
      o Translation errors following patterns, like dates
      o Protected content translated, wrong terminology
      o Fundamental problem
      o Minor tool issue
      o Major tool issue
      Responses shown:
      o Python filters to protect and transform patterns
      o Terms & brands
      o Hotfix
      o Scheduled fix for future product release
      o Analysis of setup, technical advice
      o Influence long term scientific strategy
      Teams shown: iMT consultants, Technical support, Product development, Scientific development
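The slide mentions Python filters that protect and transform patterns such as dates, without showing one. The following is a minimal sketch of what such a filter could look like, not SDL's implementation: the function names, the `__DATE0__` placeholder scheme, the assumption that source dates are written month/day/year, and the year-month-day output format are all hypothetical.

```python
import re

# Assumed source pattern: US-style dates such as 6/7/2016 (month/day/year).
DATE_RE = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")


def protect_dates(text):
    """Replace each date with a placeholder the MT engine should leave alone."""
    found = []

    def _mask(match):
        found.append(match.groups())
        return f"__DATE{len(found) - 1}__"

    return DATE_RE.sub(_mask, text), found


def restore_dates(translated, found):
    """Put the dates back, transformed to an assumed YYYY-MM-DD target format."""
    for index, (month, day, year) in enumerate(found):
        formatted = f"{year}-{int(month):02d}-{int(day):02d}"
        translated = translated.replace(f"__DATE{index}__", formatted)
    return translated


if __name__ == "__main__":
    masked, dates = protect_dates("Offer ends 6/7/2016.")
    # A real pipeline would send `masked` through the MT engine here.
    print(restore_dates(masked.replace("Offer ends", "L'offre prend fin le"), dates))
```

The design idea is that the engine never sees the pattern it tends to mistranslate, and the transformation to the target convention is applied deterministically afterwards.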
  22. Post-editors identify expected SMT misbehavior
      o Incorrect formatting
      o Additional or missing words
      o Words not localised
      o Gender, number, agreement or verb inflection issues
      o Compound formation issues
      o Syntax and word order issues
      o Wrong punctuation
      o Inconsistent or non-compliant terminology
      o Mistranslations
  23. Examples of unexpected misbehavior
      o Punctuation not following the specific language rules
      o Syntax and word order issues very frequently observed
      o Inconsistent or wrong terminology very frequently observed
      o HTML entities instead of the correct character (e.g. &amp; instead of &)
      o Words in a language other than the target
      Roles shown: Engineers, Scientists, Post-Editor, Computational Linguists
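The HTML-entity problem listed above is easy to detect automatically. The sketch below uses Python's standard `html.unescape`; the function names and the regular expression for spotting entities are illustrative assumptions, not part of any SDL tooling.

```python
import html
import re

# Matches named (&amp;), decimal (&#38;) and hexadecimal (&#x26;) entities.
ENTITY_RE = re.compile(r"&(?:[A-Za-z][A-Za-z0-9]*|#\d+|#[xX][0-9A-Fa-f]+);")


def find_html_entities(segment):
    """Return the entity strings found in an MT output segment."""
    return ENTITY_RE.findall(segment)


def fix_html_entities(segment):
    """Replace entities with the characters they stand for."""
    return html.unescape(segment)


if __name__ == "__main__":
    segment = "Fish &amp; chips"
    if find_html_entities(segment):
        print(fix_html_entities(segment))
```

Flagging such segments and reporting them back, rather than only fixing them silently, keeps the feedback loop to the engine builders intact.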
  24. Expanding the Roadmap
  25. Continuous improvement
      o The SDL iMT Group is constantly researching ways to improve Vertical and Customized MT Engines
      o SDL Research Scientists are continuously improving the Statistical Machine Translation algorithms (e.g. Language Models, Translation Models, Reordering Models, Syntax, Transliteration, Rule-Based Components, etc.)
      o SDL Data Engineers are continuously mining large amounts of good data used by the statistical algorithms
  26. Legacy MT
      Diagram labels: Foreign Language, Legacy MT (Monolithic Phrase-based), Your Language
  27. SDL XMT: Next generation technology, higher quality
      Diagram labels: Foreign Language, XMT, Your Language
      Modular components: Neural Networks, Compound Splitting, Phrase-Based, Finite State Automata, String to Tree, Rule-Based, Tree to String, Pre-Ordering, Transliteration, Hidden Markov Model, Hyper Graphs, ……
      o Modular & Flexible
      o “State-of-the-Art” Machine Learning
      o Better Translation Quality
      o Rapid Research Transition
  28. Legacy MT systems are static
      Diagram labels: MT Provider, MT Engine, MT Output, Post-Editor, PE Edited
  29. SDL MT innovation – Adaptive MT
      ○ New technology developed by SDL Research
      ○ An Adaptive MT engine that learns interactively from the post-editor’s edits
      Diagram labels: SDL Adaptive MT, MT Engine, Adaptive MT Processor, MT Output, Post-Editor, PE Edited
  30. Adaptive MT key Features & Benefits
      ○ Creates a personal adaptive MT engine for the user
      ○ Interactive
      ○ Improves post-editor’s productivity
      ○ Reduces the frustration of editing the same incorrect MT
      ○ Cumulative learning over time – saved from job to job
      ○ No need to wait for a retrain
  31. Before Adaptive MT
      First English document, its machine translation, and the user feedback (post-edited French translation):
      o “The customer service was outstanding”: MT “Le service était exceptionnel”; post-edited “Le service clientèle était exceptionnel”
      o “Very comfortable beds”: MT “Lits très à l'aise”; post-edited “Lits très confortables” (replacing “à l'aise”)
      o “The view was breathtaking”: MT “La vue était breathtaking”; post-edited “La vue était à couper le souffle” (replacing “breathtaking”)
      Next English document and its machine translation:
      o “The customer service was excellent”: MT “Le service ____ était excellent”
      o “The beds were very comfortable”: MT “Les lits étaient très à l'aise”
      o “What a breathtaking view!”: MT “Quelle breathtaking vue!”
  32. With Adaptive MT
      First English document, its machine translation, and the user feedback (post-edited French translation):
      o “The customer service was outstanding”: MT “Le service était exceptionnel”; post-edited “Le service clientèle était exceptionnel”
      o “Very comfortable beds”: MT “Lits très à l'aise”; post-edited “Lits très confortables” (replacing “à l'aise”)
      o “The view was breathtaking”: MT “La vue était breathtaking”; post-edited “La vue était à couper le souffle” (replacing “breathtaking”)
      Next English document and its Adaptive MT translation:
      o “The customer service was excellent”: “Le service clientèle était excellent”
      o “The beds were very comfortable”: “Les lits étaient très confortables”
      o “What a breathtaking view!”: “Quelle vue à couper le souffle!”
      Roles shown: Engineers, Post-Editor, Developers, Scientists, Computational Linguists
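The idea of learning from post-edits can be illustrated with a deliberately naive toy. This is not how SDL's Adaptive MT works (the deck does not describe its internals); it is an assumed stand-in that remembers word-level replacements a post-editor made, using Python's standard `difflib`, and re-applies them to later raw MT output. All names are hypothetical.

```python
from difflib import SequenceMatcher


def learn_replacements(mt_output, post_edited, memory):
    """Record phrases the post-editor replaced, keyed by the original MT phrase."""
    mt_tokens, pe_tokens = mt_output.split(), post_edited.split()
    matcher = SequenceMatcher(None, mt_tokens, pe_tokens)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":  # insertions and deletions are ignored in this toy
            memory[" ".join(mt_tokens[i1:i2])] = " ".join(pe_tokens[j1:j2])
    return memory


def apply_replacements(mt_output, memory):
    """Apply remembered replacements to new raw MT output."""
    for wrong, right in memory.items():
        mt_output = mt_output.replace(wrong, right)
    return mt_output


if __name__ == "__main__":
    memory = {}
    learn_replacements("Lits très à l'aise", "Lits très confortables", memory)
    print(apply_replacements("Les lits étaient très à l'aise", memory))
```

A plain substitution memory like this cannot handle reordering or inserted words, which is why it would not reproduce every corrected sentence on the slide; it only shows why feedback that persists from job to job spares the post-editor from fixing the same error twice.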
  33. Focus on SDL Montreal
  34. Focus on Canada’s market challenges
      Market challenge, followed by the traditional offer (SDL prior to 2014, Google, Bing):
      o Flavor requirements: Mixed French flavor
      o Large retail projects, no or small starting TMs: Mixed domains, no retail vertical
      o High turnover: Lack of suitable generic solutions prevents MTPE from the start
      o High quality requirements: Lack of flavor & domain-specific terminology increases PE effort and review costs
  35. Engine performance summary
      Charts comparing Flavor, Terminology and Fluency for four engine types: FR Baseline, FR-CA Language Vertical, FR Domain Verticals, Customizations
  36. SDL’s solution maturity roadmap
      Stages in order of increasing maturity:
      Generic FR-CA solutions
      o Win clients
      o Meet deadlines
      o Collect project-specific data
      Customizations
      o Improve productivity & quality
      o Collect more data and share feedback
      Retrainings
      o Further improvement to productivity and quality
  37. SDL’s answer to Canada’s market challenges
      Market challenge, followed by SDL’s offer after 2014:
      o Flavor requirements: Training material is handpicked to ensure correct flavor
      o Large retail projects, no or small starting TMs: We have grown retail solutions to fit current & new opportunities
      o High turnover: We have a portfolio of training material & success recipes for a quick start
      o High quality requirements: Combination of adapted MT solutions & shrewd testing and feedback processes
  38. Summary
  39. How do I get started?
      Let’s have a conversation:
      o What content do you need translated?
      o What are your quality requirements?
      o What can you use for a training corpus?
  40. Takeaway
      o Measure & improve
      o MT can be complex, so choose your MT provider wisely
      o Document your quality requirement
      o Integrate MT within your larger localization infrastructure
      o Use trained, certified post-editors
  41. Copyright © 2008-2016 SDL plc. All rights reserved. All company names, brand names, trademarks, service marks, images and logos are the property of their respective owners. This presentation and its content are SDL confidential unless otherwise specified, and may not be copied, used or distributed except as authorised by SDL. Global Customer Experience Management
