Welocalize Machine Translation Post Editing Basics Course I

A quick overview of machine translation (MT) and post-editing (PE) for localization service buyers and translators. The Welocalize Language Tools team covers basic concepts, the case for MT, examples of raw MT output and its typical patterns, and the difference between full and light post-editing. Contact Welocalize Language Tools for additional information: www.welocalize.com

Published in: Business

    1. Foundations: Machine Translation Post-Editing. Copyright: Welocalize, Inc. 2014. All Rights Reserved.
    2. machine.translation
    3. machine.translation: Different use cases for MT (audience? perishability? visibility?)
       • Contracts
       • Patents
       • Annual Reports
       • Light Marketing
       • Software Documentation
       • Software User Interface
       • SEO (Search Engine Optimization)
       • e-Learning Content
       • User Guides
       • Internal Corporate Communications
       • Wikis
       • Knowledge Bases
       • Proposals / Draft Applications
       • User Generated Content
    4. why.mt
       For clients:
       – Increase throughput and consistency
       – Reduce the cost of translation
       – Content explosion due to the Internet
       – Most Internet content is in English, but the user community is global
       – Desire to also translate "lower quality" content, such as User Generated Content (UGC), at a profitable price
       – Quality of MT has improved (new technologies, lots of research)
       For the translator:
       – Increase throughput and consistency
       – MT is likely to become commonplace, like TMs before it
       – More and more clients and LSPs use MT
       – Be an early adopter
       – MT and new forms of post-editing requirements are evolving fast
    5. basic.concepts: MT in a nutshell
       "[…] Machine Translation provides a set of tools by which digital text is automatically translated from one language (e.g. English) into another language (e.g. Spanish)." Source: Systran user guide
       There are 3 main types of MT systems with different underlying logics:
       • Rules-based (RBMT)
       • Statistical (SMT)
       • Hybrid (SMT + RBMT)
       Most systems used today are either statistical or hybrid. All system types can be customized for specific clients, incorporating client Translation Memories, basic preferences and/or terminology lists.
    6. basic.concepts: Data layers in an MT system
       • Client-specific data: TMs, glossaries
       • Domain-specific data: chemistry or mechanical or IT or…
       • General language data (baseline): anything to "teach the system the basics of the language pair", so all of: tourism, IT, automotive, literature, …
       • Baseline only: e.g. Google Translate and Bing
       • Customizable MT systems (licensed or open source)
    7. basic.concepts: Understanding statistical MT
       For the translator, it is important to understand that SMT systems are based on algorithms that calculate probabilities within a given set of data (bilingual and monolingual). In other words, the system learns from legacy human translations (Translation Memories in our case) and calculates the most likely translations from these, without applying linguistic rules as such.
    8. basic.concepts: The logic behind statistical machine translation (SMT)
       Imagine the TM(s) as an aligned data corpus.
       Example (terminology):
       • The term "click" appears more than 16,000 times in TM A. In 90% of cases it is translated as "fare clic"; in 10% as "selezionare", "scegliere", … The probability is high that the machine translation will be "fare clic".
       • …BUT, maybe… the string "click OK" appears 500 times in TM A. In 50% of cases it is translated as "fare clic su OK"; in 50% as "selezionare OK". The probability is 50% that the machine translation will be "selezionare OK".
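The "click" example on slide 8 amounts to counting how often each target phrase was used for a source phrase and turning the counts into relative frequencies. Below is a minimal sketch of that idea only. The function name `translation_probabilities` and the toy data (scaled down to ten occurrences per phrase) are assumptions for illustration, not part of the course material, and real SMT engines combine such frequencies with other models rather than using them alone.

```python
from collections import Counter


def translation_probabilities(pairs):
    """Map each source phrase to {target phrase: relative frequency}.

    `pairs` is an iterable of (source, target) tuples, standing in for
    aligned segments taken from a translation memory.
    """
    counts = {}
    for source, target in pairs:
        counts.setdefault(source, Counter())[target] += 1
    return {
        source: {t: n / sum(c.values()) for t, n in c.items()}
        for source, c in counts.items()
    }


# Hypothetical toy TM that keeps the proportions from the slide:
# "click" -> 90% "fare clic"; "click OK" -> 50/50 split.
toy_tm = (
    [("click", "fare clic")] * 9
    + [("click", "selezionare")] * 1
    + [("click OK", "fare clic su OK")] * 5
    + [("click OK", "selezionare OK")] * 5
)

probs = translation_probabilities(toy_tm)
for source, targets in probs.items():
    print(source, targets)
```

The sketch shows why a longer string such as "click OK" can come out differently from the single term "click": each phrase carries its own distribution, so a cleaner, more consistent TM gives the system less ambiguous evidence.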
    9. typical.examples
       The quality of raw MT output can vary. A distinction is typically made as follows:
       • good: perfect to overall understandable and fairly fluent
       • medium: contains useful chunks, terms and occasionally perfect output; more or less understandable, little fluency
       • poor: poor with regard to understandability and fluency
       We carry out content evaluations to prevent content with overall poor MT output from going into production.
       Medium is the broadest category and can still lead to productivity gains when used as a basis for post-editing.
    10. typical.examples: The quality of raw MT output can vary. Example:
    11. typical.examples: Know the patterns of MT output
       Even "good" MT output is not expected to be perfect. Depending on the underlying MT logic and the language pair, there tend to be typical issues to fix, e.g.:
       – capitalization
       – punctuation (source punctuation is copied)
       – spacing
       – omissions/additions of text (usually different in nature from those in fuzzy matches)
       – unknown/new words may be translated literally or left in English
       – word order: can mirror the source
       – compound formation
       – word form agreement
       → Being aware of typical issues helps good post-editing.
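A few of the patterns listed on slide 11 (spacing, capitalization, text left in the source language) are mechanical enough to flag automatically before a post-editor opens the segment. The following is a rough sketch, not a description of any Welocalize tool: the function name `check_segment` and the three rules are assumptions chosen for illustration, and a real QA profile would be tuned per language pair and client.

```python
import re


def check_segment(source, target):
    """Return a list of warnings for one source/target segment pair."""
    issues = []

    # Spacing: two or more consecutive spaces in the target.
    if re.search(r" {2,}", target):
        issues.append("double space")

    # Capitalization: source starts with a capital, target does not.
    if source[:1].isupper() and target[:1].islower():
        issues.append("capitalization mismatch at segment start")

    # Unknown words: target is an exact copy of the source.
    if source.strip() == target.strip():
        issues.append("target identical to source (possibly untranslated)")

    return issues


# Hypothetical segments for illustration only.
print(check_segment("Click OK.", "fare  clic su OK."))
print(check_segment("Click OK.", "Fare clic su OK."))
print(check_segment("Dashboard", "Dashboard"))
```

Checks like these only catch surface problems. Word order, agreement and compound formation still need a human reader.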
    12. typical.examples
    13. typical.examples
    14. post.editing: What is Post-Editing?
    15. post.editing: In other words…
       • Make changes where necessary, using as much of the MT output as possible (based on language and client requirements)
       • Read the MT output and the source, then decide quickly what can be used
       • Use as many "bits/sections" of the MT output as possible: move them around, correct word forms, change the part of speech, use them as inspiration
       • Look up key terms in your reference material as usual, but also learn to trust the customized output
       • Automate with customized QA checks
       Adjust your expectations. Rethink your approach. Report recurring errors.
    16. full.post.editing: full post-editing means "publishable quality"
       If the client requests "full post-editing", this means publishable quality. The post-editor is responsible for ensuring that the client's final quality expectations are met.
       ► Client Glossary, TM, Style Guide and others apply. Examples:
       • infinitive / imperative preferences?
       • passive / impersonal preferences?
       • formal / informal preferences?
       • different styles for headers, lists, tables?
       • special formatting of UI options? (bilingual, English)
       • are measurements to be converted?
       • terminology
    17. light.post.editing: light post-editing means "understandable quality"
       Full Post-Editing | Light Post-Editing
       Grammar and spelling are correct | Minor issues in grammar (and spelling) are acceptable
       Terminology is accurate & consistent | Terminology is understandable and actionable
       Spelling is consistent (e.g. hyphenation) | Variations in spelling are acceptable
       Style is consistent (headers, list items, …) | Style variations are acceptable
       Punctuation is correct | Variations/errors in punctuation are acceptable
       Style & tone are appropriate for content | Style & tone are not offensive
       Specific requirements: 33 cm (13''); change EN quotation marks to FR/DE/… | Follow MT output, e.g. keep proposed number format 13'' (33cm), English quotation marks, ...
       … | …
    18. post.editing: light post-editing versus full post-editing
       Image © Common Sense Advisory, "Post-Edited machine translation defined", April 30, 2013
    19. post.editing: Notes on productivity
       Just as with human translation, throughput can vary and depends on:
       – language pair
       – content type & complexity
       – experience
       – domain knowledge
       – quality requirements
       – use of automatic QA tools
       – quality of TM and reference material
       With MT, additional factors are:
       – quality of the MT
       – experience with post-editing
       Compared to average daily throughput for human translation, average daily throughput for full post-editing can be up to 3 times higher.
    20. take.aways
       • There are different use cases of MT, associated with different levels of final (post-edited) quality
       • When full PE is requested, this means publishable quality
       • There are different MT systems; Welocalize works with a range of them
       • MT output varies in quality; we evaluate it with our translation partners to ensure the necessary quality for post-editing is met
       • MT is not expected to be perfect; that's why we need post-editors!
       • Post-editing replaces the translation stage in the workflow, but it is a different task, cognitively
       • MT systems can improve through adding more data and through constructive feedback from post-editors
    21. trademark.disclaimer: Product names, logos, brands and other trademarks referenced within this presentation are the property of their respective trademark holders. These trademark holders are not owned by or affiliated with Welocalize, Inc., our products, or our website. They do not sponsor or endorse our materials. Reference is for educational purposes only.
    22. Questions? Contact the Welocalize Language Tools Team: lena.marg@welocalize.com, elaine.ocurran@welocalize.com
       Welocalize, Frederick, Maryland (Headquarters)
       241 East 4th St., Suite 207, Frederick, Maryland 21701, USA
       [t] +1.301.668.0330
       [t] +1.800.370.9515 (Toll Free)
       www.welocalize.com