Integrating Machine Translation with Translation Memory: A Practical Approach

  1. Integrating Machine Translation with Translation Memory: A Practical Approach
     Panagiotis Kanavos and Dimitrios Kartsaklis
     November 4, 2010
  2. Introduction
     Despite ongoing research and progress in the field, Machine Translation (MT) has not been widely accepted by the professional translation industry.
     Common criticisms:
     - MT is only suitable for draft translations of e-mails and web pages.
     - MT is not efficient for morphologically rich languages.
     - MT is useful only to large companies owning a wealth of resources.
     In a nutshell: MT is something for researchers to play around with.
  3. A Case Study
     How MT can be incorporated into professional translation workflows, with limited resources, in ways that significantly increase productivity.
     We combine both statistical and rule-based MT systems with Translation Memory (TM) software using two approaches:
     - the on-demand, sentence-by-sentence application of MT;
     - the one-time application of MT to the whole translation project.
     The case study is conducted under production conditions, with final deliverables that require the highest translation quality.
  4. Our setting
     - Language pair: English to Greek
     - Text to be translated: two Informatics books, one technical guide and one academic textbook
     - TM size: 140,000 TUs (translation units) from in-domain texts
     - Terminology DB size: 30,000 entries
     - Fuzzy threshold: 70%
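CAT tools compute fuzzy-match scores with their own, largely undocumented, similarity measures. To illustrate what a 70% fuzzy threshold means in practice, the Python sketch below scores a new segment against a toy TM using the standard library's `difflib.SequenceMatcher`. That character-level ratio is an assumption standing in for the metric of Swordfish, Déjà Vu X or Wordfast, and the one-entry TM and the function names are invented for the example.

```python
from __future__ import annotations

from difflib import SequenceMatcher
from typing import Optional

FUZZY_THRESHOLD = 70  # percent, the threshold used in the case study


def match_score(a: str, b: str) -> int:
    """Similarity of two source segments as a 0-100 percentage.

    Assumption: difflib's character-level ratio stands in for the
    proprietary fuzzy-match metric of a real CAT tool.
    """
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())


def best_tm_match(segment: str, tm: dict[str, str]) -> tuple[Optional[str], int]:
    """Best TM target and its score; the target is None below the threshold."""
    best_target: Optional[str] = None
    best_score = 0
    for source, target in tm.items():
        score = match_score(segment, source)
        if score > best_score:
            best_target, best_score = target, score
    if best_score < FUZZY_THRESHOLD:
        return None, best_score  # no usable fuzzy match: candidate for MT
    return best_target, best_score
```

With `tm = {"Click the Save button.": "Κάντε κλικ στο κουμπί Αποθήκευση."}`, the segment "Click the Open button." scores above 70 and returns the stored translation for post-editing, while "Install the printer driver." falls below the threshold and returns `None`.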
  5. Software programs and combinations
     MT systems:
     - Statistical: Moses
     - Rule-based: Systran
     CAT programs:
     - Swordfish II (Java application) on Linux
     - Déjà Vu X on MS Windows
     - Wordfast, an MS Word macro template
     Three combinations, chosen on practical grounds:
     - Sentence-by-sentence workflow with Swordfish/Moses
     - Sentence-by-sentence workflow with Wordfast/Systran
     - One-time MT application workflow with Déjà Vu X/Moses
  6. Swordfish/Moses combination
     Swordfish allows connection to external programs or scripts; the connection with Moses was achieved with a custom Python script.
     Basic workflow:

         if TM match > 80% then
             accept fuzzy match for post-edit
         else if 70% < TM match <= 80% then
             evaluate the fuzzy match
             if quality not acceptable then
                 apply MT
             end if
         else
             apply MT
             if quality not acceptable then
                 type the translation from scratch
             end if
         end if
         post-edit
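The decision logic of this workflow can be sketched in Python as follows. The slides only say that a custom Python script connected Swordfish to Moses; the `moses_translate` helper here assumes Moses is running as an XML-RPC server whose `translate` method takes a struct with a `text` field, and the URL is a placeholder. The `acceptable` callback is a hypothetical stand-in for the translator's judgement of a draft.

```python
from __future__ import annotations

import xmlrpc.client
from typing import Callable, Optional


def moses_translate(text: str, url: str = "http://localhost:8080/RPC2") -> str:
    """Translate one segment with a Moses server over XML-RPC.

    Assumption: Moses runs in server mode at `url`; the case study's own
    connector script is not published.
    """
    proxy = xmlrpc.client.ServerProxy(url)
    return proxy.translate({"text": text})["text"]


def swordfish_moses_step(
    source: str,
    fuzzy_target: Optional[str],
    fuzzy_score: int,
    acceptable: Callable[[str], bool],
    translate: Callable[[str], str] = moses_translate,
) -> Optional[str]:
    """Return the draft to post-edit, or None if the translator types from scratch."""
    if fuzzy_target is not None and fuzzy_score > 80:
        return fuzzy_target                      # accept fuzzy match for post-edit
    if fuzzy_target is not None and 70 < fuzzy_score <= 80:
        if acceptable(fuzzy_target):             # evaluate the fuzzy match
            return fuzzy_target
        return translate(source)                 # not acceptable: apply MT
    draft = translate(source)                    # no usable match: apply MT
    return draft if acceptable(draft) else None  # None: type from scratch
```

Passing `translate` as a parameter keeps the routing testable without a running Moses instance; in the 70-80% branch the sketch follows the slide and hands the MT output to post-editing without a second quality check.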
  7. Swordfish/Moses combination: Results
     Book 1: Instructive guide; Book 2: Textbook
  8. Wordfast/Systran combination
     - Wordfast: a macro template working on top of MS Word, allowing a great deal of customization through MS Word macros
     - the rule-based version of Systran, supporting user dictionaries
     Basic workflow:

         if TM match < 70% then
             apply pre-editing macros
             send segment to MT engine
             apply post-editing macros
             while MT result not good do
                 amend Systran user dictionary and re-send segment to MT
             end while
         else
             accept the translation for post-edit
         end if
         post-edit
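A Python sketch of this loop is shown below. The regex rules are hypothetical stand-ins for the Word pre- and post-editing macros (the actual macros are not described), `mt_engine` stands in for Systran with the user dictionary modelled as a plain `dict`, and `good_enough` and `amend_dictionary` represent the translator's judgement and dictionary edits.

```python
from __future__ import annotations

import re
from typing import Callable, Optional

# Hypothetical stand-ins for the Word pre-/post-editing macros.
PRE_EDIT_RULES = [(re.compile(r"\s+"), " ")]              # collapse whitespace
POST_EDIT_RULES = [(re.compile(r" ([,.;:!?])"), r"\1")]   # no space before punctuation


def apply_rules(text: str, rules: list[tuple[re.Pattern[str], str]]) -> str:
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text.strip()


def wordfast_systran_step(
    source: str,
    tm_target: Optional[str],
    tm_score: int,
    user_dict: dict[str, str],
    mt_engine: Callable[[str, dict[str, str]], str],
    good_enough: Callable[[str], bool],
    amend_dictionary: Callable[[str, str, dict[str, str]], None],
    max_rounds: int = 5,
) -> str:
    """Return the draft for post-editing (TM match or MT output)."""
    if tm_target is not None and tm_score >= 70:
        return tm_target                               # accept the TM translation
    pre_edited = apply_rules(source, PRE_EDIT_RULES)
    draft = apply_rules(mt_engine(pre_edited, user_dict), POST_EDIT_RULES)
    rounds = 0
    while not good_enough(draft) and rounds < max_rounds:
        amend_dictionary(source, draft, user_dict)     # translator edits the user dictionary
        draft = apply_rules(mt_engine(pre_edited, user_dict), POST_EDIT_RULES)
        rounds += 1
    return draft
```

Unlike the slide's open-ended `while`, the sketch caps the number of dictionary rounds with the hypothetical `max_rounds` parameter, so a segment that never becomes "good" still reaches post-editing. The user dictionary is passed in rather than created per call because amendments are meant to carry over to later segments.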
  9. Wordfast/Systran combination: Results
     Book 1: Instructive guide; Book 2: Textbook
  10. Déjà Vu X/Moses combination
      Déjà Vu X: similar concept to Swordfish.
      However, there is no way to integrate it with an MT system, so the only option is to pre-translate the whole project with Moses.
      Only segments with no TM match, or with TM matches below 80%, are sent for MT.
      Pre-translation stage:
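The selection rule for the one-time MT run can be sketched in Python: pick the segments with no TM match or a match below 80%, write them one sentence per line (the plain-text input the Moses decoder reads), and attach the output lines back to their segments. The `Segment` class and function names are hypothetical, and how segments are exported from and re-imported into Déjà Vu X is not described in the slides.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    source: str
    tm_target: Optional[str] = None   # best TM match, if any
    tm_score: int = 0                 # fuzzy score in percent
    mt_target: Optional[str] = None   # filled in after the batch MT run


def batch_input(segments: list[Segment], cutoff: int = 80) -> tuple[list[int], str]:
    """Pick segments with no TM match or a match below `cutoff`.

    Returns their indices and the MT input text, one sentence per line.
    """
    picked = [i for i, s in enumerate(segments)
              if s.tm_target is None or s.tm_score < cutoff]
    text = "".join(" ".join(segments[i].source.split()) + "\n" for i in picked)
    return picked, text


def merge_output(segments: list[Segment], picked: list[int], output: str) -> None:
    """Attach the MT output lines to the segments they were produced from."""
    lines = output.splitlines()
    if len(lines) != len(picked):
        raise ValueError("MT output is not line-aligned with the batch input")
    for i, line in zip(picked, lines):
        segments[i].mt_target = line
```

Collapsing internal whitespace in `batch_input` guarantees that each segment occupies exactly one line, which is what makes the positional merge in `merge_output` safe.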
  11. Déjà Vu X/Moses combination
      Basic workflow:

          if TM match > 80% then
              accept the translation for post-edit
          else
              evaluate MT translation
              if quality not acceptable then
                  if any TM match exists (between 70% and 80%) then
                      accept the translation for post-edit
                  else
                      apply "auto-assemble" feature
                      if quality not acceptable then
                          type the translation from scratch
                      end if
                  end if
              end if
          end if
          post-edit
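The per-segment decision in this workflow, applied after pre-translation, can be sketched in Python. `acceptable` and `auto_assemble` are hypothetical callbacks standing in for the translator's judgement and for Déjà Vu X's "auto-assemble" feature; treating the 70-80% band as inclusive is an assumption.

```python
from __future__ import annotations

from typing import Callable, Optional


def dejavu_moses_step(
    tm_target: Optional[str],
    tm_score: int,
    mt_target: Optional[str],
    acceptable: Callable[[str], bool],
    auto_assemble: Callable[[], Optional[str]],
) -> Optional[str]:
    """Choose the draft to post-edit for one pre-translated segment.

    Returns None when the translation has to be typed from scratch.
    """
    if tm_target is not None and tm_score > 80:
        return tm_target                         # accept the TM translation
    if mt_target is not None and acceptable(mt_target):
        return mt_target                         # MT pre-translation is good enough
    if tm_target is not None and 70 <= tm_score <= 80:
        return tm_target                         # fall back to the 70-80% fuzzy match
    assembled = auto_assemble()                  # stand-in for "auto-assemble"
    if assembled is not None and acceptable(assembled):
        return assembled
    return None                                  # type from scratch
```

Note the ordering difference from the Swordfish/Moses workflow: here the MT draft is evaluated before a 70-80% fuzzy match, because pre-translation has already placed the MT output in the segment.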
  12. Déjà Vu X/Moses combination: Results
      Book 1: Instructive guide; Book 2: Textbook
  13. Productivity increase
      MT & TM combination: productivity increased to a level not possible by applying either technology in isolation:
  14. Important factors
      - Quantity and quality of TM entries.
      - The domain of the translation material used to train the statistical MT system.
        These two factors impose serious limitations on those who work with small texts in many different domains; rule-based systems are more suitable in such cases.
      - Language pair: coding efficient user dictionaries for morphologically rich languages is difficult and requires some trial and error; phrase-based systems like Moses perform better here.
      - Style of text: productivity is higher with repetitive text and step-by-step instructions.
      - User expertise with all the technologies involved.
  15. A proposal for a unified application
      For general acceptance by the professional translation community, MT should be integrated with TM into an intuitive, unified system:
      - basically a TM environment, with the MT engine as an extra component working on top of it;
      - MT suggestions should be presented in a controlled and selective way.
      Basic components:
      - a 2-column translation grid for source and target segments
      - terminology management
      - MT engine
      - alignment tool
      - quality assurance control
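The proposal describes an architecture rather than an algorithm, but its central rule (a TM environment first, with MT as an optional layer whose suggestions are offered selectively and labelled by origin) can be sketched in Python. The class name `UnifiedEditor`, the callable-based component interfaces and the 80% cutoff borrowed from the workflows above are illustrative assumptions, not part of the authors' specification; terminology management, alignment and QA are left out.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional

TMLookup = Callable[[str], Optional[tuple[str, int]]]   # source -> (target, score) or None
MTTranslate = Callable[[str], str]                      # source -> MT draft


@dataclass
class Suggestion:
    text: str
    origin: str   # "TM" or "MT", always shown to the translator
    score: int    # TM fuzzy score in percent; 0 for MT output


class UnifiedEditor:
    """TM-first editor; the MT engine is an optional component on top of it."""

    def __init__(self, tm_lookup: TMLookup, mt: Optional[MTTranslate] = None,
                 mt_below: int = 80) -> None:
        self.tm_lookup = tm_lookup
        self.mt = mt
        self.mt_below = mt_below   # offer MT only when the best TM score is below this

    def suggestions(self, source: str) -> list[Suggestion]:
        found: list[Suggestion] = []
        hit = self.tm_lookup(source)
        if hit is not None:
            found.append(Suggestion(hit[0], "TM", hit[1]))
        best = hit[1] if hit is not None else 0
        if self.mt is not None and best < self.mt_below:
            found.append(Suggestion(self.mt(source), "MT", 0))
        return found
```

Because the MT engine is an optional constructor argument, the same editor works as a plain TM tool when no engine is plugged in, which mirrors the "extra component working on top" idea.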
  16. Advanced issues
      - Automation of the training process with TM databases.
      - Statistical systems require considerable computing resources; one solution is MT as Software as a Service (SaaS).
      - Terminology databases can be used for more than reference purposes:
        - additional entry fields for coding MT dictionary entries (Systran);
        - linguistic information can be used for creating factored models (Moses).
      - Automatic suggestions-as-you-type (TransType, Caitra).
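One way to automate training from TM databases, assuming the TM can be exported as TMX (the export step is not described in the slides), is to turn each translation unit into a pair of line-aligned sentences, the parallel-corpus layout that Moses training works from. The function names and the `en`/`el` defaults are hypothetical, and the tokenization, cleaning and length filtering that a real Moses training pipeline also needs are omitted.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# ElementTree expands the predefined xml: prefix to this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def seg_text(seg: ET.Element) -> str:
    """Plain text of a <seg>, skipping the content of inline elements.

    Inline tags such as <bpt>/<ept>/<ph> hold native formatting codes, so only
    the text around them is kept (a simplification: <hi> content is dropped too).
    """
    parts = [seg.text or ""]
    for child in seg:
        parts.append(child.tail or "")
    return " ".join("".join(parts).split())   # also removes line breaks


def tmx_to_parallel(tmx_text: str, src: str = "en",
                    tgt: str = "el") -> tuple[list[str], list[str]]:
    """Line-aligned source/target sentence lists from a TMX document."""
    root = ET.fromstring(tmx_text)
    sources: list[str] = []
    targets: list[str] = []
    for tu in root.iter("tu"):
        segs: dict[str, str] = {}
        for tuv in tu.findall("tuv"):
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower().split("-")[0]
            seg = tuv.find("seg")
            if seg is not None:
                segs[lang] = seg_text(seg)
        if segs.get(src) and segs.get(tgt):   # skip incomplete or empty units
            sources.append(segs[src])
            targets.append(segs[tgt])
    return sources, targets


def write_corpus(prefix: str, sources: list[str], targets: list[str],
                 src: str = "en", tgt: str = "el") -> None:
    """Write prefix.<src> and prefix.<tgt>, one sentence per line."""
    for lang, lines in ((src, sources), (tgt, targets)):
        with open(f"{prefix}.{lang}", "w", encoding="utf-8") as f:
            f.write("".join(line + "\n" for line in lines))
```

Reading the legacy `lang` attribute as a fallback and stripping region subtags (`en-US` to `en`) lets the sketch accept TMX files written by different CAT tools.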
  17. Summary
      - The combination of MT with TM results in a significant productivity increase, not feasible in a TM-only environment.
      - Currently there is no straightforward way of doing this.
      - The authors are working towards this goal in the form of a Software Specification document that will describe the design and components of such a system in full detail.
  18. Thank you! Any questions?
