SDL BeGlobal The SDL Platform for Automated Translation


Post-edited machine translation, both as a skill and as an addition to the professional translator's toolkit, is now becoming widely accepted. Here you can see why...

Published in: Technology, Business

  1. SDL BeGlobal The SDL Platform for Automated Translation (SDL Proprietary and Confidential)
  2. Platform Characteristics
  3. The SDL BeGlobal platform. Multiple access points to MT engines: WorldServer, Trados Studio, Passolo, LivePerson, Oracle RightNow, Custom Integrations. Realtime API. TouchPoints. Translation Engines. Role-based access web UI. Customized Language Pairs. Generic Language Pairs.
  4. Infrastructure Platform
      • Scalable: serving thousands of users in real time
      • Private: all requests remain private and confidential to users
      • Secure: only accessible to those privileged
  5. Infrastructure Platform
      • Organized: only users with specific rights can do specific things
      • Available & Deployable: can instantly be switched on (cloud offering)
      • Made for integration: catering to IT professionals
  6. Available Language Pairs
      • Western European: Danish <> English; Dutch <> English; Finnish <> English; French <> English; French <> Arabic; French <> Spanish, German; German <> English; German <> French; German <> Spanish; Greek <> English; Italian <> English, Spanish; Norwegian <> English; Portuguese <> English; Spanish <> English, Arabic, French, German, Italian; Swedish <> English
      • Eastern European: Albanian <> English; Bulgarian <> English; Czech <> English; Estonian <> English; Hungarian <> English; Latvian <> English; Lithuanian <> English; Polish <> English; Romanian <> English; Russian <> English; Serbian <> English; Slovak <> English; Slovenian <> English; Turkish <> English; Ukrainian <> English
      • Middle East & African: Arabic <> English; Arabic <> Spanish; Arabic <> French; Dari <> English; Hausa <> English; Hebrew <> English; Pashto <> English; Persian <> English; Somali <> English
      • Asian: Bengali <> English; Hindi <> English; Indonesian <> English; Japanese <> English; Korean <> English; Simplified Chinese <> English; Thai <> English; Traditional Chinese <> English; Urdu <> English
      New language pairs are added on a continuing basis. Custom developments welcome.
  7. Integrations. Business User Focus
  8. Integrations: Web Translator. A simple web application for translating text and documents, customizable with the customer's visual identity.
  9. Integrations: BeGlobal Desktop. A desktop tool for translating text and documents.
  10. Integrations: Microsoft Word. The Word plugin lets the user translate the document currently open on screen.
  11. Integrations: Microsoft Word. Once translation is complete, the source file is displayed next to the target translation, so the user can see both documents.
  12. Integrations: Microsoft Outlook. The Outlook plugin translates an email, either appending the translation to the source or overwriting it.
  13. Integrations: LivePerson chat. The agent chats in English and all incoming Spanish messages are automatically translated. The user chats in Spanish and also receives the agent's messages in Spanish.
  14. Integrations: SDL Trados Studio. Define BeGlobal connection. Define pre-translation settings.
  15. Integrations: SDL Trados Studio. Translation results. Trados Studio Editor.
  16. BeGlobal Trainer. Continuous improvement
  17. Language Pair Customization Process – Then and Now
      • Professional Services approach (diagram labels): The Customer; Parallel Data, TMs, Glossaries; SDL Applied Science Engineers; SDL SMT “Training” Environment; Training Data; Custom Language Pair; Deploy; Translation Engines (Customized Language Pairs, Generic Language Pairs)
      • BeGlobal Trainer self-service approach (diagram labels): The Customer; Parallel Data, TMs, Glossaries; BeGlobal Trainer
  18. What is BeGlobal Trainer?
      • A Software-as-a-Service component of SDL BeGlobal that allows the customization (“training”) of MT engines based on a parallel corpus in TMX format.
      • Performs domain adaptation for existing generic engines.
      • The packaged (commercial) approach of SDL Science Labs to MT language pair customization, summing up 15+ years of field experience.
      • A simple 4-step process:
  19. The “Training” Workflow
      • WORK OFFLINE: Collect Parallel Data → Prepare Parallel Data. TMX files are used as training data (single file or multiple). At least 500K/1M source words for domain improvement.
      • BEGLOBAL TRAINER: Create New Training → Upload Data → Train MT Engine → Evaluate Engine
      • BEGLOBAL SaaS: Deploy Engine → Translate (BeGlobal Online, API, SDL Trados Studio, …)
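The data requirement on slide 19 can be checked before uploading. The following is a minimal sketch, not part of BeGlobal Trainer: it counts whitespace-separated source words in TMX files and compares the total with the lower bound from the slide (500K). The function names, the whitespace tokenization, and the language-prefix matching are assumptions made for illustration; the trainer may count words differently.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Lower bound quoted on the slide ("at least 500K/1M source words").
MIN_SOURCE_WORDS = 500_000


def count_source_words(tmx_root, src_lang):
    """Count whitespace-separated words in all <seg> elements of the source language."""
    wanted = src_lang.lower().split("-")[0]
    total = 0
    for tuv in tmx_root.iter("tuv"):
        # TMX 1.4 uses xml:lang; older files may use a plain "lang" attribute.
        lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
        if lang.split("-")[0] != wanted:
            continue
        seg = tuv.find("seg")
        if seg is not None:
            # itertext() also picks up text nested inside inline markup.
            total += len("".join(seg.itertext()).split())
    return total


def corpus_source_words(paths, src_lang):
    """Sum the source word count over one or several TMX files."""
    return sum(count_source_words(ET.parse(p).getroot(), src_lang) for p in paths)


def meets_minimum(word_count, minimum=MIN_SOURCE_WORDS):
    return word_count >= minimum


# Tiny in-memory sample (header attributes trimmed for brevity).
SAMPLE_TMX = """<tmx version="1.4"><header srclang="en-US"/><body>
<tu><tuv xml:lang="en-US"><seg>Right and left fenders are the same.</seg></tuv>
<tuv xml:lang="fr-FR"><seg>Les ailes droites et gauches sont les mêmes.</seg></tuv></tu>
<tu><tuv xml:lang="en-US"><seg>The email attachment is supported.</seg></tuv>
<tuv xml:lang="es-ES"><seg>El archivo adjunto al correo electrónico es compatible.</seg></tuv></tu>
</body></tmx>"""

sample_words = count_source_words(ET.fromstring(SAMPLE_TMX), "en")
print(sample_words, meets_minimum(sample_words))
```

For real data, `corpus_source_words(["a.tmx", "b.tmx"], "en")` (hypothetical file names) gives the total across several files.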
  20. Trainer Concept
      • Stages: Upload, Clean, Train, Evaluate, Deploy
      • Inputs: Training corpus (TMX), Test set (TMX), Regression set (TXT)
      • Clean: automatic cleaning is done on the training corpus.
      • Train: multiple internal trainings are performed to determine the best/fastest engine. Complete training; Trained Language Pair.
      • Evaluate: “BLEU” score evaluation. BLEU (BiLingual Evaluation Understudy) is an algorithm for evaluating the quality of text which has been machine-translated: “The closer a machine translation is to the professional human translation in the TMX file, the better it is.”
      • Human evaluation (optional): human evaluation on the Regression set (trained vs generic) or custom evaluation.
      • Deploy: Deployed Custom Language Pair; Generic.
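To make the BLEU idea on slide 20 concrete, here is a minimal sketch of a corpus-level BLEU score with a single reference per segment: clipped n-gram precision up to 4-grams, combined by geometric mean and multiplied by a brevity penalty. The slides do not say which BLEU variant, tokenization, or smoothing BeGlobal Trainer uses, so the whitespace tokenization and the absence of smoothing here are assumptions.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU, one reference per hypothesis, no smoothing."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            hyp_counts, ref_counts = ngrams(h, n), ngrams(r, n)
            # Clip each n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: output shorter than the reference is penalized.
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity * math.exp(log_precision)


# Human translations (reference) and raw MT output, taken from the example slides.
references = [
    "How can I adjust the brightness of the display screen?",
    "Close the bleeding screw C then continue next step of bleeding process.",
]
mt_output = [
    "How can adjust the brightness the display screen?",
    "Open the bleeding screw C then continue next step of bleeding process.",
]

score = corpus_bleu(mt_output, references)
perfect = corpus_bleu(references, references)
print(round(score, 3), perfect)
```

Scoring the output of a trained engine and of the generic engine against the same test set with a function like this gives the "trained vs generic" comparison in numeric form.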
  21. Real-life applications. Translation productivity: Translate, Post-edit (WorldServer, Trados Studio); Machine Translate; Generate TM; BG Integrations; Deploy (BeGlobal Platform); Train (BeGlobal Trainer). Integrations & Self-service translations.
  22. Fully documented, fully supported
      • Service provided with full documentation, user guide, best practices and dedicated consultancy
  23. Demonstrations
  24. Post-Editing Best Practice
  25. In Order to Post-Edit Effectively...
      • Retain as much raw MT output as possible
      • Some parts of the MT output can almost always be used
      • Don't over-edit or under-edit
      • For each segment, re-read your translation
        – no details missing
        – no superfluous words left in
      Content, and therefore MT output, can vary. More or less editing may be required depending on the nature or difficulty of the material.
  26. Typical Features to Watch Out for in SMT Output
      • SMT output is generally fluent in style and consistent with existing data (i.e. the data used to build the engine)
      • BUT look out for the following issues (a sample of just some of them):
        – Extra words in the target that are not present in the source
        – Words missing in the target (adjectives, nouns, verbs, prefixes)
        – Mistranslations (antonyms: remove/install, can/cannot, do/don't)
        – Grammar: gender, agreement or verb inflection might be wrong
  27. Extra Words in the Target

      | LP | Source | MT Output | PE Version | Comments |
      |---|---|---|---|---|
      | EN-FR | Right and left fenders are the same. | Les ailes droites et gauches sont bien les mêmes. | Les ailes droites et gauches sont les mêmes. | Word added |
      | EN-ES | The email attachment is supported. | El como archivos adjuntos al correo electrónico es compatible. | El archivo adjunto al correo electrónico es compatible. | Word added |
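The "MT Output" and "PE Version" columns in these example slides can be compared mechanically to see what a post-editor changed. The sketch below uses Python's standard `difflib` to list word-level edits; the function name `word_edits` and the whitespace tokenization are assumptions for illustration, and nothing here is part of the SDL tooling.

```python
import difflib


def word_edits(mt_output, post_edited):
    """Return (operation, MT words, post-edited words) for every changed span."""
    mt, pe = mt_output.split(), post_edited.split()
    matcher = difflib.SequenceMatcher(a=mt, b=pe, autojunk=False)
    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            edits.append((tag, " ".join(mt[i1:i2]), " ".join(pe[j1:j2])))
    return edits


# Extra word in the target: the post-editor deletes "bien".
extra_word = word_edits(
    "Les ailes droites et gauches sont bien les mêmes.",
    "Les ailes droites et gauches sont les mêmes.",
)
print(extra_word)  # [('delete', 'bien', '')]

# Mistranslated polarity: the post-editor restores the negation.
negation = word_edits("Connect the battery", "Do not connect the battery")
print(negation)  # [('replace', 'Connect', 'Do not connect')]
```

A "delete" operation corresponds to an extra word in the MT output, an "insert" to a missing word, and a "replace" to a mistranslation or grammar fix.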
  28. Words Missing in the Target

      | LP | Source | MT Output | PE Version | Comments |
      |---|---|---|---|---|
      | IT-EN | Come è possibile regolare la luminosità dello schermo del display? | How can adjust the brightness the display screen? | How can I adjust the brightness of the display screen? | Subject missing in the target (not required in the source) |
      | EN-FR | Using hardware removed earlier, attach the new center link. | A l'aide du matériel de fixation déposé plus tôt, fixez la nouvelle articulation. | A l'aide du matériel de fixation déposé plus tôt, fixez la nouvelle articulation centrale. | Missing word in the target |

  29. Mistranslations

      | LP | Source | MT Output | PE Version | Comments |
      |---|---|---|---|---|
      | EN-FR | You can alter, publicly display, publicly perform or broadcast the hardware. | Vous ne pouvez en aucun cas modifier, afficher publiquement, exécuter publiquement ou diffuser les matériels. | Vous pouvez modifier, afficher publiquement, exécuter publiquement ou diffuser les matériels. | Negative instead of positive sentence |
      | ES-EN | No conectar la batería | Connect the battery | Do not connect the battery | Positive instead of negative sentence |
      | IT-EN | Chiudere la vite di spurgo C e passare alla fase successiva del processo di spurgo. | Open the bleeding screw C then continue next step of bleeding process. | Close the bleeding screw C then continue next step of bleeding process. | Antonym |

  30. Grammar

      | LP | Source | MT Output | PE Version | Comments |
      |---|---|---|---|---|
      | EN-DE | To remove the 3D diffuser: | Zum Entfernen des 3D Refraktionstechnik: | Zum Entfernen der 3D Refraktionstechnik: | Wrong case |
      | IT-EN | Di seguito viene descritto soltanto ciò che è aggiunto. | The following is described only what is added. | The following describes only what is added. | Wrong verb tense |
      | EN-FR | The pressure is reduced to pilot pressure. | La pression est réduit à la pression pilote. | La pression est réduite à la pression pilote | Gender agreement (masculine instead of feminine) |
  31. Summary
  32. SDL BeGlobal …
      • Is an infrastructure: scalable, secure, easy to deploy and integrate
      • For translators: with continued tightened integration with SDL Trados Studio
      • And for non-linguists: enables the “non-professional translator”, “the business user”, to self-service their multilingual communication needs, on demand and within their environment
      • Continuously improves: allows a non-technical, non-computational-linguist user to train and evaluate custom engines
  33. Thank You!