TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Paris, Gustavo Lucardi, Trusted Translations, 4 June 2012

1,161 views

Published on

This presentation is a part of the MosesCore project that encourages the development and usage of open source machine translation tools, notably the Moses statistical MT toolkit.

MosesCore is supported by the European Commission Grant Number 288487 under the 7th Framework Programme.

For the latest updates, follow us on Twitter - #MosesCore

Published in: Technology, Business
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
1,161
On SlideShare
0
From Embeds
0
Number of Embeds
3
Actions
Shares
0
Downloads
22
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Paris, Gustavo Lucardi, Trusted Translations, 4 June 2012

  1. 1. TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASEMoses: The Trusted TranslationsExperience15:30-15:45Monday 4 JuneGustavo LucardiTrusted Translations
  2. 2. Moses: The Trusted Translations ExperienceTAUS Open Source Machine TranslationPre Conference WorkshopsLocalization World ParisJune 4, 2012Gustavo LucardiCOO Trusted Translations, Inc.@glucardi
  3. 3. Moses: The Trusted Translations Experience Moses from the Point of View of an LSP • What Exactly are We Doing? • Unexpected Experiences and Issues • Evaluating Complementary Tools • Our Next Steps with Moses • Our Conclusions about Moses up to Now • Looking Back What Would We Do Differently • What other LSPs could learn from Our Experience TAUS Open Source Machine Translation Pre Conference Workshops Localization World Paris 2012
  4. 4. Moses: The Trusted Translations Experience What exactly are we doing? • The real answer is LEARNING MT ASAP • If we are asked to choose, we prefer Open Source • We built two Moses engines: ▫ Legal English to Spanish (many clients) and Technical English to Spanish (1 client) ▫ We are re-training those engines with our post-editors feedback ▫ But, we are also re-building those engines with different Moses Customizations • Evaluating Complementary Tools ▫ Like SymEval among others • Testing other MT solutions ▫ Pangea MT (Pangeanic), DoMY (Translation Precision Tools), Smart MATE (Applied Languages), Language Studio (Asia Online), Tauyou, LetsMT, and others TAUS Open Source Machine Translation Pre Conference Workshops Localization World Paris 2012
  5. 5. Moses: The Trusted Translations Experience Unexpected Experiences and Issues • The lack of information and documented procedures on how post- editing should be done (There is a very helpful TAUS Paper about that and we also went to the TAUS Pre Conference this morning) • The difficulty of measuring human post-editing quality and productivity (SymEval was helpful for that) • The difficulty to measure the quality of an Engine using BLEU (For example an Engine with a BLEU of 70 can have a worse output than Engines with a BLEU of 40) • The difficulty of aligning existing Parallel DATA without human intervention • The fear of translators to be displaced by MT • MT focus is improving Quality and Time, not Costs. • MT + PE is not enough = We are doing MT + PE + E and/or P TAUS Open Source Machine Translation Pre Conference Workshops Localization World Paris 2012
  6. 6. Moses: The Trusted Translations Experience Evaluating Complementary Tools • We want to mention SymEval. It has 3 components: Eval, Extract and Diff. • Eval is used with TMX files and it produces an XML file highlighting the differences between the two versions. • Also the Project Score can be seen as the percentage of the machine translation that was used (Two identical files would have a score of 100) • We notice that it can help judging either the quality of the machine translation or the quality of the post-editing. In other words: ▫ If you have a good Engine you could use Eval to test your Post-editor ▫ If you have a good Post-Editor you could use Eval to test your Engines • SymEval is not very user-friendly, and there is not enough support documentation, For example, in the help page it says that the Eval tool only supports XLIFF files, but it actually does support also TMX • We find a more user-friendly tool in a Memory Server that we were testing, but we need to make the Memory Server speak with Moses yet TAUS Open Source Machine Translation Pre Conference Workshops Localization World Paris 2012
  7. 7. Moses: The Trusted Translations Experience Our Next steps with Moses • Scripts to Train (Automate through scripts the pre-processing of DATA to train Moses engines: Take out Placeables like Numbers, Acronyms, XML Tags, E-mail addresses and URLs / Remove Short Segments and Long Segments / Spell check on Source and Target / Remove segments where target and source have different structures) • Scripts to Translate (Automate through scripts the pre and post-processing for translations with Moses engines: Go from TMX to TXT and from TXT to TMX) • Plug-in for Moses interaction with CAT TOOLS (Create or find PLUG-INS to use Moses directly through CAT TOOLS, like the ones for Okapi or Globalsight. This could allow us to avoid to go to and from TMX and TXT. Plug-ins for Moses like the plug-ins for Trados to connect with FreeTranslation, Systran and others) • Moses GUI (Create or find a user interface to allow the less tech savvy to interact directly with Moses) • Interaction of Moses with Cloud Memory Server and Workbench Online • Add the possibility that the Post Editor chose different MT Engines for each segment TAUS Open Source Machine Translation Pre Conference Workshops Localization World Paris 2012
  8. 8. Moses: The Trusted Translations Experience Our Conclusions about Moses up to Now • Moses has great potential, but it is not yet optimized for LSPs • Its interface is not user-friendly (in fact there is no GUI yet) • There are no reliable sources of information for certain niches (a lot of work is still needed to obtain a reliable corpus in some niches with the tools that exist at the moment) • We still have to wait for the development of the new features that LSPs require (glossaries, blacklist words, labels) • It is necessary to optimize the internal processes of the LSPs to adapt to the needs of Moses (Management of Memories, product names and trademarks, languages) • Its better to work with an engine for a specific client, than with one for a specific industry or area of specialization TAUS Open Source Machine Translation Pre Conference Workshops Localization World Paris 2012
  9. 9. Moses: The Trusted Translations Experience Looking Back What Would We Do Differently • Start using a Memory Server before • A Memory Server is the best solution to be sure that your company always properly manages translation memories • Memories are: ▫ The basic input for an LSP to customize an Engine ▫ The added value created by an LSP for a certain client or a domain • Start testing with Moses one year early (We started in 2010 when we should have started in 2009, but it’s never too late!) • Dedicate more resources to test all the other more mature implementations • Build Engines for Clients and not for Domains (???) TAUS Open Source Machine Translation Pre Conference Workshops Localization World Paris 2012
  10. 10. Moses: The Trusted Translations Experience What other LSPs Could Learn from Our Experience • LSPs can only profit from running their own MT engines if they have a recurrent customer with enough data volume to build an engine and perfect engines on a specific domain over time • General (non-specific) MTs (i.e., Google Translate, Microsoft) will provide better results for one time jobs or for recurrent clients with low data volume • The effort put to overcome the problems found during the deployment stage on integrated MT customizations was pretty similar to the effort needed to set up Moses • The current proprietary MTs customizations that we tested have yet to travel some distance in order to offer a turnkey solution for an LSP (We prefer Linux to Windows, and for now we prefer Ubuntu to RedHat) TAUS Open Source Machine Translation Pre Conference Workshops Localization World Paris 2012
  11. 11. Thank you!Gustavo Lucardi@glucardiglucardi@trustedtranslations.comhttp://www.trustedtranslations.comhttp://translation-blog.trustedtranslations.com

×