TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Monaco, Joel Sigling, AVB, 25 March 2012

A Moses engine for legal translation
This presentation is a part of the MosesCore project that encourages the development and usage of open source machine translation tools, notably the Moses statistical MT toolkit.
MosesCore is supported by the European Commission Grant Number 288487 under the 7th Framework Programme.
Latest news on Twitter - #MosesCore


  1. TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE
     A Moses MT engine for legal translation
     By Joël Sigling
  2. Joël Sigling, Director
     A Moses MT engine for legal translation
     Modern technology in a traditional sector
     TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE
     Monte Carlo, 25 March 2012
  3. AVB Translations background
     • Amstelveens Vertaalburo: founded 1972, traditional, high-quality agency
     • Translation World: founded 2002, tech-savvy all-round player
     • Merger in 2010 >> AVB Translations: premium brand with strong tech focus
     • Top 5 player in the Netherlands, 2011 turnover € 4.6 million
     • Core business: general translations (legal, financial, technical, …). NO software localization (yet!)
  4. History of MT interest
     • Member of TAUS since 2008, 1st round table in Amsterdam
     • Visited TAUS User Conferences in the US since 2009
     • Sense of urgency developed; distracted by the merger in 2010
     • Action in 2011, after the merger
     • 2011: choice for a Dutch <> English legal (not IT-related!) domain engine
     • Why SMT, why Moses? Quicker, cheaper, similar quality (as research shows)
  5. Why legal domain MT engine?
     • Legal translations account for approx. 40% of AVB business, 80% Dutch <> English
     • Not the obvious choice: people said MT wouldn't work for legal, as sentences are too long and the material too intricate
     • Statistical MT is suited to non-stylistic materials, e.g. legal
     • If this works, we can make MT happen for all other domains
  6. MT engine objectives
     • Increased productivity: no BLEU % target, but tangible, practical results. How much extra can a translator do when compared to HT?
     • Tool to offer usable quality with very quick turnarounds for high volume (typical "Friday afternoon lawyer requests")
     • Becoming an MT front runner in the non-localization sector for Dutch (5th language in Europe after FIGS)
  7. Developing the Moses engine
     • Choice between in-house and external development
       • In-house: control, developing expertise, lower long-term cost
       • External: lower initial cost, much more expertise > best for now
     • Our prerequisites for development option
       • Ownership of and free access to engine
       • Assurance data will not be used or copied by builder
       • Acceptable costs for development & usage
       • Skilled partner > AsiaOnline, CrossLang, Pangeanic, LetsMT, SmartMate??
     • CrossLang > all of the above, closest to our office, independent
  8. What we needed
     • Large quantities of high-quality translation data
       • Aligning existing high-quality legal translations (took longest to prepare)
       • Existing legal TMs
       • Going forward: company-/industry-specific terminology
     • Ways to measure gains
       • Not just automated evaluation % increase, but also tangible improvements > we are entrepreneurs, not scientists
       • CrossLang automated assessment tool (TER, BLEU, NIST, METEOR)
       • Manual assessment: e.g. how many hours for post-editing 10,000 words?
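
The automated metrics named on this slide compare MT output against a human reference. As a rough illustration of the idea only, the sketch below computes a word-level edit rate: the number of word insertions, deletions and substitutions needed to turn the MT output into the reference, divided by the reference length. This is a simplification of TER (real TER also counts phrase shifts) and is not the CrossLang assessment tool; the function name and the example sentences are hypothetical.

```python
def word_edit_rate(hypothesis: str, reference: str) -> float:
    """Word-level edit distance divided by reference length.

    Simplified stand-in for TER: counts insertions, deletions and
    substitutions of whole words, but no phrase shifts.
    """
    hyp = hypothesis.split()
    ref = reference.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the hypothesis words seen
    # so far and the first j reference words.
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # drop a hypothesis word
                            curr[j - 1] + 1,      # add a reference word
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    mt_output = "the court rejects the claim"      # hypothetical MT output
    reference = "the court dismisses the claim"    # hypothetical reference
    print(word_edit_rate(mt_output, reference))    # one substitution in five words
```

A lower score means less editing is needed, which is why such scores are only a proxy for the manual measure on the slide (hours spent post-editing 10,000 words).
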
  9. Input data
     • Highest quality AVB Dutch <> English legal translations: approx. 700k words per language. Predominantly civil law.
     • Not fully reviewed AVB TM, still high-quality: approx. 10 million words per language. Predominantly civil law.
     • Legal translations harvested by CrossLang, more diverse legal material: 7 million words per language
  10. CrossLang automated test results
     • Best results from AVB + harvested data, AVB data weighted extra
     • Results particularly good in civil law domain (bulk of AVB input data)
     • Results improved dramatically for other legal domains by adding harvested data
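
The slide does not say how the AVB data was "weighted extra". One simple and common way to give in-domain data more influence in a statistical system is to repeat it in the training corpus, so its phrase pairs are counted more often. The sketch below shows that idea only; it is an assumption, not a description of CrossLang's method, and the function name, the weight of 2 and the sentence pairs are hypothetical.

```python
from typing import List, Tuple

SentencePair = Tuple[str, str]  # (Dutch source, English target)


def build_weighted_corpus(in_domain: List[SentencePair],
                          harvested: List[SentencePair],
                          in_domain_weight: int = 2) -> List[SentencePair]:
    """Combine two parallel corpora, repeating the in-domain one.

    Repeating the in-domain pairs `in_domain_weight` times makes their
    counts weigh more heavily in count-based training.
    """
    if in_domain_weight < 1:
        raise ValueError("in_domain_weight must be at least 1")
    return list(in_domain) * in_domain_weight + list(harvested)


if __name__ == "__main__":
    # Hypothetical example pairs, for illustration only.
    avb = [("De rechtbank wijst de vordering af.",
            "The court dismisses the claim.")]
    web = [("Deze overeenkomst wordt beheerst door Nederlands recht.",
            "This agreement is governed by Dutch law.")]
    corpus = build_weighted_corpus(avb, web, in_domain_weight=2)
    print(len(corpus))  # 2 copies of the AVB pair + 1 harvested pair
```
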
  11. AVB results in practice
     • Test done in CrossLang production assessment tool: productivity 5% higher for post-editing than for human translation (human output in this case very high, >1,000 words per hour; PE even higher)
  12. AVB results in practice
     • Live rush translations done in past two weeks:
       • 1,500 word trial done for law firm needing high volume in very short time. Post-edited in 75 minutes. Customer happy with quality/price ratio.
       • 25,000 words in two days with moderate PE effort by two post-editors. Quality estimate 80-90% of human translation.
       • 4,500 words in 3 hours with almost full PE effort by one post-editor. Quality estimate >90% of human translation.
       • 15,000 words in one day, done by two post-editors. Quality estimate 80-90% of human translation.
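
To compare these live projects with the ">1,000 words per hour" figure from the test, the volumes can be converted to words per post-editor per hour. The sketch below does that arithmetic. The word counts, durations and number of post-editors come from the slide; the 8-hour working day used for the "two days" and "one day" projects is an assumption, since the slide gives no hours for them, and the function name is hypothetical.

```python
def words_per_hour(words: int, hours: float, editors: int = 1) -> float:
    """Average throughput per post-editor, in words per hour."""
    if hours <= 0 or editors < 1:
        raise ValueError("hours must be positive and editors at least 1")
    return words / (hours * editors)


if __name__ == "__main__":
    ASSUMED_HOURS_PER_DAY = 8  # assumption: not stated on the slide

    # Durations stated on the slide.
    print(words_per_hour(1_500, 75 / 60))   # 1200.0
    print(words_per_hour(4_500, 3))         # 1500.0

    # Durations given in days; these depend on the 8-hour assumption.
    print(words_per_hour(25_000, 2 * ASSUMED_HOURS_PER_DAY, editors=2))  # 781.25
    print(words_per_hour(15_000, 1 * ASSUMED_HOURS_PER_DAY, editors=2))  # 937.5
```

The two figures based on stated durations do not depend on any assumption; the other two move directly with the assumed length of the working day.
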
  13. AVB results in practice
     • Test and live projects show great potential in two areas:
       • Producing usable translations very quickly and at 50-60% of normal translation cost. Margins are similar to normal translation, but likely to improve!
       • Higher productivity, i.e. lower production cost and increased margins.
  14. CrossLang Gateway benefits
     • Standard Moses engine offers no high-level functions
       • Only plain text files, always sentence by sentence, experimental recasing, experimental tag handling
     • CrossLang Gateway offers Java service layer (not wrapper scripts)
       • Most common file formats: Word, XML, XLIFF
       • Adjustable text segmentation
       • Hardened, alignment-based tag handling
       • Advanced recasing tool based on alignment data
       • Named entity recognition & (re)tokenization
       • Terminology checking and replacement
     Gateway features are crucial to processing our material properly
  15. Conclusions
     • Developing a good engine is not an "out of the box" task
     • Sufficient high-quality data is necessary for good results
     • Results are very promising, our objectives can be achieved
     • Working with a value-added partner is recommended
     • Need to integrate MT solution in translation workflow is apparent
  16. Phone: +31 20 645.66.10
     Mobile: +31 625.025.475
     E-mail: joel.sigling@avb.nl
     Twitter: @JoelAVB
     Address: Ouderkerkerlaan 50, 1185 AD Amstelveen, The Netherlands
     Website: www.avb.nl
