Fantastic MT Engines and Where to Find Them

More from Konstantin Savenkov (20)

  1. BEST vs. FIT TO PURPOSE. Best: linguistically best for my language pair. Fit to purpose: MT with a proper data protection and retention policy and a proper level of custom terminology support, which trains on my linguistic assets with good ROI per my business goals, and works well with my source data quality, format and content type according to my subject matter experts. © Intento, Inc. / November 2019
  2. BEST vs. FIT TO PURPOSE. Best: linguistically best for my language pair. Fit to purpose: MT with a proper data protection and retention policy and a proper level of custom terminology support, which trains on my linguistic assets with good ROI per my business goals, and works well with my source data quality, format and content type according to my subject matter experts. On-the-fly MT routing based on the historical data. Automated procurement and vendor management.
  3. PRACTICAL APPROACH: (1) Select candidate MT systems; (2) Improve with your data assets; (3) Automated scoring; (4) Human-assisted scoring.
  4. SELECT CANDIDATE ENGINES (step 1). Standalone commercial MT products with an API.
     GENERIC STOCK MODELS: Alibaba, Amazon, Baidu, DeepL, eBay, Google, GTCom, IBM, Kakao, Microsoft, Mirai, ModernMT, Niutrans, Naver, Omniscien, PROMT, Rozetta, SAP, SDL, Sogou, Systran, Tencent, Tilde, Yandex, Youdao.
     VERTICAL STOCK MODELS: Alibaba, Baidu, Cloud Translate, Iconic, Microsoft, Omniscien, PROMT, SAP, Systran.
     CUSTOM TERMINOLOGY SUPPORT: Amazon, Baidu, Google, IBM, Iconic, Microsoft, Rozetta, SDL, Systran.
     AUTO DOMAIN ADAPTATION: Globalese, Google, IBM, Kantan, Microsoft, ModernMT, Omniscien, SDL, Systran.
     MANUAL DOMAIN ADAPTATION: Alibaba, Baidu, Cloud Translate, Iconic, Omniscien, PangeaMT, Prompsit, PROMT, SDL, Systran, Tilde, Yandex.
     Also listed: Yandex.
     All product names, trademarks and registered trademarks are property of their respective owners. All company, product and service names used in this website are for identification purposes only. Use of these names, trademarks and brands does not imply endorsement.
  5. WHAT MAKES THE DIFFERENCE? Business requirements; language pair; domain; available language assets; tag and format support; cost of ownership. [Figure: panels labeled July 2018 and January 2019]
  6. IMPROVING ENGINES (step 2): data cleaning; TM training; glossaries; sentence scores. 40-60% of “live” TM is not suitable for MT; linguistic glossaries need to be “compiled” for MT.
  7. IMPROVING ENGINES (step 2): data cleaning; TM training; glossaries; sentence scores. Different data volume and data quality requirements. Different performance of baseline models. [Figure: panels labeled July 2018 and January 2019]
  8. IMPROVING ENGINES (step 2): data cleaning; TM training; glossaries; sentence scores. Corpus scores are not actionable; sentence scores help linguists to focus.
  9. PRACTICAL APPROACH: (1) Select candidate MT systems; (2) Improve with your data assets; (3) Automated scoring; (4) Human-assisted scoring.
  10. CORPUS SCORES TO FIND TOP-RUNNERS: lack of correlation indicates certain types of errors; a statistically significant rapid drop-off identifies top-runners.
  11. SENTENCE SCORES TO HELP REVIEWERS: hard sentences show NMT training flaws; controversial sentences expose NMT quirks; easy sentences check how well high scores correlate with quality; typical sentences measure post-editing (PE) effort.
  12. HUMAN-ASSISTED SCORING (step 4): Depends on the purpose! Linguistic Quality Assessment; Post-Editing Tracking; A/B testing; WTF-score.
  13. DIFFERENT SCENARIOS - DIFFERENT CHOICES (even for the same language pair!): PEMT / LSP; PEMT / Individual; Cross-Language Analysis and Retrieval (think eDiscovery); Large-Scale Raw MT (think eCommerce); Customer Support (think Global B2C); Gisting and Inbound Content (think translation portals); Large Enterprise; Government and Regulated Industries.
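
The data cleaning step on slide 6 could look roughly like the sketch below. It is not Intento's pipeline: the function name `clean_tm`, the sample `TM` and the `max_ratio` threshold are all hypothetical, and the four filters (empty side, untranslated copy, extreme length ratio, duplicate) are just common heuristics for dropping translation memory segments that are unlikely to help MT training.

```python
# Hypothetical translation memory: (source, target) segment pairs.
TM = [
    ("Open the file.", "Öffnen Sie die Datei."),
    ("Open the file.", "Öffnen Sie die Datei."),   # exact duplicate
    ("Save", ""),                                   # empty target
    ("OK", "OK"),                                   # untranslated copy
    ("Cancel", "Brechen Sie den gesamten Vorgang jetzt sofort ab."),  # length mismatch
    ("Close the window.", "Schließen Sie das Fenster."),
]


def clean_tm(pairs, max_ratio=3.0):
    """Keep only segment pairs that pass a few simple sanity filters.

    max_ratio is an assumed threshold on the character-length ratio
    between the longer and the shorter side of a pair.
    """
    seen = set()
    kept = []
    for source, target in pairs:
        src, tgt = source.strip(), target.strip()
        if not src or not tgt:
            continue  # one side is empty
        if src == tgt:
            continue  # source copied into target, nothing was translated
        ratio = max(len(src), len(tgt)) / min(len(src), len(tgt))
        if ratio > max_ratio:
            continue  # lengths differ too much, likely misaligned
        key = (src.lower(), tgt.lower())
        if key in seen:
            continue  # case-insensitive duplicate of an earlier pair
        seen.add(key)
        kept.append((src, tgt))
    return kept


if __name__ == "__main__":
    cleaned = clean_tm(TM)
    print(f"kept {len(cleaned)} of {len(TM)} segments")
```

In practice the thresholds would be tuned per language pair, since a reasonable length ratio for English to German is not the same as for English to Chinese.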
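
Slides 10 and 11 describe two uses of automated scores: corpus-level scores to find the top-runners, and sentence-level scores to sort segments into hard, controversial, easy and typical ones for reviewers. The sketch below illustrates both under stated assumptions. The engine names, the scores in `SCORES` and the thresholds `low`, `high` and `spread` are invented for illustration. The cut-off uses a plain largest-gap heuristic on mean scores, which is a simplification: it does not test the drop for statistical significance as the slide suggests.

```python
from statistics import mean, pstdev

# Hypothetical per-sentence quality scores in [0, 1] for three engines,
# e.g. a normalized similarity to a reference translation.
SCORES = {
    "engine_a": [0.91, 0.15, 0.88, 0.62, 0.95],
    "engine_b": [0.89, 0.20, 0.85, 0.60, 0.93],
    "engine_c": [0.55, 0.10, 0.30, 0.58, 0.90],
}


def top_runners(scores):
    """Rank engines by mean score and cut at the largest drop between neighbours."""
    ranked = sorted(scores, key=lambda name: mean(scores[name]), reverse=True)
    if len(ranked) < 2:
        return ranked
    means = [mean(scores[name]) for name in ranked]
    gaps = [means[i] - means[i + 1] for i in range(len(means) - 1)]
    cut = gaps.index(max(gaps)) + 1  # keep everything above the biggest gap
    return ranked[:cut]


def bucket_sentences(scores, low=0.4, high=0.85, spread=0.2):
    """Label each sentence as hard, controversial, easy or typical.

    hard:          every engine scores below `low`
    controversial: engines disagree (population std dev above `spread`)
    easy:          every engine scores above `high`
    typical:       everything else
    """
    engines = list(scores)
    sentence_count = len(scores[engines[0]])
    labels = []
    for i in range(sentence_count):
        row = [scores[name][i] for name in engines]
        if max(row) < low:
            labels.append("hard")
        elif pstdev(row) > spread:
            labels.append("controversial")
        elif min(row) > high:
            labels.append("easy")
        else:
            labels.append("typical")
    return labels


if __name__ == "__main__":
    print(top_runners(SCORES))
    print(bucket_sentences(SCORES))
```

With the sample data, the two close engines survive the cut and the third is dropped, while the second sentence is flagged as hard and the third as controversial, which is where a reviewer's attention would go first.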
