
Pangeanic presentation at Elia Together Athens - Manuel Herranz



Our presentation at #Eliatogether in Athens was well received by many attendees. Will disintermediation become a force to reckon with in the translation industry, as it has in the hotel and travel industries? What is the role of machine translation in all this? How does neural machine translation work?

Published in: Technology


  2. A few words about… Manuel Herranz
      • Majored in mechanical engineering & languages (UK)
      • Joined Giddings & Lewis - Ford Valencia / Chihuahua, 1993-1996
      • Rolls Royce Marine & Industrial, Spain / Argentina, 1997-98 and 2000
      • Joined B.I Corporation Japan, 1996-2004
      • Friendly buy-out 2005: Pangeanic
  3. What we do as an industry
      • Fight in a segmented market
      • Enable international business
      • Help people / organizations to communicate
      • Innovate
      • Differentiate? At what speed? Really?
  4. How is your target market set up?
  5. More importantly… Are you ahead of the game?
      • 95% of LSCs have no iSEO strategy (beyond having a bilingual website) because translation is expensive. Do you invest in the product you sell?
      • 80% have no national SEO strategy
      • 50% apply / adopt MT (only 25% have MT embedded in their systems / custom engines)
      • 10% have a centralized TM system to leverage past content. Most operate with hundreds of TMs on a server.
  6. Business model in 5 years?
      • Disintermediation: these companies re-invented business models
      • Or offer added value vs LSCs become language recruitment agencies / HR specialists?
      • Success: basic business proposition + value
  7. Business model in 5, 4, 3 years…?
      • Industry revenues more than doubled from ~$19B in 2005 to ~$40B in 2016. [Common Sense Advisory data]
      • Translation buyers worry about ever-growing content volumes and more language pairs, but with stable or shrinking budgets.
      • Management expects that Amazon, Microsoft or Google Translate will take care of “language problems” one day: less complexity, lower cost.
      • The US Census shows that the number of translation workers has doubled since 2008. [Slator, May 24, 2017]
      • Automation: project management value replaced by bots?
      • The number of working translators on LinkedIn has increased by 50+% since 2010. [LinkedIn data]
      • Market forces squeeze mid-sized companies from both ends: large companies can offer economies of scale; small ones are specialists, niche or local.
  8. Disintermediation / Direct clients
      • Satisfying the 5Bn searches/day and increasing demand for cheaper language services
      • TM+MT leveraging, CAT agnostic, inexpensive tools
      • When Google was founded in September 1998, it was serving ten thousand search queries per day (by the end of 2006 that same amount would be served in a single second)
      • Affordable
      • Efficiency areas and growth hacks
  9. Efficiency areas and growth hacks: The Pangeanic Experience
      • Centralised TMs for advanced leveraging (ActivaTM)
      • The web as a sales tool (SEO, SEM), online ordering system (Cor)
      • Machine Translation: SMT -> NMT … some astounding results
      • Winning iADAATPA: largest EC MT infrastructure project linking MT vendors to Public Administrations
  10. The Database
      • Elasticsearch-based
      • All language assets in one database, irrespective of the tool that created them
      • Deep learning for tag handling
      • CAT-tool agnostic (solves interoperability issues)
      • Automatic fuzzy match repair
      • Matrix (triangulate to create new language pairs)
      • Statistics on all segment units, words, domains
      • Remote access, API
      • Pre-filter prior to MT (TM+MT)
      • More powerful (strict) fuzzy matching than traditional CAT tools
      • Saved +14% in fuzzy matches
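The "pre-filter prior to MT (TM+MT)" item can be pictured with a small sketch. This is only an illustration under stated assumptions, not how ActivaTM works: the similarity measure (`difflib.SequenceMatcher` from the Python standard library), the 0.75 threshold, the two-entry in-memory TM and the `machine_translate` stub are all hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical in-memory translation memory: source segment -> target segment.
TM = {
    "Remove any unused configuration files.": "Elimine los archivos de configuración no utilizados.",
    "This chapter describes the parameters.": "Este capítulo describe los parámetros.",
}

def best_fuzzy_match(segment, tm):
    """Return (score, source, target) for the closest TM entry, score in [0, 1]."""
    best = (0.0, None, None)
    for source, target in tm.items():
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score > best[0]:
            best = (score, source, target)
    return best

def machine_translate(segment):
    """Stub standing in for a real MT engine call (assumption)."""
    return f"[MT] {segment}"

def translate(segment, tm=TM, threshold=0.75):
    """Reuse a TM match at or above the threshold; otherwise send the segment to MT."""
    score, _source, target = best_fuzzy_match(segment, tm)
    if target is not None and score >= threshold:
        return {"origin": "TM", "score": round(score, 2), "text": target}
    return {"origin": "MT", "score": round(score, 2), "text": machine_translate(segment)}
```

A segment already in the TM comes back with `origin == "TM"`, while a segment with no close match falls through to the MT stub. A production system would use a stricter, token-aware similarity measure than this character-based one.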
  11. The Database
  12. The Web
      • National research project with EU funding
      • Full platform
      • Use by Pangeanic, LSPs, 3rd parties
      • Eases estimation and automates workflows in any translation format (doc or web)
      • CMS agnostic: extracts text and converts to XLIFF (doc or web)
      • Translate sections of a website only (batches)
      • Detect new content or content that has been eliminated to update language versions
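Detecting new or eliminated content between two versions of a page can be sketched by comparing segment fingerprints. This is a minimal sketch, assuming the text has already been extracted into a list of segments; the function names and the choice of SHA-256 over whitespace-normalised text are illustrative and not taken from the platform.

```python
import hashlib

def fingerprint(segment):
    """Hash a segment after collapsing whitespace, so layout-only changes are ignored."""
    normalised = " ".join(segment.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()

def diff_segments(old_segments, new_segments):
    """Return (added, removed): segments only in the new or only in the old version."""
    old = {fingerprint(s): s for s in old_segments}
    new = {fingerprint(s): s for s in new_segments}
    added = [s for h, s in new.items() if h not in old]
    removed = [s for h, s in old.items() if h not in new]
    return added, removed
```

Only the `added` segments would need translating, and the `removed` ones would be dropped from each language version.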
  13. The Web
  14. The Web
  15. Neural Machine Translation - background
      Artificial Neural Networks for SMT
      History of ANN-based Machine Translation and Language Modelling for SMT:
      • 1997 [Castano & Casacuberta 97] (JAUME I & U.Politécnica): Machine translation using neural networks and finite-state models (PangeaMT: areas/mt-showcase)
      • 2007 [Schwenk & Costa-jussa 07]: Smooth bilingual n-gram translation.
      • 2012 [Le & Allauzen 12, Schwenk 12]: Continuous space translation models with neural networks.
      • 2014 [Devlin & Zbib 14]: Fast and robust neural networks for SMT
      Conventional SMT
      • Use of statistics has been controversial in computational linguistics
      • Chomsky 1969: "... the notion ’probability of a sentence’ is an entirely useless one, under any known interpretation of this term."
      • Considered to be true by most experts in (rule-based) natural language processing and artificial intelligence
      History of Statistical Approach to MT
      • 1989-94: IBM’s pioneering work
      • Since 1996: only a few teams favored SMT: U.Politécnica Valencia, RWTH Aachen, HKUST, CMU
      • 2006/2007: Google Translate
      • 2006-2012: Euromatrix
      • 2009: PangeaMT
      • 2016: First trials in NMT
      • 2017: European Commission: iADAATPA project
  16. iADAATPA platform (architecture diagram)
      • MT vendors: Tilde MT, Pangeanic, KantanMT, Prompsit
      • Other diagram labels: CMS 1, CMS 2, CKAN, Widget browser, eTranslation, AT systems
      • Deployment: cloud vs on-premise. Many iADAATPA deployments are possible. A global instance register is kept by the commercial partner.
      • Requests: supporting both synchronous and asynchronous requests
      • Client flow: send translation request, receive webhook, ask for “request done”
      • AT systems flow: receive webhook, ask for “content request”
      • Platform components: Priority, Admin, Lang / Q router, User management / Profile
      • Backoffice: global (instance management) and individual (for each instance)
      • Documents (proprietary formats): conversion
      • e-Sens AS4 Profile compliant
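The difference between the synchronous and asynchronous request styles named on this slide can be sketched with a toy in-memory model. Everything here is hypothetical: `ToyPlatform`, its method names and the callback used as a "webhook" are assumptions made for illustration and do not describe the real iADAATPA API.

```python
import uuid

class ToyPlatform:
    """Hypothetical stand-in for a translation platform; not the real API."""

    def __init__(self, engine):
        self.engine = engine      # callable: text -> translated text
        self.results = {}         # request id -> translated text
        self.pending = []         # queued (request id, text, webhook) tuples

    def translate_sync(self, text):
        """Synchronous request: the caller waits for the translation."""
        return self.engine(text)

    def submit_async(self, text, webhook):
        """Asynchronous request: return an id now, notify the webhook later."""
        request_id = str(uuid.uuid4())
        self.pending.append((request_id, text, webhook))
        return request_id

    def process_pending(self):
        """Translate queued requests and call each webhook with its request id."""
        while self.pending:
            request_id, text, webhook = self.pending.pop(0)
            self.results[request_id] = self.engine(text)
            webhook(request_id)   # the "request done" notification

    def fetch(self, request_id):
        """Called by the client after its webhook fires, to collect the result."""
        return self.results.pop(request_id)
```

With `ToyPlatform(engine=str.upper)`, a client would call `submit_async("hello", webhook=done.append)`, wait until its request id appears in `done`, and then call `fetch` with that id. In a real deployment the webhook would be an HTTP callback rather than a Python function.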
  17. Neural Machine Translation – Will it work?
      • Text in
      • Text out
  18. This is more realistic: MT in the wild, wild, wild world
      • Quite an accurate workflow when integrating MT at a company
      • MT engine
  19. Neural Machine Translation – Is there a future for translation services?
      "Machine translation will displace only those humans who translate like machines. (The remaining) translators will focus on tasks that require intelligence." - Arle Lommel, 2012
  20. Neural Machine Translation
      • Tests in F/I/G/S, RU, PT point to a very strong preference towards NMT fluency
      • On average: from a set of 250 sentences, around 85%-92% were good or very good (A or B)
      • ES/PT/IT results similar to FR
      • Evaluation: translation companies and professional freelance translators

      EN-DE, set of 250 sentences
      Rating   NMT    NMT %   SMT    SMT %
      A        132    53%     34     14%
      B        98     39%     95     38%
      C        14     6%      97     39%
      D        6      2%      24     10%
      Total    250            250

      EN-FR, set of 250 sentences
      Rating   NMT    NMT %   SMT    SMT %
      A        150    60%     39     16%
      B        76     30%     126    50%
      C        21             71     28%
      D        3              14     6%

      EN-RU, set of 250 sentences
      Rating   NMT    NMT %   SMT    SMT %
      A        128    51%     39     16%
      B        84     34%     43     17%
      C        22     9%      60     24%
      D        16     6%      108    43%
      Total    250            250
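The "good or very good (A or B)" shares can be recomputed from the counts on this slide. The sketch below uses only those counts; the function names are illustrative.

```python
# Counts per rating (A, B, C, D) as reported on the slide, 250 sentences per set.
RESULTS = {
    "EN-DE": {"NMT": [132, 98, 14, 6], "SMT": [34, 95, 97, 24]},
    "EN-FR": {"NMT": [150, 76, 21, 3], "SMT": [39, 126, 71, 14]},
    "EN-RU": {"NMT": [128, 84, 22, 16], "SMT": [39, 43, 60, 108]},
}

def good_share(counts):
    """Fraction of sentences rated A or B (the first two counts)."""
    return (counts[0] + counts[1]) / sum(counts)

def summarise(results):
    """Map each language pair to its A+B share per system, as whole percentages."""
    return {
        pair: {system: round(100 * good_share(counts)) for system, counts in systems.items()}
        for pair, systems in results.items()
    }

if __name__ == "__main__":
    for pair, shares in summarise(RESULTS).items():
        print(pair, shares)
```

With these counts the NMT share of A or B ratings comes out at 92% (EN-DE), 90% (EN-FR) and 85% (EN-RU), which matches the 85%-92% range quoted on the slide. The corresponding SMT shares are 52%, 66% and 33%.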
  21. Neural Machine Translation
      • Class “CAT” is selected as it got the highest value
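Selecting the class with the highest value is an argmax over the network's output scores. Here is a minimal sketch, assuming the raw scores are turned into probabilities with a softmax; the labels and scores in the usage note are made up for illustration.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict(labels, scores):
    """Return the label with the highest probability, plus that probability."""
    probs = softmax(scores)
    best = max(range(len(labels)), key=probs.__getitem__)
    return labels[best], probs[best]
```

For example, `predict(["CAT", "DOG", "BIRD"], [2.0, 0.5, 0.1])` returns `"CAT"` as the label, because 2.0 is the largest score.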
  22. Feed Forward Neural Machine Translation
  23. Feed Forward Neural Machine Translation
      • Training set
      • Test, out of which we take 2000 sentences to try the system with in-domain text (a typical sentence the system may encounter in the future)
      • Reference translation
      • Example sentences:
          Remove any protocol configuration files that are not used for the specified protocol .
          These tables are sometimes referred to as " no sync " tables .
          This chapter will describe many of those pages and parameters .
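Holding out test sentences from a parallel corpus can be sketched as follows. The function name, the fixed seed and the random shuffle are assumptions; the slide only states that 2000 in-domain sentences are used to try the system.

```python
import random

def split_corpus(pairs, test_size=2000, seed=0):
    """Shuffle (source, reference) pairs and hold out `test_size` of them for testing."""
    if test_size >= len(pairs):
        raise ValueError("test_size must be smaller than the corpus")
    shuffled = list(pairs)                      # copy, so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)       # seeded for a reproducible split
    return shuffled[test_size:], shuffled[:test_size]  # (training set, test set)
```

The system is trained on the first returned list only. Its output on the held-out sources is then compared with the held-out reference translations.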
  24. Feed Forward Neural Machine Translation
      • Input Query
      • Label (data we already know)
      • Output
      • Error function (detects “wrong match”)
      • Update function, i.e. the “learning process”
      • New Weights (W) + New Bias (B)
      • And after many, many training sessions detecting patterns, trial and error and feedback loops…
  25. Feed Forward Neural Machine Translation
      • Input Query
      • Label (data we already know)
      • Output
      • Error function (detects “wrong match”)
      • No Update !!! (“learning process” completed)
      • Now we have a system!!!!
      • Input Queries -> Output Labels
      • 80%-85% accuracy!!
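The loop on the last two slides (compare output with label, update weights and bias on a wrong match, stop when no update is needed) can be sketched with a single artificial neuron. This is a deliberately tiny stand-in, assuming the classic perceptron update rule and toy data (logical OR); an NMT system trains far larger networks with gradient-based updates.

```python
def predict(weights, bias, inputs):
    """Forward pass: weighted sum followed by a step activation."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0

def train(samples, learning_rate=0.1, max_epochs=100):
    """Perceptron rule: adjust W and B whenever output and label disagree."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for epoch in range(1, max_epochs + 1):
        updates = 0
        for inputs, label in samples:
            error = label - predict(weights, bias, inputs)  # error function
            if error != 0:                                   # "wrong match"
                weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
                bias += learning_rate * error                # new W and new B
                updates += 1
        if updates == 0:          # no update needed: learning is complete
            return weights, bias, epoch
    return weights, bias, max_epochs

# Toy labelled data (logical OR), standing in for input queries and labels.
OR_SAMPLES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
```

Calling `train(OR_SAMPLES)` returns the learned weights and bias together with the epoch in which a full pass produced no update. After that, `predict` reproduces every label in the toy set.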
  26. Recurrent NMT + Attention Models
      • Attention models tell the system which encoder states to look at
      • Source: <s> a good and sound agreement
      • Target: <s> un buen y sólido acuerdo
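How attention decides "which encoder states to look at" can be sketched with dot-product scoring followed by a softmax. Dot-product scoring is one common choice and is an assumption here, since the slide does not say which scoring function is used; the vectors in the usage note are made up.

```python
import math

def attention(decoder_state, encoder_states):
    """Dot-product attention: weights over encoder states plus their weighted sum."""
    # One relevance score per encoder state.
    scores = [sum(d * e for d, e in zip(decoder_state, state)) for state in encoder_states]
    # Softmax turns the scores into weights that sum to 1.
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: encoder states averaged according to the weights.
    dim = len(encoder_states[0])
    context = [sum(w * state[i] for w, state in zip(weights, encoder_states)) for i in range(dim)]
    return weights, context
```

As a toy example, give the five source words "a good and sound agreement" one-hot encoder states of length five. A decoder state of `[0, 0, 0, 0, 3]` then puts the largest weight on the last state, "agreement", which is the word to look at when producing "acuerdo".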
  27. Neural Machine Translation – Rates
      • 30% price drop
      • 40%-75% salary increase for post-editors
  28. Open Questions
      • Are you working in the same way as 5 years ago?
      • Do you think you will be working in the same way in 2023?
      • Will translation companies continue to provide translation services only?
      • New business models: offer translation order automation (management systems), disintermediation, raw MT services?
      • Will large translation companies consolidate and dominate globally?
      • Can new players emerge with the right tools, selling globally?
      • Is translation company-to-translation company selling a viable model?
  29. Thank you! Manuel Herranz / Twitter: manuelhrrnz