
AI for Translation Technologies and Multilingual Europe

Georg Rehm. AI for Translation Technologies and Multilingual Europe. DG TRAD Conference - Translation Services in the Digital World: A Sneak Peek into the (near) Future. Luxembourg. October 16/17, 2017.


  1. 1. AI for Translation Technologies and Multilingual Europe • Georg Rehm (georg.rehm@dfki.de) • DFKI GmbH, Language Technology Lab – Berlin, Germany • META-NET, General Secretary
  2. 2. Outline • Artificial Intelligence • Technology Support for Multilingual Europe • European MT Research – Results from QT21 • Connecting Europe Facility – Automated Translation • Towards the Human Language Project • Conclusions. EP DG Trad Conference (16/17 Oct. 2017) – AI for Translation Technologies
  3. 3. Artificial Intelligence
  8. 8. Artificial Intelligence • Huge data sets + powerful algorithms + extremely fast hardware • Self-driving cars, robots, image recognition, machine translation • Enormous potential for disruptions in all sectors and areas • Current breakthroughs based on Machine Learning (Deep Learning) • Also still in use: symbolic, rule-based methods and systems • Diagram labels: Data, Intelligence
  9. 9. Technology Support for Multilingual Europe
  10. 10. • Multilingualism is at the very heart of the European idea • 24 EU languages – all languages have the same status • Dozens of regional and minority languages as well as languages of immigrants and trade partners • Economic challenges: – If the Digital Single Market (DSM) is not multilingual, there will be 20+ isolated markets – Language barriers are market barriers • Social and public challenges: – Empower all citizens to use their mother tongues – Enable cross-border, cross-lingual, cross-cultural communication – Provide multilingual digital public services – Restore trust in media (fake news debate, filter bubble issue etc.)
  11. 11. q 60 research centres in 34 countries (founded in 2010) Chair of Executive Board: Jan Hajic (CUNI) Dep.: J. van Genabith (DFKI), A. Vasiljevs (Tilde) General Secretary: Georg Rehm (DFKI) q Multilingual Europe Technology Alliance. 826 members in 67 countries (published in 2013) (31 volumes; published in 2012) T4ME (META-NET) CESAR METANET4UMETA-NORDMultilingual Europe Technology AllianceNET
  12. 12. q Basque q Bulgarian* q Catalan q Croatian* q Czech* q Danish* q Dutch* q English* q Estonian* q Finnish* q French* q Galician q German* q Greek* q Hungarian* q Icelandic q Irish* q Italian* q Latvian* q Lithuanian* q Maltese* q Norwegian q Polish* q Portuguese* q Romanian* q Serbian q Slovak* q Slovene* q Spanish* q Swedish* q Welsh * Official EU languagehttp://www.meta-net.eu/whitepapers
  13. 13. • MT: excellent: (none); good: English; moderate: French, Spanish; fragmentary: Catalan, Dutch, German, Hungarian, Italian, Polish, Romanian; weak or no support through LT: Basque, Bulgarian, Croatian, Czech, Danish, Estonian, Finnish, Galician, Greek, Icelandic, Irish, Latvian, Lithuanian, Maltese, Norwegian, Portuguese, Serbian, Slovak, Slovene, Swedish, Welsh • Speech: excellent: (none); good: English; moderate: Czech, Dutch, Finnish, French, German, Italian, Portuguese, Spanish; fragmentary: Basque, Bulgarian, Catalan, Danish, Estonian, Galician, Greek, Hungarian, Irish, Norwegian, Polish, Serbian, Slovak, Slovene, Swedish; weak or no support through LT: Croatian, Icelandic, Latvian, Lithuanian, Maltese, Romanian, Welsh • Text Analytics: excellent: (none); good: English; moderate: Dutch, French, German, Italian, Spanish; fragmentary: Basque, Bulgarian, Catalan, Czech, Danish, Finnish, Galician, Greek, Hungarian, Norwegian, Polish, Portuguese, Romanian, Slovak, Slovene, Swedish; weak or no support through LT: Croatian, Estonian, Icelandic, Irish, Latvian, Lithuanian, Maltese, Serbian, Welsh • Resources: excellent: (none); good: English; moderate: Czech, Dutch, French, German, Hungarian, Italian, Polish, Spanish, Swedish; fragmentary: Basque, Bulgarian, Catalan, Croatian, Danish, Estonian, Finnish, Galician, Greek, Norwegian, Portuguese, Romanian, Serbian, Slovak, Slovene; weak or no support through LT: Icelandic, Irish, Latvian, Lithuanian, Maltese, Welsh
  14. 14. [Chart: level of support (Excellent, Good, Moderate, Fragmentary, Weak/none) per language. Languages as listed on the chart: Welsh, Maltese, Lithuanian, Latvian, Icelandic, Irish, Croatian, Serbian, Estonian, Slovene, Slovak, Romanian, Norwegian, Greek, Galician, Danish, Bulgarian, Basque, Swedish, Portuguese, Finnish, Catalan, Polish, Hungarian, Czech, Italian, German, Dutch, Spanish, French, English] • Languages with names in red have little or no MT support • Important: even current state-of-the-art technologies are far from perfect! • Important: 20+ European languages are severely under-supported and face the danger of digital extinction. • Source: META-NET White Paper Series: Europe's Languages in the Digital Age. Springer, Heidelberg, New York, Dordrecht, London, September 2012. Georg Rehm and Hans Uszkoreit (series editors)
  15. 15. [Chart: Language Technology Support (Excellent, Good, Moderate, Fragmentary, Weak/no support) against Millions of Native Speakers (Worldwide). Languages as listed on the chart: Yiddish, Welsh, Vlax Romani, Turkish, Scots, Romany, Occitan, Maltese, Macedonian, Luxembourgish, Lithuanian, Limburgish, Latvian, Icelandic, Friulian, Frisian, Breton, Bosnian, Asturian, Albanian, Irish, Croatian, Serbian, Hebrew, Estonian, Slovene, Slovak, Romanian, Norwegian, Greek, Galician, Danish, Bulgarian, Basque, Swedish, Portuguese, Finnish, Catalan, Polish, Hungarian, Czech, Italian, German, Dutch, Spanish, French, English] • Source: Georg Rehm, Hans Uszkoreit, Ido Dagan, Vartkes Goetcherian, Mehmet Ugur Dogan, Coskun Mermer, Tamás Váradi, Sabine Kirchmeier-Andersen, Gerhard Stickel, Meirion Prys Jones, Stefan Oeter, and Sigve Gramstad. An Update and Extension of the META-NET Study “Europe's Languages in the Digital Age”. In Proceedings of the Workshop on Collaboration and Computing for Under-Resourced Languages in the Linked Open Data Era (CCURL 2014), Reykjavik, Iceland, May 2014.
  16. 16. European Machine Translation Research – Results from QT21
  17. 17. Research & Innovation Action 2015-18 • Coordinator: Josef van Genabith (DFKI)
  18. 18. QT21 is Improving Automatic Translation • Morphologically rich languages (De, Cz, Lv) • Under-resourced languages (Lv, Ro) • Quality Assessment: MQM – DQF • Learning from human feedback (APE) • Evaluation framework: WMT – Event series to present and discuss results from MT evaluations – Procedures: Automatic scoring (BLEU etc.) and human judgements (large number of human annotators) • Shared tasks (newspaper translation, quality estimation, metrics and automatic post-editing)
  19. 19. Human Judgement Rankings • WMT Newspaper Translation Task • [Chart: QT21 vs. Best Online for 2015, 2016, 2017; values as extracted: First: 64 53 1 3; First + Second: 66 65 3 6] • 2015: En ↔ Cz, En ↔ De, En ↔ Fr • 2016: En ↔ Cz, En ↔ De, En ↔ Ro • 2017: En ↔ Cz, En ↔ De, En ↔ Lv
  20. 20. QT21 improvement in the last 12 months vs. online systems • WMT 2016 System on WMT 2017 Data • [Bar chart: language directions En → De, De → En, En → Cz, Cz → En; series QT21-WMT-2016, Online WMT-2017, QT21-WMT-2017]
  21. 21. Selected Results • Data sets are the fuel for neural networks • QT21’s neural technologies define the state of the art • Ranked #1 in more than 80% of all tasks at WMT 2017 • Also predominantly ranked #1 at WMT 2016 • QT21 keeps commercial systems at a distance • Huge improvements on morphologically rich languages • MQM as a standard for quality evaluation
  22. 22. Connecting Europe Facility: Automated Translation (CEF AT)
  23. 23. Connecting Europe • EU flagship goal: Establishing the Digital Single Market • Overcoming existing barriers – by creating an environment for digital services to flourish – by providing cross-border infrastructures and services. • Sectorial CEF Digital Service Infrastructures (DSIs): – ODR: Citizens need to solve disputes online across borders – BRIS: Citizens and business partners need legal certainty when doing business cross-border – eHealth: Citizens need to have online access to their patient summary when abroad – EESSI: Citizens need to be able to use their social security seamlessly and online when abroad • This also includes: eProcurement, Open Data, e-Justice, Cyber Security, Safer Internet, …
  24. 24. Connecting Europe • Technological CEF building blocks can be used by the different DSIs (e.g., eInvoicing, eSignature etc.) • Most important in this context: CEF eTranslation – Why? To help European and national public administrations exchange information across language barriers – How? By providing MT capabilities that will enable digital services (in particular all DSIs) to be multilingual. • CEF eTranslation builds on MT@EC • Guarantees confidentiality and security of translated data • → ELRC contract • Coordinator: Josef van Genabith (DFKI)
  25. 25. European Language Resource Coordination • Collect: Language resources • Identify: Needs of public services • Engage: With the public sector in the identification of Language Resources • Help: With any technical or legal issues • Act: Observatory for language resources across Europe
  26. 26. What has been achieved? • [Bar chart: LR contributions by type (Bi-/Multilingual Corpora, Terminologies, Monolingual Corpora); Status: April 2017] • 225 language resources collected • More than 2 billion words in all EU official languages, Norwegian and Icelandic • Over 450,000 terms • More than 2 million translation units • More than 91 resources to be used by you!
  27. 27. ELRC for you • ELRC-SHARE Repository – Access to, sharing and contribution of LRs – Access to tools and services catalogue (forthcoming) – http://www.lr-coordination.eu/resources • ELRC Technical and Legal Helpdesk – Support for potential data donors (phone, email) – http://www.lr-coordination.eu/helpdesk • ELRC On-site assistance – http://www.lr-coordination.eu/services
  28. 28. Current Developments
  29. 29. Current Developments • Multilingual Europe: our languages enjoy equal status yet digital extinction of the majority of EU languages is a very severe danger. • Language Technology Research and Innovation in Europe: World class research results (e.g., in QT21), strong SME base, thousands of LSPs; fragmentation; need for coordination. • Big need for high-quality Language Technologies: translation, personal assistants, multilingual DSM etc. (example: CEF). • AI: Important breakthroughs and massive investments in R&D and applications (mostly in US, Asia) – huge opportunity for Europe! • The European Language Challenge cannot be abandoned or outsourced! → Need for Language Technology made in Europe for Europe!
  30. 30. Towards the Human Language Project
  31. 31. • STOA Workshop in European Parliament (January 2017): “Language equality in the digital age – towards a Human Language Project” • Human Language Project vision suggested in several presentations • STOA Study, published in March 2017, recommends setting up the HLP → http://www.stoa.europarl.europa.eu/stoa/cms/home/workshops/language • Study cover: EPRS | European Parliamentary Research Service, Scientific Foresight Unit (STOA), PE 581.621, Science and Technology Options Assessment
  32. 32. Human Language Project • Goal: Deep Natural Language Understanding by 2030 • AI for Next Generation Language Technology • Large-scale EU funding programme for basic and applied research as well as innovation (10-15 years) • New breakthroughs for research, industry and society to foster a multitude of innovations. • Diagram labels: Artificial Intelligence (including cognition, perception, vision, cross-modal, cross-platform, cross-culture, IoT etc.), Machine Learning, Language Technology, Knowledge Technology
  33. 33. Human Language Project • All official European and many additional languages • Broad coverage, HQ, high precision – across modalities, across platforms, across cultures • Collaboration between EU, EC, EP, Member States, research, industry, other stakeholders. • Basic and applied research, innovation, commercialisation • Policy change towards “LT-enabled multilingualism” • HQMT – overcome quality (and language) barriers, written and spoken, collaborate with human translators • Resources and technologies for all European languages
  34. 34. Version 1.0 of the SRIA • Strategic Agenda: “Language Technologies for Multilingual Europe – Towards a Human Language Project” • Key recommendation: set up Human Language Project • Also: establish Multilingual Digital Single Market • Informed by “LT for Multilingual Europe” survey • Takes into account: CEF AT, DSM, NGI • To be presented at META-FORUM 2017 (13/14 Nov. 2017) • http://www.cracker-project.eu • http://www.meta-net.eu
  35. 35. Summary & Conclusions • AI is disrupting all industries – including translation. • But: perfect machine translation is still far away. • Not only are tools for gist translation getting better and better, so are tools for human translators! • Translators can expect to make use of a vastly improved (adaptive) tool landscape in the next couple of years. • We are collaborating with human translators to better understand how translation processes work. • The goal of the Human Language Project is to move Europe into the pole position in this field.
  36. 36. 13/14 November 2017 Brussels, Belgium – http://www.meta-forum.eu Register now! Participation is free of charge.
  37. 37. Thank you! Many thanks to Josef van Genabith, Christian Dugast, Andrea Lösch (all DFKI) and to Maria Giagkou (ILSP). • Dr. Georg Rehm, DFKI Berlin • georg.rehm@dfki.de • http://de.linkedin.com/in/georgrehm • https://www.slideshare.net/georgrehm • Human Language Project: Truly Multilingual Europe, European Economy (MDSM), Attractive jobs for high potentials, Education and young researchers, Massive boost for research, Foster innovation and new companies
