European Language Technologies – Past, Present and Future

Georg Rehm. European Language Technologies – Past, Present and Future. Invited talk at Language Equality in the Digital Age, a conference on language technologies and digital equality in a multilingual Europe, European Parliament, Brussels, Belgium, September 27, 2018.

  1. European Language Technologies: Past, Present and Future. Georg Rehm, German Research Center for Artificial Intelligence (DFKI) GmbH, Language Technology Lab, Berlin, Germany, georg.rehm@dfki.de
  2. PAST. Language Equality in the Digital Age (27 September 2018)
  3. Language Technology
     • The EU recognised the relevance of LT as a driver for European unity as early as the late 1970s (EUROTRA, 1978-1992)
     • Some funding for LT topics in FP6, FP7, Horizon 2020
     • In H2020, LT was dropped for a while (now it’s back)
     • Many state-of-the-art results (e.g., Moses, NMT)
     • Europe has always been a leader in the area
     • A few large-scale national projects (Verbmobil, Quaero)
     • In recent years, all tech giants based in the US and Asia have been setting up their own AI/LT research groups, which increasingly dominate the field, both scientifically and as a magnet for young talent
  4. PRESENT
  5. • Multilingualism is at the heart of the European idea
     • 24 EU languages, all with the same status
     • Dozens of regional and minority languages as well as languages of immigrants and trade partners
     • Many economic, social and technical challenges:
       – The Digital Single Market needs to be multilingual
       – Cross-border, cross-lingual, cross-cultural communication
       – Additional challenges in the recent EP resolution “Language equality in the digital age” P8_TA-PROV(2018)0332
  6. • 60 research centres in 34 countries (founded in 2010). Chair of Executive Board: Jan Hajic (CUNI); Deputies: J. van Genabith (DFKI), A. Vasiljevs (Tilde); General Secretary: Georg Rehm (DFKI)
     • Multilingual Europe Technology Alliance: 900+ members in 67 countries
     • [Image captions: (published in 2013); (31 volumes; published in 2012)]
     • [Logos: T4ME (META-NET), CESAR, METANET4U, META-NORD]
  7. Basque, Bulgarian*, Catalan, Croatian*, Czech*, Danish*, Dutch*, English*, Estonian*, Finnish*, French*, Galician, German*, Greek*, Hungarian*, Icelandic, Irish*, Italian*, Latvian*, Lithuanian*, Maltese*, Norwegian, Polish*, Portuguese*, Romanian*, Serbian, Slovak*, Slovene*, Spanish*, Swedish*, Welsh (* Official EU language). http://www.meta-net.eu/whitepapers
  8. Support through LT, by area
     • MT
       – Excellent: none
       – Good: English
       – Moderate: French, Spanish
       – Fragmentary: Catalan, Dutch, German, Hungarian, Italian, Polish, Romanian
       – Weak or no support: Basque, Bulgarian, Croatian, Czech, Danish, Estonian, Finnish, Galician, Greek, Icelandic, Irish, Latvian, Lithuanian, Maltese, Norwegian, Portuguese, Serbian, Slovak, Slovene, Swedish, Welsh
     • Speech
       – Excellent: none
       – Good: English
       – Moderate: Czech, Dutch, Finnish, French, German, Italian, Portuguese, Spanish
       – Fragmentary: Basque, Bulgarian, Catalan, Danish, Estonian, Galician, Greek, Hungarian, Irish, Norwegian, Polish, Serbian, Slovak, Slovene, Swedish
       – Weak or no support: Croatian, Icelandic, Latvian, Lithuanian, Maltese, Romanian, Welsh
     • Text Analytics
       – Excellent: none
       – Good: English
       – Moderate: Dutch, French, German, Italian, Spanish
       – Fragmentary: Basque, Bulgarian, Catalan, Czech, Danish, Finnish, Galician, Greek, Hungarian, Norwegian, Polish, Portuguese, Romanian, Slovak, Slovene, Swedish
       – Weak or no support: Croatian, Estonian, Icelandic, Irish, Latvian, Lithuanian, Maltese, Serbian, Welsh
     • Resources
       – Excellent: none
       – Good: English
       – Moderate: Czech, Dutch, French, German, Hungarian, Italian, Polish, Spanish, Swedish
       – Fragmentary: Basque, Bulgarian, Catalan, Croatian, Danish, Estonian, Finnish, Galician, Greek, Norwegian, Portuguese, Romanian, Serbian, Slovak, Slovene
       – Weak or no support: Icelandic, Irish, Latvian, Lithuanian, Maltese, Welsh
  9. [Figure: level of support (Weak/none, Fragmentary, Moderate, Good, Excellent) per language. Languages shown: Welsh, Maltese, Lithuanian, Latvian, Icelandic, Irish, Croatian, Serbian, Estonian, Slovene, Slovak, Romanian, Norwegian, Greek, Galician, Danish, Bulgarian, Basque, Swedish, Portuguese, Finnish, Catalan, Polish, Hungarian, Czech, Italian, German, Dutch, Spanish, French, English. Languages with names in red have little or no MT support.]
     Source: META-NET White Paper Series: Europe's Languages in the Digital Age. Springer, Heidelberg, New York, Dordrecht, London, September 2012. Georg Rehm and Hans Uszkoreit (series editors).
     Important: even current state-of-the-art technologies are far from perfect!
  10. [Figure and source as on slide 9.]
      Important: 20+ European languages are severely under-supported and face the danger of digital extinction.
  11. [Figure and source as on slide 9.]
      We carried out the study in 2011/2012. While support for many languages has improved in the meantime, the overall picture remains mostly the same.
  12. [Figure: language technology support (Weak/no support, Fragmentary, Moderate, Good, Excellent) and millions of native speakers worldwide (axis from 0 to 400). Languages shown: Yiddish, Welsh, Vlax Romani, Turkish, Scots, Romany, Occitan, Maltese, Macedonian, Luxembourgish, Lithuanian, Limburgish, Latvian, Icelandic, Friulian, Frisian, Breton, Bosnian, Asturian, Albanian, Irish, Croatian, Serbian, Hebrew, Estonian, Slovene, Slovak, Romanian, Norwegian, Greek, Galician, Danish, Bulgarian, Basque, Swedish, Portuguese, Finnish, Catalan, Polish, Hungarian, Czech, Italian, German, Dutch, Spanish, French, English.]
      Source: Georg Rehm, Hans Uszkoreit, Ido Dagan, Vartkes Goetcherian, Mehmet Ugur Dogan, Coskun Mermer, Tamás Váradi, Sabine Kirchmeier-Andersen, Gerhard Stickel, Meirion Prys Jones, Stefan Oeter, and Sigve Gramstad. An Update and Extension of the META-NET Study “Europe's Languages in the Digital Age”. In Proceedings of the Workshop on Collaboration and Computing for Under-Resourced Languages in the Linked Open Data Era (CCURL 2014), Reykjavik, Iceland, May 2014.
  13. LT – Current Developments
      • Multilingual Europe: our languages enjoy equal status, yet the digital extinction of the majority of EU languages is a very severe danger.
      • LT Research and Innovation in Europe: world-class research results (e.g., QT21), strong SME base; fragmentation; need for coordination.
      • Digitisation of Europe: big need for high-quality Language Technologies.
      • AI: important breakthroughs and massive investments in R&D and applications (mostly in the US and Asia), a huge opportunity for Europe!
      • The European language challenge cannot be abandoned or outsourced!
      • STOA Workshop and Report: “Language equality in the digital age: towards a Human Language Project”
      • EP Resolution “Language equality in the digital age” P8_TA-PROV(2018)0332
      • Huge need for Language Technology made in Europe for Europe!
  14. FUTURE
  15. ?
  16. European Language Grid
  17. Horizon 2020 Call ICT-29-2018
      • “A multilingual Next Generation Internet” (budget: 25M€)
      • Technology-enabled multilingualism for an inclusive DSM
      • Expected impacts:
        – Provide better access to quality resources and tools
        – Increase quality and coverage of multilingual solutions used by industrial players
        – Increase the uptake of LT in Europe in various sectors
        – Cost savings for private and public sector users of LT
      • Topic a): European Language Grid
        – One Innovation Action, 7M€
        – Winning proposal: ELG (European Language Grid)
      • Topic b): Domain-specific/challenge-oriented HLT
        – Six Research and Innovation Actions, approx. 3M€ each
  18. ELG – The Primary Platform for Language Technology in Europe
      [Diagram labels: Web Interface; APIs; European Language Grid Content Catalogue (LT Services, Tools, Components, Technologies; Language Resources and Data Sets; Organisations, Languages, Service Types etc.); Cloud Infrastructure]
      • Development of a functional language technology cloud platform for Europe
      • Marketplace for the European LT business space (directory of stakeholders)
      • Hundreds of LT services and resources that are easy to use and easy to integrate
      • Many different technologies for all European languages
      • Evaluation through 15-20 pilot projects feeding back into the platform
      • 30+ national competence centres will be set up for a strong European network
      • Services and resources can be made available by the community
      • Boosting the emerging Multilingual Digital Single Market
      • Interoperability of services through containerisation
      • Towards a thriving and flourishing European LT community
      Consortium:
      • DFKI GmbH (Coordinator) (DE)
      • ILSP, R.C. “Athena” (GR)
      • University of Sheffield (UK)
      • Charles University (CZ)
      • ELDA (FR)
      • Tilde (LV)
      • SAIL LABS GmbH (AT)
      • Expert System Iberia (ES)
      • University of Edinburgh (UK)
      2019–2021
  19. The Grand Vision: Human Language Project
  20. Human Language Project
      • Goal: Deep Natural Language Understanding by 2030
      • All official European and many additional languages
      • Broad coverage, high quality, high precision
      • Create new approaches, algorithms, data sets
      • Across modalities: text, text types, speech, video etc.
      • Across platforms: messaging, telephony, social, mobile, IoT, robots, smart devices, conversational technologies etc.
      • Across cultures: knowledge, customs, formalities, humour, emotion, subjectivity, biases, opinions, filter bubble etc.
      • How? As the next EU FET Flagship Project!
  21. HLP Prep Proposal
      • FET Flagship Projects: up to 1B€ of funding for up to ten years
      • Flagships: Human Brain Project, Graphene, Quantum (2019)
      • H2020 FETFLAG-01-2018 called for preparatory actions, i.e., small (1M€) projects to prepare the full flagship proposal
      • Our proposal: “Human Language Project Preparation”
      • Consortium of 16 partners (coordinated by DFKI)
      • Second stage proposal submitted on 18 September 2018
      • Results to be announced at ICT 2018 in Vienna (December)
      • More information at http://human-language-project.eu
  22. The following national and federal ministries support the HLP Prep proposal:
      • Flemish Department of Economy, Science and Innovation
      • Ministry of the Interior of the Republic of Cyprus
      • Czech Ministry of Education, Youth and Sports
      • Danish Ministry of Culture; French Ministry of Culture
      • German Federal Ministry for Economic Affairs and Energy
      • Ministry of Economic Affairs, Labour and Housing Baden-Württemberg
      • Ministry of Education, Research and Religious Affairs Centre for the Greek Language
      • Ministry of Tourism, Industry and Innovation (Iceland)
      • Department of Culture, Heritage and the Gaeltacht (Ireland)
      • Culture Information Systems Centre of the Republic of Latvia
      • Ministry of Culture (Slovenia)
      • Ministry of Public Administration (Slovenia)
      • Ministry of Education, Science and Sport (Slovenia)
      • Secretary of State for Information Society and Digital Agenda (Spain)
      Summary of the support for HLP Prep: we have received 375+ letters of support from the following stakeholders, covering all 28 EU Member States:
      • 16 national and federal ministries
      • 24 national language institutions and related organisations
      • 135 research organisations (including universities and research centres)
      • 136 companies (including 41 Language Service Providers)
  23. The HLP is a large-scale, long-term research, development and innovation programme in which basic and applied research and development, as well as innovation and commercialisation, work closely together to develop ground-breaking technologies for Deep Natural Language Understanding by the year 2030.
      Among the fields whose collaboration we foresee in the HLP are the following:
      • Primary fields: Computational Linguistics & LT; Linguistics; Artificial Intelligence; Knowledge Technologies
      • Secondary fields: Social Sciences, DH; Computer Science; Cognitive Science
      Human Language Project:
      • Create and sustain a truly multilingual European society without language barriers
      • Define the state of the art in Language Technologies and Language-centric Artificial Intelligence
      • Boost growth of Europe’s economy via the Multilingual Digital Single Market
      • Develop new talents and skills and attractive, sustainable jobs
      • Make Europe the global leader for innovative Language Technologies
      • Foster innovation, new ideas, new companies and business models
  24. Human Language Project: the wider European research, development and innovation ecosystem around the Human Language Project
  25. HLP Core Project (CP)
      • Coordination of the flagship (CP and PPs)
      • Continuous roadmap development
      • General technology and algorithm development
      • Digital data, resources, computing and collaboration infrastructure
      HLP Partnering Projects (PPs)
      • Language-specific and/or regional consortia doing research on their own languages
      • Close collaboration with the Core Project
      • Overlap between CP and PP in terms of partners
      [Diagram labels: HLP Core Project; HLP Partnering Projects for Spanish, Italian, Greek, German, Polish, Baltic languages, Dutch; further boxes labelled PP]
  26. EP Resolution “Language equality in the digital age” P8_TA-PROV(2018)0332
  27. “Language equality” Resolution
      • Very important initiative and document for Europe!
      • A multitude of relevant suggestions & recommendations:
        – Current obstacles to achieving language equality in the digital age in Europe (12 items)
        – Improving the institutional framework for language technology policies at EU level (12 items)
        – Recommendations for EU research policies (9 items)
        – Education policies to improve the future of language technologies in Europe (6 items)
        – Language technologies: benefits for both private companies and public bodies (6 items)
      • ELG and HLP (Prep) address many of these 45 items!
  28. [Table: resolution items by number; columns: ELG, HLP, CEF; X marks as extracted]
      Recommendations for EU research policies
      • 25 Establish large-scale, long-term LT funding programme: X
      • 26 ICT integrators should be given economic incentives for LT: X
      • 27 Europe has to secure its leadership in language-centric AI: X
      • 28 EU funding programmes should boost LT basic research: X
      • 29 Create a European LT platform for sharing of services: X
      • 32 Set up LT financing platform; emphasise R&D in Deep NLU: X X
      Education policies to improve the future of language technologies in Europe
      • 34 Retain talent in Europe; joint action at European level: X
      • 39 Member States to provide support for educational institutions: X
      Language technologies: benefits for both private companies and public bodies
      • 40 Develop investment instruments and accelerator programs: X
      • 41 Enable and empower European SMEs to use LTs: X
      • 43 Develop multilingual public e-services: X
  29. Summary & Conclusions
      • Europe is in dire need of sophisticated and robust Language Technologies for its specific demands
      • Not only for the 24 official EU languages but dozens more!
      • Europe is in a unique position to lead the quest for technologies for Deep Natural Language Understanding
      • … and to benefit massively from them (society, economy)
      • ELG: first step towards a functional European LT Platform
      • Need for a coordinated and concerted push in basic research, applied R&D and innovation, or Europe is in danger of losing touch!
      • Let’s set up the HLP as the next EU FET Flagship Project for game-changing new breakthroughs in LT!
  30. Thank you! Dr. Georg Rehm, DFKI Berlin
      • georg.rehm@dfki.de
      • @georgrehm
      • http://georg-re.hm
      • http://de.linkedin.com/in/georgrehm
