Towards a Human Language Project for Multilingual Europe: AI and Interpretation

Georg Rehm. Towards a Human Language Project for Multilingual Europe: AI and Interpretation. DG Interpretation Conference - Interpretation: Sharing Knowledge & Fostering Communities. European Commission, Brussels, April 19/20, 2018. Invited talk.

  1. Towards a Human Language Project for Multilingual Europe: AI and Interpretation. Georg Rehm, German Research Center for Artificial Intelligence (DFKI) GmbH, Language Technology Lab, Berlin, Germany. META-NET, General Secretary. georg.rehm@dfki.de
  2. Artificial Intelligence. SCIC Universities Conference (19/20 April 2018)
  3. (No text on this slide.)
  4. Artificial Intelligence. Data Intelligence. Current breakthroughs based on Machine Learning (“Deep Learning”). Also still in use: symbolic, rule-based methods and systems.
     • Huge data sets + powerful algorithms + extremely fast hardware
     • Enormous potential for disruptions in all sectors and areas
  5. META-NET and Multilingual Europe
  6. • Multilingualism is at the heart of the European idea
     • 24 EU languages, all with the same status
     • Dozens of regional and minority languages as well as languages of immigrants and trade partners
     • Many economic and social challenges:
       – The Digital Single Market needs to be multilingual
       – Cross-border, cross-lingual, cross-cultural communication
  7. → 60 research centres in 34 countries (founded in 2010). Chair of Executive Board: Jan Hajic (CUNI). Deputies: J. van Genabith (DFKI), A. Vasiljevs (Tilde). General Secretary: Georg Rehm (DFKI).
     → Multilingual Europe Technology Alliance: 826 members in 67 countries.
     Image captions: (published in 2013); (31 volumes; published in 2012).
     Project logos: T4ME (META-NET), CESAR, METANET4U, META-NORD.
  8. Basque, Bulgarian*, Catalan, Croatian*, Czech*, Danish*, Dutch*, English*, Estonian*, Finnish*, French*, Galician, German*, Greek*, Hungarian*, Icelandic, Irish*, Italian*, Latvian*, Lithuanian*, Maltese*, Norwegian, Polish*, Portuguese*, Romanian*, Serbian, Slovak*, Slovene*, Spanish*, Swedish*, Welsh. (* Official EU language.) http://www.meta-net.eu/whitepapers
  9. Level of support through LT, by area:
     • MT
       – excellent: (none)
       – good: English
       – moderate: French, Spanish
       – fragmentary: Catalan, Dutch, German, Hungarian, Italian, Polish, Romanian
       – weak or no support: Basque, Bulgarian, Croatian, Czech, Danish, Estonian, Finnish, Galician, Greek, Icelandic, Irish, Latvian, Lithuanian, Maltese, Norwegian, Portuguese, Serbian, Slovak, Slovene, Swedish, Welsh
     • Speech
       – excellent: (none)
       – good: English
       – moderate: Czech, Dutch, Finnish, French, German, Italian, Portuguese, Spanish
       – fragmentary: Basque, Bulgarian, Catalan, Danish, Estonian, Galician, Greek, Hungarian, Irish, Norwegian, Polish, Serbian, Slovak, Slovene, Swedish
       – weak or no support: Croatian, Icelandic, Latvian, Lithuanian, Maltese, Romanian, Welsh
     • Text Analytics
       – excellent: (none)
       – good: English
       – moderate: Dutch, French, German, Italian, Spanish
       – fragmentary: Basque, Bulgarian, Catalan, Czech, Danish, Finnish, Galician, Greek, Hungarian, Norwegian, Polish, Portuguese, Romanian, Slovak, Slovene, Swedish
       – weak or no support: Croatian, Estonian, Icelandic, Irish, Latvian, Lithuanian, Maltese, Serbian, Welsh
     • Resources
       – excellent: (none)
       – good: English
       – moderate: Czech, Dutch, French, German, Hungarian, Italian, Polish, Spanish, Swedish
       – fragmentary: Basque, Bulgarian, Catalan, Croatian, Danish, Estonian, Finnish, Galician, Greek, Norwegian, Portuguese, Romanian, Serbian, Slovak, Slovene
       – weak or no support: Icelandic, Irish, Latvian, Lithuanian, Maltese, Welsh
  10. Chart: level of support (weak/none, fragmentary, moderate, good, excellent) per language: Welsh, Maltese, Lithuanian, Latvian, Icelandic, Irish, Croatian, Serbian, Estonian, Slovene, Slovak, Romanian, Norwegian, Greek, Galician, Danish, Bulgarian, Basque, Swedish, Portuguese, Finnish, Catalan, Polish, Hungarian, Czech, Italian, German, Dutch, Spanish, French, English. Languages with names in red have little or no MT support. Source: META-NET White Paper Series: Europe's Languages in the Digital Age. Springer, Heidelberg, New York, Dordrecht, London, September 2012. Georg Rehm and Hans Uszkoreit (series editors).
  11. (Same chart as slide 10.) Important: even current state-of-the-art technologies are far from perfect!
  12. (Same chart as slide 10.) Important: 20+ European languages are severely under-supported and face the danger of digital extinction.
  13. (Same chart as slide 10.) We carried out the study in 2010/2012. While support for many languages has improved in the meantime, the overall picture remains mostly the same.
  14. AI and Interpretation
  15. Translation and Interpretation
     • Since approx. 2015, with breakthroughs in neural technologies, Machine Translation has been getting better and better.
     • All areas of AI look for “super-human performance”, but language is fundamentally different and much more complex.
     • Neural AI approaches cannot understand language; they process it according to huge underlying data sets.
     • In many use cases, mistakes can be tolerated.
     • But: translation and interpretation are often mission-critical!
     • Mistakes can have serious consequences (politics, medicine).
  16. Speech Translation
     • Example: Lecture Translator
       – University lectures are automatically transcribed and translated, in near-real time, into several languages
       – Students can follow the translation through a web interface
     • Example: Presentation Translator
       – Presenter can have the speech automatically translated
       – Translations are displayed as subtitles
     • Example: Call Translator
       – Internet telephony provider offers automatic voice translation
  17. Issues and Limitations
     • The three example applications work surprisingly well for general-domain language and input. But:
       – They are far from being perfect.
       – They aren’t robust.
       – They cannot cope with unforeseen situations.
       – They cannot understand language as humans do.
       – They are not (yet?) suited for conference interpretation.
       → Limitations as regards their fields of application.
     • Interpretation is often mission-critical.
       → Human interpreters won’t be replaced anytime soon.
  18. https://slator.com/features/ai-interpreter-fail-at-china-summit-sparks-debate-about-future-of-profession/
  19. Human Language Project
  20. LT – Current Developments
     • LT in Europe: world-class research, strong SME base, thousands of LSPs; immense fragmentation; need for coordination.
     • Need for High-Quality LT: translation, interpretation, MDSM etc.
     • The European Language Challenge cannot be – it must not be – abandoned or outsourced!
       → Need for Language Technology, made in Europe, for Europe!
       → STOA Workshop in the EP (January 2017): “Language equality in the digital age – towards a Human Language Project”
     Study cover shown: STUDY. EPRS | European Parliamentary Research Service, Scientific Foresight Unit (STOA), PE 581.621. Science and Technology Options Assessment.
  21. Human Language Project
     • Goal: Deep Natural Language Understanding by 2030
     • Vision: EU FET Flagship Project (10+ years)
     • Broad coverage, high quality, high precision
     • Create approaches, algorithms, data sets, resources
     • Across modalities: text, text types, speech, video etc.
     Diagram labels: Artificial Intelligence (including cognition, perception, vision, cross-modal, cross-platform, cross-culture etc.); Machine Learning; Language Technology; Linguistics.
  22. Summary & Conclusions
     • AI is disrupting all industries, including translation and, increasingly, interpretation.
       → But: perfect, robust, precise language technologies (incl. written/spoken MT and interpretation) are still far away.
     • Linguists are increasingly needed; new profiles are emerging.
       → The machine will support human experts and help them become more efficient; it will not replace them.
     • The Human Language Project is still a vision. Its goal: develop new breakthroughs in Language Technology.
  23. Recommendation
     • SCIC Speech Repository
     • 4,000 speeches (3,000 public + 1,000 private)
     • Extremely interesting data set and language resource for Language Technology researchers!
     • Many R&D groups currently work on TED talk data sets
     • Recommendation: establish bridges between SCIC and research groups for spoken language translation
     • Help build the next generation of AI tools for interpreters
     • AI tools that are tailored to the needs and wishes, topics and domains of conference interpreters in the EC/EP
  24. Thank you! Dr. Georg Rehm, DFKI Berlin. georg.rehm@dfki.de, http://de.linkedin.com/in/georgrehm, https://www.slideshare.net/georgrehm
     Document cover shown: Strategic Research and Innovation Agenda. Language Technologies for Multilingual Europe: Towards a Human Language Project. SRIA Editorial Team. Version 1.0 – December 2017.