Celtic language technologies in the digital age


Presentation given by John Judge on language technologies from the perspective of Irish at the Trwy Ddulliau Technoleg 2015 conference.

Published in: Technology
  1. 1. Celtic Language Technologies in the Digital Age • John Judge, ADAPT Centre, DCU
  2. 2. Background on Me (www.adaptcentre.ie) • Background: Computational Linguist – research and real world • Interests in: Natural Language Processing, Text Analytics, Machine Translation, … • National Centre for Language Technology • Research Integration Coordinator for the ADAPT Centre of Excellence for Digital Content and Media Innovation • Focus on EU collaborations • META-NET • QT LaunchPad • LT Web • FALCON • MLi • QT21 • TraMOOC • EXPERT
  3. 3. ADAPT Centre • ADAPT: Science Foundation Ireland direct funding over six years (until 2020) • Academic/Industry partnership built on top of CNGL • Five research themes • Six application areas • TCD and DCU co-leads; UCD and DIT partners • Open-ended number of industry partners
  4. 4. ADAPT Centre • E-Commerce Financial E-Learning Life Sciences ICT Localisation Content & Media Entertainment
  5. 5. Global Digital Content: Platform Research
  6. 6. Ambitious Metrics for Success • 13 Spin Out Companies • €5m Commercialisation Awards • 1,650 Top Quality Publications • €110m Won in Total Competitive Research • 500 Jobs • €9m From Commercial Sources • 60 Major EU Initiatives • 200 Postgraduate Students • 88 Licence Agreements
  7. 7. Agus Gaeilge…? (And Irish…?) • How much of all of this relates to Irish?
  8. 8. Language Technology
  9. 9. Language Technology and Applications
  10. 10. LT is not… • Localised Software • A website in your language • A static online dictionary • But these are all VERY valuable resources for a language! • …and can form part of a healthy LT ecosystem
  11. 11. What is LT – Where I’m coming from • Technology for processing information (speech, text, gestures, …) in a given language • An enabling technology • Added intelligence to both content (creation, management, etc.) and HCI • Set of tools and resources – part of a bigger picture and a larger ecosystem • Interactive • Not monolithic resources
  12. 12. It’s already right under your noses • These concepts (and some others) are already being used for a wide range of applications • Marketing/Brand awareness • Customer Sentiment Analysis • Political barometers (Obama) • Information analysis and extraction (IBM Watson) • Offensive content filtering • Security applications
  13. 13. A look at the Irish LT perspective
  14. 14. LT Landscape in Ireland • Historically strong in Translation and Localisation industry • Home to several internationally recognised research centres • NCLT • DERI • CNGL, now ADAPT • INSIGHT • Government funding for research has been consistent despite worsening economic conditions
  15. 15. LT for Irish • Many of the basics are covered • Spell checker • Grammar checking • T9 predictive text, smartphone predictive text (through additional software) • Localisation of open source software, and many major applications • Some of the more advanced stuff • Speech synthesiser • Part-of-Speech Tagger • (Dependency Parser)
  16. 16. LT for Irish • But there’s not much else • Availability of text corpora, speech corpora, parallel texts, wordnets and other LT building blocks is limited or poor • Some resources exist – small, narrow coverage, restricted availability • Lack of basic linguistic resources is stifling development of modern language processing technologies for Irish • Yet our own research centres are producing world leading LT for other languages
  17. 17. State of LT Support for Irish • Source: META-NET White Paper Series, The Irish Language in the Digital Age
  18. 18. LT support by area (scale: excellent, good, moderate, fragmentary, weak or no support; no language is listed under excellent in any area)
      • MT: good: English; moderate: French, Spanish; fragmentary: Catalan, Dutch, German, Hungarian, Italian, Polish, Romanian; weak or no support: Basque, Bulgarian, Croatian, Czech, Danish, Estonian, Finnish, Galician, Greek, Icelandic, Irish, Latvian, Lithuanian, Maltese, Norwegian, Portuguese, Serbian, Slovak, Slovene, Swedish, Welsh
      • Speech: good: English; moderate: Czech, Dutch, Finnish, French, German, Italian, Portuguese, Spanish; fragmentary: Basque, Bulgarian, Catalan, Danish, Estonian, Galician, Greek, Hungarian, Irish, Norwegian, Polish, Serbian, Slovak, Slovene, Swedish; weak or no support: Croatian, Icelandic, Latvian, Lithuanian, Maltese, Romanian, Welsh
      • Text Analysis: good: English; moderate: Dutch, French, German, Italian, Spanish; fragmentary: Basque, Bulgarian, Catalan, Czech, Danish, Finnish, Galician, Greek, Hungarian, Norwegian, Polish, Portuguese, Romanian, Slovak, Slovene, Swedish; weak or no support: Croatian, Estonian, Icelandic, Irish, Latvian, Lithuanian, Maltese, Serbian, Welsh
      • Resources: good: English; moderate: Czech, Dutch, French, German, Hungarian, Italian, Polish, Spanish, Swedish; fragmentary: Basque, Bulgarian, Catalan, Croatian, Danish, Estonian, Finnish, Galician, Greek, Norwegian, Portuguese, Romanian, Serbian, Slovak, Slovene; weak or no support: Icelandic, Irish, Latvian, Lithuanian, Maltese, Welsh
  19. 19. Europe’s Languages and LT • Clusters, from good support through Language Technology to weak or no support: • English • Dutch, French, German, Italian, Spanish • Catalan, Czech, Finnish, Hungarian, Polish, Portuguese, Swedish • Basque, Bulgarian, Danish, Galician, Greek, Norwegian, Romanian, Slovak, Slovene • Croatian, Estonian, Icelandic, Irish, Latvian, Lithuanian, Maltese, Serbian, Welsh
  20. 20. So What? • Take a closer look at the least equipped languages • Only 3 compete with English in their native countries • Maltese native fluency ~100% (Eurobarometer) • Irish and Welsh are at risk • So too are other RMLs which compete with any better-resourced language on a day-to-day basis • Weak or no support: Croatian, Estonian, Icelandic, Irish, Latvian, Lithuanian, Maltese, Serbian, Welsh • Next cluster up: Basque, Bulgarian, Danish, Galician, Greek, Norwegian, Romanian, Slovak, Slovene
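The "least equipped" point can be checked with a quick tally of the per-area ratings on slide 18. The sketch below is illustrative only: it copies the "weak or no support" lists from that slide, and the assignment of the last two lists to Text Analysis and Resources is an assumption based on reading the flattened slide text. The names `WEAK`, `weak_area_count` and `weak_everywhere` are made up for this example and come from no META-NET tool.

```python
# "Weak or no support" lists per area, copied from the slide 18 comparison.
# Assumption: the Text Analysis / Resources split follows our reading of the
# flattened slide text, not an official data file.
WEAK = {
    "MT": {
        "Basque", "Bulgarian", "Croatian", "Czech", "Danish", "Estonian",
        "Finnish", "Galician", "Greek", "Icelandic", "Irish", "Latvian",
        "Lithuanian", "Maltese", "Norwegian", "Portuguese", "Serbian",
        "Slovak", "Slovene", "Swedish", "Welsh",
    },
    "Speech": {
        "Croatian", "Icelandic", "Latvian", "Lithuanian", "Maltese",
        "Romanian", "Welsh",
    },
    "Text Analysis": {
        "Croatian", "Estonian", "Icelandic", "Irish", "Latvian",
        "Lithuanian", "Maltese", "Serbian", "Welsh",
    },
    "Resources": {
        "Icelandic", "Irish", "Latvian", "Lithuanian", "Maltese", "Welsh",
    },
}


def weak_area_count(language):
    """Return how many of the four areas list `language` as weak/no support."""
    return sum(language in languages for languages in WEAK.values())


def weak_everywhere():
    """Return the languages listed as weak/no support in all four areas."""
    return sorted(set.intersection(*WEAK.values()))


if __name__ == "__main__":
    for language in ("Irish", "Welsh", "Maltese"):
        print(language, weak_area_count(language), "of", len(WEAK), "areas")
    print("Weak in every area:", ", ".join(weak_everywhere()))
```

On these lists Irish is in the lowest band for three of the four areas (speech is rated fragmentary), while Welsh and Maltese are in the lowest band for all four.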
  21. 21. Languages at risk in the pre-digital age
  22. 22. Languages at risk in the print age • Invention of the moveable type printing press • Improved literacy • Standardisation • The Reformation • The Renaissance • The Enlightenment • Death of hundreds of European RMLs that never made it into print
  23. 23. Languages in the Digital Age • The leap into the digital age has had profound effects • Need to equip all languages with digital resources to ensure survival • Otherwise they are doomed to history • The Celtic Languages need to address under-resourcing
  24. 24. A High Level Solution - Europe
  25. 25. European Level Action • Multilingual Europe Technology Alliance • Bring together Language Technology stakeholders • Concerted effort to influence EU research programmes for LT • Strategic Research Agenda for Multilingual Europe • Success in H2020 Funding calls – specifically in ICT 17 “Cracking the Language Barrier” • “… to facilitate multilingual online communication for the benefit of the digital single market which is still fragmented by language barriers that hamper a wide penetration of cross-border commerce, social communication and exchange of cultural content.” • “Special focus is on the 21 EU languages (both as source and target languages) that have ‘fragmentary’ or ‘weak/no’ machine translation support according to the META-NET language white papers.”
  26. 26. Addressing the Gap – CRACKER Project • CRACKER (Feb 2015) – follow-up to META-NET. Stated goals: • Initiating a programme of ground-breaking actions that will deliver, by 2025, an online EU internal market free of language barriers, delivering automated translation quality, equal to currently best performing language pair/direction, in most relevant use situations and for at least 90% of the EU official languages. • Significantly improving the quality, coverage and technical maturity of automatic translation for at least half of the 21 EU languages that currently have "weak or no support" or "fragmentary support" of machine translation solutions, according to the META-NET Language White Papers referenced before. • Attracting a community of hundreds of contributors of language resources and language technology tools (from all EU Member States and Associated Countries) to adopt and support a single platform for sharing, maintaining and making use of language resources and tools; establishing widely agreed benchmarks for machine translation quality and stimulating competition between methods and systems.
  27. 27. EU Actions Recap • The EU is calling for improved resources for our languages • The big players (industry and research) are organising to do something about it • Celtic languages can be part of this if we position ourselves to be there
  28. 28. EU Actions – Getting on board • Riga Summit 2015, April 27-29 • Venue for META-FORUM • Multilingual Technologies for the Digital Single Market • Language Technologies for the Big Data Challenge and Data Economy • High-Quality Machine Translation • Towards European Language Technology Platforms • Strategic Agenda for the Multilingual Digital Single Market
  29. 29. Summit Agenda • Opening addresses: H.E. Andris Bērziņš, President of the Republic of Latvia • First session: Setting the Strategic Agenda for the Multilingual Digital Single Market • Coffee break • Second session: Breaking the Language Barrier for Cross-Border Public Services • Lunch • Third session: Language Technology: Enabling European Business • Coffee break • Fourth session: Empowering the Multilingual Data Economy • Closing session: EU Innovation Excellence to Address Multilingual Challenges
  30. 30. National Policy/Funding Agency Round Table • Roundtable session to discuss where languages and language technologies currently stand in the different countries and regions and how to improve the situation • Goal: Shape a Strategic Research and Innovation Agenda with input (and buy-in) directly from those responsible for our languages at a regional level
  31. 31. Towards a Celtic Language Technology Community
  32. 32. Languages in the Digital Age • Not all doom and gloom! • Significant opportunity: LT and language promotion/rejuvenation • Community effort can provide the basic building blocks • Techniques can do more with less • Policy makers can be hard to convince • We have to start somewhere – Celtic Language Technology Community Workshop
  33. 33. Celtic Language Technology Workshop • “The Celtic Language Technology Workshop (CLTW) series of workshops provides a forum for researchers interested in developing NLP (Natural Language Processing) resources and technologies for Celtic languages. As Celtic languages are under-resourced, our goal is to encourage collaboration and communication between researchers working on language technologies and resources for Celtic languages.”
  34. 34. First CLTW at COLING 2014 • Held in association with COLING 2014 (top tier CL/LT conference) • Full day of research presentations (papers and posters) • Attended by about 30 people • Published 12 papers • Representing work on: Irish, Welsh, Scots Gaelic, Breton (and an invited talk that covered aspects of Manx) • Including an open forum session to discuss how to move the area forward • Endorsed by Irish Government, Ofis Publik ar Brezhoneg (among others)
  35. 35. CLTW Topics of Interest • Language resources • Syntax, semantics, grammar, lexicons • Phonology / morphology, tagging • Morphological analysis • Part-of-speech taggers • Computer-Assisted Language Learning (CALL) • Translation memory • Machine translation • Parsing / chunking • Ontologies, terminology and knowledge representation • Speech processing / generation • Digital humanities • Corpus development / analysis • Treebanking • Evaluation methods • Ontology-lexica • Metadata • Linked data resources • Linguistic linked data resources • Semantic annotation • Information Extraction
  36. 36. Workshop Outcomes • A great time! • Community forum • Momentum • Ideas for further collaboration • Possible EU level action to address under-resourcing
  37. 37. Future Directions
  38. 38. Within the LT Community • Under-resourced languages are a challenge for science • The best researchers LOVE a challenge • The Celtic LT community can position itself as a provider of interesting challenges • BUT: We still need help from the wider language community to ensure adequate data is available to the R&D community
  39. 39. What Can/Should We Do? • Concerted Community Action • Data is key • Collections of digital data in a language • Appropriate format • Appropriate annotation • Appropriate licence • Appropriately available • The R&D community will combine to build more sophisticated tools and solve bigger problems… • This should not be done in isolation by each RML community • Band together and also look to EU initiatives
  40. 40. Celtic LT Community Efforts • Next CLTW – Proposal for part of LREC 2016 • Semi-formal meet-ups (today) • Budding Irish LT lobby group CIGILT • COST (European COoperation in Science and Technology) Action • Reaching out further to the Humanities • Needs support from policy makers • Needs to produce results that generate buy-in from language communities
  41. 41. The Grass Roots • Small numbers of speakers • Typically minority (or marginalised) languages • Everyone has a role to play • LT Community needs to speak out more • Show tangible benefits
  42. 42. Diolch! – Thank You! • Me • CLTW!forum/celtic-language-technology • META-NET LWPs • EU initiatives