2. Background on Me
www.adaptcentre.ie
• Background: Computational Linguist – research and real-world applications
• Interests: Natural Language Processing, Text Analytics, Machine Translation, …
• National Centre for Language Technology
• Research Integration Coordinator for the ADAPT Centre of Excellence for Digital Content and Media Innovation
• Focus on EU collaborations
• META-NET
• QT LaunchPad
• LT Web
• FALCON
• MLi
• QT21
• TraMOOC
• EXPERT
3. ADAPT Centre
• Direct funding from Science Foundation Ireland over six years (until 2020)
• Academic/Industry partnership built on top of CNGL
• Five research themes
• Six application areas
• TCD and DCU co-leads; UCD and DIT partners
• Open-ended number of industry partners
6. Ambitious Metrics for Success
• 13 spin-out companies
• €5m in commercialisation awards
• 1,650 top-quality publications
• €110m won in total in competitive research
• 500 jobs
• €9m from commercial sources
• 60 major EU initiatives
• 200 postgraduate students
• 88 licence agreements
10. LT is not…
• Localised Software
• A website in your language
• A static online dictionary
But these are all VERY valuable resources for a language!
…and can form part of a healthy LT ecosystem
11. What is LT – Where I’m coming from
• Technology for processing information (speech, text, gestures, …) in a given language
• An enabling technology
• Adds intelligence to both content (creation, management, etc.) and human-computer interaction (HCI)
• A set of tools and resources – part of a bigger picture and a larger ecosystem
• Interactive
• Not monolithic resources
12. It’s already right under your noses
• These concepts (and others) are already being used for a wide range of applications:
• Marketing/Brand awareness
• Customer Sentiment Analysis
• Political barometers (Obama)
• Information analysis and extraction (IBM Watson)
• Offensive content filtering
• Security applications
14. LT Landscape in Ireland
• Historically strong translation and localisation industry
• Home to several internationally recognised research centres
• NCLT
• DERI
• CNGL >>> ADAPT
• INSIGHT
• Government funding for research has remained consistent despite worsening economic conditions
15. LT for Irish
• Many of the basics are covered
• Spell checker
• Grammar checking
• T9 predictive text, smartphone predictive text (through additional software)
• Localisation of open-source software and many major applications
• Some of the more advanced stuff
• Speech synthesiser
• Part-of-Speech Tagger
• (Dependency Parser)
16. LT for Irish
• But there’s not much else
• Availability of text corpora, speech corpora, parallel texts, wordnets and other LT building blocks is limited or poor
• Some resources exist, but they are small, narrow in coverage and restricted in availability
• The lack of basic linguistic resources is stifling the development of modern language processing technologies for Irish
• Yet our own research centres are producing world-leading LT for other languages
17. State of LT Support for Irish
Source: META-NET White Paper Series, The Irish Language in the Digital Age
18. LT support by area
Machine Translation
• Excellent: none
• Good: English
• Moderate: French, Spanish
• Fragmentary: Catalan, Dutch, German, Hungarian, Italian, Polish, Romanian
• Weak or no support: Basque, Bulgarian, Croatian, Czech, Danish, Estonian, Finnish, Galician, Greek, Icelandic, Irish, Latvian, Lithuanian, Maltese, Norwegian, Portuguese, Serbian, Slovak, Slovene, Swedish, Welsh
Speech
• Excellent: none
• Good: English
• Moderate: Czech, Dutch, Finnish, French, German, Italian, Portuguese, Spanish
• Fragmentary: Basque, Bulgarian, Catalan, Danish, Estonian, Galician, Greek, Hungarian, Irish, Norwegian, Polish, Serbian, Slovak, Slovene, Swedish
• Weak or no support: Croatian, Icelandic, Latvian, Lithuanian, Maltese, Romanian, Welsh
Text Analysis
• Excellent: none
• Good: English
• Moderate: Dutch, French, German, Italian, Spanish
• Fragmentary: Basque, Bulgarian, Catalan, Czech, Danish, Finnish, Galician, Greek, Hungarian, Norwegian, Polish, Portuguese, Romanian, Slovak, Slovene, Swedish
• Weak or no support: Croatian, Estonian, Icelandic, Irish, Latvian, Lithuanian, Maltese, Serbian, Welsh
Resources
• Excellent: none
• Good: English
• Moderate: Czech, Dutch, French, German, Hungarian, Italian, Polish, Spanish, Swedish
• Fragmentary: Basque, Bulgarian, Catalan, Croatian, Danish, Estonian, Finnish, Galician, Greek, Norwegian, Portuguese, Romanian, Serbian, Slovak, Slovene
• Weak or no support: Icelandic, Irish, Latvian, Lithuanian, Maltese, Welsh
19. Europe’s Languages and LT
Overall support, from good support through Language Technology (top) to weak or no support (bottom):
• English
• Dutch, French, German, Italian, Spanish
• Catalan, Czech, Finnish, Hungarian, Polish, Portuguese, Swedish
• Basque, Bulgarian, Danish, Galician, Greek, Norwegian, Romanian, Slovak, Slovene
• Croatian, Estonian, Icelandic, Irish, Latvian, Lithuanian, Maltese, Serbian, Welsh
20. So What?
• Take a closer look at the least equipped languages
• Only three of them (Irish, Maltese and Welsh) compete with English in their native countries
• Maltese native fluency ~100% (Eurobarometer)
• Irish and Welsh are at risk
• So too are other regional and minority languages (RMLs) that compete with a better-resourced language on a day-to-day basis
• Weak or no support: Croatian, Estonian, Icelandic, Irish, Latvian, Lithuanian, Maltese, Serbian, Welsh
• Next least equipped: Basque, Bulgarian, Danish, Galician, Greek, Norwegian, Romanian, Slovak, Slovene
22. Languages at risk in the print age
• Invention of the moveable type printing press
• Improved literacy
• Standardisation
• The Reformation
• The Renaissance
• The Enlightenment
• Death of hundreds of European RMLs that never made it into print
23. Languages in the Digital Age
• The leap into the digital age has had profound effects
• Need to equip all languages with digital resources to ensure survival
• Otherwise they will be consigned to history
• The Celtic languages need to address their under-resourcing
25. European Level Action
• Multilingual Europe Technology Alliance
• Bring together Language Technology stakeholders
• Concerted effort to influence EU research programmes for LT
• Strategic Research Agenda for Multilingual Europe
• Success in H2020 funding calls – specifically in ICT 17 “Cracking the Language Barrier”
• “… to facilitate multilingual online communication for the benefit of the digital single market which is still fragmented by language barriers that hamper a wide penetration of cross-border commerce, social communication and exchange of cultural content.”
• “Special focus is on the 21 EU languages (both as source and target languages) that have ‘fragmentary’ or ‘weak/no’ machine translation support according to the META-NET language white papers.”
26. Addressing the Gap – CRACKER Project
• CRACKER (Feb 2015) – a follow-up to META-NET. Stated goals:
• Initiating a programme of ground-breaking actions that will deliver, by 2025, an online EU internal market free of language barriers, delivering automated translation quality, equal to currently best performing language pair/direction, in most relevant use situations and for at least 90% of the EU official languages.
• Significantly improving the quality, coverage and technical maturity of automatic translation for at least half of the 21 EU languages that currently have "weak or no support" or "fragmentary support" of machine translation solutions, according to the META-NET Language White Papers referenced before.
• Attracting a community of hundreds of contributors of language resources and language technology tools (from all EU Member States and Associated Countries) to adopt and support a single platform for sharing, maintaining and making use of language resources and tools; establishing widely agreed benchmarks for machine translation quality and stimulating competition between methods and systems.
27. EU Actions Recap
• The EU is calling for improved resources for our languages
• The big players (industry and research) are organising to do something about it
• Celtic languages can be part of this if we position ourselves to be there
28. EU Actions – Getting on board
• Riga Summit 2015, April 27-29
• http://www.rigasummit2015.eu
• Venue for META-FORUM
• Multilingual Technologies for the Digital Single Market
• Language Technologies for the Big Data Challenge and Data Economy
• High-Quality Machine Translation
• Towards European Language Technology Platforms
• Strategic Agenda for the Multilingual Digital Single Market
29. Summit Agenda
• Opening addresses – H.E. Andris Bērziņš, President of the Republic of Latvia
• First session – Setting the Strategic Agenda for the Multilingual Digital Single Market
• Coffee break
• Second session – Breaking the Language Barrier for Cross-Border Public Services
• Lunch
• Third session – Language Technology: Enabling European Business
• Coffee break
• Fourth session – Empowering the Multilingual Data Economy
• Closing session – EU Innovation Excellence to Address Multilingual Challenges
30. National Policy/Funding Agency Round Table
• Roundtable session to discuss where languages and language technologies currently stand in the different countries and regions and how to improve the situation
• Goal: shape a Strategic Research and Innovation Agenda with input (and buy-in) directly from those responsible for our languages at a regional level
32. Languages in the Digital Age
• Not all doom and gloom!
• Significant opportunity: LT and language promotion/rejuvenation
• Community effort can provide the basic building blocks
• Techniques can do more with less
• Policy makers can be hard to convince
• We have to start somewhere – Celtic Language Technology Community Workshop
33. Celtic Language Technology Workshop
“The Celtic Language Technology Workshop (CLTW) series of workshops provides a forum for researchers interested in developing NLP (Natural Language Processing) resources and technologies for Celtic languages.
As Celtic languages are under-resourced, our goal is to encourage collaboration and communication between researchers working on language technologies and resources for Celtic languages.”
34. First CLTW at COLING 2014
• Held in association with COLING 2014 (top tier CL/LT conference)
• Full day of research presentations (papers and posters)
• Attended by about 30 people
• Published 12 papers
• Representing work on Irish, Welsh, Scots Gaelic and Breton (plus an invited talk that covered aspects of Manx)
• Included an open forum session to discuss how to move the area forward
• Endorsed by the Irish Government and Ofis Publik ar Brezhoneg (among others)
35. CLTW Topics of Interest
• Language resources
• Syntax, semantics, grammar, lexicons
• Phonology / morphology, tagging
• Morphological analysis
• Part-of-speech taggers
• Computer-Assisted Language Learning (CALL)
• Translation memory
• Machine translation
• Parsing / chunking
• Ontologies, terminology and knowledge representation
• Speech processing / generation
• Digital humanities
• Corpus development / analysis
• Treebanking
• Evaluation methods
• Ontology-lexica
• Metadata
• Linked data resources
• Linguistic linked data resources
• Semantic annotation
• Information Extraction
36. Workshop Outcomes
• A great time!
• Community forum
• Momentum
• Ideas for further collaboration
• Possible EU level action to address under-resourcing
38. Within the LT Community
• Under-resourced languages are a challenge for science
• The best researchers LOVE a challenge
• The Celtic LT community can position itself as a provider of interesting challenges
• BUT: we still need help from the wider language community to ensure adequate data is available to the R&D community
39. What Can/Should We Do?
• Concerted Community Action
• Data is key
• Collections of digital data in a language
• Appropriate format
• Appropriate annotation
• Appropriate licence
• Appropriately available
• The R&D community will then combine efforts to build more sophisticated tools and solve bigger problems…
• This should not be done in isolation by each RML community
• Band together and also look to EU initiatives
40. Celtic LT Community Efforts
• Next CLTW – proposed as part of LREC 2016
• Semi-formal meet-ups (today)
• Budding Irish LT lobby group CIGILT
• COST (European COoperation in Science and Technology) Action
• Reaching out further to the Humanities
• Needs support from policy makers
• Needs to produce results that generate buy-in from language communities
41. The Grass Roots
• Small numbers of speakers
• Typically minority (or marginalised) languages
• Everyone has a role to play
• LT Community needs to speak out more
• Show tangible benefits
If the digital age is already heavily affecting English, the lingua franca of the WORLD (e.g. “selfie” in the OED),
and we already have evidence that a similar earlier information revolution killed off the lingua franca of Europe and of the Church (Latin) as well as hundreds of RMLs,
what chance do digitally under-resourced languages have?