Protecting Minority Languages from Digital Extinction
1. Protecting minority languages from digital extinction: the case of Irish
Dr. Teresa Lynn
Research Fellow
ADAPT Centre, Dublin City University
The ADAPT Centre is funded under the SFI Research Centres Programme (Grant 13/RC/2106) and is co-funded under the European Regional Development Fund.
4. www.adaptcentre.ie
Ireland and the Irish Language
• Once the most widely spoken language in Ireland
• Historical factors led to a decline in its use and an increase in the use of English
• Still spoken natively
– Gaeltacht Regions
• “Urban revival” – Irish-medium schools increasingly popular
• Increased online usage
Language technology in our daily lives
o Grammar checkers and spellcheckers
o Predictive Text/ Autocorrect
o Search Engines
o Virtual Assistants
o Machine Translation systems (e.g. Google Translate)
o …
Irish language technology survey
META-NET white paper series (Judge et al., 2012)
o EU-led study
o Survey of 31 EU languages
o Language resources and technologies
Level of support for each language, by technology area:

Machine Translation (MT)
o Excellent: none
o Good: English
o Moderate: French, Spanish
o Fragmentary: Catalan, Dutch, German, Hungarian, Italian, Polish, Romanian
o Weak or no support: Basque, Bulgarian, Croatian, Czech, Danish, Estonian, Finnish, Galician, Greek, Icelandic, Irish, Latvian, Lithuanian, Maltese, Norwegian, Portuguese, Serbian, Slovak, Slovene, Swedish, Welsh

Speech
o Excellent: none
o Good: English
o Moderate: Czech, Dutch, Finnish, French, German, Italian, Portuguese, Spanish
o Fragmentary: Basque, Bulgarian, Catalan, Danish, Estonian, Galician, Greek, Hungarian, Irish, Norwegian, Polish, Serbian, Slovak, Slovene, Swedish
o Weak or no support: Croatian, Icelandic, Latvian, Lithuanian, Maltese, Romanian, Welsh

Text Analysis
o Excellent: none
o Good: English
o Moderate: Dutch, French, German, Italian, Spanish
o Fragmentary: Basque, Bulgarian, Catalan, Czech, Danish, Finnish, Galician, Greek, Hungarian, Norwegian, Polish, Portuguese, Romanian, Slovak, Slovene, Swedish
o Weak or no support: Croatian, Estonian, Icelandic, Irish, Latvian, Lithuanian, Maltese, Serbian, Welsh

Resources
o Excellent: none
o Good: English
o Moderate: Czech, Dutch, French, German, Hungarian, Italian, Polish, Spanish, Swedish
o Fragmentary: Basque, Bulgarian, Catalan, Croatian, Danish, Estonian, Finnish, Galician, Greek, Norwegian, Portuguese, Romanian, Serbian, Slovak, Slovene
o Weak or no support: Icelandic, Irish, Latvian, Lithuanian, Maltese, Welsh
Language at Risk – in the Digital Age
“Printing Press resulted in the extinction of many minority
and regional languages”
Will technology have the same impact on Irish?
Irish Government support
Contributors:
• Teresa Lynn, Dublin City University
• John Judge, Dublin City University
• Elaine Uí Dhonnchadha, Trinity College Dublin
• Neasa Ní Chiaráin, Trinity College Dublin
• Ailbhe Ní Chasaide, Trinity College Dublin
Digital Strategy for the Irish Language 2019
Digital Strategy for the Irish Language 2019
Topics Addressed:
• Linguistic Resources
• Corpora
• Knowledge Bases
• NLP Tools
• NLG Tools
• Speech Models
• Speech Synthesis
• Speech Recognition
• Spoken Dialogue Systems
• Machine Translation
• Information Retrieval
• State and Public Use
• CALL
• Disability and Access
• Synergies (Industry and Public)
Issues relating to lack of training data
Two issues here:
I. Misuse of free online MT systems
(output should be post-edited by a professional translator)
II. Overestimation of what an open-domain system can do
Practical use of MT: Tapadóir System
• Initial meetings between ADAPT and senior staff at the Department of Culture, Heritage and the Gaeltacht (DCHG)
(following META-NET White Paper Series 2012)
• DCHG recognised potential for investing in language technology,
specifically Machine Translation
• Potential cost-savings apparent
• Opportunity for ADAPT to carry out research into Irish statistical machine translation (SMT)
MT Data Sharing – Virtuous Circle
Share Data → Develop/Improve MT → Greater Productivity → Reduced Translation Costs → Increase Bilingual Content → Share Data (and around again)
Examples of sources of translation data
• Language Commissioner – Annual Reports, press releases, financial
reports, complaints received, guidebook for Language Act
• Department of Justice – legislative acts
• Dept of Foreign Affairs
• Dublin City Council - signage, applications, reports
• Universities – signage, course content, application forms, flyers, web
content
• Citizens Information – website content, e.g. social welfare, consumer
affairs information, application forms
• Foras na Gaeilge (Irish language body) – dictionaries, monolingual
national corpus
• King's Inns (Law Society) – monolingual training material
Shaping National Data Management Practices
“Where computer aided translation tooling is used to produce translations then,
in addition to the finished target language translation, any and all translation
memory data, or similar, derived from the translation process should be
returned to the department contracting for translation services as part of the
final end product delivery.”
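Returned translation memory data can feed directly into MT training. As a minimal sketch, assuming the data arrives as TMX (an XML exchange format for translation memories; the policy text above does not name a format), the Python below shows how English-Irish segment pairs might be extracted. The sample content and the function name `extract_pairs` are hypothetical and used for illustration only.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Invented sample data, not taken from any real translation memory.
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0" segtype="sentence"
          o-tmf="example" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Application form</seg></tuv>
      <tuv xml:lang="ga"><seg>Foirm iarratais</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Annual report</seg></tuv>
      <tuv xml:lang="ga"><seg>Tuarascáil bhliantúil</seg></tuv>
    </tu>
  </body>
</tmx>"""


def extract_pairs(tmx_text, src="en", tgt="ga"):
    """Return (source, target) segment pairs from a TMX document string."""
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):  # one translation unit per aligned segment
        segs = {}
        for tuv in tu.findall("tuv"):
            # Older TMX files may use a plain "lang" attribute instead.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # Reduce codes such as "en-IE" to "en"; keep text inside inline tags.
                segs[lang.split("-")[0]] = "".join(seg.itertext()).strip()
        # Keep only units that have non-empty text in both languages.
        if segs.get(src) and segs.get(tgt):
            pairs.append((segs[src], segs[tgt]))
    return pairs


if __name__ == "__main__":
    for en, ga in extract_pairs(SAMPLE_TMX):
        print(f"{en}\t{ga}")
```

Units missing either language are skipped, so only aligned pairs reach the training set. A real pipeline would likely also deduplicate and filter by domain.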
ELRC: The Irish context
Advantages within the Irish context
• Small country – (relatively) fewer public bodies/organisations to reach
• Bilingual Society
• Dedicated Government department for Irish language affairs
• Dedicated Terminology Committee & Database
• Easier to make relevant connections
• Dept Gaeltacht support has strong impact
• Positive disposition towards language support
Language at Risk
Need to ensure continuing language usage … through technology
o Edutainment packages
o Word processing tools
o Webpage translation
o Search engines
o Games
o Social media/ online data mining
o Text generation (weather reports)
o Automatic subtitling
o Disability Access
o …
Source: http://www.leuphana.de/institute/ies/llt2015.html
Conclusion
When you have limited resources…
o Pilot projects can have impact: they demonstrate benefit and need
o Share knowledge/collaborate/network with other minority groups
o Crowdsource (empower the language community)
o Seek National/ European support
o Collective efforts towards prevention of Digital Extinction
Start at the Share Data point on the circle.
The Dept shared data, which meant an engine could be developed … etc.
Through the Tapadóir project, we promoted the notion of data sharing within the department, with LSPs, and across other departments.
How many knew about Irish before this? How many thought it was a dead language?