SlideShare a Scribd company logo
1 of 35
NLP for low-resourced languages
Teresa Lynn, PhD
Research Fellow
ADAPT Centre
Dublin City University
The ADAPT Centre is funded under the SFI Research Centres Programme (Grant 13/RC/2106) and is co-funded under the European Regional Development Fund.
AI Challenges
for
Low-resourced
Languages
• Overview of The Irish Language
• NLP with few resources
• Addressing the Lack of Irish Data
• The Future?
Irish language - status
Ce ns us (2 0 1 6 ): Pop. 4 ,7 6 1 ,865
Ability to s pe a k : 1 ,7 6 1 ,4 20
Da ily us a ge : 7 3 ,8 0 3
First Officia l La ngua ge
N ational La ngua ge
EU Language status
Officia l E U La ngua ge
Minor ity La ngua ge (low -re s ource d)
De rogation on officia l tra ns lations (until 2 0 2 1 )
Morphology/ Inflection
LENITION
sa cheantar ‘in the area’
airgead a thuillfeadh sé ‘money he would earn’
a dheartháir ‘his brother’
ECLIPSIS
Tír na nÓg ‘Land of the Youth’
i mBéarla ‘in English’
ar an mbord ‘on the table’
VOWEL HARMONY
Caithim `I spend’
Casaim `I turn’
Rithfinn `I would run’
D’íosfainn `I would eat’
Inflected
Prepositions
7
le – with
liom `with me’
leat `with you’
ag – at
agam `at me’
agat `at you’
faoi – about/under
fúm ‘about/under me’
fút ‘about/under you’
ó – from
uaim `from me’
uait `from you’
do – to
dom to me’
duit `to you’
ar – on
orm ‘on me’
ort ‘on you’
Word Order
V OS
English: `I saw the boy’
Irish: Chonaic mé an buachaill
Gloss: Saw I the boy
Irish language technology
3 1 E U la ngua ges
La ngua ge re s ources a nd
te chnologies
ME TA -N ET white pa pe r s e r ie s (Judge et a l., 2 0 1 2 )
E U-le d s ur vey
www.adaptcentre.ie
MT
9
English
good
French, Spanish
moderate fragmentary
Catalan, Dutch, German, Hungarian, Italian,
Polish, Romanian
weak or no support
Basque, Bulgarian, Croatian, Czech, Danish, Estonian,
Finnish, Galician, Greek, Icelandic, Irish, Latvian,
Lithuanian, Maltese, Norwegian, Portuguese, Serbian,
Slovak, Slovene, Swedish, Welsh
excellent
Czech, Dutch, Finnish, French,
German, Italian, Portuguese,
Spanish
moderate fragmentary
Basque, Bulgarian, Catalan, Danish, Estonian,
Galician, Greek, Hungarian, Irish, Norwegian,
Polish, Serbian, Slovak, Slovene, Swedish
weak or no support
Croatian, Icelandic, Latvian,
Lithuanian, Maltese, Romanian, Welsh
excellent
English
good
Speech
English
good
Dutch, French, German,
Italian, Spanish
moderate fragmentary
Basque, Bulgarian, Catalan, Czech, Danish,
Finnish, Galician, Greek, Hungarian, Norwegian,
Polish, Portuguese, Romanian, Slovak, Slovene,
Swedish
weak or no supportexcellent
English
good
Czech, Dutch, French,
German, Hungarian, Italian,
Polish, Spanish, Swedish
moderate fragmentary
Basque, Bulgarian, Catalan, Croatian, Danish,
Estonian, Finnish, Galician, Greek, Norwegian,
Portuguese, Romanian, Serbian, Slovak, Slovene
weak or no supportexcellent
Resources
Text
Analysis
Croatian, Estonian, Icelandic, Irish, Latvian,
Lithuanian, Maltese, Serbian, Welsh
Icelandic, Irish, Latvian, Lithuanian, Maltese, Welsh
Risk of Digital Extinction
“Printing Press resulted in the extinction of
many minority and regional languages”
Will technology have the same impact on Irish?
Risk of Digital Extinction
Need to ensure continuing language
usage through technology
o Edutainment packages/ CALL
o Multi-platform Word processing tools
o Automated translation
o Search engines
o Games
o Social media/ Online data mining
o Text Generation (e.g. weather reports)
o Automatic subtitling
o …
Why
do we
need
NLP?
T E X T
S U M M A R I S AT I O N
7
S E N T I M E N T
A N A LY S I S
I N F O R M AT I O N
R E T R I E V A L
T E X T M I N I N G
M A C H I N E
T R A N S L AT I O N
Q U E S T I O N - A N S W E R I N G
S Y S T E M S
G R A M M A R C H E C K I N G
L A N G U A G E L E A R N I N G A P P S
R E C O M M E N D E R S Y S T E M S V I D E O S U M M A R I S AT I O N
• Overview of The Irish Language
• NLP with few resources
• Addressing the Lack of Irish Data
• The Future?
Why is NLP
a hard task?
One word/sentence may have many meanings
7
Many ways of saying the same thing
Meaning depends on context
Literal and figurative language (metaphor)
Language and culture
(different ways of conceptualising the same thing)
8
Ambiguous Headlines
Syntactic Ambiguity
EYE DROPS OFF SHELF
SQUAD HELPS DOG BITE VICTIM
ENRAGED COW INJURES FARMER WITH AXE
STOLEN PAINTING FOUND BY TREE
PANDA MATING FAILS; VETERINARIAN TAKES OVER
SAFETY EXPERTS SAY SCHOOL BUS PASSENGERS SHOULD BE
BELTED
POLICE BEGIN CAMPAIGN TO RUN DOWN JAYWALKERS
Semantic Ambiguity
Source: http://www.alta.asn.au/events/altss_w2003_proc/altss/courses/somers/headlines.htm
What does a machine know about language?
Sentence = a string/sequence of characters:
“The man saw the boy with the telescope”
What does a machine know about language?
SYNTACTIC PARSING 101
Who is doing what? Who has the telescope?
Part of Speech Tagging
“The man saw the boy with the telescope”
DET NOUN VERB DET NOUN PREP DET NOUN
“Traditional” Parsing
S  NP VP
S  NP VP PP
NP  Noun | Pronoun
VP  Verb NP | Verb PP
PP  Preposition Noun
Noun  ‘ice-cream’ | ‘summer’
Pronoun  `I’
Verb  `like’
Preposition  ‘in’
STATISTICAL PARSING
TEXT TEXT TEXT TEXT
Machine Learning in NLP
(data driven approaches)
STRUCTURED
DATA
LABELLED
DATA
RELIABLE
DATA
Machine Learning – data sparsity
Data Envy
Irish Data Sparsity
FUNDING
NUMBER OF
SPEAKERS
MORPHOLOGYSKILL
SHORTAGE
• Overview of The Irish Language
• NLP with few resources
• Addressing the Lack of Irish Data
• The Future?
Addressing the lack
of data
BOOT-
STRAPPING
TRAIN
MORE
EXPERTS
CROSS-
LINGUAL
TRANSFER
SYNTHETIC
DATA
CROSS-LINGUAL TRANSFER
UNIVERSAL
DEPENDENCIES
MULTI-WORD EXPRESSIONS
• Using data from one language to help build a system for another
BOOTSTRAPPING
PASSIVE LEARNING ACTIVE LEARNING
• Using limited data to train a sub-standard system to help further
annotations (human correction rather than annotate from scratch)
SYNTHETIC DATA
e.g. Back Translation for
Machine Translation
On that MT note…..
Tapadóir SMT system (BLEU 46)
SMT vs NMT (NMT BLEU 40)
Domain-tuning, linguistic features (hybrid)
Increasing data collection (European Language Resource Coordination)
• Overview of The Irish Language
• NLP with few resources
• Addressing the Lack of Irish Data
• The Future?
Linguistic
Resources
Corpora
Knowledge
Bases
NLP Tools NLG Tools
Speech
Models
Speech
Synthesis
Speech
Recognition
Spoken
Dialogue
Systems
Machine
Translation
Information
Retrieval
State and
Public Use
CALL
Disability and
Access
Synergies
(Industry and
Public)
Digital Strategy for the Irish Language 2019
TRAINING MORE EXPERTS
Machine Translation
Irish Twitter Analysis
Processing Irish Multiword Expressions
Irish Syntactic Parsing
Go Raibh Maith Agaibh
#GRMA
teresa.lynn@adaptcentre.ie
@cigilt

More Related Content

Similar to Dublin Machine Learning Meetup 2019

2013 Bilingual/ESL Conference Program
2013 Bilingual/ESL Conference Program2013 Bilingual/ESL Conference Program
2013 Bilingual/ESL Conference Program
Region4ESC
 
Pronunciation
PronunciationPronunciation
Pronunciation
hangha
 
Pronunciation
PronunciationPronunciation
Pronunciation
hangha
 
Pronunciation
PronunciationPronunciation
Pronunciation
hangha
 
PowerPoint Presentation Polish Immigrants in Ireland Accent and the Dynamics ...
PowerPoint Presentation Polish Immigrants in Ireland Accent and the Dynamics ...PowerPoint Presentation Polish Immigrants in Ireland Accent and the Dynamics ...
PowerPoint Presentation Polish Immigrants in Ireland Accent and the Dynamics ...
Ewa Malczuk
 
Capstone - Phase 5 - PowerPoint
Capstone - Phase 5 - PowerPointCapstone - Phase 5 - PowerPoint
Capstone - Phase 5 - PowerPoint
Catherine Mullen
 
An exploratory corpus study of the AP Spanish
An exploratory corpus study of the AP SpanishAn exploratory corpus study of the AP Spanish
An exploratory corpus study of the AP Spanish
Steven Saffels
 
Horn, meagan project proposal
Horn, meagan project proposalHorn, meagan project proposal
Horn, meagan project proposal
Meag Horn
 

Similar to Dublin Machine Learning Meetup 2019 (20)

Role of infant-directed speech in language development of hearing impaired in...
Role of infant-directed speech in language development of hearing impaired in...Role of infant-directed speech in language development of hearing impaired in...
Role of infant-directed speech in language development of hearing impaired in...
 
2013 Bilingual/ESL Conference Program
2013 Bilingual/ESL Conference Program2013 Bilingual/ESL Conference Program
2013 Bilingual/ESL Conference Program
 
Natural Language Processing for Irish
Natural Language Processing for IrishNatural Language Processing for Irish
Natural Language Processing for Irish
 
Ethical Considerations for Culturally and Linguistically Diverse Populations ...
Ethical Considerations for Culturally and Linguistically Diverse Populations ...Ethical Considerations for Culturally and Linguistically Diverse Populations ...
Ethical Considerations for Culturally and Linguistically Diverse Populations ...
 
The Zobin Method
The Zobin MethodThe Zobin Method
The Zobin Method
 
The Zobin Method
The Zobin MethodThe Zobin Method
The Zobin Method
 
Pronunciation
PronunciationPronunciation
Pronunciation
 
Pronunciation
PronunciationPronunciation
Pronunciation
 
Pronunciation
PronunciationPronunciation
Pronunciation
 
Listening comprehension development in EFL Learners
Listening comprehension development in EFL LearnersListening comprehension development in EFL Learners
Listening comprehension development in EFL Learners
 
Reading difficulties
Reading difficultiesReading difficulties
Reading difficulties
 
Final project dyslexia -gomez 5-1-11
Final project dyslexia -gomez 5-1-11Final project dyslexia -gomez 5-1-11
Final project dyslexia -gomez 5-1-11
 
Age Awareness
Age AwarenessAge Awareness
Age Awareness
 
PowerPoint Presentation Polish Immigrants in Ireland Accent and the Dynamics ...
PowerPoint Presentation Polish Immigrants in Ireland Accent and the Dynamics ...PowerPoint Presentation Polish Immigrants in Ireland Accent and the Dynamics ...
PowerPoint Presentation Polish Immigrants in Ireland Accent and the Dynamics ...
 
Capstone - Phase 5 - PowerPoint
Capstone - Phase 5 - PowerPointCapstone - Phase 5 - PowerPoint
Capstone - Phase 5 - PowerPoint
 
An exploratory corpus study of the AP Spanish
An exploratory corpus study of the AP SpanishAn exploratory corpus study of the AP Spanish
An exploratory corpus study of the AP Spanish
 
Horn, meagan project proposal
Horn, meagan project proposalHorn, meagan project proposal
Horn, meagan project proposal
 
Jep 4
Jep 4Jep 4
Jep 4
 
Unspoken Language
Unspoken LanguageUnspoken Language
Unspoken Language
 
English As A Second Language-ALGERIA
English As A Second Language-ALGERIAEnglish As A Second Language-ALGERIA
English As A Second Language-ALGERIA
 

Recently uploaded

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
Earley Information Science
 

Recently uploaded (20)

Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 

Dublin Machine Learning Meetup 2019

  • 1. NLP for low-resourced languages Teresa Lynn, PhD Research Fellow ADAPT Centre Dublin City University The ADAPT Centre is funded under the SFI Research Centres Programme (Grant 13/RC/2106) and is co-funded under the European Regional Development Fund.
  • 2. AI Challenges for Low-resourced Languages • Overview of The Irish Language • NLP with few resources • Addressing the Lack of Irish Data • The Future?
  • 3. Irish language - status Ce ns us (2 0 1 6 ): Pop. 4 ,7 6 1 ,865 Ability to s pe a k : 1 ,7 6 1 ,4 20 Da ily us a ge : 7 3 ,8 0 3 First Officia l La ngua ge N ational La ngua ge
  • 4. EU Language status Officia l E U La ngua ge Minor ity La ngua ge (low -re s ource d) De rogation on officia l tra ns lations (until 2 0 2 1 )
  • 5. Morphology/ Inflection LENITION sa cheantar ‘in the area’ airgead a thuillfeadh sé ‘money he would earn’ a dheartháir ‘his brother’ ECLIPSIS Tír na nÓg ‘Land of the Youth’ i mBéarla ‘in English’ ar an mbord ‘on the table’ VOWEL HARMONY Caithim `I spend’ Casaim `I turn’ Rithfinn `I would run’ D’íosfainn `I would eat’
  • 6. Inflected Prepositions 7 le – with liom `with me’ leat `with you’ ag – at agam `at me’ agat `at you’ faoi – about/under fúm ‘about/under me’ fút ‘about/under you’ ó – from uaim `from me’ uait `from you’ do – to dom to me’ duit `to you’ ar – on orm ‘on me’ ort ‘on you’
  • 7. Word Order V OS English: `I saw the boy’ Irish: Chonaic mé an buachaill Gloss: Saw I the boy
  • 8. Irish language technology 3 1 E U la ngua ges La ngua ge re s ources a nd te chnologies ME TA -N ET white pa pe r s e r ie s (Judge et a l., 2 0 1 2 ) E U-le d s ur vey
  • 9. www.adaptcentre.ie MT 9 English good French, Spanish moderate fragmentary Catalan, Dutch, German, Hungarian, Italian, Polish, Romanian weak or no support Basque, Bulgarian, Croatian, Czech, Danish, Estonian, Finnish, Galician, Greek, Icelandic, Irish, Latvian, Lithuanian, Maltese, Norwegian, Portuguese, Serbian, Slovak, Slovene, Swedish, Welsh excellent Czech, Dutch, Finnish, French, German, Italian, Portuguese, Spanish moderate fragmentary Basque, Bulgarian, Catalan, Danish, Estonian, Galician, Greek, Hungarian, Irish, Norwegian, Polish, Serbian, Slovak, Slovene, Swedish weak or no support Croatian, Icelandic, Latvian, Lithuanian, Maltese, Romanian, Welsh excellent English good Speech English good Dutch, French, German, Italian, Spanish moderate fragmentary Basque, Bulgarian, Catalan, Czech, Danish, Finnish, Galician, Greek, Hungarian, Norwegian, Polish, Portuguese, Romanian, Slovak, Slovene, Swedish weak or no supportexcellent English good Czech, Dutch, French, German, Hungarian, Italian, Polish, Spanish, Swedish moderate fragmentary Basque, Bulgarian, Catalan, Croatian, Danish, Estonian, Finnish, Galician, Greek, Norwegian, Portuguese, Romanian, Serbian, Slovak, Slovene weak or no supportexcellent Resources Text Analysis Croatian, Estonian, Icelandic, Irish, Latvian, Lithuanian, Maltese, Serbian, Welsh Icelandic, Irish, Latvian, Lithuanian, Maltese, Welsh
  • 10. Risk of Digital Extinction “Printing Press resulted in the extinction of many minority and regional languages” Will technology have the same impact on Irish?
  • 11. Risk of Digital Extinction Need to ensure continuing language usage through technology o Edutainment packages/ CALL o Multi-platform Word processing tools o Automated translation o Search engines o Games o Social media/ Online data mining o Text Generation (e.g. weather reports) o Automatic subtitling o …
  • 12. Why do we need NLP? T E X T S U M M A R I S AT I O N 7 S E N T I M E N T A N A LY S I S I N F O R M AT I O N R E T R I E V A L T E X T M I N I N G M A C H I N E T R A N S L AT I O N Q U E S T I O N - A N S W E R I N G S Y S T E M S G R A M M A R C H E C K I N G L A N G U A G E L E A R N I N G A P P S R E C O M M E N D E R S Y S T E M S V I D E O S U M M A R I S AT I O N
  • 13. • Overview of The Irish Language • NLP with few resources • Addressing the Lack of Irish Data • The Future?
  • 14. Why is NLP a hard task? One word/sentence may have many meanings 7 Many ways of saying the same thing Meaning depends on context Literal and figurative language (metaphor) Language and culture (different ways of conceptualising the same thing)
  • 15. 8 Ambiguous Headlines Syntactic Ambiguity EYE DROPS OFF SHELF SQUAD HELPS DOG BITE VICTIM ENRAGED COW INJURES FARMER WITH AXE STOLEN PAINTING FOUND BY TREE PANDA MATING FAILS; VETERINARIAN TAKES OVER SAFETY EXPERTS SAY SCHOOL BUS PASSENGERS SHOULD BE BELTED POLICE BEGIN CAMPAIGN TO RUN DOWN JAYWALKERS Semantic Ambiguity Source: http://www.alta.asn.au/events/altss_w2003_proc/altss/courses/somers/headlines.htm
  • 16. What does a machine know about language?
  • 17. Sentence = a string/sequence of characters: “The man saw the boy with the telescope” What does a machine know about language?
  • 18. SYNTACTIC PARSING 101 Who is doing what? Who has the telescope? Part of Speech Tagging “The man saw the boy with the telescope” DET NOUN VERB DET NOUN PREP DET NOUN
  • 19. “Traditional” Parsing S  NP VP S  NP VP PP NP  Noun | Pronoun VP  Verb NP | Verb PP PP  Preposition Noun Noun  ‘ice-cream’ | ‘summer’ Pronoun  `I’ Verb  `like’ Preposition  ‘in’
  • 21. Machine Learning in NLP (data driven approaches) STRUCTURED DATA LABELLED DATA RELIABLE DATA
  • 22. Machine Learning – data sparsity
  • 24.
  • 25. Irish Data Sparsity FUNDING NUMBER OF SPEAKERS MORPHOLOGYSKILL SHORTAGE
  • 26. • Overview of The Irish Language • NLP with few resources • Addressing the Lack of Irish Data • The Future?
  • 27. Addressing the lack of data BOOT- STRAPPING TRAIN MORE EXPERTS CROSS- LINGUAL TRANSFER SYNTHETIC DATA
  • 28. CROSS-LINGUAL TRANSFER UNIVERSAL DEPENDENCIES MULTI-WORD EXPRESSIONS • Using data from one language to help build a system for another
  • 29. BOOTSTRAPPING PASSIVE LEARNING ACTIVE LEARNING • Using limited data to train a sub-standard system to help further annotations (human correction rather than annotate from scratch)
  • 30. SYNTHETIC DATA e.g. Back Translation for Machine Translation
  • 31. On that MT note….. Tapadóir SMT system (BLEU 46) SMT vs NMT (NMT BLEU 40) Domain-tuning, linguistic features (hybrid) Increasing data collection (European Language Resource Coordination)
  • 32. • Overview of The Irish Language • NLP with few resources • Addressing the Lack of Irish Data • The Future?
  • 33. Linguistic Resources Corpora Knowledge Bases NLP Tools NLG Tools Speech Models Speech Synthesis Speech Recognition Spoken Dialogue Systems Machine Translation Information Retrieval State and Public Use CALL Disability and Access Synergies (Industry and Public) Digital Strategy for the Irish Language 2019
  • 34. TRAINING MORE EXPERTS Machine Translation Irish Twitter Analysis Processing Irish Multiword Expressions Irish Syntactic Parsing
  • 35. Go Raibh Maith Agaibh #GRMA teresa.lynn@adaptcentre.ie @cigilt