World languages
• More than 7,000 languages
• 90% have fewer than 100,000 speakers
• 832 in Papua New Guinea
• 260 in Europe
• 46 with only one speaker
(UNESCO, Infolang, BBC Languages)
League position: where is Welsh?
Where does Welsh come in the league table of the 7,000 world languages, listed by ‘number of speakers’?
Position: 172
The challenge for
all Celtic and other languages
The technology can dictate which language
your family can speak at home.
So you’ve got to try to make the technology
understand and speak your language.
To make a Siri, Dot or Google Home, you
need:
1. Speech to text in your language (matched
audio/text, pronunciation dictionary, Kaldi or
similar)
2. Machine translation which is good in both
directions (neural networks helping)
3. Some kind of artificial intelligence to make
sense of the semantics (I’m simplifying)
4. Synthetic voice or text to speech
5. Buy-in from Google, Nuance, Apple,
Amazon..
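The first four ingredients above can be read as stages chained together; the fifth (buy-in) is not something code can supply. The sketch below is only an illustration of that wiring, not how Siri, Alexa or Google Home are actually built. The class name `VoiceAssistant` and every field name are hypothetical, and it assumes the machine translation step exists to pivot through a better-resourced language, which the slide does not state.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoiceAssistant:
    """Chains the stages from the list above; every stage is a placeholder."""

    speech_to_text: Callable[[bytes], str]  # 1. audio in your language -> text
    to_pivot: Callable[[str], str]          # 2. translation, your language -> pivot (assumed)
    understand: Callable[[str], str]        # 3. semantics: request text -> reply text
    from_pivot: Callable[[str], str]        # 2. translation, pivot -> your language (assumed)
    text_to_speech: Callable[[str], bytes]  # 4. reply text -> synthetic voice

    def respond(self, audio: bytes) -> bytes:
        request = self.speech_to_text(audio)
        reply = self.understand(self.to_pivot(request))
        return self.text_to_speech(self.from_pivot(reply))


# Toy stand-ins so the wiring can be exercised without any real models.
demo = VoiceAssistant(
    speech_to_text=lambda audio: audio.decode("utf-8"),
    to_pivot=str.upper,
    understand=lambda text: "RE:" + text,
    from_pivot=str.lower,
    text_to_speech=lambda text: text.encode("utf-8"),
)
```

Each stage is a separate callable, so a language community could swap in a better component (for example a Kaldi-based recogniser) without touching the rest; translation has to be good in both directions because it is used once on the way in and once on the way out.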
So, how many languages do the big guns really support?
• Twitter interface languages: 48
• Google search languages: 46
• Apple Siri languages: 20
• Amazon Alexa (Echo, Dot) languages: 2
Another challenge for
your language
Plan to edge your language up into the
world’s Top 50, so the big companies
introduce support for it
But, hang on, Welsh is down at #172 in the
league table!
Key: yellow rows have support from 2+ companies
Sorted by number of speakers first (Col B), then by number of Wikipedia articles in that language (Col D)
(http://wikistats.wmflabs.org/ 16 February 2017)
Position in the league table, ordered by number of speakers
• Catalan 91
• Galician 143
• Irish 159
• Scots Gaelic 168 (!)
• Welsh 172
• Basque 174 (!)
• Breton 206
(Wikimedia)
Number of Wikipedia articles (position in table)
• Catalan 541k (17)
• Basque 280k (31)
• Galician 139k (47)
• Welsh 91k (60)
• Breton 62k (74)
• Irish 40k (89)
• Scots Gaelic 14k (118)
• Manx 5k (165)
(http://wikistats.wmflabs.org/ 16 February 2017)
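Putting the two league tables side by side shows how far Wikipedia activity lifts each language above its position by speakers. The short sketch below does that arithmetic using only the positions quoted on the two slides above. The names `Language`, `places_gained` and `in_top_50` are made up for this illustration, and Manx is left out because no speaker position is given for it.

```python
from typing import NamedTuple


class Language(NamedTuple):
    name: str
    speaker_rank: int  # position when ordered by number of speakers
    article_rank: int  # position when ordered by number of Wikipedia articles


# Positions copied from the two slides above.
LANGUAGES = [
    Language("Catalan", 91, 17),
    Language("Galician", 143, 47),
    Language("Irish", 159, 89),
    Language("Scots Gaelic", 168, 118),
    Language("Welsh", 172, 60),
    Language("Basque", 174, 31),
    Language("Breton", 206, 74),
]


def places_gained(lang: Language) -> int:
    """How many places higher a language sits by articles than by speakers."""
    return lang.speaker_rank - lang.article_rank


# Biggest climbers first.
by_gain = sorted(LANGUAGES, key=places_gained, reverse=True)

# Languages already inside the Top 50 when measured by article count.
in_top_50 = [lang.name for lang in LANGUAGES if lang.article_rank <= 50]

for lang in by_gain:
    print(f"{lang.name}: {places_gained(lang):+d} places")
```

On these figures Welsh climbs 112 places (from 172 to 60), and Catalan, Galician and Basque already sit inside the Top 50 by article count.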
How we’ve started to tackle this for Welsh
• Open licensing of public sector data and content
• Robin Owain (Wikimedia UK) had already been automating with AutoWikiBrowser
• Advice from the Basque Country: Gorka Julio, Josu Waliño, Galder Gonzalez
• Galder Gonzalez: “The best way is determining what you want to create, and having a bot with bot permissions. Also pywikibot installed and running.”
• Grants for #wicipop, #wicimon and #wikiiechyd (pop, science and health, all with editathons as well as automation)
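As a rough idea of what "pywikibot installed and running" can look like, here is a minimal sketch that turns one row of openly licensed data into a stub article and saves it only if the page does not already exist. It is an assumption about the general approach, not the workflow the Welsh or Basque projects actually used. The helper names `render_stub` and `save_stub` are hypothetical, as are the template and data fields. The pywikibot calls (`pywikibot.Site`, `pywikibot.Page`, `page.exists()`, `page.text`, `page.save`) are part of that library, and running `save_stub` needs a configured bot account.

```python
from string import Template


def render_stub(template: str, row: dict[str, str]) -> str:
    """Fill a wikitext template with one row of openly licensed data."""
    # Template.substitute raises KeyError if the row lacks a field,
    # so incomplete records fail loudly instead of producing broken stubs.
    return Template(template).substitute(row)


def save_stub(title: str, text: str, summary: str, lang: str = "cy") -> None:
    """Create the page on the given Wikipedia if it does not exist yet."""
    import pywikibot  # imported here so render_stub works without it installed

    site = pywikibot.Site(lang, "wikipedia")
    page = pywikibot.Page(site, title)
    if page.exists():
        return  # never overwrite human-written content
    page.text = text
    page.save(summary=summary)
```

Keeping the text generation separate from the upload means editors can review the rendered stubs (and the wording of the template, which a fluent speaker should write) before anything is published.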
Apart from appealing to multinationals...
• Wikipedia gets people speaking & writing in their
own language
• Creating a valuable and important resource for
everyone
• Schools – Digital Competence Framework,
literacy, photography, Welsh Baccalaureate..
• Golygathonau (editathons) are fun. (Like the
Papur Bro folding sessions)
But beware...
• Quality of content and production experience is more important than quantity of articles
• Machine translation yields scale but needs to be used with awareness of cultural sensitivities
• Risk: Celtic languages have a low number of ‘views per hour’. How can we boost these?
Views per hour (position in table):
• Welsh 1,076 (67)
• Breton 765 (75)
• Irish 621 (80)
• Scots Gaelic 352 (111)
• Manx 231 (144)
• Cornish 172 (172)
The challenge for
Celtic and other languages
Technology is dictating which languages
our families can speak at home.
So we’ve got to make the technology
understand and speak our languages
To do this, we need to raise our languages’
profiles in the eyes of the big companies.
Wikipedia in our own language is an
important part of this.
Thanks
Gareth Morlais
@digitalst

How Wikipedia helps you get your language supported by major technology companies


Editor's Notes

  1. In Welsh, we have elementary command recognition (which we can build up to text to speech) and fundamental machine translation. We need better speech to text, and the natural language processing denoted by the ‘loading’ icon above needs lots of work, including: the selection and implementation of a Knowledge Graph with a suitable ‘learning’ Interest Graph; a dictionary of Welsh sentiment key terms and their emotional value; and expanded Sensory Integration capacity, especially around location (placenames, names of buildings, organizations, businesses, tourist attractions, etc.). We also need to map the expanded list of Named Entities being developed by HRU to ontological schema. Image by Jade Thomas-Rowlands.
  2. Views per hour (position): Catalan 17,389 (30); Basque 3,222 (48); Galician 2,294 (54); Welsh 1,076 (67); Breton 765 (75); Irish 621 (80)*; Scots Gaelic 352 (111)*; Manx 231 (144); Cornish 172 (172). (* = higher than its position by number of articles.)
  3. Greek, currently at #49, has 127k articles today. This number will grow over time.