1. Presented by:
Martin Benjamin (martin@kamusi.org)
Director of ANLoc Locales and Terminology Subprojects
Funded by IDRC, Acacia, project number 104475
Managed in IDRC regional office for Middle East and North Africa, Cairo
2. Achievements and Lessons Learned by
the African Network for Localization
Languages and information technology in
Africa: the challenges for localization
Addressing the challenges: ANLoc and its
subprojects
Lessons learned: two case studies
The long view: the outlook for IT in African
languages and African societies going forward
4. National University of Rwanda, July 2008
Rwanda, July 2008
Dar es Salaam, Tanzania, July 2008
6. African language facts:
• As many as 2000 languages spoken in Africa by
1,000,000,000 people
• Over 200 languages are spoken by more than 500,000
people each
• At least 15 languages are spoken by more than 10,000,000
people each:
Amharic, Arabic, Berber, Chewa, Fula, Hausa, Igbo,
Kinyarwanda, Malagasy, Oromo, Shona, Somali, Swahili, Yoruba,
Zulu
• Primary education in Africa is often in local, regional, or
national languages
• IT in Africa is mostly available in English, French, or
Portuguese
7. Enter ANLoc
L10n = Localization
Adapting ICT so that it can
be used by:
• Non-specialists
• Non-elites
• Non-speakers of global
languages
• Students
• Anyone
In other words, most of
Africa’s 1 billion people
IT+46 training workshop,
Kampala, November 2006
8. Addressing the challenges: ANLoc and
its subprojects
Enabling L10n
• Locales
• Fonts
• Keyboards
Activating L10n
• Terminologies
• Translation tools
• Spellcheckers
• Localizing software
• Training localizers
Sustaining L10n
• Language and ICT policy
• Network development
Software Freedom Day,
Accra, September 2008
9. Enabling L10n
• Locales
• Fonts
• Keyboards
These are things that must exist for a language
before any software localization can occur
Translate Firefox event,
Kampala, August 2008
10. Locales
The basic information sets needed to
configure computers for a language
• What character sets to use
• How dates and numbers appear
• Direction in which text is written
• Names of days, weeks, months
• Currency symbols, measurement systems
• Other background information that computers
need for a language
In French: “paramètres régionaux”
Makes it easy to write and share
documents in a language
Makes it possible to develop software,
websites, mobile phones, ATMs, etc, for a
language
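To make the list above concrete, here is a minimal sketch of the kind of information a locale bundles together. The `LocaleData` class, its field names, and the sample Swahili values are assumptions made for illustration; they are not ANLoc or CLDR data structures, and real locale data is far more detailed.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class LocaleData:
    """Hypothetical container for a few of the facts a locale records."""
    code: str             # language code, e.g. "sw" for Swahili
    direction: str        # "ltr" or "rtl"
    day_names: tuple      # Monday first
    month_names: tuple    # January first
    decimal_sep: str
    group_sep: str
    currency_symbol: str


# Illustrative values for Swahili as used in Tanzania (assumed for this sketch).
SWAHILI = LocaleData(
    code="sw",
    direction="ltr",
    day_names=("Jumatatu", "Jumanne", "Jumatano", "Alhamisi",
               "Ijumaa", "Jumamosi", "Jumapili"),
    month_names=("Januari", "Februari", "Machi", "Aprili", "Mei", "Juni",
                 "Julai", "Agosti", "Septemba", "Oktoba", "Novemba", "Desemba"),
    decimal_sep=".",
    group_sep=",",
    currency_symbol="TSh",
)


def format_date(d: date, loc: LocaleData) -> str:
    """Write a date as 'weekday, day month year' using the locale's names."""
    weekday = loc.day_names[d.weekday()]      # date.weekday(): Monday == 0
    month = loc.month_names[d.month - 1]
    return f"{weekday}, {d.day} {month} {d.year}"


def format_money(amount: float, loc: LocaleData) -> str:
    """Format an amount with the locale's separators and currency symbol."""
    whole, frac = f"{amount:,.2f}".split(".")
    whole = whole.replace(",", loc.group_sep)
    return f"{loc.currency_symbol} {whole}{loc.decimal_sep}{frac}"
```

Software that reads its day names, separators, and symbols from a record like this, rather than hard-coding English conventions, can be pointed at a new language by supplying a new record.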
11. Fonts
Many African languages have
letters that do not exist in the
standard European character set
ANLoc is creating Free and
Open Source fonts that contain all
the characters, as encoded in the
Unicode standard, needed for
numerous African languages
Availability of a font with all the
necessary characters is essential
for using IT in a language
Fonts are integrated with ANLoc
Keyboards and Translation Tools
Documentation and
dissemination need more attention
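The gap described above can be checked with a short sketch. It assumes that "the standard European character set" means Latin-1 (ISO 8859-1), which is an interpretation rather than something the slide states, and the sample letters are chosen here for illustration. The function name `missing_from_latin1` is hypothetical.

```python
import unicodedata


def missing_from_latin1(text: str) -> list:
    """Return the characters in `text` that Latin-1 (ISO 8859-1) cannot encode."""
    missing = []
    for ch in text:
        try:
            ch.encode("latin-1")
        except UnicodeEncodeError:
            if ch not in missing:
                missing.append(ch)
    return missing


# Sample letters (chosen for this sketch) from Hausa and Yoruba orthographies,
# written as escapes so the code points are unambiguous.
samples = {
    "Hausa": "\u0253\u0257\u0199",    # hooked b, hooked d, hooked k
    "Yoruba": "\u1eb9\u1ecd\u1e63",   # e, o, and s with dot below
}

for language, letters in samples.items():
    for ch in missing_from_latin1(letters):
        # unicodedata.name returns the official Unicode character name.
        print(language, f"U+{ord(ch):04X}", unicodedata.name(ch))
```

Every sample letter is reported as missing, so a font limited to that character set cannot display text in these languages.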
12. Keyboards
Mapping the characters
of a language’s alphabet to
the keys on a QWERTY or
AZERTY keyboard
Completely integrated
with the output of the Fonts
subproject for each specific
language
12 keyboards available
in most recent Windows and
Mac builds
30 keyboards available
for Linux: http://is.gd/CjGi
Documentation and
dissemination need more
attention
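As a rough illustration of what "mapping characters to keys" means, the sketch below models a layout in which holding a modifier key turns a base key into an extra letter. The layout itself (AltGr plus b, d, or k) is invented for this example and is not one of the ANLoc keyboards.

```python
# Hypothetical layout: AltGr plus a base key yields an extra letter.
ALTGR_LAYER = {
    "b": "\u0253",  # hooked b
    "d": "\u0257",  # hooked d
    "k": "\u0199",  # hooked k
}


def type_keys(keystrokes) -> str:
    """Turn (key, altgr_held) pairs into text using the layout above."""
    out = []
    for key, altgr_held in keystrokes:
        if altgr_held:
            # Keys with no AltGr mapping fall back to the plain key.
            out.append(ALTGR_LAYER.get(key, key))
        else:
            out.append(key)
    return "".join(out)


# AltGr+d, then a, k, i typed normally.
word = type_keys([("d", True), ("a", False), ("k", False), ("i", False)])
```

Real keyboard layouts are defined in operating-system-specific formats; the point here is only that each layout is a table from key combinations to characters that the font must then be able to display.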
13. Activating L10n
• Terminologies
• Translation tools
• Spellcheckers
• Localizing Software
• Training Localizers
These are the building blocks to ensure the
viability of L10n for a language
tzLUG, Dar es Salaam,
December 2008
15. Terminologies
2500 IT terms selected from
more than 1100 translation files
for Free and Open Source
Software
Definitions for each term in
English
Glossmaster software for
rapid glossary development by
project partners
Producing terms + definitions
in 14 African languages
Working with Translation
Bureau (Public Works and
Government Services Canada)
to add a French component
Direct export to Virtaal
translation tool of the Tools
subproject
Free online dissemination
through PALDO (kamusi.org)
16. Translation Tools
Provide good tools to a wide range of
users, including:
• Less skilled people
• People who cannot translate from English
• People with less commonly supported needs,
such as custom fonts, ISO 639-3 codes,
complex writing systems, right-to-left writing
Help beginners do the right thing and
work productively right away
Integrate with existing resources such
as glossaries and translation memories
Main tools being developed
• Pootle – translation management, online
translation
• Virtaal – powerful desktop (offline)
translation tool
• Translate Toolkit – underlying technology for
other tools, with numerous tools for L10n
engineering, planning, QA, etc.
Products already in use for OpenOffice,
Mozilla, Creative Commons, OLPC, Opera,
and many others
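The idea of integrating with a translation memory can be sketched with Python's standard library alone. This is not how Pootle, Virtaal, or the Translate Toolkit implement matching; it is a stand-in using `difflib`, and the tiny English-to-Swahili memory and the `suggest` function are assumptions made for illustration.

```python
import difflib

# Hypothetical translation memory: English source string -> Swahili translation.
MEMORY = {
    "File": "Faili",
    "Open file": "Fungua faili",
    "Save file": "Hifadhi faili",
}


def suggest(source: str, memory: dict, cutoff: float = 0.6) -> list:
    """Return (stored source, stored translation) pairs similar to `source`.

    Results are ordered from most to least similar; strings whose similarity
    ratio falls below `cutoff` are left out.
    """
    matches = difflib.get_close_matches(source, list(memory), n=3, cutoff=cutoff)
    return [(m, memory[m]) for m in matches]


# A new string that is close to, but not identical with, a stored one.
suggestions = suggest("Open files", MEMORY)
```

The translator sees the earlier translation of "Open file" as the top suggestion and only needs to adjust it, which is the kind of help that lets beginners work productively right away.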
18. Spellcheckers
Create tools to simplify technical
development
• CorpusCatcher – collects texts from the
web
• Spelt – word classification with a focus
on productivity
Create three spellcheckers for
languages of partners in the network:
• Gikuyu – Bantu, East (Kenya),
agglutinative morphology
• Zulu – Bantu, South (South Africa),
agglutinative morphology
• Yoruba – non-Bantu, West (Nigeria), rich
tonal system
Spellcheckers are created for
Hunspell for easy integration with office
and internet tools (OpenOffice, Firefox,
Thunderbird, others)
Build expertise for more work in this
area going forward
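The pipeline from collected text to a spellchecker word list can be sketched as follows. This does not use CorpusCatcher or Spelt; it is a simplified stand-in that counts word frequencies and writes the result in the basic Hunspell `.dic` layout (a word count on the first line, then one word per line). The function name `build_dic`, the frequency threshold, and the sample text are assumptions, and a usable dictionary would also need an `.aff` file and human review of each word.

```python
import re
from collections import Counter


def build_dic(corpus: str, min_count: int = 2) -> str:
    """Build the text of a minimal Hunspell-style .dic word list.

    Words seen fewer than `min_count` times are left out, on the assumption
    that rare forms are more likely to be typos and need manual review.
    """
    # [^\W\d_]+ matches runs of letters, including non-ASCII letters.
    # Letters written with separate combining marks would need Unicode
    # normalization first; that step is omitted in this sketch.
    words = re.findall(r"[^\W\d_]+", corpus.lower())
    counts = Counter(words)
    accepted = sorted(w for w, c in counts.items() if c >= min_count)
    return "\n".join([str(len(accepted)), *accepted]) + "\n"


# Arbitrary sample text standing in for a corpus collected from the web.
sample_corpus = "Habari habari yako nzuri nzuri sana"
dic_text = build_dic(sample_corpus)
```

For agglutinative languages such as Gikuyu and Zulu, a plain word list like this grows very large, which is one reason Hunspell's affix rules matter for the languages named above.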
19. Localizing Software
Starting with Firefox, a key
software application that is free,
open source, extremely useful,
and widely used
Focus on languages for
which glossaries are being
developed in the Terminology
subproject
Creating L10n communities
with pools of expertise that can
continue with more projects
For many languages, this
will be the first demonstration
that L10n is viable
20. Training Localizers
Arabize training workshop,
Cairo, July 2008
Create training course
modules with The Institute of
Localisation Professionals
(TILP, Ireland) to cover local
L10n needs
Establish local pools of
skilled L10n professionals
Open source sprint will
create material aimed
directly at volunteer
localizers
Work toward a
certification system for L10n
professionals
21. Sustaining L10n
• Language and ICT policy
(taken individually and together)
• Network development
These are the foundations
to ensure ongoing pursuit
of L10n for speakers of
African languages
22. Language and ICT Policy
Review current state of
language policy around Africa
Review where language
fits into ICT policy
Provide resources for
policy planners to understand
language and ICT issues
Engage policy planners
and decision makers in
support for expanded access
to ICT through L10n
International Mother Language Day
Paris, February 2009
23. Network Development
Website with capacity
for contributions by all
network members:
http://africanlocalization.net
Active discussion list
for partner communication
Annual network
meetings for major partner
organizations
Recruitment of new
partners through website,
subprojects, and outreach
24. Lessons learned: two case studies
Locales
• Time and effort required to
recruit participants through
networks
• Volunteer model for data
contributions
• Upstreaming data: finding a
thirst for project outputs
Terminologies
• Time and effort required for
software development
• Payment model for
significant data contributions
• Technology obstacles for
African partners
• Managing the scope:
finding a hunger for joining in
25. Lessons learned: Locales
Time and effort required to
recruit participants
through networks
Ambitious goal of 100 languages
Need to find people with the
necessary combination of computer
skills, network access, and language
knowledge
For languages in the long tail,
that means we need to identify and
recruit from among about ½ million
total speakers
Even some languages with more
than 10,000,000 speakers have not
produced a single volunteer
26. Lessons learned: Locales
Time and effort required to
recruit participants
through networks
Broadcasting through existing
networks (mailing lists, newsgroups)
Exploring new social networking
opportunities (Facebook, Twitter)
Using the personal networks of
ANLoc members
27. Lessons learned: Locales
Volunteer model for data
contributions
Amount of work is only 2 to 3 hours per language
Small payments to 100 people in 50 countries would be a logistical
nightmare (even if we had a budget to cover it)
New recruitment campaigns have addressed this question head on:
“And to answer the most common question in advance, yes, volunteer
means for free - for your language, for your country, but not for money.”
28. Lessons learned: Locales
Upstreaming data: finding a
thirst for project outputs
CLDR (Common Locale Data Repository)
Google, IBM, Wikimedia Foundation
29. Lessons learned: Terminologies
Time and effort required for
software development
Software must:
Be simple to use
Be fast
Be lightweight
Deal with numerous
linguistic complexities
Interlink numerous
languages
30. Lessons learned: Terminologies
Payment model for significant
data contributions
Project takes about 2 months of
professional labor per language
Payment for each language
occurs when all 2500 entries are
complete
Payment model ensures that
work gets done and that quality
control can be implemented
31. Lessons learned: Terminologies
Technology obstacles for African
partners
Power outages
Connectivity problems
Adequate equipment has not
been a problem for our partners
32. Lessons learned: Terminologies
Managing the scope: finding a
hunger for joining in
Project provides a
consistent, carefully chosen set
of L10n terminology that can
be used for any language
English glossary with clear
definitions is a resource that is
not available to localizers
elsewhere on the web
Project cut from 24
languages to 12 to fit within
budget constraints
Additional language
groups are seeking to join on a
volunteer basis
33. The long view: the outlook for IT in
African languages and African societies
going forward
Use of ANLoc outputs
by consumers
Continued L10n
through people and tools
enabled by ANLoc
Strong and growing
network of African IT and
language professionals
Increased industry
L10n activity
Establishing the
expectation that IT will be
available in African
languages: making
localization the new normal
Isimikinyi, Tanzania, June 2005
Isimikinyi, Tanzania, July 2008
Editor's Notes
4 languages of Rwanda: English, French, Swahili, Kinyarwanda
Story about getting a business card made in Kigali. Software in English and French, conversation in Kinyarwanda and a bit of Swahili, cards took about an hour to design because the designer couldn’t read all the menus. National University of Rwanda has a haphazard collection of computers that use English or French, depending on who donated or purchased the equipment.