Apertium is a free/open-source platform for rule-based machine translation that supports over 40 language pairs. It has a modular pipeline architecture and is collaboratively developed by hundreds of researchers and developers. Apertium can be used for translation between related languages as well as in applications like gisting. It also produces monolingual language resources during rule development.
Apertium: Free/Open-Source Rule-Based MT Platform for 40+ Language Pairs
1. Apertium: Free/open-source rule-based machine
translation and language processors
Mikel L. Forcada
Universitat d'Alacant, E-03690 Sant Vicent del Raspeig, Spain
Riga TAUS Roundtable, June 1, 2016
2. What is Apertium?
Apertium (since 2005) is a free/open-source platform for shallow-transfer rule-based machine translation which is collaboratively developed and provides:
A configurable, language-independent machine translation engine,
Data (dictionaries, rules) for more than 40 language pairs (in XML and text-based formats), and
Lots of tools for developers and users.
3. What is Apertium?
Pipeline architecture
A pipelined architecture allows for easy customization and diagnostics.
[Figure: the Apertium pipeline. SL text → deformatter → morph. analyser → morph. disambiguator → lexical transfer → lexical selection → structural transfer → morph. generator → post-generator → reformatter → TL text]
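The idea of a pipeline of independent stages can be illustrated with a minimal Python sketch. It is not Apertium's engine or data: the stage functions, the two toy dictionaries and the plural rule are all invented for illustration, and only three of the stages in the figure are mimicked. It tries to show why such a design is easy to customize and diagnose: any stage can be swapped, and the output of each stage can be inspected.

```python
from functools import reduce

# Toy data invented for this sketch; not taken from any Apertium language pair.
ANALYSES = {"gatos": ("gato", "n", "pl"), "gato": ("gato", "n", "sg")}
BILINGUAL = {"gato": "cat"}


def analyse(tokens):
    """Morphological analysis: surface form -> (lemma, part of speech, number)."""
    return [ANALYSES.get(t, (t, "unknown", "")) for t in tokens]


def lexical_transfer(units):
    """Replace each source lemma with its target-language equivalent, if known."""
    return [(BILINGUAL.get(lemma, lemma), pos, num) for lemma, pos, num in units]


def generate(units):
    """Morphological generation: inflect the target lemma (toy plural rule)."""
    return [lemma + "s" if num == "pl" else lemma for lemma, pos, num in units]


# The pipeline is just an ordered list of stages; each consumes the previous output.
PIPELINE = [str.split, analyse, lexical_transfer, generate, " ".join]


def translate(text, stages=PIPELINE, trace=False):
    """Run text through every stage; with trace=True, print each intermediate result."""
    def step(value, stage):
        result = stage(value)
        if trace:
            print(f"{getattr(stage, '__name__', stage)}: {result}")
        return result
    return reduce(step, stages, text)
```

With these toy dictionaries, `translate("gatos gato")` gives `"cats cat"`. Passing `trace=True` prints the output of every stage, and passing a different `stages` list replaces or removes a stage without touching the others.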
4. What is Apertium?
Languages and language pairs
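[Figure: graph of the languages covered, with language pairs as edges. Language codes shown: afr, nld, arg, cat, ita, bre, fra, spa, cym, eng, glg, dan, nno, nob, ast, por, ron, epo, eus, hbs, mkd, slv, bul, ind, zsm, isl, swe, kaz, tat, mlt, ara, oci, sme, urd, hin]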
5. What is Apertium?
Apertium loves small languages
Some unique MT systems for small languages:
Breton→French
Occitan↔Catalan
Occitan↔Spanish
Aragonese↔Spanish
Aragonese↔Catalan
North Sámi→Norwegian
To love is to give: e.g. provide small languages with
language resources, and
computational-linguistic descriptions of their language.
6. What is Apertium good for?
Apertium works best for translation between related languages. Some
examples in Apertium:
Spanish ↔ Portuguese
Norwegian Nynorsk ↔ Norwegian Bokmål
Slovenian ↔ Croatian
Tatar ↔ Kazakh
Postediting Apertium output in these cases may save time compared to
translation from scratch.
It is also being used for less-related language pairs in gisting applications.
7. Apertium is collaboratively developed
Apertium licensing: free/open-source
Apertium language data and code are both licensed under the GNU
General Public License:
a free/open-source license allowing free distribution of unmodified and
modified versions
a copylefted license: it avoids private appropriation and encourages
giving improvements back to the project (a commons) → community
8. Apertium is collaboratively developed
Very active group of hundreds of developers (freelance developers,
researchers, industrial partners).
Wiki documentation (wiki.apertium.org) in addition to formal
documents.
Help available on the IRC channel #apertium at freenode.net
Mailing lists: apertium-stuff@lists.sf.net and other
language-specific lists
9. Apertium is collaboratively developed
Research and business with Apertium
Apertium is already an active research and business platform:
Research: 40+ publications, 2 PhD theses, 4 master's theses
Business: companies (Prompsit, Eleka, Imaxin Software, etc.)
offering services to customers such as Autodesk, the Government of
Catalonia, one of the main Basque banks, the daily newspaper La Voz
de Galicia, etc.
The free/open-source model creates a community which effectively
connects researchers, developers, vendors and users.
10. Becoming an Apertium user
Professional translators can:
use Apertium offline plugins in the OmegaT free/open-source CAT
environment
(as with any other system) easily align source and MT to generate
machine translation memories to feed into other CAT systems
Muggles can use:
a stand-alone Java application for the desktop: apertium-caffeine
an Android version for handhelds
a stand-alone version (Apertium Simpleton) for Windows and MacOS
a plug-in for the OmegaT CAT platform: apertium-omegat
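As an illustration of turning aligned source and MT segments into a translation memory, here is a minimal Python sketch. It assumes the target CAT system imports TMX 1.4, which the slides do not specify. The function name `build_tmx`, the header values and the example segment pair are hypothetical, and the segments are assumed to be aligned one-to-one already.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this namespaced attribute as xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tmx(pairs, src_lang, tgt_lang):
    """Build a TMX document (as a string) from (source, MT output) segment pairs."""
    tmx = ET.Element("tmx", version="1.4")
    # Header attribute values below are placeholders chosen for this sketch.
    ET.SubElement(tmx, "header", {
        "creationtool": "example-script",
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "plain",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source, target in pairs:
        tu = ET.SubElement(body, "tu")  # one translation unit per aligned pair
        for lang, text in ((src_lang, source), (tgt_lang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")


# Hypothetical example: one Spanish segment and an assumed MT output in Portuguese.
tmx_text = build_tmx([("Hola", "Olá")], "es", "pt")
```

The resulting string can be written to a `.tmx` file and loaded into a CAT tool as a machine-translation memory. In practice the segment pairs would come from a real source text and the MT output produced for it.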
11. Becoming an Apertium developer
It's easy to become an Apertium developer. It just takes
reasonable computing skills (XML, shell commands, etc.), which are
not too hard to acquire,
good translation skills.
In no time, developers find themselves contributing to a language pair with
the support of the community.
12. A nice side effect: monolingual resources
When developing a language pair, monolingual language resources are
developed, such as
morphological dictionaries
morphological disambiguation rules and probabilities
The corresponding monolingual processors are available to help statistical
machine translation deal, for instance, with languages having a challenging
morphology.
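How a morphological dictionary and disambiguation probabilities might serve as a monolingual processor, for example to lemmatise text before statistical MT training, can be sketched in Python. Everything here is an assumption made for illustration: the dictionary entries, the tag probabilities and the function names are invented, and real Apertium data and tools are far richer.

```python
# Toy morphological dictionary: surface form -> possible (lemma, tag) analyses.
# Entries are invented for this sketch, not taken from Apertium data.
DICTIONARY = {
    "casa": [("casa", "noun"), ("casar", "verb")],
    "la": [("el", "det"), ("lo", "pron")],
}

# Hypothetical tag probabilities standing in for trained disambiguation statistics.
TAG_PROB = {"noun": 0.6, "verb": 0.4, "det": 0.7, "pron": 0.3}


def analyse(word):
    """Return all analyses of a word; unknown words keep their surface form."""
    return DICTIONARY.get(word.lower(), [(word, "unknown")])


def disambiguate(analyses):
    """Pick the analysis whose tag has the highest (toy) probability."""
    return max(analyses, key=lambda a: TAG_PROB.get(a[1], 0.0))


def lemmatise(sentence):
    """Replace every word with the lemma of its most probable analysis."""
    return [disambiguate(analyse(w))[0] for w in sentence.split()]
```

With this toy data, `lemmatise("la casa")` returns `["el", "casa"]`. Reducing inflected forms to lemmas in this way is one option for lowering data sparsity when a statistical system deals with a morphologically rich language.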
13. Success cases
Apertium is a mature technology which is used:
in Wikimedia Content Translation to generate Wikipedia content in
other languages,
to produce a Catalan edition of Valencia daily newspaper
Levante-EMV,
by universities in the Catalan-speaking area to help in the generation
of courseware and academic information,
in PLATA, the Spanish government platform for on-the-fly
machine translation of public-service webpages.