The presentation summarizes the European Commission's CEF eTranslation service, which provides machine translation for the 24 official EU languages. Some key points:
- eTranslation succeeds MT@EC, which launched on 26 June 2013 and offered Moses-based statistical machine translation for 78 direct language pairs plus domain-specific engines.
- MT@EC's statistical models draw on EURAMIS, DGT's database of official EU translations, which holds over 1 billion segments and grows by about 1 million segments per month.
- Neural machine translation engines are being rolled out gradually from 15 November 2017. Engines from English into 25 target languages (the other 23 EU languages plus Icelandic and Norwegian Bokmål) are planned by end 2018.
- From July 2017 to March 2018, over 3.1 million translations were requested and 3.7 million pages translated across EU institutions and programs, even though the eTranslation web page only went online on 15 November 2017.
2. Where we started: MT@EC
• Moses-based statistical machine translation
• All 24 EU languages
78 direct language pairs + domain-specific engines
• Launched 26 June 2013
• Used by EU institutions, websites and public administrations
3. EURAMIS
DGT’s database of translated segments
• Built using official EU translations
• > 1 billion segments covering all EU languages
+ small amounts of AR, RU, TR, ZH
• Growing at ~1 million segments per month
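To put the growth rate in proportion: treating the slide's round figures as exact, a linear projection shows that a year of deliveries adds about 12 million segments. Because the base is "> 1 billion", that is at most about 1.2% of the corpus per year, so the database grows steadily but slowly relative to its size. The constant and function names below are illustrative only, not part of any DGT tool.

```python
# Round figures from the EURAMIS slide; both are approximations
# ("> 1 billion segments", "~ 1 million segments per month").
BASE_SEGMENTS = 1_000_000_000
MONTHLY_GROWTH = 1_000_000


def projected_size(months: int) -> int:
    """Linear projection: the slide gives a fixed monthly increment, not a rate."""
    return BASE_SEGMENTS + MONTHLY_GROWTH * months


def relative_growth(months: int) -> float:
    """Growth after `months`, as a fraction of the starting size."""
    return (projected_size(months) - BASE_SEGMENTS) / BASE_SEGMENTS


if __name__ == "__main__":
    print(projected_size(12))            # 1012000000
    print(f"{relative_growth(12):.1%}")  # 1.2%
```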
7. COUNCIL DECISION (EU) 2017/1912 of 9 October 2017 on the conclusion of the Agreement between the European Union and Iceland on the protection of geographical indications of agricultural products and foodstuffs
THE COUNCIL OF THE EUROPEAN UNION,
having regard to the Treaty on the Functioning of the European Union, and in particular Article 207(4) first subparagraph, in conjunction with Article 218(6)(a)(v) and Article 218(7),

DÉCISION (UE) 2017/1912 DU CONSEIL du 9 octobre 2017 concernant la conclusion de l'accord entre l'Union européenne et l'Islande relatif à la protection des indications géographiques des produits agricoles et des denrées alimentaires
LE CONSEIL DE L'UNION EUROPÉENNE,
vu le traité sur le fonctionnement de l'Union européenne, et notamment son article 207, paragraphe 4, premier alinéa, en liaison avec l'article 218, paragraphe 6, point a) v), et avec l'article 218, paragraphe 7,
8. But issues with “normal” speech…
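The chair had a broken leg.
Le président a eu une jambe cassée. (FR: “The president had a broken leg.”)
Il presidente ha rotto una gamba. (IT: “The president broke a leg.”)
Formanden havde et brækket ben. (DA: “The chairman had a broken leg.”)
Ο πρόεδρος είχε ένα σπασμένο πόδι. (EL: “The president had a broken leg.”)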
9. Breaking the EU bubble
Other sources of data:
Finnish Prime Minister’s Office
Norwegian (Bokmål)
Icelandic Ministry
German Bundesbank
European Medicines Agency
European Chemicals Agency
data
Not easy to use:
How to deliver? FTP? USB key?
Irregular data formats
Overlapping data deliveries (e.g. only 40% new for Iceland)
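The "only 40% new" figure means most of a delivery duplicated material already held. A minimal sketch of how such a share could be measured, assuming segments arrive as (source, target) string pairs and that "new" means not seen before after collapsing whitespace and lower-casing. `segment_key` and `new_fraction` are hypothetical helpers, not part of any DGT or ELRC tooling, and a production pipeline may well use fuzzier matching than exact hashes.

```python
import hashlib
from typing import Iterable, Set, Tuple

# Hypothetical representation: one aligned segment as (source text, target text).
Segment = Tuple[str, str]


def segment_key(seg: Segment) -> str:
    """Hash a segment pair after collapsing whitespace and lower-casing both sides.

    str.split() also removes tabs, so the tab separator cannot occur inside
    either normalized side and the two halves cannot run together.
    """
    src, tgt = seg
    normalized = " ".join(src.split()).lower() + "\t" + " ".join(tgt.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def new_fraction(delivery: Iterable[Segment], seen: Set[str]) -> float:
    """Return the share of segments in `delivery` whose key is not yet in `seen`.

    `seen` is updated in place, so a pair repeated inside the same delivery
    counts as new only the first time it appears.
    """
    total = new = 0
    for seg in delivery:
        total += 1
        key = segment_key(seg)
        if key not in seen:
            new += 1
            seen.add(key)
    return new / total if total else 0.0


if __name__ == "__main__":
    seen: Set[str] = set()
    print(new_fraction([("Hello", "Halló"), ("Thank you", "Takk")], seen))  # 1.0
    print(new_fraction([("hello ", "halló"), ("Goodbye", "Bless")], seen))  # 0.5
```

Keeping `seen` as a set of fixed-length digests rather than raw text keeps membership checks constant-time and memory per segment bounded, which matters at the scale of a corpus like EURAMIS.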
12. Moving forward
• Improvements through ELRC:
Regular, consistent and clear deliveries
Clearly identified and structured
Legally cleared
• Goal:
More domain-specific, regular updates (incremental training)
• Building domain-specific engines:
No predefined schedule – ad hoc according to deliveries and demand
How much data is available?
Separate or combined build?
14. Neural Machine Translation Engines
• Available:
• EN → DE, HU; EN → ET, FI (15 November 2017)
• ET → EN, FI → EN (12 December 2017)
• EN → LT, LV (27 February 2018)
• EN → CS, GA, PL (14 March 2018)
• EN → BG, SK, SL; GA, LV, LT → EN (4 April 2018)
15–17. Neural Machine Translation Engines
• EN into:
BG CS DA DE EL ES ET FI FR
GA HR HU IS IT LT LV MT NB
NL PL PT RO SK SL SV
• Right now: the EN-into engines listed on slide 14
• By end 2018: all of the above
18. Coming up…
• EN → HR, ET (retrain), FI (retrain)
• BG, PL, CS, SK, SL, HR → EN
• By end 2018:
• EN → FR
• EN → NL, SV, DA
• EN → IT
• EN → PT, EL, ES, MT, RO
• all remaining EN
• FR DE
20. eTranslation Usage
• From July 2017 to March 2018:
• 3 104 995 translations requested
• 3 712 913 pages translated
Remember: the web page only came online on 15 November 2017
21. CEF DSIs
(BRIS and eHealth are still on MT@EC for the moment...)
CLIENT                                              TRANSLATIONS REQUESTED      PAGES
Open Data Portal                                                   263 689  2 242 326
EU Presidency Translator                                           174 481      6 006
Web Page                                                            46 031    558 707
eProcurement                                                        15 729        814
Electronic Exchange of Social Security Information                     824         26
ODR                                                                    452        274
iADAATPA                                                                 6          0
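Summing the table and comparing it with the overall totals on the usage slide shows how unevenly the load is spread: the listed clients account for a modest share of requests but most of the pages, largely because the Open Data Portal averages roughly 8.5 pages per request. This sketch assumes the table covers the same July 2017 to March 2018 window as the overall totals, which the slide does not state. The names `DSI_USAGE` and `column_totals` are illustrative only.

```python
# Per-client figures copied from the "CEF DSIs" table:
# client -> (translations requested, pages).
DSI_USAGE: dict[str, tuple[int, int]] = {
    "Open Data Portal": (263_689, 2_242_326),
    "EU Presidency Translator": (174_481, 6_006),
    "Web Page": (46_031, 558_707),
    "eProcurement": (15_729, 814),
    "Electronic Exchange of Social Security Information": (824, 26),
    "ODR": (452, 274),
    "iADAATPA": (6, 0),
}

# Overall July 2017 to March 2018 totals from the "eTranslation Usage" slide.
TOTAL_TRANSLATIONS = 3_104_995
TOTAL_PAGES = 3_712_913


def column_totals(usage: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Sum the translations and pages columns of the table."""
    translations = sum(t for t, _ in usage.values())
    pages = sum(p for _, p in usage.values())
    return translations, pages


if __name__ == "__main__":
    t, p = column_totals(DSI_USAGE)
    print(t, p)  # 501212 2808153
    # Assumes the table and the overall totals cover the same period.
    print(f"{t / TOTAL_TRANSLATIONS:.1%} of requests, {p / TOTAL_PAGES:.1%} of pages")
    # -> 16.1% of requests, 75.6% of pages
    for client, (requests, pages) in DSI_USAGE.items():
        per_request = pages / requests if requests else 0.0
        print(f"{client}: {per_request:.2f} pages per request")
```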
23. eTranslation needs to offer more
• Multilingual search
• Ontologies
• EU Publications Office’s Public Multilingual Knowledge Infrastructure Project (PMKI)
• Other gaps?