Main document types
1 EU law, including the legislative process
2 Guardian of the Treaties/Implementation of EU law
4 Political documents
5 Relations with other EU institutions
6 Communication, web, media, publications
7 Budget, budgetary procedure
8 Documents linked to international organisations and non-EU countries
9 Notices for publication in the Official Journal (OJ)
10 Commission working or internal documents
Evolution 2012-2018: number of translated pages and number of DGT staff (chart)
Machine translation at DGT
Ca. 1976 to 2010
• Long-standing use of language technology + CAT tools
2013 - 2018
• "More (better) with less"
• More complexity, new formats, new ways of working
• Stronger recourse to outsourcing
• Shift from documents to content
• Machine translation as an integral part of the resource mix
Buzz kill – or why I hate “AI”
• Beware of the images
• Neural MT vs. recursive hetero-associative memories for translation
• Artificial intelligence is not about intelligence
• Neural networks have little to do with actual neurons
• Big data + neurons + deep learning + magic = Amazing stuff
• Do we really have big(-ish) data?
• Believe the hype, but in moderation
• Technology is not a solution
• Poor processes don’t get better through AI
• Doing the same and expecting different results = insanity
So, this had to be said.
But it’s pretty cool anyway.
• The technology has become accessible.
• “Big data” discussions have shown the possibilities of correlating data from different sources.
• New ways of transforming data into usable information?
Why did it
Big data? - Big Questions!
What we translate
• What is the
• Is the document difficult, i.e. demanding or complex?
• Are we working on
• Do we have reliable resources for this
• How well will MT work for
• How should this content be
• Who is most suitable to
• How should the content be split between several
• What is our capacity to
• Are there meaningful alternatives to the existing
• How good is the contractor’s
• How confident are we that they will deliver good
• How reliable are they?
• Can we correlate freelancer/agency, history of document type, document complexity to calculate a “reliability indicator” that could support outsourcing
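The reliability-indicator question can be made concrete with a small sketch. Assuming each past job is recorded with its contractor, document type, a complexity score and an evaluated quality score (all hypothetical fields, not an existing DGT data model), one simple indicator is the contractor's average past quality on the same document type, weighting jobs of similar complexity more heavily and pulling the result towards a neutral prior when there is little history. The names `PastJob` and `reliability_indicator`, the 0-to-1 scales and the prior values are illustrative assumptions only.

```python
from dataclasses import dataclass


# Hypothetical record of one completed outsourced job (illustrative only).
@dataclass
class PastJob:
    contractor: str    # freelancer or agency identifier
    doc_type: str      # e.g. "notice", "legislative"
    complexity: float  # assumed scale: 0.0 (simple) to 1.0 (very complex)
    quality: float     # assumed evaluation score: 0.0 (poor) to 1.0 (excellent)


def reliability_indicator(history, contractor, doc_type, complexity,
                          prior=0.5, prior_weight=2.0):
    """Hypothetical reliability score in [0, 1] for a new job.

    Averages the contractor's past quality on the same document type,
    weighting each job by how close its complexity is to the new job,
    and shrinks the result towards `prior` when little history exists.
    """
    relevant = [j for j in history
                if j.contractor == contractor and j.doc_type == doc_type]
    # Similar complexity -> weight near 1; very different -> weight near 0.
    weights = [1.0 - abs(j.complexity - complexity) for j in relevant]
    weighted_quality = sum(w * j.quality for w, j in zip(weights, relevant))
    # The prior acts like `prior_weight` pseudo-jobs of quality `prior`.
    return (weighted_quality + prior * prior_weight) / (sum(weights) + prior_weight)


history = [
    PastJob("agency_a", "notice", 0.2, 0.9),
    PastJob("agency_a", "notice", 0.3, 0.8),
    PastJob("agency_a", "legislative", 0.9, 0.4),
    PastJob("freelancer_b", "notice", 0.2, 0.6),
]
print(reliability_indicator(history, "agency_a", "notice", 0.25))
print(reliability_indicator(history, "new_contractor", "notice", 0.25))  # → 0.5
```

Shrinking towards the prior keeps a contractor with one lucky job from outranking one with a long, solid record; a real indicator would need validated quality data and probably more factors.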
More Big Questions!
• How good is a given translation?
• How good are our language
• Can we automatically detect technically and linguistically poor
• How can we learn from mistakes?
• What are the common issues in
• What do they have in common?
• Do we have the linguistic resources to handle their
• What are their request patterns?
• Explore use cases and
• Validate or reject ideas and assumptions in a
• Training (also for
• Learn what we do not