TAUS 2.0 and the Game Changers in Localization (Jaap van der Meer, director of TAUS
1. TAUS Translation Data Landscape Report
Authors: Andrew Joscelyne & Anna Samiotou
Reviewer: Jaap van der Meer
2. The report…
• was published in December 2015
• was written by TAUS in consultation with
the EU project LT Observatory supervised by
LT Innovate
• draws its insights from industry surveys
and interviews with a broad range of
stakeholders
3. The report attempts to answer:
• Who are the producers and consumers of translation
data? How are they changing?
• Is there a viable “market” for translation data, beyond
the current informal sharing or web-scraping model?
• What can we do to overcome the legal/technical issues
and concerns regarding translation data sharing?
• How could translation data sharing as a natural
practice integrate with the European Digital Single
Market program?
• Which models of translation data circulation work
best? For how long? What could disrupt them?
4. Methods to obtain Translation data
• Leveraging public and open resources
• Creating one’s own resources by human,
semi-automatic or automatic means
• Scraping the web by web crawling: Parallel text
collections to be used mainly by MT systems
• Sharing or exchanging data
• Paying for data: Stakeholders will pay for translation data
when the data are known to be uniquely valuable in terms of
relevance and impact on the task at hand, are affordable, and
no other solution exists
6. Scenarios for a Translation data
Marketplace
• Datasets: Buy data, sell data, exchange data, bid for data,
order data, offer specific in-domain translation data.
• Datasets & Tools: A commercial service for translation
data together with multilingual enablers and tools that can
provide fingerprints of the data and curate, benchmark and validate
its quality and relevance to the task at hand.
• Trained domain MT engines: Deliver in-domain
translation engines
• Plug & play model: This is the model in use today
for accessing a service in one go.
9. How about a Translation data
Marketplace?
Drivers: highly globalized market – providing
translation data at a reasonable price – allowing for
benchmarking prior to purchase
Inhibitors: Using other people’s resources can be a
blind guess – current lack of tools – imbalance of
high & low resource languages
Challenges: enhance language coverage – address the
high risk of local markets being edged out by global
players and by plug & play technologies
11. Critical determinants of the way ahead
• We are at the beginning of the translation data
age.
• Content will be king and queen.
• Innovation will be vital: many different competing
solutions will emerge for streamlining the value
chain between raw data and specific translation
requirements.
• The term “translation data” has two meanings:
– we need the data to drive translation automation.
– we also vitally need data about translation, that is, good
data about global data usage.
Editor's Notes
These facts suggest that globally there is at present little role for any kind of independent translation data marketplace/data hub or data sharing platform.