Optimized Big Data Approach to Machine Translation
Diego Bartolome
@diegobartolome
dbc@tauyou.com
Data for Machine Translation
Some techniques might work...
Baseline engines
In-domain + out-of-domain balance
Domain-specific engines
But ...
Baseline engines
In-domain + out-of-domain balance
Domain-specific engines
Big Data Approach
Unilingual Texts
Small Data
Glossaries/Dictionaries
Translation Memories
...
External Data
Average Improvement
+21%
Sample Results
Important topics
Supervised Data Classification
Data Clustering
Parameter optimization
Key Performance Indicators (KPIs)
Predictive MT quality estimation
Measure + Measure + Measure
Not everything that can be counted counts, and not everything that counts can be counted.
William Bruce Cameron
Optimized Big Data Approach to Machine Translation @ TAUS

The volume of available multilingual data has exploded. One option is to build machine translation systems from previously translated segments, using as much data as possible. This works in many cases, but it is well known in the market that the cleaner the data, the better the results in terms of productivity, cost, and even quality. tauyou <language> has taken an additional step by optimizing the translation engines on a per-document basis, which has proven to provide a significant quality increase in the machine translation output. This approach, combined with joint source content optimization and summarization, leads to significant savings in multilingual communications.
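The description does not say how the per-document optimization is done. As a rough illustration only, one common way to adapt an engine to a single document is to pick the translation-memory segments most similar to that document and train or tune on those. The sketch below assumes that approach; `select_segments`, the cosine-similarity scoring, and the sample segments are all hypothetical and are not taken from the presentation.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    shared = set(a) & set(b)
    dot = sum(a[t] * b[t] for t in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def select_segments(document, segment_pairs, top_k=2):
    """Return the top_k (source, target) pairs whose source side is most
    similar to the document. Hypothetical helper, not the author's method."""
    doc_vec = Counter(tokenize(document))
    scored = [
        (cosine(doc_vec, Counter(tokenize(src))), src, tgt)
        for src, tgt in segment_pairs
    ]
    # Highest similarity first; ties keep their original order.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(src, tgt) for score, src, tgt in scored[:top_k] if score > 0]


# Made-up example data: a document to translate and a tiny translation memory.
document = "The printer driver must be installed before connecting the printer."
memory = [
    ("Install the printer driver.", "Instale el controlador de la impresora."),
    ("The patient should take one tablet daily.",
     "El paciente debe tomar una tableta al día."),
    ("Connect the printer to the network.", "Conecte la impresora a la red."),
]

selected = select_segments(document, memory, top_k=2)
for src, tgt in selected:
    print(src, "->", tgt)
```

With this toy memory, the two printer-related segments are selected and the medical one is left out. A production system would likely use a stronger similarity measure and far more data, so this is only meant to show the shape of the idea.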


  1. Optimized Big Data Approach to Machine Translation. Diego Bartolome, @diegobartolome, dbc@tauyou.com
  2. Data for Machine Translation
  3. Some techniques might work...: Baseline engines; In-domain + out-of-domain balance; Domain-specific engines
  4. But ...: Baseline engines; In-domain + out-of-domain balance; Domain-specific engines
  5. Big Data Approach: Unilingual Texts; Small Data; Glossaries/Dictionaries; Translation Memories; ...; External Data
  6. Average Improvement: +21%
  7. Sample Results
  8. Important topics: Supervised Data Classification; Data Clustering; Parameter optimization; Key Performance Indicators (KPIs); Predictive MT quality estimation; Measure + Measure + Measure
  9. "Not everything that can be counted counts, and not everything that counts can be counted." William Bruce Cameron
