5. Manuel Arcedillo & Juanjo Arevalillo (Hermes): Translation memories
  1. Translation memories. Hermes Traducciones y Servicios Lingüísticos
  2. A brief history…
  3. Processes have changed… but not the ultimate goal.
  4. Productivity
  5. Found in Translation, Nataly Kelly & Jost Zetzsche (2012)
  6. LAN: Translation Memory, LAN Server, Project Managers, Translators, Engineering, Revisers
  7. WAN: Translator, Reviser, DTPer, Project Manager; Internet; Translation Memory, LAN Server, Project Managers, Translators, Engineering, Revisers
  8. Clouding, Crowdsourcing, MT, TEnTs, CAT, SaaS; Translation Memory, LAN Server, Project Managers, Engineering, Revisers, Translators
  9. Internals of a translation memory
  10. Translation Memory Exchange
      • OSCAR (Open Standards for Container/Content Allowing Re-use)
      • TMX Standard (Translation Memory eXchange)
      • Leveraging of translation memories regardless of the tool or platform
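To make the exchange idea concrete, here is a minimal sketch of reading translation units from a TMX document with Python's standard library. The sample document and the `read_units` helper are hypothetical and written only for illustration; real TMX files carry more header attributes and inline markup than this sketch handles.

```python
import xml.etree.ElementTree as ET

# xml:lang belongs to the reserved XML namespace, so ElementTree
# exposes the attribute under this fully qualified key.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical one-unit TMX document, for illustration only.
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0" segtype="sentence"
          o-tmf="example" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Save the file.</seg></tuv>
      <tuv xml:lang="es"><seg>Guarde el archivo.</seg></tuv>
    </tu>
  </body>
</tmx>
"""


def read_units(tmx_text):
    """Return one {language: segment text} dict per <tu> element."""
    root = ET.fromstring(tmx_text)
    units = []
    for tu in root.iter("tu"):
        unit = {}
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            # itertext() also collects text nested inside inline elements.
            unit[tuv.get(XML_LANG)] = "".join(seg.itertext())
        units.append(unit)
    return units


units = read_units(SAMPLE_TMX)
print(units)
```

Because the format is plain XML, any tool or script can read the same units, which is what makes leverage across tools possible.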
  11. The ancestors of CAT Tools… XL8: DOS tool in a workflow known as XLN
  12. IBM TranslationManager: translation proposal, exact match, source text, proposed terms in dictionary
  13. Trados Workbench
  14. Déjà-Vu
  15. Star Transit (no memory!)
  16. WordFast
  17. SDLx
  18. memoQ
  19. OmegaT (free!)
  20. Workflow tools: Across
  21. Across
  22. SDL Idiom World Server
  23. Specialised tools: Catalyst
  24. Specialised tools: Passolo
  25. Basic TM features in CAT tools
      • Leverage of previous translations.
      • Analysis for quoting, planning and keeping track of progress.
      • Concordance for sub-segment searches.
      • Maintenance to perform global changes, import/export content, etc.
  26. Leveraging TMs. CAT tools provide answers to these questions:
      • What is the fuzzy match of the segment?
      • What parts of the text are different?
      • Where is the match coming from?
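The three questions above can be illustrated with a small sketch. It uses `difflib.SequenceMatcher` from the Python standard library on whitespace-separated words; this is an assumption made for illustration and not the matching algorithm of any particular CAT tool. The `fuzzy_lookup` helper and the sample memory are hypothetical.

```python
from difflib import SequenceMatcher


def fuzzy_lookup(segment, memory):
    """Return (score, tm_source, tm_target, diffs) for the best TM entry.

    memory is a list of (source, target) pairs. Returns None if it is empty.
    """
    new_words = segment.split()
    best = None
    for tm_source, tm_target in memory:
        old_words = tm_source.split()
        matcher = SequenceMatcher(None, old_words, new_words, autojunk=False)
        score = round(matcher.ratio() * 100)
        if best is None or score > best[0]:
            # Keep only the word spans that differ between TM source and new segment.
            diffs = [
                (tag, old_words[i1:i2], new_words[j1:j2])
                for tag, i1, i2, j1, j2 in matcher.get_opcodes()
                if tag != "equal"
            ]
            best = (score, tm_source, tm_target, diffs)
    return best


# Hypothetical two-entry memory.
memory = [
    ("Click the Save button.", "Haga clic en el botón Guardar."),
    ("Close the window.", "Cierre la ventana."),
]
score, tm_source, tm_target, diffs = fuzzy_lookup("Click the Cancel button.", memory)
print(score, tm_source, diffs)
```

The score answers the first question, `diffs` the second, and the returned TM entry the third.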
  27. Fuzzy match display
  28. Fuzzy match display (II)
  29. Fuzzy match display (III)
  30. Fuzzy match display (IV)
  31. Analysis feature
      • Every word in each segment is assigned to one match band:
        101%, 100%, 99-95%, 94-85%, 84-75%, New words, Repetitions
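A minimal sketch of how such an analysis could assign words to bands follows. Several points are assumptions made for illustration: 101% is treated as an exact match that also matches its context, anything below 75% counts as new words, and a repeated segment without an exact match counts as a repetition. The `band_for` and `analyse` helpers and the `lookup` callback are hypothetical; real tools apply their own rules.

```python
from collections import Counter


def band_for(score, context_match=False):
    """Map a match score (0-100) to one of the analysis bands."""
    if score >= 100:
        return "101%" if context_match else "100%"
    if score >= 95:
        return "99-95%"
    if score >= 85:
        return "94-85%"
    if score >= 75:
        return "84-75%"
    return "New words"


def analyse(segments, lookup):
    """Count words per band.

    lookup(segment) must return (score, context_match) for the best TM hit.
    """
    totals = Counter()
    seen = set()
    for segment in segments:
        words = len(segment.split())
        score, context_match = lookup(segment)
        if score < 100 and segment in seen:
            totals["Repetitions"] += words
        else:
            totals[band_for(score, context_match)] += words
        seen.add(segment)
    return totals


# Hypothetical TM scores for three distinct segments.
scores = {
    "Open the file.": (80, False),
    "Save the file now.": (100, True),
    "Print.": (0, False),
}
segments = ["Open the file.", "Open the file.", "Save the file now.", "Print."]
totals = analyse(segments, lambda s: scores[s])
print(dict(totals))
```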
  32. Analysis results
  33. Different tools, different word counts
      Band          CAT Tool 1   CAT Tool 2
      101%              41,352       29,782
      100%               4,194       16,002
      99-95%             3,698        6,038
      94-85%             2,077        2,633
      84-75%             5,270        1,369
      New words          5,241        6,150
      Repetitions        2,068        5,451
      Total             63,900       58,425
  34. Different word counts
      • There is no standard fuzzy matching algorithm.
      • CAT tools may have different auto-substitution elements: numbers, dates, acronyms, variables, etc.
      • Different approaches to 101% matches.
      • Cross-file repetitions and internal fuzzy leverage.
      • Different file format filters.
      • Different segmentation rules. SRX is the standard for segmentation rules.
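As one illustration of the last point, the sketch below splits the same text with two hypothetical rule sets. The regular expressions are simplified stand-ins written for this example; they are not SRX and not the rules of any specific tool.

```python
import re

# Rule set A (hypothetical): break after ., ? or ! followed by whitespace.
RULES_A = re.compile(r"(?<=[.?!])\s+")
# Rule set B (hypothetical): as A, but a colon also ends a segment.
RULES_B = re.compile(r"(?<=[.?!:])\s+")


def segment(text, rules):
    """Split text into segments at every position the rule pattern matches."""
    return [part for part in rules.split(text) if part]


text = "Note: save your work. Then close the window."
segments_a = segment(text, RULES_A)
segments_b = segment(text, RULES_B)
print(len(segments_a), segments_a)
print(len(segments_b), segments_b)
```

The total word count is the same in both cases, but the segments looked up in the TM differ, so the same file can land in different match bands depending on the rules in use.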
  35. Weighted word count
      • Each band is assigned a percentage of the full word rate according to a weighting scheme (negotiable per client). For example:
      Band          Weight
      101%              0%
      100%             20%
      99-95%           30%
      94-85%           40%
      84-75%           50%
      New words       100%
      Repetitions      20%
  36. Different tools, different word counts (II)
      CAT Tool 1
      Band           Words   Weight   Weighted words
      101%          41,352   x 0%                  0
      100%           4,194   x 20%               839
      99-95%         3,698   x 30%             1,109
      94-85%         2,077   x 40%               831
      84-75%         5,270   x 50%             2,635
      New words      5,241   x 100%            5,241
      Repetitions    2,068   x 20%               414
      Total         63,900                    11,069

      CAT Tool 2
      Band           Words   Weight   Weighted words
      101%          29,782   x 0%                  0
      100%          16,002   x 20%             3,200
      99-95%         6,038   x 30%             1,811
      94-85%         2,633   x 40%             1,053
      84-75%         1,369   x 50%               684
      New words      6,150   x 100%            6,150
      Repetitions    5,451   x 20%             1,090
      Total         58,425                    14,989
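The weighted word count arithmetic can be sketched as follows, using the example weighting scheme and the CAT Tool 1 figures from the slides. Rounding each band to the nearest whole word is an assumption; the function names are hypothetical.

```python
# Example weighting scheme from the slides (percentage of the full word rate).
WEIGHTS = {
    "101%": 0,
    "100%": 20,
    "99-95%": 30,
    "94-85%": 40,
    "84-75%": 50,
    "New words": 100,
    "Repetitions": 20,
}

# Word counts reported by CAT Tool 1 in the slides.
TOOL_1 = {
    "101%": 41352,
    "100%": 4194,
    "99-95%": 3698,
    "94-85%": 2077,
    "84-75%": 5270,
    "New words": 5241,
    "Repetitions": 2068,
}


def weighted_words(words, percent):
    """Weighted count for one band, rounded half up with integer arithmetic."""
    return (words * percent + 50) // 100


def weighted_total(counts, weights):
    """Sum of the per-band weighted counts."""
    return sum(weighted_words(n, weights[band]) for band, n in counts.items())


raw_total = sum(TOOL_1.values())
total = weighted_total(TOOL_1, WEIGHTS)
print(raw_total, total)
```

With these inputs, 63,900 raw words reduce to 11,069 weighted words, which is the figure a quote would be based on under this scheme.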
  37. Weighted word count tools
  38. TMs and statistical analysis
      • If big enough, TMs provide the bilingual corpus necessary to build SMT engines.
      • Some CAT tools can scan the TM in search of correlations between words in source and target.
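One simple way to look for correlation between source and target words is to count how often they occur in the same translation unit. The sketch below scores each word pair with the Dice coefficient; this choice, the `cooccurrence_scores` helper, and the sample units are assumptions made for illustration, not a description of how any CAT tool or SMT engine works.

```python
from collections import Counter
from itertools import product


def cooccurrence_scores(units):
    """Score (source word, target word) pairs by the Dice coefficient.

    units is a list of (source, target) segment pairs. A score of 1.0 means
    the two words always appear together in the same unit.
    """
    src_freq, tgt_freq, pair_freq = Counter(), Counter(), Counter()
    for source, target in units:
        src_words = set(source.lower().split())
        tgt_words = set(target.lower().split())
        src_freq.update(src_words)
        tgt_freq.update(tgt_words)
        pair_freq.update(product(src_words, tgt_words))
    return {
        (s, t): 2 * n / (src_freq[s] + tgt_freq[t])
        for (s, t), n in pair_freq.items()
    }


# Hypothetical three-unit memory.
units = [
    ("open file", "abrir archivo"),
    ("close file", "cerrar archivo"),
    ("open window", "abrir ventana"),
]
scores = cooccurrence_scores(units)
candidates = {t: v for (s, t), v in scores.items() if s == "file"}
best_for_file = max(candidates, key=candidates.get)
print(best_for_file, candidates)
```

Even on this tiny memory, "file" correlates most strongly with "archivo"; a larger TM gives such counts enough evidence to be useful.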