5. manuel arcedillo & juanjo arevalillo (hermes) translation memories

Published in: Technology, Business

  1. 1. Translation memories Hermes Traducciones y Servicios Lingüísticos
  2. 2. A brief history…
  3. 3. Processes have changed… …but not the ultimate goal.
  4. 4. Productivity
  5. 5. Found in Translation, Nataly Kelly & Jost Zetzsche (2012)
  6. 6. LAN Translation Memory LAN Server Project Managers Translators Engineering Revisers
  7. 7. WAN Translator Reviser DTPer Project Manager INTERNET Translation Memory LAN Server Project Managers Translators Engineering Revisers
  8. 8. Clouding Crowdsourcing MT TEnTs CAT SaaS Translation Memory LAN Server Project Managers Engineering Revisers Translators
  9. 9. Internals of a translation memory
  10. 10. Translation Memory Exchange • OSCAR (Open Standards for Container/Content Allowing Re-use) • TMX standard (Translation Memory eXchange) • Leveraging of translation memories regardless of the tool or platform.
  11. 11. The ancestors of CAT Tools… XL8 DOS tool in a workflow known as XLN
  12. 12. IBM TranslationManager Translation proposal Exact match Source text Proposed terms in dictionary
  13. 13. Trados Workbench
  14. 14. Déjà-Vu
  15. 15. Star Transit (no memory!)
  16. 16. WordFast
  17. 17. SDLx
  18. 18. memoQ
  19. 19. OmegaT (free!)
  20. 20. Workflow tools: Across
  21. 21. Across
  22. 22. SDL Idiom World Server
  23. 23. Specialised tools: Catalyst
  24. 24. Specialised tools: Passolo
  25. 25. Basic TM features in CAT tools • Leverage of previous translations. • Analysis for quoting, planning and keeping track of progress. • Concordance for sub-segment searches. • Maintenance to perform global changes, import/export content, etc.
  26. 26. Leveraging TMs CAT tools provide answers to these questions: • What is the fuzzy match of the segment? • What parts of the text are different? • Where is the match coming from?
  27. 27. Fuzzy match display
  28. 28. Fuzzy match display (II)
  29. 29. Fuzzy match display (III)
  30. 30. Fuzzy match display (IV)
  31. 31. Analysis feature • Every word in each segment is assigned to one of the following match bands: 101%, 100%, 99-95%, 94-85%, 84-75%, New words, Repetitions
  32. 32. Analysis results
  33. 33. Different tools, different word counts
      Band           CAT Tool 1   CAT Tool 2
      101%               41,352       29,782
      100%                4,194       16,002
      99-95%              3,698        6,038
      94-85%              2,077        2,633
      84-75%              5,270        1,369
      New words           5,241        6,150
      Repetitions         2,068        5,451
      Total              63,900       58,425
  34. 34. Different word counts • There is no standard fuzzy matching algorithm. • CAT tools may have different auto-substitution elements: numbers, dates, acronyms, variables, etc. • Different approaches to 101% matches. • Cross-file repetitions and internal fuzzy leverage. • Different file format filters. • Different segmentation rules. SRX is the standard for segmentation rules.
  35. 35. Weighted word count • Each band is assigned a percentage of the full word rate according to a weighting scheme (negotiable per client). For example:
      Band           Weight
      101%               0%
      100%              20%
      99-95%            30%
      94-85%            40%
      84-75%            50%
      New words        100%
      Repetitions       20%
  36. 36. Different tools, different word counts (II)
      CAT Tool 1
      Band            Words   Weight   Weighted words
      101%           41,352   x 0%                  0
      100%            4,194   x 20%               839
      99-95%          3,698   x 30%             1,109
      94-85%          2,077   x 40%               831
      84-75%          5,270   x 50%             2,635
      New words       5,241   x 100%            5,241
      Repetitions     2,068   x 20%               414
      Total          63,900                    11,069
      CAT Tool 2
      Band            Words   Weight   Weighted words
      101%           29,782   x 0%                  0
      100%           16,002   x 20%             3,200
      99-95%          6,038   x 30%             1,811
      94-85%          2,633   x 40%             1,053
      84-75%          1,369   x 50%               684
      New words       6,150   x 100%            6,150
      Repetitions     5,451   x 20%             1,090
      Total          58,425                    14,989
  37. 37. Weighted word count tools
  38. 38. TMs and statistical analysis • If big enough, TMs provide the bilingual corpus necessary to build SMT engines. • Some CAT tools can scan the TM in search of correlations between words in source and target.
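
The weighted word count from slides 35 and 36 can be sketched in a few lines of Python. This is only an illustration, not the method of any particular CAT tool: the band figures are the CAT Tool 1 numbers from slide 36, the weights are the example scheme from slide 35, and the names `WORDS_TOOL_1`, `WEIGHTS_PERCENT` and `weighted_word_count` are hypothetical. Rounding each band to the nearest whole word is an assumption; the slides do not state a rounding rule.

```python
# Words per match band for "CAT Tool 1" (figures taken from slide 36).
WORDS_TOOL_1 = {
    "101%": 41352,
    "100%": 4194,
    "99-95%": 3698,
    "94-85%": 2077,
    "84-75%": 5270,
    "New words": 5241,
    "Repetitions": 2068,
}

# Example weighting scheme from slide 35: percentage of the full word
# rate charged for each band (negotiable per client).
WEIGHTS_PERCENT = {
    "101%": 0,
    "100%": 20,
    "99-95%": 30,
    "94-85%": 40,
    "84-75%": 50,
    "New words": 100,
    "Repetitions": 20,
}


def weighted_word_count(words, weights_percent):
    """Sum the words of every band, each scaled by its band weight.

    Assumption: each band is rounded to the nearest whole word before
    the bands are added up.
    """
    total = 0
    for band, count in words.items():
        total += round(count * weights_percent[band] / 100)
    return total


if __name__ == "__main__":
    raw_total = sum(WORDS_TOOL_1.values())
    weighted_total = weighted_word_count(WORDS_TOOL_1, WEIGHTS_PERCENT)
    print(f"Raw words: {raw_total}, weighted words: {weighted_total}")
```

Because the weights live in a separate dictionary, a different client scheme can be applied to the same analysis by passing another mapping with the same band names.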
