Translation Memory systems: Technology in the service of translation professionals


Presentation at 1st Athens International Conference on Translation and Interpretation, October 2006.


  1. Translation Memory systems: Technology in the service of the translation professional
     Elina Lagoudaki, Mark Shuttleworth
     Imperial College London
  2. Translation Memory (TM) system: a definition
     Computer application that allows the user to store previous translations along with their originals in a database (or index) and to re-use them in new translation projects, when similar source text is encountered
  3. How does it work?
     • Initially, the program splits the pair of texts (original + translation) into segments (=units, e.g. sentences, phrases or words) and then it aligns them (a pair of aligned segments is what we often call a ‘translation unit’)
     • It then stores and indexes the translation units in a database (or index) in an organised way with a variety of information attached
     • When the user starts a new translation which has some source segments identical or similar to the ones existing in the database, the system recognises them, retrieves the past translations for those segments and suggests them to the user
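
The three steps above (segment, align, store and retrieve) can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not how any commercial TM tool is implemented: segmentation is a naive split at sentence-final punctuation, alignment pairs segments purely by position, and the names `split_into_segments`, `align` and `TranslationMemory` are hypothetical.

```python
import re


def split_into_segments(text):
    """Split text into sentence-level segments at '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def align(source_text, target_text):
    """Pair source and target segments by position (assumes a 1:1 sentence correspondence)."""
    src = split_into_segments(source_text)
    tgt = split_into_segments(target_text)
    if len(src) != len(tgt):
        raise ValueError("segment counts differ; these texts need manual alignment")
    return list(zip(src, tgt))


class TranslationMemory:
    """A toy TM: a dictionary mapping each source segment to its stored translation."""

    def __init__(self):
        self.units = {}

    def add_document(self, source_text, target_text):
        # Each aligned (source, target) pair is one translation unit.
        for src, tgt in align(source_text, target_text):
            self.units[src] = tgt

    def lookup(self, segment):
        # Exact retrieval only; returns None when the segment is not stored.
        return self.units.get(segment)


tm = TranslationMemory()
tm.add_document(
    "Every employee is expected to act in accordance with the Business Principles. "
    "Key milestones in the development of Intracom can be found in the following sections.",
    "Ogni dipendente è tenuto ad agire in conformità ai Principi commerciali. "
    "I punti salienti della storia di Intracom sono delineati nelle seguenti sezioni.",
)
print(tm.lookup("Every employee is expected to act in accordance with the Business Principles."))
```

Real systems attach further information to each unit (dates, users, project names) and handle cases where one source sentence corresponds to two target sentences; the positional alignment used here would fail on such texts.
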
  4. Types of matching
     • Exact (100%) match
       Example of a source segment: “Every employee is expected to act in accordance with the Business Principles.”
       TU that exists in the database:
       EN (UK) “Every employee is expected to act in accordance with the Business Principles”
       IT (IT) “Ogni dipendente è tenuto ad agire in conformità ai Principi commerciali”
     • Fuzzy match
       Example of a source segment: “Key milestones in the development of our company can be found in the following section.”
       TU that exists in the database:
       EN (UK) “Key milestones in the development of Intracom can be found in the following sections.”
       IT (IT) “I punti salienti della storia di Intracom sono delineati nelle seguenti sezioni.”
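
To make the difference between exact and fuzzy matches concrete, the sketch below scores the example segments with Python's standard `difflib.SequenceMatcher`, compared word by word. The scoring formula, the 0.7 threshold and the helper names `similarity` and `best_match` are assumptions chosen for illustration; commercial TM tools use their own, usually undisclosed, similarity measures.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Word-level similarity in [0, 1]; 1.0 means the two token sequences are identical."""
    return SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()


def best_match(segment, memory, threshold=0.7):
    """Return (score, source, target) for the most similar stored unit, or None below threshold."""
    scored = [(similarity(segment, src), src, tgt) for src, tgt in memory.items()]
    if not scored:
        return None
    best = max(scored, key=lambda item: item[0])
    return best if best[0] >= threshold else None


memory = {
    "Every employee is expected to act in accordance with the Business Principles":
        "Ogni dipendente è tenuto ad agire in conformità ai Principi commerciali",
    "Key milestones in the development of Intracom can be found in the following sections.":
        "I punti salienti della storia di Intracom sono delineati nelle seguenti sezioni.",
}

exact = best_match(
    "Every employee is expected to act in accordance with the Business Principles", memory
)
fuzzy = best_match(
    "Key milestones in the development of our company can be found in the following section.",
    memory,
)
print(exact[0], exact[2])  # a score of 1.0: exact match
print(fuzzy[0], fuzzy[2])  # a score below 1.0 but above the threshold: fuzzy match
```

In the fuzzy case the translator still has to edit the suggestion (here, replacing the company name and the plural), which is why tools display the score alongside the proposed translation.
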
  5. Context of use
     • Scenario 1: new translation projects bearing similarities (in terms and/or expressions) with previous translations; texts with a great amount of terminology repeating throughout the document
     • Scenario 2: large project done simultaneously by several translators (problem with terminology consistency)
     • Scenario 3: updated (revised) documents
     • Scenario 4: translation of websites, desktop publishing files and software (localisation)
  14. Ideal text type candidates for TM use
     ✓ Repetitive text: technical (e.g. manuals, technical documentation), financial, legal
     ✓ Websites (HTML, XML files)
     ✓ Software (Java properties, Windows resource files, etc.)
     ✓ Text contained in complex formats (such as DTP files: FrameMaker, Illustrator, Interleaf, Pagemaker, etc.)
     NOT SUITABLE for:
     • Literary texts
     • Short texts (one paragraph document, slogans, etc.)
     • One-off projects / small volume of translation work
  15. Benefits deriving from the use of TM systems
     ★ enjoy increased productivity
     ★ access and reuse ideas from previous translations
     ★ never translate the same sentence twice (time and effort savings)
     ★ consistency in terminology
     ★ uniformity in style
     ★ improvement in the quality of translation
  16. TM systems: Facts & Misconceptions
     ✓ Stores/references each segment with its translation (translation units) in the TM database/index.
     ✓ Looks up each new segment to find a matching one and offers translations made for the same or similar segments.
     ✓ Connects to a terminology database for automatic term look-up.
     ✓ The bigger the TM database gets, the more valuable it becomes.
     • TM systems differ from Machine Translation (MT) systems. TMs do not replace the translator.
     • The TM database is initially empty; the translators have to fill it themselves.
     • Unlike MT systems, TM systems are not limited to a specific language pair; a Translation Memory can be used for any pair of languages.
  17. Overview of TM systems
     Approx. 30 different TM systems currently available on the market!
     Some of them are:
     • Déjà Vu
     • Wordfast
     • SDL TRADOS
     • MultiTrans
     • OmegaT
     • TrAID
  18. Differences between TM systems
     In terms of design (user interface):
     • some TM tools are integrated into MS Word (as add-ins)
     • others offer their own text processing environment (usually in a tabular way)
  19. TRADOS
  20. Heartsome Translation Suite: XLIFF Translation Editor
  21. Differences between TM systems
     In terms of technology implemented:
     • granularity of segmentation (at sentence, phrase, or word level)
     • indexing method (indexing of segments vs. full-text indexing)
     • match retrieval techniques (structure-based vs. content-based)
  23. Importance of TM systems
     • New socio-economic conditions instigated by globalisation
       • high demand for multilingual documentation
       • growing demand for translations from the life sciences industry
       • the translation industry is required to cope with larger volumes of translation work, faster turnaround, and a great variety of text formats
     • The deployment of technology seems to be the only way to address the new challenges
     • Growing interest and investment in the development of new tools and the improvement of existing ones
  24. Focus of current research
     • User-friendlier TM tools
     • Expansion of the scope of use of a TM tool
     • Functionality expansion
     • Enhancement of access to linguistic resources via a unified TM platform
     • Optimised leveraging of previously translated content – improved fuzzy matching
     • Standardisation & efficient exchange of TM resources
  25. Latest developments in TM research 1/4
     • Challenge: ability to have some context for the proposed match
     • Solution provided by the full-text approach and the alignment of source and target documents at the paragraph level (examples: MultiTrans, LogiTrans, Lingotek)
  26. Latest developments in TM research 2/4
     • Challenge: improve match recall (find ALL matches available in the TM for our source segment)
     • Solution developed: structure-based technique which uses character string matching algorithms to look for a match not only in segments but also in sub-parts of the segments (example: DéjàVu X)
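
The idea of looking for matches inside sub-parts of segments can be illustrated with a longest-common-run search. The sketch below is not the algorithm used by DéjàVu X, which the slides do not describe in detail; it swaps in Python's `difflib.SequenceMatcher.find_longest_match` and compares word tokens rather than raw characters so that results fall on word boundaries. The function names and the `min_words` cut-off are hypothetical.

```python
from difflib import SequenceMatcher


def longest_shared_subsegment(new_segment, stored_segment):
    """Return the longest run of consecutive words that both segments share."""
    a = new_segment.split()
    b = stored_segment.split()
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    m = matcher.find_longest_match(0, len(a), 0, len(b))
    return " ".join(a[m.a:m.a + m.size])


def subsegment_hits(new_segment, stored_segments, min_words=3):
    """List (stored segment, shared sub-segment) pairs with at least min_words shared words."""
    hits = []
    for stored in stored_segments:
        shared = longest_shared_subsegment(new_segment, stored)
        if len(shared.split()) >= min_words:
            hits.append((stored, shared))
    return hits


stored_segments = [
    "Key milestones in the development of Intracom can be found in the following sections.",
    "Every employee is expected to act in accordance with the Business Principles.",
]
hits = subsegment_hits("Details can be found in the following table.", stored_segments)
for stored, shared in hits:
    print(repr(shared), "found in:", stored)
```

A whole-segment comparison might score the new sentence too low to be offered as a fuzzy match, yet the shared phrase is still recoverable, which is the gain in recall the slide refers to.
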
  27. Latest developments in TM research 3/4
     • Challenge: improve match precision (find the CORRECT match for our source segment)
     • Solution developed: content-based matching techniques
       • each segment is annotated with grammatical information and constitutes a ‘translation pattern’
       • matches are sought by a deep-structure pattern recognition method which looks beyond the surface appearance of segments (Masterin)
  28. Latest developments in TM research 4/4
     • Challenge: What can the system offer if it cannot find any exact or fuzzy match in its database?
     • Solution developed: implementation of Machine Translation techniques
       • if, for a source segment, the system finds two sub-segments that exist in two different segments stored in the TM database, it uses Example-Based Machine Translation techniques to put together the two sub-segments to form a new suggested match (DéjàVu X)
       • the system constructs and suggests a fuzzy match from the available resources in the database (such as a lexicon) by applying translation heuristics (Masterin)
  29. Current trends in TM research
     Functionality optimization & expansion
     • Optimizing interoperability (pressure to adopt the TMX and SRX standards to facilitate the exchange of resources between different TM systems)
     • Expanding the scope of use: assist in the translation of general texts as well as technical texts (a system that relies less on repetition and more on the linguistic resources it contains)
     • Integrating TM modules into Content Management Systems
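
As an illustration of what exchanging resources via TMX involves, the sketch below serialises translation units into a minimal TMX-style XML document using Python's standard `xml.etree.ElementTree`. The element names (`tmx`, `header`, `body`, `tu`, `tuv`, `seg`) and header attributes follow my reading of TMX 1.4 and should be checked against the specification before real use; the tool name `tm-sketch`, the `build_tmx` helper and the language codes `en-GB` and `it-IT` are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# ElementTree serialises this qualified name as the standard xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tmx(units, src_lang, tgt_lang):
    """Serialise (source, target) pairs as a minimal TMX-style document (a sketch, not validated)."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "tm-sketch",      # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "none",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for src, tgt in units:
        tu = ET.SubElement(body, "tu")    # one translation unit
        for lang, text in ((src_lang, src), (tgt_lang, tgt)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")


tmx_text = build_tmx(
    [("Every employee is expected to act in accordance with the Business Principles.",
      "Ogni dipendente è tenuto ad agire in conformità ai Principi commerciali.")],
    "en-GB",
    "it-IT",
)
print(tmx_text)
```

Because the format is plain XML with one `tu` element per translation unit, a memory built in one tool can in principle be imported into another, which is the interoperability the slide refers to. SRX plays the complementary role of exchanging segmentation rules.
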
  30. Future directions in TM research
     Enhanced access to linguistic resources & deployment
     • Provision of tools which integrate linguistic resources (such as bilingual corpora, parallel or aligned; glossaries; dictionaries, online or on CD-ROMs) in the TM database efficiently, easily, quickly and on a large scale
     • Development of language resources (glossaries, dictionaries) as add-ins that will be sold along with the TM application
     • Optimization of the deployment of acquired resources (improve algorithms for search and retrieval of matches => more relevant results, quicker + with useful linguistic information)
     • Exploitation of the Web as a source of bilingual (or monolingual) corpora
  31. Technology should not lead us!
     Translators shouldn’t be afraid of technology, but keep an open mind and seek to be informed about these systems; there are solutions available that can take care of the grunt work and, thus, let the translator focus on the creative part of translation.
