Introduction To Translation Technologies


A subset of the presentation that I use for my "Introduction to translation technologies" course at Lessius Hogeschool, Antwerp (Belgium).

Published in: Technology, Business

  1. Introduction to translation technologies (xenotext). Gerrit Sanders. Computer-Assisted Translation
  2. Computer-assisted translation: Computer-assisted translation (CAT), or computer-aided translation, is a translation process in which a human translator uses software to obtain a higher degree of precision and efficiency.
  3. Computer-assisted translation: Typical components of a CAT solution include: translation memory (TM); data mining tools (alignment and term extraction); translation editor; quality assurance; termbase; translation management system (TMS).
  4. Translation memory (TM)
  5. Translation memory: A translation memory (TM) is a database that stores sentences and their translations for reuse in new translation projects. Example: "This is a sentence." is stored with "Ceci est une phrase."
  6. Translation unit: A record in the translation memory is called a translation unit (TU). Source segment: "This is a sentence." Target segment: "Ceci est une phrase." Information fields: Created on: 18/09/2006; Created by: Gerrit; Customer: ACME; Project: Training.
  7. Segmentation: Segmentation is the process of splitting the new source text into logical, reusable units. Segmentation can be either sentence-based or paragraph-based. Paragraph-based segmentation: (1) Welcome to Brussels; (2) Brussels is the capital of Belgium. It is officially bilingual. Sentence-based segmentation: (1) Welcome to Brussels; (2) Brussels is the capital of Belgium. (3) It is officially bilingual.
  8. Match types (translation memory): No match (0%): the new source segment is not found in the TM. Fuzzy match (99% or lower): the new source segment is similar (but not identical) to a source segment found in the TM. Exact match (100%): the new source segment is identical to a source segment found in the TM. Context match (101% ??): the new source segment is identical to a source segment found in the TM and they both have the same context.
  9. TMX: • Most translation memory tools support TMX (Translation Memory eXchange), an XML-based open standard for the exchange of translation memory data. • TMX is developed and maintained by LISA. • TMX does not ensure 100% compatibility between different translation tools: for example, segmentation or formatting may be handled in different ways.
  10. SRX: • SRX (Segmentation Rules eXchange) is an XML-based open standard for the exchange of segmentation rules. • Without SRX, TMX leverage may be lower than expected. • SRX is developed and maintained by LISA. • SRX is currently not supported by SDL Trados.
  11. Translation editor
  12. Translation editor: • A translation editor is the translator's working environment, offering easy access to source and target segments. • Translation editors typically include spelling checkers in a wide variety of languages, and may enable the user to add comments or status indications to each translation. • File filters convert the source document to a translatable (or localizable) format, such as XLIFF.
  13. File filters: [Diagram: Source Document → File filters → Translation Editor (XLIFF) → File filters → Target Document. Formats shown on both sides: HTML, DLL, EXE, PowerPoint, InDesign, PHP, SGML, FrameMaker, DOCX, PDF, RTF, QuarkXPress, OpenOffice, Excel, TXT, XML, DITA, PageMaker.]
  14. XLIFF: • XLIFF (XML Localization Interchange File Format) is an XML-based open standard for translatable (or localizable) files. • XLIFF is developed and maintained by OASIS. • There are various "flavours" of XLIFF (e.g. SDLXLIFF), which in practice complicates the interchange of XLIFF data between different tools.
  15. XLIFF: [Diagram: XLIFF (localization data: source, target) and skeleton (other data).]
  16. Alignment
  17. Alignment: Alignment is the process in which specialized software compares a source text with its translation and matches equivalent segments, e.g. for the purpose of creating a translation memory. In a semi-automatic alignment process, the alignment results are reviewed and misalignments are corrected by a human linguist.
  18. Alignment process: [Diagram: legacy documents (source file, target file) → segmentation + alignment → revision → export (TMX) → import (translation memory).]
  19. Termbase
  20. Example entry structure: Entry: Subject, Note. English: Definition (Source); Term (Gender, Source); Term (Gender, Source). French: Definition (Source); Term (Gender, Source).
  21. Concept-oriented termbases: [Image captioned "Your concept may look like this".] All terms and synonyms referring to the same concept should be stored in the same entry: car, motorcar, automobile, voiture, bagnole, ... This ensures that each language in your termbase can be used as source or target language.
  22. TBX: • TBX (TermBase eXchange) is an XML-based open standard for exchanging structured terminological data. • The TBX standard is developed by LISA and has also been published as an ISO standard.
  23. Term extraction: Term extraction (or terminology extraction) is the process of extracting mono- or bilingual lists of potentially interesting terms from a selection of electronic texts.
  24. Terminology extraction: Linguistic term extraction: • uses grammatical information to identify term candidates (and their translations) • is language dependent. Statistical term extraction: • looks for repeated sequences of lexical items • is language independent.
  25. Quality assurance (QA)
  26. Quality assurance: Quality assurance (QA) tools detect formal errors in translations and/or translation memories, and enable their correction. Traceable errors include omissions, inconsistent translations, punctuation differences, formatting problems, terminology errors, etc. • QA tools do NOT guarantee a flawless translation!
  27. The end...
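
To illustrate slides 5 to 8, here is a minimal Python sketch that models a translation memory as a list of translation units and classifies a lookup as no match, fuzzy match, exact match, or context match. The names `TranslationUnit`, `previous_source`, and `lookup` are hypothetical. Using `difflib.SequenceMatcher` as the similarity measure, treating the preceding source segment as "context", and the 75% fuzzy threshold are all assumptions made for this example; real CAT tools use their own scoring and context definitions.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


@dataclass
class TranslationUnit:
    source: str                 # source segment
    target: str                 # target segment
    previous_source: str = ""   # assumed notion of context: the preceding source segment


def lookup(segment: str, previous: str, tm: List[TranslationUnit],
           fuzzy_threshold: float = 0.75) -> Tuple[str, Optional[TranslationUnit]]:
    """Classify a new source segment against the TM (illustrative only)."""
    exact = [unit for unit in tm if unit.source == segment]
    # Context match: identical segment AND identical context.
    for unit in exact:
        if unit.previous_source == previous:
            return "context match", unit
    # Exact match: identical segment, different context.
    if exact:
        return "exact match", exact[0]
    # Fuzzy match: the most similar segment, if it is similar enough.
    best, best_score = None, 0.0
    for unit in tm:
        score = SequenceMatcher(None, segment, unit.source).ratio()
        if score > best_score:
            best, best_score = unit, score
    if best is not None and best_score >= fuzzy_threshold:
        return "fuzzy match", best
    return "no match", None


tm = [TranslationUnit("This is a sentence.", "Ceci est une phrase.",
                      previous_source="Welcome to Brussels")]
match_type, unit = lookup("This is a short sentence.", "", tm)
print(match_type, unit.target if unit else None)
```

The lookup checks for identical segments first, so an exact or context match is never reported as a fuzzy match with a high score.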
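
The two segmentation modes on slide 7 can be sketched in Python as follows. The function names are hypothetical. The rules are deliberately naive assumptions: a blank line ends a paragraph, and a sentence ends at ".", "!" or "?" followed by whitespace. Real segmentation rules also need exceptions, for instance for abbreviations.

```python
import re
from typing import List


def paragraph_segments(text: str) -> List[str]:
    """Split on blank lines and normalize whitespace inside each paragraph."""
    paragraphs = re.split(r"\n\s*\n", text)
    return [" ".join(p.split()) for p in paragraphs if p.strip()]


def sentence_segments(text: str) -> List[str]:
    """Split each paragraph after '.', '!' or '?' when followed by whitespace."""
    segments = []
    for paragraph in paragraph_segments(text):
        segments.extend(s for s in re.split(r"(?<=[.!?])\s+", paragraph) if s)
    return segments


text = ("Welcome to Brussels\n\n"
        "Brussels is the capital of Belgium. It is officially bilingual.")

print(paragraph_segments(text))  # 2 segments, as in the paragraph-based column
print(sentence_segments(text))   # 3 segments, as in the sentence-based column
```

Because two tools may apply different rules of this kind to the same text, segments stored by one tool may not be found as exact matches by another.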
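
Statistical term extraction (slide 24) looks for repeated sequences of lexical items. A minimal Python sketch of that idea counts word n-grams and keeps those that occur at least twice. The function name `term_candidates` and the parameters `min_n`, `max_n`, and `min_count` are hypothetical, and the sketch makes simplifying assumptions: it lowercases the text, ignores punctuation, and lets sequences cross sentence boundaries.

```python
import re
from collections import Counter
from typing import Dict


def term_candidates(text: str, min_n: int = 2, max_n: int = 3,
                    min_count: int = 2) -> Dict[str, int]:
    """Return repeated word sequences (n-grams) with their frequencies."""
    tokens = re.findall(r"\w+", text.lower())
    counts: Counter = Counter()
    for n in range(min_n, max_n + 1):
        for start in range(len(tokens) - n + 1):
            counts[" ".join(tokens[start:start + n])] += 1
    return {term: count for term, count in counts.items() if count >= min_count}


text = ("The translation memory stores segments. "
        "A translation memory is a database.")
print(term_candidates(text))
```

The sketch uses no grammatical information, so it works for any language that separates words with spaces, but it returns candidates that a terminologist still has to review.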
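
Some of the formal errors listed on slide 26 can be detected with simple rules. The Python sketch below checks a list of (source, target) pairs for omissions, differences in final punctuation, and inconsistent translations of the same source segment. The function name `qa_check` and the exact rules are assumptions chosen for illustration; they do not describe how any specific QA tool works.

```python
from typing import Dict, List, Tuple

END_PUNCTUATION = set(".!?:;")


def qa_check(pairs: List[Tuple[str, str]]) -> List[Tuple[int, str]]:
    """Return (segment index, issue) for each formal error found."""
    issues = []
    first_target: Dict[str, str] = {}   # first translation seen per source segment
    for index, (source, target) in enumerate(pairs):
        source, target = source.strip(), target.strip()
        # Omission: the source has text but the target is empty.
        if source and not target:
            issues.append((index, "omission"))
            continue
        # Punctuation difference: final punctuation marks do not agree.
        source_end, target_end = source[-1:], target[-1:]
        if ((source_end in END_PUNCTUATION or target_end in END_PUNCTUATION)
                and source_end != target_end):
            issues.append((index, "punctuation difference"))
        # Inconsistency: the same source was translated differently earlier.
        if first_target.setdefault(source, target) != target:
            issues.append((index, "inconsistent translation"))
    return issues


pairs = [
    ("This is a sentence.", "Ceci est une phrase."),
    ("Welcome to Brussels", ""),
    ("It is officially bilingual.", "Elle est officiellement bilingue"),
    ("This is a sentence.", "C'est une phrase."),
]
for index, issue in qa_check(pairs):
    print(index, issue)
```

Checks like these only flag formal problems. They cannot judge whether a translation is accurate, which is why QA tools do not guarantee a flawless translation.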