DITA and Translation Best Praticices

0 views
1,845 views

Published on

Published in: Technology
0 Comments
1 Like
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
0
On SlideShare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
0
Comments
0
Likes
1
Embeds 0
No embeds

No notes for slide

DITA and Translation Best Praticices

  1. 1. <ul><ul><li>DITA and Translation </li></ul></ul><ul><ul><li>Best Practices </li></ul></ul>Andrzej Zydroń: azydron@xml-intl.com DITA Europe™ Conference 2006
  2. 2. DITA Strengths <ul><li>XML Based Open Standard </li></ul><ul><li>Intelligent architecture </li></ul><ul><li>Topic based authoring </li></ul><ul><li>Reuse </li></ul><ul><li>Extensible </li></ul><ul><li>Powerful built-in features: </li></ul><ul><ul><li>Conditional Meta data based processing </li></ul></ul><ul><ul><li>Automatic Substitution of text </li></ul></ul>
  3. 3. DITA and Translation <ul><li>The Good: </li></ul><ul><ul><li>Topic level reuse </li></ul></ul><ul><ul><li>Maps </li></ul></ul><ul><ul><li>&quot;translate&quot; attribute </li></ul></ul><ul><ul><li>xml:lang </li></ul></ul><ul><li>The Bad: </li></ul><ul><ul><li>Translatable attributes </li></ul></ul><ul><ul><li>Typographical elements </li></ul></ul><ul><li>The Ugly: </li></ul><ul><ul><li>Conref </li></ul></ul><ul><ul><li>Nesting </li></ul></ul>
  4. 4. Translating XML <ul><li>The importance of open standards: </li></ul><ul><li>W3C ITS Document Rules </li></ul><ul><li>Unicode TR29 </li></ul><ul><li>LISA OSCAR SRX </li></ul><ul><li>LISA OSCAR xml:tm </li></ul><ul><li>LISA OSCAR TMX </li></ul><ul><li>LISA OSCAR GMX </li></ul><ul><li>OASIS XLIFF </li></ul>
  5. 5. W3C ITS Document Rules <ul><li>http://www.w3.org/International/its </li></ul><ul><li>Internationalization Tag Set </li></ul><ul><ul><li>Develop a set of elements and attributes that support the internationalization and localization of XML documents </li></ul></ul><ul><ul><li>provide best practice techniques </li></ul></ul><ul><li>Very ambitious and far reaching review of XML Localization </li></ul><ul><li>Document rules specification provides mechanism for defining: </li></ul><ul><ul><li>Scope, Translatability, Directionality, Inline elements </li></ul></ul>
  6. 6. Unicode TR29 <ul><li>http://www.unicode.org/reports/tr29/ </li></ul><ul><li>Text Boundaries </li></ul><ul><li>How to define grapheme clusters (“user characters”), words, and sentences. </li></ul><ul><li>Revision 29-9, Standard Annexe </li></ul><ul><ul><li>forms an integral part of the Unicode Standard, but is published as a separate document. </li></ul></ul>
  7. 7. LISA OSCAR SRX <ul><li>http://www.lisa.org/standards/srx </li></ul><ul><li>Segmentation Rules Exchange </li></ul><ul><li>How sentences are segmented </li></ul><ul><li>Allows for the exchange of segmentation rules using regular expressions </li></ul><ul><li>Complements TMX standard </li></ul>
  8. 8. LISA OSCAR TMX <ul><li>http://www.lisa.org/standards/tmx </li></ul><ul><li>Translation Memory Exchange </li></ul><ul><li>Current version 1.4b </li></ul><ul><li>Allows for the interchange of translation memories between different vendor systems </li></ul><ul><ul><li>No translation vendor lock-in </li></ul></ul><ul><ul><li>Free exchange of translation assets </li></ul></ul>
  9. 9. LISA OSCAR GMX <ul><li>http://www.lisa.org/gmx </li></ul><ul><li>GILT Metrics Exchange </li></ul><ul><ul><li>Proposed Standard </li></ul></ul><ul><ul><li>Tripartite </li></ul></ul><ul><ul><ul><li>GMX/V – Volume, awaiting public comment phase </li></ul></ul></ul><ul><ul><ul><li>GMX/C – Complexity, initial specification </li></ul></ul></ul><ul><ul><ul><li>GMX/Q – Quality </li></ul></ul></ul><ul><li>GILT Industry standard for defining and exchanging Word and Character count, and other relevant metric data </li></ul><ul><li>Allows for quantifying job complexity </li></ul><ul><li>Uses current industry best practices </li></ul><ul><li>Allows for verification </li></ul><ul><ul><li>XLIFF based Canonical form </li></ul></ul><ul><ul><li>Unicode encoding </li></ul></ul>
  10. 10. OASIS XLIFF <ul><li>http://www.oasis-open.org/committees/tc_home.php?wg_abbrev = xliff </li></ul><ul><li>XLIFF – XML Localization Interchange File Format </li></ul><ul><li>Current status </li></ul><ul><ul><li>XLIFF 1.1 Committee Specification (31 Oct 2003) </li></ul></ul><ul><ul><li>XLIFF 1.2 will shortly be approved as Committee Draft, subsequently submitted to OASIS standards review process </li></ul></ul><ul><ul><ul><li>(X)HTML XLIFF 1.1 Representation Guide approved and published </li></ul></ul></ul><ul><ul><ul><li>PO / POT XLIFF 1.1. Representation Guide approved and published </li></ul></ul></ul><ul><ul><ul><li>Java / Windows / .Net XLIFF 1.1 Representation Guide in late stage drafts </li></ul></ul></ul>
  11. 11. LISA OSCAR xml:tm <ul><li>http:// www.lisa.org/standards/xmltm / </li></ul><ul><li>XML based Text Memory </li></ul><ul><ul><li>Radical rethink of how to handle Translation Memory </li></ul></ul><ul><ul><li>Donated by XML INTL to LISA OSCAR </li></ul></ul><ul><ul><li>Version 1.0 approved for public comment in July 2006 </li></ul></ul><ul><li>Takes the DITA reuse principle down to sentence level </li></ul><ul><ul><li>Author Memory </li></ul></ul><ul><ul><li>Translation Memory </li></ul></ul>
  12. 12. DITA and xml:tm <ul><li>Both are about reuse </li></ul><ul><li>DITA reuse is at topic level </li></ul><ul><li>xml:tm reuse is at sentence level </li></ul><ul><li>DITA donated by IBM to OASIS </li></ul><ul><li>xml:tm donated by XML-INTL to LISA OSCAR </li></ul><ul><li>Both integrate like hand in glove </li></ul>
  13. 13. xml:tm <ul><li>XML based text memory </li></ul><ul><li>Revolutionary approach to translating XML documents </li></ul><ul><li>First significant advance in translation memory technology </li></ul><ul><li>Uses XML namespace to transparently embed contextual information </li></ul><ul><li>The one ring that binds them all </li></ul>
  14. 14. xml:tm namespace <ul><li>Text Memory namespace </li></ul><ul><li>Can be mapped onto any XML document </li></ul><ul><li>Vertical view of document in terms of ‘text segments’ </li></ul><ul><li>Can be totally transparent </li></ul>
  15. 15. xml:tm namespace Example of the use of tm namespace in an XML document: <document xmlns:tm=&quot;urn:xml-Intl-tm&quot; > <tm:tm> <section> <para> <tm:te> <tm:tu> Namespace is very flexible. </tm:tu> <tm:tu> It is very easy to use. </tm:tu> </tm:te> </para>
  16. 16. xml:tm namespace doc title section section para tm te sentence sentence tu tu te sentence sentence tu tu te sentence sentence tu tu tm namespace view te text tu text te sentence sentence tu tu para text para text para text para text para text te sentence sentence tu tu te sentence sentence tu tu text original document view
  17. 17. xml:tm namespace Namespace is very simple. It is easy to use. te sentence sentence tu tu original document view tm namespace view < para > </ para > <para> </para> <tm:te id=“e1”> <tm:tu id=“u1.1”> Namespace is very simple. </tm:tu> <tm:tu id=“u1.2”> It is easy to use. </tm:tu> </tm:te> text
  18. 18. <ul><li>Author memory </li></ul><ul><ul><li>Maintain memory of source text </li></ul></ul><ul><ul><li>Authoring statistics </li></ul></ul><ul><ul><li>Authoring tool input </li></ul></ul><ul><li>Translation memory </li></ul><ul><ul><li>Automatic alignment </li></ul></ul><ul><ul><li>Maintain exact link of source and target text </li></ul></ul><ul><ul><li>Reduce translation costs </li></ul></ul>xml:tm namespace
  19. 19. xml:tm DOM differencing Updated Source Document tu id=”1” tu id=”3” tu id=”4” tu id=” 7 ” tu id=”6” d eleted tu id=” 8 ” new Source Document tu id=”1” tu id=”2” tu id=”3” tu id=”4” tu id=”5” tu id=”6” origid=” 5 ” modified
  20. 20. xml:tm Author Memory <ul><li>Namespace aware DOM differencing </li></ul><ul><li>Identify changes from the previous version </li></ul><ul><li>Unique text unit identifiers are maintained </li></ul><ul><li>Modification history </li></ul><ul><li>Text units can be loaded into a database </li></ul><ul><li>Authoring environment integration </li></ul>
  21. 22. xml:tm Translation Memory <ul><li>The tm namespace can be used to create XLIFF files </li></ul><ul><li>Automatic alignment of source and target languages </li></ul><ul><li>Allows for more focused translation matching </li></ul><ul><ul><li>Exa ct matching </li></ul></ul><ul><ul><li>Leveraged matching from document - identical text </li></ul></ul><ul><ul><li>Leveraged matching from database </li></ul></ul><ul><ul><li>Modified text unit matching </li></ul></ul><ul><ul><li>Non translatable text unit identification </li></ul></ul>
  22. 23. xml:tm translation via XLIFF Source Document tu id=”1” tu id=”2” tu id=”3” tu id=”4” tu id=”5” tu id=”6” Translated Document tu id=”1” tu id=”2” tu id=”3” tu id=”4” tu id=”5” tu id=”6” XLIFF Document trans-unit id=”1” trans-unit id=”2” trans-unit id=”3” trans-unit id=”4” trans-unit id=”5” trans-unit id=”6”
  23. 24. xml:tm translated document doc title section section para tekst tm te zdanie zdanie tu tu te zdanie zdanie tu tu te zdanie zdanie tu tu translated tm namespace view translated document view te tekst tu tekst te zdanie zdanie tu tu para tekst para tekst para tekst para tekst para tekst te zdanie zdanie tu tu te zdanie zdanie tu tu
  24. 25. xml:tm exa ct alignment Source Document tu id=” 1 ” tu id=”2” tu id=”3” tu id=”4” tu id=”5” tu id=”6” Translated Document tu id=”1” tu id=”2” tu id=”3” tu id=”4” tu id=”5” tu id=”6” Exa ct alignment
  25. 26. xml:tm exact matching Updated Source Document tu id=” 1 ” tu id=” 2 ” tu id=”3” tu id=”4” tu id=”7” tu id=”6” d eleted tu id=”8” modified new Matched Target Document tu id=”1” tu id=”3” tu id=”4” tu id=” 7 ” tu id=”6” tu id=” 8 ” Exa ct Matching requires translation requires translation Exact match Exact match Exact match Exact match
  26. 27. xml:tm loading DB memory Source Document tu id=” 1 ” tu id=”2” tu id=”3” tu id=”4” tu id=”5” tu id=”6” Translated Document tu id=”1” tu id=”2” tu id=”3” tu id=”4” tu id=”5” tu id=”6” Exa ct alignment DB TMX
  27. 28. xml:tm matching Updated Source Document tu id=” 1 ” tu id=”2” tu id=”3” tu id=”4” tu id=”7” tu id=”6” non trans tu id=”8” new:same Matched Target Document tu id=”1” tu id=”3” tu id=”4” tu id=” 7 ” tu id=”6” tu id=” 8 ” Exa ct Matching requires translation requires proofing fuzzy match doc leveraged match tu id=”9” tu id=”9” DB requires proofing DB leveraged match tu id=”2” requires no translation non translatable Exact match Exact match Exact match Exact match
  28. 29. Traditional Translation Scenario source text Publishing Translation source text extract Extracted text tm process Prepared text Translate Translated text target text target text merge target text QA
  29. 30. True Costs of Translation Source Professor Reinhard Schäler LRC - ASLIB 2002
  30. 31. Putting it all together xml:tm W3C ITS Unicode TR 29 SRX GMX-V DITA TBX/LINK XLIFF TMX
  31. 32. xml:tm Translation Scenario x ml :tm source text Publishing Translator Extracted text tm process XLIFF file Translate x ml :tm target text Web leveraged matching Automatic Process Web service/ interface QA Automatic Process extract merge perfect matching
  32. 34. <ul><li>Use a CMS </li></ul><ul><li>Always use xml:lang attribute on top element </li></ul><ul><li>Avoid translatable attributes </li></ul><ul><li>Keep topic granularity low </li></ul><ul><li>Keep document structure simple – avoid nesting elements </li></ul><ul><li>Use conref carefully </li></ul><ul><ul><li>Linguistically complete phrases </li></ul></ul><ul><ul><li>Proper Nouns as subject </li></ul></ul>DITA Translation Best Practices
  33. 35. <ul><li>Indexterm – be careful where you place </li></ul>DITA Translation Best Practices
  34. 36. <ul><li>Use 'translate' attribute where required </li></ul><ul><li>Use directionality attribute when mixing text with different directionality e.g. English and Hebrew </li></ul><ul><li>Use xml:tm to allow you to maintain author memory. </li></ul><ul><li>Use xml:tm for maintaining translation memory </li></ul>DITA Translation Best Practices

×