OAXAL

Published in: Technology

  1. 1. <ul><ul><li>OAXAL </li></ul></ul><ul><ul><li>Beyond DITA </li></ul></ul>Andrzej Zydroń: azydron@xml-intl.com Santa Clara April 2007
  2. 2. OAXAL <ul><li>Open Architecture for XML Authoring and Localization </li></ul><ul><li>DITA – a big step in the right direction </li></ul><ul><ul><li>Reuse at the topic level </li></ul></ul><ul><ul><li>Reduced costs of implementing XML </li></ul></ul><ul><li>OAXAL = Comprehensive Standards based framework for XML Authoring and Localization </li></ul><ul><ul><li>Candidate OASIS Reference Architecture TC </li></ul></ul><ul><ul><li>Further significant cost reduction </li></ul></ul>
  3. 3. OAXAL (architecture diagram) Components shown: XML 1.0, Unicode 5.0, Unicode TR29, W3C ITS, SRX, GMX, an XML vocabulary (e.g. DITA), xml:tm, XLIFF, TMX, Author Memory, Translation Memory
  4. 4. OAXAL <ul><li>W3C ITS Document Rules </li></ul><ul><li>Unicode TR29 </li></ul><ul><li>LISA OSCAR SRX </li></ul><ul><li>LISA OSCAR xml:tm </li></ul><ul><li>LISA OSCAR TMX </li></ul><ul><li>LISA OSCAR GMX </li></ul><ul><li>OASIS XLIFF </li></ul><ul><li>OASIS DITA, or any component based XML Vocabulary </li></ul>
  5. 5. W3C ITS <ul><li>http://www.w3.org/International/its </li></ul><ul><li>Internationalization Tag Set </li></ul><ul><ul><li>Develop a set of elements and attributes that support the internationalization and localization of XML documents </li></ul></ul><ul><ul><li>provides best practice techniques </li></ul></ul><ul><li>Very ambitious and far reaching review of XML I18N </li></ul><ul><li>Provides definitive mechanisms and recommendations </li></ul><ul><ul><li>Scope, Translatability, Inline elements </li></ul></ul><ul><ul><li>Document Rules </li></ul></ul>
  6. 6. Unicode <ul><li>http://www.unicode.org/reports/tr29/ </li></ul><ul><li>Allows all possible character codes to be included in one format </li></ul><ul><ul><li>No conversion nightmares </li></ul></ul><ul><li>Underpins XML character encoding </li></ul><ul><li>Current version 5.0 </li></ul><ul><li>ISO equivalent ISO 10646 </li></ul><ul><li>16 bit and 32 bit extended encoding </li></ul><ul><ul><li>Enough to encode all possible characters in all scripts </li></ul></ul><ul><li>Technical Reports – TR29 Text Boundaries </li></ul>
  7. 7. TMX <ul><li>http://www.lisa.org/standards/tmx </li></ul><ul><li>Translation Memory Exchange </li></ul><ul><li>Current version 1.4b – 2.0 under development </li></ul><ul><li>Allows for the interchange of translation memories between different vendor systems </li></ul><ul><ul><li>No translation vendor lock-in </li></ul></ul><ul><ul><li>Free exchange of translation assets </li></ul></ul>
  8. 8. SRX <ul><li>http://www.lisa.org/standards/srx </li></ul><ul><li>Segmentation Rules Exchange </li></ul><ul><li>How sentences are segmented </li></ul><ul><li>Allows for the exchange of segmentation rules using regular expressions </li></ul><ul><li>Mandated by xml:tm, complements TMX </li></ul>
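The idea of exchanging segmentation rules as regular expressions can be sketched in Python. This is a minimal illustration, not SRX itself: the `RULES` list, the `segment` function and the sample abbreviations are assumptions made for the example, loosely modelled on the SRX notion of break and no-break rules that pair a "before" pattern with an "after" pattern.

```python
import re

# Hypothetical rules, loosely modelled on SRX: (is_break, before, after).
# Rules are tried in order at every position; the first one that matches wins.
RULES = [
    (False, r"\b(?:Mr|Mrs|Dr|e\.g|i\.e)\.", r"\s"),  # no break after an abbreviation
    (True, r"[.?!]", r"\s+[A-Z]"),                   # break after sentence punctuation
]


def segment(text, rules=RULES):
    """Split text into segments at positions where a break rule matches."""
    segments, start = [], 0
    for pos in range(1, len(text)):
        for is_break, before, after in rules:
            before_ok = re.search(f"(?:{before})$", text[:pos])
            after_ok = re.match(after, text[pos:])
            if before_ok and after_ok:
                if is_break:
                    segments.append(text[start:pos].strip())
                    start = pos
                break  # first matching rule decides this position
    tail = text[start:].strip()
    if tail:
        segments.append(tail)
    return segments
```

Because the rules are plain data, the same list could be shared between an authoring tool and a translation tool so that both segment text identically, which is the point of exchanging them.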
  9. 9. GMX <ul><li>http://www.lisa.org/standards/gmx </li></ul><ul><li>Global Information Management Metrics Exchange </li></ul><ul><ul><li>Proposed Standard </li></ul></ul><ul><ul><li>Tripartite </li></ul></ul><ul><ul><ul><li>GMX/V – Volume, awaiting public comment phase </li></ul></ul></ul><ul><ul><ul><li>GMX/C – Complexity, initial specification </li></ul></ul></ul><ul><ul><ul><li>GMX/Q – Quality </li></ul></ul></ul><ul><li>GILT Industry standard for defining and exchanging Word and Character count, and other relevant metric data </li></ul><ul><li>GMX/V OSCAR Standard Feb 2007 </li></ul><ul><li>Allows for quantifying job complexity </li></ul><ul><li>Uses current industry best practices </li></ul><ul><li>Allows for verification </li></ul><ul><ul><li>Canonical form </li></ul></ul><ul><ul><li>Unicode encoding </li></ul></ul>
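A word and character count of the kind GMX/V standardizes might look roughly like the following Python sketch. It is an assumption-laden simplification: the `count_text` function, the word pattern and the choice to exclude whitespace from the character count are illustrative only and do not reproduce the counting rules of the GMX/V specification.

```python
import re
import unicodedata


def count_text(text):
    """Return simplified word and character counts for a piece of text."""
    # Normalize first so that equivalent Unicode sequences count the same way.
    normalized = unicodedata.normalize("NFC", text)
    # Assumption: a word is a run of word characters, optionally joined by
    # an apostrophe or hyphen (e.g. "don't", "far-reaching").
    words = re.findall(r"\w+(?:['’-]\w+)*", normalized)
    # Assumption: whitespace is not counted as a character.
    characters = sum(1 for ch in normalized if not ch.isspace())
    return {"words": len(words), "characters": characters}
```

Agreeing on one normalized form before counting is what makes the figures verifiable by both the customer and the supplier.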
  10. 10. XLIFF <ul><li>http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=xliff </li></ul><ul><li>XLIFF – XML Localization Interchange File Format </li></ul><ul><li>Current status </li></ul><ul><ul><li>XLIFF 1.2 officially approved in 2007 </li></ul></ul><ul><ul><ul><li>(X)HTML XLIFF 1.1 Representation Guide approved and published </li></ul></ul></ul><ul><ul><ul><li>PO / POT XLIFF 1.1 Representation Guide approved and published </li></ul></ul></ul><ul><ul><ul><li>Java / Windows / .Net XLIFF 1.1 Representation Guide in late stage drafts </li></ul></ul></ul>
  11. 11. DITA <ul><li>http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=dita </li></ul><ul><li>Darwin Information Typing Architecture </li></ul><ul><ul><li>Donated by IBM to OASIS </li></ul></ul><ul><li>Reuse at Topic, Task, Concept level </li></ul><ul><li>Introduces fine level of granularity to publications </li></ul><ul><li>Individual documents can be reused in different publications </li></ul><ul><li>Publication map to create publication from individual components </li></ul>
  12. 12. xml:tm <ul><li>http://www.lisa.org/standards/xmltm </li></ul><ul><li>XML based Text Memory </li></ul><ul><ul><li>Radical rethink of how to handle Translation Memory </li></ul></ul><ul><ul><li>Donated by XML INTL to LISA OSCAR </li></ul></ul><ul><ul><li>LISA OSCAR standard in Feb 2007 </li></ul></ul><ul><li>Takes the DITA reuse principle down to sentence level </li></ul><ul><ul><li>Author Memory </li></ul></ul><ul><ul><li>Translation Memory </li></ul></ul>
  13. 13. xml:tm and W3C ITS <ul><li>W3C ITS scoping/translatability and 'inline' rules will be used by xml:tm during document namespace seeding. </li></ul><ul><li>Scoping rules define text that is not translatable (including attributes). </li></ul><ul><li>Inline (segmentation) rules define which elements are 'inline' and should not break linguistic integrity during segmentation </li></ul>
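How translatability rules could drive the selection of text can be sketched in Python. The rule document below uses the `its:translateRule` element with `selector` and `translate` attributes from ITS 1.0; everything else is an assumption for illustration: the sample document, the `non_translatable` function, and the shortcut of prefixing each selector with "." because the standard-library ElementTree supports only a subset of XPath. Inheritance of the translate setting by child elements is not modelled.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"

RULES_XML = f"""
<its:rules xmlns:its="{ITS_NS}" version="1.0">
  <its:translateRule selector="//code" translate="no"/>
</its:rules>"""

# Hypothetical source document, not taken from the slides.
DOC_XML = """
<document>
  <para>Run <code>make install</code> to finish.</para>
  <para>It is very easy to use.</para>
</document>"""


def non_translatable(doc_root, rules_root):
    """Return the set of elements that the rules mark as not translatable."""
    blocked = set()
    for rule in rules_root.findall(f"{{{ITS_NS}}}translateRule"):
        # ElementTree needs a relative path, so "//code" becomes ".//code".
        matches = doc_root.findall("." + rule.get("selector"))
        if rule.get("translate") == "no":
            blocked.update(matches)
        else:
            # Assumption: a later rule overrides an earlier one.
            blocked.difference_update(matches)
    return blocked
```

A seeding step could skip every element in the returned set and wrap only the remaining text in text units.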
  14. 14. xml:tm and Unicode TR29 <ul><li>The Unicode TR29 word boundary rules are used during XML PCDATA tokenization </li></ul><ul><li>Unicode TR29 forms a core part of the xml:tm namespace implementation. </li></ul>
  15. 15. xml:tm and SRX <ul><li>xml:tm mandates the use of SRX for segmentation </li></ul><ul><li>SRX is totally integrated into the xml:tm framework </li></ul>
  16. 16. xml:tm and GMX-V <ul><li>The xml:tm namespace provides an ideal environment for calculating word and character counts </li></ul><ul><li>Text data is clearly identified (including translatable attributes) </li></ul><ul><li>XML Document is pre-prepared by xml:tm for GMX-V operations </li></ul>
  17. 17. xml:tm and DITA/XML <ul><li>xml:tm is designed to leverage the capabilities of XML to make authoring and translation easy and transparent </li></ul><ul><li>xml:tm is an ideal companion to DITA – it takes the DITA reuse principle down to the sentence level </li></ul><ul><li>xml:tm is localization for DITA/XML documents </li></ul>
  18. 18. xml:tm and TMX <ul><li>xml:tm is an ideal companion for TMX </li></ul><ul><li>Source and target documents are perfectly aligned at the sentence level </li></ul><ul><li>The SRX rules used for segmentation are mandated </li></ul><ul><li>Creation of TMX 1.4b compliant document is simple and automatic </li></ul>
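Once source and target text units are aligned, producing a TMX document is mostly a matter of serialization. The Python sketch below assumes the aligned pairs are already available as a list; the `build_tmx` function, the `creationtool` value and the default language codes are hypothetical, while the element and attribute names follow the TMX 1.4 structure (`tmx`, `header`, `body`, `tu`, `tuv`, `seg`).

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tmx(pairs, src_lang="en", tgt_lang="pl"):
    """Serialize aligned (source, target) sentence pairs as a TMX string."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "oaxal-sketch",   # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "xml:tm",
        "adminlang": src_lang,
        "srclang": src_lang,
        "datatype": "xml",
    })
    body = ET.SubElement(tmx, "body")
    for source, target in pairs:
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src_lang, source), (tgt_lang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")
```

For example, `build_tmx([("sentence", "zdanie"), ("text", "tekst")])` yields one `tu` element per pair, each holding an English and a Polish `tuv`.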
  19. 19. xml:tm and XLIFF <ul><li>xml:tm mandates the use of XLIFF for localization </li></ul><ul><li>xml:tm and XLIFF are in perfect symbiosis </li></ul><ul><li>XLIFF documents can be created using a simple XSLT transformation as all of the preparatory work has been done by xml:tm </li></ul>
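The slide describes an XSLT transformation; the same step is sketched here in Python instead, purely to keep the examples in one language. The `to_xliff` function, its default file name and language codes, and the `id` attribute on `tm:tu` are assumptions. The output uses the XLIFF 1.2 namespace with `file`, `body`, `trans-unit` and `source` elements.

```python
import xml.etree.ElementTree as ET

TM_NS = "urn:xml-Intl-tm"
XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"


def to_xliff(xml_text, original="doc.xml", src_lang="en", tgt_lang="pl"):
    """Create one XLIFF trans-unit for every tm:tu in a seeded document."""
    root = ET.fromstring(xml_text)
    xliff = ET.Element(f"{{{XLIFF_NS}}}xliff", version="1.2")
    file_el = ET.SubElement(xliff, f"{{{XLIFF_NS}}}file", {
        "original": original,
        "source-language": src_lang,
        "target-language": tgt_lang,
        "datatype": "xml",
    })
    body = ET.SubElement(file_el, f"{{{XLIFF_NS}}}body")
    for tu in root.iter(f"{{{TM_NS}}}tu"):
        # The text unit id is reused so the translation can be merged back.
        unit = ET.SubElement(body, f"{{{XLIFF_NS}}}trans-unit", id=tu.get("id"))
        source = ET.SubElement(unit, f"{{{XLIFF_NS}}}source")
        source.text = "".join(tu.itertext()).strip()
    return ET.tostring(xliff, encoding="unicode")


# Hypothetical seeded input; the id attributes are assumed.
SEEDED = """<document xmlns:tm="urn:xml-Intl-tm"><para><tm:te>
  <tm:tu id="1">Namespace is very flexible.</tm:tu>
  <tm:tu id="2">It is very easy to use.</tm:tu>
</tm:te></para></document>"""
```

Because segmentation and identification were done when the document was seeded, the conversion needs no linguistic logic of its own.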
  20. 20. Putting it all together: OAXAL (diagram) xml:tm at the centre, connected to Unicode TR29, SRX, W3C ITS, GMX-V, DITA/XML, TMX and XLIFF
  21. 21. xml:tm <ul><li>XML based text memory </li></ul><ul><li>Revolutionary approach to translating XML documents </li></ul><ul><li>First significant advance in translation memory technology </li></ul><ul><li>Uses XML namespace to transparently embed contextual information </li></ul><ul><li>The one ring that binds them all </li></ul>
  22. 22. xml:tm namespace Example of the use of tm namespace in an XML document: <document xmlns:tm="urn:xml-Intl-tm"> <tm:tm> <section> <para> <tm:te> <tm:tu>Namespace is very flexible.</tm:tu> <tm:tu>It is very easy to use.</tm:tu> </tm:te> </para> </section> </tm:tm> </document>
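Seeding a document with the tm namespace, as in the example above, can be approximated in Python. This is a rough sketch under several assumptions: the `seed` function is hypothetical, sentences are split with a naive regular expression rather than SRX rules, text unit ids are simple counters, inline child elements are ignored, and the enclosing `tm:tm` element from the slide is omitted.

```python
import re
import xml.etree.ElementTree as ET

TM_NS = "urn:xml-Intl-tm"
ET.register_namespace("tm", TM_NS)


def seed(xml_text, text_elements=("para",)):
    """Wrap the sentences of each text element in tm:te / tm:tu elements."""
    root = ET.fromstring(xml_text)
    next_id = 1
    # Collect targets first so the tree is not modified while iterating.
    targets = [el for el in root.iter() if el.tag in text_elements]
    for el in targets:
        if not (el.text and el.text.strip()):
            continue
        te = ET.Element(f"{{{TM_NS}}}te")
        # Naive split after sentence-final punctuation (stand-in for SRX).
        for sentence in re.split(r"(?<=[.?!])\s+", el.text.strip()):
            tu = ET.SubElement(te, f"{{{TM_NS}}}tu", id=str(next_id))
            tu.text = sentence
            next_id += 1
        el.text = None
        el.insert(0, te)
    return root
```

Calling `ET.tostring(seed(...), encoding="unicode")` on a plain document shows the `tm:` prefixed elements embedded alongside the original markup.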
  23. 23. xml:tm namespace (diagram) The source document view (doc, title, section, para, text) shown alongside its tm namespace view (tm, te, tu, sentence)
  24. 24. xml:tm namespace <ul><li>Author memory </li></ul><ul><ul><li>Maintain memory of source text </li></ul></ul><ul><ul><li>Authoring statistics </li></ul></ul><ul><ul><li>Authoring tool input </li></ul></ul><ul><li>Translation memory </li></ul><ul><ul><li>Automatic alignment </li></ul></ul><ul><ul><li>Maintain exact link of source and target text </li></ul></ul><ul><ul><li>Reduce translation costs </li></ul></ul>
  25. 25. xml:tm differencing (diagram, DOM differencing) Original Source Document: tu id="1", "2", "3", "4", "5", "6". Updated Source Document: tu id="1", "2", "3", "4", "7", "6", "8", with text units labelled deleted, modified and new
  26. 26. xml:tm author memory <ul><li>Namespace aware DOM differencing </li></ul><ul><li>Identify changes from the previous version </li></ul><ul><li>Unique text unit identifiers are maintained </li></ul><ul><li>Modification history </li></ul><ul><li>Text units can be loaded into a database </li></ul><ul><li>Authoring environment integration </li></ul>
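The way identifiers survive an update can be sketched in Python. The slides describe namespace-aware DOM differencing; this example swaps in a simpler technique, a sequence diff over the text of the units using `difflib` from the standard library. The `carry_ids` function and its return format are hypothetical.

```python
from difflib import SequenceMatcher


def carry_ids(old_units, new_texts):
    """Assign ids to new_texts, reusing the ids of unchanged units.

    old_units is a list of (id, text); new_texts is a list of strings.
    Returns (units, deleted_ids).
    """
    old_texts = [text for _, text in old_units]
    next_id = max((uid for uid, _ in old_units), default=0) + 1
    units, deleted = [], []
    matcher = SequenceMatcher(a=old_texts, b=new_texts, autojunk=False)
    for tag, a1, a2, b1, b2 in matcher.get_opcodes():
        olds, news = old_units[a1:a2], new_texts[b1:b2]
        if tag == "equal":
            for (uid, _), text in zip(olds, news):
                units.append({"id": uid, "status": "unchanged",
                              "origid": None, "text": text})
            continue
        for k, text in enumerate(news):
            # A changed unit gets a fresh id but remembers where it came from.
            origid = olds[k][0] if k < len(olds) else None
            status = "modified" if origid is not None else "new"
            units.append({"id": next_id, "status": status,
                          "origid": origid, "text": text})
            next_id += 1
        deleted.extend(uid for uid, _ in olds[len(news):])
    return units, deleted
```

With six original units, a change to the fifth and one appended sentence, the resulting ids are 1, 2, 3, 4, 7, 6, 8, the same sequence as in the differencing diagram; the recorded `origid` is what later allows a fuzzy match against the previous translation.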
  27. 28. xml:tm exact alignment (diagram) Original Source Document: tu id="1" to "6". XLIFF File: trans-unit id="1" to "6". Translated Target Document: tu id="1" to "6"
  28. 29. xml:tm exact matching (diagram) Updated Source Document: tu id="1", "2", "3", "4", "7", "6", "8", with text units labelled deleted, modified and new. Matched Target Document: tu id="1", "3", "4", "7", "6", "8". Four text units are exact matches and two require translation
  29. 30. xml:tm matching (diagram) Updated Source Document: tu id="1", "2", "3", "4", "7", "6", "8", "9", with text units labelled modified, new:same and non trans. Matched Target Document: tu id="1", "2", "3", "4", "7", "6", "8", "9". Match labels shown: exact match (four units), fuzzy match (origid="5"), doc leveraged match, DB leveraged match, non translatable, requires translation, requires proofing, requires no translation
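The match categories named in the diagram can be illustrated with a small Python classifier. The decision order, the `match_unit` function, the data shapes and the 0.75 similarity threshold are all assumptions for illustration; fuzzy similarity is computed with `difflib.SequenceMatcher`, which is a stand-in rather than the method used by xml:tm.

```python
from difflib import SequenceMatcher


def match_unit(unit_id, text, previous, database, threshold=0.75):
    """Classify one source text unit and return (match_type, target_text).

    previous maps unit id -> (source, target) from the last translated
    version of the same document; database maps source -> target.
    """
    # Same id and same text as before: reuse the translation as it stands.
    if unit_id in previous and previous[unit_id][0] == text:
        return "exact", previous[unit_id][1]
    # Identical text found elsewhere: a leveraged match.
    if text in database:
        return "leveraged", database[text]
    # Otherwise look for the most similar known sentence.
    best_score, best_source = 0.0, None
    for source in database:
        score = SequenceMatcher(None, source, text).ratio()
        if score > best_score:
            best_score, best_source = score, source
    if best_source is not None and best_score >= threshold:
        return "fuzzy", database[best_source]
    return "requires translation", None
```

Only the last category needs a translator from scratch; in this sketch the other three return an existing target text that can be reused or reviewed.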
  30. 31. xml:tm translated document (diagram) The translated document view (doc, title, section, para, tekst) shown alongside its tm namespace view (tm, te, tu, zdanie)
  31. 32. Traditional translation scenario (workflow diagram) Stages shown: source text, extract, extracted text, tm process, prepared text, translate, translated text, merge, target text, QA
  32. 33. True costs of translation (chart; source: Professor Reinhard Schäler, LRC, ASLIB 2002) Shares shown: Translator; Translation Company profit; Translation Company costs and overheads
  33. 34. xml:tm Translation scenario (workflow diagram) Elements shown: xml:tm source text, extract, extracted text, tm process, exact matching, leveraged matching, XLIFF file, translate, web browser, Internet, merge, xml:tm target text, QA, Automated Workflow
  34. 36. <ul><li>Any Questions? </li></ul>
  35. 37. Contact Details <ul><li>Postal address: </li></ul><ul><ul><li>PO Box 2167 </li></ul></ul><ul><ul><li>Gerrards Cross </li></ul></ul><ul><ul><li>Bucks SL9 8XF </li></ul></ul><ul><ul><li>United Kingdom </li></ul></ul><ul><li>Phone: +44 1753 480 467 </li></ul><ul><li>Fax: +44 1753 480 465 </li></ul><ul><li>Andrzej Zydroń – [email_address] </li></ul>
