OAXAL Presentation Transcript

  • 1.
      • OAXAL
      • Beyond DITA
    Andrzej Zydroń: azydron@xml-intl.com Santa Clara April 2007
  • 2. OAXAL
    • Open Architecture for XML Authoring and Localization
    • DITA – a big step in the right direction
      • Reuse at the topic level
      • Reduced costs of implementing XML
    • OAXAL = Comprehensive Standards based framework for XML Authoring and Localization
      • Candidate OASIS Reference Architecture TC
      • Further significant cost reduction
  • 3. OAXAL
    • [Diagram of the OAXAL components: XML 1.0, Unicode 5.0, Unicode TR29, W3C ITS, SRX, GMX, an XML vocabulary (e.g. DITA), xml:tm, XLIFF, TMX, Author Memory, Translation Memory]
  • 4. OAXAL
    • W3C ITS Document Rules
    • Unicode TR29
    • LISA OSCAR SRX
    • LISA OSCAR xml:tm
    • LISA OSCAR TMX
    • LISA OSCAR GMX
    • OASIS XLIFF
    • OASIS DITA, or any component based XML Vocabulary
  • 5. W3C ITS
    • http://www.w3.org/International/its
    • Internationalization Tag Set
      • Develop a set of elements and attributes that support the internationalization and localization of XML documents
      • provides best practice techniques
    • Very ambitious and far reaching review of XML I18N
    • Provides definitive mechanisms and recommendations
      • Scope, Translatability, Inline elements
      • Document Rules
  • 6. Unicode
    • http://www.unicode.org/reports/tr29/
    • Allows all possible character codes to be included in one format
      • No conversion nightmares
    • Underpins XML character encoding
    • Current version 5.0
    • ISO equivalent ISO 10646
    • 16 bit and 32 bit extended encoding
      • Enough to encode all possible characters in all scripts
    • Technical Reports – TR29 Text Boundaries
  • 7. TMX
    • http://www.lisa.org/standards/tmx
    • Translation Memory Exchange
    • Current version 1.4b – 2.0 under development
    • Allows for the interchange of translation memories between different vendor systems
      • No translation vendor lock-in
      • Free exchange of translation assets
  • 8. SRX
    • http://www.lisa.org/standards/srx
    • Segmentation Rules Exchange
    • How sentences are segmented
    • Allows for the exchange of segmentation rules using regular expressions
    • Mandated by xml:tm, complements TMX
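The idea of exchanging segmentation rules as regular expressions can be illustrated with a minimal sketch. It assumes rules shaped like SRX rules (a break or no-break flag, a pattern for the text before the candidate position, and a pattern for the text after it), with the first matching rule deciding. The rule list, the `segment` function and the abbreviation set are hypothetical and are not taken from the SRX specification or from any SRX rule file.

```python
import re

# Hypothetical rules in the spirit of SRX: (is_break, before_pattern, after_pattern).
# Rules are tried in order; the first rule that matches at a position decides.
RULES = [
    (False, r"\b(?:Mr|Mrs|Dr|e\.g|i\.e)\.", r"\s"),  # exception: no break after abbreviations
    (True, r"[.?!]+", r"\s+"),                       # break after sentence-final punctuation
]


def segment(text, rules=RULES):
    """Split text into segments at positions where a break rule matches first."""
    compiled = [
        (is_break, re.compile("(?:" + before + r")\Z"), re.compile(after))
        for is_break, before, after in rules
    ]
    segments, start = [], 0
    for pos in range(1, len(text)):
        for is_break, before, after in compiled:
            # "before" must match the text ending at pos, "after" the text starting at pos.
            if before.search(text, 0, pos) and after.match(text, pos):
                if is_break:
                    segments.append(text[start:pos].strip())
                    start = pos
                break  # the first matching rule wins, whether it breaks or not
    tail = text[start:].strip()
    if tail:
        segments.append(tail)
    return segments


if __name__ == "__main__":
    for s in segment("Dr. Smith arrived. Namespace is very flexible. It is very easy to use."):
        print(s)
```

Putting the exception rule first is what stops "Dr." from ending a segment; swapping the two rules would change the result, which is why rule order has to be exchanged together with the patterns.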
  • 9. GMX
    • http://www.lisa.org/standards/gmx
    • Global information Management Metrics Exchange
      • Proposed Standard
      • Tripartite
        • GMX/V – Volume, awaiting public comment phase
        • GMX/C – Complexity, initial specification
        • GMX/Q – Quality
    • GILT Industry standard for defining and exchanging Word and Character count, and other relevant metric data
    • GMX/V OSCAR Standard Feb 2007
    • Allows for quantifying job complexity
    • Uses current industry best practices
    • Allows for verification
      • Canonical form
      • Unicode encoding
  • 10. XLIFF
    • http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=xliff
    • XLIFF – XML Localization Interchange File Format
    • Current status
      • XLIFF 1.2 officially approved in 2007
        • (X)HTML XLIFF 1.1 Representation Guide approved and published
        • PO / POT XLIFF 1.1 Representation Guide approved and published
        • Java / Windows / .Net XLIFF 1.1 Representation Guide in late stage drafts
  • 11. DITA
    • www.oasis-open.org/committees/tc_home.php?wg_abbrev=dita
    • Darwin Information Typing Architecture
      • Donated by IBM to OASIS
    • Reuse at Topic, Task, Concept level
    • Introduces fine level of granularity to publications
    • Individual documents can be reused in different publications
    • Publication map to create publication from individual components
  • 12. xml:tm
    • http://www.lisa.org/standards/xmltm
    • XML based Text Memory
      • Radical rethink of how to handle Translation Memory
      • Donated by XML INTL to LISA OSCAR
      • LISA OSCAR standard in Feb 2007
    • Takes the DITA reuse principle down to sentence level
      • Author Memory
      • Translation Memory
  • 13. xml:tm and W3C ITS
    • W3C ITS scoping/translatability and 'inline' rules will be used by xml:tm during document namespace seeding.
    • Scoping rules define text that is not translatable (including attributes).
    • Inline (segmentation) rules define which elements are 'inline' and should not break linguistic integrity during segmentation
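A rough sketch of how translatability and inline rules could shape text extraction follows. The two Python sets stand in for ITS rules and are purely hypothetical (real ITS rules are expressed in XML, not as Python sets), as are the element names `code`, `b` and `i` and the function names.

```python
import xml.etree.ElementTree as ET

NON_TRANSLATABLE = {"code"}  # hypothetical: elements whose content is not translated
INLINE = {"b", "i"}          # hypothetical: elements that must not split a text block


def _flush(buf, out):
    """Emit the buffered text as one whitespace-normalized block, then reset the buffer."""
    text = " ".join("".join(buf).split())
    if text:
        out.append(text)
    buf.clear()


def translatable_blocks(elem, out):
    """Collect translatable text blocks, keeping inline elements inside their parent's block."""
    if elem.tag in NON_TRANSLATABLE:
        return
    buf = [elem.text or ""]
    for child in elem:
        if child.tag in INLINE:
            buf.append("".join(child.itertext()))  # inline text stays in the current block
        else:
            _flush(buf, out)                       # a structural child ends the current block
            translatable_blocks(child, out)
        buf.append(child.tail or "")
    _flush(buf, out)


if __name__ == "__main__":
    doc = ET.fromstring(
        "<doc><para>Press <b>Enter</b> to continue.</para>"
        "<code>x = 1</code><para>Done.</para></doc>"
    )
    blocks = []
    translatable_blocks(doc, blocks)
    print(blocks)
```

The sketch only covers element content; translatable attributes, which the slide also mentions, would need their own rule set.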
  • 14. xml:tm and Unicode TR29
    • The Unicode TR29 text boundary rules for words are used during XML PCDATA tokenization
    • Unicode TR29 forms a core part of xml:tm namespace implementation.
  • 15. xml:tm and SRX
    • xml:tm mandates the use of SRX for segmentation
    • SRX is totally integrated into the xml:tm framework
  • 16. xml:tm and GMX-V
    • The xml:tm namespace provides an ideal environment for calculating word and character counts
    • Text data is clearly identified (including translatable attributes)
    • XML Document is pre-prepared by xml:tm for GMX-V operations
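Because the text units are already marked up, counting reduces to walking the `tm:tu` elements. The sketch below is an illustration only: it uses the regular expression `\w+` as a crude stand-in for TR29 word boundaries and counts only the characters inside words, so it is not a conformant GMX-V count. The namespace URI is copied from the example on slide 22; the function name is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

TM = "urn:xml-Intl-tm"  # namespace URI as written in the slide 22 example


def rough_counts(xml_text):
    """Return (word_count, character_count) over all tm:tu elements.

    Assumption: \\w+ approximates word tokenization; real GMX-V counting
    follows its own rules and Unicode TR29 word boundaries.
    """
    root = ET.fromstring(xml_text)
    words = chars = 0
    for tu in root.iter(f"{{{TM}}}tu"):
        tokens = re.findall(r"\w+", "".join(tu.itertext()))
        words += len(tokens)
        chars += sum(len(token) for token in tokens)
    return words, chars


if __name__ == "__main__":
    sample = (
        '<document xmlns:tm="urn:xml-Intl-tm"><para><tm:te>'
        "<tm:tu>Namespace is very flexible.</tm:tu>"
        "<tm:tu>It is very easy to use.</tm:tu>"
        "</tm:te></para></document>"
    )
    print(rough_counts(sample))
```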
  • 17. xml:tm and DITA/XML
    • xml:tm is designed to leverage the capabilities of XML to make authoring and translation easy and transparent
    • xml:tm is an ideal companion to DITA: it takes the DITA reuse principle down to the sentence level
    • xml:tm is localization for DITA/XML documents
  • 18. xml:tm and TMX
    • xml:tm is an ideal companion for TMX
    • Source and target documents are perfectly aligned at the sentence level
    • The SRX rules used for segmentation are mandated
    • Creation of TMX 1.4b compliant document is simple and automatic
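Since source and target share text unit identifiers, a TMX file can be produced by pairing units with the same id. The following is a sketch under stated assumptions: each `tm:tu` carries an `id` attribute (as in the later diagrams), the namespace URI is the one from slide 22, and the header values such as `creationtool="sketch"` are placeholders. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

TM = "urn:xml-Intl-tm"  # namespace URI as written in the slide 22 example
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"  # serialized as xml:lang


def text_units(xml_text):
    """Map tu id -> text for every tm:tu that has an id attribute (an assumption)."""
    root = ET.fromstring(xml_text)
    return {
        tu.get("id"): "".join(tu.itertext()).strip()
        for tu in root.iter(f"{{{TM}}}tu")
        if tu.get("id") is not None
    }


def to_tmx(source_xml, target_xml, src_lang, tgt_lang):
    """Pair source and target units by id and emit a TMX 1.4 style document."""
    source, target = text_units(source_xml), text_units(target_xml)
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "sketch", "creationtoolversion": "0",  # placeholder values
        "segtype": "sentence", "o-tmf": "xml:tm", "adminlang": "en",
        "srclang": src_lang, "datatype": "xml",
    })
    body = ET.SubElement(tmx, "body")
    for tu_id, src_text in source.items():
        if tu_id not in target:
            continue  # no aligned target unit, nothing to export
        tu = ET.SubElement(body, "tu", tuid=tu_id)
        for lang, text in ((src_lang, src_text), (tgt_lang, target[tu_id])):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")


if __name__ == "__main__":
    src = '<doc xmlns:tm="urn:xml-Intl-tm"><tm:tu id="1">Hello.</tm:tu></doc>'
    tgt = '<doc xmlns:tm="urn:xml-Intl-tm"><tm:tu id="1">Witaj.</tm:tu></doc>'
    print(to_tmx(src, tgt, "en", "pl"))
```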
  • 19. xml:tm and XLIFF
    • xml:tm mandates the use of XLIFF for localization
    • xml:tm and XLIFF are in perfect symbiosis
    • XLIFF documents can be created using a simple XSLT transformation as all of the preparatory work has been done by xml:tm
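The slide describes an XSLT transformation; the same extraction can be sketched in Python to show how little work remains once the text units exist. This is not the author's stylesheet. It assumes `tm:tu` elements with `id` attributes and the slide 22 namespace URI, and the function name and the file name `manual.xml` are hypothetical. The output uses the XLIFF 1.2 namespace and a minimal `file`/`body`/`trans-unit` skeleton.

```python
import xml.etree.ElementTree as ET

TM = "urn:xml-Intl-tm"  # namespace URI as written in the slide 22 example
XLIFF = "urn:oasis:names:tc:xliff:document:1.2"


def to_xliff(xml_text, original, src_lang):
    """Turn every tm:tu into an XLIFF trans-unit that reuses the tu id."""
    root = ET.fromstring(xml_text)
    xliff = ET.Element(f"{{{XLIFF}}}xliff", version="1.2")
    file_el = ET.SubElement(xliff, f"{{{XLIFF}}}file", {
        "original": original, "source-language": src_lang, "datatype": "xml",
    })
    body = ET.SubElement(file_el, f"{{{XLIFF}}}body")
    for tu in root.iter(f"{{{TM}}}tu"):
        unit = ET.SubElement(body, f"{{{XLIFF}}}trans-unit", id=tu.get("id", ""))
        ET.SubElement(unit, f"{{{XLIFF}}}source").text = "".join(tu.itertext()).strip()
    return ET.tostring(xliff, encoding="unicode")


if __name__ == "__main__":
    sample = (
        '<document xmlns:tm="urn:xml-Intl-tm"><para><tm:te>'
        '<tm:tu id="1">Namespace is very flexible.</tm:tu>'
        '<tm:tu id="2">It is very easy to use.</tm:tu>'
        "</tm:te></para></document>"
    )
    print(to_xliff(sample, "manual.xml", "en"))
```

Reusing the tu id as the trans-unit id is what lets the translated units be merged back into the target document without any alignment step.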
  • 20. Putting it all together: OAXAL
    • [Diagram showing xml:tm linked to the other standards: Unicode TR29, SRX, W3C ITS, GMX-V, DITA/XML, TMX, XLIFF]
  • 21. xml:tm
    • XML based text memory
    • Revolutionary approach to translating XML documents
    • First significant advance in translation memory technology
    • Uses XML namespace to transparently embed contextual information
    • The one ring that binds them all
  • 22. xml:tm namespace
    • Example of the use of the tm namespace in an XML document:
        <document xmlns:tm="urn:xml-Intl-tm">
          <tm:tm>
            <section>
              <para>
                <tm:te>
                  <tm:tu>Namespace is very flexible.</tm:tu>
                  <tm:tu>It is very easy to use.</tm:tu>
                </tm:te>
              </para>
            </section>
          </tm:tm>
        </document>
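A document like the one in this example could be produced by a seeding step that wraps the text of each element in `tm:te` and `tm:tu` elements. The sketch below is a simplification under several assumptions: it only handles elements with plain text and no child elements, it splits sentences with a naive regular expression instead of SRX rules, it numbers the units with an `id` attribute, and it omits the `tm:tm` wrapper shown on the slide. The function name `seed` is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

TM = "urn:xml-Intl-tm"  # namespace URI as written in the slide 22 example


def seed(xml_text):
    """Wrap the text of leaf elements in tm:te / tm:tu elements with sequential ids."""
    ET.register_namespace("tm", TM)  # so the output uses the tm: prefix
    root = ET.fromstring(xml_text)
    next_id = 1
    for elem in list(root.iter()):  # snapshot, so elements added below are not revisited
        if len(elem) or not (elem.text and elem.text.strip()):
            continue  # skip elements with children or without text
        # Naive sentence split; a real implementation would apply SRX rules here.
        sentences = re.split(r"(?<=[.?!])\s+", elem.text.strip())
        elem.text = None
        te = ET.SubElement(elem, f"{{{TM}}}te")
        for sentence in sentences:
            tu = ET.SubElement(te, f"{{{TM}}}tu", id=str(next_id))
            tu.text = sentence
            next_id += 1
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(seed(
        "<document><section><para>Namespace is very flexible. "
        "It is very easy to use.</para></section></document>"
    ))
```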
  • 23. xml:tm namespace
    • [Diagram: the source document view (doc, title, section, para, text) shown alongside the tm namespace view, in which the text of each para is wrapped in te elements that contain one tu per sentence]
  • 24. xml:tm namespace
    • Author memory
      • Maintain memory of source text
      • Authoring statistics
      • Authoring tool input
    • Translation memory
      • Automatic alignment
      • Maintain exact link of source and target text
      • Reduce translation costs
  • 25. xml:tm differencing
    • [Diagram: DOM Differencing between the Original Source Document (tu id="1", "2", "3", "4", "5", "6") and the Updated Source Document (tu id="1", "2", "3", "4", "7", "6", "8"), with units marked as deleted, modified and new]
  • 26. xml:tm author memory
    • Namespace aware DOM differencing
    • Identify changes from the previous version
    • Unique text unit identifiers are maintained
    • Modification history
    • Text units can be loaded into a database
    • Authoring environment integration
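How unique identifiers might be carried from one version to the next can be sketched with a plain sequence diff over the text units. This is a simplification, not the namespace-aware DOM differencing named on the slide: it compares flat lists of sentences with `difflib.SequenceMatcher`, and the function name, the status labels and the sample sentences are hypothetical.

```python
from difflib import SequenceMatcher


def carry_ids(old_units, new_texts):
    """Carry ids from the previous version to the new one.

    old_units: list of (id, text) from the previous version.
    new_texts: list of sentence strings from the updated document.
    Returns (new_units, deleted_ids), where new_units is a list of
    (id, text, status) and status is "unchanged", "modified" or "new".
    """
    old_texts = [text for _, text in old_units]
    next_id = max((uid for uid, _ in old_units), default=0) + 1
    new_units, deleted = [], []
    matcher = SequenceMatcher(a=old_texts, b=new_texts, autojunk=False)
    for op, a1, a2, b1, b2 in matcher.get_opcodes():
        if op == "equal":
            # Unchanged text keeps its original identifier.
            for k in range(a2 - a1):
                new_units.append((old_units[a1 + k][0], new_texts[b1 + k], "unchanged"))
            continue
        olds, news = old_units[a1:a2], new_texts[b1:b2]
        for k, text in enumerate(news):
            # Text replacing an old unit counts as modified; anything extra is new.
            status = "modified" if k < len(olds) else "new"
            new_units.append((next_id, text, status))
            next_id += 1
        deleted.extend(uid for uid, _ in olds[len(news):])
    return new_units, deleted


if __name__ == "__main__":
    old = [(1, "A."), (2, "B."), (3, "C."), (4, "D."), (5, "E."), (6, "F.")]
    new = ["A.", "C.", "D.", "E, reworded.", "F.", "G."]
    units, gone = carry_ids(old, new)
    print(units)
    print(gone)
```

Only unchanged text keeps its id; modified and new text always receives a fresh one, which is the property that later makes an id lookup safe as an exact match.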
  • 27.  
  • 28. xml:tm exact alignment
    • [Diagram: the Original Source Document (tu id="1" to "6") and the Translated Target Document (tu id="1" to "6") linked through an XLIFF File whose trans-unit ids "1" to "6" correspond to the tu ids]
  • 29. xml:tm exact matching
    • [Diagram: Exact Matching between the Updated Source Document (tu id="1", "2", "3", "4", "7", "6", "8", with units marked deleted, modified and new) and the Matched Target Document (tu id="1", "3", "4", "7", "6", "8"); four units are labeled "Exact match" and two "requires translation"]
  • 30. xml:tm matching
    • [Diagram: matching between the Updated Source Document (tu id="1", "2", "3", "4", "7", "6", "8", "9") and the Matched Target Document, with units labeled: Exact match (four units); modified, fuzzy match origid="5", requires proofing; new:same, doc leveraged match; DB leveraged match, requires proofing; non translatable, requires no translation; requires translation]
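The matching step can be sketched as a lookup cascade. This covers only part of the picture: exact matches by identifier and leveraged matches by source text from a database, with everything else marked as requiring translation. The fuzzy and non-translatable categories are left out, and the function name, parameter names and dictionaries are hypothetical.

```python
def match_units(updated, previous_translations, memory):
    """Classify each updated source unit.

    updated: list of (id, source_text) from the updated source document.
    previous_translations: {id: target_text} from the previously translated document.
    memory: {source_text: target_text}, a stand-in for a translation memory database.
    Returns a list of (id, target_text_or_None, status).
    """
    result = []
    for uid, text in updated:
        if uid in previous_translations:
            # Same id means the source text is unchanged, so the old translation still applies.
            result.append((uid, previous_translations[uid], "exact match"))
        elif text in memory:
            # Identical text found elsewhere; context may differ, so it should be proofed.
            result.append((uid, memory[text], "leveraged match"))
        else:
            result.append((uid, None, "requires translation"))
    return result


if __name__ == "__main__":
    updated = [(1, "A."), (3, "C."), (7, "E, reworded."), (8, "G.")]
    previous = {1: "a.", 3: "c."}
    memory = {"G.": "g."}
    for row in match_units(updated, previous, memory):
        print(row)
```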
  • 31. xml:tm translated document
    • [Diagram: the translated document view (doc, title, section, para, tekst) shown alongside the translated document tm namespace view, in which the text of each para is wrapped in te elements that contain one tu per translated sentence (zdanie)]
  • 32. Traditional translation scenario
    • [Workflow diagram with the stages: source text, extract, extracted text, tm process, prepared text, translate, translated text, merge, target text, QA]
  • 33. True costs of translation
    • Source: Professor Reinhard Schäler, LRC, ASLIB 2002
    • [Chart of the cost breakdown: Translator, Translation Company profit, Translation Company costs and overheads]
  • 34. xml:tm Translation scenario
    • [Workflow diagram with the stages: xml:tm source text, extract, extracted text, tm process (exact matching, leveraged matching), XLIFF file, translate (over the Internet in a web browser), merge, xml:tm target text, QA; the extract and merge steps are labeled Automated Workflow]
  • 35.  
  • 36.
    • Any Questions?
  • 37. Contact Details
    • Postal address:
      • PO Box 2167
      • Gerrards Cross
      • Bucks SL9 8XF
      • United Kingdom
    • Phone: +44 1753 480 467
    • Fax: +44 1753 480 465
    • Andrzej Zydroń – [email_address]