XML and Localization
An overview of XML and how it is used in the localization world

Published in: Technology

  • 1. XML and LOCALIZATION An overview by @Fantpmas from @YamagataEurope
  • 2. What is XML? And why do you people love acronyms so much?
  • 3. XML stands for eXtensible Markup Language. You can write your own language/dialect. It is a language to store data in a human-readable format.
  • 4. XML is designed to carry data, not to display data like HTML. XML doesn't do anything on its own: nada, zilch!
  • 5. A sample XML document (don't worry, it's all plain text): the root element and 3 child elements.
  • 6. An XML element in detail: start tag, attribute, attribute value, element content, end tag.
  • 7. XML elements can be empty. An empty element written as a start tag immediately followed by an end tag is the same as a self-closing element.
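The sample document from slides 5 to 7 is not preserved in this transcript. As a minimal sketch of the same ideas, assume a hypothetical `book` document (all element and attribute names here are invented for illustration) read with Python's standard `xml.etree.ElementTree` parser:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: one root element with three child elements.
SAMPLE = """<book lang="en">
  <title>XML and Localization</title>
  <author>Anonymous</author>
  <note/>
</book>"""

root = ET.fromstring(SAMPLE)

print(root.tag)                        # name of the root element
print([child.tag for child in root])   # names of the three child elements
print(root.attrib["lang"])             # attribute value from the start tag
print(root.find("title").text)         # element content

# An empty element with separate start and end tags parses to the same
# structure as its self-closing form.
explicit = ET.fromstring("<note></note>")
self_closing = ET.fromstring("<note/>")
print(ET.tostring(explicit) == ET.tostring(self_closing))
```

The last line prints `True`: to a parser, the two spellings of an empty element are the same thing.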
  • 8. There are rules to follow. When all rules are abided by, the XML is well-formed.
  • 9. XML well-formedness rules (not exhaustive) • There must be a root element • Elements must follow naming rules • All elements must be closed • Element names are case sensitive • Elements must be properly nested • Attribute values must be quoted • Attributes can only appear once in the same start tag • Some characters cannot be used as such • Entities must be declared
  • 10. There must be a root element
  • 11. Elements must follow naming rules. Names can only start with • A letter (in any language, including accented letters) • A colon • An underscore. Example: 筆者 is a valid element name.
  • 12. Elements must follow naming rules. Names cannot contain • White spaces • Most punctuation characters except colon, underscore, hyphen, dot, middle dot • Symbol characters. Example: 筆 者 is not a valid element name because it contains a space.
  • 13. All elements must be closed
  • 14. Element names are case sensitive
  • 15. Elements must be properly nested
  • 16. Attribute values must be quoted, with single or double quotes.
  • 17. Attention to those darn quotes: if double quotes delimit the attribute value, you cannot use a literal double quote inside it (write &quot; instead). The same applies to single quotes (&apos;).
  • 18. Attributes must be unique in tags
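The rules on slides 10 to 18 can be checked by handing a snippet to a parser and seeing whether it is accepted. The sketch below assumes Python's standard `xml.etree.ElementTree`; the helper name `is_well_formed` and every snippet are hypothetical examples, one per rule.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Hypothetical snippets: the first is fine, each of the others breaks one rule.
examples = {
    "well-formed": '<doc><p id="1">text</p></doc>',
    "two root elements": "<a/><b/>",
    "name starts with a digit": "<1st>text</1st>",
    "element not closed": "<doc><p>text</doc>",
    "case mismatch": "<Doc>text</doc>",
    "bad nesting": "<b><i>text</b></i>",
    "unquoted attribute value": "<p id=1>text</p>",
    "duplicate attribute": '<p id="1" id="2">text</p>',
}

for label, snippet in examples.items():
    print(f"{label}: {is_well_formed(snippet)}")
```

Only the first snippet is reported as well-formed. A parser stops at the first violation, so a file with several problems has to be fixed and re-checked one error at a time.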
  • 19. Some characters cannot be used • < and & need to be escaped into entities: &lt; and &amp; • Most control characters, such as backspace (tab, line feed and carriage return are allowed)
  • 20. A word about entities: entities are used to represent characters or a sequence of characters that needs to be repeated throughout a document. Syntax: an ampersand, the entity name, a semicolon (&name;).
  • 21. Predefined XML entities: 5 predefined character entities, of which only 2 (&lt; and &amp;) are obligatory • &lt; < less than • &gt; > greater than • &amp; & ampersand • &apos; ' apostrophe • &quot; " quotation mark
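Escaping by hand is error-prone, so it is usually left to a library. A small sketch, assuming Python's standard `xml.sax.saxutils` module (the sample strings are made up):

```python
from xml.sax.saxutils import escape, quoteattr, unescape

text = "Fish & chips < 5 euros"

# escape() replaces &, < and > with the predefined entities.
escaped = escape(text)
print(escaped)                     # Fish &amp; chips &lt; 5 euros
print(unescape(escaped) == text)   # the round trip restores the original

# quoteattr() returns a complete, quoted attribute value and picks a
# delimiter that does not clash with quotes inside the value.
attribute = quoteattr('He said "hi"')
print(attribute)
```

`quoteattr` switches to single-quote delimiters when the value contains double quotes, which is the situation slide 17 warns about.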
  • 22. Entities must be declared. Except for the predefined entities, all entities must be declared in the Document Type Definition (DTD).
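The declaration shown on this slide is not preserved in the transcript. As an illustrative sketch, the hypothetical entity `company` below is declared in an internal DTD and then referenced in the content; the same reference without a declaration is rejected. Python's standard `xml.etree.ElementTree` is assumed.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the entity is declared in the internal DTD subset.
DECLARED = """<!DOCTYPE doc [
  <!ENTITY company "Example Corp">
]>
<doc>Translated by &company;.</doc>"""

# The same content without the declaration.
UNDECLARED = "<doc>Translated by &company;.</doc>"

expanded = ET.fromstring(DECLARED).text
print(expanded)   # the parser replaces the reference with the declared text

try:
    ET.fromstring(UNDECLARED)
    undeclared_ok = True
except ET.ParseError as error:
    undeclared_ok = False
    print("rejected:", error)
```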
  • 23. Other constructs • XML declaration • Stylesheet declaration • Document Type declaration • Comments • CDATA
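Two of these constructs are easy to see in action. In the sketch below (hypothetical content, standard `xml.etree.ElementTree`), a CDATA section and the entity-escaped version of the same text give the parser identical content, and the comment does not show up in the element tree at all.

```python
import xml.etree.ElementTree as ET

# The same text written once inside CDATA and once with escaped entities.
WITH_CDATA = "<doc><!-- a comment --><p><![CDATA[<b>bold</b> & more]]></p></doc>"
WITH_ENTITIES = "<doc><p>&lt;b&gt;bold&lt;/b&gt; &amp; more</p></doc>"

cdata_root = ET.fromstring(WITH_CDATA)
cdata_text = cdata_root.find("p").text
entity_text = ET.fromstring(WITH_ENTITIES).find("p").text

print(cdata_text)
print(cdata_text == entity_text)           # same content after parsing
print([child.tag for child in cdata_root]) # the comment is not in the tree
```

Inside CDATA, markup-like text such as `<b>` is plain character data, which is why tools cannot tell whether it was meant as formatting.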
  • 24. Document Type Definition A DTD defines the structure of an XML document
  • 25. How to declare DTDs: DTDs can be internal (declared inside the XML document itself)
  • 26. How to declare DTDs: DTDs can be external (kept in a separate file that the XML document references)
  • 27. XML Schema XML Schema (*.xsd) is an XML based alternative to DTD
  • 28. DTDs in the localization world. Don't be scared, but XML really is everywhere • TMX • TBX • XLIFF • TTX • SRX • Qt Linguist TS • DITA • ...
  • 29. Encoding All XML parsers must support at least UTF-8 and UTF-16. Default encoding is UTF-8. Always a good idea to specify the encoding
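To see why declaring the encoding matters, the sketch below parses the same hypothetical element stored as UTF-8 and as ISO-8859-1, and then the ISO-8859-1 bytes with no declaration. It assumes Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

element = "<word>café</word>"

as_utf8 = ('<?xml version="1.0" encoding="UTF-8"?>' + element).encode("utf-8")
as_latin1 = ('<?xml version="1.0" encoding="ISO-8859-1"?>' + element).encode("iso-8859-1")

# With a correct declaration both byte sequences yield the same text.
utf8_text = ET.fromstring(as_utf8).text
latin1_text = ET.fromstring(as_latin1).text
print(utf8_text, latin1_text)

# Without a declaration the parser assumes UTF-8, and the ISO-8859-1
# byte for "é" is not valid UTF-8, so parsing fails.
undeclared = element.encode("iso-8859-1")
try:
    ET.fromstring(undeclared)
    undeclared_ok = True
except ET.ParseError as error:
    undeclared_ok = False
    print("rejected:", error)
```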
  • 30. Byte Order Mark: a character to indicate the byte order of an XML document. In UTF-8 it's optional and not even recommended. In UTF-16 it's used to indicate endianness: little-endian or big-endian. If you see stray characters at the start of a file, something's wrong.
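The characters shown on this slide are not preserved in the transcript. One common symptom of a mishandled BOM is the sequence ï»¿, which is what the three bytes of the UTF-8 BOM look like when read as Windows-1252. A rough sketch using Python's standard `codecs` constants follows; `detect_bom` is a hypothetical helper, and it ignores UTF-32.

```python
import codecs

BOMS = [
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16 little-endian"),
    (codecs.BOM_UTF16_BE, "UTF-16 big-endian"),
]

def detect_bom(data):
    """Return the name of the BOM at the start of data, or None."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None

utf8_with_bom = codecs.BOM_UTF8 + b"<doc/>"
utf16_with_bom = codecs.BOM_UTF16_LE + "<doc/>".encode("utf-16-le")

print(detect_bom(utf8_with_bom))
print(detect_bom(utf16_with_bom))
print(detect_bom(b"<doc/>"))             # no BOM present

# The UTF-8 BOM decoded with the wrong codec shows up as stray characters.
garbled = utf8_with_bom.decode("cp1252")
print(garbled)                           # ï»¿<doc/>
```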
  • 31. Complementary technologies. What? There's more of this geek stuff!?
  • 32. Extensible Stylesheet Language Transformations (XSLT): it's XML to transform another XML document!
  • 33. XSL Transformations: XML in; XML, (X)HTML or TXT out
  • 34. How to apply an XSLT • Declare the stylesheet in the XML file itself • Use an application like XMLSpy or xmlstarlet
  • 35. XSLT localization examples • Convert a TTX to a two-column HTML or CSV • Convert a TMX to a TBX • Convert a TMX to a TXT (for spell-check in MS Word) • Convert multilingual XML to TMX/TBX • Generate HTML preview for XML in SDL Trados Studio • Prepare XML files for translation
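To give a feel for what such a conversion involves, here is a sketch of a TMX to two-column CSV conversion. It is deliberately not XSLT: Python's standard library has no XSLT processor, so this uses `xml.etree.ElementTree` to do the same job. The TMX content is a made-up miniature with a simplified header, and `tmx_to_rows` is a hypothetical helper.

```python
import csv
import io
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under its namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Made-up miniature TMX; a real header carries more attributes.
TMX = """<tmx version="1.4">
  <header srclang="en"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Hello</seg></tuv>
      <tuv xml:lang="ja"><seg>こんにちは</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Tokyo</seg></tuv>
      <tuv xml:lang="ja"><seg>東京</seg></tuv>
    </tu>
  </body>
</tmx>"""

def tmx_to_rows(tmx_text, source="en", target="ja"):
    """Return one (source, target) pair per translation unit."""
    rows = []
    for tu in ET.fromstring(tmx_text).iter("tu"):
        segments = {
            tuv.get(XML_LANG): "".join(tuv.find("seg").itertext())
            for tuv in tu.findall("tuv")
        }
        rows.append((segments.get(source, ""), segments.get(target, "")))
    return rows

rows = tmx_to_rows(TMX)
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
print(buffer.getvalue())
```

`itertext()` flattens any inline tags inside a segment, which is fine for a spell-check export but loses information that a round trip back to TMX would need.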
  • 36. XPath: it's a query language to select nodes from an XML document. It's used in XSLT, and also in SDL Trados Studio file types. An XPath expression can, for example, select all elements that have a given attribute with a given value.
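The expression from this slide is not preserved in the transcript, so the element and attribute names below are hypothetical. The sketch selects every `string` element whose `translate` attribute equals `yes`, using the limited XPath subset built into Python's `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Hypothetical resource file with a flag marking translatable strings.
DOC = ET.fromstring("""<strings>
  <string id="ok" translate="yes">OK</string>
  <string id="build" translate="no">v1.2.3</string>
  <string id="cancel" translate="yes">Cancel</string>
</strings>""")

# Select all <string> elements that have translate="yes".
translatable = DOC.findall(".//string[@translate='yes']")
ids = [element.get("id") for element in translatable]
print(ids)   # ['ok', 'cancel']
```

This is the kind of rule a file type definition needs in order to decide which content goes to the translator and which stays locked.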
  • 37. Is XML good for localization? Yes, but not always
  • 38. XML is great for localization • Unicode supported by default • Metadata gives more information about content • Separates content from formatting (to some extent) • Human readable • Easily transformable using XSLT • Excellent for single-sourcing
  • 39. But bad XML is bad • Translatable content in attributes • No metadata to distinguish between content e.g. mixed languages, translatable vs not translatable • CDATA is just plain cheating • Bad implementations of standards (XLIFF)
  • 40. And also • Multilingual XML can be challenging (XSLT can help) 東京 • Big files and one-liners can cause processing problems (pretty-printing can help)
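Pretty-printing a one-liner takes only a few lines. The sketch below assumes Python's standard `xml.dom.minidom` and a made-up document:

```python
from xml.dom import minidom

one_liner = "<doc><p>First</p><p>Second</p></doc>"

# Re-serialize with one element per line and two-space indentation.
pretty = minidom.parseString(one_liner).toprettyxml(indent="  ")
print(pretty)
```

Pretty-printing adds whitespace between elements. That is harmless for data-style XML, but in mixed content (text with inline tags) it can change what the translator or the final layout sees, so it is worth checking the result before sending files out.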
  • 41. Tools, tools, tools • Altova XMLSpy: all-round XML editor • Altova DiffDog: compare XML files • xmlstarlet: command line XML toolkit • EditPad Pro for all encoding/BOM matters
  • 42. "Specification is only theory. In practice, there is only the parser." @Tnkrd