XML and Localization

An overview of XML and how it is used in the localization world

Published in: Technology

Transcript of "XML and Localization"

  1. XML and LOCALIZATION. An overview by @Fantpmas from @YamagataEurope.
  2. What is XML? And why do you people love acronyms so much?
  3. XML stands for eXtensible Markup Language. You can write your own language/dialect. It is a language to store data in a human-readable format.
  4. XML is designed to carry data, not to display data like HTML. XML doesn't do anything on its own: nada, zilch!
  5. A sample XML document (don't worry, it's all plain text): the root element and 3 child elements.
  6. An XML element in detail: start tag, attribute, attribute value, element content, end tag.
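The markup shown on slides 5 and 6 is not preserved in this transcript. As a minimal sketch, assuming a hypothetical `note` document (the element names, the `lang` attribute and the texts are all invented for illustration), Python's standard `xml.etree.ElementTree` module can show the parts those slides name: root element, child elements, attribute, attribute value and element content.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: one root element with three child elements.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<note lang="en">
  <to>Translator</to>
  <from>Project manager</from>
  <body>Please translate this file.</body>
</note>"""

# Parse from bytes so the encoding declaration is honoured by the parser.
root = ET.fromstring(SAMPLE.encode("utf-8"))

print(root.tag)                       # name of the root element
print(root.attrib)                    # attribute names and values of the start tag
print([child.tag for child in root])  # names of the three child elements
print(root.find("body").text)         # element content of <body>
```

Everything between `<note ...>` and `</note>` is plain text, which is why any text editor can open such a file.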
  7. XML elements can be empty. A start tag immediately followed by its end tag is the same as a self-closing element.
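The two forms from slide 7 can be compared directly. This is a small sketch with a hypothetical element name `break`; it only assumes Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Hypothetical element name; both spellings describe the same empty element.
explicit = ET.fromstring("<break></break>")
self_closing = ET.fromstring("<break/>")

def describe(element):
    """Return the tag, the text content and the number of children."""
    return (element.tag, element.text, len(element))

print(describe(explicit))
print(describe(self_closing))
print(ET.tostring(explicit) == ET.tostring(self_closing))
```

A parser reports no difference between the two, so a tool may freely rewrite one form into the other when it saves the file.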
  8. There are rules to follow. When all rules are abided by, the XML is well-formed.
  9. XML well-formedness rules (not exhaustive):
     • There must be a root element
     • Elements must follow naming rules
     • All elements must be closed
     • Element names are case sensitive
     • Elements must be properly nested
     • Attribute values must be quoted
     • Attributes can only appear once in the same start tag
     • Some characters cannot be used as such
     • Entities must be declared
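One way to see the rules above in action is to feed deliberately broken snippets to a parser. The sketch below assumes Python's standard `xml.etree.ElementTree`; the helper name `is_well_formed` and all sample snippets are invented for illustration. Note that it checks well-formedness only, not validity against a DTD or schema.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

# Hypothetical snippets, one per rule from the list above.
CASES = {
    "well-formed": '<doc id="1"><p>text</p></doc>',
    "two root elements": "<a/><b/>",
    "element not closed": "<doc><p>text</doc>",
    "case mismatch": "<doc></Doc>",
    "improper nesting": "<doc><b><i>text</b></i></doc>",
    "unquoted attribute value": "<doc id=1/>",
    "duplicate attribute": '<doc id="1" id="2"/>',
    "raw ampersand": "<doc>R&D</doc>",
    "undeclared entity": "<doc>&nbsp;</doc>",
}

for name, snippet in CASES.items():
    print(f"{name}: {is_well_formed(snippet)}")
```

Only the first snippet is accepted; every other one is rejected with a parse error.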
  10. There must be a root element.
  11. Elements must follow naming rules. Names can only start with:
     • A letter (in any language, including accented letters)
     • A colon
     • An underscore
     Example name: 筆者
  12. Elements must follow naming rules. Names cannot contain:
     • White spaces
     • Most punctuation characters except colon, underscore, hyphen, dot, middle dot
     • Symbol characters
     Counter-example: 筆 者 (contains a space)
  13. All elements must be closed.
  14. Element names are case sensitive.
  15. Elements must be properly nested.
  16. Attribute values must be quoted, with single or double quotes.
  17. Attention to those darn quotes: if double quotes are used, you cannot use double quotes inside the attribute value. The same applies to single quotes.
  18. Attributes must be unique in tags.
  19. Some characters cannot be used:
     • < and & need to be escaped into entities: &lt; and &amp;
     • Most control characters, such as backspace (tab, line feed and carriage return are the allowed exceptions)
  20. A word about entities. Entities are used to represent characters or a sequence of characters that needs to be repeated throughout a document. Syntax: an ampersand, the entity name, a semicolon.
  21. Predefined XML entities. There are 5 predefined character entities; only 2 are obligatory (&lt; and &amp;):
     • &lt;   <   less than
     • &gt;   >   greater than
     • &amp;   &   ampersand
     • &apos;   '   apostrophe
     • &quot;   "   quotation mark
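In practice the escaping is rarely done by hand. As a hedged sketch, the helpers in Python's standard `xml.sax.saxutils` module cover both the predefined entities and the quoting rule from slide 17; the sample strings are invented for illustration.

```python
from xml.sax.saxutils import escape, quoteattr, unescape

# escape() replaces &, < and > in element content.
content = escape("Q&A: is 1 < 2?")
print(content)

# quoteattr() escapes the value and picks a quote style that does not
# clash with quotes inside the value.
print(quoteattr('say "hi"'))   # wrapped in single quotes
print(quoteattr("it's"))       # wrapped in double quotes

# unescape() reverses escape() for &lt;, &gt; and &amp;.
print(unescape(content))
```

`quoteattr` falls back to `&quot;` only when a value contains both kinds of quotes.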
  22. Entities must be declared. Except for predefined entities, all entities must be declared in the Document Type Definition (DTD). Slide labels: DTD, entity declaration, entity.
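The slide's own markup is not preserved, so here is a minimal sketch of an entity declared in an internal DTD, assuming Python's standard `xml.etree.ElementTree`. The entity name `company` and its value are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical entity "company", declared in an internal DTD subset.
DECLARED = """<!DOCTYPE doc [
  <!ENTITY company "Example Corp">
]>
<doc>Translated by &company;.</doc>"""

# The same document without the declaration.
UNDECLARED = "<doc>Translated by &company;.</doc>"

root = ET.fromstring(DECLARED)
print(root.text)  # the entity reference is replaced by its declared value

try:
    ET.fromstring(UNDECLARED)
    undeclared_accepted = True
except ET.ParseError as error:
    undeclared_accepted = False
    print("rejected:", error)
```

The parser expands the declared entity while reading the file, and refuses the document that uses the entity without declaring it.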
  23. Other constructs:
     • XML declaration
     • Stylesheet declaration
     • Document Type declaration
     • Comments
     • CDATA
  24. Document Type Definition. A DTD defines the structure of an XML document.
  25. How to declare DTDs: DTDs can be internal.
  26. How to declare DTDs: DTDs can be external.
  27. XML Schema. XML Schema (*.xsd) is an XML-based alternative to DTD.
  28. DTDs in the localization world. Don't be scared, but XML really is everywhere:
     • TMX
     • TBX
     • XLIFF
     • TTX
     • SRX
     • Qt Linguist TS
     • DITA
     • ...
  29. Encoding. All XML parsers must support at least UTF-8 and UTF-16. Default encoding is UTF-8. It is always a good idea to specify the encoding.
  30. Byte Order Mark. A character to indicate the byte order of an XML document. In UTF-8 it's optional and not even recommended. In UTF-16 it's used to indicate endianness: little-endian or big-endian. If you see these at the start of a file, something's wrong: [example characters missing from the transcript]
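A BOM is easy to check for at the byte level. The sketch below uses the BOM constants from Python's standard `codecs` module; the helper name `detect_bom` is invented. The last lines show what a UTF-8 BOM looks like when the file is wrongly decoded as Windows-1252, which is one common way stray characters end up at the start of a file. Whether those are the characters the slide showed is an assumption, since they are missing from the transcript.

```python
import codecs

# BOM byte sequences for the encodings discussed on the slide.
# (UTF-32 is deliberately left out; its little-endian BOM starts
# with the same two bytes as the UTF-16 one.)
BOMS = [
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16 little-endian"),
    (codecs.BOM_UTF16_BE, "UTF-16 big-endian"),
]

def detect_bom(data: bytes):
    """Return the encoding signalled by a leading BOM, or None."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None

with_bom = "<a/>".encode("utf-8-sig")   # "utf-8-sig" writes a BOM
without_bom = "<a/>".encode("utf-8")    # plain UTF-8, no BOM

print(detect_bom(with_bom))
print(detect_bom(without_bom))

# A UTF-8 BOM misread as Windows-1252 shows up as three stray characters.
print(codecs.BOM_UTF8.decode("cp1252"))
```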
  31. Complementary technologies. What? There's more of this geek stuff!?
  32. Extensible Stylesheet Language Transformations (XSLT). It's XML to transform another XML document!
  33. XSL Transformations: from XML to (X)HTML, XML, or TXT.
  34. How to apply an XSLT: declare the stylesheet in the XML file itself, or use an application like XMLSpy or xmlstarlet.
  35. XSLT localization examples:
     • Convert a TTX to a two-column HTML or CSV
     • Convert a TMX to a TBX
     • Convert a TMX to a TXT (for spell-check in MS Word)
     • Convert multilingual XML to TMX/TBX
     • Generate HTML preview for XML in SDL Trados Studio
     • Prepare XML files for translation
  36. XPath. It's a query language to select nodes from an XML document. It's used in XSLT, and also in SDL Trados Studio file types. The slide's example expression (not preserved in the transcript) selects all elements of a given name that have an attribute with a given name and value.
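The slide's XPath expression is lost, so the following is only an illustration of the same idea: select every element of one name that carries a given attribute with a given value. It assumes Python's standard `xml.etree.ElementTree`, which supports a limited XPath subset; the `string` element and the `translate` attribute are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Hypothetical resource file; element and attribute names are invented.
DOC = """<strings>
  <string id="ok" translate="yes">OK</string>
  <string id="build" translate="no">v1.2.3</string>
  <string id="cancel" translate="yes">Cancel</string>
</strings>"""

root = ET.fromstring(DOC)

# Select all <string> elements whose "translate" attribute equals "yes".
translatable = root.findall(".//string[@translate='yes']")

print([element.get("id") for element in translatable])
print([element.text for element in translatable])
```

A translation tool can use the same kind of expression to decide which elements to expose to the translator and which to lock.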
  37. Is XML good for localization? Yes, but not always.
  38. XML is great for localization:
     • Unicode supported by default
     • Metadata gives more information about content
     • Separates content from formatting (to some extent)
     • Human readable
     • Easily transformable using XSLT
     • Excellent for single-sourcing
  39. But bad XML is bad:
     • Translatable content in attributes
     • No metadata to distinguish between content, e.g. mixed languages, translatable vs. not translatable
     • CDATA is just plain cheating
     • Bad implementations of standards (XLIFF)
  40. And also:
     • Multilingual XML can be challenging (XSLT can help). Example text: 東京
     • Big files and one-liners can cause processing problems (pretty-printing can help)
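As a rough sketch of the pretty-printing remedy mentioned on slide 40, Python's standard `xml.dom.minidom` can re-indent a one-line file. The multilingual sample below is invented and deliberately simplified; it is not a real TMX or XLIFF document.

```python
from xml.dom import minidom

# Hypothetical multilingual one-liner (simplified, not a real exchange format).
ONE_LINER = (
    '<entries><entry><text lang="en">Tokyo</text>'
    '<text lang="ja">東京</text></entry></entries>'
)

# Parse the UTF-8 bytes and write the tree back with two-space indentation.
document = minidom.parseString(ONE_LINER.encode("utf-8"))
pretty = document.toprettyxml(indent="  ")

print(pretty)
print(len(ONE_LINER.splitlines()), "line before,",
      len(pretty.splitlines()), "lines after")
```

Pretty-printing adds whitespace between elements, so it is best applied to data-style XML; in mixed content (text with inline tags) the extra whitespace can change what the translator sees.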
  41. Tools, tools, tools:
     • Altova XMLSpy: all-round XML editor
     • Altova DiffDog: compare XML files
     • xmlstarlet: command line XML toolkit
     • EditPad Pro for all encoding/BOM matters
  42. "Specification is only theory. In practice, there is only the parser." @Tnkrd