XML and LOCALIZATION
An overview by @Fantpmas from @YamagataEurope
What is XML?
And why do you people love acronyms so much?
XML stands for
eXtensible Markup Language
You can write your own
language/dialect

A language to store data in
a human readable format
XML is designed to carry data,
not to display it like HTML
XML doesn't do anything on its own, nada, zilch!
A sample XML document
(Don't worry it's all plain text)
The root element
3 child elements
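A minimal sketch of such a document, read with Python's standard xml.etree.ElementTree module. The <note> vocabulary is invented for illustration and is not the sample from the original slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: one root element (<note>) with 3 child elements.
SAMPLE = """<note>
  <to>Translator</to>
  <from>Project manager</from>
  <body>Please translate this file.</body>
</note>"""

root = ET.fromstring(SAMPLE)
child_names = [child.tag for child in root]

print("root element:", root.tag)
print("child elements:", child_names)
```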
An XML element in detail
Start tag

Element content

Attribute value

Attribute

End tag
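The parts named above can be pulled out of a single element with a few lines of Python. This is only a sketch; the <title> element and its lang attribute are made-up examples.

```python
import xml.etree.ElementTree as ET

# Hypothetical element: <title lang="en">Hello world</title>
element = ET.fromstring('<title lang="en">Hello world</title>')

name = element.tag            # the name used in the start tag and the end tag
attributes = element.attrib   # attribute names mapped to attribute values
content = element.text        # the element content between the two tags

print(name, attributes, content)
```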
XML elements can be empty

is the same as

Self-closing element
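A quick way to convince yourself that both spellings mean the same thing is to parse them and compare the results. The <note> name below is an arbitrary placeholder.

```python
import xml.etree.ElementTree as ET

long_form = ET.fromstring("<note></note>")   # start tag followed by end tag
short_form = ET.fromstring("<note/>")        # self-closing element

# Both parse to an element with no text and no children,
# and both serialize to the same bytes.
same_tree = (
    long_form.tag == short_form.tag
    and long_form.text == short_form.text
    and len(long_form) == len(short_form) == 0
)
same_output = ET.tostring(long_form) == ET.tostring(short_form)
print(same_tree, same_output)
```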
There are rules to follow
When all rules are abided by, the XML is well-formed
XML well-formedness rules
(not exhaustive)
• There must be a root element
• Elements must follow naming rules
• All elements must be closed
• Element names are case sensitive
• Elements must be properly nested
• Attribute values must be quoted
• An attribute can appear only once in the same start tag
• Some characters cannot be used as such
• Entities must be declared
There must be a root element
Elements must follow naming rules
Names can only start with
• A letter (in any language, including accented letters)
• A colon
• An underscore

筆者

筆者
Elements must follow naming rules
Names cannot contain
• White spaces
• Most punctuation characters except colon, underscore,
hyphen, dot, middle dot
• Symbol characters

筆 者

筆 者
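One rough way to test a candidate name is to let a parser decide. The helper below is a hypothetical convenience function, not part of any library: it wraps the name in a self-closing tag and reports whether the parser accepts it. Names containing a colon are left out of the examples because namespace-aware parsers treat the colon specially.

```python
import xml.etree.ElementTree as ET

def is_valid_element_name(name: str) -> bool:
    """Rough check: does the parser accept <name/> as an element?"""
    # Reject empty names and white space up front, so that input such as
    # 'a b="1"' cannot sneak through as an element plus an attribute.
    if not name or any(ch.isspace() for ch in name):
        return False
    try:
        ET.fromstring(f"<{name}/>")
        return True
    except ET.ParseError:
        return False

candidates = ["筆者", "_id", "file-name.v2", "筆 者", "2nd", "-x", "a$b"]
results = {name: is_valid_element_name(name) for name in candidates}
for name, ok in results.items():
    print(repr(name), "valid" if ok else "invalid")
```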
All elements must be closed
Element names are case sensitive
Elements must be properly nested
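The structural rules so far (one root, everything closed, matching case, proper nesting) can be tried out against a real parser. A small sketch, assuming Python's standard parser as the judge; the sample snippets are invented.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

expected = {
    "<p>Hello</p>": True,            # one root, properly closed
    "<a/><b/>": False,               # two roots
    "<p>Hello": False,               # element never closed
    "<Title>Hello</title>": False,   # names differ in case
    "<b><i>Hi</i></b>": True,        # properly nested
    "<b><i>Hi</b></i>": False,       # overlapping elements
}
results = {xml: is_well_formed(xml) for xml in expected}
for xml, ok in results.items():
    print(f"{xml:24} {'well-formed' if ok else 'NOT well-formed'}")
```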
Attribute values must be quoted

Single or double quotes
Attention to those darn quotes
If the value is delimited by double quotes, it cannot contain a literal
double quote (use &quot; instead). The same applies to single quotes (&apos;).
Attributes must be unique in tags
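The attribute rules can be checked the same way. Again a sketch with made-up snippets, using the standard Python parser to decide what is well-formed.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

expected = {
    '<a href="x.html"/>': True,                # double quotes
    "<a href='x.html'/>": True,                # single quotes
    "<a href=x.html/>": False,                 # value not quoted
    '<a title="say "hi""/>': False,            # double quote inside double quotes
    "<a title='say \"hi\"'/>": True,           # switch to single quotes outside
    '<a title="say &quot;hi&quot;"/>': True,   # or escape the inner quotes
    '<a id="1" id="2"/>': False,               # same attribute twice
}
results = {xml: is_well_formed(xml) for xml in expected}
for xml, ok in results.items():
    print(f"{xml:36} {'well-formed' if ok else 'NOT well-formed'}")
```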
Some characters cannot be used
• < and & need to be escaped as entities: &lt; and &amp;
• Most control characters (such as backspace; tab, line feed and
carriage return are allowed)
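In practice you rarely escape by hand. As a sketch, Python's xml.sax.saxutils.escape replaces &, < and > for you, and the parser turns the entities back into characters. The sample string is invented; the last two lines probe how the standard parser treats a tab and a backspace.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "R&D <draft>"
escaped = escape(raw)  # & becomes &amp;, < becomes &lt;, > becomes &gt;
round_trip = ET.fromstring(f"<t>{escaped}</t>").text

def parses(text: str) -> bool:
    """Return True if the parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

tab_ok = parses("<t>a\tb</t>")           # tab is an allowed character
backspace_ok = parses("<t>a\x08b</t>")   # backspace is not

print(escaped, round_trip, tab_ok, backspace_ok)
```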
A word about entities
Entities are used to represent characters or a sequence of
characters that needs to be repeated throughout a document

Syntax: an ampersand, the entity name, then a semicolon
Predefined XML entities
5 predefined character entities, only 2 are obligatory
&lt;      <    less than
&gt;      >    greater than
&amp;     &    ampersand
&apos;    '    apostrophe
&quot;    "    quotation mark
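A short sketch showing the five predefined entities being resolved by a parser, and that >, ' and " may also appear unescaped in element content. The sample text is invented.

```python
import xml.etree.ElementTree as ET

# All five predefined entities, written out in element content.
with_entities = ET.fromstring(
    "<t>&lt;b&gt; Q&amp;A &apos;single&apos; &quot;double&quot;</t>"
).text

# In element content only < and & have to be escaped.
mostly_literal = ET.fromstring(
    "<t>&lt;b> Q&amp;A 'single' \"double\"</t>"
).text

print(with_entities)
print(with_entities == mostly_literal)
```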
Entities must be declared
Except for predefined entities all entities must be declared in
the Document Type Definition

DTD

Entity declaration

Entity
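As an illustration, here is a hypothetical entity called product, declared in an internal DTD and then used in the content. The entity name, its value and the <manual> vocabulary are all invented; the second part shows that the same reference fails when nothing declares it.

```python
import xml.etree.ElementTree as ET

# The DTD (between [ and ]) declares the entity; &product; then uses it.
DOC = """<!DOCTYPE manual [
  <!ENTITY product "SuperWidget 3000">
]>
<manual><p>Thank you for buying the &product;.</p></manual>"""

expanded = ET.fromstring(DOC).find("p").text
print(expanded)

# Without a declaration the reference is an error.
try:
    ET.fromstring("<p>Thank you for buying the &product;.</p>")
    undeclared_accepted = True
except ET.ParseError:
    undeclared_accepted = False
print(undeclared_accepted)
```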
Other constructs
• XML declaration
• Stylesheet declaration
• Document Type declaration
• Comments
• CDATA
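A sketch of one small document that contains all five constructs, in the order listed. File names such as preview.xsl are placeholders. Note how the parser hands back the CDATA content as plain text, with < and & untouched.

```python
import xml.etree.ElementTree as ET

DOC = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'                  # XML declaration
    '<?xml-stylesheet type="text/xsl" href="preview.xsl"?>\n'   # stylesheet declaration
    '<!DOCTYPE note [ <!ELEMENT note (#PCDATA)> ]>\n'           # document type declaration
    '<!-- Comments are for humans -->\n'                        # comment
    '<note><![CDATA[if (a < b && b > c) { ... }]]></note>'      # CDATA section
)

root = ET.fromstring(DOC.encode("utf-8"))
cdata_text = root.text
print(cdata_text)
```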
Document Type Definition
A DTD defines the structure of an XML document
How to declare DTDs
DTDs can be internal

DTD
How to declare DTDs
DTDs can be external
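The two ways of declaring a DTD look roughly like this. It is a sketch with an invented <note> vocabulary and a hypothetical note.dtd file. Python's standard parser reads both documents but does not validate against the DTD and does not fetch the external file.

```python
import xml.etree.ElementTree as ET

# Internal: the DTD sits inside the document, between [ and ].
INTERNAL = """<!DOCTYPE note [
  <!ELEMENT note (to, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<note><to>Translator</to><body>Hello</body></note>"""

# External: the document only points to a separate DTD file.
EXTERNAL = """<!DOCTYPE note SYSTEM "note.dtd">
<note><to>Translator</to><body>Hello</body></note>"""

internal_root = ET.fromstring(INTERNAL)
external_root = ET.fromstring(EXTERNAL)
print(internal_root.tag, external_root.tag)
```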
XML Schema
XML Schema (*.xsd) is an XML-based alternative to DTD
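Because a schema is itself XML, any XML parser can read it. The sketch below parses a small, invented schema for a <note> document and lists the elements it declares. It does not validate anything; Python's standard library has no XSD validator.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema describing <note><to/><body/></note>.
XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="note">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="to" type="xs:string"/>
        <xs:element name="body" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

XS = "{http://www.w3.org/2001/XMLSchema}"
schema = ET.fromstring(XSD)
declared = [el.get("name") for el in schema.iter(XS + "element")]
print(declared)
```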
DTDs in the localization world
Don't be scared, but XML really is everywhere

• TMX
• TBX
• XLIFF
• TTX
• SRX
• Qt Linguist TS
• DITA
• ...
Encoding
All XML parsers must support at least UTF-8 and UTF-16.
Default encoding is UTF-8.
Always a good idea to specify the encoding
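A sketch of why the declaration matters: the bytes below are ISO-8859-1, and the parser only decodes them correctly because the declaration says so. The <city> element is an invented example, and the xml_declaration argument assumes Python 3.8 or later.

```python
import xml.etree.ElementTree as ET

# Bytes as they would sit in a file saved as ISO-8859-1.
latin1_bytes = (
    '<?xml version="1.0" encoding="ISO-8859-1"?><city>Zürich</city>'
).encode("iso-8859-1")

city = ET.fromstring(latin1_bytes)   # decoded according to the declaration
print(city.text)

# Writing it back out as UTF-8, with the encoding stated explicitly.
utf8_bytes = ET.tostring(city, encoding="utf-8", xml_declaration=True)
print(utf8_bytes)
```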
Byte Order Mark
A character to indicate the byte order of an XML document

In UTF-8 it's optional and not even recommended
In UTF-16 it's used to indicate endianness:
little-endian or big-endian
If you see these at the start of a file, something's wrong:
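The image from the original slide is not available here, so the following is only an assumption about what it showed: a BOM read with the wrong encoding typically appears as a few stray characters at the very start of the file. A sketch that reproduces the effect and strips a UTF-8 BOM; strip_utf8_bom is a hypothetical helper.

```python
import codecs

def strip_utf8_bom(data: bytes) -> bytes:
    """Remove a leading UTF-8 BOM if there is one."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

with_bom = codecs.BOM_UTF8 + b"<note/>"

# What the BOMs look like when the bytes are wrongly read as Windows-1252.
utf8_misread = with_bom.decode("cp1252")[:3]
utf16_le_misread = codecs.BOM_UTF16_LE.decode("cp1252")
utf16_be_misread = codecs.BOM_UTF16_BE.decode("cp1252")

print(utf8_misread, utf16_le_misread, utf16_be_misread)
print(strip_utf8_bom(with_bom))
```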
Complementary technologies
What? There's more of this geek stuff!?
Extensible Stylesheet Language
Transformations (XSLT)
It's XML to transform another XML document!
XSL Transformations
[Diagram: XSLT transforms XML into (X)HTML, XML or TXT]
How to apply an XSLT
Declare the stylesheet in the XML file itself

Use an application like XMLSpy or xmlstarlet
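Declaring the stylesheet in the XML file means adding an xml-stylesheet processing instruction before the root element. A sketch that adds one with Python's xml.dom.minidom; preview.xsl is a placeholder file name and the <note> document is invented.

```python
from xml.dom import minidom

doc = minidom.parseString("<note>Hello</note>")

# The stylesheet declaration is a processing instruction placed
# before the root element.
pi = doc.createProcessingInstruction(
    "xml-stylesheet", 'type="text/xsl" href="preview.xsl"'
)
doc.insertBefore(pi, doc.documentElement)

output = doc.toxml()
print(output)
```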
XSLT localization examples
• Convert a TTX to a two-column HTML or CSV
• Convert a TMX to a TBX
• Convert a TMX to a TXT (for spell-check in MS Word)
• Convert multilingual XML to TMX/TBX
• Generate HTML preview for XML in SDL Trados Studio
• Prepare XML files for translation
XPath
It's a query language to select nodes from an XML document
It's used in XSLT

Will select all
elements that have an attribute called
and whose value is
And also in SDL Trados Studio file types
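The element and attribute names from the original slide are not available here, so this sketch uses invented ones. It selects every tuv element that has an attribute called lang whose value is ja, using the XPath subset built into Python's ElementTree.

```python
import xml.etree.ElementTree as ET

# Hypothetical, TMX-like sample (real TMX uses xml:lang).
DOC = """<body>
  <tu>
    <tuv lang="en"><seg>Hello</seg></tuv>
    <tuv lang="ja"><seg>こんにちは</seg></tuv>
  </tu>
  <tu>
    <tuv lang="en"><seg>Tokyo</seg></tuv>
    <tuv lang="ja"><seg>東京</seg></tuv>
  </tu>
</body>"""

root = ET.fromstring(DOC)

# XPath: all tuv elements, anywhere below the root, with lang="ja".
japanese = root.findall(".//tuv[@lang='ja']")
segments = [tuv.findtext("seg") for tuv in japanese]
print(segments)
```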
Is XML good for localization?
Yes, but not always
XML is great for localization
• Unicode supported by default
• Metadata gives more information about content

• Separates content from formatting (to some extent)
• Human readable

• Easily transformable using XSLT
• Excellent for single-sourcing
But bad XML is bad
• Translatable content in attributes
• No metadata to distinguish between content
e.g. mixed languages, translatable vs not translatable
• CDATA is just plain cheating
• Bad implementations of standards (XLIFF)
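A sketch of the first two problems, using an invented <ui> file. A naive extractor that only looks at element content misses the text hidden in the tooltip attribute, and it only knows to skip the code sample because a (hypothetical) translate="no" attribute says so.

```python
import xml.etree.ElementTree as ET

DOC = """<ui>
  <button id="ok" tooltip="Save your changes">Save</button>
  <code translate="no">printf("%d")</code>
</ui>"""

root = ET.fromstring(DOC)

# Element content, skipping anything flagged as not translatable.
from_content = [
    el.text for el in root.iter()
    if el.get("translate") != "no" and el.text and el.text.strip()
]

# Translatable text hiding in an attribute: easy to overlook.
from_attributes = [el.get("tooltip") for el in root.iter() if el.get("tooltip")]

print(from_content, from_attributes)
```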
And also
• Multilingual XML can be challenging (XSLT can help)

東京

• Big files and one-liners can cause processing problems
(pretty-printing can help)
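Pretty-printing a one-liner can be sketched with Python's xml.dom.minidom. The sample is invented. Keep in mind that the added line breaks and indentation are extra white space, which may matter in files where text and inline tags are mixed.

```python
from xml.dom import minidom

one_liner = "<tmx><body><tu><seg>東京</seg></tu></body></tmx>"

# Re-serialize with one element per line and two-space indentation.
pretty = minidom.parseString(one_liner).toprettyxml(indent="  ")
print(pretty)
```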
Tools, tools, tools
• Altova XMLSpy: all-round XML editor
• Altova DiffDog: compare XML files
• xmlstarlet: command line XML toolkit

• EditPad Pro for all encoding/BOM matters
"Specification is only theory.
In practice, there is only the parser."
@Tnkrd
