XML is everywhere. Computers, mobile phones, banking systems, the Internet, TVs and even microwave ovens use XML to wrap and exchange information. We will explain all the basics in the simplest possible way.
What is XML?
HTML vs. XML.
Basic XML Syntax.
Some XML Rules.
Element vs. Attribute.
Node Naming Principles.
Advanced Concepts Related to XML.
Future of XML.
XML Eye Opener
SIMPLE: So simple that you will wonder why you had not tried to understand it before.
SUCCESSFUL: Arguably the most successful data storage format to date; even big brands that were strong believers in proprietary formats for commercial reasons have started adopting it.
SOLID: A solid, ageless concept that this generation will pass on to future generations, who will keep the baton moving.
What is XML - 1
XML is an abbreviation of eXtensible Markup Language.
XML evolved from SGML (Standard Generalised Markup Language), a more general-purpose ISO standard.
All data needs a description to turn it into useful information. XML provides a neat solution.
XML looks like normal English, but it has been designed to be machine-readable.
What is XML - 2
XML can store data.
XML can help standardise the exchange of data.
User-defined markup tags are used to name the data.
Library functions are available in most programming languages to parse XML.
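As a minimal illustration of such library support, the sketch below uses Python's standard xml.etree.ElementTree module. The small CITY document is made up for the example; any well-formed XML would do.

```python
import xml.etree.ElementTree as ET

# A small, made-up document: user-defined tags name each piece of data.
document = """<CITY>
  <NAME>New Delhi</NAME>
  <COUNTRY>India</COUNTRY>
</CITY>"""

# fromstring() parses the text and returns the root element.
root = ET.fromstring(document)

print(root.tag)                   # CITY
print(root.find("NAME").text)     # New Delhi
print(root.find("COUNTRY").text)  # India
```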
The syntax looks like
Understanding Basic XML Syntax
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<CAPITAL largestcity="No">New Delhi</CAPITAL>
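version: the version of XML used.
encoding: the character encoding; UTF-8 (Unicode Transformation Format, 8-bit) is common.
standalone: "yes" declares that the document does not use external markup declarations such as an external DTD; "no" means it may.
Root Element Node: CAPITAL is the single root element of this document.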
Five predefined entities allow special characters in the PCDATA:
&gt; for >
&lt; for <
&amp; for &
&apos; for '
&quot; for "
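CDATA section (character data that is not parsed). It is meant for holding large blocks of text such as program code or other general-purpose data; even HTML markup can be put here. The only sequence it cannot contain is ]]>, which ends the section.
<![CDATA[ ... ]]>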
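Entity references and CDATA sections are two ways to put special characters into text. Here is a minimal sketch with Python's standard library; the sample strings are made up. The parser turns entity references back into characters, and xml.sax.saxutils.escape() produces them. A CDATA section keeps markup as plain text, whereas the same < outside CDATA is rejected.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# 1. Predefined entities: the parser turns each reference back into its character.
with_entities = ET.fromstring("<t>&lt; &gt; &amp; &apos; &quot;</t>")
print(with_entities.text)  # < > & ' "

# escape() always replaces &, < and >; the extra mapping also covers quotes.
escaped = escape('Tom & Jerry say "5 < 6"', {'"': "&quot;", "'": "&apos;"})
print(escaped)  # Tom &amp; Jerry say &quot;5 &lt; 6&quot;

# 2. CDATA: <, > and & are ordinary characters, so markup stays plain text.
snippet = ET.fromstring(
    "<snippet><![CDATA[<b>Bold</b> & if (a < b) { c = 1; }]]></snippet>"
)
print(snippet.text)  # <b>Bold</b> & if (a < b) { c = 1; }
print(len(snippet))  # 0: the <b> inside CDATA did not become a child element

# 3. The same kind of text without CDATA or entities is not well-formed.
try:
    ET.fromstring("<snippet>if (a < b) { c = 1; }</snippet>")
    plain_error = None
except ET.ParseError as err:
    plain_error = err
print("Without CDATA:", plain_error)
```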
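Processing Instructions (PI), or directives, are given between <? and ?>:
<?xml-stylesheet type="text/css" href="mySheet.css"?>
The initial XML declaration below uses the same delimiters and is often described as a PI, although the XML specification treats it as a separate declaration:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>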
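To see how a parser reports processing instructions, here is a minimal sketch with Python's standard xml.dom.minidom. The stylesheet name mySheet.css comes from the example above and is never actually loaded. The xml-stylesheet instruction appears as a processing-instruction node. The XML declaration is consumed by the parser and does not appear as a node.

```python
from xml.dom import minidom

document = b"""<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/css" href="mySheet.css"?>
<CAPITAL largestcity="No">New Delhi</CAPITAL>"""

doc = minidom.parseString(document)

# Keep only processing-instruction nodes among the document's top-level children.
pis = [node for node in doc.childNodes
       if node.nodeType == node.PROCESSING_INSTRUCTION_NODE]

for pi in pis:
    # target is the name right after <?, data is the rest up to ?>
    print(pi.target, "|", pi.data)
# xml-stylesheet | type="text/css" href="mySheet.css"

print(doc.documentElement.tagName)  # CAPITAL
```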
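Parsable Character Data (PCDATA): the text between an element's start and end tags, for example between <address> and </address>.
Attribute: has a name and a value, written in the element's start tag (for example largestcity="No").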
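The parts labelled above can be read back with Python's standard xml.etree.ElementTree module. This sketch parses the CAPITAL example passed as bytes, as it would arrive from a file.

```python
import xml.etree.ElementTree as ET

document = (
    b'<?xml version="1.0" encoding="UTF-8" standalone="no"?>\n'
    b'<CAPITAL largestcity="No">New Delhi</CAPITAL>'
)
root = ET.fromstring(document)

print(root.tag)                 # CAPITAL (the root element node)
print(root.attrib)              # {'largestcity': 'No'} (attribute name and value)
print(root.get("largestcity"))  # No
print(root.text)                # New Delhi (the PCDATA content)
```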
Some XML Rules - 1
All elements must have closing tags.
Element names are case-sensitive.
Elements shall be correctly nested.
Attribute values must be quoted.
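These four rules can be checked mechanically. In this sketch, a hypothetical helper is_well_formed() simply asks Python's standard ElementTree parser whether it accepts a string; the sample documents are made up.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Hypothetical helper: True if ElementTree accepts text as XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

samples = {
    "<note><to>Tove</to></note>": True,        # follows all four rules
    "<note><to>Tove</note>": False,            # <to> is never closed
    "<Note><to>Tove</to></note>": False,       # Note and note differ in case
    "<note><b><i>text</b></i></note>": False,  # <b> and <i> are not nested correctly
    "<note date=2024>text</note>": False,      # attribute value is not quoted
}

for text, expected in samples.items():
    result = is_well_formed(text)
    print(result, text)
    assert result == expected
```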
Some XML Rules - 2
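An XML document must have a root element, and only one root element (it can have any valid name).
Special characters in data values must be written as entity references:
&gt; for >, &lt; for <, &amp; for &, &apos; for ', &quot; for "
(& and < must always be escaped in text; the quote entities matter mainly inside attribute values.)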
Comments have this syntax:
<!-- This is a comment -->
A comment cannot contain -- (two consecutive hyphens) in its text.
Whitespace is preserved, unlike in HTML. For example, "Hello     World" (with several spaces) is displayed as "Hello World" in HTML, but XML retains the exact spaces specified.
Empty elements have an optional short form.
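A similar sketch for these rules, again with Python's standard ElementTree and a hypothetical is_well_formed() helper. Two root elements or a -- inside a comment are rejected, and whitespace inside text content is passed through unchanged.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Hypothetical helper: True if ElementTree accepts text as XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<a/><b/>"))                         # False: two root elements
print(is_well_formed("<root><!-- a -- b --></root>"))     # False: -- inside a comment
print(is_well_formed("<root><!-- a comment --></root>"))  # True

# The spaces in the text content are kept exactly as written.
msg = ET.fromstring("<msg>Hello     World</msg>")
print(repr(msg.text))  # 'Hello     World'
```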
Some XML Rules - 3
Empty elements can optionally be written in the short form <Name /> in place of <Name></Name>.
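Python's standard ElementTree treats the two forms identically. In this sketch both parse to an element with no text and no children, and both are written back in the short form, which is ElementTree's default for empty elements.

```python
import xml.etree.ElementTree as ET

long_form = ET.fromstring("<Name></Name>")
short_form = ET.fromstring("<Name />")

# Both parse to an element with no text and no children.
print(long_form.text, len(long_form))    # None 0
print(short_form.text, len(short_form))  # None 0

# When writing, ElementTree uses the short form for empty elements by default.
print(ET.tostring(long_form))   # b'<Name />'
print(ET.tostring(short_form))  # b'<Name />'
```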
XML Practice: Elements vs. Attributes - 1
It is generally possible to define all data as ELEMENT tags in a tree format.
A neat alternative to the above is to use ATTRIBUTES, as follows:
<Book ID="201" ISBN="8175257660">
XML Practice: Elements vs. Attributes - 2
Which method to use needs careful thought.
Information that is surely singular (will not be repeated) and is not domain specific is recommended as an ATTRIBUTE.
If you are unable to classify the information, or it can be repeated (for example, the Author tag in the above example can be repeated), it should be used as an ELEMENT.
An even better format for the previous example would keep ID as an attribute and make ISBN an element.
This is because ISBN is a book-related property, while ID may be related to a storage place.
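Here is a sketch of the layout this guideline suggests, built with Python's standard ElementTree. It assumes that ID stays an attribute while ISBN and a repeatable Author become elements. The author names are placeholders, not data from the original example.

```python
import xml.etree.ElementTree as ET

# Assumed layout: ID (storage-related, singular) stays an attribute;
# book-related data, including the repeatable Author, become elements.
book = ET.Element("Book", ID="201")
ET.SubElement(book, "ISBN").text = "8175257660"
ET.SubElement(book, "Author").text = "First Author"   # placeholder name
ET.SubElement(book, "Author").text = "Second Author"  # Author may repeat

print(ET.tostring(book, encoding="unicode"))

# Elements can repeat, so every author is kept.
authors = [a.text for a in book.findall("Author")]
print(authors)  # ['First Author', 'Second Author']

# An attribute name may appear only once per element.
try:
    ET.fromstring('<Book Author="A" Author="B"/>')
    duplicate_error = None
except ET.ParseError as err:
    duplicate_error = err
print("Duplicate attribute:", duplicate_error)
```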
XML Node Naming – Begins with
Node (elements or attributes) names shall
begin with a letter or _ (underscore).
<1STLINE></1STLINE> invalid element naming
<LINE1></LINE1> valid naming
<BOOK 1Ver="1.00"></BOOK> invalid attribute naming
<BOOK _Ver="1.00"></BOOK> valid attribute naming
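The four naming examples above can be fed to Python's standard ElementTree parser. This sketch uses a small hypothetical is_well_formed() helper that reports whether the parser accepts each string.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Hypothetical helper: True if ElementTree accepts text as XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

examples = [
    "<1STLINE></1STLINE>",        # element name starts with a digit
    "<LINE1></LINE1>",            # digit allowed after the first letter
    '<BOOK 1Ver="1.00"></BOOK>',  # attribute name starts with a digit
    '<BOOK _Ver="1.00"></BOOK>',  # attribute name starts with an underscore
]
for text in examples:
    print(is_well_formed(text), text)
# prints False, True, False, True on successive lines
```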
XML Node Naming – Consists of
A name can consist of:
Any English character, or any foreign-language letter that the encoding given in the XML declaration can represent.
Digits (0-9), though not as the first character.
A dot (.), a hyphen (-) or _ (underscore).
Tabs and spaces are not allowed in XML node names.
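A quick sketch of these character rules with Python's standard ElementTree, using a hypothetical is_well_formed() helper. The names are made up. Dots, hyphens, underscores, digits and accented letters are accepted, while a space or tab splits the name and makes the tag invalid.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Hypothetical helper: True if ElementTree accepts text as XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

allowed = ["<first.name/>", "<first-name/>", "<first_name2/>", "<café/>"]
rejected = ["<first name/>", "<first\tname/>"]  # space or tab inside the name

for text in allowed + rejected:
    print(is_well_formed(text), repr(text))
```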
XML Node Naming – Based on
A name can belong to a namespace.
The name Table may refer to an HTML table or to a piece of furniture. One can resolve this conflict by using namespaces.
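Here is a sketch of how namespaces keep the two Table meanings apart, parsed with Python's standard ElementTree. Both namespace URIs are made-up placeholders. A namespace URI only has to be unique; it does not need to point to a real page.

```python
import xml.etree.ElementTree as ET

document = """<root xmlns:h="http://example.com/html-table"
      xmlns:f="http://example.com/furniture">
  <h:table><h:tr><h:td>Apples</h:td></h:tr></h:table>
  <f:table><f:name>Coffee Table</f:name></f:table>
</root>"""

root = ET.fromstring(document)
ns = {"h": "http://example.com/html-table", "f": "http://example.com/furniture"}

# ElementTree stores each name as {namespace-URI}localname, so the tags differ.
tags = [child.tag for child in root]
print(tags)
# ['{http://example.com/html-table}table', '{http://example.com/furniture}table']

print(root.find("h:table/h:tr/h:td", ns).text)  # Apples
print(root.find("f:table/f:name", ns).text)     # Coffee Table
```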
HTML vs. XML - 1
Both use markup tags (elements and attributes), e.g. <H1>Heading1</H1> or <font> with its attributes.
Both use entities, e.g. &lt; and &gt;.
Both are derived from SGML.
HTML vs. XML - 2
HTML has predefined tags; XML tags are user defined.
HTML is for humans, and its errors are ignored. XML is for computers, as a data storehouse or for definitions, so its errors cannot be ignored.
HTML is usually not updated by programs, while XML is meant to be written by programs.
HTML has a large number of predefined entities; XML has just five.
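The difference in error handling can be seen with two parsers from Python's standard library. In this sketch the broken markup is made up. html.parser.HTMLParser reads the unclosed tags without complaint, while xml.etree.ElementTree refuses the same text. TagCollector is a hypothetical helper class written for this example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

broken = "<p><b>Bold text<i>no closing tags"

class TagCollector(HTMLParser):
    """Hypothetical helper: records start tags; HTMLParser tolerates unclosed tags."""
    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

collector = TagCollector()
collector.feed(broken)
collector.close()
print("HTML parser saw:", collector.tags)  # HTML parser saw: ['p', 'b', 'i']

try:
    ET.fromstring(broken)
    xml_error = None
except ET.ParseError as err:
    xml_error = err
print("XML parser refused:", xml_error)
```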
XSL (Extensible Stylesheet Language)
Unlike CSS (Cascading Style Sheets), which styles HTML's predefined tags, XSL works with XML, whose tags are user defined.
It has three parts:
XSLT (XSL Transformations): transforms XML data, for example into XHTML for display on a web page.
XPath: a way to reach a particular data item in an XML file. This is often useful when reading XML-based configuration files.
XSL-FO (XSL Formatting Objects): provides a display/print formatting mechanism for XML documents.
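ElementTree in Python's standard library supports a limited subset of XPath in find() and findall(), which is enough for simple configuration-file lookups. The config layout below is hypothetical. Full XPath 1.0 needs a third-party library such as lxml.

```python
import xml.etree.ElementTree as ET

config = """<config>
  <database name="main" host="localhost" port="5432"/>
  <database name="backup" host="backup.example.com" port="5433"/>
  <logging level="INFO"/>
</config>"""

root = ET.fromstring(config)

# The predicate [@name='backup'] selects the element whose attribute matches.
backup = root.find("./database[@name='backup']")
print(backup.get("host"), backup.get("port"))  # backup.example.com 5433

# .// searches at any depth below the current element.
names = [db.get("name") for db in root.findall(".//database")]
print(names)  # ['main', 'backup']
```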
DTD (Document Type Definition)
A DTD is referenced from a DOCTYPE declaration in an XML file, such as:
<!DOCTYPE note SYSTEM "Note.dtd">
The DTD file would then have this format:
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
The XML file has the root node named note with four sub-elements.
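What a validating parser checks can be imitated by hand for this one example. The sketch below is not a DTD validator. The hypothetical matches_note_model() helper uses Python's standard ElementTree to check that the root is note and that its children are to, from, heading and body, each holding text only. The order of the children is assumed, as in the common declaration <!ELEMENT note (to,from,heading,body)>. The sample messages are made up.

```python
import xml.etree.ElementTree as ET

# Assumed content model: note must contain exactly these children, in order.
EXPECTED_CHILDREN = ["to", "from", "heading", "body"]

def matches_note_model(text: str) -> bool:
    """Hypothetical hand-written check, NOT a real DTD validator."""
    root = ET.fromstring(text)
    if root.tag != "note":
        return False
    if [child.tag for child in root] != EXPECTED_CHILDREN:
        return False
    # (#PCDATA) content: each child may hold text but no nested elements.
    return all(len(child) == 0 for child in root)

good = ("<note><to>Tove</to><from>Jani</from>"
        "<heading>Reminder</heading><body>Don't forget me!</body></note>")
missing = "<note><to>Tove</to><body>Missing parts</body></note>"

print(matches_note_model(good))     # True
print(matches_note_model(missing))  # False
```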
The process of reading an XML file and extracting valid data from it is called "PARSING".
Parsers are of two types:
Non-validating parser: checks that the document is well-formed, but does not check it against a DTD.
Validating parser: also checks the document against its DTD.
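Python's built-in xml.etree.ElementTree behaves as a non-validating parser. In this sketch the DOCTYPE names Note.dtd, yet the parser neither loads that file nor complains that from, heading and body are missing. DTD validation needs a validating parser, for example the DTD class in the third-party lxml library (lxml.etree.DTD).

```python
import xml.etree.ElementTree as ET

# The DOCTYPE names Note.dtd, but ElementTree neither loads nor enforces it;
# it only checks that the document is well-formed.
document = """<!DOCTYPE note SYSTEM "Note.dtd">
<note><to>Tove</to></note>"""

root = ET.fromstring(document)
print(root.tag, [child.tag for child in root])  # note ['to']
```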
Some Advanced Concepts Related to XML
XML Schema: defines validation rules in XSD (XML Schema Definition) files, which are themselves written in XML.
XQuery: a way to search within XML data and get the selected nodes that match the query.
Where to View/Edit
Browsers: Most browsers are good at viewing XML. Internet Explorer is particularly good at it.
Editors: Specialised editors offer good XML viewing and editing facilities. Microsoft's XML Editor and Peter's XML Editor are good at it.
Office Tools: Tools such as MS Word and FrontPage provide good XML editing, and even MS Excel can open XML files.
Visual Studio/WebDeveloper: These provide an excellent environment for XML editing and viewing, along with validation support.
Let's Quickly Revise
Two main types of nodes: elements and attributes. Elements can be repeated, while an attribute name can appear only once per element. Attributes can always be rewritten as elements, but the reverse may not be true.
Special syntax for non-parsed data: CDATA.
Five predefined entities for special symbols (<, >, ', ", &).
HTML-style comments are allowed: <!-- comments -->
Names are case-sensitive, and closing tags are required.
Other Processing Instructions (PI) can be applied, enclosed within <? ?>. The first line is usually the XML declaration, which uses the same delimiters but is formally a separate declaration rather than a PI.
Always have a single root node.
Future of XML
All websites may one day be written in XML. HTML has already been re-standardised as XHTML, which provides stricter syntax checking and better browser compatibility.
XML promises to be the most open format for storing information on all kinds of IT gadgets, from desktops and mobile phones to iPods, iPads, DVD players and microwave ovens. It is already in use, and it is expected to be used in more and more devices.
All office documents and e-books, offline and online, shall ultimately be in XML, as it is a simple non-proprietary format that meets these needs well.
Ask and guide me at
Share this information with as
many people as possible.
Keep visiting www.sunmitra.com
for programme updates.