The document provides an overview of XML including:
1. XML was developed by the W3C to overcome HTML limitations and transport data rather than display it. XML is readable, understandable, well-defined, and self-descriptive.
2. An XML document has a tree structure with a root element containing child elements, attributes, and data. Elements are used to classify data and can contain other elements, text, and attributes.
3. XML documents must follow syntax rules like having matching opening and closing tags and properly nested elements. Attributes require values to be in quotes.
2. Index
Topic                      Page
Introduction               01-02
HTML - XML                 03
Sample XML Document        04
Tree Structure             05-09
Basic Rules                10-12
Elements                   13-15
Attributes                 16
elements and attributes    17
Validation                 18-19
XML Namespaces             20
CDATA                      21
XPath                      22
DTD                        23-25
Viewers                    26
Viewers - CSS              27
Viewers - JavaScript       28
Viewers - XSLT             29
3. Introduction
XML (eXtensible Markup Language) is a markup language used to create documents with your
own self-describing tags.
The World Wide Web Consortium (W3C) developed this language to overcome the limitations of
the Hypertext Markup Language (HTML) which forms the basis for all Web pages.
Similar to HTML, XML is also based on Standard Generalized Markup Language (SGML).
XML 1.0 became a W3C Recommendation on February 10, 1998.
It is designed to transport data rather than to display data.
XML is readable and understandable, even by novices, and easy to code.
01
4. Introduction
Possesses a well-defined structure, as it is designed to be self-descriptive.
Separates data from HTML: With XML, you can store data in separate XML files, which lets you
focus on HTML/CSS for display and layout. The HTML does not need to change when the
underlying data changes.
Eases the process of data sharing: As XML stores data in plain text format, it offers a way of
storing data that is independent of software and hardware.
Simplifies platform changes: As XML stores data in text format, you can move the data to new
operating systems, new applications, or new browsers without losing it.
Used to develop new internet languages such as XHTML, WSDL, and SMIL.
02
6. Sample XML Document
04
<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
Line 1: Is the XML declaration, which defines the XML version and the encoding used. In this
example, the XML version is 1.0 and the encoding is ISO-8859-1 (the Latin-1/West European
character set).
Line 2: Opens the root element of the document. In this example the root element is <note>.
Lines 3-6: Are the child elements of the root. The four child elements in this example are to,
from, heading, and body.
Line 7: Closes the root element. In this example the closing tag is </note>.
Note: When you read the code in an XML document, you will notice that XML is
self-descriptive.
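As a rough illustration of the line-by-line description above, the sample document can be loaded with Python's standard xml.etree.ElementTree module. This is a minimal sketch; the variable names are illustrative and not part of the original slides.

```python
import xml.etree.ElementTree as ET

# The sample note document, passed as bytes so the parser
# applies the encoding named in the XML declaration.
NOTE_XML = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>"""

root = ET.fromstring(NOTE_XML)

# The root element is <note>; iterating over it yields its child elements.
child_tags = [child.tag for child in root]
body_text = root.find("body").text

print(root.tag)
print(child_tags)
print(body_text)
```

The tag names alone (to, from, heading, body) tell a reader what each piece of data means, which is what self-descriptive refers to.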
7. Tree Structure
The various components of an XML document are:
Prolog
XML Declaration
DTD Declaration
Root Element (Document)
Elements (Nested Elements)
Element Attributes and Values
Data
05
8. Tree Structure
XML prolog: Is an optional piece of information that comes before the root element. A prolog
consists of two parts, namely:
XML declaration: Indicates the version of XML used. The declaration is optional in XML 1.0,
but should be included anyway.
<?xml version="1.0"?>
Document type declaration (DOCTYPE): Identifies the Document Type Definition (DTD), which
describes the rules that your XML document must follow.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
If an XML document includes both an XML declaration and a document type declaration,
the XML declaration must be placed first, followed by the document type declaration.
06
9. Tree Structure
XML root element:
Indicates the type of XML document.
Your XML document must include the root element as its first element.
There can be only one root element in an XML document.
The root element is also known as the document element.
Element attributes and values:
The essence of an XML document comes from the elements contained within the root
element.
Each element in the document characterizes a different type of data stored in the document.
Usually, the elements are associated with attributes.
XML attributes are similar to HTML attributes in that they are name-value pairs.
07
10. Tree Structure
Data:
XML elements contain data in text format.
Apart from the data stored in attributes, the bulk of the data in XML typically resides between
the opening and closing tags of an XML element.
This data is commonly known as the element's content.
08
11. Tree Structure
All XML documents form a tree structure that begins at the root and branches to the leaves.
Hence, all XML documents must contain a root element, which is the parent of all other
elements.
In addition to this, all elements can have sub-elements, commonly known as child elements.
The terms parent, child, and sibling describe the relationships between elements. Elements
that share the same parent are siblings. All elements can contain text content and attributes.
09
<root>
<child>
<subchild>.....</subchild>
</child>
</root>
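The parent, child, and sibling relationships can be made visible by walking the tree recursively. The sketch below uses Python's xml.etree.ElementTree. The second, empty child element and the outline helper are assumptions added for illustration, so that the example contains a pair of siblings.

```python
import xml.etree.ElementTree as ET

# The skeleton from the slide, plus a second <child/> so there are siblings.
TREE_XML = "<root><child><subchild>leaf</subchild></child><child/></root>"


def outline(element, depth=0):
    """Return (depth, tag) pairs in document order, starting at element."""
    rows = [(depth, element.tag)]
    for child in element:
        rows.extend(outline(child, depth + 1))
    return rows


root = ET.fromstring(TREE_XML)
rows = outline(root)

# Indent each tag by its depth below the root.
for depth, tag in rows:
    print("  " * depth + tag)
```

Elements printed at the same indentation under the same parent are siblings; here the two child elements are siblings, and subchild is a child of the first one.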
12. Basic Rules
10
The syntax rules of XML are very simple and logical. Following are the syntax rules used to create
a well-formed XML document.
All XML elements must have a closing tag; omitting it is illegal. The XML declaration has no
closing tag because it is not an element and is not part of the document content itself.
<p>This is a paragraph</p> - Valid Code
<p>This is another paragraph - Invalid Code
All the opening and closing tags must be written with the same case as XML is case sensitive.
<message>Learning is fun</message> - Valid Code
<Message> Learning is fun </message> - Invalid Code
Any XML document must contain one root element that is the parent of all other elements.
13. Basic Rules
11
All XML elements must be properly nested within each other. Consider the below code block that
shows proper nesting of all XML elements within each other.
<b><i>This text is bold and italic</i></b>
Because the <i> element is opened inside the <b> element, it must also be closed inside the
<b> element, as shown in the illustration above.
All XML attribute values must be quoted. XML elements can contain attributes in
name/value pairs, just like in HTML.
Invalid Code :
<note date=12/11/2007>
<to>John</to>
<from>Jani</from>
</note>
Valid Code :
<note date="12/11/2007">
<to>John</to>
<from>Jani</from>
</note>
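The rules above can be checked mechanically, because an XML parser rejects any document that breaks them. The following is a minimal sketch using Python's xml.etree.ElementTree; the is_well_formed helper is a hypothetical name, and the improperly nested sample is added for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as XML, False if the parser reports a syntax error."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


SAMPLES = [
    "<p>This is a paragraph</p>",                    # closing tag present
    "<p>This is another paragraph",                  # closing tag missing
    "<Message> Learning is fun </message>",          # tag case mismatch
    "<b><i>This text is bold and italic</b></i>",    # improper nesting
    "<note date=12/11/2007><to>John</to></note>",    # unquoted attribute value
    '<note date="12/11/2007"><to>John</to></note>',  # quoted attribute value
]

for sample in SAMPLES:
    print(is_well_formed(sample), sample)
```

Only the first and last samples parse; each of the others breaks exactly one of the rules listed above.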
14. Basic Rules
12
XML stores a new line as LF.
XML preserves white space: unlike HTML, it does not collapse multiple white-space characters into one.
XML comments are similar to HTML. The syntax to write a comment in XML is as follows:
<!-- This is a comment -->
Apart from the above syntax rules, your XML document might contain some characters that
represent a special meaning. For instance, when you include a character like < within an XML
element, it generates an error as the parser interprets it as the start of a new element.
<message>if salary < 1000 then</message> Invalid Code
However, in order to avoid the above error, you can replace the < character with an entity reference
as shown below:
<message>if salary &lt; 1000 then</message> Valid Code
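When XML is generated by a program, the replacement of special characters is usually left to a library function rather than done by hand. The sketch below uses escape from Python's standard xml.sax.saxutils module, which replaces &, <, and > with entity references; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw_text = "if salary < 1000 then"

# escape() turns the special character into an entity reference.
escaped = escape(raw_text)
document = "<message>" + escaped + "</message>"

# When the document is parsed, the entity reference becomes "<" again.
parsed_text = ET.fromstring(document).text

print(document)
print(parsed_text)
```

The entity reference exists only in the serialized document; an application reading the parsed element sees the original character.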
15. Elements
13
An element in the XML document is used to classify data thus rendering the data as self-
explanatory.
The opening and closing tags represent the start and end of an element respectively.
In addition to this, you can use the attributes to include extra information apart from the data
enclosed between the opening and closing tag.
Typically, an XML element can contain the following:
other elements
text
attributes
or a mix of all of the above
16. Elements
14
XML does not enforce any particular naming scheme, but meaningful names for the XML
elements make the document more understandable.
Following are some of the important naming rules and recommendations for XML elements:
Use simple, short, and descriptive names.
Names can contain letters, numbers, and other characters.
Avoid characters such as "-", ".", and ":" in element names, as some software may misinterpret them.
Although non-English letters like éòá are perfectly legal in XML, you might encounter issues if
your software does not support them.
Names cannot begin with a number or a punctuation character such as "-" or "." (an underscore is allowed).
Names cannot begin with the letters xml (or XML, or Xml, etc.).
Names cannot contain spaces.
18. Attributes
16
Syntax: <element attributeName="value">
Description: Used to specify additional information about the elements in an
XML document. An attribute must always appear within the
opening tag of an element. Unlike in HTML, every XML attribute
must be assigned a value.
Illustration:
<student active="true">
<name>Robert</name>
<grade>A+</grade>
</student>
Notes: Attributes are not displayed in any special way. They are invisible to
the reader.
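Although attributes are not displayed, a program can read them from the parsed element. The following is a minimal sketch with Python's xml.etree.ElementTree, using the student illustration; the nickname attribute is a hypothetical name used only to show the lookup of a missing attribute.

```python
import xml.etree.ElementTree as ET

STUDENT_XML = """<student active="true">
<name>Robert</name>
<grade>A+</grade>
</student>"""

student = ET.fromstring(STUDENT_XML)

# Attribute values are always strings, so "true" is text, not a boolean.
active = student.get("active")

# get() returns the supplied default when the attribute is absent.
nickname = student.get("nickname", "n/a")

# Element content is read separately from attributes.
name = student.find("name").text

print(active, nickname, name)
```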
19. elements and attributes
17
Though XML attributes are widely used, following are a few issues you might encounter while
using attributes:
Attributes cannot hold multiple values, unlike elements.
Attributes cannot hold tree structures, unlike elements.
Attribute values are not easy to test against a DTD.
20. Validation
18
XML validation is the process of testing an XML document to confirm that it is both
well-formed and valid.
A well-formed document is one that strictly follows all the basic syntactic rules of XML.
A valid document is one that follows all the rules specified by a particular DTD or XML schema.
Although an XML document is considered well-formed when it fulfills all the
syntactical requirements defined by the W3C, it might still be invalid if it does not follow the
constraints specified by its DTD or schema.
Therefore, although all valid XML documents are well-formed, not all well-formed XML
documents are valid.
A well-formedness error in an XML document stops the XML application from processing it.
To keep XML software small, fast, and compatible, the W3C XML specification states that a
program must stop processing an XML document as soon as it encounters such an error.
Several XML validation tools are available online to check the validity of XML.
21. Validation
19
Well-formed XML document:
Is defined as an XML document that follows all the syntax rules of XML.
The syntax rules that each XML document must follow to be well-formed are listed
below:
The XML document must contain only one root element.
The XML elements must be properly nested, that is, the elements must be closed in the
reverse of the order in which they are opened.
Tag names are case-sensitive.
All XML attribute values must be enclosed within quotes.
All XML elements must have a closing tag.
All the entities that are referenced in the document must also be well-formed.
22. XML Namespaces
20
When you create new elements, there is a chance that an element with the same name already
exists with a different meaning.
The solution to this problem is to use XML namespaces, which differentiate between the
similarly named elements.
Syntax:
<?xml version="1.0" encoding="ISO-8859-15"?>
<html:html xmlns:html='http://www.w3.org/TR/xhtml1/'>
<html:body>
<html:p>Welcome</html:p>
</html:body>
<health:body xmlns:health='http://www.exampleorg.com/health'>
<health:height>6ft</health:height>
<health:weight>75kgs</health:weight>
</health:body>
</html:html>
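A parser keeps the two body elements apart by combining each local name with its namespace URI. The sketch below shows this with Python's xml.etree.ElementTree, using a well-formed version of the example; the NAMESPACES mapping and variable names are illustrative.

```python
import xml.etree.ElementTree as ET

NS_XML = """<html:html xmlns:html="http://www.w3.org/TR/xhtml1/">
  <html:body>
    <html:p>Welcome</html:p>
  </html:body>
  <health:body xmlns:health="http://www.exampleorg.com/health">
    <health:height>6ft</health:height>
    <health:weight>75kgs</health:weight>
  </health:body>
</html:html>"""

# Prefix-to-URI mapping used only for the lookups below.
NAMESPACES = {
    "html": "http://www.w3.org/TR/xhtml1/",
    "health": "http://www.exampleorg.com/health",
}

root = ET.fromstring(NS_XML)
html_body = root.find("html:body", NAMESPACES)
health_body = root.find("health:body", NAMESPACES)
height = health_body.find("health:height", NAMESPACES).text

# ElementTree stores each tag as {namespace-uri}local-name.
print(html_body.tag)
print(health_body.tag)
print(height)
```

Both elements have the local name body, but their full names differ because the namespace URIs differ.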
23. CDATA
21
The term CDATA refers to text data that should not be parsed as markup by the XML parser.
Everything inside a CDATA section is treated as plain character data; the parser does not
interpret tags or entity references in it.
A CDATA section starts with "<![CDATA[" and ends with "]]>":
Example
<city>
<![CDATA[
Text you want to escape goes here...
]]>
</city>
<pincode>560102</pincode>
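The effect of a CDATA section can be seen by parsing text that would otherwise be rejected. In the sketch below, using Python's xml.etree.ElementTree, the text inside the section is an invented sample containing characters that normally need escaping.

```python
import xml.etree.ElementTree as ET

# Inside CDATA, "<", "&", and even something shaped like a tag are plain text.
CDATA_XML = "<city><![CDATA[if a < b && b < c then <city> is just text]]></city>"

city = ET.fromstring(CDATA_XML)

# The parser returns the section's content as ordinary text
# and creates no child elements from it.
print(city.text)
print(len(city))
```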
24. XPath
22
XPath is the solution to finding information in an XML document.
XPath uses expressions to find elements, attributes, and other information in your XML.
XPath expressions:
Accessing element: address/city
Accessing attribute: addressbook/address/state/@direction
Descendants: addressbook//city
Parent: state/..
Wildcard: address/*
Combine: address/city | address/state
Predicates: addressbook/address[state = 'Karnataka']
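Several of these expressions can be tried with Python's xml.etree.ElementTree, which supports only a subset of XPath. The following is a sketch under that limitation: paths are written relative to the addressbook root, the union operator and attribute steps are not supported (attributes are read with get() instead), and the second address is invented sample data.

```python
import xml.etree.ElementTree as ET

ADDRESSBOOK_XML = """<addressbook>
  <address>
    <city>Bangalore</city>
    <state direction="south">Karnataka</state>
  </address>
  <address>
    <city>Mumbai</city>
    <state direction="west">Maharashtra</state>
  </address>
</addressbook>"""

book = ET.fromstring(ADDRESSBOOK_XML)

# Accessing elements by path, relative to <addressbook>.
cities = [c.text for c in book.findall("address/city")]

# Descendants: every <city> at any depth below the root.
all_cities = [c.text for c in book.findall(".//city")]

# Wildcard: all child elements of every <address>.
child_tags = [e.tag for e in book.findall("address/*")]

# Predicate: addresses whose <state> child has the text "Karnataka".
in_karnataka = book.findall("address[state='Karnataka']")

# Parent: step back up from each <state> to its <address>.
parents = book.findall("address/state/..")

# Attribute values are read from the matched elements.
directions = [s.get("direction") for s in book.findall("address/state")]

print(cities, child_tags, directions)
```

A full XPath engine would also accept the attribute and union forms from the list above directly.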
25. DTD
23
A Document Type Definition (DTD) is a type of schema.
The purpose of DTDs is to provide a framework for validating XML documents.
By defining a structure that XML documents must conform to, DTDs allow different organizations to
create shareable data files.
A DTD outlines what elements can be in an XML document and the attributes and subelements
that they can take.
Example
<!ELEMENT addressbook (address+)>
<!ELEMENT address (city, state)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT state (#PCDATA)>
<!ATTLIST state direction CDATA #REQUIRED>
26. DTD
24
When creating a DTD, the first step is to define the document element.
<!ELEMENT addressbook (address+)>
The element declaration above states that the addressbook element must contain one or more
address elements.
When defining child elements in DTDs, you can specify how many times those elements can
appear by adding a modifier after the element name.
Modifier Description
? Zero or one times.
+ One or more times.
* Zero or more times.
27. DTD
25
Each address element must contain a city and a state element, each of which must appear once
and only once, in that order.
<!ELEMENT address (city, state)>
Some elements contain only text. This is declared in a DTD as #PCDATA. PCDATA stands for
parsed character data, meaning that the data will be parsed for XML tags and entities. The city and
state elements contain only text.
<!ELEMENT city (#PCDATA)>
<!ELEMENT state (#PCDATA)>
Attributes are declared using the <!ATTLIST > declaration.
The DOCTYPE declaration in an XML document specifies the DTD to which it should conform.
<!DOCTYPE addressbook SYSTEM "address.dtd">
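To show the difference between a well-formed document and a valid one, the address book rules can be written out by hand. The sketch below is not a DTD validator (Python's standard library does not include one); the conforms_to_addressbook_rules function is a hypothetical stand-in that hard-codes the rules of the example DTD: one or more address elements, each containing a city and then a state, with a required direction attribute on state.

```python
import xml.etree.ElementTree as ET


def conforms_to_addressbook_rules(text):
    """Hand-coded check of the example DTD's rules; not a general validator."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return False  # not well-formed, so it cannot be valid
    if root.tag != "addressbook":
        return False
    addresses = list(root)
    if not addresses:
        return False  # (address+) requires at least one address
    for address in addresses:
        if address.tag != "address":
            return False
        # (city, state): exactly these two children, in this order
        if [child.tag for child in address] != ["city", "state"]:
            return False
        # direction is declared #REQUIRED on state
        if "direction" not in address[1].attrib:
            return False
    return True


VALID = ('<addressbook><address><city>Bangalore</city>'
         '<state direction="south">Karnataka</state></address></addressbook>')
WRONG_ORDER = ('<addressbook><address><state direction="south">Karnataka</state>'
               '<city>Bangalore</city></address></addressbook>')

print(conforms_to_addressbook_rules(VALID))
print(conforms_to_addressbook_rules(WRONG_ORDER))
```

The second document is well-formed, since it parses without error, yet it does not satisfy the DTD's ordering rule, which is the situation the validation slides describe.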
28. Viewers
26
Several free viewers are readily available for viewing XML documents.
Though you cannot expect XML files to display as HTML pages, you can view the raw XML files
in all major browsers.
If you open an erroneous XML file, the browser immediately reports an error.
XML documents do not carry information on how to display the data.
Hence, almost all browsers display the XML document as it is.
To control how the data is displayed, you can use any of the following technologies:
CSS
JavaScript
XSLT
As a security measure, most modern browsers block access across domains. Hence, you
must place both the web page and the XML file it tries to load on the same server.
29. Viewers - CSS
27
You can use Cascading Style Sheets (CSS) to format an XML document and add display
information to it.
<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="style.css"?>
<addressbook> <address>
<houseno>100</houseno>
<cross>27</cross>
<location>HSR Layout</location>
<city>Bangalore</city>
<pincode>560102</pincode>
<state direction="south">Karnataka</state>
. .
</address>
</addressbook>
30. Viewers - JavaScript
28
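You can use JavaScript to parse XML data within an HTML page.
Example:
<script type="text/javascript">
function displayBusinessCardData() {
// Assumes the page contains an element with the id "addressbook" that holds the XML data
var xmldata1 = document.getElementById("addressbook");
// Take the first <address> element inside it
var bizCard = xmldata1.getElementsByTagName("address")[0];
}
</script>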
31. Viewers - XSLT
29
You can use XSLT to transform XML data into HTML.
Example
<?xml version="1.0"?>
<xsl:stylesheet xmlns="http://www.w3.org/1999/xhtml" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:template match="/">
<html>
<head><title>Our Items</title></head>
<body style="background-color:#DACFE5; font-family:Arial, Helvetica, sans-serif">
<xsl:for-each select="/items/item">
</xsl:for-each>
</body>
</html>
</xsl:template>
</xsl:stylesheet>