Basics of XML

A Programme Under the compumitra Series
Copyright 2010-14 © Sunmitra Education Technologies Limited, India
eXtensible Markup Language (XML)
A comment by Tim Bray of Sun Microsystems on the celebration of the
10th anniversary of XML in February 2008:
"There is essentially no computer in the world, desk-top, hand-held,
or back-room, that doesn't process XML sometimes. This is a good
thing, because it shows that information can be packaged and
transmitted and used in a way that's independent of the kinds of
computer and software that are involved. XML won't be the last
neutral information wrapping system; but as the first, it's done very
well."

Outline
 XML Eye-opener.
 What is XML?
 HTML vs. XML.
 Basic XML Syntax.
 Constituents.
 Some XML Rules.
 Element Vs. Attribute.
 Node Naming Principles.
 Advanced Concepts related to XML
 Future of XML

XML Eye Opener
 SIMPLE: So simple that you will wonder why you
did not try to understand it earlier.
 SUCCESSFUL: One of the most successful data storage formats
to date; even big brands that were strong believers in
proprietary formats for commercial reasons have started
using it.
 SOLID: A durable concept that this generation
can pass on to future generations, who will
keep the baton moving.

What is XML-1
 XML is the abbreviation of
eXtensible Markup Language.
 XML evolved from the more general-purpose
ISO standard SGML
(Standard Generalized Markup
Language).
 All data needs a description to become
useful information. XML
provides a neat solution.
 XML looks like normal English, but it
is designed to be machine
readable.

What is XML-2
 XML can store data.
 XML can help standardize the
exchange of data.
 User-defined markup tags name the
data items.
 Library functions are available in most
programming languages to parse XML.
 The syntax looks like this:
<addressbook>
<adrrecord>
<name>Name1</name>
<address>Address1</address>
<city>City1</city>
</adrrecord>
</addressbook>
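
As a minimal sketch of the "library functions" point above, the address book can be read with Python's standard xml.etree.ElementTree module. The function name read_records and the choice of Python are assumptions made for illustration; the slides do not prescribe a language.

```python
import xml.etree.ElementTree as ET

# The address book from the slide, held in a string for the example.
ADDRESS_BOOK = """\
<addressbook>
<adrrecord>
<name>Name1</name>
<address>Address1</address>
<city>City1</city>
</adrrecord>
</addressbook>"""


def read_records(xml_text):
    """Return each <adrrecord> as a dict of child tag -> text (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    return [
        {child.tag: child.text for child in record}
        for record in root.findall("adrrecord")
    ]


records = read_records(ADDRESS_BOOK)
print(records)
```

Because the tags describe the data, the program can look values up by name rather than by position.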

Understanding Basic XML Syntax
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<COUNTRYLIST>
<COUNTRY group="G20">
<NAME>India</NAME>
<CODE>IN</CODE>
<ISD>91</ISD>
<CAPITAL largestcity="No">New Delhi</CAPITAL>
<LCITY>Mumbai</LCITY>
<CURRENCY>Indian Rupee</CURRENCY>
<CURCODE>INR</CURCODE>
</COUNTRY>
<COUNTRY group="G5">
<NAME>Japan</NAME>
<CODE>JP</CODE>
<ISD>81</ISD>
<CAPITAL largestcity="Yes">Tokyo</CAPITAL>
<LCITY>Tokyo</LCITY>
<CURRENCY>Yen</CURRENCY>
<CURCODE>JPY</CURCODE>
</COUNTRY>
</COUNTRYLIST>
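Labels for the listing above:
XML Declaration (first line):
Version: the XML version.
Encoding: the character set used. UTF-8 is common
(the 8-bit Unicode encoding).
Standalone: "yes" indicates that no external
type definitions are used.
Root Element Node: e.g. COUNTRYLIST
Element Node: e.g. NAME
Element Value: e.g. India
Attribute Node: e.g. group
Attribute Value: e.g. "G20"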
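
The node types labelled above can be picked out programmatically. The following is a rough sketch using Python's standard library on a shortened copy of the country list; the variable names are illustrative assumptions, not part of the original material.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the country list; bytes are used so the parser
# honours the encoding named in the XML declaration.
COUNTRY_XML = b"""<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<COUNTRYLIST>
<COUNTRY group="G20">
<NAME>India</NAME>
<CAPITAL largestcity="No">New Delhi</CAPITAL>
</COUNTRY>
<COUNTRY group="G5">
<NAME>Japan</NAME>
<CAPITAL largestcity="Yes">Tokyo</CAPITAL>
</COUNTRY>
</COUNTRYLIST>"""

root = ET.fromstring(COUNTRY_XML)  # root element node: COUNTRYLIST

summary = []
for country in root.findall("COUNTRY"):  # element nodes
    capital = country.find("CAPITAL")
    summary.append((
        country.get("group"),            # attribute value
        country.findtext("NAME"),        # element value
        capital.text,                    # element value
        capital.get("largestcity"),      # attribute value
    ))

print(root.tag, summary)
```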

XML Constituents
 Elements
<address><name>somename</name></address>
The text between an element's start and end tags (here, somename inside
<name>) is parsable character data (PCDATA).
 Attributes
<Book Version="1.0"><name></name></Book>
An attribute has a name and a value in quotes.
 Five predefined entities allow special characters in the PCDATA
area.
&gt; for >
&lt; for <
&amp; for &
&apos; for '
&quot; for "
 CDATA section (character data that is not to be parsed). This is meant for
holding blocks of code or general-purpose data. Even HTML data can
be put here.
<![CDATA[ ... ]]>
 Processing Instructions (PI) or directives, given between <? and ?>
<?xml-stylesheet type="text/css" href="mySheet.css"?>
The initial declaration uses the same syntax (strictly, the XML
specification calls it an XML declaration rather than a PI):
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
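
To make the entity and CDATA constituents concrete, here is a small sketch in Python (standard library only). The sample string and the <note> wrapper element are made up for illustration. It shows the same text stored once with entities and once inside a CDATA section, and that a parser hands back the original characters in both cases.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = 'Tom & Jerry <"cartoon">'  # made-up text with special characters

# escape() replaces &, < and > with their predefined entities.
escaped = escape(raw)

# Store the text twice: once as escaped PCDATA, once in a CDATA section.
document = (
    "<note>"
    "<a>" + escaped + "</a>"
    "<b><![CDATA[" + raw + "]]></b>"
    "</note>"
)

root = ET.fromstring(document)
from_entities = root.findtext("a")  # entities are decoded by the parser
from_cdata = root.findtext("b")     # CDATA content is returned unchanged

print(escaped)
print(from_entities == raw, from_cdata == raw)
```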

Some XML Rules - 1
 All elements must have closing tags.
<address>invalid syntax
<address>valid syntax</address>
 Element names are case sensitive.
<Name>incorrect</name>
<Name>correct</Name>
 Elements must be correctly nested.
<address><name>incorrect</address></name>
<address><name>correct</name></address>
 Attribute values must be quoted.
<Book Version=1.0><name></name></Book> (incorrect)
<Book Version="1.0"><name></name></Book> (correct)

Some XML Rules - 2
 An XML document must have exactly one root element
(it can have any name, though).
<root>
<Child>correct</Child>
</root>
 Special characters in data values must be written as entities.
> as &gt; < as &lt; & as &amp; ' as &apos; " as &quot;
 Comments have this syntax.
<!-- This is a comment -->
Comments cannot contain -- (two hyphens) within their text.
 Whitespace is preserved, unlike in HTML. For example,
"Hello     World" in HTML would display as "Hello World". XML retains
the exact spaces specified.
 Empty elements have this optional short format.
<Name />

Some XML Rules - 3
 Whitespace is preserved, unlike in
HTML.
For example,
"Hello     World" in HTML would display as
"Hello World".
XML retains the exact spaces
specified.
 The optional style of writing empty
elements is
<Name /> in place of <Name></Name>
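
The rules above can be checked mechanically, because an XML parser rejects any document that breaks them. The sketch below uses Python's standard library; the helper is_well_formed and the sample labels are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text (hypothetical helper)."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


samples = {
    "missing closing tag": "<address>text",
    "case mismatch": "<Name>text</name>",
    "bad nesting": "<address><name>text</address></name>",
    "unquoted attribute": "<Book Version=1.0><name></name></Book>",
    "two roots": "<a></a><b></b>",
    "correct": '<Book Version="1.0"><name>text</name><Empty /></Book>',
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(label, "->", "accepted" if ok else "rejected")

# Whitespace inside element content is handed to the program unchanged.
preserved = ET.fromstring("<g>Hello     World</g>").text
```

Only the last sample is accepted; every other one violates exactly one of the rules listed on these slides.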

XML Practice: Element Vs Attributes - 1
 It is generally possible to define all data as
ELEMENT tags in a tree format.
<Library>
<Book>
<ID>201</ID>
<ISBN>8175257660</ISBN>
<Author>Name1</Author>
<Title>Book Title</Title>
</Book>
</Library>
 A neat alternative to the above is to use
ATTRIBUTES, as follows:
<Library>
<Book ID="201" ISBN="8175257660">
</Book>
</Library>

XML Practice: Element Vs Attributes -2
 Which method to use needs a thoughtful decision.
 Information that is surely singular (will not be
repeated) and is not domain specific is best stored
as an ATTRIBUTE.
 Information that you cannot classify, or that can be
repeated (e.g. the Author tag can be repeated in the
above example), should be stored as an ELEMENT.
 An even better format for the previous example would be
<Library>
<Book ID="201">
<ISBN>8175257660</ISBN>
</Book>
</Library>
This is because ISBN is a book-related property, while ID
may relate to a storage place.
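
A short sketch of why repeatable information belongs in elements: an element may appear several times under one parent, whereas a parser rejects an attribute that is repeated on the same tag. The second author "Name2" and the second ID value are invented purely for this illustration.

```python
import xml.etree.ElementTree as ET

# "Name2" is a made-up second author added to show repetition.
LIBRARY = """\
<Library>
<Book ID="201">
<ISBN>8175257660</ISBN>
<Author>Name1</Author>
<Author>Name2</Author>
</Book>
</Library>"""

book = ET.fromstring(LIBRARY).find("Book")
book_id = book.get("ID")                             # singular: attribute
authors = [a.text for a in book.findall("Author")]   # repeatable: elements

# Repeating an attribute on one tag is not well formed.
try:
    ET.fromstring('<Book ID="201" ID="202"></Book>')
    duplicate_attribute_allowed = True
except ET.ParseError:
    duplicate_attribute_allowed = False

print(book_id, authors, duplicate_attribute_allowed)
```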

XML Node Naming – Begins with
 Node names (for elements or attributes) must
begin with a letter or _ (underscore).
<1STLINE></1STLINE> invalid element naming
<LINE1></LINE1> valid naming
<BOOK 1Ver="1.00"></BOOK> invalid attribute naming
<BOOK _Ver="1.00"></BOOK> valid attribute naming

XML Node Naming – Consists of
 A name can consist of
 Any English character, or any foreign-language
character allowed by the encoding given in the
declaration.
<Name>Sun</Name>
<नाम>सूरज</नाम>
 A dot (.), a hyphen (-) or an _ (underscore)
<Address.Cityname>Delhi</Address.Cityname>
<Address-Cityname>Delhi</Address-Cityname>
<Address_Cityname>Delhi</Address_Cityname>
Tabs and Spaces are not allowed in
XML Node Names.

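XML Node Naming – Based on Namespace
 A name can belong to a namespace.
 The name "table" may be used for an HTML table or for furniture.
Namespaces resolve this clash by giving each name a prefix, as follows.
Each prefix must be bound to a namespace URI with an xmlns:prefix
attribute on the element or one of its ancestors (omitted here).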
<h:table>
<h:tr>
<h:td>Apples</h:td>
<h:td>Bananas</h:td>
</h:tr>
</h:table>
<f:table>
<f:name>Dining Table</f:name>
<f:width>120</f:width>
<f:length>230</f:length>
</f:table>
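
The following sketch shows how a program can tell the two kinds of table apart once the prefixes are bound to namespace URIs. The <root> wrapper and both example.com URIs are hypothetical placeholders chosen for this illustration; any unique URIs would serve.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs; the prefixes h and f are bound on <root>.
NS = {
    "h": "http://example.com/html",
    "f": "http://example.com/furniture",
}

NS_DOC = """\
<root xmlns:h="http://example.com/html" xmlns:f="http://example.com/furniture">
<h:table><h:tr><h:td>Apples</h:td><h:td>Bananas</h:td></h:tr></h:table>
<f:table><f:name>Dining Table</f:name><f:width>120</f:width><f:length>230</f:length></f:table>
</root>"""

root = ET.fromstring(NS_DOC)

# The same local name "table" is looked up in two different namespaces.
cells = [td.text for td in root.findall("h:table/h:tr/h:td", NS)]
furniture_name = root.findtext("f:table/f:name", namespaces=NS)

# Internally the parser stores the full URI in front of the local name.
furniture_tag = root.find("f:table", NS).tag

print(cells, furniture_name, furniture_tag)
```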

HTML Vs XML - 1
 Similarities.
Both use markup tags
(elements and attributes), e.g.
<H1>Heading1</H1> or <font
face="Verdana"></font>.
Both use entities, e.g. &lt; &gt;
etc.
Both are derived from SGML.

HTML Vs XML - 2
 Differences.
HTML has predefined tags; XML
tags are user defined.
HTML is for humans, and errors
are ignored. XML is for
computers, as a data store or a set of
definitions, so errors cannot be
ignored.
HTML is usually not updated by
programs, while XML is meant to be
written by programs.
HTML has a large number of
entities. XML has just five predefined ones.

XSL (Extensible Stylesheet Language)
 Unlike CSS (Cascading Style Sheets), which styles
HTML's predefined tags, XSL works with tags that are user
defined.
 It has three parts
XSLT (XSL Transformations): transforms XML
data, for example into XHTML for display on a webpage.
XPath: a way to reach a particular data item in
an XML file. This is often useful for
reading XML-based configuration files.
XSL-FO (XSL Formatting Objects): provides a
display/print formatting mechanism for XML
data.
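
As a rough illustration of the XPath idea, Python's xml.etree.ElementTree supports a limited subset of XPath expressions, which is enough to reach individual data items. The sample below reuses a shortened country list; the variable names are assumptions for this sketch, and full XPath needs a dedicated library.

```python
import xml.etree.ElementTree as ET

COUNTRIES = """\
<COUNTRYLIST>
<COUNTRY group="G20"><NAME>India</NAME><CURCODE>INR</CURCODE></COUNTRY>
<COUNTRY group="G5"><NAME>Japan</NAME><CURCODE>JPY</CURCODE></COUNTRY>
</COUNTRYLIST>"""

root = ET.fromstring(COUNTRIES)

# Select by attribute value: names of countries whose group is G20.
g20_names = [n.text for n in root.findall("./COUNTRY[@group='G20']/NAME")]

# Select by child text: the currency code of the country named Japan.
japan_code = root.findtext("./COUNTRY[NAME='Japan']/CURCODE")

# Walk the whole tree for every CURCODE element, at any depth.
all_codes = [c.text for c in root.iter("CURCODE")]

print(g20_names, japan_code, all_codes)
```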

DTD (Document Type Definition)
 A DTD is referred to within a DOCTYPE
declaration in an XML file, such as
<!DOCTYPE note SYSTEM "Note.dtd">
 The DTD declarations have the format shown
below. Here they are written inside the DOCTYPE itself
(an internal DTD); an external file such as Note.dtd
would hold only the <!ELEMENT ...> lines.
<!DOCTYPE note
[
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>
The XML file has a root node named note with four sub-elements.
The sub-elements contain PCDATA.

Parsing XML
 The process of reading an XML file and extracting
valid data from it is called "PARSING".
 Parsers are of two types
Non-Validating Parser: checks only that the document
is well formed; it does not check the document against a DTD.
Validating Parser: also checks the document
against its DTD.

Some Advanced Concepts Related to XML
 XML Schema: defines
validation rules in XSD
(XML Schema Definition) files, which
are themselves written in XML.
 XQuery: a way to search
within an XML file and select the
nodes that match given
criteria.

Where to View/Edit
 Browsers: Most Browsers are good at viewing
XML. Internet Explorer is particularly good at it.
 Editors: Special editors are available that provide
good XML viewing/editing facilities. Microsoft's
XML Editor and Peter's XML Editor are examples.
 Office Tools: Tools like MS-Word and FrontPage
provide good XML editing. Even MS-Excel
supports opening XML files.
 Visual Studio/WebDeveloper: They provide an
excellent environment for XML editing and
viewing, along with validation support.

Let's Quickly Revise
 2 types of nodes: Elements and Attributes. Elements
are repeatable. Attributes can always be rewritten as
elements; the reverse may not be true.
 Special syntax for non-parsable data: CDATA.
 5 entities for special symbols (<, >, ', ", &).
 HTML-style comments are allowed. <!-- comments -->
 Case-sensitive. Closing tags are required.
 Other Processing Instructions (PI) can be applied,
enclosed within <? ?>. The first line is usually a
version declaration, which uses the same syntax.
 Always have a single root node.

Future of XML
 All websites may one day be written in XML.
HTML has already been re-standardised as
XHTML, which provides better syntax checking
and browser compatibility.
 XML promises to be the most open system for
storing information from all IT devices, from
desktops to mobile phones, iPods, iPads,
DVD players and microwave ovens. It is already
in use and is expected to appear in more
and more devices.
 All office documents and e-books, offline and online,
may ultimately be in XML, as it is a non-proprietary
format that is simple and
meets the needs well.

 Ask and guide me at
sunmitraeducation@gmail.com
 Share this information with as
many people as possible.
 Keep visiting www.sunmitra.com
for programme updates.
