2. JERRY KURIAN. OVER 20 YEARS EXPERIENCE.
TECHNOLOGY INNOVATOR & ENTREPRENEUR
Started coding with an Intel 486 machine more than 25 years
back and enjoying it ever since. Developed using VB, Pascal,
C++, Java Enterprise and OSS, Scala, Node JS and the saga
continues. Started using Spring, hibernate before it became hip.
Started using Scala when it was in its infancy.
After spending 8 years working in various software
companies like Huawei Tech, Quidnunc across UK, US,
China and India, the entrepreneurship bug bit in 2006
(before it was hip!!). Built CellZapp, one of the pioneering
SMS social networks; I developed the
product on my own and sold it to marquee customers
like ESPN and Hungama Digital. Recently launched a
product in field informatics www.isense-tech.co.in.
Successfully launched across 3 pilot customers and on
track to sign up more.
A family man with two kids, I am a passionate weekend
cricketer and an involved dad. I urge my two sons to
follow their dreams, which they do by staying out of the
conventional schooling system and exploring their passion
at a democratic free school called BeMe. Check it out at
http://beme.org.in
3. ORIGIN OF XML
XML (Extensible Markup Language) is a derivative of
SGML (Standard Generalized Markup Language), one of
the earliest standardized markup languages
XML is not a programming language, but a set of
rules for structuring data in a representational manner
XML rules are standard and allow easy eXtensibility
as per business needs
4. WHY XML
Most applications have domain-specific data that
needs to be shared across components
With the service orientation of new-age applications,
data has to be shared across different applications
too
Applications can share data in a format that can be
parsed and understood by a program
5. WHY XML
Data has traditionally been shared by defining
protocols and arranging data as per the protocol
Every protocol needs development of a parser for
understanding the protocol and extracting data out of
it
Developing a parser is not an easy undertaking
and in fact adds no value to the overall application in
terms of its actual business goals
6. WHY XML
XML provides an easy alternative to
creating proprietary protocols
By following XML rules, a new domain-specific
language (protocol) can be defined without the
need to write a custom parser for it
Any well-formed XML document can be parsed by any
conforming XML parser
7. WHY XML
XML allows application developers to define a
business specific protocol which is easy to read for
humans as well as easy to parse for applications
Numerous parsers are available in most programming
languages to parse any XML document
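As a minimal sketch of this point, the snippet below reads a small domain-specific document with Python's standard library parser, without any parser code written for that domain. The order, item and quantity tag names are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical domain-specific document; the tag names are illustrative only.
doc = """
<order>
    <item>keyboard</item>
    <quantity>2</quantity>
</order>
"""

# A generic XML parser handles the document; no order-specific parser is needed.
root = ET.fromstring(doc)
item = root.find("item").text
quantity = int(root.find("quantity").text)
print(item, quantity)
```

The same two calls (fromstring and find) would work unchanged for any other well-formed document.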
8. ADVANTAGES
XML allows definition of data in a format
understandable to both humans and computers
Standard rules of XML allow a standard parser to be
used for parsing any XML document
XML enables representation of data in plain text,
allowing easy transfer over any type of
communication medium
9. XML DOCUMENT
An XML document is made up of tags. A start tag of the
form <name> denotes the start of a 'node'
The node ends with the matching end tag </name>
XML documents are made up of
Element
Attribute
Entity
Comment
10. XML USAGE PROBLEM
DEFINITION
Consider a multi user gaming platform where each
user plays a game on his own machine and makes a
move
Data about each move is sent to the other user in the
form of XML
The game requires each player to send a challenge
question to another player with a choice of at least 3
answers, one of which is correct
11. XML USAGE PROBLEM
DEFINITION
Whenever a move is sent by player 1 to player 2, the
details of player 1 along with current points should
also be sent
12. XML DEFINITION
In the problem definition, the various elements are
Player
Player Name
Player Address
Player Points
Questions
Question
Answer
13. XML DEFINITION
The various elements identified in the previous slide
can provide almost all the information about a move
made by a player
These elements will be arranged in an XML document
in the following manner
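The original arrangement is not reproduced in this text, so the following is only a sketch of one plausible layout, parsed with Python's standard library. The element names follow those used elsewhere in the deck (game, person, questions, question, query, answers, answer); the nesting of question under questions and the question text itself are assumptions.

```python
import xml.etree.ElementTree as ET

# Assumed arrangement of the elements; the question text is made up.
GAME_XML = """
<game>
    <person>
        <name>Jerry</name>
        <address>Bangalore</address>
        <points>10</points>
    </person>
    <questions>
        <question>
            <query>Which markup language is XML derived from?</query>
            <answers>
                <answer>HTML</answer>
                <answer>SGML</answer>
                <answer>JSON</answer>
            </answers>
        </question>
    </questions>
</game>
"""

game = ET.fromstring(GAME_XML)

# Read the player details and the answer choices sent with the move.
player_name = game.findtext("person/name")
points = int(game.findtext("person/points"))
answers = [answer.text for answer in game.iter("answer")]
print(player_name, points, answers)
```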
17. ELEMENT
An element is the basic building block of an XML document
Every aspect of the domain is described through
elements
In our example, nodes like <person>, <question>
etc. are elements
As seen above, one element can contain one or more
elements as its child elements
18. ATTRIBUTE
If an element has some additional characteristics
that are not elements in themselves, they can be
denoted using attributes
An attribute is placed within the element's start tag and
contains a name="value" pair
In our example, the list of answers should contain one
correct answer. The correctness of an answer can be
denoted using an attribute
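A small sketch of this idea, assuming the attribute is named correct (as in the DTD slides later in the deck) and takes the values "true" or "false". The answer texts are hypothetical.

```python
import xml.etree.ElementTree as ET

# Each answer carries a 'correct' attribute; the answer texts are made up.
answers = ET.fromstring("""
<answers>
    <answer correct="false">HTML</answer>
    <answer correct="true">SGML</answer>
    <answer correct="false">JSON</answer>
</answers>
""")

# Element.get() returns the attribute value, or None if the attribute is absent.
correct = [a.text for a in answers if a.get("correct") == "true"]
print(correct)
```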
20. ROOT ELEMENT
The XML elements can be represented in the form of
a tree
The topmost element of the XML document is the
root element, and each of its children is the root of its own
subtree
In our case, the <game> element is the root element
of the document.
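The tree view can be sketched in a few lines of Python. The outline helper below is a hypothetical name, not part of any library; it walks from the root and records how deep each element sits.

```python
import xml.etree.ElementTree as ET

tree = ET.ElementTree(ET.fromstring(
    "<game><person><name>Jerry</name></person><questions/></game>"
))
root = tree.getroot()  # the topmost element of the document


def outline(element, depth=0):
    """Return (depth, tag) pairs for an element and all its descendants."""
    rows = [(depth, element.tag)]
    for child in element:
        rows.extend(outline(child, depth + 1))
    return rows


rows = outline(root)
for depth, tag in rows:
    print("  " * depth + tag)
```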
21. EMPTY ELEMENTS
There could be elements that have neither child
elements nor text content
These elements may carry only attributes
Such elements are called empty elements
Empty elements are usually written as
<element_name/>. This is the same as
<element_name></element_name> with no content
in between
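The equivalence of the two notations can be checked with a short sketch. The bonus element and its points attribute are hypothetical names used only for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical empty element written in both notations.
short_form = ET.fromstring('<bonus points="5"/>')
long_form = ET.fromstring('<bonus points="5"></bonus>')

# Both forms parse to an element with attributes, no text and no children.
for element in (short_form, long_form):
    print(element.tag, element.attrib, element.text, len(element))
```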
22. COMMENTS
Comments can be added to an XML document to
give more information about tags
Comments are not part of the document's data and are
typically ignored by the parser
Comments are placed between the delimiters <!--
and -->
<!-- Your comment here -->
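As a sketch of how a parser treats comments: Python's ElementTree, with its default settings, drops them while building the tree. Other parsers may expose comments to the application, so this shows one parser's default behaviour rather than a rule for all of them.

```python
import xml.etree.ElementTree as ET

doc = """
<person>
    <!-- points are updated after every move -->
    <points>10</points>
</person>
"""

person = ET.fromstring(doc)

# With the default parser settings the comment does not appear in the tree.
child_tags = [child.tag for child in person]
print(child_tags)
```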
23. ENTITY
Entities can be used to substitute a value for a data
item
Entities behave like macros in that they are
placeholders for something else
Entity references start with & and end with ;
A predefined entity like &quot; will be replaced by "
when parsed
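The predefined entities can be seen in action with a short sketch. The query text is hypothetical; escape from xml.sax.saxutils is a standard library helper that goes in the opposite direction, turning special characters into entity references.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# The parser replaces the predefined entity references with their characters.
query = ET.fromstring(
    "<query>Is 3 &lt; 5 &amp; 5 &gt; 3 &quot;true&quot;?</query>"
)
parsed_text = query.text
print(parsed_text)

# Going the other way: escape special characters before writing them out.
escaped = escape("a < b & c")
print(escaped)
```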
24. CDATA
As seen in the example, most elements contain
text between their tags, which is the value of the element
The XML parser returns the value of an element by
getting the content between its tags
If the content contains special characters like
'<' or '&', it may lead to errors in parsing
25. CDATA
Such characters can be escaped by using entities as
explained earlier
But if you want to avoid entities, a CDATA section
can be used
When a CDATA section is encountered, the parser will
leave it alone and pass the text through unchanged
A CDATA section is written in the following format
<![CDATA[ content ]]>
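A minimal sketch of a CDATA section in use, with a hypothetical query text. The characters < and & inside the section would otherwise have to be written as entity references.

```python
import xml.etree.ElementTree as ET

# Inside CDATA, '<' and '&' are taken literally and need no escaping.
doc = "<query><![CDATA[Is 3 < 5 && 5 > 3?]]></query>"

query = ET.fromstring(doc)
cdata_text = query.text
print(cdata_text)
```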
28. DOCUMENT TYPES
There are two types of XML documents
Well Formed
Well formed and valid
Well-formed documents are XML documents that
follow the general XML rules
The XML documents above are examples of
well-formed XML documents
29. DOCUMENT TYPES
Well formed and valid XML documents are ones that
not only follow general XML rules, but also conform to
certain domain specific grammar
The domain-specific grammar can be expressed using a DTD
(Document Type Definition)
DTDs define rules for a domain-specific XML
document
30. DTD
DTD is made of tags that define the various nodes
allowed in an XML document
The DTD can be used to define the various aspects of
XML document like
Element
Attribute
Entities
31. DTD
A document can refer to a DTD using the
<!DOCTYPE> declaration, whose name must match the root element
<!DOCTYPE game [
<!-- DTD goes here -->
]>
<game>
<person>
<name>Jerry</name>
<address>Bangalore</address>
<points>10</points>
</person>
</game>
32. DTD
An XML document can also refer to an external DTD
file instead of defining it as part of the XML document
itself
<!DOCTYPE game SYSTEM "game.dtd">
The SYSTEM keyword specifies this to be a private DTD,
located at the given URI
33. PUBLIC DTDS
DTDs can be created by a public body and can be
referenced by any XML document
<!DOCTYPE document PUBLIC "FPI" "URI">
The DTD is identified by a formal
public identifier (FPI), followed by a URI for its location
FPI Example:
-//W3C//DTD XHTML 1.0 Transitional//EN
34. FPI RULES
The first field indicates whether the DTD is for a formal standard.
For DTDs you create on your own, this field should be -. If a non-
official standards body has created the DTD, you use +. For
formal standards bodies, this field is a reference to the standard
itself (such as ISO/IEC 19775:2003).
The second field holds the name of the group or person
responsible for the DTD. You should use a name that is unique
(for example, W3C just uses W3C).
The third field specifies the type of the document the DTD is for
and should be followed by a unique version number of some kind
(such as Version 1.0).
The fourth field specifies the language in which the DTD is
written (for example, EN for English).
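The four fields can be pulled apart with a short sketch. The FPI class, its field names and the parse_fpi function are hypothetical names introduced for this illustration; the only rule relied on is that the fields are separated by //.

```python
from typing import NamedTuple


class FPI(NamedTuple):
    """The four fields of a formal public identifier (field names assumed)."""
    registration: str   # "-", "+", or a reference to a formal standard
    owner: str          # group or person responsible for the DTD
    description: str    # document type and version
    language: str       # language the DTD is written in


def parse_fpi(text: str) -> FPI:
    """Split an FPI on its '//' separators into the four fields."""
    parts = text.split("//")
    if len(parts) != 4:
        raise ValueError(f"expected 4 fields, got {len(parts)}: {text!r}")
    return FPI(*parts)


fpi = parse_fpi("-//W3C//DTD XHTML 1.0 Transitional//EN")
print(fpi)
```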
35. DECLARING ELEMENT
The XML elements are declared in DTD using the
following syntax
<!ELEMENT name content_model >
The name indicates the name of the element
The content_model indicates the content that the
element is allowed to have as its children
An element that must not have any content is declared
with the content_model EMPTY
36. DECLARING ELEMENT
In our example, the game element can be declared in
the following way
<!ELEMENT game (person,questions)>
The above element definition specifies that the game
element can have person and questions elements as
its children
If an element declares its content_model as ANY then
that element can contain any declared elements or text,
effectively telling the parser to skip validation of the
element's content
<!ELEMENT name ANY>
37. CHILD ELEMENTS
The DTD can specify the number of children allowed
for each element
<!ELEMENT game (person)>
Specifies that the game element must have exactly one person
child element
<!ELEMENT questions (question)*>
Specifies that the questions element can have zero or
more question elements as children
38. CHILD ELEMENTS
Notation    Description
x | y       Element x or y can be present, but not both
x , y       Element x should be followed by element y
?           There can be zero or one occurrence of the element
+           There can be one or more occurrences of the element
*           There can be zero or more occurrences of the element
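These operators read much like regular-expression operators, which the sketch below uses to check child sequences. It is a stand-in, not a DTD validator (Python's standard library does not include one): the content models are translated to regular expressions by hand, and child_sequence is a hypothetical helper.

```python
import re
import xml.etree.ElementTree as ET


def child_sequence(element):
    """Return the child tag names as one string, each followed by a space."""
    return "".join(child.tag + " " for child in element)


# Hand-translated content models (assumption: each name becomes "name ").
GAME_MODEL = re.compile(r"person questions ")   # (person, questions)
QUESTIONS_MODEL = re.compile(r"(question )*")   # (question)*

game = ET.fromstring(
    "<game><person/><questions><question/><question/></questions></game>"
)
game_ok = GAME_MODEL.fullmatch(child_sequence(game)) is not None
questions_ok = (
    QUESTIONS_MODEL.fullmatch(child_sequence(game.find("questions"))) is not None
)

# Children in the wrong order do not satisfy the sequence (person, questions).
swapped = ET.fromstring("<game><questions/><person/></game>")
swapped_ok = GAME_MODEL.fullmatch(child_sequence(swapped)) is not None

print(game_ok, questions_ok, swapped_ok)
```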
39. ATTRIBUTE
Attributes provide additional details for an element
Attributes can be defined in a DTD using the following
notation
<!ATTLIST element_name attribute_name type
default_value>
40. ATTRIBUTE DEFINITION
In our example, the element answer has an attribute correct
<!ATTLIST answer correct CDATA #IMPLIED>
Default value   Description
value           Specifies the default value for the attribute
#REQUIRED       Mandates the attribute
#FIXED value    Sets the attribute's value to value
#IMPLIED        Attribute is optional
42. ATTRIBUTE TYPES
CDATA allows character data; special characters such as
< and & must be escaped
Enumerated types provide a list of allowed values
separated by |
<!ATTLIST answer correct (true | false) #REQUIRED>
NMTOKEN is any name token that conforms to the XML
standard
NMTOKENS is a set of NMTOKEN values separated by
white space
43. ENTITY
An entity in XML is just a data item
Entities are usually pieces of text that are used often
across the document
Entities can also refer to binary data
Entities can be declared like
<!ENTITY name definition>
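A sketch of a declared entity being expanded, assuming an internal DTD subset and Python's standard library parser. The entity name city is hypothetical.

```python
import xml.etree.ElementTree as ET

# The entity 'city' is declared once in the internal DTD and referenced below.
doc = """<!DOCTYPE game [
    <!ENTITY city "Bangalore">
]>
<game>
    <person>
        <name>Jerry</name>
        <address>&city;</address>
    </person>
</game>"""

game = ET.fromstring(doc)

# The parser substitutes the entity's definition for the &city; reference.
address = game.findtext("person/address")
print(address)
```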
46. XML SCHEMAS
XML schemas are an alternate way of defining the
structure of an XML document
Schemas are a much more comprehensive and detailed
way of specifying the XML syntax
Schemas also specify the elements and attributes of an
XML document
47. SCHEMA EXAMPLE
The game XML document can be defined as
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="game" type="gameType"/>
  <xsd:complexType name="gameType">
    <xsd:sequence>
      <xsd:element name="person" minOccurs="1"/>
      <xsd:element name="questions" type="questionType" minOccurs="1"/>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:complexType name="questionType">
    <xsd:sequence>
      <xsd:element ref="query"/>
      <xsd:element name="answers" type="answersType"/>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:element name="query" type="xsd:string"/>
  <xsd:complexType name="answersType">
    <xsd:sequence>
      <xsd:element name="answer" type="answerType"/>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:complexType name="answerType">
    <xsd:attribute name="correct" type="xsd:string" use="optional"/>
  </xsd:complexType>
</xsd:schema>
48. SCHEMA ELEMENT
A schema element can be defined in the following
manner
<xsd:element name="query" type="xsd:string"/>
Any element that contains child elements or attributes
needs to be defined as a complexType
Elements that enclose only simple data such as
numbers, strings or dates are simpleTypes
49. SCHEMA ELEMENT
There are some built-in simple schema types like
string
anyURI
boolean
date
dateTime
The <xsd:sequence> element specifies the order in which
the child elements must appear
50. NUMBER OF ELEMENTS
The person element has a minOccurs attribute to
specify that it will occur at least once
To make an element optional, minOccurs should be 0
To make it appear from 0 to 10 times, we can
use minOccurs="0" and maxOccurs="10"
To specify an unlimited number of occurrences, set
maxOccurs="unbounded"
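The meaning of these limits can be sketched as a simple count check. This is not schema validation (the Python standard library has no XSD validator); occurs_within is a hypothetical helper, with None standing in for "unbounded" and both limits defaulting to 1 as they do in XML Schema.

```python
import xml.etree.ElementTree as ET


def occurs_within(parent, tag, min_occurs=1, max_occurs=1):
    """Check a child count against minOccurs/maxOccurs (None = unbounded)."""
    count = len(parent.findall(tag))
    if count < min_occurs:
        return False
    return max_occurs is None or count <= max_occurs


game = ET.fromstring(
    "<game><person/><questions>"
    "<question/><question/><question/>"
    "</questions></game>"
)
questions = game.find("questions")

person_ok = occurs_within(game, "person")               # exactly one person
up_to_ten = occurs_within(questions, "question", 0, 10)  # 0 to 10 allowed
up_to_two = occurs_within(questions, "question", 0, 2)   # 3 found, too many
unbounded = occurs_within(questions, "question", 0, None)
print(person_ok, up_to_ten, up_to_two, unbounded)
```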
51. VALUES OF ELEMENT
An element can be given a default value through
<xsd:element name="term" type="xsd:integer"
default="10"/>
An element can be given a fixed value through
<xsd:element name="term" type="xsd:integer"
fixed="200"/>
52. ATTRIBUTES
An attribute of an element can be specified in the
following manner
<xsd:attribute name="correct" type="xsd:string"
use="optional"/>
The use="optional" setting specifies that the attribute is optional
The use attribute can take the values
optional
prohibited
required
A default or fixed value is given through the separate
default and fixed attributes
53. NAMESPACE
The namespaces are useful in reuse of XML tags
One XML document can reuse part of another
well-defined XML document
The new XML document may contain elements that
have the same name as elements of the XML document
being referred to
The name clashes can be avoided using a
namespace
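A sketch of two same-named elements kept apart by namespaces, as seen through Python's standard library parser, which reports each tag as {namespace URI}name. The URI http://xmlpowercorp is the one used in the deck's schema snippet; the second URI, the other prefix and the element contents are hypothetical.

```python
import xml.etree.ElementTree as ET

# Two <name> elements from different vocabularies in one document.
doc = """
<game xmlns="http://xmlpowercorp" xmlns:other="http://example.org/other">
    <name>Quiz</name>
    <other:name>Imported</other:name>
</game>
"""

game = ET.fromstring(doc)

# ElementTree expands each tag to {namespace-uri}local-name.
tags = [child.tag for child in game]
print(tags)

# Look up one of the two elements unambiguously by its namespace.
ns = {"o": "http://example.org/other"}
imported = game.findtext("o:name", namespaces=ns)
print(imported)
```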
54. NAMESPACE
The namespace for an XML document can be defined
using the targetNamespace attribute of the schema
element
<xsd:schema targetNamespace="http://xmlpowercorp"
xmlns="http://xmlpowercorp"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
attributeFormDefault="qualified" elementFormDefault
55. NAMESPACE
The qualified value specifies that the
namespace prefix must be given before every
element in that namespace
To avoid this we can set the value to unqualified