XML Introduction, Syntax of XML, Well-Formed XML Documents, XML Document Structure, Document Type Definitions, XML Namespace, XML Schemas, DOM (Document Object Model)
1. XML
Extensible Markup Language
Name of the Staff : M.FLORENCE DAYANA M.C.A.,M.Phil.,(Ph.D).,
Head, Dept. of BCA
Bon Secours College For Women
Thanjavur.
Class : III BCA A
Subject : Web Designing
Semester : V
Unit : 5 - XML
2. Chapter : 1 Introduction
• Extensible Markup Language (XML) is a markup language that defines a set of rules
for encoding documents in a format that is both human-readable and machine-
readable through use of tags that can be created and defined by users.
• XML is a subset of Standard Generalized Markup Language (SGML), which is
the parent of other markup languages, such as Hypertext Markup
Language (HTML).
• A markup language is composed of commands that instruct a program, such
as a word processor, text editor or internet browser, how to present the
output on the screen.
• XML is a meta markup language: a language for defining markup
languages.
• Meta markup is markup that also lets you provide extra data to describe
the structure of a document.
3. • XML allows an unlimited set of tags.
• It provides a framework for tagging structured data.
• It is designed to enable the use of SGML (Standard Generalized Markup
Language) on the World Wide Web (WWW).
• It is not a single, predefined markup language. It is a meta language that
specifies rules for creating markup languages.
• XML is a language for describing other languages, which lets you design
your own markup.
• XML documents are made up of markup and character data.
• Character data is also known as content (all the text and images that appear
on the page).
4. Some reasons why XML has become as popular as
it is today:
1. XML is easy to understand and read.
2. A large number of platforms support XML, and a large
set of tools is available for reading, writing
and manipulating XML data.
3. XML can be used across open standards that are
available today.
4. XML allows developers to create their own definitions
and models for representation.
5. Advantages of XML
XML brings power & flexibility to web based applications.
It provides a number of benefits to developers & users.
1. More meaningful searches.
2. Development of flexible web application.
3. Data integration from different sources.
4. Local computation & manipulation of data.
5. Multiple views of the data
6. Support for a wide variety of applications.
7. XML documents are easy to create.
6. Syntax of XML
The syntax of XML can be thought of at two distinct levels:
1. The general low-level rules that apply to all XML documents.
2. The rules for a particular document type, specified by either a document type definition (DTD) or an XML
Schema.
Rules when you create XML syntax:
1. XML names are used to name elements and attributes. An XML
name must begin with a letter or an underscore and can include
digits, hyphens and periods.
2. All XML elements must have a closing tag.
3. XML tags are case sensitive, so Body, body and BODY are all distinct
names.
4. There is no length limitation for XML names.
5. All XML elements must be properly nested.
6. All XML documents must have a root element.
7. Attribute values must always be quoted.
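The naming rule in item 1 can be sketched as a small check. This is a simplified, ASCII-only illustration written in Python (the choice of language and the helper name is_xml_name are assumptions, not part of the slides); the full XML specification also allows colons and many non-ASCII letters.

```python
import re

# Simplified, ASCII-only version of the naming rule stated above:
# the first character is a letter or underscore; later characters may
# also be digits, hyphens and periods. (Assumption: colons and
# non-ASCII letters, which real XML permits, are left out.)
XML_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")


def is_xml_name(name: str) -> bool:
    """Return True if the whole string matches the simplified name rule."""
    return XML_NAME.fullmatch(name) is not None


for candidate in ["body", "Body", "_note", "first-name", "2nd", "-item"]:
    print(candidate, is_xml_name(candidate))
```

Because names are case sensitive, body and Body both pass the check but remain two different names.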
7. • An XML document should begin with an XML declaration:
• <?xml version = "1.0" encoding = "utf-8"?>
• Character set & Encoding
– All information in XML is Unicode text. It supports
representation of all international character sets.
– Unicode can be transmitted directly as 16-bit characters (UTF-16).
– XML supports a range of encodings; the default is UTF-8.
– UTF-8 (Unicode Transformation Format, 8-bit)
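As a rough illustration of the encoding declaration, the sketch below stores the same text in two encodings and lets the parser decode each one from its declaration. It assumes Python's standard xml.etree.ElementTree module; the element name city and the helper read_city are made up for the example.

```python
import xml.etree.ElementTree as ET


def read_city(raw: bytes) -> str:
    """Parse raw bytes; the parser decodes them using the declared encoding."""
    return ET.fromstring(raw).text


# The same document text, stored as bytes in two different encodings.
utf8_doc = (
    '<?xml version="1.0" encoding="utf-8"?><city>Orléans</city>'
).encode("utf-8")
latin1_doc = (
    '<?xml version="1.0" encoding="ISO-8859-1"?><city>Orléans</city>'
).encode("latin-1")

print(read_city(utf8_doc))
print(read_city(latin1_doc))
```

Both calls give back the same Unicode string, even though the byte sequences on disk differ.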
8. Elements
• Every XML document defines a single root element.
• An element is everything from its start tag to its end tag, inclusive.
• An element can contain:
-> other elements
-> text
-> attributes
-> or a mix of all the above
• The top element is the root element, or document element.
• All the other elements are child elements.
• At the ends of the branches are the elements that contain character data.
• Empty elements do not contain any child elements or character data. They are
used for items such as image, sound and video files and line breaks.
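The kinds of element content listed above can be inspected with a short sketch. It assumes Python's standard xml.etree.ElementTree module; the sample document and the helper describe are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: a root with an attribute, a text-only child,
# a child with mixed content, and two empty elements (br and image).
DOC = """<page title="Home">
  <heading>Welcome</heading>
  <para>First line<br/>second line</para>
  <image src="logo.png"/>
</page>"""


def describe(element):
    """Return (has_children, has_text, has_attributes) for one element."""
    has_children = len(element) > 0
    has_text = bool((element.text or "").strip())
    has_attributes = bool(element.attrib)
    return has_children, has_text, has_attributes


root = ET.fromstring(DOC)
for element in root.iter():  # the root first, then every descendant
    print(element.tag, describe(element))
```

An element for which the first two flags are both False is an empty element in the sense used above; it may still carry attributes, as image does here.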
9. The Syntax of XML Tag
<!-- A tag with one attribute -->
<class name = "BCA A SECTION STUDENTS">
...
</class>
<!-- A tag with one nested tag -->
<patient>
  <name> BCA A SECTION STUDENTS </name>
  ...
</patient>
<!-- A tag with one nested tag, which contains three nested tags -->
<patient>
  <name>
    <first> BCA </first>
    <middle> A SECTION </middle>
    <last> STUDENTS </last>
  </name>
</patient>
10. Example
<?xml version="1.0"?>                      <!-- XML declaration -->
<mail>                                     <!-- root element -->
  <to>virat</to>                           <!-- child elements -->
  <from>sachin</from>
  <heading>match</heading>
  <body>don't forget to call me</body>
</mail>                                    <!-- end of root element -->
11. Well formed XML Documents.
An XML document that strictly follows these syntax rules is
considered well formed.
• A well-formed XML document must have a corresponding end tag for all
of its start tags.
• Nesting of elements within each other in an XML document must be
proper. For example, <tutorial><topic>XML</topic></tutorial> is a
correct way of nesting but <tutorial><topic>XML</tutorial></topic> is
not.
• An element must not have two attributes with the same name. For
example, <tutorial id="001"><topic>XML</topic></tutorial> is right,
but <tutorial id="001" id="w3r"><topic>XML</topic></tutorial> is
incorrect.
• Markup characters must be properly specified. For example,
<topic>XML &amp; DTD</topic> is right, not <topic>XML & DTD</topic>.
• An XML document can contain only one root element. So, the root
element of an xml document is an element which is present only once in
an xml document and it does not appear as a child element within any
other element.
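The well-formedness rules above can be tried out with a parser. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module, which rejects documents that are not well formed; the helper name is_well_formed is made up for the example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


samples = {
    "proper nesting": "<tutorial><topic>XML</topic></tutorial>",
    "improper nesting": "<tutorial><topic>XML</tutorial></topic>",
    "duplicate attribute": '<tutorial id="001" id="w3r"/>',
    "missing end tag": "<tutorial><topic>XML</topic>",
    "two root elements": "<tutorial/><tutorial/>",
}

for label, text in samples.items():
    print(label, is_well_formed(text))
```

Only the first sample is accepted; each of the others breaks one of the rules listed above.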
12. Chapter : 2 XML Document Structure
- An XML document often uses two auxiliary files:
- One to specify the structural syntactic rules
- One to provide a style specification
- An XML document has a single root element, but often consists of one or more entities
- Entities range in size from a single special character to much larger units of a document
- An XML document has one entity called the document entity, which can refer to the other entities.
Reasons for entity structure
1. Large documents are easier to manage
2. Repeated entities need not be repeated
3. Binary entities, such as images, can only be referenced in the document entity.
14. XML Document Structure (continued)
- Entity names:
- No length limitation
- Must begin with a letter, an underscore, or a colon
- Can include letters, digits, periods, dashes, underscores, or
colons
A reference to an entity has the form
&entity_name;
For example, if apple_image is the name of the entity, &apple_image; is a
reference to it.
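15. XML Document Structure (continued)
Predefined Entities or Reserved Characters
Entity reference    Character
&lt;                <
&gt;                >
&amp;               &
&quot;              "
&apos;              '
White space: spaces, tabs and new lines between elements are used for layout and are usually ignored by applications.
If several predefined entities must appear near each other in a document, a character data section is easier to read than a series of entity references.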
16. Character data section
<![CDATA[ content ]]>
e.g., instead of
Start &gt;&gt;&gt;&gt; HERE &lt;&lt;&lt;&lt;
use
<![CDATA[Start >>>> HERE <<<<]]>
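To check that entity references and a character data section carry the same content, here is a small sketch. It assumes Python's standard xml.etree.ElementTree and xml.sax.saxutils modules; the note element is invented for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# The same character data written two ways.
with_entities = "<note>Start &gt;&gt;&gt;&gt; HERE &lt;&lt;&lt;&lt;</note>"
with_cdata = "<note><![CDATA[Start >>>> HERE <<<<]]></note>"

text_from_entities = ET.fromstring(with_entities).text
text_from_cdata = ET.fromstring(with_cdata).text
print(text_from_entities == text_from_cdata)

# Going the other way: escape() replaces &, < and > with entity references.
print(escape("Start >>>> HERE <<<<"))
```

The parser hands the application identical text in both cases; the CDATA form is only easier for a human to read and write.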
18. Chapter : 3 Document Type Definitions
- A DTD is a set of structural rules called declarations
- These rules specify a set of elements, along with how
and where they can appear in a document.
- The DTD for a document can be internal or external
- All of the declarations of a DTD are enclosed in
the block of a DOCTYPE
- DTD declarations have the form:
<!keyword … >
- There are four possible declaration keywords:
ELEMENT, ATTLIST, ENTITY, and NOTATION
19. Document Type Definitions (continued)
- Declaring Elements
- An element declaration specifies the name of an element and the
element’s structure
- If the element is a leaf node of the document tree, its structure is in terms
of characters
- If it is an internal node, its structure is a list of child elements.
(either leaf or internal nodes)
- General form:
<!ELEMENT element_name (list of child names)>
e.g., for a document tree with the root memo and the children from, to, date, re and body:
<!ELEMENT memo (from, to, date, re, body)>
20. Document Type Definitions (continued)
- Declaring Attributes : An attribute declaration must include the name of the
element to which the attribute belongs, the attribute name, and its type.
- General form:
<!ATTLIST el_name at_name at_type [default]>
If more than one attribute is declared for a given
element, the declarations can be combined:
<!ATTLIST el_name
  at_name_1 at_type_1 default_value_1
  at_name_2 at_type_2 default_value_2
  ………….
  at_name_n at_type_n default_value_n
>
21. Document Type Definitions (continued)
- Declaring Attributes (continued)
- Attribute types: there are ten different types, but we will consider only CDATA
- Default values:
a value (used when the element does not specify one),
#FIXED value (every element will have this value),
#REQUIRED (every instance of the element must have a value specified), or
#IMPLIED (no default value and need not specify a value)
e.g.,
<!ATTLIST car doors CDATA "4">
<!ATTLIST car engine_type CDATA #REQUIRED>
<!ATTLIST car price CDATA #IMPLIED>
<!ATTLIST car make CDATA #FIXED "Ford">
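The effect of these default values can be observed with a non-validating parser. The sketch below assumes Python's standard xml.parsers.expat module, which reads an internal DTD and reports defaulted attributes along with the specified ones; the surrounding DOCTYPE, the car instance and the helper attributes_seen are assumptions added for illustration. Expat does not validate, so it would not complain about a missing #REQUIRED attribute.

```python
import xml.parsers.expat

# The four ATTLIST declarations from above, placed in an internal DTD,
# followed by one car element that specifies only engine_type.
DOC = """<?xml version="1.0"?>
<!DOCTYPE car [
  <!ELEMENT car EMPTY>
  <!ATTLIST car doors CDATA "4">
  <!ATTLIST car engine_type CDATA #REQUIRED>
  <!ATTLIST car price CDATA #IMPLIED>
  <!ATTLIST car make CDATA #FIXED "Ford">
]>
<car engine_type="V8"/>"""


def attributes_seen(document: str) -> dict:
    """Map each element name to the attributes the parser reports for it."""
    seen = {}
    parser = xml.parsers.expat.ParserCreate()

    def on_start(name, attrs):
        seen[name] = dict(attrs)

    parser.StartElementHandler = on_start
    parser.Parse(document, True)
    return seen


print(attributes_seen(DOC))
```

The attributes doors and make are filled in from the DTD, while price, declared #IMPLIED, stays absent because the element does not specify it.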
22. Chapter : 4 XML Namespace
• XML Namespace is a mechanism to avoid name conflicts by differentiating
elements or attributes within an XML document that may have identical
names, but different definitions.
• An XML namespace is a collection of names used in XML documents as
element types and attribute names
23. XML Namespace (continued)
Eg 1:
<?xml version="1.0" encoding="UTF-8"?>
<Book xmlns="http://www.xmlws101.com/xmlns/Book">
  <title> SE </title>
  <price> 175 </price>
  <year> 2017 </year>
</Book>
Eg 2:
<?xml version="1.0" encoding="UTF-8"?>
<Author xmlns="http://www.xmlws101.com/xmlns/Author">
  <title> SE </title>
  <fname> richard </fname>
  <lname> fairdly </lname>
</Author>
In these examples, the <Book> and <Author> elements each declare an XML namespace that
uniquely identifies the element and all the elements contained within it.
24. XML Namespace (continued)
Can two namespaces be declared on one element? Yes.
Namespaces can be declared on the elements that use them, or in the XML root element, as here:
<root
    xmlns:h="http://www.w3.org/TR/html4/"
    xmlns:f="http://www.w3schools.com/furniture">
  <h:table>
    <h:tr>
      <h:td>Apples</h:td>
      <h:td>Bananas</h:td>
    </h:tr>
  </h:table>
  <f:table>
    <f:name>African Coffee Table</f:name>
    <f:width>80</f:width>
    <f:length>120</f:length>
  </f:table>
</root>
The xmlns:h attribute gives the h: prefix a qualified namespace, so the first
<table> element belongs to that namespace.
The xmlns:f attribute gives the f: prefix a qualified namespace, so the second
<table> element belongs to a different one.
When a namespace is defined for an element, all child elements
with the same prefix are associated with the same namespace.
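How a parser tells the two table elements apart can be sketched as follows. The example assumes Python's standard xml.etree.ElementTree module, which writes a qualified name as {namespace}local-name; the shortened document and the NS prefix map are set up just for this illustration.

```python
import xml.etree.ElementTree as ET

DOC = """<root xmlns:h="http://www.w3.org/TR/html4/"
      xmlns:f="http://www.w3schools.com/furniture">
  <h:table><h:tr><h:td>Apples</h:td><h:td>Bananas</h:td></h:tr></h:table>
  <f:table><f:name>African Coffee Table</f:name></f:table>
</root>"""

# Prefix map used only for searching; the prefixes need not match the document's.
NS = {
    "h": "http://www.w3.org/TR/html4/",
    "f": "http://www.w3schools.com/furniture",
}

root = ET.fromstring(DOC)

# Both children are called "table", but their qualified names differ.
tables = [child.tag for child in root]
cells = [td.text for td in root.findall("h:table/h:tr/h:td", NS)]
furniture_name = root.find("f:table/f:name", NS).text

print(tables)
print(cells)
print(furniture_name)
```

The prefix itself is only shorthand; what identifies an element is the namespace name together with the local name.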
25. Chapter : 5 XML Schemas
• “Schemas” is a general term; DTDs are one form of XML
schema
– According to the dictionary, a schema is “a structured
framework or plan”
An XML Schema:
• defines elements that can appear in a document
• defines attributes that can appear within elements
• defines which elements are child elements
• defines the sequence in which the child elements can appear
• defines the number of child elements
• defines whether an element is empty or can include text
• defines default values for attributes
26. XML Schemas (continued)
- Schemas are written using a namespace
- Every XML schema has a single root element, schema
- The schema element must specify the namespace for schemas
as its xmlns:xsd attribute
- XML Schema defines 44 data types
- Primitive: string, boolean, float, …
- Derived: byte, integer, positiveInteger, …
27. XML Schemas (continued)
Example of an XML instance document that refers to a schema
<?xml version="1.0" encoding="UTF-8"?>
<City xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:noNamespaceSchemaLocation="AtomicType.xsd">
</City>
(xmlns:xsi specifies the namespace; xsi:noNamespaceSchemaLocation specifies the schema filename)
Example of a complex type in a schema
<xsd:complexType name="sportscar">
  <xsd:sequence>
    <xsd:element name="make" type="xsd:string"/>
    <xsd:element name="model" type="xsd:string"/>
    <xsd:element name="engine" type="xsd:string"/>
    <xsd:element name="year" type="xsd:decimal"/>
  </xsd:sequence>
</xsd:complexType>
(a complex type can define ordered or unordered groups of elements)
(a sequence defines an ordered group only)
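Because a schema is itself an XML document, its declarations can be read with an ordinary parser. The sketch below assumes Python's standard xml.etree.ElementTree module, which parses the schema but does not validate documents against it. It also assumes the sportscar type is wrapped in an xsd:schema root and an xsd:sequence group; the helper declared_elements is invented for the example.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Assumption: a complete schema document around the sportscar type.
SCHEMA = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:complexType name="sportscar">
    <xsd:sequence>
      <xsd:element name="make" type="xsd:string"/>
      <xsd:element name="model" type="xsd:string"/>
      <xsd:element name="engine" type="xsd:string"/>
      <xsd:element name="year" type="xsd:decimal"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>"""


def declared_elements(schema_text: str, type_name: str):
    """List (name, type) pairs declared in the sequence of one complex type."""
    ns = {"xsd": XSD_NS}
    root = ET.fromstring(schema_text)
    for complex_type in root.findall("xsd:complexType", ns):
        if complex_type.get("name") == type_name:
            return [
                (element.get("name"), element.get("type"))
                for element in complex_type.findall("xsd:sequence/xsd:element", ns)
            ]
    return []


for name, data_type in declared_elements(SCHEMA, "sportscar"):
    print(name, data_type)
```

The pairs come back in document order, which is also the order a sequence requires of the child elements.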
28. Chapter : 6 Displaying XML Documents with CSS (Cascading Style Sheets)
- Cascading Style Sheets (CSS) is designed primarily to enable the separation of presentation and content, including aspects such as the layout, colors, and fonts.
- CSS is a style sheet language used for describing the presentation of a document written in a markup language.
- CSS is a technology for defining the layout or formatting of documents.
- A CSS style sheet for an XML document is just a list of its tags and associated styles
Example :
<?xml version="1.0"?>
<!-- XML demonstration -->
<?xml-stylesheet type="text/css" href="style9.css"?>
<!DOCTYPE planet>
<planet>
  <ocean>
    <name>Arctic</name>
    <area>13,000</area>
    <depth>1,200</depth>
  </ocean>
  <ocean>
    <name>Atlantic</name>
    <area>87,000</area>
    <depth>3,900</depth>
  </ocean>
</planet>
29. XSLT Style Sheets
• A style sheet is a file that contains a declarative set of rules for
converting an XML document into another document.
• XSL (eXtensible Stylesheet Language) began as a standard for
the presentation of XML documents.
• XSLT (Extensible Stylesheet Language Transformations) is
a language for transforming XML documents into other
XML documents, or other formats such as HTML for web
pages, plain text or XSL Formatting Objects, which may
subsequently be converted to other formats, such as PDF,
PostScript and PNG.
• XSL is a family of recommendations for defining XML document
transformations and presentation.
30. XSLT Style Sheets (continued)
- An XSLT processor merges an XML document into an XSLT document (a style
sheet) to create an XSL document
- This merging is a template-driven process
- The XSLT processor examines the nodes of the XML document, comparing them
with the XSLT templates
- Matching templates are put in a list of templates that could be applied; if
more than one matches, a set of rules determines which is used.
- XSL is split into three parts:
- XSLT - Transformations
- XPath - XML Path Language
- XSL-FO - Formatting objects for printable documents
- An XML document names its XSLT style sheet in a processing instruction:
<?xml-stylesheet type = "text/xsl" href = "XSLT style sheet.xsl"?>
32. XML Processors
- There are two different approaches to designing XML processors:
- SAX (Simple API for XML)
- DOM (Document Object Model)
SAX is an event-driven programming interface for XML parsing.
SAX is widely accepted and supported.
SAX packages (Java):
org.xml.sax -> defines the handler interfaces, whose methods are called when parsing events or errors
occur
org.xml.sax.helpers -> provides default implementations.
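The event-driven style can be sketched with Python's standard xml.sax module, used here as a stand-in for the Java packages named above. The handler class EventLogger, the helper log_events and the sample document are assumptions made for the example.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Records parsing events in the order the parser reports them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        # Skip whitespace-only chunks; keep real character data.
        if content.strip():
            self.events.append(("text", content.strip()))


def log_events(document: bytes):
    """Parse the document and return the list of recorded events."""
    handler = EventLogger()
    xml.sax.parseString(document, handler)
    return handler.events


MAIL = b"<mail><to>virat</to><from>sachin</from></mail>"
for event in log_events(MAIL):
    print(event)
```

The parser never builds a tree: it calls the handler as it reads, so the application keeps only what it chooses to record.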
33. DOM
- The DOM (Document Object Model) is an interface used to navigate and manipulate
the structure and content of a document.
- The DOM processor builds a DOM tree structure of the document.
- The root of the tree is the document node, which has one or more child nodes.
EXAMPLE
<?xml version="1.0" encoding="UTF-8"?>
<Products>
  <Product Category="Actor">
    <ProductID>Mersal</ProductID>
    <Name>Vijay</Name>
    <ProductNumber>101</ProductNumber>
  </Product>
  <Product Category="Actress">
    <ProductID>bhagubali</ProductID>
    <Name>Anushka</Name>
    <ProductNumber>102</ProductNumber>
  </Product>
  <Product Category="Comedy">
    <ProductID>VVS</ProductID>
    <Name>Soori</Name>
    <ProductNumber>103</ProductNumber>
  </Product>
</Products>
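Navigating and changing a DOM tree can be sketched with Python's standard xml.dom.minidom module. The document below is a shortened, well-formed version of the example (two products, with element names written without spaces), and the helper child_text is made up for the illustration.

```python
from xml.dom import minidom

DOC = b"""<?xml version="1.0" encoding="UTF-8"?>
<Products>
  <Product Category="Actor">
    <Name>Vijay</Name>
    <ProductNumber>101</ProductNumber>
  </Product>
  <Product Category="Actress">
    <Name>Anushka</Name>
    <ProductNumber>102</ProductNumber>
  </Product>
</Products>"""


def child_text(parent, tag):
    """Return the character data of the first <tag> element under parent."""
    return parent.getElementsByTagName(tag)[0].firstChild.data


dom = minidom.parseString(DOC)   # builds the whole tree in memory
root = dom.documentElement       # the root element, Products
products = root.getElementsByTagName("Product")

# Navigation: visit every product in document order.
for product in products:
    print(product.getAttribute("Category"), child_text(product, "Name"))

# Manipulation with random access: revisit the second product and change it.
products[1].getElementsByTagName("ProductNumber")[0].firstChild.data = "202"
print(child_text(products[1], "ProductNumber"))
```

Since the whole tree stays in memory, any node can be revisited or changed at any time, at the cost of holding the full document.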
35. DOM (continued)
- Advantages of the DOM :
1. Good if any part of the document must be accessed more than once
2. If any rearrangement of the document must be done, it is facilitated by
having a representation of the whole document in memory
3. Random access to any part of the document is possible
- Disadvantages of the DOM
1. Large documents require a large amount of memory
2. The DOM approach is slower than SAX