UNIT-II XML
Introduction to XML
XML stands for Extensible Markup Language. It is a text-based markup language derived from
Standard Generalized Markup Language (SGML).
XML tags identify the data and are used to store and organize it, whereas HTML tags specify
how the data is displayed. XML is not going to replace HTML in the near future, but it
introduces new possibilities by adopting many successful features of HTML.
There are three important characteristics of XML that make it useful in a variety of systems and
solutions:
XML is extensible: XML allows you to create your own self-descriptive tags, or language, that
suits your application.
XML carries the data, does not present it: XML allows you to store the data irrespective of
how it will be presented.
XML is a public standard: XML was developed by an organization called the World Wide
Web Consortium (W3C) and is available as an open standard.
XML Usage
A short list of XML usage says it all:
XML can work behind the scenes to simplify the creation of HTML documents for large web
sites.
XML can be used to exchange information between organizations and systems.
XML can be used for offloading and reloading of databases.
XML can be used to store and arrange data in a way that suits your data handling needs.
XML can easily be merged with style sheets to create almost any desired output.
Virtually any type of data can be expressed as an XML document.
What is Markup?
XML is a markup language that defines a set of rules for encoding documents in a format that
is both human-readable and machine-readable. So what exactly is a markup language?
Markup is information added to a document that enhances its meaning in certain ways, in
that it identifies the parts and how they relate to each other. More specifically, a markup
language is a set of symbols that can be placed in the text of a document to demarcate and
label the parts of that document.
The following example shows how XML markup looks when embedded in a piece of text:
<message>
<text>Hello, world!</text>
</message>
This snippet includes the markup symbols, or the tags such as
<message>...</message> and <text>...</text>. The tags <message> and
</message> mark the start and the end of the XML code fragment. The tags <text> and
</text> surround the text Hello, world!.
Is XML a Programming Language?
A programming language consists of grammar rules and its own vocabulary, which are used to
create computer programs. These programs instruct a computer to perform specific tasks. XML
does not qualify as a programming language because it does not perform any computation or
algorithms. It is usually stored in a simple text file and is processed by special software
that is capable of interpreting XML.
Tags and Elements
An XML file is structured by several XML elements, also called XML nodes or XML tags.
The names of XML elements are enclosed in angle brackets < > as shown below:
<element>
Syntax Rules for Tags and Elements
Element Syntax: Each XML element needs to be closed, either with a matching end tag as
shown below:
<element>....</element>
or, for an element with no content, with a self-closing tag:
<element/>
Nesting of elements: An XML-element can contain multiple XML-elements as its children,
but the children elements must not overlap. i.e., an end tag of an element must have the same
name as that of the most recent unmatched start tag.
The following example shows incorrectly nested tags:
<?xml version="1.0"?>
<contact-info>
<company>IARE
</contact-info>
</company>
The following example shows correctly nested tags:
<?xml version="1.0"?>
<contact-info>
<company>IARE</company>
</contact-info>
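The nesting rule can be checked mechanically by handing both variants to a parser. The following is a minimal sketch, assuming the JAXP DOM parser bundled with the JDK; the class name WellFormedCheck and the helper isWellFormed are hypothetical names chosen for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

final class WellFormedCheck {

    // Returns true when the parser accepts the text as well-formed XML.
    // Any failure (including I/O or configuration problems) is reported as false.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Replace the default handler so fatal errors are not printed to stderr.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) { }
                public void fatalError(SAXParseException e) throws SAXException {
                    throw e;
                }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String overlapping = "<contact-info><company>IARE</contact-info></company>";
        String nested = "<contact-info><company>IARE</company></contact-info>";
        System.out.println(isWellFormed(overlapping)); // false
        System.out.println(isWellFormed(nested));      // true
    }
}
```

The same helper also rejects a document with two top-level elements or with start and end tags that differ in case, since both are well-formedness errors.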
Let us learn about one of the most important parts of XML, the XML tags. XML tags form the
foundation of XML. They define the scope of an element in the XML. They can also be used to
insert comments, declare settings required for parsing the environment and to insert special
instructions.
We can broadly categorize XML tags as follows:
Start Tag
The beginning of every non-empty XML element is marked by a
start-tag. An example of start-tag is:
<address>
End Tag
Every element that has a start tag should end with an end-tag. An
example of end-tag is:
</address>
Note that the end tags include a solidus ("/") before the name of an
element.
Empty Tag
The text that appears between start-tag and end-tag is called content. An element which has
no content is termed as empty. An empty element can be represented in two ways as below:
(1) A start-tag immediately followed by an end-tag as shown below:
<hr></hr>
(2) A complete empty-element tag is as shown below:
<hr />
Empty-element tags may be used for any element which has no content.
XML Tags Rules
The following rules must be followed when using XML tags:
Rule 1
XML tags are case-sensitive. The following line is an example of wrong syntax: the end tag
</Address> differs in case from the start tag <address>, which XML treats as an error.
<address>This is wrong syntax</Address>
The following line shows the correct way, where the start tag and the end tag use the same
case.
<address>This is correct syntax</address>
Rule 2
XML tags must be closed in an appropriate order, i.e., an XML tag opened inside another
element must be closed before the outer element is closed. For example:
<outer_element>
<internal_element>
This tag is closed before the outer_element
</internal_element>
</outer_element>
XML Elements
XML elements can be defined as the building blocks of an XML document. Elements can behave
as containers to hold text, elements, attributes, media objects or all of these.
Each XML document contains one or more elements, the scope of which is
either delimited by start and end tags or, for empty elements, by an empty-element
tag.
Syntax
Following is the syntax to write an XML element:
<element-name attribute1 attribute2>
....content
</element-name>
where
element-name is the name of the element. The name and its
case in the start and end tags must match.
attribute1, attribute2 are attributes of the element
separated by white spaces. An attribute defines a property of the element. It
associates a name with a value, which is a string of characters. An attribute
is written as:
name = "value"
The name is followed by an = sign and a string value inside double (" ") or single (' ')
quotes.
Empty Element
An empty element (element with no content) has following syntax:
<name attribute1 attribute2.../>
Example of an XML document using various XML elements:
<?xml version="1.0"?>
<contact-info>
<address category="residence">
<name>Tanmay Patil</name>
<company>TutorialsPoint</company>
<phone>(011) 123-4567</phone>
</address>
</contact-info>
XML Elements Rules
The following rules must be followed for XML elements:
An element name can contain any alphanumeric characters. The only punctuation
marks allowed in names are the hyphen (-), underscore (_) and period (.); the colon (:)
is also permitted but is reserved for namespaces. A name must not begin with a digit,
a hyphen or a period.
Names are case-sensitive. For example, Address, address, and ADDRESS are
different names.
The names in the start and end tags of an element must be identical.
An element, which is a container, can contain text or elements as seen in the above
example.
Root element: An XML document can have only one root element. For example, following
is not a correct XML document, because both the x and y elements occur at the top level
without a root element:
<x>...</x>
<y>...</y>
The following example shows a correctly formed XML document:
<root>
<x>...</x>
<y>...</y>
</root>
Case sensitivity: The names of XML elements are case-sensitive. That means the names in
the start and end tags must be in exactly the same case.
For example, <contact-info> is different from <Contact-Info>.
XML DTD
What is a DTD?
A DTD is a Document Type Definition.
A DTD defines the structure and the legal elements and attributes of an XML document.
Why Use a DTD?
With a DTD, independent groups of people can agree on a standard DTD for interchanging data.
An application can use a DTD to verify that XML data is valid.
An Internal DTD Declaration
If the DTD is declared inside the XML file, it must be wrapped inside the <!DOCTYPE>
definition:
XML document with an internal DTD
<?xml version="1.0"?>
<!DOCTYPE note [
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend</body>
</note>
The DTD above is interpreted like this:
!DOCTYPE note defines that the root element of this document is note
!ELEMENT note defines that the note element must contain four elements:
"to,from,heading,body"
!ELEMENT to defines the to element to be of type "#PCDATA"
!ELEMENT from defines the from element to be of type "#PCDATA"
!ELEMENT heading defines the heading element to be of type "#PCDATA"
!ELEMENT body defines the body element to be of type "#PCDATA"
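To see the internal DTD acting as a validity check, the document can be parsed with validation switched on. This is a minimal sketch, assuming the JAXP parser bundled with the JDK; the class name DtdValidationSketch, the DOCTYPE constant and the validate helper are hypothetical names, and the XML declaration is omitted for brevity.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

final class DtdValidationSketch {

    // The internal DTD from the note example, ready to be prepended to a document.
    static final String DOCTYPE =
            "<!DOCTYPE note [\n"
          + "<!ELEMENT note (to,from,heading,body)>\n"
          + "<!ELEMENT to (#PCDATA)>\n"
          + "<!ELEMENT from (#PCDATA)>\n"
          + "<!ELEMENT heading (#PCDATA)>\n"
          + "<!ELEMENT body (#PCDATA)>\n"
          + "]>\n";

    // Parses with DTD validation enabled and returns the validity errors reported.
    static List<String> validate(String xml) {
        final List<String> errors = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // check the document against its DTD
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) { errors.add(e.getMessage()); }
                public void fatalError(SAXParseException e) throws SAXParseException {
                    throw e; // not well-formed: give up
                }
            });
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return errors;
    }

    public static void main(String[] args) {
        String valid = DOCTYPE + "<note><to>Tove</to><from>Jani</from>"
                + "<heading>Reminder</heading><body>Don't forget me this weekend</body></note>";
        String missingBody = DOCTYPE + "<note><to>Tove</to><from>Jani</from>"
                + "<heading>Reminder</heading></note>";
        System.out.println(validate(valid).isEmpty());       // true
        System.out.println(validate(missingBody).isEmpty()); // false
    }
}
```

A document that breaks the content model is still well formed, so the parser reports it through error() rather than fatalError(); that is why the sketch collects messages instead of relying on an exception.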
An External DTD Declaration
If the DTD is declared in an external file, the <!DOCTYPE> definition must contain a reference
to the DTD file:
XML document with a reference to an external DTD
<?xml version="1.0"?>
<!DOCTYPE note SYSTEM "note.dtd">
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
And here is the file "note.dtd", which contains the DTD:
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
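When the DTD lives in a separate file, the parser has to locate "note.dtd" from the SYSTEM identifier. One way to control that lookup in Java is an EntityResolver. The sketch below is an illustration under assumptions: it serves the DTD text from a string instead of a real file, and the names ExternalDtdSketch, NOTE_DTD and validate are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

final class ExternalDtdSketch {

    // Stand-in for the contents of the file note.dtd.
    static final String NOTE_DTD =
            "<!ELEMENT note (to,from,heading,body)>\n"
          + "<!ELEMENT to (#PCDATA)>\n"
          + "<!ELEMENT from (#PCDATA)>\n"
          + "<!ELEMENT heading (#PCDATA)>\n"
          + "<!ELEMENT body (#PCDATA)>\n";

    // Validates a document whose DOCTYPE points at note.dtd.
    static List<String> validate(String xml) {
        final List<String> errors = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Answer requests for note.dtd from memory; null means "use the default lookup".
            builder.setEntityResolver((publicId, systemId) ->
                    systemId != null && systemId.endsWith("note.dtd")
                            ? new InputSource(new StringReader(NOTE_DTD))
                            : null);
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) { errors.add(e.getMessage()); }
                public void fatalError(SAXParseException e) throws SAXParseException {
                    throw e;
                }
            });
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return errors;
    }

    public static void main(String[] args) {
        String head = "<!DOCTYPE note SYSTEM \"note.dtd\">";
        System.out.println(validate(head + "<note><to>Tove</to><from>Jani</from>"
                + "<heading>Reminder</heading><body>Don't forget me this weekend!</body></note>"));
        System.out.println(validate(head + "<note><to>Tove</to></note>").isEmpty());
    }
}
```

With a real file on disk the resolver is not needed, provided the document is parsed from a location that lets the parser resolve the relative name "note.dtd".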
DTD - XML Building Blocks
The main building blocks of both XML and HTML documents are elements.
The Building Blocks of XML Documents
Seen from a DTD point of view, all XML documents are made up of the following building
blocks:
Elements
Attributes
Entities
PCDATA
CDATA
Elements:
Elements are the main building blocks of both XML and HTML documents.
Examples of HTML elements are "body" and "table". Examples of XML elements could be
"note" and "message". Elements can contain text, other elements, or be empty. Examples of
empty HTML elements are "hr", "br" and "img".
Examples:
<body>some text</body>
<message>some text</message>
Attributes:
Attributes provide extra information about elements.
Attributes are always placed inside the opening tag of an element. Attributes always come in
name/value pairs. The following "img" element has additional information about a source file:
<img src="computer.gif" />
The name of the element is "img". The name of the attribute is "src". The value of the attribute is
"computer.gif". Since the element itself is empty it is closed by a " /".
Entities
Some characters have a special meaning in XML, like the less than sign (<) that defines the start
of an XML tag.
Most of you know the HTML entity: "&nbsp;". This "non-breaking space" entity is used in
HTML to insert an extra space in a document. Entities are expanded when a document is parsed
by an XML parser.
The following entities are predefined in XML:
Entity reference    Character
&lt;                <
&gt;                >
&amp;               &
&quot;              "
&apos;              '
PCDATA:
PCDATA means parsed character data.
Think of character data as the text found between the start tag and the end tag of an XML
element.
PCDATA is text that WILL be parsed by a parser. The text will be examined by the parser for
entities and markup.
Tags inside the text will be treated as markup and entities will be expanded.
However, parsed character data should not contain any &, <, or > characters; these need to be
represented by the &amp; &lt; and &gt; entities, respectively.
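When text is written into an XML document by a program, these replacements have to be applied before the text is placed between tags. The following is a minimal sketch of such an escaping step; the class name XmlEscape and the method escape are hypothetical, and real projects would usually rely on an XML writer to do this.

```java
final class XmlEscape {

    // Replaces the five characters that have predefined entities in XML.
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '&':  out.append("&amp;");  break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("a < b && \"c\" > 'd'"));
        // prints: a &lt; b &amp;&amp; &quot;c&quot; &gt; &apos;d&apos;
    }
}
```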
CDATA
CDATA means character data.
CDATA is text that will NOT be parsed by a parser. Tags inside the text will NOT be treated as
markup and entities will not be expanded.
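The difference between parsed and unparsed text can be observed with a parser. The sketch below is an illustration only: it uses a CDATA section inside element content (the <![CDATA[ ... ]]> syntax) to obtain unparsed text, and the names CdataSketch and textOf are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class CdataSketch {

    // Parses the text and returns the character content of the root element.
    static String textOf(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Parsed character data: entity references are expanded.
        System.out.println(textOf("<code>1 &lt; 2 &amp;&amp; ok</code>"));
        // prints: 1 < 2 && ok

        // CDATA section: tags and entity references are kept as literal text.
        System.out.println(textOf("<code><![CDATA[1 &lt; 2 <b>ok</b>]]></code>"));
        // prints: 1 &lt; 2 <b>ok</b>
    }
}
```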
XML Schema
An XML Schema describes the structure of an XML document, just like a DTD.
An XML document with correct syntax is called "Well Formed".
An XML document validated against an XML Schema is both "Well Formed" and "Valid".
XML Schema
XML Schema is an XML-based alternative to DTD:
<xs:element name="note">
<xs:complexType>
<xs:sequence>
<xs:element name="to" type="xs:string"/>
<xs:element name="from" type="xs:string"/>
<xs:element name="heading" type="xs:string"/>
<xs:element name="body" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
The Schema above is interpreted like this:
<xs:element name="note"> defines the element called "note"
<xs:complexType> the "note" element is a complex type
<xs:sequence> the complex type is a sequence of elements
<xs:element name="to" type="xs:string"> the element "to" is of type string (text)
<xs:element name="from" type="xs:string"> the element "from" is of type string
<xs:element name="heading" type="xs:string"> the element "heading" is of type string
<xs:element name="body" type="xs:string"> the element "body" is of type string
XML Schemas are More Powerful than DTD
XML Schemas are written in XML
XML Schemas are extensible to additions
XML Schemas support data types
XML Schemas support namespaces
Why Use an XML Schema?
With XML Schema, your XML files can carry a description of their own format.
With XML Schema, independent groups of people can agree on a standard for interchanging
data.
With XML Schema, you can verify data.
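Verifying data against a schema can be sketched in Java with the javax.xml.validation API from the JDK. The example assumes the note schema shown earlier wrapped in an xs:schema root element, which a complete schema file needs; the names SchemaValidationSketch, XSD and isValid are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

final class SchemaValidationSketch {

    // The note schema, wrapped in the xs:schema root element.
    static final String XSD =
            "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
          + "<xs:element name=\"note\"><xs:complexType><xs:sequence>"
          + "<xs:element name=\"to\" type=\"xs:string\"/>"
          + "<xs:element name=\"from\" type=\"xs:string\"/>"
          + "<xs:element name=\"heading\" type=\"xs:string\"/>"
          + "<xs:element name=\"body\" type=\"xs:string\"/>"
          + "</xs:sequence></xs:complexType></xs:element>"
          + "</xs:schema>";

    // Returns true when the document is well formed and valid against XSD.
    static boolean isValid(String xml) {
        Validator validator;
        try {
            validator = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)))
                    .newValidator();
        } catch (SAXException e) {
            throw new IllegalStateException("the schema itself could not be compiled", e);
        }
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // not well formed, or does not match the schema
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<note><to>Tove</to><from>Jani</from>"
                + "<heading>Reminder</heading><body>Don't forget me</body></note>")); // true
        System.out.println(isValid("<note><from>Jani</from><to>Tove</to>"
                + "<heading>Reminder</heading><body>Don't forget me</body></note>")); // false
    }
}
```

The second document fails because xs:sequence fixes the order of the child elements.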
XML Schemas Support Data Types
One of the greatest strengths of XML Schemas is the support for data types:
It is easier to describe document content
It is easier to define restrictions on data
It is easier to validate the correctness of data
It is easier to convert data between different data types
XML Schemas use XML Syntax
Another great strength of XML Schemas is that they are written in XML:
You don't have to learn a new language
You can use your XML editor to edit your Schema files
You can use your XML parser to parse your Schema files
You can manipulate your Schemas with the XML DOM
You can transform your Schemas with XSLT
XML DOM
What is the DOM?
The DOM defines a standard for accessing and manipulating documents:
"The W3C Document Object Model (DOM) is a platform and language-neutral interface that
allows programs and scripts to dynamically access and update the content, structure, and style
of a document."
The HTML DOM defines a standard way for accessing and manipulating HTML documents. It
presents an HTML document as a tree-structure.
The XML DOM defines a standard way for accessing and manipulating XML documents. It
presents an XML document as a tree-structure.
Understanding the DOM is a must for anyone working with HTML or XML.
The HTML DOM
All HTML elements can be accessed through the HTML DOM.
This example changes the value of an HTML element with id="demo":
Example
<h1 id="demo">This is a Heading</h1>
<script>
document.getElementById("demo").innerHTML = "Hello World!";
</script>
This example changes the value of the first <h1> element in an HTML document:
Example
<h1>This is a Heading</h1>
<h1>This is a Heading</h1>
<script>
document.getElementsByTagName("h1")[0].innerHTML = "Hello World!";
</script>
Note: Even if the HTML document contains only ONE <h1> element you still have to specify
the index [0], because the getElementsByTagName() method always returns an array-like
collection (an HTMLCollection), not a single element.
The XML DOM
All XML elements can be accessed through the XML DOM.
The XML DOM is:
• A standard object model for XML
• A standard programming interface for XML
• Platform- and language-independent
• A W3C standard
In other words: The XML DOM is a standard for how to get, change, add, or delete XML
elements.
Get the Value of an XML Element
This code retrieves the text value of the first <title> element in an XML document:
Example
txt = xmlDoc.getElementsByTagName("title")[0].childNodes[0].nodeValue;
Loading an XML File
This example reads "books.xml" into xmlDoc and retrieves the text value of the first <title>
element in books.xml:
Example
<!DOCTYPE html>
<html>
<body>
<p id="demo"></p>
<script>
var xhttp = new XMLHttpRequest();
xhttp.onreadystatechange = function() {
if (this.readyState == 4 && this.status == 200) {
myFunction(this);
}
};
xhttp.open("GET", "books.xml", true);
xhttp.send();
function myFunction(xml) {
var xmlDoc = xml.responseXML;
document.getElementById("demo").innerHTML =
xmlDoc.getElementsByTagName("title")[0].childNodes[0].nodeValue;
}
</script>
</body>
</html>
Example Explained
• xmlDoc - the XML DOM object created by the parser.
• getElementsByTagName("title")[0] - get the first <title> element
• childNodes[0] - the first child of the <title> element (the text node)
• nodeValue - the value of the node (the text itself)
Loading an XML String
This example loads a text string into an XML DOM object, and extracts the info from it with
JavaScript:
Example
<html>
<body>
<p id="demo"></p>
<script>
var text, parser, xmlDoc;
text = "<bookstore><book>" +
"<title>Everyday Italian</title>" +
"<author>Giada De Laurentiis</author>" +
"<year>2005</year>" +
"</book></bookstore>";
parser = new DOMParser();
xmlDoc = parser.parseFromString(text,"text/xml");
document.getElementById("demo").innerHTML =
xmlDoc.getElementsByTagName("title")[0].childNodes[0].nodeValue;
</script>
</body>
</html>
Programming Interface
The DOM models XML as a set of node objects. The nodes can be accessed with JavaScript or
other programming languages. In this tutorial we use JavaScript.
The programming interface to the DOM is defined by a set of standard properties and methods.
Properties are often referred to as something that is (e.g. nodeName is "book").
Methods are often referred to as something that is done (e.g. delete "book").
XML DOM Properties
These are some typical DOM properties:
• x.nodeName - the name of x
• x.nodeValue - the value of x
• x.parentNode - the parent node of x
• x.childNodes - the child nodes of x
• x.attributes - the attribute nodes of x
Note: In the list above, x is a node object.
XML DOM Methods
• x.getElementsByTagName(name) - get all elements with a specified tag name
• x.appendChild(node) - insert a child node to x
• x.removeChild(node) - remove a child node from x
Note: In the list above, x is a node object.
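The same properties and methods exist in the Java binding of the DOM, where properties become getter methods (for example, nodeName becomes getNodeName()). The following is a minimal sketch using the org.w3c.dom API from the JDK; the class name DomApiSketch, the helpers parse and childNames, and the sample bookstore string are assumptions made for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class DomApiSketch {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Lists the element children of x by nodeName, for example "title,year".
    static String childNames(Node x) {
        StringBuilder names = new StringBuilder();
        NodeList children = x.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                if (names.length() > 0) {
                    names.append(',');
                }
                names.append(child.getNodeName());
            }
        }
        return names.toString();
    }

    public static void main(String[] args) {
        Document doc = parse("<bookstore><book category=\"cooking\">"
                + "<title>Everyday Italian</title></book></bookstore>");
        Node book = doc.getElementsByTagName("book").item(0);
        Node title = book.getFirstChild();

        System.out.println(title.getNodeName());                   // title
        System.out.println(title.getFirstChild().getNodeValue());  // Everyday Italian
        System.out.println(title.getParentNode().getNodeName());   // book
        System.out.println(book.getAttributes()
                .getNamedItem("category").getNodeValue());         // cooking

        Element year = doc.createElement("year");
        year.setTextContent("2005");
        book.appendChild(year);                 // insert a child node
        System.out.println(childNames(book));   // title,year
        book.removeChild(title);                // remove a child node
        System.out.println(childNames(book));   // year
    }
}
```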
The sample XML considered in the examples is:
<employees>
<employee id="111">
<firstName>Rakesh</firstName>
<lastName>Mishra</lastName>
<location>Bangalore</location>
</employee>
<employee id="112">
<firstName>John</firstName>
<lastName>Davis</lastName>
<location>Chennai</location>
</employee>
<employee id="113">
<firstName>Rajesh</firstName>
<lastName>Sharma</lastName>
<location>Pune</location>
</employee>
</employees>
And the object into which the XML content is to be extracted is defined as below:
class Employee{
String id;
String firstName;
String lastName;
String location;
@Override
public String toString() {
return firstName+" "+lastName+"("+id+")"+location;
}
}
There are 3 main parsers; sample code is given below for the first two:
DOM Parser
SAX Parser
StAX Parser
Using DOM Parser
I am making use of the DOM parser implementation that comes with the JDK; in my
example I am using JDK 7. The DOM parser loads the complete XML content into a tree
structure, and we iterate through the Node and NodeList objects to get the content of the
XML. The code for XML parsing using the DOM parser is given below.
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class DOMParserDemo {
    public static void main(String[] args) throws Exception {
        // Get the DOM Builder Factory
        DocumentBuilderFactory factory =
                DocumentBuilderFactory.newInstance();
        // Get the DOM Builder
        DocumentBuilder builder = factory.newDocumentBuilder();
        // Load and parse the XML document;
        // document contains the complete XML as a tree.
        Document document = builder.parse(
                ClassLoader.getSystemResourceAsStream("xml/employee.xml"));
        List<Employee> empList = new ArrayList<>();
        // Iterate through the children of the root and extract the data.
        NodeList nodeList = document.getDocumentElement().getChildNodes();
        for (int i = 0; i < nodeList.getLength(); i++) {
            Node node = nodeList.item(i);
            // Whitespace between tags shows up as text nodes; an Element
            // here means we have encountered an <employee> tag.
            if (node instanceof Element) {
                Employee emp = new Employee();
                emp.id = node.getAttributes()
                        .getNamedItem("id").getNodeValue();
                NodeList childNodes = node.getChildNodes();
                for (int j = 0; j < childNodes.getLength(); j++) {
                    Node cNode = childNodes.item(j);
                    // Identify the child tag of employee encountered.
                    if (cNode instanceof Element) {
                        // getTextContent() also copes with an empty element.
                        String content = cNode.getTextContent().trim();
                        switch (cNode.getNodeName()) {
                            case "firstName":
                                emp.firstName = content;
                                break;
                            case "lastName":
                                emp.lastName = content;
                                break;
                            case "location":
                                emp.location = content;
                                break;
                        }
                    }
                }
                empList.add(emp);
            }
        }
        // Print the populated Employee list.
        for (Employee emp : empList) {
            System.out.println(emp);
        }
    }
}

class Employee {
    String id;
    String firstName;
    String lastName;
    String location;

    @Override
    public String toString() {
        return firstName + " " + lastName + "(" + id + ")" + location;
    }
}
The output for the above will be:
Rakesh Mishra(111)Bangalore
John Davis(112)Chennai
Rajesh Sharma(113)Pune
Using SAX Parser
The SAX parser differs from the DOM parser in that it doesn't load the complete XML into
memory. Instead it reads the XML sequentially as a stream, triggering different events as
and when it encounters different constructs such as an opening tag, a closing tag, character
data and so on. This is the reason why the SAX parser is called an event-based parser.
Along with the XML source file, we also register a handler which extends the DefaultHandler
class. The DefaultHandler class provides different callbacks, out of which we would be
interested in:
startElement() – called when the start of a tag is encountered.
endElement() – called when the end of a tag is encountered.
characters() – called when some text data is encountered.
The code for parsing the XML using SAX Parser is given below:
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SAXParserDemo {
    public static void main(String[] args) throws Exception {
        SAXParserFactory parserFactory = SAXParserFactory.newInstance();
        SAXParser parser = parserFactory.newSAXParser();
        SAXHandler handler = new SAXHandler();
        parser.parse(ClassLoader.getSystemResourceAsStream("xml/employee.xml"),
                handler);
        // Print the list of employees obtained from the XML.
        for (Employee emp : handler.empList) {
            System.out.println(emp);
        }
    }
}

/**
 * The Handler for SAX Events.
 */
class SAXHandler extends DefaultHandler {
    List<Employee> empList = new ArrayList<>();
    Employee emp = null;
    // Text collected since the most recent start tag.
    StringBuilder content = new StringBuilder();

    // Triggered when the start of a tag is found.
    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes attributes)
            throws SAXException {
        content.setLength(0);
        switch (qName) {
            // Create a new Employee object when the start tag is found.
            case "employee":
                emp = new Employee();
                emp.id = attributes.getValue("id");
                break;
        }
    }

    @Override
    public void endElement(String uri, String localName,
                           String qName) throws SAXException {
        String text = content.toString().trim();
        switch (qName) {
            // Add the employee to the list once its end tag is found.
            case "employee":
                empList.add(emp);
                break;
            // For all other end tags the employee has to be updated.
            case "firstName":
                emp.firstName = text;
                break;
            case "lastName":
                emp.lastName = text;
                break;
            case "location":
                emp.location = text;
                break;
        }
    }

    // The parser may deliver one run of text in several calls,
    // so append to the buffer instead of overwriting it.
    @Override
    public void characters(char[] ch, int start, int length)
            throws SAXException {
        content.append(ch, start, length);
    }
}

class Employee {
    String id;
    String firstName;
    String lastName;
    String location;

    @Override
    public String toString() {
        return firstName + " " + lastName + "(" + id + ")" + location;
    }
}
The output for the above would be:
Rakesh Mishra(111)Bangalore
John Davis(112)Chennai
Rajesh Sharma(113)Pune
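The StAX parser named in the list of parsers has no listing in these notes. As an illustration only, and not the author's code, the following sketch shows how the same task might look with the javax.xml.stream API from the JDK, where the program pulls events from the reader instead of receiving callbacks. The class name StaxSketch and the method readEmployees are hypothetical, and the XML is passed as a string to keep the sketch self-contained.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class StaxSketch {

    // Returns one "firstName lastName(id)location" line per <employee> element.
    static List<String> readEmployees(String xml) {
        List<String> result = new ArrayList<>();
        String id = null, firstName = null, lastName = null, location = null;
        StringBuilder text = new StringBuilder();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                switch (reader.next()) {
                    case XMLStreamConstants.START_ELEMENT:
                        text.setLength(0); // start collecting text for this element
                        if ("employee".equals(reader.getLocalName())) {
                            id = reader.getAttributeValue(null, "id");
                        }
                        break;
                    case XMLStreamConstants.CHARACTERS:
                        text.append(reader.getText()); // may arrive in several chunks
                        break;
                    case XMLStreamConstants.END_ELEMENT: {
                        String value = text.toString().trim();
                        switch (reader.getLocalName()) {
                            case "firstName": firstName = value; break;
                            case "lastName":  lastName = value;  break;
                            case "location":  location = value;  break;
                            case "employee":
                                result.add(firstName + " " + lastName
                                        + "(" + id + ")" + location);
                                break;
                            default:
                                break;
                        }
                        break;
                    }
                    default:
                        break;
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<employees><employee id=\"111\">"
                + "<firstName>Rakesh</firstName><lastName>Mishra</lastName>"
                + "<location>Bangalore</location></employee></employees>";
        for (String line : readEmployees(xml)) {
            System.out.println(line); // Rakesh Mishra(111)Bangalore
        }
    }
}
```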
With this I have covered parsing the same XML document and performing the same task of
populating the list of Employee objects using two of the three parsers, namely:
DOM Parser
SAX Parser

More Related Content

Similar to xml introduction in web technologies subject

XML Presentation-2
XML Presentation-2XML Presentation-2
XML Presentation-2
Sudharsan S
 
Web programming unit IIII XML &DOM NOTES BY BHAVSINGH MALOTH
Web programming unit IIII XML &DOM NOTES BY BHAVSINGH MALOTHWeb programming unit IIII XML &DOM NOTES BY BHAVSINGH MALOTH
Web programming unit IIII XML &DOM NOTES BY BHAVSINGH MALOTH
Bhavsingh Maloth
 

Similar to xml introduction in web technologies subject (20)

Xml
Xml Xml
Xml
 
Sgml and xml
Sgml and xmlSgml and xml
Sgml and xml
 
Xml
XmlXml
Xml
 
Web engineering notes unit 4
Web engineering notes unit 4Web engineering notes unit 4
Web engineering notes unit 4
 
Xml 150323102007-conversion-gate01
Xml 150323102007-conversion-gate01Xml 150323102007-conversion-gate01
Xml 150323102007-conversion-gate01
 
XML notes.pptx
XML notes.pptxXML notes.pptx
XML notes.pptx
 
XML DTD Validate
XML DTD ValidateXML DTD Validate
XML DTD Validate
 
PHP XML
PHP XMLPHP XML
PHP XML
 
XML.pptx
XML.pptxXML.pptx
XML.pptx
 
XML Presentation-2
XML Presentation-2XML Presentation-2
XML Presentation-2
 
Web engineering UNIT IV as per RGPV syllabus
Web engineering UNIT IV as per RGPV syllabusWeb engineering UNIT IV as per RGPV syllabus
Web engineering UNIT IV as per RGPV syllabus
 
XML simple Introduction
XML simple IntroductionXML simple Introduction
XML simple Introduction
 
Web programming unit IIII XML &DOM NOTES BY BHAVSINGH MALOTH
Web programming unit IIII XML &DOM NOTES BY BHAVSINGH MALOTHWeb programming unit IIII XML &DOM NOTES BY BHAVSINGH MALOTH
Web programming unit IIII XML &DOM NOTES BY BHAVSINGH MALOTH
 
Wp unit III
Wp unit IIIWp unit III
Wp unit III
 
Intro to xml
Intro to xmlIntro to xml
Intro to xml
 
Xml and DTD's
Xml and DTD'sXml and DTD's
Xml and DTD's
 
Introduction to xml
Introduction to xmlIntroduction to xml
Introduction to xml
 
Xml
XmlXml
Xml
 
Xml material
Xml materialXml material
Xml material
 
Xml material
Xml materialXml material
Xml material
 

Recently uploaded

Tembisa Central Terminating Pills +27838792658 PHOMOLONG Top Abortion Pills F...
Tembisa Central Terminating Pills +27838792658 PHOMOLONG Top Abortion Pills F...Tembisa Central Terminating Pills +27838792658 PHOMOLONG Top Abortion Pills F...
Tembisa Central Terminating Pills +27838792658 PHOMOLONG Top Abortion Pills F...
drjose256
 

Recently uploaded (20)

NO1 Best Powerful Vashikaran Specialist Baba Vashikaran Specialist For Love V...
NO1 Best Powerful Vashikaran Specialist Baba Vashikaran Specialist For Love V...NO1 Best Powerful Vashikaran Specialist Baba Vashikaran Specialist For Love V...
NO1 Best Powerful Vashikaran Specialist Baba Vashikaran Specialist For Love V...
 
UNIT-2 image enhancement.pdf Image Processing Unit 2 AKTU
UNIT-2 image enhancement.pdf Image Processing Unit 2 AKTUUNIT-2 image enhancement.pdf Image Processing Unit 2 AKTU
UNIT-2 image enhancement.pdf Image Processing Unit 2 AKTU
 
Maximizing Incident Investigation Efficacy in Oil & Gas: Techniques and Tools
Maximizing Incident Investigation Efficacy in Oil & Gas: Techniques and ToolsMaximizing Incident Investigation Efficacy in Oil & Gas: Techniques and Tools
Maximizing Incident Investigation Efficacy in Oil & Gas: Techniques and Tools
 
8th International Conference on Soft Computing, Mathematics and Control (SMC ...
8th International Conference on Soft Computing, Mathematics and Control (SMC ...8th International Conference on Soft Computing, Mathematics and Control (SMC ...
8th International Conference on Soft Computing, Mathematics and Control (SMC ...
 
UNIT 4 PTRP final Convergence in probability.pptx
UNIT 4 PTRP final Convergence in probability.pptxUNIT 4 PTRP final Convergence in probability.pptx
UNIT 4 PTRP final Convergence in probability.pptx
 
engineering chemistry power point presentation
engineering chemistry  power point presentationengineering chemistry  power point presentation
engineering chemistry power point presentation
 
History of Indian Railways - the story of Growth & Modernization
History of Indian Railways - the story of Growth & ModernizationHistory of Indian Railways - the story of Growth & Modernization
History of Indian Railways - the story of Growth & Modernization
 
Research Methodolgy & Intellectual Property Rights Series 1
Research Methodolgy & Intellectual Property Rights Series 1Research Methodolgy & Intellectual Property Rights Series 1
Research Methodolgy & Intellectual Property Rights Series 1
 
CLOUD COMPUTING SERVICES - Cloud Reference Modal
CLOUD COMPUTING SERVICES - Cloud Reference ModalCLOUD COMPUTING SERVICES - Cloud Reference Modal
CLOUD COMPUTING SERVICES - Cloud Reference Modal
 
Augmented Reality (AR) with Augin Software.pptx
Augmented Reality (AR) with Augin Software.pptxAugmented Reality (AR) with Augin Software.pptx
Augmented Reality (AR) with Augin Software.pptx
 
Independent Solar-Powered Electric Vehicle Charging Station
Independent Solar-Powered Electric Vehicle Charging StationIndependent Solar-Powered Electric Vehicle Charging Station
Independent Solar-Powered Electric Vehicle Charging Station
 
handbook on reinforce concrete and detailing
handbook on reinforce concrete and detailinghandbook on reinforce concrete and detailing
handbook on reinforce concrete and detailing
 
Adsorption (mass transfer operations 2) ppt
Adsorption (mass transfer operations 2) pptAdsorption (mass transfer operations 2) ppt
Adsorption (mass transfer operations 2) ppt
 
Tembisa Central Terminating Pills +27838792658 PHOMOLONG Top Abortion Pills F...
Tembisa Central Terminating Pills +27838792658 PHOMOLONG Top Abortion Pills F...Tembisa Central Terminating Pills +27838792658 PHOMOLONG Top Abortion Pills F...
Tembisa Central Terminating Pills +27838792658 PHOMOLONG Top Abortion Pills F...
 
15-Minute City: A Completely New Horizon
15-Minute City: A Completely New Horizon15-Minute City: A Completely New Horizon
15-Minute City: A Completely New Horizon
 
Filters for Electromagnetic Compatibility Applications
Filters for Electromagnetic Compatibility ApplicationsFilters for Electromagnetic Compatibility Applications
Filters for Electromagnetic Compatibility Applications
 
Software Engineering Practical File Front Pages.pdf
Software Engineering Practical File Front Pages.pdfSoftware Engineering Practical File Front Pages.pdf
Software Engineering Practical File Front Pages.pdf
 
Involute of a circle,Square, pentagon,HexagonInvolute_Engineering Drawing.pdf
Involute of a circle,Square, pentagon,HexagonInvolute_Engineering Drawing.pdfInvolute of a circle,Square, pentagon,HexagonInvolute_Engineering Drawing.pdf
Involute of a circle,Square, pentagon,HexagonInvolute_Engineering Drawing.pdf
 
Artificial Intelligence in due diligence
Artificial Intelligence in due diligenceArtificial Intelligence in due diligence
Artificial Intelligence in due diligence
 
litvinenko_Henry_Intrusion_Hong-Kong_2024.pdf
litvinenko_Henry_Intrusion_Hong-Kong_2024.pdflitvinenko_Henry_Intrusion_Hong-Kong_2024.pdf
litvinenko_Henry_Intrusion_Hong-Kong_2024.pdf
 

xml introduction in web technologies subject

  • 1. UNIT-II XML
    Introduction to XML
    XML stands for Extensible Markup Language. It is a text-based markup language derived from Standard Generalized Markup Language (SGML).
    XML tags identify the data and are used to store and organize it, whereas HTML tags specify how the data is displayed. XML is not going to replace HTML in the near future, but it introduces new possibilities by adopting many successful features of HTML.
    There are three important characteristics of XML that make it useful in a variety of systems and solutions:
    XML is extensible: XML allows you to create your own self-descriptive tags, or language, that suits your application.
    XML carries the data, does not present it: XML allows you to store the data irrespective of how it will be presented.
    XML is a public standard: XML was developed by an organization called the World Wide Web Consortium (W3C) and is available as an open standard.
    XML Usage
    A short list of XML usage says it all:
    XML can work behind the scenes to simplify the creation of HTML documents for large web sites.
    XML can be used to exchange information between organizations and systems.
    XML can be used for offloading and reloading of databases.
    XML can be used to store and arrange data in a way that suits your data handling needs.
    XML can easily be merged with style sheets to create almost any desired output.
    Virtually any type of data can be expressed as an XML document.
    What is Markup?
    XML is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. So what exactly is a markup language? Markup is information added to a document that enhances its meaning in certain ways, in that it identifies the parts and how they relate to each other. More specifically, a markup language is a set of symbols that can be placed in the text of a document to demarcate and label the parts of that document.
    The following example shows how XML markup looks when embedded in a piece of text:
        <message>
          <text>Hello, world!</text>
        </message>
    This snippet includes the markup symbols, or tags, such as <message>...</message> and <text>...</text>. The tags <message> and </message> mark the start and the end of the XML code fragment. The tags <text> and </text> surround the text Hello, world!.
  • 2. Is XML a Programming Language?
    A programming language consists of grammar rules and its own vocabulary, which are used to create computer programs. These programs instruct the computer to perform specific tasks. XML does not qualify as a programming language because it does not perform any computation or algorithms. It is usually stored in a simple text file and is processed by special software that is capable of interpreting XML.
    Tags and Elements
    An XML file is structured by several XML elements, also called XML nodes or XML tags. XML element names are enclosed in angle brackets < > as shown below:
        <element>
    Syntax Rules for Tags and Elements
    Element syntax: Each XML element needs to be closed, either with a matching pair of start and end tags as shown below:
        <element>....</element>
    or, in simple cases, with a single self-closing tag:
        <element/>
    Nesting of elements: An XML element can contain multiple XML elements as its children, but the child elements must not overlap; i.e., the end tag of an element must have the same name as the most recent unmatched start tag.
    The following example shows incorrectly nested tags:
        <?xml version="1.0"?>
        <contact-info>
        <company>IARE
        </contact-info>
        </company>
    The following example shows correctly nested tags:
        <?xml version="1.0"?>
        <contact-info>
        <company>IARE</company>
        </contact-info>
    Let us learn about one of the most important parts of XML, the XML tags. XML tags form the foundation of XML. They define the scope of an element in the XML. They can also be used to insert comments, declare settings required for parsing the environment, and insert special instructions.
    We can broadly categorize XML tags as follows:
    Start Tag
    The beginning of every non-empty XML element is marked by a start tag. An example of a start tag is:
        <address>
    End Tag
    Every element that has a start tag should end with an end tag. An example of an end tag is:
        </address>
    Note that end tags include a solidus ("/") before the name of the element.
  • 3. Empty Tag
    The text that appears between the start tag and the end tag is called content. An element which has no content is termed empty. An empty element can be represented in two ways:
    (1) A start tag immediately followed by an end tag:
        <hr></hr>
    (2) A complete empty-element tag:
        <hr />
    Empty-element tags may be used for any element which has no content.
    XML Tags Rules
    The following rules need to be followed when using XML tags:
    Rule 1
    XML tags are case-sensitive. The following line is an example of wrong syntax: the end tag </Address> differs in case from the start tag <address>, which XML treats as an error.
        <address>This is wrong syntax</Address>
    The following line shows the correct way, where the start tag and the end tag use the same case:
        <address>This is correct syntax</address>
    Rule 2
    XML tags must be closed in an appropriate order, i.e., an XML tag opened inside another element must be closed before the outer element is closed. For example:
        <outer_element>
          <internal_element>
            This tag is closed before the outer_element
          </internal_element>
        </outer_element>
    XML Elements
    XML elements can be defined as the building blocks of an XML document. Elements can behave as containers to hold text, elements, attributes, media objects or all of these. Each XML document contains one or more elements, the scope of which is delimited either by start and end tags or, for empty elements, by an empty-element tag.
    Syntax
    Following is the syntax to write an XML element:
        <element-name attribute1 attribute2>
        ....content
        </element-name>
    where element-name is the name of the element. The name and its case in the start and end tags must match. attribute1, attribute2 are attributes of the element, separated by white space. An attribute defines a property of the element. It associates a name with a value, which is a string of characters.
    An attribute is written as:
        name = "value"
    The name is followed by an = sign and a string value inside double (" ") or single (' ') quotes.
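The attribute syntax above can be tried out with the DOM parser that ships with the JDK. This is a minimal sketch, not part of the original slides: the class name AttributeSketch and the floor attribute are hypothetical, chosen only to show that double quotes, single quotes, and white space around the = sign are all accepted.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class AttributeSketch {

    // Parses the XML text and returns the named attribute of the root element.
    static String rootAttribute(String xml, String name) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            return root.getAttribute(name);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        // "floor" is a hypothetical attribute added for illustration.
        String xml = "<address category=\"residence\" floor = '2'/>";
        System.out.println(rootAttribute(xml, "category")); // residence
        System.out.println(rootAttribute(xml, "floor"));    // 2
    }
}
```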
  • 4. Empty Element
    An empty element (an element with no content) has the following syntax:
        <name attribute1 attribute2.../>
    Example of an XML document using various XML elements:
        <?xml version="1.0"?>
        <contact-info>
          <address category="residence">
            <name>Tanmay Patil</name>
            <company>TutorialsPoint</company>
            <phone>(011) 123-4567</phone>
          </address>
        </contact-info>
    XML Elements Rules
    The following rules must be followed for XML elements:
    An element name can contain any alphanumeric characters. The only punctuation marks allowed in names are the hyphen (-), underscore (_) and period (.).
    Names are case-sensitive. For example, Address, address, and ADDRESS are different names.
    The names in the start and end tags of an element must be identical.
    An element, which is a container, can contain text or elements as seen in the above example.
    Root element: An XML document can have only one root element. For example, the following is not a correct XML document, because both the x and y elements occur at the top level without a root element:
        <x>...</x>
        <y>...</y>
    The following example shows a correctly formed XML document:
        <root>
          <x>...</x>
          <y>...</y>
        </root>
    Case sensitivity: The names of XML elements are case-sensitive. That means the names in the start and end tags must be in exactly the same case. For example, <contact-info> is different from <Contact-Info>.
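The rules above (a single root element, matching case, no overlapping tags) are exactly what an XML parser checks for well-formedness. The following is a minimal sketch using the JDK's built-in DOM parser; the class name WellFormedCheck and the helper isWellFormed are assumptions made for illustration, not something the slides define.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class WellFormedCheck {

    // Returns true if the JDK's XML parser accepts the text as well-formed.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Replace the default handler, which prints fatal errors to stderr.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) { }
                public void fatalError(SAXParseException e) throws SAXException {
                    throw e;
                }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the parser rejected the document
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "<root><x>...</x><y>...</y></root>",                    // single root element
            "<x>...</x><y>...</y>",                                  // two top-level elements
            "<address>This is wrong syntax</Address>",               // case mismatch
            "<contact-info><company>IARE</contact-info></company>"   // overlapping tags
        };
        for (String sample : samples) {
            System.out.println(isWellFormed(sample) + "  " + sample);
        }
    }
}
```

Only the first sample is accepted; the other three violate the root-element, case-sensitivity, and nesting rules respectively.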
  • 5. XML DTD
    What is a DTD?
    A DTD is a Document Type Definition. A DTD defines the structure and the legal elements and attributes of an XML document.
    Why Use a DTD?
    With a DTD, independent groups of people can agree on a standard DTD for interchanging data. An application can use a DTD to verify that XML data is valid.
    An Internal DTD Declaration
    If the DTD is declared inside the XML file, it must be wrapped inside the <!DOCTYPE> definition:
    XML document with an internal DTD
        <?xml version="1.0"?>
        <!DOCTYPE note [
        <!ELEMENT note (to,from,heading,body)>
        <!ELEMENT to (#PCDATA)>
        <!ELEMENT from (#PCDATA)>
        <!ELEMENT heading (#PCDATA)>
        <!ELEMENT body (#PCDATA)>
        ]>
        <note>
        <to>Tove</to>
        <from>Jani</from>
        <heading>Reminder</heading>
        <body>Don't forget me this weekend</body>
        </note>
    The DTD above is interpreted like this:
    !DOCTYPE note defines that the root element of this document is note
    !ELEMENT note defines that the note element must contain four elements: "to,from,heading,body"
    !ELEMENT to defines the to element to be of type "#PCDATA"
    !ELEMENT from defines the from element to be of type "#PCDATA"
    !ELEMENT heading defines the heading element to be of type "#PCDATA"
  • 6. !ELEMENT body defines the body element to be of type "#PCDATA"
    An External DTD Declaration
    If the DTD is declared in an external file, the <!DOCTYPE> definition must contain a reference to the DTD file:
    XML document with a reference to an external DTD
        <?xml version="1.0"?>
        <!DOCTYPE note SYSTEM "note.dtd">
        <note>
        <to>Tove</to>
        <from>Jani</from>
        <heading>Reminder</heading>
        <body>Don't forget me this weekend!</body>
        </note>
    And here is the file "note.dtd", which contains the DTD:
        <!ELEMENT note (to,from,heading,body)>
        <!ELEMENT to (#PCDATA)>
        <!ELEMENT from (#PCDATA)>
        <!ELEMENT heading (#PCDATA)>
        <!ELEMENT body (#PCDATA)>
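To see how an application can use a DTD to verify that XML data is valid, here is a minimal sketch that turns on DTD validation in the JDK's DOM parser and checks the note document against its internal DTD. The class name DtdValidationSketch and the helper validate are hypothetical; the second document, which leaves out the heading element, is an assumed test case rather than one from the slides.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class DtdValidationSketch {

    // The internal DTD from the note example.
    static final String DTD =
            "<!DOCTYPE note [\n"
            + "<!ELEMENT note (to,from,heading,body)>\n"
            + "<!ELEMENT to (#PCDATA)>\n"
            + "<!ELEMENT from (#PCDATA)>\n"
            + "<!ELEMENT heading (#PCDATA)>\n"
            + "<!ELEMENT body (#PCDATA)>\n"
            + "]>\n";

    // Parses with DTD validation switched on and returns the validity errors.
    static List<String> validate(String xml) {
        List<String> errors = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // validate against the DTD in the DOCTYPE
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                // Validity violations are reported as recoverable errors.
                public void error(SAXParseException e) { errors.add(e.getMessage()); }
                public void fatalError(SAXParseException e) throws SAXParseException {
                    throw e; // not well-formed at all
                }
            });
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return errors;
    }

    public static void main(String[] args) {
        String header = "<?xml version=\"1.0\"?>\n" + DTD;
        String valid = header + "<note><to>Tove</to><from>Jani</from>"
                + "<heading>Reminder</heading><body>Don't forget me this weekend</body></note>";
        // Assumed test case: the required heading element is missing.
        String invalid = header + "<note><to>Tove</to><from>Jani</from>"
                + "<body>Don't forget me this weekend</body></note>";

        System.out.println(validate(valid).isEmpty());   // true
        System.out.println(validate(invalid).isEmpty()); // false
    }
}
```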
  • 7. DTD - XML Building Blocks
    The main building blocks of both XML and HTML documents are elements.
    The Building Blocks of XML Documents
    Seen from a DTD point of view, all XML documents are made up of the following building blocks:
    Elements
    Attributes
    Entities
    PCDATA
    CDATA
    Elements:
    Elements are the main building blocks of both XML and HTML documents. Examples of HTML elements are "body" and "table". Examples of XML elements could be "note" and "message". Elements can contain text, other elements, or be empty. Examples of empty HTML elements are "hr", "br" and "img".
    Examples:
        <body>some text</body>
        <message>some text</message>
    Attributes:
    Attributes provide extra information about elements. Attributes are always placed inside the opening tag of an element. Attributes always come in name/value pairs. The following "img" element has additional information about a source file:
        <img src="computer.gif" />
    The name of the element is "img". The name of the attribute is "src". The value of the attribute is "computer.gif". Since the element itself is empty, it is closed by a " /".
  • 8. Entities
    Some characters have a special meaning in XML, like the less-than sign (<), which defines the start of an XML tag. Most of you know the HTML entity "&nbsp;". This "non-breaking space" entity is used in HTML to insert an extra space in a document. Entities are expanded when a document is parsed by an XML parser.
    The following entities are predefined in XML:
        Entity reference    Character
        &lt;                <
        &gt;                >
        &amp;               &
        &quot;              "
        &apos;              '
    PCDATA:
    PCDATA means parsed character data. Think of character data as the text found between the start tag and the end tag of an XML element. PCDATA is text that WILL be parsed by a parser. The text will be examined by the parser for entities and markup. Tags inside the text will be treated as markup and entities will be expanded. However, parsed character data should not contain any &, <, or > characters; these need to be represented by the &amp;, &lt; and &gt; entities, respectively.
    CDATA
    CDATA means character data. CDATA is text that will NOT be parsed by a parser. Tags inside the text will NOT be treated as markup and entities will not be expanded.
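The difference between parsed and unparsed character data can be observed directly with the JDK's DOM parser. This is a minimal sketch under one stated assumption: the unparsed text is written as a CDATA section (<![CDATA[ ... ]]>) inside the document, which is one common way such text appears. The class name EntityAndCdataSketch and the msg element are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

class EntityAndCdataSketch {

    // Parses the XML text and returns the text content of the root element.
    static String textOf(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement()
                    .getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        // Parsed character data: the predefined entities are expanded.
        System.out.println(textOf("<msg>1 &lt; 2 &amp;&amp; 3 &gt; 2</msg>"));
        // prints: 1 < 2 && 3 > 2

        // CDATA section: tags and entity references are kept as literal text.
        System.out.println(textOf("<msg><![CDATA[<b>not a tag</b> &amp; not an entity]]></msg>"));
        // prints: <b>not a tag</b> &amp; not an entity
    }
}
```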
  • 9. XML Schema
    An XML Schema describes the structure of an XML document, just like a DTD. An XML document with correct syntax is called "Well Formed". An XML document validated against an XML Schema is both "Well Formed" and "Valid".
    XML Schema
    XML Schema is an XML-based alternative to DTD:
        <xs:element name="note">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="to" type="xs:string"/>
            <xs:element name="from" type="xs:string"/>
            <xs:element name="heading" type="xs:string"/>
            <xs:element name="body" type="xs:string"/>
          </xs:sequence>
        </xs:complexType>
        </xs:element>
    The Schema above is interpreted like this:
    <xs:element name="note"> defines the element called "note"
    <xs:complexType> the "note" element is a complex type
    <xs:sequence> the complex type is a sequence of elements
    <xs:element name="to" type="xs:string"> the element "to" is of type string (text)
    <xs:element name="from" type="xs:string"> the element "from" is of type string
    <xs:element name="heading" type="xs:string"> the element "heading" is of type string
    <xs:element name="body" type="xs:string"> the element "body" is of type string
  • 10. XML Schemas are More Powerful than DTD
    XML Schemas are written in XML
    XML Schemas are extensible to additions
    XML Schemas support data types
    XML Schemas support namespaces
    Why Use an XML Schema?
    With XML Schema, your XML files can carry a description of their own format.
    With XML Schema, independent groups of people can agree on a standard for interchanging data.
    With XML Schema, you can verify data.
    XML Schemas Support Data Types
    One of the greatest strengths of XML Schemas is the support for data types:
    It is easier to describe document content
    It is easier to define restrictions on data
    It is easier to validate the correctness of data
    It is easier to convert data between different data types
    XML Schemas use XML Syntax
    Another great strength of XML Schemas is that they are written in XML:
    You don't have to learn a new language
    You can use your XML editor to edit your Schema files
    You can use your XML parser to parse your Schema files
    You can manipulate your Schemas with the XML DOM
    You can transform your Schemas with XSLT
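As an illustration of verifying data with an XML Schema, the sketch below validates a note document with the javax.xml.validation API that ships with the JDK. It assumes the note schema shown earlier is wrapped in an xs:schema root element bound to the standard XML Schema namespace, which the slide omits; the class name SchemaValidationSketch and the reordered test document are hypothetical.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class SchemaValidationSketch {

    // The note schema, wrapped in the xs:schema root element (an assumption).
    static final String XSD =
            "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
            + "<xs:element name=\"note\"><xs:complexType><xs:sequence>"
            + "<xs:element name=\"to\" type=\"xs:string\"/>"
            + "<xs:element name=\"from\" type=\"xs:string\"/>"
            + "<xs:element name=\"heading\" type=\"xs:string\"/>"
            + "<xs:element name=\"body\" type=\"xs:string\"/>"
            + "</xs:sequence></xs:complexType></xs:element>"
            + "</xs:schema>";

    // Returns true if the document is valid against the note schema.
    static boolean isValid(String xml) {
        try {
            SchemaFactory factory =
                    SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            // With no error handler set, validation errors are thrown as SAXException.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String valid = "<note><to>Tove</to><from>Jani</from>"
                + "<heading>Reminder</heading><body>Don't forget me this weekend!</body></note>";
        // Assumed test case: xs:sequence fixes the order, so swapping two children is invalid.
        String wrongOrder = "<note><from>Jani</from><to>Tove</to>"
                + "<heading>Reminder</heading><body>Don't forget me this weekend!</body></note>";

        System.out.println(isValid(valid));      // true
        System.out.println(isValid(wrongOrder)); // false
    }
}
```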
  • 11. XML DOM
    What is the DOM?
    The DOM defines a standard for accessing and manipulating documents:
    "The W3C Document Object Model (DOM) is a platform and language-neutral interface that allows programs and scripts to dynamically access and update the content, structure, and style of a document."
    The HTML DOM defines a standard way for accessing and manipulating HTML documents. It presents an HTML document as a tree-structure.
    The XML DOM defines a standard way for accessing and manipulating XML documents. It presents an XML document as a tree-structure.
    Understanding the DOM is a must for anyone working with HTML or XML.
    The HTML DOM
    All HTML elements can be accessed through the HTML DOM.
    This example changes the value of an HTML element with id="demo":
    Example
        <h1 id="demo">This is a Heading</h1>
        <script>
        document.getElementById("demo").innerHTML = "Hello World!";
        </script>
  • 12. This example changes the value of the first <h1> element in an HTML document:
    Example
        <h1>This is a Heading</h1>
        <h1>This is a Heading</h1>
        <script>
        document.getElementsByTagName("h1")[0].innerHTML = "Hello World!";
        </script>
    Note: Even if the HTML document contains only ONE <h1> element you still have to specify the index [0], because the getElementsByTagName() method always returns an array-like collection (an HTMLCollection), not a single element.
    The XML DOM
    All XML elements can be accessed through the XML DOM.
    The XML DOM is:
    A standard object model for XML
    A standard programming interface for XML
    Platform- and language-independent
    A W3C standard
    In other words: The XML DOM is a standard for how to get, change, add, or delete XML elements.
    Get the Value of an XML Element
    This code retrieves the text value of the first <title> element in an XML document:
    Example
        txt = xmlDoc.getElementsByTagName("title")[0].childNodes[0].nodeValue;
    Loading an XML File
    This example reads "books.xml" into xmlDoc and retrieves the text value of the first <title> element in books.xml:
  • 13. Example
        <!DOCTYPE html>
        <html>
        <body>
        <p id="demo"></p>
        <script>
        var xhttp = new XMLHttpRequest();
        xhttp.onreadystatechange = function() {
          if (this.readyState == 4 && this.status == 200) {
            myFunction(this);
          }
        };
        xhttp.open("GET", "books.xml", true);
        xhttp.send();
        function myFunction(xml) {
          var xmlDoc = xml.responseXML;
          document.getElementById("demo").innerHTML =
            xmlDoc.getElementsByTagName("title")[0].childNodes[0].nodeValue;
        }
        </script>
        </body>
        </html>
    Example Explained
    xmlDoc - the XML DOM object created by the parser.
    getElementsByTagName("title")[0] - get the first <title> element
    childNodes[0] - the first child of the <title> element (the text node)
    nodeValue - the value of the node (the text itself)
    Loading an XML String
    This example loads a text string into an XML DOM object, and extracts the info from it with JavaScript:
    Example
        <html>
        <body>
        <p id="demo"></p>
  • 14.
        <script>
        var text, parser, xmlDoc;
        text = "<bookstore><book>" +
          "<title>Everyday Italian</title>" +
          "<author>Giada De Laurentiis</author>" +
          "<year>2005</year>" +
          "</book></bookstore>";
        parser = new DOMParser();
        xmlDoc = parser.parseFromString(text,"text/xml");
        document.getElementById("demo").innerHTML =
          xmlDoc.getElementsByTagName("title")[0].childNodes[0].nodeValue;
        </script>
        </body>
        </html>
    Programming Interface
    The DOM models XML as a set of node objects. The nodes can be accessed with JavaScript or other programming languages. In this tutorial we use JavaScript.
    The programming interface to the DOM is defined by a set of standard properties and methods.
    Properties are often referred to as something that is (e.g. nodeName is "book").
    Methods are often referred to as something that is done (e.g. delete "book").
    XML DOM Properties
    These are some typical DOM properties:
    x.nodeName - the name of x
    x.nodeValue - the value of x
    x.parentNode - the parent node of x
    x.childNodes - the child nodes of x
    x.attributes - the attribute nodes of x
    Note: In the list above, x is a node object.
    XML DOM Methods
    x.getElementsByTagName(name) - get all elements with a specified tag name
    x.appendChild(node) - insert a child node to x
    x.removeChild(node) - remove a child node from x
    Note: In the list above, x is a node object.
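Because the DOM is language-neutral, the same properties and methods are available outside JavaScript. The sketch below exercises them through the org.w3c.dom interfaces in the JDK, where properties appear as getter methods (getNodeName(), getParentNode() and so on). The class name DomInterfaceSketch, the helper elementChildren, and the price element are hypothetical additions for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DomInterfaceSketch {

    // Parses the XML text into a DOM tree.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse: " + xml, e);
        }
    }

    // Counts only the element children of a node, skipping text nodes.
    static int elementChildren(Node parent) {
        int count = 0;
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Document doc = parse("<bookstore><book>"
                + "<title>Everyday Italian</title>"
                + "<author>Giada De Laurentiis</author>"
                + "<year>2005</year>"
                + "</book></bookstore>");

        // Properties: nodeName, parentNode, nodeValue.
        Node title = doc.getElementsByTagName("title").item(0);
        System.out.println(title.getNodeName());                  // title
        System.out.println(title.getParentNode().getNodeName());  // book
        System.out.println(title.getFirstChild().getNodeValue()); // Everyday Italian

        // Methods: appendChild and removeChild.
        Node book = title.getParentNode();
        Element price = doc.createElement("price"); // hypothetical element
        price.setTextContent("30.00");
        book.appendChild(price);
        System.out.println(elementChildren(book)); // 4
        book.removeChild(price);
        System.out.println(elementChildren(book)); // 3
    }
}
```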
  • 15. The sample XML considered in the examples is:
        <employees>
          <employee id="111">
            <firstName>Rakesh</firstName>
            <lastName>Mishra</lastName>
            <location>Bangalore</location>
          </employee>
          <employee id="112">
            <firstName>John</firstName>
            <lastName>Davis</lastName>
            <location>Chennai</location>
          </employee>
          <employee id="113">
            <firstName>Rajesh</firstName>
            <lastName>Sharma</lastName>
            <location>Pune</location>
          </employee>
        </employees>
    And the object into which the XML content is to be extracted is defined as below:
        class Employee{
          String id;
          String firstName;
          String lastName;
          String location;
          @Override
  • 16.
          public String toString() {
            return firstName+" "+lastName+"("+id+")"+location;
          }
        }
    There are 3 main parsers for which I have given sample code:
    DOM Parser
    SAX Parser
    StAX Parser
    Using DOM Parser
    I am making use of the DOM parser implementation that comes with the JDK, and in my example I am using JDK 7. The DOM parser loads the complete XML content into a tree structure, and we iterate through the Node and NodeList objects to get the content of the XML. The code for XML parsing using the DOM parser is given below.
        import java.util.ArrayList;
        import java.util.List;
        import javax.xml.parsers.DocumentBuilder;
        import javax.xml.parsers.DocumentBuilderFactory;
        import org.w3c.dom.Document;
        import org.w3c.dom.Element;
        import org.w3c.dom.Node;
        import org.w3c.dom.NodeList;

        public class DOMParserDemo {
          public static void main(String[] args) throws Exception {
            //Get the DOM Builder Factory
            DocumentBuilderFactory factory =
                DocumentBuilderFactory.newInstance();
            //Get the DOM Builder
            DocumentBuilder builder = factory.newDocumentBuilder();
            //Load and Parse the XML document
            //document contains the complete XML as a Tree.
            Document document =
              builder.parse(
                ClassLoader.getSystemResourceAsStream("xml/employee.xml"));
            List<Employee> empList = new ArrayList<>();
            //Iterating through the nodes and extracting the data.
  • 17.
            NodeList nodeList = document.getDocumentElement().getChildNodes();
            for (int i = 0; i < nodeList.getLength(); i++) {
              Node node = nodeList.item(i);
              if (node instanceof Element) {
                //We have encountered an <employee> tag.
                Employee emp = new Employee();
                emp.id = node.getAttributes().
                    getNamedItem("id").getNodeValue();
                NodeList childNodes = node.getChildNodes();
                for (int j = 0; j < childNodes.getLength(); j++) {
                  Node cNode = childNodes.item(j);
                  //Identifying the child tag of employee encountered.
                  if (cNode instanceof Element) {
                    String content = cNode.getLastChild().
                        getTextContent().trim();
                    switch (cNode.getNodeName()) {
                      case "firstName":
                        emp.firstName = content;
                        break;
                      case "lastName":
                        emp.lastName = content;
                        break;
                      case "location":
                        emp.location = content;
                        break;
  • 18.
                    }
                  }
                }
                empList.add(emp);
              }
            }
            //Printing the Employee list populated.
            for (Employee emp : empList) {
              System.out.println(emp);
            }
          }
        }

        class Employee{
          String id;
          String firstName;
          String lastName;
          String location;
          @Override
          public String toString() {
            return firstName+" "+lastName+"("+id+")"+location;
          }
        }
    The output for the above will be:
        Rakesh Mishra(111)Bangalore
        John Davis(112)Chennai
        Rajesh Sharma(113)Pune
  • 19. Using SAX Parser
    The SAX parser differs from the DOM parser in that it does not load the complete XML into memory. Instead, it reads the XML sequentially, triggering events as it encounters different constructs such as an opening tag, a closing tag, character data, comments and so on. This is why the SAX parser is called an event-based parser.
    Along with the XML source file, we also register a handler which extends the DefaultHandler class. The DefaultHandler class provides different callbacks, out of which we are interested in:
    startElement(): triggered when the start of a tag is encountered.
    endElement(): triggered when the end of a tag is encountered.
    characters(): triggered when some text data is encountered.
    The code for parsing the XML using the SAX parser is given below:
        import java.util.ArrayList;
        import java.util.List;
        import javax.xml.parsers.SAXParser;
        import javax.xml.parsers.SAXParserFactory;
        import org.xml.sax.Attributes;
        import org.xml.sax.SAXException;
        import org.xml.sax.helpers.DefaultHandler;

        public class SAXParserDemo {
          public static void main(String[] args) throws Exception {
            SAXParserFactory parserFactory = SAXParserFactory.newInstance();
            SAXParser parser = parserFactory.newSAXParser();
            SAXHandler handler = new SAXHandler();
            parser.parse(ClassLoader.getSystemResourceAsStream("xml/employee.xml"),
                         handler);
            //Printing the list of employees obtained from XML
  • 20.
            for ( Employee emp : handler.empList){
              System.out.println(emp);
            }
          }
        }

        /**
         * The Handler for SAX Events.
         */
        class SAXHandler extends DefaultHandler {
          List<Employee> empList = new ArrayList<>();
          Employee emp = null;
          String content = null;

          @Override
          //Triggered when the start of tag is found.
          public void startElement(String uri, String localName,
                                   String qName, Attributes attributes)
                                   throws SAXException {
            switch(qName){
              //Create a new Employee object when the start tag is found
              case "employee":
                emp = new Employee();
                emp.id = attributes.getValue("id");
                break;
            }
          }
  • 21.
          @Override
          public void endElement(String uri, String localName,
                                 String qName) throws SAXException {
            switch(qName){
              //Add the employee to list once end tag is found
              case "employee":
                empList.add(emp);
                break;
              //For all other end tags the employee has to be updated.
              case "firstName":
                emp.firstName = content;
                break;
              case "lastName":
                emp.lastName = content;
                break;
              case "location":
                emp.location = content;
                break;
            }
          }

          @Override
          public void characters(char[] ch, int start, int length)
                  throws SAXException {
            content = String.copyValueOf(ch, start, length).trim();
  • 22.
          }
        }

        class Employee {
          String id;
          String firstName;
          String lastName;
          String location;
          @Override
          public String toString() {
            return firstName + " " + lastName + "(" + id + ")" + location;
          }
        }
    The output for the above would be:
        Rakesh Mishra(111)Bangalore
        John Davis(112)Chennai
        Rajesh Sharma(113)Pune
    With this I have covered parsing the same XML document and performing the same task of populating the list of Employee objects using all the three parsers namely:
    DOM Parser
    SAX Parser