Web Data Management
XML Data Model
Semi-structured Data Model
• A data model, based on graphs, for representing
both regular and irregular data.
• Basic ideas
• Self-describing data. The content comes with its own
description;
– contrast with the relational model, where schema and
content are represented separately.
• Flexible typing. Data may be typed (i.e., “such nodes
are integer values” or “this part of the graph
complies to this description”); often no typing, or a
very flexible one
• Serialized form. The graph representation is
associated with a serialized form, convenient for
exchanges in a heterogeneous environment.
Self-describing data
• Starting point: association lists, i.e., records of
label-value pairs.
{name: "Alan", tel: 2157786, email: "agb@abc.com"}
• Natural extension: values may themselves be
other structures:
{name: {first: "Alan", last: "Black"},
tel: 2157786,
email: "agb@abc.com"}
• Further extension: allow duplicate labels.
{name: "Alan", tel: 2157786, tel: 2498762}
Tree-based representation
• Data can be graphically represented as trees:
label structure can be captured by tree edges,
and values reside at leaves.
Tree-based representation: labels as nodes
• Another choice is to represent both labels and
values as vertices.
• The XML data model adopts this latter
representation.
Representation of regular data
• The syntax makes it easy to describe sets of
tuples, as in:
{ person: {name: "alan", phone: 3127786, email: "alan@abc.com"},
person: {name: "sara", phone: 2136877, email: "sara@xyz.edu"},
person: {name: "fred", phone: 7786312, email: "fd@ac.uk"} }
– relational data can be represented
– for regular data, the semi-structured representation
is highly redundant, since the labels are repeated in every tuple.
Representation of irregular data
• Many possible variations in the structure:
missing values, duplicates, changes, etc.
– Nodes can be identified, and referred to by their
identity.
{ person: {name: "alan", phone: 3127786, email: "agg@abc.com"},
person: &314
{ name: {first: "Sara", last: "Green" },
phone: 2136877,
email: "sara@math.xyz.edu",
spouse: &443 },
person: &443
{ name: "fred", Phone: 7786312, Height: 183,
spouse: &314 }}
XML represents semi-structured data
• The type of the data need not be known in advance
• Serialize the data by annotating each data item
explicitly with its description (e.g. name, phone, etc.)
• Such data is called self-describing
• Serialization: convert data into a byte stream that can
be easily transmitted and reconstructed at the receiver
• Self-describing data wastes space but provides
interoperability (required on the Web)
• Semi-structured data models: XML, JSON (JavaScript
Object Notation)
XML as Semi-structured Data
Explained
• Missing attributes:
– <person> <name>John</name>
<phone>1234</phone>
</person>
– <person><name>Joe</name></person> (no phone!)
• Repeated attributes:
– <person> <name>Mary</name>
<phone>2345</phone>
<phone>3456</phone>
</person>
XML as Semi-structured Data
Explained
• Attributes with different types in different objects
– <person> <name> <first> John </first>
<last> Smith </last>
</name>
<phone>1234</phone>
</person>
(here name is a complex value, not a string)
• Nested collections (no 1NF)
XML in brief
• XML is the World Wide Web Consortium (W3C)
standard for Web data exchange.
– XML documents can be serialized in a normalized encoding
(typically ISO-8859-1 or UTF-8), and safely transmitted on
the Internet.
– XML is a generic format, which can be specialized in
"dialects" for specific domains (e.g., XHTML).
– The W3C promotes companion standards: DOM (object
model), XML Schema (typing), XPath (path expressions), XSLT
(restructuring), XQuery (query language), and many others.
• XML is a simplified version of SGML, a long-established
language for technical documents.
XML documents
• An XML document is a labeled, unranked, ordered tree:
– Labeled means that some annotation, the label, is
attached to each node.
– Unranked means that there is no a priori bound on the
number of children of a node.
– Ordered means that there is an order between the
children of each node.
• XML specifies nothing more than a syntax: no meaning
is attached to the labels.
• A dialect, on the other hand, associates a meaning with
labels (e.g., title in XHTML).
XML documents are trees
Table person:
name phone
John 3634
Sue 6343
Dick 6363
Serialized representation in XML:
<person>
<row> <name>John</name>
<phone> 3634</phone></row>
<row> <name>Sue</name>
<phone> 6343</phone></row>
<row> <name>Dick</name>
<phone> 6363</phone></row>
</person>
[Figure: the corresponding tree, with root person, three row children, and a name and a phone leaf under each row]
XML and Semi-structured Data:
Similarities and Differences
<person id="o123">
<name> Alan </name>
<age> 42 </age>
<email> ab@com </email>
</person>
{ person: &o123
{ name: "Alan",
age: 42,
email: "ab@com" }
}
[Figure: both forms drawn as the same tree: person with children name (Alan), age (42) and email (ab@com)]
<person father="o123"> …
</person>
{ person: { father: &o123 …}
}
[Figure: the two trees again, each with an added father reference]
Similar on trees, different on graphs.
XML describes structured content
• Applications cannot interpret unstructured content:
The book "Foundations of Databases", written by Serge Abiteboul, Rick Hull
and Victor Vianu, published in 1995 by Addison-Wesley
• XML provides a means to structure this content:
<bibliography>
<book>
<title> Foundations of Databases </title>
<author> Abiteboul </author>
<author> Hull </author>
<author> Vianu </author>
<publisher> Addison Wesley </publisher>
<year> 1995 </year> </book>
<book>...</book>
</bibliography>
• Now, an application can access the XML tree, extract some parts,
rename the labels, reorganize the content into another structure,
etc.
Applications associate semantics with
XML documents
• Letter document
<letter>
<header>
<author>...</author>
<date>...</date>
<recipient>...</recipient>
<cc>...</cc>
</header>
<body>
<text>...</text>
<signature>...</signature>
</body>
</letter>
Serialized and tree-based forms
• The serialized form is a textual, linear representation
of the tree; it complies with a (sometimes complicated)
syntax.
• Tree-based forms implement the abstract tree
representation in a specific context, e.g., an
object-oriented model (Document Object Model).
• Typically, an application gets a document in serialized
form, parses it into tree form, and serializes it back at
the end.
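The parse / modify / serialize cycle described above can be sketched in Java with the JAXP classes that ship with the JDK. This is only an illustration: the class name RoundTripDemo, its helper methods and the sample document are assumptions made for the example, not part of the original slides.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo class: serialized form -> DOM tree -> serialized form.
class RoundTripDemo {

    /** Parses the serialized form into a DOM tree. */
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    /** Serializes a DOM tree back to text using an identity transformation. */
    static String serialize(Document doc) throws Exception {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<book><title>Foundations of Databases</title></book>");
        // Work on the tree form: append a <year> child to the element root.
        Element year = doc.createElement("year");
        year.setTextContent("1995");
        doc.getDocumentElement().appendChild(year);
        System.out.println(serialize(doc));
    }
}
```

The identity Transformer is one common way to serialize a DOM tree with the standard library; other serializers would serve equally well here.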
Serialized and tree-based forms: text
and elements
• The basic components of an XML document are elements and text.
• Here is an element whose content is text:
<elt_name>
Textual content
</elt_name>
• The tree form of the document, modeled in DOM: each node has a
type, here either Element or Text.
Serialized and tree-based forms:
nesting elements
• Serialized form: the content of an element is the part between the
opening and ending tags.
• Tree-based form: the subtree rooted at the corresponding Element
node (in DOM).
<elt1>
Textual content
<elt2>
Another content
</elt2>
</elt1>
Serialized and tree-based forms:
attributes
• Serialized form: attributes are name/value pairs attached to an
element.
<elt1 att1='12' att2='fr'>
Textual content
</elt1>
• The content of an attribute is always atomic text (no nesting).
• Attributes are not ordered, and there cannot be two attributes
with the same name in an element.
• Tree-based form: attributes are special nodes attached to the
Element node (in DOM, they are not counted among its ordinary
children).
Serialized and tree-based forms: the
document root
• A document with its prologue and element root:
<?xml version="1.0" encoding="utf-8"?>
<elt>
Document content.
</elt>
• In the DOM representation, the prologue appears as a
Document node, called the root node.
• Document content must always be enclosed in a single
opening/ending tag pair, called the element root.
• The first line of the serialized form must always be the prologue, if
there is one.
Web Data Management with XML
• Publishing
– an XML document can easily be converted to another XML
document (same content, but another structure)
– Web publishing is the process of transforming XML
documents to XHTML.
• Integration
– XML documents from many sources can be transformed into
a common dialect, and constitute a collection.
– Search engines, or portals, provide browsing and searching
services on collections of XML documents.
• Distributed Data Processing
– many software tools can be adapted to consume/produce
XML-represented data.
– Web services provide remote services for XML data
processing.
Web Publishing: restructuring to XHTML
• The Web application produces some XML content,
structured in some application-dependent dialect, on
the server.
• In a second phase, the XML content is transformed into
an XHTML document that can be visualized by
humans.
• The transformation is typically expressed in XSLT, and
can be processed either on the server or on the
client.
Web publishing: content +
presentation instructions
• XML content
<bibliography>
<book>
<title> Foundations of Databases </title>
<author> Abiteboul </author>
<author> Hull </author>
<author> Vianu </author>
<publisher> Addison Wesley </publisher>
<year> 1995 </year> </book>
<book>...</book>
</bibliography>
• Document in XHTML
<h1> Bibliography </h1>
<p> <i> Foundations of Databases </i>
Abiteboul, Hull, Vianu
<br/> Addison Wesley, 1995 </p>
<p> <i> Data on the Web </i>
Abiteboul, Buneman, Suciu
<br/> Morgan Kaufmann, 1999 </p>
Web publishing
• The same content may be published using
different means:
– Web publishing: XML → XHTML
– WAP (Wireless Application Protocol): XML → WML
Web publishing
• Data obtained from a relational database and
from XML files
• XSLT restructures the XML data
• Produces XHTML pages for a browser
Web Integration: gluing together
heterogeneous sources
• The portal receives (possibly continuously) XML-
structured content, each source using its own
dialect.
• Each feed provides some content, extracted with
XSLT or XQuery, or any convenient XML processing
tool (e.g., SAX).
Data integration
Distributed Data Management with
XML
• XML encoding is used to exchange information
between applications.
• A specific dialect, WSDL (Web Services Description
Language), is used to describe Web service interfaces.
Exploiting XML documents
XML dialects
• Dialects define specialized structures, constraints and
vocabularies to construct ad hoc XML contents that can
be used and exchanged in a specific application area
• RSS is an XML dialect for describing content updates
that is heavily used for blog entries, news headlines or
podcasts.
• WML (Wireless Mark-up Language) is used in Web sites
by wireless applications based on the Wireless
Application Protocol (WAP).
• MathML (Mathematical Mark-up Language) is an XML
dialect for describing mathematical notation and
capturing both its structure and content.
XML dialects
• XLink (XML Linking Language) is an XML dialect for
defining hyperlinks between XML documents.
These links are expressed in XML and may be
introduced inside XML documents.
• SVG (Scalable Vector Graphics) is an XML dialect
for describing two-dimensional vector graphics,
both static and animated. With SVG, images may
contain outbound hyperlinks expressed with XLink.
XML dialects – SVG example
<?xml version="1.0" encoding="UTF-8" ?>
<svg xmlns="http://www.w3.org/2000/svg">
<polygon points="0,0 50,0 25,50"
style="stroke:#660000; fill:#cc3333;"/>
<text x="20" y="40">Some SVG text</text>
</svg>
XML standards
• SAX (Simple API for XML) sees an XML document
as a sequence of tokens (its serialization).
• DOM (Document Object Model) is an object
model for representing (HTML and) XML
documents independently of the programming
language.
• XPath (XML Path Language), which we will study, is
a language for addressing portions of an XML
document.
XML standards
• XQuery (that we will study) is a flexible query
language for extracting information from
collections of XML documents.
• XSLT (Extensible Stylesheet Language
Transformations), that we will study, is a
language for specifying the transformation of
XML documents into other XML documents.
• Web services provide interoperability
between machines based on Web protocols.
Processing an XML Document with SAX
and DOM
• A SAX parser transforms an XML document into a
flow of events.
– Examples of events: start/end of a document, the
start/end of an element, a text token, a comment, etc.
• Example: Load data in XML format into a relational
database
1. when document start is received, connect to the database;
2. when a Movie open tag is received, create a new Movie record;
(a) when a text node is received, assign its content to X;
(b) when a Title close tag is received, assign X to Movie.Title;
(c) when a Year close tag is received, assign X to Movie.Year, etc.
3. when a Movie close tag is received, insert the Movie record in the
database (and commit the transaction);
4. when document end is received, close the database connection.
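The steps above can be sketched as a SAX handler in Java. To keep the example self-contained, an in-memory list stands in for the database, so steps 1 and 4 (connecting and disconnecting) are omitted. The class MovieLoader, the record type Movie and the sample document are assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical loader: turns SAX events into Movie records.
class MovieLoader extends DefaultHandler {

    /** Hypothetical record type standing in for a row of a Movie table. */
    static final class Movie {
        String title;
        String year;
    }

    final List<Movie> inserted = new ArrayList<>(); // stands in for the database
    private Movie current;
    private final StringBuilder text = new StringBuilder(); // the variable X of step 2(a)

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equals("Movie")) {
            current = new Movie();          // step 2: create a new Movie record
        }
        text.setLength(0);                  // start collecting text for this element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);     // step 2(a); may be called several times per text node
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (current == null) {
            return;                         // outside any Movie element
        }
        String x = text.toString().trim();
        switch (qName) {
            case "Title": current.title = x; break;                      // step 2(b)
            case "Year":  current.year = x; break;                       // step 2(c)
            case "Movie": inserted.add(current); current = null; break;  // step 3
            default: break;
        }
    }

    /** Parses the document and returns the records that would be inserted. */
    static List<Movie> load(String xml) throws Exception {
        MovieLoader handler = new MovieLoader();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.inserted;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Movies><Movie><Title>Vertigo</Title><Year>1958</Year></Movie>"
                   + "<Movie><Title>Alien</Title><Year>1979</Year></Movie></Movies>";
        for (Movie m : load(xml)) {
            System.out.println(m.title + " (" + m.year + ")");
        }
    }
}
```

Text is accumulated in a buffer because a SAX parser is free to deliver one text node in several characters() calls.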
SAX
• SAX is a good choice when the content of a
document needs to be examined only once
• SAX handler written in Java
– It features methods that handle SAX events:
opening and closing tags; character data
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/**
 * Prints every SAX event it receives. Extending DefaultHandler (which
 * implements ContentHandler with empty methods) means that only the
 * callbacks of interest have to be overridden.
 */
public class SaxHandler extends DefaultHandler {

    /** Handlers for the beginning and end of the document */
    @Override
    public void startDocument() throws SAXException {
        System.out.println("Start the parsing of document");
    }

    @Override
    public void endDocument() throws SAXException {
        System.out.println("End the parsing of document");
    }

    /** Opening tag handler */
    @Override
    public void startElement(String nameSpaceURI, String localName,
            String rawName, Attributes attributes) throws SAXException {
        // localName is only filled in by a namespace-aware parser;
        // rawName holds the qualified name when it is available.
        System.out.println("Opening tag: " + localName);
        // Show the attributes, if any
        if (attributes.getLength() > 0) {
            System.out.println(" Attributes: ");
            for (int index = 0; index < attributes.getLength(); index++) {
                System.out.println(" - " + attributes.getLocalName(index)
                        + " = " + attributes.getValue(index));
            }
        }
    }

    /** Closing tag handler */
    @Override
    public void endElement(String nameSpaceURI, String localName, String rawName)
            throws SAXException {
        System.out.println("Closing tag : " + localName);
    }

    /** Character data handling: the third argument is a length, not an end index */
    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        System.out.println("#PCDATA: " + new String(ch, start, length));
    }
}
DOM (Document Object Model)
• A DOM parser transforms an XML document
into a tree and offers an object API for that
tree.
• A partial view of the Class hierarchy of DOM
Parsing XML Documents
• A parser checks XML documents for well-formedness
or validates them against a schema
• So, what is parsing? In computer terminology:
"to analyze or separate (input, for example) into more easily
processed components"
• XML parsers load XML documents and provide access
to their contents in the form of objects
Parsing XML Documents
• A validating parser can use a DTD or schema to verify
that a document is properly constructed
• A non-validating parser only requires the document
to be well formed; many free parsers on the Web are
non-validating
• Stream-based parsers read through the document
and signal the application every time a new component
appears
• Tree-based parsers read the entire document and give
the application a tree structure corresponding to the
element structure of the document
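As a minimal sketch of what a non-validating, tree-based parser does, the following Java example uses the JDK's DocumentBuilder to test well-formedness only. The class name WellFormedCheck and the two sample strings are assumptions made for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: reports whether a text is well-formed XML.
class WellFormedCheck {

    /** Returns true if the text parses as well-formed XML (no DTD or schema validation). */
    static boolean isWellFormed(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false);                 // non-validating: well-formedness only
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setErrorHandler(new DefaultHandler()); // rethrows fatal errors without printing them
        try {
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;                             // e.g. a tag that is never closed
        }
    }

    public static void main(String[] args) throws Exception {
        String ok = "<person><row><name>Sue</name><phone>6343</phone></row></person>";
        String broken = "<person><row><name>Sue</name><phone>6343</phone></person>"; // <row> never closed
        System.out.println(isWellFormed(ok));      // true
        System.out.println(isWellFormed(broken));  // false
    }
}
```

Validation against a DTD or schema would need additional factory configuration and is left out of this sketch.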
Parsing XML Documents
• DOM API: language- and platform-independent
interfaces for accessing and manipulating information in
XML documents
• The document is checked to see whether it is well formed
and, with a validating parser, valid
• The parser then converts the information into a tree of
nodes
• A tree starts at one root node; in DOM terms, this is
called a document object instance
• You can modify, delete and create leaves and
branches of the tree using the interfaces in the API
Parsing XML Documents
• Titles.xml: a sample XML document
<?xml version="1.0" encoding="UTF-8"?>
<BookList>
<Book>
<book_id>BU1111</book_id>
<title>Cooking with Computers: Surreptitious Balance Sheets</title>
<type>business</type>
<pub_id>1389</pub_id>
<price>11.95</price>
<advance>5000</advance>
<royalty>10</royalty>
<ytd_sales>3876</ytd_sales>
<notes>Helpful hints on how to use your electronic...</notes>
<pubdate>1991-06-09T05:00:00</pubdate>
</Book>
<Book>
<book_id>BU7832</book_id>
<title>Straight Talk About Computers</title>
<type>business</type>
<pub_id>1389</pub_id>
<price>19.99</price>
<advance>5000</advance>
<royalty>10</royalty>
<ytd_sales>4095</ytd_sales>
<notes>Annotated analysis of what computers can do for you</notes>
<pubdate>1991-06-22T05:00:00</pubdate>
</Book>
</BookList>
DOM
• DOM provides interfaces in its hierarchy of Node
objects
• Node: an object of the XML Document tree created after a
DOM parser reads an XML file
• Inheritance relationship between important
interfaces
DOM
A sample document object tree
Methods of Node Object
Method: Description
hasChildNodes(): finds out whether a Node has children; takes no parameters and returns a Boolean.
getNodeType(): returns the type of a particular Node, as a constant integer that identifies the different types of Nodes.
appendChild(): adds a new child object, which is passed to the method, to the current Node.
cloneNode(): returns a duplicate of the Node.
hasAttributes(): returns a Boolean true if the Node has any attributes. Added in DOM Level 2.
insertBefore(): takes a new child Node and a reference child Node and inserts the new child Node before the reference Node.
isSupported(): tests whether this implementation of the DOM supports a specific feature. Added in DOM Level 2; takes a feature and a version number as parameters.
normalize(): puts all text nodes in the full depth of the sub-tree underneath this Node into normal form, so that no adjacent or empty text nodes remain.
removeChild(): removes the specified child.
replaceChild(): replaces the specified child with the new child passed.
Methods of Element Object
Method: Description
getAttribute(): retrieves the value of the specified attribute.
getAttributeNS(): retrieves the value of the specified attribute by local name and namespace. Added in Level 2.
getAttributeNode(): retrieves an Attr node by name.
getAttributeNodeNS(): retrieves an Attr node by local name and namespace. Added in Level 2.
getElementsByTagName(): returns a NodeList of all descendant elements with a given tag name, in the order in which they are encountered.
getElementsByTagNameNS(): returns a NodeList of all descendant elements with a given local name and namespace, in the order in which they are encountered. Added in Level 2.
hasAttribute(): returns a Boolean true if the specified attribute is present, and false otherwise.
hasAttributeNS(): returns a Boolean true if the specified attribute, by local name and namespace, is present, and false otherwise. Added in Level 2.
removeAttribute(): removes the specified attribute.
removeAttributeNS(): removes the attribute specified by local name and namespace. Added in Level 2.
removeAttributeNode(): removes the specified Attr node.
setAttribute(): adds a new attribute. If an attribute of the same name exists, its value is changed to the specified value.
setAttributeNS(): adds a new attribute. If an attribute of the same local name and namespace exists, its value is changed to the specified value. Added in Level 2.
setAttributeNode(): adds a new Attr node. If an Attr node of the same name exists, it is replaced by the new one.
setAttributeNodeNS(): adds a new Attr node. If an Attr node of the same local name and namespace exists, it is replaced by the new one. Added in Level 2.
DOM
• NodeList interface: an ordered collection of Nodes
[Figure: a DOM NodeList object]
• NodeList offers a single method, item(), which returns
the Node at the indexed position passed to the method, and
a length attribute (getLength() in Java) giving the number of Nodes
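A short Java sketch of walking a NodeList by index follows. It uses item() together with getLength(), which the Java binding of DOM provides for the list's length. The class TitleLister and the shortened two-book sample (modeled on Titles.xml) are assumptions for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: collects the <title> of every <Book> element.
class TitleLister {

    static List<String> titles(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList books = doc.getElementsByTagName("Book");
        List<String> result = new ArrayList<>();
        // A NodeList is accessed by position: item(0) .. item(getLength() - 1).
        for (int i = 0; i < books.getLength(); i++) {
            Element book = (Element) books.item(i);
            NodeList titleNodes = book.getElementsByTagName("title");
            if (titleNodes.getLength() > 0) {
                result.add(titleNodes.item(0).getTextContent());
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<BookList>"
                + "<Book><book_id>BU1111</book_id>"
                + "<title>Cooking with Computers: Surreptitious Balance Sheets</title></Book>"
                + "<Book><book_id>BU7832</book_id>"
                + "<title>Straight Talk About Computers</title></Book>"
                + "</BookList>";
        for (String title : titles(xml)) {
            System.out.println(title);
        }
    }
}
```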
SAX
• SAX parser: streams in documents and signals
specific events
• A SAX parser has no default object model
• A SAX parser reads in an XML document and fires
events for the following:
– start of elements (opening tags)
– end of elements (closing tags)
– #PCDATA and CDATA sections
– processing instructions, comments and entity declarations
DOM vs. SAX
• The following are DOM benefits you should focus
on:
– it allows random access to the document
– complex searches can be easily implemented
– the DTD or schema is available
• The following list contains some of the most
useful benefits of SAX:
– it can parse files of any size
– it is a fast processing method
– you can build your own data structure
– you can access only a small subset of the information, if desired
Example Parsers
• MSXML: first parser which can perform both SAX
and DOM-based parsing, from Microsoft
• Xerces: from the Apache Group at http://xml.apache.org
• IBM's XML for Java: http://www.alphaworks.ibm.com/formula/xml
• Microstar's Ælfred: http://home.pacbell.net/david-b/xml/
• Sun's Java API for XML: http://java.sun.com/products/xml
• Oracle's XML Parser for Java: http://technet.oracle.com/
XPath
• Language for expressing paths in an XML
document
– Navigation: child, descendant, parent, ancestor
– Tests on the nature of the node
– More complex selection predicates
• XPath is a means to specify portions of a
document
• Basic tool for other XML languages: XLink,
XSLT, XQuery
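To make the navigation steps and predicates concrete, here is a small Java sketch using the javax.xml.xpath API from the JDK. The class XPathDemo, the helper select() and the condensed bibliography document are assumptions for the example; the book data is taken from the bibliography shown earlier in these slides.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class: evaluates XPath expressions against a small document.
class XPathDemo {

    static final String BIB =
            "<bibliography>"
          + "<book><title>Foundations of Databases</title>"
          + "<author>Abiteboul</author><author>Hull</author><author>Vianu</author>"
          + "<year>1995</year></book>"
          + "<book><title>Data on the Web</title>"
          + "<author>Abiteboul</author><author>Buneman</author><author>Suciu</author>"
          + "<year>1999</year></book>"
          + "</bibliography>";

    /** Returns the text content of every node selected by the expression. */
    static List<String> select(String expression, String xml) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(
                expression, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            values.add(nodes.item(i).getTextContent());
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        // Child navigation: the title of every book.
        System.out.println(select("/bibliography/book/title", BIB));
        // Descendant navigation plus a selection predicate: books after 1995.
        System.out.println(select("//book[year > 1995]/title", BIB));
        // Parent navigation: the title of the book that Buneman co-authored.
        System.out.println(select("//author[. = 'Buneman']/../title", BIB));
    }
}
```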
XQuery
• Query language:
• Like SQL: select portions of the data and reconstruct a
result
• Query structure: FLWR (for, let, where, return; pronounced "flower")
for $p in document("bib.xml")//publisher
let $b := document("bib.xml")//book[publisher = $p]
where count($b) > 100
return $p
• $p : scans the sequence of publishers
• $b : scans the sequence of books for a publisher
• WHERE filters out some publishers
• RETURN constructs the result
XSLT
• Transformation language:
• An XSLT style sheet includes a set of
transformation rules: pattern/template
• Pattern: based on XPath expressions; it specifies
a structural context in the tree
• Template: specifies what should be produced
• Principle: when a pattern is matched in the
source document, the corresponding template
produces some data
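A minimal sketch of the pattern/template principle, run from Java with the javax.xml.transform API of the JDK. The stylesheet, the class XsltDemo and the one-book input are assumptions written for this illustration; they are not the stylesheet used to produce the XHTML shown earlier.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class: applies a two-rule stylesheet to a bibliography.
class XsltDemo {

    // Two rules: each has a pattern (match=...) and a template (the element body).
    static final String STYLESHEET =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
          + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
          + "<xsl:template match='/bibliography'>"
          + "<div><h1>Bibliography</h1><xsl:apply-templates select='book'/></div>"
          + "</xsl:template>"
          + "<xsl:template match='book'>"
          + "<p><i><xsl:value-of select='title'/></i></p>"
          + "</xsl:template>"
          + "</xsl:stylesheet>";

    static final String BIB =
            "<bibliography><book><title>Foundations of Databases</title>"
          + "<year>1995</year></book></bibliography>";

    /** Applies the stylesheet to the XML text and returns the produced document. */
    static String transform(String xml, String stylesheet) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(stylesheet)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform(BIB, STYLESHEET));
    }
}
```

Whenever the processor reaches a book element, the second rule's pattern matches and its template emits one paragraph, which is how the restructuring to XHTML proceeds rule by rule.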
XLink
• XML Linking Language
• Advanced hypertext primitives
• Allows inserting in XML documents
descriptions of links to external Web resources
• Simple mono-directional links (HREF)
• Multidirectional links
• XLink relies on XPath for addressing portions
of XML documents
XLink
• XLink links resources, which include documents,
audio, video, database data, etc.
• Many types: simple, extended, locator, arc,
resource, or title
<person xmlns:xlink="http://www.w3.org/1999/xlink"
xlink:type="simple"
xlink:href="http://a.b.c/myhomepage.html"
xlink:role="http://www.example.com/studentlist"
xlink:title="The Homepage"
xlink:show="replace"
xlink:actuate="onRequest">
.....
</person>
(The slide labels some of these attributes as required and the others as optional.)
XLink
• The linking element (i.e., person) that references document2 is called the
local resource.
• The resource referenced (pointed to) is called the remote resource.
• Arc: how to traverse a pair of resources, including the direction of
traversal and possibly application behavior information as well.
• Outbound arc: an arc that has a local starting resource and a remote
ending resource
• Inbound arc: an arc whose ending resource is local but whose starting
resource is remote
• Third-party arc: an arc for which neither the starting resource nor the
ending resource is local
XLink
• The use of XLink elements and attributes
requires declaration of the XLink
namespace:
<person xmlns:xlink="http://www.w3.org/1999/xlink" …..
• The href attribute defines the remote resource's
URI (uniform resource identifier)
• The role attribute is a URI that references a
resource that describes the link
• The title attribute is a descriptive title for the
link.
XLink
• The show attribute specifies how to display a resource when it is
loaded and can be
– "new", to display in a new window
– "replace", to replace the current resource with the linked resource
– "embed", to replace the current element with the linked resource
– "other", an XLink-aware application can decide how to display
• The actuate attribute specifies when the resource should be
retrieved and can be
– "onLoad", retrieve as soon as it is loaded
– "onRequest", retrieve the resource by clicking on the link
– "other", XLink-aware applications can decide when to load
– "none", no information on when to load the resource
XLink
• Extended Links
– for linking multiple combinations of local and remote
resources.
– The figure shows two unidirectional links
– With XLink, we can create multidirectional links for
traversing between resources
XLink
• Multidirectional links are not limited to just two
resources, but can link any number of resources
• The links need not be traversed sequentially.
<?xml version = "1.0"?>

<!-- Fig. 14.8 : booklinks.xml -->
<!-- XML document containing extended links -->

<!-- books: root element -->
<books xmlns:xlink = "http://www.w3.org/1999/xlink"
   xlink:type = "extended"
   xlink:title = "Book Inventory">

   <!-- A link to the book's authors, with information located at
        /authors/deitel.xml. A label's value is used to link one resource
        to another; type locator specifies a remote resource. When either
        Deitel, Harvey or Deitel, Paul is selected, the document
        deitel.xml will be retrieved. -->
   <author xlink:label = "authorDeitel"
      xlink:type = "locator"
      xlink:href = "/authors/deitel.xml"
      xlink:role = "http://deitel.com/xlink/author"
      xlink:title = "Deitel &amp; Associates, Inc.">
      <persons id = "authors">
         <person>Deitel, Harvey</person>
         <person>Deitel, Paul</person>
      </persons>
   </author>

   <publisher xlink:label = "publisherPrenticeHall"
      xlink:type = "locator"
      xlink:href = "/publisher/prenticehall.xml"
      xlink:role = "http://deitel.com/xlink/publisher"
      xlink:title = "Prentice Hall"/>

   <warehouse xlink:label = "warehouseXYZ"
      xlink:type = "locator"
      xlink:href = "/warehouse/xyz.xml"
      xlink:role = "http://deitel.com/xlink/warehouse"
      xlink:title = "X.Y.Z. Books"/>

   <!-- book: a local resource that can be linked to (or from) an
        author or publisher -->
   <book xlink:label = "JavaBook"
      xlink:type = "resource"
      xlink:role = "http://deitel.com/xlink/author"
      xlink:title = "Textbook on Java">
      Java How to Program: Third edition
   </book>

   <!-- An outbound arc between the book local resource and the author
        remote resource; arcrole provides information about the book's
        author -->
   <arcElement xlink:type = "arc"
      xlink:from = "JavaBook"
      xlink:arcrole = "http://deitel.com/xlink/info"
      xlink:to = "authorDeitel"
      xlink:show = "new"
      xlink:actuate = "onRequest"
      xlink:title = "About the author"/>

   <!-- An outbound arc between the book local resource and the
        publisher remote resource -->
   <arcElement xlink:type = "arc"
      xlink:from = "JavaBook"
      xlink:arcrole = "http://deitel.com/xlink/info"
      xlink:to = "publisherPrenticeHall"
      xlink:show = "new"
      xlink:actuate = "onRequest"
      xlink:title = "About the publisher"/>

   <!-- An inbound arc, with a starting resource that is remote
        (warehouseXYZ) and an ending resource that is local (JavaBook) -->
   <arcElement xlink:type = "arc"
      xlink:from = "warehouseXYZ"
      xlink:arcrole = "http://deitel.com/xlink/info"
      xlink:to = "JavaBook"
      xlink:show = "new"
      xlink:actuate = "onRequest"
      xlink:title = "Information about this book"/>

   <arcElement xlink:type = "arc"
      xlink:from = "publisherPrenticeHall"
      xlink:arcrole = "http://deitel.com/xlink/stock"
      xlink:to = "warehouseXYZ"
      xlink:show = "embed"
      xlink:actuate = "onLoad"
      xlink:title = "Publisher&apos;s inventory"/>

</books>
- a third-party arc, that has starting and ending resources that are both
remote
- Attribute show has value embed, which indicates that the ending
resource should replace the starting resource when the link is traversed
- Attribute actuate has value onLoad, so upon loading the XML
document, the link is traversed
- Because we consider the relationship between the publisher and
warehouse as being different than the previous three arcs, we provide a
different arcrole value for this link
XPointer
• An extension of XPath
• Usage:
– href=www.a.b.c/document.xml#xpointerExpr
• XPointer can link to
– specific locations (i.e., nodes in an XPath tree),
or even
– ranges of locations, in an XML document.
• XPointer also adds the ability to search XML
documents by using string matching.
XPointer
• Pointing to a point (= XML element or
character)
<?xml version = "1.0"?>
<!-- Fig. 14.14 : contacts.xml -->
<!-- contact list document -->

<contacts>
   <contact id = "author01">Deitel, Harvey</contact>
   <contact id = "author02">Deitel, Paul</contact>
   <contact id = "author03">Nieto, Tem</contact>
</contacts>
• Full XPointer form:
xlink:href = "/contacts.xml#xpointer(//contact[@id = 'author02'])"
• The name xpointer is called a scheme
XPointer
• If a document's unique identifier is referenced in an
expression such as
xlink:href = "/contacts.xml#xpointer(id('author01'))"
• then the bare-name XPointer address is a simplified form of the same
expression:
xlink:href = "/contacts.xml#author01"
• Child sequence: e.g. #xpointer( /1/3/2/5),
#xpointer( /bib/book[3])
• Pointing to a range: e.g. #xpointer(id(3652 to 44))

(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
 
Bhosari ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready For ...
Bhosari ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready For ...Bhosari ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready For ...
Bhosari ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready For ...
 
A Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna MunicipalityA Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna Municipality
 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
 
Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the start
 
notes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.pptnotes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.ppt
 
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
COST-EFFETIVE  and Energy Efficient BUILDINGS ptxCOST-EFFETIVE  and Energy Efficient BUILDINGS ptx
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
 
Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024
 
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
 
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
 
Integrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - NeometrixIntegrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - Neometrix
 
22-prompt engineering noted slide shown.pdf
22-prompt engineering noted slide shown.pdf22-prompt engineering noted slide shown.pdf
22-prompt engineering noted slide shown.pdf
 
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
 
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak HamilCara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
 
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPT
 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - V
 

xMLDataModel.pdf

  • 1. Web Data Management: XML Data Model
  • 2. Semi-structured Data Model
    • A data model, based on graphs, for representing both regular and irregular data.
    • Basic ideas
      – Self-describing data. The content comes with its own description; contrast with the relational model, where schema and content are represented separately.
      – Flexible typing. Data may be typed (e.g., "such nodes are integer values" or "this part of the graph complies with this description"); often there is no typing, or a very flexible one.
      – Serialized form. The graph representation is associated with a serialized form, convenient for exchanges in a heterogeneous environment.
  • 3. Self-describing data
    • Starting point: association lists, i.e., records of label-value pairs.
        {name: "Alan", tel: 2157786, email: "agb@abc.com"}
    • Natural extension: values may themselves be other structures:
        {name: {first: "Alan", last: "Black"},
         tel: 2157786,
         email: "agb@abc.com"}
    • Further extension: allow duplicate labels.
        {name: "Alan", tel: 2157786, tel: 2498762}
  • 4. Tree-based representation
    • Data can be graphically represented as trees: the label structure is captured by tree edges, and values reside at leaves.
  • 5. Tree-based representation: labels as nodes
    • Another choice is to represent both labels and values as vertices.
    • The XML data model adopts this latter representation.
  • 6. Representation of regular data
    • The syntax makes it easy to describe sets of tuples, as in:
        { person: {name: "alan", phone: 3127786, email: "alan@abc.com"},
          person: {name: "sara", phone: 2136877, email: "sara@xyz.edu"},
          person: {name: "fred", phone: 7786312, email: "fd@ac.uk"} }
      – relational data can be represented
      – for regular data, the semi-structured representation is highly redundant.
  • 7. Representation of irregular data
    • Many possible variations in the structure: missing values, duplicates, changes, etc.
        { person: {name: "alan", phone: 3127786, email: "agg@abc.com"},
          person: &314 { name: {first: "Sara", last: "Green"},
                         phone: 2136877,
                         email: "sara@math.xyz.edu",
                         spouse: &443 },
          person: &443 { name: "fred",
                         Phone: 7786312,
                         Height: 183,
                         spouse: &314 } }
      – Nodes can be identified, and referred to by their identity.
  • 8. XML represents Semi-structured Data
    • Do not care about the type of the data
    • Serialize the data by annotating each data item explicitly with its description (e.g., name, phone, etc.)
      – Such data is called self-describing
      – Serialization: convert data into a byte stream that can be easily transmitted and reconstructed at the receiver
      – Self-describing data wastes space but provides interoperability (required on the web)
    • Semi-structured data models: XML, JSON (JavaScript Object Notation)
  • 9. XML as Semi-structured Data Explained
    • Missing attributes:
        <person> <name>John</name> <phone>1234</phone> </person>
        <person> <name>Joe</name> </person>      (no phone!)
    • Repeated attributes:
        <person> <name>Mary</name> <phone>2345</phone> <phone>3456</phone> </person>
  • 10. XML as Semi-structured Data Explained
    • Attributes with different types in different objects:
        <person>
          <name> <first>John</first> <last>Smith</last> </name>      (complex name!)
          <phone>1234</phone>
        </person>
    • Nested collections (no 1NF)
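To make the point about missing and repeated attributes concrete, here is a minimal sketch of how an application might read such irregular records with the JDK's DOM parser. The wrapper element `people`, the class name `IrregularPersons`, and the helper `phonesByName` are assumptions made for this illustration; they do not come from the slides.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class IrregularPersons {
    // The person examples from the slides, wrapped in a hypothetical <people> root.
    static final String XML =
        "<people>"
      + "<person><name>John</name><phone>1234</phone></person>"
      + "<person><name>Joe</name></person>"
      + "<person><name>Mary</name><phone>2345</phone><phone>3456</phone></person>"
      + "</people>";

    /** Maps each person's name to the (possibly empty) list of phone numbers. */
    static Map<String, List<String>> phonesByName(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            Map<String, List<String>> result = new LinkedHashMap<>();
            NodeList persons = doc.getElementsByTagName("person");
            for (int i = 0; i < persons.getLength(); i++) {
                Element person = (Element) persons.item(i);
                String name = person.getElementsByTagName("name")
                    .item(0).getTextContent().trim();
                // Zero, one, or several <phone> children are all acceptable.
                List<String> phones = new ArrayList<>();
                NodeList phoneNodes = person.getElementsByTagName("phone");
                for (int j = 0; j < phoneNodes.getLength(); j++) {
                    phones.add(phoneNodes.item(j).getTextContent().trim());
                }
                result.put(name, phones);
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse the sample document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(phonesByName(XML));
    }
}
```

The code never assumes a fixed number of `phone` elements, which is the practical consequence of giving up first normal form.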
  • 11. XML in brief
    • XML is the World Wide Web Consortium (W3C) standard for Web data exchange.
      – XML documents can be serialized in a normalized encoding (typically ISO-8859-1 or UTF-8), and safely transmitted on the Internet.
      – XML is a generic format, which can be specialized in "dialects" for specific domains (e.g., XHTML).
      – The W3C promotes companion standards: DOM (object model), XML Schema (typing), XPath (path expressions), XSLT (restructuring), XQuery (query language), and many others.
    • XML is a simplified version of SGML, a long-established language for technical documents.
  • 12. XML documents
    • An XML document is a labeled, unranked, ordered tree:
      – Labeled means that some annotation, the label, is attached to each node.
      – Unranked means that there is no a priori bound on the number of children of a node.
      – Ordered means that there is an order between the children of each node.
    • XML specifies nothing more than a syntax: no meaning is attached to the labels.
    • A dialect, on the other hand, associates a meaning with labels (e.g., title in XHTML).
  • 13. XML documents are trees
    • Relation person:
        name   phone
        John   3634
        Sue    6343
        Dick   6363
    • XML (serialized representation):
        <person>
          <row> <name>John</name> <phone>3634</phone> </row>
          <row> <name>Sue</name> <phone>6343</phone> </row>
          <row> <name>Dick</name> <phone>6363</phone> </row>
        </person>
    • [Figure: tree rooted at person with three row children; each row has a name child and a phone child whose leaves hold the values.]
  • 14. XML and Semi-structured Data: Similarities and Differences
    • The same node in XML and in the semi-structured syntax:
        <person id="o123"> <name> Alan </name> <age> 42 </age> <email> ab@com </email> </person>
        { person: &o123 { name: "Alan", age: 42, email: "ab@com" } }
    • A reference to that node:
        <person father="o123"> … </person>
        { person: { father: &o123 … } }
    • [Figure: the person tree with name, age and email children, and a father edge pointing to it.]
    • Similar on trees, different on graphs.
  • 15. XML describes structured content
    • Applications cannot interpret unstructured content:
        The book "Foundations of Databases", written by Serge Abiteboul, Rick Hull and Victor Vianu, published in 1995 by Addison-Wesley
    • XML provides a means to structure this content:
        <bibliography>
          <book>
            <title> Foundations of Databases </title>
            <author> Abiteboul </author>
            <author> Hull </author>
            <author> Vianu </author>
            <publisher> Addison Wesley </publisher>
            <year> 1995 </year>
          </book>
          <book>...</book>
        </bibliography>
    • Now, an application can access the XML tree, extract some parts, rename the labels, reorganize the content into another structure, etc.
  • 16. Applications associate semantics with XML docs
    • Letter document
        <letter>
          <header>
            <author>...</author>
            <date>...</date>
            <recipient>...</recipient>
            <cc>...</cc>
          </header>
          <body>
            <text>...</text>
            <signature>...</signature>
          </body>
        </letter>
  • 17. Serialized and tree-based forms
    • The serialized form is a textual, linear representation of the tree; it complies with a (sometimes complicated) syntax.
    • Tree-based forms implement the abstract tree representation in a specific context, such as an object-oriented model (Document Object Model).
    • Typically, an application gets a document in serialized form, parses it into tree form, and serializes it back at the end.
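The parse, modify, serialize cycle described above can be sketched with the JDK's standard XML APIs. This is only an illustration under stated assumptions: the class name `RoundTrip`, the helper `addYear`, and the tiny `book` document are invented for the example, and an identity `Transformer` is just one common way to serialize a DOM tree.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class RoundTrip {
    /** Parses the serialized form, appends a year element, and serializes the tree again. */
    static String addYear(String xml, String year) {
        try {
            // 1. Serialized form -> tree form.
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

            // 2. Work on the tree.
            Element yearElt = doc.createElement("year");
            yearElt.setTextContent(year);
            doc.getDocumentElement().appendChild(yearElt);

            // 3. Tree form -> serialized form, using an identity transformation.
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(
            addYear("<book><title>Foundations of Databases</title></book>", "1995"));
    }
}
```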
  • 18. Serialized and tree-based forms: text and elements
    • The basic components of an XML document are elements and text.
    • Here is an element, whose content is a text:
        <elt_name> Textual content </elt_name>
    • The tree form of the document, modeled in DOM: each node has a type, here either Element or Text.
  • 19. Serialized and tree-based forms: nesting elements
    • Serialized form: the content of an element is the part between the opening and ending tags
        <elt1> Textual content <elt2> Another content </elt2> </elt1>
    • Tree-based form: the subtree rooted at the corresponding Element node (in DOM)
  • 20. Serialized and tree-based forms: attributes
    • Serialized form: attributes are name/value pairs attached to an element
        <elt1 att1='12' att2='fr'> Textual content </elt1>
    • The content of an attribute is always atomic text (no nesting)
    • Attributes are not ordered, and there cannot be two attributes with the same name in an element
    • Tree-based form: attributes are special nodes attached to the Element node (in DOM they are reached through the element's attribute map rather than its list of children)
  • 21. Serialized and tree-based forms: the document root
    • A document with its prologue and root element:
        <?xml version="1.0" encoding="utf-8" ?>
        <elt> Document content. </elt>
    • The first line of the serialized form must always be the prologue, if there is one:
        <?xml version="1.0" encoding="utf-8"?>
    • Document content must always be enclosed in a single opening/ending tag pair, called the element root.
    • In the DOM representation, the prologue appears as a Document node, called the root node.
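The node kinds introduced over the last few slides can be inspected directly in a DOM tree. The sketch below combines the slides' `elt1`/`elt2` example with a prologue; the class name `NodeKinds` and the helper `parse` are assumptions made for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class NodeKinds {
    // Prologue, root element with two attributes, text content, and a nested element.
    static final String XML =
        "<?xml version=\"1.0\" encoding=\"utf-8\"?>"
      + "<elt1 att1='12' att2='fr'>Textual content<elt2>Another content</elt2></elt1>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse the sample document", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        Element root = doc.getDocumentElement();

        // The root node of the tree is a Document node; the element root is its child.
        System.out.println(doc.getNodeType() == Node.DOCUMENT_NODE);
        System.out.println(root.getNodeName());

        // The children of elt1 are a Text node followed by the Element elt2.
        System.out.println(root.getFirstChild().getNodeType() == Node.TEXT_NODE);
        System.out.println(root.getChildNodes().getLength());

        // Attributes are reached through the attribute map, not the child list.
        System.out.println(root.getAttributes().getLength());
        System.out.println(root.getAttribute("att2"));
    }
}
```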
  • 22. Web Data Management with XML
    • Publishing
      – an XML document can easily be converted to another XML document (same content, but another structure)
      – Web publishing is the process of transforming XML documents to XHTML.
    • Integration
      – XML documents from many sources can be transformed into a common dialect, and constitute a collection.
      – Search engines, or portals, provide browsing and searching services on collections of XML documents.
    • Distributed Data Processing
      – many software systems can be adapted to consume/produce XML-represented data.
      – Web services provide remote services for XML data processing.
  • 23. Web Publishing: restructuring to XHTML
    • The Web application produces some XML content, structured in some application-dependent dialect, on the server.
    • In a second phase, the XML content is transformed into an XHTML document that can be visualized by humans.
    • The transformation is typically expressed in XSLT, and can be processed either on the server or on the client.
  • 24. Web publishing: content + presentation instructions
    • XML content
        <bibliography>
          <book>
            <title> Foundations of Databases </title>
            <author> Abiteboul </author>
            <author> Hull </author>
            <author> Vianu </author>
            <publisher> Addison Wesley </publisher>
            <year> 1995 </year>
          </book>
          <book>...</book>
        </bibliography>
    • Document in XHTML
        <h1> Bibliography </h1>
        <p> <i> Foundations of Databases </i> Abiteboul, Hull, Vianu <br/> Addison Wesley, 1995 </p>
        <p> <i> Data on the Web </i> Abiteboul, Buneman, Suciu <br/> Morgan Kaufmann, 1999 </p>
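As a rough sketch of the restructuring step, the following program applies an XSLT stylesheet to the bibliography content using the JDK's `javax.xml.transform` API. The stylesheet itself, the wrapping `div` element, and the names `PublishBibliography` and `publish` are assumptions made for this example; the slides do not show the stylesheet actually used.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class PublishBibliography {
    // XML content in the application's own dialect (first book of the slide).
    static final String CONTENT =
        "<bibliography><book>"
      + "<title> Foundations of Databases </title>"
      + "<author> Abiteboul </author><author> Hull </author><author> Vianu </author>"
      + "<publisher> Addison Wesley </publisher><year> 1995 </year>"
      + "</book></bibliography>";

    // Hypothetical stylesheet: one rule for the bibliography, one rule per book.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='bibliography'>"
      + "<div><h1>Bibliography</h1><xsl:apply-templates select='book'/></div>"
      + "</xsl:template>"
      + "<xsl:template match='book'>"
      + "<p><i><xsl:value-of select='normalize-space(title)'/></i><xsl:text> </xsl:text>"
      + "<xsl:for-each select='author'>"
      + "<xsl:if test='position() != 1'><xsl:text>, </xsl:text></xsl:if>"
      + "<xsl:value-of select='normalize-space(.)'/>"
      + "</xsl:for-each><br/>"
      + "<xsl:value-of select='normalize-space(publisher)'/><xsl:text>, </xsl:text>"
      + "<xsl:value-of select='normalize-space(year)'/></p>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Applies the stylesheet to the XML content and returns the result as text. */
    static String publish(String xml, String xslt) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(publish(CONTENT, STYLESHEET));
    }
}
```

Each `xsl:template` pairs a pattern (the `match` attribute) with the fragment to produce, so changing the presentation only requires a different stylesheet, not a different program.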
  • 25. Web publishing
    • The same content may be published using different means:
      – Web publishing: XML → XHTML
      – WAP (Wireless Application Protocol): XML → WML
  • 26. Web publishing
    • Data obtained from a relational database and from XML files
    • XSLT restructures the XML data
    • Produces XHTML pages for a browser
  • 27. Web Integration: gluing together heterogeneous sources
    • The portal receives (possibly continuously) XML-structured content, each source using its own dialect.
    • Each feed provides some content, extracted with XSLT or XQuery, or any convenient XML processing tool (e.g., SAX).
  • 29. Distributed Data Management with XML
    • XML encoding is used to exchange information between applications.
    • A specific dialect, WSDL, is used to describe Web Services Interfaces.
  • 31. XML dialects
    • Dialects define specialized structures, constraints and vocabularies to construct ad hoc XML contents that can be used and exchanged in a specific application area
    • RSS is an XML dialect for describing content updates that is heavily used for blog entries, news headlines or podcasts.
    • WML (Wireless Mark-up Language) is used in Web sites by wireless applications based on the Wireless Application Protocol (WAP).
    • MathML (Mathematical Mark-up Language) is an XML dialect for describing mathematical notation and capturing both its structure and content.
  • 32. XML dialects
    • XLink (XML Linking Language) is an XML dialect for defining hyperlinks between XML documents. These links are expressed in XML and may be introduced inside XML documents.
    • SVG (Scalable Vector Graphics) is an XML dialect for describing two-dimensional vector graphics, both static and animated. With SVG, images may contain outbound hyperlinks expressed in XLink.
  • 33. XML dialects – SVG example
        <?xml version="1.0" encoding="UTF-8" ?>
        <svg xmlns="http://www.w3.org/2000/svg">
          <polygon points="0,0 50,0 25,50" style="stroke:#660000; fill:#cc3333;"/>
          <text x="20" y="40">Some SVG text</text>
        </svg>
    • [Figure: the rendered triangle with the caption "Some SVG text".]
  • 34. XML standards
    • SAX (Simple API for XML) sees an XML document as a sequence of tokens (its serialization).
    • DOM (Document Object Model) is an object model for representing (HTML and) XML documents independently of the programming language.
    • XPath (XML Path Language), which we will study, is a language for addressing portions of an XML document.
  • 35. XML standards
    • XQuery, which we will study, is a flexible query language for extracting information from collections of XML documents.
    • XSLT (Extensible Stylesheet Language Transformations), which we will study, is a language for specifying the transformation of XML documents into other XML documents.
    • Web services provide interoperability between machines based on Web protocols.
  • 36. Processing an XML Document with SAX and DOM
    • A SAX parser transforms an XML document into a flow of events.
      – Examples of events: the start/end of a document, the start/end of an element, a text token, a comment, etc.
    • Example: load data in XML format into a relational database
        1. when document start is received, connect to the database;
        2. when a Movie open tag is received, create a new Movie record;
           (a) when a text node is received, assign its content to X;
           (b) when a Title close tag is received, assign X to Movie.Title;
           (c) when a Year close tag is received, assign X to Movie.Year, etc.
        3. when a Movie close tag is received, insert the Movie record in the database (and commit the transaction);
        4. when document end is received, close the database connection.
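The steps above can be sketched as a small SAX handler. To keep the example self-contained, an in-memory list stands in for the database table, so the connect, insert, and commit steps are only simulated. The class name `MovieLoader`, the `Movies` wrapper element, and the two sample records are made up for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class MovieLoader extends DefaultHandler {
    // Made-up sample data in a hypothetical Movies/Movie/Title/Year layout.
    static final String SAMPLE =
        "<Movies>"
      + "<Movie><Title>Vertigo</Title><Year>1958</Year></Movie>"
      + "<Movie><Title>Alien</Title><Year>1979</Year></Movie>"
      + "</Movies>";

    // Stands in for the database table; a real loader would issue INSERT statements.
    final List<Map<String, String>> table = new ArrayList<>();
    private Map<String, String> movie;                    // record being built
    private final StringBuilder x = new StringBuilder();  // the variable X of the steps

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        x.setLength(0);
        if (qName.equals("Movie")) {
            movie = new LinkedHashMap<>();                // step 2: new record
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // A text node may arrive in several chunks, so accumulate them.
        x.append(ch, start, length);                      // step 2(a)
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (movie != null) {
            if (qName.equals("Title")) {
                movie.put("Title", x.toString().trim());  // step 2(b)
            } else if (qName.equals("Year")) {
                movie.put("Year", x.toString().trim());   // step 2(c)
            } else if (qName.equals("Movie")) {
                table.add(movie);                         // step 3: "insert"
                movie = null;
            }
        }
        x.setLength(0);
    }

    /** Runs the handler over the document and returns the collected records. */
    static List<Map<String, String>> load(String xml) {
        try {
            MovieLoader handler = new MovieLoader();
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
            return handler.table;
        } catch (Exception e) {
            throw new IllegalStateException("could not load the movies", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(load(SAMPLE));
    }
}
```

Only the record currently being built is held in memory, which is why this style suits documents that need to be examined once from start to end.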
  • 37. SAX
    • SAX is a good choice when the content of a document needs to be examined once
    • SAX handler written in Java
      – It features methods that handle SAX events: opening and closing tags; character data
      – See next slide
  • 38. SAX handler written in Java
        import org.xml.sax.Attributes;
        import org.xml.sax.SAXException;
        import org.xml.sax.helpers.DefaultHandler;

        /** Prints every SAX event it receives. DefaultHandler supplies empty
            implementations of the ContentHandler callbacks not overridden here. */
        public class SaxHandler extends DefaultHandler {

            /** Handlers for the beginning and end of the document */
            @Override
            public void startDocument() throws SAXException {
                System.out.println("Start the parsing of document");
            }

            @Override
            public void endDocument() throws SAXException {
                System.out.println("End the parsing of document");
            }

            /** Opening tag handler. localName is only filled in by a
                namespace-aware parser; rawName is the qualified name. */
            @Override
            public void startElement(String nameSpaceURI, String localName,
                                     String rawName, Attributes attributes) throws SAXException {
                System.out.println("Opening tag: " + localName);
                // Show the attributes, if any
                if (attributes.getLength() > 0) {
                    System.out.println(" Attributes: ");
                    for (int index = 0; index < attributes.getLength(); index++) {
                        System.out.println(" - " + attributes.getLocalName(index)
                                           + " = " + attributes.getValue(index));
                    }
                }
            }

            /** Closing tag handler */
            @Override
            public void endElement(String nameSpaceURI, String localName,
                                   String rawName) throws SAXException {
                System.out.println("Closing tag : " + localName);
            }

            /** Character data handling; the third argument is a length, not an end offset */
            @Override
            public void characters(char[] ch, int start, int length) throws SAXException {
                System.out.println("#PCDATA: " + new String(ch, start, length));
            }
        }
  • 39. DOM (Document Object Model)
    • A DOM parser transforms an XML document into a tree and offers an object API for that tree.
    • A partial view of the class hierarchy of DOM
  • 40. Parsing XML Documents
    • A parser checks XML documents for well-formedness or validates them against a schema.
    • So, what is parsing? In computer terminology: "to analyze or separate (input, for example) into more easily processed components".
    • XML parsers load XML documents and provide access to their contents in the form of objects.
  • 41. Parsing XML Documents
    • A Validating Parser: can use a DTD or schema to verify that a document is properly constructed
    • A Non-Validating Parser: only requires the document to be well formed; many free parsers on the Web are non-validating
    • Stream-Based Parsers: read through the document and signal the application every time a new component appears
    • Tree-Based Parsers: read the entire document and give the application a tree structure corresponding to the element structure of the document
  • 42. Parsing XML Documents
    • DOM API: language- and platform-independent interfaces for accessing and manipulating information in XML documents
    • The document is checked to see if it is well formed and valid
    • The parser then converts the information into a tree of nodes
    • A tree starts at one root node; in DOM terms, called a document object instance
    • You can modify, delete and create leaves and branches on the tree using the interfaces in the API
  • 43. Parsing XML Documents
    • Titles.xml; a sample XML document
        <?xml version="1.0" encoding="UTF-8"?>
        <BookList>
          <Book>
            <book_id>BU1111</book_id>
            <title>Cooking with Computers: Surreptitious Balance Sheets</title>
            <type>business</type>
            <pub_id>1389</pub_id>
            <price>11.95</price>
            <advance>5000</advance>
            <royalty>10</royalty>
            <ytd_sales>3876</ytd_sales>
            <notes>Helpful hints on how to use your electronic...</notes>
            <pubdate>1991-06-09T05:00:00</pubdate>
          </Book>
          <Book>
            <book_id>BU7832</book_id>
            <title>Straight Talk About Computers</title>
            <type>business</type>
            <pub_id>1389</pub_id>
            <price>19.99</price>
            <advance>5000</advance>
            <royalty>10</royalty>
            <ytd_sales>4095</ytd_sales>
            <notes>Annotated analysis of what computers can do for you</notes>
            <pubdate>1991-06-22T05:00:00</pubdate>
          </Book>
        </BookList>
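A minimal sketch of tree-based processing over a document shaped like Titles.xml follows: it reads values from the tree and then deletes a branch. To stay short, the embedded sample keeps only the `book_id` and `ytd_sales` fields of each book; the class name `TitlesDom` and the helpers `totalSales` and `removeBook` are assumptions made for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class TitlesDom {
    // Abridged version of Titles.xml: two books, two fields each.
    static final String TITLES =
        "<BookList>"
      + "<Book><book_id>BU1111</book_id><ytd_sales>3876</ytd_sales></Book>"
      + "<Book><book_id>BU7832</book_id><ytd_sales>4095</ytd_sales></Book>"
      + "</BookList>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse the sample document", e);
        }
    }

    /** Random access: visit every ytd_sales element, wherever it sits in the tree. */
    static int totalSales(Document doc) {
        int total = 0;
        NodeList sales = doc.getElementsByTagName("ytd_sales");
        for (int i = 0; i < sales.getLength(); i++) {
            total += Integer.parseInt(sales.item(i).getTextContent().trim());
        }
        return total;
    }

    /** Deletes the Book branch whose book_id matches; returns false if none does. */
    static boolean removeBook(Document doc, String bookId) {
        NodeList ids = doc.getElementsByTagName("book_id");
        for (int i = 0; i < ids.getLength(); i++) {
            Node id = ids.item(i);
            if (id.getTextContent().trim().equals(bookId)) {
                Node book = id.getParentNode();
                book.getParentNode().removeChild(book);
                return true;   // stop at once: the NodeList is live and has just changed
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Document doc = parse(TITLES);
        System.out.println(totalSales(doc));
        removeBook(doc, "BU1111");
        System.out.println(totalSales(doc));
    }
}
```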
  • 45. DOM
    • DOM provides interfaces in its hierarchy of Node objects
    • Node: an XML Document object created after a DOM parser reads an XML file
    • Inheritance relationship between important interfaces
  • 46. DOM A sample document object tree
  • 47. Methods of Node Object
      Method: Description
      hasChildNodes(): finds out if a Node has children; takes no parameters and returns a Boolean
      getNodeType(): returns the type of a particular Node. The type is a constant integer used to identify different types of Nodes
      appendChild(): adds a new child object, which is passed to the method, to the current Node
      cloneNode(): returns a duplicate of the Node
      hasAttributes(): returns a Boolean true if the Node has any attributes. This method was added in DOM Level 2
      insertBefore(): takes a new child Node and a reference child Node and inserts the new child Node before the reference Node
      isSupported(): tests whether or not this implementation of the DOM supports a specific feature. This method was added in DOM Level 2 and takes a version number and a feature as parameters
      normalize(): merges adjacent Text nodes throughout the full depth of the sub-tree underneath this Node
      removeChild(): removes the specified child
      replaceChild(): replaces the specified child with the new child passed
  • 48. Methods of Element Object
      Method: Description
      getAttribute(): retrieves the specified attribute.
      getAttributeNS(): retrieves the specified attribute by local name and namespace. This method was added in Level 2.
      getAttributeNode(): retrieves an Attr node by name.
      getAttributeNodeNS(): retrieves an Attr node by local name and namespace. This method was added in Level 2.
      getElementsByTagName(): returns a NodeList of all descendant elements of a given tag name in the order in which they are encountered.
      getElementsByTagNameNS(): returns a NodeList of all descendant elements of a given tag by local name and namespace in the order in which they are encountered. This method was added in Level 2.
      hasAttribute(): returns a Boolean true if the specified attribute is present. Returns Boolean false otherwise.
      hasAttributeNS(): returns a Boolean true if the specified attribute, by local name and namespace, is present. Returns Boolean false otherwise. This method was added in Level 2.
      removeAttribute(): removes the specified attribute.
      removeAttributeNS(): removes the attribute specified by local name and namespace. This method was added in Level 2.
  • 49. Methods of Element Object
      Method: Description
      removeAttributeNode(): removes the specified Attr node.
      setAttribute(): adds a new attribute. If an attribute of the same name exists, its value is changed to the specified value.
      setAttributeNS(): adds a new attribute. If an attribute of the same local name and namespace exists, its value is changed to the specified value. This method was added in Level 2.
      setAttributeNode(): adds a new Attr node. If an Attr node of the same name exists, it is replaced by the new one.
      setAttributeNodeNS(): adds a new Attr node. If an Attr node of the same local name and namespace exists, it is replaced by the new one. This method was added in Level 2.
  • 50. DOM
    • Node Interface: NodeList is an iterator for a list of Nodes
    • [Figure: a DOM NodeList object]
    • NodeList offers the method item(), which returns the Node at the indexed position passed to the method, together with the length of the list (getLength() in the Java binding)
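A few of the methods listed in the tables above can be tried on a tree built from scratch. The sketch below uses the Java binding of DOM shipped with the JDK; the class name `ElementMethods`, the helper `buildBook`, and the `lang` attribute are assumptions made for this illustration.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class ElementMethods {
    /** Builds a small Book element using Node and Element methods only. */
    static Element buildBook() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .newDocument();
            Element book = doc.createElement("Book");
            doc.appendChild(book);

            book.setAttribute("type", "business");
            book.setAttribute("type", "computing"); // same name: the value is replaced
            book.setAttribute("lang", "en");        // hypothetical attribute
            book.removeAttribute("lang");

            Element title = doc.createElement("title");
            title.setTextContent("Straight Talk About Computers");
            book.appendChild(title);
            return book;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no DOM implementation available", e);
        }
    }

    public static void main(String[] args) {
        Element book = buildBook();
        System.out.println(book.getAttribute("type"));
        System.out.println(book.hasAttribute("lang"));
        System.out.println(book.hasChildNodes());

        // A NodeList is walked by index, from 0 to getLength() - 1.
        NodeList children = book.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            System.out.println(children.item(i).getNodeName());
        }
    }
}
```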
  • 51. SAX
    • SAX Parser: streams in documents according to specific events
    • A SAX parser doesn't have a default object model
    • A SAX parser reads an XML document and fires events based on the following:
      – opening or start of elements
      – closing or end of elements
      – #PCDATA and CDATA sections
      – processing instructions, comments and entity declarations
  • 52. DOM vs SAX
    • The following are DOM benefits you should focus on:
      – it allows random access to the document
      – complex searches can be easily implemented
      – the DTD or schema is available
    • The following list contains some of the most useful benefits of SAX:
      – it can parse files of any size
      – it is a fast processing method
      – you can build your own data structure
      – you can access only a small subset of information if desired
  • 53. Example Parsers
    • MSXML: first parser which can perform both SAX and DOM-based parsing, from Microsoft
    • Xerces: from the Apache Group at http://xml.apache.org
    • IBM's XML for Java: http://www.alphaworks.ibm.com/formula/xml
    • Microstar's Ælfred: http://home.pacbell.net/david-b/xml/
    • Sun's Java API for XML: http://java.sun.com/products/xml
    • Oracle's XML Parser for Java: http://technet.oracle.com/
  • 54. XPath
    • Language for expressing paths in an XML document
      – Navigation: child, descendant, parent, ancestor
      – Tests on the nature of the node
      – More complex selection predicates
    • A means to specify portions of a document
    • Basic tool for other XML languages: XLink, XSLT, XQuery
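To illustrate the three kinds of path features listed above, here is a small sketch using the `javax.xml.xpath` API of the JDK on the two books of the bibliography example. The class name `XPathDemo`, the helper `select`, and the three expressions are assumptions chosen for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XPathDemo {
    // The two books shown on the Web publishing slide.
    static final String BIB =
        "<bibliography>"
      + "<book><title>Foundations of Databases</title>"
      + "<author>Abiteboul</author><author>Hull</author><author>Vianu</author>"
      + "<year>1995</year></book>"
      + "<book><title>Data on the Web</title>"
      + "<author>Abiteboul</author><author>Buneman</author><author>Suciu</author>"
      + "<year>1999</year></book>"
      + "</bibliography>";

    /** Evaluates the expression and returns the text of each selected node. */
    static List<String> select(String xml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                expression, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent().trim());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Child navigation from the root.
        System.out.println(select(BIB, "/bibliography/book/title"));
        // Descendant navigation with a selection predicate.
        System.out.println(select(BIB, "//book[year > 1995]/title"));
        // Parent navigation: from an author back up to the book's title.
        System.out.println(select(BIB, "//author[. = 'Hull']/../title"));
    }
}
```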
  • 55. XQuery
    • Query language:
      – Like SQL: select portions of the data and reconstruct a result
      – Query structure: FLWR (pronounced "flower"), after the for, let, where and return clauses
          for $p in document("bib.xml")//publisher
          let $b := document("bib.xml")//book[publisher = $p]
          where count($b) > 100
          return $p
      – $p: scans the sequence of publishers
      – $b: scans the sequence of books for a publisher
      – WHERE filters out some publishers
      – RETURN constructs the result
  • 56. XSLT
    • Transformation language:
      – An XSLT style sheet includes a set of transformation rules: pattern/template
      – Pattern: based on XPath expressions; it specifies a structural context in the tree
      – Template: specifies what should be produced
      – Principle: when a pattern is matched in the source document, the corresponding template produces some data
  • 57. XLink
    • XML Linking Language
    • Advanced hypertext primitives
    • Allows inserting in XML documents descriptions of links to external Web resources
    • Simple mono-directional links (HREF)
    • Multidirectional links
    • XLink relies on XPath for addressing portions of XML documents
  • 58. XLink
    • XLink links resources, which include documents, audio, video, database data, etc.
    • Many types: simple, extended, locator, arc, resource, or title
        <person xmlns:xlink="http://www.w3.org/1999/xlink"
                xlink:type="simple"
                xlink:href="http://a.b.c/myhomepage.html"
                xlink:role="http://www.example.com/studentlist"
                xlink:title="The Homepage"
                xlink:show="replace"
                xlink:actuate="onRequest">
          .....
        </person>
    • [On the slide, callouts mark which of these attributes are required and which are optional.]
  • 59. XLink
    • The linking element (i.e., person) that references document2 is called a local resource.
    • The resource that is referenced (pointed to) is called the remote resource.
    • Arc: how to traverse a pair of resources, including the direction of traversal and possibly application behavior information as well.
    • Outbound arc: an arc that has a local starting resource and a remote ending resource
    • Inbound arc: an arc whose ending resource is local but whose starting resource is remote
    • Third-party arc: an arc for which neither the starting resource nor the ending resource is local
  • 60. XLink
    • The use of XLink elements and attributes requires declaration of the XLink namespace: <person xmlns:xlink="http://www.w3.org/1999/xlink" …>
    • The href attribute defines the remote resource's URI (uniform resource identifier)
    • The role attribute is a URI that references a resource that describes the link
    • The title attribute is a descriptive title for the link.
  • 61. XLink
    • The show attribute specifies how to display a resource when it is loaded and can be
      – "new", to display in a new window
      – "replace", to replace the current resource with the linked resource
      – "embed", to replace the current element with the linked resource
      – "other", XLink-aware applications can decide how to display
    • The actuate attribute specifies when the resource should be retrieved and can be
      – "onLoad", retrieve as soon as it is loaded
      – "onRequest", retrieve the resource by clicking on the link
      – "other", XLink-aware applications can decide when to load
      – "none", no information on when to load the resource
  • 62. XLink
    • Extended Links
      – for linking multiple combinations of local and remote resources.
      – The figure shows two unidirectional links
      – With XLink, we can create multidirectional links for traversing between resources
  • 63. XLink
    • Multidirectional links are not limited to just two resources, but can link any number of resources
    • The links need not be traversed sequentially.
  • 64. XLink: extended link example (booklinks.xml)
    • Callouts on the slide:
      – books: root element
      – A link to the book's authors, with information located at /authors/deitel.xml
      – A label's value is used to link one resource to another
      – Type locator specifies a remote resource
      – When either Deitel, Harvey or Deitel, Paul is selected, the document deitel.xml will be retrieved
        <?xml version = "1.0"?>

        <!-- Fig. 14.8 : booklinks.xml -->
        <!-- XML document containing extended links -->

        <books xmlns:xlink = "http://www.w3.org/1999/xlink"
               xlink:type = "extended"
               xlink:title = "Book Inventory">

          <author xlink:label = "authorDeitel"
                  xlink:type = "locator"
                  xlink:href = "/authors/deitel.xml"
                  xlink:role = "http://deitel.com/xlink/author"
                  xlink:title = "Deitel &amp; Associates, Inc.">
            <persons id = "authors">
              <person>Deitel, Harvey</person>
              <person>Deitel, Paul</person>
            </persons>
          </author>

          <publisher xlink:label = "publisherPrenticeHall"
                     xlink:type = "locator"
                     xlink:href = "/publisher/prenticehall.xml"
                     xlink:role = "http://deitel.com/xlink/publisher"
                     xlink:title = "Prentice Hall"/>

          <warehouse xlink:label = "warehouseXYZ"
                     xlink:type = "locator"
                     xlink:href = "/warehouse/xyz.xml"
                     xlink:role = "http://deitel.com/xlink/warehouse"
                     xlink:title = "X.Y.Z. Books"/>
  • 65. XLink: extended link example (booklinks.xml, continued)
    • book is a local resource that can be linked to (or from) an author or publisher.
    • The first arc creates an outbound arc between the book local resource and the author remote resource; its arcrole provides information about the book's author.
    • The second arc creates an outbound arc between the book local resource and the publisher remote resource.
    • The third arc is an inbound arc: its starting resource is remote (warehouseXYZ) and its ending resource is local (JavaBook).

       <book xlink:label = "JavaBook"
          xlink:type = "resource"
          xlink:role = "http://deitel.com/xlink/author"
          xlink:title = "Textbook on Java">
          Java How to Program: Third edition
       </book>

       <arcElement xlink:type = "arc"
          xlink:from = "JavaBook"
          xlink:arcrole = "http://deitel.com/xlink/info"
          xlink:to = "authorDeitel"
          xlink:show = "new"
          xlink:actuate = "onRequest"
          xlink:title = "About the author"/>

       <arcElement xlink:type = "arc"
          xlink:from = "JavaBook"
          xlink:arcrole = "http://deitel.com/xlink/info"
          xlink:to = "publisherPrenticeHall"
          xlink:show = "new"
          xlink:actuate = "onRequest"
          xlink:title = "About the publisher"/>

       <arcElement xlink:type = "arc"
          xlink:from = "warehouseXYZ"
          xlink:arcrole = "http://deitel.com/xlink/info"
          xlink:to = "JavaBook"
          xlink:show = "new"
          xlink:actuate = "onRequest"
          xlink:title = "Information about this book"/>
  • 66. XLink: extended link example (booklinks.xml, continued)

       <arcElement xlink:type = "arc"
          xlink:from = "publisherPrenticeHall"
          xlink:arcrole = "http://deitel.com/xlink/stock"
          xlink:to = "warehouseXYZ"
          xlink:show = "embed"
          xlink:actuate = "onLoad"
          xlink:title = "Publisher&apos;s inventory"/>

    </books>

    • This is a third-party arc: its starting and ending resources are both remote.
    • Attribute show has value embed, which indicates that the ending resource should replace the starting resource when the link is traversed.
    • Attribute actuate has value onLoad, so the link is traversed as soon as the XML document is loaded.
    • Because the relationship between the publisher and the warehouse differs from that of the previous three arcs, this link is given a different arcrole value.
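The outbound, inbound, and third-party distinction in booklinks.xml follows from whether each end of an arc is a local resource (type resource) or a remote one (type locator). A minimal sketch of that classification, assuming a trimmed copy of the document (titles, roles, and display attributes omitted) and one participant per label; the function name `classify_arcs` and the "local-to-local" tag are illustrative, not part of XLink:

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"

# Trimmed copy of booklinks.xml: only the attributes needed here.
BOOKS_XML = """<books xmlns:xlink="http://www.w3.org/1999/xlink" xlink:type="extended">
  <author xlink:label="authorDeitel" xlink:type="locator" xlink:href="/authors/deitel.xml"/>
  <publisher xlink:label="publisherPrenticeHall" xlink:type="locator" xlink:href="/publisher/prenticehall.xml"/>
  <warehouse xlink:label="warehouseXYZ" xlink:type="locator" xlink:href="/warehouse/xyz.xml"/>
  <book xlink:label="JavaBook" xlink:type="resource">Java How to Program: Third edition</book>
  <arcElement xlink:type="arc" xlink:from="JavaBook" xlink:to="authorDeitel"/>
  <arcElement xlink:type="arc" xlink:from="JavaBook" xlink:to="publisherPrenticeHall"/>
  <arcElement xlink:type="arc" xlink:from="warehouseXYZ" xlink:to="JavaBook"/>
  <arcElement xlink:type="arc" xlink:from="publisherPrenticeHall" xlink:to="warehouseXYZ"/>
</books>"""


def xl(name):
    """Build the ElementTree key for an attribute in the XLink namespace."""
    return f"{{{XLINK_NS}}}{name}"


def classify_arcs(root):
    """Return (from, to, kind) for every arc child of an extended link."""
    # Map each label to "resource" (local) or "locator" (remote).
    # Assumption: each label names exactly one participant.
    kinds = {}
    for child in root:
        label = child.get(xl("label"))
        if label is not None:
            kinds[label] = child.get(xl("type"))

    arcs = []
    for child in root:
        if child.get(xl("type")) != "arc":
            continue
        src, dst = child.get(xl("from")), child.get(xl("to"))
        src_local = kinds[src] == "resource"
        dst_local = kinds[dst] == "resource"
        if src_local and not dst_local:
            kind = "outbound"
        elif not src_local and dst_local:
            kind = "inbound"
        elif not src_local and not dst_local:
            kind = "third-party"
        else:
            kind = "local-to-local"  # illustrative tag for the remaining case
        arcs.append((src, dst, kind))
    return arcs


if __name__ == "__main__":
    for src, dst, kind in classify_arcs(ET.fromstring(BOOKS_XML)):
        print(f"{src} -> {dst}: {kind}")
```

Applied to the four arcs of the example, this labels the two arcs leaving JavaBook as outbound, the arc from warehouseXYZ as inbound, and the publisher-to-warehouse arc as third-party, matching the annotations on the slides.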
  • 67. XPointer
    • An extension of XPath
    • Usage:
      – href=www.a.b.c/document.xml#xpointerExpr
    • XPointer can link to
      – specific locations (i.e., nodes in an XPath tree), or even
      – ranges of locations in an XML document.
    • XPointer also adds the ability to search XML documents by using string matching.
  • 68. XPointer
    • Pointing to a point (= an XML element or character)
    • Full XPointer form:

        xlink:href = "/contacts.xml#xpointer(//contact[@id = 'author02'])"

    • The name xpointer is called a scheme.
    • The expression above addresses a contact element in the following document:

    <?xml version = "1.0"?>
    <!-- Fig. 14.14 : contacts.xml -->
    <!-- contact list document -->

    <contacts>
       <contact id = "author01">Deitel, Harvey</contact>
       <contact id = "author02">Deitel, Paul</contact>
       <contact id = "author03">Nieto, Tem</contact>
    </contacts>
  • 69. XPointer
    • If a document's unique identifier is referenced in an expression such as

        xlink:href = "/contacts.xml#xpointer(id('author01'))"

      a bare-name XPointer address can be used as a simplified expression:

        xlink:href = "/contacts.xml#author01"

    • Child sequence: e.g. #xpointer( /1/3/2/5), #xpointer( /bib/book[3])
    • Pointing to a range: e.g. #xpointer(id(3652 to 44))
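To make the two addressing forms concrete, the sketch below resolves a full `xpointer(...)` fragment and a bare-name fragment against contacts.xml. It is only an approximation: Python's ElementTree supports a small XPath subset, so the sketch accepts only expressions starting with `//`, does not implement `id()`, child sequences, or ranges, and treats any attribute named `id` as the unique identifier (a real processor would rely on the DTD or schema for that). The function name `resolve_fragment` is hypothetical.

```python
import xml.etree.ElementTree as ET

# contacts.xml from the slide, without the XML declaration and comments.
CONTACTS_XML = """<contacts>
   <contact id="author01">Deitel, Harvey</contact>
   <contact id="author02">Deitel, Paul</contact>
   <contact id="author03">Nieto, Tem</contact>
</contacts>"""


def resolve_fragment(root, href):
    """Return the elements addressed by the fragment part of href."""
    _, _, fragment = href.partition("#")

    # Full form: xpointer(<expression>)
    if fragment.startswith("xpointer(") and fragment.endswith(")"):
        expr = fragment[len("xpointer("):-1]
        if not expr.startswith("//"):
            raise ValueError("this sketch only handles //-style expressions")
        # ElementTree paths must be relative, so anchor at the root element.
        return root.findall("." + expr)

    # Bare-name form: match on the id attribute (assumed to be the ID).
    return [el for el in root.iter() if el.get("id") == fragment]


if __name__ == "__main__":
    root = ET.fromstring(CONTACTS_XML)
    for href in (
        "/contacts.xml#xpointer(//contact[@id='author02'])",
        "/contacts.xml#author01",
    ):
        print(href, "->", [el.text for el in resolve_fragment(root, href)])
```

The predicate is written without spaces around `=` because older ElementTree versions are stricter about whitespace inside predicates than XPath itself.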