SlideShare a Scribd company logo
1 of 64
April 29th, 2003 Organizing and Searching Information with XML 1
What is XML
• XML stands for eXtensible Markup Language.
• A markup language is used to provide information about a
document.
• Tags are added to the document to provide the extra information.
• HTML tags tell a browser how to display the document.
• XML tags give a reader some idea what some of the data means.
April 29th, 2003 Organizing and Searching Information with XML 2
What is XML Used For?
• XML documents are used to transfer data from one place to another often over
the Internet.
• XML subsets are designed for particular applications.
• One is RSS (Rich Site Summary or Really Simple Syndication ). It is used to
send breaking news bulletins from one web site to another.
• A number of fields have their own subsets. These include chemistry,
mathematics, and books publishing.
• Most of these subsets are registered with the W3Consortium and are available
for anyone’s use.
April 29th, 2003 Organizing and Searching Information with XML 3
Advantages of XML
• XML is text (Unicode) based.
– Takes up less space.
– Can be transmitted efficiently.
• One XML document can be displayed differently in different
media.
– Html, video, CD, DVD,
– You only have to change the XML document in order to change all the rest.
• XML documents can be modularized. Parts can be reused.
April 29th, 2003 Organizing and Searching Information with XML 4
Example of an HTML Document
<html>
<head><title>Example</title></head.
<body>
<h1>This is an example of a page.</h1>
<h2>Some information goes here.</h2>
</body>
</html>
April 29th, 2003 Organizing and Searching Information with XML 5
Example of an XML Document
<?xml version=“1.0”/>
<address>
<name>Alice Lee</name>
<email>alee@aol.com</email>
<phone>212-346-1234</phone>
<birthday>1985-03-22</birthday>
</address>
April 29th, 2003 Organizing and Searching Information with XML 6
Difference Between HTML and XML
• HTML tags have a fixed meaning and browsers know
what it is.
• XML tags are different for different applications, and
users know what they mean.
• HTML tags are used for display.
• XML tags are used to describe documents and data.
April 29th, 2003 Organizing and Searching Information with XML 7
XML Rules
• Tags are enclosed in angle brackets.
• Tags come in pairs with start-tags and end-tags.
• Tags must be properly nested.
– <name><email>…</name></email> is not allowed.
– <name><email>…</email><name> is.
• Tags that do not have end-tags must be terminated by a
‘/’.
– <br /> is an html example.
April 29th, 2003 Organizing and Searching Information with XML 8
More XML Rules
• Tags are case sensitive.
– <address> is not the same as <Address>
• XML in any combination of cases is not allowed as part of a tag.
• Tags may not contain ‘<‘ or ‘&’.
• Tags follow Java naming conventions, except that a single colon
and other characters are allowed. They must begin with a letter
and may not contain white space.
• Documents must have a single root tag that begins the document.
April 29th, 2003 Organizing and Searching Information with XML 9
Encoding
• XML (like Java) uses Unicode to encode characters.
• Unicode comes in many flavors. The most common one used in the West is
UTF-8.
• UTF-8 is a variable length code. Characters are encoded in 1 byte, 2 bytes,
or 4 bytes.
• The first 128 characters in Unicode are ASCII.
• In UTF-8, the numbers between 128 and 255 code for some of the more
common characters used in western Europe, such as ã, á, å, or ç.
• Two byte codes are used for some characters not listed in the first 256 and
some Asian ideographs.
• Four byte codes can handle any ideographs that are left.
• Those using non-western languages should investigate other versions of
Unicode.
April 29th, 2003 Organizing and Searching Information with XML 10
Well-Formed Documents
• An XML document is said to be well-formed if it follows all the
rules.
• An XML parser is used to check that all the rules have been
obeyed.
• Recent browsers such as Internet Explorer 5 and Netscape 7 come
with XML parsers.
• Parsers are also available for free download over the Internet.
One is Xerces, from the Apache open-source project.
• Java 1.4 also supports an open-source parser.
April 29th, 2003 Organizing and Searching Information with XML 11
XML Example Revisited
<?xml version=“1.0”/>
<address>
<name>Alice Lee</name>
<email>alee@aol.com</email>
<phone>212-346-1234</phone>
<birthday>1985-03-22</birthday>
</address>
• Markup for the data aids understanding of its purpose.
• A flat text file is not nearly so clear.
Alice Lee
alee@aol.com
212-346-1234
1985-03-22
• The last line looks like a date, but what is it for?
April 29th, 2003 Organizing and Searching Information with XML 12
Expanded Example
<?xml version = “1.0” ?>
<address>
<name>
<first>Alice</first>
<last>Lee</last>
</name>
<email>alee@aol.com</email>
<phone>123-45-6789</phone>
<birthday>
<year>1983</year>
<month>07</month>
<day>15</day>
</birthday>
</address>
April 29th, 2003 Organizing and Searching Information with XML 13
But then – what is it?
XML is a meta markup language
for text documents / textual data
XML allows to define languages
(„applications“) to represent text
documents / textual data
April 29th, 2003 Organizing and Searching Information with XML 14
XML by Example
<article>
<author>Gerhard Weikum</author>
<title>The Web in 10 Years</title>
</article>
• Easy to understand for human users
• Very expressive (semantics along with the data)
• Well structured, easy to read and write from programs
This looks nice, but…
April 29th, 2003 Organizing and Searching Information with XML 15
XML by Example
<t108>
<x87>Gerhard Weikum</x87>
<g10>The Web in 10 Years</g10>
</t108>
• Hard to understand for human users
• Not expressive (no semantics along with the data)
• Well structured, easy to read and write from programs
… this is XML, too:
April 29th, 2003 Organizing and Searching Information with XML 16
XML by Example
<data>
ch37fhgks73j5mv9d63h5mgfkds8d984lgnsmcns983
</data>
• Impossible to understand for human users
• Not expressive (no semantics along with the data)
• Unstructured, read and write only with special programs
… and what about this XML document:
The actual benefit of using XML highly depends
on the design of the application.
April 29th, 2003 Organizing and Searching Information with XML 17
Possible Advantages of Using XML
• Truly Portable Data
• Easily readable by human users
• Very expressive (semantics near data)
• Very flexible and customizable (no finite tag set)
• Easy to use from programs (libs available)
• Easy to convert into other representations
(XML transformation languages)
• Many additional standards and tools
• Widely used and supported
April 29th, 2003 Organizing and Searching Information with XML 18
App. Scenario 1: Content Mgt.
Database with
XML documents
Clients
Converters
XML2HTML XML2WML XML2PDF
April 29th, 2003 Organizing and Searching Information with XML 19
App. Scenario 2: Data Exchange
Legacy
System
(e.g., SAP
R/2)
Legacy
System
(e.g.,
Cobol)
XML
Adapter
XML
Adapter
XML
(BMECat, ebXML, RosettaNet, BizTalk, …)
Su
Buyer
Order
April 29th, 2003 Organizing and Searching Information with XML 20
App. Scenario 3: XML for Metadata
<rdf:RDF
<rdf:Description rdf:about="http://www-dbs/Sch03.pdf">
<dc:title>A Framework for…</dc:title>
<dc:creator>Ralf Schenkel</dc:creator>
<dc:description>While there are...</dc:description>
<dc:publisher>Saarland University</dc:publisher>
<dc:subject>XML Indexing</dc:subject>
<dc:rights>Copyright ...</dc:rights>
<dc:type>Electronic Document</dc:type>
<dc:format>text/pdf</dc:format>
<dc:language>en</dc:language>
</rdf:Description>
</rdf:RDF>
April 29th, 2003 Organizing and Searching Information with XML 21
App. Scenario 4: Document Markup
<article>
<section id=„1“ title=„Intro“>
This article is about <index>XML</index>.
</section>
<section id=„2“ title=„Main Results“>
<name>Weikum</name> <cite idref=„Weik01“/> shows
the following theorem (see Section <ref idref=„1“/>)
<theorem id=„theo:1“ source=„Weik01“>
For any XML document x, ...
</theorem>
</section>
<literature>
<cite id=„Weik01“><author>Weikum</author></cite>
</literature>
</article>
April 29th, 2003 Organizing and Searching Information with XML 22
App. Scenario 4: Document Markup
• Document Markup adds structural and semantic
information to documents, e.g.
– Sections, Subsections, Theorems, …
– Cross References
– Literature Citations
– Index Entries
– Named Entities
• This allows queries like
– Which articles cite Weikum‘s XML paper from 2001?
– Which articles talk about (the named entity) „Weikum“?
April 29th, 2003 Organizing and Searching Information with XML 23
XML for Beginners
Part 2 – Basic XML Concepts
2.1 XML Standards by the W3C
2.2 XML Documents
2.3 Namespaces
April 29th, 2003 Organizing and Searching Information with XML 24
2.1 XML Standards – an Overview
• XML Core Working Group:
– XML 1.0 (Feb 1998), 1.1 (candidate for recommendation)
– XML Namespaces (Jan 1999)
– XML Inclusion (candidate for recommendation)
• XSLT Working Group:
– XSL Transformations 1.0 (Nov 1999), 2.0 planned
– XPath 1.0 (Nov 1999), 2.0 planned
– eXtensible Stylesheet Language XSL(-FO) 1.0 (Oct 2001)
• XML Linking Working Group:
– XLink 1.0 (Jun 2001)
– XPointer 1.0 (March 2003, 3 substandards)
• XQuery 1.0 (Nov 2002) plus many substandards
• XMLSchema 1.0 (May 2001)
• …
April 29th, 2003 Organizing and Searching Information with XML 25
2.2 XML Documents
What‘s in an XML document?
• Elements
• Attributes
• plus some other details
(see the Lecture if you want to know this)
April 29th, 2003 Organizing and Searching Information with XML 26
A Simple XML Document
<article>
<author>Gerhard Weikum</author>
<title>The Web in Ten Years</title>
<text>
<abstract>In order to evolve...</abstract>
<section number=“1” title=“Introduction”>
The <index>Web</index> provides the universal...
</section>
</text>
</article>
April 29th, 2003 Organizing and Searching Information with XML 27
A Simple XML Document
<article>
<author>Gerhard Weikum</author>
<title>The Web in Ten Years</title>
<text>
<abstract>In order to evolve...</abstract>
<section number=“1” title=“Introduction”>
The <index>Web</index> provides the universal...
</section>
</text>
</article>
Freely definable tags
April 29th, 2003 Organizing and Searching Information with XML 28
Element
Content of
the Element
(Subelements
and/or Text)
A Simple XML Document
<article>
<author>Gerhard Weikum</author>
<title>The Web in Ten Years</title>
<text>
<abstract>In order to evolve...</abstract>
<section number=“1” title=“Introduction”>
The <index>Web</index> provides the universal...
</section>
</text>
</article>
End Tag
Start Tag
April 29th, 2003 Organizing and Searching Information with XML 29
A Simple XML Document
<article>
<author>Gerhard Weikum</author>
<title>The Web in Ten Years</title>
<text>
<abstract>In order to evolve...</abstract>
<section number=“1” title=“Introduction”>
The <index>Web</index> provides the universal...
</section>
</text>
</article>
Attributes with
name and value
April 29th, 2003 Organizing and Searching Information with XML 30
Elements in XML Documents
• (Freely definable) tags: article, title, author
– with start tag: <article> etc.
– and end tag: </article> etc.
• Elements: <article> ... </article>
• Elements have a name (article) and a content (...)
• Elements may be nested.
• Elements may be empty: <this_is_empty/>
• Element content is typically parsed character data (PCDATA), i.e.,
strings with special characters, and/or nested elements (mixed
content if both).
• Each XML document has exactly one root element and forms a
tree.
• Elements with a common parent are ordered.
April 29th, 2003 Organizing and Searching Information with XML 31
Elements vs. Attributes
Elements may have attributes (in the start tag) that have a name and
a value, e.g. <section number=“1“>.
What is the difference between elements and attributes?
• Only one attribute with a given name per element (but an arbitrary
number of subelements)
• Attributes have no structure, simply strings (while elements can
have subelements)
As a rule of thumb:
• Content into elements
• Metadata into attributes
Example:
<person born=“1912-06-23“ died=“1954-06-07“>
Alan Turing</person> proved that…
April 29th, 2003 Organizing and Searching Information with XML 32
XML Documents as Ordered Trees
article
author title text
section
abstract
The index
Web
provides …
title=“…“
number=“1“
In order …
Gerhard
Weikum
The Web
in 10 years
April 29th, 2003 Organizing and Searching Information with XML 33
More on XML Syntax
• Some special characters must be escaped using entities:
< → &lt;
& → &amp;
(will be converted back when reading the XML doc)
• Some other characters may be escaped, too:
> → &gt;
“ → &quot;
‘ → &apos;
April 29th, 2003 Organizing and Searching Information with XML 34
Well-Formed XML Documents
A well-formed document must adher to, among others, the
following rules:
• Every start tag has a matching end tag.
• Elements may nest, but must not overlap.
• There must be exactly one root element.
• Attribute values must be quoted.
• An element may not have two attributes with the same
name.
• Comments and processing instructions may not appear
inside tags.
• No unescaped < or & signs may occur inside character
data.
April 29th, 2003 Organizing and Searching Information with XML 35
Well-Formed XML Documents
A well-formed document must adher to, among others, the
following rules:
• Every start tag has a matching end tag.
• Elements may nest, but must not overlap.
• There must be exactly one root element.
• Attribute values must be quoted.
• An element may not have to attributes with the same
name.
• Comments and processing instructions may not appear
inside tags.
• No unescaped < or & signs may occur inside character
data.
Only well-formed documents
can be processed by XML
parsers.
April 29th, 2003 Organizing and Searching Information with XML 36
2.3 Namespaces
<library>
<description>Library of the CS Department</description>
<book bid=“HandMS2000“>
<title>Principles of Data Mining</title>
<description>
Short introduction to <em>data mining</em>, useful
for the IRDM course
</description>
</book>
</library>
Semantics of the description element is ambigous
Content may be defined differently
Renaming may be impossible (standards!)
 Disambiguation of separate XML applications using
unique prefixes
April 29th, 2003 Organizing and Searching Information with XML 37
Namespace Syntax
<dbs:book xmlns:dbs=“http://www-dbs/dbs“>
Unique URI to identify
the namespace
Signal that namespace
definition happens
Prefix as abbrevation
of URI
April 29th, 2003 Organizing and Searching Information with XML 38
Namespace Example
<dbs:book xmlns:dbs=“http://www-dbs/dbs“>
<dbs:description> ... </dbs:description>
<dbs:text>
<dbs:formula>
<mathml:math
xmlns:mathml=“http://www.w3.org/1998/Math/MathML“>
...
</mathml:math>
</dbs:formula>
</dbs:text>
</dbs:book>
April 29th, 2003 Organizing and Searching Information with XML 39
Default Namespace
• Default namespace may be set for an element and its
content (but not its attributes):
<book xmlns=“http://www-dbs/dbs“>
<description>...</description>
<book>
• Can be overridden in the elements by specifying the
namespace there (using prefix or default namespace)
April 29th, 2003 Organizing and Searching Information with XML 40
XML for Beginners
Part 3 – Defining XML Data Formats
3.1 Document Type Definitions
3.2 XML Schema (very short)
April 29th, 2003 Organizing and Searching Information with XML 41
3.1 Document Type Definitions
Sometimes XML is too flexible:
• Most Programs can only process a subset of all possible
XML applications
• For exchanging data, the format (i.e., elements,
attributes and their semantics) must be fixed
Document Type Definitions (DTD) for establishing the
vocabulary for one XML application (in some sense
comparable to schemas in databases)
A document is valid with respect to a DTD if it conforms
to the rules specified in that DTD.
Most XML parsers can be configured to validate.
April 29th, 2003 Organizing and Searching Information with XML 42
DTD Example: Elements
<!ELEMENT article (title,author+,text)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT text (abstract,section*,literature?)>
<!ELEMENT abstract (#PCDATA)>
<!ELEMENT section (#PCDATA|index)+>
<!ELEMENT literature (#PCDATA)>
<!ELEMENT index (#PCDATA)>
Content of the title element
is parsed character data
Content of the article element is a title element,
followed by one or more author elements,
followed by a text element
Content of the text element may
contain zero or more section
elements in this position
April 29th, 2003 Organizing and Searching Information with XML 43
Element Declarations in DTDs
One element declaration for each element type:
<!ELEMENT element_name content_specification>
where content_specification can be
• (#PCDATA) parsed character data
• (child) one child element
• (c1,…,cn) a sequence of child elements c1…cn
• (c1|…|cn) one of the elements c1…cn
For each component c, possible counts can be specified:
– c exactly one such element
– c+ one or more
– c* zero or more
– c? zero or one
Plus arbitrary combinations using parenthesis:
<!ELEMENT f ((a|b)*,c+,(d|e))*>
April 29th, 2003 Organizing and Searching Information with XML 44
More on Element Declarations
• Elements with mixed content:
<!ELEMENT text (#PCDATA|index|cite|glossary)*>
• Elements with empty content:
<!ELEMENT image EMPTY>
• Elements with arbitrary content (this is nothing for
production-level DTDs):
<!ELEMENT thesis ANY>
April 29th, 2003 Organizing and Searching Information with XML 45
Attribute Declarations in DTDs
Attributes are declared per element:
<!ATTLIST section number CDATA #REQUIRED
title CDATA #REQUIRED>
declares two required attributes for element section.
element name
attribute name
attribute type
attribute default
April 29th, 2003 Organizing and Searching Information with XML 46
Attribute Declarations in DTDs
Attributes are declared per element:
<!ATTLIST section number CDATA #REQUIRED
title CDATA #REQUIRED>
declares two required attributes for element section.
Possible attribute defaults:
• #REQUIRED is required in each element instance
• #IMPLIED is optional
• #FIXED default always has this default value
• default has this default value if the attribute is
omitted from the element instance
April 29th, 2003 Organizing and Searching Information with XML 47
Attribute Types in DTDs
• CDATA string data
• (A1|…|An) enumeration of all possible values of the
attribute (each is XML name)
• ID unique XML name to identify the element
• IDREF refers to ID attribute of some other element
(„intra-document link“)
• IDREFS list of IDREF, separated by white space
• plus some more
April 29th, 2003 Organizing and Searching Information with XML 48
Attribute Examples
<ATTLIST publication type (journal|inproceedings) #REQUIRED
pubid ID #REQUIRED>
<ATTLIST cite cid IDREF #REQUIRED>
<ATTLIST citation ref IDREF #IMPLIED
cid ID #REQUIRED>
<publications>
<publication type=“journal“ pubid=“Weikum01“>
<author>Gerhard Weikum</author>
<text>In the Web of 2010, XML <cite cid=„12“/>...</text>
<citation cid=„12“ ref=„XML98“/>
<citation cid=„15“>...</citation>
</publication>
<publication type=“inproceedings“ pubid=“XML98“>
<text>XML, the extended Markup Language, ...</text>
</publication>
</publications>
April 29th, 2003 Organizing and Searching Information with XML 49
Attribute Examples
<ATTLIST publication type (journal|inproceedings) #REQUIRED
pubid ID #REQUIRED>
<ATTLIST cite cid IDREF #REQUIRED>
<ATTLIST citation ref IDREF #IMPLIED
cid ID #REQUIRED>
<publications>
<publication type=“journal“ pubid=“Weikum01“>
<author>Gerhard Weikum</author>
<text>In the Web of 2010, XML <cite cid=„12“/>...</text>
<citation cid=„12“ ref=„XML98“/>
<citation cid=„15“>...</citation>
</publication>
<publication type=“inproceedings“ pubid=“XML98“>
<text>XML, the extended Markup Language, ...</text>
</publication>
</publications>
April 29th, 2003 Organizing and Searching Information with XML 50
Linking DTD and XML Docs
• Document Type Declaration in the XML document:
<!DOCTYPE article SYSTEM “http://www-dbs/article.dtd“>
keywords Root element URI for the DTD
April 29th, 2003 Organizing and Searching Information with XML 51
Linking DTD and XML Docs
• Internal DTD:
<?xml version=“1.0“?>
<!DOCTYPE article [
<!ELEMENT article (title,author+,text)>
...
<!ELEMENT index (#PCDATA)>
]>
<article>
...
</article>
• Both ways can be mixed, internal DTD overwrites
external entity information:
<!DOCTYPE article SYSTEM „article.dtd“ [
<!ENTITY % pub_content (title+,author*,text)
]>
April 29th, 2003 Organizing and Searching Information with XML 52
Flaws of DTDs
• No support for basic data types like integers, doubles,
dates, times, …
• No structured, self-definable data types
• No type derivation
• id/idref links are quite loose (target is not specified)
 XML Schema
April 29th, 2003 Organizing and Searching Information with XML 53
3.2 XML Schema Basics
• XML Schema is an XML application
• Provides simple types (string, integer, dateTime,
duration, language, …)
• Allows defining possible values for elements
• Allows defining types derived from existing types
• Allows defining complex types
• Allows posing constraints on the occurrence of elements
• Allows forcing uniqueness and foreign keys
• Way too complex to cover in an introductory talk
April 29th, 2003 Organizing and Searching Information with XML 54
Simplified XML Schema Example
<xs:schema>
<xs:element name=“article“>
<xs:complexType>
<xs:sequence>
<xs:element name=“author“ type=“xs:string“/>
<xs:element name=“title“ type=“xs:string“/>
<xs:element name=“text“>
<xs:complexType>
<xs:sequence>
<xs:element name=“abstract“ type=“xs:string“/>
<xs:element name=“section“ type=“xs:string“
minOccurs=“0“ maxOccurs=“unbounded“/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
April 29th, 2003 Organizing and Searching Information with XML 55
XML for Beginners
Part 4 – Querying XML Data
4.1 XPath
4.2 XQuery
April 29th, 2003 Organizing and Searching Information with XML 56
Querying XML with XPath and XQuery
XPath and XQuery are query languages for XML data, both
standardized by the W3C and supported by various database products.
Their search capabilities include
• logical conditions over element and attribute content
(first-order predicate logic a la SQL; simple conditions only in XPath)
• regular expressions for pattern matching of element names
along paths or subtrees within XML data
+ joins, grouping, aggregation, transformation, etc. (XQuery only)
In contrast to database query languages like SQL an XML query
does not necessarily (need to) know a fixed structural schema
for the underlying data.
A query result is a set of qualifying nodes, paths, subtrees,
or subgraphs from the underyling data graph,
or a set of XML documents constructed from this raw result.
April 29th, 2003 Organizing and Searching Information with XML 57
4.1 XPath
• XPath is a simple language to identify parts of the XML
document (for further processing)
• XPath operates on the tree representation of the
document
• Result of an XPath expression is a set of elements or
attributes
• Discuss abbreviated version of XPath
April 29th, 2003 Organizing and Searching Information with XML 58
Elements of XPath
• An XPath expression usually is a location path that
consists of location steps, separated by /:
/article/text/abstract: selects all abstract elements
• A leading / always means the root element
• Each location step is evaluated in the context of a node
in the tree, the so-called context node
• Possible location steps:
– child element x: select all child elements with name x
– Attribute @x: select all attributes with name x
– Wildcards * (any child), @* (any attribute)
– Multiple matches, separated by |: x|y|z
April 29th, 2003 Organizing and Searching Information with XML 59
Combining Location Steps
• Standard: / (context node is the result of the preceding
location step)
article/text/abstract (all the abstract nodes of articles)
• Select any descendant, not only children: //
article//index (any index element in articles)
• Select the parent element: ..
• Select the content node: .
The latter two are important when using predicates.
April 29th, 2003 Organizing and Searching Information with XML 60
Predicates in Location Steps
• Added with [] to the location step
• Used to restricts elements that qualify as result of a
location step to those that fulfil the predicate:
– a[b] elements a that have a subelement b
– a[@d] elements a that have an attribute d
– Plus conditions on content/value:
• a[b=„c“]
• A[@d>7]
• <, <=, >=, !=, …
April 29th, 2003 Organizing and Searching Information with XML 61
XPath by Example
/literature/book/author retrieves all book authors:
starting with the root, traverses the tree, matches element
names literature, book, author, and returns elements
<author>Suciu, Dan</author>,
<author>Abiteboul, Serge</author>, ...,
<author><firstname>Jeff</firstname>
<lastname>Ullman</lastname></author>
/literature/*/author authors of books, articles, essays, etc.
/literature//author authors that are descendants of literature
/literature//@year value of the year attribute of descendants of literature
/literature//author[firstname] authors that have a subelement firstname
/literature/(book|article)/author authors of books or articles
/literature/book[price < „50“]
/literature/book[author//country = „Germany“]
low priced books
books with German author
April 29th, 2003 Organizing and Searching Information with XML 62
4.2 Core Concepts of XQuery
XQuery is an extremely powerful query language for XML data.
A query has the form of a so-called FLWR expression:
FOR $var1 IN expr1, $var2 IN expr2, ...
LET $var3 := expr3, $var4 := expr4, ...
WHERE condition
RETURN result-doc-construction
The FOR clause evaluates expressions (which may be XPath-style
path expressions) and binds the resulting elements to variables.
For a given binding each variable denotes exactly one element.
The LET clause binds entire sequences of elements to variables.
The WHERE clause evaluates a logical condition with each of
the possible variable bindings and selects those bindings that
satisfy the condition.
The RETURN clause constructs, from each of the variable bindings,
an XML result tree. This may involve grouping and aggregation
and even complete subqueries.
April 29th, 2003 Organizing and Searching Information with XML 63
XQuery Examples
// find Web-related articles by Dan Suciu from the year 1998
<results> {
FOR $a IN document(“literature.xml“)//article
FOR $n IN $a//author, $t IN $a/title
WHERE $a/@year = “1998“
AND contains($n, “Suciu“) AND contains($t, “Web“)
RETURN <result> $n $t </result> } </results>
// find articles co-authored by authors who have jointly written a book after 1995
<results> {
FOR $a IN document(“literature.xml“)//article
FOR $a1 IN $a//author, $a2 IN $a//author
WHERE SOME $b IN document(“literature.xml“)//book SATISFIES
$b//author = $a1 AND $b//author = $a2 AND $b/@year>“1995“
RETURN <result> $a1 $a2 <wrote> $a </wrote> </result> }
</results>
April 29th, 2003 Organizing and Searching Information with XML 64
Summary and Outlook
You should give one, I won‘t.

More Related Content

Similar to Unit_2_Xml.ppt (20)

Introduction to XML.ppt
Introduction to XML.pptIntroduction to XML.ppt
Introduction to XML.ppt
 
XML
XMLXML
XML
 
Xml iet 2015
Xml iet 2015Xml iet 2015
Xml iet 2015
 
Introduction to XML
Introduction to XMLIntroduction to XML
Introduction to XML
 
XML1.pptx
XML1.pptxXML1.pptx
XML1.pptx
 
XML.pptx
XML.pptxXML.pptx
XML.pptx
 
XML
XMLXML
XML
 
1 xml fundamentals
1 xml fundamentals1 xml fundamentals
1 xml fundamentals
 
Unit iv xml dom
Unit iv xml domUnit iv xml dom
Unit iv xml dom
 
Full xml
Full xmlFull xml
Full xml
 
CrashCourse: XML technologies
CrashCourse: XML technologiesCrashCourse: XML technologies
CrashCourse: XML technologies
 
00 introduction
00 introduction00 introduction
00 introduction
 
Xml
XmlXml
Xml
 
Introduction to XML
Introduction to XMLIntroduction to XML
Introduction to XML
 
XML Retrieval - A Slot Filling Approach
XML Retrieval - A Slot Filling ApproachXML Retrieval - A Slot Filling Approach
XML Retrieval - A Slot Filling Approach
 
Xml
XmlXml
Xml
 
Introduce to XML
Introduce to XMLIntroduce to XML
Introduce to XML
 
Xml passing in java
Xml passing in javaXml passing in java
Xml passing in java
 
xml
xmlxml
xml
 
02 well formed and valid documents
02 well formed and valid documents02 well formed and valid documents
02 well formed and valid documents
 

Recently uploaded

Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxAsutosh Ranjan
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxupamatechverse
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130Suhani Kapoor
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )Tsuyoshi Horigome
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Dr.Costas Sachpazis
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerAnamika Sarkar
 
Internship report on mechanical engineering
Internship report on mechanical engineeringInternship report on mechanical engineering
Internship report on mechanical engineeringmalavadedarshan25
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escortsranjana rawat
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)Suman Mia
 
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...RajaP95
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).pptssuser5c9d4b1
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSKurinjimalarL3
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxpurnimasatapathy1234
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxwendy cai
 

Recently uploaded (20)

Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptx
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )
 
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
 
Internship report on mechanical engineering
Internship report on mechanical engineeringInternship report on mechanical engineering
Internship report on mechanical engineering
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
 
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
 
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCRCall Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptx
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptx
 

Unit_2_Xml.ppt

  • 1. April 29th, 2003 Organizing and Searching Information with XML 1 What is XML • XML stands for eXtensible Markup Language. • A markup language is used to provide information about a document. • Tags are added to the document to provide the extra information. • HTML tags tell a browser how to display the document. • XML tags give a reader some idea what some of the data means.
  • 2. April 29th, 2003 Organizing and Searching Information with XML 2 What is XML Used For? • XML documents are used to transfer data from one place to another often over the Internet. • XML subsets are designed for particular applications. • One is RSS (Rich Site Summary or Really Simple Syndication ). It is used to send breaking news bulletins from one web site to another. • A number of fields have their own subsets. These include chemistry, mathematics, and books publishing. • Most of these subsets are registered with the W3Consortium and are available for anyone’s use.
  • 3. April 29th, 2003 Organizing and Searching Information with XML 3 Advantages of XML • XML is text (Unicode) based. – Takes up less space. – Can be transmitted efficiently. • One XML document can be displayed differently in different media. – Html, video, CD, DVD, – You only have to change the XML document in order to change all the rest. • XML documents can be modularized. Parts can be reused.
  • 4. April 29th, 2003 Organizing and Searching Information with XML 4 Example of an HTML Document <html> <head><title>Example</title></head. <body> <h1>This is an example of a page.</h1> <h2>Some information goes here.</h2> </body> </html>
  • 5. April 29th, 2003 Organizing and Searching Information with XML 5 Example of an XML Document <?xml version=“1.0”/> <address> <name>Alice Lee</name> <email>alee@aol.com</email> <phone>212-346-1234</phone> <birthday>1985-03-22</birthday> </address>
  • 6. April 29th, 2003 Organizing and Searching Information with XML 6 Difference Between HTML and XML • HTML tags have a fixed meaning and browsers know what it is. • XML tags are different for different applications, and users know what they mean. • HTML tags are used for display. • XML tags are used to describe documents and data.
  • 7. April 29th, 2003 Organizing and Searching Information with XML 7 XML Rules • Tags are enclosed in angle brackets. • Tags come in pairs with start-tags and end-tags. • Tags must be properly nested. – <name><email>…</name></email> is not allowed. – <name><email>…</email><name> is. • Tags that do not have end-tags must be terminated by a ‘/’. – <br /> is an html example.
  • 8. April 29th, 2003 Organizing and Searching Information with XML 8 More XML Rules • Tags are case sensitive. – <address> is not the same as <Address> • XML in any combination of cases is not allowed as part of a tag. • Tags may not contain ‘<‘ or ‘&’. • Tags follow Java naming conventions, except that a single colon and other characters are allowed. They must begin with a letter and may not contain white space. • Documents must have a single root tag that begins the document.
  • 9. April 29th, 2003 Organizing and Searching Information with XML 9 Encoding • XML (like Java) uses Unicode to encode characters. • Unicode comes in many flavors. The most common one used in the West is UTF-8. • UTF-8 is a variable length code. Characters are encoded in 1 byte, 2 bytes, or 4 bytes. • The first 128 characters in Unicode are ASCII. • In UTF-8, the numbers between 128 and 255 code for some of the more common characters used in western Europe, such as ã, á, å, or ç. • Two byte codes are used for some characters not listed in the first 256 and some Asian ideographs. • Four byte codes can handle any ideographs that are left. • Those using non-western languages should investigate other versions of Unicode.
  • 10. April 29th, 2003 Organizing and Searching Information with XML 10 Well-Formed Documents • An XML document is said to be well-formed if it follows all the rules. • An XML parser is used to check that all the rules have been obeyed. • Recent browsers such as Internet Explorer 5 and Netscape 7 come with XML parsers. • Parsers are also available for free download over the Internet. One is Xerces, from the Apache open-source project. • Java 1.4 also supports an open-source parser.
  • 11. April 29th, 2003 Organizing and Searching Information with XML 11 XML Example Revisited <?xml version=“1.0”/> <address> <name>Alice Lee</name> <email>alee@aol.com</email> <phone>212-346-1234</phone> <birthday>1985-03-22</birthday> </address> • Markup for the data aids understanding of its purpose. • A flat text file is not nearly so clear. Alice Lee alee@aol.com 212-346-1234 1985-03-22 • The last line looks like a date, but what is it for?
  • 12. April 29th, 2003 Organizing and Searching Information with XML 12 Expanded Example <?xml version = “1.0” ?> <address> <name> <first>Alice</first> <last>Lee</last> </name> <email>alee@aol.com</email> <phone>123-45-6789</phone> <birthday> <year>1983</year> <month>07</month> <day>15</day> </birthday> </address>
  • 13. April 29th, 2003 Organizing and Searching Information with XML 13 But then – what is it? XML is a meta markup language for text documents / textual data XML allows to define languages („applications“) to represent text documents / textual data
  • 14. April 29th, 2003 Organizing and Searching Information with XML 14 XML by Example <article> <author>Gerhard Weikum</author> <title>The Web in 10 Years</title> </article> • Easy to understand for human users • Very expressive (semantics along with the data) • Well structured, easy to read and write from programs This looks nice, but…
  • 15. April 29th, 2003 Organizing and Searching Information with XML 15 XML by Example <t108> <x87>Gerhard Weikum</x87> <g10>The Web in 10 Years</g10> </t108> • Hard to understand for human users • Not expressive (no semantics along with the data) • Well structured, easy to read and write from programs … this is XML, too:
  • 16. April 29th, 2003 Organizing and Searching Information with XML 16 XML by Example <data> ch37fhgks73j5mv9d63h5mgfkds8d984lgnsmcns983 </data> • Impossible to understand for human users • Not expressive (no semantics along with the data) • Unstructured, read and write only with special programs … and what about this XML document: The actual benefit of using XML highly depends on the design of the application.
  • 17. April 29th, 2003 Organizing and Searching Information with XML 17 Possible Advantages of Using XML • Truly Portable Data • Easily readable by human users • Very expressive (semantics near data) • Very flexible and customizable (no finite tag set) • Easy to use from programs (libs available) • Easy to convert into other representations (XML transformation languages) • Many additional standards and tools • Widely used and supported
  • 18. April 29th, 2003 Organizing and Searching Information with XML 18 App. Scenario 1: Content Mgt. Database with XML documents Clients Converters XML2HTML XML2WML XML2PDF
  • 19. April 29th, 2003 Organizing and Searching Information with XML 19 App. Scenario 2: Data Exchange Legacy System (e.g., SAP R/2) Legacy System (e.g., Cobol) XML Adapter XML Adapter XML (BMECat, ebXML, RosettaNet, BizTalk, …) Su Buyer Order
  • 20. April 29th, 2003 Organizing and Searching Information with XML 20 App. Scenario 3: XML for Metadata <rdf:RDF <rdf:Description rdf:about="http://www-dbs/Sch03.pdf"> <dc:title>A Framework for…</dc:title> <dc:creator>Ralf Schenkel</dc:creator> <dc:description>While there are...</dc:description> <dc:publisher>Saarland University</dc:publisher> <dc:subject>XML Indexing</dc:subject> <dc:rights>Copyright ...</dc:rights> <dc:type>Electronic Document</dc:type> <dc:format>text/pdf</dc:format> <dc:language>en</dc:language> </rdf:Description> </rdf:RDF>
  • 21. April 29th, 2003 Organizing and Searching Information with XML 21 App. Scenario 4: Document Markup <article> <section id=„1“ title=„Intro“> This article is about <index>XML</index>. </section> <section id=„2“ title=„Main Results“> <name>Weikum</name> <cite idref=„Weik01“/> shows the following theorem (see Section <ref idref=„1“/>) <theorem id=„theo:1“ source=„Weik01“> For any XML document x, ... </theorem> </section> <literature> <cite id=„Weik01“><author>Weikum</author></cite> </literature> </article>
  • 22. April 29th, 2003 Organizing and Searching Information with XML 22 App. Scenario 4: Document Markup • Document Markup adds structural and semantic information to documents, e.g. – Sections, Subsections, Theorems, … – Cross References – Literature Citations – Index Entries – Named Entities • This allows queries like – Which articles cite Weikum‘s XML paper from 2001? – Which articles talk about (the named entity) „Weikum“?
  • 23. April 29th, 2003 Organizing and Searching Information with XML 23 XML for Beginners Part 2 – Basic XML Concepts 2.1 XML Standards by the W3C 2.2 XML Documents 2.3 Namespaces
  • 24. April 29th, 2003 Organizing and Searching Information with XML 24 2.1 XML Standards – an Overview • XML Core Working Group: – XML 1.0 (Feb 1998), 1.1 (candidate for recommendation) – XML Namespaces (Jan 1999) – XML Inclusion (candidate for recommendation) • XSLT Working Group: – XSL Transformations 1.0 (Nov 1999), 2.0 planned – XPath 1.0 (Nov 1999), 2.0 planned – eXtensible Stylesheet Language XSL(-FO) 1.0 (Oct 2001) • XML Linking Working Group: – XLink 1.0 (Jun 2001) – XPointer 1.0 (March 2003, 3 substandards) • XQuery 1.0 (Nov 2002) plus many substandards • XMLSchema 1.0 (May 2001) • …
  • 25. April 29th, 2003 Organizing and Searching Information with XML 25 2.2 XML Documents What‘s in an XML document? • Elements • Attributes • plus some other details (see the Lecture if you want to know this)
  • 26. April 29th, 2003 Organizing and Searching Information with XML 26 A Simple XML Document <article> <author>Gerhard Weikum</author> <title>The Web in Ten Years</title> <text> <abstract>In order to evolve...</abstract> <section number=“1” title=“Introduction”> The <index>Web</index> provides the universal... </section> </text> </article>
  • 27. April 29th, 2003 Organizing and Searching Information with XML 27 A Simple XML Document <article> <author>Gerhard Weikum</author> <title>The Web in Ten Years</title> <text> <abstract>In order to evolve...</abstract> <section number=“1” title=“Introduction”> The <index>Web</index> provides the universal... </section> </text> </article> Freely definable tags
  • 28. April 29th, 2003 Organizing and Searching Information with XML 28 Element Content of the Element (Subelements and/or Text) A Simple XML Document <article> <author>Gerhard Weikum</author> <title>The Web in Ten Years</title> <text> <abstract>In order to evolve...</abstract> <section number=“1” title=“Introduction”> The <index>Web</index> provides the universal... </section> </text> </article> End Tag Start Tag
  • 29. April 29th, 2003 Organizing and Searching Information with XML 29 A Simple XML Document <article> <author>Gerhard Weikum</author> <title>The Web in Ten Years</title> <text> <abstract>In order to evolve...</abstract> <section number=“1” title=“Introduction”> The <index>Web</index> provides the universal... </section> </text> </article> Attributes with name and value
  • 30. April 29th, 2003 Organizing and Searching Information with XML 30 Elements in XML Documents • (Freely definable) tags: article, title, author – with start tag: <article> etc. – and end tag: </article> etc. • Elements: <article> ... </article> • Elements have a name (article) and a content (...) • Elements may be nested. • Elements may be empty: <this_is_empty/> • Element content is typically parsed character data (PCDATA), i.e., strings with special characters, and/or nested elements (mixed content if both). • Each XML document has exactly one root element and forms a tree. • Elements with a common parent are ordered.
  • 31. April 29th, 2003 Organizing and Searching Information with XML 31 Elements vs. Attributes Elements may have attributes (in the start tag) that have a name and a value, e.g. <section number=“1“>. What is the difference between elements and attributes? • Only one attribute with a given name per element (but an arbitrary number of subelements) • Attributes have no structure, simply strings (while elements can have subelements) As a rule of thumb: • Content into elements • Metadata into attributes Example: <person born=“1912-06-23“ died=“1954-06-07“> Alan Turing</person> proved that…
  • 32. April 29th, 2003 Organizing and Searching Information with XML 32 XML Documents as Ordered Trees article author title text section abstract The index Web provides … title=“…“ number=“1“ In order … Gerhard Weikum The Web in 10 years
  • 33. April 29th, 2003 Organizing and Searching Information with XML 33 More on XML Syntax • Some special characters must be escaped using entities: < → &lt; & → &amp; (will be converted back when reading the XML doc) • Some other characters may be escaped, too: > → &gt; “ → &quot; ‘ → &apos;
  • 34. April 29th, 2003 Organizing and Searching Information with XML 34 Well-Formed XML Documents A well-formed document must adher to, among others, the following rules: • Every start tag has a matching end tag. • Elements may nest, but must not overlap. • There must be exactly one root element. • Attribute values must be quoted. • An element may not have two attributes with the same name. • Comments and processing instructions may not appear inside tags. • No unescaped < or & signs may occur inside character data.
  • 35. April 29th, 2003 Organizing and Searching Information with XML 35 Well-Formed XML Documents A well-formed document must adher to, among others, the following rules: • Every start tag has a matching end tag. • Elements may nest, but must not overlap. • There must be exactly one root element. • Attribute values must be quoted. • An element may not have to attributes with the same name. • Comments and processing instructions may not appear inside tags. • No unescaped < or & signs may occur inside character data. Only well-formed documents can be processed by XML parsers.
  • 36. April 29th, 2003 Organizing and Searching Information with XML 36 2.3 Namespaces <library> <description>Library of the CS Department</description> <book bid=“HandMS2000“> <title>Principles of Data Mining</title> <description> Short introduction to <em>data mining</em>, useful for the IRDM course </description> </book> </library> Semantics of the description element is ambigous Content may be defined differently Renaming may be impossible (standards!)  Disambiguation of separate XML applications using unique prefixes
  • 37. April 29th, 2003 Organizing and Searching Information with XML 37 Namespace Syntax <dbs:book xmlns:dbs=“http://www-dbs/dbs“> Unique URI to identify the namespace Signal that namespace definition happens Prefix as abbrevation of URI
  • 38. April 29th, 2003 Organizing and Searching Information with XML 38 Namespace Example <dbs:book xmlns:dbs=“http://www-dbs/dbs“> <dbs:description> ... </dbs:description> <dbs:text> <dbs:formula> <mathml:math xmlns:mathml=“http://www.w3.org/1998/Math/MathML“> ... </mathml:math> </dbs:formula> </dbs:text> </dbs:book>
  • 39. April 29th, 2003 Organizing and Searching Information with XML 39 Default Namespace • Default namespace may be set for an element and its content (but not its attributes): <book xmlns=“http://www-dbs/dbs“> <description>...</description> <book> • Can be overridden in the elements by specifying the namespace there (using prefix or default namespace)
  • 40. April 29th, 2003 Organizing and Searching Information with XML 40 XML for Beginners Part 3 – Defining XML Data Formats 3.1 Document Type Definitions 3.2 XML Schema (very short)
  • 41. April 29th, 2003 Organizing and Searching Information with XML 41 3.1 Document Type Definitions Sometimes XML is too flexible: • Most Programs can only process a subset of all possible XML applications • For exchanging data, the format (i.e., elements, attributes and their semantics) must be fixed Document Type Definitions (DTD) for establishing the vocabulary for one XML application (in some sense comparable to schemas in databases) A document is valid with respect to a DTD if it conforms to the rules specified in that DTD. Most XML parsers can be configured to validate.
  • 42. April 29th, 2003 Organizing and Searching Information with XML 42 DTD Example: Elements <!ELEMENT article (title,author+,text)> <!ELEMENT title (#PCDATA)> <!ELEMENT author (#PCDATA)> <!ELEMENT text (abstract,section*,literature?)> <!ELEMENT abstract (#PCDATA)> <!ELEMENT section (#PCDATA|index)+> <!ELEMENT literature (#PCDATA)> <!ELEMENT index (#PCDATA)> Content of the title element is parsed character data Content of the article element is a title element, followed by one or more author elements, followed by a text element Content of the text element may contain zero or more section elements in this position
  • 43. April 29th, 2003 Organizing and Searching Information with XML 43 Element Declarations in DTDs One element declaration for each element type: <!ELEMENT element_name content_specification> where content_specification can be • (#PCDATA) parsed character data • (child) one child element • (c1,…,cn) a sequence of child elements c1…cn • (c1|…|cn) one of the elements c1…cn For each component c, possible counts can be specified: – c exactly one such element – c+ one or more – c* zero or more – c? zero or one Plus arbitrary combinations using parenthesis: <!ELEMENT f ((a|b)*,c+,(d|e))*>
  • 44. April 29th, 2003 Organizing and Searching Information with XML 44 More on Element Declarations • Elements with mixed content: <!ELEMENT text (#PCDATA|index|cite|glossary)*> • Elements with empty content: <!ELEMENT image EMPTY> • Elements with arbitrary content (this is nothing for production-level DTDs): <!ELEMENT thesis ANY>
  • 45. April 29th, 2003 Organizing and Searching Information with XML 45 Attribute Declarations in DTDs Attributes are declared per element: <!ATTLIST section number CDATA #REQUIRED title CDATA #REQUIRED> declares two required attributes for element section. element name attribute name attribute type attribute default
  • 46. April 29th, 2003 Organizing and Searching Information with XML 46 Attribute Declarations in DTDs Attributes are declared per element: <!ATTLIST section number CDATA #REQUIRED title CDATA #REQUIRED> declares two required attributes for element section. Possible attribute defaults: • #REQUIRED is required in each element instance • #IMPLIED is optional • #FIXED default always has this default value • default has this default value if the attribute is omitted from the element instance
  • 47. April 29th, 2003 Organizing and Searching Information with XML 47 Attribute Types in DTDs • CDATA string data • (A1|…|An) enumeration of all possible values of the attribute (each is XML name) • ID unique XML name to identify the element • IDREF refers to ID attribute of some other element („intra-document link“) • IDREFS list of IDREF, separated by white space • plus some more
  • 48. April 29th, 2003 Organizing and Searching Information with XML 48 Attribute Examples <ATTLIST publication type (journal|inproceedings) #REQUIRED pubid ID #REQUIRED> <ATTLIST cite cid IDREF #REQUIRED> <ATTLIST citation ref IDREF #IMPLIED cid ID #REQUIRED> <publications> <publication type=“journal“ pubid=“Weikum01“> <author>Gerhard Weikum</author> <text>In the Web of 2010, XML <cite cid=„12“/>...</text> <citation cid=„12“ ref=„XML98“/> <citation cid=„15“>...</citation> </publication> <publication type=“inproceedings“ pubid=“XML98“> <text>XML, the extended Markup Language, ...</text> </publication> </publications>
  • 49. April 29th, 2003 Organizing and Searching Information with XML 49 Attribute Examples <ATTLIST publication type (journal|inproceedings) #REQUIRED pubid ID #REQUIRED> <ATTLIST cite cid IDREF #REQUIRED> <ATTLIST citation ref IDREF #IMPLIED cid ID #REQUIRED> <publications> <publication type=“journal“ pubid=“Weikum01“> <author>Gerhard Weikum</author> <text>In the Web of 2010, XML <cite cid=„12“/>...</text> <citation cid=„12“ ref=„XML98“/> <citation cid=„15“>...</citation> </publication> <publication type=“inproceedings“ pubid=“XML98“> <text>XML, the extended Markup Language, ...</text> </publication> </publications>
  • 50. April 29th, 2003 Organizing and Searching Information with XML 50 Linking DTD and XML Docs • Document Type Declaration in the XML document: <!DOCTYPE article SYSTEM “http://www-dbs/article.dtd“> keywords Root element URI for the DTD
  • 51. April 29th, 2003 Organizing and Searching Information with XML 51 Linking DTD and XML Docs • Internal DTD: <?xml version=“1.0“?> <!DOCTYPE article [ <!ELEMENT article (title,author+,text)> ... <!ELEMENT index (#PCDATA)> ]> <article> ... </article> • Both ways can be mixed, internal DTD overwrites external entity information: <!DOCTYPE article SYSTEM „article.dtd“ [ <!ENTITY % pub_content (title+,author*,text) ]>
  • 52. April 29th, 2003 Organizing and Searching Information with XML 52 Flaws of DTDs • No support for basic data types like integers, doubles, dates, times, … • No structured, self-definable data types • No type derivation • id/idref links are quite loose (target is not specified)  XML Schema
  • 53. April 29th, 2003 Organizing and Searching Information with XML 53 3.2 XML Schema Basics • XML Schema is an XML application • Provides simple types (string, integer, dateTime, duration, language, …) • Allows defining possible values for elements • Allows defining types derived from existing types • Allows defining complex types • Allows posing constraints on the occurrence of elements • Allows forcing uniqueness and foreign keys • Way too complex to cover in an introductory talk
  • 54. April 29th, 2003 Organizing and Searching Information with XML 54 Simplified XML Schema Example <xs:schema> <xs:element name=“article“> <xs:complexType> <xs:sequence> <xs:element name=“author“ type=“xs:string“/> <xs:element name=“title“ type=“xs:string“/> <xs:element name=“text“> <xs:complexType> <xs:sequence> <xs:element name=“abstract“ type=“xs:string“/> <xs:element name=“section“ type=“xs:string“ minOccurs=“0“ maxOccurs=“unbounded“/> </xs:sequence> </xs:complexType> </xs:element> </xs:sequence> </xs:complexType> </xs:element> </xs:schema>
  • 55. April 29th, 2003 Organizing and Searching Information with XML 55 XML for Beginners Part 4 – Querying XML Data 4.1 XPath 4.2 XQuery
  • 56. April 29th, 2003 Organizing and Searching Information with XML 56 Querying XML with XPath and XQuery XPath and XQuery are query languages for XML data, both standardized by the W3C and supported by various database products. Their search capabilities include • logical conditions over element and attribute content (first-order predicate logic a la SQL; simple conditions only in XPath) • regular expressions for pattern matching of element names along paths or subtrees within XML data + joins, grouping, aggregation, transformation, etc. (XQuery only) In contrast to database query languages like SQL an XML query does not necessarily (need to) know a fixed structural schema for the underlying data. A query result is a set of qualifying nodes, paths, subtrees, or subgraphs from the underyling data graph, or a set of XML documents constructed from this raw result.
  • 57. April 29th, 2003 Organizing and Searching Information with XML 57 4.1 XPath • XPath is a simple language to identify parts of the XML document (for further processing) • XPath operates on the tree representation of the document • Result of an XPath expression is a set of elements or attributes • Discuss abbreviated version of XPath
  • 58. April 29th, 2003 Organizing and Searching Information with XML 58 Elements of XPath • An XPath expression usually is a location path that consists of location steps, separated by /: /article/text/abstract: selects all abstract elements • A leading / always means the root element • Each location step is evaluated in the context of a node in the tree, the so-called context node • Possible location steps: – child element x: select all child elements with name x – Attribute @x: select all attributes with name x – Wildcards * (any child), @* (any attribute) – Multiple matches, separated by |: x|y|z
  • 59. April 29th, 2003 Organizing and Searching Information with XML 59 Combining Location Steps • Standard: / (context node is the result of the preceding location step) article/text/abstract (all the abstract nodes of articles) • Select any descendant, not only children: // article//index (any index element in articles) • Select the parent element: .. • Select the content node: . The latter two are important when using predicates.
  • 60. April 29th, 2003 Organizing and Searching Information with XML 60 Predicates in Location Steps • Added with [] to the location step • Used to restricts elements that qualify as result of a location step to those that fulfil the predicate: – a[b] elements a that have a subelement b – a[@d] elements a that have an attribute d – Plus conditions on content/value: • a[b=„c“] • A[@d>7] • <, <=, >=, !=, …
  • 61. April 29th, 2003 Organizing and Searching Information with XML 61 XPath by Example /literature/book/author retrieves all book authors: starting with the root, traverses the tree, matches element names literature, book, author, and returns elements <author>Suciu, Dan</author>, <author>Abiteboul, Serge</author>, ..., <author><firstname>Jeff</firstname> <lastname>Ullman</lastname></author> /literature/*/author authors of books, articles, essays, etc. /literature//author authors that are descendants of literature /literature//@year value of the year attribute of descendants of literature /literature//author[firstname] authors that have a subelement firstname /literature/(book|article)/author authors of books or articles /literature/book[price < „50“] /literature/book[author//country = „Germany“] low priced books books with German author
  • 62. April 29th, 2003 Organizing and Searching Information with XML 62 4.2 Core Concepts of XQuery XQuery is an extremely powerful query language for XML data. A query has the form of a so-called FLWR expression: FOR $var1 IN expr1, $var2 IN expr2, ... LET $var3 := expr3, $var4 := expr4, ... WHERE condition RETURN result-doc-construction The FOR clause evaluates expressions (which may be XPath-style path expressions) and binds the resulting elements to variables. For a given binding each variable denotes exactly one element. The LET clause binds entire sequences of elements to variables. The WHERE clause evaluates a logical condition with each of the possible variable bindings and selects those bindings that satisfy the condition. The RETURN clause constructs, from each of the variable bindings, an XML result tree. This may involve grouping and aggregation and even complete subqueries.
  • 63. April 29th, 2003 Organizing and Searching Information with XML 63 XQuery Examples // find Web-related articles by Dan Suciu from the year 1998 <results> { FOR $a IN document(“literature.xml“)//article FOR $n IN $a//author, $t IN $a/title WHERE $a/@year = “1998“ AND contains($n, “Suciu“) AND contains($t, “Web“) RETURN <result> $n $t </result> } </results> // find articles co-authored by authors who have jointly written a book after 1995 <results> { FOR $a IN document(“literature.xml“)//article FOR $a1 IN $a//author, $a2 IN $a//author WHERE SOME $b IN document(“literature.xml“)//book SATISFIES $b//author = $a1 AND $b//author = $a2 AND $b/@year>“1995“ RETURN <result> $a1 $a2 <wrote> $a </wrote> </result> } </results>
  • 64. April 29th, 2003 Organizing and Searching Information with XML 64 Summary and Outlook You should give one, I won‘t.