2. XML
• Is the “eXtensible Markup Language”
• Became a W3C Recommendation in 1998
• Tag-based syntax, very much like HTML
• Is not a replacement for HTML
• X means eXtensible - we can make your own tag
• Revolutionizes software development
• Enables application portability
3. XML is not…
• A replacement for HTML
(but HTML can be generated from XML)
• A presentation format
(but XML can be converted into one)
• A programming language
(but it can be used with almost any language)
• A network transfer protocol
(but XML may be transferred over a network)
• A database
(but XML may be stored into a database)
4. But then – what is it?
XML is a meta markup language for
text documents / textual data
XML allows to define languages
(”applications“) to represent text
documents / textual data
5. XML by Example
<article>
<author>Ram Bahadur</author>
<title>The Web in 10 Years</title>
</article>
• Easy to understand for human users
• Very expressive (semantics along with the data)
• Well structured, easy to read and write from programs
6. XML by Example
<t108>
<x87>Ram Bahadur</x87>
<g10>The Web in 10 Years</g10>
</t108>
• Hard to understand for human users
• Not expressive (no semantics along with the data)
• Well structured, easy to read and write from programs
… this is XML, too:
7. XML by Example
<data>
ch37fhgks73j5mv9d63h5mgfkds8d984lgnsmcns983
</data>
• Impossible to understand for human users
• Not expressive (no semantics along with the data)
• Unstructured, read and write only with special
programs
… and what about this XML document:
The actual benefit of using XML highly depends on the design of the application.
8. Possible Advantages of Using XML
• Truly Portable Data
• Easily readable by human users
• Very flexible and customizable (no finite tag set)
• Easy to use from programs (libs available)
• Easy to convert into other representations
(XML transformation languages)
• Many additional standards and tools
• Widely used and supported
11. Application
• Document Markup adds structural and
semantic information to documents, e.g.
– Sections, Subsections, Theorems, …
– Cross References
– Literature Citations
12. XML Standards – an Overview
• XML Core Working Group:
– XML 1.0 (Feb 1998), 1.1 (candidate for recommendation)
– XML Namespaces (Jan 1999)
• XSLT Working Group:
– XSL(eXtensible Stylesheet Language) Transformations 1.0 (Nov 1999), 2.0
planned
– XPath 1.0 (Nov 1999), 2.0 planned
• XML Linking Working Group:
– XLink 1.0 (Jun 2001)
– XPointer 1.0 (March 2003)
• XQuery 1.0 (Nov 2002)
• XMLSchema 1.0 (May 2001)
14. A Simple XML Document
<article>
<author>Ram Bahadur</author>
<title>The Web in Ten Years</title>
<text>
<abstract>In order to evolve...</abstract>
<section number=“1” title=“Introduction”>
The <index>Web</index> provides the universal...
</section>
</text>
</article>
15. A Simple XML Document
<article>
<author>Ram Bahadur</author>
<title>The Web in Ten Years</title>
<text>
<abstract>In order to evolve...</abstract>
<section number=“1” title=“Introduction”>
The <index>Web</index> provides the universal...
</section>
</text>
</article>
Freely definable tags
16. Element
Content of the
Element
(Subelements
and/or Text)
A Simple XML Document<article>
<author>Gerhard Weikum</author>
<title>The Web in Ten Years</title>
<text>
<abstract>In order to evolve...</abstract>
<section number=“1” title=“Introduction”>
The <index>Web</index> provides the universal...
</section>
</text>
</article>
End Tag
Start Tag
17. A Simple XML Document<article>
<author>Gerhard Weikum</author>
<title>The Web in Ten Years</title>
<text>
<abstract>In order to evolve...</abstract>
<section number=“1” title=“Introduction”>
The <index>Web</index> provides the universal...
</section>
</text>
</article>
Attributes with
name and value
18. Elements in XML Documents
• (Freely definable) tags: article, title, author
– with start tag: <article> etc.
– and end tag: </article> etc.
• Elements: <article> ... </article>
• Elements have a name (article) and a content (...)
• Elements may be nested.
• Elements may be empty: <this_is_empty/>
• Element content is typically parsed character data (PCDATA),
i.e., strings with special characters, and/or nested elements
(mixed content if both).
• Each XML document has exactly one root element and forms
a tree.
19. XML Documents as Ordered Trees
article
author title text
sectionabstract
The index
Web
provides …
title=“…“
number=“1“
In order …
Ram
Bahadur
The Web
in 10 years
20. Well-Formed XML Documents
A well-formed document must adher to, among others,
the following rules:
• Every start tag has a matching end tag.
• Elements may nest, but must not overlap.
• There must be exactly one root element.
• Attribute values must be quoted.
• An element may not have two attributes with the same
name.
• Comments and processing instructions may not appear
inside tags.
• No unescaped < or & signs may occur inside character
data.
21. Well-Formed XML Documents
A well-formed document must adher to, among others,
the following rules:
• Every start tag has a matching end tag.
• Elements may nest, but must not overlap.
• There must be exactly one root element.
• Attribute values must be quoted.
• An element may not have to attributes with the same
name.
• Comments and processing instructions may not appear
inside tags.
• No unescaped < or & signs may occur inside character
data.
Only well-formed documents
can be processed by XML
parsers.
27. XSL
• XML also has it's own styles language – XSL
• XSL stands for Extensible Styles Language and
is a very powerful language for applying styles
to XML documents
• XSL has two parts
– a formatting language
– a transformation language.
28. • The formatting language allows to apply styles
similar to CSS
• Browser support for the XSL formatting language
is limited
• The transformation language is known as XSLT
(XSL Transformations)
• XSLT allows to transform XML document into
another form
• For example, we could use XSLT to dynamically
output some (or all) of the contents of your XML
file into an HTML document
29. Namespace
• XML allows to create our own element names,
there's always the possibility of naming an
element exactly the same as one in another XML
document
• This might be OK if we never use both documents
together
• But what if we need to combine the content of
both documents
• Would have a name conflict
– Would have two different elements, with different
purposes, both with the same name
33. Namespace Syntax
<bk:books xmlns:bk =“http://www.books.com/books_detail“>
Unique URI to identify
the namespace
Signal that namespace
definition happens
Prefix as abbrevation
of URI
34. Local Namespace
• We created a namespace to avoid a name conflict
between the elements of two documents we
wanted to combine
• When we defined the namespace, we defined it
against the root element
• This meant that the namespace was to be used
for the whole document, and we prefixed all child
elements with the same namespace.
• Can also define namespaces against a child node
• This way, we could use multiple namespaces
within the same document if required
36. Multiple Namespaces
• We could also have multiple namespaces
within your XML document
• For example, we could define one namespace
against the root element, and another against
a child element
38. XML Default Namespace
• We applied the prefix when we defined the
namespace, and we applied a prefix to each
element that referred to the namespace.
• We can also use what is known as a default
namespace within our XML documents
• The only difference is, a default namespace is
one where we don't apply a prefix
41. Problematic Characters
• Some characters have a special meaning in XML.
• For example:
– the less than sign (<) marks the beginning of a tag
– And, of course, the greater than sign (>) marks the
end of the tag.
• When the XML processor parses the document, it
looks for these characters (and others) to
determine how to interpret the document
• This is fine, as long as your data doesn't contain
any of these characters.
– But what if it does?
42. • Imagine you had the following text within an
element: 10 < 5.
• When the XML processor encounters the <, it
will assume it's the start of an opening tag.
– Problem is, it's not.
• In order to include characters such as < and &
etc, we need to use their entity reference
instead of the character itself
47. CDATA
• In XML, a CDATA section is used to escape a
block of text that would otherwise be parsed
as markup
48. Why Are CDATA Sections Useful?
• Occasionally data contains large blocks of text
with lots of potentially problematic characters
• For example,
– Data could contain a programming script
– Many progamming scripts contain characters such
as less than/greater than signs, ampersands etc,
which would cause problems for the XML
processor.
49. • CDATA allows to escape the whole block of
text
• This eliminates the need to go through the
whole script, individually replacing all the
potentially problematic characters
• The XML processor knows to escape all data
between the CDATA tags
50. CDATA Syntax
• Declare a CDATA section using
– <![CDATA[ as the opening tag
– ]]> as the closing tag