XML

XML
• Is the “eXtensible Markup Language”
• Became a W3C Recommendation in 1998
• Tag-based syntax, very much like HTML
• Is not a replacement for HTML
• X means eXtensible - we can make your own tag
• Revolutionizes software development
• Enables application portability

XML is not…
• A replacement for HTML
(but HTML can be generated from XML)
• A presentation format
(but XML can be converted into one)
• A programming language
(but it can be used with almost any language)
• A network transfer protocol
(but XML may be transferred over a network)
• A database
(but XML may be stored into a database)

But then – what is it?
XML is a meta markup language for
text documents / textual data
XML allows to define languages
(”applications“) to represent text
documents / textual data

XML by Example
<article>
<author>Ram Bahadur</author>
<title>The Web in 10 Years</title>
</article>
• Easy to understand for human users
• Very expressive (semantics along with the data)
• Well structured, easy to read and write from programs

XML by Example
<t108>
<x87>Ram Bahadur</x87>
<g10>The Web in 10 Years</g10>
</t108>
• Hard to understand for human users
• Not expressive (no semantics along with the data)
• Well structured, easy to read and write from programs
… this is XML, too:

XML by Example
<data>
ch37fhgks73j5mv9d63h5mgfkds8d984lgnsmcns983
</data>
• Impossible to understand for human users
• Not expressive (no semantics along with the data)
• Unstructured, read and write only with special
programs
… and what about this XML document:
The actual benefit of using XML highly depends on the design of the application.

Possible Advantages of Using XML
• Truly Portable Data
• Easily readable by human users
• Very flexible and customizable (no finite tag set)
• Easy to use from programs (libs available)
• Easy to convert into other representations
(XML transformation languages)
• Many additional standards and tools
• Widely used and supported

Application
Database with
XML documents
Clients
ConvertersXML2HTML XML2WML XML2PDF

Application
Legacy
System
(e.g., SAP
)
Legacy
System
(e.g.,
Cobol)
XML
Adapter
XML
Adapter
XML
SupplierBuyer
Order

Application
• Document Markup adds structural and
semantic information to documents, e.g.
– Sections, Subsections, Theorems, …
– Cross References
– Literature Citations

XML Standards – an Overview
• XML Core Working Group:
– XML 1.0 (Feb 1998), 1.1 (candidate for recommendation)
– XML Namespaces (Jan 1999)
• XSLT Working Group:
– XSL(eXtensible Stylesheet Language) Transformations 1.0 (Nov 1999), 2.0
planned
– XPath 1.0 (Nov 1999), 2.0 planned
• XML Linking Working Group:
– XLink 1.0 (Jun 2001)
– XPointer 1.0 (March 2003)
• XQuery 1.0 (Nov 2002)
• XMLSchema 1.0 (May 2001)

A Simple XML Document
<article>
<title>The Web in Ten Years</title>
<text>
<abstract>In order to evolve...</abstract>
<section number=“1” title=“Introduction”>
The <index>Web</index> provides the universal...
</section>
</text>
</article>

A Simple XML Document
<article>
<text>
</section>
</text>
</article>
Freely definable tags

Element
Content of the
Element
(Subelements
and/or Text)
A Simple XML Document<article>
<author>Gerhard Weikum</author>
<text>
</section>
</text>
</article>
End Tag
Start Tag

A Simple XML Document<article>
<author>Gerhard Weikum</author>
<text>
</section>
</text>
</article>
Attributes with
name and value

Elements in XML Documents
• (Freely definable) tags: article, title, author
– with start tag: <article> etc.
– and end tag: </article> etc.
• Elements: <article> ... </article>
• Elements have a name (article) and a content (...)
• Elements may be nested.
• Elements may be empty: <this_is_empty/>
• Element content is typically parsed character data (PCDATA),
i.e., strings with special characters, and/or nested elements
(mixed content if both).
• Each XML document has exactly one root element and forms
a tree.

XML Documents as Ordered Trees
article
author title text
sectionabstract
The index
Web
provides …
title=“…“
number=“1“
In order …
Ram
Bahadur
The Web
in 10 years

Well-Formed XML Documents
A well-formed document must adher to, among others,
the following rules:
• Every start tag has a matching end tag.
• Elements may nest, but must not overlap.
• There must be exactly one root element.
• Attribute values must be quoted.
• An element may not have two attributes with the same
name.
• Comments and processing instructions may not appear
inside tags.
• No unescaped < or & signs may occur inside character
data.

Well-Formed XML Documents
A well-formed document must adher to, among others,
the following rules:
• Every start tag has a matching end tag.
• Elements may nest, but must not overlap.
• There must be exactly one root element.
• Attribute values must be quoted.
• An element may not have to attributes with the same
name.
• Comments and processing instructions may not appear
inside tags.
• No unescaped < or & signs may occur inside character
data.
Only well-formed documents
can be processed by XML
parsers.

XSL
• XML also has it's own styles language – XSL
• XSL stands for Extensible Styles Language and
is a very powerful language for applying styles
to XML documents
• XSL has two parts
– a formatting language
– a transformation language.

• The formatting language allows to apply styles
similar to CSS
• Browser support for the XSL formatting language
is limited
• The transformation language is known as XSLT
(XSL Transformations)
• XSLT allows to transform XML document into
another form
• For example, we could use XSLT to dynamically
output some (or all) of the contents of your XML
file into an HTML document

Namespace
• XML allows to create our own element names,
there's always the possibility of naming an
element exactly the same as one in another XML
document
• This might be OK if we never use both documents
together
• But what if we need to combine the content of
both documents
• Would have a name conflict
– Would have two different elements, with different
purposes, both with the same name

Namespace Syntax
<bk:books xmlns:bk =“http://www.books.com/books_detail“>
Unique URI to identify
the namespace
Signal that namespace
definition happens
Prefix as abbrevation
of URI

Local Namespace
• We created a namespace to avoid a name conflict
between the elements of two documents we
wanted to combine
• When we defined the namespace, we defined it
against the root element
• This meant that the namespace was to be used
for the whole document, and we prefixed all child
elements with the same namespace.
• Can also define namespaces against a child node
• This way, we could use multiple namespaces
within the same document if required

Multiple Namespaces
• We could also have multiple namespaces
within your XML document
• For example, we could define one namespace
against the root element, and another against
a child element

XML Default Namespace
• We applied the prefix when we defined the
namespace, and we applied a prefix to each
element that referred to the namespace.
• We can also use what is known as a default
namespace within our XML documents
• The only difference is, a default namespace is
one where we don't apply a prefix

XML Entities
• We can use entities to represent:
– Characters that would otherwise cause problems
for the XML processor
– Large blocks of data that need to be repeated
throughout the document
– Characters that you can't type on your keyboard
(eg, ©)

Problematic Characters
• Some characters have a special meaning in XML.
• For example:
– the less than sign (<) marks the beginning of a tag
– And, of course, the greater than sign (>) marks the
end of the tag.
• When the XML processor parses the document, it
looks for these characters (and others) to
determine how to interpret the document
• This is fine, as long as your data doesn't contain
any of these characters.
– But what if it does?

• Imagine you had the following text within an
element: 10 < 5.
• When the XML processor encounters the <, it
will assume it's the start of an opening tag.
– Problem is, it's not.
• In order to include characters such as < and &
etc, we need to use their entity reference
instead of the character itself

XML Creating Entities
• Defining
• Using

CDATA
• In XML, a CDATA section is used to escape a
block of text that would otherwise be parsed
as markup

Why Are CDATA Sections Useful?
• Occasionally data contains large blocks of text
with lots of potentially problematic characters
• For example,
– Data could contain a programming script
– Many progamming scripts contain characters such
as less than/greater than signs, ampersands etc,
which would cause problems for the XML
processor.

• CDATA allows to escape the whole block of
text
• This eliminates the need to go through the
whole script, individually replacing all the
potentially problematic characters
• The XML processor knows to escape all data
between the CDATA tags

CDATA Syntax
• Declare a CDATA section using
– <![CDATA[ as the opening tag
– ]]> as the closing tag

XML

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to XML

Similar to XML (20)

More from Kamal Acharya

More from Kamal Acharya (20)

Recently uploaded

Recently uploaded (20)

XML