We have chosen to write about how to store and retrieve XML data in/from a DB2 database
with the assistance of an XML Extender.
This essay contains a theoretical and a practical part. In the theoretical part of the essay we
describe XML and the XML Extender and how they may be used. The theory should make the
practical part more interesting and comprehensible for the reader.
Today’s users of business applications tend to share data and face problems when
replicating, transforming, exporting and saving it. The consequences of these problems are
dropped data or, at the least, a time-consuming process for the user to ensure the consistency of the
data.
XML (eXtensible Markup Language) was developed to solve these kinds of problems by capturing not only
the data of a specific application but also the structure of that data. Different applications
can then share data without the need to transform it between specific formats.
1.2 Problem definition
In this essay we will study the powerful combination of XML and a DBMS.
Our problem definition is:
• How to store and retrieve XML data in/from a DB2 database with the assistance of an
XML Extender?
To reach our goal and answer this question we intend to carry out practical studies
that will result in a database application. This will increase our understanding of the subject
and thereby make it easier for us to write a good essay.
1.3 Target group
Students with basic knowledge of computer science and databases, and others interested in these
specific areas.
We have searched for relevant information in books and on the Internet about the subject. We
will also do practical studies.
The software we will use is:
• IBM DB2 version 6.1 to store the data.
• NT 4 server and workstation.
• HomeSite 4.
• XML extender.
• Internet Explorer 5.0.
The Web is known to most of humanity. The idea behind this revolutionary invention was
that researchers who were not at the same physical location would be able to write documents
This resulted in three founding ideas of the Web: the HyperText Markup
Language (HTML, used to write the documents and make it possible for them to be
displayed and edited), the HyperText Transfer Protocol (HTTP, which makes it possible to transfer
the documents from one place to another) and the Uniform Resource Locator (URL, used for
referencing documents and other resources so that they can be linked from
an HTML document). Together these resulted in the Web, which made it possible to
provide e-commerce, intranets and Web browsing.
The drawback of HTML, HTTP and URLs was that they were insufficient for designing
large-scale Web applications. For this purpose the Web needed a new language. The many
markup requirements of next-generation Web applications called for a universal format for
encoding and distributing data that could be understood by any machine on the
Internet. Although HTML appeared to fulfil these demands, it was limited because it enforced
one specific view of the data it contained. HTML describes what a web page should look like
but does not describe the data itself. In other words, HTML only represents the way
data should be displayed and not what it contains. Much effort was spent on constructing
extensions to HTML, but it became evident that a new language was desirable. One of the
efforts was to combine HTML with the syntactic flexibility of SGML (Standard
Generalized Markup Language). SGML was considered for this purpose
because it is a powerful and complex international standard expressly designed for defining
special-purpose markup languages. However, the Web community saw SGML as
far too complex for easy Web application development. Another reason was that SGML
was burdened with a number of features that made it very inflexible in a distributed
environment like the Web. What was needed was a language that combined SGML's rich ability to define
new languages with the Web portability of HTML.
The result was the birth of XML. XML was designed as a simplified subset of SGML (HTML, by
contrast, is an SGML application), and it was designed to provide a single universal format that
can carry the syntax of different applications. Furthermore, XML was designed to be much simpler
than SGML and, in particular,
to be used in distributed environments like the Web, where data marked up in XML
may be far from the machine it came from. Another design goal
was that the specification should make it easy for developers to build simple
XML processing software without too much trouble. The XML specification was concise
and to the point: only 30 pages long, compared with the HTML specification, which ran to
360 pages.
2.1 Introduction to XML
XML stands for, as mentioned earlier, eXtensible Markup Language and is a text-based
format for documents containing structured information. It describes data better than its
predecessors because it is extensible. HTML, for example, only describes how data should be
displayed, not what it contains. In XML you are able to define your own “tags”, for example
<artist>name</artist>. This makes the search for a certain document on the Web more
specific. Let us illustrate this with an example. The user of an application is
interested in music and wants to search for his or her favourite group on the net. If the
search is done over HTML, the results for the search words Britney Spears + artist
will be pages that include both the words Britney Spears and artist. With XML the artist tag
makes the search more efficient, because the search would only match music artists named
Britney Spears.
<?xml version="1.0" encoding="UTF-8"?>
<fname> Britney </fname>
Another advantage of XML is data interchange. It is largely independent of any particular application, which
makes it easy for data to be transferred between different applications without transformation.
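The difference between a plain keyword search and a tag-aware search can be sketched in Python with the standard library module xml.etree.ElementTree. This is only an illustration under stated assumptions: the element names <catalog>, <page> and <text>, the sample pages, and the two helper functions are hypothetical and are not taken from the essay's figures.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: both pages contain the words "Britney Spears"
# and "artist", but only the first one marks her up as the artist.
CATALOG = """<catalog>
  <page>
    <artist>Britney Spears</artist>
    <text>The artist released a new single.</text>
  </page>
  <page>
    <artist>Another Group</artist>
    <text>An artist who once opened for Britney Spears.</text>
  </page>
</catalog>"""


def keyword_search(root, words):
    """Return pages whose text contains every search word (keyword-style search)."""
    hits = []
    for page in root.findall("page"):
        text = " ".join(page.itertext()).lower()
        if all(word.lower() in text for word in words):
            hits.append(page)
    return hits


def artist_search(root, name):
    """Return pages whose <artist> element holds exactly the given name."""
    return [
        page for page in root.findall("page")
        if (page.findtext("artist") or "").strip() == name
    ]


root = ET.fromstring(CATALOG)
keyword_hits = keyword_search(root, ["Britney Spears", "artist"])
artist_hits = artist_search(root, "Britney Spears")
print(len(keyword_hits), "keyword hits,", len(artist_hits), "artist hit")
```

The keyword search matches both pages, while the search restricted to the <artist> element matches only the page that is actually about the artist.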
2.2 XML declarations
An XML document consists of elements and attributes. This means that a document uses
elements and attributes to express its semantics and can be edited and processed with the
same tools that are used to process other XML documents. The vocabulary is traditionally
divided into letters, digits and other tokens. XML elements usually start with a start tag and
end with an end tag. The relevant information of the document, stored between the markup, is
called character data.
Markup is part of the logical structure of an XML document. In XML, markup is denoted
by text surrounded by left and right angle brackets. As mentioned in chapter 2.1, the artist
element starts with a start tag, <artist>, and ends with an end tag, </artist>.
There are several kinds of markup:
• Public markup adds transfer information that makes it possible for data to be
transferred to downline publishers. This information includes the content of the
added information, searchable terms that may be used for indexing, a unique identifier,
an expiration date and an action to be taken by the publisher. The action may be create,
kill or update.
• Private inbound markup is not forwarded to any downline publishers but includes
account information, comments and action-specific directives such as sort.
• Private outbound markup includes status, authorisation, version, price and positioning
information as well as warning and confirmation messages.
[ http://www.xml.com/pub/r/AD_Markup ]
An XML document consists of two main parts: the prolog and the document element.
The prolog provides general information about the document, but the prolog itself
does not represent the actual structure of the document. In our example shown in fig2.1 you
can see that the prolog consists of the markup <?xml version="1.0" encoding="UTF-8"?>.
Other things that can be stated in the prolog are document type declarations, comments and/or
processing instructions. The prolog in the example above indicates which version
of XML the document conforms to. The declaration encoding="UTF-8"
states that the document is encoded in UTF-8, an encoding of the Unicode/ISO 10646
standard. The XML declaration is optional but recommended, because it is useful to know what
character encoding was used to create the binary representation of the document. Stating the
version helps ensure that the document can still be processed correctly under a later version of XML.
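The role of the encoding declaration can be sketched with Python's xml.etree.ElementTree. The same character data is written out as two different byte sequences, and the parser uses the declaration in each prolog to decode them. The sample name and the variable names are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

NAME = "Åsa"  # hypothetical sample value containing a non-ASCII character

# The same document, once encoded as UTF-8 and once as ISO-8859-1.
# Each prolog declares the encoding that was used for the bytes.
utf8_doc = (
    '<?xml version="1.0" encoding="UTF-8"?><fname>' + NAME + "</fname>"
).encode("utf-8")
latin1_doc = (
    '<?xml version="1.0" encoding="ISO-8859-1"?><fname>' + NAME + "</fname>"
).encode("iso-8859-1")

# The binary representations differ ...
same_bytes = utf8_doc == latin1_doc

# ... but the parser reads the declaration and recovers the same text.
utf8_text = ET.fromstring(utf8_doc).text
latin1_text = ET.fromstring(latin1_doc).text
print(same_bytes, utf8_text, latin1_text)
```

Without the declaration a processor would have to assume a default encoding, which is why stating it in the prolog is recommended.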
The most common kind of markup is the element. The task of an element is to identify the content it
surrounds. Like other markup, an element begins with a start tag <element> and ends with an
end tag </element>.
Attributes are used to associate name-value pairs with elements.
Attribute specifications may only appear within start tags and empty
element tags. [http://www.w3.org/TR/REC-xml#attdecls ]
Attribute declarations may be used to define the set of attributes belonging to a specific
element type; for example, a date attribute on a menu element gives the date of the menu.
If attributes are declared for an element type that is itself not declared, the XML processor may
issue a warning. When more than one definition is provided for the same attribute of a given
element type, the first declaration is binding and later declarations are ignored.
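As a minimal sketch, the following Python snippet reads attributes from a start tag and from an empty element tag, and shows that a parser rejects an attribute placed in an end tag. The <menu> element with its date attribute follows the example mentioned above; the <dish> and <special> elements and all attribute values are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Attributes appear in a start tag (<menu>, <dish>) and in an
# empty element tag (<special/>). All values are hypothetical.
MENU = (
    '<menu date="2000-05-17">'
    '<dish price="45">Soup</dish>'
    '<special day="Friday"/>'
    "</menu>"
)

root = ET.fromstring(MENU)
menu_date = root.get("date")                       # value from the start tag
dish_price = root.find("dish").get("price")
special_day = root.find("special").get("day")      # value from the empty element tag
currency = root.find("dish").get("currency", "unknown")  # absent attribute, default used

# An attribute inside an end tag is not well-formed XML.
try:
    ET.fromstring('<menu></menu date="2000-05-17">')
    end_tag_attribute_accepted = True
except ET.ParseError:
    end_tag_attribute_accepted = False

print(menu_date, dish_price, special_day, currency, end_tag_attribute_accepted)
```

Note that ElementTree does not validate against a DTD, so the rule about duplicate attribute declarations is not exercised here.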
There are three kinds of attribute types: a string type, a set of tokenised types and
enumerated types.
XML provides the ability to add comments to the markup or character data content of
documents. The comments themselves are not considered part of the character data.
As in HTML, a comment starts with <!-- and ends with -->. Comments may
appear anywhere in a document outside other markup.
<?xml version="1.0" ?>
<!-- Comment -->
<title>Example 2 </title>
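That comments are kept apart from the character data can be checked with a short Python sketch. The wrapping <doc> element is an assumption added so that the fragment forms a complete document. By default, xml.etree.ElementTree discards comments while parsing.

```python
import xml.etree.ElementTree as ET

# The <doc> wrapper is hypothetical; it only provides a document element.
DOC = """<doc>
  <!-- Comment -->
  <title>Example 2 </title>
</doc>"""

root = ET.fromstring(DOC)
child_tags = [child.tag for child in root]   # the comment is not a child element
all_text = "".join(root.itertext())          # character data of the whole document
title_text = root.findtext("title")

# The closing sequence must be written without a space: "-->".
try:
    ET.fromstring("<doc><!-- Comment -- ></doc>")
    spaced_closer_accepted = True
except ET.ParseError:
    spaced_closer_accepted = False

print(child_tags, repr(title_text), spaced_closer_accepted)
```

The comment text never shows up in the character data, and a closing sequence written with a space between the hyphens and the angle bracket is rejected as not well-formed.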
2.2.5 Tree structure
The tree structure has a single root node: the document element. To show this we once again
use our example from fig2.1.
[Tree diagram: the document element with the child nodes fname, lname and age]
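The tree view of a document can be sketched by walking the parsed structure in Python. The child element names fname, lname and age follow the figure; the name of the document element (<person>) and the sample values are assumptions, since they are not given in the text.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: only the child element names come from the figure.
DOC = "<person><fname>Jane</fname><lname>Doe</lname><age>30</age></person>"


def outline(element, depth=0):
    """Return the element tree as indented lines, one line per node."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(DOC)
tree_lines = outline(root)
print("\n".join(tree_lines))
```

The first line of the outline is the single root node, and every other element appears below it, indented by its depth in the tree.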
2.3 DTD (Document Type Definitions)
XML is a meta language, which means that XML is used to describe other markup languages.
You can design and build your own markup language for a certain purpose. If you work in the
music industry you might want to create a markup language containing tags associated with
music. Such markup languages are defined in a special document called a
DTD. It separates the structure of the document from the actual data. A DTD declares the
elements, attributes, tags and entities for a specific type of document and describes how they are
related to each other.
A DTD sets up rules that say, for example, that a <document> must have a title and one or
more <customer> elements, and that each <customer> must have exactly one <id>, exactly one
<address> and one or more
<phone number>. There are two ways of storing DTDs. They can be included in the same
file as the document they describe or be linked from an external URL. In the latter case
the DTDs can be shared by different documents and Web sites.
This can be useful when, for example, publishers want a certain format from customers to
facilitate book layout and production. Whole organisations or even whole trades can agree on
a specific DTD. In that way they can exchange papers over the Web and be assured of receiving
them in a well-known format they can all read.
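The kind of rule a DTD expresses can be approximated by hand in Python. This sketch is not a DTD validator (the Python standard library does not validate against DTDs). It only checks the cardinality rules from the customer example, under the assumption that the phone element is simply named <phone>. The RULES table and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET

# (child element, minimum count, maximum count or None for "no upper limit").
# Mirrors: exactly one <id>, exactly one <address>, one or more <phone>.
RULES = [("id", 1, 1), ("address", 1, 1), ("phone", 1, None)]


def customer_problems(customer):
    """Return a list of rule violations for a single <customer> element."""
    problems = []
    for tag, minimum, maximum in RULES:
        count = len(customer.findall(tag))
        too_few = count < minimum
        too_many = maximum is not None and count > maximum
        if too_few or too_many:
            problems.append(f"<{tag}> occurs {count} time(s)")
    return problems


valid = ET.fromstring(
    "<customer><id>1</id><address>Main Street 1</address>"
    "<phone>111</phone><phone>222</phone></customer>"
)
invalid = ET.fromstring(
    "<customer><id>1</id><id>2</id><address>Main Street 1</address></customer>"
)

valid_problems = customer_problems(valid)
invalid_problems = customer_problems(invalid)
print(valid_problems, invalid_problems)
```

A real DTD would state the same constraints declaratively, so that any validating XML processor can enforce them without application-specific code.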
2.4 XSL (eXtensible Stylesheet Language)
The separation between the description and the presentation of data is one of the most essential
principles of XML.
XSL is XML’s counterpart to HTML’s Cascading Style Sheets (CSS), but gives far more
detailed control over appearance. Every XML document can be attached to an XSL style sheet
that describes how individual elements should be formatted and displayed.
XSL builds on the Document Style Semantics and Specification Language (DSSSL)
and the earlier mentioned CSS. It is simpler than DSSSL and much more powerful than CSS.
One great aspect of XSL is the use of JavaScript. By using scripts, XSL can provide more
complex and dynamic behaviours.
An XSL style sheet is itself an XML document, containing elements that describe how the XML tags in the XML
documents associated with it should be converted. The XML tags are transformed
into objects as the XML document is read. These objects are either HTML or DSSSL objects. The objects,
containing marked-up text, are saved in an HTML file or a DSSSL file. This file is generated
by the XSL processor. The file can then be viewed by opening it in a Web browser.
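The idea of converting XML tags into HTML output can be imitated in a few lines of Python. This is a hand-written stand-in, not XSL and not an XSL processor: the RULES mapping plays the role of the style sheet, and the element names and sample document are assumptions made for illustration.

```python
from html import escape
import xml.etree.ElementTree as ET

# Hypothetical "style sheet": which HTML tag each XML element is rendered as.
# Elements without a rule are left out of the output.
RULES = {"artist": "h1", "title": "p"}


def to_html(element):
    """Render the children of an XML element as a small HTML page."""
    parts = []
    for child in element:
        html_tag = RULES.get(child.tag)
        if html_tag is not None:
            parts.append(f"<{html_tag}>{escape(child.text or '')}</{html_tag}>")
    return "<html><body>" + "".join(parts) + "</body></html>"


record = ET.fromstring(
    "<record><artist>A &amp; B</artist><title>T</title><year>2000</year></record>"
)
page = to_html(record)
print(page)
```

The XML document itself says nothing about presentation; changing the mapping changes the generated HTML without touching the data, which is the separation that style sheets provide.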
3. XML Extender
Today the computer industry offers a large number of different applications, each designed
and built for specific purposes. Users can choose the application that suits their needs. This
breadth of choice does not only have advantages. As mentioned in chapter 2, many applications today
tend to share data. This leads to problems with replicating, transforming, exporting, or saving
their data in a different format that can be imported into another application.
Business applications are especially vulnerable because of the transformation process. The
process may either result in dropped data or at least consume some of the user’s valuable time
in ensuring that the data is consistent.
To overcome the problems above you can write Open Database Connectivity (ODBC) applications
with XML interchange utilities. You can then save the data in a database management
system and do not have to worry about the transformation part, because you use the standard for
data interchange, XML. For this purpose IBM has developed the DB2 XML Extender.
3.1 Introduction to the XML Extender.
The XML Extender makes it possible to integrate the power of IBM’s DB2 Universal
Database (DB2 UDB) with the flexibility of XML. Depending on the application, you can
choose whether to use the XML Extender for storing entire XML documents in DB2 as a
non-traditional user-defined data type or for mapping the XML content to traditional data in relational
tables. The mapping function is used not only for storing decomposed XML documents but
also for composing XML documents from the data in already existing tables. Regarding the
non-traditional XML data type, the XML Extender adds the possibility to search rich data types of
XML element or attribute values, in addition to the structural text search that the DB2 UDB
Text Extender provides. This powerful search capability is possible thanks to a function
in the XML Extender that allows XML documents to be indexed. This improves the
usability of a Web site containing a large amount of readable information, such as articles.
3.2 Data types in the XML Extender
DAD is short for Document Access Definition; it specifies how structured
XML documents are stored or created. The DAD is itself an XML-formatted
document, which associates XML documents with a DB2 database through two major access and
storage methods: XML columns and XML collections. XML columns
contain intact XML documents, while an XML collection is a set of relational tables
containing data that is mapped to an XML document and may then be composed into an
XML document. When you use an XML collection you can take advantage of the relational
capability of DB2. This increases the flexibility of sharable data among applications.
Fig3.1 Storing structured XML documents in a DB2 table column.
Fig3.2 Storing documents as data in a collection of DB2 tables.
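The two storage approaches can be contrasted in a small Python sketch. It deliberately uses the standard library's sqlite3 module as a stand-in database and does not use DB2, the XML Extender or a DAD. All table names, column names and the sample document are assumptions made for illustration only.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample document.
DOC = (
    "<customer><id>1</id><address>Main Street 1</address>"
    "<phone>111</phone><phone>222</phone></customer>"
)

conn = sqlite3.connect(":memory:")

# Approach 1 (compare "XML column"): keep the document intact in one column.
conn.execute("CREATE TABLE customer_docs (doc_id INTEGER PRIMARY KEY, doc TEXT)")
conn.execute("INSERT INTO customer_docs (doc) VALUES (?)", (DOC,))
stored = conn.execute("SELECT doc FROM customer_docs").fetchone()[0]

# Approach 2 (compare "XML collection"): decompose into relational tables.
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, address TEXT)")
conn.execute("CREATE TABLE phone (customer_id INTEGER, number TEXT)")
root = ET.fromstring(DOC)
cust_id = int(root.findtext("id"))
conn.execute("INSERT INTO customer VALUES (?, ?)", (cust_id, root.findtext("address")))
conn.executemany(
    "INSERT INTO phone VALUES (?, ?)",
    [(cust_id, phone.text) for phone in root.findall("phone")],
)


def compose(connection, customer_id):
    """Compose an XML document again from the relational rows."""
    address = connection.execute(
        "SELECT address FROM customer WHERE id = ?", (customer_id,)
    ).fetchone()[0]
    customer = ET.Element("customer")
    ET.SubElement(customer, "id").text = str(customer_id)
    ET.SubElement(customer, "address").text = address
    rows = connection.execute(
        "SELECT number FROM phone WHERE customer_id = ? ORDER BY rowid", (customer_id,)
    )
    for (number,) in rows:
        ET.SubElement(customer, "phone").text = number
    return ET.tostring(customer, encoding="unicode")


recomposed = compose(conn, cust_id)
phone_count = conn.execute("SELECT COUNT(*) FROM phone").fetchone()[0]
print(stored == DOC, recomposed == DOC, phone_count)
```

In the first approach the document comes back exactly as stored, while in the second the data can also be queried with ordinary SQL (for instance counting phone numbers) and composed into XML on demand. In the XML Extender, the mapping that is hard-coded here is described by the DAD.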
UDT and UDF
For XML columns the XML Extender provides UDTs (user-defined types). UDTs are data
types that are not originally known to the database, because they are defined by the user. UDTs
identify the storage type of the XML document in the application table.
UDFs (user-defined functions) are used for storing and retrieving XML documents in XML
columns. The UDFs are associated with the XML UDTs and are mainly used for XML