Word document



1. Overview

1.1 Introduction
We have chosen to write about how to store and retrieve XML data in and from a DB2 database with the assistance of the XML Extender. This essay contains a theoretical and a practical part. In the theoretical part we describe XML and the XML Extender and how they may be used; the theory will make the practical part more interesting and comprehensible for the reader.

Today's users of business applications tend to share data and face problems with replicating, transforming, exporting and saving it. The consequence is lost data or, at best, a time-consuming effort by the user to ensure that the data remains consistent. XML (eXtensible Markup Language) was developed to solve this kind of problem by capturing not only the data but also its structure. Different applications can then share data without having to transform it between specific formats.

1.2 Problem definition
In this essay we study the powerful combination of XML and a DBMS. Our problem definition is:
• How can XML data be stored in and retrieved from a DB2 database with the assistance of the XML Extender?
To answer this question we intend to carry out practical studies that will result in a database application. This will increase our understanding of the subject and make it easier for us to write a good essay.

1.3 Target group
Students with basic knowledge of computer and database science, as well as others interested in these areas.

1.4 Method
We have searched books and the Internet for relevant information on the subject, and we will also carry out practical studies. The software we will use is:
• IBM DB2 version 6.1 to store the data.
• Windows NT 4 Server and Workstation.
• HomeSite 4.
• XML Extender.
• Internet Explorer 5.0.
1.5 Background
The Web is familiar to most people. The idea behind this revolutionary innovation was that researchers who were not in the same physical location would be able to write documents together. This resulted in the three founding ideas of the Web:
• the HyperText Markup Language (HTML), used to write documents and make it possible to display and edit them;
• the HyperText Transfer Protocol (HTTP), used to transfer documents from one place to another;
• the Uniform Resource Locator (URL), used to address documents and other resources so that they can be referenced from HTML documents.
Together these made it possible to provide e-commerce, intranets and Web browsing.

The drawback of HTML, HTTP and URLs was that they were insufficient for designing large-scale Web applications, and for this purpose the Web needed a new language. The many markup requirements of next-generation Web applications called for a universal format for encoding and distributing data that could be understood by any machine on the Internet. HTML appeared to fulfil these demands, but it could not impose a specific interpretation on the data it contained. HTML describes how a Web page should look, not what data it represents. Much effort went into constructing extensions to HTML, but it became evident that a new language was needed. One of these efforts was to combine HTML with the syntactic flexibility of SGML (Standard Generalized Markup Language). SGML was considered for this purpose because it is a powerful and complex international standard expressly designed for defining special-purpose markup languages.

However, the Web community considered SGML far too complex for easy Web application development. SGML was also burdened with a number of features that made it inflexible in a distributed environment like the Web. What was needed was a language with SGML's ability to define new languages and HTML's portability on the Web. The result was XML. Like HTML, XML was designed as an SGML application, but XML was also designed to use a single universal format to support the syntax of different applications. Furthermore, XML was designed to be much simpler than SGML, and in particular to be used in distributed environments like the Web, where data marked up in XML may be far from the machine it came from. Another design goal was that the specification should make it easy for developers to write simple XML processing software. The XML specification was concise and to the point: only 30 pages long, compared with 360 pages for the HTML specification.
2. XML

2.1 Introduction to XML
As mentioned earlier, XML stands for eXtensible Markup Language. It is a text-based format for documents containing structured information. It describes data better than its predecessors because it is extensible. HTML, for example, only describes how data should be displayed, not what it contains. In XML you can define your own tags, for example <artist>name</artist>, which makes searches for a certain document on the Web more specific.

Consider an example: a user who is interested in music wants to search the Web for his or her favourite group. If the search were done on HTML pages, the search terms Britney Spears + artist would return all pages that contain both the words Britney Spears and the word artist. With XML, the <artist> tag makes the search more precise, because only elements describing music artists named Britney Spears would match.

<?xml version="1.0" encoding="UTF-8"?>
<menu date="24feb2000">
  <artist>
    <fname>Britney</fname>
    <lname>Spears</lname>
    <age>17</age>
  </artist>
</menu>
Fig 2.1

Another advantage of XML is data interchange. XML is independent of any particular application, so data can be transferred between different applications without transformation.
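The tag-based search described above can be sketched with Python's standard `xml.etree.ElementTree` module. This is a minimal illustration, not part of the essay's DB2 application. It assumes Fig 2.1 is written with straight quotes so that it is well-formed. The function name `find_artists` is hypothetical. The search looks only inside `<artist>` elements, so a page that merely contains the word "artist" would not match.

```python
import xml.etree.ElementTree as ET

# Fig 2.1, written with straight quotes so that it is well-formed XML.
FIG_2_1 = """<?xml version="1.0" encoding="UTF-8"?>
<menu date="24feb2000">
  <artist>
    <fname>Britney</fname>
    <lname>Spears</lname>
    <age>17</age>
  </artist>
</menu>"""

def find_artists(xml_text, last_name):
    """Return (first name, last name) pairs of <artist> elements whose <lname> matches."""
    # Encode to bytes so the parser honours the encoding named in the declaration.
    root = ET.fromstring(xml_text.encode("utf-8"))
    matches = []
    for artist in root.iter("artist"):          # only elements tagged <artist>
        lname = (artist.findtext("lname") or "").strip()
        if lname == last_name:
            fname = (artist.findtext("fname") or "").strip()
            matches.append((fname, lname))
    return matches

print(find_artists(FIG_2_1, "Spears"))  # [('Britney', 'Spears')]
```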
2.2 XML declarations

2.2.1 Syntax
An XML document consists of elements and attributes. A document uses elements and attributes to express its semantics, so it can be edited and processed with the same tools that are used to process any other XML document. The characters used in names are traditionally divided into letters, digits and other characters. XML elements usually start with a start tag and end with an end tag. The information stored between the markup is called character data.

2.2.2 Markup
Markup is part of the logical structure of an XML document. In XML, markup is denoted by text surrounded by left and right angle brackets. As mentioned in section 2.1, the element <artist> starts with the start tag <artist> and ends with the end tag </artist>. There are several kinds of markup:
• Public markup, which adds transfer information so that data can be passed on to downline publishers. This information includes the content of the added information, searchable terms that may be used for indexing, a unique identifier, an expiration date and an action to be taken by the publisher. The action may be create, kill or update.
• Private inbound markup, which is not forwarded to any downline publishers but includes account information, comments and action-specific claims such as sort.
• Private outbound markup, which includes status, authorisation, version, price and positioning information as well as warning and confirmation messages.
[ http://www.xml.com/pub/r/AD_Markup ]

2.2.3 Structure
An XML document consists of two main parts: the prolog and the document element. The prolog provides general information about the document but does not itself represent the document's actual structure. In the example in Fig 2.1, the prolog consists of the markup <?xml version="1.0" encoding="UTF-8"?>. The prolog may also contain a document type declaration, comments and processing instructions.

The prolog in the example indicates which version of XML the document conforms to. The declaration encoding="UTF-8" shows that the document is encoded in UTF-8, an encoding of the Unicode/ISO 10646 character set. The XML declaration is optional but recommended, because it is useful to know which character encoding was used to create the binary representation of the document. Declaring the version also helps ensure that the document can still be processed correctly when later versions of XML appear.
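The information in the XML declaration can be read programmatically. The sketch below uses the `XmlDeclHandler` callback of Python's standard `xml.parsers.expat` module, which reports the version, the encoding and a standalone flag. The flag is -1 when no standalone clause is given. The helper name `read_xml_declaration` is hypothetical and only serves to illustrate what the prolog carries.

```python
import xml.parsers.expat

def read_xml_declaration(data: bytes):
    """Return (version, encoding, standalone) from the XML declaration, or None if there is none."""
    found = {}

    def on_declaration(version, encoding, standalone):
        # Called once by expat when it parses <?xml ... ?> at the start of the prolog.
        found["decl"] = (version, encoding, standalone)

    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = on_declaration
    parser.Parse(data, True)  # True: this is the final (and only) chunk of input
    return found.get("decl")

print(read_xml_declaration(b'<?xml version="1.0" encoding="UTF-8"?><menu date="24feb2000"/>'))
# ('1.0', 'UTF-8', -1)
```

A document without a declaration still parses, and the helper then returns `None`. This matches the text's point that the declaration is optional.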
2.2.4 Elements
Elements are the most common kind of markup. Their task is to identify the content they surround. Like other markup, an element begins with a start tag <element> and ends with an end tag </element>.

2.2.5 Attributes
Attributes are used to associate name-value pairs with elements. Attribute specifications may only appear within start tags and empty-element tags. [http://www.w3.org/TR/REC-xml#attdecls ] Attribute-list declarations may be used to define the set of attributes that pertain to a given element type. In Fig 2.1, for example, the date attribute of the menu element holds the date of the menu. If attributes are declared for an element type that is not itself declared, the XML processor may issue a warning. When more than one definition is provided for the same attribute of a given element type, the first declaration is binding and later declarations are ignored. There are three kinds of attribute types: a string type, a set of tokenized types and enumerated types.

2.2.6 Comments
XML provides the ability to add comments to the markup or character data of a document. The comments themselves are not part of the document's character data. Comments start with <!--, as in HTML, and end with -->. Comments may appear anywhere in a document outside other markup.

<?xml version="1.0"?>
<!-- Comment -->
<body>
  <title>Example 2</title>
</body>
Fig 2.2
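The rules for attributes and comments can be checked with Python's standard `xml.etree.ElementTree`. The sketch below shows three things. First, an attribute is read from its element's start tag. Second, a comment outside the document element does not become part of the parsed data. Third, a comment closed with "-- >" (a space before ">") is rejected, because "--" may not occur inside an XML comment. The helper `is_well_formed` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text: str) -> bool:
    """True if the text parses as XML; ElementTree raises ParseError otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Fig 2.2, with the comment closed by "-->" (no space before ">").
FIG_2_2 = """<?xml version="1.0"?>
<!-- Comment -->
<body>
  <title>Example 2</title>
</body>"""

body = ET.fromstring(FIG_2_2)
print(body.tag, body.findtext("title"))   # body Example 2  (the comment is not in the tree)

menu = ET.fromstring('<menu date="24feb2000"><artist/></menu>')
print(menu.get("date"))    # 24feb2000
print(menu.get("price"))   # None: no such attribute in the start tag

print(is_well_formed("<a><!-- Comment --></a>"))   # True
print(is_well_formed("<a><!-- Comment -- ></a>"))  # False: "--" is not allowed inside a comment
```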
2.2.7 Tree structure
An XML document has a tree structure with a single root node, the document element. To show this, we once again use the example from Fig 2.1:

menu
  artist
    fname
    lname
    age
Fig 2.3

2.3 DTD (Document Type Definition)
XML is defined as a meta-language, which means that XML describes other markup languages. You can design and build your own markup language for a certain purpose. If you work in the music industry, for example, you might want to create a markup language containing tags associated with music. Such markup languages are defined in a special document called a DTD, which separates the structure of the document from the actual data. A DTD declares the elements, attributes and entities of a specific document type and describes how they relate to each other. A DTD sets up rules that may, for example, say two things. First, a <document> must have a <title> and one or more <customer> elements. Second, each <customer> must have exactly one <id>, exactly one <address> and one or more <phone_number> elements.

A DTD can be stored in two ways. It can be included in the same file as the document it describes, or it can be linked from an external URL. In the latter case, the DTD can be shared by different documents and Web sites. This is useful when, for example, publishers want a certain format from customers to facilitate book layout and production. Whole organisations or even industries can agree on a specific DTD. In that way they can exchange documents over the Web and be assured of receiving them in a well-known format that they can all read.
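Both ideas in this section, the single-rooted tree and a DTD rule for a <customer>, can be made concrete in Python. Python's standard library does not validate documents against a DTD. The sketch therefore does not use a DTD. It writes the example rule as the content model `<!ELEMENT customer (id, address, phone_number+)>` (an assumption based on the rule in the text) and mimics that model with a regular expression over the sequence of child tag names. The names `tree_lines` and `customer_is_valid` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

FIG_2_1 = """<menu date="24feb2000">
  <artist>
    <fname>Britney</fname>
    <lname>Spears</lname>
    <age>17</age>
  </artist>
</menu>"""

def tree_lines(elem, depth=0):
    """Return one indented line per element, walking depth-first from the root."""
    lines = ["  " * depth + elem.tag]
    for child in elem:
        lines.extend(tree_lines(child, depth + 1))
    return lines

# Hypothetical DTD content model for the rule in the text:
#   <!ELEMENT customer (id, address, phone_number+)>
# The regex mimics it: exactly one id, then exactly one address, then one or more phone_number.
CUSTOMER_MODEL = re.compile(r"id address( phone_number)+")

def customer_is_valid(customer):
    sequence = " ".join(child.tag for child in customer)
    return CUSTOMER_MODEL.fullmatch(sequence) is not None

print("\n".join(tree_lines(ET.fromstring(FIG_2_1))))   # menu / artist / fname, lname, age

good = ET.fromstring("<customer><id>7</id><address>Main St</address>"
                     "<phone_number>123</phone_number><phone_number>456</phone_number></customer>")
no_phone = ET.fromstring("<customer><id>7</id><address>Main St</address></customer>")
print(customer_is_valid(good), customer_is_valid(no_phone))  # True False
```

A real validating parser would read the DTD itself. The regex only illustrates what a content model expresses: the permitted order and number of child elements.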
2.4 XSL (eXtensible Stylesheet Language)
The separation between the description and the presentation of data is one of the most essential principles of XML. XSL is XML's counterpart to HTML's Cascading Style Sheets (CSS), but gives far more detailed control over appearance. Every XML document can be attached to an XSL style sheet that describes how individual elements should be formatted and displayed. XSL builds on the Document Style Semantics and Specification Language (DSSSL) and on CSS; it is simpler than DSSSL and much more powerful than CSS. One useful aspect of XSL is its support for JavaScript, which allows XSL to provide more complex and dynamic behaviour.

An XSL style sheet is itself an XML document. It contains elements that describe how the tags of the XML documents associated with it should be converted. As an XML document is read, its tags are transformed into objects, either HTML or DSSSL objects. The XSL processor saves these objects, which contain marked-up text, in an HTML file or a DSSSL file. The file can then be viewed by opening it in a Web browser.
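Python's standard library has no XSL processor, so the sketch below does not use XSL. It hand-codes the kind of per-tag formatting rules an XSL style sheet would declare and applies them to Fig 2.1 to produce HTML. This shows the separation of data and presentation described above. The rule table `RULES` and the function `to_html` are hypothetical, and only the element text is used (tail text is ignored).

```python
import xml.etree.ElementTree as ET
from html import escape

FIG_2_1 = """<menu date="24feb2000">
  <artist>
    <fname>Britney</fname>
    <lname>Spears</lname>
    <age>17</age>
  </artist>
</menu>"""

# Hypothetical presentation rules, one per XML tag: (transformed content, element) -> HTML.
RULES = {
    "menu":   lambda inner, e: f"<html><body><h1>Menu {escape(e.get('date', ''))}</h1>{inner}</body></html>",
    "artist": lambda inner, e: f"<ul>{inner}</ul>",
    "fname":  lambda inner, e: f"<li>First name: {inner}</li>",
    "lname":  lambda inner, e: f"<li>Last name: {inner}</li>",
    "age":    lambda inner, e: f"<li>Age: {inner}</li>",
}

def to_html(elem):
    """Transform children first, then wrap the result using the rule for this element's tag."""
    inner = escape((elem.text or "").strip())
    inner += "".join(to_html(child) for child in elem)
    rule = RULES.get(elem.tag, lambda inner, e: inner)  # tags without a rule pass content through
    return rule(inner, elem)

print(to_html(ET.fromstring(FIG_2_1)))
```

Changing only `RULES` changes the presentation while the XML data stays the same. This is the role the text assigns to an XSL style sheet.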
3. XML Extender

3.1 Background
Today the computer industry offers a large number of applications, each designed and built for a specific purpose, and users can choose the application that suits their needs. This breadth of choice does not only have advantages. As mentioned in the introduction, many applications today need to share data. Users then face problems replicating, transforming, exporting or saving their data in a different format that can be imported into another application. Business applications are especially vulnerable because of the transformation process. It may result in lost data, or at least consume some of the user's valuable time ensuring that the data is consistent. To overcome these problems you can write Open Database Connectivity (ODBC) applications with XML interchange utilities. You can then save the data in a database management system without having to worry about transformation, because you use XML, the standard for data interchange. For this purpose IBM has developed the DB2 XML Extender.

3.2 Introduction to the XML Extender
The XML Extender makes it possible to combine the power of IBM's DB2 Universal Database (DB2 UDB) with the flexibility of XML. Depending on the application, you can use the XML Extender in one of two ways. You can store entire XML documents in DB2 as a non-traditional, user-defined data type, or you can map the XML content to traditional data in relational tables. The mapping function is used not only for storing decomposed XML documents but also for composing XML documents from data in existing tables. For the non-traditional XML data type, the XML Extender adds the ability to search XML element or attribute values of rich data types. This comes in addition to the structural text search that the DB2 UDB Text Extender provides. The XML Extender's ability to index XML documents makes this powerful search possible. Such indexing improves the usability of a Web site containing a large amount of readable information, such as articles.
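The idea of indexing element values inside stored XML documents can be sketched with Python's built-in `sqlite3` module. Here SQLite stands in for DB2, and no XML Extender feature is used. Whole documents are kept intact in one table. When each document is stored, its `<lname>` values are copied into a separate, indexed table, so a search on artist surname does not have to parse every document. The table and function names are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

def open_db():
    conn = sqlite3.connect(":memory:")  # SQLite stands in for DB2 in this sketch
    conn.execute("CREATE TABLE menu_docs (id INTEGER PRIMARY KEY, doc TEXT NOT NULL)")
    # Hypothetical index table: one row per <lname> value found in a stored document.
    conn.execute("CREATE TABLE lname_index (doc_id INTEGER REFERENCES menu_docs(id), lname TEXT)")
    conn.execute("CREATE INDEX lname_idx ON lname_index (lname)")
    return conn

def store(conn, xml_text):
    """Store the intact document, then record its <lname> values in the index table."""
    doc_id = conn.execute("INSERT INTO menu_docs (doc) VALUES (?)", (xml_text,)).lastrowid
    for lname in ET.fromstring(xml_text).iter("lname"):
        conn.execute("INSERT INTO lname_index VALUES (?, ?)", (doc_id, (lname.text or "").strip()))
    return doc_id

def find_by_lname(conn, lname):
    """Return the stored documents whose index entries match the given surname."""
    rows = conn.execute(
        "SELECT d.doc FROM menu_docs d JOIN lname_index i ON i.doc_id = d.id WHERE i.lname = ?",
        (lname,))
    return [doc for (doc,) in rows]

conn = open_db()
store(conn, '<menu date="24feb2000"><artist><fname>Britney</fname><lname>Spears</lname></artist></menu>')
print(find_by_lname(conn, "Spears"))
```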
3.3 Data types in the XML Extender

3.3.1 DAD
DAD is short for Document Access Definition. A DAD specifies how structured XML documents are stored or created. The DAD is itself an XML-formatted document. It associates XML documents with a DB2 database through two major access and storage methods: XML columns and XML collections. An XML column contains intact XML documents. An XML collection is a set of relational tables containing data that is mapped to an XML document and from which XML documents can be composed. When you use an XML collection you can take advantage of the relational capabilities of DB2, which increases the flexibility of sharing data among applications.

[Figure: an XML document (<?xml> <!DOCTYPE..> <Order key="1"> </Order>) stored intact in a DB2 table column]
Fig 3.1 Storing structured XML documents in a DB2 table column.
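The XML collection method can be illustrated by decomposing an <Order> document into relational rows and composing it back. This is a minimal sketch under stated assumptions. SQLite (Python's `sqlite3`) stands in for DB2. The mapping is hard-coded in the table layout and comments rather than written in the real DAD format. The child elements `<Customer>` and `<Part>`, the tables `orders` and `order_parts`, and the function names are all hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

def create_collection(conn):
    # Hypothetical mapping (plays the role a DAD plays for an XML collection):
    #   <Order key="...">     -> orders.order_key
    #   <Customer> (one)      -> orders.customer
    #   <Part> (one or more)  -> order_parts.part, linked by order_key
    conn.execute("CREATE TABLE orders (order_key TEXT PRIMARY KEY, customer TEXT)")
    conn.execute("CREATE TABLE order_parts (order_key TEXT REFERENCES orders(order_key), part TEXT)")

def decompose(conn, xml_text):
    """Store an <Order> document as rows in the collection tables."""
    order = ET.fromstring(xml_text)
    key = order.get("key")
    conn.execute("INSERT INTO orders VALUES (?, ?)", (key, order.findtext("Customer")))
    conn.executemany("INSERT INTO order_parts VALUES (?, ?)",
                     [(key, part.text) for part in order.findall("Part")])

def compose(conn, key):
    """Rebuild an <Order> document from the relational rows."""
    customer = conn.execute("SELECT customer FROM orders WHERE order_key = ?", (key,)).fetchone()[0]
    order = ET.Element("Order", key=key)
    ET.SubElement(order, "Customer").text = customer
    for (part,) in conn.execute(
            "SELECT part FROM order_parts WHERE order_key = ? ORDER BY rowid", (key,)):
        ET.SubElement(order, "Part").text = part
    return ET.tostring(order, encoding="unicode")

conn = sqlite3.connect(":memory:")
create_collection(conn)
decompose(conn, '<Order key="1"><Customer>Ada</Customer><Part>bolt</Part><Part>nut</Part></Order>')
print(compose(conn, "1"))
```

Once the order is stored as rows, ordinary SQL (for example, counting parts per order) works on it directly. This is the relational advantage of collections that the text mentions.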
[Figure: an XML document (<?xml> <!DOCTYPE..> <Order key="1"> </Order>) mapped to a collection of DB2 tables]
Fig 3.2 Storing documents as data in a collection of DB2 tables.

UDT and UDF
For XML columns the XML Extender provides UDTs (user-defined types). UDTs are data types that are not built into the database but are defined by the user; they identify the storage type of the XML document in the application table. UDFs (user-defined functions) are associated with the XML UDTs and are mainly used for storing and retrieving XML documents in XML columns.
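The role of a UDF in querying an XML column can be sketched with Python's `sqlite3` module. Its `create_function` method registers a Python function so that it can be called from SQL. SQLite has no user-defined types, so in this sketch the XML column is plain TEXT. The function `extract_text` is hypothetical and is not one of the XML Extender's UDFs. It returns the text of the first element matching an ElementTree path inside the stored document.

```python
import sqlite3
import xml.etree.ElementTree as ET

def extract_text(doc, path):
    """Hypothetical UDF: text of the first element matching `path` in `doc`, or None."""
    if doc is None:
        return None
    return ET.fromstring(doc).findtext(path)

conn = sqlite3.connect(":memory:")
# The XML column: whole documents stored intact (plain TEXT, since SQLite has no UDTs).
conn.execute("CREATE TABLE menus (id INTEGER PRIMARY KEY, menu_doc TEXT)")
conn.create_function("extract_text", 2, extract_text)  # name used in SQL, argument count, callable

conn.execute("INSERT INTO menus (menu_doc) VALUES (?)",
             ('<menu date="24feb2000"><artist><fname>Britney</fname>'
              '<lname>Spears</lname></artist></menu>',))

# Retrieve documents by an element value, using the UDF inside the SQL query.
rows = conn.execute(
    "SELECT id FROM menus WHERE extract_text(menu_doc, 'artist/lname') = 'Spears'").fetchall()
print(rows)  # [(1,)]
```

Unlike the indexing approach, this parses every stored document at query time. That is simple, but slower for large tables.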