DB2 pureXML® Podcast Series © IBM 2008

Part 2. Getting Started with pureXML in DB2 9 for z/OS

Hello, my name is Guogen Zhang. I'm a Senior Technical Staff Member at IBM Silicon Valley Lab, responsible for the delivery of pureXML in the DB2 for z/OS server. This is part 2 of the pureXML podcast, titled "Getting started with pureXML". In part 1, we learned about the benefits and business value of pureXML. In this part, we will talk about how you can actually start trying out the basic pureXML features.

First, what are the prerequisites for pureXML? For basic XML support, you need DB2 9 for z/OS in New Function Mode, running on z/OS release 1.8 or later, or on z/OS release 1.7 with APAR OA16303, which provides high-performance XML parsing. Note that this is a z/OS APAR, not a DB2 APAR. Schema support needs some additional setup, which we will talk about in a separate part.

Once you get to DB2 9 New Function Mode, you can start using the XML features. A few zparms, or installation parameters, may help. The first two are for virtual storage: XMLVALA and XMLVALS. XMLVALA limits the virtual storage that each thread can consume for XML processing; it defaults to 200MB. XMLVALS limits the virtual storage consumed by XML processing for the entire subsystem in the DBM1 address space; its default is 10GB. They are very similar to LOBVALA and LOBVALS. Both LOBVALA and LOBVALS also affect the XML document size during bind-in and bind-out, because DB2 leverages the LOB Manager for bind-in and bind-out of XML documents. However, since DB2 uses a streaming technique for inserting XML data, you will be able to insert a document that is larger than the zparm limit. For SELECT, you don't have that luck.

Another important zparm is TBSBPXML, the default buffer pool for XML data, which defaults to BP16K0. You can change this default to another 16K-page buffer pool. By default, DB2 will use this buffer pool for XML table spaces.
You can alter a specific table space to use a different buffer pool. If you authorize people to create tables with XML columns, you also need to grant them the USE privilege on these buffer pools.

When you start using XML and XPath, you need to pay attention to two important things. The first is the terminal session: it's always good practice to set the terminal session CCSID to be consistent with the database encoding scheme. Otherwise, some characters may not be recognized by the DB2 compiler. Second, XML and XPath are case-sensitive. So if you use SPUFI or just JCL, you need to pay attention to the case of the letters. For example, you need to set CAPS OFF or CASE MIXED in your editor session. Otherwise, DB2 will complain.

Now let's turn to the second subject: getting to know the XML objects. Let's assume you create a table PURCHASEORDERS with PURCHASEORDERNUMBER, a VARCHAR column; PODATE, a DATE column; STATUS, a CHAR(1) column; and XMLPO, an XML column for the purchase order. Behind the scenes, DB2 will create a base table for the regular columns and an XML table for the XML column. In the base table, DB2 will also add a hidden DOCID column, which is a BIGINT identity column, and create a DOCID index on that column. The XML table, which resides in its own XML table space, has three columns: DOCID, MIN_NODEID, and XMLDATA. The XMLDATA column holds the bulk of the internal XML data, that is, trees of nodes. In addition, DB2 creates a NODEID index on the internal XML table; this index contains data extracted from the XMLDATA column.

The names of the implicitly created objects are generated by DB2 following a naming convention. Users cannot specify them. Other attributes either take default values or are inherited according to certain rules, and some of them can be changed using the ALTER statement. Although all the implicit objects are visible in the catalog, they are transparent to applications accessing XML data. In fact, most DML statements cannot directly manipulate these implicit objects. Suffice it to say that an XML table space is a regular UTS, that is, a universal table space in V9, and that it does not have a CCSID associated with it because it contains binary data. The character data inside is in UTF-8 format.

Now let's shift gears to the third subject: how to manipulate XML data, including insert, update, delete, select, load, and unload. Once you've created the PURCHASEORDERS table successfully, you can insert XML data using the INSERT statement. You have a lot of flexibility in how you hold the XML data for the insert. For example, you can use one of the newly introduced XML host variable types, that is, XML AS CLOB, XML AS BLOB, or XML AS DBCLOB. Or you can use any existing host variable type, as long as it can hold the purchase order data you'd like to insert. Or you can put the purchase order as a string literal in the INSERT statement itself. Although DB2 accepts XML data in any encoding it supports, we recommend that you use UTF-8 if possible, with a binary data type, to avoid any encoding conversion overhead. During the insertion, DB2 will parse the XML data and convert it into the internal format. By default, it will strip insignificant whitespace. You can also use LOAD to get XML data in.
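
To make the table, its implicit objects, and the string-literal INSERT described above more concrete, here is a minimal sketch in SQL. The VARCHAR length, the authorization ID PODEV, the order number, and the element names inside the sample purchase order are all made up for illustration. The catalog query assumes the DB2 9 catalog table SYSIBM.SYSXMLRELS, which relates an XML column of a base table to its implicitly created XML table.

```sql
-- Let a (hypothetical) developer ID use the default XML buffer pool,
-- so that tables with XML columns can be created under that ID.
GRANT USE OF BUFFERPOOL BP16K0 TO PODEV;

-- The example table from the text. VARCHAR(20) is an assumed length.
-- DB2 implicitly adds the hidden DOCID column, the DOCID index,
-- the XML table space, the XML table, and the NODEID index.
CREATE TABLE PURCHASEORDERS
  (PURCHASEORDERNUMBER VARCHAR(20) NOT NULL,
   PODATE              DATE,
   STATUS              CHAR(1),
   XMLPO               XML);

-- Look up the generated name of the XML table behind the XMLPO column
-- (assumes the SYSIBM.SYSXMLRELS catalog table).
SELECT COLNAME, XMLTBOWNER, XMLTBNAME
  FROM SYSIBM.SYSXMLRELS
 WHERE TBNAME = 'PURCHASEORDERS';

-- Insert a purchase order as a string literal. DB2 parses it and,
-- by default, strips insignificant whitespace.
-- The document structure is illustrative only.
INSERT INTO PURCHASEORDERS
  (PURCHASEORDERNUMBER, PODATE, STATUS, XMLPO)
VALUES
  ('PO-1001', CURRENT DATE, 'A',
   '<purchaseOrder><customer>Acme</customer><items><item><partNo>100-A</partNo><quantity>2</quantity></item></items></purchaseOrder>');
```

Because XML is case-sensitive, a document typed with CAPS ON would store element names such as PURCHASEORDER, and a later XPath that uses purchaseOrder would not match them.
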
DB2 LOAD supports the traditional data set of records for XML data up to 32KB in size. For larger documents, you need to use the new feature of loading from file references.

An XML column does not have a size limit. However, you can bind in and bind out a document of at most 2GB, which is the size limit of a LOB, because DB2 reuses the LOB Manager for XML bind-in and bind-out. Internally there is no architectural limit on the XML document size.

Once you have successfully inserted or loaded XML data, you can use SELECT to get the XML back. You can use a simple SELECT, or use predicates to filter the data. A predicate can be a regular relational predicate or an XML predicate. The only significant XML predicate is XMLEXISTS, which looks like a function. You can embed an XPath expression in XMLEXISTS as an argument, so you can write sophisticated predicates that filter based on XML data. A value of the XML data type cannot be compared directly with a value of any other data type, or even with another XML value, using the regular comparison operators. You can use IS NULL or IS NOT NULL in addition to XMLEXISTS. XML data cannot be sorted. Therefore, it cannot be used in DISTINCT, GROUP BY, or ORDER BY clauses, or in UNION. UNION ALL is fine for XML.

If you want to select a whole document, you just need to put an XML column, such as XMLPO, in the SELECT clause, and you will get serialized XML data back in your application. If you want to extract pieces out of your documents, you need to use the XMLQUERY function. XMLQUERY is a scalar function. Its syntax is very similar to that of XMLEXISTS: you embed an XPath expression as an argument, specifying what you want to extract. Pay attention to the differences between XMLQUERY and XMLEXISTS. Typically, XMLQUERY is used in the SELECT clause, and XMLEXISTS is used in the WHERE clause to specify the filter condition on XML data for the rows to be returned. If you use XMLQUERY alone, without XMLEXISTS, you will get all the rows back, even if the XMLQUERY result is empty or NULL. If you don't want to see all these null values in the result, you need to put the same or a similar XPath expression in XMLEXISTS in the WHERE clause.

Similar to bind-in, you can bind out XML into an XML host variable or into other types, as long as they can hold the string value. Truncation of XML data will result in an error. Furthermore, you can bind out XML data in an encoding other than UTF-8. As with bind-in, we recommend using UTF-8 to avoid conversion overhead and possible data loss during conversion.

If you want to construct new documents based on existing data, you need to use the XML constructor functions together with the XMLQUERY function. For more sophisticated processing, you will need the XMLTABLE function, which is an SQL built-in table function that is being delivered after V9 GA. We will talk about that in a separate part.

You can apply UPDATE and DELETE statements to tables with XML columns. DB2 will treat XML columns just like other regular columns. In DB2 9, each XML document is pretty much a unit for operations.
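
The SELECT patterns described above can be sketched as follows. This is an illustration under assumptions, not a reference: the element names (purchaseOrder, items, item, partNo, quantity) are hypothetical, and the XML column is passed as the XPath context item with a plain PASSING clause.

```sql
-- Whole documents: XMLEXISTS in the WHERE clause filters the rows,
-- combined here with a regular relational predicate.
SELECT PURCHASEORDERNUMBER, XMLPO
  FROM PURCHASEORDERS
 WHERE STATUS = 'A'
   AND XMLEXISTS('/purchaseOrder/items/item[quantity > 1]'
                 PASSING XMLPO);

-- Pieces of documents: XMLQUERY in the SELECT clause.
-- Without XMLEXISTS, every row comes back, including rows
-- whose XMLQUERY result is empty.
SELECT PURCHASEORDERNUMBER,
       XMLQUERY('/purchaseOrder/items/item[quantity > 1]/partNo'
                PASSING XMLPO) AS PARTS
  FROM PURCHASEORDERS;

-- Same extraction, with the same XPath condition repeated in XMLEXISTS
-- so that only rows with a non-empty result are returned.
SELECT PURCHASEORDERNUMBER,
       XMLQUERY('/purchaseOrder/items/item[quantity > 1]/partNo'
                PASSING XMLPO) AS PARTS
  FROM PURCHASEORDERS
 WHERE XMLEXISTS('/purchaseOrder/items/item[quantity > 1]'
                 PASSING XMLPO);
```

The second and third queries differ only in the WHERE clause, which is the practical difference between the two functions: XMLQUERY decides what is returned for each row, and XMLEXISTS decides which rows are returned.
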
For update, you need to prepare a new document, and then use the SET clause in the UPDATE statement to replace the existing document. Delete will delete a whole document. Sub-document update is not supported in V9. You can also use the XMLEXISTS predicate, together with other relational predicates, to search for the documents of interest in UPDATE and DELETE.

In addition to SELECT, you can use UNLOAD to get data out. You can use traditional data sets to unload the XML data together with other columns into data records. This will limit the XML document size to 32KB. For larger documents, you need to use UNLOAD with file references, where each document will be in its own file.

Two other important functions related to XML are XMLPARSE and XMLSERIALIZE. Use XMLPARSE if you want to preserve whitespace in a document during insertion. The argument of the XMLPARSE function must be of a non-XML type.
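
As a rough sketch of the update, delete, and XMLPARSE points above, the statements below replace one whole document and delete others by an XML condition. The order number, status codes, and element names are hypothetical and only serve the illustration.

```sql
-- Replace the whole document for one order. There is no sub-document
-- update in V9, so the SET clause supplies a complete new document.
-- XMLPARSE with PRESERVE WHITESPACE keeps the whitespace that the
-- default parsing would strip.
UPDATE PURCHASEORDERS
   SET STATUS = 'S',
       XMLPO  = XMLPARSE(DOCUMENT
                  '<purchaseOrder> <customer>Acme</customer> <items/> </purchaseOrder>'
                  PRESERVE WHITESPACE)
 WHERE PURCHASEORDERNUMBER = 'PO-1001';

-- Delete whole documents, choosing the rows with a relational predicate
-- plus an XMLEXISTS predicate on the XML column.
DELETE FROM PURCHASEORDERS
 WHERE STATUS = 'C'
   AND XMLEXISTS('/purchaseOrder/customer[. = "Acme"]'
                 PASSING XMLPO);
```

The argument of XMLPARSE here is a character string literal; a host variable of a string or LOB type would serve the same purpose in an application.
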
XMLSERIALIZE will convert an XML value into a LOB, so you can use the LOB interface to get XML data out. It also lets you control whether or not the serialized XML data is prefixed with an XML declaration. But again, you need to pay attention to the result size, as truncation of XML data will be an error.

We've talked about the basics of starting to use XML in DB2 9. I hope you can try it, maybe starting with some samples. If you have any questions or feedback, please send email to db2zxml@us.ibm.com. See you next time.
