Querying XML Data in DB2

  1. 1. Native XML Support in DB2 Universal Database. Matthias Nicola, Bert van der Linden, IBM Silicon Valley Lab. Presented by Mo Liu, Joseph Frate, and John Russo. Some material in the talk is adapted from the slides of this paper’s conference talk.
  2. 2. Agenda <ul><li>What is DB2 9 (Viper)? </li></ul><ul><li>Native XML in the forthcoming version of DB2 </li></ul><ul><li>Native XML Storage </li></ul><ul><li>XML Schema Support </li></ul><ul><li>XML indexes </li></ul><ul><li>Querying XML data in DB2 </li></ul><ul><li>Summary </li></ul>
  3. 3. What is DB2 9 (Viper)? <ul><li>IBM DB2 9 is the next-generation hybrid data server with optimized management of both XML and relational data. </li></ul><ul><li>IBM extended DB2 to include: </li></ul><ul><ul><li>New storage techniques for efficient management of hierarchical structures inherent in XML documents. </li></ul></ul><ul><ul><li>New indexing technology </li></ul></ul>
  4. 4. <ul><li>New query language support (for XQuery), a new graphical query builder (for XQuery), and new query optimization techniques </li></ul><ul><li>New support for validating XML data based on user-supplied schemas </li></ul><ul><li>New administrative capabilities, including extensions to key database utilities </li></ul><ul><li>Integration with popular application programming interfaces (APIs) </li></ul>
  5. 5. XML Databases <ul><li>XML-enabled Databases </li></ul><ul><li>The core data model is not XML (but e.g. relational) </li></ul><ul><li>Mapping between XML data model and DB’s data model is required, or XML is stored as text </li></ul><ul><li>E.g.: DB2 XML Extender v8 </li></ul><ul><li>Native XML Databases </li></ul><ul><li>Use the hierarchical XML data model to store and process XML internally </li></ul><ul><li>No mapping, no storage as text </li></ul><ul><li>Storage format = processing format </li></ul><ul><li>E.g.: Forthcoming version of DB2 </li></ul>
  6. 6. XML in Relational Databases – Today's Challenge <ul><li>Today’s Challenge: </li></ul><ul><li>XML must be force fit into relational data model – 2 choices </li></ul><ul><li>1. Shredding or decomposing </li></ul><ul><li>− Mapping from XML to relational often too complex </li></ul><ul><li>− Loses hierarchical dependencies </li></ul><ul><li>− Loses digital signature </li></ul><ul><li>− Often requires dozens or hundreds of tables </li></ul><ul><li>− Difficult to change original XML document </li></ul><ul><li>2. Large Object (BLOB, CLOB, Varchar) </li></ul><ul><li>It allows for fast insert and retrieval of full documents but it needs XML parsing at query execution time. </li></ul><ul><li>− SLOW performance </li></ul><ul><li>− Search performance is slow (must parse at search time) </li></ul><ul><li>− Retrieval of sub-documents is expensive </li></ul><ul><li>− Update inside the document is slow </li></ul><ul><li>− Indexing is inefficient (based on relative position) </li></ul><ul><li>− Difficult to join with relational </li></ul><ul><li>− Costs get worse as document size increases </li></ul>
  7. 7. DB2 Hybrid XML Engine - Overview
  8. 8. Integration of XML & Relational Capabilities in DB2 <ul><li>Native XML data type (not Varchar, not CLOB, not object-relational) </li></ul><ul><li>XML Capabilities in all DB2 components </li></ul><ul><li>Applications combine XML & relational data </li></ul>
  9. 9. Integrating XML and Relational in DB2
  10. 10. DB2 Hybrid XML Engine - Interfaces <ul><li>Data Definition </li></ul><ul><li>create table dept(deptID char(8), deptdoc xml); </li></ul><ul><li>Insert </li></ul><ul><li>insert into dept(deptID, deptdoc) values (?,?) </li></ul><ul><li>Index </li></ul><ul><li>create index xmlindex1 on dept(deptdoc) generate key using xmlpattern '/dept/name' as sql varchar(30); </li></ul><ul><li>Retrieve </li></ul><ul><li>select deptdoc from dept where deptID = ? </li></ul><ul><li>SQL based Query </li></ul><ul><li>select deptID, xmlquery('$d/dept/name' passing deptdoc as "d") from dept where deptID <> 'PR27'; </li></ul><ul><li>XQuery based Query </li></ul><ul><li>for $book in db2-fn:xmlcolumn('BOOKS')/book </li></ul><ul><li>for $entry in db2-fn:xmlcolumn('REVIEWS')/entry </li></ul><ul><li>where $book/title = $entry/title </li></ul><ul><li>return <review> {$entry/review/text()} </review>; </li></ul>
  11. 11. Native XML Storage
  12. 12. Efficient Document Tree Storage
  13. 13. Information for Every Node <ul><li>Tag name, encoded as unique StringID </li></ul><ul><li>A nodeID </li></ul><ul><li>Node kind (e.g. element, attribute, etc.) </li></ul><ul><li>Namespace / Namespace prefix </li></ul><ul><li>Type annotation </li></ul><ul><li>Pointer to parent </li></ul><ul><li>Array of child pointers </li></ul><ul><li>Hints to the kind & name of child nodes </li></ul><ul><li>(for early-out navigation) </li></ul><ul><li>For text/attribute nodes: the data itself </li></ul>
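The per-node fields listed above can be pictured with a small in-memory sketch. This is only an illustration, not DB2's on-disk layout: the `StringTable` and `Node` classes and their field names are hypothetical, and namespaces, type annotations, and the child "hints" are left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Optional


class StringTable:
    """Hypothetical helper: maps each distinct tag name to a small integer StringID."""

    def __init__(self):
        self._ids = {}

    def intern(self, name: str) -> int:
        # Reuse the existing ID, or assign the next free one.
        return self._ids.setdefault(name, len(self._ids))


@dataclass(eq=False)
class Node:
    string_id: Optional[int]          # encoded tag name; None for text nodes
    node_id: tuple                    # position of the node within the document
    kind: str                         # "element", "attribute", "text", ...
    parent: Optional["Node"] = field(default=None, repr=False)
    children: list = field(default_factory=list)
    value: Optional[str] = None       # the data itself, for text/attribute nodes

    def add_child(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child

    def child_elements(self, string_id: int) -> list:
        # Navigation compares integer StringIDs, never tag-name strings.
        return [c for c in self.children
                if c.kind == "element" and c.string_id == string_id]


names = StringTable()
dept = Node(names.intern("dept"), (1,), "element")
emp = dept.add_child(Node(names.intern("employee"), (1, 1), "element"))
name = emp.add_child(Node(names.intern("name"), (1, 1, 1), "element"))
name.add_child(Node(None, (1, 1, 1, 1), "text", value="John Doe"))
```

Encoding tag names as integers keeps each node record small and turns name tests during navigation into integer comparisons.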
  14. 14. XML Node Storage Layout
  15. 15. XML Storage: “Regions Index”
  16. 16. XML Indexes in DB2 <ul><li>Need index support to manage millions of XML documents </li></ul><ul><li>Path-specific value indexes on XML columns to index frequently used elements and attributes </li></ul><ul><li>XML-aware full-text indexing </li></ul>
  17. 17. XML Value Indexes <ul><li>Table DEPT has two fields: “id” and “deptdoc” </li></ul><ul><li>Field “deptdoc” is an XML document: </li></ul><ul><li><dept> </li></ul><ul><li> <employee id="901"> </li></ul><ul><li> <name> John Doe </name> </li></ul><ul><li> <phone> 408 555 1212 </phone> </li></ul><ul><li> <office> 344 </office> </li></ul><ul><li> </employee> </li></ul><ul><li></dept> </li></ul><ul><li>CREATE INDEX idx1 ON DEPT(deptdoc) GENERATE KEY USING XMLPATTERN '/dept/employee/name' AS SQL VARCHAR(35) </li></ul><ul><li>Creates an XML value index on employee name for all documents </li></ul>
  18. 18. XML Value Indexes (continued) <ul><li>“ xmlpattern” identifies the XML nodes to be indexed </li></ul><ul><li>Subset of XPath language </li></ul><ul><ul><li>Wildcards, namespaces allowed </li></ul></ul><ul><ul><li>XPath predicates such as /a/b[c=5] not supported </li></ul></ul><ul><li>“ AS SQL” necessary to define data type, since DB2 does not require single XML schema for all documents in a table (so DB2 may not know data type to use for index) </li></ul>
  19. 19. XML Value Indexes: Data Types <ul><li>Allowed data types for indexes: </li></ul><ul><ul><li>VARCHAR(n) </li></ul></ul><ul><ul><li>VARCHAR HASHED </li></ul></ul><ul><ul><li>DOUBLE </li></ul></ul><ul><ul><li>DATE </li></ul></ul><ul><ul><li>TIMESTAMP </li></ul></ul><ul><li>DB2 index manager enhanced to handle special XML numeric values (e.g., +0, -0, +INF, -INF, NaN) </li></ul>
  20. 20. XML Value Indexes (continued) <ul><li>If a node does not cast to the index type: </li></ul><ul><ul><li>No error is raised </li></ul></ul><ul><ul><li>No index entry is created for that node </li></ul></ul><ul><li>A single document (e.g., the XML field of a single record) may produce 0, 1, or multiple index entries </li></ul><ul><ul><li>Unlike a relational index, which has exactly one entry per row </li></ul></ul>
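The "no cast, no entry" rule can be sketched in a few lines of Python. This is a toy model, not DB2 code: `double_index_entries`, the sample document, and the row ID are made up for illustration, and Python's `float()` stands in for the cast to SQL DOUBLE.

```python
import xml.etree.ElementTree as ET

DOC = """<dept>
  <employee id="901"><name>John Doe</name><office>344</office></employee>
  <employee id="902"><name>Peter Pan</name><office>B12</office></employee>
</dept>"""


def double_index_entries(xml_text, path, row_id):
    """Return (value, row_id) keys for the nodes on `path` that cast to a double."""
    root = ET.fromstring(xml_text)
    entries = []
    for node in root.findall(path):
        try:
            value = float(node.text.strip())
        except (AttributeError, ValueError):
            # The node does not cast: no error is raised, no entry is created.
            continue
        entries.append((value, row_id))
    return entries


# One document, two matching <office> nodes, but only one index entry.
entries = double_index_entries(DOC, "./employee/office", row_id=1)
```

Here `entries` holds a single key for office 344, while the same function applied to `./employee/name` yields no entries at all, so one row can contribute zero, one, or many keys.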
  21. 21. XML Value Indexes: unique indexes <ul><li>Unique indexes enforced within a document, and across all documents </li></ul><ul><li>Example of unique index on employee id: </li></ul><ul><li>CREATE UNIQUE INDEX idx2 ON DEPT(deptdoc) GENERATE KEY USING XMLPATTERN '/dept/employee/@id' AS SQL DOUBLE </li></ul>
  22. 22. XML Value Indexes: multiple elements or attributes <ul><li>Can create indexes on multiple elements or attributes </li></ul><ul><li>Example: create index on all text nodes: </li></ul><ul><ul><li>CREATE INDEX idx3 ON DEPT(deptdoc) GENERATE KEY USING XMLPATTERN '//text()' AS SQL VARCHAR HASHED </li></ul></ul><ul><li>Example: create index on all attributes: </li></ul><ul><ul><li>CREATE INDEX idx4 ON DEPT(deptdoc) GENERATE KEY USING XMLPATTERN '//@*' AS SQL DOUBLE </li></ul></ul>
  23. 23. XML Value Indexes: namespaces <ul><li>Can index in a particular namespace </li></ul><ul><li>XMLPATTERN can contain namespace declarations and prefixes </li></ul><ul><li>Example: </li></ul><ul><li>CREATE INDEX idx5 ON DEPT(deptdoc) GENERATE KEY USING XMLPATTERN 'declare namespace m="http://www.me.com/"; /m:dept/m:employee/m:name' AS SQL VARCHAR(45) </li></ul>
  24. 24. XML Value Indexes: internal <ul><li>For each XML document, each unique path mapped to an integer PathID (like StringID for tags) </li></ul><ul><li>Each index entry includes: </li></ul><ul><ul><li>PathID to identify path of indexed node </li></ul></ul><ul><ul><li>Value of the node cast to the index type </li></ul></ul><ul><ul><li>RowID </li></ul></ul><ul><ul><ul><li>Identify rows containing the matching documents </li></ul></ul></ul><ul><ul><li>NodeID </li></ul></ul><ul><ul><ul><li>Identify matching nodes and regions within the documents </li></ul></ul></ul>
  25. 25. XML Value Indexes: atomic vs. non-atomic <ul><li>Atomic Node: </li></ul><ul><ul><li>if it is an attribute, or </li></ul></ul><ul><ul><li>if it is a text node, or </li></ul></ul><ul><ul><li>if it is an element that has no child elements and exactly one text node child </li></ul></ul><ul><li>Indexes typically defined for atomic nodes </li></ul><ul><li>Possible to define index on non-atomic nodes, e.g. index on ‘/dept/employee’ </li></ul>
  26. 26. XML Value Indexes: atomic vs. non-atomic <ul><li>‘/dept/employee’ non-atomic since it has child elements </li></ul><ul><li>Single index entry for all of “employee” element, on all text nodes under “employee” (concatenation) </li></ul><ul><li>Can be useful for mixed content in text-oriented XML, e.g.: </li></ul><ul><ul><li><title>The benefits of <bold>XML</bold></title> </li></ul></ul>
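The atomic test and the concatenated key for a non-atomic node can be sketched as follows. The helper names `is_atomic` and `index_value` are hypothetical, and ElementTree is used only as a convenient stand-in for the XML store (it ignores comments and processing instructions by default).

```python
import xml.etree.ElementTree as ET


def is_atomic(elem):
    """Element rule from the slide: no child elements and exactly one text node child."""
    return len(elem) == 0 and bool(elem.text)


def index_value(elem):
    """Key for a (possibly non-atomic) element: all text nodes below it, concatenated."""
    return "".join(elem.itertext())


title = ET.fromstring("<title>The benefits of <bold>XML</bold></title>")
bold = title[0]
```

With this document, `title` is non-atomic because of its `bold` child, yet its index key is the full string "The benefits of XML", which is what a search over mixed content would need.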
  27. 27. XML Full Text Indexes <ul><li>Allows full-text search of XML columns </li></ul><ul><li>Can be fully indexed or partially indexed </li></ul><ul><li>Example of full index: </li></ul><ul><ul><li>CREATE INDEX myIndex FOR TEXT ON DEPT(deptdoc) FORMAT XML CONNECT TO PERSONNELDB </li></ul></ul><ul><li>Example query: </li></ul><ul><ul><li>SELECT deptdoc FROM dept WHERE CONTAINS(deptdoc, 'SECTIONS("/dept/comment") "Brazil"') = 1 </li></ul></ul>
  28. 28. Internal index structure <ul><li>System RX: One Part Relational, One Part XML </li></ul><ul><li>Kevin Beyer, Roberta J Cochrane, </li></ul><ul><li>Vanja Josifovski, Jim Kleewein, George Lapis, </li></ul><ul><li>Guy Lohman, Bob Lyle, Fatma Özcan, </li></ul><ul><li>Hamid Pirahesh, Normen Seemann, </li></ul><ul><li>Tuong Truong, Bert Van der Linden, Brian Vickery, </li></ul><ul><li>Chun Zhang </li></ul>
  29. 29. Internal index structure <ul><li>XML index implemented with two B+ trees </li></ul><ul><ul><li>Path index </li></ul></ul><ul><ul><li>Value Index </li></ul></ul>
  30. 30. Internal index structure: Path Index <ul><li>Path Index maps reverse path (revPath) to a generated path identifier (pathId) </li></ul><ul><li>A “reverse path” is a list of node labels from leaf to root </li></ul><ul><ul><li>Compressed into vector of label identifiers </li></ul></ul><ul><li>Analogy to COLUMNS catalog from relational database </li></ul><ul><li>Used for efficient processing of descendant queries </li></ul><ul><ul><li>Example: “//name” query </li></ul></ul>
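Why reverse paths help with descendant queries can be seen in a toy path index. This sketch uses a plain dictionary instead of a B+ tree and keeps labels as strings rather than compressed label identifiers; the `PathIndex` class and its methods are hypothetical names.

```python
class PathIndex:
    """Toy path index: reverse path (leaf to root) -> generated pathId."""

    def __init__(self):
        self._ids = {}

    @staticmethod
    def _reverse(path):
        # "/book/author/name" -> ("name", "author", "book")
        return tuple(reversed(path.strip("/").split("/")))

    def path_id(self, path):
        rev = self._reverse(path)
        return self._ids.setdefault(rev, len(self._ids) + 1)

    def match_descendant(self, suffix):
        """pathIds matching a query such as //name or //author/name."""
        rev = self._reverse(suffix)
        # A descendant query fixes the END of the path, which is the START
        # of the reverse path, so the match is a simple prefix test.
        return sorted(pid for key, pid in self._ids.items()
                      if key[:len(rev)] == rev)


paths = PathIndex()
for p in ["/book/title", "/book/author/name", "/dept/employee/name"]:
    paths.path_id(p)
```

In an ordered structure such as a B+ tree, keys sharing a prefix sit next to each other, so a query like `//name` would presumably become one range scan over the reverse paths that start with `name`.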
  31. 31. Internal index structure: Value Index <ul><li>Value Index used to represent nodes </li></ul><ul><li>Consists of the following key: </li></ul><ul><ul><li>pathId </li></ul></ul><ul><ul><li>value </li></ul></ul><ul><ul><li>nodeId </li></ul></ul><ul><ul><li>rid </li></ul></ul>
  32. 32. Internal index structure: Value Index <ul><li>“value” is the representation of the node’s data value when cast to the index’s data type </li></ul><ul><li>“rid” identifies the row in the table (used for locking) </li></ul><ul><li>“nodeId” identifies a node within the document </li></ul><ul><ul><li>uses a Dewey node identifier </li></ul></ul><ul><ul><li>can provide quick access to a node in the XML store </li></ul></ul><ul><li>“pathId” is used to answer queries on a specific path </li></ul>
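Dewey node identifiers number each node by its position under its parent, so the identifier of a node extends that of its parent. A minimal sketch, assuming a dotted-string notation such as `1.3.2` (the slides do not show the actual encoding) and hypothetical helper names:

```python
def parse_dewey(text):
    """'1.3.2' -> (1, 3, 2). Tuples compare component-wise, i.e. in document order."""
    return tuple(int(part) for part in text.split("."))


def is_ancestor(a, b):
    """True if node `a` is a proper ancestor of node `b`."""
    return len(a) < len(b) and b[:len(a)] == a


ids = [parse_dewey(s) for s in ["1.2", "1", "1.10.1", "1.2.1", "1.10"]]
ordered = sorted(ids)
```

Sorting the tuples puts every parent before its children and siblings in order, and ancestry is a prefix test that needs no access to the document itself.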
  33. 33. Internal index structure: Tradeoffs of Value Index key fields <ul><li>Order of keys is a tradeoff </li></ul><ul><li>pathId first allows quick retrieval of specific queries </li></ul><ul><ul><li>e.g., index on //name might match many paths </li></ul></ul><ul><ul><li>query on /book/author/name still has consecutive index entries </li></ul></ul><ul><ul><li>but, query like //name=‘Maggie’ will need to examine every location in the index per matching path </li></ul></ul>
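The key-order tradeoff can be made concrete with a sorted list standing in for the B+ tree. The entries, pathIds, and helper names below are invented for illustration; only the key order (pathId, value, nodeId, rid) comes from the slides.

```python
from bisect import bisect_left, bisect_right

# Keys ordered as (pathId, value, nodeId, rid). Made-up pathIds:
# 2 = /book/author/name, 3 = /dept/employee/name.
index = sorted([
    (2, "Maggie", (1, 1, 1), 10),
    (2, "Zoe",    (1, 2, 1), 10),
    (3, "Adam",   (1, 1, 1), 20),
    (3, "Maggie", (1, 2, 1), 21),
])


def lookup(path_id, value):
    """Specific path: one contiguous range of entries for (pathId, value)."""
    lo = bisect_left(index, (path_id, value))
    hi = bisect_right(index, (path_id, value, (float("inf"),)))
    return index[lo:hi]


def lookup_descendant(path_ids, value):
    """//name = value: one separate probe per matching pathId."""
    return [entry for pid in path_ids for entry in lookup(pid, value)]


specific = lookup(2, "Maggie")
descendant = lookup_descendant([2, 3], "Maggie")
```

A query on `/book/author/name` touches one consecutive run of keys, while `//name = 'Maggie'` has to jump to a different place in the index for each path that ends in `name`.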
  34. 34. XML Schema Support <ul><li>Optional XML Schema validation </li></ul><ul><li>Insert, Update, Query </li></ul><ul><li>Limited support for DTDs and external entities </li></ul><ul><li>Type annotation produced by validation persisted with document (query execution) </li></ul><ul><li>Conforms to XML Query standard, XML Schema standard, XML standard </li></ul>
  35. 35. XML Schema Support <ul><li>Register XML Schemas and DTDs in DB </li></ul><ul><li>DB then stores type-annotated documents on disk, compiles execution plans with references to the XML Schemas </li></ul><ul><li>Schemas stored in DB itself, for performance </li></ul><ul><ul><li>XML Schema Repository (XSR) </li></ul></ul>
  36. 36. XML Schema Support: XSR <ul><li>XSR consists of several new database catalog tables: </li></ul><ul><ul><li>Original XML schema documents for XML schema </li></ul></ul><ul><ul><li>Binary representation of the schema for fast reference </li></ul></ul>
  37. 37. XML Schema Support: Registration <ul><li>Example: </li></ul><ul><li>REGISTER XMLSCHEMA http://my.dept.com FROM dept.xsd AS departments.deptschema complete </li></ul><ul><li>Schema URI is http://my.dept.com </li></ul><ul><li>File with schema document is “dept.xsd” </li></ul><ul><li>Schema identifier in DB is “deptschema” </li></ul><ul><li>Belongs to relational DB schema “departments” </li></ul>
  38. 38. XML Schema Support: Validation <ul><li>“ XMLVALIDATE” function to validate documents in SQL statements </li></ul><ul><li>Schema for validation </li></ul><ul><ul><li>is specified explicitly, or </li></ul></ul><ul><ul><li>can be deduced from the schemaLocation hints in the instance documents </li></ul></ul><ul><li>Referenced by Schema URI or by identifier </li></ul>
  39. 39. XML Schema Support: Validation <ul><li>Example (explicit by URI): </li></ul><ul><li>INSERT INTO DEPT(deptdoc) VALUES xmlvalidate(? according to xmlschema uri 'http://my.dept.com') </li></ul><ul><li>Example (explicit by ID): </li></ul><ul><li>INSERT INTO DEPT(deptdoc) VALUES xmlvalidate(? according to xmlschema id departments.deptschema) </li></ul>
  40. 40. XML Schema Support: Validation <ul><li>Example (implicit) </li></ul><ul><li>DB2 tries to deduce schema from input document </li></ul><ul><li>INSERT INTO dept(deptdoc) VALUES xmlvalidate(?) </li></ul><ul><li>Try to find it in repository </li></ul>
  41. 41. XML Schema Support: First repository design principle <ul><li>Repository will not </li></ul><ul><ul><li>require users to modify a schema before it is registered </li></ul></ul><ul><ul><li>require users to modify XML documents before they are inserted and validated </li></ul></ul><ul><li>Once a document is validated in the DB, it will never require updates to remain valid </li></ul><ul><ul><li>Considered infeasible to bulk-update all existing documents to become valid </li></ul></ul>
  42. 42. XML Schema Support: Second repository design principle <ul><li>Enable schema evolution </li></ul><ul><li>Sequence of changes in an XML schema over its lifetime </li></ul><ul><li>New or evolving business needs </li></ul><ul><li>How to accomplish schema evolution is much-debated </li></ul><ul><ul><li>no standards </li></ul></ul><ul><ul><li>business demands require it; so constrain problem </li></ul></ul>
  43. 43. XML Schema Support: Second repository design principle <ul><li>Flexibility of the schema repository is of “paramount importance” </li></ul><ul><li>DB2’s schema repository does not require the namespace or the schema URI of each registered schema to be unique (user does not have control) </li></ul><ul><li>Database-specific schema identifier must be unique (user does have control) </li></ul>
  44. 44. XML Schema Support: Second repository design principle <ul><li>Built-in support for one very simple type of schema evolution </li></ul><ul><li>If the new schema is backward-compatible with the old schema, then the old schema can be replaced with the new schema in the schema repository </li></ul><ul><li>DB2 verifies that all possible elements and attributes in the old schema have the same named types in the new schema </li></ul>
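The compatibility rule on this slide can be read as a check over name-to-type maps. The sketch below is a deliberate simplification under that assumption: real schemas are not flat dictionaries, DB2's actual check is not shown in the slides, and `can_replace` and the sample type names are hypothetical.

```python
def can_replace(old_types, new_types):
    """True if every element/attribute of the old schema keeps the same named type."""
    return all(new_types.get(name) == typ for name, typ in old_types.items())


old = {"dept": "DeptType", "employee": "EmpType", "employee/@id": "xs:integer"}

# Adds a new element but leaves every existing declaration untouched.
new_ok = {**old, "email": "xs:string"}

# Changes the type of an existing attribute, so stored documents that were
# validated against the old schema could become invalid.
new_bad = {**old, "employee/@id": "xs:string"}
```

Under this reading, `new_ok` may replace `old` in the repository while `new_bad` may not, which matches the first design principle that validated documents never need updating.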
  45. 45. Querying XML Data in DB2 <ul><li>Options Supported </li></ul><ul><ul><li>XQuery/XPath as a stand-alone language </li></ul></ul><ul><ul><li>SQL embedded in XQuery </li></ul></ul><ul><ul><li>XQuery/XPath embedded in SQL/XML </li></ul></ul><ul><ul><li>Plain SQL for full-document retrieval </li></ul></ul><ul><li>DB2 treats SQL and XQuery as primary query languages. </li></ul><ul><ul><li>Both will operate independently on their data models </li></ul></ul><ul><ul><li>Can also be integrated </li></ul></ul>
  46. 46. Sample Tables <ul><li>create table ship ( </li></ul><ul><li>shipNo varchar(5) primary key not null, </li></ul><ul><li>capacity decimal(7,2), </li></ul><ul><li>class int, </li></ul><ul><li>purchDate date, </li></ul><ul><li>maintenance xml </li></ul><ul><li>) </li></ul><ul><li>create table captain ( </li></ul><ul><li>captID varchar(5) primary key not null, </li></ul><ul><li>lname varchar(20), </li></ul><ul><li>fname varchar(20), </li></ul><ul><li>DOB date, </li></ul><ul><li>contact xml </li></ul><ul><li>) </li></ul>Notice the xml datatype
  47. 47. Sample XML Data Ship.maintenance <ul><li><mrecord> </li></ul><ul><li><log> </li></ul><ul><li><mntid>2353</mntid> </li></ul><ul><li><shipno>39</shipno> </li></ul><ul><li><vendorid>2345</vendorid> </li></ul><ul><li><captid>9875</captid> </li></ul><ul><li><maintdate>01/10/2007</maintdate> </li></ul><ul><li><service>Removed rust on hull </service> </li></ul><ul><li><resolution>complete</resolution> </li></ul><ul><li><cost>13450.96</cost> </li></ul><ul><li><nextservice>01/10/2008</nextservice> </li></ul><ul><li></log> </li></ul><ul><li><log> </li></ul><ul><li><mntid>1254</mntid> </li></ul><ul><li><shipno>39</shipno> </li></ul><ul><li><vendorid>1253</vendorid> </li></ul><ul><li><captid>9234</captid> </li></ul><ul><li><maintdate>09/20/2005</maintdate> </li></ul><ul><li><service>Replace rudder</service> </li></ul><ul><li><resolution>complete</resolution> </li></ul><ul><li><cost>34532.21</cost> </li></ul><ul><li><nextservice>NA</nextservice> </li></ul><ul><li></log> </li></ul><ul><li></mrecord> </li></ul>
  48. 48. Sample XML Data Captain.contactinfo <ul><li><contactinfo> </li></ul><ul><li><Address> </li></ul><ul><li><street>234 Rolling Lane</street> </li></ul><ul><li><city>Rockport</city> </li></ul><ul><li><state>MA</state> </li></ul><ul><li><zipcode>01210</zipcode> </li></ul><ul><li></Address> </li></ul><ul><li><phone> </li></ul><ul><li><work>9783412321</work> </li></ul><ul><li><home>9722342134</home> </li></ul><ul><li><cell>9782452343</cell> </li></ul><ul><li><satellite>2023051243</satellite> </li></ul><ul><li></phone> </li></ul><ul><li><email>love2fish@finmail.com</email> </li></ul><ul><li></contactinfo> </li></ul>
  49. 49. Standalone XQuery in DB2 <ul><li>for $s in db2-fn:xmlcolumn('SHIP.MAINTENANCE') </li></ul><ul><li>let $ml := $s//log </li></ul><ul><li>where $ml/cost >= 10000 </li></ul><ul><li>order by $ml/shipno </li></ul><ul><li>return <MaintenanceLog> </li></ul><ul><li>{$ml/shipno,$ml} </li></ul><ul><li></MaintenanceLog> </li></ul>db2-fn:xmlcolumn returns the sequence of all documents in the XML column; its argument is case-sensitive, so the table and column names are written in upper case
  50. 50. SQL Embedded in XQuery <ul><li>for $m in db2-fn:sqlquery('select maintenance from ship where class = 1') </li></ul><ul><li>let $ml := $m//log </li></ul><ul><li>order by $ml/shipno </li></ul><ul><li>return </li></ul><ul><li><maintenanceLog> </li></ul><ul><li>{$ml} </li></ul><ul><li></maintenanceLog> </li></ul><ul><li>This will return the documents for all class one ships. </li></ul>
  51. 51. Select Statement using XML Column <ul><li>Select shipno,class,maintenance </li></ul><ul><li>from ship </li></ul><ul><li>where class = 1 </li></ul><ul><li>This will produce the maintenance document for each ship that is class 1. </li></ul><ul><li>We can also create views this way </li></ul>
  52. 52. SQL/XML Queries <ul><li>Restricting results using XML element values </li></ul><ul><ul><li>select captid,lname,fname from captain </li></ul></ul><ul><ul><li>where xmlexists('$c/contactinfo/Address[state="MA"]' </li></ul></ul><ul><ul><li>passing captain.contact as "c") </li></ul></ul><ul><ul><li>This will return the captid, lname and fname of all captains who live in Massachusetts </li></ul></ul>
  53. 53. SQL/XML Queries <ul><li>Projecting XML element values </li></ul><ul><ul><li>Two functions: XMLQuery and XMLTable </li></ul></ul><ul><ul><ul><li>XMLQuery retrieves value for 1 element </li></ul></ul></ul><ul><ul><ul><li>XMLTable retrieves value for multiple elements </li></ul></ul></ul><ul><ul><li>XMLQuery example: </li></ul></ul><ul><ul><ul><li>select xmlquery('$c/contactinfo/email' </li></ul></ul></ul><ul><ul><ul><li>passing contact as "c") </li></ul></ul></ul><ul><ul><ul><li>from captain </li></ul></ul></ul><ul><ul><ul><li>where xmlexists('$c/contactinfo/Address[state="MA"]' </li></ul></ul></ul><ul><ul><ul><li>passing contact as "c") </li></ul></ul></ul><ul><ul><li>This will return email addresses for all captains in Massachusetts (the state is stored inside the XML column, so it is tested with xmlexists rather than a relational predicate) </li></ul></ul>
  54. 54. SQL/XML Queries XMLQuery (Continued) <ul><ul><li>We could also look for only the first email for each captain by changing the first line: </li></ul></ul><ul><ul><ul><li>select xmlquery('$c/contactinfo/email[1]' … </li></ul></ul></ul><ul><ul><li>Similarly, we could add a second xmlexists to return only captains who have an email: </li></ul></ul><ul><ul><ul><li>select xmlquery('$c/contactinfo/email' </li></ul></ul></ul><ul><ul><ul><li>passing contact as "c") </li></ul></ul></ul><ul><ul><ul><li>from captain </li></ul></ul></ul><ul><ul><ul><li>where xmlexists('$c/contactinfo/Address[state="MA"]' </li></ul></ul></ul><ul><ul><ul><li>passing contact as "c") </li></ul></ul></ul><ul><ul><ul><li>and xmlexists('$c/contactinfo/email' </li></ul></ul></ul><ul><ul><ul><li>passing contact as "c") </li></ul></ul></ul>
  55. 55. SQL/XML Queries XMLTable <ul><li>XMLTable retrieves XML elements </li></ul><ul><li>Elements are mapped into result set columns </li></ul><ul><li>Maps XML data as relational data </li></ul>
  56. 56. SQL/XML Queries XMLTable Example <ul><li>select s.shipNo,sm.mid,sm.vid,sm.md,sm.cost </li></ul><ul><li>from ship s, </li></ul><ul><li>xmltable('$c/mrecord/log' passing s.maintenance as "c" </li></ul><ul><li>columns mid varchar(4) path 'mntid', </li></ul><ul><li> vid varchar(4) path 'vendorid', </li></ul><ul><li> md date path 'maintdate', </li></ul><ul><li> cost decimal(7,2) path 'cost') as sm </li></ul><ul><li>This will produce a list of maintenance logs for all ships </li></ul>
  57. 57. Joining XML and Relational Data <ul><li>select c.captid,c.lname,c.fname </li></ul><ul><li>from captain c, ship s </li></ul><ul><li>where xmlexists('$s/mrecord/log[captid=$c]' </li></ul><ul><li>passing s.maintenance as "s", c.captid as "c") </li></ul><ul><li>If the captain was the captain of any ship when it underwent maintenance, he or she will be listed </li></ul>
  58. 58. Using FLWOR Expressions in SQL/XML <ul><li>select captid, </li></ul><ul><li>xmlquery('for $c in $cn/contactinfo </li></ul><ul><li> let $x := $c//city </li></ul><ul><li> return $x' passing contact as "cn") </li></ul><ul><li>from captain </li></ul><ul><li>where class = 1 </li></ul><ul><li>Returns captid as well as city information </li></ul>
  59. 59. XMLElement <ul><li>XMLElement allows you to publish relational data as XML </li></ul><ul><li>select xmlelement(name "captain", </li></ul><ul><li>xmlelement(name "captid", captid), </li></ul><ul><li>xmlelement(name "lname",lname), </li></ul><ul><li>xmlelement(name "fname",fname), </li></ul><ul><li>xmlelement(name "class",class)) </li></ul><ul><li>from captain </li></ul><ul><li>where class <= 2 </li></ul>
  60. 60. XMLElement Output from previous command <ul><li><captain> </li></ul><ul><li><captid>3563</captid> </li></ul><ul><li><lname>Smith</lname> </li></ul><ul><li><fname>John</fname> </li></ul><ul><li><class>2</class> </li></ul><ul><li></captain> </li></ul><ul><li>… </li></ul>
  61. 61. Aggregating and Grouping Data <ul><li>select xmlelement(name "captainlist", </li></ul><ul><li>xmlagg(xmlelement(name "captain", </li></ul><ul><li>xmlforest(captid as "captid",lname as "lname",fname as "fname",class as "class")) </li></ul><ul><li>order by captid)) </li></ul><ul><li>from captain </li></ul><ul><li>group by class </li></ul><ul><li>This query produces one captainlist element per class (three here), each containing the captains of that class. </li></ul>
  62. 62. Updating and Deleting XML Data <ul><li>Updates </li></ul><ul><ul><li>Use the XMLPARSE function. An update replaces the entire XML column value: if you supply only one element, the rest of the document is lost. </li></ul></ul><ul><li>Deletion </li></ul><ul><ul><li>Same as standard SQL </li></ul></ul><ul><ul><li>Can also use xmlexists to use XML as qualifier </li></ul></ul>
  63. 63. Query Execution Plans <ul><li>Separate parsers for SQL and XQuery statements </li></ul><ul><li>Integrated query compiler for both languages </li></ul><ul><li>QGMX is an internal query graph model </li></ul><ul><li>Query execution plans contain special operators for navigation (XSCAN), XML index access (XISCAN) and joins over XML indexes (XANDOR) </li></ul>Source: [2]
  64. 64. Query Run-time Evaluation <ul><li>3 major components added for processing queries over XML: </li></ul><ul><ul><li>XML Navigation </li></ul></ul><ul><ul><li>XML Index Runtime </li></ul></ul><ul><ul><li>XQuery Function Library </li></ul></ul>
  65. 65. Summary <ul><li>Problems with CLOB and Shredded XML storage </li></ul><ul><li>Native XML support in DB2 offers: </li></ul><ul><ul><li>Hierarchical and parsed representation </li></ul></ul><ul><ul><li>Path-specific XML indexing </li></ul></ul><ul><ul><li>New XML join and query methods </li></ul></ul><ul><ul><li>Integration of SQL and XQuery </li></ul></ul>
  66. 66. References <ul><li>[1] Nicola, M. and van der Linden, B. 2005. Native XML support in DB2 universal database. In Proceedings of the 31st international Conference on Very Large Data Bases (Trondheim, Norway, August 30 - September 02, 2005). Very Large Data Bases. VLDB Endowment, 1164-1174. </li></ul><ul><li>[2] Beyer, K., Cochrane, R. J., Josifovski, V., Kleewein, J., Lapis, G., Lohman, G., Lyle, B., Özcan, F., Pirahesh, H., Seemann, N., Truong, T., Van der Linden, B., Vickery, B., and Zhang, C. 2005. System RX: one part relational, one part XML. In Proceedings of the 2005 ACM SIGMOD international Conference on Management of Data (Baltimore, Maryland, June 14 - 16, 2005). SIGMOD '05. ACM Press, New York, NY, 347-358. </li></ul><ul><li>[3] http://www-128.ibm.com/developerworks/db2/library/techarticle/dm-0603saracco2/ </li></ul>