Efficiently Publishing Relational Data as XML Documents
Authors: J. Shanmugasundaram, Michael Carey, et al. (IBM Almaden Research Center)
Presented by: Harshavardhan Achrekar (University of Massachusetts-Lowell)

What drove them?
XML is emerging as the standard for business data exchange on the World Wide Web.
A mechanism is needed to publish data currently stored in relational databases as XML documents.

Primary Issues
Language specification: how to structure and tag data from tables as hierarchical XML documents.
Best implementation technique: study the characteristics and performance of the alternatives for constructing XML documents.
  When should tags and structure be added?
  How much of the processing is done within the relational engine?

Roadmap
Language specification based on SQL
Implementation
  Early tagging, early structuring
  Late tagging, late structuring
  Early structuring, late tagging
Performance evaluation

Sample XML Document for Customer
  <customer id="C1">
    <name> John Doe </name>
    <accounts>
      <account id="A1"> 1894654 </account>
      <account id="A2"> 3849342 </account>
    </accounts>
    <porders>
      <porder id="PO1" acct="A1">  <!-- first purchase order -->
        <date>1 Jan 2000</date>
        <items>
          <item id="I1"> Shoes </item>
          <item id="I2"> Bungee Ropes </item>
        </items>
        <payments>
          <payment id="P1"> due Jan 15 </payment>
          <payment id="P2"> due Jan 20 </payment>
          <payment id="P3"> due Feb 15 </payment>
        </payments>
      </porder>
      <porder id="PO2" acct="A2">  <!-- second purchase order -->
        …
      </porder>
    </porders>
  </customer>
Note the element names/tags, ID refs, attributes, and nested sub-elements.

Underlying tables
  Customer   (id int, name varchar)
  Account    (id varchar, custID int, acctnum int)
  Item       (id int, poID int, desc varchar)
  PurchOrder (id int, custID int, acctID varchar, date varchar)
  Payment    (id int, poID int, desc varchar)

SQL-based language specifications
SQL functions:
  Define XMLConstruct CUST (Custid: integer, CustName: varchar) AS {
    <Customer id=$Custid> $CustName </Customer>
  }
SQL aggregates:
  Select XMLAGG(ITEM(item.id, item.desc))
  From Item item
  // returns an XML aggregation of items

Customer: Definition of XML Constructor
  Define XML Constructor CUST (custId: integer, custName: varchar(20), acctList: xml, porderList: xml) AS {
    <customer id=$custId>
      <name> $custName </name>
      <accounts> $acctList </accounts>
      <porders> $porderList </porders>
    </customer>
  }
Input: the constructor's parameters.
Output: a customer XML element.
Aggregate function XMLAGG: concatenates the XML fragments produced by XML constructors.

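The constructor and the aggregate can be mimicked outside SQL to make their roles concrete. The following is a minimal Python sketch, not the paper's implementation: `cust` mirrors the CUST constructor above, `xmlagg` mirrors the described behaviour of XMLAGG, and `acct` is a hypothetical stand-in for the ACCT constructor, whose definition the slides do not show.

```python
from xml.sax.saxutils import escape, quoteattr


def xmlagg(fragments):
    """Concatenate XML fragments, as the slide describes XMLAGG doing."""
    return "".join(fragments)


def acct(acct_id, acctnum):
    # Hypothetical ACCT constructor; its shape is guessed from the sample document.
    return f"<account id={quoteattr(str(acct_id))}>{escape(str(acctnum))}</account>"


def cust(cust_id, cust_name, acct_list, porder_list):
    """Analogue of CUST: scalar values and XML fragments in, one XML element out."""
    return (
        f"<customer id={quoteattr(str(cust_id))}>"
        f"<name>{escape(cust_name)}</name>"
        f"<accounts>{acct_list}</accounts>"
        f"<porders>{porder_list}</porders>"
        "</customer>"
    )


# Build the account list first, then hand it to the constructor as a fragment.
accounts = xmlagg(acct(i, n) for i, n in [("A1", 1894654), ("A2", 3849342)])
customer_xml = cust(1, "John Doe", accounts, "")
print(customer_xml)
```

The point of the split is that the constructor only knows about one element, while the aggregate only knows how to glue fragments together, so the two compose freely at any nesting depth.
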
Sample SQL query: constructs XML from relational tables
  Select cust.name, CUST(cust.id, cust.name,
    (Select XMLAGG(ACCT(acct.id, acct.acctnum))
     From Account acct
     Where acct.custId = cust.id),
    (Select XMLAGG(PORDER(porder.id, porder.acctID, porder.date,
       (Select XMLAGG(ITEM(item.id, item.desc))
        From Item item
        Where item.poid = porder.id),
       (Select XMLAGG(PAYMENT(pay.id, pay.desc))
        From Payment pay
        Where pay.poid = porder.id)))
     From PurchOrder porder
     Where porder.custID = cust.id))
  From Customer cust
Annotations:
  The top-level query returns each customer from the Customer table.
  The first correlated sub-query returns the customer's accounts.
  The second correlated sub-query returns the customer's purchase orders.
  Each correlated sub-query returns an XML fragment.
  CUST (lines 1-14) is a scalar function returning the customer XML.

Implementation Alternatives
Two main differences: nesting (structuring) and tagging.
Space of alternatives (diagram): tagging is either late or early, structuring is either late or early, and each alternative shown can run inside the engine or outside the engine. Approaches labelled on the diagram: Stored Procedures, CLOB.

Early tagging and structuring: Stored Procedure (outside the engine)
Approach: explicitly issue nested queries.
Algorithm:
  1. Issue a first query to retrieve the root elements (customer id, name).
  2. Using the customer id, issue a query to retrieve the account information.
  3. For the same customer id, issue a query to retrieve the customer's purchase orders.
  4. For each purchase order retrieved, issue queries to get its item and payment information.
  5. Processing of one customer is now over; repeat for the next customer until the entire XML document is ready.
This is a fixed-order nested-loop join performed outside the engine.
Tagging and structuring happen as soon as each piece is ready.
Many SQL queries are issued: one set per tuple of every table that has nested structure.

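The nested-query algorithm can be sketched with Python's built-in sqlite3 module standing in for the stored procedure. This is only an illustration under assumptions: the table layout follows the "Underlying tables" slide, the rows are made up, and the helper names `elem`, `publish`, and `run` are hypothetical. The query counter makes the cost of the approach visible.

```python
import sqlite3
from xml.sax.saxutils import escape, quoteattr

# Toy data following the slide's table definitions; the rows are invented.
SCHEMA = """
CREATE TABLE Customer (id INTEGER, name TEXT);
CREATE TABLE Account (id TEXT, custID INTEGER, acctnum INTEGER);
CREATE TABLE PurchOrder (id INTEGER, custID INTEGER, acctID TEXT, date TEXT);
CREATE TABLE Item (id INTEGER, poID INTEGER, "desc" TEXT);
CREATE TABLE Payment (id INTEGER, poID INTEGER, "desc" TEXT);
INSERT INTO Customer VALUES (1, 'John Doe');
INSERT INTO Account VALUES ('A1', 1, 1894654), ('A2', 1, 3849342);
INSERT INTO PurchOrder VALUES (1, 1, 'A1', '1 Jan 2000');
INSERT INTO Item VALUES (1, 1, 'Shoes'), (2, 1, 'Bungee Ropes');
INSERT INTO Payment VALUES (1, 1, 'due Jan 15'), (2, 1, 'due Jan 20');
"""


def elem(tag, ident, text):
    """Tag one leaf element as soon as its row arrives (early tagging)."""
    return f"<{tag} id={quoteattr(str(ident))}>{escape(str(text))}</{tag}>"


def publish(conn):
    """Return (xml, number_of_queries) using explicitly nested queries."""
    out, queries = [], 0

    def run(sql, args=()):
        nonlocal queries
        queries += 1
        return conn.execute(sql, args).fetchall()

    # Step 1: root elements.
    for cid, name in run("SELECT id, name FROM Customer ORDER BY id"):
        out.append(f"<customer id={quoteattr(str(cid))}><name>{escape(name)}</name><accounts>")
        # Step 2: accounts of this customer.
        for aid, num in run("SELECT id, acctnum FROM Account WHERE custID = ? ORDER BY id", (cid,)):
            out.append(elem("account", aid, num))
        out.append("</accounts><porders>")
        # Step 3: purchase orders of this customer.
        for pid, acct_id, date in run(
            "SELECT id, acctID, date FROM PurchOrder WHERE custID = ? ORDER BY id", (cid,)
        ):
            out.append(
                f"<porder id={quoteattr(str(pid))} acct={quoteattr(acct_id)}>"
                f"<date>{escape(date)}</date><items>"
            )
            # Step 4: items and payments of this purchase order.
            for iid, text in run('SELECT id, "desc" FROM Item WHERE poID = ? ORDER BY id', (pid,)):
                out.append(elem("item", iid, text))
            out.append("</items><payments>")
            for yid, text in run('SELECT id, "desc" FROM Payment WHERE poID = ? ORDER BY id', (pid,)):
                out.append(elem("payment", yid, text))
            out.append("</payments></porder>")
        # Step 5: this customer is complete; the loop moves on to the next one.
        out.append("</porders></customer>")
    return "".join(out), queries


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
xml_text, query_count = publish(conn)
print(query_count, xml_text)
```

With one customer and one purchase order the sketch already issues five queries, and the count grows with every customer and purchase order row, which is the weakness the slide points out.
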
Early tagging and structuring: Correlated CLOB (inside the engine)
Approach: push the queries into the engine.
  Plug XMLAGG and XMLCONSTRUCT support into the engine.
  XML fragments are represented as Character Large Objects (CLOBs).
Performance issues:
  The engine must handle huge CLOBs.
  The join order is fixed, which implies a nested-loop join strategy.

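A rough way to try the correlated approach is to let an engine build the tagged text itself. The sketch below uses SQLite through Python's sqlite3 module, with string concatenation (`||`) standing in for XML constructors and `group_concat` standing in for XMLAGG. These substitutions, the made-up rows, and the omission of the Payment table and of XML escaping are all simplifying assumptions, not what the paper's engine does.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Toy data following the slide's table definitions; the rows are invented.
SETUP = """
CREATE TABLE Customer (id INTEGER, name TEXT);
CREATE TABLE Account (id TEXT, custID INTEGER, acctnum INTEGER);
CREATE TABLE PurchOrder (id INTEGER, custID INTEGER, acctID TEXT, date TEXT);
CREATE TABLE Item (id INTEGER, poID INTEGER, "desc" TEXT);
INSERT INTO Customer VALUES (1, 'John Doe'), (2, 'Jane Roe');
INSERT INTO Account VALUES ('A1', 1, 1894654), ('A2', 1, 3849342);
INSERT INTO PurchOrder VALUES (1, 1, 'A1', '1 Jan 2000');
INSERT INTO Item VALUES (1, 1, 'Shoes'), (2, 1, 'Bungee Ropes');
"""

# One statement; every nested list comes from a correlated scalar sub-query.
# COALESCE turns the NULL of an empty aggregate into an empty fragment.
CORRELATED = """
SELECT '<customer id="' || c.id || '"><name>' || c.name || '</name><accounts>'
    || COALESCE((SELECT group_concat(
                     '<account id="' || a.id || '">' || a.acctnum || '</account>', '')
                 FROM Account a WHERE a.custID = c.id), '')
    || '</accounts><porders>'
    || COALESCE((SELECT group_concat(
                     '<porder id="' || p.id || '" acct="' || p.acctID || '"><items>'
                     || COALESCE((SELECT group_concat(
                                      '<item id="' || i.id || '">' || i."desc" || '</item>', '')
                                  FROM Item i WHERE i.poID = p.id), '')
                     || '</items></porder>', '')
                 FROM PurchOrder p WHERE p.custID = c.id), '')
    || '</porders></customer>'
FROM Customer c
ORDER BY c.id
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
fragments = [row[0] for row in conn.execute(CORRELATED)]
for fragment in fragments:
    ET.fromstring(fragment)  # raises ParseError if a fragment is not well formed
    print(fragment)
```

Only one statement reaches the engine, but each sub-query is still evaluated per outer row, and the partially built text values play the role of the CLOBs the slide warns about.
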
Early tagging and structuring: De-Correlated CLOB (inside the engine)
Approach: decorrelate the query and use outer joins, so the join order is no longer fixed.
  Compute the account lists associated with all customers.
  Compute the purchase order lists associated with all customers.
  Join the two results on customer id.
CLOBs are still carried around (due to early tagging!).

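The decorrelated plan can be imitated in SQLite by computing each list once per customer with GROUP BY and attaching the lists with outer joins. As before this is a sketch under assumptions: `group_concat` and `||` stand in for XMLAGG and the XML constructors, the rows are made up, the Payment table and XML escaping are left out, and the names `acct_lists`, `item_lists`, and `porder_lists` are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Toy data following the slide's table definitions; the rows are invented.
SETUP = """
CREATE TABLE Customer (id INTEGER, name TEXT);
CREATE TABLE Account (id TEXT, custID INTEGER, acctnum INTEGER);
CREATE TABLE PurchOrder (id INTEGER, custID INTEGER, acctID TEXT, date TEXT);
CREATE TABLE Item (id INTEGER, poID INTEGER, "desc" TEXT);
INSERT INTO Customer VALUES (1, 'John Doe'), (2, 'Jane Roe');
INSERT INTO Account VALUES ('A1', 1, 1894654), ('A2', 1, 3849342);
INSERT INTO PurchOrder VALUES (1, 1, 'A1', '1 Jan 2000');
INSERT INTO Item VALUES (1, 1, 'Shoes'), (2, 1, 'Bungee Ropes');
"""

# Each list is built once for all customers, then joined on customer id.
# Outer joins keep customers (and purchase orders) that have no children.
DECORRELATED = """
WITH acct_lists AS (
    SELECT custID,
           group_concat('<account id="' || id || '">' || acctnum || '</account>', '') AS xml
    FROM Account
    GROUP BY custID
),
item_lists AS (
    SELECT poID,
           group_concat('<item id="' || id || '">' || "desc" || '</item>', '') AS xml
    FROM Item
    GROUP BY poID
),
porder_lists AS (
    SELECT p.custID,
           group_concat('<porder id="' || p.id || '" acct="' || p.acctID || '"><items>'
                        || COALESCE(il.xml, '') || '</items></porder>', '') AS xml
    FROM PurchOrder p
    LEFT OUTER JOIN item_lists il ON il.poID = p.id
    GROUP BY p.custID
)
SELECT '<customer id="' || c.id || '"><name>' || c.name || '</name><accounts>'
    || COALESCE(al.xml, '') || '</accounts><porders>'
    || COALESCE(pl.xml, '') || '</porders></customer>'
FROM Customer c
LEFT OUTER JOIN acct_lists al ON al.custID = c.id
LEFT OUTER JOIN porder_lists pl ON pl.custID = c.id
ORDER BY c.id
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
decorrelated = [row[0] for row in conn.execute(DECORRELATED)]
for fragment in decorrelated:
    ET.fromstring(fragment)  # raises ParseError if a fragment is not well formed
    print(fragment)
```

Because nothing here is correlated, the engine is free to choose its own join order and join method, yet the intermediate `xml` columns are still tagged text that travels through every join.
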