Design Concepts for XML Applications That Will Perform


S307479 - Oracle XMLDB - Design Concepts for XML Applications That Will Perform - AMIS - Marco Gralike

Oracle Open World 2009 Presentation


  1. 1. Oracle XML Database<br />Design Concepts for XML Applications That Will Perform ! <br />Marco Gralike, AMIS, 2009<br />
  2. 2. Started as DBA with Oracle 7 on Windows NT 3.1 (1994)<br />Experienced with Oracle 7.x / 8.x / 9.x / 10.x and 11.1<br />Oracle 11g Beta tester for Oracle XMLDB<br />Active Oracle OTN XMLDB Forum Member<br />Oracle ACE Award for XMLDB Community Contributions<br />OakTable Network member<br />Introductions<br />
  3. 3. Or a short story <br />“Why XML on Disk can be faster than XML in Memory…”<br />
  4. 4. Disclaimer<br />The following are “Rules of Thumb”<br />Bear in mind: every environment has its own unique criteria regarding business needs, architecture, etc…<br />“Maintainability”<br />“Extendibility”<br />…so pay attention to:<br /> “Choice”<br /> “Design”<br />“Testing”<br />“Performance”<br />
  5. 5. A Customer Use-Case<br />
  6. 6. Initial State<br />No performance<br />12.000 “Cases” / night (4 Hour Window)<br />4 hours are not enough anymore<br />The “XML” part “looks like it takes too long”<br />Original database system version 8.1.X<br />Future Wishes<br />The need to be able to handle 120.000 “Cases” / night<br />In the near future hardware/OS from OpenVMS to HPUX<br />Customer Case<br />
  7. 7. An overview<br />Memory<br />/ DOM<br />Memory<br />/ DOM<br />CLOB<br /> Oracle <br />Advanced Queue<br />XMLType<br />BLOB<br />Process <br />Checks<br />Validation<br />XML Schema<br />(JAVA)<br />Store in <br />ETL Tables<br />Oracle <br />Workflow<br />Shred Elements<br />Via XMLDOM<br />
  8. 8. 10.000 “Cases” (~ 10 MB size)<br />
  9. 9. How expensive are 1.000 “Cases” ?<br />
  10. 10. The Cost of Mixing Worlds<br />
  11. 11. BLOB2CLOB and CLOB2XMLType<br />
  12. 12. Feeding data to the database<br />Why BLOB? → XML data & PDF data<br />Why CLOB? → Conversion needed for XML handling<br />Why XMLType? → Needed to check XML element content<br />→ XML Validation (well-formedness)<br />Memory<br />/ DOM<br />CLOB<br /> Oracle <br />Advanced Queue<br />XMLType<br />BLOB<br />
  13. 13. Different data models. <br />XPath models an XML document as <br /> a tree while most general purpose <br /> programming languages <br /> have no native data types for a tree.<br />Different programming paradigms. <br />XSLT is a functional language, while Java <br /> is object-oriented and Perl is a procedural one.<br />Effect/Costs<br />Unnecessary CPU and Memory Overhead <br />A lot of expensive type and encoding conversions<br />Impedance Mismatch<br />
  14. 14. If you deal with XML → handle it via XML(DB) <br />So if it is relational, do it the relational way…<br />If it is XML, use XQuery, or others like XPath etc…<br />If you mix worlds, be careful regarding<br />Information loss (PK/FK → XML) ?<br />Whitespace → NULL → Whitespace ?<br />Impedance mismatch<br />The General Rule !<br />
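As a minimal sketch of the "handle XML via XML(DB)" rule, the query below shreds a small document with the SQL/XML function XMLTable instead of walking a DOM in Java or PL/SQL. The element names follow the `/case_data/case/amount_charged` path used later in the deck; the sample document and its values are assumptions made up for illustration.

```sql
-- Hypothetical inline document; only the element names come from the slides.
SELECT x.amount_charged
FROM   XMLTABLE(
         '/case_data/case'
         PASSING XMLTYPE(
           '<case_data>
              <case><amount_charged>10.50</amount_charged></case>
              <case><amount_charged>20.25</amount_charged></case>
            </case_data>')
         COLUMNS
           -- each <case> element becomes one relational row
           amount_charged NUMBER PATH 'amount_charged'
       ) x;
```

The point of the sketch is that the XML never leaves the database's own XML handling, so no BLOB, CLOB and DOM conversions are needed in between.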
  15. 15. XML Document Validation<br />
  16. 16. Validate XML Document via its XML Schema<br />
  17. 17. Validation on content and structure<br />XML Schema → Validation on XML structure <br />→ PL/SQL Wrapper with JAVA XML Parser <br />Memory<br />/ DOM<br />Validation<br />XML Schema<br />( JAVA based)<br /> XMLType<br />Shred Elements<br />via XMLDOM<br />Process <br />Checks<br />
  18. 18. Java XML Parser<br />
  19. 19. XML Parsers<br />Often DOM or Infoset based<br />CPU intensive<br />Memory intensive<br />Parsing, serializing and tree traversals happen in memory<br />Often handle XML tree traversals via only ONE method<br />Not aware of structured, semi-structured or unstructured XML content<br />Not very “smart” / “content aware” when handling XML based on its XML trees and/or XML data content<br />
  20. 20. XML Schema will be parsed only once<br />XML Schema will be cached in memory<br />No additional parsing<br />No additional validation<br />XML Document structure is known, therefore:<br />No parsing is needed when loaded from disk into memory<br />XML Object (XOB) structures can be applied<br />Memory footprint is much less compared to DOM structure<br />Needed specific nodes can now be handled efficiently in memory<br />XML Schema Registration Advantages<br />
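A hedged sketch of what XML Schema registration can look like, using `DBMS_XMLSCHEMA.registerSchema` and an object-relational XMLType table. The schema URL, the table name `case_docs`, the schema content and the `totalDigits` / `fractionDigits` values are all assumptions for illustration; only the element names `case_data`, `case` and `amount_charged` come from the deck.

```sql
BEGIN
  DBMS_XMLSCHEMA.registerSchema(
    schemaurl => 'http://example.com/case_data.xsd',   -- hypothetical URL
    schemadoc => '<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="case_data">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="case" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="amount_charged">
                <xs:simpleType>
                  <xs:restriction base="xs:decimal">
                    <xs:totalDigits value="12"/>
                    <xs:fractionDigits value="2"/>
                  </xs:restriction>
                </xs:simpleType>
              </xs:element>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>',
    local     => TRUE,
    gentypes  => TRUE,    -- generate the SQL object types for O.R. storage
    genbean   => FALSE,
    gentables => FALSE);  -- the table is created explicitly below
END;
/

-- Hypothetical XMLType table bound to the registered schema
CREATE TABLE case_docs OF XMLType
  XMLTYPE STORE AS OBJECT RELATIONAL
  XMLSCHEMA "http://example.com/case_data.xsd"
  ELEMENT "case_data";
```

Once the schema is registered, the database knows the document structure up front, which is what makes the XOB handling and query rewrite mentioned in the slides possible.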
  21. 21. XML Schema based - Query Rewrite<br />String<br />CHAR<br />bookstore<br />String<br />VARCHAR2<br />(20)<br />Float<br />CLOB<br />book<br />whitepaper<br />title<br />author<br />author<br />chapter<br />title<br />author<br />id<br />paragraph<br />NUMBER<br />(15)<br />content<br />content<br />
  22. 22. Checked on<br />XML Well-Formedness<br />One root element<br />Begin & End tags<br />If XML Schema reference<br />XOB methods will be used if an <br /> XML Schema is available<br />DOM methods will be used if an <br /> XML Schema information is <br /> not available <br />XMLType – Not just a “Datatype”<br />
  23. 23. Some XSD Design Rules<br />
  24. 24. Keep XML small !<br />Do not use / enforce Pretty Print if not needed<br />Avoid namespace reference “Overkill”<br />Most used Namespace is Leading <br />Use short Namespace References<br />Make XML data as “sparse” as possible<br />&lt;employee&gt;&lt;name&gt;Marco&lt;/name&gt;&lt;/employee&gt;<br />&lt;employee name="Marco"/&gt;<br />XML Data Partitioning<br />Binary XML if possible<br />Y<br />X<br />
  25. 25. XML Design<br />Avoid Cyclic References in XML Schemata<br />For ease of Maintenance: xdb:annotations<br />Is DOM validation, fidelity needed ?<br />CPU: XML parsing- XML Schema validation “overhead” ?<br />Index maintenance overhead, if implemented via disk<br />Y<br />X<br />
  26. 26. XML Document Handling<br />Shredding & Storing XML<br />
  27. 27. Check Total Amount<br />
  28. 28. XML Content<br />TABLE “B”<br />TABLE “A”<br />TABLE “C”<br />
  29. 29. Think in “3D” or in “Driving Table” terms<br />maxOccurs=“unbounded”<br /> Give me the &lt;title&gt; and &lt;content&gt; where &lt;content&gt; contains…<br />1<br />3<br />4<br />5<br />2<br />X<br />Y<br />6<br />Z<br />x n rows <br />
  30. 30. Checking the Amount…<br />
  31. 31. Used Setup<br /> OpenVMS <br /> Version<br /> 1.000 “Cases”<br />1) l_xpath := &apos;//case[&apos;||i||&apos;]/amount_charged/text()&apos; ;<br />2) l_xpath := &apos;/case_data/case[&apos;||i||&apos;]/amount_charged/text()&apos; ;<br />3) select sum(to_number(extract(value(tr),&apos;/case_data/case/amount_charged/text()&apos;))<br />All in memory: COLLECTION ITERATOR PICKLER FETCH<br />The Effect of // (for 1.000 “Cases”)<br />
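The three variants above contrast a `//` descendant search, an absolute path evaluated per case in a loop, and a single SQL statement. As a sketch of the "one SQL statement, absolute path" idea in current SQL/XML syntax, the query below sums all amounts in one pass. It assumes a hypothetical XMLType table named `case_docs` holding `case_data` documents; it is not the presenter's original statement.

```sql
-- Assumes a hypothetical XMLType table CASE_DOCS with <case_data> documents.
SELECT SUM(x.amount_charged) AS total_amount
FROM   case_docs t,
       XMLTABLE(
         '/case_data/case'            -- absolute path, no // descendant search
         PASSING t.OBJECT_VALUE
         COLUMNS amount_charged NUMBER PATH 'amount_charged'
       ) x;
```

Writing the full path gives the database exact knowledge of where the nodes live, whereas `//case` asks it to search the whole tree for matching descendants.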
  32. 32. CLOB XMLType (V<br />ORA-31186<br />
  33. 33. Effect of //<br />In memory<br />10.000 Cases:<br />ORA-31186<br /> Document contains too <br /> many nodes<br />maxOccurs=unbounded<br />maxLength, totalDigits, etc <br />Increasing volume – XMLType CLOB<br />ORA-31186: Document contains too many nodes<br />Cause: Unable to load the document because it has exceeded <br /> the maximum allocated number of DOM nodes.<br />Action: Reduce the size of the document<br />
  34. 34. XML Document Handling<br />Object Relational, Binary XML<br />
  35. 35. A Solution based on XMLType O.R.<br />Rewrite on Disk <br />/ XOB <br />(Relational)<br />CLOB<br /> Oracle <br />Advanced Queue<br />BLOB<br />Store in <br />ETL Tables<br />Oracle <br />Workflow<br />Validation<br />Against <br />XML Schema<br />Checks<br />XMLType Table<br />(O.R)<br />
  36. 36. Driving Access on CONTENT (11gR1, on Disk)<br />BTree Index<br />BTree Index<br />BTree Index<br />bookstore<br />Secondary Oracle Text Index<br />Function based Index (XPath)<br />BTree<br />Index<br />book<br />whitepaper<br />Unstructured<br />XMLIndex<br />title<br />author<br />author<br />chapter<br />title<br />author<br />id<br />paragraph<br />content<br />structured<br />content<br />
  37. 37. Can be influenced via <br />Statistics<br />Indexes<br />XML Schema Registration (XOB)<br />Encoding in Binary XML storage<br />SQL Re-Write of XPath, XQuery<br />Partitioning<br />Cost Based Optimizer Advantages<br />
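Two of these levers, indexes and statistics, can be sketched as follows. The example assumes a hypothetical XMLType table `case_docs_bin` stored as binary XML, because XMLIndex applies to unstructured (CLOB or binary XML) storage rather than to object-relational storage; the table name, index name and indexed path are illustrative only.

```sql
-- Hypothetical table using binary XML storage (available from 11g)
CREATE TABLE case_docs_bin OF XMLType
  XMLTYPE STORE AS BINARY XML;

-- XMLIndex restricted to the one path the queries actually drive on
CREATE INDEX case_docs_bin_xidx ON case_docs_bin (OBJECT_VALUE)
  INDEXTYPE IS XDB.XMLIndex
  PARAMETERS ('PATHS (INCLUDE (/case_data/case/amount_charged))');

-- Give the Cost Based Optimizer statistics on the table and its indexes
BEGIN
  DBMS_STATS.gather_table_stats(
    ownname => USER,
    tabname => 'CASE_DOCS_BIN',
    cascade => TRUE);
END;
/
```

Limiting the XMLIndex to the needed paths keeps index maintenance overhead down, which matters for the insert-heavy nightly load described in the use case.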
  38. 38. O.R. XMLType (V<br />ORA-31186<br />ORA-31186<br />
  39. 39. So why can DISK outperform MEMORY ?<br />XML Schema validation based on Registered XML Schema<br />Query re-write possible<br />Based on plain “old” SQL/database methods<br />Optimized CPU handling<br />Optimized Memory handling (if needed)<br />Multiple optimized solutions possible via Optimizer instead of one XML parser method<br />Specific parts of XML can be handled / be driven via: <br />specific indexing <br />or content<br />Full blown validation can be avoided<br />
  40. 40. Recap…<br />
  41. 41. Be aware of what you are doing !<br />Avoid unneeded (full) XML Schema validation<br />During Insert<br />Generating XML<br />Avoid Impedance mismatch<br />Java → XML → Java → XML → Relational → XML → Java<br />“All In One Go Objective”<br />Avoid intermediate XML fragments<br />//<br />XMLEXISTS<br />Use Indexes <br />xdb:maintainDOM="false"<br />Y<br />X<br />
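A small sketch of the XMLEXISTS recommendation: filtering documents with the SQL/XML function XMLExists (available from 11g) and an absolute path, so that no intermediate XML fragment has to be built just to test a condition. The table `case_docs` and the threshold value are assumptions for illustration.

```sql
-- Assumes a hypothetical XMLType table CASE_DOCS with <case_data> documents.
SELECT COUNT(*) AS docs_with_large_case
FROM   case_docs t
WHERE  XMLExists(
         '/case_data/case[amount_charged > 1000]'  -- hypothetical threshold
         PASSING t.OBJECT_VALUE
       );
```

Because the predicate is expressed as a path with a condition, the optimizer has the chance to answer it through query rewrite or an index instead of materializing each document in memory.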
  42. 42. XML Data Handling and Design<br />Handle XML Smart<br />Keep XML Small<br />Restrict XML where possible <br />Be precise !<br />maxOccurs, maxLength<br />Provide Oracle with extra / precise information (XSD)<br />Register XML Schema<br />If possible…<br />Y<br />X<br />
  43. 43. Balanced Design<br /><ul><li>Inserts, Updates & Deletes
  44. 44. XML Future Changes
  45. 45. Index Maintenance
  46. 46. Selects
  47. 47. In Memory
  48. 48. Via Indexes
  49. 49. XML Validation
  50. 50. Strict, Lazy
  51. 51. Client Side Possibilities</li></li></ul><li>Now you know why DISK can be faster than MEMORY<br />100.000 “Cases” shredded & validated in 5 minutes <br />Instead of 1.000 “Cases” in 3 minutes…<br />Avoiding <br />ORA-31186: Document contains too many nodes<br />Scalable <br />Efficient with Memory and CPU<br />Checked in production on a database version<br />Extra:<br />…decreased used PL/SQL code by half…<br />…but you will have to KNOW what you are doing…<br />
  52. 52. Oracle Open World 2009 - XMLDB Sessions<br />
  53. 53. References<br />XMLDB Developer's Guide<br />The XMLDB Forum<br />XML DB FAQ Thread<br />Blog<br />