Hotsos 2010 - The Ultimate Performance Challenge: How To Make Xml Perform?

XML has invaded or will invade your database, whether through business needs (e.g. XBRL), the desire for SOA architectures, or progressive-minded developers. Oracle has adapted and implemented functionality to support this need via Oracle XMLDB. A new performance challenge has been born.

Probably without realizing it, we have all the tools at hand, in our heads and via the more traditional way of approaching this challenge, to make XML actually work in our database.

This presentation will show and demonstrate, in detail, how you can use your relational knowledge to master this XML beast and make it perform the way you want. The cool thing about it: XML knowledge is not required. Isn't it data after all?

Or...as Chris Date mentioned during his Hotsos Symposium 2009 keynote address: "The foundation is there so why not use it?"

  • Square wheel → JSON?
  • Emp/Dept tables, Foreign/Primary Keys…Showing here ONLY 1 XML document…

    1. The Ultimate Performance Challenge: “How to Make XML Perform?!” Marco Gralike
    2. Agenda
    3. (no text)
    4. (no text)
    5. Agenda
    6. “XML is not a ‘fast’ thing, there is a ton of parsing involved. Sorry, I never saw the point in huge XML files – they are many times larger than they should be and the amount of work involved in parsing them is incredible.” Tom Kyte, January 9, 2009, AskTom
    7. “The foundation is there; So why not use it?” …referring to the Relational Model… Chris Date, Hotsos keynote, 2009
    8. Relational…
    9. XML…?
    10. Evolution…
    11. If you’re a performance nerd, this is actually cool… No one figured out XML yet… Solving the customer problem… Back to basics… Deeper understanding of the data handling issues… So why the “Hxxx” XML…?
    12. Agenda
    13. Free Format… “XML is cool”… (aka no design effort); Have to uphold the “Coding Granny Argument” (among others, meaningful names); Everyone for themselves…; Waiting for “Codd, Date”…; Square wheels…; What’s spoiling the soup…?
    14. Impedance Mismatch: Different data models. XPath models an XML document as a tree, while most general-purpose programming languages have no native data types for a tree. Different programming paradigms. XSLT is a functional language, while Java is object-oriented and Perl is procedural.
    15. Impedance Mismatch: Effects, costs. Unnecessary CPU and memory overhead; a lot of expensive type and encoding conversions.
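
The mismatch described on slides 14 and 15 can be made concrete with a small sketch. It uses Python's standard library purely for illustration (the deck itself targets Oracle XML DB), and the sample document and the `to_rows` helper are hypothetical. Every value arrives as text inside a tree, so the program has to traverse by path and convert types before it can work with the data.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; element names are illustrative only.
DOC = """<employees>
  <employee id="7"><name>Marco</name><salary>4200.50</salary></employee>
  <employee id="8"><name>Chris</name><salary>3900</salary></employee>
</employees>"""


def to_rows(xml_text):
    """Flatten the XML tree into typed (id, name, salary) tuples."""
    root = ET.fromstring(xml_text)          # text -> tree (parsing cost)
    rows = []
    for emp in root.findall("./employee"):  # tree traversal via a path
        rows.append((
            int(emp.get("id")),              # string -> int conversion
            emp.findtext("name"),            # stays a string
            float(emp.findtext("salary")),   # string -> float conversion
        ))
    return rows


rows = to_rows(DOC)
print(rows)
```

Each hop between representations (text, tree, native types) repeats this kind of work, which is the overhead the slides point at.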
    16. Agenda
    17. (no text)
    18. (no text)
    19. (no text)
    20. (no text)
    21. (no text)
    22. Containerization
    23. The “Dimensions” in 1 XML doc. [Diagram labels: 1, 2, 3, 4, 5, 6; X, Y, Z; n x rows; Elements with maxOccurs=“unbounded”]
    24. Multi Dimensional Issues… It’s a database…; It’s row based; It’s column based; It’s multiple databases…; More than 1 XML doc; Not uncommon: 1 MB >>
    25. XMLType – Not just a “Container”: Complexities of a database; “Relations”; “Redundancy”; “Nullology”; Design, etc…; It can contain a database; 10 MB or bigger nowadays; More often than less…; Enormously complex XSDs
    26. XMLType – Not just a “Container”: Checked on XML well-formedness (one root element; begin & end tags). If XML Schema reference: XOB methods will be used if an XML Schema is available; DOM methods will be used if registered XML Schema information is not available.
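
The two well-formedness checks named on slide 26 (one root element, matching begin and end tags) can be sketched as follows. This uses Python's standard-library parser as a stand-in, which is an assumption for illustration and not how XMLType performs the check. The helper name `is_well_formed` is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as one well-formed XML document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


ok = is_well_formed("<a><b/></a>")         # single root, tags balanced
two_roots = is_well_formed("<a/><b/>")     # violates "one root element"
bad_tags = is_well_formed("<a><b></a></b>")  # begin/end tags do not match
print(ok, two_roots, bad_tags)
```

Well-formedness says nothing about whether the document matches an XML Schema; that is a separate and more expensive validation step.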
    27. What you want in access… Fast DDL; Selects; Inserts, Deletes, Updates; Specific / Smart; Small XML Fragments; Direct Access
    28. Agenda
    29. Document contra Data Driven
    30. Structured / Semi-Structured [Diagram labels: Structured; Semi-Structured]
    31. Common XML Parsers: Often DOM or Infoset based; CPU intensive; Memory intensive; Serializing, parsing and tree traversals happen in memory…
    32. In Memory: Common XML Parsers. Often handle XML tree traversals via only ONE method; Not aware of whether the content is structured, semi-structured or unstructured XML; Not very “smart” / “content aware” regarding XML handling based on its XML trees and/or XML data content
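
As a rough illustration of the "everything happens in memory" point on slides 31 and 32, the sketch below counts elements twice: once by building a full DOM tree, and once by streaming and discarding each element after use. It is plain Python with the standard library, chosen as an assumption for illustration; `make_doc`, `count_with_dom` and `count_streaming` are hypothetical helper names, and no timing or memory figures are claimed.

```python
import io
import xml.etree.ElementTree as ET
from xml.dom import minidom


def make_doc(n):
    """Build a hypothetical document with n <row> elements."""
    rows = "".join(f"<row><id>{i}</id></row>" for i in range(n))
    return f"<rows>{rows}</rows>"


def count_with_dom(xml_text):
    """Parse the whole document into a DOM tree, then traverse it."""
    dom = minidom.parseString(xml_text)
    return len(dom.getElementsByTagName("row"))


def count_streaming(xml_text):
    """Handle one finished <row> at a time and drop its content."""
    count = 0
    source = io.BytesIO(xml_text.encode("utf-8"))
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "row":
            count += 1
            elem.clear()  # discard the element's text and children
    return count


doc = make_doc(1000)
dom_count = count_with_dom(doc)
stream_count = count_streaming(doc)
print(dom_count, stream_count)
```

Both approaches give the same answer; the difference is how much of the tree has to be held at once, which is what matters for the multi-megabyte documents mentioned earlier.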
    33. XMLType Physical Storage. CLOB: LOB; LOB index. Object Relational: Varray, Types, Nested Tables; IOT, B-Tree, XML Schema. Binary XML: LOB, LOB Index; Stored in Post Parse Representation
    34. Choosing a Storage Model
    35. [Diagram labels: XML Data Storage; Relational World; XMLDB World; XMLType column/tables; XMLType Views; Relational Tables; (Object) Relational Objects; Hybrid; CLOB; Obj.Rel.; Binary XML. Characteristics as listed on the slide: Mixed, complex[n], un/structured, XSD [y], B-Tree, IOT; Document, na, unstructured, XSD [n], XMLIndex; Content, complex[n], structured, XSD [y], B-Tree, IOT; Mixed, complex[y], un/structured, XSD [y/n], XMLIndex]
    36. (no text)
    37. Partition XML data [Diagram labels: EMPLOYEES_PROJ_TAB; PROJ_DETAILS_TAB; EMP_PROJ_P11; EMP_PROJ_P12; “employees”.“employee”; reference_id]
    38. XML Partitioning. Object Relational Partitioning; Equi-Partitioning since Oracle version 11.1.0.7.0; Binary XML Partitioning; Range, List, Hash; Local partitioned XMLIndex; LOCAL keyword in XMLIndex create syntax; XMLIndex is not supported for HASH partitioning; Partition Key on virtual column (Binary XML); Partition Key on column (Object Relational)
    39. Agenda
    40. Index Quick Sheet
    41. Unstructured XMLIndex (UXI). Path Table; Use Path Subsetting; Full-blown XMLIndex can be BIG; Token Tables (XDB.X$......); Query rewrite on Tokens; Fuzzy Searches, //; Optimizer Statistics; Can be maintained manually; Recorded in Pending Table; Secondary indexes possible [Diagram labels: Unstructured XMLIndex; f (x); Path Table]
    42. Path Table, INDEXED COLUMNS. PATH INDEX: (PATHID, RID), BTREE. ORDER INDEX: (RID, ORDER_KEY), BTREE. VALUE INDEX: (SUBSTRB("VALUE",1,1599)), FUNCTION BASED
    43. Not indexed: LOCATOR column, pointer to XML fragments (XDB.X$...). SECONDARY INDEXES [Diagram labels: Unstructured XMLIndex; f (x); Path Table]
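
To give a feel for what a path table holds, here is a minimal sketch that shreds a document into rows of path, document-order key and value. It is only an analogy in Python under stated assumptions: Oracle's actual path table uses PATHID, RID, ORDER_KEY, LOCATOR and VALUE columns in its own internal format, which this does not reproduce. The function `build_path_table` and the dotted order keys are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document loosely based on the bookstore example in the deck.
DOC = (
    "<bookstore>"
    "<book><title>XML DB</title><author>Marco</author></book>"
    "<whitepaper><title>Indexing</title></whitepaper>"
    "</bookstore>"
)


def build_path_table(xml_text):
    """Return one (path, order_key, value) row per element."""
    root = ET.fromstring(xml_text)
    rows = []

    def walk(elem, path, order_key):
        text = (elem.text or "").strip()
        rows.append((path, order_key, text or None))
        for pos, child in enumerate(elem, start=1):
            walk(child, f"{path}/{child.tag}", f"{order_key}.{pos}")

    walk(root, f"/{root.tag}", "1")
    return rows


path_table = build_path_table(DOC)

# A "//title"-style search becomes a filter on the path column.
titles = [value for path, _key, value in path_table if path.endswith("/title")]
print(titles)
```

Because every node becomes a row, a full index over a large document grows quickly, which is why the deck recommends path subsetting.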
    44. Structured XMLIndex (SXI). Content Table(s); Based on XMLTABLE syntax; XMLTable construct can be nested, but: only 1 extra XMLType allowed; VIRTUAL column is passed; Can be maintained manually; Secondary indexes possible [Diagram labels: Structured XMLIndex; f (x); Content Tables]
    45. Content Table(s), INDEXED COLUMNS. KEY INDEX: (KEY), Unique BTREE. RID INDEX: (RID), Non-Unique BTREE. Indexes needed for combined XMLIndex types; Mixing Unstructured and Structured XMLIndexes; Your defined columns; Secondary indexes [Diagram labels: Structured XMLIndex; f (x); Content Tables]
    46. Driving access on CONTENT [Diagram labels: bookstore; book; whitepaper; title; author; chapter; id; paragraph; content; structured content; BTree Index; Secondary Oracle Text Index; Function based Index (XPath); Structured XMLIndex; Unstructured XMLIndex]
    47. There can be only one XMLIndex…
    48. Agenda
    49. Design
    50. XML Schema Advantages: XML Schema will be parsed only once; If registered in the XDB Repository; XML Schema will be cached in memory (SGA); No additional parsing; No additional validation
    51. XML Schema Advantages: XML document structure is known, therefore: no parsing is needed when loaded from disk into memory; XML OBject (XOB) structures can be applied; memory footprint is much smaller compared to a DOM structure; specific nodes that are needed can now be handled efficiently in memory
    52. XDB Annotations. Hybrid: CLOB within OR
    53. XDB Annotations (OR/Binary XML). Levels: Root, Simpletype, Complextype; xmlns:xdb="http://xmlns.oracle.com/xdb"; xdb:storeVarrayAsTable; xdb:defaultTable; xdb:maintainDOM; xdb:maintainOrder; xdb:SQLInline; Oracle V.11.1.0.7.0, Partitioning: xdb:tableProps
    54. Mixing Logical and Physical Design
    55. XML Schema - Query Rewrite [Diagram labels: bookstore; book; whitepaper; title; author; chapter; id; paragraph; content; String; Float; CHAR; VARCHAR2 (20); NUMBER (15); CLOB]
    56. XML Design. Avoid cyclic references in XML Schemata; For ease of maintenance: xdb:annotations; Is DOM validation, fidelity needed?; CPU / XML parsing: XML Schema validation “overhead”?; Index maintenance overhead when using “disk” solutions
    57. Be aware of what you are doing! Avoid unneeded (full) XML Schema validation; During storage (inserts), generating XML; xdb:maintainDOM=false; Avoid impedance mismatch; Java → XML → Java → XML → Relational → XML → Java (“All In One Go Objective”); Avoid XML fragments; // and/or via XMLEXISTS; Use indexes
    58. Agenda
    59. Keep XML small. Do not use / enforce Pretty Print if not needed; Avoid namespace reference “overkill”; Most used namespace is leading; Use short namespace references (aliases); Make XML data as “sparse” as possible: <employee><name>Marco</name></employee> versus <employee name="Marco"/>; XML data partitioning; Binary XML if needed
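
The size effects listed on slide 59 are easy to measure. The sketch below takes the slide's own two `employee` forms and compares their byte counts with a pretty-printed version. It uses Python's standard library as an illustrative assumption; note that `toprettyxml` also prepends an XML declaration, so the pretty-printed figure includes that as well as the added whitespace.

```python
from xml.dom import minidom

# The two forms shown on the slide.
element_form = "<employee><name>Marco</name></employee>"
attribute_form = '<employee name="Marco"/>'

# Pretty printing adds newlines and indentation that carry no data
# (minidom also adds an XML declaration line).
pretty = minidom.parseString(element_form).toprettyxml(indent="  ")

sizes = {
    "element form": len(element_form.encode("utf-8")),
    "attribute form": len(attribute_form.encode("utf-8")),
    "pretty printed": len(pretty.encode("utf-8")),
}
for label, size in sizes.items():
    print(f"{label}: {size} bytes")
```

On a single element the difference is a handful of bytes; multiplied over elements with maxOccurs="unbounded" in a multi-megabyte document, it becomes parsing and storage work.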
    60. Keep XML small (OR specific). Don’t use “meaningful element names”; 64 KB DDL “create table” buffer; ORA-01792: maximum number of columns in a table or view is 1000; Break XML up; Out of line; CLOB (unstructured); Data that is not accessed; Don’t create objects if you don’t need them; Use xdb:defaultTable="" for global types
    61. Holistic Approach (Recap)
    62. Customer Use Case [Diagram labels: Memory / DOM; Memory / DOM; CLOB; Oracle Advanced Queue; XMLType; BLOB; Process; Checks; Validation XML Schema (JAVA); Store in ETL Tables; Shred Elements Via XMLDOM]
    63. Duration (1000 Cases)
    64. New XML Approach [Diagram labels: Rewrite on Disk / XOB (Relational); CLOB; Oracle Advanced Queue; BLOB; Store in ETL Tables; Oracle Workflow; Validation Against XML Schema; Checks; XMLType Table (O.R)]
    65. Using the CBO as an XML Parser… ORA-31186: Document contains too many nodes. Cause: Unable to load the document because it has exceeded the maximum allocated number of DOM nodes.
    66. Using the (XML) Relational Mindset. Design XSD as you would with E(E)R; Design for proper physical access and performance: Storage, Index, Content Awareness, Partitioning; Overkill of “meaningful” data parsing; Avoid redundancy, whitespace, “Pretty Print”; Design with the future in mind
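
The relational mindset of slide 66 amounts to parsing a document once, storing its repeating elements as typed rows, and letting indexes answer later queries. The sketch below shows that idea with Python and an in-memory SQLite database. Both are stand-ins chosen for illustration, not the Oracle object-relational XMLType storage the deck describes, and the table, index and element names are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical document; in the deck this role is played by XMLType data.
DOC = """<employees>
  <employee id="1"><name>Marco</name><dept>XMLDB</dept></employee>
  <employee id="2"><name>Chris</name><dept>Relational</dept></employee>
  <employee id="3"><name>Tom</name><dept>Relational</dept></employee>
</employees>"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)"
)
conn.execute("CREATE INDEX employee_dept_idx ON employee (dept)")

# Parse once and store each repeating element as one typed row.
root = ET.fromstring(DOC)
conn.executemany(
    "INSERT INTO employee (id, name, dept) VALUES (?, ?, ?)",
    [
        (int(e.get("id")), e.findtext("name"), e.findtext("dept"))
        for e in root.iter("employee")
    ],
)

# Later queries go through the table and its index, not the XML text.
names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM employee WHERE dept = ? ORDER BY id",
        ("Relational",),
    )
]
print(names)
```

The design choice mirrors the deck's argument: once the data sits in rows with proper keys and indexes, ordinary relational access paths apply.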
    67. So in short: Balanced Design. Inserts, Updates & Deletes; XML Future Changes; Index Maintenance; Selects; In Memory; Via Indexes; XML Validation; Strict, Lazy; Client Side Possibilities
    68. Reward. Optimal performance; Outperforming XML; Proper design will give you a 10- or 100-fold performance increase over XML handling… also known as… ehh… standard relational database performance…
    69. (no text)
    70. References. Oracle XML DB: http://www.oracle.com/pls/db112/homepage; XML DB FAQ Thread: http://forums.oracle.com/forums/thread.jspa?threadID=410714; Blog: http://technology.amis.nl/blog and http://blog.gralike.com
