BGOUG 2012 - Design concepts for xml applications that will perform

  1. Oracle XML Database: Design Concepts for XML Applications That Will Perform! (Marco Gralike)
  2. Or a short story: “Why XML on Disk can be faster than XML in Memory…”
  3. A Customer Use-Case
  4. Customer Case. Initial state: poor performance; 12,000 “Cases” per night (4-hour window); 4 hours are not enough anymore; the “XML” part “looks like it takes too long”; original database system version 8.1.X. Future wishes: the need to be able to handle 120,000 “Cases” per night; in the near future, a hardware/OS move from OpenVMS to HP-UX.
  5. An overview. [Diagram: the original in-memory (DOM) design: Oracle Advanced Queue, BLOB, CLOB and XMLType; a validation process with XML Schema checks via XMLDOM; shredding of elements and storage in Oracle ETL tables; a Java workflow.]
  6. 10,000 “Cases” (~10 MB in size)
  7. How expensive are 1,000 “Cases”?
  8. The Cost of Mixing Worlds
  9. BLOB2CLOB and CLOB2XMLType
  10. Feeding data to the database. [Diagram: Memory / DOM; Oracle Advanced Queue, BLOB, CLOB, XMLType.] Why BLOB? XML data and PDF data. Why CLOB? A conversion is needed for XML handling. Why XMLType? It is needed to check XML element content and for XML validation (well-formedness).
  11. Impedance Mismatch. Different data models: XPath models an XML document as a tree, while most general-purpose programming languages have no native data types for a tree. Different programming paradigms: XSLT is a functional language, while Java is object-oriented and Perl is a procedural one. Effect/costs: unnecessary CPU and memory overhead, and a lot of expensive type and encoding conversions.
  12. The General Rule! If you deal with XML, handle it via XML (DB). So if it is relational, do it the relational way… If it is XML, use XQuery, or others like XPath, etc… If you mix worlds, be careful regarding: information loss (PK/FK vs. XML?); whitespace turned into NULL and back into whitespace?; impedance mismatch.
  13. XML Document Validation
  14. Validate an XML Document via its XML Schema
  15. Validation on content and structure. [Diagram: Memory / DOM; validation process; shred elements; XMLType.] XML Schema checks via XMLDOM (Java based): XML Schema validation on XML structure, through a PL/SQL wrapper around a Java XML parser.
  16. Java XML Parser
  17. XML Parsers. Often DOM or Infoset based; CPU intensive; memory intensive: parsing, serializing and tree traversals happen in memory. They often handle XML tree traversals via only ONE method: they are not aware of whether the XML content is structured, semi-structured or unstructured, and they are not very “smart”/“content aware” in handling XML based on its tree and/or its data content.
  18. XML Schema Registration advantages. The XML Schema is parsed only once and cached in memory: no additional parsing, no additional validation. The XML document structure is known, therefore: no parsing is needed when the document is loaded from disk into memory; XML Object (XOB) structures can be applied; the memory footprint is much smaller than that of a DOM structure; the specific nodes that are needed can be handled efficiently in memory.
  19. XML Schema based Query Rewrite. [Diagram: the bookstore XML Schema tree (bookstore, book, whitepaper, title, author, chapter, id, paragraph, content) with XML Schema types (String, Float) mapped to SQL types such as CHAR, VARCHAR2(20), NUMBER(15) and CLOB.]
  20. XMLType: Not just a “Datatype”. It is checked on XML well-formedness: one root element; matching begin and end tags. If an XML Schema is referenced: XOB methods will be used if an XML Schema is available; DOM methods will be used if XML Schema information is not available.
  21. Some XSD Design Rules
  22. Keep XML small! Do not use/enforce pretty print if not needed. Avoid namespace reference “overkill”: make the most used namespace the leading one and use short namespace references. Make XML data as “sparse” as possible: <employee name="Marco"/> instead of <employee><name>Marco</name></employee>. Use XML data partitioning, and Binary XML if possible.
  23. XML Design. Avoid cyclic references in XML Schemata. For ease of maintenance: xdb:annotations. Is DOM validation/fidelity needed? CPU: XML parsing and XML Schema validation “overhead”? Index maintenance overhead, if implemented via disk.
  24. XML Document Handling: Shredding & Storing XML
  25. Check Total Amount
  26. XML Content. [Diagram: XML content shredded into TABLE “A”, TABLE “B” and TABLE “C”.]
  27. Think in “3D” or in “Driving Table” terms: maxoccurs="unbounded". Give me the <title> and <content> where <content> contains… [Diagram: numbered steps over nested unbounded collections, resulting in “x n rows”.]
  28. Checking the Amount…
  29. The Effect of // (for 1,000 “Cases”). Setup used: OpenVMS, version 9.2.0.5.0, 1,000 “Cases”. 1) l_xpath := '//case['||i||']/amount_charged/text()'; 2) l_xpath := '/case_data/case['||i||']/amount_charged/text()'; 3) select sum(to_number(extract(value(tr), '/case_data/case/amount_charged/text()'))). All in memory: COLLECTION ITERATOR PICKLER FETCH.
  30. CLOB XMLType (V 11.1.0.6.0): ORA-31186
  31. Increasing volume: XMLType CLOB, effect of //, in memory. 10,000 Cases: ORA-31186 Document contains too many nodes (maxoccurs=unbounded; maxLength, totalDigits, etc.). ORA-31186: Document contains too many nodes. Cause: Unable to load the document because it has exceeded the maximum allocated number of DOM nodes. Action: Reduce the size of the document.
  32. XML Document Handling: Object Relational, Binary XML
  33. A Solution based on XMLType O.R. [Diagram: Oracle Advanced Queue, BLOB and CLOB; validation checks against an (O.R.) XMLType table based on the XML Schema; storage in Oracle ETL tables; workflow; query rewrite on disk / XOB (relational).]
  34. Driving Access on CONTENT (…on disk…). [Diagram: the bookstore tree (bookstore, book, whitepaper, title, author, chapter, id, paragraph, content) with B-tree indexes, a function-based index (XPath), an XMLIndex for (un)structured content, a B-tree secondary index on structured content and an Oracle Text index.]
  35. Cost Based Optimizer advantages. It can be influenced via: statistics; indexes; XML Schema registration (XOB); encoding in Binary XML storage; SQL rewrite of XPath and XQuery; partitioning.
  36. O.R. XMLType (V 11.1.0.6.0): ORA-31186, ORA-31186
  37. So why can DISK outperform MEMORY? XML Schema validation is based on a registered XML Schema. Query rewrite is possible, based on plain “old” SQL/database methods: optimized CPU handling; optimized memory handling (if needed); multiple optimized solutions possible via the optimizer instead of one XML parser method. Specific parts of XML can be handled/driven via specific indexing or content. Full-blown validation can be avoided.
  38. Recap…
  39. Be aware of what you are doing! Avoid unneeded (full) XML Schema validation: during insert; when generating XML. Avoid impedance mismatch (Java → XML → Java → XML → Relational → XML → Java). Aim for an “All In One Go” objective: avoid intermediate XML fragments (XMLEXISTS instead of //). Use indexes. Use xdb:MaintainDOM=false.
  40. XML Data Handling and Design. Handle XML smart. Keep XML small. Restrict XML where possible. Be precise! (maxoccurs, maxLength). Provide Oracle with extra/precise information (XSD). Register the XML Schema, if possible…
  41. Balanced Design. Inserts, updates & deletes: in XML; future changes; in memory or on disk; index maintenance. Selects: in memory; via indexes. XML validation: strict or lazy; client-side possibilities.
  42. Now you know why DISK can be faster than MEMORY: 100,000 “Cases” shredded & validated in 5 minutes, instead of 1,000 “Cases” in 3 minutes… Avoiding ORA-31186: Document contains too many nodes. Scalable: efficient with memory and CPU. Checked in production on a 9.2.0.5.0 database version. Extra: …decreased the PL/SQL code used by half… …but you will have to KNOW what you are doing…
  44. References: XML DB Developer's Guide, http://www.oracle.com/pls/db112/homepage; The XML DB Forum, http://forums.oracle.com/forums/forum.jspa?forumID=34; XML DB FAQ Thread, http://forums.oracle.com/forums/thread.jspa?threadID=410714; Blog, http://www.xmldb.nl
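
Slide 10 describes raw bytes arriving as a BLOB, converted to a CLOB and then to XMLType. As a rough analogy outside the database, the minimal Python sketch below walks the same steps with the standard library: bytes are decoded into a string (one full copy) and the string is parsed into a tree, which only succeeds for well-formed XML. The variable names and the sample document are hypothetical; this is not Oracle's conversion code.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a BLOB column: raw bytes as they arrive from the queue
blob = "<case><amount_charged>12.50</amount_charged></case>".encode("utf-8")

# "BLOB to CLOB" step: decode the bytes into a character string (a full copy)
clob = blob.decode("utf-8")

# "CLOB to XMLType" step: parse the string into a tree; malformed XML raises ParseError
doc_from_clob = ET.fromstring(clob)

# ElementTree can also parse the bytes directly, skipping the intermediate string copy
doc_from_blob = ET.fromstring(blob)

print(doc_from_clob.findtext("amount_charged"))  # 12.50
```

Parsing the bytes directly avoids one intermediate copy; whether that is possible in a real pipeline depends on where the character-set conversion has to happen.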
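
Slide 12 warns that mixing the XML and relational worlds can lose information, for example whitespace turning into NULL and back. The sketch below assumes a common but hypothetical mapping rule (trim text and store empty strings as NULL, here Python's None) and shows that a remark holding a single space comes back as an empty element after a round trip through a column.

```python
import xml.etree.ElementTree as ET

def to_column(text):
    """Hypothetical relational mapping: trim text, store empty values as NULL (None)."""
    if text is None:
        return None
    trimmed = text.strip()
    return trimmed if trimmed else None

def to_element(tag, value):
    """Rebuild an XML element from a column value; NULL becomes an empty element."""
    elem = ET.Element(tag)
    elem.text = value
    return ET.tostring(elem, encoding="unicode")

original = ET.fromstring("<remark> </remark>")
column_value = to_column(original.text)       # None: the single space is gone
rebuilt = to_element("remark", column_value)  # '<remark />'

print(repr(original.text), column_value, rebuilt)
```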
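
To make slide 17's point about DOM memory use concrete, the sketch below parses a hypothetical case_data document with Python's xml.dom.minidom and counts every node the DOM keeps in memory. The count grows linearly with the number of cases, and the whole tree exists before any traversal starts.

```python
from xml.dom import minidom

def build_cases(n):
    """Hypothetical document with n <case> elements, each holding one amount."""
    cases = "".join(
        f"<case><amount_charged>{i}</amount_charged></case>" for i in range(n)
    )
    return f"<case_data>{cases}</case_data>"

def count_nodes(node):
    """Count every DOM node (document, elements, text) held in memory."""
    return 1 + sum(count_nodes(child) for child in node.childNodes)

for n in (10, 100, 1000):
    dom = minidom.parseString(build_cases(n))
    # 1 document node + 1 root element + 3 nodes per case (case, amount_charged, text)
    print(n, count_nodes(dom))
# 10 32
# 100 302
# 1000 3002
```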
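
Slide 18's first advantage, parsing the schema once and keeping it cached, can be mimicked in Python with functools.lru_cache. In the hypothetical sketch below, registered_schema stands in for schema registration; it only parses the XSD text with ElementTree (no compilation or validation happens), and a counter shows that 1,000 lookups trigger a single parse.

```python
import functools
import xml.etree.ElementTree as ET

parse_calls = 0  # how often the schema text was actually parsed

@functools.lru_cache(maxsize=None)
def registered_schema(xsd_text):
    """Parse a schema once and reuse the cached tree (a stand-in for registration)."""
    global parse_calls
    parse_calls += 1
    return ET.fromstring(xsd_text)

XSD = (
    '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">'
    '<xs:element name="case"/>'
    "</xs:schema>"
)

for _ in range(1000):  # e.g. one lookup per incoming document
    schema = registered_schema(XSD)

print(parse_calls)  # 1
```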
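
Slide 20 says XMLType checks well-formedness: one root element and matching begin and end tags. The minimal Python sketch below performs the same kind of check with ElementTree: a document either parses or raises ParseError. It checks well-formedness only; validation against an XML Schema is a separate, more expensive step.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if text parses as XML (one root element, matching tags)."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

print(is_well_formed("<case><amount/></case>"))  # True
print(is_well_formed("<case/><case/>"))          # False: two root elements
print(is_well_formed("<case><amount></case>"))   # False: end tag does not match
```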
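
Slide 22's size advice can be checked with simple arithmetic. The Python sketch below compares the byte size of the element-based and attribute-based employee examples from the slide, and shows how pretty printing (ElementTree.indent, available since Python 3.9) inflates an otherwise identical document. The staff wrapper element is hypothetical.

```python
import xml.etree.ElementTree as ET

verbose = "<employee><name>Marco</name></employee>"
sparse = '<employee name="Marco"/>'

def size_in_bytes(xml_text):
    return len(xml_text.encode("utf-8"))

print(size_in_bytes(verbose), size_in_bytes(sparse))  # 39 24

# Pretty printing adds whitespace that must be stored, transferred and parsed
staff = ET.fromstring("<staff>" + verbose * 3 + "</staff>")
flat = ET.tostring(staff, encoding="unicode")
ET.indent(staff)  # available since Python 3.9
pretty = ET.tostring(staff, encoding="unicode")
print(size_in_bytes(flat), size_in_bytes(pretty))
```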
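
Slides 26 and 27 shred XML content into several tables and query it in “driving table” terms. The sketch below is a hypothetical example that uses Python's sqlite3 module instead of Oracle: it shreds a small bookstore document into a book table and a chapter table (one row per element of the unbounded collection), then answers “give me the title and content where content contains …” with one join driven by the chapter rows. Table and column names are made up for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical bookstore document: each <book> has an unbounded list of <chapter>s
XML = """<bookstore>
  <book id="1"><title>XML DB</title>
    <chapter><content>Storing XML on disk</content></chapter>
    <chapter><content>Indexes</content></chapter>
  </book>
  <book id="2"><title>SQL</title>
    <chapter><content>Joins</content></chapter>
  </book>
</bookstore>"""

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT)")
db.execute("CREATE TABLE chapter (book_id INTEGER REFERENCES book(id), content TEXT)")

# Shred: one row per book, one row per chapter (the unbounded collection)
for book in ET.fromstring(XML).iterfind("book"):
    book_id = int(book.get("id"))
    db.execute("INSERT INTO book VALUES (?, ?)", (book_id, book.findtext("title")))
    for content in book.iterfind("chapter/content"):
        db.execute("INSERT INTO chapter VALUES (?, ?)", (book_id, content.text))

# "Give me the <title> and <content> where <content> contains ..."
rows = db.execute(
    "SELECT b.title, c.content FROM chapter c JOIN book b ON b.id = c.book_id "
    "WHERE c.content LIKE ?",
    ("%XML%",),
).fetchall()
print(rows)  # [('XML DB', 'Storing XML on disk')]
```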
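
Slide 29 compares three ways of summing amount_charged. Python's ElementTree supports a subset of XPath (including // and [position] predicates), so the same three access patterns can be sketched on a hypothetical three-case document. Only the third reads every amount in a single pass; the first two issue one path lookup per case, and the // variant searches all descendants each time. This illustrates the access pattern only, not Oracle's execution plans or timings.

```python
import xml.etree.ElementTree as ET

# Hypothetical case_data document with three amounts
doc = ET.fromstring(
    "<case_data>"
    "<case><amount_charged>10</amount_charged></case>"
    "<case><amount_charged>20.5</amount_charged></case>"
    "<case><amount_charged>30</amount_charged></case>"
    "</case_data>"
)
n = len(doc.findall("case"))

# 1) Like //case[i]: each iteration searches all descendants again
total_descendant = sum(
    float(doc.findtext(f".//case[{i}]/amount_charged")) for i in range(1, n + 1)
)

# 2) Like /case_data/case[i]: a precise child path, but still one lookup per case
total_child_path = sum(
    float(doc.findtext(f"case[{i}]/amount_charged")) for i in range(1, n + 1)
)

# 3) Like one set-based SUM over /case_data/case/amount_charged: a single pass
total_one_pass = sum(float(e.text) for e in doc.iterfind("case/amount_charged"))

print(total_descendant, total_child_path, total_one_pass)  # 60.5 60.5 60.5
```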
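
According to slide 31, ORA-31186 is raised when a document exceeds the maximum allocated number of DOM nodes. A client-side guard in the same spirit is sketched below: it reads the document with ElementTree.iterparse, counts elements (text nodes are not counted), and rejects the document as soon as a limit is exceeded, without parsing the rest. MAX_NODES is a hypothetical value chosen for the example, not Oracle's real limit.

```python
import io
import xml.etree.ElementTree as ET

MAX_NODES = 5  # hypothetical limit for the example; not Oracle's actual DOM node limit

class TooManyNodes(Exception):
    pass

def count_elements(xml_bytes, limit=MAX_NODES):
    """Count elements while parsing and stop as soon as the limit is exceeded."""
    count = 0
    for _event, _elem in ET.iterparse(io.BytesIO(xml_bytes), events=("start",)):
        count += 1
        if count > limit:
            raise TooManyNodes(f"more than {limit} elements")
    return count

print(count_elements(b"<case_data><case/><case/></case_data>"))  # 3
try:
    count_elements(b"<case_data>" + b"<case/>" * 10 + b"</case_data>")
except TooManyNodes as exc:
    print("rejected:", exc)  # rejected: more than 5 elements
```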
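
Slide 34 drives access through indexes on disk rather than through a tree walk in memory. Oracle's XMLIndex, function-based and Oracle Text indexes cannot be shown outside the database, so the sketch below swaps in SQLite (Python's sqlite3 module) and a plain B-tree index on a hypothetical shredded book table. EXPLAIN QUERY PLAN shows the optimizer switching from a full scan to an index search once the index exists; the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT)")
db.executemany(
    "INSERT INTO book (title) VALUES (?)", [(f"title {i}",) for i in range(1000)]
)

def plan(sql, params=()):
    """Return the detail column of SQLite's EXPLAIN QUERY PLAN output."""
    return " | ".join(row[3] for row in db.execute("EXPLAIN QUERY PLAN " + sql, params))

query = "SELECT id FROM book WHERE title = ?"
before_plan = plan(query, ("title 42",))  # a full table scan
db.execute("CREATE INDEX idx_book_title ON book (title)")
after_plan = plan(query, ("title 42",))   # a search using idx_book_title

print(before_plan)
print(after_plan)
```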
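
Slide 40 asks for precise schemas (maxoccurs, maxLength), and slide 31 lists maxoccurs=unbounded as a cause of too many nodes. Python's standard library has no XML Schema validator, so the sketch below hand-checks just these two facets against a hypothetical CONSTRAINTS table. It is not a substitute for real XSD validation, only an illustration of what bounded declarations let a processor check early.

```python
import xml.etree.ElementTree as ET

# Hypothetical constraints, as they might be declared in an XSD:
# tag -> (maxOccurs within one parent, maxLength of the element's text)
CONSTRAINTS = {"case": (3, None), "remark": (1, 20)}

def violations(root):
    """Report maxOccurs/maxLength violations; not a full XML Schema validation."""
    problems = []
    for parent in root.iter():
        for tag, (max_occurs, max_length) in CONSTRAINTS.items():
            children = parent.findall(tag)
            if len(children) > max_occurs:
                problems.append(
                    f"<{parent.tag}> has {len(children)} <{tag}> (max {max_occurs})"
                )
            if max_length is None:
                continue
            for child in children:
                if len(child.text or "") > max_length:
                    problems.append(f"<{tag}> text longer than {max_length}")
    return problems

doc = ET.fromstring(
    "<case_data>"
    + "<case/>" * 3
    + "<case><remark>" + "x" * 25 + "</remark></case>"
    + "</case_data>"
)
print(violations(doc))
# ['<case_data> has 4 <case> (max 3)', '<remark> text longer than 20']
```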
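
The figures on slides 4 and 42 can be compared with a little arithmetic. The sketch below assumes that throughput scales linearly with the number of cases, which the slides do not state, and that the 4-hour window from slide 4 applies to the whole 120,000-case target even though the XML handling is only part of the nightly work.

```python
def per_minute(cases, minutes):
    """Average throughput in cases per minute."""
    return cases / minutes

before = per_minute(1_000, 3)           # ~333 cases/minute (slide 42, old approach)
after = per_minute(100_000, 5)          # 20,000 cases/minute (slide 42, new approach)
speedup = after / before                # ~60x
required = per_minute(120_000, 4 * 60)  # 500 cases/minute for 120,000 in 4 hours
minutes_for_target = 120_000 / after    # minutes needed at the new rate, if linear

print(round(before), round(after), round(speedup), required, minutes_for_target)
# 333 20000 60 500.0 6.0
```

Under these assumptions the old rate (about 333 cases per minute) falls short of the 500 per minute the 120,000-case target needs, while the new rate leaves a wide margin.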
