XML has invaded your database, or soon will, whether through business needs such as XBRL, a drive toward SOA, or progressive-minded developers. Oracle has adapted by implementing functionality to support this need in Oracle XML DB. A new performance challenge has been born.
Probably without realizing it, we already have all the tools at hand, both in our heads and in the more traditional way of approaching this challenge, to make XML actually work in our database.
This presentation demonstrates, in detail, how you can use your relational knowledge to master this XML beast and make it perform the way you want. The cool thing about it: XML knowledge is not required. Isn't it data, after all?
Or, as Chris Date said during his Hotsos Symposium 2009 keynote address: "The foundation is there so why not use it?"
6. “XML is not a ‘fast’ thing, there is a ton of parsing involved. Sorry, I never saw the point in huge XML files – they are many times larger than they should be and the amount of work involved in parsing them is incredible.” Tom Kyte, January 9, 2009, AskTom
7. “The foundation is there; So why not use it?” (referring to the Relational Model) Chris Date, Hotsos keynote, 2009
11. So why the “Hxxx” XML…? If you’re a performance nerd, this is actually cool… No one has figured out XML yet… Solving the customer problem… Back to basics… Deeper understanding of the data handling issues…
13. What’s spoiling the soup…? Free format… “XML is cool”… (aka no design effort). Having to uphold the “Coding Granny Argument” (among others: meaningful names). Everyone for themselves… Waiting for “Codd, Date”… Square wheels…
14. Impedance Mismatch. Different data models: XPath models an XML document as a tree, while most general-purpose programming languages have no native data type for a tree. Different programming paradigms: XSLT is a functional language, while Java is object-oriented and Perl is procedural.
15. Impedance Mismatch: Effects, Costs. Unnecessary CPU and memory overhead. A lot of expensive type and encoding conversions.
23. The “Dimensions” in 1 XML doc. [Figure: diagram with elements numbered 1 to 6 along the dimensions X, Y and Z.] Elements with maxOccurs="unbounded": n × rows.
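The row multiplication behind these "dimensions" can be sketched in a few lines. This is a minimal illustration in Python rather than Oracle SQL, and the document, element names and values are all hypothetical: two repeating (unbounded) elements under one parent yield one relational row per combination once the document is flattened.

```python
import xml.etree.ElementTree as ET
from itertools import product

# Hypothetical document: two repeating elements under one parent.
DOC = """<department name="Sales">
  <employee>Ann</employee>
  <employee>Bob</employee>
  <employee>Cy</employee>
  <project>P1</project>
  <project>P2</project>
</department>"""


def flatten(xml_text):
    """Return one (department, employee, project) row per combination."""
    root = ET.fromstring(xml_text)
    employees = [e.text for e in root.findall("employee")]
    projects = [p.text for p in root.findall("project")]
    # Each extra unbounded element multiplies the number of rows.
    return [(root.get("name"), e, p) for e, p in product(employees, projects)]


rows = flatten(DOC)
print(len(rows), "rows from", len(DOC), "characters of XML")
```

Three employees and two projects give 3 × 2 rows; a third repeating element would multiply the count again, which is why a single document can behave like a small database.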
24. Multi-Dimensional Issues… It’s a database… It’s row based. It’s column based. It’s multiple databases… More than 1 XML doc. Not uncommon: 1 MB and up.
25. XMLType – Not just a “Container”. Complexities of a database: “Relations”, “Redundancy”, “Nullology”, design, etc… It can contain a database. 10 MB or bigger nowadays, more often than not… Enormously complex XSDs.
26. XMLType – Not just a “Container”. Checked for XML well-formedness: one root element; begin and end tags. If there is an XML Schema reference: XOB methods will be used if an XML Schema is available; DOM methods will be used if registered XML Schema information is not available.
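The two well-formedness rules named on this slide (a single root element, matching begin and end tags) can be tried out with any XML parser. The following is a minimal sketch using Python's standard library, not Oracle's XMLType check; the helper name `is_well_formed` is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as a well-formed XML document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed("<a><b>ok</b></a>"))      # one root, matching tags
print(is_well_formed("<a>one</a><b>two</b>"))  # two root elements
print(is_well_formed("<a><b>oops</a>"))        # mismatched end tag
```

Well-formedness says nothing about whether the content is valid against an XML Schema; that is a separate and more expensive check.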
27. What you want in access… Fast: DDL; selects; inserts, deletes, updates. Specific / smart: small XML fragments; direct access.
31. Common XML Parsers. Often DOM or Infoset based. CPU intensive. Memory intensive. Serializing, parsing and tree traversals all happen in memory…
32. In Memory: Common XML Parsers. Often handle XML tree traversals via only ONE method. Not aware of whether the XML content is structured, semi-structured or unstructured. Not very “smart” / “content aware”: XML handling is not based on the XML trees and/or the XML data content.
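To make the memory point concrete, here is a small sketch contrasting a DOM-style parse with a streaming parse. It uses Python's standard library as a stand-in for "common XML parsers"; the synthetic document and the function names are assumptions for illustration, and no claim is made about how Oracle handles this internally.

```python
import io
import xml.etree.ElementTree as ET
from xml.dom import minidom


def build_doc(n):
    """Build a synthetic document with n <row> elements (illustrative data)."""
    rows = "".join(f"<row><id>{i}</id></row>" for i in range(n))
    return f"<rows>{rows}</rows>"


def count_rows_dom(xml_text):
    """Load the whole document into a DOM tree, then traverse it."""
    dom = minidom.parseString(xml_text)
    return len(dom.getElementsByTagName("row"))


def count_rows_streaming(xml_text):
    """Stream the document and discard each <row> once it has been counted."""
    count = 0
    source = io.BytesIO(xml_text.encode("utf-8"))
    for _event, elem in ET.iterparse(source):  # default: "end" events only
        if elem.tag == "row":
            count += 1
            elem.clear()  # drop the row's children and text
    return count


doc = build_doc(1000)
print(count_rows_dom(doc), count_rows_streaming(doc))
```

Both functions return the same count, but the DOM version holds every node in memory at once, while the streaming version only needs the element it is currently looking at.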
33. XMLType Physical Storage. CLOB: LOB, LOB index. Object Relational: varrays, types, nested tables; IOT, B-Tree; XML Schema. Binary XML: LOB, LOB index; stored in post-parse representation.
35. XML Data Storage. XMLDB World: XMLType columns/tables, stored as Hybrid, CLOB, Obj.Rel. or Binary XML. Relational World: XMLType views over (object) relational objects and relational tables.
Hybrid: Mixed; complex [n]; un/structured; XSD [y]; B-Tree, IOT.
CLOB: Document; complex n/a; unstructured; XSD [n]; XMLIndex.
Obj.Rel.: Content; complex [n]; structured; XSD [y]; B-Tree, IOT.
Binary XML: Mixed; complex [y]; un/structured; XSD [y/n]; XMLIndex.
37. Partition XML data. [Figure: tables EMPLOYEES_PROJ_TAB and PROJ_DETAILS_TAB with partitions EMP_PROJ_P11 and EMP_PROJ_P12, related via “employees”.“employee” and reference_id.]
38. XML Partitioning. Object Relational partitioning: equi-partitioning since Oracle version 11.1.0.7.0. Binary XML partitioning: range, list, hash. Local partitioned XMLIndex: LOCAL keyword in the XMLIndex create syntax; XMLIndex is not supported for HASH partitioning. Partition key on a virtual column (Binary XML). Partition key on a column (Object Relational).
43. Unstructured XMLIndex: Path Table. Function based, f(x). Not indexed: LOCATOR column, pointer to XML fragments (XDB.X$...). Secondary indexes.
44. Structured XMLIndex (SXI): Content Table(s). Based on XMLTABLE syntax. The XMLTable construct can be nested, but: only 1 extra XMLType allowed; VIRTUAL column is passed. Can be maintained manually. Secondary indexes possible.
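The idea of a content table can be sketched outside Oracle: shred the XML once into relational rows, index the columns you query on, and let lookups run against the table instead of the XML. The example below is an analogy only, using Python's `sqlite3` and `xml.etree.ElementTree`; it is not the XMLIndex implementation, and the table name, index name and sample data are hypothetical (the bookstore element names echo the deck's later example).

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample data.
DOC = """<bookstore>
  <book id="1"><title>SQL Basics</title><author>Ann</author></book>
  <book id="2"><title>XML Tuning</title><author>Bob</author></book>
  <book id="3"><title>Indexing</title><author>Ann</author></book>
</bookstore>"""


def build_content_table(conn, xml_text):
    """Shred each <book> into one row of a relational content table."""
    conn.execute(
        "CREATE TABLE book_content "
        "(id INTEGER PRIMARY KEY, title TEXT, author TEXT)"
    )
    root = ET.fromstring(xml_text)
    rows = [
        (int(b.get("id")), b.findtext("title"), b.findtext("author"))
        for b in root.findall("book")
    ]
    conn.executemany("INSERT INTO book_content VALUES (?, ?, ?)", rows)
    # Secondary index on a shredded column, so lookups need no XML parsing.
    conn.execute("CREATE INDEX book_author_idx ON book_content (author)")


conn = sqlite3.connect(":memory:")
build_content_table(conn, DOC)
titles = [
    title
    for (title,) in conn.execute(
        "SELECT title FROM book_content WHERE author = ? ORDER BY id", ("Ann",)
    )
]
print(titles)
```

The parsing cost is paid once, at load time; every later query is ordinary relational access, which is the trade the deck argues for.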
46. Driving access on CONTENT. Index options: B-Tree index, secondary Oracle Text index, function-based index (XPath), Structured XMLIndex, Unstructured XMLIndex. [Figure: XML tree of a bookstore with book and whitepaper elements (title, author, chapter, id, paragraph, content), marking the structured content.]
50. XML Schema Advantages. The XML Schema is parsed only once. If registered in the XDB Repository, the XML Schema is cached in memory (SGA): no additional parsing; no additional validation.
51. XML Schema Advantages. The XML document structure is known, therefore: no parsing is needed when it is loaded from disk into memory; XML Object (XOB) structures can be applied; the memory footprint is much smaller than that of a DOM structure; the specific nodes needed can be handled efficiently in memory.
55. XML Schema - Query Rewrite. [Figure: bookstore XML tree (book, whitepaper, title, author, chapter, id, paragraph, content) with type labels String, CHAR, String, Float and CLOB, VARCHAR2(20), NUMBER(15).]
56. XML Design. Avoid cyclic references in XML Schemata. For ease of maintenance: xdb:annotations. Is DOM validation or fidelity needed? CPU / XML parsing: XML Schema validation “overhead”? Index maintenance overhead when using “disk” solutions.
57. Be aware of what you are doing! Avoid unneeded (full) XML Schema validation during storage (inserts) and when generating XML. xdb:maintainDOM=false. Avoid impedance mismatch: Java / XML / Java / XML / Relational / XML / Java (“All In One Go Objective”). Avoid XML fragments, // and/or via XMLEXISTS. Use indexes.
59. Keep XML small. Do not use / enforce Pretty Print if not needed. Avoid namespace reference “overkill”: the most used namespace is leading; use short namespace references (aliases). Make XML data as “sparse” as possible: <employee><name>Marco</name></employee> versus <employee name="Marco"/>. XML data partitioning. Binary XML if needed.
60. Keep XML small (OR specific). Don’t use “meaningful element names”: 64 KB DDL “create table” buffer; ORA-01792: maximum number of columns in a table or view is 1000. Break XML up: out of line; CLOB (unstructured); data that is not accessed. Don’t create objects if you don’t need them. Use xdb:defaultTable="" for global types.
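The size effect of these choices is easy to measure. The sketch below compares the slide's two employee notations and a pretty-printed variant by byte length, using Python's standard library; the helper names `pretty` and `byte_len` are assumptions for illustration, and the comparison ignores schema-design consequences of choosing attributes over elements.

```python
from xml.dom import minidom

element_form = "<employee><name>Marco</name></employee>"
attribute_form = '<employee name="Marco"/>'


def pretty(xml_text):
    """Return an indented ("pretty printed") copy of xml_text."""
    return minidom.parseString(xml_text).documentElement.toprettyxml(indent="  ")


def byte_len(text):
    """Size of the serialized XML in UTF-8 bytes."""
    return len(text.encode("utf-8"))


for label, text in [
    ("attribute form", attribute_form),
    ("element form", element_form),
    ("element form, pretty printed", pretty(element_form)),
]:
    print(f"{label}: {byte_len(text)} bytes")
```

A few bytes per employee look harmless, but the difference is paid for every occurrence of every repeating element, in storage, in transfer and in parsing.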
62. Customer Use Case: Memory / DOM. [Figure: processing flow with the labels CLOB, BLOB, XMLType, Oracle Advanced Queue, process checks, XML Schema validation (Java), shred elements via XMLDOM, store in ETL tables.]
64. New XML Approach: Rewrite on Disk / XOB (Relational). [Figure: processing flow with the labels CLOB, BLOB, Oracle Advanced Queue, Oracle Workflow, checks, validation against XML Schema, XMLType table (O.R.), store in ETL tables.]
65. Using the CBO as an XML Parser… ORA-31186: Document contains too many nodes. Cause: Unable to load the document because it has exceeded the maximum allocated number of DOM nodes.
66. Using the (XML) Relational Mindset. Design the XSD as you would with E(E)R. Design for proper physical access and performance: storage, index, content awareness, partitioning. Overkill of “meaningful” data parsing: avoid redundancy, whitespace, “Pretty Print”. Design with the future in mind.
67. So in short: Balanced Design. Inserts, updates and deletes: XML future changes; index maintenance. Selects: in memory; via indexes. XML validation: strict, lazy; client-side possibilities.
68. Reward. Optimal performance, outperforming XML. Proper design will give you a 10- or 100-fold performance increase over XML handling… …also known as…ehh… …standard relational database performance…
70. References. Oracle XML DB: http://www.oracle.com/pls/db112/homepage ; XML DB FAQ Thread: http://forums.oracle.com/forums/thread.jspa?threadID=410714 ; Blog: http://technology.amis.nl/blog and http://blog.gralike.com
Editor's Notes
Square wheel JSON?
Emp/Dept tables, Foreign/Primary Keys…Showing here ONLY 1 XML document…