Design Concepts For Xml Applications That Will Perform
 

S307479 - Oracle XMLDB - Design Concepts for XML Applications That Will Perform - AMIS - Marco Gralike

Oracle Open World 2009 Presentation

    Presentation Transcript

    • Oracle XML Database
      Design Concepts for XML Applications That Will Perform !
      Marco Gralike, AMIS, 2009
    • Started as DBA with Oracle 7 on Windows NT 3.1 (1994)
      Experienced with Oracle 7.x / 8.x / 9.x / 10.x and 11.1
      Oracle 11g Beta tester for Oracle XMLDB
      Active Oracle OTN XMLDB Forum Member
      Oracle ACE Award for XMLDB Community Contributions
      OakTable Network member
      Introductions
    • Or a short story
      “Why XML on Disk can be faster than XML in Memory…”
    • Disclaimer
      The following are “Rules of Thumb”
      Bear in mind: every environment has its own unique criteria and needs regarding business requirements, architecture, etc…
      “Maintainability”
      “Extendibility”
      …so pay attention to:
      “Choice”
      “Design”
      “Testing”
      “Performance”
    • A Customer Use-Case
    • Initial State
      No performance
      12.000 “Cases” / night (4 Hour Window)
      4 hours are not enough anymore
      The “XML” part “looks like it takes too long”
      Original database system version 8.1.X
      Future Wishes
      The need to be able to handle 120.000 “Cases” / night
      In the near future hardware/OS from OpenVMS to HPUX
      Customer Case
    • An overview
      [Diagram: the original processing pipeline. Labels: Oracle Advanced Queue, BLOB, CLOB, XMLType, Memory / DOM (twice), Validation against XML Schema (Java), Shred Elements via XMLDOM, Process Checks, Store in ETL Tables, Oracle Workflow]
    • 10.000 “Cases” (~ 10 MB size)
    • How expensive are 1.000 “Cases” ?
    • The Cost of Mixing Worlds
    • BLOB2CLOB and CLOB2XMLType
    • Feeding data to the database
      Why BLOB? → XML data & PDF data
      Why CLOB? → Conversion needed for XML handling
      Why XMLType? → Needed to check XML element content
      XML Validation (well-formedness)
      [Diagram labels: Oracle Advanced Queue, BLOB, CLOB, XMLType, Memory / DOM]
    • Impedance Mismatch
      Different data models.
      XPath models an XML document as a tree, while most general-purpose programming languages have no native data type for a tree.
      Different programming paradigms.
      XSLT is a functional language, while Java is object-oriented and Perl is procedural.
      Effect/Costs
      Unnecessary CPU and memory overhead
      A lot of expensive type and encoding conversions
    • The General Rule !
      If you deal with XML → handle it via XML(DB)
      So if it is relational, do it the relational way…
      If it is XML, use XQuery, or others like XPath, etc.
      If you mix worlds, be careful regarding
      Information loss (PK/FK → XML) ?
      Whitespace → NULL → Whitespace ?
      Impedance mismatch
    • XML Document Validation
    • Validate XML Document via its XML Schema
    • Validation on content and structure
      XML Schema → Validation on XML structure
      → PL/SQL wrapper with Java XML parser
      [Diagram labels: XMLType, Memory / DOM, Validation against XML Schema (Java based), Shred Elements via XMLDOM, Process Checks]
    • Java XML Parser
    • XML Parsers
      Often DOM or Infoset based
      CPU intensive
      Memory intensive
      Parsing, serializing and tree traversals all happen in memory
      Often handle XML tree traversals via only ONE method
      Not aware of whether the XML content is structured, semi-structured or unstructured
      Not very “smart” or “content aware”: XML handling does not adapt to the XML tree and/or the XML data content
    • XML Schema Registration Advantages
      XML Schema will be parsed only once
      XML Schema will be cached in memory
      No additional parsing
      No additional validation
      XML Document structure is known, therefore:
      No parsing is needed when loaded from disk into memory
      XML Object (XOB) structures can be applied
      Memory footprint is much smaller than that of a DOM structure
      The specific nodes that are needed can now be handled efficiently in memory
    • XML Schema based - Query Rewrite
      [Diagram: the bookstore XML tree (bookstore; book, whitepaper; title, author, chapter, id, paragraph, content) with XML Schema types such as String and Float mapped to SQL datatypes: CHAR, VARCHAR2(20), NUMBER(15), CLOB]
    • XMLType – Not just a “Datatype”
      Checked on
      XML Well-Formedness
      One root element
      Begin & End tags
      If XML Schema reference
      XOB methods will be used if XML Schema information is available
      DOM methods will be used if XML Schema information is not available
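The well-formedness checks listed on this slide (one root element, matching begin and end tags) can be illustrated outside the database. The following is a minimal sketch in plain Python using the standard `xml.etree.ElementTree` parser, not Oracle's XMLType implementation; the sample documents and the `is_well_formed` helper are invented for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as one well-formed XML document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical sample documents, one per rule on the slide.
samples = {
    "ok": "<case><amount_charged>10</amount_charged></case>",
    "two roots": "<case/><case/>",                        # violates "one root element"
    "unclosed tag": "<case><amount_charged>10</case>",    # violates "begin & end tags"
}

results = {label: is_well_formed(text) for label, text in samples.items()}

for label, ok in results.items():
    print(label, "->", "well-formed" if ok else "rejected")
```

Well-formedness says nothing about content: a document can pass these checks and still violate its XML Schema, which is why the slide treats schema-based handling separately.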
    • Some XSD Design Rules
    • Keep XML small !
      Do not use / enforce Pretty Print if not needed
      Avoid namespace reference “Overkill”
      Most used Namespace is Leading
      Use short Namespace References
      Make XML data as “sparse” as possible
      <employee><name>Marco</name></employee>
      <employee name="Marco"/>
      XML Data Partitioning
      Binary XML if possible
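To make the “keep XML small” advice concrete, the sketch below compares the byte size of the two employee forms shown on this slide, plus a pretty-printed variant. It is plain Python and only counts bytes of text; how Oracle stores each form (CLOB, object-relational, binary XML) is not modelled, and the helper names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# The two forms from the slide, plus a pretty-printed variant of the first.
COMPACT = "<employee><name>Marco</name></employee>"
PRETTY = "<employee>\n    <name>Marco</name>\n</employee>\n"
ATTRIBUTE = '<employee name="Marco"/>'


def size_in_bytes(xml_text):
    """Size of the document text when encoded as UTF-8."""
    return len(xml_text.encode("utf-8"))


def employee_name(xml_text):
    """Read the name from either the attribute or the element form."""
    root = ET.fromstring(xml_text)
    return root.get("name") or root.findtext("name")


sizes = {
    "pretty": size_in_bytes(PRETTY),
    "compact": size_in_bytes(COMPACT),
    "attribute": size_in_bytes(ATTRIBUTE),
}

for label, size in sizes.items():
    print(f"{label:10s} {size} bytes")
```

All three variants carry the same name; only the markup and whitespace around it differ, and that difference is multiplied by every repeated element in a large document.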
    • XML Design
      Avoid Cyclic References in XML Schemata
      For ease of Maintenance: xdb:annotations
      Is DOM validation, fidelity needed ?
      CPU: XML parsing / XML Schema validation “overhead” ?
      Index maintenance overhead, if implemented via disk
    • XML Document Handling
      Shredding & Storing XML
    • Check Total Amount
    • XML Content
      TABLE “B”
      TABLE “A”
      TABLE “C”
    • Think in “3D” or in “Driving Table” terms
      maxOccurs="unbounded"
      → Give me the <title> and <content> where <content> contains…
      [Diagram: numbered nodes 1 to 6 plotted on X, Y and Z axes, annotated “x n rows”]
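The “driving table” idea can be sketched as follows: every repeating (unbounded) element becomes its own row, so a question such as “give me the title and content where content contains…” turns into a filter over rows. This is a plain Python illustration, not Oracle's shredding mechanism; the document layout (book, title, chapter, content) is an assumption loosely based on the bookstore tree in the slides, and `shred` and `search` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET

# Hypothetical bookstore document; <chapter> repeats without an upper bound.
DOC = """
<bookstore>
  <book>
    <title>XML DB Basics</title>
    <chapter><content>Storage models for XMLType</content></chapter>
    <chapter><content>Indexing strategies</content></chapter>
  </book>
  <book>
    <title>SQL Tuning</title>
    <chapter><content>Reading execution plans</content></chapter>
  </book>
</bookstore>
"""


def shred(xml_text):
    """Flatten each repeating <chapter> into one (title, content) row."""
    root = ET.fromstring(xml_text)
    rows = []
    for book in root.findall("book"):
        title = book.findtext("title")
        for chapter in book.findall("chapter"):
            rows.append((title, chapter.findtext("content")))
    return rows


def search(rows, needle):
    """Return the rows whose content contains the search text."""
    return [row for row in rows if needle in row[1]]


rows = shred(DOC)
hits = search(rows, "Indexing")
print(len(rows), "rows;", hits)
```

One book with n chapters yields n rows, which is the “x n rows” effect: each additional unbounded level multiplies the row count, so it matters which level drives the access.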
    • Checking the Amount…
    • Used Setup
      OpenVMS
      Version 9.2.0.5.0
      1.000 “Cases”
      1) l_xpath := '//case['||i||']/amount_charged/text()' ;
      2) l_xpath := '/case_data/case['||i||']/amount_charged/text()' ;
      3) select sum(to_number(extract(value(tr),'/case_data/case/amount_charged/text()'))
      All in memory: COLLECTION ITERATOR PICKLER FETCH
      The Effect of // (for 1.000 “Cases”)
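The difference between the `//` form and the fully specified path can be illustrated outside the database. The sketch below uses Python's standard `xml.etree.ElementTree`, not Oracle XMLDB, so it shows XPath semantics only and says nothing about Oracle's timings. The generated document and the nested `refund` element are invented for illustration.

```python
import xml.etree.ElementTree as ET


def build_case_data(n):
    """Build a hypothetical <case_data> document with n <case> children."""
    root = ET.Element("case_data")
    for i in range(1, n + 1):
        case = ET.SubElement(root, "case")
        ET.SubElement(case, "amount_charged").text = str(i)
        # Assumed extra nesting: the same element name deeper in the tree.
        refund = ET.SubElement(case, "refund")
        ET.SubElement(refund, "amount_charged").text = "0"
    return root


def total(root, path):
    """Sum the numeric text of every node matched by the path."""
    return sum(int(node.text) for node in root.findall(path))


root = build_case_data(1000)

# Explicit path: equivalent to /case_data/case/amount_charged.
explicit = root.findall("case/amount_charged")
# Descendant search: equivalent to //amount_charged, visits the whole tree.
anywhere = root.findall(".//amount_charged")

print(len(explicit), "nodes via explicit path")
print(len(anywhere), "nodes via descendant search")
```

The descendant search has to consider every node at every depth and here also picks up the nested elements, so it does more work and can match more than intended. The explicit path tells the processor exactly where to look.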
    • CLOB XMLType (V 11.1.0.6.0)
      ORA-31186
    • Increasing volume – XMLType CLOB
      Effect of //
      In memory
      10.000 Cases: ORA-31186 Document contains too many nodes
      maxOccurs=unbounded
      maxLength, totalDigits, etc
      ORA-31186: Document contains too many nodes
      Cause: Unable to load the document because it has exceeded the maximum allocated number of DOM nodes.
      Action: Reduce the size of the document
    • XML Document Handling
      Object Relational, Binary XML
    • A Solution based on XMLType O.R.
      [Diagram: the redesigned pipeline. Labels: Oracle Advanced Queue, BLOB, CLOB, XMLType Table (O.R.), Validation Against XML Schema, Checks, Rewrite on Disk / XOB (Relational), Store in ETL Tables, Oracle Workflow]
    • Driving Access on CONTENT (11gR1, on Disk)
      [Diagram: the bookstore XML tree (bookstore; book, whitepaper; title, author, chapter, id, paragraph, content) annotated with index options: BTree Index, Function based Index (XPath), Secondary Oracle Text Index, and XMLIndex (structured and unstructured)]
    • Cost Based Optimizer Advantages
      Can be influenced via
      Statistics
      Indexes
      XML Schema Registration (XOB)
      Encoding in Binary XML storage
      SQL Re-Write of XPath, XQuery
      Partitioning
    • O.R. XMLType (V 11.1.0.6.0)
      ORA-31186
      ORA-31186
    • So why can DISK outperform MEMORY
      XML Schema validation based on Registered XML Schema
      Query re-write possible
      Based on plain “old” SQL/database methods
      Optimized CPU handling
      Optimized Memory handling (if needed)
      Multiple optimized solutions possible via Optimizer instead of one XML parser method
      Specific parts of XML can be handled / be driven via:
      specific indexing
      or content
      Full blown validation can be avoided
    • Recap…
    • Be aware of what you are doing !
      Avoid unneeded (full) XML Schema validation
      During Insert
      Generating XML
      Avoid Impedance mismatch
      Java → XML → Java → XML → Relational → XML → Java
      “All In One Go Objective”
      Avoid intermediate XML fragments
      //
      XMLEXISTS
      Use Indexes
      xdb:maintainDOM="false"
    • XML Data Handling and Design
      Handle XML Smart
      Keep XML Small
      Restrict XML where possible
      Be precise !
      maxOccurs, maxLength
      Provide Oracle with extra / precise information (XSD)
      Register XML Schema
      If possible…
    • Balanced Design
      • Inserts, Updates & Deletes
      • XML Future Changes
      • Index Maintenance
      • Selects
      • In Memory
      • Via Indexes
      • XML Validation
      • Strict, Lazy
      • Client Side Possibilities
    • Now you know why DISK can be faster than MEMORY
      100.000 “Cases” shredded & validated in 5 minutes
      Instead of 1.000 “Cases” in 3 minutes…
      Avoiding
      ORA-31186: Document contains too many nodes
      Scalable
      Efficient with Memory and CPU
      Checked in production on a 9.2.0.5.0 database version
      Extra:
      …decreased used PL/SQL code by half…
      …but you will have to KNOW what you are doing…
    • Oracle Open World 2009 - XMLDB Sessions
    • References
      XMLDB Developer's Guide
      http://www.oracle.com/pls/db112/homepage
      The XMLDB Forum
      http://forums.oracle.com/forums/forum.jspa?forumID=34
      XML DB FAQ Thread
      http://forums.oracle.com/forums/thread.jspa?threadID=410714
      Blog
      http://technology.amis.nl/blog
      http://blog.gralike.com