Creating Structure in Unstructured Data
What is possible, today…?



Marco Gralike
“Big Data” = XML?
Challenges are!
Ahem, the problems are!
Wikipedia
• One string of XML data with
  structured and unstructured
  data sections
• Language: English
• Size: 42.15 GB
• Pages: 12,961,997
• Date: 21 Dec 2012
Adventures into
the unknown…?
Setup
• VirtualBox VM
  – OEL 5U8 (64)
  – 8 GB RAM
• LaCie Little Big Disk
  – RAID 0
  – Thunderbolt
• Database
  – SGA    4GB
  – PGA    2GB
My new LaCie LBD is really fast - 
Defeat?! - 1,000,000 pages only
Status of Technology used
XML - Where are we…?




Gartner
Achieved…?
On the Horizon!
• JSONiq
• Zorba
Building (streaming) Bridges
Oracle XML DB
      • NO cost option
      • C (native / embedded kernel)
      • (XQuery) Standards
      • Code maintained by Oracle
XQuery

[Diagram: XQuery processing layers in Oracle XML DB, top to bottom]
• XMLType Abstraction
• DB XQuery / Procedural XQuery
• XQuery Rewrite / Pushdown / XVM (use “no query rewrite”)
• Relational Access Methods / Streaming XPath Evaluation / DOM Tree Model
• SQL Execution / XMLIndex
• Object-Relational / Binary XML
• Relational Storage / Secure Files


Source: S317428: Building Really Scalable XML Applications with Oracle XML DB and Oracle Text
So what are we talking about?
Wikipedia
• Structured & Unstructured
  bits and pieces
• A lot of “unbounded”
  elements
• Not a lot of restrictions
• The bit with value is in
  element “text”
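One way to reach the valuable element without loading the whole dump is to stream it page by page. The following is a minimal Python sketch under assumptions: the sample document is a hypothetical miniature stand-in for the dump (the real export is far larger and declares an XML namespace, which is left out here), and the helper name iter_pages is made up for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical miniature stand-in for the dump (assumption, not real data).
SAMPLE = """<mediawiki>
  <page>
    <title>Oracle</title>
    <revision><text>Oracle is a database vendor.</text></revision>
  </page>
  <page>
    <title>XML</title>
    <revision><text>XML is a markup language.</text></revision>
  </page>
</mediawiki>"""


def iter_pages(stream):
    """Yield (title, text) for each <page> as soon as it has been parsed."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "page":
            title = elem.findtext("title")
            text = elem.findtext("revision/text")
            yield title, text
            elem.clear()  # release the children of a page that is done


pages = list(iter_pages(io.BytesIO(SAMPLE.encode("utf-8"))))
for title, text in pages:
    print(title, "->", text)
```

The structured part (title) and the unstructured part (text) come out as one row per page, which is the shape a database load would want.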
How do we get this Structured?
Strings = small & defined (12c?)

   Ename  pointer += 100;
<string1/><string2/><string3/>
Flexible, Humans
No Design Patterns
<small/><verybigggr/><bigger/>
<verybigggr>
       <empno>1</empno><ename>Marco</ename>
       <empno>2</empno>
</verybigggr>




 <small/><verybigggr/><bigger/>
We need options!
“XMLType” Container

  In Memory            CLOB
  (document)        (document)

Object Relational   Binary XML
     (data)            (data)
XMLType
      In Memory
      (document)


XOB          XML Schema
XMLType
   Binary XML Securefile
    (document/content)


Post Parse        LOB Index
XMLType
        Object Relational
           (content)


Fully Shredded        Indexes
Something else to Realize !
“What is the fastest way to get this
    stuff in the database…?”
“…it depends…”
“So what is the fastest way to get
    XML in the database…?”
“…it depends…”
“So what is the fastest way to get XML
    in the database…
    … and useful in my case…?”
Garbage IN – Garbage OUT
Wikipedia
•   SQL*Loader
•   Parallel or Direct
•   Securefile LOB Column
•   2.5 hours

And no (performant) way
to get the details out…
a.k.a “completely useless”
Wikipedia
•   SQL*Loader
•   Parallel or Direct
•   Securefile Binary XML
•   …2.5 hours ???
XML Parsing




• SAX - Simple API for XML
• DOM - Document Object Model
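The difference between the two parsing styles can be sketched in a few lines of Python using the standard library. This is only an illustration of SAX versus DOM in general, not of the parser inside the database; the sample document and the second name are assumptions.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical sample document (assumption).
DOC = "<emps><ename>Marco</ename><ename>Scott</ename></emps>"


class EnameHandler(xml.sax.ContentHandler):
    """SAX: react to events as they stream past; no tree is kept."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._buf = None  # collects text only while inside <ename>

    def startElement(self, name, attrs):
        if name == "ename":
            self._buf = []

    def characters(self, content):
        if self._buf is not None:
            self._buf.append(content)

    def endElement(self, name):
        if name == "ename":
            self.names.append("".join(self._buf))
            self._buf = None


handler = EnameHandler()
xml.sax.parseString(DOC.encode("utf-8"), handler)
sax_names = handler.names

# DOM: build the whole tree in memory first, then navigate it.
dom = minidom.parseString(DOC)
dom_names = [n.firstChild.data for n in dom.getElementsByTagName("ename")]

print(sax_names, dom_names)
```

Both give the same answer; the SAX handler never holds more than one element's text, while the DOM version needs the full document in memory before the first lookup.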
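[Chart: insert performance (vertical axis, fast at the top) versus select performance (horizontal axis, fast at the right). Storage options in the order plotted, from fastest insert to fastest select:]
• CLOB
• XMLType CLOB
• (domain) indexes
• XMLType Binary XML
• XMLType Object Relational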
XML Partitioning
• Object Relational Partitioning
  – Equi-Partitioning since Oracle version 11.1.0.7.0
• Binary XML Partitioning
  – Range, List, Hash
• Local partitioned XMLIndex
  – LOCAL keyword in XMLIndex create syntax
• Partition Key on virtual column (Binary XML)
• Partition Key on column (Object Relational)
XMLType
   Binary XML Securefile
    (document/content)


Post Parse        LOB Index
Driving access on CONTENT
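[Diagram: a sample XML tree and the index types that can drive access to its content]
• bookstore
  – book: title, author, author, chapter
  – whitepaper: title, author, id, paragraph
• Index types shown
  – BTree Index
  – Function based Index (XPath)
  – Unstructured XMLIndex (content)
  – Structured XMLIndex (structured content)
  – Oracle XML Text Index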
Structured Data
Structured XMLIndex (SXI)
• CONTENT TABLE(s)
• Based on XMLTABLE syntax
• XMLTable construct can be nested:
  – VIRTUAL column alias
• Can be maintained manually
• Secondary indexes possible
[Diagram: Structured XMLIndex, f (x), Content Tables]
Describe CONTENT TABLE




• A “regular” heap table with columns…
• Ideal for secondary indexes, if needed.
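The idea of a content table can be pictured as ordinary rows shredded out of the XML, with a normal secondary index on top. The sketch below is an analogy in Python with SQLite, not Oracle's XMLIndex implementation; the table name content_tab, the emp wrapper element and the second employee are assumptions made up for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical source documents keyed by a document id (assumption).
DOCS = {
    1: "<emp><empno>1</empno><ename>Marco</ename></emp>",
    2: "<emp><empno>2</empno><ename>Scott</ename></emp>",
}

conn = sqlite3.connect(":memory:")
# A "regular" table with columns, standing in for a content table.
conn.execute(
    "CREATE TABLE content_tab (doc_id INTEGER, empno INTEGER, ename TEXT)"
)

# Shred the chosen elements of each document into one row.
for doc_id, xml_text in DOCS.items():
    root = ET.fromstring(xml_text)
    conn.execute(
        "INSERT INTO content_tab VALUES (?, ?, ?)",
        (doc_id, int(root.findtext("empno")), root.findtext("ename")),
    )

# Secondary index on a shredded column, as on any other table.
conn.execute("CREATE INDEX content_tab_ename ON content_tab (ename)")

rows = conn.execute(
    "SELECT doc_id FROM content_tab WHERE ename = ?", ("Marco",)
).fetchall()
print(rows)
```

The lookup by ename never touches the XML again; it is answered from the relational rows, which is the point of projecting structured parts out of the document.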
CONTENT TABLE(s)

[Diagram: Structured XMLIndex, f (x), Content Tables]
Semi-Structured Data
Unstructured XMLIndex (UXI)
• PATH TABLE
• Use Path Subsetting
   – Full Blown XMLIndex can be BIG
• Token Tables (XDB.X$......)
   – Query re-write on Tokens
   – Fuzzy Searches, //
   – Optimizer Statistics
• Can be maintained manually
   – Recorded in Pending Table
• Secondary indexes possible
[Diagram: Unstructured XMLIndex, f (x), Path Table]
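A path table can be thought of as one row per node: the path to it plus its value. The following Python sketch only illustrates that idea and why path subsetting keeps it small; it is not how Oracle stores or tokenizes paths. The element names come from the bookstore example, while the values and the helper name path_rows are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical document using the bookstore element names (values assumed).
DOC = """<bookstore>
  <book><title>XML DB</title><author>Marco</author></book>
  <whitepaper><title>Binary XML</title><id>42</id></whitepaper>
</bookstore>"""


def path_rows(elem, prefix=""):
    """Yield (path, value) for every element that carries text."""
    path = f"{prefix}/{elem.tag}"
    text = (elem.text or "").strip()
    if text:
        yield path, text
    for child in elem:
        yield from path_rows(child, path)


# "Full blown": every path in the document ends up in the table.
all_rows = list(path_rows(ET.fromstring(DOC)))

# Path subsetting: keep only the paths that queries are expected to touch.
WANTED = {"/bookstore/book/title", "/bookstore/whitepaper/title"}
subset = [row for row in all_rows if row[0] in WANTED]

print(len(all_rows), "rows in full,", len(subset), "after subsetting")
```

On a document this small the saving is trivial, but the row count of the full variant grows with every node in every document, which is why the full index can get big.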
Describe PATH TABLE
What’s hidden…
PATH TABLE

[Diagram: Unstructured XMLIndex, f (x), Path Table]
Binary XML – No Index
Binary XML + XMLIndex (SXI)
Binary XML + XMLIndex + Sec.Ind.
Binary XML + XMLIndex + Sec.Ind.
Un-Structured Data
XML Full Text Index
• Based on Oracle Text Index, XQuery Full Text
• XML Namespace Aware
• XML Semantic aware full text search
  – Full-Text Selection Expression – contains text
  – Logical Full Text Operator – ftor, ftand, ftMildNot
  – Context Aware full text search
Balanced Design
• Inserts, Updates & Deletes
  – XML Future Changes
  – Index Maintenance
• Selects
  – In Memory
  – Via Indexes
• XML Validation
  – Strict, Lazy
  – Client Side Possibilities
[Diagram: In Memory, On Disk]
Reward
• Optimal performance
• Outperforming XML
• Proper design will give
  performance increase over
  XML handling…


…proper design is still key…
References
Oracle XML DB
  – http://www.oracle.com/pls/db112/homepage
XML DB FAQ Thread
  – http://forums.oracle.com/forums/thread.jspa?threadID=410714
Personal Blog
  – http://www.xmldb.nl
  – http://technology.amis.nl
References
Daniela Florescu, Oracle Corporation
  Advances in XML and XQuery
Sam Idicula, Oracle XML DB Development Team
  Binary XML Storage and Query Processing in Oracle
Jinyu Wang, Scott Brewton
  Making XML Technology Easier to Use
Joel Spolsky - Joel on Software
  Back to Basics
References
Oracle XML DB Main page material
• Oracle XML DB : Best Practices to Get Optimal
  Performance out of XML Queries (PDF)
• Oracle XML DB : Choosing the Best XMLType
  Storage Option for Your Use Case (PDF)
• A Request for Comments for the Oracle Binary
  XML Format

Hotsos 2013 - Creating Structure in Unstructured Data

 
Design Guidelines for Passkeys 2024.pptx
Design Guidelines for Passkeys 2024.pptxDesign Guidelines for Passkeys 2024.pptx
Design Guidelines for Passkeys 2024.pptxFIDO Alliance
 

Recently uploaded (20)

Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...
Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...
Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...
 
Google I/O Extended 2024 Warsaw
Google I/O Extended 2024 WarsawGoogle I/O Extended 2024 Warsaw
Google I/O Extended 2024 Warsaw
 
2024 May Patch Tuesday
2024 May Patch Tuesday2024 May Patch Tuesday
2024 May Patch Tuesday
 
Introduction to FIDO Authentication and Passkeys.pptx
Introduction to FIDO Authentication and Passkeys.pptxIntroduction to FIDO Authentication and Passkeys.pptx
Introduction to FIDO Authentication and Passkeys.pptx
 
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)
 
Microsoft CSP Briefing Pre-Engagement - Questionnaire
Microsoft CSP Briefing Pre-Engagement - QuestionnaireMicrosoft CSP Briefing Pre-Engagement - Questionnaire
Microsoft CSP Briefing Pre-Engagement - Questionnaire
 
Tales from a Passkey Provider Progress from Awareness to Implementation.pptx
Tales from a Passkey Provider  Progress from Awareness to Implementation.pptxTales from a Passkey Provider  Progress from Awareness to Implementation.pptx
Tales from a Passkey Provider Progress from Awareness to Implementation.pptx
 
Overview of Hyperledger Foundation
Overview of Hyperledger FoundationOverview of Hyperledger Foundation
Overview of Hyperledger Foundation
 
AI mind or machine power point presentation
AI mind or machine power point presentationAI mind or machine power point presentation
AI mind or machine power point presentation
 
Intro to Passkeys and the State of Passwordless.pptx
Intro to Passkeys and the State of Passwordless.pptxIntro to Passkeys and the State of Passwordless.pptx
Intro to Passkeys and the State of Passwordless.pptx
 
How to Check GPS Location with a Live Tracker in Pakistan
How to Check GPS Location with a Live Tracker in PakistanHow to Check GPS Location with a Live Tracker in Pakistan
How to Check GPS Location with a Live Tracker in Pakistan
 
Introduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMIntroduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDM
 
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...
 
Top 10 CodeIgniter Development Companies
Top 10 CodeIgniter Development CompaniesTop 10 CodeIgniter Development Companies
Top 10 CodeIgniter Development Companies
 
“Iamnobody89757” Understanding the Mysterious of Digital Identity.pdf
“Iamnobody89757” Understanding the Mysterious of Digital Identity.pdf“Iamnobody89757” Understanding the Mysterious of Digital Identity.pdf
“Iamnobody89757” Understanding the Mysterious of Digital Identity.pdf
 
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptxHarnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
 
Event-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream ProcessingEvent-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream Processing
 
Oauth 2.0 Introduction and Flows with MuleSoft
Oauth 2.0 Introduction and Flows with MuleSoftOauth 2.0 Introduction and Flows with MuleSoft
Oauth 2.0 Introduction and Flows with MuleSoft
 
Vector Search @ sw2con for slideshare.pptx
Vector Search @ sw2con for slideshare.pptxVector Search @ sw2con for slideshare.pptx
Vector Search @ sw2con for slideshare.pptx
 
Design Guidelines for Passkeys 2024.pptx
Design Guidelines for Passkeys 2024.pptxDesign Guidelines for Passkeys 2024.pptx
Design Guidelines for Passkeys 2024.pptx
 

Hotsos 2013 - Creating Structure in Unstructured Data

  • 1. Creating Structure in Unstructured Data What is possible, today…? Marco Gralike
  • 7. WikiPedia • One string of XML data with structured and unstructured data sections • Language: English • Size : 42,15 GB • Pages : 12.961.997 • Date : 21 Dec 2012
  • 9. Setup • VirtualBox VM – OEL 5U8 (64) – 8 GB RAM • LaCie Little Big Disk – RAID 0 – Thunderbolt • Database – SGA 4GB – PGA 2GB
  • 10. My new LaCie LBD is really fast :-)
  • 11. Defeat?! - 1.000.000 pages only
  • 13. XML - Where are we…? Gartner
  • 15. On the Horizon! • JSONiq • Zorba
  • 17. Oracle XML DB • NO cost option • C (native / embedded kernel) • (XQuery) Standards • Code maintained by Oracle
  • 18. XQuery processing in Oracle XML DB (diagram). Layers, top to bottom: XQuery; XMLType Abstraction; DB XQuery and Procedural XQuery; XQuery Rewrite, Pushdown, and XVM (use “no query rewrite”); Relational Access Methods, Streaming XPath Evaluation, and DOM Tree Model; SQL Execution and XMLIndex; Object-Relational and Binary XML; Relational Storage and Secure Files. Source: S317428: Building Really Scalable XML Applications with Oracle XML DB and Oracle Text
  • 19. So what are we talking about?
  • 27. WikiPedia • Structured & Unstructured bits and pieces • A lot of “unbounded” elements • Not a lot of restrictions • The bit with value is in element “text”
  • 28. How do we get this Structured?
  • 31. Strings = small & defined (12c?) Ename → pointer += 100;
  • 35. <verybigggr> <empno>1</empno><ename>Marco</ename> <empno>2</empno> </verybigggr> <small/><verybigggr/><bigger/>
  • 41. “XMLType” Container • In Memory (document) • CLOB (document) • Object Relational (data) • Binary XML (data)
  • 42. XMLType In Memory (document) XOB XML Schema
  • 43. XMLType Binary XML Securefile (document/content) Post Parse LOB Index
  • 44. XMLType Object Relational (content) Fully Shredded Indexes
  • 45. Something else to Realize !
  • 46. “What is the fastest way to get this stuff in the database…?”
  • 48. “So what is the fastest way to get XML in the database…?”
  • 50. “So what is the fastest way to get XML in the database… … and useful in my case…?”
  • 51. Garbage IN – Garbage OUT
  • 52. WikiPedia • SQL*Loader • Parallel or Direct • Securefile LOB Column • 2.5 hours And no (performant) way to get the details out… a.k.a “completely useless”
  • 53. WikiPedia • SQL*Loader • Parallel or Direct • Securefile Binary XML • …2.5 hours ???
  • 54. XML Parsing • SAX - Simple API for XML • DOM - Document Object Model
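The difference between the two parsing models on slide 54 can be sketched in a few lines of Python. This is only an illustration, not how Oracle XML DB parses internally: the `SAMPLE` document, the `TitleHandler` class, and both helper functions are hypothetical names made up for this example. SAX pushes events to a handler while it streams through the input, whereas DOM first builds the whole tree in memory and then lets you navigate it.

```python
import xml.dom.minidom
import xml.sax

# Hypothetical miniature stand-in for a Wikipedia-style dump.
SAMPLE = (
    "<mediawiki>"
    "<page><title>Oracle</title><text>Database vendor</text></page>"
    "<page><title>XML</title><text>Markup language</text></page>"
    "</mediawiki>"
)


class TitleHandler(xml.sax.ContentHandler):
    """SAX: collects <title> text from events, without keeping a tree."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so buffer it.
        if self._in_title:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._in_title = False


def titles_with_sax(xml_text):
    """Stream through the document; memory use stays small."""
    handler = TitleHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.titles


def titles_with_dom(xml_text):
    """Build the full tree first; simple to query, costly for big input."""
    document = xml.dom.minidom.parseString(xml_text)
    return [node.firstChild.data for node in document.getElementsByTagName("title")]
```

Both functions return the same titles for the sample. For a multi-gigabyte dump only the streaming style is realistic, because the DOM variant has to hold every node in memory at once.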
  • 55. Storage options, ordered from fast insert performance to fast select performance: CLOB, XMLType CLOB, (domain) indexes, XMLType Binary XML, XMLType Object Relational
  • 57. XML Partitioning • Object Relational Partitioning – Equi-Partitioning since Oracle version 11.1.0.7.0 • Binary XML Partitioning – Range, List, Hash • Local partitioned XMLIndex – LOCAL keyword in XMLIndex create syntax • Partition Key on virtual column (Binary XML) • Partition Key on column (Object Relational)
  • 58. XMLType Binary XML Securefile (document/content) Post Parse LOB Index
  • 59. Driving access on CONTENT (diagram). An example bookstore document tree (book, whitepaper, title, author, chapter, id, paragraph) annotated with the index options: BTree Index, Function based Index (XPath), Unstructured XMLIndex (content), Structured XMLIndex (structured content), Oracle Text Index, XML Index
  • 61. Structured XMLIndex (SXI) • CONTENT TABLE(s) • Based on XMLTABLE syntax • XMLTable construct can be nested: – VIRTUAL column alias • Can be maintained manually • Secondary indexes possible (Diagram: Structured XMLIndex, f(x), Content Tables)
  • 62. Describe CONTENT TABLE • A “regular” heap table with columns… • Ideal for secondary indexes, if needed.
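To make the idea of a content table concrete, here is a minimal Python sketch of the projection that an XMLTABLE-style definition describes: one row per repeating element, one column per relative path. It is an analogy under stated assumptions, not Oracle's implementation. The `<employees>` sample (loosely modelled on the empno/ename fragment of slide 35), the `project_rows` function, and the column names are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the second employee has no <ename>.
SAMPLE = (
    "<employees>"
    "<emp><empno>1</empno><ename>Marco</ename></emp>"
    "<emp><empno>2</empno></emp>"
    "</employees>"
)


def project_rows(xml_text, row_path, columns):
    """Return one tuple per node matching row_path.

    columns maps a column name to a path relative to the row node,
    similar in spirit to the COLUMNS ... PATH clause of XMLTABLE.
    A missing element yields None, much like a NULL column value.
    """
    root = ET.fromstring(xml_text)
    rows = []
    for row_node in root.findall(row_path):
        rows.append(tuple(row_node.findtext(path) for path in columns.values()))
    return rows


ROWS = project_rows(SAMPLE, "emp", {"EMPNO": "empno", "ENAME": "ename"})

# A "secondary index" on ENAME is then just a lookup structure on the rows.
BY_ENAME = {row[1]: row for row in ROWS if row[1] is not None}
```

Once the values sit in plain rows and columns, ordinary relational techniques apply to them, which is the point the slide makes about secondary indexes.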
  • 63. CONTENT TABLE(s) Structured XMLIndex f (x) Content Tables
  • 65. Unstructured XMLIndex (UXI) • PATH TABLE • Use Path Subsetting – Full Blown XMLIndex can be BIG • Token Tables (XDB.X$......) – Query re-write on Tokens – Fuzzy Searches, // – Optimizer Statistics • Can be maintained manually – Recorded in Pending Table • Secondary indexes possible (Diagram: Unstructured XMLIndex, f(x), Path Table)
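Why path subsetting matters can be illustrated with a simplified Python sketch of a path table: every leaf node becomes a (path, value) row, so a full index grows with the number of nodes in the document, while a subset keeps only the paths that queries actually use. The real path table holds more columns than shown here. The `SAMPLE` page, the `build_path_table` function, and its `include_paths` parameter are hypothetical names for this example only.

```python
import xml.etree.ElementTree as ET

# Hypothetical single-page document.
SAMPLE = (
    "<page>"
    "<title>XML</title>"
    "<revision><id>42</id><text>Markup language</text></revision>"
    "</page>"
)


def build_path_table(xml_text, include_paths=None):
    """Return (path, value) rows for leaf elements.

    include_paths=None indexes every leaf (the "full blown" case);
    a set of paths restricts the rows to that subset.
    """
    root = ET.fromstring(xml_text)
    rows = []

    def walk(node, path):
        is_leaf = len(node) == 0
        if is_leaf and (include_paths is None or path in include_paths):
            rows.append((path, (node.text or "").strip()))
        for child in node:
            walk(child, f"{path}/{child.tag}")

    walk(root, f"/{root.tag}")
    return rows


FULL = build_path_table(SAMPLE)
SUBSET = build_path_table(SAMPLE, include_paths={"/page/title"})
```

Here the full table has three rows for one tiny page and the subset has one. Scaled to millions of pages, the same ratio decides how big the index becomes.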
  • 69. Binary XML – No Index
  • 70. Binary XML + XMLIndex (SXI)
  • 71. Binary XML + XMLIndex + Sec.Ind.
  • 72. Binary XML + XMLIndex + Sec.Ind.
  • 74. XML Full Text Index • Based on Oracle Text Index, XQuery Full Text • XML Namespace Aware • XML Semantic aware full text search – Full-Text Selection Expression – contains text – Logical Full Text Operator – ftor, ftand, ftMildNot – Context Aware full text search
  • 85. Balanced Design • Inserts, Updates & Deletes – XML Future Changes – Index Maintenance In Memory On Disk • Selects – In Memory – Via Indexes • XML Validation – Strict, Lazy – Client Side Possibilities
  • 86. Reward • Optimal performance • Outperforming XML • Proper design will give performance increase over XML handling… …proper design is still key…
  • 88. References • Oracle XML DB – http://www.oracle.com/pls/db112/homepage • XML DB FAQ Thread – http://forums.oracle.com/forums/thread.jspa?threadID=410714 • Personal Blog – http://www.xmldb.nl – http://technology.amis.nl
  • 89. References • Daniela Florescu, Oracle Corporation: Advances in XML and XQuery • Sam Idicula, Oracle XML DB Development Team: Binary XML Storage and Query Processing in Oracle • Jinyu Wang, Scott Brewton: Making XML Technology Easier to Use • Joel Spolsky, Joel on Software: Back to Basics
  • 90. References: Oracle XML DB main page material • Oracle XML DB: Best Practices to Get Optimal Performance out of XML Queries (PDF) • Oracle XML DB: Choosing the Best XMLType Storage Option for Your Use Case (PDF) • A Request for Comments for the Oracle Binary XML Format

Editor's Notes

  1. See also OOW 2010, S317428: Building Really Scalable XML Applications with Oracle XML DB and Oracle Text – Nipun Agarwal, Oracle