Data Storage and Retrieval in an XML World
By Dare Obasanjo

What is XML?
- eXtensible Markup Language
- A meta-markup language developed by the W3C to deal with the shortcomings of HTML
  - HTML grew too complex and unwieldy (over 100 tags in the latest version).
  - XML allows for domain-specific markup.
  - Semantics are not document-specific but application-specific.
- XML is a subset of the Standard Generalized Markup Language (SGML).

XML and Data
- XML also provides a way to describe structured data.
- XML has many advantages as a data storage and interchange format:
  - Built-in support for internationalization via Unicode
  - Platform independence
  - A human-readable format, which makes debugging easier
  - Extensibility: new versions of a format don't have to break applications based on older versions of the format.
  - A large number of off-the-shelf tools for processing XML documents already exist.

Structuring XML
- Since XML describes structured data, there should be a means to specify the structure of an XML document.
- DTDs and schemas are different mechanisms for providing a grammar for an XML document.
- An XML document that conforms to a DTD or schema is considered valid.

Document Type Definitions (DTDs)
- DTDs were inherited from SGML.
- DTDs have a different syntax from XML.
- They specify the legal elements that can occur in an XML document and the order in which they occur.

Sample DTD and XML document

    <!ELEMENT gatech_student (name, age)>
    <!ATTLIST gatech_student gtnum CDATA #REQUIRED>
    <!ELEMENT name (#PCDATA)>
    <!ELEMENT age (#PCDATA)>

    <gatech_student gtnum="gt000x">
      <name>George Burdell</name>
      <age>21</age>
    </gatech_student>
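Python's standard library has no validating XML parser, so the following is only a hand-written sketch of the rules the sample DTD expresses: a gatech_student root whose children are exactly one name followed by one age, each holding only text. It treats gtnum as required (an assumption), and the function name check_gatech_student is hypothetical.

```python
import xml.etree.ElementTree as ET

def check_gatech_student(xml_text):
    """Return a list of rule violations for the sample document (empty if none).

    Hand-coded mirror of the sample DTD, not a general DTD validator:
      <!ELEMENT gatech_student (name, age)>  -> exactly name, then age
      <!ELEMENT name (#PCDATA)> and age      -> text only, no child elements
    gtnum is treated as a required attribute (an assumption).
    """
    root = ET.fromstring(xml_text)  # raises ET.ParseError if not well-formed
    problems = []
    if root.tag != "gatech_student":
        problems.append("root element must be gatech_student")
    if root.get("gtnum") is None:
        problems.append("missing required attribute gtnum")
    child_tags = [child.tag for child in root]
    if child_tags != ["name", "age"]:
        problems.append(f"expected children ['name', 'age'], got {child_tags}")
    for child in root:
        if len(child) > 0:  # #PCDATA allows text but no nested elements
            problems.append(f"{child.tag} must contain only text")
    return problems

sample = ('<gatech_student gtnum="gt000x">'
          '<name>George Burdell</name><age>21</age></gatech_student>')
print(check_gatech_student(sample))  # []
```

Note that `<age>twenty-one</age>` also passes, because #PCDATA accepts any text. That gap is exactly the missing datatype support DTDs are criticized for.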

DTDs Unsatisfactory
- DTDs proved inadequate for a number of reasons, chiefly:
  - They use a different syntax from XML.
  - They have no support for datatypes.
  - They offer little control over how often elements occur.

XML Data Reduced (XDR)
- A proposal for XML schemas was submitted to the W3C by Microsoft Corporation as a potential XML schema standard.
- XDR tackled some of the problems of DTDs:
  - XDR schemas are XML files.
  - Support for a number of datatypes analogous to those used in relational database management systems and popular programming languages
  - The occurrence of elements is controllable.

Sample XDR and XML document

    <Schema name="myschema" xmlns="urn:schemas-microsoft-com:xml-data"
            xmlns:dt="urn:schemas-microsoft-com:datatypes">
      <ElementType name="age" dt:type="ui1" />
      <ElementType name="name" dt:type="string" />
      <AttributeType name="gtnum" dt:type="string" />
      <ElementType name="gatech_student" content="eltOnly" order="seq">
        <element type="name" minOccurs="1" maxOccurs="1" />
        <element type="age" minOccurs="1" maxOccurs="1" />
        <attribute type="gtnum" />
      </ElementType>
    </Schema>

    <gatech_student gtnum="gt000x">
      <name>George Burdell</name>
      <age>21</age>
    </gatech_student>

XML Schema Definitions (XSD)
- W3C standard
- XSD outshines XDR in the following ways:
  - Supports more datatypes.
  - Provides the ability to create custom datatypes.
  - Supports object-oriented concepts such as inheritance (type derivation by extension and restriction) and polymorphism.

Sample XSD and XML document

    <schema xmlns="http://www.w3.org/2001/XMLSchema">
      <element name="gatech_student">
        <complexType>
          <sequence>
            <element name="name" type="string"/>
            <element name="age" type="unsignedInt"/>
          </sequence>
          <attribute name="gtnum">
            <simpleType>
              <restriction base="string">
                <pattern value="gt\d{3}[A-Za-z]{1}"/>
              </restriction>
            </simpleType>
          </attribute>
        </complexType>
      </element>
    </schema>

    <gatech_student gtnum="gt000x">
      <name>George Burdell</name>
      <age>21</age>
    </gatech_student>
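Unlike the DTD, the XSD types its content: gtnum must match a pattern (read as gt\d{3}[A-Za-z]{1}, "gt" plus three digits and one letter, which gt000x satisfies) and age must be an unsignedInt. The standard library has no XSD validator, so this sketch checks just those two constraints by hand; it is not a schema processor, and check_typed_student is a hypothetical name. XSD patterns are implicitly anchored to the whole value, which re.fullmatch mirrors.

```python
import re
import xml.etree.ElementTree as ET

# XSD patterns are anchored to the whole value, so use re.fullmatch.
GTNUM_PATTERN = re.compile(r"gt\d{3}[A-Za-z]{1}")
UNSIGNED_INT_MAX = 4294967295  # largest xs:unsignedInt value

def check_typed_student(xml_text):
    """Check only the two typed constraints from the sample XSD (not a full validator)."""
    root = ET.fromstring(xml_text)
    errors = []
    gtnum = root.get("gtnum", "")
    if not GTNUM_PATTERN.fullmatch(gtnum):
        errors.append(f"gtnum {gtnum!r} does not match the pattern")
    # xs:unsignedInt collapses whitespace, then allows an optional '+' and digits.
    age = (root.findtext("age") or "").strip()
    if not re.fullmatch(r"\+?[0-9]+", age) or int(age) > UNSIGNED_INT_MAX:
        errors.append(f"age {age!r} is not an unsignedInt")
    return errors
```

A document with gtnum="gt00x" and age -1 would be reported on both counts, something the DTD version could not express at all.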

Querying XML
- It is sometimes necessary to extract subsets of the data stored within an XML document.
- A number of languages have been created for querying XML documents, including Lorel, Quilt, UnQL, XDuce, XML-QL, XPath, XQL, XQuery and YaTL.
- XPath is a W3C Recommendation and XQuery is a W3C Working Draft.

XML Path Language (XPath)
- XPath is a language for addressing parts of an XML document using a syntax that resembles the hierarchical paths used to address parts of a filesystem or URL.
- It also provides functions for interacting with selected data:
  - Functions for accessing information about document nodes
  - Functions for manipulating strings, numbers and booleans
- Developers can add functions to the XPath library.

Sample XPath queries
- /gatech_student/name
  Selects all name elements that are children of the root element gatech_student.
- //age
  Selects all age elements in the document.
- /gatech_student/*
  Selects all child elements of the root element gatech_student.
- /gatech_student/@gtnum
  Selects the gtnum attribute of the root element gatech_student. (By contrast, /gatech_student[@gtnum] selects the gatech_student element itself, provided it has a gtnum attribute.)
- //*[name()='age']
  Selects all elements named "age".
- /gatech_student/age/ancestor::*
  Selects all ancestors of the age elements that are children of the gatech_student element (here, just the gatech_student element).
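Python's xml.etree.ElementTree supports only a limited subset of XPath, which is still enough to run queries like those above against the sample document. Paths are evaluated relative to the element they are called on, so /gatech_student/name becomes findall("name") on the root. The name() function and the ancestor axis are not supported, so this sketch emulates them in plain Python; it is an approximation, not an XPath 1.0 engine.

```python
import xml.etree.ElementTree as ET

tree = ET.ElementTree(ET.fromstring(
    '<gatech_student gtnum="gt000x"><name>George Burdell</name><age>21</age></gatech_student>'))
root = tree.getroot()

# /gatech_student/name : paths are relative to the element they are called on
names = root.findall("name")
# //age : './/' searches all descendants
ages = tree.findall(".//age")
# /gatech_student/* : every child element of the root
children = root.findall("*")
# /gatech_student/@gtnum : attributes are read with get(), not selected by a path
gtnum = root.get("gtnum")
# //*[name()='age'] : name() is unsupported, so compare tags in Python
named_age = [e for e in root.iter() if e.tag == "age"]

# /gatech_student/age/ancestor::* : elements have no parent pointer,
# so build a child -> parent map first and walk upward.
parent_of = {child: parent for parent in root.iter() for child in parent}

def ancestors(elem):
    found = []
    while elem in parent_of:
        elem = parent_of[elem]
        found.append(elem)
    return found

age_ancestors = [a.tag for age in root.findall("age") for a in ancestors(age)]
```

As the slide predicts, age_ancestors contains only gatech_student.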

XML Query Language (XQuery)
- XQuery is an attempt to provide a query language with the same breadth of functionality and underlying formalism for XML that SQL provides for relational databases.
- XQuery is a functional language in which each query is an expression.
- XQuery has a sophisticated type system based on XML Schema datatypes and, unlike XPath, supports constructing and manipulating document nodes.
- The W3C is also working on an alternate version of XQuery with the same semantics but an XML-based syntax, called XQueryX.

XQuery Expressions
- Path expressions
- Element constructors
- FLWR (FOR, LET, WHERE, RETURN) expressions
- Expressions involving operators and functions
- Conditional expressions
- Quantified expressions
- Expressions that test or modify datatypes
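A FLWR expression iterates over a sequence of nodes (FOR), binds intermediate values (LET), filters (WHERE) and builds a result (RETURN), often with element constructors. As an analogy only (this is Python, not XQuery), the clauses map onto a loop over ElementTree nodes that constructs new elements. The students wrapper, the second student, and the adults/adult result elements are hypothetical; the comment shows a roughly equivalent XQuery.

```python
import xml.etree.ElementTree as ET

students = ET.fromstring(
    "<students>"
    '<gatech_student gtnum="gt000x"><name>George Burdell</name><age>21</age></gatech_student>'
    '<gatech_student gtnum="gt001y"><name>Jane Doe</name><age>19</age></gatech_student>'
    "</students>")

# Roughly: for $s in //gatech_student
#          let $a := xs:integer($s/age)
#          where $a >= 21
#          return <adult id="{$s/@gtnum}">{$s/name/text()}</adult>
result = ET.Element("adults")                 # element constructor for the wrapper
for s in students.iter("gatech_student"):     # FOR
    a = int(s.findtext("age"))                # LET
    if a >= 21:                               # WHERE
        adult = ET.SubElement(result, "adult", id=s.get("gtnum"))  # RETURN: new node
        adult.text = s.findtext("name")

print(ET.tostring(result, encoding="unicode"))
# <adults><adult id="gt000x">George Burdell</adult></adults>
```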

XML Usage Models I (Review)
- Document-centric
  - Semi-structured documents
  - Irregular content
  - Human creation and/or consumption is the primary aspect
- Sample XHTML document:

    <html xmlns="http://www.w3.org/1999/xhtml">
      <head>
        <title>Sample Web Page</title>
      </head>
      <body>
        <p>All XHTML documents must be well-formed and valid.</p>
        <img src="http://www.example.com/sample.jpg" alt="Sample image" height="50" width="25" />
        <br />
        <br />
      </body>
    </html>

XML Usage Models II (Review)
- Data-centric
  - Structured
  - Appears in a regular order
  - Mechanical creation [and consumption]
  - XML usage is incidental
- Sample SOAP message:

    <SOAP-ENV:Envelope
        xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
        SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
      <SOAP-ENV:Body>
        <m:GetLastTradePrice xmlns:m="Some-URI">
          <symbol>DIS</symbol>
        </m:GetLastTradePrice>
      </SOAP-ENV:Body>
    </SOAP-ENV:Envelope>
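Because data-centric XML is created and consumed mechanically, a message like the SOAP request above is normally built and read by programs rather than typed by hand. A minimal ElementTree sketch of both sides, reusing the namespace URIs from the sample; "Some-URI" is the sample's placeholder namespace, and the two function names are hypothetical.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
SOAP_ENC = "http://schemas.xmlsoap.org/soap/encoding/"
ET.register_namespace("SOAP-ENV", SOAP_ENV)   # keep the familiar prefix on output
ET.register_namespace("m", "Some-URI")

def build_last_trade_price_request(symbol):
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope",
                          {f"{{{SOAP_ENV}}}encodingStyle": SOAP_ENC})
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, "{Some-URI}GetLastTradePrice")
    ET.SubElement(call, "symbol").text = symbol  # unqualified, as in the sample
    return ET.tostring(envelope, encoding="unicode")

def read_symbol(message):
    # The receiver is mechanical too: locate the symbol by its qualified path.
    root = ET.fromstring(message)
    return root.findtext(f"{{{SOAP_ENV}}}Body/{{Some-URI}}GetLastTradePrice/symbol")
```

For example, read_symbol(build_last_trade_price_request("DIS")) returns "DIS" without any human reading the markup.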

XML Storage in a data-centric model
- Stored in a database (typically an RDBMS)
- One may want to extract data from a database as XML, store XML in a database, or both.
- Primary choices for retrieving data:
  - Middleware components
  - XML-enabled databases

Middleware components
- Can be a full-blown application or an API.
- Different strategies are used:
  - ADO: the same API for XML and SQL access.
  - jxTransformer: custom queries specify how the results of a SQL query should be converted to XML.
  - DatabaseDOM: a user-created template file contains the SQL-to-XML mappings for the SQL query results.
  - DB2XML: a default mapping of SQL results to XML exists, which the user cannot alter.
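To make the "default mapping" strategy concrete, here is a sketch of a fixed SQL-to-XML mapping in the spirit of DB2XML: every result row becomes a row element and every column a child element named after the column. It uses sqlite3 as a stand-in RDBMS; the result/row element names, the table and its data are assumptions, not DB2XML's actual output format.

```python
import sqlite3
import xml.etree.ElementTree as ET

def rows_to_xml(conn, sql, params=()):
    """Fixed mapping: <result><row><COLUMN>value</COLUMN>...</row>...</result>."""
    cursor = conn.execute(sql, params)
    columns = [d[0] for d in cursor.description]   # column names drive element names
    result = ET.Element("result")
    for values in cursor:
        row = ET.SubElement(result, "row")
        for column, value in zip(columns, values):
            if value is not None:                   # omit SQL NULLs
                ET.SubElement(row, column).text = str(value)
    return result

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (EmployeeID INTEGER, FirstName TEXT, LastName TEXT)")
conn.executemany("INSERT INTO Employees VALUES (?, ?, ?)",
                 [(1, "George", "Burdell"), (2, "Jane", "Doe")])
doc = rows_to_xml(conn, "SELECT EmployeeID, FirstName FROM Employees ORDER BY EmployeeID")
print(ET.tostring(doc, encoding="unicode"))
```

The caller chooses only the SQL; the shape of the XML is fixed, which is exactly the trade-off against the user-defined mappings of jxTransformer and DatabaseDOM.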

Sample jxTransformer Query
- SQL:

    SELECT EmployeeID, FirstName, LastName, Title, HireDate, Salary
    FROM Employees e
    WHERE HireDate >= {d '2000-01-01'}

- jxTransformer custom query:

    xml_document(
      xml_element('result',
        SELECT xml_element('Employees_Info',
                 xml_attribute('ID', e.EmployeeID),
                 xml_element('name',
                   xml_element('first', e.FirstName),
                   xml_element('last', e.LastName)
                 ),
                 xml_element('title', e.Title),
                 xml_element('hiredate', e.HireDate),
                 xml_element('salary', e.Salary)
               )
        FROM EMPLOYEES e
        WHERE e.HireDate >= {d '2000-01-01'}
      )
    )
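The jxTransformer query nests columns under hand-chosen element and attribute names. The same custom shape can be produced in application code from the plain SQL result; this sketch is not jxTransformer's API, only an illustration of a user-specified mapping. sqlite3 stands in for the database, HireDate is stored as ISO-8601 text (an assumption, so string comparison orders dates correctly), and the ODBC escape {d '2000-01-01'} becomes a bound parameter. The rows are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

SQL = ("SELECT EmployeeID, FirstName, LastName, Title, HireDate, Salary "
       "FROM Employees WHERE HireDate >= ? ORDER BY EmployeeID")

def employees_to_xml(conn, since):
    result = ET.Element("result")
    for emp_id, first, last, title, hired, salary in conn.execute(SQL, (since,)):
        info = ET.SubElement(result, "Employees_Info", ID=str(emp_id))  # xml_attribute('ID', ...)
        name = ET.SubElement(info, "name")                               # nested xml_element('name', ...)
        ET.SubElement(name, "first").text = first
        ET.SubElement(name, "last").text = last
        ET.SubElement(info, "title").text = title
        ET.SubElement(info, "hiredate").text = hired
        ET.SubElement(info, "salary").text = str(salary)
    return result

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (EmployeeID INTEGER, FirstName TEXT, LastName TEXT,"
             " Title TEXT, HireDate TEXT, Salary INTEGER)")
conn.executemany("INSERT INTO Employees VALUES (?, ?, ?, ?, ?, ?)", [
    (1, "George", "Burdell", "Engineer", "1999-06-01", 50000),
    (2, "Jane", "Doe", "Analyst", "2001-03-15", 60000),
])
doc = employees_to_xml(conn, "2000-01-01")
```

Only the employee hired on or after 2000-01-01 appears, wrapped in the nested Employees_Info/name structure.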

Sample DatabaseDOM template

    <XMLDATABASEMAP>
      <TEMPLATE>
        <EMPLOYEE_LIST>
          <EMPLOYEE NO="EMPNO" GENDER="SEX">
            <NAME>
              <FIRST>FIRSTNME</FIRST>
              <MIDDLE_INITIAL>MIDINIT</MIDDLE_INITIAL>
              <LAST>LASTNAME</LAST>
            </NAME>
          </EMPLOYEE>
        </EMPLOYEE_LIST>
      </TEMPLATE>
      <DATABASE>
        <MAXRETURNROWS>1000</MAXRETURNROWS>
        <JDBC>
          <URL>jdbc:db2:sample</URL>
          <DRIVER>COM.ibm.db2.jdbc.app.DB2Driver</DRIVER>
        </JDBC>
        <USERID>paul</USERID>
        <PASSWORD>XXXXXX</PASSWORD>
        <TABLE>EMPLOYEE</TABLE>
        <SCHEMA>PAUL</SCHEMA>
      </DATABASE>
    </XMLDATABASEMAP>
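In the DatabaseDOM template, attribute values and text such as EMPNO or FIRSTNME name the columns whose values fill them in. A rough sketch of that substitution, assuming the EMPLOYEE element is repeated once per row and any attribute value or text that exactly matches a column name is replaced; the matching rule and fill_template are assumptions, not DatabaseDOM's documented behavior. Rows are plain dicts with hypothetical values rather than a JDBC result set.

```python
import copy
import xml.etree.ElementTree as ET

TEMPLATE = """<EMPLOYEE_LIST>
  <EMPLOYEE NO="EMPNO" GENDER="SEX">
    <NAME><FIRST>FIRSTNME</FIRST><MIDDLE_INITIAL>MIDINIT</MIDDLE_INITIAL><LAST>LASTNAME</LAST></NAME>
  </EMPLOYEE>
</EMPLOYEE_LIST>"""

def fill_template(template_xml, rows):
    root = ET.fromstring(template_xml)
    pattern = root.find("EMPLOYEE")        # the element repeated once per row
    root.remove(pattern)
    for row in rows:
        item = copy.deepcopy(pattern)
        for elem in item.iter():
            # Attribute values that name a column are replaced by that column's value.
            for attr, value in list(elem.attrib.items()):
                if value in row:
                    elem.set(attr, str(row[value]))
            # Element text that names a column is replaced the same way.
            text = (elem.text or "").strip()
            if text in row:
                elem.text = str(row[text])
        root.append(item)
    return root

rows = [
    {"EMPNO": "000010", "SEX": "M", "FIRSTNME": "GEORGE", "MIDINIT": "P", "LASTNAME": "BURDELL"},
    {"EMPNO": "000020", "SEX": "F", "FIRSTNME": "JANE", "MIDINIT": "Q", "LASTNAME": "DOE"},
]
out = fill_template(TEMPLATE, rows)
```

Keeping the mapping in a template file means the XML shape can change without touching program code, at the cost of a small substitution language.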

XML-enabled databases
- An XML-enabled database understands how to convert data to XML and back.
- The big three RDBMS vendors (IBM, Oracle and Microsoft) each have a different XML strategy.

XML and DB2
- Uses the DB2 XML Extender to add XML support.
- Can store an entire XML document and its DTD in a column of a user-defined XML type:
  - XMLCLOB
  - XMLVARCHAR
  - XMLFile
- Documents can also be shredded into multiple tables and columns.
- XML data can be queried with syntax that complies with the W3C XPath Recommendation.
- XML data can also be updated using stored procedures.

Sample DB2 XML Extender table and query
- Table:

    CREATE TABLE mail_user (
      user_name VARCHAR(20) NOT NULL PRIMARY KEY,
      passwd    VARCHAR(10),
      mailbox   XMLVARCHAR
    )

- Query:

    SELECT user_name
    FROM mail_user
    WHERE extractVarchar(mailbox, '/Mailbox/Inbox/Email/Subject') LIKE '%XML%'
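The DB2 query calls the XML Extender's extractVarchar user-defined function inside a WHERE clause. The underlying idea, a scalar function that evaluates a path against an XML column, can be sketched with sqlite3 by registering a Python function. Here extract_varchar is a hypothetical stand-in that supports only simple absolute paths and, like a scalar function, returns the first match; the mailbox contents are made up.

```python
import sqlite3
import xml.etree.ElementTree as ET

def extract_varchar(xml_text, path):
    """Text of the first node matching an absolute path such as
    /Mailbox/Inbox/Email/Subject, or None (SQL NULL) if nothing matches."""
    root = ET.fromstring(xml_text)
    steps = path.strip("/").split("/")
    if steps[0] != root.tag:
        return None
    if len(steps) == 1:
        return root.text
    return root.findtext("/".join(steps[1:]))

conn = sqlite3.connect(":memory:")
conn.create_function("extract_varchar", 2, extract_varchar)  # callable from SQL
conn.execute("CREATE TABLE mail_user (user_name VARCHAR(20) NOT NULL PRIMARY KEY,"
             " passwd VARCHAR(10), mailbox TEXT)")
conn.executemany("INSERT INTO mail_user VALUES (?, ?, ?)", [
    ("george", "secret",
     "<Mailbox><Inbox><Email><Subject>XML storage notes</Subject></Email></Inbox></Mailbox>"),
    ("jane", "secret",
     "<Mailbox><Inbox><Email><Subject>Lunch?</Subject></Email></Inbox></Mailbox>"),
])
matches = conn.execute(
    "SELECT user_name FROM mail_user "
    "WHERE extract_varchar(mailbox, '/Mailbox/Inbox/Email/Subject') LIKE '%XML%'"
).fetchall()
```

Because the function parses every document on every call, a real system would back such queries with an index (the Extender's side tables play that role).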

XML and Oracle 9i
- XML documents can be stored whole in user-defined columns of type XMLType or CLOB/BLOB.
- Shredded documents can be reconstituted using the XML SQL Utility.
- XML can be queried in two ways:
  - Oracle Text (CONTAINS and WITHIN clauses in SQL) for CLOB, BLOB or VARCHAR2 columns
  - XMLType columns can be queried via the extract() and existsNode() functions, which use XPath.
- Relational views of XML data are possible.
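Shredding splits a document into relational rows; reconstituting rebuilds XML from those rows, which is the role the slide assigns to Oracle's XML SQL Utility. A round-trip sketch for the mailbox documents used in these samples, with sqlite3 and ElementTree; the email table layout and the From element are assumptions. A position column preserves document order, and only what the table captures survives the round trip.

```python
import sqlite3
import xml.etree.ElementTree as ET

def shred(conn, user_name, mailbox_xml):
    """Store each Inbox Email as one row of the email table."""
    root = ET.fromstring(mailbox_xml)
    for position, email in enumerate(root.iterfind("Inbox/Email")):
        conn.execute("INSERT INTO email (user_name, position, sender, subject) VALUES (?, ?, ?, ?)",
                     (user_name, position, email.findtext("From"), email.findtext("Subject")))

def reconstitute(conn, user_name):
    """Rebuild a Mailbox document from the rows, in original document order."""
    mailbox = ET.Element("Mailbox")
    inbox = ET.SubElement(mailbox, "Inbox")
    rows = conn.execute("SELECT sender, subject FROM email WHERE user_name = ? ORDER BY position",
                        (user_name,))
    for sender, subject in rows:
        email = ET.SubElement(inbox, "Email")
        ET.SubElement(email, "From").text = sender
        ET.SubElement(email, "Subject").text = subject
    return ET.tostring(mailbox, encoding="unicode")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE email (user_name TEXT, position INTEGER, sender TEXT, subject TEXT)")
original = ("<Mailbox><Inbox>"
            "<Email><From>george@example.com</From><Subject>XML storage</Subject></Email>"
            "<Email><From>jane@example.com</From><Subject>Re: XML storage</Subject></Email>"
            "</Inbox></Mailbox>")
shred(conn, "george", original)
rebuilt = reconstitute(conn, "george")
```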

Sample Oracle 9i table and query
- Table:

    CREATE TABLE mail_user (
      user_name VARCHAR2(20),
      passwd    VARCHAR2(10),
      mailbox   SYS.XMLTYPE
    );

- Query:

    SELECT user_name
    FROM mail_user m
    WHERE m.mailbox.extract('/Mailbox/Inbox/Email/Subject/text()').getStringVal() LIKE '%XML%'

XML and SQL Server
- XML can be retrieved from relational rows using the FOR XML clause in SQL, in one of three modes:
  - RAW
  - AUTO
  - EXPLICIT
- XML views of relational data are possible:
  - Specified using XSD files for mapping
  - Queried using XPath
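RAW mode turns each result row into a row element whose attributes carry the non-NULL column values; AUTO mode names elements after the tables (or aliases) in the query and nests rows from later tables inside those of earlier ones. The functions below only imitate those two output shapes in Python over sqlite3 so the mappings are visible. They are not SQL Server code, the nesting rule is simplified to "consecutive rows with equal parent columns share a parent", and the tables and data are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

def for_xml_raw(cursor):
    """RAW-style: one <row> per result row, non-NULL columns as attributes."""
    columns = [d[0] for d in cursor.description]
    return [ET.Element("row", {c: str(v) for c, v in zip(columns, values) if v is not None})
            for values in cursor]

def for_xml_auto(cursor, parent_tag, parent_cols, child_tag):
    """AUTO-style nesting (simplified): consecutive rows with the same parent columns
    share one parent element; the remaining columns become a child element."""
    columns = [d[0] for d in cursor.description]
    parents, last_key = [], None
    for values in cursor:
        record = dict(zip(columns, values))
        key = tuple(record[c] for c in parent_cols)
        if key != last_key:
            parents.append(ET.Element(parent_tag, {c: str(record[c]) for c in parent_cols}))
            last_key = key
        child_attrs = {c: str(v) for c, v in record.items()
                       if c not in parent_cols and v is not None}
        ET.SubElement(parents[-1], child_tag, child_attrs)
    return parents

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerID INTEGER, Name TEXT)")
conn.execute("CREATE TABLE Orders (OrderID INTEGER, CustomerID INTEGER)")
conn.executemany("INSERT INTO Customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])
conn.executemany("INSERT INTO Orders VALUES (?, ?)", [(10, 1), (11, 1), (12, 2)])
QUERY = ("SELECT c.CustomerID AS CustomerID, c.Name AS Name, o.OrderID AS OrderID "
         "FROM Customers c JOIN Orders o ON o.CustomerID = c.CustomerID "
         "ORDER BY c.CustomerID, o.OrderID")
raw = for_xml_raw(conn.execute(QUERY))
auto = for_xml_auto(conn.execute(QUERY), "c", ("CustomerID", "Name"), "o")
```

As with FOR XML AUTO, the ORDER BY matters: without it, rows for the same customer need not be consecutive and would be split across several parent elements.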

XML Storage in a document-centric model
- Stored in a content management system
- A content management system typically consists of a repository that stores a variety of XML documents, an editor, and an engine that provides one or more of the following features:
  - Version, revision and access control
  - Ability to reuse documents in different formats
  - Collaboration
  - Web publishing facilities
  - Support for a variety of text editors (e.g. Microsoft Word, Adobe FrameMaker)
  - Indexing and search capabilities

XML Storage in a hybrid model
- Where both data-centric and document-centric models are in use, the best choice is a native XML database.
- A native XML database is a database that has an XML document as its fundamental (logical) unit of storage and defines a (logical) model for an XML document, as opposed to the data in that document, and stores and retrieves documents according to that model.
  - At a minimum, the model must include elements, attributes, PCDATA, and document order.

Tamino: a commercial native XML database
- Created by Software AG
- Features:
  - Storage and retrieval of XML documents
  - Storage and retrieval of relational data
  - Interfacing with external applications and data sources
  - Transactions (ACID properties)
  - Querying via X-Query (based on XPath, NOT XQuery)
  - Indexing
  - GUI tools:
    - Web-based administration
    - Schema editor
    - Interactive query interface

Tamino Schemas
- Schemas in Tamino are DTD-based and are mainly used to describe how the XML data should be indexed.
- Document storage choices:
  - Specify a pre-existing DTD, which is then converted to a Tamino schema.
  - Store a well-formed XML document without a schema, in which case default indexing applies.
  - Create a schema from scratch for the XML document being stored.
- Schemas are also used to specify data types, which is important for type-based operations during querying (e.g. numeric operations).
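Why declared types matter for indexing and querying can be shown with a tiny value index over the age element: once built with string keys (no type information) and once with integer keys (as a schema might declare). The index structure, a sorted list of (key, document id) pairs searched with bisect, is an assumption for illustration and not Tamino's implementation; the documents are hypothetical.

```python
import bisect
import xml.etree.ElementTree as ET

DOCS = {
    "doc1": "<gatech_student><name>George Burdell</name><age>21</age></gatech_student>",
    "doc2": "<gatech_student><name>Jane Doe</name><age>19</age></gatech_student>",
    "doc3": "<gatech_student><name>Ann Lee</name><age>100</age></gatech_student>",
}

def build_index(docs, path, key_type):
    """Sorted (key, doc_id) pairs for the value at `path`, converted with key_type."""
    entries = []
    for doc_id, text in docs.items():
        value = ET.fromstring(text).findtext(path)
        if value is not None:
            entries.append((key_type(value), doc_id))
    entries.sort()
    return entries

def range_query(index, low):
    """Ids of documents whose key is >= low, located by binary search."""
    keys = [key for key, _ in index]
    return [doc_id for _, doc_id in index[bisect.bisect_left(keys, low):]]

untyped = build_index(DOCS, "age", str)   # no type: "100" sorts before "19"
typed = build_index(DOCS, "age", int)     # declared integer type: numeric order
```

Asking for ages of at least 20 returns only doc1 from the untyped index, silently missing doc3, while the typed index returns doc1 and doc3.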

Tamino and SQL
- Tamino ships with a SQL engine.
- Schemas can be used to create mappings from SQL to XML.
- XML can be retrieved from RDBMS sources, either internal (the SQL engine) or external.
- Schemas can also be used to represent joins across different document types (which could mean different data sources).

Tamino programming support
- APIs are available for accessing the XML store from both Java and Microsoft's JScript.
- C programmers can interact with the SQL engine using the SQL precompiler.
- ODBC, OLE DB and JDBC clients can communicate with the SQL engine.
- The X-Tensions framework allows developers to extend the functionality of Tamino using C++ COM objects or Java objects.

dbXML: an Open Source native XML database
- Created by the dbXML Group
- Lightweight and modular
  - Can easily be embedded in applications
- XML documents are arranged in a hierarchical, filesystem-like manner.
- Querying via XPath
- Indexing support, but no transactions or schemas
- Command-line administration tools
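A toy in-memory sketch of the dbXML arrangement: documents live in collections addressed by slash-separated, filesystem-like paths, and a query runs one path expression across every document in a collection. ElementTree's XPath subset stands in for full XPath, and the class, method and collection names are hypothetical, unrelated to dbXML's actual API or the XML:DB API.

```python
import xml.etree.ElementTree as ET

class CollectionStore:
    """Toy native XML store: documents grouped in collections named like /db/students."""

    def __init__(self):
        self.collections = {}   # collection path -> {document name -> parsed root element}

    def store(self, collection, name, xml_text):
        # Documents are kept as parsed trees: elements, attributes, text and order survive.
        self.collections.setdefault(collection, {})[name] = ET.fromstring(xml_text)

    def query(self, collection, path):
        """Run an ElementTree path on each document root; return (document, element) pairs."""
        results = []
        for name, root in self.collections.get(collection, {}).items():
            results.extend((name, elem) for elem in root.findall(path))
        return results

    def child_collections(self, parent):
        """Names of collections directly below `parent`, like listing a directory."""
        prefix = parent.rstrip("/") + "/"
        return sorted({c[len(prefix):].split("/")[0]
                       for c in self.collections if c.startswith(prefix)})

store = CollectionStore()
store.store("/db/students", "george.xml",
            '<gatech_student gtnum="gt000x"><name>George Burdell</name><age>21</age></gatech_student>')
store.store("/db/students", "jane.xml",
            '<gatech_student gtnum="gt001y"><name>Jane Doe</name><age>19</age></gatech_student>')
store.store("/db/courses", "cs1.xml", "<course><title>Intro</title></course>")
hits = [(doc, elem.text) for doc, elem in store.query("/db/students", "name")]
```

Scanning every document per query is what dbXML's indexing support exists to avoid; this sketch omits indexes for brevity.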

dbXML programming support
- Written in Java, with an implementation of the XML:DB Initiative's XML Database API.
- Exposes a CORBA API to enable access from any language with CORBA bindings.

Conclusion
- Paper on this topic:
  http://www.25hoursaday.com/StoringAndQueryingXML.html
- Missed opportunities:
  - eXcelon
- Questions?