Your SlideShare is downloading. ×
Upcoming SlideShare
Loading in...5

Thanks for flagging this SlideShare!

Oops! An error has occurred.


Introducing the official SlideShare app

Stunning, full-screen experience for iPhone and Android

Text the download link to your phone

Standard text messaging rates apply



Published on

  • Be the first to comment

  • Be the first to like this

No Downloads
Total Views
On Slideshare
From Embeds
Number of Embeds
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

No notes for slide


  • 1. Data Storage and Retrieval in an XML World By Dare Obasanjo
  • 2. What is XML?  eXtensible Markup Language  Meta-markup language developed by W3C to deal with shortcomings of HTML  HTML grew too complex and unwieldy (over 100 tags in latest version).  XML allowed for domain specific markup.  Semantics not document specific but application specific.  XML is a subset of the Standardized and General Markup Language (SGML).
  • 3. XML and Data  XML also provided a way to describe structured data  XML has many advantages as a data storage and interchange format  Built in support for internationalization via unicode.  Platform independence.  Human readable format makes it easier to debug.  Extensibility - new versions of a format don’t have to break apps based on older versions of the format.  Large number of off-the-shelf tools for processing XML documents already exist.
  • 4. Structuring XML  Since XML is a way to describe structured data there should be a means to specify the structure of an XML document.  DTDs and Schemas are different mechanisms for providing a grammar for an XML document.  An XML document that conforms to a DTD or schema is considered to be valid.
  • 5. Document Type Definitions (DTDs)  DTDs were inherited from SGML.  DTDs have a different syntax from XML  They are used to specify legal elements that can occur in an XML document and the order they occur in.
  • 6. Sample DTD and XML document <!ELEMENT gatech_student (name, age)> <!ATTLIST gatech_student gtnum CDATA> <!ELEMENT name (#PCDATA)> <!ELEMENT age (#PCDATA)> <gatech_student gtnum="gt000x"> <name>George Burdell</name> <age>21</age> </gatech_student>
  • 7. DTDs Unsatisfactory  DTDs proved inadequate due to to a number of reasons. The main reasons being  They used a different syntax than XML  Non-existent support for datatypes  Lack of control over occurrence of elements
  • 8. XML Data Reduced (XDR)  A recommendation for XML schemas was submitted to the W3C by the Microsoft Corporation as a potential XML schema standard.  XDR tackled some of the problems of DTDs  XDR schemas are XML files  Support for a number of datatypes analogous to those used in relational database management systems and popular programming languages  Occurrence of elements is controllable
  • 9. Sample XDR and XML document <Schema name="myschema" xmlns="urn:schemas-microsoft-com:xml-data" xmlns:dt="urn:schemas-microsoft-com:datatypes"> <ElementType name="age" dt:type="ui1" /> <ElementType name="name" dt:type="string" /> <AttributeType name="gtnum" dt:type="string" /> <ElementType name="gatech_student" order="seq"> <element type="name" minOccurs="1" maxOccurs="1"/> <element type="age" minOccurs="1" maxOccurs="1"/> <attribute type="gtnum" /> </ElementType> </Schema> <gatech_student gtnum="gt000x"> <name>George Burdell</name> <age>21</age> </gatech_student>
  • 10. XML Schema Definitions (XSD)  W3C standard  XSD outshines XDR in the following ways  Supports more datatypes  Provides the ability to create custom data types  Supports object oriented programming concepts like inheritance and polymorphism.
  • 11. Sample XSD and XML document <schema xmlns="" > <element name="gatech_student"> <complexType> <sequence> <element name="name" type="string"/> <element name="age" type="unsignedInt"/> </sequence> <attribute name="gtnum"> <simpleType> <restriction base="string"> <pattern value="gtd{3}[A-Za-z]{1}"/> </restriction> </simpleType> </attribute> </complexType> </element> </schema> <gatech_student gtnum="gt000x"> <name>George Burdell</name> <age>21</age> </gatech_student>
  • 12. Querying XML  It is sometimes necessary to extract subsets of the data stored within an XML document.  A number of languages have been created for querying XML documents including Lorel, Quilt, UnQL, Xduce, XML-QL, Xpath, XQL, Xquery and YaTL.  XPath is a W3C recommendation and XQuery is a W3C working draft.
  • 13. XML Path Language (XPath)  XPath is a language for addressing parts of an XML document using a syntax that resembles hierarchical paths used to address parts of a filesystem or URL  Also provides functions for interacting with selected data  Functions for the accessing information about document nodes  Functions for the manipulating of strings, numbers and booleans.  Developers can add functions to the XPath library.
  • 14. Sample XPath queries  /gatech_student/name Selects all name elements that are children of the root element gatech_student.  //age Selects all age elements in the document.  /gatech_student/* Selects all child elements of the root element gatech_student.  /gatech_student[@gtnum] Selects all gtnum attributes of the gatech_student elements in the document.  //*[name()='age'] Selects all elements that are named "age".  /gatech_student/age/ancestor::* Selects all ancestors of all the age elements that are children of the gatech_student element (which should select the gatech_student element).
  • 15. XML Query Language (XQuery)  XQuery is an attempt to provide a query language that provides the same breadth of functionality and underlying formalism as SQL does for relational databases.  XQuery is a functional language where each query is an expression.  XQuery has a sophisticated type system based on XML schema datatypes and supports the manipulation of the document nodes unlike XPath.  W3C is also working towards creating an alternate version of XQuery that has the same semantics but uses XML based syntax instead called XQueryX.
  • 16. XQuery Expressions  path expressions  element constructors  FLWR expressions  expressions involving operators and functions  conditional expressions  quantified expressions  expressions that test or modify datatypes
  • 17. XML Usage Models I (Review)  Document-centric  Semi structured documents  Irregular content  Human creation and/or consumption is primary aspect  Sample XHTML document <html xmlns =""> <head> <title>Sample Web Page</title> </head> <body> <p> All XHTML documents must be well-formed and valid. </p> <img src="" height ="50" width = "25"/> <br /> <br /> </body> </html>
  • 18. XML Usage Models II (Review)  Data-centric  Structured  Appears in a regular order  Mechanical creation [and consumption].  XML usage is incidental  Sample SOAP message <SOAP-ENV:Envelope xmlns:SOAP- ENV="" SOAP- ENV:encodingStyle=""> <SOAP-ENV:Body> <m:GetLastTradePrice xmlns:m="Some-URI"> <symbol>DIS</symbol> </m:GetLastTradePrice> </SOAP-ENV:Body> </SOAP-ENV:Envelope>
  • 19. XML Storage in a data centric model  Stored in database (typically an RDBMS)  One may want to extract data from a database as XML, store XML into a database or both  Primary choices for retrieving data  Middleware components  XML-enabled databases
  • 20. Middleware components  Could be full blown application or an API.  Different strategies used  ADO – same API for XML & SQL access.  jxTransformer – custom queries used to specify how the results of a SQL query should be converted to XML.  DatabaseDOM - user created template file contains the SQL to XML mappings for the SQL query results.  DB2XML - default mapping of SQL results to XML data exists that cannot be altered by the user
  • 21. Sample jxTransformer Query  SQL SELECT EmployeeID, FirstName, LastName, Title, HireDate, Salary FROM Employees e WHERE HireDate >= {d ’2000-01-01’}  jxTransformer custom query xml_document( xml_element(’result’, SELECT xml_element(’Employees_Info’ xml_attribute(’ID’, e.EmployeeID), xml_element(’name’, xml_element(’first’, e.FirstName), xml_element(’last’, e.LastName) ), xml_element(’title’, e.Title), xml_element(’hiredate’, e.HireDate), xml_element(’salary’, e.Salary) ) FROM EMPLOYEES e WHERE e.HireDate >= {d ’2000-01-01’} ) )
  • 23. XML-enabled databases  An XML-enabled database understands how to convert data to XML and back  Big 3 RDBMS vendors all have different XML strategies.
  • 24. XML and DB2  Uses DB2 Extender to add XML support  Can store an entire XML document and its DTD as a user-defined column of an xml type  XMLCLOB  XMLVARCHAR  XMLFile  Option to shred the document into multiple tables and columns also available.  XML data can be queried with syntax that is compliant with W3C XPath recommendation.  Updating of XML data is also possible using stored procedures.
  • 25. SAMPLE DB2 XML EXTENDER TABLE AND QUERY  TABLE TABLE mail_user user_name VARCHAR(20) NOT NULL PRIMARY KEY passwd VARCHAR(10) mailbox XMLVARCHAR  QUERY SELECT user_name FROM mail_user WHERE extractVarchar(mailbox,"/Mailbox/Inbox/Email/Subject") LIKE "%XML%"
  • 26. XML and Oracle 9i  XML documents can be stored as whole documents in user-defined columns of type XMLType or CLOB/BLOB  Shredded documents can be reconstituted using the XML SQL Utility.  Querying XML possible via two means  Oracle Text (CONTAINS & WITHIN clauses in SQL) for BLOB or VARCHAR2 columns  XMLType columns can be queried via extract() and existsNode() functions which use XPath.  Relational views of XML data possible.
  • 27. SAMPLE ORACLE 9i TABLE AND QUERY  TABLE CREATE TABLE mail_user( user_name VARCHAR2(20), passwd VARCHAR2(10), mailbox SYS.XMLTYPE );  QUERY SELECT user_name FROM mail_user m WHERE m.mailbox.extract('/Mailbox/Inbox/Email/Subject/text( )').getStringVal() like '%XML%'
  • 28. XML and SQL Server  XML can be retrieved from relational rows using FOR XML clause in SQL  RAW  AUTO  EXPLICIT  XML views of relational data possible.  Specified using XSD files for mapping  Queried using XPath
  • 29. XML Storage in a document centric model  Stored in a content management system  A content management system typically consists of a repository that stores a variety of XML documents, an editor and an engine that provides one or more of the following features  version, revison and access control  ability to reuse documents in different formats  collaboration  web publishing facilities  support for a variety of text editors (e.g. Microsoft Word, Adobe Framemaker, etc)  indexing and search capabilities
  • 30. XML Storage in a Hybrid Model  Where both data-centric and document-centric models are in use best choice is native XML database.  A native XML database is a database that has an XML document as its fundamental (logical) unit of storage and defines a (logical) model for an XML document, as opposed to the data in that document, and stores and retrieves documents according to that model.  At a minimum, the model must include elements, attributes, PCDATA, and document order.
  • 31. Tamino – a commercial native XML database  Created by Software AG.  Features  Storage & retrieval of XML documents  Storage & retrieval relational data  Interfacing with external applications and data sources.  Transactional (ACID properties)  Querying via X-Query (based on XPath NOT XQuery)  Indexing  GUI tools  Web based administration  Schema editor  Interactive query interface
  • 32. Tamino Schemas  Schemas in Tamino are DTD-based and are mainly used as a way to describe how the XML data should be indexed  Document storage choices  Specify a pre-existing DTD which is then converted to a Tamino schema,  store a well-formed XML document without a schema which means that default indexing ensues  Create a schema from scratch for the XML document being stored  Schemas are also used as a way to specify data types which is important for type based operations during querying (e.g. numeric ops)
  • 33. Tamino and SQL  Tamino ships with a SQL engine  Schemas can be used to creating mappings from SQL to XML  XML can be retrieved from RDBMS sources either internal (SQL engine) or external  Schemas can also be used to represent joins across different document types (which could mean different data sources)
  • 34. Tamino programming support  APIs available for accessing XML store from both Java and Microsoft's Jscript  C programmers can interact with the SQL engine using the SQL precompiler  ODBC, OLE DB and JDBC clients can communicate with SQL Engine  X-Tensions framework allows developers to extend the functionality of Tamino by using C++ COM objects or Java objects
  • 35. dbXML – an Open Source native XML database  Created by the dbXML group.  Lightweight and modular  Can easily be embedded in applications  XML Documents arranged in hierarchical filesystem-like manner.  Querying via XPath.  Indexing support but no transactions or schemas.  Command line administration tools
  • 36. dbXML programming support  Written in Java and has implementation of XML:DB initatives XML Database API.  Exposes CORBA API to enable access from any language with CORBA bindings.
  • 37. Conclusion  Paper on this topic:   Missed Opportunities  eXcelon  Questions???