
  • 1. Data Storage and Retrieval in an XML World By Dare Obasanjo
  • 2. What is XML?  eXtensible Markup Language  Meta-markup language developed by W3C to deal with shortcomings of HTML  HTML grew too complex and unwieldy (over 100 tags in latest version).  XML allowed for domain-specific markup.  Semantics not document specific but application specific.  XML is a subset of the Standard Generalized Markup Language (SGML).
  • 3. XML and Data  XML also provided a way to describe structured data  XML has many advantages as a data storage and interchange format  Built-in support for internationalization via Unicode.  Platform independence.  Human readable format makes it easier to debug.  Extensibility - new versions of a format don’t have to break apps based on older versions of the format.  Large number of off-the-shelf tools for processing XML documents already exist.
  • 4. Structuring XML  Since XML is a way to describe structured data there should be a means to specify the structure of an XML document.  DTDs and Schemas are different mechanisms for providing a grammar for an XML document.  An XML document that conforms to a DTD or schema is considered to be valid.
  • 5. Document Type Definitions (DTDs)  DTDs were inherited from SGML.  DTDs have a different syntax from XML  They are used to specify legal elements that can occur in an XML document and the order they occur in.
  • 6. Sample DTD and XML document
      <!ELEMENT gatech_student (name, age)>
      <!ATTLIST gatech_student gtnum CDATA #IMPLIED>
      <!ELEMENT name (#PCDATA)>
      <!ELEMENT age (#PCDATA)>

      <gatech_student gtnum="gt000x">
        <name>George Burdell</name>
        <age>21</age>
      </gatech_student>
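The content model (name, age) in the DTD above says that a gatech_student must contain exactly one name followed by exactly one age. Python's standard-library parser checks well-formedness only, not DTD validity, so the sketch below hand-checks that single rule. The function name `matches_student_model` is hypothetical, and a validating parser would normally do this work.

```python
import xml.etree.ElementTree as ET


def matches_student_model(xml_text: str) -> bool:
    """Check the rule <!ELEMENT gatech_student (name, age)> by hand.

    This is a stand-in for one DTD rule, not a general DTD validator.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return False  # the text is not even well-formed XML
    if root.tag != "gatech_student":
        return False
    # The children must be exactly name then age, in document order.
    return [child.tag for child in root] == ["name", "age"]


valid_doc = (
    '<gatech_student gtnum="gt000x">'
    "<name>George Burdell</name><age>21</age>"
    "</gatech_student>"
)
wrong_order = (
    '<gatech_student gtnum="gt000x">'
    "<age>21</age><name>George Burdell</name>"
    "</gatech_student>"
)

print(matches_student_model(valid_doc))
print(matches_student_model(wrong_order))
```

The second document is well-formed but would not be valid against the DTD, which is the distinction the previous slides draw.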
  • 7. DTDs Unsatisfactory  DTDs proved inadequate for a number of reasons. The main ones are:  They use a different syntax than XML  Non-existent support for datatypes  Limited control over occurrence of elements
  • 8. XML Data Reduced (XDR)  A recommendation for XML schemas was submitted to the W3C by the Microsoft Corporation as a potential XML schema standard.  XDR tackled some of the problems of DTDs  XDR schemas are XML files  Support for a number of datatypes analogous to those used in relational database management systems and popular programming languages  Occurrence of elements is controllable
  • 9. Sample XDR and XML document
      <Schema name="myschema"
              xmlns="urn:schemas-microsoft-com:xml-data"
              xmlns:dt="urn:schemas-microsoft-com:datatypes">
        <ElementType name="age" dt:type="ui1" />
        <ElementType name="name" dt:type="string" />
        <AttributeType name="gtnum" dt:type="string" />
        <ElementType name="gatech_student" order="seq">
          <element type="name" minOccurs="1" maxOccurs="1"/>
          <element type="age" minOccurs="1" maxOccurs="1"/>
          <attribute type="gtnum" />
        </ElementType>
      </Schema>

      <gatech_student gtnum="gt000x">
        <name>George Burdell</name>
        <age>21</age>
      </gatech_student>
  • 10. XML Schema Definitions (XSD)  W3C standard  XSD outshines XDR in the following ways  Supports more datatypes  Provides the ability to create custom datatypes  Supports object-oriented programming concepts like inheritance and polymorphism.
  • 11. Sample XSD and XML document
      <schema xmlns="" >
        <element name="gatech_student">
          <complexType>
            <sequence>
              <element name="name" type="string"/>
              <element name="age" type="unsignedInt"/>
            </sequence>
            <attribute name="gtnum">
              <simpleType>
                <restriction base="string">
                  <pattern value="gt\d{3}[A-Za-z]{1}"/>
                </restriction>
              </simpleType>
            </attribute>
          </complexType>
        </element>
      </schema>

      <gatech_student gtnum="gt000x">
        <name>George Burdell</name>
        <age>21</age>
      </gatech_student>
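The custom datatype in the XSD sample restricts gtnum with a pattern facet. As a rough illustration, assuming the intended pattern is "gt", then three digits, then one letter, the same constraint can be checked with Python's `re` module. The names `GTNUM_PATTERN` and `is_valid_gtnum` are hypothetical; a schema-aware validator would apply the facet itself.

```python
import re

# An XSD pattern facet must match the whole value, so fullmatch() is the
# closest analogue in Python (no explicit ^ or $ anchors are needed).
GTNUM_PATTERN = re.compile(r"gt\d{3}[A-Za-z]{1}")


def is_valid_gtnum(value: str) -> bool:
    """Return True if value looks like gt + three digits + one letter."""
    return GTNUM_PATTERN.fullmatch(value) is not None


for candidate in ["gt000x", "gt00x", "gt0000x", "xgt000x"]:
    print(candidate, is_valid_gtnum(candidate))
```

Only the first candidate, the value used in the sample document, satisfies the pattern.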
  • 12. Querying XML  It is sometimes necessary to extract subsets of the data stored within an XML document.  A number of languages have been created for querying XML documents including Lorel, Quilt, UnQL, XDuce, XML-QL, XPath, XQL, XQuery and YaTL.  XPath is a W3C recommendation and XQuery is a W3C working draft.
  • 13. XML Path Language (XPath)  XPath is a language for addressing parts of an XML document using a syntax that resembles hierarchical paths used to address parts of a filesystem or URL  Also provides functions for interacting with selected data  Functions for accessing information about document nodes  Functions for manipulating strings, numbers and booleans.  Developers can add functions to the XPath library.
  • 14. Sample XPath queries
      /gatech_student/name  Selects all name elements that are children of the root element gatech_student.
      //age  Selects all age elements in the document.
      /gatech_student/*  Selects all child elements of the root element gatech_student.
      /gatech_student[@gtnum]  Selects the root element gatech_student if it has a gtnum attribute; /gatech_student/@gtnum selects the attribute itself.
      //*[name()='age']  Selects all elements that are named "age".
      /gatech_student/age/ancestor::*  Selects all ancestors of all the age elements that are children of the gatech_student element (here, only the gatech_student element).
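Several of the sample queries can be tried with Python's standard library, with one caveat: `xml.etree.ElementTree` implements only a limited XPath subset (relative paths, `*`, `//`, attribute predicates) and has no ancestor axis or `name()` function. The sketch below therefore mirrors the first few queries only, with each line annotated by the XPath it approximates.

```python
import xml.etree.ElementTree as ET

# doc is the root element, i.e. gatech_student itself.
doc = ET.fromstring(
    '<gatech_student gtnum="gt000x">'
    "<name>George Burdell</name>"
    "<age>21</age>"
    "</gatech_student>"
)

# /gatech_student/name : name children of the root element
names = [e.text for e in doc.findall("name")]

# //age : every age element in the document
ages = [e.text for e in doc.iter("age")]

# /gatech_student/* : all child elements of the root
child_tags = [e.tag for e in doc.findall("*")]

# /gatech_student/@gtnum : the attribute value itself
gtnum = doc.get("gtnum")

# /gatech_student[@gtnum] : the element, kept only if it has the attribute
has_gtnum = [e.tag for e in [doc] if "gtnum" in e.attrib]

print(names, ages, child_tags, gtnum, has_gtnum)
```

The last two lines show the difference between selecting an attribute and filtering an element by the presence of an attribute.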
  • 15. XML Query Language (XQuery)  XQuery is an attempt to provide a query language that provides the same breadth of functionality and underlying formalism as SQL does for relational databases.  XQuery is a functional language where each query is an expression.  XQuery has a sophisticated type system based on XML Schema datatypes and, unlike XPath, supports the manipulation of document nodes.  W3C is also working on XQueryX, an alternate version of XQuery that has the same semantics but uses an XML-based syntax.
  • 16. XQuery Expressions  path expressions  element constructors  FLWR (for, let, where, return) expressions  expressions involving operators and functions  conditional expressions  quantified expressions  expressions that test or modify datatypes
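Of the expression kinds listed, FLWR is the one that most resembles a SQL query: it iterates over nodes (for), binds intermediate values (let), filters (where) and builds a result (return). Python's standard library has no XQuery engine, so the following is only an analogy written in plain Python over hypothetical data. The comments give a rough XQuery-style reading of each step, not verified XQuery.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: a wrapper element holding two student records.
students = ET.fromstring(
    "<students>"
    '<gatech_student gtnum="gt000x">'
    "<name>George Burdell</name><age>21</age></gatech_student>"
    '<gatech_student gtnum="gt001y">'
    "<name>Ada Example</name><age>17</age></gatech_student>"
    "</students>"
)


def adult_names(root):
    """Imitate a FLWR expression with an ordinary loop."""
    results = []
    for student in root.findall("gatech_student"):  # for $s in the students
        age = int(student.findtext("age"))          # let $age := $s/age as a number
        if age >= 18:                               # where $age >= 18
            results.append(student.findtext("name"))  # return the name text
    return results


print(adult_names(students))
```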
  • 17. XML Usage Models I (Review)  Document-centric  Semi-structured documents  Irregular content  Human creation and/or consumption is primary aspect  Sample XHTML document
      <html xmlns="">
        <head>
          <title>Sample Web Page</title>
        </head>
        <body>
          <p> All XHTML documents must be well-formed and valid. </p>
          <img src="" height="50" width="25"/>
          <br />
          <br />
        </body>
      </html>
  • 18. XML Usage Models II (Review)  Data-centric  Structured  Appears in a regular order  Mechanical creation [and consumption].  XML usage is incidental  Sample SOAP message
      <SOAP-ENV:Envelope xmlns:SOAP-ENV=""
                         SOAP-ENV:encodingStyle="">
        <SOAP-ENV:Body>
          <m:GetLastTradePrice xmlns:m="Some-URI">
            <symbol>DIS</symbol>
          </m:GetLastTradePrice>
        </SOAP-ENV:Body>
      </SOAP-ENV:Envelope>
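Mechanical consumption of a data-centric message usually means a program pulls one field out of a known position. The sketch below does that for a message shaped like the SOAP sample. The namespace URIs are missing from the slide, so the SOAP 1.1 envelope namespace used here is an assumption, and the encodingStyle attribute is left out; "Some-URI" is taken from the sample as-is.

```python
import xml.etree.ElementTree as ET

# Assumption: the standard SOAP 1.1 envelope namespace; the slide omits it.
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"

message = f"""<SOAP-ENV:Envelope xmlns:SOAP-ENV="{SOAP_NS}">
  <SOAP-ENV:Body>
    <m:GetLastTradePrice xmlns:m="Some-URI">
      <symbol>DIS</symbol>
    </m:GetLastTradePrice>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>"""

# Prefixes used in the search path are local to this dict; only the URIs
# have to match the ones declared in the message.
ns = {"soap": SOAP_NS, "m": "Some-URI"}

envelope = ET.fromstring(message)
request = envelope.find("soap:Body/m:GetLastTradePrice", ns)
symbol = request.findtext("symbol")  # symbol is in no namespace

print(symbol)
```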
  • 19. XML Storage in a data-centric model  Stored in database (typically an RDBMS)  One may want to extract data from a database as XML, store XML into a database or both  Primary choices for retrieving data  Middleware components  XML-enabled databases
  • 20. Middleware components  Could be a full-blown application or an API.  Different strategies used  ADO – same API for XML & SQL access.  jxTransformer – custom queries used to specify how the results of a SQL query should be converted to XML.  DatabaseDOM - user created template file contains the SQL to XML mappings for the SQL query results.  DB2XML - default mapping of SQL results to XML data exists that cannot be altered by the user
  • 21. Sample jxTransformer Query
      SQL
        SELECT EmployeeID, FirstName, LastName, Title, HireDate, Salary
        FROM Employees e
        WHERE HireDate >= {d '2000-01-01'}
      jxTransformer custom query
        xml_document(
          xml_element('result',
            SELECT
              xml_element('Employees_Info',
                xml_attribute('ID', e.EmployeeID),
                xml_element('name',
                  xml_element('first', e.FirstName),
                  xml_element('last', e.LastName)
                ),
                xml_element('title', e.Title),
                xml_element('hiredate', e.HireDate),
                xml_element('salary', e.Salary)
              )
            FROM EMPLOYEES e
            WHERE e.HireDate >= {d '2000-01-01'}
          )
        )
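The middleware idea behind the custom query above, running SQL and wrapping each row in a fixed XML shape, can be sketched with Python's `sqlite3` and `xml.etree.ElementTree`. This is not jxTransformer; the function `employees_to_xml`, the in-memory table and its two rows are hypothetical and only mirror the element and attribute names used in the sample.

```python
import sqlite3
import xml.etree.ElementTree as ET


def employees_to_xml(conn: sqlite3.Connection, hired_since: str) -> str:
    """Run the SQL query and map each row to an Employees_Info element."""
    rows = conn.execute(
        "SELECT EmployeeID, FirstName, LastName, Title, HireDate, Salary "
        "FROM Employees WHERE HireDate >= ?",
        (hired_since,),
    )
    result = ET.Element("result")
    for emp_id, first, last, title, hire_date, salary in rows:
        info = ET.SubElement(result, "Employees_Info", ID=str(emp_id))
        name = ET.SubElement(info, "name")
        ET.SubElement(name, "first").text = first
        ET.SubElement(name, "last").text = last
        ET.SubElement(info, "title").text = title
        ET.SubElement(info, "hiredate").text = hire_date
        ET.SubElement(info, "salary").text = str(salary)
    return ET.tostring(result, encoding="unicode")


# Hypothetical data; dates are ISO strings so text comparison orders them.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees (EmployeeID INTEGER, FirstName TEXT, "
    "LastName TEXT, Title TEXT, HireDate TEXT, Salary INTEGER)"
)
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "Ann", "Lee", "Engineer", "2000-03-15", 50000),
        (2, "Bob", "Ray", "Analyst", "1999-07-01", 40000),
    ],
)

xml_out = employees_to_xml(conn, "2000-01-01")
print(xml_out)
```

Here the row-to-XML mapping is hard-coded in the function, whereas the tools on the previous slide let the user express it in a query or template (or fix it entirely, as DB2XML does).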
  • 23. XML-enabled databases  An XML-enabled database understands how to convert data to XML and back  Big 3 RDBMS vendors all have different XML strategies.
  • 24. XML and DB2  Uses DB2 Extender to add XML support  Can store an entire XML document and its DTD as a user-defined column of an xml type  XMLCLOB  XMLVARCHAR  XMLFile  Option to shred the document into multiple tables and columns also available.  XML data can be queried with syntax that is compliant with W3C XPath recommendation.  Updating of XML data is also possible using stored procedures.
  • 25. SAMPLE DB2 XML EXTENDER TABLE AND QUERY
      TABLE
        CREATE TABLE mail_user (
          user_name VARCHAR(20) NOT NULL PRIMARY KEY,
          passwd    VARCHAR(10),
          mailbox   XMLVARCHAR
        );
      QUERY
        SELECT user_name
        FROM mail_user
        WHERE extractVarchar(mailbox, '/Mailbox/Inbox/Email/Subject') LIKE '%XML%'
  • 26. XML and Oracle 9i  XML documents can be stored as whole documents in user-defined columns of type XMLType or CLOB/BLOB  Shredded documents can be reconstituted using the XML SQL Utility.  Querying XML possible via two means  Oracle Text (CONTAINS & WITHIN clauses in SQL) for BLOB or VARCHAR2 columns  XMLType columns can be queried via extract() and existsNode() functions which use XPath.  Relational views of XML data possible.
  • 27. SAMPLE ORACLE 9i TABLE AND QUERY
      TABLE
        CREATE TABLE mail_user(
          user_name VARCHAR2(20),
          passwd    VARCHAR2(10),
          mailbox   SYS.XMLTYPE
        );
      QUERY
        SELECT user_name
        FROM mail_user m
        WHERE m.mailbox.extract('/Mailbox/Inbox/Email/Subject/text()').getStringVal() LIKE '%XML%'
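The DB2 and Oracle queries above share one pattern: a function extracts a value from a stored XML document by path, and SQL filters on the result. The sketch below imitates that pattern with SQLite, which has no XML column type, by registering a Python function as a SQL function. The name `extract_text` and the sample rows are hypothetical, and the path is ElementTree's relative form rather than a full XPath.

```python
import sqlite3
import xml.etree.ElementTree as ET


def extract_text(xml_text, path):
    """Return the text of the first node matching path below the root."""
    if xml_text is None:
        return None
    node = ET.fromstring(xml_text).find(path)
    return None if node is None else node.text


conn = sqlite3.connect(":memory:")
# Expose the Python function to SQL as a two-argument scalar function.
conn.create_function("extract_text", 2, extract_text)
conn.execute(
    "CREATE TABLE mail_user "
    "(user_name TEXT PRIMARY KEY, passwd TEXT, mailbox TEXT)"
)
conn.executemany(
    "INSERT INTO mail_user VALUES (?, ?, ?)",
    [
        ("alice", "pw1",
         "<Mailbox><Inbox><Email><Subject>Intro to XML</Subject>"
         "</Email></Inbox></Mailbox>"),
        ("bob", "pw2",
         "<Mailbox><Inbox><Email><Subject>Lunch?</Subject>"
         "</Email></Inbox></Mailbox>"),
    ],
)

# The path is relative to the root <Mailbox> element, so it starts at Inbox.
matches = [
    row[0]
    for row in conn.execute(
        "SELECT user_name FROM mail_user "
        "WHERE extract_text(mailbox, 'Inbox/Email/Subject') LIKE '%XML%' "
        "ORDER BY user_name"
    )
]
print(matches)
```

Because the document is re-parsed for every row, this approach cannot use an index, which is one reason XML-enabled databases offer shredding and dedicated XML column types.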
  • 28. XML and SQL Server  XML can be retrieved from relational rows using FOR XML clause in SQL  RAW  AUTO  EXPLICIT  XML views of relational data possible.  Specified using XSD files for mapping  Queried using XPath
  • 29. XML Storage in a document-centric model  Stored in a content management system  A content management system typically consists of a repository that stores a variety of XML documents, an editor and an engine that provides one or more of the following features  version, revision and access control  ability to reuse documents in different formats  collaboration  web publishing facilities  support for a variety of text editors (e.g. Microsoft Word, Adobe FrameMaker, etc)  indexing and search capabilities
  • 30. XML Storage in a Hybrid Model  Where both data-centric and document-centric models are in use, the best choice is a native XML database.  A native XML database is a database that has an XML document as its fundamental (logical) unit of storage and defines a (logical) model for an XML document, as opposed to the data in that document, and stores and retrieves documents according to that model.  At a minimum, the model must include elements, attributes, PCDATA, and document order.
  • 31. Tamino – a commercial native XML database  Created by Software AG.  Features  Storage & retrieval of XML documents  Storage & retrieval of relational data  Interfacing with external applications and data sources.  Transactional (ACID properties)  Querying via X-Query (based on XPath NOT XQuery)  Indexing  GUI tools  Web based administration  Schema editor  Interactive query interface
  • 32. Tamino Schemas  Schemas in Tamino are DTD-based and are mainly used as a way to describe how the XML data should be indexed  Document storage choices  Specify a pre-existing DTD which is then converted to a Tamino schema  Store a well-formed XML document without a schema, which means that default indexing ensues  Create a schema from scratch for the XML document being stored  Schemas are also used as a way to specify data types, which is important for type based operations during querying (e.g. numeric ops)
  • 33. Tamino and SQL  Tamino ships with a SQL engine  Schemas can be used to create mappings from SQL to XML  XML can be retrieved from RDBMS sources either internal (SQL engine) or external  Schemas can also be used to represent joins across different document types (which could mean different data sources)
  • 34. Tamino programming support  APIs available for accessing XML store from both Java and Microsoft's JScript  C programmers can interact with the SQL engine using the SQL precompiler  ODBC, OLE DB and JDBC clients can communicate with SQL Engine  X-Tensions framework allows developers to extend the functionality of Tamino by using C++ COM objects or Java objects
  • 35. dbXML – an Open Source native XML database  Created by the dbXML group.  Lightweight and modular  Can easily be embedded in applications  XML Documents arranged in hierarchical filesystem-like manner.  Querying via XPath.  Indexing support but no transactions or schemas.  Command line administration tools
  • 36. dbXML programming support  Written in Java and implements the XML:DB Initiative's XML Database API.  Exposes CORBA API to enable access from any language with CORBA bindings.
  • 37. Conclusion  Paper on this topic:   Missed Opportunities  eXcelon  Questions???