  1. 1. XML + Databases = ? (DIMACS Workshop, 3/2000) Mike Carey Exploratory Database Systems Department IBM Almaden Research Center [email_address]
  2. 2. Plan for Today’s Talk <ul><li>Thoughts on DB and web technologies </li></ul><ul><ul><li>The web and web “querying” </li></ul></ul><ul><ul><li>Semistructured databases </li></ul></ul><ul><ul><li>Object-relational databases </li></ul></ul><ul><ul><li>XML and databases </li></ul></ul><ul><li>XML/DB research at IBM Almaden </li></ul><ul><ul><li>The XPERANTO project </li></ul></ul><ul><ul><ul><li>Motivation and approach </li></ul></ul></ul><ul><ul><ul><li>Whirlwind tour of the system </li></ul></ul></ul>
  3. 3. The Web is Great at Supporting URL-Based Sharing <ul><li>Ex: Online conference proceedings </li></ul><ul><li>Web browsers have given us </li></ul><ul><ul><li>Universal file access (ftp++) </li></ul></ul><ul><ul><li>Universal document access (html) </li></ul></ul><ul><ul><li>Universal service access (forms) </li></ul></ul><ul><li>What more could we navigational couch potatoes possibly want? </li></ul><ul><ul><li>Universal platform for e-shopping! </li></ul></ul>
  4. 4. The Web is Lousy at Supporting Parametric Searches <ul><li>Ex: Find all the used Musicman Sterling bass guitars currently available for under $750 within a 50-mile radius of my San Jose home </li></ul><ul><li>This is hard for a number of reasons </li></ul><ul><ul><li>Data buried in web pages, news groups, classified ads, store sites, auction sites, … </li></ul></ul><ul><ul><li>No schema (no metal fish, please!) </li></ul></ul><ul><ul><li>No data types (miles, US$, instruments) </li></ul></ul><ul><ul><li>No regularity within/across (good!) sites </li></ul></ul>
  5. 5. Aren’t We Supposed to be the Experts on Data Management? <ul><li>The DB community brought the world </li></ul><ul><ul><li>Data models, schemas, and views </li></ul></ul><ul><ul><li>Query languages, optimizers, fast joins </li></ul></ul><ul><ul><li>Scalable parallel servers </li></ul></ul><ul><ul><li>Federated database systems </li></ul></ul><ul><li>What do we have in our bag of tricks? </li></ul><ul><ul><li>Semistructured databases </li></ul></ul><ul><ul><li>Object-relational database systems </li></ul></ul>
  6. 6. Is Semistructured Database Technology the Answer? <ul><li>Database characteristics </li></ul><ul><ul><li>Collections of [name, value] pairs or maybe [name, type, value] triples </li></ul></ul><ul><ul><li>Collections typically set<any> or list<any> </li></ul></ul><ul><li>System characteristics </li></ul><ul><ul><li>“Typeloose” query languages </li></ul></ul><ul><ul><li>Indexes for nested, typeloose structures </li></ul></ul><ul><ul><li>Appropriate query processing techniques </li></ul></ul>
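A minimal sketch of the data model described on this slide, assuming nothing beyond nested collections of [name, value] pairs. The sample listings and the `find` helper are hypothetical illustrations, not part of any system named in the talk. The query is "typeloose": it matches on names only and returns whatever values it finds, of whatever type.

```python
from typing import Any, Iterator

# A semistructured "database": collections of (name, value) pairs whose values
# may be atoms or further collections (list<any>). No schema is declared.
# All sample data below is hypothetical.
listings = [
    [("instrument", "bass"), ("model", "Sterling"), ("price", 700)],
    [("instrument", "bass"), ("model", "Sterling"), ("price", "$725 obo"),
     ("seller", [("city", "San Jose"), ("phone", "555-0100")])],
    [("item", "amp"), ("cost", 300)],
]

def find(node: Any, name: str) -> Iterator[Any]:
    """Yield every value stored under `name` at any nesting depth."""
    if isinstance(node, tuple) and len(node) == 2 and isinstance(node[0], str):
        key, value = node
        if key == name:
            yield value
        # Keep descending: the value may itself be a nested collection.
        yield from find(value, name)
    elif isinstance(node, list):
        for child in node:
            yield from find(child, name)

prices = list(find(listings, "price"))
cities = list(find(listings, "city"))
```

Here `prices` mixes an integer with a free-text string, so a range predicate such as "under $750" has no reliable meaning without type information.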
  7. 7. Are Semistructured Databases the Answer? (2) <ul><li>No, because schemas are critical for </li></ul><ul><ul><li>Data readers </li></ul></ul><ul><ul><ul><li>What info is in a given collection? </li></ul></ul></ul><ul><ul><ul><li>Thus, what queries might make sense? </li></ul></ul></ul><ul><ul><li>Data writers </li></ul></ul><ul><ul><ul><li>What should I call this piece of info? </li></ul></ul></ul><ul><ul><ul><li>Is it okay to put this kind of data here? </li></ul></ul></ul><ul><ul><li>Efficient/effective query processors </li></ul></ul><ul><ul><ul><li>Indexing, statistics, ... (e.g., range queries) </li></ul></ul></ul><ul><ul><ul><li>Integration mappings (e.g., unit conversions) </li></ul></ul></ul>
  8. 8. Are Semistructured Databases the Answer? (3) <ul><li>They have some nice features, though </li></ul><ul><ul><li>Flexible, dynamic schemas </li></ul></ul><ul><ul><ul><li>Forgiving w.r.t. variations and exceptions </li></ul></ul></ul><ul><ul><ul><li>Schema evolution is not a big deal </li></ul></ul></ul><ul><ul><li>Richer data modeling (vs. relational) </li></ul></ul><ul><ul><ul><li>Nested structures, ordered collections </li></ul></ul></ul><ul><ul><li>More powerful query languages </li></ul></ul><ul><ul><ul><li>Blurring of schema and data querying </li></ul></ul></ul><ul><ul><ul><li>Ordering, nesting, restructuring handled </li></ul></ul></ul>
  9. 9. Is Object-Relational Database Technology the Answer? <ul><li>Database characteristics </li></ul><ul><ul><li>Base types, user-defined structured types, inheritance, reference types, collections </li></ul></ul><ul><ul><li>Collections are well-typed </li></ul></ul><ul><li>System characteristics </li></ul><ul><ul><li>Extended SQL-based query languages </li></ul></ul><ul><ul><li>Support for methods (fenced/unfenced) </li></ul></ul><ul><ul><li>Also triggers, LOBs, extensible indexes </li></ul></ul>
  10. 10. Are Object-Relational Databases the Answer? (2) <ul><li>No, because most O-R DBMSs have </li></ul><ul><ul><li>Overly rigid schemas </li></ul></ul><ul><ul><ul><li>Every instance is of one (known) type </li></ul></ul></ul><ul><ul><ul><li>Evolving a type can be a major burden </li></ul></ul></ul><ul><ul><ul><li>Distributed type management is hard </li></ul></ul></ul><ul><ul><li>Crufty old storage managers </li></ul></ul><ul><ul><ul><li>Ragged or sparse records poorly supported </li></ul></ul></ul><ul><ul><li>Insufficient power in extended SQL </li></ul></ul><ul><ul><ul><li>Prehistoric assumptions get in the way </li></ul></ul></ul><ul><ul><ul><li>Weak on restructuring, schema-querying </li></ul></ul></ul>
  11. 11. Is XML the Answer? (Yes!! ...What Was the Question Again?) <ul><li>Structured documents (for the web) </li></ul><book> <booktitle> Tables Are The Answer </booktitle> <author id="cdate"> <name> <firstname> Chris </firstname> <lastname> Date </lastname> </name> <address> <city> Saratoga </city> <state> CA </state> </address> </author> </book>
  12. 12. Is XML the Answer? (2) <ul><li>W3C’s XML Schema working group </li></ul><ul><ul><li>Typed elements, attributes, documents </li></ul></ul><ul><ul><li>Simple types and complex types </li></ul></ul><ul><ul><li>Derived types (extension, restriction) </li></ul></ul><ul><ul><li>Facets, anonymous types, groups, … </li></ul></ul><ul><ul><li>Uniqueness, keys and key references </li></ul></ul><ul><li>W3C’s XML Query working group </li></ul><ul><ul><li>XML-QL, XPath, XQL, XSLT, XSQL, … </li></ul></ul><ul><ul><li>Recommendation due in late 2000 (?) </li></ul></ul>
  13. 13. Is XML the Answer? (3) <ul><li>XML Schema might help because </li></ul><ul><ul><li>XML has achieved a huge mindshare for data interchange on the web </li></ul></ul><ul><ul><li>DTD standardization is happening for documents within vertical industries, and XML Schemas should take over </li></ul></ul><ul><ul><li>When finished, XML Schema should be a widely used schema description tool </li></ul></ul><ul><ul><ul><li>Similar to O-R schemas, but with more flexibility (and web-based sex appeal) </li></ul></ul></ul>
  14. 14. Some Useful XML+DB Topics <ul><li>Publish documents with XML Schemas from O-R databases </li></ul><ul><ul><li>B2B e-commerce messages </li></ul></ul><ul><ul><li>B2C comparison shopping (if permitted!) </li></ul></ul><ul><ul><li>Robust O-R DB-resident web sites with XML for page content generation </li></ul></ul><ul><li>Use XML Schema as the central data model for data integration middleware </li></ul><ul><ul><li>I.e., web information integration </li></ul></ul>
  15. 15. Useful XML+DB Topics (2) <ul><li>Build a “native” XML Repository on top of an O-R DBMS </li></ul><ul><ul><li>Map from XML Schema model to O-R DBMS modeling constructs </li></ul></ul><ul><ul><li>Map from XML queries to O-R queries (including tag variables and loose typing) </li></ul></ul><ul><ul><li>Thereby provide XML document storage management with industrial-strength robustness, scalability, and performance </li></ul></ul>
  16. 16. Useful XML+DB Topics (3) <ul><li>Evolve XML-QL into a complete web data manipulation language </li></ul><ul><ul><li>Typing a la XML Schema </li></ul></ul><ul><ul><li>Ordered/unordered collections </li></ul></ul><ul><ul><li>XPath-inspired expressions </li></ul></ul><ul><ul><li>Easier grouping and aggregation </li></ul></ul><ul><ul><li>Updates (insert/delete, modify) </li></ul></ul><ul><ul><li>Etc. </li></ul></ul>
  17. 17. The XPERANTO Project <ul><li>Middleware for publishing O-R (or plain relational) DB content on the web </li></ul><ul><ul><li>Provides a virtual XML document view </li></ul></ul><ul><ul><li>Based on a “pure XML” approach </li></ul></ul><ul><ul><li>Using XML-QL (as W3C placeholder) </li></ul></ul><ul><li>Born at Almaden in summer of 1999 </li></ul><ul><ul><li>Mike Carey, Dana Florescu, Zack Ives, Ying Lu, Jai Shanmugasundaram, Beau Shekita, Subbu Subramanian </li></ul></ul>
  18. 18. The XPERANTO Belief System <ul><li>Databases contain, and will continue to contain, the world’s “data jewels” </li></ul><ul><ul><li>Transactional data (RDBMS) </li></ul></ul><ul><ul><li>Important multimedia assets (ORDBMS) </li></ul></ul><ul><li>XML application developers of the future may not love SQL like we do </li></ul><ul><ul><li>View databases as default XML documents </li></ul></ul><ul><ul><li>Let them define appropriate (query-able) views of these XML documents </li></ul></ul>
  19. 19. XPERANTO Architecture [Architecture diagram. Labeled components: Query Translation (XML-QL Parser, Query Rewrite, SQL Translation, with XQGM passed between stages); XML Tagger; Metadata Services (View Services, Type & Table Services, XML Schema Generator); O-R Database (SQL Query Processor, Stored Tables, System Catalog). Labeled flows: Views, XML Schema, Catalog Info, Table & Type Info, SQL Queries, Data Tuples.]
  20. 20. XPERANTO Components <ul><li>XML-QL Parser </li></ul><ul><ul><li>Neutral query representation ( XQGM ) </li></ul></ul><ul><li>Query Rewrite </li></ul><ul><ul><li>View composition and other rewrites </li></ul></ul><ul><li>SQL Translation </li></ul><ul><ul><li>Produce SQL query(s) to get the required data from the underlying DBMS </li></ul></ul><ul><li>XML Tagger </li></ul><ul><ul><li>Tag and structure the tabular results </li></ul></ul>
  21. 21. XPERANTO Components (2) <ul><li>View Services </li></ul><ul><ul><li>Repository for XML view definitions </li></ul></ul><ul><li>Type & Table Services </li></ul><ul><ul><li>Interface (and cache) for DB catalog info </li></ul></ul><ul><li>XML Schema Generator </li></ul><ul><ul><li>Give DB catalog info in XML Schema form for default views </li></ul></ul><ul><ul><li>Infer XML Schema info for queries and non-default view definitions </li></ul></ul>
  22. 22. Consider a Simple O-R Schema CREATE TABLE book (bookID CHAR(30), name VARCHAR(255), publisher VARCHAR(30)) CREATE TABLE publisher (name VARCHAR(30), address VARCHAR(255)) CREATE TYPE author_type AS (bookID CHAR(30), first VARCHAR(30), last VARCHAR(30)) CREATE TABLE author OF author_type (REF IS ssn USER GENERATED)
  23. 23. Part of the Default XML View <simpleType name="string255" source="string"> <maxLength value="255" /> </simpleType> <simpleType name="string30" source="string"> <maxLength value="30" /> </simpleType> <complexType name="bookTupleType"> <element name="bookID" type="string30" /> <element name="name" type="string255" /> <element name="publisher" type="string30" /> </complexType> <complexType name="bookSetType"> <element name="bookTuple" type="bookTupleType" maxOccurs="*" /> </complexType> <element name="book" type="bookSetType" /> . . .
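The schema fragment on this slide follows a regular pattern, so it could plausibly be produced from catalog information alone. The sketch below is an assumption about how such a generator might look, not XPERANTO's actual code: `default_schema` is a hypothetical helper, it handles only CHAR/VARCHAR columns, and it emits the same draft-era XML Schema syntax (`source=`, `maxOccurs="*"`) that the slide uses.

```python
from typing import List, Tuple

def default_schema(table: str, columns: List[Tuple[str, int]]) -> str:
    """Render draft-style XML Schema text for one table of string columns.

    `columns` holds (column name, maximum length) pairs; both the function
    and its input format are hypothetical.
    """
    lines = []
    # One named simple type per distinct column length, e.g. string255.
    for length in sorted({size for _, size in columns}, reverse=True):
        lines.append(f'<simpleType name="string{length}" source="string">')
        lines.append(f'  <maxLength value="{length}" />')
        lines.append("</simpleType>")
    # One complex type describing a single row of the table.
    lines.append(f'<complexType name="{table}TupleType">')
    for name, size in columns:
        lines.append(f'  <element name="{name}" type="string{size}" />')
    lines.append("</complexType>")
    # The table itself is an unbounded collection of row elements.
    lines.append(f'<complexType name="{table}SetType">')
    lines.append(
        f'  <element name="{table}Tuple" type="{table}TupleType" maxOccurs="*" />'
    )
    lines.append("</complexType>")
    lines.append(f'<element name="{table}" type="{table}SetType" />')
    return "\n".join(lines)

book_schema = default_schema(
    "book", [("bookID", 30), ("name", 255), ("publisher", 30)]
)
```

Calling it with the `book` columns from the O-R schema slide yields the same type names as the slide (`string255`, `string30`, `bookTupleType`, `bookSetType`).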
  24. 24. XPERANTO’s Default Views <ul><li>XPERANTO generates default O-R to XML Schema mappings </li></ul><ul><ul><li>Each DB shown as an XML file </li></ul></ul><ul><ul><li>Subtyping handled via XML Schema’s refinement facilities </li></ul></ul><ul><ul><li>OIDs and references become ids/idrefs </li></ul></ul><ul><li>“Don’t use this at home!” </li></ul><ul><ul><li>Application developers are expected to define the real view(s) using XML-QL </li></ul></ul>
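To make "each DB shown as an XML file" concrete, here is a small sketch of what a default view of the data might look like. It assumes the nesting implied by the paths used elsewhere in the talk (`library.book.bookTuple`): one root element per database, one child per table, one `<table>Tuple` element per row. The `default_view` function and the sample rows are hypothetical.

```python
import xml.etree.ElementTree as ET

def default_view(db_name, tables):
    """Build the default XML view of a database.

    `tables` maps a table name to (column names, rows). NULL column values
    (None) are simply omitted from the output; that choice is an assumption.
    """
    root = ET.Element(db_name)
    for table, (columns, rows) in tables.items():
        table_el = ET.SubElement(root, table)
        for row in rows:
            tuple_el = ET.SubElement(table_el, f"{table}Tuple")
            for column, value in zip(columns, row):
                if value is not None:
                    ET.SubElement(tuple_el, column).text = str(value)
    return root

# Hypothetical sample content for the book/publisher tables.
library = default_view("library", {
    "book": (["bookID", "name", "publisher"],
             [("b1", "Query Processing", "Kluwer")]),
    "publisher": (["name", "address"],
                  [("Kluwer", None)]),
})
xml_text = ET.tostring(library, encoding="unicode")
```

A view query can then address rows by path, for example `library.find("book/bookTuple/name")`.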
  25. 25. Creating a Better XML View WHERE <library.book.bookTuple> <bookID> $bid </> <name> $bname </> <publisher> $bpub </> </> IN "db2:xml:books/library", $bpub = "Kluwer" CONSTRUCT <book id=$bid> <name> $bname </> {WHERE <library.publisher.publisherTuple> <name> $bpub </> <address> $addr </> </> IN "db2:xml:books/library" CONSTRUCT <publisher> <address> $addr </> </>} {WHERE <library.author.authorTuple> <bookID> $bid </> <first> $fname </> <last> $lname </> </> IN "db2:xml:books/library" CONSTRUCT <author first=$fname last=$lname/>} </> . . .
  26. 26. XPERANTO Query Rewrite <ul><li>XML-QL queries first translated into XQGM representation </li></ul><ul><ul><li>Neutral, well-poised for more features </li></ul></ul><ul><ul><li>Easier to go from XML-QL to SQL </li></ul></ul><ul><ul><li>Borrow rewrites from DB2 UDB engine </li></ul></ul><ul><li>XQGM is an extension of DB2’s QGM </li></ul><ul><ul><li>XML data type for “columns” </li></ul></ul><ul><ul><li>Set of XML-specific functions </li></ul></ul>
  27. 27. SQL Generation and XML Document Tagging/Structuring <ul><li>Sorted Outer Union queries are used to obtain the data </li></ul><ul><ul><li>Fetch the data in one query that brings it back in the appropriate order </li></ul></ul><ul><ul><li>Tag and nest it to create XML document </li></ul></ul><ul><li>Advantages of this approach </li></ul><ul><ul><li>Shown to be stable as well as fast </li></ul></ul><ul><ul><li>Simple (linear-space) tagging possible </li></ul></ul><ul><ul><ul><li>Just watch for nesting-related changes </li></ul></ul></ul>
  28. 28. Outer Union Query Example WITH OuterUnion (type, bookID, bookName, pubName, pubAddr, authFirst, authLast) AS ( SELECT '0', b.bookID, b.name, NULL, NULL, NULL, NULL FROM book b WHERE b.publisher = 'Kluwer' UNION ALL SELECT '1', b.bookID, NULL, p.name, p.address, NULL, NULL FROM book b, publisher p WHERE b.publisher = 'Kluwer' AND b.publisher = p.name UNION ALL SELECT '2', b.bookID, NULL, NULL, NULL, a.first, a.last FROM book b, author a WHERE b.publisher = 'Kluwer' AND b.bookID = a.bookID ) SELECT * FROM OuterUnion ORDER BY bookID
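The following sketch runs an outer-union query of this shape against an in-memory SQLite database and then tags the sorted rows in one pass. It is an illustration under stated assumptions, not XPERANTO's implementation: SQLite stands in for DB2, the sample rows are hypothetical, the author columns are renamed `first_name`/`last_name` to avoid keyword clashes, and `type` is added to the `ORDER BY` so that each book row is guaranteed to precede its publisher and author rows.

```python
import sqlite3
from xml.sax.saxutils import escape, quoteattr

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE book (bookID TEXT, name TEXT, publisher TEXT);
CREATE TABLE publisher (name TEXT, address TEXT);
CREATE TABLE author (bookID TEXT, first_name TEXT, last_name TEXT);
-- Hypothetical sample data.
INSERT INTO book VALUES ('b1', 'Query Processing', 'Kluwer'),
                        ('b2', 'Other Book', 'Acme');
INSERT INTO publisher VALUES ('Kluwer', 'Boston'), ('Acme', 'Nowhere');
INSERT INTO author VALUES ('b1', 'Ann', 'Lee'), ('b1', 'Bob', 'Ray'),
                          ('b2', 'Cy', 'Doe');
""")

OUTER_UNION = """
WITH OuterUnion (type, bookID, bookName, pubName, pubAddr, authFirst, authLast) AS (
  SELECT '0', b.bookID, b.name, NULL, NULL, NULL, NULL
  FROM book b WHERE b.publisher = 'Kluwer'
  UNION ALL
  SELECT '1', b.bookID, NULL, p.name, p.address, NULL, NULL
  FROM book b, publisher p
  WHERE b.publisher = 'Kluwer' AND b.publisher = p.name
  UNION ALL
  SELECT '2', b.bookID, NULL, NULL, NULL, a.first_name, a.last_name
  FROM book b, author a
  WHERE b.publisher = 'Kluwer' AND b.bookID = a.bookID
)
SELECT * FROM OuterUnion ORDER BY bookID, type
"""

def tag(rows):
    """Turn sorted outer-union rows into nested XML in a single pass."""
    out, book_open = [], False
    for kind, book_id, book_name, _pub_name, pub_addr, first, last in rows:
        if kind == "0":
            # A type-0 row starts a new book; close the previous one first.
            if book_open:
                out.append("</book>")
            out.append(f"<book id={quoteattr(book_id)}>")
            out.append(f"<name>{escape(book_name)}</name>")
            book_open = True
        elif kind == "1":
            out.append(
                f"<publisher><address>{escape(pub_addr)}</address></publisher>"
            )
        elif kind == "2":
            out.append(
                f"<author first={quoteattr(first)} last={quoteattr(last)}/>"
            )
    if book_open:
        out.append("</book>")
    return "".join(out)

document = tag(conn.execute(OUTER_UNION))
```

The tagger keeps only one flag of state between rows, which is what the sort order buys: nesting changes show up as a change in the leading columns of the next row.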
  29. 29. XPERANTO Project Summary <ul><li>Goal is to publish O-R data in XML form </li></ul><ul><ul><li>Default XML views </li></ul></ul><ul><ul><li>XML-QL for defining useful views </li></ul></ul><ul><ul><li>“Look Ma, no SQL!” </li></ul></ul><ul><li>Currently (re)building our prototype </li></ul><ul><ul><li>View composition is our first stop </li></ul></ul><ul><ul><li>Updates in addition to queries </li></ul></ul><ul><ul><li>Queries over both data and metadata </li></ul></ul><ul><ul><li>Other needs for XML web sites...? </li></ul></ul>
  30. 30. A Few Closing Remarks <ul><li>DB community must ensure that the web will support real queries…! </li></ul><ul><ul><li>XML Schema and XML Query standards need ongoing input from DB researchers </li></ul></ul><ul><ul><li>Large-scale technologies needed for XML indexing, caching, querying, etc. </li></ul></ul><ul><li>DB community should also work on important underlying technologies </li></ul><ul><ul><li>Publishing XML both from and to RDBMSs and ORDBMSs, for example! </li></ul></ul>