Temporal Information and XML
Carlo Zaniolo
Department of Computer Science, University of California, Los Angeles
A Short History of Time in Databases
Relational model: between 33 and 48 temporal DB proposals counted:
A struggle to get around the limitations of relational (flat) tables and a rigid query language (SQL)
A key issue: temporal interval coalescing is needed after each projection!
Clustering, indexing, and query optimization for temporal information add to the complexity
Time-stamping individual tuples: to obtain the salary history, we must project onto the salary and coalesce the last three tuples into one:
empno  name  salary  title        deptno  start       end
10003  Bob   60000   Engineer     d01     1995-01-01  1995-05-31
10003  Bob   70000   Engineer     d01     1995-06-01  1995-09-30
10003  Bob   70000   Sr Engineer  d02     1995-10-01  1996-01-31
10003  Bob   70000   Tech Leader  d02     1996-02-01  1996-12-31

Coalesced salary history:
empno  name  salary  start       end
10003  Bob   60000   1995-01-01  1995-05-31
10003  Bob   70000   1995-06-01  1996-12-31
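The projection-then-coalesce step can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular temporal DBMS: it assumes periods are closed day-granularity intervals, so two periods are adjacent when one starts the day after the other ends, and the helper names `project` and `coalesce` are hypothetical.

```python
from datetime import date, timedelta

# Time-stamped tuples: (empno, name, salary, title, deptno, start, end).
# Periods are assumed to be closed intervals [start, end] at day granularity.
EMPLOYEE = [
    (10003, "Bob", 60000, "Engineer", "d01", date(1995, 1, 1), date(1995, 5, 31)),
    (10003, "Bob", 70000, "Engineer", "d01", date(1995, 6, 1), date(1995, 9, 30)),
    (10003, "Bob", 70000, "Sr Engineer", "d02", date(1995, 10, 1), date(1996, 1, 31)),
    (10003, "Bob", 70000, "Tech Leader", "d02", date(1996, 2, 1), date(1996, 12, 31)),
]


def project(rows, cols):
    """Keep the columns at positions `cols`, plus the trailing (start, end) period."""
    return [tuple(r[c] for c in cols) + (r[-2], r[-1]) for r in rows]


def coalesce(rows):
    """Merge tuples with equal non-temporal values whose periods overlap or are adjacent."""
    result = []
    # Sorting by (values, start) places mergeable tuples next to each other.
    for row in sorted(rows):
        values, start, end = row[:-2], row[-2], row[-1]
        if result and result[-1][:-2] == values and start <= result[-1][-1] + timedelta(days=1):
            prev_start, prev_end = result[-1][-2], result[-1][-1]
            result[-1] = values + (prev_start, max(prev_end, end))
        else:
            result.append(values + (start, end))
    return result


# Salary history: project onto (empno, name, salary), then coalesce.
salary_history = coalesce(project(EMPLOYEE, (0, 1, 2)))
for row in salary_history:
    print(row)
```

Without the `coalesce` call, the projection returns three separate 70000 tuples, which is exactly the problem the slide points out.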
XML: hierarchical views with temporal groups
Temporally grouped models are more natural and powerful, but they do not fit the flat relational model
XML query languages can easily express temporal queries on these views.
Time-stamped tuples in relations:
empno  name  salary  title        deptno  start       end
10003  Bob   60000   Engineer     d01     1995-01-01  1995-05-31
10003  Bob   70000   Engineer     d01     1995-06-01  1995-09-30
10003  Bob   70000   Sr Engineer  d02     1995-10-01  1996-01-31
10003  Bob   70000   Tech Leader  d02     1996-02-01  1996-12-31

Temporally grouped time-stamped attribute values:
name    Bob          1995-01-01:1996-12-31
empno   10003        1995-01-01:1996-12-31
salary  60000        1995-01-01:1995-05-31
        70000        1995-06-01:1996-12-31
title   Engineer     1995-01-01:1995-09-30
        Sr Engineer  1995-10-01:1996-01-31
        Tech Leader  1996-02-01:1996-12-31
deptno  d01          1995-01-01:1995-09-30
        d02          1995-10-01:1996-12-31
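The temporally grouped view can be derived from the time-stamped tuples by projecting onto each attribute separately and coalescing that attribute's periods. The Python sketch below assumes a single employee (so no grouping by key is needed) and closed day-granularity periods; `group_temporally` is a hypothetical helper, not part of the system described here.

```python
from datetime import date, timedelta

COLUMNS = ("empno", "name", "salary", "title", "deptno")
# Time-stamped tuples for one employee: the COLUMNS values, then (start, end).
EMPLOYEE = [
    (10003, "Bob", 60000, "Engineer", "d01", date(1995, 1, 1), date(1995, 5, 31)),
    (10003, "Bob", 70000, "Engineer", "d01", date(1995, 6, 1), date(1995, 9, 30)),
    (10003, "Bob", 70000, "Sr Engineer", "d02", date(1995, 10, 1), date(1996, 1, 31)),
    (10003, "Bob", 70000, "Tech Leader", "d02", date(1996, 2, 1), date(1996, 12, 31)),
]


def group_temporally(rows, columns):
    """Map each attribute to its coalesced history: a list of (value, start, end)."""
    grouped = {}
    for i, attr in enumerate(columns):
        periods = []
        # Project onto a single attribute and walk its values in time order.
        for start, end, value in sorted(((r[-2], r[-1], r[i]) for r in rows), key=lambda t: t[0]):
            if periods and periods[-1][0] == value and start <= periods[-1][2] + timedelta(days=1):
                # Same value continuing without a gap: extend the current period.
                periods[-1] = (value, periods[-1][1], max(periods[-1][2], end))
            else:
                periods.append((value, start, end))
        grouped[attr] = periods
    return grouped


for attr, periods in group_temporally(EMPLOYEE, COLUMNS).items():
    print(attr, [(v, s.isoformat(), e.isoformat()) for v, s, e in periods])
```

Each attribute is coalesced on its own, which is why the salary needs two periods while the title needs three.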
Historical XML Database Architecture: Two Approaches
Native XML databases
Historical data are stored in a native XML database
XML queries can be specified directly upon the database
Native XML databases: Tamino (Software AG), eXcelon (XIS)
Relational databases: the historical view is decomposed into binary tables stored in a relational database
Historical data can then be published as XML documents through SQL/XML publishing functions, or queried as XML views through middleware
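In the relational approach, publishing amounts to nesting each attribute's history under its employee element. The sketch below uses Python's `xml.etree.ElementTree` in place of SQL/XML publishing functions (it does not call them) and assumes the attribute histories have already been read from the history tables into a dictionary; `HISTORY` and `publish_employee` are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: one employee's attribute histories, as (value, tstart, tend)
# triples read from the attribute history tables, with dates as ISO strings.
HISTORY = {
    "empno": [("10003", "1995-01-01", "1996-12-31")],
    "name": [("Bob", "1995-01-01", "1996-12-31")],
    "salary": [("60000", "1995-01-01", "1995-05-31"), ("70000", "1995-06-01", "1996-12-31")],
}


def publish_employee(history, tstart, tend):
    """Build an <employee> element with one time-stamped child per attribute period."""
    emp = ET.Element("employee", tstart=tstart, tend=tend)
    for attr, periods in history.items():
        for value, start, end in periods:
            child = ET.SubElement(emp, attr, tstart=start, tend=end)
            child.text = value
    return emp


root = ET.Element("employees", tstart="1995-01-01", tend="1996-12-31")
root.append(publish_employee(HISTORY, "1995-01-01", "1996-12-31"))
print(ET.tostring(root, encoding="unicode"))
```

Because every history table already holds coalesced periods, publishing needs no further temporal processing: each row becomes one time-stamped element.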
Historical XML Views: Architecture
[Figure: Current Database (relational data: current content); Active Rules/update logs; Historical Database (historical data); XML Views; Temporal Queries; SQL Queries]
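The archiving step in this architecture (active rules or update logs carrying changes from the current database into the history) can be sketched as follows. This is a minimal sketch under stated assumptions, not the system's implementation: a Python function plays the role of the active rule, the history table is assumed to have the shape employee_sal(empno, sal, tstart, tend), and the sentinel tend = '9999-12-31' for "still current" is a hypothetical convention.

```python
import sqlite3
from datetime import date, timedelta

NOW = "9999-12-31"  # assumed marker for "still current"; not specified on the slides

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee_sal (empno INTEGER, sal INTEGER, tstart TEXT, tend TEXT)")


def set_salary(conn, empno, sal, effective):
    """Archive a salary change that takes effect on `effective` (ISO date string)."""
    current = conn.execute(
        "SELECT sal FROM employee_sal WHERE empno = ? AND tend = ?", (empno, NOW)
    ).fetchone()
    if current is not None and current[0] == sal:
        return  # value unchanged: keep the open period, so the history stays coalesced
    day_before = (date.fromisoformat(effective) - timedelta(days=1)).isoformat()
    with conn:  # one transaction: close the open period, then open a new one
        conn.execute(
            "UPDATE employee_sal SET tend = ? WHERE empno = ? AND tend = ?",
            (day_before, empno, NOW),
        )
        conn.execute("INSERT INTO employee_sal VALUES (?, ?, ?, ?)", (empno, sal, effective, NOW))


set_salary(conn, 10003, 60000, "1995-01-01")
set_salary(conn, 10003, 70000, "1995-06-01")
for row in conn.execute("SELECT * FROM employee_sal ORDER BY tstart"):
    print(row)
```

Skipping updates that do not change the value means the history never needs coalescing after the fact; each change only closes one period and opens another.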
Relational Storage of Temporal Relational Data
employee(empno, name, sal, title, deptno)
Attribute history tables: employee_sal(empno, sal, tstart, tend)
employee_title(empno, title, tstart, tend)
An internal relation for each time-varying attribute
XQuery statements on the XML views are translated into SQL statements on the internal relations
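To make the translation concrete, the sketch below pairs a hypothetical XQuery snapshot query with one plausible SQL rewrite over employee_sal and employee_title, run on SQLite through Python's `sqlite3` module. The actual translator is not shown on these slides, so both the query and its translation are illustrative assumptions; dates are stored as ISO-8601 strings so that text comparison orders them correctly.

```python
import sqlite3

# Hypothetical XQuery snapshot query on the XML view (employee 10003 on 1995-07-01):
#   for $e in doc("employees.xml")/employees/employee[empno = "10003"]
#   return ($e/salary[@tstart <= "1995-07-01" and @tend >= "1995-07-01"],
#           $e/title[@tstart <= "1995-07-01" and @tend >= "1995-07-01"])
# One possible SQL translation over the internal relations (an assumption):
SNAPSHOT_SQL = """
SELECT s.sal, t.title
FROM employee_sal AS s JOIN employee_title AS t ON s.empno = t.empno
WHERE s.empno = :empno
  AND s.tstart <= :day AND s.tend >= :day
  AND t.tstart <= :day AND t.tend >= :day
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee_sal   (empno INTEGER, sal INTEGER, tstart TEXT, tend TEXT);
CREATE TABLE employee_title (empno INTEGER, title TEXT, tstart TEXT, tend TEXT);
INSERT INTO employee_sal VALUES
  (10003, 60000, '1995-01-01', '1995-05-31'),
  (10003, 70000, '1995-06-01', '1996-12-31');
INSERT INTO employee_title VALUES
  (10003, 'Engineer',    '1995-01-01', '1995-09-30'),
  (10003, 'Sr Engineer', '1995-10-01', '1996-01-31'),
  (10003, 'Tech Leader', '1996-02-01', '1996-12-31');
""")


def snapshot(conn, empno, day):
    """Salary and title of `empno` on `day` (an ISO date string)."""
    return conn.execute(SNAPSHOT_SQL, {"empno": empno, "day": day}).fetchall()


print(snapshot(conn, 10003, "1995-07-01"))  # [(70000, 'Engineer')]
```

Each path step on a time-varying element becomes a join with that attribute's history table, and each temporal predicate on @tstart/@tend becomes a condition on the tstart/tend columns.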
Simulated data: the history of 300,024 employees
Compared against native XML DBs:
Software AG’s Tamino (text-based storage; queried with XPath)
eXcelon’s XIS, XML Information Server (OODBMS-based storage; queried with XQuery)
Preliminary Performance Comparisons: Storage Size
Performance Comparisons (cont’d): Query Performance of DB2 and Tamino
Q1: scan of databases
Q2: history query
Q3, Q5: interval queries
Q4, Q6: snapshot queries
Q7: join
Performance Comparisons (cont’d)
Indexing: R* trees
Temporal clustering: tuples from the same time period should be assigned to the same page
Page usefulness method: consider a page holding the employee records of one department. After 60% of those employees quit, the page is only 40% useful.
Compression should not be ruled out:
Sparingly used in databases, but important for XML
Used by DB2 on mainframes, Oracle …
Updates are not a problem for histories.
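The page usefulness idea can be sketched as follows. This is a simplified model under explicit assumptions: usefulness is taken to be the fraction of records on a page that are still current, the 0.5 threshold is a hypothetical tuning parameter (the slide gives none), and when a page drops below it the live records are copied to a fresh page. `Page` and `end_record` are illustrative names only.

```python
from dataclasses import dataclass, field

USEFULNESS_THRESHOLD = 0.5  # assumed tuning parameter, not given on the slide


@dataclass
class Page:
    # Hypothetical page model: each record is (empno, alive), where alive means
    # the employee is still current (has not quit).
    records: list = field(default_factory=list)

    def usefulness(self):
        """Fraction of records on this page that are still alive."""
        if not self.records:
            return 0.0
        return sum(1 for _, alive in self.records if alive) / len(self.records)


def end_record(page, empno):
    """Mark `empno` as quit; return a new page of live records if usefulness drops too low."""
    page.records = [(e, False if e == empno else alive) for e, alive in page.records]
    if page.usefulness() < USEFULNESS_THRESHOLD:
        # Copy the live records to a new page so queries on the current time read a
        # dense page; the old page is left unchanged for historical queries.
        return Page([(e, True) for e, alive in page.records if alive])
    return None


# A department page with 10 employees, 6 of whom (60%) quit.
page = Page([(empno, True) for empno in range(10)])
new_page = None
for empno in range(6):
    new_page = end_record(page, empno) or new_page
print(page.usefulness(), len(new_page.records))
```

With this threshold the copy happens on the sixth departure, exactly when the page reaches the 40% usefulness of the slide's example.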
XML query languages are powerful: temporal queries can be expressed in XQuery without any extensions. But XQuery is not for all users:
A user-friendly QBE-like language for the temporally grouped model
SQL/XML temporal views and queries
ROLLUP-like temporal views (and SQL:1999)
Different views, but the same RDBMS-based implementation underneath.
XML Representation of DB History: Table Columns as XML Elements
<employees tstart="1995-01-01" tend="1996-12-31">
  <employee tstart="1995-01-01" tend="1996-12-31">
    <empno tstart="1995-01-01" tend="1996-12-31">10003</empno>
    <name tstart="1995-01-01" tend="1996-12-31">Bob</name>
    <salary tstart="1995-01-01" tend="1995-05-31">60000</salary>
    <salary tstart="1995-06-01" tend="1996-12-31">70000</salary>
    <title tstart="1995-01-01" tend="1995-09-30">Engineer</title>
    <title tstart="1995-10-01" tend="1996-01-31">Sr Engineer</title>
    <title tstart="1996-02-01" tend="1996-12-31">Tech Leader</title>
    <dept tstart="1995-01-01" tend="1995-09-30">QA</dept>
    <dept tstart="1995-10-01" tend="1996-12-31">RD</dept>
    <DOB tstart="1995-01-01" tend="1996-12-31">1945-04-09</DOB>
  </employee>
  <!-- More … -->
</employees>
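Snapshot and history queries on this representation can be tried with Python's standard `xml.etree.ElementTree`. This is only a stand-in for XQuery: ElementTree supports a limited XPath subset (including the `[tag='text']` predicate used here), so the temporal condition is checked in Python. The helpers `snapshot` and `history` are hypothetical, and the document is a trimmed copy of the example above.

```python
import xml.etree.ElementTree as ET

# A trimmed copy of the XML history above.
DOC = """
<employees tstart="1995-01-01" tend="1996-12-31">
  <employee tstart="1995-01-01" tend="1996-12-31">
    <empno tstart="1995-01-01" tend="1996-12-31">10003</empno>
    <name tstart="1995-01-01" tend="1996-12-31">Bob</name>
    <salary tstart="1995-01-01" tend="1995-05-31">60000</salary>
    <salary tstart="1995-06-01" tend="1996-12-31">70000</salary>
    <title tstart="1995-01-01" tend="1995-09-30">Engineer</title>
    <title tstart="1995-10-01" tend="1996-01-31">Sr Engineer</title>
    <title tstart="1996-02-01" tend="1996-12-31">Tech Leader</title>
  </employee>
</employees>
"""

root = ET.fromstring(DOC.strip())


def snapshot(root, name, attr, day):
    """Value of `attr` for employee `name` on `day`; ISO dates compare correctly as strings."""
    for emp in root.findall(f"employee[name='{name}']"):
        for elem in emp.findall(attr):
            if elem.get("tstart") <= day <= elem.get("tend"):
                return elem.text
    return None


def history(root, name, attr):
    """All (value, tstart, tend) periods of `attr` for employee `name`."""
    return [
        (elem.text, elem.get("tstart"), elem.get("tend"))
        for emp in root.findall(f"employee[name='{name}']")
        for elem in emp.findall(attr)
    ]


print(snapshot(root, "Bob", "title", "1995-12-01"))  # Sr Engineer
print(history(root, "Bob", "salary"))
```

Because the salary history is already grouped under the employee element, the history query needs no coalescing: it simply returns the child elements.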