zaniolo.ppt
    zaniolo.ppt Presentation Transcript

    • Temporal Information and XML
      • Carlo Zaniolo, Department of Computer Science, University of California, Los Angeles
    • A Short History of Time in Databases
      • Relational model: between 33 and 48 temporal DB proposals counted:
        • A struggle to get around the limitations of relational (flat) tables and a rigid query language (SQL)
        • A key issue: temporal interval coalescing is needed after each projection!
        • Clustering, indexing, query optimization for temporal information add to the complexity
    • Coalescing
      • Time stamping the individual tuples:
      name  empno  salary  title        deptno  start       end
      Bob   10003  60000   Engineer     d01     1995-01-01  1995-05-31
      Bob   10003  70000   Engineer     d01     1995-06-01  1995-09-30
      Bob   10003  70000   Sr Engineer  d02     1995-10-01  1996-01-31
      Bob   10003  70000   Tech Leader  d02     1996-02-01  1996-12-31
      • If we want the salary history, we have to coalesce the last three tuples into one:
      name  empno  salary  start       end
      Bob   10003  60000   1995-01-01  1995-05-31
      Bob   10003  70000   1995-06-01  1996-12-31
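The coalescing step on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the method used in the talk: the function name `coalesce` and the tuple layout are assumptions, and periods are treated as inclusive date ranges that may be merged when they overlap or are adjacent.

```python
from datetime import date, timedelta

def coalesce(rows):
    """Merge value-equivalent tuples whose periods overlap or are adjacent.

    rows: iterable of (key, start, end) with inclusive date bounds.
    (Hypothetical helper; the layout is an assumption for illustration.)
    """
    merged = []
    for key, start, end in sorted(rows):
        if merged:
            last_key, last_start, last_end = merged[-1]
            # Adjacent means the new period starts at most one day after the last one ends.
            if last_key == key and start <= last_end + timedelta(days=1):
                merged[-1] = (last_key, last_start, max(last_end, end))
                continue
        merged.append((key, start, end))
    return merged

# Bob's history from the slide: (empno, salary, title, deptno, start, end)
history = [
    (10003, 60000, "Engineer", "d01", date(1995, 1, 1), date(1995, 5, 31)),
    (10003, 70000, "Engineer", "d01", date(1995, 6, 1), date(1995, 9, 30)),
    (10003, 70000, "Sr Engineer", "d02", date(1995, 10, 1), date(1996, 1, 31)),
    (10003, 70000, "Tech Leader", "d02", date(1996, 2, 1), date(1996, 12, 31)),
]

# Project onto (empno, salary); the projection leaves three tuples with the
# same value and adjacent periods, so coalescing is required afterwards.
projected = [((empno, salary), start, end)
             for empno, salary, _title, _deptno, start, end in history]
salary_history = coalesce(projected)
for (empno, salary), start, end in salary_history:
    print(empno, salary, start, end)
```

The same pass has to be repeated after every projection, which is the cost the slide points to.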
    • XML
      • XML: hierarchical views with temporal groups
        • Temporal grouped models are more natural and powerful, but they did not fit in the flat relational model
        • XML Query languages can easily express temporal queries on these views.
    • History Tables
        • Time-stamped tuples in relations
      name  empno  salary  title        deptno  start       end
      Bob   10003  60000   Engineer     d01     1995-01-01  1995-05-31
      Bob   10003  70000   Engineer     d01     1995-06-01  1995-09-30
      Bob   10003  70000   Sr Engineer  d02     1995-10-01  1996-01-31
      Bob   10003  70000   Tech Leader  d02     1996-02-01  1996-12-31
        • Temporally grouped time-stamped attribute values
      attribute  value        period
      name       Bob          1995-01-01:1996-12-31
      empno      10003        1995-01-01:1996-12-31
      salary     60000        1995-01-01:1995-05-31
                 70000        1995-06-01:1996-12-31
      title      Engineer     1995-01-01:1995-09-30
                 Sr Engineer  1995-10-01:1996-01-31
                 Tech Leader  1996-02-01:1996-12-31
      deptno     d01          1995-01-01:1995-09-30
                 d02          1995-10-01:1996-12-31
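As a rough illustration of how the two representations relate, the sketch below derives the temporally grouped form from the time-stamped tuples of a single employee. The function name `attribute_history` and the dictionary layout are assumptions made for this example; it also assumes the input periods do not overlap.

```python
from datetime import date, timedelta

def attribute_history(rows, attr):
    """Return the coalesced [(value, start, end)] history of one attribute.

    rows: time-stamped tuples of ONE employee, as dicts with 'start'/'end'
    keys holding inclusive dates (hypothetical layout for illustration).
    """
    out = []
    for row in sorted(rows, key=lambda r: r["start"]):
        value, start, end = row[attr], row["start"], row["end"]
        # Extend the previous period when the value is unchanged and the periods touch.
        if out and out[-1][0] == value and start <= out[-1][2] + timedelta(days=1):
            out[-1] = (value, out[-1][1], max(out[-1][2], end))
        else:
            out.append((value, start, end))
    return out

rows = [
    {"name": "Bob", "empno": 10003, "salary": 60000, "title": "Engineer",
     "deptno": "d01", "start": date(1995, 1, 1), "end": date(1995, 5, 31)},
    {"name": "Bob", "empno": 10003, "salary": 70000, "title": "Engineer",
     "deptno": "d01", "start": date(1995, 6, 1), "end": date(1995, 9, 30)},
    {"name": "Bob", "empno": 10003, "salary": 70000, "title": "Sr Engineer",
     "deptno": "d02", "start": date(1995, 10, 1), "end": date(1996, 1, 31)},
    {"name": "Bob", "empno": 10003, "salary": 70000, "title": "Tech Leader",
     "deptno": "d02", "start": date(1996, 2, 1), "end": date(1996, 12, 31)},
]

# One history per attribute: this is the temporally grouped view.
grouped = {attr: attribute_history(rows, attr)
           for attr in ("name", "empno", "salary", "title", "deptno")}
```

Each attribute carries its own periods, so a query on the salary history reads `grouped["salary"]` directly and needs no further coalescing.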
    • Historical XML Database Architecture: Two Approaches
      • Native XML databases
        • Historical data are stored in a native XML database
        • XML queries can be specified directly upon the database
        • Native XML databases: Tamino (Software AG), eXcelon (XIS)
      • XML-enabled RDBMS
        • Historical view decomposed into relational databases as binary tables
        • Historical data can then be published as XML documents through SQL/XML publishing functions, or queried through middleware as XML views
    • Historical XML Views: Architecture
      [Architecture diagram; labels: Current Database (Relational Data: Current Content), Active Rules / update logs, Historical Database (Historical Data), SQL Queries, XML Views, Temporal Queries]
    • Relational Storage of Temporal Relational Data
      • Relational schema: employee(empno, name, sal, title, deptno)
      • Attribute history tables:
          • employee_sal(empno, sal, tstart, tend)
          • employee_title(empno, title, tstart, tend)
      • An internal relation for each time-varying attribute
      • XQuery statements on the XML views are translated into SQL statements on the internal relations
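The storage scheme above can be tried out with Python's built-in `sqlite3` module. This is only a sketch under stated assumptions: the column types, the ISO-8601 text encoding of `tstart`/`tend`, and the hand-written snapshot query are illustrative choices, not the schema or the XQuery-to-SQL translation used in the talk (the experiments there ran against DB2).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Current table plus one internal relation per time-varying attribute.
CREATE TABLE employee (empno INTEGER PRIMARY KEY, name TEXT, sal INTEGER,
                       title TEXT, deptno TEXT);
CREATE TABLE employee_sal   (empno INTEGER, sal INTEGER, tstart TEXT, tend TEXT);
CREATE TABLE employee_title (empno INTEGER, title TEXT,  tstart TEXT, tend TEXT);
""")
conn.execute("INSERT INTO employee VALUES (10003, 'Bob', 70000, 'Tech Leader', 'd02')")
conn.executemany("INSERT INTO employee_sal VALUES (?, ?, ?, ?)", [
    (10003, 60000, "1995-01-01", "1995-05-31"),
    (10003, 70000, "1995-06-01", "1996-12-31"),
])
conn.executemany("INSERT INTO employee_title VALUES (?, ?, ?, ?)", [
    (10003, "Engineer", "1995-01-01", "1995-09-30"),
    (10003, "Sr Engineer", "1995-10-01", "1996-01-31"),
    (10003, "Tech Leader", "1996-02-01", "1996-12-31"),
])

def snapshot(conn, empno, day):
    """Salary and title of one employee on an ISO date (inclusive periods).

    ISO-8601 date strings compare correctly as text, so plain <= works.
    """
    return conn.execute("""
        SELECT s.sal, t.title
        FROM employee_sal s JOIN employee_title t ON s.empno = t.empno
        WHERE s.empno = ?
          AND s.tstart <= ? AND ? <= s.tend
          AND t.tstart <= ? AND ? <= t.tend
    """, (empno, day, day, day, day)).fetchone()

print(snapshot(conn, 10003, "1995-07-15"))
```

A snapshot query on the XML view would plausibly map to a join of this shape over the attribute history tables, with one period condition per attribute involved.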
    • Experiments
      • Simulated data with history of 300,024 employees
      • Comparing native XML DBs:
        • Software AG’s Tamino (text-based storage), queried with XPath
        • eXcelon’s XIS (XML Information Server; OODBMS-based storage), queried with XQuery
      • Against DB2
    • Preliminary Performance Comparisons: Storage Size
    • Performance Comparisons (cont’d): Query Performance of DB2 and Tamino
      • Q1: scan of databases
      • Q2: history query
      • Q3, Q5: interval queries
      • Q4, Q6: snapshot queries
      • Q7: join
    • Performance Comparisons (cont’d)
    • Related Problems
      • Query Performance:
        • Indexing: R* trees
        • Temporal clustering: tuples from the same time period should be assigned to the same page
          • Page usefulness method: take a page holding the employee records of one department. After 60% of those employees quit, the page is only 40% useful.
      • Compression should not be ruled out:
        • sparingly used in DBs, but important for XML
        • DB2 mainframes, Oracle …
        • Updates not a problem for histories.
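The page usefulness figure from the temporal clustering bullet above is simple arithmetic, sketched below. Both function names and the 0.5 threshold are assumptions for illustration; the slide gives only the 60% / 40% example and does not state when a page would be reorganized.

```python
def page_usefulness(tuples_on_page, still_current):
    """Fraction of the tuples stored on a page that are still current."""
    if tuples_on_page <= 0:
        raise ValueError("page must hold at least one tuple")
    return still_current / tuples_on_page

def needs_reclustering(tuples_on_page, still_current, threshold=0.5):
    """Hypothetical policy: flag a page whose usefulness drops below a threshold."""
    return page_usefulness(tuples_on_page, still_current) < threshold

# Slide example: a department page with 100 employee records, 60 of whom quit.
usefulness = page_usefulness(100, 100 - 60)
print(usefulness, needs_reclustering(100, 40))
```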
    • Research (cont.)
      • XML query languages are powerful, and temporal queries can be expressed in XQuery without any extension, but XQuery is not suitable for all users
        • User-friendly QBE-like language for temporally grouped model
        • SQLXML temporal views and queries
        • ROLLUPS-like temporal views (and SQL:1999)
      • Different views, but the same RDBMS-based implementation underneath.
    • XML Representation of DB History: Table Columns as XML Elements
      <employees tstart="1995-01-01" tend="1996-12-31">
        <employee tstart="1995-01-01" tend="1996-12-31">
          <empno tstart="1995-01-01" tend="1996-12-31">10003</empno>
          <name tstart="1995-01-01" tend="1996-12-31">Bob</name>
          <salary tstart="1995-01-01" tend="1995-05-31">60000</salary>
          <salary tstart="1995-06-01" tend="1996-12-31">70000</salary>
          <title tstart="1995-01-01" tend="1995-09-30">Engineer</title>
          <title tstart="1995-10-01" tend="1996-01-31">Sr Engineer</title>
          <title tstart="1996-02-01" tend="1996-12-31">Tech Leader</title>
          <dept tstart="1995-01-01" tend="1995-09-30">QA</dept>
          <dept tstart="1995-10-01" tend="1996-12-31">RD</dept>
          <DOB tstart="1995-01-01" tend="1996-12-31">1945-04-09</DOB>
        </employee>
        <!-- More … -->
      </employees>
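To show how snapshot and history queries read against this representation, here is a small sketch using Python's standard `xml.etree.ElementTree` in place of XQuery. The helper names `value_at` and `history` are assumptions, and the embedded document is a subset of the slide's example.

```python
import xml.etree.ElementTree as ET

DOC = """
<employees tstart="1995-01-01" tend="1996-12-31">
  <employee tstart="1995-01-01" tend="1996-12-31">
    <empno tstart="1995-01-01" tend="1996-12-31">10003</empno>
    <name tstart="1995-01-01" tend="1996-12-31">Bob</name>
    <salary tstart="1995-01-01" tend="1995-05-31">60000</salary>
    <salary tstart="1995-06-01" tend="1996-12-31">70000</salary>
    <title tstart="1995-01-01" tend="1995-09-30">Engineer</title>
    <title tstart="1995-10-01" tend="1996-01-31">Sr Engineer</title>
    <title tstart="1996-02-01" tend="1996-12-31">Tech Leader</title>
    <dept tstart="1995-01-01" tend="1995-09-30">QA</dept>
    <dept tstart="1995-10-01" tend="1996-12-31">RD</dept>
  </employee>
</employees>
"""

def value_at(employee, tag, day):
    """Snapshot query: value of one attribute on an ISO date, or None."""
    for el in employee.findall(tag):
        # ISO-8601 dates compare correctly as strings.
        if el.get("tstart") <= day <= el.get("tend"):
            return el.text.strip()
    return None

def history(employee, tag):
    """History query: all (value, tstart, tend) versions of one attribute."""
    return [(el.text.strip(), el.get("tstart"), el.get("tend"))
            for el in employee.findall(tag)]

root = ET.fromstring(DOC.strip())
bob = root.find("employee")
print(value_at(bob, "title", "1995-12-01"))
print(history(bob, "salary"))
```

Because each attribute value carries its own period, the history query returns already-coalesced periods without a separate coalescing pass.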
    • Thank you!
        • http://wis.cs.ucla.edu