SciQL, Bridging the Gap between Science and Relational DBMS
 


Presented at the 15th International Database Engineering & Applications Symposium (IDEAS 2011), September 21-23, 2011, Lisbon, Portugal.

    Presentation Transcript

    • SciQL: Bridging the Gap Between Science and Relational DBMS
      Martin Kersten, Ying Zhang, Milena Ivanova, Niels Nes (CWI Amsterdam)
      IDEAS 2011, Sep. 21-23, 2011, Lisbon, Portugal
    • Who needs arrays anyway?
      Seismology: 1-D waveforms, 3-D spatial data
      Astronomy: temporally ordered rasters
      Climate simulation: temporally ordered grids
      Remote sensing: images of 2-D or higher
      Genomics: ordered DNA strings
      Scientists love arrays (HDF5, NetCDF, FITS, MSEED, ...), but they also use lists, tables, XML, ...
      (2011-09-22, IDEAS 2011)
    • Arrays in DBMS
      Research issues already in the 1980s: OODB, multi-dimensional DBMS, sequence DBMS, ...
      Algebraic frameworks: (S)RAM, AQL, AML, ...; the Longhorn Array Database
      SQL language extensions: RasQL, AQuery, SRQL, ..., adding a notion of order
      SQL:1999, SQL:2003 and PostgreSQL 8.1: collection types, C-style arrays, aggregation functions over arrays
      RasDaMan: stores large arrays in chunks as BLOBs; array query (RasQL) optimisation on top of a DBMS; known to work up to 12 TB
      SciDB: an array DBMS built from scratch; overlapping chunks for parallel execution
    • What is the problem with RDBMS?
      Appropriate array denotations? A functionally complete operation set? Size limitations (due to BLOB representations)? Existing foreign files? Scale? ...
    • SciQL
      An array query language based on SQL:2003, pronounced 'cycle'
      Distinguishing features:
        arrays and tables as first-class citizens of DBMSs
        seamless integration of the relational and array paradigms
        named dimensions with constraints
        flexible structure-based grouping
      Seismology use case
    • Array Definitions
      Dimensions and cell values
      Dimension range: [(start|*) : (step|*) : (stop|*)]; for integer-typed dimensions the shortcut [size] can be used
      Dimension data types: scalar data types
      Cells: zero or more values per cell, of any data type allowed for ordinary table columns
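To make the range notation concrete, the following minimal Python sketch (Python stands in because SciQL itself is not generally runnable) expands a bounded integer range. It assumes that stop is exclusive, which is what the A1 example suggests ([0:1:4] covers 0..3), and that [size] abbreviates [0:1:size]; both are inferences, and unbounded `*` ends are not modelled. The helper names `dimension_values` and `size_shortcut` are hypothetical.

```python
# Hypothetical helpers, not part of SciQL: expand a bounded integer
# dimension range [start:step:stop]. Assumptions: stop is exclusive and
# [size] abbreviates [0:1:size]; unbounded '*' ends are not modelled.
from typing import List


def dimension_values(start: int, step: int, stop: int) -> List[int]:
    """Return the index values covered by [start:step:stop]."""
    if step <= 0:
        raise ValueError("step must be positive")
    return list(range(start, stop, step))


def size_shortcut(size: int) -> List[int]:
    """Assumed meaning of the [size] shortcut: the range [0:1:size]."""
    return dimension_values(0, 1, size)


print(dimension_values(0, 1, 4))   # [0, 1, 2, 3], the x and y range of A1
print(dimension_values(-1, 1, 5))  # [-1, 0, 1, 2, 3, 4]
print(size_shortcut(3))            # [0, 1, 2]
```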
    • Array Definitions: fixed array
      CREATE ARRAY A1 (
        x INT DIMENSION[0:1:4],
        y INT DIMENSION[0:1:4],
        v FLOAT DEFAULT 0.0);
      Figure: a 4×4 array with x = 0..3 and y = 0..3; every cell holds the default 0.0, and everything outside the dimension ranges is null.
    • Array Definitions: unbounded array
      CREATE ARRAY A2 (
        x INT DIMENSION,
        y INT DIMENSION,
        v FLOAT DEFAULT 0.0);
      INSERT INTO A2 VALUES (1,0,5.5), (1,1,0.4), (2,2,4.5);
      Figure: before the insert A2 has no cells; afterwards its current range spans x = 1..2 and y = 0..2:
        y=2 | 0.0  4.5
        y=1 | 0.4  0.0
        y=0 | 5.5  0.0
              x=1  x=2
      Cells outside the current range are null.
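The "current range" behaviour can be sketched in Python as follows. This is an illustration under assumptions, not SciQL's storage model: only inserted cells are stored, the current range is taken to be the bounding box of those cells, never-written cells inside the box read as the column default, and cells outside read as None (null). The class name `UnboundedArray` is hypothetical.

```python
# Hypothetical model of an unbounded 2-D array, not SciQL's implementation.
from typing import Dict, Optional, Tuple

Cell = Tuple[int, int]


class UnboundedArray:
    def __init__(self, default: float = 0.0) -> None:
        self.default = default
        self.cells: Dict[Cell, float] = {}  # only explicitly inserted cells

    def insert(self, x: int, y: int, v: float) -> None:
        self.cells[(x, y)] = v

    def current_range(self) -> Optional[Tuple[range, range]]:
        """Bounding box of the stored cells as (x-range, y-range)."""
        if not self.cells:
            return None
        xs = [x for x, _ in self.cells]
        ys = [y for _, y in self.cells]
        return range(min(xs), max(xs) + 1), range(min(ys), max(ys) + 1)

    def get(self, x: int, y: int) -> Optional[float]:
        box = self.current_range()
        if box is None or x not in box[0] or y not in box[1]:
            return None  # outside the current range: null
        return self.cells.get((x, y), self.default)  # inside: value or default


a2 = UnboundedArray()
for x, y, v in [(1, 0, 5.5), (1, 1, 0.4), (2, 2, 4.5)]:
    a2.insert(x, y, v)
print(a2.current_range())          # (range(1, 3), range(0, 3))
print(a2.get(2, 0), a2.get(0, 0))  # 0.0 None
```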
    • Array & Table Coercions
      SELECT x, y, v FROM A1;
      Applied to the fixed array A1 (x and y in [0:1:4], v FLOAT DEFAULT 0.0), this returns one row (x, y, v) per cell: (0,0,0.0), (0,1,0.0), (0,2,0.0), (0,3,0.0), (1,0,0.0), ..., (3,3,0.0), i.e. 16 rows. Full materialisation!
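A short Python sketch of why this coercion means full materialisation: every cell inside the dimension ranges yields a row, whether or not a value was ever written. The function name `materialise` is hypothetical, and the rows use A1's default 0.0 for every cell.

```python
# Hypothetical sketch of array-to-table coercion for a fixed array:
# one (x, y, v) row per cell in the dimension ranges.
from itertools import product
from typing import List, Tuple


def materialise(xs: range, ys: range, default: float = 0.0) -> List[Tuple[int, int, float]]:
    """Return (x, y, v) rows for every cell, in x-major order as on the slide."""
    return [(x, y, default) for x, y in product(xs, ys)]


rows = materialise(range(0, 4), range(0, 4))
print(len(rows))  # 16
print(rows[:3])   # [(0, 0, 0.0), (0, 1, 0.0), (0, 2, 0.0)]
```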
    • Array & Table Coercions
      CREATE TABLE T2 (x INT, y INT, v FLOAT DEFAULT 0.0);
      INSERT INTO T2 VALUES (1,0,5.5), (1,1,0.4), (2,2,4.5), (1,1,1.3);
      SELECT [x], [y], v FROM T2;   -- dimension qualifiers '[' and ']'
      The result is an unbounded array:
        dimension ranges are derived from the minimal bounding box (x = 1..2, y = 0..2);
        cell values come from the table or from the column default;
        duplicates are overwritten arbitrarily (in the figure, cell (1,1) holds 0.4 rather than 1.3).
        y=2 | 0.0  4.5
        y=1 | 0.4  0.0
        y=0 | 5.5  0.0
              x=1  x=2
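The three coercion rules can be sketched in Python as below. The duplicate policy is the arbitrary part: this sketch keeps the first occurrence, which happens to match the figure, but that choice is an assumption and SciQL gives no such guarantee. The function name `table_to_array` is hypothetical.

```python
# Hypothetical sketch of table-to-array coercion (SELECT [x], [y], v FROM T2):
# bounding-box dimensions, default fill, arbitrary duplicate resolution.
from typing import Dict, List, Tuple

Row = Tuple[int, int, float]


def table_to_array(rows: List[Row], default: float = 0.0) -> Dict[Tuple[int, int], float]:
    stored: Dict[Tuple[int, int], float] = {}
    for x, y, v in rows:
        stored.setdefault((x, y), v)  # assumed policy: first value wins on duplicates
    if not stored:
        return {}
    xs = [x for x, _ in stored]
    ys = [y for _, y in stored]
    # Every cell of the minimal bounding box gets a stored value or the default.
    return {
        (x, y): stored.get((x, y), default)
        for x in range(min(xs), max(xs) + 1)
        for y in range(min(ys), max(ys) + 1)
    }


t2 = [(1, 0, 5.5), (1, 1, 0.4), (2, 2, 4.5), (1, 1, 1.3)]
arr = table_to_array(t2)
print(len(arr), arr[(1, 1)], arr[(2, 0)])  # 6 0.4 0.0
```

Switching to `stored[(x, y)] = v` would give a last-write-wins policy instead; both are valid readings of "overwritten arbitrarily".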
    • Array Modifications
      DELETE FROM A1 WHERE x = 1;
      Deleting creates holes in the array: every cell in column x = 1 becomes null, while all other cells keep 0.0.
    • Array Modifications
      UPDATE A1 SET v = 0.5 WHERE y = 1;
      INSERT INTO A1 VALUES (0,1,0.5), (1,1,0.5), (2,1,0.5), (3,1,0.5);
      Both statements set or change cell values and overwrite existing values; here each leaves row y = 1 holding 0.5 in every column, with all other cells at 0.0.
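The difference between the two modification slides can be sketched in Python: a DELETE keeps the array's shape and turns cells into holes (None), while an UPDATE overwrites values in place. The dictionary representation and the helper names `fixed_array`, `delete_where_x` and `update_where_y` are hypothetical.

```python
# Hypothetical sketch of DELETE (holes) versus UPDATE (overwrite) on A1.
from typing import Dict, Optional, Tuple

Grid = Dict[Tuple[int, int], Optional[float]]


def fixed_array(n: int = 4, default: float = 0.0) -> Grid:
    """A fixed n x n array with every cell at the default value."""
    return {(x, y): default for x in range(n) for y in range(n)}


def delete_where_x(grid: Grid, x0: int) -> None:
    """DELETE FROM A1 WHERE x = x0: punch holes, keep the shape."""
    for (x, y) in grid:
        if x == x0:
            grid[(x, y)] = None


def update_where_y(grid: Grid, y0: int, value: float) -> None:
    """UPDATE A1 SET v = value WHERE y = y0: overwrite in place."""
    for (x, y) in grid:
        if y == y0:
            grid[(x, y)] = value


a = fixed_array()
delete_where_x(a, 1)
print(len(a), a[(1, 2)], a[(0, 2)])  # 16 None 0.0

b = fixed_array()
update_where_y(b, 1, 0.5)
print([b[(x, 1)] for x in range(4)])  # [0.5, 0.5, 0.5, 0.5]
```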
    • Array Views
      CREATE ARRAY VIEW A2 (
        x INT DIMENSION [-1:1:5],
        y INT DIMENSION [-1:1:5],
        w FLOAT DEFAULT 0.0)
      AS SELECT x-1, y, v FROM A1 WHERE x > 1
      UNION
      SELECT x, y, 1.0 FROM A1 WHERE x = 3;
      Figure: on the left the 4×4 source array A1 (x, y = 0..3); on the right the 6×6 view A2 (x, y = -1..4). The first SELECT places the values of A1's columns x = 2 and x = 3 at x = 1 and x = 2 of A2; the second SELECT fills column x = 3 with 1.0 for y = 0..3; all other cells hold the default 0.0.
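The view's cell assignment can be sketched in Python. Several simplifications are assumptions: the input A1 below is hypothetical (all 0.0 except row y = 1, which holds 0.5), the frame [-1:1:5] is read with an exclusive stop, and the set semantics of UNION are not modelled (in this view the two selections never target the same cell). The function name `array_view` is hypothetical.

```python
# Hypothetical sketch of the array view A2 built from A1.
from typing import Dict, Tuple

Grid = Dict[Tuple[int, int], float]


def array_view(a1: Grid, default: float = 0.0) -> Grid:
    frame = range(-1, 5)  # [-1:1:5], assuming an exclusive stop
    view: Grid = {(x, y): default for x in frame for y in frame}
    # SELECT x-1, y, v FROM A1 WHERE x > 1
    for (x, y), v in a1.items():
        if x > 1 and (x - 1, y) in view:
            view[(x - 1, y)] = v
    # UNION SELECT x, y, 1.0 FROM A1 WHERE x = 3
    for (x, y) in a1:
        if x == 3 and (x, y) in view:
            view[(x, y)] = 1.0
    return view


# Hypothetical A1: 4x4, all 0.0 except row y = 1, which holds 0.5.
a1 = {(x, y): (0.5 if y == 1 else 0.0) for x in range(4) for y in range(4)}
a2 = array_view(a1)
print(len(a2), a2[(1, 1)], a2[(3, 0)], a2[(-1, 4)])  # 36 0.5 1.0 0.0
```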
    • Array Tiling
      SELECT [x], [y], AVG(v) FROM A1
      GROUP BY A1[x:x+2][y:y+2];
      Each cell A1[x][y] is the anchor point of a 2×2 tile covering x..x+1 and y..y+1. Input array A1:
        y=3 | 0.0  0.0  0.0  0.0
        y=2 | 0.0  0.0  0.0  0.0
        y=1 | 0.0  0.5  0.5  0.5
        y=0 | 0.0  0.0  0.0  0.0
              x=0  x=1  x=2  x=3
    • Array Tiling
      SELECT [x], [y], AVG(v) FROM A1 GROUP BY A1[x:x+2][y:y+2];
      Result:
        y=3 | 0.0    0.0   0.0   0.0
        y=2 | 0.0    0.0   0.0   0.0
        y=1 | 0.125  0.25  0.25  0.25
        y=0 | 0.125  0.25  0.25  0.25
              x=0    x=1   x=2   x=3
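The tiling semantics can be checked with a small Python sketch. It assumes that each cell anchors the tile {x, x+1} × {y, y+1} and that tile positions falling outside the array are ignored by AVG, much as SQL AVG ignores nulls; under these assumptions it reproduces the slide's result values. The function name `tile_avg` is hypothetical.

```python
# Hypothetical sketch of GROUP BY A1[x:x+2][y:y+2] followed by AVG(v).
from typing import Dict, Tuple

Grid = Dict[Tuple[int, int], float]


def tile_avg(grid: Grid, width: int = 2, height: int = 2) -> Grid:
    result: Grid = {}
    for (x, y) in grid:  # every cell is an anchor point
        values = [
            grid[(x + dx, y + dy)]
            for dx in range(width)
            for dy in range(height)
            if (x + dx, y + dy) in grid  # skip positions outside the array
        ]
        result[(x, y)] = sum(values) / len(values)
    return result


# Input from the slide: 0.5 at x = 1..3 on row y = 1, 0.0 elsewhere.
a1 = {(x, y): (0.5 if y == 1 and x > 0 else 0.0) for x in range(4) for y in range(4)}
avg = tile_avg(a1)
print(avg[(0, 0)], avg[(1, 0)], avg[(3, 1)], avg[(2, 2)])  # 0.125 0.25 0.25 0.0
```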
    • Array Tiling
      SELECT [x], [y], AVG(v) FROM A1
      GROUP BY A1[x-1][y], A1[x][y-1], A1[x][y], A1[x+1][y], A1[x][y+1];
      Each anchor cell is averaged with its four direct neighbours; neighbours outside the array are ignored. For the same input A1 the result is:
        y=3 | 0.0    0.0    0.0    0.0
        y=2 | 0.0    0.1    0.1    0.125
        y=1 | 0.125  0.2    0.3    0.25
        y=0 | 0.0    0.125  0.125  0.167
              x=0    x=1    x=2    x=3
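The cross-shaped grouping generalises to any list of offsets from the anchor. This Python sketch takes the five cells named in the GROUP BY clause as offsets and, as an assumption inferred from the corner values, ignores neighbours that fall outside the array. The names `CROSS` and `neighbourhood_avg` are hypothetical.

```python
# Hypothetical sketch of structural grouping with an arbitrary neighbourhood.
from typing import Dict, List, Tuple

Grid = Dict[Tuple[int, int], float]

# The five cells named in the GROUP BY clause, as offsets from the anchor.
CROSS: List[Tuple[int, int]] = [(-1, 0), (0, -1), (0, 0), (1, 0), (0, 1)]


def neighbourhood_avg(grid: Grid, offsets: List[Tuple[int, int]]) -> Grid:
    result: Grid = {}
    for (x, y) in grid:
        values = [grid[(x + dx, y + dy)] for dx, dy in offsets
                  if (x + dx, y + dy) in grid]  # ignore out-of-range neighbours
        result[(x, y)] = sum(values) / len(values)
    return result


a1 = {(x, y): (0.5 if y == 1 and x > 0 else 0.0) for x in range(4) for y in range(4)}
avg = neighbourhood_avg(a1, CROSS)
for y in range(3, -1, -1):  # print the top row first, as on the slide
    print(y, [round(avg[(x, y)], 3) for x in range(4)])
```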
    • Seismology Use Case
      Recent aftershock in Chile: 2 TB of waveform data sampled at 100 Hz
      Detecting seismic events using STA/LTA (short-term average / long-term average, e.g., 2 s / 15 s)
      Removing false positives
      Window-based 3-minute cuts
      Further analysis: digital signal processing operations
      Current problems: accessing waveform files is too slow; unpacking and positioning MSEED data every time takes too long
    • Seismology Use Case
      CREATE ARRAY MSeed (
        station VARCHAR(5) DIMENSION ['0':*:'ZZZZZ'],
        time TIMESTAMP DIMENSION,
        data DECIMAL(8,6));
      Figure: a 2-D array with stations (abc, bcd, bce, efg, ...) along one axis and time along the other.
    • Seismology Use Case
      -- average over 2-second windows:
      SELECT M.station, M.time, AVG(M.data)
      FROM MSeed AS M
      GROUP BY M[station][time - INTERVAL '2' SECOND : time];
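A Python sketch of what this per-station window grouping computes. The window boundaries are an assumption: by analogy with the exclusive stop of integer ranges, each window is taken as half-open [t - 2 s, t), and anchors with an empty window are skipped. The 1 Hz toy data, the station name "abc" and the function name `trailing_avg` are hypothetical.

```python
# Hypothetical sketch of GROUP BY M[station][time - INTERVAL '2' SECOND : time].
from bisect import bisect_left
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

Sample = Tuple[str, datetime, float]


def trailing_avg(samples: List[Sample], window: timedelta) -> Dict[Tuple[str, datetime], float]:
    by_station: Dict[str, List[Tuple[datetime, float]]] = defaultdict(list)
    for station, t, value in samples:
        by_station[station].append((t, value))
    result: Dict[Tuple[str, datetime], float] = {}
    for station, series in by_station.items():
        series.sort()
        times = [t for t, _ in series]
        prefix = [0.0]  # prefix[k] = sum of the first k values
        for _, v in series:
            prefix.append(prefix[-1] + v)
        for i, t in enumerate(times):
            lo = bisect_left(times, t - window)  # first sample with time >= t - window
            if lo < i:  # half-open window: samples lo .. i-1
                result[(station, t)] = (prefix[i] - prefix[lo]) / (i - lo)
    return result


t0 = datetime(2011, 9, 22, 12, 0, 0)
samples = [("abc", t0 + timedelta(seconds=s), float(s)) for s in range(5)]
avg = trailing_avg(samples, timedelta(seconds=2))
print(avg[("abc", t0 + timedelta(seconds=3))])  # 1.5
```

Prefix sums keep each window average O(1) after sorting, which matters at 100 Hz over long series.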
    • Seismology Use Case
      CREATE TABLE Event (
        station VARCHAR(5),
        time TIMESTAMP,
        ratio FLOAT,
        PRIMARY KEY (station, time));
      INSERT INTO Event
      SELECT M1.station, M1.time, AVG(M1.data)/AVG(M2.data) AS ratio
      FROM MSeed AS M1, MSeed AS M2
      WHERE M1.station = M2.station AND M1.time = M2.time
      GROUP BY M1[station][time - INTERVAL '2' SECOND : time],
               M2[station][time - INTERVAL '15' SECOND : time]
      HAVING AVG(M1.data)/AVG(M2.data) > ?delta;
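The STA/LTA trigger behind this query can be sketched for one station's uniformly sampled series. Like the query, the sketch averages the raw data values (many STA/LTA implementations average absolute or squared amplitudes instead). Assumptions: windows are half-open and end just before the anchor sample, only anchors with a full 15 s history are evaluated, and anchors with a zero long-term average are skipped. The function name `sta_lta_events` and the synthetic signal are hypothetical.

```python
# Hypothetical sketch of the STA/LTA ratio test (2 s / 15 s at 100 Hz).
from typing import List, Tuple


def sta_lta_events(data: List[float], rate_hz: int, sta_s: float, lta_s: float,
                   delta: float) -> List[Tuple[int, float]]:
    sta_n = int(sta_s * rate_hz)
    lta_n = int(lta_s * rate_hz)
    prefix = [0.0]  # prefix[k] = sum of data[:k]
    for v in data:
        prefix.append(prefix[-1] + v)
    events = []
    for t in range(lta_n, len(data)):  # anchors with a full long window
        sta = (prefix[t] - prefix[t - sta_n]) / sta_n  # mean of data[t-sta_n:t]
        lta = (prefix[t] - prefix[t - lta_n]) / lta_n  # mean of data[t-lta_n:t]
        if lta != 0 and sta / lta > delta:  # HAVING ratio > ?delta
            events.append((t, sta / lta))
    return events


# Synthetic signal: 20 s of background 1.0, then 3 s at 5.0, sampled at 100 Hz.
data = [1.0] * 2000 + [5.0] * 300
events = sta_lta_events(data, rate_hz=100, sta_s=2, lta_s=15, delta=3.0)
indices = [i for i, _ in events]
print(indices[0])  # 2167: the first anchor whose ratio exceeds 3.0
```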
    • Seismology Use Case
      -- detect isolated errors by inspecting the direct environment,
      -- using wave propagation statistics
      CREATE TABLE Neighbors (
        station1 VARCHAR(5),
        station2 VARCHAR(5),
        mindelay INTERVAL SECOND,
        maxdelay INTERVAL SECOND,
        weight FLOAT);
      -- remove the false positives from Event
      DELETE FROM Event
      WHERE (station, time) NOT IN (
        SELECT E1.station, E1.time
        FROM Event AS E1, Event AS E2, Neighbors AS N
        WHERE E1.station = N.station1 AND E2.station = N.station2
          AND E2.time BETWEEN E1.time + N.mindelay AND E1.time + N.maxdelay
          AND E1.ratio > E2.ratio * N.weight);
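The filtering logic of the DELETE statement can be sketched in Python: an event E1 survives only if some event E2 at a neighbouring station arrives within [mindelay, maxdelay] after it and E1.ratio exceeds E2.ratio times the weight; all other events are treated as false positives. The dataclasses, the function name `confirmed` and the sample events are hypothetical.

```python
# Hypothetical sketch of the neighbour-based false-positive filter.
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass(frozen=True)
class Event:
    station: str
    time: datetime
    ratio: float


@dataclass(frozen=True)
class Neighbor:
    station1: str
    station2: str
    mindelay: timedelta
    maxdelay: timedelta
    weight: float


def confirmed(events: List[Event], neighbors: List[Neighbor]) -> List[Event]:
    """Return the events the DELETE keeps; everything else is removed."""
    kept = []
    for e1 in events:
        for n in neighbors:
            if n.station1 != e1.station:
                continue
            if any(
                e2.station == n.station2
                and e1.time + n.mindelay <= e2.time <= e1.time + n.maxdelay
                and e1.ratio > e2.ratio * n.weight
                for e2 in events
            ):
                kept.append(e1)
                break  # one confirming neighbour is enough
    return kept


t0 = datetime(2011, 9, 22, 12, 0, 0)
events = [Event("abc", t0, 4.0),
          Event("bcd", t0 + timedelta(seconds=3), 3.5),
          Event("efg", t0, 5.0)]
neighbors = [Neighbor("abc", "bcd", timedelta(seconds=1), timedelta(seconds=5), 1.0)]
kept = confirmed(events, neighbors)
print([e.station for e in kept])  # ['abc']
```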
    • Seismology Use Case
      -- pass time series to a UDF, written in, e.g., C:
      SELECT myfunction(M[station].*)
      FROM MSeed AS M, Event AS E
      WHERE M.station = E.station AND M.time = E.time
      GROUP BY DISTINCT M[station][time - INTERVAL '1' MINUTE : time + INTERVAL '2' MINUTE];
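The shape of this final query can be sketched in Python: for every detected event, cut the station's samples from 1 minute before to 2 minutes after the event (a 3-minute cut) and hand each cut to a user-defined function. Assumptions: the cut is half-open [t - 1 min, t + 2 min), GROUP BY DISTINCT is read as "one cut per distinct (station, time)", and `my_udf` is a hypothetical stand-in for `myfunction`, as are `event_cuts` and the toy series.

```python
# Hypothetical sketch of 3-minute event cuts passed to a UDF.
from datetime import datetime, timedelta
from typing import Callable, Dict, List, Tuple

Series = List[Tuple[datetime, float]]


def event_cuts(series: Dict[str, Series], events: List[Tuple[str, datetime]],
               udf: Callable[[Series], float],
               before: timedelta = timedelta(minutes=1),
               after: timedelta = timedelta(minutes=2)) -> Dict[Tuple[str, datetime], float]:
    results: Dict[Tuple[str, datetime], float] = {}
    for station, t in sorted(set(events)):  # one cut per distinct event
        lo, hi = t - before, t + after
        cut = [(ts, v) for ts, v in series.get(station, []) if lo <= ts < hi]
        results[(station, t)] = udf(cut)
    return results


def my_udf(cut: Series) -> float:
    """Hypothetical UDF: peak absolute amplitude in the cut."""
    return max((abs(v) for _, v in cut), default=0.0)


t0 = datetime(2011, 9, 22, 12, 0, 0)
series = {"abc": [(t0 + timedelta(seconds=s), float(s % 7) - 3.0) for s in range(600)]}
event_time = t0 + timedelta(minutes=5)
out = event_cuts(series, [("abc", event_time), ("abc", event_time)], my_udf)
print(out[("abc", event_time)])  # 3.0
```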
    • Conclusion
      SciQL: a first step towards a tailored scientific DBMS
      A symbiosis of the relational and array paradigms
      Under active implementation
      Open issues: appropriate array denotations, a functionally complete operation set, size limitations (due to BLOB representations), existing foreign files, scale