
Arrays in Databases, the next frontier?

The talk was delivered by Martin Kersten from CWI, the Netherlands, at the workshop on "Global Scientific Data Infrastructures: The Findability Challenge", held in Taormina, Sicily, Italy, on May 10-11, 2012.


Transcript of "Arrays in Databases, the next frontier?"

  1. Finding something different: Arrays in database systems, the next frontier? Martin Kersten, CWI. © M Kersten 2012
  2. Science applications
  3. Public database of 4-40 TB. Relational schema of around 200 pages of SQL. Relational tables of up to 20B elements. Finding closely related sky objects. 446 columns, >585 million rows; 6 columns, >20 billion rows.
  4. The LOFAR radio telescope. Complex image processing pipeline (Blue Gene). Transient Sky Objects database (50 TB/yr). Finding transients within a 4-second timeframe.
  5. Data warehouse of seismic data. Highly compressed file repository (>3.5M files and 15-150 TB). About to explode due to sensor networks. Finding warning signals.
  6. Remote sensing. Processing pipeline to interpret images, < 1 TB/yr. Finding and detecting forest fires.
  7. [Diagram] Matlab, RDBMS, Python, SQL, C, R, *-API, SciQL. Interdependent software libraries. FITS, mSEED, geoTIFF, …, HDF5, NETCDF. Data vault.
  8. Agenda
       - Array support in database systems
       - SciQL array query language
       - A crash course on column-stores
       - SciQL implementation approach
  9. What is an array?
     An array is a systematic arrangement of objects addressed by dimension values.
       Get(A, X, Y, …) => Value
       Set(A, X, Y, …) <= Value
     There are many species: vector, bit array, dynamic array, parallel array, sparse array, variable length array, jagged array.
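The Get/Set definition on this slide can be sketched in a few lines of Python. This is only an illustration, not anything from the talk: the class name `DenseArray`, the row-major layout, and the fixed two dimensions are assumptions made for the example.

```python
class DenseArray:
    """Toy 2-D array: every cell inside the bounds exists and holds a value."""

    def __init__(self, width, height, default=0.0):
        self.width, self.height = width, height
        # One flat block of cells, addressed by (x, y) in row-major order.
        self.cells = [default] * (width * height)

    def _offset(self, x, y):
        if not (0 <= x < self.width and 0 <= y < self.height):
            raise IndexError((x, y))
        return y * self.width + x

    def get(self, x, y):
        """Get(A, X, Y) => Value"""
        return self.cells[self._offset(x, y)]

    def set(self, x, y, value):
        """Set(A, X, Y) <= Value"""
        self.cells[self._offset(x, y)] = value


a = DenseArray(2, 2)
a.set(1, 0, 3.5)
```

The point of the sketch is that a cell is found by computing a position from its dimension values, not by searching for a key.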
  10. Who needs them anyway?
       - Seismology: partial time-series
       - Climate simulation: temporal ordered grid
       - Astronomy: temporal ordered images
       - Remote sensing: image processing
       - Social networks: graph algorithms
       - Genomics: ordered strings
       - Forensics: images, strings, graphs
      Scientists ‘love them’: MSEED, NETCDF, FITS, CSV, ..
  11. Arrays in DBMS
       - Relational prototype built on arrays, Peterlee IS Vehicle (1975)
       - Persistent programming languages, Astral (1980), Plain (1980)
       - Object-orientation and persistent languages were the make-believe to handle them, O2 (1992)
       - Several array algebras: AML (2002), Aquery (2003), RAM (2004), SRAM (2012)
  12. Array declarations:
        CREATE TABLE sal_emp (
          name text,
          pay_by_quarter integer[],
          schedule text[][]
        );
        CREATE TABLE tictactoe ( squares integer[3][3] );
      Array operations: denotation ([]), contains (@>), is contained in (<@), append, concat (||), dimension, lower, upper, prepend, to-string, from-string, …
      Array constraints: none, no enforcement of dimensions.
  13. SQL 2003
       - Arrays are attribute type constructors.
       - Arrays can be declared without a maximum cardinality.
       - Array nesting is unrestricted.
       - Query results can be converted into arrays.
        CREATE TABLE listbox (
          choices CHAR(3) ARRAY[1000] NOT NULL
        );
        INSERT INTO listbox_choices
        VALUES ( Department Names,
                 ARRAY(SELECT name FROM sales.depts ORDER BY 1) );
  14. - Breaks large C++ arrays (rasters) into disjoint chunks
      - Maps chunks into large binary objects (blobs)
      - Provides a function interface to access them
      - RASCAL, a SQL92 extension
      - Known to work up to 12 TB
  15. - Breaks large C++ arrays (rasters) into overlapping chunks
      - Built storage manager from scratch
      - Map-reduce processing model
      - Provides a function interface to access them
      - AQL, a crippled SQL92
  16. What is the problem?
       - Appropriate array denotations? Query language
       - Functionally complete operation set?
       - Mature implementations? Systems
       - Size limitations due to (blob) representations?
       - Scale out?
       - Community awareness? Education
  17. Agenda
       - Array support in database systems
       - SciQL array query language
       - A crash course on column-stores
       - SciQL implementation approach
  18. MonetDB SciQL
      SciQL (pronounced ‘cycle’)
       - A backward compatible extension of SQL’03
       - Symbiosis of relational and array paradigm
       - Flexible structure-based grouping
       - Capitalizes on the MonetDB physical array storage
          - Recycling, an adaptive ‘materialized view’
          - Zero-cost attachment contract for cooperative clients
      http://www.scilens.org/Resources/SciQL
  19. Table vs arrays
      CREATE TABLE tmp
       - A collection of tuples
       - Indexed by a (primary) key
       - Default handling
       - Explicitly created using INS/UPD/DEL
  20. Table vs arrays
      CREATE TABLE tmp
       - A collection of tuples
       - Indexed by a (primary) key
       - Default handling
       - Explicitly created using INS/UPD/DEL
      CREATE ARRAY tmp
       - A collection of a priori defined tuples
       - Indexed by dimension expressions
       - Implicitly defined by default value
       - To be updated with INS/DEL/UPD
  21. SciQL examples
        CREATE TABLE matrix (
          x integer,
          y integer,
          value float,
          PRIMARY KEY (x,y) );
        INSERT INTO matrix VALUES (0,0,0),(0,1,0),(1,1,0),(1,0,0);
      Table contents (x, y, value): (0,0,0), (0,1,0), (1,1,0), (1,0,0).
  22. SciQL examples
        CREATE TABLE matrix (
          x integer,
          y integer,
          value float,
          PRIMARY KEY (x,y) );
        INSERT INTO matrix VALUES (0,0,0),(0,1,0),(1,1,0),(1,0,0);

        CREATE ARRAY matrix (
          x integer DIMENSION[2],
          y integer DIMENSION[2],
          value float DEFAULT 0);
      Table contents (x, y, value): (0,0,0), (0,1,0), (1,1,0), (1,0,0).
      Array contents: a 2x2 grid (x = 0..1, y = 0..1) whose cells all hold the default 0; positions outside the bounds are null.
  23. SciQL examples (same declarations as above)
        DELETE FROM matrix WHERE y=1    -- on the table
        DELETE FROM matrix WHERE y=1    -- on the array
      Table contents: (0,0,0), (1,0,0).
      Array contents: a hole in the array. The cells with y=1 are null; the cells with y=0 still hold 0.
  24. SciQL examples (same declarations as above)
        INSERT INTO matrix VALUES (0,1,1), (1,1,2)    -- on the table
        INSERT INTO matrix VALUES (0,1,1), (1,1,2)    -- on the array
      Table contents: (0,0,0), (1,0,0), (0,1,1), (1,1,2).
      Array contents: row y=1 holds 1 and 2; row y=0 holds 0 and 0.
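The contrast in the examples above (a table starts empty and DELETE removes tuples, while an array starts fully populated with its default and DELETE leaves null holes) can be mimicked with plain Python dictionaries. This is a minimal sketch of the semantics as the slides describe them; the helper names are hypothetical and nothing here reflects how MonetDB implements it.

```python
def make_array(nx, ny, default=0.0):
    """CREATE ARRAY: every cell inside the bounds exists up front."""
    return {(x, y): default for x in range(nx) for y in range(ny)}

def table_insert(table, x, y, value):
    """Table INSERT adds a new tuple; a duplicate key is an error."""
    if (x, y) in table:
        raise KeyError(f"primary key violation: {(x, y)}")
    table[(x, y)] = value

def table_delete(table, predicate):
    """Table DELETE removes the matching tuples."""
    for key in [k for k in table if predicate(*k)]:
        del table[key]

def array_insert(array, x, y, value):
    """Array INSERT overwrites the value of an existing cell."""
    if (x, y) not in array:
        raise IndexError(f"outside dimension bounds: {(x, y)}")
    array[(x, y)] = value

def array_delete(array, predicate):
    """Array DELETE leaves a hole: the cell stays, its value becomes null."""
    for key in array:
        if predicate(*key):
            array[key] = None

# CREATE TABLE starts empty and needs explicit inserts; CREATE ARRAY does not.
table = {}
for row in [(0, 0, 0.0), (0, 1, 0.0), (1, 1, 0.0), (1, 0, 0.0)]:
    table_insert(table, *row)
array = make_array(2, 2)

# DELETE ... WHERE y=1
table_delete(table, lambda x, y: y == 1)
array_delete(array, lambda x, y: y == 1)
state_after_delete = (len(table), len(array), array[(0, 1)])

# INSERT INTO matrix VALUES (0,1,1), (1,1,2)
for row in [(0, 1, 1.0), (1, 1, 2.0)]:
    table_insert(table, *row)
    array_insert(array, *row)
```

After the delete the table has two tuples left while the array still has four cells, two of them null; after the final inserts both hold the same four values.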
  25. SciQL unbounded arrays
        CREATE TABLE matrix (
          x integer,
          y integer,
          value float,
          PRIMARY KEY (x,y) );
        INSERT INTO matrix VALUES (0,2,1), (0,1,2)

        CREATE ARRAY matrix (
          x integer DIMENSION,
          y integer DIMENSION,
          value float DEFAULT 0);
        INSERT INTO matrix VALUES (0,2,1), (0,1,2)
      Table contents: (0,2,1), (0,1,2).
      Array contents: cell (0,2) holds 1 and cell (0,1) holds 2; the other cells shown in the figure hold the default 0.
  26. SciQL Dimensions
      Unbounded dimensions:
        scalar-type DIMENSION
      Bounded dimensions:
        scalar-type DIMENSION[stop]
        scalar-type DIMENSION[first: step: stop]
        scalar-type DIMENSION[*: *: *]
      Example:
        timestamp DIMENSION ['2010-01-19' : '1' minute : *]
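A dimension declared as `[first : step : stop]` can be read as a generator of index values. The sketch below assumes that `stop` is exclusive (consistent with `DIMENSION[2]` giving the indices 0 and 1 in the earlier examples), that `*` means unbounded, and that the step is positive; the function name `dimension` is made up for the illustration.

```python
from datetime import datetime, timedelta
from itertools import count, islice

def dimension(first, step, stop=None):
    """Yield first, first+step, ...; stop is exclusive, None stands for '*'.

    Assumes a positive step.
    """
    for i in count():
        value = first + i * step
        if stop is not None and value >= stop:
            return
        yield value

# integer DIMENSION[2], read as [0 : 1 : 2]
bounded = list(dimension(0, 1, 2))

# timestamp DIMENSION ['2010-01-19' : '1' minute : *]
ticks = dimension(datetime(2010, 1, 19), timedelta(minutes=1))
first_three = list(islice(ticks, 3))
```

Because the unbounded case is a lazy generator, only the index values actually requested are ever produced.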
  27. SciQL table queries
        -- Dimension names make query formulation easier
        CREATE ARRAY matrix (
          x integer DIMENSION,
          y integer DIMENSION,
          value float DEFAULT 0 );
        -- simple checker boarding aggregation
        SELECT sum(value) FROM matrix WHERE (x + y) % 2 = 0
  28. SciQL array queries
        CREATE ARRAY matrix (
          x integer DIMENSION,
          y integer DIMENSION,
          value float DEFAULT 0 );
        CREATE ARRAY result (
          x integer DIMENSION,
          value float DEFAULT 0 );
        -- group based aggregation to construct an unbounded vector
        SELECT [x], sum(value) FROM matrix WHERE (x + y) % 2 = 0 GROUP BY x;
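To make the two queries above concrete, here is a small Python sketch that evaluates them over a hypothetical 4x4 matrix. The data values are invented for the example; only the predicate `(x + y) % 2 = 0` and the grouping on `x` come from the slides.

```python
from collections import defaultdict

# Hypothetical 4x4 matrix; cell (x, y) holds x*4 + y.
matrix = {(x, y): float(x * 4 + y) for x in range(4) for y in range(4)}

# SELECT sum(value) FROM matrix WHERE (x + y) % 2 = 0
checker_sum = sum(v for (x, y), v in matrix.items() if (x + y) % 2 == 0)

# SELECT [x], sum(value) FROM matrix WHERE (x + y) % 2 = 0 GROUP BY x
by_x = defaultdict(float)
for (x, y), v in matrix.items():
    if (x + y) % 2 == 0:
        by_x[x] += v

# [x] in the select list makes x a dimension of the result: a vector indexed by x.
vector = [by_x[x] for x in sorted(by_x)]
```

The first query yields a scalar, the second a one-dimensional array whose index is the grouping dimension.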
  29. SciQL array views
        CREATE ARRAY vmatrix (
          x integer DIMENSION[-1:5],
          y integer DIMENSION[-1:5],
          value float DEFAULT -1 )
        AS SELECT x, y, value FROM matrix;
      Result shown on the slide:
        -1 -1 -1 -1
        -1  0  0 -1
        -1  0  0 -1
        -1 -1 -1 -1
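The view above embeds the base array in a larger one and fills the cells the query does not supply with the view's default. A minimal Python sketch of that idea follows; the function `array_view` is hypothetical, and the bounds -1..2 were chosen to reproduce the 4x4 picture on the slide, not the `[-1:5]` range in the declaration.

```python
def array_view(source, x_range, y_range, default):
    """Cells present in the source keep their value; all others get the default."""
    return {(x, y): source.get((x, y), default)
            for x in x_range for y in y_range}

# The 2x2 base matrix from the earlier slides, all cells 0.
matrix = {(0, 0): 0.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.0}

# Assumed bounds -1..2 in both dimensions (4x4), default -1.
vmatrix = array_view(matrix, range(-1, 3), range(-1, 3), -1.0)

# Print the grid with the highest y on top, as on the slide.
rows = [" ".join(f"{vmatrix[(x, y)]:g}" for x in range(-1, 3))
        for y in range(2, -2, -1)]
```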
  30. SciQL tiling examples
        V0,3  V1,3  V2,3  V3,3
        V0,2  V1,2  V2,2  V3,2
        V0,1  V1,1  V2,1  V3,1
        V0,0  V1,0  V2,0  V3,0
      (The figure marks an anchor point in the grid.)
        SELECT x, y, avg(value) FROM matrix
        GROUP BY matrix[x : 1 : x+2][y : 1 : y+2];
  31. SciQL tiling examples (same 4x4 grid and anchor point)
        SELECT x, y, avg(value) FROM matrix
        GROUP BY DISTINCT matrix[x:1:x+2][y:1:y+2];
  32. SciQL tiling examples (same 4x4 grid; the anchor point has moved and cells outside the array are shown as null)
        SELECT x, y, avg(value) FROM matrix
        GROUP BY DISTINCT matrix[x-1:1:x+1][y:1:y+2];
  33. SciQL tiling examples (same 4x4 grid with an anchor point)
        SELECT x, y, avg(value) FROM matrix
        GROUP BY matrix[x][y], matrix[x-1][y], matrix[x+1][y], matrix[x][y-1], matrix[x][y+1];
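Structural grouping as in the tiling examples can be approximated in Python by listing, for each anchor point, the relative offsets of the cells in its tile. This sketch rests on several assumptions that the slides do not spell out: ranges are right-open (so `[x:1:x+2]` covers x and x+1), cells outside the array are ignored like nulls, and `DISTINCT` is modelled as non-overlapping tiles anchored at multiples of the tile size. The data values are invented.

```python
def tile_average(matrix, anchor, offsets):
    """Average of the cells at anchor + offset; cells outside the array are skipped."""
    ax, ay = anchor
    values = [matrix[(ax + dx, ay + dy)]
              for dx, dy in offsets
              if (ax + dx, ay + dy) in matrix]
    return sum(values) / len(values) if values else None

N = 4
matrix = {(x, y): float(x + 10 * y) for x in range(N) for y in range(N)}

# matrix[x:1:x+2][y:1:y+2]: a 2x2 tile above and to the right of the anchor.
square = [(dx, dy) for dx in range(2) for dy in range(2)]
# matrix[x][y], matrix[x-1][y], matrix[x+1][y], matrix[x][y-1], matrix[x][y+1]
cross = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]

# Without DISTINCT: every cell is an anchor, so tiles overlap.
overlapping = {a: tile_average(matrix, a, square) for a in matrix}

# With DISTINCT (assumed): anchors step by the tile size, so tiles are disjoint.
distinct = {(x, y): tile_average(matrix, (x, y), square)
            for x in range(0, N, 2) for y in range(0, N, 2)}

# Cross-shaped neighbourhood, e.g. for smoothing.
smoothed = {a: tile_average(matrix, a, cross) for a in matrix}
```

Tiles at the border simply contain fewer cells, which is why the anchor (3, 3) averages over a single value in the overlapping case.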
  34. SciQL, A Query Language for Science Applications
       - Seamless integration of array-, set-, and sequence-semantics.
       - Dimension constraints as a declarative means for indexed access to array cells.
       - Structural grouping to generalize the value-based grouping towards selective access to groups of cells based on positional relationships for aggregation.
  35. Agenda
       - Array support in database systems
       - SciQL array query language
       - Use-case exercise
       - A crash course on column-stores
       - SciQL implementation approach
  36. Seismology use case
      Rietbrock: Chile earthquake
        … 2 TB of wave fronts
        … filter by sta/lta
        … remove false positives
        … window-based 3 min cuts
        … heuristic tests
        … interactive response required …
      How can a database system help? Scanning 2 TB on a modern PC takes >3 hours.
  37. Use case, a SciQL dream
      Rietbrock: Chile earthquake
        create array mseed (
          tick timestamp dimension['2010':*],
          data decimal(8,6),
          station string );
  38. Use case, a SciQL dream
      Rietbrock: … filter by sta/lta
        -- average by window of 5 seconds
        select A.tick, avg(A.data)
        from mseed A
        group by A[tick:1:tick + 5 seconds]
  39. Use case, a SciQL dream
      Rietbrock: … filter by sta/lta
        select A.tick
        from mseed A, mseed B
        where A.tick = B.tick
          and avg(A.data) / avg(B.data) > delta
        group by A[tick:tick + 5 seconds], B[tick:tick + 15 seconds]
  40. Use case, a SciQL dream
      Rietbrock: … filter by sta/lta
        create view candidates (
          station string,
          tick timestamp,
          ratio float ) as
        select A.station, A.tick, avg(A.data) / avg(B.data) as ratio
        from mseed A, mseed B
        where A.tick = B.tick
          and avg(A.data) / avg(B.data) > delta
        group by A[tick:tick + 5 seconds], B[tick:tick + 15 seconds]
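The sta/lta filter in the queries above compares a short-window average with a long-window average anchored at the same tick. A rough Python equivalent is sketched below. It assumes one sample per second, forward-looking windows as written in the queries, and a made-up signal and threshold; practical STA/LTA detectors usually work on absolute or squared amplitudes, which the slide's query does not do.

```python
def window_avg(data, start, length):
    """Average of data[start : start + length]."""
    window = data[start:start + length]
    return sum(window) / len(window)

def sta_lta_candidates(data, short=5, long=15, delta=2.0):
    """Return (tick, ratio) where the short/long average ratio exceeds delta."""
    hits = []
    # Only anchors with a complete long window are considered.
    for tick in range(len(data) - long + 1):
        lta = window_avg(data, tick, long)
        if lta == 0:
            continue  # avoid dividing by zero on a flat-zero window
        ratio = window_avg(data, tick, short) / lta
        if ratio > delta:
            hits.append((tick, ratio))
    return hits

# Hypothetical trace: quiet background with a 5-second burst starting at tick 20.
signal = [1.0] * 20 + [10.0] * 5 + [1.0] * 20
hits = sta_lta_candidates(signal)
hit_ticks = [tick for tick, _ in hits]
```

On this trace the candidates cluster around the start of the burst, which is the behaviour the `candidates` view is meant to capture.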
  41. Use case, a SciQL dream
      Rietbrock: … remove false positives
        -- remove isolated errors by direct environment
        -- using wave propagation statistics
        create table neighbors (
          head string,
          tail string,
          delay timestamp,
          weight float )
  42. Use case, a SciQL dream
      Rietbrock: … remove false positives
        select A.tick, B.tick
        from candidates A, candidates B, neighbors N
        where A.station = N.head
          and B.station = N.tail
          and B.tick = A.tick + N.delay
          and B.ratio * N.weight < A.ratio;
  43. Use case, a SciQL dream
      Rietbrock: … remove false positives
        delete from candidates
        select A.tick
        from candidates A, candidates B, neighbors N
        where A.station = N.head
          and B.station = N.tail
          and B.tick = A.tick + N.delay
          and B.ratio * N.weight < A.ratio;
  44. Use case, a SciQL dream
      Rietbrock: … window-based 3 min cuts … heuristic tests
        select B.station, myfunction(B.data)
        from candidates A, mseed B
        where A.tick = B.tick
        group by distinct B[tick:tick + 3 minutes];
        -- using a User Defined Function written in C.
  45. Agenda
       - Array support in database systems
       - SciQL array query language
       - A crash course on column-stores
       - SciQL implementation approach
  46. Storing Relations in MonetDB
      (Figure: five columns, each with a void head whose first value is 1000, followed by further rows.)
      Virtual OID: seqbase=1000 (increment=1)
  47. BAT Data Structure
       - BAT: binary association table, with a Head and a Tail
       - BUN: binary unit
       - Head & Tail / BUN heap: consecutive memory block(s) (arrays), memory-mapped file(s)
       - Index structures: hash tables, T-trees, R-trees, ...
       - Tail heap: best-effort duplicate elimination for strings (~ dictionary encoding)
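The void head with `seqbase=1000` means the object identifiers are never stored: an OID is just the sequence base plus the position in the tail array. A toy Python model of that idea is given below. The class `VoidBAT` and its methods are invented for the illustration and do not correspond to MonetDB's actual C structures or API.

```python
class VoidBAT:
    """Toy column: a dense tail array plus a virtual (unstored) OID head."""

    def __init__(self, tail, seqbase=0):
        self.seqbase = seqbase   # OID of the first value
        self.tail = list(tail)   # values in one consecutive block

    def find(self, oid):
        """Positional lookup: no search, just an array offset."""
        position = oid - self.seqbase
        if not 0 <= position < len(self.tail):
            raise KeyError(oid)
        return self.tail[position]

    def select(self, predicate):
        """Return the OIDs of all qualifying values (a bulk, column-wise scan)."""
        return [self.seqbase + i
                for i, value in enumerate(self.tail) if predicate(value)]


# Two columns of the same (hypothetical) relation share the same seqbase.
name = VoidBAT(["ann", "bob", "cid"], seqbase=1000)
age = VoidBAT([31, 45, 27], seqbase=1000)

oids = age.select(lambda v: v > 30)       # scan one column
names = [name.find(oid) for oid in oids]  # reconstruct tuples by position
```

Tuple reconstruction across columns is then a matter of positional lookups, which is one reason arrays fit this storage model naturally.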
  48. Processing Model (MonetDB Kernel)
       - Bulk processing:
          - full materialization of all intermediate results
       - Binary (i.e., 2-column) algebra core:
          - select, join, semijoin, outerjoin
          - union, intersection, diff (BAT-wise & column-wise)
          - group, count, max, min, sum, avg
          - reverse, mirror, mark
       - Runtime operational optimization:
          - choosing optimal algorithm & implementation according to input properties and system status
  49. The Software Stack
       - Front-ends: SQL 03 (strategic optimization), producing MAL
       - Optimizers: tactical optimization, MAL -> MAL rewrites
       - Back-end(s): MonetDB 5, MAL
       - Kernel: MonetDB kernel, runtime operational optimization
  50. MonetDB Front-end: SQL
        EXPLAIN SELECT a, z FROM t, s WHERE t.c = s.x;

        function user.s2_1():void;
        barrier _73 := language.dataflow();
            _2:bat[:oid,:int] := sql.bind("sys","t","c",0);
            _7:bat[:oid,:int] := sql.bind("sys","s","x",0);
            _10 := bat.reverse(_7);
            _11 := algebra.join(_2,_10);
            _13 := algebra.markT(_11,0@0);
            _14 := bat.reverse(_13);
            _15:bat[:oid,:int] := sql.bind("sys","t","a",0);
            _17 := algebra.leftjoin(_14,_15);
            _18 := bat.reverse(_11);
            _19 := algebra.markT(_18,0@0);
            _20 := bat.reverse(_19);
            _21:bat[:oid,:int] := sql.bind("sys","s","z",0);
            _23 := algebra.leftjoin(_20,_21);
        exit _73;
            _24 := sql.resultSet(2,1,_17);
            sql.rsColumn(_24,"sys.t","a","int",32,0,_17);
            sql.rsColumn(_24,"sys.s","z","int",32,0,_23);
            _33 := io.stdout();
            sql.exportResult(_33,_24);
        end s2_1;
  51. Agenda
       - Array support in database systems
       - SciQL array query language
       - A crash course on column-stores
       - SciQL implementation approach
  52. SciQL implementation
       - Use the complete MonetDB software stack
          - Extend the SQL catalog to support SciQL
          - Extend the kernel to support array processing
          - Extend the optimizer stack for performance
       - Aim for a functional implementation first
          - Use tabular representation of arrays
          - Reuse the SQL code generator
  59. Slicing a portion of an array is a ‘selection’
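With the tabular representation mentioned under "SciQL implementation", each dimension and each cell attribute becomes a column, and a slice turns into an ordinary range selection on the dimension columns. The following Python sketch illustrates that reading; the column names, the data, and the right-open slice bounds are assumptions for the example.

```python
# Tabular representation of a hypothetical 4x4 array: one column per dimension
# and one per cell attribute, all aligned by position.
xs, ys, vals = [], [], []
for x in range(4):
    for y in range(4):
        xs.append(x)
        ys.append(y)
        vals.append(float(x * 4 + y))

def slice_array(x_lo, x_hi, y_lo, y_hi):
    """Slice [x_lo, x_hi) x [y_lo, y_hi) as a selection on the dimension columns."""
    rows = [i for i in range(len(vals))
            if x_lo <= xs[i] < x_hi and y_lo <= ys[i] < y_hi]
    return [(xs[i], ys[i], vals[i]) for i in rows]

sub = slice_array(1, 3, 0, 2)
```

Seen this way, slicing needs no new machinery: the existing relational selection operator does the work.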
  61. It works
  62. Conclusions
       - The language definition is ‘finished’
       - Functional prototype is ‘around the corner’
       - Exposure to real life cases and external libraries
       - MonetDB’s core technology was essential
       - Challenge: ARRAYS FILES
  66. Science DBMS landscape
                          | MonetDB 5.23                          | SciDB 0.5            | Rasdaman
      Architecture        | Server approach                       | Server approach      | Plugin (Oracle, DB2, Informix, Mysql, Postgresql)
      Open source         | Mozilla License                       | GPL 3.0, Commercial  | GPL 3.0, Dual license
      Downloads           | >12.000 /month                        | Tens up to now       | ??
      SQL                 | SQL 2003                              | ??                   | SQL92++
      Interoperability    | {JO}DBC, C(++), Python, …             | C++ UDF              | C++, Java, OGC
      Array language      | SciQL                                 | AQL                  | RASQL
      Array model         | Fixed+variable bounds                 | Fixed arrays         | Fixed+variable bounds
      Science             | Linked libraries                      | Linked libraries     | Linked libraries
      Foreign files       | Vaults of csv, FITS, NETCDF, MSEED    | ??                   | Tiff, png, jpg, .., csv, NETCDF, HDF4
      Distribution        | 50-200 node cluster                   | 4 node cluster       | 20-node
      Distribution tech   | Dynamic partial replication           | Static fragmentation | Static fragmentation
      Executor            | Various schemes                       | Map-reduce           | Tile streaming
      Largest demo        | Skyserver SDSS 6 3TB                  | ---                  | 12TB, IGN-F (on Postgresql)
      Storage tuning      | Query adaptive                        | Schema definitions   | Workload driven
      Optimization        | Heuristics + cost based               | ??                   | Heuristics + cost based
