Arrays in database systems, the next frontier?

This presentation was given by Martin Kersten (CWI), well known in the Dutch eScience and scientific computing community, at the Netherlands eScience Center (NLeSC) on November 9, 2011 in Amsterdam, Netherlands.

Abstract of the presentation:
This presentation gives an introduction to NoSQL (Not only SQL) databases with examples from MonetDB, and discusses applications and limitations.

Arrays in database systems, the next frontier?

  1. Arrays in database systems, the next frontier? Martin Kersten, CWI. NLeSC 9 Nov 2011
  2. “We can't solve problems by using the same kind of thinking we used when we created them.”
  3. Agenda
      • A crash course on column-stores
      • Column stores for science applications
      • The SciQL array query language
  4. The world of column stores: Motivation
      Relational DBMSs have dominated since the late 1970s / 1980s
      • Transactional workloads (OLTP, row-wise access)
      • I/O based processing
      • Ingres, PostgreSQL, MySQL, Oracle, SQL Server, DB2, …
      Column stores have dominated product development since 2005
      • Data warehouses and business intelligence applications
      • Startups: Infobright, Aster Data, Greenplum, LucidDB, …
      • Commercial: Microsoft, IBM, SAP, …
      MonetDB, the pioneer
  5. The world of column stores
      Workload changes: Transactions (OLTP) vs ...
  6. The world of column stores
      Workload changes: ... vs OLAP, BI, Data Mining, ...
  7. The world of column stores: Databases hit The Memory Wall
      • Detailed and exhaustive analysis for different workloads using 4 RDBMSs by Ailamaki, DeWitt, Hill, Wood in VLDB 1999: “DBMSs On A Modern Processor: Where Does Time Go?”
      • CPU is 60%-90% idle, waiting for memory:
          • L1 data stalls
          • L1 instruction stalls
          • L2 data stalls
          • TLB stalls
          • Branch mispredictions
          • Resource stalls
  8. The world of column stores
      Hardware changes: The Memory Wall
      Trip to memory = 1000s of instructions!
  9. Storing Relations in MonetDB
      Figure: a relation stored as five columns, each with a void head column starting at 1000.
      Virtual OID: seqbase=1000 (increment=1)
  10. BAT Data Structure
      BAT: binary association table (Head, Tail). BUN: binary unit.
      • Hash tables, T-trees, R-trees, ...
      • Head & Tail: consecutive memory blocks (arrays), memory-mapped files
      • BUN heap: consecutive memory block (array), memory-mapped file
      • Tail heap: best-effort duplicate elimination for strings (~ dictionary encoding)
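The storage idea on slides 9 and 10 can be sketched in a few lines of Python. This is an illustrative toy, not MonetDB code: the class names `Bat` and `StringHeap` and their methods are hypothetical, and the duplicate elimination here is exact, whereas the slide describes it as best-effort.

```python
from dataclasses import dataclass, field


@dataclass
class Bat:
    """Toy binary association table: a virtual (void) head plus a tail array."""
    seqbase: int                       # OID of the first tuple; head is never stored
    tail: list = field(default_factory=list)

    def append(self, value):
        self.tail.append(value)

    def lookup(self, oid):
        # Positional lookup: the OID is turned into an array offset.
        return self.tail[oid - self.seqbase]


class StringHeap:
    """Dictionary-style heap: each distinct string is stored once."""

    def __init__(self):
        self.values = []
        self.index = {}

    def encode(self, s):
        if s not in self.index:
            self.index[s] = len(self.values)
            self.values.append(s)
        return self.index[s]

    def decode(self, code):
        return self.values[code]


# A string column: the tail holds small integer codes into the heap.
name = Bat(seqbase=1000)
heap = StringHeap()
for s in ["alice", "bob", "alice"]:
    name.append(heap.encode(s))
```

Because the head is virtual, finding the value for OID 1001 is a single array access, and the repeated string occupies heap space only once.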
  11. MonetDB Front-end: SQL
      • SQL 2003
          • Parse SQL into logical n-ary relational algebra tree
          • Translate n-ary relational algebra into logical 2-ary relational algebra
          • Turn logical 2-ary plan into physical 2-ary plan (MAL program)
      • Front-end specific strategic optimization:
          • Heuristic optimization during all three previous steps
      • Primary key and distinct constraints:
          • Create and maintain hash indices
      • Foreign key constraints
          • Create and maintain foreign key join indices
  12. MonetDB Front-end: SQL
      EXPLAIN SELECT a, z FROM t, s WHERE t.c = s.x;

      function user.s2_1():void;
      barrier _73 := language.dataflow();
          _2:bat[:oid,:int] := sql.bind("sys","t","c",0);
          _7:bat[:oid,:int] := sql.bind("sys","s","x",0);
          _10 := bat.reverse(_7);
          _11 := algebra.join(_2,_10);
          _13 := algebra.markT(_11,0@0);
          _14 := bat.reverse(_13);
          _15:bat[:oid,:int] := sql.bind("sys","t","a",0);
          _17 := algebra.leftjoin(_14,_15);
          _18 := bat.reverse(_11);
          _19 := algebra.markT(_18,0@0);
          _20 := bat.reverse(_19);
          _21:bat[:oid,:int] := sql.bind("sys","s","z",0);
          _23 := algebra.leftjoin(_20,_21);
      exit _73;
          _24 := sql.resultSet(2,1,_17);
          sql.rsColumn(_24,"sys.t","a","int",32,0,_17);
          sql.rsColumn(_24,"sys.s","z","int",32,0,_23);
          _33 := io.stdout();
          sql.exportResult(_33,_24);
      end s2_1;
  13. MonetDB/5 Back-end: MAL
      • MAL: Monet Assembly Language
          • Textual interface
          • Interpreted language
      • Designed as system interface language
          • Reduced, concise syntax
          • Strict typing
          • Meant for automatic generation and parsing/rewriting/processing
          • Not meant to be typed by humans
      • Efficient parser
          • Low overhead
          • Inherent support for tactical optimization: MAL -> MAL
          • Support for optimizer plug-ins
          • Support for runtime schedulers
      • Binary-algebra core
      • Flow control (MAL is computationally complete)
  14. Processing Model (MonetDB Kernel)
      • Bulk processing:
          • Full materialization of all intermediate results
      • Binary (i.e., 2-column) algebra core:
          • select, join, semijoin, outerjoin
          • union, intersection, diff (BAT-wise & column-wise)
          • group, count, max, min, sum, avg
          • reverse, mirror, mark
      • Runtime operational optimization:
          • Choosing optimal algorithm & implementation according to input properties and system status
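To make the operator-at-a-time model concrete, here is a minimal Python sketch of how the query from slide 12 (`SELECT a, z FROM t, s WHERE t.c = s.x`) could be evaluated column by column, with every intermediate fully materialized. The helper names `join` and `fetch` and the sample data are assumptions for illustration; they only loosely mirror `algebra.join` and `algebra.leftjoin` and are not the MonetDB implementation.

```python
def join(left, right):
    """Equi-join two columns; returns all matching (left_pos, right_pos) pairs."""
    index = {}
    for rpos, value in enumerate(right):
        index.setdefault(value, []).append(rpos)
    return [(lpos, rpos)
            for lpos, value in enumerate(left)
            for rpos in index.get(value, [])]


def fetch(positions, column):
    """Positional projection: pick column values at the given positions."""
    return [column[p] for p in positions]


# Each table is a set of columns (hypothetical sample data).
t = {"a": [10, 20, 30], "c": [1, 2, 3]}
s = {"x": [3, 1, 1], "z": [7, 8, 9]}

# Step 1: one bulk join over the two join columns only.
pairs = join(t["c"], s["x"])

# Step 2: materialize the matching positions on each side.
left_pos = [l for l, _ in pairs]
right_pos = [r for _, r in pairs]

# Step 3: fetch the output columns by position.
result_a = fetch(left_pos, t["a"])
result_z = fetch(right_pos, s["z"])
```

Each step consumes and produces whole columns, which is why the generated MAL plan is a flat sequence of simple instructions with no per-tuple expression interpreter.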
  15. Processing Model (MonetDB Kernel)
      • Heavy use of code expansion to reduce cost
          • 1 algebra operator: select()
          • 3 overloaded operators: select(“=”,value), select(“between”,L,H), select(“fcn”,parm)
          • 10 operator algorithms: scan, hash-lookup, bin-search, bin-tree, pos-lookup
          • ~1500(!) routines (macro expansion): scan_range_select_oid_int(), hash_equi_select_void_str(), …
      • ~1500 selection routines
      • 149 unary operations
      • 335 join/group operations
      • ...
  16. The Software Stack
      • Front-ends: XQuery, SQL 03, SciQL, RDF
      • Optimizers
      • Back-end(s): MonetDB 4, MonetDB 5
      • Kernel: MonetDB kernel
  17. The Software Stack
      • Front-ends: XQuery, SQL 03 (strategic optimization)
      • MAL
      • Optimizers (tactical optimization: MAL -> MAL rewrites)
      • MAL
      • Back-end(s): MonetDB 4, MonetDB 5
      • Kernel: MonetDB kernel (runtime operational optimization)
  18. MonetDB vs Traditional DBMS Architecture
      • Architecture-Conscious Query Processing vs Magnetic disk I/O conscious processing
          • Data layout, algorithms, cost models
      • RISC Relational Algebra (operator-at-a-time) vs Tuple-at-a-time Iterator Model
          • Faster through simplicity: no tuple expression interpreter
      • Multi-Model: ODMG, SQL, XML/XQuery, ..., RDF/SPARQL vs Relational with Bolt-on Subsystems
          • Columns as the building block for complex data structures
      • Decoupling of Transactions from Execution/Buffering vs ARIES integrated into Execution/Buffering/Indexing
          • ACID, but not ARIES. Pay-as-you-need transaction overhead.
      • Run-Time Indexing and Query Optimization vs Static DBA/Workload-driven Optimization & Indexing
          • Extensible Optimizer Framework
          • Cracking, recycling, sampling-based runtime optimization
  19. Evolution
      “It is not the strongest of the species that survives, nor the most intelligent, but the one most responsive to change.” Charles Darwin (1809 - 1882)
  20. Agenda
      • A crash course on column-stores
      • Column stores for science applications
      • The SciQL array query language
  22. SkyServer Schema
      • 446 columns, >585 million rows
      • 6 columns, >20 billion rows
  23. Recycler: motivation & idea
      “An architecture for recycling intermediates in a column-store”. Ivanova, Kersten, Nes, Goncalves. ACM TODS 35(4), Dec. 2010
      Motivation:
      • Scientific databases, data analytics
      • Terabytes of data (observational, transactional)
      • Prevailing read-only workload
      • Ad-hoc queries with commonalities
      Background:
      • Operator-at-a-time execution paradigm
          • Automatic materialization of intermediates
      • Canonical column-store organization
          • Intermediates have reduced dimensionality and finer granularity
          • Simplified overlap analysis
      Recycling idea:
      • Instead of garbage collecting, keep the intermediates and reuse them
          • Speed up query streams with commonalities
          • Low cost and self-organization
  24. Recycler: fit into MonetDB
      Figure: the SQL and XQuery front-ends emit MAL; the Tactical Optimizer, which includes the Recycler Optimizer, rewrites it to MAL; the MonetDB Kernel executes it with Recycler Run-time Support (Admission & Eviction, Recycle Pool) inside the MonetDB Server.
      MAL plan shown both before and after the optimizer:
          function user.s1_2(A0:date, ...):void;
              X5 := sql.bind("sys","lineitem",...);
              X10 := algebra.select(X5,A0);
              X12 := sql.bindIdx("sys","lineitem",...);
              X15 := algebra.join(X10,X12);
              X25 := mtime.addmonths(A1,A2);
              ...
  25. Recycler: instruction matching
      Run time comparison of
      • instruction types
      • argument values
      Exact matching of an incoming instruction against the recycle pool:
          Y3 := sql.bind("sys","orders","o_orderdate",0);
          X1 := sql.bind("sys","orders","o_orderdate",0);
          …
      Name | Value | Data type | Size
      X1 | 10 | :bat[:oid,:date] |
      T1 | “sys” | :str |
      T2 | “orders” | :str |
      … | | |
  26. Recycler: instruction subsumption
      Incoming instruction:
          Y3 := algebra.select(X1,20,45);
      Recycle pool:
          X3 := algebra.select(X1,10,80);
          …
          X5 := algebra.select(X1,20,60);
      The figure marks X5 as the intermediate used to answer Y3.
      Name | Value | Data type | Size
      X1 | 10 | :bat[:oid,:int] | 2000
      X3 | 130 | :bat[:oid,:int] | 700
      X5 | 150 | :bat[:oid,:int] | 350
      … | | |
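The subsumption idea on slide 26 can be sketched as a small cache of range selections. The following Python is a toy under stated assumptions: the class `RecyclePool`, its fields, and the inclusive range bounds are hypothetical, and the real recycler also handles admission and eviction, which are left out here.

```python
class RecyclePool:
    """Toy pool of materialized range selections, keyed by column and range."""

    def __init__(self):
        self.entries = []            # (column_name, low, high, rows)
        self.hits = 0                # number of selections answered from the pool
        self.last_source_size = None

    def select(self, column_name, column, low, high):
        # Any stored selection on the same column whose range covers the
        # requested range can serve as input instead of the base column.
        covering = [e for e in self.entries
                    if e[0] == column_name and e[1] <= low and high <= e[2]]
        if covering:
            self.hits += 1
            # Prefer the smallest covering intermediate.
            source = min(covering, key=lambda e: len(e[3]))[3]
        else:
            source = list(enumerate(column))
        self.last_source_size = len(source)
        rows = [(oid, v) for oid, v in source if low <= v <= high]
        self.entries.append((column_name, low, high, rows))
        return rows


pool = RecyclePool()
x1 = list(range(100))                   # hypothetical base column
x3 = pool.select("X1", x1, 10, 80)      # scans the base column
x5 = pool.select("X1", x1, 20, 60)      # subsumed by x3
y3 = pool.select("X1", x1, 20, 45)      # subsumed by x3 and x5; x5 is smaller
```

In this run the third selection filters the 41 rows of the second result rather than the 100 rows of the base column, which mirrors the choice of X5 over X3 on the slide.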
  27. Recycler: SkyServer evaluation
      Sloan Digital Sky Survey / SkyServer, http://cas.sdss.org
      • 100 GB subset of DR4
      • 100-query batch from January 2008 log
      • 1.5 GB intermediates, 99% reuse
      • Join intermediates: major consumer of memory and major contributor to savings
  28. Agenda
      • A crash course on column-stores
      • Column stores for science applications
      • The SciQL array query language
  29. What is an array?
      An array is a systematic arrangement of objects addressed by dimension values.
          Get(A, X, Y, …) => Value
          Set(A, X, Y, …) <= Value
      There are many species: vector, bit array, dynamic array, parallel array, sparse array, variable length array, jagged array
  30. Who needs them anyway?
      • Seismology – partial time-series
      • Climate simulation – temporal ordered grid
      • Astronomy – temporal ordered images
      • Remote sensing – image processing
      • Social networks – graph algorithms
      • Genomics – ordered strings
      Scientists ‘love them’: MSEED, NETCDF, FITS, CSV, …
  31. Arrays in DBMS
      • Relational prototype built on arrays, Peterlee IS (1975)
      • Persistent programming languages, Astral (1980), Plain (1980)
      • Object-orientation and persistent languages were the make-believe way to handle them, O2 (1992)
  32. PostgreSQL 8.3
      Array declarations:
          CREATE TABLE sal_emp ( name text, pay_by_quarter integer[], schedule text[][] );
          CREATE TABLE tictactoe ( squares integer[3][3] );
      Array operations: denotation ([]), contains (@>), is contained in (<@), append, concat (||), dimension, lower, upper, prepend, to-string, from-string
      Array constraints: none, no enforcement of dimensions.
  33. MySQL
      From the MySQL forum, May 2010:
      “> How to store multiple values in a single field? Is there any array data type concept in mysql?
       > As Jörg said "Multiple values in a single field" would be an explicit violation of the relational model...”
      Is there any experience beyond encoding it as blobs?
  34. Rasdaman
      • Breaks large C++ arrays (rasters) into disjoint chunks
      • Maps chunks into large binary objects (blob)
      • Provides a function interface to access them
      • RASQL, a SQL92 extension
      • Known to work up to 12 TB
  35. SciDB
      • Breaks large C++ arrays (rasters) into overlapping chunks
      • Built storage manager from scratch
      • Map-reduce processing model
      • Provides a function interface to access them
      • AQL, a crippled SQL92
  36. What is the problem?
      • Appropriate array denotations?
      • Functionally complete operation set?
      • Scale?
      • Size limitations due to (blob) representations?
      • Community awareness?
  37. MonetDB SciQL
      SciQL (pronounced ‘cycle’)
      • A backward compatible extension of SQL’03
      • Symbiosis of relational and array paradigm
      • Flexible structure-based grouping
      • Capitalizes on the MonetDB physical array storage
          • Recycling, an adaptive ‘materialized view’
          • Zero-cost attachment contract for cooperative clients
      http://www.cwi.nl/~mk/SciQL.pdf
  38. Table vs arrays
      CREATE TABLE tmp
      • A collection of tuples
      • Indexed by a (primary) key
      • Default handling
      • Explicitly created using INS/UPD/DEL
  39. Table vs arrays
      CREATE TABLE tmp | CREATE ARRAY tmp
      A collection of tuples | A collection of a priori defined tuples
      Indexed by a (primary) key | Indexed by dimension expressions
      Default handling | Implicitly defined by default value
      Explicitly created using INS/UPD/DEL | To be updated with INS/DEL/UPD
  40. SciQL examples
          CREATE TABLE matrix ( x integer, y integer, value float, PRIMARY KEY (x,y) );
          INSERT INTO matrix VALUES (0,0,0),(0,1,0),(1,1,0),(1,0,0);
      Resulting table rows (x, y, value): (0,0,0), (0,1,0), (1,1,0), (1,0,0)
  41. SciQL examples
      Table:
          CREATE TABLE matrix ( x integer, y integer, value float, PRIMARY KEY (x,y) );
          INSERT INTO matrix VALUES (0,0,0),(0,1,0),(1,1,0),(1,0,0);
      Array:
          CREATE ARRAY matrix ( x integer DIMENSION[2], y integer DIMENSION[2], value float DEFAULT 0 );
      Figure: the table holds the four inserted tuples; the array holds a 2×2 block of cells with the default value 0, and cells outside the dimension bounds are null.
  42. SciQL examples
      Same table and array declarations as on slide 41, followed on both sides by:
          DELETE FROM matrix WHERE y=1
      Figure: the table loses its tuples with y=1; the array keeps its shape and the cells at y=1 become null (a hole in the array).
  43. SciQL examples
      Same table and array declarations as on slide 41, followed on both sides by:
          INSERT INTO matrix VALUES (0,1,1), (1,1,2)
      Figure: the table gains the tuples (0,1,1) and (1,1,2); in the array the cells at y=1 now hold the values 1 and 2.
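The difference between table and array semantics on slides 41 to 43 can be mimicked with a short Python sketch. The class `Array2D` and its methods are hypothetical, and the sketch assumes that `DIMENSION[2]` means the index values 0 and 1, as the slide figures suggest.

```python
class Array2D:
    """Toy bounded array: every cell exists up front and holds a default."""

    def __init__(self, nx, ny, default=0):
        self.cells = {(x, y): default for x in range(nx) for y in range(ny)}

    def delete_where(self, predicate):
        # Deleting does not remove cells; it punches holes (None).
        for (x, y) in self.cells:
            if predicate(x, y):
                self.cells[(x, y)] = None

    def insert(self, x, y, value):
        # Inserting overwrites an existing cell; the shape never changes.
        if (x, y) not in self.cells:
            raise IndexError("cell outside the dimension bounds")
        self.cells[(x, y)] = value


# Table counterpart: rows exist only once inserted, and DELETE removes them.
table = {(0, 0): 0, (0, 1): 0, (1, 1): 0, (1, 0): 0}
table = {k: v for k, v in table.items() if k[1] != 1}
rows_after_delete = len(table)

array = Array2D(2, 2, default=0)
array.delete_where(lambda x, y: y == 1)
holes = sorted(k for k, v in array.cells.items() if v is None)

array.insert(0, 1, 1)
array.insert(1, 1, 2)
```

After the delete the table has two rows left, while the array still has four cells, two of which are holes until the inserts fill them again.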
  44. SciQL unbounded arrays
      Table:
          CREATE TABLE matrix ( x integer, y integer, value float, PRIMARY KEY (x,y) );
          INSERT INTO matrix VALUES (0,2,1), (0,1,2)
      Array:
          CREATE ARRAY matrix ( x integer DIMENSION, y integer DIMENSION, value float DEFAULT 0 );
          INSERT INTO matrix VALUES (0,2,1), (0,1,2)
      Figure: the table holds the tuples (0,2,1) and (0,1,2); the array shows the same two values placed at their (x,y) positions.
  45. SciQL Dimensions
      Unbounded dimensions:
          scalar-type DIMENSION
      Bounded dimensions:
          scalar-type DIMENSION[stop]
          scalar-type DIMENSION[first: step: stop]
          scalar-type DIMENSION[*: *: *]
          timestamp DIMENSION [ timestamp ‘2010-01-19’ : *: timestamp ‘1’ minute]
  46. SciQL table queries
          CREATE ARRAY matrix ( x integer DIMENSION, y integer DIMENSION, value float DEFAULT 0 );
          -- simple checker boarding aggregation
          SELECT sum(value) FROM matrix WHERE (x + y) % 2 = 0
  47. SciQL array queries
          CREATE ARRAY matrix ( x integer DIMENSION, y integer DIMENSION, value float DEFAULT 0 );
          -- group based aggregation to construct an unbounded vector
          SELECT [x], sum(value) FROM matrix WHERE (x + y) % 2 = 0 GROUP BY x;
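For readers who want to check what the two queries on slides 46 and 47 compute, here is a minimal Python equivalent over a small 4×4 array. The sample values (`4 * x + y`) are an assumption chosen only to make the results easy to verify by hand; the dictionary stands in for the SciQL array.

```python
# Hypothetical 4x4 array: cell (x, y) holds the value 4 * x + y.
matrix = {(x, y): float(4 * x + y) for x in range(4) for y in range(4)}

# Slide 46: one scalar over the "black squares" of the checkerboard.
checker_sum = sum(v for (x, y), v in matrix.items() if (x + y) % 2 == 0)

# Slide 47: the same filter, grouped by x; the result is itself a
# one-dimensional array indexed by x (written [x] in SciQL).
by_x = {}
for (x, y), v in matrix.items():
    if (x + y) % 2 == 0:
        by_x[x] = by_x.get(x, 0.0) + v
```

The first query collapses the array to a single number; the second keeps `x` as a dimension, so its result can be used as an array in further queries.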
  48. SciQL array views
          CREATE ARRAY vmatrix ( x integer DIMENSION[-1:5], y integer DIMENSION[-1:5], value float DEFAULT -1 )
          AS SELECT x, y, value FROM matrix;
      Figure:
          -1 -1 -1 -1
          -1  0  0 -1
          -1  0  0 -1
          -1 -1 -1 -1
  49. SciQL tiling examples
      Figure (anchor point marked in the figure):
          V0,3 V1,3 V2,3 V3,3
          V0,2 V1,2 V2,2 V3,2
          V0,1 V1,1 V2,1 V3,1
          V0,0 V1,0 V2,0 V3,0
      Query:
          SELECT x, y, avg(value) FROM matrix GROUP BY matrix[x:1:x+2][y:1:y+2];
  50. SciQL tiling examples
      Figure (anchor point marked in the figure):
          V0,3 V1,3 V2,3 V3,3
          V0,2 V1,2 V2,2 V3,2
          V0,1 V1,1 V2,1 V3,1
          V0,0 V1,0 V2,0 V3,0
      Query:
          SELECT x, y, avg(value) FROM matrix GROUP BY DISTINCT matrix[x:1:x+2][y:1:y+2];
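The two tiling queries on slides 49 and 50 differ only in the `DISTINCT` keyword. The Python sketch below illustrates one plausible reading, stated here as an assumption: `[x:1:x+2]` selects positions `x` and `x+1` (exclusive stop), plain `GROUP BY` anchors a tile at every cell so tiles overlap, `GROUP BY DISTINCT` produces non-overlapping tiles, and cells outside the array are skipped in the average. The function names and sample values are hypothetical.

```python
def tile_avg(matrix, x, y, size=2):
    """Average of the size x size tile anchored at (x, y), ignoring missing cells."""
    cells = [matrix[(i, j)]
             for i in range(x, x + size)
             for j in range(y, y + size)
             if (i, j) in matrix]
    return sum(cells) / len(cells)


def overlapping_tiles(matrix, n, size=2):
    # One tile per cell: every (x, y) acts as an anchor point.
    return {(x, y): tile_avg(matrix, x, y, size)
            for x in range(n) for y in range(n)}


def distinct_tiles(matrix, n, size=2):
    # Non-overlapping tiles: anchors advance by the tile size.
    return {(x, y): tile_avg(matrix, x, y, size)
            for x in range(0, n, size) for y in range(0, n, size)}


# Hypothetical 4x4 array: cell (x, y) holds the value 4 * x + y.
n = 4
matrix = {(x, y): float(4 * x + y) for x in range(n) for y in range(n)}

overlap = overlapping_tiles(matrix, n)
distinct = distinct_tiles(matrix, n)
```

With overlapping tiles the result has the same shape as the input (16 averages), which suits smoothing; with distinct tiles the 4×4 array is reduced to 4 averages, which suits down-sampling.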
  51. SciQL tiling examples
      Figure (anchor point marked in the figure; tile cells that fall outside the array are null):
          V0,3 V1,3 V2,3 V3,3
          V0,2 V1,2 V2,2 V3,2
          V0,1 V1,1 V2,1 V3,1
          V0,0 V1,0 V2,0 V3,0
      Query:
          SELECT x, y, avg(value)
          FROM matrix
          GROUP BY DISTINCT matrix[x-1:1:x+1][y:1:y+2];
  52. SciQL tiling examples
      Figure (anchor point marked in the figure):
          V0,3 V1,3 V2,3 V3,3
          V0,2 V1,2 V2,2 V3,2
          V0,1 V1,1 V2,1 V3,1
          V0,0 V1,0 V2,0 V3,0
      Query:
          SELECT x, y, avg(value) FROM matrix
          GROUP BY matrix[x][y], matrix[x-1][y], matrix[x+1][y], matrix[x][y-1], matrix[x][y+1];
  53. SciQL, A Query Language for Science Applications
      • Seamless integration of array-, set-, and sequence-semantics.
      • Dimension constraints as a declarative means for indexed access to array cells.
      • Structural grouping to generalize the value-based grouping towards selective access to groups of cells based on positional relationships for aggregation.
  54. Seismology use case
      Rietbrock: Chile earthquake
      • 2TB of wave fronts
      • filter by sta/lta
      • remove false positives
      • window-based 3 min cuts
      • heuristic tests
      • interactive response required
      How can a database system help? Scanning 2TB on a modern PC takes >3 hours
  55. Use case, a SciQL dream
      Rietbrock: Chile earthquake
          create array mseed (
              tick timestamp dimension[timestamp ‘2010’:*],
              data decimal(8,6),
              station string );
  56. Use case, a SciQL dream
      Rietbrock: … filter by sta/lta
          --- average by window of 5 seconds
          select A.tick, avg(A.data)
          from mseed A
          group by A[tick:1:tick + 5 seconds]
  57. Use case, a SciQL dream
      Rietbrock: … filter by sta/lta
          select A.tick
          from mseed A, mseed B
          where A.tick = B.tick
          and avg(A.data) / avg(B.data) > delta
          group by A[tick:tick + 5 seconds], B[tick:tick + 15 seconds]
  58. Use case, a SciQL dream
      Rietbrock: … filter by sta/lta
          create view candidates( station string, tick timestamp, ratio float ) as
          select A.station, A.tick, avg(A.data) / avg(B.data) as ratio
          from mseed A, mseed B
          where A.tick = B.tick
          and avg(A.data) / avg(B.data) > delta
          group by A[tick:tick + 5 seconds], B[tick:tick + 15 seconds]
  59. Use case, a SciQL dream
      Rietbrock: … remove false positives
          -- remove isolated errors by direct environment
          -- using wave propagation statistics
          create table neighbors( head string, tail string, delay timestamp, weight float)
  60. Use case, a SciQL dream
      Rietbrock: … remove false positives
          select A.tick, B.tick
          from candidates A, candidates B, neighbors N
          where A.station = N.head
          and B.station = N.tail
          and B.tick = A.tick + N.delay
          and B.ratio * N.weight < A.ratio;
  61. Use case, a SciQL dream
      Rietbrock: … remove false positives
          delete from candidates
          select A.tick
          from candidates A, candidates B, neighbors N
          where A.station = N.head
          and B.station = N.tail
          and B.tick = A.tick + N.delay
          and B.ratio * N.weight < A.ratio;
  62. Use case, a SciQL dream
      Rietbrock: … window-based 3 min cuts … heuristic tests
          select B.station, myfunction(B.data)
          from candidates A, mseed B
          where A.tick = B.tick
          group by distinct B[tick:tick + 3 minutes];
          -- using a User Defined Function written in C.
  63. Use case
      Rietbrock: … interactive response required …
      The query over 2TB of seismic data will be handled before he finishes his coffee.
  64. Status
      • The language definition is ‘finished’
      • The grammar is included in SQL parser
      • Semantic checks added to SQL parser
      • A test suite is being built
      • Runtime support features and software stack
      • …
      • Exposure to real life cases and external libraries
  67. Science DBMS landscape
      Feature | MonetDB 5.23 | SciDB 0.5 | Rasdaman
      Architecture | Server approach | Server approach | Plugin (Oracle, DB2, Informix, Mysql, Postgresql)
      Open source | Mozilla License | GPL 3.0, Commercial | GPL 3.0, Dual license
      Downloads | >12.000 /month | Tens up to now | ??
      SQL | SQL 2003 | ?? | SQL92++
      Interoperability | {JO}DBC, C(++), Python, … | C++ UDF | C++, Java, OGC
      Array language | SciQL | AQL | RASQL
      Array model | Fixed+variable bounds | Fixed arrays | Fixed+variable bounds
      Science | Linked libraries | Linked libraries | Linked libraries
      Foreign files | Vaults of csv, FITS, NETCDF, MSEED | ?? | Tiff, png, jpg.., csv, NETCDF, HDF4
      Distribution | 50-200 node cluster | 4 node cluster | 20-node
      Distribution tech | Dynamic partial replication | Static fragmentation | Static fragmentation
      Executor | Various schemes | Map-reduce | Tile streaming
      Largest demo | Skyserver SDSS 6, 3TB | --- | 12TB, IGN-F (on Postgresql)
      Storage tuning | Query adaptive | Schema definitions | Workload driven
      Optimization | Heuristics + cost based | ?? | Heuristics + cost based
