Finding something different:
        Arrays in database systems,
             the next frontier ?

                   Martin Kersten
                       CWI



© M Kersten 2012
Science applications




Public database of 4-40 TB
Relational schema of around 200 pages SQL
Relational tables up to 20B elements
Finding closely related sky objects

[Figure: one table of 446 columns x >585 million rows, one table of 6 columns x >20 billion rows]

The LOFAR radio telescope
 Complex image processing pipeline (Blue Gene)
 Transient Sky Objects database (50TB/yr)
 Finding transients within 4 seconds timeframe




Data warehouse of seismic data
Highly compressed file repository
 (>3.5M files and 15-150 TB)
About to explode due to sensor networks
Finding warning signals




Remote sensing
Processing pipeline to interpret images < 1 TB/yr


Finding and detecting forest fires




[Diagram labels: Matlab, Python, C, R; RDBMS, SQL, *-API; SciQL; interdependent software libraries; FITS, mSEED, geoTIFF, HDF5, NetCDF, …; Datavault]

Agenda

Array support in database systems


SciQL array query language


A crash course on column-stores


SciQL implementation approach


What is an array?
An array is a systematic arrangement of objects
 addressed by dimension values.
      Get(A, X, Y,…) => Value
      Set(A, X, Y,…) <= Value


There are many species:
 vector, bit array, dynamic array, parallel array,
 sparse array, variable length array, jagged array
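
The Get/Set pair above can be sketched in Python as a mapping from dimension values to cell values. This is only an illustration: the class name `DenseArray`, its `default` parameter, and the bounds check are assumptions made for the example, not part of any system discussed in the talk.

```python
from itertools import product


class DenseArray:
    """A bounded array addressed by integer dimension values (illustrative)."""

    def __init__(self, shape, default=None):
        self.shape = tuple(shape)
        # Every cell exists up front and starts out holding the default value.
        self.cells = {idx: default for idx in product(*(range(n) for n in self.shape))}

    def _check(self, idx):
        if idx not in self.cells:
            raise IndexError(f"{idx} is outside the array bounds {self.shape}")

    def get(self, *idx):
        """Get(A, X, Y, ...) => Value"""
        self._check(idx)
        return self.cells[idx]

    def set(self, *idx_and_value):
        """Set(A, X, Y, ...) <= Value (the value is the last argument)."""
        *idx, value = idx_and_value
        idx = tuple(idx)
        self._check(idx)
        self.cells[idx] = value


a = DenseArray((2, 2), default=0)
a.set(1, 0, 42)
print(a.get(1, 0), a.get(0, 1))  # prints: 42 0
```

The other species listed (sparse, jagged, dynamic) would differ mainly in which cells exist and how the bounds are fixed.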



Who needs them anyway ?
Seismology         – partial time-series
Climate simulation – temporal ordered grid
Astronomy          – temporal ordered images
Remote sensing     – image processing
Social networks    – graph algorithms
Genomics           – ordered strings
Forensics          – images, strings, graphs
Scientists ‘love them’: MSEED, NETCDF, FITS, CSV, …

Arrays in DBMS
Relational prototype built on arrays, Peterlee IS Vehicle (1975)

Persistent programming languages, Astral (1980), Plain (1980)

Object orientation and persistent languages were the make-believe answer for handling them, O2 (1992)

Several array algebras: AML (2002), AQuery (2003), RAM (2004), SRAM (2012)

PostgreSQL
Array declarations:
CREATE TABLE sal_emp (name text, pay_by_quarter integer[], schedule text[][]);
CREATE TABLE tictactoe (squares integer[3][3]);

Array operations: denotation ([]), contains (@>), is contained in (<@), append, concat (||), dimension, lower, upper, prepend, to-string, from-string, …

Array constraints: none, no enforcement of dimensions.

SQL 2003
Arrays are attribute type constructors
Arrays can be declared without a maximum cardinality
Array nesting is unrestricted.
Query results can be converted into arrays.


CREATE TABLE listbox( choices CHAR(3) ARRAY[1000] NOT NULL);
INSERT INTO listbox_choices
VALUES( 'Department Names',
ARRAY(SELECT name FROM sales.depts ORDER BY 1));




Rasdaman
Breaks large C++ arrays (rasters) into disjoint chunks

Maps chunks into large binary objects (blobs)

Provides a function interface to access them

RASCAL, a SQL92 extension

Known to work up to 12 TB.

SciDB
Breaks large C++ arrays (rasters) into overlapping chunks

Storage manager built from scratch

Map-reduce processing model

Provides a function interface to access them

AQL, a crippled SQL92

What is the problem?

-  Appropriate array denotations? Query language
-  Functionally complete operation set?
-  Mature implementations? Systems
-  Size limitations due to (blob) representations ?
-  Scale out?
-  Community awareness? Education



Agenda

Array support in database systems


SciQL array query language


A crash course on column-stores


SciQL implementation approach


MonetDB SciQL

SciQL (pronounced ‘cycle’ )
•  A backward compatible extension of SQL’03
•  Symbiosis of relational and array paradigm
•  Flexible structure-based grouping
•  Capitalizes on the MonetDB physical array storage
  •  Recycling, an adaptive ‘materialized view’
  •  Zero-cost attachment contract for cooperative clients
                   http://www.scilens.org/Resources/SciQL


Table vs arrays

CREATE TABLE tmp                        CREATE ARRAY tmp
A collection of tuples                  A collection of a priori defined tuples
Indexed by a (primary) key              Indexed by dimension expressions
Default handling                        Implicitly defined by default value
Explicitly created using INS/UPD/DEL    To be updated with INS/DEL/UPD

SciQL examples
CREATE TABLE matrix (             CREATE ARRAY matrix (
  x integer,                        x integer DIMENSION[2],
  y integer,                        y integer DIMENSION[2],
  value float,                      value float DEFAULT 0 );
  PRIMARY KEY (x,y) );

INSERT INTO matrix VALUES
(0,0,0),(0,1,0),(1,1,0),(1,0,0);

Table rows (x, y, value):         Array cells (null outside the bounds):
  0  0  0                           y=1 |  0   0
  0  1  0                           y=0 |  0   0
  1  1  0                                 x=0 x=1
  1  0  0

SciQL examples
CREATE TABLE matrix (             CREATE ARRAY matrix (
  x integer,                        x integer DIMENSION[2],
  y integer,                        y integer DIMENSION[2],
  value float,                      value float DEFAULT 0 );
  PRIMARY KEY (x,y) );

DELETE FROM matrix WHERE y=1;     DELETE FROM matrix WHERE y=1;
                                  A hole in the array

Table rows (x, y, value):         Array cells:
  0  0  0                           y=1 |  null null
  1  0  0                           y=0 |  0    0
                                          x=0  x=1

SciQL examples
CREATE TABLE matrix (             CREATE ARRAY matrix (
  x integer,                        x integer DIMENSION[2],
  y integer,                        y integer DIMENSION[2],
  value float,                      value float DEFAULT 0 );
  PRIMARY KEY (x,y) );

INSERT INTO matrix VALUES         INSERT INTO matrix VALUES
(0,1,1), (1,1,2);                 (0,1,1), (1,1,2);

Table rows (x, y, value):         Array cells:
  0  0  0                           y=1 |  1   2
  1  0  0                           y=0 |  0   0
  0  1  1                                 x=0 x=1
  1  1  2

SciQL unbounded arrays
CREATE TABLE matrix (             CREATE ARRAY matrix (
  x integer,                        x integer DIMENSION,
  y integer,                        y integer DIMENSION,
  value float,                      value float DEFAULT 0 );
  PRIMARY KEY (x,y) );

INSERT INTO matrix VALUES         INSERT INTO matrix VALUES
(0,2,1), (0,1,2);                 (0,2,1), (0,1,2);

Table rows (x, y, value):         Array cells as drawn on the slide:
  0  2  1                           y=2 |  1   0
  0  1  2                           y=1 |  0   0
                                    y=0 |  0   2
                                          x=0 x=1

SciQL Dimensions
Unbounded Dimensions
  scalar-type DIMENSION


Bounded Dimensions
  scalar-type DIMENSION[stop]
  scalar-type DIMENSION[first: step: stop]
  scalar-type DIMENSION[*: *: *]


timestamp DIMENSION['2010-01-19' : '1' minute : *]
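
One way to read the bounded forms is as range descriptions. The Python sketch below enumerates the index values of such a dimension. It assumes the stop value is exclusive, which is inferred from DIMENSION[2] yielding the indices 0 and 1 in the matrix examples; the helper name `dimension_values` is hypothetical.

```python
from datetime import datetime, timedelta


def dimension_values(first, step, stop):
    """Enumerate the index values of a bounded dimension [first : step : stop).

    Assumption: the stop value is exclusive, as suggested by DIMENSION[2]
    yielding the indices 0 and 1 in the earlier matrix examples.
    """
    values = []
    current = first
    while current < stop:
        values.append(current)
        current = current + step
    return values


# integer DIMENSION[2] read as [0 : 1 : 2)
print(dimension_values(0, 1, 2))   # prints: [0, 1]

# integer dimension read as [-1 : 1 : 5)
print(dimension_values(-1, 1, 5))  # prints: [-1, 0, 1, 2, 3, 4]

# A timestamp dimension with a one-minute step; the stop is chosen here only
# to keep the example finite (the slide leaves it open with '*').
start = datetime(2010, 1, 19)
ticks = dimension_values(start, timedelta(minutes=1), start + timedelta(minutes=3))
print([t.strftime("%H:%M") for t in ticks])  # prints: ['00:00', '00:01', '00:02']
```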

SciQL table queries
-- Dimension names make query formulation easier
CREATE ARRAY matrix (
  x integer DIMENSION,
  y integer DIMENSION,
  value float DEFAULT 0 );


-- simple checker boarding aggregation
SELECT sum(value) FROM matrix WHERE (x + y) % 2 = 0
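
To make the checkerboard predicate concrete, here is a small Python emulation of the query over a 4 x 4 array. The dictionary representation and the cell values are made up for illustration; this is not how SciQL stores or evaluates arrays.

```python
# A 4 x 4 matrix stored as {(x, y): value}; the cell values are made up.
matrix = {(x, y): float(x + 10 * y) for x in range(4) for y in range(4)}

# SELECT sum(value) FROM matrix WHERE (x + y) % 2 = 0
checker_sum = sum(v for (x, y), v in matrix.items() if (x + y) % 2 == 0)
print(checker_sum)  # prints: 132.0
```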




SciQL array queries
CREATE ARRAY matrix (           CREATE ARRAY result(
  x integer DIMENSION,            x integer DIMENSION,
  y integer DIMENSION,            value float DEFAULT 0 );
  value float DEFAULT 0 );



-- group based aggregation to construct an unbounded vector
SELECT [x], sum(value) FROM matrix
  WHERE (x + y) % 2 = 0
  GROUP BY x;
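
The grouped variant can be emulated the same way. In this sketch the bracketed [x] is read as "x becomes the dimension of the result vector", which is an interpretation of the slide rather than a statement of the language definition; the data is again made up.

```python
from collections import defaultdict

matrix = {(x, y): float(x + 10 * y) for x in range(4) for y in range(4)}

# SELECT [x], sum(value) FROM matrix WHERE (x + y) % 2 = 0 GROUP BY x;
result = defaultdict(float)
for (x, y), value in matrix.items():
    if (x + y) % 2 == 0:
        result[x] += value

# The result is a one-dimensional array indexed by x.
vector = dict(sorted(result.items()))
print(vector)  # prints: {0: 20.0, 1: 42.0, 2: 24.0, 3: 46.0}
```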

SciQL array views
CREATE ARRAY vmatrix (
  x integer DIMENSION[-1:5],
  y integer DIMENSION[-1:5],
  value float DEFAULT -1 )
AS SELECT x, y, value FROM matrix;


                   -1   -1   -1    -1
                   -1   0      0   -1
                   -1   0      0   -1
                   -1   -1   -1    -1
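
The effect of the view can be sketched in Python as embedding the base array in a larger grid whose missing cells take the view's default. The helper `array_view` is hypothetical, and the bounds -1..2 are picked only to reproduce the 4 x 4 picture above.

```python
# The 2 x 2 base matrix from the earlier examples, all cells 0.
matrix = {(x, y): 0.0 for x in range(2) for y in range(2)}


def array_view(source, lo, hi, default):
    """Embed `source` in a square grid over [lo, hi) x [lo, hi), filling gaps with `default`."""
    return {
        (x, y): source.get((x, y), default)
        for x in range(lo, hi)
        for y in range(lo, hi)
    }


# Bounds -1..2 are chosen to reproduce the 4 x 4 picture on the slide.
vmatrix = array_view(matrix, -1, 3, -1.0)

for y in range(2, -2, -1):  # print the top row (largest y) first
    print(" ".join(f"{vmatrix[(x, y)]:4.0f}" for x in range(-1, 3)))
```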



SciQL tiling examples
                   V0,3   V1,3   V2,3   V3,3


                   V0,2   V1,2   V2,2   V3,2


                   V0,1   V1,1   V2,1   V3,1

Anchor
Point              V0,0   V1,0   V2,0   V3,0




           SELECT x, y, avg(value)
           FROM matrix
           GROUP BY matrix[x : 1 : x+2][y : 1 : y+2];
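
A Python sketch of this structural grouping: every cell acts as an anchor point and is paired with the average of its 2 x 2 tile. It assumes that cells outside the array are null and are skipped, as SQL's avg() skips NULLs; the function name `tile_avg` and the data are illustrative.

```python
def tile_avg(matrix, x, y, width=2, height=2):
    """Average of the cells in [x, x+width) x [y, y+height) around anchor (x, y).

    Cells outside the array are treated as null and skipped, mirroring how
    SQL's avg() ignores NULLs (an assumption about the border behaviour).
    """
    cells = [
        matrix[(i, j)]
        for i in range(x, x + width)
        for j in range(y, y + height)
        if (i, j) in matrix
    ]
    return sum(cells) / len(cells) if cells else None


# 4 x 4 matrix V[x, y] with made-up values x + 10 * y.
matrix = {(x, y): float(x + 10 * y) for x in range(4) for y in range(4)}

# One result cell per anchor point: the tiles overlap.
result = {(x, y): tile_avg(matrix, x, y) for (x, y) in matrix}
print(result[(0, 0)], result[(3, 3)])  # prints: 5.5 33.0
```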


SciQL tiling examples
                   V0,3   V1,3   V2,3   V3,3


                   V0,2   V1,2   V2,2   V3,2


                   V0,1   V1,1   V2,1   V3,1

Anchor
Point              V0,0   V1,0   V2,0   V3,0




         SELECT x, y, avg(value)
         FROM matrix
         GROUP BY DISTINCT matrix[x:1:x+2][y:1:y+2];
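
With DISTINCT the tiles no longer overlap. The sketch below assumes that this means a partition of the array into disjoint 2 x 2 tiles anchored at the origin; that reading, the helper name `distinct_tiles`, and the data are assumptions for illustration.

```python
matrix = {(x, y): float(x + 10 * y) for x in range(4) for y in range(4)}


def distinct_tiles(matrix, size=2):
    """Average each non-overlapping size x size tile, keyed by its anchor point."""
    tiles = {}
    for (x, y), value in matrix.items():
        anchor = (x - x % size, y - y % size)  # lower-left corner of the tile
        tiles.setdefault(anchor, []).append(value)
    return {anchor: sum(vs) / len(vs) for anchor, vs in tiles.items()}


result = distinct_tiles(matrix)
print(sorted(result.items()))
# prints: [((0, 0), 5.5), ((0, 2), 25.5), ((2, 0), 7.5), ((2, 2), 27.5)]
```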


SciQL tiling examples
                   V0,3   V1,3   V2,3   V3,3

       Anchor
       Point       V0,2   V1,2   V2,2   V3,2


                   V0,1   V1,1   V2,1   V3,1
           null

                   V0,0   V1,0   V2,0   V3,0
           null                                null



     SELECT x, y, avg(value)
     FROM matrix
     GROUP BY DISTINCT matrix[x-1:1:x+1][y:1:y+2];


SciQL tiling examples
                   V0,3   V1,3   V2,3   V3,3

  Anchor
  Point            V0,2   V1,2   V2,2   V3,2


                   V0,1   V1,1   V2,1   V3,1


                   V0,0   V1,0   V2,0   V3,0




           SELECT x, y, avg(value)
           FROM matrix
           GROUP BY matrix[x][y],
            matrix[x-1][y], matrix[x+1][y],
            matrix[x][y-1], matrix[x][y+1];
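
The last form enumerates individual cells, here the anchor and its four direct neighbours. A Python sketch under the same assumptions as before (missing cells are skipped, data is made up, `cross_avg` is a hypothetical helper):

```python
def cross_avg(matrix, x, y):
    """Average of cell (x, y) and its four direct neighbours, skipping missing cells."""
    offsets = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]
    cells = [matrix[(x + dx, y + dy)] for dx, dy in offsets if (x + dx, y + dy) in matrix]
    return sum(cells) / len(cells)


matrix = {(x, y): float(x + 10 * y) for x in range(4) for y in range(4)}
result = {(x, y): cross_avg(matrix, x, y) for (x, y) in matrix}

# An interior cell averages five values, a corner cell only three.
print(result[(1, 1)], result[(0, 0)])
```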
SciQL, A Query Language for Science Applications


•  Seamless integration of array, set, and sequence semantics.
•  Dimension constraints as a declarative means for
   indexed access to array cells.
•  Structural grouping to generalize the value-based
   grouping towards selective access to groups of cells
   based on positional relationships for aggregation.




Agenda
Array support in database systems

SciQL array query language

Use-case exercise

A crash course on column-stores

SciQL implementation approach

Seismology use case
Rietbrock: Chile earthquake
  … 2TB of wave fronts
  … filter by sta/lta
  … remove false positives
  … window-based 3 min cuts
  … heuristic tests
  … interactive response required …


How can a database system help?
  Scanning 2 TB on a modern PC takes >3 hours
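
The scan-time claim follows from simple arithmetic. The sketch below assumes a sequential read rate of 150 MB/s, which is an illustrative figure and not one given in the talk; any rate below roughly 185 MB/s puts a single pass over 2 TB above three hours.

```python
def scan_hours(size_bytes, throughput_bytes_per_s):
    """Time in hours for one sequential pass over the data."""
    return size_bytes / throughput_bytes_per_s / 3600


TB = 10**12
MB = 10**6

# 150 MB/s is an assumed single-disk sequential read rate, not a figure from the talk.
print(round(scan_hours(2 * TB, 150 * MB), 2))  # prints: 3.7
```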

Use case, a SciQL dream
Rietbrock: Chile earthquake
create array mseed (
 tick    timestamp dimension['2010':*],
 data    decimal(8,6),
 station string );




Use case, a SciQL dream
Rietbrock: … filter by sta/lta


-- average by window of 5 seconds
select A.tick, avg(A.data)
from mseed A
group by A[tick:1:tick + 5 seconds]




Use case, a SciQL dream
Rietbrock: … filter by sta/lta
select A.tick
from mseed A, mseed B
where A.tick = B.tick
and avg(A.data) / avg(B.data) > delta
group by A[tick:tick + 5 seconds],
  B[tick:tick + 15 seconds]
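
The sta/lta filter compares a short-term average with a long-term average. A minimal Python sketch of the query above, assuming one sample per second, non-negative amplitudes, windows that both start at the current tick (as written in the query), and a made-up threshold `delta`; the trace is synthetic.

```python
def window_avg(samples, start, length):
    """Average of samples[start : start + length]; shorter at the end of the trace."""
    window = samples[start:start + length]
    return sum(window) / len(window)


def sta_lta_triggers(samples, short=5, long=15, delta=2.0):
    """Return the ticks where short-window avg / long-window avg exceeds delta.

    Both windows start at the same tick, as in the query above. One sample per
    second and non-negative amplitudes are assumed; delta is a made-up threshold.
    """
    triggers = []
    for tick in range(len(samples)):
        lta = window_avg(samples, tick, long)
        if lta > 0 and window_avg(samples, tick, short) / lta > delta:
            triggers.append(tick)
    return triggers


# Quiet background with a short burst starting at tick 20 (synthetic data).
trace = [1.0] * 20 + [30.0] * 5 + [1.0] * 20
print(sta_lta_triggers(trace))
```

On this trace the ticks around the burst are reported and the quiet stretches are not.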



Use case, a SciQL dream
Rietbrock: … filter by sta/lta
create view candidates(
  station string,
  tick timestamp,
  ratio float ) as
select A.station, A.tick, avg(A.data) / avg(B.data) as ratio
  from mseed A, mseed B
  where A.tick = B.tick
  and avg(A.data) / avg(B.data) > delta
  group by A[tick:tick + 5 seconds],
   B[tick:tick + 15 seconds]
Use case, a SciQL dream
Rietbrock: … remove false positives
-- remove isolated errors by direct environment
-- using wave propagation statistics

create table neighbors(
  head string,
  tail string,
  delay timestamp,
  weight float)

Use case, a SciQL dream
Rietbrock: … remove false positives
select A.tick, B.tick
  from candidates A, candidates B, neighbors N
 where A.station = N.head
 and B.station = N.tail
 and B.tick = A.tick + N.delay
 and B.ratio * N.weight < A.ratio;




Use case, a SciQL dream
Rietbrock: … remove false positives
delete from candidates
 where tick in (
  select A.tick
  from candidates A, candidates B, neighbors N
  where A.station = N.head
  and B.station = N.tail
  and B.tick = A.tick + N.delay
  and B.ratio * N.weight < A.ratio );



Use case, a SciQL dream
Rietbrock: … window-based 3 min cuts
  … heuristic tests


select B.station, myfunction(B.data)
  from candidates A, mseed B
 where A.tick = B.tick
 group by distinct B[tick:tick + 3 minutes];


-- using a User Defined Function written in C.

Agenda

Array support in database systems


SciQL array query language


A crash course on column-stores


SciQL implementation approach


Storing Relations in MonetDB




[Figure: five columns, each with a Void head whose values start at 1000]

Virtual OID: seqbase=1000 (increment=1)

BAT Data Structure




BAT: binary association table, with a Head and a Tail column
BUN: binary unit

Head & Tail (BUN heap):
  - consecutive memory block (array)
  - memory-mapped files

Tail heap:
  - best-effort duplicate elimination for strings (~ dictionary encoding)

Hash tables, T-trees, R-trees, ...

Processing Model (MonetDB Kernel)

•  Bulk processing:
  •  full materialization of all intermediate results

•  Binary (i.e., 2-column) algebra core:
  •  select, join, semijoin, outerjoin
  •  union, intersection, diff (BAT-wise & column-wise)
  •  group, count, max, min, sum, avg
  •  reverse, mirror, mark

•  Runtime operational optimization:
  •  Choosing optimal algorithm & implementation according to input properties and system status


The Software Stack

Layer          Component         Optimization
Front-ends     SQL 03            Strategic optimization
Optimizers                       Tactical optimization: MAL -> MAL rewrites
Back-end(s)    MonetDB 5
Kernel         MonetDB kernel    Runtime operational optimization

MAL is the interface between front-ends, optimizers, and back-end.

MonetDB Front-end: SQL
    EXPLAIN SELECT a, z FROM t, s WHERE t.c = s.x;
                   function user.s2_1():void;
                   barrier _73 := language.dataflow();
                     _2:bat[:oid,:int] := sql.bind("sys","t","c",0);
                     _7:bat[:oid,:int] := sql.bind("sys","s","x",0);
                     _10 := bat.reverse(_7);
                     _11 := algebra.join(_2,_10);
                     _13 := algebra.markT(_11,0@0);
                     _14 := bat.reverse(_13);
                     _15:bat[:oid,:int] := sql.bind("sys","t","a",0);
                     _17 := algebra.leftjoin(_14,_15);
                     _18 := bat.reverse(_11);
                     _19 := algebra.markT(_18,0@0);
                     _20 := bat.reverse(_19);
                     _21:bat[:oid,:int] := sql.bind("sys","s","z",0);
                     _23 := algebra.leftjoin(_20,_21);
                   exit _73;
                     _24 := sql.resultSet(2,1,_17);
                     sql.rsColumn(_24,"sys.t","a","int",32,0,_17);
                     sql.rsColumn(_24,"sys.s","z","int",32,0,_23);
                     _33 := io.stdout();
                     sql.exportResult(_33,_24);
                   end s2_1;
Agenda

Array support in database systems


SciQL array query language


A crash course on column-stores


SciQL implementation approach


SciQL implementation
•  Use the complete MonetDB software stack
  •  Extend the SQL catalog to support SciQL
  •  Extend the Kernel to support array processing
  •  Extend the optimizer stack for performance


•  Aim for a functional implementation first
  •  Use tabular representation of arrays
  •  Reuse the SQL code generator




Slicing a portion of an array is a ‘selection’




It works




Conclusions
•  The language definition is ‘finished’
•  Functional prototype is ‘around the corner’
•  Exposure to real life cases and external libraries
•  MonetDB’s core technology was essential
•  Challenge: ARRAYS, FILES

Science DBMS landscape
                    MonetDB 5.23                        SciDB 0.5              Rasdaman
Architecture        Server approach                     Server approach        Plugin (Oracle, DB2, Informix, MySQL, PostgreSQL)
Open source         Mozilla License                     GPL 3.0 Commercial     GPL 3.0 Dual license
Downloads           >12,000/month                       Tens up to now         ??
SQL                 SQL 2003                            ??                     SQL92++
Interoperability    {JO}DBC, C(++), Python, …           C++ UDF                C++, Java, OGC
Array language      SciQL                               AQL                    RASQL
Array model         Fixed+variable bounds               Fixed arrays           Fixed+variable bounds
Science             Linked libraries                    Linked libraries       Linked libraries
Foreign files       Vaults of csv, FITS, NETCDF, MSEED  ??                     Tiff, png, jpg, …, csv, NETCDF, HDF4, …
Distribution        50-200 node cluster                 4 node cluster         20-node
Distribution tech   Dynamic partial replication         Static fragmentation   Static fragmentation
Executor            Various schemes                     Map-reduce             Tile streaming
Largest demo        Skyserver SDSS 6 3TB                ---                    12TB, IGN-F (on PostgreSQL)
Storage tuning      Query adaptive                      Schema definitions     Workload driven
Optimization        Heuristics + cost based             ??                     Heuristics + cost based

Towards Parallel Nonmonotonic Reasoning with Billions of FactsTowards Parallel Nonmonotonic Reasoning with Billions of Facts
Towards Parallel Nonmonotonic Reasoning with Billions of Facts
PlanetData Network of Excellence
 
Automation in Cytomics: A Modern RDBMS Based Platform for Image Analysis and ...
Automation in Cytomics: A Modern RDBMS Based Platform for Image Analysis and ...Automation in Cytomics: A Modern RDBMS Based Platform for Image Analysis and ...
Automation in Cytomics: A Modern RDBMS Based Platform for Image Analysis and ...
PlanetData Network of Excellence
 
Heuristic based Query Optimisation for SPARQL
Heuristic based Query Optimisation for SPARQLHeuristic based Query Optimisation for SPARQL
Heuristic based Query Optimisation for SPARQL
PlanetData Network of Excellence
 
Adaptive Semantic Data Management Techniques for Federations of Endpoints
Adaptive Semantic Data Management Techniques for Federations of EndpointsAdaptive Semantic Data Management Techniques for Federations of Endpoints
Adaptive Semantic Data Management Techniques for Federations of Endpoints
PlanetData Network of Excellence
 

Arrays in Databases, the next frontier?

  • 1. Finding something different: Arrays in database systems, the next frontier ? Martin Kersten CWI © M Kersten 2012
  • 3. Public database of 4-40 TB Relational schema of around 200 pages SQL Relational tables up to 20B elements Finding closely related sky objects 446 columns, >585 million rows; 6 columns, >20 billion rows
  • 4. The LOFAR radio telescope Complex image processing pipeline (Blue-gene ) Transient Sky Objects database (50TB/yr) Finding transients within 4 seconds timeframe © M Kersten 2012
  • 5. Datawarehouse of seismic data Highly compressed file repository (>3.5M files and 15- 150 TB) About to explode due to sensor network Finding warning signals © M Kersten 2012
  • 6. Remote sensing Processing pipeline to interpret images < 1TB/ yr Finding and detecting forest fires © M Kersten 2012
  • 7. Matlab RDBMS Python SQL C R *-API SciQL Interdependent Software libraries FITS, mSEED, geoTIFF,… HDF5, NETCDF Datavault
  • 8. Agenda Array support in database systems SciQL array query language A crash course on column-stores SciQL implementation approach © M Kersten 2012
  • 9. What is an array? An array is a systematic arrangement of objects addressed by dimension values. Get(A, X, Y,…) => Value Set(A, X, Y,…) <= Value There are many species: vector, bit array, dynamic array, parallel array, sparse array, variable length array, jagged array © M Kersten 2012
  • 10. Who needs them anyway ? Seismology – partial time-series Climate simulation – temporal ordered grid Astronomy – temporal ordered images Remote sensing – image processing Social networks – graph algorithms Genomics – ordered strings Forensics – images, strings, graphs Scientists ‘love them’ : MSEED, NETCDF, FITS, CSV,.. © M Kersten 2012
  • 11. Arrays in DBMS
      - Relational prototype built on arrays, Peterlee IS Vehicle (1975)
      - Persistent programming languages, Astral (1980), Plain (1980)
      - Object-orientation and persistent languages were the make-believe to handle them, O2 (1992)
      - Several array algebras: AML (2002), Aquery (2003), RAM (2004), SRAM (2012)
  • 12. Array declarations:
        CREATE TABLE sal_emp (
            name text,
            pay_by_quarter integer[],
            schedule text[][] );
        CREATE TABLE tictactoe (
            squares integer[3][3] );
      Array operations: denotation ([]), contains (@>), is contained in (<@), append, concat (||), dimension, lower, upper, prepend, to-string, from-string, …
      Array constraints: none, no enforcement of dimensions.
  • 13. SQL 2003 Arrays are attribute type constructors Arrays can be declared without a maximum cardinality Array nesting is unrestricted. Query results can be converted into arrays. CREATE TABLE listbox( choices CHAR(3) ARRAY[1000] NOT NULL); INSERT INTO listbox_choices VALUES( 'Department Names', ARRAY(SELECT name FROM sales.depts ORDER BY 1)); © M Kersten 2012
  • 14. Breaks large C++ arrays (rasters) into disjoint chunks Maps chunks into large binary objects (blob) Provide function interface to access them RASCAL, a SQL92 extension Known to work up to 12 TBs. © M Kersten 2012
  • 15. Breaks large C++ arrays (rasters) into overlapping chunks Built storage manager from scratch Map-reduce processing model Provide function interface to access them AQL, a crippled SQL92 © M Kersten 2012
  • 16. What is the problem?
      Query language:
      - Appropriate array denotations?
      - Functionally complete operation set?
      Systems:
      - Mature implementations?
      - Size limitations due to (blob) representations?
      - Scale out?
      Education:
      - Community awareness?
  • 17. Agenda Array support in database systems SciQL array query language A crash course on column-stores SciQL implementation approach © M Kersten 2012
  • 18. MonetDB SciQL SciQL (pronounced ‘cycle’ ) •  A backward compatible extension of SQL’03 •  Symbiosis of relational and array paradigm •  Flexible structure-based grouping •  Capitalizes the MonetDB physical array storage •  Recycling, an adaptive ‘materialized view’ •  Zero-cost attachment contract for cooperative clients http://www.scilens.org/Resources/SciQL © M Kersten 2012
  • 19. Table vs Arrays CREATE TABLE tmp A collection of tuples Indexed by a (primary) key Default handling Explicitly created using INS/UPD/DEL © M Kersten 2012
  • 20. Table vs arrays CREATE TABLE tmp CREATE ARRAY tmp A collection of tuples A collection of a priori defined tuples Indexed by a (primary) key Indexed by dimension expressions Default handling Implicitly defined by default value, Explicitly created using To be updated with INS/DEL/UPD INS/UPD/DEL © M Kersten 2012
  • 21. SciQL examples
        CREATE TABLE matrix (
            x integer,
            y integer,
            value float,
            PRIMARY KEY (x,y) );
        INSERT INTO matrix VALUES (0,0,0),(0,1,0),(1,1,0),(1,0,0);
      Table rows (x, y, value): (0,0,0), (0,1,0), (1,1,0), (1,0,0)
  • 22. SciQL examples
      Table:
        CREATE TABLE matrix (
            x integer,
            y integer,
            value float,
            PRIMARY KEY (x,y) );
        INSERT INTO matrix VALUES (0,0,0),(0,1,0),(1,1,0),(1,0,0);
      Array:
        CREATE ARRAY matrix (
            x integer DIMENSION[2],
            y integer DIMENSION[2],
            value float DEFAULT 0 );
      Table rows (x, y, value): (0,0,0), (0,1,0), (1,1,0), (1,0,0)
      Array cells: every cell with x and y in {0, 1} holds the default 0 without any INSERT; positions outside the declared bounds are null.
  • 23. SciQL examples
      Same table and array declarations as on slide 22.
        DELETE FROM matrix WHERE y=1    (on the table)
        DELETE FROM matrix WHERE y=1    (on the array)
      Table rows left: (0,0,0), (1,0,0)
      Array cells: a hole in the array; the cells with y=1 become null, the cells with y=0 still hold 0.
  • 24. SciQL examples
      Same table and array declarations as on slide 22.
        INSERT INTO matrix VALUES (0,1,1), (1,1,2)    (on the table)
        INSERT INTO matrix VALUES (0,1,1), (1,1,2)    (on the array)
      Table rows: (0,0,0), (1,0,0), (0,1,1), (1,1,2)
      Array cells: row y=1 holds 1 and 2, row y=0 holds 0 and 0.
  • 25. SciQL unbounded arrays
      Table:
        CREATE TABLE matrix (
            x integer,
            y integer,
            value float,
            PRIMARY KEY (x,y) );
      Array:
        CREATE ARRAY matrix (
            x integer DIMENSION,
            y integer DIMENSION,
            value float DEFAULT 0 );
        INSERT INTO matrix VALUES (0,2,1), (0,1,2)    (on both)
      Table rows: (0,2,1), (0,1,2)
      Array cells: (0,2) holds 1 and (0,1) holds 2; the other cells shown in the figure hold the default 0.
  • 26. SciQL Dimensions
      Unbounded dimensions:
        scalar-type DIMENSION
      Bounded dimensions:
        scalar-type DIMENSION[stop]
        scalar-type DIMENSION[first: step: stop]
        scalar-type DIMENSION[*: *: *]
        timestamp DIMENSION [ ‘2010-01-19’ : ‘1’ minute : *]
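      The table-versus-array semantics and the dimension ranges above can be modelled in a few lines of Python. This is only an illustrative sketch, not MonetDB or SciQL code: the names `dimension` and `Array2D` are hypothetical, the stop bound is assumed to be exclusive (so DIMENSION[2] yields 0 and 1), and rejecting out-of-range inserts on a bounded array is an assumption of the sketch.

```python
# Illustrative model only; names and error behaviour are assumptions.

def dimension(first, step, stop):
    """Index values of a bounded dimension [first:step:stop].

    Assumption: the stop bound is exclusive, so [0:1:2] yields 0 and 1.
    """
    return list(range(first, stop, step))


class Array2D:
    """A bounded 2-D array: every cell exists from creation onwards."""

    def __init__(self, xs, ys, default):
        self.default = default
        # Cells are defined a priori and filled with the default value.
        self.cells = {(x, y): default for x in xs for y in ys}

    def insert(self, x, y, value):
        # INSERT overwrites an existing cell; it never adds a new one here.
        if (x, y) not in self.cells:
            raise IndexError("cell outside the declared dimensions")
        self.cells[(x, y)] = value

    def delete(self, predicate):
        # DELETE leaves a hole; None stands in for SQL NULL.
        for (x, y) in self.cells:
            if predicate(x, y):
                self.cells[(x, y)] = None


matrix = Array2D(dimension(0, 1, 2), dimension(0, 1, 2), default=0.0)
matrix.delete(lambda x, y: y == 1)          # DELETE ... WHERE y=1
holes = sorted(k for k, v in matrix.cells.items() if v is None)
matrix.insert(0, 1, 1.0)                    # INSERT (0,1,1)
matrix.insert(1, 1, 2.0)                    # INSERT (1,1,2)
```

      The point of the sketch is that the number of cells never changes: deletion and insertion only alter cell contents, whereas the relational table gains and loses tuples.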
  • 27. SciQL table queries
        -- Dimension names make query formulation easier
        CREATE ARRAY matrix (
            x integer DIMENSION,
            y integer DIMENSION,
            value float DEFAULT 0 );
        -- simple checker boarding aggregation
        SELECT sum(value) FROM matrix WHERE (x + y) % 2 = 0
  • 28. SciQL array queries
        CREATE ARRAY matrix (
            x integer DIMENSION,
            y integer DIMENSION,
            value float DEFAULT 0 );
        CREATE ARRAY result (
            x integer DIMENSION,
            value float DEFAULT 0 );
        -- group based aggregation to construct an unbounded vector
        SELECT [x], sum(value) FROM matrix WHERE (x + y) % 2 = 0 GROUP BY x;
  • 29. SciQL array views CREATE ARRAY vmatrix ( x integer DIMENSION[-1:5], y integer DIMENSION[-1:5], value float DEFAULT -1 ) AS SELECT x, y, value FROM matrix; -1 -1 -1 -1 -1 0 0 -1 -1 0 0 -1 -1 -1 -1 -1 © M Kersten 2012
  • 30. SciQL tiling examples
      (Figure: 4x4 grid of cells V0,0 … V3,3 with the anchor point of a tile marked.)
        SELECT x, y, avg(value) FROM matrix
        GROUP BY matrix[x : 1 : x+2][y : 1 : y+2];
  • 31. SciQL tiling examples
      (Figure: 4x4 grid of cells V0,0 … V3,3 with the anchor point of a tile marked.)
        SELECT x, y, avg(value) FROM matrix
        GROUP BY DISTINCT matrix[x:1:x+2][y:1:y+2];
  • 32. SciQL tiling examples
      (Figure: 4x4 grid of cells V0,0 … V3,3; the tile around the anchor point extends past the array border, where the cells are null.)
        SELECT x, y, avg(value) FROM matrix
        GROUP BY DISTINCT matrix[x-1:1:x+1][y:1:y+2];
  • 33. SciQL tiling examples
      (Figure: 4x4 grid of cells V0,0 … V3,3 with the anchor point and its four direct neighbours marked.)
        SELECT x, y, avg(value) FROM matrix
        GROUP BY matrix[x][y], matrix[x-1][y], matrix[x+1][y], matrix[x][y-1], matrix[x][y+1];
  • 34. SciQL, A Query Language for Science Applications •  Seamless integration of array-, set-, and sequence- semantics. •  Dimension constraints as a declarative means for indexed access to array cells. •  Structural grouping to generalize the value-based grouping towards selective access to groups of cells based on positional relationships for aggregation. © M Kersten 2012
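      As a rough illustration of structural grouping, the 2x2 tiling queries from the slides can be mimicked in plain Python. This is a sketch under stated assumptions, not the SciQL implementation: `tile_avg` is a hypothetical helper, the stop bound of a slice is taken to be exclusive, cells outside the array count as null and are skipped by the average, and GROUP BY DISTINCT is read as "non-overlapping tiles".

```python
# Illustrative sketch of structural grouping over a 4x4 array.

def tile_avg(cells, x, y, width=2, height=2):
    """Average of the tile anchored at (x, y); null cells are ignored."""
    values = []
    for dx in range(width):
        for dy in range(height):
            v = cells.get((x + dx, y + dy))   # None when outside the array
            if v is not None:
                values.append(v)
    return sum(values) / len(values) if values else None


# A 4x4 array whose cell value encodes its position.
cells = {(x, y): float(x + 10 * y) for x in range(4) for y in range(4)}

# GROUP BY matrix[x:1:x+2][y:1:y+2]: one tile per anchor cell, tiles overlap.
overlapping = {(x, y): tile_avg(cells, x, y) for (x, y) in cells}

# GROUP BY DISTINCT ...: assumed to mean non-overlapping tiles only.
distinct = {
    (x, y): tile_avg(cells, x, y)
    for (x, y) in cells
    if x % 2 == 0 and y % 2 == 0
}
```

      With overlapping tiles every cell acts as an anchor, so the result has as many groups as the array has cells; the distinct variant yields one group per disjoint tile.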
  • 35. Agenda Array support in database systems SciQL array query language Use-case exercise A crash course on column-stores SciQL implementation approach © M Kersten 2012
  • 36. Seismology use case Rietbrock: Chile earthquake … 2TB of wave fronts … filter by sta/lta … remove false positives … window-based 3 min cuts … heuristic tests … interactive response required … How can a database system help? Scanning 2TB on modern pc takes >3 hours
  • 37. Use case, a SciQL dream Rietbrock: Chile earthquake
        create array mseed (
            tick timestamp dimension[ ‘2010’:*],
            data decimal(8,6),
            station string );
  • 38. Use case, a SciQL dream Rietbrock: … filter by sta/lta --- average by window of 5 seconds select A.tick, avg(A.data) from mseed A group by A[tick:1:tick + 5 seconds] © M Kersten 2012
  • 39. Use case, a SciQL dream Rietbrock: … filter by sta/lta select A.tick from mseed A, mseed B where A.tick = B.tick and avg(A.data) / avg(B.data) > delta group by A[tick:tick + 5 seconds], B[tick:tick + 15 seconds] © M Kersten 2012
  • 40. Use case, a SciQL dream Rietbrock: … filter by sta/lta create view candidates( station string, tick timestamp, ratio float ) as select A.station, A.tick, avg(A.data) / avg(B.data) as ratio from mseed A, mseed B where A.tick = B.tick and avg(A.data) / avg(B.data) > delta group by A[tick:tick + 5 seconds], B[tick:tick + 15 seconds] © M Kersten 2012
  • 41. Use case, a SciQL dream Rietbrock: … remove false positives
        -- remove isolated errors by direct environment
        -- using wave propagation statistics
        create table neighbors(
            head string,
            tail string,
            delay timestamp,
            weight float)
  • 42. Use case, a SciQL dream Rietbrock: … remove false positives select A.tick, B.tick from candidates A, candidates B, neighbors N where A.station = N.head and B.station = N.tail and B.tick = A.tick + N.delay and B.ratio * N.weight < A.ratio; © M Kersten 2012
  • 43. Use case, a SciQL dream Rietbrock: … remove false positives delete from candidates select A.tick from candidates A, candidates B, neighbors N where A.station = N.head and B.station = N.tail and B.tick = A.tick + N.delay and B.ratio * N.weight < A.ratio; © M Kersten 2012
  • 44. Use case, a SciQL dream Rietbrock: … window-based 3 min cuts … heuristic tests select B.station, myfunction(B.data) from candidates A, mseed B where A.tick = B.tick group by distinct B[tick:tick + 3 minutes]; -- using a User Defined Function written in C. © M Kersten 2012
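      The sta/lta filter in the use case compares a short-term average against a long-term average over windows that start at the same tick. A minimal Python sketch of that idea follows; it is not taken from the slides' implementation. The function name `sta_lta_candidates`, the sample-based windows (5 and 15 samples standing in for 5 and 15 seconds) and the threshold value for `delta` are all assumptions.

```python
# Illustrative STA/LTA candidate detection over a list of samples.

def sta_lta_candidates(data, short=5, long=15, delta=2.0):
    """Return (tick, ratio) pairs where avg(short) / avg(long) > delta.

    Both windows start at the same tick, as in the grouping
    A[tick:tick + 5 seconds], B[tick:tick + 15 seconds].
    """
    candidates = []
    for tick in range(len(data) - long + 1):
        sta = sum(data[tick:tick + short]) / short
        lta = sum(data[tick:tick + long]) / long
        if lta > 0 and sta / lta > delta:
            candidates.append((tick, sta / lta))
    return candidates


# Quiet background with a short burst of energy in the middle.
signal = [1.0] * 20 + [10.0] * 5 + [1.0] * 20
hits = sta_lta_candidates(signal)
```

      Each tick is examined independently here, which is what makes the computation expressible as a grouped aggregate over overlapping windows rather than as a sequential scan with state.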
  • 45. Agenda Array support in database systems SciQL array query language A crash course on column-stores SciQL implementation approach © M Kersten 2012
  • 46. Storing Relations in MonetDB Void Void Void Void Void 1000 1000 1000 1000 1000 . . . . . . . . . . . . . . . . . . . . . . . . . Virtual OID: seqbase=1000 (increment=1) © M Kersten 2012
  • 47. BAT Data Structure
      BAT: binary association table (Head, Tail)
      BUN: binary unit
      Head & Tail:
      - consecutive memory blocks (arrays)
      - memory-mapped files
      BUN heap:
      - consecutive memory block (array)
      - memory-mapped file
      Tail Heap:
      - best-effort duplicate elimination for strings (~ dictionary encoding)
      Hash tables, T-trees, R-trees, ...
  • 48. Processing Model (MonetDB Kernel)
      - Bulk processing:
          - full materialization of all intermediate results
      - Binary (i.e., 2-column) algebra core:
          - select, join, semijoin, outerjoin
          - union, intersection, diff (BAT-wise & column-wise)
          - group, count, max, min, sum, avg
          - reverse, mirror, mark
      - Runtime operational optimization:
          - Choosing optimal algorithm & implementation according to input properties and system status
  • 49. The Software Stack Strategic optimization Front-ends SQL 03 MAL Optimizers Tactical optimization: MAL -> MAL rewrites Back-end(s) MonetDB 5 MAL Runtime Kernel MonetDB kernel operational optimization © M Kersten 2012
  • 50. MonetDB Front-end: SQL
        EXPLAIN SELECT a, z FROM t, s WHERE t.c = s.x;

        function user.s2_1():void;
        barrier _73 := language.dataflow();
            _2:bat[:oid,:int] := sql.bind("sys","t","c",0);
            _7:bat[:oid,:int] := sql.bind("sys","s","x",0);
            _10 := bat.reverse(_7);
            _11 := algebra.join(_2,_10);
            _13 := algebra.markT(_11,0@0);
            _14 := bat.reverse(_13);
            _15:bat[:oid,:int] := sql.bind("sys","t","a",0);
            _17 := algebra.leftjoin(_14,_15);
            _18 := bat.reverse(_11);
            _19 := algebra.markT(_18,0@0);
            _20 := bat.reverse(_19);
            _21:bat[:oid,:int] := sql.bind("sys","s","z",0);
            _23 := algebra.leftjoin(_20,_21);
        exit _73;
            _24 := sql.resultSet(2,1,_17);
            sql.rsColumn(_24,"sys.t","a","int",32,0,_17);
            sql.rsColumn(_24,"sys.s","z","int",32,0,_23);
            _33 := io.stdout();
            sql.exportResult(_33,_24);
        end s2_1;
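      The shape of the plan above (join two columns on value, then fetch the projected columns by position) can be imitated with Python lists, where the list index plays the role of the virtual OID. This is a simplified sketch and not the MonetDB kernel: `join` and `fetch` are hypothetical helpers, the sample data is made up, and the hash-based join is merely one algorithm the kernel might choose.

```python
# Illustrative column-at-a-time evaluation of:
#   SELECT a, z FROM t, s WHERE t.c = s.x;
# Each column is a list; the list index stands in for the virtual OID.

t_c = [10, 20, 30]      # column t.c
t_a = [1, 2, 3]         # column t.a
s_x = [30, 10, 40]      # column s.x
s_z = [7, 8, 9]         # column s.z


def join(left, right):
    """Value-based join of two columns; returns matching (left_oid, right_oid)."""
    index = {}
    for oid, value in enumerate(right):
        index.setdefault(value, []).append(oid)
    return [(l, r) for l, value in enumerate(left) for r in index.get(value, [])]


def fetch(oids, column):
    """Positional lookup, comparable to the leftjoin steps in the plan."""
    return [column[oid] for oid in oids]


pairs = join(t_c, s_x)                        # fully materialized intermediate
a = fetch([l for l, _ in pairs], t_a)         # project t.a
z = fetch([r for _, r in pairs], s_z)         # project s.z
result = list(zip(a, z))
```

      Only the join columns are touched until the match list exists; the projected columns are read afterwards and only at the matching positions.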
  • 51. Agenda Array support in database systems SciQL array query language A crash course on column-stores SciQL implementation approach © M Kersten 2012
  • 52. SciQL implementation •  Use the complete MonetDB software stack •  Extend the SQL catalog to support SciQL •  Extend the Kernel to support array processing •  Extend the optimizer stack for performance •  Aim for a functional implementation first •  Use tabular representation of arrays •  Reuse the SQL code generator © M Kersten 2012
  • 59. Slicing a portion of an array is a ‘selection’ © M Kersten 2012
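      With a tabular representation of arrays, one column per dimension plus one per cell attribute, slicing reduces to a selection on the dimension columns. The following Python sketch shows the idea only; the column layout, the helper `slice_array` and the sample values are assumptions, not the actual SciQL storage scheme.

```python
# Illustrative tabular representation of a 4x4 array: three aligned columns.
x_col, y_col, v_col = [], [], []
for x in range(4):
    for y in range(4):
        x_col.append(x)
        y_col.append(y)
        v_col.append(x * 10 + y)


def slice_array(x_col, y_col, v_col, x_range, y_range):
    """Slice = select the row positions whose dimension values fall in range."""
    keep = [
        i for i in range(len(v_col))
        if x_col[i] in x_range and y_col[i] in y_range
    ]
    return [(x_col[i], y_col[i], v_col[i]) for i in keep]


# Comparable to matrix[1:1:3][0:1:2], assuming an exclusive stop bound.
portion = slice_array(x_col, y_col, v_col, range(1, 3), range(0, 2))
```

      Because the slice is an ordinary selection, the existing relational operators and optimizers can be reused unchanged.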
  • 61. It works © M Kersten 2012
  • 62. Conclusions •  The language definition is ‘finished’ •  Functional prototype is ‘around the corner’ •  Exposure to real life cases and external libraries •  MonetDB’s core technology was essential •  Challenge: ARRAYS FILES © M Kersten 2012
  • 66. Science DBMS landscape
      (each row lists: MonetDB 5.23 | SciDB 0.5 | Rasdaman)
      Architecture: Server approach | Server approach | Plugin (Oracle, DB2, Informix, MySQL, PostgreSQL)
      Open source: Mozilla License | GPL 3.0, Commercial | GPL 3.0, Dual license
      Downloads: >12.000 /month | Tens up to now | ??
      SQL: SQL 2003 | ?? | SQL92++
      Interoperability: {JO}DBC, C(++), Python, … | C++ UDF | C++, Java, OGC
      Array language: SciQL | AQL | RASQL
      Array model: Fixed+variable bounds | Fixed arrays | Fixed+variable bounds
      Science: Linked libraries | Linked libraries | Linked libraries
      Foreign files: Vaults of csv, FITS, NETCDF, MSEED | ?? | Tiff, png, jpg.., csv, NETCDF, HDF4
      Distribution: 50-200 node cluster | 4 node cluster | 20-node
      Distribution tech: Dynamic partial replication | Static fragmentation | Static fragmentation
      Executor: Various schemes | Map-reduce | Tile streaming
      Largest demo: Skyserver SDSS 6 3TB | --- | 12TB, IGN-F (on PostgreSQL)
      Storage tuning: Query adaptive | Schema definitions | Workload driven
      Optimization: Heuristics + cost based | ?? | Heuristics + cost based