MonetDB/DataCell - Exploiting the Power of Relational Databases for Efficient Stream Processing

CWI
Project Meeting@Innsbruck
Feb 28 - Mar 04, 2011

Wednesday, March 02, 2011

DBMS versus DSMS

DBMS (one-time query over incoming data; DB on disk storage; returns an answer)
1 Store incoming tuples
2 Submit one-time query
3 Query processing on the already stored data
4 Create answer

DSMS (continuous queries over an input stream; processing in memory; returns notifications)
1 Submit continuous queries
2 Incoming streams
3 Input stream is processed on the fly
4 The produced results are continuously delivered to the clients

A data stream is a never-ending sequence of tuples.


One-time Queries versus Continuous Queries
[Figure: timeline t of data with points t_n and t_n+1, marking the arrival time of q; one-time query and continuous query are labeled along the timeline]

One-time query
• Evaluated once over the already stored tuples

Continuous query
• Waits for future incoming tuples
• Evaluated continuously as new tuples arrive


Observation
• Today's stream systems are built from scratch

• Operators and optimizations are redesigned

• Relational databases are considered inefficient and too complex

• Modern stream applications require management of both stored and streaming data


Goals
• We design the DataCell on top of an existing database kernel

• Exploit database techniques, query optimization and operators

• Provide full language functionality (SQL:2003)

• Research questions
  • Is it viable?
  • Multi-query processing/scheduling
  • Real-time processing


The Basic Idea of DataCell
• Stream tuples are first stored in (appended to) baskets.

• We evaluate the continuous queries over the baskets.
  Instead of throwing each incoming tuple against the waiting queries (data streams):
      tuple -> query set
  first collect the data and then throw the queries against the tuples (database):
      query -> tuple set

• Once a tuple is seen, it is dropped from its basket.
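
A minimal sketch of this idea, assuming a basket can be modeled as a plain in-memory list. The `Basket` class and `run_query` helper are hypothetical names for illustration, not part of MonetDB/DataCell.

```python
from typing import Callable, List


class Basket:
    """Hypothetical stand-in for a basket: tuples are appended on arrival
    and dropped once a query has seen them."""

    def __init__(self) -> None:
        self.tuples: List[int] = []

    def append(self, t: int) -> None:
        self.tuples.append(t)

    def drain(self) -> List[int]:
        """Hand over all collected tuples and drop them from the basket."""
        seen, self.tuples = self.tuples, []
        return seen


def run_query(basket: Basket, predicate: Callable[[int], bool]) -> List[int]:
    """Throw the query against the collected tuples, one batch at a time."""
    return [t for t in basket.drain() if predicate(t)]


basket = Basket()
for value in [12, 3, 100]:   # tuples are collected first
    basket.append(value)

batch_result = run_query(basket, lambda a: a > 10)
remaining = list(basket.tuples)   # empty: seen tuples were dropped
```

Processing a batch at a time, rather than one tuple at a time, is what lets the query reuse set-oriented database operators.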


The MonetDB/DataCell stack
SQL Query

SQL

Query parser + CQ

Query Optimizer + DC opt

Continuous Query Scheduler

MAL

MAL Interpreter

Query Executor


DataCell Components
Receptor <=> Listens to a stream

Emitter <=> Delivers events to the clients

Factory <=> Continuous query

Basket <=> Holds events

[Figure: Input Stream -> R (receptor) -> Q (continuous query) -> E (emitter) -> Output Stream]
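
The four components can be wired together as in the sketch below. It is a single-threaded illustration under the assumption that baskets behave like queues; the function signatures are invented for the example and do not reflect the actual MAL interfaces.

```python
from collections import deque
from typing import Callable, Deque, Iterable, List


def receptor(stream: Iterable[int], basket: Deque[int]) -> None:
    """Listen to a stream and append each event to the input basket."""
    for event in stream:
        basket.append(event)


def factory(inp: Deque[int], out: Deque[int],
            predicate: Callable[[int], bool]) -> None:
    """Continuous query: consume the input basket, fill the output basket."""
    while inp:
        event = inp.popleft()
        if predicate(event):
            out.append(event)


def emitter(basket: Deque[int], deliver: Callable[[int], None]) -> None:
    """Deliver every event in the output basket to the client."""
    while basket:
        deliver(basket.popleft())


input_basket: Deque[int] = deque()
output_basket: Deque[int] = deque()
delivered: List[int] = []   # stands in for the client

receptor([12, 3, 100, 14], input_basket)
factory(input_basket, output_basket, lambda a: a > 10)
emitter(output_basket, delivered.append)
```

In a running system the three steps would be triggered repeatedly by a scheduler instead of being called once in sequence.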


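DataCell Architecture
[Figure: architecture diagram. Components shown: SQL Compiler, SPARQL Compiler, MAL Optimizer, Continuous Query Scheduler, Data Columns, and the DataCell area containing receptors R1-R3, emitters E1-E3, and baskets (id a, id b, id c, id k, id m, id n and derived baskets such as id a', id b', id k', id k'', id n') connected by factories. Legend: Basket, Receptor, Emitter, Factory, Disk Storage.]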

Basket Expressions
q Syntax:
It is an SQL sub-query surrounded by square brackets

q Semantics:
All qualifying tuples in a basket expression are removed by the factories

Tumbling window
Q1: Select * From [Select * from X top 3] as S where S.a>10;

Sliding window
Q2: SELECT * FROM (
[Select * From X top 1]
Union
Select * From X top 2 offset 1) as S
WHERE S.a>10;

q Flexible/expressive continuous queries, by selectively picking the data to
process from a basket

q Allow to process predicate windows on a stream.
q out of order processing


Basket Expressions (example)
Basket X holds the values 12, 3, 100, 14 in column a.
Tumbling window (Q1): result 12, 100
Sliding window (Q2): result 12, 100




Query processing strategies
Separate Baskets

• Each continuous query is encapsulated within a single factory
• Each factory f has its own input baskets, which are accessed only by f
• If more than one factory is interested in the same data, we create
multiple copies of this data

• Factories are completely independent
• Exploit the column-store to minimize the overhead of replication
[Figure: Qcopy copies basket b into bcopy1, bcopy2 and bcopy3, which feed Q1, Q2 and Q3 respectively]
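
A sketch of the separate-baskets strategy, assuming baskets are plain lists and replication is a simple copy. The names `copy_factory` and `query_factory` and the three predicates are made up for illustration.

```python
from typing import Callable, Dict, List

Predicate = Callable[[int], bool]


def copy_factory(source: List[int], copies: List[List[int]]) -> None:
    """Replicate basket b into one private basket per query, then empty b."""
    for private in copies:
        private.extend(source)
    source.clear()


def query_factory(private: List[int], predicate: Predicate) -> List[int]:
    """Each factory reads and empties only its own basket."""
    result = [a for a in private if predicate(a)]
    private.clear()
    return result


b = [12, 3, 100, 14]
bcopy1: List[int] = []
bcopy2: List[int] = []
bcopy3: List[int] = []
copy_factory(b, [bcopy1, bcopy2, bcopy3])

results: Dict[str, List[int]] = {
    "Q1": query_factory(bcopy1, lambda a: a > 10),
    "Q2": query_factory(bcopy2, lambda a: a < 10),
    "Q3": query_factory(bcopy3, lambda a: a % 2 == 0),
}
```

Because no basket is shared, the three query factories need no coordination; the price is one copy of the data per interested query.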


Shared Baskets

• Exploit query similarities to avoid replication
• Baskets are shared among factories
• Two new (cheap) factories: Locker and Unlocker
[Figure: basket b is read directly by Q1, Q2 and Q3]


Shared Baskets (with Locker and Unlocker)
[Figure: Lock -> FL1, FL2, FL3 -> Q1, Q2, Q3 -> FU1, FU2, FU3 -> Unlock, all operating on the shared basket b]
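
The slides do not spell out what the Locker and Unlocker factories do, so the sketch below is one plausible reading rather than the DataCell implementation: each query reads the shared basket without removing anything, and the tuples are dropped only after every query has unlocked. The `SharedBasket` class is hypothetical, and the sketch assumes no tuples arrive during a round.

```python
from typing import Callable, Dict, List, Set, Tuple


class SharedBasket:
    """Hypothetical shared basket: tuples stay until all readers have unlocked."""

    def __init__(self, readers: Set[str]) -> None:
        self.tuples: List[int] = []
        self.readers = set(readers)
        self.done: Set[str] = set()

    def lock(self, reader: str) -> Tuple[int, ...]:
        """Locker step: give a registered reader a read-only view."""
        if reader not in self.readers:
            raise KeyError(reader)
        return tuple(self.tuples)

    def unlock(self, reader: str) -> None:
        """Unlocker step: when the last reader finishes, drop the seen tuples."""
        self.done.add(reader)
        if self.done == self.readers:
            self.tuples.clear()
            self.done.clear()


b = SharedBasket({"Q1", "Q2", "Q3"})
b.tuples.extend([12, 3, 100, 14])

queries: Dict[str, Callable[[int], bool]] = {
    "Q1": lambda a: a > 10,
    "Q2": lambda a: a < 10,
    "Q3": lambda a: a % 2 == 0,
}
results: Dict[str, List[int]] = {}
sizes_after_unlock: List[int] = []
for name, predicate in queries.items():
    view = b.lock(name)
    results[name] = [a for a in view if predicate(a)]
    b.unlock(name)
    sizes_after_unlock.append(len(b.tuples))
```

Compared with separate baskets, the data is stored once, at the cost of the extra bookkeeping steps around each query.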


Summary

+ = DataCell

