MonetDB/DataCell - Exploiting the Power of Relational Databases for Efficient Stream Processing
 


The goal of the MonetDB/DataCell project is to exploit the power of relational DBMSs (RDBMSs) for efficient processing of continuous queries over streaming data. This presentation first identifies the essential differences between processing one-time queries and continuous queries. It then presents the current architecture of MonetDB/DataCell and some ideas on how to extend an existing RDBMS with just a handful of new components to handle continuous queries.

The presentation was given by Ying Zhang (Centrum Wiskunde & Informatica) at the PlanetData project meeting held February 28 - March 4, 2011 in Innsbruck, Austria.


    Presentation Transcript

    • MonetDB/DataCell: Exploiting the Power of Relational Databases for Efficient Stream Processing. CWI, Project Meeting@Innsbruck, Feb 28 - Mar 04, 2011. Wednesday, March 02, 2011
    • DBMS versus DSMS
      DBMS (incoming data, one-time query, disk storage, DB answer):
        1. Store incoming tuples
        2. Submit one-time query
        3. Query processing on the already stored data
        4. Create answer
      DSMS (input stream, continuous queries, memory, notification):
        1. Submit continuous queries
        2. Incoming streams
        3. Input stream is processed on the fly
        4. The produced results are continuously delivered to the clients
      A data stream is a never-ending sequence of tuples.
    • One-time Queries versus Continuous Queries
      [Figure: timeline of data arrival (t_n, t_n+1) marking the arrival time of query q for a one-time query and for a continuous query]
      • One-time query q: evaluated once over the already stored tuples.
      • Continuous query q: waits for future incoming tuples and is evaluated continuously as new tuples arrive.
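
The contrast on this slide can be illustrated with a minimal Python sketch. It is not DataCell code: the function names, the integer tuples, and the predicate `t > 10` are assumptions made up for the example.

```python
from typing import Callable, Iterable, Iterator, List

Predicate = Callable[[int], bool]


def one_time_query(stored: List[int], predicate: Predicate) -> List[int]:
    """Evaluate once over the tuples that are already stored."""
    return [t for t in stored if predicate(t)]


def continuous_query(stream: Iterable[int], predicate: Predicate) -> Iterator[int]:
    """Stay registered and emit a result whenever a qualifying tuple arrives."""
    for t in stream:
        if predicate(t):
            yield t


# Hypothetical data: tuples that arrived before and after the query was submitted.
stored = [5, 20, 7]
future = [30, 2, 11]

one_time_result = one_time_query(stored, lambda t: t > 10)
continuous_result = list(continuous_query(iter(future), lambda t: t > 10))
```

The one-time query only ever sees `stored`, while the continuous query only produces output as elements of `future` are consumed.
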
    • Observation
      • Nowadays stream systems are built from scratch
      • Redesign operators and optimizations
      • Relational databases are considered inefficient and too complex
      • Modern stream applications require management of both stored and streaming data
    • Goals
      • We design the DataCell on top of an existing database kernel
      • Exploit database techniques, query optimization and operators
      • Provide full language functionality (SQL'03)
      • Research questions
        • Is it viable?
        • Multi-query processing/scheduling
        • Real-time processing
    • The Basic Idea of DataCell
      • Stream tuples are first stored in (appended to) baskets.
      • We evaluate the continuous queries over the baskets. Instead of throwing each incoming tuple against the waiting queries (the data stream approach), first collect the data and then throw the queries against the tuples (the database approach).
      • Once a tuple is seen, it is dropped from its basket.
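
A minimal sketch of this collect-then-evaluate idea, assuming an in-memory list as the basket. The `Basket` class, `take_all`, `evaluate`, and the two example queries are hypothetical names for illustration and are not part of MonetDB/DataCell.

```python
from typing import Callable, Dict, List

Query = Callable[[List[int]], List[int]]


class Basket:
    """Hypothetical in-memory basket: arriving tuples are appended, seen tuples are dropped."""

    def __init__(self) -> None:
        self.tuples: List[int] = []

    def append(self, value: int) -> None:
        self.tuples.append(value)

    def take_all(self) -> List[int]:
        # Hand out the collected batch and drop it from the basket.
        batch, self.tuples = self.tuples, []
        return batch


def evaluate(basket: Basket, queries: Dict[str, Query]) -> Dict[str, List[int]]:
    """Throw every waiting query against the collected batch, not tuple by tuple."""
    batch = basket.take_all()
    return {name: query(batch) for name, query in queries.items()}


basket = Basket()
for value in (12, 3, 100):          # tuples are collected first
    basket.append(value)

queries: Dict[str, Query] = {
    "big": lambda batch: [t for t in batch if t > 10],
    "total": lambda batch: [sum(batch)],
}
results = evaluate(basket, queries)
```

After `evaluate` returns, the basket is empty, which mirrors the rule that a tuple is dropped once it has been seen.
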
    • The MonetDB/DataCell stack (before the DataCell extensions): SQL Query; SQL Query parser; Query Optimizer; MAL; MAL Interpreter; Query Executor
    • The MonetDB/DataCell stack (with the DataCell extensions): SQL Query; SQL Query parser + CQ; Query Optimizer + DC opt; Continuous Query Scheduler; MAL; MAL Interpreter; Query Executor
    • DataCell Components
      • Receptor: listens to a stream
      • Emitter: delivers events to the clients
      • Factory: continuous query
      • Basket: holds events
      [Figure: input stream, receptor R, continuous query Q, emitter E, output stream]
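
How the four components could fit together is sketched below in plain Python. This is an assumption-laden illustration, not the DataCell implementation: baskets are modelled as `deque` objects, and the function names and the predicate `event > 10` are invented for the example.

```python
from collections import deque
from typing import Deque, Iterable, List


def receptor(source: Iterable[int], basket: Deque[int]) -> None:
    """Listen to a stream: append every arriving event to the input basket."""
    basket.extend(source)


def factory(inp: Deque[int], out: Deque[int]) -> None:
    """Continuous query: consume the input basket and write results to the output basket."""
    while inp:
        event = inp.popleft()        # a seen event leaves its basket
        if event > 10:               # hypothetical query predicate
            out.append(event)


def emitter(basket: Deque[int], client: List[int]) -> None:
    """Deliver the events held in the output basket to the client."""
    while basket:
        client.append(basket.popleft())


in_basket: Deque[int] = deque()
out_basket: Deque[int] = deque()
delivered: List[int] = []

receptor([12, 3, 100, 14], in_basket)
factory(in_basket, out_basket)
emitter(out_basket, delivered)
```

Each stage communicates only through baskets, so the stages can be run in any interleaving that a scheduler chooses.
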
    • DataCell Architecture
      [Figure: architecture diagram. Labels: SQL Compiler, SPARQL Compiler (added in the last build of the slide), MAL Optimizer, Data Columns, Disk Storage, and the DataCell layer with receptors R1-R3, emitters E1-E3, baskets, factories, and the Continuous Query Scheduler. Legend: basket, receptor, emitter, factory.]
    • Basket Expressions
      • Syntax: an SQL sub-query surrounded by square brackets.
      • Semantics: all qualifying tuples in a basket expression are removed by the factories.
      Tumbling window:
        Q1: SELECT * FROM [SELECT * FROM X TOP 3] AS S WHERE S.a > 10;
      Sliding window:
        Q2: SELECT * FROM ([SELECT * FROM X TOP 1] UNION SELECT * FROM X TOP 2 OFFSET 1) AS S WHERE S.a > 10;
      [Figure: a basket holding 12, 3, 100, 14; Q1 and Q2 each output 12 and 100]
      • Flexible, expressive continuous queries, by selectively picking the data to process from a basket.
      • Allows predicate windows on a stream to be processed.
      • Out-of-order processing.
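
The difference between the two window queries can be simulated with a short Python sketch. It assumes a basket is a plain list of values of column `a` and that only the tuples selected inside the square brackets are removed; the helper names are hypothetical, and the duplicate elimination implied by UNION is not modelled.

```python
from typing import List


def tumbling_step(basket: List[int], size: int) -> List[int]:
    """Like Q1: the bracketed sub-query takes the first `size` tuples and removes them all."""
    window = basket[:size]
    del basket[:size]
    return [a for a in window if a > 10]


def sliding_step(basket: List[int], size: int) -> List[int]:
    """Like Q2: only the first tuple is inside the brackets, so only it is removed;
    the other tuples of the window are read but stay for the next step."""
    window = basket[:size]
    del basket[:1]
    return [a for a in window if a > 10]


x1 = [12, 3, 100, 14]
q1_first = tumbling_step(x1, 3)      # window 12, 3, 100 is consumed entirely

x2 = [12, 3, 100, 14]
q2_first = sliding_step(x2, 3)       # window 12, 3, 100; only 12 is consumed
q2_second = sliding_step(x2, 3)      # window 3, 100, 14; only 3 is consumed
```

The first step of both queries yields the same result for this basket, but the tumbling query leaves one tuple behind while the sliding query leaves three.
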
    • Query processing strategies: Separate Baskets
      • Each continuous query is encapsulated within a single factory.
      • Each factory f has its own input baskets, which are accessed only by f.
      • If more than one factory is interested in the same data, we create multiple copies of this data.
      • Factories are completely independent.
      • Exploit the column-store to minimize the overhead of replication.
      [Figure: basket b, Qcopy, copies bcopy1, bcopy2, bcopy3, and queries Q1, Q2, Q3]
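
A rough sketch of the separate-baskets strategy, assuming baskets are Python lists. The `copy_factory` and `run_query` helpers and the predicate are hypothetical, and the cost of copying in a real column-store is not modelled.

```python
from typing import Callable, Dict, List


def copy_factory(source: List[int], private: Dict[str, List[int]]) -> None:
    """Replicate every tuple of the source basket into each query's private basket."""
    for basket in private.values():
        basket.extend(source)
    source.clear()                   # the source tuples have been seen, so drop them


def run_query(basket: List[int], predicate: Callable[[int], bool]) -> List[int]:
    """A factory reads only its own basket and drops what it has seen."""
    results = [t for t in basket if predicate(t)]
    basket.clear()
    return results


b = [12, 3, 100]
private: Dict[str, List[int]] = {"Q1": [], "Q2": [], "Q3": []}
copy_factory(b, private)

r1 = run_query(private["Q1"], lambda t: t > 10)
```

Running Q1 empties only Q1's copy; the copies held for Q2 and Q3 are untouched, which is what makes the factories independent.
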
    • Query processing strategies: Shared Baskets
      • Exploit query similarities to avoid replication.
      • Baskets are shared among factories.
      • Two new (cheap) factories: Locker and Unlocker.
      [Figure: shared basket b with Lock and Unlock; FL1-FL3, Q1-Q3, FU1-FU3]
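
The slides name the Locker and Unlocker factories but do not spell out their protocol. The sketch below is therefore one possible reading, stated as an assumption: a lock freezes the current batch of a shared basket, every registered query reads that batch, and the batch is dropped once the last query has unlocked. All class and method names are hypothetical.

```python
from typing import Callable, List, Set


class SharedBasket:
    """Hypothetical shared basket read by several queries without copying."""

    def __init__(self, readers: Set[str]) -> None:
        self.tuples: List[int] = []
        self.readers = set(readers)
        self.pending: Set[str] = set()   # readers that still have to see the batch
        self.batch_len = 0
        self.locked = False

    def lock(self) -> None:
        # Assumed locker role: freeze the current batch for all readers.
        if not self.locked:
            self.locked = True
            self.batch_len = len(self.tuples)
            self.pending = set(self.readers)

    def batch(self) -> List[int]:
        return self.tuples[: self.batch_len]

    def unlock(self, reader: str) -> None:
        # Assumed unlocker role: drop the batch once every reader has seen it.
        self.pending.discard(reader)
        if self.locked and not self.pending:
            del self.tuples[: self.batch_len]
            self.locked = False


def run_query(basket: SharedBasket, name: str, predicate: Callable[[int], bool]) -> List[int]:
    basket.lock()
    results = [t for t in basket.batch() if predicate(t)]
    basket.unlock(name)
    return results


b = SharedBasket({"Q1", "Q2", "Q3"})
b.tuples.extend([12, 3, 100])
r1 = run_query(b, "Q1", lambda t: t > 10)
b.tuples.append(14)                      # arrives while the batch is locked
r2 = run_query(b, "Q2", lambda t: t < 10)
r3 = run_query(b, "Q3", lambda t: True)
```

In this reading no tuple is copied, and the tuple that arrived during the lock stays in the basket for the next batch.
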
    • Summary: [figure] + [figure] = DataCell