MonetDB/DataCell

Exploiting the Power of Relational Databases for Efficient Stream Processing

CWI
Project Meeting@Innsbruck
Feb 28 - Mar 04, 2011

Wednesday, March 02, 2011
DBMS versus DSMS

   DBMS [Figure labels: Incoming data, DB, Disk storage, One-time query, answer]
   1    Store incoming tuples
   2    Submit one-time query
   3    Query processing on the already stored data
   4    Create answer

   DSMS [Figure labels: Input stream, Memory, Continuous queries, notification]
   1    Submit continuous queries
   2    Incoming streams
   3    Input stream is processed on the fly
   4    The produced results are continuously delivered to the clients

   A data stream is a never-ending sequence of tuples.

One-time Queries versus Continuous Queries
   [Figure: timeline of data (t of data) marked tn and tn+1, with the arrival time of q; regions labeled One-time query and Continuous query]

              One-time query
               • Evaluated once over the already stored tuples

              Continuous query
               • Waits for future incoming tuples
               • Evaluated continuously as new tuples arrive



Observation
   • Nowadays stream systems are built from scratch

   • This requires redesigning operators and optimizations

   • Relational databases are considered inefficient and too complex

   • Modern stream applications require management of both
      stored and streaming data




Goals
   • We design the DataCell on top of an existing database kernel

   • Exploit database techniques, query optimization and operators

   • Provide full language functionality (SQL:2003)

   • Research questions
      • Is it viable?
      • Multi-query processing/scheduling
      • Real-time processing



The Basic Idea of DataCell
      • Stream tuples are first stored in (appended to) baskets.

      • We evaluate the continuous queries over the baskets.
             Instead of throwing each incoming tuple against the waiting queries (data streams)
                 [Figure: a single tuple and the query set]
             first collect the data and then throw the queries against the tuples (database)
                 [Figure: the collected tuples and the query set]

      • Once a tuple is seen, it is dropped from its basket.
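
The collect-then-evaluate idea can be sketched as follows. This is a minimal Python illustration under stated assumptions: the `Basket` class, the `evaluate` helper and the two example queries are hypothetical names, not part of MonetDB/DataCell.

```python
from typing import Callable, Dict, List

Query = Callable[[List[int]], List[int]]


class Basket:
    """Hypothetical in-memory basket: stream tuples are appended to it."""

    def __init__(self) -> None:
        self.tuples: List[int] = []

    def append(self, t: int) -> None:
        self.tuples.append(t)

    def drain(self) -> List[int]:
        """Hand over all collected tuples and drop them from the basket."""
        seen, self.tuples = self.tuples, []
        return seen


def evaluate(basket: Basket, queries: Dict[str, Query]) -> Dict[str, List[int]]:
    """Throw the whole query set against the collected batch of tuples."""
    batch = basket.drain()
    return {name: query(batch) for name, query in queries.items()}


basket = Basket()
for t in [12, 3, 100]:          # tuples are collected first
    basket.append(t)

queries: Dict[str, Query] = {
    "large": lambda batch: [t for t in batch if t > 10],
    "count": lambda batch: [len(batch)],
}
results = evaluate(basket, queries)   # then the queries run over the batch
```

After `evaluate` returns, the basket is empty again, which mirrors the rule that a tuple is dropped once it has been seen.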


The MonetDB/DataCell stack
   [Figure, MonetDB stack from top to bottom: SQL Query; SQL; Query parser; Query Optimizer; MAL; MAL Interpreter; Query Executor]

   [Figure, stack with DataCell extensions from top to bottom: SQL Query; SQL; Query parser + CQ; Query Optimizer + DC opt; Continuous Query Scheduler; MAL; MAL Interpreter; Query Executor]

DataCell Components
      Receptor   <=>   Listens to a stream
      Emitter    <=>   Delivers events to the clients
      Factory    <=>   Continuous query
      Basket     <=>   Holds events

      Input Stream -> R -> Q -> E -> Output Stream
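
How the four components fit together can be sketched in Python. This is a toy, single-threaded illustration: the function names mirror the component names on the slide, but their signatures, the use of `deque` as a basket and the predicate are assumptions made for the example.

```python
from collections import deque
from typing import Callable, Deque, Iterable, List

Basket = Deque[int]  # a basket holds events


def receptor(stream: Iterable[int], basket: Basket) -> None:
    """Receptor: listens to a stream and appends its events to a basket."""
    basket.extend(stream)


def factory(inp: Basket, out: Basket, predicate: Callable[[int], bool]) -> None:
    """Factory: a continuous query that consumes its input basket."""
    while inp:
        event = inp.popleft()
        if predicate(event):
            out.append(event)


def emitter(basket: Basket, deliver: Callable[[int], None]) -> None:
    """Emitter: delivers the events of an output basket to a client."""
    while basket:
        deliver(basket.popleft())


input_basket: Basket = deque()
output_basket: Basket = deque()
delivered: List[int] = []

receptor([12, 3, 100, 14], input_basket)               # Input Stream -> R
factory(input_basket, output_basket, lambda a: a > 10)  # Q
emitter(output_basket, delivered.append)                # E -> Output Stream
```

In this sketch the baskets are the only connection between the components, so each stage can run independently of the others.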


DataCell Architecture
   [Figure: DataCell architecture. Labels: SQL Compiler, SPARQL Compiler, MAL Optimizer, Continuous Query Scheduler, Data Columns, Disk Storage, DataCell; receptors R1, R2, R3; emitters E1, E2, E3; baskets a, b, c, k, m, n and a', b', k', k'', n'. Legend: Basket, Receptor, Emitter, Factory.]

Basket Expressions
      • Syntax:
             A basket expression is an SQL sub-query surrounded by square brackets

      • Semantics:
             All tuples that qualify in a basket expression are removed from the basket by the factories

           Tumbling window
           Q1: SELECT * FROM [SELECT * FROM X TOP 3] AS S WHERE S.a > 10;
               Example: basket X holds 12, 3, 100, 14; Q1 returns 12, 100

           Sliding window
           Q2: SELECT * FROM (
                   [SELECT * FROM X TOP 1]
                   UNION
                   SELECT * FROM X TOP 2 OFFSET 1) AS S
               WHERE S.a > 10;
               Example: basket X holds 12, 3, 100, 14; Q2 returns 12, 100

      • Flexible, expressive continuous queries, by selectively picking the data to
         process from a basket

      • Allows processing of predicate windows on a stream
         • Out-of-order processing


Query processing strategies
            Separate Baskets

     • Each continuous query is encapsulated within a single factory
     • Each factory f has its own input baskets, which are accessed only by f
     • If more than one factory is interested in the same data, we create
          multiple copies of that data

     • Factories are completely independent
     • Exploit the column-store to minimize the overhead of replication

     [Figure: basket b feeds Qcopy, which fills bcopy1, bcopy2 and bcopy3, read by Q1, Q2 and Q3 respectively]
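
A minimal Python sketch of the separate-baskets strategy follows. The helpers `replicate` (standing in for Qcopy) and `run_factory` are hypothetical, and the three example queries are made up for illustration.

```python
from typing import Callable, List

Query = Callable[[List[int]], List[int]]


def replicate(b: List[int], n: int) -> List[List[int]]:
    """Stand-in for Qcopy: drain b and give each of n factories a private copy."""
    copies = [list(b) for _ in range(n)]
    b.clear()
    return copies


def run_factory(query: Query, own_basket: List[int]) -> List[int]:
    """A factory reads only its own basket and consumes it independently."""
    result = query(own_basket)
    own_basket.clear()
    return result


b = [12, 3, 100]
bcopy1, bcopy2, bcopy3 = replicate(b, 3)

r1 = run_factory(lambda xs: [a for a in xs if a > 10], bcopy1)  # Q1
r2 = run_factory(lambda xs: [sum(xs)], bcopy2)                  # Q2
r3 = run_factory(lambda xs: [min(xs)], bcopy3)                  # Q3
```

Because no basket is shared, the factories need no coordination; the price is one copy of the data per interested factory.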

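Query processing strategies
          Shared Baskets

      • Exploit query similarities to avoid replication
      • Baskets are shared among factories
      • Two new (cheap) factories: Locker and Unlocker

      [Figure: shared basket b with a Lock step, locker factories FL1, FL2, FL3 in front of Q1, Q2, Q3, unlocker factories FU1, FU2, FU3 behind them, and an Unlock step]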
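
One possible reading of the shared-baskets strategy is sketched below in Python. The slide does not spell out what the Locker and Unlocker factories do, so the behavior here is an assumption: the basket is locked, every query reads the same tuples without copying, and the tuples are dropped before the basket is unlocked. `SharedBasket` and `run_shared` are hypothetical names.

```python
import threading
from typing import Callable, List

Query = Callable[[List[int]], List[int]]


class SharedBasket:
    """Hypothetical basket that several factories read without replication."""

    def __init__(self) -> None:
        self.tuples: List[int] = []
        self.lock = threading.Lock()


def run_shared(basket: SharedBasket, queries: List[Query]) -> List[List[int]]:
    """Lock the basket, let every query read it, then drop the tuples and unlock."""
    basket.lock.acquire()                  # assumed Locker step
    try:
        # All queries read the same list; none of them modifies it.
        results = [query(basket.tuples) for query in queries]
        basket.tuples.clear()              # tuples seen by all queries are dropped
    finally:
        basket.lock.release()              # assumed Unlocker step
    return results


shared = SharedBasket()
shared.tuples.extend([12, 3, 100])

shared_results = run_shared(
    shared,
    [
        lambda xs: [a for a in xs if a > 10],  # Q1
        lambda xs: [sum(xs)],                  # Q2
    ],
)
```

Compared with separate baskets, this trades the cost of copying for the cost of coordinating access to a single basket.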


Summary
   [Figure: two images combined as (image) + (image) = DataCell]


On Leveraging Crowdsourcing Techniques for Schema Matching Networks
 
Towards Enabling Probabilistic Databases for Participatory Sensing
Towards Enabling Probabilistic Databases for Participatory SensingTowards Enabling Probabilistic Databases for Participatory Sensing
Towards Enabling Probabilistic Databases for Participatory Sensing
 
Privacy-Preserving Schema Reuse
Privacy-Preserving Schema ReusePrivacy-Preserving Schema Reuse
Privacy-Preserving Schema Reuse
 
Pay-as-you-go Reconciliation in Schema Matching Networks
Pay-as-you-go Reconciliation in Schema Matching NetworksPay-as-you-go Reconciliation in Schema Matching Networks
Pay-as-you-go Reconciliation in Schema Matching Networks
 
Demo: tablet-based visualisation of transport data in Madrid using SPARQLstream
Demo: tablet-based visualisation of transport data in Madrid using SPARQLstreamDemo: tablet-based visualisation of transport data in Madrid using SPARQLstream
Demo: tablet-based visualisation of transport data in Madrid using SPARQLstream
 
On the need for a W3C community group on RDF Stream Processing
On the need for a W3C community group on RDF Stream ProcessingOn the need for a W3C community group on RDF Stream Processing
On the need for a W3C community group on RDF Stream Processing
 
Urbanopoly: Collection and Quality Assessment of Geo-spatial Linked Data via ...
Urbanopoly: Collection and Quality Assessment of Geo-spatial Linked Data via ...Urbanopoly: Collection and Quality Assessment of Geo-spatial Linked Data via ...
Urbanopoly: Collection and Quality Assessment of Geo-spatial Linked Data via ...
 
Linking Smart Cities Datasets with Human Computation: the case of UrbanMatch
Linking Smart Cities Datasets with Human Computation: the case of UrbanMatchLinking Smart Cities Datasets with Human Computation: the case of UrbanMatch
Linking Smart Cities Datasets with Human Computation: the case of UrbanMatch
 
SciQL, Bridging the Gap between Science and Relational DBMS
SciQL, Bridging the Gap between Science and Relational DBMSSciQL, Bridging the Gap between Science and Relational DBMS
SciQL, Bridging the Gap between Science and Relational DBMS
 
CLODA: A Crowdsourced Linked Open Data Architecture
CLODA: A Crowdsourced Linked Open Data ArchitectureCLODA: A Crowdsourced Linked Open Data Architecture
CLODA: A Crowdsourced Linked Open Data Architecture
 
Scalable Nonmonotonic Reasoning over RDF Data Using MapReduce
Scalable Nonmonotonic Reasoning over RDF Data Using MapReduceScalable Nonmonotonic Reasoning over RDF Data Using MapReduce
Scalable Nonmonotonic Reasoning over RDF Data Using MapReduce
 
Data and Knowledge Evolution
Data and Knowledge Evolution  Data and Knowledge Evolution
Data and Knowledge Evolution
 
Evolution of Workflow Provenance Information in the Presence of Custom Infere...
Evolution of Workflow Provenance Information in the Presence of Custom Infere...Evolution of Workflow Provenance Information in the Presence of Custom Infere...
Evolution of Workflow Provenance Information in the Presence of Custom Infere...
 
Access Control for RDF graphs using Abstract Models
Access Control for RDF graphs using Abstract ModelsAccess Control for RDF graphs using Abstract Models
Access Control for RDF graphs using Abstract Models
 
Arrays in Databases, the next frontier?
Arrays in Databases, the next frontier?Arrays in Databases, the next frontier?
Arrays in Databases, the next frontier?
 
Abstract Access Control Model for Dynamic RDF Datasets
Abstract Access Control Model for Dynamic RDF DatasetsAbstract Access Control Model for Dynamic RDF Datasets
Abstract Access Control Model for Dynamic RDF Datasets
 
Towards Parallel Nonmonotonic Reasoning with Billions of Facts
Towards Parallel Nonmonotonic Reasoning with Billions of FactsTowards Parallel Nonmonotonic Reasoning with Billions of Facts
Towards Parallel Nonmonotonic Reasoning with Billions of Facts
 
Automation in Cytomics: A Modern RDBMS Based Platform for Image Analysis and ...
Automation in Cytomics: A Modern RDBMS Based Platform for Image Analysis and ...Automation in Cytomics: A Modern RDBMS Based Platform for Image Analysis and ...
Automation in Cytomics: A Modern RDBMS Based Platform for Image Analysis and ...
 
Heuristic based Query Optimisation for SPARQL
Heuristic based Query Optimisation for SPARQLHeuristic based Query Optimisation for SPARQL
Heuristic based Query Optimisation for SPARQL
 

Recently uploaded

Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
Uni Systems S.M.S.A.
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
Zilliz
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
panagenda
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
tolgahangng
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Speck&Tech
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
Tomaz Bratanic
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
kumardaparthi1024
 
“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
Claudio Di Ciccio
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
Kari Kakkonen
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
shyamraj55
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
名前 です男
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems S.M.S.A.
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
Neo4j
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
Neo4j
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
Kumud Singh
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
Neo4j
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
DianaGray10
 

Recently uploaded (20)

Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
 
“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
 

MonetDB/DataCell - Exploiting the Power of Relational Databases for Efficient Stream Processing

  • 1. MonetDB/DataCell: Exploiting the Power of Relational Databases for Efficient Stream Processing. CWI. Project Meeting@Innsbruck, Feb 28 - Mar 04, 2011. Wednesday, March 02, 2011
  • 2-3. DBMS versus DSMS
      • DBMS (one-time query; incoming data goes to a DB on disk storage):
          1. Store incoming tuples
          2. Submit one-time query
          3. Query processing on the already stored data
          4. Create answer
      • DSMS (continuous queries; the input stream is handled in memory):
          1. Submit continuous queries
          2. Incoming streams
          3. Input stream is processed on the fly
          4. The produced results are continuously delivered to the clients
      • A data stream is a never-ending sequence of tuples.
  • 4-8. One-time Queries versus Continuous Queries
      • One-time query: evaluated once over the already stored tuples.
      • Continuous query: waits for future incoming tuples and is evaluated continuously as new tuples arrive.
      • [Figure: timeline of data arrival (tn, tn+1) marking the arrival time of q for a one-time query and for a continuous query.]
  • 9. Observation
      • Nowadays stream systems are built from scratch.
      • Operators and optimizations are redesigned.
      • Relational databases are considered inefficient and too complex.
      • Modern stream applications require management of both stored and streaming data.
  • 10. Goals
      • We design the DataCell on top of an existing database kernel.
      • Exploit database techniques, query optimization and operators.
      • Provide full language functionality (SQL'03).
      • Research questions:
          • Is it viable?
          • Multi-query processing/scheduling
          • Real-time processing
  • 11. The Basic Idea of DataCell
      • Stream tuples are first stored in (appended to) baskets.
      • The continuous queries are evaluated over the baskets. Instead of throwing each incoming tuple against the waiting queries (the Data Streams approach), DataCell first collects the data and then throws the queries against the tuples (the DataBase approach).
      • Once a tuple is seen, it is dropped from its basket.
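The collect-then-query idea of slide 11 can be illustrated with a minimal Python sketch. It is not DataCell code: the `Basket` class, `run_queries`, and the two predicates are hypothetical names chosen for illustration, and the basket is assumed to be a plain in-memory list.

```python
class Basket:
    """Hypothetical in-memory basket: tuples are appended, then drained in bulk."""

    def __init__(self):
        self._tuples = []

    def append(self, tup):
        self._tuples.append(tup)

    def drain(self):
        """Return all buffered tuples and drop them from the basket."""
        batch, self._tuples = self._tuples, []
        return batch

    def __len__(self):
        return len(self._tuples)


def run_queries(basket, queries):
    """Throw every waiting query against the collected batch (set-at-a-time)."""
    batch = basket.drain()
    return {name: [t for t in batch if pred(t)] for name, pred in queries.items()}


# Collect first ...
basket = Basket()
for value in (12, 3, 100, 14):
    basket.append({"a": value})

# ... then evaluate the continuous queries over the whole batch.
queries = {
    "q_gt_10": lambda t: t["a"] > 10,
    "q_lt_50": lambda t: t["a"] < 50,
}
results = run_queries(basket, queries)
```

After the call, the basket is empty again, which mirrors the rule that a tuple is dropped once it has been seen.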
  • 12-13. The MonetDB/DataCell stack
      • MonetDB stack, as listed on the slide: SQL Query, SQL Query parser, Query Optimizer, MAL, MAL Interpreter, Query Executor.
      • DataCell additions (slide 13): "+ CQ" at the SQL Query parser, "+ DC opt" at the Query Optimizer, and a Continuous Query Scheduler.
  • 14. DataCell Components
      • Receptor: listens to a stream
      • Emitter: delivers events to the clients
      • Factory: continuous query
      • Basket: holds events
      • [Figure: Input Stream -> R -> Q -> E -> Output Stream]
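The four components of slide 14 can be wired together in a small Python sketch. This is an illustration under assumptions, not the DataCell implementation: the function names `receptor`, `factory`, and `emitter` simply echo the slide's terms, baskets are modelled as `collections.deque` objects, and the stream is a plain list.

```python
from collections import deque


def receptor(stream, basket):
    """Listen to a stream: append each incoming event to the input basket."""
    for event in stream:
        basket.append(event)


def factory(in_basket, out_basket, predicate):
    """Continuous query: consume the input basket, keep qualifying events."""
    while in_basket:
        event = in_basket.popleft()
        if predicate(event):
            out_basket.append(event)


def emitter(basket, deliver):
    """Deliver every event held in the output basket to a client callback."""
    while basket:
        deliver(basket.popleft())


in_basket, out_basket, delivered = deque(), deque(), []
receptor([5, 42, 7, 19], in_basket)               # R
factory(in_basket, out_basket, lambda a: a > 10)  # Q
emitter(out_basket, delivered.append)             # E
```

Each stage only talks to baskets, so receptors, factories, and emitters stay decoupled from one another.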
  • 15-19. DataCell Architecture
      • [Figure: architecture diagram. Labels: SQL Compiler, SPARQL Compiler (added on slide 19), MAL Optimizer, Data Columns, Disk Storage, and the DataCell with receptors R1-R3, emitters E1-E3, baskets holding id columns, and the Continuous Query Scheduler. Legend: Basket, Receptor, Emitter, Factory.]
  • 20-26. Basket Expressions
      • Syntax: an SQL sub-query surrounded by square brackets.
      • Semantics: all qualifying tuples in a basket expression are removed by the factories.
      • Tumbling window:
          Q1: SELECT * FROM [SELECT * FROM X TOP 3] AS S WHERE S.a > 10;
      • Sliding window:
          Q2: SELECT * FROM ([SELECT * FROM X TOP 1] UNION SELECT * FROM X TOP 2 OFFSET 1) AS S WHERE S.a > 10;
      • [Animation: the basket holds 12, 3, 100, 14; Q1 and Q2 are each shown producing 12 and 100.]
      • Flexible, expressive continuous queries, by selectively picking the data to process from a basket.
      • Allows processing of predicate windows on a stream.
      • Out-of-order processing.
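The difference between Q1 and Q2 can be traced with a short Python sketch that uses the values shown on the slide (12, 3, 100, 14). The helper names `tumbling_step` and `sliding_step` are hypothetical, and the sketch assumes that a query step waits until a full window of three tuples is available, which the slide does not state.

```python
def tumbling_step(basket, size=3, threshold=10):
    """Q1-style step: the bracketed sub-query consumes the whole window."""
    if len(basket) < size:
        return None  # assumption: wait for a full window
    window = basket[:size]
    del basket[:size]  # all `size` tuples sit inside [...] and are removed
    return [a for a in window if a > threshold]


def sliding_step(basket, size=3, slide=1, threshold=10):
    """Q2-style step: only the bracketed `slide` tuples are consumed."""
    if len(basket) < size:
        return None  # assumption: wait for a full window
    window = basket[:size]
    del basket[:slide]  # the remaining tuples are read but stay in the basket
    return [a for a in window if a > threshold]


x = [12, 3, 100, 14]
q1_first = tumbling_step(x)   # [12, 100]; x is now [14]

y = [12, 3, 100, 14]
q2_first = sliding_step(y)    # [12, 100]; y is now [3, 100, 14]
q2_second = sliding_step(y)   # [100, 14]; y is now [100, 14]
```

Both queries see the same first window, but only the sliding variant leaves tuples behind for the next evaluation.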
  • 27. Query processing strategies: Separate Baskets
      • Each continuous query is encapsulated within a single factory.
      • Each factory f has its own input baskets, which are accessed only by f.
      • If more than one factory is interested in the same data, multiple copies of that data are created.
      • Factories are completely independent.
      • The column-store is exploited to minimize the overhead of replication.
      • [Figure: basket b feeds a copy factory Qcopy, which fills bcopy1, bcopy2, bcopy3 for Q1, Q2, Q3.]
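The separate-baskets strategy can be sketched in Python as follows. The names `copy_factory` and `run_factory` are hypothetical stand-ins for Qcopy and Q1..Q3 in the figure, and baskets are assumed to be plain lists; the sketch does not model the column-store optimizations mentioned on the slide.

```python
def copy_factory(source, private_baskets):
    """Qcopy: move every tuple of the source basket into each private copy."""
    batch = list(source)
    source.clear()
    for basket in private_baskets.values():
        basket.extend(batch)


def run_factory(basket, predicate):
    """A query factory: consume its own basket, independent of all others."""
    batch = list(basket)
    basket.clear()
    return [t for t in batch if predicate(t)]


b = [12, 3, 100, 14]
copies = {"Q1": [], "Q2": [], "Q3": []}
copy_factory(b, copies)

# Q1 drains only its private copy; the copies of Q2 and Q3 are untouched.
out_q1 = run_factory(copies["Q1"], lambda a: a > 10)
```

The price of this independence is one copy of the data per interested factory.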
  • 28-32. Query processing strategies: Shared Baskets
      • Exploit query similarities to avoid replication.
      • Baskets are shared among factories.
      • Two new (cheap) factories: Locker and Unlocker.
      • [Figure: basket b is locked (Lock) by locker factories FL1-FL3, read by Q1-Q3, and released (Unlock) by unlocker factories FU1-FU3.]
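One way to picture the shared-baskets strategy is the Python sketch below. The slide does not detail how the Locker and Unlocker factories coordinate, so the protocol here is an assumption: a single lock call opens a round, queries read without removing tuples, and the last unlocker to finish drops the tuples and releases the basket. `SharedBasket`, `locker`, `query`, and `unlocker` are hypothetical names.

```python
class SharedBasket:
    """Hypothetical basket shared by several query factories."""

    def __init__(self, readers):
        self.tuples = []
        self.readers = set(readers)
        self.pending = set()
        self.locked = False

    def append(self, tup):
        if self.locked:
            raise RuntimeError("basket is locked for a query round")
        self.tuples.append(tup)


def locker(basket):
    """Start a round: lock the basket and mark every reader as pending."""
    basket.locked = True
    basket.pending = set(basket.readers)


def query(basket, predicate):
    """Read the shared tuples without removing them."""
    return [t for t in basket.tuples if predicate(t)]


def unlocker(basket, reader):
    """Mark a reader done; the last one drops the tuples and unlocks."""
    basket.pending.discard(reader)
    if not basket.pending:
        basket.tuples.clear()
        basket.locked = False


b = SharedBasket(["Q1", "Q2", "Q3"])
for value in (12, 3, 100, 14):
    b.append(value)

locker(b)
r1 = query(b, lambda a: a > 10)
unlocker(b, "Q1")
still_shared = len(b.tuples)  # tuples remain until all readers are done
r2 = query(b, lambda a: a < 50)
unlocker(b, "Q2")
r3 = query(b, lambda a: a % 2 == 1)
unlocker(b, "Q3")
```

Compared with separate baskets, no tuple is copied; the cost moves into the two cheap bookkeeping steps around each query.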
  • 33. Summary + = DataCell