Optimization of Continuous Queries in Federated
Database and Stream Processing Systems
Yuanzhen Ji1, Zbigniew Jerzak1, Anisoara Nica1, Gregor Hackenbroich1,
Christof Fetzer2
1SAP SE 2TU Dresden
1firstname.lastname@sap.com 2christof.fetzer@tu-dresden.de
March 16, 2015 BTW 2015
Agenda
• Introduction
• Federated Continuous Query Execution
• Query Optimization Problem
• Our Optimization Solution
• Evaluation
• Conclusions
Introduction
• Problem: optimizing continuous queries (CQ) for federated execution over
a native stream processing engine (SPE) and a column-oriented in-memory
database (CIMDB).
– operators: select, join, project, aggregate
• Goal: maximize query throughput (amount of data processed per unit time)
[Figure: data flow between the SPE and the CIMDB; data streams in, query results out.]
Introduction
• Motivation:
– “No one size fits all” (Cyclops [LHB13], [Ji13])
– obtain the best of both worlds (SPE, CIMDB)
• Application Scenario:
– analyzing energy consumption data collected from smart plugs
installed in households (DEBS 2014 Grand Challenge)
• Main contributions:
– a static cost-based optimizer for federated systems
• extends established optimization techniques
• considers the feasibility property of CQ
– shows the potential of federated CQ execution over an SPE and a CIMDB
• throughput up to 8.5x that of pure SPE-based processing
• throughput up to 1.8x that of pure CIMDB-based processing
Federated Continuous Query Execution
• send relevant input data from SPE to CIMDB
• trigger re-evaluation of query pieces moved to CIMDB
• take results of query pieces executed in CIMDB back to SPE
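The three steps above can be sketched as a toy round trip. This is only an illustration under loud assumptions: an in-memory SQLite database stands in for the CIMDB, and the function `migrate_and_evaluate`, the table `readings`, and its columns are hypothetical names, not the interface of the system described here.

```python
import sqlite3

# Assumption: an in-memory SQLite database plays the role of the CIMDB.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE readings (plug_id INTEGER, load_w REAL)")


def migrate_and_evaluate(batch):
    """Hypothetical MIG-like step: ship a batch, re-run the query piece, return rows."""
    # 1. Send the relevant input data from the stream side to the database.
    db.execute("DELETE FROM readings")
    db.executemany("INSERT INTO readings VALUES (?, ?)", batch)
    # 2. Trigger re-evaluation of the query piece that was moved to the database.
    rows = db.execute(
        "SELECT plug_id, AVG(load_w) FROM readings "
        "GROUP BY plug_id ORDER BY plug_id"
    ).fetchall()
    # 3. Hand the results back so downstream stream operators can consume them.
    return rows


result = migrate_and_evaluate([(1, 10.0), (1, 20.0), (2, 5.0)])
print(result)
```

Each call replaces the table contents with the current batch, so the SQL query always sees exactly the data shipped for this evaluation.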
[Figure: federated execution. Labels: SPE, CIMDB, data streams, query results, MIG operators, SQL query, data flow.]
Query Optimization Problem
• Problem: determine the optimal execution
plan for a given CQ
– currently at deployment time
• Feasibility of continuous queries [AN04]:
– feasible execution plan: can keep up
with data arrival rate
– feasible query: has at least one feasible plan
• Feasibility-dependent optimization objective:
– feasible queries: find the feasible plan with least resource consumption
– infeasible queries: find the plan with maximal throughput
• State of the art: existing approaches consider either the feasibility of CQ
but not the federation context, or the federation context but not the
feasibility of CQ.
Optimization Solution
Cost Model – Operator Cost (1)
• Operator cost: CPU cost caused by the tuples that arrive from the data sources
within unit time
For an operator $O$ with $k$ direct upstream operators:
– $l_i$: number of tuples produced by the i-th upstream operator as a result of
unit-time source arrivals
– $c_i$: time to process a single tuple from the i-th upstream operator
$u(O) = \sum_{i=1}^{k} l_i c_i$
$u > 1$ → bottleneck → infeasible plan
Example ($k = 2$, $l_1 = 300$, $c_1 = 0.001$, $l_2 = 200$, $c_2 = 0.002$):
$u(O) = l_1 c_1 + l_2 c_2 = 300 \cdot 0.001 + 200 \cdot 0.002 = 0.7$
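The operator cost formula can be checked with a few lines of Python. This is a minimal sketch; the function name `operator_utilization` and the representation of inputs as `(l_i, c_i)` pairs are assumptions made for illustration.

```python
def operator_utilization(inputs):
    """Return u(O): the sum of l_i * c_i over all direct upstream operators.

    `inputs` is a list of (l_i, c_i) pairs, where l_i is the number of tuples
    arriving per unit time and c_i is the per-tuple processing time.
    """
    return sum(l * c for l, c in inputs)


# Values from the slide example: l1 = 300, c1 = 0.001, l2 = 200, c2 = 0.002.
u = operator_utilization([(300, 0.001), (200, 0.002)])
is_bottleneck = u > 1  # u > 1 means the operator cannot keep up
print(round(u, 3), is_bottleneck)
```

With the example values the utilization stays below 1, so this operator alone does not make the plan infeasible.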
Optimization Solution
Cost Model – Operator Cost (2)
• A query piece executed in CIMDB and its corresponding MIG operator:
– treated as a composite operator and costed as a whole
– cost includes data transfer (in & out) cost and query execution cost
[Figure: federated execution with one MIG operator. Labels: SPE, CIMDB, data streams, query results, SQL query, MIG, data flow.]
Optimization Solution
Cost Model – Execution Plan Cost
• Execution plan cost: $C(P) = \langle C_b(P), C_u(P) \rangle$, where $m$ is the number of operators in $P$
– bottleneck cost: $C_b(P) = \max\{u(O_j) : j \in [1, m]\}$
– total utilization cost: $C_u(P) = \sum_{j=1}^{m} u(O_j)$
– $P$ is infeasible if $C_b(P) > 1$
Example (plan with operators O1 to O4): $u(O_1) = 0.5$, $u(O_2) = 0.3$, $u(O_3) = 1.1$, $u(O_4) = 0.7$, so $C_b(P) = 1.1$ and $C_u(P) = 2.6$
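The two cost components can be computed directly from the per-operator utilizations. The sketch below is illustrative only; `plan_cost` and `is_feasible` are hypothetical helper names, and the input is assumed to be a plain list of u(O_j) values.

```python
def plan_cost(utilizations):
    """Return (C_b, C_u): the bottleneck cost and the total utilization cost."""
    return max(utilizations), sum(utilizations)


def is_feasible(utilizations):
    """A plan is feasible when no operator has utilization above 1."""
    return max(utilizations) <= 1


# Utilizations of O1..O4 from the slide example.
us = [0.5, 0.3, 1.1, 0.7]
cb, cu = plan_cost(us)
print(cb, round(cu, 3), is_feasible(us))
```

In the example, O3 is the bottleneck, so the plan is infeasible even though the other three operators keep up.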
Optimization Solution
Optimal Execution Plan
• An execution plan P of a CQ is an optimal plan iff, for any other plan P’ of
the CQ, one of the following conditions is satisfied:
– Condition 1: P is feasible but P’ is infeasible
(Cb(P) ≤ 1 < Cb(P’))
– Condition 2: both P and P’ are feasible, and P has no higher total utilization cost
(Cb(P) ≤ 1, Cb(P’) ≤ 1, and Cu(P) ≤ Cu(P’))
– Condition 3: both P and P’ are infeasible, and P has no higher bottleneck cost
(1 < Cb(P) ≤ Cb(P’))
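The three conditions translate into a pairwise comparison on plan costs. The following is a sketch under the assumption that each plan is represented only by its cost pair (Cb, Cu); the names `no_worse` and `pick_optimal` are hypothetical.

```python
def no_worse(p, q):
    """True if cost pair p = (Cb, Cu) satisfies one of the three conditions against q."""
    cb_p, cu_p = p
    cb_q, cu_q = q
    if cb_p <= 1 < cb_q:           # Condition 1: p feasible, q infeasible
        return True
    if cb_p <= 1 and cb_q <= 1:    # Condition 2: both feasible, compare Cu
        return cu_p <= cu_q
    if cb_p > 1 and cb_q > 1:      # Condition 3: both infeasible, compare Cb
        return cb_p <= cb_q
    return False                   # p infeasible while q is feasible


def pick_optimal(costs):
    """Return the first cost pair that is no worse than every candidate."""
    return next(p for p in costs if all(no_worse(p, q) for q in costs))


mixed = [(1.1, 2.6), (0.9, 2.0), (0.8, 2.3)]
all_infeasible = [(1.5, 1.0), (1.2, 3.0)]
print(pick_optimal(mixed), pick_optimal(all_infeasible))
```

Among feasible plans the total utilization decides, so (0.9, 2.0) wins over (0.8, 2.3); when every plan is infeasible, the lowest bottleneck cost wins regardless of total utilization.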
Optimization Solution
Two-Phase Optimization
• Large search space (# possible plans):
– many semantically equivalent logical plans
– a logical plan with n operators -> $2^n$ possible placement decisions
• Two-phase optimization:
– Phase One: determine the optimal logical plan (consider join ordering,
etc.)
– Phase Two: determine the placement of each operator in the logical plan
produced in Phase One.
• Bottom-up plan construction following the dynamic programming (DP) model
• The paper proves that DP is applicable to the feasibility-dependent
optimization objective.
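To make the size of the placement search space concrete, the sketch below enumerates every placement exhaustively. It is not the DP-based optimizer described above, only a brute-force illustration; the operator names and the `placements` helper are assumptions.

```python
from itertools import product


def placements(operators):
    """Yield every assignment of the given operators to 'SPE' or 'DB'."""
    for sites in product(("SPE", "DB"), repeat=len(operators)):
        yield dict(zip(operators, sites))


# A logical plan with n = 6 operators has 2**6 placement decisions.
ops = ["O1", "O2", "O3", "O4", "O5", "O6"]
count = sum(1 for _ in placements(ops))
print(count)
```

The count doubles with every additional operator, which is why exhaustive enumeration quickly becomes impractical for larger plans.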
Optimization Solution
Pruning in Phase Two
• For each operator O in a logical plan, the optimal sub-plan up to O, where
O is placed in the SPE, can be built from the optimal sub-plans up to the direct
upstream operators of O.
• For a large logical plan: divide it into smaller pieces, then optimize and
compose them in post order.
[Figure: two candidate sub-plans up to O2, both placing O2 in the SPE. I1 places O1 in the SPE, I2 places O1 in the DB; C(I1) < C(I2).]
Evaluation
Setup
• Setup: HP Z620 workstation with 24 cores (1.2 GHz per core) and 96 GB
RAM, running SUSE Linux.
• Data: real-world energy consumption data from smart plugs installed in
households (DEBS 2014 Grand Challenge).
• Tested queries:
Evaluation
Optimizer effectiveness (1)
• Examine 10 source stream data rates picked from the
range [1,000, 40,000] (tuples/s)
• Measure the throughput of the optimal plan devised by the optimizer
[Figure: tested query plan with operators PROJECT, INNER JOIN, AGGR (avg), AGGR (cnt), two SELECTs and two WINDOWs (5 min).]
[Figure: actual vs. requested throughput (thousand/s). Series: SELECT in SPE.]
[Figure: max. throughput comparison (thousand/s): SELECT in SPE 26.1, All in SPE 3.1, All in DB 18.7.]
Evaluation
Optimizer effectiveness (2)
• Examine data rates ranging from 1,000 to 40,000
tuples/s, in increments of 1,000 tuples/s
• Measure the throughput of the optimal plan devised by the optimizer
[Figure: tested query plan with operators PROJECT, INNER JOIN, two AGGR (avg, max), two SELECTs, WINDOW (5 min) and WINDOW (1 min).]
[Figure: actual vs. requested throughput (thousand/s). Series: P1 (SELECT in SPE) and P2 (SEL, JOIN, P in SPE).]
[Figure: max. throughput comparison (thousand/s): SELECT in SPE (P1) 18.1, SEL, JOIN, P in SPE (P2) 28.6, All in SPE 6.0, All in DB 18.0.]
Evaluation
Influence of Feasibility Check
[Figure: actual vs. requested throughput (thousand/s) for the query plan with PROJECT, INNER JOIN, two AGGR (avg, max), two SELECTs, WINDOW (5 min) and WINDOW (1 min). Series: SELECT in SPE (with feasibility check); SEL, JOIN, P in SPE (with feasibility check); SEL in SPE (without feasibility check).]
Evaluation
Optimization Time
• Tested with join queries (2-way, 5-way, 8-way).
[Figure: number of enumerated plans in Phase Two (log scale) for the 2-way (6), 5-way (15) and 8-way (24) join queries. With pruning: 11, 312, 8411. Without pruning: 64, 327168, 16+ million.]
[Figure: optimization time in milliseconds (log scale) for the same queries, split into Phase One and Phase Two. Values shown: 0.9, 68.6, 100.5, 12.3, 908.6, 61335.3.]
[Figure: query plan with PROJECT, INNER JOIN, two AGGR (avg, max), two SELECTs, WINDOW (5 min) and WINDOW (1 min).]
Conclusion
• Exploits the potential of federated execution of CQ over an SPE and a CIMDB.
• Presents a static optimizer which extends traditional optimization
techniques to consider the feasibility of CQ.
• Evaluation shows promising results.
For the examined queries, the throughput of the devised federated plan is
– up to 8.5 times that of the pure SPE-based plan
– up to 1.8 times that of the pure CIMDB-based plan
References
[AN04] Ayad, A. M. & Naughton, J. F., Static Optimization of Conjunctive Queries with Sliding Windows over Infinite Streams, SIGMOD, 2004
[FKC+09] Franklin, M. J.; Krishnamurthy, S.; Conway, N.; Li, A.; Russakovsky, A. & Thombre, N., Continuous Analytics: Rethinking query processing in a network-effect world, CIDR, 2009
[KS09] Kraemer, J. & Seeger, B., Semantics and implementation of continuous sliding window queries over data streams, ACM TODS, 2009
[BCD+10] Botan, I.; Cho, Y.; Derakhshan, R.; Dindar, N.; Gupta, A.; Haas, L. M.; Kim, K.; Lee, C.; Mundada, G.; Shan, M.-C.; Tatbul, N.; Yan, Y.; Yun, B. & Zhang, J., A demonstration of the MaxStream federated stream processing system, ICDE, 2010
[LMB+10] Liu, M.; Mihaylov, S. R.; Bao, Z.; Jacob, M.; Ives, Z. G.; Loo, B. T. & Guha, S., SmartCIS: integrating digital and physical environments, SIGMOD Record, 2010
[LIM+12] Liarou, E.; Idreos, S.; Manegold, S. & Kersten, M., MonetDB/DataCell: online analytics in a streaming column-store, PVLDB, 2012
[LHB13] Lim, H.; Han, Y. & Babu, S., How to Fit when No One Size Fits, CIDR, 2013
[Ji13] Ji, Y., Database support for processing complex aggregate queries over data streams, EDBT Workshops, 2013
[CDK+14] Çetintemel, U.; Du, J.; Kraska, T.; Madden, S.; Maier, D.; Meehan, J.; Pavlo, A.; Stonebraker, M.; Sutherland, E.; Tatbul, N.; Tufte, K.; Wang, H. & Zdonik, S. B., S-Store: A streaming NewSQL system for big velocity applications, PVLDB, 2014
[DLB+11] Daum, M.; Lauterwald, F.; Baumgärtel, P.; Pollner, N. & Meyer-Wegener, K., Efficient and Cost-aware Operator Placement in Heterogeneous Stream-processing Environments, DEBS, 2011
Thank you!
Query Optimization Problem
State-of-the-Art
Approach                                CQ opt.   Federation   Granularity   Feasibility-dep. opt.
[VN02, AN04]                            √                      operator      √
Traditional distributed, federated
DBMS, e.g., [DH02, BCE+05]                        √            operator
MaxStream [BCD+10]                                √
Cyclops [LHB13]                         √         √            query
ASPEN [LMB+10]                          √         √            operator
Operator placement, e.g., [DLB+11]      √         √/X          operator
query
Semantics
• Adopt the abstract semantics defined in [ABW06], which is based on:
– Two data types:
• Stream (S): a possibly infinite bag of elements <s, t>, where s is a
tuple belonging to the schema of S and t is the timestamp of s.
• Time-varying Relation (R): a mapping from the time domain T to a finite but
unbounded bag of tuples belonging to the schema of R.
– Three classes of query operators:
• stream-to-relation (S2R) operators: produce one relation from one
stream (e.g., window operators)
• relation-to-relation (R2R) operators: produce one relation from
one or more relations.
• relation-to-stream (R2S) operators: produce one stream from one
relation.
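The three operator classes can be illustrated with a toy pipeline. This is a sketch in the spirit of the abstract semantics, not an implementation from the paper: streams are assumed to be Python lists of `(tuple, timestamp)` pairs, relations are lists used as bags, and the helper names `time_window`, `select` and `istream` are hypothetical.

```python
from collections import Counter


def time_window(stream, t, size):
    """S2R: the relation at time t holds tuples with timestamp in (t - size, t]."""
    return [s for (s, ts) in stream if t - size < ts <= t]


def select(relation, predicate):
    """R2R: keep only the tuples that satisfy the predicate."""
    return [s for s in relation if predicate(s)]


def istream(relation_now, relation_before, t):
    """R2S: emit, stamped with t, the tuples present now but not before (bag difference)."""
    added = Counter(relation_now) - Counter(relation_before)
    return [(s, t) for s in added.elements()]


# Toy stream of (plug, load) readings with integer timestamps.
stream = [(("plug1", 40), 1), (("plug2", 90), 2), (("plug1", 95), 3)]


def high(s):
    return s[1] > 80


r2 = select(time_window(stream, 2, 2), high)
r3 = select(time_window(stream, 3, 2), high)
out = istream(r3, r2, 3)
print(out)
```

The window turns the stream into a relation at each time instant, the selection maps relation to relation, and the insert-stream step turns the change between two instants back into a stream.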
Introduction
From DBMS to SPE
• Increasing interest in processing high-velocity data streams generated in
real time using continuous queries (CQ).
→ Need a new processing paradigm
[Figure: a DBMS evaluates one-shot queries over stored data and returns query results; an SPE evaluates a continuous query over streaming data and emits query results.]
Introduction
From DBMS to SPE
• However, many applications require:
– persisting input streaming data/query results for on-demand analysis
– combining streaming data with static data during processing.
[Figure: a DBMS (one-shot queries, stored data, query results) next to an SPE (continuous query, streaming data, query results), connected by "store data" and "access stored data".]
Introduction
Build SPE on Top of DBMS Kernel
• Exploit and merge technologies from both worlds in an integrated way.
– Truviso Continuous Analytics [FKC+09], HP Lab work [CH10], DataCell
[LIM+12], S-Store [CDK+14]
[Figure: a combined SPE + DBMS serving one-shot queries over stored data and a continuous query over streaming data, each producing query results. Labels: in-memory table, buffers in UDFs.]
