Efficient Window Aggregation with General Stream Slicing (EDBT 2019, Best Paper)

Jonas Traub, Philipp M. Grulich, Alejandro Rodríguez Cuéllar, Sebastian Breß,
Asterios Katsifodimos, Tilmann Rabl, Volker Markl
22nd International Conference on Extending Database Technology
March 26-29, 2019, Lisbon, Portugal

Stream Processing Pipelines
27.03.2019 Efficient Window Aggregation with General Stream Slicing
A stream processing pipeline is a series of concurrently running operators.

[Figure: stream processing pipeline containing Window Aggregation operators]

Motivation

Stream Slicing Example

The number of slices depends on the workload.

We store partial aggregates instead of all tuples. → Small memory footprint.

We assign each tuple to exactly one slice. → O(1) per-tuple complexity.

We require just a few computation steps to calculate final aggregates. → Low latency.

We share partial aggregates among all users and queries. → Efficiency by avoiding redundant computation.
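
The four points above can be illustrated with a minimal sketch. It assumes fixed-length time slices and a sum aggregation; the class `SlicedSum` and its methods are hypothetical names chosen for illustration, not part of the presented system.

```python
class SlicedSum:
    """Keeps one partial sum per fixed-length time slice (illustrative only)."""

    def __init__(self, slice_length):
        self.slice_length = slice_length
        self.partials = {}  # slice start time -> partial sum

    def add(self, timestamp, value):
        # Each tuple updates exactly one slice, so the work per tuple is constant.
        start = (timestamp // self.slice_length) * self.slice_length
        self.partials[start] = self.partials.get(start, 0) + value

    def window_sum(self, window_start, window_end):
        # A final aggregate combines the partial sums of the slices that fall
        # into [window_start, window_end). Assumption: window edges align
        # with slice edges.
        return sum(partial for start, partial in self.partials.items()
                   if window_start <= start < window_end)


slicer = SlicedSum(slice_length=5)
for ts, value in [(1, 4), (3, 1), (6, 2), (9, 7), (12, 3), (17, 5)]:
    slicer.add(ts, value)

# Two queries reuse the same slices: a tumbling window [0, 10)
# and an overlapping sliding window [5, 15).
tumbling_0_10 = slicer.window_sum(0, 10)
sliding_5_15 = slicer.window_sum(5, 15)
```

Only four partial sums are stored for six tuples, and both window results are computed from those shared partials without touching the original tuples.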

Workload Characteristics

• Aggregation Functions: distributive, algebraic, holistic; associativity, commutativity, invertibility
• Window Types: Context Free, Forward Context Free, Forward Context Aware
• Window Measures: time, tuple count, arbitrary
• Stream Order: in-order, out-of-order

General Stream Slicing combines generality and efficiency in a single solution.
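
To make the aggregation-function characteristics concrete, the following sketch models an aggregation as lift, combine, and lower steps with an optional invert step. The `Aggregation` class and the four instances are assumptions made for illustration; the slides do not prescribe this interface.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Aggregation:
    lift: Callable[[Any], Any]           # tuple value -> partial aggregate
    combine: Callable[[Any, Any], Any]   # merge two partial aggregates
    lower: Callable[[Any], Any]          # partial aggregate -> final result
    invert: Optional[Callable[[Any, Any], Any]] = None  # remove b from a, if possible


# Distributive and invertible: the partial aggregate is the result itself.
SUM = Aggregation(lambda v: v, lambda a, b: a + b, lambda p: p,
                  invert=lambda a, b: a - b)

# Algebraic: a fixed-size partial (sum, count) is lowered to the average.
AVG = Aggregation(lambda v: (v, 1),
                  lambda a, b: (a[0] + b[0], a[1] + b[1]),
                  lambda p: p[0] / p[1],
                  invert=lambda a, b: (a[0] - b[0], a[1] - b[1]))

# Distributive but not invertible: a removed maximum cannot be undone.
MAX = Aggregation(lambda v: v, max, lambda p: p)

# Holistic: the partial aggregate grows with the input (lower median here).
MEDIAN = Aggregation(lambda v: [v], lambda a, b: sorted(a + b),
                     lambda p: p[(len(p) - 1) // 2])


def aggregate(agg, values):
    """Fold a non-empty list of values into one partial aggregate."""
    partial = agg.lift(values[0])
    for v in values[1:]:
        partial = agg.combine(partial, agg.lift(v))
    return partial


values = [1, 2, 1, 4, 3]
results = {name: agg.lower(aggregate(agg, values))
           for name, agg in [("sum", SUM), ("avg", AVG),
                             ("max", MAX), ("median", MEDIAN)]}
sum_without_3 = SUM.invert(aggregate(SUM, values), SUM.lift(3))
```

All four combine functions here are associative and commutative, so partial aggregates can be merged in any order; only `SUM` and `AVG` offer an `invert` step.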

Window Aggregation Concepts
Variations of Stream Slicing
Non-Slicing Techniques

General Slicing Core

The General Slicing Core adapts to workload characteristics
and provides extension points for user-defined window types and aggregation functions.

General Stream Slicing Internals

Part 1: Three Fundamental Operations on Slices

• Merge Slices
• Split Slices
• Update Slices
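
A minimal sketch of the three operations, assuming time-based slices, a sum aggregation, and that original tuples are kept so a split can recompute both halves. The `Slice` class and the function names are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass, field


@dataclass
class Slice:
    start: int        # inclusive start time
    end: int          # exclusive end time
    partial: int = 0  # partial aggregate (a sum in this sketch)
    tuples: list = field(default_factory=list)  # (timestamp, value) pairs


def update(s, timestamp, value):
    """Update: add one tuple to a slice and adjust its partial aggregate."""
    s.tuples.append((timestamp, value))
    s.partial += value


def merge(a, b):
    """Merge: combine two adjacent slices (a must end where b starts)."""
    return Slice(a.start, b.end, a.partial + b.partial, a.tuples + b.tuples)


def split(s, at):
    """Split: cut a slice at time `at`, recomputing both partial aggregates
    from the stored tuples."""
    left, right = Slice(s.start, at), Slice(at, s.end)
    for ts, value in s.tuples:
        update(left if ts < at else right, ts, value)
    return left, right


a = Slice(0, 10)
update(a, 2, 4)
update(a, 7, 1)
b = Slice(10, 20)
update(b, 12, 3)

merged = merge(a, b)
left, right = split(merged, 5)
```

In this sketch, merging needs only the partial aggregates, while splitting relies on the stored tuples, which is why a workload that never splits could drop them.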

Part 2: Adapt to Workload Characteristics:

Do we need to store original tuples?

Do we potentially need to split slices?

Do we potentially need to remove tuples from slices?

General Stream Slicing adapts to current workload characteristics.

Impact of Workload Characteristics (Example)

Count-based tumbling window with a length of 5 tuples.

Tuple Value:   1  2  1  4  3 |  1  5  2  2  3 |  6  1  2  2  1
Tuple Count:   1  2  3  4  5 |  6  7  8  9 10 | 11 12 13 14 15
Event Time:    5 12 13 20 35 | 37 42 46 48 51 | 52 57 63 64 65
Slice Aggregate (sum): 11 | 13 | 12

What if the stream is out-of-order?

Out-of-order Tuple: value 5, event time 49

The tuple belongs between event times 48 and 51, so it enters the second slice. The last tuple of each affected slice shifts into the following slice.

Second slice: 13 + 5 - 3
Third slice: 12 + 3 - 1

What if the aggregation function is not invertible?
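
The example can be replayed with a short sketch. It assumes a sum aggregation and stores the tuples of every slice; `build_slices` and `insert_out_of_order` are hypothetical helpers written for this illustration, not the implementation behind the slides.

```python
import bisect

WINDOW_LENGTH = 5  # count-based tumbling window of 5 tuples

values = [1, 2, 1, 4, 3, 1, 5, 2, 2, 3, 6, 1, 2, 2, 1]
times = [5, 12, 13, 20, 35, 37, 42, 46, 48, 51, 52, 57, 63, 64, 65]


def build_slices(tuples):
    """Cut (event_time, value) pairs, sorted by time, into slices of 5."""
    return [tuples[i:i + WINDOW_LENGTH]
            for i in range(0, len(tuples), WINDOW_LENGTH)]


def insert_out_of_order(slices, sums, late):
    """Insert a late tuple and shift the last tuple of every affected slice
    into the next one. Sums are patched with + and -, which relies on the
    aggregation being invertible."""
    idx = 0
    while idx < len(slices) - 1 and slices[idx][-1][0] < late[0]:
        idx += 1
    carry = late
    for i in range(idx, len(slices)):
        bisect.insort(slices[i], carry)  # keep the slice sorted by event time
        sums[i] += carry[1]
        if len(slices[i]) <= WINDOW_LENGTH:
            carry = None
            break
        carry = slices[i].pop()          # last tuple moves to the next slice
        sums[i] -= carry[1]
    if carry is not None:                # overflow opens a new slice
        slices.append([carry])
        sums.append(carry[1])


slices = build_slices(list(zip(times, values)))
sums = [sum(v for _, v in s) for s in slices]
sums_before = list(sums)

insert_out_of_order(slices, sums, (49, 5))  # out-of-order tuple from the slide
```

After the insertion the second slice holds 13 + 5 - 3 = 15 and the third 12 + 3 - 1 = 14, and the shifted last tuple opens a fourth slice. For a non-invertible function such as max, the subtraction step is not available; one option is to recompute the affected slice aggregates from the stored tuples.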

In-order Processing with Context Free Windows

Slicing techniques scale to large numbers of concurrent windows.

Impact of Stream Order

Slicing techniques are robust against out-of-order tuples.

Impact of Aggregation Functions (20% out-of-order)

Stream Slicing performs well on many different kinds of aggregation functions.

Efficient Window Aggregation with General Stream Slicing

• We identify workload characteristics which impact
applicability and performance of window aggregation techniques.

• We present a generally applicable and highly efficient solution for
streaming window aggregation.

• We show that general stream slicing is generally applicable and
offers better performance than alternative approaches.

Open Source Repository: tu-berlin-dima.github.io/scotty-window-processor
