Flink Forward 2018: Efficient Window Aggregation with Stream Slicing

Efficient Window Aggregation with Stream Slicing
Berlin, September 3-5, 2018
Philipp M. Grulich
Research Assistant (DFKI)
Jonas Traub
Research Associate (TU Berlin)

Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Stream Slicing Example

Stream Slicing Research
CIKM 2016
ICDE 2018

Flink Windowing Bottlenecks
Example Query:
.window(SlidingEventTimeWindows.of(Time.minutes(1), Time.seconds(10)))
.sum()
Processing with Buckets:
Events are <event time, value>; buckets are <start:end, sum>.
Event <4,3> --> Buckets: <0:60, 3>
Event <15,6> --> Buckets: <0:60, 9>, <10:70, 6>
Event <55,6> --> Buckets: <0:60, 15>, <10:70, 12>, ...
Event <66,1> --> Buckets: <10:70, 13>, ..., <60:120, 1>
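
The bucket updates above can be replayed with a small, self-contained Java sketch. It is not Flink code: the class `BucketSlidingSum` and its methods are hypothetical names for illustration, timestamps are assumed to be in seconds, and (as in the example) windows that would start before time 0 are ignored.

```java
import java.util.TreeMap;

public class BucketSlidingSum {
    // Bucket start (seconds) -> running sum; a bucket covers [start, start + size).
    private final TreeMap<Long, Long> buckets = new TreeMap<>();
    private final long size;
    private final long slide;
    private long updates = 0; // how many per-bucket aggregate updates were performed

    public BucketSlidingSum(long size, long slide) {
        this.size = size;
        this.slide = slide;
    }

    // Adds one event to every bucket (window) whose range contains its timestamp.
    public void add(long timestamp, long value) {
        long lastStart = timestamp - Math.floorMod(timestamp, slide);
        // Assumption: like the example, windows starting before time 0 are ignored.
        for (long start = lastStart; start > timestamp - size && start >= 0; start -= slide) {
            buckets.merge(start, value, Long::sum);
            updates++;
        }
    }

    public long sum(long bucketStart) {
        return buckets.getOrDefault(bucketStart, 0L);
    }

    public long updates() {
        return updates;
    }

    public static void main(String[] args) {
        BucketSlidingSum b = new BucketSlidingSum(60, 10);
        long[][] events = {{4, 3}, {15, 6}, {55, 6}, {66, 1}};
        for (long[] e : events) {
            b.add(e[0], e[1]);
        }
        System.out.println("<0:60, " + b.sum(0) + ">");
        System.out.println("<10:70, " + b.sum(10) + ">");
        System.out.println("<60:120, " + b.sum(60) + ">");
        System.out.println("bucket updates: " + b.updates());
    }
}
```

With the four events of the example, the sketch ends with the bucket values <0:60, 15>, <10:70, 13>, and <60:120, 1>, and it performs 15 bucket updates in total, because each event touches every bucket that overlaps its timestamp.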

Number of Buckets = Window Length / Slide Length
SlidingEventTimeWindows.of(Time.minutes(1), Time.seconds(10)) --> 6 Buckets
SlidingEventTimeWindows.of(Time.days(1), Time.seconds(10)) --> 8640 Buckets
Overlapping windows cause:
● Every event is assigned to many windows.
● Repeated aggregations --> aggregation function is called on every window
● High memory consumption --> especially for windows without incremental aggregation
● Check for merging windows
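
The bucket counts follow directly from the formula, and the contrast with slicing can be sketched in a few lines of Java. This is a simplified illustration, not the Scotty or Flink implementation: `SliceSlidingSum` is a hypothetical class, timestamps are in seconds, and it assumes the window length is a multiple of the slide so that slices of one slide length line up with all window edges.

```java
import java.time.Duration;
import java.util.TreeMap;

public class SliceSlidingSum {
    // Number of Buckets = Window Length / Slide Length.
    public static long bucketsPerEvent(Duration size, Duration slide) {
        return size.getSeconds() / slide.getSeconds();
    }

    // Slice start (seconds) -> partial sum; slices do not overlap.
    private final TreeMap<Long, Long> slices = new TreeMap<>();
    private final long size;
    private final long slide;
    private long updates = 0;

    public SliceSlidingSum(long size, long slide) {
        if (size % slide != 0) {
            throw new IllegalArgumentException("sketch assumes size is a multiple of slide");
        }
        this.size = size;
        this.slide = slide;
    }

    // Each event updates exactly one slice, independent of the window length.
    public void add(long timestamp, long value) {
        long sliceStart = timestamp - Math.floorMod(timestamp, slide);
        slices.merge(sliceStart, value, Long::sum);
        updates++;
    }

    // A window result is the combination of the slices in [windowStart, windowStart + size).
    public long windowSum(long windowStart) {
        long sum = 0;
        for (long partial : slices.subMap(windowStart, windowStart + size).values()) {
            sum += partial;
        }
        return sum;
    }

    public long updates() {
        return updates;
    }

    public static void main(String[] args) {
        System.out.println(bucketsPerEvent(Duration.ofMinutes(1), Duration.ofSeconds(10)));
        System.out.println(bucketsPerEvent(Duration.ofDays(1), Duration.ofSeconds(10)));

        SliceSlidingSum s = new SliceSlidingSum(60, 10);
        long[][] events = {{4, 3}, {15, 6}, {55, 6}, {66, 1}};
        for (long[] e : events) {
            s.add(e[0], e[1]);
        }
        System.out.println("<0:60, " + s.windowSum(0) + ">");
        System.out.println("<10:70, " + s.windowSum(10) + ">");
        System.out.println("slice updates: " + s.updates());
    }
}
```

For the example events, the slices yield the same window sums as the buckets (<0:60, 15> and <10:70, 13>) with one update per event. The cost moves to the final combination step, which touches Window Length / Slide Length slices per emitted window.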

Architecture Overview

Session Window Aggregate Sharing

Out-of-Order Processing and Sessions

Multi-Window Processing (Example: Fitness Tracker)
[...].window(
// Daily report:
TumblingEventTimeWindows.of(Time.days(1)),
// Monitoring dashboard (last hour):
SlidingEventTimeWindows.of(Time.hours(1), Time.seconds(1)),
// Activity periods:
EventTimeSessionWindows.of(Time.minutes(1)))
.sum()
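
The multi-window `.window(...)` call in the query is the interface shown in the talk, not part of the released Flink API. How several fixed-size windows might share one set of partial aggregates can be sketched as follows. The class `MultiWindowSlices` is hypothetical, timestamps are in seconds, the slice length is assumed to divide all window edges, and session windows are left out because their edges depend on the data.

```java
import java.util.TreeMap;

public class MultiWindowSlices {
    // Slice start (seconds) -> partial sum, shared by all window queries.
    private final TreeMap<Long, Long> slices = new TreeMap<>();
    private final long sliceLength;

    public MultiWindowSlices(long sliceLength) {
        this.sliceLength = sliceLength;
    }

    // One update per event, no matter how many windows are defined.
    public void add(long timestamp, long value) {
        slices.merge(timestamp - Math.floorMod(timestamp, sliceLength), value, Long::sum);
    }

    // Combines the shared slices in [start, end); both edges must be slice-aligned.
    public long sum(long start, long end) {
        if (Math.floorMod(start, sliceLength) != 0 || Math.floorMod(end, sliceLength) != 0) {
            throw new IllegalArgumentException("window edges must align with slice edges");
        }
        long total = 0;
        for (long partial : slices.subMap(start, end).values()) {
            total += partial;
        }
        return total;
    }

    public static void main(String[] args) {
        // 1-second slices fit both the daily tumbling window and the
        // hourly window that slides by one second.
        MultiWindowSlices s = new MultiWindowSlices(1);
        s.add(10, 5);
        s.add(3_700, 2);
        s.add(7_000, 4);
        System.out.println("daily report:  " + s.sum(0, 86_400));
        System.out.println("last hour:     " + s.sum(3_600, 7_200));
    }
}
```

Both queries read the same slices, so adding another window type adds combination work at emission time but no extra per-event updates.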

Stream Slicing Performance

Runtime-Dynamic Windows
.dynamicWindow(windowDefinitionStream)
.sum()
Inputs: Event Stream and Window Definition Stream (<WindowDefinition>)
Operator: Dynamic Window Operator
Output Stream: <Window, Agg>
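
A minimal sketch of such an operator, under strong simplifying assumptions: `DynamicWindowSketch`, `WindowDefinition`, and the `on...` methods are hypothetical names, not the API proposed in the talk; only sliding windows are supported; and all sizes and slides must be multiples of a fixed slice length, whereas a real slicing operator would adapt its slice edges to new definitions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class DynamicWindowSketch {
    // Hypothetical stand-in for the <WindowDefinition> records of the definition stream.
    static final class WindowDefinition {
        final long size;
        final long slide;

        WindowDefinition(long size, long slide) {
            this.size = size;
            this.slide = slide;
        }
    }

    private final TreeMap<Long, Long> slices = new TreeMap<>();
    private final List<WindowDefinition> definitions = new ArrayList<>();
    private final long sliceLength;

    public DynamicWindowSketch(long sliceLength) {
        this.sliceLength = sliceLength;
    }

    // Event stream input: pre-aggregate into the slice that contains the timestamp.
    public void onEvent(long timestamp, long value) {
        slices.merge(timestamp - Math.floorMod(timestamp, sliceLength), value, Long::sum);
    }

    // Window definition stream input: register a new sliding window at runtime.
    public void onWindowDefinition(long size, long slide) {
        if (size % sliceLength != 0 || slide % sliceLength != 0) {
            throw new IllegalArgumentException("size and slide must be multiples of the slice length");
        }
        definitions.add(new WindowDefinition(size, slide));
    }

    // Output stream: one "<start:end, sum>" entry per registered window ending at `end`.
    public List<String> onWatermark(long end) {
        List<String> out = new ArrayList<>();
        for (WindowDefinition d : definitions) {
            long start = end - d.size;
            if (Math.floorMod(start, d.slide) != 0) {
                continue; // no window of this definition ends at `end`
            }
            long sum = 0;
            for (long partial : slices.subMap(start, end).values()) {
                sum += partial;
            }
            out.add("<" + start + ":" + end + ", " + sum + ">");
        }
        return out;
    }

    public static void main(String[] args) {
        DynamicWindowSketch op = new DynamicWindowSketch(10);
        op.onWindowDefinition(60, 10);
        long[][] events = {{4, 3}, {15, 6}, {55, 6}, {66, 1}};
        for (long[] e : events) {
            op.onEvent(e[0], e[1]);
        }
        System.out.println(op.onWatermark(60));
        op.onWindowDefinition(20, 20); // a definition that arrives while the job is running
        System.out.println(op.onWatermark(80));
    }
}
```

Because the new definition is answered from slices that already exist, it can produce results without replaying the event stream.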

From Research to Production
Research --> Production

● Implement complete fault tolerance and state management
● State migration
○ Hard limitation: Aggregated buckets in state snapshots cannot be migrated
● Sophisticated testing
How to expose multi-windows and dynamic-windows to users?

Wrap-Up
Scotty Features:
- stream slicing
- pre-aggregation
- aggregate sharing
- out-of-order processing
- session window support
- multi-window support
- runtime-dynamic window support
Let’s bring it to production!
JIRA: [FLINK-7001]
This talk is supported by the European Union Horizon 2020 Projects
Proteus (687691), Streamline (688191), SAGE (671500), and
E2Data (780245) and by the German Ministry for Education and
Research as Berlin Big Data Center (01IS14013A) and Software
Campus (01IS12056).
