Efficient Window Aggregation
with Stream Slicing
Berlin, September 3-5, 2018
Philipp M. Grulich
Research Assistant (DFKI)
Jonas Traub
Research Associate (TU Berlin)
Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Stream Slicing Example
2
Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Stream Slicing Example
3
Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Stream Slicing Example
4
Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Stream Slicing Example
5
Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Stream Slicing Example
6
Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Stream Slicing Example
7
Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Stream Slicing Example
8
Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Stream Slicing Example
9
Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Stream Slicing Example
10
Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Stream Slicing Example
11
Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Stream Slicing Research
12
Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Stream Slicing Research
13
CIKM 2016
Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Stream Slicing Research
14
ICDE 2018
CIKM 2016
Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Flink Windowing Bottlenecks
15
.window(SlidingEventTimeWindows.of(Time.minutes(1), Time.seconds(10)))
.sum()
Example Query:
Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Flink Windowing Bottlenecks
16
.window(SlidingEventTimeWindows.of(Time.minutes(1), Time.seconds(10)))
.sum()
Example Query:
Processing with Buckets:
Events: Buckets:
Eventtime
Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Flink Windowing Bottlenecks
16
.window(SlidingEventTimeWindows.of(Time.minutes(1), Time.seconds(10)))
.sum()
Example Query:
Processing with Buckets:
<4,3>
Events: Buckets:
Eventtime
Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Flink Windowing Bottlenecks
16
.window(SlidingEventTimeWindows.of(Time.minutes(1), Time.seconds(10)))
.sum()
Example Query:
Processing with Buckets:
<0:60, 3><4,3>
Events: Buckets:
Eventtime
Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Flink Windowing Bottlenecks
16
.window(SlidingEventTimeWindows.of(Time.minutes(1), Time.seconds(10)))
.sum()
Example Query:
Processing with Buckets:
<0:60, 3><4,3>
<15,6>
Events: Buckets:
Eventtime
Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Flink Windowing Bottlenecks
16
.window(SlidingEventTimeWindows.of(Time.minutes(1), Time.seconds(10)))
.sum()
Example Query:
Processing with Buckets:
<0:60, 3>
<10:70, 6>
<4,3>
<15,6>
<0:60, 9>
Events: Buckets:
Eventtime
Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Flink Windowing Bottlenecks
16
.window(SlidingEventTimeWindows.of(Time.minutes(1), Time.seconds(10)))
.sum()
Example Query:
Processing with Buckets:
<0:60, 3>
<10:70, 6>
<4,3>
<15,6>
<0:60, 9>
<55,6>
Events: Buckets:
Eventtime
Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Flink Windowing Bottlenecks
16
.window(SlidingEventTimeWindows.of(Time.minutes(1), Time.seconds(10)))
.sum()
Example Query:
Processing with Buckets:
<0:60, 3>
<10:70, 6>
...
<4,3>
<15,6>
<0:60, 9>
<55,6>
<0:60, 15>
<10:70, 12>
Events: Buckets:
Eventtime
Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Flink Windowing Bottlenecks
16
.window(SlidingEventTimeWindows.of(Time.minutes(1), Time.seconds(10)))
.sum()
Example Query:
Processing with Buckets:
<0:60, 3>
<10:70, 6>
...
<4,3>
<15,6>
<0:60, 9>
<55,6>
<0:60, 15>
<10:70, 12>
<66,1>
Events: Buckets:
Eventtime
Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Flink Windowing Bottlenecks
16
.window(SlidingEventTimeWindows.of(Time.minutes(1), Time.seconds(10)))
.sum()
Example Query:
Processing with Buckets:
<0:60, 3>
<10:70, 6>
...
<60:120, 1>
<4,3>
<15,6>
<0:60, 9>
<55,6>
<0:60, 15>
<10:70, 12>
<66,1>
<10:70, 13>
Events: Buckets:
Eventtime
Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Flink Windowing Bottlenecks
17
Number of Buckets = Window Length / Slide Length
Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Flink Windowing Bottlenecks
17
Number of Buckets = Window Length / Slide Length
SlidingEventTimeWindows.of(Time.minutes(1), Time.seconds(10)) --> 6 Buckets
Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Flink Windowing Bottlenecks
17
Number of Buckets = Window Length / Slide Length
SlidingEventTimeWindows.of(Time.minutes(1), Time.seconds(10))
SlidingEventTimeWindows.of(Time.day(1), Time.seconds(10))
--> 6 Buckets
--> 8640 Buckets
Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Flink Windowing Bottlenecks
17
Number of Buckets = Window Length / Slide Length
SlidingEventTimeWindows.of(Time.minutes(1), Time.seconds(10))
SlidingEventTimeWindows.of(Time.day(1), Time.seconds(10))
--> 6 Buckets
--> 8640 Buckets
Overlapping windows cause:
Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Flink Windowing Bottlenecks
17
Number of Buckets = Window Length / Slide Length
SlidingEventTimeWindows.of(Time.minutes(1), Time.seconds(10))
SlidingEventTimeWindows.of(Time.day(1), Time.seconds(10))
--> 6 Buckets
--> 8640 Buckets
Overlapping windows cause:
● Every event is assigned to many windows.
Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Flink Windowing Bottlenecks
17
Number of Buckets = Window Length / Slide Length
SlidingEventTimeWindows.of(Time.minutes(1), Time.seconds(10))
SlidingEventTimeWindows.of(Time.day(1), Time.seconds(10))
--> 6 Buckets
--> 8640 Buckets
Overlapping windows cause:
● Every event is assigned to many windows.
● Repeated aggregations --> aggregation function is called on every window
Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Flink Windowing Bottlenecks
17
Number of Buckets = Window Length / Slide Length
SlidingEventTimeWindows.of(Time.minutes(1), Time.seconds(10))
SlidingEventTimeWindows.of(Time.day(1), Time.seconds(10))
--> 6 Buckets
--> 8640 Buckets
Overlapping windows cause:
● Every event is assigned to many windows.
● Repeated aggregations --> aggregation function is called on every window
● High memory consumption --> especially for windows without incremental aggregation
Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Flink Windowing Bottlenecks
17
Number of Buckets = Window Length / Slide Length
SlidingEventTimeWindows.of(Time.minutes(1), Time.seconds(10))
SlidingEventTimeWindows.of(Time.day(1), Time.seconds(10))
--> 6 Buckets
--> 8640 Buckets
Overlapping windows cause:
● Every event is assigned to many windows.
● Repeated aggregations --> aggregation function is called on every window
● High memory consumption --> especially for windows without incremental aggregation
● Check for merging windows
Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Architecture Overview
18
Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Architecture Overview
18
Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Architecture Overview
18
Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Architecture Overview
18
Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Architecture Overview
18
Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Architecture Overview
18
Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Session Window Aggregate Sharing
19
Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Session Window Aggregate Sharing
19
Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Session Window Aggregate Sharing
19
Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Out-of-Order Processing and Sessions
20
Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Out-of-Order Processing and Sessions
20
Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Out-of-Order Processing and Sessions
20
Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Out-of-Order Processing and Sessions
20
Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Out-of-Order Processing and Sessions
20
Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Out-of-Order Processing and Sessions
20
Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Multi-Window Processing (Example: Fitness Tracker)
21
[...].window(
// Daily report:
TumblingEventTimeWindows.of(Time.days(1)),
// Monitoring dashboard (last hour):
SlidingEventTimeWindows.of(Time.hours(1), Time.seconds(1)),
// Activity periods:
EventTimeSessionWindows.of(Time.minutes(1)))
.sum()
Multi-Window Processing
Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Stream Slicing Performance
22
Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Stream Slicing Performance
23
Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Runtime-Dynamic Windows
24
Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Runtime-Dynamic Windows
24
Event Stream:
Window Definition Stream:
<WindowDefinition>
Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Runtime-Dynamic Windows
24
Event Stream:
Dynamic Window Operator
Window Definition Stream:
<WindowDefinition>
Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Runtime-Dynamic Windows
24
Event Stream:
Dynamic Window Operator
Output Stream:
<Window, Agg>
Window Definition Stream:
<WindowDefinition>
Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Runtime-Dynamic Windows
24
.dynamicWindow(windowDefinitionStream)
.sum()
Event Stream:
Dynamic Window Operator
Output Stream:
<Window, Agg>
Window Definition Stream:
<WindowDefinition>
Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
From Research to Production
25
Research
Production
Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
From Research to Production
● Implement complete fault-tolerance and state-management
25
Research
Production
Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
From Research to Production
● Implement complete fault-tolerance and state-management
● State migration
25
Research
Production
Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
From Research to Production
● Implement complete fault-tolerance and state-management
● State migration
○ Hard limitation: Aggregated buckets in state snapshots cannot be migrated
25
Research
Production
Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
From Research to Production
● Implement complete fault-tolerance and state-management
● State migration
○ Hard limitation: Aggregated buckets in state snapshots cannot be migrated
● Sophisticated testing
25
Research
Production
Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
From Research to Production
● Implement complete fault-tolerance and state-management
● State migration
○ Hard limitation: Aggregated buckets in state snapshots cannot be migrated
● Sophisticated testing
How to expose multi-windows and dynamic-windows to users?
25
Research
Production
Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Wrap-Up
Scotty Features:
- stream slicing
- pre-aggregation
- aggregate sharing
- out-of-order processing
- session window support
- multi-window support
- runtime-dynamic window support
Let’s bring it to production!
JIRA: [FLINK-7001]
26
This talk is supported by the European Union Horizon 2020 Projects
Proteus (687691), Streamline (688191), SAGE (671500), and
E2Data (780245) and by the German Ministry for Education and
Research as Berlin Big Data Center (01IS14013A) and Software
Campus (01IS12056).

Flink Forward Berlin 2018: Jonas Traub & Philipp Grulich - "Efficient Window Aggregation with Stream Slicing"

  • 1.
    Efficient Window Aggregation withStream Slicing Berlin, September 3-5, 2018 Philipp M. Grulich Research Assistant (DFKI) Jonas Traub Research Associate (TU Berlin)
  • 2.
    Jonas Traub (TUBerlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing Stream Slicing Example 2
  • 3.
    Jonas Traub (TUBerlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing Stream Slicing Example 3
  • 4.
    Jonas Traub (TUBerlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing Stream Slicing Example 4
  • 5.
    Jonas Traub (TUBerlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing Stream Slicing Example 5
  • 6.
    Jonas Traub (TUBerlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing Stream Slicing Example 6
  • 7.
    Jonas Traub (TUBerlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing Stream Slicing Example 7
  • 8.
    Jonas Traub (TUBerlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing Stream Slicing Example 8
  • 9.
    Jonas Traub (TUBerlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing Stream Slicing Example 9
  • 10.
    Jonas Traub (TUBerlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing Stream Slicing Example 10
  • 11.
    Jonas Traub (TUBerlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing Stream Slicing Example 11
  • 12.
    Jonas Traub (TUBerlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing Stream Slicing Research 12
  • 13.
    Jonas Traub (TUBerlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing Stream Slicing Research 13 CIKM 2016
  • 14.
    Jonas Traub (TUBerlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing Stream Slicing Research 14 ICDE 2018 CIKM 2016
  • 15.
    Jonas Traub (TUBerlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing Flink Windowing Bottlenecks 15 .window(SlidingEventTimeWindows.of(Time.minutes(1), Time.seconds(10))) .sum() Example Query:
  • 16.
    Jonas Traub (TUBerlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing Flink Windowing Bottlenecks 16 .window(SlidingEventTimeWindows.of(Time.minutes(1), Time.seconds(10))) .sum() Example Query: Processing with Buckets: Events: Buckets: Eventtime
  • 17.
    Jonas Traub (TUBerlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing Flink Windowing Bottlenecks 16 .window(SlidingEventTimeWindows.of(Time.minutes(1), Time.seconds(10))) .sum() Example Query: Processing with Buckets: <4,3> Events: Buckets: Eventtime
  • 18.
    Jonas Traub (TUBerlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing Flink Windowing Bottlenecks 16 .window(SlidingEventTimeWindows.of(Time.minutes(1), Time.seconds(10))) .sum() Example Query: Processing with Buckets: <0:60, 3><4,3> Events: Buckets: Eventtime
  • 19.
    Jonas Traub (TUBerlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing Flink Windowing Bottlenecks 16 .window(SlidingEventTimeWindows.of(Time.minutes(1), Time.seconds(10))) .sum() Example Query: Processing with Buckets: <0:60, 3><4,3> <15,6> Events: Buckets: Eventtime
  • 20.
    Jonas Traub (TUBerlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing Flink Windowing Bottlenecks 16 .window(SlidingEventTimeWindows.of(Time.minutes(1), Time.seconds(10))) .sum() Example Query: Processing with Buckets: <0:60, 3> <10:70, 6> <4,3> <15,6> <0:60, 9> Events: Buckets: Eventtime
  • 21.
    Jonas Traub (TUBerlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing Flink Windowing Bottlenecks 16 .window(SlidingEventTimeWindows.of(Time.minutes(1), Time.seconds(10))) .sum() Example Query: Processing with Buckets: <0:60, 3> <10:70, 6> <4,3> <15,6> <0:60, 9> <55,6> Events: Buckets: Eventtime
  • 22.
    Jonas Traub (TUBerlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing Flink Windowing Bottlenecks 16 .window(SlidingEventTimeWindows.of(Time.minutes(1), Time.seconds(10))) .sum() Example Query: Processing with Buckets: <0:60, 3> <10:70, 6> ... <4,3> <15,6> <0:60, 9> <55,6> <0:60, 15> <10:70, 12> Events: Buckets: Eventtime
  • 23.
    Jonas Traub (TUBerlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing Flink Windowing Bottlenecks 16 .window(SlidingEventTimeWindows.of(Time.minutes(1), Time.seconds(10))) .sum() Example Query: Processing with Buckets: <0:60, 3> <10:70, 6> ... <4,3> <15,6> <0:60, 9> <55,6> <0:60, 15> <10:70, 12> <66,1> Events: Buckets: Eventtime
  • 24.
    Jonas Traub (TUBerlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing Flink Windowing Bottlenecks 16 .window(SlidingEventTimeWindows.of(Time.minutes(1), Time.seconds(10))) .sum() Example Query: Processing with Buckets: <0:60, 3> <10:70, 6> ... <60:120, 1> <4,3> <15,6> <0:60, 9> <55,6> <0:60, 15> <10:70, 12> <66,1> <10:70, 13> Events: Buckets: Eventtime
  • 25.
    Jonas Traub (TUBerlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing Flink Windowing Bottlenecks 17 Number of Buckets = Window Length / Slide Length
  • 26.
    Jonas Traub (TUBerlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing Flink Windowing Bottlenecks 17 Number of Buckets = Window Length / Slide Length SlidingEventTimeWindows.of(Time.minutes(1), Time.seconds(10)) --> 6 Buckets
  • 27.
    Jonas Traub (TUBerlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing Flink Windowing Bottlenecks 17 Number of Buckets = Window Length / Slide Length SlidingEventTimeWindows.of(Time.minutes(1), Time.seconds(10)) SlidingEventTimeWindows.of(Time.day(1), Time.seconds(10)) --> 6 Buckets --> 8640 Buckets
  • 28.
    Jonas Traub (TUBerlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing Flink Windowing Bottlenecks 17 Number of Buckets = Window Length / Slide Length SlidingEventTimeWindows.of(Time.minutes(1), Time.seconds(10)) SlidingEventTimeWindows.of(Time.day(1), Time.seconds(10)) --> 6 Buckets --> 8640 Buckets Overlapping windows cause:
  • 29.
    Jonas Traub (TUBerlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing Flink Windowing Bottlenecks 17 Number of Buckets = Window Length / Slide Length SlidingEventTimeWindows.of(Time.minutes(1), Time.seconds(10)) SlidingEventTimeWindows.of(Time.day(1), Time.seconds(10)) --> 6 Buckets --> 8640 Buckets Overlapping windows cause: ● Every event is assigned to many windows.
  • 30.
    Jonas Traub (TUBerlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing Flink Windowing Bottlenecks 17 Number of Buckets = Window Length / Slide Length SlidingEventTimeWindows.of(Time.minutes(1), Time.seconds(10)) SlidingEventTimeWindows.of(Time.day(1), Time.seconds(10)) --> 6 Buckets --> 8640 Buckets Overlapping windows cause: ● Every event is assigned to many windows. ● Repeated aggregations --> aggregation function is called on every window
  • 31.
    Jonas Traub (TUBerlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing Flink Windowing Bottlenecks 17 Number of Buckets = Window Length / Slide Length SlidingEventTimeWindows.of(Time.minutes(1), Time.seconds(10)) SlidingEventTimeWindows.of(Time.day(1), Time.seconds(10)) --> 6 Buckets --> 8640 Buckets Overlapping windows cause: ● Every event is assigned to many windows. ● Repeated aggregations --> aggregation function is called on every window ● High memory consumption --> especially for windows without incremental aggregation
  • 32.
    Jonas Traub (TUBerlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing Flink Windowing Bottlenecks 17 Number of Buckets = Window Length / Slide Length SlidingEventTimeWindows.of(Time.minutes(1), Time.seconds(10)) SlidingEventTimeWindows.of(Time.day(1), Time.seconds(10)) --> 6 Buckets --> 8640 Buckets Overlapping windows cause: ● Every event is assigned to many windows. ● Repeated aggregations --> aggregation function is called on every window ● High memory consumption --> especially for windows without incremental aggregation ● Check for merging windows
  • 33.
    Jonas Traub (TUBerlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing Architecture Overview 18
  • 34.
    Jonas Traub (TUBerlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing Architecture Overview 18
  • 35.
    Jonas Traub (TUBerlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing Architecture Overview 18
  • 36.
    Jonas Traub (TUBerlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing Architecture Overview 18
  • 37.
    Jonas Traub (TUBerlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing Architecture Overview 18
  • 38.
    Jonas Traub (TUBerlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing Architecture Overview 18
  • 39.
    Jonas Traub (TUBerlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing Session Window Aggregate Sharing 19
  • 40.
    Jonas Traub (TUBerlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing Session Window Aggregate Sharing 19
  • 41.
    Jonas Traub (TUBerlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing Session Window Aggregate Sharing 19
  • 42.
    Jonas Traub (TUBerlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing Out-of-Order Processing and Sessions 20
  • 43.
    Jonas Traub (TUBerlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing Out-of-Order Processing and Sessions 20
  • 44.
    Jonas Traub (TUBerlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing Out-of-Order Processing and Sessions 20
  • 45.
    Jonas Traub (TUBerlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing Out-of-Order Processing and Sessions 20
  • 46.
    Jonas Traub (TUBerlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing Out-of-Order Processing and Sessions 20
  • 47.
    Jonas Traub (TUBerlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing Out-of-Order Processing and Sessions 20
  • 48.
    Jonas Traub (TUBerlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing Multi-Window Processing (Example: Fitness Tracker) 21 [...].window( // Daily report: TumblingEventTimeWindows.of(Time.days(1)), // Monitoring dashboard (last hour): SlidingEventTimeWindows.of(Time.hours(1), Time.seconds(1)), // Activity periods: EventTimeSessionWindows.of(Time.minutes(1))) .sum() Multi-Window Processing
  • 49.
    Jonas Traub (TUBerlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing Stream Slicing Performance 22
  • 50.
    Jonas Traub (TUBerlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing Stream Slicing Performance 23
  • 51.
    Jonas Traub (TUBerlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing Runtime-Dynamic Windows 24
  • 52.
    Jonas Traub (TUBerlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing Runtime-Dynamic Windows 24 Event Stream: Window Definition Stream: <WindowDefinition>
  • 53.
    Jonas Traub (TUBerlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing Runtime-Dynamic Windows 24 Event Stream: Dynamic Window Operator Window Definition Stream: <WindowDefinition>
  • 54.
    Jonas Traub (TUBerlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing Runtime-Dynamic Windows 24 Event Stream: Dynamic Window Operator Output Stream: <Window, Agg> Window Definition Stream: <WindowDefinition>
  • 55.
    Jonas Traub (TUBerlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing Runtime-Dynamic Windows 24 .dynamicWindow(windowDefinitionStream) .sum() Event Stream: Dynamic Window Operator Output Stream: <Window, Agg> Window Definition Stream: <WindowDefinition>
  • 56.
    Jonas Traub (TUBerlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing From Research to Production 25 Research Production
  • 57.
    Jonas Traub (TUBerlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing From Research to Production ● Implement complete fault-tolerance and state-management 25 Research Production
  • 58.
    Jonas Traub (TUBerlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing From Research to Production ● Implement complete fault-tolerance and state-management ● State migration 25 Research Production
  • 59.
    Jonas Traub (TUBerlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing From Research to Production ● Implement complete fault-tolerance and state-management ● State migration ○ Hard limitation: Aggregated buckets in state snapshots cannot be migrated 25 Research Production
  • 60.
    Jonas Traub (TUBerlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing From Research to Production ● Implement complete fault-tolerance and state-management ● State migration ○ Hard limitation: Aggregated buckets in state snapshots cannot be migrated ● Sophisticated testing 25 Research Production
  • 61.
    Jonas Traub (TUBerlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing From Research to Production ● Implement complete fault-tolerance and state-management ● State migration ○ Hard limitation: Aggregated buckets in state snapshots cannot be migrated ● Sophisticated testing How to expose multi-windows and dynamic-windows to users? 25 Research Production
  • 62.
    Jonas Traub (TUBerlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing Wrap-Up Scotty Features: - stream slicing - pre-aggregation - aggregate sharing - out-of-order processing - session window support - multi-window support - runtime-dynamic window support Let’s bring it to production! JIRA: [FLINK-7001] 26 This talk is supported by the European Union Horizon 2020 Projects Proteus (687691), Streamline (688191), SAGE (671500), and E2Data (780245) and by the German Ministry for Education and Research as Berlin Big Data Center (01IS14013A) and Software Campus (01IS12056).