Computing aggregates over windows is at the core of virtually every stream processing job. Typical stream processing applications involve overlapping windows and, therefore, cause redundant computations. Several techniques prevent this redundancy by sharing partial aggregates among windows. However, these techniques do not support out-of-order processing and session windows. Out-of-order processing is a key requirement to deal with delayed tuples in case of source failures such as temporary sensor outages. Session windows are widely used to separate different periods of user activity from each other. Current versions of Apache Flink use Window Buckets to process stream aggregations with session windows and out-of-order tuples. This Approach does not share partial aggregates among overlapping windows. In our talk, we present Scotty, a high throughput operator for window discretization and aggregation in Apache Flink. Scotty splits streams into non-overlapping slices and computes partial aggregates per slice. These partial aggregates are shared among all overlapping windows including session windows. Scotty introduces the first slicing technique which (1) enables stream slicing for session windows in addition to tumbling and sliding windows and (2) processes out-of-order tuples efficiently. Scotty was first published at ICDE 2018 (http://www.user.tu-berlin.de/powibol/assets/publications/traub-scotty-icde-2018.pdf).
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Flink Forward 2018: Efficient Window Aggregation with Stream Slicing
1. Efficient Window Aggregation
with Stream Slicing
Berlin, September 3-5, 2018
Philipp M. Grulich
Research Assistant (DFKI)
Jonas Traub
Research Associate (TU Berlin)
2. Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Stream Slicing Example
2
3. Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Stream Slicing Example
3
4. Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Stream Slicing Example
4
5. Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Stream Slicing Example
5
6. Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Stream Slicing Example
6
7. Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Stream Slicing Example
7
8. Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Stream Slicing Example
8
9. Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Stream Slicing Example
9
10. Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Stream Slicing Example
10
11. Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Stream Slicing Example
11
12. Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Stream Slicing Research
12
13. Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Stream Slicing Research
13
CIKM 2016
14. Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Stream Slicing Research
14
ICDE 2018
CIKM 2016
15. Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Flink Windowing Bottlenecks
15
.window(SlidingEventTimeWindows.of(Time.minutes(1), Time.seconds(10)))
.sum()
Example Query:
16. Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Flink Windowing Bottlenecks
16
.window(SlidingEventTimeWindows.of(Time.minutes(1), Time.seconds(10)))
.sum()
Example Query:
Processing with Buckets:
Events: Buckets:
Eventtime
17. Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Flink Windowing Bottlenecks
16
.window(SlidingEventTimeWindows.of(Time.minutes(1), Time.seconds(10)))
.sum()
Example Query:
Processing with Buckets:
<4,3>
Events: Buckets:
Eventtime
18. Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Flink Windowing Bottlenecks
16
.window(SlidingEventTimeWindows.of(Time.minutes(1), Time.seconds(10)))
.sum()
Example Query:
Processing with Buckets:
<0:60, 3><4,3>
Events: Buckets:
Eventtime
19. Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Flink Windowing Bottlenecks
16
.window(SlidingEventTimeWindows.of(Time.minutes(1), Time.seconds(10)))
.sum()
Example Query:
Processing with Buckets:
<0:60, 3><4,3>
<15,6>
Events: Buckets:
Eventtime
20. Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Flink Windowing Bottlenecks
16
.window(SlidingEventTimeWindows.of(Time.minutes(1), Time.seconds(10)))
.sum()
Example Query:
Processing with Buckets:
<0:60, 3>
<10:70, 6>
<4,3>
<15,6>
<0:60, 9>
Events: Buckets:
Eventtime
21. Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Flink Windowing Bottlenecks
16
.window(SlidingEventTimeWindows.of(Time.minutes(1), Time.seconds(10)))
.sum()
Example Query:
Processing with Buckets:
<0:60, 3>
<10:70, 6>
<4,3>
<15,6>
<0:60, 9>
<55,6>
Events: Buckets:
Eventtime
22. Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Flink Windowing Bottlenecks
16
.window(SlidingEventTimeWindows.of(Time.minutes(1), Time.seconds(10)))
.sum()
Example Query:
Processing with Buckets:
<0:60, 3>
<10:70, 6>
...
<4,3>
<15,6>
<0:60, 9>
<55,6>
<0:60, 15>
<10:70, 12>
Events: Buckets:
Eventtime
23. Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Flink Windowing Bottlenecks
16
.window(SlidingEventTimeWindows.of(Time.minutes(1), Time.seconds(10)))
.sum()
Example Query:
Processing with Buckets:
<0:60, 3>
<10:70, 6>
...
<4,3>
<15,6>
<0:60, 9>
<55,6>
<0:60, 15>
<10:70, 12>
<66,1>
Events: Buckets:
Eventtime
24. Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Flink Windowing Bottlenecks
16
.window(SlidingEventTimeWindows.of(Time.minutes(1), Time.seconds(10)))
.sum()
Example Query:
Processing with Buckets:
<0:60, 3>
<10:70, 6>
...
<60:120, 1>
<4,3>
<15,6>
<0:60, 9>
<55,6>
<0:60, 15>
<10:70, 12>
<66,1>
<10:70, 13>
Events: Buckets:
Eventtime
25. Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Flink Windowing Bottlenecks
17
Number of Buckets = Window Length / Slide Length
26. Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Flink Windowing Bottlenecks
17
Number of Buckets = Window Length / Slide Length
SlidingEventTimeWindows.of(Time.minutes(1), Time.seconds(10)) --> 6 Buckets
27. Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Flink Windowing Bottlenecks
17
Number of Buckets = Window Length / Slide Length
SlidingEventTimeWindows.of(Time.minutes(1), Time.seconds(10))
SlidingEventTimeWindows.of(Time.day(1), Time.seconds(10))
--> 6 Buckets
--> 8640 Buckets
28. Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Flink Windowing Bottlenecks
17
Number of Buckets = Window Length / Slide Length
SlidingEventTimeWindows.of(Time.minutes(1), Time.seconds(10))
SlidingEventTimeWindows.of(Time.day(1), Time.seconds(10))
--> 6 Buckets
--> 8640 Buckets
Overlapping windows cause:
29. Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Flink Windowing Bottlenecks
17
Number of Buckets = Window Length / Slide Length
SlidingEventTimeWindows.of(Time.minutes(1), Time.seconds(10))
SlidingEventTimeWindows.of(Time.day(1), Time.seconds(10))
--> 6 Buckets
--> 8640 Buckets
Overlapping windows cause:
● Every event is assigned to many windows.
30. Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Flink Windowing Bottlenecks
17
Number of Buckets = Window Length / Slide Length
SlidingEventTimeWindows.of(Time.minutes(1), Time.seconds(10))
SlidingEventTimeWindows.of(Time.day(1), Time.seconds(10))
--> 6 Buckets
--> 8640 Buckets
Overlapping windows cause:
● Every event is assigned to many windows.
● Repeated aggregations --> aggregation function is called on every window
31. Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Flink Windowing Bottlenecks
17
Number of Buckets = Window Length / Slide Length
SlidingEventTimeWindows.of(Time.minutes(1), Time.seconds(10))
SlidingEventTimeWindows.of(Time.day(1), Time.seconds(10))
--> 6 Buckets
--> 8640 Buckets
Overlapping windows cause:
● Every event is assigned to many windows.
● Repeated aggregations --> aggregation function is called on every window
● High memory consumption --> especially for windows without incremental aggregation
32. Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Flink Windowing Bottlenecks
17
Number of Buckets = Window Length / Slide Length
SlidingEventTimeWindows.of(Time.minutes(1), Time.seconds(10))
SlidingEventTimeWindows.of(Time.day(1), Time.seconds(10))
--> 6 Buckets
--> 8640 Buckets
Overlapping windows cause:
● Every event is assigned to many windows.
● Repeated aggregations --> aggregation function is called on every window
● High memory consumption --> especially for windows without incremental aggregation
● Check for merging windows
33. Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Architecture Overview
18
34. Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Architecture Overview
18
35. Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Architecture Overview
18
36. Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Architecture Overview
18
37. Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Architecture Overview
18
38. Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Architecture Overview
18
39. Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Session Window Aggregate Sharing
19
40. Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Session Window Aggregate Sharing
19
41. Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Session Window Aggregate Sharing
19
42. Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Out-of-Order Processing and Sessions
20
43. Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Out-of-Order Processing and Sessions
20
44. Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Out-of-Order Processing and Sessions
20
45. Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Out-of-Order Processing and Sessions
20
46. Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Out-of-Order Processing and Sessions
20
47. Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Out-of-Order Processing and Sessions
20
49. Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Stream Slicing Performance
22
50. Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Stream Slicing Performance
23
51. Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Runtime-Dynamic Windows
24
52. Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Runtime-Dynamic Windows
24
Event Stream:
Window Definition Stream:
<WindowDefinition>
53. Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Runtime-Dynamic Windows
24
Event Stream:
Dynamic Window Operator
Window Definition Stream:
<WindowDefinition>
54. Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Runtime-Dynamic Windows
24
Event Stream:
Dynamic Window Operator
Output Stream:
<Window, Agg>
Window Definition Stream:
<WindowDefinition>
55. Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Runtime-Dynamic Windows
24
.dynamicWindow(windowDefinitionStream)
.sum()
Event Stream:
Dynamic Window Operator
Output Stream:
<Window, Agg>
Window Definition Stream:
<WindowDefinition>
56. Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
From Research to Production
25
Research
Production
57. Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
From Research to Production
● Implement complete fault-tolerance and state-management
25
Research
Production
58. Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
From Research to Production
● Implement complete fault-tolerance and state-management
● State migration
25
Research
Production
59. Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
From Research to Production
● Implement complete fault-tolerance and state-management
● State migration
○ Hard limitation: Aggregated buckets in state snapshots cannot be migrated
25
Research
Production
60. Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
From Research to Production
● Implement complete fault-tolerance and state-management
● State migration
○ Hard limitation: Aggregated buckets in state snapshots cannot be migrated
● Sophisticated testing
25
Research
Production
61. Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
From Research to Production
● Implement complete fault-tolerance and state-management
● State migration
○ Hard limitation: Aggregated buckets in state snapshots cannot be migrated
● Sophisticated testing
How to expose multi-windows and dynamic-windows to users?
25
Research
Production
62. Jonas Traub (TU Berlin), Philipp M. Grulich (DFKI) - Efficient Window Aggregation with Stream Slicing
Wrap-Up
Scotty Features:
- stream slicing
- pre-aggregation
- aggregate sharing
- out-of-order processing
- session window support
- multi-window support
- runtime-dynamic window support
Let’s bring it to production!
JIRA: [FLINK-7001]
26
This talk is supported by the European Union Horizon 2020 Projects
Proteus (687691), Streamline (688191), SAGE (671500), and
E2Data (780245) and by the German Ministry for Education and
Research as Berlin Big Data Center (01IS14013A) and Software
Campus (01IS12056).