Feeding a Squirrel in Time—Windows in Flink
Apache Flink Meetup Munich (November 2015).


  1. Feeding a Squirrel in Time—Windows in Flink
        Apache Flink Meetup Munich, November 11, 2015
        Matthias J. Sax, mjsax@{informatik.hu-berlin.de|apache.org}, @MatthiasJSax
        DBIS, Institut für Informatik (Department of Computer Science), Humboldt-Universität zu Berlin
  2. About Me
        • Ph.D. student in CS, DBIS Group, HU Berlin
        • involved in the Stratosphere research project
        • working on data stream processing and optimization
        • Aeolus: built on top of Apache Storm (https://github.com/mjsax/aeolus)
        • Committer at Apache Flink
  3. Stream Processing
        Processing data in motion:
        • external sources create data constantly
        • data is pushed to the system
        • need to keep up with the incoming data rate
            • usage of ingestion buffers (e.g., Apache Kafka)
        • handle data peaks
            • back pressure, dynamic scaling (or even load shedding)
        • low processing latency (milliseconds)
            • no micro-batching
  4. Other Systems
        Apache Storm
        • widely used in industry
        • different processing guarantees
            • no guarantee
            • at-least-once
            • exactly-once (not for external writes)
        • no ordering guarantees
        • no type system
        • dynamic scaling (to some extent)
        • some high-level abstractions using Trident
            • windows, state, exactly-once processing
  5. Other Systems (cont.)
        Apache Samza
        • similar to Storm
        • at-least-once processing
        • active state handling
  6. Other Systems (cont.)
        Google Dataflow
        • similar API to Flink, offering very rich semantics
            • windows, triggers
        • can deal with late-arriving data
        • dynamic scaling
        • only available as a service in the cloud
  7. Other Systems (cont.)
        Apache Spark (Streaming)
        • micro-batching (no real streaming)
        • limited semantics
        • exactly-once processing
        • state management
        • no sub-second latency
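  17. Batch vs. Stream Processing
        [Figure: the input values 10, 12, 8, 12 summed in two ways]
        • batch: a single sum over the complete input, 10 + 12 + 8 + 12 = 42
        • stream: an updated sum after every record: 10 + 0 = 10, 12 + 10 = 22, 8 + 22 = 30, 12 + 30 = 42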
  19. Batch vs. Stream Processing (cont.)
        DataSet API
            DataSet<Tuple2<String, Integer>> input = ...
            DataSet<Tuple2<String, Integer>> result = input.groupBy(0).sum(1);
        DataStream API
            DataStream<Tuple2<String, Integer>> input = ...
            DataStream<Tuple2<String, Integer>> result = input.keyBy(0).sum(1);
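The difference between the two `sum(1)` calls can be illustrated without Flink. The following is a minimal plain-Java sketch, not Flink code: the class name `KeyedSumSketch` and both method names are made up for illustration, and the single key "a" is an assumption. The batch variant reports one final sum per key, while the streaming variant reports an updated sum after every record.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java sketch (no Flink dependency) of a keyed sum in batch and in streaming style. */
class KeyedSumSketch {

    /** Batch style: read the complete input, then report one final sum per key. */
    static Map<String, Integer> batchSum(List<Map.Entry<String, Integer>> input) {
        Map<String, Integer> sums = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> record : input) {
            sums.merge(record.getKey(), record.getValue(), Integer::sum);
        }
        return sums;
    }

    /** Streaming style: after every record, report the updated sum of that record's key. */
    static List<Integer> runningSum(List<Map.Entry<String, Integer>> input) {
        Map<String, Integer> sums = new LinkedHashMap<>();
        List<Integer> emitted = new ArrayList<>();
        for (Map.Entry<String, Integer> record : input) {
            // merge() returns the new sum stored for the key
            emitted.add(sums.merge(record.getKey(), record.getValue(), Integer::sum));
        }
        return emitted;
    }

    public static void main(String[] args) {
        // Hypothetical input: the values from the figure, all under one key "a".
        List<Map.Entry<String, Integer>> input = List.of(
                Map.entry("a", 10), Map.entry("a", 12), Map.entry("a", 8), Map.entry("a", 12));
        System.out.println(batchSum(input));   // {a=42}
        System.out.println(runningSum(input)); // [10, 22, 30, 42]
    }
}
```

The streaming result never "finishes": each emitted value is only the sum so far, which is why windows are needed to get bounded results on an unbounded stream.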
  25. Count Based Windows
        [Figure: the input values 10, 12, 8, 12 summed per count window of size 2]
        • tumbling: (10, 12) sum 22; (8, 12) sum 20
        • overlapping: (10, 12) sum 22; (12, 8) sum 20; (8, 12) sum 20
  28. Count Based Windows
        Count Based Window (tumbling)
            DataStream<Tuple2<String, Integer>> input = ...
            DataStream<Tuple2<String, Integer>> result = input.keyBy(0).countWindow(2).sum(1);
        Count Based Window (overlapping)
            DataStream<Tuple2<String, Integer>> input = ...
            DataStream<Tuple2<String, Integer>> result = input.keyBy(0).countWindow(2, 1).sum(1);
        Caution: count windows apply to each sub-stream (i.e., to each key separately)
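As a rough illustration of the two window shapes, here is a plain-Java sketch that works on the values of a single key. It is not Flink's implementation: `CountWindowSketch` and `windowSums` are hypothetical names, and the sketch emits only completely filled windows, which is an assumption (a real engine may also emit partially filled windows).

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java sketch of count-based windows over the values of ONE key. */
class CountWindowSketch {

    /**
     * Sums every window of `size` consecutive values, starting a new window
     * every `slide` values. slide == size gives tumbling windows,
     * slide < size gives overlapping windows. Only full windows are emitted.
     */
    static List<Integer> windowSums(List<Integer> values, int size, int slide) {
        List<Integer> sums = new ArrayList<>();
        for (int start = 0; start + size <= values.size(); start += slide) {
            int sum = 0;
            for (int i = start; i < start + size; i++) {
                sum += values.get(i);
            }
            sums.add(sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        List<Integer> values = List.of(10, 12, 8, 12);
        System.out.println(windowSums(values, 2, 2)); // tumbling:    [22, 20]
        System.out.println(windowSums(values, 2, 1)); // overlapping: [22, 20, 20]
    }
}
```

With several keys, the same computation would run once per key on that key's values only, so a window of size 2 closes after two records of the same key, not after two records overall.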
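  39. The Nature of Data Streams
        [Figure: two external producers (Ext. prod. 1, Ext. prod. 2) emit the records a, b, c, d and 1, 2, 3, 4 along a time axis running from 1 to 5. The source operator receives them interleaved, in the order a, 1, 2, b, c, 3, d, 4, at times 1.5, 2, 2.5, 3.5, 4.5, 5, 6, 6.5.]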
  42. The Notion of Time
        Event Time ≠ Processing Time
        [Figure: event time plotted against processing time, with the skew between the two marked. Source: Google Cloud Dataflow and Flink, William Vambenepe, Flink Forward 2015.]
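  50. Watermarks
        [Figure: an operator with two input streams whose records carry out-of-order timestamps; processing them in arrival order gives the wrong processing order. The inputs carry the watermarks wm=3 and wm=4. The operator emits the records 1, 1, 2, 2, 3 in timestamp order and forwards wm=3; the records 4, 5, 8, 7 and 4, 7, 9 are still to be processed.]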
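The behaviour in the figure can be approximated in a few lines of plain Java. This is a sketch under assumptions, not Flink's operator code: `WatermarkSketch` and its methods are hypothetical names, records are reduced to their timestamps, and all records are buffered before the watermarks arrive. The operator tracks the latest watermark per input and releases buffered records up to the minimum of those watermarks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

/** Plain-Java sketch of an operator that reorders records using watermarks. */
class WatermarkSketch {
    private final long[] inputWatermarks;                             // latest watermark per input
    private final PriorityQueue<Long> buffer = new PriorityQueue<>(); // pending timestamps, smallest first

    WatermarkSketch(int numInputs) {
        inputWatermarks = new long[numInputs];
        Arrays.fill(inputWatermarks, Long.MIN_VALUE); // no input has promised anything yet
    }

    /** Records are only buffered on arrival; they may still be out of order. */
    void onRecord(long timestamp) {
        buffer.add(timestamp);
    }

    /**
     * A watermark w on an input promises that no record with timestamp <= w follows
     * on that input. Everything up to the minimum over all inputs can be released.
     */
    List<Long> onWatermark(int input, long watermark) {
        inputWatermarks[input] = Math.max(inputWatermarks[input], watermark);
        long current = Arrays.stream(inputWatermarks).min().getAsLong();
        List<Long> released = new ArrayList<>();
        while (!buffer.isEmpty() && buffer.peek() <= current) {
            released.add(buffer.poll()); // poll() returns the smallest pending timestamp
        }
        return released;
    }

    public static void main(String[] args) {
        WatermarkSketch op = new WatermarkSketch(2);
        for (long ts : new long[] {1, 3, 2, 4, 2, 5, 8, 7, 1, 4, 7, 9}) {
            op.onRecord(ts);
        }
        System.out.println(op.onWatermark(0, 3)); // [] (input 1 has not sent a watermark yet)
        System.out.println(op.onWatermark(1, 4)); // [1, 1, 2, 2, 3]
    }
}
```

Taking the minimum is what makes the output watermark 3 rather than 4: the operator can only promise what all of its inputs have promised.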
  53. Streaming Tradeoffs
        Processing Time
        • no late data / no skew
        • windows are simple to build
        • low latency
        • inherently non-deterministic
        Event Time (external)
        • late data / skew
        • out-of-order data (windowing more difficult)
        • simpler to reason about semantics (deterministic)
        • increased latency
        Event Time (ingestion)
        • no late data / no skew
        • no out-of-order data
        • simplified watermarking
  60. Time Based Windows
        Timestamp Example
            StreamExecutionEnvironment env = ...
            // alternatives: ProcessingTime / IngestionTime
            env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

            DataStream<Tuple> input = ...
            input.assignTimestamps(new TimestampExtractor<Tuple>() {
                // event-time timestamp of the given element
                public long extractTimestamp(Tuple element, long currentTimestamp) {
                    return /* extract from element */;
                }
                // watermark to emit after the given element
                public long extractWatermark(Tuple element, long currentTimestamp) {
                    return /* extract from element */;
                }
                public long getCurrentWatermark() {
                    return Long.MIN_VALUE;
                }
            });
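The extractor leaves open how a watermark value is chosen. One common strategy, which is an assumption here and not taken from the slides, is to let the watermark trail the largest timestamp seen so far by a fixed lag. The plain-Java sketch below shows that idea only; `LaggingWatermarkSketch`, `maxLag`, and the method names are hypothetical and not part of the Flink API.

```java
/** Plain-Java sketch of a watermark strategy; the fixed lag is an assumption, not from the slides. */
class LaggingWatermarkSketch {
    private final long maxLag;                  // hypothetical bound on how far records arrive out of order
    private long maxTimestamp = Long.MIN_VALUE; // largest timestamp seen so far

    LaggingWatermarkSketch(long maxLag) {
        this.maxLag = maxLag;
    }

    /** Remembers the largest timestamp seen so far and returns the watermark to emit. */
    long onElement(long timestamp) {
        maxTimestamp = Math.max(maxTimestamp, timestamp);
        return maxTimestamp - maxLag;
    }

    /** A record counts as late if its timestamp is not above the current watermark. */
    boolean isLate(long timestamp) {
        return maxTimestamp != Long.MIN_VALUE && timestamp <= maxTimestamp - maxLag;
    }

    public static void main(String[] args) {
        LaggingWatermarkSketch wm = new LaggingWatermarkSketch(3);
        System.out.println(wm.onElement(10)); // 7
        System.out.println(wm.onElement(12)); // 9
        System.out.println(wm.isLate(8));     // true  (8 <= 9)
        System.out.println(wm.isLate(11));    // false (11 > 9)
    }
}
```

A larger lag tolerates more disorder but delays every window result by the same amount, which is the latency cost of event time listed under the tradeoffs.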
  62. Time Based Windows (cont.)
        Sliding Time Window Example
            DataStream<...> input = ...
            input.keyBy(...)
                 // size = 5s; slide = 1s
                 .timeWindow(Time.of(5, TimeUnit.SECONDS), Time.of(1, TimeUnit.SECONDS))
                 .reduce(...);
        General Window Example
            DataStream<...> input = ...
            input.keyBy(...)
                 .window(...)
                 .apply(new WindowFunction<...>() {
                     // ...
                 });
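With a size of 5s and a slide of 1s, every record falls into five overlapping windows. The plain-Java sketch below computes which ones; it is an illustration, not Flink's window assigner. `SlidingWindowSketch` and `windowStarts` are hypothetical names, and the sketch assumes non-negative millisecond timestamps and windows aligned to time 0.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java sketch: which sliding windows does a timestamp belong to? */
class SlidingWindowSketch {

    /**
     * Returns the start times of all windows [start, start + size) that contain
     * the given timestamp, in ascending order. Assumes timestamp >= 0 and
     * window starts that are multiples of `slide`.
     */
    static List<Long> windowStarts(long timestamp, long size, long slide) {
        List<Long> starts = new ArrayList<>();
        long lastStart = timestamp - (timestamp % slide); // latest window starting at or before timestamp
        for (long start = lastStart; start > timestamp - size; start -= slide) {
            starts.add(0, start); // prepend to keep ascending order
        }
        return starts;
    }

    public static void main(String[] args) {
        // size = 5s, slide = 1s, record at 7.5s
        System.out.println(windowStarts(7_500, 5_000, 1_000)); // [3000, 4000, 5000, 6000, 7000]
    }
}
```

In general a record belongs to size / slide windows, so a small slide relative to the size multiplies the work done per record.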
  66. Advanced Windowing Concepts
        • global windows (non-parallelized)
        • Triggers: close a window (i.e., fire)
            • processing time, watermark, count, delta, ... (with different discarding strategies)
        • Evictors: remove tuples from a window before the function gets applied
            • time, count, delta
        • mix different windows/triggers/evictors
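To make the trigger idea concrete, here is a plain-Java sketch of one possible reading of a delta trigger on a global window: the window fires when the incoming value differs from the value seen at the last firing by more than a threshold, and its contents are discarded after firing. This reading, the class `DeltaTriggerSketch`, and its method names are assumptions for illustration, not Flink's trigger API.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java sketch of a delta trigger with a purge-after-fire discarding strategy. */
class DeltaTriggerSketch {
    private final long threshold;
    private final List<Long> window = new ArrayList<>(); // current window contents
    private Long lastFired = null;                       // reference value for the delta

    DeltaTriggerSketch(long threshold) {
        this.threshold = threshold;
    }

    /** Adds a value; returns the window contents if the trigger fires, else an empty list. */
    List<Long> onElement(long value) {
        window.add(value);
        if (lastFired == null) {
            lastFired = value; // the first value only sets the reference point
            return List.of();
        }
        if (Math.abs(value - lastFired) <= threshold) {
            return List.of();  // delta too small: keep collecting
        }
        lastFired = value;
        List<Long> fired = new ArrayList<>(window);
        window.clear();        // discard the contents after firing
        return fired;
    }

    public static void main(String[] args) {
        DeltaTriggerSketch trigger = new DeltaTriggerSketch(5);
        for (long v : new long[] {10, 12, 16, 17, 30}) {
            System.out.println(v + " -> " + trigger.onElement(v));
        }
        // 16 fires with [10, 12, 16]; 30 fires with [17, 30]; the other values do not fire
    }
}
```

Keeping the contents instead of clearing them would be a different discarding strategy; an evictor would additionally drop selected tuples before the window function sees them.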
  67. Stateful Stream Processing
        Flink can handle arbitrary user state:
        • state is stored reliably
        • distributed snapshots algorithm
        Example
            public class CounterSum extends RichReduceFunction<Long> {
                // user state: number of reduce() calls so far
                private OperatorState<Long> counter;

                public void open(Configuration config) {
                    counter = getRuntimeContext().getOperatorState("myCnt", Long.class, 0L);
                }

                public Long reduce(Long v1, Long v2) throws Exception {
                    counter.update(counter.value() + 1);
                    return v1 + v2;
                }
            }
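Why reliably stored state matters can be shown with a small local simulation. The plain-Java sketch below is not the distributed snapshots algorithm and does not use Flink: `CounterSumSketch`, `snapshot`, and `restore` are hypothetical names, and the "failure" is simulated by hand. It only shows the effect of restoring user state from a snapshot and replaying the records that came after it.

```java
/** Plain-Java sketch of user state that can be snapshotted and restored. */
class CounterSumSketch {
    private long counter = 0; // user state: number of reduce() calls so far

    long reduce(long v1, long v2) {
        counter++;
        return v1 + v2;
    }

    /** Returns a copy of the state, as a checkpoint would store it. */
    long snapshot() {
        return counter;
    }

    /** Resets the state to a previously taken snapshot, as done after a failure. */
    void restore(long snapshot) {
        counter = snapshot;
    }

    long counter() {
        return counter;
    }

    public static void main(String[] args) {
        CounterSumSketch fn = new CounterSumSketch();
        long sum = fn.reduce(10, 12);       // counter = 1
        sum = fn.reduce(sum, 8);            // counter = 2
        long checkpoint = fn.snapshot();    // state and sum as of this point are safe

        fn.reduce(sum, 12);                 // counter = 3, then a simulated failure
        fn.restore(checkpoint);             // roll back to counter = 2
        sum = fn.reduce(sum, 12);           // replay the record after the snapshot
        System.out.println(sum + " / " + fn.counter()); // 42 / 3
    }
}
```

Without the restore step, the replayed record would be counted twice (counter = 4), which is the kind of error exactly-once state handling is meant to rule out.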
  69. Summary
        The time problem:
        • processing time vs. event time vs. ingestion time
        • time skew, out-of-order tuples, watermarks
        The window question:
        • count-based vs. time-based
        • tumbling vs. overlapping
        • intermediate triggers
        • advanced windows (mix of the above)
  71. Summary (cont.)
        Flink provides a rich API (Java/Scala) to express different semantics
        • state handling for arbitrary UDF code
        • fault tolerance with exactly-once guarantees
        • exactly-once sink available
        What else?
        • Python API is coming (right now DataSet only)
        • Google Dataflow on Flink
        • Storm on Flink
        • Apache SAMOA on Flink
  72. Thanks!
