Tutorial:
The Role of Event-Time Order
in Data Streaming Analysis
Vincenzo Gulisano
Chalmers University of Technology
Gothenburg, Sweden
vincenzo.gulisano@chalmers.se
Dimitris Palyvos-Giannas
Chalmers University of Technology
Gothenburg, Sweden
palyvos@chalmers.se
Bastian Havers
Chalmers University of Technology & Volvo Cars
Gothenburg, Sweden
havers@chalmers.se
Marina Papatriantafilou
Chalmers University of Technology
Gothenburg, Sweden
ptrianta@chalmers.se
2
Agenda
• Motivation, preliminaries and examples about
data streaming and Stream Processing Engines
• Causes of out-of-order data and solutions enforcing total-ordering
• Pros/Cons of total-ordering
• Relaxation of total-ordering and the watermarks
3
https://github.com/vincenzo-gulisano/debs2020_tutorial_event_time
4
The era of Big Data
Terabyte Scale: 10^12 bytes (1984)
Zettabyte Scale: 10^9 Terabytes
5
Where does Big Data Originate?
• 1 trillion sensors by 2030 in the Internet of Things (IoT)
• 500 hours of video uploaded to YouTube every minute
• 2.32 billion Facebook users
• 219 billion photos uploaded to Facebook
• 1 online interaction every 18 seconds by 2025
6
Database Management Systems (DBMSs) vs. Stream Processing Engines (SPEs)
7
DBMS: (1) data is stored on disk; (2) a query is issued; (3) query processing in main memory produces the query results.
SPE: a continuous query resides in main memory; data flows through query processing, continuously producing query results.
8
A timeline of SPEs: NiagaraCQ, COUGAR, The Aurora Project, STanford stREam datA Manager (STREAM), Borealis, StreamCloud, …
Flink-related images / code snippets in the following are taken from: https://flink.apache.org/ 9
Data Stream:
unbounded sequence of tuples
sharing the same schema
Example: vehicles’ speed reports
10
time
Field (type):
• vehicle id (text)
• time (secs) (text)
• speed (Km/h) (double)
• X coordinate (double)
• Y coordinate (double)
A 8:00 55.5 X1 Y1
A 8:03 70.3 X2 Y2
A 8:07 34.3 X3 Y3
Let’s assume each source (e.g., vehicle) produces and delivers a timestamp-sorted stream
Continuous query (or simply query):
Directed Acyclic Graph (DAG) of streams and
operators
OP
OP
OP
OP OP
OP OP
source op
(1+ out streams)
sink op
(1+ in streams)
stream
op
(1+ in, 1+ out streams)
11
Data Streaming Operators
Two main types:
• Stateless operators
• do not maintain any state
• one-by-one processing
• if they maintain some state, such state does not evolve depending on the tuples being
processed
• Stateful operators
• maintain a state that evolves depending on the tuples being processed
• produce output tuples that depend on multiple input tuples
12
OP
OP
Stateless Operators
Filter / route tuples based on one (or more) conditions
13
Filter
Map: transform each tuple
14
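As a minimal sketch in plain Python (not any SPE's actual API), the two stateless operators above can be written as generators that process tuples one by one, keeping no state across tuples:

```python
def stream_filter(stream, predicate):
    """Stateless Filter: forward only the tuples satisfying the predicate."""
    for t in stream:
        if predicate(t):
            yield t

def stream_map(stream, fn):
    """Stateless Map: transform each tuple one-by-one, keeping no state."""
    for t in stream:
        yield fn(t)

# Illustrative tuples (vehicle id, time in secs, speed in Km/h):
reports = [("A", 480, 55.5), ("B", 483, 34.3), ("A", 487, 70.3)]
# Keep speeds above 50 Km/h, then project to (vehicle id, speed).
fast = stream_filter(reports, lambda t: t[2] > 50.0)
projected = list(stream_map(fast, lambda t: (t[0], t[2])))
```

Because neither operator consults earlier tuples, both can process each tuple immediately, regardless of order.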
Stateful Operators
Aggregate information from multiple
tuples (e.g., max, min, sum, ...)
15
Join tuples coming from 2 streams given a
certain predicate
Aggregate
Join
Windows and Stateful Analysis
Stateful operations are done over windows:
• Time-based (e.g., tuples in the last 10 minutes)
• Tuple-based (e.g., given the last 50 tuples)
16
time
[8:00,9:00)
[8:20,9:20)
[8:40,9:40)
Example of time-based window of size 1 hour and advance 20 minutes
• How many tuples are in a window?
• Which time period does a window span?
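The window spans a tuple falls into follow directly from the size and advance. A small sketch (timestamps expressed as minutes since 0:00, an assumption made only for this illustration):

```python
def windows_for(ts, size, advance):
    """Return the [start, end) spans of every sliding window containing ts.
    A window [start, start + size) contains ts iff start <= ts < start + size
    and start is a multiple of the advance."""
    spans = []
    start = ts - (ts % advance)  # latest window start not exceeding ts
    while start > ts - size and start >= 0:
        spans.append((start, start + size))
        start -= advance
    return sorted(spans)

# A report at 8:25 with size 60 min / advance 20 min falls into 3 windows:
# [7:40, 8:40), [8:00, 9:00), [8:20, 9:20)
spans = windows_for(8 * 60 + 25, size=60, advance=20)
```

Each tuple falls into size / advance windows (here 60 / 20 = 3), which also bounds how long it must be kept in the operator's state.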
Time-based sliding window aggregation
(count)
17
Counter: 4
time
[8:00,9:00)
8:05 8:15 8:22 8:45 9:05
Output: 4
Counter: 1
Counter: 2
Counter: 3
Counter: 3
time
8:05 8:15 8:22 8:45 9:05
[8:20,9:20)
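The count aggregation above can be sketched by assigning each tuple to its windows (plain Python, timestamps again as minutes since 0:00 for illustration):

```python
from collections import Counter

def sliding_count(timestamps, size, advance):
    """Count tuples per sliding window [start, start + size): a tuple with
    timestamp ts belongs to every window whose start is a multiple of the
    advance with ts - size < start <= ts."""
    counts = Counter()
    for ts in timestamps:
        start = ts - (ts % advance)  # latest window start not exceeding ts
        while start > ts - size and start >= 0:
            counts[start] += 1
            start -= advance
    return dict(counts)

# Tuples at 8:05, 8:15, 8:22, 8:45, 9:05, with size 60 min / advance 20 min:
counts = sliding_count([485, 495, 502, 525, 545], size=60, advance=20)
```

Window [8:00, 9:00) counts 4 tuples, matching the slide's output; the later windows keep counting until they close.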
Time-based sliding window joining
18
t1
t2
t3
t4
t1
t2
t3
t4
R S
Sliding windows of sizes WR and WS over streams R and S; join predicate P
Windows and stateful analysis
19
Basic operators and user-defined operators
20
Besides a set of basic operators, SPEs usually
allow the user to define ad-hoc operators
(e.g., when the existing aggregations are not enough)
SPEs and operators' variants
• Each SPE might define its own variants
of certain streaming operators:
21
t1
t2
t3
t4
t1
t2
t3
t4
R S
Sliding windows of sizes WR and WS over streams R and S; join predicate P
Sample Query
"every five minutes, of all vehicles that braked significantly, find the
one that braked the hardest"
22
time
A 8:00 55.5 X1 Y1 ... B 8:07 34.3 X3 Y3 ...
B 8:03 70.3 X2 Y2 ...
Sample Query
Remove
unused fields
Map
Field
vehicle id
time (secs)
speed (Km/h)
X coordinate
Y coordinate
...
Field
vehicle id
time (secs)
speed (Km/h)
Every minute,
compute average
speed of each
vehicle during the
last 2.5 minutes
Aggregate
Field
vehicle id
time (secs)
avg speed (Km/h)
Join
High average
speed and slow
current speed?
Filter
Field
vehicle id
time (secs)
braking factor
Join on
vehicle id in
last minute
Field
vehicle id
time (secs)
avg speed (Km/h)
speed (Km/h)
Aggregate
Every 5 minutes,
produce vehicle that
braked the hardest
during last 5
minutes
23
Agenda
• Motivation, preliminaries and examples about
data streaming and Stream Processing Engines
• Causes of out-of-order data and solutions enforcing total-ordering
• Pros/Cons of total-ordering
• Relaxation of total-ordering and the watermarks
24
From an abstract query
… to streaming application run by an SPE
25
Source → OP1 → OP2 → OP3 → OP4 → OP5 → OP6 → Sink
The abstract query is mapped to operator instances (some of them, e.g., Source, OP2, OP3, OP5, replicated into parallel instances) running as processes distributed across nodes.
From an abstract query
… to streaming application run by an SPE
26
27
From an abstract query
…to streaming application run by Flink - 1/3
28
From an abstract query
…to streaming application run by Flink - 2/3
29
From an abstract query
…to streaming application run by Flink - 3/3
Causes of out-of-order data:
30
Sources
themselves
Asynchronous
Distributed
Parallel
executions
Data sources that produce out-of-order data
• Discussed in many related works (e.g., Babu, Shivnath, Utkarsh Srivastava, and JenniferWidom.
"Exploiting k-constraints to reduce memory overhead in continuous queries over data streams." ACM
Transactions on Database Systems (TODS) 29.3 (2004): 545-580.)
• Battery-operated devices, unreliable wireless networks, …
1 trillion sensors
by 2030
in the Internet of Things (IoT)
31
Causes of out-of-order data
32
Sources
themselves
Asynchronous
Distributed
Parallel
executions
The 3-step procedure
(sequential stream join)
33
For each incoming tuple t:
1. compare t with all tuples in opposite window given predicate P
2. add t to its window
3. remove stale tuples from t’s window
ProdR: adds tuples to R
ProdS: adds tuples to S
PU: processing unit
Cons: consumes results
We assume each producer
delivers tuples in timestamp
order
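The 3-step procedure can be sketched as follows (a plain-Python illustration, assuming the two inputs have already been merged into one timestamp-ordered sequence, as the slide assumes of the producers):

```python
def three_step_join(merged, ws, predicate):
    """Sequential 3-step stream join. `merged` is a list of
    (ts, stream_id, value) tuples, already merged in timestamp order,
    with stream_id "R" or "S"; `ws` is the time-based window size."""
    windows = {"R": [], "S": []}
    results = []
    for ts, sid, val in merged:
        other = "S" if sid == "R" else "R"
        # 1. compare t with all tuples in the opposite window, given predicate P
        for ots, oval in windows[other]:
            if predicate(val, oval):
                pair = (val, oval) if sid == "R" else (oval, val)
                results.append((ts,) + pair)
        # 2. add t to its own window
        windows[sid].append((ts, val))
        # 3. remove stale tuples (older than ws) from t's window
        windows[sid] = [(wts, wv) for wts, wv in windows[sid] if wts > ts - ws]
    return results

# Alternating R/S tuples in timestamp order, window size 2, match-all predicate:
out = three_step_join(
    [(1, "R", "a"), (2, "S", "b"), (3, "R", "c")],
    ws=2,
    predicate=lambda r, s: True,
)
```

Note how correctness hinges on the timestamp-ordered input: step 3 purges tuples relative to the current timestamp, which only works if timestamps never move backwards.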
The 3-step procedure, is it enough?
34
t1
t2
t1
t2
R S
WSWR
t3
t1
t2
t1
t2
R S
WSWR
t4
t3
Causes of out-of-order data
35
Asynchronous
Distributed
Parallel
executions
Any operator fed data from multiple logical / physical streams can potentially observe out-of-order data
Parallel execution
• General approach
36
OPA OPB
Parallel execution
• General approach
R: Router
M: Merger
OPA OPB
OPA RM
Thread 1
OPA RM
Thread m
…
OPA RM
Thread 1 Thread 2 Thread 3
OPA RM
Thread 1 Thread 2
OPA RM
Thread 1 Thread 2
…
37
Parallel execution
• General approach
R: Router
M: Merger
OPA OPB
OPA RM
Thread 1
OPA RM
Thread m
…
OPA RM
Thread 1 Thread 2 Thread 3
OPA RM
Thread 1 Thread 2
OPA RM
Thread 1 Thread 2
…
38
Parallel execution
• General approach
R: Router
M: Merger
OPA OPB
OPA RM
Thread 1
OPA RM
Thread m
…
OPB RM
Thread 1
OPB RM
Thread n
…
39
Parallel execution
• General approach
R: Router
M: Merger
OPA OPB
OPA RM
Thread 1
OPA RM
Thread m
…
OPB RM
Thread 1
OPB RM
Thread n
…
40
Parallel execution
• Stateful operators: Semantic awareness
• Aggregate: count within last hour, group-by vehicle id
41
Previous Subcluster
R
…
R
…
M Agg1
M Agg2
M Agg3
…
…
…
Vehicle A
Parallel execution
• Depending on the stateful operator semantics:
• Partition input stream into keys
• Each key is processed by 1 thread
• # keys >> # threads/nodes
42
Parallel execution
• Depending on the stateful operator semantics:
• Partition input stream into keys
• Each key is processed by 1 thread
• # keys >> # threads/nodes
43
Keys
domain
Agg1 Agg2 Agg3
A
D
E
B
C F
Parallel execution
• Depending on the stateful operator semantics:
• Partition input stream into keys
• Each key is processed by 1 thread
• # keys >> # threads/nodes
44
Keys
domain
Agg1 Agg2 Agg3
A
D
E
B
C F
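Key-based routing of this kind can be sketched with a deterministic hash (a minimal illustration; the hash function is an arbitrary choice here, not any SPE's actual partitioner):

```python
import zlib

def partition(key: str, num_instances: int) -> int:
    """Deterministically map a key to one parallel instance of the operator,
    so all tuples sharing a key (e.g., a vehicle id) reach the same state."""
    return zlib.crc32(key.encode("utf-8")) % num_instances

# Many keys, few instances (# keys >> # threads): each key sticks to one instance.
assignment = {k: partition(k, 3) for k in ["A", "B", "C", "D", "E", "F"]}
```

Because the mapping is a pure function of the key, no coordination is needed between instances, and per-key state never requires cross-thread synchronization.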
45
A 8:00 55.5 X1 Y1
A 8:07 34.3 X3 Y3
A 8:03 70.3 X2 Y2
Parallel execution
• Stateful operators: Semantic awareness
• Aggregate: count within last hour, group-by vehicle id
46
… R
… R
M Agg1
M Agg2
M Agg3
…
…
…
Map
Map
Vehicle A
Round-robin
(stateless)
Parallel execution
• Stateful operators: Semantic awareness
• Aggregate: count within last hour, group-by vehicle id
47
R…
R…
M Agg1
M Agg2
M Agg3
…
…
…
Vehicle A
Map
Map
Round-robin
(stateless)
A 8:00 55.5 X1 Y1
A 8:07 34.3 X3 Y3
A 8:03 70.3 X2 Y2
Inherent disorder
48
Map → Aggregate → Join → Filter → Aggregate → P (print-to-file operator)
Disorder from parallelism: with Map parallelized into Map0 … Map3, the downstream operators (Join, Aggregate, Filter, Aggregate, P) observe out-of-order data
how to merge several timestamp-sorted streams...
49
M
...
...into one timestamp-sorted stream?
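Merging several timestamp-sorted streams into one timestamp-sorted stream is a k-way merge-sort; as a sketch, Python's standard library provides exactly this:

```python
import heapq

# Three timestamp-sorted input streams of (ts, payload) tuples:
streams = [
    [(1, "a"), (4, "b"), (7, "c")],
    [(2, "x"), (3, "y")],
    [(5, "p"), (6, "q")],
]
# k-way merge on the timestamp field: O(n log k) for n tuples over k streams.
merged = list(heapq.merge(*streams, key=lambda t: t[0]))
```

On unbounded streams, the merger can only safely emit a tuple once every input has offered one, so its latency is bounded by the slowest stream — the cost discussed later in the tutorial.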
Gulisano, Vincenzo, et al. "Streamcloud: An elastic and scalable data streaming system." IEEE Transactions on Parallel and
Distributed Systems 23.12 (2012): 2351-2365. 50
Balazinska, Magdalena, et al. "Fault-tolerance in the Borealis distributed stream processing system." Proceedings of the
2005 ACM SIGMOD international conference on Management of data. 2005. 51
Gulisano, Vincenzo, et al. "Scalejoin: A deterministic, disjoint-parallel and skew-resilient stream join." IEEE Transactions on Big
Data (2016).
52
Which one to choose?
• Different options have different costs
• might require different types of data structures
• might require “special tuples” (more on this in the following part of
the tutorial)
• When the price of merge-sorting is paid by those who provide the tuples rather than by those who receive them, the system can scale better
53
56
Agenda
• Motivation, preliminaries and examples about
data streaming and Stream Processing Engines
• Causes of out-of-order data and solutions enforcing total-ordering
• Pros/Cons of total-ordering
• Relaxation of total-ordering and the watermarks
57
Pros/Cons of total-ordering
• Cons:
• Expensive (computation- and latency-wise)
• An “overkill” for certain applications (more on this in the following slides)
• Pros:
• Determinism
• Synchronization
• Eager purging of stale state
58
Cost
• We need to temporarily maintain tuples
• Linear in the number of tuples we receive, which depends on the streams’ rates
• We need to sort tuples… O(n log(n)) (n is the number of sources or tuples, depending on the case)
• We need data from all sources, so the processing latency depends on the slowest source
• The latency overhead can be estimated based on the sources’ rates1
1Gulisano, Vincenzo, et al. "Performance modeling of stream joins." Proceedings of the 11th ACM International Conference on
Distributed and Event-based Systems. 2017. 60
Estimating the latency overhead
1Gulisano, Vincenzo, et al. "Performance modeling of stream joins." Proceedings of the 11th ACM International
Conference on Distributed and Event-based Systems. 2017. 61
Pros/Cons of total-ordering
• Cons:
• Expensive (computation- and latency-wise)
• An “overkill” for certain applications (more on this in the following slides)
• Pros:
• Determinism
• Synchronization
• Eager purging of stale state
62
Determinism
63
Source → OP1 → OP2 → OP3 → OP4 → OP5 → OP6 → Sink (with parallel instances of Source, OP2, OP3, OP5)
Balazinska, Magdalena, et al. "Fault-tolerance in the Borealis distributed stream processing system." Proceedings of the 2005 ACM SIGMOD international
conference on Management of data. 2005.
Gulisano, Vincenzo. StreamCloud: an elastic parallel-distributed stream processing engine. Diss. 2012.
Determinism
64
Streams S1 and S2 are merged deterministically into the sequence t1, t2, t3, t4, t5, t6, t7, t8, t9, …
Tuple-based window, size: 4 / advance: 1
Hwang, Jeong-Hyon, Ugur Cetintemel, and Stan Zdonik. "Fast and reliable stream processing over wide area networks." 2007 IEEE 23rd International
Conference on Data Engineering Workshop. IEEE, 2007.
Gulisano, Vincenzo, Yiannis Nikolakopoulos, Marina Papatriantafilou, and Philippas Tsigas. "Scalejoin: A deterministic, disjoint-parallel and skew-resilient
stream join." IEEE Transactions on Big Data (2016).
Pros/Cons of total-ordering
• Cons:
• Expensive (computation- and latency-wise)
• An “overkill” for certain applications (more on this in the following slides)
• Pros:
• Determinism
• Synchronization
• Eager purging of stale state
65
Synchronization
(figure: the query deployed with operator Replicas — e.g., OP3, OP5 — and duplicated downstream paths OP4, OP5, OP6 to two Sinks; replicas must process tuples in the same order to produce identical results)
Balazinska, Magdalena, et al. "Fault-tolerance in the Borealis distributed stream processing system." Proceedings of the 2005 ACM SIGMOD international
conference on Management of data. 2005. 66
Synchronization
(figure: tuples t1–t4 being added step by step to the windows WR over streams R and S)
Gulisano, Vincenzo, Yiannis Nikolakopoulos, Marina Papatriantafilou, and Philippas Tsigas. "Scalejoin: A deterministic, disjoint-parallel and skew-resilient stream join."
IEEE Transactions on Big Data (2016).
67
Synchronization
68
(figure: streams R and S carrying tuples t1–t9 together with special marker tuples)
Marker tuples:
• Act as a barrier
• Carry the reconfiguration to be applied
• Change the parallelism degree
• Change other configuration
(on receiving a marker, each instance waits for it on all inputs, makes sure its peers are ready, and stops)
Najdataei, Hannaneh, et al. "Stretch: Scalable and elastic deterministic streaming analysis with virtual
shared-nothing parallelism." Proceedings of the 13th ACM International Conference on Distributed and Event-
based Systems. 2019.
69
Agenda
• Motivation, preliminaries and examples about
data streaming and Stream Processing Engines
• Causes of out-of-order data and solutions enforcing total-ordering
• Pros/Cons of total-ordering
• Relaxation of total-ordering and the watermarks
Total-Ordering Recap
70
Operator receives (out of order): ts: 7, ts: 18, ts: 16, ts: 35, ts: 3, ts: 24, ts: 25, …
Merge-Sort → ts: 3, ts: 7, ts: 16, ts: 18, ts: 24, ts: 25, ts: 35 (event-time axis 0–40)
✓ Correct window assignment
✓ Order in each window
Relaxing Correctness
71
Correctness
Performance
Flexibility
✓ In-order input streams
✓ Order in each window
✓ Correct window assignment
✗ In-order input streams
✓ Order in each window
✓ Correct window assignment
Buffering Approaches
72
Operator
Input: ts: 25, ts: 3, ts: 7, ts: 16, ts: 2
Buffer and Sort → ts: 3, ts: 7, ts: 16, ts: 25
Relaxing Correctness
73
Correctness
Performance
Flexibility
✓ In-order input streams
✓ Order in each window
✓ Correct window assignment
✗ In-order input streams
✓ Order in each window
✓ Correct window assignment
✗ In-order input streams
✗ Order in each window
✓ Correct window assignment
Disorder + Correct Window Assignment
1. When to create each window?
2. When to close each window (and produce result)?
74
0 10 20
ts: 7, ts: 16, ts: 18, ts: 3, ts: 21
Creating Windows
75
0 10 20
ts: 7, ts: 18, ts: 9
Closing Windows
76
0 10 20
ts: 7, ts: 18, ts: 9, ts: 21, ts: 17 — CLOSED
Operator needs some guarantee
it will not receive tuples with ts < W
Safely close all windows where
right_boundary ≤ W
Watermarks
77
Assume we can compute a monotonic function
F: O → E
that returns W ∈ E, the earliest event time of any tuple that can arrive at operator O in the future.
ts: 7, ts: 18, ts: 9, ts: 21, ts: 17 (event-time axis 0–20) — CLOSED
We call the value of this monotonic function F* the (low) watermark of operator O!
• Monotonicity ➔ no tuples with ts < W will arrive in the future.
• Solves the problem of safely closing windows!
* The watermark of operator O is a function of O and all its upstream peers, but we omit the latter for brevity.
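The closing rule above — safely close every window whose right boundary is at most W — is a one-liner to sketch (plain Python, windows represented as (start, end) spans):

```python
def close_windows(open_windows, watermark):
    """Split open windows, given as (start, end) spans, into those safe to
    close (right boundary <= watermark W) and those that must stay open."""
    closed = [w for w in open_windows if w[1] <= watermark]
    still_open = [w for w in open_windows if w[1] > watermark]
    return closed, still_open

# With watermark W = 20, windows ending at or before 20 can emit their results:
closed, still_open = close_windows([(0, 10), (10, 20), (20, 30)], watermark=20)
```

Monotonicity of the watermark guarantees a closed window never has to be reopened for a straggler.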
Watermarks in Practice
• Watermarks are generated at the sources.
• They (conceptually) flow through the pipeline.
• They propagate regardless of data filtering ➔ all operators have an up-to-date view of time
78
S → O1 → O2 → O3, with watermarks WS, WO1, WO2, WO3 and output watermarks WO1,OUT, WO2,OUT, WO3,OUT
Input Watermark: Earliest ts that O can receive.
Output Watermark: Earliest ts that O can emit.
Input & Output Timestamps
79
Stateless: tsOUT = tsIN
Stateful: tsOUT ≠ tsIN — one output tuple may combine several inputs (tsIN1, tsIN2, …), based on the operator’s state and event time
1. Which windows are complete?
2. How is the timestamp set for window results?
Computing Watermarks
80
Operator O receives input watermarks W1, W2, W3:
WO,IN = min(W1, W2, W3)
WO,OUT = WO,IN if O is stateless; otherwise g(stateO, semanticsO)
Windows whose right boundary ≤ WO,IN are CLOSED (event-time axis 0–50); e.g., window results with tsout1 = 40 and tsout2 = 20
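The input-watermark rule — the minimum across all input channels — is trivially expressed in code (a plain-Python sketch):

```python
def input_watermark(channel_watermarks):
    """An operator's input watermark is the minimum over its input channels:
    only event times that *all* channels have passed are guaranteed complete."""
    return min(channel_watermarks)

# Three input channels reporting watermarks 25, 40, and 30:
w_in = input_watermark([25, 40, 30])
```

Taking the minimum is also why a single slow or idle channel holds back the operator's whole view of event time.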
81
Flink Example
Map Aggregate Filter Aggregate
Join0
Join1
Join2
Join3
P
1. Result correctness from correct window assignment:
ts: 7, ts: 18, ts: 9 — a window is CLOSED once the watermark W passes its right boundary (event-time axis 0–20)
2. Watermark propagation:
S → O1 → O2 → O3, with watermarks WS, WO1, WO2, WO3 and output watermarks WO1,OUT, WO2,OUT, WO3,OUT; the print operator P observes the output watermark
Generating Watermarks
• Perfect Watermarks
• Sorted data or very predictable data sources.
• Determinism without sorting for order-independent window functions.
• Disorder inside windows is still possible!
• Sorting possible if needed, but not imposed.
• Heuristic Watermarks
• When impossible to perfectly predict data (e.g., distributed sources).
• Best-effort prediction of event-time progress.
• Possibility for late data.
• More knowledge about the internals of sources → fewer mispredictions (late data).
82
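A common heuristic of this kind is bounded out-of-orderness: assume tuples arrive at most some fixed delay after the highest timestamp seen so far. A minimal sketch (the delay of 5 time units is an assumed, illustrative knob):

```python
def bounded_watermarks(timestamps, max_delay):
    """Heuristic watermark generator (bounded out-of-orderness): assume no
    tuple arrives more than `max_delay` time units after the highest
    timestamp seen so far. The watermark is monotonic by construction;
    tuples with ts below the current watermark are late data."""
    max_ts = None
    for ts in timestamps:
        max_ts = ts if max_ts is None else max(max_ts, ts)
        yield ts, max_ts - max_delay

# ts 9 arrives after ts 18; the watermark stays monotone at 13:
observed = list(bounded_watermarks([7, 18, 9], max_delay=5))
```

Here the third tuple (ts: 9) arrives below the watermark (13), i.e., it is late data — exactly the misprediction a larger delay bound would trade extra latency to avoid.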
Fast and Slow Watermarks
83
Event Time →
Perfect Watermark: Window Complete (on time)
Slow Watermark: Window Complete (late) ✗ Performance (Latency)
Fast Watermark: Window Complete (early) ✗ Correctness
Window Lifetime →
Triggering
84
0h 12h 24h 36h
Latency of 24 hours!
Can we do better?
Emit (partial) result every 1h
Completeness Trigger
Emit result when window is complete
Repeated Update Trigger
Periodically emit (partial) results
(e.g., for every tuple, every hour, etc.)
Correctness
Performance
Flexibility
Repeated Update Trigger Results
85
Repeated Update Trigger ➔ Every 1h
0h 12h 24h
Discarding
0h 12h 24h
Accumulating
0h 12h 24h
Accumulating + Retracting
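The difference between accumulating and discarding firings can be sketched for a single count window (plain Python; the timestamps, 36-unit window, and 12-unit firing period are illustrative assumptions):

```python
def repeated_count(timestamps, window_end, period, accumulating=True):
    """Repeated update trigger for a count over one window [0, window_end):
    fire every `period` time units. Accumulating mode re-emits the running
    total; discarding mode emits only the delta since the previous firing."""
    results, total, emitted = [], 0, 0
    fire_at = period
    for ts in sorted(timestamps):
        while fire_at <= ts and fire_at <= window_end:
            results.append(total if accumulating else total - emitted)
            emitted = total
            fire_at += period
        if ts < window_end:
            total += 1
    results.append(total if accumulating else total - emitted)  # final result
    return results

acc = repeated_count([1, 2, 13, 14, 25], window_end=36, period=12, accumulating=True)
dis = repeated_count([1, 2, 13, 14, 25], window_end=36, period=12, accumulating=False)
```

Accumulating emits growing totals; discarding emits deltas that sum to the same final count — which mode is right depends on whether the downstream consumer overwrites or adds up the partial results.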
Putting It All Together
Event Time →
Perfect Watermark: Window Complete (on time)
Slow Watermark: Window Complete (late) ✗ Performance (Latency)
Fast Watermark: Window Complete (early) ✗ Correctness
Repeated Trigger before the watermark: early results ✓ Performance (Latency)
Watermark Trigger: on-time result
Repeated Trigger after the watermark: late results ✓ Correctness
Window Lifetime →
86
… the light at the end of the tunnel …
87
• Motivation, preliminaries and examples
about data streaming and Stream
Processing Engines
• Causes of out-of-order data and solutions
enforcing total-ordering
• Pros/Cons of total-ordering
• Relaxation of total-ordering and the
watermarks
To summarize: event time advances based on:
88

Tuples themselves
• Pros: determinism, for order-sensitive / order-insensitive functions
• Cons: costly merge-sorting; coupled processing / output of tuples

Watermarks
• Pros: decoupled processing / output of tuples; propagation of time passing even in the absence of tuples
• Cons: would require special support for order-sensitive functions; latency depends on frequency of watermarks
Tutorial: The Role of Event-Time Analysis Order in Data Streaming

  • 1.
    Tutorial: The Role ofEvent-Time Order in Data Streaming Analysis VincenzoGulisano Chalmers University ofTechnology Gothenburg, Sweden vincenzo.gulisano@chalmers.se Dimitris Palyvos-Giannas Chalmers University ofTechnology Gothenburg, Sweden palyvos@chalmers.se Bastian Havers Chalmers University ofTechnology &Volvo Cars Gothenburg, Sweden havers@chalmers.se Marina Papatriantafilou Chalmers University ofTechnology Gothenburg, Sweden ptrianta@chalmers.se
  • 2.
  • 3.
    Agenda • Motivation, preliminariesand examples about data streaming and Stream Processing Engines • Causes of out-of-order data and solutions enforcing total-ordering • Pros/Cons of total-ordering • Relaxation of total-ordering and the watermarks 3 https://github.com/ vincenzo- gulisano/debs2020_tutorial_event_time
  • 4.
    Agenda • Motivation, preliminariesand examples about data streaming and Stream Processing Engines • Causes of out-of-order data and solutions enforcing total-ordering • Pros/Cons of total-ordering • Relaxation of total-ordering and the watermarks 4
  • 5.
    Terabyte Scale 1012 bytes 1984 ZetabyteScale 109 Terabytes The era of Big Data 5
  • 6.
    Where does BigData Originate? 1 trillion sensors by 2030 in the Internet of Things (IoT) uploaded toYouTube every minute 2.32 billion Facebook users 219 billion photos uploaded to Facebook 1 online interaction every 18 seconds by 2025 6 500 hours of video
  • 7.
    Main Memory Database ManagementSystems (DBMSs) vs. Stream Processing Engines (SPEs) 7 Disk 1 Data Query Processing 3 Query results 2 Query Main Memory Query Processing Continuous QueryData Query results
  • 8.
  • 9.
    Flink-related images /code snippets in the following are taken from: https://flink.apache.org/ 9
  • 10.
    Data Stream: unbounded sequenceof tuples sharing the same schema Example: vehicles’ speed reports 10 time Field Field vehicle id text time (secs) text speed (Km/h) double X coordinate double Y coordinate double A 8:00 55.5 X1 Y1 Let’s assume each source (e.g., vehicle) produces and delivers a timestamp-sorted stream A 8:07 34.3 X3 Y3 A 8:03 70.3 X2 Y2
  • 11.
    Continuous query (orsimply query): Directed Acyclic Graph (DAG) of streams and operators OP OP OP OP OP OP OP source op (1+ out streams) sink op (1+ in streams) stream op (1+ in, 1+ out streams) 11
  • 12.
    Data Streaming Operators Twomain types: • Stateless operators • do not maintain any state • one-by-one processing • if they maintain some state, such state does not evolve depending on the tuples being processed • Stateful operators • maintain a state that evolves depending on the tuples being processed • produce output tuples that depend on multiple input tuples 12 OP OP
  • 13.
    Stateless Operators Filter /route tuples based on one (or more) conditions 13 Filter ...Map Transform each tuple
  • 14.
  • 15.
    Stateful Operators Aggregate informationfrom multiple tuples (e.g., max, min, sum, ...) 15 Join tuples coming from 2 streams given a certain predicate Aggregate Join
  • 16.
    Windows and StatefulAnalysis Stateful operations are done over windows: • Time-based (e.g., tuples in the last 10 minutes) • Tuple-based (e.g., given the last 50 tuples) 16 time [8:00,9:00) [8:20,9:20) [8:40,9:40) Example of time-based window of size 1 hour and advance 20 minutes  How many tuple in a window?  Which time period does a window span?
  • 17.
    Time-based sliding windowaggregation (count) 17 Counter: 4 time [8:00,9:00) 8:05 8:15 8:22 8:45 9:05 Output: 4 Counter: 1 Counter: 2 Counter: 3 Counter: 3 time 8:05 8:15 8:22 8:45 9:05 [8:20,9:20)
  • 18.
    Time-based sliding windowjoining 18 t1 t2 t3 t4 t1 t2 t3 t4 R S Sliding window Window sizeWS WSWR Predicate P
  • 19.
  • 20.
    Basic operators anduser-defined operators 20 Besides a set of basic operators, SPEs usually allow the user to define ad-hoc operators (e.g., when existing aggregation are not enough)
  • 21.
    SPEs and operators'variants • Each SPE might define its own variants of certain streaming operators: 21 t1 t2 t3 t4 t1 t2 t3 t4 R S Sliding window Window sizeWS WSWR Predicate P
  • 22.
    Sample Query "every fiveminutes, of all vehicles that braked significantly, find the one that braked the hardest" 22 time A 8:00 55.5 X1 Y1 ... B 8:07 34.3 X3 Y3 ... B 8:03 70.3 X2 Y2 ...
  • 23.
    Sample Query Remove unused fields Map Field vehicleid time (secs) speed (Km/h) X coordinate Y coordinate ... Field vehicle id time (secs) speed (Km/h) Every minute, compute average speed of each vehicle during the last 2.5 minutes Aggregate Field vehicle id time (secs) avg speed (Km/h) Join High average speed and slow current speed? Filter Field vehicle id time (secs) braking factor Join on vehicle id in last minute Field vehicle id time (secs) avg speed (Km/h) speed (Km/h) Aggregate Every 5 minutes, produce vehicle that braked the hardest during last 5 minutes 23
  • 24.
    Agenda • Motivation, preliminariesand examples about data streaming and Stream Processing Engines • Causes of out-of-order data and solutions enforcing total-ordering • Pros/Cons of total-ordering • Relaxation of total-ordering and the watermarks 24
  • 25.
    From an abstractquery … to streaming application run by an SPE 25 OP1 OP2 OP4 OP6OP3 OP5Source Sink OP1 OP2 OP4 OP6OP3 OP5Source Sink Source OP2 OP3 OP3 OP5
  • 26.
    OP1 OP2 OP4 OP6OP3 OP5 Source Sink Source OP2 OP3 OP3 OP5 NodeNode NodeNode Node Process Processes Process Process Process Process From an abstract query … to streaming application run by an SPE 26
  • 27.
    27 From an abstractquery …to streaming application run by Flink - 1/3
  • 28.
    28 From an abstractquery …to streaming application run by Flink - 1/3
  • 29.
    29 From an abstractquery …to streaming application run by Flink - 1/3
  • 30.
    Causes of out-of-orderdata: 30 Sources themselves Asynchronous Distributed Parallel executions
  • 31.
    Data sources thatproduce out-of-order data • Discussed in many related works (e.g., Babu, Shivnath, Utkarsh Srivastava, and JenniferWidom. "Exploiting k-constraints to reduce memory overhead in continuous queries over data streams." ACM Transactions on Database Systems (TODS) 29.3 (2004): 545-580.) • Battery-operated devices, unreliable wireless networks, … 1 trillion sensors by 2030 in the Internet of Things (IoT) 31
  • 32.
    Causes of out-of-orderdata 32 Sources themselves Asynchronous Distributed Parallel executions
  • 33.
    The 3-step procedure (sequentialstream join) 33 For each incoming tuple t: 1. compare t with all tuples in opposite window given predicate P 2. add t to its window 3. remove stale tuples from t’s window Add tuplesto S Add tuples to R Prod R Prod S Consume resultsConsPU We assume each producer delivers tuples in timestamp order
  • 34.
    The 3-step procedure,is it enough? 34 t1 t2 t1 t2 R S WSWR t3 t1 t2 t1 t2 R S WSWR t4 t3
  • 35.
    Causes of out-of-orderdata 35 Asynchronous Distributed Parallel executions Any operator fed data from multiple logical / physical stream can potentially observe out- of-order data
  • 36.
    Parallel execution • Generalapproach 36 OPA OPB
  • 37.
    Parallel execution • Generalapproach R: Router M: Merger OPA OPB OPA RM Thread 1 OPA RM Thread m … OPA RM Thread 1 Thread 2 Thread 3 OPA RM Thread 1 Thread 2 OPA RM Thread 1 Thread 2 … 37
  • 38.
    Parallel execution • Generalapproach R: Router M: Merger OPA OPB OPA RM Thread 1 OPA RM Thread m … OPA RM Thread 1 Thread 2 Thread 3 OPA RM Thread 1 Thread 2 OPA RM Thread 1 Thread 2 … 38
  • 39.
    Parallel execution • Generalapproach R: Router M: Merger OPA OPB OPA RM Thread 1 OPA RM Thread m … OPB RM Thread 1 OPB RM Thread n … 39
  • 40.
    Parallel execution • Generalapproach R: Router M: Merger OPA OPB OPA RM Thread 1 OPA RM Thread m … OPB RM Thread 1 OPB RM Thread n … 40
  • 41.
    Parallel execution • Statefuloperators: Semantic awareness • Aggregate: count within last hour, group-by vehicle id 41 Previous Subcluster R … R … M Agg1 M Agg2 M Agg3 … … … Vehicle A
  • 42.
    Parallel execution • Dependingon the stateful operator semantic: • Partition input stream into keys • Each key is processed by 1 thread • # keys >> # threads/nodes 42
  • 43.
    Parallel execution • Dependingon the stateful operator semantic: • Partition input stream into keys • Each key is processed by 1 thread • # keys >> # threads/nodes 43 Keys domain Agg1 Agg2 Agg3 A D E B C F
  • 44.
    Parallel execution • Dependingon the stateful operator semantics: • Partition input stream into keys • Each key is processed by 1 thread • # keys >> # threads/nodes 44 Keys domain Agg1 Agg2 Agg3 A D E B C F
  • 45.
  • 46.
    A 8:00 55.5X1 Y1 A 8:07 34.3 X3 Y3 A 8:03 70.3 X2 Y2 Parallel execution • Stateful operators: Semantic awareness • Aggregate: count within last hour, group-by vehicle id 46 … R … R M Agg1 M Agg2 M Agg3 … … … Map Map Vehicle A Round-robin (stateless)
  • 47.
    Parallel execution • Statefuloperators: Semantic awareness • Aggregate: count within last hour, group-by vehicle id 47 R… R… M Agg1 M Agg2 M Agg3 … … … Vehicle A Map Map Round-robin (stateless) A 8:00 55.5 X1 Y1 A 8:07 34.3 X3 Y3 A 8:03 70.3 X2 Y2
  • 48.
    Inherent disorder 48 Map AggregateJoin Filter Aggregate Print-to-file operator P Map0 Map1 Map2 Map3 P Aggregate Filter AggregateJoin Disorder from parallelism
  • 49.
    how to mergeseveral timestamp-sorted streams... 49 M ... ...into one timestamp-sorted stream?
  • 50.
    Gulisano, Vincenzo, etal. "Streamcloud: An elastic and scalable data streaming system." IEEE Transactions on Parallel and Distributed Systems 23.12 (2012): 2351-2365. 50
  • 51.
    Balazinska, Magdalena, etal. "Fault-tolerance in the Borealis distributed stream processing system." Proceedings of the 2005 ACM SIGMOD international conference on Management of data. 2005. 51
  • 52.
    Gulisano, Vincenzo, etal. "Scalejoin: A deterministic, disjoint-parallel and skew-resilient stream join." IEEE Transactions on Big Data (2016). 52
  • 53.
    Which one tochoose? • Different options have different costs • might require different types of data structures • might require “special tuples” (more on this in the following part of the tutorial) • When the price of merge-sorting is paid by who provides the tuples rather than who receives them, the system can scale better 53
  • 54.
    Tutorial: The Role ofEvent-Time Order in Data Streaming Analysis VincenzoGulisano Chalmers University ofTechnology Gothenburg, Sweden vincenzo.gulisano@chalmers.se Dimitris Palyvos-Giannas Chalmers University ofTechnology Gothenburg, Sweden palyvos@chalmers.se Bastian Havers Chalmers University ofTechnology &Volvo Cars Gothenburg, Sweden havers@chalmers.se Marina Papatriantafilou Chalmers University ofTechnology Gothenburg, Sweden ptrianta@chalmers.se
  • 55.
    Tutorial: The Role ofEvent-Time Order in Data Streaming Analysis VincenzoGulisano Chalmers University ofTechnology Gothenburg, Sweden vincenzo.gulisano@chalmers.se Dimitris Palyvos-Giannas Chalmers University ofTechnology Gothenburg, Sweden palyvos@chalmers.se Bastian Havers Chalmers University ofTechnology &Volvo Cars Gothenburg, Sweden havers@chalmers.se Marina Papatriantafilou Chalmers University ofTechnology Gothenburg, Sweden ptrianta@chalmers.se
  • 56.
Agenda
• Motivation, preliminaries and examples about data streaming and Stream Processing Engines
• Causes of out-of-order data and solutions enforcing total-ordering
• Pros/Cons of total-ordering
• Relaxation of total-ordering and the watermarks
  • 58.
Pros/Cons of total-ordering
• Cons:
  • Expensive (computation- and latency-wise)
  • An “overkill” for certain applications (more on this in the following slides)
• Pros:
  • Determinism
  • Synchronization
  • Eager purging of stale state
  • 60.
Cost
• We need to temporarily maintain tuples
  • Linear in the number of tuples we receive, which depends on the streams’ rates
• We need to sort tuples… O(n log n)
  • (n is the number of sources or tuples, depending on the case)
• We need data from all sources, so the processing latency depends on the slowest source
  • The latency overhead can be estimated based on the sources’ rates¹
¹ Gulisano, Vincenzo, et al. "Performance modeling of stream joins." Proceedings of the 11th ACM International Conference on Distributed and Event-based Systems. 2017.
  • 61.
Estimating the latency overhead¹
¹ Gulisano, Vincenzo, et al. "Performance modeling of stream joins." Proceedings of the 11th ACM International Conference on Distributed and Event-based Systems. 2017.
  • 63.
Determinism (figure: a query graph of operators OP1..OP6 between Source and Sink, shown in two deployments with replicated instances of OP2, OP3 and OP5)
Balazinska, Magdalena, et al. "Fault-tolerance in the Borealis distributed stream processing system." Proceedings of the 2005 ACM SIGMOD international conference on Management of data. 2005.
Gulisano, Vincenzo. StreamCloud: an elastic parallel-distributed stream processing engine. Diss. 2012.
  • 64.
Determinism (figure: two timestamp-sorted streams S1 and S2 with tuples t1..t9 merged into a single sequence; tuple-based window, size: 4 / advance: 1)
Hwang, Jeong-Hyon, Ugur Cetintemel, and Stan Zdonik. "Fast and reliable stream processing over wide area networks." 2007 IEEE 23rd International Conference on Data Engineering Workshop. IEEE, 2007.
Gulisano, Vincenzo, Yiannis Nikolakopoulos, Marina Papatriantafilou, and Philippas Tsigas. "ScaleJoin: A deterministic, disjoint-parallel and skew-resilient stream join." IEEE Transactions on Big Data (2016).
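A tuple-based window with size 4 / advance 1, as in the figure, can be sketched as follows. This is a hypothetical helper (`tuple_windows` is not code from the tutorial or from ScaleJoin); it shows why a fixed, merge-sorted input order makes window contents deterministic:

```python
from collections import deque

def tuple_windows(tuples, size=4, advance=1):
    """Yield the content of each tuple-based window: once `size` tuples
    are buffered, a window closes every `advance` tuples thereafter."""
    buf = deque(maxlen=size)  # the deque keeps only the `size` newest tuples
    for i, t in enumerate(tuples, start=1):
        buf.append(t)
        if i >= size and (i - size) % advance == 0:
            yield list(buf)

# With a fixed input order t1..t6 the windows are always the same:
windows = list(tuple_windows(["t1", "t2", "t3", "t4", "t5", "t6"]))
# → [t1..t4], [t2..t5], [t3..t6]
```

If two merges of S1 and S2 produced different interleavings, the same window operator would see different window contents, which is exactly the non-determinism that total ordering rules out.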
  • 66.
Synchronization (figure: a query graph OP1..OP6 from Source to Sink in which replicas of OP3 and OP5 feed replicated downstream operators and Sinks)
Balazinska, Magdalena, et al. "Fault-tolerance in the Borealis distributed stream processing system." Proceedings of the 2005 ACM SIGMOD international conference on Management of data. 2005.
  • 67.
Synchronization (figure: ScaleJoin example in which tuples t1..t4 from streams R and S are processed in a deterministic interleaving of reads and writes)
Gulisano, Vincenzo, Yiannis Nikolakopoulos, Marina Papatriantafilou, and Philippas Tsigas. "ScaleJoin: A deterministic, disjoint-parallel and skew-resilient stream join." IEEE Transactions on Big Data (2016).
  • 68.
Synchronization (figure: special tuples interleaved with data tuples t1..t9 in streams R and S; each instance waits for the special tuple, makes sure the pending tuples are ready, and stops)
Special tuples can:
• Act as a barrier
• Carry the reconfiguration to be applied
  • Change the parallelism degree
  • Change other configuration
Najdataei, Hannaneh, et al. "Stretch: Scalable and elastic deterministic streaming analysis with virtual shared-nothing parallelism." Proceedings of the 13th ACM International Conference on Distributed and Event-based Systems. 2019.
  • 69.
Agenda
• Motivation, preliminaries and examples about data streaming and Stream Processing Engines
• Causes of out-of-order data and solutions enforcing total-ordering
• Pros/Cons of total-ordering
• Relaxation of total-ordering and the watermarks
  • 70.
Total-Ordering Recap (figure: an operator merge-sorts input tuples with ts 7, 18, 16, 35, 3, 24, 25 into timestamp order before assigning them to event-time windows)
✓ Correct window assignment
✓ Order in each window
  • 71.
Relaxing Correctness (trading correctness against performance and flexibility)
• ✓ In-order input streams, ✓ Order in each window, ✓ Correct window assignment
• ✗ In-order input streams, ✓ Order in each window, ✓ Correct window assignment
  • 72.
Buffering Approaches (figure: an operator buffers incoming tuples with ts 25, 3, 7, 16, 2 and sorts the buffer, emitting tuples in ts order 3, 7, 16, 25)
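A minimal sketch of such a buffering approach, assuming a fixed-capacity min-heap buffer (the buffer size, helper name `buffered_sort` and tuples are illustrative, not from the slide):

```python
import heapq

def buffered_sort(stream, buffer_size=3):
    """Hold up to `buffer_size` tuples in a min-heap keyed on timestamp;
    once the buffer is full, emit the earliest. This repairs any tuple
    that is at most `buffer_size` positions out of place."""
    heap = []
    for ts, payload in stream:
        heapq.heappush(heap, (ts, payload))
        if len(heap) > buffer_size:
            yield heapq.heappop(heap)
    while heap:  # end of stream: drain the remaining buffered tuples
        yield heapq.heappop(heap)

out = list(buffered_sort([(3, "a"), (7, "b"), (16, "c"), (2, "d"), (25, "e")]))
print([t[0] for t in out])  # → [2, 3, 7, 16, 25]
```

The cost matches the earlier slide: memory linear in the buffer size, and extra latency because a tuple is held back until `buffer_size` later tuples have arrived. A tuple more than `buffer_size` positions late would still be emitted out of order (or would have to be dropped).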
  • 73.
Relaxing Correctness (trading correctness against performance and flexibility)
• ✓ In-order input streams, ✓ Order in each window, ✓ Correct window assignment
• ✗ In-order input streams, ✓ Order in each window, ✓ Correct window assignment
• ✗ In-order input streams, ✗ Order in each window, ✓ Correct window assignment
  • 74.
Disorder + Correct Window Assignment (figure: out-of-order tuples with ts 7, 16, 18, 3, 21 over event-time windows starting at 0, 10, 20)
1. When to create each window?
2. When to close each window (and produce its result)?
  • 75.
Creating Windows (figure: tuples with ts 7, 9, 18 create the event-time windows starting at 0, 10, 20 that they fall into)
  • 76.
Closing Windows (figure: with tuples ts 7, 9, 17, 18, 21, the window starting at 0 is CLOSED)
• The operator needs some guarantee W that it will not receive tuples with ts < W
• It can then safely close all windows where right_boundary ≤ W
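The closing rule can be written down directly. A sketch with hypothetical tumbling windows represented as `[start, end)` pairs (`closable_windows` and the concrete values are illustrative):

```python
def closable_windows(windows, watermark):
    """A window [start, end) can be safely closed once the operator is
    guaranteed to receive no tuple with ts < watermark, i.e. when its
    right boundary is <= the watermark."""
    return [w for w in windows if w[1] <= watermark]

# Tumbling event-time windows of size 10, with a guarantee W = 21:
wins = [(0, 10), (10, 20), (20, 30)]
print(closable_windows(wins, 21))  # → [(0, 10), (10, 20)]
```

The window (20, 30) stays open: a tuple with ts between 21 and 30 may still arrive.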
  • 77.
Watermarks (figure: tuples with ts 7, 9, 17, 18, 21; the window starting at 0 is CLOSED once W passes its right boundary)
Assume we can compute a monotonic function F: O → E that returns W ∈ E, the earliest event time of any tuple that can arrive at operator O in the future.
We call the value of this monotonic function F* the (low) watermark of operator O!
• Monotonicity ➔ no tuples with ts < W will arrive in the future
• Solves the problem of safely closing windows!
* The watermark of operator O is a function of O and all its upstream peers, but we omit the latter for brevity.
  • 78.
Watermarks in Practice (figure: source S with watermark WS feeding operators O1, O2, O3, each with an input watermark WO1, WO2, WO3 and an output watermark WO1,OUT, WO2,OUT, WO3,OUT)
• Watermarks are generated at the sources
• They (conceptually) flow through the pipeline
• They propagate regardless of data filtering ➔ all operators have an up-to-date view of time
Input Watermark: earliest ts that O can receive.
Output Watermark: earliest ts that O can emit.
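The point that watermarks propagate regardless of data filtering can be made concrete with a tiny sketch (a hypothetical filter operator, not an SPE API): even when every tuple in a batch is dropped, the watermark is still forwarded downstream.

```python
def filter_operator(batch, watermark, predicate):
    """Apply `predicate` to a batch of (ts, key) tuples. Data may be
    filtered out entirely, but the watermark is always forwarded, so
    downstream operators keep an up-to-date view of event time."""
    kept = [t for t in batch if predicate(t)]
    return kept, watermark

out, wm = filter_operator([(7, "A"), (9, "B")], watermark=10,
                          predicate=lambda t: t[1] == "C")
print(out, wm)  # → [] 10  (all tuples dropped, watermark still flows)
```

Without this rule, an operator downstream of a selective filter could starve: it would never learn that event time has advanced and could never close its windows.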
  • 79.
Input & Output Timestamps
• Stateless operator: tsOUT = tsIN
• Stateful operator: tsOUT ≠ tsIN (the result is produced from state that aggregates several input tuples, e.g. tsIN1, tsIN2)
1. Which windows are complete?
2. How is the timestamp set for window results?
  • 80.
Computing Watermarks (figure: operator O with upstream watermarks W1, W2, W3; on the event-time axis 0..50, WO,IN closes a window, and two candidate output watermarks give result timestamps tsOUT1 = 40 and tsOUT2 = 20)
• WO,IN = min(W1, W2, W3): the input watermark of O is the minimum over the output watermarks of its upstream peers
• WO,OUT = WO,IN if O is stateless; otherwise WO,OUT = g(stateO, semanticsO)
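The two rules above can be sketched in a few lines. The function names and the stateful case (capping the output watermark by the earliest pending window result) are an illustrative reading of g(stateO, semanticsO), not a general implementation:

```python
def input_watermark(upstream_output_watermarks):
    """W_O,IN = min over the output watermarks of all upstream peers:
    O cannot rule out tuples earlier than its slowest input channel."""
    return min(upstream_output_watermarks)

def output_watermark(w_in, pending_result_timestamps=()):
    """Stateless operator: W_OUT = W_IN. A stateful operator may still
    hold results (e.g. open windows) with timestamps earlier than W_IN,
    so W_OUT is capped by the earliest pending result timestamp."""
    return min([w_in, *pending_result_timestamps])

w_in = input_watermark([40, 25, 33])   # slowest upstream peer dominates
w_out = output_watermark(w_in, [20])   # pending window result at ts 20
print(w_in, w_out)  # → 25 20
```

Note how a single slow upstream peer (here, the one at 25) holds back the whole downstream pipeline, mirroring the "latency depends on the slowest source" cost of total ordering.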
  • 81.
Flink Example (figure: a pipeline Map → Filter → Aggregate → parallel Join0..Join3 → Aggregate → P)
1. Result correctness from correct window assignment (the window starting at 0 is CLOSED once the watermark W passes it)
2. Watermark propagation (source S and operators O1, O2, O3 each forward their output watermark downstream to P)
  • 82.
Generating Watermarks
• Perfect Watermarks
  • Sorted data or very predictable data sources
  • Determinism without sorting for order-independent window functions
  • Disorder inside windows is still possible!
  • Sorting possible if needed, but not imposed
• Heuristic Watermarks
  • When it is impossible to perfectly predict data (e.g., distributed sources)
  • Best-effort prediction of event-time progress
  • Possibility for late data
  • More knowledge about the internals of sources → fewer mispredictions (late data)
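A common heuristic is to assume tuples are at most some bound late, in the spirit of Flink's bounded-out-of-orderness watermark strategy. A minimal sketch (class name, bound and data are illustrative; the bound is an assumption, so tuples later than it become late data):

```python
class BoundedOutOfOrderness:
    """Heuristic watermark generator: assume tuples arrive at most
    `max_delay` event-time units after the largest timestamp seen."""
    def __init__(self, max_delay):
        self.max_delay = max_delay
        self.max_ts_seen = float("-inf")

    def on_tuple(self, ts):
        # Track the largest timestamp observed so far.
        self.max_ts_seen = max(self.max_ts_seen, ts)

    def current_watermark(self):
        # Monotonic by construction, since max_ts_seen never decreases.
        return self.max_ts_seen - self.max_delay

gen = BoundedOutOfOrderness(max_delay=5)
for ts in [10, 12, 8, 15]:  # ts 8 is out of order but within the bound
    gen.on_tuple(ts)
print(gen.current_watermark())  # → 10
```

A larger `max_delay` means fewer late tuples but higher latency before windows close; a smaller one is faster but riskier, which is exactly the fast/slow watermark trade-off on the next slide.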
  • 83.
Fast and Slow Watermarks (figure: a window's lifetime on the event-time axis, with three watermark arrivals)
• Perfect Watermark: window completes exactly at the end of its lifetime
• Slow Watermark: window completes late ✗ Performance (latency)
• Fast Watermark: window completes too early ✗ Correctness
  • 84.
Triggering (figure: a window spanning 0h–36h, with tuples at 0h, 12h, 24h and the result only at 36h: a latency of 24 hours! Can we do better?)
• Completeness Trigger: emit the result when the window is complete
• Repeated Update Trigger: periodically emit a (partial) result (e.g., for every tuple, every hour, etc.), for instance every 1h
Again a trade-off between correctness, performance and flexibility.
  • 85.
Repeated Update Trigger Results (figure: a Repeated Update Trigger firing every 1h over panes at 0h, 12h, 24h, shown in three modes)
• Discarding
• Accumulating
• Accumulating + Retracting
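The difference between the discarding and accumulating modes can be sketched for a simple count aggregate (the helper `repeated_update` and the pane values are hypothetical; retractions are omitted for brevity):

```python
def repeated_update(results_per_pane, mode="accumulating"):
    """Emit one (partial) result per trigger firing for the same window.
    'discarding': each firing reports only the new pane's contribution;
    'accumulating': each firing reports the running total so far."""
    total, out = 0, []
    for partial in results_per_pane:
        total += partial
        out.append(partial if mode == "discarding" else total)
    return out

panes = [3, 5, 2]                              # counts in three 1h panes
print(repeated_update(panes, "discarding"))    # → [3, 5, 2]
print(repeated_update(panes, "accumulating"))  # → [3, 8, 10]
```

Accumulating + retracting would additionally emit a retraction of the previous value (e.g., -3 before 8), so that downstream consumers summing the stream do not double-count.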
  • 86.
Putting It All Together (figure: the window's lifetime on the event-time axis)
• Slow Watermark: window completes after the perfect watermark ✗ Performance (latency)
• Fast Watermark: window completes before the perfect watermark ✗ Correctness
• Repeated Trigger, early results ✓ Performance (latency)
• Watermark Trigger: on-time result
• Repeated Trigger, late results ✓ Correctness
  • 87.
… the light at the end of the tunnel …
• Motivation, preliminaries and examples about data streaming and Stream Processing Engines
• Causes of out-of-order data and solutions enforcing total-ordering
• Pros/Cons of total-ordering
• Relaxation of total-ordering and the watermarks
  • 88.
To summarize: event time advances based on:
• Tuples themselves
  • Pros: determinism, for order-sensitive / order-insensitive functions
  • Cons: costly merge-sorting; coupled processing / output of tuples
• Watermarks
  • Pros: decoupled processing / output of tuples; propagation of time passing even in the absence of tuples
  • Cons: would require special support for order-sensitive functions; latency depends on the frequency of watermarks
  • 89.
Tutorial: The Role of Event-Time Order in Data Streaming Analysis
Vincenzo Gulisano, Chalmers University of Technology, Gothenburg, Sweden, vincenzo.gulisano@chalmers.se
Dimitris Palyvos-Giannas, Chalmers University of Technology, Gothenburg, Sweden, palyvos@chalmers.se
Bastian Havers, Chalmers University of Technology & Volvo Cars, Gothenburg, Sweden, havers@chalmers.se
Marina Papatriantafilou, Chalmers University of Technology, Gothenburg, Sweden, ptrianta@chalmers.se