Tutorial:
The Role of Event-Time Order
in Data Streaming Analysis
Vincenzo Gulisano
Chalmers University of Technology
Gothenburg, Sweden
vincenzo.gulisano@chalmers.se
Dimitris Palyvos-Giannas
Chalmers University of Technology
Gothenburg, Sweden
palyvos@chalmers.se
Bastian Havers
Chalmers University of Technology & Volvo Cars
Gothenburg, Sweden
havers@chalmers.se
Marina Papatriantafilou
Chalmers University of Technology
Gothenburg, Sweden
ptrianta@chalmers.se
2
Agenda
• Motivation, preliminaries and examples about
data streaming and Stream Processing Engines
• Causes of out-of-order data and solutions enforcing total-ordering
• Pros/Cons of total-ordering
• Relaxation of total-ordering and the watermarks
3
https://github.com/vincenzo-gulisano/debs2020_tutorial_event_time
4
The era of Big Data
Terabyte Scale: 10^12 bytes (1984)
Zettabyte Scale: 10^9 Terabytes
5
Where does Big Data Originate?
• 1 trillion sensors by 2030 in the Internet of Things (IoT)
• 500 hours of video uploaded to YouTube every minute
• 2.32 billion Facebook users
• 219 billion photos uploaded to Facebook
• 1 online interaction every 18 seconds by 2025
6
Database Management Systems (DBMSs) vs. Stream Processing Engines (SPEs)
7
DBMS: (1) data is stored on disk; (2) a query is issued; (3) query processing in main memory produces the query results.
SPE: a continuous query resides in main memory; data flows through query processing, continuously producing query results.
8
A timeline of SPEs: NiagaraCQ, COUGAR, The Aurora Project, STanford stREam datA Manager (STREAM), Borealis, StreamCloud, …
Flink-related images / code snippets in the following are taken from: https://flink.apache.org/ 9
Data Stream:
unbounded sequence of tuples
sharing the same schema
Example: vehicles’ speed reports
10
time
Field (type):
• vehicle id (text)
• time (secs) (text)
• speed (Km/h) (double)
• X coordinate (double)
• Y coordinate (double)
A 8:00 55.5 X1 Y1
A 8:03 70.3 X2 Y2
A 8:07 34.3 X3 Y3
Let’s assume each source (e.g., vehicle) produces and delivers a timestamp-sorted stream
Continuous query (or simply query):
Directed Acyclic Graph (DAG) of streams and
operators
OP
OP
OP
OP OP
OP OP
source op
(1+ out streams)
sink op
(1+ in streams)
stream
op
(1+ in, 1+ out streams)
11
Data Streaming Operators
Two main types:
• Stateless operators
• do not maintain any state
• one-by-one processing
• if they maintain some state, such state does not evolve depending on the tuples being
processed
• Stateful operators
• maintain a state that evolves depending on the tuples being processed
• produce output tuples that depend on multiple input tuples
12
OP
OP
Stateless Operators
Filter / route tuples based on one (or more) conditions
13
Filter
Map: transform each tuple
14
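As a minimal sketch in plain Python (not any SPE's actual API), the two stateless operators above can be written as generators that process tuples one by one, keeping no state across tuples:

```python
def stream_filter(stream, predicate):
    """Stateless Filter: forward only the tuples satisfying the predicate."""
    for t in stream:
        if predicate(t):
            yield t

def stream_map(stream, fn):
    """Stateless Map: transform each tuple one-by-one, keeping no state."""
    for t in stream:
        yield fn(t)

# Illustrative tuples (vehicle id, time in secs, speed in Km/h):
reports = [("A", 480, 55.5), ("B", 483, 34.3), ("A", 487, 70.3)]
# Keep speeds above 50 Km/h, then project to (vehicle id, speed).
fast = stream_filter(reports, lambda t: t[2] > 50.0)
projected = list(stream_map(fast, lambda t: (t[0], t[2])))
```

Because neither operator consults earlier tuples, both can process each tuple immediately, regardless of order.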
Stateful Operators
Aggregate information from multiple
tuples (e.g., max, min, sum, ...)
15
Join tuples coming from 2 streams given a
certain predicate
Aggregate
Join
Windows and Stateful Analysis
Stateful operations are done over windows:
• Time-based (e.g., tuples in the last 10 minutes)
• Tuple-based (e.g., given the last 50 tuples)
16
time
[8:00,9:00)
[8:20,9:20)
[8:40,9:40)
Example of time-based window of size 1 hour and advance 20 minutes
• How many tuples are in a window?
• Which time period does a window span?
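The window spans a tuple falls into follow directly from the size and advance. A small sketch (timestamps expressed as minutes since 0:00, an assumption made only for this illustration):

```python
def windows_for(ts, size, advance):
    """Return the [start, end) spans of every sliding window containing ts.
    A window [start, start + size) contains ts iff start <= ts < start + size
    and start is a multiple of the advance."""
    spans = []
    start = ts - (ts % advance)  # latest window start not exceeding ts
    while start > ts - size and start >= 0:
        spans.append((start, start + size))
        start -= advance
    return sorted(spans)

# A report at 8:25 with size 60 min / advance 20 min falls into 3 windows:
# [7:40, 8:40), [8:00, 9:00), [8:20, 9:20)
spans = windows_for(8 * 60 + 25, size=60, advance=20)
```

Each tuple falls into size / advance windows (here 60 / 20 = 3), which also bounds how long it must be kept in the operator's state.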
Time-based sliding window aggregation
(count)
17
Counter: 4
time
[8:00,9:00)
8:05 8:15 8:22 8:45 9:05
Output: 4
Counter: 1
Counter: 2
Counter: 3
Counter: 3
time
8:05 8:15 8:22 8:45 9:05
[8:20,9:20)
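The count aggregation above can be sketched by assigning each tuple to its windows (plain Python, timestamps again as minutes since 0:00 for illustration):

```python
from collections import Counter

def sliding_count(timestamps, size, advance):
    """Count tuples per sliding window [start, start + size): a tuple with
    timestamp ts belongs to every window whose start is a multiple of the
    advance with ts - size < start <= ts."""
    counts = Counter()
    for ts in timestamps:
        start = ts - (ts % advance)  # latest window start not exceeding ts
        while start > ts - size and start >= 0:
            counts[start] += 1
            start -= advance
    return dict(counts)

# Tuples at 8:05, 8:15, 8:22, 8:45, 9:05, with size 60 min / advance 20 min:
counts = sliding_count([485, 495, 502, 525, 545], size=60, advance=20)
```

Window [8:00, 9:00) counts 4 tuples, matching the slide's output; the later windows keep counting until they close.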
Time-based sliding window joining
18
t1
t2
t3
t4
t1
t2
t3
t4
R S
Sliding windows of sizes WR and WS over streams R and S; join predicate P
Windows and stateful analysis
19
Basic operators and user-defined operators
20
Besides a set of basic operators, SPEs usually
allow the user to define ad-hoc operators
(e.g., when the existing aggregations are not enough)
SPEs and operators' variants
• Each SPE might define its own variants
of certain streaming operators:
21
t1
t2
t3
t4
t1
t2
t3
t4
R S
Sliding windows of sizes WR and WS over streams R and S; join predicate P
Sample Query
"every five minutes, of all vehicles that braked significantly, find the
one that braked the hardest"
22
time
A 8:00 55.5 X1 Y1 ... B 8:07 34.3 X3 Y3 ...
B 8:03 70.3 X2 Y2 ...
Sample Query
Remove
unused fields
Map
Field
vehicle id
time (secs)
speed (Km/h)
X coordinate
Y coordinate
...
Field
vehicle id
time (secs)
speed (Km/h)
Every minute,
compute average
speed of each
vehicle during the
last 2.5 minutes
Aggregate
Field
vehicle id
time (secs)
avg speed (Km/h)
Join
High average
speed and slow
current speed?
Filter
Field
vehicle id
time (secs)
braking factor
Join on
vehicle id in
last minute
Field
vehicle id
time (secs)
avg speed (Km/h)
speed (Km/h)
Aggregate
Every 5 minutes,
produce vehicle that
braked the hardest
during last 5
minutes
23
Agenda
• Motivation, preliminaries and examples about
data streaming and Stream Processing Engines
• Causes of out-of-order data and solutions enforcing total-ordering
• Pros/Cons of total-ordering
• Relaxation of total-ordering and the watermarks
24
From an abstract query
… to streaming application run by an SPE
25
Source → OP1 → OP2 → OP3 → OP4 → OP5 → OP6 → Sink
The abstract query is mapped to operator instances (some of them, e.g., Source, OP2, OP3, OP5, replicated into parallel instances) running as processes distributed across nodes.
From an abstract query
… to streaming application run by an SPE
26
27
From an abstract query
…to streaming application run by Flink - 1/3
28
From an abstract query
…to streaming application run by Flink - 2/3
29
From an abstract query
…to streaming application run by Flink - 3/3
Causes of out-of-order data:
30
Sources
themselves
Asynchronous
Distributed
Parallel
executions
Data sources that produce out-of-order data
• Discussed in many related works (e.g., Babu, Shivnath, Utkarsh Srivastava, and JenniferWidom.
"Exploiting k-constraints to reduce memory overhead in continuous queries over data streams." ACM
Transactions on Database Systems (TODS) 29.3 (2004): 545-580.)
• Battery-operated devices, unreliable wireless networks, …
1 trillion sensors
by 2030
in the Internet of Things (IoT)
31
Causes of out-of-order data
32
Sources
themselves
Asynchronous
Distributed
Parallel
executions
The 3-step procedure
(sequential stream join)
33
For each incoming tuple t:
1. compare t with all tuples in opposite window given predicate P
2. add t to its window
3. remove stale tuples from t’s window
ProdR: adds tuples to R
ProdS: adds tuples to S
PU: processing unit
Cons: consumes results
We assume each producer
delivers tuples in timestamp
order
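The 3-step procedure can be sketched as follows (a plain-Python illustration, assuming the two inputs have already been merged into one timestamp-ordered sequence, as the slide assumes of the producers):

```python
def three_step_join(merged, ws, predicate):
    """Sequential 3-step stream join. `merged` is a list of
    (ts, stream_id, value) tuples, already merged in timestamp order,
    with stream_id "R" or "S"; `ws` is the time-based window size."""
    windows = {"R": [], "S": []}
    results = []
    for ts, sid, val in merged:
        other = "S" if sid == "R" else "R"
        # 1. compare t with all tuples in the opposite window, given predicate P
        for ots, oval in windows[other]:
            if predicate(val, oval):
                pair = (val, oval) if sid == "R" else (oval, val)
                results.append((ts,) + pair)
        # 2. add t to its own window
        windows[sid].append((ts, val))
        # 3. remove stale tuples (older than ws) from t's window
        windows[sid] = [(wts, wv) for wts, wv in windows[sid] if wts > ts - ws]
    return results

# Alternating R/S tuples in timestamp order, window size 2, match-all predicate:
out = three_step_join(
    [(1, "R", "a"), (2, "S", "b"), (3, "R", "c")],
    ws=2,
    predicate=lambda r, s: True,
)
```

Note how correctness hinges on the timestamp-ordered input: step 3 purges tuples relative to the current timestamp, which only works if timestamps never move backwards.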
The 3-step procedure, is it enough?
34
t1
t2
t1
t2
R S
WSWR
t3
t1
t2
t1
t2
R S
WSWR
t4
t3
Causes of out-of-order data
35
Asynchronous
Distributed
Parallel
executions
Any operator fed data from multiple logical / physical streams can potentially observe out-of-order data
Parallel execution
• General approach
36
OPA OPB
Parallel execution
• General approach
R: Router
M: Merger
OPA OPB
OPA RM
Thread 1
OPA RM
Thread m
…
OPA RM
Thread 1 Thread 2 Thread 3
OPA RM
Thread 1 Thread 2
OPA RM
Thread 1 Thread 2
…
37
Parallel execution
• General approach
R: Router
M: Merger
OPA OPB
OPA RM
Thread 1
OPA RM
Thread m
…
OPA RM
Thread 1 Thread 2 Thread 3
OPA RM
Thread 1 Thread 2
OPA RM
Thread 1 Thread 2
…
38
Parallel execution
• General approach
R: Router
M: Merger
OPA OPB
OPA RM
Thread 1
OPA RM
Thread m
…
OPB RM
Thread 1
OPB RM
Thread n
…
39
Parallel execution
• General approach
R: Router
M: Merger
OPA OPB
OPA RM
Thread 1
OPA RM
Thread m
…
OPB RM
Thread 1
OPB RM
Thread n
…
40
Parallel execution
• Stateful operators: Semantic awareness
• Aggregate: count within last hour, group-by vehicle id
41
Previous Subcluster
R
…
R
…
M Agg1
M Agg2
M Agg3
…
…
…
Vehicle A
Parallel execution
• Depending on the stateful operator semantics:
• Partition input stream into keys
• Each key is processed by 1 thread
• # keys >> # threads/nodes
42
Parallel execution
• Depending on the stateful operator semantics:
• Partition input stream into keys
• Each key is processed by 1 thread
• # keys >> # threads/nodes
43
Keys
domain
Agg1 Agg2 Agg3
A
D
E
B
C F
Parallel execution
• Depending on the stateful operator semantics:
• Partition input stream into keys
• Each key is processed by 1 thread
• # keys >> # threads/nodes
44
Keys
domain
Agg1 Agg2 Agg3
A
D
E
B
C F
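Key-based routing of this kind can be sketched with a deterministic hash (a minimal illustration; the hash function is an arbitrary choice here, not any SPE's actual partitioner):

```python
import zlib

def partition(key: str, num_instances: int) -> int:
    """Deterministically map a key to one parallel instance of the operator,
    so all tuples sharing a key (e.g., a vehicle id) reach the same state."""
    return zlib.crc32(key.encode("utf-8")) % num_instances

# Many keys, few instances (# keys >> # threads): each key sticks to one instance.
assignment = {k: partition(k, 3) for k in ["A", "B", "C", "D", "E", "F"]}
```

Because the mapping is a pure function of the key, no coordination is needed between instances, and per-key state never requires cross-thread synchronization.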
45
A 8:00 55.5 X1 Y1
A 8:07 34.3 X3 Y3
A 8:03 70.3 X2 Y2
Parallel execution
• Stateful operators: Semantic awareness
• Aggregate: count within last hour, group-by vehicle id
46
… R
… R
M Agg1
M Agg2
M Agg3
…
…
…
Map
Map
Vehicle A
Round-robin
(stateless)
Parallel execution
• Stateful operators: Semantic awareness
• Aggregate: count within last hour, group-by vehicle id
47
R…
R…
M Agg1
M Agg2
M Agg3
…
…
…
Vehicle A
Map
Map
Round-robin
(stateless)
A 8:00 55.5 X1 Y1
A 8:07 34.3 X3 Y3
A 8:03 70.3 X2 Y2
Inherent disorder
48
Map → Aggregate → Join → Filter → Aggregate → P (print-to-file operator)
Disorder from parallelism: with Map parallelized into Map0 … Map3, the downstream operators (Join, Aggregate, Filter, Aggregate, P) observe out-of-order data
how to merge several timestamp-sorted streams...
49
M
...
...into one timestamp-sorted stream?
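Merging several timestamp-sorted streams into one timestamp-sorted stream is a k-way merge-sort; as a sketch, Python's standard library provides exactly this:

```python
import heapq

# Three timestamp-sorted input streams of (ts, payload) tuples:
streams = [
    [(1, "a"), (4, "b"), (7, "c")],
    [(2, "x"), (3, "y")],
    [(5, "p"), (6, "q")],
]
# k-way merge on the timestamp field: O(n log k) for n tuples over k streams.
merged = list(heapq.merge(*streams, key=lambda t: t[0]))
```

On unbounded streams, the merger can only safely emit a tuple once every input has offered one, so its latency is bounded by the slowest stream — the cost discussed later in the tutorial.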
Gulisano, Vincenzo, et al. "Streamcloud: An elastic and scalable data streaming system." IEEE Transactions on Parallel and
Distributed Systems 23.12 (2012): 2351-2365. 50
Balazinska, Magdalena, et al. "Fault-tolerance in the Borealis distributed stream processing system." Proceedings of the
2005 ACM SIGMOD international conference on Management of data. 2005. 51
Gulisano, Vincenzo, et al. "Scalejoin: A deterministic, disjoint-parallel and skew-resilient stream join." IEEE Transactions on Big
Data (2016).
52
Which one to choose?
• Different options have different costs
• might require different types of data structures
• might require “special tuples” (more on this in the following part of
the tutorial)
• When the price of merge-sorting is paid by those who provide the tuples rather than by those who receive them, the system can scale better
53
56
Agenda
• Motivation, preliminaries and examples about
data streaming and Stream Processing Engines
• Causes of out-of-order data and solutions enforcing total-ordering
• Pros/Cons of total-ordering
• Relaxation of total-ordering and the watermarks
57
Pros/Cons of total-ordering
• Cons:
• Expensive (computation- and latency-wise)
• An “overkill” for certain applications (more on this in the following slides)
• Pros:
• Determinism
• Synchronization
• Eager purging of stale state
58
Cost
• We need to temporarily maintain tuples
• Linear in the number of tuples we receive, which depends on the streams’ rates
• We need to sort tuples… O(n log(n)) (n is the number of sources or tuples, depending on the case)
• We need data from all sources, so the processing latency depends on the slowest source
• The latency overhead can be estimated based on the sources’ rates1
1Gulisano, Vincenzo, et al. "Performance modeling of stream joins." Proceedings of the 11th ACM International Conference on
Distributed and Event-based Systems. 2017. 60
Estimating the latency overhead
1Gulisano, Vincenzo, et al. "Performance modeling of stream joins." Proceedings of the 11th ACM International
Conference on Distributed and Event-based Systems. 2017. 61
Pros/Cons of total-ordering
• Cons:
• Expensive (computation- and latency-wise)
• An “overkill” for certain applications (more on this in the following slides)
• Pros:
• Determinism
• Synchronization
• Eager purging of stale state
62
Determinism
63
Source → OP1 → OP2 → OP3 → OP4 → OP5 → OP6 → Sink (with parallel instances of Source, OP2, OP3, OP5)
Balazinska, Magdalena, et al. "Fault-tolerance in the Borealis distributed stream processing system." Proceedings of the 2005 ACM SIGMOD international
conference on Management of data. 2005.
Gulisano, Vincenzo. StreamCloud: an elastic parallel-distributed stream processing engine. Diss. 2012.
Determinism
64
Streams S1 and S2 are merged deterministically into the sequence t1, t2, t3, t4, t5, t6, t7, t8, t9, …
Tuple-based window, size: 4 / advance: 1
Hwang, Jeong-Hyon, Ugur Cetintemel, and Stan Zdonik. "Fast and reliable stream processing over wide area networks." 2007 IEEE 23rd International
Conference on Data Engineering Workshop. IEEE, 2007.
Gulisano, Vincenzo, Yiannis Nikolakopoulos, Marina Papatriantafilou, and Philippas Tsigas. "Scalejoin: A deterministic, disjoint-parallel and skew-resilient
stream join." IEEE Transactions on Big Data (2016).
Pros/Cons of total-ordering
• Cons:
• Expensive (computation- and latency-wise)
• An “overkill” for certain applications (more on this in the following slides)
• Pros:
• Determinism
• Synchronization
• Eager purging of stale state
65
Synchronization
(figure: the query deployed with operator Replicas — e.g., OP3, OP5 — and duplicated downstream paths OP4, OP5, OP6 to two Sinks; replicas must process tuples in the same order to produce identical results)
Balazinska, Magdalena, et al. "Fault-tolerance in the Borealis distributed stream processing system." Proceedings of the 2005 ACM SIGMOD international
conference on Management of data. 2005. 66
Synchronization
(figure: tuples t1–t4 being added step by step to the windows WR over streams R and S)
Gulisano, Vincenzo, Yiannis Nikolakopoulos, Marina Papatriantafilou, and Philippas Tsigas. "Scalejoin: A deterministic, disjoint-parallel and skew-resilient stream join."
IEEE Transactions on Big Data (2016).
67
Synchronization
68
(figure: streams R and S carrying tuples t1–t9 together with special marker tuples)
Marker tuples:
• Act as a barrier
• Carry the reconfiguration to be applied
• Change the parallelism degree
• Change other configuration
(on receiving a marker, each instance waits for it on all inputs, makes sure its peers are ready, and stops)
Najdataei, Hannaneh, et al. "Stretch: Scalable and elastic deterministic streaming analysis with virtual
shared-nothing parallelism." Proceedings of the 13th ACM International Conference on Distributed and Event-
based Systems. 2019.
69
Agenda
• Motivation, preliminaries and examples about
data streaming and Stream Processing Engines
• Causes of out-of-order data and solutions enforcing total-ordering
• Pros/Cons of total-ordering
• Relaxation of total-ordering and the watermarks
Total-Ordering Recap
70
Operator receives (out of order): ts: 7, ts: 18, ts: 16, ts: 35, ts: 3, ts: 24, ts: 25, …
Merge-Sort → ts: 3, ts: 7, ts: 16, ts: 18, ts: 24, ts: 25, ts: 35 (event-time axis 0–40)
✓ Correct window assignment
✓ Order in each window
Relaxing Correctness
71
Correctness
Performance
Flexibility
✓ In-order input streams
✓ Order in each window
✓ Correct window assignment
✗ In-order input streams
✓ Order in each window
✓ Correct window assignment
Buffering Approaches
72
Operator
Input: ts: 25, ts: 3, ts: 7, ts: 16, ts: 2
Buffer and Sort → ts: 3, ts: 7, ts: 16, ts: 25
Relaxing Correctness
73
Correctness
Performance
Flexibility
✓ In-order input streams
✓ Order in each window
✓ Correct window assignment
✗ In-order input streams
✓ Order in each window
✓ Correct window assignment
✗ In-order input streams
✗ Order in each window
✓ Correct window assignment
Disorder + Correct Window Assignment
1. When to create each window?
2. When to close each window (and produce result)?
74
0 10 20
ts: 7, ts: 16, ts: 18, ts: 3, ts: 21
Creating Windows
75
0 10 20
ts: 7, ts: 18, ts: 9
Closing Windows
76
0 10 20
ts: 7, ts: 18, ts: 9, ts: 21, ts: 17 — CLOSED
Operator needs some guarantee
it will not receive tuples with ts < W
Safely close all windows where
right_boundary ≤ W
Watermarks
77
Assume we can compute a monotonic function
F: O → E
that returns W ∈ E, the earliest event time of any tuple that can arrive at operator O in the future.
ts: 7, ts: 18, ts: 9, ts: 21, ts: 17 (event-time axis 0–20) — CLOSED
We call the value of this monotonic function F* the (low) watermark of operator O!
• Monotonicity ➔ no tuples with ts < W will arrive in the future.
• Solves the problem of safely closing windows!
* The watermark of operator O is a function of O and all its upstream peers, but we omit the latter for brevity.
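The closing rule above — safely close every window whose right boundary is at most W — is a one-liner to sketch (plain Python, windows represented as (start, end) spans):

```python
def close_windows(open_windows, watermark):
    """Split open windows, given as (start, end) spans, into those safe to
    close (right boundary <= watermark W) and those that must stay open."""
    closed = [w for w in open_windows if w[1] <= watermark]
    still_open = [w for w in open_windows if w[1] > watermark]
    return closed, still_open

# With watermark W = 20, windows ending at or before 20 can emit their results:
closed, still_open = close_windows([(0, 10), (10, 20), (20, 30)], watermark=20)
```

Monotonicity of the watermark guarantees a closed window never has to be reopened for a straggler.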
Watermarks in Practice
• Watermarks are generated at the sources.
• They (conceptually) flow through the pipeline.
• They propagate regardless of data filtering ➔ all operators have an up-to-date view of time
78
S → O1 → O2 → O3, with watermarks WS, WO1, WO2, WO3 and output watermarks WO1,OUT, WO2,OUT, WO3,OUT
Input Watermark: Earliest ts that O can receive.
Output Watermark: Earliest ts that O can emit.
Input & Output Timestamps
79
Stateless: tsOUT = tsIN
Stateful: tsOUT ≠ tsIN — one output tuple may combine several inputs (tsIN1, tsIN2, …), based on the operator’s state and event time
1. Which windows are complete?
2. How is the timestamp set for window results?
Computing Watermarks
80
Operator O receives input watermarks W1, W2, W3:
WO,IN = min(W1, W2, W3)
WO,OUT = WO,IN if O is stateless; otherwise g(stateO, semanticsO)
Windows whose right boundary ≤ WO,IN are CLOSED (event-time axis 0–50); e.g., window results with tsout1 = 40 and tsout2 = 20
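The input-watermark rule — the minimum across all input channels — is trivially expressed in code (a plain-Python sketch):

```python
def input_watermark(channel_watermarks):
    """An operator's input watermark is the minimum over its input channels:
    only event times that *all* channels have passed are guaranteed complete."""
    return min(channel_watermarks)

# Three input channels reporting watermarks 25, 40, and 30:
w_in = input_watermark([25, 40, 30])
```

Taking the minimum is also why a single slow or idle channel holds back the operator's whole view of event time.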
81
Flink Example
Map Aggregate Filter Aggregate
Join0
Join1
Join2
Join3
P
1. Result correctness from correct window assignment:
ts: 7, ts: 18, ts: 9 — a window is CLOSED once the watermark W passes its right boundary (event-time axis 0–20)
2. Watermark propagation:
S → O1 → O2 → O3, with watermarks WS, WO1, WO2, WO3 and output watermarks WO1,OUT, WO2,OUT, WO3,OUT; the print operator P observes the output watermark
Generating Watermarks
• Perfect Watermarks
• Sorted data or very predictable data sources.
• Determinism without sorting for order-independent window functions.
• Disorder inside windows is still possible!
• Sorting possible if needed, but not imposed.
• Heuristic Watermarks
• When impossible to perfectly predict data (e.g., distributed sources).
• Best-effort prediction of event-time progress.
• Possibility for late data.
• More knowledge about the internals of sources → fewer mispredictions (late data).
82
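A common heuristic of this kind is bounded out-of-orderness: assume tuples arrive at most some fixed delay after the highest timestamp seen so far. A minimal sketch (the delay of 5 time units is an assumed, illustrative knob):

```python
def bounded_watermarks(timestamps, max_delay):
    """Heuristic watermark generator (bounded out-of-orderness): assume no
    tuple arrives more than `max_delay` time units after the highest
    timestamp seen so far. The watermark is monotonic by construction;
    tuples with ts below the current watermark are late data."""
    max_ts = None
    for ts in timestamps:
        max_ts = ts if max_ts is None else max(max_ts, ts)
        yield ts, max_ts - max_delay

# ts 9 arrives after ts 18; the watermark stays monotone at 13:
observed = list(bounded_watermarks([7, 18, 9], max_delay=5))
```

Here the third tuple (ts: 9) arrives below the watermark (13), i.e., it is late data — exactly the misprediction a larger delay bound would trade extra latency to avoid.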
Fast and Slow Watermarks
83
Event Time →
Perfect Watermark: Window Complete (on time)
Slow Watermark: Window Complete (late) ✗ Performance (Latency)
Fast Watermark: Window Complete (early) ✗ Correctness
Window Lifetime →
Triggering
84
0h 12h 24h 36h
Latency of 24 hours!
Can we do better?
Emit (partial) result every 1h
Completeness Trigger
Emit result when window is complete
Repeated Update Trigger
Periodically emit (partial) results
(e.g., for every tuple, every hour, etc.)
Correctness
Performance
Flexibility
Repeated Update Trigger Results
85
Repeated Update Trigger ➔ Every 1h
0h 12h 24h
Discarding
0h 12h 24h
Accumulating
0h 12h 24h
Accumulating + Retracting
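The difference between accumulating and discarding firings can be sketched for a single count window (plain Python; the timestamps, 36-unit window, and 12-unit firing period are illustrative assumptions):

```python
def repeated_count(timestamps, window_end, period, accumulating=True):
    """Repeated update trigger for a count over one window [0, window_end):
    fire every `period` time units. Accumulating mode re-emits the running
    total; discarding mode emits only the delta since the previous firing."""
    results, total, emitted = [], 0, 0
    fire_at = period
    for ts in sorted(timestamps):
        while fire_at <= ts and fire_at <= window_end:
            results.append(total if accumulating else total - emitted)
            emitted = total
            fire_at += period
        if ts < window_end:
            total += 1
    results.append(total if accumulating else total - emitted)  # final result
    return results

acc = repeated_count([1, 2, 13, 14, 25], window_end=36, period=12, accumulating=True)
dis = repeated_count([1, 2, 13, 14, 25], window_end=36, period=12, accumulating=False)
```

Accumulating emits growing totals; discarding emits deltas that sum to the same final count — which mode is right depends on whether the downstream consumer overwrites or adds up the partial results.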
Putting It All Together
Event Time →
Perfect Watermark: Window Complete (on time)
Slow Watermark: Window Complete (late) ✗ Performance (Latency)
Fast Watermark: Window Complete (early) ✗ Correctness
Repeated Trigger before the watermark: early results ✓ Performance (Latency)
Watermark Trigger: on-time result
Repeated Trigger after the watermark: late results ✓ Correctness
Window Lifetime →
86
… the light at the end of the tunnel …
87
• Motivation, preliminaries and examples
about data streaming and Stream
Processing Engines
• Causes of out-of-order data and solutions
enforcing total-ordering
• Pros/Cons of total-ordering
• Relaxation of total-ordering and the
watermarks
To summarize: event time advances based on:
88

Tuples themselves
• Pros: determinism, for order-sensitive / order-insensitive functions
• Cons: costly merge-sorting; coupled processing / output of tuples

Watermarks
• Pros: decoupled processing / output of tuples; propagation of time passing even in the absence of tuples
• Cons: would require special support for order-sensitive functions; latency depends on frequency of watermarks
Tutorial: The Role of Event-Time Analysis Order in Data Streaming

  • 1.
    Tutorial: The Role ofEvent-Time Order in Data Streaming Analysis VincenzoGulisano Chalmers University ofTechnology Gothenburg, Sweden vincenzo.gulisano@chalmers.se Dimitris Palyvos-Giannas Chalmers University ofTechnology Gothenburg, Sweden palyvos@chalmers.se Bastian Havers Chalmers University ofTechnology &Volvo Cars Gothenburg, Sweden havers@chalmers.se Marina Papatriantafilou Chalmers University ofTechnology Gothenburg, Sweden ptrianta@chalmers.se
  • 2.
  • 3.
    Agenda • Motivation, preliminariesand examples about data streaming and Stream Processing Engines • Causes of out-of-order data and solutions enforcing total-ordering • Pros/Cons of total-ordering • Relaxation of total-ordering and the watermarks 3 https://github.com/ vincenzo- gulisano/debs2020_tutorial_event_time
  • 4.
    Agenda • Motivation, preliminariesand examples about data streaming and Stream Processing Engines • Causes of out-of-order data and solutions enforcing total-ordering • Pros/Cons of total-ordering • Relaxation of total-ordering and the watermarks 4
  • 5.
    Terabyte Scale 1012 bytes 1984 ZetabyteScale 109 Terabytes The era of Big Data 5
  • 6.
    Where does BigData Originate? 1 trillion sensors by 2030 in the Internet of Things (IoT) uploaded toYouTube every minute 2.32 billion Facebook users 219 billion photos uploaded to Facebook 1 online interaction every 18 seconds by 2025 6 500 hours of video
  • 7.
    Main Memory Database ManagementSystems (DBMSs) vs. Stream Processing Engines (SPEs) 7 Disk 1 Data Query Processing 3 Query results 2 Query Main Memory Query Processing Continuous QueryData Query results
  • 8.
  • 9.
    Flink-related images /code snippets in the following are taken from: https://flink.apache.org/ 9
  • 10.
    Data Stream: unbounded sequenceof tuples sharing the same schema Example: vehicles’ speed reports 10 time Field Field vehicle id text time (secs) text speed (Km/h) double X coordinate double Y coordinate double A 8:00 55.5 X1 Y1 Let’s assume each source (e.g., vehicle) produces and delivers a timestamp-sorted stream A 8:07 34.3 X3 Y3 A 8:03 70.3 X2 Y2
  • 11.
    Continuous query (orsimply query): Directed Acyclic Graph (DAG) of streams and operators OP OP OP OP OP OP OP source op (1+ out streams) sink op (1+ in streams) stream op (1+ in, 1+ out streams) 11
  • 12.
    Data Streaming Operators Twomain types: • Stateless operators • do not maintain any state • one-by-one processing • if they maintain some state, such state does not evolve depending on the tuples being processed • Stateful operators • maintain a state that evolves depending on the tuples being processed • produce output tuples that depend on multiple input tuples 12 OP OP
  • 13.
    Stateless Operators Filter /route tuples based on one (or more) conditions 13 Filter ...Map Transform each tuple
  • 14.
  • 15.
    Stateful Operators Aggregate informationfrom multiple tuples (e.g., max, min, sum, ...) 15 Join tuples coming from 2 streams given a certain predicate Aggregate Join
  • 16.
    Windows and StatefulAnalysis Stateful operations are done over windows: • Time-based (e.g., tuples in the last 10 minutes) • Tuple-based (e.g., given the last 50 tuples) 16 time [8:00,9:00) [8:20,9:20) [8:40,9:40) Example of time-based window of size 1 hour and advance 20 minutes  How many tuple in a window?  Which time period does a window span?
  • 17.
    Time-based sliding windowaggregation (count) 17 Counter: 4 time [8:00,9:00) 8:05 8:15 8:22 8:45 9:05 Output: 4 Counter: 1 Counter: 2 Counter: 3 Counter: 3 time 8:05 8:15 8:22 8:45 9:05 [8:20,9:20)
  • 18.
    Time-based sliding windowjoining 18 t1 t2 t3 t4 t1 t2 t3 t4 R S Sliding window Window sizeWS WSWR Predicate P
  • 19.
  • 20.
    Basic operators anduser-defined operators 20 Besides a set of basic operators, SPEs usually allow the user to define ad-hoc operators (e.g., when existing aggregation are not enough)
  • 21.
    SPEs and operators'variants • Each SPE might define its own variants of certain streaming operators: 21 t1 t2 t3 t4 t1 t2 t3 t4 R S Sliding window Window sizeWS WSWR Predicate P
  • 22.
    Sample Query "every fiveminutes, of all vehicles that braked significantly, find the one that braked the hardest" 22 time A 8:00 55.5 X1 Y1 ... B 8:07 34.3 X3 Y3 ... B 8:03 70.3 X2 Y2 ...
  • 23.
    Sample Query Remove unused fields Map Field vehicleid time (secs) speed (Km/h) X coordinate Y coordinate ... Field vehicle id time (secs) speed (Km/h) Every minute, compute average speed of each vehicle during the last 2.5 minutes Aggregate Field vehicle id time (secs) avg speed (Km/h) Join High average speed and slow current speed? Filter Field vehicle id time (secs) braking factor Join on vehicle id in last minute Field vehicle id time (secs) avg speed (Km/h) speed (Km/h) Aggregate Every 5 minutes, produce vehicle that braked the hardest during last 5 minutes 23
  • 24.
    Agenda • Motivation, preliminariesand examples about data streaming and Stream Processing Engines • Causes of out-of-order data and solutions enforcing total-ordering • Pros/Cons of total-ordering • Relaxation of total-ordering and the watermarks 24
  • 25.
    From an abstractquery … to streaming application run by an SPE 25 OP1 OP2 OP4 OP6OP3 OP5Source Sink OP1 OP2 OP4 OP6OP3 OP5Source Sink Source OP2 OP3 OP3 OP5
  • 26.
    OP1 OP2 OP4 OP6OP3 OP5 Source Sink Source OP2 OP3 OP3 OP5 NodeNode NodeNode Node Process Processes Process Process Process Process From an abstract query … to streaming application run by an SPE 26
  • 27.
    27 From an abstractquery …to streaming application run by Flink - 1/3
  • 28.
    28 From an abstractquery …to streaming application run by Flink - 1/3
  • 29.
    29 From an abstractquery …to streaming application run by Flink - 1/3
  • 30.
    Causes of out-of-orderdata: 30 Sources themselves Asynchronous Distributed Parallel executions
  • 31.
    Data sources thatproduce out-of-order data • Discussed in many related works (e.g., Babu, Shivnath, Utkarsh Srivastava, and JenniferWidom. "Exploiting k-constraints to reduce memory overhead in continuous queries over data streams." ACM Transactions on Database Systems (TODS) 29.3 (2004): 545-580.) • Battery-operated devices, unreliable wireless networks, … 1 trillion sensors by 2030 in the Internet of Things (IoT) 31
  • 32.
    Causes of out-of-orderdata 32 Sources themselves Asynchronous Distributed Parallel executions
  • 33.
    The 3-step procedure (sequentialstream join) 33 For each incoming tuple t: 1. compare t with all tuples in opposite window given predicate P 2. add t to its window 3. remove stale tuples from t’s window Add tuplesto S Add tuples to R Prod R Prod S Consume resultsConsPU We assume each producer delivers tuples in timestamp order
  • 34.
    The 3-step procedure,is it enough? 34 t1 t2 t1 t2 R S WSWR t3 t1 t2 t1 t2 R S WSWR t4 t3
  • 35.
    Causes of out-of-orderdata 35 Asynchronous Distributed Parallel executions Any operator fed data from multiple logical / physical stream can potentially observe out- of-order data
  • 36.
    Parallel execution • Generalapproach 36 OPA OPB
  • 37.
    Parallel execution • Generalapproach R: Router M: Merger OPA OPB OPA RM Thread 1 OPA RM Thread m … OPA RM Thread 1 Thread 2 Thread 3 OPA RM Thread 1 Thread 2 OPA RM Thread 1 Thread 2 … 37
  • 38.
    Parallel execution • Generalapproach R: Router M: Merger OPA OPB OPA RM Thread 1 OPA RM Thread m … OPA RM Thread 1 Thread 2 Thread 3 OPA RM Thread 1 Thread 2 OPA RM Thread 1 Thread 2 … 38
  • 39.
    Parallel execution • Generalapproach R: Router M: Merger OPA OPB OPA RM Thread 1 OPA RM Thread m … OPB RM Thread 1 OPB RM Thread n … 39
  • 40.
    Parallel execution • Generalapproach R: Router M: Merger OPA OPB OPA RM Thread 1 OPA RM Thread m … OPB RM Thread 1 OPB RM Thread n … 40
  • 41.
    Parallel execution • Statefuloperators: Semantic awareness • Aggregate: count within last hour, group-by vehicle id 41 Previous Subcluster R … R … M Agg1 M Agg2 M Agg3 … … … Vehicle A
  • 42.
    Parallel execution • Dependingon the stateful operator semantic: • Partition input stream into keys • Each key is processed by 1 thread • # keys >> # threads/nodes 42
  • 43.
    Parallel execution • Dependingon the stateful operator semantic: • Partition input stream into keys • Each key is processed by 1 thread • # keys >> # threads/nodes 43 Keys domain Agg1 Agg2 Agg3 A D E B C F
  • 44.
    Parallel execution • Dependingon the stateful operator semantics: • Partition input stream into keys • Each key is processed by 1 thread • # keys >> # threads/nodes 44 Keys domain Agg1 Agg2 Agg3 A D E B C F
  • 45.
  • 46.
    A 8:00 55.5X1 Y1 A 8:07 34.3 X3 Y3 A 8:03 70.3 X2 Y2 Parallel execution • Stateful operators: Semantic awareness • Aggregate: count within last hour, group-by vehicle id 46 … R … R M Agg1 M Agg2 M Agg3 … … … Map Map Vehicle A Round-robin (stateless)
  • 47.
    Parallel execution • Statefuloperators: Semantic awareness • Aggregate: count within last hour, group-by vehicle id 47 R… R… M Agg1 M Agg2 M Agg3 … … … Vehicle A Map Map Round-robin (stateless) A 8:00 55.5 X1 Y1 A 8:07 34.3 X3 Y3 A 8:03 70.3 X2 Y2
  • 48.
    Inherent disorder 48 Map AggregateJoin Filter Aggregate Print-to-file operator P Map0 Map1 Map2 Map3 P Aggregate Filter AggregateJoin Disorder from parallelism
  • 49.
    how to mergeseveral timestamp-sorted streams... 49 M ... ...into one timestamp-sorted stream?
  • 50.
    Gulisano, Vincenzo, etal. "Streamcloud: An elastic and scalable data streaming system." IEEE Transactions on Parallel and Distributed Systems 23.12 (2012): 2351-2365. 50
  • 51.
    Balazinska, Magdalena, etal. "Fault-tolerance in the Borealis distributed stream processing system." Proceedings of the 2005 ACM SIGMOD international conference on Management of data. 2005. 51
  • 52.
    Gulisano, Vincenzo, etal. "Scalejoin: A deterministic, disjoint-parallel and skew-resilient stream join." IEEE Transactions on Big Data (2016). 52
  • 53.
    Which one tochoose? • Different options have different costs • might require different types of data structures • might require “special tuples” (more on this in the following part of the tutorial) • When the price of merge-sorting is paid by who provides the tuples rather than who receives them, the system can scale better 53
  • 54.
    Tutorial: The Role ofEvent-Time Order in Data Streaming Analysis VincenzoGulisano Chalmers University ofTechnology Gothenburg, Sweden vincenzo.gulisano@chalmers.se Dimitris Palyvos-Giannas Chalmers University ofTechnology Gothenburg, Sweden palyvos@chalmers.se Bastian Havers Chalmers University ofTechnology &Volvo Cars Gothenburg, Sweden havers@chalmers.se Marina Papatriantafilou Chalmers University ofTechnology Gothenburg, Sweden ptrianta@chalmers.se
  • 55.
    Tutorial: The Role ofEvent-Time Order in Data Streaming Analysis VincenzoGulisano Chalmers University ofTechnology Gothenburg, Sweden vincenzo.gulisano@chalmers.se Dimitris Palyvos-Giannas Chalmers University ofTechnology Gothenburg, Sweden palyvos@chalmers.se Bastian Havers Chalmers University ofTechnology &Volvo Cars Gothenburg, Sweden havers@chalmers.se Marina Papatriantafilou Chalmers University ofTechnology Gothenburg, Sweden ptrianta@chalmers.se
  • 56.
Agenda
• Motivation, preliminaries and examples about data streaming and Stream Processing Engines
• Causes of out-of-order data and solutions enforcing total-ordering
• Pros/Cons of total-ordering
• Relaxation of total-ordering and the watermarks
  • 58.
Pros/Cons of total-ordering
• Cons:
  • Expensive (computation- and latency-wise)
  • An “overkill” for certain applications (more on this in the following slides)
• Pros:
  • Determinism
  • Synchronization
  • Eager purging of stale state
  • 60.
Cost
• We need to temporarily maintain tuples
  • Linear in the number of tuples we receive, which depends on the streams’ rates
• We need to sort tuples… O(n log n)
  • (n is the number of sources or tuples, depending on the case)
• We need data from all sources, so the processing latency depends on the slowest source
  • The latency overhead can be estimated based on the sources’ rates¹
¹ Gulisano, Vincenzo, et al. "Performance modeling of stream joins." Proceedings of the 11th ACM International Conference on Distributed and Event-based Systems. 2017.
  • 61.
Estimating the latency overhead¹
¹ Gulisano, Vincenzo, et al. "Performance modeling of stream joins." Proceedings of the 11th ACM International Conference on Distributed and Event-based Systems. 2017.
  • 63.
Determinism (figure: a query graph of operators OP1..OP6 between Source and Sink, shown in two deployments with replicated instances of OP2, OP3 and OP5)
Balazinska, Magdalena, et al. "Fault-tolerance in the Borealis distributed stream processing system." Proceedings of the 2005 ACM SIGMOD international conference on Management of data. 2005.
Gulisano, Vincenzo. StreamCloud: an elastic parallel-distributed stream processing engine. Diss. 2012.
  • 64.
Determinism (figure: two timestamp-sorted streams S1 and S2 with tuples t1..t9 merged into a single sequence; tuple-based window, size: 4 / advance: 1)
Hwang, Jeong-Hyon, Ugur Cetintemel, and Stan Zdonik. "Fast and reliable stream processing over wide area networks." 2007 IEEE 23rd International Conference on Data Engineering Workshop. IEEE, 2007.
Gulisano, Vincenzo, Yiannis Nikolakopoulos, Marina Papatriantafilou, and Philippas Tsigas. "ScaleJoin: A deterministic, disjoint-parallel and skew-resilient stream join." IEEE Transactions on Big Data (2016).
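A tuple-based window with size 4 / advance 1, as in the figure, can be sketched as follows. This is a hypothetical helper (`tuple_windows` is not code from the tutorial or from ScaleJoin); it shows why a fixed, merge-sorted input order makes window contents deterministic:

```python
from collections import deque

def tuple_windows(tuples, size=4, advance=1):
    """Yield the content of each tuple-based window: once `size` tuples
    are buffered, a window closes every `advance` tuples thereafter."""
    buf = deque(maxlen=size)  # the deque keeps only the `size` newest tuples
    for i, t in enumerate(tuples, start=1):
        buf.append(t)
        if i >= size and (i - size) % advance == 0:
            yield list(buf)

# With a fixed input order t1..t6 the windows are always the same:
windows = list(tuple_windows(["t1", "t2", "t3", "t4", "t5", "t6"]))
# → [t1..t4], [t2..t5], [t3..t6]
```

If two merges of S1 and S2 produced different interleavings, the same window operator would see different window contents, which is exactly the non-determinism that total ordering rules out.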
  • 66.
Synchronization (figure: a query graph OP1..OP6 from Source to Sink in which replicas of OP3 and OP5 feed replicated downstream operators and Sinks)
Balazinska, Magdalena, et al. "Fault-tolerance in the Borealis distributed stream processing system." Proceedings of the 2005 ACM SIGMOD international conference on Management of data. 2005.
  • 67.
Synchronization (figure: ScaleJoin example in which tuples t1..t4 from streams R and S are processed in a deterministic interleaving of reads and writes)
Gulisano, Vincenzo, Yiannis Nikolakopoulos, Marina Papatriantafilou, and Philippas Tsigas. "ScaleJoin: A deterministic, disjoint-parallel and skew-resilient stream join." IEEE Transactions on Big Data (2016).
  • 68.
Synchronization (figure: special tuples interleaved with data tuples t1..t9 in streams R and S; each instance waits for the special tuple, makes sure the pending tuples are ready, and stops)
Special tuples can:
• Act as a barrier
• Carry the reconfiguration to be applied
  • Change the parallelism degree
  • Change other configuration
Najdataei, Hannaneh, et al. "Stretch: Scalable and elastic deterministic streaming analysis with virtual shared-nothing parallelism." Proceedings of the 13th ACM International Conference on Distributed and Event-based Systems. 2019.
  • 69.
Agenda
• Motivation, preliminaries and examples about data streaming and Stream Processing Engines
• Causes of out-of-order data and solutions enforcing total-ordering
• Pros/Cons of total-ordering
• Relaxation of total-ordering and the watermarks
  • 70.
Total-Ordering Recap (figure: an operator merge-sorts input tuples with ts 7, 18, 16, 35, 3, 24, 25 into timestamp order before assigning them to event-time windows)
✓ Correct window assignment
✓ Order in each window
  • 71.
Relaxing Correctness (trading correctness against performance and flexibility)
• ✓ In-order input streams, ✓ Order in each window, ✓ Correct window assignment
• ✗ In-order input streams, ✓ Order in each window, ✓ Correct window assignment
  • 72.
Buffering Approaches (figure: an operator buffers incoming tuples with ts 25, 3, 7, 16, 2 and sorts the buffer, emitting tuples in ts order 3, 7, 16, 25)
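A minimal sketch of such a buffering approach, assuming a fixed-capacity min-heap buffer (the buffer size, helper name `buffered_sort` and tuples are illustrative, not from the slide):

```python
import heapq

def buffered_sort(stream, buffer_size=3):
    """Hold up to `buffer_size` tuples in a min-heap keyed on timestamp;
    once the buffer is full, emit the earliest. This repairs any tuple
    that is at most `buffer_size` positions out of place."""
    heap = []
    for ts, payload in stream:
        heapq.heappush(heap, (ts, payload))
        if len(heap) > buffer_size:
            yield heapq.heappop(heap)
    while heap:  # end of stream: drain the remaining buffered tuples
        yield heapq.heappop(heap)

out = list(buffered_sort([(3, "a"), (7, "b"), (16, "c"), (2, "d"), (25, "e")]))
print([t[0] for t in out])  # → [2, 3, 7, 16, 25]
```

The cost matches the earlier slide: memory linear in the buffer size, and extra latency because a tuple is held back until `buffer_size` later tuples have arrived. A tuple more than `buffer_size` positions late would still be emitted out of order (or would have to be dropped).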
  • 73.
Relaxing Correctness (trading correctness against performance and flexibility)
• ✓ In-order input streams, ✓ Order in each window, ✓ Correct window assignment
• ✗ In-order input streams, ✓ Order in each window, ✓ Correct window assignment
• ✗ In-order input streams, ✗ Order in each window, ✓ Correct window assignment
  • 74.
Disorder + Correct Window Assignment (figure: out-of-order tuples with ts 7, 16, 18, 3, 21 over event-time windows starting at 0, 10, 20)
1. When to create each window?
2. When to close each window (and produce its result)?
  • 75.
Creating Windows (figure: tuples with ts 7, 9, 18 create the event-time windows starting at 0, 10, 20 that they fall into)
  • 76.
Closing Windows (figure: with tuples ts 7, 9, 17, 18, 21, the window starting at 0 is CLOSED)
• The operator needs some guarantee W that it will not receive tuples with ts < W
• It can then safely close all windows where right_boundary ≤ W
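The closing rule can be written down directly. A sketch with hypothetical tumbling windows represented as `[start, end)` pairs (`closable_windows` and the concrete values are illustrative):

```python
def closable_windows(windows, watermark):
    """A window [start, end) can be safely closed once the operator is
    guaranteed to receive no tuple with ts < watermark, i.e. when its
    right boundary is <= the watermark."""
    return [w for w in windows if w[1] <= watermark]

# Tumbling event-time windows of size 10, with a guarantee W = 21:
wins = [(0, 10), (10, 20), (20, 30)]
print(closable_windows(wins, 21))  # → [(0, 10), (10, 20)]
```

The window (20, 30) stays open: a tuple with ts between 21 and 30 may still arrive.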
  • 77.
Watermarks (figure: tuples with ts 7, 9, 17, 18, 21; the window starting at 0 is CLOSED once W passes its right boundary)
Assume we can compute a monotonic function F: O → E that returns W ∈ E, the earliest event time of any tuple that can arrive at operator O in the future.
We call the value of this monotonic function F* the (low) watermark of operator O!
• Monotonicity ➔ no tuples with ts < W will arrive in the future
• Solves the problem of safely closing windows!
* The watermark of operator O is a function of O and all its upstream peers, but we omit the latter for brevity.
  • 78.
Watermarks in Practice (figure: source S with watermark WS feeding operators O1, O2, O3, each with an input watermark WO1, WO2, WO3 and an output watermark WO1,OUT, WO2,OUT, WO3,OUT)
• Watermarks are generated at the sources
• They (conceptually) flow through the pipeline
• They propagate regardless of data filtering ➔ all operators have an up-to-date view of time
Input Watermark: earliest ts that O can receive.
Output Watermark: earliest ts that O can emit.
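The point that watermarks propagate regardless of data filtering can be made concrete with a tiny sketch (a hypothetical filter operator, not an SPE API): even when every tuple in a batch is dropped, the watermark is still forwarded downstream.

```python
def filter_operator(batch, watermark, predicate):
    """Apply `predicate` to a batch of (ts, key) tuples. Data may be
    filtered out entirely, but the watermark is always forwarded, so
    downstream operators keep an up-to-date view of event time."""
    kept = [t for t in batch if predicate(t)]
    return kept, watermark

out, wm = filter_operator([(7, "A"), (9, "B")], watermark=10,
                          predicate=lambda t: t[1] == "C")
print(out, wm)  # → [] 10  (all tuples dropped, watermark still flows)
```

Without this rule, an operator downstream of a selective filter could starve: it would never learn that event time has advanced and could never close its windows.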
  • 79.
Input & Output Timestamps
• Stateless operator: tsOUT = tsIN
• Stateful operator: tsOUT ≠ tsIN (the result is produced from state that aggregates several input tuples, e.g. tsIN1, tsIN2)
1. Which windows are complete?
2. How is the timestamp set for window results?
  • 80.
Computing Watermarks (figure: operator O with upstream watermarks W1, W2, W3; on the event-time axis 0..50, WO,IN closes a window, and two candidate output watermarks give result timestamps tsOUT1 = 40 and tsOUT2 = 20)
• WO,IN = min(W1, W2, W3): the input watermark of O is the minimum over the output watermarks of its upstream peers
• WO,OUT = WO,IN if O is stateless; otherwise WO,OUT = g(stateO, semanticsO)
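The two rules above can be sketched in a few lines. The function names and the stateful case (capping the output watermark by the earliest pending window result) are an illustrative reading of g(stateO, semanticsO), not a general implementation:

```python
def input_watermark(upstream_output_watermarks):
    """W_O,IN = min over the output watermarks of all upstream peers:
    O cannot rule out tuples earlier than its slowest input channel."""
    return min(upstream_output_watermarks)

def output_watermark(w_in, pending_result_timestamps=()):
    """Stateless operator: W_OUT = W_IN. A stateful operator may still
    hold results (e.g. open windows) with timestamps earlier than W_IN,
    so W_OUT is capped by the earliest pending result timestamp."""
    return min([w_in, *pending_result_timestamps])

w_in = input_watermark([40, 25, 33])   # slowest upstream peer dominates
w_out = output_watermark(w_in, [20])   # pending window result at ts 20
print(w_in, w_out)  # → 25 20
```

Note how a single slow upstream peer (here, the one at 25) holds back the whole downstream pipeline, mirroring the "latency depends on the slowest source" cost of total ordering.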
  • 81.
Flink Example (figure: a pipeline Map → Filter → Aggregate → parallel Join0..Join3 → Aggregate → P)
1. Result correctness from correct window assignment (the window starting at 0 is CLOSED once the watermark W passes it)
2. Watermark propagation (source S and operators O1, O2, O3 each forward their output watermark downstream to P)
  • 82.
Generating Watermarks
• Perfect Watermarks
  • Sorted data or very predictable data sources
  • Determinism without sorting for order-independent window functions
  • Disorder inside windows is still possible!
  • Sorting possible if needed, but not imposed
• Heuristic Watermarks
  • When it is impossible to perfectly predict data (e.g., distributed sources)
  • Best-effort prediction of event-time progress
  • Possibility for late data
  • More knowledge about the internals of sources → fewer mispredictions (late data)
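A common heuristic is to assume tuples are at most some bound late, in the spirit of Flink's bounded-out-of-orderness watermark strategy. A minimal sketch (class name, bound and data are illustrative; the bound is an assumption, so tuples later than it become late data):

```python
class BoundedOutOfOrderness:
    """Heuristic watermark generator: assume tuples arrive at most
    `max_delay` event-time units after the largest timestamp seen."""
    def __init__(self, max_delay):
        self.max_delay = max_delay
        self.max_ts_seen = float("-inf")

    def on_tuple(self, ts):
        # Track the largest timestamp observed so far.
        self.max_ts_seen = max(self.max_ts_seen, ts)

    def current_watermark(self):
        # Monotonic by construction, since max_ts_seen never decreases.
        return self.max_ts_seen - self.max_delay

gen = BoundedOutOfOrderness(max_delay=5)
for ts in [10, 12, 8, 15]:  # ts 8 is out of order but within the bound
    gen.on_tuple(ts)
print(gen.current_watermark())  # → 10
```

A larger `max_delay` means fewer late tuples but higher latency before windows close; a smaller one is faster but riskier, which is exactly the fast/slow watermark trade-off on the next slide.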
  • 83.
Fast and Slow Watermarks (figure: a window's lifetime on the event-time axis, with three watermark arrivals)
• Perfect Watermark: window completes exactly at the end of its lifetime
• Slow Watermark: window completes late ✗ Performance (latency)
• Fast Watermark: window completes too early ✗ Correctness
  • 84.
Triggering (figure: a window spanning 0h–36h, with tuples at 0h, 12h, 24h and the result only at 36h: a latency of 24 hours! Can we do better?)
• Completeness Trigger: emit the result when the window is complete
• Repeated Update Trigger: periodically emit a (partial) result (e.g., for every tuple, every hour, etc.), for instance every 1h
Again a trade-off between correctness, performance and flexibility.
  • 85.
Repeated Update Trigger Results (figure: a Repeated Update Trigger firing every 1h over panes at 0h, 12h, 24h, shown in three modes)
• Discarding
• Accumulating
• Accumulating + Retracting
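The difference between the discarding and accumulating modes can be sketched for a simple count aggregate (the helper `repeated_update` and the pane values are hypothetical; retractions are omitted for brevity):

```python
def repeated_update(results_per_pane, mode="accumulating"):
    """Emit one (partial) result per trigger firing for the same window.
    'discarding': each firing reports only the new pane's contribution;
    'accumulating': each firing reports the running total so far."""
    total, out = 0, []
    for partial in results_per_pane:
        total += partial
        out.append(partial if mode == "discarding" else total)
    return out

panes = [3, 5, 2]                              # counts in three 1h panes
print(repeated_update(panes, "discarding"))    # → [3, 5, 2]
print(repeated_update(panes, "accumulating"))  # → [3, 8, 10]
```

Accumulating + retracting would additionally emit a retraction of the previous value (e.g., -3 before 8), so that downstream consumers summing the stream do not double-count.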
  • 86.
Putting It All Together (figure: the window's lifetime on the event-time axis)
• Slow Watermark: window completes after the perfect watermark ✗ Performance (latency)
• Fast Watermark: window completes before the perfect watermark ✗ Correctness
• Repeated Trigger, early results ✓ Performance (latency)
• Watermark Trigger: on-time result
• Repeated Trigger, late results ✓ Correctness
  • 87.
… the light at the end of the tunnel …
• Motivation, preliminaries and examples about data streaming and Stream Processing Engines
• Causes of out-of-order data and solutions enforcing total-ordering
• Pros/Cons of total-ordering
• Relaxation of total-ordering and the watermarks
  • 88.
To summarize: event time advances based on:
• Tuples themselves
  • Pros: determinism, for order-sensitive / order-insensitive functions
  • Cons: costly merge-sorting; coupled processing / output of tuples
• Watermarks
  • Pros: decoupled processing / output of tuples; propagation of time passing even in the absence of tuples
  • Cons: would require special support for order-sensitive functions; latency depends on the frequency of watermarks
  • 89.
Tutorial: The Role of Event-Time Order in Data Streaming Analysis
Vincenzo Gulisano, Chalmers University of Technology, Gothenburg, Sweden, vincenzo.gulisano@chalmers.se
Dimitris Palyvos-Giannas, Chalmers University of Technology, Gothenburg, Sweden, palyvos@chalmers.se
Bastian Havers, Chalmers University of Technology & Volvo Cars, Gothenburg, Sweden, havers@chalmers.se
Marina Papatriantafilou, Chalmers University of Technology, Gothenburg, Sweden, ptrianta@chalmers.se