Flink Forward Berlin 2018: Xingcan Cui - "Stream Join in Flink: from Discrete to Continuous"

Flink Forward Berlin 2018
Stream Join in Flink:
from Discrete to Continuous
Xingcan Cui
Shandong University, China

About Me
2017.02 – First Pull Request to Apache Flink
2018.05 – Flink Committer
(mainly on SQL/Table API)
2018.06 – Ph.D. in CS from Shandong University
(supervised by Prof. Xiaohui Yu)
2018.10 – Postdoc at York University

Outline
1. The Stream Join APIs
• Window Join in DataStream API
• Broadcast State in Low-Level API
• Joins in SQL/Table API
2. Details about the Time-Bounded Join
• Why Time-Bounded
• How to Perform
• What’s More
3. Future Work

Overview of the Stream Join APIs
Low-Level Process Function API
DataStream API
SQL/Table API
Simple Lateral Join
Materialized Table Join
Time-Bounded Join
Window Join/Window CoGroup
Broadcast State
Connected Stream

FlatJoinFunction
void join(
IN1 first,
IN2 second,
Collector<OUT> out);
JoinFunction
OUT join(
IN1 first,
IN2 second);
CoGroupFunction
void coGroup(
Iterable<IN1> first,
Iterable<IN2> second,
Collector<OUT> out);
4S1
S2
11
2 1
S1.join(S2)
.where(<shape1>).equalTo(<shape2>)
.window(<window>)
.apply {new JoinFunction()/FlatJoinFunction()};
S1.coGroup(S2)
.where(<shape1>).equalTo(<shape2>)
.window(<window>)
.apply {new CoGroupFunction()};
2
3
1
2shape =
3shape =
×
×
223
2 1
2 22 2
22
1

DataStream API
SQL/Table API
Broadcast State Pattern
Connected Stream

processElement(...)
processBroadcastElement(...)
KeyedBroadcastProcessFunction
void processElement(
IN1 value,
ReadOnlyContext ctx,
Collector<OUT> out)
void processBroadcastElement(
IN2 value,
Context ctx,
Collector<OUT> out)
Broadcast State
S1
S2
high-throughput
low-throughput
MapStateDescriptor state = new MapStateDescriptor();
BroadcastStream broadcast = S2.broadcast(state);
S1.keyBy(<key selector>);
.connect(broadcast)
.process(new KeyedBroadcastProcessFunction());
MapState
r/w
r
×
processElement(...)
processBroadcastElement(...)
12 12 12
15 4 3 2
12
13
14
15
24
25

DataStream API
SQL/Table API
Simple Lateral Join
Time-Bounded Join
Connected Stream

Simple Lateral Join
S 32
SELECT id, f2
FROM S, LATERAL TABLE(udtf(S.f1)) AS T(f2)
SELECT id, f2
FROM S, UNNEST(S1.f1) AS T(f2)
× ×
T
id f1
3 [a, b, c]
id f1
2 [d, e]
a b cd e

Table
SQL on Streams
15 4 3 2 15 4 3 2 𝑆
𝑡
𝑅
SQL
𝑅′
6

Stream to Table Conversion
• Upsert Stream
• Append Stream queue.append(value)
map.put(key, value)
Unique Key
Map
Queue
map.remove(key)
TIME
Time-Bounded Join
Materialized Stream Join
• Retract Stream list.add(value)List
list.remove(value)

S
T
SELECT S.id, T.name
FROM S, T
WHERE S.id = T.id
State
12
12
3
×
State TTL Config
streamQueryConfig
.withIdleStateRetentionTime(Time.hours(1), Time.hours(2));
11 21 31
12 22 33
1. You know the table size will never exceed the state capacity (have duplications or rows will be retracted in time).
2. The rows will be naturally expired after a period time.
3. You don’t really care about the accuracy of the join result (best-effort approach).

Time-Bounded Join
SELECT S.id, T.time
FROM S, T
WHERE S.id = T.id AND
T.time BETWEEN S.time - INTERVAL '4' HOUR AND S.time
S
T 12 12 12
14 3 2
3
1123
33

Summary of the Stream Join APIs
Name Description
Outer
Support
Non-
Equi
Conti-
nuous
API Level
Launched
Version
Window Join/
Window CoGroup
Cut two streams into chunks and join data in
the same chunk.
-/LRF N N DataStream 1.0
Connected Stream
Connect two streams and jointly process
them with a CoProcessFunction.
ProcessFunction 1.2
Simple Lateral Join
Join a stream with an unnested array or a
user-defined table function.
L Y Y SQL/Table 1.2
Time-Bounded Join
Join two append only streams with a
predicate that bounds the time on both sides.
LRF N Y SQL/Table 1.4
Retract Stream Join
Join two retract streams (with idle state
retention time config).
LRF N Y SQL/Table 1.5
Join a high-throughput stream with a low-
throughput stream.
LRF Y Y ProcessFunction 1.5

Append Stream Join
Append Stream
time id ...
3 A ...
8 S ...
... ... ...
Sequential Scan
Time-Ordered Table
=
Join two time-ordered table with only one sequential scan on both sides.

Stateful
Scanner
Make the Join Feasible
𝐿
𝑅 12
14 3 2
3
A sliding window that slides smoothly.
1112
22
23
33
24
34

The Predicates for Time-Bounded Join
2. A time-bound predicate to keep the qualified records near.
Time
𝑋
𝑌
𝑙 𝑟
SELECT 𝐿. 𝑖𝑑, 𝑅. 𝑡𝑖𝑚𝑒
FROM 𝐿, 𝑅
AND
𝑅. 𝑡𝑖𝑚𝑒 BETWEEN 𝐿. 𝑡𝑖𝑚𝑒 − 𝑋 AND 𝐿. 𝑡𝑖𝑚𝑒 + 𝑌
WHERE 𝐿. 𝑖𝑑 = 𝑅. 𝑖𝑑
𝐿. 𝑡𝑖𝑚𝑒 BETWEEN 𝑅. 𝑡𝑖𝑚𝑒 − 𝑌 AND 𝑅. 𝑡𝑖𝑚𝑒 + 𝑋
1. An equi-predicate to make the join operator scalable.
Right
Cache
𝑆𝑖𝑧𝑒 = 𝑋
Left
Cache
𝑆𝑖𝑧𝑒 = 𝑌
𝑋
𝑌

The Connected Stream
KeyedCoProcessFunction
void processElement1(
IN1 value,
Context ctx,
Collector<OUT> out)
void processElement2(
IN2 value,
Context ctx,
Collector<OUT> out)
void onTimer(
long timestamp,
OnTimerContext ctx,
Collector<OUT> out)
record’s time watermark
timer
cache
(state)
leftStream
.connect(rightStream)
.keyBy(<key selector for the equi-predicate>)
.process(new KeyedCoProcessFunction())
Map[Long, List[Row]]
time -> records of that time
left timer
right timer

The Logic for processElement()
update the watermark and
the record’s time
if the right
cache has
qualified rows
start
probe the right cache
to perform join
Yes
if the row
need to be
cached
put the row into the
left cache
No
Yes
end
No
if a right timer
need to be
registered
No
register a right timer
Yes
𝑙
watermark
time to clean

The Logic for onTimer()
update the watermark
if the left
cache still
contains rows
start
register another right
timer
calculate the left
expiration time
end
No
Yes
remove expired rows
from the left cache
watermark
left expiration time
time to clean
𝑙

About the Join Result
1. The result for a time-bounded join is still an append stream.
11122223332434
2. When converting the join result to a DataStream, at most one rowtime field can be reserved.
SELECT ltime, CAST(rtime AS TIMESTAMP), ... FROM ...
ltime lid ...
... ... ...
rtime rid ...
... ... ...
rowtime rowtime

Cache
Holding Back Watermarks
S
T 57
The Next
Operator
watermark
8 5
Subtract a fixed value from each watermark.
late!
3

The Integrity Problem
Table15 4 3 2 15 4 3 2 𝑆
𝑡
𝑅
SQL
𝑅4 𝑅5
𝑅4 could be affected by lateness, thus may not be identical with 𝑅5.
𝑙
watermark
expiration time
A. Drop 𝑙 as if it never comes (or use side output).
B. Keep 𝑙 to produce as many results as possible.
≠

P1. More Kinds of Stream Join
1. Time Versioned Join (WIP)
An efficient policy to clean the cache.
2. One-to-One Join (query hints?)

P2. Better State
1. Sorted MapState
mapState.getByRange(start, end);
mapState.removeByRange(start, end);
2. RocksDB Backend for Operator State

P3. Separated Watermarks
watermark
offset

Cache Size Optimization
𝑅. 𝑡𝑖𝑚𝑒 BETWEEN 𝐿. 𝑡𝑖𝑚𝑒 −(-30) MINUTE AND 𝐿. 𝑡𝑖𝑚𝑒 +50 MINUTE
𝑅. 𝑡𝑖𝑚𝑒 BETWEEN 𝐿. 𝑡𝑖𝑚𝑒 − (-10) MINUTE AND 𝐿. 𝑡𝑖𝑚𝑒 +30 MINUTE
𝑅. 𝑡𝑖𝑚𝑒 BETWEEN 𝐿. 𝑡𝑖𝑚𝑒 − (+ 5) MINUTE AND 𝐿. 𝑡𝑖𝑚𝑒 +15 MINUTE
𝑅. 𝑡𝑖𝑚𝑒 BETWEEN 𝐿. 𝑡𝑖𝑚𝑒 − (+ 0) MINUTE AND 𝐿. 𝑡𝑖𝑚𝑒 +20 MINUTE
𝑿
(𝑅𝑆𝑖𝑧𝑒)
𝒀
(𝐿𝑆𝑖𝑧𝑒)
The size of the two caches can be optimized to be the minimum value, 𝑋 + 𝑌.

P4. Query Optimization
2. Setting the Parallelism (hints/sampling inputs)
3. Adaptive Query Optimization
SQL
DataStream Pipeline
Runtime Tasks
1. More Stream Specific Rules

Q&A
https://www.youtube.com/watch?v=yDUfz3OFcOo https://baike.baidu.com/item/%E5%8E%8B%E9%9D%A2%E6%9C%BA/3477203

Flink Forward Berlin 2018: Xingcan Cui - "Stream Join in Flink: from Discrete to Continuous"

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to Flink Forward Berlin 2018: Xingcan Cui - "Stream Join in Flink: from Discrete to Continuous"

Similar to Flink Forward Berlin 2018: Xingcan Cui - "Stream Join in Flink: from Discrete to Continuous" (20)

More from Flink Forward

More from Flink Forward (20)

Recently uploaded

Recently uploaded (20)

Flink Forward Berlin 2018: Xingcan Cui - "Stream Join in Flink: from Discrete to Continuous"