SlideShare a Scribd company logo
Flink Forward Berlin 2018
Stream	Join	in	Flink:	
from Discrete	to	Continuous
Xingcan Cui
Shandong University, China
Flink Forward Berlin 2018
About Me
2017.02 – First Pull Request to Apache Flink
2018.05 – Flink Committer
(mainly on SQL/Table API)
2018.06 – Ph.D. in CS from Shandong University
(supervised by Prof. Xiaohui Yu)
2018.10 – Postdoc at York University
Flink Forward Berlin 2018
Outline
1. The Stream	Join APIs
• Window Join in DataStream API
• Broadcast State	in Low-Level API
• Joins in SQL/Table API
2. Details about the	Time-Bounded Join
• Why	Time-Bounded
• How to Perform
• What’s	More
3. Future Work
Flink Forward Berlin 2018
Overview of the Stream Join APIs
Low-Level Process Function API
DataStream API
SQL/Table API
Simple Lateral Join
Materialized Table Join
Time-Bounded Join
Window Join/Window CoGroup
Broadcast State
Connected Stream
Flink Forward Berlin 2018
FlatJoinFunction
void join(
IN1 first,
IN2 second,
Collector<OUT> out);
JoinFunction
OUT join(
IN1 first,
IN2 second);
CoGroupFunction
void coGroup(
Iterable<IN1> first,
Iterable<IN2> second,
Collector<OUT> out);
Window Join/Window CoGroup
4S1
S2
11
2 1
S1.join(S2)
.where(<shape1>).equalTo(<shape2>)
.window(<window>)
.apply {new JoinFunction()/FlatJoinFunction()};
S1.coGroup(S2)
.where(<shape1>).equalTo(<shape2>)
.window(<window>)
.apply {new CoGroupFunction()};
2
3
1
2shape =
3shape =
×
×
223
2 1
2 22 2
22
1
Flink Forward Berlin 2018
Overview of the Stream Join APIs
Low-Level Process Function API
DataStream API
SQL/Table API
Window Join/Window CoGroup
Broadcast State	Pattern
Connected Stream
Flink Forward Berlin 2018
processElement(...)
processBroadcastElement(...)
KeyedBroadcastProcessFunction
void processElement(
IN1 value,
ReadOnlyContext ctx,
Collector<OUT> out)
void processBroadcastElement(
IN2 value,
Context ctx,
Collector<OUT> out)
Broadcast State
S1
S2
high-throughput
low-throughput
MapStateDescriptor state = new MapStateDescriptor();
BroadcastStream broadcast = S2.broadcast(state);
S1.keyBy(<key selector>);
.connect(broadcast)
.process(new KeyedBroadcastProcessFunction());
MapState
r/w
r
×
processElement(...)
processBroadcastElement(...)
12 12 12
15 4 3 2
12
13
14
15
24
25
Flink Forward Berlin 2018
Overview of the Stream Join APIs
Low-Level Process Function API
DataStream API
SQL/Table API
Simple Lateral Join
Materialized Table Join
Time-Bounded Join
Window Join/Window CoGroup
Broadcast State	Pattern
Connected Stream
Flink Forward Berlin 2018
Simple Lateral Join
S 32
SELECT id, f2
FROM S, LATERAL TABLE(udtf(S.f1)) AS T(f2)
SELECT id, f2
FROM S, UNNEST(S1.f1) AS T(f2)
× ×
T
id f1
3 [a, b, c]
id f1
2 [d, e]
a b cd e
Flink Forward Berlin 2018
Table
SQL on Streams
15 4 3 2 15 4 3 2 𝑆
𝑡
𝑅
SQL
𝑅′
6
Flink Forward Berlin 2018
Stream to Table Conversion
• Upsert Stream
• Append Stream queue.append(value)
map.put(key,	value)
Unique Key
Map
Queue
map.remove(key)
TIME
Time-Bounded Join
Materialized Stream	Join
• Retract	Stream list.add(value)List
list.remove(value)
Flink Forward Berlin 2018
Materialized Table Join
S
T
SELECT S.id, T.name
FROM S, T
WHERE S.id = T.id
State
12
12
3
×
State TTL Config
streamQueryConfig
.withIdleStateRetentionTime(Time.hours(1), Time.hours(2));
11 21 31
12 22 33
1. You	know	the table	size	will	never	exceed	the	state	capacity	(have	duplications	or	rows	will	be	retracted	in	time).
2. The rows	will be naturally	expired after a period time.
3. You	don’t	really	care about the accuracy of the join result (best-effort	approach).
Flink Forward Berlin 2018
Time-Bounded Join
SELECT S.id, T.time
FROM S, T
WHERE S.id = T.id AND
T.time BETWEEN S.time - INTERVAL '4' HOUR AND S.time
S
T 12 12 12
14 3 2
3
1123
33
Flink Forward Berlin 2018
Summary of the Stream Join APIs
Name Description
Outer
Support
Non-
Equi
Conti-
nuous
API Level
Launched
Version
Window Join/
Window CoGroup
Cut two streams into chunks and join data in
the same chunk.
-/LRF N N DataStream 1.0
Connected Stream
Connect two streams and jointly process
them with a CoProcessFunction.
ProcessFunction 1.2
Simple Lateral Join
Join a stream with an unnested array or a
user-defined table function.
L Y Y SQL/Table 1.2
Time-Bounded Join
Join two append only streams with a
predicate that bounds the time on both sides.
LRF N Y SQL/Table 1.4
Retract	Stream	Join
Join two retract streams (with idle state
retention time config).
LRF N Y SQL/Table 1.5
Broadcast State Pattern
Join a high-throughput stream with a low-
throughput stream.
LRF Y Y ProcessFunction 1.5
Flink Forward Berlin 2018
Outline
1. The Stream	Join APIs
• Window Join in DataStream API
• Broadcast State in Low-Level API
• Joins in SQL/Table API
2. Details about the	Time-Bounded Join
• Why	Time-Bounded
• How to Perform
• What’s	More
3. Future Work
Flink Forward Berlin 2018
Append Stream Join
Append Stream
time id ...
3 A ...
8 S ...
... ... ...
Sequential Scan
Time-Ordered Table
=
Join two time-ordered table with only one sequential scan on both sides.
Flink Forward Berlin 2018
Stateful
Scanner
Make the Join Feasible
𝐿
𝑅 12
14 3 2
3
A sliding window that slides smoothly.
1112
22
23
33
24
34
Flink Forward Berlin 2018
The Predicates for Time-Bounded Join
2. A time-bound predicate to keep the qualified records near.
Time
𝑋
𝑌
𝑙 𝑟
SELECT 𝐿. 𝑖𝑑, 𝑅. 𝑡𝑖𝑚𝑒
FROM 𝐿, 𝑅
AND
𝑅. 𝑡𝑖𝑚𝑒 BETWEEN 𝐿. 𝑡𝑖𝑚𝑒 − 𝑋	AND 𝐿. 𝑡𝑖𝑚𝑒 + 𝑌
WHERE 𝐿. 𝑖𝑑	 = 	𝑅. 𝑖𝑑
𝐿. 𝑡𝑖𝑚𝑒 BETWEEN 𝑅. 𝑡𝑖𝑚𝑒 − 𝑌	AND 𝑅. 𝑡𝑖𝑚𝑒 + 𝑋
1. An equi-predicate to make the join operator scalable.
Right
Cache
𝑆𝑖𝑧𝑒 = 𝑋
Left
Cache
𝑆𝑖𝑧𝑒 = 𝑌
𝑋
𝑌
Flink Forward Berlin 2018
Outline
1. The Stream	Join APIs
• Window Join in DataStream API
• Broadcast State in Low-Level API
• Joins in SQL/Table API
2. Details about the	Time-Bounded Join
• Why	Time-Bounded
• How to Perform
• What’s	More
3. Future Work
Flink Forward Berlin 2018
The Connected Stream
KeyedCoProcessFunction
void processElement1(
IN1 value,
Context ctx,
Collector<OUT> out)
void processElement2(
IN2 value,
Context ctx,
Collector<OUT> out)
void onTimer(
long timestamp,
OnTimerContext ctx,
Collector<OUT> out)
record’s time watermark
timer
cache
(state)
leftStream
.connect(rightStream)
.keyBy(<key selector for the equi-predicate>)
.process(new KeyedCoProcessFunction())
Map[Long, List[Row]]
time -> records of that time
left timer
right timer
Flink Forward Berlin 2018
The Logic for processElement()
update the watermark and
the record’s time
if the right
cache has
qualified rows
start
probe the right cache
to perform join
Yes
if the row
need to be
cached
put the row into the
left cache
No
Yes
end
No
if a right timer
need to be
registered
No
register a right timer
Yes
𝑙
watermark
time to clean
Flink Forward Berlin 2018
The Logic for onTimer()
update the watermark
if the left
cache still
contains rows
start
register another right
timer
calculate the left
expiration time
end
No
Yes
remove expired rows
from the left cache
watermark
left expiration time
time to clean
𝑙
Flink Forward Berlin 2018
Outline
1. The Stream	Join APIs
• Window Join in DataStream API
• Broadcast State in Low-Level API
• Joins in SQL/Table API
2. Details about the	Time-Bounded Join
• Why	Time-Bounded
• How to Perform
• What’s	More
3. Future Work
Flink Forward Berlin 2018
About the Join Result
1. The result for a time-bounded join is still an	append stream.
11122223332434
2. When converting the join result to a DataStream, at most one rowtime field can be reserved.
SELECT ltime, CAST(rtime AS TIMESTAMP), ... FROM ...
ltime lid ...
... ... ...
rtime rid ...
... ... ...
rowtime rowtime
Flink Forward Berlin 2018
Cache
Holding Back Watermarks
S
T 57
The Next
Operator
watermark
8 5
Subtract a fixed value from each watermark.
late!
3
Flink Forward Berlin 2018
The Integrity Problem
Table15 4 3 2 15 4 3 2 𝑆
𝑡
𝑅
SQL
𝑅4 𝑅5
𝑅4 could be affected by lateness, thus may not be identical with 𝑅5.
𝑙
watermark
expiration time
A. Drop 𝑙 as if it never comes (or use side output).
B. Keep 𝑙 to produce as many results as possible.
≠
Flink Forward Berlin 2018
Outline
1. The Stream	Join APIs
• Window Join in DataStream API
• Broadcast State in Low-Level API
• Joins in SQL/Table API
2. Details about the	Time-Bounded Join
• Why	Time-Bounded
• How to Perform
• What’s	More
3. Future Work
Flink Forward Berlin 2018
P1. More Kinds of Stream Join
1. Time Versioned Join (WIP)
An efficient policy to clean the cache.
2. One-to-One Join (query hints?)
Flink Forward Berlin 2018
P2. Better State
1. Sorted MapState
mapState.getByRange(start, end);
mapState.removeByRange(start, end);
2. RocksDB Backend for Operator State
Flink Forward Berlin 2018
P3. Separated Watermarks
watermark
offset
Flink Forward Berlin 2018
Cache Size Optimization
𝑅. 𝑡𝑖𝑚𝑒 BETWEEN 𝐿. 𝑡𝑖𝑚𝑒 −(-30) MINUTE AND 𝐿. 𝑡𝑖𝑚𝑒 +50 MINUTE
𝑅. 𝑡𝑖𝑚𝑒 BETWEEN 𝐿. 𝑡𝑖𝑚𝑒 −	(-10) MINUTE AND 𝐿. 𝑡𝑖𝑚𝑒 +30 MINUTE
𝑅. 𝑡𝑖𝑚𝑒 BETWEEN 𝐿. 𝑡𝑖𝑚𝑒 −	(+ 5) MINUTE AND 𝐿. 𝑡𝑖𝑚𝑒 +15 MINUTE
𝑅. 𝑡𝑖𝑚𝑒 BETWEEN 𝐿. 𝑡𝑖𝑚𝑒 −	(+ 0) MINUTE AND 𝐿. 𝑡𝑖𝑚𝑒 +20 MINUTE
𝑿
(𝑅𝑆𝑖𝑧𝑒)
𝒀
(𝐿𝑆𝑖𝑧𝑒)
The size of the two caches can be optimized to be the minimum value, 𝑋 + 𝑌.
Flink Forward Berlin 2018
P4. Query Optimization
2. Setting the Parallelism (hints/sampling inputs)
3. Adaptive Query Optimization
SQL
DataStream Pipeline
Runtime Tasks
1. More Stream Specific Rules
Flink Forward Berlin 2018
Q&A
https://www.youtube.com/watch?v=yDUfz3OFcOo https://baike.baidu.com/item/%E5%8E%8B%E9%9D%A2%E6%9C%BA/3477203

More Related Content

What's hot

Flink Forward Berlin 2018: Timo Walther - "Flink SQL in Action"
Flink Forward Berlin 2018: Timo Walther - "Flink SQL in Action"Flink Forward Berlin 2018: Timo Walther - "Flink SQL in Action"
Flink Forward Berlin 2018: Timo Walther - "Flink SQL in Action"
Flink Forward
 
Flink Forward Berlin 2018: Viktor Klang - Keynote "The convergence of stream ...
Flink Forward Berlin 2018: Viktor Klang - Keynote "The convergence of stream ...Flink Forward Berlin 2018: Viktor Klang - Keynote "The convergence of stream ...
Flink Forward Berlin 2018: Viktor Klang - Keynote "The convergence of stream ...
Flink Forward
 
Flink Forward Berlin 2017: Mihail Vieru - A Materialization Engine for Data I...
Flink Forward Berlin 2017: Mihail Vieru - A Materialization Engine for Data I...Flink Forward Berlin 2017: Mihail Vieru - A Materialization Engine for Data I...
Flink Forward Berlin 2017: Mihail Vieru - A Materialization Engine for Data I...
Flink Forward
 
Flink Forward Berlin 2017: Till Rohrmann - From Apache Flink 1.3 to 1.4
Flink Forward Berlin 2017: Till Rohrmann - From Apache Flink 1.3 to 1.4Flink Forward Berlin 2017: Till Rohrmann - From Apache Flink 1.3 to 1.4
Flink Forward Berlin 2017: Till Rohrmann - From Apache Flink 1.3 to 1.4
Flink Forward
 
Virtual Flink Forward 2020: A deep dive into Flink SQL - Jark Wu
Virtual Flink Forward 2020: A deep dive into Flink SQL - Jark WuVirtual Flink Forward 2020: A deep dive into Flink SQL - Jark Wu
Virtual Flink Forward 2020: A deep dive into Flink SQL - Jark Wu
Flink Forward
 
What's new in 1.9.0 blink planner - Kurt Young, Alibaba
What's new in 1.9.0 blink planner - Kurt Young, AlibabaWhat's new in 1.9.0 blink planner - Kurt Young, Alibaba
What's new in 1.9.0 blink planner - Kurt Young, Alibaba
Flink Forward
 
Flink Forward Berlin 2018: Ravi Suhag & Sumanth Nakshatrithaya - "Managing Fl...
Flink Forward Berlin 2018: Ravi Suhag & Sumanth Nakshatrithaya - "Managing Fl...Flink Forward Berlin 2018: Ravi Suhag & Sumanth Nakshatrithaya - "Managing Fl...
Flink Forward Berlin 2018: Ravi Suhag & Sumanth Nakshatrithaya - "Managing Fl...
Flink Forward
 
Flink Forward Berlin 2017: Maciek Próchniak - TouK Nussknacker - creating Fli...
Flink Forward Berlin 2017: Maciek Próchniak - TouK Nussknacker - creating Fli...Flink Forward Berlin 2017: Maciek Próchniak - TouK Nussknacker - creating Fli...
Flink Forward Berlin 2017: Maciek Próchniak - TouK Nussknacker - creating Fli...
Flink Forward
 
Flink Forward San Francisco 2019: Massive Scale Data Processing at Netflix us...
Flink Forward San Francisco 2019: Massive Scale Data Processing at Netflix us...Flink Forward San Francisco 2019: Massive Scale Data Processing at Netflix us...
Flink Forward San Francisco 2019: Massive Scale Data Processing at Netflix us...
Flink Forward
 
Flink Forward Berlin 2018: Oleksandr Nitavskyi - "Data lossless event time st...
Flink Forward Berlin 2018: Oleksandr Nitavskyi - "Data lossless event time st...Flink Forward Berlin 2018: Oleksandr Nitavskyi - "Data lossless event time st...
Flink Forward Berlin 2018: Oleksandr Nitavskyi - "Data lossless event time st...
Flink Forward
 
Fabian Hueske_Till Rohrmann - Declarative stream processing with StreamSQL an...
Fabian Hueske_Till Rohrmann - Declarative stream processing with StreamSQL an...Fabian Hueske_Till Rohrmann - Declarative stream processing with StreamSQL an...
Fabian Hueske_Till Rohrmann - Declarative stream processing with StreamSQL an...
Flink Forward
 
Flink Forward San Francisco 2018: - Jinkui Shi and Radu Tudoran "Flink real-t...
Flink Forward San Francisco 2018: - Jinkui Shi and Radu Tudoran "Flink real-t...Flink Forward San Francisco 2018: - Jinkui Shi and Radu Tudoran "Flink real-t...
Flink Forward San Francisco 2018: - Jinkui Shi and Radu Tudoran "Flink real-t...
Flink Forward
 
Unify Enterprise Data Processing System Platform Level Integration of Flink a...
Unify Enterprise Data Processing System Platform Level Integration of Flink a...Unify Enterprise Data Processing System Platform Level Integration of Flink a...
Unify Enterprise Data Processing System Platform Level Integration of Flink a...
Flink Forward
 
Flink Forward San Francisco 2018: Stefan Richter - "How to build a modern str...
Flink Forward San Francisco 2018: Stefan Richter - "How to build a modern str...Flink Forward San Francisco 2018: Stefan Richter - "How to build a modern str...
Flink Forward San Francisco 2018: Stefan Richter - "How to build a modern str...
Flink Forward
 
Apache Flink Berlin Meetup May 2016
Apache Flink Berlin Meetup May 2016Apache Flink Berlin Meetup May 2016
Apache Flink Berlin Meetup May 2016
Stephan Ewen
 
Flink Forward San Francisco 2018: Andrew Torson - "Extending Flink metrics: R...
Flink Forward San Francisco 2018: Andrew Torson - "Extending Flink metrics: R...Flink Forward San Francisco 2018: Andrew Torson - "Extending Flink metrics: R...
Flink Forward San Francisco 2018: Andrew Torson - "Extending Flink metrics: R...
Flink Forward
 
Flink Forward Berlin 2018: Aljoscha Krettek & Till Rohrmann - Keynote: "A Yea...
Flink Forward Berlin 2018: Aljoscha Krettek & Till Rohrmann - Keynote: "A Yea...Flink Forward Berlin 2018: Aljoscha Krettek & Till Rohrmann - Keynote: "A Yea...
Flink Forward Berlin 2018: Aljoscha Krettek & Till Rohrmann - Keynote: "A Yea...
Flink Forward
 
Virtual Flink Forward 2020: Production-Ready Flink and Hive Integration - wha...
Virtual Flink Forward 2020: Production-Ready Flink and Hive Integration - wha...Virtual Flink Forward 2020: Production-Ready Flink and Hive Integration - wha...
Virtual Flink Forward 2020: Production-Ready Flink and Hive Integration - wha...
Flink Forward
 
data Artisans Product Announcement
data Artisans Product Announcementdata Artisans Product Announcement
data Artisans Product Announcement
Flink Forward
 
Flink Forward Berlin 2017: Hao Wu - Large Scale User Behavior Analytics by Flink
Flink Forward Berlin 2017: Hao Wu - Large Scale User Behavior Analytics by FlinkFlink Forward Berlin 2017: Hao Wu - Large Scale User Behavior Analytics by Flink
Flink Forward Berlin 2017: Hao Wu - Large Scale User Behavior Analytics by Flink
Flink Forward
 

What's hot (20)

Flink Forward Berlin 2018: Timo Walther - "Flink SQL in Action"
Flink Forward Berlin 2018: Timo Walther - "Flink SQL in Action"Flink Forward Berlin 2018: Timo Walther - "Flink SQL in Action"
Flink Forward Berlin 2018: Timo Walther - "Flink SQL in Action"
 
Flink Forward Berlin 2018: Viktor Klang - Keynote "The convergence of stream ...
Flink Forward Berlin 2018: Viktor Klang - Keynote "The convergence of stream ...Flink Forward Berlin 2018: Viktor Klang - Keynote "The convergence of stream ...
Flink Forward Berlin 2018: Viktor Klang - Keynote "The convergence of stream ...
 
Flink Forward Berlin 2017: Mihail Vieru - A Materialization Engine for Data I...
Flink Forward Berlin 2017: Mihail Vieru - A Materialization Engine for Data I...Flink Forward Berlin 2017: Mihail Vieru - A Materialization Engine for Data I...
Flink Forward Berlin 2017: Mihail Vieru - A Materialization Engine for Data I...
 
Flink Forward Berlin 2017: Till Rohrmann - From Apache Flink 1.3 to 1.4
Flink Forward Berlin 2017: Till Rohrmann - From Apache Flink 1.3 to 1.4Flink Forward Berlin 2017: Till Rohrmann - From Apache Flink 1.3 to 1.4
Flink Forward Berlin 2017: Till Rohrmann - From Apache Flink 1.3 to 1.4
 
Virtual Flink Forward 2020: A deep dive into Flink SQL - Jark Wu
Virtual Flink Forward 2020: A deep dive into Flink SQL - Jark WuVirtual Flink Forward 2020: A deep dive into Flink SQL - Jark Wu
Virtual Flink Forward 2020: A deep dive into Flink SQL - Jark Wu
 
What's new in 1.9.0 blink planner - Kurt Young, Alibaba
What's new in 1.9.0 blink planner - Kurt Young, AlibabaWhat's new in 1.9.0 blink planner - Kurt Young, Alibaba
What's new in 1.9.0 blink planner - Kurt Young, Alibaba
 
Flink Forward Berlin 2018: Ravi Suhag & Sumanth Nakshatrithaya - "Managing Fl...
Flink Forward Berlin 2018: Ravi Suhag & Sumanth Nakshatrithaya - "Managing Fl...Flink Forward Berlin 2018: Ravi Suhag & Sumanth Nakshatrithaya - "Managing Fl...
Flink Forward Berlin 2018: Ravi Suhag & Sumanth Nakshatrithaya - "Managing Fl...
 
Flink Forward Berlin 2017: Maciek Próchniak - TouK Nussknacker - creating Fli...
Flink Forward Berlin 2017: Maciek Próchniak - TouK Nussknacker - creating Fli...Flink Forward Berlin 2017: Maciek Próchniak - TouK Nussknacker - creating Fli...
Flink Forward Berlin 2017: Maciek Próchniak - TouK Nussknacker - creating Fli...
 
Flink Forward San Francisco 2019: Massive Scale Data Processing at Netflix us...
Flink Forward San Francisco 2019: Massive Scale Data Processing at Netflix us...Flink Forward San Francisco 2019: Massive Scale Data Processing at Netflix us...
Flink Forward San Francisco 2019: Massive Scale Data Processing at Netflix us...
 
Flink Forward Berlin 2018: Oleksandr Nitavskyi - "Data lossless event time st...
Flink Forward Berlin 2018: Oleksandr Nitavskyi - "Data lossless event time st...Flink Forward Berlin 2018: Oleksandr Nitavskyi - "Data lossless event time st...
Flink Forward Berlin 2018: Oleksandr Nitavskyi - "Data lossless event time st...
 
Fabian Hueske_Till Rohrmann - Declarative stream processing with StreamSQL an...
Fabian Hueske_Till Rohrmann - Declarative stream processing with StreamSQL an...Fabian Hueske_Till Rohrmann - Declarative stream processing with StreamSQL an...
Fabian Hueske_Till Rohrmann - Declarative stream processing with StreamSQL an...
 
Flink Forward San Francisco 2018: - Jinkui Shi and Radu Tudoran "Flink real-t...
Flink Forward San Francisco 2018: - Jinkui Shi and Radu Tudoran "Flink real-t...Flink Forward San Francisco 2018: - Jinkui Shi and Radu Tudoran "Flink real-t...
Flink Forward San Francisco 2018: - Jinkui Shi and Radu Tudoran "Flink real-t...
 
Unify Enterprise Data Processing System Platform Level Integration of Flink a...
Unify Enterprise Data Processing System Platform Level Integration of Flink a...Unify Enterprise Data Processing System Platform Level Integration of Flink a...
Unify Enterprise Data Processing System Platform Level Integration of Flink a...
 
Flink Forward San Francisco 2018: Stefan Richter - "How to build a modern str...
Flink Forward San Francisco 2018: Stefan Richter - "How to build a modern str...Flink Forward San Francisco 2018: Stefan Richter - "How to build a modern str...
Flink Forward San Francisco 2018: Stefan Richter - "How to build a modern str...
 
Apache Flink Berlin Meetup May 2016
Apache Flink Berlin Meetup May 2016Apache Flink Berlin Meetup May 2016
Apache Flink Berlin Meetup May 2016
 
Flink Forward San Francisco 2018: Andrew Torson - "Extending Flink metrics: R...
Flink Forward San Francisco 2018: Andrew Torson - "Extending Flink metrics: R...Flink Forward San Francisco 2018: Andrew Torson - "Extending Flink metrics: R...
Flink Forward San Francisco 2018: Andrew Torson - "Extending Flink metrics: R...
 
Flink Forward Berlin 2018: Aljoscha Krettek & Till Rohrmann - Keynote: "A Yea...
Flink Forward Berlin 2018: Aljoscha Krettek & Till Rohrmann - Keynote: "A Yea...Flink Forward Berlin 2018: Aljoscha Krettek & Till Rohrmann - Keynote: "A Yea...
Flink Forward Berlin 2018: Aljoscha Krettek & Till Rohrmann - Keynote: "A Yea...
 
Virtual Flink Forward 2020: Production-Ready Flink and Hive Integration - wha...
Virtual Flink Forward 2020: Production-Ready Flink and Hive Integration - wha...Virtual Flink Forward 2020: Production-Ready Flink and Hive Integration - wha...
Virtual Flink Forward 2020: Production-Ready Flink and Hive Integration - wha...
 
data Artisans Product Announcement
data Artisans Product Announcementdata Artisans Product Announcement
data Artisans Product Announcement
 
Flink Forward Berlin 2017: Hao Wu - Large Scale User Behavior Analytics by Flink
Flink Forward Berlin 2017: Hao Wu - Large Scale User Behavior Analytics by FlinkFlink Forward Berlin 2017: Hao Wu - Large Scale User Behavior Analytics by Flink
Flink Forward Berlin 2017: Hao Wu - Large Scale User Behavior Analytics by Flink
 

Similar to Flink Forward Berlin 2018: Xingcan Cui - "Stream Join in Flink: from Discrete to Continuous"

Advanced Stream Processing with Flink and Pulsar - Pulsar Summit NA 2021 Keynote
Advanced Stream Processing with Flink and Pulsar - Pulsar Summit NA 2021 KeynoteAdvanced Stream Processing with Flink and Pulsar - Pulsar Summit NA 2021 Keynote
Advanced Stream Processing with Flink and Pulsar - Pulsar Summit NA 2021 Keynote
StreamNative
 
Journey into Reactive Streams and Akka Streams
Journey into Reactive Streams and Akka StreamsJourney into Reactive Streams and Akka Streams
Journey into Reactive Streams and Akka Streams
Kevin Webber
 
Flink SQL: The Challenges to Build a Streaming SQL Engine
Flink SQL: The Challenges to Build a Streaming SQL EngineFlink SQL: The Challenges to Build a Streaming SQL Engine
Flink SQL: The Challenges to Build a Streaming SQL Engine
HostedbyConfluent
 
Flink SQL & TableAPI in Large Scale Production at Alibaba
Flink SQL & TableAPI in Large Scale Production at AlibabaFlink SQL & TableAPI in Large Scale Production at Alibaba
Flink SQL & TableAPI in Large Scale Production at Alibaba
DataWorks Summit
 
Taking Spark Streaming to the Next Level with Datasets and DataFrames
Taking Spark Streaming to the Next Level with Datasets and DataFramesTaking Spark Streaming to the Next Level with Datasets and DataFrames
Taking Spark Streaming to the Next Level with Datasets and DataFrames
Databricks
 
Tzu-Li (Gordon) Tai - Stateful Stream Processing with Apache Flink
Tzu-Li (Gordon) Tai - Stateful Stream Processing with Apache FlinkTzu-Li (Gordon) Tai - Stateful Stream Processing with Apache Flink
Tzu-Li (Gordon) Tai - Stateful Stream Processing with Apache Flink
Ververica
 
Make streaming processing towards ANSI SQL
Make streaming processing towards ANSI SQLMake streaming processing towards ANSI SQL
Make streaming processing towards ANSI SQL
DataWorks Summit
 
Flink Forward SF 2017: Timo Walther - Table & SQL API – unified APIs for bat...
Flink Forward SF 2017: Timo Walther -  Table & SQL API – unified APIs for bat...Flink Forward SF 2017: Timo Walther -  Table & SQL API – unified APIs for bat...
Flink Forward SF 2017: Timo Walther - Table & SQL API – unified APIs for bat...
Flink Forward
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Flink Forward
 
Flink at netflix paypal speaker series
Flink at netflix   paypal speaker seriesFlink at netflix   paypal speaker series
Flink at netflix paypal speaker series
Monal Daxini
 
Mutual Exclusion using Peterson's Algorithm
Mutual Exclusion using Peterson's AlgorithmMutual Exclusion using Peterson's Algorithm
Mutual Exclusion using Peterson's Algorithm
Souvik Roy
 
Flink Forward SF 2017: Konstantinos Kloudas - Extending Flink’s Streaming APIs
Flink Forward SF 2017: Konstantinos Kloudas -  Extending Flink’s Streaming APIsFlink Forward SF 2017: Konstantinos Kloudas -  Extending Flink’s Streaming APIs
Flink Forward SF 2017: Konstantinos Kloudas - Extending Flink’s Streaming APIs
Flink Forward
 
Flink Forward Berlin 2018: Paris Carbone - "Stream Loops on Flink: Reinventin...
Flink Forward Berlin 2018: Paris Carbone - "Stream Loops on Flink: Reinventin...Flink Forward Berlin 2018: Paris Carbone - "Stream Loops on Flink: Reinventin...
Flink Forward Berlin 2018: Paris Carbone - "Stream Loops on Flink: Reinventin...
Flink Forward
 
Pulsar connector on flink 1.14
Pulsar connector on flink 1.14Pulsar connector on flink 1.14
Pulsar connector on flink 1.14
宇帆 盛
 
The Stream Processor as a Database Apache Flink
The Stream Processor as a Database Apache FlinkThe Stream Processor as a Database Apache Flink
The Stream Processor as a Database Apache Flink
DataWorks Summit/Hadoop Summit
 
The Stream Processor as the Database - Apache Flink @ Berlin buzzwords
The Stream Processor as the Database - Apache Flink @ Berlin buzzwords   The Stream Processor as the Database - Apache Flink @ Berlin buzzwords
The Stream Processor as the Database - Apache Flink @ Berlin buzzwords
Stephan Ewen
 
Graduating Flink Streaming - Chicago meetup
Graduating Flink Streaming - Chicago meetupGraduating Flink Streaming - Chicago meetup
Graduating Flink Streaming - Chicago meetup
Márton Balassi
 
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
Flink Forward
 
Streaming SQL Foundations: Why I ❤ Streams+Tables
Streaming SQL Foundations: Why I ❤ Streams+TablesStreaming SQL Foundations: Why I ❤ Streams+Tables
Streaming SQL Foundations: Why I ❤ Streams+Tables
C4Media
 
From Batch to Streaming with Apache Apex Dataworks Summit 2017
From Batch to Streaming with Apache Apex Dataworks Summit 2017From Batch to Streaming with Apache Apex Dataworks Summit 2017
From Batch to Streaming with Apache Apex Dataworks Summit 2017
Apache Apex
 

Similar to Flink Forward Berlin 2018: Xingcan Cui - "Stream Join in Flink: from Discrete to Continuous" (20)

Advanced Stream Processing with Flink and Pulsar - Pulsar Summit NA 2021 Keynote
Advanced Stream Processing with Flink and Pulsar - Pulsar Summit NA 2021 KeynoteAdvanced Stream Processing with Flink and Pulsar - Pulsar Summit NA 2021 Keynote
Advanced Stream Processing with Flink and Pulsar - Pulsar Summit NA 2021 Keynote
 
Journey into Reactive Streams and Akka Streams
Journey into Reactive Streams and Akka StreamsJourney into Reactive Streams and Akka Streams
Journey into Reactive Streams and Akka Streams
 
Flink SQL: The Challenges to Build a Streaming SQL Engine
Flink SQL: The Challenges to Build a Streaming SQL EngineFlink SQL: The Challenges to Build a Streaming SQL Engine
Flink SQL: The Challenges to Build a Streaming SQL Engine
 
Flink SQL & TableAPI in Large Scale Production at Alibaba
Flink SQL & TableAPI in Large Scale Production at AlibabaFlink SQL & TableAPI in Large Scale Production at Alibaba
Flink SQL & TableAPI in Large Scale Production at Alibaba
 
Taking Spark Streaming to the Next Level with Datasets and DataFrames
Taking Spark Streaming to the Next Level with Datasets and DataFramesTaking Spark Streaming to the Next Level with Datasets and DataFrames
Taking Spark Streaming to the Next Level with Datasets and DataFrames
 
Tzu-Li (Gordon) Tai - Stateful Stream Processing with Apache Flink
Tzu-Li (Gordon) Tai - Stateful Stream Processing with Apache FlinkTzu-Li (Gordon) Tai - Stateful Stream Processing with Apache Flink
Tzu-Li (Gordon) Tai - Stateful Stream Processing with Apache Flink
 
Make streaming processing towards ANSI SQL
Make streaming processing towards ANSI SQLMake streaming processing towards ANSI SQL
Make streaming processing towards ANSI SQL
 
Flink Forward SF 2017: Timo Walther - Table & SQL API – unified APIs for bat...
Flink Forward SF 2017: Timo Walther -  Table & SQL API – unified APIs for bat...Flink Forward SF 2017: Timo Walther -  Table & SQL API – unified APIs for bat...
Flink Forward SF 2017: Timo Walther - Table & SQL API – unified APIs for bat...
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
 
Flink at netflix paypal speaker series
Flink at netflix   paypal speaker seriesFlink at netflix   paypal speaker series
Flink at netflix paypal speaker series
 
Mutual Exclusion using Peterson's Algorithm
Mutual Exclusion using Peterson's AlgorithmMutual Exclusion using Peterson's Algorithm
Mutual Exclusion using Peterson's Algorithm
 
Flink Forward SF 2017: Konstantinos Kloudas - Extending Flink’s Streaming APIs
Flink Forward SF 2017: Konstantinos Kloudas -  Extending Flink’s Streaming APIsFlink Forward SF 2017: Konstantinos Kloudas -  Extending Flink’s Streaming APIs
Flink Forward SF 2017: Konstantinos Kloudas - Extending Flink’s Streaming APIs
 
Flink Forward Berlin 2018: Paris Carbone - "Stream Loops on Flink: Reinventin...
Flink Forward Berlin 2018: Paris Carbone - "Stream Loops on Flink: Reinventin...Flink Forward Berlin 2018: Paris Carbone - "Stream Loops on Flink: Reinventin...
Flink Forward Berlin 2018: Paris Carbone - "Stream Loops on Flink: Reinventin...
 
Pulsar connector on flink 1.14
Pulsar connector on flink 1.14Pulsar connector on flink 1.14
Pulsar connector on flink 1.14
 
The Stream Processor as a Database Apache Flink
The Stream Processor as a Database Apache FlinkThe Stream Processor as a Database Apache Flink
The Stream Processor as a Database Apache Flink
 
The Stream Processor as the Database - Apache Flink @ Berlin buzzwords
The Stream Processor as the Database - Apache Flink @ Berlin buzzwords   The Stream Processor as the Database - Apache Flink @ Berlin buzzwords
The Stream Processor as the Database - Apache Flink @ Berlin buzzwords
 
Graduating Flink Streaming - Chicago meetup
Graduating Flink Streaming - Chicago meetupGraduating Flink Streaming - Chicago meetup
Graduating Flink Streaming - Chicago meetup
 
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
 
Streaming SQL Foundations: Why I ❤ Streams+Tables
Streaming SQL Foundations: Why I ❤ Streams+TablesStreaming SQL Foundations: Why I ❤ Streams+Tables
Streaming SQL Foundations: Why I ❤ Streams+Tables
 
From Batch to Streaming with Apache Apex Dataworks Summit 2017
From Batch to Streaming with Apache Apex Dataworks Summit 2017From Batch to Streaming with Apache Apex Dataworks Summit 2017
From Batch to Streaming with Apache Apex Dataworks Summit 2017
 

More from Flink Forward

Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...
Flink Forward
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in Flink
Flink Forward
 
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
Flink Forward
 
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Flink Forward
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes Operator
Flink Forward
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive Mode
Flink Forward
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Flink Forward
 
One sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkOne sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async Sink
Flink Forward
 
Tuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxTuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptx
Flink Forward
 
Flink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink powered stream processing platform at Pinterest
Flink powered stream processing platform at Pinterest
Flink Forward
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native Era
Flink Forward
 
Where is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkWhere is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in Flink
Flink Forward
 
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentUsing the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Flink Forward
 
The Current State of Table API in 2022
The Current State of Table API in 2022The Current State of Table API in 2022
The Current State of Table API in 2022
Flink Forward
 
Flink SQL on Pulsar made easy
Flink SQL on Pulsar made easyFlink SQL on Pulsar made easy
Flink SQL on Pulsar made easy
Flink Forward
 
Dynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsDynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data Alerts
Flink Forward
 
Processing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesProcessing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial Services
Flink Forward
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...
Flink Forward
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & Iceberg
Flink Forward
 
Welcome to the Flink Community!
Welcome to the Flink Community!Welcome to the Flink Community!
Welcome to the Flink Community!
Flink Forward
 

More from Flink Forward (20)

Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in Flink
 
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
 
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes Operator
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive Mode
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
 
One sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkOne sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async Sink
 
Tuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxTuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptx
 
Flink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink powered stream processing platform at Pinterest
Flink powered stream processing platform at Pinterest
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native Era
 
Where is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkWhere is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in Flink
 
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentUsing the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production Deployment
 
The Current State of Table API in 2022
The Current State of Table API in 2022The Current State of Table API in 2022
The Current State of Table API in 2022
 
Flink SQL on Pulsar made easy
Flink SQL on Pulsar made easyFlink SQL on Pulsar made easy
Flink SQL on Pulsar made easy
 
Dynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsDynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data Alerts
 
Processing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesProcessing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial Services
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & Iceberg
 
Welcome to the Flink Community!
Welcome to the Flink Community!Welcome to the Flink Community!
Welcome to the Flink Community!
 

Recently uploaded

The Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptxThe Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptx
operationspcvita
 
Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving
 
Essentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation ParametersEssentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation Parameters
Safe Software
 
Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |
AstuteBusiness
 
"What does it really mean for your system to be available, or how to define w...
"What does it really mean for your system to be available, or how to define w..."What does it really mean for your system to be available, or how to define w...
"What does it really mean for your system to be available, or how to define w...
Fwdays
 
"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota
Fwdays
 
Dandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity serverDandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity server
Antonios Katsarakis
 
Christine's Product Research Presentation.pptx
Christine's Product Research Presentation.pptxChristine's Product Research Presentation.pptx
Christine's Product Research Presentation.pptx
christinelarrosa
 
Y-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PPY-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PP
c5vrf27qcz
 
"NATO Hackathon Winner: AI-Powered Drug Search", Taras Kloba
"NATO Hackathon Winner: AI-Powered Drug Search",  Taras Kloba"NATO Hackathon Winner: AI-Powered Drug Search",  Taras Kloba
"NATO Hackathon Winner: AI-Powered Drug Search", Taras Kloba
Fwdays
 
From Natural Language to Structured Solr Queries using LLMs
From Natural Language to Structured Solr Queries using LLMsFrom Natural Language to Structured Solr Queries using LLMs
From Natural Language to Structured Solr Queries using LLMs
Sease
 
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptxPRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
christinelarrosa
 
Discover the Unseen: Tailored Recommendation of Unwatched Content
Discover the Unseen: Tailored Recommendation of Unwatched ContentDiscover the Unseen: Tailored Recommendation of Unwatched Content
Discover the Unseen: Tailored Recommendation of Unwatched Content
ScyllaDB
 
Getting the Most Out of ScyllaDB Monitoring: ShareChat's Tips
Getting the Most Out of ScyllaDB Monitoring: ShareChat's TipsGetting the Most Out of ScyllaDB Monitoring: ShareChat's Tips
Getting the Most Out of ScyllaDB Monitoring: ShareChat's Tips
ScyllaDB
 
Lee Barnes - Path to Becoming an Effective Test Automation Engineer.pdf
Lee Barnes - Path to Becoming an Effective Test Automation Engineer.pdfLee Barnes - Path to Becoming an Effective Test Automation Engineer.pdf
Lee Barnes - Path to Becoming an Effective Test Automation Engineer.pdf
leebarnesutopia
 
Poznań ACE event - 19.06.2024 Team 24 Wrapup slidedeck
Poznań ACE event - 19.06.2024 Team 24 Wrapup slidedeckPoznań ACE event - 19.06.2024 Team 24 Wrapup slidedeck
Poznań ACE event - 19.06.2024 Team 24 Wrapup slidedeck
FilipTomaszewski5
 
AWS Certified Solutions Architect Associate (SAA-C03)
AWS Certified Solutions Architect Associate (SAA-C03)AWS Certified Solutions Architect Associate (SAA-C03)
AWS Certified Solutions Architect Associate (SAA-C03)
HarpalGohil4
 
Day 2 - Intro to UiPath Studio Fundamentals
Day 2 - Intro to UiPath Studio FundamentalsDay 2 - Intro to UiPath Studio Fundamentals
Day 2 - Intro to UiPath Studio Fundamentals
UiPathCommunity
 
ScyllaDB Tablets: Rethinking Replication
ScyllaDB Tablets: Rethinking ReplicationScyllaDB Tablets: Rethinking Replication
ScyllaDB Tablets: Rethinking Replication
ScyllaDB
 
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectorsConnector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
DianaGray10
 

Recently uploaded (20)

The Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptxThe Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptx
 
Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024
 
Essentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation ParametersEssentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation Parameters
 
Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |
 
"What does it really mean for your system to be available, or how to define w...
"What does it really mean for your system to be available, or how to define w..."What does it really mean for your system to be available, or how to define w...
"What does it really mean for your system to be available, or how to define w...
 
"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota
 
Dandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity serverDandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity server
 
Christine's Product Research Presentation.pptx
Christine's Product Research Presentation.pptxChristine's Product Research Presentation.pptx
Christine's Product Research Presentation.pptx
 
Y-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PPY-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PP
 
"NATO Hackathon Winner: AI-Powered Drug Search", Taras Kloba
"NATO Hackathon Winner: AI-Powered Drug Search",  Taras Kloba"NATO Hackathon Winner: AI-Powered Drug Search",  Taras Kloba
"NATO Hackathon Winner: AI-Powered Drug Search", Taras Kloba
 
From Natural Language to Structured Solr Queries using LLMs
From Natural Language to Structured Solr Queries using LLMsFrom Natural Language to Structured Solr Queries using LLMs
From Natural Language to Structured Solr Queries using LLMs
 
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptxPRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
 
Discover the Unseen: Tailored Recommendation of Unwatched Content
Discover the Unseen: Tailored Recommendation of Unwatched ContentDiscover the Unseen: Tailored Recommendation of Unwatched Content
Discover the Unseen: Tailored Recommendation of Unwatched Content
 
Getting the Most Out of ScyllaDB Monitoring: ShareChat's Tips
Getting the Most Out of ScyllaDB Monitoring: ShareChat's TipsGetting the Most Out of ScyllaDB Monitoring: ShareChat's Tips
Getting the Most Out of ScyllaDB Monitoring: ShareChat's Tips
 
Lee Barnes - Path to Becoming an Effective Test Automation Engineer.pdf
Lee Barnes - Path to Becoming an Effective Test Automation Engineer.pdfLee Barnes - Path to Becoming an Effective Test Automation Engineer.pdf
Lee Barnes - Path to Becoming an Effective Test Automation Engineer.pdf
 
Poznań ACE event - 19.06.2024 Team 24 Wrapup slidedeck
Poznań ACE event - 19.06.2024 Team 24 Wrapup slidedeckPoznań ACE event - 19.06.2024 Team 24 Wrapup slidedeck
Poznań ACE event - 19.06.2024 Team 24 Wrapup slidedeck
 
AWS Certified Solutions Architect Associate (SAA-C03)
AWS Certified Solutions Architect Associate (SAA-C03)AWS Certified Solutions Architect Associate (SAA-C03)
AWS Certified Solutions Architect Associate (SAA-C03)
 
Day 2 - Intro to UiPath Studio Fundamentals
Day 2 - Intro to UiPath Studio FundamentalsDay 2 - Intro to UiPath Studio Fundamentals
Day 2 - Intro to UiPath Studio Fundamentals
 
ScyllaDB Tablets: Rethinking Replication
ScyllaDB Tablets: Rethinking ReplicationScyllaDB Tablets: Rethinking Replication
ScyllaDB Tablets: Rethinking Replication
 
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectorsConnector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
 

Flink Forward Berlin 2018: Xingcan Cui - "Stream Join in Flink: from Discrete to Continuous"

  • 1. Flink Forward Berlin 2018 Stream Join in Flink: from Discrete to Continuous Xingcan Cui Shandong University, China
  • 2. Flink Forward Berlin 2018 About Me 2017.02 – First Pull Request to Apache Flink 2018.05 – Flink Committer (mainly on SQL/Table API) 2018.06 – Ph.D. in CS from Shandong University (supervised by Prof. Xiaohui Yu) 2018.10 – Postdoc at York University
  • 3. Flink Forward Berlin 2018 Outline 1. The Stream Join APIs • Window Join in DataStream API • Broadcast State in Low-Level API • Joins in SQL/Table API 2. Details about the Time-Bounded Join • Why Time-Bounded • How to Perform • What’s More 3. Future Work
  • 4. Flink Forward Berlin 2018 Overview of the Stream Join APIs Low-Level Process Function API DataStream API SQL/Table API Simple Lateral Join Materialized Table Join Time-Bounded Join Window Join/Window CoGroup Broadcast State Connected Stream
  • 5. Flink Forward Berlin 2018 FlatJoinFunction void join( IN1 first, IN2 second, Collector<OUT> out); JoinFunction OUT join( IN1 first, IN2 second); CoGroupFunction void coGroup( Iterable<IN1> first, Iterable<IN2> second, Collector<OUT> out); Window Join/Window CoGroup 4S1 S2 11 2 1 S1.join(S2) .where(<shape1>).equalTo(<shape2>) .window(<window>) .apply {new JoinFunction()/FlatJoinFunction()}; S1.coGroup(S2) .where(<shape1>).equalTo(<shape2>) .window(<window>) .apply {new CoGroupFunction()}; 2 3 1 2shape = 3shape = × × 223 2 1 2 22 2 22 1
  • 6. Flink Forward Berlin 2018 Overview of the Stream Join APIs Low-Level Process Function API DataStream API SQL/Table API Window Join/Window CoGroup Broadcast State Pattern Connected Stream
  • 7. Flink Forward Berlin 2018 processElement(...) processBroadcastElement(...) KeyedBroadcastProcessFunction void processElement( IN1 value, ReadOnlyContext ctx, Collector<OUT> out) void processBroadcastElement( IN2 value, Context ctx, Collector<OUT> out) Broadcast State S1 S2 high-throughput low-throughput MapStateDescriptor state = new MapStateDescriptor(); BroadcastStream broadcast = S2.broadcast(state); S1.keyBy(<key selector>); .connect(broadcast) .process(new KeyedBroadcastProcessFunction()); MapState r/w r × processElement(...) processBroadcastElement(...) 12 12 12 15 4 3 2 12 13 14 15 24 25
  • 8. Flink Forward Berlin 2018 Overview of the Stream Join APIs Low-Level Process Function API DataStream API SQL/Table API Simple Lateral Join Materialized Table Join Time-Bounded Join Window Join/Window CoGroup Broadcast State Pattern Connected Stream
  • 9. Flink Forward Berlin 2018 Simple Lateral Join S 32 SELECT id, f2 FROM S, LATERAL TABLE(udtf(S.f1)) AS T(f2) SELECT id, f2 FROM S, UNNEST(S1.f1) AS T(f2) × × T id f1 3 [a, b, c] id f1 2 [d, e] a b cd e
  • 10. Flink Forward Berlin 2018 Table SQL on Streams 15 4 3 2 15 4 3 2 𝑆 𝑡 𝑅 SQL 𝑅′ 6
  • 11. Flink Forward Berlin 2018 Stream to Table Conversion • Upsert Stream • Append Stream queue.append(value) map.put(key, value) Unique Key Map Queue map.remove(key) TIME Time-Bounded Join Materialized Stream Join • Retract Stream list.add(value)List list.remove(value)
  • 12. Flink Forward Berlin 2018 Materialized Table Join S T SELECT S.id, T.name FROM S, T WHERE S.id = T.id State 12 12 3 × State TTL Config streamQueryConfig .withIdleStateRetentionTime(Time.hours(1), Time.hours(2)); 11 21 31 12 22 33 1. You know the table size will never exceed the state capacity (have duplications or rows will be retracted in time). 2. The rows will be naturally expired after a period time. 3. You don’t really care about the accuracy of the join result (best-effort approach).
  • 13. Flink Forward Berlin 2018 Time-Bounded Join SELECT S.id, T.time FROM S, T WHERE S.id = T.id AND T.time BETWEEN S.time - INTERVAL '4' HOUR AND S.time S T 12 12 12 14 3 2 3 1123 33
  • 14. Flink Forward Berlin 2018 Summary of the Stream Join APIs Name Description Outer Support Non- Equi Conti- nuous API Level Launched Version Window Join/ Window CoGroup Cut two streams into chunks and join data in the same chunk. -/LRF N N DataStream 1.0 Connected Stream Connect two streams and jointly process them with a CoProcessFunction. ProcessFunction 1.2 Simple Lateral Join Join a stream with an unnested array or a user-defined table function. L Y Y SQL/Table 1.2 Time-Bounded Join Join two append only streams with a predicate that bounds the time on both sides. LRF N Y SQL/Table 1.4 Retract Stream Join Join two retract streams (with idle state retention time config). LRF N Y SQL/Table 1.5 Broadcast State Pattern Join a high-throughput stream with a low- throughput stream. LRF Y Y ProcessFunction 1.5
  • 15. Flink Forward Berlin 2018 Outline 1. The Stream Join APIs • Window Join in DataStream API • Broadcast State in Low-Level API • Joins in SQL/Table API 2. Details about the Time-Bounded Join • Why Time-Bounded • How to Perform • What’s More 3. Future Work
  • 16. Flink Forward Berlin 2018 Append Stream Join Append Stream time id ... 3 A ... 8 S ... ... ... ... Sequential Scan Time-Ordered Table = Join two time-ordered table with only one sequential scan on both sides.
  • 17. Flink Forward Berlin 2018 Stateful Scanner Make the Join Feasible 𝐿 𝑅 12 14 3 2 3 A sliding window that slides smoothly. 1112 22 23 33 24 34
  • 18. Flink Forward Berlin 2018 The Predicates for Time-Bounded Join 2. A time-bound predicate to keep the qualified records near. Time 𝑋 𝑌 𝑙 𝑟 SELECT 𝐿. 𝑖𝑑, 𝑅. 𝑡𝑖𝑚𝑒 FROM 𝐿, 𝑅 AND 𝑅. 𝑡𝑖𝑚𝑒 BETWEEN 𝐿. 𝑡𝑖𝑚𝑒 − 𝑋 AND 𝐿. 𝑡𝑖𝑚𝑒 + 𝑌 WHERE 𝐿. 𝑖𝑑 = 𝑅. 𝑖𝑑 𝐿. 𝑡𝑖𝑚𝑒 BETWEEN 𝑅. 𝑡𝑖𝑚𝑒 − 𝑌 AND 𝑅. 𝑡𝑖𝑚𝑒 + 𝑋 1. An equi-predicate to make the join operator scalable. Right Cache 𝑆𝑖𝑧𝑒 = 𝑋 Left Cache 𝑆𝑖𝑧𝑒 = 𝑌 𝑋 𝑌
  • 19. Flink Forward Berlin 2018 Outline 1. The Stream Join APIs • Window Join in DataStream API • Broadcast State in Low-Level API • Joins in SQL/Table API 2. Details about the Time-Bounded Join • Why Time-Bounded • How to Perform • What’s More 3. Future Work
  • 20. Flink Forward Berlin 2018 The Connected Stream KeyedCoProcessFunction void processElement1( IN1 value, Context ctx, Collector<OUT> out) void processElement2( IN2 value, Context ctx, Collector<OUT> out) void onTimer( long timestamp, OnTimerContext ctx, Collector<OUT> out) record’s time watermark timer cache (state) leftStream .connect(rightStream) .keyBy(<key selector for the equi-predicate>) .process(new KeyedCoProcessFunction()) Map[Long, List[Row]] time -> records of that time left timer right timer
  • 21. Flink Forward Berlin 2018 The Logic for processElement() update the watermark and the record’s time if the right cache has qualified rows start probe the right cache to perform join Yes if the row need to be cached put the row into the left cache No Yes end No if a right timer need to be registered No register a right timer Yes 𝑙 watermark time to clean
  • 22. Flink Forward Berlin 2018 The Logic for onTimer() update the watermark if the left cache still contains rows start register another right timer calculate the left expiration time end No Yes remove expired rows from the left cache watermark left expiration time time to clean 𝑙
  • 23. Flink Forward Berlin 2018 Outline 1. The Stream Join APIs • Window Join in DataStream API • Broadcast State in Low-Level API • Joins in SQL/Table API 2. Details about the Time-Bounded Join • Why Time-Bounded • How to Perform • What’s More 3. Future Work
  • 24. Flink Forward Berlin 2018 About the Join Result 1. The result for a time-bounded join is still an append stream. 11122223332434 2. When converting the join result to a DataStream, at most one rowtime field can be reserved. SELECT ltime, CAST(rtime AS TIMESTAMP), ... FROM ... ltime lid ... ... ... ... rtime rid ... ... ... ... rowtime rowtime
  • 25. Flink Forward Berlin 2018 Cache Holding Back Watermarks S T 57 The Next Operator watermark 8 5 Subtract a fixed value from each watermark. late! 3
  • 26. Flink Forward Berlin 2018 The Integrity Problem Table15 4 3 2 15 4 3 2 𝑆 𝑡 𝑅 SQL 𝑅4 𝑅5 𝑅4 could be affected by lateness, thus may not be identical with 𝑅5. 𝑙 watermark expiration time A. Drop 𝑙 as if it never comes (or use side output). B. Keep 𝑙 to produce as many results as possible. ≠
  • 27. Flink Forward Berlin 2018 Outline 1. The Stream Join APIs • Window Join in DataStream API • Broadcast State in Low-Level API • Joins in SQL/Table API 2. Details about the Time-Bounded Join • Why Time-Bounded • How to Perform • What’s More 3. Future Work
  • 28. Flink Forward Berlin 2018 P1. More Kinds of Stream Join 1. Time Versioned Join (WIP) An efficient policy to clean the cache. 2. One-to-One Join (query hints?)
  • 29. Flink Forward Berlin 2018 P2. Better State 1. Sorted MapState mapState.getByRange(start, end); mapState.removeByRange(start, end); 2. RocksDB Backend for Operator State
  • 30. Flink Forward Berlin 2018 P3. Separated Watermarks watermark offset
  • 31. Flink Forward Berlin 2018 Cache Size Optimization 𝑅. 𝑡𝑖𝑚𝑒 BETWEEN 𝐿. 𝑡𝑖𝑚𝑒 −(-30) MINUTE AND 𝐿. 𝑡𝑖𝑚𝑒 +50 MINUTE 𝑅. 𝑡𝑖𝑚𝑒 BETWEEN 𝐿. 𝑡𝑖𝑚𝑒 − (-10) MINUTE AND 𝐿. 𝑡𝑖𝑚𝑒 +30 MINUTE 𝑅. 𝑡𝑖𝑚𝑒 BETWEEN 𝐿. 𝑡𝑖𝑚𝑒 − (+ 5) MINUTE AND 𝐿. 𝑡𝑖𝑚𝑒 +15 MINUTE 𝑅. 𝑡𝑖𝑚𝑒 BETWEEN 𝐿. 𝑡𝑖𝑚𝑒 − (+ 0) MINUTE AND 𝐿. 𝑡𝑖𝑚𝑒 +20 MINUTE 𝑿 (𝑅𝑆𝑖𝑧𝑒) 𝒀 (𝐿𝑆𝑖𝑧𝑒) The size of the two caches can be optimized to be the minimum value, 𝑋 + 𝑌.
  • 32. Flink Forward Berlin 2018 P4. Query Optimization 2. Setting the Parallelism (hints/sampling inputs) 3. Adaptive Query Optimization SQL DataStream Pipeline Runtime Tasks 1. More Stream Specific Rules
  • 33. Flink Forward Berlin 2018 Q&A https://www.youtube.com/watch?v=yDUfz3OFcOo https://baike.baidu.com/item/%E5%8E%8B%E9%9D%A2%E6%9C%BA/3477203