Flink SQL:
The Challenges to Build a Streaming SQL Engine
Jinsong (Jingsong) Lee
Staff Engineer at Alibaba
Apache Flink PMC member & Paimon Founder
About Me
• Staff Engineer at Alibaba
(Lake Storage Team Lead)
• PMC member of Apache Flink
(Committer of Apache Iceberg, Beam)
• Founder of Apache Paimon
(A new lake format focused on streaming)
CONTENT
• What is Flink SQL?
• Challenges of Flink SQL
• State and Storage
• Summary and Future Work
What is Flink SQL?
• Data Movement
• Data Warehouse
Main Scenarios of Flink SQL
Data Movement
Data Integration
Connector + Calc + Lookup Join
Data Warehouse
Unbounded Aggregate, Join
PV, UV …
Event Driven
Risk Control, Monitoring Alarms
Window, Interval Join, UDF, CEP
Powerful Connector Ecosystem
TiDB
ApsaraDB MySQL
ClickHouse
Iceberg
Hudi
Paimon
Streaming & Batch
Calc & UDF & Lookup Join
Flink SQL & Flink CDC
Data Movement: more and more companies are building unified streaming-and-batch
data integration platforms on Flink SQL, running 100,000+ jobs.
How Does Flink SQL Work?
SQL
Table API
Logical Plan → Physical Plan → Transformations → JobGraph
Configurable optimizer phases
Catalog
Hive
Metastore
Code Generation
Optimizer
SubQuery Decorrelation
Filter/Project PushDown
Join Reorder
…
Code Optimizations: generated operators, JVM intrinsics, declarative expressions
State-of-the-art Operators: operate on binary data, cache-efficient sorter, compact binary hash map, hybrid hash join
Resource Optimizations: fully managed memory, IO Manager, off-heap memory
Flink Cluster
Submit Job
An Example
SELECT
t1.id, 1 + 2 + t1.value AS v
FROM t1, t2
WHERE
t1.id = t2.id AND
t2.id < 1000
Scan (t1) Scan (t2)
Join
Filter
Project
t1.id = t2.id
t2.id < 1000
t1.id,
1+2+t1.value
Logical Plan
SQL Query
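The optimizer applies classic rewrites to plans like this one. The following toy sketch (plain Python, not Flink's actual planner, with a hypothetical tuple-based plan representation) illustrates two of them for the example query: folding the constant expression `1 + 2` and pushing the single-table predicate `t2.id < 1000` below the join.

```python
# Toy illustration (not Flink code) of constant folding and filter pushdown.

def fold_constants(expr):
    """Recursively fold ('+', a, b) nodes whose operands are both literals."""
    if isinstance(expr, tuple) and expr[0] == "+":
        left, right = fold_constants(expr[1]), fold_constants(expr[2])
        if isinstance(left, int) and isinstance(right, int):
            return left + right
        return ("+", left, right)
    return expr

def push_down_filter(plan):
    """Move a single-table predicate from above the join onto its scan."""
    op, pred, child = plan                  # ("filter", pred, ("join", l, r))
    assert op == "filter" and child[0] == "join"
    _, left, right = child
    table = pred[1].split(".")[0]           # the one table the predicate touches
    if table == right[1]:
        right = ("filter", pred, right)
    else:
        left = ("filter", pred, left)
    return ("join", left, right)

# 1 + 2 + t1.value folds to 3 + t1.value
folded = fold_constants(("+", ("+", 1, 2), "t1.value"))

# Filter(t2.id < 1000) over Join(Scan t1, Scan t2) becomes
# Join(Scan t1, Filter(Scan t2)): t2 is filtered before the join.
plan = ("filter", ("<", "t2.id", 1000),
        ("join", ("scan", "t1"), ("scan", "t2")))
optimized = push_down_filter(plan)
```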
Changelog Mechanism
Source table logs (word): Hello, World, Hello

SELECT
word,
COUNT(*) as cnt
FROM logs
GROUP BY word

word_count (word, cnt) evolves: (Hello, 1) → (World, 1) → (Hello, 2)

SELECT
cnt,
COUNT(cnt) as freq
FROM word_count
GROUP BY cnt

Result (cnt, freq) evolves: (1, 1) → (1, 2) → (1, 1), (2, 1)
with changelog
① Changelog makes the streaming query result correct
② The query optimizer determines whether update_before is needed
③ Users are not aware of it
Hello, 1
insert
World, 1
insert
Hello, 1
update_before
Hello, 2
update_after
Hello
insert
World
insert
Hello
insert
Source Word Count Count Frequency
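The two-level word count above can be sketched in plain Python (not Flink code): the first aggregate emits +I for a new key and -U/+U for an update, and the downstream count-frequency aggregate applies those changelog records so its result stays correct.

```python
# Minimal sketch of the changelog mechanism between two aggregates.
from collections import defaultdict

def word_count(words):
    """Yield changelog records (kind, word, cnt) for a stream of words."""
    counts = {}
    for w in words:
        if w in counts:
            yield ("-U", w, counts[w])      # retract the old count
            counts[w] += 1
            yield ("+U", w, counts[w])      # emit the new count
        else:
            counts[w] = 1
            yield ("+I", w, 1)              # first occurrence: plain insert

def count_frequency(changelog):
    """Consume the changelog; return {cnt: how many words have that cnt}."""
    freq = defaultdict(int)
    for kind, _word, cnt in changelog:
        if kind == "-U":
            freq[cnt] -= 1                  # undo the retracted row
            if freq[cnt] == 0:
                del freq[cnt]
        else:                               # +I / +U
            freq[cnt] += 1
    return dict(freq)

log = list(word_count(["Hello", "World", "Hello"]))
result = count_frequency(log)               # {1: 1, 2: 1}
```

Without the -U record, the downstream aggregate would still count Hello under cnt = 1 after it moved to cnt = 2.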
Changelog Makes CDC Processing Transparent
INSERT INTO dynamo_table SELECT
o.order_id, o.total, c.country, CONCAT(msg, '_SUFFIX') AS msg
FROM Orders_CDC AS o
JOIN Customers FOR SYSTEM_TIME AS OF o.proc_time AS c
ON o.customer_id = c.id;
Lookup Join
Orders
Customers
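The lookup join's semantics can be sketched as follows (a plain Python illustration with a hypothetical in-memory dimension table, not the Flink connector API): each incoming order probes the *current* contents of Customers at processing time, so no state is kept for the dimension side.

```python
# Sketch of FOR SYSTEM_TIME AS OF proc_time lookup-join semantics.

customers = {1: "US", 2: "DE"}              # dimension table: id -> country

def lookup_join(orders):
    """Enrich (order_id, customer_id, total) with the country seen right now."""
    out = []
    for order_id, customer_id, total in orders:
        country = customers.get(customer_id)    # point lookup at processing time
        out.append((order_id, total, country))
    return out

enriched = lookup_join([(100, 1, 9.5), (101, 3, 2.0)])
# customer 3 is unknown at lookup time -> None (an inner join would drop it)
```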
Challenges of Flink SQL
• Late Data: Unbounded Operators
• Retractions Amplification: Mini-Batch
• Event Ordering
• Nondeterminism
Late Data & Unbounded Operators
• No Watermark and Late Event
• Unlimited State: or a manually tuned State Time-To-Live
• Upsert Sink: outputs results early, relying on idempotence to achieve eventual consistency
SELECT SUM(num) FROM T GROUP BY color
(Source → Upsert Sink)
Real World:
GROUP BY color and day…
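The real-world variant makes the problem concrete: grouping by color *and* day mints fresh keys every day, so the keyed state grows without bound unless a state TTL is configured. A small illustrative sketch (plain Python, not Flink code):

```python
# Why GROUP BY color, day is an unbounded operator: every new day adds keys.
from collections import defaultdict

state = defaultdict(int)                    # keyed state: (color, day) -> sum

def on_record(color, day, num):
    """Update the running SUM for the key and emit the new result."""
    state[(color, day)] += num
    return (color, day, state[(color, day)])

for day in ("2024-01-01", "2024-01-02", "2024-01-03"):
    on_record("red", day, 1)
    on_record("blue", day, 2)

# Three days of two colors already mean six state entries, and the count
# keeps climbing day after day; a state TTL is the usual way to bound it.
```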
Retractions Amplification in Complex DAG
Scan (t1) Left Join Left Join
Scan (t2)
Aggregate
Scan (t3)
1 record
2 records: 1 -U, 1 +U
4 records: 2 -U, 2 +U
8 records: 4 -U, 4 +U
• Flink SQL Changelog Mechanism: +I -U +U -D
• Stateful operators produce -U and +U for each update
• Amplification in a complex DAG: 8X …
Mini-Batch to Reduce Amplification
• Use heap memory to hold a bundle
• In-memory aggregation before accessing state and serde operations
• Also eases the downstream load
• But Mini-Batch Join is still lacking
Mini-Batch aggregation:
table.exec.mini-batch.enabled = true
table.exec.mini-batch.allow-latency = "5000 ms"
table.exec.mini-batch.size = 1000
SELECT SUM(num) FROM T GROUP BY color
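What mini-batching buys can be shown with a small simulation (plain Python, not the Flink operator): instead of emitting -U/+U per input record, the aggregate pre-aggregates a buffered bundle in memory and emits at most one update per key per batch.

```python
# Sketch of SUM(num) GROUP BY color with and without mini-batching.
from collections import defaultdict

def aggregate(records, batch_size):
    """Return the changelog records emitted downstream."""
    sums, emitted, seen = defaultdict(int), [], set()
    for start in range(0, len(records), batch_size):
        bundle = defaultdict(int)
        for color, num in records[start:start + batch_size]:
            bundle[color] += num            # in-memory pre-aggregation
        for color, delta in bundle.items():
            if color in seen:
                emitted.append(("-U", color, sums[color]))
                sums[color] += delta
                emitted.append(("+U", color, sums[color]))
            else:
                seen.add(color)
                sums[color] = delta
                emitted.append(("+I", color, delta))
    return emitted

records = [("red", 1)] * 6
per_record = aggregate(records, batch_size=1)   # 11 changelog records
mini_batch = aggregate(records, batch_size=3)   # 3 changelog records
```

Both runs end at the same SUM of 6, but the mini-batched run sends far fewer retractions downstream, which is exactly how it damps the amplification from the previous slide.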
Event Ordering for CDC Sources
-- CDC source tables: s1 & s2
s1: id BIGINT, level BIGINT, PRIMARY KEY(id)
s2: id BIGINT, attr VARCHAR, PRIMARY KEY(id)
-- sink table: t1
t1: id BIGINT, level BIGINT, attr VARCHAR,
PRIMARY KEY(id)
-- join s1 and s2 and insert the result into t1
INSERT INTO t1
SELECT s1.*, s2.attr
FROM s1 JOIN s2 ON s1.level = s2.id
Data shuffles in distributed environments make the changelog out of order
Event Ordering: Solution
• Sink Upsert Materializer
• Rely on State (State TTL)
• Poor Performance
• Optimize:
• Just use RocksDB inside
checkpoint
• Sink Store supports
version fields
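The version-field idea from the last bullet can be sketched in a few lines (a plain Python illustration with a hypothetical `(id, version, value)` record shape, not a real sink implementation): the sink keeps, per primary key, only the row with the highest version, so a stale update that arrives late after the shuffle is simply ignored.

```python
# Sketch of a versioned upsert sink resolving out-of-order changelog rows.

def apply_upserts(changelog):
    """Apply (key, version, value) upserts; highest version wins per key."""
    table = {}
    for key, version, value in changelog:
        current = table.get(key)
        if current is None or version > current[0]:
            table[key] = (version, value)   # newer version wins
        # else: a stale row delivered late by the shuffle; drop it
    return {k: v for k, (_, v) in table.items()}

# The shuffle delivered version 2 before version 1.
out_of_order = [(7, 2, "level=5"), (7, 1, "level=4")]
naive = {k: v for k, _, v in out_of_order}      # last arrival wins: stale row
versioned = apply_upserts(out_of_order)         # highest version wins: correct
```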
Nondeterminism
Nondeterministic functions: CURRENT_TIMESTAMP, RANDOM
• What if the source is a CDC source?
• The retraction outputs different records!
(CDC SRC → Group by → Sum: the Sum is incorrect!)
How to Solve Nondeterminism
Streaming Deduplicate: deduplicate by state before the nondeterministic computation (CURRENT_TIMESTAMP, RANDOM), so the pipeline CDC SRC → Dedup → Group by → Sum stays consistent
Or deduplicate via the streaming lake, Paimon
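The failure mode and the fix can be simulated in a few lines (plain Python; the `nondeterministic()` function is a hypothetical stand-in for RANDOM, not a Flink API): a CDC delete must retract exactly what was emitted before, but re-evaluating the nondeterministic expression yields a different value and the SUM drifts; storing the emitted value in state and replaying it for the retraction keeps the result correct.

```python
# Sketch of nondeterminism breaking a SUM over CDC, and dedup-by-state fixing it.

values = iter([10, 99])                     # two calls, two different results
def nondeterministic():
    return next(values)                     # stand-in for RANDOM

def run(cdc_events, dedup_state=None):
    """SUM over f(x) for a +I followed by -D on the same key."""
    total = 0
    for kind, key in cdc_events:
        if dedup_state is None:
            v = nondeterministic()          # recomputed on every event
        elif kind == "+I":
            v = dedup_state[key] = nondeterministic()
        else:
            v = dedup_state[key]            # replay the value we emitted before
        total += v if kind == "+I" else -v
    return total

events = [("+I", "k"), ("-D", "k")]
wrong = run(events)                          # retracts 99 for an emitted 10
values = iter([10, 99])                      # reset the fake RANDOM
right = run(events, dedup_state={})          # retracts the stored 10: sum is 0
```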
State & Storage
Local State
[Diagram: Task Managers (Compute) keep state on local disk, managed by a State Manager per Flink task; checkpoints periodically dump state files to DFS (main storage), incrementally and asynchronously.]
• Main in Local
• High Performance
• Small State 👍
• Big State ❎ needs State TTL
Disaggregated State: Flink 2.0
[Diagram: Task Managers keep state mainly in memory with an optional local-disk cache shared across tasks; state files are uploaded asynchronously to DFS (main storage), which holds the checkpoints CP1, CP2, CP3.]
• Main in DFS
• Big State 👍
• How to cut data?
• How to Rescale?
• How to Share?
Lake State (Apache Paimon)
[Diagram: Logs and RDBMS binlogs are ingested via Flink CDC into Paimon (formerly Flink Table Store) tables layered as ODS / DWD / DWS / ADS; Flink SQL reads and writes every layer in streaming & batch, and queries serve downstream data serving systems.]
• Latency: Minute Level
• Merge Engine: Deduplicate, Partial-Update, Aggregation, First Row
• No Data TTL, Performance Improvement, 10X
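The listed merge engines can be sketched as follows (plain Python, not Paimon internals), applied to rows sharing a primary key at read/compact time: "deduplicate" keeps the latest row wholesale, while "partial-update" overlays each row's non-NULL fields onto the previous result.

```python
# Sketch of Paimon-style merge-engine semantics for rows with one primary key.

def merge(rows, engine):
    """rows arrive oldest-first; each row is a dict of column -> value."""
    if engine == "deduplicate":
        return rows[-1]                     # latest row wins wholesale
    if engine == "partial-update":
        result = {}
        for row in rows:
            for col, val in row.items():
                if val is not None:         # NULL fields keep the old value
                    result[col] = val
        return result
    raise ValueError(engine)

rows = [{"id": 1, "name": "a", "addr": None},
        {"id": 1, "name": None, "addr": "HZ"}]
dedup = merge(rows, "deduplicate")          # the second row, as-is
partial = merge(rows, "partial-update")     # non-NULL fields of both combined
```

Partial-update is what lets two upstream streams each fill in their own columns of one wide table without a streaming join.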
Summary & Future Work
• Flink SQL: Data movement, Data Warehouse, Event Driven.
• The core concept of Flink SQL is CHANGELOG.
• The use case of Data Movement.
• 4 Challenges of Flink SQL
• Late Data: Unbounded Operators
• Retractions Amplification: Mini-Batch
• Event Ordering
• Nondeterminism
• State of Flink SQL: improvements and alternatives
Summary & Future Work
Thank You!
Questions?
