Flink's SQL Engine:
Let's Open the Engine Room!
Timo Walther, Principal Software Engineer
2024-03-19
About me
Open Source
• Long-term committer since 2014 (before ASF)
• Member of the project management committee (PMC)
• Top 5 contributor by commits, #1 contributor by additions
• Among core architects of Flink SQL
Career
• Early Software Engineer @ DataArtisans (acquired by Alibaba)
• SDK Team, SQL Team Lead @ Ververica
• Co-Founder @ Immerok (acquired by Confluent)
• Principal Software Engineer @ Confluent
What is Apache Flink®?
Building Blocks for Stream Processing

Streams
• Pipeline
• Distribute
• Join
• Enrich
• Control
• Replay

State
• Store
• Buffer
• Cache
• Model
• Grow
• Expire

Time
• Synchronize
• Progress
• Wait
• Timeout
• Fast-forward
• Replay

Snapshots
• Backup
• Version
• Fork
• A/B test
• Time-travel
• Restore
Flink SQL
Flink SQL in a Nutshell
Properties
• Abstract the building blocks for stream processing
• Operator topology is determined by planner and optimizer
• Business logic is declared in ANSI SQL
• Internally, the engine works on binary data
• Conceptually a table, but a changelog under the hood!
SELECT 'Hello World';
SELECT * FROM (VALUES (1), (2), (3));
SELECT * FROM MyTable;
SELECT * FROM Orders o JOIN Payments p ON o.id = p.order;
How do I Work with Streams in Flink SQL?
• You don’t. You work with dynamic tables!
• A concept similar to materialized views
CREATE TABLE Transactions
(name STRING, amount INT)
WITH (…);

CREATE TABLE Revenue
(name STRING, total INT)
WITH (…);

INSERT INTO Revenue
SELECT name, SUM(amount)
FROM Transactions
GROUP BY name;

Transactions:
name  | amount
------+-------
Alice | 56
Bob   | 10
Alice | 89

Revenue:
name  | total
------+------
Alice | 145
Bob   | 10
So, is Flink SQL a database? No, bring your own data!
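The WITH (…) clauses above are elided on the slides; they bind a table to an external system. A minimal sketch of what such a clause could look like for a Kafka-backed table (topic name and broker address are hypothetical):

-- Bring your own data: bind the table to a Kafka topic
CREATE TABLE Transactions (
  name STRING,
  amount INT
) WITH (
  'connector' = 'kafka',
  'topic' = 'transactions',
  'properties.bootstrap.servers' = 'broker:9092',
  'scan.startup.mode' = 'earliest-offset',
  'format' = 'json'  -- how messages are (de)serialized
);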
Stream-Table Duality - Example
An applied changelog becomes a real (materialized) table.
CREATE TABLE Transactions
(name STRING, amount INT)
WITH (…);

CREATE TABLE Revenue
(name STRING, total INT)
WITH (…);

INSERT INTO Revenue
SELECT name, SUM(amount)
FROM Transactions
GROUP BY name;

Transactions:
name  | amount
------+-------
Alice | 56
Bob   | 10
Alice | 89

changelog (read from Transactions):
+I[Alice, 56] +I[Bob, 10] +I[Alice, 89]

changelog (written to Revenue, retract mode):
+I[Alice, 56] +I[Bob, 10] -U[Alice, 56] +U[Alice, 145]

materialization (Revenue):
name  | total
------+------
Alice | 145
Bob   | 10
Stream-Table Duality - Example (Upsert)
An applied changelog becomes a real (materialized) table. With a PRIMARY KEY on the sink, the engine emits an upsert changelog: the -U (update-before) messages are dropped because the key already identifies which row to overwrite.

CREATE TABLE Revenue
(PRIMARY KEY(name) …)
WITH (…);

INSERT INTO Revenue
SELECT name, SUM(amount)
FROM Transactions
GROUP BY name;

changelog (written to Revenue, upsert mode):
+I[Alice, 56] +I[Bob, 10] +U[Alice, 145]

materialization (Revenue):
name  | total
------+------
Alice | 145
Bob   | 10

Save ~50% of traffic if the downstream system supports upserting!
Let's Open the Engine Room!
SQL Declaration
SQL Declaration – Variant 1 – Basic
-- Example tables
CREATE TABLE Transaction (ts TIMESTAMP(3), tid BIGINT, amount INT);
CREATE TABLE Payment (ts TIMESTAMP(3), tid BIGINT, type STRING);
CREATE TABLE Matched (tid BIGINT, amount INT, type STRING);
-- Join two tables based on key within time and store in target table
INSERT INTO Matched
SELECT T.tid, T.amount, P.type
FROM Transaction T JOIN Payment P ON T.tid = P.tid
WHERE P.ts BETWEEN T.ts AND T.ts + INTERVAL '10' MINUTES;
SQL Declaration – Variant 2 – Watermarks
-- Add watermarks
CREATE TABLE Transaction (…, WATERMARK FOR ts AS ts - INTERVAL '5' SECONDS);
CREATE TABLE Payment (…, WATERMARK FOR ts AS ts - INTERVAL '5' SECONDS);
CREATE TABLE Matched (tid BIGINT, amount INT, type STRING);
-- Join two tables based on key within time and store in target table
INSERT INTO Matched
SELECT T.tid, T.amount, P.type
FROM Transaction T JOIN Payment P ON T.tid = P.tid
WHERE P.ts BETWEEN T.ts AND T.ts + INTERVAL '10' MINUTES;
SQL Declaration – Variant 3 – Updating
-- Transactions can be aborted -> Result is updating
CREATE TABLE Transaction (…, PRIMARY KEY (tid) NOT ENFORCED) WITH ('changelog.mode' = 'upsert');
CREATE TABLE Payment (ts TIMESTAMP(3), tid BIGINT, type STRING);
CREATE TABLE Matched (…, PRIMARY KEY (tid) NOT ENFORCED) WITH ('changelog.mode' = 'upsert');
-- Join two tables based on key within time and store in target table
INSERT INTO Matched
SELECT T.tid, T.amount, P.type
FROM Transaction T JOIN Payment P ON T.tid = P.tid
WHERE P.ts BETWEEN T.ts AND T.ts + INTERVAL '10' MINUTES;
Every Keyword can trigger an Avalanche

CREATE TABLE
• Defines the connector (e.g., Kafka) and the changelog mode (i.e. append/retract/upsert)
→ For the engine: What needs to be processed? What should be produced to the target table?

WATERMARK FOR
• Defines a completeness marker (i.e. "I have seen everything up to time t.")
→ For the engine: Can intermediate results be discarded? Can I trigger result computation?

PRIMARY KEY
• Defines a uniqueness constraint (i.e. "Event with key k will occur one time or will be updated.")
→ For the engine: How do I guarantee the constraint for the target table?

SELECT, JOIN, WHERE
→ For the engine: What needs to be done? How much freedom do I have? How much can I optimize?
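As a hedged, combined sketch (not on the original slides), a single declaration can trigger all three avalanches at once; the options mirror the variants above:

CREATE TABLE Transaction (
  ts TIMESTAMP(3),
  tid BIGINT,
  amount INT,
  WATERMARK FOR ts AS ts - INTERVAL '5' SECONDS,  -- completeness marker
  PRIMARY KEY (tid) NOT ENFORCED                  -- uniqueness constraint
) WITH (
  'changelog.mode' = 'upsert'                     -- what must be processed/produced
);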
SQL Planning
Parsing & Validation
{SQL String, Catalogs, Modules, Session Config} → Calcite Logical Tree
• Break the SQL text into a tree
• Look up catalogs, databases, tables, views, functions, and their types
• Resolve all identifiers for columns and fields, e.g. SELECT pi, pi.pi FROM pi.pi.pi;
• Validate input/output columns and argument/return types
Main drivers: FlinkSqlParserImpl, SqlValidatorImpl, SqlToRelConverter
Output: SqlNode, then RelNode
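The pi query above is deliberately ambiguous; the annotations below are an illustrative reading of how the resolver disambiguates by position:

SELECT pi,     -- column `pi` of the table resolved below
       pi.pi   -- field `pi` of column `pi` (a structured ROW type)
FROM pi.pi.pi; -- catalog `pi`, database `pi`, table `pi`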
Rule-based Logical Rewrites
→ Calcite Logical Tree
• Rewrite subqueries to joins (e.g. EXISTS or IN)
• Apply query decorrelation
• Simplify expressions
• Constant folding (e.g. functions with literal args)
• Filter push down (e.g. all the way into the source)
Main drivers: FlinkStreamProgram, FlinkBatchProgram
Output: RelNode
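A hedged sketch of the subquery rewrite, reusing the Orders/Payments tables from earlier: the correlated EXISTS below is turned into a (semi-)join during this phase.

SELECT o.id
FROM Orders o
WHERE EXISTS (SELECT 1 FROM Payments p WHERE p.order = o.id);
-- planned roughly as: Orders semi-joined with Payments ON o.id = p.order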
Cost-based Logical Optimization
→ Flink Logical Tree
• Watermark push down (e.g. all the way into the source)
• Projection push down (e.g. all the way into the source)
• Push aggregate through join
• Reduce aggregate functions (e.g. AVG → SUM/COUNT)
• Remove unnecessary sort, aggregate, union, etc.
• …
Main drivers: FlinkStreamProgram, FlinkStreamRuleSets
Output: FlinkLogicalRel (RelNode)
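For example, reducing aggregate functions makes two-phase (local/global) aggregation possible. A minimal sketch, assuming the Transactions table from earlier:

SELECT name, AVG(amount) AS avg_amount
FROM Transactions
GROUP BY name;
-- AVG(amount) is planned roughly as SUM(amount) / COUNT(amount) per group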
Flink Rule-based Logical Rewrites
→ Flink Logical Tree
• Watermark push down, more projection push down
• Transpose calc past rank to reduce rank input fields
• Transform over window to top-n node
• Sync timestamp columns with watermarks
• …
Main drivers: FlinkStreamProgram, FlinkStreamRuleSets
Output: FlinkLogicalRel (RelNode)
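A hedged sketch of the over-window-to-top-n rewrite, assuming the Transactions table from earlier: the ROW_NUMBER pattern below is recognized and executed as a dedicated Rank (top-n) node instead of a generic over-aggregation.

SELECT name, amount
FROM (
  SELECT *, ROW_NUMBER() OVER (PARTITION BY name ORDER BY amount DESC) AS rn
  FROM Transactions
) WHERE rn <= 3;  -- the rank predicate makes this a top-n query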
Flink Physical Optimization and Rewrites
→ Flink Physical Tree
• Convert to matching physical nodes
• Add changelog normalize node
• Push watermarks past special operators
• Changelog mode inference (i.e. is an update-before necessary?)
• …
Main drivers: FlinkStreamProgram, FlinkStreamRuleSets
Output: FlinkPhysicalRel (RelNode)
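The inferred changelog modes can be inspected directly; a minimal sketch using the EXPLAIN detail from recent Flink releases:

EXPLAIN CHANGELOG_MODE
INSERT INTO Matched
SELECT T.tid, T.amount, P.type
FROM Transaction T JOIN Payment P ON T.tid = P.tid
WHERE P.ts BETWEEN T.ts AND T.ts + INTERVAL '10' MINUTES;
-- each operator is annotated, e.g. changelogMode=[I] or [I,UA,D]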
Physical Tree → Stream Execution Nodes
AsyncCalc Calc ChangelogNormalize Correlate Deduplicate DropUpdateBefore Exchange
Expand GlobalGroupAggregate GlobalWindowAggregate GroupAggregate
GroupTableAggregate GroupWindowAggregate IncrementalGroupAggregate IntervalJoin
Join Limit LocalGroupAggregate LocalWindowAggregate LookupJoin Match
OverAggregate Rank Sink Sort SortLimit TableSourceScan TemporalJoin TemporalSort
Union Values WatermarkAssigner WindowAggregate WindowAggregateBase
WindowDeduplicate WindowJoin WindowRank WindowTableFunction
• Recipes for DAG subparts (i.e. StreamExecNodes are templates for stream transformations)
• JSON-serializable with stability guarantees (i.e. CompiledPlan)
→ End of the SQL stack:
next are JobGraph (incl. concrete runtime classes) and ExecutionGraph (incl. cluster information)
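A minimal sketch of pinning such a plan, using the COMPILE PLAN / EXECUTE PLAN statements from recent Flink releases (the file path is hypothetical):

-- Serialize the stream execution nodes to JSON
COMPILE PLAN 'file:///tmp/matched.json' FOR
INSERT INTO Matched
SELECT T.tid, T.amount, P.type
FROM Transaction T JOIN Payment P ON T.tid = P.tid
WHERE P.ts BETWEEN T.ts AND T.ts + INTERVAL '10' MINUTES;

-- Later, e.g. across version upgrades, run exactly the compiled topology
EXECUTE PLAN 'file:///tmp/matched.json';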
Examples
Example 1 – No watermarks, no updates
INSERT INTO Matched
SELECT T.tid, T.amount, P.type
FROM Transaction T JOIN Payment P ON T.tid = P.tid
WHERE P.ts BETWEEN T.ts AND T.ts + INTERVAL '10' MINUTES;
Logical Tree
LogicalSink(table=[Matched], fields=[tid, amount, type])
+- LogicalProject(tid=[$1], amount=[$2], type=[$5])
+- LogicalFilter(condition=[AND(>=($3, $0), <=($3, +($0, 600000)))])
+- LogicalJoin(condition=[=($1, $4)], joinType=[inner])
:- LogicalTableScan(table=[Transaction])
+- LogicalTableScan(table=[Payment])
Physical Tree
Sink(table=[Matched], fields=[tid, amount, type], changelogMode=[NONE])
+- Calc(select=[tid, amount, type], changelogMode=[I])
+- Join(joinType=[InnerJoin], where=[AND(=(tid, tid0), >=(ts0, ts), <=(ts0, +(ts, 600000)))], …, changelogMode=[I])
:- Exchange(distribution=[hash[tid]], changelogMode=[I])
: +- TableSourceScan(table=[Transaction], fields=[ts, tid, amount], changelogMode=[I])
+- Exchange(distribution=[hash[tid]], changelogMode=[I])
+- TableSourceScan(table=[Payment], fields=[ts, tid, type], changelogMode=[I])
[Job graph: two Kafka Readers → Join → Calc → Kafka Writer, deployed with three parallel instances]

[Dataflow detail: each Kafka Reader tracks an offset; the Join keeps left-side and right-side state keyed by the txn id; +I[<T>] and +I[<P>] records are joined into +I[<T>, <P>], which Calc projects to +I[tid, amount, type] before the Kafka Writer emits it as Message<k, v>]
Example 2 – With watermarks, no updates
Logical Tree
LogicalSink(table=[Matched], fields=[tid, amount, type])
+- LogicalProject(tid=[$1], amount=[$2], type=[$5])
+- LogicalFilter(condition=[AND(>=($3, $0), <=($3, +($0, 600000)))])
+- LogicalJoin(condition=[=($1, $4)], joinType=[inner])
:- LogicalWatermarkAssigner(rowtime=[ts], watermark=[-($0, 2000)])
: +- LogicalTableScan(table=[Transaction])
+- LogicalWatermarkAssigner(rowtime=[ts], watermark=[-($0, 2000)])
+- LogicalTableScan(table=[Payment])
Physical Tree
Sink(table=[Matched], fields=[tid, amount, type], changelogMode=[NONE])
+- Calc(select=[tid, amount, type], changelogMode=[I])
+- IntervalJoin(joinType=[InnerJoin], windowBounds=[leftLowerBound=…, leftTimeIndex=…, …], …, changelogMode=[I])
:- Exchange(distribution=[hash[tid]], changelogMode=[I])
: +- TableSourceScan(table=[Transaction, watermark=[-(ts, 2000), …]], fields=[ts, tid, amount], changelogMode=[I])
+- Exchange(distribution=[hash[tid]], changelogMode=[I])
+- TableSourceScan(table=[Payment, watermark=[-(ts, 2000), …]], fields=[ts, tid, type], changelogMode=[I])
INSERT INTO Matched
SELECT T.tid, T.amount, P.type
FROM Transaction T JOIN Payment P ON T.tid = P.tid
WHERE P.ts BETWEEN T.ts AND T.ts + INTERVAL '10' MINUTES;
[Job graph: two Kafka Readers → Interval Join → Calc → Kafka Writer, deployed with three parallel instances]

[Dataflow detail: each Kafka Reader tracks an offset; the Interval Join keeps left-side and right-side state keyed by the txn id and registers timers to clean it up; watermarks such as W[12:00] and W[11:55] flow with the records and the operator forwards the minimum, W[11:55]; +I[<T>] and +I[<P>] records are joined into +I[<T>, <P>], which Calc projects to +I[tid, amount, type] before the Kafka Writer emits it as Message<k, v>]
Example 3 – With updates, no watermarks
INSERT INTO Matched
SELECT T.tid, T.amount, P.type
FROM Transaction T JOIN Payment P ON T.tid = P.tid
WHERE P.ts BETWEEN T.ts AND T.ts + INTERVAL '10' MINUTES;
Logical Tree
LogicalSink(table=[Matched], fields=[tid, amount, type])
+- LogicalProject(tid=[$1], amount=[$2], type=[$5])
+- LogicalFilter(condition=[AND(>=($3, $0), <=($3, +($0, 600000)))])
+- LogicalJoin(condition=[=($1, $4)], joinType=[inner])
:- LogicalTableScan(table=[Transaction])
+- LogicalTableScan(table=[Payment])
Physical Tree
Sink(table=[Matched], fields=[tid, amount, type], changelogMode=[NONE])
+- Calc(select=[tid, amount, type], changelogMode=[I,UA,D])
+- Join(joinType=[InnerJoin], where=[…], …, leftInputSpec=[JoinKeyContainsUniqueKey], changelogMode=[I,UA,D])
:- Exchange(distribution=[hash[tid]], changelogMode=[I,UA,D])
: +- ChangelogNormalize(key=[tid], changelogMode=[I,UA,D])
: +- Exchange(distribution=[hash[tid]], changelogMode=[I,UA,D])
: +- TableSourceScan(table=[Transaction], fields=[ts, tid, amount], changelogMode=[I,UA,D])
+- Exchange(distribution=[hash[tid]], changelogMode=[I])
+- TableSourceScan(table=[Payment], fields=[ts, tid, type], changelogMode=[I])
[Job graph: two Kafka Readers → ChangelogNormalize (on the Transaction side) → Join → Calc → Kafka Writer, deployed with three parallel instances]

[Dataflow detail: each Kafka Reader tracks an offset; the upsert Transaction stream first passes through ChangelogNormalize, which keeps "last seen" state per key to complete the changelog; the Join keeps left-side and right-side state keyed by the txn id; an initial +I[<T>] joins with +I[<P>] into +I[<T>, <P>] and is emitted as +I[tid, amount, type] Message<k, v>; a later +U[<T>] for the same key produces +U[<T>, <P>] and is emitted as +U[tid, amount, type] Message<k, v>]
Flink SQL on Confluent Cloud
Open Source Flink but Cloud-Native
Serverless
• Complexities of infrastructure management are abstracted away
• Automatic upgrades
• Stable APIs
Zero Knobs
• Auto-scaling
• Auto-watermarking
Usage-based
• Scale-to-zero
• Pay only for what you use
-- No connection or credentials required
CREATE TABLE MyTable (
uid BIGINT,
name STRING,
PRIMARY KEY (uid) NOT ENFORCED);
-- No tuning required
SELECT * FROM MyTable;
-- Stable for long running statements
INSERT INTO OtherTable SELECT * FROM MyTable;
Open Source Flink but Cloud-Native & Complete
One Unified Platform
• Kafka and Flink fully integrated
• Automatic inference or manual creation of topics
• Metadata management via Schema Registry – bidirectional for Avro, JSON, Protobuf
• Consistent semantics across storage and processing - changelogs with append-only, upsert, retract
Confluent Cloud should feel like a database. But for streaming!
[Screenshots: Confluent Cloud CLI and Confluent Cloud SQL Workspace]
Thank you!
Feel free to follow:
@twalthr