The State of the Table API: 2022
David Anderson – @alpinegizmo – Flink Forward 22
1.14 (Sep 2021): legacy planner removed; streaming/batch unification; DataStream <-> Table interop
1.15 (May 2022): SQL version upgrades; window TVFs in batch; JSON functions; Table Store
1.16 (Aug-Sep 2022): MATCH_RECOGNIZE in batch; SQL Gateway
Intro
About me
Apache Flink
● Flink Committer
● Focus on training, documentation, FLIP-220
● Release manager for Flink 1.15.1
● Prolific author of answers about Flink on Stack Overflow
Career
● Researcher: Carnegie Mellon, Mitsubishi Electric, Sun Labs
● Consultant: Machine Learning and Data Engineering
● Trainer: Data Science Retreat and data Artisans / Ververica
● Community Engineering @ immerok
David Anderson
@alpinegizmo
Business data is naturally in streams: either bounded or unbounded
Batch processing is a special case of stream processing
[Diagram: a timeline running from the past to the future; a bounded stream covers a fixed interval, while an unbounded stream begins at some point and continues past "now" into the future]
Flink jobs are organized as dataflow graphs
[Diagram: Transactions and Customers sources feeding a Join operator, which feeds a Sink]
Flink jobs are stateful
[Diagram: the same Transactions/Customers → Join → Sink dataflow, with the Join operator holding state]
Flink jobs are executed in parallel
[Diagram: two parallel instances of the pipeline, each consuming one partition of Transactions and Customers and running its own Join → Sink, with a shuffle by customerId ahead of the Join]
DataStreams & Tables,
Batch & Streaming
Looking back at Flink's legacy API stack: the DataSet API and the DataStream API sat directly on the Runtime, with the Table / SQL API (unified batch & streaming) layered on top of both.

Today the Table API is entirely its own thing: the Runtime is topped by an Internal Operator API, above which the DataStream API (unified batch & streaming) and the Relational Planner / Optimizer behind the Table / SQL API (unified batch & streaming) sit side by side.
Latest Transaction for each Customer (Table)
SELECT
t_id,
t_customer_id,
t_amount,
t_time
FROM (
SELECT *, ROW_NUMBER()
OVER (PARTITION BY t_customer_id
ORDER BY t_time DESC)
AS rownum
FROM Transactions )
WHERE rownum <= 1;
{
"t_id": 1,
"t_customer_id": 1,
"t_amount": 99.08,
"time": 1657144244000
}
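Outside Flink, the semantics of this Top-1 query can be sketched in a few lines of plain Java: per customer, keep only the row with the greatest t_time. The record and method names here are hypothetical stand-ins for the Transactions table, not Flink API.

```java
import java.util.HashMap;
import java.util.Map;

public class LatestTransactionSketch {
    // Hypothetical stand-in for a row of the Transactions table
    record Transaction(long tId, long tCustomerId, double tAmount, long tTime) {}

    // Keep, per customer, only the transaction with the greatest t_time --
    // the same result the ROW_NUMBER() ... WHERE rownum <= 1 query maintains.
    static Map<Long, Transaction> latestPerCustomer(Iterable<Transaction> input) {
        Map<Long, Transaction> latest = new HashMap<>();
        for (Transaction t : input) {
            latest.merge(t.tCustomerId(), t,
                (old, incoming) -> incoming.tTime() > old.tTime() ? incoming : old);
        }
        return latest;
    }
}
```

Flink maintains this same per-key "latest row" incrementally, as state, rather than over a finished collection.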
Batch
+-------------------------+---------------+-----------------------+--------------+
| t_time | t_id | t_customer_id | t_amount |
+-------------------------+---------------+-----------------------+--------------+
| 2022-07-24 08:00:00.000 | 2 | 0 | 500 |
| 2022-07-24 09:00:00.000 | 3 | 1 | 11 |
+-------------------------+---------------+-----------------------+--------------+
Streaming
+----+-------------------------+--------------+----------------------+--------------+
| op | t_time | t_id | t_customer_id | t_amount |
+----+-------------------------+--------------+----------------------+--------------+
| +I | 2022-08-03 09:17:25.505 | 0 | 1 | 316 |
| +I | 2022-08-03 09:17:26.871 | 1 | 0 | 660 |
| -U | 2022-08-03 09:17:26.871 | 1 | 0 | 660 |
| +U | 2022-08-03 09:17:27.952 | 2 | 0 | 493 |
| -U | 2022-08-03 09:17:25.505 | 0 | 1 | 316 |
| +U | 2022-08-03 09:17:29.046 | 3 | 1 | 35 |
| … | … | … | … | … |
Batch vs Streaming
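In the streaming output above, each -U row retracts a previously emitted result and the following +U row inserts its replacement. How a downstream consumer materializes such a retract changelog can be sketched in plain Java; this is a hedged illustration of the changelog semantics, not Flink's internal implementation.

```java
import java.util.ArrayList;
import java.util.List;

public class ChangelogSketch {
    // Apply a retract changelog (+I insert, +U update-insert, -U retract,
    // -D delete) to materialize the current result set. Rows are matched
    // by value, as in the retract streams shown above.
    static List<String> materialize(List<String[]> changelog) {
        List<String> rows = new ArrayList<>();
        for (String[] change : changelog) {
            String op = change[0], row = change[1];
            switch (op) {
                case "+I", "+U" -> rows.add(row);
                case "-U", "-D" -> rows.remove(row);
                default -> throw new IllegalArgumentException("unknown op: " + op);
            }
        }
        return rows;
    }
}
```

Replaying the six ops from the table above leaves exactly one current row per customer, matching the batch result.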
Latest Transaction for each Customer (DataStream)
DataStream<Transaction> results =
transactionStream
.keyBy(t -> t.t_customer_id)
.process(new LatestTransaction());
public void processElement(
        Transaction incoming,
        Context context,
        Collector<Transaction> out) throws Exception {
    // latestTransaction is ValueState<Transaction>, scoped to the current customer key
    Transaction latest = latestTransaction.value();
    if (latest == null
            || incoming.t_time.isAfter(latest.t_time)) {
        latestTransaction.update(incoming);
        out.collect(incoming);
    }
}
DataStreams
● inputs and outputs: event streams
○ user implements classes for event objects
○ user supplies ser/de
● business logic: low-level code that
reacts to events and timers by
○ reading and writing state
○ creating timers
○ emitting events
Dynamic Tables
● inputs and outputs: event streams
are a history of changes to Tables
○ events insert, update, or delete Rows
○ user provides Table schemas
○ user specifies formats (e.g. CSV or JSON)
● business logic: SQL queries
○ high-level, declarative description compiled
into a dataflow graph
○ the dataflow reacts to these changes and
updates the result(s) (akin to materialized
view maintenance)
Two different programming models
Interoperability
Customers
{
"c_id": 1,
"c_name": "Ramon Stehr"
}
{
"t_id": 1,
"t_customer_id": 1,
"t_amount": 99.08,
"time": 1657144244000
}
Transactions
In this example, the transaction stream may contain duplicates
Deduplicate
Customers
Join Sink
Transactions
INSERT INTO Sink
SELECT t_id, c_name, t_amount
FROM Customers
JOIN (SELECT DISTINCT * FROM Transactions) ON c_id = t_customer_id;
+I[25, Renaldo Walsh, 280.49]
+I[27, Stuart Altenwerth, 818.16]
+I[19, Kizzie Reichert, 60.71]
+I[29, Renaldo Walsh, 335.59]
+I[31, Stuart Altenwerth, 948.26]
+I[23, Ashley Towne, 784.84]
+I[41, Louis White, 578.81]
+I[35, Ashley Towne, 585.44]
+I[43, Renaldo Walsh, 503.11]
+I[39, Kizzie Reichert, 625.32]
+I[13, Kizzie Reichert, 840.47]
...
Results
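The result rows above can be reproduced in miniature by a plain-Java sketch of what the query does: deduplicate the transactions, then inner-join against the customers. The Txn record and the customers map are hypothetical stand-ins for the two tables.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DedupJoinSketch {
    // Hypothetical stand-in for a row of the Transactions table
    record Txn(long tId, long tCustomerId, double tAmount) {}

    // SELECT t_id, c_name, t_amount FROM Customers
    // JOIN (SELECT DISTINCT * FROM Transactions) ON c_id = t_customer_id
    static List<String> join(List<Txn> txns, Map<Long, String> customers) {
        Set<Txn> distinct = new LinkedHashSet<>(txns); // SELECT DISTINCT
        List<String> out = new ArrayList<>();
        for (Txn t : distinct) {
            String name = customers.get(t.tCustomerId());
            if (name != null) { // inner join drops unmatched transactions
                out.add("+I[" + t.tId() + ", " + name + ", " + t.tAmount() + "]");
            }
        }
        return out;
    }
}
```

In the streaming job, both the deduplication and the join are instead maintained incrementally, with Flink keeping the set of seen transactions and the customers table as state.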
Starting point: POJOs for Customers and Transactions
public class Customer {
// A Flink POJO must have public fields, or getters and setters
public long c_id;
public String c_name;
// A Flink POJO must have a no-args default constructor
public Customer() {}
. . .
}
Seamless interoperability between DataStreams and Tables
KafkaSource<Customer> customerSource =
KafkaSource.<Customer>builder()
.setBootstrapServers("localhost:9092")
.setTopics(CUSTOMER_TOPIC)
.setStartingOffsets(OffsetsInitializer.earliest())
.setValueOnlyDeserializer(new CustomerDeserializer())
.build();
DataStream<Customer> customerStream =
env.fromSource(
customerSource, WatermarkStrategy.noWatermarks(), "Customers");
tableEnv.createTemporaryView("Customers", customerStream);
Seamless interoperability between DataStreams and Tables
// use Flink SQL to do the heavy lifting
Table resultTable =
tableEnv.sqlQuery(
String.join(
"\n",
"SELECT t_id, c_name, CAST(t_amount AS DECIMAL(5, 2))",
"FROM Customers",
"JOIN (SELECT DISTINCT * FROM Transactions)",
"ON c_id = t_customer_id"));
// switch back from Table API to DataStream
DataStream<Row> resultStream = tableEnv.toChangelogStream(resultTable);
Interlude: Intro · DataStreams vs Tables · Interoperability · Version Upgrades · Table Store · SQL Gateway
Flink now has a powerful and versatile SQL engine
● Batch / Streaming unification
● The new type system
● DataStream / Table interoperability
● Scala-free classpath
● Catalogs, connectors, formats, CDC
● PyFlink
● Improved semantics
● Optimizations
● Bug fixes, new features, etc.
Use cases?
● ETL (esp joins)
● Analytics
● Anything, really
○ in combination with UDFs
and/or the DataStream API
SQL Features in Flink 1.16

Streaming and Batch
● SELECT FROM WHERE
● GROUP BY [HAVING]: non-windowed; TUMBLE, HOP, SESSION windows
● Window Table-Valued Functions: TUMBLE, HOP, CUMULATE windows
● OVER window
● JOIN: time-windowed INNER + OUTER; non-windowed INNER + OUTER
● MATCH_RECOGNIZE
● Set Operations
● User-Defined Functions: scalar, aggregation, table-valued
● Statement Sets

Streaming only
● ORDER BY time
● INNER JOIN with a temporal table or an external lookup table

Batch only
● ORDER BY anything
● Full TPC-DS support
Table API: Long-term initiatives
FLIPs, by initiative (spanning releases 1.9 through 1.16):
● Blink planner: FLIP-32
● Python: FLIPs 38, 58, 78, 96, 97, 106, 112, 114, 121, 139
● Hive: FLIPs 30, 123, 152
● CDC: FLIPs 87, 95, 105
● Connectors, Formats
● DataStream/Table interop: FLIP-136
● Version upgrades: FLIP-190
● Table Store: FLIPs 188, 226, 230, 254
● SQL Gateway: FLIP-91
Version Upgrades
Stateful restarts of Flink jobs
● Flink jobs can be restarted from
checkpoints and savepoints
● This requires that each stateful
operator be able to find and load its
state
● Things may have changed, making this
difficult/impossible
○ types
○ topology
DataStream API
● You have enough low-level control to
be able to avoid or cope with
potential problems
Table/SQL API
● New Flink versions can introduce
changes to the SQL planner that
render old state un-restorable
FLIP-190: Flink Version Upgrades for Table/SQL API Programs
Goals
● The same query can always be restarted correctly after upgrading Flink
● Schema and query evolution are out of scope
Status
● Released as BETA in 1.15
Usage
● Only supports streaming
● Must be a complete pipeline, i.e., INSERT INTO sink SELECT . . .
Example: before upgrade
String streamingQueryWithInsert =
String.join(
"\n",
"INSERT INTO sink",
"SELECT t_id, c_name, t_amount",
"FROM Customers",
"JOIN (SELECT DISTINCT * FROM Transactions)",
"ON c_id = t_customer_id");
tableEnv.compilePlanSql(streamingQueryWithInsert).writeToFile(planLocation);
Example: after upgrade
TableResult execution =
tableEnv.executePlan(PlanReference.fromFile(planLocation));
Table Store
Typical use case / scenario: [Diagram: joins write intermediate results and aggregations write aggregated results, both stored in the Table Store]
Tables backed by connectors vs built-in table storage
CREATE CATALOG my_catalog WITH (
'type'='table-store',
'warehouse'='file:/tmp/table_store'
);
USE CATALOG my_catalog;
-- create a word count table
CREATE TABLE word_count (
word STRING PRIMARY KEY NOT ENFORCED,
cnt BIGINT
);
-- the same word count table, backed by a filesystem connector
CREATE TABLE word_count (
word STRING PRIMARY KEY NOT ENFORCED,
cnt BIGINT
) WITH (
'connector' = 'filesystem',
'path' = '/tmp/word_count',
'format' = 'csv'
);
Architecture of this built-in table storage
Advantages of the Table Store
● Easy to use
○ drop in the JAR file and start using it
○ provides “normal” tables
● Flexible
○ streaming pipelines
○ batch jobs
○ ad-hoc queries
● Low-latency
● Integrates with
○ Spark
○ Trino
○ Hive
SQL Gateway
SQL Gateway: Architecture
[Diagram: a Client talks to the REST endpoint, which hands requests to the Session Manager and Executor; the Executor works with the Catalog and submits jobs to a Flink Cluster]
Wrap-up
The ongoing efforts to add version upgrade support, built-in table storage, and a SQL gateway will expand the Table API into many new use cases.
Thanks!
David Anderson
@alpinegizmo
danderson@apache.org
These examples and more can be found in the Immerok Apache Flink Cookbook at https://docs.immerok.cloud
 
Large Scale Real Time Fraudulent Web Behavior Detection
Large Scale Real Time Fraudulent Web Behavior DetectionLarge Scale Real Time Fraudulent Web Behavior Detection
Large Scale Real Time Fraudulent Web Behavior Detection
 
Building Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta LakeBuilding Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta Lake
 
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and HudiHow to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
 

Recently uploaded

The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
g2nightmarescribd
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
Product School
 

Recently uploaded (20)

The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 

The State of the Table API: 2022 (slide transcript)

  • 1. The State of the Table API: 2022 David Anderson – @alpinegizmo – Flink Forward 22
  • 2. Sep 2021 legacy planner removed streaming/batch unification DataStream <-> Table interop 1.14 May 2022 SQL version upgrades window TVFs in batch JSON functions Table Store 1.15 Aug-Sep 2022 MATCH_RECOGNIZE batch SQL Gateway 1.16
  • 6. About me Apache Flink ● Flink Committer ● Focus on training, documentation, FLIP-220 ● Release manager for Flink 1.15.1 ● Prolific author of answers about Flink on Stack Overflow Career ● Researcher: Carnegie Mellon, Mitsubishi Electric, Sun Labs ● Consultant: Machine Learning and Data Engineering ● Trainer: Data Science Retreat and data Artisans / Ververica ● Community Engineering @ immerok 6 David Anderson @alpinegizmo
  • 7. Business data is naturally in streams: either bounded or unbounded. Batch processing is a special case of stream processing. [diagram: a past → now → future timeline showing unbounded streams and a bounded stream]
  • 8. Flink jobs are organized as dataflow graphs [diagram: Transactions and Customers sources → Join → Sink]
  • 9. Flink jobs are stateful [diagram: Transactions and Customers sources → Join → Sink]
  • 10. Flink jobs are executed in parallel [diagram: Transactions Partition1/Partition2 and Customers Partition1/Partition2 → Join → Sink, with a shuffle by customerId]
  • 12. Looking back at Flink’s legacy API stack [diagram: DataSet API, DataStream API, and Table / SQL API (unified batch & streaming) layered on top of the Runtime]
  • 13. Today the Table API is entirely its own thing [architecture diagram: DataStream API and Table / SQL API (both unified batch & streaming), Relational Planner / Optimizer, Internal Operator API, Runtime]
  • 14. Latest Transaction for each Customer (Table)

SELECT t_id, t_customer_id, t_amount, t_time
FROM (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY t_customer_id ORDER BY t_time DESC) AS rownum
  FROM Transactions
)
WHERE rownum <= 1;

{ "t_id": 1, "t_customer_id": 1, "t_amount": 99.08, "time": 1657144244000 }
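What the ROW_NUMBER deduplication query above computes can be illustrated with a plain-Java sketch (no Flink dependency; the `Txn` record and method names are hypothetical stand-ins for the Transactions rows): for each customer, keep only the transaction with the latest timestamp.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a row of the Transactions table.
record Txn(long tId, long tCustomerId, double tAmount, long tTime) {}

public class LatestTxnSketch {

    // Emulates: SELECT ... ROW_NUMBER() OVER (PARTITION BY t_customer_id
    //           ORDER BY t_time DESC) ... WHERE rownum <= 1
    static Map<Long, Txn> latestPerCustomer(List<Txn> txns) {
        Map<Long, Txn> latest = new HashMap<>();
        for (Txn t : txns) {
            // Keep whichever transaction has the greater t_time per customer.
            latest.merge(t.tCustomerId(), t,
                (old, incoming) -> incoming.tTime() > old.tTime() ? incoming : old);
        }
        return latest;
    }

    public static void main(String[] args) {
        List<Txn> txns = List.of(
            new Txn(1, 1, 99.08, 1657144244000L),
            new Txn(2, 1, 50.00, 1657144250000L),
            new Txn(3, 2, 11.00, 1657144246000L));
        // Customer 1's latest transaction is t_id = 2.
        System.out.println(latestPerCustomer(txns).get(1L).tId()); // prints 2
    }
}
```

The streaming version of the query maintains exactly this kind of per-key state continuously, emitting updates as newer transactions arrive.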
  • 15. Batch vs Streaming

Batch
+-------------------------+------+---------------+----------+
| t_time                  | t_id | t_customer_id | t_amount |
+-------------------------+------+---------------+----------+
| 2022-07-24 08:00:00.000 |    2 |             0 |      500 |
| 2022-07-24 09:00:00.000 |    3 |             1 |       11 |
+-------------------------+------+---------------+----------+

Streaming
+----+-------------------------+------+---------------+----------+
| op | t_time                  | t_id | t_customer_id | t_amount |
+----+-------------------------+------+---------------+----------+
| +I | 2022-08-03 09:17:25.505 |    0 |             1 |      316 |
| +I | 2022-08-03 09:17:26.871 |    1 |             0 |      660 |
| -U | 2022-08-03 09:17:26.871 |    1 |             0 |      660 |
| +U | 2022-08-03 09:17:27.952 |    2 |             0 |      493 |
| -U | 2022-08-03 09:17:25.505 |    0 |             1 |      316 |
| +U | 2022-08-03 09:17:29.046 |    3 |             1 |       35 |
| …  | …                       | …    | …             | …        |
  • 16. Latest Transaction for each Customer (DataStream)

DataStream<Transaction> results = transactionStream
    .keyBy(t -> t.t_customer_id)
    .process(new LatestTransaction());

public void processElement(
        Transaction incoming, Context context, Collector<Transaction> out) {
    Transaction latest = latestTransaction.value();
    if (latest == null || (incoming.t_time.isAfter(latest.t_time))) {
        latestTransaction.update(incoming);
        out.collect(incoming);
    }
}
  • 17. Two different programming models

DataStreams
● inputs and outputs: event streams
  ○ user implements classes for event objects
  ○ user supplies ser/de
● business logic: low-level code that reacts to events and timers by
  ○ reading and writing state
  ○ creating timers
  ○ emitting events

Dynamic Tables
● inputs and outputs: event streams are a history of changes to Tables
  ○ events insert, update, or delete Rows
  ○ user provides Table schemas
  ○ user specifies formats (e.g. CSV or JSON)
● business logic: SQL queries
  ○ high-level, declarative description compiled into a dataflow graph
  ○ the dataflow reacts to these changes and updates the result(s) (akin to materialized view maintenance)
  • 20. Customers { "c_id": 1, "c_name": "Ramon Stehr" } { "t_id": 1, "t_customer_id": 1, "t_amount": 99.08, "time": 1657144244000 } Transactions
  • 21. Customers { "c_id": 1, "c_name": "Ramon Stehr" } { "t_id": 1, "t_customer_id": 1, "t_amount": 99.08, "time": 1657144244000 } Transactions In this example, the transaction stream may contain duplicates
  • 23. [dataflow: Transactions → Deduplicate → Join (with Customers) → Sink]

INSERT INTO Sink
SELECT t_id, c_name, t_amount
FROM Customers
JOIN (SELECT DISTINCT * FROM Transactions)
ON c_id = t_customer_id;
  • 24. Results

+I[25, Renaldo Walsh, 280.49]
+I[27, Stuart Altenwerth, 818.16]
+I[19, Kizzie Reichert, 60.71]
+I[29, Renaldo Walsh, 335.59]
+I[31, Stuart Altenwerth, 948.26]
+I[23, Ashley Towne, 784.84]
+I[41, Louis White, 578.81]
+I[35, Ashley Towne, 585.44]
+I[43, Renaldo Walsh, 503.11]
+I[39, Kizzie Reichert, 625.32]
+I[13, Kizzie Reichert, 840.47]
...

INSERT INTO Sink
SELECT t_id, c_name, t_amount
FROM Customers
JOIN (SELECT DISTINCT * FROM Transactions)
ON c_id = t_customer_id;
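The semantics of this dedup-then-join query can be sketched in plain Java (no Flink dependency; the record and method names here are hypothetical): collapse duplicate transactions with a set, then join each surviving transaction to its customer by id, formatting each result like the `+I[...]` rows shown.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DedupJoinSketch {
    record Customer(long cId, String cName) {}
    record Txn(long tId, long tCustomerId, double tAmount) {}

    // Emulates: SELECT t_id, c_name, t_amount FROM Customers
    //           JOIN (SELECT DISTINCT * FROM Transactions) ON c_id = t_customer_id
    static List<String> join(List<Customer> customers, List<Txn> txns) {
        Map<Long, String> names = new HashMap<>();
        for (Customer c : customers) names.put(c.cId(), c.cName());

        // SELECT DISTINCT: records have value-based equality, so a set dedups them.
        Set<Txn> distinct = new LinkedHashSet<>(txns);

        List<String> out = new ArrayList<>();
        for (Txn t : distinct) {
            String name = names.get(t.tCustomerId());
            if (name != null) { // inner join: drop transactions with no matching customer
                out.add("+I[" + t.tId() + ", " + name + ", " + t.tAmount() + "]");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Customer> cs = List.of(new Customer(1, "Renaldo Walsh"));
        List<Txn> ts = List.of(new Txn(25, 1, 280.49), new Txn(25, 1, 280.49));
        // The duplicate transaction collapses to a single output row.
        System.out.println(join(cs, ts));
    }
}
```

In the streaming job, Flink maintains the deduplication set and the join state incrementally, producing `+I`/`-U`/`+U` changelog rows instead of a finished list.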
  • 25. Starting point: POJOs for Customers and Transactions

public class Customer {
    // A Flink POJO must have public fields, or getters and setters
    public long c_id;
    public String c_name;

    // A Flink POJO must have a no-args default constructor
    public Customer() {}
    . . .
}
  • 26. Seamless interoperability between DataStreams and Tables

KafkaSource<Customer> customerSource =
    KafkaSource.<Customer>builder()
        .setBootstrapServers("localhost:9092")
        .setTopics(CUSTOMER_TOPIC)
        .setStartingOffsets(OffsetsInitializer.earliest())
        .setValueOnlyDeserializer(new CustomerDeserializer())
        .build();

DataStream<Customer> customerStream =
    env.fromSource(customerSource, WatermarkStrategy.noWatermarks(), "Customers");

tableEnv.createTemporaryView("Customers", customerStream);
  • 29. Seamless interoperability between DataStreams and Tables

// use Flink SQL to do the heavy lifting
Table resultTable =
    tableEnv.sqlQuery(
        String.join(
            "\n",
            "SELECT t_id, c_name, CAST(t_amount AS DECIMAL(5, 2))",
            "FROM Customers",
            "JOIN (SELECT DISTINCT * FROM Transactions)",
            "ON c_id = t_customer_id"));

// switch back from Table API to DataStream
DataStream<Row> resultStream = tableEnv.toChangelogStream(resultTable);
  • 32. Flink now has a powerful and versatile SQL engine ● Batch / Streaming unification ● The new type system ● DataStream / Table interoperability ● Scala-free classpath ● Catalogs, connectors, formats, CDC ● PyFlink ● Improved semantics ● Optimizations ● Bug fixes, new features, etc. Use cases? ● ETL (esp joins) ● Analytics ● Anything, really ○ in combination with UDFs and/or the DataStream API
  • 33. SQL Features in Flink 1.16

Streaming and Batch: SELECT FROM WHERE; GROUP BY [HAVING] (non-windowed; TUMBLE, HOP, SESSION windows); Window Table-Valued Functions (TUMBLE, HOP, CUMULATE windows); OVER window; JOIN (time-windowed INNER + OUTER, non-windowed INNER + OUTER); MATCH_RECOGNIZE; Set Operations; User-Defined Functions (scalar, aggregation, table-valued); Statement Sets
Streaming only: ORDER BY time; INNER JOIN with Temporal table; External lookup table
Batch only: ORDER BY anything; Full TPC-DS support
  • 34. Table API: Long-term initiatives (FLIPs, across releases 1.9–1.16)

Blink planner: FLIP-32
Python: FLIPs 38, 58, 78, 96, 97, 106, 112, 114, 121, 139
Hive: FLIPs 30, 123, 152
CDC: FLIPs 87, 95, 105
Connectors, Formats
DataStream/Table interop: FLIP-136
Version upgrades: FLIP-190
Table Store: FLIPs 188, 226, 230, 254
SQL Gateway: FLIP-91
  • 36. Stateful restarts of Flink jobs ● Flink jobs can be restarted from checkpoints and savepoints ● This requires that each stateful operator be able to find and load its state ● Things may have changed, making this difficult/impossible ○ types ○ topology DataStream API ● You have enough low-level control to be able to avoid or cope with potential problems Table/SQL API ● New Flink versions can introduce changes to the SQL planner that render old state un-restorable
  • 37. FLIP-190: Flink Version Upgrades for Table/SQL API Programs Goals ● The same query can always be restarted correctly after upgrading Flink ● Schema and query evolution are out of scope Status ● Released as BETA in 1.15 Usage ● Only supports streaming ● Must be a complete pipeline, i.e., INSERT INTO sink SELECT . . .
  • 38. Example: before upgrade

String streamingQueryWithInsert =
    String.join(
        "\n",
        "INSERT INTO sink",
        "SELECT t_id, c_name, t_amount",
        "FROM Customers",
        "JOIN (SELECT DISTINCT * FROM Transactions)",
        "ON c_id = t_customer_id");

tableEnv.compilePlanSql(streamingQueryWithInsert).writeToFile(planLocation);
  • 39. Example: after upgrade TableResult execution = tableEnv.executePlan(PlanReference.fromFile(planLocation));
  • 41. Typical use case / scenario [diagram: Joins producing intermediate results and Aggregations producing aggregated results, both stored in the Table Store]
  • 42. Tables backed by connectors vs built-in table storage

CREATE CATALOG my_catalog WITH (
  'type'='table-store',
  'warehouse'='file:/tmp/table_store'
);
USE CATALOG my_catalog;

-- create a word count table (built-in table storage)
CREATE TABLE word_count (
  word STRING PRIMARY KEY NOT ENFORCED,
  cnt BIGINT
);

-- create a word count table (filesystem connector)
CREATE TABLE word_count (
  word STRING PRIMARY KEY NOT ENFORCED,
  cnt BIGINT
) WITH (
  'connector' = 'filesystem',
  'path' = '/tmp/word_count',
  'format' = 'csv'
);
  • 43. Architecture of this built-in table storage
  • 44. Advantages of the Table Store ● Easy to use ○ drop in the JAR file and start using it ○ provides “normal” tables ● Flexible ○ streaming pipelines ○ batch jobs ○ ad-hoc queries ● Low-latency ● Integrates with ○ Spark ○ Trino ○ Hive
  • 47. Wrap-up The ongoing efforts to add version upgrade support, built-in table storage, and a SQL gateway will expand the Table API into many new use cases.
  • 48. Thanks! David Anderson @alpinegizmo danderson@apache.org These examples and more can be found in the Immerok Apache Flink Cookbook at https://docs.immerok.cloud