© 2019 Ververica
Aljoscha Krettek – Software Engineer, Flink PMC, Beam PMC
Timo Walther – Software Engineer, Flink PMC
What's new for Flink's Table & SQL APIs?
Planners, Python, DDL, and more!
• Very expressive stream processing API
– Transform, aggregate, and join events
– Java and Scala
• Control how events are processed with respect to time
– Timestamps, Watermarks, Windows, Timers, Triggers, Allowed Lateness, …
• Maintain and update application state
– Keyed state, operator state, state backends, checkpointing, …
The DataStream API is great…
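As a concrete illustration of these bullets, here is a minimal sketch of a keyed, windowed aggregation in the Scala DataStream API. The Click case class and the inline example data are made up for illustration:

import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time

case class Click(user: String, url: String)

object ClickCounts {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    env
      .fromElements(Click("alice", "/a"), Click("bob", "/b"), Click("alice", "/c"))
      .map(click => (click.user, 1))
      .keyBy(_._1)                  // keyed state is scoped per user
      .timeWindow(Time.minutes(1))  // control how events are grouped in time
      .sum(1)                       // count clicks per user and window
      .print()

    env.execute("DataStream click counts")
  }
}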
• Writing distributed programs is not always easy
– Stream processing technology spreads rapidly
– New concepts (time, state, ...)
• Requires knowledge & skill
– Continuous applications have special requirements
– Programming experience (Java / Scala)
• Users want to focus on their business logic
… but it's not made for everyone.
• Relational APIs are declarative
– User says what is needed, system decides how to compute it
• Queries can be effectively optimized
– Less imperative black-box code
– Well-researched field
• Queries are efficiently executed
– Let Flink deal with state and time
• "Everybody" knows and uses SQL
Why not SQL (or another relational API)?
Apache Flink’s Relational APIs
Unified APIs for batch & streaming data
A query specifies exactly the same result
regardless whether its input is
static batch data or streaming data.
LINQ-style Table API:

tableEnvironment
  .scan("clicks")
  .groupBy('user)
  .select('user, 'url.count as 'cnt)

ANSI SQL:

SELECT user, COUNT(url) AS cnt
FROM clicks
GROUP BY user
This is joint work with members of the
Apache Flink community.
Some of this presents work that is in
progress in the Flink community. Other
things are planned and/or have design
documents. Some were discussed at
one point or another on the mailing lists
or in person.
This represents our understanding of the current state; it is not a fixed roadmap, as Flink is an open-source Apache project.
Evolution in Progress…
FLIP-29, FLIP-30, FLIP-32, FLIP-37, FLIP-38, FLIP-51, FLIP-55, FLIP-57, FLIP-64, FLIP-65, FLIP-66, FLIP-68, FLIP-69
New Planner in a Unified Architecture
FLIP-32
Architecture before Flink 1.9 (diagram): the runtime and internal stream API underneath two separate APIs, DataSet (ExecutionEnvironment) and DataStream (StreamExecutionEnvironment), with Table / SQL split into BatchTableEnvironment and StreamTableEnvironment on top.
Does this look unified?
Architecture in Flink 1.9+ (diagram): the runtime and internal stream API underneath a single DataStream API (StreamExecutionEnvironment), with Table / SQL on a unified (Stream)TableEnvironment on top.
Alibaba’s Contribution of Blink
• A truly unified runtime operator stack
• Many more runtime operators for better SQL coverage
• Proper cost model for planning
• Improved data structures (sorters, hash tables) and serializers for operating
on binary data
• Support all TPC-H and TPC-DS Queries
• And much more...
How can we merge Blink gradually?
• Separate the API from the query processor
• Make the query processor pluggable
• Reduce technical debt in the API on the way
– Make the API Scala-free (private members were public in Java)
– Remove API design flaws (nobody needs 7 TableEnvironments or TypeInformation in SQL)
– Allow pure table programs (regular table users don't need the DataStream API)
(Diagram: the Table / SQL API sits on top of a pluggable query processor, so the old planner and the Blink planner can coexist.)
An API is growing up
FLIP-29 / FLIP-30 / FLIP-55 / FLIP-64
Separation of Concerns
flink-table
flink-table-common → SPI interfaces for connectors, catalogs, UDFs
flink-table-api-java → pure Java table programs
flink-table-api-scala → pure Scala table programs
flink-table-api-java-bridge → programs that interact with the Java DataStream API
flink-table-api-scala-bridge → programs that interact with the Scala DataStream API
flink-table-planner → the pre-1.9 planner
flink-table-planner-blink → the Blink-based planner
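As an illustration, a pure Scala table program would only pull in the API module plus one planner. The sbt snippet below is a hypothetical sketch based on the module names above; treat the exact artifact names, version, and scopes as assumptions.

// Hypothetical sbt dependencies for a pure Scala table program on Flink 1.9.
// The artifact names mirror the modules listed above; adjust version and scope as needed.
val flinkVersion = "1.9.0"

libraryDependencies ++= Seq(
  // Table API for pure Scala table programs (no DataStream interaction)
  "org.apache.flink" %% "flink-table-api-scala" % flinkVersion,
  // One planner implementation, here the new Blink-based planner
  "org.apache.flink" %% "flink-table-planner-blink" % flinkVersion
)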
Pure Table Programs
val settings = EnvironmentSettings.newInstance()
.useBlinkPlanner()
.inBatchMode()
.build()
val env = TableEnvironment.create(settings)
env.registerCatalog("enterprise_hive", new HiveCatalog(...))
env.registerCatalog("enterprise_kafka", new SchemaRegistry(...))
env.sqlUpdate("""
INSERT INTO enterprise_hive.sensitive.customers
SELECT * FROM enterprise_kafka.topics.customer
""")
env.execute("ETL pipeline")
Deep catalog integration.
No (Stream)ExecutionEnvironment.
Goal: no batch/streaming mode + only 1 planner
Pure Table Programs
env.createTemporaryFunction("parse", classOf[JsonParser])
val data = env.sqlQuery("""
SELECT parse(json) AS customer
FROM enterprise_hive.sensitive.customers
""")
val enrichedData = data
.select('customer.flatten())
.dropColumns(12 to 34)
.addColumns('firstname + " " + 'lastname as 'fullname)
env.createView("enrichedData", enrichedData)
Unified functions for Java/Scala.
Functionality beyond SQL.
But still integrated into catalogs.
A good Type System as a Basis
FLIP-37 / FLIP-65
What is the Input and Output Type?
case class Customer(name: String, balance: BigDecimal)
class TableFunction[Row] {
def eval(customer: Customer) {
val outputRow = // ... some transformation
this.collect(outputRow)
}
}
class TableFunction[Customer] {
def eval(row: Row) {
val outputCustomer = // ... some transformation
this.collect(outputCustomer)
}
}
New Data Type Abstraction
• Decoupled from Flink’s TypeInformation and TypeSerializers
• 24 types defined with parser syntax, semantics, boundaries
• DataType = logical type + runtime class hint for edges of the API
import org.apache.flink.table.api.DataTypes._
ROW(
  FIELD("name", VARCHAR(200)),
  FIELD("balance", DECIMAL(5, 3)))

TIMESTAMP(3).bridgedTo(classOf[java.time.LocalDateTime])
New Type Inference
case class Customer(name: String, @DataTypeHint("DECIMAL(4, 2)") balance: BigDecimal)
@FunctionHint(output = @DataTypeHint("ROW<name STRING, balance DECIMAL(4, 2)>"))
class TableFunction[Row] {
def eval(customer: Customer) {
val outputRow = // ... some transformation
this.collect(outputRow)
}
}
class TableFunction[Customer] {
def eval(@DataTypeHint("ROW<name STRING, balance DECIMAL(4, 2)>") row: Row) {
val outputCustomer = // ... some transformation
this.collect(outputCustomer)
}
}
More needed?
Override getTypeInference() and
implement functions like a pro.
No gap between UDFs and system
functions anymore!
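For reference, here is a sketch of what such a manual type inference can look like. It is modeled on the API as it later stabilized out of FLIP-65, so treat the exact class and method names below as assumptions rather than the Flink 1.9 surface:

import org.apache.flink.table.api.DataTypes
import org.apache.flink.table.catalog.DataTypeFactory
import org.apache.flink.table.functions.ScalarFunction
import org.apache.flink.table.types.inference.{TypeInference, TypeStrategies}

// A scalar function that parses a string into DECIMAL(4, 2) and declares its
// argument and result types explicitly instead of relying on reflection.
class ParseBalance extends ScalarFunction {

  def eval(s: String): java.math.BigDecimal = new java.math.BigDecimal(s)

  override def getTypeInference(typeFactory: DataTypeFactory): TypeInference =
    TypeInference.newBuilder()
      .typedArguments(DataTypes.STRING())
      .outputTypeStrategy(TypeStrategies.explicit(DataTypes.DECIMAL(4, 2)))
      .build()
}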
SQL end-to-end
FLIP-66 / FLIP-68 / FLIP-69 / FLIP-59
LOAD MODULE string_utils;
LOAD MODULE ml_utils;
SET exec.auto-watermark-interval = '400 ms';
SET exec.max-parallelism = '128';
SET table.optimizer.join-reorder-enabled = 'true';
CREATE TABLE kafka_source (
user_id STRING,
log_ts TIMESTAMP(3),
WATERMARK FOR log_ts AS log_ts - INTERVAL '5' SECOND
) WITH (
'connector.type' = 'kafka',
'connector.version' = '0.10',
'connector.topic' = 'topic_name',
'format.type' = 'json'
);
CREATE TABLE kafka_sink (user_id STRING, ...) WITH (...);
INSERT INTO kafka_sink SELECT ...
What is Missing in Flink SQL?
The Flink Python Table API*
*in Flink 1.9.0
Introducing the new Python Table API
• The new Python API was introduced in Flink 1.9.0 (FLIP-38)
• The older DataSet Python API and DataStream Python API were removed in
Flink 1.9.0
Goals/Features in Flink 1.9.0
• Support relational, LINQ-style queries
written in Python
• Support SQL queries, including DDL
• Support working with existing
Table/SQL connector ecosystem
Non-Goals in Flink 1.9.0
• User-defined functions written in
Python
Python Table API Example
from pyflink.dataset import ExecutionEnvironment
from pyflink.table import TableConfig, BatchTableEnvironment

exec_env = ExecutionEnvironment.get_execution_environment()
exec_env.set_parallelism(1)
t_config = TableConfig()
t_env = BatchTableEnvironment.create(exec_env, t_config)
# connector definitions omitted
t_env.scan('mySource') \
    .group_by('word') \
    .select('word, count(1)') \
    .insert_into('mySink')
t_env.execute("WordCount in Python")
Some Assembly Required (for now)
$ mvn clean install -DskipTests -Dfast
$ cd flink-python
$ python3 setup.py sdist
$ pip install dist/*.tar.gz
• Right now, PyFlink needs to be built from source; we're working on getting it into PyPI
• Download from https://flink.apache.org/downloads.html
Defining Sources and Sinks (using builders)
t_env.connect(FileSystem().path('/tmp/input')) \
    .with_format(OldCsv()
        .line_delimiter('\n')
        .field('word', DataTypes.STRING())) \
    .with_schema(Schema()
        .field('word', DataTypes.STRING())) \
    .register_table_source('mySource')
Defining Sources and Sinks (using builders)
t_env.connect(FileSystem().path('/tmp/output')) \
    .with_format(OldCsv()
        .field_delimiter(',')
        .field('word', DataTypes.STRING())
        .field('count', DataTypes.BIGINT())) \
    .with_schema(Schema()
        .field('word', DataTypes.STRING())
        .field('mycount', DataTypes.BIGINT())) \
    .register_table_sink('mySink')
Defining Sources and Sinks (using DDL)
source_ddl = '''
    create table mySource(
        word varchar
    ) with (
        'connector.type' = 'filesystem',
        'connector.path' = '/tmp/input',
        'format.type' = 'csv',
        'format.fields.0.name' = 'word',
        'format.fields.0.type' = 'VARCHAR'
    )
'''
t_env.sql_update(source_ddl)
Defining Sources and Sinks (using DDL)
sink_ddl = '''
    create table mySink(
        word VARCHAR,
        cnt BIGINT
    ) with (
        'connector.type' = 'filesystem',
        'connector.path' = '/tmp/output',
        'format.type' = 'csv',
        'format.fields.0.name' = 'word',
        'format.fields.0.type' = 'VARCHAR',
        'format.fields.1.name' = 'mycount',
        'format.fields.1.type' = 'BIGINT'
    )
'''
t_env.sql_update(sink_ddl)
Running SQL Queries
t_env.scan('mySource') \
    .group_by('word') \
    .select('word, count(1)') \
    .insert_into('mySink')
t_env.sql_update('''
INSERT INTO mySink SELECT word, COUNT(1)
FROM mySource GROUP BY word
''')
User-defined Python Functions
A Preview of FLIP-58: User-defined Python functions
• Problem:
– Flink runs on the JVM
– Proper Python does not run on the JVM
• Solution (architecture diagram on the slide): run the user-defined Python functions in separate Python processes that communicate with the JVM; see FLIP-58 for details:
https://cwiki.apache.org/confluence/display/FLINK/FLIP-58%3A+Flink+Python+User-Defined+Stateless+Function+for+Table
Resources
• All FLIPs: https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals
• Table documentation: https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/
• Building the new Python Table API: https://ci.apache.org/projects/flink/flink-docs-release-1.9/flinkDev/building.html#build-pyflink
• Python Table API tutorial: https://ci.apache.org/projects/flink/flink-docs-release-1.9/tutorials/python_table_api.html
• Python Table API documentation: https://ci.apache.org/projects/flink/flink-docs-release-1.9/api/python/index.html
Thank you!
Questions?
www.ververica.com | @VervericaData
aljoscha@ververica.com
timo@ververica.com