Timo Walther
Apache Flink PMC
@twalthr
With slides from Fabian Hueske
Flink Meetup @ Amsterdam, March 2nd, 2017
Table & SQL API
unified APIs for batch and stream processing
Original creators of Apache Flink®
Providers of the dA Platform, a supported Flink distribution
Motivation
DataStream API is not for Everyone
§ Writing DataStream programs is not easy
• Stream processing technology spreads rapidly
§ Requires Knowledge & Skill
• Stream processing concepts (time, state, windows, ...)
• Programming experience (Java / Scala)
§ Program logic goes into UDFs
• great for expressiveness
• bad for optimization - need for manual tuning
Why not a Relational API?
§ Relational APIs are declarative
• User says what is needed
• System decides how to compute it
§ Users do not specify implementation
§ Queries are efficiently executed
§ “Everybody” knows SQL!
Goals
§ Flink is a platform for distributed stream and batch data processing
§ Relational APIs as a unifying layer
• Queries on batch tables terminate and produce a finite result
• Queries on streaming tables run continuously and produce a result stream
§ Same syntax & semantics for both kinds of queries
Table API & SQL
Table API & SQL
§ Flink features two relational APIs
• Table API: LINQ-style API for Java & Scala (since Flink 0.9.0)
• SQL: Standard SQL (since Flink 1.1.0)
§ Equivalent feature set (at the moment)
• Table API and SQL can be mixed
§ Both are tightly integrated with Flink’s core APIs
• DataStream
• DataSet
Table API Example
val sensorData: DataStream[(String, Long, Double)] = ???
// convert DataStream into Table
val sensorTable: Table = sensorData
  .toTable(tableEnv, 'location, 'time, 'tempF)
// define query on Table
val avgTempCTable: Table = sensorTable
  .window(Tumble over 1.day on 'rowtime as 'w)
  .groupBy('location, 'w)
  .select('w.start as 'day, 'location,
    (('tempF.avg - 32) * 0.556) as 'avgTempC)
  .where('location like "room%")
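To make the semantics of this query concrete, here is a minimal plain-Scala sketch (no Flink dependency) of what it computes on a finite batch of readings. The object name DailyAvgSketch and its helpers are illustrative assumptions, not part of Flink. Timestamps are assumed to be non-negative epoch milliseconds in UTC, and the filter is applied before grouping, which gives the same result here because location is a grouping key.

```scala
object DailyAvgSketch {
  // (location, epoch millis, temperature in Fahrenheit), mirroring the tuple type on the slide
  type Reading = (String, Long, Double)

  val MillisPerDay: Long = 24L * 60 * 60 * 1000

  // Start of the one-day tumbling window that contains the timestamp
  // (assumes a non-negative timestamp).
  def dayStart(ts: Long): Long = ts - (ts % MillisPerDay)

  // Per (day, location): average tempF converted with the slide's factor 0.556,
  // keeping only locations that start with "room".
  def avgTempC(readings: Seq[Reading]): Map[(Long, String), Double] =
    readings
      .filter { case (location, _, _) => location.startsWith("room") }
      .groupBy { case (location, ts, _) => (dayStart(ts), location) }
      .map { case (key, group) =>
        val avgF = group.map(_._3).sum / group.size
        key -> ((avgF - 32) * 0.556)
      }

  def main(args: Array[String]): Unit = {
    val readings = Seq(
      ("room1", 0L, 50.0),
      ("room1", 1000L, 68.0),
      ("hall", 2000L, 70.0),
      ("room1", MillisPerDay + 5L, 32.0)
    )
    avgTempC(readings).toSeq.sortBy(_._1).foreach(println)
  }
}
```

On a stream, the same per-window result would be produced continuously, one row per location and day, instead of once at the end.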
SQL Example
val sensorData: DataStream[(String, Long, Double)] = ???
// register DataStream
tableEnv.registerDataStream(
  "sensorData", sensorData, 'location, 'time, 'tempF)
// query registered Table
val avgTempCTable: Table = tableEnv
  .sql("""
    SELECT FLOOR(rowtime() TO DAY) AS day, location,
      AVG((tempF - 32) * 0.556) AS avgTempC
    FROM sensorData
    WHERE location LIKE 'room%'
    GROUP BY location, FLOOR(rowtime() TO DAY) """)
Architecture
2 APIs [SQL, Table API] × 2 backends [DataStream, DataSet] = 4 different translation paths?
Architecture
Architecture
§ Table API and SQL queries are translated into a common logical plan representation.
§ Logical plans are translated and optimized depending on the execution backend.
§ Plans are transformed into DataSet or DataStream programs.
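The idea of one shared logical plan can be sketched in a few lines of plain Scala. This is a toy model under stated assumptions, not Flink's planner: the plan node types, the TableExpr wrapper, fromSql (which takes already-parsed clauses instead of parsing SQL text), and the string-producing translate function are all hypothetical names introduced for illustration.

```scala
object PlanSketch {
  // A toy logical plan; a real planner has many more node types.
  sealed trait LogicalPlan
  final case class Scan(table: String) extends LogicalPlan
  final case class Filter(input: LogicalPlan, predicate: String) extends LogicalPlan
  final case class Project(input: LogicalPlan, fields: List[String]) extends LogicalPlan

  // Hypothetical "Table API" front-end: method chaining builds the plan.
  final case class TableExpr(plan: LogicalPlan) {
    def where(predicate: String): TableExpr = TableExpr(Filter(plan, predicate))
    def select(fields: String*): TableExpr = TableExpr(Project(plan, fields.toList))
  }
  def scan(table: String): TableExpr = TableExpr(Scan(table))

  // Hypothetical "SQL" front-end: already-parsed clauses build the same plan.
  def fromSql(select: List[String], from: String, where: String): LogicalPlan =
    Project(Filter(Scan(from), where), select)

  sealed trait Backend
  case object Batch extends Backend
  case object Streaming extends Backend

  // One translation per backend, shared by both front-ends.
  // The "program" is rendered as a string to keep the sketch self-contained.
  def translate(plan: LogicalPlan, backend: Backend): String = {
    val prefix = backend match {
      case Batch     => "DataSet"
      case Streaming => "DataStream"
    }
    def go(p: LogicalPlan): String = p match {
      case Scan(t)          => s"$prefix($t)"
      case Filter(in, pred) => s"${go(in)}.filter($pred)"
      case Project(in, fs)  => s"${go(in)}.map(${fs.mkString(", ")})"
    }
    go(plan)
  }
}
```

Because both front-ends meet at the same plan type, only two translations (one per backend) are needed rather than four.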
Translation to Logical Plan
sensorTable
  .window(Tumble over 1.day on 'rowtime as 'w)
  .groupBy('location, 'w)
  .select('w.start as 'day, 'location,
    (('tempF.avg - 32) * 0.556) as 'avgTempC)
  .where('location like "room%")
Translation to Optimized Plan
Translation to Flink Program
Current State (in master)
§ Batch SQL & Table API support
• Selection, Projection, Sort, Inner & Outer Joins, Set operations
• Windows for Slide, Tumble, Session
§ Streaming Table API support
• Selection, Projection, Union
• Windows for Slide, Tumble, Session
§ Streaming SQL
• Selection, Projection, Union, Tumble, but …
Use Cases for Streaming SQL
§ Continuous ETL & Data Import
§ Live Dashboards & Reports
§ Ad-hoc Analytics & Exploration
Outlook: Dynamic Tables
Dynamic Tables
§ Dynamic tables change over time
§ Dynamic tables are treated like static batch tables
• Dynamic tables are queried with standard SQL
• A query returns another dynamic table
§ Stream ←→ Dynamic Table conversions without information loss
• “Stream / Table Duality”
Stream to Dynamic Tables
§ Append:
§ Replace by key:
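The two conversion modes named on this slide can be sketched with plain Scala collections. This is a minimal model under assumptions: a finite Seq stands in for the stream, and the names StreamToTableSketch, appendMode and replaceByKey are hypothetical.

```scala
object StreamToTableSketch {
  // Append mode: every stream record becomes a new row of the table.
  def appendMode[A](stream: Seq[A]): Vector[A] =
    stream.foldLeft(Vector.empty[A])((table, record) => table :+ record)

  // Replace-by-key mode: a record overwrites the existing row with the
  // same key, or is inserted if the key is new.
  def replaceByKey[K, V](stream: Seq[(K, V)]): Map[K, V] =
    stream.foldLeft(Map.empty[K, V]) { case (table, (key, value)) =>
      table.updated(key, value)
    }
}
```

In append mode the table only grows, while in replace-by-key mode it holds at most one row per key.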
Querying Dynamic Tables
§ Dynamic tables change over time
• A[t]: Table A at time t
§ Dynamic tables are queried with regular SQL
• Result of a query changes as input table changes
• q(A[t]): Evaluate query q on table A at time t
§ Query result is continuously updated as t progresses
• Similar to maintaining a materialized view
• t is current event time
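The notation q(A[t]) can be illustrated with a small plain-Scala sketch. It assumes a hypothetical (user, clicks) schema and a grouped sum as the query q, and it simplifies time to arrival order rather than event time. The incrementally maintained view is meant to equal the result of re-running q on the full snapshot at every step, which is the materialized-view analogy from the slide.

```scala
object ContinuousQuerySketch {
  type Row = (String, Int) // (user, clicks): hypothetical schema

  // q: SELECT user, SUM(clicks) ... GROUP BY user, evaluated on a full snapshot A[t].
  def q(snapshot: Seq[Row]): Map[String, Int] =
    snapshot.groupBy(_._1).map { case (user, rows) => user -> rows.map(_._2).sum }

  // Incremental maintenance: fold one arriving row into the previous result.
  def update(view: Map[String, Int], row: Row): Map[String, Int] =
    view.updated(row._1, view.getOrElse(row._1, 0) + row._2)

  // The result after each arrival: q(A[1]), q(A[2]), ...
  def results(stream: Seq[Row]): Seq[Map[String, Int]] =
    stream.scanLeft(Map.empty[String, Int])(update).tail
}
```

The incremental path only touches the row for the affected user, whereas re-evaluating q scans the whole snapshot each time.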
Querying Dynamic Tables
Querying Dynamic Tables
§ Can we run any query on Dynamic Tables? No!
§ State must not grow infinitely as more data arrives
• Set a clean-up timeout or key constraints.
§ New input may trigger only partial re-computation
§ Queries with possibly unbounded state or computation are rejected
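One way to picture a clean-up timeout is per-key state that is dropped once a key has been idle for too long. The sketch below is an assumption about how such a policy could look, not Flink's implementation; the names StateCleanupSketch, Entry and step are hypothetical, and time is a plain Long supplied by the caller.

```scala
object StateCleanupSketch {
  // Per-key running count plus the time the key was last updated.
  final case class Entry(count: Long, lastSeen: Long)

  // Fold one record into the state, then drop keys that have been idle
  // for longer than the timeout.
  def step(state: Map[String, Entry], key: String, now: Long, timeout: Long): Map[String, Entry] = {
    val previousCount = state.get(key).map(_.count).getOrElse(0L)
    val updated = state.updated(key, Entry(previousCount + 1, now))
    updated.filter { case (_, entry) => now - entry.lastSeen <= timeout }
  }
}
```

The trade-off in this sketch is that a key which reappears after eviction starts again from zero, so state stays bounded at the cost of exact results for long-idle keys.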
Dynamic Tables to Stream
§ Update:
Dynamic Tables to Stream
§ Add/Retract:
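The add/retract encoding can be sketched in plain Scala: each change to a result row is emitted as a retraction of the old version followed by an addition of the new one. This is a minimal model under assumptions (a String-keyed table with Int values; the names TableToStreamSketch, Add, Retract, addRetract and replay are hypothetical), and it covers only the add/retract mode, not the update mode.

```scala
object TableToStreamSketch {
  sealed trait Change
  final case class Add(key: String, value: Int) extends Change
  final case class Retract(key: String, value: Int) extends Change

  // Encode the difference between two versions of a result table.
  // Unchanged rows emit nothing; a changed row emits Retract(old) then Add(new).
  def addRetract(before: Map[String, Int], after: Map[String, Int]): List[Change] =
    (before.keySet ++ after.keySet).toList.sorted.flatMap { key =>
      (before.get(key), after.get(key)) match {
        case (Some(o), Some(n)) if o == n => Nil
        case (oldValue, newValue) =>
          val retractions: List[Change] = oldValue.map(v => Retract(key, v)).toList
          val additions: List[Change] = newValue.map(v => Add(key, v)).toList
          retractions ++ additions
      }
    }

  // Consumer side: replaying the changes on the old table yields the new one.
  def replay(table: Map[String, Int], changes: Seq[Change]): Map[String, Int] =
    changes.foldLeft(table) {
      case (t, Add(k, v))     => t.updated(k, v)
      case (t, Retract(k, _)) => t - k
    }
}
```

Replaying the emitted changes reconstructs the new table version, which is the sense in which the conversion loses no information.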
Result computation & refinement
Contributions welcome!
§ Huge interest and many contributors
• Adding more window operators
• Introducing dynamic tables
§ And there is a lot more to do
• New operators and features for streaming and batch
• Performance improvements
• Tooling and integration
§ Try it out, give feedback, and start contributing!
One day of hands-on Flink training
One day of conference
Tickets are on sale
Please visit our website:
http://sf.flink-forward.org
Follow us on Twitter:
@FlinkForward
We are hiring!
data-artisans.com/careers
Thank you!
@twalthr
@ApacheFlink
@dataArtisans

Automated software refactoring with OpenRewrite and Generative AI.pptx.pdfAutomated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
 
原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
 
What is Master Data Management by PiLog Group
What is Master Data Management by PiLog GroupWhat is Master Data Management by PiLog Group
What is Master Data Management by PiLog Group
 
Transform Your Communication with Cloud-Based IVR Solutions
Transform Your Communication with Cloud-Based IVR SolutionsTransform Your Communication with Cloud-Based IVR Solutions
Transform Your Communication with Cloud-Based IVR Solutions
 
Oracle Database 19c New Features for DBAs and Developers.pptx
Oracle Database 19c New Features for DBAs and Developers.pptxOracle Database 19c New Features for DBAs and Developers.pptx
Oracle Database 19c New Features for DBAs and Developers.pptx
 
8 Best Automated Android App Testing Tool and Framework in 2024.pdf
8 Best Automated Android App Testing Tool and Framework in 2024.pdf8 Best Automated Android App Testing Tool and Framework in 2024.pdf
8 Best Automated Android App Testing Tool and Framework in 2024.pdf
 

Apache Flink's Table & SQL API - unified APIs for batch and stream processing

  • 1. Table & SQL API: unified APIs for batch and stream processing
      Timo Walther, Apache Flink PMC, @twalthr
      With slides from Fabian Hueske
      Flink Meetup @ Amsterdam, March 2nd, 2017
  • 2. Original creators of Apache Flink®. Providers of the dA Platform, a supported Flink distribution.
  • 4. DataStream API is not for Everyone
      § Writing DataStream programs is not easy
        • Stream processing technology spreads rapidly
      § Requires Knowledge & Skill
        • Stream processing concepts (time, state, windows, ...)
        • Programming experience (Java / Scala)
      § Program logic goes into UDFs
        • great for expressiveness
        • bad for optimization - need for manual tuning
  • 5. Why not a Relational API?
      § Relational APIs are declarative
        • User says what is needed
        • System decides how to compute it
      § Users do not specify implementation
      § Queries are efficiently executed
      § “Everybody” knows SQL!
  • 6. Goals
      § Flink is a platform for distributed stream and batch data processing
      § Relational APIs as a unifying layer
        • Queries on batch tables terminate and produce a finite result
        • Queries on streaming tables run continuously and produce a result stream
      § Same syntax & semantics for both queries
  • 7. Table API & SQL
  • 8. Table API & SQL
      § Flink features two relational APIs
        • Table API: LINQ-style API for Java & Scala (since Flink 0.9.0)
        • SQL: Standard SQL (since Flink 1.1.0)
      § Equivalent feature set (at the moment)
        • Table API and SQL can be mixed
      § Both are tightly integrated with Flink’s core APIs
        • DataStream
        • DataSet
  • 9. Table API Example
      val sensorData: DataStream[(String, Long, Double)] = ???
      // convert DataStream into Table
      val sensorTable: Table = sensorData
        .toTable(tableEnv, 'location, 'time, 'tempF)
      // define query on Table
      val avgTempCTable: Table = sensorTable
        .window(Tumble over 1.day on 'rowtime as 'w)
        .groupBy('location, 'w)
        .select('w.start as 'day, 'location,
          (('tempF.avg - 32) * 0.556) as 'avgTempC)
        .where('location like "room%")
  • 10. SQL Example
      val sensorData: DataStream[(String, Long, Double)] = ???
      // register DataStream
      tableEnv.registerDataStream(
        "sensorData", sensorData, 'location, 'time, 'tempF)
      // query registered Table
      val avgTempCTable: Table = tableEnv
        .sql("""
          SELECT FLOOR(rowtime() TO DAY) AS day, location,
            AVG((tempF - 32) * 0.556) AS avgTempC
          FROM sensorData
          WHERE location LIKE 'room%'
          GROUP BY location, FLOOR(rowtime() TO DAY)
          """)
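To make concrete what the two example queries compute, here is a minimal sketch on a bounded sample using plain Scala collections rather than Flink. The names `DailyAvgSketch`, `Reading`, and `avgTempC` are hypothetical, and the sketch assumes timestamps are epoch milliseconds with days cut at UTC midnight. Because averaging is linear, converting the average (Table API example) and averaging the converted values (SQL example) give the same result, up to floating-point rounding.

```scala
object DailyAvgSketch {
  // Assumed record layout: (location, timestamp in epoch milliseconds, temperature in Fahrenheit)
  type Reading = (String, Long, Double)

  val MillisPerDay: Long = 24L * 60 * 60 * 1000

  // Keep locations starting with "room", group by (location, day index),
  // average tempF per group, then convert with the slide's constant 0.556 (about 5/9).
  def avgTempC(readings: Seq[Reading]): Map[(String, Long), Double] =
    readings
      .filter { case (location, _, _) => location.startsWith("room") }
      .groupBy { case (location, ts, _) => (location, Math.floorDiv(ts, MillisPerDay)) }
      .map { case (key, group) =>
        val avgF = group.map(_._3).sum / group.size
        key -> ((avgF - 32) * 0.556)
      }
}
```

On a stream, the same computation would run continuously and emit one result per location and day as each daily window closes, instead of returning a finished map.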
  • 11. Architecture
      2 APIs [SQL, Table API] * 2 backends [DataStream, DataSet] = 4 different translation paths?
  • 13. Architecture
      § Table API and SQL queries are translated into a common logical plan representation.
      § Logical plans are translated and optimized depending on the execution backend.
      § Plans are transformed into DataSet or DataStream programs.
  • 14. Translation to Logical Plan
      sensorTable
        .window(Tumble over 1.day on 'rowtime as 'w)
        .groupBy('location, 'w)
        .select('w.start as 'day, 'location,
          (('tempF.avg - 32) * 0.556) as 'avgTempC)
        .where('location like "room%")
  • 16. Translation to Flink Program
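The idea of one shared logical plan feeding two backends can be illustrated with a toy model. This is only a sketch under stated assumptions: every name here (`Plan`, `Scan`, `Filter`, `Project`, `TableExpr`, `translate`) is hypothetical, predicates are plain strings, and the "translation" merely renders text. It does not reflect Flink's actual planner classes.

```scala
object LogicalPlanSketch {
  // A tiny logical plan: a tree of relational operators.
  sealed trait Plan
  final case class Scan(table: String) extends Plan
  final case class Filter(input: Plan, predicate: String) extends Plan
  final case class Project(input: Plan, fields: List[String]) extends Plan

  // Hypothetical fluent front end: each call wraps the current plan in a new node.
  // A SQL front end would build the same tree from parsed query text.
  final case class TableExpr(plan: Plan) {
    def where(predicate: String): TableExpr = TableExpr(Filter(plan, predicate))
    def select(fields: String*): TableExpr = TableExpr(Project(plan, fields.toList))
  }

  sealed trait Backend
  case object Batch extends Backend
  case object Streaming extends Backend

  // Backend-specific translation: the same plan is rendered for either target.
  def translate(plan: Plan, backend: Backend): String = {
    val target = backend match {
      case Batch     => "DataSet"
      case Streaming => "DataStream"
    }
    def render(p: Plan): String = p match {
      case Scan(t)          => s"$target.source($t)"
      case Filter(in, pred) => s"${render(in)}.filter($pred)"
      case Project(in, fs)  => s"${render(in)}.map(${fs.mkString(", ")})"
    }
    render(plan)
  }
}
```

The point of the design is that the front ends only need to agree on the plan tree, so two APIs and two backends need two front-end translations and two back-end translations rather than four separate paths.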
  • 17. Current State (in master)
      § Batch SQL & Table API support
        • Selection, Projection, Sort, Inner & Outer Joins, Set operations
        • Windows for Slide, Tumble, Session
      § Streaming Table API support
        • Selection, Projection, Union
        • Windows for Slide, Tumble, Session
      § Streaming SQL
        • Selection, Projection, Union, Tumble, but …
  • 18. Use Cases for Streaming SQL
      § Continuous ETL & Data Import
      § Live Dashboards & Reports
      § Ad-hoc Analytics & Exploration
  • 20. Dynamic Tables
      § Dynamic tables change over time
      § Dynamic tables are treated like static batch tables
        • Dynamic tables are queried with standard SQL
        • A query returns another dynamic table
      § Stream ←→ Dynamic Table conversions without information loss
        • “Stream / Table Duality”
  • 21. Stream to Dynamic Tables
      § Append:
      § Replace by key:
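The two stream-to-table modes named on this slide can be sketched with plain Scala collections. This is an illustration only, not Flink code: `StreamToTableSketch`, `Row`, `appendMode`, and `replaceByKey` are hypothetical names, and a finite `Seq` stands in for the stream.

```scala
object StreamToTableSketch {
  final case class Row(key: String, value: Int)

  // Append mode: every stream record becomes a new row, so the table only grows.
  def appendMode(stream: Seq[Row]): Vector[Row] =
    stream.foldLeft(Vector.empty[Row])((table, row) => table :+ row)

  // Replace-by-key mode: a record overwrites the existing row with the same key,
  // so the table holds at most one row per key.
  def replaceByKey(stream: Seq[Row]): Map[String, Row] =
    stream.foldLeft(Map.empty[String, Row])((table, row) => table.updated(row.key, row))
}
```

For the input `Row("a", 1), Row("b", 2), Row("a", 3)`, append mode keeps all three rows, while replace-by-key keeps only the latest row for `"a"` and the row for `"b"`.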
  • 22. Querying Dynamic Tables
      § Dynamic tables change over time
        • A[t]: Table A at time t
      § Dynamic tables are queried with regular SQL
        • Result of a query changes as input table changes
        • q(A[t]): Evaluate query q on table A at time t
      § Query result is continuously updated as t progresses
        • Similar to maintaining a materialized view
        • t is current event time
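The notation q(A[t]) can be made concrete with a small sketch. It assumes an append-mode table and uses a grouped count as the query; `ContinuousQuerySketch`, `Event`, `tableAt`, `q`, and `results` are hypothetical names. Note that it recomputes the query from scratch for every t, which is only meant to define the expected result. A real engine would maintain the result incrementally, as with a materialized view.

```scala
object ContinuousQuerySketch {
  final case class Event(time: Long, user: String)

  // A[t]: the append-mode table holding all events with event time <= t.
  def tableAt(events: Seq[Event], t: Long): Seq[Event] =
    events.filter(_.time <= t)

  // q: a grouped count, comparable to SELECT user, COUNT(*) FROM A GROUP BY user.
  def q(table: Seq[Event]): Map[String, Int] =
    table.groupBy(_.user).map { case (user, rows) => user -> rows.size }

  // q(A[t]) for each t in `times`: the sequence of results as time progresses.
  def results(events: Seq[Event], times: Seq[Long]): Seq[Map[String, Int]] =
    times.map(t => q(tableAt(events, t)))
}
```

With events for users a, b, a at times 1, 2, 3, the result evolves from `a -> 1`, to `a -> 1, b -> 1`, to `a -> 2, b -> 1`: the query result is itself a table that changes over time.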
  • 24. Querying Dynamic Tables
      § Can we run any query on Dynamic Tables? No!
      § State may not grow infinitely as more data arrives
        • Set clean-up timeout or key constraints.
      § Input may only trigger partial re-computation
      § Queries with possibly unbounded state or computation are rejected
  • 25. Dynamic Tables to Stream
      § Update:
  • 26. Dynamic Tables to Stream
      § Add/Retract:
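The slides name two ways to turn a changing result table back into a stream, and their illustrations are not part of this transcript. The following sketch shows one plausible reading for a per-key count. It is an assumption, not the talk's definition: the names `TableToStreamSketch`, `Change`, `Add`, `Retract`, `addRetract`, and `update` are hypothetical. In the update encoding each change is a single message that replaces the previous value for its key, which requires a unique key. In the add/retract encoding a changed row is expressed as a retraction of the old row followed by an addition of the new one.

```scala
object TableToStreamSketch {
  sealed trait Change
  final case class Add(key: String, count: Int) extends Change
  final case class Retract(key: String, count: Int) extends Change

  // Add/retract encoding of a running count per key:
  // a changed count retracts the old row, then adds the new row.
  def addRetract(keys: Seq[String]): Vector[Change] =
    keys.foldLeft((Map.empty[String, Int], Vector.empty[Change])) {
      case ((counts, changes), key) =>
        val next = counts.getOrElse(key, 0) + 1
        val retraction = counts.get(key).map(old => Retract(key, old)).toVector
        (counts.updated(key, next), (changes ++ retraction) :+ Add(key, next))
    }._2

  // Update encoding: one (key, newCount) message per change; the receiver
  // overwrites whatever it previously held for that key.
  def update(keys: Seq[String]): Vector[(String, Int)] =
    keys.foldLeft((Map.empty[String, Int], Vector.empty[(String, Int)])) {
      case ((counts, out), key) =>
        val next = counts.getOrElse(key, 0) + 1
        (counts.updated(key, next), out :+ (key -> next))
    }._2
}
```

The update form sends fewer messages, while the add/retract form also works when the result has no unique key, since the receiver is told exactly which row to remove.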
  • 27. Result computation & refinement
  • 28. Contributions welcome!
      § Huge interest and many contributors
        • Adding more window operators
        • Introducing dynamic tables
      § And there is a lot more to do
        • New operators and features for streaming and batch
        • Performance improvements
        • Tooling and integration
      § Try it out, give feedback, and start contributing!
  • 29. Flink Forward
      One day of hands-on Flink training
      One day of conference
      Tickets are on sale
      Please visit our website: http://sf.flink-forward.org
      Follow us on Twitter: @FlinkForward