SlideShare a Scribd company logo
Writing an interactive interface for SQL on Flink
How and why we created SQLStreamBuilder—and the lessons learned along the way
Kenny Gorman
Co-Founder and CEO
www.eventador.io
2019 Flink Forward Berlin
Background and motivations
● Eventador.io has offered a managed Flink runtime for a few years now. We
started to see some customer patterns emerge.
● The state of the art today is to write Flink jobs in Java or Scala using the
DataStream/Set API and/or the Table API’s.
● While powerful, the time and expertise needed isn’t trivial. Adoption and time to
market lags.
● Teams are busy writing code. Completely swamped to be precise.
Why SQL anyway?
● SQL is > 30 years old. It’s massively useful for inspecting and reasoning about
data. Everyone knows SQL.
● It’s declarative, just ask for what you want to see.
● It’s been extended to accommodate streaming constructs like windows
(Flink/Calcite).
● Streaming SQL never completes, it’s a query on boundless data.
● It’s an amazing way to interact with streaming data.
Of workloads could be represented
with SQL, and we plan to grow that.
Require more complex logic best
represented in Java/Scala.
80%
20%
What if we could go beyond simply building
processors in SQL - do it interactively, manage
schema’s and make it all easy?
Could building logic on streams be as productive
and intuitive as using a database yet as scalable
and powerful as Flink?
Eventador SQLStreamBuilder
● Interactive SQL editor - create and submit any Flink compatible SQL
● Virtual Table Registry - source/sink + schema definition
● Query Parser - Gives instant feedback
● Job payload management - Builds job payloads
● Flink runner - Takes the payload and runs the job
● Delivered as a cloud service - in your AWS account
Feedback on SQL
execution
Where do I send results?
Where to run the job
The SQL statement Sampling rather than a
result-set
A sample of results
in browser
Schema management - Virtual Table Registry
● SQL requires a schema of typed
columns - streams don’t have have to
have this.
● It’s common to use AVRO (easy to solve
for) but also free-form JSON
● Free form means - a total F**ing mess.
● Sources - Kafka/Kinesis (soon)
● Sinks - Kafka,S3, JDBC, ELK (soon)
SQLStreamBuilder Components
● Interactive SQL interface
○ Handles query creation and submission.
○ Handles feedback from SQLIO
○ Interface to build queries, sources and sinks
○ Python + Vue.js
○ Results are sampled back to interface
● SQL engine (SQLIO)
○ Parse incoming statements
○ Map data sources/sinks
○ Parse schema (Schema Registry+AVRO / JSON)
○ Build POJOs
○ Submit payload to runner (Flink)
○ Java
● Virtual Table Registry
○ Creation of schema for streams
○ AVRO + JSON
○ Python
SQLStreamBuilder (con’t)
● Job Management Interface
○ Stop/Start/Edit/etc
○ Python + Vue.js
○ Uses Flink APIs
● Builder
○ Handles creation of assets via K8’s
○ Python
○ PostgreSQL backend
○ Kubernetes orchestration
● Flink runner
○ Run jobs on Flink 1.8.2
○ Kubernetes orchestration
○ Any Flink compatible SQL statement
Query Lifecycle - Parse
HTTPS POST {
“schema”: { },
“topic”: “myTopic”,
“virtualTable: “myTable”,
“sqlStmt”: “SELECT ….”
}
SQLIO
RESULT (if error)
{ “errorMsg”: “unable to find table myTable” }
final StreamExecutionEnvironment env =
StreamExecutionEnvironment.getExecutionEnvironment();
...
Table parse = mytopic
.select()
.sort()
...
- Validate SQL against Flink/Calcite
- JSON -> AVRO
- AVRO Schema
- Generate Classes (POJO’s)
SQL console
Query Lifecycle - Execute
SQLIO
Apache Kafka / Socket.io
SQL console
Column Column Column
Value Value Value
- If class exists
- class, method, params
.connect(
new Kafka()
.version("0.11")
.topic("...")
.sinkPartitionerXX
result.writeToSink(..);
env.execute(..);
- Enhanced schema typing
- Enhanced feedback/logging
- Sends base64 encoded payload to Flink
Job
SAMPLE THE DATA TO USER
SQL join streams from multiple clusters/types
Write to multiple types of sinks, building complex
processing pipelines
Aggregate data before pushing to expensive/slow
database endpoints
Conditionally write to multiple S3 buckets
Building Processing Environments
SELECT * FROM sensors
JOIN account_info ON ...
SELECT sensorid, max(temp)
FROM stream
GROUP BY sensorid, tumble(..)
SELECT sensorid, region
FROM stream
WHERE region IN [...]
s3://xxx/yyys3://xxx/yyy
sms
SELECT * FROM table
WHERE user_selected_thing = ‘foo’;
SELECT sensorid, message
FROM stream
WHERE is_alert = ‘t’ Data Science
ML Team(s)
SnowFlakeDB
or other Data
Warehouse
Javascript User Functions - Introduced Today
function ICAO_lookup(icao) {
try {
var c = new java.net.URL('http://tornado.beebe.cc/' + icao).openConnection();
c.requestMethod='GET';
var reader = new java.io.BufferedReader(new java.io.InputStreamReader(c.inputStream));
return reader.readLine();
} catch(err) {
return "Unknown: " + err;
}
}
ICAO_lookup($p0);
Whats next?
We are hiring
Number of changes for
community evaluation
Whats Next?
● Read Consistent (Snapshot) Sink
● Auto-detect and type schema
● SQL API
● Intelligent Auto-scale
● AWS Spot instance support/management
Streams
Are
The
Database
Thank You
hello@eventador.io

More Related Content

What's hot

Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the real World ...
Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the real World ...Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the real World ...
Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the real World ...
Flink Forward
 
Building an analytics workflow using Apache Airflow
Building an analytics workflow using Apache AirflowBuilding an analytics workflow using Apache Airflow
Building an analytics workflow using Apache Airflow
Yohei Onishi
 
Flink Forward San Francisco 2019: Massive Scale Data Processing at Netflix us...
Flink Forward San Francisco 2019: Massive Scale Data Processing at Netflix us...Flink Forward San Francisco 2019: Massive Scale Data Processing at Netflix us...
Flink Forward San Francisco 2019: Massive Scale Data Processing at Netflix us...
Flink Forward
 
Enhancements in Java 9 Streams
Enhancements in Java 9 StreamsEnhancements in Java 9 Streams
Enhancements in Java 9 Streams
Corneil du Plessis
 
Flink Forward Berlin 2017: Zohar Mizrahi - Python Streaming API
Flink Forward Berlin 2017: Zohar Mizrahi - Python Streaming APIFlink Forward Berlin 2017: Zohar Mizrahi - Python Streaming API
Flink Forward Berlin 2017: Zohar Mizrahi - Python Streaming API
Flink Forward
 
Flink Forward SF 2017: Feng Wang & Zhijiang Wang - Runtime Improvements in Bl...
Flink Forward SF 2017: Feng Wang & Zhijiang Wang - Runtime Improvements in Bl...Flink Forward SF 2017: Feng Wang & Zhijiang Wang - Runtime Improvements in Bl...
Flink Forward SF 2017: Feng Wang & Zhijiang Wang - Runtime Improvements in Bl...
Flink Forward
 
Unify Enterprise Data Processing System Platform Level Integration of Flink a...
Unify Enterprise Data Processing System Platform Level Integration of Flink a...Unify Enterprise Data Processing System Platform Level Integration of Flink a...
Unify Enterprise Data Processing System Platform Level Integration of Flink a...
Flink Forward
 
Flink Forward SF 2017: Shaoxuan Wang_Xiaowei Jiang - Blinks Improvements to F...
Flink Forward SF 2017: Shaoxuan Wang_Xiaowei Jiang - Blinks Improvements to F...Flink Forward SF 2017: Shaoxuan Wang_Xiaowei Jiang - Blinks Improvements to F...
Flink Forward SF 2017: Shaoxuan Wang_Xiaowei Jiang - Blinks Improvements to F...
Flink Forward
 
Kafka Practices @ Uber - Seattle Apache Kafka meetup
Kafka Practices @ Uber - Seattle Apache Kafka meetupKafka Practices @ Uber - Seattle Apache Kafka meetup
Kafka Practices @ Uber - Seattle Apache Kafka meetup
Mingmin Chen
 
AIRflow at Scale
AIRflow at ScaleAIRflow at Scale
AIRflow at Scale
Digital Vidya
 
Apache Airflow overview
Apache Airflow overviewApache Airflow overview
Apache Airflow overview
NikolayGrishchenkov
 
Flink Forward Berlin 2017: Piotr Wawrzyniak - Extending Apache Flink stream p...
Flink Forward Berlin 2017: Piotr Wawrzyniak - Extending Apache Flink stream p...Flink Forward Berlin 2017: Piotr Wawrzyniak - Extending Apache Flink stream p...
Flink Forward Berlin 2017: Piotr Wawrzyniak - Extending Apache Flink stream p...
Flink Forward
 
Flink Forward SF 2017: Joe Olson - Using Flink and Queryable State to Buffer ...
Flink Forward SF 2017: Joe Olson - Using Flink and Queryable State to Buffer ...Flink Forward SF 2017: Joe Olson - Using Flink and Queryable State to Buffer ...
Flink Forward SF 2017: Joe Olson - Using Flink and Queryable State to Buffer ...
Flink Forward
 
Streaming your Lyft Ride Prices - Flink Forward SF 2019
Streaming your Lyft Ride Prices - Flink Forward SF 2019Streaming your Lyft Ride Prices - Flink Forward SF 2019
Streaming your Lyft Ride Prices - Flink Forward SF 2019
Thomas Weise
 
Airflow introduction
Airflow introductionAirflow introduction
Airflow introduction
Chandler Huang
 
SAIS2018 - Fact Store At Netflix Scale
SAIS2018 - Fact Store At Netflix ScaleSAIS2018 - Fact Store At Netflix Scale
SAIS2018 - Fact Store At Netflix Scale
Nitin S
 
Real-Time Stream Processing with KSQL and Apache Kafka
Real-Time Stream Processing with KSQL and Apache KafkaReal-Time Stream Processing with KSQL and Apache Kafka
Real-Time Stream Processing with KSQL and Apache Kafka
confluent
 
Advanced Data Science with Apache Spark-(Reza Zadeh, Stanford)
Advanced Data Science with Apache Spark-(Reza Zadeh, Stanford)Advanced Data Science with Apache Spark-(Reza Zadeh, Stanford)
Advanced Data Science with Apache Spark-(Reza Zadeh, Stanford)
Spark Summit
 
Scaling up uber's real time data analytics
Scaling up uber's real time data analyticsScaling up uber's real time data analytics
Scaling up uber's real time data analytics
Xiang Fu
 
Apache Airflow
Apache AirflowApache Airflow
Apache Airflow
Knoldus Inc.
 

What's hot (20)

Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the real World ...
Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the real World ...Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the real World ...
Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the real World ...
 
Building an analytics workflow using Apache Airflow
Building an analytics workflow using Apache AirflowBuilding an analytics workflow using Apache Airflow
Building an analytics workflow using Apache Airflow
 
Flink Forward San Francisco 2019: Massive Scale Data Processing at Netflix us...
Flink Forward San Francisco 2019: Massive Scale Data Processing at Netflix us...Flink Forward San Francisco 2019: Massive Scale Data Processing at Netflix us...
Flink Forward San Francisco 2019: Massive Scale Data Processing at Netflix us...
 
Enhancements in Java 9 Streams
Enhancements in Java 9 StreamsEnhancements in Java 9 Streams
Enhancements in Java 9 Streams
 
Flink Forward Berlin 2017: Zohar Mizrahi - Python Streaming API
Flink Forward Berlin 2017: Zohar Mizrahi - Python Streaming APIFlink Forward Berlin 2017: Zohar Mizrahi - Python Streaming API
Flink Forward Berlin 2017: Zohar Mizrahi - Python Streaming API
 
Flink Forward SF 2017: Feng Wang & Zhijiang Wang - Runtime Improvements in Bl...
Flink Forward SF 2017: Feng Wang & Zhijiang Wang - Runtime Improvements in Bl...Flink Forward SF 2017: Feng Wang & Zhijiang Wang - Runtime Improvements in Bl...
Flink Forward SF 2017: Feng Wang & Zhijiang Wang - Runtime Improvements in Bl...
 
Unify Enterprise Data Processing System Platform Level Integration of Flink a...
Unify Enterprise Data Processing System Platform Level Integration of Flink a...Unify Enterprise Data Processing System Platform Level Integration of Flink a...
Unify Enterprise Data Processing System Platform Level Integration of Flink a...
 
Flink Forward SF 2017: Shaoxuan Wang_Xiaowei Jiang - Blinks Improvements to F...
Flink Forward SF 2017: Shaoxuan Wang_Xiaowei Jiang - Blinks Improvements to F...Flink Forward SF 2017: Shaoxuan Wang_Xiaowei Jiang - Blinks Improvements to F...
Flink Forward SF 2017: Shaoxuan Wang_Xiaowei Jiang - Blinks Improvements to F...
 
Kafka Practices @ Uber - Seattle Apache Kafka meetup
Kafka Practices @ Uber - Seattle Apache Kafka meetupKafka Practices @ Uber - Seattle Apache Kafka meetup
Kafka Practices @ Uber - Seattle Apache Kafka meetup
 
AIRflow at Scale
AIRflow at ScaleAIRflow at Scale
AIRflow at Scale
 
Apache Airflow overview
Apache Airflow overviewApache Airflow overview
Apache Airflow overview
 
Flink Forward Berlin 2017: Piotr Wawrzyniak - Extending Apache Flink stream p...
Flink Forward Berlin 2017: Piotr Wawrzyniak - Extending Apache Flink stream p...Flink Forward Berlin 2017: Piotr Wawrzyniak - Extending Apache Flink stream p...
Flink Forward Berlin 2017: Piotr Wawrzyniak - Extending Apache Flink stream p...
 
Flink Forward SF 2017: Joe Olson - Using Flink and Queryable State to Buffer ...
Flink Forward SF 2017: Joe Olson - Using Flink and Queryable State to Buffer ...Flink Forward SF 2017: Joe Olson - Using Flink and Queryable State to Buffer ...
Flink Forward SF 2017: Joe Olson - Using Flink and Queryable State to Buffer ...
 
Streaming your Lyft Ride Prices - Flink Forward SF 2019
Streaming your Lyft Ride Prices - Flink Forward SF 2019Streaming your Lyft Ride Prices - Flink Forward SF 2019
Streaming your Lyft Ride Prices - Flink Forward SF 2019
 
Airflow introduction
Airflow introductionAirflow introduction
Airflow introduction
 
SAIS2018 - Fact Store At Netflix Scale
SAIS2018 - Fact Store At Netflix ScaleSAIS2018 - Fact Store At Netflix Scale
SAIS2018 - Fact Store At Netflix Scale
 
Real-Time Stream Processing with KSQL and Apache Kafka
Real-Time Stream Processing with KSQL and Apache KafkaReal-Time Stream Processing with KSQL and Apache Kafka
Real-Time Stream Processing with KSQL and Apache Kafka
 
Advanced Data Science with Apache Spark-(Reza Zadeh, Stanford)
Advanced Data Science with Apache Spark-(Reza Zadeh, Stanford)Advanced Data Science with Apache Spark-(Reza Zadeh, Stanford)
Advanced Data Science with Apache Spark-(Reza Zadeh, Stanford)
 
Scaling up uber's real time data analytics
Scaling up uber's real time data analyticsScaling up uber's real time data analytics
Scaling up uber's real time data analytics
 
Apache Airflow
Apache AirflowApache Airflow
Apache Airflow
 

Similar to Writing an Interactive Interface for SQL on Flink

Introduction to SQLStreamBuilder: Rich Streaming SQL Interface for Creating a...
Introduction to SQLStreamBuilder: Rich Streaming SQL Interface for Creating a...Introduction to SQLStreamBuilder: Rich Streaming SQL Interface for Creating a...
Introduction to SQLStreamBuilder: Rich Streaming SQL Interface for Creating a...
Eventador
 
Python Streaming Pipelines on Flink - Beam Meetup at Lyft 2019
Python Streaming Pipelines on Flink - Beam Meetup at Lyft 2019Python Streaming Pipelines on Flink - Beam Meetup at Lyft 2019
Python Streaming Pipelines on Flink - Beam Meetup at Lyft 2019
Thomas Weise
 
Google cloud Dataflow & Apache Flink
Google cloud Dataflow & Apache FlinkGoogle cloud Dataflow & Apache Flink
Google cloud Dataflow & Apache Flink
Iván Fernández Perea
 
Flink Forward San Francisco 2018: - Jinkui Shi and Radu Tudoran "Flink real-t...
Flink Forward San Francisco 2018: - Jinkui Shi and Radu Tudoran "Flink real-t...Flink Forward San Francisco 2018: - Jinkui Shi and Radu Tudoran "Flink real-t...
Flink Forward San Francisco 2018: - Jinkui Shi and Radu Tudoran "Flink real-t...
Flink Forward
 
Couchbase@live person meetup july 22nd
Couchbase@live person meetup   july 22ndCouchbase@live person meetup   july 22nd
Couchbase@live person meetup july 22nd
Ido Shilon
 
XStream: stream processing platform at facebook
XStream:  stream processing platform at facebookXStream:  stream processing platform at facebook
XStream: stream processing platform at facebook
Aniket Mokashi
 
Fast federated SQL with Apache Calcite
Fast federated SQL with Apache CalciteFast federated SQL with Apache Calcite
Fast federated SQL with Apache Calcite
Chris Baynes
 
OpenLineage for Stream Processing | Kafka Summit London
OpenLineage for Stream Processing | Kafka Summit LondonOpenLineage for Stream Processing | Kafka Summit London
OpenLineage for Stream Processing | Kafka Summit London
HostedbyConfluent
 
Building Kafka Connectors with Kotlin: A Step-by-Step Guide to Creation and D...
Building Kafka Connectors with Kotlin: A Step-by-Step Guide to Creation and D...Building Kafka Connectors with Kotlin: A Step-by-Step Guide to Creation and D...
Building Kafka Connectors with Kotlin: A Step-by-Step Guide to Creation and D...
HostedbyConfluent
 
OpenDaylight and YANG
OpenDaylight and YANGOpenDaylight and YANG
OpenDaylight and YANG
CoreStack
 
Apache Samza 1.0 - What's New, What's Next
Apache Samza 1.0 - What's New, What's NextApache Samza 1.0 - What's New, What's Next
Apache Samza 1.0 - What's New, What's Next
Prateek Maheshwari
 
Near real-time anomaly detection at Lyft
Near real-time anomaly detection at LyftNear real-time anomaly detection at Lyft
Near real-time anomaly detection at Lyft
markgrover
 
Understanding Framework Architecture using Eclipse
Understanding Framework Architecture using EclipseUnderstanding Framework Architecture using Eclipse
Understanding Framework Architecture using Eclipse
anshunjain
 
Experiences with Evangelizing Java Within the Database
Experiences with Evangelizing Java Within the DatabaseExperiences with Evangelizing Java Within the Database
Experiences with Evangelizing Java Within the Database
Marcelo Ochoa
 
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Guido Schmutz
 
Road to sbt 1.0 paved with server
Road to sbt 1.0   paved with serverRoad to sbt 1.0   paved with server
Road to sbt 1.0 paved with server
Eugene Yokota
 
Real-time Streaming Pipelines with FLaNK
Real-time Streaming Pipelines with FLaNKReal-time Streaming Pipelines with FLaNK
Real-time Streaming Pipelines with FLaNK
Data Con LA
 
Automate the operation of your Oracle Cloud infrastructure v2.0
Automate the operation of your Oracle Cloud infrastructure v2.0Automate the operation of your Oracle Cloud infrastructure v2.0
Automate the operation of your Oracle Cloud infrastructure v2.0
Nelson Calero
 
Best Practices for Middleware and Integration Architecture Modernization with...
Best Practices for Middleware and Integration Architecture Modernization with...Best Practices for Middleware and Integration Architecture Modernization with...
Best Practices for Middleware and Integration Architecture Modernization with...
Claus Ibsen
 
Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...
Flink Forward
 

Similar to Writing an Interactive Interface for SQL on Flink (20)

Introduction to SQLStreamBuilder: Rich Streaming SQL Interface for Creating a...
Introduction to SQLStreamBuilder: Rich Streaming SQL Interface for Creating a...Introduction to SQLStreamBuilder: Rich Streaming SQL Interface for Creating a...
Introduction to SQLStreamBuilder: Rich Streaming SQL Interface for Creating a...
 
Python Streaming Pipelines on Flink - Beam Meetup at Lyft 2019
Python Streaming Pipelines on Flink - Beam Meetup at Lyft 2019Python Streaming Pipelines on Flink - Beam Meetup at Lyft 2019
Python Streaming Pipelines on Flink - Beam Meetup at Lyft 2019
 
Google cloud Dataflow & Apache Flink
Google cloud Dataflow & Apache FlinkGoogle cloud Dataflow & Apache Flink
Google cloud Dataflow & Apache Flink
 
Flink Forward San Francisco 2018: - Jinkui Shi and Radu Tudoran "Flink real-t...
Flink Forward San Francisco 2018: - Jinkui Shi and Radu Tudoran "Flink real-t...Flink Forward San Francisco 2018: - Jinkui Shi and Radu Tudoran "Flink real-t...
Flink Forward San Francisco 2018: - Jinkui Shi and Radu Tudoran "Flink real-t...
 
Couchbase@live person meetup july 22nd
Couchbase@live person meetup   july 22ndCouchbase@live person meetup   july 22nd
Couchbase@live person meetup july 22nd
 
XStream: stream processing platform at facebook
XStream:  stream processing platform at facebookXStream:  stream processing platform at facebook
XStream: stream processing platform at facebook
 
Fast federated SQL with Apache Calcite
Fast federated SQL with Apache CalciteFast federated SQL with Apache Calcite
Fast federated SQL with Apache Calcite
 
OpenLineage for Stream Processing | Kafka Summit London
OpenLineage for Stream Processing | Kafka Summit LondonOpenLineage for Stream Processing | Kafka Summit London
OpenLineage for Stream Processing | Kafka Summit London
 
Building Kafka Connectors with Kotlin: A Step-by-Step Guide to Creation and D...
Building Kafka Connectors with Kotlin: A Step-by-Step Guide to Creation and D...Building Kafka Connectors with Kotlin: A Step-by-Step Guide to Creation and D...
Building Kafka Connectors with Kotlin: A Step-by-Step Guide to Creation and D...
 
OpenDaylight and YANG
OpenDaylight and YANGOpenDaylight and YANG
OpenDaylight and YANG
 
Apache Samza 1.0 - What's New, What's Next
Apache Samza 1.0 - What's New, What's NextApache Samza 1.0 - What's New, What's Next
Apache Samza 1.0 - What's New, What's Next
 
Near real-time anomaly detection at Lyft
Near real-time anomaly detection at LyftNear real-time anomaly detection at Lyft
Near real-time anomaly detection at Lyft
 
Understanding Framework Architecture using Eclipse
Understanding Framework Architecture using EclipseUnderstanding Framework Architecture using Eclipse
Understanding Framework Architecture using Eclipse
 
Experiences with Evangelizing Java Within the Database
Experiences with Evangelizing Java Within the DatabaseExperiences with Evangelizing Java Within the Database
Experiences with Evangelizing Java Within the Database
 
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
 
Road to sbt 1.0 paved with server
Road to sbt 1.0   paved with serverRoad to sbt 1.0   paved with server
Road to sbt 1.0 paved with server
 
Real-time Streaming Pipelines with FLaNK
Real-time Streaming Pipelines with FLaNKReal-time Streaming Pipelines with FLaNK
Real-time Streaming Pipelines with FLaNK
 
Automate the operation of your Oracle Cloud infrastructure v2.0
Automate the operation of your Oracle Cloud infrastructure v2.0Automate the operation of your Oracle Cloud infrastructure v2.0
Automate the operation of your Oracle Cloud infrastructure v2.0
 
Best Practices for Middleware and Integration Architecture Modernization with...
Best Practices for Middleware and Integration Architecture Modernization with...Best Practices for Middleware and Integration Architecture Modernization with...
Best Practices for Middleware and Integration Architecture Modernization with...
 
Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...
 

Recently uploaded

Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptxData_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
AnirbanRoy608946
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
oz8q3jxlp
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
roli9797
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
manishkhaire30
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
slg6lamcq
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Subhajit Sahu
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
g4dpvqap0
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
javier ramirez
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
Roger Valdez
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
jerlynmaetalle
 
Nanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdfNanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdf
eddie19851
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
mzpolocfi
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
axoqas
 
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdfEnhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
GetInData
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
haila53
 
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdfUnleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Enterprise Wired
 

Recently uploaded (20)

Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptxData_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
 
Nanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdfNanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdf
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
 
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdfEnhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
 
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdfUnleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
 

Writing an Interactive Interface for SQL on Flink

  • 1. Writing an interactive interface for SQL on Flink How and why we created SQLStreamBuilder—and the lessons learned along the way Kenny Gorman Co-Founder and CEO www.eventador.io 2019 Flink Forward Berlin
  • 2. Background and motivations ● Eventador.io has offered a managed Flink runtime for a few years now. We started to see some customer patterns emerge. ● The state of the art today is to write Flink jobs in Java or Scala using the DataStream/Set API and/or the Table API’s. ● While powerful, the time and expertise needed isn’t trivial. Adoption and time to market lags. ● Teams are busy writing code. Completely swamped to be precise.
  • 3. Why SQL anyway? ● SQL is > 30 years old. It’s massively useful for inspecting and reasoning about data. Everyone knows SQL. ● It’s declarative, just ask for what you want to see. ● It’s been extended to accommodate streaming constructs like windows (Flink/Calcite). ● Streaming SQL never completes, it’s a query on boundless data. ● It’s an amazing way to interact with streaming data.
  • 4. Of workloads could be represented with SQL, and we plan to grow that. Require more complex logic best represented in Java/Scala. 80% 20%
  • 5. What if we could go beyond simply building processors in SQL - do it interactively, manage schema’s and make it all easy? Could building logic on streams be as productive and intuitive as using a database yet as scalable and powerful as Flink?
  • 6. Eventador SQLStreamBuilder ● Interactive SQL editor - create and submit any Flink compatible SQL ● Virtual Table Registry - source/sink + schema definition ● Query Parser - Gives instant feedback ● Job payload management - Builds job payloads ● Flink runner - Takes the payload and runs the job ● Delivered as a cloud service - in your AWS account
  • 7. Feedback on SQL execution Where do I send results? Where to run the job The SQL statement Sampling rather than a result-set A sample of results in browser
  • 8. Schema management - Virtual Table Registry ● SQL requires a schema of typed columns - streams don’t have have to have this. ● It’s common to use AVRO (easy to solve for) but also free-form JSON ● Free form means - a total F**ing mess. ● Sources - Kafka/Kinesis (soon) ● Sinks - Kafka,S3, JDBC, ELK (soon)
  • 9. SQLStreamBuilder Components ● Interactive SQL interface ○ Handles query creation and submission. ○ Handles feedback from SQLIO ○ Interface to build queries, sources and sinks ○ Python + Vue.js ○ Results are sampled back to interface ● SQL engine (SQLIO) ○ Parse incoming statements ○ Map data sources/sinks ○ Parse schema (Schema Registry+AVRO / JSON) ○ Build POJOs ○ Submit payload to runner (Flink) ○ Java ● Virtual Table Registry ○ Creation of schema for streams ○ AVRO + JSON ○ Python
  • 10. SQLStreamBuilder (con’t) ● Job Management Interface ○ Stop/Start/Edit/etc ○ Python + Vue.js ○ Uses Flink APIs ● Builder ○ Handles creation of assets via K8’s ○ Python ○ PostgreSQL backend ○ Kubernetes orchestration ● Flink runner ○ Run jobs on Flink 1.8.2 ○ Kubernetes orchestration ○ Any Flink compatible SQL statement
  • 11. Query Lifecycle - Parse HTTPS POST { “schema”: { }, “topic”: “myTopic”, “virtualTable: “myTable”, “sqlStmt”: “SELECT ….” } SQLIO RESULT (if error) { “errorMsg”: “unable to find table myTable” } final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); ... Table parse = mytopic .select() .sort() ... - Validate SQL against Flink/Calcite - JSON -> AVRO - AVRO Schema - Generate Classes (POJO’s) SQL console
  • 12. Query Lifecycle - Execute SQLIO Apache Kafka / Socket.io SQL console Column Column Column Value Value Value - If class exists - class, method, params .connect( new Kafka() .version("0.11") .topic("...") .sinkPartitionerXX result.writeToSink(..); env.execute(..); - Enhanced schema typing - Enhanced feedback/logging - Sends base64 encoded payload to Flink Job SAMPLE THE DATA TO USER
  • 13. SQL join streams from multiple clusters/types Write to multiple types of sinks, building complex processing pipelines Aggregate data before pushing to expensive/slow database endpoints Conditionally write to multiple S3 buckets
  • 14. Building Processing Environments SELECT * FROM sensors JOIN account_info ON ... SELECT sensorid, max(temp) FROM stream GROUP BY sensorid, tumble(..) SELECT sensorid, region FROM stream WHERE region IN [...] s3://xxx/yyys3://xxx/yyy sms SELECT * FROM table WHERE user_selected_thing = ‘foo’; SELECT sensorid, message FROM stream WHERE is_alert = ‘t’ Data Science ML Team(s) SnowFlakeDB or other Data Warehouse
  • 15. Javascript User Functions - Introduced Today function ICAO_lookup(icao) { try { var c = new java.net.URL('http://tornado.beebe.cc/' + icao).openConnection(); c.requestMethod='GET'; var reader = new java.io.BufferedReader(new java.io.InputStreamReader(c.inputStream)); return reader.readLine(); } catch(err) { return "Unknown: " + err; } } ICAO_lookup($p0);
  • 17. We are hiring Number of changes for community evaluation
  • 18. Whats Next? ● Read Consistent (Snapshot) Sink ● Auto-detect and type schema ● SQL API ● Intelligent Auto-scale ● AWS Spot instance support/management