SlideShare a Scribd company logo
1 of 57
Download to read offline
Streaming SQL for
Data Engineers: The
Next Big Thing?
Streaming SQL Products
● Apache Flink
● Apache Spark
● Apache Beam
● AWS Kinesis
● Google Cloud Dataflow
● Databricks
● ksqlDB
● …
● Meta
● LinkedIn
● Pinterest
● DoorDash
● Alibaba
● …
Companies building
internal platforms
Open source and
vendor solutions
👋 Hi, I’m Yaroslav
👋 Hi, I’m Yaroslav
● Principal Software Engineer @ Goldsky
● Staff Data Engineer @ Shopify
● Software Architect @ Activision
● …
👋 Hi, I’m Yaroslav
● Principal Software Engineer @ Goldsky
● Staff Data Engineer @ Shopify
● Software Architect @ Activision
● …
❤ Apache Flink
🤔
TableEnvironment tableEnv = TableEnvironment.create(/*…*/);
Table revenue = tableEnv.sqlQuery(
"SELECT cID, cName, SUM(revenue) AS revSum " +
"FROM Orders " +
"WHERE cCountry = 'FRANCE' " +
"GROUP BY cID, cName"
);
… but why SQL?
Why SQL?
● Wide adoption
● Declarative transformation model
● Planner!
● Common type system
What instead of How
User
Intention Execution
Runtime
←
Imperative Style
→
User
Intention Execution
Runtime
→
Planning
Planner
→
Declarative SQL Style
SELECT * FROM Orders
INNER JOIN Product
ON Orders.productId = Product.id
● LOTS of code!
● Create an operator to connect
two streams
● Define and accumulate state
● Implement a mechanism for
emitting the latest value per
key
SQL API DataStream API
Declarative Transformation Model
SELECT * FROM Orders
INNER JOIN Product
ON Orders.productId = Product.id
SQL API Why not Table API?
val orders = tEnv.from("Orders")
.select($"productId", $"a", $"b")
val products = tEnv.from("Products")
.select($"id", $"c", $"d")
val result = orders
.join(products)
.where($"productId" === $"id")
.select($"a", $"b", $"c")
Declarative Transformation Model
SELECT * FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY ticker
ORDER BY price DESC) AS row_num
FROM stock_table)
WHERE row_num <= 10;
Top-N Query
Declarative Transformation Model
Row Pattern Recognition in SQL
(ISO/IEC TR 19075-5:2016)
SELECT *
FROM stock_table
MATCH_RECOGNIZE(
PARTITION BY ticker
ORDER BY event_time
MEASURES
A.event_time AS initialPriceTime,
C.event_time AS dropTime,
A.price - C.price AS dropDiff,
A.price AS initialPrice,
C.price AS lastPrice
ONE ROW PER MATCH
AFTER MATCH SKIP PAST LAST ROW
PATTERN (A B* C) WITHIN INTERVAL '10' MINUTES
DEFINE
B AS B.price > A.price - 500
)
Flink Planner Migration
From https://www.ververica.com/blog/a-journey-to-beating-flinks-sql-performance
Planner Decoupling
Planner Optimizations & Query Rewrite
● Predicate push down
● Projection push down
● Join rewrite
● Join elimination
● Constant inlining
● …
SQL API DataStream API
val postgresSink: SinkFunction[Envelope] = JdbcSink.sink(
"INSERT INTO table " +
"(id, number, timestamp, author, difficulty, size, vid, block_range) " +
"VALUES (?, ?, ?, ?, ?, ?, ?, ?) " +
"ON CONFLICT (id) DO UPDATE SET " +
"number = excluded.number, " +
"timestamp = excluded.timestamp, " +
"author = excluded.author, " +
"difficulty = excluded.difficulty, " +
"size = excluded.size, " +
"vid = excluded.vid, " +
"block_range = excluded.block_range " +
"WHERE excluded.vid > table.vid",
new JdbcStatementBuilder[Envelope] {
override def accept(statement: PreparedStatement, record: Envelope): Unit = {
val payload = record.payload
payload.id.foreach { id => statement.setString(1, id) }
payload.number.foreach { number => statement.setBigDecimal(2, new java.math.BigDecimal(number)) }
payload.timestamp.foreach { timestamp => statement.setBigDecimal(3, new java.math.BigDecimal(timestamp)) }
payload.author.foreach { author => statement.setString(4, author) }
payload.difficulty.foreach { difficulty => statement.setBigDecimal(5, new java.math.BigDecimal(difficulty)) }
payload.size.foreach { size => statement.setBigDecimal(6, new java.math.BigDecimal(size)) }
payload.vid.foreach { vid => statement.setLong(7, vid.toLong) }
payload.block_range.foreach { block_range => statement.setObject(8, new PostgresIntRange(block_range), Types.O
}
},
CREATE TABLE TABLE (
id BIGINT,
number INTEGER,
timestamp TIMESTAMP,
author STRING,
difficulty STRING,
size INTEGER,
vid BIGINT,
block_range STRING
PRIMARY KEY (vid) NOT ENFORCED
) WITH (
'connector' = 'jdbc',
'table-name' = 'table'
);
😱
Common Type System
When you start using SQL
you get access to the
decades of advancements
in database design
When NOT to use
● Complex serialization / deserialization logic
● Low-level optimizations, especially with state and timers
● Not always debugging-friendly
Dealing with Complexity
UDFs for heavy lifting
● Calling 3rd-party
libraries
● External calls
● Enrichments
Templating
● Control structures
● dbt-style macros
and references
Convinced? Let’s use it!
Ways to use
Structured Statements
dbt-style Project Notebooks
Managed Runtime
Requirements
● Version control
● Code organization
● Testability
● CI/CD
● Observability
Structured Statements
def revenueByCountry(country: String): Table = {
tEnv.sqlQuery(
s"""
|SELECT name, SUM(revenue) AS totalRevenue
|FROM Orders
|WHERE country = '${country}'
|GROUP BY name""".stripMargin
)
}
Structured Statements
def revenueByCountry(country: String): Table = {
tEnv.sqlQuery(
s"""
|SELECT name, SUM(revenue) AS totalRevenue
|FROM Orders
|WHERE country = '${country}'
|GROUP BY name""".stripMargin
)
}
✅ structure
✅ mock/stub
for testing
Structured Statements
● Treat them like code
● Only make sense when Table API is not available
● Mix with other API flavours
● SQL also has style guides
● Otherwise it’s a typical streaming application!
Structured Statements
● Version control: 🟢
● Code organization: 🟢
● Testability: 🟡
● CI/CD: 🟡
● Observability: 🟢
dbt-style Project
➔ models
◆ common
● users.sql
● users.yml
◆ sales.sql
◆ sales.yml
◆ …
➔ tests
◆ …
dbt-style Project
➔ models
◆ common
● users.sql
● users.yml
◆ sales.sql
◆ sales.yml
◆ …
➔ tests
◆ …
✅ structured
✅ schematized
✅ testable
dbt-style Project
SELECT
((text::jsonb)->>'bid_price')::FLOAT AS bid_price,
(text::jsonb)->>'order_quantity' AS order_quantity,
(text::jsonb)->>'symbol' AS symbol,
(text::jsonb)->>'trade_type' AS trade_type,
to_timestamp(((text::jsonb)->'timestamp')::BIGINT) AS ts
FROM {{ REF('market_orders_raw') }}
{{ config(materialized='materializedview') }}
SELECT symbol,
AVG(bid_price) AS avg
FROM {{ REF('market_orders') }}
GROUP BY symbol
dbt-style Project
● Works well for heavy analytical use-cases
● Could write tests in Python/Scala/etc.
● Probably needs more tooling than you think (state
management, observability, etc.)
● Check dbt adapter from Materialize!
dbt-style Project
● Version control: 🟢
● Code organization: 🟢
● Testability: 🟡
● CI/CD: 🟡
● Observability: 🟡
Notebooks
Apache Zeppelin
Notebooks
Apache Zeppelin
Notebooks
● Great UX
● Ideal for exploratory analysis and BI
● Complements all other patterns really well
● Way more important for realtime workloads
Notebooks
We don't recommend productionizing notebooks and
instead encourage empowering data scientists to build
production-ready code with the right programming
frameworks
https://www.thoughtworks.com/en-ca/radar/technique
s/productionizing-notebooks
Notebooks
● Version control: 🟡
● Code organization: 🔴
● Testability: 🔴
● CI/CD: 🔴
● Observability: 🔴
Managed Runtime
decodable
Managed Runtime
● Managed ≈ “Serverless”
● Auto-scaling
● Automated deployments, rollbacks, etc.
● Testing for different layers is decoupled
(runtime vs jobs)
Managed Runtime
Reference Architecture
Control Plane Data Plane
API Reconciler
Streaming Job
UI CLI
Any managed runtime
requires excellent
developer experience
to succeed
Managed Runtime: Ideal Developer Experience
Notebooks UX
SELECT * …
SELECT * …
Managed Runtime: Ideal Developer Experience
Version Control Integration
SELECT * …
SELECT * …
Managed Runtime: Ideal Developer Experience
dbt-style Project Structure
SELECT * …
SELECT * …
➔ models
◆ common
◆ sales
◆ shipping
◆ marketing
◆ …
Managed Runtime: Ideal Developer Experience
Versioning
SELECT * …
SELECT * …
● Version 1
● Version 2
● Version 3
● …
Managed Runtime: Ideal Developer Experience
Previews
SELECT * …
SELECT * …
User Count
Irene 100
Alex 53
Josh 12
Jane 1
Managed Runtime
● Version control: 🟢
● Code organization: 🟢
● Testability: 🟡
● CI/CD: 🟢
● Observability: 🟢
Summary
Structured
Statements
dbt-style Project Notebooks Managed
Runtime
Version Control 🟢 🟢 🟡 🟢
Code
Organization
🟢 🟢 🔴 🟢
Testability 🟡 🟡 🔴 🟡
CI/CD 🟡 🟡 🔴 🟢
Observability 🟢 🟡 🔴 🟢
Complexity 🟢 🟡 🟡 🔴
General Guidelines
● Long-running streaming apps require special attention
to state management
● Try to avoid mutability: every change is a new version
● Integration testing > unit testing
● Embrace the SRE mentality
Really dislike SQL?
Malloy PRQL
Questions?
@sap1ens

More Related Content

What's hot

Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)Ryan Blue
 
Delta Lake: Optimizing Merge
Delta Lake: Optimizing MergeDelta Lake: Optimizing Merge
Delta Lake: Optimizing MergeDatabricks
 
Building a SIMD Supported Vectorized Native Engine for Spark SQL
Building a SIMD Supported Vectorized Native Engine for Spark SQLBuilding a SIMD Supported Vectorized Native Engine for Spark SQL
Building a SIMD Supported Vectorized Native Engine for Spark SQLDatabricks
 
Dynamic filtering for presto join optimisation
Dynamic filtering for presto join optimisationDynamic filtering for presto join optimisation
Dynamic filtering for presto join optimisationOri Reshef
 
Where is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkWhere is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkFlink Forward
 
Apache doris (incubating) introduction
Apache doris (incubating) introductionApache doris (incubating) introduction
Apache doris (incubating) introductionleanderlee2
 
Deep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDeep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDatabricks
 
Oracle GoldenGate Performance Tuning
Oracle GoldenGate Performance TuningOracle GoldenGate Performance Tuning
Oracle GoldenGate Performance TuningBobby Curtis
 
Real-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache PinotReal-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache PinotXiang Fu
 
Hive 3 - a new horizon
Hive 3 - a new horizonHive 3 - a new horizon
Hive 3 - a new horizonThejas Nair
 
The columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache ArrowThe columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache ArrowJulien Le Dem
 
New Directions for Apache Arrow
New Directions for Apache ArrowNew Directions for Apache Arrow
New Directions for Apache ArrowWes McKinney
 
Apache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEAApache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEAAdam Doyle
 
Delta: Building Merge on Read
Delta: Building Merge on ReadDelta: Building Merge on Read
Delta: Building Merge on ReadDatabricks
 
2022-06-23 Apache Arrow and DataFusion_ Changing the Game for implementing Da...
2022-06-23 Apache Arrow and DataFusion_ Changing the Game for implementing Da...2022-06-23 Apache Arrow and DataFusion_ Changing the Game for implementing Da...
2022-06-23 Apache Arrow and DataFusion_ Changing the Game for implementing Da...Andrew Lamb
 
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...Databricks
 
Using Apache Arrow, Calcite, and Parquet to Build a Relational Cache
Using Apache Arrow, Calcite, and Parquet to Build a Relational CacheUsing Apache Arrow, Calcite, and Parquet to Build a Relational Cache
Using Apache Arrow, Calcite, and Parquet to Build a Relational CacheDremio Corporation
 

What's hot (20)

Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)
 
Delta Lake: Optimizing Merge
Delta Lake: Optimizing MergeDelta Lake: Optimizing Merge
Delta Lake: Optimizing Merge
 
Kudu Deep-Dive
Kudu Deep-DiveKudu Deep-Dive
Kudu Deep-Dive
 
Parquet overview
Parquet overviewParquet overview
Parquet overview
 
Building a SIMD Supported Vectorized Native Engine for Spark SQL
Building a SIMD Supported Vectorized Native Engine for Spark SQLBuilding a SIMD Supported Vectorized Native Engine for Spark SQL
Building a SIMD Supported Vectorized Native Engine for Spark SQL
 
Dynamic filtering for presto join optimisation
Dynamic filtering for presto join optimisationDynamic filtering for presto join optimisation
Dynamic filtering for presto join optimisation
 
Where is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkWhere is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in Flink
 
Apache doris (incubating) introduction
Apache doris (incubating) introductionApache doris (incubating) introduction
Apache doris (incubating) introduction
 
Deep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDeep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache Spark
 
Oracle GoldenGate Performance Tuning
Oracle GoldenGate Performance TuningOracle GoldenGate Performance Tuning
Oracle GoldenGate Performance Tuning
 
Real-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache PinotReal-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache Pinot
 
Hive 3 - a new horizon
Hive 3 - a new horizonHive 3 - a new horizon
Hive 3 - a new horizon
 
Hive tuning
Hive tuningHive tuning
Hive tuning
 
The columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache ArrowThe columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache Arrow
 
New Directions for Apache Arrow
New Directions for Apache ArrowNew Directions for Apache Arrow
New Directions for Apache Arrow
 
Apache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEAApache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEA
 
Delta: Building Merge on Read
Delta: Building Merge on ReadDelta: Building Merge on Read
Delta: Building Merge on Read
 
2022-06-23 Apache Arrow and DataFusion_ Changing the Game for implementing Da...
2022-06-23 Apache Arrow and DataFusion_ Changing the Game for implementing Da...2022-06-23 Apache Arrow and DataFusion_ Changing the Game for implementing Da...
2022-06-23 Apache Arrow and DataFusion_ Changing the Game for implementing Da...
 
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
 
Using Apache Arrow, Calcite, and Parquet to Build a Relational Cache
Using Apache Arrow, Calcite, and Parquet to Build a Relational CacheUsing Apache Arrow, Calcite, and Parquet to Build a Relational Cache
Using Apache Arrow, Calcite, and Parquet to Build a Relational Cache
 

Similar to Streaming SQL for Data Engineers: The Next Big Thing?

Shaping serverless architecture with domain driven design patterns - py web-il
Shaping serverless architecture with domain driven design patterns - py web-ilShaping serverless architecture with domain driven design patterns - py web-il
Shaping serverless architecture with domain driven design patterns - py web-ilAsher Sterkin
 
Apache Samza 1.0 - What's New, What's Next
Apache Samza 1.0 - What's New, What's NextApache Samza 1.0 - What's New, What's Next
Apache Samza 1.0 - What's New, What's NextPrateek Maheshwari
 
Serverless in-action
Serverless in-actionServerless in-action
Serverless in-actionAssaf Gannon
 
Shaping serverless architecture with domain driven design patterns
Shaping serverless architecture with domain driven design patternsShaping serverless architecture with domain driven design patterns
Shaping serverless architecture with domain driven design patternsShimon Tolts
 
Shaping serverless architecture with domain driven design patterns
Shaping serverless architecture with domain driven design patternsShaping serverless architecture with domain driven design patterns
Shaping serverless architecture with domain driven design patternsAsher Sterkin
 
Sprint 45 review
Sprint 45 reviewSprint 45 review
Sprint 45 reviewManageIQ
 
Improving Apache Spark Downscaling
 Improving Apache Spark Downscaling Improving Apache Spark Downscaling
Improving Apache Spark DownscalingDatabricks
 
Advanced Code Flow, Notes From the Field
Advanced Code Flow, Notes From the FieldAdvanced Code Flow, Notes From the Field
Advanced Code Flow, Notes From the FieldAriel Moskovich
 
Fast federated SQL with Apache Calcite
Fast federated SQL with Apache CalciteFast federated SQL with Apache Calcite
Fast federated SQL with Apache CalciteChris Baynes
 
Spark SQL Catalyst Code Optimization using Function Outlining with Kavana Bha...
Spark SQL Catalyst Code Optimization using Function Outlining with Kavana Bha...Spark SQL Catalyst Code Optimization using Function Outlining with Kavana Bha...
Spark SQL Catalyst Code Optimization using Function Outlining with Kavana Bha...Databricks
 
Modular Web Applications With Netzke
Modular Web Applications With NetzkeModular Web Applications With Netzke
Modular Web Applications With Netzkenetzke
 
How Level Infinite Implemented CQRS and Event Sourcing on Top of Apache Pulsa...
How Level Infinite Implemented CQRS and Event Sourcing on Top of Apache Pulsa...How Level Infinite Implemented CQRS and Event Sourcing on Top of Apache Pulsa...
How Level Infinite Implemented CQRS and Event Sourcing on Top of Apache Pulsa...ScyllaDB
 
GraphQL the holy contract between client and server
GraphQL the holy contract between client and serverGraphQL the holy contract between client and server
GraphQL the holy contract between client and serverPavel Chertorogov
 
SamzaSQL QCon'16 presentation
SamzaSQL QCon'16 presentationSamzaSQL QCon'16 presentation
SamzaSQL QCon'16 presentationYi Pan
 
stackconf 2020 | The path to a Serverless-native era with Kubernetes by Paolo...
stackconf 2020 | The path to a Serverless-native era with Kubernetes by Paolo...stackconf 2020 | The path to a Serverless-native era with Kubernetes by Paolo...
stackconf 2020 | The path to a Serverless-native era with Kubernetes by Paolo...NETWAYS
 

Similar to Streaming SQL for Data Engineers: The Next Big Thing? (20)

Shaping serverless architecture with domain driven design patterns - py web-il
Shaping serverless architecture with domain driven design patterns - py web-ilShaping serverless architecture with domain driven design patterns - py web-il
Shaping serverless architecture with domain driven design patterns - py web-il
 
Apache Samza 1.0 - What's New, What's Next
Apache Samza 1.0 - What's New, What's NextApache Samza 1.0 - What's New, What's Next
Apache Samza 1.0 - What's New, What's Next
 
Sprint 58
Sprint 58Sprint 58
Sprint 58
 
Serverless in-action
Serverless in-actionServerless in-action
Serverless in-action
 
Shaping serverless architecture with domain driven design patterns
Shaping serverless architecture with domain driven design patternsShaping serverless architecture with domain driven design patterns
Shaping serverless architecture with domain driven design patterns
 
Shaping serverless architecture with domain driven design patterns
Shaping serverless architecture with domain driven design patternsShaping serverless architecture with domain driven design patterns
Shaping serverless architecture with domain driven design patterns
 
Google Cloud Dataflow
Google Cloud DataflowGoogle Cloud Dataflow
Google Cloud Dataflow
 
Sprint 45 review
Sprint 45 reviewSprint 45 review
Sprint 45 review
 
Improving Apache Spark Downscaling
 Improving Apache Spark Downscaling Improving Apache Spark Downscaling
Improving Apache Spark Downscaling
 
Sprint 55
Sprint 55Sprint 55
Sprint 55
 
Advanced Code Flow, Notes From the Field
Advanced Code Flow, Notes From the FieldAdvanced Code Flow, Notes From the Field
Advanced Code Flow, Notes From the Field
 
Fast federated SQL with Apache Calcite
Fast federated SQL with Apache CalciteFast federated SQL with Apache Calcite
Fast federated SQL with Apache Calcite
 
Spark SQL Catalyst Code Optimization using Function Outlining with Kavana Bha...
Spark SQL Catalyst Code Optimization using Function Outlining with Kavana Bha...Spark SQL Catalyst Code Optimization using Function Outlining with Kavana Bha...
Spark SQL Catalyst Code Optimization using Function Outlining with Kavana Bha...
 
Modular Web Applications With Netzke
Modular Web Applications With NetzkeModular Web Applications With Netzke
Modular Web Applications With Netzke
 
How Level Infinite Implemented CQRS and Event Sourcing on Top of Apache Pulsa...
How Level Infinite Implemented CQRS and Event Sourcing on Top of Apache Pulsa...How Level Infinite Implemented CQRS and Event Sourcing on Top of Apache Pulsa...
How Level Infinite Implemented CQRS and Event Sourcing on Top of Apache Pulsa...
 
GraphQL the holy contract between client and server
GraphQL the holy contract between client and serverGraphQL the holy contract between client and server
GraphQL the holy contract between client and server
 
Sprint 59
Sprint 59Sprint 59
Sprint 59
 
Revealing ALLSTOCKER
Revealing ALLSTOCKERRevealing ALLSTOCKER
Revealing ALLSTOCKER
 
SamzaSQL QCon'16 presentation
SamzaSQL QCon'16 presentationSamzaSQL QCon'16 presentation
SamzaSQL QCon'16 presentation
 
stackconf 2020 | The path to a Serverless-native era with Kubernetes by Paolo...
stackconf 2020 | The path to a Serverless-native era with Kubernetes by Paolo...stackconf 2020 | The path to a Serverless-native era with Kubernetes by Paolo...
stackconf 2020 | The path to a Serverless-native era with Kubernetes by Paolo...
 

More from Yaroslav Tkachenko

Dynamic Change Data Capture with Flink CDC and Consistent Hashing
Dynamic Change Data Capture with Flink CDC and Consistent HashingDynamic Change Data Capture with Flink CDC and Consistent Hashing
Dynamic Change Data Capture with Flink CDC and Consistent HashingYaroslav Tkachenko
 
Apache Flink Adoption at Shopify
Apache Flink Adoption at ShopifyApache Flink Adoption at Shopify
Apache Flink Adoption at ShopifyYaroslav Tkachenko
 
Storing State Forever: Why It Can Be Good For Your Analytics
Storing State Forever: Why It Can Be Good For Your AnalyticsStoring State Forever: Why It Can Be Good For Your Analytics
Storing State Forever: Why It Can Be Good For Your AnalyticsYaroslav Tkachenko
 
It's Time To Stop Using Lambda Architecture
It's Time To Stop Using Lambda ArchitectureIt's Time To Stop Using Lambda Architecture
It's Time To Stop Using Lambda ArchitectureYaroslav Tkachenko
 
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streaming
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to StreamingBravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streaming
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to StreamingYaroslav Tkachenko
 
Apache Kafka: New Features That You Might Not Know About
Apache Kafka: New Features That You Might Not Know AboutApache Kafka: New Features That You Might Not Know About
Apache Kafka: New Features That You Might Not Know AboutYaroslav Tkachenko
 
Building Scalable and Extendable Data Pipeline for Call of Duty Games: Lesson...
Building Scalable and Extendable Data Pipeline for Call of Duty Games: Lesson...Building Scalable and Extendable Data Pipeline for Call of Duty Games: Lesson...
Building Scalable and Extendable Data Pipeline for Call of Duty Games: Lesson...Yaroslav Tkachenko
 
Designing Scalable and Extendable Data Pipeline for Call Of Duty Games
Designing Scalable and Extendable Data Pipeline for Call Of Duty GamesDesigning Scalable and Extendable Data Pipeline for Call Of Duty Games
Designing Scalable and Extendable Data Pipeline for Call Of Duty GamesYaroslav Tkachenko
 
10 tips for making Bash a sane programming language
10 tips for making Bash a sane programming language10 tips for making Bash a sane programming language
10 tips for making Bash a sane programming languageYaroslav Tkachenko
 
Actors or Not: Async Event Architectures
Actors or Not: Async Event ArchitecturesActors or Not: Async Event Architectures
Actors or Not: Async Event ArchitecturesYaroslav Tkachenko
 
Kafka Streams: the easiest way to start with stream processing
Kafka Streams: the easiest way to start with stream processingKafka Streams: the easiest way to start with stream processing
Kafka Streams: the easiest way to start with stream processingYaroslav Tkachenko
 
Building Stateful Microservices With Akka
Building Stateful Microservices With AkkaBuilding Stateful Microservices With Akka
Building Stateful Microservices With AkkaYaroslav Tkachenko
 
Querying Data Pipeline with AWS Athena
Querying Data Pipeline with AWS AthenaQuerying Data Pipeline with AWS Athena
Querying Data Pipeline with AWS AthenaYaroslav Tkachenko
 
Akka Microservices Architecture And Design
Akka Microservices Architecture And DesignAkka Microservices Architecture And Design
Akka Microservices Architecture And DesignYaroslav Tkachenko
 
Why Actor-Based Systems Are The Best For Microservices
Why Actor-Based Systems Are The Best For MicroservicesWhy Actor-Based Systems Are The Best For Microservices
Why Actor-Based Systems Are The Best For MicroservicesYaroslav Tkachenko
 
Why actor-based systems are the best for microservices
Why actor-based systems are the best for microservicesWhy actor-based systems are the best for microservices
Why actor-based systems are the best for microservicesYaroslav Tkachenko
 
Building Eventing Systems for Microservice Architecture
Building Eventing Systems for Microservice Architecture  Building Eventing Systems for Microservice Architecture
Building Eventing Systems for Microservice Architecture Yaroslav Tkachenko
 
Быстрая и безболезненная разработка клиентской части веб-приложений
Быстрая и безболезненная разработка клиентской части веб-приложенийБыстрая и безболезненная разработка клиентской части веб-приложений
Быстрая и безболезненная разработка клиентской части веб-приложенийYaroslav Tkachenko
 

More from Yaroslav Tkachenko (18)

Dynamic Change Data Capture with Flink CDC and Consistent Hashing
Dynamic Change Data Capture with Flink CDC and Consistent HashingDynamic Change Data Capture with Flink CDC and Consistent Hashing
Dynamic Change Data Capture with Flink CDC and Consistent Hashing
 
Apache Flink Adoption at Shopify
Apache Flink Adoption at ShopifyApache Flink Adoption at Shopify
Apache Flink Adoption at Shopify
 
Storing State Forever: Why It Can Be Good For Your Analytics
Storing State Forever: Why It Can Be Good For Your AnalyticsStoring State Forever: Why It Can Be Good For Your Analytics
Storing State Forever: Why It Can Be Good For Your Analytics
 
It's Time To Stop Using Lambda Architecture
It's Time To Stop Using Lambda ArchitectureIt's Time To Stop Using Lambda Architecture
It's Time To Stop Using Lambda Architecture
 
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streaming
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to StreamingBravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streaming
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streaming
 
Apache Kafka: New Features That You Might Not Know About
Apache Kafka: New Features That You Might Not Know AboutApache Kafka: New Features That You Might Not Know About
Apache Kafka: New Features That You Might Not Know About
 
Building Scalable and Extendable Data Pipeline for Call of Duty Games: Lesson...
Building Scalable and Extendable Data Pipeline for Call of Duty Games: Lesson...Building Scalable and Extendable Data Pipeline for Call of Duty Games: Lesson...
Building Scalable and Extendable Data Pipeline for Call of Duty Games: Lesson...
 
Designing Scalable and Extendable Data Pipeline for Call Of Duty Games
Designing Scalable and Extendable Data Pipeline for Call Of Duty GamesDesigning Scalable and Extendable Data Pipeline for Call Of Duty Games
Designing Scalable and Extendable Data Pipeline for Call Of Duty Games
 
10 tips for making Bash a sane programming language
10 tips for making Bash a sane programming language10 tips for making Bash a sane programming language
10 tips for making Bash a sane programming language
 
Actors or Not: Async Event Architectures
Actors or Not: Async Event ArchitecturesActors or Not: Async Event Architectures
Actors or Not: Async Event Architectures
 
Kafka Streams: the easiest way to start with stream processing
Kafka Streams: the easiest way to start with stream processingKafka Streams: the easiest way to start with stream processing
Kafka Streams: the easiest way to start with stream processing
 
Building Stateful Microservices With Akka
Building Stateful Microservices With AkkaBuilding Stateful Microservices With Akka
Building Stateful Microservices With Akka
 
Querying Data Pipeline with AWS Athena
Querying Data Pipeline with AWS AthenaQuerying Data Pipeline with AWS Athena
Querying Data Pipeline with AWS Athena
 
Akka Microservices Architecture And Design
Akka Microservices Architecture And DesignAkka Microservices Architecture And Design
Akka Microservices Architecture And Design
 
Why Actor-Based Systems Are The Best For Microservices
Why Actor-Based Systems Are The Best For MicroservicesWhy Actor-Based Systems Are The Best For Microservices
Why Actor-Based Systems Are The Best For Microservices
 
Why actor-based systems are the best for microservices
Why actor-based systems are the best for microservicesWhy actor-based systems are the best for microservices
Why actor-based systems are the best for microservices
 
Building Eventing Systems for Microservice Architecture
Building Eventing Systems for Microservice Architecture  Building Eventing Systems for Microservice Architecture
Building Eventing Systems for Microservice Architecture
 
Быстрая и безболезненная разработка клиентской части веб-приложений
Быстрая и безболезненная разработка клиентской части веб-приложенийБыстрая и безболезненная разработка клиентской части веб-приложений
Быстрая и безболезненная разработка клиентской части веб-приложений
 

Recently uploaded

Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubaihf8803863
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degreeyuu sss
 
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一F La
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...ThinkInnovation
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhijennyeacort
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改yuu sss
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 

Recently uploaded (20)

Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
Call Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort ServiceCall Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort Service
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 

Streaming SQL for Data Engineers: The Next Big Thing?

  • 1. Streaming SQL for Data Engineers: The Next Big Thing?
  • 2.
  • 4. ● Apache Flink ● Apache Spark ● Apache Beam ● AWS Kinesis ● Google Cloud Dataflow ● Databricks ● ksqlDB ● … ● Meta ● LinkedIn ● Pinterest ● DoorDash ● Alibaba ● … Companies building internal platforms Open source and vendor solutions
  • 5.
  • 6. 👋 Hi, I’m Yaroslav
  • 7. 👋 Hi, I’m Yaroslav ● Principal Software Engineer @ Goldsky ● Staff Data Engineer @ Shopify ● Software Architect @ Activision ● …
  • 8. 👋 Hi, I’m Yaroslav ● Principal Software Engineer @ Goldsky ● Staff Data Engineer @ Shopify ● Software Architect @ Activision ● … ❤ Apache Flink
  • 9. 🤔 TableEnvironment tableEnv = TableEnvironment.create(/*…*/); Table revenue = tableEnv.sqlQuery( "SELECT cID, cName, SUM(revenue) AS revSum " + "FROM Orders " + "WHERE cCountry = 'FRANCE' " + "GROUP BY cID, cName" );
  • 10. … but why SQL?
  • 11. Why SQL? ● Wide adoption ● Declarative transformation model ● Planner! ● Common type system
  • 15. SELECT * FROM Orders INNER JOIN Product ON Orders.productId = Product.id ● LOTS of code! ● Create an operator to connect two streams ● Define and accumulate state ● Implement a mechanism for emitting the latest value per key SQL API DataStream API Declarative Transformation Model
  • 16. SELECT * FROM Orders INNER JOIN Product ON Orders.productId = Product.id SQL API Why not Table API? val orders = tEnv.from("Orders") .select($"productId", $"a", $"b") val products = tEnv.from("Products") .select($"id", $"c", $"d") val result = orders .join(products) .where($"productId" === $"id") .select($"a", $"b", $"c") Declarative Transformation Model
  • 17. SELECT * FROM ( SELECT *, ROW_NUMBER() OVER (PARTITION BY ticker ORDER BY price DESC) AS row_num FROM stock_table) WHERE row_num <= 10; Top-N Query Declarative Transformation Model
  • 18. Row Pattern Recognition in SQL (ISO/IEC TR 19075-5:2016) SELECT * FROM stock_table MATCH_RECOGNIZE( PARTITION BY ticker ORDER BY event_time MEASURES A.event_time AS initialPriceTime, C.event_time AS dropTime, A.price - C.price AS dropDiff, A.price AS initialPrice, C.price AS lastPrice ONE ROW PER MATCH AFTER MATCH SKIP PAST LAST ROW PATTERN (A B* C) WITHIN INTERVAL '10' MINUTES DEFINE B AS B.price > A.price - 500 )
  • 19. Flink Planner Migration From https://www.ververica.com/blog/a-journey-to-beating-flinks-sql-performance Planner Decoupling
  • 20. Planner Optimizations & Query Rewrite ● Predicate push down ● Projection push down ● Join rewrite ● Join elimination ● Constant inlining ● …
  • 21. SQL API DataStream API val postgresSink: SinkFunction[Envelope] = JdbcSink.sink( "INSERT INTO table " + "(id, number, timestamp, author, difficulty, size, vid, block_range) " + "VALUES (?, ?, ?, ?, ?, ?, ?, ?) " + "ON CONFLICT (id) DO UPDATE SET " + "number = excluded.number, " + "timestamp = excluded.timestamp, " + "author = excluded.author, " + "difficulty = excluded.difficulty, " + "size = excluded.size, " + "vid = excluded.vid, " + "block_range = excluded.block_range " + "WHERE excluded.vid > table.vid", new JdbcStatementBuilder[Envelope] { override def accept(statement: PreparedStatement, record: Envelope): Unit = { val payload = record.payload payload.id.foreach { id => statement.setString(1, id) } payload.number.foreach { number => statement.setBigDecimal(2, new java.math.BigDecimal(number)) } payload.timestamp.foreach { timestamp => statement.setBigDecimal(3, new java.math.BigDecimal(timestamp)) } payload.author.foreach { author => statement.setString(4, author) } payload.difficulty.foreach { difficulty => statement.setBigDecimal(5, new java.math.BigDecimal(difficulty)) } payload.size.foreach { size => statement.setBigDecimal(6, new java.math.BigDecimal(size)) } payload.vid.foreach { vid => statement.setLong(7, vid.toLong) } payload.block_range.foreach { block_range => statement.setObject(8, new PostgresIntRange(block_range), Types.O } }, CREATE TABLE TABLE ( id BIGINT, number INTEGER, timestamp TIMESTAMP, author STRING, difficulty STRING, size INTEGER, vid BIGINT, block_range STRING PRIMARY KEY (vid) NOT ENFORCED ) WITH ( 'connector' = 'jdbc', 'table-name' = 'table' ); 😱 Common Type System
  • 22. When you start using SQL you get access to the decades of advancements in database design
  • 23. When NOT to use ● Complex serialization / deserialization logic ● Low-level optimizations, especially with state and timers ● Not always debugging-friendly
  • 24. Dealing with Complexity UDFs for heavy lifting ● Calling 3rd-party libraries ● External calls ● Enrichments Templating ● Control structures ● dbt-style macros and references
  • 26. Ways to use Structured Statements dbt-style Project Notebooks Managed Runtime
  • 27. Requirements ● Version control ● Code organization ● Testability ● CI/CD ● Observability
  • 28. Structured Statements def revenueByCountry(country: String): Table = { tEnv.sqlQuery( s""" |SELECT name, SUM(revenue) AS totalRevenue |FROM Orders |WHERE country = '${country}' |GROUP BY name""".stripMargin ) }
  • 29. Structured Statements def revenueByCountry(country: String): Table = { tEnv.sqlQuery( s""" |SELECT name, SUM(revenue) AS totalRevenue |FROM Orders |WHERE country = '${country}' |GROUP BY name""".stripMargin ) } ✅ structure ✅ mock/stub for testing
  • 30. Structured Statements ● Treat them like code ● Only make sense when Table API is not available ● Mix with other API flavours ● SQL also has style guides ● Otherwise it’s a typical streaming application!
  • 31. Structured Statements ● Version control: 🟢 ● Code organization: 🟢 ● Testability: 🟡 ● CI/CD: 🟡 ● Observability: 🟢
  • 32. dbt-style Project ➔ models ◆ common ● users.sql ● users.yml ◆ sales.sql ◆ sales.yml ◆ … ➔ tests ◆ …
  • 33. dbt-style Project ➔ models ◆ common ● users.sql ● users.yml ◆ sales.sql ◆ sales.yml ◆ … ➔ tests ◆ … ✅ structured ✅ schematized ✅ testable
  • 34. dbt-style Project SELECT ((text::jsonb)->>'bid_price')::FLOAT AS bid_price, (text::jsonb)->>'order_quantity' AS order_quantity, (text::jsonb)->>'symbol' AS symbol, (text::jsonb)->>'trade_type' AS trade_type, to_timestamp(((text::jsonb)->'timestamp')::BIGINT) AS ts FROM {{ REF('market_orders_raw') }} {{ config(materialized='materializedview') }} SELECT symbol, AVG(bid_price) AS avg FROM {{ REF('market_orders') }} GROUP BY symbol
  • 35. dbt-style Project ● Works well for heavy analytical use-cases ● Could write tests in Python/Scala/etc. ● Probably needs more tooling than you think (state management, observability, etc.) ● Check dbt adapter from Materialize!
  • 36. dbt-style Project ● Version control: 🟢 ● Code organization: 🟢 ● Testability: 🟡 ● CI/CD: 🟡 ● Observability: 🟡
  • 39. Notebooks ● Great UX ● Ideal for exploratory analysis and BI ● Complements all other patterns really well ● Way more important for realtime workloads
  • 40. Notebooks We don't recommend productionizing notebooks and instead encourage empowering data scientists to build production-ready code with the right programming frameworks https://www.thoughtworks.com/en-ca/radar/technique s/productionizing-notebooks
  • 41. Notebooks ● Version control: 🟡 ● Code organization: 🔴 ● Testability: 🔴 ● CI/CD: 🔴 ● Observability: 🔴
  • 43. Managed Runtime ● Managed ≈ “Serverless” ● Auto-scaling ● Automated deployments, rollbacks, etc. ● Testing for different layers is decoupled (runtime vs jobs)
  • 44. Managed Runtime Reference Architecture Control Plane Data Plane API Reconciler Streaming Job UI CLI
  • 45. Any managed runtime requires excellent developer experience to succeed
  • 46. Managed Runtime: Ideal Developer Experience Notebooks UX SELECT * … SELECT * …
  • 47. Managed Runtime: Ideal Developer Experience Version Control Integration SELECT * … SELECT * …
  • 48. Managed Runtime: Ideal Developer Experience dbt-style Project Structure SELECT * … SELECT * … ➔ models ◆ common ◆ sales ◆ shipping ◆ marketing ◆ …
  • 49. Managed Runtime: Ideal Developer Experience Versioning SELECT * … SELECT * … ● Version 1 ● Version 2 ● Version 3 ● …
  • 50. Managed Runtime: Ideal Developer Experience Previews SELECT * … SELECT * … User Count Irene 100 Alex 53 Josh 12 Jane 1
  • 51. Managed Runtime ● Version control: 🟢 ● Code organization: 🟢 ● Testability: 🟡 ● CI/CD: 🟢 ● Observability: 🟢
  • 52. Summary Structured Statements dbt-style Project Notebooks Managed Runtime Version Control 🟢 🟢 🟡 🟢 Code Organization 🟢 🟢 🔴 🟢 Testability 🟡 🟡 🔴 🟡 CI/CD 🟡 🟡 🔴 🟢 Observability 🟢 🟡 🔴 🟢 Complexity 🟢 🟡 🟡 🔴
  • 53. General Guidelines ● Long-running streaming apps require special attention to state management ● Try to avoid mutability: every change is a new version ● Integration testing > unit testing ● Embrace the SRE mentality
  • 56.