Query Kafka with SQL
Jove Zhong
Co-Founder and Head of Product, Timeplus
Gang Tao
Co-Founder and CTO, Timeplus
Why, how, and what’s next?
Sep 27, 2023
image credit: teacoffeecup.com
sed 's/coffee/SQL on Kafka/g'
Real-time data is everywhere, at the edge and cloud
46 ZB of data created by billions of IoT devices by 2025
30% of data generated will be real-time by 2025
Only 1% of data is analyzed, and streaming data is primarily untapped
Why SQL on Kafka?
Why SQL on Database?
ret = open_database(&(my_stock->inventory_dbp)..);
my_database->get(my_database, NULL, &key, &data, 0);
client.get(key)
update_bins = {'b': u"\ud83d\ude04", 'i': aerospike.null()}
client.put(key, update_bins)
request = new GetItemRequest()
.withKey(key_to_get)
.withTableName(table_name);
SELECT * FROM tab WHERE id='id1'
UPDATE tab SET flag=FALSE WHERE id='id1'
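The contrast above can be tried end to end with a minimal sketch. It assumes an in-memory SQLite database as a stand-in for any SQL engine; the table layout (`id`, `flag`) is inferred from the two statements on the slide and is otherwise hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for "a database" here (an assumption of this sketch).
conn = sqlite3.connect(":memory:")

# Hypothetical table layout, inferred from the columns used on the slide.
conn.execute("CREATE TABLE tab (id TEXT PRIMARY KEY, flag BOOLEAN)")
conn.execute("INSERT INTO tab VALUES ('id1', 1)")

# The two statements from the slide, unchanged.
rows_before = conn.execute("SELECT * FROM tab WHERE id='id1'").fetchall()
conn.execute("UPDATE tab SET flag=FALSE WHERE id='id1'")
rows_after = conn.execute("SELECT * FROM tab WHERE id='id1'").fetchall()

print(rows_before)
print(rows_after)
```

The same two statements would work against most SQL engines, whereas each client API shown above needs its own calls and types.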
Why SQL on Kafka?
Reliable Fast Easy
Powerful Descriptive
FinTech
● Real-time post-trade analytics
● Real-time pricing
DevOps
● Real-time Github insights
● Real-time o11y and usage-based pricing
Security Compliance
● SOC2 compliance
● Container vulnerability monitoring
● Monitor Superblocks user activities
● Protect sensitive info in Slack
IoT
● Real-time fleet monitoring
Customer 360
● Auth0 notifications for new signups
● HubSpot custom dashboards/alerts
● Jitsu clickstream analytics
● Real-time Twitter marketing
Misc
● Wildfire monitoring and alerting
● Data-driven parent
Sample Use Cases
source: https://docs.timeplus.com/showcases
How do you like your coffee?
Flink ksqlDB Hazelcast
Druid Pinot
Trino
ClickHouse StarRocks
RisingWave Databend
Streaming Processor | Streaming Database | Real-time Database
FlinkSQL
since 2016
Community ☕☕☕☕
Real-time ☕☕☕
Streaming ☕☕☕
Historical ☕
JOIN ☕☕☕☕
Largescale ☕☕☕☕
Lightweight ☕☕
Easy to use ☕☕
Community ☕☕☕
Real-time ☕☕☕
Streaming ☕☕☕
Historical ☕☕
JOIN ☕☕☕
Largescale ☕☕
Lightweight ☕☕
Easy to use ☕☕☕
ksqlDB
since 2019
Distributed computation and storage platform
No dependency on disk storage: it keeps all its operational state in the RAM of the cluster.
Flink ksqlDB Hazelcast
Druid Pinot
Trino
Streaming Processor | Streaming Database | Real-time Database
1. create a schema json (columns, PKs)
2. create a table configuration json (streamType=Kafka)
3. docker run .. apachepinot/pinot:latest AddTable \
   -schemaFile /tmp/transcript-schema.json \
   -tableConfigFile /tmp/transcript-table-realtime.json \
   .. \
   -exec
1. Load the druid-kafka-indexing-service extension on both the Overlord and the MiddleManagers.
2. Create a supervisor-spec.json file containing the Kafka supervisor spec.
3. curl -X POST -H 'Content-Type: application/json' -d @supervisor-spec.json http://localhost:8090/druid/indexer/v1/supervisor
Add a catalog properties file etc/catalog/kafka.properties for the Kafka connector.
$ ./trino --catalog kafka --schema aSchema
trino:aSchema> SELECT count(*) FROM customer;
ClickHouse StarRocks
Streaming Processor | Streaming Database | Real-time Database
the Next-Generation Streaming Database
(Kafka + Flink + ClickHouse )
SQL with streaming extension
[Architecture diagram; labels: Data Ingestion, Unified Query Processing Pipeline, ingest, append, stream, read historical, read streaming, streaming storage, historical storage, query, Kafka External Stream]
SELECT * FROM car_live_data
Stream tail
SELECT count(*) FROM car_live_data
Global aggregation
SELECT window_start, count(*)
FROM tumble(car_live_data, 1m)
GROUP BY window_start
Window aggregation
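To make the window aggregation concrete, here is a minimal sketch of what a tumbling window computes: each event is assigned to exactly one fixed-size, non-overlapping window, and events are counted per window. It works on a finite list rather than an unbounded stream, and the function name and sample events are hypothetical, not part of any product API.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def tumble_counts(event_times, size):
    """Count events per tumbling window; returns sorted (window_start, count) pairs."""
    counts = Counter()
    for event_time in event_times:
        # Align the event time down to the start of its window.
        index = (event_time - EPOCH) // size  # timedelta // timedelta gives an int
        window_start = EPOCH + index * size
        counts[window_start] += 1
    return sorted(counts.items())


# Hypothetical sample: three events, two of them in the same one-minute window.
base = datetime(2023, 9, 27, 10, 0, 0, tzinfo=timezone.utc)
sample = [base + timedelta(seconds=s) for s in (5, 50, 65)]
windows = tumble_counts(sample, timedelta(minutes=1))
for window_start, count in windows:
    print(window_start.isoformat(), count)
```

A streaming engine does the same bucketing incrementally and emits each window when it closes, instead of waiting for the input to end.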
SELECT cid,
speed_kmh,
lag(speed_kmh) OVER
(PARTITION BY cid) AS last_spd
FROM car_live_data
Sub streams
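The sub-stream query keeps a separate "previous value" per car. A minimal sketch of that idea follows, assuming rows arrive as dictionaries with `cid` and `speed_kmh` keys and that a car's first row has no previous speed (represented here as `None`; an engine may use a different default).

```python
def with_last_speed(rows):
    """Yield each row plus last_spd, the previous speed_kmh seen for the same cid."""
    last_seen = {}  # cid -> most recent speed_kmh for that car
    for row in rows:
        cid = row["cid"]
        yield {**row, "last_spd": last_seen.get(cid)}
        last_seen[cid] = row["speed_kmh"]


# Hypothetical sample rows from two cars, interleaved.
sample_rows = [
    {"cid": "c1", "speed_kmh": 50},
    {"cid": "c2", "speed_kmh": 30},
    {"cid": "c1", "speed_kmh": 55},
]
result = list(with_last_speed(sample_rows))
for row in result:
    print(row)
```

Because the state is keyed by `cid`, rows from different cars never affect each other, which is the effect of `PARTITION BY cid`.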
SELECT window_start, count(*)
FROM tumble(car_live_data, 5s)
GROUP BY window_start
EMIT AFTER WATERMARK AND DELAY 2s
Late event
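One common way to handle late events is sketched below: track a watermark (the largest event time seen so far) and only close a window once the watermark has passed the window end plus a grace delay. This is a generic illustration with event times as plain seconds; it is not confirmed to match the exact semantics of `EMIT AFTER WATERMARK AND DELAY`, and all names are hypothetical.

```python
def tumble_with_delay(event_times, size, delay):
    """Return (emitted, dropped): closed (window_start, count) pairs and the late-drop count."""
    open_windows = {}  # window_start -> running count
    emitted = []
    dropped = 0
    watermark = float("-inf")
    for t in event_times:  # events in arrival order, possibly out of order in time
        start = (t // size) * size
        if start + size + delay <= watermark:
            # The window was already closed: this event is too late.
            dropped += 1
            continue
        open_windows[start] = open_windows.get(start, 0) + 1
        watermark = max(watermark, t)
        # Close every window whose end plus delay is at or before the watermark.
        for ws in sorted(open_windows):
            if ws + size + delay <= watermark:
                emitted.append((ws, open_windows.pop(ws)))
    return emitted, dropped


# Hypothetical arrival order: 4 arrives late but within the delay, 2 arrives too late.
emitted, dropped = tumble_with_delay([1, 3, 6, 4, 8, 2, 12], size=5, delay=2)
print(emitted, dropped)
```

With a 5-second window and a 2-second delay, the event at time 4 is still counted in window 0 even though it arrives after the event at time 6, while the event at time 2 arrives after window 0 has closed and is dropped.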
SELECT *
FROM car_live_data
WHERE
_tp_time > now() - 1d
Time travel
Community ☕☕
Real-time ☕☕☕☕
Streaming ☕☕☕
Historical ☕☕☕☕
JOIN ☕☕☕☕
Largescale ☕☕
Lightweight ☕☕☕☕
Easy to use ☕☕☕
since 2021
Mocha
one more thing
ClickHouse StarRocks
CREATE TABLE queue2 (
    timestamp UInt64,
    level String,
    message String
)
ENGINE = Kafka
SETTINGS
    kafka_broker_list = 'localhost:9092',
    kafka_topic_list = 'topic',
    kafka_group_name = 'group1',
    kafka_format = 'JSONEachRow',
    kafka_num_consumers = 4;
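The `JSONEachRow` format in the table definition above expects one JSON object per line. As a rough sketch of the shape of message this table would accept, the following parses such a payload into `(timestamp, level, message)` tuples; the sample payload and the function name are hypothetical.

```python
import json


def parse_json_each_row(payload):
    """Parse newline-delimited JSON into (timestamp, level, message) tuples."""
    rows = []
    for line in payload.splitlines():
        if not line.strip():
            continue  # skip blank lines between records
        record = json.loads(line)
        rows.append((int(record["timestamp"]), str(record["level"]), str(record["message"])))
    return rows


# Hypothetical payload matching the three columns of the table above.
sample_payload = (
    '{"timestamp": 1695800000, "level": "INFO", "message": "started"}\n'
    '{"timestamp": 1695800001, "level": "ERROR", "message": "disk full"}\n'
)
parsed = parse_json_each_row(sample_payload)
print(parsed)
```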
CREATE ROUTINE LOAD test_db.table102 ON table1
COLUMNS TERMINATED BY ",",
COLUMNS (user_id, user_gender, event_date, event_type)
WHERE event_type = 1
FROM KAFKA
(
    "kafka_broker_list" = "<kafka_broker_host>:<kafka_broker_port>",
    "kafka_topic" = "topic1",
    "property.kafka_default_offsets" = "OFFSET_BEGINNING"
);
ClickHouse features
● table engine and table function
● rich functions and data types
● not 100% ANSI SQL compatible
Streaming Processor | Streaming Database | Real-time Database
RisingWave
Databend
dozer -c dozer-config.yaml
curl -X POST http://localhost:8080/tout/query --header 'Content-Type: application/json'
Community ☕☕☕
Real-time ☕☕☕
Streaming ☕☕☕
Historical ☕☕
JOIN ☕☕☕
Largescale ☕☕☕
Lightweight ☕☕☕☕
Easy to use ☕☕☕
Cappuccino
Programming - turn data into insight
human
machine
1GL - machine language
2GL - assembly language
3GL - imperative language
4GL - descriptive language
5GL - intelligent language
data
insight
source
Streaming Processor
● SQL as data pipeline
● No data storage
● Unbounded real-time query
ETL / Data Pipeline
ingest
external
Real-time Database
● Mostly leveraging Kafka to ingest data
● Federated search/query
○ ClickHouse Kafka Engine
○ Trino
● Bounded batch query, no streaming query
Historical Report / Ad hoc Analysis
source
Streaming Database
● Supports Kafka data storage
● Unbounded real-time query
● Combination of real-time data and historical data
Hybrid
Query Kafka with SQL: Open Source + Cloud + Source Available
Flink
ksqlDB Hazelcast Druid Pinot Trino
ClickHouse StarRocks
RisingWave
Databend
Streaming Processor Streaming Database Realtime Database
Community ☕☕☕☕
Real-time ☕☕☕
Streaming ☕☕☕
Historical ☕
JOIN ☕☕☕☕
Largescale ☕☕☕☕
Lightweight ☕☕
Easy to use ☕☕
Community ☕☕☕
Real-time ☕☕☕
Streaming ☕☕☕
Historical ☕☕
JOIN ☕☕☕
Largescale ☕☕
Lightweight ☕☕
Easy to use ☕☕☕
Community ☕☕
Real-time ☕☕☕☕
Streaming ☕☕☕
Historical ☕☕☕☕
JOIN ☕☕☕☕
Largescale ☕☕
Lightweight ☕☕☕☕
Easy to use ☕☕☕
Community ☕☕☕
Real-time ☕☕☕
Streaming ☕☕☕
Historical ☕☕
JOIN ☕☕☕
Largescale ☕☕☕
Lightweight ☕☕☕☕
Easy to use ☕☕☕
Q+A / Thank you!
Meet us at booth #407
Try Timeplus Proton (Open Source)
Or sign up for a free cloud account
timeplus.com

Pigging Solutions Piggable Sweeping Elbows
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 

Query Your Streaming Data on Kafka using SQL: Why, How, and What

  • 1. Query Kafka with SQL: Why, how, and what’s next? Jove Zhong (Co-Founder and Head of Product, Timeplus) and Gang Tao (Co-Founder and CTO, Timeplus). Sep 27, 2023
  • 2. image credit: teacoffeecup.com sed 's/coffee/SQL on Kafka/g'
  • 3. Real-time data is everywhere, at the edge and in the cloud. 46 ZB of data will be created by billions of IoT devices by 2025; 30% of data generated will be real-time by 2025; only 1% of data is analyzed, and streaming data is primarily untapped.
  • 4. Why SQL on Kafka?
  • 5. Why SQL on Database?
    ret = open_database(&(my_stock->inventory_dbp)..);
    my_database->get(my_database, NULL, &key, &data, 0);
    client.get(key)
    update_bins = {'b': u"\ud83d\ude04", 'i': aerospike.null()}
    client.put(key, update_bins)
    request = new GetItemRequest().withKey(key_to_get).withTableName(table_name);
    SELECT * FROM tab WHERE id='id1'
    UPDATE tab SET flag=FALSE WHERE id='id1'
  • 6. Why SQL on Kafka? Reliable, Fast, Easy, Powerful, Descriptive
  • 7. Sample Use Cases (source: https://docs.timeplus.com/showcases)
    FinTech: real-time post-trade analytics; real-time pricing
    DevOps: real-time GitHub insights; real-time o11y and usage-based pricing
    Security Compliance: SOC2 compliance; container vulnerability monitoring; monitor Superblocks user activities; protect sensitive info in Slack
    IoT: real-time fleet monitoring
    Customer 360: Auth0 notifications for new signups; HubSpot custom dashboards/alerts; Jitsu clickstream analytics; real-time Twitter marketing
    Misc: wildfire monitoring and alerting; data-driven parent
  • 8. How do you like your coffee? Flink, ksqlDB, Hazelcast, Druid, Pinot, Trino, ClickHouse, StarRocks, RisingWave, Databend. Categories: Streaming Processor, Streaming Database, Real-time Database
  • 10. FlinkSQL (since 2016): Community ☕☕☕☕ · Real-time ☕☕☕ · Streaming ☕☕☕ · Historical ☕ · JOIN ☕☕☕☕ · Large-scale ☕☕☕☕ · Lightweight ☕☕ · Easy to use ☕☕
  • 11. ksqlDB (since 2019): Community ☕☕☕ · Real-time ☕☕☕ · Streaming ☕☕☕ · Historical ☕☕ · JOIN ☕☕☕ · Large-scale ☕☕ · Lightweight ☕☕ · Easy to use ☕☕☕
  • 12. Distributed computation and storage platform. No dependency on disk storage; it keeps all its operational state in the RAM of the cluster. Flink, ksqlDB, Hazelcast, Druid, Pinot, Trino. Categories: Streaming Processor, Streaming Database, Real-time Database
  • 13. Pinot:
    1. Create a schema JSON (columns, PKs)
    2. Create a table configuration JSON (streamType=Kafka)
    3. docker run .. apachepinot/pinot:latest AddTable -schemaFile /tmp/transcript-schema.json -tableConfigFile /tmp/transcript-table-realtime.json .. -exec
    Druid:
    1. Load the druid-kafka-indexing-service extension on both the Overlord and the MiddleManagers
    2. Create a supervisor-spec.json containing the Kafka supervisor spec
    3. curl -X POST -H 'Content-Type: application/json' -d @supervisor-spec.json http://localhost:8090/druid/indexer/v1/supervisor
  • 14. Trino: add a catalog properties file etc/catalog/kafka.properties for the Kafka connector, then query:
    $ ./trino --catalog kafka --schema aSchema
    trino:aSchema> SELECT count(*) FROM customer;
  • 16. The Next-Generation Streaming Database (Kafka + Flink + ClickHouse). [Architecture diagram: data ingestion (ingest, append); SQL with streaming extension; unified query processing pipeline with stream read and historical read; streaming storage and historical storage; query; Kafka External Stream]
  • 17. Stream tail: SELECT * FROM car_live_data
    Global aggregation: SELECT count(*) FROM car_live_data
    Window aggregation: SELECT window_start, count(*) FROM tumble(car_live_data, 1m) GROUP BY window_start
    Sub streams: SELECT cid, speed_kmh, lag(speed_kmh) OVER (PARTITION BY cid) AS last_spd FROM car_live_data
    Late event: SELECT window_start, count(*) FROM tumble(car_live_data, 5s) GROUP BY window_start EMIT AFTER WATERMARK AND DELAY 2s
    Time travel: SELECT * FROM car_live_data WHERE _tp_time > now() - 1d
  • 20. ClickHouse:
    CREATE TABLE queue2 (
      timestamp UInt64,
      level String,
      message String
    ) ENGINE = Kafka
    SETTINGS kafka_broker_list = 'localhost:9092',
             kafka_topic_list = 'topic',
             kafka_group_name = 'group1',
             kafka_format = 'JSONEachRow',
             kafka_num_consumers = 4;
    StarRocks:
    CREATE ROUTINE LOAD test_db.table102 ON table1
    COLUMNS TERMINATED BY ",",
    COLUMNS (user_id, user_gender, event_date, event_type)
    WHERE event_type = 1
    FROM KAFKA (
      "kafka_broker_list" = "<kafka_broker_host>:<kafka_broker_port>",
      "kafka_topic" = "topic1",
      "property.kafka_default_offsets" = "OFFSET_BEGINNING"
    );
    ClickHouse features: table engine and table function; rich functions and data types; not 100% ANSI compatible
  • 23. dozer -c dozer-config.yaml
    curl -X POST http://localhost:8080/tout/query --header 'Content-Type: application/json'
  • 24. Cappuccino: Community ☕☕☕ · Real-time ☕☕☕ · Streaming ☕☕☕ · Historical ☕☕ · JOIN ☕☕☕ · Large-scale ☕☕☕ · Lightweight ☕☕☕☕ · Easy to use ☕☕☕
  • 26. Programming - turn data into insight (between human and machine): 1GL - machine language; 2GL - assembly language; 3GL - imperative language; 4GL - descriptive language; 5GL - intelligent language
  • 27. Streaming Processor (ETL / Data Pipeline): SQL as data pipeline; no data storage; unbounded real-time query
    Realtime Database (Historical Report / Ad hoc Analysis): mostly leveraging Kafka to ingest data; federation search/query (ClickHouse Kafka Engine, Trino); bounded batch query, no streaming query
    Streaming Database (Hybrid): supports Kafka data storage; unbounded real-time query; combination of real-time data and historical data
  • 28. Query Kafka with SQL: Open Source + Cloud + Source Available. Flink, ksqlDB, Hazelcast, Druid, Pinot, Trino, ClickHouse, StarRocks, RisingWave, Databend. Categories: Streaming Processor, Streaming Database, Realtime Database
  • 29. Community ☕☕☕☕ · Real-time ☕☕☕ · Streaming ☕☕☕ · Historical ☕ · JOIN ☕☕☕☕ · Large-scale ☕☕☕☕ · Lightweight ☕☕ · Easy to use ☕☕
    Community ☕☕☕ · Real-time ☕☕☕ · Streaming ☕☕☕ · Historical ☕☕ · JOIN ☕☕☕ · Large-scale ☕☕ · Lightweight ☕☕ · Easy to use ☕☕☕
    Community ☕☕ · Real-time ☕☕☕☕ · Streaming ☕☕☕ · Historical ☕☕☕☕ · JOIN ☕☕☕☕ · Large-scale ☕☕ · Lightweight ☕☕☕☕ · Easy to use ☕☕☕
    Community ☕☕☕ · Real-time ☕☕☕ · Streaming ☕☕☕ · Historical ☕☕ · JOIN ☕☕☕ · Large-scale ☕☕☕ · Lightweight ☕☕☕☕ · Easy to use ☕☕☕
  • 30. Q+A / Thank you! Meet us at booth #407. Try Timeplus Proton (Open Source) or sign up for a free cloud account at timeplus.com
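
To make the window aggregation on slide 17 concrete (SELECT window_start, count(*) FROM tumble(car_live_data, 1m) GROUP BY window_start), here is a minimal Python sketch of what a tumbling window count computes. It is an illustration only and not how Timeplus or any engine listed in the slides implements it: the helper names `window_start` and `tumble_count`, the `(event_time, car_id)` tuple layout, and the sample events are all assumptions made up for this example, and a finite list stands in for an unbounded Kafka topic.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Reference point used to align windows to fixed boundaries (hypothetical helper constant).
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def window_start(event_time, size):
    """Return the start of the fixed-size (tumbling) window that contains event_time."""
    offset = (event_time - EPOCH) % size
    return event_time - offset


def tumble_count(events, size):
    """Count events per tumbling window.

    events: iterable of (event_time, car_id) tuples (assumed layout).
    Returns a list of (window_start, count) pairs sorted by window start.
    """
    counts = Counter(window_start(ts, size) for ts, _ in events)
    return sorted(counts.items())


# Made-up sample events standing in for rows of car_live_data.
base = datetime(2023, 9, 27, 10, 0, 0, tzinfo=timezone.utc)
events = [
    (base + timedelta(seconds=5), "c001"),
    (base + timedelta(seconds=42), "c002"),
    (base + timedelta(seconds=61), "c001"),
    (base + timedelta(seconds=130), "c003"),
    (base + timedelta(seconds=170), "c002"),
]

result = tumble_count(events, timedelta(minutes=1))
for start, count in result:
    print(start.isoformat(), count)
```

A streaming engine evaluates the same grouping continuously and emits each window as it closes; handling late events (the EMIT AFTER WATERMARK AND DELAY clause on the same slide) would additionally require tracking a watermark, which this sketch leaves out.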