1
KSQL – Streaming SQL
for Apache Kafka
Matthias J. Sax | Software Engineer
matthias@confluent.io
@MatthiasJSax
2
1.0 Enterprise
Ready
A Brief History of Kafka and Confluent
0.11 Exactly-once
semantics
0.10 Data processing
(Streams API)
0.9 Data integration
(Connect API)
Intra-cluster
replication
0.8
2012 2014
Cluster mirroring0.7
2015 2016 20172013 2018
CP 4.1
KSQL GA
3
Why KSQL?
• Enable Stream Processing for Non-Engineers
• Everybody knows SQL
• Look Ma, no code!
• Declarative stream processing language
• Fast prototyping
• Streaming SQL engine for Apache Kafka
• Streaming ETL
• Ad-hoc topic inspection
4
Trade-Offs
• subscribe()
• poll()
• send()
• flush()
Consumer,
Producer
• filter()
• join()
• aggregate()
Kafka Streams
• Select…from…
• Join…where…
• Group by..
KSQL
Flexibility Simplicity
5
Core Concepts KSQL
6
CREATE STREAM clickstream (
time BIGINT,
url VARCHAR,
status INTEGER,
bytes INTEGER,
user_id VARCHAR,
agent VARCHAR)
WITH (
value_format = 'JSON',
kafka_topic = 'my_clickstream_topic'
);
Creating a Stream
7
Querying Streams
CREATE STREAM user_clicks AS
SELECT *
FROM clickstream
WHERE user_id = 'mjsax';
8
CREATE TABLE clicks AS
SELECT user_id, COUNT(url)
FROM clickstream
WINDOW TUMBLING (size 30 seconds)
GROUP BY user_id
HAVING COUNT(url) > 20
WHERE bytes > 1024;
Windowed Aggregations
9
Windowed Aggregations
10
Core Concepts KSQL
Do you think that’s a table you are querying ?
12
CREATE TABLE users (
user_id INTEGER,
registered_at LONG,
username VARCHAR,
name VARCHAR,
city VARCHAR,
level VARCHAR)
WITH (
key = 'user_id',
kafka_topic = 'clickstream_users',
value_format = 'AVRO');
Creating a Table
13
Confluent Schema Registry FTW!
CREATE TABLE users
WITH (
key = 'user_id',
kafka_topic = 'clickstream_users',
value_format = 'AVRO');
Creating a Table
14
Using Tables
15
CREATE STREAM vip_actions AS
SELECT c.user_id, fullname, url, status
FROM clickstream c
LEFT JOIN users u ON c.user_id = u.user_id
WHERE u.level = 'Platinum';
Joins for Enrichment
16
Joins for Enrichment
17
Stream-Table-Duality
18
How to deploy and use KSQL
19
How to run KSQL
JVM
KSQL Server
KSQL CLI
JVM
KSQL Server
JVM
KSQL Server
Kafka Cluster
#1 Client-server
20
How to run KSQL
#1 Client-server
• Start any number of server nodes
bin/ksql-server-start
• Start one or more CLIs and point them to a server
bin/ksql https://myksqlserver:8090
• All servers share the processing load
Technically, instances of the same Kafka Streams Applications
Scale up/down without restart
21
How to run KSQL
JVM
KSQL Server
JVM
KSQL Server
JVM
KSQL Server
#2 as a standalone Application
Kafka Cluster
22
How to run KSQL
#2 as a standalone Application
• Start any number of server nodes
Pass a file of KSQL statement to execute
bin/ksql-node query-file=foo/bar.sql
• Ideal for streaming ETL application deployment
Version-control your queries and transformations as code
• All running engines share the processing load
Technically, instances of the same Kafka Streams Applications
Scale up/down without restart
23
How to run KSQL
#3 EMBEDDED IN AN APPLICATION
JVM App Instance
KSQL Engine
Application Code
JVM App Instance
KSQL Engine
Application Code
JVM App Instance
KSQL Engine
Application Code
Kafka Cluster
24
How to run KSQL
#3 EMBEDDED IN AN APPLICATION
• Embed directly in your Java application
• Generate and execute KSQL queries through the Java API
Version-control your queries and transformations as code
• All running application instances share the processing load
Technically, instances of the same Kafka Streams Applications
Scale up/down without restart
25
Internals
26
Internals
Read input from Kafka
Operator DAG:
• filter/map/aggregation/joins
• Operators can be stateful
Write result back to Kafka
27
Internals
28
Runtime: Kafka Streams
29
Distributed State
30
Scaling
31
Take home
• Streaming SQL engine for Apache Kafka
• Leverages Kafka Streams
• Open Source: Apache 2.0
• https://github.com/confluentinc/ksql/
• Included in Confluent Platform 4.1
• https://www.confluent.io/product/ksql/
• https://docs.confluent.io/current/ksql/docs/index.html
32
Thank You
We are hiring!

KSQL---Streaming SQL for Apache Kafka

  • 1.
    1 KSQL – StreamingSQL for Apache Kafka Matthias J. Sax | Software Engineer matthias@confluent.io @MatthiasJSax
  • 2.
    2 1.0 Enterprise Ready A BriefHistory of Kafka and Confluent 0.11 Exactly-once semantics 0.10 Data processing (Streams API) 0.9 Data integration (Connect API) Intra-cluster replication 0.8 2012 2014 Cluster mirroring0.7 2015 2016 20172013 2018 CP 4.1 KSQL GA
  • 3.
    3 Why KSQL? • EnableStream Processing for Non-Engineers • Everybody knows SQL • Look Ma, no code! • Declarative stream processing language • Fast prototyping • Streaming SQL engine for Apache Kafka • Streaming ETL • Ad-hoc topic inspection
  • 4.
    4 Trade-Offs • subscribe() • poll() •send() • flush() Consumer, Producer • filter() • join() • aggregate() Kafka Streams • Select…from… • Join…where… • Group by.. KSQL Flexibility Simplicity
  • 5.
  • 6.
    6 CREATE STREAM clickstream( time BIGINT, url VARCHAR, status INTEGER, bytes INTEGER, user_id VARCHAR, agent VARCHAR) WITH ( value_format = 'JSON', kafka_topic = 'my_clickstream_topic' ); Creating a Stream
  • 7.
    7 Querying Streams CREATE STREAMuser_clicks AS SELECT * FROM clickstream WHERE user_id = 'mjsax';
  • 8.
    8 CREATE TABLE clicksAS SELECT user_id, COUNT(url) FROM clickstream WINDOW TUMBLING (size 30 seconds) GROUP BY user_id HAVING COUNT(url) > 20 WHERE bytes > 1024; Windowed Aggregations
  • 9.
  • 10.
  • 11.
    Do you thinkthat’s a table you are querying ?
  • 12.
    12 CREATE TABLE users( user_id INTEGER, registered_at LONG, username VARCHAR, name VARCHAR, city VARCHAR, level VARCHAR) WITH ( key = 'user_id', kafka_topic = 'clickstream_users', value_format = 'AVRO'); Creating a Table
  • 13.
    13 Confluent Schema RegistryFTW! CREATE TABLE users WITH ( key = 'user_id', kafka_topic = 'clickstream_users', value_format = 'AVRO'); Creating a Table
  • 14.
  • 15.
    15 CREATE STREAM vip_actionsAS SELECT c.user_id, fullname, url, status FROM clickstream c LEFT JOIN users u ON c.user_id = u.user_id WHERE u.level = 'Platinum'; Joins for Enrichment
  • 16.
  • 17.
  • 18.
    18 How to deployand use KSQL
  • 19.
    19 How to runKSQL JVM KSQL Server KSQL CLI JVM KSQL Server JVM KSQL Server Kafka Cluster #1 Client-server
  • 20.
    20 How to runKSQL #1 Client-server • Start any number of server nodes bin/ksql-server-start • Start one or more CLIs and point them to a server bin/ksql https://myksqlserver:8090 • All servers share the processing load Technically, instances of the same Kafka Streams Applications Scale up/down without restart
  • 21.
    21 How to runKSQL JVM KSQL Server JVM KSQL Server JVM KSQL Server #2 as a standalone Application Kafka Cluster
  • 22.
    22 How to runKSQL #2 as a standalone Application • Start any number of server nodes Pass a file of KSQL statement to execute bin/ksql-node query-file=foo/bar.sql • Ideal for streaming ETL application deployment Version-control your queries and transformations as code • All running engines share the processing load Technically, instances of the same Kafka Streams Applications Scale up/down without restart
  • 23.
    23 How to runKSQL #3 EMBEDDED IN AN APPLICATION JVM App Instance KSQL Engine Application Code JVM App Instance KSQL Engine Application Code JVM App Instance KSQL Engine Application Code Kafka Cluster
  • 24.
    24 How to runKSQL #3 EMBEDDED IN AN APPLICATION • Embed directly in your Java application • Generate and execute KSQL queries through the Java API Version-control your queries and transformations as code • All running application instances share the processing load Technically, instances of the same Kafka Streams Applications Scale up/down without restart
  • 25.
  • 26.
    26 Internals Read input fromKafka Operator DAG: • filter/map/aggregation/joins • Operators can be stateful Write result back to Kafka
  • 27.
  • 28.
  • 29.
  • 30.
  • 31.
    31 Take home • StreamingSQL engine for Apache Kafka • Leverages Kafka Streams • Open Source: Apache 2.0 • https://github.com/confluentinc/ksql/ • Included in Confluent Platform 4.1 • https://www.confluent.io/product/ksql/ • https://docs.confluent.io/current/ksql/docs/index.html
  • 32.