KSQLOpen-source streaming for Apache Kafka
let's look at
Kafka Streams
First!
But ...is a
Trade-Offs
• subscribe()
• poll()
• send()
• flush()
Consumer,
Producer
Flexibility Simplicity
Trade-Offs
• subscribe()
• poll()
• send()
• flush()
Consumer,
Producer
• filter()
• join()
• aggregate()
Kafka Streams
Flexibility Simplicity
Before
DashboardProcessing Cluster
Your Job
Shared Database
After
Dashboard
APP
Streams
API
Things Kafka Streams Does
Runs
everywhere
Clustering
done for you
Exactly-once
processing
Event-time
processing
Integrated
database
Joins, windowing,
aggregation
S/M/L/XL/XXL/XXXL
sizes
A Familiar DSL
A Kafka story!
https://github.com/framiere/a-kafka-story
No Panic, it's a walktrough!
https://github.com/framiere/a-kafka-story/tree/master/step6
Expressive ... but verbose
Trade-Offs
• subscribe()
• poll()
• send()
• flush()
Consumer,
Producer
• filter()
• join()
• aggregate()
Kafka Streams
• Select…from…
• Join…where…
• Group by..
KSQL
Flexibility Simplicity
Declarative
Stream
Language
Processing
KSQLis a
KSQLis the
Streaming
SQL Engine
for
Apache Kafka
Stream Processing by Analogy
Kafka Cluster
Connect API Stream Processing Connect API
$ cat < in.txt | grep “ksql” | tr a-z A-Z > out.txt
KSQL
are some
what
use cases?
KSQL for Data Exploration
SELECT status, bytes
FROM clickstream
WHERE user_agent =
‘Mozilla/5.0 (compatible; MSIE 6.0)’;
An easy way to inspect data in a running cluster
KSQL for Streaming ETL
• Kafka is popular for data pipelines.
• KSQL enables easy transformations of data within the pipe.
• Transforming data while moving from Kafka to another system.
CREATE STREAM vip_actions AS 

SELECT userid, page, action FROM clickstream c
LEFT JOIN users u ON c.userid = u.user_id 

WHERE u.level = 'Platinum';
KSQL for Anomaly Detection
CREATE TABLE possible_fraud AS

SELECT card_number, count(*)

FROM authorization_attempts 

WINDOW TUMBLING (SIZE 5 SECONDS)

GROUP BY card_number

HAVING count(*) > 3;
Identifying patterns or anomalies in real-time data,
surfaced in milliseconds
KSQL implements for you the Kafka Stream
application you would have implemented if you had
• ... the time
• ... the experience
• ... the KSQL as a spec
• ... the willingness to do boring code
KSQL for Data Transformation
CREATE STREAM views_by_userid
WITH (PARTITIONS=6,
VALUE_FORMAT=‘JSON’,
TIMESTAMP=‘view_time’) AS 

SELECT * FROM clickstream PARTITION BY user_id;
Make simple derivations of existing topics from the command line
Demo fun
https://www.confluent.io/blog/taking-ksql-spin-using-real-time-device-data/
Where is KSQL not such a great fit?
BI reports (Tableau etc.)
• No indexes
• No JDBC (most BI tools are not
good with continuous results!)
Ad-hoc queries
• Limited span of time usually
retained in Kafka
• No indexes
Do you think that’s a table you are
querying ?
Stream/Table Duality
alice 1
alice 1
charlie 1
alice 2
charlie 1
alice 2
charlie 1
bob 1
TABLE STREAM TABLE
(“alice”, 1)
(“charlie”, 1)
(“alice”, 2)
(“bob”, 1)
alice 1
alice 1
charlie 1
alice 2
charlie 1
alice 2
charlie 1
bob 1
CREATE STREAM clickstream (
time BIGINT,
url VARCHAR,
status INTEGER,
bytes INTEGER,
userid VARCHAR,
agent VARCHAR)
WITH (
value_format = ‘JSON’,
kafka_topic=‘my_clickstream_topic’
);
Creating a Stream
CREATE TABLE users (
user_id INTEGER,
registered_at LONG,
username VARCHAR,
name VARCHAR,
city VARCHAR,
level VARCHAR)
WITH (
key = ‘user_id',
kafka_topic=‘clickstream_users’,
value_format=‘JSON');
Creating a Table
CREATE STREAM vip_actions AS
SELECT userid, fullname, url, status 

FROM clickstream c 

LEFT JOIN users u ON c.userid = u.user_id
WHERE u.level = 'Platinum';
Joins for Enrichment
Trade-Offs
• subscribe()
• poll()
• send()
• flush()
Consumer,
Producer
• filter()
• join()
• aggregate()
Kafka Streams
• Select…from…
• Join…where…
• Group by..
KSQL
Flexibility Simplicity
Kafka Cluster
JVM
KSQL ServerKSQL CLI
#1 STAND-ALONE AKA ‘LOCAL MODE’
How to run KSQL
How to run KSQL
JVM
KSQL Server
KSQL CLI
JVM
KSQL Server
JVM
KSQL Server
Kafka Cluster
#2 CLIENT-SERVER
How to run KSQL
Kafka Cluster
JVM
KSQL Server
JVM
KSQL Server
JVM
KSQL Server
#3 AS A STANDALONE APPLICATION
How to run KSQL
Kafka Cluster
#4 EMBEDDED IN AN APPLICATION
JVM App Instance
KSQL Engine
Application Code
JVM App Instance
KSQL Engine
Application Code
JVM App Instance
KSQL Engine
Application Code
Confluent 5.0 - Nested Types
SELECT userid, address.city
FROM users
WHERE address.state = 'CA'
https://github.com/confluentinc/ksql/pull/1114
Confluent 5.0 - Remaining joins
SELECT orderid, shipmentid
FROM orders INNER JOIN shipments
ON order.id = shipmentid;
Confluent 5.0 - User defined functions
SELECT address, stringLength(address->street) from orders;
KSQL Capacity Planning – Sizing
https://docs.confluent.io/current/ksql/docs/capacity-planning.html
Do you demo!
https://tinyurl.com/ksql-parisjug
Control Center
Control Center
Control Center
Resources – Community Slack and Mailing List
https://slackpass.io/confluentcommunity
https://groups.google.com/forum/#!forum/confluent-platform
Resources and Next Steps
https://github.com/confluentinc/ksql
http://confluent.io/ksql
https://slackpass.io/confluentcommunity #ksql
Links
• https://www.confluent.io/blog/taking-ksql-spin-using-real-time-device-data/
• https://gist.github.com/rmoff/7efa882dfd808dbab4eb7b8e6f9eda16
• https://asciinema.org/a/nNfYKsVFwzOKMIWfdj5L16uzW
• https://github.com/confluentinc/ksql/blob/master/ksql-examples/src/main/java/io/confluent/ksql/datagen/
DataGen.java
• https://github.com/confluentinc/ksql/blob/master/ksql-parser/src/main/antlr4/io/confluent/ksql/parser/SqlBase.g4
• https://github.com/confluentinc/ksql/blob/v5.1.0-beta201806200051/ksql-engine/src/main/java/io/confluent/ksql/
KsqlContext.java
• https://github.com/mmolimar/ksql-jdbc-driver
• https://twitter.com/alexott_en/status/1008384806693097472
• https://github.com/framiere/a-kafka-story
• https://docs.confluent.io/current/ksql/docs/syntax-reference.html
• https://github.com/confluentinc/cp-demo
• https://github.com/confluentinc/cp-ansible
• https://www.confluent.io/blog/introducing-the-confluent-operator-apache-kafka-on-kubernetes/
• https://www.confluent.io/blog/author/robin/
Links
http://bit.ly/ksql-parisjug
@framiere
florent@confluent.io

Paris jug ksql - 2018-06-28