Exploring KSQL Patterns

Tim Berglund
Senior Director of Developer Experience, Confluent
Tim is a teacher, author and technology leader with Confluent. He is an expert on KSQL and frequently speaks at conferences in the United States and around the world. He is the co-presenter of various O’Reilly training videos on topics ranging from Git to distributed systems, and he is the author of Gradle Beyond the Basics.

Housekeeping Items
● This session will last about an hour.
● It will be recorded.
● You can submit your questions by entering them into the GoToWebinar panel.
● The last 10 minutes will consist of Q&A.
● The slides and recording will be available after the talk.

KSQL is a Declarative Stream Processing Language

KSQL is the Streaming SQL Engine for Apache Kafka

KSQL Concepts
• Streams are first-class citizens
• Tables are first-class citizens
• Some queries are persistent
• All queries run until terminated

Creating a Stream
CREATE STREAM clickstream
WITH (
value_format = 'JSON',
kafka_topic = 'my_clickstream_topic'
);
• Let’s say we have a topic called my_clickstream_topic
• The topic contains JSON data
• After this statement, KSQL knows about that topic

Exploring that Stream
SELECT status, bytes
FROM clickstream
WHERE user_agent =
'Mozilla/5.0 (compatible; MSIE 6.0)';
• Now that the stream exists, we can examine its contents
• Simple, declarative filtering
• A non-persistent query

Creating a Table
CREATE TABLE users
WITH (
key = 'user_id',
kafka_topic = 'clickstream_users',
value_format = 'JSON'
);
• We have a topic called clickstream_users
• The topic contains JSON data
• The topic contains changelog data
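
To make the changelog idea concrete, here is a minimal Python sketch (not KSQL, and not a description of how KSQL is implemented) of a table viewed as the latest value per key of a changelog. The record fields, the sample data and the `materialize` helper are all hypothetical, chosen only for illustration.

```python
def materialize(changelog):
    """Fold (key, value) change records into a table: the latest value per key wins."""
    table = {}
    for key, value in changelog:
        if value is None:
            # In this sketch a None value is treated as a delete for that key.
            table.pop(key, None)
        else:
            table[key] = value
    return table


# Hypothetical changelog records keyed by user_id.
changelog = [
    ("u1", {"username": "alice", "level": "Gold"}),
    ("u2", {"username": "bob", "level": "Platinum"}),
    ("u1", {"username": "alice", "level": "Platinum"}),  # later update for u1
    ("u2", None),                                        # delete u2
]

users = materialize(changelog)
print(users)
```

Each new record for a key replaces the previous one, which is why a topic of updates can be read as a table.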

Inspecting that Table
SELECT user_id, username
FROM users
WHERE level = 'Platinum';
• Now that the table exists, we can examine its contents
• Simple, declarative filtering
• A non-persistent query

Joining a Stream to a Table
• Now that we have clickstream and users, we can join them
• This allows us to do filtering of clicks on a user attribute
CREATE STREAM vip_actions AS
SELECT userid, page, action FROM clickstream c
LEFT JOIN users u ON c.userid = u.user_id
WHERE u.level = 'Platinum';
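
The logic of the join above can be sketched in plain Python, under simplifying assumptions: the table is a static dictionary rather than continuously updated state, and the sample records and the `join_vip_actions` helper are hypothetical. It is an illustration of the semantics, not of KSQL's execution.

```python
def join_vip_actions(clicks, users):
    """Yield (userid, page, action) for clicks whose user has level 'Platinum'."""
    for click in clicks:
        # Table lookup by key; None when the click has no matching user.
        user = users.get(click["userid"])
        # The filter on the user's level also drops clicks without a match.
        if user is not None and user["level"] == "Platinum":
            yield click["userid"], click["page"], click["action"]


# Hypothetical table contents, keyed by user_id.
users = {
    "u1": {"username": "alice", "level": "Platinum"},
    "u2": {"username": "bob", "level": "Gold"},
}

# Hypothetical click events.
clicks = [
    {"userid": "u1", "page": "/home", "action": "view"},
    {"userid": "u2", "page": "/cart", "action": "add"},
    {"userid": "u3", "page": "/home", "action": "view"},  # unknown user
    {"userid": "u1", "page": "/cart", "action": "buy"},
]

vip_actions = list(join_vip_actions(clicks, users))
print(vip_actions)
```

Each click is enriched with the user row that shares its key, and only clicks from Platinum users reach the output stream.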

KSQL for Streaming ETL
• Kafka is popular for data pipelines.
• KSQL enables easy transformations of data within the pipe.
• Transform data as it moves from Kafka to another system.
CREATE STREAM vip_actions AS
SELECT userid, page, action FROM clickstream c
LEFT JOIN users u ON c.userid = u.user_id
WHERE u.level = 'Platinum';

KSQL for Anomaly Detection
CREATE TABLE possible_fraud AS
SELECT card_number, count(*)
FROM authorization_attempts
WINDOW TUMBLING (SIZE 5 SECONDS)
GROUP BY card_number
HAVING count(*) > 3;
Identifying patterns or anomalies in real-time data,
surfaced in milliseconds
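
As a rough illustration of what the tumbling-window query computes, the following Python sketch groups events into fixed 5-second windows and keeps the groups with more than 3 attempts. It processes a finished list rather than a live stream, and the timestamps, card numbers and the `possible_fraud` function are hypothetical.

```python
from collections import Counter


def possible_fraud(attempts, window_size_s=5, threshold=3):
    """Count attempts per (window start, card) and keep counts above the threshold."""
    counts = Counter()
    for timestamp_s, card_number in attempts:
        # A tumbling window is a fixed, non-overlapping bucket of time.
        window_start = (timestamp_s // window_size_s) * window_size_s
        counts[(window_start, card_number)] += 1
    # Equivalent of HAVING count(*) > 3.
    return {key: n for key, n in counts.items() if n > threshold}


# Hypothetical (timestamp in seconds, card number) events.
attempts = [
    (0, "1111"), (1, "1111"), (2, "1111"), (4, "1111"),  # four attempts in window [0, 5)
    (3, "2222"),
    (6, "1111"), (7, "1111"),                            # two attempts in window [5, 10)
]

flagged = possible_fraud(attempts)
print(flagged)
```

Only the card with four attempts inside a single window is flagged; the same card's two attempts in the next window are counted separately.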

KSQL for Real-Time Monitoring
• Log data monitoring, tracking and alerting
• Sensor / IoT data
CREATE TABLE error_counts AS
SELECT error_code, count(*)
FROM monitoring_stream
WINDOW TUMBLING (SIZE 1 MINUTE)
WHERE type = 'ERROR'
GROUP BY error_code;

KSQL for Data Transformation
CREATE STREAM views_by_userid
WITH (PARTITIONS=6,
VALUE_FORMAT='JSON',
TIMESTAMP='view_time') AS
SELECT * FROM clickstream PARTITION BY user_id;
Make simple derivations of existing topics from the command line
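
The effect of repartitioning by a key can be sketched in Python as follows. This is an illustration only: `zlib.crc32` is a stand-in hash chosen for the sketch (it is not the hash Kafka's partitioner uses), and the records and helper names are hypothetical.

```python
import zlib


def partition_for(key, num_partitions=6):
    """Map a key to a partition number using a stand-in hash (an assumption of this sketch)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def repartition(records, key_field, num_partitions=6):
    """Distribute records across partitions so equal keys land in the same partition."""
    partitions = {p: [] for p in range(num_partitions)}
    for record in records:
        partitions[partition_for(record[key_field], num_partitions)].append(record)
    return partitions


# Hypothetical page-view records.
views = [
    {"user_id": "u1", "page": "/home"},
    {"user_id": "u2", "page": "/cart"},
    {"user_id": "u1", "page": "/checkout"},
]

by_user = repartition(views, "user_id")
print(by_user)
```

All records for a given user end up in one partition, which is what lets later per-user processing happen on a single consumer.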

KSQL in Local Mode
[Diagram: a KSQL CLI and a KSQL Server in one JVM, connected to the Kafka Cluster]

KSQL in Local Mode
• Starts a CLI and a server in the same JVM
• Ideal for developing on your laptop
bin/ksql-cli local
• Or with customized settings
bin/ksql-cli local --properties-file ksql.properties

KSQL in Client-Server Mode
[Diagram: a KSQL CLI and three KSQL Servers, each server in its own JVM, connected to the Kafka Cluster]

KSQL in Client-Server Mode
• Start any number of server nodes
bin/ksql-server-start
• Start one or more CLIs and point them to a server
bin/ksql-cli remote https://myksqlserver:8090
• All servers share the processing load
Technically, the servers are instances of the same Kafka Streams application
Scale up or down without a restart

KSQL in Application Mode
[Diagram: three KSQL Servers, each in its own JVM, connected to the Kafka Cluster]

KSQL in Application Mode
• Start any number of server nodes
Pass a file of KSQL statements to execute
bin/ksql-node query-file=foo/bar.sql
• Ideal for streaming ETL application deployment
Version-control your queries and transformations as code
• All running engines share the processing load
Technically, the engines are instances of the same Kafka Streams application
Scale up or down without a restart

Resources and Next Steps
https://github.com/confluentinc/ksql
http://confluent.io/ksql
https://slackpass.io/confluentcommunity #ksql

Thank you for attending Exploring KSQL Patterns.
