Exploring KSQL Patterns
Neil is a senior engineer and technologist at Confluent, the
company founded by the creators of Apache Kafka®. He has over
20 years of experience working on distributed computing,
messaging and stream processing. He has built or redesigned
commercial messaging platforms and distributed caching products,
and has developed large-scale bespoke systems for tier-1 banks.
After a period at ThoughtWorks, he went on to build some of the
first distributed risk engines in financial services. In 2008 he
launched a startup specialising in distributed data analytics and
visualization. Prior to joining Confluent he was the CTO at a
fintech consultancy.
Neil Avery
Senior Engineer and Technologist, Confluent
KSQL is a Declarative Stream Processing Language
KSQL is the Streaming SQL Engine for Apache Kafka
KSQL Concepts
• Streams are first-class citizens
• Tables are first-class citizens
• Two types of queries: persistent and transient
• Persistent queries populate streams and tables
• Transient queries are for interactive user exploration
• All queries run until terminated
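The contrast between the two query types can be sketched as follows; the pageviews stream and its status column are illustrative, not part of the demo:

```sql
-- Persistent query: runs continuously on the server, writing its
-- results to a new stream backed by a Kafka topic.
CREATE STREAM error_views AS
  SELECT * FROM pageviews WHERE status >= 400;

-- Transient query: streams results back to the client session only;
-- nothing is written to Kafka, and it stops when you cancel it.
SELECT * FROM pageviews;
```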
CREATE STREAM clickstream
WITH (
  value_format = 'JSON',
  kafka_topic = 'my_clickstream_topic'
);
Creating a Stream
• “Data in motion”
• Let’s say we have a topic called my_clickstream_topic
• The topic contains JSON data
• KSQL now knows about that topic
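Once registered, the stream appears in KSQL's metadata; assuming a running KSQL CLI session, you can confirm this with:

```sql
-- List all registered streams
SHOW STREAMS;

-- Inspect the schema of the new stream
DESCRIBE clickstream;
```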
Exploring that Stream
SELECT status, bytes
FROM clickstream
WHERE user_agent = 'Mozilla/5.0 (compatible; MSIE 6.0)';
• Now that the stream exists, we can examine its contents
• Simple, declarative filtering
• A non-persistent query
CREATE TABLE users
  (user_id INT, registered_at LONG, ...)
WITH (
  key = 'user_id',
  kafka_topic = 'clickstream_users',
  value_format = 'JSON'
);
Creating a Table
• “A stateful view of data in motion”
• Can be built from a Kafka topic OR another KSQL stream
• Here we have a topic called clickstream_users
• The topic contains JSON-formatted changelog data
CREATE TABLE events_per_min AS
  SELECT user_id, count(*) AS events
  FROM clickstream
  WINDOW TUMBLING (SIZE 10 SECONDS)
  GROUP BY user_id;
Creating a Table
• Derived from a stream
• Windowed aggregate
Inspecting that Table
SELECT user_id, username
FROM users
WHERE level = 'Platinum';
• Now that the table exists, we can examine its contents
• Simple, declarative filtering
• A non-persistent query
Joining a Stream to a Table
• Now that we have clickstream and users, we can join them
• This allows us to filter clicks on a user attribute
CREATE STREAM vip_actions AS
SELECT userid, page, action FROM clickstream c
LEFT JOIN users u ON c.userid = u.user_id
WHERE u.level = 'Platinum';
Usage Patterns
KSQL for Streaming ETL
• Kafka is popular for data pipelines.
• KSQL enables easy transformations of data within the pipe.
• Transforming data while moving from Kafka to another system.
CREATE STREAM vip_actions AS
SELECT userid, page, action FROM clickstream c
LEFT JOIN users u ON c.userid = u.user_id
WHERE u.level = 'Platinum';
KSQL for Anomaly Detection
CREATE TABLE possible_fraud AS
SELECT card_number, count(*)
FROM authorization_attempts
WINDOW TUMBLING (SIZE 5 SECONDS)
GROUP BY card_number
HAVING count(*) > 3;
Identifying patterns or anomalies in real-time data,
surfaced in milliseconds
KSQL for Real-Time Monitoring
• Log data monitoring, tracking and alerting
• Sensor / IoT data
CREATE TABLE error_counts AS
SELECT error_code, count(*)
FROM monitoring_stream
WINDOW TUMBLING (SIZE 1 MINUTE)
WHERE type = 'ERROR'
GROUP BY error_code;
KSQL for Data Transformation
CREATE STREAM views_by_userid
  WITH (PARTITIONS=6,
        VALUE_FORMAT='JSON',
        TIMESTAMP='view_time') AS
  SELECT * FROM clickstream PARTITION BY user_id;
Make simple derivations of existing topics from the command line
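From the same command line you can sanity-check the derived topic; PRINT is a KSQL CLI statement that tails a topic's contents:

```sql
-- Tail the underlying Kafka topic of the derived stream
PRINT 'views_by_userid' FROM BEGINNING;
```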
Stream Patterns
Stream
Stream with partitioning
(scaling out)
Stream
(fork)
Table view
(windowed - from stream)
Table
(scaled out – across nodes)
Stream table join
(left-join)
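As one illustration, the fork pattern splits a source stream into derived streams; a minimal sketch, assuming the clickstream stream carries a status column:

```sql
-- Fork: two persistent queries read the same source stream,
-- each writing a filtered subset to its own output topic.
CREATE STREAM successful_clicks AS
  SELECT * FROM clickstream WHERE status < 400;

CREATE STREAM failed_clicks AS
  SELECT * FROM clickstream WHERE status >= 400;
```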
Streaming Apps - Clickstream
clickstream
stream (topic)
events_per_min
table (window)
users table
table (topic)
platinum users
stream-table join
Demo
• Streams, Tables and Queries
• Many streaming patterns
• The clickstream demo is a good place to 'grok' it
Recap
Resources and Next Steps
https://github.com/confluentinc/ksql
http://confluent.io/ksql
https://slackpass.io/confluentcommunity #ksql
Sign up for:
- Part 2: Development
- Part 3: Deployment
https://www.confluent.io/empowering-streams-through-ksql
Thank you!
Q&A
@avery_neil
