This document discusses ksqlDB (originally named KSQL, the name used in the slides below), a streaming SQL engine for Apache Kafka. It lets users write streaming applications as familiar SQL queries over Kafka topic data. Key points include:
- ksqlDB lets users create, query, and join streams of data in Kafka topics using SQL, without writing Java or other code.
- It offers a simpler way to build streaming applications than Kafka Streams, because the processing logic is expressed in SQL.
- Examples show ksqlDB used for real-time monitoring, anomaly detection, streaming ETL, and data transformation.
8.
The log is a simple idea
Messages are added at the end of the log
[Diagram: messages ordered from Old to New]
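To make the append-only idea concrete, here is a minimal in-memory sketch. It is not how Kafka stores data (a real partition log is persisted to disk in segments and replicated); the class `Log` and its methods are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Log:
    """Toy append-only log: messages are only ever added at the end."""
    records: List[str] = field(default_factory=list)

    def append(self, message: str) -> int:
        # Append at the tail; the returned offset is the message's position.
        self.records.append(message)
        return len(self.records) - 1

    def read_from(self, offset: int) -> List[str]:
        # Readers scan forward from an offset, from old to new.
        return self.records[offset:]


log = Log()
for msg in ["a", "b", "c"]:
    log.append(msg)
```

Calling `log.append("d")` then returns offset 3, and `log.read_from(2)` returns `["c", "d"]`. Existing entries are never modified in place, which is what lets many readers consume the same log independently at their own offsets.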
9.
Shard data to get scalability
Messages are sent to different partitions
Partitions live on different machines
[Diagram: Producers (1), (2), and (3) send messages to partitions spread across a cluster of machines]
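A rough sketch of how keyed messages can be spread over partitions. Kafka's default partitioner hashes the key bytes with murmur2; this sketch swaps in `zlib.crc32` only because it is deterministic and in the standard library. The partition count and the sample keys are hypothetical.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # hypothetical partition count


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Stand-in hash: Kafka's default partitioner uses murmur2 on the key bytes;
    # crc32 is used here only for a deterministic stdlib example.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


partitions = defaultdict(list)
messages = [("user-1", "click"), ("user-2", "view"), ("user-1", "buy")]
for key, value in messages:
    # Each partition is its own append-only log.
    partitions[partition_for(key)].append((key, value))
```

The design point is that every message with the same key lands in the same partition, so per-key ordering is preserved while different partitions (and the machines hosting them) share the load.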
13. CONFIDENTIAL
KSQL
The streaming SQL engine for Apache Kafka®, used to write real-time applications in SQL
14.
KSQL
CREATE STREAM fraudulent_payments AS
SELECT * FROM payments
WHERE fraudProbability > 0.8;
Lowering the bar: KSQL vs. Kafka Streams
Lower the bar to enter the world of streaming
[Figure: the KSQL query above compared with the equivalent Kafka Streams code]
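What the `fraudulent_payments` query computes can be approximated with a Python generator. This is a model of the semantics only, not how KSQL executes queries; the extra fields `id` and `amount` in the sample payments are hypothetical.

```python
from typing import Iterable, Iterator


def fraudulent_payments(payments: Iterable[dict]) -> Iterator[dict]:
    """Rough analogue of:
    CREATE STREAM fraudulent_payments AS
    SELECT * FROM payments WHERE fraudProbability > 0.8;
    Each event is checked once, as it arrives."""
    for payment in payments:
        prob = payment.get("fraudProbability")
        # A missing (NULL) probability never satisfies the predicate, as in SQL.
        if prob is not None and prob > 0.8:
            yield payment  # SELECT * passes every field through unchanged


payments = [
    {"id": 1, "amount": 25.0, "fraudProbability": 0.10},
    {"id": 2, "amount": 990.0, "fraudProbability": 0.93},
    {"id": 3, "amount": 40.0, "fraudProbability": 0.80},
    {"id": 4, "amount": 12.0},
]
flagged = list(fraudulent_payments(payments))
```

Only payment 2 is flagged; payment 3 sits exactly at 0.8 and fails the strict `>`. Because the generator consumes its input lazily, it also works over an unbounded iterator, which is closer to a persistent query that keeps running as new payments arrive.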
15.
KSQL
● You write only SQL. No Java, Python, or other boilerplate to wrap around it!
● Create KSQL user-defined functions (UDFs) in Java when needed.
CREATE STREAM fraudulent_payments AS
SELECT * FROM payments
WHERE fraudProbability > 0.8;
16.
All you need is Kafka and KSQL
[Diagram: "Without KSQL" vs. "With KSQL", each with processing and storage layers. Without KSQL: 1. Build & package, 2. Submit job, plus a part marked "required for fault-tolerance". With KSQL: ksql> SELECT * FROM myStream]
17.
Something to remember!
KSQL is a process.*
*But what was announced at #kafkasummit is slightly different
18.
KSQL example use cases
● Data exploration
● Data enrichment
● Streaming ETL
● Filter, cleanse, mask
● Real-time monitoring
● Anomaly detection
19.
Example: CDC from DB via Kafka to Elastic
● Kafka Connect streams data in from the database
● KSQL processes table changes in real time
● Kafka Connect streams data out to Elastic
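CDC delivers a changelog: one event per inserted, updated, or deleted row. A table view of that changelog keeps only the latest value per key. The sketch below folds a changelog into a dictionary, treating a `None` value as a delete in the style of Kafka tombstones; the keys and row fields are hypothetical, and this is a model of the idea rather than KSQL's state store.

```python
from typing import Iterable, Optional, Tuple

Change = Tuple[str, Optional[dict]]  # (primary key, new row or None for a delete)


def materialize(changes: Iterable[Change]) -> dict:
    """Fold a stream of row changes into the current state of the table."""
    table: dict = {}
    for key, row in changes:
        if row is None:
            table.pop(key, None)  # delete (tombstone)
        else:
            table[key] = row      # insert or update: latest value wins
    return table


changes = [
    ("42", {"name": "Alice", "city": "Paris"}),
    ("7", {"name": "Bob", "city": "Oslo"}),
    ("42", {"name": "Alice", "city": "Berlin"}),
    ("7", None),
]
current = materialize(changes)
```

After the four changes, only key `"42"` remains, with its most recent row (`city` is `"Berlin"`).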
20.
Example: Retail
● Stream of purchases from online and physical stores
● Stream of shipments that arrive
● KSQL joins the two streams in real time
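Joining two unbounded streams needs a time bound; in KSQL a stream-stream join takes a `WITHIN` clause limiting how far apart the joined events may be. The slide does not name the join key or window, so `product_id`, `ts`, and the one-hour `WITHIN_SECONDS` below are hypothetical. The nested loop is a deliberately naive model; a real engine buffers only the events inside the window, per key.

```python
from typing import List, Tuple

WITHIN_SECONDS = 3600  # hypothetical join window, in seconds


def join_purchases_shipments(
    purchases: List[dict], shipments: List[dict]
) -> List[Tuple[dict, dict]]:
    """Inner-join two event streams on product_id when the events are
    at most WITHIN_SECONDS apart (a windowed stream-stream join)."""
    results = []
    for p in purchases:
        for s in shipments:
            same_key = p["product_id"] == s["product_id"]
            close_in_time = abs(p["ts"] - s["ts"]) <= WITHIN_SECONDS
            if same_key and close_in_time:
                results.append((p, s))
    return results


purchases = [
    {"product_id": "p1", "store": "online", "ts": 1000},
    {"product_id": "p2", "store": "physical", "ts": 2000},
]
shipments = [{"product_id": "p1", "ts": 1500}, {"product_id": "p2", "ts": 9000}]
joined = join_purchases_shipments(purchases, shipments)
```

Here only `p1` joins: its purchase and shipment are 500 seconds apart, while the `p2` events are 7000 seconds apart, outside the window.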
21.
Example: IoT, Automotive, Connected Cars
● Cars send telemetry data via the Kafka API
● Kafka Connect streams data in
● KSQL joins the two streams in real time
● A Kafka Streams application notifies customers
22.
KSQL for Real-Time Monitoring
● Log data monitoring
● Tracking and alerting
● Syslog data
● Sensor / IoT data
● Application metrics
CREATE STREAM syslog_invalid_users AS
SELECT host, message
FROM syslog
WHERE message LIKE '%Invalid user%';
http://cnfl.io/syslogs-filtering / http://cnfl.io/syslog-alerting
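The `LIKE '%Invalid user%'` predicate is a substring match. The sketch below translates a LIKE pattern into a regular expression using standard SQL wildcards (`%` for any run of characters, `_` for exactly one), matched case-sensitively, with no `ESCAPE` support; those are assumptions of this sketch, not a description of KSQL's exact LIKE implementation. The sample syslog events are hypothetical.

```python
import re
from typing import Iterable, Iterator


def like_to_regex(pattern: str) -> re.Pattern:
    """Translate a SQL LIKE pattern: % matches any run of characters, _ exactly one."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # every other character is literal
    return re.compile("".join(parts), re.DOTALL)


INVALID_USER = like_to_regex("%Invalid user%")


def syslog_invalid_users(syslog: Iterable[dict]) -> Iterator[dict]:
    # SELECT host, message FROM syslog WHERE message LIKE '%Invalid user%'
    for event in syslog:
        if INVALID_USER.fullmatch(event["message"]):
            yield {"host": event["host"], "message": event["message"]}


syslog = [
    {"host": "web1", "message": "sshd[123]: Invalid user admin from 10.0.0.5", "severity": 6},
    {"host": "web2", "message": "sshd[124]: Accepted publickey for deploy", "severity": 6},
]
hits = list(syslog_invalid_users(syslog))
```

Only the `web1` event matches, and the projection keeps just `host` and `message`, dropping the other fields.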
23.
KSQL for Anomaly Detection
● Identify patterns or anomalies in real-time data, surfaced in milliseconds
CREATE TABLE possible_fraud AS
SELECT card_number, COUNT(*)
FROM authorization_attempts
WINDOW TUMBLING (SIZE 5 SECONDS)
GROUP BY card_number
HAVING COUNT(*) > 3;
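The `possible_fraud` query counts authorization attempts per card in fixed, non-overlapping 5-second windows aligned to the epoch, and keeps groups with more than three attempts. A batch sketch of that arithmetic follows; timestamps are in milliseconds like Kafka record timestamps, and the card numbers and times are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

WINDOW_SIZE_MS = 5_000  # WINDOW TUMBLING (SIZE 5 SECONDS)
THRESHOLD = 3           # HAVING COUNT(*) > 3


def possible_fraud(attempts: Iterable[Tuple[int, str]]) -> Dict[Tuple[int, str], int]:
    """attempts: (timestamp_ms, card_number) pairs.
    Returns {(window_start_ms, card_number): count} for groups over the threshold."""
    counts: Counter = Counter()
    for ts, card in attempts:
        window_start = ts - ts % WINDOW_SIZE_MS  # each event falls in exactly one window
        counts[(window_start, card)] += 1        # GROUP BY card_number within the window
    return {k: c for k, c in counts.items() if c > THRESHOLD}


attempts = [
    (100, "1111"), (900, "1111"), (1800, "1111"), (4200, "1111"),
    (5100, "1111"),
    (200, "2222"), (300, "2222"), (400, "2222"),
]
flagged_cards = possible_fraud(attempts)
```

Card `1111` has four attempts in the window starting at 0 ms and is flagged; its fifth attempt falls in the next window, and card `2222` stops at exactly three. One difference from the real query: KSQL updates the table continuously as counts change instead of computing them once over a finished batch.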
24.
KSQL for Streaming ETL
● Joining, filtering, and aggregating streams of event data
CREATE STREAM vip_actions AS
SELECT c.user_id, c.page, c.action
FROM clickstream c
LEFT JOIN users u
ON c.user_id = u.user_id
WHERE u.level = 'Platinum';
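A stream-table join enriches each incoming click with the current row for its key in the `users` table. The sketch models the table as a dictionary keyed by `user_id`; the sample users, pages, and actions are hypothetical, and this is a semantic model rather than KSQL's implementation.

```python
from typing import Dict, Iterable, Iterator


def vip_actions(clickstream: Iterable[dict], users: Dict[str, dict]) -> Iterator[dict]:
    """Stream-table LEFT JOIN: each click is looked up against the current
    users row for its user_id (None if the user is unknown)."""
    for c in clickstream:
        u = users.get(c["user_id"])                 # table lookup by key
        level = u["level"] if u is not None else None
        if level == "Platinum":                     # WHERE u.level = 'Platinum'
            yield {"user_id": c["user_id"], "page": c["page"], "action": c["action"]}


users = {"u1": {"level": "Platinum"}, "u2": {"level": "Gold"}}
clicks = [
    {"user_id": "u1", "page": "/home", "action": "view"},
    {"user_id": "u2", "page": "/cart", "action": "add"},
    {"user_id": "u9", "page": "/", "action": "view"},
]
result = list(vip_actions(clicks, users))
```

Note that because the `WHERE` clause tests a column from the table side, clicks with no matching user (a NULL `level`) are dropped, so in this query the `LEFT JOIN` returns the same rows an inner join would.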
25.
KSQL for Data Transformation
● Easily derive new topics from existing ones
CREATE STREAM pageviews_avro
WITH (PARTITIONS=6,
VALUE_FORMAT='AVRO') AS
SELECT * FROM pageviews_json
PARTITION BY user_id;
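`PARTITION BY user_id` makes `user_id` the key of the output messages, so all of a user's page views land in the same one of the six partitions. The sketch models only the deserialize, re-key, and route steps; Avro encoding (which needs a schema, typically managed by Schema Registry in Confluent deployments) is deliberately left out, `zlib.crc32` again stands in for Kafka's murmur2 partitioner, and the sample records are hypothetical.

```python
import json
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

PARTITIONS = 6  # WITH (PARTITIONS=6)


def repartition_by_user(raw_json_values: Iterable[str]) -> Dict[int, List[Tuple[str, dict]]]:
    """Re-key JSON page views by user_id and route them to PARTITIONS partitions."""
    out: Dict[int, List[Tuple[str, dict]]] = defaultdict(list)
    for raw in raw_json_values:
        row = json.loads(raw)               # deserialize the JSON value
        key = str(row["user_id"])           # PARTITION BY user_id: new message key
        # Stand-in for Kafka's murmur2-based default partitioner.
        p = zlib.crc32(key.encode("utf-8")) % PARTITIONS
        out[p].append((key, row))           # an Avro serializer would encode `row` here
    return out


views = [
    '{"user_id": 1, "page": "/a"}',
    '{"user_id": 2, "page": "/b"}',
    '{"user_id": 1, "page": "/c"}',
]
parts = repartition_by_user(views)
```

Both page views from user 1 end up in the same partition, in their original order.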
26.
Updates from Kafka Summit San Francisco 2019
● Connectors that work closely with KSQL
● Lookups made simpler
● Overall, KSQL is a process and also a database
https://bit.ly/33V17X8