Stream Processing Made Simple with KSQL

1
Kafka Stream Processing for
Everyone with KSQL
Nick Dearden, August 2017

2
Stream Processing by Analogy
$ cat < in.txt | grep “ksql” | tr a-z A-Z > out.txt
Stream Processing
Kafka
Cluster
Connect API Connect API

3
Kafka Stream Processing Evolution

5
Consumer,
Producer
Kafka
Streams
KSQL
Flexibility Simplicity
subscribe(),
poll(), send(),
flush()
mapValues(),
filter(),
punctuate()
Select…from…
join…where…
group by..

6
On the Shoulders of (Streaming) Giants
• Native, 100%-compatible Kafka integration
• Secure stream processing using Kafka’s security features
• Elastic and highly scalable
• Fault-tolerant
• Stateful and stateless computations
• Interactive queries
• Time model
• Supports late-arriving and out-of-order data
• Windowing
• Millisecond processing latency, no micro-batching
• At-least-once and exactly-once processing guarantees

7
Where can KSQL help ?
• Streaming ETL
• Kafka is popular for data pipelines.
• KSQL enables easy transformations of data within the pipe
• Anomaly Detection
• Identifying patterns or anomalies in real-time data, surfaced in milliseconds
• Monitoring
• Log data monitoring, tracking and alerting
• Sensor / IoT data
• Simple derivations of existing topics
• One line to re-partition and/or re-key a topic for new uses

8
Where is KSQL not such a great fit ?
• Ad-hoc query
• Limited span of time usually retained in Kafka
• No indexes
• BI reports (Tableau etc.)
• No indexes
• No JDBC (most BI tools are not good with continuous results!)

9
Do you think that’s a table you are querying ?

10
Running Example
Ecommerce Site HadoopClick-stream Support

12
Schema and Format
• A Kafka broker knows how to move <sets of bytes>
• Technically a message is ( ts, byte[], byte[] )
• SQL-like queries require a richer structure
• Start with message (value) format
• JSON
• DELIMITED (comma-separated in this preview)
• AVRO – requires that you supply a .avsc schema-file
• Pseudo-columns are automatically generated
• ROWKEY, ROWTIME

13
CREATE STREAM clickstream (
time bigint,
url varchar,
status int,
bytes int,
userid varchar,
agent varchar)
WITH (
value_format = ‘JSON',
kafka_topic=‘my_clickstream_topic');

14
SELECTing from the Stream
SELECT *
FROM clickstream
WHERE status >= 400
AND lcase(url) LIKE ‘%html'
CREATE STREAM error_stream AS
LIMIT 5;

15
Aggregates and Windowing
• COUNT, SUM, MIN, MAX
• Windowing - Not strictly ANSI SQL 
• Three window types supported:
• TUMBLING
• HOPPING (aka ‘sliding’)
• SESSION
SELECT url, ip, count(*) as problem_count
FROM clickstream
WINDOW TUMBLING(size 1 minute)
WHERE loglevel = ‘ERROR’
GROUP BY url, ip;

16
Joins for Enrichment
CREATE STREAM vip_actions AS
SELECT userid, fullname, url, status
FROM clickstream c
LEFT JOIN users u ON c.userid = u.user_id
WHERE u.level = 'Platinum';

18
KSQL Components
• CLI
• Designed to be familiar to users of MySQL, Postgres, etc
• Engine
• Actually runs the Kafka Streams topologies
• REST Server
• HTTP interface allows an Engine to receive instructions from the CLI

19
How to run KSQL - #1 Stand-alone aka ‘local mode’
• Starts a CLI, an Engine, and a REST server all in the same JVM
• Ideal for laptop development
• Start with default settings:
> bin/ksql-cli local
• Or with customized settings:
> bin/ksql-cli local –-properties-file foo/bar/ksql.properties

20
How to run KSQL - #2 Client-Server
• Start any number of Server nodes
• > bin/ksql-server-start
• Start any number of CLIs and specify ‘remote’ server address
• >bin/ksql-cli remote http://myserver:8090
• All running Engines share the processing load
• Technically, instances of the same Kafka Streams Applications
• Scale up/down without restart

21
How to run KSQL - #3 as an Application
How do you deploy applications ?

22
How to run KSQL - #3 as an Application
• Start any number of Engine instances
• Pass a file of KSQL statements to execute
> bin/ksql-node query-file=foo/bar.sql
• Ideal for streaming ETL application deployment
• Version control your queries and transformations as code
• All running Engines share the processing load
• Technically, instances of the same Kafka Streams Applications
• Scale up/down without restart

23
Dedicating resources
• Join Engines to the same ‘service pool’ by means of the ksql.service.id property
• In both Client-Server and Application modes

24
Developer Preview
• Known Issues
• May leave orphaned topics from interactive or terminated queries
• Caveats
• No ‘ORDER BY’
• Stream-stream joins
• Limited function library (see syntax guide on GitHub)
• APIs / KSQL Syntax may change depending on feedback

25
Resources & Next Steps
Time to get involved !
• Try the Quickstart on Github
• Check out the code
• Play with the examples
The point of ‘developer preview’ is that we can change things for the better, together
https://github.com/confluentinc/ksql
http://confluent.io/ksql
https://slackpass.io/confluentcommunity #ksql

Stream Processing Made Simple with KSQL

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to Stream Processing Made Simple with KSQL

Similar to Stream Processing Made Simple with KSQL (20)

More from confluent

More from confluent (20)

Recently uploaded

Recently uploaded (20)

Stream Processing Made Simple with KSQL