
KSQL – An Open Source Streaming Engine for Apache Kafka


The rapidly expanding world of stream processing can be daunting, with new concepts such as various types of time semantics, windowed aggregates, changelogs, and programming frameworks to master. KSQL is an open-source, Apache 2.0 licensed streaming SQL engine on top of Apache Kafka which aims to simplify all this and make stream processing available to everyone. The project is managed and open sourced by Confluent.

KSQL makes it easy to read, write, and process streaming data in real-time, at scale, using SQL-like semantics. It offers an easy way to express stream processing logic as an alternative to writing an application in a programming language such as Java, Python or Go. Benefits of using KSQL include: No coding required; no additional analytics cluster needed; streams and tables as first-class constructs; access to the rich Kafka ecosystem.

This session introduces the concepts and architecture of KSQL. Use cases such as Streaming ETL, Real Time Stream Monitoring and Anomaly Detection are discussed. A live demo shows how to set up and use KSQL quickly and easily on top of your Kafka ecosystem.

Published in: Technology


  1. KSQL – An Open Source Streaming SQL Engine for Apache Kafka
     Kai Waehner, Technology Evangelist
     kontakt@kai-waehner.de, LinkedIn, @KaiWaehner, www.confluent.io, www.kai-waehner.de
  2. KSQL - Streaming SQL for Apache Kafka
     Agenda
     1) Apache Kafka Ecosystem
     2) Motivation for KSQL
     3) KSQL Concepts
     4) Live Demo
     5) KSQL Architecture
     6) Use Case: Clickstream Analysis
     7) Getting Started
  4. Apache Kafka - A Distributed, Scalable Commit Log
  5. Apache Kafka – The Rise of a Streaming Platform
     [Diagram labels: The Log, Connectors, Connectors, Producer, Consumer, Streaming Engine]
  6. Apache Kafka – The Rise of a Streaming Platform
  7. KSQL – A Streaming SQL Engine for Apache Kafka
  9. Why KSQL?
     [Chart labels: Population, Coding Sophistication, Realm of Stream Processing, New, Expanded Realm. Audiences: BI Analysts, Core Developers, Data Engineers, Core Developers who don’t like Java]
  10. Trade-Offs
      [Chart: a spectrum from Flexibility to Simplicity]
      • Consumer, Producer: subscribe(), poll(), send(), flush()
      • Kafka Streams: mapValues(), filter(), punctuate()
      • KSQL: Select…from…, Join…where…, Group by..
  11. What is it for? Streaming ETL
      • Kafka is popular for data pipelines
      • KSQL enables easy transformations of data within the pipe
      CREATE STREAM vip_actions AS
        SELECT userid, page, action
        FROM clickstream c
        LEFT JOIN users u ON c.userid = u.user_id
        WHERE u.level = 'Platinum';
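The join on slide 11 can be pictured as a per-event lookup: every click is matched against the current contents of the users table, and only matches with the Platinum level are emitted. The following is a minimal Python sketch of that idea, not KSQL's implementation; the sample records and the function name `vip_actions` are made up for illustration.

```python
from typing import Dict, Iterable, Iterator, Optional


def vip_actions(clickstream: Iterable[dict], users: Dict[str, dict]) -> Iterator[dict]:
    """Left-join each click against the users table, then keep Platinum users."""
    for click in clickstream:
        # Table lookup by key; a missing user mimics the NULL side of a LEFT JOIN.
        user: Optional[dict] = users.get(click["userid"])
        level = user["level"] if user else None
        # Equivalent of WHERE u.level = 'Platinum' (drops unmatched rows too).
        if level == "Platinum":
            yield {"userid": click["userid"], "page": click["page"], "action": click["action"]}


# Hypothetical sample data, keyed the way the join expects.
users = {"u1": {"level": "Platinum"}, "u2": {"level": "Silver"}}
clicks = [
    {"userid": "u1", "page": "/home", "action": "view"},
    {"userid": "u2", "page": "/cart", "action": "add"},
    {"userid": "u3", "page": "/home", "action": "view"},
]
result = list(vip_actions(clicks, users))
print(result)
```

Because the filter tests a column from the table side, clicks from unknown users are dropped even though the join itself is a LEFT JOIN.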
  12. What is it for? Simple Derivations of Existing Topics
      • One-liner to re-partition and/or re-key a topic for new uses
      CREATE STREAM views_by_userid
        WITH (PARTITIONS=6, VALUE_FORMAT='JSON', TIMESTAMP='view_time') AS
        SELECT * FROM clickstream PARTITION BY user_id;
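To see what re-keying buys you, here is a small Python sketch, assuming a simple hash-modulo scheme: once records are keyed by `user_id`, all views of one user land in the same partition. The CRC32 hash is only a stand-in (Kafka's default partitioner uses murmur2), and the sample records are hypothetical.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 6  # matches PARTITIONS=6 in the statement on slide 12


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Stand-in hash for illustration; Kafka's default partitioner uses murmur2.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def rekey(records, key_field):
    """Group records into partitions according to the hash of the new key."""
    partitions = defaultdict(list)
    for record in records:
        key = str(record[key_field])
        partitions[partition_for(key)].append((key, record))
    return partitions


# Hypothetical page views, originally in arbitrary order.
views = [
    {"user_id": "alice", "page": "/a"},
    {"user_id": "bob", "page": "/b"},
    {"user_id": "alice", "page": "/c"},
]
partitions = rekey(views, "user_id")
for number, entries in sorted(partitions.items()):
    print(number, entries)
```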
  13. What is it for? Analytics, e.g. Anomaly Detection
      • Identifying patterns or anomalies in real-time data, surfaced in milliseconds
      CREATE TABLE possible_fraud AS
        SELECT card_number, count(*)
        FROM authorization_attempts
        WINDOW TUMBLING (SIZE 5 MINUTES)
        GROUP BY card_number
        HAVING count(*) > 3;
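The fraud query on slide 13 boils down to bucketing each event into a fixed five-minute window and counting per card and window. A minimal Python sketch of that logic follows; it processes a finished list in one pass, whereas KSQL updates the result continuously, and the timestamps and card names are invented for the example.

```python
from collections import Counter

WINDOW_MS = 5 * 60 * 1000  # WINDOW TUMBLING (SIZE 5 MINUTES)


def window_start(ts_ms: int, size_ms: int = WINDOW_MS) -> int:
    """Start of the tumbling window that contains the timestamp."""
    return ts_ms - (ts_ms % size_ms)


def possible_fraud(attempts, threshold: int = 3):
    """Count attempts per (card, window) and keep counts above the threshold."""
    counts = Counter()
    for ts_ms, card_number in attempts:
        counts[(card_number, window_start(ts_ms))] += 1
    # Equivalent of HAVING count(*) > 3
    return {key: n for key, n in counts.items() if n > threshold}


# Hypothetical authorization attempts as (timestamp in ms, card number).
attempts = [
    (0, "A"), (60_000, "A"), (120_000, "A"), (180_000, "A"),  # 4 in the first window
    (300_000, "A"),                                           # falls in the next window
    (10_000, "B"), (20_000, "B"),
]
flagged = possible_fraud(attempts)
print(flagged)
```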
  14. What is it for? Real Time Monitoring
      • Log data monitoring, tracking and alerting
      • Sensor / IoT data
      CREATE TABLE error_counts AS
        SELECT error_code, count(*)
        FROM monitoring_stream
        WINDOW TUMBLING (SIZE 1 MINUTE)
        WHERE type = 'ERROR'
        GROUP BY error_code;
  15. Where is KSQL not such a great fit (at least today)?
      Powerful ad-hoc query
      ○ Limited span of time usually retained in Kafka
      ○ No indexes
      BI reports (Tableau etc.)
      ○ No indexes
      ○ No JDBC (most BI tools are not good with continuous results!)
  17. KSQL – A Streaming SQL Engine for Apache Kafka
  18. KSQL Concepts
      ● No need for source code
        • Zero, none at all, not even one line.
        • No SerDes, no generics, no lambdas, ...
      ● All the Kafka Streams “magic” out-of-the-box
        • Exactly Once Semantics
        • Windowing
        • Event-time aggregation
        • Late-arriving data
        • Distributed, fault-tolerant, scalable, ...
  19. STREAM and TABLE as first-class citizens
  20. CREATE STREAM AS syntax
      CREATE STREAM `stream_name`
        [WITH (`property = expression` [, …] ) ]
        AS SELECT `select_expr` [, ...]
        FROM `from_item` [, ...]
        [ WHERE `condition` ]
        [ PARTITION BY `column_name` ]
      ● where property can be any of the following:
        KAFKA_TOPIC = name - what to call the sink topic
        FORMAT = DELIMITED | JSON | AVRO - defaults to format of input stream
        AVROSCHEMAFILE = path/to/file - if FORMAT=AVRO, where the output schema file will be written to
        PARTITIONS = # - number of partitions in sink topic
        TIMESTAMP = column - the name of the column to use as the timestamp. This can be used to define the event time.
  21. CREATE TABLE AS syntax
      CREATE TABLE `table_name`
        [WITH ( `property_name = expression` [, ...] )]
        AS SELECT `select_expr` [, ...]
        FROM `from_item` [, ...]
        [ WINDOW `window_expression` ]
        [ WHERE `condition` ]
        [ GROUP BY `grouping expression` ]
        [ HAVING `having_expression` ]
      ● where property values are the same as for 'CREATE STREAM AS SELECT'
  22. SELECT statement syntax
      SELECT `select_expr` [, ...]
        FROM `from_item` [, ...]
        [ WINDOW `window_expression` ]
        [ WHERE `condition` ]
        [ GROUP BY `grouping expression` ]
        [ HAVING `having_expression` ]
        [ LIMIT n ]
      where from_item is one of the following:
        stream_or_table_name [ [ AS ] alias]
        from_item LEFT JOIN from_item ON join_condition
  23. WINDOWing
      ● Not ANSI SQL! → Continuous Queries
      ● Three types supported (same as Kafka Streams):
        • TUMBLING (fixed-size, non-overlapping)
          SELECT appname, ip, COUNT(appname) AS problem_count
          FROM logstream
          WINDOW TUMBLING (size 1 minute)
          WHERE loglevel='ERROR'
          GROUP BY appname, ip;
        • HOPPING
          SELECT itemid, SUM(arraycol[0])
          FROM orders
          WINDOW HOPPING (size 20 second, advance by 5 second)
          GROUP BY itemid;
        • SESSION
          SELECT itemid, SUM(sales_price)
          FROM orders
          WINDOW SESSION (20 second)
          GROUP BY itemid;
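The difference between the window types is easiest to see by asking which windows a single event belongs to. The Python sketch below is an illustration under simplifying assumptions, not Kafka Streams code: windows are half-open intervals aligned to multiples of the advance, and a gap strictly larger than the inactivity threshold starts a new session (the exact boundary handling is an assumption of this sketch).

```python
def hopping_windows(ts: int, size: int, advance: int) -> list:
    """Start times of every window [start, start + size) that contains ts."""
    starts = []
    start = ts - (ts % advance)  # latest window start at or before ts
    while start > ts - size and start >= 0:
        starts.append(start)
        start -= advance
    return sorted(starts)


def session_windows(timestamps, gap: int) -> list:
    """Merge events into sessions separated by more than `gap` of inactivity."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][1] <= gap:
            sessions[-1][1] = ts      # extend the current session
        else:
            sessions.append([ts, ts])  # start a new session
    return [tuple(s) for s in sessions]


# size 20 s, advance 5 s: one event falls into size / advance = 4 windows
hopping = hopping_windows(22, size=20, advance=5)
# a tumbling window behaves like a hopping window whose advance equals its size
tumbling = hopping_windows(22, size=20, advance=20)
# 20 s inactivity gap, as in WINDOW SESSION (20 second)
sessions = session_windows([0, 5, 12, 50, 60], gap=20)
print(hopping, tumbling, sessions)
```

With these inputs an event at second 22 belongs to four hopping windows but only one tumbling window, and the five events collapse into two sessions.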
  25. Create a STREAM and a TABLE from Kafka Topics
      ksql> CREATE STREAM pageviews_original (viewtime bigint, userid varchar, pageid varchar)
              WITH (kafka_topic='pageviews', value_format='DELIMITED');
      ksql> CREATE TABLE users_original (registertime bigint, gender varchar, regionid varchar, userid varchar)
              WITH (kafka_topic='users', value_format='JSON');
      ksql> SELECT pageid FROM pageviews_original LIMIT 3;
      ksql> CREATE STREAM pageviews_female AS
              SELECT users_original.userid AS userid, pageid, regionid, gender
              FROM pageviews_original
              LEFT JOIN users_original ON pageviews_original.userid = users_original.userid
              WHERE gender = 'FEMALE';
  26. Live Demo – KSQL Hello World
  28. KSQL - Components
      KSQL has 3 main components:
      1. The CLI, designed to be familiar to users of MySQL, Postgres etc.
      2. The Engine, which actually runs the Kafka Streams topologies
      3. The REST server interface, which enables an Engine to receive instructions from the CLI
      (Note that you also need a Kafka Cluster… KSQL is deployed independently)
  29. #1 STAND-ALONE AKA ‘LOCAL MODE’
      [Diagram labels: Kafka Cluster, JVM, KSQL Engine, REST, KSQL>]
  30. #1 STAND-ALONE AKA ‘LOCAL MODE’
      Starts a CLI, an Engine, and a REST server all in the same JVM
      Ideal for laptop development
      • Start with default settings:
        > bin/ksql-cli local
      • Or with customized settings:
        > bin/ksql-cli local --properties-file foo/bar/ksql.properties
  31. #2 CLIENT-SERVER
      [Diagram: a KSQL> CLI and three JVMs, each with a KSQL Engine and a REST server, attached to the Kafka Cluster]
  32. #2 CLIENT-SERVER
      Start any number of Server nodes
      • > bin/ksql-server-start
      Start any number of CLIs and specify ‘remote’ server address
      • > bin/ksql-cli remote http://myserver:8090
      All running Engines share the processing load
      • Technically, instances of the same Kafka Streams Applications
      • Scale up / down without restart
  33. #3 AS PRE-DEFINED APP
      [Diagram: three JVMs, each with a KSQL Engine, attached to the Kafka Cluster]
  34. #3 AS PRE-DEFINED APP
      Running the KSQL server with a pre-defined set of instructions/queries
      • Version control your queries and transformations as code
      Start any number of Engine instances
      • Pass a file of KSQL statements to execute
      • > bin/ksql-node query-file=foo/bar.sql
      All running Engines share the processing load
      • Technically, instances of the same Kafka Streams Applications
      • Scale up/down without restart
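"All running Engines share the processing load" can be pictured as the input partitions being divided among however many Engine instances are currently running. The Python sketch below uses a plain round-robin split purely for illustration; the actual assignment is done by Kafka's consumer group protocol and its assignors, which this does not reproduce, and the engine names are hypothetical.

```python
def assign(partitions, instances):
    """Spread partitions over instances round-robin (illustrative only)."""
    assignment = {name: [] for name in instances}
    for index, partition in enumerate(partitions):
        owner = instances[index % len(instances)]
        assignment[owner].append(partition)
    return assignment


partitions = list(range(6))

# Two Engine instances share six partitions.
two_engines = assign(partitions, ["engine-1", "engine-2"])
# Starting a third instance redistributes the same partitions, no restart of the others implied.
three_engines = assign(partitions, ["engine-1", "engine-2", "engine-3"])
print(two_engines)
print(three_engines)
```

The point of the sketch is only that adding or removing an instance changes who owns which partition, which is why scaling up or down does not require redefining the queries.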
  35. Dedicating resources
  36. How do you deploy applications?
  37. Where to develop and operate your applications?
  39. Demo: Clickstream Analysis
      [Diagram labels: Kafka Producer, Stream of Log Events, Kafka Cluster, KSQL, Kafka Connect, Elasticsearch, Grafana]
  40. Demo: Clickstream Analysis
      • https://github.com/confluentinc/ksql/tree/0.1.x/ksql-clickstream-demo#clickstream-analysis
      • Leverages Apache Kafka, Kafka Connect, KSQL, Elasticsearch and Grafana
      • 5min screencast: https://www.youtube.com/watch?v=A45uRzJiv7I
      • Setup in 5 minutes (with or without Docker)
      SELECT STREAM CEIL(timestamp TO HOUR) AS timeWindow, productId,
        COUNT(*) AS hourlyOrders, SUM(units) AS units
      FROM Orders
      GROUP BY CEIL(timestamp TO HOUR), productId;

       timeWindow | productId | hourlyOrders | units
      ------------+-----------+--------------+-------
       08:00:00   |        10 |            2 |     5
       08:00:00   |        20 |            1 |     8
       09:00:00   |        10 |            4 |    22
       09:00:00   |        40 |            1 |    45
       ...        |       ... |          ... |   ...
  41. Live Demo – KSQL Clickstream Analysis
  43. KSQL Quick Start
      github.com/confluentinc/ksql
      Local runtime or Docker container
  44. Remember: Developer Preview!
      Caveats of Developer Preview
      • No ORDER BY yet
      • No Stream-stream joins yet
      • Limited function library
      • Avro support only via workaround
      • Breaking API / Syntax changes still possible
      BE EXCITED, BUT BE ADVISED
  45. Resources and Next Steps
      Get Involved
      • Try the Quickstart on GitHub
      • Check out the code
      • Play with the examples
      The point of a developer preview is to improve things, together!
      https://github.com/confluentinc/ksql
      http://confluent.io/ksql
      https://slackpass.io/confluentcommunity #ksql
  46. Kai Waehner, Technology Evangelist
      kontakt@kai-waehner.de, @KaiWaehner, www.confluent.io, www.kai-waehner.de, LinkedIn
      Questions? Feedback? Please contact me… Come to our booth… Come to Kafka Summit London in April 2018…
  47. Appendix
  48. KSQL Concepts
      ● STREAM and TABLE as first-class citizens
      ● Interpretations of topic content
      ● STREAM - data in motion
      ● TABLE - collected state of a stream
        • One record per key (per window)
        • Current values (compacted topic) not yet
        • Changelog
      ● STREAM – TABLE Joins
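"TABLE - collected state of a stream" means that replaying a changelog and keeping only the latest value per key yields the table. A minimal Python sketch of that interpretation, with made-up keys and values:

```python
def materialize(changelog):
    """Replay (key, value) updates in order; the table holds the latest value per key."""
    table = {}
    for key, value in changelog:
        table[key] = value  # a later record for the same key overwrites the earlier one
    return table


# Hypothetical stream of user location updates (the STREAM view keeps every event).
changelog = [
    ("alice", "Berlin"),
    ("bob", "Lima"),
    ("alice", "Rome"),
]
table = materialize(changelog)
print(len(changelog), "events ->", table)
```

The stream view sees three events, while the table view ends up with one record per key.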
  49. Schema & Format
      ● A Kafka broker knows how to move bytes
        • Technically a key-value message (byte[], byte[])
      ● To enable declarative SQL-like queries and transformations we have to define a richer structure
      ● Structural metadata maintained in an in-memory catalog
        • DDL is recorded in a special topic
  50. Schema & Format
      Start with message (value) format
      ● JSON - the simplest choice
      ● DELIMITED - in this preview, the implicit delimiter is a comma and the escaping rules are built-in. Will be expanded.
      ● AVRO - requires that you also supply a schema-file (.avsc)
      Pseudo-columns are automatically provided
      • ROWKEY, ROWTIME - for querying the message key and timestamp
      • (PARTITION, OFFSET coming soon)
      • CREATE STREAM pageview (viewtime bigint, userid varchar, pageid varchar)
          WITH (value_format = 'delimited', kafka_topic='my_pageview_topic');
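Declaring a schema over a DELIMITED topic amounts to splitting the raw value on commas, casting each field to its declared type, and attaching the ROWKEY and ROWTIME pseudo-columns from the message metadata. The Python sketch below shows that idea under stated assumptions: Python's `csv` module stands in for KSQL's built-in escaping rules, which may differ, and the sample message is invented.

```python
import csv

# Mirrors: CREATE STREAM pageview (viewtime bigint, userid varchar, pageid varchar)
SCHEMA = [("viewtime", int), ("userid", str), ("pageid", str)]


def parse_delimited(key, timestamp_ms, value, schema=SCHEMA):
    """Turn one comma-delimited message value into a typed row."""
    fields = next(csv.reader([value]))  # csv quoting is a stand-in for KSQL's escaping
    if len(fields) != len(schema):
        raise ValueError(f"expected {len(schema)} fields, got {len(fields)}")
    # Pseudo-columns come from the message key and timestamp, not from the value.
    row = {"ROWKEY": key, "ROWTIME": timestamp_ms}
    for (name, cast), raw in zip(schema, fields):
        row[name] = cast(raw)
    return row


# Hypothetical message from the pageview topic.
row = parse_delimited("user_1", 1500000000000, "1500000000000,user_1,page_42")
print(row)
```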
  51. Schema & Datatypes
      ● varchar / string
      ● boolean / bool
      ● integer / int
      ● bigint / long
      ● double
      ● array(of_type) - of_type must be primitive (no nested Array or Map yet)
      ● map(key_type, value_type) - key_type must be string, value_type must be primitive
  52. Interactive Querying
      ● Great for iterative development
      ● LIST (or SHOW) STREAMS / TABLES
      ● DESCRIBE STREAM / TABLE
      ● SELECT
        • Selects rows from a KSQL stream or table.
        • The result of this statement will be printed out in the console.
        • To stop the continuous query in the CLI press Ctrl+C.
  55. CREATE STREAM AS SELECT
      ● Once your query is ready and you want to run it non-interactively
        • CREATE STREAM AS SELECT ...;
      ● Creates a new KSQL Stream along with the corresponding Kafka topic and streams the result of the SELECT query into the topic
      ● To find what streams are already running:
        • SHOW QUERIES;
      ● If you need to stop one:
        • TERMINATE query_id;
  57. CREATE TABLE AS SELECT
      ● Once your query is ready and you want to run it non-interactively
      ● CREATE TABLE AS SELECT ...;
      ● Just like 'CREATE STREAM AS SELECT' but for aggregations
  59. Functions
      ● Scalar Functions:
        • CONCAT, IFNULL, LCASE, LEN, SUBSTRING, TRIM, UCASE
        • ABS, CEIL, FLOOR, RANDOM, ROUND
        • StringToTimestamp, TimestampToString
        • GetStringFromJSON
        • CAST
      ● Aggregate Functions:
        • SUM, COUNT, MIN, MAX
      ● User-defined Functions:
        • Java Interface
  60. Session Variables
      ● Just as in MySQL, Oracle etc. there are settings to control how your CLI behaves
      ● Set any property the KStreams consumers/producers will understand
      ● Defaults can be set in the ksql.properties file
      ● To see a list of currently set or default variable values:
        • ksql> show properties;
      ● Useful examples:
        • num.stream.threads=4
        • commit.interval.ms=1000
        • cache.max.bytes.buffering=2000000
      ● TIP! - Your new best friend for testing or building a demo is:
        • ksql> set 'auto.offset.reset' = 'earliest';
