KSQL
Stream Processing leicht gemacht!
Guido Schmutz
@gschmutz doag2018
Agenda
1. Apache Kafka Overview
2. KSQL in Action – it’s demo time ☺
3. Summary
KSQL - Let’s try it with a “real-life” sample

[Architecture diagram] Truck position events (Position & Driving Info, e.g.
1522846456703,101,31,1927624662,Normal,37.31,-94.31,-4802309397906690837)
are published to the MQTT topic truck/nn/position and ingested by an mqtt-to-kafka connector into the truck_position topic. A detect_dangerous_driving Stream produces the dangerous-driving Stream, which count_by_eventType aggregates into the dangerous-driving-count Table. Driver master data (e.g. 27, Walter, Ward, Y, 24-JUL-85, 2017-10-02 15:19:00) is ingested by a jdbc-to-kafka connector into the truck_driver topic as JSON ({"id":27,"firstName":"Walter","lastName":"Ward","available":"Y","birthdate":"24-JUL-85","last_update":1506923052012}), kept as a Table, and joined to the dangerous-driving Stream to form the "dangerous-driving & driver" Stream. A "dangerous driving by geo" step derives the dangerous-driving-geohash Stream.

https://github.com/gschmutz/various-demos/tree/master/iot-truck-demo
Guido Schmutz
Working at Trivadis for more than 21 years
Oracle Groundbreaker Ambassador & Oracle ACE Director
Consultant, Trainer, Software Architect for Java, Oracle, SOA and
Big Data / Fast Data
Head of Trivadis Architecture Board
Technology Manager @ Trivadis
More than 30 years of software development experience
Contact: guido.schmutz@trivadis.com
Blog: http://guidoschmutz.wordpress.com
Slideshare: http://www.slideshare.net/gschmutz
Twitter: gschmutz
With over 650 specialists and IT experts in your region.
16 Trivadis branches and more than 650 employees
Experience from more than 1,900 projects per year at over 800 customers
250 Service Level Agreements
Over 4,000 training participants
Research and development budget: CHF 5.0 million
Financially self-supporting and sustainably profitable
Apache Kafka Overview
Apache Kafka – A Streaming Platform
High-Level Architecture
Distributed Log at the Core
Scale-Out Architecture
Logs do not (necessarily) forget
Apache Kafka – wait there is more!
[Diagram] Source Connector → Kafka Broker (trucking_driver topic) → Sink Connector, with Stream Processing consuming from and producing back to the broker.
Kafka Connect - Overview
[Diagram] Source Connector → Kafka → Sink Connector
Kafka Streams - Overview
Designed as a simple and lightweight library in Apache Kafka, with no other dependencies than Kafka
Supports fault-tolerant local state
Supports Windowing (Fixed, Sliding and Session) and Stream-Stream / Stream-Table Joins
Millisecond processing latency, no micro-batching
At-least-once and exactly-once processing guarantees

// customers are read as a changelog into a KTable
KTable<Integer, Customer> customers =
    builder.table("customer");
// orders are read as an event stream
KStream<Integer, Order> orders =
    builder.stream("order");
// enrich each order with its customer
KStream<Integer, String> enriched =
    orders.leftJoin(customers, …);
enriched.to("orderEnriched");

[Diagram] Java Application (Kafka Streams) ⇄ Kafka Broker (trucking_driver topic)
KSQL - Overview
STREAM and TABLE as first-class citizens
• STREAM = data in motion
• TABLE = collected state of a stream
Stream Processing with zero coding, using a SQL-like language
Built on top of Kafka Streams
Interactive (CLI) and headless (command file) mode

ksql> CREATE STREAM customer_s 
        WITH (kafka_topic='customer', 
              value_format='AVRO');
Message
----------------
Stream created

ksql> SELECT * FROM customer_s 
        WHERE address->country = 'Switzerland';
...

[Diagram] KSQL CLI → Commands → KSQL Engine (Kafka Streams) ⇄ Kafka Broker (trucking_driver topic)
KSQL in Action – it’s demo time ☺
Demo (I) – Data Ingestion via MQTT
[Diagram] Position & Driving Info: truck/nn/position (MQTT) → "mqtt to kafka" → truck_position topic, inspected with kafkacat.
Testdata-Generator adapted from the Hortonworks Tutorial
1522846456703,101,31,1927624662,Normal,37.31,-
94.31,-4802309397906690837
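For quick inspection of the ingested topic, a kafkacat call along these lines can be used (a sketch; the broker address is an assumption, matching the CLI example later in the deck):

$ kafkacat -b broker-1:9092 -t truck_position -o end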
KSQL - Terminology
Stream
• “History”
• an unbounded sequence of structured data ("facts")
• Facts in a stream are immutable
  • new facts can be inserted to a stream
  • existing facts can never be updated or deleted
• Streams can be created from a Kafka topic or derived from an existing stream

Table
• “State”
• a view of a stream, or another table, representing a collection of evolving facts
• Facts in a table are mutable
  • new facts can be inserted to the table
  • existing facts can be updated or deleted
• Tables can be created from a Kafka topic or derived from existing streams and tables
Demo (II) – Create a STREAM on topic truck_position
[Diagram] Position & Driving Info: truck/nn/position (MQTT) → mqtt-to-kafka → truck_position topic → Stream, browsed from the KSQL CLI.
1522846456703,101,31,1927624662,Normal,37.31,-
94.31,-4802309397906690837
CREATE STREAM
Create a new stream, backed by a Kafka topic, with the specified columns and properties
Supported column data types:
• BOOLEAN, INTEGER, BIGINT, DOUBLE, VARCHAR or STRING
• ARRAY<ArrayType>
• MAP<VARCHAR, ValueType>
• STRUCT<FieldName FieldType, ...>
Supports the following serialization formats: CSV, JSON, AVRO
KSQL adds the implicit columns ROWTIME and ROWKEY to every stream
CREATE STREAM stream_name ( { column_name data_type } [, ...] )
WITH ( property_name = expression [, ...] );
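As an illustration of the complex column types, a hypothetical stream over a customer topic (an assumption for illustration, not part of the demo) could be declared like this:

ksql> CREATE STREAM customer_s 
        (id BIGINT, 
         name VARCHAR, 
         emails ARRAY<VARCHAR>, 
         address STRUCT<street VARCHAR, city VARCHAR, country VARCHAR>) 
        WITH (kafka_topic='customer', 
              value_format='JSON');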
Start KSQL CLI
$ docker-compose exec ksql-cli ksql-cli local --bootstrap-server broker-1:9092
======================================
=        [KSQL ASCII-art logo]       =
=                                    =
=   Streaming SQL Engine for Kafka   =
Copyright 2017 Confluent Inc.
CLI v0.1, Server v0.1 located at http://localhost:9098
Having trouble? Type 'help' (case-insensitive) for a rundown of how things work!
ksql>
Create a STREAM on truck_position
ksql> CREATE STREAM truck_position_s 
(ts VARCHAR, 
truckId VARCHAR, 
driverId BIGINT, 
routeId BIGINT, 
eventType VARCHAR, 
latitude DOUBLE, 
longitude DOUBLE, 
correlationId VARCHAR) 
WITH (kafka_topic='truck_position', 
value_format='JSON');
Message
----------------
Stream created
Create a STREAM on truck_position
ksql> describe truck_position_s;
Field | Type
---------------------------------
ROWTIME | BIGINT
ROWKEY | VARCHAR(STRING)
TS | VARCHAR(STRING)
TRUCKID | VARCHAR(STRING)
DRIVERID | BIGINT
ROUTEID | BIGINT
EVENTTYPE | VARCHAR(STRING)
LATITUDE | DOUBLE
LONGITUDE | DOUBLE
CORRELATIONID | VARCHAR(STRING)
SELECT
Selects rows from a KSQL stream or table
Result of this statement will not be persisted in a Kafka topic and will only be printed out in the console
from_item is one of the following: stream_name, table_name
SELECT select_expr [, ...]
FROM from_item
[ LEFT JOIN join_table ON join_criteria ]
[ WINDOW window_expression ]
[ WHERE condition ]
[ GROUP BY grouping_expression ]
[ HAVING having_expression ]
[ LIMIT count ];
Use SELECT to browse from Stream
ksql> SELECT * FROM truck_position_s;
1539711991642 | truck/24/position | null | 24 | 10 | 1198242881 |
Normal | 36.84 | -94.83 | -6187001306629414077
1539711991691 | truck/26/position | null | 26 | 13 | 1390372503 |
Normal | 42.04 | -88.02 | -6187001306629414077
1539711991882 | truck/66/position | null | 66 | 22 | 1565885487 |
Normal | 38.33 | -94.35 | -6187001306629414077
ksql> SELECT * FROM truck_position_s WHERE eventType != 'Normal';
1539712101614 | truck/67/position | null | 67 | 11 | 160405074 |
Lane Departure | 38.98 | -92.53 | -6187001306629414077
1539712116450 | truck/18/position | null | 18 | 25 | 987179512 |
Overspeed | 40.76 | -88.77 | -6187001306629414077
1539712120102 | truck/31/position | null | 31 | 12 | 927636994 |
Unsafe following distance | 38.22 | -91.18 | -6187001306629414077
Demo (III) – CREATE AS … SELECT …
[Diagram] Position & Driving Info: truck/nn/position (MQTT) → mqtt-to-kafka → truck_position topic → Stream → detect_dangerous_driving → dangerous-driving Stream, e.g.
{"timestamp":1537343400827,"truckId":87,"driverId":13,"routeId":987179512,"eventType":"Normal","latitude":38.65,"longitude":-90.21,"correlationId":"-3208700263746910537"}
1522846456703,101,31,1927624662,Normal,37.31,-
94.31,-4802309397906690837
CREATE STREAM … AS SELECT …
Create a new KSQL stream along with the corresponding Kafka topic, and stream the result of the SELECT query into the topic
WINDOW clause can only be used if the from_item is a stream
CREATE STREAM stream_name
[WITH ( property_name = expression [, ...] )]
AS SELECT select_expr [, ...]
FROM from_stream [ LEFT | FULL | INNER ]
JOIN [join_table | join_stream]
[ WITHIN [(before TIMEUNIT, after TIMEUNIT) | N TIMEUNIT] ]
ON join_criteria
[ WHERE condition ]
[PARTITION BY column_name];
INSERT INTO … SELECT …
Stream the result of the SELECT query into an existing stream and its underlying topic
The schema and partitioning column produced by the query must match the stream’s schema and key
If the schema and partitioning column are incompatible with the stream, the statement will return an error
CREATE STREAM stream_name ...;
INSERT INTO stream_name
SELECT select_expr [., ...]
FROM from_stream
[ WHERE condition ]
[ PARTITION BY column_name ];
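A minimal sketch of how this could look in the demo, assuming a second, hypothetical source stream truck_position_file_s with a schema matching dangerous_driving_s:

ksql> INSERT INTO dangerous_driving_s 
        SELECT * 
        FROM truck_position_file_s 
        WHERE eventType != 'Normal';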
CREATE AS … SELECT …
ksql> CREATE STREAM dangerous_driving_s 
WITH (kafka_topic='dangerous_driving_s', 
value_format='JSON') 
AS SELECT * FROM truck_position_s 
WHERE eventtype != 'Normal';
Message
----------------------------
Stream created and running
CREATE AS … SELECT …
ksql> select * from dangerous_driving_s;
1539712399201 | truck/67/position | null | 67 | 11 | 160405074 |
Unsafe following distance | 38.65 | -90.21 | -6187001306629414077
1539712416623 | truck/67/position | null | 67 | 11 | 160405074 |
Unsafe following distance | 39.1 | -94.59 | -6187001306629414077
1539712430051 | truck/18/position | null | 18 | 25 | 987179512 |
Lane Departure | 35.1 | -90.07 | -6187001306629414077
Demo (IV) – Aggregate and Window
[Diagram] Position & Driving Info: truck/nn/position (MQTT) → mqtt-to-kafka → truck_position topic → detect_dangerous_driving → dangerous-driving Stream → count_by_eventType → dangerous-driving-count Table.
1522846456703,101,31,1927624662,Normal,37.31,-
94.31,-4802309397906690837
Windowing
Streams are unbounded
Some meaningful time frames to do computations (i.e. aggregations) are needed
Computations over events are done using windows of data

[Diagram] Fixed Window, Sliding Window and Session Window over a stream of data along the time axis
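Sketches of the three window types in KSQL syntax, applied to the demo’s dangerous_driving_s stream (window sizes are illustrative):

ksql> SELECT eventType, count(*) FROM dangerous_driving_s 
        WINDOW TUMBLING (SIZE 30 SECONDS) GROUP BY eventType;

ksql> SELECT eventType, count(*) FROM dangerous_driving_s 
        WINDOW HOPPING (SIZE 30 SECONDS, ADVANCE BY 10 SECONDS) GROUP BY eventType;

ksql> SELECT eventType, count(*) FROM dangerous_driving_s 
        WINDOW SESSION (60 SECONDS) GROUP BY eventType;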
CREATE TABLE
Create a new table with the specified columns and properties
Supports same data types as CREATE STREAM
KSQL adds the implicit columns ROWTIME and ROWKEY to every table as well
KSQL currently has the following requirements for creating a table from a Kafka topic
• message key must also be present as a field/column in the Kafka message value
• message key must be in VARCHAR aka STRING format
CREATE TABLE table_name ( { column_name data_type } [, ...] )
WITH ( property_name = expression [, ...] );
Functions
Scalar Functions
• ABS, ROUND, CEIL, FLOOR
• ARRAYCONTAINS
• CONCAT, SUBSTRING, TRIM
• EXTRACTJSONFIELD
• GEO_DISTANCE
• LCASE, UCASE
• MASK, MASK_KEEP_LEFT, MASK_KEEP_RIGHT, MASK_LEFT, MASK_RIGHT
• RANDOM
• STRINGTOTIMESTAMP, TIMESTAMPTOSTRING
Aggregate Functions
• COUNT
• MAX
• MIN
• SUM
• TOPK
• TOPKDISTINCT
User-Defined Functions (UDF) and User-Defined Aggregate Functions (UDAF)
• Currently only supported using Java
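A small sketch combining some of these scalar functions on the demo stream (the column list is illustrative):

ksql> SELECT TIMESTAMPTOSTRING(ROWTIME, 'yyyy-MM-dd HH:mm:ss'), 
        UCASE(eventType), 
        MASK(correlationId) 
      FROM dangerous_driving_s;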
SELECT COUNT … GROUP BY
ksql> CREATE TABLE dangerous_driving_count AS 
SELECT eventType, count(*) nof 
FROM dangerous_driving_s 
WINDOW TUMBLING (SIZE 30 SECONDS) 
GROUP BY eventType;
Message
----------------------------
Table created and running
ksql> SELECT TIMESTAMPTOSTRING(ROWTIME,'yyyy-MM-dd HH:mm:ss.SSS'),
        eventType, nof
      FROM dangerous_driving_count;
2018-10-16 05:12:19.408 | Unsafe following distance | 1
2018-10-16 05:12:39.615 | Unsafe tail distance | 1
2018-10-16 05:12:43.155 | Overspeed | 1
Joining
Challenges of joining streams
1. Data streams need to be aligned as they come, because they have different timestamps
2. Since streams are never-ending, the joins must be limited; otherwise the join will never end
3. The join needs to produce results continuously, as there is no end to the data

Join types:
• Stream-to-Static (Table) Join
• Stream-to-Stream Join (one window join)
• Stream-to-Stream Join (two window join)
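A hedged sketch of a windowed stream-to-stream join in KSQL syntax; the demo itself only shows a stream-to-table join, and the second stream driver_ack_s as well as the correlation column are hypothetical:

ksql> SELECT p.driverId, p.eventType, a.ackTime 
      FROM dangerous_driving_s p 
      INNER JOIN driver_ack_s a 
        WITHIN 5 MINUTES 
      ON p.correlationId = a.correlationId;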
Demo (V) – Join Table to enrich with Driver data
[Diagram] As before: truck/nn/position (MQTT) → mqtt-to-kafka → truck_position → detect_dangerous_driving → dangerous-driving Stream → count_by_eventType → dangerous-driving-count Table. New: the Truck Driver table (e.g. 27, Walter, Ward, Y, 24-JUL-85, 2017-10-02 15:19:00) is ingested via jdbc-to-kafka into the truck_driver topic as JSON ({"id":27,"firstName":"Walter","lastName":"Ward","available":"Y","birthdate":"24-JUL-85","last_update":1506923052012}), kept as a Table, and joined to the dangerous-driving Stream, producing the "dangerous-driving & driver" Stream.
1522846456703,101,31,1927624662,Normal,37.31,-
94.31,-4802309397906690837
Get changes from driver table
#!/bin/bash
curl -X "POST" "http://192.168.69.138:8083/connectors" \
     -H "Content-Type: application/json" \
     -d $'{
  "name": "jdbc-driver-source",
  "config": {
    "connector.class": "JdbcSourceConnector",
    "connection.url": "jdbc:postgresql://db/sample?user=sample&password=sample",
    "mode": "timestamp",
    "timestamp.column.name": "last_update",
    "table.whitelist": "driver",
    "validate.non.null": "false",
    "topic.prefix": "truck_",
    "key.converter": "org.apache.kafka.connect.json.JsonConverter",
    "key.converter.schemas.enable": "false",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter.schemas.enable": "false",
    "name": "jdbc-driver-source",
    "transforms": "createKey,extractInt",
    "transforms.createKey.type": "org.apache.kafka.connect.transforms.ValueToKey",
    "transforms.createKey.fields": "id",
    "transforms.extractInt.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
    "transforms.extractInt.field": "id"
  }
}'
Create Table with Driver State
ksql> CREATE TABLE driver_t 
(id BIGINT, 
first_name VARCHAR, 
last_name VARCHAR, 
available VARCHAR) 
WITH (kafka_topic='truck_driver', 
value_format='JSON', 
key='id');
Message
----------------
Table created
Create a Stream with truck_position joined to driver_t
ksql> CREATE STREAM dangerous_driving_and_driver_s 
WITH (kafka_topic='dangerous_driving_and_driver_s', 
value_format='JSON', partitions=8) 
AS SELECT driverId, first_name, last_name, truckId, routeId,
eventtype, latitude, longitude 
FROM truck_position_s 
LEFT JOIN driver_t 
ON truck_position_s.driverId = driver_t.id;
Message
----------------------------
Stream created and running
ksql> select * from dangerous_driving_and_driver_s;
1539713095921 | 11 | 11 | Micky | Isaacson | 67 | 160405074 | Lane
Departure | 39.01 | -93.85
1539713113254 | 11 | 11 | Micky | Isaacson | 67 | 160405074 | Unsafe
following distance | 39.0 | -93.65
Demo (VI) – UDF for calculating Geohash

[Diagram] Same pipeline as in Demo (V); in addition, a "dangerous driving by geo" step applies the geohash UDF to the dangerous-driving Stream and produces the dangerous-driving-geohash Stream.
1522846456703,101,31,1927624662,Normal,37.31,-
94.31,-4802309397906690837
UDF for calculating Geohash
Geohash is a geocoding scheme which encodes a geographic location into a short string of letters and digits
It is a hierarchical spatial data structure which subdivides space into buckets of grid shape

Length | Area (width x height)
-------|----------------------
1      | 5,009.4km x 4,992.6km
2      | 1,252.3km x 624.1km
3      | 156.5km x 156km
4      | 39.1km x 19.5km
12     | 3.7cm x 1.9cm
ksql> SELECT latitude, longitude, 
geohash(latitude, longitude, 4) 
FROM dangerous_driving_s;
38.31 | -91.07 | 9yz1
37.7 | -92.61 | 9ywn
34.78 | -92.31 | 9ynm
42.23 | -91.78 | 9zw8xw
...
http://geohash.gofreerange.com/
UDF for calculating Geohash
Geohash and join to some important messages for drivers
// UDF annotations come from the KSQL API (io.confluent.ksql.function.udf);
// GeoHash.encodeHash comes from the com.github.davidmoten:geo library (assumed dependency)
import com.github.davidmoten.geo.GeoHash;
import io.confluent.ksql.function.udf.Udf;
import io.confluent.ksql.function.udf.UdfDescription;

@UdfDescription(name = "geohash",
    description = "returns the geohash for a given LatLong")
public class GeoHashUDF {

  @Udf(description = "encode lat/long to geohash of specified length.")
  public String geohash(final double latitude, final double longitude,
                        final int length) {
    return GeoHash.encodeHash(latitude, longitude, length);
  }

  @Udf(description = "encode lat/long to geohash.")
  public String geohash(final double latitude, final double longitude) {
    return GeoHash.encodeHash(latitude, longitude);
  }
}
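To persist the geohash enrichment as the dangerous-driving-geohash stream shown in the architecture diagram, a statement along these lines could be used (a sketch; topic name and column list are assumptions):

ksql> CREATE STREAM dangerous_driving_geohash_s 
        WITH (kafka_topic='dangerous_driving_geohash', 
              value_format='JSON') 
        AS SELECT driverId, truckId, eventType, latitude, longitude, 
                  geohash(latitude, longitude, 4) geohash 
           FROM dangerous_driving_s;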
Summary
Summary
KSQL is another way to work with data in Kafka => you can (re)use some of your SQL knowledge
Similar semantics to SQL, but for queries on continuous, streaming data
Well-suited for structured data (there is the “S” in KSQL)
KSQL is dependent on “Kafka core”
• KSQL consumes from the Kafka broker
• KSQL produces to the Kafka broker
KSQL runs as a Java application and can be deployed to various resource managers
Use Kafka Connect or any other Stream Data Integration tool to bring your data into Kafka first
Choosing the Right API
Consumer / Producer API
• Java, C#, C++, Scala, Python, Node.js, Go, PHP …
• subscribe(), poll(), send(), flush()
• Anything Kafka

Kafka Streams
• Fluent Java API
• mapValues(), filter(), to()
• Stream Analytics

KSQL
• SQL dialect
• SELECT … FROM …, JOIN ... WHERE, GROUP BY
• Stream Analytics

Kafka Connect
• Declarative: Configuration and REST API
• Out-of-the-box connectors
• Stream Integration

Flexibility ⟷ Simplicity
Source: adapted from Confluent
Trivadis @ DOAG 2018
#opencompany
Booth: 3rd Floor – next to the escalator
We share our know-how!
Just come across: live presentations and a documents archive
T-shirts, a contest and much more
We look forward to your visit
Technology on its own won't help you.
You need to know how to use it properly.