KSQL
Stream Processing leicht gemacht!
Guido Schmutz
@gschmutz doag2018
Agenda
1. Apache Kafka Overview
2. KSQL in Action – it's demo time :-)
3. Summary
11/21/18 Trivadis DOAG18: KSQL - Stream Processing leicht gemacht!
KSQL - Let's try it with a "real-life" sample

[Architecture diagram: the MQTT topic truck/nn/position is ingested by an
mqtt-to-kafka connector into the truck_position topic (Stream). A
detect_dangerous_driving processor produces the dangerous-driving Stream,
which count_by_eventType aggregates into the dangerous-driving-count Table.
In parallel, a jdbc-to-kafka connector loads the Truck Driver database table
into the truck_driver topic (Table); joining dangerous-driving with the
driver Table yields the "dangerous-driving & driver" Stream, and a
"dangerous driving by geo" processor derives the dangerous-driving-geohash
Stream.]

Sample position message:
1522846456703,101,31,1927624662,Normal,37.31,-94.31,-4802309397906690837

Sample driver record:
27, Walter, Ward, Y, 24-JUL-85, 2017-10-02 15:19:00
{"id":27,"firstName":"Walter","lastName":"Ward","available":"Y","birthdate":"24-JUL-85","last_update":1506923052012}

https://github.com/gschmutz/various-demos/tree/master/iot-truck-demo
Guido Schmutz
Working at Trivadis for more than 21 years
Oracle Groundbreaker Ambassador & Oracle ACE Director
Consultant, Trainer, and Software Architect for Java, Oracle, SOA and
Big Data / Fast Data
Head of Trivadis Architecture Board
Technology Manager @ Trivadis
More than 30 years of software development experience
Contact: guido.schmutz@trivadis.com
Blog: http://guidoschmutz.wordpress.com
Slideshare: http://www.slideshare.net/gschmutz
Twitter: gschmutz
With over 650 specialists and IT experts in your region.
16 Trivadis branches and more than
650 employees
Experience from more than 1,900
projects per year at over 800
customers
250 Service Level Agreements
Over 4,000 training participants
Research and development budget:
CHF 5.0 million
Financially self-supporting and
sustainably profitable
Apache Kafka Overview
Apache Kafka – A Streaming Platform
High-Level Architecture
Distributed Log at the Core
Scale-Out Architecture
Logs do not (necessarily) forget
Apache Kafka – wait, there is more!

[Diagram: Source Connectors feed the trucking_driver topic on the Kafka
Broker, Stream Processing transforms the data, and Sink Connectors deliver
the results downstream.]
Kafka Connect - Overview

[Diagram: Source Connectors move data from external systems into Kafka;
Sink Connectors move data from Kafka into external systems.]
Kafka Streams - Overview

• Designed as a simple and lightweight library in Apache Kafka — no
  dependencies other than Kafka itself
• Supports fault-tolerant local state
• Supports windowing (fixed, sliding and session) and stream-stream /
  stream-table joins
• Millisecond processing latency, no micro-batching
• At-least-once and exactly-once processing guarantees

KTable<Integer, Customer> customers =
    builder.table("customer");
KStream<Integer, Order> orders =
    builder.stream("order");
KStream<Integer, String> enriched =
    orders.leftJoin(customers, …);
enriched.to("orderEnriched");

[Diagram: a Java application embedding the Kafka Streams library reads from
and writes to topics on the Kafka Broker.]
KSQL - Overview

STREAM and TABLE as first-class citizens
• STREAM = data in motion
• TABLE = collected state of a stream

Stream Processing with zero coding, using a SQL-like language

Built on top of Kafka Streams

Interactive (CLI) and headless (command file) modes

ksql> CREATE STREAM customer_s
        WITH (kafka_topic='customer',
              value_format='AVRO');

 Message
----------------
 Stream created

ksql> SELECT * FROM customer_s
      WHERE address->country = 'Switzerland';
...

[Diagram: the KSQL CLI sends commands to the KSQL Engine, which runs Kafka
Streams topologies against the Kafka Broker.]
KSQL in Action – it's demo time :-)
Demo (I) – Data Ingestion via MQTT

[Diagram: the MQTT topic truck/nn/position is ingested by an mqtt-to-kafka
connector into the truck_position topic, which is inspected with kafkacat.]

Test-data generator adapted from a Hortonworks tutorial. Sample record:
1522846456703,101,31,1927624662,Normal,37.31,-94.31,-4802309397906690837
KSQL - Terminology

Stream ("History")
• an unbounded sequence of structured data ("facts")
• facts in a stream are immutable: new facts can be inserted into a
  stream, but existing facts can never be updated or deleted
• streams can be created from a Kafka topic or derived from an existing
  stream

Table ("State")
• a view of a stream, or of another table, representing a collection of
  evolving facts
• facts in a table are mutable: new facts can be inserted into the table,
  and existing facts can be updated or deleted
• tables can be created from a Kafka topic or derived from existing
  streams and tables
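The stream/table duality above can be sketched in a few lines of Python — a conceptual illustration only, not KSQL internals; the `to_table` helper is a hypothetical name:

```python
def to_table(stream):
    """Collapse a keyed stream of facts into a table (latest value per key).

    The stream is an append-only list of (key, value) facts; the table is
    the evolving state derived from it by replaying the facts in order.
    """
    table = {}
    for key, value in stream:
        if value is None:          # a "tombstone" fact deletes the key
            table.pop(key, None)
        else:
            table[key] = value     # later facts overwrite earlier state
    return table

# Facts are only ever appended to the stream ...
stream = [
    (27, {"firstName": "Walter", "available": "Y"}),
    (27, {"firstName": "Walter", "available": "N"}),  # driver 27 goes offline
]
# ... while the derived table always shows the current state per key.
assert to_table(stream)[27]["available"] == "N"
```

This is the sense in which a table is "a view of a stream": re-running the fold over the same facts always reproduces the same state.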
Demo (II) – Create a STREAM on topic truck_position

[Diagram: the mqtt-to-kafka connector feeds the truck_position topic, on
which a Stream is defined and queried from the KSQL CLI.]

1522846456703,101,31,1927624662,Normal,37.31,-94.31,-4802309397906690837
CREATE STREAM
Create a new stream, backed by a Kafka topic, with the specified columns and
properties
Supported column data types:
• BOOLEAN, INTEGER, BIGINT, DOUBLE, VARCHAR or STRING
• ARRAY<ArrayType>
• MAP<VARCHAR, ValueType>
• STRUCT<FieldName FieldType, ...>
Supports the following serialization formats: CSV, JSON, AVRO
KSQL adds the implicit columns ROWTIME and ROWKEY to every stream
CREATE STREAM stream_name ( { column_name data_type } [, ...] )
WITH ( property_name = expression [, ...] );
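As a rough sketch of what the declared columns do for a CSV-formatted value, here is the schema of the demo's truck_position_s stream applied to the sample record from the slides — plain Python, not KSQL; `SCHEMA` and `parse_position` are illustrative names:

```python
# Column order follows the truck_position_s definition shown later:
# ts, truckId, driverId, routeId, eventType, latitude, longitude, correlationId
SCHEMA = [("ts", str), ("truckId", str), ("driverId", int), ("routeId", int),
          ("eventType", str), ("latitude", float), ("longitude", float),
          ("correlationId", str)]

def parse_position(line):
    """Apply the declared column types to one CSV-serialized value."""
    values = line.split(",")
    return {name: cast(raw) for (name, cast), raw in zip(SCHEMA, values)}

row = parse_position(
    "1522846456703,101,31,1927624662,Normal,37.31,-94.31,-4802309397906690837")
assert row["eventType"] == "Normal" and row["latitude"] == 37.31
```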
Start KSQL CLI
$ docker-compose exec ksql-cli ksql-cli local --bootstrap-server broker-1:9092
[KSQL ASCII-art banner]
= Streaming SQL Engine for Kafka =
Copyright 2017 Confluent Inc.
CLI v0.1, Server v0.1 located at http://localhost:9098
Having trouble? Type 'help' (case-insensitive) for a rundown of how things work!
ksql>
Create a STREAM on truck_position
ksql> CREATE STREAM truck_position_s 
(ts VARCHAR, 
truckId VARCHAR, 
driverId BIGINT, 
routeId BIGINT, 
eventType VARCHAR, 
latitude DOUBLE, 
longitude DOUBLE, 
correlationId VARCHAR) 
WITH (kafka_topic='truck_position', 
value_format='JSON');
Message
----------------
Stream created
Create a STREAM on truck_position
ksql> describe truck_position_s;
Field | Type
---------------------------------
ROWTIME | BIGINT
ROWKEY | VARCHAR(STRING)
TS | VARCHAR(STRING)
TRUCKID | VARCHAR(STRING)
DRIVERID | BIGINT
ROUTEID | BIGINT
EVENTTYPE | VARCHAR(STRING)
LATITUDE | DOUBLE
LONGITUDE | DOUBLE
CORRELATIONID | VARCHAR(STRING)
SELECT
Selects rows from a KSQL stream or table
Result of this statement will not be persisted in a Kafka topic and will only be printed out
in the console
from_item is one of the following: stream_name, table_name
SELECT select_expr [, ...]
FROM from_item
[ LEFT JOIN join_table ON join_criteria ]
[ WINDOW window_expression ]
[ WHERE condition ]
[ GROUP BY grouping_expression ]
[ HAVING having_expression ]
[ LIMIT count ];
Use SELECT to browse from Stream
ksql> SELECT * FROM truck_position_s;
1539711991642 | truck/24/position | null | 24 | 10 | 1198242881 |
Normal | 36.84 | -94.83 | -6187001306629414077
1539711991691 | truck/26/position | null | 26 | 13 | 1390372503 |
Normal | 42.04 | -88.02 | -6187001306629414077
1539711991882 | truck/66/position | null | 66 | 22 | 1565885487 |
Normal | 38.33 | -94.35 | -6187001306629414077
ksql> SELECT * FROM truck_position_s WHERE eventType != 'Normal';
1539712101614 | truck/67/position | null | 67 | 11 | 160405074 |
Lane Departure | 38.98 | -92.53 | -6187001306629414077
1539712116450 | truck/18/position | null | 18 | 25 | 987179512 |
Overspeed | 40.76 | -88.77 | -6187001306629414077
1539712120102 | truck/31/position | null | 31 | 12 | 927636994 |
Unsafe following distance | 38.22 | -91.18 | -6187001306629414077
Demo (III) – CREATE AS … SELECT …

[Diagram: the detect_dangerous_driving query reads the truck_position
Stream and writes the derived dangerous-driving Stream.]

Sample input (CSV) and output (JSON):
1522846456703,101,31,1927624662,Normal,37.31,-94.31,-4802309397906690837
{"timestamp":1537343400827,"truckId":87,"driverId":13,"routeId":987179512,"eventType":"Normal","latitude":38.65,"longitude":-90.21,"correlationId":"-3208700263746910537"}
CREATE STREAM … AS SELECT …
Create a new KSQL stream, along with the corresponding Kafka topic, and stream
the result of the SELECT query as a changelog into that topic
WINDOW clause can only be used if the from_item is a stream
CREATE STREAM stream_name
[WITH ( property_name = expression [, ...] )]
AS SELECT select_expr [, ...]
FROM from_stream [ LEFT | FULL | INNER ]
JOIN [join_table | join_stream]
[ WITHIN [(before TIMEUNIT, after TIMEUNIT) | N TIMEUNIT] ]
ON join_criteria
[ WHERE condition ]
[PARTITION BY column_name];
INSERT INTO … SELECT …
Stream the result of the SELECT query into an existing stream and its underlying topic
The schema and partitioning column produced by the query must match the stream's
schema and key
If the schema and partitioning column are incompatible with the stream, the
statement will return an error
CREATE STREAM stream_name ...;
INSERT INTO stream_name
SELECT select_expr [., ...]
FROM from_stream
[ WHERE condition ]
[ PARTITION BY column_name ];
CREATE AS … SELECT …
ksql> CREATE STREAM dangerous_driving_s 
WITH (kafka_topic='dangerous_driving_s', 
value_format='JSON') 
AS SELECT * FROM truck_position_s 
WHERE eventtype != 'Normal';
Message
----------------------------
Stream created and running
CREATE AS … SELECT …
ksql> select * from dangerous_driving_s;
1539712399201 | truck/67/position | null | 67 | 11 | 160405074 |
Unsafe following distance | 38.65 | -90.21 | -6187001306629414077
1539712416623 | truck/67/position | null | 67 | 11 | 160405074 |
Unsafe following distance | 39.1 | -94.59 | -6187001306629414077
1539712430051 | truck/18/position | null | 18 | 25 | 987179512 |
Lane Departure | 35.1 | -90.07 | -6187001306629414077
Demo (IV) – Aggregate and Window

[Diagram: the dangerous-driving Stream is aggregated by count_by_eventType
into the dangerous-driving-count Table.]

1522846456703,101,31,1927624662,Normal,37.31,-94.31,-4802309397906690837
Windowing

Streams are unbounded, so meaningful time frames are needed to do
computations (i.e. aggregations) over them. Computations over events are
done using windows of data.

[Diagram: a stream of data sliced into fixed, sliding and session windows
along the time axis.]
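The tumbling ("fixed") window variant can be illustrated in plain Python — a sketch of what `WINDOW TUMBLING (SIZE 30 SECONDS)` combined with `GROUP BY eventType` computes conceptually, not of how Kafka Streams implements it:

```python
from collections import Counter

def tumbling_count(events, window_ms=30_000):
    """Count events per (window, eventType), tumbling-window style.

    Each event is assigned to exactly one bucket whose start time is the
    event timestamp rounded down to a multiple of the window size.
    """
    counts = Counter()
    for ts, event_type in events:
        window_start = (ts // window_ms) * window_ms  # bucket start time
        counts[(window_start, event_type)] += 1
    return counts

events = [(1_000, "Overspeed"), (29_000, "Overspeed"), (31_000, "Lane Departure")]
counts = tumbling_count(events)
assert counts[(0, "Overspeed")] == 2            # both fall in window [0s, 30s)
assert counts[(30_000, "Lane Departure")] == 1  # next window [30s, 60s)
```

Sliding and session windows differ only in the bucketing rule: sliding windows let an event belong to several overlapping buckets, and session windows close a bucket after a gap of inactivity.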
CREATE TABLE
Create a new table with the specified columns and properties
Supports same data types as CREATE STREAM
KSQL adds the implicit columns ROWTIME and ROWKEY to every table as well
KSQL currently has the following requirements for creating a table from a Kafka topic:
• message key must also be present as a field/column in the Kafka message value
• message key must be in VARCHAR aka STRING format
CREATE TABLE table_name ( { column_name data_type } [, ...] )
WITH ( property_name = expression [, ...] );
Functions

Scalar Functions
• ABS, ROUND, CEIL, FLOOR
• ARRAYCONTAINS
• CONCAT, SUBSTRING, TRIM
• EXTRACTJSONFIELD
• GEO_DISTANCE
• LCASE, UCASE
• MASK, MASK_KEEP_LEFT, MASK_KEEP_RIGHT, MASK_LEFT, MASK_RIGHT
• RANDOM
• STRINGTOTIMESTAMP, TIMESTAMPTOSTRING

Aggregate Functions
• COUNT, MAX, MIN, SUM
• TOPK, TOPKDISTINCT

User-Defined Functions (UDF) and User-Defined Aggregate Functions (UDAF)
• currently only supported in Java
SELECT COUNT … GROUP BY
ksql> CREATE TABLE dangerous_driving_count AS 
SELECT eventType, count(*) nof 
FROM dangerous_driving_s 
WINDOW TUMBLING (SIZE 30 SECONDS) 
GROUP BY eventType;
Message
----------------------------
Table created and running
ksql> SELECT TIMESTAMPTOSTRING(ROWTIME,'yyyy-MM-dd HH:mm:ss.SSS'),
eventType, nof
FROM dangerous_driving_count;
2018-10-16 05:12:19.408 | Unsafe following distance | 1
2018-10-16 05:12:39.615 | Unsafe tail distance | 1
2018-10-16 05:12:43.155 | Overspeed | 1
Joining

Challenges of joining streams:
1. Data streams need to be aligned as they arrive, because they have
   different timestamps
2. Since streams are never-ending, the joins must be limited; otherwise the
   join will never end
3. The join needs to produce results continuously, as there is no end to
   the data

Join types:
• Stream to Static (Table) Join
• Stream to Stream Join (one window join)
• Stream to Stream Join (two window join)

[Diagram: timelines illustrating a stream-to-static join and the two
stream-to-stream join variants.]
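The simplest of the three cases — the stream-to-static (table) join used in the next demo — can be sketched in plain Python; the function name and record shapes here are illustrative:

```python
def stream_table_left_join(stream, table):
    """Enrich each stream event with the current table state for its key.

    No window is needed: the table side is just "latest state per key",
    and a LEFT join emits a row even when the key is unknown.
    """
    for key, event in stream:
        yield {**event, "driver": table.get(key)}  # None if key not in table

drivers = {11: {"first_name": "Micky", "last_name": "Isaacson"}}
positions = [(11, {"eventType": "Lane Departure"}),
             (99, {"eventType": "Overspeed"})]

rows = list(stream_table_left_join(positions, drivers))
assert rows[0]["driver"]["first_name"] == "Micky"
assert rows[1]["driver"] is None   # unmatched key still produces a row
```

Stream-to-stream joins add the windowing constraint from the list above: an event on one side can only match events on the other side whose timestamps fall within the configured window.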
Demo (V) – Join Table to enrich with Driver data

[Diagram: the jdbc-to-kafka connector loads the Truck Driver database table
into the truck_driver topic (Table); the dangerous-driving Stream is joined
with the driver Table to produce the "dangerous-driving & driver" Stream,
alongside the existing count_by_eventType aggregation.]

Sample driver record:
27, Walter, Ward, Y, 24-JUL-85, 2017-10-02 15:19:00
{"id":27,"firstName":"Walter","lastName":"Ward","available":"Y","birthdate":"24-JUL-85","last_update":1506923052012}

Sample position message:
1522846456703,101,31,1927624662,Normal,37.31,-94.31,-4802309397906690837
Get changes from driver table

#!/bin/bash
curl -X "POST" "http://192.168.69.138:8083/connectors" \
  -H "Content-Type: application/json" \
  -d $'{
  "name": "jdbc-driver-source",
  "config": {
    "connector.class": "JdbcSourceConnector",
    "connection.url": "jdbc:postgresql://db/sample?user=sample&password=sample",
    "mode": "timestamp",
    "timestamp.column.name": "last_update",
    "table.whitelist": "driver",
    "validate.non.null": "false",
    "topic.prefix": "truck_",
    "key.converter": "org.apache.kafka.connect.json.JsonConverter",
    "key.converter.schemas.enable": "false",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter.schemas.enable": "false",
    "name": "jdbc-driver-source",
    "transforms": "createKey,extractInt",
    "transforms.createKey.type": "org.apache.kafka.connect.transforms.ValueToKey",
    "transforms.createKey.fields": "id",
    "transforms.extractInt.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
    "transforms.extractInt.field": "id"
  }
}'
Create Table with Driver State
ksql> CREATE TABLE driver_t 
(id BIGINT, 
first_name VARCHAR, 
last_name VARCHAR, 
available VARCHAR) 
WITH (kafka_topic='truck_driver', 
value_format='JSON', 
key='id');
Message
----------------
Table created
Create Stream with truck_position joined to driver_t
ksql> CREATE STREAM dangerous_driving_and_driver_s 
WITH (kafka_topic='dangerous_driving_and_driver_s', 
value_format='JSON', partitions=8) 
AS SELECT driverId, first_name, last_name, truckId, routeId,
eventtype, latitude, longitude 
FROM truck_position_s 
LEFT JOIN driver_t 
ON truck_position_s.driverId = driver_t.id;
Message
----------------------------
Stream created and running
ksql> select * from dangerous_driving_and_driver_s;
1539713095921 | 11 | 11 | Micky | Isaacson | 67 | 160405074 | Lane
Departure | 39.01 | -93.85
1539713113254 | 11 | 11 | Micky | Isaacson | 67 | 160405074 | Unsafe
following distance | 39.0 | -93.65
(VI) – UDF for calculating Geohash

[Diagram: the full pipeline as before, extended with a "dangerous driving
by geo" processor that derives the dangerous-driving-geohash Stream from
the dangerous-driving Stream.]

1522846456703,101,31,1927624662,Normal,37.31,-94.31,-4802309397906690837
UDF for calculating Geohash

A geohash is a geocode which encodes a geographic location into a short
string of letters and digits — a hierarchical spatial data structure which
subdivides space into buckets of grid shape.

Length | Cell width x height
-------|-----------------------
     1 | 5,009.4km x 4,992.6km
     2 | 1,252.3km x 624.1km
     3 | 156.5km x 156km
     4 | 39.1km x 19.5km
    12 | 3.7cm x 1.9cm

ksql> SELECT latitude, longitude, 
geohash(latitude, longitude, 4) 
FROM dangerous_driving_s;
38.31 | -91.07 | 9yz1
37.7 | -92.61 | 9ywn
34.78 | -92.31 | 9ynm
42.23 | -91.78 | 9zw8
...

http://geohash.gofreerange.com/
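The encoding itself is simple enough to sketch in a few lines of Python. The UDF on the next slide delegates to a library; this standalone version is an assumption-free illustration of the standard algorithm — interleave longitude/latitude bits, then emit base-32 digits:

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)

def geohash(latitude, longitude, length=12):
    """Encode a lat/long pair into a geohash string of the given length."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    result, bits, bit_count, even = [], 0, 0, True
    while len(result) < length:
        if even:  # even bit positions refine longitude
            mid = (lon_lo + lon_hi) / 2
            if longitude >= mid:
                bits, lon_lo = bits * 2 + 1, mid
            else:
                bits, lon_hi = bits * 2, mid
        else:     # odd bit positions refine latitude
            mid = (lat_lo + lat_hi) / 2
            if latitude >= mid:
                bits, lat_lo = bits * 2 + 1, mid
            else:
                bits, lat_hi = bits * 2, mid
        even = not even
        bit_count += 1
        if bit_count == 5:        # 5 bits per base-32 character
            result.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(result)

# Canonical example from the geohash documentation:
assert geohash(57.64911, 10.40744, 11) == "u4pruydqqvj"
```

Each extra character narrows the cell by a factor of 32, which is exactly the length/precision trade-off shown in the table above.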
UDF for calculating Geohash

Calculate the geohash of a position, e.g. to join it with location-relevant
messages for drivers:

@UdfDescription(name = "geohash",
    description = "returns the geohash for a given LatLong")
public class GeoHashUDF {

  @Udf(description = "encode lat/long to geohash of specified length.")
  public String geohash(final double latitude, final double longitude,
                        final int length) {
    return GeoHash.encodeHash(latitude, longitude, length);
  }

  @Udf(description = "encode lat/long to geohash.")
  public String geohash(final double latitude, final double longitude) {
    return GeoHash.encodeHash(latitude, longitude);
  }
}
Summary
Summary

KSQL is another way to work with data in Kafka => you can (re)use some of
your SQL knowledge

Similar semantics to SQL, but for queries on continuous, streaming data

Well-suited for structured data (hence the "S" in KSQL)

KSQL depends on "Kafka core"
• KSQL consumes from the Kafka broker
• KSQL produces to the Kafka broker

KSQL runs as a Java application and can be deployed to various resource
managers

Use Kafka Connect or any other stream data integration tool to bring your
data into Kafka first
Choosing the Right API

Consumer / Producer API
• Java, C#, C++, Scala, Python, Node.js, Go, PHP, …
• subscribe(), poll(), send(), flush()
• Anything Kafka

Kafka Streams
• Fluent Java API
• mapValues(), filter(), flush()
• Stream Analytics

KSQL
• SQL dialect
• SELECT … FROM …, JOIN ... WHERE, GROUP BY
• Stream Analytics

Kafka Connect
• Declarative, configuration-based
• REST API
• Out-of-the-box connectors
• Stream Integration

Flexibility <—> Simplicity
Source: adapted from Confluent
Trivadis @ DOAG 2018
#opencompany
Booth: 3rd Floor – next to the escalator
We share our know-how!
Just come by: live presentations and a document archive,
T-shirts, a contest and much more
We look forward to your visit
Technology on its own won't help you.
You need to know how to use it properly.

KSQL - Stream Processing simplified!

  • 1.
    KSQL Stream Processing leichtgemacht! Guido Schmutz @gschmutz doag2018
  • 2.
    Agenda 1. Apache KafkaOverview 2. KSQL in Action – it’s demo time J 3. Summary 11/21/18 Trivadis DOAG18: KSQL - Stream Processing leicht gemacht!2
  • 3.
    KSQL - Let’stry it with a “real-life” sample Truck Driver jdbc-to- kafka truck_ driver 27, Walter, Ward, Y, 24-JUL-85, 2017-10- 02 15:19:00 Table join dangerous- driving & driver Stream dangerous- driving & driver detect_dangero us_driving truck/nn/ position mqtt-to- kafka truck_ position Stream Stream dangerous- driving count_by_ eventType Table dangergous- driving-count {"id":27,"firstName":"Walter","lastName":"W ard","available":"Y","birthdate":"24-JUL- 85","last_update":1506923052012} Position & Driving Info dangerous driving by geo Stream dangerous- driving-geohash 11/21/18 Trivadis DOAG18: KSQL - Stream Processing leicht gemacht!3 1522846456703,101,31,1927624662,Normal,37.31,- 94.31,-4802309397906690837 https://github.com/gschmutz/various-demos/tree/master/iot-truck-demo
  • 4.
    Guido Schmutz Working atTrivadis for more than 21 years Oracle Groundbreaker Ambassador & Oracle ACE Director Consultant, Trainer Software Architect for Java, Oracle, SOA and Big Data / Fast Data Head of Trivadis Architecture Board Technology Manager @ Trivadis More than 30 years of software development experience Contact: guido.schmutz@trivadis.com Blog: http://guidoschmutz.wordpress.com Slideshare: http://www.slideshare.net/gschmutz Twitter: gschmutz 11/21/18 Trivadis DOAG18: KSQL - Stream Processing leicht gemacht!4
  • 5.
    With over 650specialists and IT experts in your region. Trivadis DOAG18: KSQL - Stream Processing leicht gemacht!6 11/21/18 16 Trivadis branches and more than 650 employees Experience from more than 1,900 projects per year at over 800 customers 250 Service Level Agreements Over 4,000 training participants Research and development budget: CHF 5.0 million Financially self-supporting and sustainably profitable
  • 6.
    Apache Kafka Overview 11/21/18Trivadis DOAG18: KSQL - Stream Processing leicht gemacht!7
  • 7.
    Apache Kafka –A Streaming Platform High-Level Architecture Distributed Log at the Core Scale-Out Architecture Logs do not (necessarily) forget 11/21/18 Trivadis DOAG18: KSQL - Stream Processing leicht gemacht!8
  • 8.
    Apache Kafka –wait there is more! Source Connector trucking_ driver Kafka Broker Sink Connector Stream Processing 11/21/18 Trivadis DOAG18: KSQL - Stream Processing leicht gemacht!9
  • 9.
    Kafka Connect -Overview Source Connector Sink Connector 11/21/18 Trivadis DOAG18: KSQL - Stream Processing leicht gemacht!10
  • 10.
    Kafka Streams -Overview Designed as a simple and lightweight library in Apache Kafka no other dependencies than Kafka Supports fault-tolerant local state Supports Windowing (Fixed, Sliding and Session) and Stream-Stream / Stream- Table Joins Millisecond processing latency, no micro-batching At-least-once and exactly-once processing guarantees KTable<Integer, Customer> customers = builder.stream(”customer"); KStream<Integer, Order> orders = builder.stream(”order"); KStream<Integer, String> enriched = orders.leftJoin(customers, …); joined.to(”orderEnriched"); trucking_ driver Kafka Broker Java Application Kafka Streams 11/21/18 Trivadis DOAG18: KSQL - Stream Processing leicht gemacht!11
  • 11.
    KSQL - Overview STREAMand TABLE as first-class citizens • STREAM = data in motion • TABLE = collected state of a stream Stream Processing with zero coding using SQL-like language Built on top of Kafka Streams Interactive (CLI) and headless (command file) ksql> CREATE STREAM customer_s WITH (kafka_topic=‘customer', value_format=‘AVRO'); Message ---------------- Stream created ksql> SELECT * FROM customer_s WHERE address->country = ‘Switzerland’; ... trucking_ driver Kafka Broker KSQL Engine Kafka Streams KSQL CLI Commands 11/21/18 Trivadis DOAG18: KSQL - Stream Processing leicht gemacht!12
  • 12.
    KSQL in Action– it’s demo time J 11/21/18 Trivadis DOAG18: KSQL - Stream Processing leicht gemacht! 13
  • 13.
    Demo (I) –Data Ingestion via MQTT 11/21/18 Trivadis DOAG18: KSQL - Stream Processing leicht gemacht!14 truck/nn/ position mqtt to kafka truck_position kafkacat Position & Driving Info Testdata-Generator adapted from Hortonworks Tutorial 1522846456703,101,31,1927624662,Normal,37.31,- 94.31,-4802309397906690837
  • 14.
    KSQL - Terminology Stream •“History” • an unbounded sequence of structured data ("facts") • Facts in a stream are immutable • new facts can be inserted to a stream • existing facts can never be updated or deleted • Streams can be created from a Kafka topic or derived from an existing stream Table • “State” • a view of a stream, or another table, and represents a collection of evolving facts • Facts in a table are mutable • new facts can be inserted to the table • existing facts can be updated or deleted • Tables can be created from a Kafka topic or derived from existing streams and tables 11/21/18 Trivadis DOAG18: KSQL - Stream Processing leicht gemacht!15
  • 15.
    Demo (II) –Create a STREAM on topic truck_position 11/21/18 Trivadis DOAG18: KSQL - Stream Processing leicht gemacht!16 truck/nn/ position mqtt-to- kafka truck_ position Stream Position & Driving Info KSQL CLI 1522846456703,101,31,1927624662,Normal,37.31,- 94.31,-4802309397906690837
  • 16.
    CREATE STREAM Create anew stream, backed by a Kafka topic, with the specified columns and properties Supported column data types: • BOOLEAN, INTEGER, BIGINT, DOUBLE, VARCHAR or STRING • ARRAY<ArrayType> • MAP<VARCHAR, ValueType> • STRUCT<FieldName FieldType, ...> Supports the following serialization formats: CSV, JSON, AVRO KSQL adds the implicit columns ROWTIME and ROWKEY to every stream CREATE STREAM stream_name ( { column_name data_type } [, ...] ) WITH ( property_name = expression [, ...] ); 11/21/18 Trivadis DOAG18: KSQL - Stream Processing leicht gemacht!17
  • 17.
    Start KSQL CLI $docker-compose exec ksql-cli ksql-cli local --bootstrap-server broker-1:9092 ====================================== = _ __ _____ ____ _ = = | |/ // ____|/ __ | | = = | ' /| (___ | | | | | = = | < ___ | | | | | = = | . ____) | |__| | |____ = = |_|______/ __________| = = = = Streaming SQL Engine for Kafka = Copyright 2017 Confluent Inc. CLI v0.1, Server v0.1 located at http://localhost:9098 Having trouble? Type 'help' (case-insensitive) for a rundown of how things work! ksql> 11/21/18 Trivadis DOAG18: KSQL - Stream Processing leicht gemacht!18
  • 18.
    Create a STREAMon truck_position ksql> CREATE STREAM truck_position_s (ts VARCHAR, truckId VARCHAR, driverId BIGINT, routeId BIGINT, eventType VARCHAR, latitude DOUBLE, longitude DOUBLE, correlationId VARCHAR) WITH (kafka_topic='truck_position', value_format=‘JSON'); Message ---------------- Stream created 11/21/18 Trivadis DOAG18: KSQL - Stream Processing leicht gemacht!19
  • 19.
    Create a STREAMon truck_position ksql> describe truck_position_s; Field | Type --------------------------------- ROWTIME | BIGINT ROWKEY | VARCHAR(STRING) TS | VARCHAR(STRING) TRUCKID | VARCHAR(STRING) DRIVERID | BIGINT ROUTEID | BIGINT EVENTTYPE | VARCHAR(STRING) LATITUDE | DOUBLE LONGITUDE | DOUBLE CORRELATIONID | VARCHAR(STRING) 11/21/18 Trivadis DOAG18: KSQL - Stream Processing leicht gemacht!20
  • 20.
    SELECT Selects rows froma KSQL stream or table Result of this statement will not be persisted in a Kafka topic and will only be printed out in the console from_item is one of the following: stream_name, table_name SELECT select_expr [, ...] FROM from_item [ LEFT JOIN join_table ON join_criteria ] [ WINDOW window_expression ] [ WHERE condition ] [ GROUP BY grouping_expression ] [ HAVING having_expression ] [ LIMIT count ]; 11/21/18 Trivadis DOAG18: KSQL - Stream Processing leicht gemacht!21
  • 21.
    Use SELECT tobrowse from Stream ksql> SELECT * FROM truck_position_s; 1539711991642 | truck/24/position | null | 24 | 10 | 1198242881 | Normal | 36.84 | -94.83 | -6187001306629414077 1539711991691 | truck/26/position | null | 26 | 13 | 1390372503 | Normal | 42.04 | -88.02 | -6187001306629414077 1539711991882 | truck/66/position | null | 66 | 22 | 1565885487 | Normal | 38.33 | -94.35 | -6187001306629414077 ksql> SELECT * FROM truck_position_s WHERE eventType != 'Normal'; 1539712101614 | truck/67/position | null | 67 | 11 | 160405074 | Lane Departure | 38.98 | -92.53 | -6187001306629414077 1539712116450 | truck/18/position | null | 18 | 25 | 987179512 | Overspeed | 40.76 | -88.77 | -6187001306629414077 1539712120102 | truck/31/position | null | 31 | 12 | 927636994 | Unsafe following distance | 38.22 | -91.18 | -6187001306629414077 11/21/18 Trivadis DOAG18: KSQL - Stream Processing leicht gemacht!22
  • 22.
    Demo (III) –CREATE AS … SELECT … 11/21/18 Trivadis DOAG18: KSQL - Stream Processing leicht gemacht!23 detect_dangero us_driving truck/nn/ position mqtt-to- kafka truck- position Stream Stream dangerous- driving Position & Driving Info {"timestamp":1537343400827,"truckId":87, "driverId":13,"routeId":987179512,"eventType":"Normal", ,"latitude":38.65,"longitude":-90.21, "correlationId":"- 3208700263746910537"} 1522846456703,101,31,1927624662,Normal,37.31,- 94.31,-4802309397906690837
  • 23.
    CREATE STREAM …AS SELECT … Create a new KSQL table along with the corresponding Kafka topic and stream the result of the SELECT query as a changelog into the topic WINDOW clause can only be used if the from_item is a stream CREATE STREAM stream_name [WITH ( property_name = expression [, ...] )] AS SELECT select_expr [, ...] FROM from_stream [ LEFT | FULL | INNER ] JOIN [join_table | join_stream] [ WITHIN [(before TIMEUNIT, after TIMEUNIT) | N TIMEUNIT] ] ON join_criteria [ WHERE condition ] [PARTITION BY column_name]; 11/21/18 Trivadis DOAG18: KSQL - Stream Processing leicht gemacht!24
  • 24.
    INSERT INTO …SELECT … Stream the result of the SELECT query into an existing stream and its underlying topic schema and partitioning column produced by the query must match the stream’s schema and key If the schema and partitioning column are incompatible with the stream, then the statement will return an error CREATE STREAM stream_name ...; INSERT INTO stream_name SELECT select_expr [., ...] FROM from_stream [ WHERE condition ] [ PARTITION BY column_name ]; 11/21/18 Trivadis DOAG18: KSQL - Stream Processing leicht gemacht!25
  • 25.
    CREATE AS …SELECT … ksql> CREATE STREAM dangerous_driving_s WITH (kafka_topic= dangerous_driving_s', value_format='JSON') AS SELECT * FROM truck_position_s WHERE eventtype != 'Normal'; Message ---------------------------- Stream created and running 11/21/18 Trivadis DOAG18: KSQL - Stream Processing leicht gemacht!26
  • 26.
    CREATE AS …SELECT … ksql> select * from dangerous_driving_s; 1539712399201 | truck/67/position | null | 67 | 11 | 160405074 | Unsafe following distance | 38.65 | -90.21 | -6187001306629414077 1539712416623 | truck/67/position | null | 67 | 11 | 160405074 | Unsafe following distance | 39.1 | -94.59 | -6187001306629414077 1539712430051 | truck/18/position | null | 18 | 25 | 987179512 | Lane Departure | 35.1 | -90.07 | -6187001306629414077 11/21/18 Trivadis DOAG18: KSQL - Stream Processing leicht gemacht!27
  • 27.
    Demo (IV) –Aggregate and Window 11/21/18 Trivadis DOAG18: KSQL - Stream Processing leicht gemacht!28 detect_dangero us_driving truck/nn/ position mqtt-to- kafka truck- position Stream Stream dangerous- driving count_by_ eventType Table dangergous- driving-count Position & Driving Info 1522846456703,101,31,1927624662,Normal,37.31,- 94.31,-4802309397906690837
  • 28.
Windowing

Streams are unbounded, so some meaningful time frames are needed to do computations (i.e. aggregations). Computations over events are done using windows of data:

• Fixed Window
• Sliding Window
• Session Window
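As an illustration, window assignment can be sketched in plain Python. This is a conceptual sketch, not KSQL's implementation; the function names and the millisecond convention are my own, and window bounds are half-open [start, end).

```python
# Conceptual sketch of window assignment (not KSQL internals).

def tumbling_window(ts_ms, size_ms):
    """Fixed (tumbling) window: every event belongs to exactly one window."""
    start = (ts_ms // size_ms) * size_ms
    return (start, start + size_ms)

def hopping_windows(ts_ms, size_ms, advance_ms):
    """Sliding (hopping) windows: one event can fall into several
    overlapping windows, because they advance by less than their size."""
    windows = []
    start = (ts_ms // advance_ms) * advance_ms
    while start > ts_ms - size_ms and start >= 0:
        windows.append((start, start + size_ms))
        start -= advance_ms
    return sorted(windows)
```

With a 60-second window advancing every 30 seconds, an event at 65 seconds falls into two overlapping windows; with a tumbling window it falls into exactly one.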
CREATE TABLE

Creates a new table with the specified columns and properties. Supports the same data types as CREATE STREAM. KSQL adds the implicit columns ROWTIME and ROWKEY to every table as well.

KSQL currently has the following requirements for creating a table from a Kafka topic:
• the message key must also be present as a field/column in the Kafka message value
• the message key must be in VARCHAR (aka STRING) format

CREATE TABLE table_name ( { column_name data_type } [, ...] )
WITH ( property_name = expression [, ...] );
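The table semantics can be illustrated outside KSQL: a table models the latest value per message key of its underlying topic. A minimal Python sketch (names and sample keys are illustrative, not a KSQL API):

```python
# Conceptual sketch: replaying a changelog of (key, value) records and
# keeping only the newest value per key yields the table's current state.

def materialize(changelog):
    state = {}
    for key, value in changelog:   # later records overwrite earlier ones
        if value is None:          # a null value ("tombstone") deletes the key
            state.pop(key, None)
        else:
            state[key] = value
    return state
```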
Functions

Scalar Functions
• ABS, ROUND, CEIL, FLOOR
• ARRAYCONTAINS
• CONCAT, SUBSTRING, TRIM
• EXTRACTJSONFIELD
• GEO_DISTANCE
• LCASE, UCASE
• MASK, MASK_KEEP_LEFT, MASK_KEEP_RIGHT, MASK_LEFT, MASK_RIGHT
• RANDOM
• STRINGTOTIMESTAMP, TIMESTAMPTOSTRING

Aggregate Functions
• COUNT, MAX, MIN, SUM
• TOPK, TOPKDISTINCT

User-Defined Functions (UDF) and User-Defined Aggregate Functions (UDAF)
• currently only supported using Java
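As an aside, GEO_DISTANCE computes a great-circle distance between two coordinates. A plain-Python haversine sketch of the same idea (assuming the kilometre unit; this is an independent illustration, not KSQL's actual implementation):

```python
from math import radians, sin, cos, asin, sqrt

def geo_distance(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two lat/long points (haversine)."""
    r = 6371.0  # mean Earth radius in km
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * r * asin(sqrt(a))
```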
SELECT COUNT … GROUP BY

ksql> CREATE TABLE dangerous_driving_count AS
  SELECT eventType, count(*) nof
  FROM dangerous_driving_s
  WINDOW TUMBLING (SIZE 30 SECONDS)
  GROUP BY eventType;

 Message
----------------------------
 Table created and running

ksql> SELECT TIMESTAMPTOSTRING(ROWTIME, 'yyyy-MM-dd HH:mm:ss.SSS'), eventType, nof
  FROM dangerous_driving_count;
2018-10-16 05:12:19.408 | Unsafe following distance | 1
2018-10-16 05:12:39.615 | Unsafe tail distance | 1
2018-10-16 05:12:43.155 | Overspeed | 1
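The windowed aggregation above can be mimicked in plain Python to see what the resulting table holds per (window, eventType) pair. A conceptual sketch, not KSQL internals; names are my own:

```python
from collections import Counter

def count_by_event_type(events, size_ms=30_000):
    """events: iterable of (ts_ms, event_type). Returns
    {(window_start_ms, event_type): count} — a plain-Python analogue of
    SELECT eventType, count(*) ... WINDOW TUMBLING (SIZE 30 SECONDS)
    GROUP BY eventType."""
    counts = Counter()
    for ts_ms, event_type in events:
        window_start = (ts_ms // size_ms) * size_ms  # tumbling-window bucket
        counts[(window_start, event_type)] += 1
    return dict(counts)
```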
Joining

Challenges of joining streams:
1. Data streams need to be aligned as they come, because they have different timestamps.
2. Since streams are never-ending, the joins must be limited; otherwise the join will never end.
3. The join needs to produce results continuously, as there is no end to the data.

Join types:
• Stream-to-Static (Table) Join
• Stream-to-Stream Join (one-window join)
• Stream-to-Stream Join (two-window join)
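The stream-to-static (table) join can be sketched conceptually: for every stream event, look up the table's current row by key. Illustrative Python with LEFT JOIN semantics, not KSQL's implementation:

```python
def stream_table_left_join(stream, table):
    """stream: iterable of (key, event) records; table: dict holding the
    table's current state per key. LEFT JOIN semantics: stream events
    without a matching table row keep None on the right-hand side."""
    return [(key, event, table.get(key)) for key, event in stream]
```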
Demo (V) – Join Table to enrich with Driver data

[Diagram: the pipeline from Demo (IV), extended with a driver table: jdbc-to-kafka sources the Truck Driver table into the truck-driver topic; a stream-to-table join of dangerous-driving with the driver table produces the dangerous-driving & driver stream]
Get changes from driver table

#!/bin/bash
curl -X "POST" "http://192.168.69.138:8083/connectors" \
  -H "Content-Type: application/json" \
  -d $'{
  "name": "jdbc-driver-source",
  "config": {
    "connector.class": "JdbcSourceConnector",
    "connection.url": "jdbc:postgresql://db/sample?user=sample&password=sample",
    "mode": "timestamp",
    "timestamp.column.name": "last_update",
    "table.whitelist": "driver",
    "validate.non.null": "false",
    "topic.prefix": "truck_",
    "key.converter": "org.apache.kafka.connect.json.JsonConverter",
    "key.converter.schemas.enable": "false",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter.schemas.enable": "false",
    "name": "jdbc-driver-source",
    "transforms": "createKey,extractInt",
    "transforms.createKey.type": "org.apache.kafka.connect.transforms.ValueToKey",
    "transforms.createKey.fields": "id",
    "transforms.extractInt.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
    "transforms.extractInt.field": "id"
  }
}'
Create Table with Driver State

ksql> CREATE TABLE driver_t (id BIGINT, first_name VARCHAR, last_name VARCHAR, available VARCHAR)
  WITH (kafka_topic='truck_driver', value_format='JSON', key='id');

 Message
----------------
 Table created
Create Stream with truck_position_s joined to driver_t

ksql> CREATE STREAM dangerous_driving_and_driver_s
  WITH (kafka_topic='dangerous_driving_and_driver_s', value_format='JSON', partitions=8)
AS SELECT driverId, first_name, last_name, truckId, routeId, eventtype, latitude, longitude
FROM truck_position_s
LEFT JOIN driver_t ON truck_position_s.driverId = driver_t.id;

 Message
----------------------------
 Stream created and running

ksql> SELECT * FROM dangerous_driving_and_driver_s;
1539713095921 | 11 | 11 | Micky | Isaacson | 67 | 160405074 | Lane Departure | 39.01 | -93.85
1539713113254 | 11 | 11 | Micky | Isaacson | 67 | 160405074 | Unsafe following distance | 39.0 | -93.65
Demo (VI) – UDF for calculating Geohash

[Diagram: the pipeline from Demo (V), extended with a dangerous-driving-by-geo stream that writes to the dangerous-driving-geohash topic]
UDF for calculating Geohash

Geohash is a geocoding which encodes a geographic location into a short string of letters and digits. It is a hierarchical spatial data structure which subdivides space into buckets of grid shape.

Length | Area (width x height)
1      | 5,009.4km x 4,992.6km
2      | 1,252.3km x 624.1km
3      | 156.5km x 156km
4      | 39.1km x 19.5km
12     | 3.7cm x 1.9cm

ksql> SELECT latitude, longitude, geohash(latitude, longitude, 4)
  FROM dangerous_driving_s;
38.31 | -91.07 | 9yz1
37.7 | -92.61 | 9ywn
34.78 | -92.31 | 9ynm
42.23 | -91.78 | 9zw8
...

http://geohash.gofreerange.com/
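For reference, standard geohash encoding is simple enough to sketch from scratch: the latitude and longitude ranges are bisected alternately, and the resulting bits are base32-encoded. The demo's UDF delegates to a library instead; this is an independent illustration:

```python
# From-scratch sketch of standard geohash encoding (not the demo's library).
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash(latitude, longitude, length):
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    result = []
    bit, ch, even = 0, 0, True  # even-numbered bits encode longitude
    while len(result) < length:
        if even:
            mid = (lon_lo + lon_hi) / 2
            if longitude >= mid:
                ch = (ch << 1) | 1
                lon_lo = mid
            else:
                ch <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if latitude >= mid:
                ch = (ch << 1) | 1
                lat_lo = mid
            else:
                ch <<= 1
                lat_hi = mid
        even = not even
        bit += 1
        if bit == 5:               # every 5 bits become one base32 character
            result.append(_BASE32[ch])
            bit, ch = 0, 0
    return "".join(result)
```

Each extra character refines the bucket, which is why the area shrinks so quickly in the table above.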
UDF for calculating Geohash

Calculates the geohash for a position, e.g. to join dangerous driving events to messages relevant for drivers in that area.

@UdfDescription(name = "geohash",
    description = "returns the geohash for a given LatLong")
public class GeoHashUDF {

  @Udf(description = "encode lat/long to geohash of specified length.")
  public String geohash(final double latitude, final double longitude,
                        int length) {
    return GeoHash.encodeHash(latitude, longitude, length);
  }

  @Udf(description = "encode lat/long to geohash.")
  public String geohash(final double latitude, final double longitude) {
    return GeoHash.encodeHash(latitude, longitude);
  }
}
Summary
Summary

KSQL is another way to work with data in Kafka => you can (re)use some of your SQL knowledge. It has similar semantics to SQL, but is for queries on continuous, streaming data. It is well-suited for structured data (there is the "S" in KSQL).

KSQL is dependent on "Kafka core":
• KSQL consumes from the Kafka broker
• KSQL produces to the Kafka broker

KSQL runs as a Java application and can be deployed to various resource managers. Use Kafka Connect or any other stream data integration tool to bring your data into Kafka first.
Choosing the Right API

Consumer, Producer API
• Java, C#, C++, Scala, Python, Node.js, Go, PHP, …
• subscribe(), poll(), send(), flush()
• anything Kafka

Kafka Streams
• fluent Java API
• mapValues(), filter(), flush()
• stream analytics

KSQL
• SQL dialect
• SELECT … FROM …, JOIN ... WHERE, GROUP BY
• stream analytics

Kafka Connect
• declarative, configuration
• REST API
• out-of-the-box connectors
• stream integration

Flexibility <-> Simplicity

Source: adapted from Confluent
Trivadis @ DOAG 2018

#opencompany
Booth: 3rd floor – next to the escalator
We share our know-how! Just come across: live presentations and a documents archive, T-shirts, a contest and much more. We look forward to your visit!
Technology on its own won't help you. You need to know how to use it properly.