KSQL
Stream Processing leicht gemacht!
Guido Schmutz
@gschmutz doag2018
Agenda
1. Apache Kafka Overview
2. KSQL in Action – it’s demo time ☺
3. Summary
KSQL - Let’s try it with a “real-life” sample

[Architecture diagram] Truck position events (Position & Driving Info, e.g.
1522846456703,101,31,1927624662,Normal,37.31,-94.31,-4802309397906690837)
are published to the MQTT topic truck/nn/position and ingested by an mqtt-to-kafka connector into the truck_position topic. A detect_dangerous_driving Stream produces the dangerous-driving Stream, which count_by_eventType aggregates into the dangerous-driving-count Table. Driver master data (e.g. 27, Walter, Ward, Y, 24-JUL-85, 2017-10-02 15:19:00) is ingested by a jdbc-to-kafka connector into the truck_driver topic as JSON ({"id":27,"firstName":"Walter","lastName":"Ward","available":"Y","birthdate":"24-JUL-85","last_update":1506923052012}), kept as a Table, and joined to the dangerous-driving Stream to form the "dangerous-driving & driver" Stream. A "dangerous driving by geo" step derives the dangerous-driving-geohash Stream.

https://github.com/gschmutz/various-demos/tree/master/iot-truck-demo
Guido Schmutz
Working at Trivadis for more than 21 years
Oracle Groundbreaker Ambassador & Oracle ACE Director
Consultant, Trainer, Software Architect for Java, Oracle, SOA and
Big Data / Fast Data
Head of Trivadis Architecture Board
Technology Manager @ Trivadis
More than 30 years of software development experience
Contact: guido.schmutz@trivadis.com
Blog: http://guidoschmutz.wordpress.com
Slideshare: http://www.slideshare.net/gschmutz
Twitter: gschmutz
With over 650 specialists and IT experts in your region.
16 Trivadis branches and more than 650 employees
Experience from more than 1,900 projects per year at over 800 customers
250 Service Level Agreements
Over 4,000 training participants
Research and development budget: CHF 5.0 million
Financially self-supporting and sustainably profitable
Apache Kafka Overview
Apache Kafka – A Streaming Platform
High-Level Architecture
Distributed Log at the Core
Scale-Out Architecture
Logs do not (necessarily) forget
Apache Kafka – wait there is more!
[Diagram] Source Connector → Kafka Broker (trucking_driver topic) → Sink Connector, with Stream Processing consuming from and producing back to the broker.
Kafka Connect - Overview
[Diagram] Source Connector → Kafka → Sink Connector
Kafka Streams - Overview
Designed as a simple and lightweight library in Apache Kafka, with no other dependencies than Kafka
Supports fault-tolerant local state
Supports Windowing (Fixed, Sliding and Session) and Stream-Stream / Stream-Table Joins
Millisecond processing latency, no micro-batching
At-least-once and exactly-once processing guarantees

// customers are read as a changelog into a KTable
KTable<Integer, Customer> customers =
    builder.table("customer");
// orders are read as an event stream
KStream<Integer, Order> orders =
    builder.stream("order");
// enrich each order with its customer
KStream<Integer, String> enriched =
    orders.leftJoin(customers, …);
enriched.to("orderEnriched");

[Diagram] Java Application (Kafka Streams) ⇄ Kafka Broker (trucking_driver topic)
KSQL - Overview
STREAM and TABLE as first-class citizens
• STREAM = data in motion
• TABLE = collected state of a stream
Stream Processing with zero coding, using a SQL-like language
Built on top of Kafka Streams
Interactive (CLI) and headless (command file) mode

ksql> CREATE STREAM customer_s 
        WITH (kafka_topic='customer', 
              value_format='AVRO');
Message
----------------
Stream created

ksql> SELECT * FROM customer_s 
        WHERE address->country = 'Switzerland';
...

[Diagram] KSQL CLI → Commands → KSQL Engine (Kafka Streams) ⇄ Kafka Broker (trucking_driver topic)
KSQL in Action – it’s demo time ☺
Demo (I) – Data Ingestion via MQTT
[Diagram] Position & Driving Info: truck/nn/position (MQTT) → "mqtt to kafka" → truck_position topic, inspected with kafkacat.
Testdata-Generator adapted from the Hortonworks Tutorial
1522846456703,101,31,1927624662,Normal,37.31,-
94.31,-4802309397906690837
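For quick inspection of the ingested topic, a kafkacat call along these lines can be used (a sketch; the broker address is an assumption, matching the CLI example later in the deck):

$ kafkacat -b broker-1:9092 -t truck_position -o end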
KSQL - Terminology
Stream
• “History”
• an unbounded sequence of structured data ("facts")
• Facts in a stream are immutable
  • new facts can be inserted to a stream
  • existing facts can never be updated or deleted
• Streams can be created from a Kafka topic or derived from an existing stream

Table
• “State”
• a view of a stream, or another table, representing a collection of evolving facts
• Facts in a table are mutable
  • new facts can be inserted to the table
  • existing facts can be updated or deleted
• Tables can be created from a Kafka topic or derived from existing streams and tables
Demo (II) – Create a STREAM on topic truck_position
[Diagram] Position & Driving Info: truck/nn/position (MQTT) → mqtt-to-kafka → truck_position topic → Stream, browsed from the KSQL CLI.
1522846456703,101,31,1927624662,Normal,37.31,-
94.31,-4802309397906690837
CREATE STREAM
Create a new stream, backed by a Kafka topic, with the specified columns and properties
Supported column data types:
• BOOLEAN, INTEGER, BIGINT, DOUBLE, VARCHAR or STRING
• ARRAY<ArrayType>
• MAP<VARCHAR, ValueType>
• STRUCT<FieldName FieldType, ...>
Supports the following serialization formats: CSV, JSON, AVRO
KSQL adds the implicit columns ROWTIME and ROWKEY to every stream
CREATE STREAM stream_name ( { column_name data_type } [, ...] )
WITH ( property_name = expression [, ...] );
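As an illustration of the complex column types, a hypothetical stream over a customer topic (an assumption for illustration, not part of the demo) could be declared like this:

ksql> CREATE STREAM customer_s 
        (id BIGINT, 
         name VARCHAR, 
         emails ARRAY<VARCHAR>, 
         address STRUCT<street VARCHAR, city VARCHAR, country VARCHAR>) 
        WITH (kafka_topic='customer', 
              value_format='JSON');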
Start KSQL CLI
$ docker-compose exec ksql-cli ksql-cli local --bootstrap-server broker-1:9092
======================================
=        [KSQL ASCII-art logo]       =
=                                    =
=   Streaming SQL Engine for Kafka   =
Copyright 2017 Confluent Inc.
CLI v0.1, Server v0.1 located at http://localhost:9098
Having trouble? Type 'help' (case-insensitive) for a rundown of how things work!
ksql>
Create a STREAM on truck_position
ksql> CREATE STREAM truck_position_s 
(ts VARCHAR, 
truckId VARCHAR, 
driverId BIGINT, 
routeId BIGINT, 
eventType VARCHAR, 
latitude DOUBLE, 
longitude DOUBLE, 
correlationId VARCHAR) 
WITH (kafka_topic='truck_position', 
value_format='JSON');
Message
----------------
Stream created
Create a STREAM on truck_position
ksql> describe truck_position_s;
Field | Type
---------------------------------
ROWTIME | BIGINT
ROWKEY | VARCHAR(STRING)
TS | VARCHAR(STRING)
TRUCKID | VARCHAR(STRING)
DRIVERID | BIGINT
ROUTEID | BIGINT
EVENTTYPE | VARCHAR(STRING)
LATITUDE | DOUBLE
LONGITUDE | DOUBLE
CORRELATIONID | VARCHAR(STRING)
SELECT
Selects rows from a KSQL stream or table
Result of this statement will not be persisted in a Kafka topic and will only be printed out in the console
from_item is one of the following: stream_name, table_name
SELECT select_expr [, ...]
FROM from_item
[ LEFT JOIN join_table ON join_criteria ]
[ WINDOW window_expression ]
[ WHERE condition ]
[ GROUP BY grouping_expression ]
[ HAVING having_expression ]
[ LIMIT count ];
Use SELECT to browse from Stream
ksql> SELECT * FROM truck_position_s;
1539711991642 | truck/24/position | null | 24 | 10 | 1198242881 |
Normal | 36.84 | -94.83 | -6187001306629414077
1539711991691 | truck/26/position | null | 26 | 13 | 1390372503 |
Normal | 42.04 | -88.02 | -6187001306629414077
1539711991882 | truck/66/position | null | 66 | 22 | 1565885487 |
Normal | 38.33 | -94.35 | -6187001306629414077
ksql> SELECT * FROM truck_position_s WHERE eventType != 'Normal';
1539712101614 | truck/67/position | null | 67 | 11 | 160405074 |
Lane Departure | 38.98 | -92.53 | -6187001306629414077
1539712116450 | truck/18/position | null | 18 | 25 | 987179512 |
Overspeed | 40.76 | -88.77 | -6187001306629414077
1539712120102 | truck/31/position | null | 31 | 12 | 927636994 |
Unsafe following distance | 38.22 | -91.18 | -6187001306629414077
Demo (III) – CREATE AS … SELECT …
[Diagram] Position & Driving Info: truck/nn/position (MQTT) → mqtt-to-kafka → truck_position topic → Stream → detect_dangerous_driving → dangerous-driving Stream, e.g.
{"timestamp":1537343400827,"truckId":87,"driverId":13,"routeId":987179512,"eventType":"Normal","latitude":38.65,"longitude":-90.21,"correlationId":"-3208700263746910537"}
1522846456703,101,31,1927624662,Normal,37.31,-
94.31,-4802309397906690837
CREATE STREAM … AS SELECT …
Create a new KSQL stream along with the corresponding Kafka topic, and stream the result of the SELECT query into the topic
WINDOW clause can only be used if the from_item is a stream
CREATE STREAM stream_name
[WITH ( property_name = expression [, ...] )]
AS SELECT select_expr [, ...]
FROM from_stream [ LEFT | FULL | INNER ]
JOIN [join_table | join_stream]
[ WITHIN [(before TIMEUNIT, after TIMEUNIT) | N TIMEUNIT] ]
ON join_criteria
[ WHERE condition ]
[PARTITION BY column_name];
INSERT INTO … SELECT …
Stream the result of the SELECT query into an existing stream and its underlying topic
The schema and partitioning column produced by the query must match the stream’s schema and key
If the schema and partitioning column are incompatible with the stream, the statement will return an error
CREATE STREAM stream_name ...;
INSERT INTO stream_name
SELECT select_expr [., ...]
FROM from_stream
[ WHERE condition ]
[ PARTITION BY column_name ];
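A minimal sketch of how this could look in the demo, assuming a second, hypothetical source stream truck_position_file_s with a schema matching dangerous_driving_s:

ksql> INSERT INTO dangerous_driving_s 
        SELECT * 
        FROM truck_position_file_s 
        WHERE eventType != 'Normal';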
CREATE AS … SELECT …
ksql> CREATE STREAM dangerous_driving_s 
WITH (kafka_topic='dangerous_driving_s', 
value_format='JSON') 
AS SELECT * FROM truck_position_s 
WHERE eventtype != 'Normal';
Message
----------------------------
Stream created and running
CREATE AS … SELECT …
ksql> select * from dangerous_driving_s;
1539712399201 | truck/67/position | null | 67 | 11 | 160405074 |
Unsafe following distance | 38.65 | -90.21 | -6187001306629414077
1539712416623 | truck/67/position | null | 67 | 11 | 160405074 |
Unsafe following distance | 39.1 | -94.59 | -6187001306629414077
1539712430051 | truck/18/position | null | 18 | 25 | 987179512 |
Lane Departure | 35.1 | -90.07 | -6187001306629414077
Demo (IV) – Aggregate and Window
[Diagram] Position & Driving Info: truck/nn/position (MQTT) → mqtt-to-kafka → truck_position topic → detect_dangerous_driving → dangerous-driving Stream → count_by_eventType → dangerous-driving-count Table.
1522846456703,101,31,1927624662,Normal,37.31,-
94.31,-4802309397906690837
Windowing
Streams are unbounded
Some meaningful time frames to do computations (i.e. aggregations) are needed
Computations over events are done using windows of data

[Diagram] Fixed Window, Sliding Window and Session Window over a stream of data along the time axis
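Sketches of the three window types in KSQL syntax, applied to the demo’s dangerous_driving_s stream (window sizes are illustrative):

ksql> SELECT eventType, count(*) FROM dangerous_driving_s 
        WINDOW TUMBLING (SIZE 30 SECONDS) GROUP BY eventType;

ksql> SELECT eventType, count(*) FROM dangerous_driving_s 
        WINDOW HOPPING (SIZE 30 SECONDS, ADVANCE BY 10 SECONDS) GROUP BY eventType;

ksql> SELECT eventType, count(*) FROM dangerous_driving_s 
        WINDOW SESSION (60 SECONDS) GROUP BY eventType;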
CREATE TABLE
Create a new table with the specified columns and properties
Supports same data types as CREATE STREAM
KSQL adds the implicit columns ROWTIME and ROWKEY to every table as well
KSQL currently has the following requirements for creating a table from a Kafka topic
• message key must also be present as a field/column in the Kafka message value
• message key must be in VARCHAR aka STRING format
CREATE TABLE table_name ( { column_name data_type } [, ...] )
WITH ( property_name = expression [, ...] );
Functions
Scalar Functions
• ABS, ROUND, CEIL, FLOOR
• ARRAYCONTAINS
• CONCAT, SUBSTRING, TRIM
• EXTRACTJSONFIELD
• GEO_DISTANCE
• LCASE, UCASE
• MASK, MASK_KEEP_LEFT, MASK_KEEP_RIGHT, MASK_LEFT, MASK_RIGHT
• RANDOM
• STRINGTOTIMESTAMP, TIMESTAMPTOSTRING
Aggregate Functions
• COUNT
• MAX
• MIN
• SUM
• TOPK
• TOPKDISTINCT
User-Defined Functions (UDF) and User-Defined Aggregate Functions (UDAF)
• Currently only supported using Java
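A small sketch combining some of these scalar functions on the demo stream (the column list is illustrative):

ksql> SELECT TIMESTAMPTOSTRING(ROWTIME, 'yyyy-MM-dd HH:mm:ss'), 
        UCASE(eventType), 
        MASK(correlationId) 
      FROM dangerous_driving_s;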
SELECT COUNT … GROUP BY
ksql> CREATE TABLE dangerous_driving_count AS 
SELECT eventType, count(*) nof 
FROM dangerous_driving_s 
WINDOW TUMBLING (SIZE 30 SECONDS) 
GROUP BY eventType;
Message
----------------------------
Table created and running
ksql> SELECT TIMESTAMPTOSTRING(ROWTIME,'yyyy-MM-dd HH:mm:ss.SSS'),
        eventType, nof
      FROM dangerous_driving_count;
2018-10-16 05:12:19.408 | Unsafe following distance | 1
2018-10-16 05:12:39.615 | Unsafe tail distance | 1
2018-10-16 05:12:43.155 | Overspeed | 1
Joining
Challenges of joining streams
1. Data streams need to be aligned as they come, because they have different timestamps
2. Since streams are never-ending, the joins must be limited; otherwise the join will never end
3. The join needs to produce results continuously, as there is no end to the data

Join types:
• Stream-to-Static (Table) Join
• Stream-to-Stream Join (one window join)
• Stream-to-Stream Join (two window join)
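A hedged sketch of a windowed stream-to-stream join in KSQL syntax; the demo itself only shows a stream-to-table join, and the second stream driver_ack_s as well as the correlation column are hypothetical:

ksql> SELECT p.driverId, p.eventType, a.ackTime 
      FROM dangerous_driving_s p 
      INNER JOIN driver_ack_s a 
        WITHIN 5 MINUTES 
      ON p.correlationId = a.correlationId;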
Demo (V) – Join Table to enrich with Driver data
[Diagram] As before: truck/nn/position (MQTT) → mqtt-to-kafka → truck_position → detect_dangerous_driving → dangerous-driving Stream → count_by_eventType → dangerous-driving-count Table. New: the Truck Driver table (e.g. 27, Walter, Ward, Y, 24-JUL-85, 2017-10-02 15:19:00) is ingested via jdbc-to-kafka into the truck_driver topic as JSON ({"id":27,"firstName":"Walter","lastName":"Ward","available":"Y","birthdate":"24-JUL-85","last_update":1506923052012}), kept as a Table, and joined to the dangerous-driving Stream, producing the "dangerous-driving & driver" Stream.
1522846456703,101,31,1927624662,Normal,37.31,-
94.31,-4802309397906690837
Get changes from driver table
#!/bin/bash
curl -X "POST" "http://192.168.69.138:8083/connectors" \
     -H "Content-Type: application/json" \
     -d $'{
  "name": "jdbc-driver-source",
  "config": {
    "connector.class": "JdbcSourceConnector",
    "connection.url": "jdbc:postgresql://db/sample?user=sample&password=sample",
    "mode": "timestamp",
    "timestamp.column.name": "last_update",
    "table.whitelist": "driver",
    "validate.non.null": "false",
    "topic.prefix": "truck_",
    "key.converter": "org.apache.kafka.connect.json.JsonConverter",
    "key.converter.schemas.enable": "false",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter.schemas.enable": "false",
    "name": "jdbc-driver-source",
    "transforms": "createKey,extractInt",
    "transforms.createKey.type": "org.apache.kafka.connect.transforms.ValueToKey",
    "transforms.createKey.fields": "id",
    "transforms.extractInt.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
    "transforms.extractInt.field": "id"
  }
}'
Create Table with Driver State
ksql> CREATE TABLE driver_t 
(id BIGINT, 
first_name VARCHAR, 
last_name VARCHAR, 
available VARCHAR) 
WITH (kafka_topic='truck_driver', 
value_format='JSON', 
key='id');
Message
----------------
Table created
Create a Stream with truck_position joined to driver_t
ksql> CREATE STREAM dangerous_driving_and_driver_s 
WITH (kafka_topic='dangerous_driving_and_driver_s', 
value_format='JSON', partitions=8) 
AS SELECT driverId, first_name, last_name, truckId, routeId,
eventtype, latitude, longitude 
FROM truck_position_s 
LEFT JOIN driver_t 
ON truck_position_s.driverId = driver_t.id;
Message
----------------------------
Stream created and running
ksql> select * from dangerous_driving_and_driver_s;
1539713095921 | 11 | 11 | Micky | Isaacson | 67 | 160405074 | Lane
Departure | 39.01 | -93.85
1539713113254 | 11 | 11 | Micky | Isaacson | 67 | 160405074 | Unsafe
following distance | 39.0 | -93.65
Demo (VI) – UDF for calculating Geohash

[Diagram] Same pipeline as in Demo (V); in addition, a "dangerous driving by geo" step applies the geohash UDF to the dangerous-driving Stream and produces the dangerous-driving-geohash Stream.
1522846456703,101,31,1927624662,Normal,37.31,-
94.31,-4802309397906690837
UDF for calculating Geohash
Geohash is a geocoding scheme which encodes a geographic location into a short string of letters and digits
It is a hierarchical spatial data structure which subdivides space into buckets of grid shape

Length | Area (width x height)
-------|----------------------
1      | 5,009.4km x 4,992.6km
2      | 1,252.3km x 624.1km
3      | 156.5km x 156km
4      | 39.1km x 19.5km
12     | 3.7cm x 1.9cm
ksql> SELECT latitude, longitude, 
geohash(latitude, longitude, 4) 
FROM dangerous_driving_s;
38.31 | -91.07 | 9yz1
37.7 | -92.61 | 9ywn
34.78 | -92.31 | 9ynm
42.23 | -91.78 | 9zw8xw
...
http://geohash.gofreerange.com/
UDF for calculating Geohash
Geohash and join to some important messages for drivers
// UDF annotations come from the KSQL API (io.confluent.ksql.function.udf);
// GeoHash.encodeHash comes from the com.github.davidmoten:geo library (assumed dependency)
import com.github.davidmoten.geo.GeoHash;
import io.confluent.ksql.function.udf.Udf;
import io.confluent.ksql.function.udf.UdfDescription;

@UdfDescription(name = "geohash",
    description = "returns the geohash for a given LatLong")
public class GeoHashUDF {

  @Udf(description = "encode lat/long to geohash of specified length.")
  public String geohash(final double latitude, final double longitude,
                        final int length) {
    return GeoHash.encodeHash(latitude, longitude, length);
  }

  @Udf(description = "encode lat/long to geohash.")
  public String geohash(final double latitude, final double longitude) {
    return GeoHash.encodeHash(latitude, longitude);
  }
}
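To persist the geohash enrichment as the dangerous-driving-geohash stream shown in the architecture diagram, a statement along these lines could be used (a sketch; topic name and column list are assumptions):

ksql> CREATE STREAM dangerous_driving_geohash_s 
        WITH (kafka_topic='dangerous_driving_geohash', 
              value_format='JSON') 
        AS SELECT driverId, truckId, eventType, latitude, longitude, 
                  geohash(latitude, longitude, 4) geohash 
           FROM dangerous_driving_s;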
Summary
Summary
KSQL is another way to work with data in Kafka => you can (re)use some of your SQL knowledge
Similar semantics to SQL, but for queries on continuous, streaming data
Well-suited for structured data (there is the “S” in KSQL)
KSQL is dependent on “Kafka core”
• KSQL consumes from the Kafka broker
• KSQL produces to the Kafka broker
KSQL runs as a Java application and can be deployed to various resource managers
Use Kafka Connect or any other Stream Data Integration tool to bring your data into Kafka first
Choosing the Right API
Consumer / Producer API
• Java, C#, C++, Scala, Python, Node.js, Go, PHP …
• subscribe(), poll(), send(), flush()
• Anything Kafka

Kafka Streams
• Fluent Java API
• mapValues(), filter(), to()
• Stream Analytics

KSQL
• SQL dialect
• SELECT … FROM …, JOIN ... WHERE, GROUP BY
• Stream Analytics

Kafka Connect
• Declarative: Configuration and REST API
• Out-of-the-box connectors
• Stream Integration

Flexibility ⟷ Simplicity
Source: adapted from Confluent
Trivadis @ DOAG 2018
#opencompany
Booth: 3rd Floor – next to the escalator
We share our know-how!
Just come across: live presentations and a documents archive
T-shirts, a contest and much more
We look forward to your visit
Technology on its own won't help you.
You need to know how to use it properly.