@gschmutz guidoschmutz.wordpress.com
ksqlDB
Stream Processing simplified!
Guido Schmutz
Guido
Working at Trivadis for more than 23 years
Consultant, Trainer, Platform Architect for Java,
Oracle, SOA and Big Data / Fast Data
Oracle Groundbreaker Ambassador & Oracle ACE
Director
@gschmutz guidoschmutz.wordpress.com
210th
edition
Apache Kafka – scalable message processing and more!
(Diagram: Source Connector → trucking_driver topic in the Kafka Cluster → Sink Connector, with Stream Processing, Schema Registry and ksqlDB alongside the cluster)
Kafka Streams - Overview
• Designed as a simple and lightweight library in
Apache Kafka
• Part of Apache Kafka project
(https://kafka.apache.org/documentation/streams/)
• no dependencies other than Kafka
• Supports fault-tolerant local state
• Supports Windowing (Fixed, Sliding and Session)
and Stream-Stream / Stream-Table Joins
• Millisecond processing latency, no micro-batching
• At-least-once and exactly-once processing
guarantees
KTable<Integer, Customer> customers =
builder.table("customer");
KStream<Integer, Order> orders =
builder.stream("order");
KStream<Integer, String> enriched =
orders.leftJoin(customers, …);
enriched.to("orderEnriched");
(Diagram: Java application embedding the Kafka Streams library, reading from and writing to the trucking_driver topic in the Kafka Cluster)
ksqlDB: Streaming SQL Engine for Apache Kafka
• Separate open source project, not part of Apache
Kafka (http://ksqldb.io)
• simplest way to process streams of data in real-time
• Enables stream processing with zero coding
• Use a familiar language (SQL dialect)
• Powered by Kafka and Kafka Streams
• scalable, distributed, mature
• Create materialized views over streams
• Receive real-time push updates or pull current state
on demand
• Kafka native - All you need is Kafka
(Diagram: KSQL CLI submitting commands to the KSQL engine, which runs Kafka Streams against the trucking_driver topic in the Kafka Cluster)
ksqlDB Concepts
Source: ksqlDB Documentation
Terminology
Stream
• an unbounded sequence of structured data
(“facts”)
• Facts in a stream are immutable: new facts can
be inserted to a stream, but existing facts can
never be updated or deleted
• Streams can be created from a Kafka topic or
derived from an existing stream
• A stream’s underlying data is durably stored
(persisted) within a Kafka topic on the Kafka
brokers
Table
• a materialized view of events with only the
latest value for a key
• a view of a stream, or another table, and
represents a collection of evolving facts
• the equivalent of a traditional database table
but enriched by streaming semantics such as
windowing
• Facts in a table are mutable: new facts can be
inserted to the table, and existing facts can be
updated or deleted
• Tables can be created from a Kafka topic or
derived from existing streams and tables
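To make the distinction concrete, here is a minimal sketch (the stream, table, and topic names are assumptions, not part of the demos): the same events can be read as an ever-growing stream of immutable facts, or condensed into a table holding the latest value per key.

-- every order event, as an immutable fact
CREATE STREAM order_events_s (orderId VARCHAR KEY, status VARCHAR)
WITH (kafka_topic='order_events', value_format='JSON');

-- one mutable row per orderId, holding the latest status seen so far
CREATE TABLE order_status_t AS
SELECT orderId, LATEST_BY_OFFSET(status) AS status
FROM order_events_s
GROUP BY orderId
EMIT CHANGES;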
Demo 1
19.11 – 13:00 – Kafka Livedemo: Umsetzung einer Streaminglösung #slideless
Demo 1 - ksqlDB CLI
$ docker exec -it ksqldb-cli ksql http://ksqldb-server-1:8088
===========================================
=       _              _ ____  ____       =
=      | | _____  __ _| |  _ \| __ )      =
=      | |/ / __|/ _` | | | | |  _ \      =
=      |   <\__ \ (_| | | |_| | |_) |     =
=      |_|\_\___/\__, |_|____/|____/      =
=                   |_|                   =
=  Event Streaming Database purpose-built =
=        for stream processing apps       =
===========================================
Copyright 2017-2020 Confluent Inc.
CLI v0.13.0, Server v0.13.0 located at http://ksqldb-server-1:8088
Server Status: RUNNING
Having trouble? Type 'help' (case-insensitive) for a rundown of how things work!
ksql>
Demo 1 – Create a STREAM on vehicle_tracking_sysA
ksql> CREATE STREAM IF NOT EXISTS vehicle_tracking_sysA_s(
mqttTopic VARCHAR KEY,
timestamp VARCHAR,
truckId VARCHAR,
driverId BIGINT,
routeId BIGINT,
eventType VARCHAR,
latitude DOUBLE,
longitude DOUBLE,
correlationId VARCHAR)
WITH (kafka_topic='vehicle_tracking_sysA',
value_format='JSON');
Message
----------------
Stream created
Demo 1 – PUSH Query on vehicle_tracking_sysA_s
ksql> SELECT *
FROM vehicle_tracking_sysA_s
EMIT CHANGES;
+-------------+-------------+-------------+-----------+-------------+-------------+-------------+-------------+-------------+
|MQTTTOPIC |TIMESTAMP |TRUCKID |DRIVERID |ROUTEID |EVENTTYPE |LATITUDE |LONGITUDE |CORRELATIONID|
+-------------+-------------+-------------+-----------+-------------+-------------+-------------+-------------+-------------+
|truck/17/posi|1605557915574|17 |17 |803014426 |Normal |37.03 |-94.58 |3556373681251|
|tion | | | | | | | |424186 |
|truck/39/posi|1605557916344|39 |32 |371182829 |Normal |35.21 |-90.37 |3556373681251|
|tion | | | | | | | |424186 |
|truck/38/posi|1605557916774|38 |10 |1927624662 |Normal |37.7 |-92.61 |3556373681251|
|tion | | | | | | | |424186 |
|truck/37/posi|1605557917924|37 |30 |160779139 |Normal |36.17 |-95.99 |3556373681251|
|tion | | | | | | | |424186 |
.
.
.
Demo 1 – Create a new STREAM with refinement to AVRO
ksql> CREATE STREAM IF NOT EXISTS vehicle_tracking_refined_s
WITH (kafka_topic='vehicle_tracking_refined',
value_format='AVRO',
value_avro_schema_full_name=
'com.trivadis.avro.VehicleTrackingRefined')
AS SELECT truckId AS ROWKEY
, 'Tracking_SysA' AS source
, timestamp
, AS_VALUE(truckId) AS vehicleId
, driverId
, routeId
, eventType
, latitude
, longitude
, correlationId
FROM vehicle_tracking_sysA_s
PARTITION BY truckId
EMIT CHANGES;
Demo 1 – Create a STREAM on vehicle_tracking_sysB
ksql> CREATE STREAM IF NOT EXISTS vehicle_tracking_sysB_s (
ROWKEY VARCHAR KEY,
system VARCHAR,
timestamp VARCHAR,
vehicleId VARCHAR,
driverId BIGINT,
routeId BIGINT,
eventType VARCHAR,
latLong VARCHAR,
correlationId VARCHAR)
WITH (kafka_topic='vehicle_tracking_sysB',
value_format='DELIMITED');
Demo 1 – INSERT INTO existing stream SELECT from other stream
ksql> INSERT INTO vehicle_tracking_refined_s
SELECT ROWKEY
, 'Tracking_SysB' AS source
, timestamp
, vehicleId
, driverId
, routeId
, eventType
, cast(split(latLong,':')[1] as DOUBLE) as latitude
, CAST(split(latLong,':')[2] AS DOUBLE) as longitude
, correlationId
FROM vehicle_tracking_sysB_s
EMIT CHANGES;
Demo 1 – SELECT from stream vehicle_tracking_refined_s
ksql> SELECT * FROM vehicle_tracking_refined_s EMIT CHANGES;
+------------+------------+------------+----------+----------+------------+------------+----------+----------+------------+
|ROWKEY |SOURCE |TIMESTAMP |VEHICLEID |DRIVERID |ROUTEID |EVENTTYPE |LATITUDE |LONGITUDE |CORRELATIONI|
| | | | | | | | | |D |
+------------+------------+------------+----------+----------+------------+------------+----------+----------+------------+
|38 |Tracking_Sys|160555842863|38 |10 |1927624662 |Normal |37.51 |-92.89 |355637368125|
| |A |4 | | | | | | |1424186 |
|62 |Tracking_Sys|160555842856|62 |26 |1594289134 |Normal |37.09 |-94.23 |505825961956|
| |B |8 | | | | | | |6029956 |
|51 |Tracking_Sys|160555842847|51 |13 |1198242881 |Normal |42.04 |-88.02 |505825961956|
| |B |9 | | | | | | |6029956 |
|62 |Tracking_Sys|160555842856|62 |26 |1594289134 |Normal |37.09 |-94.23 |505825961956|
| |B |8 | | | | | | |6029956 |
|51 |Tracking_Sys|160555842847|51 |13 |1198242881 |Normal |42.04 |-88.02 |505825961956|
| |B |9 | | | | | | |6029956 |
.
.
.
CREATE STREAM
Create a new stream, backed by a Kafka topic, with the specified columns and properties
Supported column data types:
• BOOLEAN, INTEGER, BIGINT, DOUBLE, VARCHAR or STRING
• ARRAY<ArrayType>
• MAP<VARCHAR, ValueType>
• STRUCT<FieldName FieldType, ...>
Supports the following serialization formats: CSV, JSON, AVRO
• KSQL adds the implicit columns ROWTIME and ROWKEY to every stream
CREATE STREAM stream_name ( { column_name data_type } [, ...] )
WITH ( property_name = expression [, ...] );
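A short sketch combining the compound data types listed above (the topic and column names are assumptions):

CREATE STREAM sensor_readings_s (
sensorId VARCHAR KEY,
tags ARRAY<VARCHAR>,                                -- e.g. ['outdoor','rooftop']
attributes MAP<VARCHAR, VARCHAR>,                   -- free-form key/value pairs
position STRUCT<latitude DOUBLE, longitude DOUBLE>)
WITH (kafka_topic='sensor_readings', value_format='JSON');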
SELECT (Push Query)
Push a continuous stream of updates to the ksqlDB stream or table
Result of this statement will not be persisted in a Kafka topic and will only be printed out in the console
This is a continuous query; to stop it in the CLI, press CTRL-C
• from_item is one of the following: stream_name, table_name
SELECT select_expr [, ...]
FROM from_item
[ LEFT JOIN join_table ON join_criteria ]
[ WINDOW window_expression ]
[ WHERE condition ]
[ GROUP BY grouping_expression ]
[ HAVING having_expression ]
EMIT output_refinement
[ LIMIT count ];
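For example, a push query against the Demo 1 stream that filters rows and terminates itself after five results (LIMIT ends the otherwise continuous query):

SELECT truckId, eventType, latitude, longitude
FROM vehicle_tracking_sysA_s
WHERE eventType != 'Normal'
EMIT CHANGES
LIMIT 5;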
Functions
Scalar Functions
• ABS, ROUND, CEIL, FLOOR
• ARRAYCONTAINS
• CONCAT, SUBSTRING, TRIM
• EXTRACTJSONFIELD
• GEO_DISTANCE
• LCASE, UCASE
• MASK, MASK_KEEP_LEFT, MASK_KEEP_RIGHT,
MASK_LEFT, MASK_RIGHT
• RANDOM
• STRINGTOTIMESTAMP, TIMESTAMPTOSTRING
Aggregate Functions
• COUNT
• MAX
• MIN
• SUM
• TOPK
• TOPKDISTINCT
User-Defined Functions (UDF) and User-Defined
Aggregate Functions (UDAF)
• Currently only supported using Java
https://docs.ksqldb.io/en/latest/developer-guide/ksqldb-reference/functions/
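As a sketch of how scalar and aggregate functions combine, run against the Demo 1 stream (the timestamp column there is a VARCHAR holding epoch milliseconds, hence the CAST):

SELECT eventType
     , COUNT(*) AS cnt
     , TIMESTAMPTOSTRING(MAX(CAST(timestamp AS BIGINT)),
                         'yyyy-MM-dd HH:mm:ss') AS last_seen
FROM vehicle_tracking_sysA_s
GROUP BY eventType
EMIT CHANGES;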
CREATE STREAM … AS SELECT …
Create a new KSQL stream along with the corresponding Kafka topic and stream the result of the SELECT
query as a changelog into the topic
WINDOW clause can only be used if the from_item is a stream
CREATE STREAM stream_name
[WITH ( property_name = expression [, ...] )]
AS SELECT select_expr [, ...]
FROM from_stream [ LEFT | FULL | INNER ]
JOIN [join_table | join_stream]
[ WITHIN [(before TIMEUNIT, after TIMEUNIT) | N TIMEUNIT] ] ON join_criteria
[ WHERE condition ]
[PARTITION BY column_name];
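The WITHIN clause is what makes a stream-stream join possible; a hedged sketch (both stream names are assumptions) joining events that occur at most one hour apart:

CREATE STREAM shipped_orders_s AS
SELECT o.orderId, o.customerId, s.shipmentId
FROM orders_s o
INNER JOIN shipments_s s
  WITHIN 1 HOURS
  ON o.orderId = s.orderId;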
INSERT INTO … SELECT …
Stream the result of the SELECT query into an existing stream and its underlying topic
schema and partitioning column produced by the query must match the stream’s schema and key
If the schema and partitioning column are incompatible with the stream, then the statement will
return an error
stream_name and from_item must both
refer to a Stream. Tables are not supported!
CREATE STREAM stream_name ...;
INSERT INTO stream_name
SELECT select_expr [, ...]
FROM from_stream
[ WHERE condition ]
[ PARTITION BY column_name ];
Two Types of Queries
Pull queries
• allow you to fetch the current state of a materialized
view
• Because materialized views are updated incrementally
as new events arrive, pull queries run with predictably
low latency
Push queries
• enable you to subscribe to materialized view updates
and stream changes
• When new events arrive, push queries emit
refinements, so your event streaming applications can
react to new information in real-time
SELECT …
FROM vehicle_position_s
EMIT CHANGES;
SELECT …
FROM vehicle_position_t
WHERE vehicleId = 10;
Demo 2
Demo 2 – Pull Query on vehicle_tracking
ksql> CREATE TABLE IF NOT EXISTS vehicle_tracking_refined_t
WITH (kafka_topic = 'vehicle_tracking_refined_t')
AS SELECT CAST(vehicleId AS BIGINT) vehicleId
, latest_by_offset(driverId) driverId
, latest_by_offset(source) source
, latest_by_offset(eventType) eventType
, latest_by_offset(latitude) latitude
, latest_by_offset(longitude) longitude
FROM vehicle_tracking_refined_s
GROUP BY CAST(vehicleId AS BIGINT)
EMIT CHANGES;
ksql> SELECT * FROM vehicle_tracking_refined_t
WHERE vehicleId = 42;
SELECT (Pull Query)
Pulls the current value from the materialized table and terminates
The result of this statement isn't persisted in a Kafka topic and is printed out only in the console
Pull queries enable you to fetch the current state of a materialized view
They're a great match for request/response flows and can be used with ksqlDB REST API
SELECT select_expr [, ...]
FROM aggregate_table
WHERE key_column = key
[ AND window_bounds ];
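When the table is windowed, the optional window_bounds predicate restricts which windows are returned; a sketch against a hypothetical hourly aggregate table named event_type_by_1hour_t (similar to the one built later in Demo 5):

SELECT *
FROM event_type_by_1hour_t
WHERE eventType = 'Overspeed'
  AND WINDOWSTART >= '2020-11-16T21:00:00'
  AND WINDOWSTART < '2020-11-16T22:00:00';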
Demo 3
Demo 3 – SELECT non-normal driving behavior
ksql> SELECT * FROM vehicle_tracking_refined_s
WHERE eventType != 'Normal' EMIT CHANGES;
+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+
|ROWKEY |SOURCE |TIMESTAMP |VEHICLEID |DRIVERID |ROUTEID |EVENTTYPE |LATITUDE |LONGITUDE |CORRELATIONI|
| | | | | | | | | |D |
+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+
|62 |Tracking_Sys|160555884232|62 |26 |1594289134 |Unsafe follo|34.78 |-92.31 |505825961956|
| |B |9 | | | |wing distanc| | |6029956 |
| | | | | | |e | | | |
|62 |Tracking_Sys|160555884232|62 |26 |1594289134 |Unsafe follo|34.78 |-92.31 |505825961956|
| |B |9 | | | |wing distanc| | |6029956 |
| | | | | | |e | | | |
|51 |Tracking_Sys|160555884416|51 |13 |1198242881 |Overspeed |40.38 |-89.17 |505825961956|
| |B |9 | | | | | | |6029956 |
|51 |Tracking_Sys|160555884416|51 |13 |1198242881 |Overspeed |40.38 |-89.17 |505825961956|
| |B |9 | | | | | | |6029956 |
ksql> CREATE STREAM IF NOT EXISTS problematic_driving_s
WITH (kafka_topic='problematic_driving',
value_format='AVRO',
partitions=8)
AS SELECT *
FROM vehicle_tracking_refined_s
WHERE eventtype != 'Normal'
PARTITION BY driverid;
Demo 4
Demo 4 – Create Connector to get driver table data
ksql> CREATE SOURCE CONNECTOR jdbc_logistics_sc WITH (
"connector.class" = 'io.confluent.connect.jdbc.JdbcSourceConnector',
"tasks.max" = '1',
"connection.url" =
'jdbc:postgresql://postgresql/demodb?user=demo&password=abc123!',
"mode" = 'timestamp',
"timestamp.column.name" = 'last_update',
"schema.pattern" = 'logistics_db',
"table.whitelist" = 'driver',
"validate.non.null" = 'false',
"topic.prefix" = 'logisticsdb_',
"poll.interval.ms" = '10000',
"key.converter" = 'org.apache.kafka.connect.converters.LongConverter',
"key.converter.schemas.enable" = 'false',
"value.converter" = 'org.apache.kafka.connect.json.JsonConverter',
"value.converter.schemas.enable" = 'false',
"transforms" = 'createKey,extractInt',
"transforms.createKey.type" = 'org.apache.kafka.connect.transforms.ValueToKey',
"transforms.createKey.fields" = 'id',
"transforms.extractInt.type" =
'org.apache.kafka.connect.transforms.ExtractField$Key',
"transforms.extractInt.field" = 'id' );
Demo 4 – Create a TABLE on logisticsdb_driver
ksql> CREATE TABLE IF NOT EXISTS driver_t (
id BIGINT PRIMARY KEY,
first_name VARCHAR,
last_name VARCHAR,
available VARCHAR,
birthdate VARCHAR)
WITH (kafka_topic='logisticsdb_driver',
value_format='JSON');
ksql> SELECT * FROM driver_t EMIT CHANGES;
+----------+-------------------+---------------------+-------------------+-------------------------------+
|ID |FIRST_NAME |LAST_NAME |AVAILABLE |BIRTHDATE |
+----------+-------------------+---------------------+-------------------+-------------------------------+
|28 |Della |Mcdonald |Y |3491 |
|31 |Rosemarie |Ruiz |Y |3917 |
|12 |Laurence |Lindsey |Y |3060 |
|22 |Patricia |Coleman |Y |3875 |
|11 |Micky |Isaacson |Y |973 |
Demo 4 – Create a Stream with Enrichment by Driver
ksql> CREATE STREAM IF NOT EXISTS problematic_driving_and_driver_s
WITH (kafka_topic='problematic_driving_and_driver',
value_format='AVRO', partitions=8)
AS SELECT pd.driverId
, d.first_name
, d.last_name
, d.available
, pd.vehicleId
, pd.routeId
, pd.eventType
FROM problematic_driving_s pd
LEFT JOIN driver_t d
ON pd.driverId = d.id;
ksql> select * from problematic_driving_and_driver_s EMIT CHANGES;
1539713095921 | 11 | 11 | Micky | Isaacson | 67 | 160405074 | Lane Departure |
39.01 | -93.85
1539713113254 | 11 | 11 | Micky | Isaacson | 67 | 160405074 | Unsafe following
distance | 39.0 | -93.65
CREATE CONNECTOR
Create a new connector in the Kafka Connect cluster
with the configuration passed in the WITH clause
Kafka Connect is an open source component of
Apache Kafka that simplifies loading and exporting
data between Kafka and external systems
ksqlDB provides functionality to manage and
integrate with Connect
CREATE SOURCE | SINK CONNECTOR [IF NOT EXISTS] connector_name
WITH( property_name = expression [, ...]);
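Demo 4 creates a source connector; a sink connector is declared the same way. A sketch using the Confluent JDBC sink connector (the connector name, target database, and topic are assumptions):

CREATE SINK CONNECTOR jdbc_driving_sink_sc WITH (
"connector.class" = 'io.confluent.connect.jdbc.JdbcSinkConnector',
"connection.url" = 'jdbc:postgresql://postgresql/demodb?user=demo&password=abc123!',
"topics" = 'problematic_driving',
"auto.create" = 'true');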
Source: ksqlDB Documentation
CREATE TABLE
Create a new table with the specified columns and properties
Supports same data types as CREATE STREAM
KSQL adds the implicit columns ROWTIME and ROWKEY to every table as well
KSQL currently has the following requirements for creating a table from a Kafka topic
• message key must also be present as a field/column in the Kafka message value
• message key must be in VARCHAR aka STRING format
CREATE TABLE table_name ( { column_name data_type } [, ...] )
WITH ( property_name = expression [, ...] );
Demo 5
Windowing
• Tumbling Window
• Hopping Window
• Session Window
SELECT item_id, SUM(quantity)
FROM orders
WINDOW TUMBLING (SIZE 20 SECONDS)
GROUP BY item_id;
SELECT item_id, SUM(quantity)
FROM orders
WINDOW HOPPING (SIZE 20 SECONDS,
ADVANCE BY 5 SECONDS)
GROUP BY item_id;
SELECT item_id, SUM(quantity)
FROM orders
WINDOW SESSION (20 SECONDS)
GROUP BY item_id;
Demo 5 – SELECT COUNT … GROUP BY
ksql> CREATE TABLE event_type_by_1hour_tumbl_t
WITH (kafka_topic = 'event_type_by_1hour_tumbl_t')
AS SELECT windowstart AS winstart
, windowend AS winend
, eventType
, count(*) AS nof
FROM problematic_driving_s
WINDOW TUMBLING (SIZE 60 minutes)
GROUP BY eventType;
ksql> SELECT TIMESTAMPTOSTRING(WINDOWSTART,'yyyy-MM-dd HH:mm:ss','CET') wsf
, TIMESTAMPTOSTRING(WINDOWEND,'yyyy-MM-dd HH:mm:ss','CET') wef
, eventType
, nof
FROM event_type_by_1hour_tumbl_t
EMIT CHANGES;
+----------------------------+---------------------------+---------------------------------+-------+
|WSF |WEF |EVENTTYPE |NOF |
+----------------------------+---------------------------+---------------------------------+-------+
|2020-11-16 21:00:00 |2020-11-16 22:00:00 |Unsafe following distance |1 |
|2020-11-16 21:00:00 |2020-11-16 22:00:00 |Lane Departure |1 |
|2020-11-16 21:00:00 |2020-11-16 22:00:00 |Unsafe tail distance |1 |
|2020-11-16 21:00:00 |2020-11-16 22:00:00 |Overspeed |1 |
|2020-11-16 21:00:00 |2020-11-16 22:00:00 |Overspeed |3 |
ksqlDB REST API
• The /status endpoint lets you poll the status of the command
• The /info resource gives you information about the status of a ksqlDB Server
• The /ksql resource runs a sequence of SQL statements
• The /query resource lets you stream the output records of a SELECT statement via a chunked
transfer encoding
curl -X POST -H 'Content-Type: application/vnd.ksql.v1+json' \
  -i http://dataplatform:8088/query \
  --data '{ "ksql":
    "SELECT * FROM problematic_driving_s EMIT CHANGES;",
    "streamsProperties": {
      "ksql.streams.auto.offset.reset": "latest" }
  }'
ksqlDB Native Client
• ksqlDB ships with a lightweight Java client
• enables sending requests easily to a ksqlDB server from within your Java application
• alternative to using the REST API
• Supports
• pull and push queries
• inserting new rows of data into existing ksqlDB streams
• creation and management of new streams and tables
• persistent queries
• admin operations such as listing streams, tables, and topics
https://docs.ksqldb.io/en/latest/developer-guide/ksqldb-clients/java-client/
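A minimal sketch of a push query with the Java client (host, port, and stream name are taken from the demos; the classes are from the io.confluent.ksql.api.client package shipped with ksqlDB):

import io.confluent.ksql.api.client.Client;
import io.confluent.ksql.api.client.ClientOptions;
import io.confluent.ksql.api.client.Row;
import io.confluent.ksql.api.client.StreamedQueryResult;

public class PushQueryExample {
  public static void main(String[] args) throws Exception {
    ClientOptions options = ClientOptions.create()
        .setHost("ksqldb-server-1")
        .setPort(8088);
    Client client = Client.create(options);

    // issue a push query, then poll rows synchronously as they arrive
    StreamedQueryResult result =
        client.streamQuery("SELECT * FROM problematic_driving_s EMIT CHANGES;").get();
    for (int i = 0; i < 5; i++) {
      Row row = result.poll();   // blocks until the next row is available
      System.out.println(row.values());
    }
    client.close();
  }
}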
Choosing the Right API
Consumer / Producer API
• Java, C#, C++, Scala, Python, Node.js, Go, PHP, …
• subscribe(), poll(), send(), flush()
• Anything Kafka
Kafka Streams
• Fluent Java API
• mapValues(), filter(), flush()
• Stream Analytics
KSQL
• SQL dialect
• SELECT … FROM …, JOIN … WHERE, GROUP BY
• Stream Analytics
Kafka Connect
• Declarative: configuration, REST API
• Out-of-the-box connectors
• Stream Integration
Flexibility ←→ Simplicity
Source: adapted from Confluent
You are welcome to join us at the Expo area.
We're looking forward to meeting you.
Link to the Expo area:
https://www.vinivia-event-manager.io/e/DOAG/portal/expo/29731
My other talks at DOAG 2020:
18.11 – 10:00 - Big Data, Data Lake, Datenserialisierungsformate
18.11 – 13:00 – Rolle des Event Hubs in einer modernen Daten Architektur
19.11 – 13:00 – Kafka Livedemo: Umsetzung einer Streaminglösung #slideless