APAC ksqlDB Workshop

Agenda: ksqlDB Workshop
01 Introductions, welcome & guidelines. How to get help
02 Talk: Introduction to Kafka, Kafka Streams & ksqlDB (10:10 - 10:30 AM)
03 Lab: Scenario overview and what you'll be building (10:30 - 10:45 AM)
04 Lab: Getting your lab set up (10:45 - 11:00 AM)
05 Lab: Hands on (11:00 AM - 12:00 PM)

The Rise of Event Streaming
60% of Fortune 100 companies use Apache Kafka

Confluent Enables Your Event Streaming Success
Hall of Innovation: CTO Innovation Award Winner, 2019 Enterprise Technology Innovation Awards
Confluent founders are the original creators of Kafka
The Confluent team wrote 80% of Kafka commits and has over 1M hours of technical experience with Kafka
Confluent helps enterprises successfully deploy event streaming at scale and accelerate time to market
Confluent Platform extends Apache Kafka to be a secure, enterprise-ready platform

Introduction to Kafka and streams

Apache Kafka®: a distributed commit log
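To make the "distributed commit log" idea concrete, here is a toy, single-process sketch of the append-only log that a Kafka partition is built around: records are only appended, each receives a sequential offset, and readers fetch from an offset of their choosing without removing anything. The class name ToyCommitLog is hypothetical, and this is not how Kafka is implemented (there are no segments, replication or retention here).

```java
import java.util.ArrayList;
import java.util.List;

// Toy, in-memory illustration of an append-only commit log (hypothetical class).
// Records are never modified or removed; each one gets a sequential offset.
class ToyCommitLog {
    private final List<String> records = new ArrayList<>();

    // Appends a record to the end of the log and returns its offset.
    long append(String value) {
        records.add(value);
        return records.size() - 1;
    }

    // Reads up to maxRecords starting at fromOffset. Nothing is removed, so
    // independent readers can each keep their own offset into the same data.
    List<String> read(long fromOffset, int maxRecords) {
        int start = (int) Math.min(fromOffset, records.size());
        int end = Math.min(start + maxRecords, records.size());
        return new ArrayList<>(records.subList(start, end));
    }

    // The offset that the next appended record will receive.
    long endOffset() {
        return records.size();
    }
}
```

Because reads never consume data, two readers at different offsets see the same records, which is what lets many applications share one Kafka topic.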

Apache Kafka Connect API: import and export data in and out of Kafka
(Diagram: data flows from sources through the Kafka Connect API into a Kafka pipeline and out to sinks.)

Instantly Connect Popular Data Sources & Sinks
100+ pre-built connectors: 80+ Confluent supported; 20+ partner supported, Confluent verified

Kafka Streams API: write standard Java applications and microservices to process your data in real time.
Kafka Connect API: reliable and scalable integration of Kafka with other systems, with no coding required.

What's stream processing good for?
Materialized cache: build and serve incrementally updated stateful views of your data.
Streaming ETL pipeline: manipulate in-flight events to connect arbitrary sources and sinks.
Event-driven microservice: trigger changes based on observed patterns of events in a stream.
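The materialized cache use case can be sketched as a view that is updated one change event at a time and then serves point lookups. This is a minimal in-memory illustration with a hypothetical class name; it assumes, as compacted Kafka topics do, that a null value means "delete this key".

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of an incrementally updated materialized view.
// Assumption: a null value is a tombstone that deletes the key.
class MaterializedView {
    private final Map<String, String> state = new HashMap<>();

    // Apply one change event: upsert the latest value, or delete on a tombstone.
    void apply(String key, String value) {
        if (value == null) {
            state.remove(key);
        } else {
            state.put(key, value);
        }
    }

    // Serve a point lookup against the current state.
    Optional<String> lookup(String key) {
        return Optional.ofNullable(state.get(key));
    }
}
```

Only the latest value per key is kept, which is why the view stays small even though the underlying stream of changes grows without bound.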

What does a streaming platform do?
(Diagram: Kafka cluster)
Stream Processing by Analogy

Example: Using Kafka's Streams API to write elastic, scalable, fault-tolerant Java and Scala applications
(Diagram: "Main Logic")
Stream processing with Kafka

CREATE STREAM fraudulent_payments AS
SELECT * FROM payments
WHERE fraudProbability > 0.8;
Same example, now with ksqlDB.
Not a single line of Java or Scala code needed.
Stream processing with Kafka

Three modalities of stream processing with Confluent
Kafka clients:
ConsumerRecords<String, Integer> records = consumer.poll(Duration.ofMillis(100));
Map<String, Integer> counts = new HashMap<>();
for (ConsumerRecord<String, Integer> record : records) {
    // Sum the values seen for each key in this batch
    String key = record.key();
    int c = counts.getOrDefault(key, 0);
    c += record.value();
    counts.put(key, c);
}
// stateStore, StateStoreException, RetryUtils and MAX_RETRIES are illustrative
// placeholders: with plain clients you must build durable state and retries yourself
for (Map.Entry<String, Integer> entry : counts.entrySet()) {
    int attempts = 0;
    while (attempts++ < MAX_RETRIES) {
        try {
            int stateCount = stateStore.getValue(entry.getKey());
            stateStore.setValue(entry.getKey(), entry.getValue() + stateCount);
            break;
        } catch (StateStoreException e) {
            RetryUtils.backoff(attempts);
        }
    }
}
Kafka Streams:
builder
    .stream("input-stream", Consumed.with(Serdes.String(), Serdes.String()))
    .groupBy((key, value) -> value, Grouped.with(Serdes.String(), Serdes.String()))
    .count()
    .toStream()
    .to("counts", Produced.with(Serdes.String(), Serdes.Long()));
ksqlDB:
SELECT x, COUNT(*) FROM input_stream GROUP BY x EMIT CHANGES;
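Stripped of Kafka entirely, the aggregation that the Kafka Streams and ksqlDB snippets express (count how often each grouping value occurs) fits in a few lines of plain Java. This sketch works on a finite in-memory batch and its class name is hypothetical; the streaming versions keep the same counts in fault-tolerant state stores and emit an update every time a count changes.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical batch equivalent of SELECT x, COUNT(*) ... GROUP BY x.
class GroupByCount {
    static Map<String, Long> countBy(List<String> values) {
        Map<String, Long> counts = new TreeMap<>(); // sorted only for readable output
        for (String v : values) {
            counts.merge(v, 1L, Long::sum); // insert 1, or add 1 to the existing count
        }
        return counts;
    }
}
```

The hard parts the higher-level APIs take off your hands are exactly what this batch version ignores: persisting the map, recovering it after a crash, and partitioning it across instances.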

Using external processing systems leads to complicated architectures
(Diagram: apps and databases linked through several separate connectors and an external stream processing system.)

We can put it back together in a simpler way
(Diagram: ksqlDB combines connectors, stream processing and state stores; apps query it with pull and push queries, and connectors link it to databases.)

Client Trade-offs (from flexibility to simplicity)
Consumer, Producer: subscribe(), poll(), send(), flush()
Kafka Streams: mapValues(), filter(), aggregate()
ksqlDB: SELECT … FROM … JOIN … WHERE … GROUP BY …

Build a complete streaming app with one mental model in SQL
Capture data:
CREATE SOURCE CONNECTOR jdbcConnector WITH (
'connector.class' = '...JdbcSourceConnector',
'connection.url' = '...',
…);
Perform continuous transformations:
CREATE STREAM purchases AS
SELECT viewtime, userid, pageid, TIMESTAMPTOSTRING(viewtime, 'yyyy-MM-dd')
FROM pageviews;
Create materialized views:
CREATE TABLE orders_by_country AS
SELECT country, COUNT(*) AS order_count, SUM(order_total) AS order_total
FROM purchases
LEFT JOIN user_profiles ON purchases.customer_id = user_profiles.customer_id
WINDOW TUMBLING (SIZE 5 MINUTES)
GROUP BY country
EMIT CHANGES;
Serve lookups against materialized views:
SELECT * FROM orders_by_country WHERE country='usa';

Multi-way joins
In the past, ksqlDB required multiple joins to be "daisy chained" together, which was cumbersome and resource intensive. ksqlDB now supports efficient multi-way joins in a single expression.
Before
CREATE STREAM tmp_join AS
SELECT customers.customerid AS customerid,
customers.customername, orders.orderid,
orders.itemid, orders.purchasedate
FROM orders
INNER JOIN customers ON orders.customerid = customers.customerid
EMIT CHANGES;
CREATE STREAM customers_orders_report AS
SELECT customerid, customername, orderid, items.itemname, purchasedate
FROM tmp_join
LEFT JOIN items ON tmp_join.itemid = items.itemid
EMIT CHANGES;
...
After
CREATE STREAM customers_orders_report AS
SELECT customers.customerid AS customerid,
customers.customername, orders.orderid, items.itemname,
orders.purchasedate
FROM orders
LEFT JOIN customers ON orders.customerid = customers.customerid
LEFT JOIN items ON orders.itemid = items.itemid
EMIT CHANGES;
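What the single-expression join does for each order can be sketched in memory: look the order up in two tables in one pass and, with LEFT JOIN semantics, keep the order even when a lookup misses. The record types and field names below are hypothetical, and the sketch is simplified (for example, customerId is taken from the order rather than from the customers table).

```java
import java.util.Map;

// Hypothetical in-memory illustration of a three-way LEFT JOIN.
class MultiWayJoin {
    record Order(String orderId, String customerId, String itemId, String purchaseDate) {}
    record Report(String customerId, String customerName, String orderId,
                  String itemName, String purchaseDate) {}

    // Enrich one order from both lookup tables at once.
    static Report join(Order o, Map<String, String> customerNames, Map<String, String> itemNames) {
        return new Report(
                o.customerId(),
                customerNames.get(o.customerId()), // null when no matching customer
                o.orderId(),
                itemNames.get(o.itemId()),         // null when no matching item
                o.purchaseDate());
    }
}
```

The chained "Before" version materializes an intermediate tmp_join stream between the two lookups; doing both lookups in one step avoids that extra topic and query.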

First-class Java client
Write stream processing programs using language-neutral SQL, then access your data from your favorite programming language. Use either our first-class Java client or our REST API from any language you like.
CREATE TABLE t1 AS
SELECT k1, SUM(b)
FROM s1
GROUP BY k1
EMIT CHANGES;
(Diagram: an app reads t1 with a pull query and a push query.)
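As an illustration of the REST route, the sketch below builds a pull-query request with the JDK's built-in HTTP client. Assumptions to check against the REST API documentation for your ksqlDB version: a server at http://localhost:8088 (the default listener), the /query endpoint, the application/vnd.ksql.v1+json content type, and a JSON body of the form {"ksql": "...", "streamsProperties": {}}. The class name is hypothetical.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper that builds (but does not send) a ksqlDB pull-query request.
class KsqlRestQuery {
    static HttpRequest pullQuery(String serverUrl, String sql) {
        String body = "{\"ksql\": \"" + escapeJson(sql) + "\", \"streamsProperties\": {}}";
        return HttpRequest.newBuilder(URI.create(serverUrl + "/query"))
                .header("Content-Type", "application/vnd.ksql.v1+json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    // Minimal escaping of backslashes and quotes: enough for simple SQL text,
    // not a general-purpose JSON encoder.
    static String escapeJson(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```

To run it against a live server, pass the request to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()). The dedicated Java client wraps this plumbing for you.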

Highly available pull queries
Pull queries now include improved availability semantics:
• Pull queries continue to work during rebalances (assuming standbys are available)
• Lag-aware routing: the standbys with the least lag are targeted
SELECT * FROM my_table WHERE ROWKEY = 'my_key';
(Diagram: my_table replica0 at offset 100; my_table replica1 at offset 32.)
Pull queries are now enabled by default in RBAC-enabled environments, too!
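The routing rule in the bullets can be sketched as "among live replicas, pick the one that has fallen least far behind the end of the log". In the diagram's example, with an end offset of 100, replica0 (at 100) has no lag and is chosen over replica1 (at 32). The Replica type and the offset bookkeeping are hypothetical; ksqlDB tracks lag internally.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of lag-aware routing for a pull query.
class LagAwareRouter {
    record Replica(String name, long currentOffset, boolean alive) {}

    // Choose the live replica with the smallest lag behind endOffset.
    static Optional<Replica> choose(List<Replica> replicas, long endOffset) {
        return replicas.stream()
                .filter(Replica::alive) // skip replicas that are unavailable
                .min(Comparator.comparingLong((Replica r) -> endOffset - r.currentOffset()));
    }
}
```

If the most up-to-date replica goes down, the query still succeeds against the next-best standby, at the cost of possibly staler results.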

How we will run the training
You will be working with Zoom and your browser (instructions, ksqlDB console, and Confluent Control Center).
If you have questions, you can post them via the Zoom chat feature.
If you are stuck, don't worry: just use the "Raise hand" button in Zoom and a Confluent engineer will come to help you.
Try to avoid racing ahead and copying and pasting. Most people learn better when they actually type the code into the console, and it allows you to learn from your mistakes.

Activity
Identify a use case that applies to your
current work
Based upon your understanding of Kafka and
ksqlDB can you identify an area of your job
where you could use Kafka and ksqlDB to
unleash business value from your data?
Not sure where to start? Visit the Stream
Processing Cookbook
https://www.confluent.io/stream-processing-cookbook/

Cluster Architectural Overview
(Diagram components: MySQL customer database; user reviews microservice; website product page with ratings widget; Kafka Connect with the Datagen connector and the MySQL CDC connector; Kafka; ksqlDB, which transforms, enriches and queries the data.)

Overview
• Airline website with customer database
• Customer database stores membership levels
• Members can write reviews and rate services on the website and/or mobile app
• Reviews are submitted to a reviews microservice
• Each review references the customer account only by ID, so customer information is missing from the review
The airline wants to unlock the business value of user reviews by processing them in real time.

Use Case - Cleanliness of Facilities
Some reviews mention the cleanliness of the airport toilets. This affects the airline's customer experience and is important data for the airline.
9/12/19 12:55:05 GMT, 5313, {
"rating_id": 5313,
"user_id": 3,
"stars": 1,
"route_id": 6975,
"rating_time": 1519304105213,
"channel": "web",
"message": "why is it so difficult to keep the bathrooms clean?"
}

Use Case - Approach 1
Reviews go to a data warehouse. We process the reviews at the end of each month and then respond to areas where we receive a significant number of comments.
This approach tells you what has already happened.

Use Case - Approach 2
Process the reviews in real time and provide a dashboard to the airport management team. This dashboard could sort reviews by topic to quickly surface issues with cleanliness.
This approach tells you what is happening.

Use Case - Approach 3
Process the reviews in real time. Set up an alert that fires when 3 bad reviews related to toilet cleanliness arrive within a 10-minute window, and automatically page the cleaning staff to deal with the issue.
This approach acts on what is happening.
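The alerting rule of the third approach can be sketched as a per-event check over a trailing 10-minute window. Several choices here are assumptions, not part of the workshop: events arrive in timestamp order, a "bad" review has 1 or 2 stars, and a review is about cleanliness if its message contains "clean" (as the sample message above does). ksqlDB would express the same idea declaratively with a windowed aggregation rather than hand-written state.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch: alert on 3 bad cleanliness reviews within 10 minutes.
class CleanlinessAlert {
    private static final long WINDOW_MS = 10 * 60 * 1000L;
    private static final int THRESHOLD = 3;
    private final Deque<Long> badReviewTimes = new ArrayDeque<>();

    // Returns true when the trailing 10-minute window holds at least
    // THRESHOLD bad cleanliness reviews, including this one.
    boolean onReview(long ratingTimeMs, int stars, String message) {
        boolean bad = stars <= 2;                                    // assumed threshold
        boolean aboutCleanliness = message.toLowerCase().contains("clean"); // naive keyword match
        if (!bad || !aboutCleanliness) {
            return false;
        }
        badReviewTimes.addLast(ratingTimeMs);
        // Evict bad reviews that are 10 minutes or more older than this one.
        while (ratingTimeMs - badReviewTimes.peekFirst() >= WINDOW_MS) {
            badReviewTimes.removeFirst();
        }
        return badReviewTimes.size() >= THRESHOLD;
    }
}
```

The keyword match is the weakest part of this sketch; a production system would more likely classify reviews with a model or a user-defined function before counting them.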

ksqlDB runs in its own cluster

Hands on
3. Testing the setup
4. KSQL

ksqlDB console
> show topics;
> show streams;
> print 'ratings';

Hands on
5. Creating your first ksqlDB streaming application
Complete up to and including 5.2.2

Discussion - tables vs streams
> describe extended customers;
> select * from customers emit changes;
> select * from customers_flat emit changes;

Hands on
5.3 Identify the unhappy customers
5.4 Monitoring our queries

Pause to consider what we have just done
We have taken data from two different, remote systems and pulled it into Kafka.
We have performed real-time transformations on this data to reformat it.
We have joined these two separate data streams.
We have created a query that runs continuously against a stream of events and generates new events when data matches the query.
And all of this runs at enterprise scale!

CDC: only the "after" state
The JSON data shows what information is being pulled from MySQL via Debezium CDC.
Here you can see that there is no "BEFORE" data (it is null).
This means the record was just created and has not been updated, for example when a new user is first added.

CDC: before and after
Now we have some "BEFORE" data because there was an update to the user's record.
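The before/after rule described in these two slides can be sketched as a small classifier. This is a simplification: it assumes the change event's JSON has already been parsed and passes each state in as a possibly-null object, and the class name is hypothetical. Real Debezium events also carry an explicit "op" field (c, u, d, or r for snapshot reads), which a consumer can use directly.

```java
// Hypothetical sketch: infer the kind of row change from Debezium-style
// "before" and "after" states (null means the state is absent).
class CdcChange {
    enum Kind { INSERT, UPDATE, DELETE }

    static Kind classify(Object before, Object after) {
        if (before == null && after != null) return Kind.INSERT; // new row, e.g. a new user
        if (before != null && after != null) return Kind.UPDATE; // existing row changed
        if (before != null) return Kind.DELETE;                  // row removed
        throw new IllegalArgumentException("both before and after are null");
    }
}
```

Note that a snapshot read (op "r") also has a null "before", so relying on the "op" field is more precise than inferring the operation from null checks alone.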

C3 - Visualise ksqlDB
• Overview of the CDC step

Topology viewer
The topology viewer is enabled by default in CP 5.5 and is accessible via the "Flow" tab.

Windowed queries
"Alert me if I receive more than three reviews within 10 seconds"
Build your alerting logic using ksqlDB's rich support for windowed queries. This lets you implement solutions for problems such as fraud and anomaly detection.
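One building block behind windowed queries is assigning each event to a window. The sketch below shows tumbling windows, the fixed-size, non-overlapping kind that ksqlDB declares with WINDOW TUMBLING (SIZE ...): each timestamp maps to exactly one window start, events are counted per window, and windows with more than three reviews in 10 seconds are flagged. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of tumbling-window bucketing and a threshold alert.
class TumblingWindows {
    // Start of the fixed-size window that contains timestampMs.
    static long windowStart(long timestampMs, long sizeMs) {
        return timestampMs - Math.floorMod(timestampMs, sizeMs);
    }

    // Number of events in each window, keyed by window start.
    static Map<Long, Integer> countPerWindow(List<Long> timestamps, long sizeMs) {
        Map<Long, Integer> counts = new TreeMap<>();
        for (long ts : timestamps) {
            counts.merge(windowStart(ts, sizeMs), 1, Integer::sum);
        }
        return counts;
    }

    // Window starts whose event count exceeds maxAllowed.
    static List<Long> alertingWindows(List<Long> timestamps, long sizeMs, int maxAllowed) {
        List<Long> result = new ArrayList<>();
        for (Map.Entry<Long, Integer> e : countPerWindow(timestamps, sizeMs).entrySet()) {
            if (e.getValue() > maxAllowed) {
                result.add(e.getKey());
            }
        }
        return result;
    }
}
```

Because tumbling windows have fixed boundaries, a burst that straddles two windows can go unflagged; hopping windows, which overlap, trade extra computation for catching such bursts.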

UDF and machine learning
"I want to apply my machine-learning algorithm to real-time data"
Built-in functions
ksqlDB ships with a number of built-in functions to simplify stream processing. Examples include:
• GEODISTANCE: measures the distance between two lat/long coordinates
• MASK: converts a string to a masked or obfuscated version of itself
• JSON_ARRAY_CONTAINS: checks whether a search value is contained in a JSON array
User-defined functions
Extend the functions available in ksqlDB by building your own functions. A common use case is to implement a machine-learning algorithm via ksqlDB, enabling these models to contribute to your real-time data transformations.
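A UDF that applies a model is ultimately just a Java method called once per row. The sketch below is the body of a hypothetical scoring UDF that applies a logistic-regression model to a review; the weights are made up for illustration. In a real ksqlDB UDF the class would typically be annotated with @UdfDescription and the method with @Udf (both from io.confluent.ksql.function.udf) and packaged as a JAR in the ksqlDB extension directory; the annotations are left out here so the sketch compiles on its own.

```java
// Hypothetical model-scoring function, written as plain Java.
// In ksqlDB it would be exposed through the @UdfDescription / @Udf annotations.
class ReviewRiskModel {
    private static final double BIAS = -1.0;          // made-up "trained" bias
    private static final double STARS_WEIGHT = -0.8;  // made-up weight per star
    private static final double LENGTH_WEIGHT = 0.01; // made-up weight per character

    // Returns a value between 0 and 1: the modelled probability that a
    // review needs follow-up (fewer stars and longer messages score higher).
    static double score(int stars, String message) {
        double z = BIAS + STARS_WEIGHT * stars + LENGTH_WEIGHT * message.length();
        return 1.0 / (1.0 + Math.exp(-z)); // logistic (sigmoid) function
    }
}
```

Keeping the model as a pure function of the row's columns is what makes it usable inside a continuous query, for example as SELECT ..., SCORE(stars, message) FROM ratings EMIT CHANGES; once registered under a name of your choosing.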

Internet of Things
"Process telemetry in real time to provide predictive maintenance"
Despite its simple implementation, ksqlDB operates at enterprise scale.
Other IoT use cases:
• Mineral extraction
• Cruise Ship
• Production Line
• Connected Car
• Power Plant
• Gas Pipelines

Reflection
Consider the challenges you face in your current role, and how
event streaming and processing could help solve them. What
products or solutions could you build if you had access to the
right data?

Learning
Visit the ksqlDB site to learn more about the technology
https://ksqldb.io/
Review the Stream Processing Cookbook
https://www.confluent.io/stream-processing-cookbook/?utm_source=field&utm_campaign=fieldocpromo
Download the ebook on designing event driven systems
https://www.confluent.io/designing-event-driven-systems?utm_source=field&utm_campaign=fieldocpromo
Subscribe to the Streaming Audio podcast
https://podcasts.apple.com/au/podcast/streaming-audio-a-confluent-podcast-about-apache-kafka/id1401509765
More resources
https://docs.confluent.io/current/resources.html

Learn Kafka.
developer.confluent.io

Free eBooks
Kafka: The Definitive Guide by Neha Narkhede, Gwen Shapira, Todd Palino
Making Sense of Stream Processing by Martin Kleppmann
I ❤ Logs by Jay Kreps
Designing Event-Driven Systems by Ben Stopford
http://cnfl.io/book-bundle

Building
Download Confluent Platform to develop your new idea
https://docs.confluent.io/current/quickstart/index.html
Get started for free on Confluent Cloud

Get $60 of free Confluent Cloud
(Even if you’re an existing user)
CC60COMM
Promo value expiration: 90 days after activation • Activate by December 31st 2021 • Any unused promo value on the expiration date will be forfeited.
How to activate
Apply this code directly within the Confluent Cloud billing interface
LIMITED PROMOTION
If you receive an invalid promo code error when trying to activate a code, this means that all promo codes have already been claimed

Interacting
Join the Confluent Slack Channel
https://launchpass.com/confluentcommunity
Local meetups
https://www.confluent.io/community/
KafkaSummit 2020
https://kafka-summit.org/

Interesting ideas?
Did something catch your fancy? Want to dive a bit deeper?
Please chat in the Zoom window or reach out to us.
