Building A Fully Kafka-Based Product As A Data Scientist
BAADER
● Worldwide hidden champion in mechanical engineering for fish and poultry processing
● Founded in Germany over 100 years ago
● Digitalization team focuses on innovative solutions
○ Software Architect (Stefan Frehse)
○ Software Engineers
○ Data Scientists
○ UI/UX Engineers
2 / 27
Content
Transport Manager
Kafka Streams
ksqlDB
Takeaways
3 / 27
Transport Manager - Initial Challenges
4 / 27
[Diagram: the customer's trucks send GPS and speed data via MQTT; open questions: Farm? Load? ETA?]
Transport Manager - Data Analytics Solution
5 / 27
How can we increase animal welfare? /
How can we optimize for on-time delivery?
● create an app to fill out load information
● connect load information to a truck
● calculate the ETA
● define incoming / outgoing trucks
● obtain list of farms + factory position → label when a truck is there
● get weather → has an impact on the birds' condition
Transport Manager - Application
● Android app
● Desktop version
6 / 27
7 / 27
8 / 27
Transport Manager - Backend
9 / 27
Kafka Streams - My Resources
● Recommended: Kafka Streams in Action
● Confluent Community Forum
10 / 27
Kafka Streams - Configuration
● Microservices implemented in Kotlin
● Source topic
● Sink topic
● Bootstrap Server
● Application Id
11 / 27
topics need to be created beforehand → for us, part of Infrastructure as Code (IaC)
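A minimal sketch of such a configuration in Kotlin (the topic names, application id, and bootstrap address are illustrative placeholders, not our real values):

import org.apache.kafka.common.serialization.Serdes
import org.apache.kafka.streams.KafkaStreams
import org.apache.kafka.streams.StreamsBuilder
import org.apache.kafka.streams.StreamsConfig
import java.util.Properties

const val SOURCE_TOPIC = "truck-positions"        // hypothetical source topic
const val SINK_TOPIC = "truck-positions-labelled" // hypothetical sink topic

fun buildStreams(): KafkaStreams {
    val props = Properties().apply {
        put(StreamsConfig.APPLICATION_ID_CONFIG, "transport-manager-labeller") // unique application id
        put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")          // address of the Kafka cluster
        put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().javaClass)
        put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().javaClass)
    }
    val builder = StreamsBuilder()
    builder.stream<String, String>(SOURCE_TOPIC).to(SINK_TOPIC) // minimal source-topic → sink-topic topology
    return KafkaStreams(builder.build(), props)
}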
Kafka Streams - Stateless Transformations
12 / 27
Kafka Streams - Stateless Transformations
13 / 27
consume event
label truck with its process state
.all() → iterator over all keys in store
produce event
.build() → topology
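Pieced together, the annotated steps look roughly like this in Kotlin (a sketch; labelProcessState stands in for the real labelling function, and plain string values stand in for the real event types):

import org.apache.kafka.common.serialization.Serdes
import org.apache.kafka.streams.StreamsBuilder
import org.apache.kafka.streams.Topology
import org.apache.kafka.streams.kstream.Consumed
import org.apache.kafka.streams.kstream.Produced

// Hypothetical stand-in for the real labelling function: the talk's version
// computes the distance of the truck to every process state (read via the
// store's .all() iterator) and attaches a state below a distance threshold.
fun labelProcessState(event: String): String = event // placeholder body

fun buildLabelTopology(): Topology {
    val builder = StreamsBuilder()
    builder.stream("truck-positions", Consumed.with(Serdes.String(), Serdes.String())) // consume event
        .mapValues { event -> labelProcessState(event) } // label truck with its process state (stateless)
        .to("truck-positions-labelled", Produced.with(Serdes.String(), Serdes.String())) // produce event
    return builder.build() // .build() → topology
}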
Kafka Streams - Interactive Queries
14 / 27
process states as
GlobalKTable
make store queryable from outside
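A sketch of how that can look (topic and store names are illustrative; the GlobalKTable gives every instance all partitions, and KafkaStreams.store() exposes a read-only view):

import org.apache.kafka.common.serialization.Serdes
import org.apache.kafka.common.utils.Bytes
import org.apache.kafka.streams.KafkaStreams
import org.apache.kafka.streams.StoreQueryParameters
import org.apache.kafka.streams.StreamsBuilder
import org.apache.kafka.streams.kstream.Consumed
import org.apache.kafka.streams.kstream.Materialized
import org.apache.kafka.streams.state.KeyValueStore
import org.apache.kafka.streams.state.QueryableStoreTypes
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore

fun addProcessStates(builder: StreamsBuilder) {
    // process states as GlobalKTable: every instance holds all partitions,
    // materialized into a named, persistent store
    builder.globalTable(
        "process-states",
        Consumed.with(Serdes.String(), Serdes.String()),
        Materialized.`as`<String, String, KeyValueStore<Bytes, ByteArray>>("process-state-store")
    )
}

// make the store queryable from outside the topology (always read-only)
fun processStateStore(streams: KafkaStreams): ReadOnlyKeyValueStore<String, String> =
    streams.store(
        StoreQueryParameters.fromNameAndType(
            "process-state-store",
            QueryableStoreTypes.keyValueStore<String, String>()
        )
    )

// usage: .all() gives an iterator over all keys in the store
// processStateStore(streams).all().use { iter -> iter.forEach(::println) }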
Kafka Streams - Stateful Transformations
15 / 27
Kafka Streams - Stateful Transformations
16 / 27
replication of the changelog topic (state store)
create state store and add to topology
(required)
stateful transformation
use defined state store
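As a sketch in Kotlin (names are illustrative; the changelog replication factor is a Streams config, and the store must be added to the topology before the transformation can reference it; the transformer itself is sketched after the next slide):

import org.apache.kafka.common.serialization.Serdes
import org.apache.kafka.streams.StreamsBuilder
import org.apache.kafka.streams.StreamsConfig
import org.apache.kafka.streams.kstream.ValueTransformerWithKeySupplier
import org.apache.kafka.streams.state.Stores
import java.util.Properties

const val JOURNEY_STORE = "journey-store" // hypothetical store name

fun configureChangelogReplication(props: Properties) {
    // replication of the changelog topic: raise from the default of 1 to 3
    props[StreamsConfig.REPLICATION_FACTOR_CONFIG] = 3
}

fun addJourneyDetection(builder: StreamsBuilder) {
    // create state store and add to topology (required before it can be used)
    val storeBuilder = Stores.keyValueStoreBuilder(
        Stores.persistentKeyValueStore(JOURNEY_STORE),
        Serdes.String(),
        Serdes.String()
    )
    builder.addStateStore(storeBuilder)

    builder.stream<String, String>("truck-positions-labelled")
        // stateful transformation that uses the defined state store by name
        .transformValues(
            ValueTransformerWithKeySupplier<String, String, String> { JourneyTransformer() },
            JOURNEY_STORE
        )
        .to("truck-journeys")
}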
Kafka Streams - Stateful Transformations
17 / 27
detect
incoming/outgoing
initialize state store
get stored event
compare events
.put() → update changelog topic
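The transformer itself, following the annotations above (a sketch; defineJourney is the name used in the talk, but the comparison logic here is a placeholder):

import org.apache.kafka.streams.kstream.ValueTransformerWithKey
import org.apache.kafka.streams.processor.ProcessorContext
import org.apache.kafka.streams.state.KeyValueStore

class JourneyTransformer : ValueTransformerWithKey<String, String, String> {
    private lateinit var store: KeyValueStore<String, String>

    override fun init(context: ProcessorContext) {
        // initialize state store
        @Suppress("UNCHECKED_CAST")
        store = context.getStateStore(JOURNEY_STORE) as KeyValueStore<String, String>
    }

    override fun transform(readOnlyKey: String, value: String): String {
        val previous = store.get(readOnlyKey)        // get stored event
        val journey = defineJourney(previous, value) // compare events → incoming / outgoing
        store.put(readOnlyKey, value)                // .put() → update changelog topic
        return journey
    }

    override fun close() {}

    // placeholder comparison of the previous and the current process state
    private fun defineJourney(previous: String?, current: String): String =
        if (previous == null) current else "$previous → $current"
}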
Kafka Streams
18 / 27
[Diagram: two more stateless transformations calling external APIs: MeteoGroup for weather data, Google Directions for the ETA]
Kafka Streams
19 / 27
[Diagram: overview of the topology with its stateful and stateless transformations]
ksqlDB
● ksqlDB > Kafka Streams
● Store your queries in Git
● Experiences:
○ Stream-Table Join
○ Keys
○ Recreation Handling
20 / 27
ksqlDB - Stream-Table Join
21 / 27
ensure join works → set retention time for
Kafka topics (retention.ms)
stream with continuous data flow
lookup table
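A sketch of such a join in ksqlDB (the stream names match the DROP example later in this deck; the DESTINATIONS lookup table and all column names are illustrative assumptions):

-- stream with continuous data flow joined against a lookup table
CREATE STREAM MOVING_THINGS_WITH_DESTINATION AS
  SELECT m.thingId,
         m.latitude,
         m.longitude,
         d.farmName
  FROM MOVING_THINGS m
  LEFT JOIN DESTINATIONS d ON m.thingId = d.thingId
  EMIT CHANGES;

-- retention.ms is a config of the underlying Kafka topics: effectively
-- unlimited (-1) for the table, as short as possible for the stream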
ksqlDB - Keys
22 / 27
ksqlDB < 0.10.0:
rowtime: 2021/05/12 10:30:00.000 Z,
key: [truck_1],
value: {
"thingId" : "truck_1",
"latitude" : 53.61406,
"longitude" : 10.2328,
"speed_m/s" : 16.1
}

ksqlDB ≥ 0.10.0:
rowtime: 2021/05/12 10:30:00.000 Z,
key: [truck_1],
value: {
"latitude" : 53.61406,
"longitude" : 10.2328,
"speed_m/s" : 16.1
}
ksqlDB - Keys
23 / 27
ksqlDB ≥ 0.10.0:
rowtime: 2021/05/12 10:30:00.000 Z,
key: [truck_1],
value: {
"thingId" : "truck_1",
"latitude" : 53.61406,
"longitude" : 10.2328,
"speed_m/s" : 16.1
}
create a copy of the key column in the value
More on keys in the Confluent blog post:
https://www.confluent.io/blog/ksqldb-0-10-updates-key-columns/
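Since 0.10.0 the AS_VALUE() function does exactly this copy, as described in the blog post above (a sketch; the new stream name and the alias are illustrative):

CREATE STREAM MOVING_THINGS_REKEYED AS
  SELECT thingId,                        -- the key column
         AS_VALUE(thingId) AS thing_id,  -- copy of the key column in the value
         latitude,
         longitude
  FROM MOVING_THINGS
  EMIT CHANGES;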
ksqlDB - Recreation Handling
● 0.12.0: update queries via CREATE OR REPLACE
● 0.15.0: drop stream and automatically terminate query via DROP
24 / 27
< 0.15.0:
ksql> DROP STREAM MOVING_THINGS;
Cannot drop MOVING_THINGS.
The following queries read from this source: [CSAS_MOVING_THINGS_WITH_DESTINATION_265].
The following queries write into this source: [INSERTQUERY_37].
You need to terminate them before dropping MOVING_THINGS.

≥ 0.15.0:
ksql> DROP STREAM MOVING_THINGS;
Cannot drop MOVING_THINGS.
The following streams and/or tables read from this source: [MOVING_THINGS_WITH_DESTINATION].
You need to drop them before dropping MOVING_THINGS.
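A sketch of evolving a running query in place with CREATE OR REPLACE (≥ 0.12.0); the stream and the added filter are illustrative, and in our experience it does not yet work for windows and joins:

-- assumes MOVING_THINGS_CLEANED was created earlier with CREATE STREAM ... AS
CREATE OR REPLACE STREAM MOVING_THINGS_CLEANED AS
  SELECT thingId,
         latitude,
         longitude
  FROM MOVING_THINGS
  WHERE latitude IS NOT NULL  -- newly added condition
  EMIT CHANGES;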
Transport Manager - Iterative Process
● First iteration: Presenting the data
○ actions based on the analytic results implemented as microservices
○ → received positive feedback
○ → new requests
● Next iteration: Using the data
○ compare planned arrival time with actual one
○ predict workload
○ ...
25 / 27
Takeaways
● ksqlDB > Kafka Streams
○ keep an eye on the changelog
● Kafka Streams in Action provides a great introduction
● You don’t need to know everything, just start simple
○ improve iteratively
○ once one microservice is developed → several more can be developed the same way
● Working with Kafka is fun
26 / 27
Questions?
Patrick Neff
BAADER
patrick.neff@baader.com
patrick-neff (LinkedIn)
27 / 27

Editor's Notes

  • #3 I am a Data Scientist working for the German mechanical engineering company BAADER, using big data and real-time data along the entire food value chain to develop new products. Team of 15 people.
  • #5 So what was the starting position -> we have a customer with a poultry processing factory -> trucks drive from the factory to the farm, get loaded, and return. Problem: in consultation with the customer we detected several problems -> they do not know the load, meaning how many birds are on one truck -> difficult to plan the workload -> the lairage is too full or too empty -> arrival is scheduled at 1 pm but actually happens at 2 pm -> so they cannot plan their process properly. However, the good thing -> the trucks are equipped with sensors and send GPS and speed data -> the data goes to Kafka running on Confluent Cloud -> MongoDB sink connector -> we tried to answer the following questions.
  • #6 question: -> decrease number of dead birds when they arrive at the factory -> on time delivery so that they can improve their processes We came up with the following results and action plan: By clustering methods we were able to We learned that weather This finally leads us to the Transport Manager
  • #8 Left column: see how many trucks are in each process state in the factory. Center column: see the next arrivals with their arrival time. Right column: additional information.
  • #9 We flip to the other side of the coin.
  • #10 The entire backend, no hidden program or software here. All icons with the small Kafka logo are microservices using Kafka Streams -> deployed on Kubernetes -> each has just one simple function based on our analytic findings -> easy to monitor -> based on that I explain the Kafka Streams frameworks. In between, ksqlDB queries are running, such as filtering and multi-join expressions -> from left to right it is one big stream, a pipeline where information is added.
  • #11 Before we start, I’d like to talk about my resources. Kafka Streams in Action -> great book, helped me a lot and gave a great introduction -> covers all you need to know. Forum -> to be honest I did not use it much because it is relatively young -> launched this year -> I believe it is going to be a great and central source -> check it out.
  • #12 Alright, let’s dive into it. Coded in Kotlin, developed by JetBrains -> interoperates with Java in both directions -> simpler than Java, you do not need so many lines of code -> great null safety, far fewer NullPointerExceptions. Coming to the configuration itself, I have to mention there is room for improvement -> we do not yet care about scalability and fault tolerance -> we stick to “start simple, improve later”. Topics need to be created beforehand; we do that with our own service called Kafka Topic Operator -> so Kafka topics are part of IaC for us. Mandatory configs: bootstrap server to connect to the Kafka cluster; applicationId -> a unique id.
  • #13 Let’s talk about the Kafka Streams topology frameworks. First, I want to talk about stateless transformations -> they just need the current event -> no state store required, which makes them very simple -> I will explain it on the process state, where we label a truck when it is at a certain one -> since the phone-truck merging is implemented the same way, we can skip it here.
  • #14 Here you can see a part of the topology -> I skipped some code for better clarity. Focus on mapValues, the stateless transformation -> it takes the current value and passes it together with a processStateStore into the labelling function. Labelling strategy -> calculate the distance of the truck to each process state and label it when it is under a certain distance -> .all() returns exactly this iterator. I know I said no state store -> with the processStateStore we query an external store, which belongs to the interactive queries framework explained on the next slide -> so it is not part of our topology -> mapValues stays stateless.
  • #15 Interactive queries allow us to query state stores from the outside, see the last slide. We have the process states in one Kafka topic and consume them as a GlobalKTable -> because then we have all data in all partitions, so we ensure full access -> a KTable would partition the data. We need to make a state store out of it -> for that we always need a store supplier -> persistent (stored in RocksDB) instead of in-memory, because it is faster when restarting even though it needs more storage -> it can be a session store, window store, or timestamped store. Moreover we need to customize the state store, for which the Materialized class is needed -> the store() method on the streams object exposes the store so that it is queryable -> important: this store is always read-only. That’s it: with a simple mapValues() function as well as an interactive query we can label every truck to a farm or the factory.
  • #16 We want to detect incoming / outgoing -> incoming: driving / before the farm, outgoing: driving / before the factory -> this implies we need to compare the current event with the previous one, which needs to be stored somehow -> this automatically leads us to stateful transformations.
  • #17 Stateful transformations require a state store. The state store creates a changelog topic where it stores its updates -> set the replication factor from 1 (the default) to 3 to ensure fault tolerance. State store -> we use the StoreBuilder here, which again needs a store supplier -> add it to the topology. transformValues: the stateful transformation -> it needs a ValueTransformerWithKeySupplier -> which in turn needs a ValueTransformerWithKey -> here the transformer -> using the state store and the actual JourneyDetector -> overall this function takes the key and the value and produces a new value -> let’s have a closer look at the transformer.
  • #18 ValueTransformer(WithKey) interface: -> close() mandatory -> init() -> transform(): defineJourney does exactly what I explained -> and we are done again: with a state store and the transformValues() function we can easily compare current with previous events to detect incoming and outgoing trucks.
  • #19 The last two also work with simple stateless transformations -> one uses an API request to MeteoGroup to obtain weather data -> the other calls the Google Directions API to receive the ETA.
  • #20 That’s it -> with simple topologies and configs you can implement an entire product’s microservices with Kafka Streams. For testing topologies I really recommend the TopologyTestDriver -> I will not explain it in detail here, but it is very simple to use. So, in between these microservices there are ksqlDB queries running in Confluent Cloud. I will widen the focus now to provide tips and share experiences from using ksqlDB.
  • #21 Try finding a solution with ksqlDB first -> easier, just run your query -> very intuitive -> it develops very fast: we started with 0.6, now it is at 0.15 -> every update brings great new features to solve more problems directly.
  • #22 Table: retention time -1 (unlimited). Stream: as short as possible.
  • #23 Problem: -> either restructure all services -> or find a solution with ksqlDB.
  • #25 Already mentioned that we often need to drop and recreate streams -> highlight the updates that helped us a lot in doing so. CREATE OR REPLACE does not work for windows and joins so far. With DROP you see the affected streams, which need to be dropped first; since 0.15.0 it automatically terminates the query. End of the ksqlDB part.
  • #26 First -> we detected problems and derived an action plan -> which is basically just collecting and presenting data -> received very positive feedback -> but also new requests like: Can we see when a truck is coming late? Or can we somehow compute or predict the lairage workload? Next: using the data -> compare the planned with the actual arrival time -> notify when trucks arrive at the same time and the lairage might be crowded -> some of these features are already in progress and I am really looking forward to them.
  • #27 I hope this talk helps you to start working with Kafka.
  • #28 Questions? -> Feel free to ask -> or connect with me on LinkedIn.