Does your organization struggle with updating its Kafka Streams applications? Releasing a new version of a Kafka Streams application can be challenging, especially if its state has to be preserved between releases. Consider these best practices and architectural ideas to make your release process smoother.
Having experienced the accidental removal of changelog topics and the need to expand partitions, I can say that both are much easier to handle with some planning. With proper planning, you can achieve easier application upgrades.
Key takeaways from the session include:
* How to minimize the rebuilding of state stores.
* How to change stream topologies without affecting the existing state stores (see the sketch after this list).
* What you can do when you absolutely need to increase the number of partitions within your application.
* How to leverage schemas for application releases.
* Measures to prevent data corruption, especially if Kafka is not only your system of record but also your source of truth.
* Techniques to support rolling back an application.
* The advantages of splitting apart a Kafka Streams application into multiple applications.
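As a concrete illustration of the topology-evolution point above, here is a minimal sketch (topic and store names are hypothetical) of how explicitly naming state stores and repartition topics in the Kafka Streams DSL keeps internal names stable when the topology changes around them:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.state.KeyValueStore;

public class NamedTopology {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> orders = builder.stream("orders");

        orders
            // Naming the grouping keeps any repartition topic name stable
            // even if upstream operators are added or removed later.
            .groupByKey(Grouped.as("orders-by-key"))
            // Naming the store pins the changelog topic name
            // (<application.id>-order-counts-changelog), so a topology change
            // elsewhere does not force a state-store rebuild.
            .count(Materialized.<String, Long, KeyValueStore<Bytes, byte[]>>as("order-counts")
                    .withKeySerde(Serdes.String())
                    .withValueSerde(Serdes.Long()))
            .toStream()
            .to("order-counts-out", Produced.with(Serdes.String(), Serdes.Long()));

        // Print the topology so generated names can be reviewed before release.
        System.out.println(builder.build().describe());
    }
}
```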
Know Your Topics – A Deep Dive on Topic IDs with KIP-516 with Justine Olshan ... (Hosted by Confluent)
When Apache Kafka® was first created, topics were identified solely by topic name—but this isn't always sufficient. Find out in this talk why the Kafka community decided to add topic IDs to Kafka as a part of KIP-516. Learn which new features related to topic IDs have been rolled out, and learn about some of the benefits that are still on the way.
We'll be covering new features in Kafka versions 2.8, 3.0, and 3.1 and how to upgrade to using topic IDs. We'll see how topic IDs are used in KRaft mode and tiered storage, and take a tour through some of the internals and the thought processes around these changes—as well as some of the future plans for topic IDs.
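For a hands-on taste of what KIP-516 exposes, here is a small hedged sketch (broker address and topic name are placeholders; `allTopicNames()` and `topicId()` assume a recent Kafka clients version) that reads a topic's ID through the Java Admin client:

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.TopicDescription;

public class ShowTopicId {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (Admin admin = Admin.create(props)) {
            Map<String, TopicDescription> topics =
                    admin.describeTopics(List.of("orders")).allTopicNames().get();
            // topicId() returns the immutable Uuid the broker assigned, which
            // lets clients detect when a topic was deleted and recreated
            // under the same name (the recreated topic gets a new ID).
            topics.forEach((name, desc) ->
                    System.out.println(name + " -> " + desc.topicId()));
        }
    }
}
```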
Kafka Streams State Stores Being Persistent (Confluent)
Being Persistent: A Look Into Kafka Streams State Stores, Neil Buesing, Principal Solutions Architect, Rill Data
Meetup link: https://www.meetup.com/TwinCities-Apache-Kafka/events/284002062/
Dennis Wittekind, Confluent, Senior Customer Success Engineer
Perhaps you have heard of Kafka Connect and think it would be a great fit in your application's architecture, but you'd like to know how things work before you propose it to your team? Perhaps you know enough Connect to be dangerous, but you haven't had the time to really understand all the moving pieces? This meetup talk is for you! We'll briefly introduce Connect to the uninitiated, and then jump into underlying concepts and considerations you should make when running Connect in production! We'll even run a live demo! What could go wrong!?
https://www.meetup.com/Saint-Louis-Kafka-meetup-group/events/272687113/
A brief introduction to Apache Kafka, describing its usage as a platform for streaming data. It introduces some of the newer components of Kafka that help make this possible, including Kafka Connect, a framework for capturing continuous data streams, and Kafka Streams, a lightweight stream processing library.
ksqlDB is a stream processing SQL engine that allows stream processing on top of Apache Kafka. ksqlDB is based on Kafka Streams and provides capabilities for consuming messages from Kafka, analysing these messages in near real time with a SQL-like language, and producing results back to a Kafka topic. This way, not a single line of Java code has to be written, and you can reuse your SQL know-how. This lowers the bar for getting started with stream processing significantly.
ksqlDB offers powerful stream processing capabilities, such as joins, aggregations, time windows, and support for event time. In this talk I will present how ksqlDB integrates with the Kafka ecosystem and demonstrate how easy it is to implement a solution using ksqlDB for the most part. This will be done in a live demo on a fictitious IoT sample.
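As a flavor of how little code this takes, here is a hedged sketch using the ksqlDB Java client (host, port, and the `clicks` stream are assumptions) to create a windowed aggregate; all of the processing logic lives in the SQL string, not in Java:

```java
import io.confluent.ksql.api.client.Client;
import io.confluent.ksql.api.client.ClientOptions;

public class ClicksPerMinute {
    public static void main(String[] args) throws Exception {
        ClientOptions options = ClientOptions.create()
                .setHost("localhost")
                .setPort(8088); // default ksqlDB server port
        Client client = Client.create(options);

        // A tumbling-window aggregation: joins, windows, and event time
        // are all expressed in SQL rather than Java.
        String sql = "CREATE TABLE clicks_per_minute AS "
                + "SELECT user_id, COUNT(*) AS clicks "
                + "FROM clicks "
                + "WINDOW TUMBLING (SIZE 1 MINUTE) "
                + "GROUP BY user_id EMIT CHANGES;";
        client.executeStatement(sql).get();
        client.close();
    }
}
```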
Apache Kafka evolved from an enterprise messaging system into a fully distributed streaming data platform (Kafka Core + Kafka Connect + Kafka Streams) for building streaming data pipelines and streaming data applications.
This talk, which I gave at the Chicago Java Users Group (CJUG) on June 8th, 2017, focuses mainly on Kafka Streams, a lightweight open source Java library for building stream processing applications on top of Kafka, using Kafka topics as input/output.
You will learn more about the following:
1. Apache Kafka: a Streaming Data Platform
2. Overview of Kafka Streams: Before Kafka Streams? What is Kafka Streams? Why Kafka Streams? What are Kafka Streams' key concepts? Kafka Streams APIs and code examples (see the sketch after this list)
3. Writing, deploying and running your first Kafka Streams application
4. Code and Demo of an end-to-end Kafka-based Streaming Data Application
5. Where to go from here?
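For reference, a "first Kafka Streams application" in the spirit of item 3 usually looks something like this minimal word-count sketch (topic names and the application ID are placeholders):

```java
import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;

public class WordCountApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "wordcount-demo");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        builder.<String, String>stream("text-input")
               // Split each line into words, re-key by word, and count.
               .flatMapValues(line -> Arrays.asList(line.toLowerCase().split("\\W+")))
               .groupBy((key, word) -> word)
               .count(Materialized.as("word-counts"))
               .toStream()
               .to("word-counts-output", Produced.with(Serdes.String(), Serdes.Long()));

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```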
It covers a brief introduction to Apache Kafka Connect, giving insights into its benefits, use cases, and the motivation behind building Kafka Connect, along with a short discussion of its architecture.
ksqlDB: A Stream-Relational Database System (Confluent)
Speaker: Matthias J. Sax, Software Engineer, Confluent
ksqlDB is a distributed event streaming database system that allows users to express SQL queries over relational tables and event streams. The project was released by Confluent in 2017 and is hosted on GitHub and developed with an open-source spirit. ksqlDB is built on top of Apache Kafka®, a distributed event streaming platform. In this talk, we discuss ksqlDB’s architecture, which is influenced by Apache Kafka and its stream processing library, Kafka Streams. We explain how ksqlDB executes continuous queries while achieving fault tolerance and high availability. Furthermore, we explore ksqlDB’s streaming SQL dialect and the different types of supported queries.
Matthias J. Sax is a software engineer at Confluent working on ksqlDB. He mainly contributes to Kafka Streams, Apache Kafka's stream processing library, which serves as ksqlDB's execution engine. Furthermore, he helps evolve ksqlDB's "streaming SQL" language. In the past, Matthias also contributed to Apache Flink and Apache Storm and he is an Apache committer and PMC member. Matthias holds a Ph.D. from Humboldt University of Berlin, where he studied distributed data stream processing systems.
https://db.cs.cmu.edu/events/quarantine-db-talk-2020-confluent-ksqldb-a-stream-relational-database-system/
Getting up to speed with Kafka Connect: from the basics to the latest feature... (Hosted by Confluent)
"Kafka Connect is an ideal tool for building data pipelines. It is both reliable and scalable, with a pluggable interface that lets you flow data between Kafka and any system you need. A Connect pipeline is made up of many different components, and understanding how each of these interact together is essential, even for the simplest setup.
In this talk we will introduce the Connect components, from connectors, to transformations to the runtime itself. We will also share some of the new capabilities and best practices that you should be aware of to help you run and manage connectors effectively.
Finally we will talk about some different open source projects that have been built on top of Connect that can help you get the most out of the framework."
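To make the "runtime" part concrete: you interact with a distributed Connect cluster through its REST API. Below is a hedged sketch (Connect URL, connector name, file, and topic are placeholders) that registers the file source connector that ships with Kafka:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class CreateConnector {
    public static void main(String[] args) throws Exception {
        // Connector config as JSON; FileStreamSourceConnector is bundled with Kafka.
        String body = """
                {
                  "name": "demo-file-source",
                  "config": {
                    "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
                    "tasks.max": "1",
                    "file": "/tmp/demo.txt",
                    "topic": "demo-lines"
                  }
                }""";
        // POST to the standard Connect REST endpoint for creating connectors.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8083/connectors"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```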
Modern businesses have data at their core, and this data is changing continuously. How can we harness this torrent of information in real-time? The answer is stream processing, and the technology that has since become the core platform for streaming data is Apache Kafka. Among the thousands of companies that use Kafka to transform and reshape their industries are the likes of Netflix, Uber, PayPal, and AirBnB, but also established players such as Goldman Sachs, Cisco, and Oracle.
Unfortunately, today’s common architectures for real-time data processing at scale suffer from complexity: there are many technologies that need to be stitched and operated together, and each individual technology is often complex by itself. This has led to a strong discrepancy between how we, as engineers, would like to work vs. how we actually end up working in practice.
In this session we talk about how Apache Kafka helps you to radically simplify your data processing architectures. We cover how you can now build normal applications to serve your real-time processing needs — rather than building clusters or similar special-purpose infrastructure — and still benefit from properties such as high scalability, distributed computing, and fault-tolerance, which are typically associated exclusively with cluster technologies. Notably, we introduce Kafka’s Streams API, its abstractions for streams and tables, and its recently introduced Interactive Queries functionality. As we will see, Kafka makes such architectures equally viable for small, medium, and large scale use cases.
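Since the abstract highlights Interactive Queries, here is a minimal hedged sketch (the "word-counts" store name assumes a running topology that materialized it) of serving reads directly from the application's local state:

```java
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StoreQueryParameters;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

public class CountLookup {
    // Serve reads straight from the app's local state store: no external
    // database needed, which is the core Interactive Queries idea.
    static Long lookup(KafkaStreams streams, String word) {
        ReadOnlyKeyValueStore<String, Long> store = streams.store(
                StoreQueryParameters.fromNameAndType(
                        "word-counts", QueryableStoreTypes.keyValueStore()));
        return store.get(word); // null if the key is not held by this instance
    }
}
```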
Testing Kafka containers with Testcontainers: There and back again with Vikto... (Hosted by Confluent)
Did you ever wonder how your applications will behave once deployed to production?
Sure, you have unit tests, and your test coverage is sky-high.
However, you might depend on external resources like Apache Kafka® or Kafka Connect connectors, ksqlDB, etc.
Moreover, without proper integration testing, you cannot be confident about the stability of your production environment.
In this session, Viktor talks about Testcontainers, a library (initially created for the JVM, now available in many languages) that provides lightweight, disposable instances of shared databases, clusters, and anything else that can run in a Docker container!
After a rapid-fire introduction to the core concepts of the containers and how they can help improve integration testing, we’re going to zoom in on the supported out-of-the-box containers. You will learn how to test complex stacks like an Apache Kafka®-based streaming platform (or even Confluent Cloud) and other components.
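As a taste of the API, here is a hedged sketch (the image tag is an assumption) that spins up a disposable single-broker Kafka with Testcontainers' out-of-the-box `KafkaContainer` and produces one record against it:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;
import org.testcontainers.containers.KafkaContainer;
import org.testcontainers.utility.DockerImageName;

public class DisposableKafkaDemo {
    public static void main(String[] args) throws Exception {
        try (KafkaContainer kafka =
                 new KafkaContainer(DockerImageName.parse("confluentinc/cp-kafka:7.4.0"))) {
            kafka.start(); // boots a real broker in a throwaway Docker container

            Properties props = new Properties();
            props.put("bootstrap.servers", kafka.getBootstrapServers());
            props.put("key.serializer", StringSerializer.class.getName());
            props.put("value.serializer", StringSerializer.class.getName());

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                producer.send(new ProducerRecord<>("it-topic", "k", "hello")).get();
            }
        } // container (and broker) is disposed here
    }
}
```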
Introducing KRaft: Kafka Without Zookeeper With Colin McCabe | Current 2022 (Hosted by Confluent)
Apache Kafka without Zookeeper is now production ready! This talk is about how you can run without ZooKeeper, and why you should.
Kafka Tutorial - Introduction to Apache Kafka (Part 1) (Jean-Paul Azar)
Why is Kafka so fast? Why is Kafka so popular? Why Kafka? This slide deck is a tutorial for the Kafka streaming platform. It covers Kafka architecture with some small examples from the command line. Then we expand on this with a multi-server example to demonstrate failover of brokers as well as consumers. Then it goes through some simple Java client examples for a Kafka producer and a Kafka consumer. We have also expanded the Kafka design section and added references. The tutorial covers Avro and the Schema Registry as well as advanced Kafka producers.
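In the same spirit as the deck's client examples, here is a minimal hedged sketch (broker address and topic are placeholders) of a Java producer and consumer:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;

public class HelloKafka {
    public static void main(String[] args) {
        Properties pp = new Properties();
        pp.put("bootstrap.servers", "localhost:9092");
        pp.put("key.serializer", StringSerializer.class.getName());
        pp.put("value.serializer", StringSerializer.class.getName());
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(pp)) {
            // Asynchronous send; the callback reports where the record landed.
            producer.send(new ProducerRecord<>("demo-topic", "key-1", "hello kafka"),
                    (metadata, e) -> {
                        if (e == null) {
                            System.out.printf("wrote to %s-%d@%d%n",
                                    metadata.topic(), metadata.partition(), metadata.offset());
                        }
                    });
        }

        Properties cp = new Properties();
        cp.put("bootstrap.servers", "localhost:9092");
        cp.put("group.id", "demo-group");
        cp.put("auto.offset.reset", "earliest");
        cp.put("key.deserializer", StringDeserializer.class.getName());
        cp.put("value.deserializer", StringDeserializer.class.getName());
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(cp)) {
            consumer.subscribe(List.of("demo-topic"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
            records.forEach(r -> System.out.println(r.key() + " = " + r.value()));
        }
    }
}
```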
Deploying Kafka Streams Applications with Docker and Kubernetes (Confluent)
(Gwen Shapira + Matthias J. Sax, Confluent) Kafka Summit SF 2018
Kafka Streams, Apache Kafka’s stream processing library, allows developers to build sophisticated stateful stream processing applications which you can deploy in an environment of your choice. Kafka Streams is not only scalable, but fully elastic, allowing for dynamic scale-in and scale-out as the library handles state migration transparently in the background. By running Kafka Streams applications on Kubernetes, you will be able to use Kubernetes’ powerful control plane to standardize and simplify the application management—from deployment to dynamic scaling.
In this technical deep dive, we’ll explain the internals of dynamic scaling and state migration in Kafka Streams. We’ll then show, with a live demo, how a Kafka Streams application can run in a Docker container on Kubernetes and the dynamic scaling of an application running in Kubernetes.
Improving fault tolerance and scaling out in Kafka Streams with Bill Bejeck |... (Hosted by Confluent)
Kafka Streams is the popular stream processing component of Apache Kafka®. One of its best features is stateful operations. Kafka Streams works hard to ensure stateful operations can scale horizontally and survive failures, but doing so takes time. Kafka Streams offers the concept of "standby tasks," allowing for near-zero-downtime failover, but surprisingly this feature still isn't widely used. This could be for various reasons, from lack of awareness to needing additional resources.
This presentation will cover how standby tasks work and how they're enabled. Additionally, I'll cover the work done in KIP-441 that enables faster scaling out for stateful tasks and provides more balanced stateful assignments. I'll also dive into the consumer rebalance protocol improvements that enable KIP-441 to be effective.
Attendees of this presentation will walk away understanding how and when to use standby tasks, leverage the improvements from KIP-441, and have a deeper understanding of how Kafka Streams works with state.
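Enabling standby tasks is a one-line configuration, and the KIP-441 warm-up behavior is tunable alongside it. A hedged sketch (values are illustrative, not recommendations):

```java
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

public class StandbyConfig {
    public static Properties streamsProps() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "orders-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // Keep one hot replica of each state store on another instance,
        // enabling near-zero-downtime failover.
        props.put(StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG, 1);
        // KIP-441 knobs: how many extra "warm-up" replicas may restore at
        // once, and how much lag still counts as "caught up" for assignment.
        props.put(StreamsConfig.MAX_WARMUP_REPLICAS_CONFIG, 2);
        props.put(StreamsConfig.ACCEPTABLE_RECOVERY_LAG_CONFIG, 10_000L);
        return props;
    }
}
```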
Stream Processing with Apache Kafka and .NET (Confluent)
Presentation from South Bay.NET meetup on 3/30.
Speaker: Matt Howlett, Software Engineer at Confluent
Apache Kafka is a scalable streaming platform that forms a key part of the infrastructure at many companies including Uber, Netflix, Walmart, Airbnb, Goldman Sachs and LinkedIn. In this talk Matt will give a technical overview of Kafka, discuss some typical use cases (from surge pricing to fraud detection to web analytics) and show you how to use Kafka from within your C#/.NET applications.
Kafka Connect and Streams (Concepts, Architecture, Features) (Kai Wähner)
A high-level introduction to Kafka Connect and Kafka Streams, two components of the Apache Kafka open source framework. See the concepts, architecture, and features.
Schema Registry 101 with Bill Bejeck | Kafka Summit London 2022 (Hosted by Confluent)
If you were to ask any developer, "what's a schema and where is it used?", most likely you'd get an answer involving a relational database. The truth is, the domain objects used in applications represent a contract, an implied schema, whether developers choose to acknowledge them or not. But even if you recognize the need for a formal schema, what's the best way to manage them?
This presentation will contain some theory but primarily practical application of schemas with Schema Registry. I'll briefly explain what a schema is and why it's very relevant to any application working with Kafka today. Then it will get practical, introducing Schema Registry, describing how it works, and showing how developers can leverage it to provide schemas across an organization. The discussion will cover working with Schema Registry from the command line, how to leverage it with Kafka clients, and the supported serialization formats. Some established build tools that make life easier for the Kafka developer will also be covered.
Attendees will walk away with knowledge of Schema Registry and a solid understanding of how it works and how to integrate it into Kafka clients. They'll also learn enough about the supported serialization frameworks to start implementing schemas right away in their Kafka development efforts.
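To ground the client-integration point, here is a hedged sketch (URLs and topic are placeholders) of a producer wired to Schema Registry via Confluent's Avro serializer, so every record it writes is checked against a registered schema:

```java
import java.util.Properties;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class AvroProducerDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        // Confluent's serializer registers/fetches schemas automatically.
        props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
        props.put("schema.registry.url", "http://localhost:8081");

        Schema schema = new Schema.Parser().parse(
                "{\"type\":\"record\",\"name\":\"User\",\"fields\":"
                + "[{\"name\":\"name\",\"type\":\"string\"}]}");
        GenericRecord user = new GenericData.Record(schema);
        user.put("name", "alice");

        try (KafkaProducer<String, Object> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("users", "user-1", user));
        }
    }
}
```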
Let’s Make Your CFO Happy; A Practical Guide for Kafka Cost Reduction with El... (Hosted by Confluent)
According to Gartner forecasts, worldwide end-user spending on public cloud services is expected to grow by 23% in 2021, to a total of $332B.
Kafka is no different in that matter. Organizations all over the world are using Kafka as their main stream-processing platform for collecting, processing, and analyzing data at scale. As organizations evolve and grow, data rates grow too, as does the consequent Kafka deployment cost.
So what can we do? -- In this talk, we are going to address exactly this problem.
We will understand what we are paying for when running a self-hosted Kafka deployment, where we can cut costs, how to develop an economic mindset, and what we can proactively do to reduce our cloud infrastructure cost.
Temporal-Joins in Kafka Streams and ksqlDB | Matthias Sax, Confluent (Hosted by Confluent)
Joins in Kafka Streams and ksqlDB are a killer feature for data processing, and basic join semantics are well understood. However, in a streaming world records are associated with timestamps that impact the semantics of joins: welcome to the fabulous world of _temporal_ join semantics. For joins, timestamps are as important as the actual data, and it is important to understand how they impact the join result.
In this talk we want to deep dive on the different types of joins, with a focus on their temporal aspect. Furthermore, we relate the individual join operators to the overall "time engine" of the Kafka Streams query runtime and explain its relationship to operator semantics. To allow developers to apply their knowledge of temporal join semantics, we provide best practices, tips and tricks to "bend" time, and configuration advice to get the desired join results. Last, we give an overview of recent, and an outlook to future, developments that improve joins even further.
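To make the temporal aspect tangible, here is a hedged sketch (topics are placeholders; it assumes a recent Kafka Streams version) of a stream-stream join in the DSL. The `JoinWindows` argument is exactly where time enters the join semantics: only records whose timestamps lie within five minutes of each other are joined.

```java
import java.time.Duration;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.JoinWindows;
import org.apache.kafka.streams.kstream.KStream;

public class OrderPaymentJoin {
    public static StreamsBuilder build() {
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> orders = builder.stream("orders");
        KStream<String, String> payments = builder.stream("payments");

        orders.join(payments,
                (order, payment) -> order + " paid-by " + payment,
                // Temporal condition: |orderTs - paymentTs| <= 5 minutes.
                JoinWindows.ofTimeDifferenceWithNoGrace(Duration.ofMinutes(5)))
              .to("paid-orders");

        return builder;
    }
}
```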
A Thorough Comparison of Delta Lake, Iceberg and Hudi (Databricks)
Recently, a set of modern table formats such as Delta Lake, Hudi, and Iceberg has sprung up. Along with Hive Metastore, these table formats are trying to solve long-standing problems in traditional data lakes with declared features like ACID transactions, schema evolution, upsert, time travel, and incremental consumption.
Watch this talk here: https://www.confluent.io/online-talks/from-zero-to-hero-with-kafka-connect-on-demand
Integrating Apache Kafka® with other systems in a reliable and scalable way is often a key part of a streaming platform. Fortunately, Apache Kafka includes the Connect API that enables streaming integration both in and out of Kafka. Like any technology, understanding its architecture and deployment patterns is key to successful use, as is knowing where to go looking when things aren't working.
This talk will discuss the key design concepts within Apache Kafka Connect and the pros and cons of standalone vs distributed deployment modes. We'll do a live demo of building pipelines with Apache Kafka Connect for streaming data in from databases, and out to targets including Elasticsearch. With some gremlins along the way, we'll go hands-on in methodically diagnosing and resolving common issues encountered with Apache Kafka Connect. The talk will finish off by discussing more advanced topics including Single Message Transforms, and deployment of Apache Kafka Connect in containers.
Automate Your Kafka Cluster with Kubernetes Custom Resources (Confluent)
(Sam Obeid, Shopify) Kafka Summit SF 2018
At Shopify we manage multiple Apache Kafka clusters in multiple locations in Google’s cloud platform. We deploy our Kafka clusters as Kubernetes StatefulSets, and we use other K8s workloads to implement different tasks. Automating critical and repetitive operational tasks is one of our top priorities.
In this talk we’ll discuss how we leveraged Kubernetes Custom Resources and Controllers to automate key cluster operational tasks, to detect cluster configuration changes, and to react to these changes with the required actions. We will go through actual examples we implemented at Shopify, how we solved the problem of cluster discovery, and how we automated topic creation across different clusters with zero human intervention and safety controls.
Bringing Kafka Without Zookeeper Into Production with Colin McCabe | Kafka Su... (Hosted by Confluent)
Over the last few years, we have been working on removing the dependency on ZooKeeper from Apache Kafka®. Instead of using an external system to store metadata, Kafka can now manage its own metadata. This new mode of operation is called Kafka Raft mode, or "KRaft" for short. It has many performance and scalability benefits.
This talk will discuss our efforts to get KRaft mode production-ready. We will talk about the old and new architectures, and how we adapted features to work in both worlds. We will also talk about our experiences with testing and deploying the new software. Finally, we'll talk about what's planned for the future.
Producer Performance Tuning for Apache Kafka (Jiangjie Qin)
Kafka is well known for high throughput ingestion. However, to get the best latency characteristics without compromising on throughput and durability, we need to tune Kafka. In this talk, we share our experiences to achieve the optimal combination of latency, throughput and durability for different scenarios.
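The usual tuning levers here are producer configs that trade latency against throughput and durability. A hedged sketch with illustrative (not prescriptive) values:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;

public class TunedProducerConfig {
    public static Properties props() {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // Throughput: wait up to 20 ms to build bigger, better-compressed batches
        // (at the cost of a little latency).
        props.put(ProducerConfig.LINGER_MS_CONFIG, 20);
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, 128 * 1024);
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");
        // Durability: require acknowledgment from all in-sync replicas.
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        // Pipelining raises throughput; idempotence keeps ordering safe
        // even with retries and multiple in-flight requests.
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);
        props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, 5);
        return props;
    }
}
```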
Sf big analytics_2018_04_18: Evolution of GoPro's data platform (Chester Chen)
Talk 1: Evolution of GoPro's data platform
In this talk, we will share GoPro's experiences in building a data analytics cluster in the cloud. We will discuss:
* Evolution of the data platform from fixed-size Hadoop clusters to a cloud-based Spark cluster with a centralized Hive Metastore + S3: cost benefits and DevOps impact
* A configurable, Spark-based batch ingestion/ETL framework
* Migration of the streaming framework to the cloud + S3
* Analytics metrics delivery with Slack integration
* BedRock: data platform management, visualization & self-service portal
* Visualizing machine learning features via Google Facets + Spark
Speakers: Chester Chen
Chester Chen is the Head of Data Science & Engineering at GoPro. Previously, he was the Director of Engineering at Alpine Data Labs.
David Winters
David is an Architect on the Data Science and Engineering team at GoPro and the creator of their Spark-Kafka data ingestion pipeline. Previously, he worked at Apple and Splice Machine.
Hao Zou
Hao is a senior big data engineer on the Data Science and Engineering team. Previously, he worked at Alpine Data Labs and Pivotal.
[db tech showcase Tokyo 2018] #dbts2018 #B31 『1,2,3 and Done! 3 easy ways to migrate to the cloud!』 (Insight Technology, Inc.)
Francisco Munoz Alvarez, Director of Innovation, Data Intensity
Access Data from XPages with the Relational Controls (Teamstudio)
Did you know that Domino and XPages allow for easy access to relational data? These exciting capabilities in the Extension Library can greatly enhance your applications and allow access to information beyond Domino. Howard and Paul will discuss what you need to get started, which controls allow access to relational data, and the new @Functions available for incorporating relational data into your Server Side JavaScript programming.
DataStage online training is offered at Glory IT Technologies. We have certified working professionals for this module who have trained many students globally. We also provide corporate training and job/project support services for DataStage.
An AMIS Overview of Oracle database 12c (12.1) (Marco Gralike)
Presentation used by Lucas Jellema and Marco Gralike during the AMIS Oracle Database 12c Launch event on Monday the 15th of July 2013 (many thanks to Tom Kyte, Oracle, for allowing us to use some of his material).
Oracle DataGuard Online Training in USA | India (Xoom Trainings)
Xoom Trainings provides Oracle DataGuard online training, with a complete tutorial delivered by professionals with 10 years of experience worldwide.
For an online training demo, please follow the link below:
https://www.youtube.com/watch?v=2zXZPh4agwE
For more information, please follow the link below:
http://www.xoomtrainings.com/course/oracle-dataguard
For general queries, email us at sales@xoomtrainings.com or call +1-610-686-8077.
On Monday evening, 15 July, AMIS organized the seminar ‘Oracle database 12c revealed’. This evening offered AMIS Oracle professionals the first opportunity to see the new features of Oracle database 12c in action! The AMIS specialists, who had carried out more than a year of beta testing, showed what is new and how we will be putting it to use in the coming years!
This presentation was given that evening as a plenary session!
Agile Oracle to PostgreSQL migrations (PGConf.EU 2013) (Gabriele Bartolini)
Migrating an Oracle database to Postgres is never an automated operation, and it rarely (never?) involves just the database. Experience led us to develop an agile methodology for the migration process, covering schema migration, data import, migration of procedures and queries, up to the generation of unit tests for QA.
Pitfalls, technologies, and the main migration opportunities will be outlined, focusing on reducing the total cost of ownership and management of a database solution in the medium-to-long term (without compromising quality and business continuity requirements).
Level Up Your Integration Testing With Testcontainers (VMware Tanzu)
Traditional approaches to integration testing—using shared, local, or in-memory databases—fall short for today's modern developer.
Developers today are building cloud native distributed microservices and taking advantage of a rich variety of backing services. This explosion of applications and backing services introduces new challenges in creating the necessary environments for integration testing. To be useful and effective, these environments must be easy to create and must resemble production as closely as possible. New solutions are needed to make this a reality.
Enter Testcontainers!
Testcontainers is a Java library that supports JUnit tests and makes it incredibly easy to create lightweight, throwaway instances of common databases, Selenium web browsers, or anything else that can run in a Docker container.
In this talk, you will learn when and how to use Testcontainers. We will cover the fundamentals and walk through a step-by-step example using a Spring Boot application that we build from scratch. As a bonus, we'll highlight some new features in Spring Boot 3.0 along the way!
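A minimal hedged sketch of the JUnit 5 integration (the image tag and the connectivity check are illustrative): the `@Container` field is started before the tests run and thrown away afterwards:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import org.junit.jupiter.api.Test;
import org.testcontainers.containers.PostgreSQLContainer;
import org.testcontainers.junit.jupiter.Container;
import org.testcontainers.junit.jupiter.Testcontainers;
import static org.junit.jupiter.api.Assertions.assertTrue;

@Testcontainers
class PostgresIntegrationTest {

    // Started once for this test class, disposed automatically afterwards.
    @Container
    static final PostgreSQLContainer<?> postgres =
            new PostgreSQLContainer<>("postgres:15-alpine");

    @Test
    void databaseAcceptsConnections() throws Exception {
        try (Connection conn = DriverManager.getConnection(
                postgres.getJdbcUrl(), postgres.getUsername(), postgres.getPassword())) {
            assertTrue(conn.isValid(2)); // real database, not an in-memory stand-in
        }
    }
}
```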
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa... (Hosted by Confluent)
"In this talk, attendees will be provided with an introduction to Kafka Connect and the basics of Single Message Transforms (SMTs) and how they can be used to transform data streams in a simple and efficient way. SMTs are a powerful feature of Kafka Connect that allow custom logic to be applied to individual messages as they pass through the data pipeline. The session will explain how SMTs work, the types of transformations they can be used for, and how they can be applied in a modular and composable way.
Further, the session will discuss where SMTs fit in with Kafka Connect and when they should be used. Examples will be provided of how SMTs can be used to solve common data integration challenges, such as data enrichment, filtering, and restructuring. Attendees will also learn about the limitations of SMTs and when it might be more appropriate to use other tools or frameworks.
Additionally, an overview of the alternatives to SMTs, such as Kafka Streams and KSQL, will be provided. This will help attendees make an informed decision about which approach is best for their specific use case.
Whether attendees are developers, data engineers, or data scientists, this talk will provide valuable insights into how Kafka Connect and SMTs can help streamline data processing workflows. Attendees will come away with a better understanding of how these tools work and how they can be used to solve common data integration challenges.
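As a concrete flavor of that modular, composable style, here is a hedged sketch of an SMT chain in a connector configuration (the connector class is hypothetical; the two transforms, `InsertField` and `MaskField`, ship with Kafka Connect and expect structured record values):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SmtChainConfig {
    public static Map<String, String> connectorConfig() {
        Map<String, String> config = new LinkedHashMap<>();
        // Hypothetical source connector producing structured records.
        config.put("connector.class", "com.example.JdbcSourceConnector");
        config.put("topic.prefix", "users-");
        // Apply two bundled SMTs in order: enrich first, then mask.
        config.put("transforms", "addSource,maskSsn");
        config.put("transforms.addSource.type",
                "org.apache.kafka.connect.transforms.InsertField$Value");
        config.put("transforms.addSource.static.field", "source");
        config.put("transforms.addSource.static.value", "crm-db");
        config.put("transforms.maskSsn.type",
                "org.apache.kafka.connect.transforms.MaskField$Value");
        config.put("transforms.maskSsn.fields", "ssn");
        return config;
    }
}
```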
"While Apache Kafka lacks native support for topic renaming, there are scenarios where renaming topics becomes necessary. This presentation will delve into the utilization of MirrorMaker 2.0 as a solution for renaming Kafka topics. It will illustrate how MirrorMaker 2.0 can efficiently facilitate the migration of messages from the old topic to the new one and how Kafka Connect Metrics can be employed to monitor the mirroring progress. The discussion will encompass the complexity of renaming Kafka topics, addressing certain limitations, and exploring potential workarounds when using MirrorMaker 2.0 for this purpose. Despite not being originally designed for topic renaming, MirrorMaker 2.0 has a suitable solution for renaming Kafka topics.
Blog post: https://engineering.hellofresh.com/renaming-a-kafka-topic-d6ff3aaf3f03
Evolution of NRT Data Ingestion Pipeline at Trendyol (Hosted by Confluent)
"Trendyol, Turkey's leading e-commerce company, is committed to positively impacting the lives of millions of customers. Our decision-making processes are entirely driven by data. As a data warehouse team, our primary goal is to provide accurate and up-to-date data, enabling the extraction of valuable business insights.
We utilize the benefits provided by Kafka and Kafka Connect to facilitate the transfer of data from the source to our analytical environment. We recently transitioned our Kafka Connect clusters from on-premise VMs to Kubernetes. This shift was driven by our desire to effectively manage rapid growth (marked by a growing number of producers, consumers, and daily messages) while ensuring proper monitoring and consistency. Consistency is crucial, especially where we employ Single Message Transforms to manipulate records, for example filtering them based on their keys or converting a JSON object into a JSON string.
Monitoring our clusters' health is key, and we achieve this through Grafana dashboards and alerts generated through kube-state-metrics. Additionally, Kafka Connect's JMX metrics, coupled with New Relic, are employed for comprehensive monitoring.
The session will aim to explain our approach to NRT data ingestion, outlining the role of Kafka and Kafka Connect, our transition journey to K8s, and the methods employed to monitor the health of our clusters.
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques (Hosted by Confluent)
"Join our lightning talk to delve into the strategies vital for maintaining a resilient Kafka service.
While proactive monitoring is key for issue prevention, failures will still occur. Rapid detection tools will enable you to identify and resolve problems before they impact end-users. This session explores the techniques employed by Kafka cloud providers for this detection, many of which are also applicable if you are managing independent Kafka clusters or applications.
The talk focuses on health-checking, a powerful tool that encompasses an application and its monitoring to validate Kafka environment availability. The session navigates through Kafka health-check methods, sharing best practices, identifying common pitfalls, and highlighting the monitoring of critical performance metrics like throughput and latency for early issue detection.
Attendees will gain valuable insights into the art of health-checking their Kafka environment, equipping them with the tools to identify and address issues before they escalate into critical problems. We invite all Kafka enthusiasts to join us in this talk to foster a deeper understanding of Kafka health-checking and ensure the continued smooth operation of your Kafka environment.
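One common health-checking pattern in this vein is a synthetic produce/consume round trip. A hedged sketch (the probe topic is assumed to exist, be small, and have short retention; addresses are placeholders):

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;

public class KafkaHealthCheck {
    public static boolean probe(String bootstrap, Duration timeout) throws Exception {
        String token = "probe-" + System.nanoTime(); // unique per probe

        Properties cp = new Properties();
        cp.put("bootstrap.servers", bootstrap);
        cp.put("group.id", "healthcheck-" + token); // fresh group each time
        cp.put("auto.offset.reset", "earliest");
        cp.put("key.deserializer", StringDeserializer.class.getName());
        cp.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(cp)) {
            consumer.subscribe(List.of("healthcheck"));

            Properties pp = new Properties();
            pp.put("bootstrap.servers", bootstrap);
            pp.put("key.serializer", StringSerializer.class.getName());
            pp.put("value.serializer", StringSerializer.class.getName());
            long start = System.currentTimeMillis();
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(pp)) {
                producer.send(new ProducerRecord<>("healthcheck", token, token)).get();
            }

            // End-to-end path is healthy only if we read our own token back in time.
            long deadline = start + timeout.toMillis();
            while (System.currentTimeMillis() < deadline) {
                for (var record : consumer.poll(Duration.ofMillis(200))) {
                    if (token.equals(record.value())) {
                        System.out.println("round trip: "
                                + (System.currentTimeMillis() - start) + " ms");
                        return true;
                    }
                }
            }
            return false; // probe timed out: raise an alert
        }
    }
}
```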
Exactly-once Stream Processing with Arroyo and Kafka (Hosted by Confluent)
"Stream processing systems traditionally gave their users the choice between at least once processing and at most once processing: accepting duplicate data or missing data. But ideally we would provide exactly-once processing, where every event in the input data is represented exactly once in the output.
Kafka provides a transaction API that enables exactly-once processing when using Kafka as your source and sink. But this API has turned out not to be well suited for use by high-level streaming systems, requiring various workarounds to still provide transactional processing.
In this talk, I’ll cover how the transaction API works, how systems like Arroyo and Flink have used it to build exactly-once support, and how improvements to the transactional API will enable better end-to-end support for consistent stream processing.
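For reference, the transaction API in question looks like this in the Java client; a hedged consume-transform-produce sketch (topics and IDs are placeholders):

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.KafkaException;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;

public class ExactlyOncePipeline {
    public static void main(String[] args) {
        Properties cp = new Properties();
        cp.put("bootstrap.servers", "localhost:9092");
        cp.put("group.id", "eos-demo");
        cp.put("enable.auto.commit", "false");
        cp.put("isolation.level", "read_committed"); // only see committed txns
        cp.put("key.deserializer", StringDeserializer.class.getName());
        cp.put("value.deserializer", StringDeserializer.class.getName());

        Properties pp = new Properties();
        pp.put("bootstrap.servers", "localhost:9092");
        pp.put("transactional.id", "eos-demo-producer-1"); // stable per instance
        pp.put("key.serializer", StringSerializer.class.getName());
        pp.put("value.serializer", StringSerializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(cp);
             KafkaProducer<String, String> producer = new KafkaProducer<>(pp)) {
            producer.initTransactions();
            consumer.subscribe(List.of("input"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                if (records.isEmpty()) continue;
                try {
                    producer.beginTransaction();
                    Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
                    for (ConsumerRecord<String, String> r : records) {
                        producer.send(new ProducerRecord<>("output",
                                r.key(), r.value().toUpperCase()));
                        offsets.put(new TopicPartition(r.topic(), r.partition()),
                                new OffsetAndMetadata(r.offset() + 1));
                    }
                    // Commit consumed offsets atomically with the produced records.
                    producer.sendOffsetsToTransaction(offsets, consumer.groupMetadata());
                    producer.commitTransaction();
                } catch (KafkaException e) {
                    producer.abortTransaction(); // output and offsets roll back together
                }
            }
        }
    }
}
```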
"In this talk, we will explore the exciting world of IoT and computer vision by presenting a unique project: Fish Plays Pokemon. Using an ESP Eye camera connected to an ESP32 and other IoT devices, to monitor fish's movements in an aquarium.
This project showcases the power of IoT and computer vision, demonstrating how even a fish can play a popular video game. We will discuss the challenges we faced during development, including real-time processing, IoT device integration, and Kafka message consumption.
By the end of the talk, attendees will have a better understanding of how to combine IoT, computer vision, and serverless cloud services to create innovative projects. They will also learn how to integrate IoT devices with Kafka to simulate keyboard behavior, opening up endless possibilities for real-time interactions between the physical and digital worlds.
What is tiered storage, and what is it good for? After this session you will know how to leverage the tiered storage feature to enable longer retention than the storage attached to brokers allows. You will get acquainted with the different configuration options and know what to expect when you enable the feature, such as when the first upload to the remote object storage will take place.
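To illustrate the configuration surface, here is a hedged sketch (topic name and retention values are placeholders; it assumes a broker with tiered storage enabled, i.e. Kafka 3.6+ with `remote.log.storage.system.enable=true` and a remote storage plugin configured) that turns on remote storage for one topic via the Admin client:

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class EnableTieredStorage {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        try (Admin admin = Admin.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "events");
            List<AlterConfigOp> ops = List.of(
                    // Offload closed log segments to the remote object store.
                    new AlterConfigOp(new ConfigEntry("remote.storage.enable", "true"),
                            AlterConfigOp.OpType.SET),
                    // Keep only ~1 hour on broker-local disks...
                    new AlterConfigOp(new ConfigEntry("local.retention.ms", "3600000"),
                            AlterConfigOp.OpType.SET),
                    // ...while total retention (local + remote) is 30 days.
                    new AlterConfigOp(new ConfigEntry("retention.ms", "2592000000"),
                            AlterConfigOp.OpType.SET));
            admin.incrementalAlterConfigs(Map.of(topic, ops)).all().get();
        }
    }
}
```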
Building a Self-Service Stream Processing Portal: How And Why (Hosted by Confluent)
"Real-time 24/7 monitoring and verification of massive data is challenging – even more so for the world’s second largest manufacturer of memory chips and semiconductors. Tolerance levels are incredibly small, any small defect needs to be identified and dealt with immediately. The goal of semiconductor manufacturing is to improve yield and minimize unnecessary work.
However, even with real-time data collection, the data was not easy to manipulate by users and it took many days to enable stream processing requests – limiting its usefulness and value to the business.
You’ll hear why SK hynix switched to Confluent and how we developed a self-service stream process portal on top of it. Now users have an easy-to-use service to manipulate the data they want.
Results have been impressive, stream processing requests are available the same day – previously taking 5 days! We were also able to drive down costs by 10% as stream processing requests no longer require additional hardware.
What you’ll take away from our talk:
- What were the pain points in the previous environment
- How we transitioned to Confluent without service downtime
- Creating a self-service stream processing portal built on top of Connect and ksqlDB
- Use case of the stream processing portal
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ... (Hosted by Confluent)
"Discover how default configurations might impact ingestion times, especially when dealing with large files. We'll explore a real-world scenario with a 20,000,000+ line file, assessing metrics and exploring the bottleneck in the default setup. Understand the intricacies of batch size calculations and how to optimize them based on your unique data characteristics.
Walk away with actionable insights as we showcase a practical example, turning a 7-hour ingestion process into a mere 30 minutes for over 30,000,000 records in a Kafka topic. Uncover metrics, configurations, and best practices to elevate the performance of your Kafka Connect CSV source connectors. Don't miss this opportunity to optimize your data pipeline and ensure smooth, efficient data flow.
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ... (Hosted by Confluent)
"In order to meet the current and ever-increasing demand for near-zero RPO/RTO systems, a focus on resiliency is critical. While Kafka offers built-in resiliency features, a perfect blend of client and cluster resiliency is necessary in order to achieve a highly resilient Kafka client application.
At Fidelity Investments, Kafka is used for a variety of event streaming needs such as core brokerage trading platforms, log aggregation, communication platforms, and data migrations. In this lightning talk, we will discuss the governance framework that has enabled producers and consumers to achieve their SLAs during unprecedented failure scenarios. We will highlight how we automated resiliency tests through chaos engineering and tightly integrated observability dashboards for Kafka clients to analyze and optimize client configurations. And finally, we will summarize the chaos test suite and the "test, test and test" mantra that are helping Fidelity Investments reach its goal of a future with zero downtime.
Navigating Private Network Connectivity Options for Kafka ClustersHostedbyConfluent
"There are various strategies for securely connecting to Kafka clusters between different networks or over the public internet. Many cloud providers even offer endpoints that privately route traffic between networks and are not exposed to the internet. But, depending on your network setup and how you are running Kafka, these options ... might not be an option!
In this session, we’ll discuss how you can use SSH bastions or a self-managed PrivateLink endpoint to establish connectivity to your Kafka clusters without exposing brokers directly to the internet. We explain the required network configuration, and show how we at Materialize have contributed to librdkafka to simplify these scenarios and avoid fragile workarounds."
Apache Flink: Building a Company-wide Self-service Streaming Data PlatformHostedbyConfluent
"In my talk, we will examine all the stages of building our self-service Streaming Data Platform based on Apache Flink and Kafka Connect, from the selection of a solution for stateful streaming data processing, right up to the successful design of a robust self-service platform, covering the challenges that we’ve met.
I will share our experience in providing non-Java developers with a company-wide self-service solution, which allows them to quickly and easily develop their streaming data pipelines.
Additionally, I will highlight specific business use cases that would not have been implemented without our platform."
Explaining How Real-Time GenAI Works in a Noisy PubHostedbyConfluent
"Almost everyone has heard about large language models, and tens of millions of people have tried out OpenAI ChatGPT and Google Bard. However, the intricate architecture and underlying mathematics driving these remarkable systems remain elusive to many.
LLMs are fascinating - so let's grab a drink and dive deep into how these systems are built. In the length of time it takes to enjoy a round of drinks, you'll understand the inner workings of these models. We'll take our first sip of word vectors, enjoy the refreshing taste of the transformer, and drain a glass understanding how these models are trained on phenomenally large quantities of data.
Large language models for your streaming application - explained with a little maths and a lot of pub stories"
"Monitoring is a fundamental operation when running Kafka and Kafka applications in production. There are numerous metrics available when using Kafka, however the sheer number is overwhelming, making it challenging to know where to start and how to properly utilise them.
This session will introduce you to some of the key metrics that should be monitored and best practices in fine-tuning your monitoring. We will delve into which metrics are the key indicators of a cluster’s availability and performance, and which are the most helpful when debugging client applications."
Kafka Streams relies on state restoration to maintain standby tasks as a failure-recovery mechanism and to restore state after rebalances. When you scale your application instances up or down, you need to know the current state of the restoration process for each active and standby task in order to keep restoration as short as possible. During this presentation, you will get an understanding of how KIP-869 provides valuable information about active task restoration after a rebalance, and how KIP-988 opens a window into the continuous process of standby restoration. Whenever you need to decide whether to scale your application instances up or down, both KIPs will be an invaluable ally.
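As a rough illustration of the visibility these listeners give you, a minimal sketch using the long-standing global restore listener for active task restoration (KIP-988's standby listener is analogous; the plain stdout logging is an assumption made for brevity):

import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.streams.processor.StateRestoreListener;

public class LoggingRestoreListener implements StateRestoreListener {
  @Override
  public void onRestoreStart(TopicPartition tp, String store, long startOffset, long endOffset) {
    System.out.printf("restore start %s %s: %d -> %d%n", store, tp, startOffset, endOffset);
  }
  @Override
  public void onBatchRestored(TopicPartition tp, String store, long batchEndOffset, long numRestored) {
    System.out.printf("restored %d records into %s %s%n", numRestored, store, tp);
  }
  @Override
  public void onRestoreEnd(TopicPartition tp, String store, long totalRestored) {
    System.out.printf("restore done %s %s: %d records total%n", store, tp, totalRestored);
  }
}
// register before start(): streams.setGlobalStateRestoreListener(new LoggingRestoreListener());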
Mastering Kafka Producer Configs: A Guide to Optimizing PerformanceHostedbyConfluent
"In this talk, we will dive into the world of Kafka producer configs and explore how to understand and optimize them for better performance. We will cover the different types of configs, their impact on performance, and how to tune them to achieve the best results. Whether you're new to Kafka or a seasoned pro, this session will provide valuable insights and practical tips for improving your Kafka producer performance.
- Introduction to Kafka producer internals and workflow
- Understanding configs like linger.ms, batch.size, and buffer.memory and their impact on performance
- Learning about configs like max.block.ms, delivery.timeout.ms, request.timeout.ms, and retries that make the producer more resilient
- Discussing configs like enable.idempotence, max.in.flight.requests.per.connection, and transaction-related configs to achieve delivery guarantees
- Q&A session with attendees to address specific questions and concerns."
Data Contracts Management: Schema Registry and BeyondHostedbyConfluent
"Data contracts are one of the hottest topics in the data management community. A data contract is a formal agreement between a data producer and its consumers, aimed at reducing data downtime and improving data quality. Schemas are an important part of data contracts, but they are not the only relevant element.
In this talk, we’ll:
1. see why data contracts are so important but also difficult to implement;
2. identify the characteristics of a well-designed data contract: the anatomy of a data contract, its main elements, and how to formally describe them;
3. show how to manage the lifecycle of a data contract leveraging Confluent Platform's services."
"In the realm of stateful stream processing, Apache Flink has emerged as a powerful and versatile platform. However, the conventional SQL-based approach often limits the full potential of Flink applications.
We will delve into the benefits of adopting a code-first approach, which provides developers with greater control over application logic, facilitates complex transformations, and enables more efficient handling of state and time. We will also discuss how the code-first approach can lead to more maintainable and testable code, ultimately improving the overall quality of your Flink applications.
Whether you're a seasoned Flink developer or just starting your journey, this talk will provide valuable insights into how a code-first approach can revolutionize your stream processing applications."
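To make "code-first" concrete, a tiny hedged DataStream sketch (the event strings and the transformation logic are invented for illustration):

import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CodeFirstSketch {
  public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.fromElements("click:home", "click:cart", "view:home")
       .map(event -> event.split(":")[0]).returns(Types.STRING)  // extract the event type
       .keyBy(type -> type, Types.STRING)                        // explicit control over keying
       .map(type -> "seen " + type).returns(Types.STRING)        // arbitrary per-key logic, no SQL constraints
       .print();
    env.execute("code-first-sketch");
  }
}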
Debezium vs. the World: An Overview of the CDC EcosystemHostedbyConfluent
"Change Data Capture (CDC) has become a commodity in data engineering, much in part due to the ever-rising success of Debezium [1]. But is that all there is? In this lightning talk, we’ll outline the current state of the CDC ecosystem, and understand why adopting a Debezium alternative is still a hard sell. If you’ve ever wondered what else is out there, but can’t keep up with the sprawling of new tools in the ecosystem; we’ll wrap it up for you!
[1] https://debezium.io/"
Beyond Tiered Storage: Serverless Kafka with No Local DisksHostedbyConfluent
"Separation of compute and storage has become the de-facto standard in the data industry for batch processing.
The addition of tiered storage to open source Apache Kafka is the first step in bringing true separation of compute and storage to the streaming world.
In this talk, we'll discuss in technical detail how to take the concept of tiered storage to its logical extreme by building an Apache Kafka protocol compatible system that has zero local disks.
Eliminating all local disks in the system requires not only separating storage from compute, but also separating data from metadata. This is a monumental task that requires reimagining Kafka's architecture from the ground up, but the benefits are worth it.
This approach enables a stateless, elastic, and serverless deployment model that minimizes operational overhead and also drives inter-zone networking costs to almost zero."
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
- UI automation introduction
- UI automation sample
- Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Let's dive deeper into the world of ODC! Ricardo Alves (OutSystems) will join us to tell all about the new Data Fabric. After that, Sezen de Bruijn (OutSystems) will get into the details on how to best design a sturdy architecture within ODC.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Connector Corner: Automate dynamic content and events by pushing a buttonDianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, companies that fail to adapt and embrace new ideas often struggle to keep up with the competition. Fostering a culture of innovation, however, takes real work: it requires vision, leadership, and a willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud, and open source: exploring how these areas are likely to mature and develop over the short and long term, and then considering how organisations can position themselves to adapt and thrive.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
JMeter webinar - integration with InfluxDB and GrafanaRTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
Developing Kafka Streams Applications with Upgradability in Mind with Neil Buesing | Kafka Summit London 2022
1. Designing your Kafka Streams Applications with Upgradability In Mind
Kafka Summit 2022 London
Neil Buesing, Rill Data
@nbuesing nbuesing
2. Background
• Principal Solution Architect, Rill Data, Inc.
• Work with clients streaming data into our platform
• 5+ years experience with Kafka Streams
• Speak on topics I'm passionate about with Apache Kafka and Kafka Streams
• Working from home with the best pair-programmer
3. Goals
1. Confidence you can upgrade your application
2. Support for Data Recovery
• e.g., data corrupted due to bug in upgrade
3. Options
• e.g., responsibility
4. Reduce Developer time to achieve upgrade
4. Topics
1. Name processors
2. Name state stores
3. Minimize rebuilding of state
4. Data evolution
5. Partitioning
6. Microservices
7. Backup & Restore
8. Repartitioning
9. Windowed Stores
10. Circuit Breakers
11. Switches
7. Name Your Processors
• Syntax: add naming to existing configuration; Named is added to those w/out
• Produced.as(), Grouped.as(), Joined.as(), Consumed.as(), Named.as()
• Gotchas - builders & static construction behavior
• Produced.with(Serdes.String(), vSerde).as("name")
• Produced.as("name").withKeySerde(Serdes.String()).withValueSerde(vSerde)
• Produced.<String,PurchaseOrder>as("name")
.withKeySerde(Serdes.String())
.withValueSerde(vSerde)
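A minimal sketch pulling these together so node names stay stable when operators are added later (the topic names, serdes, and logic are placeholders):

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Named;
import org.apache.kafka.streams.kstream.Produced;

StreamsBuilder builder = new StreamsBuilder();
builder.stream("orders",
        Consumed.<String, String>as("consume-orders")
                .withKeySerde(Serdes.String()).withValueSerde(Serdes.String()))
       .filter((k, v) -> v != null, Named.as("filter-non-null-orders"))
       .mapValues(v -> v.toUpperCase(), Named.as("normalize-order"))
       .to("orders-normalized",
        Produced.<String, String>as("produce-normalized")
                .withKeySerde(Serdes.String()).withValueSerde(Serdes.String()));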
13. Name Your State Stores
• The most important thing you can do to make upgrades easier
• Simple
KTable<String, User> users =
builder.table(options.getUserTopic(),
Consumed.as("ktable-users"),
Materialized.as("user-table"));
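The payoff shows up later: with a stable store name, the derived <application.id>-user-table-changelog topic and interactive queries keep working across releases. A hedged sketch, assuming a running KafkaStreams instance named streams and the User type from above:

import org.apache.kafka.streams.StoreQueryParameters;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

// query the named store; the name survives topology changes around it
ReadOnlyKeyValueStore<String, User> store =
    streams.store(StoreQueryParameters.fromNameAndType(
        "user-table", QueryableStoreTypes.keyValueStore()));
User user = store.get("user-42");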
15. Topology
• Print out topology on application start
final Topology topology = streamsBuilder(options).build(p);
log.info("Topology:\n" + topology.describe());
• Visualize with
• https://zz85.github.io/kafka-streams-viz/
21. Data Evolution - JSON
import java.util.LinkedHashMap;
import java.util.Map;
import com.fasterxml.jackson.annotation.JsonAnyGetter;
import com.fasterxml.jackson.annotation.JsonAnySetter;

public class UnmappedProperties {
private final Map<String, Object> map = new LinkedHashMap<>();
@JsonAnyGetter
public Map<String, Object> getUnknownProperties() {
return map;
}
@JsonAnySetter
public void setUnknownProperty(String key, Object value) {
map.put(key, value);
}
}
22. Data Evolution - JSON
import com.fasterxml.jackson.annotation.JsonInclude;
import com.fasterxml.jackson.annotation.JsonUnwrapped;

@JsonInclude(JsonInclude.Include.NON_NULL)
public class Product {
private String sku;
@JsonUnwrapped
private UnmappedProperties unmappedProperties = new UnmappedProperties();
public Product(String sku) {
this.sku = sku;
}
}
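A self-contained round-trip of the same pattern (the Item class and its fields are hypothetical; public field access is used for brevity):

import java.util.LinkedHashMap;
import java.util.Map;
import com.fasterxml.jackson.annotation.JsonAnyGetter;
import com.fasterxml.jackson.annotation.JsonAnySetter;
import com.fasterxml.jackson.databind.ObjectMapper;

public class Item {
  public String sku;
  private final Map<String, Object> unknown = new LinkedHashMap<>();

  @JsonAnyGetter
  public Map<String, Object> any() { return unknown; }

  @JsonAnySetter
  public void set(String key, Object value) { unknown.put(key, value); }

  public static void main(String[] args) throws Exception {
    ObjectMapper mapper = new ObjectMapper();
    // "color" was added by a newer producer; this version doesn't model it
    Item item = mapper.readValue("{\"sku\":\"abc-123\",\"color\":\"blue\"}", Item.class);
    System.out.println(mapper.writeValueAsString(item)); // {"sku":"abc-123","color":"blue"}
  }
}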
23. Data Evolution - JSON
• Risk/Pitfall
• Data type changes can break this approach
• Validate 3rd-party inputs
• Implement a clearUnknownProperties()
24. Data Evolution - Avro
• Evolution…
• part of Avro's library
• leveraged by Confluent's Schema Registry
25. Data Evolution - Avro
• FULL
• ability to roll back
• streams apps are producers and consumers (forward and backward are harder)
• V1 ⟷ V2 and V2 ⟷ V3
• FULL-TRANSITIVE
• Ability to handle aggregations of older versions indefinitely
• V1 ⟷ V2 and V2 ⟷ V3 and V1 ⟷ V3
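To ground what a FULL-compatible change looks like, the canonical safe evolution is adding a field with a default; a hedged example (schemas invented for illustration):

V1:
{"type": "record", "name": "User", "fields": [
  {"name": "id", "type": "string"}
]}

V2 (old readers ignore the new field; new readers fill in the default when reading V1 data):
{"type": "record", "name": "User", "fields": [
  {"name": "id", "type": "string"},
  {"name": "nickname", "type": ["null", "string"], "default": null}
]}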
26. Data Evolution - Protobuf
• Tag numbers are encoded, field names are not
• optional ⟷ repeated
• no encoding differences: writing a repeated value and reading it as an optional value has "last one wins"
• Renaming fields ⟶ full evolution
• Renumbering tags ⟶ no evolution
27. Avoid Schema Registry Serialization for Keys
• A simple addition of a default attribute — breaks partitioning (the schema ID is embedded in the serialized key bytes, so the bytes, and thus the partition hash, change)
• Exceptions
• output topics for sink connectors (e.g. JDBC Sink)
28. Data Evolution (takeaways)
• Full (Forward and Backward) - easier to roll back your applications
• Full Transitive - easier to handle old data in your aggregates
• JSON, Avro, and Protobuf all have their own nuances - understand them
32. Partitioning
• Plan for growth (but…)
• Strive for even workloads
• Partitioning for storage is as important as (if not more so than) partitioning for throughput
• Selecting a partition count for your Streams applications
• 12 partitions better than 10 (divisors 1,2,3,4,6,12 vs. 1,2,5,10)
• avoid primes, e.g., 5 (divisors 1,5)
• 24 (divisors 1,2,3,4,6,8,12,24; but at what cost?)
33. Partitioning
• If repartitioning is easy
• 4 partitions
• If repartitioning is hard
• 8 or 12 partitions
• 24 partitions (large state stores)
• consider separation into multiple microservices
34. Validate partitioning on ingestion
• Peek - Log and Exception
builder.stream("input-topic")
.peek((key, value) -> {… key != value.getKey() …})
• Filter - Log and Ignore
builder.stream("input-topic")
.filter((key, value) -> {… key != value.getKey() …})
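Fleshing out the "log and ignore" variant (Order, its getKey() accessor, and the slf4j log field are assumptions standing in for a value type that carries its own key):

import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Named;

StreamsBuilder builder = new StreamsBuilder();
builder.<String, Order>stream("input-topic")
    .filter((key, value) -> {
        boolean ok = key != null && key.equals(value.getKey());
        if (!ok) {
            // mispartitioned record: surface it, then drop it
            log.warn("record key {} does not match payload key {}; dropping", key, value.getKey());
        }
        return ok;
    }, Named.as("validate-partition-key"));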
36. Micro Services
• easier to deploy
• more uniform allocation of work
• minimize downtime during restarts
• easier to understand
• threading
• storage
43. Backup and Restore
• transformValues cannot be created before aggregate/reduce since DSL requires store to be materialized first.
• aggregate and reduce do not have access to headers
• if the DSL adopts the updated PAPI refactoring, it would then be able to.
• understand how store caching and the commit interval work
44. Backup and Restore
• a set of -changelog topics is not an Event Source based system.
45. co-partitioning
• partitioning of source and restore topics must match
• co-partitioning validation isn't catching this.
• behavior very confusing when they are not the same (speaking from experience 🤦)
49. Repartitioning
• Leverage Built-in Backup and Restore
• On/Off filters so you can discard while bringing the application online
• Version your application
• "foo.v1" ➟ "foo.v2"
51. Repartitioning
• Considerations around making restore a separate application
• Downtime
• Cut-over
• Using `application.id` for backup
• Keeping the code up to date
52. Window Stores
Type     | Boundary | Examples                                            | # records for key @ point in time | Fixed Size
Tumbling | Epoch    | [8:00, 8:30) [8:30, 9:00)                           | single                  | Yes
Hopping  | Epoch    | [8:00, 8:30) [8:15, 8:45) [8:30, 8:45) [8:45, 9:00) | constant                | Yes
Sliding  | Record   | [8:02, 8:32] [8:20, 8:50] [8:21, 8:51]              | variable                | Yes
Session  | Record   | [8:02, 8:02] [8:02, 8:10] [9:10, 12:56]             | single (by tombstoning) | No
53. Window Stores
• Fixed Windows do NOT store window size (or end timestamp) in the message
• Release new version and co-exist with old version
• Wait to use new version until windows are "ready"
55. Window Stores
• New Version Challenges
• Very long windows make it harder to wait for cut-over
• epoch
• hydration
• replay incoming events
• How ("When") to have clients cut over to new version
• earliest, latest, or specific timestamp
• circuit breaker — moves burden to streams development team.
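For the "specific timestamp" option, a hedged sketch of a downstream consumer seeking to the cut-over point (the consumer instance and cutoverEpochMillis are assumptions):

import java.util.Map;
import java.util.stream.Collectors;
import org.apache.kafka.clients.consumer.OffsetAndTimestamp;
import org.apache.kafka.common.TopicPartition;

// assumptions: an assigned KafkaConsumer<?, ?> consumer and a long cutoverEpochMillis
Map<TopicPartition, Long> query = consumer.assignment().stream()
    .collect(Collectors.toMap(tp -> tp, tp -> cutoverEpochMillis));
Map<TopicPartition, OffsetAndTimestamp> offsets = consumer.offsetsForTimes(query);
offsets.forEach((tp, ot) -> {
    if (ot != null) consumer.seek(tp, ot.offset()); // null: no record at/after the timestamp
});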
56. application.id & versions
• Versions should be a suffix on application.id
• ".v1", ".v2"
• Leverage ACLs with prefix on application.id
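A minimal sketch of the suffix convention and the matching prefixed ACL (the application name is a placeholder):

import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

Properties props = new Properties();
// internal topics and the consumer group derive from this id,
// so ".v2" builds its state alongside (not on top of) ".v1"
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "order-enrichment.v2");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
// grant on the shared prefix (KIP-290 prefixed resource patterns), e.g.:
// kafka-acls --add --allow-principal User:streams --operation All \
//   --topic order-enrichment. --resource-pattern-type prefixed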
58. Circuit Breakers
• Starting and Stopping the Circuit Breaker application controls flow of messages
• Unable to stop producers
• Complicated streams application
• in-flight data needs to be handled by same version
• no duplicate processing between version releases
59. Circuit Breakers
• Added Complexity
• Extra Application
• Extra Topic
• but can have smaller retention time (original is source-of-truth)
• Extra Deployments
60. Circuit Breaker handy for ksqlDB
• Placing a Kafka Streams circuit-breaker application gives control in front of ksqlDB where consumer group selection is not possible
• KSQL query starts from latest
• KLIP-28 "create or replace" solves many issues (0.12.0)
• KLIP-22 "add consumer group id" (proposal - no traction)
62. Switches
• Burden on our deployment, not down-stream applications
• no offset management changes
63. Circuit Breakers & Switches
• Do not adopt these w/out need
• Add-in only if (and when) needed
64. Topics
1. Name processors
2. Name state stores
3. Minimize rebuilding of state
4. Data evolution
5. Partitioning
6. Microservices
7. Backup & Restore
8. Repartitioning
9. Windowed Stores
10. Circuit Breakers
11. Switches
65. Takeaways
• Do Right Away
• Name your State Stores
• Name your Processors
• Meaningful Partition Size
• Suffix based versioning
• Start Planning
• Backup/Restore & Repartitioning
• External Applications & Teams
• Release Scheduling
• Data Evolution Strategy