Elasticity vs. State? Exploring Kafka Streams Cassandra State Store

Elasticity vs. State? A New Kafka Streams State Store
Hartmut Armbruster, Software Engineer at Thriving.dev

Hartmut Armbruster
■ Software Architect & Developer
■ Spent the past years working with distributed systems and real-time data processing with clients such as HSBC, NEX Group plc, and Deutsche Bahn.
■ Striving to see the bigger picture; passionate about architecture: combining, integrating, and bringing all things together.

Presentation Agenda
■ Recap: Kafka Streams, State Stores, RocksDB
■ Managing State: Challenges & Opportunities
■ Kafka Streams Cassandra State Store
■ ‘Drop-in’ State Store Alternative - Quickstart
■ Data Models & Supported Store Types
■ Interactive Queries
■ Conclusions, Limitations, Next Steps

Recap: Kafka Streams, State Store, RocksDB

Apache Kafka
■ Distributed Streaming Platform
■ Stores its data in topics as immutable records
■ Each topic has partitions

Kafka Streams
■ Functional Java API
■ Stream Processing on data stored in Kafka
■ Processor Topology
➔ Source(s)
➔ Stateless Stream Processors
◆ filter, map, flatMap
➔ Stateful Stream Processors
◆ join, group, window, aggregate
➔ Sink(s)

Stateful Processors -> State Stores
■ State is local, distributed across replicas
■ State Store Types
➔ RocksDB
➔ InMemory
■ Writes go to the local store + changelog topics
■ ‘State Restore’ from the changelog if local state is lost
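
The write path and the ‘State Restore’ described above can be pictured with a toy model. This is only a sketch in plain Java collections, not Kafka Streams code: the class `ChangelogBackedStore` and its methods are hypothetical names, a `HashMap` stands in for the local store and a `List` stands in for the changelog topic.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model (assumption): plain collections stand in for RocksDB and the changelog topic.
class ChangelogBackedStore {
    private final Map<String, Long> local = new HashMap<>();
    private final List<Map.Entry<String, Long>> changelog;

    ChangelogBackedStore(List<Map.Entry<String, Long>> changelog) {
        this.changelog = changelog;
    }

    // Every write updates the local store and appends a record to the changelog.
    void put(String key, Long value) {
        local.put(key, value);
        changelog.add(Map.entry(key, value));
    }

    Long get(String key) {
        return local.get(key);
    }

    // 'State restore': rebuild an empty local store by replaying the changelog in order.
    // Later records for the same key overwrite earlier ones.
    static ChangelogBackedStore restore(List<Map.Entry<String, Long>> changelog) {
        ChangelogBackedStore restored = new ChangelogBackedStore(changelog);
        for (Map.Entry<String, Long> record : changelog) {
            restored.local.put(record.getKey(), record.getValue());
        }
        return restored;
    }
}
```

Replaying touches every changelog record, so in this model the restore time grows with the size of the changelog, which is the cost the following slides are concerned with.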

Managing State: Challenges & Opportunities

Challenges for Stateful Topologies (1)
Example Use Case
■ Real-time data processing
■ Considerably large state
⚡ App Upgrade
⚡ Security Patching
⚡ Infrastructure Failure
⚡ Scaling up/down
➔ State restore required 😱 …may take minutes/hours!!
Mitigation
➔ Use ‘persistent’ stores (RocksDB)
➔ Run Stateful (keep disks -> local state)
➔ Prevent/Minimise Rebalances
➔ Static Group Membership ⚠️
➔ Standby Replicas
➔ Warmup Replicas
➔ Moving Restoration to a Dedicated Thread (KAFKA-10199)
➔ Tuning Restore Consumer Config
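
Several of the mitigations listed on this slide are plain configuration. The following is a minimal sketch of how they might be set, assuming the standard Kafka Streams and consumer property names; the class name `RestoreMitigationConfig` and all values are illustrative assumptions, not recommendations from the talk. It uses only `java.util.Properties`, so the result would still need to be passed to a Kafka Streams application.

```java
import java.util.Properties;

// Sketch only: property names are standard Kafka Streams / consumer configs,
// the values are placeholders chosen for illustration.
class RestoreMitigationConfig {
    static Properties build(String instanceId) {
        Properties props = new Properties();
        // Standby replicas: keep a continuously updated copy of each store on another instance.
        props.setProperty("num.standby.replicas", "1");
        // Warmup replicas: cap on extra replicas that warm up state before a task is moved.
        props.setProperty("max.warmup.replicas", "2");
        // Static group membership: a stable id per instance, so a quick restart
        // within the session timeout does not trigger a rebalance.
        props.setProperty("group.instance.id", instanceId);
        props.setProperty("session.timeout.ms", "60000");
        // Restore consumer tuning: settings prefixed with "restore.consumer."
        // apply only to the consumer that reads changelog topics during restore.
        props.setProperty("restore.consumer.max.poll.records", "5000");
        return props;
    }
}
```

A call such as `RestoreMitigationConfig.build("orders-app-0")` would produce the settings for one instance; each instance needs its own distinct id for static membership to work.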

Challenges for Stateful Topologies (2)
Idle Streams Group Members
■ Following rebalancing & task re-assignment, new members get warmup tasks assigned
■ Warmup Replicas catch up
■ Transition from ‘Warmup -> Active’ Never Happens
➔Unbalanced assignment
➔Idle Replicas
◆ Performance Degradation
◆ Unused Allocated Resources 💸

Kafka Streams Cassandra State Store

Kafka Streams Cassandra State Store
thriving-dev/kafka-streams-cassandra-state-store

Why Cassandra / ScyllaDB?
■ Distributed Architecture
■ Scalability (r/w & data)
■ High Availability through Data Replication
■ High Fault Tolerance
■ High Performance
■ (Expiring Data with TTL)

Quickstart (1) - Get it!
Maven
<dependency>
    <groupId>dev.thriving.oss</groupId>
    <artifactId>kafka-streams-cassandra-state-store</artifactId>
    <version>0.8.4</version>
</dependency>
Gradle
implementation 'dev.thriving.oss:kafka-streams-cassandra-state-store:0.8.4'
+
implementation 'com.scylladb:java-driver-core:4.17.0.0'

Quickstart (2) - High-level DSL
(RocksDB is used by default)
KTable<Long,String> table = builder.table("topicName");
Cassandra ‘partitionedKeyValueStore’
KTable<Long,String> table = builder.table("topicName",
    Materialized.<Long,String>as(
            CassandraStores.builder(session, "store-name")
                .partitionedKeyValueStore())
        .withKeySerde(Serdes.Long())
        .withValueSerde(Serdes.String())
        .withLoggingDisabled()
);

Quickstart (3) - Builder
Basic usage example:
CassandraStores.builder(session, "orders")
    .partitionedKeyValueStore();
Advanced usage example:
CassandraStores.builder(session, "orders")
    .withKeyspace("order_processing")
    .withDmlExecutionProfile("kstreams-dml")
    .withTableOptions("""
        compaction = { 'class' : 'LeveledCompactionStrategy' }
        AND default_time_to_live = 86400
        """)
    .withCountAllEnabled()
    .partitionedKeyValueStore();

Under the hood - Data Model
➔Key/Value are of type BLOB
◆ Any Payloads supported
◆ String, Number, Avro, Protobuf, …
◆ Serialized -> bytes
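
To make ‘Serialized -> bytes’ concrete, here is a small sketch of what ends up in a BLOB column for the `Long` key and `String` value used in the quickstart. It assumes the default encodings of Kafka's built-in Long and String serdes (8 big-endian bytes and UTF-8); the class `BlobEncoding` and its methods are hypothetical helpers, not part of the library.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Sketch: mimics the byte layout produced by Serdes.Long() and Serdes.String().
class BlobEncoding {
    // A Long key becomes 8 bytes, big-endian (ByteBuffer's default byte order).
    static ByteBuffer keyToBlob(long key) {
        ByteBuffer buf = ByteBuffer.allocate(Long.BYTES);
        buf.putLong(key);
        buf.flip(); // switch from writing to reading
        return buf;
    }

    // A String value becomes its UTF-8 bytes.
    static ByteBuffer valueToBlob(String value) {
        return ByteBuffer.wrap(value.getBytes(StandardCharsets.UTF_8));
    }

    // duplicate() keeps the caller's buffer position untouched while reading.
    static long blobToKey(ByteBuffer blob) {
        return blob.duplicate().getLong();
    }

    static String blobToValue(ByteBuffer blob) {
        return StandardCharsets.UTF_8.decode(blob.duplicate()).toString();
    }
}
```

Because the store only ever sees bytes, the same two columns can hold Avro or Protobuf payloads as well; only the serde on the Kafka Streams side changes.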

Under the hood - Primary Keys
partitionedKeyValueStore
➔ Composite Primary Key
◆ Having `key` as Clustering Key allows additional IQs such as `range` & `prefixScan`
◆ No. of partitions defined by Streams tasks (source topic partitions)
⚠️ Large Partitions
◆ BATCH inserts possible & efficient
globalKeyValueStore
➔ Store Entry `key` as Primary Key
◆ High Cardinality, works for any data volumes
◆ No Support for `range` & `prefixScan`
◆ Lookups possible from any instance, knowledge/context of partition/task is not required
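
The trade-off between the two layouts can be illustrated with in-memory maps. This is an analogy under stated assumptions, not the library's implementation: `StoreLayouts`, `Partitioned` and `Global` are hypothetical names, keys are strings rather than bytes for readability, a `TreeMap` per partition plays the role of rows ordered by clustering key, and a `HashMap` plays the role of a table keyed only by the entry key.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class StoreLayouts {
    // Partitioned layout: rows are grouped by partition and sorted by key within it.
    static final class Partitioned {
        private final Map<Integer, TreeMap<String, String>> rows = new HashMap<>();

        void put(int partition, String key, String value) {
            rows.computeIfAbsent(partition, p -> new TreeMap<>()).put(key, value);
        }

        // Sorted keys make an inclusive range query a cheap slice of one partition.
        List<String> range(int partition, String from, String to) {
            TreeMap<String, String> p = rows.getOrDefault(partition, new TreeMap<>());
            return new ArrayList<>(p.subMap(from, true, to, true).keySet());
        }

        // Prefix scan as a range [prefix, prefix + U+FFFF); good enough for this sketch.
        List<String> prefixScan(int partition, String prefix) {
            TreeMap<String, String> p = rows.getOrDefault(partition, new TreeMap<>());
            return new ArrayList<>(
                p.subMap(prefix, true, prefix + Character.MAX_VALUE, false).keySet());
        }
    }

    // Global layout: the key alone identifies a row, so no partition is needed
    // for a lookup, but there is no key ordering to support range or prefix scans.
    static final class Global {
        private final Map<String, String> rows = new HashMap<>();

        void put(String key, String value) { rows.put(key, value); }

        String get(String key) { return rows.get(key); }
    }
}
```

In the partitioned sketch every query has to name a partition, which mirrors why a lookup needs task context there, while the global sketch answers `get` from anywhere at the cost of ordered scans.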

Store Types
■ partitionedKeyValueStore
■ globalKeyValueStore
■ partitionedVersionedKeyValueStore
■ globalVersionedKeyValueStore

Regular Kafka Streams State Store
Interactive Queries, REST API

Cassandra Kafka Streams State Store
Interactive Queries, REST API

Conclusions
■ Drop-in replacement state store
■ Persistent, allows for very large state
■ No changelog topic, no state restore required
➔ run as stateless
➔ instant rebalancing
➔ reduce rebalance downtimes & recovery time
➔ improve elasticity + scalability
The bad and the ugly
■ Exactly-Once Semantics (EOS) not supported
■ No consistency guarantees for non-idempotent data streams on hard failures
■ No window & session store support (yet)
■ Experimental

Next Steps
https://github.com/thriving-dev/kafka-streams-cassandra-state-store
■ Buffered + batched writes at offset commit
■ Add Window & Session Store Support
■ Address shortcomings on reliability, consistency, EOS
■ Benchmark
■ Add in-memory read cache
Open Source. Apache-2.0 license.

Stay in Touch
Hartmut Armbruster
@hartmut_co_uk
@hartmut-co-uk
@hartmut--thriving-dev
thriving-dev/kafka-streams-cassandra-state-store
@thriving_dev
https://thriving.dev
