Elasticity vs. State?
A New Kafka Streams
State Store
Hartmut Armbruster, Software Engineer at Thriving.dev
Hartmut Armbruster
■ Software Architect & Developer
■ Spent the past years working with distributed systems and real-time data processing with clients such as HSBC, NEX Group plc, Deutsche Bahn.
■ Striving to see the bigger picture, passionate about architecture, combining, integrating, and bringing all things together.
Presentation Agenda
■ Recap: Kafka Streams, State Stores, RocksDB
■ Managing State: Challenges & Opportunities
■ Kafka Streams Cassandra State Store
■ ‘Drop-in’ State Store Alternative - Quickstart
■ Data Models & Supported Store Types
■ Interactive Queries
■ Conclusions, Limitations, Next Steps
Recap
Kafka Streams,
State Store,
RocksDB
Apache Kafka
■ Distributed Streaming Platform
■ Stores its data in topics, immutable
■ Each topic has partitions
Kafka Streams
■ Functional Java API
■ Stream Processing on data stored in Kafka
■ Processor Topology
  ■ Source(s)
  ■ Stateless Stream Processors
    ■ filter, map, flatMap
  ■ Stateful Stream Processors
    ■ join, group, window, aggregate
  ■ Sink(s)
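The difference between stateless and stateful processors can be sketched without any Kafka dependency. The following is a plain-JDK analogy, not the Kafka Streams API: `TopologySketch`, `CountProcessor`, and `run` are hypothetical names chosen for illustration. It shows that filter and map look only at the current record, while counting has to remember earlier records, which is the role a state store plays.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Plain-JDK analogy of a processor topology; all names are hypothetical.
class TopologySketch {

    // Stateless: the decision depends only on the current record.
    static boolean isRelevant(String word) {
        return !word.isBlank();
    }

    // Stateless: transforms one record without looking at earlier ones.
    static String normalise(String word) {
        return word.trim().toLowerCase(Locale.ROOT);
    }

    // Stateful: counting needs memory of earlier records (the "state store").
    static final class CountProcessor {
        private final Map<String, Long> store = new HashMap<>();

        long process(String key) {
            return store.merge(key, 1L, Long::sum);
        }
    }

    // Source -> filter -> map -> count -> sink, one record at a time.
    static List<String> run(List<String> source) {
        CountProcessor counter = new CountProcessor();
        List<String> sink = new ArrayList<>();
        for (String record : source) {
            if (!isRelevant(record)) {
                continue;
            }
            String key = normalise(record);
            sink.add(key + "=" + counter.process(key));
        }
        return sink;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("Kafka", " ", "kafka ", "Streams")));
    }
}
```

If the process holding `CountProcessor` is lost, the counts are lost with it; the stateless steps need no recovery at all.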
Stateful Processors -> State Stores
■ State is local, distributed across replicas
■ State Stores Types
■ RocksDB
■ InMemory
■ Writes to local + changelog topics
■ ‘State Restore’ from changelog if local state is lost
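The write path and the restore path described above can be modelled in a few lines. This is a simplified in-memory sketch under stated assumptions: the changelog is a plain list rather than a Kafka topic, and `ChangelogSketch`, `LoggedStore`, and `Change` are hypothetical names, not Kafka Streams classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified model of a changelog-backed store; names are hypothetical.
class ChangelogSketch {

    // One changelog record; a null value stands for a delete (tombstone).
    record Change(String key, String value) {}

    static final class LoggedStore {
        private final Map<String, String> local = new HashMap<>();
        private final List<Change> changelog;

        LoggedStore(List<Change> changelog) {
            this.changelog = changelog;
        }

        // Every write goes to local state AND is appended to the changelog.
        void put(String key, String value) {
            apply(key, value);
            changelog.add(new Change(key, value));
        }

        String get(String key) {
            return local.get(key);
        }

        private void apply(String key, String value) {
            if (value == null) {
                local.remove(key);
            } else {
                local.put(key, value);
            }
        }

        // 'State restore': rebuild lost local state by replaying the changelog.
        static LoggedStore restore(List<Change> changelog) {
            LoggedStore store = new LoggedStore(changelog);
            for (Change change : changelog) {
                store.apply(change.key(), change.value());
            }
            return store;
        }
    }

    public static void main(String[] args) {
        List<Change> log = new ArrayList<>();
        LoggedStore original = new LoggedStore(log);
        original.put("k1", "v1");
        original.put("k1", "v2");
        LoggedStore rebuilt = LoggedStore.restore(log);
        System.out.println(rebuilt.get("k1"));
    }
}
```

Restore time grows with the amount of changelog data to replay, which is why restoring a large state can take a long time.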
Managing State
Challenges & Opportunities
Challenges for Stateful Topologies (1)
Example Use Case
■ Real-time data processing
■ Considerably large state
⚡ App Upgrade
⚡ Security Patching
⚡ Infrastructure Failure
⚡ Scaling up/down
Mitigation
➔Use ‘persistent’ stores (RocksDB)
➔Run Stateful (keep disks -> local state)
➔Prevent/Minimise Rebalances
➔Static Group Membership ⚠️
➔Standby Replicas
➔Warmup Replicas
➔Moving Restoration to a Dedicated Thread (KAFKA-10199)
➔Tuning Restore Consumer Config
➔State restore required 😱 …may take minutes/hours!!
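Several of the mitigations above are plain configuration. The sketch below collects the corresponding Kafka Streams and consumer property names into a `java.util.Properties` object. The property keys are standard Kafka configs; the values, the class name `RebalanceTuningSketch`, and the method `mitigationConfig` are illustrative assumptions and would need tuning per application.

```java
import java.util.Properties;

// Illustrative settings only; values are assumptions, not recommendations.
class RebalanceTuningSketch {

    static Properties mitigationConfig(String instanceId) {
        Properties props = new Properties();

        // Static group membership: a stable id per instance, so a quick
        // restart within the session timeout does not trigger a rebalance.
        props.setProperty("group.instance.id", instanceId);

        // Standby replicas: keep a shadow copy of each task's state
        // on another instance.
        props.setProperty("num.standby.replicas", "1");

        // Warmup replicas: how many extra replicas may catch up at once,
        // and how much lag still counts as "caught up".
        props.setProperty("max.warmup.replicas", "2");
        props.setProperty("acceptable.recovery.lag", "10000");

        // How often to check whether warmup replicas are ready to take over.
        props.setProperty("probing.rebalance.interval.ms", "600000");

        // Restore consumer tuning: the "restore.consumer." prefix targets
        // only the consumer that replays changelog topics.
        props.setProperty("restore.consumer.max.poll.records", "5000");

        return props;
    }

    public static void main(String[] args) {
        mitigationConfig("orders-app-0").forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

These properties would be merged with the application's usual settings (application id, bootstrap servers, serdes) before constructing the streams application.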
Challenges for Stateful Topologies (2)
Idle Streams Group Members
■ Following rebalancing & task re-assignment, new members get warmup tasks assigned
■ Warmup Replicas catch up
■ Transition from ‘Warmup -> Active’ Never Happens
➔Unbalanced assignment
➔Idle Replicas
◆ Performance Degradation
◆ Unused Allocated Resources 💸
Kafka Streams
Cassandra State Store
Kafka Streams Cassandra State Store
thriving-dev/kafka-streams-cassandra-state-store
Why Cassandra / ScyllaDB?
■ Distributed Architecture
■ Scalability (r/w & data)
■ High Availability through Data Replication
■ High Fault Tolerance
■ High Performance
■ (Expiring Data with TTL)
Quickstart (1) - Get it!
Maven
<dependency>
  <groupId>dev.thriving.oss</groupId>
  <artifactId>kafka-streams-cassandra-state-store</artifactId>
  <version>0.8.4</version>
</dependency>
Gradle
implementation 'dev.thriving.oss:kafka-streams-cassandra-state-store:0.8.4'
+
implementation 'com.scylladb:java-driver-core:4.17.0.0'
Quickstart (2) - High-level DSL
(RocksDB is used by default)
KTable<Long,String> table = builder.table("topicName");
Cassandra ‘partitionedKeyValueStore’
KTable<Long,String> table = builder.table("topicName",
    Materialized.<Long,String>as(
            CassandraStores.builder(session, "store-name")
                .partitionedKeyValueStore())
        .withKeySerde(Serdes.Long())
        .withValueSerde(Serdes.String())
        .withLoggingDisabled()  // no changelog topic for this store
);
Quickstart (3) - Builder
Basic usage example:
CassandraStores.builder(session, "orders")
    .partitionedKeyValueStore();
Advanced usage example:
CassandraStores.builder(session, "orders")
    .withKeyspace("order_processing")
    .withDmlExecutionProfile("kstreams-dml")
    .withTableOptions("""
        compaction = { 'class' : 'LeveledCompactionStrategy' }
        AND default_time_to_live = 86400
        """)
    .withCountAllEnabled()
    .partitionedKeyValueStore();
Under the hood - Data Model
➔Key/Value are of type BLOB
◆ Any Payloads supported
◆ String, Number, Avro, Protobuf, …
◆ Serialized -> bytes
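Because keys and values are stored as BLOBs, any type works as long as it can be turned into bytes and back. A minimal sketch of that round trip follows, using only the JDK. The helpers in `BlobSketch` are hypothetical stand-ins for what a serde does; in a real topology the configured Kafka Serdes perform this step.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helpers illustrating "serialized -> bytes"; not library code.
class BlobSketch {

    // A long becomes 8 big-endian bytes.
    static byte[] longToBytes(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    static long bytesToLong(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong();
    }

    // A string becomes its UTF-8 encoding.
    static byte[] stringToBytes(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    static String bytesToString(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] key = longToBytes(42L);
        byte[] value = stringToBytes("order placed");
        System.out.println(bytesToLong(key) + " -> " + bytesToString(value));
    }
}
```

The store itself never needs to know the original types; it only sees the byte arrays.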
Under the hood - Primary Keys
partitionedKeyValueStore
➔Composite Primary Key
◆ Having `key` as Clustering Key allows additional IQs such as `range` & `prefixScan`
◆ No. of partitions defined by Streams tasks (source topic partitions)
⚠️ Large Partitions
◆ BATCH inserts possible & efficient
globalKeyValueStore
➔Store Entry `key` as Primary Key
◆ High Cardinality, works for any data volumes
◆ No Support for `range` & `prefixScan`
◆ Lookups possible from any instance, knowledge/context of partition/task is not required
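The trade-off between the two key layouts can be illustrated with in-memory maps. This is a rough analogy under stated assumptions, not the library's implementation: a sorted map per partition stands in for rows clustered by `key`, a hash map stands in for rows addressed by `key` alone, and string keys are used instead of bytes for readability. `KeyLayoutSketch`, `PartitionedStore`, and `GlobalStore` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory analogy of the two key layouts; all names are hypothetical.
class KeyLayoutSketch {

    // Partition as partition key, `key` as clustering key:
    // keys are sorted within a partition, so ordered scans are possible.
    static final class PartitionedStore {
        private final Map<Integer, TreeMap<String, String>> partitions = new HashMap<>();

        void put(int partition, String key, String value) {
            partitions.computeIfAbsent(partition, p -> new TreeMap<>()).put(key, value);
        }

        // Keys between from and to, both inclusive, within one partition.
        List<String> range(int partition, String from, String to) {
            TreeMap<String, String> rows = partitions.getOrDefault(partition, new TreeMap<>());
            return new ArrayList<>(rows.subMap(from, true, to, true).keySet());
        }

        // Keys starting with prefix: seek to the prefix, stop at first mismatch.
        List<String> prefixScan(int partition, String prefix) {
            TreeMap<String, String> rows = partitions.getOrDefault(partition, new TreeMap<>());
            List<String> result = new ArrayList<>();
            for (String key : rows.tailMap(prefix, true).keySet()) {
                if (!key.startsWith(prefix)) {
                    break;
                }
                result.add(key);
            }
            return result;
        }
    }

    // `key` alone as primary key: point lookups without knowing the
    // partition, but no ordering and therefore no range or prefix scans.
    static final class GlobalStore {
        private final Map<String, String> rows = new HashMap<>();

        void put(String key, String value) {
            rows.put(key, value);
        }

        String get(String key) {
            return rows.get(key);
        }
    }

    public static void main(String[] args) {
        PartitionedStore store = new PartitionedStore();
        store.put(0, "order-1", "a");
        store.put(0, "order-2", "b");
        store.put(0, "user-1", "c");
        System.out.println(store.prefixScan(0, "order-"));
    }
}
```

Note how every call on `PartitionedStore` needs the partition number, while `GlobalStore.get` does not, mirroring the point that global lookups work from any instance.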
Store Types
■ partitionedKeyValueStore
■ globalKeyValueStore
■ partitionedVersionedKeyValueStore
■ globalVersionedKeyValueStore
Interactive Queries
Interactive Queries, REST API
Regular Kafka Streams State Store
Interactive Queries, REST API
Cassandra Kafka Streams State Store
Interactive Queries, REST API
Conclusions
Conclusions
Drop-in replacement state store
Persistent, allows for very large state
No changelog topic, no state restore required
-> run as stateless
-> instant rebalancing
-> reduce rebalance downtimes & recovery time
-> improve elasticity + scalability
The bad and the ugly
Exactly-Once Semantics (EOS) not supported
No consistency guarantees for non-idempotent data streams on hard failures
No window & session store support (yet)
Experimental
Next Steps
https://github.com/thriving-dev/kafka-streams-cassandra-state-store
■ Buffered + batched writes at offset commit
■ Add Window & Session Store Support
■ Address shortcomings on reliability, consistency, EOS
■ Benchmark
■ Add in-memory read cache
Open Source. Apache-2.0 license.
Stay in Touch
Hartmut Armbruster
@hartmut_co_uk
@hartmut-co-uk
@hartmut--thriving-dev
thriving-dev/kafka-streams-cassandra-state-store
@thriving_dev
https://thriving.dev

Elasticity vs. State? Exploring Kafka Streams Cassandra State Store
