Kafka Streams offers Interactive Queries (IQ), which let you query the internal state of a stream processing application from outside the application. This functionality has proven invaluable to users over the years for everything from debugging to serving low-latency queries straight from the Streams runtime.
However, the actual interfaces for IQ were designed in the very early days of Kafka Streams and have proven cumbersome to use. Adding a new custom query such as a reverse scan requires changes to more than 6000 lines of code spanning 108 files. There is currently no way to customize how a query executes, for instance whether or not it should use the caching layer. Moreover, there is no way to extend the query result to include extra information such as which store layers or segments participated in the query, execution time, cache hits/misses, etc. Finally, IQ allows users to trade off consistency for availability by querying standby stores during rebalances. Eventual consistency, although useful as a concept, makes it difficult to provide applications with a good user experience.
In this presentation, we unveil the next generation of Interactive Query (IQv2) that addresses all these shortcomings. We demonstrate the key benefits of the new query API:
1. Customizability: You can use IQv2 to easily plug in your own store implementations and queries. You can use it, for example, to implement complex queries to push down filters and indices into the storage layer itself.
2. Control: It gives you far more control over how each query is executed and exposes all the execution details you need to power high performance, production-grade queries.
3. Consistency: It provides a tunable consistency model, allowing you to choose your tradeoff between consistency, availability, and latency.
Attend this talk if you want to learn how to write high-performance, semantically strong applications in a modern, data-in-motion environment.
Interactive Query in Kafka Streams: The Next Generation with Vasiliki Papavasileiou and John Roesler | Kafka Summit London 2022
1. IQ in Kafka Streams:
The Next Generation
Vicky Papavasileiou Software Engineer @ Confluent
John Roesler Software Engineer @ Confluent, Apache Kafka PMC
7. What is Interactive Query?
● Query the state of streaming applications from outside the application
[Diagram: a transactions stream is aggregated into a currentBalance state store]
transactions (id, customer, amount, item):
1, Jay, $10, burger
2, Sue, $11, pizza
3, Jay, $5, coffee
currentBalance (customer, balance, last_purchase):
Jay, $15, coffee
Sue, $11, pizza
An external client issues get(Jay) against the currentBalance store.
8. Use IQ and Metadata API to build a Streaming App
[Diagram: Instance A and Instance B each run a REST API and Kafka Streams with its metadata; the currentBalance store is split across them as part 0 and part 1]
1. Send the request (GET bal?id=Jay) to an app instance
2. Consult Streams metadata
3. Forward to the correct instance
4. Fetch from local state with IQ
5. Return local state(s) to the query handler
6. Send the results back to the caller (200: {id:Jay, bal:$15, last_transaction:coffee})
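Steps 1-3 above amount to key-based routing: hash the key to a partition, then look up which instance currently hosts that partition. A minimal sketch of that logic in plain Java (no Kafka dependency; `QueryRouter` and the `hashCode`-based partitioner are simplifications for illustration — in a real app the partition-to-host mapping comes from the Streams metadata API, and Kafka's default partitioner uses murmur2):

```java
import java.util.Map;

class QueryRouter {
    // partition -> host that owns it; in Kafka Streams this mapping
    // comes from the Streams metadata API (e.g. queryMetadataForKey)
    private final Map<Integer, String> partitionOwners;
    private final int numPartitions;

    QueryRouter(Map<Integer, String> partitionOwners, int numPartitions) {
        this.partitionOwners = partitionOwners;
        this.numPartitions = numPartitions;
    }

    // Simplified partitioner: Kafka's default actually hashes with murmur2
    int partitionFor(String key) {
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    // Steps 2-3: which instance should the query be forwarded to?
    String hostFor(String key) {
        return partitionOwners.get(partitionFor(key));
    }
}
```

With two instances owning partitions 0 and 1, `hostFor("Jay")` tells the REST layer whether to serve the query locally or forward it, which is exactly the decision made in step 3.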
16. Problems with IQ v1
● Customization is hard
○ Both when plugging in custom storage engines and when adding new query types
● Not enough control: encapsulation is too aggressive
○ A simple abstraction that gets in the way of creating performant, reliable applications
● The consistency model is too sparse
○ You have to choose between "strong" and "eventual"
17. IQ v2 to the rescue: preview release in Apache Kafka 3.2
● Customization
○ Easy to add custom queries
○ Easy to support custom state stores
● Control
○ Request API: attach specific requirements to the query (e.g. skip the cache)
○ Response API: get results and metadata (e.g. execution time) for individual partitions
● Consistency
○ Adds a "Position" concept to implement various consistency guarantees
18. Simplicity of IQ v2
Easy to use and intuitive API
IQ v1:
val currentBalanceStore =
  kafkaStreams.store(
    StoreQueryParameters.fromNameAndType(
      "currentBalance",
      QueryableStoreTypes.timestampedKeyValueStore()
    )
  );
val balance =
  currentBalanceStore.get("Jay").value();
IQ v2:
val request =
  inStore("currentBalance")
    .withQuery(KeyQuery.withKey("Jay"));
val result = kafkaStreams
  .query(request)
  .getOnlyPartitionResult().getResult();
19. Roadmap
1. Customization
○ Add custom queries in user space
2. Control
○ Intuitive query interface
○ Flexible response handling
3. Consistency
○ Broad range of consistency levels
20. Customization: customize IQ by implementing custom queries and custom stores
● IQ v1
○ Kafka Streams allows custom stores, but exposing them through IQ requires changes to Apache Kafka itself
○ A new query must be added to the ReadOnlyKeyValueStore interface and to all classes that implement it
● IQ v2
○ Add custom queries to custom stores in user space
○ Easy to contribute new queries to Apache Kafka
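This works because, in IQv2, a query is just a marker object: the store inspects the concrete query type and decides whether it can evaluate it, so new query types never have to touch Kafka's own store interfaces. A self-contained toy of that dispatch pattern (the `Query` interface, `PrefixCountQuery`, and `ToyKeyValueStore` here are simplified stand-ins for illustration, not the real `org.apache.kafka.streams.query` API):

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for the IQv2 marker interface Query<R>
interface Query<R> {}

// A user-defined query type: introduced entirely in user space
class PrefixCountQuery implements Query<Long> {
    final String prefix;
    PrefixCountQuery(String prefix) { this.prefix = prefix; }
}

class ToyKeyValueStore {
    private final Map<String, String> data = new HashMap<>();

    void put(String key, String value) { data.put(key, value); }

    // The store dispatches on the concrete query type; a real IQv2 store
    // would report unsupported types via a failure reason instead of throwing
    @SuppressWarnings("unchecked")
    <R> R query(Query<R> query) {
        if (query instanceof PrefixCountQuery) {
            String p = ((PrefixCountQuery) query).prefix;
            long n = data.keySet().stream().filter(k -> k.startsWith(p)).count();
            return (R) (Long) n;
        }
        throw new IllegalArgumentException("Unknown query type");
    }
}
```

The key design point is that the store, not the framework, owns the `instanceof` dispatch, which is what lets both the query type and the store live in user code.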
21. Customization: Anatomy of a state store
MeteredKeyValueStore: serializes/deserializes keys and values
CachingKeyValueStore: buffers writes
ChangeLoggingKeyValueStore: makes writes durable
RocksDB, InMemory, custom:
● implement KeyValueStore
● store serialized data
● pluggable
22. Anatomy of IQ
[Diagram: get(Jay) descends through the store layers]
1. Metered: get(String) serializes the key
2. Caching: get(Bytes) checks the write buffer; on a miss, gets from the state store (buffered writes are flushed via the change-logging layer to the changelog topic)
3. RocksDB: get(Bytes) reads the serialized value
23. Customization: Add new query using IQ v1
You must add the new query to:
● the Metered store
● the Caching store
● the Change logging store
● the RocksDB store
● and every other store that implements the ReadOnlyKeyValueStore interface
24. Customization: Add new query using IQ v2
● Only add the query to the store that evaluates it (e.g. your custom store)
● IF you want to use cached data, then integrate with the cache
25. Customization: Add reverseRange and reverseAll queries in IQ v1
KIP-617 PR #9137: changes to 44 files / 111 files in total
27. Roadmap
1. Customization
○ Add custom queries in user space
2. Control
○ Intuitive query interface
○ Flexible response handling
3. Consistency
○ Broad range of consistency levels
28. Control: which partitions and stores to query
● IQ v1
○ Either query one specific partition or all partitions
○ All queries compose the cache with the underlying bytes stores
● IQ v2
○ Specify a subset of partitions per query
○ Specify whether to use the cache per query
○ Implement custom logic per query in user code
30. Control: what to return and how to handle a response
● IQ v1
○ Results of all partitions are combined into one iterator; it is not possible to distinguish which rows come from which partition
○ If a partition fails, the entire query fails. It is impossible to know which partition failed, so the entire query must be repeated
● IQ v2
○ One iterator per partition
○ Failures are reported per partition: if a partition has failed, repeat the query only for that partition
○ Returns extra information such as execution time, tracing information, etc.
31. Control: IQ v2 response
Issue the query:
val result = kafkaStreams
  .query(request)
  .getOnlyPartitionResult()
  .getResult();
public final class StateQueryResult<R> {
  Map<Integer, QueryResult<R>> getPartitionResults(); // map of results per partition
  QueryResult<R> getOnlyPartitionResult();            // get the result of a single partition
}
public final class QueryResult<R> {
  R getResult();                    // get the actual rows
  List<String> getExecutionInfo();  // execution time, store info, etc.
  FailureReason getFailureReason(); // store was not the active, partition does not exist, etc.
  Position getPosition();           // position of the store at the time of query evaluation
}
32. Roadmap
1. Customization
○ Add custom queries in user space
2. Control
○ Intuitive query interface
○ Flexible response handling
3. Consistency
○ Broad range of consistency levels
33. Consistency: streaming applications require a broad range of consistency levels
● Eventual consistency doesn't cut it:
○ Developers cannot validate correctness
○ Applications are not user-friendly
● Kafka Streams offers:
○ Strong consistency by default
○ Eventual consistency with StoreQueryParameters#enableStaleStores
34. Strong consistency through failure recovery
[Diagram: the transactions stream feeds the currentBalance store, which is backed by a changelog topic]
1. The active instance fails
2. A new active gets elected
3. The new active restores currentBalance from the changelog
● During restoration, IQ will fail
● Only after the new active has fully caught up will IQ succeed
● This ensures strong consistency: IQ is guaranteed to see the most recent write
35. Eventual consistency through standbys
[Diagram: an active and a standby instance; the standby replicates currentBalance from the changelog]
1. Configure Streams with replicas
2. Query the standby with StoreQueryParameters#enableStaleStores
3. The standby returns (possibly stale) results
● Queries can be served during replication
● Eventual consistency, as there is no guarantee on staleness
36. 3. Consistency
Strong consistency: see the most recent write
● Only query the active (no load balancing)
Level  | Staleness | Availability | Latency
Strong | Low       | Low          | High
38. 3. Consistency
Monotonic reads: no time travel in query results
● Query any instance that is caught up to the bound of the last read
Level           | Staleness | Availability | Latency
Strong          | Low       | Low          | High
Monotonic reads | Middle    | Middle       | Middle
Eventual        | High      | High         | Low
41. Consistency: Position tracking in IQ v2
Track the position of the state store with respect to the input topic offset
[Diagram: as the transactions stream (offsets 1-3) is applied, the active's currentBalance (customer, total, last_reason) advances:
1 Jay $10 → Position: {0: 1}
2 Sue $11 → Position: {0: 2}
3 Jay $15 → Position: {0: 3}
The standby, restoring from the changelog, lags behind:
Jay $10 burger, Sue $11 pizza → Position: {0: 2}]
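A position is essentially a map from topic to (partition → offset), and a store is "caught up" to a bound when it has reached every offset the bound requires. A minimal sketch of that bookkeeping in plain Java (`ToyPosition` is a simplified stand-in for the real `org.apache.kafka.streams.query.Position`, not its actual implementation):

```java
import java.util.HashMap;
import java.util.Map;

class ToyPosition {
    // topic -> (partition -> highest input offset applied to the store)
    private final Map<String, Map<Integer, Long>> components = new HashMap<>();

    ToyPosition withComponent(String topic, int partition, long offset) {
        components.computeIfAbsent(topic, t -> new HashMap<>())
                  .merge(partition, offset, Math::max); // positions only move forward
        return this;
    }

    // Has this store seen at least everything the bound requires?
    boolean isUpToBound(ToyPosition bound) {
        for (Map.Entry<String, Map<Integer, Long>> topic : bound.components.entrySet()) {
            Map<Integer, Long> mine =
                components.getOrDefault(topic.getKey(), Map.of());
            for (Map.Entry<Integer, Long> part : topic.getValue().entrySet()) {
                if (mine.getOrDefault(part.getKey(), -1L) < part.getValue()) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

In the diagram above, the active at {0: 3} satisfies a bound of {0: 3}, while the standby at {0: 2} does not, which is exactly what lets IQv2 express consistency levels beyond "strong" and "eventual".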
42. Consistency: Monotonic reads
Within a single client session, a query is guaranteed to see the same or newer values than a previous query
[Sequence: Client, Server A (Position: 53), Server B (Position: 54)]
1. query("Jay") → Server A (position 53) returns {Jay $10 burger} {position: 53}
2. query("Jay") with bound position ≧ 53 → Server B (position 54) returns {Jay $15 coffee} {position: 54}
3. query("Jay") with bound position ≧ 54 → Server A (still at position 53) returns ERR: Not up to bound
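The session guarantee above can be sketched as a client that remembers the highest position it has seen and attaches it as a bound, while a server refuses to answer from a store that is behind that bound (`ToyServer` and `ToyClient` are illustrative names, not Kafka classes, and positions are single offsets for simplicity):

```java
// A server holding one store replica at a given position
class ToyServer {
    final long position;   // offset the local store has caught up to
    final String value;

    ToyServer(long position, String value) {
        this.position = position;
        this.value = value;
    }

    // Answer only if caught up to the bound; null models "not up to bound"
    String query(long positionBound) {
        return position >= positionBound ? value : null;
    }
}

// A client enforcing monotonic reads across servers within its session
class ToyClient {
    private long lastSeen = 0;  // highest position observed so far

    String monotonicQuery(ToyServer server) {
        String result = server.query(lastSeen);
        if (result != null) {
            lastSeen = Math.max(lastSeen, server.position);
        }
        return result;
    }
}
```

Replaying the sequence diagram: the first query against Server A succeeds, the query against Server B succeeds and raises the bound to 54, and a third query against Server A (still at 53) is rejected rather than time-travelling.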
43. Consistency: Bounded by Offset Lag
Bound is: (highest offset seen - acceptable lag)
Less strict than monotonic reads, but within reason
1. query("Jay") → Server A (position 53) returns {Jay $10 burger} {position: 53}; next bound: 53-5=48
2. query("Jay") with bound position ≧ 48 → Server B (position 54) returns {Jay $15 coffee} {position: 54}; next bound: 54-5=49
3. query("Jay") with bound position ≧ 49 → Server A (position 53) returns {Jay $10 burger} {position: 53}
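The bound computation in this scheme is simple client-side arithmetic: track the highest position seen and subtract the acceptable lag. A small sketch (`LagBoundedClient` is an illustrative name; in real IQv2 the computed offset would be turned into a position bound on the query):

```java
// Client-side bookkeeping for bounded-staleness queries:
// bound = (highest offset seen) - (acceptable lag)
class LagBoundedClient {
    private final long acceptableLag;
    private long highestSeen = 0;

    LagBoundedClient(long acceptableLag) {
        this.acceptableLag = acceptableLag;
    }

    // Record the position returned with a query response
    void observe(long position) {
        highestSeen = Math.max(highestSeen, position);
    }

    // Lower bound to attach to the next query
    long nextBound() {
        return Math.max(0, highestSeen - acceptableLag);
    }
}
```

With an acceptable lag of 5, observing position 53 yields a bound of 48 and observing 54 yields 49, matching the sequence above; note the bound never moves backwards even when a lagging replica answers.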
44. Consistency: Bounded by Time
Generally, a more intuitive way to bound eventual consistency.
1. Use the Kafka AdminClient to translate a time to offsets
a. OffsetSpec.forTimestamp(System.currentTimeMillis() - 1h)
b. Admin.listOffsets(partitions)
2. Use the offsets as a lower bound on the query position
a. PositionBound.at(
     Position.withComponent(topic, partition, offset)
   );
46. Take-aways
IQ v2 gives developers:
● Customization: build custom queries for custom stores in user space
● Control: handle partitions separately and get meaningful information in the response
● Consistency: implement a broad range of consistency levels such as monotonic reads and bounded eventual consistency
47. Conclusion
● Released in preview mode in Apache Kafka 3.2
○ The API is not yet stable and may change
● Future work:
○ Add more queries to Apache Kafka (an excellent first KIP/contribution)
○ Cleanup for GA
■ Add options to hit/skip the cache
■ More query coverage for things like prefix scan
○ Extend the consistency model to propagate through the topology
48. Wanna join our team?
Confluent is hiring for ksqlDB/Streams engineering positions in the UK and Germany
https://careers.confluent.io/open-positions