1. Kick Your Database to the Curb: Using Kafka Streams Interactive Queries to Enable Powerful Microservices
2. Brief Introduction
• Worked at Confluent (Streams team) for 2 years
• Apache Kafka Committer
• Author of Kafka Streams in Action
Special thanks to @gamussa!
3. Agenda
• What is State
• Kafka Streams Overview
• Interactive Queries
• Live Demo!
6. GroupBy Example
public static void main(String[] args) {
    int counter = 0;
    int sendInterval = 15;
    Map<String, Integer> groupByCounts = new HashMap<>();
    try (Consumer<String, String> consumer = new KafkaConsumer<>(consumerProperties());
         Producer<String, String> producer = new KafkaProducer<>(producerProperties())) {
        consumer.subscribe(Arrays.asList("A", "B"));
7. GroupBy Example
        while (true) {
            ConsumerRecords<String, String> records =
                consumer.poll(Duration.ofSeconds(5));
            for (ConsumerRecord<String, String> record : records) {
                String key = record.key();
                Integer count = groupByCounts.get(key);
                if (count == null) {
                    count = 0;
                }
                count += 1;
                groupByCounts.put(key, count);
            }
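Per the speaker notes, after a given number of polls the example publishes the accumulated counts to a downstream topic and commits offsets only after the send, so a failure beforehand re-processes the data rather than losing it. A hedged sketch of that step, continuing the loop above (the output topic name "group-by-counts" is an assumption):

            // After every sendInterval polls, publish the grouped counts
            // downstream, then commit the consumed offsets
            if (++counter % sendInterval == 0) {
                for (Map.Entry<String, Integer> entry : groupByCounts.entrySet()) {
                    producer.send(new ProducerRecord<>("group-by-counts",
                            entry.getKey(), entry.getValue().toString()));
                }
                producer.flush();
                // Committing only after the send means a crash before this
                // point re-processes records instead of dropping results
                consumer.commitSync();
            }
        } // end while
    }     // end try-with-resources
}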
14. Stateful Stream Processing
Streams stateful operations:
• Joins
• Windowing operations
• Aggregation/Reduce
Using any of these operations, Streams creates a state store.
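The speaker notes walk through the basic Streams application used as the base of the talk's examples: create a stream from the two keyed topics "A" and "B", group by key, count into a store named via Materialized.as, convert the update stream back to a record stream, and write it to the output topic. A sketch along those lines (the store name "group-by-counts-store" is an assumption):

StreamsBuilder builder = new StreamsBuilder();
// Create the stream from the two input topics; data is assumed to be keyed
KStream<String, String> stream = builder.stream(Arrays.asList("A", "B"));
stream.groupByKey()
      // Materialized.as names the state store so it can be queried later
      .count(Materialized.as("group-by-counts-store"))
      // Convert the update stream (a KTable) back to a record stream
      .toStream()
      .mapValues(count -> Long.toString(count))
      .to("output-topic");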
15. Making Streams Results Queryable
[Diagram: a Kafka Streams application reading from and writing results to Kafka, with an external application / REST service that needs those results]
16. Making Streams Results Queryable
[Diagram: the same picture with a database added between the Streams application and the external application / REST service]
17. Making Streams Results Queryable
[Diagram repeated from the previous slide]
22. What's with the APPLICATION_SERVER_ID?
• A single Streams instance doesn't contain all keys
• Streams will query other instances for store misses
• A single Streams instance can act as a proxy for all instances
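The setting referred to here is presumably StreamsConfig.APPLICATION_SERVER_CONFIG ("application.server"), which advertises a host:port endpoint for each instance; a minimal sketch (the application id and bootstrap values are illustrative):

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "interactive-query-app");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
// Advertise this instance's endpoint so other instances can route
// interactive queries for keys this instance owns
props.put(StreamsConfig.APPLICATION_SERVER_CONFIG, "hostA:4567");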
25. Topic Partitions and Streams Tasks
[Diagram: Streams app "A" (host = hostA:4567) and Streams app "B" (host = hostB:4568), each with a state store, consuming a topic with four partitions]
The four partitions are converted into four tasks, so each Streams application is assigned two partitions/tasks.
26. Making Streams Results Queryable
[Diagram: the same two Streams apps and their state stores]
{"ENERGY":"10000"} is written to partition 0, assigned to App A
{"FINANCE":"11000"} is written to partition 1, assigned to App B
27. Making Streams Results Queryable
[Diagram repeated from the previous slide]
http://hostA:4567?key=FINANCE
The query for FINANCE arrives at App A, but partition 1 (and the FINANCE key) belongs to App B, so App A proxies the request to App B.
28. Example of a Streams RPC
[Diagram: a Kafka Streams application (backed by Kafka) serving query results over RPC to a JavaScript client]
Demo Time!
29. Embedding the Web Server
KafkaStreams kafkaStreams =
    new KafkaStreams(builder.build(), streamsConfig);
// InteractiveQueryServer is from the Kafka Streams in Action examples
InteractiveQueryServer queryServer =
    new InteractiveQueryServer(kafkaStreams, hostInfo);
30. Embedding the Web Server
kafkaStreams.setStateListener((newState, oldState) -> {
    if (newState == KafkaStreams.State.RUNNING
            && oldState == KafkaStreams.State.REBALANCING) {
        // Stores are only safely queryable once the app is RUNNING
        queryServer.setReady(true);
    } else if (newState != KafkaStreams.State.RUNNING) {
        queryServer.setReady(false);
    }
});
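Presumably the listener is registered before anything starts; a minimal startup sketch (the init() method on the example server is an assumption):

// Start the embedded web server, then the Streams app; the listener
// above flips setReady(true) once the app reaches RUNNING
queryServer.init();  // assumed start method on the example server
kafkaStreams.start();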
33. Embedding the Web Server
fetchFromSessionStore(Map<String, String> params) {
    String store = params.get(STORE_PARAM);
    String key = params.get(KEY_PARAM);
    HostInfo storeHostInfo = getHostInfo(store, key);
    if (storeHostInfo.host().equals("unknown")) {
        return STORES_NOT_ACCESSIBLE;
    }
    if (dataNotLocal(storeHostInfo)) {
        return fetchRemote(storeHostInfo, "session", params);
    }
    ReadOnlySessionStore<String, CustomerTransactions> readOnlySessionStore =
        kafkaStreams.store(store, QueryableStoreTypes.sessionStore());
34. Embedding the Web Server
getHostInfo(String storeName, String key) {
    StreamsMetadata metadata =
        kafkaStreams.metadataForKey(storeName, key, stringSerializer);
    return metadata.hostInfo();
}
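The dataNotLocal and fetchRemote helpers aren't shown on the slides; a hedged sketch of what they might look like (the thisHostInfo field and the URL layout are assumptions, though the path segments mirror the /session/store/key style used by the client code):

// True when the key's owning instance is not this one
private boolean dataNotLocal(HostInfo storeHostInfo) {
    return !storeHostInfo.equals(thisHostInfo); // assumed field: our own host:port
}

// Proxy the query to the owning instance over HTTP and relay the JSON
private String fetchRemote(HostInfo hostInfo, String path, Map<String, String> params)
        throws IOException, InterruptedException {
    String url = String.format("http://%s:%d/%s/%s/%s",
            hostInfo.host(), hostInfo.port(), path,
            params.get(STORE_PARAM), params.get(KEY_PARAM));
    HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
    return HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString())
            .body();
}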
37. Embedding the Web Server
fetchFromSessionStore(Map<String, String> params) {
    String store = params.get(STORE_PARAM);
    String key = params.get(KEY_PARAM);
    HostInfo storeHostInfo = getHostInfo(store, key);
    if (storeHostInfo.host().equals("unknown")) {
        return STORES_NOT_ACCESSIBLE;
    }
    if (dataNotLocal(storeHostInfo)) {
        return fetchRemote(storeHostInfo, "session", params);
    }
    ReadOnlySessionStore<String, CustomerTransactions> readOnlySessionStore =
        kafkaStreams.store(store, QueryableStoreTypes.sessionStore());
    // Iterate over readOnlySessionStore and
    // store results in a list sessionResults
    return gson.toJson(sessionResults);
}
38. Client View Development
<body>
<h2>Kafka Streams Equities Dashboard Application</h2>
<!-- Other div elements left out for clarity -->
<div id="sessionDiv">
    <h3 id="sessionHeader">Customer Session Equity Activity Table</h3>
    <table id="sessionTable">
        <tr>
            <th>Customer Id</th>
            <th>Average Equity Transaction Spent Per Session</th>
        </tr>
    </table>
</div>
</body>
39. Client View Development
<script>
function loadIqTables() {
$.getJSON("/kv/TransactionsBySector", function (response) {
updateTable(response, $('#txnsTable'))
$('#txnsHeader').animate({color:'red'},500).animate({color:'#CCCCCC'}, 500)
})
updateTableWithList("/window/NumberSharesPerPeriod/", symbols,
$('#stockTable'), $('#stockHeader'));
updateTableWithList("/session/CustomerPurchaseSessions/", customers,
$('#sessionTable'), $('#sessionHeader'))
}
setInterval(loadIqTables, 7000);
</script>
42. Security
ReadOnlySessionStore<String, CustomerTransactions> readOnlySessionStore =
    kafkaStreams.store(store, QueryableStoreTypes.sessionStore());
try (KeyValueIterator<Windowed<String>, CustomerTransactions> iterator =
         readOnlySessionStore.fetch(key)) {
    while (iterator.hasNext()) {
        KeyValue<Windowed<String>, CustomerTransactions> entry = iterator.next();
        // Transform or mask the data here and collect the sanitized
        // results before returning them
    }
}
return gson.toJson(sanitizedRecords);
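As an illustration of the masking step, a hedged sketch (the CustomerTransactions accessor and the shape of sanitizedRecords are hypothetical, not from the talk):

// Hypothetical sanitizer: expose only safe fields, masking all but the
// last four characters of an assumed account-number accessor
List<Map<String, String>> sanitizedRecords = new ArrayList<>();
while (iterator.hasNext()) {
    KeyValue<Windowed<String>, CustomerTransactions> entry = iterator.next();
    CustomerTransactions txn = entry.value;
    String maskedAccount = txn.getAccountNumber()   // assumed accessor
            .replaceAll(".(?=.{4})", "*");
    sanitizedRecords.add(Map.of("customerId", entry.key.key(),
                                "account", maskedAccount));
}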
43. Summary
Interactive Queries is a powerful abstraction that simplifies stateful stream processing.
There are still cases for which an external database or storage engine might be a better fit.
44. Summary
Kafka Streams in Action examples: https://github.com/bbejeck/kafka-streams-in-action/blob/master/src/main/java/bbejeck/webserver/InteractiveQueryServer.java
Music example: https://github.com/confluentinc/examples/blob/master/kafka-streams/src/main/java/io/confluent/examples/streams/interactivequeries/kafkamusic/KafkaMusicExample.java
Streaming Movie Ratings: https://github.com/confluentinc/demo-scene/tree/master/streams-movie-demo
45. Thanks!
Stay in Touch!
• https://slackpass.io/confluentcommunity
• https://www.confluent.io/blog/
• Twitter @bbejeck
• We are hiring! https://www.confluent.io/careers/
This is me. I've worked at Confluent for 1.5 years on the Streams team, and I authored the book Kafka Streams in Action. Now let's get started! First, let's go over what we are going to cover today.
A topology is a collection of processing nodes in a graph. A sub-topology is a collection of processing nodes connected by a common input topic. Note the relationship between tasks, threads, and state stores. Next, let's take a look at life before Kafka Streams so we can get a sense of what Kafka Streams is.
Imagine you have a Kafka topic and you need to do a group-by count on it; without Kafka Streams you'd need to do some manual processing to achieve this.
Here is the main method, setting up the consumer and producer and subscribing to two topics. This is more boilerplate work that needs to be done.
Next you loop over the retrieved records, do a count by key, and store it in a hashmap. And we can see this is an example of needing local state for stream processing.
Again we can see boilerplate work here, nothing to do with your business logic. At this point we are doing the business logic; this is the part we care about. Then we put the record back in local state.
Then, after a given number of retrievals, you iterate over the results and publish those grouped counts out to a topic for downstream users. Here we are manually keeping track of how often we want to process records downstream.
Not a complex application, but there are a handful of manual steps involved: managing the producer and consumer, deciding when records are emitted, handling commits, etc.
Here we also commit only after we've sent records downstream; if a failure occurs beforehand we'll re-process the data.
Now let's take a look at how we'd solve the same issue in Kafka Streams.
This is a basic Streams application we'll use as the base of our examples.
First we create the stream from two topics, "A" and "B". We're going to assume the data is coming in with keys.
Then we groupByKey, so we can count. Notice the Materialized.as, which allows us to name the state store.
We then convert our update stream to a record stream, to allow us to write the result stream to the output-topic.
The Streams DSL gives you a lot of power and flexibility, and it's concise.
This generates a topology: connected processing nodes.
NEXT: how can we view the topology generated from this DSL code?
Here the KGroupedStream returned by groupByKey is represented by the arrow going from the KStreamSource node to the KStreamAggregate node. The count call creates the KStreamAggregate node, and we can see the associated store over to the right, created from the Materialized.as call.
Now, if we want to view the topology created from our DSL code: build the Topology, then call describe() and render it as a string.
NEXT: now let's look at the string.
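A minimal sketch of that call chain:

Topology topology = builder.build();
// TopologyDescription renders the processor graph as text
System.out.println(topology.describe().toString());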
This is how the string topology looks (a representative rendering follows below). Notice "Topologies" at the top, implying there can be multiple sub-topologies; here there is one. A sub-topology is created by a source node, as we'll see more of in a minute. Arrows pointing to the right show the next processor; arrows pointing left show where the processor received its records from. Notice our named store, input topics, and output topic. This is good information but not very intuitive, and we have options for rendering a graph.
NEXT: let's take a look at how we can create a graph out of this string.
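A representative rendering matching that description (node numbers and the exact store/topic names are assumptions based on the example app):

Topologies:
   Sub-topology: 0
    Source: KSTREAM-SOURCE-0000000000 (topics: [A, B])
      --> KSTREAM-AGGREGATE-0000000001
    Processor: KSTREAM-AGGREGATE-0000000001 (stores: [group-by-counts-store])
      --> KTABLE-TOSTREAM-0000000002
      <-- KSTREAM-SOURCE-0000000000
    Processor: KTABLE-TOSTREAM-0000000002 (stores: [])
      --> KSTREAM-SINK-0000000003
      <-- KSTREAM-AGGREGATE-0000000001
    Sink: KSTREAM-SINK-0000000003 (topic: output-topic)
      <-- KTABLE-TOSTREAM-0000000002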
Here the red code corresponds to the top node, KSTREAM-SOURCE-0000000000. The groupByKey call does not create a processing node; it creates an intermediate object you can use to create aggregation operations. So groupByKey is actually represented by the arrow sending records from the source to the KSTREAM-AGGREGATE node.
Thanks for your time. Stay in touch, and use these resources to participate in the community. We have a book signing for Kafka Streams in Action at the Confluent booth at 4:45 PM today; stop by and pick up a signed copy and check out what's going on at Confluent.