Writing Blazing Fast and Production-Ready Kafka Streams apps (in less than 30 min) using Azkarra
Kafka Summit Europe 2021
Florian HUSSONNOIS
Hi, I'm Florian Hussonnois
@fhussonnois
Consultant, Trainer, Software Engineer
Co-founder @StreamThoughts
Confluent Community Catalyst (2019/2021)
Apache Kafka Streams contributor
Open Source Technology Enthusiast
- Azkarra Streams
- Kafka Connect File Pulse
- Kafka Streams CEP
- Kafka Client for Kotlin
Like me, you probably started
with the famous Word Count !
KStream<String, String> source = builder.stream("streams-plaintext-input");
source.flatMapValues(splitAndToLowercase())
.groupBy((key, value) -> value)
.count(Materialized.as("counts-store"))
.toStream()
.to("streams-wordcount-output", Produced.with(Serdes.String(), Serdes.Long()));
Topology topology = builder.build();
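The snippet calls a `splitAndToLowercase()` helper that the slides never show. A minimal plain-Java sketch of what such a helper might do follows; the class name `WordSplitter` and the splitting rule (runs of non-word characters) are assumptions, and in a real topology the return type would be Kafka Streams' `ValueMapper` rather than `java.util.function.Function`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper class, not part of Kafka Streams or Azkarra.
final class WordSplitter {

    // Lowercases a line, then splits it on runs of non-word characters.
    static Function<String, List<String>> splitAndToLowercase() {
        return line -> Arrays.stream(line.toLowerCase(Locale.ROOT).split("\\W+"))
                .filter(word -> !word.isEmpty()) // drop the empty token a leading separator produces
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // prints [hello, kafka, hello, streams]
        System.out.println(splitAndToLowercase().apply("Hello Kafka, hello Streams!"));
    }
}
```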
[Diagram: Stateful Stream Processing. Steps shown for the Word Count topology above: Consume, Transform, Aggregate / Join, Produce. GroupBy(Key) is annotated with a Repartition step.]
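import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;

public class WordCount {
    public static void main(String[] args) {
        // Core Logic: build the topology (splitAndToLowercase() is a helper not shown on the slide)
        var builder = new StreamsBuilder();
        KStream<String, String> source = builder.stream("streams-plaintext-input");
        source.flatMapValues(splitAndToLowercase())
              .groupBy((key, value) -> value)
              .count(Materialized.as("counts-store"))
              .toStream()
              .to("streams-wordcount-output", Produced.with(Serdes.String(), Serdes.Long()));
        var topology = builder.build();

        // Configuration
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "streams-wordcount");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        // Execution
        var streams = new KafkaStreams(topology, props);
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        streams.start(); // the slide omits this call; without it nothing is processed
    }
}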
Can we deploy a Kafka Streams
application like this one in
production, without any changes?
The Answer is No!
(Well, unless you are testing your app
in production… cough, cough...)
OK, nobody does that!
Our TODO list: some requirements before moving into production
▢ Test that the app works as expected
▢ Externalize configuration
▢ Handle transient errors
▢ Handle deserialization exceptions
▢ Expose the state of the Kafka Streams application
▢ Be able to monitor offsets and lags of consumers and state stores
▢ Interactive Queries (optional)
▢ Package the Kafka Streams application for production
[Chart: Business Value vs Effort for a Kafka Streams application. Axes: Business Value (Low to High) and Effort (Low/Medium to High). Items plotted: Topology (Business Logic), Kafka Streams Management, IQ, Error Handling logic, Monitoring / Health-check, Security, Config Externalization, Streams Lifecycle, RocksDB, Offsets and Lags, Packaging.]
Azkarra Framework in a nutshell
#azkarrastreams
A lightweight Java framework to make a Kafka Streams application
production-ready in just a few lines of code.
■ Distributed under the Apache License 2.0.
■ Developed based on experience from a wide range of projects.
■ Uses best practices developed by Kafka users and the open-source community.
Overview:
■ REST API: Health Check, Monitoring, Interactive Queries, etc.
■ Embedded WebUI: Topology DAG Visualization
■ Built-in features for handling exceptions and tuning RocksDB
■ Support for Server-Sent Events
Azkarra Streams: How to use it?
Available on Maven Central
Azkarra Framework:
<dependency>
    <groupId>io.streamthoughts</groupId>
    <artifactId>azkarra-streams</artifactId>
    <version>0.9.2</version>
</dependency>
Provides reusable classes for Kafka Streams:
<dependency>
    <groupId>io.streamthoughts</groupId>
    <artifactId>azkarra-commons</artifactId>
    <version>0.9.2</version>
</dependency>
Quick start:
mvn archetype:generate \
    -DarchetypeGroupId=io.streamthoughts \
    -DarchetypeArtifactId=azkarra-quickstart-java \
    -DarchetypeVersion=0.9.2 \
    -DgroupId=azkarra.streams \
    -DartifactId=azkarra-getting-started \
    -Dversion=1.0 \
    -Dpackage=azkarra \
    -DinteractiveMode=false
Let’s rewrite the “Word Count” using Azkarra
(we still have 25’ left) 👾
Concepts: TopologyProvider
Container for building and configuring a Topology.
class WordCountTopology
implements TopologyProvider, Configurable {
private Conf conf;
@Override
public Topology topology() {
var source = conf.getString("topic.source.name");
var sink = conf.getString("topic.sink.name");
var store = conf.getString("store.name");
var builder = new StreamsBuilder();
builder
.<String, String>stream(source)
.flatMapValues(splitAndToLowercase())
.groupBy((key, value) -> value)
.count(Materialized.as(store))
.toStream()
.to(sink, Produced.with(Serdes.String(), Serdes.Long()));
return builder.build();
}
@Override
public void configure(final Conf conf) { this.conf = conf; }
@Override
public String version() { return "1.0"; }
}
Concepts: Execution Environment
StreamsExecutionEnvironment: manages the lifecycle of KafkaStreams instances.
[Diagram: StreamsExecutionEnvironment, TopologyProvider, Topology]
// (1) Define the KafkaStreams configuration
var streamsConfig = Conf.of(
BOOTSTRAP_SERVERS_CONFIG, "localhost:9092",
DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass(),
DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass()
);
// (2) Define the Topology configuration
var topologyConfig = Conf.of(
"topic.source.name", "topic-text-lines",
"topic.sink.name", "topic-text-word-count",
"store.name", "Count"
);
// (3) Create and configure a local execution environment
var env = LocalStreamsExecutionEnvironment
.create(Conf.of("streams", streamsConfig))
// (4) Register our topology to run
.registerTopology(
WordCountTopology::new,
Executed.as("WordCount").withConfig(topologyConfig)
);
// (5) Start the environment
env.start();
// (6) Add Shutdown Hook
Runtime.getRuntime()
.addShutdownHook(new Thread(env::stop));
Let’s start KafkaStreams
Boom! Transient Errors
word-count-1-0-ae1a9bf9-101d-4796-ad36-2e1130e83573-StreamThread-1] Received error code INCOMPLETE_SOURCE_TOPIC_METADATA
16:05:12.585 [word-count-1-0-ae1a9bf9-101d-4796-ad36-2e1130e83573-StreamThread-1] ERROR
org.apache.kafka.clients.consumer.internals.ConsumerCoordinator - [Consumer
clientId=word-count-1-0-ae1a9bf9-101d-4796-ad36-2e1130e83573-StreamThread-1-consumer, groupId=word-count-1-0] User provided listener
org.apache.kafka.streams.processor.internals.StreamsRebalanceListener failed on invocation of onPartitionsAssigned for partitions []
org.apache.kafka.streams.errors.MissingSourceTopicException: One or more source topics were missing during rebalance
at org.apache.kafka.streams.processor.internals.StreamsRebalanceListener.onPartitionsAssigned(StreamsRebalanceListener.java:57)
~[kafka-streams-2.7.0.jar:?]
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.invokePartitionsAssigned(ConsumerCoordinator.java:293) [kafka-clients-2.7.0.jar:?]
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:430) [kafka-clients-2.7.0.jar:?]
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:451) [kafka-clients-2.7.0.jar:?]
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:367) [kafka-clients-2.7.0.jar:?]
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:508) [kafka-clients-2.7.0.jar:?]
Concepts: StreamsLifecycleInterceptor
interface StreamsLifecycleInterceptor {
/**
* Intercepts the streams instance before being started.
*/
default void onStart(StreamsLifecycleContext context,
StreamsLifecycleChain chain) {
chain.execute();
}
/**
* Intercepts the streams instance before being stopped.
*/
default void onStop(StreamsLifecycleContext context,
StreamsLifecycleChain chain) {
chain.execute();
}
/**
* Used for logging information.
*/
default String name() {
return getClass().getSimpleName();
}
}
A pluggable interface that allows intercepting a
KafkaStreams instance before being started or
stopped.
Built-in Implementations:
■ AutoCreateTopicsInterceptor
■ WaitForSourceTopicsInterceptor
■ KafkaBrokerReadyInterceptor
...and a few more (discussed later) 😉
Most Interceptors are configurable.
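The chain-of-interceptors idea can be sketched in plain Java without any Kafka dependency. Everything below is a simplified stand-in, not Azkarra's actual API: the `Interceptor` interface, `start`, and `waitFor` are hypothetical names, and the "streams instance" is just a `Runnable`. The `waitFor` interceptor loosely mimics the idea behind waiting for source topics: it polls a condition and only then lets the chain continue.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.BooleanSupplier;

final class LifecycleChainSketch {

    /** Hypothetical, simplified interceptor: receives the rest of the chain as a Runnable. */
    interface Interceptor {
        void onStart(Runnable next);
    }

    /** Runs interceptors in order; the last link starts the (simulated) streams instance. */
    static void start(List<Interceptor> interceptors, Runnable startStreams) {
        Iterator<Interceptor> it = interceptors.iterator();
        Runnable chain = new Runnable() {
            @Override
            public void run() {
                if (it.hasNext()) {
                    it.next().onStart(this);
                } else {
                    startStreams.run();
                }
            }
        };
        chain.run();
    }

    /** Interceptor that polls a condition (e.g. "source topics exist") before continuing. */
    static Interceptor waitFor(BooleanSupplier condition, int maxAttempts, long backoffMs) {
        return next -> {
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                if (condition.getAsBoolean()) {
                    next.run();
                    return;
                }
                try {
                    Thread.sleep(backoffMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while waiting", e);
                }
            }
            throw new IllegalStateException("Condition not met after " + maxAttempts + " attempts");
        };
    }

    /** Small demo: a logging interceptor followed by a waiting interceptor. */
    static List<String> demo() {
        List<String> events = new ArrayList<>();
        int[] checks = {0};
        Interceptor logging = next -> {
            events.add("before-start");
            next.run();
            events.add("after-start");
        };
        Interceptor waiting = waitFor(() -> ++checks[0] >= 3, 5, 1L);
        start(List.of(logging, waiting),
                () -> events.add("streams-started after " + checks[0] + " checks"));
        return events;
    }

    public static void main(String[] args) {
        // prints [before-start, streams-started after 3 checks, after-start]
        System.out.println(demo());
    }
}
```

Passing the remainder of the chain to each interceptor lets an interceptor run code both before and after the start, or refuse to start at all by never calling `next`.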
Concepts: AutoCreateTopicsInterceptor
import static io.s.a.r.i.AutoCreateTopicsInterceptorConfig.*;
// (1) Define the KafkaStreams configuration
var streamsConfig = ...
// (2) Define the Topology configuration
var topologyConfig = ...
// (3) Define the Environment configuration
var envConfig = Conf.of(
"streams", streamsConfig,
AUTO_CREATE_TOPICS_NUM_PARTITIONS_CONFIG, 2,
AUTO_CREATE_TOPICS_REPLICATION_FACTOR_CONFIG, 1,
// WARN - ONLY DURING DEVELOPMENT
AUTO_DELETE_TOPICS_ENABLE_CONFIG, true
);
// (4) Create and configure the local execution environment
LocalStreamsExecutionEnvironment
.create(envConfig)
// (5) Add the StreamLifecycleInterceptor
.addStreamsLifecycleInterceptor(
AutoCreateTopicsInterceptor::new
)
// ...code omitted for clarity
Automatically infers the source and sink topics to
be created from the Topology.describe().
■ Internally, uses the AdminClient API.
■ Can be used during development for deleting
all topics when the instance is stopped.
Externalizing configuration
(we have 20’ left) 😀
Conf & AzkarraConf
External Configuration
// file:application.conf
azkarra {
// The configuration settings passed to the Kafka Streams
// instance should be prefixed with `.streams`
streams {
bootstrap.servers = "localhost:9092"
default.key.serde = "org.apache.kafka..Serdes$StringSerde"
default.value.serde = "org.apache.kafka..Serdes$StringSerde"
}
topic.source.name = "topic-text-lines"
topic.sink.name = "topic-text-word-count"
store.name = "Count"
auto.create.topics.num.partitions = 2
auto.create.topics.replication.factor = 1
auto.delete.topics.enable = true
}
// file:Main.class
var config = AzkarraConf.create().getSubConf("azkarra");
Azkarra provides the Configurable interface, which can be implemented by most Azkarra components:
void configure(final Conf configuration);
■ AzkarraConf: uses the Lightbend Config library.
○ Allows loading configuration settings from HOCON files.
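The idea of namespacing settings under a prefix (such as the `streams` block) and handing a component only its own sub-configuration can be illustrated with the JDK alone. This sketch deliberately uses `java.util.Properties` instead of HOCON and the Lightbend Config library; `PrefixedConfigSketch`, `load`, and `subConf` are hypothetical names and do not reflect how `AzkarraConf#getSubConf` is implemented.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

final class PrefixedConfigSketch {

    /** Parses settings from text in .properties syntax (stand-in for an external file). */
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Returns the entries whose key starts with "prefix.", with that prefix stripped. */
    static Properties subConf(Properties all, String prefix) {
        String dotted = prefix + ".";
        Properties sub = new Properties();
        for (String key : all.stringPropertyNames()) {
            if (key.startsWith(dotted)) {
                sub.setProperty(key.substring(dotted.length()), all.getProperty(key));
            }
        }
        return sub;
    }

    public static void main(String[] args) {
        Properties all = load(String.join("\n",
                "azkarra.streams.bootstrap.servers=localhost:9092",
                "azkarra.topic.source.name=topic-text-lines",
                "azkarra.store.name=Count"));
        // Only the settings meant for the Kafka Streams instance
        System.out.println(subConf(all, "azkarra.streams"));
    }
}
```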
Concepts: AzkarraContext
Container for Dependency Injection.
Used to automatically configure streams execution environments.
[Diagram: AzkarraContext, StreamsExecutionEnvironment, TopologyProvider, Topology]
public static void main(final String[] args) {
// (1) Load the configuration (application.conf)
var config = AzkarraConf.create().getSubConf("azkarra");
// (2) Create the Azkarra Context
var context = DefaultAzkarraContext.create(config);
// (3) Register StreamLifecycleInterceptor as component
context.registerComponent(
ConsoleStreamsLifecycleInterceptor.class
);
// (4) Register the Topology to the default environment
context.addTopology(
WordCountTopology.class,
Executed.as("word-count")
);
// (5) Start the context
context
.setRegisterShutdownHook(true)
.start();
}
Concepts: AzkarraApplication
Used to bootstrap and configure an Azkarra application.
■ Provides an embedded HTTP server
■ Provides component scanning
[Diagram: AzkarraApplication, AzkarraContext, StreamsExecutionEnvironment, TopologyProvider, Topology]
public class WordCount {
public static void main(final String[] args) {
// (1) Load the configuration (application.conf)
var config = AzkarraConf.create();
// (2) Create the Azkarra Context
var context = DefaultAzkarraContext.create();
// (3) Register the Topology to the default environment
context.addTopology(
WordCountTopology.class,
Executed.as("word-count")
);
// (4) Create Azkarra application
new AzkarraApplication()
.setContext(context)
.setConfiguration(config)
// (5) Enable and configure embedded HTTP server
.setHttpServerEnable(true)
.setHttpServerConf(ServerConfig.newBuilder()
.setListener("localhost")
.setPort(8080)
.build()
)
// (6) Start Azkarra
.run(args);
}
}
@AzkarraStreamsApplication
public class WordCount {
public static void main(String[] args) {
AzkarraApplication.run(WordCount.class, args);
}
@Component
public static class WordCountTopology implements
TopologyProvider, Configurable {
private Conf conf;
@Override
public Topology topology() {
var builder = new StreamsBuilder();
// ...code omitted for clarity
return builder.build();
}
@Override
public void configure(Conf conf) {
this.conf = conf;
}
@Override
public String version() { return "1.0"; }
}
}
Handling Deserialization Exceptions
(we have 15’ left) 🤔
Solution #1: Built-in mechanisms
default.deserialization.exception.handler
■ CONTINUE: continue with processing
■ FAIL: fail the processing and stop
Two available implementations:
■ LogAndContinueExceptionHandler
■ LogAndFailExceptionHandler
Not really suitable for production: corrupted messages cannot be monitored efficiently.
Solution #2: Dead Letter Queue Topic
DeserializationExceptionHandler: send corrupted messages to a special topic.
[Diagram: Source Topic, Handler, Topology; corrupted records are skipped and sent to the Dead Letter Topic]
Solution #3: Sentinel Value
Deserializer<T>: catch any exception thrown during deserialization
and return a default value (e.g., null, “N/A”, etc.).
[Diagram: Source Topic, SafeDeserializer wrapping a delegate Deserializer; corrupted records become (null)]
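The sentinel-value approach boils down to wrapping a delegate and swallowing its failures. Here is a minimal sketch of that pattern using only the JDK; it models a deserializer as a `Function<byte[], T>` rather than Kafka's `Deserializer<T>` interface, and `SafeDeserializerSketch`, `safe`, and `readLong` are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.util.function.Function;

final class SafeDeserializerSketch {

    /** Wraps a delegate so that any deserialization failure yields the sentinel instead. */
    static <T> Function<byte[], T> safe(Function<byte[], T> delegate, T sentinel) {
        return bytes -> {
            try {
                return delegate.apply(bytes);
            } catch (RuntimeException e) {
                // A real implementation would probably log or count the failure here.
                return sentinel;
            }
        };
    }

    /** Strict delegate: expects exactly 8 bytes holding a big-endian long. */
    static Long readLong(byte[] bytes) {
        if (bytes == null || bytes.length != Long.BYTES) {
            throw new IllegalArgumentException("Expected 8 bytes");
        }
        return ByteBuffer.wrap(bytes).getLong();
    }

    public static void main(String[] args) {
        Function<byte[], Long> safeLong = safe(SafeDeserializerSketch::readLong, -1L);
        System.out.println(safeLong.apply(ByteBuffer.allocate(8).putLong(42L).array())); // prints 42
        System.out.println(safeLong.apply(new byte[] {1, 2, 3}));                        // prints -1
    }
}
```

The sentinel must be a value that can never occur in valid data, otherwise downstream processors cannot tell a corrupted record from a legitimate one.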
Solution #2 using Azkarra: DeadLetterTopicExceptionHandler
■ By default, sends corrupted records to <Topic>-rejected
■ Doesn’t change the schema/format of the corrupted message.
■ Uses Kafka headers to trace the exception cause and origin, e.g.:
○ __errors.exception.stacktrace
○ __errors.exception.message
○ __errors.exception.class.name
○ __errors.timestamp
○ __errors.application.id
○ __errors.record.[topic|partition|offset]
■ Can be configured to send records to a different Kafka cluster than the one used by KafkaStreams.
Solution #3 using Azkarra: SafeSerdes
SafeSerdes.Long(-1L);
SafeSerdes.UUID(null);
SafeSerdes.serdeFrom(
    new JsonSerializer(),
    new JsonDeserializer(),
    NullNode.getInstance()
);
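To make the dead-letter header idea concrete, the sketch below builds a header map for a rejected record using the header names listed for DeadLetterTopicExceptionHandler. It is an illustration only: the class and method names are hypothetical, encoding every value as a UTF-8 string is an assumption, expanding `__errors.record.[topic|partition|offset]` into three separate keys is an assumption, and the stack trace header is left out for brevity.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

final class DeadLetterHeadersSketch {

    /** Default naming convention from the slide: <Topic>-rejected. */
    static String rejectedTopic(String sourceTopic) {
        return sourceTopic + "-rejected";
    }

    /** Builds tracing headers for a corrupted record (values encoded as UTF-8, an assumption). */
    static Map<String, byte[]> errorHeaders(Exception cause, String applicationId,
                                            String topic, int partition, long offset,
                                            long timestampMs) {
        Map<String, String> values = new LinkedHashMap<>();
        values.put("__errors.exception.message", String.valueOf(cause.getMessage()));
        values.put("__errors.exception.class.name", cause.getClass().getName());
        values.put("__errors.timestamp", Long.toString(timestampMs));
        values.put("__errors.application.id", applicationId);
        values.put("__errors.record.topic", topic);
        values.put("__errors.record.partition", Integer.toString(partition));
        values.put("__errors.record.offset", Long.toString(offset));

        Map<String, byte[]> headers = new LinkedHashMap<>();
        values.forEach((key, value) -> headers.put(key, value.getBytes(StandardCharsets.UTF_8)));
        return headers;
    }

    public static void main(String[] args) {
        Map<String, byte[]> headers = errorHeaders(
                new IllegalStateException("bad payload"),
                "word-count", "topic-text-lines", 0, 42L, System.currentTimeMillis());
        System.out.println(rejectedTopic("topic-text-lines") + " <- " + headers.keySet());
    }
}
```

Because the original key and value bytes are forwarded unchanged, a record in the rejected topic can be replayed later once the producer bug is fixed.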
Monitoring
(we have 10’ left) 🙃
How to monitor Kafka Streams?
The Kafka Streams API provides a few methods for monitoring the state of the running instance.
■ KafkaStreams#state(), KafkaStreams#setStateListener()
⎼ CREATED, REBALANCING, RUNNING, PENDING_SHUTDOWN, NOT_RUNNING, ERROR
⎼ can be used for checking the liveness and readiness of the instance.
■ KafkaStreams#localThreadsMetadata()
⎼ returns information about local threads/tasks and partition assignments.
■ KafkaStreams#metrics()
Best Practices:
■ Build some REST APIs to expose the states of Kafka Streams
■ Export metrics using JMX, Prometheus, etc.
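One possible way to turn the instance state into liveness and readiness answers is sketched below. The enum only mirrors the state names listed above so that the example runs without Kafka on the classpath; the mapping itself (which states count as alive or ready) is an assumed policy, not something prescribed by Kafka Streams or Azkarra.

```java
final class HealthSketch {

    /** Mirrors the state names listed on the slide (stand-in for KafkaStreams.State). */
    enum State { CREATED, REBALANCING, RUNNING, PENDING_SHUTDOWN, NOT_RUNNING, ERROR }

    /** Liveness (assumed policy): the instance should be restarted once it is dead. */
    static boolean isAlive(State state) {
        return state != State.ERROR && state != State.NOT_RUNNING;
    }

    /** Readiness (assumed policy): only a RUNNING instance processes records and serves queries. */
    static boolean isReady(State state) {
        return state == State.RUNNING;
    }

    public static void main(String[] args) {
        for (State state : State.values()) {
            System.out.println(state + " alive=" + isAlive(state) + " ready=" + isReady(state));
        }
    }
}
```

With this policy a rebalancing instance stays alive but is temporarily not ready, so an orchestrator would stop routing queries to it without killing it.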
Kafka Consumer Lag and Offsets
Maybe the most fundamental indicator to monitor.
Consumer: how far behind the producers are the Kafka Streams consumers?
public interface ConsumerInterceptor<K, V> extends Configurable, AutoCloseable {
    ConsumerRecords<K, V> onConsume(ConsumerRecords<K, V> records);
    void onCommit(Map<TopicPartition, OffsetAndMetadata> offsets);
    void close();
}
Configured using: main.consumer.interceptor.classes
KafkaStreams: is the application ready to process records and serve interactive queries?
■ KafkaStreams#allLocalStorePartitionLags()
■ KafkaStreams#setGlobalStateRestoreListener()
■ NOTE: internal KafkaStreams threads do not start consuming messages until stores are recovered.
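Consumer lag per partition is the difference between the log-end offset and the last committed offset. The sketch below computes it from two plain maps keyed by a "topic-partition" string; the class and method names are hypothetical, and treating a partition without a committed offset as starting from 0 is a simplifying assumption (the real starting point depends on the consumer's offset reset policy).

```java
import java.util.Map;
import java.util.TreeMap;

final class ConsumerLagSketch {

    /** lag = log-end offset - committed offset, never negative. */
    static Map<String, Long> lags(Map<String, Long> endOffsets, Map<String, Long> committedOffsets) {
        Map<String, Long> result = new TreeMap<>();
        for (Map.Entry<String, Long> end : endOffsets.entrySet()) {
            // Assumption: no committed offset yet means nothing has been consumed.
            long committed = committedOffsets.getOrDefault(end.getKey(), 0L);
            result.put(end.getKey(), Math.max(0L, end.getValue() - committed));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Long> end = Map.of("topic-text-lines-0", 100L, "topic-text-lines-1", 50L);
        Map<String, Long> committed = Map.of("topic-text-lines-0", 90L);
        // prints {topic-text-lines-0=10, topic-text-lines-1=50}
        System.out.println(lags(end, committed));
    }
}
```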
Azkarra REST API
Azkarra supports a REST API for managing, monitoring and querying Kafka Streams instances.
■ Provides support for Interactive Queries
■ Built-in authentication and authorization mechanisms (Basic Auth, SSL 2-Way).
■ Allows registration of new JAX-RS resources using a plugin interface: AzkarraRestExtension
● Get information about the local streams instance
GET /api/v1/streams
● Get the status for the streams instance
GET /api/v1/streams/(string: id)/status
● Get the configuration for the streams instance
GET /api/v1/streams/(string: id)/config
● Get current metrics for the streams instance
GET /api/v1/streams/(string: applicationId)/metrics
● Get all metrics in Prometheus format
GET /prometheus
[Logos: Micrometer, Prometheus]
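As a rough illustration of calling the status endpoint, the sketch below builds the request with the JDK's `java.net.http` API (Java 11+). The base URL `http://localhost:8080` matches the embedded server configured earlier in the slides; that the path parameter is the streams application id (here `word-count`) and that the response is JSON are assumptions, and `AzkarraRestClientSketch` is a hypothetical name.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class AzkarraRestClientSketch {

    /** Builds (but does not send) GET /api/v1/streams/{id}/status. */
    static HttpRequest statusRequest(String baseUrl, String streamsId) {
        URI uri = URI.create(baseUrl + "/api/v1/streams/" + streamsId + "/status");
        return HttpRequest.newBuilder(uri)
                .header("Accept", "application/json") // assumed response format
                .GET()
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = statusRequest("http://localhost:8080", "word-count");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Against a running instance, the request could then be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`.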
Putting it all together: Exporting Kafka Streams States Anywhere
Azkarra can be configured to periodically report the internal state of a KafkaStreams instance.
■ Uses a StreamsLifecycleInterceptor:
⎼ MonitoringStreamsInterceptor
■ Accepts a pluggable reporter class
⎼ Default: KafkaMonitoringReporter
⎼ Publishes events that adhere to the CloudEvents specification.
{
"id":
"appid:word-count;appsrv:localhost:8080;ts:1620691200000",
"source": "azkarra/ks/localhost:8080",
"specversion": "1.0",
"type": "io.streamthoughts.azkarra.streams.stateupdateevent",
"time": "2021-05-11T00:00:00.000+0000",
"datacontenttype": "application/json",
"ioazkarramonitorintervalms": 10000,
"ioazkarrastreamsappid": "word-count",
"ioazkarraversion": "0.9.2",
"ioazkarrastreamsappserver": "localhost:8080",
"data": {
"state": "RUNNING",
"threads": [
{
"name": "word-count-...-93e9a84057ad-StreamThread-1",
"state": "RUNNING",
"active_tasks": [],
"standby_tasks": [],
"clients": {}
}
],
"offsets": {
"group": "",
"consumers": []
},
"stores": {
"partitionRestoreInfos": [],
"partitionLagInfos": []
},
"state_changed_time": 1620691200000
}
}
Packaging
(we still have 5’ left) 😬
Packaging Kafka Streams with Azkarra
Azkarra-based applications can be packaged like any other Kafka Streams app.
Azkarra Worker → An empty Azkarra application
■ Topologies and components can be loaded from an external uber-jar
⎼ Similar to Kafka Connect plugins and connectors
■ Can be used as the base image for Docker
⎼ Use Jib to build optimized Docker images for Java
$ docker run --net host \
    -v "$(pwd)/application.conf:/etc/azkarra/azkarra.conf" \
    -v "$(pwd)/local-topologies:/usr/share/azkarra-components/" \
    streamthoughts/azkarra-streams-worker:latest
Jib + Docker + Azkarra = ❤
Deploying Kafka Streams with Azkarra (in Kubernetes)
Using Kubernetes, topologies can be downloaded and mounted using an init-container.
[Diagram: within a Deployment, StatefulSet, or similar, an InitContainer fetches my-topology-with-dependencies-1.0.jar (HTTP GET) from a repository manager (e.g., Nexus / Artifactory) into a shared volume (/var/lib/components/); the main container (image: azkarra-worker) is pointed at it through azkarra.component.paths.]
DONE
In less than 30 min using Azkarra 🚀
Demo
(new coins... we still have 5’ left) 🤫
Takeaways: Conclusion
Kafka Streams is a very good choice for quickly creating streaming applications.
But building applications for production can be a lot of work.
Azkarra aims to be a fast path to production by providing all the cool features you need:
■ Built-in mechanisms for handling exceptions
■ Built-in REST API for executing Interactive Queries
■ Consumer offsets and lag
■ Topology Visualization
■ Dashboard UI
Takeaways: Roadmap
■ Add support for querying stale stores.
■ Add support for deploying and managing Kafka Streams topologies directly in Kubernetes
❏ i.e., KubStreamsExecutionEnvironment
■ Enhance the WebUI to add some visualizations for the key metrics to monitor.
Takeaways: Links
Official Website: https://www.azkarrastreams.io/
GitHub: https://github.com/streamthoughts/azkarra-streams (for contributing and adding ⭐)
Slack: https://communityinviter.com/apps/azkarra-streams/azkarra-streams-community
Demo: https://github.com/streamthoughts/demo-kafka-streams-scottify
Thank you
Join us on Slack!
@fhussonnois
Florian HUSSONNOIS ▪ florian@streamthoughts.io
[Screenshots: Azkarra Dashboard]
Images & Icons
■ Photo by Mark König on Unsplash
■ Photo by CHUTTERSNAP on Unsplash


  • 1.
    Writing Blazing Fast,and Production Ready Kafka Streams apps (in less than 30 min) using Azkarra Kafka Summit Europe 2021 Florian HUSSONNOIS
  • 2.
    . @fhussonnois Consultant, Trainer SoftwareEngineer Co-founder @StreamThoughts Confluent Community Catalyst (2019/2021) Apache Kafka Streams contributor Open Source Technology Enthusiastic - Azkarra Streams - Kafka Connect File Pulse - Kafka Streams CEP - Kafka Client for Kotlin Hi, Im Florian Hussonnois 2
  • 3.
    3 Like me, youprobably started with the famous Word Count ! KStream<String, String> source = builder.stream("streams-plaintext-input"); source.flatMapValues(splitAndToLowercase()) .groupBy((key, value) -> value) .count(Materialized.as("counts-store")) .toStream() .to("streams-wordcount-output", Produced.with(Serdes.String(), Serdes.Long())); Topology topology = builder.build();
  • 4.
    4 KStream<String, String> source= builder.stream("streams-plaintext-input"); source.flatMapValues(splitAndToLowercase()) .groupBy((key, value) -> value) .count(Materialized.as("counts-store")) .toStream() .to("streams-wordcount-output", Produced.with(Serdes.String(), Serdes.Long())); Topology topology = builder.build(); GroupBy(Key) Repartition Stateful Stream Processing Consume Transform Aggregate / Join Produce 1 2 3
  • 5.
    public class WordCount{ public static void main(String[] args) { var builder = new StreamsBuilder (); KStream<String, String> source = builder.stream("streams-plaintext-input" ); source.flatMapValues(splitAndToLowercase ()) .groupBy((key, value) -> value) .count(Materialized.as("counts-store" )) .toStream() .to("streams-wordcount-output" , Produced.with(Serdes.String(), Serdes.Long())); var topology = builder.build(); Properties props = new Properties(); props.put(StreamsConfig.APPLICATION_ID_CONFIG , "streams-wordcount" ); props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG , "localhost:9092" ); props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG , Serdes.String().getClass()); props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG , Serdes.String().getClass()); var streams = new KafkaStreams(topology, props); Runtime.getRuntime().addShutdownHook (new Thread(streams::close )); } } Core Logic Execution 5 Configuration
  • 6.
    6 Can we deploya Kafka Streams application like this one in production, without any changes?
  • 7.
  • 8.
    8 (Well, unless youare testing your app in production…cough, cough...)
  • 9.
    9 (Well, unless youare testing your app in production…cough, cough...) OK, Nobody does that!
  • 10.
    ▢ Test theapp is working as expected ▢ Externalize configuration ▢ Handle transient errors ▢ Handle deserialization exceptions Some requirements before moving into production Our TODO list 10 ▢ Expose the state of the Kafka Streams application ▢ Be able to monitor offsets and lags of consumers and state stores ▢ Interactive Queries (optional) ▢ Package the Kafka Streams application for production
  • 11.
    . Business Valuevs Effort Topology (Business Logic) Business Value High Kafka Streams Management IQ Error Handling logic Monitoring / Health-check Security Config Externalization Low Effort Low/Medium High Streams Lifecycle Kafka Streams Application 11 RocksDB Offsets and Lags Packaging
  • 12.
    . A lightweight Javaframework to make a Kafka Streams application production-ready in just a few lines of code. ■ Distributed under the Apache License 2.0. ■ Was developed based on experience on a wide range of projects ■ Uses best-practices developed by Kafka users and the open-source community. Overview: ■ REST API: Health Check, Monitoring, Interactive Queries, etc ■ Embedded WebUI: Topology DAG Visualization ■ Built-in features for handling exceptions and tuning RocksDB ■ Support for Server-Sent Events Azkarra Framework in a nutshell 12 #azkarrastreams
  • 13.
    . Available on MavenCentral Azkarra Stream How to use It ? 13 <dependency> <groupId>io.streamthoughts </groupId> <artifactId>azkarra-streams </artifactId> <version>0.9.2</version> </dependency> Azkarra Framework: <dependency> <groupId>io.streamthoughts </groupId> <artifactId>azkarra-commons </artifactId> <version>0.9.2</version> </dependency> Provides reusable classes for Kafka Streams : mvn archetype:generate -DarchetypeGroupId =io.streamthoughts -DarchetypeArtifactId =azkarra-quickstart-java -DarchetypeVersion =0.9.2 -DgroupId=azkarra.streams -DartifactId=azkarra-getting-started -Dversion=1.0 -Dpackage=azkarra -DinteractiveMode =false Quick start:
  • 14.
    14 Let’s re-write the“Word Count” using with Azkarra (we have still 25’ left) 👾
  • 15.
    . . . Concepts TopologyProvider Topology Provider Topology Container forbuilding and configuring a Topology 15 class WordCountTopology implements TopologyProvider, Configurable { private Conf conf; @Override public Topology topology() { var source = conf.getString("topic.source.name"); var sink = conf.getString("topic.sink.name"); var store = conf.getString("store.name"); var builder = new StreamsBuilder(); builder .<String, String>stream(source) .flatMapValues(splitAndToLowercase()) .groupBy((key, value) -> value) .count(Materialized.as(store)) .toStream() .to(sink, Produced.with(Serdes.String(), Serdes.Long())); return builder.build(); } @Override public void configure(final Conf conf) { this.conf = conf; } @Override public String version() { return "1.0"; } }
  • 16.
    . . . Concepts Execution Environment StreamsExecution Environment Managesthe life cycle of KafkaStreams instances. Topology Provider Topology 16 // (1) Define the KafkaStreams configuration var streamsConfig = Conf.of( BOOTSTRAP_SERVERS_CONFIG, "localhost:9092", DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass(), DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass() ); // (2) Define the Topology configuration var topologyConfig = Conf.of( "topic.source.name", "topic-text-lines", "topic.sink.name", "topic-text-word-count", "store.name", "Count" ); // (3) Create and configure a local execution environment var env = LocalStreamsExecutionEnvironment .create(Conf.of("streams", streamsConfig)) // (4) Register our topology to run .registerTopology( WordCountTopology::new, Executed.as("WordCount").withConfig(topologyConfig) ); // (5) Start the environment env.start(); // (6) Add Shutdown Hook Runtime.getRuntime() .addShutdownHook(new Thread(env::stop));
  • 17.
    . 17 Let’s start KafkaStreams Boom!Transient Errors word-count-1-0-ae1a9bf9-101d-4796-ad36-2e1130e83573-StreamThread-1] Received error code INCOMPLETE_SOURCE_TOPIC_METADATA 16:05:12.585 [word-count-1-0-ae1a9bf9-101d-4796-ad36-2e1130e83573-StreamThread-1] ERROR org.apache.kafka.clients.consumer.internals.ConsumerCoordinator - [Consumer clientId=word-count-1-0-ae1a9bf9-101d-4796-ad36-2e1130e83573-StreamThread-1-consumer, groupId=word-count-1-0] User provided listener org.apache.kafka.streams.processor.internals.StreamsRebalanceListener failed on invocation of onPartitionsAssigned for partitions [] org.apache.kafka.streams.errors.MissingSourceTopicException: One or more source topics were missing during rebalance at org.apache.kafka.streams.processor.internals.StreamsRebalanceListener.onPartitionsAssigned(StreamsRebalanceListener.java:57) ~[kafka-streams-2.7.0.jar:?] at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.invokePartitionsAssigned(ConsumerCoordinator.java:293) [kafka-clients-2.7.0.jar:?] at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:430) [kafka-clients-2.7.0.jar:?] at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:451) [kafka-clients-2.7.0.jar:?] at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:367) [kafka-clients-2.7.0.jar:?] at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:508) [kafka-clients-2.7.0.jar:?]
  • 18.
    . . . 18 StreamLifecycleInterceptor Concepts Interface StreamsLifecycleInterceptor { /** *Intercepts the streams instance before being started. */ default void onStart(StreamsLifecycleContext context, StreamsLifecycleChain chain) { chain.execute(); } /** * Intercepts the streams instance before being stopped. */ default void onStop(StreamsLifecycleContext context, StreamsLifecycleChain chain) { chain.execute(); } /** * Used for logging information. */ default String name() { return getClass().getSimpleName(); } } A pluggable interface that allows intercepting a KafkaStreams instance before being started or stopped. Built-in Implementations: ■ AutoCreateTopicsInterceptor ■ WaitForSourceTopicsInterceptor ■ KafkaBrokerReadyInterceptor ...and a few more (discussed later) 😉 Most Interceptors are configurable.
  • 19.
    . . . 19 AutoCreateTopicsInterceptor Concepts import staticio.s.a.r.i.AutoCreateTopicsInterceptorConfig.*; // (1) Define the KafkaStreams configuration var streamsConfig = ... // (2) Define the Topology configuration var topologyConfig = ... // (3) Define the Environment configuration var envConfig = Conf.of( "streams", streamsConfig, AUTO_CREATE_TOPICS_NUM_PARTITIONS_CONFIG, 2, AUTO_CREATE_TOPICS_REPLICATION_FACTOR_CONFIG, 1, // WARN - ONLY DURING DEVELOPMENT AUTO_DELETE_TOPICS_ENABLE_CONFIG, true ); // (4) Create and configure the local execution environment LocalStreamsExecutionEnvironment .create(envConfig) // (5) Add the StreamLifecycleInterceptor .addStreamsLifecycleInterceptor( AutoCreateTopicsInterceptor::new ) // ...code omitted for clarity Automatically infers the source and sink topics to be created from the Topology.describe(). ■ Internally, uses the AdminClient API. ■ Can be used during development for deleting all topics when the instance is stopped. for
  • 20.
    ▢ Test theapp is working as expected ▢ Externalize configuration ▢ Handle transient errors ▢ Handle deserialization exceptions Externalizing configuration (we have 20’ left)😀 What's left to do ? 20 ▢ Expose the state of the Kafka Streams application ▢ Be able to monitor offsets and lags of consumers and state stores ▢ Interactive Queries (optional) ▢ Package the Kafka Streams application for production
  • 21.
    External Configuration: Conf & AzkarraConf

    Azkarra provides the Configurable interface, which can be implemented by most of the Azkarra components:

        void configure(final Conf configuration);

    ■ AzkarraConf: uses the Lightbend Config library.
      ○ Allows loading configuration settings from HOCON files.

        // file:application.conf
        azkarra {
          // The configuration settings passed to the Kafka Streams
          // instance should be prefixed with `.streams`
          streams {
            bootstrap.servers = "localhost:9092"
            default.key.serde = "org.apache.kafka..Serdes$StringSerde"
            default.value.serde = "org.apache.kafka..Serdes$StringSerde"
          }
          topic.source.name = "topic-text-lines"
          topic.sink.name = "topic-text-word-count"
          store.name = "Count"
          auto.create.topics.num.partitions = 2
          auto.create.topics.replication.factor = 1
          auto.delete.topics.enable = true
        }

        // file:Main.class
        var config = AzkarraConf.create().getSubConf("azkarra");
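To make the prefix convention concrete, here is a small JDK-only sketch of what extracting a sub-configuration amounts to: keep the keys under a prefix and strip that prefix. The `subConf` helper and the flat `Map` representation are assumptions for illustration; this is not how `AzkarraConf` or the Lightbend Config library is actually implemented.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of prefix-based sub-configuration over a flat map.
class SubConfSketch {

    /** Returns the entries found under the given prefix, with the prefix stripped. */
    static Map<String, Object> subConf(Map<String, Object> flat, String prefix) {
        String dotted = prefix + ".";
        Map<String, Object> result = new TreeMap<>();
        for (Map.Entry<String, Object> entry : flat.entrySet()) {
            if (entry.getKey().startsWith(dotted)) {
                result.put(entry.getKey().substring(dotted.length()), entry.getValue());
            }
        }
        return result;
    }

    /** Narrows a flattened application.conf down to the Kafka Streams settings. */
    static Map<String, Object> demo() {
        Map<String, Object> flat = Map.of(
            "azkarra.streams.bootstrap.servers", "localhost:9092",
            "azkarra.topic.source.name", "topic-text-lines",
            "azkarra.auto.create.topics.num.partitions", 2);
        Map<String, Object> azkarra = subConf(flat, "azkarra");
        return subConf(azkarra, "streams");
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```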
  • 22.
    Concepts: AzkarraContext

    Container for dependency injection. Used to automatically configure streams execution environments.

    [Diagram labels: AzkarraContext, StreamsExecutionEnvironment, TopologyProvider, Topology]

        public static void main(final String[] args) {
            // (1) Load the configuration (application.conf)
            var config = AzkarraConf.create().getSubConf("azkarra");
            // (2) Create the Azkarra Context
            var context = DefaultAzkarraContext.create(config);
            // (3) Register a StreamsLifecycleInterceptor as a component
            context.registerComponent(
                ConsoleStreamsLifecycleInterceptor.class
            );
            // (4) Register the Topology to the default environment
            context.addTopology(
                WordCountTopology.class,
                Executed.as("word-count")
            );
            // (5) Start the context
            context
                .setRegisterShutdownHook(true)
                .start();
        }
  • 23.
    Concepts: AzkarraApplication

    Used to bootstrap and configure an Azkarra application.
    ■ Provides an embedded HTTP server
    ■ Provides component scanning

    [Diagram labels: AzkarraApplication, AzkarraContext, StreamsExecutionEnvironment, TopologyProvider, Topology]

        public class WordCount {
            public static void main(final String[] args) {
                // (1) Load the configuration (application.conf)
                var config = AzkarraConf.create();
                // (2) Create the Azkarra Context
                var context = DefaultAzkarraContext.create();
                // (3) Register the Topology to the default environment
                context.addTopology(
                    WordCountTopology.class,
                    Executed.as("word-count")
                );
                // (4) Create Azkarra application
                new AzkarraApplication()
                    .setContext(context)
                    .setConfiguration(config)
                    // (5) Enable and configure embedded HTTP server
                    .setHttpServerEnable(true)
                    .setHttpServerConf(ServerConfig.newBuilder()
                        .setListener("localhost")
                        .setPort(8080)
                        .build()
                    )
                    // (6) Start Azkarra
                    .run(args);
            }
        }
  • 24.
    Concepts: AzkarraApplication (annotation-based)

    Used to bootstrap and configure an Azkarra application.
    ■ Provides an embedded HTTP server
    ■ Provides component scanning

    [Diagram labels: AzkarraApplication, AzkarraContext, StreamsExecutionEnvironment, TopologyProvider, Topology]

        @AzkarraStreamsApplication
        public class WordCount {
            public static void main(String[] args) {
                AzkarraApplication.run(WordCount.class, args);
            }

            @Component
            public static class WordCountTopology implements TopologyProvider, Configurable {
                private Conf conf;

                @Override
                public Topology topology() {
                    var builder = new StreamsBuilder();
                    // ...code omitted for clarity
                    return builder.build();
                }

                @Override
                public void configure(Conf conf) {
                    this.conf = conf;
                }

                @Override
                public String version() {
                    return "1.0";
                }
            }
        }
  • 25.
    Handling Deserialization Exceptions (we have 15’ left) 🤔

    What's left to do?
    ▢ Test that the app is working as expected
    ▢ Externalize configuration
    ▢ Handle transient errors
    ▢ Handle deserialization exceptions
    ▢ Expose the state of the Kafka Streams application
    ▢ Be able to monitor offsets and lags of consumers and state stores
    ▢ Interactive Queries (optional)
    ▢ Package the Kafka Streams application for production
  • 26.
    Solution #1: Built-in mechanisms

    default.deserialization.exception.handler
    ■ CONTINUE: continue with processing
    ■ FAIL: fail the processing and stop

    Two available implementations:
    ■ LogAndContinueExceptionHandler
    ■ LogAndFailExceptionHandler

    Not really suitable for production: corrupted messages cannot be monitored efficiently.
  • 27.
    Solution #2: Dead Letter Queue Topic
    DeserializationExceptionHandler: send corrupted messages to a special topic.
    [Diagram labels: Source Topic, Handler, Topology (skip), Dead Letter Topic]

    Solution #3: Sentinel Value
    Deserializer<T>: catch any exception thrown during deserialization and return a default value (e.g., null, "N/A", etc.).
    [Diagram labels: Source Topic, SafeDeserializer, Delegate Deserializer, (null)]
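The sentinel-value approach can be sketched with the JDK alone: wrap a delegate deserializer, catch whatever it throws, and return a default value instead. The `Deserializer` interface below is a local stand-in that only mirrors the `deserialize(topic, data)` shape of Kafka's interface, and `safe` and `LONG` are hypothetical names; this is an illustration of the pattern, not Azkarra's `SafeDeserializer`.

```java
import java.nio.ByteBuffer;

// Hypothetical, JDK-only illustration of the sentinel-value pattern.
class SafeDeserializerSketch {

    /** Local stand-in mirroring the deserialize(topic, data) shape. */
    @FunctionalInterface
    interface Deserializer<T> {
        T deserialize(String topic, byte[] data);
    }

    /** Wraps a delegate so that any runtime failure yields the sentinel value. */
    static <T> Deserializer<T> safe(Deserializer<T> delegate, T sentinel) {
        return (topic, data) -> {
            try {
                return delegate.deserialize(topic, data);
            } catch (RuntimeException e) {
                // A real implementation would probably log the failure here.
                return sentinel;
            }
        };
    }

    /** Strict delegate: rejects anything that is not exactly 8 bytes. */
    static final Deserializer<Long> LONG = (topic, data) -> {
        if (data == null || data.length != Long.BYTES) {
            throw new IllegalArgumentException("Expected " + Long.BYTES + " bytes");
        }
        return ByteBuffer.wrap(data).getLong();
    };

    public static void main(String[] args) {
        Deserializer<Long> safeLong = safe(LONG, -1L);
        byte[] valid = ByteBuffer.allocate(Long.BYTES).putLong(42L).array();
        System.out.println(safeLong.deserialize("topic", valid));
        System.out.println(safeLong.deserialize("topic", new byte[] {1, 2, 3}));
    }
}
```

Downstream processors then have to recognize the sentinel (for example, filter out `-1L` or `null`), which is the main trade-off compared with a dead letter topic.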
  • 28.
    Using Azkarra

    Solution #2: DeadLetterTopicExceptionHandler
    ■ By default, sends corrupted records to <Topic>-rejected
    ■ Doesn’t change the schema/format of the corrupted message.
    ■ Uses Kafka headers to trace the exception cause and origin, e.g.:
      ○ __errors.exception.stacktrace
      ○ __errors.exception.message
      ○ __errors.exception.class.name
      ○ __errors.timestamp
      ○ __errors.application.id
      ○ __errors.record.[topic|partition|offset]
    ■ Can be configured to send records to a different Kafka cluster from the one used by KafkaStreams.

    Solution #3: SafeSerdes

        SafeSerdes.Long(-1L);
        SafeSerdes.UUID(null);
        SafeSerdes.serdeFrom(
            new JsonSerializer(),
            new JsonDeserializer(),
            NullNode.getInstance()
        );
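As a rough illustration of the header-based tracing described above, the following JDK-only sketch builds the set of error headers for a rejected record. The header names come from the slide; the `errorHeaders` and `rejectedTopic` helpers, the use of a `Map`, and the string encoding of values are assumptions, not Azkarra's implementation.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of dead-letter error headers; values are encoded
// as strings here purely for readability.
class DeadLetterHeadersSketch {

    /** Derives the default dead letter topic name from the source topic. */
    static String rejectedTopic(String sourceTopic) {
        return sourceTopic + "-rejected";
    }

    /** Builds the tracing headers for a record that failed deserialization. */
    static Map<String, String> errorHeaders(Throwable cause,
                                            String applicationId,
                                            String topic,
                                            int partition,
                                            long offset,
                                            long timestampMs) {
        StringWriter stacktrace = new StringWriter();
        cause.printStackTrace(new PrintWriter(stacktrace, true));

        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("__errors.exception.stacktrace", stacktrace.toString());
        headers.put("__errors.exception.message", String.valueOf(cause.getMessage()));
        headers.put("__errors.exception.class.name", cause.getClass().getName());
        headers.put("__errors.timestamp", Long.toString(timestampMs));
        headers.put("__errors.application.id", applicationId);
        headers.put("__errors.record.topic", topic);
        headers.put("__errors.record.partition", Integer.toString(partition));
        headers.put("__errors.record.offset", Long.toString(offset));
        return headers;
    }

    public static void main(String[] args) {
        Map<String, String> headers = errorHeaders(
            new IllegalStateException("bad payload"),
            "word-count", "topic-text-lines", 0, 42L, System.currentTimeMillis());
        System.out.println(rejectedTopic("topic-text-lines"));
        headers.keySet().forEach(System.out::println);
    }
}
```

Because the original key and value bytes are forwarded untouched, a consumer of the dead letter topic can replay the record once the producer-side bug is fixed and use the headers only for diagnosis.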
  • 29.
    Monitoring (we have 10’ left) 🙃

    Our TODO list
    ▢ Test that the app is working as expected
    ▢ Externalize configuration
    ▢ Handle transient errors
    ▢ Handle deserialization exceptions
    ▢ Expose the state of the Kafka Streams application
    ▢ Be able to monitor offsets and lags of consumers and state stores
    ▢ Interactive Queries (optional)
    ▢ Package the Kafka Streams application for production
  • 30.
    How to monitor Kafka Streams?

    The Kafka Streams API provides a few methods for monitoring the state of the running instance.
    ■ KafkaStreams#state(), KafkaStreams#setStateListener()
      ⎼ CREATED, REBALANCING, RUNNING, PENDING_SHUTDOWN, NOT_RUNNING, ERROR
      ⎼ can be used for checking the liveness and readiness of the instance.
    ■ KafkaStreams#localThreadsMetadata()
      ⎼ returns information about local threads/tasks and partition assignments.
    ■ KafkaStreams#metrics()

    Best practices:
    ■ Build some REST APIs to expose the states of Kafka Streams
    ■ Export metrics using JMX, Prometheus, etc.
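One way to turn the instance state into liveness and readiness answers is sketched below with the JDK alone. The local `State` enum only mirrors the state names listed above, and the mapping itself (live unless ERROR or NOT_RUNNING, ready only when RUNNING) is an assumed policy that may need adjusting for a given deployment.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical health tracker fed by a state listener; JDK-only sketch.
class StreamsHealthSketch {

    /** Mirrors the state names listed on the slide. */
    enum State { CREATED, REBALANCING, RUNNING, PENDING_SHUTDOWN, NOT_RUNNING, ERROR }

    private final AtomicReference<State> current = new AtomicReference<>(State.CREATED);

    /** Same argument order as a state listener callback: (newState, oldState). */
    void onChange(State newState, State oldState) {
        current.set(newState);
    }

    /** Liveness (assumed policy): fail only once the instance is dead. */
    boolean isLive() {
        State state = current.get();
        return state != State.ERROR && state != State.NOT_RUNNING;
    }

    /** Readiness (assumed policy): only a RUNNING instance accepts traffic. */
    boolean isReady() {
        return current.get() == State.RUNNING;
    }

    public static void main(String[] args) {
        StreamsHealthSketch health = new StreamsHealthSketch();
        health.onChange(State.REBALANCING, State.CREATED);
        System.out.println("live=" + health.isLive() + " ready=" + health.isReady());
        health.onChange(State.RUNNING, State.REBALANCING);
        System.out.println("live=" + health.isLive() + " ready=" + health.isReady());
    }
}
```

With a real instance, the tracker would be wired through `KafkaStreams#setStateListener`, which has to be called before `start()`.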
  • 31.
    Kafka Consumer Lag and Offsets

    Maybe the most fundamental indicator to monitor.

    How far behind the producers are the Kafka Streams consumers?
    Consumer: use a ConsumerInterceptor, configured using main.consumer.interceptor.classes

        public interface ConsumerInterceptor<K, V> extends Configurable, AutoCloseable {
            ConsumerRecords<K, V> onConsume(ConsumerRecords<K, V> records);
            void onCommit(Map<TopicPartition, OffsetAndMetadata> offsets);
            void close();
        }

    Is the Kafka Streams application ready to process records and serve interactive queries?
    KafkaStreams:
    ■ KafkaStreams#allLocalStorePartitionLags()
    ■ KafkaStreams#setGlobalStateRestoreListener()
    ■ NOTE: Internal KafkaStreams threads do not start consuming messages until stores are recovered.
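Consumer lag itself is simple arithmetic: for each partition, the log end offset minus the last committed offset. The JDK-only sketch below shows that computation; the `lags` helper, the string partition keys, and the choice to treat a partition without a committed offset as "nothing consumed yet" are assumptions for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of per-partition consumer lag.
class ConsumerLagSketch {

    /** lag(partition) = endOffset - committedOffset, never below zero. */
    static Map<String, Long> lags(Map<String, Long> endOffsets, Map<String, Long> committed) {
        Map<String, Long> result = new TreeMap<>();
        for (Map.Entry<String, Long> entry : endOffsets.entrySet()) {
            // Assumption: no committed offset means nothing was consumed yet.
            long committedOffset = committed.getOrDefault(entry.getKey(), 0L);
            result.put(entry.getKey(), Math.max(0L, entry.getValue() - committedOffset));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Long> endOffsets = Map.of("topic-text-lines-0", 100L, "topic-text-lines-1", 50L);
        Map<String, Long> committed = Map.of("topic-text-lines-0", 90L);
        System.out.println(lags(endOffsets, committed));
    }
}
```

In a real application, the committed side can be captured in `ConsumerInterceptor#onCommit`, and the end offsets fetched from the consumer or admin client.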
  • 32.
    Azkarra REST API

    Azkarra supports a REST API for managing, monitoring, and querying Kafka Streams instances.
    ■ Provides support for Interactive Queries
    ■ Built-in authentication and authorization mechanisms (Basic Auth, two-way SSL).
    ■ Allows registration of new JAX-RS resources using a plugin interface: AzkarraRestExtension

    ● Get information about the local streams instance
      GET /api/v1/streams
    ● Get the status for the streams instance
      GET /api/v1/streams/(string: id)/status
    ● Get the configuration for the streams instance
      GET /api/v1/streams/(string: id)/config
    ● Get current metrics for the streams instance
      GET /api/v1/streams/(string: applicationId)/metrics
    ● Get all metrics in Prometheus format
      GET /prometheus

    [Logos: Micrometer, Prometheus]
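A client for these endpoints needs nothing beyond the JDK's `java.net.http` package. The sketch below only builds the requests; the base URL `http://localhost:8080` and the streams id `word-count` are assumptions taken from the earlier examples, and the `statusRequest` and `prometheusRequest` helpers are hypothetical names.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper building requests against the endpoints listed above.
class AzkarraRestSketch {

    /** GET /api/v1/streams/{id}/status */
    static HttpRequest statusRequest(String baseUrl, String streamsId) {
        URI uri = URI.create(baseUrl + "/api/v1/streams/" + streamsId + "/status");
        return HttpRequest.newBuilder(uri)
            .GET()
            .header("Accept", "application/json")
            .build();
    }

    /** GET /prometheus */
    static HttpRequest prometheusRequest(String baseUrl) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/prometheus")).GET().build();
    }

    public static void main(String[] args) {
        HttpRequest request = statusRequest("http://localhost:8080", "word-count");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

With a server running, a request can be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`.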
  • 33.
    Putting it all together: Exporting Kafka Streams States Anywhere

    Azkarra can be configured to periodically report the internal states of a KafkaStreams instance.
    ■ Uses a StreamsLifecycleInterceptor:
      ⎼ MonitoringStreamsInterceptor
    ■ Accepts a pluggable reporter class
      ⎼ Default: KafkaMonitoringReporter
      ⎼ Publishes events that adhere to the CloudEvents specification.

        {
          "id": "appid:word-count;appsrv:localhost:8080;ts:1620691200000",
          "source": "azkarra/ks/localhost:8080",
          "specversion": "1.0",
          "type": "io.streamthoughts.azkarra.streams.stateupdateevent",
          "time": "2021-05-11T00:00:00.000+0000",
          "datacontenttype": "application/json",
          "ioazkarramonitorintervalms": 10000,
          "ioazkarrastreamsappid": "word-count",
          "ioazkarraversion": "0.9.2",
          "ioazkarrastreamsappserver": "localhost:8080",
          "data": {
            "state": "RUNNING",
            "threads": [
              {
                "name": "word-count-...-93e9a84057ad-StreamThread-1",
                "state": "RUNNING",
                "active_tasks": [],
                "standby_tasks": [],
                "clients": {}
              }
            ],
            "offsets": { "group": "", "consumers": [] },
            "stores": { "partitionRestoreInfos": [], "partitionLagInfos": [] },
            "state_changed_time": 1620691200000
          }
        }

    [Logo: CloudEvents]
  • 34.
    Packaging (we still have 5’ left) 😬

    Our TODO list
    ▢ Test that the app is working as expected
    ▢ Externalize configuration
    ▢ Handle transient errors
    ▢ Handle deserialization exceptions
    ▢ Expose the state of the Kafka Streams application
    ▢ Be able to monitor offsets and lags of consumers and state stores
    ▢ Interactive Queries (optional)
    ▢ Package the Kafka Streams application for production
  • 35.
    Packaging Kafka Streams with Azkarra

    Azkarra-based applications can be packaged like any other Kafka Streams app.

    Azkarra Worker → an empty Azkarra application
    ■ Topologies and components can be loaded from an external uber-jar
      ⎼ Similar to Kafka Connect plugins and connectors
    ■ Can be used as the base image for Docker
      ⎼ Use Jib to build optimized Docker images for Java

        $ docker run --net host \
            -v "$(pwd)/application.conf:/etc/azkarra/azkarra.conf" \
            -v "$(pwd)/local-topologies:/usr/share/azkarra-components/" \
            streamthoughts/azkarra-streams-worker:latest

    Jib + Docker + Azkarra = ❤
  • 36.
    Deploying Kafka Streams with Azkarra (in Kubernetes)

    Using Kubernetes, topologies can be downloaded and mounted using an init container.

    [Diagram: in a Deployment, StatefulSet, or similar, an InitContainer fetches my-topology-with-dependencies-1.0.jar over HTTP GET from a repository manager (e.g., Nexus / Artifactory) into a shared volume, /var/lib/components/; the main container (image: azkarra-worker) loads it through azkarra.component.paths]
  • 37.
    DONE: in less than 30 min using Azkarra 🚀

    ▢ Test that the app is working as expected
    ▢ Externalize configuration
    ▢ Handle transient errors
    ▢ Handle deserialization exceptions
    ▢ Expose the state of the Kafka Streams application
    ▢ Be able to monitor offsets and lags of consumers and state stores
    ▢ Interactive Queries (optional)
    ▢ Package the Kafka Streams application for production
  • 38.
    Demo (new coins... we still have 5’ left) 🤫
  • 39.
    Takeaways: Conclusion

    Kafka Streams is a very good choice for quickly creating streaming applications. But building applications for production can be a lot of work.

    Azkarra aims to be a fast path to production by providing all the cool features you need:
    ■ Built-in mechanisms for handling exceptions
    ■ Built-in REST API for executing Interactive Queries
    ■ Consumer offsets and lag
    ■ Topology visualization
    ■ Dashboard UI
  • 40.
    Takeaways: Roadmap

    ■ Add support for querying stale stores.
    ■ Add support for deploying and managing Kafka Streams topologies directly in Kubernetes
      ❏ i.e., KubStreamsExecutionEnvironment
    ■ Enhance the Web UI with visualizations of the key metrics to monitor.
  • 41.
    Takeaways: Links

    Official Website: https://www.azkarrastreams.io/
    GitHub: https://github.com/streamthoughts/azkarra-streams (for contributing and adding ⭐)
    Slack: https://communityinviter.com/apps/azkarra-streams/azkarra-streams-community
    Demo: https://github.com/streamthoughts/demo-kafka-streams-scottify

    Join us on Slack!
  • 42.
    Thank you! @fhussonnois ▪ Florian HUSSONNOIS ▪ florian@streamthoughts.io
  • 43.
  • 44.
  • 45.
    Images & Icons

    Images
    ■ Photo by Mark König on Unsplash
    ■ Photo by CHUTTERSNAP on Unsplash