18CSE489T - STREAMING ANALYTICS
UNIT-2
Session-1
SLO 1 – Getting Started with Kafka
SRM Institute of Science and Technology, Ramapuram 1
Getting Started with
Kafka
 Apache Kafka was open-sourced as an Apache project in 2011 and became a first-class
(top-level) Apache project in 2012.
 Kafka is written in Scala and Java.
 Apache Kafka is a publish-subscribe based, fault-tolerant messaging system.
 It is fast, scalable, and distributed by design.
SRM Institute of Science and Technology, Ramapuram 2
Why Kafka? Publish
Subscribe messaging model
 In Big Data, an enormous volume of data is used.
 Regarding data, we face two main challenges.
 The first challenge is how to collect the large volume of data, and
 the second challenge is how to analyze the collected data.
 To overcome these challenges, you need a messaging system.
 Kafka is designed for distributed, high-throughput systems.
 Kafka tends to work very well as a replacement for a more traditional message broker.
 In comparison to other messaging systems, Kafka has better throughput, built-in
partitioning, replication, and inherent fault tolerance, which makes it a good fit for large-
scale message processing applications.
SRM Institute of Science and Technology, Ramapuram 3
Why Kafka? Publish
Subscribe messaging model
 Why Kafka?
 Multiple Producers
 Multiple Consumers
 Disk Retention
 Scalable
 High Performance.
SRM Institute of Science and Technology, Ramapuram 4
Why Kafka? Publish
Subscribe messaging model
What is a Messaging System?
 A Messaging System is responsible for transferring data from one application to
another, so the applications can focus on the data without worrying about how to share
it.
 Distributed messaging is based on the concept of reliable message queuing.
 Messages are queued asynchronously between client applications and messaging
system.
 Two types of messaging patterns are available:
 one is point-to-point and
 the other is publish-subscribe (pub-sub) messaging.
 Most messaging systems follow the pub-sub pattern.
SRM Institute of Science and Technology, Ramapuram 5
Why Kafka? Publish
Subscribe messaging model
Publish-Subscribe Messaging System
 In the publish-subscribe system, messages are persisted in a topic.
 Unlike the point-to-point system, consumers can subscribe to one or more topics and
consume all the messages in those topics.
 In the Publish-Subscribe system, message producers are called publishers and
message consumers are called subscribers.
 A real-life example is Dish TV, which publishes different channels like sports,
movies, music, etc., and anyone can subscribe to their own set of channels and
get them whenever their subscribed channels are available.
SRM Institute of Science and Technology, Ramapuram 6
Why Kafka? Publish
Subscribe messaging model
SRM Institute of Science and Technology, Ramapuram 7
Why Kafka? Publish
Subscribe messaging model
Following are a few benefits of Kafka −
 Reliability − Kafka is distributed, partitioned, replicated, and fault tolerant.
 Scalability − The Kafka messaging system scales easily without downtime.
 Durability − Kafka uses a distributed commit log, which means messages are persisted
on disk as fast as possible; hence it is durable.
 Performance − Kafka has high throughput for both publishing and subscribing to
messages. It maintains stable performance even when many terabytes of messages are
stored.
Kafka is very fast and is designed for zero downtime and zero data loss.
SRM Institute of Science and Technology, Ramapuram 8
18CSE489T - STREAMING ANALYTICS
UNIT-2
Session-1
SLO 2 – Why Kafka? Publish Subscribe
Messaging Model
SRM Institute of Science and Technology, Ramapuram 1
Why Kafka? Publish
Subscribe messaging model
 In Big Data, an enormous volume of data is used.
 Regarding data, we face two main challenges.
 The first challenge is how to collect the large volume of data, and
 the second challenge is how to analyze the collected data.
 To overcome these challenges, you need a messaging system.
 Kafka is designed for distributed, high-throughput systems.
 Kafka tends to work very well as a replacement for a more traditional message broker.
 In comparison to other messaging systems, Kafka has better throughput, built-in
partitioning, replication, and inherent fault tolerance, which makes it a good fit for large-
scale message processing applications.
SRM Institute of Science and Technology, Ramapuram 2
Why Kafka? Publish
Subscribe messaging model
 Why Kafka?
 Multiple Producers
 Multiple Consumers
 Disk Retention
 Scalable
 High Performance.
SRM Institute of Science and Technology, Ramapuram 3
Why Kafka? Publish
Subscribe messaging model
What is a Messaging System?
 A Messaging System is responsible for transferring data from one application to
another, so the applications can focus on the data without worrying about how to share
it.
 Distributed messaging is based on the concept of reliable message queuing.
 Messages are queued asynchronously between client applications and messaging
system.
 Two types of messaging patterns are available:
 one is point-to-point and
 the other is publish-subscribe (pub-sub) messaging.
 Most messaging systems follow the pub-sub pattern.
SRM Institute of Science and Technology, Ramapuram 4
Why Kafka? Publish
Subscribe messaging model
Publish-Subscribe Messaging System
 In the publish-subscribe system, messages are persisted in a topic.
 Unlike the point-to-point system, consumers can subscribe to one or more topics and
consume all the messages in those topics.
 In the Publish-Subscribe system, message producers are called publishers and
message consumers are called subscribers.
 A real-life example is Dish TV, which publishes different channels like sports,
movies, music, etc., and anyone can subscribe to their own set of channels and
get them whenever their subscribed channels are available.
SRM Institute of Science and Technology, Ramapuram 5
Why Kafka? Publish
Subscribe messaging model
SRM Institute of Science and Technology, Ramapuram 6
Why Kafka? Publish
Subscribe messaging model
Following are a few benefits of Kafka −
 Reliability − Kafka is distributed, partitioned, replicated, and fault tolerant.
 Scalability − The Kafka messaging system scales easily without downtime.
 Durability − Kafka uses a distributed commit log, which means messages are persisted
on disk as fast as possible; hence it is durable.
 Performance − Kafka has high throughput for both publishing and subscribing to
messages. It maintains stable performance even when many terabytes of messages are
stored.
Kafka is very fast and is designed for zero downtime and zero data loss.
SRM Institute of Science and Technology, Ramapuram 7
18CSE489T - STREAMING ANALYTICS
UNIT-2
Session-2
SLO 1 – Kafka Architecture
SRM Institute of Science and Technology, Ramapuram 1
Kafka Architecture
 Topics, partitions, producers, consumers, etc., together form the Kafka architecture.
 Although different applications shape their Kafka deployment in their own way, the
following essential parts are required to design an Apache Kafka architecture.
SRM Institute of Science and Technology, Ramapuram 2
Kafka Architecture
o Data Ecosystem: The applications that use Apache Kafka form an ecosystem. This
ecosystem is built for data processing. It takes input from applications that create data,
and its outputs are defined in the form of metrics, reports, etc. The diagram below
represents a circulatory data ecosystem for Kafka.
o Kafka Cluster: A Kafka cluster is a system that comprises different brokers, topics,
and their respective partitions. Data is written to topics within the cluster by producers
and read from them by consumers.
o Producers: A producer sends or writes data/messages to a topic within the cluster. In
order to store a huge amount of data, different producers within an application send data
to the Kafka cluster.
SRM Institute of Science and Technology, Ramapuram 3
Kafka Architecture
o Consumers: A consumer is the one that reads or consumes messages from the Kafka
cluster. There can be several consumers consuming different types of data from the
cluster. The beauty of Kafka is that each consumer knows from where it needs to
consume the data.
o Brokers: A Kafka server is known as a broker. A broker is a bridge between producers
and consumers. If a producer wishes to write data to the cluster, it is sent to a Kafka
server. All brokers lie within the Kafka cluster itself, and there can be multiple brokers.
o Topics: A topic is a common name or heading given to represent a similar type of data.
In Apache Kafka, there can be multiple topics in a cluster. Each topic holds a different
type of message.
SRM Institute of Science and Technology, Ramapuram 4
Kafka Architecture
o Partitions: The data or messages of a topic are divided into small subparts, known as
partitions. Each partition carries data within it, and every message has an offset value.
The data is always written in a sequential manner. A topic can have a large number of
partitions, and offset values grow without bound. However, unless a key is provided, it
is not guaranteed to which partition a message will be written.
SRM Institute of Science and Technology, Ramapuram 5
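 The behaviour described above can be observed from the producer side. The sketch below is added here for illustration (the topic name "demo-topic" and the producer variable are assumptions); it sends one record without a key and prints the partition and offset reported back by the broker, showing that the partition is chosen by the producer rather than guaranteed in advance:
ProducerRecord<String, String> record = new ProducerRecord<>("demo-topic", "some value"); // no key
producer.send(record, (metadata, exception) -> {
  if (exception == null) {
    // RecordMetadata reports where the record actually landed
    System.out.printf("partition=%d, offset=%d%n", metadata.partition(), metadata.offset());
  } else {
    exception.printStackTrace();
  }
});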
Kafka Architecture
o ZooKeeper: ZooKeeper is used to store information about the Kafka cluster and
details of the consumer clients. It manages brokers by maintaining a list of them, and it
is also responsible for choosing a leader for each partition. If any change occurs, such as
a broker dying or a new topic being created, ZooKeeper sends a notification to Apache
Kafka. A ZooKeeper ensemble is designed to operate with an odd number of servers.
ZooKeeper has a leader server that handles all the writes, and the rest of the servers are
followers that handle the reads. However, a user does not interact with ZooKeeper
directly, but via the brokers. No Kafka server can run without a ZooKeeper server; it is
mandatory to run the ZooKeeper server.
SRM Institute of Science and Technology, Ramapuram 6
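 As a minimal illustration of the broker-to-ZooKeeper relationship (a sketch added here; host names and paths are placeholders, not values from the slides), a broker's server.properties points at the ZooKeeper ensemble through the zookeeper.connect setting:
# server.properties (sketch; values are examples only)
broker.id=0
listeners=PLAINTEXT://localhost:9092
log.dirs=/tmp/kafka-logs
zookeeper.connect=zk1:2181,zk2:2181,zk3:2181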
Kafka Architecture
 In the above figure, there are three ZooKeeper servers, where server 2 is the leader and
the other two are its followers. The five brokers are connected to these servers. The
Kafka cluster automatically comes to know when brokers are down, more topics are
added, and so on. Hence, combining all these necessities, a Kafka cluster architecture is
designed.
SRM Institute of Science and Technology, Ramapuram 7
18CSE489T - STREAMING ANALYTICS
UNIT-2
Session-2
SLO 2 – Messages and Batches, Schemas
SRM Institute of Science and Technology, Ramapuram 1
Messages and Batches,
Schemas, Topics and Partitions
 The unit of data within Kafka is called a message.
 If you are approaching Kafka from a database background, you can think of this as
similar to a row or a record.
 A message is simply an array of bytes as far as Kafka is concerned, so the data contained
within it does not have a specific format or meaning to Kafka.
 A message can have an optional bit of metadata, which is referred to as a key.
 The key is also a byte array and, as with the message, has no specific meaning to Kafka.
Keys are used when messages are to be written to partitions in a more controlled manner.
SRM Institute of Science and Technology, Ramapuram 2
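 To make the role of keys concrete, here is a small sketch (added for illustration; the key value is made up, and the topic name follows the producer examples later in this deck). Records that share a key are written to the same partition by the default partitioner, while records without a key leave the choice to the producer:
ProducerRecord<String, String> keyed =
    new ProducerRecord<>("CustomerCountry", "customer-42", "France"); // key = "customer-42"
ProducerRecord<String, String> unkeyed =
    new ProducerRecord<>("CustomerCountry", "France"); // no key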
Messages and Batches,
Schemas, Topics and Partitions
 For efficiency, messages are written into Kafka in batches.
 A batch is just a collection of messages, all of which are being produced to the same
topic and partition.
 An individual roundtrip across the network for each message would result in excessive
overhead, and collecting messages together into a batch reduces this.
 Schemas: While messages are opaque byte arrays to Kafka itself, it is recommended that
additional structure, or schema, be imposed on the message content so that it can be
easily understood. There are many options available for message schema, depending on
your application’s individual needs. Simplistic systems, such as JavaScript Object
Notation (JSON) and Extensible Markup Language (XML), are easy to use and human-
readable. However, they lack features such as robust type handling and compatibility
between schema versions. Many Kafka developers favor the use of Apache Avro, which
is a serialization framework originally developed for Hadoop.
SRM Institute of Science and Technology, Ramapuram 3
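 Batching is controlled through standard producer configuration properties. The values below are only examples, and props is assumed to be the Properties object used to build the producer (as in the producer examples later in this deck):
props.put("batch.size", 16384);          // upper bound, in bytes, for one batch per partition
props.put("linger.ms", 10);              // wait up to 10 ms to fill a batch before sending it
props.put("compression.type", "snappy"); // batches are compressed as a unit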
Messages and Batches,
Schemas, Topics and Partitions
 Topics and Partitions
 Messages in Kafka are categorized into topics. The closest analogies for a topic are a
database table or a folder in a filesystem. Topics are additionally broken down into a
number of partitions. Going back to the “commit log” description, a partition is a single
log. Messages are written to it in an append-only fashion, and are read in order from
beginning to end. Note that as a topic typically has multiple partitions, there is no
guarantee of message time-ordering across the entire topic, just within a single partition.
Figure 1-5 shows a topic with four partitions, with writes being appended to the end of
each one. Partitions are also the way that Kafka provides redundancy and scalability.
Each partition can be hosted on a different server, which means that a single topic can be
scaled horizontally across multiple servers to provide performance far beyond the ability
of a single server.
SRM Institute of Science and Technology, Ramapuram 4
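 For context, a topic with several partitions can be created programmatically. The sketch below uses the Kafka AdminClient (the topic name, partition count, and replication factor are arbitrary illustrations, not values from the slides):
Properties adminProps = new Properties();
adminProps.put("bootstrap.servers", "broker1:9092");
try (AdminClient admin = AdminClient.create(adminProps)) {
  NewTopic topic = new NewTopic("purchases", 4, (short) 1); // 4 partitions, replication factor 1
  admin.createTopics(Collections.singleton(topic)).all().get(); // throws checked exceptions; handle or declare
}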
Messages and Batches,
Schemas, Topics and Partitions
SRM Institute of Science and Technology, Ramapuram 5
Messages and Batches,
Schemas, Topics and Partitions
 The term stream is often used when discussing data within systems like Kafka. Most
often, a stream is considered to be a single topic of data, regardless of the number of
partitions. This represents a single stream of data moving from the producers to the
consumers. This way of referring to messages is most common when discussing stream
processing, which is when frameworks—some of which are Kafka Streams, Apache
Samza, and Storm—operate on the messages in real time. This method of operation can
be compared to the way offline frameworks, such as Hadoop, are designed to work on
bulk data at a later time.
SRM Institute of Science and Technology, Ramapuram 6
18CSE489T - STREAMING ANALYTICS
UNIT-2
Session-3
SLO 1 – Topics and Partitions
SRM Institute of Science and Technology, Ramapuram 1
Messages and Batches,
Schemas, Topics and Partitions
 The unit of data within Kafka is called a message.
 If you are approaching Kafka from a database background, you can think of this as
similar to a row or a record.
 A message is simply an array of bytes as far as Kafka is concerned, so the data contained
within it does not have a specific format or meaning to Kafka.
 A message can have an optional bit of metadata, which is referred to as a key.
 The key is also a byte array and, as with the message, has no specific meaning to Kafka.
Keys are used when messages are to be written to partitions in a more controlled manner.
SRM Institute of Science and Technology, Ramapuram 2
Messages and Batches,
Schemas, Topics and Partitions
 For efficiency, messages are written into Kafka in batches.
 A batch is just a collection of messages, all of which are being produced to the same
topic and partition.
 An individual roundtrip across the network for each message would result in excessive
overhead, and collecting messages together into a batch reduces this.
 Schemas: While messages are opaque byte arrays to Kafka itself, it is recommended that
additional structure, or schema, be imposed on the message content so that it can be
easily understood. There are many options available for message schema, depending on
your application’s individual needs. Simplistic systems, such as JavaScript Object
Notation (JSON) and Extensible Markup Language (XML), are easy to use and human-
readable. However, they lack features such as robust type handling and compatibility
between schema versions. Many Kafka developers favor the use of Apache Avro, which
is a serialization framework originally developed for Hadoop.
SRM Institute of Science and Technology, Ramapuram 3
Messages and Batches,
Schemas, Topics and Partitions
 Topics and Partitions
 Messages in Kafka are categorized into topics. The closest analogies for a topic are a
database table or a folder in a filesystem. Topics are additionally broken down into a
number of partitions. Going back to the “commit log” description, a partition is a single
log. Messages are written to it in an append-only fashion, and are read in order from
beginning to end. Note that as a topic typically has multiple partitions, there is no
guarantee of message time-ordering across the entire topic, just within a single partition.
Figure 1-5 shows a topic with four partitions, with writes being appended to the end of
each one. Partitions are also the way that Kafka provides redundancy and scalability.
Each partition can be hosted on a different server, which means that a single topic can be
scaled horizontally across multiple servers to provide performance far beyond the ability
of a single server.
SRM Institute of Science and Technology, Ramapuram 4
Messages and Batches,
Schemas, Topics and Partitions
SRM Institute of Science and Technology, Ramapuram 5
Messages and Batches,
Schemas, Topics and Partitions
 The term stream is often used when discussing data within systems like Kafka. Most
often, a stream is considered to be a single topic of data, regardless of the number of
partitions. This represents a single stream of data moving from the producers to the
consumers. This way of referring to messages is most common when discussing stream
processing, which is when frameworks—some of which are Kafka Streams, Apache
Samza, and Storm—operate on the messages in real time. This method of operation can
be compared to the way offline frameworks, such as Hadoop, are designed to work on
bulk data at a later time.
SRM Institute of Science and Technology, Ramapuram 6
18CSE489T - STREAMING ANALYTICS
UNIT-2
Session-3
SLO 2 – Producers and Consumers
SRM Institute of Science and Technology, Ramapuram 1
Producers and Consumers
 Producers create new messages. In other publish/subscribe systems, these may be called
publishers or writers.
 Consumers read messages. In other publish/subscribe systems, these clients may be
called subscribers or readers.
SRM Institute of Science and Technology, Ramapuram 2
Brokers and Clusters
 A single Kafka server is called a broker. The broker receives messages from producers,
assigns offsets to them, and commits the messages to storage on disk.
 Kafka brokers are designed to operate as part of a cluster. Within a cluster of brokers, one
broker will also function as the cluster controller (elected automatically from the live
members of the cluster). The controller is responsible for administrative operations,
including assigning partitions to brokers and monitoring for broker failures. A partition is
owned by a single broker in the cluster, and that broker is called the leader of the
partition.
SRM Institute of Science and Technology, Ramapuram 3
Brokers and Clusters
SRM Institute of Science and Technology, Ramapuram 4
Data Ecosystem
SRM Institute of Science and Technology, Ramapuram 5
Use cases
 Activity tracking
 Messaging
 Metrics and Logging
 Commit log
 Stream Processing
SRM Institute of Science and Technology, Ramapuram 6
Sending Messages with
Producers Steps & Example
 The simplest way to send a message is as follows:
ProducerRecord<String, String> record = new ProducerRecord<>("CustomerCountry", "Precision Products",
"France");
try {
producer.send(record);
} catch (Exception e) {
e.printStackTrace();
}
SRM Institute of Science and Technology, Ramapuram 7
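 The snippet above assumes that a producer object already exists. A minimal sketch of constructing it (broker addresses are placeholders; the serializer class names are the standard String serializers) looks like this:
Properties kafkaProps = new Properties();
kafkaProps.put("bootstrap.servers", "broker1:9092,broker2:9092");
kafkaProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
kafkaProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
KafkaProducer<String, String> producer = new KafkaProducer<>(kafkaProps);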
Sending Messages with
Producers Steps & Example
 Sending a Message Synchronously
The simplest way to send a message synchronously is as follows:
ProducerRecord<String, String> record = new ProducerRecord<>("CustomerCountry", "Precision
Products", "France");
try {
producer.send(record).get();
} catch (Exception e) {
e.printStackTrace();
}
SRM Institute of Science and Technology, Ramapuram 8
Sending Messages with
Producers Steps & Example
 Sending a Message Asynchronously
private class DemoProducerCallback implements Callback {
@Override
public void onCompletion(RecordMetadata recordMetadata, Exception e) {
if (e != null) {
e.printStackTrace();
}
}
}
ProducerRecord<String, String> record =
new ProducerRecord<>("CustomerCountry", "Biomedical Materials", "USA");
producer.send(record, new DemoProducerCallback());
SRM Institute of Science and Technology, Ramapuram 9
Receiving Messages with
Consumers Steps & Example
 Creating a Kafka Consumer
The following code snippet shows how to create a KafkaConsumer:
Properties props = new Properties();
props.put("bootstrap.servers", "broker1:9092,broker2:9092");
props.put("group.id", "CountryCounter");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
KafkaConsumer<String, String> consumer = new KafkaConsumer<String,
String>(props);
SRM Institute of Science and Technology, Ramapuram 10
Receiving Messages with
Consumers Steps & Example
 Subscribing to Topics
 The subscribe() method takes a list of topics as a parameter, so it’s pretty simple to
use:
consumer.subscribe(Collections.singletonList("customerCountries"));
 To subscribe to all topics whose names match a regular expression (for example, all test topics), we can call:
consumer.subscribe(Pattern.compile("test.*"));
SRM Institute of Science and Technology, Ramapuram 11
Receiving Messages with
Consumers Steps & Example
The Poll Loop
try {
  while (true) {
    ConsumerRecords<String, String> records = consumer.poll(100);
    for (ConsumerRecord<String, String> record : records) {
      log.debug("topic = %s, partition = %s, offset = %d, customer = %s, country = %s\n",
          record.topic(), record.partition(), record.offset(), record.key(), record.value());
      int updatedCount = 1;
      if (custCountryMap.containsKey(record.value())) {
        updatedCount = custCountryMap.get(record.value()) + 1;
      }
      custCountryMap.put(record.value(), updatedCount);
      JSONObject json = new JSONObject(custCountryMap);
      System.out.println(json.toString(4));
    }
  }
} finally {
  consumer.close();
}
SRM Institute of Science and Technology, Ramapuram 12
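 One detail the loop above leaves out is offset management. A hedged sketch of committing offsets after the records from a poll have been processed (process() is a hypothetical application-specific helper; whether and how often to commit is an application choice):
while (true) {
  ConsumerRecords<String, String> records = consumer.poll(100);
  for (ConsumerRecord<String, String> record : records) {
    process(record); // hypothetical application logic
  }
  try {
    consumer.commitSync(); // commits the offsets returned by the last poll
  } catch (CommitFailedException e) {
    log.error("commit failed", e);
  }
}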
18CSE489T - STREAMING ANALYTICS
UNIT-2
Session-4
SLO 1 – Brokers and Clusters
SRM Institute of Science and Technology, Ramapuram 1
Brokers and Clusters
 A single Kafka server is called a broker. The broker receives messages from producers,
assigns offsets to them, and commits the messages to storage on disk.
 Kafka brokers are designed to operate as part of a cluster. Within a cluster of brokers, one
broker will also function as the cluster controller (elected automatically from the live
members of the cluster). The controller is responsible for administrative operations,
including assigning partitions to brokers and monitoring for broker failures. A partition is
owned by a single broker in the cluster, and that broker is called the leader of the
partition.
SRM Institute of Science and Technology, Ramapuram 2
Brokers and Clusters
SRM Institute of Science and Technology, Ramapuram 3
Data Ecosystem
SRM Institute of Science and Technology, Ramapuram 4
Use cases
 Activity tracking
 Messaging
 Metrics and Logging
 Commit log
 Stream Processing
SRM Institute of Science and Technology, Ramapuram 5
Sending Messages with
Producers Steps & Example
 The simplest way to send a message is as follows:
ProducerRecord<String, String> record = new ProducerRecord<>("CustomerCountry", "Precision Products",
"France");
try {
producer.send(record);
} catch (Exception e) {
e.printStackTrace();
}
SRM Institute of Science and Technology, Ramapuram 6
Sending Messages with
Producers Steps & Example
 Sending a Message Synchronously
The simplest way to send a message synchronously is as follows:
ProducerRecord<String, String> record = new ProducerRecord<>("CustomerCountry", "Precision
Products", "France");
try {
producer.send(record).get();
} catch (Exception e) {
e.printStackTrace();
}
SRM Institute of Science and Technology, Ramapuram 7
Sending Messages with
Producers Steps & Example
 Sending a Message Asynchronously
private class DemoProducerCallback implements Callback {
@Override
public void onCompletion(RecordMetadata recordMetadata, Exception e) {
if (e != null) {
e.printStackTrace();
}
}
}
ProducerRecord<String, String> record =
new ProducerRecord<>("CustomerCountry", "Biomedical Materials", "USA");
producer.send(record, new DemoProducerCallback());
SRM Institute of Science and Technology, Ramapuram 8
Receiving Messages with
Consumers Steps & Example
 Creating a Kafka Consumer
The following code snippet shows how to create a KafkaConsumer:
Properties props = new Properties();
props.put("bootstrap.servers", "broker1:9092,broker2:9092");
props.put("group.id", "CountryCounter");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
KafkaConsumer<String, String> consumer = new KafkaConsumer<String,
String>(props);
SRM Institute of Science and Technology, Ramapuram 9
Receiving Messages with
Consumers Steps & Example
 Subscribing to Topics
 The subscribe() method takes a list of topics as a parameter, so it’s pretty simple to
use:
consumer.subscribe(Collections.singletonList("customerCountries"));
 To subscribe to all topics whose names match a regular expression (for example, all test topics), we can call:
consumer.subscribe(Pattern.compile("test.*"));
SRM Institute of Science and Technology, Ramapuram 10
Receiving Messages with
Consumers Steps & Example
The Poll Loop
try {
  while (true) {
    ConsumerRecords<String, String> records = consumer.poll(100);
    for (ConsumerRecord<String, String> record : records) {
      log.debug("topic = %s, partition = %s, offset = %d, customer = %s, country = %s\n",
          record.topic(), record.partition(), record.offset(), record.key(), record.value());
      int updatedCount = 1;
      if (custCountryMap.containsKey(record.value())) {
        updatedCount = custCountryMap.get(record.value()) + 1;
      }
      custCountryMap.put(record.value(), updatedCount);
      JSONObject json = new JSONObject(custCountryMap);
      System.out.println(json.toString(4));
    }
  }
} finally {
  consumer.close();
}
SRM Institute of Science and Technology, Ramapuram 11
18CSE489T - STREAMING ANALYTICS
UNIT-2
Session-4
SLO 2 – Multiple Clusters, Data Ecosystem
SRM Institute of Science and Technology, Ramapuram 1
Brokers and Clusters
 A single Kafka server is called a broker. The broker receives messages from producers,
assigns offsets to them, and commits the messages to storage on disk.
 Kafka brokers are designed to operate as part of a cluster. Within a cluster of brokers, one
broker will also function as the cluster controller (elected automatically from the live
members of the cluster). The controller is responsible for administrative operations,
including assigning partitions to brokers and monitoring for broker failures. A partition is
owned by a single broker in the cluster, and that broker is called the leader of the
partition.
SRM Institute of Science and Technology, Ramapuram 2
Brokers and Clusters
SRM Institute of Science and Technology, Ramapuram 3
Data Ecosystem
SRM Institute of Science and Technology, Ramapuram 4
Use cases
 Activity tracking
 Messaging
 Metrics and Logging
 Commit log
 Stream Processing
SRM Institute of Science and Technology, Ramapuram 5
Sending Messages with
Producers Steps & Example
 The simplest way to send a message is as follows:
ProducerRecord<String, String> record = new ProducerRecord<>("CustomerCountry", "Precision Products",
"France");
try {
producer.send(record);
} catch (Exception e) {
e.printStackTrace();
}
SRM Institute of Science and Technology, Ramapuram 6
Sending Messages with
Producers Steps & Example
 Sending a Message Synchronously
The simplest way to send a message synchronously is as follows:
ProducerRecord<String, String> record = new ProducerRecord<>("CustomerCountry", "Precision
Products", "France");
try {
producer.send(record).get();
} catch (Exception e) {
e.printStackTrace();
}
SRM Institute of Science and Technology, Ramapuram 7
Sending Messages with
Producers Steps & Example
 Sending a Message Asynchronously
private class DemoProducerCallback implements Callback {
@Override
public void onCompletion(RecordMetadata recordMetadata, Exception e) {
if (e != null) {
e.printStackTrace();
}
}
}
ProducerRecord<String, String> record =
new ProducerRecord<>("CustomerCountry", "Biomedical Materials", "USA");
producer.send(record, new DemoProducerCallback());
SRM Institute of Science and Technology, Ramapuram 8
Receiving Messages with
Consumers Steps & Example
 Creating a Kafka Consumer
The following code snippet shows how to create a KafkaConsumer:
Properties props = new Properties();
props.put("bootstrap.servers", "broker1:9092,broker2:9092");
props.put("group.id", "CountryCounter");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
KafkaConsumer<String, String> consumer = new KafkaConsumer<String,
String>(props);
SRM Institute of Science and Technology, Ramapuram 9
Receiving Messages with
Consumers Steps & Example
 Subscribing to Topics
 The subscribe() method takes a list of topics as a parameter, so it’s pretty simple to
use:
consumer.subscribe(Collections.singletonList("customerCountries"));
 To subscribe to all topics whose names match a regular expression (for example, all test topics), we can call:
consumer.subscribe(Pattern.compile("test.*"));
SRM Institute of Science and Technology, Ramapuram 10
Receiving Messages with
Consumers Steps & Example
The Poll Loop
try {
  while (true) {
    ConsumerRecords<String, String> records = consumer.poll(100);
    for (ConsumerRecord<String, String> record : records) {
      log.debug("topic = %s, partition = %s, offset = %d, customer = %s, country = %s\n",
          record.topic(), record.partition(), record.offset(), record.key(), record.value());
      int updatedCount = 1;
      if (custCountryMap.containsKey(record.value())) {
        updatedCount = custCountryMap.get(record.value()) + 1;
      }
      custCountryMap.put(record.value(), updatedCount);
      JSONObject json = new JSONObject(custCountryMap);
      System.out.println(json.toString(4));
    }
  }
} finally {
  consumer.close();
}
SRM Institute of Science and Technology, Ramapuram 11
18CSE489T - STREAMING ANALYTICS
UNIT-2
Session-5
SLO 1 – Sending Messages with
Producers
SRM Institute of Science and Technology, Ramapuram 1
Sending Messages with
Producers Steps & Example
 The simplest way to send a message is as follows:
ProducerRecord<String, String> record = new ProducerRecord<>("CustomerCountry", "Precision Products",
"France");
try {
producer.send(record);
} catch (Exception e) {
e.printStackTrace();
}
SRM Institute of Science and Technology, Ramapuram 2
Sending Messages with
Producers Steps & Example
 Sending a Message Synchronously
The simplest way to send a message synchronously is as follows:
ProducerRecord<String, String> record = new ProducerRecord<>("CustomerCountry", "Precision
Products", "France");
try {
producer.send(record).get();
} catch (Exception e) {
e.printStackTrace();
}
SRM Institute of Science and Technology, Ramapuram 3
Sending Messages with
Producers Steps & Example
 Sending a Message Asynchronously
private class DemoProducerCallback implements Callback {
@Override
public void onCompletion(RecordMetadata recordMetadata, Exception e) {
if (e != null) {
e.printStackTrace();
}
}
}
ProducerRecord<String, String> record =
new ProducerRecord<>("CustomerCountry", "Biomedical Materials", "USA");
producer.send(record, new DemoProducerCallback());
SRM Institute of Science and Technology, Ramapuram 4
Receiving Messages with
Consumers Steps & Example
 Creating a Kafka Consumer
The following code snippet shows how to create a KafkaConsumer:
Properties props = new Properties();
props.put("bootstrap.servers", "broker1:9092,broker2:9092");
props.put("group.id", "CountryCounter");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
KafkaConsumer<String, String> consumer = new KafkaConsumer<String,
String>(props);
SRM Institute of Science and Technology, Ramapuram 5
Receiving Messages with
Consumers Steps & Example
 Subscribing to Topics
 The subscribe() method takes a list of topics as a parameter, so it’s pretty simple to
use:
consumer.subscribe(Collections.singletonList("customerCountries"));
 To subscribe to all topics whose names match a regular expression (for example, all test topics), we can call:
consumer.subscribe(Pattern.compile("test.*"));
SRM Institute of Science and Technology, Ramapuram 6
Receiving Messages with
Consumers Steps & Example
The Poll Loop
try {
  while (true) {
    ConsumerRecords<String, String> records = consumer.poll(100);
    for (ConsumerRecord<String, String> record : records) {
      log.debug("topic = %s, partition = %s, offset = %d, customer = %s, country = %s\n",
          record.topic(), record.partition(), record.offset(), record.key(), record.value());
      int updatedCount = 1;
      if (custCountryMap.containsKey(record.value())) {
        updatedCount = custCountryMap.get(record.value()) + 1;
      }
      custCountryMap.put(record.value(), updatedCount);
      JSONObject json = new JSONObject(custCountryMap);
      System.out.println(json.toString(4));
    }
  }
} finally {
  consumer.close();
}
SRM Institute of Science and Technology, Ramapuram 7
18CSE489T - STREAMING ANALYTICS
UNIT-2
Session-5
SLO 2 – Steps and Example - Sending
Messages with Producers
SRM Institute of Science and Technology, Ramapuram 1
Sending Messages with
Producers Steps & Example
 The simplest way to send a message is as follows:
ProducerRecord<String, String> record = new ProducerRecord<>("CustomerCountry", "Precision Products",
"France");
try {
producer.send(record);
} catch (Exception e) {
e.printStackTrace();
}
SRM Institute of Science and Technology, Ramapuram 2
Sending Messages with
Producers Steps & Example
 Sending a Message Synchronously
The simplest way to send a message synchronously is as follows:
ProducerRecord<String, String> record = new ProducerRecord<>("CustomerCountry", "Precision
Products", "France");
try {
producer.send(record).get();
} catch (Exception e) {
e.printStackTrace();
}
SRM Institute of Science and Technology, Ramapuram 3
Sending Messages with
Producers Steps & Example
 Sending a Message Asynchronously
private class DemoProducerCallback implements Callback {
@Override
public void onCompletion(RecordMetadata recordMetadata, Exception e) {
if (e != null) {
e.printStackTrace();
}
}
}
ProducerRecord<String, String> record =
new ProducerRecord<>("CustomerCountry", "Biomedical Materials", "USA");
producer.send(record, new DemoProducerCallback());
SRM Institute of Science and Technology, Ramapuram 4
Receiving Messages with
Consumers Steps & Example
 Creating a Kafka Consumer
The following code snippet shows how to create a KafkaConsumer:
Properties props = new Properties();
props.put("bootstrap.servers", "broker1:9092,broker2:9092");
props.put("group.id", "CountryCounter");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
KafkaConsumer<String, String> consumer = new KafkaConsumer<String,
String>(props);
SRM Institute of Science and Technology, Ramapuram 5
Receiving Messages with
Consumers Steps & Example
 Subscribing to Topics
 The subscribe() method takes a list of topics as a parameter, so it’s pretty simple to
use:
consumer.subscribe(Collections.singletonList("customerCountries"));
 To subscribe to all topics whose names match a regular expression (for example, all test topics), we can call:
consumer.subscribe(Pattern.compile("test.*"));
SRM Institute of Science and Technology, Ramapuram 6
Receiving Messages with
Consumers Steps & Example
The Poll Loop
try {
  while (true) {
    ConsumerRecords<String, String> records = consumer.poll(100);
    for (ConsumerRecord<String, String> record : records) {
      log.debug("topic = %s, partition = %s, offset = %d, customer = %s, country = %s\n",
          record.topic(), record.partition(), record.offset(), record.key(), record.value());
      int updatedCount = 1;
      if (custCountryMap.containsKey(record.value())) {
        updatedCount = custCountryMap.get(record.value()) + 1;
      }
      custCountryMap.put(record.value(), updatedCount);
      JSONObject json = new JSONObject(custCountryMap);
      System.out.println(json.toString(4));
    }
  }
} finally {
  consumer.close();
}
SRM Institute of Science and Technology, Ramapuram 7
18CSE489T - STREAMING ANALYTICS
UNIT-2
Session-6
SLO 1 – Receiving Messages with
Consumers
SRM Institute of Science and Technology, Ramapuram 1
Receiving Messages with
Consumers Steps & Example
 Creating a Kafka Consumer
The following code snippet shows how to create a KafkaConsumer:
Properties props = new Properties();
props.put("bootstrap.servers", "broker1:9092,broker2:9092");
props.put("group.id", "CountryCounter");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
KafkaConsumer<String, String> consumer = new KafkaConsumer<String,
String>(props);
SRM Institute of Science and Technology, Ramapuram 2
Receiving Messages with
Consumers Steps & Example
 Subscribing to Topics
 The subscribe() method takes a list of topics as a parameter, so it’s pretty simple to
use:
consumer.subscribe(Collections.singletonList("customerCountries"));
 To subscribe to all topics whose names match a regular expression (for example, all test topics), we can call:
consumer.subscribe(Pattern.compile("test.*"));
SRM Institute of Science and Technology, Ramapuram 3
Receiving Messages with
Consumers Steps & Example
The Poll Loop
try {
  while (true) {
    ConsumerRecords<String, String> records = consumer.poll(100);
    for (ConsumerRecord<String, String> record : records) {
      log.debug("topic = %s, partition = %s, offset = %d, customer = %s, country = %s\n",
          record.topic(), record.partition(), record.offset(), record.key(), record.value());
      int updatedCount = 1;
      if (custCountryMap.containsKey(record.value())) {
        updatedCount = custCountryMap.get(record.value()) + 1;
      }
      custCountryMap.put(record.value(), updatedCount);
      JSONObject json = new JSONObject(custCountryMap);
      System.out.println(json.toString(4));
    }
  }
} finally {
  consumer.close();
}
SRM Institute of Science and Technology, Ramapuram 4
18CSE489T - STREAMING ANALYTICS
UNIT-2
Session-6
SLO 2 – Steps & Examples Receiving
Messages with Consumers
SRM Institute of Science and Technology, Ramapuram 1
Receiving Messages with
Consumers Steps & Example
 Creating a Kafka Consumer
The following code snippet shows how to create a KafkaConsumer:
Properties props = new Properties();
props.put("bootstrap.servers", "broker1:9092,broker2:9092");
props.put("group.id", "CountryCounter");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
KafkaConsumer<String, String> consumer = new KafkaConsumer<String,
String>(props);
SRM Institute of Science and Technology, Ramapuram 2
Receiving Messages with
Consumers Steps & Example
 Subscribing to Topics
 The subscribe() method takes a list of topics as a parameter, so it’s pretty simple to
use:
consumer.subscribe(Collections.singletonList("customerCountries"));
 To subscribe to all topics whose names match a regular expression (for example, all test topics), we can call:
consumer.subscribe(Pattern.compile("test.*"));
SRM Institute of Science and Technology, Ramapuram 3
Receiving Messages with
Consumers Steps & Example
The Poll Loop
try {
  while (true) {
    ConsumerRecords<String, String> records = consumer.poll(100);
    for (ConsumerRecord<String, String> record : records) {
      log.debug("topic = %s, partition = %s, offset = %d, customer = %s, country = %s\n",
          record.topic(), record.partition(), record.offset(), record.key(), record.value());
      int updatedCount = 1;
      if (custCountryMap.containsKey(record.value())) {
        updatedCount = custCountryMap.get(record.value()) + 1;
      }
      custCountryMap.put(record.value(), updatedCount);
      JSONObject json = new JSONObject(custCountryMap);
      System.out.println(json.toString(4));
    }
  }
} finally {
  consumer.close();
}
SRM Institute of Science and Technology, Ramapuram 4
18CSE489T - STREAMING ANALYTICS
UNIT-2
Session-7
SLO 1 – Developing Kafka Stream
Applications
SRM Institute of Science and Technology, Ramapuram 1
Developing Kafka Stream
Applications
 The Kafka Streams DSL is the high-level API that enables you to build Kafka
Streams applications quickly.
 The high-level API is very well thought out, and there are methods to handle most
stream-processing needs out of the box, so you can create a sophisticated stream-
processing program without much effort.
 At the heart of the high-level API is the KStream object, which represents the
streaming key/value pair records. Most of the methods in the Kafka Streams DSL return a
reference to a KStream object, allowing for a fluent interface style of programming.
 Additionally, a good percentage of the KStream methods accept types consisting of
single-method interfaces allowing for the use of Java 8 lambda expressions. Taking these
factors into account, you can imagine the simplicity and ease with which you can build a
Kafka Streams program.
Phases in Kafka Stream
Applications Development
 Your first program will be a toy application that takes incoming messages and
converts them to uppercase characters, effectively yelling at anyone who reads the
message.
Phases in Kafka Stream
Applications Development
 This is a trivial example, but the code shown here is representative of what you’ll
see in other Kafka Streams programs. In most of the examples, you’ll see a similar
structure:
1. Define the configuration items.
2. Create Serde instances, either custom or predefined.
3. Build the processor topology.
4. Create and start the KStream.
 When we get into the more advanced examples, the principal difference will be in the
complexity of the processor topology. With that in mind, it’s time to build your first
application.
Phases in Kafka Stream
Applications Development
 Creating the topology for the Yelling App
 The first step to creating any Kafka Streams application is to create a source node. The
source node is responsible for consuming the records, from a topic, that will flow through
the application.
Phases in Kafka Stream
Applications Development
 The following line of code creates the source, or parent, node of the graph.
KStream<String, String> simpleFirstStream = builder.stream("src-topic",
Consumed.with(stringSerde, stringSerde));
 The simpleFirstStream KStream instance is set to consume messages written to the
src-topic topic.
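 To round out the picture, a sketch of the remaining steps of the Yelling App is shown below (consistent with the snippet above; the output topic name "out-topic" and the streamsConfig properties object are assumptions). Each value is mapped to uppercase, written to a sink topic, and the stream is started:
KStream<String, String> upperCasedStream = simpleFirstStream.mapValues(String::toUpperCase);
upperCasedStream.to("out-topic", Produced.with(stringSerde, stringSerde));
KafkaStreams kafkaStreams = new KafkaStreams(builder.build(), streamsConfig);
kafkaStreams.start();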
18CSE489T - STREAMING ANALYTICS
UNIT-2
Session-7
SLO 2 – Phases in Kafka Stream
Application Development
SRM Institute of Science and Technology, Ramapuram 1
Phases in Kafka Stream
Applications Development
 Your first program will be a toy application that takes incoming messages and
converts them to uppercase characters, effectively yelling at anyone who reads the
message.
Phases in Kafka Stream
Applications Development
 This is a trivial example, but the code shown here is representative of what you’ll
see in other Kafka Streams programs. In most of the examples, you’ll see a similar
structure:
1. Define the configuration items.
2. Create Serde instances, either custom or predefined.
3. Build the processor topology.
4. Create and start the KStream.
 When we get into the more advanced examples, the principal difference will be in the
complexity of the processor topology. With that in mind, it’s time to build your first
application.
Phases in Kafka Stream
Applications Development
 Creating the topology for the Yelling App
 The first step to creating any Kafka Streams application is to create a source node. The
source node is responsible for consuming the records, from a topic, that will flow through
the application.
Phases in Kafka Stream
Applications Development
 The following line of code creates the source, or parent, node of the graph.
KStream<String, String> simpleFirstStream = builder.stream("src-topic",
Consumed.with(stringSerde, stringSerde));
 The simpleFirstStream KStream instance is set to consume messages written to the
src-topic topic.
18CSE489T - STREAMING ANALYTICS
UNIT-2
Session-8
SLO 1 – Constructing a Topology
SRM Institute of Science and Technology, Ramapuram 1
Constructing a Topology
 BUILDING THE SOURCE NODE
 You’ll start by building the source node
and first processor of the topology by
chaining two calls to the KStream API
together. It should be fairly obvious by
now what the role of the origin node is.
The first processor in the topology will be
responsible for masking credit card
numbers to protect customer privacy.
SRM Institute of Science and Technology, Ramapuram 2
Constructing a Topology
 KStream<String, Purchase> purchaseKStream =
      streamsBuilder.stream("transactions", Consumed.with(stringSerde, purchaseSerde))
                    .mapValues(p -> Purchase.builder(p).maskCreditCard().build());
 You create the source node with a call to the StreamsBuilder.stream method using a
default String serde, a custom serde for Purchase objects, and the name of the topic that’s
the source of the messages for the stream
 The next immediate call is to the KStream.mapValues method, taking a ValueMapper<V,
V1> instance as a parameter. Value mappers take a single parameter of one type (a
Purchase object, in this case) and map that object to a new value, possibly of another
type. In this example, KStream.mapValues returns an object of the same type (Purchase),
but with a masked credit card number.
 Note that when using the KStream.mapValues method, the original key is unchanged and
isn’t factored into mapping a new value. If you wanted to generate a new key/value pair
or include the key in producing a new value, you’d use the KStream.map method that
takes a KeyValueMapper<K, V, KeyValue<K1, V1>> instance.
SRM Institute of Science and Technology, Ramapuram 3
Constructing a Topology
 BUILDING THE SECOND
PROCESSOR
 Now you’ll build the second processor,
responsible for extracting pattern data
from a topic, which ZMart can use to
determine purchase patterns in regions of
the country. You’ll also add a sink node
responsible for writing the pattern data to
a Kafka topic.
SRM Institute of Science and Technology, Ramapuram 4
Constructing a Topology
 This new KStream will start to receive PurchasePattern objects created as a result of the
mapValues call.
 KStream<String, PurchasePattern> patternKStream =
      purchaseKStream.mapValues(purchase -> PurchasePattern.builder(purchase).build());
patternKStream.to("patterns", Produced.with(stringSerde, purchasePatternSerde));
 Here, you declare a variable to hold the reference of the new KStream instance, because
you’ll use it to print the results of the stream to the console with a print call. This is very
useful during development and for debugging. The purchase-patterns processor forwards
the records it receives to a child node of its own, defined by the method call KStream.to,
writing to the patterns topic.
SRM Institute of Science and Technology, Ramapuram 5
Constructing a Topology
 BUILDING THE THIRD
PROCESSOR
 The third processor in the topology is
the customer rewards accumulator
node, which will let ZMart track
purchases made by members of their
preferred customer club. The
rewards accumulator sends data to a
topic consumed by applications at
ZMart HQ to determine rewards
when customers complete purchases.
SRM Institute of Science and Technology, Ramapuram 6
Constructing a Topology
 KStream<String, RewardAccumulator> rewardsKStream =
      purchaseKStream.mapValues(purchase -> RewardAccumulator.builder(purchase).build());
rewardsKStream.to("rewards", Produced.with(stringSerde, rewardAccumulatorSerde));
 You build the rewards accumulator processor using what should be by now a
familiar pattern: creating a new KStream instance that maps the raw purchase data
contained in the record to a new object type. You also attach a sink node to the
rewards accumulator so the results of the rewards KStream can be written to a
topic and used for determining customer reward levels.
SRM Institute of Science and Technology, Ramapuram 7
Constructing a Topology
 BUILDING THE LAST PROCESSOR
 Finally, you’ll take the first KStream you
created, purchaseKStream, and attach a
sink node to write out the raw purchase
records (with credit cards masked, of
course) to a topic called purchases. The
purchases topic will be used to feed into a
NoSQL store such as Cassandra
(http://cassandra.apache.org/), Presto
(https://prestodb.io/), or Elasticsearch
(www.elastic.co/webinars/getting-started-elasticsearch) to perform ad hoc analysis.
Figure 3.9 shows the final processor.
SRM Institute of Science and Technology, Ramapuram 8
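 A sketch of that final sink node, consistent with the earlier snippets in this section (the serde variable names follow the same convention used above):
purchaseKStream.to("purchases", Produced.with(stringSerde, purchaseSerde));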
Constructing a Topology
 Specifically, you still performed the following steps:
 Create a StreamsConfig instance.
 Build one or more Serde instances.
 Construct the processing topology.
 Assemble all the components and start the Kafka Streams program.
 In this application, I’ve mentioned using a Serde, but I haven’t explained why or how
you create them. Let’s take some time now to discuss the role of the Serde in a Kafka
Streams application.
SRM Institute of Science and Technology, Ramapuram 9
18CSE489T - STREAMING ANALYTICS
UNIT-2
Session-8
SLO 2 – Streams and State – Applying
stateful operations
SRM Institute of Science and Technology, Ramapuram 1
Streams and State
 The preceding fictional scenario illustrates something that most of us already know
instinctively. Sometimes it’s easy to reason about what’s going on, but usually you
need some context to make good decisions. When it comes to stream processing, we
call that added context state.
 At first glance, the notions of state and stream processing may seem to be at odds with
each other. Stream processing implies a constant flow of discrete events that don’t
have much to do with each other and need to be dealt with as they occur. The notion
of state might evoke images of a static resource, such as a database table.
SRM Institute of Science and Technology, Ramapuram 2
Streams and State
 In actuality, you can view these as one and the same. But the rate of change in a
stream is potentially much faster and more frequent than in a database table. You
don’t always need state to work with streaming data. In some cases, you may have
discrete events or records that carry enough information to be valuable on their own.
But more often than not, the incoming stream of data will need enrichment from some
sort of store, either using information from events that arrived before, or joining
related events with events from different streams.
SRM Institute of Science and Technology, Ramapuram 3
Applying Stateful Operation
 In this topology, you produced a stream of purchase-transaction events. One of the
processing nodes in the topology calculated reward points for customers based on the
amount of the sale. But in that processor, you just calculated the total number of points
for the single transaction and forwarded the results.
 If you added some state to the processor, you
could keep track of the cumulative number of
reward points. Then, the consuming
application at ZMart would need to check the
total and send out a reward if needed.
SRM Institute of Science and Technology, Ramapuram 4
Applying Stateful Operation
 Now that you have a basic idea of how state can be useful in Kafka Streams (or any
other streaming application), let’s look at some concrete examples.
 You’ll start with transforming the stateless rewards processor into a stateful processor
using transformValues.
 You’ll keep track of the total bonus points achieved so far and the amount of time
between purchases, to provide more information to downstream consumers.
SRM Institute of Science and Technology, Ramapuram 5
Applying Stateful Operation
 The transformValues processor
 The most basic of the stateful functions is
KStream.transformValues. Figure 4.4
illustrates how the
KStream.transformValues() method
operates. This method is semantically the
same as KStream.mapValues(), with a few
exceptions. One difference is that
transformValues has access to a
StateStore instance to accomplish its task.
The other difference is its ability to
schedule operations to occur at regular
intervals via a punctuate() method.
SRM Institute of Science and Technology, Ramapuram 6
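 A hedged sketch of the wiring this implies is shown below (the store name, transformer class name, and serdes are illustrative assumptions, not taken from the slides). A state store is registered with the builder and then named in the transformValues() call so the transformer can retrieve it in its init() method:
String storeName = "rewardsPointsStore"; // illustrative name
StoreBuilder<KeyValueStore<String, Integer>> storeBuilder =
    Stores.keyValueStoreBuilder(Stores.inMemoryKeyValueStore(storeName), Serdes.String(), Serdes.Integer());
streamsBuilder.addStateStore(storeBuilder);
KStream<String, RewardAccumulator> statefulRewards =
    purchaseKStream.transformValues(() -> new PurchaseRewardTransformer(storeName), storeName);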
Applying Stateful Operation
 Stateful customer rewards
 The rewards processor from the chapter 3 topology for ZMart extracts information for
customers belonging to ZMart’s rewards program. Initially, the rewards processor used
the KStream.mapValues() method to map the incoming Purchase object into a
RewardAccumulator object. The RewardAccumulator object originally consisted of just
two fields, the customer ID and the purchase total for the transaction. Now, the
requirements have changed some, and points are being associated with the ZMart
rewards program:
SRM Institute of Science and Technology, Ramapuram 7
Applying Stateful Operation
 Initializing the value transformer
 The first step is to set up or create any instance variables in the transformer init()
method. In the init() method, you retrieve the state store created when building the
processing topology
SRM Institute of Science and Technology, Ramapuram 8
Applying Stateful Operation
 Mapping the Purchase object to a RewardAccumulator using state
 Now that you’ve initialized the processor, you can move on to transforming a Purchase
object using state. A few simple steps for performing the transformation are as follows:
 1 Check for points accumulated so far by customer ID.
 2 Sum the points for the current transaction and present the total.
 3 Set the reward points on the RewardAccumulator to the new total amount.
 4 Save the new total points by customer ID in the local state store.
SRM Institute of Science and Technology, Ramapuram 9
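 A hedged sketch of a transform() method following those four steps (the RewardAccumulator method names and the stateStore field, a KeyValueStore retrieved in init(), are assumptions for illustration):
public RewardAccumulator transform(Purchase purchase) {
  RewardAccumulator rewardAccumulator = RewardAccumulator.builder(purchase).build();
  Integer accumulatedSoFar = stateStore.get(rewardAccumulator.getCustomerId()); // step 1: points so far
  if (accumulatedSoFar != null) {
    rewardAccumulator.addRewardPoints(accumulatedSoFar);                        // steps 2-3: add and set the new total
  }
  stateStore.put(rewardAccumulator.getCustomerId(),
                 rewardAccumulator.getTotalRewardPoints());                     // step 4: save the new total by customer ID
  return rewardAccumulator;
}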
18CSE489T - STREAMING ANALYTICS
UNIT-2
Session-9
SLO 1 – Example Application
Development with Kafka Streams
SRM Institute of Science and Technology, Ramapuram 1
Example Application
Development
 Word Count
 Let’s walk through an abbreviated word count example for Kafka Streams. You can
find the full example on GitHub.
 The first thing you do when creating a stream-processing app is configure Kafka
Streams. Kafka Streams has a large number of possible configurations, which we
won’t discuss here, but you can find them in the documentation. In addition, you can
also configure the producer and consumer embedded in Kafka Streams by adding any
producer or consumer config to the Properties object:
SRM Institute of Science and Technology, Ramapuram 2
Example Application
Development
public class WordCountExample {
  public static void main(String[] args) throws Exception {
    Properties props = new Properties();
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "wordcount");
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    props.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
    props.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
SRM Institute of Science and Technology, Ramapuram 3
Example Application
Development
 Every Kafka Streams application must have an application ID. This is used to
coordinate the instances of the application and also when naming the internal local
stores and the topics related to them. This name must be unique for each Kafka
Streams application working with the same Kafka cluster.
 The Kafka Streams application always reads data from Kafka topics and writes its
output to Kafka topics. As we’ll discuss later, Kafka Streams applications also use
Kafka for coordination. So we had better tell our app where to find Kafka.
 When reading and writing data, our app will need to serialize and deserialize, so we
provide default Serde classes. If needed, we can override these defaults later when
building the streams topology.
SRM Institute of Science and Technology, Ramapuram 4
Example Application
Development
Now that we have the configuration, let’s build our streams topology:
KStreamBuilder builder = new KStreamBuilder();
KStream<String, String> source = builder.stream("wordcount-input");
final Pattern pattern = Pattern.compile("\\W+");
KStream counts = source.flatMapValues(value -> Arrays.asList(pattern.split(value.toLowerCase())))
    .map((key, value) -> new KeyValue<Object, Object>(value, value))
    .filter((key, value) -> (!value.equals("the")))
    .groupByKey()
    .count("CountStore")
    .mapValues(value -> Long.toString(value))
    .toStream();
counts.to("wordcount-output");
SRM Institute of Science and Technology, Ramapuram 5
Example Application
Development
 We create a KStreamBuilder object and start defining a stream by pointing at the topic
we’ll use as our input.
 Each event we read from the source topic is a line of words; we split it up using a
regular expression into a series of individual words. Then we take each word
(currently a value of the event record) and put it in the event record key so it can be
used in a group-by operation.
 We filter out the word “the,” just to show how easy filtering is.
 And we group by key, so we now have a collection of events for each unique word.
SRM Institute of Science and Technology, Ramapuram 6
Example Application
Development
 We count how many events we have in each collection. The result of counting is a
Long data type. We convert it to a String so it will be easier for humans to read the
results.
 Only one thing left–write the results back to Kafka.
 Now that we have defined the flow of transformations that our application will run,
we just need to… run it:
KafkaStreams streams = new KafkaStreams(builder, props);
streams.start();
Thread.sleep(5000L);
streams.close();
}
}
SRM Institute of Science and Technology, Ramapuram 7
Example Application
Development
 Define a KafkaStreams object based on our topology and the properties we defined.
 Start Kafka Streams.
 After a while, stop it.
 That’s it! In just a few short lines, we demonstrated how easy it is to implement a single-event processing pattern (we applied a map and a filter on the events). We repartitioned the data by adding a group-by operator, and then maintained simple local state as we counted the number of records that have each word as a key, which is the number of times each word appeared.
 At this point, we recommend running the full example. The README in the GitHub
repository contains instructions on how to run the example.
SRM Institute of Science and Technology, Ramapuram 8
Example Application
Development
 One thing you’ll notice is that you can run the entire example on your machine
without installing anything except Apache Kafka. This is similar to the experience you
may have seen when using Spark in something like Local Mode. The main difference
is that if your input topic contains multiple partitions, you can run multiple instances
of the WordCount application (just run the app in several different terminal tabs) and
you have your first Kafka Streams processing cluster. The instances of the WordCount
application talk to each other and coordinate the work. One of the biggest barriers to
entry with Spark is that local mode is very easy to use, but then to run a production
cluster, you need to install YARN or Mesos and then install Spark on all those
machines, and then learn how to submit your app to the cluster. With Kafka’s
Streams API, you just start multiple instances of your app—and you have a cluster.
 The exact same app is running on your development machine and in production.
SRM Institute of Science and Technology, Ramapuram 9
18CSE489T - STREAMING ANALYTICS
UNIT-2
Session-9
SLO 2 – Demo – Kafka Streams
SRM Institute of Science and Technology, Ramapuram 1
Demo – Kafka Streams
 Stock Market Statistics
 The next example is more involved—we will read a stream of stock market trading
events that include the stock ticker, ask price, and ask size. In stock market trades, ask
price is what a seller is asking for whereas bid price is what the buyer is suggesting to
pay. Ask size is the number of shares the seller is willing to sell at that price. For
simplicity of the example, we’ll ignore bids completely. We also won’t include a
timestamp in our data; instead, we’ll rely on event time populated by our Kafka
producer.
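 The Trade value object itself is not shown in this excerpt; a minimal sketch consistent with the fields described above (field and method names are assumptions) could look like this:
// Illustrative value object for one trading event.
public class Trade {
    private String ticker;   // stock symbol
    private double askPrice; // price the seller is asking
    private int askSize;     // number of shares offered at that price

    public Trade() { } // no-arg constructor, convenient for JSON deserialization

    public Trade(String ticker, double askPrice, int askSize) {
        this.ticker = ticker;
        this.askPrice = askPrice;
        this.askSize = askSize;
    }

    public String getTicker() { return ticker; }
    public double getAskPrice() { return askPrice; }
    public int getAskSize() { return askSize; }
}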
 We will then create output streams that contain a few windowed statistics:
 • Best (i.e., minimum) ask price for every five-second window
 • Number of trades for every five-second window
 • Average ask price for every five-second window
 All statistics will be updated every second.
SRM Institute of Science and Technology, Ramapuram 2
Demo – Kafka Streams
 For simplicity, we’ll assume our exchange only has 10 stock tickers trading in it. The
setup and configuration are very similar to those we used in the “Word Count” example:
Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "stockstat");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, Constants.BROKER);
props.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG,
Serdes.String().getClass().getName());
props.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG,
TradeSerde.class.getName());
SRM Institute of Science and Technology, Ramapuram 3
Demo – Kafka Streams
 The main difference is the Serde classes used. In the “Word Count”, we used strings
for both key and value and therefore used the Serdes.String() class as a serializer and
deserializer for both. In this example, the key is still a string, but the value is a Trade
object that contains the ticker symbol, ask price, and ask size.
 In order to serialize and deserialize this object (and a few other objects we used in this small app), we used the Gson library from Google to generate a JSON serializer and deserializer from our Java object. We then created a small wrapper that creates a Serde object from those. Here is how we created the Serde:
static public final class TradeSerde extends WrapperSerde<Trade> {
    public TradeSerde() {
        super(new JsonSerializer<Trade>(),
              new JsonDeserializer<Trade>(Trade.class));
    }
}
SRM Institute of Science and Technology, Ramapuram 4
Demo – Kafka Streams
 Nothing fancy, but you need to remember to provide a Serde object for every object
you want to store in Kafka—input, output, and in some cases, also intermediate
results. To make this easier, we recommend generating these Serdes through projects
like Gson, Avro, Protobuf, or similar.
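 The JsonSerializer and JsonDeserializer used inside TradeSerde are not shown in this excerpt. A minimal Gson-based sketch is given below (in practice each class would live in its own file; names and error handling are assumptions, not the example repository’s actual code):
import java.nio.charset.StandardCharsets;
import java.util.Map;
import com.google.gson.Gson;
import org.apache.kafka.common.serialization.Deserializer;
import org.apache.kafka.common.serialization.Serializer;

// Illustrative Gson-backed serializer: turns any object into UTF-8 JSON bytes.
public class JsonSerializer<T> implements Serializer<T> {
    private final Gson gson = new Gson();

    @Override public void configure(Map<String, ?> configs, boolean isKey) { }

    @Override public byte[] serialize(String topic, T data) {
        return data == null ? null : gson.toJson(data).getBytes(StandardCharsets.UTF_8);
    }

    @Override public void close() { }
}

// Illustrative Gson-backed deserializer; the target class is supplied at construction time.
public class JsonDeserializer<T> implements Deserializer<T> {
    private final Gson gson = new Gson();
    private final Class<T> type;

    public JsonDeserializer(Class<T> type) { this.type = type; }

    @Override public void configure(Map<String, ?> configs, boolean isKey) { }

    @Override public T deserialize(String topic, byte[] bytes) {
        return bytes == null ? null : gson.fromJson(new String(bytes, StandardCharsets.UTF_8), type);
    }

    @Override public void close() { }
}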
Now that we have everything configured, it’s time to build our topology:
KStream<TickerWindow, TradeStats> stats = source.groupByKey()
    .aggregate(TradeStats::new,
        (k, v, tradestats) -> tradestats.add(v),
        TimeWindows.of(5000).advanceBy(1000),
        new TradeStatsSerde(),
        "trade-stats-store")
    .toStream((key, value) -> new TickerWindow(key.key(), key.window().start()))
    .mapValues((trade) -> trade.computeAvgPrice());
stats.to(new TickerWindowSerde(), new TradeStatsSerde(), "stockstats-output");
SRM Institute of Science and Technology, Ramapuram 5
Demo – Kafka Streams
 We start by reading events from the input topic and performing a groupByKey()
operation. Despite its name, this operation does not do any grouping. Rather, it
ensures that the stream of events is partitioned based on the record key. Since we wrote
the data into a topic with a key and didn’t modify the key before calling
groupByKey(), the data is still partitioned by its key—so this method does nothing in
this case.
 After we ensure correct partitioning, we start the windowed aggregation. The
“aggregate” method will split the stream into overlapping windows (a five-second
window every second), and then apply an aggregate method on all the events in the
window. The first parameter this method takes is a new object that will contain the
results of the aggregation—TradeStats in our case. This is an object we created to contain all the statistics we are interested in for each time window—minimum price, average price, and number of trades.
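 TradeStats itself is not shown in this excerpt; a minimal sketch consistent with the description above (field and method names are assumptions) could be:
// Illustrative per-window accumulator; aggregate() calls add() for every trade in the window.
public class TradeStats {
    private double minPrice = Double.MAX_VALUE;
    private double sumPrice = 0;
    private double avgPrice = 0;
    private long countTrades = 0;

    public TradeStats add(Trade trade) {
        if (trade.getAskPrice() < minPrice) {
            minPrice = trade.getAskPrice(); // best (minimum) ask price so far
        }
        sumPrice += trade.getAskPrice();    // running sum used for the average
        countTrades++;                      // number of trades in the window
        return this;
    }

    // Derives the average from the running sum; called once per emitted window result.
    public TradeStats computeAvgPrice() {
        avgPrice = countTrades == 0 ? 0 : sumPrice / countTrades;
        return this;
    }
}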
SRM Institute of Science and Technology, Ramapuram 6
More Related Content

Similar to SA UNIT II KAFKA.pdf

Introduction to Kafka Streams Presentation
Introduction to Kafka Streams PresentationIntroduction to Kafka Streams Presentation
Introduction to Kafka Streams PresentationKnoldus Inc.
 
Fundamentals of Apache Kafka
Fundamentals of Apache KafkaFundamentals of Apache Kafka
Fundamentals of Apache KafkaAvanish Chauhan
 
apachekafka-160907180205.pdf
apachekafka-160907180205.pdfapachekafka-160907180205.pdf
apachekafka-160907180205.pdfTarekHamdi8
 
Kafka Fundamentals
Kafka FundamentalsKafka Fundamentals
Kafka FundamentalsKetan Keshri
 
Kafka syed academy_v1_introduction
Kafka syed academy_v1_introductionKafka syed academy_v1_introduction
Kafka syed academy_v1_introductionSyed Hadoop
 
Understanding kafka
Understanding kafkaUnderstanding kafka
Understanding kafkaAmitDhodi
 
Apache kafka
Apache kafkaApache kafka
Apache kafkaamarkayam
 
Columbus mule soft_meetup_aug2021_Kafka_Integration
Columbus mule soft_meetup_aug2021_Kafka_IntegrationColumbus mule soft_meetup_aug2021_Kafka_Integration
Columbus mule soft_meetup_aug2021_Kafka_IntegrationMuleSoft Meetup
 
A Quick Guide to Refresh Kafka Skills
A Quick Guide to Refresh Kafka SkillsA Quick Guide to Refresh Kafka Skills
A Quick Guide to Refresh Kafka SkillsRavindra kumar
 
Apache Kafka: Next Generation Distributed Messaging System
Apache Kafka: Next Generation Distributed Messaging SystemApache Kafka: Next Generation Distributed Messaging System
Apache Kafka: Next Generation Distributed Messaging SystemEdureka!
 
Kafka Basic For Beginners
Kafka Basic For BeginnersKafka Basic For Beginners
Kafka Basic For BeginnersRiby Varghese
 

Similar to SA UNIT II KAFKA.pdf (20)

Introduction to Kafka Streams Presentation
Introduction to Kafka Streams PresentationIntroduction to Kafka Streams Presentation
Introduction to Kafka Streams Presentation
 
Kafka tutorial
Kafka tutorialKafka tutorial
Kafka tutorial
 
Fundamentals of Apache Kafka
Fundamentals of Apache KafkaFundamentals of Apache Kafka
Fundamentals of Apache Kafka
 
apachekafka-160907180205.pdf
apachekafka-160907180205.pdfapachekafka-160907180205.pdf
apachekafka-160907180205.pdf
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Kafka Fundamentals
Kafka FundamentalsKafka Fundamentals
Kafka Fundamentals
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Kafka syed academy_v1_introduction
Kafka syed academy_v1_introductionKafka syed academy_v1_introduction
Kafka syed academy_v1_introduction
 
Understanding kafka
Understanding kafkaUnderstanding kafka
Understanding kafka
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Columbus mule soft_meetup_aug2021_Kafka_Integration
Columbus mule soft_meetup_aug2021_Kafka_IntegrationColumbus mule soft_meetup_aug2021_Kafka_Integration
Columbus mule soft_meetup_aug2021_Kafka_Integration
 
A Quick Guide to Refresh Kafka Skills
A Quick Guide to Refresh Kafka SkillsA Quick Guide to Refresh Kafka Skills
A Quick Guide to Refresh Kafka Skills
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Apache Kafka: Next Generation Distributed Messaging System
Apache Kafka: Next Generation Distributed Messaging SystemApache Kafka: Next Generation Distributed Messaging System
Apache Kafka: Next Generation Distributed Messaging System
 
Intoduction to Apache Kafka
Intoduction to Apache KafkaIntoduction to Apache Kafka
Intoduction to Apache Kafka
 
Kafka Basic For Beginners
Kafka Basic For BeginnersKafka Basic For Beginners
Kafka Basic For Beginners
 
Kafka for Scale
Kafka for ScaleKafka for Scale
Kafka for Scale
 

More from ManjuAppukuttan2

CHAPTER 3 BASIC DYNAMIC ANALYSIS.ppt
CHAPTER 3 BASIC DYNAMIC ANALYSIS.pptCHAPTER 3 BASIC DYNAMIC ANALYSIS.ppt
CHAPTER 3 BASIC DYNAMIC ANALYSIS.pptManjuAppukuttan2
 
CHAPTER 2 BASIC ANALYSIS.ppt
CHAPTER 2 BASIC ANALYSIS.pptCHAPTER 2 BASIC ANALYSIS.ppt
CHAPTER 2 BASIC ANALYSIS.pptManjuAppukuttan2
 
CHAPTER 1 MALWARE ANALYSIS PRIMER.ppt
CHAPTER 1 MALWARE ANALYSIS PRIMER.pptCHAPTER 1 MALWARE ANALYSIS PRIMER.ppt
CHAPTER 1 MALWARE ANALYSIS PRIMER.pptManjuAppukuttan2
 
UNIT 3.1 INTRODUCTON TO IDA.ppt
UNIT 3.1 INTRODUCTON TO IDA.pptUNIT 3.1 INTRODUCTON TO IDA.ppt
UNIT 3.1 INTRODUCTON TO IDA.pptManjuAppukuttan2
 
UNIT 3.2 GETTING STARTED WITH IDA.ppt
UNIT 3.2 GETTING STARTED WITH IDA.pptUNIT 3.2 GETTING STARTED WITH IDA.ppt
UNIT 3.2 GETTING STARTED WITH IDA.pptManjuAppukuttan2
 
SA UNIT I STREAMING ANALYTICS.pdf
SA UNIT I STREAMING ANALYTICS.pdfSA UNIT I STREAMING ANALYTICS.pdf
SA UNIT I STREAMING ANALYTICS.pdfManjuAppukuttan2
 
CHAPTER 2 BASIC ANALYSIS.pdf
CHAPTER 2 BASIC ANALYSIS.pdfCHAPTER 2 BASIC ANALYSIS.pdf
CHAPTER 2 BASIC ANALYSIS.pdfManjuAppukuttan2
 
CHAPTER 1 MALWARE ANALYSIS PRIMER.pdf
CHAPTER 1 MALWARE ANALYSIS PRIMER.pdfCHAPTER 1 MALWARE ANALYSIS PRIMER.pdf
CHAPTER 1 MALWARE ANALYSIS PRIMER.pdfManjuAppukuttan2
 

More from ManjuAppukuttan2 (9)

CHAPTER 3 BASIC DYNAMIC ANALYSIS.ppt
CHAPTER 3 BASIC DYNAMIC ANALYSIS.pptCHAPTER 3 BASIC DYNAMIC ANALYSIS.ppt
CHAPTER 3 BASIC DYNAMIC ANALYSIS.ppt
 
CHAPTER 2 BASIC ANALYSIS.ppt
CHAPTER 2 BASIC ANALYSIS.pptCHAPTER 2 BASIC ANALYSIS.ppt
CHAPTER 2 BASIC ANALYSIS.ppt
 
CHAPTER 1 MALWARE ANALYSIS PRIMER.ppt
CHAPTER 1 MALWARE ANALYSIS PRIMER.pptCHAPTER 1 MALWARE ANALYSIS PRIMER.ppt
CHAPTER 1 MALWARE ANALYSIS PRIMER.ppt
 
UNIT 3.1 INTRODUCTON TO IDA.ppt
UNIT 3.1 INTRODUCTON TO IDA.pptUNIT 3.1 INTRODUCTON TO IDA.ppt
UNIT 3.1 INTRODUCTON TO IDA.ppt
 
UNIT 3.2 GETTING STARTED WITH IDA.ppt
UNIT 3.2 GETTING STARTED WITH IDA.pptUNIT 3.2 GETTING STARTED WITH IDA.ppt
UNIT 3.2 GETTING STARTED WITH IDA.ppt
 
SA UNIT III STORM.pdf
SA UNIT III STORM.pdfSA UNIT III STORM.pdf
SA UNIT III STORM.pdf
 
SA UNIT I STREAMING ANALYTICS.pdf
SA UNIT I STREAMING ANALYTICS.pdfSA UNIT I STREAMING ANALYTICS.pdf
SA UNIT I STREAMING ANALYTICS.pdf
 
CHAPTER 2 BASIC ANALYSIS.pdf
CHAPTER 2 BASIC ANALYSIS.pdfCHAPTER 2 BASIC ANALYSIS.pdf
CHAPTER 2 BASIC ANALYSIS.pdf
 
CHAPTER 1 MALWARE ANALYSIS PRIMER.pdf
CHAPTER 1 MALWARE ANALYSIS PRIMER.pdfCHAPTER 1 MALWARE ANALYSIS PRIMER.pdf
CHAPTER 1 MALWARE ANALYSIS PRIMER.pdf
 

Recently uploaded

fitting shop and tools used in fitting shop .ppt
fitting shop and tools used in fitting shop .pptfitting shop and tools used in fitting shop .ppt
fitting shop and tools used in fitting shop .pptAfnanAhmad53
 
Online electricity billing project report..pdf
Online electricity billing project report..pdfOnline electricity billing project report..pdf
Online electricity billing project report..pdfKamal Acharya
 
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Arindam Chakraborty, Ph.D., P.E. (CA, TX)
 
Hospital management system project report.pdf
Hospital management system project report.pdfHospital management system project report.pdf
Hospital management system project report.pdfKamal Acharya
 
Theory of Time 2024 (Universal Theory for Everything)
Theory of Time 2024 (Universal Theory for Everything)Theory of Time 2024 (Universal Theory for Everything)
Theory of Time 2024 (Universal Theory for Everything)Ramkumar k
 
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...Call Girls Mumbai
 
💚Trustworthy Call Girls Pune Call Girls Service Just Call 🍑👄6378878445 🍑👄 Top...
💚Trustworthy Call Girls Pune Call Girls Service Just Call 🍑👄6378878445 🍑👄 Top...💚Trustworthy Call Girls Pune Call Girls Service Just Call 🍑👄6378878445 🍑👄 Top...
💚Trustworthy Call Girls Pune Call Girls Service Just Call 🍑👄6378878445 🍑👄 Top...vershagrag
 
Electromagnetic relays used for power system .pptx
Electromagnetic relays used for power system .pptxElectromagnetic relays used for power system .pptx
Electromagnetic relays used for power system .pptxNANDHAKUMARA10
 
PE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and propertiesPE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and propertiessarkmank1
 
Basic Electronics for diploma students as per technical education Kerala Syll...
Basic Electronics for diploma students as per technical education Kerala Syll...Basic Electronics for diploma students as per technical education Kerala Syll...
Basic Electronics for diploma students as per technical education Kerala Syll...ppkakm
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXssuser89054b
 
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best ServiceTamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Servicemeghakumariji156
 
Introduction to Data Visualization,Matplotlib.pdf
Introduction to Data Visualization,Matplotlib.pdfIntroduction to Data Visualization,Matplotlib.pdf
Introduction to Data Visualization,Matplotlib.pdfsumitt6_25730773
 
A Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna MunicipalityA Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna MunicipalityMorshed Ahmed Rahath
 
Digital Communication Essentials: DPCM, DM, and ADM .pptx
Digital Communication Essentials: DPCM, DM, and ADM .pptxDigital Communication Essentials: DPCM, DM, and ADM .pptx
Digital Communication Essentials: DPCM, DM, and ADM .pptxpritamlangde
 
457503602-5-Gas-Well-Testing-and-Analysis-pptx.pptx
457503602-5-Gas-Well-Testing-and-Analysis-pptx.pptx457503602-5-Gas-Well-Testing-and-Analysis-pptx.pptx
457503602-5-Gas-Well-Testing-and-Analysis-pptx.pptxrouholahahmadi9876
 
Jaipur ❤CALL GIRL 0000000000❤CALL GIRLS IN Jaipur ESCORT SERVICE❤CALL GIRL IN...
Jaipur ❤CALL GIRL 0000000000❤CALL GIRLS IN Jaipur ESCORT SERVICE❤CALL GIRL IN...Jaipur ❤CALL GIRL 0000000000❤CALL GIRLS IN Jaipur ESCORT SERVICE❤CALL GIRL IN...
Jaipur ❤CALL GIRL 0000000000❤CALL GIRLS IN Jaipur ESCORT SERVICE❤CALL GIRL IN...jabtakhaidam7
 
Introduction to Serverless with AWS Lambda
Introduction to Serverless with AWS LambdaIntroduction to Serverless with AWS Lambda
Introduction to Serverless with AWS LambdaOmar Fathy
 

Recently uploaded (20)

fitting shop and tools used in fitting shop .ppt
fitting shop and tools used in fitting shop .pptfitting shop and tools used in fitting shop .ppt
fitting shop and tools used in fitting shop .ppt
 
Online electricity billing project report..pdf
Online electricity billing project report..pdfOnline electricity billing project report..pdf
Online electricity billing project report..pdf
 
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
 
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
 
Hospital management system project report.pdf
Hospital management system project report.pdfHospital management system project report.pdf
Hospital management system project report.pdf
 
Theory of Time 2024 (Universal Theory for Everything)
Theory of Time 2024 (Universal Theory for Everything)Theory of Time 2024 (Universal Theory for Everything)
Theory of Time 2024 (Universal Theory for Everything)
 
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...
 
💚Trustworthy Call Girls Pune Call Girls Service Just Call 🍑👄6378878445 🍑👄 Top...
💚Trustworthy Call Girls Pune Call Girls Service Just Call 🍑👄6378878445 🍑👄 Top...💚Trustworthy Call Girls Pune Call Girls Service Just Call 🍑👄6378878445 🍑👄 Top...
💚Trustworthy Call Girls Pune Call Girls Service Just Call 🍑👄6378878445 🍑👄 Top...
 
Electromagnetic relays used for power system .pptx
Electromagnetic relays used for power system .pptxElectromagnetic relays used for power system .pptx
Electromagnetic relays used for power system .pptx
 
PE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and propertiesPE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and properties
 
Basic Electronics for diploma students as per technical education Kerala Syll...
Basic Electronics for diploma students as per technical education Kerala Syll...Basic Electronics for diploma students as per technical education Kerala Syll...
Basic Electronics for diploma students as per technical education Kerala Syll...
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
 
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best ServiceTamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
 
Introduction to Data Visualization,Matplotlib.pdf
Introduction to Data Visualization,Matplotlib.pdfIntroduction to Data Visualization,Matplotlib.pdf
Introduction to Data Visualization,Matplotlib.pdf
 
A Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna MunicipalityA Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna Municipality
 
Digital Communication Essentials: DPCM, DM, and ADM .pptx
Digital Communication Essentials: DPCM, DM, and ADM .pptxDigital Communication Essentials: DPCM, DM, and ADM .pptx
Digital Communication Essentials: DPCM, DM, and ADM .pptx
 
Signal Processing and Linear System Analysis
Signal Processing and Linear System AnalysisSignal Processing and Linear System Analysis
Signal Processing and Linear System Analysis
 
457503602-5-Gas-Well-Testing-and-Analysis-pptx.pptx
457503602-5-Gas-Well-Testing-and-Analysis-pptx.pptx457503602-5-Gas-Well-Testing-and-Analysis-pptx.pptx
457503602-5-Gas-Well-Testing-and-Analysis-pptx.pptx
 
Jaipur ❤CALL GIRL 0000000000❤CALL GIRLS IN Jaipur ESCORT SERVICE❤CALL GIRL IN...
Jaipur ❤CALL GIRL 0000000000❤CALL GIRLS IN Jaipur ESCORT SERVICE❤CALL GIRL IN...Jaipur ❤CALL GIRL 0000000000❤CALL GIRLS IN Jaipur ESCORT SERVICE❤CALL GIRL IN...
Jaipur ❤CALL GIRL 0000000000❤CALL GIRLS IN Jaipur ESCORT SERVICE❤CALL GIRL IN...
 
Introduction to Serverless with AWS Lambda
Introduction to Serverless with AWS LambdaIntroduction to Serverless with AWS Lambda
Introduction to Serverless with AWS Lambda
 

SA UNIT II KAFKA.pdf

  • 1. 18CSE489T - STREAMING ANALYTICS UNIT-2 Session-1 SLO 1 – Getting Started with Kafka SRM Institute of Science and Technology, Ramapuram 1
  • 2. Getting Started with Kafka  Apache Kafka was an open sourced Apache project in 2011, then First-class Apache project in 2012.  Kafka is written in Scala and Java.  Apache Kafka is publish-subscribe based fault tolerant messaging system.  It is fast, scalable and distributed by design. SRM Institute of Science and Technology, Ramapuram 2
  • 3. Why Kafka? Publish Subscribe messaging model  In Big Data, an enormous volume of data is used.  Regarding data, we have two main challenges.  The first challenge is how to collect large volume of data and  The second challenge is to analyze the collected data.  To overcome those challenges, you must need a messaging system.  Kafka is designed for distributed high throughput systems.  Kafka tends to work very well as a replacement for a more traditional message broker.  In comparison to other messaging systems, Kafka has better throughput, built-in partitioning, replication and inherent fault-tolerance, which makes it a good fit for large- scale message processing applications. SRM Institute of Science and Technology, Ramapuram 3
  • 4. Why Kafka? Publish Subscribe messaging model  Why Kafka?  Multiple Producers  Multiple Consumers  Disk Retention  Scalable  High Performance. SRM Institute of Science and Technology, Ramapuram 4
  • 5. Why Kafka? Publish Subscribe messaging model What is a Messaging System?  A Messaging System is responsible for transferring data from one application to another, so the applications can focus on data, but not worry about how to share it.  Distributed messaging is based on the concept of reliable message queuing.  Messages are queued asynchronously between client applications and messaging system.  Two types of messaging patterns are available  one is point to point and  the other is publish-subscribe (pub-sub) messaging system.  Most of the messaging patterns follow pub-sub. SRM Institute of Science and Technology, Ramapuram 5
  • 6. Why Kafka? Publish Subscribe messaging model Publish-Subscribe Messaging System  In the publish-subscribe system, messages are persisted in a topic.  Unlike point-to-point system, consumers can subscribe to one or more topic and consume all the messages in that topic.  In the Publish-Subscribe system, message producers are called publishers and message consumers are called subscribers.  A real-life example is Dish TV, which publishes different channels like sports, movies, music, etc., and anyone can subscribe to their own set of channels and get them whenever their subscribed channels are available. SRM Institute of Science and Technology, Ramapuram 6
  • 7. Why Kafka? Publish Subscribe messaging model SRM Institute of Science and Technology, Ramapuram 7
  • 8. Why Kafka? Publish Subscribe messaging model Following are a few benefits of Kafka −  Reliability − Kafka is distributed, partitioned, replicated and fault tolerance.  Scalability − Kafka messaging system scales easily without down time..  Durability − Kafka uses Distributed commit log which means messages persists on disk as fast as possible, hence it is durable..  Performance − Kafka has high throughput for both publishing and subscribing messages. It maintains stable performance even many TB of messages are stored. Kafka is very fast and guarantees zero downtime and zero data loss. SRM Institute of Science and Technology, Ramapuram 8
  • 9. 18CSE489T - STREAMING ANALYTICS UNIT-2 Session-1 SLO 2 – Why Kafka? Publish Subscribe Messaging Model SRM Institute of Science and Technology, Ramapuram 1
  • 10. Why Kafka? Publish Subscribe messaging model  In Big Data, an enormous volume of data is used.  Regarding data, we have two main challenges.  The first challenge is how to collect large volume of data and  The second challenge is to analyze the collected data.  To overcome those challenges, you must need a messaging system.  Kafka is designed for distributed high throughput systems.  Kafka tends to work very well as a replacement for a more traditional message broker.  In comparison to other messaging systems, Kafka has better throughput, built-in partitioning, replication and inherent fault-tolerance, which makes it a good fit for large- scale message processing applications. SRM Institute of Science and Technology, Ramapuram 2
  • 11. Why Kafka? Publish Subscribe messaging model  Why Kafka?  Multiple Producers  Multiple Consumers  Disk Retention  Scalable  High Performance. SRM Institute of Science and Technology, Ramapuram 3
  • 12. Why Kafka? Publish Subscribe messaging model What is a Messaging System?  A Messaging System is responsible for transferring data from one application to another, so the applications can focus on data, but not worry about how to share it.  Distributed messaging is based on the concept of reliable message queuing.  Messages are queued asynchronously between client applications and messaging system.  Two types of messaging patterns are available  one is point to point and  the other is publish-subscribe (pub-sub) messaging system.  Most of the messaging patterns follow pub-sub. SRM Institute of Science and Technology, Ramapuram 4
  • 13. Why Kafka? Publish Subscribe messaging model Publish-Subscribe Messaging System  In the publish-subscribe system, messages are persisted in a topic.  Unlike point-to-point system, consumers can subscribe to one or more topic and consume all the messages in that topic.  In the Publish-Subscribe system, message producers are called publishers and message consumers are called subscribers.  A real-life example is Dish TV, which publishes different channels like sports, movies, music, etc., and anyone can subscribe to their own set of channels and get them whenever their subscribed channels are available. SRM Institute of Science and Technology, Ramapuram 5
  • 14. Why Kafka? Publish Subscribe messaging model SRM Institute of Science and Technology, Ramapuram 6
  • 15. Why Kafka? Publish Subscribe messaging model Following are a few benefits of Kafka −  Reliability − Kafka is distributed, partitioned, replicated and fault tolerance.  Scalability − Kafka messaging system scales easily without down time..  Durability − Kafka uses Distributed commit log which means messages persists on disk as fast as possible, hence it is durable..  Performance − Kafka has high throughput for both publishing and subscribing messages. It maintains stable performance even many TB of messages are stored. Kafka is very fast and guarantees zero downtime and zero data loss. SRM Institute of Science and Technology, Ramapuram 7
  • 16. 18CSE489T - STREAMING ANALYTICS UNIT-2 Session-2 SLO 1 – Kafka Architecture SRM Institute of Science and Technology, Ramapuram 1
  • 17. Kafka Architecture  Topics, partitions, producers, consumers, etc., together forms the Kafka architecture.  As different applications design the architecture of Kafka accordingly, there are the following essential parts required to design Apache Kafka architecture. SRM Institute of Science and Technology, Ramapuram 2
  • 18. Kafka Architecture o Data Ecosystem: Several applications that use Apache Kafka forms an ecosystem. This ecosystem is built for data processing. It takes inputs in the form of applications that create data, and outputs are defined in the form of metrics, reports, etc. The below diagram represents a circulatory data ecosystem for Kafka. o Kafka Cluster: A Kafka cluster is a system that comprises of different brokers, topics, and their respective partitions. Data is written to the topic within the cluster and read by the cluster itself. o Producers: A producer sends or writes data/messages to the topic within the cluster. In order to store a huge amount of data, different producers within an application send data to the Kafka cluster. SRM Institute of Science and Technology, Ramapuram 3
  • 19. Kafka Architecture o Consumers: A consumer is the one that reads or consumes messages from the Kafka cluster. There can be several consumers consuming different types of data form the cluster. The beauty of Kafka is that each consumer knows from where it needs to consume the data. o Brokers: A Kafka server is known as a broker. A broker is a bridge between producers and consumers. If a producer wishes to write data to the cluster, it is sent to the Kafka server. All brokers lie within a Kafka cluster itself. Also, there can be multiple brokers. o Topics: It is a common name or a heading given to represent a similar type of data. In Apache Kafka, there can be multiple topics in a cluster. Each topic specifies different types of messages. SRM Institute of Science and Technology, Ramapuram 4
  • 20. Kafka Architecture o Partitions: The data or message is divided into small subparts, known as partitions. Each partition carries data within it having an offset value. The data is always written in a sequential manner. We can have an infinite number of partitions with infinite offset values. However, it is not guaranteed that to which partition the message will be written. SRM Institute of Science and Technology, Ramapuram 5
  • 21. Kafka Architecture o ZooKeeper: A ZooKeeper is used to store information about the Kafka cluster and details of the consumer clients. It manages brokers by maintaining a list of them. Also, a ZooKeeper is responsible for choosing a leader for the partitions. If any changes like a broker die, new topics, etc., occurs, the ZooKeeper sends notifications to Apache Kafka. A ZooKeeper is designed to operate with an odd number of Kafka servers. Zookeeper has a leader server that handles all the writes, and rest of the servers are the followers who handle all the reads. However, a user does not directly interact with the Zookeeper, but via brokers. No Kafka server can run without a zookeeper server. It is mandatory to run the zookeeper server. SRM Institute of Science and Technology, Ramapuram 6
  • 22. Kafka Architecture  In the above figure, there are three zookeeper servers where server 2 is the leader, and the other two are chosen as its followers. The five brokers are connected to these servers. Automatically, the Kafka cluster will come to know when brokers are down, more topics are added, etc.. Hence, on combining all the necessities, a Kafka cluster architecture is designed. SRM Institute of Science and Technology, Ramapuram 7
  • 23. Kafka Architecture  In the above figure, there are three zookeeper servers where server 2 is the leader, and the other two are chosen as its followers. The five brokers are connected to these servers. Automatically, the Kafka cluster will come to know when brokers are down, more topics are added, etc.. Hence, on combining all the necessities, a Kafka cluster architecture is designed. SRM Institute of Science and Technology, Ramapuram 8
  • 24. 18CSE489T - STREAMING ANALYTICS UNIT-2 Session-2 SLO 2 – Messages and Batches, Schemas SRM Institute of Science and Technology, Ramapuram 1
  • 25. Messages and Batches, Schemas, Topics and Partitions  The unit of data within Kafka is called a message.  If you are approaching Kafka from a database background, you can think of this as similar to a row or a record.  A message is simply an array of bytes as far as Kafka is concerned, so the data contained within it does not have a specific format or meaning to Kafka.  A message can have an optional bit of metadata, which is referred to as a key.  The key is also a byte array and, as with the message, has no specific meaning to Kafka. Keys are used when messages are to be written to partitions in a more controlled manner. SRM Institute of Science and Technology, Ramapuram 2
  • 26. Messages and Batches, Schemas, Topics and Partitions  For efficiency, messages are written into Kafka in batches.  A batch is just a collection of messages, all of which are being produced to the same topic and partition.  An individual roundtrip across the network for each message would result in excessive overhead, and collecting messages together into a batch reduces this.  Schemas: While messages are opaque byte arrays to Kafka itself, it is recommended that additional structure, or schema, be imposed on the message content so that it can be easily understood. There are many options available for message schema, depending on your application’s individual needs. Simplistic systems, such as Javascript Object Notation (JSON) and Extensible Markup Language (XML), are easy to use and human- readable. However, they lack features such as robust type handling and compatibility between schema versions. Many Kafka developers favor the use of Apache Avro, which is a serialization framework originally developed for Hadoop. SRM Institute of Science and Technology, Ramapuram 3
  • 27. Messages and Batches, Schemas, Topics and Partitions  Topics and Partitions  Messages in Kafka are categorized into topics. The closest analogies for a topic are a database table or a folder in a filesystem. Topics are additionally broken down into a number of partitions. Going back to the “commit log” description, a partition is a single log. Messages are written to it in an append-only fashion, and are read in order from beginning to end. Note that as a topic typically has multiple partitions, there is no guarantee of message time-ordering across the entire topic, just within a single partition. Figure 1-5 shows a topic with four partitions, with writes being appended to the end of each one. Partitions are also the way that Kafka provides redundancy and scalability. Each partition can be hosted on a different server, which means that a single topic can be scaled horizontally across multiple servers to provide performance far beyond the ability of a single server. SRM Institute of Science and Technology, Ramapuram 4
  • 28. Messages and Batches, Schemas, Topics and Partitions SRM Institute of Science and Technology, Ramapuram 5
  • 29. Messages and Batches, Schemas, Topics and Partitions  The term stream is often used when discussing data within systems like Kafka. Most often, a stream is considered to be a single topic of data, regardless of the number of partitions. This represents a single stream of data moving from the producers to the consumers. This way of referring to messages is most common when discussing stream processing, which is when frameworks—some of which are Kafka Streams, Apache Samza, and Storm—operate on the messages in real time. This method of operation can be compared to the way offline frameworks, namely Hadoop, are designed to work on bulk data at a later time SRM Institute of Science and Technology, Ramapuram 6
  • 30. 18CSE489T - STREAMING ANALYTICS UNIT-2 Session-3 SLO 1 – Topics and Partitions SRM Institute of Science and Technology, Ramapuram 1
  • 31. Messages and Batches, Schemas, Topics and Partitions  The unit of data within Kafka is called a message.  If you are approaching Kafka from a database background, you can think of this as similar to a row or a record.  A message is simply an array of bytes as far as Kafka is concerned, so the data contained within it does not have a specific format or meaning to Kafka.  A message can have an optional bit of metadata, which is referred to as a key.  The key is also a byte array and, as with the message, has no specific meaning to Kafka. Keys are used when messages are to be written to partitions in a more controlled manner. SRM Institute of Science and Technology, Ramapuram 2
  • 32. Messages and Batches, Schemas, Topics and Partitions  For efficiency, messages are written into Kafka in batches.  A batch is just a collection of messages, all of which are being produced to the same topic and partition.  An individual roundtrip across the network for each message would result in excessive overhead, and collecting messages together into a batch reduces this.  Schemas: While messages are opaque byte arrays to Kafka itself, it is recommended that additional structure, or schema, be imposed on the message content so that it can be easily understood. There are many options available for message schema, depending on your application’s individual needs. Simplistic systems, such as Javascript Object Notation (JSON) and Extensible Markup Language (XML), are easy to use and human- readable. However, they lack features such as robust type handling and compatibility between schema versions. Many Kafka developers favor the use of Apache Avro, which is a serialization framework originally developed for Hadoop. SRM Institute of Science and Technology, Ramapuram 3
  • 33. Messages and Batches, Schemas, Topics and Partitions  Topics and Partitions  Messages in Kafka are categorized into topics. The closest analogies for a topic are a database table or a folder in a filesystem. Topics are additionally broken down into a number of partitions. Going back to the “commit log” description, a partition is a single log. Messages are written to it in an append-only fashion, and are read in order from beginning to end. Note that as a topic typically has multiple partitions, there is no guarantee of message time-ordering across the entire topic, just within a single partition. Figure 1-5 shows a topic with four partitions, with writes being appended to the end of each one. Partitions are also the way that Kafka provides redundancy and scalability. Each partition can be hosted on a different server, which means that a single topic can be scaled horizontally across multiple servers to provide performance far beyond the ability of a single server. SRM Institute of Science and Technology, Ramapuram 4
  • 34. Messages and Batches, Schemas, Topics and Partitions SRM Institute of Science and Technology, Ramapuram 5
  • 35. Messages and Batches, Schemas, Topics and Partitions  The term stream is often used when discussing data within systems like Kafka. Most often, a stream is considered to be a single topic of data, regardless of the number of partitions. This represents a single stream of data moving from the producers to the consumers. This way of referring to messages is most common when discussing stream processing, which is when frameworks—some of which are Kafka Streams, Apache Samza, and Storm—operate on the messages in real time. This method of operation can be compared to the way offline frameworks, namely Hadoop, are designed to work on bulk data at a later time SRM Institute of Science and Technology, Ramapuram 6
  • 36. 18CSE489T - STREAMING ANALYTICS UNIT-2 Session-3 SLO 2 – Producers and Consumers SRM Institute of Science and Technology, Ramapuram 1
  • 37. Producers and Consumers  Producers create new messages. In other publish/subscribe systems, these may be called publishers or writers.  Consumers read messages. In other publish/subscribe systems, these clients may be called subscribers or readers. SRM Institute of Science and Technology, Ramapuram 2
  • 38. Brokers and Clusters  A single Kafka server is called a broker. The broker receives messages from producers, assigns offsets to them, and commits the messages to storage on disk.  Kafka brokers are designed to operate as part of a cluster. Within a cluster of brokers, one broker will also function as the cluster controller (elected automatically from the live members of the cluster). The controller is responsible for administrative operations, including assigning partitions to brokers and monitoring for broker failures. A partition is owned by a single broker in the cluster, and that broker is called the leader of the partition SRM Institute of Science and Technology, Ramapuram 3
  • 39. Brokers and Clusters SRM Institute of Science and Technology, Ramapuram 4
  • 40. Data Ecosystem SRM Institute of Science and Technology, Ramapuram 5
  • 41. Use cases  Activity tracking  Messaging  Metrics and Logging  Commit log  Stream Processing SRM Institute of Science and Technology, Ramapuram 6
  • 42. Sending Messages with Producers Steps & Example  The simplest way to send a message is as follows: ProducerRecord<String, String> record = new ProducerRecord<>("CustomerCountry", "Precision Products", "France"); try { producer.send(record); } catch (Exception e) { e.printStackTrace(); } SRM Institute of Science and Technology, Ramapuram 7
  • 43. Sending Messages with Producers Steps & Example  Sending a Message Synchronously The simplest way to send a message synchronously is as follows: ProducerRecord<String, String> record = new ProducerRecord<>("CustomerCountry", "Precision Products", "France"); try { producer.send(record).get(); } catch (Exception e) { e.printStackTrace(); } SRM Institute of Science and Technology, Ramapuram 8
  • 44. Sending Messages with Producers Steps & Example  Sending a Message Asynchronously private class DemoProducerCallback implements Callback { @Override public void onCompletion(RecordMetadata recordMetadata, Exception e) { if (e != null) { e.printStackTrace(); } } } ProducerRecord<String, String> record = new ProducerRecord<>("CustomerCountry", "Biomedical Materials", "USA"); producer.send(record, new DemoProducerCallback()); SRM Institute of Science and Technology, Ramapuram 9
  • 45. Receiving Messages with Consumers Steps & Example  Creating a Kafka Consumer The following code snippet shows how to create a KafkaConsumer: Properties props = new Properties(); props.put("bootstrap.servers", "broker1:9092,broker2:9092"); props.put("group.id", "CountryCounter"); props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer"); props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer"); KafkaConsumer<String, String> consumer = new KafkaConsumer<String, String>(props); SRM Institute of Science and Technology, Ramapuram 10
  • 46. Receiving Messages with Consumers Steps & Example  Subscribing to Topics  The subcribe() method takes a list of topics as a parameter, so it’s pretty simple to use: consumer.subscribe(Collections.singletonList("customerCountries"));  To subscribe to all test topics, we can call: consumer.subscribe("test.*"); SRM Institute of Science and Technology, Ramapuram 11
  • 47. Receiving Messages with Consumers Steps & Example The Poll Loop try { while (true) { ConsumerRecords<String, String> records = consumer.poll(100); for (ConsumerRecord<String, String> record : records) { log.debug("topic = %s, partition = %s, offset = %d, customer = %s, country = %sn", record.topic(), record.partition(), record.offset(), record.key(), record.value()); int updatedCount = 1; if (custCountryMap.countainsValue(record.value( ))) { updatedCount = custCountryMap.get(record.value()) + 1; } custCountryMap.put(record.value(), updatedCount) JSONObject json = new JSONObject(custCountryMap); System.out.println(json.toString(4)) } } } finally { consumer.close(); } SRM Institute of Science and Technology, Ramapuram 12
  • 48. 18CSE489T - STREAMING ANALYTICS UNIT-2 Session-4 SLO 1 – Brokers and Clusters SRM Institute of Science and Technology, Ramapuram 1
  • 49. Brokers and Clusters  A single Kafka server is called a broker. The broker receives messages from producers, assigns offsets to them, and commits the messages to storage on disk.  Kafka brokers are designed to operate as part of a cluster. Within a cluster of brokers, one broker will also function as the cluster controller (elected automatically from the live members of the cluster). The controller is responsible for administrative operations, including assigning partitions to brokers and monitoring for broker failures. A partition is owned by a single broker in the cluster, and that broker is called the leader of the partition SRM Institute of Science and Technology, Ramapuram 2
  • 50. Brokers and Clusters SRM Institute of Science and Technology, Ramapuram 3
  • 51. Data Ecosystem SRM Institute of Science and Technology, Ramapuram 4
  • 52. Use cases  Activity tracking  Messaging  Metrics and Logging  Commit log  Stream Processing SRM Institute of Science and Technology, Ramapuram 5
  • 53. Sending Messages with Producers Steps & Example  The simplest way to send a message is as follows: ProducerRecord<String, String> record = new ProducerRecord<>("CustomerCountry", "Precision Products", "France"); try { producer.send(record); } catch (Exception e) { e.printStackTrace(); } SRM Institute of Science and Technology, Ramapuram 6
  • 54. Sending Messages with Producers Steps & Example  Sending a Message Synchronously The simplest way to send a message synchronously is as follows: ProducerRecord<String, String> record = new ProducerRecord<>("CustomerCountry", "Precision Products", "France"); try { producer.send(record).get(); } catch (Exception e) { e.printStackTrace(); } SRM Institute of Science and Technology, Ramapuram 7
  • 55. Sending Messages with Producers Steps & Example  Sending a Message Asynchronously private class DemoProducerCallback implements Callback { @Override public void onCompletion(RecordMetadata recordMetadata, Exception e) { if (e != null) { e.printStackTrace(); } } } ProducerRecord<String, String> record = new ProducerRecord<>("CustomerCountry", "Biomedical Materials", "USA"); producer.send(record, new DemoProducerCallback()); SRM Institute of Science and Technology, Ramapuram 8
  • 56. Receiving Messages with Consumers Steps & Example  Creating a Kafka Consumer The following code snippet shows how to create a KafkaConsumer: Properties props = new Properties(); props.put("bootstrap.servers", "broker1:9092,broker2:9092"); props.put("group.id", "CountryCounter"); props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer"); props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer"); KafkaConsumer<String, String> consumer = new KafkaConsumer<String, String>(props); SRM Institute of Science and Technology, Ramapuram 9
  • 57. Receiving Messages with Consumers Steps & Example  Subscribing to Topics  The subcribe() method takes a list of topics as a parameter, so it’s pretty simple to use: consumer.subscribe(Collections.singletonList("customerCountries"));  To subscribe to all test topics, we can call: consumer.subscribe("test.*"); SRM Institute of Science and Technology, Ramapuram 10
  • 58. Receiving Messages with Consumers Steps & Example The Poll Loop try { while (true) { ConsumerRecords<String, String> records = consumer.poll(100); for (ConsumerRecord<String, String> record : records) { log.debug("topic = %s, partition = %s, offset = %d, customer = %s, country = %sn", record.topic(), record.partition(), record.offset(), record.key(), record.value()); int updatedCount = 1; if (custCountryMap.countainsValue(record.value( ))) { updatedCount = custCountryMap.get(record.value()) + 1; } custCountryMap.put(record.value(), updatedCount) JSONObject json = new JSONObject(custCountryMap); System.out.println(json.toString(4)) } } } finally { consumer.close(); } SRM Institute of Science and Technology, Ramapuram 11
  • 59. 18CSE489T - STREAMING ANALYTICS UNIT-2 Session-4 SLO 2 – Multiple Clusters, Data Ecosystem SRM Institute of Science and Technology, Ramapuram 1
  • 60. Brokers and Clusters  A single Kafka server is called a broker. The broker receives messages from producers, assigns offsets to them, and commits the messages to storage on disk.  Kafka brokers are designed to operate as part of a cluster. Within a cluster of brokers, one broker will also function as the cluster controller (elected automatically from the live members of the cluster). The controller is responsible for administrative operations, including assigning partitions to brokers and monitoring for broker failures. A partition is owned by a single broker in the cluster, and that broker is called the leader of the partition SRM Institute of Science and Technology, Ramapuram 2
  • 61. Brokers and Clusters SRM Institute of Science and Technology, Ramapuram 3
  • 62. Data Ecosystem SRM Institute of Science and Technology, Ramapuram 4
  • 63. Use cases  Activity tracking  Messaging  Metrics and Logging  Commit log  Stream Processing SRM Institute of Science and Technology, Ramapuram 5
  • 64. Sending Messages with Producers Steps & Example  The simplest way to send a message is as follows: ProducerRecord<String, String> record = new ProducerRecord<>("CustomerCountry", "Precision Products", "France"); try { producer.send(record); } catch (Exception e) { e.printStackTrace(); } SRM Institute of Science and Technology, Ramapuram 6
  • 65. Sending Messages with Producers Steps & Example  Sending a Message Synchronously The simplest way to send a message synchronously is as follows: ProducerRecord<String, String> record = new ProducerRecord<>("CustomerCountry", "Precision Products", "France"); try { producer.send(record).get(); } catch (Exception e) { e.printStackTrace(); } SRM Institute of Science and Technology, Ramapuram 7
  • 66. Sending Messages with Producers Steps & Example  Sending a Message Asynchronously private class DemoProducerCallback implements Callback { @Override public void onCompletion(RecordMetadata recordMetadata, Exception e) { if (e != null) { e.printStackTrace(); } } } ProducerRecord<String, String> record = new ProducerRecord<>("CustomerCountry", "Biomedical Materials", "USA"); producer.send(record, new DemoProducerCallback()); SRM Institute of Science and Technology, Ramapuram 8
  • 67. Receiving Messages with Consumers Steps & Example  Creating a Kafka Consumer The following code snippet shows how to create a KafkaConsumer: Properties props = new Properties(); props.put("bootstrap.servers", "broker1:9092,broker2:9092"); props.put("group.id", "CountryCounter"); props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer"); props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer"); KafkaConsumer<String, String> consumer = new KafkaConsumer<String, String>(props); SRM Institute of Science and Technology, Ramapuram 9
  • 68. Receiving Messages with Consumers Steps & Example  Subscribing to Topics  The subcribe() method takes a list of topics as a parameter, so it’s pretty simple to use: consumer.subscribe(Collections.singletonList("customerCountries"));  To subscribe to all test topics, we can call: consumer.subscribe("test.*"); SRM Institute of Science and Technology, Ramapuram 10
  • 69. Receiving Messages with Consumers Steps & Example The Poll Loop try { while (true) { ConsumerRecords<String, String> records = consumer.poll(100); for (ConsumerRecord<String, String> record : records) { log.debug("topic = %s, partition = %s, offset = %d, customer = %s, country = %sn", record.topic(), record.partition(), record.offset(), record.key(), record.value()); int updatedCount = 1; if (custCountryMap.countainsValue(record.value( ))) { updatedCount = custCountryMap.get(record.value()) + 1; } custCountryMap.put(record.value(), updatedCount) JSONObject json = new JSONObject(custCountryMap); System.out.println(json.toString(4)) } } } finally { consumer.close(); } SRM Institute of Science and Technology, Ramapuram 11
  • 70. 18CSE489T - STREAMING ANALYTICS UNIT-2 Session-5 SLO 1 – Sending Messages with Producers SRM Institute of Science and Technology, Ramapuram 1
  • 71. Sending Messages with Producers Steps & Example  The simplest way to send a message is as follows: ProducerRecord<String, String> record = new ProducerRecord<>("CustomerCountry", "Precision Products", "France"); try { producer.send(record); } catch (Exception e) { e.printStackTrace(); } SRM Institute of Science and Technology, Ramapuram 2
  • 72. Sending Messages with Producers Steps & Example  Sending a Message Synchronously The simplest way to send a message synchronously is as follows: ProducerRecord<String, String> record = new ProducerRecord<>("CustomerCountry", "Precision Products", "France"); try { producer.send(record).get(); } catch (Exception e) { e.printStackTrace(); } SRM Institute of Science and Technology, Ramapuram 3
  • 73. Sending Messages with Producers Steps & Example  Sending a Message Asynchronously private class DemoProducerCallback implements Callback { @Override public void onCompletion(RecordMetadata recordMetadata, Exception e) { if (e != null) { e.printStackTrace(); } } } ProducerRecord<String, String> record = new ProducerRecord<>("CustomerCountry", "Biomedical Materials", "USA"); producer.send(record, new DemoProducerCallback()); SRM Institute of Science and Technology, Ramapuram 4
  • 74. Receiving Messages with Consumers Steps & Example  Creating a Kafka Consumer The following code snippet shows how to create a KafkaConsumer: Properties props = new Properties(); props.put("bootstrap.servers", "broker1:9092,broker2:9092"); props.put("group.id", "CountryCounter"); props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer"); props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer"); KafkaConsumer<String, String> consumer = new KafkaConsumer<String, String>(props); SRM Institute of Science and Technology, Ramapuram 5
  • 75. Receiving Messages with Consumers Steps & Example  Subscribing to Topics  The subcribe() method takes a list of topics as a parameter, so it’s pretty simple to use: consumer.subscribe(Collections.singletonList("customerCountries"));  To subscribe to all test topics, we can call: consumer.subscribe("test.*"); SRM Institute of Science and Technology, Ramapuram 6
  • 76. Receiving Messages with Consumers Steps & Example The Poll Loop try { while (true) { ConsumerRecords<String, String> records = consumer.poll(100); for (ConsumerRecord<String, String> record : records) { log.debug("topic = %s, partition = %s, offset = %d, customer = %s, country = %s\n", record.topic(), record.partition(), record.offset(), record.key(), record.value()); int updatedCount = 1; if (custCountryMap.containsKey(record.value())) { updatedCount = custCountryMap.get(record.value()) + 1; } custCountryMap.put(record.value(), updatedCount); JSONObject json = new JSONObject(custCountryMap); System.out.println(json.toString(4)); } } } finally { consumer.close(); } SRM Institute of Science and Technology, Ramapuram 7
  • 77. 18CSE489T - STREAMING ANALYTICS UNIT-2 Session-5 SLO 2 – Steps and Example - Sending Messages with Producers SRM Institute of Science and Technology, Ramapuram 1
  • 78. Sending Messages with Producers Steps & Example  The simplest way to send a message is as follows: ProducerRecord<String, String> record = new ProducerRecord<>("CustomerCountry", "Precision Products", "France"); try { producer.send(record); } catch (Exception e) { e.printStackTrace(); } SRM Institute of Science and Technology, Ramapuram 2
  • 79. Sending Messages with Producers Steps & Example  Sending a Message Synchronously The simplest way to send a message synchronously is as follows: ProducerRecord<String, String> record = new ProducerRecord<>("CustomerCountry", "Precision Products", "France"); try { producer.send(record).get(); } catch (Exception e) { e.printStackTrace(); } SRM Institute of Science and Technology, Ramapuram 3
  • 80. Sending Messages with Producers Steps & Example  Sending a Message Asynchronously private class DemoProducerCallback implements Callback { @Override public void onCompletion(RecordMetadata recordMetadata, Exception e) { if (e != null) { e.printStackTrace(); } } } ProducerRecord<String, String> record = new ProducerRecord<>("CustomerCountry", "Biomedical Materials", "USA"); producer.send(record, new DemoProducerCallback()); SRM Institute of Science and Technology, Ramapuram 4
  • 81. Receiving Messages with Consumers Steps & Example  Creating a Kafka Consumer The following code snippet shows how to create a KafkaConsumer: Properties props = new Properties(); props.put("bootstrap.servers", "broker1:9092,broker2:9092"); props.put("group.id", "CountryCounter"); props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer"); props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer"); KafkaConsumer<String, String> consumer = new KafkaConsumer<String, String>(props); SRM Institute of Science and Technology, Ramapuram 5
  • 82. Receiving Messages with Consumers Steps & Example  Subscribing to Topics  The subscribe() method takes a list of topics as a parameter, so it’s pretty simple to use: consumer.subscribe(Collections.singletonList("customerCountries"));  To subscribe to all topics whose names match a regular expression (for example, all test topics), we pass a compiled pattern: consumer.subscribe(Pattern.compile("test.*")); SRM Institute of Science and Technology, Ramapuram 6
  • 83. Receiving Messages with Consumers Steps & Example The Poll Loop try { while (true) { ConsumerRecords<String, String> records = consumer.poll(100); for (ConsumerRecord<String, String> record : records) { log.debug("topic = %s, partition = %s, offset = %d, customer = %s, country = %s\n", record.topic(), record.partition(), record.offset(), record.key(), record.value()); int updatedCount = 1; if (custCountryMap.containsKey(record.value())) { updatedCount = custCountryMap.get(record.value()) + 1; } custCountryMap.put(record.value(), updatedCount); JSONObject json = new JSONObject(custCountryMap); System.out.println(json.toString(4)); } } } finally { consumer.close(); } SRM Institute of Science and Technology, Ramapuram 7
  • 84. 18CSE489T - STREAMING ANALYTICS UNIT-2 Session-6 SLO 1 – Receiving Messages with Consumers SRM Institute of Science and Technology, Ramapuram 1
  • 85. Receiving Messages with Consumers Steps & Example  Creating a Kafka Consumer The following code snippet shows how to create a KafkaConsumer: Properties props = new Properties(); props.put("bootstrap.servers", "broker1:9092,broker2:9092"); props.put("group.id", "CountryCounter"); props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer"); props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer"); KafkaConsumer<String, String> consumer = new KafkaConsumer<String, String>(props); SRM Institute of Science and Technology, Ramapuram 2
  • 86. Receiving Messages with Consumers Steps & Example  Subscribing to Topics  The subscribe() method takes a list of topics as a parameter, so it’s pretty simple to use: consumer.subscribe(Collections.singletonList("customerCountries"));  To subscribe to all topics whose names match a regular expression (for example, all test topics), we pass a compiled pattern: consumer.subscribe(Pattern.compile("test.*")); SRM Institute of Science and Technology, Ramapuram 3
  • 87. Receiving Messages with Consumers Steps & Example The Poll Loop try { while (true) { ConsumerRecords<String, String> records = consumer.poll(100); for (ConsumerRecord<String, String> record : records) { log.debug("topic = %s, partition = %s, offset = %d, customer = %s, country = %s\n", record.topic(), record.partition(), record.offset(), record.key(), record.value()); int updatedCount = 1; if (custCountryMap.containsKey(record.value())) { updatedCount = custCountryMap.get(record.value()) + 1; } custCountryMap.put(record.value(), updatedCount); JSONObject json = new JSONObject(custCountryMap); System.out.println(json.toString(4)); } } } finally { consumer.close(); } SRM Institute of Science and Technology, Ramapuram 4
  • 88. 18CSE489T - STREAMING ANALYTICS UNIT-2 Session-6 SLO 2 – Steps & Examples Receiving Messages with Consumers SRM Institute of Science and Technology, Ramapuram 1
  • 89. Receiving Messages with Consumers Steps & Example  Creating a Kafka Consumer The following code snippet shows how to create a KafkaConsumer: Properties props = new Properties(); props.put("bootstrap.servers", "broker1:9092,broker2:9092"); props.put("group.id", "CountryCounter"); props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer"); props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer"); KafkaConsumer<String, String> consumer = new KafkaConsumer<String, String>(props); SRM Institute of Science and Technology, Ramapuram 2
  • 90. Receiving Messages with Consumers Steps & Example  Subscribing to Topics  The subscribe() method takes a list of topics as a parameter, so it’s pretty simple to use: consumer.subscribe(Collections.singletonList("customerCountries"));  To subscribe to all topics whose names match a regular expression (for example, all test topics), we pass a compiled pattern: consumer.subscribe(Pattern.compile("test.*")); SRM Institute of Science and Technology, Ramapuram 3
  • 91. Receiving Messages with Consumers Steps & Example The Poll Loop try { while (true) { ConsumerRecords<String, String> records = consumer.poll(100); for (ConsumerRecord<String, String> record : records) { log.debug("topic = %s, partition = %s, offset = %d, customer = %s, country = %s\n", record.topic(), record.partition(), record.offset(), record.key(), record.value()); int updatedCount = 1; if (custCountryMap.containsKey(record.value())) { updatedCount = custCountryMap.get(record.value()) + 1; } custCountryMap.put(record.value(), updatedCount); JSONObject json = new JSONObject(custCountryMap); System.out.println(json.toString(4)); } } } finally { consumer.close(); } SRM Institute of Science and Technology, Ramapuram 4
  • 92. 18CSE489T - STREAMING ANALYTICS UNIT-2 Session-7 SLO 1 – Developing Kafka Stream Applications SRM Institute of Science and Technology, Ramapuram 1
  • 93. Developing Kafka Stream Applications  The Kafka Streams DSL is the high-level API that enables you to build Kafka Streams applications quickly.  The high-level API is very well thought out, and there are methods to handle most stream-processing needs out of the box, so you can create a sophisticated stream- processing program without much effort.  At the heart of the high-level API is the KStream object, which represents the streaming key/value pair records. Most of the methods in the Kafka Streams DSL return a reference to a KStream object, allowing for a fluent interface style of programming.  Additionally, a good percentage of the KStream methods accept types consisting of single-method interfaces allowing for the use of Java 8 lambda expressions. Taking these factors into account, you can imagine the simplicity and ease with which you can build a Kafka Streams program.
  • 94. Phases in Kafka Stream Applications Development  Your first program will be a toy application that takes incoming messages and converts them to uppercase characters, effectively yelling at anyone who reads the message.
  • 95. Phases in Kafka Stream Applications Development  This is a trivial example, but the code shown here is representative of what you’ll see in other Kafka Streams programs. In most of the examples, you’ll see a similar structure: 1. Define the configuration items. 2. Create Serde instances, either custom or predefined. 3. Build the processor topology. 4. Create and start the KStream.  When we get into the more advanced examples, the principal difference will be in the complexity of the processor topology. With that in mind, it’s time to build your first application.
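As a sketch of the first two phases, assuming an application ID of "yelling_app_id", a local broker, and string keys and values (all placeholders), the setup might look like this:

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "yelling_app_id");   // phase 1: configuration items
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
Serde<String> stringSerde = Serdes.String();                         // phase 2: Serde instances (predefined here)
StreamsBuilder builder = new StreamsBuilder();                       // used to build the processor topology next

The processor topology (phase 3) and the running KafkaStreams instance (phase 4) are built on top of this builder in the following slides.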
  • 96. Phases in Kafka Stream Applications Development  Creating the topology for the Yelling App  The first step to creating any Kafka Streams application is to create a source node. The source node is responsible for consuming the records, from a topic, that will flow through the application.
  • 97. Phases in Kafka Stream Applications Development  The following line of code creates the source, or parent, node of the graph. KStream<String, String> simpleFirstStream = builder.stream("src-topic", Consumed.with(stringSerde, stringSerde));  The simpleFirstStream KStream instance is set to consume messages written to the src-topic topic.
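To complete the Yelling App as a sketch (the sink topic name out-topic is an assumption), the remaining phases map each value to uppercase, write the result to a sink topic, and start the streams instance:

KStream<String, String> upperCasedStream = simpleFirstStream.mapValues(String::toUpperCase);  // phase 3: topology
upperCasedStream.to("out-topic", Produced.with(stringSerde, stringSerde));
KafkaStreams kafkaStreams = new KafkaStreams(builder.build(), props);                         // phase 4: create and start
kafkaStreams.start();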
  • 98. 18CSE489T - STREAMING ANALYTICS UNIT-2 Session-7 SLO 2 – Phases in Kafka Stream Application Development SRM Institute of Science and Technology, Ramapuram 1
  • 99. Phases in Kafka Stream Applications Development  Your first program will be a toy application that takes incoming messages and converts them to uppercase characters, effectively yelling at anyone who reads the message.
  • 100. Phases in Kafka Stream Applications Development  This is a trivial example, but the code shown here is representative of what you’ll see in other Kafka Streams programs. In most of the examples, you’ll see a similar structure: 1. Define the configuration items. 2. Create Serde instances, either custom or predefined. 3. Build the processor topology. 4. Create and start the KStream.  When we get into the more advanced examples, the principal difference will be in the complexity of the processor topology. With that in mind, it’s time to build your first application.
  • 101. Phases in Kafka Stream Applications Development  Creating the topology for the Yelling App  The first step to creating any Kafka Streams application is to create a source node. The source node is responsible for consuming the records, from a topic, that will flow through the application.
  • 102. Phases in Kafka Stream Applications Development  The following line of code creates the source, or parent, node of the graph. KStream<String, String> simpleFirstStream = builder.stream("src-topic", Consumed.with(stringSerde, stringSerde));  The simpleFirstStream KStream instance is set to consume messages written to the src-topic topic.
  • 103. 18CSE489T - STREAMING ANALYTICS UNIT-2 Session-8 SLO 1 – Constructing a Topology SRM Institute of Science and Technology, Ramapuram 1
  • 104. Constructing a Topology  BUILDING THE SOURCE NODE  You’ll start by building the source node and first processor of the topology by chaining two calls to the KStream API together. It should be fairly obvious by now what the role of the origin node is. The first processor in the topology will be responsible for masking credit card numbers to protect customer privacy. SRM Institute of Science and Technology, Ramapuram 2
  • 105. Constructing a Topology  KStream<String,Purchase> purchaseKStream = streamsBuilder.stream("transactions", Consumed.with(stringSerde, purchaseSerde)) .mapValues(p -> Purchase.builder(p).maskCreditCard().build());  You create the source node with a call to the StreamsBuilder.stream method using a default String serde, a custom serde for Purchase objects, and the name of the topic that’s the source of the messages for the stream.  The next immediate call is to the KStream.mapValues method, taking a ValueMapper<V, V1> instance as a parameter. Value mappers take a single parameter of one type (a Purchase object, in this case) and map that object to a new value, possibly of another type. In this example, KStream.mapValues returns an object of the same type (Purchase), but with a masked credit card number.  Note that when using the KStream.mapValues method, the original key is unchanged and isn’t factored into mapping a new value. If you wanted to generate a new key/value pair or include the key in producing a new value, you’d use the KStream.map method that takes a KeyValueMapper<K, V, KeyValue<K1, V1>> instance. SRM Institute of Science and Technology, Ramapuram 3
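To make the distinction concrete, here is a sketch contrasting the two calls; the getCustomerId() accessor on Purchase is an assumed name used only for illustration:

// mapValues: the key is untouched, only the value is transformed
KStream<String, Purchase> masked =
    purchaseKStream.mapValues(p -> Purchase.builder(p).maskCreditCard().build());
// map: both key and value can change, e.g. re-keying by an assumed customer ID
KStream<String, Purchase> rekeyed =
    purchaseKStream.map((key, purchase) -> KeyValue.pair(purchase.getCustomerId(), purchase));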
  • 106. Constructing a Topology  BUILDING THE SECOND PROCESSOR  Now you’ll build the second processor, responsible for extracting pattern data from a topic, which ZMart can use to determine purchase patterns in regions of the country. You’ll also add a sink node responsible for writing the pattern data to a Kafka topic. SRM Institute of Science and Technology, Ramapuram 4
  • 107. Constructing a Topology  This new KStream will start to receive PurchasePattern objects created as a result of the mapValues call.  KStream<String, PurchasePattern> patternKStream = purchaseKStream.mapValues(purchase -> PurchasePattern.builder(purchase).build()); patternKStream.to("patterns", Produced.with(stringSerde, purchasePatternSerde));  Here, you declare a variable to hold the reference of the new KStream instance, because you’ll use it to print the results of the stream to the console with a print call. This is very useful during development and for debugging. The purchase-patterns processor forwards the records it receives to a child node of its own, defined by the method call KStream.to, writing to the patterns topic. SRM Institute of Science and Technology, Ramapuram 5
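For the console printing mentioned above, one option in the KStream API is the Printed helper; this is a sketch, and the label is arbitrary:

patternKStream.print(Printed.<String, PurchasePattern>toSysOut().withLabel("patterns"));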
  • 108. Constructing a Topology  BUILDING THE THIRD PROCESSOR  The third processor in the topology is the customer rewards accumulator node, which will let ZMart track purchases made by members of their preferred customer club. The rewards accumulator sends data to a topic consumed by applications at ZMart HQ to determine rewards when customers complete purchases. SRM Institute of Science and Technology, Ramapuram 6
  • 109. Constructing a Topology  KStream<String, RewardAccumulator> rewardsKStream = purchaseKStream.mapValues(purchase -> RewardAccumulator.builder(purchase).build()); rewardsKStream.to("rewards", Produced.with(stringSerde,rewardAccumulatorSerde));  You build the rewards accumulator processor using what should be by now a familiar pattern: creating a new KStream instance that maps the raw purchase data contained in the record to a new object type. You also attach a sink node to the rewards accumulator so the results of the rewards KStream can be written to a topic and used for determining customer reward levels. SRM Institute of Science and Technology, Ramapuram 7
  • 110. Constructing a Topology  BUILDING THE LAST PROCESSOR  Finally, you’ll take the first KStream you created, purchaseKStream, and attach a sink node to write out the raw purchase records (with credit cards masked, of course) to a topic called purchases. The purchases topic will be used to feed into a NoSQL store such as Cassandra (http://cassandra.apache.org/), Presto (https://prestodb.io/), or Elasticsearch (www.elastic.co/webinars/getting-started-elasticsearch) to perform ad hoc analysis. Figure 3.9 shows the final processor. SRM Institute of Science and Technology, Ramapuram 8
  • 111. Constructing a Topology  Even with this more involved topology, you still performed the following steps:  Create a StreamsConfig instance.  Build one or more Serde instances.  Construct the processing topology.  Assemble all the components and start the Kafka Streams program.  In this application, I’ve mentioned using a Serde, but I haven’t explained why or how you create them. Let’s take some time now to discuss the role of the Serde in a Kafka Streams application. SRM Institute of Science and Technology, Ramapuram 9
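As a preview, a custom Serde is typically assembled from a serializer/deserializer pair; the JsonSerializer and JsonDeserializer helpers below are assumed classes (for example, Gson-backed), not part of the Kafka API:

JsonSerializer<Purchase> purchaseSerializer = new JsonSerializer<>();
JsonDeserializer<Purchase> purchaseDeserializer = new JsonDeserializer<>(Purchase.class);
// Serdes.serdeFrom combines a Serializer and a Deserializer into a single Serde
Serde<Purchase> purchaseSerde = Serdes.serdeFrom(purchaseSerializer, purchaseDeserializer);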
  • 112. 18CSE489T - STREAMING ANALYTICS UNIT-2 Session-8 SLO 2 – Streams and State – Applying stateful operations SRM Institute of Science and Technology, Ramapuram 1
  • 113. Streams and State  The preceding fictional scenario illustrates something that most of us already know instinctively. Sometimes it’s easy to reason about what’s going on, but usually you need some context to make good decisions. When it comes to stream processing, we call that added context state.  At first glance, the notions of state and stream processing may seem to be at odds with each other. Stream processing implies a constant flow of discrete events that don’t have much to do with each other and need to be dealt with as they occur. The notion of state might evoke images of a static resource, such as a database table. SRM Institute of Science and Technology, Ramapuram 2
  • 114. Streams and State  In actuality, you can view these as one and the same. But the rate of change in a stream is potentially much faster and more frequent than in a database table. You don’t always need state to work with streaming data. In some cases, you may have discrete events or records that carry enough information to be valuable on their own. But more often than not, the incoming stream of data will need enrichment from some sort of store, either using information from events that arrived before, or joining related events with events from different streams. SRM Institute of Science and Technology, Ramapuram 3
  • 115. Applying Stateful Operation  In this topology, you produced a stream of purchase-transaction events. One of the processing nodes in the topology calculated reward points for customers based on the  amount of the sale. But in that processor, you just calculated the total number of points for the single transaction and forwarded the results.  If you added some state to the processor, you could keep track of the cumulative number of reward points. Then, the consuming application at ZMart would need to check the total and send out a reward if needed. SRM Institute of Science and Technology, Ramapuram 4
  • 116. Applying Stateful Operation  Now that you have a basic idea of how state can be useful in Kafka Streams (or any other streaming application), let’s look at some concrete examples.  You’ll start with transforming the stateless rewards processor into a stateful processor using transformValues.  You’ll keep track of the total bonus points achieved so far and the amount of time between purchases, to provide more information to downstream consumers. SRM Institute of Science and Technology, Ramapuram 5
  • 117. Applying Stateful Operation  The transformValues processor  The most basic of the stateful functions is KStream.transformValues. Figure 4.4 illustrates how the KStream.transformValues() method operates. This method is semantically the same as KStream.mapValues(), with a few exceptions. One difference is that transformValues has access to a StateStore instance to accomplish its task. The other difference is its ability to schedule operations to occur at regular intervals via a punctuate() method. SRM Institute of Science and Technology, Ramapuram 6
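A sketch of how such a transformer might be wired into the topology is shown below; the store name, the transformer class name, and the choice of an in-memory store are assumptions for illustration:

StoreBuilder<KeyValueStore<String, Integer>> storeBuilder =
    Stores.keyValueStoreBuilder(Stores.inMemoryKeyValueStore("rewardsPointsStore"),
                                Serdes.String(), Serdes.Integer());
builder.addStateStore(storeBuilder);   // register the state store with the topology

KStream<String, RewardAccumulator> statefulRewards =
    purchaseKStream.transformValues(() -> new PurchaseRewardTransformer("rewardsPointsStore"),
                                    "rewardsPointsStore");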
  • 118. Applying Stateful Operation  Stateful customer rewards  The rewards processor from the chapter 3 topology for ZMart extracts information for customers belonging to ZMart’s rewards program. Initially, the rewards processor used the KStream.mapValues() method to map the incoming Purchase object into a RewardAccumulator object. The RewardAccumulator object originally consisted of just two fields, the customer ID and the purchase total for the transaction. Now, the requirements have changed some, and points are being associated with the ZMart rewards program: SRM Institute of Science and Technology, Ramapuram 7
  • 119. Applying Stateful Operation  Initializing the value transformer  The first step is to set up or create any instance variables in the transformer init() method. In the init() method, you retrieve the state store created when building the processing topology SRM Institute of Science and Technology, Ramapuram 8
  • 120. Applying Stateful Operation  Mapping the Purchase object to a RewardAccumulator using state  Now that you’ve initialized the processor, you can move on to transforming a Purchase object using state. A few simple steps for performing the transformation are as follows:  1 Look up the points accumulated so far by customer ID.  2 Add the points for the current transaction to the accumulated total.  3 Set the reward points on the RewardAccumulator to the new total amount.  4 Save the new total points by customer ID in the local state store. SRM Institute of Science and Technology, Ramapuram 9
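A sketch of a ValueTransformer implementing these steps follows; the RewardAccumulator accessors (getCustomerId(), addRewardPoints(), getTotalRewardPoints()) and the class name are assumed names for illustration:

import org.apache.kafka.streams.kstream.ValueTransformer;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.state.KeyValueStore;

public class PurchaseRewardTransformer implements ValueTransformer<Purchase, RewardAccumulator> {
    private final String storeName;
    private KeyValueStore<String, Integer> stateStore;

    public PurchaseRewardTransformer(String storeName) {
        this.storeName = storeName;
    }

    @Override
    @SuppressWarnings("unchecked")
    public void init(ProcessorContext context) {
        // retrieve the state store that was registered when building the topology
        stateStore = (KeyValueStore<String, Integer>) context.getStateStore(storeName);
    }

    @Override
    public RewardAccumulator transform(Purchase purchase) {
        RewardAccumulator rewards = RewardAccumulator.builder(purchase).build();
        Integer accumulatedSoFar = stateStore.get(rewards.getCustomerId());      // step 1: look up prior points
        if (accumulatedSoFar != null) {
            rewards.addRewardPoints(accumulatedSoFar);                           // steps 2-3: add to current total
        }
        stateStore.put(rewards.getCustomerId(), rewards.getTotalRewardPoints()); // step 4: save the new total
        return rewards;
    }

    @Override
    public void close() {
        // nothing to clean up in this sketch
    }
}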
  • 121. 18CSE489T - STREAMING ANALYTICS UNIT-2 Session-9 SLO 1 – Example Application Development with Kafka Streams SRM Institute of Science and Technology, Ramapuram 1
  • 122. Example Application Development  Word Count  Let’s walk through an abbreviated word count example for Kafka Streams. You can find the full example on GitHub.  The first thing you do when creating a stream-processing app is configure Kafka Streams. Kafka Streams has a large number of possible configurations, which we won’t discuss here, but you can find them in the documentation. In addition, you can also configure the producer and consumer embedded in Kafka Streams by adding any producer or consumer config to the Properties object: SRM Institute of Science and Technology, Ramapuram 2
  • 123. Example Application Development public class WordCountExample { public static void main(String[] args) throws Exception{ Properties props = new Properties(); props.put(StreamsConfig.APPLICATION_ID_CONFIG, "wordcount"); props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); props.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName()); props.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName()); SRM Institute of Science and Technology, Ramapuram 3
  • 124. Example Application Development  Every Kafka Streams application must have an application ID. This is used to coordinate the instances of the application and also when naming the internal local stores and the topics related to them. This name must be unique for each Kafka Streams application working with the same Kafka cluster.  The Kafka Streams application always reads data from Kafka topics and writes its output to Kafka topics. As we’ll discuss later, Kafka Streams applications also use Kafka for coordination. So we had better tell our app where to find Kafka.  When reading and writing data, our app will need to serialize and deserialize, so we provide default Serde classes. If needed, we can override these defaults later when building the streams topology. SRM Institute of Science and Technology, Ramapuram 4
  • 125. Example Application Development Now that we have the configuration, let’s build our streams topology: KStreamBuilder builder = new KStreamBuilder(); KStream<String, String> source = builder.stream("wordcount-input"); final Pattern pattern = Pattern.compile("\\W+"); KStream counts = source.flatMapValues(value -> Arrays.asList(pattern.split(value.toLowerCase()))) .map((key, value) -> new KeyValue<Object, Object>(value, value)) .filter((key, value) -> (!value.equals("the"))).groupByKey() .count("CountStore").mapValues(value -> Long.toString(value)).toStream(); counts.to("wordcount-output"); SRM Institute of Science and Technology, Ramapuram 5
  • 126. Example Application Development  We create a KStreamBuilder object and start defining a stream by pointing at the topic we’ll use as our input.  Each event we read from the source topic is a line of words; we split it up using a regular expression into a series of individual words. Then we take each word (currently a value of the event record) and put it in the event record key so it can be used in a group-by operation.  We filter out the word “the,” just to show how easy filtering is.  And we group by key, so we now have a collection of events for each unique word. SRM Institute of Science and Technology, Ramapuram 6
  • 127. Example Application Development  We count how many events we have in each collection. The result of counting is a Long data type. We convert it to a String so it will be easier for humans to read the results.  Only one thing left–write the results back to Kafka.  Now that we have defined the flow of transformations that our application will run, we just need to… run it: KafkaStreams streams = new KafkaStreams(builder, props); streams.start(); Thread.sleep(5000L); streams.close(); } } SRM Institute of Science and Technology, Ramapuram 7
  • 128. Example Application Development  Define a KafkaStreams object based on our topology and the properties we defined.  Start Kafka Streams.  After a while, stop it.  That’s it! In just a few short lines, we demonstrated how easy it is to implement a single event processing pattern (we applied a map and a filter on the events). We repartitioned the data by adding a group-by operator, and then maintained simple local state when we counted the number of times each word appeared.  At this point, we recommend running the full example. The README in the GitHub repository contains instructions on how to run the example. SRM Institute of Science and Technology, Ramapuram 8
  • 129. Example Application Development  One thing you’ll notice is that you can run the entire example on your machine without installing anything except Apache Kafka. This is similar to the experience you may have seen when using Spark in local mode. The main difference is that if your input topic contains multiple partitions, you can run multiple instances of the WordCount application (just run the app in several different terminal tabs) and you have your first Kafka Streams processing cluster. The instances of the WordCount application talk to each other and coordinate the work. One of the biggest barriers to entry with Spark is that local mode is very easy to use, but then to run a production cluster, you need to install YARN or Mesos and then install Spark on all those machines, and then learn how to submit your app to the cluster. With the Kafka Streams API, you just start multiple instances of your app—and you have a cluster.  The exact same app is running on your development machine and in production. SRM Institute of Science and Technology, Ramapuram 9
  • 130. 18CSE489T - STREAMING ANALYTICS UNIT-2 Session-9 SLO 2 – Demo – Kafka Streams SRM Institute of Science and Technology, Ramapuram 1
  • 131. Demo – Kafka Streams  Stock Market Statistics  The next example is more involved—we will read a stream of stock market trading events that include the stock ticker, ask price, and ask size. In stock market trades, the ask price is what a seller is asking for, whereas the bid price is what a buyer is offering to pay. Ask size is the number of shares the seller is willing to sell at that price. For simplicity of the example, we’ll ignore bids completely. We also won’t include a timestamp in our data; instead, we’ll rely on event time populated by our Kafka producer.  We will then create output streams that contain a few windowed statistics:  • Best (i.e., minimum) ask price for every five-second window  • Number of trades for every five-second window  • Average ask price for every five-second window  All statistics will be updated every second. SRM Institute of Science and Technology, Ramapuram 2
  • 132. Demo – Kafka Streams  For simplicity, we’ll assume our exchange only has 10 stock tickers trading in it. The setup and configuration are very similar to those we used in the earlier “Word Count” example: Properties props = new Properties(); props.put(StreamsConfig.APPLICATION_ID_CONFIG, "stockstat"); props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, Constants.BROKER); props.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName()); props.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG, TradeSerde.class.getName()); SRM Institute of Science and Technology, Ramapuram 3
  • 133. Demo – Kafka Streams  The main difference is the Serde classes used. In the “Word Count”, we used strings for both key and value and therefore used the Serdes.String() class as a serializer and deserializer for both. In this example, the key is still a string, but the value is a Trade object that contains the ticker symbol, ask price, and ask size.  In order to serialize and deserialize this object (and a few other objects we used in this small app), we used the Gson library from Google to generate a JSON serializer and deserializer for our Java objects. We then created a small wrapper that builds a Serde object from those. Here is how we created the Serde: static public final class TradeSerde extends WrapperSerde<Trade> { public TradeSerde() { super(new JsonSerializer<Trade>(), new JsonDeserializer<Trade>(Trade.class)); }} SRM Institute of Science and Technology, Ramapuram 4
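A sketch of what such a generic wrapper might look like follows; the exact implementation in the book’s repository may differ, so treat this as an illustration of the idea rather than the original code:

import java.util.Map;
import org.apache.kafka.common.serialization.Deserializer;
import org.apache.kafka.common.serialization.Serde;
import org.apache.kafka.common.serialization.Serializer;

public class WrapperSerde<T> implements Serde<T> {
    private final Serializer<T> serializer;
    private final Deserializer<T> deserializer;

    public WrapperSerde(Serializer<T> serializer, Deserializer<T> deserializer) {
        this.serializer = serializer;
        this.deserializer = deserializer;
    }

    @Override
    public void configure(Map<String, ?> configs, boolean isKey) { }

    @Override
    public void close() { }

    @Override
    public Serializer<T> serializer() { return serializer; }

    @Override
    public Deserializer<T> deserializer() { return deserializer; }
}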
  • 134. Demo – Kafka Streams  Nothing fancy, but you need to remember to provide a Serde object for every object you want to store in Kafka—input, output, and in some cases, also intermediate results. To make this easier, we recommend generating these Serdes through projects like Gson, Avro, Protobuf, or similar. Now that we have everything configured, it’s time to build our topology: KStream<TickerWindow, TradeStats> stats = source.groupByKey() .aggregate(TradeStats::new, (k, v, tradestats) -> tradestats.add(v), TimeWindows.of(5000).advanceBy(1000), new TradeStatsSerde(), "trade-stats-store") .toStream((key, value) -> new TickerWindow(key.key(), key.window().start())).mapValues((trade) -> trade.computeAvgPrice()); stats.to(new TickerWindowSerde(), new TradeStatsSerde(), "stockstats-output"); SRM Institute of Science and Technology, Ramapuram 5
  • 135. Demo – Kafka Streams  We start by reading events from the input topic and performing a groupByKey() operation. Despite its name, this operation does not do any grouping. Rather, it ensures that the stream of events is partitioned based on the record key. Since we wrote the data into a topic with a key and didn’t modify the key before calling groupByKey(), the data is still partitioned by its key—so this method does nothing in this case.  After we ensure correct partitioning, we start the windowed aggregation. The “aggregate” method will split the stream into overlapping windows (a five-second window every second), and then apply an aggregate method on all the events in the window. The first parameter this method takes is a new object that will contain the results of the aggregation—TradeStats in our case. This is an object we created to contain all the statistics we are interested in for each time window— minimum price, average price, and number of trades. SRM Institute of Science and Technology, Ramapuram 6
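To ground the aggregate call above, here is a sketch of what the TradeStats accumulator might look like; the Trade accessors (getTicker(), getAskPrice()) and the field names are assumptions for illustration:

public class TradeStats {
    String ticker;
    int countTrades;      // number of trades in the window
    double sumPrice;      // running sum of ask prices, used for the average
    double minPrice;      // best (minimum) ask price seen in the window
    double avgPrice;

    public TradeStats add(Trade trade) {
        // fold one trade into the window's running statistics
        this.ticker = trade.getTicker();
        if (countTrades == 0 || trade.getAskPrice() < minPrice) {
            this.minPrice = trade.getAskPrice();
        }
        this.countTrades++;
        this.sumPrice += trade.getAskPrice();
        return this;
    }

    public TradeStats computeAvgPrice() {
        this.avgPrice = this.sumPrice / this.countTrades;
        return this;
    }
}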