KAFKA
Introduction:
Kafka is a distributed publish-subscribe messaging system that is designed to be
fast, scalable, and durable.
• Kafka maintains feeds of messages in categories called topics.
• Messages are published to Kafka by processes called producers.
• Processes that subscribe to topics and process the feed of published messages are called consumers.
• Kafka runs as a cluster of one or more servers, each of which is called a broker.
Quick Start:
1. Create a topic
/usr/bin/kafka-topics --create --zookeeper zookeeperIP:2181 --replication-factor 1 --partitions 1 --topic testTopic
2. Publish a message via a producer to the topic
/usr/bin/kafka-console-producer --broker-list producerIP:9092 --topic testTopic
This is the first kafka message
3. Start a consumer
/usr/bin/kafka-console-consumer --zookeeper zookeeperIP:2181 --topic testTopic --from-beginning
If you start several consumers in separate PuTTY sessions, you can see messages being delivered to all of the consumers as soon as the producer publishes them.
A bit more detail:
A topic is a category or feed name to which messages are published. For each topic, the Kafka cluster maintains a partitioned log that looks like the following.
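Roughly, a topic with three partitions can be pictured like this (writes are appended on the right; the numbers are offsets within each partition):

Partition 0:  0 | 1 | 2 | 3 | 4 | 5 |  <- next write
Partition 1:  0 | 1 | 2 | 3 |          <- next write
Partition 2:  0 | 1 | 2 | 3 | 4 |      <- next write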
Each partition is an ordered, immutable sequence of messages that is continually appended to, like a commit log. Each message in a partition is assigned a sequential id number called the offset, which uniquely identifies the message within the partition.
The Kafka cluster retains all published messages, whether or not they have been consumed, for a configurable period of time. Log retention can be configured in two ways: time-based (e.g. a number of days) or size-based. Kafka's performance is effectively constant with respect to data size, so retaining a lot of data is not a problem.
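For illustration, both kinds of retention are set per broker in server.properties (property names as in the classic Kafka releases):

# Time-based retention: keep log segments for 7 days
log.retention.hours=168
# Size-based retention: cap each partition's log at ~1 GB (-1 disables the size limit)
log.retention.bytes=1073741824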
Q&A:
1. What type of messages can be sent from a producer?
The producer class takes two generic parameters:
Producer<K, V>
V: type of the message
K: type of the optional key associated with the message
So any kind of message can be sent, for example String, JSON, or Avro.
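As a minimal sketch (the topic name, key, and JSON payload here are made up), a keyed producer where both K and V are String, using the same 0.8-era API as the sample producer below:

import java.util.Properties;
import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

Properties props = new Properties();
props.put("metadata.broker.list", "brokerIP:9092");
props.put("serializer.class", "kafka.serializer.StringEncoder");     // V encoded as a String
props.put("key.serializer.class", "kafka.serializer.StringEncoder"); // K encoded as a String

Producer<String, String> producer = new Producer<String, String>(new ProducerConfig(props));

// K = optional key (used for partitioning), V = the message itself, here a JSON string
String key = "truck-3";
String json = "{\"truckId\": \"3\", \"event\": \"Normal\"}";
producer.send(new KeyedMessage<String, String>("testTopic", key, json));
producer.close();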
2. How can a consumer start reading from a particular offset?
Kafka does not keep track of the offset up to which a particular consumer has already read. The consumer has to manage that offset on its own side: the offset up to which it has consumed messages has to be stored elsewhere, e.g. in HDFS, a database, or HBase.
Kafka itself only provides two starting points: from the beginning or from the latest offset.
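A minimal sketch of such external offset tracking, assuming a local file as the store (HDFS or HBase would follow the same pattern); the fetch loop in the consumer example below would initialize readOffset from load() and call save(readOffset) after each batch:

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class OffsetCheckpoint {
    private final Path file;

    public OffsetCheckpoint(String fileName) {
        this.file = Paths.get(fileName);
    }

    // Called on startup: resume from the stored offset, or 0 if nothing was stored yet
    public long load() throws IOException {
        if (!Files.exists(file)) return 0L;
        return Long.parseLong(new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim());
    }

    // Called after successfully processing messages up to (but not including) nextOffset
    public void save(long nextOffset) throws IOException {
        Files.write(file, Long.toString(nextOffset).getBytes(StandardCharsets.UTF_8));
    }
}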
3. So when to use Kafka?
Cloudera recommends using Kafka if the data will be consumed by multiple applications.
API Examples:
• A sample Producer
import java.sql.Timestamp;
import java.util.Date;
import java.util.Properties;
import java.util.Random;
import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

Properties props = new Properties();
props.put("metadata.broker.list", args[0]);
props.put("zk.connect", args[1]);
props.put("serializer.class", "kafka.serializer.StringEncoder");
props.put("request.required.acks", "1");

String TOPIC = "event";
ProducerConfig config = new ProducerConfig(props);
Producer<String, String> producer = new Producer<String, String>(config);

String[] events = {"Normal", "Normal", "Normal", …}; // remaining event types elided in the original
String[] truckIds = {"1", "2", "3", "4"};
String[] driverIds = {"11", "12", "13", "14"};
Random random = new Random();  // not declared in the original slide
int evtCnt = events.length;    // not declared in the original slide

// Pipe-delimited message: timestamp|truckId|driverId|event
String message = new Timestamp(new Date().getTime()) + "|"
        + truckIds[2] + "|" + driverIds[2] + "|" + events[random.nextInt(evtCnt)];

try {
    KeyedMessage<String, String> data = new KeyedMessage<String, String>(TOPIC, message);
    producer.send(data);
    Thread.sleep(1000);
} catch (Exception e) {
    e.printStackTrace();
}
• A sample Consumer
Kafka provides a SimpleConsumer which can be modified as per requirements.
Steps for using a SimpleConsumer:
• Find an active broker and find out which broker is the leader for your topic and partition (a sketch of this lookup follows the list)
• Determine which brokers are the replicas for your topic and partition
• Build the request defining what data you are interested in
• Fetch the data
• Identify and recover from leader changes
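A minimal sketch of the first two steps, using the 0.8-era SimpleConsumer metadata API (the broker address, topic name, and partition number are placeholders):

import java.util.Collections;
import kafka.javaapi.PartitionMetadata;
import kafka.javaapi.TopicMetadata;
import kafka.javaapi.TopicMetadataRequest;
import kafka.javaapi.TopicMetadataResponse;
import kafka.javaapi.consumer.SimpleConsumer;

// Ask one known broker for cluster metadata about the topic
SimpleConsumer metadataConsumer =
        new SimpleConsumer("brokerIP", 9092, 100000, 64 * 1024, "leaderLookup");
TopicMetadataRequest request =
        new TopicMetadataRequest(Collections.singletonList("testTopic"));
TopicMetadataResponse response = metadataConsumer.send(request);

// Scan the metadata for our partition's current leader and its replicas
for (TopicMetadata topicMeta : response.topicsMetadata()) {
    for (PartitionMetadata partMeta : topicMeta.partitionsMetadata()) {
        if (partMeta.partitionId() == 0) { // a_partition
            System.out.println("Leader:   " + partMeta.leader().host());
            System.out.println("Replicas: " + partMeta.replicas());
        }
    }
}
metadataConsumer.close();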
Data fetch pseudo code (consumer, clientName, a_topic, a_partition, readOffset, numRead, and a_maxReads are assumed to be initialized earlier):

import java.nio.ByteBuffer;
import kafka.api.FetchRequest;
import kafka.api.FetchRequestBuilder;
import kafka.javaapi.FetchResponse;
import kafka.message.MessageAndOffset;

FetchRequest req = new FetchRequestBuilder()
        .clientId(clientName)
        .addFetch(a_topic, a_partition, readOffset, 100000) // fetch up to ~100 KB from readOffset
        .build();
FetchResponse fetchResponse = consumer.fetch(req);

if (fetchResponse.hasError()) {
    // Error handling code here (e.g. find the new leader after a leader change)
}

for (MessageAndOffset messageAndOffset : fetchResponse.messageSet(a_topic, a_partition)) {
    long currentOffset = messageAndOffset.offset();
    // Compressed message sets can return messages below the requested offset; skip them
    if (currentOffset < readOffset) {
        // Proper logger here
        continue;
    }
    readOffset = messageAndOffset.nextOffset();
    ByteBuffer payload = messageAndOffset.message().payload();
    byte[] bytes = new byte[payload.limit()];
    payload.get(bytes);
    System.out.println(String.valueOf(messageAndOffset.offset()) + ": " + new String(bytes, "UTF-8"));
    numRead++;
    a_maxReads--;
}
Conclusion:
As you can see, Kafka has a unique design that makes it very useful for solving a wide range of architectural challenges. It is important to pick the right approach for your use case and to use it correctly, to ensure high throughput, low latency, high availability, and no data loss.