APACHE KAFKA DEMYSTIFIED
Shanki Singh Gandhi
@shankisg
OVERVIEW
Apache Kafka is an open-source stream-processing platform developed by
the Apache Software Foundation and written in Scala and Java. The project
aims to provide a unified, high-throughput, low-latency platform for handling
real-time data feeds.
KEY POINTS
 Kafka is run as a cluster on one or more servers.
 The Kafka cluster stores streams of records in categories called topics.
 Each record consists of a key, a value, and a timestamp.
CONCEPTS
 Producer: Application that sends the messages.
 Consumer: Application that receives the messages.
 Message: Information that is sent from the producer to a consumer through Apache Kafka.
 Connection: A connection is a TCP connection between your application and the Kafka broker.
 Topic: A Topic is a category/feed name to which messages are stored and published.
 Topic partition: Kafka topics are divided into a number of partitions, which allows you to split data across
multiple brokers.
 Replica: A replica of a partition is a "backup" of a partition. Follower replicas do not serve client reads or
writes; they exist to prevent data loss.
 Consumer Group: A consumer group includes the set of consumer processes that are subscribing to a
specific topic.
 Offset: The offset is a unique identifier of a record within a partition. It denotes the position of the consumer
in the partition.
 Node: A node is a single computer in the Apache Kafka cluster.
 Cluster: A cluster is a group of nodes i.e., a group of computers.
KAFKA ARCHITECTURE
KAFKA TOPIC
 Topic is a category or feed name to which records are published.
 Topics in Kafka are always multi-subscriber; that is, a topic can have zero, one, or many consumers that
subscribe to the data written to it.
 Each partition is an ordered, immutable sequence of records that is continually appended to a structured
commit log.
 The records in the partitions are each assigned a sequential id number called the offset that uniquely
identifies each record within the partition.
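The offset-per-partition idea can be sketched in plain Python. This is an illustrative model, not Kafka's implementation: a partition behaves like an append-only list, and a record's offset is simply its position in that list.

```python
# Illustrative model of one partition: an append-only commit log
# where each record's offset is its sequential position in the log.
partition = []

def append(record):
    offset = len(partition)   # next sequential id in this partition
    partition.append(record)
    return offset

first = append({"value": "a"})   # offset 0
second = append({"value": "b"})  # offset 1
```

Because the log is append-only and immutable, a consumer's position in the partition is fully described by a single integer offset.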
PARTITION AND BROKER
KAFKA APIS
• Producer API
• Consumer API
• Streams API
• Connector API
PRODUCER
 Producers publish data to the topics of their choice.
 Producers write to the leader of a partition; this load-balances production so that
each write can be serviced by a separate broker and machine.
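How a producer picks a partition can be sketched as follows. This is an illustrative hash-mod scheme (Kafka's default partitioner actually uses murmur2 hashing of the key bytes), but the principle is the same: records with the same key always land on the same partition, preserving per-key ordering.

```python
NUM_PARTITIONS = 3

def pick_partition(key: bytes, num_partitions: int = NUM_PARTITIONS) -> int:
    # Illustrative only: hash the key bytes and take it modulo the
    # partition count. Kafka's default partitioner uses murmur2.
    return sum(key) % num_partitions

# The same key always maps to the same partition.
p1 = pick_partition(b"user-42")
p2 = pick_partition(b"user-42")
```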
CONSUMERS AND CONSUMER GROUPS
 Consumers label themselves with a consumer group name, and each record published to a topic is
delivered to one consumer instance within each subscribing consumer group. Consumer instances
can be in separate processes or on separate machines.
 If all the consumer instances have the same consumer group, then the records will effectively be
load balanced over the consumer instances.
 If all the consumer instances have different consumer groups, then each record will be broadcast to
all the consumer processes.
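The load-balancing behavior within a group can be sketched with a simple round-robin assignment. This is illustrative only; in real Kafka the group coordinator runs a configurable assignor (range or round-robin strategies), but the invariant is the same: each partition is consumed by exactly one member of the group.

```python
def assign(partitions, consumers):
    # Each partition goes to exactly one consumer in the group,
    # so records are load-balanced across the group's members.
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

groups = assign([0, 1, 2, 3], ["consumer-a", "consumer-b"])
```

With two groups subscribed to the same topic, each group gets its own full copy of this assignment, which is why records are broadcast across groups but load-balanced within one.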
START KAFKA SERVER
 Download Kafka 1.0.0 (kafka_2.11-1.0.0.tgz) from the Apache Kafka downloads page
 Extract the code
tar -xzf kafka_2.11-1.0.0.tgz
cd kafka_2.11-1.0.0
 Start zookeeper
bin/zookeeper-server-start.sh config/zookeeper.properties
 Start kafka server
bin/kafka-server-start.sh config/server.properties
BASIC KAFKA CLI COMMANDS
Create topic
 bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
List topics
 bin/kafka-topics.sh --list --zookeeper localhost:2181
Start producer
 bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
Start consumer
 bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning
CONNECTING KAFKA FROM PYTHON
 kafka-python
 Install kafka-python: pip install kafka-python
 Github: https://github.com/dpkp/kafka-python
 Documentation: https://kafka-python.readthedocs.io/en/master/index.html
PRODUCER SAMPLE CODE
import json
from kafka import KafkaProducer
# Send json data to a kafka topic
producer = KafkaProducer(value_serializer=json.dumps, bootstrap_servers=[kafka_url])
data = {key: value}
producer.send(“my-topic”, data )
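kafka-python transmits values as raw bytes on the wire, so a JSON value serializer should ultimately produce bytes. The dict-to-bytes step can be checked standalone, without a running broker:

```python
import json

# A JSON value serializer: dict -> JSON string -> UTF-8 bytes.
# This is the shape kafka-python expects value_serializer to have.
def serialize(value):
    return json.dumps(value).encode("utf-8")

payload = serialize({"user": "alice", "clicks": 3})
```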
CONSUMER SAMPLE CODE
from kafka import KafkaConsumer
# Connecting to kakfa and subscribing to a topic
consumer = KafkaConsumer(“my-topic”, group_id=“my-group”, bootstrap_servers=[kafka_url])
# Start consuming data
for msg in consumer:
print msg
IMPORTANT LINKS
 https://kafka.apache.org/intro
 https://kafka.apache.org/quickstart
 https://www.cloudkarafka.com/blog/2016-11-30-part1-kafka-for-beginners-what-is-apache-kafka.html
 http://blog.cloudera.com/blog/2014/09/apache-kafka-for-beginners/
 https://kafka-python.readthedocs.io/en/master/usage.html
DEMO
Q & A
THANKS
