Introduction to Apache Kafka

Traditional Messaging
● Java Message Service (JMS)
○ Apache ActiveMQ, IBM WebSphere MQ, HornetQ, FioranoMQ

● Advanced Message Queuing Protocol (AMQP)
○ RabbitMQ

● Message Queuing Telemetry Transport (MQTT)
○ RabbitMQ
○ HiveMQ
A very famous question
https://www.quora.com/What-are-the-differences-between-Apache-Kafka-and-RabbitMQ

“Performance-wise, both are excellent performers, but have major architectural differences.”
-- from the Quora discussion

What’s the Diff ?
● Traditional messaging: server pushes/delivers messages to the subscriber
○ Server does a lot of work in memory
● Kafka: subscriber pulls/picks up messages from the server
○ Not much in-memory work for the server; it just stores messages

What’s the Diff ?
○ Traditional: server stores each message and its state (delivered, etc.)
○ Kafka: server just stores the message. It doesn't care whether the message has been picked up or not

What’s the Diff ?
○ Traditional: server maintains the order of messages
○ Kafka: ordering logic is dictated by the client and the storage format

What’s the Diff ?
● Traditional: hence mostly an 'online' processing model
● Kafka: hence mostly an 'offline' processing model

What’s the Diff ?
● Traditional: server can do complex routing logic.
● Kafka: client maintains routing logic. Server is blind to it.
○ Also, the subscriber stores state, i.e. which messages it has picked up.

Apache Kafka
Notions:
● Publisher
● Message
● Topic
○ Topic Partition
● Broker
● Subscriber/Consumer
● Message Offset

Summary
● Publisher chooses a topic to publish onto.
○ It also decides the routing logic, i.e. chooses which partition to publish onto (using a partitioning key).
● Broker receives the message and appends it to the end of the topic partition.
● Subscriber requests the broker for messages at a specific offset in a topic partition.
● It is up to the subscriber to remember which message offset it has processed.
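The flow in the summary can be sketched with a toy in-memory model. This is only an illustration, not Kafka's client API: the `Broker` class, `choose_partition` helper, topic name and key are all hypothetical, and CRC32 is just a stand-in for a real partitioner.

```python
import zlib


class Broker:
    """Toy in-memory broker: each topic partition is an append-only list."""

    def __init__(self, partitions_per_topic):
        self.partitions_per_topic = partitions_per_topic
        self.logs = {}  # (topic, partition) -> list of messages

    def append(self, topic, partition, message):
        # The broker only appends to the end of the partition.
        log = self.logs.setdefault((topic, partition), [])
        log.append(message)
        return len(log) - 1  # offset of the message just written

    def fetch(self, topic, partition, offset, max_messages=10):
        # The broker keeps no per-subscriber state; it just serves a slice.
        log = self.logs.get((topic, partition), [])
        return log[offset:offset + max_messages]


def choose_partition(key, num_partitions):
    # Publisher-side routing: a stable hash of the key picks the partition.
    # CRC32 is an assumption for this sketch, not Kafka's actual partitioner.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


broker = Broker(partitions_per_topic=3)

# Publisher: choose topic and partition, then hand the message to the broker.
partition = choose_partition("user-42", broker.partitions_per_topic)
for event in ["login", "view", "logout"]:
    broker.append("clicks", partition, event)

# Subscriber: ask for messages at an offset and remember progress itself.
next_offset = 0
batch = broker.fetch("clicks", partition, next_offset)
next_offset += len(batch)
```

Note that the only state the subscriber needs is `next_offset`; the broker never learns whether a message was processed.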

A lovely use case - REPLAY
● Since Subscriber requests for a message at an offset in a topic partition, the subscriber is free to REPLAY the processing at any point in time.
● Handy when outages occur.
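Replay follows directly from the subscriber owning its offset. A minimal sketch, assuming a partition is modelled as a plain list whose index plays the role of the offset (the `consume` helper and the message names are hypothetical):

```python
# Toy partition log: the list index plays the role of the message offset.
partition_log = ["order-1", "order-2", "order-3", "order-4"]


def consume(log, start_offset, handler):
    """Process every message from start_offset onward; return the next offset."""
    offset = start_offset
    while offset < len(log):
        handler(log[offset])
        offset += 1
    return offset


# Normal run: process the whole partition from the beginning.
processed = []
committed_offset = consume(partition_log, 0, processed.append)

# Suppose an outage spoiled the results for offsets 2 and 3:
# the subscriber rewinds its own offset and processes them again.
replayed = []
committed_offset = consume(partition_log, 2, replayed.append)
```

Nothing changes on the log side during the replay; the messages are simply read a second time, which works only as long as they are still within the retention period.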

Things to Ponder about
● How do I achieve high read/write throughput?
○ Have more partitions per topic. Partitions are the unit of parallelism, so their number determines read/write throughput.
● Can multiple publishers publish concurrently to the same topic partition?
○ Yes
● Should multiple consumers read from the same topic partition?
○ Ideally one consumer per partition within a consumer group. Separate consumer groups can each read the same partition independently.
● What about replication of data?
○ While creating a topic, you can set a replication factor, which applies to each topic partition.
● What about the data retention time policy?
○ Set it while creating the topic. You can edit it later on.
● Think about the producer partitioning key ...
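For the question about consumers and partitions, here is a rough sketch of how partitions might be spread over the consumers of one group. It uses a simple round-robin rule as an assumption; it is not Kafka's actual assignment strategy, and `assign_partitions` and the consumer names are hypothetical.

```python
def assign_partitions(num_partitions, consumers):
    """Spread partitions over the consumers of one group, round-robin."""
    assignment = {consumer: [] for consumer in consumers}
    for partition in range(num_partitions):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# Fewer consumers than partitions: each consumer reads several partitions.
balanced = assign_partitions(4, ["c1", "c2"])

# More consumers than partitions: the extra consumer has nothing to read.
oversubscribed = assign_partitions(2, ["c1", "c2", "c3"])
```

Each partition ends up with exactly one owner inside the group, which is why adding consumers beyond the partition count does not add read throughput.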

Others ...
● Amazon Kinesis is similar to Kafka.
● You have Redis Pub/Sub (different guarantees, not similar to Kafka)

What I did not cover :)
● Kafka replication mechanism
○ ISR = in-sync replica set
● Tools like Kafka MirrorMaker
● ZooKeeper interaction (yes, Kafka depends on ZooKeeper)

What’s new in Kafka ?
● Kafka Streams API
● Kafka SQL (KSQL)
● See release notes … :)

producer.send(“ Any Questions ? Thanks ”)
