The document provides an overview of Kafka, including its problem statement, use cases, key terminology, architecture, and components. It defines topics as streams of data that are split into partitions, with each message addressed by a unique offset within its partition. Producers write data to brokers, which replicate partitions across the cluster for fault tolerance. Consumers read data from partitions as part of a consumer group, and Zookeeper manages the metadata. The presentation closes with an IT-team analogy: brokers act as developers, topics as modules, and partitions as tasks.
Overview of Kafka presentation, including problem statement, use cases, key terminology like Topic, Partition, and components such as Broker and Zookeeper.
Describes the problem statement: the proliferation of integrations between many source and target systems, and the need to manage high-velocity data streams.
Overview of Kafka use cases including tracking user activity, log aggregation, system de-coupling, and stream processing.
Highlights Kafka's capability to scale, handle millions of messages per second, and its characteristics of being fast, fault-tolerant, and horizontally scalable.
Definition of Topics and their structure, explaining how data is split into partitions and retrieved through unique offsets.
Analogy between Kafka Topics and NoSQL tables, detailing how messages are structured with keys and offsets.
Explains partitioning in Kafka, how it enforces distribution, parallelism, and message order guarantees within partitions.
Further explanation on offsets and their role in partitioning for message distribution, highlighting the importance of keys.
Overview of Kafka architecture including Zookeeper, Kafka nodes, brokers, producers, and consumer groups.
Details Zookeeper's function as a central service for managing distributed systems, its role in maintaining high availability.
Describes the role and structure of Kafka Brokers, their distribution of topics and management by Zookeeper.
Explanation of replication processes in Kafka, how it enables fault tolerance, and the roles of leaders and followers.
Analogy comparing replication in Kafka to a project manager-developer scenario, illustrating oversight and backups.
Discusses follower roles in replication, emphasizing their need to maintain state with the leader.
Describes how leader elections work within replication in Kafka, highlighting governance of topic partitions.
Continues the explanation around leader elections in replication, emphasizing key roles in data management.
Hierarchical breakdown of components in Kafka, indicating the structure of brokers, topics, and partitions.
Presents an analogy between an IT team’s structure and the Kafka ecosystem, defining roles like leaders and followers.
Summarizes the relationships between various components of Kafka including topics, partitions, and replicas.
Invitation for a demo, showcasing Kafka functionalities and operations.
Provides a link to read further technical articles on Kafka for additional insights.
Final thank you slide, concluding the presentation.
Problem statement
• Many source and target system integrations
• High velocity data streams
[Diagram: many source systems wired point-to-point to many target systems]
Use cases
• Tracking user activity
• Log aggregation
• De-coupling systems
• Stream processing
[Diagram: the same sources and targets, now connected through Kafka]
Kafka
• Scales to 100s of nodes
• Handles millions of messages per second
• Real-time processing (~10 ms)
Kafka is a horizontally scalable, fault-tolerant and fast messaging system.
[Diagram: producers writing to Kafka, consumers reading from it]
Topic
• Stream of data
• Similar to a table in a NoSQL database
• Split into partitions
• Data is retrieved through an offset
• Offsets are unique per partition per topic
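The bullets above can be sketched as a tiny in-memory model (an illustration only, not the Kafka API): a topic is a set of partition logs, and an offset identifies a message only within its partition.

```python
# Toy model of a Kafka topic: data is split into partitions, and each
# appended message gets an offset that is unique only within its
# partition, not across the whole topic.
class Topic:
    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, partition, message):
        """Append a message; its offset is its position in that partition."""
        self.partitions[partition].append(message)
        return len(self.partitions[partition]) - 1  # the offset

    def read(self, partition, offset):
        """Consumers retrieve data by (partition, offset)."""
        return self.partitions[partition][offset]

topic_a = Topic("Topic A", num_partitions=3)
print(topic_a.append(0, "m1"))  # offset 0 in partition 0
print(topic_a.append(1, "m2"))  # offset 0 again, but in partition 1
print(topic_a.append(0, "m3"))  # offset 1 in partition 0
```

Note how "m1" and "m2" share offset 0: the offset only means something together with a partition number.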
Kafka cluster: Broker 1, Broker 2, Broker 3
• Topic A – 3 partitions
• Topic B – 2 partitions
[Diagram: Topic A's partitions 0–2 and Topic B's partitions 0–1 distributed across the three brokers]
Partition
• Enables a topic to be distributed
• Unit of parallelism
• Usually one topic, many partitions
• Order is guaranteed only within a partition
• Messages are immutable
[Diagram: producers writing to Topic A; partitions 0, 1 and 2 each hold a different number of messages, with offsets numbered independently per partition]
Partition – offset & key
• Key
  Messages are written to a partition based on their key
  No key – round-robin
  Keys are important to avoid hotspots
• Offsets
  Incremental unique id per partition
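A minimal sketch of the routing rule described above, under the assumption of a simple `hash(key) % num_partitions` mapping (the real Kafka client uses a murmur2 hash, but the shape of the behaviour is the same): keyed messages always land on the same partition, keyless ones are spread round-robin.

```python
import itertools

def make_partitioner(num_partitions):
    """Sketch of Kafka's default routing: key -> fixed partition,
    no key -> round-robin across partitions."""
    rr = itertools.cycle(range(num_partitions))

    def partition_for(key):
        if key is None:
            return next(rr)                    # no key: round-robin
        return hash(key) % num_partitions      # same key: same partition

    return partition_for

p = make_partitioner(3)
assert p("user-1") == p("user-1")  # a key always maps to one partition
```

Because all messages for one key share a partition, their relative order is preserved; a badly skewed key (e.g. one huge customer) creates exactly the hotspot the slide warns about.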
[Diagram: a three-node Zookeeper ensemble – Zookeeper 1 as leader, Zookeepers 0 and 2 as followers – holding all metadata for Brokers 0, 1, 2, 3 and 5; writes go only through the leader]
Zookeeper
• Hierarchical key-value store
• Configuration, synchronization and name registry services
• Ensemble layer
• Ties things together
• Ensures high availability
• Odd number of nodes
• More than 7 nodes not recommended
• Kafka can't work without Zookeeper
• Stores metadata
• Leader & follower nodes
• All writes go only through the leader node
• From Kafka 0.10, offsets are no longer managed by Zookeeper
• Acts like a project manager (analogy)
Zookeeper is a centralized service for managing distributed systems.
[Diagram: producers and consumers connected to a Kafka cluster; Broker 1 holds Topic A partitions 0 and 1, Broker 2 holds partitions 2 and 3, each partition with its own run of message offsets]
Broker
• A single Kafka node
• Managed by Zookeeper
• Topics are distributed across brokers based on partitions and replication
• Acts like a developer (analogy)
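To illustrate the distribution idea, here is a hedged round-robin sketch (`assign_partitions` is a made-up helper; Kafka's actual replica-placement logic is more involved): each partition and its replicas are spread over the available brokers.

```python
def assign_partitions(brokers, num_partitions, replication_factor):
    """Toy round-robin placement: partition p's replicas go on
    brokers p, p+1, ... (mod broker count). The first replica in
    each list can be thought of as the leader."""
    assignment = {}
    for p in range(num_partitions):
        assignment[p] = [brokers[(p + r) % len(brokers)]
                         for r in range(replication_factor)]
    return assignment

print(assign_partitions(["broker1", "broker2", "broker3"],
                        num_partitions=3, replication_factor=2))
```

With 3 brokers, 3 partitions and replication factor 2, every broker ends up leading one partition and following another, which is the load-spreading the diagram above shows.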
• Topic B – 2 partitions
• Replication factor of 2
[Diagram: Topic B's two partitions replicated across Brokers 1–3; each partition has one leader replica and one follower replica, and the producer and consumer group talk only to the leaders]
Replication
• Copy of a partition on another broker
• Enables fault tolerance
• Follower partitions replicate from the leader
• Only the leader serves producers and consumers
• ISR – In-Sync Replica
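The leader/follower and ISR bullets can be modelled in a few lines (a toy simulation, not Kafka internals): producers append only to the leader's log, followers copy it, and the ISR contains the leader plus every follower whose log has caught up.

```python
# Toy model of leader/follower replication and the in-sync replica set.
class Replica:
    def __init__(self, broker_id):
        self.broker_id = broker_id
        self.log = []

leader = Replica(1)
followers = [Replica(2), Replica(3)]

def produce(message):
    leader.log.append(message)        # producers talk only to the leader

def replicate(follower):
    follower.log = list(leader.log)   # follower fetches the leader's log

def isr():
    """ISR = leader plus every follower whose log matches the leader's."""
    return [leader] + [f for f in followers if f.log == leader.log]

produce("m1")
replicate(followers[0])                   # broker 2 catches up, broker 3 lags
print([r.broker_id for r in isr()])       # -> [1, 2]
```

A lagging follower drops out of the ISR exactly as broker 3 does here; once it replicates, it rejoins and becomes eligible for leader election if the leader fails.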
Replication – IT team analogy
[Diagram: a dev team mirroring the replication setup – Module B has 2 parallel tasks and 1 backup resource; Developers 1–3 each lead or follow on a task, while the manager (leader) assigns tasks and the testing team consumes the output]