Agenda

Overview
• Problem statement
• Use cases
• What is Kafka?

Key Terminologies
• Topic
• Partition
• Offset

Kafka Cluster
• Zookeeper
• Broker
• Replication
• Analogy
• Relationship

Demo
• CLI
• Spark
Problem statement
• Many source and target system integrations
• High-velocity streams
[Diagram: every source wired directly to every target]
Use cases
• Tracking user activity
• Log aggregation
• De-coupling systems
• Stream processing
[Diagram: sources publish to Kafka; targets consume from Kafka]
Kafka
• Scales to 100s of nodes
• Handles millions of messages per second
• Real-time processing (~10 ms)
Kafka is a horizontally scalable, fault-tolerant and fast messaging system.
[Diagram: producers write to Kafka; consumers read from Kafka]
Topic
• Stream of data
• Similar to a table in a NoSQL database
• Split into partitions
• Data is retrieved through offsets
• Offsets are unique per partition per topic
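The bullets above can be sketched in a few lines of plain Python (an illustration only, not the Kafka API): a topic is a set of append-only partition logs, and each appended message gets an offset that is unique within its partition.

```python
# Minimal model of a topic: one append-only log per partition.
# Plain Python for illustration only -- not the real Kafka API.

class Topic:
    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, partition, message):
        """Append a message and return its offset within the partition."""
        log = self.partitions[partition]
        log.append(message)
        return len(log) - 1  # offsets start at 0 and grow per partition

topic = Topic("topic-a", 2)
print(topic.append(0, "m1"))  # 0 -- first offset in partition 0
print(topic.append(0, "m2"))  # 1
print(topic.append(1, "m3"))  # 0 -- offsets restart per partition
```

Note how the same offset value (0) appears in both partitions: an offset only identifies a message together with its topic and partition.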
NoSQL table & Topic Analogy
• Table = Topic
• Key = Offset
• Key unique per table = offset unique per partition + topic
• Row = Message
[Diagram: a NoSQL table (Table A, key column with rows) beside Topic A with Partition 0 and Partition 1]
Partition
• Enables a topic to be distributed
• Unit of parallelism
• Usually one topic, many partitions
• Order is guaranteed only within a partition
• Messages are immutable
[Diagram: Brokers 1-3 hosting Topic A (3 partitions) and Topic B (2 partitions) spread across the cluster]
Partition – offset & key
• Key
 Messages are written to a partition based on the key
 If no key is given, messages are distributed round-robin
 Keys are important to avoid hotspots
• Offset
 Incremental unique id per partition
[Diagram: producers writing to Topic A's Partitions 0-2, each an ordered sequence of offsets]
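A sketch of that partition-selection rule (Kafka's Java client actually hashes keys with murmur2; md5 below is just a stand-in for any stable hash, and the function name is illustrative):

```python
# Keyed messages hash to a fixed partition, so all messages for one key
# land in one partition and stay ordered. Keyless messages rotate
# round-robin so load spreads evenly.
import hashlib
import itertools

NUM_PARTITIONS = 3
_round_robin = itertools.cycle(range(NUM_PARTITIONS))

def choose_partition(key):
    if key is None:
        return next(_round_robin)  # no key: spread evenly
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS  # same key -> same partition

print(choose_partition("user-42") == choose_partition("user-42"))  # True
print([choose_partition(None) for _ in range(4)])  # [0, 1, 2, 0]
```

A skewed key (one key carrying most of the traffic) sends everything to one partition: that is the hotspot the slide warns about.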
Kafka Architecture
• Zookeeper
• Kafka nodes (brokers)
• Producers
• Consumer groups
• Consumers
[Diagram: producers write to the Kafka cluster (Brokers 1-3); a consumer group reads; the brokers update metadata in a Zookeeper cluster]
Zookeeper
• Hierarchical key-value store
• Provides configuration, synchronization and name-registry services
• Ensemble layer that ties things together
• Ensures high availability
• Runs an odd number of nodes; more than 7 nodes is not recommended
• Kafka can’t work without Zookeeper
• Stores metadata
• Leader and follower nodes; all writes go only through the leader node
• From Kafka 0.10, offsets are not managed by Zookeeper
• Acts like a project manager (analogy)
Zookeeper is a centralized service for managing distributed systems.
[Diagram: a Zookeeper ensemble (one leader, two followers) receiving all metadata writes from the brokers]
Broker
• A single Kafka node
• Managed by Zookeeper
• A topic is distributed across brokers based on partitions and replication
• Acts like a developer (analogy)
[Diagram: Topic A split across Broker 1 (Partitions 0 and 1) and Broker 2 (Partitions 2 and 3), with producers writing and consumers reading]
Replication
• Copy of a partition on another broker
• Enables fault tolerance
• Follower partitions replicate from the leader
• Only the leader serves both producers and consumers
• ISR – In-Sync Replica
[Diagram: Topic B with 2 partitions and a replication factor of 2 across Brokers 1-3; each partition has a leader serving the producer and consumer group, and a follower]
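A toy model of those bullets (illustration only; real brokers also track leader epochs, high-watermarks and producer acks): writes go only to the leader, followers pull from the leader's log, and when the leader fails a new one is elected from the ISR.

```python
# Toy replication model: one log per replica, a single leader index,
# followers that pull missing messages, and leader election from the
# set of fully caught-up replicas (the ISR).

class ReplicatedPartition:
    def __init__(self, replication_factor):
        self.replicas = [[] for _ in range(replication_factor)]
        self.leader = 0  # index of the current leader replica

    def write(self, message):
        self.replicas[self.leader].append(message)  # only the leader accepts writes

    def replicate(self):
        for i, log in enumerate(self.replicas):  # followers pull what they miss
            if i != self.leader:
                log.extend(self.replicas[self.leader][len(log):])

    def isr(self):
        """Indices of replicas fully caught up with the leader."""
        leader_log = self.replicas[self.leader]
        return [i for i, log in enumerate(self.replicas) if log == leader_log]

    def fail_leader(self):
        candidates = [i for i in self.isr() if i != self.leader]
        self.leader = candidates[0]  # promote an in-sync follower

p = ReplicatedPartition(replication_factor=3)
p.write("m1")
p.replicate()
print(p.isr())    # [0, 1, 2] -- all replicas in sync
p.fail_leader()
print(p.leader)   # 1 -- an ISR follower took over, no data lost
```

Electing only from the ISR is the point: a replica that has fallen behind would lose committed messages if it became leader.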
Replication – IT team analogy
• Module B – 2 parallel tasks
• 1 backup resource for Module B
[Diagram: a manager (leader) assigns tasks; Developers 1-3 own Module B's tasks as leaders, a backup developer follows each task, and a testing team consumes the output]
Replication – Followers
[Diagram: a producer writes only to the Partition 0 leader; the follower replicas pull changes from the leader]
Replication – Leader election
• Topic B – 2 Partitions
• Replication factor of 2
[Diagram: all reads and writes for Topic B go to the partition leaders on Brokers 1 and 2; the follower replicas on Broker 3 replicate from those leaders]
Replication – Leader election
[Diagram: the producer writes to the Partition 0 and Partition 1 leaders while the Partition 2 followers pull changes from their leader]
Components hierarchy
[Diagram: Zookeeper (cluster manager) -> broker nodes -> topic streams -> leader partitions that scale the topic -> replication -> follower partitions]
IT Team and Kafka Cluster Analogy
• Zookeeper = Manager
• Broker = Developer
• Topic = Module
• Partition = Task
• Replication = Knowledge sharing
• Leader = Developer who owns the task
• Follower = Backup resource
[Diagram: the components hierarchy mirrored by a dev team: project manager -> developers -> modules -> leader tasks -> knowledge sharing -> backup team members]
Relationship summary
• Zookeeper (1) manages (1..*) Brokers
• Broker (1) has (0..1) Topics
• Topic (1) is split into (1..*) Partitions
• Partition (1) has (1..*) Replicas
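Those cardinalities can be sketched as a simple containment model (plain Python dataclasses; the names are illustrative, not Kafka APIs):

```python
# Containment sketch of the relationship summary: a cluster holds
# brokers and topics, topics are split into partitions, and each
# partition has one or more replicas (exactly one of them the leader).
from dataclasses import dataclass, field

@dataclass
class Replica:
    broker_id: int
    is_leader: bool = False

@dataclass
class Partition:
    index: int
    replicas: list = field(default_factory=list)    # 1..* replicas

@dataclass
class Topic:
    name: str
    partitions: list = field(default_factory=list)  # split into 1..* partitions

@dataclass
class Cluster:
    brokers: list = field(default_factory=list)     # Zookeeper manages 1..* brokers
    topics: list = field(default_factory=list)

cluster = Cluster(brokers=[0, 1, 2])
topic = Topic("topic-b", [
    Partition(0, [Replica(0, is_leader=True), Replica(1)]),
    Partition(1, [Replica(1, is_leader=True), Replica(2)]),
])
cluster.topics.append(topic)
print(len(topic.partitions))  # 2 -- two partitions, each with two replicas
```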
Demo
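The CLI part of the demo typically walks through Kafka's bundled shell scripts. A sketch, assuming a broker running on localhost:9092; flag names vary by version, and releases from the deck's era take `--zookeeper localhost:2181` instead of `--bootstrap-server`:

```shell
# Create a topic with 3 partitions and a replication factor of 2
kafka-topics.sh --create --topic demo --partitions 3 --replication-factor 2 \
  --bootstrap-server localhost:9092

# Produce messages from the console (type a message per line)
kafka-console-producer.sh --topic demo --bootstrap-server localhost:9092

# Consume the topic from the earliest offset
kafka-console-consumer.sh --topic demo --from-beginning \
  --bootstrap-server localhost:9092
```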
Read article @ https://www.linkedin.com/pulse/kafka-technical-overview-sylvester-daniel/
LinkedIn - https://www.linkedin.com/in/sylvesterdj/
Thank you

Kafka Technical Overview
