Kafka Technical Overview

A technical overview of Kafka, covering key terminology such as topic, partition and offset, and the Kafka architecture (Zookeeper, brokers and replication). It also draws an analogy between a Kafka cluster and an IT team. Read the article @ https://www.linkedin.com/pulse/kafka-technical-overview-sylvester-daniel/

Published in: Technology

  1. Agenda
     • Overview: problem statement, use cases, what is Kafka?
     • Key terminology: topic, partition, offset
     • Kafka cluster: Zookeeper, broker, replication, analogy, relationship
     • Demo: CLI, Spark
  2. Problem statement
     [Figure: several sources wired directly to several targets]
     • Integrating many source and target systems
     • High-velocity streams
  3. Kafka use cases
     [Figure: the same sources and targets connected through Kafka]
     • Tracking user activity
     • Log aggregation
     • Decoupling systems
     • Stream processing
  4. Kafka
     [Figure: producers write to Kafka; consumers read from it]
     Kafka is a horizontally scalable, fault-tolerant and fast messaging system.
     • Scales to hundreds of nodes
     • Handles millions of messages per second
     • Real-time processing (~10 ms)
  5. Topic
     • A stream of data
     • Similar to a table in a NoSQL database
     • Split into partitions
     • Data is retrieved by offset
     • An offset is unique per partition within a topic
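To make "offset unique per partition" concrete, here is a minimal in-memory sketch, assuming a topic can be modelled as one append-only list per partition. `ToyTopic` and its methods are hypothetical names for illustration, not part of any Kafka client API. Note that real Kafka offsets start at 0, as they do here.

```python
class ToyTopic:
    """Hypothetical in-memory stand-in for a Kafka topic (illustration only)."""

    def __init__(self, name: str, num_partitions: int) -> None:
        self.name = name
        # One append-only list per partition; a message's offset is its index.
        self.partitions: list[list[str]] = [[] for _ in range(num_partitions)]

    def append(self, partition: int, message: str) -> int:
        """Append a message to one partition and return the offset it was given."""
        log = self.partitions[partition]
        log.append(message)
        return len(log) - 1

    def read(self, partition: int, offset: int) -> str:
        """Look up a message by (partition, offset); there is no update or delete."""
        return self.partitions[partition][offset]


topic = ToyTopic("topic-a", num_partitions=2)
print(topic.append(0, "first"))   # 0: first offset in partition 0
print(topic.append(0, "second"))  # 1: next offset in partition 0
print(topic.append(1, "third"))   # 0: offsets restart in partition 1
print(topic.read(0, 1))           # second
```

The same offset (0) appears in both partitions, which is why a message is addressed by topic, partition and offset together.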
  6. NoSQL table & topic analogy
     [Figure: NoSQL Table A (key and column) beside Topic A, which is split into Partition 0 and Partition 1]
     • Table = topic
     • Key = offset
     • A key is unique per table; an offset is unique per partition within a topic
     • Row = message
  7. Partition
     [Figure: Topic A (3 partitions) and Topic B (2 partitions) spread across Brokers 1, 2 and 3]
     • Enables a topic to be distributed
     • Unit of parallelism
     • A topic usually has many partitions
     • Order is guaranteed only within a partition
     • Messages are immutable
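The point that order is guaranteed only within a partition can be seen in a small sketch, assuming messages are dealt across partitions in turn and each partition is frozen as a tuple to echo that written messages are immutable. The function names are hypothetical, not Kafka APIs.

```python
def round_robin_write(messages, num_partitions):
    """Deal messages across partitions in turn; freeze each partition as a tuple."""
    partitions = [[] for _ in range(num_partitions)]
    for i, msg in enumerate(messages):
        partitions[i % num_partitions].append(msg)
    # Tuples cannot be modified, mirroring that written messages are immutable.
    return tuple(tuple(log) for log in partitions)


def read_partition_by_partition(partitions):
    """Drain each partition fully before moving on to the next one."""
    return [msg for log in partitions for msg in log]


parts = round_robin_write(["m0", "m1", "m2", "m3", "m4", "m5"], 3)
print(parts)                               # (('m0', 'm3'), ('m1', 'm4'), ('m2', 'm5'))
print(read_partition_by_partition(parts))  # ['m0', 'm3', 'm1', 'm4', 'm2', 'm5']
```

Inside every partition m0 still precedes m3 (and so on), but the combined read no longer matches the original send order.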
  8. Partition – offset & key
     [Figure: producers write to Topic A; Partition 0 holds offsets 1–6, Partition 1 holds 1–4 and Partition 2 holds 1–5]
     • Key
        • Messages are written to a partition based on their key
        • With no key, messages are distributed round-robin
        • Choose keys carefully to avoid hotspots (partitions that receive a disproportionate share of messages)
     • Offset
        • An incremental ID, unique within each partition
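The key rule on this slide can be sketched as a toy partitioner: hash the key when there is one, otherwise rotate through the partitions. This is an illustration under stated assumptions, not Kafka's producer code: `zlib.crc32` stands in for the murmur2 hash that Kafka's default partitioner uses, so partition numbers will not match a real producer, and newer Kafka clients (2.4 onward) use "sticky" partitioning for keyless messages rather than strict round-robin. `ToyPartitioner` is a hypothetical name.

```python
import itertools
import zlib
from typing import Optional


class ToyPartitioner:
    """Hypothetical key-based partitioner; crc32 is a stand-in for Kafka's murmur2."""

    def __init__(self, num_partitions: int) -> None:
        self.num_partitions = num_partitions
        self._next = itertools.cycle(range(num_partitions))

    def partition(self, key: Optional[bytes]) -> int:
        if key is None:
            # No key: rotate through the partitions so load spreads evenly.
            return next(self._next)
        # Same key -> same hash -> same partition, so per-key order is preserved.
        return zlib.crc32(key) % self.num_partitions


p = ToyPartitioner(3)
print(p.partition(b"user-42") == p.partition(b"user-42"))  # True
print([p.partition(None) for _ in range(4)])               # [0, 1, 2, 0]
```

The hotspot risk follows directly: if most messages carry the same key, they all hash to one partition and that partition's broker does most of the work.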
  9. Kafka architecture
     [Figure: producers write to Brokers 1–3 in the Kafka cluster; brokers update metadata in a Zookeeper cluster (nodes 1 to n); a consumer group reads from the brokers]
     • Zookeeper
     • Kafka nodes (brokers)
     • Producers
     • Consumer groups
     • Consumers
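Within a consumer group, each partition is read by exactly one consumer of that group. A minimal sketch of that rule, assuming partitions are simply dealt out to consumers in turn, is below. Kafka ships several real assignment strategies (range, round-robin, sticky, cooperative-sticky); `assign_partitions` is a hypothetical simplification, not any of them exactly.

```python
def assign_partitions(partitions, consumers):
    """Give every partition to exactly one consumer in the group, dealing them in turn.

    Hypothetical simplification of a group assignor; `consumers` must be non-empty.
    """
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(sorted(partitions)):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment


print(assign_partitions([0, 1, 2], ["c1", "c2"]))
# {'c1': [0, 2], 'c2': [1]}
print(assign_partitions([0, 1], ["c1", "c2", "c3"]))
# {'c1': [0], 'c2': [1], 'c3': []}
```

The second call shows why the partition count caps a group's parallelism: a consumer beyond the number of partitions sits idle. A different consumer group would get its own independent assignment over the same partitions.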
  10. Zookeeper
     [Figure: a Zookeeper ensemble with node 1 as leader and nodes 0 and 2 as followers, serving the brokers; all metadata writes go through the leader]
     Zookeeper is a centralized service for managing distributed systems.
     • Hierarchical key-value store
     • Provides configuration, synchronization and naming registry services
     • Ensemble layer that ties things together and ensures high availability
     • Runs with an odd number of nodes; more than 7 nodes is not recommended
     • Kafka can't work without Zookeeper (true for the versions covered here; newer releases can also run without it in KRaft mode)
     • Stores metadata
     • Leader and follower nodes; all writes go only through the leader node
     • From Kafka 0.10, consumer offsets are no longer managed by Zookeeper
     • Acts like a project manager (analogy)
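The "odd number of nodes" advice comes from quorum arithmetic: a Zookeeper ensemble stays available only while a majority of its servers are up. The sketch below computes the majority size and how many failures an ensemble survives; the function names are illustrative.

```python
def quorum_size(ensemble: int) -> int:
    """Smallest majority of an ensemble with `ensemble` servers."""
    return ensemble // 2 + 1


def tolerated_failures(ensemble: int) -> int:
    """How many servers can fail while a majority is still available."""
    return ensemble - quorum_size(ensemble)


for n in range(1, 8):
    print(f"{n} servers: quorum {quorum_size(n)}, survives {tolerated_failures(n)} failure(s)")
```

Three and four servers both survive one failure, and five and six both survive two, so an even-numbered node adds cost and coordination work without adding fault tolerance.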
  11. Broker
     [Figure: Topic A's Partitions 0 and 1 on Broker 1 and Partitions 2 and 3 on Broker 2, with producers writing and consumers reading]
     • A single Kafka node
     • Managed by Zookeeper
     • A topic is distributed across brokers according to its partitions and replication
     • Acts like a developer (analogy)
  12. Replication
     [Figure: Topic B on Brokers 1–3; each partition has a leader on one broker and a follower on another, and the producer and consumer group work with the leaders]
     • Topic B – 2 partitions
     • Replication factor of 2
     • A replica is a copy of a partition on another broker
     • Enables fault tolerance
     • The follower partition replicates from the leader
     • Only the leader serves both producers and consumers
     • ISR – In-Sync Replica
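How a replication factor turns into replica placement can be sketched with a simple round-robin rule, assuming partition p starts at broker p and its copies wrap around the broker list, with the first broker in each list acting as leader. Kafka's real assignment also randomizes the starting broker and can be rack-aware, so `place_replicas` is a hypothetical simplification. Like Kafka, it refuses a replication factor larger than the number of brokers.

```python
def place_replicas(num_partitions, brokers, replication_factor):
    """Map each partition to an ordered list of brokers; the first one is the leader.

    Hypothetical round-robin placement: partition p starts at broker index p and wraps.
    """
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    return {
        p: [brokers[(p + i) % len(brokers)] for i in range(replication_factor)]
        for p in range(num_partitions)
    }


# Topic B from the slide: 2 partitions, replication factor 2, Brokers 1-3.
print(place_replicas(2, [1, 2, 3], 2))  # {0: [1, 2], 1: [2, 3]}
```

Because each partition's copies land on different brokers, losing any one broker still leaves at least one copy of every partition.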
  13. Replication – IT team analogy
     [Figure: a dev team of Developers 1–3; Module B's Task 0 and Task 1 each have a leader developer and a follower; a manager (leader) acts as task assigner; a testing team]
     • Module B – 2 parallel tasks
     • 1 backup resource for Module B
  14. Replication – Followers
     [Figure: the producer writes to the Partition 0 leader; two Partition 0 followers pull changes from it]
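The pull model on this slide can be sketched as a leader log plus follower logs that fetch whatever they are missing. The in-sync test here is an assumption: a follower counts as in sync if it lags the leader by at most `max_lag` messages. Current Kafka decides ISR membership by time (`replica.lag.time.max.ms`) rather than message count, so treat this as a simplified model. `ToyReplicatedPartition` is a hypothetical name.

```python
class ToyReplicatedPartition:
    """Hypothetical model: producers write to the leader, followers pull from it."""

    def __init__(self, follower_ids, max_lag=1):
        self.leader_log = []
        self.follower_logs = {f: [] for f in follower_ids}
        # Simplification: in-sync means "at most max_lag messages behind".
        self.max_lag = max_lag

    def produce(self, message):
        """Producers write only to the leader."""
        self.leader_log.append(message)

    def fetch(self, follower_id, max_messages=10):
        """A follower pulls the next messages it is missing from the leader."""
        log = self.follower_logs[follower_id]
        start = len(log)
        log.extend(self.leader_log[start:start + max_messages])

    def in_sync_replicas(self):
        """Followers whose lag behind the leader is within max_lag."""
        leader_end = len(self.leader_log)
        return [f for f, log in self.follower_logs.items() if leader_end - len(log) <= self.max_lag]


part = ToyReplicatedPartition(["broker-2", "broker-3"])
for m in ("m0", "m1", "m2"):
    part.produce(m)
part.fetch("broker-2")                  # catches up fully
print(part.in_sync_replicas())          # ['broker-2']
part.fetch("broker-3", max_messages=2)  # now only one message behind
print(part.in_sync_replicas())          # ['broker-2', 'broker-3']
```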
  15. Replication – Leader election
     [Figure: Topic B on Brokers 1–3, with a former follower of Partition 0 now shown as leader; the consumer group sends all reads to the leaders]
     • Topic B – 2 partitions
     • Replication factor of 2
  16. Replication – Leader election
     [Figure: replicas labelled Partition 0 (leader), Partition 1 (leader) and Partition 2 (follower); the producer writes to the leader and the follower pulls changes]
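Leader election can be sketched as picking the first replica, in the partition's preferred order, that is both in sync and still alive. In Kafka this decision is made by the controller broker, and by default only ISR members are eligible; the `unclean.leader.election.enable` setting relaxes that at the risk of losing data. `elect_leader` is a hypothetical simplification of that rule, not Kafka's controller code.

```python
def elect_leader(replicas, isr, alive_brokers):
    """Return the first replica (in preference order) that is in sync and alive.

    Returns None when no in-sync replica survives, i.e. the partition goes offline.
    """
    for broker in replicas:
        if broker in isr and broker in alive_brokers:
            return broker
    return None


replicas = [1, 2, 3]  # preferred order; broker 1 is the usual leader
isr = {1, 2}          # broker 3 has fallen behind
print(elect_leader(replicas, isr, alive_brokers={2, 3}))  # 2: leader on broker 1 failed
print(elect_leader(replicas, isr, alive_brokers={3}))     # None: only an out-of-sync replica is left
```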
  17. Components hierarchy
     • Cluster manager: Zookeeper
     • Nodes: Broker 0 and Broker 1
     • Stream: Topic 0 and Topic 1
     • Scale: Topic 0 Partition 0 (leader) on Broker 0 and Topic 0 Partition 1 (leader) on Broker 1
     • Replication and followers: each leader partition is replicated to two follower partitions
  18. IT team and Kafka cluster analogy
     [Figure: the components hierarchy drawn beside an IT team: project manager, developers, modules, sub-tasks, knowledge sharing and backup team members]
     • Kafka cluster = IT team
     • Zookeeper = Manager
     • Broker = Developer
     • Topic = Module
     • Partition = Task
     • Replication = Knowledge sharing
     • Leader = Developer who owns the task
     • Follower = Backup resource
  19. Relationship summary
     • Zookeeper manages brokers (1 : 1..*)
     • Broker has topic (1 : 0..1)
     • Topic is split into partitions (1 : 1..*)
     • Partition has replicas (1 : 1..*)
  20. Demo
  21. Read the article @ https://www.linkedin.com/pulse/kafka-technical-overview-sylvester-daniel/
      LinkedIn: https://www.linkedin.com/in/sylvesterdj/
  22. Thank you
