
Grokking TechTalk #24: Kafka's principles and protocols

This talk introduces Kafka and digs into its principles: the design decisions that make Kafka fast, scalable, and highly reliable. It also covers how Kafka servers interact with Kafka clients.

The talk dives deep into Kafka's internals and analyzes why its design decisions were made the way they were. It is a good fit for software engineers who have worked with, or want to learn about, the various job queues and message queues.

Speaker: Nguyen Quang Minh
- Software Engineer, Technical Lead @ Employment Hero
- Contributor to `ruby-kafka` (the most popular Kafka client for Ruby)


  1. Kafka's principles and protocols. Minh Nguyen, Tech Lead @ Employment Hero. nguyenquangminh.info
  2. Hello. My name is Minh - A Ruby lover and a Golang amateur - A nerdy guy - I love researching the internals of systems - I love the open-source world - And I am a cat owner
  3. Agenda - The problems we are solving at Employment Hero - Fundamental concepts of Kafka - Kafka producers - Kafka consumers and consumer groups - Consumer group flow (if we have time) - Introduction to Kafka protocols (if we have time)
  4. The problems ... - Employment Hero is a startup whose main product is an HR platform. - It started in 2012 as a simple Ruby on Rails application built by a handful of developers. [Diagram: a Ruby on Rails app containing Feature A, Feature B, Feature C]
  5. The problems ... - Now there are > 100 employees and > 30 developers, with multi-million-dollar funding. - It has become a really huge system, consisting of hundreds of modules, a complicated frontend stack (React, jQuery, Backbone, ?!) and 2 mobile applications. - Finally, we started following the microservice path in 2017. [Diagram: Ruby on Rails, Grape, Sidekiq, ... hosting Features A through H]
  6. The problems ... - The features with good boundaries are gradually extracted from the main app into smaller services. [Diagram: Feature A (Rails), Feature B (Sinatra) and Feature C (Golang) extracted as services alongside the Main app]
  7. A concrete example - When a user updates something, all the changes made by the operation are captured. - The user's supervisor and our support team are able to view, filter and search the audits.
  8. A concrete example - User A signs a contract: User A uploads a signature; User A agrees to the contract terms; User A uses the signature in the contract; the contract is marked Completed; User A is marked Onboarded.
  9. A concrete example - A request could generate dozens of audits - Each audit must go through a data pipeline: + Persistent storage + Full-text search indexing + Government reporting - That is too much work for a single request
  10. Our solution [Diagram: the Main app produces the audit events (User A signs contract, User A uploads a signature, User A agrees to the contract terms, ...) to a Message Queue; the Audit service consumes them and writes to Postgres and ElasticSearch]
  11. The message queue - The message queue must be: + Highly available. + Durable. + Scalable. + Fast. Extremely fast. - After a lot of consideration, we chose Kafka
  12. What is Kafka? - An open-source distributed streaming platform - Acts as a message queue, letting us publish and subscribe to streams of records - Allows processing the record streams in real time - Able to connect to external systems for importing / exporting data
  13. What is Kafka? [Diagram: Producers publish to Kafka; Consumers, Stream Processors and Connectors read from it]
  14. 14. Kafka’s fundamental concepts - Kafka organizes the messages by the concept of Topic. - Each topic has many Partitions. - Each partition is a list of durable messages. - When a message is sent to Kafka under a topic, the message is “sharded” to one partition of the topic. - The message partition assignment is decided by the producers.
  15. 15. Kafka’s fundamental concepts Audit 1 2 43 Partition 1 1 2 3 Partition 2 1 2 43 Partition 3 5 1 2 43 Partition 4
  16. 16. Kafka’s fundamental concepts Audit 1 2 43 Partition 1 1 2 3 Partition 2 1 2 43 Partition 3 5 User A uploads a signature 5 1 2 43 Partition 4 Producer
  17. 17. Kafka’s fundamental concepts - The partitions could be distributed to multiple machines. Each machine is called a Broker. - Each broker could have 0, 1 or many partitions of the same topic; or even ones of different topics.
  18. 18. Kafka’s fundamental concepts Audit 1 2 43 Partition 1 1 2 3 Partition 2 1 2 43 Partition 3 5 1 2 43 Partition 4 Broker 101 Broker 102 Broker 103
  19. 19. Kafka’s fundamental concepts - The messages are persisted into hard disk - Kafka supports Replication to ensure the high-availability and fault-tolerance. - Each partition could have many replicas, based on replication factor. - The replicas are not necessarily on the same nodes
  20. 20. Kafka’s fundamental concepts Audit 1 2 43 Partition 1 1 2 3 Partition 2 1 2 43 Partition 3 5 1 2 43 Partition 4 Broker 101 Broker 102 Broker 103 1 2 43 Replica of 1 1 2 3 Replica of 2 1 2 43 Replica of 3 51 2 43 Replica of 4 Replication Factor = 2 Leader Partition
  21. 21. Kafka’s fundamental concepts - Only the leader partitions are allowed to receive the messages. - Then they sync the message to the replicas.
  22. 22. Kafka concepts Audit 1 2 43 Partition 1 1 2 3 Partition 2 1 2 43 Partition 3 5 1 2 43 Partition 4 Broker 101 Broker 102 Broker 103 1 2 43 Replica of 1 1 2 3 Replica of 2 1 2 43 Replica of 3 51 2 43 Replica of 4 Replication Factor = 2 User A uploads a signature Producer 5 5
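Because every write goes through the leader and is then synced to the replicas, the producer can choose how many acknowledgements to wait for. A hedged sketch of that trade-off with `ruby-kafka` (not necessarily the configuration used in the talk):

```ruby
require "kafka"

kafka = Kafka.new(["kafka1:9092"], client_id: "audit-demo")

# required_acks: 1    -> only the leader confirms the write (faster).
# required_acks: :all -> all in-sync replicas must confirm (more durable).
producer = kafka.producer(required_acks: :all)

producer.produce("User A uploads a signature", topic: "audit")
producer.deliver_messages
```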
  23. 23. Kafka’s fundamental concepts - When a leader partition dies, one of the replica is elected to become a new leader partition. - When that partition comes back, it becomes a replica and fetches the missing data from others. - All of this leader-replica mechanism is handled by Apache Zookeeper
  24. 24. i Kafka’s fundamental concepts Audit 1 2 43 Partition 1 1 2 3 Partition 2 1 2 43 Partition 3 5 1 2 43 Partition 4 Broker 101 Broker 102 Broker 103 1 2 43 Replica of 1 1 2 3 Replica of 2 1 2 43 Partition 3 51 2 43 Partition 4 Broker dies New leader
  25. 25. Kafka’s fundamental concepts - It is obvious that Kafka satisfied two characteristics: + Durability + High-availability
  26. Kafka Producers - Kafka producers try to be simple - At startup, a producer fetches the cluster metadata: + The list of brokers + The topics it is interested in, with their partitions and replicas - Producers then interact directly with the various brokers - There is no centralized coordinator
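The same Metadata API the producer bootstraps from is also exposed on the `ruby-kafka` client; a small sketch (broker address and topic name invented):

```ruby
require "kafka"

kafka = Kafka.new(["kafka1:9092"], client_id: "audit-demo")

# Ask the cluster for its metadata, like a producer does at startup.
puts kafka.topics.inspect          # e.g. ["audit", ...]
puts kafka.partitions_for("audit") # e.g. 4
```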
  27. Kafka Producers [Diagram: Producer A and Producer B each write directly to the leader partitions spread across Broker 101, Broker 102 and Broker 103]
  28. Kafka Producers - This removes the write bottleneck completely - Each broker receives a reasonable share of the messages - Want to scale? Add more partitions and more brokers - The scaling is nearly linear
  29. Kafka Consumers - Kafka consumers are much more complicated - Just like producers, consumers start their operation by fetching the metadata - Each consumer can connect to multiple brokers and is encouraged to read from replicas - Each broker serves a set of partitions of the topics the consumer is interested in
  30. Kafka Consumers [Diagram: the Audit service reads the Audit topic's partitions and replicas from Broker 101, Broker 102 and Broker 103]
  31. Kafka Consumers - The workload is balanced between the brokers, reducing the read bottleneck - Each consumer has plenty of replicas to read from - This increases the availability and helps balance the workload
  32. Kafka Consumers [Diagram: partitions 1 and 2 of the Audit topic, each with replicas across Brokers 101-104; Services A, B and C read from different replicas]
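A minimal standalone consumer sketch with `ruby-kafka` (broker and topic names are illustrative):

```ruby
require "kafka"

kafka = Kafka.new(["kafka1:9092"], client_id: "audit-service")

# Read every message of every partition of the topic, from the beginning.
kafka.each_message(topic: "audit") do |message|
  puts "partition=#{message.partition} offset=#{message.offset} value=#{message.value}"
end
```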
  33. Kafka Consumer Group - To help consumers scale easily, the concept of a Consumer Group is introduced - Each consumer belongs to a consumer group - Each message is broadcast to all the groups - Within a group, each member exclusively handles the messages of a partition
  34. Kafka Consumer Group [Diagram: the four Audit partitions are consumed by two groups: the Audit service (three consumers) and the Report service (two consumers)]
  35. Kafka Consumer Group [Diagram: same topology; within each group, every partition is handled by exactly one consumer]
  36. Kafka Consumer Group - A consumer can handle messages from more than one partition - All partitions are guaranteed to be covered - The message order is guaranteed within a partition - The members of the group decide among themselves how to distribute the messages - That is why Kafka is sometimes described as "dumb broker, smart consumer"
  37. Kafka Consumer Group - The workload is load-balanced between consumers of the same group - Want to scale? Increase brokers, increase partitions and increase the number of consumers - Rule of thumb: number of consumers <= number of partitions
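In `ruby-kafka`, joining a consumer group is a one-liner; a sketch under the same illustrative names (`store_audit` is a hypothetical application method):

```ruby
require "kafka"

kafka = Kafka.new(["kafka1:9092"], client_id: "audit-service")

# Every process started with the same group_id joins the same group,
# and the group spreads the partitions across its members.
consumer = kafka.consumer(group_id: "audit-service")
consumer.subscribe("audit")

consumer.each_message do |message|
  # Within the group, exactly one member sees each message.
  store_audit(message.value) # hypothetical application code
end
```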
  38. 38. Let’s get back to our example User A Main app User A uploads a signature User A agree the contract terms User A uses the signature in the contract Contract is marked Completed User A is marked Onboarded Partition 1 Broker 101 Replica of 2 Partition 2 Broker 102 Replica of 1 Audit consumer 1 Audit consumer 1 Partition 3 Replica of 3 User B Main app
  39. 39. Kafka Consumer Group - All those things about consumers and producers satisfied the last two characteristics: + Scalability + High-performance (half of the story)
  40. 40. Consumer group flow - At a time, there is a special broker that takes care of a group called group coordinator - The group coordinator is chosen randomly. Any broker can become a group coordinator of a group - Coordinator handles all group operations: join group, sync group, heartbeat, commit offsets, etc.
  41. Consumer group flow [Diagram: the Audit service group, with three consumers reading the four Audit partitions from Brokers 101 and 102; a new consumer wants to join]
  42. Consumer group flow - 1. Ask a bootstrap broker for the group coordinator via the Group Coordinator API. For example: broker 101 is the group coordinator.
  43. Consumer group flow - 2. Send a join group request to the group coordinator, listing the consumer's supported protocols.
  44. Consumer group flow - 3. The new consumer is blocked by the group coordinator. The coordinator waits for the "other" participants. Typically, it waits until all existing group members send a join request, or until a timeout is exceeded.
  45. Consumer group flow - 4. After the group coordinator receives the join group request, the other consumers are notified about the new member (via the responses to heartbeat, commit offset, etc.). They are required to send join group requests again.
  46. Consumer group flow [Diagram: the existing members send their join group requests; everyone is still blocked]
  47. Consumer group flow - 5. When all members are in, or a timeout is exceeded, the group coordinator releases the block and returns responses to the members.
  48. Consumer group flow - 6. A lucky member is chosen to become this generation's group leader. Its response includes the list of group members and each member's metadata.
  49. Consumer group flow - 7. The group leader assigns the workload to each member based on the members' metadata. The other members don't have to do this task.
  50. Consumer group flow - 8. All members then send a sync group request. Like the join group request, sync group is a blocking request. The leader's request includes the member assignments.
  51. Consumer group flow - 9. Each member receives the sync group response, which includes its current assignment.
  52. Consumer group flow - 10. Finally, each consumer subscribes to the partitions it is assigned. The new consumer has become a group member.
  53. Consumer group flow - This design provides flexibility and gives the consumers more power - However, with great power comes great responsibility: the consumer clients are usually complicated and hard to implement correctly and completely!
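To tie the ten steps together, here is a heavily simplified, purely hypothetical Ruby sketch of the client-side loop; none of these objects or methods are `ruby-kafka`'s real internals, they only mirror the flow above:

```ruby
# Hypothetical sketch only: every name below is invented for illustration.
def run_group_member(cluster, group_id, topics)
  member_id = "" # empty on the very first join
  loop do
    coordinator = cluster.group_coordinator(group_id)          # step 1
    join = coordinator.join_group(group_id, member_id)         # steps 2-6 (blocks)
    member_id = join.member_id

    # Step 7: only the elected leader computes the assignments.
    assignments = join.leader? ? assign_partitions(join.members, topics) : nil

    sync = coordinator.sync_group(group_id, member_id, assignments) # steps 8-9 (blocks)

    subscribe(sync.assignment)                                 # step 10
    consume_until_rebalance                                    # a rebalance error restarts the loop
  end
end
```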
  54. Kafka Protocol - Kafka provides various powerful APIs for the clients - It implements its own binary protocol over TCP - The protocol follows a request-response model - There are about 20 APIs in the newest version - Each API has its own version, and Kafka ensures backward compatibility
  55. Kafka Protocol - Each field in a request / response has a type - The primitive types: int8, int16, int32, int64 - The composed types: string: [size in int16][string]; bytes: [size in int32][bytes] - Arrays are supported: [size in int32][e1][e2]...
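These encodings map directly onto Ruby's `Array#pack` (the protocol is big-endian). A small sketch of helpers for the composed types:

```ruby
# int16 / int32 are big-endian in the Kafka protocol.
def encode_int16(n); [n].pack("s>"); end
def encode_int32(n); [n].pack("l>"); end

# string: [size in int16][string]
def encode_string(s); encode_int16(s.bytesize) + s; end

# array: [size in int32][e1][e2]... (elements already encoded)
def encode_array(encoded_elements); encode_int32(encoded_elements.size) + encoded_elements.join; end

puts encode_string("Views").unpack1("H*") # => "00055669657773"
```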
  56. Request format - Every request starts with a common header: Request Size (int32), API Key (int16), API Version (int16), Correlation Id (int32), ClientId (string). - Each API has a numeric API key. - Each API has a specific version, which defines the body's structure. - The correlation id is echoed back in the response. - TopicMetadataRequest body: Number of topics (int32), Topic 1 (string), Topic 2 (string), ...
  57. Request example - A TopicMetadataRequest for the topics "Views" and "Orders" from the client "grokking-client": Request Size = 44 (0x0000002c); API Key = 3 (0x0003); API Version = 0 (0x0000); Correlation Id = 0 (0x00000000); ClientId = "grokking-client" (0x000f 0x67726f6b6b696e672d636c69656e74); Number of topics = 2 (0x00000002); Topic 1 = "Views" (0x0005 0x5669657773); Topic 2 = "Orders" (0x0006 0x4f7264657273). Final request: 0x0000002c0003000000000000000f67726f6b6b696e672d636c69656e74000000020005566965777300064f7264657273
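With the helpers sketched above, the exact bytes on this slide can be reproduced:

```ruby
def encode_int16(n); [n].pack("s>"); end
def encode_int32(n); [n].pack("l>"); end
def encode_string(s); encode_int16(s.bytesize) + s; end

body = encode_int16(3) +                  # API key 3 = TopicMetadataRequest
       encode_int16(0) +                  # API version 0
       encode_int32(0) +                  # correlation id
       encode_string("grokking-client") + # client id
       encode_int32(2) +                  # number of topics
       encode_string("Views") +
       encode_string("Orders")

request = encode_int32(body.bytesize) + body # size prefix = 44

puts request.unpack1("H*")
# => "0000002c0003000000000000000f67726f6b6b696e672d636c69656e74000000020005566965777300064f7264657273"
```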
  58. Response format - Every response starts with: Response Size (int32), Correlation Id (int32). - TopicMetadataResponse body: [Brokers] Node ID (int32), Host (string), Port (int32); [Topics] ErrorCode (int16), Topic name (string); [Partitions] ErrorCode (int16), Partition ID (int32), Replicas (array of int32), Isr (array of int32). - Final response: a 301-byte payload (Response Size = 0x0000012d, Correlation Id = 0x00000000), followed by the broker, topic and partition metadata.
  59-62. Demo: a simple Metadata request program (Ruby). [The slides show the program as code screenshots.]
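The demo code itself only survives as screenshots; below is a hedged reconstruction of what such a program could look like: open a TCP connection to a broker, send the TopicMetadataRequest built above, and read back the size-prefixed response (the broker address is an assumption):

```ruby
require "socket"

def encode_int16(n); [n].pack("s>"); end
def encode_int32(n); [n].pack("l>"); end
def encode_string(s); encode_int16(s.bytesize) + s; end

body = encode_int16(3) +                  # API key: TopicMetadataRequest
       encode_int16(0) +                  # API version
       encode_int32(0) +                  # correlation id
       encode_string("grokking-client") +
       encode_int32(2) +
       encode_string("Views") +
       encode_string("Orders")

socket = TCPSocket.new("localhost", 9092) # assumed broker address
socket.write(encode_int32(body.bytesize) + body)

# The response is size-prefixed, just like the request.
size = socket.read(4).unpack1("l>")
response = socket.read(size)
correlation_id = response[0, 4].unpack1("l>")
puts "received #{size} bytes, correlation id #{correlation_id}"
socket.close
```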
  63. References - Kafka introduction and concepts - Kafka official protocol document
  64. Kafka is not a silver bullet - Kafka is fast and crazily scalable - But it is not easy to use - The client libraries are just tools; they don't solve all of our problems - Therefore, it is worth understanding the underlying system to achieve more with Kafka
  65. What's next? - Kafka Streams - Kafka transactions and exactly-once delivery - Kafka internal architecture and implementation
  66. Employment Hero is hiring - Freshers: $500 - $1200 (https://tinyurl.com/eh-fresher) - Seniors: $2000 - $4000 (https://tinyurl.com/eh-senior)
  67. Q & A
