
RabbitMQ vs Apache Kafka Part II Webinar

Join guest speaker Jack Vanlightly in Part II of this series of three webinars where we explore what RabbitMQ and Apache Kafka are and their approach to messaging.

The two technologies have made very different decisions in every aspect of their design, each with strengths and weaknesses that enable different architectural patterns.

WEBINAR LIVE DATE: Thursday 30 August 2018 | 17:30 CEST / 16:30 BST / 11:30 EDT / 08:30 PDT

———————————————————————

SPEAKER CONTACT DETAILS

JACK VANLIGHTLY - Jack Vanlightly is a cloud software architect based in Barcelona specialising in event-driven architectures, data processing pipelines and data stores both relational and non-relational.

Twitter: https://twitter.com/vanlightly


———————————————————————

COMPANY CONTACT DETAILS

ERLANG SOLUTIONS
- Website: https://www.erlang-solutions.com
- Twitter: https://www.twitter.com/ErlangSolutions
- LinkedIn: http://www.linkedin.com/company/erlan…
- GitHub: https://github.com/esl

Published in: Technology

  1. 1. RabbitMQ vs Apache Kafka Comparing two giants of the messaging space
  2. 2. Apache Kafka RabbitMQ Reliable Messaging • Message Delivery Guarantees • Message Ordering Guarantees • Message Durability • High Availability VS
  3. 3. Background • Jack Vanlightly • Cloud Architect and Data Engineer at SII Concatel, Barcelona • Event-Driven Architectures • Messaging Systems • Cloud Automation • Data Pipelines
  4. 4. RabbitMQ – Push Model Producer → (publish) → Exchange → (route) → Queue → (push) → Consumer. Producer Publish - Send messages one at a time. Consumer Push - Long-lived TCP connection - Consumer registers interest in queues - Broker pushes messages down the connection in real time
  5. 5. Producer Topic A (partition 2) Consumer Consumer Pull - Long-lived TCP connection - Consumer registers interest in a topic as part of a consumer group - Consumer makes requests for messages in batches Producer Publish - Send messages in batches Pull in batches Publish in batches Kafka – Pull Model Topic A (partition 1) Topic A (partition 3) Consumer
  6. 6. Push (RabbitMQ) vs Pull (Apache Kafka). RabbitMQ – Why Push? The push model allows RabbitMQ to: • Offer low-latency messaging. • Evenly distribute messages across competing consumers. • Keep processing order closer to delivery order in the face of competing consumers. A push model requires back-pressure: consumer prefetch. Kafka – Why Pull? Because each partition cannot be read by more than one consumer of a consumer group, the consumer can pull batches of messages without: • affecting processing order • affecting message distribution amongst consumers. Batching up of messages improves compression and throughput.
  7. 7. At-most-once. This means that a message will never be delivered more than once but messages might be lost. At-least-once. This means that we'll never lose a message but a message might end up being delivered to a consumer more than once. Exactly-once. The holy grail of messaging. All messages will be delivered exactly one time. Delivery vs Processing Delivered twice to be processed once. At-most-once At-least-once Message Acknowledgement Protocols
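The "delivered twice to be processed once" idea on slide 7 can be sketched as a consumer that remembers which message IDs it has already handled. This is only an illustration under assumptions: the class `DedupingConsumer`, the message IDs and the in-memory set are hypothetical, and a real system would need a durable store for the seen IDs.

```python
from dataclasses import dataclass, field


@dataclass
class DedupingConsumer:
    """Hypothetical consumer that processes each message ID at most once."""
    seen_ids: set = field(default_factory=set)
    processed: list = field(default_factory=list)

    def handle(self, message_id, body):
        """Return True if the message was processed, False if it was a duplicate."""
        if message_id in self.seen_ids:
            return False  # duplicate delivery: safe to acknowledge, skip the work
        self.seen_ids.add(message_id)
        self.processed.append(body)
        return True


consumer = DedupingConsumer()
# Under at-least-once delivery the same message may arrive more than once.
deliveries = [("m1", "create order"), ("m2", "charge card"), ("m1", "create order")]
results = [consumer.handle(message_id, body) for message_id, body in deliveries]
```

Here the third delivery is recognised as a repeat of `m1`, so the work is done once even though the message was delivered twice.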
  8. 8. Chain of Responsibility: Producer Application → Hand-Over → Broker → Hand-Over → Consumer Application
  9. 9. RabbitMQ Producer Side Acknowledgements (Hand-Over)
  10. 10. Publisher Exchange Sends 10 messages (Seq No: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10) basic.ack: 6 multiple=true basic.ack: 10 multiple=true Publisher Confirms - basic.ack (all ok!) - basic.nack (error!) - basic.return + basic.ack (undeliverable!) Flags - Multiple (I am acknowledging multiple message deliveries) - Mandatory (give me a basic.return if you can’t deliver to any queues) RabbitMQ – Producer Side Acknowledgements Queue Routes 10 messages
  11. 11. Mandatory=true Mandatory=false Publisher Exchange Sends 10 messages Mandatory = false (Seq No: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10) basic.ack: 6 multiple=true basic.ack: 10 multiple=true RabbitMQ – Producer Side Acknowledgements Discards 10 messages X Publisher Exchange Sends 10 messages Mandatory = true (Seq No: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10) basic.return: 6 multiple=true + basic.ack: 6 multiple=true basic.return: 10 multiple=true + basic.ack: 10 multiple=true Discards 10 messages X
  12. 12. RabbitMQ – Producer Side Duplication. Low # of Messages in Flight = Low Throughput, Low Message Duplication on Failure. Large # of Messages in Flight = High Throughput, High Message Duplication on Failure. Scenario 1: 1000 messages in flight when the connection fails. 25% of the messages were persisted to a queue. The publisher resends 1000 messages and the queue ends up with 250 duplicates. Scenario 2: 1 message in flight when the connection fails. The message was persisted to a queue but the connection died before the ack could be sent. The publisher resends 1 message and the queue ends up with 1 duplicate. (Resent custom header)
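The publisher-confirm bookkeeping from slides 10 to 12 can be sketched as a small tracker of unconfirmed sequence numbers. It is a minimal model, not a client library: `ConfirmTracker` and its methods are hypothetical names, and no broker connection is involved.

```python
class ConfirmTracker:
    """Hypothetical tracker of published messages the broker has not confirmed yet."""

    def __init__(self):
        self.next_seq_no = 1
        self.unconfirmed = {}  # sequence number -> message body

    def publish(self, body):
        seq_no = self.next_seq_no
        self.unconfirmed[seq_no] = body
        self.next_seq_no += 1
        return seq_no

    def on_ack(self, seq_no, multiple):
        if multiple:
            # multiple=true confirms every sequence number up to and including seq_no
            for confirmed in [n for n in self.unconfirmed if n <= seq_no]:
                del self.unconfirmed[confirmed]
        else:
            self.unconfirmed.pop(seq_no, None)

    def to_resend(self):
        """Messages that must be resent if the connection fails now."""
        return [self.unconfirmed[n] for n in sorted(self.unconfirmed)]


tracker = ConfirmTracker()
for i in range(1, 11):
    tracker.publish(f"message {i}")
tracker.on_ack(6, multiple=True)  # basic.ack: 6 multiple=true
pending = sorted(tracker.unconfirmed)
resend = tracker.to_resend()  # what would be resent if the connection died here
```

The more messages left in `unconfirmed` when the connection fails, the more are resent, and any of those that the broker had in fact persisted become duplicates.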
  13. 13. RabbitMQ Consumer Side Acknowledgements (Hand-over)
  14. 14. Consumer Acknowledgements - basic.ack (all ok, remove from the queue!) - basic.nack, redeliver=false (error, but remove anyway) - basic.nack, redeliver=true (error, please redeliver) - basic.reject (same as basic.nack but without multiple flag support) Acknowledgement Mode - Auto Ack (Push me messages as fast as you can!) - Manual Ack (I will explicitly tell you when a message can be removed from the queue) RabbitMQ – Consumer Side Acknowledgements Queue Consumer Pushes 10 messages Delivery tag: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 basic.ack: 1 multiple=false, basic.ack: 2 multiple=false basic.ack: 3 multiple=false, basic.ack: 4 multiple=false basic.ack: 5 multiple=false, basic.ack: 6 multiple=false basic.ack: 7 multiple=false, basic.ack: 8 multiple=false basic.ack: 9 multiple=false, basic.ack: 10 multiple=false
  15. 15. RabbitMQ – Consumer Side Acknowledgements Flags - Multiple (I am acknowledging multiple messages) - Redelivered (This message is a redelivery) Multiple Flag: Queue pushes 10 messages to Consumer. Delivery tag: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 basic.ack: 6 multiple=true, basic.ack: 10 multiple=true Redelivered Flag: 1. Queue pushes 1 message to Consumer 2. basic.nack: 1 multiple=false redeliver=true 3. Queue delivers message again with redelivered=true flag 4. basic.ack: 1 multiple=false
  16. 16. RabbitMQ – Consumer Side Duplication. Low # of Messages in Flight = Low Throughput, Low Message Duplication on Failure. Large # of Messages in Flight = High Throughput, High Message Duplication on Failure. Scenario 1: 1000 messages in flight when the connection fails. 25% of the messages were processed, but the connection failed before the acks could be sent. The queue redelivers 1000 messages and 250 messages get processed twice. Scenario 2: 1 message in flight when the connection fails. The message was processed but the connection died before the ack could be sent. The queue redelivers 1 message and 1 message gets processed twice.
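The two consumer-side scenarios on slide 16 can be reproduced with a toy queue model. The function `processed_twice` and its parameters are hypothetical, and the sketch assumes that every unacknowledged delivery is requeued when the connection fails and that a replacement consumer then processes the whole queue.

```python
from collections import deque


def processed_twice(message_count, prefetch, processed_before_failure):
    """Count messages processed twice when the connection fails before any ack is sent."""
    queue = deque(range(message_count))
    # The broker pushes up to `prefetch` messages that are now in flight (unacked).
    in_flight = [queue.popleft() for _ in range(min(prefetch, len(queue)))]
    # The consumer finished some of them, but no ack reached the broker.
    already_processed = set(in_flight[:processed_before_failure])
    # Connection failure: all unacked messages go back to the head of the queue.
    queue.extendleft(reversed(in_flight))
    # A replacement consumer now processes everything in the queue.
    return sum(1 for message in queue if message in already_processed)


large_prefetch = processed_twice(1000, prefetch=1000, processed_before_failure=250)
single_prefetch = processed_twice(1000, prefetch=1, processed_before_failure=1)
```

A smaller prefetch bounds the number of possible duplicates, at the cost of throughput.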
  17. 17. RabbitMQ Broker Durability Durable Queues Persistent Messages Mirrored Queues
  18. 18. RabbitMQ – The Broker Surviving Broker Restart - Durable Queue - Persistent Message Surviving Broker Loss - Queue Mirroring (Clustering) Broker Restart Queue Message Non-Durable Queue Non-Persistent Message Queue Message Durable Queue Non-Persistent Message Queue Message Mirrored Queue Persistent Message Total Broker Loss Queue Message Queue Message Queue Message Queue Message Durable Queue Persistent Message Queue Message
  19. 19. Publisher Consumer Queue Master Queue Mirror Queue Mirror
  20. 20. RabbitMQ – The Broker – Queue Mirrors Broker 2 Queue A Master Broker 3 Broker 1 Queue A Mirror Queue A Mirror Queue B Mirror Queue B Master Queue B Mirror Queue C Master Queue C Mirror Queue D (unmirrored) Queue A ha-mode = all Queue B ha-mode = exactly ha-params = 3 Queue C ha-mode = exactly ha-params = 2
  21. 21. RabbitMQ – The Broker – Queue Mirrors Broker 2 Queue A Master Broker 3 Broker 1 Queue A Mirror Queue A Mirror Queue B Mirror Queue B Master Queue B Mirror Queue C Master Queue C Master (Promoted) Queue D (unmirrored) Queue A ha-mode = all Queue B ha-mode = exactly ha-params = 3 Queue C ha-mode = exactly ha-params = 2 Queue C Mirror
  22. 22. RabbitMQ – The Broker – Queue Mirrors Broker 2 Queue A Master Broker 3 Broker 1 Queue A Mirror Queue A Mirror Queue B Master (Promoted) Queue B Master Queue B Mirror Queue C Master Queue C Master Queue D (unmirrored) Queue A ha-mode = all Queue B ha-mode = exactly ha-params = 3 Queue C ha-mode = exactly ha-params = 2 Queue C Mirror
  23. 23. RabbitMQ – The Broker – Queue Mirrors Broker 2 Queue A Master Broker 3 Queue A Mirror Queue B Master Queue B Mirror Queue C Master Queue C Master Queue A ha-mode = all Queue B ha-mode = exactly ha-params = 3 Queue C ha-mode = exactly ha-params = 2 Broker 1 Queue A Mirror Queue B Mirror Queue C Mirror
  24. 24. RabbitMQ – The Broker – Queue Mirrors Broker 2 Queue A Master Queue B Master Queue C Master Queue A ha-mode = all Queue B ha-mode = exactly ha-params = 3 Queue C ha-mode = exactly ha-params = 2 Broker 1 Queue A Mirror Queue B Mirror Queue C Mirror Broker 3 Queue A Mirror Queue B Mirror
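How the `ha-mode` and `ha-params` values on slides 20 to 24 translate into a number of queue replicas can be sketched with a small helper. `replica_count` is a hypothetical function, it covers only the two modes shown on the slides, and it assumes that `exactly` counts the master plus its mirrors and is capped by the number of brokers available.

```python
def replica_count(ha_mode, cluster_size, ha_params=None):
    """Replicas (master plus mirrors) a mirrored queue gets under a given policy."""
    if ha_mode == "all":
        return cluster_size  # one replica on every broker in the cluster
    if ha_mode == "exactly":
        return min(ha_params, cluster_size)  # cannot exceed the brokers available
    raise ValueError(f"mode not covered by this sketch: {ha_mode}")


# Three brokers, as on slide 20.
queue_a = replica_count("all", cluster_size=3)
queue_b = replica_count("exactly", cluster_size=3, ha_params=3)
queue_c = replica_count("exactly", cluster_size=3, ha_params=2)
# Two brokers left after one is lost, as on slide 23.
queue_b_degraded = replica_count("exactly", cluster_size=2, ha_params=3)
```

Queue D on the slides is unmirrored, so it has a single replica and no policy applies to it.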
  25. 25. RabbitMQ Queue Mirror Synchronization And Queue Failover
  26. 26. RabbitMQ – Queue Mirrors - Synchronization Broker 2 Queue A Master Broker 3 Broker 1 Queue A Mirror Queue A Mirror Queue B Mirror Queue B Master Queue B Mirror Queue A ha-mode = all ha-sync-mode = automatic Queue B ha-mode = exactly ha-params = 3 ha-sync-mode = manual 10 10 10 10 10 10 Three nodes, two mirrored queues each with 10 messages
  27. 27. RabbitMQ – Queue Mirrors - Synchronization Queue A ha-mode = all ha-sync-mode = automatic Queue B ha-mode = exactly ha-params = 3 ha-sync-mode = manual Broker 2 Queue A Master Broker 3 Broker 1 Queue A Mirror Queue A Mirror Queue B Mirror Queue B Master Queue B Mirror 10 10 10 10 10 10 Broker 3 is lost
  28. 28. RabbitMQ – Queue Mirrors - Synchronization Queue A ha-mode = all ha-sync-mode = automatic Queue B ha-mode = exactly ha-params = 3 ha-sync-mode = manual Broker 3 comes back. Mirror A is automatically synchronized. Mirror B remains at 0 messages. Broker 2 Queue A Master Broker 3 Broker 1 Queue A Mirror Queue A Mirror Queue B Mirror Queue B Master Queue B Mirror 10 10 10 10 10 0
  29. 29. RabbitMQ – Queue Mirrors - Synchronization Queue A ha-mode = all ha-sync-mode = automatic Queue B ha-mode = exactly ha-params = 3 ha-sync-mode = manual Each queue receives 10 more messages. Broker 2 is lost. Queue A fails over to the mirror on broker 3 without data loss. Broker 2 Queue A Master Broker 3 Broker 1 Queue A Master Queue A Mirror Queue B Mirror Queue B Master Queue B Mirror 20 20 20 20 20 10
  30. 30. RabbitMQ – Queue Mirrors - Synchronization Queue A ha-mode = all ha-sync-mode = automatic Queue B ha-mode = exactly ha-params = 3 ha-sync-mode = manual ha-promote-on-failure = always Each queue receives 10 more messages. Broker 1 is lost. Queue B fails over to the mirror on broker 3 and loses 10 messages. Broker 2 Queue A Master Broker 3 Broker 1 Queue A Mirror Queue A Master Queue B Mirror Queue B Master Queue B Master 30 30 30 30 30 20
  31. 31. RabbitMQ – Queue Mirrors - Synchronization Queue A ha-mode = all ha-sync-mode = automatic Queue B ha-mode = exactly ha-params = 3 ha-sync-mode = manual ha-promote-on-failure = when-synced Alternate scenario: ha-promote-on-failure = when-synced Queue B does not fail over as the mirror on broker 3 is unsynchronized. Broker 2 Queue A Master Broker 3 Broker 1 Queue A Mirror Queue A Master Queue B Mirror Queue B Master Queue B Mirror 30 30 30 30 30 20
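The trade-off between the two `ha-promote-on-failure` values on slides 30 and 31 can be sketched as a toy fail-over decision. `Mirror` and `promote` are hypothetical names, and picking the mirror with the most messages is a simplifying assumption of this sketch rather than a statement about how RabbitMQ chooses.

```python
from typing import NamedTuple


class Mirror(NamedTuple):
    broker: str
    message_count: int
    synchronised: bool


def promote(master_message_count, mirrors, promote_on_failure):
    """Return (new master broker, messages lost), or (None, 0) if the queue stays down."""
    synced = [m for m in mirrors if m.synchronised]
    if synced:
        return synced[0].broker, 0  # a synchronised mirror takes over without loss
    if promote_on_failure == "always" and mirrors:
        best = max(mirrors, key=lambda m: m.message_count)
        return best.broker, master_message_count - best.message_count
    return None, 0  # when-synced: prefer unavailability over data loss


# Queue B after broker 1 (its master, holding 30 messages) is lost.
surviving = [Mirror("broker 3", 20, synchronised=False)]
always = promote(30, surviving, "always")
when_synced = promote(30, surviving, "when-synced")
```

With `always` the queue stays available but 10 messages are gone; with `when-synced` nothing is lost but the queue is unavailable until the master returns.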
  32. 32. RabbitMQ Queue Mirror Synchronization And New Mirrors
  33. 33. RabbitMQ – Queue Mirrors - Synchronization Broker 2 Queue A Master Broker 3 Broker 1 Queue A Mirror Queue A Mirror Queue B Mirror Queue B Master Queue B Mirror Queue A ha-mode = all ha-sync-mode = automatic Queue B ha-mode = exactly ha-params = 3 ha-sync-mode = manual 100m 100m 100m 100m 100m 100m Three nodes, two mirrored queues each with 100 million messages
  34. 34. RabbitMQ – Queue Mirrors - Synchronization Queue A ha-mode = all ha-sync-mode = automatic Queue B ha-mode = exactly ha-params = 3 ha-sync-mode = manual Broker 2 Queue A Master Broker 3 Broker 1 Queue A Mirror Queue A Mirror Queue B Mirror Queue B Master Queue B Mirror Broker 3 is lost 100m 100m 100m 100m 100m 100m
  35. 35. RabbitMQ – Queue Mirrors - Synchronization Queue A ha-mode = all ha-sync-mode = automatic Queue B ha-mode = exactly ha-params = 3 ha-sync-mode = manual Broker 3 comes back. Queue A is unavailable due to synchronization. Queue B is available but the mirror on broker 3 remains at 0 messages. Broker 2 Queue A Master Broker 3 Broker 1 Queue A Mirror Queue A Mirror Queue B Mirror Queue B Master Queue B Mirror 100m 100m * 100m 100m 0
  36. 36. RabbitMQ – Queue Mirrors - Synchronization Queue A ha-mode = all ha-sync-mode = automatic Queue B ha-mode = exactly ha-params = 3 ha-sync-mode = manual Queue A synchronization completes and the queue becomes available again. Broker 2 Queue A Master Broker 3 Broker 1 Queue A Mirror Queue A Mirror Queue B Mirror Queue B Master Queue B Mirror 100m 100m 100m 100m 100m 0
  37. 37. RabbitMQ. Optimizing for High Throughput - Producers fire and forget - Consumers use auto-ack mode - Non-Persistent messages - Non-Mirrored Queues - Cluster for throughput. Balancing Data Safety with High Throughput - Producers wait periodically for acknowledgements - Consumers group acknowledgements with the Multiple flag - Persistent messages - Cluster - Queue mirroring with 1 mirror*. Optimizing for Data Safety - Producers wait for acknowledgements after each message or after a small number of messages - Consumers acknowledge each message individually - Persistent messages - Cluster for durability - Queue mirroring with 2+ mirrors - ha-promote-on-failure=when-synced - ha-sync-mode=automatic for active queues
  38. 38. RabbitMQ Network Partitions? Slow Network Links? Flaky Links? See Part 3…
  39. 39. Apache Kafka Producer Side Acknowledgements (Hand-Over)
  40. 40. Apache Kafka – Replicated Partitions Broker 2 Partition 0 Leader Broker 3 Broker 1 Partition 0 Follower Partition 0 Follower Each partition has a leader with 0 or more followers. Producers send to leaders. Consumers consume from leaders. Followers exist for redundancy. Producer
  41. 41. Apache Kafka – Producer Side Acknowledgements Producer Config acks - No acknowledgement (fire and forget). acks=0 - Leader has persisted the message. acks=1 - Leader and all In-Sync Replicas have persisted the message. acks=all Other settings - retries - enable.idempotence (limits throughput) - max.in.flight.requests.per.connection - batch.size - max.request.size Broker/Topic Config Other settings - default.replication.factor - min.insync.replicas - unclean.leader.election.enable
  42. 42. Retries + Multiple Requests In Flight = Message Duplication + Out of Order Messages Consumer Group Producer P 0 P 1 P 2 C1 C2 C3 Batch 1 Batch 2 Batch 3 Batch 1 Batch 2 Batch 3 Batch 1 Batch 2 Batch 3
  43. 43. Retries + Multiple Requests In Flight = Message Duplication + Out of Order Messages Consumer Group Producer C1 C2 C3 Batch 1 Batch 2 Batch 3 Batch 1 Batch 2 Batch 3 Batch 1 Batch 2 Batch 3 P 0 P 1 P 2
  44. 44. Retries + Multiple Requests In Flight = Message Duplication + Out of Order Messages Consumer Group Producer C1 C2 C3 Batch 1 Batch 2 Batch 4 Batch 4 Batch 5 Batch 1 Batch 4 Batch 5 Batch 1 Batch 3 Batch 1 Batch 2 Batch 3 Batch 2 Batch 3 Batch 5 P 0 P 1 P 2
  45. 45. Retries + Multiple Requests In Flight = Message Duplication + Out of Order Messages Consumer Group Producer C1 C2 C3 Batch 4 Batch 5 Batch 1 Batch 4 Batch 5 Batch 1 Batch 3 Batch 1 Batch 2 Batch 3 Batch 2 Batch 3 Batch 1 Batch 2 Batch 4 Batch 5 P 0 P 1 P 2
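Why retries combined with several requests in flight lead to duplicates and reordering (slides 42 to 45) can be shown with a toy partition log. `append_with_retries` is a hypothetical function; it assumes all batches are sent before any failure is noticed, so every retry lands after the later batches.

```python
def append_with_retries(batches, write_failed=(), ack_lost=()):
    """Return the partition log after the first attempts and their retries."""
    log = []
    retry_queue = []
    for batch in batches:  # all requests are in flight at the same time
        if batch in write_failed:
            retry_queue.append(batch)  # never reached the log, must be retried
        else:
            log.append(batch)
            if batch in ack_lost:
                # Written, but the producer never saw the ack and retries anyway.
                retry_queue.append(batch)
    log.extend(retry_queue)  # retries arrive after the batches sent later
    return log


out_of_order = append_with_retries([1, 2, 3], write_failed={1})
duplicated = append_with_retries([1, 2, 3], ack_lost={1})
```

In the first case batch 1 ends up behind batches 2 and 3; in the second it is stored twice.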
  46. 46. Avoid Data Loss and Producer-Side Duplication With enable.idempotence = true • enable.idempotence set to true • max.in.flight.requests.per.connection set to 5 or less • retries set to 1 or higher • acks set to ‘all’ Producers can retry until they succeed while avoiding message duplication.
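The four conditions on slide 46 can be written down as a simple check over a producer configuration. `idempotence_problems` is a hypothetical helper that works on a plain dictionary; it does not talk to Kafka, and it assumes all four settings are given explicitly rather than relying on client defaults.

```python
def idempotence_problems(config):
    """List which of the slide's four conditions a producer config violates."""
    problems = []
    if config.get("enable.idempotence") is not True:
        problems.append("enable.idempotence must be true")
    in_flight = config.get("max.in.flight.requests.per.connection")
    if in_flight is None or in_flight > 5:
        problems.append("max.in.flight.requests.per.connection must be 5 or less")
    retries = config.get("retries")
    if retries is None or retries < 1:
        problems.append("retries must be 1 or higher")
    if config.get("acks") != "all":
        problems.append("acks must be 'all'")
    return problems


safe = {
    "enable.idempotence": True,
    "max.in.flight.requests.per.connection": 5,
    "retries": 10,
    "acks": "all",
}
risky = {**safe, "acks": "1", "max.in.flight.requests.per.connection": 10}
safe_problems = idempotence_problems(safe)
risky_problems = idempotence_problems(risky)
```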
  47. 47. Apache Kafka – The Broker – Replicas Broker 2 Partition 0 Leader Broker 3 Broker 1 Partition 0 Follower Partition 0 Follower Partition 1 Follower Partition 1 Leader Partition 1 Follower Partition 2 Leader Partition 2 Follower Partition 3 Leader Topic with: - 4 partitions - Replication factor = 3 Partition 2 Follower Partition 3 Follower Partition 3 Follower
  48. 48. Apache Kafka – The Broker – Replicas Broker 2 Partition 0 Leader Broker 3 Broker 1 Partition 0 Follower Partition 0 Follower Partition 1 Follower Partition 1 Leader Partition 1 Follower Partition 2 Leader Partition 2 Leader (promoted) Partition 3 Leader Topic with: - 4 partitions - Replication factor = 3 Partition 2 Follower Partition 3 Follower Partition 3 Follower
  49. 49. Apache Kafka – The Broker – Replicas Broker 2 Partition 0 Leader Broker 3 Broker 1 Partition 0 Follower Partition 0 Follower Partition 1 Leader (promoted) Partition 1 Leader Partition 1 Follower Partition 2 Leader Partition 2 Leader Partition 3 Leader Topic with: - 4 partitions - Replication factor = 3 Partition 2 Follower Partition 3 Follower Partition 3 Follower
  50. 50. Apache Kafka – The Broker – Replicas Broker 2 Partition 0 Leader Broker 3 Broker 1 Partition 0 Follower Partition 0 Follower Partition 1 Leader Partition 1 Follower Partition 1 Follower Partition 2 Leader Partition 2 Leader Partition 3 Leader Topic with: - 4 partitions - Replication factor = 3 Partition 2 Follower Partition 3 Follower Partition 3 Follower
  51. 51. Apache Kafka – The Broker – Replicas Broker 2 Partition 0 Leader Broker 1 Partition 0 Follower Partition 1 Leader Partition 1 Follower Partition 2 Leader Partition 3 Leader Topic with: - 4 partitions - Replication factor = 3 Partition 2 Follower Partition 3 Follower Broker 3 Partition 0 Follower Partition 1 Follower Partition 2 Follower Partition 3 Follower
  52. 52. Apache Kafka – The Broker – Replicas Broker 2 Partition 0 Leader Broker 1 Partition 0 Follower Partition 1 Follower Partition 1 Leader Partition 2 Follower Partition 3 Leader Partition Rebalancing Option 1: auto.leader.rebalance.enable=true Option 2: Rebalance leaders manually with kafka-preferred-replica-election.sh Partition 2 Follower Partition 3 Follower Broker 3 Partition 0 Follower Partition 1 Follower Partition 2 Leader Partition 3 Follower
  53. 53. Apache Kafka – The Broker – The ISR In-Sync Replica Set (ISR) - The leader + the followers who are up to date with the leader - A follower is removed from the ISR when either: - It has not sent any fetch requests to the leader within the replica.lag.time.max.ms time period - It has not been up to date with the leader for at least the replica.lag.time.max.ms period - Followers send fetch requests to the leader at an interval of replica.fetch.wait.max.ms which should be lower than replica.lag.time.max.ms
  54. 54. Apache Kafka – The Broker – The ISR Broker 2 Partition 0 Leader Broker 3 Broker 1 Partition 0 Follower Partition 0 Follower Partition 1 Follower Partition 1 Leader Partition 1 Follower Topic with: - 2 partitions - Replication factor = 3 - replica.lag.time.max.ms = 10000 - replica.fetch.wait.max.ms = 500 -1 -8 0 -9 One message a second arriving at each partition. Partitions in broker 3 lagging, but still within the 10 second limit 0:01 0:00 0:08 0:09
  55. 55. Apache Kafka – The Broker – The ISR Broker 2 Partition 0 Leader Broker 3 Broker 1 Partition 0 Follower Partition 0 Follower Partition 1 Follower Partition 1 Leader Partition 1 Follower 0 -22 -1 -19 Broker 3 seems to have an issue. Its partitions have been out-of-sync for more than 10 seconds and are no longer in the ISR 0:00 0:01 0:22 0:19 Topic with: - 2 partitions - Replication factor = 3 - replica.lag.time.max.ms = 10000 - replica.fetch.wait.max.ms = 500
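The ISR rule from slides 53 to 55 can be sketched as a filter over follower lag. `in_sync_replicas` is a hypothetical function, and it models lag simply as the number of milliseconds since a follower was last caught up with the leader.

```python
def in_sync_replicas(leader, follower_lag_ms, replica_lag_time_max_ms=10_000):
    """Return the ISR: the leader plus every follower within the allowed lag."""
    followers = [
        broker
        for broker, lag_ms in follower_lag_ms.items()
        if lag_ms <= replica_lag_time_max_ms
    ]
    return [leader] + followers


# Slide 54: broker 3 lags by 8 seconds, still inside the 10 second limit.
isr_lagging = in_sync_replicas("broker 2", {"broker 1": 1_000, "broker 3": 8_000})
# Slide 55: broker 3 is now 22 seconds behind and drops out of the ISR.
isr_shrunk = in_sync_replicas("broker 2", {"broker 1": 0, "broker 3": 22_000})
```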
  56. 56. Apache Kafka – Low Latency, High Availability Optimizing for Low Latency and High Availability Acks = 1 unclean.leader.election.enable = true
  57. 57. Apache Kafka – Low Latency, High Availability Broker 2 Partition 0 Leader Broker 3 Broker 1 Partition 0 Follower Partition 0 Follower Topic with: - Replication factor = 3 - replica.lag.time.max.ms = 10000 - replica.fetch.wait.max.ms = 500 - unclean.leader.election.enable = true -1 -8 Producer sends a message and the leader persists the message, then sends an ack. 0:01 0:08 Producer 1 message, acks = 1 Ack +0 ms due to replicas
  58. 58. Apache Kafka – Low Latency, High Availability Broker 2 Partition 0 Leader Broker 3 Broker 1 Partition 0 Leader Partition 0 Follower Topic with: - Replication factor = 3 - replica.lag.time.max.ms = 10000 - replica.fetch.wait.max.ms = 500 - unclean.leader.election.enable = true -8 Leader broker fails before followers fetch the message. 1 message lost in fail-over. 0:01 0:08 Producer Connection lost
  59. 59. Apache Kafka – Low Latency, High Availability Broker 2 Partition 0 Leader Broker 3 Broker 1 Partition 0 Leader Partition 0 Follower Topic with: - Replication factor = 3 - replica.lag.time.max.ms = 10000 - replica.fetch.wait.max.ms = 500 - unclean.leader.election.enable = true -8 Producer establishes connection to broker 1 and sends one message. 0:08 Producer 1 message, acks = 1 Ack +0 ms due to replicas
  60. 60. Apache Kafka – Low Latency, High Availability Broker 2 Partition 0 Leader Broker 3 Broker 1 Partition 0 Leader Partition 0 Follower Topic with: - Replication factor = 3 - replica.lag.time.max.ms = 10000 - replica.fetch.wait.max.ms = 500 - unclean.leader.election.enable = true -14 Broker 3 falls behind. Removed from ISR. 0:14 Producer 1 message, acks = 1 Ack +0 ms due to replicas
  61. 61. Apache Kafka – Low Latency, High Availability Broker 2 Partition 0 Leader Broker 3 Broker 1 Partition 0 Leader Partition 0 Leader Topic with: - Replication factor = 3 - replica.lag.time.max.ms = 10000 - replica.fetch.wait.max.ms = 500 - unclean.leader.election.enable = true Broker 1 fails. Unclean Leader Election allows the Broker 3 partition that is not a member of the ISR to be elected leader. Fail-over loses 15 acknowledged messages. But the partition remains available. Producer 1 message, acks = 1 Connection error
  62. 62. Apache Kafka – Low Latency, High Availability Broker 2 Partition 0 Leader Broker 3 Broker 1 Partition 0 Leader Partition 0 Follower Topic with: - Replication factor = 3 - replica.lag.time.max.ms = 10000 - replica.fetch.wait.max.ms = 500 - unclean.leader.election.enable = true Producer uses alternate node in bootstrap.servers to find new partition leader. Fail-over produces message loss. Producer 1 message, acks = 1 Ack
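The fail-over sequence on slides 57 to 62 can be condensed into a toy election function. `elect_leader` is a hypothetical name, offsets stand for the number of messages each replica holds, and taking the first surviving replica is a simplification made for this sketch.

```python
def elect_leader(failed_leader_offset, survivors, isr, unclean_leader_election_enable):
    """Return (new leader, acknowledged messages lost), or (None, 0) if offline.

    `survivors` maps each surviving broker to the offset it has replicated.
    """
    in_sync = [broker for broker in survivors if broker in isr]
    if in_sync:
        new_leader = in_sync[0]
    elif unclean_leader_election_enable and survivors:
        new_leader = next(iter(survivors))  # out-of-sync replica takes over
    else:
        return None, 0  # partition stays offline until an ISR member returns
    # With acks=1 the old leader may have acknowledged messages nobody else has.
    return new_leader, failed_leader_offset - survivors[new_leader]


# Slide 58: the leader fails one message ahead of its in-sync follower.
clean = elect_leader(100, {"broker 1": 99, "broker 3": 92}, {"broker 1"}, True)
# Slide 61: only broker 3 is left, 15 messages behind and outside the ISR.
unclean = elect_leader(100, {"broker 3": 85}, set(), True)
offline = elect_leader(100, {"broker 3": 85}, set(), False)
```

Enabling unclean leader election keeps the partition available at the price of the 15 acknowledged messages; disabling it keeps the data but takes the partition offline.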
  63. 63. Apache Kafka – Data Safety, Higher Latency Optimizing for Data Safety (Increased Latency, Lower Availability) acks = all replication.factor = 3 min.insync.replicas = 2 a quorum (n+1)/2
  64. 64. Apache Kafka – Data Safety, Lower Availability Broker 2 Partition 0 Leader Broker 3 Broker 1 Partition 0 Follower Partition 0 Follower Topic with: - Replication factor = 3 - replica.lag.time.max.ms = 10000 - replica.fetch.wait.max.ms = 500 - min.insync.replicas = 2 -1 -4 Broker 1 averaging 0.25 seconds lag. Broker 3 averaging 4 seconds lag 0:00 0:04 Producer 1 message, acks = all
  65. 65. Apache Kafka – Data Safety, Lower Availability Broker 2 Partition 0 Leader Broker 3 Broker 1 Partition 0 Follower Partition 0 Follower Topic with: - Replication factor = 3 - replica.lag.time.max.ms = 10000 - replica.fetch.wait.max.ms = 500 - min.insync.replicas = 2 0 0 Broker 3 averaging 4 seconds lag 0:00 0:00 Producer Ack +4000 ms due to replicas
  66. 66. Apache Kafka – Data Safety, Lower Availability Broker 2 Partition 0 Leader Broker 3 Broker 1 Partition 0 Follower Partition 0 Follower Topic with: - Replication factor = 3 - replica.lag.time.max.ms = 10000 - replica.fetch.wait.max.ms = 500 - min.insync.replicas = 2 0 -14 Broker 3 removed from ISR. Still two replicas in ISR. 0:00 0:14 Producer 1 message, acks = all Ack +250 ms due to replicas
  67. 67. Apache Kafka – Data Safety, Lower Availability Broker 2 Partition 0 Leader Broker 3 Broker 1 Partition 0 Follower Partition 0 Follower Topic with: - Replication factor = 3 - replica.lag.time.max.ms = 10000 - replica.fetch.wait.max.ms = 500 - min.insync.replicas = 2 0 -14 Broker 2 is lost. 0:00 0:14 Producer 1 message, acks = all Connection error
  68. 68. Apache Kafka – Data Safety, Lower Availability Broker 2 Partition 0 Leader Broker 3 Broker 1 Partition 0 Leader Partition 0 Follower Topic with: - Replication factor = 3 - replica.lag.time.max.ms = 10000 - replica.fetch.wait.max.ms = 500 - min.insync.replicas = 2 -14 The partition fails over to Broker 1 without message loss. It will not accept more messages as the ISR has only 1 node. 0:14 Producer 1 message, acks = all NotEnoughReplicas
  69. 69. Apache Kafka – Data Safety, Lower Availability Broker 2 Partition 0 Leader Broker 3 Broker 1 Partition 0 Leader Partition 0 Follower Topic with: - Replication factor = 3 - replica.lag.time.max.ms = 10000 - replica.fetch.wait.max.ms = 500 - min.insync.replicas = 2 0 Broker 3 catches up. Producer retries and receives ack. 0:00 Producer 1 message, acks = all Ack
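The interaction between `acks = all` and `min.insync.replicas` in slides 63 to 69 can be sketched as a small decision function. `produce_result` and `quorum_size` are hypothetical names; the quorum helper uses the slide's (n+1)/2 formula with integer division, which matches a strict majority for odd replication factors.

```python
def quorum_size(replication_factor):
    """The slide's (n+1)/2 quorum; a strict majority when n is odd."""
    return (replication_factor + 1) // 2


def produce_result(acks, isr_size, min_insync_replicas):
    """What the producer gets back for a write, given the current ISR size."""
    if acks == "all" and isr_size < min_insync_replicas:
        return "NotEnoughReplicas"  # the write is rejected and can be retried
    return "Ack"


min_isr = quorum_size(3)
full_isr = produce_result("all", isr_size=3, min_insync_replicas=min_isr)
two_in_isr = produce_result("all", isr_size=2, min_insync_replicas=min_isr)
one_in_isr = produce_result("all", isr_size=1, min_insync_replicas=min_isr)
leader_only = produce_result("1", isr_size=1, min_insync_replicas=min_isr)
```

The last call illustrates that the minimum ISR size only protects writes sent with `acks = all`.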
  70. 70. Apache Kafka – Consumer Offset Tracking 1 2 3 4 5 6 7 Consumer 1 Read batch at offset Commit offset Optimizing for Throughput - Auto-commit (long period) Optimizing for Duplicate Read Avoidance - Auto-commit (short period) - Manual Commit Consumer Offset Commits - Auto-commit periodically - Manual Commit
  71. 71. Apache Kafka – Consumer Side Duplication. Low # of Messages in Flight = Low Throughput, Low Message Duplication on Failure. Large # of Messages in Flight = High Throughput, High Message Duplication on Failure. Scenario 1: 1000 messages in flight when the consumer fails. 25% of the messages are processed, but the application fails before the offset is committed. The replacement consumer fetches 1000 messages from the same offset and 250 messages get processed a second time. Scenario 2: 1 message in flight when the consumer fails. The message is processed, but the application fails before the offset is committed. The replacement consumer fetches 1 message from the same offset and 1 message gets processed a second time.
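The offset-commit behaviour behind slide 71 can be replayed with a list standing in for a partition. `consume_batch` is a hypothetical function, and it assumes the offset is committed only once the whole batch has been processed.

```python
def consume_batch(log, committed_offset, batch_size, crash_after=None):
    """Process one batch; return (processed messages, committed offset afterwards)."""
    batch = log[committed_offset:committed_offset + batch_size]
    processed = []
    for index, message in enumerate(batch):
        if crash_after is not None and index == crash_after:
            # The application dies here: the offset was never committed.
            return processed, committed_offset
        processed.append(message)
    return processed, committed_offset + len(batch)


partition = list(range(1000))
# The first consumer processes 250 of 1000 fetched messages, then fails.
first_run, offset = consume_batch(partition, 0, 1000, crash_after=250)
# The replacement consumer fetches again from the same committed offset.
second_run, offset_after_restart = consume_batch(partition, offset, 1000)
processed_twice = set(first_run) & set(second_run)
```

Committing more often, or fetching smaller batches, shrinks the window in which processed messages can be read again.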
  72. 72. Apache Kafka Network Partitions? Slow Network Links? Flaky Links? See Part 3…
  73. 73. RabbitMQ Kafka Fire-and-forget Publisher Confirms Availability During Replica Synchronization Fire-and-forget Leader Only All ISR Producer Side Idempotency Tunable Consistency Vs Availability Vs Latency Vs Throughput Configurable Redundancy Message Acknowledgements Replica Synchronization Can Cause Unavailability Consumer Side Redelivered flag Synchronous Replication Synchronous/ Asynchronous Replication
  74. 74. Thank you! Questions? Jack Vanlightly 12 NOVEMBER 2018 London, UK With keynotes and speakers from Goldman Sachs, Pivotal, Wunderlist/Microsoft, Erlang Solutions, CloudAMQP and more! Get your EARLY BIRD TICKET + 10% discount now! Early Bird ends 31 August
