
Data Loss and Duplication in Kafka


  1. Data Loss and Data Duplication in Kafka. Presented by Jayesh Thakrar, November 10, 2016. © 2015, Conversant, Inc. All rights reserved.
  2. Kafka is a distributed, partitioned, replicated, durable commit log service. It provides the functionality of a messaging system, but with a unique design. Exactly once - each message is delivered once and only once
  3. Agenda
     • Kafka Overview
     • Data Loss
     • Data Duplication
     • Data Loss and Duplicate Prevention
     • Monitoring
  4. Kafka Overview
  5. Kafka As A Log Abstraction
     Diagram labels: Client: Producer; Client: Consumer A; Client: Consumer B; Kafka Server = Kafka Broker; Topic: app_events
     Source: https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
  6. Topic Partitioning
     Diagram labels: Kafka Broker; Client: Producer or Consumer; Topic: app_events
     Source: https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
  7. Topic Partitioning – Scalability
     Diagram labels: Clients: Producer, Consumer; Kafka Broker 0, Kafka Broker 1, Kafka Broker 2; three groups of Leader, Replica, Replica
  8. Topic Partitioning – Redundancy
     Diagram labels: Client: Producer, Consumer; Kafka Broker 0, Kafka Broker 1, Kafka Broker 2; three groups of Leader, Replica, Replica
  9. Topic Partitioning – Redundancy/Durability
     Diagram labels: Kafka Broker 0, Kafka Broker 1, Kafka Broker 2; three groups of Leader, Replica, Replica; Pull-based inter-broker replication
  10. Topic Partitioning – Summary
      • Log sharded into partitions
      • Messages assigned to partitions by API or custom partitioner
      • Partitions assigned to brokers (manual or automatic)
      • Partitions replicated (as needed)
      • Messages ordered within each partition
      • Message offset = absolute position in partition
      • Partitions stored on filesystem as ordered sequence of log segments (files)
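The partitioning bullets on slide 10 can be illustrated with a small in-memory sketch. It is not Kafka code: `partition_for`, `append`, and the three-partition layout are hypothetical names chosen for the example, and the CRC32 hash is a stand-in (Kafka's own default partitioner uses a different hash). It only shows that messages with the same key land in the same partition and receive increasing offsets there.

```python
import zlib

def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition (illustrative hash, not Kafka's)."""
    return zlib.crc32(key) % num_partitions

# Each partition is its own ordered log; an offset is a position in that log.
partitions = {p: [] for p in range(3)}

def append(key: bytes, value: str):
    """Append value to the partition chosen by key; return (partition, offset)."""
    p = partition_for(key, len(partitions))
    partitions[p].append(value)
    return p, len(partitions[p]) - 1

first = append(b"user-42", "login")
second = append(b"user-42", "click")
```

Ordering is only guaranteed inside one partition, which is why keying related messages consistently matters.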
  11. Other Key Concepts
      • Cluster = collection of brokers
      • Broker-id = a unique id (integer) assigned to each broker
      • Controller = functionality within each broker responsible for leader assignment and management, with one being the active controller
      • Replica = partition copy, represented (identified) by the broker-id
      • Assigned replicas = set of all replicas (broker-ids) for a partition
      • ISR = In-Sync Replicas = subset of assigned replicas (brokers) that are “in-sync/caught-up”* with the leader (ISR always includes the leader)
  12. Data Loss
  13. Data Loss: Inevitable
      • Up to 0.01% data loss
      • For 700 billion messages / day, that's up to 70 million / day (0.01% of 700 billion)
  14. Data Loss at the Producer
      Kafka Producer API
      API call-tree:
          kafkaProducer.send()
          …. accumulator.append() // buffer
          …. sender.send() // network I/O
      • Messages accumulate in buffer in batches
      • Batched by partition, retry at batch level
      • Expired batches dropped after retries
      • Error count and other metrics via JMX
      Data Loss at Producer
      • Failure to close / flush producer on termination
      • Dropped batches due to communication or other errors when acks = 0 or retry exhaustion
      • Data produced faster than delivery, causing BufferExhaustedException (deprecated in 0.10+)
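The first loss cause on slide 14 (terminating without close / flush) follows from the call-tree: `send()` only appends to a buffer. A toy model, assuming nothing about the real client beyond that buffering behavior, might look like this. `BufferedProducer` and its `deliver` callback are hypothetical names for the sketch.

```python
from collections import defaultdict

class BufferedProducer:
    """Toy model of an asynchronous producer: send() only buffers."""

    def __init__(self, deliver):
        self.deliver = deliver             # callable standing in for network I/O
        self.batches = defaultdict(list)   # partition -> pending messages

    def send(self, partition, msg):
        # Returns immediately; nothing has reached the broker yet.
        self.batches[partition].append(msg)

    def flush(self):
        # Hand every pending batch, grouped by partition, to the "network".
        for partition, batch in self.batches.items():
            self.deliver(partition, list(batch))
        self.batches.clear()

delivered = []
producer = BufferedProducer(lambda partition, batch: delivered.extend(batch))
producer.send(0, "a")
producer.send(1, "b")
lost_if_exit_now = len(delivered) == 0   # both messages are still only in memory
producer.flush()                         # the step a clean shutdown must not skip
```

If the process exits where `lost_if_exit_now` is computed, both messages are gone without any error being raised.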
  15. Data Loss at the Cluster (by Brokers)
      Flowchart (node labels only; branches are marked Y/N and steps are numbered 1-7 on the slide, but the branch wiring is not preserved here):
      • Broker crashes
      • Detected by Controller via ZooKeeper
      • Was it a leader?
      • Was it in ISR?
      • Other replicas in ISR?
      • Elect another leader
      • Allow unclean election?
      • Other replicas available?
      • ISR >= min.insync.replicas?
      • Outcomes: "Relax, everything will be fine" or "Partition unavailable !!"
  16. Non-leader broker crash (same flowchart as slide 15)
  17. Leader broker crash: Scenario 1 (same flowchart as slide 15)
  18. Leader broker crash: Scenario 2 (same flowchart as slide 15)
  19. Data Loss at the Cluster (by Brokers) (same flowchart as slide 15, annotated: Potential data loss depending upon acks config at producer. See KAFKA-3919, KAFKA-4215)
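The broker-crash flowchart on slides 15 to 19 can be read as a decision function. The sketch below is one plausible reading only: the transcript keeps the node labels but not which branch leads where, so the wiring, the function name `after_broker_crash`, and its parameters are all assumptions made for illustration.

```python
def after_broker_crash(was_leader, was_in_isr, isr_left, replicas_left,
                       min_insync_replicas, allow_unclean_election):
    """Return the partition's state after a broker holding one replica crashes.

    isr_left      -- in-sync replicas remaining after the crash
    replicas_left -- all live replicas remaining (in sync or not)
    """
    if was_leader:
        if isr_left == 0:
            # No in-sync candidate: only an unclean election can restore service.
            if allow_unclean_election and replicas_left > 0:
                return "new leader elected (unclean, potential data loss)"
            return "partition unavailable"
        # Otherwise an in-sync replica takes over as leader (clean election).
    elif not was_in_isr:
        # A lagging follower was lost; the ISR itself is unchanged.
        return "ok"
    # The ISR shrank by one; check it against the topic's minimum.
    if isr_left >= min_insync_replicas:
        return "ok"
    return "partition unavailable"

# A follower in the ISR dies and leaves 1 in-sync replica with min.insync.replicas = 2.
state = after_broker_crash(False, True, 1, 2, 2, False)
```

Under this reading, the unclean-election branch is the one that trades durability for availability.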
  20. From KAFKA-3919
  21. From KAFKA-4215
  22. Config for Data Durability and Consistency
      • Producer config
        - acks = -1 (or all)
        - max.block.ms (blocking on buffer full, default = 60000) and retries
        - request.timeout.ms (default = 30000) – it triggers retries
      • Topic config
        - min.insync.replicas = 2 (or higher)
      • Broker config
        - unclean.leader.election.enable = false
        - timeout.ms (default = 30000) – inter-broker timeout for acks
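The durability settings on slide 22 can be written down as plain dictionaries and checked mechanically. This is a sketch, not a Kafka API: `durability_gaps` is a hypothetical helper, the value `retries = 3` is an assumed example (the slide gives no number), and treating a missing setting as unsafe is a choice made for the sketch.

```python
# Setting names are taken from the slide; retries = 3 is an assumed example value.
durable_producer = {"acks": "all", "retries": 3,
                    "max.block.ms": 60000, "request.timeout.ms": 30000}
durable_topic = {"min.insync.replicas": 2}
durable_broker = {"unclean.leader.election.enable": False}

def durability_gaps(producer, topic, broker):
    """List the ways a configuration departs from the slide's durable profile."""
    gaps = []
    if str(producer.get("acks")) not in ("all", "-1"):
        gaps.append("acks should be -1 (all)")
    if int(producer.get("retries", 0)) < 1:
        gaps.append("retries should be greater than 0")
    if int(topic.get("min.insync.replicas", 1)) < 2:
        gaps.append("min.insync.replicas should be 2 or higher")
    # A missing setting is treated as unsafe here (an assumption of this sketch).
    if broker.get("unclean.leader.election.enable", True):
        gaps.append("unclean.leader.election.enable should be false")
    return gaps

gaps_for_durable = durability_gaps(durable_producer, durable_topic, durable_broker)
gaps_for_fast = durability_gaps({"acks": 1}, {}, {})
```

A check like this could run at deployment time so that a throughput-oriented profile is never applied to a topic that needs durability.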
  23. Config for Availability and Throughput
      • Producer config
        - acks = 0 (or 1)
        - buffer.memory, batch.size, linger.ms (default = 0)
        - request.timeout.ms, max.block.ms (default = 60000), retries
        - max.in.flight.requests.per.connection
      • Topic config
        - min.insync.replicas = 1 (default)
      • Broker config
        - unclean.leader.election.enable = true
  24. Data Duplication
  25. Data Duplication: How It Occurs
      Diagram labels: Client: Producer; Client: Consumer A; Client: Consumer B; Kafka Broker; Topic: app_events
      • Producer (API) retries = messages resent after timeout when retries > 0
      • Consumer consumes messages more than once after restart from unclean shutdown / crash
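The producer-side duplication on slide 25 can be reproduced in a few lines: the broker writes the message, the ack is lost, and the retry writes it again. The sketch is a simulation with hypothetical names (`broker_append`, `send_with_retries`, `lost_acks`), not client code.

```python
log = []  # the partition's log on the broker

def broker_append(msg, ack_lost):
    """Append msg to the log; return True only if the ack reaches the producer."""
    log.append(msg)
    return not ack_lost

def send_with_retries(msg, retries, lost_acks):
    """Resend until an ack arrives or retries run out.

    lost_acks is the number of initial attempts whose ack is dropped.
    """
    for attempt in range(retries + 1):
        if broker_append(msg, ack_lost=attempt < lost_acks):
            return True
    return False

acked = send_with_retries("order-1", retries=2, lost_acks=1)
```

After this call the producer believes it sent one message, while the log holds two copies of it.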
  26. Data Loss & Duplication Detection
  27. How to Detect Data Loss & Duplication - 1
      Components: Producer, Kafka, Consumer, and a store (Memcache / HBase / Cassandra / other)
      Store layout: KEY = topic, partition, offset | VALUE = msg key or hash
      1) Msg from producer to Kafka
      2) Ack from Kafka with details
      3) Producer inserts into store
      4) Consumer reads msg
      5) Consumer validates msg
         - If exists: not a duplicate; consume msg, then delete msg from the store
         - If missing: duplicate msg
      Audit: remaining msgs in store are "lost" or "unconsumed" msgs
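The five steps on slide 27 can be sketched with a dictionary standing in for the external store. The function names `on_ack` and `on_consume`, the SHA-256 digest, and the topic name used in the usage lines are assumptions for illustration; the slide only specifies the key and value layout.

```python
import hashlib

store = {}  # stands in for Memcache / HBase / Cassandra

def digest(msg: bytes) -> str:
    return hashlib.sha256(msg).hexdigest()

def on_ack(topic, partition, offset, msg):
    """Step 3: the producer records each acked message under its coordinates."""
    store[(topic, partition, offset)] = digest(msg)

def on_consume(topic, partition, offset):
    """Step 5: an entry that exists is consumed and deleted; a missing one is a duplicate."""
    key = (topic, partition, offset)
    if key in store:
        del store[key]
        return "ok"
    return "duplicate"

for offset, msg in enumerate([b"m0", b"m1", b"m2"]):
    on_ack("app_events", 0, offset, msg)

results = [on_consume("app_events", 0, 0),
           on_consume("app_events", 0, 1),
           on_consume("app_events", 0, 1)]   # offset 1 is delivered twice
unconsumed = sorted(store)                   # audit: what is left was lost or not yet read
```

Offset 2 was acked but never consumed, so it is the only entry the audit finds.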
  28. How to Detect Data Loss & Duplication - 2
      Components: Producer, Kafka, Consumer, and a store (Memcache / HBase / Cassandra / other)
      Store layout: KEY = source, time-window | VALUE = msg count or some other checksum (e.g. totals, etc.)
      1) Msg from producer to Kafka
      2) Ack from Kafka with details
      3) Producer maintains window stats
      4) Consumer reads msg
      5) Consumer validates window stats at end of interval
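The window-statistics variant on slide 28 compares per-window counts instead of tracking individual messages. A minimal sketch, assuming a 60-second window and message timestamps in seconds (neither is given on the slide), with hypothetical helper names:

```python
from collections import Counter

WINDOW_SECONDS = 60  # assumed window size

produced = Counter()  # (source, window) -> count, maintained by the producer
consumed = Counter()  # same key, maintained by the consumer

def record(counter, source, timestamp):
    """Count one message in the time window its timestamp falls into."""
    counter[(source, int(timestamp) // WINDOW_SECONDS)] += 1

def audit():
    """Return consumed minus produced for every window whose counts differ."""
    keys = set(produced) | set(consumed)
    return {k: consumed[k] - produced[k] for k in keys if consumed[k] != produced[k]}

for ts in (5, 10, 20, 65):
    record(produced, "web", ts)
for ts in (5, 10, 65, 65):        # one message missing in window 0, one repeated in window 1
    record(consumed, "web", ts)

differences = audit()
```

A negative difference points to loss in that window and a positive one to duplication. Counts alone cannot tell which messages were affected, and a loss and a duplicate in the same window cancel out.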
  29. Data Duplication: How to Minimize at Consumer
      Diagram labels: Client: Producer; Client: Consumer A; Client: Consumer B; Kafka Broker; Topic: app_events
      • If possible, lookup last processed offset in destination at startup
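The advice on slide 29 works when the destination stores each record together with its source offset. The sketch below assumes exactly that; `Destination`, `run_consumer`, and the single-partition list `log` are hypothetical stand-ins, not part of any Kafka client.

```python
class Destination:
    """Toy sink that stores each record together with its source offset."""

    def __init__(self):
        self.rows = []  # list of (offset, value)

    def last_offset(self):
        return self.rows[-1][0] if self.rows else -1

    def write(self, offset, value):
        self.rows.append((offset, value))

def run_consumer(log, dest, committed_offset, stop_at=None):
    """Process log entries, starting after whatever dest already holds."""
    start = max(committed_offset, dest.last_offset() + 1)
    end = len(log) if stop_at is None else stop_at
    for offset in range(start, end):
        dest.write(offset, log[offset])

log = ["a", "b", "c", "d"]
dest = Destination()
run_consumer(log, dest, committed_offset=0, stop_at=3)  # crash before the offset is committed
run_consumer(log, dest, committed_offset=0)             # restart: committed offset is stale
```

Without the lookup, the restart would begin at the stale committed offset 0 and write offsets 0 to 2 a second time.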
  30. Monitoring
  31. Monitoring and Operations: JMX Metrics (Producer JMX, Consumer JMX)
  32. Questions?
  33. Jayesh Thakrar, jthakrar@conversantmedia.com
