
Don't change the partition count for kafka topics!




  1. Don't Change the Partition Count for Kafka Topics!
  2. $ whoami { "name": "Dainius Jocas", "company": { "name": "Vinted", "mission": "Make second-hand the first choice worldwide" }, "position": "Staff Engineer", "website": "https://www.jocas.lt", "twitter": "@dainius_jocas", "github": "dainiusjocas", "author_of_oss": ["lucene-grep"] }
  3. Agenda: 1. Intro 2. Setup 3. Heisenbug 4. Fix 5. Discussion
  4. Intro: I'll tell a story about how we hunted down a Heisenbug in a system that should have prevented it by design in the first place, and how we finally fixed it. The story involves Kafka, Kafka Connect, Elasticsearch, optimistic concurrency control, data inconsistencies, and SREs with plenty of good intentions who, through a series of unfortunate circumstances, caused a nasty bug.
  5. Setup (1): A detailed description of the Elasticsearch indexing pipeline setup: https://vinted.engineering/2021/01/12/elasticsearch-indexing-pipeline/
  6. Setup (2)
  7. Setup (3): Elasticsearch - Optimistic concurrency control - The client sends the ‘_version’ number of the document in the indexing request - Elasticsearch promises that the document with the highest version number is searchable - E.g. - A user changes the price of her listing on Vinted - The change results in a new document version - Elasticsearch stores only the newer version of the listing, with the updated price - Gist: Elasticsearch stores only the newest version of a listing
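The versioning rule on this slide can be sketched in a few lines. This is a minimal, illustrative model (all names are made up, not Elasticsearch's API): a write is applied only if its version is strictly higher than the stored one, which is the semantics of external versioning.

```python
# Sketch of Elasticsearch-style optimistic concurrency control with
# external versioning: a write is applied only if its version is
# strictly higher than the stored one. Names are illustrative.

class VersionedStore:
    def __init__(self):
        self.docs = {}  # doc_id -> (version, body)

    def index(self, doc_id, version, body):
        """Apply the write only if `version` beats the stored version."""
        current = self.docs.get(doc_id)
        if current is not None and version <= current[0]:
            return False  # stale write rejected (version conflict)
        self.docs[doc_id] = (version, body)
        return True

store = VersionedStore()
store.index("listing-1", 10, {"price": 5})
store.index("listing-1", 12, {"price": 7})          # newer version wins
stale = store.index("listing-1", 11, {"price": 6})  # late, older write
print(stale, store.docs["listing-1"])
```

Note that the rejection of the stale write is what lets writers race safely: whatever order the requests arrive in, the highest version is what remains searchable.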
  8. Setup (4): Kafka - Data is not deleted when it gets “old” - retention.ms = -1 - Needed to support reindexing data into Elasticsearch - Log compaction - Kafka will always retain at least the last known value for each message key - This makes sure that we don't run out of disk space - Tombstone messages, i.e. messages with a null body, are used for deletion - Newer messages have a higher offset in a Kafka topic partition
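The compaction and tombstone behaviour described above can be sketched as a simple fold over the log. This is an illustrative model of the semantics, not Kafka's actual compaction implementation: only the last value per key survives, and a null body deletes the key.

```python
# Sketch of log-compaction semantics: Kafka retains at least the last
# value for each key, and a tombstone (a record with a null body)
# eventually removes the key. Illustrative model, not Kafka internals.

def compacted_state(log):
    """log: (key, value) pairs in offset order; value None = tombstone."""
    state = {}
    for key, value in log:
        if value is None:
            state.pop(key, None)  # tombstone: key removed after compaction
        else:
            state[key] = value    # only the newest value survives
    return state

log = [
    ("996229491", '{"price": 5}'),
    ("996229491", '{"price": 7}'),
    ("996229491", None),          # tombstone: delete the listing
]
print(compacted_state(log))
```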
  9. Setup (5): Kafka Connect - A framework and a library - Reads listing data from Kafka topics - Indexes listings into Elasticsearch - Error handling (e.g. dead letter queue) - Configuration, management - Indexing throughput - Concurrency
  10. Setup: TL;DR We use the Kafka topic partition offset as the Elasticsearch document _version number. This trick allows us to parallelize indexing into Elasticsearch and is worry-free from the data-consistency point of view.
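A short sketch of this trick, under the assumption of a single partition per key (names are illustrative): because offsets within a partition are monotonically increasing, using the offset as the version means concurrent indexers can apply records in any order and the record with the highest offset still wins.

```python
# Sketch of "offset as _version": parallel workers may apply records
# out of order, but external versioning makes the highest offset win.

def apply_record(index, doc_id, offset, body):
    version, _ = index.get(doc_id, (-1, None))
    if offset > version:             # highest offset (newest record) wins
        index[doc_id] = (offset, body)

index = {}
# Records for one key, processed out of order by concurrent indexers:
for offset, body in [(3, "price=7"), (1, "price=5"), (2, "price=6")]:
    apply_record(index, "listing-1", offset, body)
print(index["listing-1"])
```

This only holds because all records for a key live in one partition; that assumption is exactly what breaks later in the story.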
  11. Heisenbug Elasticsearch fails to delete documents(!!!), i.e. serves stale data???
  12. Works on My Machine - Docker Compose cluster - Integration tests are in place - Works as expected
  13. Testing Tested the functionality in the shared testing environment: ● Single-node Kafka ● Single-node Kafka Connect cluster ● Single-node Elasticsearch Works as expected.
  14. Let Me Try - I tried to send a “tombstone” message (i.e. a Kafka record with a null body) directly to the Kafka topic. - Shockingly, the document was still present in the Elasticsearch index!!!
  15. Once Again A document in an Elasticsearch index should have a _version that is equal to the offset attribute of the message in the Kafka topic partition.
  16. Elasticsearch Has This Document $ curl prod:9200/core-items_20200329084723/_doc/996229491?_source=false | jq { "_index": "core-items_20200329084723", "_type": "_doc", "_id": "996229491", "_version": 734232221, "_seq_no": 22502992, "_primary_term": 1, "found": true } Version is 734232221
  17. Tombstone Message $ eim topic delete_records --topic=core-items --keys=996229491 { "offsets": [ { "partition": 17, "offset": 13361612, "error_code": null, "error": null } ] } Version is 13361612
  18. Hmm? 734232221 vs. 13361612
  19. Eureka! 734232221 vs. 13361612 - The newer message has a lower offset??? - How come the “older” record has a higher offset???
  20. (image slide)
  21. Who Changed the Number of Kafka Topic Partitions? I opened the Grafana dashboard and noticed that a couple of months earlier the partition count had been increased from 6 to 24.
  22. Problem 1. Kafka guarantees the ordering of messages for a key within a partition. 2. But not across partitions for the same key!!!
  23. The Technical Reason (1) - Kafka assigns messages to partitions by hashing the key of the message - But the increased partition count changed the function! partition_nr = hash(message.key) % partition_count
  24. The Technical Reason (2) Most of the messages with a given key were written to a different partition after the increase in partition count: probability_of_error = 1 - (1 / partition_count)
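The remapping on these two slides is easy to demonstrate. A sketch, using `zlib.crc32` as a stand-in for Kafka's actual murmur2 hash (the effect is the same for any well-distributed hash):

```python
# Sketch of the remapping: partition = hash(key) % partition_count, so
# changing the count moves most keys to a different partition.
# zlib.crc32 stands in for Kafka's murmur2 hash (illustrative only).
import zlib

def partition_for(key, partition_count):
    return zlib.crc32(key.encode()) % partition_count

keys = [f"listing-{i}" for i in range(10_000)]
moved = sum(partition_for(k, 6) != partition_for(k, 24) for k in keys)
# In the specific 6 -> 24 case (old count divides the new one), a key
# keeps its partition only when hash % 24 < 6, so about 3/4 of keys move;
# 1 - (1 / partition_count) is the general estimate for arbitrary resizes.
print(f"{moved / len(keys):.0%} of keys changed partition")
```

Any key that moves now has its new messages, including tombstones, ordered against an empty history in the new partition, while its old, higher-offset messages sit in the old one.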
  25. Why Would One Increase the Partition Count? - A partition is the scalability unit in Kafka. - write scalability (a partition should fit on one node) - read scalability (each consumer consumes at least one partition)
  26. Fix - Required a full re-ingestion of data from the primary datastore into Kafka. - It would have been enough to just write the data to differently named topics. - However, we used the situation to upgrade the Kafka cluster from 1.1.1 to 2.4.0 (yes, another Kafka cluster).
  27. How to Prevent Such a Bug? - Don’t increase the partition count if you rely on message ordering! - Use sensible defaults in the Kafka settings. - If you don't rely on offsets, e.g. messages have no meaningful key (think logging), then increasing the partition count will not cause any big trouble (just a rebalance of consumer groups).
  28. Thank You!
