The document discusses how increasing the partition count of a Kafka topic caused a "Heisenbug": the key-to-partition mapping changed, so messages with the same key ended up in different partitions and the per-key ordering guarantee no longer held. As a result, Elasticsearch rejected newer messages as stale and failed to delete documents as expected. The bug was fixed by fully re-ingesting the data into a new Kafka cluster. The key lesson is not to change the partition count if an application relies on the ordering of messages within a topic.
3. Intro
I'll tell the story of how we hunted down, and finally fixed, a Heisenbug in a
system that by design should have prevented it in the first place.
The story involves Kafka, Kafka Connect, Elasticsearch, optimistic concurrency
control, data inconsistencies, and SREs with plenty of good intentions who,
through a series of unfortunate circumstances, caused a nasty bug.
4. Setup
A full description of the Elasticsearch indexing pipeline setup:
https://vinted.engineering/2021/01/12/elasticsearch-indexing-pipeline/
6. Setup: TL;DR
We use the Kafka topic partition offset as the Elasticsearch document version
number.
This trick lets us parallelize indexing into Elasticsearch without worrying about
data consistency: a write carrying an older offset can never overwrite a newer one.
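A minimal sketch of the idea, assuming Elasticsearch-style external versioning, where a write is accepted only if its version is strictly greater than the stored one. The class and method names here are hypothetical; this is a toy in-memory model, not the actual pipeline code.

```python
class VersionConflict(Exception):
    """Raised when an incoming version is not newer than the stored one."""


class ExternalVersionIndex:
    """Toy in-memory stand-in for an index that uses external versioning."""

    def __init__(self):
        # doc_id -> (version, body)
        self.docs = {}

    def write(self, doc_id, version, body):
        stored = self.docs.get(doc_id)
        # Accept the write only if the version is strictly greater.
        if stored is not None and version <= stored[0]:
            raise VersionConflict(f"{version} <= {stored[0]}")
        self.docs[doc_id] = (version, body)


index = ExternalVersionIndex()
# The Kafka offset of each record is passed as the document version.
index.write("996229491", version=10, body={"title": "new"})
try:
    # A late-arriving record with a lower offset is rejected.
    index.write("996229491", version=7, body={"title": "old"})
    stale_write_rejected = False
except VersionConflict:
    stale_write_rejected = True
```

Because stale writes are rejected by the index itself, indexing workers can run in parallel and in any order, as long as offsets for a given document only ever grow.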
8. Works on My Machine
- Docker Compose cluster
- Integration tests are in place
- Works as expected
9. Testing
Tested the functionality in the shared testing environment:
● Single node Kafka
● Single node Kafka Connect cluster
● Single node Elasticsearch
Works as expected.
10. Let me try
- I tried sending a “tombstone” message (i.e. a Kafka record with a null value)
directly to the Kafka topic.
- Shockingly, the document was still present in the Elasticsearch index!!!
11. Once again
A document in an Elasticsearch index should have a _version equal to the
offset of the corresponding message in its Kafka topic partition.
12. Elasticsearch has this Document
$ curl prod:9200/core-items_20200329084723/_doc/996229491?_source=false | jq
{
"_index": "core-items_20200329084723",
"_type": "_doc",
"_id": "996229491",
"_version": 734232221,
"_seq_no": 22502992,
"_primary_term": 1,
"found": true
}
Version is 734232221
17. Somebody Changed the Number of Kafka Topic Partitions!
I've opened the Grafana dashboard and noticed that a couple of months ago
the partition count was increased from 6 to 24.
18. Problem
1. Kafka guarantees ordering of messages for a key in a partition.
2. But not across partitions for the same key!!!
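To see why this breaks offset-as-version indexing, here is a toy sketch. It assumes the tombstone for the document landed in a newer partition whose offsets were still far below the stored version 734232221; the tombstone offset of 1500 and the `apply_record` helper are hypothetical.

```python
def apply_record(index, doc_id, offset, value):
    """Apply one Kafka record to a toy index; return True if it was accepted."""
    stored_version = index.get(doc_id, (-1, None))[0]
    if offset <= stored_version:
        # Version conflict: the record is treated as stale and dropped.
        return False
    index[doc_id] = (offset, value)
    return True


index = {}

# Before the partition increase: the key lives in a long-lived partition,
# so the document is indexed with a very high offset as its version.
upsert_accepted = apply_record(index, "996229491", 734232221, {"status": "active"})

# After the increase: the same key maps to a different, younger partition.
# The tombstone gets a small offset (hypothetical value) and is rejected.
delete_accepted = apply_record(index, "996229491", 1500, None)

document_still_present = index["996229491"][1] is not None
```

The delete is newer in wall-clock time, but its version is lower, so the index keeps the old document.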
19. The Technical Reason (1)
- Kafka assigns partitions to messages by hashing the key of the message
- But the increased partition count changed the function!
partition_nr = hash(message.key) % partition_count
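The formula can be tried out directly. In this sketch `zlib.crc32` is only a stand-in for the producer's hash function (an assumption made to keep the example dependency-free), and `partition_for` is a hypothetical helper.

```python
import zlib


def partition_for(key: str, partition_count: int) -> int:
    # zlib.crc32 is a stand-in hash for illustration, not the one Kafka uses.
    return zlib.crc32(key.encode("utf-8")) % partition_count


key = "996229491"
before = partition_for(key, 6)   # partition while the topic had 6 partitions
after = partition_for(key, 24)   # partition once the topic has 24 partitions

print(f"key {key}: partition {before} -> partition {after}")
```

The hash of the key never changes; only the modulus does, and that alone is enough to route the same key to a different partition.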
20. The technical reason (2)
Most keyed messages were written to a different partition after the partition
count was increased. Treating the new assignment as independent of the old one:
probability_of_error = 1 - (1 / partition_count)
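As a sanity check for the 6 to 24 change specifically: with a modulo-based assignment and a new count that is a multiple of the old one, a key keeps its partition only when `hash % 24 < 6`, so the moved share works out to 1 - 6/24. The sketch below counts this over evenly spread hash values; using `range(...)` as the hash values is an assumption standing in for a uniform hash.

```python
def moved_fraction(hashes, old_count, new_count):
    """Share of hash values whose partition differs between the two counts."""
    hashes = list(hashes)
    moved = sum(1 for h in hashes if h % old_count != h % new_count)
    return moved / len(hashes)


# Evenly spread hypothetical hash values stand in for a uniform hash function.
fraction = moved_fraction(range(240_000), old_count=6, new_count=24)
print(fraction)  # 0.75
```

Either way, the large majority of keys changed partition, which is why the inconsistencies were widespread rather than isolated.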
21. Why would one increase the partition count?
- A partition is the unit of scalability in Kafka.
- write scalability (a partition must fit on one node)
- read scalability (each active consumer in a group reads at least one partition)
22. Fix
- The fix required a full re-ingestion of data from the primary datastore into Kafka.
- It would have been enough to just write the data to differently named topics.
- However, we used the situation to upgrade the Kafka cluster from 1.1.1 to
2.4.0 (yes, another Kafka cluster)
23. How to prevent such a bug?
- Don’t increase the partition count if you rely on message ordering!
- Set sensible defaults in the Kafka settings.
- If you don't rely on offsets or per-key ordering, e.g. messages have no
meaningful key (think logging), then increasing the partition count will not
cause any big trouble (just a rebalance of consumer groups).