Intro
I'll tell the story of how we hunted down, and finally fixed, a Heisenbug in a
system that should have prevented it by design in the first place.
The story involves Kafka, Kafka Connect, Elasticsearch, optimistic concurrency
control, data inconsistencies, and an SRE with plenty of good intentions who,
through a series of unfortunate circumstances, caused a nasty bug.
Setup (1)
A detailed description of the Elasticsearch indexing pipeline setup:
https://vinted.engineering/2021/01/12/elasticsearch-indexing-pipeline/
Setup (3): Elasticsearch
- Optimistic concurrency control
  - The client sends the ‘_version’ number of the document in the indexing request
  - Elasticsearch promises that the document with the highest version number is the one that stays searchable
- E.g.
  - A user changes the price of her listing on Vinted
  - The change results in a new document version
  - Elasticsearch stores only the newer version of the listing, with the updated price
- Gist: Elasticsearch stores only the newest version of a listing (see the sketch below)
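A minimal sketch of this behavior with the official Python client (the index name, document, and cluster address are made up; version_type="external" is Elasticsearch's client-supplied versioning mode):

from elasticsearch import Elasticsearch
from elasticsearch.exceptions import ConflictError

es = Elasticsearch("http://localhost:9200")  # hypothetical cluster address

# Index version 2 of a listing; "external" means the client owns the version.
es.index(index="listings", id="1", document={"price": 15.0},
         version=2, version_type="external")

# A stale write with a lower (or equal) version is rejected with a 409,
# so the document with the highest version always wins.
try:
    es.index(index="listings", id="1", document={"price": 10.0},
             version=1, version_type="external")
except ConflictError:
    print("stale write rejected")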
Setup (4): Kafka
- Data is not deleted when it gets “old”
  - retention.ms = -1
  - Needed to support reindexing data into Elasticsearch
- Log compaction
  - Kafka will always retain at least the last known value for each message key
  - This ensures we don't run out of disk space (see the topic sketch below)
- Tombstone messages, i.e. messages with a null body, mark a key for deletion
- Newer messages have higher offsets within a Kafka topic partition
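For illustration, such a topic could be created like this with kafka-python (broker address, topic name, and sizing are assumptions):

from kafka.admin import KafkaAdminClient, NewTopic

admin = KafkaAdminClient(bootstrap_servers="kafka:9092")  # hypothetical broker
admin.create_topics([NewTopic(
    name="listings",
    num_partitions=6,
    replication_factor=3,
    topic_configs={
        "retention.ms": "-1",         # never delete data by age
        "cleanup.policy": "compact",  # keep at least the last value per key
    },
)])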
Setup (5): Kafka Connect
- A framework and a library
- Reads listing data from Kafka topics
- Indexes listings into Elasticsearch
- Error handling (e.g. a dead letter queue)
- Configuration and management (see the registration sketch below)
- Indexing throughput
- Concurrency
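Connectors are registered through Kafka Connect's REST API. A hedged sketch, using Confluent's Elasticsearch sink connector class as a stand-in (our actual in-house connector differs; all host and topic names are made up):

import requests

resp = requests.post("http://connect:8083/connectors", json={
    "name": "listings-es-sink",
    "config": {
        # Stand-in connector class, not the actual in-house implementation.
        "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
        "topics": "listings",
        "connection.url": "http://elasticsearch:9200",
        "tasks.max": "6",           # concurrency: up to one task per partition
        "errors.tolerance": "all",  # don't stop on bad records...
        "errors.deadletterqueue.topic.name": "listings-dlq",  # ...route them to a DLQ
    },
})
resp.raise_for_status()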
Setup: TL;DR
We use the Kafka topic partition offset as the Elasticsearch document _version
number.
This trick lets us parallelize indexing into Elasticsearch while staying
worry-free about data consistency.
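A minimal sketch of the idea with kafka-python and the official Elasticsearch client (topic, index, and host names are made up):

import json
from kafka import KafkaConsumer
from elasticsearch import Elasticsearch

consumer = KafkaConsumer("listings", bootstrap_servers="kafka:9092",
                         group_id="es-indexer")
es = Elasticsearch("http://localhost:9200")

for record in consumer:
    doc_id = record.key.decode()
    if record.value is None:
        # Tombstone: delete the document, still guarded by the offset version.
        es.delete(index="listings", id=doc_id,
                  version=record.offset, version_type="external")
    else:
        # Upsert with the partition offset as the external version; a 409
        # conflict would mean a record with a higher offset already won.
        es.index(index="listings", id=doc_id,
                 document=json.loads(record.value),
                 version=record.offset, version_type="external")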
Works on My Machine
- Docker Compose cluster
- Integration tests are in place
- Works as expected
Testing
Tested the functionality in the shared testing environment:
● Single node Kafka
● Single node Kafka Connect cluster
● Single node Elasticsearch
Works as expected.
Let me try
- I tried sending a “tombstone” message (i.e. a Kafka record with a null body)
directly to the Kafka topic (sketch below).
- Shockingly, the document was still present in the Elasticsearch index!!!
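Reproducing the experiment takes a few lines with kafka-python (broker and topic names are assumptions; the key is a hypothetical listing id):

from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers="kafka:9092")  # hypothetical broker
# A record with a key and a null body is a tombstone: downstream it should
# delete the Elasticsearch document for that key.
producer.send("listings", key=b"996229491", value=None)
producer.flush()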
Once again
A document in an Elasticsearch index should have a _version equal to the
offset of the corresponding message in the Kafka topic partition.
Elasticsearch has this Document
$ curl prod:9200/core-items_20200329084723/_doc/996229491?_source=false | jq
{
  "_index": "core-items_20200329084723",
  "_type": "_doc",
  "_id": "996229491",
  "_version": 734232221,
  "_seq_no": 22502992,
  "_primary_term": 1,
  "found": true
}
Version is 734232221
Who Changed the Number of Kafka Topic Partitions?
I opened the Grafana dashboard and noticed that a couple of months earlier
the partition count had been increased from 6 to 24.
Problem
1. Kafka guarantees ordering of messages for a key in a partition.
2. But not across partitions for the same key!!!
The Technical Reason (1)
- Kafka assigns messages to partitions by hashing the message key
- But increasing the partition count changed the result of that function (see the demo below)!
partition_nr = hash(message.key) % partition_count
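The effect is easy to demonstrate (crc32 below is a stand-in for Kafka's default murmur2 partitioner; the key is arbitrary):

import zlib

def partition_for(key: bytes, partition_count: int) -> int:
    # Stand-in for Kafka's default: hash(message.key) % partition_count.
    return zlib.crc32(key) % partition_count

key = b"996229491"
print(partition_for(key, 6))   # partition while the topic had 6 partitions
print(partition_for(key, 24))  # usually a different partition with 24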
The Technical Reason (2)
After the partition count increase, most keys mapped to a different partition
than before. The hash is unchanged and 24 is a multiple of 6, so a key keeps
its old partition only when hash(message.key) % 24 < 6:
probability_of_moving = 1 - (old_partition_count / new_partition_count) = 1 - 6/24 = 75%
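A quick empirical check over synthetic keys (same crc32 stand-in as in the previous demo):

import zlib

keys = [str(i).encode() for i in range(100_000)]
moved = sum(zlib.crc32(k) % 6 != zlib.crc32(k) % 24 for k in keys)
print(moved / len(keys))  # ~0.75, matching 1 - 6/24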
Why would one increase the partition count?
- A partition is the unit of scalability in Kafka.
  - write scalability (a single partition must fit on one node)
  - read scalability (within a consumer group, each partition is consumed by at most one consumer, so the partition count caps read parallelism)
Fix
- Required a full re-ingestion of data from the primary datastore into Kafka.
- It would have been enough to just write the data to differently named topics.
- However, we used the opportunity to upgrade the Kafka cluster from 1.1.1 to 2.4.0 (yes, another Kafka cluster).
How to prevent such a bug?
- Don’t increase the partition count if you rely on message ordering!
- Pick sensible defaults for Kafka settings (e.g. create topics with enough partitions up front).
- If you don’t rely on offsets or ordering, e.g. messages have no meaningful key (think logging), then increasing the partition count won’t cause any big trouble (just a consumer group rebalance).