Don't Change the Partition Count for Kafka Topics!

Dainius Jocas, Staff Engineer @ Vinted
2021-04-08
Agenda
1. Intro
2. Setup
3. Heisenbug
4. Fix
5. Discussion
Intro
I'll tell the story of how we hunted down, and finally fixed, a Heisenbug in a
system that should have prevented it by design.
The story involves Kafka, Kafka Connect, Elasticsearch, optimistic concurrency
control, data inconsistencies, and an SRE with plenty of good intentions who,
through a series of unfortunate circumstances, caused a nasty bug.
Setup
A full description of the Elasticsearch indexing pipeline setup:
https://vinted.engineering/2021/01/12/elasticsearch-indexing-pipeline/
Setup: TL;DR
We use the Kafka topic partition offset as the Elasticsearch document version
number.
This trick lets us parallelize indexing into Elasticsearch without worrying
about data consistency: a write carrying an older version never overwrites a
newer one.
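The versioning rule can be illustrated with a toy in-memory model. This is only a sketch, not the production pipeline: `VersionedIndex` and `apply` are hypothetical names, and the model assumes the rule used by externally versioned writes, namely that a write or delete is accepted only when its version is strictly greater than the stored one. As a simplification, the toy index remembers the version of a deleted document forever.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class VersionedIndex:
    """Toy model of an index whose document versions are supplied by the caller."""

    # doc_id -> (version, body); body is None once the document has been deleted.
    docs: Dict[str, Tuple[int, Optional[dict]]] = field(default_factory=dict)

    def apply(self, doc_id: str, version: int, body: Optional[dict]) -> bool:
        """Apply a write (or a delete when body is None) only if its version is newer."""
        current = self.docs.get(doc_id)
        if current is not None and version <= current[0]:
            return False  # stale write: rejected, stored state is unchanged
        self.docs[doc_id] = (version, body)
        return True


index = VersionedIndex()
# Records for one key, delivered out of order by parallel workers.
# Each record is (offset, body); a body of None is a tombstone.
records = [(11, {"title": "new"}), (10, {"title": "old"}), (12, None)]
outcomes = [index.apply("item-1", offset, body) for offset, body in records]
print(outcomes)               # [True, False, True]
print(index.docs["item-1"])   # (12, None)
```

Whatever order the workers deliver the records in, the highest offset wins, which is why the scheme is safe only while offsets for a key keep increasing.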
Heisenbug
Elasticsearch fails to delete documents(!!!), i.e. serves stale data???
Works on My Machine
- Docker Compose cluster
- Integration tests are in place
- Works as expected
Testing
Tested the functionality in the shared testing environment:
- Single-node Kafka
- Single-node Kafka Connect cluster
- Single-node Elasticsearch
Works as expected.
Let me try
- I sent a “tombstone” message (i.e. a Kafka record with a null value)
directly to the Kafka topic.
- Shockingly, the document was still present in the Elasticsearch index!!!
Once again
A document in an Elasticsearch index should have a _version equal to the
offset of the corresponding message in its Kafka topic partition.
Elasticsearch has this Document
$ curl prod:9200/core-items_20200329084723/_doc/996229491?_source=false | jq
{
  "_index": "core-items_20200329084723",
  "_type": "_doc",
  "_id": "996229491",
  "_version": 734232221,
  "_seq_no": 22502992,
  "_primary_term": 1,
  "found": true
}
Version is 734232221
Tombstone message
$ eim topic delete_records --topic=core-items --keys=996229491
{
  "offsets": [
    {
      "partition": 17,
      "offset": 13361612,
      "error_code": null,
      "error": null
    }
  ]
}
Version is 13361612
Hmm?
734232221 vs. 13361612
Eureka!
734232221
vs.
13361612
- The newer message has a lower offset???
- How come the "older" record has a higher offset???
Somebody Changed the Number of Kafka Topic Partitions!
I opened the Grafana dashboard and noticed that a couple of months earlier
the partition count had been increased from 6 to 24.
Problem
1. Kafka guarantees the ordering of messages within a partition, and hence for a
key as long as that key stays in one partition.
2. It gives no ordering guarantee across partitions, even for the same key!!!
The Technical Reason (1)
- By default, Kafka assigns a message to a partition by hashing the message key
- But increasing the partition count changed the result of that function!
partition_nr = hash(message.key) % partition_count
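To see why a changed partition function breaks offset-based versioning, here is a minimal simulation. It is a sketch under stated assumptions: `ToyTopic`, `partition_for`, and the key names are hypothetical, and CRC32 stands in for the hash (Kafka's default partitioner uses murmur2). Each partition is modelled as an append-only list whose index is the offset.

```python
import zlib
from typing import List, Optional, Tuple


def partition_for(key: str, partition_count: int) -> int:
    # Stand-in hash (CRC32); only the "hash modulo count" shape matters here.
    return zlib.crc32(key.encode()) % partition_count


class ToyTopic:
    """Toy model of a topic: every partition is an append-only list of records."""

    def __init__(self, partition_count: int) -> None:
        self.partitions: List[List[Tuple[str, Optional[str]]]] = [
            [] for _ in range(partition_count)
        ]

    def add_partitions(self, new_count: int) -> None:
        # Existing partitions keep their records; new partitions start empty.
        while len(self.partitions) < new_count:
            self.partitions.append([])

    def produce(self, key: str, value: Optional[str]) -> Tuple[int, int]:
        partition = partition_for(key, len(self.partitions))
        log = self.partitions[partition]
        log.append((key, value))
        return partition, len(log) - 1  # (partition, offset)


topic = ToyTopic(6)
keys = [f"k{i}" for i in range(1000)]
for k in keys:  # background traffic so that offsets grow
    topic.produce(k, "v")

# Pick a key that maps to a different partition once there are 24 partitions.
key = next(k for k in keys if partition_for(k, 6) != partition_for(k, 24))

old_partition, old_offset = topic.produce(key, "update")
topic.add_partitions(24)
new_partition, new_offset = topic.produce(key, None)  # tombstone

print(old_partition, old_offset)
print(new_partition, new_offset)
```

The tombstone is the newest record for the key, yet it lands at the start of a fresh partition, so its offset is lower than that of the older update, and a consumer that uses the offset as a document version would treat it as stale.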
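The Technical Reason (2)
Most of the messages with a key were written to a different partition after the
partition count was increased. Assuming a uniformly distributed hash:
probability_of_move = 1 - gcd(old_partition_count, new_partition_count) / new_partition_count
For 6 -> 24 partitions this is 1 - 6/24 = 0.75; when the two counts are coprime
it reduces to 1 - (1 / new_partition_count).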
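The share of keys that change partition can be checked numerically. The sketch below assumes a uniformly distributed hash and simply enumerates integers as hash values; `fraction_moved` and `predicted` are hypothetical helper names, and the closed form `1 - gcd(old, new) / new` is a derivation under that uniformity assumption rather than something stated on the slides. For coprime counts it coincides with `1 - 1 / new_count`.

```python
from math import gcd


def fraction_moved(old_count: int, new_count: int, samples: int = 240_000) -> float:
    """Fraction of hash values whose partition differs between the two counts."""
    moved = sum(1 for h in range(samples) if h % old_count != h % new_count)
    return moved / samples


def predicted(old_count: int, new_count: int) -> float:
    """Closed form for a uniform hash: 1 - gcd(old, new) / new."""
    return 1 - gcd(old_count, new_count) / new_count


print(fraction_moved(6, 24))  # 0.75
print(predicted(6, 24))       # 0.75
```

For the 6 to 24 change in this story, three out of four keys ended up in a partition other than the one holding their earlier records.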
Why would one increase the partition count?
- A partition is the unit of scalability in Kafka:
  - write scalability (a partition should fit on one node)
  - read scalability (within a consumer group, each partition is read by at most
    one consumer, so the partition count caps read parallelism)
Fix
- The fix required a full re-ingestion of data from the primary datastore into Kafka.
- It would have been enough to just write the data to differently named topics.
- However, we used the occasion to upgrade the Kafka cluster from 1.1.1 to
2.4.0 (yes, another Kafka cluster)
How to prevent such a bug?
- Don’t increase the partition count if you rely on per-key message ordering!
- Set sensible defaults in the Kafka settings.
- If you don't rely on offsets, e.g. your messages have no meaningful key (think
logging), then increasing the partition count will not cause any big trouble
(just a rebalance of consumer groups).
Thank You!