SlideShare a Scribd company logo
1 of 28
Download to read offline
Don't Change the
Partition Count for
Kafka Topics!
$ whoami
{
"name": "Dainius Jocas",
"company": {
"name": "Vinted",
"mission": "Make second-hand the first choice worldwide"
},
"position": "Staff Engineer",
"website": "https://www.jocas.lt",
"twitter": "@dainius_jocas",
"github": "dainiusjocas",
"author_of_oss": ["lucene-grep"]
}
2
Agenda
1. Intro
2. Setup
3. Heisenbug
4. Fix
5. Discussion
3
Intro
I'll tell a story on how we've hunted down a Heisenbug in a system that should
have prevented it by design in the very first place and finally fixed it.
The story involves Kafka, Kafka Connect, Elasticsearch, optimistic concurrency
control, data inconsistencies, and SRE with plenty of good intentions that in a
series of unfortunate circumstances caused a nasty bug.
4
Setup (1)
A detailed description of the Elasticsearch indexing pipeline setup:
https://vinted.engineering/2021/01/12/elasticsearch-indexing-pipeline/
5
Setup (2)
6
Setup (3): Elasticsearch
- Optimistic concurrency control
- Client sends the ‘_version’ number of the document in the indexing request
- Elasticsearch promises that document with the highest version number is searchable
- E.g.
- A user changes the price of her listing in Vinted
- The change results in new document version
- Elasticsearch stores only the newer version of the listing with an updated price
- Gist: Elasticsearch stores only the newest version a listing
7
Setup (4): Kafka
- Data is not deleted when it gets “old”
- retention.ms = -1
- Needed to support data reindexing into Elasticsearch
- Log compaction
- Kafka will always retain at least the last known value for each message key
- This makes sure that we are not running out of disk space
- Tombstone messages, i.e. messages with null body is for deletion
- Newer messages has higher offset in Kafka topic partition
8
Setup(5): Kafka Connect
- Framework and a library
- Reads listing data from Kafka topics
- Indexes listings into Elasticsearch
- Error handling (e.g. dead letter queue)
- Configuration, management
- Indexing throughput
- Concurrency
9
Setup: TL;DR
We use Kafka topic partition offset as an Elasticsearch document _version
number.
This trick allows us to parallelize indexing into Elasticsearch and is worry-free
from the data consistency point-of-view.
10
Heisenbug
Elasticsearch fails to delete documents(!!!), i.e. serves stale data???
11
Works on My Machine
- Docker Compose cluster
- Integration tests are in place
- Works as expected
12
Testing
Tested the functionality in the shared testing environment:
● Single node Kafka
● Single node Kafka Connect cluster
● Single node Elasticsearch
Works as expected.
13
Let me try
- I've tried to send a “tombstone” (i.e. Kafka record with null body) message
directly to the Kafka topic.
- Shockingly the document was still present in the Elasticsearch index!!!
14
Once again
A document in an Elasticsearch index should have the _version
that is equal to the offset attribute of the message in a Kafka topic
partition.
15
Elasticsearch has this Document
$ curl prod:9200/core-items_20200329084723/_doc/996229491?_source=false | jq
{
"_index": "core-items_20200329084723",
"_type": "_doc",
"_id": "996229491",
"_version": 734232221,
"_seq_no": 22502992,
"_primary_term": 1,
"found": true
}
Version is 734232221
16
Tombstone message
$ eim topic delete_records --topic=core-items --keys=996229491
{
"offsets": [
{
"partition": 17,
"offset": 13361612,
"error_code": null,
"error": null
}
]
}
Version is 13361612
17
Hmm?
734232221 vs. 13361612
18
Eureka!
734232221
vs.
13361612
- The newer message has a lower offset???
- How come the "older" record has a higher offset???
19
20
Who Changed the Number of Kafka Topic Partitions?
I've opened the Grafana dashboard and noticed that a couple of months ago
the partition count was increased from 6 to 24.
21
Problem
1. Kafka guarantees ordering of messages for a key in a partition.
2. But not across partitions for the same key!!!
22
The Technical Reason (1)
- Kafka assigns partitions to messages by hashing the key of the message
- But the increased partition count changed the function!
partition_nr = hash(message.key) % partition_count
23
The technical reason (2)
Most of the messages with a key were written to a different partition after the
increase of partition count:
probability_off_error = 1 - (1 / partition_count)
24
Why would one increase the partition count?
- Partition is a scalability unit in Kafka.
- write scalability (should fit in one node)
- read scalability (consumers consume at least one partition)
25
Fix
- Required a full re-ingestion of data from the primary datastore into Kafka.
- I'd be enough to just write data to differently named topics.
- However, we used the situation to upgrade the Kafka cluster from 1.1.1 to
2.4.0 (yes, another Kafka cluster)
26
How to prevent such a bug?
- Don’t increase partition count if you rely on message ordering!
- Do sensible defaults in Kafka settings.
- If you don't rely on offset, e.g. message have no meaningful key (think
logging), then increase of partition count will not cause any big troubles
(just a rebalance of consumer groups).
27
Thank You!
28

More Related Content

What's hot

Kafka Summit NYC 2017 - Running Hundreds of Kafka Clusters with 5 People
Kafka Summit NYC 2017 - Running Hundreds of Kafka Clusters with 5 PeopleKafka Summit NYC 2017 - Running Hundreds of Kafka Clusters with 5 People
Kafka Summit NYC 2017 - Running Hundreds of Kafka Clusters with 5 Peopleconfluent
 
Big data lambda architecture - Streaming Layer Hands On
Big data lambda architecture - Streaming Layer Hands OnBig data lambda architecture - Streaming Layer Hands On
Big data lambda architecture - Streaming Layer Hands Onhkbhadraa
 
Testing Kafka components with Kafka for JUnit
Testing Kafka components with Kafka for JUnitTesting Kafka components with Kafka for JUnit
Testing Kafka components with Kafka for JUnitMarkus Günther
 
What's new in Ansible 2.0
What's new in Ansible 2.0What's new in Ansible 2.0
What's new in Ansible 2.0Allan Denot
 
Leveraging Intra-Node Parallelization in HPCC Systems
Leveraging Intra-Node Parallelization in HPCC SystemsLeveraging Intra-Node Parallelization in HPCC Systems
Leveraging Intra-Node Parallelization in HPCC SystemsHPCC Systems
 
Building a Real-Time Data Pipeline with Spark, Kafka, and Python
Building a Real-Time Data Pipeline with Spark, Kafka, and PythonBuilding a Real-Time Data Pipeline with Spark, Kafka, and Python
Building a Real-Time Data Pipeline with Spark, Kafka, and PythonSingleStore
 
Introduction to apache_cassandra_for_developers-lhg
Introduction to apache_cassandra_for_developers-lhgIntroduction to apache_cassandra_for_developers-lhg
Introduction to apache_cassandra_for_developers-lhgzznate
 
Spark Streaming Info
Spark Streaming InfoSpark Streaming Info
Spark Streaming InfoDoug Chang
 
Fighting Against Chaotically Separated Values with Embulk
Fighting Against Chaotically Separated Values with EmbulkFighting Against Chaotically Separated Values with Embulk
Fighting Against Chaotically Separated Values with EmbulkSadayuki Furuhashi
 
Moving to Nova Cells without Destroying the World
Moving to Nova Cells without Destroying the WorldMoving to Nova Cells without Destroying the World
Moving to Nova Cells without Destroying the WorldMike Dorman
 
ClickHouse new features and development roadmap, by Aleksei Milovidov
ClickHouse new features and development roadmap, by Aleksei MilovidovClickHouse new features and development roadmap, by Aleksei Milovidov
ClickHouse new features and development roadmap, by Aleksei MilovidovAltinity Ltd
 
Introducing Exactly Once Semantics in Apache Kafka with Matthias J. Sax
Introducing Exactly Once Semantics in Apache Kafka with Matthias J. SaxIntroducing Exactly Once Semantics in Apache Kafka with Matthias J. Sax
Introducing Exactly Once Semantics in Apache Kafka with Matthias J. SaxDatabricks
 
Recent Updates at Embulk Meetup #3
Recent Updates at Embulk Meetup #3Recent Updates at Embulk Meetup #3
Recent Updates at Embulk Meetup #3Muga Nishizawa
 
Data integration with embulk
Data integration with embulkData integration with embulk
Data integration with embulkTeguh Nugraha
 
Cassandra on Mesos Across Multiple Datacenters at Uber (Abhishek Verma) | C* ...
Cassandra on Mesos Across Multiple Datacenters at Uber (Abhishek Verma) | C* ...Cassandra on Mesos Across Multiple Datacenters at Uber (Abhishek Verma) | C* ...
Cassandra on Mesos Across Multiple Datacenters at Uber (Abhishek Verma) | C* ...DataStax
 
Cassandra Day SV 2014: Netflix’s Astyanax Java Client Driver for Apache Cassa...
Cassandra Day SV 2014: Netflix’s Astyanax Java Client Driver for Apache Cassa...Cassandra Day SV 2014: Netflix’s Astyanax Java Client Driver for Apache Cassa...
Cassandra Day SV 2014: Netflix’s Astyanax Java Client Driver for Apache Cassa...DataStax Academy
 
Realtime Statistics based on Apache Storm and RocketMQ
Realtime Statistics based on Apache Storm and RocketMQRealtime Statistics based on Apache Storm and RocketMQ
Realtime Statistics based on Apache Storm and RocketMQXin Wang
 
Virtual Bash! A Lunchtime Introduction to Kafka
Virtual Bash! A Lunchtime Introduction to KafkaVirtual Bash! A Lunchtime Introduction to Kafka
Virtual Bash! A Lunchtime Introduction to KafkaJason Bell
 
Postgres & Redis Sitting in a Tree- Rimas Silkaitis, Heroku
Postgres & Redis Sitting in a Tree- Rimas Silkaitis, HerokuPostgres & Redis Sitting in a Tree- Rimas Silkaitis, Heroku
Postgres & Redis Sitting in a Tree- Rimas Silkaitis, HerokuRedis Labs
 
Introduction to cassandra 2014
Introduction to cassandra 2014Introduction to cassandra 2014
Introduction to cassandra 2014Patrick McFadin
 

What's hot (20)

Kafka Summit NYC 2017 - Running Hundreds of Kafka Clusters with 5 People
Kafka Summit NYC 2017 - Running Hundreds of Kafka Clusters with 5 PeopleKafka Summit NYC 2017 - Running Hundreds of Kafka Clusters with 5 People
Kafka Summit NYC 2017 - Running Hundreds of Kafka Clusters with 5 People
 
Big data lambda architecture - Streaming Layer Hands On
Big data lambda architecture - Streaming Layer Hands OnBig data lambda architecture - Streaming Layer Hands On
Big data lambda architecture - Streaming Layer Hands On
 
Testing Kafka components with Kafka for JUnit
Testing Kafka components with Kafka for JUnitTesting Kafka components with Kafka for JUnit
Testing Kafka components with Kafka for JUnit
 
What's new in Ansible 2.0
What's new in Ansible 2.0What's new in Ansible 2.0
What's new in Ansible 2.0
 
Leveraging Intra-Node Parallelization in HPCC Systems
Leveraging Intra-Node Parallelization in HPCC SystemsLeveraging Intra-Node Parallelization in HPCC Systems
Leveraging Intra-Node Parallelization in HPCC Systems
 
Building a Real-Time Data Pipeline with Spark, Kafka, and Python
Building a Real-Time Data Pipeline with Spark, Kafka, and PythonBuilding a Real-Time Data Pipeline with Spark, Kafka, and Python
Building a Real-Time Data Pipeline with Spark, Kafka, and Python
 
Introduction to apache_cassandra_for_developers-lhg
Introduction to apache_cassandra_for_developers-lhgIntroduction to apache_cassandra_for_developers-lhg
Introduction to apache_cassandra_for_developers-lhg
 
Spark Streaming Info
Spark Streaming InfoSpark Streaming Info
Spark Streaming Info
 
Fighting Against Chaotically Separated Values with Embulk
Fighting Against Chaotically Separated Values with EmbulkFighting Against Chaotically Separated Values with Embulk
Fighting Against Chaotically Separated Values with Embulk
 
Moving to Nova Cells without Destroying the World
Moving to Nova Cells without Destroying the WorldMoving to Nova Cells without Destroying the World
Moving to Nova Cells without Destroying the World
 
ClickHouse new features and development roadmap, by Aleksei Milovidov
ClickHouse new features and development roadmap, by Aleksei MilovidovClickHouse new features and development roadmap, by Aleksei Milovidov
ClickHouse new features and development roadmap, by Aleksei Milovidov
 
Introducing Exactly Once Semantics in Apache Kafka with Matthias J. Sax
Introducing Exactly Once Semantics in Apache Kafka with Matthias J. SaxIntroducing Exactly Once Semantics in Apache Kafka with Matthias J. Sax
Introducing Exactly Once Semantics in Apache Kafka with Matthias J. Sax
 
Recent Updates at Embulk Meetup #3
Recent Updates at Embulk Meetup #3Recent Updates at Embulk Meetup #3
Recent Updates at Embulk Meetup #3
 
Data integration with embulk
Data integration with embulkData integration with embulk
Data integration with embulk
 
Cassandra on Mesos Across Multiple Datacenters at Uber (Abhishek Verma) | C* ...
Cassandra on Mesos Across Multiple Datacenters at Uber (Abhishek Verma) | C* ...Cassandra on Mesos Across Multiple Datacenters at Uber (Abhishek Verma) | C* ...
Cassandra on Mesos Across Multiple Datacenters at Uber (Abhishek Verma) | C* ...
 
Cassandra Day SV 2014: Netflix’s Astyanax Java Client Driver for Apache Cassa...
Cassandra Day SV 2014: Netflix’s Astyanax Java Client Driver for Apache Cassa...Cassandra Day SV 2014: Netflix’s Astyanax Java Client Driver for Apache Cassa...
Cassandra Day SV 2014: Netflix’s Astyanax Java Client Driver for Apache Cassa...
 
Realtime Statistics based on Apache Storm and RocketMQ
Realtime Statistics based on Apache Storm and RocketMQRealtime Statistics based on Apache Storm and RocketMQ
Realtime Statistics based on Apache Storm and RocketMQ
 
Virtual Bash! A Lunchtime Introduction to Kafka
Virtual Bash! A Lunchtime Introduction to KafkaVirtual Bash! A Lunchtime Introduction to Kafka
Virtual Bash! A Lunchtime Introduction to Kafka
 
Postgres & Redis Sitting in a Tree- Rimas Silkaitis, Heroku
Postgres & Redis Sitting in a Tree- Rimas Silkaitis, HerokuPostgres & Redis Sitting in a Tree- Rimas Silkaitis, Heroku
Postgres & Redis Sitting in a Tree- Rimas Silkaitis, Heroku
 
Introduction to cassandra 2014
Introduction to cassandra 2014Introduction to cassandra 2014
Introduction to cassandra 2014
 

Similar to Don't change the partition count for kafka topics!

Don't change the partition count for kafka topics!
Don't change the partition count for kafka topics!Don't change the partition count for kafka topics!
Don't change the partition count for kafka topics!Dainius Jocas
 
HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...
HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...
HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...Modern Data Stack France
 
Migrating structured data between Hadoop and RDBMS
Migrating structured data between Hadoop and RDBMSMigrating structured data between Hadoop and RDBMS
Migrating structured data between Hadoop and RDBMSBouquet
 
What is Apache Kafka®?
What is Apache Kafka®?What is Apache Kafka®?
What is Apache Kafka®?Eventador
 
What is apache Kafka?
What is apache Kafka?What is apache Kafka?
What is apache Kafka?Kenny Gorman
 
Martin Odersky: What's next for Scala
Martin Odersky: What's next for ScalaMartin Odersky: What's next for Scala
Martin Odersky: What's next for ScalaMarakana Inc.
 
Kafka Summit NYC 2017 Introduction to Kafka Streams with a Real-life Example
Kafka Summit NYC 2017 Introduction to Kafka Streams with a Real-life ExampleKafka Summit NYC 2017 Introduction to Kafka Streams with a Real-life Example
Kafka Summit NYC 2017 Introduction to Kafka Streams with a Real-life Exampleconfluent
 
Designing Structured Streaming Pipelines—How to Architect Things Right
Designing Structured Streaming Pipelines—How to Architect Things RightDesigning Structured Streaming Pipelines—How to Architect Things Right
Designing Structured Streaming Pipelines—How to Architect Things RightDatabricks
 
Enhancing Apache Kafka for Large Scale Real-Time Data Pipeline at Tencent | K...
Enhancing Apache Kafka for Large Scale Real-Time Data Pipeline at Tencent | K...Enhancing Apache Kafka for Large Scale Real-Time Data Pipeline at Tencent | K...
Enhancing Apache Kafka for Large Scale Real-Time Data Pipeline at Tencent | K...HostedbyConfluent
 
Containerized Data Persistence on Mesos
Containerized Data Persistence on MesosContainerized Data Persistence on Mesos
Containerized Data Persistence on MesosJoe Stein
 
ABRIDGED VERSION - Joys & frustrations of putting 34,000 lines of Haskell in...
 ABRIDGED VERSION - Joys & frustrations of putting 34,000 lines of Haskell in... ABRIDGED VERSION - Joys & frustrations of putting 34,000 lines of Haskell in...
ABRIDGED VERSION - Joys & frustrations of putting 34,000 lines of Haskell in...Saurabh Nanda
 
5 Takeaways from Migrating a Library to Scala 3 - Scala Love
5 Takeaways from Migrating a Library to Scala 3 - Scala Love5 Takeaways from Migrating a Library to Scala 3 - Scala Love
5 Takeaways from Migrating a Library to Scala 3 - Scala LoveNatan Silnitsky
 
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...Athens Big Data
 
Api versioning w_docker_and_nginx
Api versioning w_docker_and_nginxApi versioning w_docker_and_nginx
Api versioning w_docker_and_nginxLee Wilkins
 
Change Data Capture (CDC) With Kafka Connect® and the Debezium PostgreSQL Sou...
Change Data Capture (CDC) With Kafka Connect® and the Debezium PostgreSQL Sou...Change Data Capture (CDC) With Kafka Connect® and the Debezium PostgreSQL Sou...
Change Data Capture (CDC) With Kafka Connect® and the Debezium PostgreSQL Sou...Paul Brebner
 
Erik Skytthe - Monitoring Mesos, Docker, Containers with Zabbix | ZabConf2016
Erik Skytthe - Monitoring Mesos, Docker, Containers with Zabbix | ZabConf2016Erik Skytthe - Monitoring Mesos, Docker, Containers with Zabbix | ZabConf2016
Erik Skytthe - Monitoring Mesos, Docker, Containers with Zabbix | ZabConf2016Zabbix
 

Similar to Don't change the partition count for kafka topics! (20)

Don't change the partition count for kafka topics!
Don't change the partition count for kafka topics!Don't change the partition count for kafka topics!
Don't change the partition count for kafka topics!
 
HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...
HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...
HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...
 
Migrating structured data between Hadoop and RDBMS
Migrating structured data between Hadoop and RDBMSMigrating structured data between Hadoop and RDBMS
Migrating structured data between Hadoop and RDBMS
 
What is Apache Kafka®?
What is Apache Kafka®?What is Apache Kafka®?
What is Apache Kafka®?
 
What is apache Kafka?
What is apache Kafka?What is apache Kafka?
What is apache Kafka?
 
Martin Odersky: What's next for Scala
Martin Odersky: What's next for ScalaMartin Odersky: What's next for Scala
Martin Odersky: What's next for Scala
 
Scala+data
Scala+dataScala+data
Scala+data
 
Kafka Summit NYC 2017 Introduction to Kafka Streams with a Real-life Example
Kafka Summit NYC 2017 Introduction to Kafka Streams with a Real-life ExampleKafka Summit NYC 2017 Introduction to Kafka Streams with a Real-life Example
Kafka Summit NYC 2017 Introduction to Kafka Streams with a Real-life Example
 
Designing Structured Streaming Pipelines—How to Architect Things Right
Designing Structured Streaming Pipelines—How to Architect Things RightDesigning Structured Streaming Pipelines—How to Architect Things Right
Designing Structured Streaming Pipelines—How to Architect Things Right
 
Kafka Connect
Kafka ConnectKafka Connect
Kafka Connect
 
Enhancing Apache Kafka for Large Scale Real-Time Data Pipeline at Tencent | K...
Enhancing Apache Kafka for Large Scale Real-Time Data Pipeline at Tencent | K...Enhancing Apache Kafka for Large Scale Real-Time Data Pipeline at Tencent | K...
Enhancing Apache Kafka for Large Scale Real-Time Data Pipeline at Tencent | K...
 
Containerized Data Persistence on Mesos
Containerized Data Persistence on MesosContainerized Data Persistence on Mesos
Containerized Data Persistence on Mesos
 
ABRIDGED VERSION - Joys & frustrations of putting 34,000 lines of Haskell in...
 ABRIDGED VERSION - Joys & frustrations of putting 34,000 lines of Haskell in... ABRIDGED VERSION - Joys & frustrations of putting 34,000 lines of Haskell in...
ABRIDGED VERSION - Joys & frustrations of putting 34,000 lines of Haskell in...
 
5 Takeaways from Migrating a Library to Scala 3 - Scala Love
5 Takeaways from Migrating a Library to Scala 3 - Scala Love5 Takeaways from Migrating a Library to Scala 3 - Scala Love
5 Takeaways from Migrating a Library to Scala 3 - Scala Love
 
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
 
Devoxx
DevoxxDevoxx
Devoxx
 
Api versioning w_docker_and_nginx
Api versioning w_docker_and_nginxApi versioning w_docker_and_nginx
Api versioning w_docker_and_nginx
 
Change Data Capture (CDC) With Kafka Connect® and the Debezium PostgreSQL Sou...
Change Data Capture (CDC) With Kafka Connect® and the Debezium PostgreSQL Sou...Change Data Capture (CDC) With Kafka Connect® and the Debezium PostgreSQL Sou...
Change Data Capture (CDC) With Kafka Connect® and the Debezium PostgreSQL Sou...
 
Deep Dive into AWS Fargate
Deep Dive into AWS FargateDeep Dive into AWS Fargate
Deep Dive into AWS Fargate
 
Erik Skytthe - Monitoring Mesos, Docker, Containers with Zabbix | ZabConf2016
Erik Skytthe - Monitoring Mesos, Docker, Containers with Zabbix | ZabConf2016Erik Skytthe - Monitoring Mesos, Docker, Containers with Zabbix | ZabConf2016
Erik Skytthe - Monitoring Mesos, Docker, Containers with Zabbix | ZabConf2016
 

Recently uploaded

High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escortsranjana rawat
 
University management System project report..pdf
University management System project report..pdfUniversity management System project report..pdf
University management System project report..pdfKamal Acharya
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...roncy bisnoi
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdfankushspencer015
 
MANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTING
MANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTINGMANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTING
MANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTINGSIVASHANKAR N
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSSIVASHANKAR N
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Dr.Costas Sachpazis
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...Call Girls in Nagpur High Profile
 
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptxBSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptxfenichawla
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxupamatechverse
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduitsrknatarajan
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Bookingdharasingh5698
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college projectTonystark477637
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Christo Ananth
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Christo Ananth
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 

Recently uploaded (20)

High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
 
University management System project report..pdf
University management System project report..pdfUniversity management System project report..pdf
University management System project report..pdf
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
 
Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdf
 
MANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTING
MANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTINGMANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTING
MANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTING
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
 
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptxBSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptx
 
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduits
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college project
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
 

Don't change the partition count for kafka topics!

  • 1. Don't Change the Partition Count for Kafka Topics!
  • 2. $ whoami { "name": "Dainius Jocas", "company": { "name": "Vinted", "mission": "Make second-hand the first choice worldwide" }, "position": "Staff Engineer", "website": "https://www.jocas.lt", "twitter": "@dainius_jocas", "github": "dainiusjocas", "author_of_oss": ["lucene-grep"] } 2
  • 3. Agenda 1. Intro 2. Setup 3. Heisenbug 4. Fix 5. Discussion 3
  • 4. Intro I'll tell a story on how we've hunted down a Heisenbug in a system that should have prevented it by design in the very first place and finally fixed it. The story involves Kafka, Kafka Connect, Elasticsearch, optimistic concurrency control, data inconsistencies, and SRE with plenty of good intentions that in a series of unfortunate circumstances caused a nasty bug. 4
  • 5. Setup (1) A detailed description of the Elasticsearch indexing pipeline setup: https://vinted.engineering/2021/01/12/elasticsearch-indexing-pipeline/ 5
  • 7. Setup (3): Elasticsearch - Optimistic concurrency control - Client sends the ‘_version’ number of the document in the indexing request - Elasticsearch promises that document with the highest version number is searchable - E.g. - A user changes the price of her listing in Vinted - The change results in new document version - Elasticsearch stores only the newer version of the listing with an updated price - Gist: Elasticsearch stores only the newest version a listing 7
  • 8. Setup (4): Kafka - Data is not deleted when it gets “old” - retention.ms = -1 - Needed to support data reindexing into Elasticsearch - Log compaction - Kafka will always retain at least the last known value for each message key - This makes sure that we are not running out of disk space - Tombstone messages, i.e. messages with null body is for deletion - Newer messages has higher offset in Kafka topic partition 8
  • 9. Setup(5): Kafka Connect - Framework and a library - Reads listing data from Kafka topics - Indexes listings into Elasticsearch - Error handling (e.g. dead letter queue) - Configuration, management - Indexing throughput - Concurrency 9
  • 10. Setup: TL;DR We use Kafka topic partition offset as an Elasticsearch document _version number. This trick allows us to parallelize indexing into Elasticsearch and is worry-free from the data consistency point-of-view. 10
  • 11. Heisenbug Elasticsearch fails to delete documents(!!!), i.e. serves stale data??? 11
  • 12. Works on My Machine - Docker Compose cluster - Integration tests are in place - Works as expected 12
  • 13. Testing Tested the functionality in the shared testing environment: ● Single node Kafka ● Single node Kafka Connect cluster ● Single node Elasticsearch Works as expected. 13
  • 14. Let me try - I've tried to send a “tombstone” (i.e. Kafka record with null body) message directly to the Kafka topic. - Shockingly the document was still present in the Elasticsearch index!!! 14
  • 15. Once again A document in an Elasticsearch index should have the _version that is equal to the offset attribute of the message in a Kafka topic partition. 15
  • 16. Elasticsearch has this Document $ curl prod:9200/core-items_20200329084723/_doc/996229491?_source=false | jq { "_index": "core-items_20200329084723", "_type": "_doc", "_id": "996229491", "_version": 734232221, "_seq_no": 22502992, "_primary_term": 1, "found": true } Version is 734232221 16
  • 17. Tombstone message $ eim topic delete_records --topic=core-items --keys=996229491 { "offsets": [ { "partition": 17, "offset": 13361612, "error_code": null, "error": null } ] } Version is 13361612 17
  • 19. Eureka! 734232221 vs. 13361612 - The newer message has a lower offset??? - How come the "older" record has a higher offset??? 19
  • 20. 20
  • 21. Who Changed the Number of Kafka Topic Partitions? I've opened the Grafana dashboard and noticed that a couple of months ago the partition count was increased from 6 to 24. 21
  • 22. Problem 1. Kafka guarantees ordering of messages for a key in a partition. 2. But not across partitions for the same key!!! 22
  • 23. The Technical Reason (1) - Kafka assigns partitions to messages by hashing the key of the message - But the increased partition count changed the function! partition_nr = hash(message.key) % partition_count 23
  • 24. The technical reason (2) Most of the messages with a key were written to a different partition after the increase of partition count: probability_off_error = 1 - (1 / partition_count) 24
  • 25. Why would one increase the partition count? - Partition is a scalability unit in Kafka. - write scalability (should fit in one node) - read scalability (consumers consume at least one partition) 25
  • 26. Fix - Required a full re-ingestion of data from the primary datastore into Kafka. - I'd be enough to just write data to differently named topics. - However, we used the situation to upgrade the Kafka cluster from 1.1.1 to 2.4.0 (yes, another Kafka cluster) 26
  • 27. How to prevent such a bug? - Don’t increase partition count if you rely on message ordering! - Do sensible defaults in Kafka settings. - If you don't rely on offset, e.g. message have no meaningful key (think logging), then increase of partition count will not cause any big troubles (just a rebalance of consumer groups). 27