Don't Change the Partition Count for Kafka Topics!


The document discusses how increasing the partition count for a Kafka topic caused a "Heisenbug". The key-to-partition mapping (hash of the key modulo the partition count) gave different results after the change, so messages for the same key landed in different partitions, and Kafka guarantees ordering only within a partition. Elasticsearch, which used the partition offset as the document version, then treated newer messages as stale and failed to delete documents as expected. The bug was fixed by fully reingesting the data into a new Kafka cluster with a consistent partition count. The key lesson is not to change the partition count if an application relies on the ordering of messages within a topic.

Dainius Jocas, Staff Engineer @ Vinted
2021-04-08

Agenda
1. Intro
2. Setup
3. Heisenbug
4. Fix
5. Discussion

Intro
I'll tell the story of how we hunted down, and finally fixed, a Heisenbug in a
system that by design should have prevented it in the first place.
The story involves Kafka, Kafka Connect, Elasticsearch, optimistic concurrency
control, data inconsistencies, and SRE with plenty of good intentions that,
through a series of unfortunate circumstances, caused a nasty bug.

Setup
A full description of the Elasticsearch indexing pipeline setup:
https://vinted.engineering/2021/01/12/elasticsearch-indexing-pipeline/

Setup: TL;DR
We use the Kafka topic partition offset as the Elasticsearch document version
number.
This trick lets us parallelize indexing into Elasticsearch without worrying
about data consistency.
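Why the offset-as-version trick tolerates parallel, out-of-order delivery can be shown with a small in-memory sketch. It assumes the index accepts a write only when its version is strictly greater than the stored one (the behaviour of external versioning); `VersionedIndex` and `VersionConflict` are hypothetical names for illustration, not the Elasticsearch client API.

```python
class VersionConflict(Exception):
    """Raised when a write carries a version that is not newer than the stored one."""


class VersionedIndex:
    """Toy index: keeps (version, body) per document id."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, body); body None marks a delete

    def apply(self, doc_id, version, body):
        """Apply an upsert (body is a dict) or a delete (body is None)."""
        current = self._docs.get(doc_id)
        if current is not None and version <= current[0]:
            raise VersionConflict(f"{doc_id}: {version} <= {current[0]}")
        self._docs[doc_id] = (version, body)

    def get(self, doc_id):
        entry = self._docs.get(doc_id)
        return None if entry is None else entry[1]


index = VersionedIndex()
# Three records for one key from ONE partition (offsets 10, 11, 12),
# delivered out of order by parallel workers. Offset 12 is the tombstone.
for offset, body in [(12, None), (10, {"title": "v1"}), (11, {"title": "v2"})]:
    try:
        index.apply("item-1", offset, body)
    except VersionConflict:
        pass  # stale write, safely ignored

print(index.get("item-1"))
```

As long as all records for a key come from the same partition, the highest offset is the latest state, so the stale writes at offsets 10 and 11 are rejected and the delete wins.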

Heisenbug
Elasticsearch fails to delete documents(!!!), i.e. serves stale data???

Works on My Machine
- Docker Compose cluster
- Integration tests are in place
- Works as expected

Testing
Tested the functionality in the shared testing environment:
● Single node Kafka
● Single node Kafka Connect cluster
● Single node Elasticsearch
Works as expected.

Let me try
- I sent a "tombstone" message (i.e. a Kafka record with a null value)
directly to the Kafka topic.
- Shockingly, the document was still present in the Elasticsearch index!!!

Once again
A document in an Elasticsearch index should have a _version equal to the
offset of its message in the Kafka topic partition.

Elasticsearch has this Document
$ curl prod:9200/core-items_20200329084723/_doc/996229491?_source=false | jq
{
  "_index": "core-items_20200329084723",
  "_type": "_doc",
  "_id": "996229491",
  "_version": 734232221,
  "_seq_no": 22502992,
  "_primary_term": 1,
  "found": true
}
Version is 734232221

Tombstone message
$ eim topic delete_records --topic=core-items --keys=996229491
{
  "offsets": [
    {
      "partition": 17,
      "offset": 13361612,
      "error_code": null,
      "error": null
    }
  ]
}
Version is 13361612

Eureka!
734232221
vs.
13361612
- The newer message has a lower offset???
- How come the "older" record has a higher offset???

Somebody Changed the Number of Kafka Topic Partitions!
I opened the Grafana dashboard and noticed that a couple of months earlier
the partition count had been increased from 6 to 24.

Problem
1. Kafka guarantees ordering of messages for a key in a partition.
2. But not across partitions for the same key!!!
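The two points above can be illustrated with a toy model in which every partition keeps its own independent offset counter. `MiniTopic` is a hypothetical stand-in, not a Kafka client; partition 17 is taken from the tombstone output earlier, while partition 2 and the message counts are made-up values.

```python
from collections import defaultdict


class MiniTopic:
    """Toy topic: each partition has its own independent offset counter."""

    def __init__(self):
        self.next_offset = defaultdict(int)  # partition -> next offset to assign

    def append(self, partition):
        """Append one record to a partition and return (partition, offset)."""
        offset = self.next_offset[partition]
        self.next_offset[partition] += 1
        return partition, offset


topic = MiniTopic()

# Partition 2 has been receiving traffic for a long time.
for _ in range(1000):
    topic.append(2)
older = topic.append(2)    # an update for some key, written to partition 2

# Later, a record for the SAME key is routed to a recently added partition.
newer = topic.append(17)

print("older record:", older)
print("newer record:", newer)
```

The record written later carries the lower offset, because offsets are only comparable within a single partition. Used as a document version, the newer record looks stale.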

The Technical Reason (1)
- Kafka's default partitioner assigns a keyed message to a partition by hashing
the message key modulo the partition count
- The hash itself stayed the same, but the increased partition count changed
the result of the function!
partition_nr = hash(message.key) % partition_count
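A minimal sketch of the formula above, showing the same key mapping to different partitions under 6 and 24 partitions. The hash here is a stand-in (SHA-256 truncated to 32 bits), chosen only because it is stable across runs; it is an assumption for illustration and will not reproduce the partition numbers Kafka's own partitioner assigns.

```python
import hashlib


def stable_hash(key: str) -> int:
    """Stand-in hash (truncated SHA-256); NOT the hash Kafka's partitioner uses."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def partition_for(key: str, partition_count: int) -> int:
    """partition_nr = hash(message.key) % partition_count"""
    return stable_hash(key) % partition_count


keys = [str(n) for n in range(20)]  # made-up keys
for key in keys:
    before = partition_for(key, 6)
    after = partition_for(key, 24)
    marker = "" if before == after else "  <- moved"
    print(f"key={key:>2}  6 partitions: {before}   24 partitions: {after:>2}{marker}")
```

Because 24 is a multiple of 6, a key's new partition number is always congruent to its old one modulo 6, but the partition itself is usually a different one, with its own offset sequence.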

The technical reason (2)
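Most of the messages with a key were written to a different partition after the
increase of partition count. As a rough estimate, assuming a uniform hash and
old and new counts with no common factor:
probability_of_error = 1 - (1 / partition_count)
For 6 to 24 the new count is a multiple of the old one, so a key stays put only
when hash % 24 < 6, i.e. about 3 out of 4 keys moved.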
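The estimate above can be sanity-checked by counting, over synthetic keys, how many land on a different partition number after a change. This is a sketch under assumptions: the keys are made up, and the hash is a stand-in (truncated SHA-256) rather than the one Kafka uses, so only the proportions are meaningful.

```python
import hashlib


def stable_hash(key: str) -> int:
    """Stand-in uniform hash (truncated SHA-256); NOT Kafka's partitioner hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def fraction_moved(keys, old_count: int, new_count: int) -> float:
    """Share of keys whose partition number differs between the two counts."""
    moved = sum(
        1 for key in keys
        if stable_hash(key) % old_count != stable_hash(key) % new_count
    )
    return moved / len(keys)


keys = [f"item-{n}" for n in range(100_000)]  # hypothetical keys

moved_6_to_24 = fraction_moved(keys, 6, 24)  # new count is a multiple of the old
moved_6_to_7 = fraction_moved(keys, 6, 7)    # counts share no common factor

print(f"6 -> 24: {moved_6_to_24:.3f} of keys moved")
print(f"6 ->  7: {moved_6_to_7:.3f} of keys moved")
```

With a uniform hash, the 6 to 7 case should come out near 1 - 1/7, matching the estimate, while the 6 to 24 case should come out near 1 - 6/24 = 0.75. Either way, a large majority of keys end up in a different partition.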

Why would one increase the partition count?
- A partition is the unit of scalability in Kafka.
  - write scalability (a partition should fit on one node)
  - read scalability (each consumer consumes at least one partition)

Fix
- The fix required a full re-ingestion of data from the primary datastore into Kafka.
- It'd be enough to just write the data to differently named topics.
- However, we used the situation to upgrade the Kafka cluster from 1.1.1 to
2.4.0 (yes, another Kafka cluster)

How to prevent such a bug?
- Don't increase the partition count if you rely on message ordering!
- Set sensible defaults in the Kafka settings.
- If you don't rely on offsets, e.g. messages have no meaningful key (think
logging), then increasing the partition count will not cause any big trouble
(just a rebalance of consumer groups).

Distributed Caching - Cache Unleashed

Avishek Patra

Scaling opensimulator inventory using nosql

David Daeschler

Mysql Latency

srubinstein

The document discusses strategies for managing replication latency in a distributed database system. It provides examples of average and maximum replication latencies between different database nodes. It also summarizes different approaches tried to reliably clear caches when data is updated, including using a multicast notification bus, database queues, and splitting data functionally across nodes.

Don't change the partition count for kafka topics!

1. Don't Change the Partition Count for Kafka Topics!
Dainius Jocas, Staff Engineer @ Vinted
2021-04-08

2. Agenda
   1. Intro
   2. Setup
   3. Heisenbug
   4. Fix
   5. Discussion

3. Intro
I'll tell the story of how we hunted down a Heisenbug in a system that, by design, should have prevented it in the first place, and how we finally fixed it.
The story involves Kafka, Kafka Connect, Elasticsearch, optimistic concurrency control, data inconsistencies, and an SRE with plenty of good intentions who, through a series of unfortunate circumstances, caused a nasty bug.

4. Setup
A full description of the Elasticsearch indexing pipeline setup:
https://vinted.engineering/2021/01/12/elasticsearch-indexing-pipeline/

5. 5

6. Setup: TL;DR
We use the Kafka topic partition offset as the Elasticsearch document version number.
This trick lets us parallelize indexing into Elasticsearch without worrying about data consistency.
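A minimal sketch of why the offset-as-version trick tolerates out-of-order writes. It assumes external-versioning semantics, where a write is accepted only if its version is strictly greater than the stored one. `VersionedIndex` is a hypothetical in-memory stand-in, not the Elasticsearch API, and unlike a real index it remembers the version of a deleted document forever.

```python
from typing import Dict, Optional, Tuple


class VersionedIndex:
    """Toy stand-in for an index that uses externally supplied versions."""

    def __init__(self) -> None:
        # doc_id -> (version, body); a body of None marks a deleted document.
        self._docs: Dict[str, Tuple[int, Optional[dict]]] = {}

    def write(self, doc_id: str, version: int, body: Optional[dict]) -> bool:
        """Apply an index (body) or a delete (None) only if version is newer."""
        current = self._docs.get(doc_id)
        if current is not None and version <= current[0]:
            return False  # stale write: rejected as a version conflict
        self._docs[doc_id] = (version, body)
        return True

    def get(self, doc_id: str) -> Optional[dict]:
        entry = self._docs.get(doc_id)
        return entry[1] if entry else None


# Two workers deliver records for the same key out of order.
index = VersionedIndex()
newer_applied = index.write("item-1", 2, {"title": "new"})  # offset 2 arrives first
older_applied = index.write("item-1", 1, {"title": "old"})  # offset 1 arrives late
print(newer_applied, older_applied, index.get("item-1"))
```

The late write with the lower offset is dropped, so the index converges to the newest record as long as offsets for a key only ever grow.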

7. Heisenbug
Elasticsearch fails to delete documents(!!!), i.e. serves stale data???

8. Works on My Machine
- Docker Compose cluster
- Integration tests are in place
- Works as expected

9. Testing
Tested the functionality in the shared testing environment:
● Single node Kafka
● Single node Kafka Connect cluster
● Single node Elasticsearch
Works as expected.

10. Let me try
- I sent a “tombstone” message (i.e. a Kafka record with a null body) directly to the Kafka topic.
- Shockingly, the document was still present in the Elasticsearch index!!!

11. Once again
A document in an Elasticsearch index should have a _version equal to the offset of the corresponding message in its Kafka topic partition.

12. Elasticsearch has this Document
    $ curl 'prod:9200/core-items_20200329084723/_doc/996229491?_source=false' | jq
    {
      "_index": "core-items_20200329084723",
      "_type": "_doc",
      "_id": "996229491",
      "_version": 734232221,
      "_seq_no": 22502992,
      "_primary_term": 1,
      "found": true
    }
Version is 734232221

13. Tombstone message
    $ eim topic delete_records --topic=core-items --keys=996229491
    {
      "offsets": [
        {
          "partition": 17,
          "offset": 13361612,
          "error_code": null,
          "error": null
        }
      ]
    }
Version is 13361612

14. Hmm?
734232221 vs. 13361612

15. Eureka!
734232221 vs. 13361612
- The newer message has a lower offset???
- How come the "older" record has a higher offset???
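The two numbers from slides 12 and 13 can be replayed against the version rule to show why the delete never took effect. This is a sketch that assumes the index accepts a write only when the incoming version is strictly greater than the stored one; `accepts` is a hypothetical helper, not a library call.

```python
def accepts(stored_version: int, incoming_version: int) -> bool:
    """External-version rule (assumed): only strictly newer versions win."""
    return incoming_version > stored_version


stored_version = 734232221   # _version of the document in Elasticsearch (slide 12)
tombstone_offset = 13361612  # offset of the tombstone in partition 17 (slide 13)

delete_applied = accepts(stored_version, tombstone_offset)
print(delete_applied)  # prints False: the tombstone looks older than the document
```

Because the tombstone landed in a partition whose offsets were far lower, the delete was treated as a stale write and the document stayed in the index.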

16. 16

17. Somebody Changed the Number of Kafka Topic Partitions!
I opened the Grafana dashboard and noticed that a couple of months earlier the partition count had been increased from 6 to 24.

18. Problem
   1. Kafka guarantees ordering of messages for a key in a partition.
   2. But not across partitions for the same key!!!

19. The Technical Reason (1)
- Kafka assigns partitions to messages by hashing the key of the message
- But the increased partition count changed the function!
    partition_nr = hash(message.key) % partition_count
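The formula above can be sketched in a few lines to show how one key lands in different partitions once the count changes. This is an illustration under assumptions: `stable_hash` is a SHA-256-based stand-in chosen for reproducibility, whereas Kafka's default Java partitioner hashes the key bytes with murmur2, so the concrete partition numbers will differ from a real cluster.

```python
import hashlib


def stable_hash(key: str) -> int:
    """Stand-in hash (assumption): first 4 bytes of SHA-256 as an unsigned int."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:4], "big")


def partition_for(key: str, partition_count: int) -> int:
    """partition_nr = hash(message.key) % partition_count"""
    return stable_hash(key) % partition_count


key = "996229491"
before = partition_for(key, 6)   # partition while the topic had 6 partitions
after = partition_for(key, 24)   # partition after the increase to 24
print(before, after)
```

The hash of the key never changes; only the modulus does, which is enough to route later records for the same key to a partition with an unrelated offset sequence.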

20. The Technical Reason (2)
Most of the messages with a key were written to a different partition after the increase of the partition count:
    probability_of_error = 1 - (1 / partition_count)
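The share of keys that change partition can be estimated empirically. The sketch below uses a SHA-256-based stand-in hash (an assumption; Kafka's default partitioner uses murmur2) and synthetic numeric keys. The slide's formula is a rule of thumb: the exact share depends on both the old and the new count. For the 6 to 24 change with a uniform hash, a key keeps its partition only when hash % 24 is below 6, so roughly three quarters of the keys move.

```python
import hashlib
from typing import Iterable


def partition_for(key: str, partition_count: int) -> int:
    """Stand-in partitioner (assumption): SHA-256 prefix modulo the count."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % partition_count


def fraction_moved(keys: Iterable[str], old_count: int, new_count: int) -> float:
    """Share of keys whose partition differs between the two counts."""
    keys = list(keys)
    moved = sum(
        1 for k in keys
        if partition_for(k, old_count) != partition_for(k, new_count)
    )
    return moved / len(keys)


# Synthetic keys, purely for illustration.
sample_keys = [str(i) for i in range(100_000)]
moved_share = fraction_moved(sample_keys, 6, 24)
print(f"{moved_share:.3f}")
```

Either way, the majority of keyed messages end up in a partition other than the one that holds their history.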

21. Why would one increase the partition count?
- A partition is the unit of scalability in Kafka:
  - write scalability (a partition should fit on one node)
  - read scalability (consumers consume at least one partition)

22. Fix
- The fix required a full re-ingestion of data from the primary datastore into Kafka.
- It would have been enough to just write the data to differently named topics.
- However, we used the situation to upgrade the Kafka cluster from 1.1.1 to 2.4.0 (yes, another Kafka cluster).

23. How to prevent such a bug?
- Don’t increase the partition count if you rely on message ordering!
- Set sensible defaults in the Kafka settings.
- If you don't rely on offsets, e.g. messages have no meaningful key (think logging), then increasing the partition count will not cause any big trouble (just a rebalance of consumer groups).

24. Thank You!
