Don't change the partition count for Kafka topics!


The document discusses how increasing the partition count for a Kafka topic caused a "Heisenbug": the change altered the mapping from message keys to partitions, so messages with the same key ended up in different partitions, and Kafka guarantees ordering only within a partition. Because partition offsets were used as Elasticsearch document versions, Elasticsearch treated newer messages as stale and failed to delete documents as expected. The bug was fixed by fully re-ingesting the data into a new Kafka cluster. The key lesson is not to change the partition count if an application relies on the ordering of messages for a key.


Don't Change the Partition
Count for Kafka Topics!
Dainius Jocas, Staff Engineer @ Vinted
2021-04-08

Agenda
1. Intro
2. Setup
3. Heisenbug
4. Fix
5. Discussion

Intro
I'll tell the story of how we hunted down, and finally fixed, a Heisenbug in a
system that should have prevented it by design in the first place.
The story involves Kafka, Kafka Connect, Elasticsearch, optimistic concurrency
control, data inconsistencies, and an SRE with plenty of good intentions who,
through a series of unfortunate circumstances, caused a nasty bug.

Setup
A full description of the Elasticsearch indexing pipeline setup:
https://vinted.engineering/2021/01/12/elasticsearch-indexing-pipeline/

Setup: TL;DR
We use the Kafka topic partition offset as the Elasticsearch document version
number.
This trick allows us to parallelize indexing to Elasticsearch without worrying
about data consistency.
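
The idea can be sketched with a small in-memory stand-in for Elasticsearch external versioning (`version_type=external`), where a write is accepted only if its version is strictly greater than the stored one. The `VersionedIndex` class and its method names are hypothetical, made up for illustration; this is a simplified model, not the pipeline's actual code.

```python
from typing import Dict, Optional, Tuple


class VersionedIndex:
    """Toy model of an index that uses external versioning.

    Hypothetical stand-in: a write wins only if its version (here, the
    Kafka partition offset) is strictly greater than the stored version.
    """

    def __init__(self) -> None:
        # doc_id -> (version, body); body is None once the doc is deleted
        self._docs: Dict[str, Tuple[int, Optional[dict]]] = {}

    def apply(self, doc_id: str, offset: int, body: Optional[dict]) -> bool:
        """Apply an upsert (body is a dict) or a tombstone (body is None).

        Returns True if accepted, False on a version conflict.
        """
        current = self._docs.get(doc_id)
        if current is not None and offset <= current[0]:
            return False  # stale write: rejected
        self._docs[doc_id] = (offset, body)
        return True

    def get(self, doc_id: str) -> Optional[dict]:
        entry = self._docs.get(doc_id)
        return None if entry is None else entry[1]


index = VersionedIndex()
# Two workers deliver records of the same partition out of order.
accepted_new = index.apply("996229491", offset=11, body={"title": "v2"})
accepted_old = index.apply("996229491", offset=10, body={"title": "v1"})
print(accepted_new, accepted_old, index.get("996229491"))
```

Whichever worker arrives first, the record with the highest offset wins, which is what makes parallel indexing safe as long as all records for a key come from the same partition.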

Heisenbug
Elasticsearch fails to delete documents(!!!), i.e. serves stale data???

Works on My Machine
- Docker Compose cluster
- Integration tests are in place
- Works as expected

Testing
Tested the functionality in the shared testing environment:
● Single node Kafka
● Single node Kafka Connect cluster
● Single node Elasticsearch
Works as expected.

Let me try
- I sent a "tombstone" message (i.e. a Kafka record with a null body)
directly to the Kafka topic.
- Shockingly, the document was still present in the Elasticsearch index!!!

Once again
A document in an Elasticsearch index should have a _version equal to the
offset of the corresponding message in its Kafka topic partition.

Elasticsearch has this Document
$ curl prod:9200/core-items_20200329084723/_doc/996229491?_source=false | jq
{
  "_index": "core-items_20200329084723",
  "_type": "_doc",
  "_id": "996229491",
  "_version": 734232221,
  "_seq_no": 22502992,
  "_primary_term": 1,
  "found": true
}
Version is 734232221

Tombstone message
$ eim topic delete_records --topic=core-items --keys=996229491
{
  "offsets": [
    {
      "partition": 17,
      "offset": 13361612,
      "error_code": null,
      "error": null
    }
  ]
}
Version is 13361612

Eureka!
734232221
vs.
13361612
- The newer message has a lower offset???
- How come the "older" record has a higher offset???
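
Plugging the two numbers above into the external-versioning rule shows why the delete was ignored. A minimal sketch, assuming the rule is "accept only if the incoming version is strictly greater than the stored one"; the function name `is_accepted` is hypothetical.

```python
def is_accepted(stored_version: int, incoming_version: int) -> bool:
    """External-versioning rule: the incoming version must be strictly greater."""
    return incoming_version > stored_version


stored_version = 734232221    # _version of the document in Elasticsearch
tombstone_offset = 13361612   # offset of the tombstone in partition 17

delete_applied = is_accepted(stored_version, tombstone_offset)
print(delete_applied)  # False: the tombstone looks older than the document
```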

Somebody Changed the Number of Kafka Topic Partitions!
I've opened the Grafana dashboard and noticed that a couple of months ago
the partition count was increased from 6 to 24.

Problem
1. Kafka guarantees ordering of messages for a key in a partition.
2. But not across partitions for the same key!!!

The Technical Reason (1)
- Kafka assigns partitions to messages by hashing the key of the message
- But the increased partition count changed the function!
partition_nr = hash(message.key) % partition_count
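
A sketch of how the same key can land in a different partition once the count goes from 6 to 24. `zlib.crc32` is used here purely as a stand-in hash (the default partitioner of Kafka's Java client hashes the key bytes with murmur2), and the `partition_for` helper is hypothetical, so the concrete partition numbers will not match a real cluster.

```python
import zlib


def partition_for(key: str, partition_count: int) -> int:
    """partition_nr = hash(message.key) % partition_count (stand-in hash)."""
    return zlib.crc32(key.encode("utf-8")) % partition_count


OLD_COUNT, NEW_COUNT = 6, 24

# Illustrative keys only: consecutive ids starting at the one from the slides.
keys = [str(996229491 + i) for i in range(1000)]
moved = [k for k in keys
         if partition_for(k, OLD_COUNT) != partition_for(k, NEW_COUNT)]

for key in keys[:5]:
    print(key, partition_for(key, OLD_COUNT), "->", partition_for(key, NEW_COUNT))
print(f"{len(moved)} of {len(keys)} keys changed partition")
```

Under this formula a key keeps its partition only when its new partition number is below the old count. The tombstone above went to partition 17, which did not exist while the topic had 6 partitions.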

The Technical Reason (2)
Most of the messages with a key were written to a different partition after the
partition count was increased:
probability_of_error = 1 - (1 / partition_count)
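
The estimate above treats the new partition as if it were chosen uniformly at random. For a specific pair of counts, the share of moved keys can be counted directly. A sketch, assuming hash values are spread uniformly and the formula hash % partition_count is used; the function name `fraction_moved` is hypothetical.

```python
from fractions import Fraction
from math import gcd


def fraction_moved(old_count: int, new_count: int) -> Fraction:
    """Share of hash values h with h % old_count != h % new_count.

    Residues repeat with period lcm(old_count, new_count), so counting
    over one period gives the exact share for uniformly spread hashes.
    """
    period = old_count * new_count // gcd(old_count, new_count)
    moved = sum(1 for h in range(period) if h % old_count != h % new_count)
    return Fraction(moved, period)


print(fraction_moved(6, 24))  # 3/4
print(fraction_moved(6, 7))   # 6/7
```

For the 6 to 24 change this gives 3/4, because 6 divides 24 and a key stays put only when hash % 24 is below 6. For coprime counts such as 6 to 7, the result equals 1 - 1/7, which matches the estimate with partition_count set to the new count.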

Why would one increase the partition count?
- A partition is the unit of scalability in Kafka.
- write scalability (a partition should fit on one node)
- read scalability (each active consumer consumes at least one partition, so
the partition count caps consumer parallelism)

Fix
- Required a full re-ingestion of data from the primary datastore into Kafka.
- It would have been enough to just write the data to differently named topics.
- However, we used the situation to upgrade the Kafka cluster from 1.1.1 to
2.4.0 (yes, another Kafka cluster)

How to prevent such a bug?
- Don’t increase partition count if you rely on message ordering!
- Set sensible defaults in the Kafka settings.
- If you don't rely on offsets, e.g. messages have no meaningful key (think
logging), then increasing the partition count will not cause any big trouble
(just a rebalance of consumer groups).


CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record

Asst.prof M.Gokilavani

Double rodded leveling 1 pdf activity 01

KreezheaRecto

Call Girl Bhosari Indira Call Now: 8617697112 Bhosari Escorts Booking Contact Details WhatsApp Chat: +91-8617697112 Bhosari Escort Service includes providing maximum physical satisfaction to their clients as well as engaging conversation that keeps your time enjoyable and entertaining. Plus they look fabulously elegant; making an impressionable. Independent Escorts Bhosari understands the value of confidentiality and discretion - they will go the extra mile to meet your needs. Simply contact them via text messaging or through their online profiles; they'd be more than delighted to accommodate any request or arrange a romantic date or fun-filled night together. We provide –

(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7

Call Girls in Nagpur High Profile Call Girls

UNIT-III FMM. DIMENSIONAL ANALYSIS

rknatarajan

African Journal of Biological Sciences is an International peer-reviewed, Open Access journal that publishes original research articles as well as review articles in all areas of Biological Sciences. It operates a fully open access publishing model which allows open global access to its published content. This model is supported through Article Processing Charges. For more information on Article Processing charges click here. Its scope embraces Animal Sciences, Biochemistry, Bioinformatics, Biotechnology, Botany, Cell Biology, Developmental Biology, Ecology, Environmental Sciences, Ethno Medicine, Food Science, Freshwater Biology, Genetics, Immunology, Marine Biology, Microbiology, Molecular Biology, Physiology, Plant Sciences, Structural Biology,Toxicology,Zoology etc. It is essential that authors prepare their manuscripts according to established specifications. Failure to follow them may result in papers being delayed or rejected. Therefore, contributors are strongly encouraged to read the author guidelines carefully before preparing a manuscript for submission. The manuscripts should be checked carefully for grammatical, punctuation errors. All papers are subjected to peer review. All articles published in this journal represent the opinion of the authors and not reflect the official policy of the Journal of African Journal of Biological Sciences

Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...

Christo Ananth

N-Grade deals with the maintenance of university, department, faculty, student information within the university. N-Grade is an automation system, which is used to store the department, faculty, student, courses and information of a university. Starting from registration of a new student in the university, it maintains all the details regarding the attendance and marks of the students. The project deals with retrieval of information through an INTRANET based campus wide portal. It collects related information from all the departments of an organization and maintains files, which are used to generate reports in various forms to measure individual and overall performance of the students.

University management System project report..pdf

Kamal Acharya

Increased aeration of the soil; Stabilized soil structure; Higher and more diversified crop production; Better workability of the land; Earlier planting dates; Reduction of peak discharges by an increased temporary storage of water in the soil decomposition of organic matter; soil subsidence; reduced irrigation efficiency; increased risk of drought. excessive leaching of valuable nutrients from the soil; downstream environmental damage by salty or otherwise polluted drainage water; the presence of ditches, canals, and structures impending accessibility and interfering with other infrastructural elements of the land.

chapter 5.pptx: drainage and irrigation engineering

mulugeta48

Recently uploaded (20)

Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...

Coefficient of Thermal Expansion and their Importance.pptx

ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf

Online banking management system project.pdf

Vivazz, Mieres Social Housing Design Spain

AKTU Computer Networks notes --- Unit 3.pdf

Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...

Intze Overhead Water Tank Design by Working Stress - IS Method.pdf

The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...

Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...

BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx

Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...

UNIT - IV - Air Compressors and its Performance

CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record

Double rodded leveling 1 pdf activity 01

(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7

UNIT-III FMM. DIMENSIONAL ANALYSIS

Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...

University management System project report..pdf

chapter 5.pptx: drainage and irrigation engineering

Don't change the partition count for kafka topics!

1. Don't Change the Partition Count for Kafka Topics! Dainius Jocas, Staff Engineer @ Vinted 2021-04-08

2. Agenda 1. Intro 2. Setup 3. Heisenbug 4. Fix 5. Discussion

3. Intro I'll tell the story of how we hunted down a Heisenbug in a system that should have prevented it by design, and how we finally fixed it. The story involves Kafka, Kafka Connect, Elasticsearch, optimistic concurrency control, data inconsistencies, and an SRE with plenty of good intentions who, through a series of unfortunate circumstances, caused a nasty bug.

4. Setup A full description of the Elasticsearch indexing pipeline setup: https://vinted.engineering/2021/01/12/elasticsearch-indexing-pipeline/

6. Setup: TL;DR We use the Kafka topic partition offset as the Elasticsearch document version number. This trick lets us parallelize indexing into Elasticsearch without worrying about data consistency.
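
The versioning trick on slide 6 can be illustrated with a small in-memory model. This is a sketch rather than the pipeline's actual code: `VersionedIndex`, `VersionConflict`, and `apply_events` are hypothetical names, and the acceptance rule (a write wins only if its version is strictly greater than the stored one) is assumed to mirror Elasticsearch's external versioning. Keeping the version of a deleted document forever is also a simplification.

```python
class VersionConflict(Exception):
    """Raised when a write carries a version that is not newer than the stored one."""


class VersionedIndex:
    """Toy document store with an external-version check (hypothetical model)."""

    def __init__(self):
        # doc_id -> (version, body); body is None for a deleted document
        self._docs = {}

    def _accept(self, doc_id, version):
        stored = self._docs.get(doc_id)
        if stored is not None and version <= stored[0]:
            raise VersionConflict(f"{doc_id}: version {version} <= stored {stored[0]}")

    def index(self, doc_id, body, version):
        self._accept(doc_id, version)
        self._docs[doc_id] = (version, body)

    def delete(self, doc_id, version):
        self._accept(doc_id, version)
        self._docs[doc_id] = (version, None)

    def get(self, doc_id):
        stored = self._docs.get(doc_id)
        return None if stored is None else stored[1]


def apply_events(events):
    """Apply (offset, doc_id, body) events in the given order, skipping stale ones."""
    index = VersionedIndex()
    for offset, doc_id, body in events:
        try:
            if body is None:  # a tombstone: Kafka record with a null body
                index.delete(doc_id, offset)
            else:
                index.index(doc_id, body, offset)
        except VersionConflict:
            pass  # an older event arrived late; the newer state is kept
    return index


events = [(0, "item", {"title": "a"}), (1, "item", {"title": "b"}), (2, "item", None)]
in_order = apply_events(events)
out_of_order = apply_events(list(reversed(events)))
```

Both `in_order` and `out_of_order` end with the document deleted, which is why workers can index in parallel as long as offsets for a key only ever grow.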

7. Heisenbug Elasticsearch fails to delete documents(!!!), i.e. serves stale data???

8. Works on My Machine - Docker Compose cluster - Integration tests are in place - Works as expected

9. Testing Tested the functionality in the shared testing environment: ● Single node Kafka ● Single node Kafka Connect cluster ● Single node Elasticsearch Works as expected.

10. Let me try - I sent a “tombstone” message (i.e. a Kafka record with a null body) directly to the Kafka topic. - Shockingly, the document was still present in the Elasticsearch index!!!

11. Once again A document in an Elasticsearch index should have a _version equal to the offset of the corresponding message in its Kafka topic partition.

12. Elasticsearch has this Document
$ curl prod:9200/core-items_20200329084723/_doc/996229491?_source=false | jq
{
  "_index": "core-items_20200329084723",
  "_type": "_doc",
  "_id": "996229491",
  "_version": 734232221,
  "_seq_no": 22502992,
  "_primary_term": 1,
  "found": true
}
Version is 734232221

13. Tombstone message
$ eim topic delete_records --topic=core-items --keys=996229491
{
  "offsets": [
    {
      "partition": 17,
      "offset": 13361612,
      "error_code": null,
      "error": null
    }
  ]
}
Version is 13361612

14. Hmm? 734232221 vs. 13361612

15. Eureka! 734232221 vs. 13361612 - The newer message has a lower offset??? - How come the "older" record has a higher offset???

17. Somebody Changed the Number of Kafka Topic Partitions! I've opened the Grafana dashboard and noticed that a couple of months ago the partition count was increased from 6 to 24.

18. Problem 1. Kafka guarantees ordering of messages for a key in a partition. 2. But not across partitions for the same key!!!

19. The Technical Reason (1) - Kafka assigns partitions to messages by hashing the key of the message - But the increased partition count changed the function! partition_nr = hash(message.key) % partition_count
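
The effect described on slides 18 and 19 can be reproduced with a toy topic. This is only a sketch under stated assumptions: `Topic`, `partition_for`, and the key set are hypothetical, and MD5 stands in for the real hash (Kafka's Java client hashes key bytes with murmur2 by default), so the partition numbers will not match a real cluster. What it does preserve is the formula from the slide and the fact that each partition numbers its own offsets independently.

```python
import hashlib


def partition_for(key, partition_count):
    """partition_nr = hash(message.key) % partition_count, with MD5 as a stand-in hash."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % partition_count


class Topic:
    """Toy topic: a list of append-only partitions, each with its own offsets."""

    def __init__(self, partition_count):
        self.partitions = [[] for _ in range(partition_count)]

    def grow(self, new_count):
        # Existing records stay where they are; only empty partitions are added.
        self.partitions.extend([] for _ in range(new_count - len(self.partitions)))

    def send(self, key, value):
        partition = partition_for(key, len(self.partitions))
        self.partitions[partition].append((key, value))
        offset = len(self.partitions[partition]) - 1
        return partition, offset


topic = Topic(6)
keys = [str(i) for i in range(100)]
for key in keys:
    topic.send(key, "created")
# Remember where the latest update of every key landed.
last_update = {key: topic.send(key, "updated") for key in keys}

topic.grow(24)  # the partition count is increased from 6 to 24

# Pick a key whose partition changed and send its tombstone (null body).
moved_key = next(k for k in keys if partition_for(k, 6) != partition_for(k, 24))
tombstone = topic.send(moved_key, None)
```

Here `tombstone` lands in a freshly added, almost empty partition, so its offset is lower than the offset in `last_update[moved_key]` even though it was sent later. A sink that uses the offset as the document version would treat that delete as stale.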

20. The Technical Reason (2) Most of the messages with a key were written to a different partition after the increase of partition count: probability_of_error = 1 - (1 / partition_count)
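
The estimate on slide 20 can be checked for a concrete pair of partition counts. The sketch below is an assumption-laden model, not a measurement: `moved_fraction` is a hypothetical helper, and it assumes hash values are spread uniformly and that the partition is simply `hash % partition_count`. Under that model the pattern of assignments repeats every lcm(old, new) hash values, so one period can be enumerated exactly.

```python
from fractions import Fraction
from math import gcd


def moved_fraction(old_count, new_count):
    """Exact share of uniformly distributed hash values whose partition changes."""
    period = old_count * new_count // gcd(old_count, new_count)  # lcm(old, new)
    moved = sum(1 for h in range(period) if h % old_count != h % new_count)
    return Fraction(moved, period)


six_to_twenty_four = moved_fraction(6, 24)
six_to_seven = moved_fraction(6, 7)
```

With these assumptions, `six_to_seven` equals 6/7, which matches 1 - (1 / partition_count) for the new count, while `six_to_twenty_four` equals 3/4 because 6 divides 24 and every hash value below 6 modulo 24 keeps its partition. Either way, most keyed messages end up in a different partition.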

21. Why would one increase the partition count? - A partition is the unit of scalability in Kafka. - write scalability (should fit in one node) - read scalability (consumers consume at least one partition)

22. Fix - Required a full re-ingestion of data from the primary datastore into Kafka. - It'd be enough to just write data to differently named topics. - However, we used the situation to upgrade the Kafka cluster from 1.1.1 to 2.4.0 (yes, another Kafka cluster)

23. How to prevent such a bug? - Don’t increase the partition count if you rely on message ordering! - Set sensible defaults in Kafka settings. - If you don't rely on offsets, e.g. messages have no meaningful key (think logging), then an increase of the partition count will not cause any big trouble (just a rebalance of consumer groups).

24. Thank You!
