"In this talk, we delve into a specific detail of the latest evolution of Kafka's consumer group protocol as introduced in KIP-848 - Server-Side Assignors. This transformation not only streamlines client operations but also introduces a new dimension of efficiency and flexibility in message processing.
Before exploring the features of server-side assignors, it's essential to understand the fundamental concept of Kafka consumer groups. In Kafka, a consumer group is a collection of consumers working together to process data from one or more topics. The assignors are responsible for dividing the partitions of the topics the group subscribes to among its consumers.
The server-side assignor is pluggable, and a client can choose the one it wants to use by providing its name in the heartbeat request. Two assignors are configured in the group coordinator: Range and Uniform.
The Range Assignor, known for distributing partitions evenly across consumers, ensures that each consumer handles at least one partition per topic, making it a good fit for scenarios with fluctuating topic demands. The Uniform Assignor, which comes in two flavors (Optimized and General), takes a more adaptive approach, selecting the most suitable assignment strategy based on the group's subscription pattern.
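To make the division concrete, here is a minimal, hypothetical sketch of a range-style assignment in Python (illustration only; the real assignors run inside the group coordinator and are implemented in Java):

```python
# Toy sketch of range-style partition assignment: each topic's partitions
# are split into contiguous ranges, one range per consumer. This is NOT
# the broker-side implementation from KIP-848, just an illustration.

def range_assign(topics: dict[str, int], consumers: list[str]) -> dict[str, list[tuple[str, int]]]:
    """topics maps topic name -> partition count; returns member -> [(topic, partition)]."""
    assignment = {c: [] for c in consumers}
    members = sorted(consumers)  # stable order so assignment is deterministic
    for topic, num_partitions in sorted(topics.items()):
        base, extra = divmod(num_partitions, len(members))
        start = 0
        for i, member in enumerate(members):
            count = base + (1 if i < extra else 0)  # first `extra` members get one more
            assignment[member].extend((topic, p) for p in range(start, start + count))
            start += count
    return assignment

if __name__ == "__main__":
    print(range_assign({"orders": 5}, ["c1", "c2"]))
```

With 5 partitions and 2 consumers, `c1` receives partitions 0–2 and `c2` receives 3–4, showing how ranges stay contiguous per topic.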
By the end of this session, attendees will gain a deeper understanding of Kafka's new server-side assignors, equipped with the knowledge to make informed decisions for optimizing their consumer group configurations."
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa... (Hosted by Confluent)
"In this talk, attendees will be provided with an introduction to Kafka Connect and the basics of Single Message Transforms (SMTs) and how they can be used to transform data streams in a simple and efficient way. SMTs are a powerful feature of Kafka Connect that allow custom logic to be applied to individual messages as they pass through the data pipeline. The session will explain how SMTs work, the types of transformations they can be used for, and how they can be applied in a modular and composable way.
Further, the session will discuss where SMTs fit in with Kafka Connect and when they should be used. Examples will be provided of how SMTs can be used to solve common data integration challenges, such as data enrichment, filtering, and restructuring. Attendees will also learn about the limitations of SMTs and when it might be more appropriate to use other tools or frameworks.
Additionally, an overview of the alternatives to SMTs, such as Kafka Streams and KSQL, will be provided. This will help attendees make an informed decision about which approach is best for their specific use case.
Whether attendees are developers, data engineers, or data scientists, this talk will provide valuable insights into how Kafka Connect and SMTs can help streamline data processing workflows. Attendees will come away with a better understanding of how these tools work and how they can be used to solve common data integration challenges."
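As a concrete, hypothetical illustration of the modular chaining described above, a connector configuration might apply two built-in SMTs in sequence, first enriching each record and then extracting a field (the transform aliases `addSource` and the `payload` field name are invented for this sketch):

```properties
# Hypothetical sink connector snippet: SMTs run in the order listed.
transforms=addSource,extractPayload
# 1) add a static provenance field to every record value
transforms.addSource.type=org.apache.kafka.connect.transforms.InsertField$Value
transforms.addSource.static.field=origin
transforms.addSource.static.value=orders-pipeline
# 2) keep only the nested "payload" field as the new record value
transforms.extractPayload.type=org.apache.kafka.connect.transforms.ExtractField$Value
transforms.extractPayload.field=payload
```

Because each SMT sees one record at a time, chains like this stay composable but are best kept short; heavier transformations are where Kafka Streams or ksqlDB come in.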
"While Apache Kafka lacks native support for topic renaming, there are scenarios where renaming topics becomes necessary. This presentation will delve into the utilization of MirrorMaker 2.0 as a solution for renaming Kafka topics. It will illustrate how MirrorMaker 2.0 can efficiently facilitate the migration of messages from the old topic to the new one and how Kafka Connect Metrics can be employed to monitor the mirroring progress. The discussion will encompass the complexity of renaming Kafka topics, addressing certain limitations, and exploring potential workarounds when using MirrorMaker 2.0 for this purpose. Despite not being originally designed for topic renaming, MirrorMaker 2.0 has a suitable solution for renaming Kafka topics.
Blog Post : https://engineering.hellofresh.com/renaming-a-kafka-topic-d6ff3aaf3f03"
Evolution of NRT Data Ingestion Pipeline at Trendyol
"Trendyol, Turkey's leading e-commerce company, is committed to positively impacting the lives of millions of customers. Our decision-making processes are entirely driven by data. As a data warehouse team, our primary goal is to provide accurate and up-to-date data, enabling the extraction of valuable business insights.
We utilize Kafka and Kafka Connect to transfer data from source systems to our analytical environment. We recently transitioned our Kafka Connect clusters from on-premise VMs to Kubernetes. This shift was driven by our need to manage rapid growth (marked by a growing number of producers, consumers, and daily messages) while ensuring proper monitoring and consistency. Consistency is crucial, especially where we employ Single Message Transforms to manipulate records, such as filtering them by key or converting a JSON object into a JSON string.
Monitoring our clusters' health is key, and we achieve this through Grafana dashboards and alerts generated from kube-state-metrics. Additionally, Kafka Connect's JMX metrics, coupled with New Relic, are employed for comprehensive monitoring.
The session will aim to explain our approach to NRT data ingestion, outlining the role of Kafka and Kafka Connect, our transition journey to K8s, and methods employed to monitor the health of our clusters."
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques
"Join our lightning talk to delve into the strategies vital for maintaining a resilient Kafka service.
While proactive monitoring is key for issue prevention, failures will still occur. Rapid detection tools will enable you to identify and resolve problems before they impact end-users. This session explores the techniques employed by Kafka cloud providers for this detection, many of which are also applicable if you are managing independent Kafka clusters or applications.
The talk focuses on health-checking, a powerful technique that pairs an application with monitoring to validate the availability of a Kafka environment. The session walks through Kafka health-check methods, sharing best practices, identifying common pitfalls, and highlighting the monitoring of critical performance metrics like throughput and latency for early issue detection.
Attendees will gain valuable insights into the art of health-checking their Kafka environment, equipping them with the tools to identify and address issues before they escalate into critical problems. We invite all Kafka enthusiasts to join us in this talk to foster a deeper understanding of Kafka health-checking and ensure the continued smooth operation of your Kafka environment."
Exactly-once Stream Processing with Arroyo and Kafka
"Stream processing systems traditionally gave their users the choice between at least once processing and at most once processing: accepting duplicate data or missing data. But ideally we would provide exactly-once processing, where every event in the input data is represented exactly once in the output.
Kafka provides a transaction API that enables exactly-once processing when Kafka is both your source and sink. But this API has turned out not to be well suited for use by high-level streaming systems, requiring various workarounds to still provide transactional processing.
In this talk, I’ll cover how the transaction API works, and how systems like Arroyo and Flink have used it to build exactly-once support, and how improvements to the transactional API will enable better end-to-end support for consistent stream processing."
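The atomicity the transaction API provides can be illustrated with a toy model (this is not the real API; it only mimics the read-process-write pattern, in which output records and consumer offsets commit together):

```python
# Toy model of Kafka's transactional read-process-write loop: output
# records and the consumer offset commit atomically, so retrying after
# a crash never yields duplicates in the committed output.
# Illustration only -- not the actual Kafka client API.

class TransactionalLog:
    def __init__(self):
        self.committed_output = []
        self.committed_offset = 0

    def run_transaction(self, source, process, fail=False):
        """Process all unconsumed records; commit output + offset atomically."""
        staged = [process(r) for r in source[self.committed_offset:]]
        if fail:
            return  # abort: nothing staged ever becomes visible
        # atomic commit of both the output records and the consumer offset
        self.committed_output.extend(staged)
        self.committed_offset = len(source)

source = ["a", "b", "c"]
log = TransactionalLog()
log.run_transaction(source, str.upper, fail=True)  # crash before commit
log.run_transaction(source, str.upper)             # retry reprocesses all
print(log.committed_output)  # committed output contains no duplicates
```

The key property is that the failed attempt leaves both the offset and the output untouched, so the retry starts from the same place; without that atomicity, a crash between writing output and committing the offset produces duplicates.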
"In this talk, we will explore the exciting world of IoT and computer vision by presenting a unique project: Fish Plays Pokemon. Using an ESP Eye camera connected to an ESP32 and other IoT devices, to monitor fish's movements in an aquarium.
This project showcases the power of IoT and computer vision, demonstrating how even a fish can play a popular video game. We will discuss the challenges we faced during development, including real-time processing, IoT device integration, and Kafka message consumption.
By the end of the talk, attendees will have a better understanding of how to combine IoT, computer vision, and the usage of a serverless cloud to create innovative projects. They will also learn how to integrate IoT devices with Kafka to simulate keyboard behavior, opening up endless possibilities for real-time interactions between the physical and digital worlds."
What is tiered storage and what is it good for? After this session you will know how to leverage the tiered storage feature to enable longer retention than the storage attached to your brokers allows. You will get acquainted with the different configuration options and know what to expect when you enable the feature, such as when the first upload to remote object storage will take place.
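As a rough sketch of what enabling the feature involves (property names follow KIP-405 and the upstream docs; the values are illustrative only, and a RemoteStorageManager plugin must also be configured):

```properties
# Broker side: turn on the tiered storage subsystem (requires a
# RemoteStorageManager plugin for your object store).
remote.log.storage.system.enable=true

# Topic side: tier this topic, keep ~1 day on local disk,
# 30 days of total retention (local + remote).
remote.storage.enable=true
local.retention.ms=86400000
retention.ms=2592000000
```

Since only closed (rolled) log segments are eligible for tiering, the first upload to object storage typically happens once the active segment rolls, not immediately after the feature is enabled.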
Building a Self-Service Stream Processing Portal: How And Why
"Real-time 24/7 monitoring and verification of massive data is challenging – even more so for the world’s second largest manufacturer of memory chips and semiconductors. Tolerance levels are incredibly small, any small defect needs to be identified and dealt with immediately. The goal of semiconductor manufacturing is to improve yield and minimize unnecessary work.
However, even with real-time data collection, the data was not easy for users to manipulate, and it took many days to fulfill stream processing requests, limiting its usefulness and value to the business.
You’ll hear why SK hynix switched to Confluent and how we developed a self-service stream processing portal on top of it. Now users have an easy-to-use service to manipulate the data they want.
Results have been impressive: stream processing requests are now fulfilled the same day, down from the 5 days they previously took! We were also able to drive down costs by 10%, as stream processing requests no longer require additional hardware.
What you’ll take away from our talk:
- The pain points in the previous environment
- How we transitioned to Confluent without service downtime
- How we created a self-service stream processing portal on top of Connect and ksqlDB
- Use cases of the stream processing portal"
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
"In this talk, attendees will be provided with an introduction to Kafka Connect and the basics of Single Message Transforms (SMTs) and how they can be used to transform data streams in a simple and efficient way. SMTs are a powerful feature of Kafka Connect that allow custom logic to be applied to individual messages as they pass through the data pipeline. The session will explain how SMTs work, the types of transformations they can be used for, and how they can be applied in a modular and composable way.
Further, the session will discuss where SMTs fit in with Kafka Connect and when they should be used. Examples will be provided of how SMTs can be used to solve common data integration challenges, such as data enrichment, filtering, and restructuring. Attendees will also learn about the limitations of SMTs and when it might be more appropriate to use other tools or frameworks.
Additionally, an overview of the alternatives to SMTs, such as Kafka Streams and KSQL, will be provided. This will help attendees make an informed decision about which approach is best for their specific use case.
Whether attendees are developers, data engineers, or data scientists, this talk will provide valuable insights into how Kafka Connect and SMTs can help streamline data processing workflows. Attendees will come away with a better understanding of how these tools work and how they can be used to solve common data integration challenges."
"While Apache Kafka lacks native support for topic renaming, there are scenarios where renaming topics becomes necessary. This presentation will delve into the utilization of MirrorMaker 2.0 as a solution for renaming Kafka topics. It will illustrate how MirrorMaker 2.0 can efficiently facilitate the migration of messages from the old topic to the new one and how Kafka Connect Metrics can be employed to monitor the mirroring progress. The discussion will encompass the complexity of renaming Kafka topics, addressing certain limitations, and exploring potential workarounds when using MirrorMaker 2.0 for this purpose. Despite not being originally designed for topic renaming, MirrorMaker 2.0 has a suitable solution for renaming Kafka topics.
Blog Post : https://engineering.hellofresh.com/renaming-a-kafka-topic-d6ff3aaf3f03"
Evolution of NRT Data Ingestion Pipeline at TrendyolHostedbyConfluent
"Trendyol, Turkey's leading e-commerce company, is committed to positively impacting the lives of millions of customers. Our decision-making processes are entirely driven by data. As a data warehouse team, our primary goal is to provide accurate and up-to-date data, enabling the extraction of valuable business insights.
We utilize the benefits provided by Kafka and Kafka Connect to facilitate the transfer of data from the source to our analytical environment. We recently transitioned our Kafka Connect clusters from on-premise VMs to Kubernetes. This shift was driven by our desire to effectively manage rapid growth(marked by a growing number of producers, consumers, and daily messages), ensuring proper monitoring and consistency. Consistency is crucial, especially in instances where we employ Single Message Transforms to manipulate records like filtering based on their keys or converting a JSON Object into a JSON string.
Monitoring our cluster's health is key and we achieve this through Grafana dashboards and alerts generated through kube-state-metrics. Additionally, Kafka Connect's JMX metrics, coupled with NewRelic, are employed for comprehensive monitoring.
The session will aim to explain our approach to NRT data ingestion, outlining the role of Kafka and Kafka Connect, our transition journey to K8s, and methods employed to monitor the health of our clusters."
Ensuring Kafka Service Resilience: A Dive into Health-Checking TechniquesHostedbyConfluent
"Join our lightning talk to delve into the strategies vital for maintaining a resilient Kafka service.
While proactive monitoring is key for issue prevention, failures will still occur. Rapid detection tools will enable you to identify and resolve problems before they impact end-users. This session explores the techniques employed by Kafka cloud providers for this detection, many of which are also applicable if you are managing independent Kafka clusters or applications.
The talk focuses on health-checking, a powerful tool that encompasses an application and its monitoring to validate Kafka environment availability. The session navigates through Kafka health-check methods, sharing best practices, identifying common pitfalls, and highlighting the monitoring of critical performance metrics like throughput and latency for early issue detection.
Attendees will gain valuable insights into the art of health-checking their Kafka environment, equipping them with the tools to identify and address issues before they escalate into critical problems. We invite all Kafka enthusiasts to join us in this talk to foster a deeper understanding of Kafka health-checking and ensure the continued smooth operation of your Kafka environment."
Exactly-once Stream Processing with Arroyo and KafkaHostedbyConfluent
"Stream processing systems traditionally gave their users the choice between at least once processing and at most once processing: accepting duplicate data or missing data. But ideally we would provide exactly-once processing, where every event in the input data is represented exactly once in the output.
Kafka provides a transaction API that enables exactly-once when using Kafka as your source and sink. But this API has turned out to not be well suited for use by high level streaming systems, requiring various work arounds to still provide transactional processing.
In this talk, I’ll cover how the transaction API works, and how systems like Arroyo and Flink have used it to build exactly-once support, and how improvements to the transactional API will enable better end-to-end support for consistent stream processing."
"In this talk, we will explore the exciting world of IoT and computer vision by presenting a unique project: Fish Plays Pokemon. Using an ESP Eye camera connected to an ESP32 and other IoT devices, to monitor fish's movements in an aquarium.
This project showcases the power of IoT and computer vision, demonstrating how even a fish can play a popular video game. We will discuss the challenges we faced during development, including real-time processing, IoT device integration, and Kafka message consumption.
By the end of the talk, attendees will have a better understanding of how to combine IoT, computer vision, and the usage of a serverless cloud to create innovative projects. They will also learn how to integrate IoT devices with Kafka to simulate keyboard behavior, opening up endless possibilities for real-time interactions between the physical and digital worlds."
What is tiered storage and what is it good for? After this session you will know how to leverage the tiered storage feature to enable longer retention than the storage attached to brokers allows. You will get acquainted with the different configuration options and know what to expect when you enable the feature, like for example when will the first upload to the remote object storage take place.
Building a Self-Service Stream Processing Portal: How And WhyHostedbyConfluent
"Real-time 24/7 monitoring and verification of massive data is challenging – even more so for the world’s second largest manufacturer of memory chips and semiconductors. Tolerance levels are incredibly small, any small defect needs to be identified and dealt with immediately. The goal of semiconductor manufacturing is to improve yield and minimize unnecessary work.
However, even with real-time data collection, the data was not easy to manipulate by users and it took many days to enable stream processing requests – limiting its usefulness and value to the business.
You’ll hear why SK hynix switched to Confluent and how we developed a self-service stream process portal on top of it. Now users have an easy-to-use service to manipulate the data they want.
Results have been impressive, stream processing requests are available the same day – previously taking 5 days! We were also able to drive down costs by 10% as stream processing requests no longer require additional hardware.
What you’ll take away from our talk:
- What were the pain points in the previous environment
- How we transitioned to Confluent without service downtime
- Creating a self-service stream processing portal built on top of Connect and ksqlDB
- Use case of stream process portal"
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
"Discover how default configurations might impact ingestion times, especially when dealing with large files. We'll explore a real-world scenario with a 20,000,000+ line file, assessing metrics and exploring the bottleneck in the default setup. Understand the intricacies of batch size calculations and how to optimize them based on your unique data characteristics.
Walk away with actionable insights as we showcase a practical example, turning a 7-hour ingestion process into a mere 30 minutes for over 30,000,000 records in a Kafka topic. Uncover metrics, configurations, and best practices to elevate the performance of your Kafka Connect CSV source connectors. Don't miss this opportunity to optimize your data pipeline and ensure smooth, efficient data flow."
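To see why batch sizing can dominate ingestion time, here is a hypothetical back-of-envelope model (the numbers are invented, not the talk's measurements): a fixed per-batch overhead, multiplied across millions of tiny batches, dwarfs the per-record cost.

```python
# Back-of-envelope model (hypothetical numbers): total ingestion time as
# per-batch overhead plus per-record cost. Small batches multiply the
# overhead; larger batches amortize it.

def ingestion_time_s(total_records: int, batch_size: int,
                     per_batch_overhead_s: float, per_record_s: float) -> float:
    batches = -(-total_records // batch_size)  # ceiling division
    return batches * per_batch_overhead_s + total_records * per_record_s

records = 30_000_000
slow = ingestion_time_s(records, 500, 0.4, 1e-6)     # tiny batches
fast = ingestion_time_s(records, 50_000, 0.4, 1e-6)  # larger batches
print(f"small batches: {slow / 3600:.1f} h, large batches: {fast / 60:.1f} min")
```

The exact overhead per batch depends on the connector and broker round-trips, but the shape of the calculation is the same: count your batches before tuning anything else.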
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
"In order to meet the current and ever-increasing demand for near-zero RPO/RTO systems, a focus on resiliency is critical. While Kafka offers built-in resiliency features, a perfect blend of client and cluster resiliency is necessary in order to achieve a highly resilient Kafka client application.
At Fidelity Investments, Kafka is used for a variety of event streaming needs such as core brokerage trading platforms, log aggregation, communication platforms, and data migrations. In this lightning talk, we will discuss the governance framework that has enabled producers and consumers to achieve their SLAs during unprecedented failure scenarios. We will highlight how we automated resiliency tests through chaos engineering and tightly integrated observability dashboards for Kafka clients to analyze and optimize client configurations. And finally, we will summarize the chaos test suite and the "test, test and test" mantra that are helping Fidelity Investments reach its goal of a future with zero down-time."
Navigating Private Network Connectivity Options for Kafka Clusters
"There are various strategies for securely connecting to Kafka clusters between different networks or over the public internet. Many cloud providers even offer endpoints that privately route traffic between networks and are not exposed to the internet. But, depending on your network setup and how you are running Kafka, these options ... might not be an option!
In this session, we’ll discuss how you can use SSH bastions or a self-managed PrivateLink endpoint to establish connectivity to your Kafka clusters without exposing brokers directly to the internet. We explain the required network configuration and show how we at Materialize have contributed to librdkafka to simplify these scenarios and avoid fragile workarounds."
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
"In my talk, we will examine all the stages of building our self-service Streaming Data Platform based on Apache Flink and Kafka Connect, from the selection of a solution for stateful streaming data processing, right up to the successful design of a robust self-service platform, covering the challenges that we’ve met.
I will share our experience in providing non-Java developers with a company-wide self-service solution, which allows them to quickly and easily develop their streaming data pipelines.
Additionally, I will highlight specific business use cases that would not have been implemented without our platform."
Explaining How Real-Time GenAI Works in a Noisy Pub
"Almost everyone has heard about large language models, and tens of millions of people have tried out OpenAI ChatGPT and Google Bard. However, the intricate architecture and underlying mathematics driving these remarkable systems remain elusive to many.
LLMs are fascinating - so let's grab a drink, find out how these systems are built, and dive deep into their inner workings. In the length of time it takes to enjoy a round of drinks, you'll understand how these models work. We'll take our first sip of word vectors, enjoy the refreshing taste of the transformer, and drain a glass understanding how these models are trained on phenomenally large quantities of data.
Large language models for your streaming application - explained with a little maths and a lot of pub stories"
"Monitoring is a fundamental operation when running Kafka and Kafka applications in production. There are numerous metrics available when using Kafka, however the sheer number is overwhelming, making it challenging to know where to start and how to properly utilise them.
This session will introduce you to some of the key metrics that should be monitored and best practices for fine-tuning your monitoring. We will delve into which metrics are the best indicators of a cluster's availability and performance, and which are the most helpful when debugging client applications."
Kafka Streams relies on state restoration both for maintaining standby tasks as a failure-recovery mechanism and for restoring state after rebalances. When you are scaling your application instances up or down, you need to know the current state of the restoration process for each active and standby task in order to keep the restoration process as short as possible. During this presentation, you will learn how KIP-869 provides valuable information about active task restoration after a rebalance, and how KIP-988 opens a window into the continuous process of standby restoration. When you need to decide whether to scale your application instances up or down, both KIPs will be an invaluable ally.
Mastering Kafka Producer Configs: A Guide to Optimizing Performance
"In this talk, we will dive into the world of Kafka producer configs and explore how to understand and optimize them for better performance. We will cover the different types of configs, their impact on performance, and how to tune them to achieve the best results. Whether you're new to Kafka or a seasoned pro, this session will provide valuable insights and practical tips for improving your Kafka producer performance.
- Introduction to Kafka producer internals and workflow
- Understanding producer configs like linger.ms, batch.size, and buffer.memory, and their impact on performance
- Learning about configs like max.block.ms, delivery.timeout.ms, request.timeout.ms, and retries to make the producer more resilient
- Discussing configs like enable.idempotence, max.in.flight.requests.per.connection, and transaction-related configs to achieve delivery guarantees
- Q&A session with attendees to address specific questions and concerns."
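As a hypothetical starting point (the values are illustrative, not universal recommendations), the configs discussed above might be grouped by goal like this:

```python
# Producer config sketch, grouped by tuning goal. Keys are standard
# Kafka producer configs; the values are illustrative starting points
# to measure against, not recommendations for every workload.

throughput = {
    "linger.ms": 20,           # wait up to 20 ms to fill larger batches
    "batch.size": 131072,      # 128 KiB batches
    "compression.type": "lz4",
    "buffer.memory": 67108864,  # 64 MiB of unsent-record buffer
}

resilience = {
    "delivery.timeout.ms": 120000,  # overall cap on send attempts
    "request.timeout.ms": 30000,
    "retries": 2147483647,          # retry freely; delivery.timeout.ms bounds it
}

guarantees = {
    "enable.idempotence": True,
    "acks": "all",
    "max.in.flight.requests.per.connection": 5,  # safe ordering with idempotence
}

producer_config = {**throughput, **resilience, **guarantees}
print(sorted(producer_config))
```

The groupings interact: raising linger.ms trades latency for throughput, while the guarantees group constrains what the resilience settings may retry without reordering or duplicating records.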
Data Contracts Management: Schema Registry and Beyond
"Data contracts are one of the hottest topics in the data management community. A data contract is a formal agreement between a data producer and its consumers, aimed at reducing data downtime and improving data quality. Schemas are an important part of data contracts, but they are not the only relevant element.
In this talk, we’ll:
1. see why data contracts are so important but also difficult to implement;
2. identify the characteristics of a well-designed data contract, discussing its anatomy, its main elements, and how to formally describe them;
3. show how to manage the lifecycle of a data contract leveraging Confluent Platform's services."
"In the realm of stateful stream processing, Apache Flink has emerged as a powerful and versatile platform. However, the conventional SQL-based approach often limits the full potential of Flink applications.
We will delve into the benefits of adopting a code-first approach, which provides developers with greater control over application logic, facilitates complex transformations, and enables more efficient handling of state and time. We will also discuss how the code-first approach can lead to more maintainable and testable code, ultimately improving the overall quality of your Flink applications.
Whether you're a seasoned Flink developer or just starting your journey, this talk will provide valuable insights into how a code-first approach can revolutionize your stream processing applications."
Debezium vs. the World: An Overview of the CDC Ecosystem
"Change Data Capture (CDC) has become a commodity in data engineering, much in part due to the ever-rising success of Debezium [1]. But is that all there is? In this lightning talk, we’ll outline the current state of the CDC ecosystem, and understand why adopting a Debezium alternative is still a hard sell. If you’ve ever wondered what else is out there, but can’t keep up with the sprawling of new tools in the ecosystem; we’ll wrap it up for you!
[1] https://debezium.io/"
Beyond Tiered Storage: Serverless Kafka with No Local Disks
"Separation of compute and storage has become the de-facto standard in the data industry for batch processing.
The addition of tiered storage to open source Apache Kafka is the first step in bringing true separation of compute and storage to the streaming world.
In this talk, we'll discuss in technical detail how to take the concept of tiered storage to its logical extreme by building an Apache Kafka protocol compatible system that has zero local disks.
Eliminating all local disks in the system requires not only separating storage from compute, but also separating data from metadata. This is a monumental task that requires reimagining Kafka's architecture from the ground up, but the benefits are worth it.
This approach enables a stateless, elastic, and serverless deployment model that minimizes operational overhead and also drives inter-zone networking costs to almost zero."
Automating Speed: A Proven Approach to Preventing Performance Regressions in ...
"Regular performance testing is one of the pillars of Kafka Streams’ reliability and efficiency. Beyond ensuring dependable releases, regular performance testing supports engineers in new feature development with the ability to easily test the performance impact of their features, compare different approaches, etc.
In this session, Alex and John share their experience from developing, using, and maintaining a performance testing framework for Kafka Streams that has prevented multiple performance regressions over the last 5 years. They cover guiding principles and architecture, how to ensure statistical significance and stability of results, and how to automate regression detection for actionable notifications.
This talk sheds light on how Apache Kafka is able to foster a vibrant open-source community while maintaining a high performance bar across many years and releases. It also empowers performance-minded engineers to avoid common pitfalls and bring high-quality performance testing to their own systems."
How to Build an Event-based Control Center for the Electrical Grid
"The energy transition has brought with it a wave of new challenges for the high-voltage grid. One of the central problems is the volatility of renewable energies. A future-proof control center must have enough flexibility to meet the demands of the energy market in real time.
Existing industrial systems cannot meet the requirements of modern grid operators. Our projects aim to solve these challenges with sophisticated event-based architectures, with Apache Kafka at their core. These architectures need to satisfy complex requirements like switching highly critical grid assets and integrating hundreds of thousands of sensors. We used several event-based patterns, such as sagas, event sourcing, and more, to overcome these challenges. Additionally, we had to integrate the OT systems (IEC, OPC UA) into this modern tech stack.
Some projects are already in shadow operation and are expected to steer their first grid segments in 2025.
Keep Your Kafka Cloud Costs in Check with Showbacks
"Apache Kafka is the cool kid on the block when it comes to distributed streaming platforms. It's gained popularity in cloud environments due to its scalability, fault tolerance, and real-time data processing capabilities. As more organizations move their data processing and analysis workflows to the cloud, Kafka has become an essential part of their architecture.
But, let's face it, running Kafka in the cloud can be a bit of a headache. It requires careful planning and execution to ensure optimal performance, reliability, and cost-effectiveness. In this talk, we're going to introduce you to a little something called "showbacks". Showbacks are a way of giving users feedback on the resources they're consuming and the associated costs. It's like telling your friends to keep their hands out of the cookie jar before they eat all the cookies. It also works with developers and partitions.
We'll even give you some real-world examples of how New Relic has successfully implemented showbacks to achieve cost savings and operational efficiency in our Kafka deployments, including auto-scaling, Availability Zone awareness, and client tuning. We promise to keep it fun, technical, and full of practical insights and actionable recommendations to help you achieve optimal Kafka performance and cost-effectiveness in the cloud."
When Securing Access to Data is About Life and DeathHostedbyConfluent
"The Norwegian health care sector's success depends on the security and availability of enormous amounts of critical data. The state of affairs is that data does not flow. At Norsk helsenett we have recently started using Kafka to secure this flow.
Our critical data is your critical data, and a matter of life and death. This is how we do it:
- Redundancy to us starts with the number 3: creating the infrastructure and using it to manage "always available" services and data, working towards zero downtime and zero data loss.
- Security is paramount: how did we implement mTLS?
- How do we monitor and support usage of data we cannot look at, using the benefits of .NET and SignalR in Blazor together with Kafka?"
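"Redundancy starts with the number 3" maps onto concrete Kafka settings. The sketch below shows one common way to express it; the specific values are an assumption about what such a setup typically looks like, not the configuration described in the talk.

```python
# Three replicas per partition, and at least two must be in sync before a
# write is acknowledged. Values are illustrative.
topic_config = {
    "replication.factor": 3,     # every partition lives on three brokers
    "min.insync.replicas": 2,    # a write needs two live replicas to succeed
}

producer_config = {
    "acks": "all",               # wait for all in-sync replicas to confirm
    "enable.idempotence": True,  # retries cannot introduce duplicates
}

# With these settings a topic tolerates the loss of one broker with no data
# loss and no producer downtime, which is the "always available" goal above.
```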
Aggregating Ad Events with Kafka Streams and Interactive Queries at InvidiHostedbyConfluent
"Invidi's ad decisioning engine needs near-real-time feedback on the performance of the ad campaigns it runs.
At the heart of this feedback loop is a service that aggregates 1B+ daily ad tracking events and serves campaign performance time series to the ad decisioning engine over HTTP. Recently we successfully rewrote it as a pure Kafka Streams application, with all data stored in Kafka and served via Interactive Queries.
The experience was less straightforward than expected, and we had to trade away some of the simplicity of our processing topology to increase scalability and lower resource consumption.
In this talk we plan to go over the system architecture and share the issues we faced and how we solved them.
Here are some highlights:
- The distribution of our aggregation keys is very skewed, so early repartitioning resulted in poor scalability. To mitigate this issue we used a scatter-gather approach, avoiding repartitioning and combining results in IQ. To minimize memory consumption we had to combine this with pre-aggregating events before repartitioning, in a lambda-architecture style.
- Because multiple stores share buffer memory, we had to resort to manually deleting entries in our live windowed stores to avoid premature flushing of the aggregates due to cache thrashing.
- We had to implement our own in-memory windowed store to increase IQ performance.
We hope that our findings will be helpful to a wider audience.
We also plan to file and fix the issues we discovered in the near future."
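The pre-aggregate-then-gather idea from the first highlight can be sketched outside Kafka Streams: each worker collapses its raw events into per-key partials locally, so a skewed hot key ships one partial aggregate instead of millions of raw events, and the query side merges the partials. This is a toy illustration of the pattern, not Invidi's topology.

```python
from collections import Counter

def pre_aggregate(events):
    """Scatter step: collapse raw (key, value) events into per-key counts locally."""
    return Counter(key for key, _ in events)

def merge_partials(partials):
    """Gather step: combine the partial aggregates from all workers."""
    total = Counter()
    for p in partials:
        total.update(p)
    return total

# Two workers each pre-aggregate their skewed local slice of the stream...
w1 = pre_aggregate([("hot", 1), ("hot", 1), ("cold", 1)])
w2 = pre_aggregate([("hot", 1), ("hot", 1)])
# ...and the query side merges the partials into the final answer.
print(merge_partials([w1, w2]))  # Counter({'hot': 4, 'cold': 1})
```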
Mastering Kafka Consumer Distribution: A Guide to Efficient Scaling and Resou...HostedbyConfluent
"Consumer scaling is a crucial element for many Apache Kafka users. Who doesn't want to save money by managing resources efficiently: shutting down unnecessary instances when there is no traffic, scaling up quickly during peak hours, and, while doing all of that, avoiding annoying and often unnecessary rebalancing?
To achieve all of this you need to understand how consumer assignment works, how nodes are affected by data load, and what the common causes of rebalancing are. Most importantly, you need to know which assignors to choose for your use case and which metrics to use to measure your data load.
How do we know what the good and bad practices are? At Aiven, we've seen firsthand both successful and not-so-great approaches to consumer scaling and rebalancing. The insights we're sharing come directly from our experience working on many projects with Apache Kafka.
We’ll discuss metrics that are essential for understanding data load and deciding when to scale. We'll cover a variety of approaches you can take - from commonly used lag exporters, to Knative scalers that are based on concurrent requests and finally insights from our own experience developing a speed lag predictor that goes beyond the basics by calculating the velocity of data load changes. We’ll highlight advantages and disadvantages of each approach and when you should use it.
Next, we'll look at various assignors that are available and guide you on how to choose the most suitable one for your scenario. We'll pay special attention to the challenges faced by stateful applications and the potential pitfalls of frequent scaling, such as overloaded brokers.
Armed with this knowledge, you’ll have what is needed to build scalable systems, minimize downtime and save costs when working with Apache Kafka. Let's make your Kafka experience as smooth and efficient as possible!"
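The "velocity of data load changes" idea can be illustrated with a few lines: instead of scaling on absolute consumer lag, look at how fast lag is growing. This is a minimal sketch of the concept, assuming made-up lag samples and a made-up threshold; the actual speed lag predictor described above is more sophisticated.

```python
def lag_velocity(samples):
    """samples: list of (timestamp_seconds, total_lag). Returns lag gained per second."""
    (t0, lag0), (t1, lag1) = samples[0], samples[-1]
    return (lag1 - lag0) / (t1 - t0)

def should_scale_up(samples, velocity_threshold=50.0):
    """Scale up when lag is growing faster than the threshold, even if it is still small."""
    return lag_velocity(samples) > velocity_threshold

# Lag is still modest in absolute terms, but accelerating sharply.
history = [(0, 1_000), (60, 1_600), (120, 13_000)]
print(lag_velocity(history))     # 100.0 messages of lag gained per second
print(should_scale_up(history))  # True
```

A plain lag exporter would only see 13,000 messages of lag; the velocity view flags the trend a scaling decision actually cares about.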
Flink 2.0: Navigating the Future of Unified Stream and Batch ProcessingHostedbyConfluent
"The Apache Flink community is working on a significant milestone with the planned release of Flink 2.0, marking a major evolution since the inception of Flink 1.0 in 2016. In this insightful talk, we will delve into the key enhancements and transformations slated for the 2.0 version, offering a comprehensive overview for users and developers eager to embrace the next frontier of stream and batch processing.
The talk will commence by exploring some of the core philosophies of Flink, emphasizing the unification of batch and stream processing. We'll dissect the roadmap's commitment to a seamless blending of batch and streaming applications. Flink 2.0 isn't just about features; it's also a reimagining of Flink's architecture. We'll delve into the disaggregated state management approach, leveraging distributed file systems. This evolution is geared towards better load balancing and improved efficiency in cloud-native architectures. APIs play a pivotal role in Flink's evolution, and Flink 2.0 is no exception. The talk will outline plans to retire deprecated APIs, overhaul the configuration layer, introduce new abstractions, and pursue the other plans alive in the community.
Join us on this exploration of Flink 2.0, where we'll unravel the future of stream and batch processing and showcase how these advancements will shape the landscape of real-time data analytics."
Leveraging Tiered Storage in Strimzi-Operated Kafka for Cost-Effective Stream...HostedbyConfluent
"In the realm of real-time streaming applications, Kafka is commonly chosen as the datastore. At scale, this demands fast and expensive storage. The need for high-performance storage volumes often translates to steep costs, hindering wider adoption. To tackle this challenge, we have integrated the tiered storage feature introduced in KIP-405 into our Strimzi-operated Kafka. With tiered storage and our custom remote plugin, we provide a cost-effective solution that allows data to be stored on cheaper storage, enabling extended retention periods and efficient backfilling of past data. This lets our streaming applications benefit without the strain of hefty storage expenses.
In this session, we'll show the audience how we use Kafka to build end-to-end streaming applications, while also touching on the behind-the-scenes work of reducing costs to make this affordable for wider adoption. We will share our experience integrating the tiered storage feature with the Strimzi Kafka operator and Strimzi-operated Kafka clusters. Moreover, we will share our journey of optimizing the performance of our remote storage manager implementation and the valuable insights gained along the way."
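For orientation, these are the KIP-405 knobs behind a setup like the one described, gathered as a sketch. The plugin class name is a hypothetical placeholder (the talk uses a custom one), and the retention values are illustrative, not the ones from the talk.

```python
# Broker side: turn the tiered storage subsystem on and point it at a
# RemoteStorageManager implementation (class name is a placeholder).
broker_config = {
    "remote.log.storage.system.enable": "true",
    "remote.log.storage.manager.class.name": "com.example.MyRemoteStorageManager",
}

# Topic side: opt the topic in, keep a short window on fast local disks,
# and let the long tail live on cheap remote storage.
topic_config = {
    "remote.storage.enable": "true",
    "local.retention.ms": str(6 * 60 * 60 * 1000),   # 6 hours on local disks
    "retention.ms": str(30 * 24 * 60 * 60 * 1000),   # 30 days total, mostly remote
}
```

The gap between `local.retention.ms` and `retention.ms` is where the cost saving lives: only the hot tail pays for fast volumes.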
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionAggregage
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...HostedbyConfluent
"Discover how default configurations might impact ingestion times, especially when dealing with large files. We'll explore a real-world scenario with a 20,000,000+ line file, assessing metrics and exploring the bottleneck in the default setup. Understand the intricacies of batch size calculations and how to optimize them based on your unique data characteristics.
Walk away with actionable insights as we showcase a practical example, turning a 7-hour ingestion process into a mere 30 minutes for over 30,000,000 records in a Kafka topic. Uncover metrics, configurations, and best practices to elevate the performance of your Kafka Connect CSV source connectors. Don't miss this opportunity to optimize your data pipeline and ensure smooth, efficient data flow."
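The headline numbers imply a throughput jump worth making explicit. A back-of-the-envelope check, using only the figures quoted in the abstract:

```python
# The same 30,000,000 records, ingested in 7 hours vs 30 minutes.
records = 30_000_000

before = records / (7 * 3600)   # throughput with default configs, records/s
after = records / (30 * 60)     # throughput after tuning, records/s

print(round(before), round(after), round(after / before, 1))
# 1190 16667 14.0  -> roughly a 14x speedup
```

In other words, the connector went from about 1,200 records per second to nearly 17,000, which is the kind of gap batch-size tuning alone can open up.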
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...HostedbyConfluent
"In order to meet the current and ever-increasing demand for near-zero RPO/RTO systems, a focus on resiliency is critical. While Kafka offers built-in resiliency features, a perfect blend of client and cluster resiliency is necessary in order to achieve a highly resilient Kafka client application.
At Fidelity Investments, Kafka is used for a variety of event streaming needs such as core brokerage trading platforms, log aggregation, communication platforms, and data migrations. In this lightning talk, we will discuss the governance framework that has enabled producers and consumers to achieve their SLAs during unprecedented failure scenarios. We will highlight how we automated resiliency tests through chaos engineering and tightly integrated observability dashboards for Kafka clients to analyze and optimize client configurations. And finally, we will summarize the chaos test suite and the "test, test and test" mantra that are helping Fidelity Investments reach its goal of a future with zero downtime."
Navigating Private Network Connectivity Options for Kafka ClustersHostedbyConfluent
"There are various strategies for securely connecting to Kafka clusters between different networks or over the public internet. Many cloud providers even offer endpoints that privately route traffic between networks and are not exposed to the internet. But, depending on your network setup and how you are running Kafka, these options ... might not be an option!
In this session, we'll discuss how you can use SSH bastions or a self-managed PrivateLink endpoint to establish connectivity to your Kafka clusters without exposing brokers directly to the internet. We'll explain the required network configuration and show how we at Materialize have contributed to librdkafka to simplify these scenarios and avoid fragile workarounds."
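The SSH-bastion approach boils down to a local port forward through a host that can reach the private network. The sketch below just assembles the `ssh` command; the host names are placeholders, and this shows only the naive single-broker case.

```python
import subprocess

bastion = "bastion.example.com"      # internet-reachable jump host (placeholder)
broker = "kafka-0.internal:9092"     # broker address inside the private network

# -N: no remote command, -L: forward local port 9092 through the bastion.
cmd = ["ssh", "-N", "-L", f"9092:{broker}", bastion]
# subprocess.run(cmd)  # clients would then connect to localhost:9092
print(" ".join(cmd))
```

The catch, and part of what motivates the librdkafka work mentioned above: brokers advertise their internal addresses to clients, so a plain forward like this breaks as soon as the client follows a metadata response to a second broker. That is exactly the kind of fragile workaround the talk addresses.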
Apache Flink: Building a Company-wide Self-service Streaming Data PlatformHostedbyConfluent
"In my talk, we will examine all the stages of building our self-service Streaming Data Platform based on Apache Flink and Kafka Connect, from the selection of a solution for stateful streaming data processing, right up to the successful design of a robust self-service platform, covering the challenges that we’ve met.
I will share our experience in providing non-Java developers with a company-wide self-service solution, which allows them to quickly and easily develop their streaming data pipelines.
Additionally, I will highlight specific business use cases that would not have been implemented without our platform."
Explaining How Real-Time GenAI Works in a Noisy PubHostedbyConfluent
"Almost everyone has heard about large language models, and tens of millions of people have tried out OpenAI ChatGPT and Google Bard. However, the intricate architecture and underlying mathematics driving these remarkable systems remain elusive to many.
LLMs are fascinating, so let's grab a drink and find out how these systems are built. In the time it takes to enjoy a round of drinks, you'll understand their inner workings. We'll take our first sip of word vectors, enjoy the refreshing taste of the transformer, and drain a glass understanding how these models are trained on phenomenally large quantities of data.
Large language models for your streaming application - explained with a little maths and a lot of pub stories"
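That "first sip of word vectors" fits in a few lines: words become points in space, and related words end up close together, as measured by cosine similarity. The 2-D toy vectors below are invented for illustration; real embeddings have hundreds or thousands of dimensions.

```python
import math

# Invented 2-D embeddings; real models learn these from data.
vectors = {
    "beer":  [0.9, 0.1],
    "ale":   [0.8, 0.2],
    "maths": [0.1, 0.9],
}

def cosine(a, b):
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

print(cosine(vectors["beer"], vectors["ale"]) > cosine(vectors["beer"], vectors["maths"]))
# True: "beer" sits nearer "ale" than "maths", which is the whole point
```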
"Monitoring is a fundamental operation when running Kafka and Kafka applications in production. There are numerous metrics available when using Kafka, however the sheer number is overwhelming, making it challenging to know where to start and how to properly utilise them.
This session will introduce you to some of the key metrics that should be monitored and best practices for fine-tuning your monitoring. We will delve into which metrics are the key indicators of a cluster's availability and performance, and which are the most helpful when debugging client applications."
Kafka Streams relies on state restoration to maintain standby tasks as a failure recovery mechanism, as well as to restore state after rebalances. When you scale your application instances up or down, you need to know the current state of the restoration process for each active and standby task in order to keep restoration as short as possible. In this presentation, you will learn how KIP-869 provides valuable information about active task restoration after a rebalance, and how KIP-988 opens a window onto the continuous process of standby restoration. When you need to decide whether to scale your application instances up or down, both KIPs will be an invaluable ally.
Mastering Kafka Producer Configs: A Guide to Optimizing PerformanceHostedbyConfluent
"In this talk, we will dive into the world of Kafka producer configs and explore how to understand and optimize them for better performance. We will cover the different types of configs, their impact on performance, and how to tune them to achieve the best results. Whether you're new to Kafka or a seasoned pro, this session will provide valuable insights and practical tips for improving your Kafka producer performance.
- Introduction to Kafka producer internals and workflow
- Understanding the producer configs like linger.ms, batch.size, buffer.memory and their impact on performance
- Learning about producer configs like max.block.ms, delivery.timeout.ms, request.timeout.ms and retries to make producer more resilient.
- Discuss configs like enable.idempotence, max.in.flight.requests.per.connection and transaction related configs to achieve delivery guarantees.
- Q&A session with attendees to address specific questions and concerns."
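The knobs from the outline can be gathered into one picture. The values below are common starting points chosen for illustration, not recommendations from the talk; the right settings depend on your workload.

```python
producer_config = {
    # Throughput: trade a little latency for fuller batches.
    "linger.ms": 10,                   # wait up to 10 ms to fill a batch
    "batch.size": 64 * 1024,           # 64 KiB per partition batch
    "buffer.memory": 64 * 1024 * 1024, # total memory for unsent records

    # Resilience: bound how long a send may take, retry within that budget.
    "max.block.ms": 60_000,            # how long send() may block for buffer space
    "request.timeout.ms": 30_000,      # per-request broker timeout
    "delivery.timeout.ms": 120_000,    # total budget: batching + retries + in-flight
    "retries": 2_147_483_647,          # retry until delivery.timeout.ms expires

    # Delivery guarantees.
    "enable.idempotence": True,        # no duplicates from retries
    "max.in.flight.requests.per.connection": 5,  # max allowed with idempotence
}
```

Note the dependency the talk's resilience section hinges on: `delivery.timeout.ms` must be at least `linger.ms + request.timeout.ms`, since it caps the whole journey of a record, not a single request.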
Data Contracts Management: Schema Registry and BeyondHostedbyConfluent
"Data contracts are one of the hottest topics in the data management community. A data contract is a formal agreement between a data producer and its consumers, aimed at reducing data downtime and improving data quality. Schemas are an important part of data contracts, but they are not the only relevant element.
In this talk, we’ll:
1. see why data contracts are so important but also difficult to implement;
2. identify the characteristics of a well-designed data contract: its anatomy, its main elements, and how to formally describe them;
3. show how to manage the lifecycle of a data contract leveraging Confluent Platform's services."
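The "schemas are not the only relevant element" point can be made concrete by sketching a data contract as a document in which the schema is just one field among several. Every name and value below is invented for illustration; real contract formats vary.

```python
data_contract = {
    "name": "orders.v1",
    "owner": "team-checkout",          # who answers when the data breaks
    "schema": {                        # the part a schema registry holds
        "type": "record",
        "fields": [{"name": "order_id", "type": "string"}],
    },
    # Everything below is contract, but not schema:
    "quality": {"freshness_minutes": 5, "null_rate_max": 0.01},
    "sla": {"availability": "99.9%"},
    "versioning": {"compatibility": "BACKWARD"},
}
```

A schema registry enforces the `schema` and `versioning` parts; the quality and SLA clauses need other tooling, which is exactly the "and beyond" of the title.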
"In the realm of stateful stream processing, Apache Flink has emerged as a powerful and versatile platform. However, the conventional SQL-based approach often limits the full potential of Flink applications.
We will delve into the benefits of adopting a code-first approach, which provides developers with greater control over application logic, facilitates complex transformations, and enables more efficient handling of state and time. We will also discuss how the code-first approach can lead to more maintainable and testable code, ultimately improving the overall quality of your Flink applications.
Whether you're a seasoned Flink developer or just starting your journey, this talk will provide valuable insights into how a code-first approach can revolutionize your stream processing applications."
Debezium vs. the World: An Overview of the CDC EcosystemHostedbyConfluent
"Change Data Capture (CDC) has become a commodity in data engineering, much in part due to the ever-rising success of Debezium [1]. But is that all there is? In this lightning talk, we’ll outline the current state of the CDC ecosystem, and understand why adopting a Debezium alternative is still a hard sell. If you’ve ever wondered what else is out there, but can’t keep up with the sprawling of new tools in the ecosystem; we’ll wrap it up for you!
[1] https://debezium.io/"
Beyond Tiered Storage: Serverless Kafka with No Local DisksHostedbyConfluent
"Separation of compute and storage has become the de-facto standard in the data industry for batch processing.
The addition of tiered storage to open source Apache Kafka is the first step in bringing true separation of compute and storage to the streaming world.
In this talk, we'll discuss in technical detail how to take the concept of tiered storage to its logical extreme by building an Apache Kafka protocol compatible system that has zero local disks.
Eliminating all local disks in the system requires not only separating storage from compute, but also separating data from metadata. This is a monumental task that requires reimagining Kafka's architecture from the ground up, but the benefits are worth it.
This approach enables a stateless, elastic, and serverless deployment model that minimizes operational overhead and also drives inter-zone networking costs to almost zero."
Automating Speed: A Proven Approach to Preventing Performance Regressions in ...HostedbyConfluent
"Regular performance testing is one of the pillars of Kafka Streams’ reliability and efficiency. Beyond ensuring dependable releases, regular performance testing supports engineers in new feature development with the ability to easily test the performance impact of their features, compare different approaches, etc.
In this session, Alex and John share their experience from developing, using, and maintaining a performance testing framework for Kafka Streams that has prevented multiple performance regressions over the last 5 years. They cover guiding principles and architecture, how to ensure statistical significance and stability of results, and how to automate regression detection for actionable notifications.
This talk sheds light on how Apache Kafka is able to foster a vibrant open-source community while maintaining a high performance bar across many years and releases. It also empowers performance-minded engineers to avoid common pitfalls and bring high-quality performance testing to their own systems."
How to Build an Event-based Control Center for the Electrical GridHostedbyConfluent
"The energy transition has brought with it a wave of new challenges for the high-voltage grid. One of the central problems is the volatility of renewable energies. A future-proof control center must have enough flexibility to meet the demands of the energy market in real time.
Existing industrial systems cannot meet the requirements of modern network operators. Our projects aim to solve these challenges with sophisticated event-based architectures, with Apache Kafka at their core. These architectures need to meet complex requirements like switching highly critical grid assets and integrating hundreds of thousands of sensors. We used event-based patterns such as sagas and event sourcing to overcome those challenges. Additionally, we had to integrate the OT systems (IEC, OPC UA) into this modern tech stack.
Some projects are already in shadow operation and are expected to steer first grid parts in 2025."
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
GridMate - End to end testing is a critical piece to ensure quality and avoid...ThomasParaiso2
End to end testing is a critical piece to ensure quality and avoid regressions. In this session, we share our journey building an E2E testing pipeline for GridMate components (LWC and Aura) using Cypress, JSForce, FakerJS…
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. A constant focus on speed to release software to market, combined with traditionally slow and manual security checks, has caused gaps in continuous security, an important piece of the software supply chain. Today, organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their application supply chains and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a passion for making things work and a knack for helping others understand how things work. He brings around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations on CI/CD and on application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
Communications Mining Series - Zero to Hero - Session 1DianaGray10
This session provides introduction to UiPath Communication Mining, importance and platform overview. You will acquire a good understand of the phases in Communication Mining as we go over the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How can it help today’s business and the benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIVladimir Iglovikov, Ph.D.
Presented by Vladimir Iglovikov:
- https://www.linkedin.com/in/iglovikov/
- https://x.com/viglovikov
- https://www.instagram.com/ternaus/
This presentation delves into the journey of Albumentations.ai, a highly successful open-source library for data augmentation.
Created out of a necessity for superior performance in Kaggle competitions, Albumentations has grown to become a widely used tool among data scientists and machine learning practitioners.
This case study covers various aspects, including:
People: The contributors and community that have supported Albumentations.
Metrics: The success indicators such as downloads, daily active users, GitHub stars, and financial contributions.
Challenges: The hurdles in monetizing open-source projects and measuring user engagement.
Development Practices: Best practices for creating, maintaining, and scaling open-source libraries, including code hygiene, CI/CD, and fast iteration.
Community Building: Strategies for making adoption easy, iterating quickly, and fostering a vibrant, engaged community.
Marketing: Both online and offline marketing tactics, focusing on real, impactful interactions and collaborations.
Mental Health: Maintaining balance and not feeling pressured by user demands.
Key insights include the importance of automation, making the adoption process seamless, and leveraging offline interactions for marketing. The presentation also emphasizes the need for continuous small improvements and building a friendly, inclusive community that contributes to the project's growth.
Vladimir Iglovikov brings his extensive experience as a Kaggle Grandmaster, ex-Staff ML Engineer at Lyft, sharing valuable lessons and practical advice for anyone looking to enhance the adoption of their open-source projects.
Explore more about Albumentations and join the community at:
GitHub: https://github.com/albumentations-team/albumentations
Website: https://albumentations.ai/
LinkedIn: https://www.linkedin.com/company/100504475
Twitter: https://x.com/albumentations
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!SOFTTECHHUB
As the digital landscape continually evolves, operating systems play a critical role in shaping user experiences and productivity. The launch of Nitrux Linux 3.5.0 marks a significant milestone, offering a robust alternative to traditional systems such as Windows 11. This article delves into the essence of Nitrux Linux 3.5.0, exploring its unique features, advantages, and how it stands as a compelling choice for both casual users and tech enthusiasts.
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
UiPath Test Automation using UiPath Test Suite series, part 6DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 6. In this session, we will cover Test Automation with generative AI and Open AI.
UiPath Test Automation with generative AI and Open AI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI, a test automation solution, with Open AI advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers, and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and Open AI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
2. Agenda
Introduction
• What are consumer groups?
• Drawbacks of the current protocol
• How is the new protocol different?
Assignors
• What are assignors?
• What are their roles and properties?
Types of Assignors
• What are the different types of assignors?
• Range Assignor with examples
• Uniform Assignor with examples
Assignor Selection Process
• How do we know which assignor to use when?
Questions
• Any questions?
13. Types of subscriptions
Homogeneous Subscriptions: all members of the consumer group are subscribed to the same set of topics.
(Diagram: consumer group with members M1, M2, M3, all subscribed to topics T1, T2, T3.)
14. Types of subscriptions
Heterogeneous Subscriptions: different members are subscribed to different topics.
(Diagram: consumer group with members M1, M2, M3, each subscribed to a different subset of topics T1, T2, T3.)
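The distinction can be checked mechanically. Here is a minimal Python sketch (names are illustrative, not Kafka's API) that classifies a group's subscriptions:

```python
def subscription_type(subscriptions):
    """Classify a consumer group's subscriptions.

    subscriptions: dict mapping member id -> set of topic names.
    Returns "homogeneous" if every member subscribes to the same
    topic set, otherwise "heterogeneous".
    """
    topic_sets = set(frozenset(topics) for topics in subscriptions.values())
    return "homogeneous" if len(topic_sets) <= 1 else "heterogeneous"

# Homogeneous: M1, M2, M3 all subscribe to T1, T2, T3
homo = {"M1": {"T1", "T2", "T3"}, "M2": {"T1", "T2", "T3"}, "M3": {"T1", "T2", "T3"}}
# Heterogeneous: members subscribe to different topic sets
hetero = {"M1": {"T1"}, "M2": {"T2", "T3"}, "M3": {"T3"}}

print(subscription_type(homo))    # homogeneous
print(subscription_type(hetero))  # heterogeneous
```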
16. Drawbacks of the Current Protocol
Thick client: debugging and updates rely heavily on the clients.
Global synchronization barrier: a single misbehaving consumer can disrupt the entire group.
18. Improvements in the new protocol: KIP-848
ConsumerGroupHeartbeat API: integrates partition assignments within heartbeats, streamlining state management.
Complexity shift from client to server: easier debugging and update adoption.
Incremental & cooperative rebalance: eliminates the existing global sync barrier.
19. The current protocol is a layered protocol; the next-gen protocol uses server-side assignors.
22. Role of an Assignor
Assigns partitions to members when the group coordinator says so. The assignment is not set in stone.
23. How does the assignor work in the new protocol?
The group coordinator passes the assignor its metadata as input and receives a target assignment as output.
Input: 1. member info; 2. current assignment; 3. topic and partition info.
Output: target assignment.
Rebalance triggers: 1. a member joins or leaves; 2. topic metadata changes; 3. broker rack changes; 4. assignor changes.
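The inputs and output above can be pictured as simple data shapes. This is an illustrative Python sketch of the flow, not Kafka's actual classes:

```python
from dataclasses import dataclass, field

# Illustrative shapes only: the group coordinator builds an input spec from
# member info, the current assignment, and topic/partition metadata, and the
# assignor returns a target assignment.

@dataclass
class GroupSpec:
    subscriptions: dict          # member id -> set of subscribed topic names
    current_assignment: dict     # member id -> set of (topic, partition)
    partitions_per_topic: dict   # topic name -> partition count

@dataclass
class TargetAssignment:
    partitions: dict = field(default_factory=dict)  # member id -> set of (topic, partition)

spec = GroupSpec(
    subscriptions={"M1": {"T1"}, "M2": {"T1"}},
    current_assignment={"M1": {("T1", 0)}, "M2": {("T1", 1)}},
    partitions_per_topic={"T1": 2},
)
target = TargetAssignment(partitions={"M1": {("T1", 0)}, "M2": {("T1", 1)}})
```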
24. General Properties of Assignors
Balance: even distribution of subscribed topic partitions across all group members to prevent overloading.
Stickiness: minimizes partition movements between members by retaining as many partitions from the existing assignment as possible.
Rack Awareness: aligns partition replicas with members in the same rack to minimize cross-zone traffic and costs.
27. Analogy for Rack Matching
Think of each topic as gold, each partition as a coin, and the consumers as the members of the consumer group.
32. Types of Assignors
Range Assignor: assigns contiguous partition ranges to consumers. Ensures ordered processing within topics for balanced workload distribution.
Uniform Assignor: distributes topic partitions among group members in a balanced fashion. Two implementations: ★ Optimized ★ General.
34. Range Assignor
Partition distribution: each subscribed member receives at least one partition from that topic.*
(Consumer Group 1: members M0, M1, M2, M3; Topic 1: partitions P0, P1, P2.)
*Except when the members outnumber the partitions.
35. Range Assignor
Partition co-allocation: each member receives the same partition number from every subscribed topic.
(Consumer Group 1: members M0, M1, M2, M3; Topic 1 and Topic 2, each with partitions P0, P1, P2.)
36. Range Assignor
Per-topic co-allocation for joins.
(Source: medium.com; credit: Gavin Fong.)
37. Range Assignor Algorithm
Map members per topic -> calculate quotas -> assign sticky partitions -> assign the leftover partitions using ranges.
*Rack awareness is not yet implemented for the Range Assignor.
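The quota-based steps above can be sketched in Python. This is a simplified illustration (names are mine, and stickiness is omitted, so each topic's partitions are simply handed out as contiguous ranges):

```python
def range_assign(topics, members_per_topic):
    """Sketch of the range assignor's per-topic flow (stickiness omitted).

    topics: dict topic -> number of partitions.
    members_per_topic: dict topic -> ordered list of subscribed member ids.
    Returns dict member -> list of (topic, partition) tuples.
    """
    assignment = {}
    for topic, num_partitions in topics.items():
        members = members_per_topic[topic]
        min_quota = num_partitions // len(members)
        extras = num_partitions % len(members)  # first `extras` members get one more
        partition = 0
        for i, member in enumerate(members):
            quota = min_quota + (1 if i < extras else 0)
            for _ in range(quota):
                assignment.setdefault(member, []).append((topic, partition))
                partition += 1
    return assignment

# Example 1 from the slides: two topics with 3 partitions each, members M1 and M2
result = range_assign({"T1": 3, "T2": 3}, {"T1": ["M1", "M2"], "T2": ["M1", "M2"]})
print(result)  # M1 gets P0, P1 of each topic; M2 gets P2 of each topic
```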
38. Range Assignor, Example 1
Metadata input: Topic 1 (P0, P1, P2), Topic 2 (P0, P1, P2); members M1 and M2.
39. Range Assignor, Example 1
Analogy: the gold (topic) is split into coins (partitions) among the princesses (members of the consumer group).
Conditions:
1. Each princess gets an equal share of the gold.
2. Once split equally, the extra coins are also split uniformly, starting from the eldest first.
3. Each princess can get only one extra gold coin.
40. Range Assignor, Example 1 - Topic 1 (Calculate Quotas)
Min quota: the minimum number of partitions each member should get.
Number of partitions (coins) = 3; number of subscribed members (princesses) = 2.
3 / 2 = 1, so each princess gets at least 1 coin.
41. Range Assignor, Example 1 - Topic 1 (Calculate Quotas)
Number of members with extra partitions: 3 % 2 = 1, so 1 extra coin is given to the eldest first.
42. Range Assignor, Example 1 - Topic 1 (Map Members)
Metadata: topics T1 and T2 with 3 partitions each; members M1 and M2; members per topic: T1 -> M1, M2; T2 -> M1, M2.
➢ Partitions are sorted in ascending order (T1P0, T1P1, T1P2).
➢ Members are implicitly sorted in the same order per topic.
43. Range Assignor, Example 1 - Topic 1 (Assign Partitions)
Quotas: min quota = 1; members with an extra partition = 1.
There is no previous assignment, so no sticky partitions to retain: M1 receives T1P0 plus the extra partition T1P1, and M2 receives T1P2.
45. Range Assignor, Example 1 - Topic 2
Topic 2 (the silver) goes through the same process as Topic 1.
46. Range Assignor, Example 1 - Final Assignment
M1: Topic 1 P0, P1 and Topic 2 P0, P1. M2: Topic 1 P2 and Topic 2 P2.
47. Range Assignor, Example 1 - Trigger a rebalance by adding a new member
A new member M3 joins Consumer Group 1.
48. Range Assignor, Example 2
Metadata input: Topic 1 (P0, P1, P2), Topic 2 (P0, P1, P2); members M1, M2, and the new member M3.
NOTE: members' subscriptions can differ; joins won't be feasible in this case.
49. Range Assignor, Example 2 - Topic 1 (Calculate Quotas)
Metadata: topics T1 and T2 with 3 partitions each; members M1, M2, M3; members per topic: T1 -> M1, M2, M3; T2 -> M1, M2.
Min quota = numPartitionsForTopic / number of subscribed members = 3 / 3 = 1
Extra partitions = numPartitionsForTopic % number of subscribed members = 3 % 3 = 0
50. Range Assignor, Example 2 - Topic 1 (Assign Sticky Partitions)
Previous assignment: M1 -> T1P0, T1P1, T2P0, T2P1; M2 -> T1P2, T2P2. Quotas: min quota = 1; members with extras = 0.
M1 keeps T1P0 and M2 keeps T1P2, up to quota; T1P1 becomes available for assignment.
51. Range Assignor, Example 2 - Topic 1 (Assign Leftover Partitions)
The unassigned partition T1P1 goes to the new member M3.
52. Range Assignor, Example 2 - Topic 2 (Assign Sticky Partitions)
Only M1 and M2 subscribe to T2, so: min quota = 1; members with an extra partition = 1.
Previous assignment: M1 -> T2P0, T2P1; M2 -> T2P2. M1 keeps T2P0 plus the extra T2P1, and M2 keeps T2P2.
53. Range Assignor, Example 2 - Final Assignment
M1: T1P0, T2P0, T2P1. M2: T1P2, T2P2. M3: T1P1.
The rebalance required only 1 revocation (T1P1 from M1) and 1 allocation (T1P1 to M3).
54. Range Assignor - Drawbacks
A highly unbalanced distribution is possible, since every extra partition always goes to the first few members.
Example with 3 topics of 3 partitions and 2 members: M1 -> T1P0, T1P1, T2P0, T2P1, T3P0, T3P1; M2 -> T1P2, T2P2, T3P2. Every topic's extra partition goes to M1, so M1 has twice as many partitions as M2.
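The drawback is easy to quantify with the per-topic quota math from the earlier examples, here for 3 topics of 3 partitions and 2 members:

```python
# Range-style per-topic quotas: 3 topics x 3 partitions, 2 members.
num_topics, num_members, parts_per_topic = 3, 3, 2  # renamed below for clarity
topics, members, partitions = 3, 2, 3

min_quota = partitions // members  # 1 partition per member per topic
extras = partitions % members      # 1 extra per topic, always given to the first member

m1 = topics * (min_quota + 1)      # M1 takes every topic's extra: 6 partitions
m2 = topics * min_quota            # M2: 3 partitions
print(m1, m2)  # 6 3 -> M1 has twice as many partitions as M2
```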
56. Uniform Assignor
The uniform assignor is implemented in two different ways:
● Optimized Assignment Builder - for homogeneous topic subscriptions
● General Assignment Builder - for heterogeneous topic subscriptions
The right implementation is chosen automatically based on the topic subscriptions; this is abstracted away from the user.
57. Why do we need two implementations?
❏ Homogeneous subscriptions: all partitions have the same weight, so rebalance times can be improved with quotas.
❏ Heterogeneous subscriptions: more complexity is involved to achieve a balanced assignment, but they are less frequent.
58. Optimized Uniform Assignment Builder
Calculate quotas -> assign sticky partitions -> rack-aware round-robin assignment* -> round robin for the unassigned partitions.
*Rack-aware matching is possible only when both member and broker racks are configured.
60. Optimized Uniform Assignor, Example 1 - Calculate Quotas
Metadata: topics T1 and T2 with 3 partitions each; members M1 and M2, both subscribed to T1 and T2.
Min quota = total partitions / total number of members = (3 * 2) / 2 = 3
Members with extra partitions = total partitions % total number of members = (3 * 2) % 2 = 0
62. Optimized Uniform Assignor, Example 1 - Rack-Aware Round Robin
M1 is on Rack 1 and M2 is on Rack 2. Number of members in the same rack as each partition: P0 = M1 (1); P1 = M1, M2 (2); P2 = M2 (1).
Partitions are sorted in ascending order of the number of members in the same rack; sorting is done to maximize rack matching.
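The sorting step can be sketched directly (values taken from the example): partitions with the fewest rack-matching members come first, so the scarce matches are used before quotas fill up.

```python
# Members in the same rack as each partition, from the slide's example.
members_in_same_rack = {"P0": ["M1"], "P1": ["M1", "M2"], "P2": ["M2"]}

# Sort ascending by number of rack-matching members; Python's sort is stable,
# so ties (P0 and P2, one match each) keep their original order.
order = sorted(members_in_same_rack, key=lambda p: len(members_in_same_rack[p]))
print(order)  # ['P0', 'P2', 'P1'] - the doubly-matched P1 is placed last
```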
63. Optimized Uniform Assignor, Example 1 - Life Without Sorting
With a min quota of 3 and no sorting, the doubly-matched P1 can fill a member's quota early; when a partition whose only rack match is that member comes up later, the quota is already met and the partition goes to M2 with no rack matching.
64. Optimized Uniform Assignor, Example 1 - Rack-Aware Round Robin
Quotas: min quota = 3; members with an extra partition = 0.
In sorted order (P0 = M1 (1); P2 = M2 (1); P1 = M1, M2 (2)), each partition is assigned to a member in the same rack, up to quota.
65. Optimized Uniform Assignor, Example 1 - Final Assignment
Each member of Consumer Group 1 ends up with its quota of three partitions: M1 with T1P0, T1P1, and T2P0; M2 with T1P2, T2P1, and T2P2.
66. General Uniform Assignment Builder
Assign sticky partitions -> rack-aware round robin -> round robin for the unassigned partitions -> balance.
67. General Uniform Assignor
Key differences from the previous assignors:
❏ No concept of quotas: there is no good way to pre-determine them (a greedy algorithm was tried and failed at a few edge cases).
❏ Members and topics are sorted based on various criteria.
❏ Rebalance loops: iterate over the assignment until a balance is reached.
69. General Uniform Assignor, Example 1 - Assign Sticky Partitions
Current assignment: M1 -> T1: P0, P1, P2; M2 -> T1: P3, T2: P0; M3 -> T1: P4, P5, T2: P1, P2, P3, T3: P0, P1, P2, P3.
For each current partition owner: if the partition is in the same rack as the member, retain it; if it is in a different rack, track it for a later step.
70. General Uniform Assignor, Example 1 - Assign Sticky Partitions (continued)
Rack 0 holds M1 and partition replicas P3, P0, and P4. M1's T1P0 is in the same rack and is retained; T1P1 and T1P2 are in a different rack, so they are tracked for a later step.
75. General Uniform Assignor, Example 1 - Assign Sticky Partitions (result)
Target assignment so far: M1 -> T1: P0; M2 -> T2: P0; M3 -> T1: P5, T2: P1, P2, T3: P1, P2.
Tracked current partition owners: T1P1 (M1), T1P2 (M1), T1P3 (M2), T1P4 (M3), T2P3 (M3), T3P0 (M3), T3P3 (M3).
76. General Uniform Assignor, Example 1 - Rack-Aware Round Robin
Rack 0: M1 with partition replicas P3, P0, P4. Rack 1: M2 with P1, P0, P4. Rack 2: M3 with P1, P2, P5.
A tracked partition is assigned when a member is in a matching rack.
77. General Uniform Assignor, Example 1 - Rack-Aware Round Robin (result)
Sorted unassigned partitions: T1P3, T1P2, T3P3, T2P3, T1P4, T3P0, T1P1; if a partition is in the same rack as a member, assign it.
Target assignment after this step: M1 -> T1: P0, P3, P4; M2 -> T1: P1, T2: P0; M3 -> T1: P5, P2, T2: P1, P2, T3: P1, P2.
78. General Uniform Assignor, Example 1 - Unassigned Round Robin
Remaining unassigned partitions: T3P3, T2P3, T3P0. Sort the subscribed members in ascending order by assignment size and assign for balance.
79. General Uniform Assignor, Example 1 - Unassigned Round Robin
Subscriptions: T1 -> M1, M2, M3; T2 -> M2, M3; T3 -> M3. T3P3 can only go to M3, the sole subscriber of T3.
80. General Uniform Assignor, Example 1 - Unassigned Round Robin
After T3P3 lands on M3 (T3 -> P1, P2, P3), the subscribed members are sorted in ascending order by assignment size for the remaining partitions: T2P3 goes to M2.
81. General Uniform Assignor, Example 1 - Unassigned Round Robin (result)
Target assignment (sizes 3, 3, 8): M1 -> T1: P0, P3, P4; M2 -> T1: P1, T2: P0, P3; M3 -> T1: P5, P2, T2: P1, P2, T3: P1, P2, P3, P0.
82. Balancing the Assignment
What is considered a balanced assignment? Either the difference between any two members' assignment sizes is at most one, or no member can receive additional partitions without disrupting the balance.
Reassignments are made based on the comparative load of the members, trying to maintain rack matching and stickiness.
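The first condition can be sketched as a small Python check (illustrative names; the second condition also depends on member subscriptions and is omitted here):

```python
def is_balanced(assignment):
    """First balance condition: the difference between any two members'
    assignment sizes is at most one. assignment: member id -> list of partitions."""
    sizes = [len(parts) for parts in assignment.values()]
    return max(sizes) - min(sizes) <= 1

# Sizes 3, 3, 8 (as in the example before balancing) fail the check;
# sizes 5, 4, 5 (the final assignment) pass it.
before = {"M1": ["p"] * 3, "M2": ["p"] * 3, "M3": ["p"] * 8}
after = {"M1": ["p"] * 5, "M2": ["p"] * 4, "M3": ["p"] * 5}
print(is_balanced(before), is_balanced(after))  # False True
```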
83. General Uniform Assignor, Example 1 - Balance
Target assignment (sizes 3, 3, 8): M1 -> T1: P0, P3, P4; M2 -> T1: P1, T2: P0, P3; M3 -> T1: P5, P2, T2: P1, P2, T3: P1, P2, P3, P0.
The assignment is not balanced: T1P2 can be moved from M3 to M1.
84. General Uniform Assignor, Example 1 - Balance
After the move (sizes 4, 3, 7): M1 -> T1: P0, P3, P4, P2; M2 -> T1: P1, T2: P0, P3; M3 -> T1: P5, T2: P1, P2, T3: P1, P2, P3, P0. The rebalance loop repeats until a balance is reached.
85. General Uniform Assignor, Example 1 - Final Assignment
Sizes 5, 4, 5: M1 -> T1: P0, P1, P2, P3, P4; M2 -> T2: P0, P1, P2, P3; M3 -> T1: P5, T3: P0, P1, P2, P3.
88. Choosing the right assignor
When should you use the Range Assignor?
• To balance the load of a single topic evenly among members
• To perform joins on co-partitioned streams
• Particularly beneficial for applications where the processing sequence of partitions is critical
When should you use the Uniform Assignor?
• To uniformly distribute the overall load among ALL members
• When there is no use case for joins or partition ordering
90. Assignor Selection: Server-Side Mode
Pluggable assignor: clients specify their preferred server-side assignor in heartbeat requests.
Multiple assignors specified: the group coordinator uses the most common one.
No assignor specified: the group coordinator defaults to the first one in group.consumer.assignors.
Unsupported assignor: heartbeats with an invalid assignor receive an UNSUPPORTED_ASSIGNOR error.
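In configuration terms, the selection looks roughly like the sketch below. The property names follow KIP-848 as I understand it (`group.remote.assignor` on the client, `group.consumer.assignors` on the broker); verify them against your Kafka version before relying on them:

```python
# Hedged sketch of KIP-848 configuration, expressed as plain dicts.
consumer_config = {
    "group.id": "my-group",
    "group.protocol": "consumer",        # opt in to the new KIP-848 protocol
    "group.remote.assignor": "uniform",  # preferred server-side assignor ("range" or "uniform")
}
broker_config = {
    # Assignors the group coordinator supports; the first entry is the default
    # used when a client names no assignor in its heartbeat.
    "group.consumer.assignors": "uniform,range",
}

supported = broker_config["group.consumer.assignors"].split(",")
default_assignor = supported[0]
print(default_assignor)  # uniform
```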
91. Recap
We covered what consumer groups are, the drawbacks of the current protocol and how the new protocol differs, what assignors are along with their roles and properties, the different types of assignors with Range and Uniform assignor examples, and how to know which assignor to use when. Any questions?