Netflix recently changed its data pipeline architecture to use Kafka as the gateway for data collection across all applications, processing hundreds of billions of messages daily. This session will discuss the motivation for moving to Kafka, the architecture, and the improvements we have added to make Kafka work in AWS. We will also share lessons learned and future plans.
Kafka Summit SF 2017 - One Data Center is Not Enough: Scaling Apache Kafka Ac... (confluent)
1. Running a single Kafka cluster is not sufficient for high availability and disaster recovery. Having multiple clusters across different data centers is necessary to handle failures like an entire data center being demolished.
2. There are different approaches to setting up Kafka clusters across multiple data centers, including a "stretch cluster" with at least one broker and Zookeeper in each data center, or running two independent clusters and replicating data between them asynchronously or actively.
3. With multiple data center replication, there are tradeoffs around latency, throughput, infrastructure costs, and the difficulty of handling consumer offsets during failover. The optimal solution depends on an organization's specific availability, data consistency, and disaster recovery requirements.
This document discusses reliability guarantees in Apache Kafka, which achieves reliability by replicating data across multiple brokers: as long as the minimum number of in-sync replicas (ISRs) is maintained, messages will not be lost even if individual brokers fail. It describes concepts like in-sync replicas and unclean leader election, and how to configure the replication factor and minimum in-sync replicas. It also covers best practices for avoiding data loss, such as setting acks=all on producers, disabling unclean leader election on brokers, committing offsets manually and only after processing is complete, handling consumer rebalances, and monitoring for errors, lag, and reconciliation of message counts.
When it Absolutely, Positively, Has to be There: Reliability Guarantees in Ka... (confluent)
In the financial industry, losing data is unacceptable. Financial firms are adopting Kafka for their critical applications. Kafka provides the low latency, high throughput, high availability, and scale that these applications require. But can it also provide complete reliability? As a system architect, when asked “Can you guarantee that we will always get every transaction,” you want to be able to say “Yes” with total confidence.
In this session, we will go over everything that happens to a message – from producer to consumer – and pinpoint all the places where data can be lost if you are not careful. You will learn how developers and operations teams can work together to build a bulletproof data pipeline with Kafka. And if you need proof that you built a reliable system, we'll show you how you can build the system to prove this too.
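To ground the reliability settings these summaries keep returning to, here is a minimal sketch of a producer configured for durability, assuming the Java Kafka clients; the bootstrap address, topic, and key are placeholders, and the broker-side settings appear only as comments because they are cluster configuration, not client code.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ReliableProducer {
    public static void main(String[] args) {
        // Broker/topic side (for context, set on the cluster):
        //   unclean.leader.election.enable=false  (never elect an out-of-sync replica)
        //   replication.factor=3, min.insync.replicas=2
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        props.put("acks", "all");                // wait for all in-sync replicas
        props.put("retries", Integer.MAX_VALUE); // retry transient broker errors

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("transactions", "txn-42", "payload"),
                    (metadata, exception) -> {
                        if (exception != null) {
                            // A non-retriable failure means the write may be lost:
                            // log or alert rather than silently dropping it.
                            exception.printStackTrace();
                        }
                    });
        } // close() flushes any outstanding records
    }
}
```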
Apache Kafka's rise in popularity as a streaming platform has demanded a revisit of its traditional at-least-once message delivery semantics.
In this talk, we present the recent additions to Kafka to achieve exactly-once semantics (EoS) including support for idempotence and transactions in the Kafka clients. The main focus will be the specific semantics that Kafka distributed transactions enable and the underlying mechanics which allow them to scale efficiently.
Speaker: Damien Gasparina, Engineer, Confluent
Here's how to fail at Apache Kafka brilliantly!
https://www.meetup.com/Paris-Data-Engineers/events/260694777/
Kafka Summit NYC 2017 - Introducing Exactly Once Semantics in Apache Kafka (confluent)
The document introduces Apache Kafka's new exactly-once semantics, which provide exactly-once, in-order delivery of records per partition and atomic writes across multiple partitions. It discusses the existing at-least-once delivery semantics and the duplicates they can produce. The new approach uses idempotent producers, sequence numbers, and transactions to ensure exactly-once delivery and coordination across partitions. It also provides up to 20% higher throughput for producers and 50% for consumers through more efficient data formatting and batching. The new features are available in Apache Kafka 0.11, released in June 2017.
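As a rough sketch of the client-side API the abstracts above describe (available since Apache Kafka 0.11), the snippet below uses an idempotent, transactional producer to write to two topics atomically; the topic names and transactional.id are made-up placeholders, not taken from the talks.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class TransactionalWrite {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        props.put("enable.idempotence", "true");      // brokers dedupe via sequence numbers
        props.put("transactional.id", "orders-tx-1"); // placeholder id; enables transactions

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        producer.initTransactions();
        try {
            producer.beginTransaction();
            producer.send(new ProducerRecord<>("orders", "order-1", "created"));
            producer.send(new ProducerRecord<>("audit-log", "order-1", "created"));
            producer.commitTransaction(); // both writes become visible atomically
        } catch (Exception e) {
            producer.abortTransaction();  // read_committed consumers never see either write
            // (for fatal errors such as producer fencing, close the producer instead)
        } finally {
            producer.close();
        }
    }
}
```

Consumers opt in by setting isolation.level=read_committed, which filters out records from aborted transactions.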
Can Kafka Handle a Lyft Ride? (Andrey Falko & Can Cecen, Lyft) Kafka Summit 2020 (HostedbyConfluent)
What does a Kafka administrator need to do when a user demands that message delivery be guaranteed, fast, and low cost? In this talk we walk through the architecture we created to deliver for such users, the alternatives we considered, and the pros and cons of what we came up with.
In this talk, we’ll be forced to dive into broker restart and failure scenarios and the things we need to do to prevent leader elections from slowing down incoming requests. We’ll need to take care of the consumers as well, to ensure that they don’t process the same request twice. We also plan to describe our architecture by showing a demo of simulated requests being produced into Kafka clusters and consumers processing them while we aggressively cause failures on the Kafka clusters.
We hope the audience walks away with a deeper understanding of what it takes to build robust Kafka clients and how to tune them to accomplish stringent delivery guarantees.
Kafka Summit NYC 2017 - Deep Dive Into Apache Kafka (confluent)
- Apache Kafka is a streaming platform that provides high availability, durability, and the ability to retain database-like data through features like log compaction.
- It ensures reliability through configurable replication, automatic failover, and an in-sync replica process.
- Log compaction allows Kafka to retain only the latest value for each message key in the log, useful for building indexes and retaining only updated records.
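To illustrate the compaction point in the last bullet, here is a small AdminClient sketch that creates a compacted topic; the topic name and sizing are illustrative assumptions.

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;

public class CompactedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        try (AdminClient admin = AdminClient.create(props)) {
            // cleanup.policy=compact keeps only the latest value per message key,
            // which suits changelogs of updatable records such as database rows.
            NewTopic topic = new NewTopic("user-profiles", 6, (short) 3)
                    .configs(Map.of(TopicConfig.CLEANUP_POLICY_CONFIG,
                                    TopicConfig.CLEANUP_POLICY_COMPACT));
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}
```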
Kafka on Kubernetes: Keeping It Simple (Nikki Thean, Etsy) Kafka Summit SF 2019 (confluent)
Cloud migration: it's practically a rite of passage for anyone who's built infrastructure on bare metal. When we migrated our 5-year-old Kafka deployment from the datacenter to GCP, we were faced with the task of making our highly mutable server infrastructure more cloud-friendly. This led to a surprising decision: we chose to run our Kafka cluster on Kubernetes. I'll share war stories from our Kafka migration journey, explain why we chose Kubernetes over arguably simpler options like GCP VMs, and present the lessons we learned while making our way toward a stable and self-healing Kubernetes deployment. I'll also go through some improvements in the more recent Kafka releases that make upgrades crucial for any Kafka deployment on immutable and ephemeral infrastructure. You'll learn what happens when you try to run one complex distributed system on top of another, and come away with some handy tricks for automating cloud cluster management, plus some migration pitfalls to avoid. And if you're not sure whether running Kafka on Kubernetes is right for you, our experiences should provide some extra data points that you can use as you make that decision.
Introducing Exactly Once Semantics To Apache Kafka (Apurva Mehta)
Here are slides from my talk on introducing exactly once semantics to Apache Kafka. The talk was given at the Kafka Summit NYC, 8 May 2017.
The slides dive into the design of transactions in Apache Kafka.
Kafka Summit NYC 2017 - Apache Kafka in the Enterprise: What if it Fails? (confluent)
This document discusses Kafka deployment strategies at Goldman Sachs. It describes how Kafka is deployed across multiple datacenters for high availability. It also discusses various failure scenarios like host, network, or entire datacenter failures and how the deployment is designed to minimize impact. The document also summarizes the monitoring, alerting, and management tooling built around Kafka clusters to provide health checks, metrics collection, and topic management capabilities.
Monitoring and Resiliency Testing our Apache Kafka Clusters at Goldman Sachs ... (HostedbyConfluent)
In our payments platform at Goldman Sachs Transaction Banking, Apache Kafka plays a critical role as the messaging bus in our microservices architecture. As part of the financial services industry, we need to ensure high availability of our platform and quick response times during failures.
In this talk we will explore how we monitor and alert on the health of our Kafka clusters and clients using our heartbeat application and DataDog dashboards. We will see how we consolidate JMX metrics such as error rates, connection rates, latencies, and consumer lag from all producers and consumers using a JMX agent sidecar, providing a live view of the health of our entire infrastructure. We will also discuss our culture of game days, where we regularly test the resiliency of all the clients in our infrastructure by simulating various failure scenarios to improve overall availability.
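The talk above monitors via JMX and DataDog; as an alternative sketch of the same consumer-lag signal, the snippet below computes per-partition lag with the Kafka AdminClient (the listOffsets admin API requires a reasonably recent client, roughly Kafka 2.5+); the group id and address are placeholders.

```java
import java.util.Map;
import java.util.Properties;
import java.util.stream.Collectors;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.ListOffsetsResult;
import org.apache.kafka.clients.admin.OffsetSpec;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class ConsumerLagCheck {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        try (AdminClient admin = AdminClient.create(props)) {
            // Offsets the group has committed, per partition.
            Map<TopicPartition, OffsetAndMetadata> committed =
                    admin.listConsumerGroupOffsets("payments-service") // placeholder group
                         .partitionsToOffsetAndMetadata().get();
            // Current log-end offsets for the same partitions.
            Map<TopicPartition, ListOffsetsResult.ListOffsetsResultInfo> ends =
                    admin.listOffsets(committed.keySet().stream()
                                 .collect(Collectors.toMap(tp -> tp, tp -> OffsetSpec.latest())))
                         .all().get();
            // Lag = log-end offset minus committed offset; alert when it grows steadily.
            committed.forEach((tp, om) -> System.out.printf("%s lag=%d%n",
                    tp, ends.get(tp).offset() - om.offset()));
        }
    }
}
```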
Troubleshooting Kafka's socket server: from incident to resolution (Joel Koshy)
LinkedIn’s Kafka deployment is nearing 1300 brokers that move close to 1.3 trillion messages a day. While operating Kafka smoothly at this scale is a testament to both Kafka’s scalability and the operational expertise of LinkedIn SREs, we occasionally run into some very interesting bugs. In this talk I will dive into a recent production issue as an example of how even a subtle bug can suddenly manifest at scale and cause a near meltdown of the cluster. We will go over how we detected and responded to the situation, how we investigated it after the fact, and summarize some lessons learned and best practices from this incident.
The Good, The Bad, and The Avro (Graham Stirling, Saxo Bank and David Navalho...) (confluent)
- Saxo Bank is migrating to a data mesh architecture using Apache Kafka and Avro schemas to distribute data across domains and enable data sharing.
- They are working to automate the onboarding process for new data domains and producers/consumers to simplify development and ensure governance.
- Some challenges include limited support for .NET in Confluent platforms, compatibility issues between code generators and the schema registry, and mapping complex database schemas to Avro schemas.
This document discusses building a fault-tolerant Kafka cluster on AWS to handle 2.5 billion requests per day. It covers choosing AWS instance types and broker counts, spreading brokers across availability zones, configuring replication and partitioning, automating fault tolerance, adding metrics and alerts, and testing the cluster's resilience. Key decisions include broker placement, topic partitioning, Zookeeper ensemble sizing, and automation to dynamically reassign partitions and change configurations in response to failures or added capacity.
1) Apache Kafka is a distributed streaming platform that can be used for publish-subscribe messaging and storing and processing streams of data. However, there are many potential anti-patterns to be aware of when using Kafka.
2) Some common anti-patterns include not properly configuring data durability, ignoring error handling and exceptions, failing to use Kafka's built-in retries and idempotence features, and not embracing Kafka's at-least-once processing semantics.
3) It is also important to properly configure Kafka for production use by tuning OS settings, reading documentation on best practices, implementing monitoring, and addressing topics and partitioning design.
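To make the "embrace at-least-once" advice in point 2 concrete, here is a minimal consumer sketch that disables auto-commit and commits offsets only after processing succeeds; the topic, group id, and process() body are illustrative assumptions.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class AtLeastOnceConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        props.put("group.id", "invoice-processor");       // placeholder group
        props.put("enable.auto.commit", "false");         // avoid the auto-commit anti-pattern
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("invoices"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> r : records) {
                    process(r); // must be idempotent: at-least-once implies possible redelivery
                }
                consumer.commitSync(); // commit only after the whole batch is processed
            }
        }
    }

    static void process(ConsumerRecord<String, String> r) {
        System.out.printf("processing %s-%d@%d%n", r.topic(), r.partition(), r.offset());
    }
}
```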
In the last few years, Apache Kafka has been used extensively in enterprises for real-time data collecting, delivering, and processing. In this presentation, Jun Rao, Co-founder, Confluent, gives a deep dive on some of the key internals that help make Kafka popular.
- Companies like LinkedIn are now sending more than 1 trillion messages per day to Kafka. Learn about the underlying design in Kafka that leads to such high throughput.
- Many companies (e.g., financial institutions) are now storing mission critical data in Kafka. Learn how Kafka supports high availability and durability through its built-in replication mechanism.
- One common use case of Kafka is for propagating updatable database records. Learn how a unique feature called compaction in Apache Kafka is designed to solve this kind of problem more naturally.
Everything you ever needed to know about Kafka on Kubernetes but were afraid ... (HostedbyConfluent)
Kubernetes has become the de facto standard for running cloud-native applications, and many users also turn to it to run stateful applications such as Apache Kafka. You can use different tools to deploy Kafka on Kubernetes - write your own YAML files, use Helm Charts, or go for one of the available operators. But all of these have one thing in common: you still need very good knowledge of Kubernetes to make sure your Kafka cluster works properly in all situations. This talk will cover different Kubernetes features such as resources, affinity, tolerations, pod disruption budgets, topology spread constraints and more, and explain why they are important for Apache Kafka and how to use them. If you are interested in running Kafka on Kubernetes and do not know all of these, this is a talk for you.
This document provides guidance on upgrading Kafka. It emphasizes the importance of upgrading early and often to the latest bugfix release in order to address security vulnerabilities and other bugs. It recommends using automated rolling upgrades to upgrade brokers one by one with zero downtime. It also outlines best practices like backing up configurations, reading release notes, and ensuring protocol compatibility when upgrading.
Running large scale Kafka upgrades at Yelp (Manpreet Singh, Yelp) Kafka Summit... (confluent)
Over the years at Yelp, we have relied on Kafka to build many complex applications and stream processing data pipelines that solve a multitude of use cases, including powering our product experimentation workflow, search indexing, asynchronous task processing and more. Today, Kafka is at the core of our infrastructure. These applications use different versions of Kafka clients and different programming languages. To fulfill the requirements of these diverse use cases, we run several specialized Kafka clusters for high availability, consistency, exactly-once and infinite retention. We endeavor to keep our clusters up to date with newer Kafka versions, which bring with them several critical bug fixes and exciting features like dynamic broker configuration, exactly-once semantics, Kafka offset management and improved tooling. Our journey with Kafka started with version 0.8.2.0. Upgrading Kafka while ensuring client compatibility, zero downtime, and negligible performance degradation across our ever-growing multi-regional cluster deployment exposed us to a plethora of unique challenges. This session will focus on the challenges we encountered and how we evolved our infrastructure tooling and upgrade strategy to overcome them. I will be talking about:
- How we rolled out new features such as Kafka offset storage, message timestamps, and reassignment auto-throttling.
- Core technical issues discovered during upgrades, such as failure of log cleaners due to large offsets.
- The in-house test suite we built to validate new Kafka versions against our existing tooling and client libraries, exercise the upgrade and rollback process, and benchmark performance.
- The automation we built for safe and fast rolling upgrades and broker configuration deployment.
Introduction To Streaming Data and Stream Processing with Apache Kafka (confluent)
Slack handles over 1.2 trillion message writes and 3.4 trillion message reads daily across its real-time messaging platform, generating around 1 petabyte of streaming data. With thousands of engineers and tens of thousands of producer processes, Slack relies on Apache Kafka as the commit log for its distributed database to handle this massive scale of real-time messaging.
This document provides an introduction to Apache Kafka. It describes Kafka as a distributed messaging system with features like durability, scalability, publish-subscribe capabilities, and ordering. It discusses key Kafka concepts like producers, consumers, topics, partitions and brokers. It also summarizes use cases for Kafka and how to implement producers and consumers in code. Finally, it briefly outlines related tools like Kafka Connect and Kafka Streams that build upon the Kafka platform.
Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17 (Gwen (Chen) Shapira)
This document discusses disaster recovery strategies for Apache Kafka clusters running across multiple data centers. It outlines several failure scenarios, like an entire data center being demolished, and recommends solutions like running a single Kafka cluster across multiple nearby data centers. It then describes a "stretch cluster" approach using three data centers with replication between them to provide high availability. The document also discusses active-active replication between two data center clusters and the challenge that consumer offsets are not identical across data centers during a failover. It recommends approaches like tracking timestamps and failing over consumers based on time.
No data loss pipeline with Apache Kafka (Jiangjie Qin)
The document discusses how to configure Apache Kafka to prevent data loss and message reordering in a data pipeline. It recommends settings like enabling block on buffer full, using acks=all for synchronous message acknowledgment, limiting in-flight requests, and committing offsets only after messages are processed. It also suggests replicating topics across at least 3 brokers and using a minimum in-sync replica factor of 2. Mirror makers can further ensure no data loss or reordering by consuming from one cluster and producing to another in order while committing offsets. Custom consumer listeners and message handlers allow for mirroring optimizations.
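A short sketch of the producer settings that summary names, with one caveat: block.on.buffer.full was removed from newer clients, where max.block.ms plays the equivalent role. Values are illustrative, not prescriptive.

```java
import java.util.Properties;

public class NoLossOrderingConfig {
    public static Properties producerProps() {
        Properties props = new Properties();
        props.put("acks", "all");                // every in-sync replica must acknowledge
        props.put("retries", Integer.MAX_VALUE); // retry transient failures
        // One in-flight request per connection so retries cannot reorder messages.
        props.put("max.in.flight.requests.per.connection", "1");
        // Old clients used block.on.buffer.full=true; modern clients instead block
        // for up to max.block.ms when the buffer is full rather than throwing.
        props.put("max.block.ms", String.valueOf(Long.MAX_VALUE));
        // Cluster side, for reference: replication factor 3, min.insync.replicas=2.
        return props;
    }
}
```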
Streaming in Practice - Putting Apache Kafka in Production (confluent)
This presentation focuses on how to integrate Kafka and its ecosystem components into an enterprise environment, and what you need to consider as you move into production.
We will touch on the following topics:
- Patterns for integrating with existing data systems and applications
- Metadata management at enterprise scale
- Tradeoffs in performance, cost, availability and fault tolerance
- Choosing which cross-datacenter replication patterns fit with your application
- Considerations for operating Kafka-based data pipelines in production
This document provides an introduction to Akka Streams, which implements the Reactive Streams specification. It discusses the limitations of traditional concurrency models and Actor models in dealing with modern challenges like high availability and large data volumes. Reactive Streams aims to provide a minimalistic asynchronous model with back pressure to prevent resource exhaustion. Akka Streams builds on the Akka framework and Actor model to provide a streaming data flow library that uses Reactive Streams interfaces. It allows defining processing pipelines with sources, flows, and sinks and includes features like graph DSL, back pressure, and integration with other Reactive Streams implementations.
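As a toy illustration of the source/flow/sink pipeline described above, here is a hedged Akka Streams sketch using the Java DSL, assuming Akka 2.6 (where a stream can be run with the ActorSystem directly); it is not taken from the talk.

```java
import akka.actor.ActorSystem;
import akka.stream.javadsl.Flow;
import akka.stream.javadsl.Sink;
import akka.stream.javadsl.Source;

public class StreamsDemo {
    public static void main(String[] args) {
        ActorSystem system = ActorSystem.create("demo");
        // Source -> Flow -> Sink: demand propagates upstream, so a slow sink
        // back-pressures the source instead of exhausting memory.
        Source.range(1, 100)
              .via(Flow.of(Integer.class).map(i -> i * 2))
              .runWith(Sink.foreach(System.out::println), system)
              .thenRun(system::terminate); // materialized CompletionStage<Done>
    }
}
```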
Presentation at Strata Data Conference 2018, New York
The controller is the brain of Apache Kafka. A big part of what the controller does is to maintain the consistency of the replicas and determine which replica can be used to serve the clients, especially during individual broker failure.
Jun Rao outlines the main data flow in the controller—in particular, when a broker fails, how the controller automatically promotes another replica as the leader to serve the clients, and when a broker is started, how the controller resumes the replication pipeline in the restarted broker.
Jun then describes recent improvements to the controller that allow it to handle certain edge cases correctly and increase its performance, which allows for more partitions in a Kafka cluster.
Exactly-Once Financial Data Processing at Scale with Flink and Pinot (Flink Forward)
Flink Forward San Francisco 2022.
At Stripe, we have created a complete end-to-end exactly-once processing pipeline to process financial data at scale by combining the exactly-once capabilities of Flink, Kafka, and Pinot. The pipeline provides an exactly-once guarantee, end-to-end latency within a minute, deduplication against hundreds of billions of keys, and sub-second query latency against the whole dataset with trillions of rows. In this session we will discuss the technical challenges of designing, optimizing, and operating the whole pipeline, including Flink, Kafka, and Pinot. We will also share our lessons learned and the benefits gained from exactly-once processing.
By Xiang Zhang, Pratyush Sharma & Xiaoman Dong
Microservices interaction at scale using Apache Kafka (Ivan Ursul)
This document discusses using Apache Kafka to enable communication between microservices at scale. It begins by describing how monolithic applications can be broken into independent microservices. Next, it covers common communication patterns for microservices, including shared databases, separate databases per service, and asynchronous messaging. The bulk of the document then focuses on Apache Kafka, describing it as a distributed publish-subscribe messaging system that is fast, scalable and durable. It covers how Kafka works, including its use of a commit log distributed across brokers, and common usage patterns such as event sourcing, change data capture, and Kafka Connect. Overall, the document promotes using Kafka as the backbone for event-driven communication between microservices.
Speaker: Jun Rao, VP of Apache Kafka and Co-founder of Confluent
The controller is the brain of Apache Kafka®. A big part of what the controller does is to maintain the consistency of the replicas and determine which replica can be used to serve the clients, especially during individual broker failure.
In this talk, Jun will outline the main data flow in the controller—in particular, when a broker fails, how the controller automatically promotes another replica as the leader to serve the clients, and when a broker is started, how the controller resumes the replication pipeline in the restarted broker. Jun will then describe recent improvements to the controller that allow it to handle certain edge cases correctly and increase its performance, which allows for more partitions in a Kafka cluster.
Jun Rao is the co-founder of Confluent, a company that provides a streaming data platform on top of Apache Kafka. Previously, Jun was a senior staff engineer at LinkedIn, where he led the development of Kafka, and a researcher at IBM's Almaden Research Center, where he conducted research on database and distributed systems. Jun is the PMC chair of Apache Kafka and a committer on Apache Cassandra. He writes at https://cnfl.io/blog-jun-rao.
SFBigAnalytics_20190724: Monitor Kafka like a Pro (Chester Chen)
Kafka operators need to provide guarantees to the business that Kafka is working properly and delivering data in real time, and they need to identify and triage problems so they can solve them before end users notice them. This elevates the importance of Kafka monitoring from a nice-to-have to an operational necessity. In this talk, Kafka operations experts Xavier Léauté and Gwen Shapira share their best practices for monitoring Kafka and the streams of events flowing through it. How to detect duplicates, catch buggy clients, and triage performance issues – in short, how to keep the business’s central nervous system healthy and humming along, all like a Kafka pro.
Speakers: Gwen Shapira, Xavier Léauté (Confluent)
Gwen is a software engineer at Confluent working on core Apache Kafka. She has 15 years of experience working with code and customers to build scalable data architectures. She currently specializes in building real-time reliable data processing pipelines using Apache Kafka. Gwen is an author of “Kafka - the Definitive Guide”, "Hadoop Application Architectures", and a frequent presenter at industry conferences. Gwen is also a committer on the Apache Kafka and Apache Sqoop projects.
One of the first engineers on the Confluent team, Xavier Léauté is responsible for analytics infrastructure, including real-time analytics in Kafka Streams. He was previously a quantitative researcher at BlackRock. Prior to that, he held various research and analytics roles at Barclays Global Investors and MSCI.
Migration Effort in the Cloud - The Case of Cloud Platforms (Stefan Kolb)
Get the book "On the Portability of Applications in Platform as a Service" at https://www.amazon.de/dp/3863096312
Presentation from IEEE CLOUD 2015. Full paper at http://bit.ly/paasmigration
Kubernetes Failure Stories - KubeCon Europe Barcelona (Henning Jacobs)
Talk given on 2019-05-21 at KubeCon Barcelona: https://kccnceu19.sched.com/event/MPcM/kubernetes-failure-stories-and-how-to-crash-your-clusters-henning-jacobs-zalando-se
Bootstrapping a Kubernetes cluster is easy, rolling it out to nearly 200 engineering teams and operating it at scale is a challenge. In this talk, we are presenting our approach to Kubernetes provisioning on AWS, operations and developer experience for our growing Zalando developer base. We will walk you through our horror stories of operating 100+ clusters and share the insights we gained from incidents, failures, user reports and general observations. Our failure stories will be sourced from recent and past incidents, so the talk will be up-to-date with our latest experiences.
Most of our learnings apply to other Kubernetes infrastructures (EKS, GKE, ..) as well. This talk strives to reduce the audience's unknown unknowns about running Kubernetes in production.
Strategies and techniques to optimize Kafka brokers and producers to minimize data loss under huge traffic volumes, limited configuration options, and a less-than-ideal, constantly changing environment, balanced against cost.
Running Kubernetes in Production: A Million Ways to Crash Your Cluster - Cont... (Henning Jacobs)
Bootstrapping a Kubernetes cluster is easy, rolling it out to nearly 200 engineering teams and operating it at scale is a challenge. In this talk, we are presenting our approach to Kubernetes provisioning on AWS, operations and developer experience for our growing Zalando developer base. We will walk you through our horror stories of operating 80+ clusters and share the insights we gained from incidents, failures, user reports and general observations. Most of our learnings apply to other Kubernetes infrastructures (EKS, GKE, ..) as well. This talk strives to reduce the audience’s unknown unknowns about running Kubernetes in production.
https://2018.container.camp/uk/schedule/running-kubernetes-in-production-a-million-ways-to-crash-your-cluster/
There has been a lot of activity in V3DV, the Vulkan driver for Raspberry Pi 4, over the last year: we have significantly reworked our synchronization code, obtained Vulkan 1.1 conformance, implemented Vulkan 1.2 support, continued to work on compiler optimizations and more.
In this talk I would like to go through the main development milestones and changes we implemented in the driver, as well as discuss some limitations of the underlying hardware platform that have discouraged us from implementing features such as scalar block layout or fp16.
(c) X.Org Developer Conference (XDC) 2022
October 4-6, 2022
Minneapolis, Minnesota, USA
https://indico.freedesktop.org/event/2/
The document discusses running LLVM buildbots at Linaro. It describes that Linaro runs over 160 LLVM builders to test commits across architectures, projects, platforms, and build types. Maintaining the buildbots is difficult due to limited resources, flaky builds, and differing perspectives between committers and maintainers. The future includes running more builds in pre-commit to catch issues earlier and reduce surprises for committers.
Better Kafka Performance Without Changing Any Code | Simon Ritter, Azul (HostedbyConfluent)
Apache Kafka is the most popular open-source stream-processing software for collecting, processing, storing, and analyzing data at scale. Most known for its excellent performance, low latency, fault tolerance, and high throughput, it's capable of handling thousands of messages per second. For mission-critical applications, how do you ensure that the performance delivered is the performance required? This is especially important as Kafka is written in Java and Scala and runs on the JVM. The JVM is a fantastic platform that delivers on an internet scale. In this session, we'll explore how making changes to the JVM design can eliminate the problems of garbage collection pauses and raise the throughput of applications. For cloud-based Kafka applications, this can deliver both lower latency and reduced infrastructure costs. All without changing a line of code!
This document discusses how the Azul Platform Prime JVM can improve Kafka performance without any code changes. It summarizes that Azul's JVM replaces HotSpot's garbage collector and JIT compiler with the C4 pauseless collector and the Falcon JIT compiler, eliminating stop-the-world garbage collection pauses and improving adaptive compilation. This results in up to 20% better performance for Kafka workloads and allowed one customer to reduce their cloud hardware costs by 15% while maintaining throughput.
Would you ever play an online game if you were not able to communicate with your teammates? Isn’t it fun if you can make new friends, arrange pre-made games and celebrate your victories with people you like to play with?
Riot Games’ League of Legends handles millions of online players at any given time. Each chat server is responsible for routing over 1 billion real-time events a day. In order to support the overwhelming user base, be prepared for future growth, and pave the road for upcoming features, the chat infrastructure had to be designed and built with the utmost care, so that it would never fail the players.
In this talk I would like to present how we achieved linear scalability, improved the overall fault tolerance, created a framework for real time code upgrades and got ready for the new features we want to ship. I will also discuss in detail why we chose to use Erlang as a foundation for the system, and why we migrated our data from MySQL to Riak.
Reactive mistakes - ScalaDays Chicago 2017 (Petr Zapletal)
Reactive applications are becoming a de facto industry standard and, if employed correctly, toolkits like the Lightbend Reactive Platform make the implementation easier than ever. But the design of these systems can be challenging, as it requires a particular mindset shift to tackle problems we might not be used to. In this talk we're going to discuss the most common issues I've seen in the field that prevented applications from working as expected. I'd like to talk about typical pitfalls that might cause trouble, trade-offs that might not be fully understood, and important choices that might be overlooked, including persistent actor pitfalls, tackling network partitions, proper implementations of graceful shutdown or distributed transactions, trade-offs of microservices or actors, and more.
This talk should be interesting for anyone who is thinking about, implementing, or has already deployed a reactive application. My goal is to provide a comprehensive explanation of common problems to make sure they won't be repeated by fellow developers. The talk is a little more focused on the Lightbend platform, but understanding the concepts we are going to discuss should be beneficial for everyone interested in this field.
The document discusses Mininet, an open source network emulator used for testing SDN ideas. It provides an overview of Mininet 1.0 and its functional fidelity before describing plans for Mininet 2.0 to improve performance fidelity through techniques like resource isolation, network invariants, and reproducible experiments. The document uses the example of DCTCP traffic to demonstrate how network invariants can validate emulator results.
[KCD GT 2023] Demystifying etcd failure scenarios for Kubernetes.pdf (William Caban)
Etcd is a key-value store used by Kubernetes to store cluster data. The document discusses etcd failure scenarios and myths regarding using etcd in Kubernetes clusters. It provides best practices for configuring etcd heartbeat and election timers and hardware specifications to maintain stability. Common etcd failure modes like leader failure, follower failure, and network partitions are also covered.
A Hitchhiker's Guide to Apache Kafka Geo-Replication with Sanjana Kaundinya ... (HostedbyConfluent)
Many organizations use Apache Kafka® to build data pipelines that span multiple geographically distributed data centers, for use cases ranging from high availability and disaster recovery, to data aggregation and regulatory compliance.
The journey from single-cluster deployments to multi-cluster deployments can be daunting, as you need to deal with networking configurations, security models and operational challenges. Geo-replication support for Kafka has come a long way, with both open-source and commercial solutions that support various replication topologies and disaster recovery strategies.
So, grab your towel, and join us on this journey as we look at tools, practices, and patterns that can help us build reliable, scalable, secure, global (if not inter-galactic) data pipelines that meet your business needs, and might even save the world from certain destruction.
This document provides an agenda and overview of Kafka on Kubernetes. It begins with an introduction to Kafka fundamentals and messaging systems. It then discusses key ideas behind Kafka's architecture like data parallelism and batching. The rest of the document explains various Kafka concepts in detail like topics, partitions, producers, consumers, and replication. It also introduces Kubernetes concepts relevant for running Kafka like StatefulSets, StorageClasses and the operator pattern. The goal is to help understand how to build event-driven systems using Kafka and deploy it on Kubernetes.
HBaseCon 2013: Scalable Network Designs for Apache HBase (Cloudera, Inc.)
This document discusses scalable network designs and how modern networks can help applications. It begins with a brief history of network software and describes how switches now run Linux. Typical network designs are presented starting small and scaling up through multiple racks and core switches. The benefits of layer 3 designs, jumbo frames, and deep buffers to prevent packet loss are covered. Finally, it discusses how the network can help applications by detecting server failures, redirecting traffic, and enabling fast failover through features only possible by the switch running Linux.
Similar to Kafka Summit SF 2017 - Running Kafka as a Service at Scale
Building API data products on top of your real-time data infrastructureconfluent
This talk and live demonstration will examine how Confluent and Gravitee.io integrate to unlock value from streaming data through API products.
You will learn how data owners and API providers can document and secure data products built on top of Confluent brokers, with capabilities including schema validation, topic routing and message filtering.
You will also see how data and API consumers can discover and subscribe to products in a developer portal, as well as how they can integrate with Confluent topics through protocols like REST, WebSockets, Server-Sent Events and Webhooks.
Whether you want to monetize your real-time data, enable new integrations with partners, or provide self-service access to topics through various protocols, this webinar is for you!
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...confluent
In our exclusive webinar, you'll learn why event-driven architecture is the key to unlocking cost efficiency, operational effectiveness, and profitability. Gain insights on how this approach differs from API-driven methods and why it's essential for your organization's success.
Santander Stream Processing with Apache Flinkconfluent
Flink is becoming the de facto standard for stream processing due to its scalability, performance, fault tolerance, and language flexibility. It supports stream processing, batch processing, and analytics through one unified system. Developers choose Flink for its robust feature set and ability to handle stream processing workloads at large scales efficiently.
Unlocking the Power of IoT: A comprehensive approach to real-time insightsconfluent
In today's data-driven world, the Internet of Things (IoT) is revolutionizing industries and unlocking new possibilities. Join Data Reply, Confluent, and Imply as we unveil a comprehensive solution for IoT that harnesses the power of real-time insights.
Workshop híbrido: Stream Processing con Flinkconfluent
Stream processing is a prerequisite of the data streaming stack, powering real-time applications and pipelines.
It enables greater data portability, more efficient use of resources, and a better customer experience by processing data streams in real time.
In our hybrid hands-on workshop, you will learn how to easily filter, join, and enrich real-time data within Confluent Cloud using our serverless Flink service.
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...confluent
Our talk will explore the transformative impact of integrating Confluent, HiveMQ, and SparkPlug in Industry 4.0, emphasizing the creation of a Unified Namespace.
In addition to the creation of a Unified Namespace, our webinar will also delve into Stream Governance and Scaling, highlighting how these aspects are crucial for managing complex data flows and ensuring robust, scalable IIoT-Platforms.
You will learn how to ensure data accuracy and reliability, expand your data processing capabilities, and optimize your data management processes.
Don't miss out on this opportunity to learn from industry experts and take your business to the next level.
Event-driven architecture (EDA) will be the heart of MAPFRE's ecosystem. To stay competitive, today's companies depend more and more on real-time data analysis, which gives them faster insights and response times. Doing business on real-time data means building situational awareness: detecting and responding to what is happening in the world right now.
Eventos y Microservicios - Santander TechTalkconfluent
In this session we will examine how the worlds of events and microservices complement and improve each other, exploring how event-driven patterns let us decompose monoliths in a scalable, resilient, and decoupled way.
Q&A with Confluent Experts: Navigating Networking in Confluent Cloudconfluent
This document discusses networking options and best practices for Confluent Cloud. It provides an overview of public endpoints, private link, and peering options. It then discusses best practices for private networking architectures on Azure using hub-and-spoke and private link designs. Finally, it addresses networking considerations and challenges for Kafka Connect managed connectors, as well as planned enhancements for DNS peering and outbound private link support.
The purpose of the session is to take a dive into Apache Kafka, data streaming, and Kafka in the cloud:
- Dive into Apache Kafka
- Data Streaming
- Kafka in the cloud
Build real-time streaming data pipelines to AWS with Confluentconfluent
Traditional data pipelines often face scalability issues and challenges related to cost, their monolithic design, and reliance on batch data processing. They also typically operate under the premise that all data needs to be stored in a single centralized data source before it's put to practical use. Confluent Cloud on Amazon Web Services (AWS) provides a fully managed cloud-native platform that helps you simplify the way you build real-time data flows using streaming data pipelines and Apache Kafka.
Q&A with Confluent Professional Services: Confluent Service Meshconfluent
No matter whether you are migrating your Kafka cluster to Confluent Cloud, running a cloud-hybrid environment, or are in a different situation where data protection and encryption of sensitive information are required, Confluent Service Mesh allows you to transparently encrypt your data without the need to make code changes to your existing applications.
Citi Tech Talk: Event Driven Kafka Microservicesconfluent
Microservices have become a dominant architectural paradigm for building systems in the enterprise, but they are not without their tradeoffs. Learn how to build event-driven microservices with Apache Kafka.
Confluent & GSI Webinars series - Session 3confluent
An in-depth look at how Confluent is being used in the financial services industry. Gain an understanding of how organisations are utilising data in motion to solve common problems and benefit from their real-time data capabilities.
It will look more deeply into some specific use cases and show how Confluent technology is used to manage costs and mitigate risks.
This session is aimed at Solutions Architects, Sales Engineers and Pre-Sales, as well as more technically minded, business-aligned people. Whilst this is not a deeply technical session, some knowledge of Kafka would be helpful.
This document discusses moving to an event-driven architecture using Confluent. It begins by outlining some of the limitations of traditional messaging middleware approaches. Confluent provides benefits like stream processing, persistence, scalability and reliability while avoiding issues like lack of structure, slow consumers, and technical debt. The document then discusses how Confluent can help modernize architectures, enable new real-time use cases, and reduce costs through migration. It provides examples of how companies like Advance Auto Parts and Nord/LB have benefitted from implementing Confluent platforms.
This session will show why the old paradigm no longer works and why a new approach to data strategy is needed. It aims to show how a Data Streaming Platform is integral to the evolution of a company's data strategy, and how Confluent is not just an integration layer but the central nervous system of an organisation.
You will also learn how to:
• Build products and features faster with a complete suite of connectors and stream-management tools, and connect your environments to data pipelines
• Protect your most critical data and workloads with built-in security, governance, and resilience guarantees
• Deploy Kafka at scale in minutes while reducing the associated costs and operational burden
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...XfilesPro
Wondering how X-Sign gained popularity in such a short time span? This eSign functionality of XfilesPro DocuPrime has many advancements to offer Salesforce users. Explore them now!
Why Apache Kafka Clusters Are Like Galaxies (And Other Cosmic Kafka Quandarie...Paul Brebner
Closing talk for the Performance Engineering track at Community Over Code EU (Bratislava, Slovakia, June 5 2024) https://eu.communityovercode.org/sessions/2024/why-apache-kafka-clusters-are-like-galaxies-and-other-cosmic-kafka-quandaries-explored/ Instaclustr (now part of NetApp) manages hundreds of Apache Kafka clusters of many different sizes, for a variety of use cases and customers. For the last 7 years I’ve been focused outwardly on exploring Kafka application development challenges, but recently I decided to look inward and see what I could discover about the performance, scalability and resource characteristics of the Kafka clusters themselves. Using a suite of Performance Engineering techniques, I will reveal some surprising discoveries about cosmic Kafka mysteries in our data centres, related to: cluster sizes and distribution (using Zipf’s Law), horizontal vs. vertical scalability, and predicting Kafka performance using metrics, modelling and regression techniques. These insights are relevant to Kafka developers and operators.
Orca: Nocode Graphical Editor for Container OrchestrationPedro J. Molina
Tool demo on CEDI/SISTEDES/JISBD2024 at A Coruña, Spain. 2024.06.18
"Orca: Nocode Graphical Editor for Container Orchestration"
by Pedro J. Molina PhD. from Metadev
Voxxed Days Trieste 2024 - Unleashing the Power of Vector Search and Semantic...Luigi Fugaro
Vector databases are redefining data handling, enabling semantic searches across text, images, and audio encoded as vectors.
Redis OM for Java simplifies this innovative approach, making it accessible even for those new to vector data.
This presentation explores the cutting-edge features of vector search and semantic caching in Java, highlighting the Redis OM library through a demonstration application.
Redis OM has evolved to embrace the transformative world of vector database technology, now supporting Redis vector search and seamless integration with OpenAI, Hugging Face, LangChain, and LlamaIndex. This talk highlights the latest advancements in Redis OM, focusing on how it simplifies the complex process of vector indexing, data modeling, and querying for AI-powered applications. We will explore the new capabilities of Redis OM, including intuitive vector search interfaces and semantic caching, which reduce the overhead of large language model (LLM) calls.
Consistent toolbox talks are critical for maintaining workplace safety, as they provide regular opportunities to address specific hazards and reinforce safe practices.
These brief, focused sessions ensure that safety is a continual conversation rather than a one-time event, which helps keep safety protocols fresh in employees' minds. Studies have shown that shorter, more frequent training sessions are more effective for retention and behavior change compared to longer, infrequent sessions.
By engaging workers regularly, toolbox talks promote a culture of safety, empower employees to voice concerns, and ultimately reduce the likelihood of accidents and injuries on site.
The traditional method of conducting safety talks with paper documents and lengthy meetings is not only time-consuming but also less effective. Manual tracking of attendance and compliance is prone to errors and inconsistencies, leading to gaps in safety communication and potential non-compliance with OSHA regulations. Switching to a digital solution like Safelyio offers significant advantages.
Safelyio automates the delivery and documentation of safety talks, ensuring consistency and accessibility. The microlearning approach breaks down complex safety protocols into manageable, bite-sized pieces, making it easier for employees to absorb and retain information.
This method minimizes disruptions to work schedules, eliminates the hassle of paperwork, and ensures that all safety communications are tracked and recorded accurately. Ultimately, using a digital platform like Safelyio enhances engagement, compliance, and overall safety performance on site. https://safelyio.com/
A Comprehensive Guide on Implementing Real-World Mobile Testing Strategies fo...kalichargn70th171
In today's fiercely competitive mobile app market, the role of the QA team is pivotal for continuous improvement and sustained success. Effective testing strategies are essential to navigate the challenges confidently and precisely. Ensuring the perfection of mobile apps before they reach end-users requires thoughtful decisions in the testing plan.
The Comprehensive Guide to Validating Audio-Visual Performances.pdfkalichargn70th171
Ensuring the optimal performance of your audio-visual (AV) equipment is crucial for delivering exceptional experiences. AV performance validation is a critical process that verifies the quality and functionality of your AV setup. Whether you're a content creator, a business conducting webinars, or a homeowner creating a home theater, validating your AV performance is essential.
Enhanced Screen Flows UI/UX using SLDS with Tom KittPeter Caitens
Join us for an engaging session led by Flow Champion, Tom Kitt. This session will dive into a technique of enhancing the user interfaces and user experiences within Screen Flows using the Salesforce Lightning Design System (SLDS). This technique uses Native functionality, with No Apex Code, No Custom Components and No Managed Packages required.
Alluxio Webinar | 10x Faster Trino Queries on Your Data PlatformAlluxio, Inc.
Alluxio Webinar
June 18, 2024
For more Alluxio Events: https://www.alluxio.io/events/
Speaker:
- Jianjian Xie (Staff Software Engineer, Alluxio)
As Trino users increasingly rely on cloud object storage for retrieving data, speed and cloud cost have become major challenges. The separation of compute and storage creates latency challenges when querying datasets; scanning data between storage and compute tiers becomes I/O bound. On the other hand, cloud API costs related to GET/LIST operations and cross-region data transfer add up quickly.
The newly introduced Trino file system cache by Alluxio aims to overcome the above challenges. In this session, Jianjian will dive into Trino data caching strategies, the latest test results, and discuss the multi-level caching architecture. This architecture makes Trino 10x faster for data lakes of any scale, from GB to EB.
What you will learn:
- Challenges relating to the speed and costs of running Trino in the cloud
- The new Trino file system cache feature overview, including the latest development status and test results
- A multi-level cache framework for maximized speed, including Trino file system cache and Alluxio distributed cache
- Real-world cases, including a large online payment firm and a top ridesharing company
- The future roadmap of Trino file system cache and Trino-Alluxio integration
What is Continuous Testing in DevOps - A Definitive Guide.pdfkalichargn70th171
Once an overlooked aspect, continuous testing has become indispensable for enterprises striving to accelerate application delivery and reduce business impacts. According to a Statista report, 31.3% of global enterprises have embraced continuous integration and deployment within their DevOps, signaling a pervasive trend toward hastening release cycles.
Superpower Your Apache Kafka Applications Development with Complementary Open...Paul Brebner
Kafka Summit talk (Bangalore, India, May 2, 2024, https://events.bizzabo.com/573863/agenda/session/1300469 )
Many Apache Kafka use cases take advantage of Kafka’s ability to integrate multiple heterogeneous systems for stream processing and real-time machine learning scenarios. But Kafka also exists in a rich ecosystem of related but complementary stream processing technologies and tools, particularly from the open-source community. In this talk, we’ll take you on a tour of a selection of complementary tools that can make Kafka even more powerful. We’ll focus on tools for stream processing and querying, streaming machine learning, stream visibility and observation, stream meta-data, stream visualisation, stream development including testing and the use of Generative AI and LLMs, and stream performance and scalability. By the end you will have a good idea of the types of Kafka “superhero” tools that exist, which are my favourites (and what superpowers they have), and how they combine to save your Kafka applications development universe from swamploads of data stagnation monsters!
45. Where did my topic partitions go?
[Diagram: a Kafka cluster of Brokers 1 through N, with producers writing in, consumers reading out, topic partitions replicated across brokers, and one broker acting as the Controller]
46. Responsibilities of the Controller
● Leader Election
● Replica Reassignment
● Create Topic
● Delete Topic
● Add Partitions
● Broker start and shutdown
No Controller - No Cluster
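As an illustration of the controller-driven operations listed above, here is a minimal sketch using the stock Kafka CLI tools of the ZooKeeper era this deck targets (the topic name, partition counts, and file names are hypothetical):
# Create a topic; the controller assigns leaders for the new partitions
bin/kafka-topics --zookeeper localhost:2181 --create \
  --topic demo-topic --partitions 6 --replication-factor 3
# Trigger a preferred-replica (leader) election across the cluster
bin/kafka-preferred-replica-election --zookeeper localhost:2181
# Move replicas between brokers; the controller executes the plan
bin/kafka-reassign-partitions --zookeeper localhost:2181 \
  --reassignment-json-file reassignment.json --execute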
47. Lack of Metrics
● Controller state - topic creation, topic deletion, etc.
● Time taken to perform an operation
● Rate at which an admin operation is performed
● Queue sizes within the controller
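For monitoring what the controller is doing, the broker does expose some JMX metrics; a hedged sketch using Kafka's bundled JmxTool (the JMX port is an assumption, and the EventQueueSize MBean exists only on brokers with the reworked controller):
# Which broker is the active controller? (1 on the controller, 0 elsewhere)
bin/kafka-run-class kafka.tools.JmxTool \
  --jmx-url service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi \
  --object-name kafka.controller:type=KafkaController,name=ActiveControllerCount
# How backed up is the controller's event queue?
bin/kafka-run-class kafka.tools.JmxTool \
  --jmx-url service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi \
  --object-name kafka.controller:type=ControllerEventManager,name=EventQueueSize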
49. Root Cause: Upgrades!
● Synchronous per-partition ZooKeeper writes
● Sequential per-partition controller-to-broker requests
● Complicated concurrency semantics
● No separation of control plane from data plane
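To see why per-partition ZooKeeper writes hurt at scale: every partition keeps its leader/ISR state in its own znode, so a controller operation touches one znode per partition. A small sketch with the bundled zookeeper-shell (the topic name is hypothetical):
# Each partition's leader and ISR live in a separate znode
bin/zookeeper-shell localhost:2181 \
  get /brokers/topics/demo-topic/partitions/0/state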
51. Zero controller downtime!
● Highly available cluster
● 10x faster leader elections
● More topic partitions per cluster
● Faster broker shutdown and upgrades
53. Can I get my latency?
[Diagram: the same Kafka cluster view - Brokers 1 through N with producers, consumers, and replicated topic partitions]
54. But I have a bytes quota set
● Throttles the byte rate per second on the broker
● Responses are delayed once the threshold is exceeded
● Keeps misbehaving clients from consuming all the bandwidth
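For reference, a hedged sketch of how such byte-rate quotas are set with the stock tooling (the user name and rates are illustrative):
# Cap user1 at 1 MB/s produce and 2 MB/s fetch
bin/kafka-configs --zookeeper localhost:2181 --alter \
  --add-config 'producer_byte_rate=1048576,consumer_byte_rate=2097152' \
  --entity-type users --entity-name user1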
56. Root Cause
● Too many small-sized requests
● DDoS attack from a client
● Decompression on the server takes a long time
● With more consumer instances, more requests
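One lever against a flood of tiny fetch requests is to make consumers batch; a minimal sketch with the console consumer (the topic name and values are illustrative):
# Have the broker wait for at least 64 KB, or 500 ms, before answering a fetch,
# turning many tiny requests into fewer, larger ones
bin/kafka-console-consumer --bootstrap-server localhost:9092 \
  --topic demo-topic \
  --consumer-property fetch.min.bytes=65536 \
  --consumer-property fetch.max.wait.ms=500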
57. 57
bin/kafka-configs --zookeeper localhost:2181 --alter --add-config
'request_percentage=50' --entity-name user1 --entity-type users
Request Quota - percentage of time a client can spend on request
handler (I/O) threads and network threads within each quota window
Predictable Latency
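To confirm a quota override took effect, the same tool can describe it (user name as above):
# List the quota overrides configured for user1
bin/kafka-configs --zookeeper localhost:2181 --describe \
  --entity-type users --entity-name user1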