- Upgrade often to pick up bug fixes and improvements, following the upgrade guide carefully. Start with a healthy cluster and upgrade components outward, from ZooKeeper to Kafka brokers to clients. Don't rush the process, and never proceed with any under-replicated partitions outstanding.
- Collect JMX metrics to monitor the cluster; without that visibility, outages can be prolonged. The Kafka defaults suit single-node deployments, but replication factor, thread counts, and broker configuration should be tuned for larger clusters.
- Use quotas (replication throttling, plus bandwidth and request limits per client) to protect both the cluster and its clients. Log files should be separated per component and retained for a few days. Consider running multiple clusters, bucketed by SLA.
From: Learnings from the Field. Lessons from Working with Dozens of Small & Large Deployments (Mitchell Henderson, Confluent), Kafka Summit 2020
6. How to upgrade?
● Read the upgrade guide 3 times.
  ○ Do you understand the API/protocol versions? This is important.
● Start with a healthy cluster!
  ○ No URP (under-replicated partitions)! Seriously, NONE!
● Work outward: ZooKeeper -> Kafka brokers -> Connect/Streams/Schema Registry -> clients.
● One node (JVM instance) at a time!
● Upgrade binaries.
● Wait for URP to return to zero!
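The "no URP" gate above can be checked with the stock tooling; a minimal sketch, assuming a broker reachable at localhost:9092 and the Kafka scripts in bin/:

```shell
# List any under-replicated partitions; empty output means none.
bin/kafka-topics.sh --bootstrap-server localhost:9092 \
  --describe --under-replicated-partitions

# Simple gate for an upgrade script: refuse to continue while URP > 0.
if [ -n "$(bin/kafka-topics.sh --bootstrap-server localhost:9092 \
      --describe --under-replicated-partitions)" ]; then
  echo "URP detected - do not proceed with the upgrade" >&2
  exit 1
fi
```

Running this between each node's restart enforces the "one node at a time, wait for URP to clear" rule automatically.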
7. What not to do?
● Replace old brokers with new brokers, unless you have to.
● Upgrade multiple components at the same time.
● Make multiple changes at once.
● Start with an unhealthy cluster.
● Rush the process.
● Do not move on to the next step with any URP!!!!!
12. Common Questions
● What tool to use?
● How often to poll the JMX interface?
● Will this cause performance issues?
● How long do I need to keep these metrics?
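For the "what tool" question, one stock option is the JmxTool class that ships with Kafka; a sketch, assuming the broker was started with JMX enabled on port 9999 (the port and polling interval are illustrative):

```shell
# Poll the broker's UnderReplicatedPartitions MBean every 30 seconds.
bin/kafka-run-class.sh kafka.tools.JmxTool \
  --jmx-url service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi \
  --object-name kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions \
  --reporting-interval 30000
```

In practice most teams feed the same MBeans into a metrics system (Prometheus JMX exporter, Datadog, etc.) rather than polling by hand.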
17. Logging - Can't know where you're going without knowing where you've been
18. Each component should go to its own log files.
org.apache.log4j.RollingFileAppender is your friend; use it! Without it you will fill up your logging disk, and bad things will happen!
You should plan to keep at least a few days of logs.
Do not be afraid to turn on debug-level logging. There is a JMX bean for this, so there is no longer any need to restart brokers.
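A sketch of what the rolling appender looks like in log4j.properties; the file path, size cap, and backup count are illustrative values, not recommendations from the talk:

```properties
# Size-based rolling for the broker's server log: disk usage is capped at
# roughly MaxFileSize * (MaxBackupIndex + 1) per appender.
log4j.appender.kafkaAppender=org.apache.log4j.RollingFileAppender
log4j.appender.kafkaAppender.File=/var/log/kafka/server.log
log4j.appender.kafkaAppender.MaxFileSize=100MB
log4j.appender.kafkaAppender.MaxBackupIndex=10
log4j.appender.kafkaAppender.layout=org.apache.log4j.PatternLayout
log4j.appender.kafkaAppender.layout.ConversionPattern=[%d] %p %m (%c)%n
```

Kafka's stock log4j.properties ships with time-based rolling appenders, which roll daily but put no cap on total disk usage; the size-based variant above is what bounds it.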
20. Mandatory Quotas!
Replication quota!
This prevents a recovering broker from overwhelming the leaders! It will also prevent a rebalance from stealing all the cluster resources! It will save your butt at 3am!
bin/kafka-configs … --alter
  --add-config 'leader.replication.throttled.rate=10000'
  --entity-type brokers
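Note that the rate alone takes effect only on replicas marked as throttled; a fuller sketch of both halves (the broker id, topic name, and ~10 MB/s rate are illustrative):

```shell
# Throttle leader- and follower-side replication traffic on broker 1 to ~10 MB/s.
bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter \
  --add-config 'leader.replication.throttled.rate=10485760,follower.replication.throttled.rate=10485760' \
  --entity-type brokers --entity-name 1

# The rate only applies to replicas listed as throttled; '*' marks all of them.
bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter \
  --add-config 'leader.replication.throttled.replicas=*,follower.replication.throttled.replicas=*' \
  --entity-type topics --entity-name my-topic
```

Remember to remove the throttle configs once the reassignment or recovery finishes, or replication will stay capped.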
21. Two types of client quotas:
● Bandwidth: bytes in/out.
● Request-based: everything in Kafka is a request.
22. Bandwidth quotas
● Easy to reason about.
● Easy to implement.
● Easy to monitor.
  ○ There is a per-client metric that indicates throttle times.
● A great way to capacity-plan your cluster!
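A sketch of setting a bandwidth quota with the stock tooling; the client id and the ~10 MB/s figures are illustrative:

```shell
# Cap the client id "my-client" at ~10 MB/s produced and ~10 MB/s consumed.
bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter \
  --add-config 'producer_byte_rate=10485760,consumer_byte_rate=10485760' \
  --entity-type clients --entity-name my-client
```

Use --entity-default instead of --entity-name to set a default quota for every client that has no explicit override.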
23. Request quotas
● Added in KIP-124.
● Motivation was to keep clients from overwhelming the network threads and request threads.
● Defined as a percentage of thread utilization: ((num.io.threads + num.network.threads) * 100%).
● More difficult to reason about, but very useful in environments where clients are concerned about latency.
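To make the percentage concrete: with the default num.io.threads=8 and num.network.threads=3, the total pool is (8 + 3) * 100% = 1100%, so a quota of 50% caps a client at half of one thread's time. A sketch of setting it (the 50% value is illustrative):

```shell
# Cap every client without an explicit override at 50% of one request-handler thread.
bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter \
  --add-config 'request_percentage=50' \
  --entity-type clients --entity-default
```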
24. Storage quotas (also called retention)
retention.ms & retention.bytes
If you're not setting BOTH of these on every single topic, you're asking for trouble.
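A sketch of setting both limits on an existing topic; the topic name and the 7-day / 100 GiB values are illustrative:

```shell
# retention.ms = 7 days (604,800,000 ms); retention.bytes = 100 GiB, per partition.
bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter \
  --entity-type topics --entity-name my-topic \
  --add-config 'retention.ms=604800000,retention.bytes=107374182400'
```

Note that retention.bytes applies per partition, so a 12-partition topic with this setting can still hold up to ~1.2 TiB.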
27. Answer: Many clusters!
Bucket by SLA or criticality.
Easier maintenance. Easier tuning. Better monitoring. Safer!
Why not? More sprawl.
It's a balance.