The purpose of this session is a deep dive into Apache Kafka, data streaming, and Kafka in the cloud:
- Dive into Apache Kafka
- Data Streaming
- Kafka in the cloud
Intro to Apache Kafka I gave at the Big Data Meetup in Geneva in June 2016. Covers the basics and gets into some more advanced topics. Includes demo and source code to write clients and unit tests in Java (GitHub repo on the last slides).
SFBigAnalytics_20190724: Monitor Kafka like a Pro - Chester Chen
Kafka operators need to provide guarantees to the business that Kafka is working properly and delivering data in real time, and they need to identify and triage problems so they can solve them before end users notice them. This elevates the importance of Kafka monitoring from a nice-to-have to an operational necessity. In this talk, Kafka operations experts Xavier Léauté and Gwen Shapira share their best practices for monitoring Kafka and the streams of events flowing through it. How to detect duplicates, catch buggy clients, and triage performance issues – in short, how to keep the business’s central nervous system healthy and humming along, all like a Kafka pro.
Speakers: Gwen Shapira, Xavier Léauté (Confluent)
Gwen is a software engineer at Confluent working on core Apache Kafka. She has 15 years of experience working with code and customers to build scalable data architectures. She currently specializes in building real-time reliable data processing pipelines using Apache Kafka. Gwen is an author of "Kafka: The Definitive Guide" and "Hadoop Application Architectures", and a frequent presenter at industry conferences. Gwen is also a committer on the Apache Kafka and Apache Sqoop projects.
One of the first engineers on the Confluent team, Xavier Léauté is responsible for analytics infrastructure, including real-time analytics in Kafka Streams. He was previously a quantitative researcher at BlackRock. Prior to that, he held various research and analytics roles at Barclays Global Investors and MSCI.
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2y2yPiS.
Colin McCabe talks about the ongoing effort to replace the use of Zookeeper in Kafka: why they want to do it and how it will work. He discusses the limitations they have found and how Kafka benefits both in terms of stability and scalability by bringing consensus in house. He talks about their progress, what work is remaining, and how contributors can help. Filmed at qconsf.com.
Colin McCabe is a Kafka committer at Confluent, working on the scalability and extensibility of Kafka. Previously, he worked on the Hadoop Distributed Filesystem and the Ceph Filesystem.
Developing Realtime Data Pipelines With Apache Kafka - Joe Stein
Developing Realtime Data Pipelines With Apache Kafka. Apache Kafka is publish-subscribe messaging rethought as a distributed commit log. A single Kafka broker can handle hundreds of megabytes of reads and writes per second from thousands of clients. Kafka is designed to allow a single cluster to serve as the central data backbone for a large organization. It can be elastically and transparently expanded without downtime. Data streams are partitioned and spread over a cluster of machines to allow data streams larger than the capability of any single machine and to allow clusters of co-ordinated consumers. Messages are persisted on disk and replicated within the cluster to prevent data loss. Each broker can handle terabytes of messages without performance impact. Kafka has a modern cluster-centric design that offers strong durability and fault-tolerance guarantees.
Kafka Needs no Keeper (Jason Gustafson & Colin McCabe, Confluent) Kafka Summi... - confluent
We have been served well by Zookeeper over the years, but it is time for Kafka to stand on its own. This is a talk on the ongoing effort to replace the use of Zookeeper in Kafka: why we want to do it and how it will work. We will discuss the limitations we have found and how Kafka benefits both in terms of stability and scalability by bringing consensus in house. This effort will not be completed overnight, but we will discuss our progress, what work is remaining, and how contributors can help. (Note that I am proposing this as a joint talk with Colin McCabe, who is also a committer on the Apache Kafka project.)
Streaming in Practice - Putting Apache Kafka in Production - confluent
This presentation focuses on how to integrate all these components into an enterprise environment and what things you need to consider as you move into production.
We will touch on the following topics:
- Patterns for integrating with existing data systems and applications
- Metadata management at enterprise scale
- Tradeoffs in performance, cost, availability and fault tolerance
- Choosing which cross-datacenter replication patterns fit with your application
- Considerations for operating Kafka-based data pipelines in production
Not Your Mother's Kafka - Deep Dive into Confluent Cloud Infrastructure | Gwe... - HostedbyConfluent
Confluent Cloud runs a modified version of Apache Kafka - redesigned to be cloud-native and deliver a serverless user experience. In this talk, we will discuss key improvements we've made to Kafka and how they contribute to Confluent Cloud availability, elasticity, and multi-tenancy. You'll learn about innovations that you can use on-prem, and everything you need to make the most of Confluent Cloud.
JDD2015: Make your world event driven - Krzysztof Dębski - PROIDEA
MAKE YOUR WORLD EVENT DRIVEN
Just after you set up your first microservice you realize that the game has just started. You need to improve latency in your application and reduce unnecessary communication.
To make your architecture fully decoupled you need to embrace asynchronous communication. A good way to achieve that is to switch to an event-driven architecture.
We will see how to use Kafka in your microservices. We will also cover some pitfalls you might face while using Kafka and how to deal with them.
After the talk you will know the toolset needed to improve your microservice ecosystem.
Swift Install Workshop - OpenStack Conference Spring 2012 - Joe Arnold
OpenStack Swift is a highly-available distributed object storage
system which supports highly concurrent workloads. Swift is the
backbone behind Cloud Files, Rackspace's storage-as-a-service
offering.
In this workshop, which will be hosted by members of SwiftStack, Inc.,
we'll walk you through deployment and use of OpenStack Swift. We'll
begin by showing you how to install Swift from the ground up.
You'll learn:
- what you should know about Swift's architecture
- how to bootstrap a basic Swift installation
After that, we'll cover how to use Swift, including information on:
- creating accounts and users
- adding, removing, and managing data
- building applications on top of Swift
Bring your laptop (with virtualization extensions enabled in the BIOS)
and we will walk through setting up Swift in a virtual machine. We'll
also build an entire application on top of Swift to illustrate how to
use Swift as a storage service. This is a workshop you won't want to
miss!
Apache Kafka - Scalable Message-Processing and more! - Guido Schmutz
Independent of the source of data, the integration of event streams into an Enterprise Architecture gets more and more important in the world of sensors, social media streams and Internet of Things. Events have to be accepted quickly and reliably, they have to be distributed and analysed, often with many consumers or systems interested in all or part of the events. How can we make sure that all these events are accepted and forwarded in an efficient and reliable way? This is where Apache Kafka comes into play, a distributed, highly scalable messaging broker, built for exchanging huge amounts of messages between a source and a target.
This session will start with an introduction to Apache Kafka and present its role in a modern data / information architecture and the advantages it brings to the table. Additionally, the Kafka ecosystem will be covered, as well as the integration of Kafka in the Oracle Stack, with products such as GoldenGate, Service Bus and Oracle Stream Analytics all being able to act as a Kafka consumer or producer.
Flink Forward Berlin 2017: Robert Metzger - Keep it going - How to reliably a... - Flink Forward
Let’s be honest: Running a distributed stateful stream processor that is able to handle terabytes of state and tens of gigabytes of data per second while being highly available and correct (in an exactly-once sense) does not work without any planning, configuration and monitoring. While the Flink developer community tries to make everything as simple as possible, it is still important to be aware of all the requirements and implications. In this talk, we will provide some insights into the greatest operations mysteries of Flink from a high-level perspective:
- Capacity and resource planning: Understand the theoretical limits.
- Memory and CPU configuration: Distribute resources according to your needs.
- Setting up High Availability: Planning for failures.
- Checkpointing and State Backends: Ensure correctness and fast recovery.
For each of the listed topics, we will introduce the concepts of Flink and provide some best practices we have learned over the past years supporting Flink users in production.
SignalFx engineer Rajiv Kurian's presentation on why we wrote our own Kafka consumer, the performance goals, and the performance gains achieved.
Download the slides to see animations showing hardware details. These slides were converted from Keynote to PowerPoint, so there may be some oddness with slide transitions!
When it Absolutely, Positively, Has to be There: Reliability Guarantees in Ka... - confluent
In the financial industry, losing data is unacceptable. Financial firms are adopting Kafka for their critical applications. Kafka provides the low latency, high throughput, high availability, and scale that these applications require. But can it also provide complete reliability? As a system architect, when asked “Can you guarantee that we will always get every transaction,” you want to be able to say “Yes” with total confidence.
In this session, we will go over everything that happens to a message – from producer to consumer, and pinpoint all the places where data can be lost – if you are not careful. You will learn how developers and operation teams can work together to build a bulletproof data pipeline with Kafka. And if you need proof that you built a reliable system – we’ll show you how you can build the system to prove this too.
Amazon EC2 provides a broad selection of instance types to accommodate a diverse mix of workloads. In this session, we provide an overview of the Amazon EC2 instance platform, key platform features, and the concept of instance generations. We dive into the current generation design choices of the different instance families, including General Purpose, Compute Optimized, Storage Optimized, Memory Optimized, and GPU instances. We also detail best practices and share performance tips for getting the most out of your Amazon EC2 instances.
Common issues with Apache Kafka® Producer - confluent
Badai Aqrandista, Confluent, Senior Technical Support Engineer
This session will be about a common issue in the Kafka Producer: producer batch expiry. We will discuss the Kafka Producer internals, the common causes of batch expiry, such as a slow network or small batches, and how to overcome them. We will also share some examples along the way!
https://www.meetup.com/apache-kafka-sydney/events/279651982/
Amazon EC2 provides a broad selection of instance types to deliver high performance for a diverse mix of applications. In this session, we overview the drivers of system performance and discuss in depth how Amazon EC2 instances deliver system performance while also providing elasticity and complete control over your infrastructure. We also detail best practices and share performance tips for getting the most out of your Amazon EC2 instances.
Kubernetes @ Squarespace: Kubernetes in the Datacenter - Kevin Lynch
This talk was presented at SRE NYC Meetup on August 16, 2017 at Squarespace HQ.
https://www.youtube.com/watch?v=UJ1QAKprVr4
As the engineering teams at Squarespace grow, we have been building more and more microservices. However, this has added operational strain as we try to shoehorn a growing, complex dynamic environment into our static data center infrastructure. We needed to rethink how we handle deployments, dependency management, resource allocation, monitoring, and alerting. Docker containerization and Kubernetes orchestration help us tackle many of these problems, but the journey has been challenging. In this talk, we’ll discuss the challenges of running Kubernetes in a datacenter and how we switched from alerting on per-instance health to a more SLA-focused alert structure with Prometheus and AlertManager.
Set your Data in Motion with Confluent & Apache Kafka Tech Talk Series LME - confluent
Confluent Platform is supporting London Metal Exchange’s Kafka Centre of Excellence across a number of projects with the main objective to provide a reliable, resilient, scalable and overall efficient Kafka as a Service model to the teams across the entire London Metal Exchange estate.
Webinar Back to Basics 3 - Introduction to Replica Sets - MongoDB
A replica set in MongoDB is a group of processes that maintain copies of the data on different database servers. Replica sets provide redundancy and high availability and are the basis of all production MongoDB deployments.
Similar to Citi TechTalk Session 2: Kafka Deep Dive (20)
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente... - confluent
In our exclusive webinar, you'll learn why event-driven architecture is the key to unlocking cost efficiency, operational effectiveness, and profitability. Gain insights on how this approach differs from API-driven methods and why it's essential for your organization's success.
Unlocking the Power of IoT: A comprehensive approach to real-time insights - confluent
In today's data-driven world, the Internet of Things (IoT) is revolutionizing industries and unlocking new possibilities. Join Data Reply, Confluent, and Imply as we unveil a comprehensive solution for IoT that harnesses the power of real-time insights.
Hybrid workshop: Stream Processing with Flink - confluent
Stream processing is a prerequisite of the data streaming stack, powering real-time applications and pipelines.
It enables greater data portability, optimized resource utilization, and a better customer experience by processing data streams in real time.
In our hybrid hands-on workshop, you will learn how to easily filter, join, and enrich real-time data within Confluent Cloud using our serverless Flink service.
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark... - confluent
Our talk will explore the transformative impact of integrating Confluent, HiveMQ, and SparkPlug in Industry 4.0, emphasizing the creation of a Unified Namespace.
Beyond the Unified Namespace, our webinar will also delve into Stream Governance and Scaling, highlighting how these aspects are crucial for managing complex data flows and ensuring robust, scalable IIoT platforms.
You will learn how to ensure data accuracy and reliability, expand your data processing capabilities, and optimize your data management processes.
Don't miss out on this opportunity to learn from industry experts and take your business to the next level.
Event-driven architecture (EDA) will be the heart of the MAPFRE ecosystem. To remain competitive, today's companies increasingly depend on real-time data analysis, which gives them faster insights and response times. Doing business with real-time data is about gaining situational awareness, and detecting and responding to what is happening in the world now.
Events and Microservices - Santander TechTalk - confluent
During this session we will examine how the worlds of events and microservices complement and improve each other, exploring how event-based patterns let us decompose monoliths in a scalable, resilient, and decoupled way.
Build real-time streaming data pipelines to AWS with Confluent - confluent
Traditional data pipelines often face scalability issues and challenges related to cost, their monolithic design, and reliance on batch data processing. They also typically operate under the premise that all data needs to be stored in a single centralized data source before it's put to practical use. Confluent Cloud on Amazon Web Services (AWS) provides a fully managed cloud-native platform that helps you simplify the way you build real-time data flows using streaming data pipelines and Apache Kafka.
Q&A with Confluent Professional Services: Confluent Service Mesh - confluent
Whether you are migrating your Kafka cluster to Confluent Cloud, running a cloud-hybrid environment, or are in a different situation where data protection and encryption of sensitive information is required, Confluent Service Mesh allows you to transparently encrypt your data without the need to make code changes to your existing applications.
Citi Tech Talk: Event Driven Kafka Microservices - confluent
Microservices have become a dominant architectural paradigm for building systems in the enterprise, but they are not without their tradeoffs. Learn how to build event-driven microservices with Apache Kafka.
Confluent & GSI Webinars series - Session 3 - confluent
An in-depth look at how Confluent is being used in the financial services industry. Gain an understanding of how organisations are utilising data in motion to solve common problems and gain benefits from their real-time data capabilities.
It will look more deeply into some specific use cases and show how Confluent technology is used to manage costs and mitigate risks.
This session is aimed at Solutions Architects, Sales Engineers and Pre Sales, and also the more technically minded business aligned people. Whilst this is not a deeply technical session, a level of knowledge around Kafka would be helpful.
Transforming applications built with traditional messaging solutions such as TIBCO, MQ and Solace to be scalable, reliable and ready for the move to cloud
How can applications built with traditional messaging technologies like TIBCO, Solace and IBM MQ be modernised and made cloud ready? What are the advantages of event streaming approaches to pub/sub over traditional message queues? What are the strengths and weaknesses of both approaches, and what use cases and requirements are actually a better fit for messaging than Kafka?
This session will show why the old paradigm does not work and that a new approach to the data strategy needs to be taken. It aims to show how a Data Streaming Platform is integral to the evolution of a company’s data strategy and how Confluent is not just an integration layer but the central nervous system for an organisation.
Vous apprendrez également à :
• Créer plus rapidement des produits et fonctionnalités à l’aide d’une suite complète de connecteurs et d’outils de gestion des flux, et à connecter vos environnements à des pipelines de données
• Protéger vos données et charges de travail les plus critiques grâce à des garanties intégrées en matière de sécurité, de gouvernance et de résilience
• Déployer Kafka à grande échelle en quelques minutes tout en réduisant les coûts et la charge opérationnelle associés
Confluent Partner Tech Talk with Synthesis (confluent)
A discussion on the arduous planning process, and deep dive into the design/architectural decisions.
Learn more about the networking, RBAC strategies, the automation, and the deployment plan.
5. developer.confluent.io
Record Schema
Record => timestamp, key, value, headers
In the event stream, the key and value of each record are serialized in the following format:
Bytes   Area        Description
0       Magic Byte  Confluent serialization format version number; currently always 0.
1-4     Schema ID   4-byte schema ID as returned by Schema Registry.
5-...   Data        Serialized data for the specified schema format.
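The byte layout above can be read with nothing more than a `ByteBuffer`. The following is a minimal sketch, assuming the schema ID is stored in big-endian (network) byte order; the class and method names are hypothetical and not part of any Confluent library.

```java
import java.nio.ByteBuffer;

// Hypothetical helper that splits a framed payload into the areas of the table above.
class WireFormat {
    // Returns the 4-byte schema ID stored in bytes 1-4.
    static int schemaId(byte[] payload) {
        ByteBuffer buf = ByteBuffer.wrap(payload); // ByteBuffer is big-endian by default
        byte magic = buf.get();                    // byte 0: magic byte
        if (magic != 0) {
            throw new IllegalArgumentException("Unknown magic byte: " + magic);
        }
        return buf.getInt();
    }

    // Returns the serialized data that starts at byte 5.
    static byte[] data(byte[] payload) {
        schemaId(payload);                         // validates the magic byte first
        byte[] data = new byte[payload.length - 5];
        System.arraycopy(payload, 5, data, 0, data.length);
        return data;
    }
}
```

A real deserializer would go on to look up the schema for the returned ID and decode the data bytes with it; that step is left out here.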
13. Records Accumulated Into Record Batches
Record => timestamp, key, value, headers
RecordBatch =>
  …
  attributes: int16
    bit 0~2: compression codec
      0: no compression
      1: gzip
      2: snappy
      3: lz4
      4: zstd
  …
  records: [Records]
Compression is applied to the records of a batch (Record 1, Record 2, … Record n).
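To illustrate how the compression codec is packed into the batch attributes, here is a small sketch that masks out bits 0~2 of the int16 value. The class name and codec labels are illustrative only.

```java
// Hypothetical decoder for the compression bits of RecordBatch.attributes.
class BatchAttributes {
    private static final String[] CODECS = {"none", "gzip", "snappy", "lz4", "zstd"};

    static String compression(short attributes) {
        int id = attributes & 0x07; // keep only bits 0~2
        if (id >= CODECS.length) {
            throw new IllegalArgumentException("Unknown compression id: " + id);
        }
        return CODECS[id];
    }
}
```

Higher bits of the attributes field carry other flags, which is why the value is masked rather than compared directly.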
14. Record Batches Drained Into Produce Requests
Records accumulate into record batches (Record Batch 1, Record Batch 2, … Record Batch n) under the control of linger.ms and batch.size. The batches are then drained into produce requests:
Produce Request => acks [topic_data]
  acks => INT16
  topic_data => topic [data]
    topic => STRING
    data => partition record_set
      partition => INT32
      record_set => BYTES
A request can carry several topic_data entries, each with its own record_set.
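The role of linger.ms and batch.size can be sketched as a simple readiness check. This is a deliberately simplified model: the real record accumulator also considers other conditions (for example a full buffer or an explicit flush), and the names below are hypothetical.

```java
// Simplified model of when a record batch is handed to the sender thread.
class DrainRule {
    static boolean ready(int batchBytes, long waitedMs, int batchSize, long lingerMs) {
        // A batch is sent once it is full or once it has waited long enough.
        return batchBytes >= batchSize || waitedMs >= lingerMs;
    }
}
```

With linger.ms=0 the second condition is always true, so batches are sent as soon as the sender thread is able to pick them up.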
23. Producer
Settings shown on the slide: acks=1, enable.idempotence=false, max.request.size=1MB, retries=MAX_INT, delivery.timeout.ms=2min, max.in.flight.requests.per.connection=5
Serializer
● Retrieves and caches schemas from Schema Registry
Partitioner
● Java client uses murmur2 for hashing
● If no key is provided, performs round robin
● If keys are unbalanced, it will overload one leader
● Upcoming changes in KIP-794
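The effect of key hashing on partition choice can be sketched as follows. Note the assumption: `Arrays.hashCode` is used here as a stand-in hash, not the murmur2 function the Java client uses, so the partition numbers will differ from a real producer. Only the principle (same key, same partition) carries over.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch of key-based partitioning with a stand-in hash.
class KeyPartitioning {
    static int partitionFor(String key, int numPartitions) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        int hash = Arrays.hashCode(keyBytes);       // stand-in for murmur2
        return (hash & 0x7fffffff) % numPartitions; // clear the sign bit, then take the modulo
    }

    // Counts how many of the given keys land on each partition.
    static int[] load(String[] keys, int numPartitions) {
        int[] counts = new int[numPartitions];
        for (String key : keys) {
            counts[partitionFor(key, numPartitions)]++;
        }
        return counts;
    }
}
```

If most records share one key, `load` shows all of them on a single partition, which is the "overload one leader" case from the slide.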
Sender thread
● Batches are grouped by destination broker into requests
● Multiple batches to different partitions can potentially travel in the same produce request
Record accumulator
● One buffer per partition; seldom-used partitions may not achieve high batching
● If many producers are in the same JVM, memory and GC could become important
● Sticky partitioner could be used to increase batch sizes in the case of round robin (KIP-480/KIP-794)
Compression
● At batch level
● Allows faster transfer to the broker
● Reduces the inter-broker replication load
● Reduces page cache and disk space utilization on brokers
● Gzip is more CPU intensive, Snappy is lighter, LZ4/ZStd are a good balance*
Settings shown for record batches and requests: batch.size=16KB, linger.ms=0, buffer.memory=32MB, max.block.ms=60s, compression.type=none
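For reference, the producer settings shown on this slide can be written out as a plain `java.util.Properties` object. This is only a sketch of the slide's values converted to bytes and milliseconds; newer client versions may ship different defaults (for example for acks and enable.idempotence), and a real producer would also need bootstrap servers and serializers, which are omitted here.

```java
import java.util.Properties;

// Hypothetical holder for the producer settings listed on the slide.
class ProducerSlideSettings {
    static Properties build() {
        Properties p = new Properties();
        p.setProperty("acks", "1");
        p.setProperty("enable.idempotence", "false");
        p.setProperty("max.request.size", "1048576");        // 1MB
        p.setProperty("retries", String.valueOf(Integer.MAX_VALUE));
        p.setProperty("delivery.timeout.ms", "120000");      // 2min
        p.setProperty("max.in.flight.requests.per.connection", "5");
        p.setProperty("batch.size", "16384");                // 16KB
        p.setProperty("linger.ms", "0");
        p.setProperty("buffer.memory", "33554432");          // 32MB
        p.setProperty("max.block.ms", "60000");              // 60s
        p.setProperty("compression.type", "none");
        return p;
    }
}
```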
33. Group Startup: Step 1 - Find Group Coordinator
Diagram: three brokers each host partitions of __consumer_offsets, and one of them is the Group Coordinator. Consumer 1 and Consumer 2 (both group.id=1) make up the consumer group. The request to find the coordinator can be sent to any broker.
34. Group Startup: Step 2 - Members Join
Diagram: Consumer 1 and Consumer 2 (both group.id=1) join the group through the Group Coordinator broker, which hosts __consumer_offsets; the other two brokers host topic-a. Consumer 1 is the group leader.
35. Group Startup: Step 3 - Partitions Assigned
Diagram: same layout as step 2, with Consumer 1 as group leader; the partitions of topic-a are now assigned to the members of the group.
39. Determining Starting Offset to Consume
Diagram: Consumer 1 and Consumer 2 (both group.id=1) call poll( ); the Group Coordinator broker hosts __consumer_offsets and the other two brokers host topic-a.
If no committed offset is available, the auto.offset.reset value determines the starting offset.
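The rule above can be sketched as a small decision function. This is an illustrative model only: the class is hypothetical, and the "none" setting is modeled as a plain exception rather than the client's own exception type.

```java
import java.util.OptionalLong;

// Hypothetical model of how the starting offset is chosen for a partition.
class StartingOffset {
    static long resolve(OptionalLong committed, String autoOffsetReset,
                        long logStartOffset, long logEndOffset) {
        if (committed.isPresent()) {
            return committed.getAsLong(); // a committed offset always wins
        }
        switch (autoOffsetReset) {
            case "earliest":
                return logStartOffset;    // start from the oldest available record
            case "latest":
                return logEndOffset;      // start with records written from now on
            default:                      // e.g. "none": the consumer raises an error
                throw new IllegalStateException(
                        "No committed offset and auto.offset.reset=" + autoOffsetReset);
        }
    }
}
```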
40. Group Coordinator Failover
Diagram: three brokers each host __consumer_offsets and can act as Group Coordinator for Consumer 1 and Consumer 2 (both group.id=1).
The group coordinator fails over to the broker that becomes the new leader of the __consumer_offsets partition.
41. Consumer Group Rebalance Triggers
Diagram: Consumer 1, Consumer 2, and Consumer 3 (all group.id=1) each run poll( ) and a HeartbeatThread; topic_a has partitions P0-P3 and topic_b has partitions P0-P1.
● A topic that matches the subscription is added or deleted: consumer.subscribe(Pattern.compile("topic_.*"));
● The number of partitions increases, e.g. from 3 to 4
● A consumer instance joins or leaves the group, e.g. heartbeat timeout
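The first trigger depends on which topic names match the subscription pattern. A minimal sketch using only `java.util.regex` (the helper class is hypothetical) shows how the matched set changes when a topic is added:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper that lists the topics a pattern subscription would cover.
class SubscriptionMatch {
    static List<String> matching(Pattern subscription, List<String> topics) {
        List<String> matched = new ArrayList<>();
        for (String topic : topics) {
            if (subscription.matcher(topic).matches()) { // the whole name must match
                matched.add(topic);
            }
        }
        return matched;
    }
}
```

If `topic_c` is created later, the matched set grows from two topics to three, and the group has to rebalance to assign the new partitions.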
43. Stop-the-world Rebalance
Diagram: Consumer 1 (p0,p2) and Consumer 2 (p1) are in the group when Consumer 3 joins. After the synchronization barrier at the group coordinator, the assignment is Consumer 1 (p0), Consumer 2 (p1), Consumer 3 (p2).
Consumers:
1) Revoke current partition assignment and clean up the partition states
2) Join the group
3) Sync with the group
4) Receive new partition assignments
   a) Build the partition state
   b) Resume consumption
44. Stop-the-world Problem 1 - Rebuilding the State
Diagram: same rebalance as above.
Since partitions p0 and p1 are assigned to the same consumer instances as before, rebuilding their state is unnecessary.
45. Stop-the-world Problem 2 - Paused Processing
Diagram: same rebalance as above, with processing paused until the synchronization barrier is passed.
Processing pauses for all subscribed partitions for the duration of the rebalance.
● The pausing for p0 and p1 is unnecessary
46. Avoid Needless State Rebuild with StickyAssignor
Diagram: Consumer 1 (p0,p2) and Consumer 2 (p1) are in the group when Consumer 3 joins. Processing is still paused until the synchronization barrier at the group coordinator; afterwards only p2 changes owner.
Assigned partitions self-revoked
● Clean up state
Partition reassigned
● State cleanup and build
47. Avoid Processing Pause with CooperativeStickyAssignor
Diagram: Consumer 1 (p0,p2) and Consumer 2 (p1) are in the group when Consumer 3 joins. At the synchronization barrier of the 1st rebalance, the SyncGroupResponse revokes the p2 assignment while consumption of p0 and p1 continues. The resulting assignment is Consumer 1 (p0), Consumer 2 (p1), Consumer 3 (p2).
World does not stop!
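The difference between the two rebalance styles comes down to which partitions each member has to give up. The sketch below compares them on the example from the slides; it is a simplified model of the outcome, not the assignor implementation, and all names are hypothetical.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical comparison of what gets revoked under each rebalance style.
class RebalanceDiff {
    // Stop-the-world: every member revokes everything it currently owns.
    static Map<String, Set<String>> eagerRevocations(Map<String, Set<String>> before) {
        Map<String, Set<String>> revoked = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : before.entrySet()) {
            revoked.put(e.getKey(), new TreeSet<>(e.getValue()));
        }
        return revoked;
    }

    // Cooperative: a member revokes only partitions it no longer owns afterwards.
    static Map<String, Set<String>> cooperativeRevocations(
            Map<String, Set<String>> before, Map<String, Set<String>> after) {
        Map<String, Set<String>> revoked = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : before.entrySet()) {
            Set<String> lost = new TreeSet<>(e.getValue());
            lost.removeAll(after.getOrDefault(e.getKey(), Collections.<String>emptySet()));
            if (!lost.isEmpty()) {
                revoked.put(e.getKey(), lost);
            }
        }
        return revoked;
    }
}
```

For the slide's example, the eager variant revokes p0, p1, and p2, while the cooperative variant revokes only p2 from Consumer 1. In the Java client the assignor is selected through the consumer's partition.assignment.strategy setting.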
49. Avoid Rebalance with Static Group Membership
Diagram: Consumer 1 (group.id=1, group.instance.id=1), Consumer 2 (group.id=1, group.instance.id=2), and Consumer 3 (group.id=1, group.instance.id=3) each run a HeartbeatThread against the Group Coordinator; topic_a has partitions P0-P3 and topic_b has partitions P0-P1.
● Setting group.instance.id establishes static group membership
● Members do not send a LeaveGroup request when they are stopped
● No rebalance if the member rejoins prior to session.timeout.ms
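As a sketch of the configuration side, the consumer settings for a static member can be written as plain properties, together with the rule for when a restart still causes a rebalance. The session timeout value below is an arbitrary illustration, and the helper class is hypothetical.

```java
import java.util.Properties;

// Hypothetical sketch of static group membership settings.
class StaticMembership {
    static Properties consumerConfig(String instanceId) {
        Properties p = new Properties();
        p.setProperty("group.id", "1");
        p.setProperty("group.instance.id", instanceId); // must be unique per instance
        p.setProperty("session.timeout.ms", "45000");   // illustrative value only
        return p;
    }

    // A stopped static member triggers a rebalance only if it stays away too long.
    static boolean rebalanceTriggered(long downtimeMs, long sessionTimeoutMs) {
        return downtimeMs >= sessionTimeoutMs;
    }
}
```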
51. Why Are Transactions Needed?
Example: Alice pays Bob $10. The event "transfer $10 Alice → Bob" is written to the transfers topic. The Funds Transfer App (Consumer API and Producer API) processes it, and a Downstream App (Consumer API) reads the results from the balances topic.
1. Event is fetched by the consumer
2. Debit event is written: Alice, ($10)
3. Credit event is written: Bob, $10
4. Transfer event offset is committed
Diagram: the transfers topic holds A->B; the balances topic holds ($10) on P0 and $10 on P1; the committed offset 1 is stored on P7 of the __consumer_offsets topic, on the broker acting as Consumer Group Coordinator.
52. Kafka Transactions Deliver Exactly Once
Diagram: same funds transfer flow as on the previous slide, with steps 1-4 (fetch the transfer event, write Alice, ($10), write Bob, $10, commit the offset) grouped into one atomic transaction.
● The transaction is only committed if all parts succeed
● It is aborted if any part fails
Using transactions with Kafka Streams is quite simple:
1) Set processing.guarantee to exactly_once_v2 in StreamsConfig
2) Set isolation.level to read_committed in the Consumer configuration
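The two settings named above can be sketched as plain property objects. Only the property names and values come from the slide; the class and method names are hypothetical, and application IDs, bootstrap servers, and serdes are omitted.

```java
import java.util.Properties;

// Hypothetical holder for the two exactly-once related settings.
class ExactlyOnceSettings {
    // Settings for the Kafka Streams application that writes transactionally.
    static Properties streams() {
        Properties p = new Properties();
        p.setProperty("processing.guarantee", "exactly_once_v2");
        return p;
    }

    // Settings for a downstream consumer that should only see committed records.
    static Properties downstreamConsumer() {
        Properties p = new Properties();
        p.setProperty("isolation.level", "read_committed");
        return p;
    }
}
```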
54. System Failure Without Transactions
Diagram: same funds transfer flow. The balances topic ends up holding ($10)($10) on P0 and $10 on P1; the committed offset 1 is stored on P7 of the __consumer_offsets topic.
First application instance:
1. Event fetched by consumer
2. Alice’s account debited: Alice, ($10)
The application instance fails without committing the offset, and a new application instance starts.
New application instance:
1. Event fetched by consumer
2. Alice’s account is debited a second time: Alice, ($10)
3. Bob’s account is credited: Bob, $10
4. Consumer offset committed
5. Two debit events processed by downstream consumer
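The duplicate debit can be reproduced with a toy model of the consume-process-produce loop. This is an in-memory simulation under simplified assumptions, not Kafka client code; all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the funds transfer app without transactions.
class NonTransactionalReplay {
    final List<String> balancesTopic = new ArrayList<>();
    long committedOffset = 0; // offset of the next transfer event to process

    // Processes the transfer at committedOffset; a crash loses the offset commit.
    void processTransfer(boolean crashAfterDebit) {
        balancesTopic.add("Alice, ($10)"); // debit is written immediately
        if (crashAfterDebit) {
            return;                        // instance fails before credit and commit
        }
        balancesTopic.add("Bob, $10");     // credit is written
        committedOffset++;                 // transfer event offset is committed
    }

    public static void main(String[] args) {
        NonTransactionalReplay app = new NonTransactionalReplay();
        app.processTransfer(true);  // first instance crashes after the debit
        app.processTransfer(false); // new instance re-reads the same event
        System.out.println(app.balancesTopic);
    }
}
```

Because the first debit is already visible in the balances topic, the downstream consumer sees two debit events for one transfer.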
55. System Failure with Transactions
Diagram: same funds transfer flow, plus a broker acting as Transaction Coordinator that hosts the __transaction_state topic. The Transaction Coordinator coordinates the txn and persists txn metadata; its entry reads 'fund-tr'=>pid e0 P0. The Funds Transfer App is configured with transactional.id='fund-tr' and the Downstream App with isolation.level='read_committed'.
1. Requests txn ID and is returned a PID and txn epoch
2. Event fetched by consumer
3. Notifies coordinator of partition being written to
4. Alice’s account debited: Alice, ($10)
56. System Failure with Transactions
Diagram: same layout as on the previous slide. The debit ($10) on balances P0 is marked A (aborted), and the __transaction_state entry reads 'fund-tr'=>pid e0 P0 A pid e1.
First application instance (transactional.id='fund-tr'):
1. Requests txn ID and is returned a PID and txn epoch
2. Event fetched by consumer
3. Notifies coordinator of partition being written to
4. Alice’s account debited: Alice, ($10)
The application instance fails without committing the offset, and a new application instance starts.
New application instance (transactional.id='fund-tr'):
1. New instance requests txn ID
   a. Coordinator fences the previous instance by aborting the pending txn and bumping up the epoch
2. Downstream consumer with isolation.level='read_committed' discards aborted events
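The fencing step can be illustrated with a toy coordinator that tracks one epoch per transactional.id. This is a simplified in-memory model under stated assumptions (no PIDs, no persistence, no aborting of pending transactions), and the names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of epoch-based fencing for transactional producers.
class TxnCoordinatorSketch {
    private final Map<String, Integer> epochs = new HashMap<>();

    // Registering a transactional.id returns epoch 0 the first time
    // and a bumped epoch on every later registration.
    int register(String transactionalId) {
        return epochs.merge(transactionalId, 0, (current, unused) -> current + 1);
    }

    // Writes are accepted only from the instance holding the latest epoch.
    boolean accepts(String transactionalId, int producerEpoch) {
        Integer current = epochs.get(transactionalId);
        return current != null && current == producerEpoch;
    }
}
```

After the new instance registers 'fund-tr', the old instance still holds epoch 0 (e0) and is rejected, while the new one with epoch 1 (e1) is accepted.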
57. System with Successful Committed Transaction
Diagram: same layout. The debit ($10) on balances P0, the credit $10 on balances P1, and the committed offset 1 on __consumer_offsets P7 are all marked C (committed); the __transaction_state entry reads 'fund-tr'=>pid e0 P0 P1 P7 C. The Funds Transfer App uses transactional.id='fund-tr' and the Downstream App uses isolation.level='read_committed'.
1. Requests txn ID and is assigned a PID and epoch
2. Event fetched by consumer
3. Notifies coordinator of partition being written to
4. Alice’s account debited: Alice, ($10)
5. Bob’s account credited: Bob, $10
6. Consumer offset committed
7. Notifies coordinator that the transaction is complete
8. Coordinator writes commit markers to p0, p1, p7
9. Downstream consumer with read_committed processes committed events
58. Consuming Transactions with read_committed
● Leader maintains the last stable offset (LSO), the smallest offset of any open transaction
● Fetch response includes
  ○ only records up to LSO
  ○ metadata for skipping aborted records
Diagram: a partition of the balances topic on one broker, with A marking an abort marker and C a commit marker:
Offset  Producer  Content
57      pid 1     ($10)
58      pid 2     ($8)
60      pid 2     $8
61      pid 1     A
62      pid 1     ($7)
63      pid 2     C
64      pid 2     ($9)
65      pid 1     $7
66      pid 2     $9
67      pid 1     C
The diagram marks the LSO and the high watermark (HW). The fetch response covers the records below the LSO, which the consumer is able to read; offset 57 is discarded by the consumer because its transaction was aborted.
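The visibility rule can be worked through on the log from the diagram. The sketch below is a simplified model: it scans the whole log in memory, whereas a real broker and consumer rely on the LSO plus aborted-transaction metadata in the fetch response. The class and its constants are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified model of read_committed visibility; log rows are {offset, pid, type}.
class ReadCommitted {
    static final int DATA = 0, ABORT = 1, COMMIT = 2;

    static long[][] slideLog() {
        return new long[][] {
            {57, 1, DATA}, {58, 2, DATA}, {60, 2, DATA}, {61, 1, ABORT}, {62, 1, DATA},
            {63, 2, COMMIT}, {64, 2, DATA}, {65, 1, DATA}, {66, 2, DATA}, {67, 1, COMMIT}};
    }

    // LSO: smallest first offset of any still-open transaction, else the log end.
    static long lastStableOffset(long[][] log) {
        Map<Long, Long> firstOpen = new HashMap<>(); // pid -> first offset of its open txn
        for (long[] r : log) {
            if (r[2] == DATA) firstOpen.putIfAbsent(r[1], r[0]);
            else firstOpen.remove(r[1]);             // a marker closes the txn
        }
        long lso = log[log.length - 1][0] + 1;
        for (long first : firstOpen.values()) lso = Math.min(lso, first);
        return lso;
    }

    // Offsets of data records a read_committed consumer would process.
    static List<Long> visible(long[][] log) {
        long lso = lastStableOffset(log);
        Map<Long, List<Long>> open = new HashMap<>(); // pid -> offsets of its open txn
        List<Long> out = new ArrayList<>();
        for (long[] r : log) {
            if (r[2] == DATA) {
                open.computeIfAbsent(r[1], k -> new ArrayList<>()).add(r[0]);
            } else {
                List<Long> txn = open.remove(r[1]);
                if (txn != null && r[2] == COMMIT) out.addAll(txn); // aborted txns are dropped
            }
        }
        out.removeIf(o -> o >= lso);                  // nothing at or past the LSO is returned
        Collections.sort(out);
        return out;
    }
}
```

For the log above, this model puts the LSO at offset 64 (the first record of pid 2's still-open transaction) and returns offsets 58, 60, and 62; offset 57 is dropped because pid 1's first transaction was aborted.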
59. Interacting with External Systems
Atomic writes to Kafka and external systems are not supported.
● Instead, write the transactional output to a Kafka topic first
● Rely on idempotence to propagate the data from the output topic to the external system
Diagram: same committed funds transfer as before (transactional.id='fund-tr', __transaction_state entry 'fund-tr'=>pid e0 P0 P1 P7 C, with the records on balances P0 and P1 and the offset on __consumer_offsets P7 marked C). Kafka Connect then propagates the committed output to the External System.
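One common way to get the idempotence mentioned above is to key every write to the external system by the record's topic, partition, and offset, so a redelivered record overwrites itself instead of creating a duplicate. The following is an in-memory sketch of that idea, with a map standing in for the external system; it is an assumption about how such a sink could work, not a description of any specific connector.

```java
import java.util.HashMap;
import java.util.Map;

// Toy idempotent sink: a map stands in for the external system's table.
class IdempotentSink {
    private final Map<String, String> rows = new HashMap<>();

    // Upsert keyed by topic-partition-offset, so replays do not add rows.
    void write(String topic, int partition, long offset, String value) {
        rows.put(topic + "-" + partition + "-" + offset, value);
    }

    int size() {
        return rows.size();
    }
}
```

Writing the same balances record twice, for example after a connector restart, leaves exactly one row in the sink.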