Does your organization struggle with updating its Kafka Streams applications? Releasing a new version of a Kafka Streams application can be challenging, especially if its state has to be preserved between releases. Consider these best practices and architectural ideas to make your release process smoother.
Having experienced the accidental removal of changelog topics and the need to expand partitions, I can say that both are much easier to handle with some planning. With proper planning, you can achieve easier application upgrades.
Key takeaways from the session include:
* How to minimize the rebuilding of state stores.
* How to change stream topologies without affecting the existing state stores (see the sketch after this list).
* What you can do when you absolutely need to increase the number of partitions within your application.
* How to leverage schemas for application releases.
* Measures to prevent data corruption, especially if Kafka is not only your system of record but also your source of truth.
* Techniques to support rolling back an application.
* The advantages of splitting apart a Kafka Streams application into multiple applications.
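As a concrete illustration of the topology-evolution point above, here is a minimal sketch (topic and store names are hypothetical) of how explicitly naming state stores and repartition topics in the Kafka Streams DSL keeps internal names stable when the topology changes around them:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.state.KeyValueStore;

public class NamedTopology {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> orders = builder.stream("orders");

        orders
            // Naming the grouping keeps any repartition topic name stable
            // even if upstream operators are added or removed later.
            .groupByKey(Grouped.as("orders-by-key"))
            // Naming the store pins the changelog topic name
            // (<application.id>-order-counts-changelog), so a topology change
            // elsewhere does not force a state-store rebuild.
            .count(Materialized.<String, Long, KeyValueStore<Bytes, byte[]>>as("order-counts")
                    .withKeySerde(Serdes.String())
                    .withValueSerde(Serdes.Long()))
            .toStream()
            .to("order-counts-out", Produced.with(Serdes.String(), Serdes.Long()));

        // Print the topology so generated names can be reviewed before release.
        System.out.println(builder.build().describe());
    }
}
```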
Know Your Topics – A Deep Dive on Topic IDs with KIP-516 with Justine Olshan ... (Hosted by Confluent)
When Apache Kafka® was first created, topics were identified solely by topic name—but this isn't always sufficient. Find out in this talk why the Kafka community decided to add topic IDs to Kafka as a part of KIP-516. Learn which new features related to topic IDs have been rolled out, and learn about some of the benefits that are still on the way.
We'll be covering new features in Kafka versions 2.8, 3.0, and 3.1 and how to upgrade to using topic IDs. We'll see how topic IDs are used in KRaft mode and tiered storage, and take a tour through some of the internals and the thought processes around these changes—as well as some of the future plans for topic IDs.
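For a hands-on taste of what KIP-516 exposes, here is a small hedged sketch (broker address and topic name are placeholders; `allTopicNames()` and `topicId()` assume a recent Kafka clients version) that reads a topic's ID through the Java Admin client:

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.TopicDescription;

public class ShowTopicId {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (Admin admin = Admin.create(props)) {
            Map<String, TopicDescription> topics =
                    admin.describeTopics(List.of("orders")).allTopicNames().get();
            // topicId() returns the immutable Uuid the broker assigned, which
            // lets clients detect when a topic was deleted and recreated
            // under the same name (the recreated topic gets a new ID).
            topics.forEach((name, desc) ->
                    System.out.println(name + " -> " + desc.topicId()));
        }
    }
}
```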
Kafka Streams State Stores Being Persistent (Confluent)
Being Persistent: A Look Into Kafka Streams State Stores, Neil Buesing, Principal Solutions Architect, Rill Data
Meetup link: https://www.meetup.com/TwinCities-Apache-Kafka/events/284002062/
Dennis Wittekind, Confluent, Senior Customer Success Engineer
Perhaps you have heard of Kafka Connect and think it would be a great fit in your application's architecture, but you'd like to know how things work before you propose it to your team? Perhaps you know enough Connect to be dangerous, but you haven't had the time to really understand all the moving pieces? This meetup talk is for you! We'll briefly introduce Connect to the uninitiated, and then jump into underlying concepts and considerations you should make when running Connect in production! We'll even run a live demo! What could go wrong!?
https://www.meetup.com/Saint-Louis-Kafka-meetup-group/events/272687113/
A brief introduction to Apache Kafka, describing its usage as a platform for streaming data. It introduces some of the newer components of Kafka that help make this possible, including Kafka Connect, a framework for capturing continuous data streams, and Kafka Streams, a lightweight stream processing library.
ksqlDB is a stream processing SQL engine that allows stream processing on top of Apache Kafka. ksqlDB is based on Kafka Streams and provides capabilities for consuming messages from Kafka, analysing these messages in near real time with a SQL-like language, and producing results back to a Kafka topic. This way, not a single line of Java code has to be written, and you can reuse your SQL know-how. This lowers the bar for getting started with stream processing significantly.
ksqlDB offers powerful stream processing capabilities, such as joins, aggregations, time windows, and support for event time. In this talk I will present how ksqlDB integrates with the Kafka ecosystem and demonstrate how easy it is to implement a solution using ksqlDB for the most part. This will be done in a live demo on a fictitious IoT sample.
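As a flavor of how little code this takes, here is a hedged sketch using the ksqlDB Java client (host, port, and the `clicks` stream are assumptions) to create a windowed aggregate; all of the processing logic lives in the SQL string, not in Java:

```java
import io.confluent.ksql.api.client.Client;
import io.confluent.ksql.api.client.ClientOptions;

public class ClicksPerMinute {
    public static void main(String[] args) throws Exception {
        ClientOptions options = ClientOptions.create()
                .setHost("localhost")
                .setPort(8088); // default ksqlDB server port
        Client client = Client.create(options);

        // A tumbling-window aggregation: joins, windows, and event time
        // are all expressed in SQL rather than Java.
        String sql = "CREATE TABLE clicks_per_minute AS "
                + "SELECT user_id, COUNT(*) AS clicks "
                + "FROM clicks "
                + "WINDOW TUMBLING (SIZE 1 MINUTE) "
                + "GROUP BY user_id EMIT CHANGES;";
        client.executeStatement(sql).get();
        client.close();
    }
}
```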
Apache Kafka evolved from an enterprise messaging system into a fully distributed streaming data platform (Kafka Core + Kafka Connect + Kafka Streams) for building streaming data pipelines and streaming data applications.
This talk, which I gave at the Chicago Java Users Group (CJUG) on June 8th, 2017, focuses mainly on Kafka Streams, a lightweight open source Java library for building stream processing applications on top of Kafka, using Kafka topics as input/output.
You will learn more about the following:
1. Apache Kafka: a Streaming Data Platform
2. Overview of Kafka Streams: Before Kafka Streams? What is Kafka Streams? Why Kafka Streams? What are Kafka Streams' key concepts? Kafka Streams APIs and code examples (see the sketch after this list)
3. Writing, deploying and running your first Kafka Streams application
4. Code and Demo of an end-to-end Kafka-based Streaming Data Application
5. Where to go from here?
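For reference, a "first Kafka Streams application" in the spirit of item 3 usually looks something like this minimal word-count sketch (topic names and the application ID are placeholders):

```java
import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;

public class WordCountApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "wordcount-demo");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        builder.<String, String>stream("text-input")
               // Split each line into words, re-key by word, and count.
               .flatMapValues(line -> Arrays.asList(line.toLowerCase().split("\\W+")))
               .groupBy((key, word) -> word)
               .count(Materialized.as("word-counts"))
               .toStream()
               .to("word-counts-output", Produced.with(Serdes.String(), Serdes.Long()));

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```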
It covers a brief introduction to Apache Kafka Connect, giving insights into its benefits, use cases, and the motivation behind building Kafka Connect, along with a short discussion of its architecture.
ksqlDB: A Stream-Relational Database System (Confluent)
Speaker: Matthias J. Sax, Software Engineer, Confluent
ksqlDB is a distributed event streaming database system that allows users to express SQL queries over relational tables and event streams. The project was released by Confluent in 2017 and is hosted on GitHub and developed with an open-source spirit. ksqlDB is built on top of Apache Kafka®, a distributed event streaming platform. In this talk, we discuss ksqlDB’s architecture, which is influenced by Apache Kafka and its stream processing library, Kafka Streams. We explain how ksqlDB executes continuous queries while achieving fault tolerance and high availability. Furthermore, we explore ksqlDB’s streaming SQL dialect and the different types of supported queries.
Matthias J. Sax is a software engineer at Confluent working on ksqlDB. He mainly contributes to Kafka Streams, Apache Kafka's stream processing library, which serves as ksqlDB's execution engine. Furthermore, he helps evolve ksqlDB's "streaming SQL" language. In the past, Matthias also contributed to Apache Flink and Apache Storm and he is an Apache committer and PMC member. Matthias holds a Ph.D. from Humboldt University of Berlin, where he studied distributed data stream processing systems.
https://db.cs.cmu.edu/events/quarantine-db-talk-2020-confluent-ksqldb-a-stream-relational-database-system/
Getting up to speed with Kafka Connect: from the basics to the latest feature... (Hosted by Confluent)
"Kafka Connect is an ideal tool for building data pipelines. It is both reliable and scalable, with a pluggable interface that lets you flow data between Kafka and any system you need. A Connect pipeline is made up of many different components, and understanding how each of these interact together is essential, even for the simplest setup.
In this talk we will introduce the Connect components, from connectors, to transformations to the runtime itself. We will also share some of the new capabilities and best practices that you should be aware of to help you run and manage connectors effectively.
Finally we will talk about some different open source projects that have been built on top of Connect that can help you get the most out of the framework."
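To make the "runtime" part concrete: you interact with a distributed Connect cluster through its REST API. Below is a hedged sketch (Connect URL, connector name, file, and topic are placeholders) that registers the file source connector that ships with Kafka:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class CreateConnector {
    public static void main(String[] args) throws Exception {
        // Connector config as JSON; FileStreamSourceConnector is bundled with Kafka.
        String body = """
                {
                  "name": "demo-file-source",
                  "config": {
                    "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
                    "tasks.max": "1",
                    "file": "/tmp/demo.txt",
                    "topic": "demo-lines"
                  }
                }""";
        // POST to the standard Connect REST endpoint for creating connectors.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8083/connectors"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```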
Modern businesses have data at their core, and this data is changing continuously. How can we harness this torrent of information in real-time? The answer is stream processing, and the technology that has since become the core platform for streaming data is Apache Kafka. Among the thousands of companies that use Kafka to transform and reshape their industries are the likes of Netflix, Uber, PayPal, and AirBnB, but also established players such as Goldman Sachs, Cisco, and Oracle.
Unfortunately, today’s common architectures for real-time data processing at scale suffer from complexity: there are many technologies that need to be stitched and operated together, and each individual technology is often complex by itself. This has led to a strong discrepancy between how we, as engineers, would like to work vs. how we actually end up working in practice.
In this session we talk about how Apache Kafka helps you to radically simplify your data processing architectures. We cover how you can now build normal applications to serve your real-time processing needs — rather than building clusters or similar special-purpose infrastructure — and still benefit from properties such as high scalability, distributed computing, and fault-tolerance, which are typically associated exclusively with cluster technologies. Notably, we introduce Kafka’s Streams API, its abstractions for streams and tables, and its recently introduced Interactive Queries functionality. As we will see, Kafka makes such architectures equally viable for small, medium, and large scale use cases.
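Since the abstract highlights Interactive Queries, here is a minimal hedged sketch (the "word-counts" store name assumes a running topology that materialized it) of serving reads directly from the application's local state:

```java
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StoreQueryParameters;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

public class CountLookup {
    // Serve reads straight from the app's local state store: no external
    // database needed, which is the core Interactive Queries idea.
    static Long lookup(KafkaStreams streams, String word) {
        ReadOnlyKeyValueStore<String, Long> store = streams.store(
                StoreQueryParameters.fromNameAndType(
                        "word-counts", QueryableStoreTypes.keyValueStore()));
        return store.get(word); // null if the key is not held by this instance
    }
}
```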
Testing Kafka containers with Testcontainers: There and back again with Vikto... (Hosted by Confluent)
Did you ever wonder how your applications will behave once deployed to production?
Sure, you have unit tests, and your test coverage is sky-high.
However, you might depend on external resources like Apache Kafka® or Kafka Connect connectors, ksqlDB, etc.
Moreover, without proper integration testing, you cannot be confident about the stability of your production environment.
In this session, Viktor talks about Testcontainers, a library (initially created for the JVM, now available in many languages) that provides lightweight, disposable instances of shared databases, clusters, and anything else that can run in a Docker container!
After a rapid-fire introduction to the core concepts of the containers and how they can help improve integration testing, we’re going to zoom in on the supported out-of-the-box containers. You will learn how to test complex stacks like an Apache Kafka®-based streaming platform (or even Confluent Cloud) and other components.
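As a taste of the API, here is a hedged sketch (the image tag is an assumption) that spins up a disposable single-broker Kafka with Testcontainers' out-of-the-box `KafkaContainer` and produces one record against it:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;
import org.testcontainers.containers.KafkaContainer;
import org.testcontainers.utility.DockerImageName;

public class DisposableKafkaDemo {
    public static void main(String[] args) throws Exception {
        try (KafkaContainer kafka =
                 new KafkaContainer(DockerImageName.parse("confluentinc/cp-kafka:7.4.0"))) {
            kafka.start(); // boots a real broker in a throwaway Docker container

            Properties props = new Properties();
            props.put("bootstrap.servers", kafka.getBootstrapServers());
            props.put("key.serializer", StringSerializer.class.getName());
            props.put("value.serializer", StringSerializer.class.getName());

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                producer.send(new ProducerRecord<>("it-topic", "k", "hello")).get();
            }
        } // container (and broker) is disposed here
    }
}
```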
Introducing KRaft: Kafka Without Zookeeper With Colin McCabe | Current 2022 (Hosted by Confluent)
Apache Kafka without Zookeeper is now production ready! This talk is about how you can run without ZooKeeper, and why you should.
Kafka Tutorial - Introduction to Apache Kafka (Part 1) (Jean-Paul Azar)
Why is Kafka so fast? Why is Kafka so popular? Why Kafka? This slide deck is a tutorial for the Kafka streaming platform. It covers Kafka architecture with some small examples from the command line. Then we expand on this with a multi-server example to demonstrate failover of brokers as well as consumers. Then it goes through some simple Java client examples for a Kafka producer and a Kafka consumer. We have also expanded the Kafka design section and added references. The tutorial covers Avro and the Schema Registry as well as advanced Kafka producers.
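In the same spirit as the deck's client examples, here is a minimal hedged sketch (broker address and topic are placeholders) of a Java producer and consumer:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;

public class HelloKafka {
    public static void main(String[] args) {
        Properties pp = new Properties();
        pp.put("bootstrap.servers", "localhost:9092");
        pp.put("key.serializer", StringSerializer.class.getName());
        pp.put("value.serializer", StringSerializer.class.getName());
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(pp)) {
            // Asynchronous send; the callback reports where the record landed.
            producer.send(new ProducerRecord<>("demo-topic", "key-1", "hello kafka"),
                    (metadata, e) -> {
                        if (e == null) {
                            System.out.printf("wrote to %s-%d@%d%n",
                                    metadata.topic(), metadata.partition(), metadata.offset());
                        }
                    });
        }

        Properties cp = new Properties();
        cp.put("bootstrap.servers", "localhost:9092");
        cp.put("group.id", "demo-group");
        cp.put("auto.offset.reset", "earliest");
        cp.put("key.deserializer", StringDeserializer.class.getName());
        cp.put("value.deserializer", StringDeserializer.class.getName());
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(cp)) {
            consumer.subscribe(List.of("demo-topic"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
            records.forEach(r -> System.out.println(r.key() + " = " + r.value()));
        }
    }
}
```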
Deploying Kafka Streams Applications with Docker and Kubernetes (Confluent)
(Gwen Shapira + Matthias J. Sax, Confluent) Kafka Summit SF 2018
Kafka Streams, Apache Kafka’s stream processing library, allows developers to build sophisticated stateful stream processing applications which you can deploy in an environment of your choice. Kafka Streams is not only scalable, but fully elastic, allowing for dynamic scale-in and scale-out as the library handles state migration transparently in the background. By running Kafka Streams applications on Kubernetes, you will be able to use Kubernetes’ powerful control plane to standardize and simplify the application management—from deployment to dynamic scaling.
In this technical deep dive, we’ll explain the internals of dynamic scaling and state migration in Kafka Streams. We’ll then show, with a live demo, how a Kafka Streams application can run in a Docker container on Kubernetes and the dynamic scaling of an application running in Kubernetes.
Improving fault tolerance and scaling out in Kafka Streams with Bill Bejeck |... (Hosted by Confluent)
Kafka Streams is the popular stream processing component of Apache Kafka®. One of its best features is stateful operations. Kafka Streams works hard to ensure stateful operations can scale horizontally and survive failures, but doing so takes time. Kafka Streams offers the concept of "standby tasks," allowing for near-zero-downtime failover, but surprisingly this feature still isn't widely used. This could be for various reasons, from lack of awareness to needing additional resources.
This presentation will cover how standby tasks work and how they're enabled. Additionally, I'll cover the work done in KIP-441 that enables faster scaling out for stateful tasks and provides more balanced stateful assignments. I'll also dive into the consumer rebalance protocol improvements that enable KIP-441 to be effective.
Attendees of this presentation will walk away understanding how and when to use standby tasks, leverage the improvements from KIP-441, and have a deeper understanding of how Kafka Streams works with state.
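Enabling standby tasks is a one-line configuration, and the KIP-441 warm-up behavior is tunable alongside it. A hedged sketch (values are illustrative, not recommendations):

```java
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

public class StandbyConfig {
    public static Properties streamsProps() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "orders-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // Keep one hot replica of each state store on another instance,
        // enabling near-zero-downtime failover.
        props.put(StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG, 1);
        // KIP-441 knobs: how many extra "warm-up" replicas may restore at
        // once, and how much lag still counts as "caught up" for assignment.
        props.put(StreamsConfig.MAX_WARMUP_REPLICAS_CONFIG, 2);
        props.put(StreamsConfig.ACCEPTABLE_RECOVERY_LAG_CONFIG, 10_000L);
        return props;
    }
}
```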
Stream Processing with Apache Kafka and .NET (Confluent)
Presentation from South Bay.NET meetup on 3/30.
Speaker: Matt Howlett, Software Engineer at Confluent
Apache Kafka is a scalable streaming platform that forms a key part of the infrastructure at many companies including Uber, Netflix, Walmart, Airbnb, Goldman Sachs and LinkedIn. In this talk Matt will give a technical overview of Kafka, discuss some typical use cases (from surge pricing to fraud detection to web analytics) and show you how to use Kafka from within your C#/.NET applications.
Kafka Connect and Streams (Concepts, Architecture, Features) (Kai Wähner)
A high-level introduction to Kafka Connect and Kafka Streams, two components of the Apache Kafka open source framework. See the concepts, architecture, and features.
Schema Registry 101 with Bill Bejeck | Kafka Summit London 2022 (Hosted by Confluent)
If you were to ask any developer, "what's a schema and where is it used?", most likely you'd get an answer involving a relational database. The truth is, the domain objects used in applications represent a contract, an implied schema, whether developers choose to acknowledge them or not. But even if you recognize the need for a formal schema, what's the best way to manage them?
This presentation will contain some theory but primarily practical application of schemas with Schema Registry. I'll briefly explain what a schema is and why it's very relevant to any application working with Kafka today. Then it will get practical, introducing Schema Registry, describing how it works, and showing how developers can leverage it to provide schemas across an organization. The discussion will cover working with Schema Registry from the command line, how to leverage it with Kafka clients, and the supported serialization formats. Some established build tools that make life easier for the Kafka developer will also be covered.
Attendees will walk away with knowledge of Schema Registry and a solid understanding of how it works and how to integrate it into Kafka clients. They'll also learn enough about the supported serialization frameworks to start implementing schemas right away in their Kafka development efforts.
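To ground the client-integration point, here is a hedged sketch (URLs and topic are placeholders) of a producer wired to Schema Registry via Confluent's Avro serializer, so every record it writes is checked against a registered schema:

```java
import java.util.Properties;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class AvroProducerDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        // Confluent's serializer registers/fetches schemas automatically.
        props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
        props.put("schema.registry.url", "http://localhost:8081");

        Schema schema = new Schema.Parser().parse(
                "{\"type\":\"record\",\"name\":\"User\",\"fields\":"
                + "[{\"name\":\"name\",\"type\":\"string\"}]}");
        GenericRecord user = new GenericData.Record(schema);
        user.put("name", "alice");

        try (KafkaProducer<String, Object> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("users", "user-1", user));
        }
    }
}
```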
Let’s Make Your CFO Happy; A Practical Guide for Kafka Cost Reduction with El... (Hosted by Confluent)
According to Gartner forecasts, worldwide end-user spending on public cloud services is expected to grow by 23% in 2021, to a total of $332B.
Kafka is no different in that matter. Organizations all over the world are using Kafka as their main stream-processing platform for collecting, processing, and analyzing data at scale. As organizations evolve and grow, data rates grow too, as does the consequent Kafka deployment cost.
So what can we do? -- In this talk, we are going to address exactly this problem.
We will understand what we are paying for when running a self-hosted Kafka deployment, where we can cut costs, how to develop an economic mindset, and what we can proactively do to reduce our cloud infrastructure cost.
Temporal-Joins in Kafka Streams and ksqlDB | Matthias Sax, Confluent (Hosted by Confluent)
Joins in Kafka Streams and ksqlDB are a killer feature for data processing, and basic join semantics are well understood. However, in a streaming world records are associated with timestamps that impact the semantics of joins: welcome to the fabulous world of _temporal_ join semantics. For joins, timestamps are as important as the actual data, and it is important to understand how they impact the join result.
In this talk we want to deep dive on the different types of joins, with a focus on their temporal aspect. Furthermore, we relate the individual join operators to the overall "time engine" of the Kafka Streams query runtime and explain its relationship to operator semantics. To allow developers to apply their knowledge of temporal join semantics, we provide best practices, tips and tricks to "bend" time, and configuration advice to get the desired join results. Last, we give an overview of recent, and an outlook to future, developments that improve joins even further.
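To make the temporal aspect tangible, here is a hedged sketch (topics are placeholders; it assumes a recent Kafka Streams version) of a stream-stream join in the DSL. The `JoinWindows` argument is exactly where time enters the join semantics: only records whose timestamps lie within five minutes of each other are joined.

```java
import java.time.Duration;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.JoinWindows;
import org.apache.kafka.streams.kstream.KStream;

public class OrderPaymentJoin {
    public static StreamsBuilder build() {
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> orders = builder.stream("orders");
        KStream<String, String> payments = builder.stream("payments");

        orders.join(payments,
                (order, payment) -> order + " paid-by " + payment,
                // Temporal condition: |orderTs - paymentTs| <= 5 minutes.
                JoinWindows.ofTimeDifferenceWithNoGrace(Duration.ofMinutes(5)))
              .to("paid-orders");

        return builder;
    }
}
```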
A Thorough Comparison of Delta Lake, Iceberg and Hudi (Databricks)
Recently, a set of modern table formats such as Delta Lake, Hudi, and Iceberg has sprung up. Along with Hive Metastore, these table formats are trying to solve long-standing problems in traditional data lakes with declared features like ACID transactions, schema evolution, upsert, time travel, and incremental consumption.
Watch this talk here: https://www.confluent.io/online-talks/from-zero-to-hero-with-kafka-connect-on-demand
Integrating Apache Kafka® with other systems in a reliable and scalable way is often a key part of a streaming platform. Fortunately, Apache Kafka includes the Connect API that enables streaming integration both in and out of Kafka. Like any technology, understanding its architecture and deployment patterns is key to successful use, as is knowing where to go looking when things aren't working.
This talk will discuss the key design concepts within Apache Kafka Connect and the pros and cons of standalone vs distributed deployment modes. We'll do a live demo of building pipelines with Apache Kafka Connect for streaming data in from databases, and out to targets including Elasticsearch. With some gremlins along the way, we'll go hands-on in methodically diagnosing and resolving common issues encountered with Apache Kafka Connect. The talk will finish off by discussing more advanced topics including Single Message Transforms, and deployment of Apache Kafka Connect in containers.
Automate Your Kafka Cluster with Kubernetes Custom Resources (Confluent)
(Sam Obeid, Shopify) Kafka Summit SF 2018
At Shopify we manage multiple Apache Kafka clusters in multiple locations in Google’s cloud platform. We deploy our Kafka clusters as Kubernetes StatefulSets, and we use other K8s workloads to implement different tasks. Automating critical and repetitive operational tasks is one of our top priorities.
In this talk we’ll discuss how we leveraged Kubernetes Custom Resources and Controllers to automate key cluster operational tasks, to detect cluster configuration changes, and to react to these changes with the required actions. We will go through actual examples we implemented at Shopify, how we solved the problem of cluster discovery, and how we automated topic creation across different clusters with zero human intervention and safety controls.
Bringing Kafka Without Zookeeper Into Production with Colin McCabe | Kafka Su... (Hosted by Confluent)
Over the last few years, we have been working on removing the dependency on ZooKeeper from Apache Kafka®. Instead of using an external system to store metadata, Kafka can now manage its own metadata. This new mode of operation is called Kafka Raft mode, or "KRaft" for short. It has many performance and scalability benefits.
This talk will discuss our efforts to get KRaft mode production-ready. We will talk about the old and new architectures, and how we adapted features to work in both worlds. We will also talk about our experiences with testing and deploying the new software. Finally, we'll talk about what's planned for the future.
Producer Performance Tuning for Apache Kafka (Jiangjie Qin)
Kafka is well known for high throughput ingestion. However, to get the best latency characteristics without compromising on throughput and durability, we need to tune Kafka. In this talk, we share our experiences to achieve the optimal combination of latency, throughput and durability for different scenarios.
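The usual tuning levers here are producer configs that trade latency against throughput and durability. A hedged sketch with illustrative (not prescriptive) values:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;

public class TunedProducerConfig {
    public static Properties props() {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // Throughput: wait up to 20 ms to build bigger, better-compressed batches
        // (at the cost of a little latency).
        props.put(ProducerConfig.LINGER_MS_CONFIG, 20);
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, 128 * 1024);
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");
        // Durability: require acknowledgment from all in-sync replicas.
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        // Pipelining raises throughput; idempotence keeps ordering safe
        // even with retries and multiple in-flight requests.
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);
        props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, 5);
        return props;
    }
}
```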
Sf big analytics_2018_04_18: Evolution of GoPro's data platform (Chester Chen)
Talk 1: Evolution of GoPro's data platform
In this talk, we will share GoPro's experiences in building a data analytics cluster in the cloud. We will discuss:
* Evolution of the data platform from fixed-size Hadoop clusters to a cloud-based Spark cluster with a centralized Hive Metastore + S3: cost benefits and DevOps impact
* A configurable, Spark-based batch ingestion/ETL framework
* Migration of the streaming framework to the cloud + S3
* Analytics metrics delivery with Slack integration
* BedRock: data platform management, visualization & self-service portal
* Visualizing machine learning features via Google Facets + Spark
Speakers: Chester Chen
Chester Chen is the Head of Data Science & Engineering at GoPro. Previously, he was the Director of Engineering at Alpine Data Labs.
David Winters
David is an Architect on the Data Science and Engineering team at GoPro and the creator of their Spark-Kafka data ingestion pipeline. Previously, he worked at Apple and Splice Machine.
Hao Zou
Hao is a senior big data engineer on the Data Science and Engineering team. Previously, he worked at Alpine Data Labs and Pivotal.
[db tech showcase Tokyo 2018] #dbts2018 #B31 『1,2,3 and Done! 3 easy ways to migrate to the cloud!』 (Insight Technology, Inc.)
Francisco Munoz Alvarez, Director of Innovation, Data Intensity
Access Data from XPages with the Relational Controls (Teamstudio)
Did you know that Domino and XPages allow for easy access to relational data? These exciting capabilities in the Extension Library can greatly enhance your applications and allow access to information beyond Domino. Howard and Paul will discuss what you need to get started, which controls allow access to relational data, and the new @Functions available for incorporating relational data into your Server Side JavaScript programming.
DataStage online training is offered at Glory IT Technologies. We have certified working professionals for this module who have trained many students globally. We also provide corporate training and job/project support services for DataStage.
An AMIS Overview of Oracle database 12c (12.1) (Marco Gralike)
Presentation used by Lucas Jellema and Marco Gralike during the AMIS Oracle Database 12c Launch event on Monday the 15th of July 2013 (many thanks to Tom Kyte, Oracle, for allowing us to use some of his material).
Oracle DataGuard Online Training in USA | India (Xoom Trainings)
Xoom Trainings provides Oracle DataGuard online training, with a complete tutorial delivered by professionals with 10 years of experience worldwide.
For an online training demo, please follow the link below:
https://www.youtube.com/watch?v=2zXZPh4agwE
For more information, please follow the link below:
http://www.xoomtrainings.com/course/oracle-dataguard
For general queries, email us at sales@xoomtrainings.com or call +1-610-686-8077.
On Monday evening, 15 July, AMIS organized the seminar ‘Oracle database 12c revealed’. This evening offered AMIS Oracle professionals the first opportunity to see the new features of Oracle database 12c in action! The AMIS specialists, who had carried out more than a year of beta testing, showed what is new and how we will be putting it to use in the coming years!
This presentation was given that evening as a plenary session!
Agile Oracle to PostgreSQL migrations (PGConf.EU 2013) (Gabriele Bartolini)
Migrating an Oracle database to Postgres is never an automated operation, and it rarely (never?) involves just the database. Experience led us to develop an agile methodology for the migration process, covering schema migration, data import, migration of procedures and queries, up to the generation of unit tests for QA.
Pitfalls, technologies, and the main migration opportunities will be outlined, focusing on reducing the total cost of ownership and management of a database solution in the medium-to-long term (without compromising quality and business continuity requirements).
Level Up Your Integration Testing With Testcontainers (VMware Tanzu)
Traditional approaches to integration testing—using shared, local, or in-memory databases—fall short for today's modern developer.
Developers today are building cloud native distributed microservices and taking advantage of a rich variety of backing services. This explosion of applications and backing services introduces new challenges in creating the necessary environments for integration testing. To be useful and effective, these environments must be easy to create and must resemble production as closely as possible. New solutions are needed to make this a reality.
Enter Testcontainers!
Testcontainers is a Java library that supports JUnit tests and makes it incredibly easy to create lightweight, throwaway instances of common databases, Selenium web browsers, or anything else that can run in a Docker container.
In this talk, you will learn when and how to use Testcontainers. We will cover the fundamentals and walk through a step-by-step example using a Spring Boot application that we build from scratch. As a bonus, we'll highlight some new features in Spring Boot 3.0 along the way!
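A minimal hedged sketch of the JUnit 5 integration (the image tag and the connectivity check are illustrative): the `@Container` field is started before the tests run and thrown away afterwards:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import org.junit.jupiter.api.Test;
import org.testcontainers.containers.PostgreSQLContainer;
import org.testcontainers.junit.jupiter.Container;
import org.testcontainers.junit.jupiter.Testcontainers;
import static org.junit.jupiter.api.Assertions.assertTrue;

@Testcontainers
class PostgresIntegrationTest {

    // Started once for this test class, disposed automatically afterwards.
    @Container
    static final PostgreSQLContainer<?> postgres =
            new PostgreSQLContainer<>("postgres:15-alpine");

    @Test
    void databaseAcceptsConnections() throws Exception {
        try (Connection conn = DriverManager.getConnection(
                postgres.getJdbcUrl(), postgres.getUsername(), postgres.getPassword())) {
            assertTrue(conn.isValid(2)); // real database, not an in-memory stand-in
        }
    }
}
```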
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa... (Hosted by Confluent)
"In this talk, attendees will be provided with an introduction to Kafka Connect and the basics of Single Message Transforms (SMTs) and how they can be used to transform data streams in a simple and efficient way. SMTs are a powerful feature of Kafka Connect that allow custom logic to be applied to individual messages as they pass through the data pipeline. The session will explain how SMTs work, the types of transformations they can be used for, and how they can be applied in a modular and composable way.
Further, the session will discuss where SMTs fit in with Kafka Connect and when they should be used. Examples will be provided of how SMTs can be used to solve common data integration challenges, such as data enrichment, filtering, and restructuring. Attendees will also learn about the limitations of SMTs and when it might be more appropriate to use other tools or frameworks.
Additionally, an overview of the alternatives to SMTs, such as Kafka Streams and KSQL, will be provided. This will help attendees make an informed decision about which approach is best for their specific use case.
Whether attendees are developers, data engineers, or data scientists, this talk will provide valuable insights into how Kafka Connect and SMTs can help streamline data processing workflows. Attendees will come away with a better understanding of how these tools work and how they can be used to solve common data integration challenges.
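As a concrete flavor of that modular, composable style, here is a hedged sketch of an SMT chain in a connector configuration (the connector class is hypothetical; the two transforms, `InsertField` and `MaskField`, ship with Kafka Connect and expect structured record values):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SmtChainConfig {
    public static Map<String, String> connectorConfig() {
        Map<String, String> config = new LinkedHashMap<>();
        // Hypothetical source connector producing structured records.
        config.put("connector.class", "com.example.JdbcSourceConnector");
        config.put("topic.prefix", "users-");
        // Apply two bundled SMTs in order: enrich first, then mask.
        config.put("transforms", "addSource,maskSsn");
        config.put("transforms.addSource.type",
                "org.apache.kafka.connect.transforms.InsertField$Value");
        config.put("transforms.addSource.static.field", "source");
        config.put("transforms.addSource.static.value", "crm-db");
        config.put("transforms.maskSsn.type",
                "org.apache.kafka.connect.transforms.MaskField$Value");
        config.put("transforms.maskSsn.fields", "ssn");
        return config;
    }
}
```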
"While Apache Kafka lacks native support for topic renaming, there are scenarios where renaming topics becomes necessary. This presentation will delve into the utilization of MirrorMaker 2.0 as a solution for renaming Kafka topics. It will illustrate how MirrorMaker 2.0 can efficiently facilitate the migration of messages from the old topic to the new one and how Kafka Connect Metrics can be employed to monitor the mirroring progress. The discussion will encompass the complexity of renaming Kafka topics, addressing certain limitations, and exploring potential workarounds when using MirrorMaker 2.0 for this purpose. Despite not being originally designed for topic renaming, MirrorMaker 2.0 has a suitable solution for renaming Kafka topics.
Blog post: https://engineering.hellofresh.com/renaming-a-kafka-topic-d6ff3aaf3f03
Evolution of NRT Data Ingestion Pipeline at Trendyol (Hosted by Confluent)
"Trendyol, Turkey's leading e-commerce company, is committed to positively impacting the lives of millions of customers. Our decision-making processes are entirely driven by data. As a data warehouse team, our primary goal is to provide accurate and up-to-date data, enabling the extraction of valuable business insights.
We utilize the benefits provided by Kafka and Kafka Connect to facilitate the transfer of data from the source to our analytical environment. We recently transitioned our Kafka Connect clusters from on-premise VMs to Kubernetes. This shift was driven by our desire to effectively manage rapid growth (marked by a growing number of producers, consumers, and daily messages) while ensuring proper monitoring and consistency. Consistency is crucial, especially where we employ Single Message Transforms to manipulate records, for example filtering them based on their keys or converting a JSON object into a JSON string.
Monitoring our clusters' health is key, and we achieve this through Grafana dashboards and alerts generated through kube-state-metrics. Additionally, Kafka Connect's JMX metrics, coupled with New Relic, are employed for comprehensive monitoring.
The session will aim to explain our approach to NRT data ingestion, outlining the role of Kafka and Kafka Connect, our transition journey to K8s, and the methods employed to monitor the health of our clusters.
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques (Hosted by Confluent)
"Join our lightning talk to delve into the strategies vital for maintaining a resilient Kafka service.
While proactive monitoring is key for issue prevention, failures will still occur. Rapid detection tools will enable you to identify and resolve problems before they impact end-users. This session explores the techniques employed by Kafka cloud providers for this detection, many of which are also applicable if you are managing independent Kafka clusters or applications.
The talk focuses on health-checking, a powerful tool that encompasses an application and its monitoring to validate Kafka environment availability. The session navigates through Kafka health-check methods, sharing best practices, identifying common pitfalls, and highlighting the monitoring of critical performance metrics like throughput and latency for early issue detection.
Attendees will gain valuable insights into the art of health-checking their Kafka environment, equipping them with the tools to identify and address issues before they escalate into critical problems. We invite all Kafka enthusiasts to join us in this talk to foster a deeper understanding of Kafka health-checking and ensure the continued smooth operation of your Kafka environment.
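One common health-checking pattern in this vein is a synthetic produce/consume round trip. A hedged sketch (the probe topic is assumed to exist, be small, and have short retention; addresses are placeholders):

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;

public class KafkaHealthCheck {
    public static boolean probe(String bootstrap, Duration timeout) throws Exception {
        String token = "probe-" + System.nanoTime(); // unique per probe

        Properties cp = new Properties();
        cp.put("bootstrap.servers", bootstrap);
        cp.put("group.id", "healthcheck-" + token); // fresh group each time
        cp.put("auto.offset.reset", "earliest");
        cp.put("key.deserializer", StringDeserializer.class.getName());
        cp.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(cp)) {
            consumer.subscribe(List.of("healthcheck"));

            Properties pp = new Properties();
            pp.put("bootstrap.servers", bootstrap);
            pp.put("key.serializer", StringSerializer.class.getName());
            pp.put("value.serializer", StringSerializer.class.getName());
            long start = System.currentTimeMillis();
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(pp)) {
                producer.send(new ProducerRecord<>("healthcheck", token, token)).get();
            }

            // End-to-end path is healthy only if we read our own token back in time.
            long deadline = start + timeout.toMillis();
            while (System.currentTimeMillis() < deadline) {
                for (var record : consumer.poll(Duration.ofMillis(200))) {
                    if (token.equals(record.value())) {
                        System.out.println("round trip: "
                                + (System.currentTimeMillis() - start) + " ms");
                        return true;
                    }
                }
            }
            return false; // probe timed out: raise an alert
        }
    }
}
```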
Exactly-once Stream Processing with Arroyo and Kafka (Hosted by Confluent)
"Stream processing systems traditionally gave their users the choice between at least once processing and at most once processing: accepting duplicate data or missing data. But ideally we would provide exactly-once processing, where every event in the input data is represented exactly once in the output.
Kafka provides a transaction API that enables exactly-once processing when using Kafka as your source and sink. But this API has turned out not to be well suited for use by high-level streaming systems, requiring various workarounds to still provide transactional processing.
In this talk, I’ll cover how the transaction API works, how systems like Arroyo and Flink have used it to build exactly-once support, and how improvements to the transactional API will enable better end-to-end support for consistent stream processing.
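For reference, the transaction API in question looks like this in the Java client; a hedged consume-transform-produce sketch (topics and IDs are placeholders):

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.KafkaException;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;

public class ExactlyOncePipeline {
    public static void main(String[] args) {
        Properties cp = new Properties();
        cp.put("bootstrap.servers", "localhost:9092");
        cp.put("group.id", "eos-demo");
        cp.put("enable.auto.commit", "false");
        cp.put("isolation.level", "read_committed"); // only see committed txns
        cp.put("key.deserializer", StringDeserializer.class.getName());
        cp.put("value.deserializer", StringDeserializer.class.getName());

        Properties pp = new Properties();
        pp.put("bootstrap.servers", "localhost:9092");
        pp.put("transactional.id", "eos-demo-producer-1"); // stable per instance
        pp.put("key.serializer", StringSerializer.class.getName());
        pp.put("value.serializer", StringSerializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(cp);
             KafkaProducer<String, String> producer = new KafkaProducer<>(pp)) {
            producer.initTransactions();
            consumer.subscribe(List.of("input"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                if (records.isEmpty()) continue;
                try {
                    producer.beginTransaction();
                    Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
                    for (ConsumerRecord<String, String> r : records) {
                        producer.send(new ProducerRecord<>("output",
                                r.key(), r.value().toUpperCase()));
                        offsets.put(new TopicPartition(r.topic(), r.partition()),
                                new OffsetAndMetadata(r.offset() + 1));
                    }
                    // Commit consumed offsets atomically with the produced records.
                    producer.sendOffsetsToTransaction(offsets, consumer.groupMetadata());
                    producer.commitTransaction();
                } catch (KafkaException e) {
                    producer.abortTransaction(); // output and offsets roll back together
                }
            }
        }
    }
}
```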
"In this talk, we will explore the exciting world of IoT and computer vision by presenting a unique project: Fish Plays Pokemon. Using an ESP Eye camera connected to an ESP32 and other IoT devices, to monitor fish's movements in an aquarium.
This project showcases the power of IoT and computer vision, demonstrating how even a fish can play a popular video game. We will discuss the challenges we faced during development, including real-time processing, IoT device integration, and Kafka message consumption.
By the end of the talk, attendees will have a better understanding of how to combine IoT, computer vision, and serverless cloud services to create innovative projects. They will also learn how to integrate IoT devices with Kafka to simulate keyboard behavior, opening up endless possibilities for real-time interactions between the physical and digital worlds.
What is tiered storage, and what is it good for? After this session you will know how to leverage the tiered storage feature to enable longer retention than the storage attached to brokers allows. You will get acquainted with the different configuration options and know what to expect when you enable the feature, such as when the first upload to the remote object storage will take place.
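To illustrate the configuration surface, here is a hedged sketch (topic name and retention values are placeholders; it assumes a broker with tiered storage enabled, i.e. Kafka 3.6+ with `remote.log.storage.system.enable=true` and a remote storage plugin configured) that turns on remote storage for one topic via the Admin client:

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class EnableTieredStorage {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        try (Admin admin = Admin.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "events");
            List<AlterConfigOp> ops = List.of(
                    // Offload closed log segments to the remote object store.
                    new AlterConfigOp(new ConfigEntry("remote.storage.enable", "true"),
                            AlterConfigOp.OpType.SET),
                    // Keep only ~1 hour on broker-local disks...
                    new AlterConfigOp(new ConfigEntry("local.retention.ms", "3600000"),
                            AlterConfigOp.OpType.SET),
                    // ...while total retention (local + remote) is 30 days.
                    new AlterConfigOp(new ConfigEntry("retention.ms", "2592000000"),
                            AlterConfigOp.OpType.SET));
            admin.incrementalAlterConfigs(Map.of(topic, ops)).all().get();
        }
    }
}
```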
Building a Self-Service Stream Processing Portal: How And Why (Hosted by Confluent)
"Real-time 24/7 monitoring and verification of massive data is challenging – even more so for the world’s second largest manufacturer of memory chips and semiconductors. Tolerance levels are incredibly small, any small defect needs to be identified and dealt with immediately. The goal of semiconductor manufacturing is to improve yield and minimize unnecessary work.
However, even with real-time data collection, the data was not easy to manipulate by users and it took many days to enable stream processing requests – limiting its usefulness and value to the business.
You’ll hear why SK hynix switched to Confluent and how we developed a self-service stream process portal on top of it. Now users have an easy-to-use service to manipulate the data they want.
Results have been impressive, stream processing requests are available the same day – previously taking 5 days! We were also able to drive down costs by 10% as stream processing requests no longer require additional hardware.
What you’ll take away from our talk:
- What were the pain points in the previous environment
- How we transitioned to Confluent without service downtime
- Creating a self-service stream processing portal built on top of Connect and ksqlDB
- Use case of the stream processing portal
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ... (Hosted by Confluent)
"Discover how default configurations might impact ingestion times, especially when dealing with large files. We'll explore a real-world scenario with a 20,000,000+ line file, assessing metrics and exploring the bottleneck in the default setup. Understand the intricacies of batch size calculations and how to optimize them based on your unique data characteristics.
Walk away with actionable insights as we showcase a practical example, turning a 7-hour ingestion process into a mere 30 minutes for over 30,000,000 records in a Kafka topic. Uncover metrics, configurations, and best practices to elevate the performance of your Kafka Connect CSV source connectors. Don't miss this opportunity to optimize your data pipeline and ensure smooth, efficient data flow.
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ... (Hosted by Confluent)
"In order to meet the current and ever-increasing demand for near-zero RPO/RTO systems, a focus on resiliency is critical. While Kafka offers built-in resiliency features, a perfect blend of client and cluster resiliency is necessary in order to achieve a highly resilient Kafka client application.
At Fidelity Investments, Kafka is used for a variety of event streaming needs such as core brokerage trading platforms, log aggregation, communication platforms, and data migrations. In this lightning talk, we will discuss the governance framework that has enabled producers and consumers to achieve their SLAs during unprecedented failure scenarios. We will highlight how we automated resiliency tests through chaos engineering and tightly integrated observability dashboards for Kafka clients to analyze and optimize client configurations. And finally, we will summarize the chaos test suite and the "test, test and test" mantra that are helping Fidelity Investments reach its goal of a future with zero downtime.
Navigating Private Network Connectivity Options for Kafka ClustersHostedbyConfluent
"There are various strategies for securely connecting to Kafka clusters between different networks or over the public internet. Many cloud providers even offer endpoints that privately route traffic between networks and are not exposed to the internet. But, depending on your network setup and how you are running Kafka, these options ... might not be an option!
In this session, we’ll discuss how you can use SSH bastions or a self-managed PrivateLink endpoint to establish connectivity to your Kafka clusters without exposing brokers directly to the internet. We explain the required network configuration, and show how we at Materialize have contributed to librdkafka to simplify these scenarios and avoid fragile workarounds."
Apache Flink: Building a Company-wide Self-service Streaming Data PlatformHostedbyConfluent
"In my talk, we will examine all the stages of building our self-service Streaming Data Platform based on Apache Flink and Kafka Connect, from the selection of a solution for stateful streaming data processing, right up to the successful design of a robust self-service platform, covering the challenges that we’ve met.
I will share our experience in providing non-Java developers with a company-wide self-service solution, which allows them to quickly and easily develop their streaming data pipelines.
Additionally, I will highlight specific business use cases that would not have been implemented without our platform."
Explaining How Real-Time GenAI Works in a Noisy PubHostedbyConfluent
"Almost everyone has heard about large language models, and tens of millions of people have tried out OpenAI ChatGPT and Google Bard. However, the intricate architecture and underlying mathematics driving these remarkable systems remain elusive to many.
LLMs are fascinating - so let's grab a drink and dive deep into how these systems are built. In the length of time it takes to enjoy a round of drinks, you'll understand the inner workings of these models. We'll take our first sip of word vectors, enjoy the refreshing taste of the transformer, and drain a glass understanding how these models are trained on phenomenally large quantities of data.
Large language models for your streaming application - explained with a little maths and a lot of pub stories"
"Monitoring is a fundamental operation when running Kafka and Kafka applications in production. There are numerous metrics available when using Kafka, however the sheer number is overwhelming, making it challenging to know where to start and how to properly utilise them.
This session will introduce you to some of the key metrics that should be monitored and best practices in fine-tuning your monitoring. We will delve into which metrics are the key indicators of a cluster’s availability and performance, and which are the most helpful when debugging client applications."
Kafka Streams relies on state restoration to maintain standby tasks as a failure-recovery mechanism and to restore state after rebalances. When you scale your application instances up or down, you need to know the current state of the restoration process for each active and standby task in order to keep restoration as short as possible. During this presentation, you will get an understanding of how KIP-869 provides valuable information about active task restoration after a rebalance, and how KIP-988 opens a window into the continuous process of standby restoration. Whenever you need to decide whether to scale your application instances up or down, both KIPs will be an invaluable ally.
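As a rough illustration of the visibility these listeners give you, a minimal sketch using the long-standing global restore listener for active task restoration (KIP-988's standby listener is analogous; the plain stdout logging is an assumption made for brevity):

import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.streams.processor.StateRestoreListener;

public class LoggingRestoreListener implements StateRestoreListener {
  @Override
  public void onRestoreStart(TopicPartition tp, String store, long startOffset, long endOffset) {
    System.out.printf("restore start %s %s: %d -> %d%n", store, tp, startOffset, endOffset);
  }
  @Override
  public void onBatchRestored(TopicPartition tp, String store, long batchEndOffset, long numRestored) {
    System.out.printf("restored %d records into %s %s%n", numRestored, store, tp);
  }
  @Override
  public void onRestoreEnd(TopicPartition tp, String store, long totalRestored) {
    System.out.printf("restore done %s %s: %d records total%n", store, tp, totalRestored);
  }
}
// register before start(): streams.setGlobalStateRestoreListener(new LoggingRestoreListener());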
Mastering Kafka Producer Configs: A Guide to Optimizing PerformanceHostedbyConfluent
"In this talk, we will dive into the world of Kafka producer configs and explore how to understand and optimize them for better performance. We will cover the different types of configs, their impact on performance, and how to tune them to achieve the best results. Whether you're new to Kafka or a seasoned pro, this session will provide valuable insights and practical tips for improving your Kafka producer performance.
- Introduction to Kafka producer internals and workflow
- Understanding configs like linger.ms, batch.size, and buffer.memory and their impact on performance
- Learning about configs like max.block.ms, delivery.timeout.ms, request.timeout.ms, and retries that make the producer more resilient
- Discussing configs like enable.idempotence, max.in.flight.requests.per.connection, and transaction-related configs to achieve delivery guarantees
- Q&A session with attendees to address specific questions and concerns."
Data Contracts Management: Schema Registry and BeyondHostedbyConfluent
"Data contracts are one of the hottest topics in the data management community. A data contract is a formal agreement between a data producer and its consumers, aimed at reducing data downtime and improving data quality. Schemas are an important part of data contracts, but they are not the only relevant element.
In this talk, we’ll:
1. see why data contracts are so important but also difficult to implement;
2. identify the characteristics of a well-designed data contract: the anatomy of a data contract, its main elements, and how to formally describe them;
3. show how to manage the lifecycle of a data contract leveraging Confluent Platform's services."
"In the realm of stateful stream processing, Apache Flink has emerged as a powerful and versatile platform. However, the conventional SQL-based approach often limits the full potential of Flink applications.
We will delve into the benefits of adopting a code-first approach, which provides developers with greater control over application logic, facilitates complex transformations, and enables more efficient handling of state and time. We will also discuss how the code-first approach can lead to more maintainable and testable code, ultimately improving the overall quality of your Flink applications.
Whether you're a seasoned Flink developer or just starting your journey, this talk will provide valuable insights into how a code-first approach can revolutionize your stream processing applications."
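To make "code-first" concrete, a tiny hedged DataStream sketch (the event strings and the transformation logic are invented for illustration):

import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CodeFirstSketch {
  public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.fromElements("click:home", "click:cart", "view:home")
       .map(event -> event.split(":")[0]).returns(Types.STRING)  // extract the event type
       .keyBy(type -> type, Types.STRING)                        // explicit control over keying
       .map(type -> "seen " + type).returns(Types.STRING)        // arbitrary per-key logic, no SQL constraints
       .print();
    env.execute("code-first-sketch");
  }
}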
Debezium vs. the World: An Overview of the CDC EcosystemHostedbyConfluent
"Change Data Capture (CDC) has become a commodity in data engineering, much in part due to the ever-rising success of Debezium [1]. But is that all there is? In this lightning talk, we’ll outline the current state of the CDC ecosystem, and understand why adopting a Debezium alternative is still a hard sell. If you’ve ever wondered what else is out there, but can’t keep up with the sprawling of new tools in the ecosystem; we’ll wrap it up for you!
[1] https://debezium.io/"
Beyond Tiered Storage: Serverless Kafka with No Local DisksHostedbyConfluent
"Separation of compute and storage has become the de-facto standard in the data industry for batch processing.
The addition of tiered storage to open source Apache Kafka is the first step in bringing true separation of compute and storage to the streaming world.
In this talk, we'll discuss in technical detail how to take the concept of tiered storage to its logical extreme by building an Apache Kafka protocol compatible system that has zero local disks.
Eliminating all local disks in the system requires not only separating storage from compute, but also separating data from metadata. This is a monumental task that requires reimagining Kafka's architecture from the ground up, but the benefits are worth it.
This approach enables a stateless, elastic, and serverless deployment model that minimizes operational overhead and also drives inter-zone networking costs to almost zero."
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
- UI automation introduction
- UI automation sample
- Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Let's dive deeper into the world of ODC! Ricardo Alves (OutSystems) will join us to tell all about the new Data Fabric. After that, Sezen de Bruijn (OutSystems) will get into the details on how to best design a sturdy architecture within ODC.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Connector Corner: Automate dynamic content and events by pushing a buttonDianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, companies that fail to adapt and embrace new ideas often struggle to keep up with the competition. Fostering a culture of innovation, however, takes real work: it requires vision, leadership, and a willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud, and open source: exploring how these areas are likely to mature and develop over the short and long term, and then considering how organisations can position themselves to adapt and thrive.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
JMeter webinar - integration with InfluxDB and GrafanaRTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
Developing Kafka Streams Applications with Upgradability in Mind with Neil Buesing | Kafka Summit London 2022
1. Designing your Kafka Streams Applications with Upgradability In Mind
Kafka Summit 2022 London
Neil Buesing, Rill Data
@nbuesing nbuesing
2. Background
• Principal Solution Architect, Rill Data, Inc.
• Work with clients streaming data into our platform
• 5+ years experience with Kafka Streams
• Speak on topics I'm passionate about with Apache Kafka and Kafka Streams
• Working from home with the best pair-programmer
3. Goals
1. Confidence you can upgrade your application
2. Support for Data Recovery
• e.g., data corrupted due to bug in upgrade
3. Options
• e.g., responsibility
4. Reduce Developer time to achieve upgrade
4. Topics
1. Name processors
2. Name state stores
3. Minimize rebuilding of state
4. Data evolution
5. Partitioning
6. Microservices
7. Backup & Restore
8. Repartitioning
9. Windowed Stores
10. Circuit Breakers
11. Switches
7. Name Your Processors
• Syntax: add naming to existing configuration; Named is added to those w/out
• Produced.as(), Grouped.as(), Joined.as(), Consumed.as(), Named.as()
• Gotchas - builders & static construction behavior
• Produced.with(Serdes.String(), vSerde).as("name")
• Produced.as("name").withKeySerde(Serdes.String()).withValueSerde(vSerde)
• Produced.<String,PurchaseOrder>as("name")
.withKeySerde(Serdes.String())
.withValueSerde(vSerde)
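A minimal sketch pulling these together so node names stay stable when operators are added later (the topic names, serdes, and logic are placeholders):

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Named;
import org.apache.kafka.streams.kstream.Produced;

StreamsBuilder builder = new StreamsBuilder();
builder.stream("orders",
        Consumed.<String, String>as("consume-orders")
                .withKeySerde(Serdes.String()).withValueSerde(Serdes.String()))
       .filter((k, v) -> v != null, Named.as("filter-non-null-orders"))
       .mapValues(v -> v.toUpperCase(), Named.as("normalize-order"))
       .to("orders-normalized",
        Produced.<String, String>as("produce-normalized")
                .withKeySerde(Serdes.String()).withValueSerde(Serdes.String()));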
13. Name Your State Stores
• The most important thing you can do to make upgrades easier
• Simple
KTable<String, User> users =
builder.table(options.getUserTopic(),
Consumed.as("ktable-users"),
Materialized.as("user-table"));
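The payoff shows up later: with a stable store name, the derived <application.id>-user-table-changelog topic and interactive queries keep working across releases. A hedged sketch, assuming a running KafkaStreams instance named streams and the User type from above:

import org.apache.kafka.streams.StoreQueryParameters;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

// query the named store; the name survives topology changes around it
ReadOnlyKeyValueStore<String, User> store =
    streams.store(StoreQueryParameters.fromNameAndType(
        "user-table", QueryableStoreTypes.keyValueStore()));
User user = store.get("user-42");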
15. Topology
• Print out topology on application start
final Topology topology = streamsBuilder(options).build(p);
log.info("Topology:\n" + topology.describe());
• Visualize with
• https://zz85.github.io/kafka-streams-viz/
21. Data Evolution - JSON
import java.util.LinkedHashMap;
import java.util.Map;
import com.fasterxml.jackson.annotation.JsonAnyGetter;
import com.fasterxml.jackson.annotation.JsonAnySetter;

public class UnmappedProperties {
private final Map<String, Object> map = new LinkedHashMap<>();
@JsonAnyGetter
public Map<String, Object> getUnknownProperties() {
return map;
}
@JsonAnySetter
public void setUnknownProperty(String key, Object value) {
map.put(key, value);
}
}
22. Data Evolution - JSON
import com.fasterxml.jackson.annotation.JsonInclude;
import com.fasterxml.jackson.annotation.JsonUnwrapped;

@JsonInclude(JsonInclude.Include.NON_NULL)
public class Product {
private String sku;
@JsonUnwrapped
private UnmappedProperties unmappedProperties = new UnmappedProperties();
public Product(String sku) {
this.sku = sku;
}
}
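A self-contained round-trip of the same pattern (the Item class and its fields are hypothetical; public field access is used for brevity):

import java.util.LinkedHashMap;
import java.util.Map;
import com.fasterxml.jackson.annotation.JsonAnyGetter;
import com.fasterxml.jackson.annotation.JsonAnySetter;
import com.fasterxml.jackson.databind.ObjectMapper;

public class Item {
  public String sku;
  private final Map<String, Object> unknown = new LinkedHashMap<>();

  @JsonAnyGetter
  public Map<String, Object> any() { return unknown; }

  @JsonAnySetter
  public void set(String key, Object value) { unknown.put(key, value); }

  public static void main(String[] args) throws Exception {
    ObjectMapper mapper = new ObjectMapper();
    // "color" was added by a newer producer; this version doesn't model it
    Item item = mapper.readValue("{\"sku\":\"abc-123\",\"color\":\"blue\"}", Item.class);
    System.out.println(mapper.writeValueAsString(item)); // {"sku":"abc-123","color":"blue"}
  }
}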
23. Data Evolution - JSON
• Risk/Pitfall
• Data type changes can break this approach
• Validate 3rd-party inputs
• Implement a clearUnknownProperties()
24. Data Evolution - Avro
• Evolution…
• part of Avro's library
• leveraged by Confluent's Schema Registry
25. Data Evolution - Avro
• FULL
• ability to roll back
• streams apps are producers and consumers (forward and backward are harder)
• V1 ⟷ V2 and V2 ⟷ V3
• FULL-TRANSITIVE
• Ability to handle aggregations of older versions indefinitely
• V1 ⟷ V2 and V2 ⟷ V3 and V1 ⟷ V3
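To ground what a FULL-compatible change looks like, the canonical safe evolution is adding a field with a default; a hedged example (schemas invented for illustration):

V1:
{"type": "record", "name": "User", "fields": [
  {"name": "id", "type": "string"}
]}

V2 (old readers ignore the new field; new readers fill in the default when reading V1 data):
{"type": "record", "name": "User", "fields": [
  {"name": "id", "type": "string"},
  {"name": "nickname", "type": ["null", "string"], "default": null}
]}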
26. Data Evolution - Protobuf
• Tag numbers are encoded, field names are not
• optional ⟷ repeated
• no encoding differences: writing a repeated value and reading it as an optional value has "last one wins"
• Renaming fields ⟶ full evolution
• Renumbering tags ⟶ no evolution
27. Avoid Schema Registry Serialization for Keys
• A simple addition of a default attribute — breaks partitioning (the schema ID is embedded in the serialized key bytes, so the bytes, and thus the partition hash, change)
• Exceptions
• output topics for sink connectors (e.g. JDBC Sink)
28. Data Evolution (takeaways)
• Full (Forward and Backward) - easier to roll back your applications
• Full Transitive - easier to handle old data in your aggregates
• JSON, Avro, and Protobuf all have their own nuances - understand them
32. Partitioning
• Plan for growth (but…)
• Strive for even workloads
• Partitioning for storage is as important as (if not more so than) partitioning for throughput
• Selecting a partition count for your Streams applications
• 12 partitions better than 10 (divisors 1,2,3,4,6,12 vs. 1,2,5,10)
• avoid primes, e.g., 5 (divisors 1,5)
• 24 (divisors 1,2,3,4,6,8,12,24; but at what cost?)
33. Partitioning
• If repartitioning is easy
• 4 partitions
• If repartitioning is hard
• 8 or 12 partitions
• 24 partitions (large state stores)
• consider separation into multiple microservices
34. Validate partitioning on ingestion
• Peek - Log and Exception
builder.stream("input-topic")
.peek((key, value) -> {… key != value.getKey() …})
• Filter - Log and Ignore
builder.stream("input-topic")
.filter((key, value) -> {… key != value.getKey() …})
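Fleshing out the "log and ignore" variant (Order, its getKey() accessor, and the slf4j log field are assumptions standing in for a value type that carries its own key):

import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Named;

StreamsBuilder builder = new StreamsBuilder();
builder.<String, Order>stream("input-topic")
    .filter((key, value) -> {
        boolean ok = key != null && key.equals(value.getKey());
        if (!ok) {
            // mispartitioned record: surface it, then drop it
            log.warn("record key {} does not match payload key {}; dropping", key, value.getKey());
        }
        return ok;
    }, Named.as("validate-partition-key"));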
36. Micro Services
• easier to deploy
• more uniform allocation of work
• minimize downtime during restarts
• easier to understand
• threading
• storage
43. Backup and Restore
• transformValues cannot be created before aggregate/reduce since DSL requires store to be materialized first.
• aggregate and reduce do not have access to headers
• if the DSL adopts the updated PAPI refactoring, it would then be able to.
• understand how store caching and the commit interval work
44. Backup and Restore
• a set of -changelog topics is not an Event Source based system.
45. co-partitioning
• partitioning of source and restore topics must match
• co-partitioning validation isn't catching this.
• behavior very confusing when they are not the same (speaking from experience 🤦)
49. Repartitioning
• Leverage Built-in Backup and Restore
• On/Off filters so you can discard while bringing the application online
• Version your application
• "foo.v1" ➟ "foo.v2"
51. Repartitioning
• Considerations around making restore a separate application
• Downtime
• Cut-over
• Using `application.id` for backup
• Keeping the code up to date
52. Window Stores
Type     | Boundary | Examples                                            | # records for key @ point in time | Fixed Size
Tumbling | Epoch    | [8:00, 8:30) [8:30, 9:00)                           | single                  | Yes
Hopping  | Epoch    | [8:00, 8:30) [8:15, 8:45) [8:30, 8:45) [8:45, 9:00) | constant                | Yes
Sliding  | Record   | [8:02, 8:32] [8:20, 8:50] [8:21, 8:51]              | variable                | Yes
Session  | Record   | [8:02, 8:02] [8:02, 8:10] [9:10, 12:56]             | single (by tombstoning) | No
53. Window Stores
• Fixed Windows do NOT store window size (or end timestamp) in the message
• Release new version and co-exist with old version
• Wait to use new version until windows are "ready"
55. Window Stores
• New Version Challenges
• Very long windows make it harder to wait for cut-over
• epoch
• hydration
• replay incoming events
• How ("When") to have clients cut over to new version
• earliest, latest, or specific timestamp
• circuit breaker — moves burden to streams development team.
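For the "specific timestamp" option, a hedged sketch of a downstream consumer seeking to the cut-over point (the consumer instance and cutoverEpochMillis are assumptions):

import java.util.Map;
import java.util.stream.Collectors;
import org.apache.kafka.clients.consumer.OffsetAndTimestamp;
import org.apache.kafka.common.TopicPartition;

// assumptions: an assigned KafkaConsumer<?, ?> consumer and a long cutoverEpochMillis
Map<TopicPartition, Long> query = consumer.assignment().stream()
    .collect(Collectors.toMap(tp -> tp, tp -> cutoverEpochMillis));
Map<TopicPartition, OffsetAndTimestamp> offsets = consumer.offsetsForTimes(query);
offsets.forEach((tp, ot) -> {
    if (ot != null) consumer.seek(tp, ot.offset()); // null: no record at/after the timestamp
});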
56. application.id & versions
• Versions should be a suffix on application.id
• ".v1", ".v2"
• Leverage ACLs with prefix on application.id
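A minimal sketch of the suffix convention and the matching prefixed ACL (the application name is a placeholder):

import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

Properties props = new Properties();
// internal topics and the consumer group derive from this id,
// so ".v2" builds its state alongside (not on top of) ".v1"
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "order-enrichment.v2");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
// grant on the shared prefix (KIP-290 prefixed resource patterns), e.g.:
// kafka-acls --add --allow-principal User:streams --operation All \
//   --topic order-enrichment. --resource-pattern-type prefixed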
58. Circuit Breakers
• Starting and Stopping the Circuit Breaker application controls flow of messages
• Unable to stop producers
• Complicated streams application
• in-flight data needs to be handled by same version
• no duplicate processing between version releases
59. Circuit Breakers
• Added Complexity
• Extra Application
• Extra Topic
• but can have smaller retention time (original is source-of-truth)
• Extra Deployments
60. Circuit Breaker handy for ksqlDB
• Placing a Kafka Streams circuit-breaker application gives control in front of ksqlDB where consumer group selection is not possible
• KSQL query starts from latest
• KLIP-28 "create or replace" solves many issues (0.12.0)
• KLIP-22 "add consumer group id" (proposal - no traction)
62. Switches
• Burden on our deployment, not down-stream applications
• no offset management changes
63. Circuit Breakers & Switches
• Do not adopt these w/out need
• Add-in only if (and when) needed
64. Topics
1. Name processors
2. Name state stores
3. Minimize rebuilding of state
4. Data evolution
5. Partitioning
6. Microservices
7. Backup & Restore
8. Repartitioning
9. Windowed Stores
10. Circuit Breakers
11. Switches
65. Takeaways
• Do Right Away
• Name your State Stores
• Name your Processors
• Meaningful Partition Size
• Suffix based versioning
• Start Planning
• Backup/Restore & Repartitioning
• External Applications & Teams
• Release Scheduling
• Data Evolution Strategy