Dave Klein, Confluent, Developer Advocate
Apache Kafka is the core of an amazing ecosystem of tools and frameworks that enable us to get more value from our data. In this session we'll have a gentle introduction to Apache Kafka and a survey of some of the more popular components in the Kafka ecosystem.
https://www.meetup.com/KafkaBayArea/events/276592389/
Alex Woolford, Confluent, Senior Solutions Engineer
This slide deck was presented, in non-linear order, during the following talk:
Real-time recommendations with Snowplow, Kafka, and Neo4j
Abstract: The value of the information gleaned from a web session often has a very short half-life. Post-hoc analysis is often too late to be actionable. In this short talk, we’ll show you how to generate customer profiles, and build a real-time recommendation engine with clickstream events sourced from Snowplow.
https://www.meetup.com/Saint-Louis-Kafka-meetup-group/events/275915770/
Using Location Data to Showcase Keys, Windows, and Joins in Kafka Streams DSL... (confluent)
This document provides an overview of using location data to showcase keys, windows, and joins in Kafka Streams DSL and KSQL. It describes several algorithms for finding the nearest airport or aircraft to a given location within a 5-minute window using Kafka Streams. The algorithms involve bucketing location data, calculating distances, aggregating counts, and suppressing updates to optimize processing. The document cautions that windows are not magic and getting the window logic wrong can lead to incorrect results.
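The core of the nearest-airport computation described above can be sketched in a few lines: a great-circle (haversine) distance plus bucketing events into 5-minute tumbling windows. This is an illustrative plain-Python model of the idea, not the talk's actual Kafka Streams code; the airport codes and coordinates are made up.

```python
from math import radians, sin, cos, asin, sqrt

WINDOW_MS = 5 * 60 * 1000  # 5-minute tumbling window

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 \
        + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

def nearest_per_window(events, airports):
    """events: iterable of (timestamp_ms, lat, lon); airports: {code: (lat, lon)}.
    Returns {window_start_ms: (code, distance_km)} holding, per tumbling
    window, the airport nearest to the latest event seen in that window."""
    result = {}
    for ts, lat, lon in events:
        window = ts - ts % WINDOW_MS  # bucket the event into its window
        code, (alat, alon) = min(
            airports.items(),
            key=lambda kv: haversine_km(lat, lon, kv[1][0], kv[1][1]))
        result[window] = (code, haversine_km(lat, lon, alat, alon))
    return result
```

In the real Kafka Streams version, the window bucketing and the "keep the latest result" step are handled by windowed aggregation and suppression rather than a plain dict.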
Build a Bridge to Cloud with Apache Kafka® for Data Analytics Cloud Services (confluent)
Build a Bridge to Cloud with Apache Kafka® for Data Analytics Cloud Services, Perry Krol, Head of Systems Engineering, CEMEA, Confluent
https://www.meetup.com/Frankfurt-Apache-Kafka-Meetup-by-Confluent/events/269751169/
Apache Kafka and ksqlDB in Action: Let's Build a Streaming Data Pipeline! (Ro...) (confluent)
Have you ever thought that you needed to be a programmer to do stream processing and build streaming data pipelines? Think again! Apache Kafka is a distributed, scalable, and fault-tolerant streaming platform, providing low-latency pub-sub messaging coupled with native storage and stream processing capabilities. Integrating Kafka with RDBMS, NoSQL, and object stores is simple with Kafka Connect, which is part of Apache Kafka. ksqlDB is the source-available SQL streaming engine for Apache Kafka and makes it possible to build stream processing applications at scale, written using a familiar SQL interface.
In this talk, we’ll explain the architectural reasoning for Apache Kafka and the benefits of real-time integration, and we’ll build a streaming data pipeline using nothing but our bare hands, Kafka Connect, and ksqlDB.
Gasp as we filter events in real-time! Be amazed at how we can enrich streams of data with data from RDBMS! Be astonished at the power of streaming aggregates for anomaly detection!
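The three steps the talk demonstrates (filter a stream, enrich it against reference data, aggregate) can be modelled in plain Python to show the data flow; in ksqlDB each step would be one SQL statement. The orders, customers, and the 100-unit threshold below are invented for illustration.

```python
from collections import Counter

orders = [  # stream of events, e.g. read from a Kafka topic
    {"order_id": 1, "customer_id": "c1", "amount": 250.0},
    {"order_id": 2, "customer_id": "c2", "amount": 12.5},
    {"order_id": 3, "customer_id": "c1", "amount": 900.0},
]
customers = {"c1": "EMEA", "c2": "APAC"}  # lookup table, e.g. ingested from an RDBMS via Kafka Connect

# 1. Filter: keep only large orders
big_orders = (o for o in orders if o["amount"] > 100)

# 2. Enrich: a stream-table join against the customers table
enriched = ({**o, "region": customers[o["customer_id"]]} for o in big_orders)

# 3. Aggregate: count of large orders per region
counts = Counter(o["region"] for o in enriched)
print(counts)  # Counter({'EMEA': 2})
```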
ApacheCon2019 Talk: Improving the Observability of Cassandra, Kafka and Kuber... (Paul Brebner)
As distributed applications grow more complex, dynamic, and massively scalable, “observability” becomes more critical. Observability is the practice of using metrics, monitoring and distributed tracing to understand how a system works. In this presentation we’ll explore two complementary Open Source technologies: Prometheus for monitoring application metrics; and OpenTracing and Jaeger for distributed tracing. We’ll discover how they improve the observability of a massively scalable Anomaly Detection system - an application which is built around Apache Cassandra and Apache Kafka for the data layers, and dynamically deployed and scaled on Kubernetes, a container orchestration technology. We will give an overview of Prometheus and OpenTracing/Jaeger, explain how the application is instrumented, and describe how Prometheus and OpenTracing are deployed and configured in a production environment running Kubernetes, to dynamically monitor the application at scale. We conclude by exploring the benefits of monitoring and tracing technologies for understanding, debugging and tuning complex dynamic distributed systems built on Kafka, Cassandra and Kubernetes, and introduce a new use case to enable Cassandra Elastic Autoscaling, by combining Prometheus alerts, Instaclustr’s Provisioning API for Dynamic Resizing, and the new Prometheus monitoring API.
Apache Kafka and KSQL in Action: Let's Build a Streaming Data Pipeline! (confluent)
This document provides an overview and introduction to Apache Kafka and KSQL for building streaming data pipelines. It discusses how Kafka is an event streaming platform that can be used for messaging, streaming data, and stream processing. It then introduces KSQL, which is a streaming SQL engine for Apache Kafka that allows users to perform stream processing by writing SQL-like queries against Kafka topics. The document uses diagrams and examples to illustrate how to build a streaming data pipeline using Kafka Connect to ingest data, Kafka to store and transport streams, and KSQL to perform stream processing, enrichment, and analytics.
KSQL – An Open Source Streaming Engine for Apache Kafka (Kai Wähner)
The rapidly expanding world of stream processing can be daunting, with new concepts such as various types of time semantics, windowed aggregates, changelogs, and programming frameworks to master. KSQL is an open-source, Apache 2.0 licensed streaming SQL engine on top of Apache Kafka which aims to simplify all this and make stream processing available to everyone. The project is managed and open sourced by Confluent.
KSQL makes it easy to read, write, and process streaming data in real-time, at scale, using SQL-like semantics. It offers an easy way to express stream processing logic as an alternative to writing an application in a programming language such as Java, Python or Go. Benefits of using KSQL include: No coding required; no additional analytics cluster needed; streams and tables as first-class constructs; access to the rich Kafka ecosystem.
This session introduces the concepts and architecture of KSQL. Use cases such as Streaming ETL, Real Time Stream Monitoring or Anomaly Detection are discussed. A live demo shows how to set up and use KSQL quickly and easily on top of your Kafka ecosystem.
Viktor Gamov, Confluent, Developer Advocate
Apache Kafka is an open source distributed streaming platform that allows you to build applications and process events as they occur. Viktor Gamov (Developer Advocate at Confluent) walks through how it works and important underlying concepts. As a real-time, scalable, and durable system, Kafka can be used for fault-tolerant storage as well as for other use cases, such as stream processing, centralized data management, metrics, log aggregation, event sourcing, and more.
This talk will explain what a streaming platform such as Apache Kafka is and some of the use cases and design patterns around its use—including several examples of where it is solving real business problems.
https://www.meetup.com/Chennai-Kafka/events/269942117/
Choose Right Stream Storage: Amazon Kinesis Data Streams vs MSK (Sungmin Kim)
This presentation compares Amazon Kinesis Data Streams to Managed Streaming for Kafka (MSK) in both architectural perspective and operational perspective. In addition, it shows common architectural patterns: (1) Data Hub: Event-Bus, (2) Log Aggregation, (3) IoT, (4) Event sourcing and CQRS.
Crossing the Streams: Rethinking Stream Processing with Kafka Streams and KSQL (confluent)
The document discusses stream processing with Apache Kafka and KSQL. It begins with an overview of Kafka Streams and KSQL, describing them as Java APIs and a SQL-like language for building real-time applications. It then highlights several key features of Kafka Streams, including its ability to elastically scale applications across instances and deploy them anywhere, as well as its support for exactly-once processing, event-time processing, and powerful stream operations. The document concludes by discussing how KSQL lowers the barrier to entry for stream processing and enables different users like developers and analysts to interact with streaming data.
On Track with Apache Kafka®: Building a Streaming ETL Solution with Rail Data (confluent)
Watch this talk here: https://www.confluent.io/online-talks/building-a-streaming-etl-solution-with-apache-kafka-rail-data-on-demand
As data engineers, we frequently need to build scalable systems working with data from a variety of sources and with various ingest rates, sizes, and formats. This talk takes an in-depth look at how Apache Kafka can be used to provide a common platform on which to build data infrastructure driving both real-time analytics as well as event-driven applications.
Using a public feed of railway data it will show how to ingest data from message queues such as ActiveMQ with Kafka Connect, as well as from static sources such as S3 and REST endpoints. We'll then see how to use stream processing to transform the data into a form useful for streaming to analytics in tools such as Elasticsearch and Neo4j. The same data will be used to drive a real-time notifications service through Telegram.
If you're wondering how to build your next scalable data platform, how to reconcile the impedance mismatch between stream and batch, and how to wrangle streams of data—this talk is for you!
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...) (confluent)
Tinder’s Quickfire Pipeline powers all things data at Tinder. It was originally built using AWS Kinesis Firehoses and has since been extended to use both Kafka and other event buses. It is the core of Tinder’s data infrastructure. This rich data flow of both client and backend data has been extended to service a variety of needs at Tinder, including Experimentation, ML, CRM, and Observability, allowing backend developers easier access to shared client side data. We perform this using many systems, including Kafka, Spark, Flink, Kubernetes, and Prometheus. Many of Tinder’s systems were natively designed in an RPC first architecture.
Things we’ll discuss about decoupling your system at scale via event-driven architectures include:
– Powering ML, backend, observability, and analytical applications at scale, including an end to end walk through of our processes that allow non-programmers to write and deploy event-driven data flows.
– Showing, end to end, the usage of dynamic event processing that creates other stream processes, via a dynamic control plane topology pattern and a broadcasted state pattern
– How to manage the unavailability of cached data that would normally come from repeated API calls for data that’s being backfilled into Kafka, all online! (and why this is not necessarily a “good” idea)
– Integrating common OSS frameworks and libraries like Kafka Streams, Flink, Spark and friends to encourage the best design patterns for developers coming from traditional service oriented architectures, including pitfalls and lessons learned along the way.
– Why and how to avoid overloading microservices with excessive RPC calls from event-driven streaming systems
– Best practices in common data flow patterns, such as shared state via RocksDB + Kafka Streams as well as the complementary tools in the Apache Ecosystem.
– The simplicity and power of streaming SQL with microservices
Introduction to KSQL: Streaming SQL for Apache Kafka® (confluent)
Join Tom Green, Solution Engineer at Confluent for this Lunch and Learn talk covering KSQL. Confluent KSQL is the streaming SQL engine that enables real-time data processing against Apache Kafka®. It provides an easy-to-use, yet powerful interactive SQL interface for stream processing on Kafka, without the need to write code in a programming language such as Java or Python. KSQL is scalable, elastic, fault-tolerant, and it supports a wide range of streaming operations, including data filtering, transformations, aggregations, joins, windowing, and sessionization.
By attending one of these sessions, you will learn:
-How to query streams, using SQL, without writing code
-How KSQL provides automated scalability and out-of-the-box high availability for streaming queries
-How KSQL can be used to join streams of data from different sources
-The differences between Streams and Tables in Apache Kafka
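The stream/table distinction in the last point can be sketched in a few lines of plain Python (illustrative only, not the Kafka API): a stream records every event, while a table holds only the latest value per key, and replaying the stream reconstructs the table.

```python
# Illustrative sketch: a stream is the full sequence of key/value events.
stream = [
    ("alice", "London"),
    ("bob", "Paris"),
    ("alice", "Berlin"),  # alice moves: a new event, not an update in place
]

# Replaying the stream and keeping the last value per key yields the table view.
table = {}
for key, value in stream:
    table[key] = value

assert len(stream) == 3  # the stream keeps all three events
assert table == {"alice": "Berlin", "bob": "Paris"}  # the table keeps current state
```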
Secure Kafka at scale in true multi-tenant environment (Vishnu Balusu & Asho...) (confluent)
Application teams in JPMC have started shifting towards building event-driven architectures and real-time streaming pipelines, and Kafka has been at the core of this journey. As application teams have rapidly adopted Kafka, the need for a centrally managed Kafka as a service has emerged. We started delivering Kafka as a service in early 2018 and have been running it in production for more than a year now, operating 80+ clusters (and growing) across all environments. One of the key requirements is to provide a truly segregated, secured multi-tenant environment with an RBAC model while satisfying financial regulations and controls at the same time. Operating clusters at large scale requires scalable self-service capabilities and cluster management orchestration. In this talk we will present: - Our experiences in delivering and operating secured, multi-tenant and resilient Kafka clusters at scale. - Internals of our service framework/control plane, which enables self-service capabilities for application teams, plus cluster build/patch orchestration and capacity management capabilities for TSE/admin teams. - Our approach to enabling automated cross-datacenter failover for application teams using the service framework and Confluent Replicator.
Getting Started with Confluent Schema Registry (confluent)
Getting started with Confluent Schema Registry, Patrick Druley, Senior Solutions Engineer, Confluent
Meetup link: https://www.meetup.com/Cleveland-Kafka/events/272787313/
Building Stream Processing Applications with Apache Kafka Using KSQL (Robin M...) (confluent)
Robin is a Developer Advocate at Confluent, the company founded by the creators of Apache Kafka, as well as an Oracle Groundbreaker Ambassador. His career has always involved data, from the old worlds of COBOL and DB2, through the worlds of Oracle and Hadoop, and into the current world with Kafka. His particular interests are analytics, systems architecture, performance testing and optimization. He blogs at http://cnfl.io/rmoff and http://rmoff.net/ and can be found tweeting grumpy geek thoughts as @rmoff. Outside of work he enjoys drinking good beer and eating fried breakfasts, although generally not at the same time.
Building a real-time data processing pipeline using Apache Kafka, Kafka Conne... (Paul Brebner)
With the rapid onset of the global Covid-19 Pandemic from the start of this year the USA Centers for Disease Control and Prevention (CDC) had to quickly implement a new Covid-19 specific pipeline to collect testing data from all of the USA’s states and territories, and carry out other critical steps including integration, cleaning, checking, enrichment, analysis, and enforcing data governance and privacy etc. The pipeline then produces multiple consumable results for federal and public agencies. They did this in under 30 days, using Apache Kafka. In this presentation we'll build a similar (but simpler) pipeline for ingesting, integrating, indexing, searching/analysing and visualising some publicly available tidal data. We'll briefly introduce each technology and component, and walk through the steps of using Apache Kafka, Kafka Connect, Elasticsearch and Kibana to build the pipeline and visualise the results.
Deep Dive Series #3: Schema Validation + Structured Audit Logs (confluent)
Another new security-related feature in Confluent Platform 5.4 is Structured Audit Logs. Of course, everything in Kafka is a log, but Kafka does not log what Kafka does with Kafka - only what is written to its topics.
In the third part of the Deep Dive Sessions, alongside Structured Audit Logs we also discuss an evolution of the already familiar Schema Registry: Schema Validation operates at the topic level and ensures that every single message produced to a given topic is validated against the Schema Registry. We explain more in our Deep Dive #3.
Real-time processing of large amounts of data (confluent)
This document discusses real-time processing of large amounts of data using a streaming platform. It begins with an agenda for the presentation, then discusses how streaming platforms can be used as a central nervous system in enterprises. Several use cases are presented, including using Apache Kafka and the Confluent Platform for applications like fraud detection, customer analytics, and migrating from batch to stream-based data processing. The rest of the document goes into details on Kafka, Confluent Platform, and how they can be used to build stream processing applications.
Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ... (Codemotion)
KSQL is a streaming SQL engine that allows users to analyze streaming data in Apache Kafka using SQL queries. It provides a simple interactive interface using SQL to explore, transform, and perform real-time analytics on data in Kafka. KSQL can be used for tasks like data exploration, transformation, ETL, enrichment, monitoring, anomaly detection, and more. It builds on Kafka Streams to provide scalability, fault tolerance, and exactly-once processing semantics out of the box.
KSQL: Open Source Streaming for Apache Kafka (confluent)
This document provides an overview of KSQL, an open source streaming SQL engine for Apache Kafka. It describes the core concepts of Kafka and KSQL, including how KSQL can be used for streaming ETL, anomaly detection, real-time monitoring, and data transformation. It also discusses how KSQL fits into a streaming platform and can be run in both client-server and standalone modes.
Speaker: Pere Urbón-Bayes, Technical Account Manager, Confluent
The need to integrate a swarm of systems has always been present in the history of IT; however, with the advent of microservices, big data and IoT, this has simply exploded.
Through the exploration of a few use cases, this presentation will introduce stream processing, a powerful and scalable way to transform and connect applications around your business.
We will explain in this talk how Apache Kafka® and the Confluent Platform can be used to connect the diverse collection of applications that the actual business faces. Components such as KSQL where non-developers can process streaming events at scale or those that are Kafka Streams-oriented to build scalable applications to process event data.
A guide through the Azure Messaging services - Update Conference (Eldert Grootenboer)
https://www.updateconference.net/en/2019/session/a-guide-through-the-azure-messaging-services
Unlocking the world of stream processing with KSQL, the streaming SQL engine ... (Michael Noll)
Slides of my Strata London 2018 talk:
https://conferences.oreilly.com/strata/strata-eu/public/schedule/detail/65325
Abstract:
Modern businesses have data at their core, and this data is changing continuously. Stream processing is what allows you to harness this torrent of information in real time, and thousands of companies use Apache Kafka as the core platform for streaming data to transform and reshape their industries. However, the world of stream processing still has a very high barrier to entry. Today’s most popular stream processing technologies require the user to write code in programming languages such as Java or Scala. This hard requirement on coding skills is preventing many companies from unlocking the full benefits of stream processing.
However, imagine that instead of having to write a lot of code in a programming language like Java or Scala for your favorite stream processing technology, all you’d need to get started with stream processing is a simple SQL statement, such as: SELECT * FROM payments-kafka-stream WHERE fraudProbability > 0.8.
Michael Noll offers an overview of KSQL, the open source streaming SQL engine for Apache Kafka, which makes it easy to get started with a wide range of real-time use cases, such as monitoring application behavior and infrastructure, detecting anomalies and fraudulent activities in data feeds, and real-time ETL. With KSQL, there’s no need to write any code in a programming language. KSQL brings together the worlds of streams and databases by allowing you to work with your data in a stream and in a table format. Built on top of Kafka’s Streams API, KSQL supports many powerful operations, including filtering, transformations, aggregations, joins, windowing, sessionization, and much more. It is open source (Apache 2.0 licensed), distributed, scalable, fault tolerant, and real time. You’ll learn how KSQL makes it easy to get started with a wide range of stream processing use cases and how to get up and running as you explore how it all works under the hood.
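Sessionization, one of the operations listed above, groups events into sessions separated by a gap of inactivity. Here is a minimal plain-Python sketch of that logic (the gap value and timestamps are made up for illustration; KSQL expresses this declaratively with session windows):

```python
def sessionize(timestamps, gap):
    """Split event timestamps into sessions: a new session starts
    whenever the gap to the previous event exceeds `gap`."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= gap:
            sessions[-1].append(ts)  # continue the current session
        else:
            sessions.append([ts])    # inactivity gap exceeded: start a new session
    return sessions

# Events at t=1,2,3, then a pause, then t=50,52: two sessions with gap=10.
print(sessionize([1, 2, 3, 50, 52], gap=10))  # [[1, 2, 3], [50, 52]]
```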
New Features in Confluent Platform 6.0 / Apache Kafka 2.6 (Kai Wähner)
New Features in Confluent Platform 6.0 / Apache Kafka 2.6, including REST Proxy and API, Tiered Storage for AWS S3 and GCP GCS, Cluster Linking (On-Premise, Edge, Hybrid, Multi-Cloud), Self-Balancing Clusters, and ksqlDB.
What is Apache Kafka, and What is an Event Streaming Platform? (confluent)
Speaker: Viktor Gamov, Developer Advocate, Confluent
Event streaming platforms have emerged as a popular new trend, but what exactly is an event streaming platform? Part messaging system, part Hadoop-made-fast, part fast ETL and scalable data integration, with Apache Kafka® at the core, event streaming platforms offer an entirely new perspective on managing the flow of data. This talk will explain what an event streaming platform such as Apache Kafka is and some of the use cases and design patterns around its use—including several examples of where it is solving real business problems. New developments in this area such as KSQL will also be discussed.
https://www.meetup.com/Boston-Apache-kafka-Meetup/events/258989900/
Rainbird: Realtime Analytics at Twitter (Strata 2011) (Kevin Weil)
Introducing Rainbird, Twitter's high volume distributed counting service for realtime analytics, built on Cassandra. This presentation looks at the motivation, design, and uses of Rainbird across Twitter.
Viktor Gamov, Confluent, Developer Advocate
Apache Kafka is an open source distributed streaming platform that allows you to build applications and process events as they occur. Viktor Gamov (developer Advocate at Confluent) walks through how it works and important underlying concepts. As a real-time, scalable, and durable system, Kafka can be used for fault-tolerant storage as well as for other use cases, such as stream processing, centralized data management, metrics, log aggregation, event sourcing, and more.
This talk will explain what a streaming platform such as Apache Kafka is and some of the use cases and design patterns around its use—including several examples of where it is solving real business problems.
https://www.meetup.com/Chennai-Kafka/events/269942117/
Choose Right Stream Storage: Amazon Kinesis Data Streams vs MSKSungmin Kim
This presentation compares Amazon Kinesis Data Streams to Managed Streaming for Kafka (MSK) in both architectural perspective and operational perspective. In addition, it shows common architectural patterns: (1) Data Hub: Event-Bus, (2) Log Aggregation, (3) IoT, (4) Event sourcing and CQRS.
Crossing the Streams: Rethinking Stream Processing with Kafka Streams and KSQLconfluent
The document discusses stream processing with Apache Kafka and KSQL. It begins with an overview of Kafka Streams and KSQL, describing them as Java APIs and a SQL-like language for building real-time applications. It then highlights several key features of Kafka Streams, including its ability to elastically scale applications across instances and deploy them anywhere, as well as its support for exactly-once processing, event-time processing, and powerful stream operations. The document concludes by discussing how KSQL lowers the barrier to entry for stream processing and enables different users like developers and analysts to interact with streaming data.
On Track with Apache Kafka®: Building a Streaming ETL Solution with Rail Dataconfluent
Watch this talk here: https://www.confluent.io/online-talks/building-a-streaming-etl-solution-with-apache-kafka-rail-data-on-demand
As data engineers, we frequently need to build scalable systems working with data from a variety of sources and with various ingest rates, sizes, and formats. This talk takes an in-depth look at how Apache Kafka can be used to provide a common platform on which to build data infrastructure driving both real-time analytics as well as event-driven applications.
Using a public feed of railway data it will show how to ingest data from message queues such as ActiveMQ with Kafka Connect, as well as from static sources such as S3 and REST endpoints. We'll then see how to use stream processing to transform the data into a form useful for streaming to analytics in tools such as Elasticsearch and Neo4j. The same data will be used to drive a real-time notifications service through Telegram.
If you're wondering how to build your next scalable data platform, how to reconcile the impedance mismatch between stream and batch, and how to wrangle streams of data—this talk is for you!
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...confluent
Tinder’s Quickfire Pipeline powers all things data at Tinder. It was originally built using AWS Kinesis Firehose and has since been extended to use both Kafka and other event buses. It is the core of Tinder’s data infrastructure. This rich data flow of both client and backend data has been extended to serve a variety of needs at Tinder, including experimentation, ML, CRM, and observability, allowing backend developers easier access to shared client-side data. We perform this using many systems, including Kafka, Spark, Flink, Kubernetes, and Prometheus. Many of Tinder’s systems were natively designed in an RPC-first architecture.
Topics we’ll discuss about decoupling your system at scale via event-driven architectures include:
– Powering ML, backend, observability, and analytical applications at scale, including an end-to-end walkthrough of our processes that allow non-programmers to write and deploy event-driven data flows.
– An end-to-end look at dynamic event processing that creates other stream processes, via a dynamic control-plane topology pattern and a broadcasted state pattern
– How to manage the unavailability of cached data that would normally come from repeated API calls for data that’s being backfilled into Kafka, all online! (and why this is not necessarily a “good” idea)
– Integrating common OSS frameworks and libraries like Kafka Streams, Flink, Spark and friends to encourage the best design patterns for developers coming from traditional service oriented architectures, including pitfalls and lessons learned along the way.
– Why and how to avoid overloading microservices with excessive RPC calls from event-driven streaming systems
– Best practices in common data flow patterns, such as shared state via RocksDB + Kafka Streams as well as the complementary tools in the Apache Ecosystem.
– The simplicity and power of streaming SQL with microservices
Introduction to KSQL: Streaming SQL for Apache Kafka®confluent
Join Tom Green, Solution Engineer at Confluent for this Lunch and Learn talk covering KSQL. Confluent KSQL is the streaming SQL engine that enables real-time data processing against Apache Kafka®. It provides an easy-to-use, yet powerful interactive SQL interface for stream processing on Kafka, without the need to write code in a programming language such as Java or Python. KSQL is scalable, elastic, fault-tolerant, and it supports a wide range of streaming operations, including data filtering, transformations, aggregations, joins, windowing, and sessionization.
By attending one of these sessions, you will learn:
-How to query streams, using SQL, without writing code.
-How KSQL provides automated scalability and out-of-the-box high availability for streaming queries
-How KSQL can be used to join streams of data from different sources
-The differences between Streams and Tables in Apache Kafka.
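That last point, the stream/table duality, can be sketched without any Kafka at all: a stream is the full sequence of keyed events, while a table is the latest value per key, obtained by replaying the stream. The data below is made up purely for illustration.

```python
# A changelog "stream" of (key, value) events.
stream = [
    ("alice", "London"),
    ("bob", "Paris"),
    ("alice", "Berlin"),  # an update: alice moved
]

def to_table(events):
    """Materialize a changelog stream into a table (last write wins per key)."""
    table = {}
    for key, value in events:
        table[key] = value
    return table

print(to_table(stream))  # {'alice': 'Berlin', 'bob': 'Paris'}
```

This is exactly the relationship KSQL exposes: a `TABLE` is a continuously maintained materialization of the updates flowing through a `STREAM`.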
Secure Kafka at scale in true multi-tenant environment ( Vishnu Balusu & Asho...confluent
Application teams at JPMC have started shifting towards building event-driven architectures and real-time streaming pipelines, and Kafka has been at the core of this journey. As application teams adopted Kafka rapidly, the need for a centrally managed Kafka-as-a-service emerged. We started delivering Kafka as a service in early 2018 and have been running it in production for more than a year now, operating 80+ clusters (and growing) across all environments. One of the key requirements is to provide a truly segregated, secured multi-tenant environment with an RBAC model while satisfying financial regulations and controls at the same time. Operating clusters at large scale requires scalable self-service capabilities and cluster management orchestration. In this talk we will present:
- Our experiences in delivering and operating secured, multi-tenant, and resilient Kafka clusters at scale.
- Internals of our service framework/control plane, which enables self-service capabilities for application teams, plus cluster build/patch orchestration and capacity management capabilities for TSE/admin teams.
- Our approach to enabling automated cross-datacenter failover for application teams using the service framework and Confluent Replicator.
Getting Started with Confluent Schema Registryconfluent
Getting started with Confluent Schema Registry, Patrick Druley, Senior Solutions Engineer, Confluent
Meetup link: https://www.meetup.com/Cleveland-Kafka/events/272787313/
Building Stream Processing Applications with Apache Kafka Using KSQL (Robin M...confluent
Robin is a Developer Advocate at Confluent, the company founded by the creators of Apache Kafka, as well as an Oracle Groundbreaker Ambassador. His career has always involved data, from the old worlds of COBOL and DB2, through the worlds of Oracle and Hadoop, and into the current world with Kafka. His particular interests are analytics, systems architecture, performance testing and optimization. He blogs at http://cnfl.io/rmoff and http://rmoff.net/ and can be found tweeting grumpy geek thoughts as @rmoff. Outside of work he enjoys drinking good beer and eating fried breakfasts, although generally not at the same time.
Building a real-time data processing pipeline using Apache Kafka, Kafka Conne...Paul Brebner
With the rapid onset of the global Covid-19 Pandemic from the start of this year the USA Centers for Disease Control and Prevention (CDC) had to quickly implement a new Covid-19 specific pipeline to collect testing data from all of the USA’s states and territories, and carry out other critical steps including integration, cleaning, checking, enrichment, analysis, and enforcing data governance and privacy etc. The pipeline then produces multiple consumable results for federal and public agencies. They did this in under 30 days, using Apache Kafka. In this presentation we'll build a similar (but simpler) pipeline for ingesting, integrating, indexing, searching/analysing and visualising some publicly available tidal data. We'll briefly introduce each technology and component, and walk through the steps of using Apache Kafka, Kafka Connect, Elasticsearch and Kibana to build the pipeline and visualise the results.
Deep Dive Series #3: Schema Validation + Structured Audit Logsconfluent
Another new security-related feature in Confluent Platform 5.4 is Structured Audit Logs. Of course, everything in Kafka is a log, but Kafka does not log what Kafka does with Kafka - only what is written to a topic.
In the third part of our Deep Dive Sessions, alongside Structured Audit Logs we also discuss the "evolution" of the familiar Schema Registry: Schema Validation operates at the topic level and ensures that every single message produced to a given topic is checked against the Schema Registry. We explain more in our Deep Dive #3.
Real-time processing of large amounts of dataconfluent
This document discusses real-time processing of large amounts of data using a streaming platform. It begins with an agenda for the presentation, then discusses how streaming platforms can be used as a central nervous system in enterprises. Several use cases are presented, including using Apache Kafka and the Confluent Platform for applications like fraud detection, customer analytics, and migrating from batch to stream-based data processing. The rest of the document goes into details on Kafka, Confluent Platform, and how they can be used to build stream processing applications.
Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...Codemotion
KSQL is a streaming SQL engine that allows users to analyze streaming data in Apache Kafka using SQL queries. It provides a simple interactive interface using SQL to explore, transform, and perform real-time analytics on data in Kafka. KSQL can be used for tasks like data exploration, transformation, ETL, enrichment, monitoring, anomaly detection, and more. It builds on Kafka Streams to provide scalability, fault tolerance, and exactly-once processing semantics out of the box.
KSQL: Open Source Streaming for Apache Kafkaconfluent
This document provides an overview of KSQL, an open source streaming SQL engine for Apache Kafka. It describes the core concepts of Kafka and KSQL, including how KSQL can be used for streaming ETL, anomaly detection, real-time monitoring, and data transformation. It also discusses how KSQL fits into a streaming platform and can be run in both client-server and standalone modes.
Speaker: Pere Urbón-Bayes, Technical Account Manager, Confluent
The need to integrate a swarm of systems has always been present in the history of IT; however, with the advent of microservices, big data and IoT, this has simply exploded.
Through the exploration of a few use cases, this presentation will introduce stream processing, a powerful and scalable way to transform and connect applications around your business.
In this talk we will explain how Apache Kafka® and the Confluent Platform can be used to connect the diverse collection of applications a real business faces - components such as KSQL, with which non-developers can process streaming events at scale, and Kafka Streams, for building scalable applications that process event data.
A guide through the Azure Messaging services - Update ConferenceEldert Grootenboer
https://www.updateconference.net/en/2019/session/a-guide-through-the-azure-messaging-services
A guide through the Azure Messaging services - Update Conference
Unlocking the world of stream processing with KSQL, the streaming SQL engine ...Michael Noll
Slides of my Strata London 2018 talk:
https://conferences.oreilly.com/strata/strata-eu/public/schedule/detail/65325
Abstract:
Modern businesses have data at their core, and this data is changing continuously. Stream processing is what allows you to harness this torrent of information in real time, and thousands of companies use Apache Kafka as the core platform for streaming data to transform and reshape their industries. However, the world of stream processing still has a very high barrier to entry. Today’s most popular stream processing technologies require the user to write code in programming languages such as Java or Scala. This hard requirement on coding skills is preventing many companies from unlocking the benefits of stream processing to their full effect.
However, imagine that instead of having to write a lot of code in a programming language like Java or Scala for your favorite stream processing technology, all you’d need to get started with stream processing is a simple SQL statement, such as: SELECT * FROM payments-kafka-stream WHERE fraudProbability > 0.8.
Michael Noll offers an overview of KSQL, the open source streaming SQL engine for Apache Kafka, which makes it easy to get started with a wide range of real-time use cases, such as monitoring application behavior and infrastructure, detecting anomalies and fraudulent activities in data feeds, and real-time ETL. With KSQL, there’s no need to write any code in a programming language. KSQL brings together the worlds of streams and databases by allowing you to work with your data in a stream and in a table format. Built on top of Kafka’s Streams API, KSQL supports many powerful operations, including filtering, transformations, aggregations, joins, windowing, sessionization, and much more. It is open source (Apache 2.0 licensed), distributed, scalable, fault tolerant, and real time. You’ll learn how KSQL makes it easy to get started with a wide range of stream processing use cases and how to get up and running as you explore how it all works under the hood.
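The fraud-detection statement quoted above (`SELECT * FROM payments-kafka-stream WHERE fraudProbability > 0.8`) can be simulated over an in-memory stream to show what KSQL computes per record; the payment events and field values below are made up for illustration.

```python
# A toy stream of payment events, mirroring the abstract's example.
payments = [
    {"id": 1, "amount": 42.0, "fraudProbability": 0.10},
    {"id": 2, "amount": 9000.0, "fraudProbability": 0.95},
    {"id": 3, "amount": 15.0, "fraudProbability": 0.85},
]

# Equivalent of: SELECT * FROM payments WHERE fraudProbability > 0.8
suspicious = [p for p in payments if p["fraudProbability"] > 0.8]

print([p["id"] for p in suspicious])  # [2, 3]
```

The difference in the real system is that the KSQL query never terminates: it keeps emitting matching rows as new payment events arrive on the Kafka topic.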
New Features in Confluent Platform 6.0 / Apache Kafka 2.6Kai Wähner
New Features in Confluent Platform 6.0 / Apache Kafka 2.6, including REST Proxy and API, Tiered Storage for AWS S3 and GCP GCS, Cluster Linking (On-Premise, Edge, Hybrid, Multi-Cloud), Self-Balancing Clusters, and ksqlDB.
What is Apache Kafka, and What is an Event Streaming Platform?confluent
Speaker: Viktor Gamov, Developer Advocate, Confluent
Event streaming platforms have emerged as a popular new trend, but what exactly is an event streaming platform? Part messaging system, part Hadoop-made-fast, part fast ETL and scalable data integration, with Apache Kafka® at the core, event streaming platforms offer an entirely new perspective on managing the flow of data. This talk will explain what an event streaming platform such as Apache Kafka is and some of the use cases and design patterns around its use—including several examples of where it is solving real business problems. New developments in this area such as KSQL will also be discussed.
https://www.meetup.com/Boston-Apache-kafka-Meetup/events/258989900/
Rainbird: Realtime Analytics at Twitter (Strata 2011)Kevin Weil
Introducing Rainbird, Twitter's high volume distributed counting service for realtime analytics, built on Cassandra. This presentation looks at the motivation, design, and uses of Rainbird across Twitter.
This document summarizes a presentation given by Kevin Weil about Rainbird, Twitter's real-time analytics system. Rainbird uses Cassandra to count events extremely quickly at high volumes, with hierarchical and temporal aggregation. It powers several internal Twitter products like promoted tweets analytics and monitoring. While not yet open sourced, Rainbird and the team's work will be open sourced in the future.
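The hierarchical and temporal aggregation described above can be sketched in a few lines: each incoming event increments a counter in several time granularities at once, so a query at minute, hour, or day resolution is a single lookup. This is only a toy model of the idea (keys, timestamps, and bucket widths are invented), not Rainbird's actual Cassandra schema.

```python
from collections import defaultdict

# Bucket widths in seconds for each granularity level.
BUCKETS = {"minute": 60, "hour": 3600, "day": 86400}
counts = defaultdict(int)

def record(key, timestamp, increment=1):
    """Count one event at every time granularity simultaneously."""
    for name, width in BUCKETS.items():
        bucket = timestamp - (timestamp % width)  # start of the bucket
        counts[(key, name, bucket)] += increment

record("promoted_tweet_click", 1000)
record("promoted_tweet_click", 1010)  # same minute as the first event
record("promoted_tweet_click", 5000)  # a different hour

print(counts[("promoted_tweet_click", "day", 0)])  # 3
```

Writing all granularities at ingest time trades extra writes for O(1) reads, which is the bargain a high-volume counting service generally wants.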
The document discusses moving a Slack commands application from a single Express app hosting all commands to individual serverless functions (Lambda) per command. It provides examples of the code structure for a sample "user stats" command function, how it is executed by AWS Lambda, and how CloudFormation templates are used to define and deploy the Lambda function and its resources to AWS.
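One function per command looks roughly like the handler below: an AWS-Lambda-style entry point for a hypothetical "user stats" command. The event shape, stats store, and response format are all assumptions for illustration; a real Slack slash command arrives as a form-encoded HTTP payload via API Gateway.

```python
# Fake backing store standing in for a real stats service.
FAKE_STATS = {"alice": {"messages": 128, "reactions": 42}}

def user_stats_handler(event, context=None):
    """Handle a `/user-stats <name>` command and return a Lambda-style response."""
    user = event.get("text", "").strip()
    stats = FAKE_STATS.get(user)
    if stats is None:
        return {"statusCode": 200, "body": f"No stats found for {user!r}"}
    return {
        "statusCode": 200,
        "body": f"{user}: {stats['messages']} messages, {stats['reactions']} reactions",
    }

print(user_stats_handler({"text": "alice"})["body"])
```

Splitting each command into its own function like this lets every command scale and deploy independently, which is the core of the single-app-to-Lambda migration the document describes.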
Interactively querying Google Analytics reports from R using ganalyticsJohann de Boer
This presentation introduces the Google Analytics reporting APIs and how these can be accessed for data analysis using R. Examples are provided that demonstrate using the ganalytics package for R to answer analysis questions with accompanying visualisations.
Crossing the Streams: Rethinking Stream Processing with KStreams and KSQL confluent
(Viktor Gamov, Confluent) Kafka Summit SF 2018
All things change constantly! And dealing with constantly changing data at low latency is pretty hard. It doesn’t need to be that way. Apache Kafka is the de facto standard open source distributed stream processing system. Many of us know Kafka’s architectural and pub/sub API particulars. But that doesn’t mean we’re equipped to build the kind of real-time streaming data systems that the next generation of business requirements are going to demand. We need to get on board with streams!
Viktor Gamov will introduce Kafka Streams and KSQL—an important recent addition to the Confluent Open Source platform that lets us build sophisticated stream processing systems with little to no code at all! They will talk about how to deploy stream processing applications and look at the actual working code that will bring your thinking about streaming data systems from the ancient history of batch processing into the current era of streaming data!
P.S. No prior knowledge of Kafka Streams, KSQL or Ghostbusters needed!
This document discusses using MapReduce vs Apache Pig for batch processing of large datasets. It provides a code example in Pig Latin for analyzing customer category data to calculate statistics like top products purchased, average number of views and purchases, and average purchase amount for each customer category. The code loads and joins input data, groups and filters the data, calculates statistics, and stores the results.
Building Kafka Connectors with Kotlin: A Step-by-Step Guide to Creation and D...HostedbyConfluent
"Kafka Connect, the framework for building scalable and reliable data pipelines, has gained immense popularity in the data engineering landscape. This session will provide a comprehensive guide to creating Kafka connectors using Kotlin, a language known for its conciseness and expressiveness.
In this session, we will explore a step-by-step approach to crafting Kafka connectors with Kotlin, from inception to deployment, using a simple use case. The process includes the following key aspects:
Understanding Kafka Connect: We'll start with an overview of Kafka Connect and its architecture, emphasizing its importance in real-time data integration and streaming.
Connector Design: Delve into the design principles that govern connector creation. Learn how to choose between source and sink connectors and identify the data format that suits your use case.
Building a Source Connector: We'll start with building a Kafka source connector, exploring key considerations, such as data transformations, serialization, deserialization, error handling and delivery guarantees. You will see how Kotlin's concise syntax and type safety can simplify the implementation.
Testing: Learn how to rigorously test your connector to ensure its reliability and robustness, utilizing best practices for testing in Kotlin.
Connector Deployment: Go through the process of deploying your connector in a Kafka Connect cluster, and discuss strategies for monitoring and scaling.
Real-World Use Cases: Explore real-world examples of Kafka connectors built with Kotlin.
By the end of this session, you will have a solid foundation for creating and deploying Kafka connectors using Kotlin, equipped with practical knowledge and insights to make your data integration processes more efficient and reliable. Whether you are a seasoned developer or new to Kafka Connect, this guide will help you harness the power of Kafka and Kotlin for seamless data flow in your applications."
Grails offers a number of options for implementing asynchronous and event-driven applications, from the low-level Servlet 3.0 async support to the Promise API abstraction, which supports GPars, RxJava 1 & 2. At the GORM level, there is also the async namespace, and RxGORM. Finally, a brand new events API shipped with Grails 3.3, which moves off Reactor, will help users to implement event-driven applications.
In this session, the speaker will walk the audience through such options.
KSQL is a stream processing SQL engine, which allows stream processing on top of Apache Kafka. KSQL is based on Kafka Streams and provides capabilities for consuming messages from Kafka, analysing these messages in near real time with a SQL-like language, and producing results back to a Kafka topic. By that, not a single line of Java code has to be written and you can reuse your SQL know-how. This lowers the bar for starting with stream processing significantly.
KSQL offers powerful stream processing capabilities, such as joins, aggregations, time windows, and support for event time. In this talk I will present how KSQL integrates with the Kafka ecosystem and demonstrate how easy it is to implement a solution using KSQL for the most part. This will be done in a live demo on a fictitious IoT sample.
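A time-windowed aggregation of the kind mentioned above can be modeled in plain Python: average sensor reading per device per tumbling 10-second window. The IoT events and field names below are invented for the example; KSQL would express the same thing as a `WINDOW TUMBLING` aggregation over a stream.

```python
from collections import defaultdict

WINDOW = 10  # tumbling window width in seconds

events = [
    {"device": "d1", "ts": 0, "temp": 20.0},
    {"device": "d1", "ts": 4, "temp": 22.0},
    {"device": "d1", "ts": 12, "temp": 30.0},
    {"device": "d2", "ts": 3, "temp": 15.0},
]

# Accumulate (sum, count) per (device, window-start) pair.
sums = defaultdict(lambda: [0.0, 0])
for e in events:
    window_start = e["ts"] - (e["ts"] % WINDOW)
    acc = sums[(e["device"], window_start)]
    acc[0] += e["temp"]
    acc[1] += 1

averages = {key: total / n for key, (total, n) in sums.items()}
print(averages)
```

Each key of `averages` identifies a device and a window start, mirroring how a windowed KSQL table is keyed by both the grouping column and the window.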
Cloud-Native Streaming Platform: Running Apache Kafka on PKS (Pivotal Contain...Carlos Andrés García
This document discusses running Apache Kafka on Kubernetes using the Confluent Operator. It provides an overview of Kafka's architecture including sharding, replication, and scaling. It then covers how the Operator allows automated provisioning, operations, resiliency and monitoring of Kafka and Confluent Platform on Kubernetes through custom resource definitions and controllers. The Operator helps with tasks like rolling upgrades, scaling, and meeting SLAs. Resources are provided to learn more about running Kafka on Kubernetes.
Most data visualization solutions today still work on the “data at rest” paradigm, where data is persisted first and then analyzed. But data sources today often come as a constant stream of data, from IoT devices to social media streams. These data streams publish information with high velocity and messages often have to be processed as quickly as possible. For the processing and analytics, so-called stream processing solutions are available, but these provide minimal or no visualization capabilities. So how do we solve the visualization of high-velocity data streams? One option is to first persist the data and then use a traditional data visualization solution to present it. If latency is not an issue, this might be good enough. A NoSQL database might be an ideal fit, but not all traditional visualization tools easily integrate with a given data store. Another option is to use a dedicated streaming visualization solution. These are specially built for streaming data but often do not support batch data. This talk presents different architecture blueprints for visualizing fast data and shows some products for implementing these blueprints.
Talk about using Ganglia and other tools for storing all kinds of web application metrics for both operations and business purposes. Presented at Cambridge Geek Night
TSAR is a framework for building time-series aggregation jobs at scale. It allows users to specify how to extract events from data sources, the dimensions to aggregate on, metrics to compute, and datastores to write outputs to. TSAR then handles all aspects of deploying and operating the aggregation pipeline, including processing events in real-time and batch, coordinating data schemas, and providing operational tools. It is based on Summingbird which provides abstractions for distributed computation across different platforms.
TSAR (TimeSeries AggregatoR) Tech TalkAnirudh Todi
Twitter's 250 million users generate tens of billions of tweet views per day. Aggregating these events in real time - in a robust enough way to incorporate into our products - presents a massive scaling challenge. In this presentation I introduce TSAR (the TimeSeries AggregatoR), a robust, flexible, and scalable service for real-time event aggregation designed to solve this problem and a range of similar ones. I discuss how we built TSAR using Python and Scala from the ground up, almost entirely on open-source technologies (Storm, Summingbird, Kafka, Aurora, and others), and describe some of the challenges we faced in scaling it to process tens of billions of events per day.
Big Data otimizado: Arquiteturas eficientes para construção de Pipelines MapR...Tail Target
The document discusses optimizing MapReduce pipelines in Apache Crunch. It describes how Crunch can orchestrate and optimize MapReduce jobs by chaining together map, group, and reduce functions. It provides an example of a pipeline that reads data from a file, counts visitors using parallel mapping, groups the results by key, and writes the final counts to an output file. The document notes that network traffic between nodes is often a bottleneck, and discusses how pre-aggregating data during the map phase can reduce network traffic and improve performance compared to a standard MapReduce job.
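The shuffle-reduction point can be illustrated with a toy combiner: pre-aggregating per key on the map side shrinks what must cross the network before the reduce phase, while the final counts stay identical. The visitor log below is made up.

```python
from collections import Counter

visits = ["home", "cart", "home", "home", "cart", "checkout"]

# Naive: every record becomes one (key, 1) pair shuffled to reducers.
shuffled_naive = [(page, 1) for page in visits]

# With a map-side combiner: the mapper emits one partial count per key.
combined = list(Counter(visits).items())

def reduce_counts(pairs):
    """The reduce phase: sum partial counts per key."""
    totals = Counter()
    for key, n in pairs:
        totals[key] += n
    return dict(totals)

# Same answer either way, but far fewer records shuffled.
print(len(shuffled_naive), "records shuffled naively vs", len(combined), "with a combiner")
```

Combining only works because addition is associative and commutative; the same idea underlies Crunch's `combineValues` step between grouping and reducing.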
Bridge Your Kafka Streams to Azure Webinarconfluent
With a fully managed Apache Kafka® as-a-service on Microsoft Azure, businesses can focus on building applications and not managing clusters. Build a persistent bridge from on-premises data systems to the cloud with a hybrid Kafka service or stream across public clouds for multi-cloud data pipelines.
In this session for business and technical data leaders, you can learn about powering business applications with the managed Kafka service that streams data into Azure SQL Data Warehouse, Cosmos DB, Azure Data Lake Storage and Azure Blob Storage.
Kafka as an Event Store (Guido Schmutz, Trivadis) Kafka Summit NYC 2019confluent
Event Sourcing and CQRS are two popular patterns for implementing a microservices architecture. With Event Sourcing we do not store the state of an object, but instead store all the events impacting its state. Then to retrieve an object's state, we have to read the different events related to that object and apply them one by one. CQRS (Command Query Responsibility Segregation), on the other hand, is a way to dissociate writes (Command) and reads (Query). Event Sourcing and CQRS are frequently grouped and used together to form something bigger. While it is possible to implement CQRS without Event Sourcing, the opposite is not necessarily correct. In order to implement Event Sourcing, an efficient Event Store is needed. But is that also true when combining Event Sourcing and CQRS? And what is an event store in the first place, and what features should it implement? This presentation will first discuss what functionalities an event store should offer and then present how Apache Kafka can be used to implement an event store. But is Kafka good enough, or do specific event store solutions such as AxonDB or Event Store provide a better solution?
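The "replay events to rebuild state" idea at the heart of the abstract can be shown in miniature: instead of storing an account's balance, store the events that affected it and fold over them in order. The event types and amounts below are invented for the sketch.

```python
# The event log for one account (what an event store would persist).
events = [
    {"type": "AccountOpened", "balance": 0},
    {"type": "Deposited", "amount": 100},
    {"type": "Withdrawn", "amount": 30},
    {"type": "Deposited", "amount": 5},
]

def apply(state, event):
    """Apply one event to the current state, returning the new state."""
    if event["type"] == "AccountOpened":
        return {"balance": event["balance"]}
    if event["type"] == "Deposited":
        return {"balance": state["balance"] + event["amount"]}
    if event["type"] == "Withdrawn":
        return {"balance": state["balance"] - event["amount"]}
    raise ValueError(f"unknown event type: {event['type']}")

# Rebuild the current state by replaying the log from the beginning.
state = {}
for e in events:
    state = apply(state, e)
print(state)  # {'balance': 75}
```

Mapped onto Kafka, the event log is a topic partition keyed by aggregate ID, and replaying is simply consuming that partition from the earliest offset.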
Similar to Crossing the Streams: Event Streaming with Kafka Streams (20)
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...confluent
In our exclusive webinar, you'll learn why event-driven architecture is the key to unlocking cost efficiency, operational effectiveness, and profitability. Gain insights on how this approach differs from API-driven methods and why it's essential for your organization's success.
Santander Stream Processing with Apache Flinkconfluent
Flink is becoming the de facto standard for stream processing due to its scalability, performance, fault tolerance, and language flexibility. It supports stream processing, batch processing, and analytics through one unified system. Developers choose Flink for its robust feature set and ability to handle stream processing workloads at large scales efficiently.
Unlocking the Power of IoT: A comprehensive approach to real-time insightsconfluent
In today's data-driven world, the Internet of Things (IoT) is revolutionizing industries and unlocking new possibilities. Join Data Reply, Confluent, and Imply as we unveil a comprehensive solution for IoT that harnesses the power of real-time insights.
Workshop híbrido: Stream Processing con Flinkconfluent
Stream processing is a prerequisite of the data streaming stack, powering real-time applications and pipelines.
It enables greater data portability, optimized resource utilization, and a better customer experience by processing data streams in real time.
In our hands-on hybrid workshop, you will learn how to easily filter, join, and enrich real-time data within Confluent Cloud using our serverless Flink service.
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...confluent
Our talk will explore the transformative impact of integrating Confluent, HiveMQ, and SparkPlug in Industry 4.0, emphasizing the creation of a Unified Namespace.
In addition to the creation of a Unified Namespace, our webinar will also delve into Stream Governance and Scaling, highlighting how these aspects are crucial for managing complex data flows and ensuring robust, scalable IIoT-Platforms.
You will learn how to ensure data accuracy and reliability, expand your data processing capabilities, and optimize your data management processes.
Don't miss out on this opportunity to learn from industry experts and take your business to the next level.
Event-driven architecture (EDA) will be the heart of MAPFRE's ecosystem. To remain competitive, today's companies increasingly depend on real-time data analysis, which gives them faster insights and response times. Running a business on real-time data means being situationally aware - detecting and responding to what is happening in the world right now.
Eventos y Microservicios - Santander TechTalkconfluent
During this session we will examine how the worlds of events and microservices complement and improve each other, exploring how event-based patterns allow us to decompose monoliths in a scalable, resilient, and decoupled way.
Q&A with Confluent Experts: Navigating Networking in Confluent Cloudconfluent
This document discusses networking options and best practices for Confluent Cloud. It provides an overview of public endpoints, private link, and peering options. It then discusses best practices for private networking architectures on Azure using hub-and-spoke and private link designs. Finally, it addresses networking considerations and challenges for Kafka Connect managed connectors, as well as planned enhancements for DNS peering and outbound private link support.
The purpose of the session is to dive into Apache Kafka, data streaming, and Kafka in the cloud:
- Dive into Apache Kafka
- Data Streaming
- Kafka in the cloud
Build real-time streaming data pipelines to AWS with Confluentconfluent
Traditional data pipelines often face scalability issues and challenges related to cost, their monolithic design, and reliance on batch data processing. They also typically operate under the premise that all data needs to be stored in a single centralized data source before it's put to practical use. Confluent Cloud on Amazon Web Services (AWS) provides a fully managed cloud-native platform that helps you simplify the way you build real-time data flows using streaming data pipelines and Apache Kafka.
Q&A with Confluent Professional Services: Confluent Service Meshconfluent
No matter whether you are migrating your Kafka cluster to Confluent Cloud, running a cloud-hybrid environment, or are in a different situation where data protection and encryption of sensitive information is required, Confluent Service Mesh allows you to transparently encrypt your data without the need to make code changes to your existing applications.
Citi Tech Talk: Event Driven Kafka Microservicesconfluent
Microservices have become a dominant architectural paradigm for building systems in the enterprise, but they are not without their tradeoffs. Learn how to build event-driven microservices with Apache Kafka.
Confluent & GSI Webinars series - Session 3confluent
An in depth look at how Confluent is being used in the financial services industry. Gain an understanding of how organisations are utilising data in motion to solve common problems and gain benefits from their real time data capabilities.
It will look more deeply into some specific use cases and show how Confluent technology is used to manage costs and mitigate risks.
This session is aimed at Solutions Architects, Sales Engineers and Pre Sales, and also the more technically minded business aligned people. Whilst this is not a deeply technical session, a level of knowledge around Kafka would be helpful.
This document discusses moving to an event-driven architecture using Confluent. It begins by outlining some of the limitations of traditional messaging middleware approaches. Confluent provides benefits like stream processing, persistence, scalability and reliability while avoiding issues like lack of structure, slow consumers, and technical debt. The document then discusses how Confluent can help modernize architectures, enable new real-time use cases, and reduce costs through migration. It provides examples of how companies like Advance Auto Parts and Nord/LB have benefitted from implementing Confluent platforms.
This session will show why the old paradigm does not work and that a new approach to the data strategy needs to be taken. It aims to show how a Data Streaming Platform is integral to the evolution of a company’s data strategy, and how Confluent is not just an integration layer but the central nervous system for an organisation.
You will also learn how to:
• Build products and features faster using a complete suite of connectors and stream-management tools, and connect your environments to data pipelines
• Protect your most critical data and workloads with built-in guarantees for security, governance, and resilience
• Deploy Kafka at scale in minutes while reducing the associated costs and operational burden
Confluent Partner Tech Talk with Synthesis (confluent)
A discussion on the arduous planning process, and deep dive into the design/architectural decisions.
Learn more about the networking, RBAC strategies, the automation, and the deployment plan.
Have you ever been confused by the myriad of choices offered by AWS for hosting a website or an API?
Lambda, Elastic Beanstalk, Lightsail, Amplify, S3 (and more!) can each host websites + APIs. But which one should we choose?
Which one is cheapest? Which one is fastest? Which one will scale to meet our needs?
Join me in this session as we dive into each AWS hosting service to determine which one is best for your scenario and explain why!
Unlock the Future of Search with MongoDB Atlas: Vector Search Unleashed (Malak Abu Hammad)
Discover how MongoDB Atlas and vector search technology can revolutionize your application's search capabilities. This comprehensive presentation covers:
* What is Vector Search?
* Importance and benefits of vector search
* Practical use cases across various industries
* Step-by-step implementation guide
* Live demos with code snippets
* Enhancing LLM capabilities with vector search
* Best practices and optimization strategies
Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications.
#MongoDB #VectorSearch #AI #SemanticSearch #TechInnovation #DataScience #LLM #MachineLearning #SearchTechnology
Programming Foundation Models with DSPy - Meetup Slides (Zilliz)
Prompting language models is hard, while programming language models is easy. In this talk, I will discuss the state-of-the-art framework DSPy for programming foundation models with its powerful optimizers and runtime constraint system.
HCL Notes and Domino License Cost Reduction in the World of DLAU (panagenda)
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-and-domino-license-cost-reduction-in-the-world-of-dlau/
The introduction of DLAU and the CCB & CCX licensing model caused quite a stir in the HCL community. As a Notes and Domino customer, you may have faced challenges with unexpected user counts and license costs. You probably have questions on how this new licensing approach works and how to benefit from it. Most importantly, you likely have budget constraints and want to save money where possible. Don’t worry, we can help with all of this!
We’ll show you how to fix common misconfigurations that cause higher-than-expected user counts, and how to identify accounts which you can deactivate to save money. There are also frequent patterns that can cause unnecessary cost, like using a person document instead of a mail-in for shared mailboxes. We’ll provide examples and solutions for those as well. And naturally we’ll explain the new licensing model.
Join HCL Ambassador Marc Thomas in this webinar with a special guest appearance from Franz Walder. It will give you the tools and know-how to stay on top of what is going on with Domino licensing. You will be able to lower your costs through an optimized configuration and keep them low going forward.
These topics will be covered
- Reducing license cost by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to best utilize it
- Tips for common problem areas, like team mailboxes, functional/test users, etc.
- Practical examples and best practices to implement right away
Introduction of Cybersecurity with OSS at Code Europe 2024 (Hiroshi SHIBATA)
I develop the Ruby programming language, RubyGems, and Bundler, which are package managers for Ruby. Today, I will introduce how to enhance the security of your application using open-source software (OSS) examples from Ruby and RubyGems.
The first topic is CVE (Common Vulnerabilities and Exposures). I have published CVEs many times. But what exactly is a CVE? I'll provide a basic understanding of CVEs and explain how to detect and handle vulnerabilities in OSS.
Next, let's discuss package managers. Package managers play a critical role in the OSS ecosystem. I'll explain how to manage library dependencies in your application.
I'll share insights into how the Ruby and RubyGems core team works to keep our ecosystem safe. By the end of this talk, you'll have a better understanding of how to safeguard your code.
Building Production Ready Search Pipelines with Spark and Milvus (Zilliz)
Spark is the widely used ETL tool for processing, indexing and ingesting data to serving stack for search. Milvus is the production-ready open-source vector database. In this talk we will show how to use Spark to process unstructured data to extract vector representations, and push the vectors to Milvus vector database for search serving.
Driving Business Innovation: Latest Generative AI Advancements & Success Story (Safe Software)
Are you ready to revolutionize how you handle data? Join us for a webinar where we’ll bring you up to speed with the latest advancements in Generative AI technology and discover how leveraging FME with tools from giants like Google Gemini, Amazon, and Microsoft OpenAI can supercharge your workflow efficiency.
During the hour, we’ll take you through:
Guest Speaker Segment with Hannah Barrington: Dive into the world of dynamic real estate marketing with Hannah, the Marketing Manager at Workspace Group. Hear firsthand how their team generates engaging descriptions for thousands of office units by integrating diverse data sources—from PDF floorplans to web pages—using FME transformers, like OpenAIVisionConnector and AnthropicVisionConnector. This use case will show you how GenAI can streamline content creation for marketing across the board.
Ollama Use Case: Learn how Scenario Specialist Dmitri Bagh has utilized Ollama within FME to input data, create custom models, and enhance security protocols. This segment will include demos to illustrate the full capabilities of FME in AI-driven processes.
Custom AI Models: Discover how to leverage FME to build personalized AI models using your data. Whether it’s populating a model with local data for added security or integrating public AI tools, find out how FME facilitates a versatile and secure approach to AI.
We’ll wrap up with a live Q&A session where you can engage with our experts on your specific use cases, and learn more about optimizing your data workflows with AI.
This webinar is ideal for professionals seeking to harness the power of AI within their data management systems while ensuring high levels of customization and security. Whether you're a novice or an expert, gain actionable insights and strategies to elevate your data processes. Join us to see how FME and AI can revolutionize how you work with data!
Ocean Lotus threat actors project by John Sitima 2024 (SitimaJohn)
Ocean Lotus cyber threat actors represent a sophisticated, persistent, and politically motivated group that poses a significant risk to organizations and individuals in the Southeast Asian region. Their continuous evolution and adaptability underscore the need for robust cybersecurity measures and international cooperation to identify and mitigate the threats posed by such advanced persistent threat groups.
Generating privacy-protected synthetic data using Secludy and Milvus (Zilliz)
During this demo, the founders of Secludy will demonstrate how their system utilizes Milvus to store and manipulate embeddings for generating privacy-protected synthetic data. Their approach not only maintains the confidentiality of the original data but also enhances the utility and scalability of LLMs under privacy constraints. Attendees, including machine learning engineers, data scientists, and data managers, will witness first-hand how Secludy's integration with Milvus empowers organizations to harness the power of LLMs securely and efficiently.
Fueling AI with Great Data with Airbyte Webinar (Zilliz)
This talk will focus on how to collect data from a variety of sources, leveraging this data for RAG and other GenAI use cases, and finally charting your course to productionalization.
Main news related to the CCS TSI 2023 (2023/1695) (Jakub Marek)
An English 🇬🇧 translation of the presentation accompanying the speech I gave about the main changes brought by CCS TSI 2023 at the biggest Czech conference on communications and signalling systems on railways, held at the Clarion Hotel Olomouc from 7th to 9th November 2023 (konferenceszt.cz). Attended by around 500 participants and 200 online followers.
The original Czech 🇨🇿 version of the presentation can be found here: https://www.slideshare.net/slideshow/hlavni-novinky-souvisejici-s-ccs-tsi-2023-2023-1695/269688092 .
The videorecording (in Czech) from the presentation is available here: https://youtu.be/WzjJWm4IyPk?si=SImb06tuXGb30BEH .
UiPath Test Automation using UiPath Test Suite series, part 6 (DianaGray10)
Welcome to UiPath Test Automation using UiPath Test Suite series part 6. In this session, we will cover Test Automation with generative AI and Open AI.
The UiPath Test Automation with generative AI and OpenAI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI with OpenAI's advanced natural language processing capabilities for test automation.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and Open AI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices want to take full advantage of the features available on those devices, but many of those features trade security for convenience and capability. This best practices guide outlines steps users can take to better protect personal devices and information.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/building-and-scaling-ai-applications-with-the-nx-ai-manager-a-presentation-from-network-optix/
Robin van Emden, Senior Director of Data Science at Network Optix, presents the “Building and Scaling AI Applications with the Nx AI Manager,” tutorial at the May 2024 Embedded Vision Summit.
In this presentation, van Emden covers the basics of scaling edge AI solutions using the Nx tool kit. He emphasizes the process of developing AI models and deploying them globally. He also showcases the conversion of AI models and the creation of effective edge AI pipelines, with a focus on pre-processing, model conversion, selecting the appropriate inference engine for the target hardware and post-processing.
van Emden shows how Nx can simplify the developer’s life and facilitate a rapid transition from concept to production-ready applications. He provides valuable insights into developing scalable and efficient edge AI solutions, with a strong focus on practical implementation.
20. @gamussa | #confluentvug | @confluentinc
The log is a simple idea
Messages are added at the end of the log
22. @gamussa | #confluentvug | @confluentinc
Consumers have a position all of their own
[Diagram: Robin, Viktor, and Ricardo each scan the log from their own position between the old and new ends]
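The two ideas above (an append-only log, and consumers that each keep their own position) can be sketched in a few lines of plain Java. This is a toy model, not the Kafka client API; the class and method names here are invented for illustration:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of a Kafka-style log: messages are only ever appended at
// the end, and each consumer tracks its own read position (offset).
class ToyLog {
    private final List<String> messages = new ArrayList<>();
    private final Map<String, Integer> offsets = new HashMap<>();

    void append(String message) {          // producers write at the "new" end
        messages.add(message);
    }

    String poll(String consumer) {         // reading advances only this consumer's offset
        int pos = offsets.getOrDefault(consumer, 0);
        if (pos >= messages.size()) {
            return null;                   // this consumer is caught up
        }
        offsets.put(consumer, pos + 1);
        return messages.get(pos);
    }
}
```

Robin, Viktor, and Ricardo can each poll at their own pace; one slow consumer never blocks the others, because reading only moves that consumer's own offset.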
28. @gamussa | #confluentvug | @confluentinc
Shard data to get scalability
[Diagram: Producers (1), (2), and (3) send messages to different partitions; partitions live on different machines in a cluster]
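The sharding above hinges on a deterministic key-to-partition mapping, so records with the same key always land on the same partition. A minimal sketch follows; note that Kafka's built-in DefaultPartitioner hashes keys with murmur2 rather than String.hashCode, and the class name here is invented:

```java
// Minimal keyed-partitioner sketch: hash the key, mask the sign bit so
// negative hash codes still map into range, then take it modulo the
// partition count. Same key, same partition, every time.
class ToyPartitioner {
    static int partitionFor(String key, int numPartitions) {
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }
}
```

Because the mapping depends only on the key and the partition count, every producer machine computes the same partition for a given key, which is what lets the cluster scale out without coordination.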
29. @gamussa | #confluentvug | @confluentinc
import java.time.Duration;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;

// in-memory store, not persistent
Map<String, Integer> groupByCounts = new HashMap<>();
try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProperties());
     KafkaProducer<String, Integer> producer = new KafkaProducer<>(producerProperties())) {
    consumer.subscribe(Arrays.asList("A", "B"));
    while (true) { // consumer poll loop
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
        for (ConsumerRecord<String, String> record : records) {
            String key = record.key();
            Integer count = groupByCounts.get(key);
            if (count == null) {
                count = 0;
            }
            count += 1;
            groupByCounts.put(key, count);
        }
    }
}
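Stripped of the Kafka client, the grouping logic in the loop above is just a per-key counter, and the get/null-check/put dance collapses into a single Map.merge call. A standalone sketch (class and method names invented for illustration):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// The per-key counting step from the poll loop, isolated from Kafka.
class KeyCounter {
    static Map<String, Integer> countByKey(List<String> keys) {
        Map<String, Integer> counts = new HashMap<>();
        for (String key : keys) {
            counts.merge(key, 1, Integer::sum);  // replaces get/null-check/put
        }
        return counts;
    }
}
```

As the slide's comment warns, a HashMap like this lives only in memory: if the process restarts, the counts are gone. That durability problem is what Kafka Streams addresses with its changelog-backed state stores.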
71. @gamussa | #confluentvug | @confluentinc
THE CONFLUENT MEETUP HUB (CNFL.IO/MEETUP-HUB): all upcoming meetups, new event email alerts, videos of past meetups, and slides from the talks
72. @gamussa | #confluentvug | @confluentinc
Confluent Community Slack
A vibrant community of over 16,000 members. Come along and discuss Apache Kafka and Confluent Platform on dedicated channels including #ksqlDB, #connect, #clients, and more.
http://cnfl.io/slack
73. @gamussa | #confluentvug | @confluentinc
Free eBooks (http://cnfl.io/book-bundle):
Designing Event-Driven Systems - Ben Stopford
Kafka: The Definitive Guide - Neha Narkhede, Gwen Shapira, Todd Palino
Making Sense of Stream Processing - Martin Kleppmann
I ❤ Logs - Jay Kreps