This document presents four reasons to use a cloud-native Kafka service such as Confluent Cloud instead of managing Kafka yourself. Managing Kafka requires a significant investment of time and resources for tasks like architecture planning, cluster sizing, and software upgrades, whereas a cloud-native service handles that operational overhead so you can focus on your core business. Confluent Cloud specifically offers elastic scaling, infinite data retention, global access across clouds, and integrations that make it a complete data streaming platform.
What is Apache Kafka and What is an Event Streaming Platform? | confluent
Speaker: Gabriel Schenker, Lead Curriculum Developer, Confluent
Streaming platforms have emerged as a popular, new trend, but what exactly is a streaming platform? Part messaging system, part Hadoop made fast, part fast ETL and scalable data integration. With Apache Kafka® at the core, event streaming platforms offer an entirely new perspective on managing the flow of data. This talk will explain what an event streaming platform such as Apache Kafka is and some of the use cases and design patterns around its use—including several examples of where it is solving real business problems. New developments in this area such as KSQL will also be discussed.
Achieve Sub-Second Analytics on Apache Kafka with Confluent and Imply | confluent
Presenters: Rachel Pedreschi, Senior Director, Solutions Engineering, Imply.io + Josh Treichel, Partner Solutions Architect, Confluent
Analytic pipelines running purely on batch processing systems can suffer from hours of data lag, resulting in accuracy issues with analysis and overall decision-making. Join us for a demo to learn how easy it is to integrate your Apache Kafka® streams into Apache Druid (incubating) to provide real-time insights into the data.
In this online talk, you’ll hear about ingesting your Kafka streams into Imply’s scalable analytic engine and gaining real-time insights via a modern user interface.
Register now to learn about:
-The benefits of combining a real-time streaming platform with a comprehensive analytics stack
-Building an analytics pipeline by integrating Confluent Platform and Imply
-How KSQL, streaming SQL for Kafka, can easily transform and filter streams of data in real time (see the sketch after this list)
-Querying and visualizing streaming data in Imply
-Practical ways to implement Confluent Platform and Imply to address common use cases such as analyzing network flows, collecting and monitoring IoT data and visualizing clickstream data
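To ground the KSQL bullet above, here is a minimal Kafka Streams sketch in Java of the same kind of per-record filtering and transformation; KSQL compiles its SQL down to this DSL. The topic names, broker address, and filter rule are invented for illustration, not taken from the talk.

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class ClickFilter {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "click-filter");       // app id doubles as consumer group
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // assumed broker address
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> clicks = builder.stream("clicks");            // hypothetical input topic
        clicks
            .filter((user, page) -> page != null && !page.startsWith("/health")) // drop health-check noise
            .mapValues(page -> page.toLowerCase())                               // simple per-record transform
            .to("clicks_filtered");                                              // hypothetical output topic

        new KafkaStreams(builder.build(), props).start();
    }
}
```

In KSQL itself this would be a single persistent CREATE STREAM ... AS SELECT query over the input stream.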
Confluent Platform, developed by the creators of Kafka, enables the ingestion and processing of massive amounts of real-time event data. Imply, the complete analytics stack built on Druid, can ingest, store, query, and visualize streaming data from Confluent Platform, enabling end-to-end real-time analytics. Together, Confluent and Imply provide low-latency data delivery, transformation, and querying capabilities to power a range of use cases.
New Features in Confluent Platform 6.0 / Apache Kafka 2.6 | Kai Wähner
New features in Confluent Platform 6.0 / Apache Kafka 2.6, including the REST Proxy and API, Tiered Storage for AWS S3 and GCP GCS, Cluster Linking (on-premise, edge, hybrid, multi-cloud), Self-Balancing Clusters, and ksqlDB.
Death of the dumb pipes: Using Apache Kafka® for Integration projects | HostedbyConfluent
Guru Sattanathan, Confluent, Senior Solutions Engineer
Enterprise integration technologies (aka middleware) are the key enablers of real-time data flows and event-driven architecture. From real-time payments and e-commerce to travel booking systems, everything is powered by middleware underneath. Middleware has transformed a lot of things, but with caveats!
Are ESBs and MQs enough for today's integration needs? Do you know their technical debt?
If you are looking at integrating your applications, or you are an Integration Architect, this session is for you. It's time to refresh your knowledge and see how organizations are building integrations today.
In this session, we will cover, in order:
-Recap on Enterprise Integration technologies
-What are the key flaws & What needs improvement?
-What is Apache Kafka?
-Rethinking Integration using Apache Kafka
https://www.meetup.com/KafkaMelbourne/events/280590162/
Cloud native Kafka | Sascha Holtbruegge and Margaretha Erber, HiveMQ | HostedbyConfluent
Joins in Kafka Streams and ksqlDB are a killer feature for data processing, and basic join semantics are well understood. However, in a streaming world, records are associated with timestamps that impact the semantics of joins: welcome to the fabulous world of _temporal_ join semantics. For joins, timestamps are as important as the actual data, and it is important to understand how they impact the join result.
In this talk we deep-dive into the different types of joins, with a focus on their temporal aspect. Furthermore, we relate the individual join operators to the overall "time engine" of the Kafka Streams query runtime and explain its relationship to operator semantics. To help developers apply their knowledge of temporal join semantics, we provide best practices, tips and tricks to "bend" time, and configuration advice for getting the desired join results. Last, we give an overview of recent developments, and an outlook on future work, that improve joins even further.
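As a rough, assumption-laden sketch of that temporal aspect (not code from the talk): in the Kafka Streams DSL, a stream-stream join is always windowed, so record timestamps, not arrival order, decide which pairs match. The topic names and the five-minute window below are invented.

```java
import java.time.Duration;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.JoinWindows;
import org.apache.kafka.streams.kstream.KStream;

public class OrderPaymentJoin {
    static void buildTopology(StreamsBuilder builder) {
        KStream<String, String> orders = builder.stream("orders");     // hypothetical topic, keyed by order id
        KStream<String, String> payments = builder.stream("payments"); // hypothetical topic, same key space

        // Two records join only if their timestamps are at most 5 minutes apart,
        // regardless of when they actually arrive at the application.
        KStream<String, String> joined = orders.join(
            payments,
            (order, payment) -> order + " paid by " + payment,
            JoinWindows.ofTimeDifferenceWithNoGrace(Duration.ofMinutes(5)));

        joined.to("paid-orders");
    }
}
```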
Should you use traditional REST APIs to bind services together, or is it better to use a richer, more loosely coupled protocol? This talk digs into how we piece services together in event-driven systems, how we use a distributed log (event hub) to create a central, persistent history of events, and what benefits we gain from doing so. Apache Kafka is a perfect match for building such an asynchronous, loosely coupled event-driven backbone. Events trigger processing logic, which can be implemented in a traditional style as well as in a stream-processing fashion. The talk shows the difference between request-driven and event-driven communication and when to use each. It highlights how modern stream processing systems can hold state both internally and in a database, and how this state can be used to further increase the independence of other services, the primary goal of a microservices architecture.
A guide through the Azure Messaging services - Update Conference | Eldert Grootenboer
https://www.updateconference.net/en/2019/session/a-guide-through-the-azure-messaging-services
How a distributed graph analytics platform uses Apache Kafka for data ingesti... | HostedbyConfluent
Using Kafka to stream data into TigerGraph, a distributed graph database, is a common pattern in our customers' data architectures. In the TigerGraph database, the Kafka Connect framework was used to build the native S3 data loader. In TigerGraph Cloud, we will be building native integration with many data sources such as Azure Blob Storage and Google Cloud Storage, using Kafka as an integrated component of the Cloud Portal.
In this session, we will discuss both architectures: 1. the built-in Kafka Connect framework within the TigerGraph database; 2. using a Kafka cluster for cloud-native integration with other popular data sources. Demos will be provided for both data streaming processes.
Mainframe Integration, Offloading and Replacement with Apache Kafka | Kai Wae... | HostedbyConfluent
Legacy migration is a journey. Mainframes cannot be replaced in a single project. A big bang will fail. This has to be planned long-term.
Mainframe offloading and replacement with Apache Kafka and its ecosystem can be used to keep a more modern data store in real-time sync with the mainframe, while at the same time persisting the event data on the bus to enable microservices, and deliver the data to other systems such as data warehouses and search indexes.
This session walks through the different steps some companies have already gone through. Technical options like Change Data Capture (CDC), MQ, and third-party tools for mainframe integration, offloading, and replacement are explored.
Concepts and Patterns for Streaming Services with Kafka | QAware GmbH
Cloud Native Night March 2020, Mainz: Talk by Perry Krol (@perkrol, Confluent)
Abstract: Proven approaches such as service-oriented and event-driven architectures are joined by newer techniques such as microservices, reactive architectures, DevOps, and stream processing. Many of these patterns are successful by themselves, but they provide a more holistic and compelling approach when applied together. In this session Confluent will provide insights into how service-based architectures and stream processing tools such as Apache Kafka® can help you build business-critical systems. You will learn why streaming beats request-response based architectures in complex, contemporary use cases, and why replayable logs such as Kafka provide a backbone for both service communication and shared datasets.
Based on these principles, we will explore how event collaboration and event sourcing patterns increase safety and recoverability with functional, event-driven approaches, how to apply patterns including Event Sourcing and CQRS, and how to build multi-team systems with microservices and SOA using patterns such as "inside-out databases" and "event streams as a source of truth".
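As a minimal sketch of the "event streams as a source of truth" idea (topic name, broker address, and the last-event-wins fold are my assumptions): a service can rebuild a local materialized view by replaying its event topic from the beginning.

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class AccountViewRebuilder {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "account-view-rebuild");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");       // replay the full log

        Map<String, String> view = new HashMap<>();
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("account-events"));                    // hypothetical event topic
            // The view is just a fold over the event history; a real service
            // would keep polling until it has caught up to the end offsets.
            for (int i = 0; i < 10; i++) {
                for (ConsumerRecord<String, String> rec : consumer.poll(Duration.ofSeconds(1))) {
                    view.put(rec.key(), rec.value());                         // last event per key wins
                }
            }
        }
        System.out.println("Rebuilt view with " + view.size() + " accounts");
    }
}
```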
Neo4j Graph Streaming Services with Apache Kafka | jexp
In this presentation we give a high-level overview of the Neo4j-Kafka integration and the Confluent partnership.
Providing change-data-capture and ingestion capabilities as Neo4j Extension and the Kafka Connect Neo4j Sink on Confluent Hub allows you to integrate real-time streaming with graph querying and analytics.
Kafka at the Edge: an IoT scenario with OpenShift Streams for Apache Kafka | ... | Red Hat Developers
Apache Kafka is taking the world by storm and is rapidly becoming the de facto event bus for event-driven and streaming applications that respond to events and data in real time. OpenShift Streams for Apache Kafka is Red Hat's fully hosted and managed Apache Kafka service, targeting development teams that want to incorporate streaming data and scalable messaging in their applications without the burden of setting up and maintaining a Kafka cluster infrastructure.
In this session you will discover how Apache Kafka can be used in an IoT scenario to ingest data from devices and make them available in real-time to other applications.
More specifically, you will learn how to:
-Simulate devices that send MQTT messages to an MQTT broker
-Use Apache Camel and Camel K to bridge MQTT with Apache Kafka (see the sketch after this list)
-Use Kafka Streams in a Quarkus application to process the device messages
-Query the state of the devices using GraphQL
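Here is a minimal sketch of the bridging step using the Camel Java DSL with the Paho MQTT component; broker addresses and topic names are placeholders, and Camel K can run essentially the same route definition.

```java
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.main.Main;

public class MqttToKafkaBridge extends RouteBuilder {
    @Override
    public void configure() {
        // Consume device telemetry from an MQTT broker (Paho component) and
        // forward each message to a Kafka topic. All names are illustrative.
        from("paho:devices/+/telemetry?brokerUrl=tcp://localhost:1883")
            .to("kafka:device-telemetry?brokers=localhost:9092");
    }

    public static void main(String[] args) throws Exception {
        Main main = new Main();
        main.configure().addRoutesBuilder(new MqttToKafkaBridge());
        main.run(args);
    }
}
```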
Building Streaming Data Pipelines with Google Cloud Dataflow and Confluent Cl... | HostedbyConfluent
We will demonstrate how easy it is to use Confluent Cloud as the data source of your Beam pipelines. You will learn how to process the information that comes from Confluent Cloud in real time, transform it, and feed it back to your Kafka topics and other parts of your architecture.
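As a hedged sketch of what such a pipeline can look like (the bootstrap endpoint and topic are placeholders, and the SASL credentials a real Confluent Cloud cluster requires are omitted), Beam's KafkaIO source reads the topic into a PCollection that downstream transforms can process:

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.TypeDescriptors;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ConfluentCloudToBeam {
    public static void main(String[] args) {
        Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

        pipeline
            .apply(KafkaIO.<String, String>read()
                .withBootstrapServers("BOOTSTRAP.confluent.cloud:9092") // placeholder endpoint
                .withTopic("events")                                    // hypothetical topic
                .withKeyDeserializer(StringDeserializer.class)
                .withValueDeserializer(StringDeserializer.class)
                .withoutMetadata())                                     // yields key/value pairs
            .apply(MapElements.into(TypeDescriptors.strings())
                .via(kv -> kv.getValue().toUpperCase()));               // stand-in transformation

        pipeline.run();
    }
}
```

With the Dataflow runner on the classpath, the same pipeline runs on Google Cloud Dataflow via --runner=DataflowRunner.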
Stream Processing Live Traffic Data with Kafka Streams | Tom Van den Bulck
In this workshop we will set up a streaming framework that processes real-time data from traffic sensors installed within the Belgian road system.
Starting with the intake of the data, you will learn best practices and the recommended approach to split the information into events in a way that won't come back to haunt you.
With some basic stream operations (count, filter, ... ) you will get to know the data and experience how easy it is to get things done with Spring Boot & Spring Cloud Stream.
But since simple data processing is not enough to fulfill all your streaming needs, we will also let you experience the power of windows.
After this workshop, tumbling, sliding and session windows hold no more mysteries and you will be a true streaming wizard.
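For a taste of the windowing the workshop builds toward, here is a minimal tumbling-window count in the plain Kafka Streams DSL, which the Spring Cloud Stream Kafka Streams binder builds on; the topic name and keying by sensor id are invented for illustration.

```java
import java.time.Duration;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.TimeWindows;
import org.apache.kafka.streams.kstream.Windowed;

public class TrafficCounts {
    static void buildTopology(StreamsBuilder builder) {
        KStream<String, String> readings = builder.stream("traffic-sensors"); // hypothetical topic, keyed by sensor id

        // Tumbling one-minute windows: each reading falls into exactly one window,
        // unlike sliding or session windows where windows can overlap or grow.
        KTable<Windowed<String>, Long> perSensorPerMinute = readings
            .groupByKey()
            .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(1)))
            .count();

        perSensorPerMinute.toStream()
            .foreach((windowedSensor, count) ->
                System.out.println(windowedSensor + " -> " + count));
    }
}
```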
Building Event Streaming Microservices with Spring Boot and Apache Kafka | Ja... | HostedbyConfluent
Developing cloud-native microservices introduces many new challenges. One of the most difficult is building reliable microservices integrations and their data-exchange patterns. In this session I will share my 10 years of experience building microservices and application runtime platforms with some of the largest European organisations. I will introduce the basic principles of developing Java Spring Boot applications with Apache Kafka. These patterns can be used for decoupling microservices communication, implementing microservices state stores, and avoiding dependencies on traditional database systems.
This session is targeted for developers who are interested in learning new cloud native development practices and understanding how event streaming microservices improve their current work. Demo application code will be available to participants.
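As a minimal, hypothetical sketch of the kind of Spring Boot consumer such patterns start from (topic and group names are invented; the speaker's demo code is separate), Spring for Apache Kafka turns a listener method into a decoupled event handler:

```java
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;

@SpringBootApplication
public class OrderServiceApplication {
    public static void main(String[] args) {
        SpringApplication.run(OrderServiceApplication.class, args);
    }
}

@Component
class OrderEventsListener {
    // Invoked for every record on the topic; the producing service needs no
    // knowledge of this consumer, which is the decoupling the session describes.
    @KafkaListener(topics = "order-events", groupId = "order-service")
    void onOrderEvent(String event) {
        System.out.println("received: " + event);
    }
}
```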
Best Practices for Streaming IoT Data with MQTT and Apache Kafka® | confluent
Watch this talk here: https://www.confluent.io/online-talks/best-practices-for-streaming-iot-data-with-MQTT-and-apache-kafka-on-demand
Organizations today are looking to stream IoT data to Apache Kafka. However, connecting tens of thousands or even millions of devices over unreliable networks can create some architecture challenges.
In this session, we will identify and demo some best practices for implementing a large scale IoT system that can stream MQTT messages to Apache Kafka.
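One plausible ingredient of such a setup, offered as an assumption rather than the talk's specific recommendations: a producer configured to ride out flaky networks, where idempotence plus a long delivery timeout lets the client retry safely without duplicating messages.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ResilientIotProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.ACKS_CONFIG, "all");                  // wait for full replication
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);     // retries cannot create duplicates
        props.put(ProducerConfig.DELIVERY_TIMEOUT_MS_CONFIG, 300_000); // keep retrying for up to 5 minutes
        props.put(ProducerConfig.LINGER_MS_CONFIG, 100);               // batch many small IoT messages

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("device-telemetry", "sensor-1", "{\"temp\":21.5}"));
        }
    }
}
```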
Enhancing Apache Kafka for Large Scale Real-Time Data Pipeline at Tencent | K... | HostedbyConfluent
In this session we share our experience building a real-time data pipeline at Tencent PCG - one that handles 20 trillion daily messages across 700 clusters, with 100 Gb/s of bursting traffic from a single app. We discuss our roadmap for enhancing Kafka to break its limits in terms of scalability, robustness, and cost of operation.
We first built a proxy layer that aggregates physical clusters in a way that is agnostic to the clients. While this architecture solves many operational problems, it requires significant development to stay future-proof. After retrospectives with our customers and careful study of ongoing work in the community, we designed a region-federation solution in the broker layer, which allows us to deploy clusters at a much larger scale than previously possible while providing better failure recovery and operability. We discuss how we make this development compatible with KIP-500 and KIP-405, and the two KIPs (693 and 694) that we submitted for discussion.
Now You See Me, Now You Compute: Building Event-Driven Architectures with Apa... | Michael Noll
Talk URL: https://conferences.oreilly.com/strata/strata-ny/public/schedule/detail/77360
Abstract: Would you cross the street with traffic information that's a minute old? Certainly not. Modern businesses have the same needs nowadays, whether due to competitive pressure or because their customers have much higher expectations of how they want to interact with a product or service. At the heart of this movement are events: in today's digital age, events are everywhere. Every digital action, from online purchases to ride-sharing requests to bank deposits, creates a set of events around transaction amount, transaction time, user location, account balance, and much more. The technology that allows businesses to read, write, store, and process these events in real time is the event-streaming platform, and tens of thousands of companies like Netflix, Audi, PayPal, Airbnb, Uber, and Pinterest have picked Apache Kafka as the de facto choice to implement event-driven architectures and reshape their industries.
Michael Noll explores why and how you can use Apache Kafka and its growing ecosystem to build event-driven architectures that are elastic, scalable, robust, and fault tolerant, whether it’s on-premises, in the cloud, on bare metal machines, or in Kubernetes with Docker containers. Specifically, you’ll look at Kafka as the storage and publish and subscribe layer; Kafka’s Connect framework for integrating external data systems such as MySQL, Elastic, or S3 with Kafka; and Kafka’s Streams API and KSQL as the compute layer to implement event-driven applications and microservices in Java and Scala and streaming SQL, respectively, that process the events flowing through Kafka in real time. Michael provides an overview of the most relevant functionality, both current and upcoming, and shares best practices and typical use cases so you can tie it all together for your own needs.
Kafka in Context, Cloud, & Community (Simon Elliston Ball, Cloudera) Kafka Su... | HostedbyConfluent
Kafka has fast become the center of streaming analytics applications in the modern digital enterprise. Kafka operates in the context of a broad ecosystem of data lifecycle components which need a consistent platform of security, monitoring, management, and governance. This problem becomes paramount when your streaming architectures go hybrid by spanning from on-premises to the cloud. Throw in the reality of the multi-cloud setup that a lot of enterprises are facing, and now you have a complex streaming architecture that is difficult to operationally manage, monitor, secure, or govern.
Cloudera remains committed to an open community driven approach and increasing the ease of use and visibility for Kafka based solutions. Attend this session to understand more about how streaming architectures can be extended easily to the hybrid cloud and multi-cloud. Also, learn about our plans for further community contributions.
Building a Modern, Scalable Cyber Intelligence Platform with Apache Kafka | J... | HostedbyConfluent
As cyber threats continuously grow in sophistication and frequency, companies need to adapt quickly to effectively detect and respond to threats and protect their environments. At Intel, we've addressed this need by implementing a modern, scalable Cyber Intelligence Platform (CIP) based on Splunk and Apache Kafka. We believe that CIP positions us for the best defense against cyber threats well into the future.
Our CIP ingests tens of terabytes of data each day and transforms it into actionable insights through streams processing, context-smart applications, and advanced analytics techniques. Kafka serves as a massive data pipeline within the platform. It provides us the ability to operate on data in-stream, enabling us to reduce Mean Time to Detect (MTTD) and Mean Time to Respond (MTTR). Faster detection and response ultimately leads to better prevention.
In our session, we'll discuss the details described in the IT@Intel white paper of the same title, published in November 2020.
Build a Bridge to Cloud with Apache Kafka® for Data Analytics Cloud Services | confluent
Build a Bridge to Cloud with Apache Kafka® for Data Analytics Cloud Services, Perry Krol, Head of Systems Engineering, CEMEA, Confluent
https://www.meetup.com/Frankfurt-Apache-Kafka-Meetup-by-Confluent/events/269751169/
Mind the App: How to Monitor Your Kafka Streams Applications | Bruno Cadonna,... | HostedbyConfluent
You cannot operate what you cannot measure. In this talk, I am going to present the built-in metrics framework of Kafka Streams that supports monitoring Kafka Streams applications. You will learn how to set up monitoring of metrics for your Kafka Streams applications, and you will hear about the following recent improvements to the metrics framework that aim to extend and simplify monitoring. KIP-444 aims to simplify and extend the built-in metrics framework. The RocksDB metrics introduced in KIP-471 and KIP-607 allow you to look directly into the built-in persistent state stores of your Kafka Streams applications. Finally, KIP-613 specifies metrics that measure end-to-end latencies in your applications. This talk will help you collect intel about the behavior of your Kafka Streams applications and allow you to reason about their deployment. In the end, you will be able to better understand your applications and run them in a more robust manner.
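For orientation, the metrics framework the talk covers is exposed programmatically through KafkaStreams#metrics(); the sketch below simply dumps every metric, where a real deployment would export them to a monitoring system instead (the output format is my own).

```java
import java.util.Map;
import org.apache.kafka.common.Metric;
import org.apache.kafka.common.MetricName;
import org.apache.kafka.streams.KafkaStreams;

public class StreamsMetricsDump {
    // Prints every built-in metric the running topology currently exposes,
    // including the RocksDB and end-to-end latency metrics from the KIPs above
    // once the application runs a version that ships them.
    static void dumpMetrics(KafkaStreams streams) {
        for (Map.Entry<MetricName, ? extends Metric> entry : streams.metrics().entrySet()) {
            MetricName name = entry.getKey();
            System.out.printf("%s [%s] = %s%n",
                name.name(), name.group(), entry.getValue().metricValue());
        }
    }
}
```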
Streaming Time Series Data With Kenny Gorman and Elena Cuevas | Current 2022 | HostedbyConfluent
Modern streaming use cases are generating massive amounts of data - much of it needs to be organized and queried over time. The sheer amount and complexity of this problem presents new challenges for data engineers and developers alike.
To solve this problem, Apache Kafka and MongoDB Time Series collections are a powerful combination. In this talk, Kenny Gorman and Elena Cuevas will present how Apache Kafka on Confluent Cloud can stream massive amounts of data to Time Series collections via the MongoDB Connector for Apache Kafka. Elena and Kenny will discuss the required configuration details and critical components of Confluent Cloud and MongoDB Atlas, as well as some tips, tricks, and best practices.
You will leave armed with the knowledge of how Confluent Cloud, Apache Kafka, MongoDB Atlas, and Time Series collections fit into your event-driven architecture.
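As a small illustrative sketch (database, collection, and connection string are placeholders): the kind of Time Series collection the MongoDB Connector for Apache Kafka would sink records into can be created with the MongoDB Java driver as follows.

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoDatabase;
import com.mongodb.client.model.CreateCollectionOptions;
import com.mongodb.client.model.TimeSeriesOptions;

public class CreateTimeSeriesCollection {
    public static void main(String[] args) {
        // Placeholder Atlas connection string; never hard-code real credentials.
        try (MongoClient client = MongoClients.create("mongodb+srv://user:pass@cluster.example.mongodb.net")) {
            MongoDatabase db = client.getDatabase("telemetry");
            db.createCollection("sensor_readings",
                new CreateCollectionOptions().timeSeriesOptions(
                    new TimeSeriesOptions("ts")        // required: the time field in each document
                        .metaField("sensorId")));      // optional: metadata field used for bucketing
        }
    }
}
```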
Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ... | HostedbyConfluent
Apache Kafka users who want to leverage Google Cloud Platform's (GCP's) data analytics platform and open source hosting capabilities can bridge their existing Kafka infrastructure on-premises or in other clouds to GCP using Confluent's replicator tool and managed Kafka service on GCP. Using actual customer examples and a reference architecture, we'll showcase how existing Kafka users can stream data to GCP and use it in popular tools like Apache Beam on Dataflow, BigQuery, Google Cloud Storage (GCS), Spark on Dataproc, and TensorFlow for data warehousing, data processing, data storage, and advanced analytics using AI and ML.
Best Practices for Building Hybrid-Cloud Architectures | Hans Jespersen | confluent
Afternoon opening presentation during Confluent’s streaming event in Paris, presented by Hans Jespersen, VP WW Systems Engineering at Confluent.
Build & Deploy Scalable Cloud Applications in Record Time | RightScale
RightScale Webinar: August 11, 2009 - Watch this webinar to see a hands-on demonstration of WaveMaker Visual Ajax Studio and Rapid Deployment Framework that illustrates how easy it is to build your app in WaveMaker. We demonstrate the one-button push from WaveMaker to deploying your application on the cloud with the RightScale Cloud Management Platform. From there we show you how easy it is to manage, automate, and scale your application running on the cloud.
Should you be getting more from your data? If you've answered yes, perhaps you're exploring how you can power new analytics and apps by streaming data from on-premises, open source, and hybrid cloud environments to your desired cloud endpoints (e.g., Cosmos DB and/or Synapse) in real time. Getting your data from point A to point B can be expensive, time-consuming, and complex. Fortunately, there's a much easier way. Join Microsoft's Kal Yella and Luciano Moreira, and Confluent's Jacob Bogie, to learn how you can connect multi-cloud and hybrid data to the Azure cloud, reducing the complexity and cost associated with building real-time applications and analytics in the cloud.
Confluent Partner Tech Talk with Synthesis | confluent
A discussion of the arduous planning process and a deep dive into the design and architectural decisions.
Learn more about the networking, the RBAC strategies, the automation, and the deployment plan.
Data Streaming with Apache Kafka & MongoDB | confluent
Explore the use-cases and architecture for Apache Kafka, and how it integrates with MongoDB to build sophisticated data-driven applications that exploit new sources of data.
Qlik and Confluent Success Stories with Kafka - How Generali and Skechers Kee... | HostedbyConfluent
Converting production databases into live data streams for Apache Kafka can be labor-intensive and costly. As Kafka architectures grow, complexity also rises as data teams configure clusters for redundancy, partitions for performance, and consumer groups for correlated analytics processing. In this breakout session, you'll hear data streaming success stories from Generali and Skechers that leverage Qlik Data Integration and Confluent. You'll discover how Qlik's data integration platform lets organizations automatically produce real-time transaction streams into Kafka, Confluent Platform, or Confluent Cloud, deliver faster business insights from data, and enable streaming analytics as well as streaming ingestion for modern analytics. Learn how these customers use Qlik and Confluent to:
-Turn databases into live data feeds
-Simplify and automate the real-time data streaming process
-Accelerate data delivery to enable real-time analytics
Learn how Skechers and Generali breathe new life into data in the cloud and stay ahead of changing demands, while lowering over-reliance on resources, production time, and costs.
Bridge to Cloud: Using Apache Kafka to Migrate to AWS | confluent
Watch this talk here: https://www.confluent.io/online-talks/bridge-to-cloud-apache-kafka-migrate-aws
Speakers: Priya Shivakumar, Director of Product, Confluent + Konstantine Karantasis, Software Engineer, Confluent + Rohit Pujari, Partner Solutions Architect, AWS
Most companies start their cloud journey with a new use case or a new application. Sometimes these applications can run independently in the cloud, but often they need data from the on-premises datacenter. Existing applications will slowly migrate, but they will need a strategy and the technology to enable a multi-year migration.
In this session, we will share how companies around the world are using Confluent Cloud, a fully managed Apache Kafka service, to migrate to AWS. By implementing a central-pipeline architecture using Apache Kafka to sync on-prem and cloud deployments, companies can accelerate migration times and reduce costs.
In this online talk we will cover:
•How to take the first step in migrating to AWS
•How to reliably sync your on-premises applications using a persistent bridge to cloud
•Learn how Confluent Cloud can make this daunting task simple, reliable and performant
•See a demo of the hybrid-cloud and multi-region deployment of Apache Kafka
IBM’s Steve Barbieri and Chad Holliday show how enterprise customers are using blueprints to develop their infrastructure and application layers across different cloud environments - helping them "make the move to cloud" in 2017.
Similar to Why Cloud-Native Kafka Matters: 4 Reasons to Stop Managing it Yourself
Architecture, Products, and Total Cost of Ownership of the Leading Machine Le... | DATAVERSITY
Organizations today need a broad set of enterprise data cloud services with key data functionality to modernize applications and utilize machine learning. They need a comprehensive platform designed to address multi-faceted needs by offering multi-function data management and analytics to solve the enterprise’s most pressing data and analytic challenges in a streamlined fashion.
In this research-based session, I’ll discuss what the components are in multiple modern enterprise analytics stacks (i.e., dedicated compute, storage, data integration, streaming, etc.) and focus on total cost of ownership.
A complete machine learning infrastructure cost for the first modern use case at a midsize to large enterprise will be anywhere from $3 million to $22 million. Get this data point as you take the next steps on your journey into the highest spend and return item for most companies in the next several years.
Data at the Speed of Business with Data Mastering and Governance | DATAVERSITY
Do you ever wonder how data-driven organizations fuel analytics, improve customer experience, and accelerate business productivity? They are successful by governing and mastering data effectively so they can get trusted data to those who need it faster. Efficient data discovery, mastering, and democratization are critical for swiftly linking accurate data with business consumers. When business teams can quickly and easily locate, interpret, trust, and apply data assets to support sound business judgment, it takes less time to see value.
Join data mastering and data governance experts from Informatica—plus a real-world organization empowering trusted data for analytics—for a lively panel discussion. You’ll hear more about how a single cloud-native approach can help global businesses in any economy create more value—faster, more reliably, and with more confidence—by making data management and governance easier to implement.
What is data literacy? Which organizations, and which workers in those organizations, need to be data-literate? There are seemingly hundreds of definitions of data literacy, along with almost as many opinions about how to achieve it.
In a broader perspective, companies must consider whether data literacy is an isolated goal or one component of a broader learning strategy to address skill deficits. How does data literacy compare to other types of skills or “literacy” such as business acumen?
This session will position data literacy in the context of other worker skills as a framework for understanding how and where it fits and how to advocate for its importance.
Building a Data Strategy – Practical Steps for Aligning with Business Goals | DATAVERSITY
Developing a Data Strategy for your organization can seem like a daunting task – but it’s worth the effort. Getting your Data Strategy right can provide significant value, as data drives many of the key initiatives in today’s marketplace – from digital transformation, to marketing, to customer centricity, to population health, and more. This webinar will help demystify Data Strategy and its relationship to Data Architecture and will provide concrete, practical ways to get started.
Uncover how your business can save money and find new revenue streams.
Driving profitability is a top priority for companies globally, especially in uncertain economic times. It's imperative that companies reimagine growth strategies and improve process efficiencies to help cut costs and drive revenue – but how?
By leveraging data-driven strategies layered with artificial intelligence, companies can achieve untapped potential and help their businesses save money and drive profitability.
In this webinar, you'll learn:
- How your company can leverage data and AI to reduce spending and costs
- Ways you can monetize data and AI and uncover new growth strategies
- How different companies have implemented these strategies to achieve cost optimization benefits
Data Catalogs Are the Answer – What Is the Question? | DATAVERSITY
Organizations with governed metadata made available through their data catalog can answer questions their people have about the organization’s data. These organizations get more value from their data, protect their data better, gain improved ROI from data-centric projects and programs, and have more confidence in their most strategic data.
Join Bob Seiner for this lively webinar where he will talk about the value of a data catalog and how to build the use of the catalog into your stewards’ daily routines. Bob will share how the tool must be positioned for success and viewed as a must-have resource that is a steppingstone and catalyst to governed data across the organization.
In this webinar, Bob will focus on:
-Selecting the appropriate metadata to govern
-The business and technical value of a data catalog
-Building the catalog into people’s routines
-Positioning the data catalog for success
-Questions the data catalog can answer
Because every organization produces and propagates data as part of their day-to-day operations, data trends are becoming more and more important in the mainstream business world’s consciousness. For many organizations in various industries, though, comprehension of this development begins and ends with buzzwords: “Big Data,” “NoSQL,” “Data Scientist,” and so on. Few realize that all solutions to their business problems, regardless of platform or relevant technology, rely to a critical extent on the data model supporting them. As such, data modeling is not an optional task for an organization’s data effort, but rather a vital activity that facilitates the solutions driving your business. Since quality engineering/architecture work products do not happen accidentally, the more your organization depends on automation, the more important the data models driving the engineering and architecture activities of your organization. This webinar illustrates data modeling as a key activity upon which so much technology and business investment depends.
Specific learning objectives include:
- Understanding what types of challenges require data modeling to be part of the solution
- How automation requires standardization achievable via data modeling techniques
- Why only a working partnership between data and the business can produce useful outcomes
Analytics play a critical role in supporting strategic business initiatives. Despite the obvious value to analytic professionals of providing the analytics for these initiatives, many executives question the economic return of analytics as well as data lakes, machine learning, master data management, and the like.
Technology professionals need to calculate and present business value in terms business executives can understand. Unfortunately, most IT professionals lack the knowledge required to develop comprehensive cost-benefit analyses and return on investment (ROI) measurements.
This session provides a framework to help technology professionals research, measure, and present the economic value of a proposed or existing analytics initiative, no matter what form the business benefit takes. The session will provide practical advice about how to calculate ROI, the formulas involved, and how to collect the necessary information.
How a Semantic Layer Makes Data Mesh Work at Scale | DATAVERSITY
Data Mesh is a trending approach to building a decentralized data architecture by leveraging a domain-oriented, self-service design. However, the pure definition of Data Mesh lacks a center of excellence or central data team and doesn’t address the need for a common approach for sharing data products across teams. The semantic layer is emerging as a key component to supporting a Hub and Spoke style of organizing data teams by introducing data model sharing, collaboration, and distributed ownership controls.
This session will explain how data teams can define common models and definitions with a semantic layer to decentralize analytics product creation using a Hub and Spoke architecture.
Attend this session to learn about:
- The role of a Data Mesh in the modern cloud architecture.
- How a semantic layer can serve as the binding agent to support decentralization.
- How to drive self service with consistency and control.
Enterprise data literacy. A worthy objective? Certainly! A realistic goal? That remains to be seen. As companies consider investing in data literacy education, questions arise about its value and purpose. While the destination – having a data-fluent workforce – is attractive, we wonder how (and if) we can get there.
Kicking off this webinar series, we begin with a panel discussion to explore the landscape of literacy, including expert positions and results from focus groups:
- why it matters,
- what it means,
- what gets in the way,
- who needs it (and how much they need),
- what companies believe it will accomplish.
In this engaging discussion about literacy, we will set the stage for future webinars to answer specific questions and feature successful literacy efforts.
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...DATAVERSITY
Change is hard, especially in response to negative stimuli or what is perceived as negative stimuli. So organizations need to reframe how they think about data privacy, security and governance, treating them as value centers to 1) ensure enterprise data can flow where it needs to, 2) prevent – not just react to – internal and external threats, and 3) comply with data privacy and security regulations.
Working together, these roles can accelerate faster access to approved, relevant and higher quality data – and that means more successful use cases, faster speed to insights, and better business outcomes. However, both new information and tools are required to make the shift from defense to offense, reducing data drama while increasing its value.
Join us for this panel discussion with experts in these fields as they discuss:
- Recent research about where data privacy, security and governance stand
- The most valuable enterprise data use cases
- The common obstacles to data value creation
- New approaches to data privacy, security and governance
- Their advice on how to shift from a reactive to resilient mindset/culture/organization
You’ll be educated, entertained and inspired by this panel and their expertise in using the data trifecta to innovate more often, operate more efficiently, and differentiate more strategically.
Emerging Trends in Data Architecture – What’s the Next Big Thing?DATAVERSITY
With technological innovation and change occurring at an ever-increasing rate, it’s hard to keep track of what’s hype and what can provide practical value for your organization. Join this webinar to see the results of a recent DATAVERSITY survey on emerging trends in Data Architecture, along with practical commentary and advice from industry expert Donna Burbank.
Data Governance Trends - A Look Backwards and ForwardsDATAVERSITY
As DATAVERSITY’s RWDG series hurtles into our 12th year, this webinar takes a quick look behind us, evaluates the present, and predicts the future of Data Governance. Based on webinar numbers, hot Data Governance topics have evolved over the years from policies and best practices, roles and tools, and data catalogs and frameworks, to supporting data mesh and fabric, artificial intelligence, virtualization, literacy, and metadata governance.
Join Bob Seiner as he reflects on the past and what has and has not worked, while sharing examples of enterprise successes and struggles. In this webinar, Bob will challenge the audience to stay a step ahead by learning from the past and blazing a new trail into the future of Data Governance.
In this webinar, Bob will focus on:
- Data Governance’s past, present, and future
- How trials and tribulations evolve to success
- Leveraging lessons learned to improve productivity
- The great Data Governance tool explosion
- The future of Data Governance
Data Governance Trends and Best Practices To Implement TodayDATAVERSITY
Would you share your bank account information on social media? How about shouting your social security number on the New York City subway? We didn’t think so either – that’s why data governance is consistently top of mind.
In this webinar, we’ll discuss the common Cloud data governance best practices – and how to apply them today. Join us to uncover Google Cloud’s investment in data governance and learn practical and doable methods around key management and confidential computing. Hear real customer experiences and leave with insights that you can share with your team. Let’s get solving.
Topics that you will hear addressed in this webinar:
- Understanding the basics of Cloud Incident Response (IR) and anticipated data governance trends
- Best practices for key management and applying data governance to your day-to-day
- The next wave of Confidential Computing and how to get started, including a demo
It is a fascinating, explosive time for enterprise analytics.
It is from the position of analytics leadership that the enterprise mission will be executed and company leadership will emerge. The data professional sits squarely on the performance of the company in this information economy and has an obligation to demonstrate the possibilities and originate the architecture, data, and projects that will deliver analytics. After all, no matter what business you’re in, you’re in the business of analytics.
The coming years will be full of big changes in enterprise analytics and data architecture. William will kick off the fifth year of the Advanced Analytics series with a discussion of the trends winning organizations should build into their plans, expectations, vision, and awareness now.
Too often I hear the question “Can you help me with our data strategy?” Unfortunately, for most, this is the wrong request because it focuses on the least valuable component: the data strategy itself. A more useful request is: “Can you help me apply data strategically?” Yes, at early maturity phases the process of developing strategic thinking about data is more important than the actual product! Trying to write a good (much less perfect) data strategy on the first attempt is generally not productive – particularly given the widespread acceptance of Mike Tyson’s truism: “Everybody has a plan until they get punched in the face.” This program refocuses efforts on learning how to iteratively improve the way data is strategically applied. This will permit data-based strategy components to keep up with agile, evolving organizational strategies. It also contributes to three primary organizational data goals. Learn how to improve the following:
- Your organization’s data
- The way your people use data
- The way your people use data to achieve your organizational strategy
This will help in ways never imagined. Data are your sole non-depletable, non-degradable, durable strategic assets, and they are pervasively shared across every organizational area. Addressing existing challenges programmatically includes overcoming necessary but insufficient prerequisites and developing a disciplined, repeatable means of improving business objectives. This process (based on the theory of constraints) is where the strategic data work really occurs as organizations identify prioritized areas where better assets, literacy, and support (data strategy components) can help an organization better achieve specific strategic objectives. Then the process becomes lather, rinse, and repeat. Several complementary concepts are also covered, including:
- A cohesive argument for why data strategy is necessary for effective data governance
- An overview of prerequisites for effective strategic use of data strategy, as well as common pitfalls
- A repeatable process for identifying and removing data constraints
- The importance of balancing business operation and innovation
Who Should Own Data Governance – IT or Business?DATAVERSITY
The question is asked all the time: “What part of the organization should own your Data Governance program?” The typical answers are “the business” and “IT (information technology).” Another answer to that question is “Yes.” The program must be owned and reside somewhere in the organization. You may ask yourself if there is a correct answer to the question.
Join this new RWDG webinar with Bob Seiner where Bob will answer the question that is the title of this webinar. Determining ownership of Data Governance is a vital first step. Figuring out the appropriate part of the organization to manage the program is an important second step. This webinar will help you address these questions and more.
In this session Bob will share:
- What is meant by “the business” when it comes to owning Data Governance
- Why some people say that Data Governance in IT is destined to fail
- Examples of IT positioned Data Governance success
- Considerations for answering the question in your organization
- The final answer to the question of who should own Data Governance
It is clear that Data Management best practices exist and so does a useful process for improving existing Data Management practices. The question arises: Since we understand the goal, how does one design a process for Data Management goal achievement? This program describes what must be done at the programmatic level to achieve better data use and a way to implement this as part of your data program. The approach combines DMBoK content and CMMI/DMM processes, giving organizations the opportunity to benefit from the best of both. It also permits organizations to understand:
- Their current Data Management practices
- Strengths that should be leveraged
- Remediation opportunities
MLOps – Applying DevOps to Competitive AdvantageDATAVERSITY
MLOps is a practice for collaboration between Data Science and operations to manage the production machine learning (ML) lifecycle. As an amalgamation of “machine learning” and “operations,” MLOps applies DevOps principles to ML delivery, enabling the delivery of ML-based innovation at scale to result in:
- Faster time to market of ML-based solutions
- More rapid rate of experimentation, driving innovation
- Assurance of quality, trustworthiness, and ethical AI
MLOps is essential for scaling ML. Without it, enterprises risk struggling with costly overhead and stalled progress. Several vendors have emerged with offerings to support MLOps; among the major ones are Microsoft Azure ML and Google Vertex AI. We looked at these offerings from the perspective of enterprise features and time-to-value.
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. For more details visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
Adjusting primitives for graph : SHORT REPORT / NOTESSubhajit Sahu
Notes on adjusting primitives for graph algorithms such as PageRank, which commonly operate on Compressed Sparse Row (CSR), an adjacency-list based graph representation.
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
Opendatabay - Open Data Marketplace.pptxOpendatabay
Opendatabay.com unlocks the power of data for everyone. Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
First ever open hub for data enthusiasts to collaborate and innovate. A platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence. Leverage cutting-edge AI technologies to enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay's AI-driven features streamline the data workflow. Finding the data you need shouldn't be complex. Opendatabay simplifies the data acquisition process with an intuitive interface and robust search tools. Effortlessly explore, discover, and access the data you need, allowing you to focus on extracting valuable insights. Opendatabay breaks new ground with dedicated, AI-generated synthetic datasets.
Leverage these privacy-preserving datasets for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they're working with. By leveraging a powerful combination of distributed ledger technology and rigorous third-party audits, Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay: the marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation for ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. Slowdown on the GPU is likely caused by a large submission of small workloads, and expected to be non-issue when the computation is performed on massive graphs.
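To make the decomposition concrete, here is a toy sketch (not the report's implementation) of the block-graph construction the abstract describes, using the networkx library: condense strongly connected components into a DAG and walk it in topological order, one level at a time.

```python
# Sketch of Levelwise PageRank's decomposition step: condense the graph's
# strongly connected components (SCCs) into a DAG, then visit components in
# topological order so each block can be ranked before its dependents.
# Illustrative only; not the report's actual implementation.
import networkx as nx

# Toy directed graph with two SCCs ({1, 2} and {3, 4}) plus a bridge edge.
G = nx.DiGraph([(1, 2), (2, 1), (2, 3), (3, 4), (4, 3)])

C = nx.condensation(G)  # DAG whose nodes are the SCCs of G

for comp_id in nx.topological_sort(C):
    members = C.nodes[comp_id]["members"]
    print(f"level-ordered component {comp_id}: vertices {sorted(members)}")
```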
3. The Power of Kafka
80% of Fortune 100 companies use Apache Kafka® (the majority are Confluent customers).
4. The toll and risk of managing Kafka grows over time
As investment, time, and scale increase, so does the operational burden:
● Architecture planning
● Kafka cluster sizing
● Cluster provisioning
● Build & set up source/sink connectors
● Build monitoring and reporting tools
● Maintain software patches and upgrades
● Monitor and react to security and reliability risks
● Maintain, upgrade, and restart servers
● Monitor Kafka cluster loading and rebalancing
● Plan and execute cluster expansions
● Provide utilization visibility across the organization
● Maintain cluster performance through infra upgrades, migrations, etc.
● Manage ZooKeeper
● And so much more . . .
5. Core vs. Non-Core
“Whatever is the new thing that creates differentiation, that’s core.” Meanwhile, context is any activity that doesn’t differentiate the company in the eyes of the customer. “If we do [context] brilliantly, we don’t succeed. If we do it badly, we get in trouble.”
— Geoffrey Moore, author and management expert
6. The Move to Cloud-Native Data Systems
Across the data stack, workloads have shifted from self-managed hardware and software to fully managed services:
● Storage: Amazon S3, Blob storage
● Data Warehousing: Redshift, Snowflake
● Databases: Google BigTable
● Data Movement: the category this deck addresses
“By 2022, 75% of all databases will be deployed or migrated to a cloud platform.” - Gartner
8. Confluent is the only cloud-native Kafka solution
Ranked by ease and speed of use, from low to high:
● Self-Managed (DIY): development and maintenance without support
● Partially Managed (cloud-hosted): manual operations with tools and/or support (other managed services for Kafka)
● Fully Managed (cloud-native): automated product capabilities with no operational overhead (Confluent Cloud)
9. The 4 Benefits of a Cloud-Native Kafka Service
● Elastic: scale up instantly to meet any demand and scale back down to avoid over-provisioning infrastructure
● Infinite: store unlimited data on Confluent to enhance your real-time apps and use cases with a broader set of data
● Global: create a consistent data fabric throughout your organization by linking clusters across your different environments
● Complete: build and launch faster with a complete data in motion platform that goes way beyond Kafka
11. Elastic Scaling in Confluent Cloud
Confluent Cloud (automated, 1 click):
● Basic and Standard clusters [0-100 Mbps]: scale in milliseconds
● Dedicated clusters [Mbps - Gbps]: scale in minutes; select a CKU count from the drop-down in the cluster management UI and click Apply Changes
Other Kafka services (days to weeks):
1. Determine how much capacity is needed
2. Procure capacity*
3. Configure new brokers: a. disks, b. OS, c. network, d. Kafka (application)
4. Identify partitions on specific brokers to rebalance, and the topics they are part of
5. For each topic, migrate partitions: a. increase ISR +1, b. wait for the new replica to sync, c. fail over the leader, d. reduce ISR -1, e. delete the old replica
*Even in public clouds, provider quotas for VMs, disks, and security groups can cause delays. Confluent has these limits raised already.
14. Infinite Retention
Scale up storage without adding compute resources by removing storage limits on topics, enabling infinite data retention in Kafka.
[Diagram: a cluster of three brokers, with Topics 1-3 replicated across them]
Available on AWS and Google Cloud
15. Infinite Retention (continued)
[Same cluster diagram as the previous slide]
● Reimagine event streaming apps that can use both real-time and historical events
● Make faster and better-informed decisions in your event-driven processes
● Meet regulatory compliance requirements for data retention
Available on AWS and Google Cloud
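Mechanically, retention is an ordinary topic configuration, and setting retention.ms to -1 disables the time-based limit. A minimal sketch with the confluent-kafka Python client, assuming a reachable cluster (the broker address and topic name are placeholders; Confluent Cloud would additionally require SASL credentials, omitted here for brevity):

```python
# Sketch: lift the time-based retention limit on a topic (retention.ms = -1)
# so the broker keeps its data indefinitely. Broker address and topic name
# are placeholders; security settings are omitted for brevity.
from confluent_kafka.admin import AdminClient, ConfigResource

admin = AdminClient({"bootstrap.servers": "localhost:9092"})

resource = ConfigResource(
    ConfigResource.Type.TOPIC, "orders",
    set_config={"retention.ms": "-1"},
)

# alter_configs returns {resource: future}; wait for each to complete.
for res, fut in admin.alter_configs([resource]).items():
    fut.result()  # raises on failure
    print(f"retention.ms set to -1 for {res}")
```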
17. AO.com maintains focus on value with Confluent
“Confluent gives us the tools we need to drive innovation. Before Confluent Cloud, when we had broker outages that required rebuilds, it could take up to three days of developers’ time to resolve. Now, Confluent takes care of everything for us, so our developers can focus on building new features and applications.”
— Jon Vines, Software Development Team Lead at AO.com
19. Deploy how and where you want
Start your multi-cloud journey: Confluent is available in the AWS Marketplace (20 regions), the Azure Marketplace (16 regions), and the Google Cloud Marketplace (22 regions).
Simplify procurement with easy click-through annual commitments, and draw down against existing Enterprise Agreements in your cloud marketplace of choice.
20. Global Access
Access real-time data globally, across public and private clouds, with linked Kafka clusters that sync in real time.
● Create a consistent data layer across clouds: easily share data between fully independent clusters
● Simplify your architecture: bridge or migrate data (or migrate whole clusters to Confluent Cloud) without needing a separate Connect cluster
Cluster Linking (preview) requires no additional infrastructure and fully preserves offsets.
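Because offsets are preserved, a consumer group can in principle be repointed from the source cluster to the linked destination and resume from its committed position. A minimal sketch with the confluent-kafka Python client (all addresses and names are placeholders, and this is not an official migration procedure):

```python
# Sketch: after migrating via a cluster link that preserves offsets, the same
# consumer group can resume from its committed position on the destination
# cluster simply by changing bootstrap.servers. All values are placeholders.
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "destination-cluster.example:9092",  # was the source
    "group.id": "orders-processor",   # same group id, so offsets still apply
    "auto.offset.reset": "earliest",  # only used if no committed offset exists
})
consumer.subscribe(["orders"])

try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None:
            continue
        if msg.error():
            raise RuntimeError(msg.error())
        print(msg.topic(), msg.partition(), msg.offset(), msg.value())
finally:
    consumer.close()
```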
22. Build and launch faster with a complete data in motion platform
● Pre-built Integrations: instantly integrate clusters with the most popular data sources and sinks in the Kafka ecosystem
● Data Governance: make data backwards compatible and future-proof with Schema Registry & Validation
● Stream Processing: launch real-time, continuous stream processing apps with a lightweight SQL syntax
● Enterprise-Grade Security: ensure data confidentiality, control access management, and monitor user activity
23. Instantly Connect Popular Data Sources & Sinks
Choose from a growing list of fully managed connectors in Confluent Cloud.
Sources: Amazon Kinesis, Azure Event Hubs, Datagen, GCP Pub/Sub, Microsoft SQL Server CDC, Microsoft SQL Server, MongoDB Atlas, MySQL CDC, MySQL, Oracle Database, Postgres CDC, Postgres, Salesforce CDC
Sinks: Elasticsearch Service, GCP BigQuery, GCP Dataproc, AWS Lambda, Azure Functions, GCP Cloud Storage, Microsoft SQL Server, MongoDB Atlas, MySQL, Postgres, Snowflake, Datadog Metrics
Visit confluent.io/hub to see all available connectors.
24. MongoDB Atlas Source & Sink Connectors (Now Generally Available!)
Build and launch real-time apps faster with fully managed source & sink connectors:
● Deploy with just a few clicks and eliminate the effort to build, test, & maintain custom integrations
● Work with data anywhere, with setup available across all major clouds including AWS, Azure, and Google Cloud
● Drive analysis with real-time data by connecting Apache Kafka® and MongoDB
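The deck covers Confluent's fully managed connectors; for a feel of what a connector definition involves, here is an unofficial sketch against the open-source Kafka Connect REST API, using the MongoDB source connector's documented class name (the worker URL, connection URI, and names are placeholders):

```python
# Sketch: register a MongoDB source connector via the Kafka Connect REST API.
# Assumes a self-managed Connect worker at localhost:8083 with the MongoDB
# connector plugin installed; all names and URIs below are placeholders.
import json
import urllib.request

connector = {
    "name": "mongo-orders-source",
    "config": {
        "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
        "connection.uri": "mongodb+srv://user:pass@cluster0.example.mongodb.net",
        "database": "shop",
        "collection": "orders",
        "tasks.max": "1",
    },
}

req = urllib.request.Request(
    "http://localhost:8083/connectors",
    data=json.dumps(connector).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(req) as resp:
    print(resp.status, resp.read().decode())
```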
25. Simplify Enterprise Schema Management
● Manage Schema Registry: centrally create, edit, and view topic schemas, and compare schema versions
● Configure Schema Validation: enable the feature at the topic level when creating or editing topics
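For illustration, registering a schema programmatically looks roughly like this with the confluent-kafka Python client's Schema Registry support (the URL, subject, and schema are placeholders):

```python
# Sketch: register an Avro schema for a topic's value subject in Schema
# Registry. URL, subject name, and schema are illustrative placeholders.
from confluent_kafka.schema_registry import Schema, SchemaRegistryClient

client = SchemaRegistryClient({"url": "http://localhost:8081"})

order_schema = Schema(
    schema_str="""
    {
      "type": "record",
      "name": "Order",
      "fields": [
        {"name": "id", "type": "string"},
        {"name": "amount", "type": "double"}
      ]
    }
    """,
    schema_type="AVRO",
)

# Subjects conventionally follow <topic>-value for record values.
schema_id = client.register_schema("orders-value", order_schema)
print(f"Registered schema id: {schema_id}")
```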
26. Simplify Your Stream Processing Architecture
ksqlDB provides everything you need to build a complete, end-to-end event streaming application entirely with SQL syntax.
[Diagram: ksqlDB combines connectors, stream processing, and state stores, connecting databases and apps through push and pull queries]
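As an illustration of the "entirely with SQL" claim, the sketch below submits a stream definition and a continuous aggregation to a ksqlDB server through its REST API (the server address, topic, and names are placeholders):

```python
# Sketch: create a stream and a persistent aggregation in ksqlDB via its
# REST API. Assumes a ksqlDB server at localhost:8088; the topic, stream,
# and table names are placeholders.
import json
import urllib.request

statements = """
CREATE STREAM orders (id VARCHAR, amount DOUBLE)
  WITH (KAFKA_TOPIC='orders', VALUE_FORMAT='JSON', PARTITIONS=1);
CREATE TABLE order_totals AS
  SELECT id, SUM(amount) AS total
  FROM orders
  GROUP BY id
  EMIT CHANGES;
"""

req = urllib.request.Request(
    "http://localhost:8088/ksql",
    data=json.dumps({"ksql": statements, "streamsProperties": {}}).encode("utf-8"),
    headers={"Content-Type": "application/vnd.ksql.v1+json"},
    method="POST",
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read()))
```

Apps can then read the continuously updated order_totals table with pull queries, or subscribe to changes with push queries, which is the distinction the slide's diagram draws.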
27. Enterprise grade security and compliance
● Protect sensitive data in the cloud: SOC 1, SOC 2, SOC 3; ISO 27001; PCI and HIPAA¹ ready; GDPR ready
● Secure data against unwanted access: data-at-rest encryption, data-in-motion encryption, private networking connectivity
● Control access to clusters & resources: SAML/SSO, RBAC & audit logs, Access Control Lists (ACLs)
¹ Available on Dedicated clusters only
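For the ACL piece specifically, topic-level permissions can be managed programmatically. A sketch with the confluent-kafka AdminClient, assuming a broker that authorizes ACL operations (the principal, topic, and address are placeholders; ACL admin APIs require confluent-kafka >= 1.7):

```python
# Sketch: grant a principal read access to a topic with an ACL. Principal,
# topic, and broker address are placeholders for illustration.
from confluent_kafka.admin import (
    AclBinding, AclOperation, AclPermissionType,
    AdminClient, ResourcePatternType, ResourceType,
)

admin = AdminClient({"bootstrap.servers": "localhost:9092"})

acl = AclBinding(
    restype=ResourceType.TOPIC,
    name="orders",
    resource_pattern_type=ResourcePatternType.LITERAL,
    principal="User:analytics-app",
    host="*",
    operation=AclOperation.READ,
    permission_type=AclPermissionType.ALLOW,
)

for binding, fut in admin.create_acls([acl]).items():
    fut.result()  # raises on failure
    print(f"Created ACL: {binding}")
```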
29. The Cost-Efficient Way to Deploy Kafka
Compared with total self-managed spend, Confluent Cloud typically saves around 60% across three categories:
1. Cloud infrastructure
2. Operational (FTE)
3. Support and other spend
30. Try Confluent Free
De-risk, accelerate, and save with Confluent
Request a TCO assessment: confluent.io/request-kafka-tco-assessment