This document provides an overview of Kafka Connect and how it can be used to stream data between Kafka and other data systems. It discusses key Kafka Connect concepts like connectors, converters, transforms, deployment modes, and troubleshooting. The document contains configuration examples for connectors and transforms.
Getting Started with Confluent Schema Registry, Confluent
Getting started with Confluent Schema Registry, Patrick Druley, Senior Solutions Engineer, Confluent
Meetup link: https://www.meetup.com/Cleveland-Kafka/events/272787313/
Apache Kafka is a distributed messaging system that allows for publishing and subscribing to streams of records, known as topics, in a fault-tolerant and scalable way. It is used for building real-time data pipelines and streaming apps. Producers write data to topics which are committed to disks across partitions and replicated for fault tolerance. Consumers read data from topics in a decoupled manner based on offsets. Kafka can process streaming data in real-time and at large volumes with low latency and high throughput.
Exactly-once Stream Processing with Kafka Streams, Guozhang Wang
I will present the recent additions to Kafka to achieve exactly-once semantics (0.11.0) within its Streams API for stream processing use cases. This is achieved by leveraging the underlying idempotent and transactional client features. The main focus will be the specific semantics that Kafka distributed transactions enable in Streams and the underlying mechanics to let Streams scale efficiently.
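In configuration terms, enabling this in a Streams application comes down to a single property; a minimal sketch (exactly_once is the value introduced in 0.11, and newer releases add an exactly_once_v2 variant):
processing.guarantee=exactly_once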
Hello, Kafka! (An Introduction to Apache Kafka), Timothy Spann
Hello Apache Kafka
An Introduction to Apache Kafka with Timothy Spann and Carolyn Duby, Cloudera Principal Engineers.
We also demo Flink SQL, SMM, SSB, Schema Registry, Apache Kafka, Apache NiFi and Public Cloud - AWS.
KSQL Performance Tuning for Fun and Profit (Nick Dearden, Confluent), Kafka Summit
Ever wondered just how many CPU cores of KSQL Server you need to provision to handle your planned stream processing workload? Or how many gigabits of aggregate network bandwidth, spread across some number of processing threads, you'll need to deal with the combined peak throughput of multiple queries? In this talk we'll first explore the basic drivers of KSQL throughput and hardware requirements, building up to more advanced query-plan analysis and capacity-planning techniques, and review some real-world testing results along the way. Finally, we will recap how and what to monitor to know you got it right!
Apache Kafka® is the technology behind event streaming which is fast becoming the central nervous system of flexible, scalable, modern data architectures. Customers want to connect their databases, data warehouses, applications, microservices and more, to power the event streaming platform. To connect to Apache Kafka, you need a connector!
This online talk dives into the new Verified Integrations Program and the integration requirements, the Connect API and sources and sinks that use Kafka Connect. We cover the verification steps and provide code samples created by popular application and database companies. We will discuss the resources available to support you through the connector development process.
This is Part 2 of 2 in Building Kafka Connectors - The Why and How
Apache Kafka is an open-source message broker project developed by the Apache Software Foundation and written in Scala. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds.
Kafka's basic terminologies, its architecture, its protocol and how it works.
Kafka at scale, its caveats, its guarantees, and the use cases it supports.
How we use it @ZaprMediaLabs.
A brief introduction to Apache Kafka, describing its usage as a platform for streaming data. It will introduce some of the newer components of Kafka that help make this possible, including Kafka Connect, a framework for capturing continuous data streams, and Kafka Streams, a lightweight stream processing library.
Kafka Streams is a new stream processing library natively integrated with Kafka. It has a very low barrier to entry, easy operationalization, and a natural DSL for writing stream processing applications. As such it is the most convenient yet scalable option to analyze, transform, or otherwise process data that is backed by Kafka. We will provide the audience with an overview of Kafka Streams including its design and API, typical use cases, code examples, and an outlook of its upcoming roadmap. We will also compare Kafka Streams' light-weight library approach with heavier, framework-based tools such as Spark Streaming or Storm, which require you to understand and operate a whole different infrastructure for processing real-time data in Kafka.
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur, ...), Confluent
RocksDB is the default state store for Kafka Streams. In this talk, we will discuss how to improve single node performance of the state store by tuning RocksDB and how to efficiently identify issues in the setup. We start with a short description of the RocksDB architecture. We discuss how Kafka Streams restores the state stores from Kafka by leveraging RocksDB features for bulk loading of data. We give examples of hand-tuning the RocksDB state stores based on Kafka Streams metrics and RocksDB’s metrics. At the end, we dive into a few RocksDB command line utilities that allow you to debug your setup and dump data from a state store. We illustrate the usage of the utilities with a few real-life use cases. The key takeaway from the session is the ability to understand the internal details of the default state store in Kafka Streams so that engineers can fine-tune their performance for different varieties of workloads and operate the state stores in a more robust manner.
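To make the hand-tuning concrete, Kafka Streams exposes a RocksDBConfigSetter hook for exactly this purpose; below is a minimal sketch (the class name and the specific values are illustrative assumptions, not recommendations from the talk):

import org.apache.kafka.streams.state.RocksDBConfigSetter;
import org.rocksdb.BlockBasedTableConfig;
import org.rocksdb.Options;
import java.util.Map;

// Wired into a Streams app via: rocksdb.config.setter=com.example.CustomRocksDBConfig
public class CustomRocksDBConfig implements RocksDBConfigSetter {
    @Override
    public void setConfig(final String storeName, final Options options, final Map<String, Object> configs) {
        final BlockBasedTableConfig tableConfig = new BlockBasedTableConfig();
        tableConfig.setBlockCacheSize(16 * 1024 * 1024L); // cap the block cache at 16 MB per store
        tableConfig.setBlockSize(16 * 1024L);             // bigger blocks mean a smaller index
        options.setTableFormatConfig(tableConfig);
        options.setMaxWriteBufferNumber(2);               // limit the number of in-memory memtables
    }
}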
Apache Kafka is a distributed publish-subscribe messaging system that can handle high volumes of data and enable messages to be passed from one endpoint to another. It uses a distributed commit log that allows messages to be persisted on disk for durability. Kafka is fast, scalable, fault-tolerant, and guarantees zero data loss. It is used by companies like LinkedIn, Twitter, and Netflix to handle high volumes of real-time data and streaming workloads.
Kafka Connect and Streams (Concepts, Architecture, Features), Kai Wähner
A high-level introduction to Kafka Connect and Kafka Streams, two components of the Apache Kafka open source framework, covering their concepts, architecture and features.
In the last few years, Apache Kafka has been used extensively in enterprises for real-time data collecting, delivering, and processing. In this presentation, Jun Rao, Co-founder, Confluent, gives a deep dive on some of the key internals that help make Kafka popular.
- Companies like LinkedIn are now sending more than 1 trillion messages per day to Kafka. Learn about the underlying design in Kafka that leads to such high throughput.
- Many companies (e.g., financial institutions) are now storing mission critical data in Kafka. Learn how Kafka supports high availability and durability through its built-in replication mechanism.
- One common use case of Kafka is propagating updatable database records. Learn how a unique feature of Apache Kafka called compaction is designed to solve this kind of problem more naturally (a one-line topic setting illustrating it is sketched below).
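Compaction is a per-topic setting; once enabled, Kafka retains at least the latest record for every key instead of discarding data purely by age:
cleanup.policy=compact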
Improving fault tolerance and scaling out in Kafka Streams with Bill Bejeck, Hosted by Confluent
Kafka Streams is the popular stream processing component of Apache Kafka®. One of its best features is stateful operations. Kafka Streams works hard to ensure stateful operations can scale horizontally and survive failures, but doing so takes time. Kafka Streams offers the concept of "standby tasks," allowing for near-zero-downtime failover, but surprisingly this feature still isn't widely used. This could be for various reasons, from lack of awareness to needing additional resources.
This presentation will cover how standby tasks work and how they're enabled. Additionally, I'll cover the work done in KIP-441 that enables faster scaling out for stateful tasks and provides more balanced stateful assignments. I'll also dive into the consumer rebalance protocol improvements that enable KIP-441 to be effective.
Attendees of this presentation will walk away understanding how and when to use standby tasks, how to leverage the improvements from KIP-441, and with a deeper understanding of how Kafka Streams works with state.
This document provides an introduction to Apache Kafka. It describes Kafka as a distributed messaging system with features like durability, scalability, publish-subscribe capabilities, and ordering. It discusses key Kafka concepts like producers, consumers, topics, partitions and brokers. It also summarizes use cases for Kafka and how to implement producers and consumers in code. Finally, it briefly outlines related tools like Kafka Connect and Kafka Streams that build upon the Kafka platform.
Kafka Streams: What it is, and how to use it? (Confluent)
Kafka Streams is a client library for building distributed applications that process streaming data stored in Apache Kafka. It provides a high-level streams DSL that allows developers to express streaming applications as a set of processing steps. Alternatively, developers can use the lower-level processor API to implement custom business logic. Kafka Streams handles concerns like fault tolerance, scalability and state management. It represents data as streams for unbounded data or as tables for bounded state. Common operations include transformations, aggregations, joins and table operations.
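A minimal, self-contained sketch of what that streams DSL looks like in practice: the canonical word count (topic names are illustrative):

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Produced;
import java.util.Arrays;
import java.util.Properties;

public class WordCountSketch {
    public static void main(final String[] args) {
        final Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "wordcount-sketch");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        final StreamsBuilder builder = new StreamsBuilder();
        final KStream<String, String> lines = builder.stream("text-input");
        // Split lines into words, group by word, and maintain a running count as a KTable
        final KTable<String, Long> counts = lines
            .flatMapValues(line -> Arrays.asList(line.toLowerCase().split("\\W+")))
            .groupBy((key, word) -> word)
            .count();
        counts.toStream().to("word-counts", Produced.with(Serdes.String(), Serdes.Long()));
        // Start the topology; state management and fault tolerance are handled by the library
        new KafkaStreams(builder.build(), props).start();
    }
}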
Integrating Apache Kafka Into Your Environment, Confluent
Watch this talk here: https://www.confluent.io/online-talks/integrating-apache-kafka-into-your-environment-on-demand
Integrating Apache Kafka with other systems in a reliable and scalable way is a key part of an event streaming platform. This session will show you how to get streams of data into and out of Kafka with Kafka Connect and REST Proxy, maintain data formats and ensure compatibility with Schema Registry and Avro, and build real-time stream processing applications with Confluent KSQL and Kafka Streams.
This session is part 4 of 4 in our Fundamentals for Apache Kafka series.
Building distributed systems is challenging. Luckily, Apache Kafka provides a powerful toolkit for putting together big services as a set of scalable, decoupled components. In this talk, I'll describe some of the design tradeoffs when building microservices, and how Kafka's powerful abstractions can help. I'll also talk a little bit about what the community has been up to with Kafka Streams, Kafka Connect, and exactly-once semantics.
Presentation by Colin McCabe, Confluent, Big Data Day LA
What is Apache Kafka and What is an Event Streaming Platform? (Confluent)
Speaker: Gabriel Schenker, Lead Curriculum Developer, Confluent
Streaming platforms have emerged as a popular, new trend, but what exactly is a streaming platform? Part messaging system, part Hadoop made fast, part fast ETL and scalable data integration. With Apache Kafka® at the core, event streaming platforms offer an entirely new perspective on managing the flow of data. This talk will explain what an event streaming platform such as Apache Kafka is and some of the use cases and design patterns around its use—including several examples of where it is solving real business problems. New developments in this area such as KSQL will also be discussed.
Producer Performance Tuning for Apache Kafka, Jiangjie Qin
Kafka is well known for high throughput ingestion. However, to get the best latency characteristics without compromising on throughput and durability, we need to tune Kafka. In this talk, we share our experiences to achieve the optimal combination of latency, throughput and durability for different scenarios.
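The knobs in question are the standard producer configs; a minimal, throughput-leaning sketch (the values are illustrative assumptions, not the speaker's recommendations):
batch.size=131072        # larger batches amortise per-request overhead
linger.ms=10             # wait briefly to fill batches, trading a little latency
compression.type=lz4     # smaller payloads on the wire, at some CPU cost
acks=all                 # strongest durability; lower values trade safety for latency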
Apache Kafka is a distributed streaming platform and distributed publish-subscribe messaging system. It uses a log abstraction to order events and replicate data across clusters. Kafka allows developers to publish and subscribe to streams of records known as topics. Producers publish data to topics and consumers subscribe to topics to process streams of records. The Kafka ecosystem includes tools like Kafka Streams for stream processing and KSQL for querying streams of data.
With Apache Kafka 0.9, the community has introduced a number of features to make data streams secure. In this talk, we’ll explain the motivation for making these changes, discuss the design of Kafka security, and explain how to secure a Kafka cluster. We will cover common pitfalls in securing Kafka, and talk about ongoing security work.
Watch this talk here: https://www.confluent.io/online-talks/from-zero-to-hero-with-kafka-connect-on-demand
Integrating Apache Kafka® with other systems in a reliable and scalable way is often a key part of a streaming platform. Fortunately, Apache Kafka includes the Connect API that enables streaming integration both in and out of Kafka. Like any technology, understanding its architecture and deployment patterns is key to successful use, as is knowing where to go looking when things aren't working.
This talk will discuss the key design concepts within Apache Kafka Connect and the pros and cons of standalone vs distributed deployment modes. We'll do a live demo of building pipelines with Apache Kafka Connect for streaming data in from databases, and out to targets including Elasticsearch. With some gremlins along the way, we'll go hands-on in methodically diagnosing and resolving common issues encountered with Apache Kafka Connect. The talk will finish off by discussing more advanced topics including Single Message Transforms, and deployment of Apache Kafka Connect in containers.
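As a taste of the Single Message Transforms topic, here is a minimal sketch that stamps a static field onto every record as it passes through Connect (the transform alias, field name and value are illustrative):
transforms=addSource
transforms.addSource.type=org.apache.kafka.connect.transforms.InsertField$Value
transforms.addSource.static.field=source_system
transforms.addSource.static.value=crm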
Solutions for Bi-directional Integration between Oracle RDBMS & Apache Kafka, Guido Schmutz
A Kafka cluster stores streams of records (messages) in categories called topics. It is the architectural backbone for integrating streaming data with a data lake, microservices and stream processing. Today's enterprises often have their core systems implemented on top of relational databases, such as the Oracle RDBMS. Implementing a new solution supporting the digital strategy using Kafka and its ecosystem cannot always be done completely separately from the traditional legacy solutions. Often streaming data has to be enriched with state data held in the RDBMS of a legacy application. It's important to cache this data in the stream processing solution, so that it can be efficiently joined to the data stream. But how do we make sure that the cache is kept up to date if the source data changes? We can either poll the database for changes using Kafka Connect or let the RDBMS push the data changes to Kafka. But what about writing data back to the legacy application, e.g. when an anomaly detected inside the stream processing solution should trigger an action inside the legacy application? Using Kafka Connect we can write to a database table or view, which could trigger the action. But this is not always the best option. If you have an Oracle RDBMS, there are many other ways to integrate the database with Kafka, such as Advanced Queuing (a message broker in the database), CDC through GoldenGate or Debezium, Oracle REST Data Services (ORDS) and more. In this session, we present various blueprints for integrating an Oracle RDBMS with Apache Kafka in both directions and discuss how these blueprints can be implemented using the products mentioned before.
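To illustrate the polling option, a minimal JDBC source connector sketch that watches a timestamp column for changes (connection details, table and column names are placeholders):
name=oracle-jdbc-source
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
connection.url=jdbc:oracle:thin:@oracle-host:1521/ORCLPDB1
connection.user=connect_user
connection.password=connect_password
table.whitelist=CUSTOMERS
mode=timestamp
timestamp.column.name=UPDATE_TS
topic.prefix=oracle-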
Integrating Apache Kafka and Elastic Using the Connect Framework, Confluent
As a streaming platform, Apache Kafka provides low-latency, high-throughput, fault-tolerant publish and subscribe pipelines and excels at processing streams of real-time events. Kafka provides reliable, millisecond delivery for connecting downstream systems with real-time data.
In this talk, we will show how easy it is to leverage Kafka and the Elasticsearch connector to keep your indices populated with the latest data from the rest of your enterprise, as it changes.
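A minimal Elasticsearch sink connector sketch in that spirit (topic name and connection URL are placeholders):
name=elasticsearch-sink
connector.class=io.confluent.connect.elasticsearch.ElasticsearchSinkConnector
topics=orders
connection.url=http://elasticsearch:9200
type.name=_doc
key.ignore=true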
Solutions for bi-directional integration between Oracle RDBMS & Apache Kafka, Guido Schmutz
Apache Kafka is a popular distributed streaming data platform. A Kafka cluster stores streams of records (messages) in categories called topics. It is the architectural backbone for integrating streaming data with a data lake, microservices and stream processing. Data sources flowing into Kafka are often native data streams such as social media streams, telemetry data, financial transactions and many others. But these data streams only contain part of the information. A lot of the data needed in stream processing is stored in traditional systems backed by relational databases. To implement new, modern, real-time solutions, an up-to-date view of that information is needed. So how do we make sure that information can flow between the RDBMS and Kafka, so that changes are available in Kafka in near real time? This session will present different approaches for integrating relational databases with Kafka, such as Kafka Connect, Oracle GoldenGate and bridging Kafka with Oracle Advanced Queuing (AQ).
The document discusses different approaches for describing and deploying applications in the cloud, including declarative and procedural options. It provides examples of AWS CloudFormation templates, OpenStack Heat templates, Apache Whirr specifications, and Brooklyn application blueprints. It also outlines ongoing work to standardize application modeling through the OASIS TOSCA specification.
KSQL is a stream processing SQL engine that allows stream processing on top of Apache Kafka. KSQL is based on Kafka Streams and provides capabilities for consuming messages from Kafka, analysing these messages in near real time with a SQL-like language, and producing results back to a Kafka topic. That way, not a single line of Java code has to be written, and you can reuse your SQL know-how. This significantly lowers the bar for starting with stream processing.
KSQL offers powerful capabilities of stream processing, such as joins, aggregations, time windows and support for event time. In this talk I will present how KSQL integrates with the Kafka ecosystem and demonstrate how easy it is to implement a solution using KSQL for most part. This will be done in a live demo on a fictitious IoT sample.
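To give a flavour of those capabilities, a hedged KSQL sketch of a windowed aggregation over an IoT-style stream (stream, topic and column names are illustrative):
CREATE STREAM readings (sensor VARCHAR, temperature DOUBLE)
  WITH (KAFKA_TOPIC='iot-readings', VALUE_FORMAT='JSON');
-- average temperature per sensor over one-minute tumbling windows
SELECT sensor, AVG(temperature)
  FROM readings WINDOW TUMBLING (SIZE 1 MINUTE)
  GROUP BY sensor;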
This presentation discusses how Kafka Streams can expose the data from stateful operations, which can enable robust and more streamlined microservices.
Kick Your Database to the Curb (Reston, 2019-08-27), Confluent
This document discusses using Kafka Streams interactive queries to enable powerful microservices by making stream processing results queryable in real-time. It provides an overview of Kafka Streams, describes how to embed an interactive query server to expose stateful stream processing results via HTTP endpoints, and demonstrates how to securely query processing state from client applications.
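The core of that pattern is a lookup against a running application's local state store; a minimal sketch using the classic two-argument store() call (newer releases wrap the arguments in StoreQueryParameters; the store name is illustrative):

import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

public class InteractiveQuerySketch {
    // Serve the latest aggregate for a key straight from local state;
    // an HTTP endpoint, as described above, would simply wrap this call.
    static Long lookup(final KafkaStreams streams, final String key) {
        final ReadOnlyKeyValueStore<String, Long> store =
            streams.store("counts-store", QueryableStoreTypes.keyValueStore());
        return store.get(key);
    }
}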
Streaming Microservices With Akka Streams And Kafka Streams, Lightbend
This document discusses streaming architectures and libraries for processing streaming data. It provides an overview of Kafka Streams and Akka Streams, highlighting that Kafka Streams is ideal for ETL, aggregations, joins and "effectively once" requirements, while Akka Streams is suitable for low latency, mid-volume workloads based on graphs of processing nodes. It also includes a Kafka Streams example in Scala for scoring streaming data records using a streaming machine learning model.
Scaling Drupal in AWS Using AutoScaling, CloudFormation, RDS and More, Dropsolid
Given at DrupalJam 2015 - Netherlands.
This presentation explains some of the fundamental issues you have to overcome when designing software for distributed systems that can fail. Also called "Cloud" in other terminologies. The presentation uses AWS components to explain these fundamentals and uses Drupal as the example application. The example is by no means perfect, but gives you a good idea how to design your system from scratch.
Technologies used:
CloudFormation
EC2 Instances
RDS MySQL Database
Elastic Load Balancer
ElastiCache (Memcache)
Example can be found here:
https://gist.github.com/nickveenhof/601c5dc1b76ff26896bf
Note that, for simplicity, the example does not include components such as a VPC, but adding one is highly recommended.
The document discusses asynchronous programming with Spring 4.X and relational database management systems in microservices architectures. It covers asynchronous vs synchronous programming, the C10K problem of handling 10,000 clients simultaneously and its solutions like load balancing, NoSQL databases, and event-driven programming. It provides examples of using Spring's @Async annotation, DeferredResult, and CompletableFuture for asynchronous programming. It also discusses challenges with databases being blocking I/O and solutions like avoiding blocking on database connections, using asynchronous data access with Spring, and transaction management across asynchronous calls.
From Zero to Hero with Kafka Connect (Robin Moffatt, Confluent), Kafka Summit London
Integrating Apache Kafka with other systems in a reliable and scalable way is often a key part of a streaming platform. Fortunately, Apache Kafka includes the Connect API that enables streaming integration both in and out of Kafka. Like any technology, understanding its architecture and deployment patterns is key to successful use, as is knowing where to go looking when things aren’t working. This talk will discuss the key design concepts within Kafka Connect and the pros and cons of standalone vs distributed deployment modes. We’ll do a live demo of building pipelines with Kafka Connect for streaming data in from databases, and out to targets including Elasticsearch. With some gremlins along the way, we’ll go hands-on in methodically diagnosing and resolving common issues encountered with Kafka Connect. The talk will finish off by discussing more advanced topics including Single Message Transforms, and deployment of Kafka Connect in containers.
This document discusses using Kafka for service messaging in a service-oriented architecture. Some key points discussed include:
- Kafka allows for one message per event, instant message passing between services, ability to replay messages, and messages having a schema.
- Kafka Streams provides a framework for building streaming applications and processing data in Kafka topics. It allows for querying state stored in Kafka.
- Real-world examples show Kafka Streams can handle high throughput workloads of 50,000 messages per second across multiple instances.
- Kafka Connect is used for simple data integration without transformations by moving data to and from Kafka.
AMS adapters in Rails 5 are important for building a wonderful RESTful API because they provide hypermedia controls and links in the API response. The JSON API adapter in particular generates responses that follow the JSON API specification by including links, relationships, pagination metadata, and embedded resources. Adapters allow Rails to render different representation formats like JSON API, customize the API structure, and improve the developer experience of the API through conventions like hypermedia controls and links. However, there is still work to be done on adapters, like improving deserialization, documentation, and better supporting JSON API conventions fully.
Deployment and Management on AWS: A Deep Dive on Options and Tools, Danilo Poccia
This document provides an overview and comparison of different options for deploying and managing applications on AWS: AWS Elastic Beanstalk, AWS OpsWorks, AWS CloudFormation, and raw Amazon EC2. It discusses the tradeoffs between convenience, control, and complexity for each option. It also includes code samples and descriptions of features for each service.
This document discusses serverless architectures and building a web chat application using AWS serverless services like Lambda, API Gateway, DynamoDB, IoT, and Cognito. It explains what serverless is, shows how to use AWS services without provisioning servers, and demonstrates how to build a serverless web chat app with real-time capabilities using IoT topics and rules. Code for the chat app is available on GitHub.
Similar presentations to "Apache Kafka Meetup Zurich at Swiss Re: From Zero to Hero with Kafka Connect" (2019-08-26):
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Enterprise, Confluent
In our exclusive webinar, you'll learn why event-driven architecture is the key to unlocking cost efficiency, operational effectiveness, and profitability. Gain insights on how this approach differs from API-driven methods and why it's essential for your organization's success.
Santander Stream Processing with Apache Flink, Confluent
Flink is becoming the de facto standard for stream processing due to its scalability, performance, fault tolerance, and language flexibility. It supports stream processing, batch processing, and analytics through one unified system. Developers choose Flink for its robust feature set and ability to handle stream processing workloads at large scales efficiently.
Unlocking the Power of IoT: A Comprehensive Approach to Real-Time Insights, Confluent
In today's data-driven world, the Internet of Things (IoT) is revolutionizing industries and unlocking new possibilities. Join Data Reply, Confluent, and Imply as we unveil a comprehensive solution for IoT that harnesses the power of real-time insights.
Hybrid Workshop: Stream Processing with Flink, Confluent
Stream processing is a prerequisite of the data streaming stack, powering real-time applications and pipelines.
It enables greater data portability, optimised resource utilisation, and a better customer experience by processing data streams in real time.
In our hands-on hybrid workshop, you will learn how to easily filter, join, and enrich real-time data within Confluent Cloud using our serverless Flink service.
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and SparkPlug, Confluent
Our talk will explore the transformative impact of integrating Confluent, HiveMQ, and SparkPlug in Industry 4.0, emphasizing the creation of a Unified Namespace.
In addition to the creation of a Unified Namespace, our webinar will also delve into Stream Governance and Scaling, highlighting how these aspects are crucial for managing complex data flows and ensuring robust, scalable IIoT-Platforms.
You will learn how to ensure data accuracy and reliability, expand your data processing capabilities, and optimize your data management processes.
Don't miss out on this opportunity to learn from industry experts and take your business to the next level.
Event-driven architecture (EDA) will be the heart of MAPFRE's ecosystem. To remain competitive, today's companies depend increasingly on real-time data analysis, which gives them faster insights and response times. Running a business on real-time data means building situational awareness: detecting and responding to what is happening in the world right now.
Events and Microservices, Santander TechTalk, Confluent
During this session we will examine how the worlds of events and microservices complement and improve each other, exploring how event-based patterns allow us to decompose monoliths in a scalable, resilient and decoupled way.
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud, Confluent
This document discusses networking options and best practices for Confluent Cloud. It provides an overview of public endpoints, private link, and peering options. It then discusses best practices for private networking architectures on Azure using hub-and-spoke and private link designs. Finally, it addresses networking considerations and challenges for Kafka Connect managed connectors, as well as planned enhancements for DNS peering and outbound private link support.
The purpose of the session is to take a dive into Apache Kafka, data streaming, and Kafka in the cloud:
- Dive into Apache Kafka
- Data Streaming
- Kafka in the cloud
Build Real-Time Streaming Data Pipelines to AWS with Confluent, Confluent
Traditional data pipelines often face scalability issues and challenges related to cost, their monolithic design, and reliance on batch data processing. They also typically operate under the premise that all data needs to be stored in a single centralized data source before it's put to practical use. Confluent Cloud on Amazon Web Services (AWS) provides a fully managed cloud-native platform that helps you simplify the way you build real-time data flows using streaming data pipelines and Apache Kafka.
Q&A with Confluent Professional Services: Confluent Service Mesh, Confluent
No matter whether you are migrating your Kafka cluster to Confluent Cloud, running a cloud-hybrid environment, or are in a different situation where data protection and encryption of sensitive information is required, Confluent Service Mesh allows you to transparently encrypt your data without the need to make code changes to your existing applications.
Citi Tech Talk: Event-Driven Kafka Microservices, Confluent
Microservices have become a dominant architectural paradigm for building systems in the enterprise, but they are not without their tradeoffs. Learn how to build event-driven microservices with Apache Kafka
Confluent & GSI Webinar Series - Session 3, Confluent
An in depth look at how Confluent is being used in the financial services industry. Gain an understanding of how organisations are utilising data in motion to solve common problems and gain benefits from their real time data capabilities.
It will look more deeply into some specific use cases and show how Confluent technology is used to manage costs and mitigate risks.
This session is aimed at Solutions Architects, Sales Engineers and Pre Sales, and also the more technically minded business aligned people. Whilst this is not a deeply technical session, a level of knowledge around Kafka would be helpful.
This document discusses moving to an event-driven architecture using Confluent. It begins by outlining some of the limitations of traditional messaging middleware approaches. Confluent provides benefits like stream processing, persistence, scalability and reliability while avoiding issues like lack of structure, slow consumers, and technical debt. The document then discusses how Confluent can help modernize architectures, enable new real-time use cases, and reduce costs through migration. It provides examples of how companies like Advance Auto Parts and Nord/LB have benefitted from implementing Confluent platforms.
This session will show why the old paradigm does not work and that a new approach to the data strategy needs to be taken. It aims to show how a Data Streaming Platform is integral to the evolution of a company’s data strategy and how Confluent is not just an integration layer but the central nervous system for an organisation
You will also learn how to:
• Build products and features faster using a complete suite of connectors and stream-management tools, and connect your environments to data pipelines
• Protect your most critical data and workloads with built-in security, governance and resilience guarantees
• Deploy Kafka at scale in minutes while reducing the associated costs and operational burden
Confluent Partner Tech Talk with Synthesis, Confluent
A discussion on the arduous planning process, and deep dive into the design/architectural decisions.
Learn more about the networking, RBAC strategies, the automation, and the deployment plan.
Slide 29
What About Internal Converters
value.converter=org.apache.kafka.connect.json.JsonConverter
internal.value.converter=org.apache.kafka.connect.json.JsonConverter
key.internal.value.converter=org.apache.kafka.connect.json.JsonConverter
value.internal.value.converter=org.apache.kafka.connect.json.JsonConverter
key.internal.value.converter.bork.bork.bork=org.apache.kafka.connect.json.JsonConverter
key.internal.value.please.just.work.converter=org.apache.kafka.connect.json.JsonConverter
Internal converters are for internal use by Kafka Connect only, and have been deprecated as of Apache Kafka 2.0.
You should not change these; from Apache Kafka 2.0 onwards you will get warnings if you do try to configure them.
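What you should configure instead are the regular key and value converters; a minimal sketch (the Schema Registry URL is a placeholder):
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://schema-registry:8081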
Slide 37
Confluent Hub
An online library of pre-packaged and ready-to-install extensions or add-ons for Confluent Platform and Apache Kafka®:
● Connectors
● Transforms
● Converters
Easily install the components that suit your needs into your local environment with the Confluent Hub client command line tool: https://hub.confluent.io
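A typical installation with the Confluent Hub client is a single command (the connector coordinates here are illustrative):
confluent-hub install confluentinc/kafka-connect-elasticsearch:latest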
Slide 61
Error Handling & Dead Letter Queues (DLQ)
https://cnfl.io/connect-dlq
Handled:
● Convert (read/write from Kafka®, [de]serialisation)
● Transform
Not handled:
● Start (connections to a data store)
● Poll / Put (read/write from/to a data store)*
* Can be retried by Kafka® Connect
Slide 63
You Only Live Once...
[Diagram: source topic messages flowing into a sink; with the setting below, messages that fail are simply skipped]
errors.tolerance=all
https://cnfl.io/connect-dlq
Slide 64
Dead Letter Queue
[Diagram: source topic messages flow into the sink; failed messages are routed to a dead letter queue topic]
errors.tolerance=all
errors.deadletterqueue.topic.name=my_dlq
https://cnfl.io/connect-dlq
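Two related settings are worth knowing when you enable a DLQ (the values shown are illustrative):
errors.deadletterqueue.context.headers.enable=true   # record the failure reason in message headers
errors.deadletterqueue.topic.replication.factor=1    # needed on single-broker development clusters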
Slide 65
Re-processing the Dead Letter Queue
[Diagram: an Avro sink consumes the source topic; messages it cannot process land on the dead letter queue, from which a JSON sink re-processes them]
https://cnfl.io/connect-dlq
Slide 67
No fields found using key and value schemas for table: foo-bar
JsonDeserializer with schemas.enable requires "schema" and "payload" fields and may not contain additional fields
Slide 68
Schema, where are you...
No fields found using key and value schemas for table: foo-bar
JsonDeserializer with schemas.enable requires "schema" and "payload" fields and may not contain additional fields
Slide 70
Schema, where are you...
{
"id": 7,
"order_id": 7,
"customer_id": 849,
"order_total_usd": 98062.21,
"make": "Pontiac",
"model": "Aztek",
"delivery_city": "Leeds",
"delivery_company": "Price-Zieme",
"delivery_address": "754 Becker Way",
"CREATE_TS": "2019-05-09T13:42:28Z",
"UPDATE_TS": "2019-05-09T13:42:28Z"
}
Messages (JSON), no schema!
No fields found using key and value schemas for table: foo-bar
Slide 71
Schema, where are you...
You need a schema!
Use Avro, or use JSON with schemas.enable=true
Either way, you need to re-configure the producer of the data
Or, use KSQL to impose a schema on the JSON/CSV data and re-serialise it to Avro 👍🏻
No fields found using key and value schemas for table: foo-bar
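A minimal KSQL sketch of that re-serialisation approach (stream, topic and column names are illustrative):
CREATE STREAM orders_json (id INT, make VARCHAR, model VARCHAR)
  WITH (KAFKA_TOPIC='orders', VALUE_FORMAT='JSON');
-- re-serialise to Avro; downstream consumers now get a proper schema
CREATE STREAM orders_avro WITH (VALUE_FORMAT='AVRO') AS
  SELECT * FROM orders_json;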
Slide 72
Schema, where are you...
Schema (embedded per JSON message):
{
  "schema": {
    "type": "struct",
    "fields": [
      {
        "type": "int32",
        "optional": false,
        "field": "id"
      },
      [...]
    ],
    "optional": true,
    "name": "asgard.demo.ORDERS.Value"
  },
  "payload": {
    "id": 611,
    "order_id": 111,
    "customer_id": 851,
    "order_total_usd": 182190.93,
    "make": "Kia",
    "model": "Sorento",
    "delivery_city": "Swindon",
    "delivery_company": "Lehner, Kuvalis and Schaefer",
    "delivery_address": "31 Cardinal Junction",
    "CREATE_TS": "2019-05-09T13:54:59Z",
    "UPDATE_TS": "2019-05-09T13:54:59Z"
  }
}
Messages (JSON)
JsonDeserializer with schemas.enable requires "schema" and "payload" fields and may not contain additional fields
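To have Kafka Connect read this schema-plus-payload envelope, enable schemas on the JSON converter; a minimal sketch:
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=true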
Slide 74
11 November 2019: Steigenberger Frankfurter Hof (cnfl.io/cse19frankfurt)
13 November 2019: NOVOTEL Zürich City West (cnfl.io/cse19zurich)
Speakers:
● Ben Stopford, Office of the CTO, Confluent
● Axel Löhn, Senior Project Manager, Deutsche Bahn
● Kai Waehner, Technologist, Confluent
● Ralph Debusmann, IoT Solution Architect, Bosch Power Tools