ndependent of the source of data, the integration of event streams into an Enterprise Architecture gets more and more important in the world of sensors, social media streams and Internet of Things. Events have to be accepted quickly and reliably, they have to be distributed and analysed, often with many consumers or systems interested in all or part of the events. How can me make sure that all these event are accepted and forwarded in an efficient and reliable way? This is where Apache Kafaka comes into play, a distirbuted, highly-scalable messaging broker, build for exchanging huge amount of messages between a source and a target.
This session will start with an introduction into Apache and presents the role of Apache Kafka in a modern data / information architecture and the advantages it brings to the table. Additionally the Kafka ecosystem will be covered as well as the integration of Kafka in the Oracle Stack, with products such as Golden Gate, Service Bus and Oracle Stream Analytics all being able to act as a Kafka consumer or producer.
Apache Kafka - Scalable Message-Processing and more !Guido Schmutz
Presentation @ Oracle Code Berlin.
Independent of the source of data, the integration of event streams into an Enterprise Architecture gets more and more important in the world of sensors, social media streams and Internet of Things. Events have to be accepted quickly and reliably, they have to be distributed and analysed, often with many consumers or systems interested in all or part of the events. How can we make sure that all these events are accepted and forwarded in an efficient and reliable way? This is where Apache Kafaka comes into play, a distirbuted, highly-scalable messaging broker, build for exchanging huge amounts of messages between a source and a target. This session will start with an introduction of Apache and presents the role of Apache Kafka in a modern data / information architecture and the advantages it brings to the table.
Apache Kafka - Scalable Message-Processing and more !Guido Schmutz
Independent of the source of data, the integration of event streams into an Enterprise Architecture gets more and more important in the world of sensors, social media streams and Internet of Things. Events have to be accepted quickly and reliably, they have to be distributed and analysed, often with many consumers or systems interested in all or part of the events. How can me make sure that all these event are accepted and forwarded in an efficient and reliable way? This is where Apache Kafaka comes into play, a distirbuted, highly-scalable messaging broker, build for exchanging huge amount of messages between a source and a target.
This session will start with an introduction into Apache and presents the role of Apache Kafka in a modern data / information architecture and the advantages it brings to the table. Additionally the Kafka ecosystem will be covered as well as the integration of Kafka in the Oracle Stack, with products such as Golden Gate, Service Bus and Oracle Stream Analytics all being able to act as a Kafka consumer or producer.
Kafka and Storm - event processing in realtimeGuido Schmutz
Apache Kafka is publish-subscribe messaging rethought as a distributed commit log. It is designed to allow a single cluster to serve as the central data backbone for a large organization. It can be elastically and transparently expanded without downtime. Storm is a distributed realtime computation system. Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing. Storm has many use cases: realtime analytics, online machine learning, continuous computation, distributed RPC, ETL, and more. This session presents the main concepts of Kafka and Storm and then shows how a simple stream processing application is implemented using these two technologies.
Independent of the source of data, the integration of event streams into an Enterprise Architecture gets more and more important in the world of sensors, social media streams and Internet of Things. Events have to be accepted quickly and reliably, they have to be distributed and analyzed, often with many consumers or systems interested in all or part of the events. Storing such huge event streams into HDFS or a NoSQL datastore is feasible and not such a challenge anymore. But if you want to be able to react fast, with minimal latency, you can not afford to first store the data and doing the analysis/analytics later. You have to be able to include part of your analytics right after you consume the data streams. Products for doing event processing, such as Oracle Event Processing or Esper, are available for quite a long time and used to be called Complex Event Processing (CEP). In the past few years, another family of products appeared, mostly out of the Big Data Technology space, called Stream Processing or Streaming Analytics. These are mostly open source products/frameworks such as Apache Storm, Spark Streaming, Flink, Kafka Streams as well as supporting infrastructures such as Apache Kafka. In this talk I will present the theoretical foundations for Stream Processing, discuss the core properties a Stream Processing platform should provide and highlight what differences you might find between the more traditional CEP and the more modern Stream Processing solutions.
Apache Kafka - Scalable Message-Processing and more !Guido Schmutz
Presentation @ Oracle Code Berlin.
Independent of the source of data, the integration of event streams into an Enterprise Architecture gets more and more important in the world of sensors, social media streams and Internet of Things. Events have to be accepted quickly and reliably, they have to be distributed and analysed, often with many consumers or systems interested in all or part of the events. How can we make sure that all these events are accepted and forwarded in an efficient and reliable way? This is where Apache Kafaka comes into play, a distirbuted, highly-scalable messaging broker, build for exchanging huge amounts of messages between a source and a target. This session will start with an introduction of Apache and presents the role of Apache Kafka in a modern data / information architecture and the advantages it brings to the table.
Apache Kafka - Scalable Message-Processing and more !Guido Schmutz
Independent of the source of data, the integration of event streams into an Enterprise Architecture gets more and more important in the world of sensors, social media streams and Internet of Things. Events have to be accepted quickly and reliably, they have to be distributed and analysed, often with many consumers or systems interested in all or part of the events. How can me make sure that all these event are accepted and forwarded in an efficient and reliable way? This is where Apache Kafaka comes into play, a distirbuted, highly-scalable messaging broker, build for exchanging huge amount of messages between a source and a target.
This session will start with an introduction into Apache and presents the role of Apache Kafka in a modern data / information architecture and the advantages it brings to the table. Additionally the Kafka ecosystem will be covered as well as the integration of Kafka in the Oracle Stack, with products such as Golden Gate, Service Bus and Oracle Stream Analytics all being able to act as a Kafka consumer or producer.
Kafka and Storm - event processing in realtimeGuido Schmutz
Apache Kafka is publish-subscribe messaging rethought as a distributed commit log. It is designed to allow a single cluster to serve as the central data backbone for a large organization. It can be elastically and transparently expanded without downtime. Storm is a distributed realtime computation system. Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing. Storm has many use cases: realtime analytics, online machine learning, continuous computation, distributed RPC, ETL, and more. This session presents the main concepts of Kafka and Storm and then shows how a simple stream processing application is implemented using these two technologies.
Independent of the source of data, the integration of event streams into an Enterprise Architecture gets more and more important in the world of sensors, social media streams and Internet of Things. Events have to be accepted quickly and reliably, they have to be distributed and analyzed, often with many consumers or systems interested in all or part of the events. Storing such huge event streams into HDFS or a NoSQL datastore is feasible and not such a challenge anymore. But if you want to be able to react fast, with minimal latency, you can not afford to first store the data and doing the analysis/analytics later. You have to be able to include part of your analytics right after you consume the data streams. Products for doing event processing, such as Oracle Event Processing or Esper, are available for quite a long time and used to be called Complex Event Processing (CEP). In the past few years, another family of products appeared, mostly out of the Big Data Technology space, called Stream Processing or Streaming Analytics. These are mostly open source products/frameworks such as Apache Storm, Spark Streaming, Flink, Kafka Streams as well as supporting infrastructures such as Apache Kafka. In this talk I will present the theoretical foundations for Stream Processing, discuss the core properties a Stream Processing platform should provide and highlight what differences you might find between the more traditional CEP and the more modern Stream Processing solutions.
Hello, kafka! (an introduction to apache kafka)Timothy Spann
Hello ApacheKafka
An Introduction to Apache Kafka with Timothy Spann and Carolyn Duby Cloudera Principal engineers.
We also demo Flink SQL, SMM, SSB, Schema Registry, Apache Kafka, Apache NiFi and Public Cloud - AWS.
Apache Kafka Scalable Message Processing and more! Guido Schmutz
media streams and Internet of Things. Events have to be accepted quickly and reliably, they have to be distributed and analysed, often with many consumers or systems interested in all or part of the events. How can me make sure that all these event are accepted and forwarded in an efficient and reliable way? This is where Apache Kafaka comes into play, a distirbuted, highly-scalable messaging broker, build for exchanging huge amount of messages between a source and a target.
This session will start with an introduction into Apache and presents the role of Apache Kafka in a modern data / information architecture and the advantages it brings to the table. Additionally the Kafka ecosystem will be covered as well as the integration of Kafka in the Oracle Stack, with products such as Golden Gate, Service Bus and Oracle Stream Analytics all being able to act as a Kafka consumer or producer.
Whether you are developing a greenfield data project or migrating a legacy system,
there are many critical design decisions to be made. Often, it is advantageous to not only
consider immediate requirements, but also the future requirements and technologies you may
want to support. Your project may start out supporting batch analytics with the vision of adding
realtime support. Or your data pipeline may feed data to one technology today, but tomorrow
an entirely new system needs to be integrated. Apache Kafka can help decouple these
decisions and provide a flexible core to your data architecture. This talk will show how building
Kafka into your pipeline can provide the flexibility to experiment, evolve and grow. It will also
cover a brief overview of Kafka, its architecture, and terminology.
Apache Kafka - Scalable Message-Processing and more !Guido Schmutz
Independent of the source of data, the integration of event streams into an Enterprise Architecture gets more and more important in the world of sensors, social media streams and Internet of Things. Events have to be accepted quickly and reliably, they have to be distributed and analysed, often with many consumers or systems interested in all or part of the events. How can me make sure that all these event are accepted and forwarded in an efficient and reliable way? This is where Apache Kafaka comes into play, a distirbuted, highly-scalable messaging broker, build for exchanging huge amount of messages between a source and a target.
This session will start with an introduction into Apache and presents the role of Apache Kafka in a modern data / information architecture and the advantages it brings to the table. Additionally the Kafka ecosystem will be covered as well as the integration of Kafka in the Oracle Stack, with products such as Golden Gate, Service Bus and Oracle Stream Analytics all being able to act as a Kafka consumer or producer.
Building Machine Learning inference pipelines at scale | AWS Summit Tel Aviv ...AWS Summits
Real-life Machine Learning (ML) workloads typically require more than training and predicting: data often needs to be pre-processed and post-processed, sometimes in multiple steps. Thus, developers and data scientists have to train and deploy not just a single algorithm, but a sequence of algorithms that will collaborate in delivering predictions from raw data. In this session, we’ll first show you how to use Apache Spark MLlib to build ML pipelines, and we’ll discuss scaling options when datasets grow huge. We’ll then show how to how implement inference pipelines on Amazon SageMaker, using Apache Spark, Scikit-learn, as well as ML algorithms implemented by Amazon.
Apache Kafka - Scalable Message Processing and more!Guido Schmutz
In the world of sensors and social media streams, the integration and handling of high-volume event streams is more important than ever. Events have to be handled both efficiently and reliably and often many consumers or systems are interested in all or part of the events. How do we make sure that all these event are accepted and forwarded in an efficient and reliable way? Apache Kafka, a distributed, highly-scalable messaging broker, build for exchanging huge amount of messages between a source and a target can be of great help in such scenario.
This session introduces Apache Kafka and its place in a modern architecture, shows its integration with Oracle Stack and presents the Oracle Event Hub cloud service, the managed Kafka service.
Building Event-Driven Systems with Apache KafkaBrian Ritchie
Event-driven systems provide simplified integration, easy notifications, inherent scalability and improved fault tolerance. In this session we'll cover the basics of building event driven systems and then dive into utilizing Apache Kafka for the infrastructure. Kafka is a fast, scalable, fault-taulerant publish/subscribe messaging system developed by LinkedIn. We will cover the architecture of Kafka and demonstrate code that utilizes this infrastructure including C#, Spark, ELK and more.
Sample code: https://github.com/dotnetpowered/StreamProcessingSample
Big Data Day LA 2015 - Introduction to Apache Kafka - The Big Data Message Bu...Data Con LA
Big Data systems keeps getting bigger. Types and counts of components involved in the system to get critical answers for businesses just keep increasing. It is like increasing number of houses or businesses in any metropolitan city. With more places people can be, more transportation is required. Requirement of more transportation is either met by more cars or a better public transportation system. More individual vehicles add to the traffic chaos. Apache Kafka is like public transportation system of the city of Big Data or even better a distributed system. This talk will go over basic overview of Apache Kafka, various components involved in it and the What and the How of the problem it solves.
Introduction to Apache Kafka and Confluent... and why they matterconfluent
Milano Apache Kafka Meetup by Confluent (First Italian Kafka Meetup) on Wednesday, November 29th 2017.
Il talk introduce Apache Kafka (incluse le APIs Kafka Connect e Kafka Streams), Confluent (la società creata dai creatori di Kafka) e spiega perché Kafka è un'ottima e semplice soluzione per la gestione di stream di dati nel contesto di due delle principali forze trainanti e trend industriali: Internet of Things (IoT) e Microservices.
Apache Kafka is an open-source message broker project developed by the Apache Software Foundation written in Scala. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds.
Apache Kafka becoming the message bus to transfer huge volumes of data from various sources into Hadoop.
It's also enabling many real-time system frameworks and use cases.
Managing and building clients around Apache Kafka can be challenging. In this talk, we will go through the best practices in deploying Apache Kafka
in production. How to Secure a Kafka Cluster, How to pick topic-partitions and upgrading to newer versions. Migrating to new Kafka Producer and Consumer API.
Also talk about the best practices involved in running a producer/consumer.
In Kafka 0.9 release, we’ve added SSL wire encryption, SASL/Kerberos for user authentication, and pluggable authorization. Now Kafka allows authentication of users, access control on who can read and write to a Kafka topic. Apache Ranger also uses pluggable authorization mechanism to centralize security for Kafka and other Hadoop ecosystem projects.
We will showcase open sourced Kafka REST API and an Admin UI that will help users in creating topics, re-assign partitions, Issuing
Kafka ACLs and monitoring Consumer offsets.
Most data visualisation solutions today still work on data sources which are stored persistently in a data store, using the so called “data at rest” paradigms. More and more data sources today provide a constant stream of data, from IoT devices to Social Media streams. These data stream publish with high velocity and messages often have to be processed as quick as possible. For the processing and analytics on the data, so called stream processing solutions are available. But these only provide minimal or no visualisation capabilities. One was is to first persist the data into a data store and then use a traditional data visualisation solution to present the data.
If latency is not an issue, such a solution might be good enough. An other question is which data store solution is necessary to keep up with the high load on write and read. If it is not an RDBMS but an NoSQL database, then not all traditional visualisation tools might already integrate with the specific data store. An other option is to use a Streaming Visualisation solution. They are specially built for streaming data and often do not support batch data. A much better solution would be to have one tool capable of handling both, batch and streaming data. This talk presents different architecture blueprints for integrating data visualisation into a fast data solution and highlights some of the products available to implement these blueprints.
IoT Architecture - Are Traditional Architectures Good Enough or do we Need Ne...Guido Schmutz
Independent of the source of data, the integration of event streams into an Enterprise Architecture gets more and more important in the world of sensors, social media streams and Internet of Things. Events have to be accepted quickly and reliably, they have to be distributed and analysed, often with many consumers or systems interested in all or part of the events. Dependent on the size and quantity of such events, this can quickly be in the range of Big Data. How can we efficiently collect and transmit these events? How can we make sure that we can always report over historical events? How can these new events be integrated into traditional infrastructure and application landscape?
Starting with a product and technology neutral reference architecture, we will then present different solutions using Open Source frameworks and the Oracle Stack both for on premises as well as the cloud.
Reliable Data Intestion in BigData / IoTGuido Schmutz
Many of the Big Data and IoT use cases are based on combing data from multiple data sources and to make them available on a Big Data platform for analysis. The data sources are often very heterogeneous, from simple files, databases to high-volume event streams from sensors (IoT devices). It’s important to retrieve this data in a secure and reliable manner and integrate it with the Big Data platform so that it is available for analysis in real-time (stream processing) as well as in batch (typical big data processing). In past some new tools have emerged, which are especially capable of handling the process of integrating data from outside, often called Data Ingestion. From an outside perspective, they are very similar to a traditional Enterprise Service Bus infrastructures, which in larger organization are often in use to handle message-driven and service-oriented systems. But there are also important differences, they are typically easier to scale in a horizontal fashion, offer a more distributed setup, are capable of handling high-volumes of data/messages, provide a very detailed monitoring on message level and integrate very well with the Hadoop ecosystem. This session will present and compare Apache Flume, Apache NiFi, StreamSets and the Kafka Ecosystem and show how they handle the data ingestion in a Big Data solution architecture.
Hello, kafka! (an introduction to apache kafka)Timothy Spann
Hello ApacheKafka
An Introduction to Apache Kafka with Timothy Spann and Carolyn Duby Cloudera Principal engineers.
We also demo Flink SQL, SMM, SSB, Schema Registry, Apache Kafka, Apache NiFi and Public Cloud - AWS.
Apache Kafka Scalable Message Processing and more! Guido Schmutz
media streams and Internet of Things. Events have to be accepted quickly and reliably, they have to be distributed and analysed, often with many consumers or systems interested in all or part of the events. How can me make sure that all these event are accepted and forwarded in an efficient and reliable way? This is where Apache Kafaka comes into play, a distirbuted, highly-scalable messaging broker, build for exchanging huge amount of messages between a source and a target.
This session will start with an introduction into Apache and presents the role of Apache Kafka in a modern data / information architecture and the advantages it brings to the table. Additionally the Kafka ecosystem will be covered as well as the integration of Kafka in the Oracle Stack, with products such as Golden Gate, Service Bus and Oracle Stream Analytics all being able to act as a Kafka consumer or producer.
Whether you are developing a greenfield data project or migrating a legacy system,
there are many critical design decisions to be made. Often, it is advantageous to not only
consider immediate requirements, but also the future requirements and technologies you may
want to support. Your project may start out supporting batch analytics with the vision of adding
realtime support. Or your data pipeline may feed data to one technology today, but tomorrow
an entirely new system needs to be integrated. Apache Kafka can help decouple these
decisions and provide a flexible core to your data architecture. This talk will show how building
Kafka into your pipeline can provide the flexibility to experiment, evolve and grow. It will also
cover a brief overview of Kafka, its architecture, and terminology.
Apache Kafka - Scalable Message-Processing and more !Guido Schmutz
Independent of the source of data, the integration of event streams into an Enterprise Architecture gets more and more important in the world of sensors, social media streams and Internet of Things. Events have to be accepted quickly and reliably, they have to be distributed and analysed, often with many consumers or systems interested in all or part of the events. How can me make sure that all these event are accepted and forwarded in an efficient and reliable way? This is where Apache Kafaka comes into play, a distirbuted, highly-scalable messaging broker, build for exchanging huge amount of messages between a source and a target.
This session will start with an introduction into Apache and presents the role of Apache Kafka in a modern data / information architecture and the advantages it brings to the table. Additionally the Kafka ecosystem will be covered as well as the integration of Kafka in the Oracle Stack, with products such as Golden Gate, Service Bus and Oracle Stream Analytics all being able to act as a Kafka consumer or producer.
Building Machine Learning inference pipelines at scale | AWS Summit Tel Aviv ...AWS Summits
Real-life Machine Learning (ML) workloads typically require more than training and predicting: data often needs to be pre-processed and post-processed, sometimes in multiple steps. Thus, developers and data scientists have to train and deploy not just a single algorithm, but a sequence of algorithms that will collaborate in delivering predictions from raw data. In this session, we’ll first show you how to use Apache Spark MLlib to build ML pipelines, and we’ll discuss scaling options when datasets grow huge. We’ll then show how to how implement inference pipelines on Amazon SageMaker, using Apache Spark, Scikit-learn, as well as ML algorithms implemented by Amazon.
Apache Kafka - Scalable Message Processing and more!Guido Schmutz
In the world of sensors and social media streams, the integration and handling of high-volume event streams is more important than ever. Events have to be handled both efficiently and reliably and often many consumers or systems are interested in all or part of the events. How do we make sure that all these event are accepted and forwarded in an efficient and reliable way? Apache Kafka, a distributed, highly-scalable messaging broker, build for exchanging huge amount of messages between a source and a target can be of great help in such scenario.
This session introduces Apache Kafka and its place in a modern architecture, shows its integration with Oracle Stack and presents the Oracle Event Hub cloud service, the managed Kafka service.
Building Event-Driven Systems with Apache KafkaBrian Ritchie
Event-driven systems provide simplified integration, easy notifications, inherent scalability and improved fault tolerance. In this session we'll cover the basics of building event driven systems and then dive into utilizing Apache Kafka for the infrastructure. Kafka is a fast, scalable, fault-taulerant publish/subscribe messaging system developed by LinkedIn. We will cover the architecture of Kafka and demonstrate code that utilizes this infrastructure including C#, Spark, ELK and more.
Sample code: https://github.com/dotnetpowered/StreamProcessingSample
Big Data Day LA 2015 - Introduction to Apache Kafka - The Big Data Message Bu...Data Con LA
Big Data systems keeps getting bigger. Types and counts of components involved in the system to get critical answers for businesses just keep increasing. It is like increasing number of houses or businesses in any metropolitan city. With more places people can be, more transportation is required. Requirement of more transportation is either met by more cars or a better public transportation system. More individual vehicles add to the traffic chaos. Apache Kafka is like public transportation system of the city of Big Data or even better a distributed system. This talk will go over basic overview of Apache Kafka, various components involved in it and the What and the How of the problem it solves.
Introduction to Apache Kafka and Confluent... and why they matterconfluent
Milano Apache Kafka Meetup by Confluent (First Italian Kafka Meetup) on Wednesday, November 29th 2017.
Il talk introduce Apache Kafka (incluse le APIs Kafka Connect e Kafka Streams), Confluent (la società creata dai creatori di Kafka) e spiega perché Kafka è un'ottima e semplice soluzione per la gestione di stream di dati nel contesto di due delle principali forze trainanti e trend industriali: Internet of Things (IoT) e Microservices.
Apache Kafka is an open-source message broker project developed by the Apache Software Foundation written in Scala. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds.
Apache Kafka becoming the message bus to transfer huge volumes of data from various sources into Hadoop.
It's also enabling many real-time system frameworks and use cases.
Managing and building clients around Apache Kafka can be challenging. In this talk, we will go through the best practices in deploying Apache Kafka
in production. How to Secure a Kafka Cluster, How to pick topic-partitions and upgrading to newer versions. Migrating to new Kafka Producer and Consumer API.
Also talk about the best practices involved in running a producer/consumer.
In Kafka 0.9 release, we’ve added SSL wire encryption, SASL/Kerberos for user authentication, and pluggable authorization. Now Kafka allows authentication of users, access control on who can read and write to a Kafka topic. Apache Ranger also uses pluggable authorization mechanism to centralize security for Kafka and other Hadoop ecosystem projects.
We will showcase open sourced Kafka REST API and an Admin UI that will help users in creating topics, re-assign partitions, Issuing
Kafka ACLs and monitoring Consumer offsets.
Most data visualisation solutions today still work on data sources which are stored persistently in a data store, using the so called “data at rest” paradigms. More and more data sources today provide a constant stream of data, from IoT devices to Social Media streams. These data stream publish with high velocity and messages often have to be processed as quick as possible. For the processing and analytics on the data, so called stream processing solutions are available. But these only provide minimal or no visualisation capabilities. One was is to first persist the data into a data store and then use a traditional data visualisation solution to present the data.
If latency is not an issue, such a solution might be good enough. An other question is which data store solution is necessary to keep up with the high load on write and read. If it is not an RDBMS but an NoSQL database, then not all traditional visualisation tools might already integrate with the specific data store. An other option is to use a Streaming Visualisation solution. They are specially built for streaming data and often do not support batch data. A much better solution would be to have one tool capable of handling both, batch and streaming data. This talk presents different architecture blueprints for integrating data visualisation into a fast data solution and highlights some of the products available to implement these blueprints.
IoT Architecture - Are Traditional Architectures Good Enough or do we Need Ne...Guido Schmutz
Independent of the source of data, the integration of event streams into an Enterprise Architecture gets more and more important in the world of sensors, social media streams and Internet of Things. Events have to be accepted quickly and reliably, they have to be distributed and analysed, often with many consumers or systems interested in all or part of the events. Dependent on the size and quantity of such events, this can quickly be in the range of Big Data. How can we efficiently collect and transmit these events? How can we make sure that we can always report over historical events? How can these new events be integrated into traditional infrastructure and application landscape?
Starting with a product and technology neutral reference architecture, we will then present different solutions using Open Source frameworks and the Oracle Stack both for on premises as well as the cloud.
Reliable Data Intestion in BigData / IoTGuido Schmutz
Many of the Big Data and IoT use cases are based on combing data from multiple data sources and to make them available on a Big Data platform for analysis. The data sources are often very heterogeneous, from simple files, databases to high-volume event streams from sensors (IoT devices). It’s important to retrieve this data in a secure and reliable manner and integrate it with the Big Data platform so that it is available for analysis in real-time (stream processing) as well as in batch (typical big data processing). In past some new tools have emerged, which are especially capable of handling the process of integrating data from outside, often called Data Ingestion. From an outside perspective, they are very similar to a traditional Enterprise Service Bus infrastructures, which in larger organization are often in use to handle message-driven and service-oriented systems. But there are also important differences, they are typically easier to scale in a horizontal fashion, offer a more distributed setup, are capable of handling high-volumes of data/messages, provide a very detailed monitoring on message level and integrate very well with the Hadoop ecosystem. This session will present and compare Apache Flume, Apache NiFi, StreamSets and the Kafka Ecosystem and show how they handle the data ingestion in a Big Data solution architecture.
Independent of the source of data, the integration and analysis of event streams gets more important in the world of sensors, social media streams and Internet of Things. Events have to be accepted quickly and reliably, they have to be distributed and analysed, often with many consumers or systems interested in all or part of the events.
So far this mostly a development experience, with frameworks such as Oracle Event Processing, Apache Storm or Spark Streaming. With Oracle Stream Analytics, analytics on event streams can be put in the hands of the business analyst. It simplifies the implementation of event processing solutions so that every business analyst is able to graphically and decleratively define event stream processing pipelines, without having to write a single line of code or continous query language (CQL). Event Processing is no longer “complex”! This session presents Oracle Stream Analytics directly on some selected demo use cases.
Independent of the source of data, the integration of event streams into an Enterprise Architecture gets more and more important in the world of sensors, social media streams and Internet of Things. Events have to be accepted quickly and reliably, they have to be distributed and analysed, often with many consumers or systems interested in all or part of the events. Storing such huge event streams into HDFS or a NoSQL datastore is feasible and not such a challenge anymore. But if you want to be able to react fast, with minimal latency, you can not afford to first store the data and doing the analysis/analytics later. You have to be able to include part of your analytics right after you consume the event streams. Products for doing event processing, such as Oracle Event Processing or Esper, are avaialble for quite a long time and also used to be called Complex Event Processing (CEP). In the last 3 years, another family of products appeared, mostly out of the Big Data Technology space, called Stream Processing or Streaming Analytics. These are mostly open source products/frameworks such as Apache Storm, Spark Streaming, Apache Samza as well as supporting infrastructures such as Apache Kafka. In this talk I will present the theoretical foundations for Event and Stream Processing and present what differences you might find between the more traditional CEP and the more modern Stream Processing solutions and show that a combination will bring the most value.
The right architecture is key for any IT project. This is especially the case for big data projects, where there are no standard architectures which have proven their suitability over years. This session discusses the different Big Data Architectures which have evolved over time, including traditional Big Data Architecture, Streaming Analytics architecture as well as Lambda and Kappa architecture and presents the mapping of components from both Open Source as well as the Oracle stack onto these architectures.
The right architecture is key for any IT project. This is valid in the case for big data projects as well, but on the other hand there are not yet many standard architectures which have proven their suitability over years.
This session discusses different Big Data Architectures which have evolved over time, including traditional Big Data Architecture, Event Driven architecture as well as Lambda and Kappa architecture.
Each architecture is presented in a vendor- and technology-independent way using a standard architecture blueprint. In a second step, these architecture blueprints are used to show how a given architecture can support certain use cases and which popular open source technologies can help to implement a solution based on a given architecture.
Independent of the source of data, the integration of event streams into an Enterprise Architecture gets more and more important in the world of sensors, social media streams and Internet of Things. Events have to be accepted quickly and reliably, they have to be distributed and analysed, often with many consumers or systems interested in all or part of the events. Dependent on the size and quantity of such events, this can quickly be in the range of Big Data. How can we efficiently collect and transmit these events? How can we make sure that we can always report over historical events? How can these new events be integrated into traditional infrastructure and application landscape?
Starting with a product and technology neutral reference architecture, we will then present different solutions using Open Source frameworks and the Oracle Stack both for on premises as well as the cloud.
Data Pipelines Made Simple with Apache Kafkaconfluent
Presentation by Ewen Cheslack-Postava, Engineer, Apache Kafka Committer, Confluent
In streaming workloads, often times data produced at the source is not useful down the pipeline or it requires some transformation to get it into usable shape. Similarly, where sensitive data is concerned, filtering of topics is helpful to ensure that the wrong data doesn't get to the wrong place.
The newest release of Apache Kafka now offers the ability to do transformations on individual messages, making is possible to implement finer grained transformations customized to your unique needs. In this session we’ll talk about the new single message transform capabilities, how to use them to implement things like data masking and advanced partitioning, and when you’ll need to use more complex tools like the Kafka Streams API instead.
user Behavior Analysis with Session Windows and Apache Kafka's Streams APIconfluent
For many industries the need to group together related events based on a period of activity or inactivity is key. Advertising businesses, content producers are just a few examples of where session windows can be used to better understand user behavior.
While such sessionization has been possible in Apache Kafka up to this point, implementing it has been rather complex and required leveraging low-level APIs. In the most recent release of Kafka, however, new capabilities have been added making session windows much easier to implement.
In this online talk, we’ll introduce the concept of a session window, talk about common use cases, and walk through how Apache Kafka can be used for session-oriented use cases.
Real Time Analytics with Apache Cassandra - Cassandra Day MunichGuido Schmutz
Time series data is everywhere: IoT, sensor data or financial transactions. The industry has moved to databases like Cassandra to handle the high velocity and high volume of data that is now common place. In this talk I will present how we have used Cassandra to store time series data. I will highlight both the Cassandra data model as well as the architecture we put in place for collecting and ingesting data into Cassandra, using Apache Kafka and Apache Storm.
Building Event-Driven Services with Apache Kafkaconfluent
Should you use REST to sew services together? Is it better to use a richer, brokered protocol? This practical talk will dig into how we piece services together in event driven systems, how we we use a distributed log to create a central, persistent narrative and what benefits we reap from doing so.
The presentation is about the kafka 9 features and the bug fixes.
The presentation also talks about the new consumer. Explains the state diagram of consumer and co-ordinator
This tutorial was presented in KDD 2016 conference in San Francisco, CA. You can find the main presentation at http://www.slideshare.net/NeeraAgarwal2/streaming-analytics
Real-time Stream Processing with Apache Flink @ Hadoop SummitGyula Fóra
Apache Flink is an open source project that offers both batch and stream processing on top of a common runtime and exposing a common API. This talk focuses on the stream processing capabilities of Flink.
RBea: Scalable Real-Time Analytics at KingGyula Fóra
This talk introduces RBEA (Rule-Based Event Aggregator), the scalable real-time analytics platform developed by King’s Streaming Platform team. We have built RBEA to make real-time analytics easily accessible to game teams across King without having to worry about operational details. RBEA is built on top of Apache Flink and uses the framework’s capabilities to it’s full potential in order to provide highly scalable stateful and windowed processing logic for the analytics applications. We will talk about how we have built a high-level DSL on the abstractions provided by Flink and how we tackled different technical challenges that have come up while developing the system.
Real Time Analytics with Apache Cassandra - Cassandra Day BerlinGuido Schmutz
Time series data is everywhere: IoT, sensor data or financial transactions. The industry has moved to databases like Cassandra to handle the high velocity and high volume of data that is now common place. In this talk I will present how we have used Cassandra to store time series data. I will highlight both the Cassandra data model as well as the architecture we put in place for collecting and ingesting data into Cassandra, using Apache Kafka and Apache Storm.
AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...GeeksLab Odessa
4.6.16 AI&BigData Lab
Upcoming events: goo.gl/I2gJ4H
Как устроить анализ данных 40 млн. человек за 5 лет так, чтобы это выглядело почти в реальном времени.
In this presentation Guido Schmutz talks about Apache Kafka, Kafka Core, Kafka Connect, Kafka Streams, Kafka and "Big Data"/"Fast Data Ecosystems, Confluent Data Platform and Kafka in Architecture.
Kafka Connect & Kafka Streams/KSQL - the ecosystem around KafkaGuido Schmutz
After a quick overview and introduction of Apache Kafka, this session cover two components which extend the core of Apache Kafka: Kafka Connect and Kafka Streams/KSQL.
Kafka Connects role is to access data from the out-side-world and make it available inside Kafka by publishing it into a Kafka topic. On the other hand, Kafka Connect is also responsible to transport information from inside Kafka to the outside world, which could be a database or a file system. There are many existing connectors for different source and target systems available out-of-the-box, either provided by the community or by Confluent or other vendors. You simply configure these connectors and off you go.
Kafka Streams is a light-weight component which extends Kafka with stream processing functionality. By that, Kafka can now not only reliably and scalable transport events and messages through the Kafka broker but also analyse and process these event in real-time. Interestingly Kafka Streams does not provide its own cluster infrastructure and it is also not meant to run on a Kafka cluster. The idea is to run Kafka Streams where it makes sense, which can be inside a “normal” Java application, inside a Web container or on a more modern containerized (cloud) infrastructure, such as Mesos, Kubernetes or Docker. Kafka Streams has a lot of interesting features, such as reliable state handling, queryable state and much more. KSQL is a streaming engine for Apache Kafka, providing a simple and completely interactive SQL interface for processing data in Kafka.
Big Data Streams Architectures. Why? What? How?Anton Nazaruk
With a current zoo of technologies and different ways of their interaction it's a big challenge to architect a system (or adopt existed one) that will conform to low-latency BigData analysis requirements. Apache Kafka and Kappa Architecture in particular take more and more attention over classic Hadoop-centric technologies stack. New Consumer API put significant boost in this direction. Microservices-based streaming processing and new Kafka Streams tend to be a synergy in BigData world.
Applying ML on your Data in Motion with AWS and Confluent | Joseph Morais, Co...HostedbyConfluent
Event-driven application architectures are becoming increasingly common as a large number of users demand more interactive, real-time, and intelligent responses. Yet it can be challenging to decide how to capture and perform real-time data analysis and deliver differentiating experiences. Join experts from Confluent and AWS to learn how to build Apache Kafka®-based streaming applications backed by machine learning models. Adopting the recommendations will help you establish repeatable patterns for high performing event-based apps.
Streaming Data Ingest and Processing with Apache KafkaAttunity
Apache™ Kafka is a fast, scalable, durable, and fault-tolerant
publish-subscribe messaging system. It offers higher throughput, reliability and replication. To manage growing data volumes, many companies are leveraging Kafka for streaming data ingest and processing.
Join experts from Confluent, the creators of Apache™ Kafka, and the experts at Attunity, a leader in data integration software, for a live webinar where you will learn how to:
-Realize the value of streaming data ingest with Kafka
-Turn databases into live feeds for streaming ingest and processing
-Accelerate data delivery to enable real-time analytics
-Reduce skill and training requirements for data ingest
The recorded webinar on slide 32 includes a demo using automation software (Attunity Replicate) to stream live changes from a database into Kafka and also includes a Q&A with our experts.
For more information, please go to www.attunity.com/kafka.
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Guido Schmutz
Independent of the source of data, the integration and analysis of event streams gets more important in the world of sensors, social media streams and Internet of Things. Events have to be accepted quickly and reliably, they have to be distributed and analyzed, often with many consumers or systems interested in all or part of the events. In this session we compare two popular Streaming Analytics solutions: Spark Streaming and Kafka Streams.
Spark is fast and general engine for large-scale data processing and has been designed to provide a more efficient alternative to Hadoop MapReduce. Spark Streaming brings Spark's language-integrated API to stream processing, letting you write streaming applications the same way you write batch jobs. It supports both Java and Scala.
Kafka Streams is the stream processing solution which is part of Kafka. It is provided as a Java library and by that can be easily integrated with any Java application.
Modern Cloud-Native Streaming Platforms: Event Streaming Microservices with A...confluent
Microservices, events, containers, and orchestrators are dominating our vernacular today. As operations teams adapt to support these technologies in production, cloud-native platforms like Pivotal Cloud Foundry and Kubernetes have quickly risen to serve as force multipliers of automation, productivity and value.
Apache Kafka® is providing developers a critically important component as they build and modernize applications to cloud-native architecture.
This talk will explore:
• Why cloud-native platforms and why run Apache Kafka on Kubernetes?
• What kind of workloads are best suited for this combination?
• Tips to determine the path forward for legacy monoliths in your application portfolio
• Demo: Running Apache Kafka as a Streaming Platform on Kubernetes
Data & analytics challenges in a microservice architectureNiels Naglé
DataSaturday 2019 session:
Domain driven design, microservices, event-driven, polyglot data storage. All popular developments within software architecture to realize modular and ultra-scalable solutions. But what is the impact on the Data & Analytics side? So how to contain a global vision on the data and processes when every service contains their own logic, data and enrichments? which data is leading? How to avoid conflicts? So what do these architectures mean for Data & Analytics.
Data Streaming with Apache Kafka & MongoDBconfluent
Explore the use-cases and architecture for Apache Kafka, and how it integrates with MongoDB to build sophisticated data-driven applications that exploit new sources of data.
The need for gleaning answers from unbounded data streams is moving from nicety to a necessity. Netflix is a data driven company, and has a need to process over 1 trillion events a day amounting to 3 PB of data to derive business insights.
To ease extracting insight, we are building a self-serve, scalable, fault-tolerant, multi-tenant "Stream Processing as a Service" platform so the user can focus on data analysis. I'll share our experience using Flink to help build the platform.
Apache Kafka - Scalable Message Processing and more!Guido Schmutz
After a quick overview and introduction of Apache Kafka, this session cover two components which extend the core of Apache Kafka: Kafka Connect and Kafka Streams/KSQL.
Kafka Connects role is to access data from the out-side-world and make it available inside Kafka by publishing it into a Kafka topic. On the other hand, Kafka Connect is also responsible to transport information from inside Kafka to the outside world, which could be a database or a file system. There are many existing connectors for different source and target systems available out-of-the-box, either provided by the community or by Confluent or other vendors. You simply configure these connectors and off you go.
Kafka Streams is a light-weight component which extends Kafka with stream processing functionality. By that, Kafka can now not only reliably and scalable transport events and messages through the Kafka broker but also analyse and process these event in real-time. Interestingly Kafka Streams does not provide its own cluster infrastructure and it is also not meant to run on a Kafka cluster. The idea is to run Kafka Streams where it makes sense, which can be inside a “normal” Java application, inside a Web container or on a more modern containerized (cloud) infrastructure, such as Mesos, Kubernetes or Docker. Kafka Streams has a lot of interesting features, such as reliable state handling, queryable state and much more. KSQL is a streaming engine for Apache Kafka, providing a simple and completely interactive SQL interface for processing data in Kafka.
Data Streaming with Apache Kafka & MongoDB - EMEAAndrew Morgan
A new generation of technologies is needed to consume and exploit today's real time, fast moving data sources. Apache Kafka, originally developed at LinkedIn, has emerged as one of these key new technologies.
This webinar explores the use-cases and architecture for Kafka, and how it integrates with MongoDB to build sophisticated data-driven applications that exploit new sources of data.
Webinar: Data Streaming with Apache Kafka & MongoDBMongoDB
A new generation of technologies is needed to consume and exploit today's real time, fast moving data sources. Apache Kafka, originally developed at LinkedIn, has emerged as one of these key new technologies.
OSSNA Building Modern Data Streaming AppsTimothy Spann
OSSNA
Building Modern Data Streaming Apps
https://ossna2023.sched.com/event/1Jt05/virtual-building-modern-data-streaming-apps-with-open-source-timothy-spann-streamnative
Timothy Spann
Cloudera
Principal Developer Advocate
Data in Motion
In my session, I will show you some best practices I have discovered over the last seven years in building data streaming applications, including IoT, CDC, Logs, and more. In my modern approach, we utilize several open-source frameworks to maximize all the best features. We often start with Apache NiFi as the orchestrator of streams flowing into Apache Pulsar. From there, we build streaming ETL with Apache Spark and enhance events with Pulsar Functions for ML and enrichment. We make continuous queries against our topics with Flink SQL. We will stream data into various open-source data stores, including Apache Iceberg, Apache Pinot, and others. We use the best streaming tools for the current applications with the open source stack - FLiPN. https://www.flipn.app/ Updates: This will be in-person with live coding based on feedback from the crowd. This will also include new data stores, new sources, and data relevant to and from the Vancouver area. This will also include updates to the platforms and inclusion of Apache Iceberg, Apache Pinot and some other new tech.
https://github.com/tspannhw/SpeakerProfile Tim Spann is a Principal Developer Advocate for Cloudera. He works with Apache Kafka, Apache Flink, Flink SQL, Apache NiFi, MiniFi, Apache MXNet, TensorFlow, Apache Spark, Big Data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT, big data, distributed computing, messaging, streaming technologies, and Java programming. Previously, he was a Principal DataFlow Field Engineer at Cloudera, a Senior Solutions Engineer at Hortonworks, a Senior Solutions Architect at AirisData, a Senior Field Engineer at Pivotal and a Team Leader at HPE. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton on Big Data, Cloud, IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as ApacheCon, DeveloperWeek, Pulsar Summit and many more. He holds a BS and MS in computer science.
Timothy J Spann
Cloudera
Principal Developer Advocate
Hightstown, NJ
Websitehttps://datainmotion.dev/
Data Transformations on Ops Metrics using Kafka Streams (Srividhya Ramachandr...confluent
How Priceline uses Kafka Streams technology to effectively save TBs on daily licenses of our monitoring systems. Kafka Streams powers a big part of our analytics and monitoring pipelines and delivers operational metrics transformations in real time. All logs and operational metrics from all of the APIs of Priceline’s products flow into Kafka and is ingested into our Monitoring System Splunk for Alerting and Monitoring. We have now implemented data transformations, aggregations and summarizations using Kafka Streams technologies to effectively eliminate PCI/PII violations on the log data; do aggregations on metrics to avoid ingesting sub-second metrics and ingest metrics only at the granularity that we need to. We will cover the need for custom Serdes, custom partitioners, and why we don’t use the confluent registry. You will also learn how Priceline uses a self service model to configure its streams, topics and consumers using Data Collection Console, which is our UI for managing the Kafka streaming pipelines.
30 Minutes to the Analytics Platform with Infrastructure as CodeGuido Schmutz
Analytical platforms for PoCs and evaluation can be built in the cloud in an hour - with ready-made setup scripts. But if you put the services together freely, it gets more difficult. The open-source platform-in-a-box "Platys" (https://github.com/TrivadisPF/platys) shows that it is easier for test and PoC environments. In addition to possible uses and examples, we explain services and "just briefly" set up a data lake with a database, event broker, stream processing, blob store, SQL access and data science notebook.
Event Broker (Kafka) in a Modern Data ArchitectureGuido Schmutz
Today's modern data architectures and the their implementations contain an Event Broker. What are the benefits of placing an Event Broker in a Modern Data (Analytics) Architecture? What exactly is an Event Broker and what capabilities should it provide? Why is Apache Kafka the most popular realisation of an Event Broker?
These and many other questions will be answered in this session. The talk will start with a vendor-neutral definition of the capabilities of an Event Broker.
Then the session will highlight the different architecture styles which can be supported using an Event Broker (Kafka), such as Streaming Data Integration, Stream Analytics and Decoupled Event-Driven Applications and how can these be combined into a unified architecture, making the Event Broker the central nervous system of an enterprise architecture. We will end with an overview of the Kafka ecosystem and a placement of the various components onto the Modern Data (Analytics) Architecture.
Big Data, Data Lake, Fast Data - Dataserialiation-FormatsGuido Schmutz
The concept of "Data Lake" is in everyone's mind today. The idea of storing all the data that accumulates in a company in a central location and making it available sounds very interesting at first. But Data Lake can quickly turn from a clear, beautiful mountain lake into a huge pond, especially if it is inexpertly entrusted with all the source data formats that are common in today's enterprises, such as XML, JSON, CSV or unstructured text data. Who, after some time, still has an overview of which data, which format and how they have developed over different versions? Anyone who wants to help themselves from the Data Lake must ask themselves the same questions over and over again: what information is provided, what data types do they have and how has the content changed over time?
Data serialization frameworks such as Apache Avro and Google Protocol Buffer (Protobuf), which enable platform-independent data modeling and data storage, can help. This talk will discuss the possibilities of Avro and Protobuf and show how they can be used in the context of a data lake and what advantages can be achieved. The support on Avro and Protobuf by Big Data and Fast Data platforms is also a topic.
ksqlDB is a stream processing SQL engine, which allows stream processing on top of Apache Kafka. ksqlDB is based on Kafka Stream and provides capabilities for consuming messages from Kafka, analysing these messages in near-realtime with a SQL like language and produce results again to a Kafka topic. By that, no single line of Java code has to be written and you can reuse your SQL knowhow. This lowers the bar for starting with stream processing significantly.
ksqlDB offers powerful capabilities of stream processing, such as joins, aggregations, time windows and support for event time. In this talk I will present how KSQL integrates with the Kafka ecosystem and demonstrate how easy it is to implement a solution using ksqlDB for most part. This will be done in a live demo on a fictitious IoT sample.
Kafka as your Data Lake - is it Feasible?Guido Schmutz
For a long time we discuss how much data we can keep in Kafka. Can we store data forever or do we remove data after a while and maybe having the history in a data lake on Object Storage or HDFS? With the advent of Tiered Storage in Confluent Enterprise Platform, storing data much longer in Kafka is much very feasible. So can we replace a traditional data lake with just Kafka? Maybe at least for the raw data? But what about accessing the data, for example using SQL?
KSQL allows for processing data in a streaming fashion using an SQL like dialect. But what about reading all data of a topic? You can reset the offset and still use KSQL. But there is another family of products, so-called query engines for Big Data. They originate from the idea of reading Big Data sources such as HDFS, object storage or HBase, using the SQL language. Presto, Apache Drill and Dremio are the most popular solutions in that space. Lately these query engines also added support for Kafka topics as a source of data. With that you can read a topic as a table and join it with information available in other data sources. The idea of course is not real-time streaming analytics but batch analytics directly on the Kafka topic, without having to store it in a big data storage.
This talk answers, how well these tools support Kafka as a data source. What serialization formats do they support? Is there some form of predicate push-down supported or do we have to always read the complete topic? How performant is a query against a topic, compared to a query against the same data sitting in HDFS or an object store? And finally, will this allow us to replace our data lake or at least part of it by Apache Kafka?
Event Hub (i.e. Kafka) in Modern Data ArchitectureGuido Schmutz
Today's modern data architectures and the their implementations contain an Event Hub. What are the benefits of placing an Event Hub in a Modern Data (Analytics) Architecture? What exactly is an Event Hub and what capabilities should it provide? Why is Apache Kafka the most popular realization of an Event Hub?
These and many other questions will be answered in this session. The talk will start with a vendor-neutral definition of the capabilities of an Event Hub.
Then the session will highlight the different architecture styles which can be supported using an Event Hub (Kafka), such as Streaming Data Integration, Stream Analytics and Decoupled Event-Driven Applications and how can these be combined into a unified architecture, making the Event Hub the central nervous system of an enterprise architecture. We will end with an overview of the Kafka ecosystem and a placement of the various components onto the Modern Data (Analytics) Architecture.
Solutions for bi-directional integration between Oracle RDBMS & Apache KafkaGuido Schmutz
Apache Kafka is a popular distributed streaming data platform and more and more is the architectural backbone for integrating streaming data with a Data Lake, Microservices and Stream Processing. A lot of data necessary in stream processing is stored in traditional systems backed by relational databases. This session will present different approaches for integrating relational databases with Kafka, such as Kafka Connect, Oracle GoldenGate, ORDS APIs and bridging Kafka with Oracle AQ.
Event Hub (i.e. Kafka) in Modern Data (Analytics) ArchitectureGuido Schmutz
Today's modern data architectures and the their implementations contain an Event Hub. What are the benefits of placing an Event Hub in a Modern Data (Analytics) Architecture? What exactly is an Event Hub and what capabilities should it provide? Why is Apache Kafka the most popular realization of an Event Hub? These and many other questions will be answered in this session. The talk will start with a vendor-neutral definition of the capabilities of an Event Hub. Then the session will highlight the different architecture styles which can be supported using an Event Hub (Kafka), such as Streaming Data Integration, Stream Analytics and Decoupled Event-Driven Applications and how can these be combined into a unified architecture, making the Event Hub the central nervous system of an enterprise architecture. We will end with an overview of the Kafka ecosystem and a placement of the various components onto the Modern Data (Analytics) Architecture.
Building Event Driven (Micro)services with Apache KafkaGuido Schmutz
What is a Microservices architecture and how does it differ from a Service-Oriented Architecture? Should you use traditional REST APIs to bind services together? Or is it better to use a richer, more loosely-coupled protocol? This talk will start with quick recap of how we created systems over the past 20 years and how different architectures evolved from it. The talk will show how we piece services together in event driven systems, how we use a distributed log (event hub) to create a central, persistent history of events and what benefits we achieve from doing so.
Apache Kafka is a perfect match for building such an asynchronous, loosely-coupled event-driven backbone. Events trigger processing logic, which can be implemented in a more traditional as well as in a stream processing fashion. The talk will show the difference between a request-driven and event-driven communication and show when to use which. It highlights how the modern stream processing systems can be used to hold state both internally as well as in a database and how this state can be used to further increase independence of other services, the primary goal of a Microservices architecture.
Location Analytics - Real-Time Geofencing using Apache KafkaGuido Schmutz
An important underlying concept behind location-based applications is called geofencing. Geofencing is a process that allows acting on users and/or devices who enter/exit a specific geographical area, known as a geo-fence. A geo-fence can be dynamically generated—as in a radius around a point location, or a geo-fence can be a predefined set of boundaries (such as secured areas, buildings, boarders of counties, states or countries).
Geofencing lays the foundation for realizing use cases around fleet monitoring, asset tracking, phone tracking across cell sites, connected manufacturing, ride-sharing solutions and many others.
GPS tracking tells constantly and in real time where a device is located and forms the stream of events which needs to be analyzed against the much more static set of geo-fences. Many of the use cases mentioned above require low-latency actions taken place, if either a device enters or leaves a geo-fence or when it is approaching such a geo-fence. That’s where streaming data ingestion and streaming analytics and therefore the Kafka ecosystem comes into play.
This session will present how location analytics applications can be implemented using Kafka and KSQL & Kafka Streams. It highlights the exiting features available out-of-the-box and then shows how easy it is to extend it by custom defined functions (UDFs). The design of such solution so that it can scale with both an increasing amount of position events as well as geo-fences will be discussed as well.
Solutions for bi-directional integration between Oracle RDBMS and Apache KafkaGuido Schmutz
Apache Kafka is a popular distributed streaming data platform. A Kafka cluster stores streams of records (messages) in categories called topics. It is the architectural backbone for integrating streaming data with a Data Lake, Microservices and Stream Processing. Data sources flowing into Kafka are often native data streams such as social media streams, telemetry data, financial transactions and many others. But these data stream only contain part of the information. A lot of data necessary in stream processing is stored in traditional systems backed by relational databases. To implement new and modern, real-time solutions, an up-to-date view of that information is needed. So how do we make sure that information can flow between the RDBMS and Kafka, so that changes are available in Kafka as soon as possible in near-real-time? This session will present different approaches for integrating relational databases with Kafka, such as Kafka Connect, Oracle GoldenGate and bridging Kafka with Oracle Advanced Queuing (AQ).
Solutions for bi-directional integration between Oracle RDBMS & Apache KafkaGuido Schmutz
Apache Kafka is a popular distributed streaming data platform. A Kafka cluster stores streams of records (messages) in categories called topics. It is the architectural backbone for integrating streaming data with a Data Lake, Microservices and Stream Processing. Data sources flowing into Kafka are often native data streams such as social media streams, telemetry data, financial transactions and many others. But these data stream only contain part of the information. A lot of data necessary in stream processing is stored in traditional systems backed by relational databases. To implement new and modern, real-time solutions, an up-to-date view of that information is needed. So how do we make sure that information can flow between the RDBMS and Kafka, so that changes are available in Kafka as soon as possible in near-real-time? This session will present different approaches for integrating relational databases with Kafka, such as Kafka Connect, Oracle GoldenGate and bridging Kafka with Oracle Advanced Queuing (AQ).
Location Analytics Real-Time Geofencing using KafkaGuido Schmutz
An important underlying concept behind location-based applications is called geofencing. Geofencing is a process that allows acting on users and/or devices who enter/exit a specific geographical area, known as a geo-fence. A geo-fence can be dynamically generated—as in a radius around a point location, or a geo-fence can be a predefined set of boundaries (such as secured areas, buildings, boarders of counties, states or countries).
Geofencing lays the foundation for realizing use cases around fleet monitoring, asset tracking, phone tracking across cell sites, connected manufacturing, ride-sharing solutions and many others.
GPS tracking tells constantly and in real time where a device is located and forms the stream of events which needs to be analyzed against the much more static set of geo-fences. Many of the use cases mentioned above require low-latency actions taken place, if either a device enters or leaves a geo-fence or when it is approaching such a geo-fence. That’s where streaming data ingestion and streaming analytics and therefore the Kafka ecosystem comes into play.
This session will present how location analytics applications can be implemented using Kafka and KSQL & Kafka Streams. It highlights the exiting features available out-of-the-box and then shows how easy it is to extend it by custom defined functions (UDFs). The design of such solution so that it can scale with both an increasing amount of position events as well as geo-fences will be discussed as well.
Most data visualisation solutions today still work on data sources which are stored persistently in a data store, using the so called “data at rest” paradigms. More and more data sources today provide a constant stream of data, from IoT devices to Social Media streams. These data stream publish with high velocity and messages often have to be processed as quick as possible. For the processing and analytics on the data, so called stream processing solutions are available. But these only provide minimal or no visualisation capabilities. One option is to first persist the data into a data store and then use a traditional data visualisation solution to present the data. If latency is not an issue, such a solution might be good enough. An other question is which data store solution is necessary to keep up with the high load on write and read. If it is not an RDBMS but an NoSQL database, then not all traditional visualisation tools might already integrate with the specific data store. An other option is to use a Streaming Visualisation solution. They are specially built for streaming data and often do not support batch data. A much better solution would be to have one tool capable of handling both, batch and streaming data. This talk presents different architecture blueprints for integrating data visualisation into a fast data solutions and then we show how the different blueprints can be implemented by mapping products onto the blueprints.
Kafka as an event store - is it good enough?Guido Schmutz
Event Sourcing and CQRS are two popular patterns for implementing a Microservices architectures. With Event Sourcing we do not store the state of an object, but instead store all the events impacting its state. Then to retrieve an object state, we have to read the different events related to a certain object and apply them one by one. CQRS (Command Query Responsibility Segregation) on the other hand is a way to dissociate writes (Command) and reads (Query). Event Sourcing and CQRS are frequently grouped and used together to form something bigger. While it is possible to implement CQRS without Event Sourcing, the opposite is not necessarily correct. In order to implement Event Sourcing, an efficient Event Store is needed. But is that also true when combining Event Sourcing and CQRS? And what is an event store in the first place and what features should it implement?
This presentation will first discuss what functionalities an event store should offer and then present how Apache Kafka can be used to implement an event store. But is Kafka good enough or do specific event store solutions such as AxonDB or Event Store provide a better solution?
Solutions for bi-directional Integration between Oracle RDMBS & Apache KafkaGuido Schmutz
A Kafka cluster stores streams of records (messages) in categories called topics. It is the architectural backbone for integrating streaming data with a Data Lake, Microservices and Stream Processing. Today’s enterprises have their core systems often implemented on top of relational databases, such as the Oracle RDBMS. Implementing a new solution supporting the digital strategy using Kafka and the ecosystem can not always be done completely separate from the traditional legacy solutions. Often streaming data has to be enriched with state data which is held in an RDBMS of a legacy application. It’s important to cache this data in the stream processing solution, so that It can be efficiently joined to the data stream. But how do we make sure that the cache is kept up-to-date, if the source data changes? We can either poll for changes from Kafka using Kafka Connect or let the RDBMS push the data changes to Kafka. But what about writing data back to the legacy application, i.e. an anomaly is detected inside the stream processing solution which should trigger an action inside the legacy application. Using Kafka Connect we can write to a database table or view, which could trigger the action. But this not always the best option. If you have an Oracle RDBMS, there are many other ways to integrate the database with Kafka, such as Advanced Queueing (message broker in the database), CDC through Golden Gate or Debezium, Oracle REST Database Service (ORDS) and more. In this session, we present various blueprints for integrating an Oracle RDBMS with Apache Kafka in both directions and discuss how these blueprints can be implemented using the products mentioned before.
Fundamentals Big Data and AI ArchitectureGuido Schmutz
The right architecture is key for any IT project. This is especially the case for big data projects, where there are no standard architectures which have proven their suitability over years. This session discusses the different Big Data Architectures which have evolved over time, including traditional Big Data Architecture, Streaming Analytics architecture as well as Lambda and Kappa architecture and presents the mapping of components from both Open Source as well as the Oracle stack onto these architectures.
The right architecture is key for any IT project. This is valid in the case for big data projects as well, but on the other hand there are not yet many standard architectures which have proven their suitability over years.
This session discusses different Big Data Architectures which have evolved over time, including traditional Big Data Architecture, Event Driven architecture as well as Lambda and Kappa architecture.
Each architecture is presented in a vendor- and technology-independent way using a standard architecture blueprint. In a second step, these architecture blueprints are used to show how a given architecture can support certain use cases and which popular open source technologies can help to implement a solution based on a given architecture.
Location Analytics - Real-Time Geofencing using Kafka Guido Schmutz
An important underlying concept behind location-based applications is called geofencing. Geofencing is a process that allows acting on users and/or devices who enter/exit a specific geographical area, known as a geo-fence. A geo-fence can be dynamically generated—as in a radius around a point location, or a geo-fence can be a predefined set of boundaries (such as secured areas, buildings, boarders of counties, states or countries). Geofencing lays the foundation for realising use cases around fleet monitoring, asset tracking, phone tracking across cell sites, connected manufacturing, ride-sharing solutions and many others. Many of the use cases mentioned above require low-latency actions taken place, if either a device enters or leaves a geo-fence or when it is approaching such a geo-fence. That’s where streaming data ingestion and streaming analytics and therefore the Kafka ecosystem comes into play. This session will present how location analytics applications can be implemented using Kafka and KSQL & Kafka Streams. It highlights the exiting features available out-of-the-box and then shows how easy it is to extend it by custom defined functions (UDFs).
Most data visualization solutions today still work on data sources which are stored persistently in a data store, using the so called “data at rest” paradigms. More and more data sources today provide a constant stream of data, from IoT devices to Social Media streams. These data stream publish with high velocity and messages often have to be processed as quick as possible. For the processing and analytics on the data, so called stream processing solutions are available. But these only provide minimal or no visualization capabilities. One option is to first persist the data into a data store and then use a traditional data visualization solution to present the data. If latency is not an issue, such a solution might be good enough. An other question is which data store solution is necessary to keep up with the high load on write and read. If it is not an RDBMS but an NoSQL database, then not all traditional visualization tools might already integrate with the specific data store. An other option is to use a Streaming Visualization solution. This talk presents different architecture blueprints for integrating data visualization into a fast data solutions.
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisGlobus
JASMIN is the UK’s high-performance data analysis platform for environmental science, operated by STFC on behalf of the UK Natural Environment Research Council (NERC). In addition to its role in hosting the CEDA Archive (NERC’s long-term repository for climate, atmospheric science & Earth observation data in the UK), JASMIN provides a collaborative platform to a community of around 2,000 scientists in the UK and beyond, providing nearly 400 environmental science projects with working space, compute resources and tools to facilitate their work. High-performance data transfer into and out of JASMIN has always been a key feature, with many scientists bringing model outputs from supercomputers elsewhere in the UK, to analyse against observational or other model data in the CEDA Archive. A growing number of JASMIN users are now realising the benefits of using the Globus service to provide reliable and efficient data movement and other tasks in this and other contexts. Further use cases involve long-distance (intercontinental) transfers to and from JASMIN, and collecting results from a mobile atmospheric radar system, pushing data to JASMIN via a lightweight Globus deployment. We provide details of how Globus fits into our current infrastructure, our experience of the recent migration to GCSv5.4, and of our interest in developing use of the wider ecosystem of Globus services for the benefit of our user community.
top nidhi software solution freedownloadvrstrong314
This presentation emphasizes the importance of data security and legal compliance for Nidhi companies in India. It highlights how online Nidhi software solutions, like Vector Nidhi Software, offer advanced features tailored to these needs. Key aspects include encryption, access controls, and audit trails to ensure data security. The software complies with regulatory guidelines from the MCA and RBI and adheres to Nidhi Rules, 2014. With customizable, user-friendly interfaces and real-time features, these Nidhi software solutions enhance efficiency, support growth, and provide exceptional member services. The presentation concludes with contact information for further inquiries.
Developing Distributed High-performance Computing Capabilities of an Open Sci...Globus
COVID-19 had an unprecedented impact on scientific collaboration. The pandemic and its broad response from the scientific community has forged new relationships among public health practitioners, mathematical modelers, and scientific computing specialists, while revealing critical gaps in exploiting advanced computing systems to support urgent decision making. Informed by our team’s work in applying high-performance computing in support of public health decision makers during the COVID-19 pandemic, we present how Globus technologies are enabling the development of an open science platform for robust epidemic analysis, with the goal of collaborative, secure, distributed, on-demand, and fast time-to-solution analyses to support public health.
Navigating the Metaverse: A Journey into Virtual Evolution"Donna Lenk
Join us for an exploration of the Metaverse's evolution, where innovation meets imagination. Discover new dimensions of virtual events, engage with thought-provoking discussions, and witness the transformative power of digital realms."
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Globus
Large Language Models (LLMs) are currently the center of attention in the tech world, particularly for their potential to advance research. In this presentation, we'll explore a straightforward and effective method for quickly initiating inference runs on supercomputers using the vLLM tool with Globus Compute, specifically on the Polaris system at ALCF. We'll begin by briefly discussing the popularity and applications of LLMs in various fields. Following this, we will introduce the vLLM tool, and explain how it integrates with Globus Compute to efficiently manage LLM operations on Polaris. Attendees will learn the practical aspects of setting up and remotely triggering LLMs from local machines, focusing on ease of use and efficiency. This talk is ideal for researchers and practitioners looking to leverage the power of LLMs in their work, offering a clear guide to harnessing supercomputing resources for quick and effective LLM inference.
Globus Connect Server Deep Dive - GlobusWorld 2024Globus
We explore the Globus Connect Server (GCS) architecture and experiment with advanced configuration options and use cases. This content is targeted at system administrators who are familiar with GCS and currently operate—or are planning to operate—broader deployments at their institution.
Understanding Globus Data Transfers with NetSageGlobus
NetSage is an open privacy-aware network measurement, analysis, and visualization service designed to help end-users visualize and reason about large data transfers. NetSage traditionally has used a combination of passive measurements, including SNMP and flow data, as well as active measurements, mainly perfSONAR, to provide longitudinal network performance data visualization. It has been deployed by dozens of networks world wide, and is supported domestically by the Engagement and Performance Operations Center (EPOC), NSF #2328479. We have recently expanded the NetSage data sources to include logs for Globus data transfers, following the same privacy-preserving approach as for Flow data. Using the logs for the Texas Advanced Computing Center (TACC) as an example, this talk will walk through several different example use cases that NetSage can answer, including: Who is using Globus to share data with my institution, and what kind of performance are they able to achieve? How many transfers has Globus supported for us? Which sites are we sharing the most data with, and how is that changing over time? How is my site using Globus to move data internally, and what kind of performance do we see for those transfers? What percentage of data transfers at my institution used Globus, and how did the overall data transfer performance compare to the Globus users?
Into the Box Keynote Day 2: Unveiling amazing updates and announcements for modern CFML developers! Get ready for exciting releases and updates on Ortus tools and products. Stay tuned for cutting-edge innovations designed to boost your productivity.
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns
Unlocking Business Potential: Tailored Technology Solutions by Prosigns
Discover how Prosigns, a leading technology solutions provider, partners with businesses to drive innovation and success. Our presentation showcases our comprehensive range of services, including custom software development, web and mobile app development, AI & ML solutions, blockchain integration, DevOps services, and Microsoft Dynamics 365 support.
Custom Software Development: Prosigns specializes in creating bespoke software solutions that cater to your unique business needs. Our team of experts works closely with you to understand your requirements and deliver tailor-made software that enhances efficiency and drives growth.
Web and Mobile App Development: From responsive websites to intuitive mobile applications, Prosigns develops cutting-edge solutions that engage users and deliver seamless experiences across devices.
AI & ML Solutions: Harnessing the power of Artificial Intelligence and Machine Learning, Prosigns provides smart solutions that automate processes, provide valuable insights, and drive informed decision-making.
Blockchain Integration: Prosigns offers comprehensive blockchain solutions, including development, integration, and consulting services, enabling businesses to leverage blockchain technology for enhanced security, transparency, and efficiency.
DevOps Services: Prosigns' DevOps services streamline development and operations processes, ensuring faster and more reliable software delivery through automation and continuous integration.
Microsoft Dynamics 365 Support: Prosigns provides comprehensive support and maintenance services for Microsoft Dynamics 365, ensuring your system is always up-to-date, secure, and running smoothly.
Learn how our collaborative approach and dedication to excellence help businesses achieve their goals and stay ahead in today's digital landscape. From concept to deployment, Prosigns is your trusted partner for transforming ideas into reality and unlocking the full potential of your business.
Join us on a journey of innovation and growth. Let's partner for success with Prosigns.
A Comprehensive Look at Generative AI in Retail App Testing.pdfkalichargn70th171
Traditional software testing methods are being challenged in retail, where customer expectations and technological advancements continually shape the landscape. Enter generative AI—a transformative subset of artificial intelligence technologies poised to revolutionize software testing.
We describe the deployment and use of Globus Compute for remote computation. This content is aimed at researchers who wish to compute on remote resources using a unified programming interface, as well as system administrators who will deploy and operate Globus Compute services on their research computing infrastructure.
Quarkus Hidden and Forbidden ExtensionsMax Andersen
Quarkus has a vast extension ecosystem and is known for its subsonic and subatomic feature set. Some of these features are not as well known, and some extensions are less talked about, but that does not make them less interesting - quite the opposite.
Come join this talk to see some tips and tricks for using Quarkus and some of the lesser known features, extensions and development techniques.
First Steps with Globus Compute Multi-User EndpointsGlobus
In this presentation we will share our experiences around getting started with the Globus Compute multi-user endpoint. Working with the Pharmacology group at the University of Auckland, we have previously written an application using Globus Compute that can offload computationally expensive steps in the researcher's workflows, which they wish to manage from their familiar Windows environments, onto the NeSI (New Zealand eScience Infrastructure) cluster. Some of the challenges we have encountered were that each researcher had to set up and manage their own single-user globus compute endpoint and that the workloads had varying resource requirements (CPUs, memory and wall time) between different runs. We hope that the multi-user endpoint will help to address these challenges and share an update on our progress here.
Check out the webinar slides to learn more about how XfilesPro transforms Salesforce document management by leveraging its world-class applications. For more details, please connect with sales@xfilespro.com
If you want to watch the on-demand webinar, please click here: https://www.xfilespro.com/webinars/salesforce-document-management-2-0-smarter-faster-better/
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...Mind IT Systems
Healthcare providers often struggle with the complexities of chronic conditions and remote patient monitoring, as each patient requires personalized care and ongoing monitoring. Off-the-shelf solutions may not meet these diverse needs, leading to inefficiencies and gaps in care. It’s here, custom healthcare software offers a tailored solution, ensuring improved care and effectiveness.
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoamtakuyayamamoto1800
In this slide, we show the simulation example and the way to compile this solver.
In this solver, the Helmholtz equation can be solved by helmholtzFoam. Also, the Helmholtz equation with uniformly dispersed bubbles can be simulated by helmholtzBubbleFoam.
SOCRadar Research Team: Latest Activities of IntelBrokerSOCRadar
The European Union Agency for Law Enforcement Cooperation (Europol) has suffered an alleged data breach after a notorious threat actor claimed to have exfiltrated data from its systems. Infamous data leaker IntelBroker posted on the even more infamous BreachForums hacking forum, saying that Europol suffered a data breach this month.
The alleged breach affected Europol agencies CCSE, EC3, Europol Platform for Experts, Law Enforcement Forum, and SIRIUS. Infiltration of these entities can disrupt ongoing investigations and compromise sensitive intelligence shared among international law enforcement agencies.
However, this is neither the first nor the last activity of IntekBroker. We have compiled for you what happened in the last few days. To track such hacker activities on dark web sources like hacker forums, private Telegram channels, and other hidden platforms where cyber threats often originate, you can check SOCRadar’s Dark Web News.
Stay Informed on Threat Actors’ Activity on the Dark Web with SOCRadar!
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus
As part of the DOE Integrated Research Infrastructure (IRI) program, NERSC at Lawrence Berkeley National Lab and ALCF at Argonne National Lab are working closely with General Atomics on accelerating the computing requirements of the DIII-D experiment. As part of the work the team is investigating ways to speedup the time to solution for many different parts of the DIII-D workflow including how they run jobs on HPC systems. One of these routes is looking at Globus Compute as a way to replace the current method for managing tasks and we describe a brief proof of concept showing how Globus Compute could help to schedule jobs and be a tool to connect compute at different facilities.
2. Guido Schmutz
Working for Trivadis for more than 19 years
Oracle ACE Director for Fusion Middleware and SOA
Co-Author of different books
Consultant, Trainer Software Architect for Java, Oracle, SOA and
Big Data / Fast Data
Member of Trivadis Architecture Board
Technology Manager @ Trivadis
More than 25 years of software development experience
Contact: guido.schmutz@trivadis.com
Blog: http://guidoschmutz.wordpress.com
Twitter: gschmutz
3. Unser Unternehmen.
Trivadis ist führend bei der IT-Beratung, der Systemintegration, dem Solution
Engineering und der Erbringung von IT-Services mit Fokussierung auf - und
-Technologien in der Schweiz, Deutschland, Österreich und Dänemark. Trivadis erbringt
ihre Leistungen aus den strategischen Geschäftsfeldern:
Trivadis Services übernimmt den korrespondierenden Betrieb Ihrer IT Systeme.
B E T R I E B
7. A little story of a “real-life” customer situation
Traditional system interact with its clients
and does its work
Implemented using legacy technologies (i.e.
PL/SQL)
New requirement:
• Offer notification service to notify
customer when goods are shipped
• Subscription and inform over different
channels
• Existing technology doesn’t fit
delivery
Logistic
System
Oracle
Mobile Apps
Sensor ship
sort
8
Rich (Web)
Client Apps
DB
schedule
Logic
(PL/SQL)
delivery
8. A little story of a “real-life” customer situation
Events are “owned” by traditional
application (as well as the channels
they are transported over)
Implement notification as a new Java-based
application/system
But we need the events ! => so let’s
integrate
delivery
Logistic
System
Oracle
Mobile Apps
Sensor ship
sort
9
Rich (Web)
Client Apps
DB
schedule
Notification
Logic
(PL/SQL)
Logic
(Java)
delivery
SMS
Email
…
9. A little story of a “real-life” customer situation
integrate in order to get the information!
Oracle Service Bus was already there
Rule Engine implemented in Java and invoked from
OSB message flow
Notification system informed via queue
Higher Latency introduced (good enough in this
case)
delivery
Logistic
System
Oracle
Oracle
Service Bus
Mobile Apps
Sensor AQship
sort
10
Rich (Web)
Client Apps
DB
schedule
Filter
Notification
Logic
(PL/SQL)
JMS
Rule Engine
(Java)
Logic
(Java)
delivery
shipdelivery
delivery true SMS
Email
…
10. A little story of a “real-life” customer situation
Treat events as first-class citizens
Events belong to the “enterprise” and
not an individual system => Catalog of
Events similar to Catalog of
Services/APIs !!
Event (stream) processing can be
introduced and by that latency reduced!
delivery
Logistic
System
Oracle
Oracle
Service Bus
Mobile Apps
Sensor AQship
sort
11
Rich (Web)
Client Apps
DB
schedule
Filter
Notification
Logic
(PL/SQL)
JMS
Rule Engine
(Java)
Logic
(Java)
delivery
shipdelivery
delivery true SMS
Email
…
11. Treat Events as Events and share them!
delivery
Logistic
System
Oracle
Oracle
Service Bus
Mobile Apps
Sensor
ship
sort
12
Rich (Web)
Client Apps
DB
schedule
Filter
Notification
Logic
(PL/SQL)
JMS
Rule Engine
(Java)
Logic
(Java)
delivery
ship
delivery true SMS
Email
…
Event
Bus/Hub
Stream/Event
Processing
12. Treat Events as Events,share and make use of them!
delivery
Logistic
System
Oracle
Mobile Apps
Sensor
ship
sort
13
Rich (Web)
Client Apps
DB
schedule
Filter
Notification
Logic
(PL/SQL)
Rule Engine
(Java)
Logic
(Java)
delivery
SMS
Email
…
Event
Bus/Hub
Stream/Event
Processing
notifiableDelivery
notifiableDelivery
delivery
15. Apache Kafka - Overview
Distributed publish-subscribe messaging system
Designed for processing of real time activity stream data (logs, metrics
collections, social media streams, …)
Initially developed at LinkedIn, now part of Apache
Does not use JMS API and standards
Kafka maintains feeds of messages in topics
16. Apache Kafka - Motivation
LinkedIn’s motivation for Kafka was:
• “A unified platform for handling all the real-time data feeds a large company might
have.”
Must haves
• High throughput to support high volume event feeds.
• Support real-time processing of these feeds to create new, derived feeds.
• Support large data backlogs to handle periodic ingestion from offline systems.
• Support low-latency delivery to handle more traditional messaging use cases.
• Guarantee fault-tolerance in the presence of machine failures.
17. Kafka High Level Architecture
The who is who
• Producers write data to brokers.
• Consumers read data from
brokers.
• All this is distributed.
The data
• Data is stored in topics.
• Topics are split into partitions,
which are replicated.
Kafka Cluster
Consumer Consumer Consumer
Producer Producer Producer
Broker 1 Broker 2 Broker 3
Zookeeper
Ensemble
22. Durability Guarantees
Producer can configure acknowledgements
Value Impact Durability
0 • Producer doesn’t wait for leader weak
1 (default) • Producer waits for leader
• Leader sends ack when message written to log
• No wait for followers
medium
all • Producer waits for leader
• Leader sends ack when all In-Sync Replica have
acknowledged
strong
23. Apache Kafka - Partition offsets
Offset: messages in the partitions are each assigned a unique (per partition) and
sequential id called the offset
• Consumers track their pointers via (offset, partition, topic) tuples
Consumer Group A Consumer Group B
24. Data Retention – 3 options
1. Never
2. Time based (TTL)
log.retention.{ms | minutes | hours}
3. Size based
log.retention.bytes
4. Log compaction based (entries with same key are removed)
kafka-topics.sh --zookeeper localhost:2181
--create --topic customers
--replication-factor 1 --partitions 1
--config cleanup.policy=compact
25. Apache Kafka – Some numbers
Kafka at LinkedIn => over 1800+ broker machines / 79K+ Topics
Kafka Performance at our own infrastructure => 6 brokers (VM) / 1 cluster
• 445’622 messages/second
• 31 MB / second
• 3.0405 ms average latency between producer / consumer
1.3 Trillion messages per
day
330 Terabytes in/day
1.2 Petabytes out/day
Peak load for a single cluster
2 million messages/sec
4.7 Gigabits/sec inbound
15 Gigabits/sec outbound
http://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
https://engineering.linkedin.com/kafka/running-kafka-scale
31. Kafka Streams
• Designed as a simple and lightweight library in Apache Kafka
• no external dependencies on systems other than Apache Kafka
• Leverages Kafka as its internal messaging layer
• agnostic to resource management and configuration tools
• Supports fault-tolerant local state
• Event-at-a-time processing (not microbatch) with millisecond latency
• Windowing with out-of-order data using a Google DataFlow-like model
34. Kafka and the Big Data / Fast Data ecosystem
Kafka integrates with many popular
products / frameworks
• Apache Spark Streaming
• Apache Flink
• Apache Storm
• Apache NiFi
• Streamsets
• Apache Flume
• Oracle Stream Analytics
• Oracle Service Bus
• Oracle GoldenGate
• Spring Integration Kafka Support
• …Storm built-in Kafka Spout to consume events from Kafka
41. Summary
• Kafka can scale to millions of messages per second, and more
• Easy to start with for a PoC
• A bit more to invest to setup production environment
• Monitoring is key
• Vibrant community and ecosystem
• Fast pace technology
• Confluent provides Kafka Distribution