Apache Samza is a distributed stream processing framework that uses Kafka for messaging and YARN for fault tolerance, processor isolation, security, and resource management.
Independent of the source of data, the integration of event streams into an Enterprise Architecture is becoming increasingly important in the world of sensors, social media streams and the Internet of Things. Events have to be accepted quickly and reliably, and they have to be distributed and analyzed, often with many consumers or systems interested in all or part of the events. Storing such huge event streams in HDFS or a NoSQL datastore is feasible and no longer much of a challenge. But if you want to be able to react fast, with minimal latency, you cannot afford to first store the data and do the analysis later. You have to be able to include part of your analytics right after you consume the data streams. Products for event processing, such as Oracle Event Processing or Esper, have been available for quite some time and used to be called Complex Event Processing (CEP). In the past few years, another family of products has appeared, mostly out of the Big Data technology space, called Stream Processing or Streaming Analytics. These are mostly open source products/frameworks such as Apache Storm, Spark Streaming, Flink and Kafka Streams, as well as supporting infrastructure such as Apache Kafka. In this talk I will present the theoretical foundations of Stream Processing, discuss the core properties a Stream Processing platform should provide, and highlight the differences you might find between the more traditional CEP solutions and the more modern Stream Processing solutions.
Presto is an open source distributed SQL query engine that allows fast, interactive querying of large datasets ranging from gigabytes to petabytes. It employs a custom query execution engine with pipelined operators designed for SQL semantics, avoiding unnecessary I/O and latency overhead. The Presto coordinator parses, analyzes, and plans queries, assigning work to nodes closest to the data and monitoring progress, while clients pull results from output stages. Presto developers claim it is 10x better than Hive/MapReduce for most queries in terms of efficiency and latency.
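The "pipelined operators" idea can be illustrated with a toy sketch (plain Python generators, not Presto's actual engine): each operator pulls rows from its child one at a time, so no intermediate result set is ever materialized. Table and column names here are invented for the example.

```python
def scan(table):
    """Leaf operator: yield rows from an in-memory 'table'."""
    for row in table:
        yield row

def filter_op(child, predicate):
    """Yield only rows matching the predicate, as they arrive."""
    for row in child:
        if predicate(row):
            yield row

def project(child, columns):
    """Keep only the requested columns of each row."""
    for row in child:
        yield {c: row[c] for c in columns}

# Roughly: SELECT name FROM users WHERE age > 30
users = [
    {"name": "ann", "age": 34},
    {"name": "bob", "age": 25},
    {"name": "eve", "age": 41},
]
plan = project(filter_op(scan(users), lambda r: r["age"] > 30), ["name"])
print(list(plan))  # [{'name': 'ann'}, {'name': 'eve'}]
```

Because every operator is lazy, a row flows through the whole plan before the next row is read, which is the essence of avoiding unnecessary I/O between stages.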
IoT Architectures for Apache Kafka and Event Streaming - Industry 4.0, Digita... (Kai Wähner)
The Internet of Things (IoT) is gaining more and more traction as valuable use cases come to light. Whether you are in healthcare, telecommunications, manufacturing, banking or retail, to name a few industries, there is one key challenge: integrating backend IoT data logs, applications, business services and cloud services to process the data in real time and at scale.
In this talk, we will share how Kafka has become the leading technology used throughout the business to provide real-time event streaming. Explore real-life use cases of Kafka Connect, Kafka Streams and KSQL, independent of the deployment, be it on a private or public cloud, on premise or at the edge.
Audi - Connected car infrastructure
Robert Bosch Power Tools - Track and Trace of devices and people at construction areas
Deutsche Bahn - Customer 360 for train timetable updates
E.ON - IoT Streaming Platform to integrate and build smart home, smart building and smart grid infrastructures
The document describes four primary models for building and running apps on Azure: 1) Virtual Machines, 2) Cloud Services, 3) Web Sites, and 4) Mobile Services. It provides brief descriptions of each model and associated services like storage, databases, authentication, and monitoring. The document is an overview of the architecture and services available on the Azure platform.
The document discusses backup and disaster recovery strategies for Hadoop. It focuses on protecting data sets stored in HDFS. HDFS uses data replication and checksums to protect against disk and node failures. Snapshots can protect against data corruption and accidental deletes. The document recommends copying data from the primary to secondary site for disaster recovery rather than teeing, and discusses considerations for large data movement like bandwidth needs and security. It also notes the importance of backing up metadata like Hive configurations along with core data.
Deploying Kafka Streams Applications with Docker and Kubernetes (Confluent)
(Gwen Shapira + Matthias J. Sax, Confluent) Kafka Summit SF 2018
Kafka Streams, Apache Kafka’s stream processing library, allows developers to build sophisticated stateful stream processing applications which they can deploy in an environment of their choice. Kafka Streams is not only scalable but fully elastic, allowing for dynamic scale-in and scale-out as the library handles state migration transparently in the background. By running Kafka Streams applications on Kubernetes, you will be able to use Kubernetes’ powerful control plane to standardize and simplify application management, from deployment to dynamic scaling.
In this technical deep dive, we’ll explain the internals of dynamic scaling and state migration in Kafka Streams. We’ll then show, with a live demo, how a Kafka Streams application can run in a Docker container on Kubernetes and the dynamic scaling of an application running in Kubernetes.
As businesses grow, so does the complexity of their software. New features, new models, and new background processes all continue to be added... and developers struggle to make sense of it all. Yet the end user demands a swift and functional experience when interacting with your application. It is paramount to be open to alternative patterns that help tame complex, high-demand services. Two such patterns are command-query responsibility segregation (CQRS) and event sourcing (ES).
Command-query responsibility segregation is an architectural pattern for user-facing applications that extends from the now-standard Model-View-Controller (MVC) pattern and is an alternative to the CRUD pattern. At its core, CQRS is about changing how we think of and work with our data by introducing two types of models: all user actions become commands, and a read-only query model powers our views. Commands and queries are logically separated, providing additional decoupling of our application. CQRS also calls for changes in how we store and structure our data.
Enter event sourcing. Instead of persisting the current state of our domain objects or entities, we record historical events about our data. The key advantage is that we can examine our application data at any point in time, rather than just the current state. This pattern changes how we persist and process our data but is surprisingly efficient.
While each of the two patterns can be used exclusively, they complement each other beautifully and facilitate the construction of decoupled, scalable applications or individual services. Stephen Pember explores the fundamentals of each pattern and offers several examples and demonstration code to show how one might actually go about implementing CQRS and ES. Steve discusses task-based UIs and domain-driven design as he outlines some of the advantages—and challenges—that ThirdChannel has seen when developing systems using CQRS and ES over the past year.
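The core of the two patterns fits in a few lines. The sketch below is a minimal in-memory illustration, not production code: commands append events to a log (event sourcing), and the query side is a projection replayed from that log (CQRS). All names here are invented for the example.

```python
events = []  # the append-only event store

def handle_deposit(account, amount):   # command side: record what happened
    events.append({"type": "Deposited", "account": account, "amount": amount})

def handle_withdraw(account, amount):  # command side
    events.append({"type": "Withdrew", "account": account, "amount": amount})

def balance(account):                  # query side: a projection over the log
    total = 0
    for e in events:
        if e["account"] != account:
            continue
        if e["type"] == "Deposited":
            total += e["amount"]
        elif e["type"] == "Withdrew":
            total -= e["amount"]
    return total

handle_deposit("acct-1", 100)
handle_withdraw("acct-1", 30)
print(balance("acct-1"))  # 70
```

Because the events are kept rather than overwritten, the same log can later power new projections, or answer "what was the balance last Tuesday?" by replaying only events up to that point.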
Spark Streaming makes it easy to build scalable fault-tolerant streaming applications. In this webinar, developers will learn:
*How Spark Streaming works - a quick review.
*Features in Spark Streaming that help prevent potential data loss.
*Complementary tools in a streaming pipeline - Kafka and Akka.
*Design and tuning tips for Reactive Spark Streaming applications.
Fallacies of distributed computing with Kubernetes on AWS (Raffaele Di Fazio)
This is the short story of a bug in one of our Go services that cut off some of the traffic targeting one of our production Kubernetes clusters running on AWS. But more than that, this is about how we made conceptually similar mistakes before and why thinking about failures and the famous "fallacies of distributed computing" is key to developing infrastructure components.
With this talk, we will walk you through some of those problems, illustrate some interesting details of Kubernetes and AWS, and hopefully help you avoid making the same mistakes.
YOW2018 - Events and Commands: Developing Asynchronous Microservices (Chris Richardson)
The microservice architecture functionally decomposes an application into a set of services. Each service has its own private database that’s only accessible indirectly through the service’s API. Consequently, implementing queries and transactions that span multiple services is challenging.
In this presentation, you will learn how to solve these distributed data management challenges using asynchronous messaging. I describe how to implement transactions using sagas, which are sequences of local transactions, coordinated using messages. You will learn how to implement queries using Command Query Responsibility Segregation (CQRS), which uses events to maintain replicas. I describe how to use event sourcing, which is an event-centric approach to business logic and persistence, in a microservice architecture.
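A saga, as described above, is a sequence of local transactions where each step has a compensating action that undoes it if a later step fails. The sketch below is a toy, framework-free illustration of that control flow; the step names (credit reservation, order creation) are invented for the example.

```python
def run_saga(steps, state):
    """Run (action, compensation) pairs; on failure, compensate in reverse."""
    done = []
    for action, compensate in steps:
        try:
            action(state)
            done.append(compensate)
        except Exception:
            for comp in reversed(done):  # roll back the completed steps
                comp(state)
            return False
    return True

def reserve_credit(s): s["credit"] -= 10
def release_credit(s): s["credit"] += 10
def create_order(s):
    if s["credit"] < 0:
        raise RuntimeError("insufficient credit")
    s["orders"] += 1
def cancel_order(s): s["orders"] -= 1

state = {"credit": 5, "orders": 0}
ok = run_saga([(reserve_credit, release_credit),
               (create_order, cancel_order)], state)
print(ok, state)  # False {'credit': 5, 'orders': 0} -- fully compensated
```

In a real microservice saga the steps run in different services and the coordination happens via messages, but the compensate-in-reverse logic is the same.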
This document discusses advances in file carving techniques for data recovery from disks and unallocated space. It describes various carving methods like header/footer carving, statistical carving, and fragment recovery carving. It also outlines limitations of current file carving tools and proposes ideas for future tools that combine methods and support more file types and fragmented files.
Real-time event processing monitors the incoming data stream and initiates action based on detected events like fraud, error or performance degradation. These events are often used to issue alerts and notifications, take responsive action, or to populate a monitoring dashboard. In this session, we will walk through different use cases for event processing and demonstrate how to build a scalable pipeline for tracking IoT device status. AWS services to be covered include: AWS Lambda and the Kinesis Client Library (KCL).
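The detect-and-alert pattern described above can be sketched in a few lines. This is a generic toy detector, not the AWS Lambda/KCL code from the session; the threshold and field names are invented.

```python
def detect_events(readings, threshold=90.0):
    """Yield an alert for each device reading above the threshold."""
    for r in readings:
        if r["temp_c"] > threshold:
            yield {"device": r["device"], "alert": "overheat",
                   "temp_c": r["temp_c"]}

# A slice of an incoming device-status stream
stream = [
    {"device": "d1", "temp_c": 72.0},
    {"device": "d2", "temp_c": 95.5},
    {"device": "d1", "temp_c": 91.2},
]
alerts = list(detect_events(stream))
print(alerts)  # overheat alerts for d2 and d1
```

In a managed pipeline the `readings` iterable would be records delivered by the stream consumer, and each alert would trigger a notification or dashboard update instead of a print.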
IBM announces a family of Cloud Paks that provide developers, data managers, and administrators an open environment to quickly build, modernize, and deploy applications and middleware across multiple clouds. The Cloud Paks include containerized IBM software and open source components that can be easily deployed to Kubernetes and provide capabilities for lifecycle management, security, and integration with services. Cloud Paks simplify enterprise deployment and management of software in containers and provide a consistent way for organizations to move more workloads to cloud environments faster.
Unified Batch & Stream Processing with Apache Samza (DataWorks Summit)
The traditional lambda architecture has been a popular solution for joining offline batch operations with real time operations. This setup incurs a lot of developer and operational overhead since it involves maintaining code that produces the same result in two, potentially different distributed systems. In order to alleviate these problems, we need a unified framework for processing and building data pipelines across batch and stream data sources.
Based on our experiences running and developing Apache Samza at LinkedIn, we have enhanced the framework to support: a) Pluggable data sources and sinks; b) A deployment model supporting different execution environments such as YARN or VMs; c) A unified processing API for developers to work seamlessly with batch and stream data. In this talk, we will cover how these design choices in Apache Samza help tackle the overhead of the lambda architecture. We will use some real production use cases to elaborate how LinkedIn leverages Apache Samza to build unified data processing pipelines.
Speaker
Navina Ramesh, Sr. Software Engineer, LinkedIn
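The "unified processing API" idea can be made concrete with a toy sketch (plain Python, not Samza's actual API): the same processing logic runs over a bounded batch source and an unbounded stream source, because both are just iterables of messages. Source contents and field names are invented.

```python
def pipeline(source):
    """Count page views per member, regardless of where messages come from."""
    counts = {}
    for msg in source:
        counts[msg["member"]] = counts.get(msg["member"], 0) + 1
    return counts

# Bounded source, e.g. a day of data re-read from HDFS
batch_source = [{"member": "a"}, {"member": "b"}, {"member": "a"}]

# Unbounded source, e.g. a Kafka topic (truncated here for the demo)
def stream_source():
    for m in ("a", "a", "b"):
        yield {"member": m}

print(pipeline(batch_source))     # {'a': 2, 'b': 1}
print(pipeline(stream_source()))  # {'a': 2, 'b': 1}
```

This is exactly the overhead the lambda architecture forces you to pay twice: without a unified API, `pipeline` would have to be written and maintained once per system.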
Independent of the source of data, the integration of event streams into an Enterprise Architecture is becoming increasingly important in the world of sensors, social media streams and the Internet of Things. Events have to be accepted quickly and reliably, and they have to be distributed and analysed, often with many consumers or systems interested in all or part of the events. Depending on the size and quantity of such events, this can quickly reach Big Data scale.
In this session, an architecture with central log-structured storage is presented, where anybody can store events and subscribe to them. This can be implemented using frameworks such as Kafka, Storm, Samza and Spark Streaming.
Storm is a distributed and fault-tolerant realtime computation system. It was created at BackType/Twitter to analyze tweets, links, and users on Twitter in realtime. Storm provides scalability, reliability, and ease of programming. It uses components like Zookeeper, ØMQ, and Thrift. A Storm topology defines the flow of data between spouts that read data and bolts that process data. Storm guarantees processing of all data through its reliability APIs and guarantees no data loss even during failures.
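The spout-and-bolt data flow can be sketched conceptually in plain Python (this mirrors the idea of a topology, not Storm's actual Java API; the sentences are made up).

```python
def sentence_spout():
    """Spout: the source of tuples entering the topology."""
    for s in ["storm processes streams", "storm scales out"]:
        yield s

def split_bolt(tuples):
    """Bolt: split each incoming sentence into words."""
    for sentence in tuples:
        for word in sentence.split():
            yield word

def count_bolt(tuples):
    """Bolt: keep a running count per word."""
    counts = {}
    for word in tuples:
        counts[word] = counts.get(word, 0) + 1
    return counts

# Wiring spout -> bolt -> bolt is what a topology declaration expresses.
counts = count_bolt(split_bolt(sentence_spout()))
print(counts["storm"])  # 2
```

In real Storm, each spout and bolt runs as many parallel tasks across the cluster, and the framework handles routing tuples between them and replaying tuples on failure.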
Agenda:
1. Data Flow Challenges in an Enterprise
2. Introduction to Apache NiFi
3. Core Features
4. Architecture
5. Demo - Simple Lambda Architecture
6. Use Cases
7. Q & A
This document discusses high performance computing (HPC) on Microsoft Azure. It begins with an overview of the HPC opportunity in the cloud, highlighting how the cloud provides elasticity and scale to accommodate variable computing demands. It then outlines Azure's value proposition for HPC, including its productive, trusted and hybrid capabilities. The document reviews the various HPC resources available on Azure like VMs, GPUs, and Cray supercomputers. It also discusses solutions for HPC like Azure Batch, Azure Machine Learning Compute, Azure CycleCloud and Avere vFXT. Example industry use cases are provided for automotive, financial services, manufacturing, media/entertainment and oil/gas. The summary reiterates that Azure is uniquely positioned for HPC workloads.
The presentation looks at the growing demand for data that many organizations are experiencing. It then looks at the many data sources you can connect to using Ignition, including PLC data, databases, device data, and data from web services.
Here is a link to the webinar - https://inductiveautomation.com/resources/webinar/webinar-get-more-data-your-scada
Kafka and Storm - event processing in realtime (Guido Schmutz)
Apache Kafka is publish-subscribe messaging rethought as a distributed commit log. It is designed to allow a single cluster to serve as the central data backbone for a large organization. It can be elastically and transparently expanded without downtime. Storm is a distributed realtime computation system. Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing. Storm has many use cases: realtime analytics, online machine learning, continuous computation, distributed RPC, ETL, and more. This session presents the main concepts of Kafka and Storm and then shows how a simple stream processing application is implemented using these two technologies.
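The "distributed commit log" abstraction is worth making concrete. The toy below (in-memory, single partition, not Kafka's actual API) shows the two properties that matter: the log is append-only, and each consumer tracks its own offset, so many consumers read the same data independently.

```python
class CommitLog:
    def __init__(self):
        self.records = []   # the log itself, append-only
        self.offsets = {}   # consumer name -> next position to read

    def publish(self, record):
        self.records.append(record)

    def consume(self, consumer, max_records=10):
        """Return new records for this consumer and advance its offset."""
        start = self.offsets.get(consumer, 0)
        batch = self.records[start:start + max_records]
        self.offsets[consumer] = start + len(batch)
        return batch

log = CommitLog()
log.publish("click:1")
log.publish("click:2")
print(log.consume("analytics"))  # ['click:1', 'click:2']
print(log.consume("analytics"))  # [] -- this consumer is caught up
print(log.consume("billing"))    # ['click:1', 'click:2'] -- independent offset
```

Real Kafka adds partitioning, replication and durable storage on top, which is what makes this simple model work as the central data backbone for a large organization.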
Ray Serve: A new scalable machine learning model serving library on Ray (Simon Mo)
When a machine learning model needs to be served for interactive use cases, it is typically either wrapped inside a Flask server or deployed using an external service like SageMaker. Both methods come with flaws. In this talk, you will learn how Ray Serve uses Ray to address the limitations of current approaches and enable scalable model serving.
This document provides an overview of application and desktop delivery technologies, including secure access, web application acceleration, connection brokers, application streaming/virtualization, OS provisioning, virtual desktop infrastructure (VDI), remote desktop services, client-side virtualization, client management services, and application virtualization. It discusses providers for each technology, benefits of application virtualization from Citrix and Microsoft, and compares Microsoft and VMware solutions.
Serverless with Spring Cloud Function, Knative and riff #SpringOneTour #s1t (Toshiaki Maki)
This document summarizes a presentation about serverless computing using Spring Cloud Function, Knative, and riff. It discusses what serverless computing is, an overview of Spring Cloud Function for developing serverless applications, and how Knative and riff can be used as platforms to deploy serverless workloads on Kubernetes. Code examples are provided to demonstrate invoking functions via HTTP and messaging with Spring Cloud Function and deploying functions to Knative and riff.
Flink provides unified batch and stream processing. It natively supports streaming dataflows, long batch pipelines, machine learning algorithms, and graph analysis through its layered architecture and treatment of all computations as data streams. Flink's optimizer selects efficient execution plans such as shipping strategies and join algorithms. It also caches loop-invariant data to speed up iterative algorithms and graph processing.
The document discusses the Twelve Factors methodology for building cloud-native applications. The Twelve Factors emphasize agility, portability, and scalability through principles like codebase management, externalized configuration, statelessness, and treating logs as event streams. Adopting these best practices helps applications maximize benefits of modern cloud platforms like rapid deployment, continuous delivery, and horizontal scaling. The document provides an overview of each factor and examples to illustrate how they enable cloud-native architectures like microservices and containerization.
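Of the factors mentioned, externalized configuration is the easiest to show in miniature: config lives in the environment, not in code, so the same build runs unchanged in every deploy target. The variable and setting names below are illustrative, not prescribed by the methodology.

```python
import os

def load_config():
    """Read settings from environment variables, with development defaults."""
    return {
        "database_url": os.environ.get("DATABASE_URL", "sqlite:///dev.db"),
        "log_level": os.environ.get("LOG_LEVEL", "INFO"),
    }

# In a real deploy the platform sets these; here we simulate it.
os.environ["LOG_LEVEL"] = "DEBUG"
cfg = load_config()
print(cfg["log_level"])    # DEBUG
print(cfg["database_url"]) # e.g. sqlite:///dev.db when DATABASE_URL is unset
```

The payoff is that promoting the app from staging to production changes only the environment, never the code or the build artifact.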
The document proposes a HCI solution using VxRail for Suntory to address their current environment challenges. It includes an analysis of their current physical and virtual environment, an approach using a VxRail solution, and a project delivery plan. The proposed VxRail solution would consist of an E560 1U1N node with Intel CPUs, memory, SSDs to support all of Suntory's existing and new VMs on a single converged infrastructure platform. Dimension Data would handle the installation, configuration, migration, and provide 3 years of maintenance and support.
Introduction to Kafka Streams Presentation (Knoldus Inc.)
Kafka Streams is a client library providing organizations with a particularly efficient framework for processing streaming data. It offers a streamlined method for creating applications and microservices that must process data in real-time to be effective. Using the Streams API within Apache Kafka, the solution fundamentally transforms input Kafka topics into output Kafka topics. The benefits are important: Kafka Streams pairs the ease of utilizing standard Java and Scala application code on the client end with the strength of Kafka’s robust server-side cluster architecture.
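The "input topics into output topics" idea can be modeled with plain dicts of lists (the real library is Java/Scala; this is a language-agnostic sketch of a stateless transform, with topic names invented for the example).

```python
# Toy model: each topic is just an ordered list of records.
topics = {"input": ["hello", "kafka streams"], "output": []}

def map_values(topic_in, topic_out, fn):
    """Stateless record-by-record transform from one topic to another."""
    for record in topics[topic_in]:
        topics[topic_out].append(fn(record))

map_values("input", "output", str.upper)
print(topics["output"])  # ['HELLO', 'KAFKA STREAMS']
```

The real Streams DSL expresses the same shape declaratively (read a stream, transform each value, write to an output topic), while Kafka's cluster handles partitioning, fault tolerance and scaling of the running instances.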
Apache Storm is an open-source distributed real-time processing system. It allows for processing large amounts of streaming data reliably. Storm consists of spouts that intake data streams and bolts that perform processing. Spouts and bolts are connected in topologies to represent processing workflows. Storm distributes the workload of topologies across computer clusters for fault tolerance and high throughput. It uses ZooKeeper for coordination between Storm components like the master Nimbus node and worker Supervisor nodes.
Fallacies of distributed computing with Kubernetes on AWSRaffaele Di Fazio
This is the short story of a bug in one of our Go services that cut off some of the traffic targeting one of our production Kubernetes cluster running on AWS. But more than that, this is about how we did conceptually similar mistakes before and why thinking about failures and the famous "fallacies of distributed computing" is key to develop infrastructural components.
With this talk, we will give you a walk through some of those problems, illustrate some interesting details of Kubernetes, AWS and hopefully help you to not make the same mistakes again.
YOW2018 - Events and Commands: Developing Asynchronous MicroservicesChris Richardson
The microservice architecture functionally decomposes an application into a set of services. Each service has its own private database that’s only accessible indirectly through the services API. Consequently, implementing queries and transactions that span multiple services is challenging.
In this presentation, you will learn how to solve these distributed data management challenges using asynchronous messaging. I describe how to implement transactions using sagas, which are sequences of local transactions, coordinated using messages. You will learn how to implement queries using Command Query Responsibility Segregation (CQRS), which uses events to maintain replicas. I describe how to use event sourcing, which is an event-centric approach to business logic and persistence, in a microservice architecture.
This document discusses advances in file carving techniques for data recovery from disks and unallocated space. It describes various carving methods like header/footer carving, statistical carving, and fragment recovery carving. It also outlines limitations of current file carving tools and proposes ideas for future tools that combine methods and support more file types and fragmented files.
Real-time event processing monitors the incoming data stream and initiates action based on detected events like fraud, error or performance degradation. These events are often used to issue alerts and notifications, take responsive action, or to populate a monitoring dashboard. In this session, we will walk through different use cases for event processing and demonstrate how to build a scalable pipeline for tracking IoT device status. AWS services to be covered include: AWS Lambda and the Kinesis Client Library (KCL).
IBM announces a family of Cloud Paks that provide developers, data managers, and administrators an open environment to quickly build, modernize, and deploy applications and middleware across multiple clouds. The Cloud Paks include containerized IBM software and open source components that can be easily deployed to Kubernetes and provide capabilities for lifecycle management, security, and integration with services. Cloud Paks simplify enterprise deployment and management of software in containers and provide a consistent way for organizations to move more workloads to cloud environments faster.
Unified Batch & Stream Processing with Apache SamzaDataWorks Summit
The traditional lambda architecture has been a popular solution for joining offline batch operations with real time operations. This setup incurs a lot of developer and operational overhead since it involves maintaining code that produces the same result in two, potentially different distributed systems. In order to alleviate these problems, we need a unified framework for processing and building data pipelines across batch and stream data sources.
Based on our experiences running and developing Apache Samza at LinkedIn, we have enhanced the framework to support: a) Pluggable data sources and sinks; b) A deployment model supporting different execution environments such as Yarn or VMs; c) A unified processing API for developers to work seamlessly with batch and stream data. In this talk, we will cover how these design choices in Apache Samza help tackle the overhead of lambda architecture. We will use some real production use-cases to elaborate how LinkedIn leverages Apache Samza to build unified data processing pipelines.
Speaker
Navina Ramesh, Sr. Software Engineer, LinkedIn
Independent of the source of data, the integration of event streams into an Enterprise Architecture gets more and more important in the world of sensors, social media streams and Internet of Things. Events have to be accepted quickly and reliably, they have to be distributed and analysed, often with many consumers or systems interested in all or part of the events. Dependent on the size and quantity of such events, this can quickly be in the range of Big Data.
In this session an architecture with a central log structured storage is presented where anybody can store and subscribe for events. This can be implemented using frameworks such as Kafka, Storm, Samza and Spark Streaming.
Storm is a distributed and fault-tolerant realtime computation system. It was created at BackType/Twitter to analyze tweets, links, and users on Twitter in realtime. Storm provides scalability, reliability, and ease of programming. It uses components like Zookeeper, ØMQ, and Thrift. A Storm topology defines the flow of data between spouts that read data and bolts that process data. Storm guarantees processing of all data through its reliability APIs and guarantees no data loss even during failures.
Agenda:
1. Data Flow Challenges in an Enterprise
2. Introduction to Apache NiFi
3. Core Features
4. Architecture
5. Demo – Simple Lambda Architecture
6. Use Cases
7. Q & A
Kafka and Storm - event processing in realtime, by Guido Schmutz
Apache Kafka is publish-subscribe messaging rethought as a distributed commit log. It is designed to allow a single cluster to serve as the central data backbone for a large organization. It can be elastically and transparently expanded without downtime. Storm is a distributed realtime computation system. Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing. Storm has many use cases: realtime analytics, online machine learning, continuous computation, distributed RPC, ETL, and more. This session presents the main concepts of Kafka and Storm and then shows how a simple stream processing application is implemented using these two technologies.
Ray Serve: A new scalable machine learning model serving library on Ray, by Simon Mo
When a machine learning model needs to be served for interactive use cases, the model is either wrapped inside a Flask server or deployed using external services like SageMaker. Both methods come with flaws. In this talk, you will learn how Ray Serve uses Ray to address the limitations of current approaches and enable scalable model serving.
Serverless with Spring Cloud Function, Knative and riff #SpringOneTour #s1tToshiaki Maki
This document summarizes a presentation about serverless computing using Spring Cloud Function, Knative, and riff. It discusses what serverless computing is, an overview of Spring Cloud Function for developing serverless applications, and how Knative and riff can be used as platforms to deploy serverless workloads on Kubernetes. Code examples are provided to demonstrate invoking functions via HTTP and messaging with Spring Cloud Function and deploying functions to Knative and riff.
Flink provides unified batch and stream processing. It natively supports streaming dataflows, long batch pipelines, machine learning algorithms, and graph analysis through its layered architecture and treatment of all computations as data streams. Flink's optimizer selects efficient execution plans such as shipping strategies and join algorithms. It also caches loop-invariant data to speed up iterative algorithms and graph processing.
The document discusses the Twelve Factors methodology for building cloud-native applications. The Twelve Factors emphasize agility, portability, and scalability through principles like codebase management, externalized configuration, statelessness, and treating logs as event streams. Adopting these best practices helps applications maximize benefits of modern cloud platforms like rapid deployment, continuous delivery, and horizontal scaling. The document provides an overview of each factor and examples to illustrate how they enable cloud-native architectures like microservices and containerization.
Introduction to Kafka Streams Presentation, by Knoldus Inc.
Kafka Streams is a client library providing organizations with a particularly efficient framework for processing streaming data. It offers a streamlined method for creating applications and microservices that must process data in real time to be effective. Using the Streams API within Apache Kafka, the solution fundamentally transforms input Kafka topics into output Kafka topics. The benefits are significant: Kafka Streams pairs the ease of using standard Java and Scala application code on the client side with the strength of Kafka's robust server-side cluster architecture.
Apache Storm is an open-source distributed real-time processing system. It allows for processing large amounts of streaming data reliably. Storm consists of spouts that intake data streams and bolts that perform processing. Spouts and bolts are connected in topologies to represent processing workflows. Storm distributes the workload of topologies across computer clusters for fault tolerance and high throughput. It uses ZooKeeper for coordination between Storm components like the master Nimbus node and worker Supervisor nodes.
This document discusses messaging queues and compares Kafka and Amazon SQS. It begins by explaining what a messaging queue is and provides examples of software that can be used, including Kafka, SQS, SNS, and RabbitMQ. It then discusses why messaging queues are useful, allowing for asynchronous processing and handling of failures. The document proceeds to provide details on Kafka, including that it is a distributed streaming platform used by companies like LinkedIn, Twitter, and Netflix. It defines Kafka terminology and discusses how producers and consumers work. Finally, it compares features of SQS and Kafka such as message ordering, delivery guarantees, retention, security, costs, and throughput.
The document compares the performance of Apache Kafka and RabbitMQ for streaming data. It finds that without fault tolerance, both brokers have similar latency, but with fault tolerance enabled, Kafka has slightly higher latency than RabbitMQ. Latency increases with message size and is improved after an initial warmup period. Overall, RabbitMQ demonstrated the lowest latency for both configurations. The document also describes how each system is deployed and configured for the performance tests.
Kafka is a distributed publish-subscribe messaging system that allows both streaming and storage of data feeds. It is designed to be fast, scalable, durable, and fault-tolerant. Kafka maintains feeds of messages called topics that can be published to by producers and subscribed to by consumers. A Kafka cluster typically runs on multiple servers called brokers that store topics which may be partitioned and replicated for fault tolerance. Producers publish messages to topics which are distributed to consumers through consumer groups that balance load.
The document provides an overview of key concepts for working with Apache Kafka including:
1. Out of the box, Kafka provides only at-most-once or at-least-once delivery; the document discusses how messages can be lost or duplicated. Exactly-once delivery was introduced in later versions.
2. Using more partitions can increase unavailability if a broker fails uncleanly, since leadership elections must occur for each partition, and can increase latency by serializing replication across partitions.
3. The schema registry helps enforce schemas when using Avro to serialize data to Kafka, avoiding issues from schema changes.
The document provides an overview of Apache Samza, including its key differentiators and future plans. It discusses Samza's performance advantages from using local state instead of remote databases. Samza allows stateful stream processing and incremental checkpointing for applications with terabytes of state. It supports a variety of input sources, processing as a service on YARN or embedded as a library. Upcoming features include a high-level API, support for event time windows, pipelines, and exactly-once processing while auto-scaling local state.
Apache Kafka is a fast, scalable, and distributed messaging system. It is designed for high throughput systems and can replace traditional message brokers due to its better throughput, built-in partitioning for scalability, replication for fault tolerance, and ability to handle large message processing applications. Kafka uses topics to organize streams of messages, partitions to distribute data, and replicas to provide redundancy and prevent data loss. It supports reliable messaging patterns including point-to-point and publish-subscribe.
Apache Kafka is a distributed publish-subscribe messaging system that can handle high volumes of data and enable messages to be passed from one endpoint to another. It uses a distributed commit log that allows messages to be persisted on disk for durability. Kafka is fast, scalable, fault-tolerant, and guarantees zero data loss. It is used by companies like LinkedIn, Twitter, and Netflix to handle high volumes of real-time data and streaming workloads.
Apache Kafka is a fast, scalable, durable and distributed messaging system. It is designed for high throughput systems and can replace traditional message brokers. Kafka has better throughput, partitioning, replication and fault tolerance compared to other messaging systems, making it suitable for large-scale applications. Kafka persists all data to disk for reliability and uses distributed commit logs for durability.
Big Data Streams Architectures. Why? What? How? By Anton Nazaruk
With the current zoo of technologies and the many different ways they interact, it is a big challenge to architect a system (or adapt an existing one) that conforms to low-latency Big Data analysis requirements. Apache Kafka, and the Kappa Architecture in particular, are attracting more and more attention over the classic Hadoop-centric technology stack. The new Consumer API has given a significant boost in this direction. Microservices-based stream processing and the new Kafka Streams tend to form a synergy in the Big Data world.
Apache Kafka is a distributed publish-subscribe messaging system that was originally created by LinkedIn and contributed to the Apache Software Foundation. It is written in Scala and provides a multi-language API to publish and consume streams of records. Kafka is useful for both log aggregation and real-time messaging due to its high performance, scalability, and ability to serve as both a distributed messaging system and log storage system with a single unified architecture. To use Kafka, one runs Zookeeper for coordination, Kafka brokers to form a cluster, and then publishes and consumes messages with a producer API and consumer API.
This document summarizes key concepts around controlling message flow in Mule such as filter types, flow processing strategies, and common routers. It describes synchronous and asynchronous processing strategies and how Mule determines the appropriate strategy. It provides examples of common routers like scatter-gather, choice, and aggregators. It also summarizes filter types that can determine if a message proceeds in a flow including the idempotent filter.
Apache Kafka is a distributed publish-subscribe messaging system that allows for high volumes of data to be passed from endpoints to endpoints. It uses a broker-based architecture with topics that messages are published to and persisted on disk for reliability. Producers publish messages to topics that are partitioned across brokers in a Kafka cluster, while consumers subscribe to topics and pull messages from brokers. The ZooKeeper service coordinates the Kafka brokers and notifies producers and consumers of changes.
This document compares the streaming frameworks Spark Streaming and Storm. Spark Streaming processes data in micro-batches of around 500ms, uses batch processing semantics, and guarantees exactly-once processing. Storm processes data record-by-record with sub-second latency but only guarantees at least once processing. Benchmarking showed that as the number of data producers increased, Storm's throughput and spout latency also increased, while Spark Streaming's throughput increased with larger batch sizes and higher data loads. The document proposes further testing different use cases and adding monitoring dashboards.
Many businesses are faced with some new messaging challenges for modern applications, such as horizontal scalability of the messaging tier, heterogeneous messaging systems and access methods, and extreme transaction processing. This presentation/demo will cover how businesses can overcome these messaging challenges with the use of Spring and RabbitMQ technologies. Tom will build a case for AMQP, explain how SpringSource is providing AMQP support via Spring AMQP and Spring Integration, explain how RabbitMQ is a modern messaging solution that offers a reliable, highly available, scalable and portable messaging system with predictable and consistent throughput and latency, and demonstrate how Spring Integration and RabbitMQ can be progressively introduced into a standard Spring web application deployed on Cloud Foundry.
How to use Kafka for storing intermediate data and as a pub/sub model, covering each of the Producer/Consumer/Topic configs in depth and its internal workings.
Hai Lu presented on the Samza Portable Runner for Apache Beam. The key points are:
1) The Samza Portable Runner allows stream processing to be done in multiple languages like Python by translating Beam pipelines into the Samza execution engine.
2) It provides a high-level Python SDK for building streaming applications on top of Beam's portability framework. Pipelines are translated from Python into the language-independent Beam representation.
3) Performance is improved through batching/bundling messages between the Python and Java processes to reduce round trips. Initial tests showed throughput increasing with larger bundle sizes.
4) Example use cases demonstrated near real-time image OCR, model training,
Event Driven Architecture and Apache Kafka were discussed. Key points:
- Event driven systems allow for asynchronous and decoupled communication between services using message queues.
- Apache Kafka is a distributed streaming platform that allows for publishing and subscribing to streams of records across a cluster of servers. It provides reliability through replication and allows for horizontal scaling.
- Kafka provides advantages over traditional queues like decoupling, scalability, and fault tolerance. It also allows for publishing of data and consumption of data independently, unlike traditional APIs.
2. Samza overview
an open-source distributed stream processing framework created by LinkedIn
- sub-second latency
- handles large amounts of state
- fault tolerance
- no messages are ever lost
- partitioned and distributed at every level
- processor isolation
- pluggable
4. Kafka
- The stream may be sharded into one or more partitions.
- Each partition is independent from the others, and is replicated across multiple machines.
- Each partition consists of a sequence of messages in a fixed order.
- Each message has an offset, which indicates its position in that sequence.
- A Samza job can start consuming the sequence of messages from any starting offset.
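The partition/offset model above can be made concrete with a minimal, self-contained sketch in Python. This models the idea only, not Kafka's or Samza's actual API: each partition is an append-only sequence, a message's offset is its index in that sequence, and a consumer can (re)start reading from any offset. The `PartitionedLog` class and the `user-42` key are illustrative names.

```python
# Illustrative model of a partitioned, offset-addressed log (not the Kafka API).

class PartitionedLog:
    def __init__(self, num_partitions):
        # Each partition is an independent, append-only list of messages.
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key, message):
        # Route by key hash so all messages for a key land in one partition,
        # preserving their relative order within that partition.
        p = hash(key) % len(self.partitions)
        self.partitions[p].append(message)
        return p, len(self.partitions[p]) - 1  # (partition, offset)

    def read(self, partition, start_offset):
        # A consumer may begin at any starting offset in the sequence.
        return self.partitions[partition][start_offset:]

log = PartitionedLog(num_partitions=2)
for i in range(5):
    log.append(key="user-42", message=f"event-{i}")

p, last = log.append("user-42", "event-5")
print(log.read(p, start_offset=3))  # -> ['event-3', 'event-4', 'event-5']
```

Because the offset is just a position in a fixed-order sequence, checkpointing an offset is enough for a restarted consumer to resume exactly where it left off.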
7. Streams
A stream is composed of immutable messages of a similar type or category.
- when more than one stream is consumed in the same job, messages are chosen round-robin by default, but the chooser can be overridden
- streams can be prioritised by configuration
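The default round-robin choosing across input streams can be sketched as follows. This is an illustrative model of the behaviour, not Samza's MessageChooser API; the function name is hypothetical.

```python
from collections import deque

def round_robin_choose(streams):
    """Interleave messages from several input streams, one at a time."""
    queues = [deque(s) for s in streams]
    chosen = []
    while any(queues):
        # Visit each stream in turn, taking one message if it has any left.
        for q in queues:
            if q:
                chosen.append(q.popleft())
    return chosen

print(round_robin_choose([["a1", "a2"], ["b1", "b2", "b3"]]))
# -> ['a1', 'b1', 'a2', 'b2', 'b3']
```

A prioritised chooser would simply visit higher-priority queues first instead of cycling evenly.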
8. Job
A job is code that performs a logical transformation on a set of input streams, appending output messages to a set of output streams.
9. Partitions
Each stream is broken into one or more partitions. Each partition in the stream is a totally ordered sequence of messages.
10. Task
A job is scaled by breaking it into multiple tasks. The task is the unit of parallelism of the job, just as the partition is to the stream. Each task consumes data from one partition for each of the job’s input streams.
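One plausible way to model the task-to-partition mapping described above: task i consumes partition i of every input stream that has one, so the number of tasks equals the maximum partition count across the inputs. The function and stream names here are illustrative, not Samza's grouper API.

```python
# Sketch of task-to-partition assignment: task i gets partition i of each input.

def assign_partitions(input_streams):
    """input_streams: dict mapping stream name -> partition count."""
    num_tasks = max(input_streams.values())
    assignment = {}
    for task_id in range(num_tasks):
        # Task i consumes partition i of every stream that has that partition.
        assignment[task_id] = [
            (stream, task_id)
            for stream, parts in input_streams.items()
            if task_id < parts
        ]
    return assignment

print(assign_partitions({"page-views": 2, "ad-clicks": 2}))
# task 0 gets partition 0 of both streams; task 1 gets partition 1 of both
```

Because each partition is totally ordered and owned by exactly one task, per-key ordering is preserved without any cross-task coordination.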
11. Containers
Containers are the unit of physical parallelism; a container is essentially a Unix process (or Linux cgroup). Each container runs one or more tasks.
12. SamzaContainer start-up steps
1 - Get the last checkpointed offset for each input stream partition
2 - Create a “reader” thread for every input stream partition
3 - Start metrics reporters to report metrics
4 - Start a checkpoint timer to save your task’s input stream offsets every so often
13. SamzaContainer start-up steps (continued)
5 - Start a window timer to trigger your task’s window method, if it is defined
6 - Instantiate and initialize your StreamTask once for each input stream partition
7 - Start an event loop that takes messages from the input stream reader threads and gives them to your StreamTasks
8 - Notify lifecycle listeners during each one of these steps
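Steps 1, 4 and 7 above can be condensed into a small sketch: an event loop that takes messages from per-partition readers, hands each to its task, and periodically records the latest consumed offsets so a restarted container can resume. This is an illustrative model, not the real SamzaContainer; all names here are hypothetical.

```python
# Condensed model of the container event loop with periodic checkpointing.

def run_event_loop(readers, process, checkpoint, checkpoint_every=2):
    """readers: dict mapping partition -> list of (offset, message)."""
    offsets = {}
    seen = 0
    for partition, messages in readers.items():
        for offset, message in messages:
            process(partition, message)        # hand the message to the task
            offsets[partition] = offset        # remember last consumed offset
            seen += 1
            if seen % checkpoint_every == 0:
                checkpoint(dict(offsets))      # periodic checkpoint (step 4)
    checkpoint(dict(offsets))                  # final checkpoint on shutdown
    return offsets

processed, checkpoints = [], []
final = run_event_loop(
    {0: [(0, "a"), (1, "b")], 1: [(0, "c")]},
    process=lambda p, m: processed.append((p, m)),
    checkpoint=checkpoints.append,
)
print(final)  # -> {0: 1, 1: 0}: offsets a restarted container would resume from
```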
15. State Management
- fast access using a local database
- fault tolerance by sending a local store’s writes to a replicated changelog and checkpointing
- out-of-the-box support for RocksDB (key-value)
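The changelog idea can be sketched in a few lines: every write to the local store is also appended to a replicated changelog stream, so after a failure a task rebuilds its local state by replaying the changelog. This models the concept only; it is not Samza's KeyValueStore API, and the class name is hypothetical.

```python
# Model of local state backed by a replicated changelog (not the Samza API).

class ChangelogStore:
    def __init__(self, changelog):
        self.local = {}            # fast local store (RocksDB in real Samza)
        self.changelog = changelog # stands in for a replicated changelog stream

    def put(self, key, value):
        self.local[key] = value
        self.changelog.append((key, value))  # replicate the write

    def get(self, key):
        return self.local.get(key)

    @classmethod
    def restore(cls, changelog):
        # After a failure, rebuild local state by replaying the changelog.
        store = cls(changelog)
        for key, value in list(changelog):
            store.local[key] = value
        return store

changelog = []
store = ChangelogStore(changelog)
store.put("clicks:user-1", 3)
store.put("clicks:user-1", 4)

recovered = ChangelogStore.restore(changelog)
print(recovered.get("clicks:user-1"))  # -> 4
```

In practice the changelog is a compacted Kafka topic, so replay cost stays proportional to the number of live keys rather than the full write history.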
16. Event Loop
- synchronous tasks run on a single thread by default, but this can be configured
- asynchronous tasks are always invoked on a single thread, while callbacks can be triggered from a different thread
Samza makes sure that checkpointing is performed automatically only after the async calls have completed.
17. Metrics
Samza has its own library to expose metrics, with counters, gauges and timers.
Metrics can be exposed via JMX, a Kafka topic, and so on.
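A minimal sketch of a registry with counters and gauges, the two metric kinds named above. This is an illustrative stand-in, not Samza's metrics library; a reporter would snapshot these values and publish them via JMX or a Kafka topic. The metric names are hypothetical.

```python
# Toy metrics registry with counters (incremented) and gauges (sampled).

class MetricsRegistry:
    def __init__(self):
        self.counters, self.gauges = {}, {}

    def counter(self, name):
        self.counters.setdefault(name, 0)
        def increment(by=1):
            self.counters[name] += by
        return increment

    def gauge(self, name, supplier):
        # Gauges are functions, sampled when a reporter takes a snapshot.
        self.gauges[name] = supplier

    def snapshot(self):
        return {**self.counters,
                **{n: fn() for n, fn in self.gauges.items()}}

metrics = MetricsRegistry()
processed = metrics.counter("messages-processed")
metrics.gauge("queue-size", lambda: 7)

processed()
processed(by=4)
print(metrics.snapshot())  # -> {'messages-processed': 5, 'queue-size': 7}
```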
18. Security
Samza itself provides no security; all security is implemented in the streaming system, or in the environment in which the Samza containers run.