The world we live in today is fed by data. From self-driving cars and route planning to fraud prevention, to content and network recommendations, to ranking and bidding, our world not only consumes low-latency data streams, it adapts to changing conditions modeled by that data.
While software engineering has settled on best practices for developing and managing both stateless service architectures and database systems, the ecosystem of data infrastructure is still a greenfield opportunity. To thrive, this field borrows from several disciplines: distributed systems, database systems, operating systems, control systems, and software engineering, to name a few.
Of particular interest to me is the subfield of data streams, specifically how to build high-fidelity nearline data streams as a service within a lean team. To build such systems, manual operations are a non-starter: all aspects of operating streaming data pipelines must be automated. Come to this talk to learn how to build such a system soup-to-nuts.
OSCON Data 2011 -- NoSQL @ Netflix, Part 2 (Sid Anand)
The document discusses translating concepts from relational databases to key-value stores. It covers normalizing data to avoid issues like data inconsistencies and loss. While key-value stores don't support relations, transactions, or SQL, the relationships can be composed in the application layer for smaller datasets. Picking the right data for key-value stores involves accessing data primarily by key lookups.
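A minimal sketch of the application-layer join described above, using a plain Python dict as a stand-in for the key-value store; keys and field names are illustrative, not from the talk:

```python
# Sketch: composing a one-to-many relationship in the application layer
# over a key-value store. A dict stands in for the KV store; data is
# accessed primarily by key lookups, and the "join" happens in code.

kv = {
    "user:42": {"name": "Ada", "order_ids": ["order:1", "order:2"]},
    "order:1": {"item": "book", "total": 12.50},
    "order:2": {"item": "lamp", "total": 30.00},
}

def get_user_with_orders(user_key):
    """Emulate a relational join: one lookup for the user, one per order."""
    user = kv[user_key]
    orders = [kv[oid] for oid in user["order_ids"]]
    return {"name": user["name"], "orders": orders}

result = get_user_with_orders("user:42")
```

This works well for small fan-outs; for large datasets the extra per-key lookups are exactly why access patterns must be designed around key lookups up front.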
Kafka and Storm - event processing in realtime (Guido Schmutz)
Apache Kafka is publish-subscribe messaging rethought as a distributed commit log. It is designed to allow a single cluster to serve as the central data backbone for a large organization. It can be elastically and transparently expanded without downtime. Storm is a distributed realtime computation system. Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing. Storm has many use cases: realtime analytics, online machine learning, continuous computation, distributed RPC, ETL, and more. This session presents the main concepts of Kafka and Storm and then shows how a simple stream processing application is implemented using these two technologies.
Many architectures include both real-time and batch processing components. This often results in two separate pipelines performing similar tasks, which can be challenging to maintain and operate. We'll show how a single, well-designed ingest pipeline can serve both real-time and batch processing, making the desired architecture feasible for scalable production use cases.
Whether you are developing a greenfield data project or migrating a legacy system, there are many critical design decisions to be made. Often, it is advantageous to consider not only immediate requirements, but also the future requirements and technologies you may want to support. Your project may start out supporting batch analytics with the vision of adding realtime support. Or your data pipeline may feed data to one technology today, but tomorrow an entirely new system needs to be integrated. Apache Kafka can help decouple these decisions and provide a flexible core to your data architecture. This talk will show how building Kafka into your pipeline can provide the flexibility to experiment, evolve, and grow. It will also cover a brief overview of Kafka, its architecture, and terminology.
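The decoupling described here comes from Kafka's log abstraction: producers append, and every consumer tracks its own read position independently. A toy in-memory analogue (not the Kafka client API) makes the idea concrete:

```python
# Toy append-only log illustrating how Kafka decouples producers from
# consumers: each consumer group keeps its own offset, so adding a new
# downstream system never disturbs the existing ones.

class TopicLog:
    def __init__(self):
        self.records = []    # the append-only log
        self.offsets = {}    # consumer group -> next offset to read

    def produce(self, record):
        self.records.append(record)

    def consume(self, group, max_records=10):
        start = self.offsets.get(group, 0)
        batch = self.records[start:start + max_records]
        self.offsets[group] = start + len(batch)
        return batch

log = TopicLog()
for i in range(3):
    log.produce({"event_id": i})

batch_analytics = log.consume("batch")    # existing consumer group
realtime = log.consume("realtime")        # added later, reads from offset 0
```

Because the log retains records regardless of who has read them, a system integrated tomorrow can replay everything from the beginning.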
Processing data from social media streams and sensors in real time is becoming increasingly prevalent, and there are plenty of open source solutions to choose from. To help practitioners decide what to use when, we compare three popular Apache stream processing projects: Apache Storm, Apache Spark, and Apache Samza.
Why is My Stream Processing Job Slow? with Xavier Leaute (Databricks)
The goal of most stream processing jobs is to process data and deliver insights to the business – fast. Unfortunately, sometimes our stream processing jobs fall short of this goal. Or perhaps the job used to run fine, but one day it just isn't fast enough? In this talk, we'll dive into the challenges of analyzing the performance of real-time stream processing applications. We'll share troubleshooting suggestions and some of our favorite tools. So next time someone asks "why is this taking so long?", you'll know what to do.
Spark Streaming & Kafka - The Future of Stream Processing (Jack Gudenkauf)
Hari Shreedharan/Cloudera @Playtika. With its easy-to-use interfaces and native integration with some of the most popular ingest tools, such as Kafka, Flume, Kinesis, etc., Spark Streaming has become the go-to tool for stream processing. Code sharing with Spark also makes it attractive. In this talk, we will discuss the latest features in Spark Streaming, how it integrates with Kafka natively with no data loss, and even how to do exactly-once processing!
The document discusses using Apache Kafka for event detection pipelines. It describes how Kafka can be used to decouple data pipelines and ingest events from various source systems in real-time. It then provides an example use case of using Kafka, Hadoop, and machine learning for fraud detection in consumer banking, describing the online and offline workflows. Finally, it covers some of the challenges of building such a system and considerations for deploying Kafka.
This document discusses Apache Kafka and how it can be used by Oracle DBAs. It begins by explaining how Kafka builds upon the concept of a database redo log by providing a distributed commit log service. It then discusses how Kafka is a publish-subscribe messaging system and can be used to log transactions from any database, application logs, metrics and other system events. Finally, it discusses how schemas are important for Kafka since it only stores messages as bytes, and how Avro can be used to define and evolve schemas for Kafka messages.
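To make the schema point concrete, here is a plain-Python emulation of Avro-style schema resolution, where a reader schema with defaults can decode messages written before a field existed. This is illustrative only; it is not the Avro library API, and the field names are made up:

```python
# Kafka stores messages only as bytes, so producer and consumer must
# agree on a schema. Emulating Avro-style evolution: a v2 reader schema
# with a default decodes a message written with the older v1 schema.

import json

writer_v1 = {"fields": {"id": None, "amount": None}}
reader_v2 = {"fields": {"id": None, "amount": None, "currency": "USD"}}

def encode(record):
    return json.dumps(record).encode("utf-8")   # Kafka just sees bytes

def decode(raw, reader_schema):
    record = json.loads(raw.decode("utf-8"))
    # Fill fields the writer did not know about from reader defaults.
    for field, default in reader_schema["fields"].items():
        record.setdefault(field, default)
    return record

old_message = encode({"id": 7, "amount": 9.99})   # written with writer_v1
decoded = decode(old_message, reader_v2)          # read with reader_v2
```

Real Avro performs this resolution from the writer and reader schemas together, which is what makes adding fields with defaults a backward-compatible change.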
This talk covers Kafka cluster sizing, instance type selections, scaling operations, replication throttling and more. Don’t forget to check out the Kafka-Kit repository.
https://www.youtube.com/watch?time_continue=2613&v=7uN-Vlf7W5E
Apache Samza is a framework for reliable stream processing using Apache Kafka and Hadoop YARN. It provides low-latency stream processing by allowing users to write stream processing jobs that consume messages from Kafka topics and process them using simple process functions. Samza jobs are distributed and run across clusters using YARN to provide reliability and scalability. The process functions in Samza allow users to easily integrate stream processing with state storage and message output to other Kafka topics.
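The process-function model can be sketched in plain Python. This is only an analogue of the shape of Samza's StreamTask (Samza itself is Java), with a dict standing in for the local state store and a list standing in for an output topic:

```python
# Sketch of the Samza programming model: a job is a process() callback
# that consumes one message at a time, updates local state, and emits
# records to an output topic.

class WordCountTask:
    def __init__(self):
        self.counts = {}    # stands in for Samza's local state store

    def process(self, message, collector):
        for word in message.split():
            self.counts[word] = self.counts.get(word, 0) + 1
            # Emit (topic, key, value) downstream, as to another Kafka topic.
            collector.append(("word-counts", word, self.counts[word]))

task = WordCountTask()
out = []    # stands in for the output Kafka topic
for msg in ["to be", "or not to be"]:
    task.process(msg, out)
```

In real Samza, YARN runs many such task instances in parallel, one per input partition, which is where the reliability and scalability come from.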
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...) confluent
Apache Kafka is now nearly ubiquitous in modern data pipelines and use cases. While the Kafka development model is elegantly simple, operating Kafka clusters in production environments is a challenge. It’s hard to troubleshoot misbehaving Kafka clusters, especially when there are potentially hundreds or thousands of topics, producers and consumers and billions of messages.
The root cause of lag in real-time applications may be an application problem – like poor data partitioning or load imbalance – or a Kafka problem – like resource exhaustion or suboptimal configuration. Getting the best performance, predictability, and reliability for Kafka-based applications can therefore be difficult. In the end, the operation of your Kafka-powered analytics pipelines could itself benefit from machine learning (ML).
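A concrete starting point for diagnosing lag: consumer lag per partition is just the log-end offset minus the group's committed offset. A minimal sketch with made-up offset values:

```python
# Consumer lag per partition: how far the group's committed position
# trails the newest record. Sustained growth here is usually the first
# symptom of either an application or a Kafka-side problem.

def consumer_lag(log_end_offsets, committed_offsets):
    """Per-partition lag; total lag is the sum over partitions."""
    return {p: log_end_offsets[p] - committed_offsets.get(p, 0)
            for p in log_end_offsets}

log_end = {0: 1500, 1: 900}      # latest offset per partition
committed = {0: 1400, 1: 900}    # what the group has processed
lag = consumer_lag(log_end, committed)
```

A skew like this one (partition 0 lagging while partition 1 keeps up) points at the poor-partitioning / load-imbalance case rather than cluster-wide exhaustion.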
The Netflix Way to deal with Big Data Problems (Monal Daxini)
The document discusses Netflix's approach to handling big data problems. It summarizes Netflix's data pipeline system called Keystone that was built in a year to replace a legacy system. Keystone ingests over 1 trillion events per day and processes them using technologies like Kafka, Samza and Spark Streaming. The document emphasizes Netflix's culture of freedom and responsibility and how it helped the small team replace the legacy system without disruption while achieving massive scale.
The need for gleaning answers from unbounded data streams is moving from a nicety to a necessity. Netflix is a data-driven company and needs to process over 1 trillion events a day, amounting to 3 PB of data, to derive business insights.
To ease extracting insight, we are building a self-serve, scalable, fault-tolerant, multi-tenant "Stream Processing as a Service" platform so the user can focus on data analysis. I'll share our experience using Flink to help build the platform.
Operationalizing Machine Learning: Serving ML Models (Lightbend)
Join O’Reilly author and Lightbend Principal Architect, Boris Lublinsky, as he discusses one of the hottest topics in software engineering today: serving machine learning models.
Typically with machine learning, different groups are responsible for model training and model serving. Data scientists often introduce their own machine-learning tools, causing software engineers to create complementary model-serving frameworks to keep pace. It’s not a very efficient system. In this webinar, Boris demonstrates a more standardized approach to model serving and model scoring:
* How to develop an architecture for serving models in real time as part of input stream processing
* How this approach enables data science teams to update models without restarting existing applications
* Different ways to build this model-scoring solution, using several popular stream processing engines and frameworks
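One common shape for the first two bullets is the "model as data" pattern: serialized model updates arrive on a control stream and are swapped in without restarting the serving application. A hedged in-memory sketch, in which the linear scorer is just a placeholder for any model:

```python
# Sketch: a serving process reads two streams, one carrying records to
# score and one carrying model updates, so a new model takes effect
# without a restart of the scoring application.

class ModelServer:
    def __init__(self):
        self.model = None

    def on_model_update(self, weights):
        self.model = weights    # atomically swap in the new model

    def score(self, features):
        if self.model is None:
            return None         # no model published yet
        return sum(w * x for w, x in zip(self.model, features))

server = ModelServer()
server.on_model_update([0.5, 2.0])    # first model arrives on the control stream
first = server.score([1.0, 1.0])
server.on_model_update([1.0, 1.0])    # updated model, no restart needed
second = server.score([1.0, 1.0])
```

In a real deployment the update callback would deserialize a model artifact (e.g. PMML or TensorFlow SavedModel) rather than a weight list, but the swap-in-place structure is the same.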
Help, my Kafka is broken! (Emma Humber, IBM) Kafka Summit SF 2019 (confluent)
Abstract Summary: While Apache Kafka is designed to be fault-tolerant, there will be times when your Kafka environment just isn't working as expected. Whether it's a newly configured application not processing messages, or an outage in a high-load, mission-critical production environment, it's crucial to get up and running as quickly and safely as possible. IBM has hosted production Kafka environments for several years and has in-depth knowledge of how to diagnose and resolve problems rapidly and accurately to ensure minimal impact to end users. This session will discuss our experiences of how to most effectively collect and understand Kafka diagnostics. We'll talk through using these diagnostics to work out what's gone wrong, and how to recover from a system outage. Using this new-found knowledge, you will be equipped to handle any problem your cluster throws at you.
Event-Driven Stream Processing and Model Deployment with Apache Kafka, Kafka ... (Kai Wähner)
Talk from Kafka Summit San Francisco 2019 (https://kafka-summit.org/sessions/event-driven-model-serving-stream-processing-vs-rpc-kafka-tensorflow/). Video recording will be available for free on the Summit website.
Event-based stream processing is a modern paradigm to continuously process incoming data feeds, e.g. for IoT sensor analytics, payment and fraud detection, or logistics. Machine Learning / Deep Learning models can be leveraged in different ways to do predictions and improve the business processes. Either analytic models are deployed natively in the application or they are hosted in a remote model server. In the latter case, you combine stream processing with an RPC / Request-Response paradigm instead of doing inference directly within the application. This talk discusses the pros and cons of both approaches and shows examples of stream processing vs. RPC model serving using Kubernetes, Apache Kafka, Kafka Streams, gRPC and TensorFlow Serving. The trade-offs of using a public cloud service like AWS or GCP for model deployment are also discussed and compared to local hosting for offline predictions directly "at the edge".
Key takeaways
• Machine Learning / Deep Learning models can be used in different ways to do predictions. Scalability and loose coupling are important success factors
• Stream processing vs. RPC / Request-Response for model serving has many trade-offs – learn about alternatives and best practices for your different scenarios
• Understand the alternatives and trade-offs of model deployment in modern infrastructures like Kubernetes or Cloud Services like AWS or GCP
• See live demos with Java, gRPC, Apache Kafka, KSQL and TensorFlow Serving to understand the trade-offs
Apache Kafka - Scalable Message-Processing and more! (Guido Schmutz)
Independent of the source of data, the integration of event streams into an Enterprise Architecture becomes more and more important in the world of sensors, social media streams, and the Internet of Things. Events have to be accepted quickly and reliably, and they have to be distributed and analysed, often with many consumers or systems interested in all or part of the events. How can we make sure that all these events are accepted and forwarded in an efficient and reliable way? This is where Apache Kafka comes into play: a distributed, highly scalable messaging broker, built for exchanging huge amounts of messages between a source and a target.
This session will start with an introduction to Apache Kafka and present its role in a modern data/information architecture and the advantages it brings to the table. Additionally, the Kafka ecosystem will be covered, as well as the integration of Kafka into the Oracle stack, with products such as GoldenGate, Service Bus, and Oracle Stream Analytics all being able to act as a Kafka consumer or producer.
This document discusses challenges with writing streaming data from Kafka to Parquet files stored in HDFS. It evaluates several approaches: 1) windowing, which works but uses too much memory; 2) bucketing data into time-based files, which works for failures but requires modifying Flink; 3) closing files at checkpoints, which was later supported in Flink; and 4) hourly batch jobs, which avoids streaming complexity but limits real-time use. The conclusion is that streaming solutions are not trivial and it may be better to use a database or different tool instead of files for this use case. Supporting both real-time and batch processing is challenging.
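The time-based bucketing approach can be sketched as a pure path-routing function: each event lands in an hourly file derived from its timestamp, so a failed run can rewrite a bucket idempotently. The directory layout here is illustrative:

```python
# Sketch: route events into hourly buckets by event timestamp. A batch
# job (or a restart after failure) can safely regenerate any single
# bucket without touching the others.

from datetime import datetime, timezone

def bucket_path(topic, epoch_seconds):
    t = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return f"/data/{topic}/dt={t:%Y-%m-%d}/hour={t:%H}/part-0.parquet"

path = bucket_path("clicks", 1_700_000_000)
```

The hard parts the document describes start exactly here: deciding when a bucket is complete (watermarks, late events) and when it is safe to close the file, which is why checkpoint-aligned file closing eventually landed in Flink itself.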
This document provides an overview of Flume and Spark Streaming. It describes how Flume is used to reliably ingest streaming data into Hadoop using an agent-based architecture. Events are collected by sources, stored reliably in channels, and sent to sinks. The Flume connector allows ingested data to be processed in real-time using Spark Streaming's micro-batch architecture, where streams of data are processed through RDD transformations. This combined Flume + Spark Streaming approach provides a scalable and fault-tolerant way to reliably ingest and process streaming data.
The document discusses Akka 2.4 and commercial features available through the Reactive Platform. Key points include: Akka 2.4 requires Java 8 but provides backwards compatibility; Cluster Tools, Persistence, and Distributed PubSub are now stable features; Persistence allows cross-Scala version snapshot compatibility; a Split Brain Resolver is available in beta for cluster failure scenarios; and extended Java 6 support is provided through the Reactive Platform.
How Pulsar Enables Netdata to Offer Unlimited Infrastructure Monitoring for F... (StreamNative)
The Netdata Agent is free, open source single-node monitoring software. Netdata Cloud is a free, closed source, software-as-a-service that brings together metadata from endpoints running the Netdata Agent, giving a complete view of the health and performance of an infrastructure. All the metrics remain on the Netdata Agent, making Netdata Cloud the focal point of a distributed, infinitely scalable, low cost solution.
The heart of Netdata Cloud is Pulsar. Almost every message coming from and going to the open source agents passes through Pulsar. Pulsar's infinite number of topics has given us the flexibility we needed and in some cases, every single Netdata Agent has its own unique Pulsar topic. A single message from an agent or from a service that processes a front end request can trigger several other Pulsar messages, as we also use Pulsar for communication between microservices (using a CQRS pattern with shared subscriptions for scalability).
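The shared-subscription behavior mentioned here amounts to round-robin dispatch of one topic's messages across all consumers in the subscription, which is what lets the CQRS command handlers scale out. An in-memory analogue (not the pulsar-client API; worker names are made up):

```python
# Sketch of a Pulsar shared subscription: messages on a single topic
# are spread round-robin across the subscription's consumers, so adding
# a consumer increases processing throughput.

from itertools import cycle

def dispatch_shared(messages, consumers):
    """Round-robin messages across consumer buckets."""
    buckets = {c: [] for c in consumers}
    ring = cycle(consumers)
    for msg in messages:
        buckets[next(ring)].append(msg)
    return buckets

buckets = dispatch_shared(list(range(6)), ["worker-a", "worker-b", "worker-c"])
```

Note that, unlike an exclusive subscription, this trades per-key ordering for parallelism, which is the usual fit for command handling behind a CQRS boundary.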
The reliable persistence of messages has allowed us to replay old events to rebuild old and build new materialized views and debug specific production issues. It's also what will enable us to implement an event sourcing pattern, for a new set of features we want to introduce shortly.
We have had a few issues with a specific client and our shared subscriptions that we're working on resolving, but overall Pulsar has proven to be one of the most reliable parts of our infrastructure and we decided to proceed with a managed services agreement.
Independent of the source of data, the integration of event streams into an Enterprise Architecture becomes more and more important in the world of sensors, social media streams, and the Internet of Things. Events have to be accepted quickly and reliably, and they have to be distributed and analysed, often with many consumers or systems interested in all or part of the events. Storing such huge event streams into HDFS or a NoSQL datastore is feasible and no longer such a challenge. But if you want to be able to react fast, with minimal latency, you cannot afford to first store the data and do the analysis/analytics later. You have to be able to include part of your analytics right after you consume the event streams. Products for doing event processing, such as Oracle Event Processing or Esper, have been available for quite a long time and used to be called Complex Event Processing (CEP). In the last 3 years, another family of products has appeared, mostly out of the Big Data technology space, called Stream Processing or Streaming Analytics. These are mostly open source products/frameworks such as Apache Storm, Spark Streaming, and Apache Samza, as well as supporting infrastructures such as Apache Kafka. In this talk I will present the theoretical foundations of Event and Stream Processing, highlight the differences you might find between the more traditional CEP and the more modern Stream Processing solutions, and show that a combination brings the most value.
Netflix Keystone Pipeline at Samza Meetup 10-13-2015 (Monal Daxini)
The Netflix Keystone Pipeline processes 600 billion events a day; a detailed treatise on the modification and use of Samza for real-time routing of events, including Docker.
The document discusses designing robust data architectures for decision making. It advocates for building architectures that can easily add new data sources, improve and expand analytics, standardize metadata and storage for easy data access, discover and recover from mistakes. The key aspects discussed are using Kafka as a data bus to decouple pipelines, retaining all data for recovery and experimentation, treating the filesystem as a database by storing intermediate data, leveraging Spark and Spark Streaming for batch and stream processing, and maintaining schemas for integration and evolution of the system.
Building High Fidelity Data Streams (QCon London 2023) (Sid Anand)
The document discusses building reliable data streams. It begins by describing PayPal's need for a change data capture system to offload database queries. The author then built their own solution at PayPal to meet requirements like high availability and scalability.
Next, the document discusses building a simple initial streaming system with a source, destination, and messaging system between them. It emphasizes making non-functional requirements like reliability first-class citizens.
The document then explores how to make the system reliable by ensuring at-least-once delivery across each link. It proposes using transactions and auto-scaling groups. Finally, it discusses how to measure reliability using lag and loss metrics to track message delays across the system.
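The two reliability metrics can be sketched directly; the timestamps and message ids below are made up for illustration:

```python
# Sketch of the talk's reliability metrics: lag (time for a message to
# traverse the pipeline, from event time to arrival at the sink) and
# loss (fraction of sent message ids that never arrived).

def end_to_end_lag(event_ts, arrival_ts):
    """Per-message lag in seconds; typically tracked as a percentile."""
    return arrival_ts - event_ts

def loss_rate(sent_ids, received_ids):
    missing = set(sent_ids) - set(received_ids)
    return len(missing) / len(sent_ids)

lag = end_to_end_lag(100.0, 103.5)           # 3.5 seconds end to end
loss = loss_rate([1, 2, 3, 4], [1, 2, 4])    # id 3 never arrived
```

In practice these are computed over windows (e.g. p95 lag per minute, loss per hour once late arrivals are ruled out), so that delayed messages are not prematurely counted as lost.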
This document discusses various data link layer protocols. It begins by describing the services provided by the data link layer, including framing, error control, and flow control. It then discusses different types of framing such as fixed-size and variable-size. The document also covers different protocols for handling flow control and error control, including stop-and-wait, go-back-N ARQ, and selective repeat ARQ. It analyzes the performance of these protocols on both noiseless and noisy channels.
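The flow-control trade-off between stop-and-wait and sliding-window protocols reduces to the standard utilization formulas, with a defined as the ratio of propagation delay to frame transmission time:

```python
# Link utilization for the flow-control protocols above. With
# a = propagation_delay / transmission_time, stop-and-wait idles the
# link for a full round trip per frame, while a window of N frames can
# keep the pipe full once N >= 1 + 2a.

def stop_and_wait_utilization(a):
    return 1 / (1 + 2 * a)

def sliding_window_utilization(n, a):
    # The window is the bottleneck until it covers the round trip.
    return min(1.0, n / (1 + 2 * a))

u1 = stop_and_wait_utilization(4)        # long link: 1/9 of capacity
u2 = sliding_window_utilization(7, 4)    # window of 7: 7/9 of capacity
u3 = sliding_window_utilization(16, 4)   # window large enough: full use
```

This is the same reasoning that motivates go-back-N and selective repeat over stop-and-wait on noisy, high-latency channels.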
Spark Streaming & Kafka-The Future of Stream ProcessingJack Gudenkauf
Hari Shreedharan/Cloudera @Playtika. With its easy to use interfaces and native integration with some of the most popular ingest tools, such as Kafka, Flume, Kinesis etc, Spark Streaming has become go-to tool for stream processing. Code sharing with Spark also makes it attractive. In this talk, we will discuss the latest features in Spark Streaming and how it integrates with Kafka natively with no data loss, and even do exactly once processing!
The document discusses using Apache Kafka for event detection pipelines. It describes how Kafka can be used to decouple data pipelines and ingest events from various source systems in real-time. It then provides an example use case of using Kafka, Hadoop, and machine learning for fraud detection in consumer banking, describing the online and offline workflows. Finally, it covers some of the challenges of building such a system and considerations for deploying Kafka.
This document discusses Apache Kafka and how it can be used by Oracle DBAs. It begins by explaining how Kafka builds upon the concept of a database redo log by providing a distributed commit log service. It then discusses how Kafka is a publish-subscribe messaging system and can be used to log transactions from any database, application logs, metrics and other system events. Finally, it discusses how schemas are important for Kafka since it only stores messages as bytes, and how Avro can be used to define and evolve schemas for Kafka messages.
This talk covers Kafka cluster sizing, instance type selections, scaling operations, replication throttling and more. Don’t forget to check out the Kafka-Kit repository.
https://www.youtube.com/watch?time_continue=2613&v=7uN-Vlf7W5E
Apache Samza is a framework for reliable stream processing using Apache Kafka and Hadoop YARN. It provides low-latency stream processing by allowing users to write stream processing jobs that consume messages from Kafka topics and process them using simple process functions. Samza jobs are distributed and run across clusters using YARN to provide reliability and scalability. The process functions in Samza allow users to easily integrate stream processing with state storage and message output to other Kafka topics.
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...confluent
Apache Kafka is now nearly ubiquitous in modern data pipelines and use cases. While the Kafka development model is elegantly simple, operating Kafka clusters in production environments is a challenge. It’s hard to troubleshoot misbehaving Kafka clusters, especially when there are potentially hundreds or thousands of topics, producers and consumers and billions of messages.
The root cause of why real-time applications is lag may be due to an application problem – like poor data partitioning or load imbalance – or due to a Kafka problem – like resource exhaustion or suboptimal configuration. Therefore getting the best performance, predictability, and reliability for Kafka-based applications can be difficult. In the end, the operation of your Kafka powered analytics pipelines could themselves benefit from machine learning (ML).
The Netflix Way to deal with Big Data ProblemsMonal Daxini
The document discusses Netflix's approach to handling big data problems. It summarizes Netflix's data pipeline system called Keystone that was built in a year to replace a legacy system. Keystone ingests over 1 trillion events per day and processes them using technologies like Kafka, Samza and Spark Streaming. The document emphasizes Netflix's culture of freedom and responsibility and how it helped the small team replace the legacy system without disruption while achieving massive scale.
The need for gleaning answers from unbounded data streams is moving from nicety to a necessity. Netflix is a data driven company, and has a need to process over 1 trillion events a day amounting to 3 PB of data to derive business insights.
To ease extracting insight, we are building a self-serve, scalable, fault-tolerant, multi-tenant "Stream Processing as a Service" platform so the user can focus on data analysis. I'll share our experience using Flink to help build the platform.
Operationalizing Machine Learning: Serving ML ModelsLightbend
Join O’Reilly author and Lightbend Principal Architect, Boris Lublinsky, as he discusses one of the hottest topics in software engineering today: serving machine learning models.
Typically with machine learning, different groups are responsible for model training and model serving. Data scientists often introduce their own machine-learning tools, causing software engineers to create complementary model-serving frameworks to keep pace. It’s not a very efficient system. In this webinar, Boris demonstrates a more standardized approach to model serving and model scoring:
* How to develop an architecture for serving models in real time as part of input stream processing
* How this approach enables data science teams to update models without restarting existing applications
* Different ways to build this model-scoring solution, using several popular stream processing engines and frameworks
Help, my Kafka is broken! (Emma Humber, IBM) Kafka Summit SF 2019confluent
Abstract Summary: While Apache Kafka is designed to be fault-tolerant, there will be times when your Kafka environment just isn't working as expected. Whether it's a newly configured application not processing messages, or an outage in a high-load, mission-critical production environment, it's crucial to get up and running as quickly and safely as possible. IBM has hosted production Kafka environments for several years and has in-depth knowledge of how to diagnose and resolve problems rapidly and accurately to ensure minimal impact to end users. This session will discuss our experiences of how to most effectively collect and understand Kafka diagnostics. We'll talk through using these diagnostics to work out what's gone wrong, and how to recover from a system outage. Using this new-found knowledge, you will be equipped to handle any problem your cluster throws at you.
Event-Driven Stream Processing and Model Deployment with Apache Kafka, Kafka ...Kai Wähner
Talk from Kafka Summit San Francisco 2019 (https://kafka-summit.org/sessions/event-driven-model-serving-stream-processing-vs-rpc-kafka-tensorflow/). Video recording will be available for free on the Summit website.
Event-based stream processing is a modern paradigm to continuously process incoming data feeds, e.g. for IoT sensor analytics, payment and fraud detection, or logistics. Machine Learning / Deep Learning models can be leveraged in different ways to do predictions and improve the business processes. Either analytic models are deployed natively in the application or they are hosted in a remote model server. In the latter you combine stream processing with RPC / Request-Response paradigm instead of direct doing direct inference within the application. This talk discusses the pros and cons of both approaches and shows examples of stream processing vs. RPC model serving using Kubernetes, Apache Kafka, Kafka Streams, gRPC and TensorFlow Serving. The trade-offs of using a public cloud service like AWS or GCP for model deployment are also discussed and compared to local hosting for offline predictions directly “at the edge”.
Key takeaways
• Machine Learning / Deep Learning models can be used in different ways to do predictions. Scalability and loose coupling are important success factors
• Stream processing vs. RPC / Request-Response for model serving has many trade-offs – learn about alternatives and best practices for your different scenarios
• Understand the alternatives and trade-offs of model deployment in modern infrastructures like Kubernetes or Cloud Services like AWS or GCP
• See live demos with Java, gRPC, Apache Kafka, KSQL and TensorFlow Serving to understand the trade-offs
Apache Kafka - Scalable Message-Processing and more !Guido Schmutz
ndependent of the source of data, the integration of event streams into an Enterprise Architecture gets more and more important in the world of sensors, social media streams and Internet of Things. Events have to be accepted quickly and reliably, they have to be distributed and analysed, often with many consumers or systems interested in all or part of the events. How can me make sure that all these event are accepted and forwarded in an efficient and reliable way? This is where Apache Kafaka comes into play, a distirbuted, highly-scalable messaging broker, build for exchanging huge amount of messages between a source and a target.
This session will start with an introduction to Apache Kafka and present the role of Kafka in a modern data / information architecture and the advantages it brings to the table. Additionally, the Kafka ecosystem will be covered, as well as the integration of Kafka into the Oracle stack, with products such as GoldenGate, Service Bus and Oracle Stream Analytics all being able to act as a Kafka consumer or producer.
This document discusses challenges with writing streaming data from Kafka to Parquet files stored in HDFS. It evaluates several approaches: 1) windowing, which works but uses too much memory; 2) bucketing data into time-based files, which works for failures but requires modifying Flink; 3) closing files at checkpoints, which was later supported in Flink; and 4) hourly batch jobs, which avoids streaming complexity but limits real-time use. The conclusion is that streaming solutions are not trivial and it may be better to use a database or different tool instead of files for this use case. Supporting both real-time and batch processing is challenging.
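The time-based bucketing approach the summary evaluates can be illustrated with a minimal sketch: truncate each event's timestamp to the start of its hour and group events per bucket. This shows only the bucketing idea, not Flink's sink API; in a real pipeline each sealed bucket would be flushed to a Parquet file on HDFS.

```python
from collections import defaultdict

BUCKET_SECONDS = 3600  # one file per hour of event time

def bucket_of(event_ts):
    # Truncate the event timestamp to the start of its hour.
    return event_ts - (event_ts % BUCKET_SECONDS)

def bucket_events(events):
    # events: iterable of (event_ts, payload); in a real pipeline each bucket
    # would become one Parquet file once its hour is sealed.
    buckets = defaultdict(list)
    for ts, payload in events:
        buckets[bucket_of(ts)].append(payload)
    return dict(buckets)

events = [(3600, "a"), (3650, "b"), (7250, "c")]
buckets = bucket_events(events)
assert buckets == {3600: ["a", "b"], 7200: ["c"]}
```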
This document provides an overview of Flume and Spark Streaming. It describes how Flume is used to reliably ingest streaming data into Hadoop using an agent-based architecture. Events are collected by sources, stored reliably in channels, and sent to sinks. The Flume connector allows ingested data to be processed in real-time using Spark Streaming's micro-batch architecture, where streams of data are processed through RDD transformations. This combined Flume + Spark Streaming approach provides a scalable and fault-tolerant way to reliably ingest and process streaming data.
The document discusses Akka 2.4 and commercial features available through the Reactive Platform. Key points include: Akka 2.4 requires Java 8 but provides backwards compatibility; Cluster Tools, Persistence, and Distributed PubSub are now stable features; Persistence allows cross-Scala version snapshot compatibility; a Split Brain Resolver is available in beta for cluster failure scenarios; and extended Java 6 support is provided through the Reactive Platform.
How Pulsar Enables Netdata to Offer Unlimited Infrastructure Monitoring for F...StreamNative
The Netdata Agent is free, open source single-node monitoring software. Netdata Cloud is a free, closed source, software-as-a-service that brings together metadata from endpoints running the Netdata Agent, giving a complete view of the health and performance of an infrastructure. All the metrics remain on the Netdata Agent, making Netdata Cloud the focal point of a distributed, infinitely scalable, low cost solution.
The heart of Netdata Cloud is Pulsar. Almost every message coming from and going to the open source agents passes through Pulsar. Pulsar's infinite number of topics has given us the flexibility we needed and in some cases, every single Netdata Agent has its own unique Pulsar topic. A single message from an agent or from a service that processes a front end request can trigger several other Pulsar messages, as we also use Pulsar for communication between microservices (using a CQRS pattern with shared subscriptions for scalability).
The reliable persistence of messages has allowed us to replay old events to rebuild old and build new materialized views and debug specific production issues. It's also what will enable us to implement an event sourcing pattern, for a new set of features we want to introduce shortly.
We have had a few issues with a specific client and our shared subscriptions that we're working on resolving, but overall Pulsar has proven to be one of the most reliable parts of our infrastructure and we decided to proceed with a managed services agreement.
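Pulsar's shared subscriptions round-robin a topic's messages across the consumers of one subscription, which is what makes the CQRS microservice fan-out described above scale horizontally. A toy in-memory analogue of that dispatch rule (not the Pulsar client API):

```python
from collections import defaultdict
from itertools import cycle

def dispatch_shared(messages, consumer_names):
    # Round-robin delivery across the consumers of one shared subscription;
    # each message goes to exactly one consumer, so adding consumers scales out.
    assignments = defaultdict(list)
    consumers = cycle(consumer_names)
    for msg in messages:
        assignments[next(consumers)].append(msg)
    return dict(assignments)

out = dispatch_shared(["m1", "m2", "m3", "m4", "m5"], ["c1", "c2"])
assert out == {"c1": ["m1", "m3", "m5"], "c2": ["m2", "m4"]}
```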
Independent of the source of data, the integration of event streams into an Enterprise Architecture gets more and more important in the world of sensors, social media streams and the Internet of Things. Events have to be accepted quickly and reliably, and they have to be distributed and analysed, often with many consumers or systems interested in all or part of the events. Storing such huge event streams into HDFS or a NoSQL datastore is feasible and no longer much of a challenge. But if you want to be able to react fast, with minimal latency, you cannot afford to first store the data and do the analysis/analytics later. You have to be able to include part of your analytics right after you consume the event streams. Products for event processing, such as Oracle Event Processing or Esper, have been available for quite a long time and used to be called Complex Event Processing (CEP). In the last 3 years, another family of products has appeared, mostly out of the Big Data technology space, called Stream Processing or Streaming Analytics. These are mostly open source products/frameworks such as Apache Storm, Spark Streaming and Apache Samza, as well as supporting infrastructure such as Apache Kafka. In this talk I will present the theoretical foundations for event and stream processing, present the differences you might find between the more traditional CEP and the more modern Stream Processing solutions, and show that a combination will bring the most value.
Netflix Keystone Pipeline at Samza Meetup 10-13-2015Monal Daxini
The Netflix Keystone Pipeline processes 600 billion events a day. This is a detailed treatment of the modifications to, and use of, Samza for real-time routing of events, including Docker.
The document discusses designing robust data architectures for decision making. It advocates for building architectures that can easily add new data sources, improve and expand analytics, standardize metadata and storage for easy data access, discover and recover from mistakes. The key aspects discussed are using Kafka as a data bus to decouple pipelines, retaining all data for recovery and experimentation, treating the filesystem as a database by storing intermediate data, leveraging Spark and Spark Streaming for batch and stream processing, and maintaining schemas for integration and evolution of the system.
Building High Fidelity Data Streams (QCon London 2023)Sid Anand
The document discusses building reliable data streams. It begins by describing PayPal's need for a change data capture system to offload database queries. The author then built their own solution at PayPal to meet requirements like high availability and scalability.
Next, the document discusses building a simple initial streaming system with a source, destination, and messaging system between them. It emphasizes making non-functional requirements like reliability first-class citizens.
The document then explores how to make the system reliable by ensuring at-least-once delivery across each link. It proposes using transactions and auto-scaling groups. Finally, it discusses how to measure reliability using lag and loss metrics to track message delays across the system.
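The lag and loss metrics described above can be computed from per-message send and receive timestamps. A minimal sketch, assuming each message carries a unique id:

```python
def lag_and_loss(sent, received):
    """sent: {msg_id: send_ts}; received: {msg_id: receive_ts}.
    Returns (max lag in seconds over delivered messages, loss fraction)."""
    delivered = {m: received[m] - sent[m] for m in sent if m in received}
    max_lag = max(delivered.values(), default=0.0)
    loss = 1.0 - len(delivered) / len(sent) if sent else 0.0
    return max_lag, loss

sent = {"a": 0.0, "b": 1.0, "c": 2.0, "d": 3.0}
received = {"a": 0.5, "b": 4.0, "c": 2.2}  # "d" never arrived
max_lag, loss = lag_and_loss(sent, received)
assert max_lag == 3.0  # "b" took 4.0 - 1.0 = 3.0s
assert loss == 0.25    # 1 of 4 messages was lost
```

In practice these metrics would be computed per link in the pipeline, so a lag or loss alarm points at the hop that is misbehaving.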
This document discusses various data link layer protocols. It begins by describing the services provided by the data link layer, including framing, error control, and flow control. It then discusses different types of framing such as fixed-size and variable-size. The document also covers different protocols for handling flow control and error control, including stop-and-wait, go-back-N ARQ, and selective repeat ARQ. It analyzes the performance of these protocols on both noiseless and noisy channels.
C* Summit 2013: Time is Money Jake Luciani and Carl YeksigianDataStax Academy
This session will focus on our approach to building a scalable TimeSeries database for financial data using Cassandra 1.2 and CQL3. We will discuss how we deal with a heavy mix of reads and writes as well as how we monitor and track performance of the system.
Synapse 2018 Guarding against failure in a hundred step pipelineCalvin French-Owen
Control store (ctlstore) is a new infrastructure component that solves the "n+1 problem" of independent database failures bringing down a distributed system. It replicates control data from a system of record to local SQLite databases on each service node. Ctlstore loads data transactionally from the system of record using a loader, executive, and control data log. It then replicates changes to local databases using a reflector. Snapshots to object storage allow new instances to start up with the latest data. This approach provides high availability and fast querying of shared control data across many services.
Fahd Siddiqui describes the concept of full consistency lag in eventually consistent databases and how that concept can be leveraged in your own applications.
Anton Moldovan "Building an efficient replication system for thousands of ter...Fwdays
For one of our projects, we needed to improve the current content delivery system for terminals. In this talk, I will share our experience in building an efficient data replication system for thousands of terminals. We will touch on architecture decisions and tradeoffs, technologies that we used, and a bit of load testing.
Spoiler: We didn't use Kafka.
This document discusses cloud native applications and service meshes. It notes the challenges of scaling monolithic applications to the cloud, including issues around redundancy, scheduling, service discovery, and resiliency. It then introduces containers as a way to address these challenges by pushing complexity between services. However, this introduces new challenges around observability, dependencies, failures, traffic flow, and security. The document proposes service meshes as a solution to these challenges, providing features like load balancing, failure handling, auto-scaling, and security across distributed services without requiring code changes. It provides examples using Consul Connect and discusses how service meshes can provide an "immune system" for cloud native applications.
This document summarizes key concepts about congestion control in TCP including:
- TCP uses additive increase multiplicative decrease (AIMD) to dynamically adjust the congestion window size and maintain efficiency and fairness.
- TCP has slow start and congestion avoidance states that govern how the congestion window is adjusted in response to acknowledgements.
- TCP responds to packet loss through fast retransmit, fast recovery, and halving the congestion window size to reduce congestion according to protocols like Tahoe, Reno, and New Reno.
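The AIMD behaviour summarized above is easy to simulate: grow the congestion window linearly on ACKs and halve it on loss. A simplified sketch that treats each event as one window update and ignores slow start:

```python
def aimd(events, initial_cwnd=1.0, additive=1.0):
    # Additive increase on each ACK, multiplicative decrease on each loss.
    cwnd, history = initial_cwnd, []
    for ev in events:
        if ev == "ack":
            cwnd += additive           # probe for more bandwidth
        elif ev == "loss":
            cwnd = max(1.0, cwnd / 2)  # back off on congestion
        history.append(cwnd)
    return history

trace = aimd(["ack", "ack", "ack", "loss", "ack"])
assert trace == [2.0, 3.0, 4.0, 2.0, 3.0]
```

The resulting sawtooth (linear climb, sharp halving) is what lets competing TCP flows converge toward a fair share of the bottleneck.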
This document summarizes key topics related to data link control and protocols. It discusses framing methods like fixed-size and variable-size framing. It also covers flow control, error control, and protocols for both noiseless and noisy channels. Specific protocols described include the Simplest Protocol, Stop-and-Wait Protocol, Stop-and-Wait ARQ, Go-Back-N ARQ, and Selective Repeat ARQ. The document provides details on their design, algorithms, and flow diagrams to illustrate how each protocol handles framing, flow control, and error control.
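The Stop-and-Wait ARQ protocol covered here can be simulated in a few lines: the sender transmits one frame, and if it is lost, a (simulated) timeout triggers a retransmission before the next frame may be sent. A toy model with a deterministic loss pattern:

```python
def stop_and_wait(frames, drop_first_attempt=frozenset({1})):
    """Simulate Stop-and-Wait ARQ: send one frame, wait for its ACK,
    and retransmit on timeout before advancing to the next frame."""
    delivered, transmissions, dropped_once = [], 0, set()
    for seq, frame in enumerate(frames):
        while True:
            transmissions += 1
            if seq in drop_first_attempt and seq not in dropped_once:
                dropped_once.add(seq)  # frame lost; timeout fires, retransmit
                continue
            delivered.append((seq, frame))  # receiver ACKs; sender advances
            break
    return delivered, transmissions

delivered, tx = stop_and_wait(["f0", "f1", "f2"])
assert delivered == [(0, "f0"), (1, "f1"), (2, "f2")]
assert tx == 4  # frame 1 needed one retransmission
```

Go-Back-N and Selective Repeat improve on this by keeping several frames in flight instead of one.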
TCP provides reliable data transfer through several key features:
- It numbers data bytes and uses acknowledgments to ensure all bytes are received correctly. If bytes are lost, they are retransmitted.
- Congestion control algorithms like slow start and congestion avoidance allow TCP to gradually increase data transfer rates while avoiding overwhelming the network.
- Fast retransmit detects lost packets sooner by retransmitting on three duplicate ACKs, while fast recovery resumes data transfer using ACKs still in the pipe.
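Fast retransmit can be sketched as a duplicate-ACK counter: when the same cumulative ACK arrives three more times after the original, the corresponding segment is resent without waiting for a timeout.

```python
def fast_retransmit(acks, dupack_threshold=3):
    """Detect a lost segment from duplicate ACKs: three duplicates of the
    same cumulative ACK trigger retransmission before any timeout expires."""
    retransmitted, dup_count, last_ack = [], 0, None
    for ack in acks:
        if ack == last_ack:
            dup_count += 1
            if dup_count == dupack_threshold:
                retransmitted.append(ack)  # resend the segment the receiver expects
        else:
            last_ack, dup_count = ack, 0
    return retransmitted

# The receiver keeps asking for segment 2, so segment 2 is resent early.
assert fast_retransmit([1, 2, 2, 2, 2, 5]) == [2]
```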
The document discusses applications and simulations of error correction coding (ECC) for multicast file transfer. It provides an overview of different ECC and feedback-based multicast protocols and evaluates their performance based on simulations. Reed-Solomon coding on blocks provided faster decoding times than on entire files, while tornado coding had the fastest decoding but required slightly more packets for reconstruction. Simulations of protocols like MFTP and MFTP/EC using network simulators showed that using ECC like Reed-Muller codes significantly improved performance over regular MFTP.
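The simplest form of erasure coding for multicast is a single XOR parity packet, which lets every receiver rebuild any one lost packet from the survivors, regardless of which packet each receiver happened to lose. A minimal sketch (real protocols use stronger codes such as Reed-Solomon or tornado codes):

```python
from functools import reduce

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def add_parity(packets):
    # Append one XOR parity packet (all packets must be the same length).
    return packets + [reduce(xor_bytes, packets)]

def recover(received):
    # XOR of everything that did arrive reproduces the single missing packet.
    survivors = [p for p in received if p is not None]
    return reduce(xor_bytes, survivors)

block = add_parity([b"abcd", b"efgh", b"ijkl"])
damaged = list(block)
damaged[1] = None  # packet 1 lost in transit
assert recover(damaged) == b"efgh"
```

This is why ECC multicast scales: no receiver needs to tell the sender *which* packet it lost, only that it needs one more.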
This document discusses Reactive Programming and Reactive Streams. It introduces Reactor, a reactive programming framework, and how it addresses issues like latency in microservices architectures. Reactive Streams provide an interoperable way to work with asynchronous data streams in a non-blocking manner. Streams represent sequences of data that can be processed reactively through operators like map and filter.
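Python generators give a compact analogue of this stream model: map and filter stages compose lazily, and nothing executes until a downstream consumer pulls items. This is only an illustration of the idea, not the Reactive Streams API (which adds asynchronous backpressure signalling between publisher and subscriber):

```python
def stream_map(fn, stream):
    # Apply fn lazily to each element as it is pulled downstream.
    for item in stream:
        yield fn(item)

def stream_filter(pred, stream):
    # Pass through only the elements matching pred, again lazily.
    for item in stream:
        if pred(item):
            yield item

# Nothing runs until the consumer pulls; demand flows up, data flows down.
source = iter(range(10))
pipeline = stream_map(lambda x: x * x, stream_filter(lambda x: x % 2 == 0, source))
squares = list(pipeline)
assert squares == [0, 4, 16, 36, 64]
```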
Deep Dive: AWS X-Ray London Summit 2017Randall Hunt
Instrument production applications (both in AWS and on prem) with x-ray to collect live telemetry and latency metrics on your applications. You can also use it to debug live!
How to move a mission critical system to 4 AWS regions in one year?Wojciech Gawroński
A year ago our team was challenged to enhance the scope and scale of an existing platform that provides significant revenue for our client. As the designers and maintainers of that solution, we decided to leverage the AWS cloud during that transition. In this presentation, I would like to discuss how we tackled that migration - with the assumption that we had to move in a resource-limited, hybrid cloud environment - working in close cooperation with teams responsible for other parts of the system. As I stated previously, it was a challenge, and I would like to talk about the problems we solved during that process, as well as the services we leveraged to smooth the transition. And last, but not least, I would like to present how we maintained the delivery pipeline, automation and a massive pile of CloudFormation templates, and why AWS Lambda is an excellent glue for any operational work you have to do in the cloud. Our hard work paid off: in October 2017 we deployed our system into a 4th AWS region. Bear with me during the talk, and you will learn how we achieved that.
Lessons learned migrating 100+ services to KubernetesJose Galarza
TransferWise migrated over 150 services from bare metal servers to Kubernetes in AWS over the course of a year. They developed tools and processes to automate cluster creation with Terraform, service deployment through custom manifest generation from YAML definitions, and a network mesh with Envoy to enable service-to-service communication. Some lessons learned included not exposing Kubernetes complexity, choosing the right CNI plugin, and installing node local DNS to reduce errors. Future plans include using EKS for simplified upgrades, enabling developer environments, and implementing network policies and advanced deployment pipelines.
The document provides an overview of asynchronous processing and how it relates to scalability and performance. It discusses key topics like sync vs async, scheduling, latency measurement, concurrent vs lock-free vs wait-free data structures, I/O models like IO, AIO, NIO, zero-copy, and sorting algorithms. It emphasizes picking the right tools for the job and properly benchmarking and measuring performance.
The data link layer transforms the physical layer into a link responsible for node-to-node communication. It provides framing, addressing, error control, and flow control. Specific responsibilities include grouping bits into frames, adding addressing and error detection through checksums, and preventing fast senders from overwhelming slow receivers through flow control. Data link protocols must provide well-defined interfaces, handle transmission errors, and regulate data flow. They offer services like unacknowledged connectionless, acknowledged connectionless, and acknowledged connection-oriented to transfer data reliably between nodes.
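Variable-size framing is commonly implemented with byte stuffing: reserve a flag byte to mark frame boundaries, and escape any occurrence of the flag (or of the escape byte itself) inside the payload so the receiver can always find where a frame ends. A sketch using HDLC-style constants:

```python
FLAG, ESC = 0x7E, 0x7D

def frame(payload):
    # Escape flag/escape bytes in the payload, then wrap it in flag bytes.
    body = bytearray()
    for b in payload:
        if b in (FLAG, ESC):
            body += bytes([ESC, b ^ 0x20])
        else:
            body.append(b)
    return bytes([FLAG]) + bytes(body) + bytes([FLAG])

def deframe(wire):
    # Strip the flags and undo the escaping to recover the payload.
    assert wire[0] == FLAG and wire[-1] == FLAG
    out, i, body = bytearray(), 0, wire[1:-1]
    while i < len(body):
        if body[i] == ESC:
            out.append(body[i + 1] ^ 0x20)
            i += 2
        else:
            out.append(body[i])
            i += 1
    return bytes(out)

msg = bytes([0x01, 0x7E, 0x02, 0x7D])  # payload containing reserved bytes
assert deframe(frame(msg)) == msg
```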
Similar to Building & Operating High-Fidelity Data Streams - QCon Plus 2021 (20)
PayPal has seen tremendous growth in recent years, processing over 7.8 billion payments transactions annually for over 227 million active customer accounts across 200+ markets and currencies. To support this scale, PayPal's data infrastructure includes over 2,000 database instances, 116 billion database calls per day, and over 74 petabytes of total storage. PayPal continues enhancing its data infrastructure to meet growing analytics and machine learning needs through technologies like Kafka, Hadoop, graph databases and real-time OLAP engines.
Building Better Data Pipelines using Apache AirflowSid Anand
Apache Airflow is a platform for authoring, scheduling, and monitoring workflows or directed acyclic graphs (DAGs). It allows users to programmatically author DAGs in Python without needing to bundle many XML files. The UI provides a tree view to see DAG runs over time and Gantt charts to see performance trends. Airflow is useful for ETL pipelines, machine learning workflows, and general job scheduling. It handles task dependencies and failures, monitors performance, and enforces service level agreements. Behind the scenes, the scheduler distributes tasks from the metadata database to Celery workers via RabbitMQ.
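At its core, a DAG scheduler only runs a task once all of its upstream dependencies have completed. This toy topological-ordering sketch illustrates that dependency resolution; it is not Airflow's API (a real DAG would be declared with Airflow operators), and the task names are invented for illustration:

```python
def topo_order(dag):
    """dag: {task: [upstream tasks it depends on]}.
    Returns an execution order in which every task runs after its
    dependencies, mirroring how a DAG scheduler picks runnable tasks."""
    order, done = [], set()

    def visit(task, path=()):
        if task in done:
            return
        assert task not in path, "cycle detected - not a DAG"
        for dep in dag.get(task, []):
            visit(dep, path + (task,))
        done.add(task)
        order.append(task)

    for task in dag:
        visit(task)
    return order

etl = {"extract": [], "transform": ["extract"], "load": ["transform"],
       "report": ["load", "transform"]}
order = topo_order(etl)
assert order.index("extract") < order.index("transform") < order.index("load")
```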
Cloud Native Predictive Data Pipelines (micro talk)Sid Anand
This document summarizes Sid Anand's microtalk on cloud native predictive data pipelines at a PayPal risk infrastructure all-hands meeting in January 2018. It discusses Agari's approach to detecting spear phishing emails in near real-time by intercepting emails and sending metadata to AWS cloud services for trust modeling and scoring, then returning signals to quarantine, label, or pass through emails. The architecture uses microservices, decoupled services, immutable services, polyglot persistence leveraging various AWS services, and focuses on building trust models for scoring emails in near real-time and batch processing.
Cloud Native Data Pipelines (GoTo Chicago 2017)Sid Anand
Cloud Native Data Pipelines
The document discusses building data pipelines in a cloud native way using open source technologies and cloud native techniques. It describes a message scoring use case at Agari where data is ingested from multiple enterprises into S3 and then processed through a Spark job on EMR hourly. The results are written to S3 and trigger downstream processing. Design goals for resilient data pipelines include operability, correctness, timeliness and cost. Techniques discussed to achieve these goals include using Apache Airflow for workflow management, auto scaling groups, and leveraging serverless technologies where possible.
Cloud Native Data Pipelines (DataEngConf SF 2017)Sid Anand
This document discusses cloud native data pipelines. It begins by introducing the speaker and their company, Agari, which applies trust models to email metadata to score messages. The document then discusses design goals for resilient data pipelines, including operability, correctness, timeliness and cost. It presents two use cases at Agari: batch message scoring and near real-time message scoring. For each use case, the pipeline architecture is shown including components like S3, SNS, SQS, ASGs, EMR and databases. The document discusses leveraging AWS services and tools like Airflow, Packer and Terraform to tackle issues like cost, timeliness, operability and correctness. It also introduces innovations like Apache Avro for
Cloud Native Data Pipelines (in Eng & Japanese) - QCon TokyoSid Anand
Slides from "Cloud Native Data Pipelines" talk given @ QCon Tokyo 2016. The slides are in both English and Japanese. Thanks to Kiro Harada (https://jp.linkedin.com/in/haradakiro) for the translation.
Cloud Native Data Pipelines (QCon Shanghai & Tokyo 2016)Sid Anand
This document discusses cloud native data pipelines. It begins by describing the speaker and their work experience. Then, it outlines some key qualities of resilient data pipelines like operability, correctness, timeliness and cost. Two use cases at the speaker's company for applying trust models to messages are presented - one using batch processing and the other using near real-time processing. The document discusses how tools like Apache Airflow, auto-scaling groups, Amazon Kinesis and Avro can help achieve those qualities for data pipelines in the cloud.
Introduction to Apache Airflow - Data Day Seattle 2016Sid Anand
Apache Airflow is a platform for authoring, scheduling, and monitoring workflows or directed acyclic graphs (DAGs) of tasks. It includes a DAG scheduler, web UI, and CLI. Airflow allows users to author DAGs in Python without needing to bundle many XML files. The UI provides tree and Gantt chart views to monitor DAG runs over time. Airflow was accepted into the Apache Incubator in 2016 and has over 300 users from 40+ companies. Agari uses Airflow to orchestrate message scoring pipelines across AWS services like S3, Spark, SQS, and databases to enforce SLAs on correctness and timeliness. Areas for further improvement include security, APIs, execution scaling, and on
Agari uses Apache Airflow to automate and orchestrate their data pipelines. They have two main classes of orchestration - operational automation and building new products. One use case described is message scoring, where Airflow manages a batch pipeline to score messages from multiple enterprises on S3, run Spark jobs to score the messages, write outputs to S3/DB, and ingest the results. Airflow allows them to monitor SLAs for correctness and timeliness and integrate with monitoring tools to alert on SLA misses. They operate Airflow in AWS across multiple environments for security, fault tolerance and production deployments.
Resilient Predictive Data Pipelines (GOTO Chicago 2016)Sid Anand
Sid Anand presented on building resilient predictive data pipelines. The key challenges discussed were the "blast radius" problem where bugs can affect downstream jobs and data, and ensuring timeliness as pipelines and algorithms change over time. Design goals for resilient pipelines include operability, correctness, timeliness and cost. AWS services like SQS, SNS, EMR and auto scaling groups were presented as ways to address these goals by enabling quick recoverability, pay as you go costs, and auto scaling to improve timeliness. Apache Airflow was also discussed as a way to provide workflow automation, scheduling, performance insights and integration with monitoring tools to improve operability and correctness.
Resilient Predictive Data Pipelines (QCon London 2016)Sid Anand
This document discusses building resilient predictive data pipelines. It begins by distinguishing between ETL and predictive data pipelines, noting that predictive pipelines require high availability with downtimes of less than an hour. The document then outlines design goals for resilient data pipelines, including being scalable, available, instrumented/monitored/alert-enabled, and quickly recoverable. It proposes using AWS services like SQS, SNS, S3, and Auto Scaling Groups to build such pipelines. The document also recommends using Apache Airflow for workflow automation and scheduling to reliably manage pipelines as directed acyclic graphs. It presents an architecture using these techniques and assesses how well it meets the outlined design goals.
Software Developer and Architecture @ LinkedIn (QCon SF 2014)Sid Anand
The document provides details about Sid Anand's career and then discusses LinkedIn's software development process and architecture when he was there. It notes that when Sid started at LinkedIn in 2011, compiling the code took a long time due to the large codebase and many dependencies. It then describes how LinkedIn scaled to support hundreds of millions of members and thousands of employees by splitting the monolithic codebase into individual Git repos, using intermediate JARs to reduce dependencies, and connecting development machines to test environments instead of deploying everything locally. It also discusses LinkedIn's use of Kafka, search federation, and not making web service calls between data centers to scale across multiple data centers.
Building a Modern Website for Scale (QCon NY 2013)Sid Anand
LinkedIn uses several technologies to scale its services and infrastructure to support over 200 million members. It uses a dynamic discovery and client-side load balancing approach for its web services to improve fault tolerance. The presentation tier is composed of various front-end frameworks while business logic is encapsulated in services. LinkedIn's databases Espresso and Oracle are scaled using techniques like data replication, read replicas and change data capture via Databus. Databus provides a consistent, real-time stream of database changes to power services like search, recommendations and standardization. Messaging is handled using Apache Kafka which provides pub-sub streaming capabilities.
Maven is a build system that provides:
1) A standard directory structure for projects;
2) A standard build lifecycle of phases like compile, test, package; and
3) The ability to override defaults through plugins.
The document then discusses several key aspects of Maven including:
1) The standard Maven directory structure for source code, resources, and test code;
2) The default Maven lifecycle phases like compile, test, package; and
3) The pom.xml file which is Maven's build specification and configuration file.
Git is a version control system that stores snapshots of files rather than tracking changes between file versions. It allows for offline work and nearly all operations are performed locally. Files can exist in three states - committed, modified, or staged. Commits create snapshots of the staged files. Branches act as pointers to commits, with the default branch being master.
LinkedIn Data Infrastructure Slides (Version 2)Sid Anand
Learn about Espresso, Databus, and Voldemort. LinkedIn Data Infrastructure Slides (Version 2). This talk was given in NYC on June 20, 2012
You can download the slides as PPT in order to see the transitions here:
http://bit.ly/LfH6Ru
LinkedIn Data Infrastructure (QCon London 2012)Sid Anand
LinkedIn uses DataBus to replicate data changes from Oracle in real-time. DataBus consists of relay and bootstrap services that capture changes from Oracle and distribute them to various services like search indexes, graphs, and read replicas to keep them updated in real-time. This allows users to immediately see profile updates or new connections in search results and feeds.
This is a talk about Netflix's path to Cassandra. The first few slides may look similar to previous presentations, but they are just to set the context. Most the content is brand new!
UI5con 2024 - Keynote: Latest News about UI5 and it’s EcosystemPeter Muessig
Learn about the latest innovations in and around OpenUI5/SAPUI5: UI5 Tooling, UI5 linter, UI5 Web Components, Web Components Integration, UI5 2.x, UI5 GenAI.
Recording:
https://www.youtube.com/live/MSdGLG2zLy8?si=INxBHTqkwHhxV5Ta&t=0
E-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian CompaniesQuickdice ERP
Explore the seamless transition to e-invoicing with this comprehensive guide tailored for Saudi Arabian businesses. Navigate the process effortlessly with step-by-step instructions designed to streamline implementation and enhance efficiency.
Using Query Store in Azure PostgreSQL to Understand Query PerformanceGrant Fritchey
Microsoft has added an excellent new extension in PostgreSQL on their Azure Platform. This session, presented at Posette 2024, covers what Query Store is and the types of information you can get out of it.
DDS Security Version 1.2 was adopted in 2024. This revision strengthens support for long-running systems, adding new cryptographic algorithms, certificate revocation, and hardening against DoS attacks.
Transform Your Communication with Cloud-Based IVR SolutionsTheSMSPoint
Discover the power of Cloud-Based IVR Solutions to streamline communication processes. Embrace scalability and cost-efficiency while enhancing customer experiences with features like automated call routing and voice recognition. Accessible from anywhere, these solutions integrate seamlessly with existing systems, providing real-time analytics for continuous improvement. Revolutionize your communication strategy today with Cloud-Based IVR Solutions. Learn more at: https://thesmspoint.com/channel/cloud-telephony
Microservice Teams - How the cloud changes the way we workSven Peters
17. Why Are Streams Hard?
In streaming architectures, any gaps in non-functional requirements can be unforgiving
You end up spending a lot of your time fighting fires & keeping systems up
If you don’t build your systems with the -ilities as first class citizens, you pay an operational tax
18. Why Are Streams Hard?
In streaming architectures, any gaps in non-functional requirements can be unforgiving
You end up spending a lot of your time fighting fires & keeping systems up
If you don’t build your systems with the -ilities as first class citizens, you pay an operational tax
… and this translates to unhappy customers and burnt-out team members!
19. Why Are Streams Hard?
In streaming architectures, any gaps in non-functional requirements can be unforgiving
You end up spending a lot of your time fighting fires & keeping systems up
If you don’t build your systems with the -ilities as first class citizens, you pay an operational tax
… and this translates to unhappy customers and burnt-out team members!
In this talk, we will focus on building high-fidelity streams from the ground up!
21. Start Simple
Goal : Build a system that can deliver messages from source S to destination D
[diagram: S → D]
22. Start Simple
Goal : Build a system that can deliver messages from source S to destination D
But first, let’s decouple S and D by putting messaging infrastructure between them
[diagram: S → E (Events topic) → D]
23. Start Simple
Make a few more implementation decisions about this system
[diagram: S → E → D]
24. Start Simple
Make a few more implementation decisions about this system
Run our system on a cloud platform (e.g. AWS)
25. Start Simple
Make a few more implementation decisions about this system
Run our system on a cloud platform (e.g. AWS)
Operate at low scale
26. Start Simple
Make a few more implementation decisions about this system
Run our system on a cloud platform (e.g. AWS)
Operate at low scale
Kafka with a single partition
27. Start Simple
Make a few more implementation decisions about this system
Run our system on a cloud platform (e.g. AWS)
Operate at low scale
Kafka with a single partition
Kafka across 3 brokers split across AZs with RF=3 (min in-sync replicas = 2)
28. Start Simple
Make a few more implementation decisions about this system
Run our system on a cloud platform (e.g. AWS)
Operate at low scale
Kafka with a single partition
Kafka across 3 brokers split across AZs with RF=3 (min in-sync replicas = 2)
Run S & D on single, separate EC2 instances
29. Start Simple
To make things a bit more interesting, let’s provide our stream as a service
We define our system boundary using a blue box as shown below!
[diagram: S → E → D enclosed in the service boundary]
32. Reliability
Goal : Build a system that can deliver messages reliably from S to D
Concrete Goal : 0 message loss
[diagram: S → E → D]
33. Reliability
Goal : Build a system that can deliver messages reliably from S to D
Concrete Goal : 0 message loss
Once S has ACKd a message to a remote sender, D must deliver that message to a remote receiver
37. Reliability
[diagram: A → B → C carrying message m1]
In order to make this system reliable
Treat the messaging system like a chain — it’s only as strong as its weakest link
38. Reliability
In order to make this system reliable
Treat the messaging system like a chain — it’s only as strong as its weakest link
Insight : If each process/link is transactional in nature, the chain will be transactional!
39. Reliability
In order to make this system reliable
Treat the messaging system like a chain — it’s only as strong as its weakest link
Insight : If each process/link is transactional in nature, the chain will be transactional!
Transactionality = At-least-once delivery
40. Reliability
In order to make this system reliable
Treat the messaging system like a chain — it’s only as strong as its weakest link
Insight : If each process/link is transactional in nature, the chain will be transactional!
Transactionality = At-least-once delivery
How do we make each link transactional?
45. Reliability
But, how do we handle edge nodes A & C?
[diagram: A → B → C carrying message m1]
What does A need to do?
• Receive a request (e.g. REST)
• Do some processing
• Reliably send data to Kafka
  • kProducer.send(topic, message)
  • kProducer.flush()
  • Producer config : acks = all
• Send HTTP response to caller
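The ordering above (process, publish with full acks, only then respond) is what makes A a transactional link. A minimal Python sketch of that ordering, with a hypothetical in-memory broker standing in for Kafka — `StubBroker` and `handle_request` are illustrative names, not part of any real client library:

```python
class StubBroker:
    """Stands in for a Kafka topic; publish() returns only once the write
    is durable (modeling kProducer.send + flush with acks=all)."""
    def __init__(self):
        self.log = []
        self.healthy = True

    def publish(self, message):
        if not self.healthy:
            raise ConnectionError("broker unavailable")
        self.log.append(message)

def handle_request(broker, payload):
    """A's request handler: ACK the caller (HTTP 200) only after the broker
    has confirmed the write; otherwise return an error (HTTP 503) so the
    remote sender knows to retry. We never ACK a message we might lose."""
    processed = payload.upper()      # "do some processing" (illustrative)
    try:
        broker.publish(processed)    # send + flush; blocks until acked
    except ConnectionError:
        return 503
    return 200
```

The key design point is simply the order of operations: the HTTP response is the last step, so a crash before the broker ack means the sender retries rather than losing m1.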
46. Reliability
But, how do we handle edge nodes A & C?
[diagram: A → B → C carrying message m1]
What does C need to do?
• Read data (a batch) from Kafka
• Do some processing
• Reliably send data out
• ACK / NACK Kafka
  • Consumer config : enable.auto.commit = false
  • ACK moves the read checkpoint forward
  • NACK forces a reread of the same data
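C's read-process-ACK loop can be sketched the same way, with a plain list standing in for the Kafka partition and the returned offset standing in for the manually committed checkpoint (enable.auto.commit = false). Names here are illustrative:

```python
def consume_batch(log, offset, batch_size, send):
    """Read one batch starting at `offset`; return the new committed offset.
    Committing only after a successful send gives at-least-once delivery:
    a failed send returns the old offset (NACK), forcing a reread."""
    batch = log[offset:offset + batch_size]
    if not batch:
        return offset                 # nothing new to read
    try:
        send(batch)                   # reliably send data out (e.g. to a sink)
    except Exception:
        return offset                 # NACK: do not move the checkpoint
    return offset + len(batch)        # ACK: checkpoint moves forward
```

A failed batch is simply reread on the next loop iteration, which is exactly why downstream consumers must tolerate duplicates under at-least-once delivery.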
47. Reliability
But, how do we handle edge nodes A & C?
[diagram: A → B → C carrying message m1]
B is a combination of A and C
48. Reliability
B is a combination of A and C
B needs to act like a reliable Kafka Producer
49. Reliability
B is a combination of A and C
B needs to act like a reliable Kafka Producer
B needs to act like a reliable Kafka Consumer
50. Reliability
[diagram: A → B → C]
How reliable is our system now?
52. Reliability
How reliable is our system now?
What happens if a process crashes?
If A crashes, we will have a complete outage at ingestion!
53. Reliability
How reliable is our system now?
What happens if a process crashes?
If A crashes, we will have a complete outage at ingestion!
If C crashes, we will stop delivering messages to external consumers!
54. Reliability
Solution : Place each service in an autoscaling group of size T
[diagram: A, B, and C each in an autoscaling group; each group tolerates T-1 concurrent failures]
55. Reliability
Solution : Place each service in an autoscaling group of size T
[diagram: A, B, and C each in an autoscaling group; each group tolerates T-1 concurrent failures]
For now, we appear to have a pretty reliable data stream
58. Lag : What is it?
Lag is simply a measure of message delay in a system
59. Lag : What is it?
Lag is simply a measure of message delay in a system
The longer a message takes to transit a system, the greater its lag
60. Lag : What is it?
Lag is simply a measure of message delay in a system
The longer a message takes to transit a system, the greater its lag
The greater the lag, the greater the impact to the business
61. Lag : What is it?
Lag is simply a measure of message delay in a system
The longer a message takes to transit a system, the greater its lag
The greater the lag, the greater the impact to the business
Hence, our goal is to minimize lag in order to deliver insights as quickly as possible
63. Lag : How do we compute it?
eventTime : the creation time of an event message
Lag can be calculated for any message m1 at any node N in the system as
lag(m1, N) = current_time(m1, N) - eventTime(m1)
[diagram: A → B → C carrying m1; eventTime: T0]
64. Lag : How do we compute it?
[diagram: m1 enters at A; eventTime T0 = 12:00p]
65. Lag : How do we compute it?
[diagram: m1 reaches A at T1 = 12:01p]
66. Lag : How do we compute it?
[diagram: m1 reaches B at T3 = 12:04p]
67. Lag : How do we compute it?
[diagram: m1 reaches C at T5 = 12:10p]
68. Lag : How do we compute it?
lag(m1, A) = T1 - T0 = 1m
lag(m1, B) = T3 - T0 = 4m
lag(m1, C) = T5 - T0 = 10m
69. Lag : How do we compute it?
Arrival Lag (Lag-in) : time message arrives - eventTime
Lag-in @
A = T1 - T0 (e.g. 1 ms)
B = T3 - T0 (e.g. 3 ms)
C = T5 - T0 (e.g. 8 ms)
70. Lag : How do we compute it?
Arrival Lag (Lag-in) : time message arrives - eventTime
Lag-in @
A = T1 - T0 (e.g. 1 ms)
B = T3 - T0 (e.g. 3 ms)
C = T5 - T0 (e.g. 8 ms)
Observation : Lag is cumulative
71. Lag : How do we compute it?
Arrival Lag (Lag-in) : time message arrives - eventTime (arrivals at T1, T3, T5)
Departure Lag (Lag-out) : time message leaves - eventTime (departures at T2, T4, T6)
Lag-out @
A = T2 - T0 (e.g. 2 ms)
B = T4 - T0 (e.g. 4 ms)
C = T6 - T0 (e.g. 10 ms)
72. Lag : How do we compute it?
Arrival Lag (Lag-in) : time message arrives - eventTime (arrivals at T1, T3, T5)
Departure Lag (Lag-out) : time message leaves - eventTime (departures at T2, T4, T6)
Lag-out @
A = T2 - T0 (e.g. 2 ms)
B = T4 - T0 (e.g. 4 ms)
C = T6 - T0 (e.g. 10 ms)
The most important metric for lag in any streaming system is E2E Lag : the total time a message spent in the system
73. Lag : How do we compute it?
Arrival Lag (Lag-in) : time message arrives - eventTime (arrivals at T1, T3, T5)
Departure Lag (Lag-out) : time message leaves - eventTime (departures at T2, T4, T6)
Lag-out @
A = T2 - T0 (e.g. 2 ms)
B = T4 - T0 (e.g. 4 ms)
C = T6 - T0 (e.g. 10 ms)
The most important metric for lag in any streaming system is E2E Lag : the total time a message spent in the system
[diagram: E2E Lag spans T0 → T6]
74. Lag : How do we compute it?
While it is interesting to know the lag for a particular message m1, it is of little use
since we typically deal with millions of messages
75. Lag : How do we compute it?
While it is interesting to know the lag for a particular message m1, it is of little use
since we typically deal with millions of messages
Instead, we prefer statistics (e.g. P95) to capture population behavior
76. Lag : How do we compute it?
Some useful Lag statistics are:
E2E Lag (p95) : 95th percentile time of messages spent in the system
Lag_[in|out](N, p95): P95 Lag_in or Lag_out at any Node N
77. Lag : How do we compute it?
Some useful Lag statistics are:
E2E Lag (p95) : 95th percentile time of messages spent in the system
Lag_[in|out](N, p95): P95 Lag_in or Lag_out at any Node N
Process_Duration(N, p95) : Time Spent at any node in the chain!
Lag_out(N, p95) - Lag_in(N, p95)
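These statistics fall straight out of per-message timestamps. A small sketch (nearest-rank p95, timestamps in seconds; a production pipeline would use a metrics library's histograms rather than sorting raw samples):

```python
def p95(values):
    """Nearest-rank 95th percentile of a non-empty list."""
    ordered = sorted(values)
    rank = max(0, int(round(0.95 * len(ordered))) - 1)
    return ordered[rank]

def lag_stats(records):
    """records: dicts with eventTime, arrive, depart for one node N.
    Returns P95 Lag-in, P95 Lag-out, and Process_Duration at N,
    computed as Lag_out(N, p95) - Lag_in(N, p95)."""
    lag_in = [r["arrive"] - r["eventTime"] for r in records]
    lag_out = [r["depart"] - r["eventTime"] for r in records]
    return {
        "lag_in_p95": p95(lag_in),
        "lag_out_p95": p95(lag_out),
        "process_p95": p95(lag_out) - p95(lag_in),
    }
```
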
80. Loss : What is it?
Loss is simply a measure of messages lost while transiting the system
81. Loss : What is it?
Loss is simply a measure of messages lost while transiting the system
Messages can be lost for various reasons, most of which we can mitigate!
82. Loss : What is it?
Loss is simply a measure of messages lost while transiting the system
Messages can be lost for various reasons, most of which we can mitigate!
The greater the loss, the lower the data quality
83. Loss : What is it?
Loss is simply a measure of messages lost while transiting the system
Messages can be lost for various reasons, most of which we can mitigate!
The greater the loss, the lower the data quality
Hence, our goal is to minimize loss in order to deliver high quality insights
85. Loss : How do we compute it?
Concepts : Loss
Loss can be computed as the set difference of messages between any 2 points in the system
[diagram: message sets at successive checkpoints in the pipeline]
86. Loss : How do we compute it?
(1 = message seen at that checkpoint; C1–C4 are successive checkpoints in the pipeline)
Message Id         C1   C2   C3   C4   E2E Loss   E2E Loss %
m1                  1    1    1    1
m2                  1    1    1    1
m3                  1    0    0    0
…                   …    …    …    …
m10                 1    1    0    0
Count              10    9    7    5
Per Node Loss(N)    0    1    2    2       5          50%
87. Loss : How do we compute it?
In a streaming data system, messages never stop flowing. So, how do we know when to count?
[diagram: a continuous stream of messages]
88. Loss : How do we compute it?
In a streaming data system, messages never stop flowing. So, how do we know when to count?
Solution
Allocate messages to 1-minute wide time buckets using message eventTime
[diagram: the stream of messages allocated to time buckets]
89. Loss : How do we compute it?
Loss table for the @12:34p bucket:
Message Id         C1   C2   C3   C4   E2E Loss   E2E Loss %
m1                  1    1    1    1
m2                  1    1    1    1
m3                  1    0    0    0
…                   …    …    …    …
m10                 1    1    0    0
Count              10    9    7    5
Per Node Loss(N)    0    1    2    2       5          50%
90. Loss : How do we compute it?
[diagram: per-minute buckets @12:34p @12:35p @12:36p @12:37p @12:38p @12:39p @12:40p]
91. Loss : How do we compute it?
[diagram: the same buckets, with a "Now" marker at the newest bucket]
92. Loss : How do we compute it?
[diagram: buckets near "Now" are updated with late-arrival data]
93. Loss : How do we compute it?
[diagram: loss is computed for buckets old enough to be stable]
94. Loss : How do we compute it?
[diagram: the oldest buckets age out]
95. Loss : How do we compute it?
In a streaming data system, messages never stop flowing. So, how do we know when to count?
Solution
Allocate messages to 1-minute wide time buckets using message eventTime
Wait a few minutes for messages to transit, then compute loss (e.g. the 12:35p loss table)
Raise alarms if loss occurs over a configured threshold (e.g. > 1%)
[diagram: the stream of messages allocated to time buckets]
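The bucketed loss computation reduces to a set difference per closed bucket. A sketch with illustrative function names and the example 1% threshold:

```python
def bucket_of(event_time_s):
    """1-minute-wide bucket key for an eventTime in epoch seconds."""
    return event_time_s // 60

def bucket_loss(ingest_ids, egest_ids):
    """Loss for one closed bucket: message IDs seen at ingest (e.g. node A)
    that were never seen at egest (e.g. node C)."""
    lost = set(ingest_ids) - set(egest_ids)
    loss_pct = 100.0 * len(lost) / max(1, len(set(ingest_ids)))
    return lost, loss_pct

def should_alarm(loss_pct, threshold_pct=1.0):
    """Raise an alarm when loss exceeds the configured threshold."""
    return loss_pct > threshold_pct
```

Because buckets are keyed by eventTime rather than arrival time, late arrivals land in the right bucket; waiting a few minutes before closing a bucket is what keeps them from being falsely counted as loss.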
96. Loss : How do we compute it?
We now have a way to measure the reliability (via Loss metrics) and latency (via Lag metrics) of our system.
But wait…
98. Performance
Goal : Build a system that can deliver messages reliably from S to D with low latency
[diagram: many sources S feeding destination D]
To understand streaming system performance, let’s understand the components of E2E Lag
101. Performance
[diagram: Ingest Time at S; Expel Time at D]
Expel Time : Time to process and egest a message at D.
102. Performance
[diagram: E2E Lag = Ingest Time + Transit Time + Expel Time]
E2E Lag : Total time messages spend in the system from message ingest to expel!
104. Performance
Challenge 1 : Ingest Penalty
In the name of reliability, S needs to call kProducer.flush() on every inbound API request
S also needs to wait for 3 ACKs from Kafka before sending its API response
[diagram: E2E Lag = Ingest Time + Transit Time + Expel Time]
105. Performance
Challenge 1 : Ingest Penalty
Approach : Amortization
Support batch APIs (i.e. multiple messages per web request) to amortize the ingest penalty
[diagram: E2E Lag = Ingest Time + Transit Time + Expel Time]
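The effect of batching is easy to see with back-of-the-envelope arithmetic: the flush + acks round trip is a fixed cost paid once per request, so batching N messages per request divides it by N. Numbers below are purely illustrative:

```python
def per_message_ingest_ms(batch_size, ack_overhead_ms=30.0, per_msg_ms=0.1):
    """Average ingest cost per message: the fixed flush/ack round trip is
    amortized across the batch; per-message processing cost is not."""
    return ack_overhead_ms / batch_size + per_msg_ms
```

With these assumed costs, a batch of 100 cuts per-message ingest cost from ~30 ms to well under 1 ms, which is why batch APIs are the standard answer to the ingest penalty.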
106. Performance
Challenge 2 : Expel Penalty
Observations
Kafka is very fast — many orders of magnitude faster than HTTP RTTs
The majority of the expel time is the HTTP RTT
[diagram: E2E Lag = Ingest Time + Transit Time + Expel Time]
107. Performance
Challenge 2 : Expel Penalty
Approach : Amortization
In each D node, add batch + parallelism
[diagram: E2E Lag = Ingest Time + Transit Time + Expel Time]
108. Performance
Challenge 3 : Retry Penalty (@ D)
Concepts
In order to run a zero-loss pipeline, we need to retry messages @ D that will
succeed given enough attempts
109. Performance
Challenge 3 : Retry Penalty (@ D)
Concepts
In order to run a zero-loss pipeline, we need to retry messages @ D that will
succeed given enough attempts
We call these Recoverable Failures
110. Performance
Challenge 3 : Retry Penalty (@ D)
Concepts
In order to run a zero-loss pipeline, we need to retry messages @ D that will
succeed given enough attempts
We call these Recoverable Failures
In contrast, we should never retry a message that has 0 chance of success!
We call these Non-Recoverable Failures
111. Performance
Challenge 3 : Retry Penalty (@ D)
Concepts
In order to run a zero-loss pipeline, we need to retry messages @ D that will
succeed given enough attempts
We call these Recoverable Failures
In contrast, we should never retry a message that has 0 chance of success!
We call these Non-Recoverable Failures
E.g. Any 4xx HTTP response code, except for 429 (Too Many Requests)
112. Performance
Challenge 3 : Retry Penalty
Approach
We pay a latency penalty on retry, so we need to be smart about
What we retry — Don’t retry any non-recoverable failures
How we retry
113. Performance
Challenge 3 : Retry Penalty
Approach
We pay a latency penalty on retry, so we need to be smart about
What we retry — Don’t retry any non-recoverable failures
How we retry — One Idea : Tiered Retries
114. Performance - Tiered Retries
Local Retries
Try to send the message a configurable number of times @ D
Global Retries
115. Performance - Tiered Retries
Local Retries
Try to send the message a configurable number of times @ D
If we exhaust local retries, D transfers the message to a Global Retrier
Global Retries
116. Performance - Tiered Retries
Local Retries
Try to send the message a configurable number of times @ D
If we exhaust local retries, D transfers the message to a Global Retrier
Global Retries
The Global Retrier then retries the message over a longer span of time
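Putting the two ideas together — never retry non-recoverable failures, retry locally a few times, then hand off — can be sketched as follows. `send` stands in for D's HTTP call and returns a status code; names and the attempt count are illustrative:

```python
def is_recoverable(status):
    """Retry 5xx and 429; never retry other 4xx (0 chance of success)."""
    return status >= 500 or status == 429

def deliver(message, send, local_attempts=3, global_queue=None):
    """Tiered delivery at D. Returns 'delivered', 'dropped' for a
    non-recoverable failure, or 'global-retry' after handing the message
    to the Global Retrier's queue."""
    for _ in range(local_attempts):
        status = send(message)
        if status < 300:
            return "delivered"
        if not is_recoverable(status):
            return "dropped"             # e.g. 400 Bad Request: never retry
    if global_queue is not None:
        global_queue.append(message)     # hand off to the Global Retrier
    return "global-retry"
```

Handing exhausted messages to a separate retrier keeps slow retries off D's hot path, so one struggling endpoint does not inflate E2E Lag for everything else.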
121. Scalability
First, let’s dispel a myth!
Each system is traffic-rated
The traffic rating comes from running load tests
There is no such thing as a system that can handle infinite scale
122. Scalability
First, let’s dispel a myth!
Each system is traffic-rated
The traffic rating comes from running load tests
There is no such thing as a system that can handle infinite scale
We only achieve higher scale by iteratively running load tests & removing bottlenecks
123. Scalability - Autoscaling
Autoscaling Goals (for data streams):
Goal 1: Automatically scale out to maintain low latency (e.g. E2E Lag)
Goal 2: Automatically scale in to minimize cost
125. Scalability - Autoscaling
Autoscaling Goals (for data streams):
Goal 1: Automatically scale out to maintain low latency (e.g. E2E Lag)
Goal 2: Automatically scale in to minimize cost
Autoscaling Considerations
What can autoscale? What can’t autoscale?
127. Scalability - Autoscaling EC2
The most important part of autoscaling is picking the right metric to trigger
autoscaling actions
128. Scalability - Autoscaling EC2
Pick a metric that
Preserves low latency
Goes up as traffic increases
Goes down as the microservice scales out
129. Scalability - Autoscaling EC2
Pick a metric that
Preserves low latency
Goes up as traffic increases
Goes down as the microservice scales out
E.g. Average CPU
130. Scalability - Autoscaling EC2
Pick a metric that
Preserves low latency
Goes up as traffic increases
Goes down as the microservice scales out
E.g. Average CPU
What to be wary of
Any locks/code synchronization & IO waits
Otherwise … as traffic increases, CPU will plateau, auto-scale-out will stop, and latency (i.e. E2E Lag) will increase
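A scale-out trigger on average CPU amounts to a simple threshold rule (in practice an AWS target-tracking policy does this for you; thresholds below are illustrative). This is also where the lock/IO-wait caveat bites: if CPU plateaus below the scale-out threshold while lag grows, a rule like this never fires:

```python
def scaling_decision(avg_cpu_pct, scale_out_above=60.0, scale_in_below=30.0):
    """Return +1 (scale out), -1 (scale in), or 0 (hold) based on the
    group's average CPU. Thresholds are illustrative, not AWS defaults."""
    if avg_cpu_pct > scale_out_above:
        return +1
    if avg_cpu_pct < scale_in_below:
        return -1
    return 0
```
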
131. Conclusion
• We now have a system with the Non-functional Requirements (NFRs)
that we desire!
• While we’ve covered many key elements, a few areas will be covered
in future talks (e.g. Isolation, Containerization, Caching).
• These will be covered in upcoming blogs! Follow @r39132 for updates.