As we move toward larger, more complex, and more decoupled systems, and as the global information graph continues to grow, our frontier of unsolved challenges grows just as fast. Central challenges for distributed systems include persistence strategies across DCs, zones, or regions; network partitions; data optimization; and system stability in all phases.
How does leveraging CRDTs and Event Sourcing address several core distributed systems challenges? What strategies and patterns are useful in designing, deploying, and running stateful and stateless applications for the cloud, for example with Kubernetes? Combined with code samples, we will see how Akka Cluster, Multi-DC Persistence, Split Brain, Sharding, and Distributed Data can help solve these problems.
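The CRDT mention above can be made concrete with a small sketch: a state-based grow-only counter (G-Counter), one of the replicated structures Akka Distributed Data offers. This Python version is illustrative only; the class and method names are assumptions, not Akka's API.

```python
# Minimal sketch of a state-based grow-only counter (G-Counter) CRDT.
# Each node increments only its own slot; merge takes the per-node
# maximum, so replicas converge regardless of delivery order.

class GCounter:
    def __init__(self):
        self.counts = {}  # node_id -> local increment count

    def increment(self, node_id, amount=1):
        self.counts[node_id] = self.counts.get(node_id, 0) + amount

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # Pointwise max is commutative, associative, and idempotent,
        # which is what makes concurrent merges safe.
        merged = GCounter()
        for node in set(self.counts) | set(other.counts):
            merged.counts[node] = max(self.counts.get(node, 0),
                                      other.counts.get(node, 0))
        return merged
```

Because merge is order-insensitive, two datacenters can each increment their own replica during a network partition and still converge to the same total once they exchange state.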
Talk at Reactive Summit 2016 on on-demand streaming, job discovery & chaining and auto-scaling aspects of the Mantis Reactive Stream Processing platform at Netflix.
Lessons From HPE: From Batch To Streaming For 20 Billion Sensors With Lightbe... (Lightbend)
In this guest webinar with Chris McDermott, Lead Data Engineer at HPE, learn how HPE InfoSight, powered by Lightbend Platform, has emerged as the go-to solution for providing real-time metrics and predictive analytics across various network, server, storage, and data center technologies.
20160609 nike techtalks reactive applications tools of the trade (shinolajla)
An update to my talk about concurrency abstractions, including event loops (node.js and Vert.x), CSP (Go, Clojure), Futures, CPS/Dataflow (RxJava) and Actors (Erlang, Akka)
Optimizing Alert Monitoring with Oracle Enterprise Manager (Datavail)
Watch this webinar to find out how OEM Grid configuration using Datavail’s Alert Optimizer™ and custom templates helps eliminate unwanted alerts, while enriching actionable alerts, and improving the performance of the entire database system.
These five areas help organize the tuning approach and define the major concerns beyond the architecture, setup, and data model. The deck also shows how performance tuning becomes less of a mystery once it can be measured, documented, affected, and improved.
This talk will address new architectures emerging for large-scale streaming analytics: some based on Spark, Mesos, Akka, Cassandra, and Kafka (SMACK), and other newer streaming analytics platforms and frameworks using Apache Flink or GearPump. Popular architectures like Lambda separate layers of computation and delivery and require many technologies with overlapping functionality. This can result in duplicated code, untyped processes, or high operational overhead, not to mention cost (e.g. ETL).
I will discuss the problem domain and what is needed in terms of strategies, architecture and application design and code to begin leveraging simpler data flows. We will cover how the particular set of technologies addresses common requirements and how collaboratively they work together to enrich and reinforce each other.
Building Reactive Distributed Systems For Streaming Big Data, Analytics & Mac... (Helena Edelson)
Building self-healing, intelligent platforms: systems that learn, span multiple datacenters, and remove human intervention through ML. Reactive Summit 2016, @helenaedelson
It's harder than ever to predict the load your application will need to handle, so how do you design your architecture so you can afford to implement as you go and be ready for whatever comes your way? It's easy to focus on optimizing each part of your application, but your application architecture determines the options you have for making big leaps in scalability. In this talk we'll cover practical patterns you can build today to meet the needs of rapid development while still creating systems that can scale up and out. Specific code examples will focus on .NET, but the principles apply across many technologies. Real-world systems will be discussed, based on our experience helping customers around the world optimize their enterprise applications.
Hard Truths About Streaming and Eventing (Dan Rosanova, Microsoft) Kafka Summ... (confluent)
Eventing and streaming open a world of compelling new possibilities to our software and platform designs. They can reduce time to decision and action while lowering total platform cost. But they are not a panacea. Understanding the edges and limits of these architectures can help you avoid painful missteps. This talk will focus on event driven and streaming architectures and how Apache Kafka can help you implement these. It will also discuss key tradeoffs you will face along the way from partitioning schemes to the impact of availability vs. consistency (CAP Theorem). Finally we’ll discuss some challenges of scale for patterns like Event Sourcing and how you can use other tools and even features of Kafka to work around them. This talk assumes a basic understanding of Kafka and distributed computing, but will include brief refresher sections.
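As a concrete illustration of the partitioning-scheme tradeoff mentioned above, here is a hedged sketch of key-based partitioning in Python. Kafka's default partitioner actually hashes keys with murmur2; the MD5-based hash here is a stand-in to show the idea that all events for one key land in one partition, preserving per-key ordering while coupling key skew to partition skew.

```python
# Illustrative key-based partitioning (not Kafka's actual murmur2
# implementation): the same key always maps to the same partition,
# so per-key ordering is preserved within that partition.
import hashlib

def partition_for(key: bytes, num_partitions: int) -> int:
    # Hash the key and reduce modulo the partition count. Any stable
    # hash works for the illustration; Kafka itself uses murmur2.
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions
```

The tradeoff the talk alludes to follows directly: a hot key concentrates all of its traffic on one partition, and changing `num_partitions` remaps keys, which is why partition counts are hard to change after the fact.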
Building large-scale job processing systems with the Scala Akka Actor framework (Vignesh Sukumar)
The Akka Actor framework is designed to be a fast message processing system. In this talk, we will explain how, at Box, we have used this framework to develop a large scale job processing system that works on billions of data files and achieves a high degree of throughput and fault tolerance. Over the course of the talk, we will explore the usage of Akka framework’s Supervisor functionality to provide a more controllable fault-tolerance strategy, and how we can use Futures to manage asynchronous jobs.
Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/1M9hGVj.
Helena Edelson addresses new architectures emerging for large scale streaming analytics - based on Spark, Mesos, Akka, Cassandra and Kafka (SMACK) and other streaming analytics platforms and frameworks using Apache Flink or GearPump. Edelson discusses the problem domain and what is needed in terms of strategies, architecture and application design and code to begin leveraging simpler data flows. Filmed at qconsf.com.
Helena Edelson is a committer to the Spark Cassandra Connector and a contributor to Akka, adding new features in Akka Cluster such as the initial version of the cluster metrics API and AdaptiveLoadBalancingRouter.
This webinar by Orkhan Gasimov (Senior Solution Architect, Consultant, GlobalLogic) was delivered at Java Community Webinar #3 on October 16, 2020.
During the webinar, we gave a simplified overview of classical and modern architecture patterns and concepts used in the development of distributed applications over the last decade.
More details and presentation: https://www.globallogic.com/ua/about/events/java-community-webinar-3/
The adoption of container native and cloud native development practices presents new operational challenges. Today’s microservice environments are polyglot, distributed, container-based, highly-scalable, and ephemeral. To understand your system, you need to be able to follow the life of a request across numerous components distributed in multiple environments. Without the proper tools it can feel impossible to determine a root cause of an issue. This requires a new approach to operations. We will review a series of open source observability tools for logging, monitoring, and tracing to help developers achieve operational excellence for running container-based workloads.
SignalFx Elasticsearch Metrics Monitoring and Alerting (SignalFx)
From our Feb 25, 2016 webcast on operating Elasticsearch at scale, the metrics to monitor, and how to create low-noise meaningful alerts on Elasticsearch performance.
Product Information - Fuse Management Central 1.0.0 (antonio.carvalho)
Fuse Management Central is an administration platform for OpenText Content Suite/Extended ECM, enabling centralized management of the system while monitoring its components.
By design, it separates system administration from business administration, introducing a new layer of security for OpenText Content Suite administration.
Performing Oracle Health Checks Using APEX (Datavail)
With the heavy workload that most, if not all, DBAs face, it's no wonder there is little time left to perform routine health checks. This presentation deck reviews the real value of health checks, based on the thousands performed for clients, and shows how APEX can be used to standardize them.
Massively scalable ETL in real world applications: the hard way (J On The Beach)
Big Data examples always give the correct answers. In the real world, however, big data might be corrupt, contradictory, or consist of so many small files that it becomes extremely hard to keep track of, let alone scale. A solid architecture will help overcome many of these difficulties.
Floris will talk about a real-world implementation of a massively scalable ETL architecture. Two years ago, at the time of the implementation, Airflow had just become part of Apache and still left much to be desired. The requirements from the start, however, were thousands of ETL tasks per day on average, and on occasion this could become hundreds of thousands. The script-based method in place could not meet the requirements on a day-to-day basis and needed to be replaced as soon as possible, so this custom framework was rolled out in just 8 weeks of development time.
Traditional big data work is done on data you have: you load the data into a repository and perform MapReduce or other styles of calculation on it. However, certain industries need to perform complex operations on data you might not have. Data you can acquire, data that can be shared with you, and data that you can model are all types of data you may not have but may need to integrate instantly into a complex data analysis. The problem is: you may not even know you need this data until deep into the execution stack at runtime. This talk discusses a new functional language paradigm for dealing naturally with data you don't have, and how to make all data first-class citizens, whether you have it or not. We will give a demo of a project written in Scala that deals with exactly this issue.
More Related Content
Similar to Toward Predictability and Stability At The Edge Of Chaos
Acoustic Time Series in Industry 4.0: Improved Reliability and Cyber-Security... (J On The Beach)
Industry 4.0, aka the "Fourth Industrial Revolution," refers to the computerization of manufacturing. One important aspect of Industry 4.0 is the ability to monitor the health and reliability of a physical manufacturing plant using low-cost IoT sensors. For example, machine learning models can be trained to predict the physical degradation of a manufacturing system as a function of acoustic measurements obtained from strategically placed microphones; however, the same acoustic measurements can be used to reverse engineer proprietary information about the manufacturing process and/or precisely what is being manufactured at the time of recording. Thus, improved reliability and fault tolerance is achieved at the cost of what appears to be an unprecedented new class of security vulnerabilities related to the acoustic side channel.
As a case study, we report a novel acoustic side channel attack against a commercial DNA synthesizer, a commonly used instrument in fields such as synthetic biology. Using a smart phone-quality microphone placed on or in the near vicinity of a DNA synthesizer, we were able to determine with 88.07% accuracy the sequence of DNA being produced; using a database of biologically relevant known-sequences, we increased the accuracy of our model to 100%. An academic or industrial research project may use the synthetic DNA to engineer an organism with desired traits or functions; however, while the organism is still under development, prior to publication, patent, and/or copyright, the research remains vulnerable to academic intellectual property theft and/or industrial espionage. On the other hand, this attack could also be used for benevolent purposes, for example, to determine whether a suspected criminal or terrorist is engineering a harmful pathogen. Thus, it is essential to recognize both the benefits and risks inherent to the cyber-physical systems that will inevitably control Industry 4.0 manufacturing processes and to take steps to mitigate them whenever possible.
Where is the edge in IoT and how much can you do there? Data collection? Analytics? I’ll show you how to build and deploy an embedded IoT edge platform that can do data collection, analytics, dashboarding and much more. All using Open Source.
As IoT deployments move forward, the need to collect, analyze, and respond to data further out on the edge becomes a critical factor in the success – or failure – of any IoT project. Network bandwidth costs may be dropping, and storage is cheaper than ever, but at IoT scale, these costs can still quickly overrun a project’s budget and ultimately doom it to failure.
The more you centralize your data collection and storage, the higher these costs become. Edge data collection and analysis can dramatically lower these costs, plus decrease the time to react to critical sensor data. With most data platforms, it simply isn’t practical, or even possible, to push collection AND analytics to the edge. In this talk I’ll show how I’ve done exactly this with a combination of open source hardware – Pine64 – and open source software – InfluxDB – to build a practical, efficient and scalable data collection and analysis gateway device for IoT deployments. The edge is where the data is, so the edge is where the data collection and analytics needs to be.
Drinking from the firehose, with virtual streams and virtual actors (J On The Beach)
Event Stream Processing is a popular paradigm for building robust and performant systems in many different domains, from IoT to fraud detection to high-frequency trading. Because of the wide range of scenarios and requirements, it is difficult to conceptualize a unified programming model that would be equally applicable to all of them. Another tough challenge is how to build streaming systems with cardinalities of topics ranging from hundreds to billions while delivering good performance and scalability.
In this session, Sergey Bykov will talk about the journey of building Orleans Streams that originated in gaming and monitoring scenarios, and quickly expanded beyond them. He will cover the programming model of virtual streams that emerged as a natural extension of the virtual actor model of Orleans, the architecture of the underlying runtime system, the compromises and hard choices made in the process. Sergey will share the lessons learned from the experience of running the system in production, and future ideas and opportunities that remain to be explored.
Over the last twenty years, there has been a paradigm shift in software development: from meticulously planned release cycles to an experimental way of working in which lead times are becoming shorter and shorter.
How can Java ever keep up with this trend when we have Docker containers that are several hundred megabytes in size, with warm-up times of ten minutes or longer? In this talk, I'll demonstrate how we can use Quarkus so that we can create super small, super fast Java containers! This will give us better possibilities for scaling up and down - which can be a game-changer, especially in a serverless environment. It will also provide the shortest possible lead times, as well as a much better use of cloud performance with the added bonus of lower costs.
When Cloud Native meets the Financial Sector (J On The Beach)
We live in our own bubble of microservices and endlessly horizontal scaling infrastructure, but there is still critical infrastructure that runs the world of financial systems depending on Windows boxes, FTP servers, and single-threaded protocols. This talk is about how to glue these two worlds together, what works for us and what doesn't.
The advancement of technology in the last decade or so has allowed astronomy to see exponential growth in data volumes. ESA's space telescope Euclid will gather high-resolution images of a third of the sky, with ~850GB of data downloaded daily for 6 years; by 2032 the ground-based telescope LSST will have generated 500PB of data; and the radio telescope SKA will produce more data per second than the entire worldwide internet. This talk will address what current techniques exist for handling big data volumes, how the astronomical community will prepare for this big data wave, and what other challenges lie ahead.
The world is moving from a model where data sits at rest, waiting for people to make requests of it, to where data is constantly moving and streams of data flow to and from devices with or without human interaction. Decisions need to be made based on these streams of data in real-time, models need to be updated, and intelligence needs to be gathered. In this context, our old-fashioned approach of CRUD REST APIs serving CRUD database calls just doesn't cut it. It's time we moved to a stream-centric view of the world.
The TIPPSS Imperative for IoT - Ensuring Trust, Identity, Privacy, Protection... (J On The Beach)
Our increasingly connected world leveraging the Internet of Things (IoT) creates great value, in connected healthcare, smart cities, and more. The increasing use of IoT also creates great risk. We will discuss the challenges and risks we need to address as developers in TIPPSS - Trust, Identity, Privacy, Protection, Safety, and Security - for the devices, systems, and solutions we deliver and use. Florence leads IEEE workstreams on clinical IoT and data interoperability with blockchain addressing TIPPSS issues. She is an author of the IEEE articles "Enabling Trust and Security - TIPPSS for IoT" and "Wearables and Medical Interoperability - the Evolving Frontier", of "TIPPSS for Smart Cities" in the 2017 book "Creating, Analysing and Sustaining Smarter Cities: A Systems Perspective", and is Editor in Chief of an upcoming book, "Women Securing the Future with TIPPSS for IoT."
Pushing AI to the Client with WebAssembly and Blazor (J On The Beach)
Want to run your AI algorithms directly in the browser on the client-side? Now you can with WebAssembly and Blazor. Join us as we write code directly in WebAssembly. Then, we’ll look at Blazor and how you can use it, along with WebAssembly to run your tooling client side in the browser.
Want to run your AI algorithms directly in the browser on the client-side without the need for transpilers or browser plug-ins? Well, now you can with WebAssembly and Blazor. WebAssembly (WASM) is the W3C specification that will be used to provide the next generation of development tools for the web and beyond. Blazor is Microsoft’s experiment that allows ASP.Net developers to create web pages that do much of the scripting work in C# using WASM. Come join us as we learn to write code directly in WebAssembly’s human-readable format. Then, we’ll look at the current state of Blazor and how you can use it, along with WebAssembly to run your tooling client side in the browser.
The Raft protocol is a well-known protocol for achieving consensus in distributed systems. Want to learn how consensus is achieved in a system with a large amount of data, such as Axon Server's Event Store? Join this talk to hear the specifics of data replication in a highly available Event Store!
Axon is a free and open source Java framework for writing Java applications following DDD, event sourcing, and CQRS principles. While especially useful in a microservices context, Axon provides great value in building structured monoliths that can be broken down into microservices when needed.
Axon Server is a messaging platform specifically built to support distributed Axon applications. One of its key benefits is storing events published by Axon applications. In not-so-rare cases, the number of these events runs into the millions, even billions. The availability of Axon Server plays a significant role in the product portfolio. To keep event replication reliable, we chose the Raft protocol as the consensus implementation for our clustering features.
In short, consensus involves multiple servers agreeing on values. Once they reach a decision on a value, that decision is final. Typical consensus algorithms make progress when any majority of their servers is available; for example, a cluster of 5 servers can continue to operate even if 2 servers fail. If more servers fail, they stop making progress (but will never return an incorrect result).
Join this talk to learn why we chose Raft; what our findings were during the design, implementation, and testing phases; and what it means to replicate an event store holding billions of events!
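The quorum arithmetic behind the consensus description above (a cluster of 5 servers continuing to operate even if 2 fail) can be sketched in a few lines. This is generic majority math, not Axon Server code.

```python
# Majority-quorum arithmetic used by consensus protocols like Raft:
# progress requires a majority of the n servers, so a cluster of n
# tolerates f = (n - 1) // 2 failures.

def majority(n: int) -> int:
    # Smallest number of servers that constitutes a majority.
    return n // 2 + 1

def tolerated_failures(n: int) -> int:
    # How many servers can fail while a majority remains.
    return (n - 1) // 2
```

This also shows why clusters usually have odd sizes: a 4-server cluster needs a 3-server majority and so tolerates only 1 failure, no more than a 3-server cluster does.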
The Six Pitfalls of building a Microservices Architecture (and how to avoid t... (J On The Beach)
Thinking of moving to microservices? Watch out! That quest is full of traps, social traps. If you are not able to handle them, you may be blocked by meetings, frustration, and endless challenges that will make you miss the monolith. In this talk, I share my experience and mistakes so you can avoid them.
Creating or migrating to a Microservices architecture might easily become a big mess, not only due to technical challenges but mostly because of human factors: it’s a major change in the software culture of a company. In this talk, I’ll share my past experience as the technical lead of an ambitious Microservices-based product, I’ll go through the parts we struggled with, and give you some advice on how to deal with what I call the Six Pitfalls:
The Common Patterns Phobia
The Book Club Cult
The Never-Decoupled Story
The Buzz Words Syndrome
The Agile Trap
The Conway’s Law Hackers
Instead of randomly injecting faults (as with Chaos Monkey), what if we could order our experiments to perform the minimum number of experiments for maximum yield? We present a solution (and results) to the problem of experiment selection, using Lineage Driven Fault Injection to reduce the search space of faults.
Lineage Driven Fault Injection (LDFI) is a state-of-the-art technique for chaos engineering experiment selection. Since its inception, LDFI has used a SAT solver under the hood, which presents solutions to the decision problem (which faults to inject) in no particular order. As SREs, we would like to perform first the experiments that reveal the bugs customers are most likely to hit. In this talk, we present new improvements to LDFI that order the experiment suggestions.
In the first half of the talk we will show that LDFI is a technique that can be widely used within an enterprise. We present the motivation for ordering the chaos experiments, along with some of the prioritization we used while conducting them. We also highlight how ordering is a general-purpose technique that can encode the peculiarities of a heterogeneous microservices architecture. LDFI can work in an enterprise by harnessing the observability infrastructure to model the redundancy of the system.
Next, we present experiments conducted within our organization using ordered LDFI and some preliminary results. We show examples of services where we discovered bugs, and how carefully controlling the order of experiments allowed LDFI to avoid running unnecessary experiments. We also present an example of an application where we declared the service shippable under the crash-stop model. We also present a comparison with Chaos Monkey and show how LDFI found the known bugs in a given application using orders of magnitude fewer experiments than a random fault injection tool like Chaos Monkey.
Finally, we discuss how we plan to take LDFI forward. We discuss open problems and possible solutions for scalarizing probabilities of failure, latency injection, integration with service mesh technologies like Envoy for fine-grained fault injection, and fault injection for stateful systems.
Key takeaways: 1) Understand how LDFI can be integrated in the enterprise by harnessing the observability infrastructure. 2) Limitations of LDFI w.r.t unordered solutions and why ordering matters for chaos engineering experiments. 3) Preliminary results of prioritized LDFI and a future direction for the community.
Complexity in systems should be defeated where possible. But our computer systems are complex by default, and servers are doomed to fail. In this talk, we will go through new approaches in modern architectures to design and evaluate new computer systems.
Interaction Protocols: It's all about good mannersJ On The Beach
Distributed systems collaborate to achieve collective goals via a system of rules. Rules that afford good hygiene, fault tolerance, effective communication and trusted feedback. These rules form protocols which enable the system to achieve its goals.
Distributed and concurrent systems can be considered a social group that collaborates to achieve collective goals. In order to collaborate, a system of rules must be applied that affords good hygiene, fault tolerance, and effective communication to coordinate, share knowledge, and provide feedback in a polite, trusted manner. These rules form a number of protocols which enable the group to act as a system that is greater than the sum of its individual components.
In this talk, we will explore the history of protocols and their application when building distributed systems.
A race of two compilers: GraalVM JIT versus HotSpot JIT C2. Which one offers ...J On The Beach
Do you want to check the efficiency of the new, state-of-the-art GraalVM JIT compiler against the older but most widely used JIT C2? Let's have a side-by-side comparison from a performance standpoint on the same source code.
The talk reveals how the traditional Just-In-Time compiler (JIT C2) from HotSpot/OpenJDK internally manages runtime optimizations for hot methods, in comparison to the new, state-of-the-art GraalVM JIT compiler on the same source code, emphasizing the internals and strategies used by each compiler to achieve better performance in the most common situations (or code patterns). For each optimization, there is Java source code and the corresponding generated assembly code to prove what really happens under the hood.
Each test is covered by a dedicated JMH benchmark, timings and conclusions. Main topics of the agenda: - Scalar replacement - Null checks - Virtual calls - Lock coarsening - Lock elision - Lambdas - Vectorization (a few cases)
The tools used during my research are JITWatch, the Java Microbenchmark Harness (JMH), and perf. All test scenarios are launched against the latest official Java release (version 11).
Leadership is easy when you're a manager, or an expert in a field, or a conference speaker! In a Kanban organisation, though, we "encourage acts of leadership at every level". In this talk, we look at what it means to be a leader in the uncertain, changing and high-learning environment of software development. We learn about the importance of safety in encouraging others to lead and follow, and how to get that safety using both technical and human practices; the necessity of a clear, compelling vision and provision of information on how we're achieving it; and the need to be able to ask awkward and difficult questions... especially the ones without easy answers.
Machine Learning: The Bare Math Behind LibrariesJ On The Beach
During this presentation, we will answer how much you’ll need to invest in a superhero costume to be as popular as Superman. We will generate a unique logo which will stand against the ever popular Batman and create new superhero teams. We shall achieve it using linear regression and neural networks.
Machine learning is one of the hottest buzzwords in technology today as well as one of the most innovative fields in computer science – yet people use libraries as black boxes without basic knowledge of the field. In this session, we will strip them to bare math, so next time you use a machine learning library, you’ll have a deeper understanding of what lies underneath.
During this session, we will first provide a short history of machine learning and an overview of two basic teaching techniques: supervised and unsupervised learning.
We will start by defining what machine learning is and equip you with an intuition of how it works. We will then explain the gradient descent algorithm with the use of simple linear regression to give you an even deeper understanding of this learning method. Then we will project it to supervised neural networks training.
Within unsupervised learning, you will become familiar with Hebbian learning and competitive learning (winner-takes-all and winner-takes-most algorithms). We will use Octave for examples in this session; however, you can use your favourite technology to implement the presented ideas.
Our aim is to show the mathematical basics of neural networks for those who want to start using machine learning in their day-to-day work or use it already but find it difficult to understand the underlying processes. After viewing our presentation, you should find it easier to select parameters for your networks and feel more confident in your selection of network type, as well as be encouraged to dive into more complex and powerful deep learning methods.
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus
As part of the DOE Integrated Research Infrastructure (IRI) program, NERSC at Lawrence Berkeley National Lab and ALCF at Argonne National Lab are working closely with General Atomics on accelerating the computing requirements of the DIII-D experiment. As part of the work, the team is investigating ways to speed up the time to solution for many different parts of the DIII-D workflow, including how they run jobs on HPC systems. One of these routes is looking at Globus Compute as a way to replace the current method for managing tasks, and we describe a brief proof of concept showing how Globus Compute could help schedule jobs and be a tool to connect compute at different facilities.
Understanding Globus Data Transfers with NetSageGlobus
NetSage is an open privacy-aware network measurement, analysis, and visualization service designed to help end-users visualize and reason about large data transfers. NetSage traditionally has used a combination of passive measurements, including SNMP and flow data, as well as active measurements, mainly perfSONAR, to provide longitudinal network performance data visualization. It has been deployed by dozens of networks world wide, and is supported domestically by the Engagement and Performance Operations Center (EPOC), NSF #2328479. We have recently expanded the NetSage data sources to include logs for Globus data transfers, following the same privacy-preserving approach as for Flow data. Using the logs for the Texas Advanced Computing Center (TACC) as an example, this talk will walk through several different example use cases that NetSage can answer, including: Who is using Globus to share data with my institution, and what kind of performance are they able to achieve? How many transfers has Globus supported for us? Which sites are we sharing the most data with, and how is that changing over time? How is my site using Globus to move data internally, and what kind of performance do we see for those transfers? What percentage of data transfers at my institution used Globus, and how did the overall data transfer performance compare to the Globus users?
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Globus
The Earth System Grid Federation (ESGF) is a global network of data servers that archives and distributes the planet’s largest collection of Earth system model output for thousands of climate and environmental scientists worldwide. Many of these petabyte-scale data archives are located in proximity to large high-performance computing (HPC) or cloud computing resources, but the primary workflow for data users consists of transferring data, and applying computations on a different system. As a part of the ESGF 2.0 US project (funded by the United States Department of Energy Office of Science), we developed pre-defined data workflows, which can be run on-demand, capable of applying many data reduction and data analysis to the large ESGF data archives, transferring only the resultant analysis (ex. visualizations, smaller data files). In this talk, we will showcase a few of these workflows, highlighting how Globus Flows can be used for petabyte-scale climate analysis.
Listen to the keynote address and hear about the latest developments from Rachana Ananthakrishnan and Ian Foster who review the updates to the Globus Platform and Service, and the relevance of Globus to the scientific community as an automation platform to accelerate scientific discovery.
Navigating the Metaverse: A Journey into Virtual EvolutionDonna Lenk
Join us for an exploration of the Metaverse's evolution, where innovation meets imagination. Discover new dimensions of virtual events, engage with thought-provoking discussions, and witness the transformative power of digital realms.
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc.Juraj Vysvader
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc. I didn't get rich from it, but my extensions had 63K downloads (powering possibly tens of thousands of websites).
Atelier - Innover avec l’IA Générative et les graphes de connaissancesNeo4j
Workshop - Innovating with Generative AI and Knowledge Graphs
Go beyond the AI hype and discover practical techniques for using AI responsibly across your organization's data. Explore how knowledge graphs can be used to increase accuracy, transparency, and explainability in generative AI systems. You will leave with hands-on experience combining data relationships and LLMs to bring domain-specific context and improve reasoning.
Bring your laptop and we will guide you through setting up your own generative AI stack, providing practical, coded examples to get started in minutes.
A Study of Variable-Role-based Feature Enrichment in Neural Models of CodeAftab Hussain
Understanding variable roles in code has been found to help students learn programming -- could variable roles also help deep neural models perform coding tasks? We present an exploratory study.
- These are slides of the talk given at InteNSE'23: The 1st International Workshop on Interpretability and Robustness in Neural Software Engineering, co-located with the 45th International Conference on Software Engineering, ICSE 2023, Melbourne, Australia
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxrickgrimesss22
Discover the essential features to incorporate in your Winzo clone app to boost business growth, enhance user engagement, and drive revenue. Learn how to create a compelling gaming experience that stands out in the competitive market.
Quarkus Hidden and Forbidden ExtensionsMax Andersen
Quarkus has a vast extension ecosystem and is known for its subsonic and subatomic feature set. Some of these features are not as well known, and some extensions are less talked about, but that does not make them less interesting - quite the opposite.
Come join this talk to see some tips and tricks for using Quarkus and some of the lesser known features, extensions and development techniques.
Check out the webinar slides to learn more about how XfilesPro transforms Salesforce document management by leveraging its world-class applications. For more details, please connect with sales@xfilespro.com
If you want to watch the on-demand webinar, please click here: https://www.xfilespro.com/webinars/salesforce-document-management-2-0-smarter-faster-better/
An Enterprise Resource Planning (ERP) system includes various modules that reduce any business's workload. Additionally, it organizes workflows, which drives productivity. Here is a detailed explanation of the ERP modules. Going through the points will help you understand how the software is changing work dynamics.
To know more details here: https://blogs.nyggs.com/nyggs/enterprise-resource-planning-erp-system-modules/
Graspan: A Big Data System for Big Code AnalysisAftab Hussain
We built a disk-based parallel graph system, Graspan, that uses a novel edge-pair centric computation model to compute dynamic transitive closures on very large program graphs.
We implement context-sensitive pointer/alias and dataflow analyses on Graspan. An evaluation of these analyses on large codebases such as Linux shows that their Graspan implementations scale to millions of lines of code and are much simpler than their original implementations.
These analyses were used to augment the existing checkers; these augmented checkers found 132 new NULL pointer bugs and 1308 unnecessary NULL tests in Linux 4.4.0-rc5, PostgreSQL 8.3.9, and Apache httpd 2.2.18.
- Accepted in ASPLOS ‘17, Xi’an, China.
- Featured in the tutorial, Systemized Program Analyses: A Big Data Perspective on Static Analysis Scalability, ASPLOS ‘17.
- Invited for presentation at SoCal PLS ‘16.
- Invited for poster presentation at PLDI SRC ‘16.
Large Language Models and the End of ProgrammingMatt Welsh
Talk by Matt Welsh at Craft Conference 2024 on the impact that Large Language Models will have on the future of software development. In this talk, I discuss the ways in which LLMs will impact the software industry, from replacing human software developers with AI, to replacing conventional software with models that perform reasoning, computation, and problem-solving.
Software Engineering, Software Consulting, Tech Lead.
Spring Boot, Spring Cloud, Spring Core, Spring JDBC, Spring Security,
Spring Transaction, Spring MVC,
Log4j, REST/SOAP WEB-SERVICES.
2. @helenaedelson
Helena Edelson
● Principal Engineer @ Lightbend
● Member of the Akka team
● Former: Apple, Crowdstrike, VMware, SpringSource, Tuplejump
● github.com/helena
● twitter.com/helenaedelson
● speakerdeck.com/helenaedelson
Data, Analytics & ML Platform Infrastructure and Cloud Engineer
Former biologist
4. @helenaedelson
When systems reach a critical level of dynamism we have to change our way of modeling and designing them
• Stateful in a stateless world
• Automation of everything - Ops, *aaS platforms
• Persistence strategies across DCs, zones and regions
• Data and query optimization
• System availability and stability in all states of deployment and rolling restarts
• Leveraging AI / ML to
Rethinking Strategies
5. @helenaedelson
Computational model embracing non-determinism
- Actor Model of Computation, Carl Hewitt
• Mathematical theory treating "Actors" as primitives of concurrent computation
• Framework for a theoretical understanding of concurrency
• Asynchronous communication
• Stateful isolated processes
• Non-observable state within
• Decoupling in space and time
The Network and Autonomous Processes
6. @helenaedelson
Principles that Akka stands on can be traced back to the ’70s and ’80s
• Carl Hewitt invented the Actor Model, early 70s
• Jim Gray and Pat Helland on the Tandem System, 80s
• Joe Armstrong, Robert Virding and Mike Williams on Erlang, 1986
Look Back Before Looking Forward
7. @helenaedelson
• From the ’40s and still being heavily developed today across many fields of research and application in industry.
• 1940s: Cellular automata (CA), originally discovered by Stanislaw Ulam and John von Neumann, Los Alamos National Laboratory
• 1970s: Conway's Game of Life
• Asynchronous Cellular Automaton
Complex Adaptive Systems, Systems Theory, early AI
8. @helenaedelson
Can solve problems difficult or impossible for an individual agent or a monolithic system to solve
• The foundations for artificial neural networks and NLP
• Composed of multiple autonomous agents, interacting to achieve common goals
• Decentralized, no central point of decision making
• More fault tolerant, no single point of failure
• Reach higher degrees of dependability
Multi-Agent Systems (MAS)
9. @helenaedelson
Complex Adaptive Systems (CAS)
[Word cloud: self-organization, emergence, synchronization, amplification, distributed networks, cellular automata, feedback loops, systems theory, evolution, swarming, local, asynchronous, unpredictable, non-linear, adaptive, versatile]
13. @helenaedelson
• Stateful - in-memory yet durable and resilient state
• Long-lived - lifecycle is not bound to a specific session, context available until explicitly destroyed
• Virtual - location transparent and not bound to a physical location
• Addressable - referenced through a stable address
Akka Actors Also Happen To Be
20. @helenaedelson
• Complex Event Processing (CEP) - developed 1989-1995 to analyze event-driven simulations of distributed systems, abstracting causal event histories, patterns, filtering and aggregation in large, distributed, time-sensitive systems
• Stream Processing - mid-1990s research in real-time event data analysis, internet companies processing large numbers of events
• Event Sourcing (ES) - from domain-driven design and enterprise development, processing very complex data models with often smaller datasets than internet companies
• Command Query Responsibility Segregation (CQRS) - isn't about events, but often combined with ES
• Also - CDC (Change Data Capture)
Structuring data as a stream of events
21. @helenaedelson
• How data from system behavior is structured
• Capture all changes as a sequence of events in time
• Store events as an immutable event log / append-only storage
• Preserves the happened-before causality of events
• Replay the event log to reconstruct state, within a given time window or in full
Event Sourcing
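The bullets above can be sketched in a few lines of Scala. The `Account` domain, its events, and the `replay` fold are invented purely for illustration, not part of any library:

```scala
// Minimal event-sourcing sketch: every change is captured as an immutable
// event appended to a log, and the current state is a left fold over that log.
sealed trait AccountEvent
final case class Deposited(amount: BigDecimal) extends AccountEvent
final case class Withdrawn(amount: BigDecimal) extends AccountEvent

final case class Account(balance: BigDecimal = 0) {
  // Applying an event produces a new state; nothing is mutated in place.
  def apply(e: AccountEvent): Account = e match {
    case Deposited(a) => copy(balance = balance + a)
    case Withdrawn(a) => copy(balance = balance - a)
  }
}

// The append-only log preserves happened-before order; replaying it (fully or
// any prefix up to a point in time) deterministically reconstructs state.
def replay(log: Seq[AccountEvent]): Account =
  log.foldLeft(Account())(_ apply _)
```

Because the log is append-only and ordered, replaying a prefix of it reconstructs the state as of any earlier point in time, which is exactly what makes auditing and recovery possible.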
22. @helenaedelson
Requirements - forensics
• Auditable - what is the current state and how it arrived there
• Causality - observe and analyze a system's causal structure
Applications For ES In Distributed
Asynchronous Systems
For example
• Cybersecurity and Vulnerability Detection
• Banking - what is the account balance and how did it arrive at that
• Click stream
• Accounting & Ledgers
• Shopping Cart
• Anything with a sequence of events that lead to X which must be preserved
23. @helenaedelson
A pattern decoupling the write path (commands) from the read path (queries)
• Different access patterns and differing ratios of reads to writes are typical
• Different schemas / data structures
• Typically different teams across the org own the write side and use/own the read side
• No reason to share structure, and sharing is bad practice (no monolith, loose coupling, etc.)
• Command - writers / publishers publish without awareness of who needs to receive it or how to reach them (location, protocol...)
• Query - readers / subscribers should be able to subscribe and asynchronously receive from topics of interest
Command Query Responsibility Segregation (CQRS)
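A minimal sketch of the split, with invented `Command`/`Event` types: the write side validates commands and emits events, while a separate read-side projection folds the same events into its own query-optimized structure, sharing no schema with the writer:

```scala
// Write side: commands express intent and are turned into events.
sealed trait Command
final case class AddItem(cart: String, item: String) extends Command

// Events are the only thing the two sides share.
sealed trait Event
final case class ItemAdded(cart: String, item: String) extends Event

// Command handler: validate and emit an event (no read-model knowledge).
def handle(cmd: Command): Event = cmd match {
  case AddItem(cart, item) => ItemAdded(cart, item)
}

// Read side: an asynchronous subscriber folds events into its own
// query-optimized structure, here a map from cart ID to its items.
def project(view: Map[String, List[String]], e: Event): Map[String, List[String]] =
  e match {
    case ItemAdded(cart, item) =>
      view.updated(cart, item :: view.getOrElse(cart, Nil))
  }
```

The read model can be rebuilt at any time by re-folding the event stream, which is why CQRS pairs so naturally with event sourcing.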
24. @helenaedelson
My old diagram from 3 years ago at Kafka Summit:
Real Time Bidding (RTB)
The write path and model are naturally separate and differ from the read:
25. @helenaedelson
• Ingest large amounts of data, from multiple sources, sometimes bursty, so it can't overload the system
• Write the raw data to a store so that
• when algorithms change I can run the data stream over for new meaning
• when nodes or applications fail I can replay data from a checkpoint to recover
• Route the event streams to my ML/analytics streams
It Doesn't Matter What We Call It or Whether It's Microservices Or A Streaming Data Pipeline
• Process and aggregate inbound data and store aggregates for querying historical data against the stream
• Not lose data
• Be secure, probably encrypt/decrypt everything
• Not pay massive cloud and data storage fees
• Be sure my team can handle infrastructure TCO
28. @helenaedelson
Akka Persistence Stateful Actors
• Enables stateful actors to persist their state for recovery and replay after failure and error
• Events are persisted to storage, nothing is mutated (no read-modify-write)
• Allows higher transaction rates and efficient replication
• Only events received by the actor are persisted
• Snapshotting for checkpoint replay
• At-least-once message delivery semantics
Event Stream As Replication Fabric
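A sketch of snapshot-based checkpoint recovery as described above. `Snapshot`, `Persisted`, and the counter-style state are illustrative stand-ins, not Akka Persistence's API:

```scala
// A snapshot captures the state as of a journal sequence number.
final case class Snapshot(seqNr: Long, state: Long)
// A persisted event carries its sequence number and a state delta.
final case class Persisted(seqNr: Long, delta: Long)

// Recovery: start from the latest snapshot if one exists, then replay only
// the journal events recorded after the snapshot's sequence number.
def recover(snapshot: Option[Snapshot], journal: Seq[Persisted]): Long = {
  val (fromSeq, base) = snapshot.map(s => (s.seqNr, s.state)).getOrElse((0L, 0L))
  journal.filter(_.seqNr > fromSeq).foldLeft(base)(_ + _.delta)
}
```

Without a snapshot the full history is replayed; with one, recovery time is bounded by the events written since the last checkpoint.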
29. @helenaedelson
Connect different event logs with event-sourced processors for event processing pipelines or graphs
• Cassandra, Redis, DynamoDB, Couchbase, MongoDB, Hazelcast, JDBC and more
• Built-in: in-memory heap-based journal, local file-system-based snapshot store and LevelDB-based journal
Storage Plugins
Storage Plugins
30. @helenaedelson
• Your algorithms have changed, and you need to replay historic data against the new logic
• Rolling upgrade, restart, cluster migration
• Error, e.g. after a JVM crash
• Failure, e.g. cluster nodes or a DC went down, a network outage or partition
• Cloud compute layer planned maintenance restarts
• Application throws an exception, if a persistent actor is configured to restart by a supervisor
Replay Reasons
31. @helenaedelson
Akka out of the box gives us tooling for each of these steps:
• Failure awareness and lifecycle
• Save state of failed node before failure
• Load state that was in flight at time of failure (define time slice)
• Replay from a checkpoint in a snapshot or run the full history
• Resume operations
Failure And Recovery
33. @helenaedelson
● Decentralized peer-to-peer
● Cluster Formation and membership service
● Communication and Consensus
● Leader and Roles
● Cluster Lifecycle and Events
● Failure Detector
● Self-Healing
● CoordinatedShutdown
Akka Cluster: Quick Premise
34. @helenaedelson
Cluster User API
• What roles am I in, what is my address
• Join, Leave, Down
• Programmatic membership control
• Register listeners to cluster events
• Startup when a configurable cluster size is reached
• Highly tunable behavior
40. @helenaedelson
• ClusterDomainEvent: base type
• MemberUp: member status changed to Up
• UnreachableMember: member considered unreachable by failure detector
• MemberRemoved: member completely removed from the cluster
• MemberEvent: member status change Up, Removed
• Leader events
• Reachability events
Cluster Events
41. @helenaedelson
• CurrentClusterState: current snapshot of the cluster state, sent to new subscribers, unless InitialStateAsEvents is specified
• InitialStateAsEvents: receive messages which replay events to restore the current snapshot of the cluster state
Cluster State
44. @helenaedelson
• Masterless
• No leader election
• Role of the leader: the only one who can change member status
• joining to up
• exiting to removed
• Leader decisions are local to the DC
Cluster Leader
47. @helenaedelson
Cluster Membership State
A CRDT which can be deterministically merged
[State diagram: Joining → Up → Leaving → Exiting → Removed, with Down as a branch to Removed. Join, Leave and Down are user actions; the Joining → Up, Leaving → Exiting, Exiting → Removed and Down → Removed transitions are leader actions.]
54. @helenaedelson
Cluster Singleton
Single point of cluster-wide decisions or coordination
ClusterSingletonManager
ClusterSingletonManager
(oldest)
SingletonActor
ClusterSingletonManager
58. @helenaedelson
Cluster Singleton: On Failure
(oldest)
Failover
Message
ClusterSingletonManager
SingletonActorDowned or Network Partition
ClusterSingletonProxy
ClusterSingletonManager
59. @helenaedelson
[Spectrum: Strong Consistency ↔ Always Available]
Guarantees one instance of a particular actor type per cluster
Cluster Singleton
doc.akka.io/docs/akka/current/scala/cluster-singleton
61. @helenaedelson
An approach to eventual distributed consistency
• Replicate data across the network
• Concurrent updates from different nodes without coordination
• Mathematical properties guarantee eventual consistency
• Updates execute immediately, unaffected by network faults
• Consistency without consensus
• Highly scalable and fault tolerant
Conflict-Free Replicated Data Types (CRDT)
A comprehensive study of Convergent and Commutative Replicated Data Types
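As a concrete sketch of these properties, here is a minimal grow-only counter (GCounter): each node increments its own slot, and merge takes the per-node maximum, so concurrent updates commute and replicas converge without coordination. The names are illustrative, not any library's API:

```scala
// Grow-only counter CRDT sketch: one entry per node.
final case class GCounter(entries: Map[String, Long] = Map.empty) {
  // A node only ever increments its own slot.
  def increment(node: String, delta: Long = 1): GCounter =
    copy(entries.updated(node, entries.getOrElse(node, 0L) + delta))

  // The observed value is the sum over all nodes.
  def value: Long = entries.values.sum

  // Merge is commutative, associative and idempotent (point-wise max),
  // so replicas converge regardless of gossip order or duplication.
  def merge(that: GCounter): GCounter =
    GCounter((entries.keySet ++ that.entries.keySet).map { k =>
      k -> math.max(entries.getOrElse(k, 0L), that.entries.getOrElse(k, 0L))
    }.toMap)
}
```

Updates execute locally and immediately; the merge function guarantees that all replicas eventually agree, which is consistency without consensus.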
62. @helenaedelson
A replicated counter, which converges because the increment / decrement operations commute
• Service Discovery
• Shopping Cart
• Priority on low latency and full availability
• Computation in delay-tolerant networks
• Data aggregation
• Partition-tolerant cloud computing
• Collaborative text editing
Application Of CRDTs
A few implementations:
• Riak Data Types
• SoundCloud Roshi
• Akka Distributed Data
63. @helenaedelson
1976: The maintenance of duplicate databases, Paul Johnson, Robert Thomas
1984: Efficient solutions to the replicated log and dictionary problems, Gene Wuu, Arthur Bernstein
1988: Scale and performance in a distributed file system, J. Howard, M. Kazar, S. Menees, D. Nichols, M. Satyanarayanan, R. Sidebotham, M. West
1988: Commutativity-based concurrency control for abstract data types, W. Weihl
1989: Concurrency control in groupware systems, C. Ellis, S. Gibbs
1994: Resolving file conflicts in the Ficus file system, P. Reiher, J. Heidemann, D. Ratner, G. Skinner, G. Popek
1994: Detecting causal relationships in distributed computations: In search of the holy grail, R. Schwarz, F. Mattern
1997: Specification of convergent abstract data types for autonomous mobile computing, C. Baquero, F. Moura
1999: Using structural characteristics for autonomous operation, Carlos Baquero, Francisco Moura
2009: A commutative replicated data type for cooperative editing, N. Preguiça, J. Marquès, M. Shapiro, M. Leţia
2011: A comprehensive study of Convergent and Commutative Replicated Data Types, M. Shapiro, N. Preguiça, C. Baquero, M. Zawirski
Not New
64. @helenaedelson
• Low latency and high availability
• Data availability despite network partitions
• Nodes concurrently update as multi-master
• Async state replication across the cluster
• Granular control of consistency level for reads and writes
• Key-value store like API
Akka Distributed Data
doc.akka.io/docs/akka/current/scala/distributed-data
Replicated in-memory data store using CvRDT to share data between cluster nodes
65. @helenaedelson
Concurrent updates from different nodes resolve via the monotonic merge function.
Counters GCounter grow-only, PNCounter (2 GCounters) increment decrement
Registers Flag toggle boolean, LWWRegister - Last Write Wins register
Sets GSet grow-only merge by union, ORSet observer-remove version vector
Maps ORMap, ORMultiMap, LWWMap, PNCounterMap
Graphs DAG
Composable For More Advanced Types
A comprehensive study of Convergent and Commutative Replicated Data Types
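As a sketch of the composition the table describes, a PNCounter can be built from two grow-only maps, one for increments and one for decrements; the value is their difference, and each half merges by point-wise maximum. Names are illustrative, not Akka's API:

```scala
// PNCounter CRDT sketch: two grow-only structures composed together.
final case class PNCounter(incs: Map[String, Long] = Map.empty,
                           decs: Map[String, Long] = Map.empty) {
  def increment(node: String): PNCounter =
    copy(incs = incs.updated(node, incs.getOrElse(node, 0L) + 1))

  def decrement(node: String): PNCounter =
    copy(decs = decs.updated(node, decs.getOrElse(node, 0L) + 1))

  // Observed value: total increments minus total decrements.
  def value: Long = incs.values.sum - decs.values.sum

  // Each grow-only half merges by point-wise maximum.
  private def mergeMap(x: Map[String, Long], y: Map[String, Long]): Map[String, Long] =
    (x.keySet ++ y.keySet)
      .map(k => k -> math.max(x.getOrElse(k, 0L), y.getOrElse(k, 0L)))
      .toMap

  // Merging the halves independently keeps the composite convergent.
  def merge(that: PNCounter): PNCounter =
    PNCounter(mergeMap(incs, that.incs), mergeMap(decs, that.decs))
}
```

This is the composition pattern in miniature: because each half is itself a convergent type, the pair merges component-wise and stays convergent.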
66. @helenaedelson
Delta State CRDTs (δ-CRDTs)
• A way to reduce the need for sending the full state for updates
• Send only what changed
• Merging is done on the receiving side
• Eventually consistent by default, with opt-in causal consistency
Delta State Replicated Data Types
GCounter
GSet
PNCounter
PNCounterMap
LWWMap
ORMap
ORMultiMap
ORSet
LWWRegister
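A minimal sketch of the delta idea using a grow-only counter: track the entries changed since the last gossip round and ship only those; the receiver merges a delta with the same point-wise maximum it would use for full state. Types and names are invented for illustration, not Akka's wire format:

```scala
// δ-state sketch: carry the full state plus the delta accumulated
// since the last gossip round.
final case class DeltaGCounter(entries: Map[String, Long] = Map.empty,
                               delta: Map[String, Long] = Map.empty) {
  def increment(node: String): DeltaGCounter = {
    val v = entries.getOrElse(node, 0L) + 1
    // Record the change in both the state and the pending delta.
    DeltaGCounter(entries.updated(node, v), delta.updated(node, v))
  }

  def value: Long = entries.values.sum

  // Called after a gossip round: the shipped delta is cleared locally.
  def resetDelta: DeltaGCounter = copy(delta = Map.empty)

  // Receiving side: merging a delta is the same point-wise max as a
  // full-state merge, so correctness is unchanged, only less is shipped.
  def mergeDelta(d: Map[String, Long]): DeltaGCounter =
    copy(entries = (entries.keySet ++ d.keySet).map { k =>
      k -> math.max(entries.getOrElse(k, 0L), d.getOrElse(k, 0L))
    }.toMap)
}
```

For large maps or sets this cuts replication traffic dramatically, since only the touched entries travel on each round.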
75. @helenaedelson
• By default the data is only kept in memory and replicated to other nodes
• If all nodes are stopped, the data is lost
• You can configure it to store on the local disk on each node (LMDB)
• Or implement your own storage to another store via the trait
• It will be loaded the next time the replicator is started
Configurable Durable Storage
76. @helenaedelson
[Spectrum: Strong Consistency ↔ Always Available]
doc.akka.io/docs/akka/current/distributed-data
Distributed Data
Eventually consistent - always accepts writes
77. @helenaedelson
• Needing high consistency over availability and low latency
• Big Data - not currently intended for billions of entries
• When a new node is added to the cluster all entries are propagated to it, hence top-level entries should not exceed 100,000
• Data is held in memory
• If not using a delta-CRDT, when a data entry is changed the full state of that entry may be replicated to other nodes
Not Designed For
78. @helenaedelson
Cluster Sharding
Scale, Resilience & Consistency
• Automatically distribute entities of the same type over several nodes
• Balance resources (memory, disk space, network traffic) across multiple nodes for scalability
• Location transparency: interact by logical ID
• Increased fault tolerance - relocation on failure
Life beyond Distributed Transactions
[Diagram: Node 1 hosting ShardRegion SR1 with shards S1, S2, S3]
79. @helenaedelson
Each Entity Is A Consistency Boundary
Sender on Node 1
Local ShardRegion
Shards: groups of entities
Node 1
SR 1
S1 S2 S3
Your Code, Supervised By Shards
Message(gid)
80. @helenaedelson
• Creates entity actors on demand
• Supervises group of entities - defined by the shard ID extraction
N-Shards Per Cluster Node
Entity B-1
SR2
SC
SR1
Shard A
Shard B
Entity A-1
Entity A-2
Entity C-1
Shard C
SR3
ShardCoordinator
ShardRegion 1
ShardRegion 2
ShardRegion 3
81. @helenaedelson
• Creates and supervises its shards
• Knows how to route messages by routing key
ShardRegion Per Cluster Node
Envelope(“c-1”)
Entity B-1
Shard A
Shard B
Entity A-1
Entity A-2
Entity C-1
Shard C
ShardCoordinator
ShardRegion 1
ShardRegion 2
ShardRegion 3
Node 1
Node 2 Node 3
82. @helenaedelson
• Stores Shard to Region mappings with Akka Persistence
• Monitors all cluster node status
• If the ShardCoordinator goes down it is restarted on another node and
recovers its state by replay
Shard Coordination
[Diagram: the ShardCoordinator runs as a Cluster Singleton alongside
ShardRegions 1-3 and their shards A, B, C with entities A-1, A-2, B-1, C-1]
83. @helenaedelson
Start Cluster Sharding On Node
[Code sample showing: your entity ID extraction function, your shard ID
extraction function, your custom shard allocation strategy, and your
envelope type for sending data - or use the built-in HashExtractor]
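The extractors on this slide are plain functions, so the shard-routing part can be shown without a running cluster. A hedged sketch (the `Envelope` type and helper names are illustrative; the wiring comment follows the shape of Akka's classic `ClusterSharding.start` API):

```scala
// Sketch of the message extractors Cluster Sharding needs.
// Envelope is an illustrative wrapper carrying the target entity id.
final case class Envelope(entityId: String, payload: Any)

object ShardingSetup {
  val numberOfShards = 100 // rule of thumb: well above the max node count

  // Entity ID extraction: which entity a message is for, and what to deliver.
  val extractEntityId: PartialFunction[Any, (String, Any)] = {
    case Envelope(id, payload) => (id, payload)
  }

  // Shard ID extraction: hash the entity id onto a stable shard,
  // mirroring what the built-in hash-based extractor does.
  val extractShardId: Any => String = {
    case Envelope(id, _) => math.abs(id.hashCode % numberOfShards).toString
  }

  // With an ActorSystem this would be wired up roughly as:
  //   ClusterSharding(system).start(
  //     typeName        = "Entity",
  //     entityProps     = Props[MyEntity],
  //     settings        = ClusterShardingSettings(system),
  //     extractEntityId = extractEntityId,
  //     extractShardId  = extractShardId)
}

object ShardingDemo extends App {
  // The same entity id always maps to the same shard.
  val s1 = ShardingSetup.extractShardId(Envelope("c-1", "hello"))
  val s2 = ShardingSetup.extractShardId(Envelope("c-1", "world"))
  println(s1 == s2) // true
}
```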
84. @helenaedelson
Cluster Sharding: Failover
Location Transparency
Failover
[Diagram: the node hosting Shard C is downed; Envelope("c-1") sent to
ShardRegion 1 still reaches Entity C-1 after the ShardCoordinator
relocates Shard C to a surviving node]
85. @helenaedelson
[Spectrum: Strong Consistency ←→ Always Available]
Each entity is a boundary of consistency
Guarantees at most one instance of each entity at a time per cluster
doc.akka.io/docs/akka/current/scala/cluster-sharding
Cluster Sharding
86. @helenaedelson
"Serverless is a new generation of platform-as-a-service offerings where
the infrastructure provider takes responsibility for receiving client
requests and responding to them, capacity planning, task scheduling,
and operational monitoring. Developers need to worry only about the
logic for processing client requests."
- Adzic et al
Serverless computing: economic and architectural impact
Serverless
87. @helenaedelson
• Automated infrastructure running in a container pool
• A classic data-shipping architecture - we move data to the code, not the other
way round
• Pay per execution time
• Autoscales with load
• Event driven
• Stateless
• Ephemeral (5-15 minutes)
FaaS
89. @helenaedelson
• Load and event spikes needing massive parallelism
• Scaling from zero to 10,000s of requests and back down to zero
• Simplifies delivery of scale and availability
• As integration layer between various (ephemeral and durable) data sources
• Processing stateless intensive workloads
• As data backbone moving data from A to B and transforming it
• Can work well for event-driven use cases
What Is FaaS Good At Currently?
91. @helenaedelson
• Functions handle only one event source
• Functions are stateless, ephemeral, and short-lived
• Computational context easily lost
• Limited options for managing and coordinating distributed state
• Limited options for the right consistency guarantees
• Limited options for durable state that is scalable and available
• Expensive to load and store state from storage repeatedly
Limitations With Serverless
Distributed state is not well supported for complex distributed data workflows
92. @helenaedelson
• No direct communication, which means applications must pub-sub all data over a
storage medium
• Too high latency for general purpose distributed computing problems
For a discussion on this, and other limitations with FaaS read the paper,
“Serverless Computing: One Step Forward, Two Steps Back”
by Joe Hellerstein, et al.
FaaS Does Not Have Addressability
97. @helenaedelson
KNative Serving of Stateful Functions
[Diagram: Knative stateful serving delivers Knative Events over gRPC to
user functions (JavaScript, Go, Java, ...), each running in a Kubernetes
Pod, backed by a distributed datastore (Cassandra, DynamoDB, Spanner, ...)]
98. @helenaedelson
Powered by Akka Cluster Sidecars
[Diagram: the same topology, with an Akka Sidecar in each user function's
Kubernetes Pod; the sidecars form an Akka Cluster fronting the distributed
datastore (Cassandra, DynamoDB, Spanner, ...)]