Responsive: The system responds in a timely manner if at all possible. Responsiveness is the cornerstone of usability and utility, but more than that, responsiveness means that problems may be detected quickly and dealt with effectively. Responsive systems focus on providing rapid and consistent response times, establishing reliable upper bounds so they deliver a consistent quality of service. This consistent behaviour in turn simplifies error handling, builds end user confidence, and encourages further interaction.

Resilient: The system stays responsive in the face of failure. This applies not only to highly-available, mission critical systems — any system that is not resilient will be unresponsive after a failure. Resilience is achieved by replication, containment, isolation and delegation. Failures are contained within each component, isolating components from each other and thereby ensuring that parts of the system can fail and recover without compromising the system as a whole. Recovery of each component is delegated to another (external) component and high-availability is ensured by replication where necessary. The client of a component is not burdened with handling its failures.

Elastic: The system stays responsive under varying workload. Reactive Systems can react to changes in the input rate by increasing or decreasing the resources allocated to service these inputs. This implies designs that have no contention points or central bottlenecks, resulting in the ability to shard or replicate components and distribute inputs among them. Reactive Systems support predictive, as well as Reactive, scaling algorithms by providing relevant live performance measures. They achieve elasticity in a cost-effective way on commodity hardware and software platforms.

Message Driven: Reactive Systems rely on asynchronous message-passing to establish a boundary between components that ensures loose coupling, isolation and location transparency.
This boundary also provides the means to delegate failures as messages. Employing explicit message-passing enables load management, elasticity, and flow control by shaping and monitoring the message queues in the system and applying back-pressure when necessary. Location transparent messaging as a means of communication makes it possible for the management of failure to work with the same constructs and semantics across a cluster or within a single host. Non-blocking communication allows recipients to only consume resources while active, leading to less system overhead.

Large systems are composed of smaller ones and therefore depend on the Reactive properties of their constituents. This means that Reactive Systems apply design principles so these properties apply at all levels of scale, making them composable. The largest systems in the world rely upon architectures based on these properties and serve the needs of billions of people daily. It is time to apply these design principles consciously from the start instead of rediscovering them each time.
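The back-pressure idea above can be sketched with a bounded mailbox: when the consumer falls behind, the full queue pushes back on the sender, which can then shed or buffer the message instead of overwhelming the system. This is a minimal Python illustration, not a production flow-control implementation.

```python
import queue

# A bounded mailbox: when it fills up, producers get an explicit
# back-pressure signal (queue.Full) instead of silently overloading
# the consumer. The tiny maxsize is only there to demonstrate this.
mailbox = queue.Queue(maxsize=2)

def producer(messages, dropped):
    for msg in messages:
        try:
            # Non-blocking put: a full queue is the back-pressure signal,
            # so the sender can shed, buffer, or slow down.
            mailbox.put_nowait(msg)
        except queue.Full:
            dropped.append(msg)

dropped = []
producer(["m1", "m2", "m3", "m4"], dropped)
print(len(dropped))  # 2 — the mailbox absorbed two messages, pushed back on two
```

In a real system the rejected messages would be buffered upstream or the producer rate reduced; dropping them here just makes the signal visible.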
Designing distributed systems
What this talk is about
The distributed Paradigm
Cloud computing considerations
What is distributed computing?
“Distributed computing is a field of computer science that studies distributed
systems. A distributed system is a software system in which components located
on networked computers communicate and coordinate their actions by passing messages.”
What is cloud computing?
Lets the platform do the hard work by leveraging the platform's application services.
Uses non-blocking asynchronous communication in a loosely coupled architecture.
Scales horizontally in an elastic mechanism.
Does not waste resources
Handles scaling events, node failures, and transient failures without downtime.
Uses geographic distribution to minimize network latency.
Upgrades without downtime.
Scales automatically using proactive and reactive actions.
Monitors and manages application logs as nodes come and go.
What are microservices?
In computing, microservices is a software architecture style in which
complex applications are composed of small,
independent processes communicating with each other using language-
agnostic APIs. These services are small building blocks, highly decoupled and
focused on doing a small task, facilitating a modular approach to system-building.
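A language-agnostic API usually means plain JSON over HTTP: any client, in any language, can call the service. The sketch below stands up a tiny illustrative service with Python's standard library and calls it; the service name and route are invented for the example.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# A toy microservice exposing a language-agnostic JSON-over-HTTP API.
# The "greeting" service and its route are illustrative only.
class GreetingService(BaseHTTPRequestHandler):
    def do_GET(self):
        body = json.dumps({"service": "greeting", "message": "hello"}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence request logging for the demo
        pass

server = ThreadingHTTPServer(("127.0.0.1", 0), GreetingService)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

# Any HTTP client in any language could make this call.
with urllib.request.urlopen(f"http://127.0.0.1:{port}/greet") as resp:
    reply = json.loads(resp.read())
server.shutdown()
print(reply["message"])  # hello
```

Because the contract is just HTTP + JSON, the caller never needs to know the service is written in Python — which is exactly the decoupling the definition above describes.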
Brewer's CAP theorem
Eric Brewer introduced the idea that there is a fundamental trade-off between
consistency, availability, and partition tolerance.
In a network subject to communication failures, it is impossible for any web
service to implement an atomic read/write shared memory that guarantees a
response to every request.
Amdahl's law
Gives the theoretical speedup in latency of the execution of a task at
fixed workload that can be expected of a system whose resources are improved.
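This statement is Amdahl's law: if a fraction p of a task can be parallelized across n processors, the overall speedup is 1 / ((1 - p) + p/n). A quick sketch:

```python
def amdahl_speedup(p, n):
    """Theoretical speedup at fixed workload when fraction p of the
    task is parallelizable and runs on n processors (Amdahl's law)."""
    return 1.0 / ((1.0 - p) + p / n)

# Halving the serial part matters far more than adding processors:
# a 95%-parallel task tops out near 20x no matter how many cores.
print(round(amdahl_speedup(0.95, 4096), 1))  # 19.9
```

The serial fraction (1 - p) bounds the achievable speedup at 1 / (1 - p), which is why removing contention points matters more than raw horizontal scale.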
Horizontal scaling compute Pattern
Horizontal scaling is reversible.
Supports scaling out and scaling in
Stateful nodes keep user session information.
Stateful nodes are a single point of failure.
Store session information externally from the nodes.
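Storing session state externally is what makes the nodes interchangeable. In this sketch a plain dict stands in for an external store such as Redis or Memcached; any node can then serve any request, with no sticky routing.

```python
# Stateless-node pattern: session state lives in an external store
# (a dict stands in for Redis/Memcached here), so requests for the
# same user can land on any node and nodes can be added or removed.
session_store = {}  # shared by all nodes (illustrative stand-in)

class WebNode:
    def __init__(self, name):
        self.name = name  # the node itself holds no session state

    def handle(self, session_id, item=None):
        cart = session_store.setdefault(session_id, [])
        if item is not None:
            cart.append(item)
        return cart

node_a, node_b = WebNode("a"), WebNode("b")
node_a.handle("user-1", "book")   # first request lands on node a
cart = node_b.handle("user-1")    # next request lands on node b
print(cart)  # ['book'] — node b sees the session without sticky routing
```

Because no node owns the session, scaling in (removing a node) loses nothing, and scaling out needs no session migration.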
Queue-Centric Workflow Pattern
Used in web applications to decouple communication between web-tier and service
tier by focusing on the flow of commands.
A service tier that is unreliable or slow can affect the web tier negatively.
All communication is asynchronous, as messages over a queue.
The sender and receiver are loosely coupled. Neither one knows about the
implementation of the other.
There are edge cases where the risk of invisibility windows occurs when processing
takes longer than allowed.
Idempotency concerns. Database transactions, compensating transaction.
Poison messages placed in dead letter queue.
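The concerns above — at-least-once delivery, idempotency, and poison messages — can be sketched together. This is an illustrative single-threaded model, not the pattern's full implementation: a processed-id set makes handling idempotent under duplicate delivery, and messages that keep failing are parked in a dead letter queue.

```python
import queue

# Queue-Centric Workflow sketch: the web tier enqueues commands, a
# service-tier worker drains them. Duplicates are skipped via a
# processed-id set (idempotency); repeatedly failing ("poison")
# messages are moved to a dead letter queue after MAX_ATTEMPTS.
MAX_ATTEMPTS = 3
work_queue = queue.Queue()
dead_letter_queue = []
processed_ids = set()

def handle(command):
    if command["payload"] == "poison":
        raise ValueError("cannot process")
    return command["payload"].upper()

def drain():
    results = []
    while not work_queue.empty():
        command = work_queue.get()
        if command["id"] in processed_ids:
            continue  # duplicate delivery: already handled, skip
        try:
            results.append(handle(command))
            processed_ids.add(command["id"])
        except ValueError:
            command["attempts"] += 1
            if command["attempts"] >= MAX_ATTEMPTS:
                dead_letter_queue.append(command)  # park poison message
            else:
                work_queue.put(command)  # reappears after invisibility window
    return results

for command in ({"id": 1, "payload": "order", "attempts": 0},
                {"id": 1, "payload": "order", "attempts": 0},  # duplicate
                {"id": 2, "payload": "poison", "attempts": 0}):
    work_queue.put(command)

results = drain()
print(results, len(dead_letter_queue))  # ['ORDER'] 1
```

In a real queue service the "invisibility window" is the broker redelivering an unacknowledged message; the explicit re-put here just models that behaviour.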
QCW is not full CQRS as it does not articulate the read model.
Simultaneous requests for the same data may result in different values.
Leads to better performance and lower cost.
Uses Brewer’s CAP theorem (Consistency, Availability and Partition tolerance): of the 3
guarantees, an application can pick only 2.
Consistency. Everyone gets the same answer.
Availability. Clients have ongoing access (even if there is a partial system failure)
Partition tolerance. Means correct operation even if some nodes are cut off from the network.
DNS updates and NoSQL are examples of eventually consistent services.
Busy Signal Pattern
Applies to services or resources accessed over a network where a busy signal response
may be returned. These may include management services, data services and more, and periodic transient
failure should be expected. E.g. a busy signal on telephones.
A good application should be able to handle retries and properly handle failures.
In HTTP, this is response 503 Service Unavailable.
Clearly distinguish busy signals from errors; retry on the busy state after an interval, and log
occurrences for further analysis of patterns.
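A common way to implement the retry-on-busy advice is truncated exponential backoff. This is a minimal sketch (the status strings and retry budget are illustrative); a real client would also add jitter and actually sleep between attempts.

```python
# Busy Signal handling sketch: retry only on an explicit "busy"
# response, back off exponentially (truncated at a cap), and log
# each retry so busy patterns can be analysed later.
BUSY, OK = "503 busy", "200 ok"

def call_with_retries(service, max_retries=4, base_delay=0.1):
    log = []
    delay = base_delay
    for attempt in range(max_retries + 1):
        response = service()
        if response != BUSY:
            return response, log  # success or a real error: stop retrying
        log.append(f"attempt {attempt}: busy, retrying in ~{delay:.2f}s")
        # time.sleep(delay)  # a real client sleeps (plus jitter) here
        delay = min(delay * 2, 2.0)  # truncated exponential backoff
    raise RuntimeError("service still busy after retries")

# Simulate a service that is busy twice, then succeeds.
responses = iter([BUSY, BUSY, OK])
result, log = call_with_retries(lambda: next(responses))
print(result, len(log))  # 200 ok 2
```

Note the asymmetry the pattern calls for: busy states are retried, but genuine errors should surface immediately rather than being retried blindly.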
Node Failure Pattern
Concerns availability and graceful handling of unexpected application/hardware failures,
reboots or node shutdown.
Application state should be kept in reliable storage, not on local disk or on an individual node.
Network latency problem
Move application data closer to users
Ensure nodes within your application are closer together (Colocation)
Windows Azure uses Affinity Groups.
Consider Data Compression, Background processing, Predictive Fetching, CDN
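Of the tactics listed, data compression is the simplest to demonstrate: fewer bytes on the wire means less time lost to limited bandwidth. A quick sketch with Python's standard gzip module:

```python
import gzip

# Data-compression tactic: repetitive payloads (JSON, HTML, logs)
# compress well, cutting bytes transferred and transfer time.
payload = b'{"user": "u-1", "items": ["book", "book", "book"]}' * 200
compressed = gzip.compress(payload)

print(len(payload), len(compressed), len(compressed) < len(payload))
assert gzip.decompress(compressed) == payload  # lossless round-trip
```

The trade is CPU for bandwidth, which usually wins for text-like payloads over a WAN; already-compressed media (images, video) gains little.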
Traits of a reactive system
scalable (react to load, minimize contention) scale up/out
resilient (cannot be bolted on, has to be designed in; things will fail, so isolate components and
prevent cascading failures)
event driven (asynchronous)
The reactive manifesto
The Actor Model
The actor model in computer science is a mathematical model of concurrent
computation that treats "actors" as the universal primitives of concurrent computation.
In response to a message that it receives, an actor can: make local decisions,
create more actors, send more messages, and determine how to respond to the
next message received. Actors may modify private state, but can only affect each
other through messages (avoiding the need for any locks)
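The description above can be sketched with a thread and a mailbox queue: the actor's state is private to its own loop, and the only way to read or change it is to send a message. This is an illustrative toy, not an actor framework.

```python
import queue
import threading

# Minimal actor sketch: private state plus a mailbox. No locks are
# needed because only the actor's own thread ever touches the state.
class CounterActor:
    def __init__(self):
        self._mailbox = queue.Queue()
        self._count = 0  # private state, touched only by the actor thread
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            message, reply_to = self._mailbox.get()
            if message == "stop":
                break
            if message == "increment":
                self._count += 1
            elif message == "get" and reply_to is not None:
                reply_to.put(self._count)  # state escapes only as a message

    def tell(self, message, reply_to=None):
        self._mailbox.put((message, reply_to))

actor = CounterActor()
for _ in range(3):
    actor.tell("increment")
reply = queue.Queue()
actor.tell("get", reply_to=reply)
value = reply.get(timeout=1)
print(value)  # 3
actor.tell("stop")
```

Because the mailbox serializes messages, the increments and the read cannot interleave badly — the lock-free property the quote describes.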
In different scenarios, an Actor may be an
an object instance or component
a callback or listener
a singleton or service
a router; load-balancer or pool
a Java EE Session Bean or Message-Driven Bean
an out-of-process service
a Finite State Machine (FSM)
- Simple: adds high-level abstractions for concurrency and parallelism
- Asynchronous, non-blocking and highly event driven programming model
Supervisor hierarchies with let-it-crash semantics
Supervisor hierarchies can span multiple VMs
Excellent for writing highly fault-tolerant systems that self-heal and never stop
Everything in Akka is designed to work in a distributed environment: all interactions of actors use
pure message passing and everything is asynchronous
Multi threads vs Multi Actors
1 MB per thread (4MB in 64 bit) vs 2.7 million actors per gigabyte
More on Actors
Let it crash philosophy
The failure strategies
One for one supervision
All for one supervision. If one child crashes, the siblings will be restarted as well.
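The two strategies can be contrasted in a few lines. This is a schematic model of supervision, not Akka's API: a supervisor reacts to a child's failure by restarting either just that child (one-for-one) or all of its children (all-for-one).

```python
# Let-it-crash sketch: children don't handle their own failures;
# a supervisor decides how to recover them.
class Child:
    def __init__(self, name):
        self.name = name
        self.restarts = 0

    def restart(self):
        self.restarts += 1  # in a real system: reset state, rerun startup

class Supervisor:
    def __init__(self, children, strategy="one_for_one"):
        self.children = children
        self.strategy = strategy

    def on_failure(self, failed):
        if self.strategy == "one_for_one":
            failed.restart()           # only the failed child restarts
        else:                          # "all_for_one"
            for child in self.children:
                child.restart()        # siblings restart too

a, b = Child("a"), Child("b")
Supervisor([a, b], strategy="one_for_one").on_failure(a)
print(a.restarts, b.restarts)  # 1 0

c, d = Child("c"), Child("d")
Supervisor([c, d], strategy="all_for_one").on_failure(c)
print(c.restarts, d.restarts)  # 1 1
```

All-for-one fits children whose states depend on each other (restarting one alone would leave siblings inconsistent); one-for-one fits independent children.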