Data stream processing platforms and microservices platform infrastructure and strategies are converging. As we move toward larger, more complex, and more decoupled systems, and as the global information graph continues to grow, our frontier of unsolved challenges grows just as fast. Central challenges for distributed systems include persistence strategies across data centers, zones, or regions; network partitions; data optimization; and system stability in all phases.
How can leveraging CRDTs and Event Sourcing address several core distributed systems challenges? What strategies and patterns are useful in the design, deployment, and running of stateful and stateless applications for the cloud, for example with Kubernetes? Combined with code samples, we will see how Akka Cluster, Multi-DC Persistence, Split Brain Resolver, Sharding and Distributed Data can help solve these problems.
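One such CRDT, a grow-only counter (G-Counter), can be sketched in a few lines of Python. This is an illustration of the general technique, not Akka Distributed Data's API; the node names are invented. Each replica increments only its own slot, and replicas merge by taking element-wise maxima, so merges are commutative, associative, and idempotent and replicas converge regardless of delivery order, which is what makes the type safe under network partitions.

```python
class GCounter:
    """Grow-only counter CRDT: one slot per node, merge = element-wise max."""

    def __init__(self, counts=None):
        self.counts = dict(counts or {})

    def increment(self, node, amount=1):
        # A replica only ever increments its own slot.
        self.counts[node] = self.counts.get(node, 0) + amount

    def merge(self, other):
        # Commutative, associative, idempotent: safe to apply in any order,
        # any number of times, e.g. after a network partition heals.
        merged = dict(self.counts)
        for node, count in other.counts.items():
            merged[node] = max(merged.get(node, 0), count)
        return GCounter(merged)

    @property
    def value(self):
        return sum(self.counts.values())

# Two replicas diverge during a partition, then converge on merge.
a, b = GCounter(), GCounter()
a.increment("node-a", 3)
b.increment("node-b", 2)
assert a.merge(b).value == b.merge(a).value == 5
```

Real CRDT libraries provide richer types (sets, maps, registers) on the same principle: state that merges deterministically without coordination.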
This document discusses Mantis, a reactive stream processing system for operational insights. Mantis allows querying data on-demand, reusing data and results between jobs for efficiency. It enables job chaining through discovery of job outputs and auto-scales jobs and clusters based on workload. Mantis provides high throughput and low latency stream processing while maintaining data guarantees.
(1) The document discusses using an event streaming platform like Apache Kafka for advanced time series analysis (TSA). Typical processing patterns are described for converting raw data into time series and reconstructing graphs and networks from time series data.
(2) A challenge discussed is integrating data streams, experiments, and decision making. The document argues that stream processing using Kafka is better suited than batch processing for real-time business in changing environments and iterative research projects.
(3) The document describes approaches for performing time series analysis and network analysis using Kafka to create time series from event streams and graphs from time series pairs. A simplified architecture for complex streaming analytics using reusable building blocks is presented.
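The two patterns above can be roughly illustrated in plain Python (the function names and the crude co-movement test are invented here, not taken from the talk): raw events are bucketed into fixed windows to form time series, and pairs of series that move together become edges in a reconstructed graph.

```python
from collections import defaultdict

def to_time_series(events, window):
    """Bucket (timestamp, value) events into fixed-size windows, summing values."""
    series = defaultdict(float)
    for ts, value in events:
        series[ts // window] += value
    return [series[bucket] for bucket in sorted(series)]

def correlated(xs, ys):
    """Crude co-movement test: do the two series mostly rise and fall together?"""
    dx = [b - a for a, b in zip(xs, xs[1:])]
    dy = [b - a for a, b in zip(ys, ys[1:])]
    agree = sum(1 for u, v in zip(dx, dy) if u * v > 0)
    return agree >= len(dx) / 2

def build_graph(named_series):
    """Edge between every pair of series that co-move."""
    names = sorted(named_series)
    return {(a, b) for i, a in enumerate(names) for b in names[i + 1:]
            if correlated(named_series[a], named_series[b])}

# Three metrics: cpu and load rise together, mem falls -> one edge.
graph = build_graph({"cpu": [1, 2, 3], "load": [2, 4, 6], "mem": [9, 5, 1]})
assert graph == {("cpu", "load")}
```

A production pipeline would use a proper correlation measure and windowed stream processors, but the shape of the computation is the same.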
The Art of The Event Streaming Application: Streams, Stream Processors and Sc... (confluent)
1) The document discusses the art of building event streaming applications using various techniques like bounded contexts, stream processors, and architectural pillars.
2) Key aspects include modeling the application as a collection of loosely coupled bounded contexts, handling state using Kafka Streams, and building reusable stream processing patterns for instrumentation.
3) Composition patterns involve choreographing and orchestrating interactions between bounded contexts to capture business workflows and functions as event-driven data flows.
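The choreography side of those composition patterns can be sketched as follows (the bus, context names and event types are invented for illustration; the real thing would be Kafka topics): each bounded context subscribes to events, reacts, and emits its own, with no central coordinator.

```python
from collections import defaultdict

class EventBus:
    """In-memory stand-in for a Kafka topic: contexts subscribe by event type."""
    def __init__(self):
        self.handlers = defaultdict(list)
        self.log = []  # ordered record of everything published

    def subscribe(self, event_type, handler):
        self.handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        self.log.append(event_type)
        for handler in self.handlers[event_type]:
            handler(payload)

bus = EventBus()
# Choreography: Payments reacts to Orders' event, Shipping reacts to Payments'.
# No orchestrator tells either context what to do.
bus.subscribe("OrderPlaced", lambda order: bus.publish("PaymentTaken", order))
bus.subscribe("PaymentTaken", lambda order: bus.publish("OrderShipped", order))

bus.publish("OrderPlaced", {"order_id": 1})
assert bus.log == ["OrderPlaced", "PaymentTaken", "OrderShipped"]
```

The workflow emerges from the subscriptions; adding a new context (say, Notifications) means adding a subscriber, not editing a coordinator.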
Lessons From HPE: From Batch To Streaming For 20 Billion Sensors With Lightbe... (Lightbend)
In this guest webinar with Chris McDermott, Lead Data Engineer at HPE, learn how HPE InfoSight, powered by Lightbend Platform, has emerged as the go-to solution for providing real-time metrics and predictive analytics across various network, server, storage, and data center technologies.
Rediscovering the Value of Apache Kafka® in Modern Data Architecture (confluent)
This document discusses the origins and value of Apache Kafka in modern data architectures. It describes how Kafka was created to handle continuous flows of data, addressing limitations in databases and messaging systems. Kafka provides a unified solution for messaging, data storage, and stream processing. It originated from the ideas of treating the log as a first-class citizen and combining messaging, durable storage, and stream processing capabilities into a streaming platform. The document demonstrates how Kafka can be used to build a game scoring application using streams and tables. It recommends ways to learn more about Kafka including trying Confluent Cloud, tutorials, books, and attending Kafka Summit.
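The "log as a first-class citizen" idea can be sketched in a few lines (a deliberate simplification, not Kafka's actual protocol or API): producers append to an ordered log, and each consumer tracks its own offset, so the same retained data serves messaging, durable storage, and replay for stream processing.

```python
class Log:
    """Append-only log; consumers poll independently via their own offsets."""
    def __init__(self):
        self.records = []
        self.offsets = {}  # consumer name -> next offset to read

    def append(self, record):
        self.records.append(record)
        return len(self.records) - 1  # offset of the new record

    def poll(self, consumer):
        start = self.offsets.get(consumer, 0)
        batch = self.records[start:]
        self.offsets[consumer] = len(self.records)
        return batch

log = Log()
log.append({"player": "ada", "score": 10})
log.append({"player": "ada", "score": 25})

# Two independent consumers read the same retained records...
assert log.poll("scoreboard") == log.poll("audit")
# ...and a late consumer can still replay history from offset 0.
assert len(log.poll("latecomer")) == 2
```

This decoupling of writers from readers, with retention and replay, is what lets one system cover use cases that previously needed a message queue plus a database plus an ETL job.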
Real-time processing of large amounts of data (confluent)
This document discusses real-time processing of large amounts of data using a streaming platform. It begins with an agenda for the presentation, then discusses how streaming platforms can be used as a central nervous system in enterprises. Several use cases are presented, including using Apache Kafka and the Confluent Platform for applications like fraud detection, customer analytics, and migrating from batch to stream-based data processing. The rest of the document goes into details on Kafka, Confluent Platform, and how they can be used to build stream processing applications.
Reactive Fast Data & the Data Lake with Akka, Kafka, Spark (Todd Fritz)
This document discusses a presentation titled "Reactive Fast Data & the Data Lake with Akka, Kafka, Spark" given by Todd Fritz at DevNexus in February 2017. The presentation agenda covers reactive systems and patterns, fast data, data lakes, the intersection of these topics, and architecture considerations for building systems that can scale to millions of users and billions of messages. Key technologies discussed include Akka, Kafka, and Spark.
Independent of the source of data, the integration of event streams into an Enterprise Architecture gets more and more important in the world of sensors, social media streams and the Internet of Things. Events have to be accepted quickly and reliably, and they have to be distributed and analysed, often with many consumers or systems interested in all or part of the events. Storing such huge event streams into HDFS or a NoSQL datastore is feasible and no longer much of a challenge. But if you want to be able to react fast, with minimal latency, you cannot afford to first store the data and do the analysis/analytics later. You have to be able to include part of your analytics right after you consume the data streams. Products for doing event processing, such as Oracle Event Processing or Esper, have been available for quite a long time and used to be called Complex Event Processing (CEP). In the past few years, another family of products has appeared, mostly out of the Big Data technology space, called Stream Processing or Streaming Analytics. These are mostly open source products/frameworks such as Apache Storm, Spark Streaming, Flink and Kafka Streams, as well as supporting infrastructure such as Apache Kafka. In this talk I will present the theoretical foundations for Stream Processing, discuss the core properties a Stream Processing platform should provide, and highlight what differences you might find between the more traditional CEP and the more modern Stream Processing solutions.
The document discusses using Saga patterns and event sourcing with Kafka. It begins with introductions of Rafael Benevides and Roan Brasil Monteiro. It then provides an overview of moving from a monolithic to a microservices architecture and the challenges with synchronous calls. It introduces event sourcing, command sourcing, and Saga patterns, including choreography-based and orchestration-based approaches. It discusses using Kafka Streams to create an orchestrator and demonstrates Saga patterns with a booking-room use case. It provides a link to a demo implementation on GitHub.
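The orchestration-based variant can be sketched as follows (the booking steps and names are illustrative, loosely modeled on the booking-room use case rather than taken from the talk's GitHub demo): the orchestrator runs each local transaction in order and, on failure, executes the compensating actions for the completed steps in reverse.

```python
def run_saga(steps, state):
    """Each step is (name, action, compensation). On failure, undo in reverse."""
    completed = []
    for name, action, compensate in steps:
        try:
            action(state)
            completed.append((name, compensate))
        except Exception:
            # Compensate completed steps in reverse order (LIFO).
            for _, undo in reversed(completed):
                undo(state)
            return "rolled back"
    return "completed"

# Hypothetical local transactions and their compensations.
def reserve(s): s["rooms"] -= 1
def release(s): s["rooms"] += 1
def charge(s):
    if s["balance"] < s["price"]:
        raise RuntimeError("payment declined")
    s["balance"] -= s["price"]
def refund(s): s["balance"] += s["price"]

booking = [("reserve-room", reserve, release), ("charge-card", charge, refund)]

ok = {"rooms": 5, "balance": 100, "price": 80}
assert run_saga(booking, ok) == "completed" and ok["rooms"] == 4

broke = {"rooms": 5, "balance": 10, "price": 80}
assert run_saga(booking, broke) == "rolled back" and broke["rooms"] == 5
```

In a Kafka-based implementation each step and compensation would be an event exchanged between services, with the orchestrator's state itself kept in a changelog topic for recoverability.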
Kafka Summit SF 2019 - The art of the event-streaming app (Neil Avery)
Have you ever imagined what it would be like to build a massively scalable streaming application on Kafka, the challenges, the patterns and the thought process involved? How much of the application can be reused? What patterns will you discover? How does it all fit together? Depending upon your use case and business, this can mean many things. Starting out with a data pipeline is one thing, but evolving into a company-wide real-time application that is business critical and entirely dependent upon a streaming platform is a giant leap. Large-scale streaming applications are also called event streaming applications. They are classically different from other data systems; event streaming applications are viewed as a series of interconnected streams that are topologically defined using stream processors; they hold state that models your use case as events. Almost like a deconstructed realtime database.
In this talk, I step through the origins of event streaming systems, understanding how they develop from raw events into something that can be adopted at organizational scale. I start with event-first thinking and Domain-Driven Design to build data models that work with the fundamentals of streams, Kafka Streams, KSQL and Serverless (FaaS). Building upon this, I explain how to build common business functionality by stepping through patterns for scalable payment processing; instrumentation and monitoring ("run it on rails"); and control flow (start, stop, pause). Finally, all of these concepts are combined in a solution architecture that can be used at enterprise scale. I will introduce enterprise patterns such as events-as-a-backbone, events as APIs, and methods for governance and self-service. You will leave the talk with an understanding of how to model events with event-first thinking, how to work towards reusable streaming patterns and, most importantly, how it all fits together at scale.
From data stream management to distributed dataflows and beyond (Vasia Kalavri)
Recent efforts by academia and open-source communities have established stream processing as a principal data analysis technology across industry. All major cloud vendors offer streaming dataflow pipelines and online analytics as managed services. Notable use-cases include real-time fault detection in space networks, city traffic management, dynamic pricing for car-sharing, and anomaly detection in financial transactions. At the same time, streaming dataflow systems are increasingly being used for event-driven applications beyond analytics, such as orchestrating microservices and model serving. In the past decades, streaming technology has evolved significantly, however, emerging applications are once more challenging the design decisions of modern streaming systems. In this talk, I will discuss the evolution of stream processing and bring current trends and open problems to the attention of our community.
Reliable Data Ingestion in Big Data / IoT (Guido Schmutz)
Many of the Big Data and IoT use cases are based on combining data from multiple data sources and making them available on a Big Data platform for analysis. The data sources are often very heterogeneous, from simple files and databases to high-volume event streams from sensors (IoT devices). It's important to retrieve this data in a secure and reliable manner and integrate it with the Big Data platform so that it is available for analysis in real time (stream processing) as well as in batch (typical big data processing). In the past, some new tools have emerged which are especially capable of handling the process of integrating data from outside, often called Data Ingestion. From an outside perspective, they are very similar to traditional Enterprise Service Bus infrastructures, which larger organizations often use to handle message-driven and service-oriented systems. But there are also important differences: they are typically easier to scale in a horizontal fashion, offer a more distributed setup, are capable of handling high volumes of data/messages, provide very detailed monitoring at the message level, and integrate very well with the Hadoop ecosystem. This session will present and compare Apache Flume, Apache NiFi, StreamSets and the Kafka ecosystem and show how they handle data ingestion in a Big Data solution architecture.
A Journey to Modern Apps with Containers, Microservices and Big Data (Edward Hsu)
This document discusses the transition to modern enterprise applications using containers, microservices, and big data technologies. It provides examples of how Mayo Clinic and Uber have revolutionized their industries using these technologies. The key benefit of DC/OS is that it provides a datacenter operating system that simplifies deploying and managing modern apps at scale. It allows turning infrastructure into a unified pool of resources and installing distributed services like Spark and Kafka with one command. The DC/OS community contributes many open services, and it can be tried locally in under 20 minutes. Mesosphere Enterprise DC/OS provides additional support for production use.
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P... (Big Data Spain)
The talk will focus on explaining why operational databases do not scale due to limitations in legacy transactional management.
https://www.bigdataspain.org/2017/talk/end-of-the-myth-ultra-scalable-transactional-management
Big Data Spain 2017
November 16th - 17th Kinépolis Madrid
Architecting Microservices Applications with Instant Analytics (confluent)
View recording here: https://www.confluent.io/online-talks/architecting-microservices-applications-with-instant-analytics
The next-generation architecture for exploring and visualizing event-driven data in real time requires the right technology. Microservices deliver significant deployment and development agility, but raise questions of how data will move between services and how it will be analyzed. This online talk explores how Apache Druid and Apache Kafka® can turn a microservices ecosystem into a distributed real-time application with instant analytics. Apache Kafka and Druid form the backbone of an architecture that meets the demands imposed on the next-generation applications you are building right now. Join industry experts Tim Berglund, Confluent, and Rachel Pedreschi, Imply, as they discuss architecting microservices apps with Druid and Apache Kafka.
What every software engineer should know about streams and tables in Kafka ... (confluent)
This document provides an overview of streams and tables in Apache Kafka. It begins with defining events, streams, and tables. Streams record event history as a sequence, while tables represent the current state. It then discusses how to create tables from streams using aggregation. The document also covers topics, partitions, processing with ksqlDB and Kafka Streams, and other concepts like fault tolerance, elasticity, and capacity planning.
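The stream/table relationship described here reduces to a fold (a sketch in plain Python rather than ksqlDB or Kafka Streams): the stream is the full keyed event history, and the table is the running aggregation of that history, so the table can always be rebuilt by replaying the stream.

```python
def table_from_stream(stream, aggregate):
    """Materialize a table (dict) by folding each keyed event into the prior state."""
    table = {}
    for key, value in stream:
        # aggregate(old_state_or_None, new_value) -> new state for this key
        table[key] = aggregate(table.get(key), value)
    return table

# Stream of (player, points) events; table = current total per player.
stream = [("alice", 3), ("bob", 5), ("alice", 4)]
totals = table_from_stream(stream, lambda old, new: (old or 0) + new)
assert totals == {"alice": 7, "bob": 5}

# The table is derived state: replaying the same stream yields the same table.
assert table_from_stream(stream, lambda old, new: (old or 0) + new) == totals
```

Swapping the aggregation function changes the table: `lambda old, new: new` keeps only the latest value per key, which is exactly the semantics of a changelog-backed KTable.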
Jun Rao, Confluent | Kafka Summit SF 2019 Keynote ft. Chris Kasten, Walmart Labs (confluent)
Apache Kafka is a widely used open-source platform for building real-time data pipelines and streaming applications. It addresses limitations of using databases to handle high volumes of event data by providing a distributed, scalable, and fault-tolerant event streaming platform. Major companies like Royal Bank of Canada and Carnival Cruise Line rely on Kafka's capabilities for applications like fraud detection, digital marketing, and building event-driven systems.
Concepts and Patterns for Streaming Services with Kafka (QAware GmbH)
Cloud Native Night March 2020, Mainz: Talk by Perry Krol (@perkrol, Confluent)
Abstract: Proven approaches such as service-oriented and event-driven architectures are joined by newer techniques such as microservices, reactive architectures, DevOps, and stream processing. Many of these patterns are successful by themselves, but they provide a more holistic and compelling approach when applied together. In this session, Confluent will provide insights into how service-based architectures and stream processing tools such as Apache Kafka® can help you build business-critical systems. You will learn why streaming beats request-response based architectures in complex, contemporary use cases, and why replayable logs such as Kafka provide a backbone for both service communication and shared datasets.
Based on these principles, we will explore how event collaboration and event sourcing patterns increase safety and recoverability with functional, event-driven approaches, apply patterns including Event Sourcing and CQRS, and how to build multi-team systems with microservices and SOA using patterns such as “inside out databases” and “event streams as a source of truth”.
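A compact sketch of Event Sourcing and CQRS together (the account names and helpers here are invented for illustration): commands validate and append events to a log, current state is rebuilt by replaying that log, and a separate read model is projected from the very same events.

```python
events = []  # the event log: the single source of truth

def deposit(account, amount):          # command side: validate, then append
    events.append(("Deposited", account, amount))

def withdraw(account, amount):
    if balance(account) < amount:      # invariant checked against replayed state
        raise ValueError("insufficient funds")
    events.append(("Withdrawn", account, amount))

def balance(account):                  # state = replay of the event log
    total = 0
    for kind, acct, amount in events:
        if acct == account:
            total += amount if kind == "Deposited" else -amount
    return total

def statement(account):                # read model projected from the same events
    return [f"{kind} {amount}" for kind, acct, amount in events if acct == account]

deposit("acc-1", 100)
withdraw("acc-1", 30)
assert balance("acc-1") == 70
assert statement("acc-1") == ["Deposited 100", "Withdrawn 30"]
```

With Kafka as the log, the write side produces to a topic and each read model is a consumer maintaining its own projection, which is the "event streams as a source of truth" pattern mentioned above.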
Apache Kafka - Scalable Message-Processing and more! (Guido Schmutz)
Presentation @ Oracle Code Berlin.
Independent of the source of data, the integration of event streams into an Enterprise Architecture gets more and more important in the world of sensors, social media streams and the Internet of Things. Events have to be accepted quickly and reliably, and they have to be distributed and analysed, often with many consumers or systems interested in all or part of the events. How can we make sure that all these events are accepted and forwarded in an efficient and reliable way? This is where Apache Kafka comes into play: a distributed, highly scalable messaging broker, built for exchanging huge amounts of messages between a source and a target. This session will start with an introduction to Apache Kafka and present the role of Apache Kafka in a modern data / information architecture and the advantages it brings to the table.
Partner Development Guide for Kafka Connect (confluent)
This guide is intended to provide useful background to developers implementing Kafka Connect sources and sinks for their data stores. Visit www.confluent.io for more information.
User Behavior Analysis with Session Windows and Apache Kafka's Streams API (confluent)
For many industries, the need to group together related events based on a period of activity or inactivity is key. Advertising businesses and content producers are just a few examples of where session windows can be used to better understand user behavior.
While such sessionization has been possible in Apache Kafka up to this point, implementing it has been rather complex and required leveraging low-level APIs. In the most recent release of Kafka, however, new capabilities have been added making session windows much easier to implement.
In this online talk, we’ll introduce the concept of a session window, talk about common use cases, and walk through how Apache Kafka can be used for session-oriented use cases.
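The core of sessionization is just grouping by an inactivity gap; a pure-Python sketch, not the Kafka Streams `SessionWindows` API itself: events closer together than the gap fall into one session, and a longer silence starts a new one.

```python
def sessionize(timestamps, gap):
    """Split event timestamps into sessions separated by >= gap of inactivity."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] < gap:
            sessions[-1].append(ts)   # within the gap: extend the current session
        else:
            sessions.append([ts])     # inactivity gap exceeded: open a new session
    return sessions

# Clicks at t=1..3 form one session; the click at t=20 starts another.
clicks = [1, 2, 3, 20, 21]
assert sessionize(clicks, gap=10) == [[1, 2, 3], [20, 21]]
```

The hard parts that Kafka Streams adds on top of this idea are per-key windows, out-of-order arrivals (which can merge two existing sessions), and fault-tolerant window state.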
The document discusses the importance of data governance and schemas for streaming data platforms using Apache Kafka. It recommends using a schema registry to define schemas for Kafka topics, handle schema changes, and prevent incompatible changes. A schema registry provides a single source of truth for schemas, prevents bad data, and allows for increased agility when modifying schemas while maintaining compatibility. It benefits the entire application lifecycle from development to production.
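The compatibility rule a schema registry enforces can be approximated in a few lines. This is a simplification of Avro's schema-resolution rules with invented field descriptors, checking just one backward-compatibility condition: a new consumer schema may add a field only if it carries a default, or records written under the old schema cannot be decoded.

```python
def backward_compatible(old_schema, new_schema):
    """Can a consumer on new_schema read records written with old_schema?
    Schemas are {field_name: {"default": ...}} dicts; a field added by the
    new schema must carry a default, since old records won't contain it."""
    for name, spec in new_schema.items():
        if name not in old_schema and "default" not in spec:
            return False
    return True

v1 = {"user_id": {}, "amount": {}}
v2 = {"user_id": {}, "amount": {}, "currency": {"default": "USD"}}
v3 = {"user_id": {}, "amount": {}, "currency": {}}  # no default

assert backward_compatible(v1, v2)       # safe evolution
assert not backward_compatible(v1, v3)   # a registry would reject this change
```

A real registry checks this (and forward/full variants) at publish time, which is what prevents a producer from silently breaking every downstream consumer.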
Building a Modern, Scalable Cyber Intelligence Platform with Apache Kafka | J... (HostedbyConfluent)
As cyber threats continuously grow in sophistication and frequency, companies need to quickly acclimate to effectively detect, respond, and protect their environments. At Intel, we’ve addressed this need by implementing a modern, scalable Cyber Intelligence Platform (CIP) based on Splunk and Apache Kafka. We believe that CIP positions us for the best defense against cyber threats well into the future.
Our CIP ingests tens of terabytes of data each day and transforms it into actionable insights through streams processing, context-smart applications, and advanced analytics techniques. Kafka serves as a massive data pipeline within the platform. It achieves economies of scale by acquiring data once and consuming it many times. It reduces technical debt by eliminating custom point-to-point connections for producing and consuming data. At the same time, it provides the ability to operate on data in-stream, enabling us to reduce Mean Time to Detect (MTTD) and Mean Time to Respond (MTTR). Faster detection and response ultimately lead to better prevention.
In our session, we'll discuss the details described in the IT@Intel white paper that was published in November 2020 with the same title. We'll share some stream processing techniques, such as filtering and enriching in Kafka, to deliver contextually rich data to Splunk and many of our security controls.
Independent of the source of data, the integration of event streams into an Enterprise Architecture gets more and more important in the world of sensors, social media streams and the Internet of Things. Events have to be accepted quickly and reliably, and they have to be distributed and analysed, often with many consumers or systems interested in all or part of the events. Storing such huge event streams into HDFS or a NoSQL datastore is feasible and no longer much of a challenge. But if you want to be able to react fast, with minimal latency, you cannot afford to first store the data and do the analysis/analytics later. You have to be able to include part of your analytics right after you consume the event streams. Products for doing event processing, such as Oracle Event Processing or Esper, have been available for quite a long time and also used to be called Complex Event Processing (CEP). In the last 3 years, another family of products has appeared, mostly out of the Big Data technology space, called Stream Processing or Streaming Analytics. These are mostly open source products/frameworks such as Apache Storm, Spark Streaming and Apache Samza, as well as supporting infrastructure such as Apache Kafka. In this talk I will present the theoretical foundations for Event and Stream Processing, present what differences you might find between the more traditional CEP and the more modern Stream Processing solutions, and show that a combination will bring the most value.
Building an Enterprise Eventing Framework (Bryan Zelle, Centene; Neil Buesing...) (confluent)
Centene is fundamentally modernizing its legacy monolithic systems to support distributed, real-time, event-driven healthcare information processing. A key part of our architecture is the development of a universal eventing framework to accommodate transformation into an event-driven architecture (EDA). Our application provides a representational state transfer (REST) and remote procedure call (gRPC) interface that allows development teams to publish and consume events with a simple Noun-Verb-Object (NVO) syntax. Embedded within the framework are structured schema evolutions with Confluent Schema Registry and Avro, configurable (self-service) event routing with KTables, dynamic event aggregation with Kafka Streams, distributed event tracing with Jaeger, and event querying against a MongoDB event store hydrated by Kafka Connect. Lastly, we developed techniques to handle long-term event storage within Kafka, specifically surrounding the automated deletion of expired events and re-hydration of missing events. In Centene's first business use case, events related to claim processing of provider reconsiderations were used to provide real-time updates to providers on the status of their claim appeals. To satisfy the business requirement, multiple monolith systems independently leveraged the event framework to stream status updates for display on the Centene Provider Portal instantly. This provided a capability that was brand new to Centene: the ability to interact and engage with our providers in real time through the use of event streams. In this presentation, we will walk you through the architecture of the eventing framework and showcase how business requirements within our claims adjudication domain were solved by leveraging the Kafka Streams DSL and the Confluent Platform. More importantly, we will show how Centene plans on leveraging this framework, written on top of Kafka Streams, to change our culture from batch processing to real-time stream processing.
Deep Learning at Extreme Scale (in the Cloud) with the Apache Kafka Open Sou... (Kai Wähner)
How to Build a Machine Learning Infrastructure with Kafka, Connect, Streams, KSQL, etc…
This talk shows how to build Machine Learning models at extreme scale and how to productionize the built models in mission-critical real time applications by leveraging open source components in the public cloud. The session discusses the relation between TensorFlow and the Apache Kafka ecosystem - and why this is a great fit for machine learning at extreme scale.
The Machine Learning architecture includes: Kafka Connect for continuous high volume data ingestion into the public cloud, TensorFlow leveraging Deep Learning algorithms to build an analytic model on powerful GPUs, Kafka Streams for model deployment and inference in real time, and KSQL for real time analytics of predictions, alerts and model accuracy.
Sensor analytics for predictive alerting in real time is used as real world example from Internet of Things scenarios. A live demo shows the out-of-the-box integration and dynamic scalability of these components on Google Cloud.
Key takeaways for the audience
• Learn how to build a Machine Learning infrastructure at extreme scale and how to productionize the built models in mission-critical real time applications
• Understand the benefits of a machine learning platform on the public cloud
• Learn about an extreme scale Machine Learning architecture around the Apache Kafka open source ecosystem including Kafka Connect, Kafka Streams and KSQL
• See a live demo for an Internet of Things use case: Sensor analytics for predictive alerting in real time
Data scientists and data engineers love Python for transforming, filtering, and processing data to train and deploy analytic models with frameworks such as TensorFlow. However, in real-world deployments, all of these steps require a scalable and reliable infrastructure. This session shows how data experts can use Python for data processing and model inference at scale, leveraging Python, Jupyter, Apache Kafka, and KSQL.
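Stripped of the Kafka and KSQL plumbing, the inference step looks roughly like this (a sketch with an invented threshold "model" standing in for a real TensorFlow one, and made-up sensor fields): consume events, score each one, and emit alerts as a derived stream.

```python
def score(reading):
    """Stand-in for model inference; a real deployment would call a trained model."""
    return 1.0 if reading["temperature"] > 80 else 0.0

def predict_stream(readings, threshold=0.5):
    """Map a stream of sensor readings to a stream of alert events (lazily)."""
    for reading in readings:
        p = score(reading)
        if p >= threshold:
            yield {"sensor": reading["sensor"], "alert": True, "score": p}

readings = [
    {"sensor": "s1", "temperature": 70},
    {"sensor": "s2", "temperature": 95},
]
alerts = list(predict_stream(readings))
assert [a["sensor"] for a in alerts] == ["s2"]
```

In the architecture described above, `readings` would be a Kafka consumer, `score` a loaded TensorFlow model, and the yielded alerts would be produced to an output topic for KSQL to analyze.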
Talk from Oracle Code One / Oracle World 2019 in San Francisco.
Introducing Events and Stream Processing into Nationwide Building Society (Ro...) (confluent)
Facing Open Banking regulation, rapidly increasing transaction volumes and increasing customer expectations, Nationwide took the decision to take load off their back-end systems through real-time streaming of data changes into Kafka. Hear about how Nationwide started their journey with Kafka, from their initial use case of creating a real-time data cache using Change Data Capture, Kafka and Microservices to how Kafka allowed them to build a stream processing backbone used to reengineer the entire banking experience including online banking, payment processing and mortgage applications. See a working demo of the system and what happens to the system when the underlying infrastructure breaks. Technologies covered include: Change Data Capture, Kafka (Avro, partitioning and replication) and using KSQL and Kafka Streams Framework to join topics and process data.
This talk will address new architectures emerging for large-scale streaming analytics: some based on Spark, Mesos, Akka, Cassandra and Kafka (SMACK), and other newer streaming analytics platforms and frameworks using Apache Flink or GearPump. Popular architectures like Lambda separate the layers of computation and delivery, and require many technologies with overlapping functionality. This can result in duplicated code, untyped processes, and high operational overhead, not to mention the cost (e.g. ETL).
I will discuss the problem domain and what is needed in terms of strategies, architecture and application design and code to begin leveraging simpler data flows. We will cover how the particular set of technologies addresses common requirements and how collaboratively they work together to enrich and reinforce each other.
Building Reactive Distributed Systems For Streaming Big Data, Analytics & Mac... (Helena Edelson)
Building Self Healing, Intelligent Platforms, systems that learn, multi-datacenter, removing human intervention with ML. Reactive Summit 2016 @helenaedelson
The document discusses using Saga patterns and event sourcing with Kafka. It begins with introductions of Rafael Benevides and Roan Brasil Monteiro. It then provides an overview of moving from a monolithic to microservices architecture and challenges with synchronous calls. It introduces event sourcing, command sourcing, and Saga patterns including choreography-based and orchestration-based approaches. It discusses using Kafka streams to create an orchestrator and demonstrates Saga patterns with a booking room use case. It provides a link to a demo implementation on GitHub.
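An orchestration-based Saga of the kind described can be sketched as a list of steps paired with compensating actions, run in reverse on failure. The booking-room step names are hypothetical; a real implementation would exchange commands and replies over Kafka topics rather than call functions directly.

```python
# Illustrative sketch of an orchestration-based Saga: each step has a
# compensating action, executed in reverse order if a later step fails.
# Step names are hypothetical stand-ins for the booking-room use case.

def run_saga(steps):
    """steps: list of (do, undo) callables. Returns (ok, log)."""
    log, done = [], []
    for do, undo in steps:
        try:
            log.append(do())
            done.append(undo)
        except Exception as exc:
            log.append(f"failed: {exc}")
            for comp in reversed(done):   # compensate in reverse order
                log.append(comp())
            return False, log
    return True, log

def book_room():   return "room booked"
def cancel_room(): return "room cancelled"
def charge_card(): raise RuntimeError("payment declined")
def refund_card(): return "refund issued"

ok, log = run_saga([(book_room, cancel_room), (charge_card, refund_card)])
print(ok, log)  # False ['room booked', 'failed: payment declined', 'room cancelled']
```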
Kafka Summit SF 2019 - The Art of the Event-Streaming App (Neil Avery)
Have you ever imagined what it would be like to build a massively scalable streaming application on Kafka, the challenges, the patterns and the thought process involved? How much of the application can be reused? What patterns will you discover? How does it all fit together? Depending upon your use case and business, this can mean many things. Starting out with a data pipeline is one thing, but evolving into a company-wide real-time application that is business critical and entirely dependent upon a streaming platform is a giant leap. Large-scale streaming applications are also called event streaming applications. They are classically different from other data systems; event streaming applications are viewed as a series of interconnected streams that are topologically defined using stream processors; they hold state that models your use case as events. Almost like a deconstructed realtime database.
In this talk, I step through the origins of event streaming systems, understanding how they are developed from raw events to evolve into something that can be adopted at an organizational scale. I start with event-first thinking and Domain Driven Design to build data models that work with the fundamentals of Streams, Kafka Streams, KSQL and Serverless (FaaS). Building upon this, I explain how to build common business functionality by stepping through patterns for: scalable payment processing; running it on rails (instrumentation and monitoring); and control flow (start, stop, pause). Finally, all of these concepts are combined in a solution architecture that can be used at enterprise scale. I will introduce enterprise patterns such as events-as-a-backbone, events as APIs, and methods for governance and self-service. You will leave the talk with an understanding of how to model events with event-first thinking, how to work towards reusable streaming patterns and, most importantly, how it all fits together at scale.
From data stream management to distributed dataflows and beyond (Vasia Kalavri)
Recent efforts by academia and open-source communities have established stream processing as a principal data analysis technology across industry. All major cloud vendors offer streaming dataflow pipelines and online analytics as managed services. Notable use cases include real-time fault detection in space networks, city traffic management, dynamic pricing for car-sharing, and anomaly detection in financial transactions. At the same time, streaming dataflow systems are increasingly being used for event-driven applications beyond analytics, such as orchestrating microservices and model serving. Streaming technology has evolved significantly in the past decades; however, emerging applications are once more challenging the design decisions of modern streaming systems. In this talk, I will discuss the evolution of stream processing and bring current trends and open problems to the attention of our community.
Reliable Data Ingestion in Big Data / IoT (Guido Schmutz)
Many Big Data and IoT use cases are based on combining data from multiple data sources and making it available on a Big Data platform for analysis. The data sources are often very heterogeneous, from simple files and databases to high-volume event streams from sensors (IoT devices). It's important to retrieve this data in a secure and reliable manner and integrate it with the Big Data platform so that it is available for analysis in real time (stream processing) as well as in batch (typical big data processing). In the past, new tools have emerged which are especially capable of handling the process of integrating data from outside, often called Data Ingestion. From an outside perspective, they are very similar to traditional Enterprise Service Bus infrastructures, which are often in use in larger organizations to handle message-driven and service-oriented systems. But there are also important differences: they are typically easier to scale horizontally, offer a more distributed setup, are capable of handling high volumes of data/messages, provide very detailed monitoring at message level, and integrate very well with the Hadoop ecosystem. This session will present and compare Apache Flume, Apache NiFi, StreamSets and the Kafka ecosystem, and show how they handle data ingestion in a Big Data solution architecture.
A Journey to Modern Apps with Containers, Microservices and Big Data (Edward Hsu)
This document discusses the transition to modern enterprise applications using containers, microservices, and big data technologies. It provides examples of how Mayo Clinic and Uber have revolutionized their industries using these technologies. The key benefits of DC/OS are that it provides a datacenter operating system that simplifies deploying and managing modern apps at scale. It allows turning infrastructure into a unified pool of resources and installing distributed services like Spark, Kafka with one command. The DC/OS community contributes many open services and it can be tried locally in under 20 minutes. Mesosphere Enterprise DC/OS provides additional support for production use.
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P... (Big Data Spain)
The talk will focus on explaining why operational databases do not scale due to limitations in legacy transactional management.
https://www.bigdataspain.org/2017/talk/end-of-the-myth-ultra-scalable-transactional-management
Big Data Spain 2017
November 16th - 17th Kinépolis Madrid
Architecting Microservices Applications with Instant Analytics (confluent)
View recording here: https://www.confluent.io/online-talks/architecting-microservices-applications-with-instant-analytics
The next generation architecture for exploring and visualizing event-driven data in real-time requires the right technology. Microservices deliver significant deployment and development agility, but raise questions of how data will move between services and how it will be analyzed. This online talk explores how Apache Druid and Apache Kafka® can turn a microservices ecosystem into a distributed real-time application with instant analytics. Apache Kafka and Druid form the backbone of an architecture that meets the demands imposed on the next generation applications you are building right now. Join industry experts Tim Berglund, Confluent, and Rachel Pedreschi, Imply, as they discuss architecting microservices apps with Druid and Apache Kafka.
What every software engineer should know about streams and tables in Kafka... (confluent)
This document provides an overview of streams and tables in Apache Kafka. It begins with defining events, streams, and tables. Streams record event history as a sequence, while tables represent the current state. It then discusses how to create tables from streams using aggregation. The document also covers topics, partitions, processing with ksqlDB and Kafka Streams, and other concepts like fault tolerance, elasticity, and capacity planning.
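The stream/table duality described above, a table as the aggregation of a stream, can be sketched without Kafka at all. Here a running per-key sum plays the role of a KTable; this is a conceptual illustration, not the Kafka Streams API.

```python
# Illustrative sketch: a "table" is the fold of a "stream" of events.
# Events are plain dicts; the dict returned plays the role of a KTable.

def table_from_stream(events):
    """Fold an event stream into current state (running sum per key)."""
    table = {}
    for ev in events:
        table[ev["key"]] = table.get(ev["key"], 0) + ev["amount"]
    return table

stream = [
    {"key": "alice", "amount": 10},
    {"key": "bob",   "amount": 5},
    {"key": "alice", "amount": 7},
]
print(table_from_stream(stream))  # {'alice': 17, 'bob': 5}
```

The stream records the full history; the table holds only the latest state per key, which is exactly the distinction the document draws.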
Jun Rao, Confluent | Kafka Summit SF 2019 Keynote ft. Chris Kasten, Walmart Labs (confluent)
Apache Kafka is a widely used open-source platform for building real-time data pipelines and streaming applications. It addresses limitations of using databases to handle high volumes of event data by providing a distributed, scalable, and fault-tolerant event streaming platform. Major companies like Royal Bank of Canada and Carnival Cruise Line rely on Kafka's capabilities for applications like fraud detection, digital marketing, and building event-driven systems.
Concepts and Patterns for Streaming Services with Kafka (QAware GmbH)
Cloud Native Night March 2020, Mainz: Talk by Perry Krol (@perkrol, Confluent)
Abstract: Proven approaches such as service-oriented and event-driven architectures are joined by newer techniques such as microservices, reactive architectures, DevOps, and stream processing. Many of these patterns are successful by themselves, but they provide a more holistic and compelling approach when applied together. In this session Confluent will provide insights into how service-based architectures and stream processing tools such as Apache Kafka® can help you build business-critical systems. You will learn why streaming beats request-response based architectures in complex, contemporary use cases, and why replayable logs such as Kafka provide a backbone for both service communication and shared datasets.
Based on these principles, we will explore how event collaboration and event sourcing patterns increase safety and recoverability with functional, event-driven approaches, how to apply patterns including Event Sourcing and CQRS, and how to build multi-team systems with microservices and SOA using patterns such as “inside out databases” and “event streams as a source of truth”.
Apache Kafka - Scalable Message-Processing and More! (Guido Schmutz)
Presentation @ Oracle Code Berlin.
Independent of the source of data, the integration of event streams into an Enterprise Architecture gets more and more important in the world of sensors, social media streams and the Internet of Things. Events have to be accepted quickly and reliably; they have to be distributed and analysed, often with many consumers or systems interested in all or part of the events. How can we make sure that all these events are accepted and forwarded in an efficient and reliable way? This is where Apache Kafka comes into play: a distributed, highly scalable messaging broker, built for exchanging huge amounts of messages between a source and a target. This session will start with an introduction to Apache Kafka, present its role in a modern data / information architecture, and outline the advantages it brings to the table.
Partner Development Guide for Kafka Connect (confluent)
This guide is intended to provide useful background to developers implementing Kafka Connect sources and sinks for their data stores. Visit www.confluent.io for more information.
User Behavior Analysis with Session Windows and Apache Kafka's Streams API (confluent)
For many industries the need to group together related events based on a period of activity or inactivity is key. Advertising businesses, content producers are just a few examples of where session windows can be used to better understand user behavior.
While such sessionization has been possible in Apache Kafka up to this point, implementing it has been rather complex and required leveraging low-level APIs. In the most recent release of Kafka, however, new capabilities have been added making session windows much easier to implement.
In this online talk, we’ll introduce the concept of a session window, talk about common use cases, and walk through how Apache Kafka can be used for session-oriented use cases.
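The session-window semantics described above can be sketched in a few lines: events belong to the same session until a gap of inactivity exceeds a threshold. This is an illustration of the concept only, not the Kafka Streams API, and the 30-unit gap is an arbitrary assumption.

```python
# Illustrative sketch of session windows: events for one user are grouped
# into sessions separated by a gap of inactivity (here > 30 time units).

def sessionize(timestamps, gap=30):
    """Group sorted event timestamps into sessions split on inactivity > gap."""
    sessions, current = [], []
    for ts in sorted(timestamps):
        if current and ts - current[-1] > gap:
            sessions.append(current)     # the inactivity gap closes a session
            current = []
        current.append(ts)
    if current:
        sessions.append(current)
    return sessions

print(sessionize([0, 10, 25, 100, 110, 200]))
# [[0, 10, 25], [100, 110], [200]]
```

A real Kafka Streams job would additionally key the events by user and merge sessions as late events arrive; the grouping rule, however, is the one shown.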
The document discusses the importance of data governance and schemas for streaming data platforms using Apache Kafka. It recommends using a schema registry to define schemas for Kafka topics, handle schema changes, and prevent incompatible changes. A schema registry provides a single source of truth for schemas, prevents bad data, and allows for increased agility when modifying schemas while maintaining compatibility. It benefits the entire application lifecycle from development to production.
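The kind of check a schema registry performs can be sketched with a deliberately simplified schema model, a mapping from field name to whether the field has a default, rather than real Avro. The rule shown (newly added fields must carry defaults so new readers can still read old data) is one facet of backward compatibility.

```python
# Illustrative sketch of a backward-compatibility check, simplified:
# schemas are {field_name: has_default} dicts, not real Avro schemas.

def backward_compatible(old, new):
    """A new schema is accepted only if every added field has a default."""
    added = set(new) - set(old)
    return all(new[f] for f in added)

old = {"id": False, "name": False}
ok_change  = {"id": False, "name": False, "email": True}   # default provided
bad_change = {"id": False, "name": False, "email": False}  # no default
print(backward_compatible(old, ok_change))   # True
print(backward_compatible(old, bad_change))  # False
```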
Building a Modern, Scalable Cyber Intelligence Platform with Apache Kafka | J... (HostedbyConfluent)
As cyber threats continuously grow in sophistication and frequency, companies need to quickly acclimate to effectively detect, respond, and protect their environments. At Intel, we’ve addressed this need by implementing a modern, scalable Cyber Intelligence Platform (CIP) based on Splunk and Apache Kafka. We believe that CIP positions us for the best defense against cyber threats well into the future.
Our CIP ingests tens of terabytes of data each day and transforms it into actionable insights through streams processing, context-smart applications, and advanced analytics techniques. Kafka serves as a massive data pipeline within the platform. It achieves economies of scale by acquiring data once and consuming it many times. It reduces technical debt by eliminating custom point-to-point connections for producing and consuming data. At the same time, it provides the ability to operate on data in-stream, enabling us to reduce Mean Time to Detect (MTTD) and Mean Time to Respond (MTTR). Faster detection and response ultimately lead to better prevention.
In our session, we’ll discuss the details described in the IT@Intel white paper that was published in Nov 2020 with same title. We’ll share some stream processing techniques, such as filtering and enriching in Kafka to deliver contextually rich data to Splunk and many of our security controls.
Independent of the source of data, the integration of event streams into an Enterprise Architecture gets more and more important in the world of sensors, social media streams and the Internet of Things. Events have to be accepted quickly and reliably; they have to be distributed and analysed, often with many consumers or systems interested in all or part of the events. Storing such huge event streams in HDFS or a NoSQL datastore is feasible and not such a challenge anymore. But if you want to be able to react fast, with minimal latency, you cannot afford to first store the data and do the analysis/analytics later. You have to be able to include part of your analytics right after you consume the event streams. Products for event processing, such as Oracle Event Processing or Esper, have been available for quite a long time, and this space used to be called Complex Event Processing (CEP). In the last three years, another family of products has appeared, mostly out of the Big Data technology space, called Stream Processing or Streaming Analytics. These are mostly open source products/frameworks such as Apache Storm, Spark Streaming and Apache Samza, as well as supporting infrastructure such as Apache Kafka. In this talk I will present the theoretical foundations of Event and Stream Processing, discuss the differences you might find between the more traditional CEP and the more modern Stream Processing solutions, and show that a combination brings the most value.
Building an Enterprise Eventing Framework (Bryan Zelle, Centene; Neil Buesing...) (confluent)
Centene is fundamentally modernizing its legacy monolithic systems to support distributed, real-time event-driven healthcare information processing. A key part of our architecture is the development of a universal eventing framework to accommodate transformation into an event-driven architecture (EDA). Our application provides a representational state transfer (REST) and remote procedure call (gRPC) interface that allows development teams to publish and consume events with a simple Noun-Verb-Object (NVO) syntax. Embedded within the framework are structured schema evolutions with Confluent Schema Registry and AVRO, configurable (self-service) event-routing with K-Tables, dynamic event-aggregation with Kafka Streams, distributed event-tracing with Jaeger, and event querying against a MongoDB event-store hydrated by Kafka Connect. Lastly, we developed techniques to handle long-term event storage within Kafka; specifically surrounding the automated deletion of expired events and re-hydration of missing events. In Centene's first business use case, events related to claim processing of provider reconsiderations were used to provide real-time updates to providers on the status of their claim appeals. To satisfy the business requirement, multiple monolith systems independently leveraged the event framework to stream status updates for display on the Centene Provider Portal instantly. This provided a capability that was brand new to Centene: the ability to interact and engage with our providers in real time through the use of event streams. In this presentation, we will walk you through the architecture of the eventing framework and showcase how the business requirements within our claims adjudication domain were solved leveraging the Kafka Streams DSL and the Confluent Platform. And more importantly, how Centene plans on leveraging this framework, written on top of Kafka Streams, to change our culture from batch processing to real-time stream processing.
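A Noun-Verb-Object event can be sketched as a simple envelope. The field names below are assumptions made for illustration, not Centene's actual framework API.

```python
# Hypothetical sketch of a Noun-Verb-Object (NVO) event envelope like the
# one described above; the field names are assumptions, not Centene's API.

def make_event(noun, verb, obj, payload):
    """Build an NVO event: what entity (noun) did what (verb) to what (object)."""
    return {"noun": noun, "verb": verb, "object": obj, "payload": payload}

ev = make_event("Claim", "StatusChanged", "Reconsideration",
                {"claim_id": "C-1", "status": "approved"})
print(ev["noun"], ev["verb"], ev["object"])  # Claim StatusChanged Reconsideration
```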
Deep Learning at Extreme Scale (in the Cloud) with the Apache Kafka Open Sou... (Kai Wähner)
How to Build a Machine Learning Infrastructure with Kafka, Connect, Streams, KSQL, etc…
This talk shows how to build Machine Learning models at extreme scale and how to productionize the built models in mission-critical real time applications by leveraging open source components in the public cloud. The session discusses the relation between TensorFlow and the Apache Kafka ecosystem - and why this is a great fit for machine learning at extreme scale.
The Machine Learning architecture includes: Kafka Connect for continuous high volume data ingestion into the public cloud, TensorFlow leveraging Deep Learning algorithms to build an analytic model on powerful GPUs, Kafka Streams for model deployment and inference in real time, and KSQL for real time analytics of predictions, alerts and model accuracy.
This document discusses data intensive applications and some of the challenges, tools, and best practices related to them. The key challenges with data intensive applications include large quantities of data, complex data structures, and rapidly changing data. Common tools mentioned include NoSQL databases, message queues, caches, search indexes, and batch/stream processing frameworks. The document also discusses concepts like distributed systems architectures, outage case studies, and strategies for improving reliability, scalability, and maintainability in data systems. Engineers working in this field need an accurate understanding of various tools and how to apply the right tools for different use cases while avoiding common pitfalls.
Relational Database Stockholm Syndrome (Neal Murray, 6 Point 6) London 2019 C... (confluent)
Relational databases can become rigid and limit flexibility over time as data needs change. This can lead to services becoming tightly coupled and difficult to independently deploy (Relational Database Stockholm Syndrome). The document discusses an alternative approach that uses a distributed log (Apache Kafka) to store data as events, with domain-specific services processing these events independently. This allows for greater agility, flexibility and independent deployment of services.
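Storing data as events in a distributed log, with services consuming independently, can be sketched as an append-only list plus per-service offsets. This is a toy stand-in for Kafka topics and consumer offsets, not the described system itself.

```python
# Illustrative sketch of a distributed log: one append-only event sequence,
# with each domain service tracking its own read offset independently, so
# services can be deployed, lag, or replay without affecting each other.

class Log:
    def __init__(self):
        self.events = []

    def append(self, event):
        self.events.append(event)

    def read_from(self, offset):
        """Return all events at or after the given offset."""
        return self.events[offset:]

log = Log()
log.append({"type": "AccountOpened", "id": 1})
log.append({"type": "Deposited", "id": 1, "amount": 50})

billing_offset, audit_offset = 0, 1        # services progress independently
print(len(log.read_from(billing_offset)))  # 2 (billing replays everything)
print(len(log.read_from(audit_offset)))    # 1 (audit already saw the first)
```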
This document discusses patterns for scaling systems incrementally. It introduces the ACD/C approach of making systems async, caching results, distributing work, and compromising on consistency as needed. Specific architectures like map reduce and distributed queues are presented. The challenges of partial failures, upgrades, and changing topologies are discussed. Testing is emphasized as critical for managing scaled systems.
Hard Truths About Streaming and Eventing (Dan Rosanova, Microsoft) Kafka Summ... (confluent)
Eventing and streaming open a world of compelling new possibilities to our software and platform designs. They can reduce time to decision and action while lowering total platform cost. But they are not a panacea. Understanding the edges and limits of these architectures can help you avoid painful missteps. This talk will focus on event driven and streaming architectures and how Apache Kafka can help you implement these. It will also discuss key tradeoffs you will face along the way from partitioning schemes to the impact of availability vs. consistency (CAP Theorem). Finally we’ll discuss some challenges of scale for patterns like Event Sourcing and how you can use other tools and even features of Kafka to work around them. This talk assumes a basic understanding of Kafka and distributed computing, but will include brief refresher sections.
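One common workaround for Event Sourcing at scale, of the kind the talk alludes to, is snapshotting: periodically persist state plus a version, then replay only newer events. A minimal sketch, assuming integer state and additive events:

```python
# Illustrative sketch of snapshotting for Event Sourcing: rebuilding state
# replays only the events after the last snapshot, not the whole history.

def apply(state, event):
    return state + event["delta"]

def rebuild(events, snapshot=None):
    """Rebuild state from an optional (state, version) snapshot plus later events."""
    state, version = snapshot or (0, 0)
    for ev in events[version:]:
        state = apply(state, ev)
    return state

events = [{"delta": d} for d in (5, -2, 10, 1)]
full = rebuild(events)                    # replay all 4 events
fast = rebuild(events, snapshot=(13, 3))  # snapshot at version 3, replay 1
print(full, fast)  # 14 14
```

Both paths yield the same state; the snapshot path simply bounds the replay cost, which is the point at scale.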
Chapter: Introduction to Distributed Systems.pptx (Tekle12)
This document provides an introduction to distributed systems. It discusses that a distributed system connects autonomous computers through a network to act as a single system. Key characteristics include distribution of resources, concurrency, and failure independence. Examples given are the internet, cloud computing, and peer-to-peer networks. The document also outlines several design goals for distributed systems like scalability, reliability, performance, and transparency. Finally, it describes different types of distributed systems including cluster computing, grid computing, cloud computing, and internet of things systems.
Building large scale, job processing systems with Scala Akka Actor framework (Vignesh Sukumar)
The document discusses building massive scale, fault tolerant job processing systems using the Scala Akka framework. It describes implementing a master-slave architecture with actors where an agent runs on each storage node to process jobs locally, achieving high throughput. It also covers controlling system load by dynamically adjusting parallelism, and implementing fine-grained fault tolerance through actor supervision strategies.
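The restart-style supervision strategy described can be illustrated in plain Python. The talk uses Scala/Akka; this is only a conceptual stand-in for a supervisor's restart directive with an escalation limit.

```python
# Illustrative sketch (plain Python, not Akka) of a restart supervision
# strategy: the supervisor restarts a failing child task up to a limit,
# then escalates the failure, mirroring fine-grained fault tolerance.

def supervise(task, max_restarts=3):
    """Run task; on failure restart it, escalating after max_restarts."""
    for attempt in range(max_restarts + 1):
        try:
            return task(attempt)
        except Exception:
            continue                      # restart the "child"
    raise RuntimeError("escalated: restart limit exceeded")

def flaky(attempt):
    if attempt < 2:                       # fails twice, then succeeds
        raise IOError("transient failure")
    return "job done"

print(supervise(flaky))  # job done
```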
The document discusses evolving data warehousing strategies and architecture options for implementing a modern data warehousing environment. It begins by describing traditional data warehouses and their limitations, such as lack of timeliness, flexibility, quality, and findability of data. It then discusses how data warehouses are evolving to be more modern by handling all types and sources of data, providing real-time access and self-service capabilities for users, and utilizing technologies like Hadoop and the cloud. Key aspects of a modern data warehouse architecture include the integration of data lakes, machine learning, streaming data, and offering a variety of deployment options. The document also covers data lake objectives, challenges, and implementation options for storing and analyzing large amounts of diverse data sources.
Observability – the good, the bad, and the ugly (Timetrix)
This document discusses observability and incident management. It notes that incidents are expensive and reduce credibility. Common causes of outages include changes, network failures, bugs, human errors, hardware failures, and unspecified issues. The timeline of an outage includes detection, investigation, escalation, and fixing. Many companies have a "zoo" of monitoring solutions that are difficult to manage. Common anti-patterns include an exponential growth of metrics that nobody understands. The document advocates focusing on key performance indicator metrics and using time-series databases, distributed tracing, and machine learning to more quickly detect anomalies and reduce incident timelines. It describes an open source project called Timetrix that combines metrics, events and traces for improved observability.
Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/1M9hGVj.
Helena Edelson addresses new architectures emerging for large scale streaming analytics - based on Spark, Mesos, Akka, Cassandra and Kafka (SMACK) and other streaming analytics platforms and frameworks using Apache Flink or GearPump. Edelson discusses the problem domain and what is needed in terms of strategies, architecture and application design and code to begin leveraging simpler data flows. Filmed at qconsf.com.
Helena Edelson is a committer to the Spark Cassandra Connector and a contributor to Akka, adding new features in Akka Cluster such as the initial version of the cluster metrics API and AdaptiveLoadBalancingRouter.
This webinar by Orkhan Gasimov (Senior Solution Architect, Consultant, GlobalLogic) was delivered at Java Community Webinar #3 on October 16, 2020.
During the webinar we gave a simplified overview of classical and modern architecture patterns and concepts used in the development of distributed applications over the last decade.
More details and presentation: https://www.globallogic.com/ua/about/events/java-community-webinar-3/
The adoption of container native and cloud native development practices presents new operational challenges. Today’s microservice environments are polyglot, distributed, container-based, highly-scalable, and ephemeral. To understand your system, you need to be able to follow the life of a request across numerous components distributed in multiple environments. Without the proper tools it can feel impossible to determine a root cause of an issue. This requires a new approach to operations. We will review a series of open source observability tools for logging, monitoring, and tracing to help developers achieve operational excellence for running container-based workloads.
Harness the Power of Data in a Big Data Lake discusses strategies for ingesting and processing data in a data lake. It describes how to design a data ingestion framework that accounts for factors like data format, source, size, and location. The document contrasts ETL vs ELT approaches and discusses techniques for batched and change data capture ingestion of both structured and unstructured data. It also provides an overview of tools like Sqoop that can be used to ingest data from relational databases into a data lake.
SignalFx Elasticsearch Metrics Monitoring and Alerting (SignalFx)
From our Feb 25, 2016 webcast on operating Elasticsearch at scale, the metrics to monitor, and how to create low-noise meaningful alerts on Elasticsearch performance.
Product Information - Fuse Management Central 1.0.0 (antonio.carvalho)
Fuse Management Central is an administration platform for OpenText Content Suite/Extended ECM, enabling centralized management of the system while monitoring its components.
Due to its architecture, it separates system administration from business administration, introducing a new layer of security to OpenText Content Suite administration.
Performing Oracle Health Checks Using APEX (Datavail)
With the heavy workload that most, if not all, DBAs face, it’s no wonder there is little time left to perform routine health checks. This presentation deck reviews the real value of health checks, based on the thousands of them performed for clients and how APEX can be used to standardize health checks.
The document discusses transitioning from a monolithic architecture to microservices architecture for an IoT cloud platform. Some key points include:
- Goals include enabling scalability, supporting new markets, and fostering innovation.
- Moving to a microservices architecture can help with scalability, fault tolerance, and independent deployability compared to a monolith.
- Organizational structure should also transition from function-based to product-based to align with the architecture.
- Technical considerations in building microservices include service interfaces, data management, fault tolerance, and DevOps practices.
Similar to Toward Predictability and Stability (20)
Humans have a tendency to invent new problems rather than solve old ones. As we build larger, more complex systems, we unearth global challenges around networks, compute resources and data. Have we neglected to see more elegant examples which existed all along?
It is possible for even the most complex systems to be organized and simplified in ways that may not occur to us. In situations where we still search for the right algorithms, by turning to complex natural systems around us we can find the problem was solved long ago. What we think is a new protocol may in fact be one that has been tested and evolving over hundreds or millions of years. One invented for the early internet is incredibly similar to a strategy evolved by desert ants millions of years ago. And this is why it works.
This talk will address these questions with examples of self-organization, decentralization and diversification from emergent phenomena found in nature.
Disorder And Tolerance In Distributed Systems At ScaleHelena Edelson
Rethinking intelligent resilient systems. Re-framing problems changes how we see and solve them. The intersection of scientific thought and principles parallels much of what we solve as engineers of information (e.g. uncertainty, time, distribution) and need. This talk is an interdisciplinary look at complex adaptive systems and how they innately solve things like resource distribution, growth and rebalancing. From the context of intelligence and systems, this talk will look at ideas around entropy and time, ensemble forecasting, self-organization theory, the butterfly effect, virus-human co-evolution and adaption, natural feedback loops, self-balancing, and adaptation.
Can we leverage these principles, behaviors and strategies to design intelligent systems at scale?
Can seeing things in an interdisciplinary way benefit solving common problems and speed innovation?
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch AnalysisHelena Edelson
Slides from my talk with Evan Chan at Strata San Jose: NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch Analysis. Streaming analytics architecture in big data for fast streaming, ad hoc and batch, with Kafka, Spark Streaming, Akka, Mesos, Cassandra and FiloDB. Simplifying to a unified architecture.
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...Helena Edelson
O'Reilly Webcast with Myself and Evan Chan on the new SNACK Stack (playoff of SMACK) with FIloDB: Scala, Spark Streaming, Akka, Cassandra, FiloDB and Kafka.
Streaming Analytics with Spark, Kafka, Cassandra and AkkaHelena Edelson
This document discusses a new approach to building scalable data processing systems using streaming analytics with Spark, Kafka, Cassandra, and Akka. It proposes moving away from architectures like Lambda and ETL that require duplicating data and logic. The new approach leverages Spark Streaming for a unified batch and stream processing runtime, Apache Kafka for scalable messaging, Apache Cassandra for distributed storage, and Akka for building fault tolerant distributed applications. This allows building real-time streaming applications that can join streaming and historical data with simplified architectures that remove the need for duplicating data extraction and loading.
Lambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, ScalaHelena Edelson
Scala Days, Amsterdam, 2015: Lambda Architecture - Batch and Streaming with Spark, Cassandra, Kafka, Akka and Scala; Fault Tolerance, Data Pipelines, Data Flows, Data Locality, Akka Actors, Spark, Spark Cassandra Connector, Big Data, Asynchronous data flows. Time series data, KillrWeather, Scalable Infrastructure, Partition For Scale, Replicate For Resiliency, Parallelism
Isolation, Data Locality, Location Transparency
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...Helena Edelson
Regardless of the meaning we are searching for over our vast amounts of data, whether we are in science, finance, technology, energy, health care…, we all share the same problems that must be solved: How do we achieve that? What technologies best support the requirements? This talk is about how to leverage fast access to historical data with real time streaming data for predictive modeling for lambda architecture with Spark Streaming, Kafka, Cassandra, Akka and Scala. Efficient Stream Computation, Composable Data Pipelines, Data Locality, Cassandra data model and low latency, Kafka producers and HTTP endpoints as akka actors...
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)Helena Edelson
This document provides an overview of streaming big data with Spark, Kafka, Cassandra, Akka, and Scala. It discusses delivering meaning in near-real time at high velocity and an overview of Spark Streaming, Kafka and Akka. It also covers Cassandra and the Spark Cassandra Connector as well as integration in big data applications. The presentation is given by Helena Edelson, a Spark Cassandra Connector committer and Akka contributor who is a Scala and big data conference speaker working as a senior software engineer at DataStax.
Delivering Meaning In Near-Real Time At High Velocity In Massive Scale with A...Helena Edelson
Streaming Big Data: Delivering Meaning In Near-Real Time At High Velocity At Massive Scale with Apache Spark, Apache Kafka, Apache Cassandra, Akka and the Spark Cassandra Connector. Why this pairing of technologies and How easy it is to implement. Example application: https://github.com/killrweather/killrweather
A tale of scale & speed: How the US Navy is enabling software delivery from l...sonjaschweigert1
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATO’s (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images
Building RAG with self-deployed Milvus vector database and Snowpark Container...Zilliz
This talk will give hands-on advice on building RAG applications with an open-source Milvus database deployed as a docker container. We will also introduce the integration of Milvus with Snowpark Container Services.
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices desire to take full advantage of the features
available on those devices, but many of the features provide convenience and capability but sacrifice security. This best practices guide outlines steps the users can take to better protect personal devices and information.
Essentials of Automations: The Art of Triggers and Actions in FMESafe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
UiPath Test Automation using UiPath Test Suite series, part 5DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 5. In this session, we will cover CI/CD with devops.
Topics covered:
CI/CD with in UiPath
End-to-end overview of CI/CD pipeline with Azure devops
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
Communications Mining Series - Zero to Hero - Session 1DianaGray10
This session provides introduction to UiPath Communication Mining, importance and platform overview. You will acquire a good understand of the phases in Communication Mining as we go over the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How can it help today’s business and the benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIVladimir Iglovikov, Ph.D.
Presented by Vladimir Iglovikov:
- https://www.linkedin.com/in/iglovikov/
- https://x.com/viglovikov
- https://www.instagram.com/ternaus/
This presentation delves into the journey of Albumentations.ai, a highly successful open-source library for data augmentation.
Created out of a necessity for superior performance in Kaggle competitions, Albumentations has grown to become a widely used tool among data scientists and machine learning practitioners.
This case study covers various aspects, including:
People: The contributors and community that have supported Albumentations.
Metrics: The success indicators such as downloads, daily active users, GitHub stars, and financial contributions.
Challenges: The hurdles in monetizing open-source projects and measuring user engagement.
Development Practices: Best practices for creating, maintaining, and scaling open-source libraries, including code hygiene, CI/CD, and fast iteration.
Community Building: Strategies for making adoption easy, iterating quickly, and fostering a vibrant, engaged community.
Marketing: Both online and offline marketing tactics, focusing on real, impactful interactions and collaborations.
Mental Health: Maintaining balance and not feeling pressured by user demands.
Key insights include the importance of automation, making the adoption process seamless, and leveraging offline interactions for marketing. The presentation also emphasizes the need for continuous small improvements and building a friendly, inclusive community that contributes to the project's growth.
Vladimir Iglovikov brings his extensive experience as a Kaggle Grandmaster, ex-Staff ML Engineer at Lyft, sharing valuable lessons and practical advice for anyone looking to enhance the adoption of their open-source projects.
Explore more about Albumentations and join the community at:
GitHub: https://github.com/albumentations-team/albumentations
Website: https://albumentations.ai/
LinkedIn: https://www.linkedin.com/company/100504475
Twitter: https://x.com/albumentations
Climate Impact of Software Testing at Nordic Testing DaysKari Kakkonen
My slides at Nordic Testing Days 6.6.2024
Climate impact / sustainability of software testing discussed on the talk. ICT and testing must carry their part of global responsibility to help with the climat warming. We can minimize the carbon footprint but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be added with sustainability, and then measured continuously. Test environments can be used less, and in smaller scale and on demand. Test techniques can be used in optimizing or minimizing number of tests. Test automation can be used to speed up testing.
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Speck&Tech
ABSTRACT: A prima vista, un mattoncino Lego e la backdoor XZ potrebbero avere in comune il fatto di essere entrambi blocchi di costruzione, o dipendenze di progetti creativi e software. La realtà è che un mattoncino Lego e il caso della backdoor XZ hanno molto di più di tutto ciò in comune.
Partecipate alla presentazione per immergervi in una storia di interoperabilità, standard e formati aperti, per poi discutere del ruolo importante che i contributori hanno in una comunità open source sostenibile.
BIO: Sostenitrice del software libero e dei formati standard e aperti. È stata un membro attivo dei progetti Fedora e openSUSE e ha co-fondato l'Associazione LibreItalia dove è stata coinvolta in diversi eventi, migrazioni e formazione relativi a LibreOffice. In precedenza ha lavorato a migrazioni e corsi di formazione su LibreOffice per diverse amministrazioni pubbliche e privati. Da gennaio 2020 lavora in SUSE come Software Release Engineer per Uyuni e SUSE Manager e quando non segue la sua passione per i computer e per Geeko coltiva la sua curiosità per l'astronomia (da cui deriva il suo nickname deneb_alpha).
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...Zilliz
Join us to introduce Milvus Lite, a vector database that can run on notebooks and laptops, share the same API with Milvus, and integrate with every popular GenAI framework. This webinar is perfect for developers seeking easy-to-use, well-integrated vector databases for their GenAI apps.
2. @helenaedelson
Helena Edelson
● Principal Engineer @ Lightbend
● Member of the Akka team
● Former: Apple, Crowdstrike, VMware,
SpringSource, Tuplejump
● github.com/helena
● twitter.com/helenaedelson
● speakerdeck.com/helenaedelson
Data, Analytics & ML Platform Infrastructure and Cloud Engineer
Former biologist
4. @helenaedelson
When systems reach a critical level of dynamism we have to change our way of
modeling and designing them
• Stateful in a stateless world
• Automation of everything - Ops, *aaS platforms
• Persistence strategies across DCs, zones and regions
• Data and query optimization
• System availability and stability in all states of deployment and rolling restarts
• Leveraging AI / ML
Rethinking Strategies
5. @helenaedelson
Computational model embracing non-determinism
- Actor Model of Computation, Carl Hewitt
• Mathematical theory treating "Actors" as primitives of concurrent computation
• Framework for a theoretical understanding of concurrency
• Asynchronous communication
• Stateful isolated processes
• Non-observable state within
• Decoupling in space and time
The Network and Autonomous Processes
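The actor-model properties above (asynchronous messaging, isolated non-observable state, decoupling in space and time) can be sketched outside of Akka. The following plain-Python snippet is an illustrative toy, not Akka's API; the `Counter` class and message names are invented for this sketch:

```python
import queue
import threading

class Counter:
    """A minimal actor: isolated state, one message processed at a time,
    reachable only through its mailbox (asynchronous message passing)."""
    def __init__(self):
        self._mailbox = queue.Queue()
        self._count = 0  # non-observable state: never read directly from outside
        threading.Thread(target=self._run, daemon=True).start()

    def tell(self, msg):
        """Fire-and-forget send; sender and receiver are decoupled in time."""
        self._mailbox.put(msg)

    def _run(self):
        while True:
            msg = self._mailbox.get()
            if msg == "inc":
                self._count += 1
            elif isinstance(msg, tuple) and msg[0] == "get":
                msg[1].put(self._count)  # reply via a channel, not shared state

actor = Counter()
for _ in range(3):
    actor.tell("inc")
reply = queue.Queue()
actor.tell(("get", reply))
print(reply.get(timeout=1))  # 3
```

Because the mailbox is a FIFO queue drained by a single thread, state changes are serialized without locks around the state itself, which is the essence of the actor model's concurrency guarantee.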
6. @helenaedelson
Principles that Akka stands on can be traced back to the ’70s and ’80s
• Carl Hewitt invented the Actor Model, early 70s
• Jim Gray and Pat Helland on the Tandem System, 80s
• Joe Armstrong, Robert Virding and Mike Williams on Erlang, 1986
Look Back Before Looking Forward
7. @helenaedelson
• From the ’40s and still being heavily developed today across many fields of
research and application in industry.
• 1940s: Cellular automata (CA), originally discovered by Stanislaw Ulam and John
von Neumann, Los Alamos National Laboratory
• 1970s: Conway's Game of Life
• Asynchronous Cellular Automaton
Complex Adaptive Systems, Systems Theory,
early AI
8. @helenaedelson
Can solve problems difficult or impossible for an individual agent or a monolithic
system to solve
• The foundations for artificial neural networks and NLP
• Composed of multiple autonomous agents, interacting to achieve common goals
• Decentralized, with no central point of decision making
• More fault tolerant, no single point of failure
• Reach higher degrees of dependability
Multi-Agent Systems (MAS)
9. @helenaedelson
Complex Adaptive Systems (CAS)
(Word-cloud slide listing related concepts: self-organization theory, emergence, synchronization, amplification, distributed networks, cellular automata, feedback loops, systems evolution, swarming; characterized as local, asynchronous, unpredictable, non-linear, adaptive, versatile.)
13. @helenaedelson
• Stateful - in-memory yet durable and resilient state
• Long-lived - lifecycle is not bound to a specific session, context available until
explicitly destroyed
• Virtual - location transparent and not bound to a physical location
• Addressable - referenced through a stable address
Akka Actors Also Happen To Be
20. @helenaedelson
• Complex Event Processing (CEP) - developed 1989-1995 to analyze event-driven simulations of
distributed systems, abstracting causal event histories, patterns, filtering and aggregation in large,
distributed, time-sensitive systems
• Stream Processing - mid-1990s research in real-time event data analysis; internet companies processing large numbers of events
• Event Sourcing (ES) - from domain-driven design and enterprise development, processing very
complex data models with often smaller datasets than internet companies
• Command Query Responsibility Segregation (CQRS) - isn't about events, but often combined with ES
• Also - CDC
Structuring data as a stream of events
21. @helenaedelson
• How data from system behavior is structured
• Capture all changes as a sequence of events in time
• Store events as an immutable event log / append-only storage
• Preserves the happened-before causality of events
• Replay event log to reconstruct state within a given time window or all
Event Sourcing
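The ideas above (append-only log, immutability, replay to reconstruct state) can be shown in a few lines. This is an illustrative Python sketch; the bank-account domain and event names are invented for the example:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Deposited:
    amount: int

@dataclass(frozen=True)
class Withdrawn:
    amount: int

event_log = []  # append-only: events are never mutated or deleted

def record(event):
    event_log.append(event)  # preserves the happened-before order

def replay(events):
    """Reconstruct current state by folding over the event history."""
    balance = 0
    for e in events:
        if isinstance(e, Deposited):
            balance += e.amount
        elif isinstance(e, Withdrawn):
            balance -= e.amount
    return balance

record(Deposited(100))
record(Withdrawn(30))
record(Deposited(5))
print(replay(event_log))      # 75: current state
print(replay(event_log[:2]))  # 70: state within a given time window
```

Note that replaying a prefix of the log yields the state as of that point in time, which is exactly what makes event sourcing auditable.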
22. @helenaedelson
Requirements - forensics
• Auditable - what is the current state and how it arrived there
• Causality - observe and analyze a system's causal structure
Applications For ES In Distributed
Asynchronous Systems
For example
• Cybersecurity and Vulnerability Detection
• Banking - what is the account balance and how did it arrive at that
• Click stream
• Accounting & Ledgers
• Shopping Cart
• Anything with a sequence of events that lead to X which must be preserved
23. @helenaedelson
A pattern decoupling the write path (commands) from the read path (queries)
• Different access patterns, and differing ratios of reads to writes, are typical
• Different schemas / data structures
• Typically, different teams across the org own the write side and use/own the read side
• No reason to share structure, and doing so is bad practice (no monolith, loose coupling, etc.)
• Command - Writers / Publishers publish without having awareness who needs to
receive it or how to reach them (location, protocol...)
• Query - Readers / Subscribers should be able to subscribe and asynchronously receive
from topics of interest
Command Query Responsibility
Segregation (CQRS)
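A minimal sketch of the command/query split, in illustrative Python (the shopping-cart-style events and function names are invented): the write path turns commands into events without knowing who reads them, and the read path builds its own query-optimized view.

```python
# Write path: commands are validated and turned into events.
events = []

def handle_command(cmd):
    """Command handler: the writer publishes events without awareness of
    who will receive them or how to reach them."""
    kind, item = cmd
    if kind == "add":
        events.append(("item_added", item))
    elif kind == "remove":
        events.append(("item_removed", item))

# Read path: a projection builds a separate, query-optimized view.
def project(events):
    view = {}
    for kind, item in events:
        if kind == "item_added":
            view[item] = view.get(item, 0) + 1
        elif kind == "item_removed":
            view[item] = max(view.get(item, 0) - 1, 0)
    return view

handle_command(("add", "book"))
handle_command(("add", "book"))
handle_command(("remove", "book"))
print(project(events))  # {'book': 1}
```

Because the read model is derived from the events, it can use a completely different schema, be rebuilt at any time, and be owned by a different team than the write model.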
24. @helenaedelson
My old diagram from 3 years ago: Kafka Summit:
Real Time Bidding (RTB)
The write path and model are naturally separate and differ from the read:
25. @helenaedelson
It Doesn't Matter What We Call It or Whether It's Microservices Or A Streaming Data Pipeline
• Ingest large amounts of data from multiple sources, sometimes bursty, without overloading the system
• Write the raw data to a store so that
• when algorithms change, I can run the data stream over for new meaning
• when nodes or applications fail, I can replay data from a checkpoint to recover
• Route the event streams to my ML/Analytics streams
• Process and aggregate inbound data, and store aggregates for querying historical data against the stream
• Not lose data
• Be secure, probably encrypt/decrypt everything
• Not pay massive cloud and data storage fees
• Be sure my team can handle the infrastructure TCO
28. @helenaedelson
Akka Persistence Stateful Actors
• Enables stateful actors to persist their state for recovery and replay from failure
and error
• Events persisted to storage, nothing is mutated (no read-modify-write)
• Allows higher transaction rates and efficient replication
• Only events received by the actor are persisted
• Snapshotting for checkpoint replay
• At least once message delivery semantics
Event Stream As Replication Fabric
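Snapshotting for checkpoint replay, mentioned above, means recovery starts from the latest saved state and replays only the events persisted after it. A toy Python sketch (the counter state and helper names are invented for illustration):

```python
# Snapshot-based recovery: replay only events after the last checkpoint.
event_log = []
snapshot = None  # (state, position in the log at snapshot time)

def persist(event):
    event_log.append(event)

def save_snapshot(state):
    global snapshot
    snapshot = (state, len(event_log))

def recover():
    """Start from the latest snapshot, then replay the remaining events."""
    state, pos = snapshot if snapshot else (0, 0)
    for amount in event_log[pos:]:
        state += amount
    return state

for amount in (10, 20, 30):
    persist(amount)
save_snapshot(60)  # checkpoint after the first three events
persist(5)         # only this event must be replayed on recovery
print(recover())   # 65
```

The trade-off is classic: more frequent snapshots mean faster recovery but more write overhead, while infrequent snapshots mean longer replays after a failure.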
29. @helenaedelson
Connect different event logs with Event-sourced processors for event processing
pipelines or graphs
• Cassandra, Redis, DynamoDB, Couchbase, MongoDB, Hazelcast, JDBC and
more
• Built-in: in-memory heap based journal, local file-system based snapshot-store
and LevelDB based journal
Storage Plugins
30. @helenaedelson
• Your algorithms have changed, you need to replay historic data against the new
logic
• Rolling upgrade, restart, cluster migration
• Error, e.g. after a JVM crash
• Failure, e.g. cluster nodes or a DC went down, a network outage or partition
• Cloud compute layer planned maintenance restarts
• Application throws exception, if a persistent Actor is configured to restart by a
supervisor
Replay Reasons
31. @helenaedelson
Akka out of the box gives us tooling for each of these steps:
• Failure awareness and lifecycle
• Save state of failed node before failure
• Load state that was in flight at time of failure (define time slice)
• Replay from a checkpoint in a snapshot or run the full history
• Resume operations
Failure And Recovery
33. @helenaedelson
● Decentralized peer-to-peer
● Cluster Formation and membership service
● Communication and Consensus
● Leader and Roles
● Cluster Lifecycle and Events
● Failure Detector
● Self-Healing
● CoordinatedShutdown
Akka Cluster: Quick Premise
34. @helenaedelson
Cluster User API
• What roles am I in, what is my address
• Join, Leave, Down
• Programmatic membership control
• Register listeners to cluster events
• Startup when configurable cluster size
reached
• Highly tunable behavior
40. @helenaedelson
• ClusterDomainEvent: base type
• MemberUp: member status changed to Up
• UnreachableMember: member considered unreachable by failure detector
• MemberRemoved: member completely removed from the cluster
• MemberEvent: member status change Up, Removed
• Leader events
• Reachability events
Cluster Events
41. @helenaedelson
• CurrentClusterState: the current snapshot state of the cluster, sent to new
subscribers unless InitialStateAsEvents is specified
• InitialStateAsEvents: receive the current cluster state replayed as events
rather than as a snapshot
Cluster State
44. @helenaedelson
• Masterless
• No leader election
• Role of the leader: the only one who can change member status
• joining to up
• exiting to removed
• Leader decisions are local to the DC
Cluster Leader
47. @helenaedelson
Cluster Membership State
A CRDT which can be deterministically merged
(Lifecycle diagram, flattened here into transitions:)
• Join (user action) → Joining
• Joining → Up (leader action)
• Up → Leaving (user action: Leave)
• Leaving → Exiting (leader action)
• Exiting → Removed (leader action)
• Down (user action) → Down, then Down → Removed (leader action)
54. @helenaedelson
Cluster Singleton
Single point of cluster-wide decisions or coordination
(Diagram: a ClusterSingletonManager runs on every node; the SingletonActor itself runs only on the oldest node.)
58. @helenaedelson
Cluster Singleton: On Failure
(Diagram: when the oldest node is downed or partitioned, the singleton fails over to the next-oldest node's ClusterSingletonManager; the ClusterSingletonProxy keeps routing messages to the current SingletonActor.)
59. @helenaedelson
Cluster Singleton
Guarantees one instance of a particular actor type per cluster
(Slide graphic: a trade-off slider between Strong Consistency and Always Available.)
doc.akka.io/docs/akka/current/scala/cluster-singleton
61. @helenaedelson
An approach to eventual distributed consistency
• Replicate data across the network
• Concurrent updates from different nodes without coordination
• Mathematical properties guarantee eventual consistency
• Updates execute immediately, unaffected by network faults
• Consistency without consensus
• Highly scalable and fault tolerant
Conflict-Free Replicated Data Types (CRDT)
A comprehensive study of Convergent and Commutative Replicated Data Types
62. @helenaedelson
A replicated counter, which converges because the increment / decrement operations
commute
• Service Discovery
• Shopping Cart
• Priority on low latency and full availability
• Computation in delay-tolerant networks
• Data aggregation
• Partition-tolerant cloud computing
• Collaborative text editing
Application Of CRDTs
A few implementations:
• Riak Data Types
• SoundCloud Roshi
• Akka Distributed Data
63. @helenaedelson
1976: The maintenance of duplicate databases, Paul Johnson, Robert Thomas
1984: Efficient solutions to the replicated log and dictionary problems, Gene Wuu, Arthur Bernstein
1988: Scale and performance in a distributed file system, J. Howard, M. Kazar, S. Menees, D. Nichols, M. Satyanarayanan, R. Sidebotham, M. West
1988: Commutativity-based concurrency control for abstract data types, W. Weihl
1989: Concurrency control in groupware systems, C. Ellis, S. Gibbs
1994: Resolving file conflicts in the Ficus file system, P. Reiher, J. Heidemann, D. Ratner, G. Skinner, and G. Popek
1994: Detecting causal relationships in distributed computations: In search of the holy grail, R. Schwarz, F. Mattern
1997: Specification of convergent abstract data types for autonomous mobile computing, C. Baquero, F. Moura
1999: Using structural characteristics for autonomous operation, Carlos Baquero, Francisco Moura
2009: A commutative replicated data type for cooperative editing, N. Preguiça, J. Marquès, M. Shapiro, M. Leţia
2011: A comprehensive study of Convergent and Commutative Replicated Data Types, M. Shapiro, N. Preguiça, C. Baquero, M. Zawirski
Not New
64. @helenaedelson
• Low latency and high availability
• Data availability despite network partitions
• Nodes concurrently update as multi-master
• Async state replication across the cluster
• Granular control of consistency level for reads and writes
• Key-value store like API
Akka Distributed Data
doc.akka.io/docs/akka/current/scala/distributed-data
Replicated in-memory data store using CvRDT to share data between cluster nodes
65. @helenaedelson
Concurrent updates from different nodes resolve via the monotonic merge function.
• Counters: GCounter (grow-only), PNCounter (two GCounters: increment and decrement)
• Registers: Flag (toggle boolean), LWWRegister (Last Write Wins register)
• Sets: GSet (grow-only, merge by union), ORSet (observed-remove, version vector)
• Maps: ORMap, ORMultiMap, LWWMap, PNCounterMap
• Graphs: DAG
Composable For More Advanced Types
A comprehensive study of Convergent and Commutative Replicated Data Types
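The GCounter above is the simplest CRDT to see in code. This is an illustrative Python sketch of the standard grow-only counter design (not Akka's implementation): each node increments only its own slot, and merge takes the per-node maximum, so merges commute and are idempotent.

```python
class GCounter:
    """Grow-only counter CRDT: each node increments its own slot;
    merge takes the per-node maximum, so concurrent updates commute."""
    def __init__(self, counts=None):
        self.counts = dict(counts or {})  # node id -> that node's local count

    def increment(self, node):
        self.counts[node] = self.counts.get(node, 0) + 1

    def merge(self, other):
        nodes = set(self.counts) | set(other.counts)
        merged = {n: max(self.counts.get(n, 0), other.counts.get(n, 0))
                  for n in nodes}
        return GCounter(merged)

    def value(self):
        return sum(self.counts.values())

# Two replicas update concurrently, without coordination...
a, b = GCounter(), GCounter()
a.increment("node-a"); a.increment("node-a")
b.increment("node-b")
# ...and converge to the same value regardless of merge order.
print(a.merge(b).value())  # 3
print(b.merge(a).value())  # 3
```

A PNCounter is then just a pair of these: one GCounter for increments, one for decrements, with the value being their difference.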
66. @helenaedelson
Delta State CRDTs (δ-CRDTs)
• A way to reduce the need for sending the full state for updates
• Sending only what changed
• Merging done on the receiving side
• Eventually consistent by default, and supports opt-in causal
consistency
Delta State Replicated Data Types
Supported types: GCounter, GSet, PNCounter, PNCounterMap, LWWMap, ORMap, ORMultiMap, ORSet, LWWRegister
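The delta idea can be sketched on the grow-only counter: instead of replicating the whole state, an update returns only the changed entry, and the receiver merges that delta. An illustrative Python toy (class and method names invented, not Akka's δ-CRDT API):

```python
class DeltaGCounter:
    """Delta-state grow-only counter: an update returns only the changed
    entry (the delta) instead of shipping the full state."""
    def __init__(self):
        self.counts = {}

    def increment(self, node):
        self.counts[node] = self.counts.get(node, 0) + 1
        return {node: self.counts[node]}  # the delta to replicate

    def merge_delta(self, delta):
        """Merging is still a per-node max, so deltas can be re-delivered
        or arrive out of order without corrupting the state."""
        for node, count in delta.items():
            self.counts[node] = max(self.counts.get(node, 0), count)

    def value(self):
        return sum(self.counts.values())

replica_a, replica_b = DeltaGCounter(), DeltaGCounter()
d1 = replica_a.increment("node-a")
d2 = replica_a.increment("node-a")
replica_b.merge_delta(d2)  # deltas may arrive out of order
replica_b.merge_delta(d1)  # ...and merging remains safe
print(replica_b.value())   # 2
```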
75. @helenaedelson
• By default the data is only kept in memory and replicated to other nodes
• If all nodes are stopped the data is lost
• You can configure it to store on the local disk on each node (LMDB)
• Or implement your own to another store via the trait
• It will be loaded the next time the replicator is started
Configurable Durable Storage
76. @helenaedelson
Distributed Data
Eventually consistent - always accepts writes
(Slide graphic: a trade-off slider between Strong Consistency and Always Available.)
doc.akka.io/docs/akka/current/distributed-data
77. @helenaedelson
• Needing high consistency over availability and low latency
• Big Data - not currently intended for billions of entries
• When a new node is added to the cluster all entries are propagated to it,
hence top level entries should not exceed 100000
• Data is held in memory
• If not using a delta-CRDT, when a data entry is changed the full state of that
entry may be replicated to other nodes.
Not Designed For
78. @helenaedelson
Cluster Sharding
Scale, Resilience & Consistency
• Automatically distribute entities of the same type over several nodes
• Balance resources (memory, disk space, network traffic) across
multiple nodes for scalability
• Location transparency: Interact by logical ID
• Increased fault tolerance - relocation on failure
Life beyond Distributed Transactions
(Diagram: Node 1 hosting ShardRegion SR1 with shards S1, S2, S3.)
79. @helenaedelson
Each Entity Is A Consistency Boundary
(Diagram: a sender on Node 1 sends Message(gid) to its local ShardRegion SR1, which routes it to one of shards S1-S3. Shards are groups of entities; your code is supervised by shards.)
80. @helenaedelson
• Creates entity actors on demand
• Supervises group of entities - defined by the shard ID extraction
N-Shards Per Cluster Node
Entity B-1
SR2
SC
SR1
Shard A
Shard B
Entity A-1
Entity A-2
Entity C-1
Shard C
SR3
ShardCoordinator
ShardRegion 1
ShardRegion 2
ShardRegion 3
81. @helenaedelson
• Creates and supervises its shards
• Knows how to route messages by routing key
ShardRegion Per Cluster Node
Envelope(“c-1”)
Entity B-1
Shard A
Shard B
Entity A-1
Entity A-2
Entity C-1
Shard C
ShardCoordinator
ShardRegion 1
ShardRegion 2
ShardRegion 3
Node 1
Node 2 Node 3
82. @helenaedelson
• Stores Shard to Region mappings with Akka Persistence
• Monitors all cluster node status
• If the ShardCoordinator goes down, it starts up on another node and replays the state
Shard Coordination
Entity B-1
Shard A
Shard B
Entity A-1
Entity A-2
Entity C-1
Shard C
ShardCoordinator
(Cluster Singleton)
ShardRegion 1
ShardRegion 2
ShardRegion 3
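The coordinator's job, stripped to its essence, is to own the shard-to-region map and reassign orphaned shards when a region dies. A simplified Python sketch of a least-shards allocation strategy (names and strategy are illustrative, not Akka's internals):

```python
class ShardCoordinator:
    """Sketch of shard-to-region allocation with least-shards rebalancing."""

    def __init__(self, regions):
        self.allocations = {region: set() for region in regions}

    def allocate(self, shard_id):
        # Least-shard allocation: place on the region owning the fewest shards.
        region = min(self.allocations, key=lambda r: len(self.allocations[r]))
        self.allocations[region].add(shard_id)
        return region

    def region_down(self, region):
        # On failover, orphaned shards are re-allocated to surviving regions.
        orphaned = self.allocations.pop(region)
        return {shard: self.allocate(shard) for shard in orphaned}


coord = ShardCoordinator(["node-1", "node-2", "node-3"])
for shard in ["A", "B", "C"]:
    coord.allocate(shard)
moved = coord.region_down("node-1")
assert all(r in ("node-2", "node-3") for r in moved.values())
```

In Akka the mapping survives coordinator failover because it is persisted (via Akka Persistence) and replayed on the node where the singleton restarts.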
83. @helenaedelson
Start Cluster Sharding On Node
Sending data
Your Entity ID
Extraction function
Your Shard ID
Extraction function
Your custom shard
allocation strategy
Your Envelope type
Or use built-in
HashExtractor
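The two extraction functions above are the heart of sharding: one pulls the entity ID out of the envelope, the other deterministically maps that ID into a fixed shard space, so the same entity always lands in the same shard. A conceptual Python sketch of hash-based extraction (the envelope shape and constants are assumptions; Akka's built-in extractor is analogous):

```python
import hashlib

# A common rule of thumb: ~10x the maximum expected number of cluster nodes.
NUM_SHARDS = 10


def extract_entity_id(envelope):
    # Envelope carries the logical entity ID plus a payload, e.g. ("c-1", msg).
    entity_id, _payload = envelope
    return entity_id


def extract_shard_id(envelope):
    # A stable hash of the entity ID, modulo the shard count, so an entity
    # always maps to the same shard regardless of which node computes it.
    digest = hashlib.md5(extract_entity_id(envelope).encode()).hexdigest()
    return str(int(digest, 16) % NUM_SHARDS)


# The same entity ID always routes to the same shard.
assert extract_shard_id(("c-1", "msg1")) == extract_shard_id(("c-1", "msg2"))
```

Note the use of a stable hash rather than the language's default (salted) one: every node must compute the same shard ID for the same entity, or routing breaks.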
84. @helenaedelson
Cluster Sharding: Failover
Entity B-1
Shard A
Shard B
Entity A-1
Entity A-2
ShardCoordinator
Downed
Location Transparency
Failover
Entity C-1
Shard C
ShardRegion 1
ShardRegion 2
Envelope("c-1")
85. @helenaedelson
Strong Consistency ←→ Always Available
Each entity is a boundary of consistency
Guarantees at most one instance of each entity at a time in the cluster
doc.akka.io/docs/akka/current/scala/cluster-sharding
Cluster Sharding
86. @helenaedelson
"Serverless is a new generation of platform-as-a-service offerings where
the infrastructure provider takes responsibility for receiving client
requests and responding to them, capacity planning, task scheduling,
and operational monitoring. Developers need to worry only about the
logic for processing client requests."
- Adzic et al
Serverless computing: economic and architectural impact
Serverless
87. @helenaedelson
• Automated infrastructure running in a container pool
• A classic data-shipping architecture - we move data to the code, not the other
way round
• Pay per execution time
• Autoscales with load
• Event driven
• Stateless
• Ephemeral (5-15 minutes)
FaaS
89. @helenaedelson
• Load and event spikes needing massive parallelism
• Scaling from 0 to 10,000s of requests and back down to zero
• Simplifies delivery of scale and availability
• As integration layer between various (ephemeral and durable) data sources
• Processing stateless, compute-intensive workloads
• As data backbone moving data from A to B and transforming it
• Can work well for event-driven use cases
What Is FaaS Good At Currently?
91. @helenaedelson
• Functions handle only one event source
• Functions are stateless, ephemeral, and short-lived
• Computational context is easily lost
• Limited options for managing and coordinating distributed state
• Limited options for the right consistency guarantees
• Limited options for durable state, that is scalable and available
• Expensive to load and store state from storage repeatedly
Limitations With Serverless
Distributed state is not well supported for complex distributed data workflows
92. @helenaedelson
• No direct communication, which means applications must pub-sub all data over a storage medium
• Latency is too high for general-purpose distributed computing problems
For a discussion on this, and other limitations with FaaS read the paper,
“Serverless Computing: One Step Forward, Two Steps Back”
by Joe Hellerstein, et al.
FaaS Does Not Have Addressability
97. @helenaedelson
Kubernetes Pod
Kubernetes Pod
Kubernetes Pod
Knative stateful serving
Knative Events
User Function
(JavaScript, Go, Java,…)
Knative Serving of Stateful Functions
User Function
(JavaScript, Go, Java,…)
User Function
(JavaScript, Go, Java,…)
Distributed Datastore
(Cassandra, DynamoDB, Spanner,…)
gRPC
98. @helenaedelson
Kubernetes Pod
Kubernetes Pod
Kubernetes Pod
Kubernetes Pod
Kubernetes Pod
Kubernetes Pod
Knative stateful serving
User Function
(JavaScript, Go, Java,…)
Powered by Akka Cluster Sidecars
User Function
(JavaScript, Go, Java,…)
User Function
(JavaScript, Go, Java,…)
Akka Sidecar
Akka Sidecar
Akka Sidecar
Akka Cluster
Distributed Datastore
(Cassandra, DynamoDB, Spanner,…)