Desert Code Camp 2016.1 - Stateful Distributed Systems

Statefulness in Distributed Systems

Amazon is hiring!
To learn more about our Dev Centers: http://bit.ly/phxdevcenters  
To learn more about current opportunities, email: phx@amazon.com
Our mission is to provide the
most innovative, scalable,
and reliable systems in the
world.

Outline
What’s that smell?
What’s the meaning of this?
Pics or it didn’t happen.
Swallowing the red pill.
Challenge accepted!

Stateful Services
DESIGN SMELL
ANTI-PATTERN

Diagram: CLIENT TIER -> WEB SERVER TIER -> DATABASE TIER

Stateful Services
You’re already doing it!
Required to do something useful (most of the time)
Stateful processing systems share similar concerns
(This is where the fun is)

Stateful Processing
Data locality (data intensive systems)
Strong consistency
High performance

Stateless Services
do not care what has happened
do not care what has changed

Stateful Processing
Needs to care what has happened
Needs to care what has changed
Divorce process lifecycle from data lifecycle

Diagram: CLIENT TIER -> WEB SERVER TIER -> DATABASE TIER

Diagram: CLIENT TIER -> Stateful Processing service -> Storage TIER

https://github.com/graphite-project/whisper
https://github.com/graphite-project/carbon

https://www.elastic.co/products/elasticsearch
http://lucene.apache.org/

Gossip protocols
Stateful Processing

Gossip Protocols
Usage
dissemination - event / background
anti-entropy - data repair
aggregation - compute across the network
SWIM - scalable weakly-consistent infection-style process group membership protocol
- membership list (non-faulty processes)
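
To make the dissemination use case concrete, here is a minimal single-process simulation of infection-style gossip. It is only a sketch under stated assumptions: the function name `gossip_rounds`, the `fanout` parameter, and the synchronous round model are all hypothetical and do not come from the talk or from any particular library.

```python
import random


def gossip_rounds(n_nodes: int, fanout: int, seed: int = 0, max_rounds: int = 1000) -> int:
    """Count the rounds needed for one update to reach every node.

    Assumed model: in each round, every node that already has the update
    forwards it to `fanout` peers chosen uniformly at random.
    """
    rng = random.Random(seed)
    informed = {0}  # node 0 starts out with the update
    rounds = 0
    while len(informed) < n_nodes and rounds < max_rounds:
        rounds += 1
        newly_informed = set()
        for _ in informed:
            # A chosen peer may already be informed; that message is wasted.
            newly_informed.update(rng.sample(range(n_nodes), fanout))
        informed |= newly_informed
    return rounds


rounds = gossip_rounds(n_nodes=100, fanout=3)
print(f"update reached all 100 nodes after {rounds} rounds")
```

The informed set can grow by at most a factor of (1 + fanout) per round, so the round count grows roughly logarithmically with cluster size. That is the property that makes gossip attractive for large groups.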

SWIM Protocol
failure detection
1. ping random process from member list
   A. receives ack, great! end
   B. does not receive ack, go to 2
2. send ping request to k processes from member list
3. k processes try to ping “failed” process
   A. k processes respond back to original process about status
dissemination
multicast failure (vol. leave) updates
multicast new members
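
The failure detection steps above can be sketched as a single probe round. This is an illustration only: `probe_round` and the `ping(src, dst)` callback are hypothetical names standing in for a real network transport, and the sketch assumes the member list has at least two entries and that helpers can always report back to the prober.

```python
import random
from typing import Callable, List, Optional, Tuple


def probe_round(me: str,
                members: List[str],
                ping: Callable[[str, str], bool],
                k: int = 3,
                rng: Optional[random.Random] = None) -> Tuple[str, bool]:
    """Run one SWIM-style probe and return (target, looks_alive)."""
    rng = rng or random.Random()
    candidates = [m for m in members if m != me]

    # Step 1: ping a random process from the member list.
    target = rng.choice(candidates)
    if ping(me, target):
        return target, True  # ack received, done

    # Step 2: no ack, so send a ping request to k other members.
    others = [m for m in candidates if m != target]
    helpers = rng.sample(others, min(k, len(others)))

    # Step 3: helpers try to ping the "failed" process and report its status.
    looks_alive = any(ping(helper, target) for helper in helpers)
    return target, looks_alive
```

For example, with `ping = lambda src, dst: dst != "b"` (node `b` is down), any round that happens to pick `b` returns `("b", False)`, while rounds that pick other members return `True`. The indirect step is what keeps one bad link between two healthy nodes from being reported as a failure.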

consensus
Gossip protocols
Stateful Processing

Paxos
Proposers, Acceptors, Learners
Proposers:
Ask acceptors to approve proposals
Ask acceptors to accept a proposal & version
Acceptors:
Do not have to approve or accept
Only approve/accept what has been proposed
Learners:
Proposed value is chosen when a majority of acceptors have accepted the value

Paxos
Phase 1:
1) Proposers propose a version (proposal number) to the acceptors
2) Acceptors approve the version
   a) promise not to approve or accept smaller versions
   b) send back the highest version (and its value) already accepted
Phase 2:
1) Proposer receives approval from a majority
2) Proposer sends accept! to acceptors with the value of the highest accepted version returned, or its own value if none
Phase 3:
1) Acceptors notify all learners
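
The phases above can be illustrated with a small in-memory, single-decree sketch. It makes several assumptions: the names `Acceptor`, `prepare`, `accept`, and `propose` are hypothetical, "version" means the proposal number, there is no network and no failures, and the return value of `propose` stands in for notifying learners.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Acceptor:
    promised: int = -1                          # highest version approved so far
    accepted: Optional[Tuple[int, str]] = None  # (version, value) last accepted

    def prepare(self, version: int) -> Tuple[bool, Optional[Tuple[int, str]]]:
        """Phase 1: approve only versions newer than any already approved."""
        if version > self.promised:
            self.promised = version
            return True, self.accepted  # report the highest accepted proposal
        return False, None

    def accept(self, version: int, value: str) -> bool:
        """Phase 2: accept unless a newer version was approved in the meantime."""
        if version >= self.promised:
            self.promised = version
            self.accepted = (version, value)
            return True
        return False


def propose(acceptors: List[Acceptor], version: int, value: str) -> Optional[str]:
    """Return the chosen value, or None if this proposal did not succeed."""
    majority = len(acceptors) // 2 + 1

    # Phase 1: ask every acceptor to approve the version.
    replies = [a.prepare(version) for a in acceptors]
    approvals = [previous for ok, previous in replies if ok]
    if len(approvals) < majority:
        return None

    # If any acceptor already accepted a value, adopt the newest one.
    previous = [p for p in approvals if p is not None]
    if previous:
        value = max(previous, key=lambda p: p[0])[1]

    # Phase 2: ask acceptors to accept (version, value).
    accepted_count = sum(a.accept(version, value) for a in acceptors)

    # Phase 3 (simplified): a majority accepted, so the value is chosen.
    return value if accepted_count >= majority else None


acceptors = [Acceptor() for _ in range(3)]
first = propose(acceptors, version=1, value="red")
second = propose(acceptors, version=2, value="blue")
stale = propose(acceptors, version=1, value="green")
```

In this run the first proposal chooses `"red"`. The second proposal uses a higher version but must adopt the value already accepted, so it also returns `"red"`. The third reuses an old version and is rejected. Once a value is chosen, later proposals cannot change it.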

consistent hashing
Gossip protocols
consensus
Stateful Processing

Consistent Hashing
Distributed Hash Tables
Convenient, fast O(1)
Great for lookups given a key
Issue
Horizontal scaling
Fault tolerance
hash(object) mod n, where n = number of slots
Remapping keys to more slots: changing n moves most keys
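
A quick experiment shows why hash(object) (mod n) is a problem when the number of slots changes. This is a sketch with assumed details: the helper names `slot` and `moved_fraction`, the use of MD5 as the hash, and the synthetic key names are all illustrative choices.

```python
import hashlib
from typing import Iterable


def slot(key: str, n: int) -> int:
    """Map a key to one of n slots with hash(key) mod n."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % n


def moved_fraction(keys: Iterable[str], old_n: int, new_n: int) -> float:
    """Fraction of keys whose slot changes when n goes from old_n to new_n."""
    keys = list(keys)
    moved = sum(slot(k, old_n) != slot(k, new_n) for k in keys)
    return moved / len(keys)


keys = [f"key-{i}" for i in range(10_000)]
fraction = moved_fraction(keys, old_n=4, new_n=5)
print(f"{fraction:.0%} of keys moved when growing from 4 to 5 slots")
```

Going from 4 to 5 slots, a key stays put only when its hash gives the same remainder for both moduli. With a uniform hash that happens for about one key in five, so roughly 80% of the data has to move to add a single slot.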

Consistent Hashing
Hash tables: convenient, fast O(1), great for key / value lookups
Solution
consistent hashing
K = number of keys, n = number of slots
During resize, only about K / n keys are remapped
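
A minimal hash ring illustrates the resize behaviour. This is one common way to build consistent hashing, not necessarily the variant the talk had in mind. The class name `HashRing`, the `vnodes` parameter (virtual points per node), and the MD5-based hash are assumptions made for the example.

```python
import bisect
import hashlib
from typing import Dict, Iterable, List


def _hash(value: str) -> int:
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes: Iterable[str] = (), vnodes: int = 100) -> None:
        self.vnodes = vnodes
        self._points: List[int] = []      # sorted positions on the ring
        self._owner: Dict[int, str] = {}  # position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        """Place `vnodes` points for the node on the ring."""
        for i in range(self.vnodes):
            point = _hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def lookup(self, key: str) -> str:
        """Return the node owning the first point clockwise from the key."""
        index = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[index]]


keys = [f"key-{i}" for i in range(10_000)]
ring = HashRing(["a", "b", "c", "d"])
before = {k: ring.lookup(k) for k in keys}
ring.add("e")
after = {k: ring.lookup(k) for k in keys}
moved_keys = [k for k in keys if before[k] != after[k]]
print(f"{len(moved_keys) / len(keys):.0%} of keys moved after adding a fifth node")
```

Every key that moves lands on the new node `e`, and the moved share is close to one fifth of the keys, in line with the K / n estimate. Keys owned by the other nodes are left where they were.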

Challenges
Stateful Processing
result composition
work distribution
code deployments
memory management (unbounded data structures)
persistence strategies
concurrency

Key Takeaways
Stateful and stateless systems co-exist
Process lifecycle vs data lifecycle
Stateful services have their place
Interesting opportunities
But not all is rosy

Papers We Love
Dynamo: Amazon’s Highly Available Key-value Store
http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf
The Chubby lock service for loosely-coupled distributed systems
http://research.google.com/archive/chubby-osdi06.pdf
Paxos Made Simple
http://research.microsoft.com/en-us/um/people/lamport/pubs/paxos-simple.pdf
Time, Clocks, and the Ordering of Events in a Distributed System
http://research.microsoft.com/en-us/um/people/lamport/pubs/time-clocks.pdf
The Google File System
http://research.google.com/archive/gfs-sosp2003.pdf

Resources
Caitie McCaffrey’s talks - https://goo.gl/8MSdRz
Apache Foundation - http://apache.org/
graphite - https://graphiteapp.org/
elastic - https://www.elastic.co/

Thanks!
Q&A
