Distributed Systems: Patterns and Practices 
John Brinnand 
Enterprise Architect: StubHub

SVCC-2014

An overview of the principles, patterns and practices found in distributed systems.

1. Distributed Systems: Patterns and Practices
John Brinnand
Enterprise Architect: StubHub
2. Agenda
● Introduction
● Why Distributed Systems – what problem do they solve?
● Types of Distributed Systems
● Common strategies and patterns in distributed systems
● Conclusion
● Questions
3. What is a distributed system?
● "A distributed system is a software system in which components located on networked computers communicate and coordinate their actions by passing messages." (Wikipedia)
– A distributed system is an ecosystem: a set of systems working together to provide a service, functionality, or behavior for clients.
● The behavior is uniform – it appears to come from a single source, but in fact it comes from a set of systems interacting to produce that behavior.
● The components (systems) know of their peers and work together, passing messages between each other in order to:
– service user requests;
– detect and respond to failures;
– adapt to changing conditions.
4. Vertical Scaling: Problems
● What problems do distributed systems solve? Why not build bigger and bigger machines to address increasing demand?
● Single points of failure – the bigger they are, the harder they fall.
– When the big system goes down, everything it contains goes down.
● The NOC builds disaster recovery and failover strategies and monitors constantly.
● Ops becomes failure sensitive, vigilant, and risk averse.
● Elastic demand – how do you size system resources for elastic demand?
– At peak times (Thanksgiving, Christmas, Valentine's Day, etc.) demand increases. Hordes of consumers descend on eCommerce sites simultaneously, causing system meltdown.
– Off-season, usage is bursty: sometimes steady, sometimes slow, and sometimes relatively idle.
● Business impacts
– Increased expenditure.
– Failure results in loss of current and future business:
● loss of customer confidence,
● negative brand impact.
– Competitive edge: newer software features take time to be installed. Development is fast; Ops is slow.
5. Solution: Horizontal Scalability – Adaptive Systems
[Diagram: five nodes (Node 1 – Node 5) connected as a cluster of peers.]
● Big systems are made of many smaller systems working together.
– An individual system has a single capability. To service a request it delegates to a peer or peers for the capabilities it does not have. Responses from its peers are processed and presented to the user.
● Horizontal scalability by itself is not an adaptive system.
– So what is an adaptive system? Its characteristics (from the diagram): message based, network dependent, failure isolation, optimized deployment, elastic (on demand), service addition and removal, parallel development, high failure rates.
6. Solution: Horizontal Scalability – Adaptive Systems
[Diagram: the node cluster with an Admin service that replaces failed nodes and adds or removes nodes as demand changes.]
● Embrace failure
– Self healing: make the system "self-aware". If one component fails (which it will), "spin up" another instance (a minimal supervisor sketch follows this slide).
● Respond to demand
– Increase and decrease capacity to meet changes in demand.
● However, the system is still not fault tolerant:
– The Admin service is a single point of failure.
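As a loose illustration of the self-healing idea (not from the talk), the sketch below restarts a worker process whenever it dies; the worker command is a hypothetical placeholder. Note that this supervisor is itself a single point of failure, which is exactly the weakness the slide calls out.

```python
# A minimal self-healing sketch: a supervisor that restarts a worker
# process whenever it exits. "python worker.py" is a hypothetical
# placeholder for the real service to keep alive.
import subprocess
import time

WORKER_CMD = ["python", "worker.py"]  # hypothetical worker command

def supervise():
    proc = subprocess.Popen(WORKER_CMD)          # spin up the first instance
    while True:
        if proc.poll() is not None:              # the instance has died
            print(f"worker exited with {proc.returncode}; spinning up a new instance")
            proc = subprocess.Popen(WORKER_CMD)  # replace the failed instance
        time.sleep(2)                            # simple health-check interval

if __name__ == "__main__":
    supervise()
```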
7. Solution: Zookeeper
[Diagram: two views of a three-server ZooKeeper ensemble (one leader, two followers) – before and after a leader failure – with clients connected to individual servers, broadcast messages between servers, all writes going to the leader, and a data store holding configuration (host IP and port) and client data. Anatomy of a client: it contains a list of the ZK servers in the cluster.]
● Clients connect to a single server (see the client sketch below).
● All client requests are served from the in-memory data store on that server.
● Servers send their data to the leader.
● The leader stores the data in a data store.
● A server responds to a client only after the leader has stored the data.
● If a leader fails, a new leader is elected.
● Clients reconnect to the next available server from their list of available ZooKeeper servers.
● The data for each client is loaded into each server that services that client.
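The client-side behavior described above can be sketched with the third-party kazoo Python client. This is a minimal illustration, not part of the talk: the ensemble addresses (zk1–zk3) and the /app1/config znode are assumed for the example.

```python
from kazoo.client import KazooClient

# The client holds a list of ZK servers; if its current server fails it
# reconnects to the next available one automatically.
zk = KazooClient(hosts="zk1:2181,zk2:2181,zk3:2181")
zk.start()

# Writes are forwarded to the leader; the server replies only after the
# leader has committed the update to the ensemble.
zk.ensure_path("/app1/config")
zk.set("/app1/config", b"host=10.0.0.5;port=8080")

# Reads are served from the in-memory data store of whichever server
# this client happens to be connected to.
data, stat = zk.get("/app1/config")
print(data, stat.version)

zk.stop()
```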
8. Patterns
● Leader and followers
– Continuous communication between servers (awareness of the presence or absence of a peer).
● Leader election – dynamically elect a leader on startup and on failure conditions (sketched below).
– The leader manages the common data store (which is the source of truth).
● Common data store – a single source of data (or state) which is distributed to all servers in the cluster or ensemble.
● Expectation of failure – the programming model, storage model, and messaging model all have failure recognition and failure recovery methodologies built in.
[Diagram: the initial cluster/ensemble (leader plus followers, broadcast messages, shared data store with znodes /app1, /app1/p_1, /app1/p_2, /app1/p_3) and the restructured ensemble after a leader failure.]
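The leader-election bullet above follows the classic ZooKeeper recipe: each candidate creates an ephemeral, sequential znode and the lowest sequence number leads. A minimal sketch with kazoo, assuming an /app1/election namespace and illustrative server names:

```python
from kazoo.client import KazooClient

ELECTION_PATH = "/app1/election"   # assumed election namespace

zk = KazooClient(hosts="zk1:2181,zk2:2181,zk3:2181")
zk.start()
zk.ensure_path(ELECTION_PATH)

# The ephemeral znode disappears automatically if this server dies,
# which is what triggers re-election.
me = zk.create(ELECTION_PATH + "/candidate-", b"server-42",
               ephemeral=True, sequence=True)

def am_i_leader():
    candidates = sorted(zk.get_children(ELECTION_PATH))
    return me.endswith(candidates[0])   # lowest sequence number leads

print("leader" if am_i_leader() else "follower")
```

In practice each candidate watches its predecessor's znode instead of polling, and kazoo also ships a ready-made Election recipe that wraps this pattern.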
9. Pattern: Stateless Applications – Discovery Service, Load Balancing
[Diagram: two ZooKeeper-managed cluster configurations – the initial deployment and the configuration after a failure condition. Three service groups (Blue, Light Orange, Green) each run three instances with a leader and followers; Client 1 holds the list of all services per group (e.g. Blue: 1, 2, 3 initially, Blue: 2, 3 after a failure) and an internal load balancer round-robins requests to each service.]
● ZK async notification: all services that are part of a "group" receive asynchronous notifications when any member of that group goes down.
● ZK leader election: when the leader of a group goes down, ZooKeeper will elect a new leader.
● A discovery service built on ZooKeeper notifies the client of the new cluster configuration (see the sketch below).
● Shared data: all members of a group will receive data (configuration, events) published by any other member of the group.
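As a rough illustration of the discovery and load-balancing flow, the sketch below (again using kazoo, with a hypothetical /services/blue group and made-up ensemble addresses) watches the group for membership changes and round-robins requests across the live members.

```python
import itertools
from kazoo.client import KazooClient

GROUP = "/services/blue"   # hypothetical service group

zk = KazooClient(hosts="zk1:2181,zk2:2181,zk3:2181")
zk.start()
zk.ensure_path(GROUP)

# A service instance would register itself with an ephemeral znode, e.g.:
#   zk.create(GROUP + "/instance-", b"10.0.0.7:8080",
#             ephemeral=True, sequence=True)
# The znode vanishes automatically if the instance dies.

members = []
counter = itertools.count()

@zk.ChildrenWatch(GROUP)
def update_members(children):
    # Asynchronous notification: fires whenever any member of the group
    # joins or goes down, handing the client the new configuration.
    global members
    members = [zk.get(GROUP + "/" + c)[0].decode() for c in children]

def next_service():
    # Internal load balancer: round-robin over the currently live members.
    if not members:
        raise RuntimeError("no live instances in the group")
    return members[next(counter) % len(members)]
```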
10. Snapshot Data – Problem
[Diagram: a three-server ensemble (leader and two followers) whose znode trees (/app1, /app1/p_1, /app1/p_2, /app1/p_3) are kept in sync by periodic updates/snapshots, with Client 1 reading from one of the servers.]
CAP theorem:
● Consistency – all nodes see the same data at the same time.
● Availability – a guarantee that every request receives a response about whether it was successful or failed.
● Partition tolerance – the system continues to operate despite arbitrary message loss or failure of part of the system.
– Consistency: to synchronize the data, the system will have to be unavailable for a period of time even though it is fully operational.
– Availability: if the system is always available and is operating in spite of message loss and component failure, then the data will be inconsistent at any given point in time.
– Partition tolerance: if the system continues to function when parts of it fail, then it can be available but the data within it cannot be consistent.
So if availability and partition tolerance are favored, how can a client get accurate or viable data?
11. Pattern: Snapshot Data – Quorum Management
[Diagram: the same ensemble and znode trees as the previous slide, with a Quorum Manager sitting between Client 1 and the servers.]
● A quorum manager issues a request to a number of systems, takes the results, compares the timestamps (or vector clocks), and returns the most up-to-date data to the client (see the sketch below).
● A quorum manager can exist in the cluster – in each component – or external to the system as a service.
● According to Wikipedia, a quorum is the minimum number of members of a deliberative body necessary to conduct the business of that group. Ordinarily this is a majority of the people expected to be there, although many bodies may have a lower or higher quorum.
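A toy version of the quorum read described above, with plain in-memory dicts standing in for replicas and integer timestamps standing in for vector clocks; it is a sketch of the idea, not any particular product's implementation.

```python
import random

# Three replica copies of the same datum; one is newer than the others.
replicas = [
    {"value": "v1", "ts": 100},   # stale copy
    {"value": "v2", "ts": 175},   # newest copy
    {"value": "v1", "ts": 100},   # stale copy
]

def quorum_read(replicas, quorum):
    responses = []
    for replica in replicas:
        if random.random() < 0.9:          # a replica may fail to answer
            responses.append(replica)
        if len(responses) >= quorum:
            break
    if len(responses) < quorum:
        raise RuntimeError("quorum not reached")
    # Compare timestamps (a vector clock would capture true causality)
    # and hand the most up-to-date data back to the client.
    return max(responses, key=lambda r: r["ts"])["value"]

print(quorum_read(replicas, quorum=2))
```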
12. Pattern: Data Lookup and Replication – HDFS
[Diagram: a client reads or writes file data through the NameNode, which holds the file-to-block mapping while the blocks themselves are replicated across DataNodes 1–5. Example NameNode metadata: /user/my-company/file-part-0, r:3, {1,3}; /user/my-company/file-part-1, r:3, {2,4}; /user/my-company/file-part-2, r:3, {5,6}. See http://hadoop.apache.org/docs/r1.2.1/hdfs_design.html]
● Example: WebHDFS first contacts the NameNode to find out which DataNodes to write to, or which DataNodes to read from (see the sketch below).
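The WebHDFS flow mentioned above can be sketched with the requests library. The NameNode host, the port (50070 on older Hadoop releases), the user name, and the file path are assumptions for illustration; the two-step redirect (NameNode, then DataNode) is the documented WebHDFS read pattern.

```python
import requests

NAMENODE = "http://namenode.example.com:50070"   # assumed NameNode address
PATH = "/user/my-company/file-part-0"            # assumed file path

# Step 1: ask the NameNode; it answers with a 307 redirect naming a
# DataNode that holds a replica of the requested data.
resp = requests.get(f"{NAMENODE}/webhdfs/v1{PATH}?op=OPEN&user.name=hdfs",
                    allow_redirects=False)
datanode_url = resp.headers["Location"]

# Step 2: read the actual bytes from that DataNode.
data = requests.get(datanode_url).content
print(len(data), "bytes read via", datanode_url)
```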
13. Consistent Hashing – Replicated Data (Cassandra)
[Diagram: a ring of nodes A, B, C, with each node's data replicated to its neighbors.]
● The write position is hash based, off the namespace and key: find the node on the ring whose range of keys contains the current key, and write the data to that node.
● There are two write modes:
– Quorum write: blocks until a quorum is reached.
– Async write: sends the request to any node; that node pushes the data to the appropriate nodes but returns to the client immediately.
● If the target node is down, the data is written to another node with a hint saying where it should be written. A harvester runs every 15 minutes, finds the hints, and moves the data to the appropriate node (see the ring sketch below).
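A minimal consistent-hash ring with hinted handoff, in the spirit of the slide rather than Cassandra's actual implementation; the node names, the MD5 hash choice, and the in-memory hint list are illustrative.

```python
import bisect
import hashlib

def h(value):
    # Map a string onto a position on the ring.
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes):
        self.ring = sorted((h(n), n) for n in nodes)
        self.down = set()
        self.hints = []      # (intended_node, key, value) replayed by a harvester

    def _owner_index(self, key):
        # Find the node whose key range contains this key (clockwise walk).
        idx = bisect.bisect(self.ring, (h(key), ""))
        return idx % len(self.ring)

    def write(self, key, value):
        idx = self._owner_index(key)
        owner = self.ring[idx][1]
        if owner in self.down:
            # Hinted handoff: store on the next node with a hint saying
            # where the data really belongs.
            fallback = self.ring[(idx + 1) % len(self.ring)][1]
            self.hints.append((owner, key, value))
            return f"stored on {fallback} with hint for {owner}"
        return f"stored on {owner}"

ring = Ring(["A", "B", "C"])
print(ring.write("namespace:key-17", "payload"))

# Simulate the owner of this key going down to exercise the hint path.
owner = ring.ring[ring._owner_index("namespace:key-17")][1]
ring.down.add(owner)
print(ring.write("namespace:key-17", "payload"))
```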
14. Consistent Hashing – Replicated Data (Cassandra)
[Diagram: the ring in its initial state (nodes A, B, C), after node B fails, and after a new node is added. Note that all data (A, B, C) is replicated; if it were not, a node's failure would result in data loss.]
● If the node that was hosting B's data goes down, the node next to it on the ring takes over, using B's replicated data, and becomes the host for B's data.
● If a node is added to a partition, it will share some of the data that exists in that partition. The data it is responsible for is based on its hashed position in the ring. This results in a division of the keys between the two nodes. Interestingly, it promotes load balancing as well, since the load is now shared between two data nodes (see the rebalancing sketch below).
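A short, purely illustrative sketch of why adding a node is cheap with consistent hashing: only the keys that hash into the new node's slice of the ring move to it; everything else stays put (roughly 1/(number of nodes) of the keys).

```python
import bisect
import hashlib

def h(value):
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

def owner(key, nodes):
    # The owner is the first node clockwise from the key's ring position.
    ring = sorted((h(n), n) for n in nodes)
    idx = bisect.bisect(ring, (h(key), ""))
    return ring[idx % len(ring)][1]

keys = [f"key-{i}" for i in range(1000)]
before = {k: owner(k, ["A", "B", "C"]) for k in keys}
after = {k: owner(k, ["A", "B", "C", "D"]) for k in keys}

moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved} of {len(keys)} keys moved to the new node")  # roughly a quarter
```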
15. Conclusion
● Vertical scaling is expensive and error prone.
● Horizontal scaling is elastic, responsive, fault tolerant, and self-healing.
● Distributed systems affect all aspects of software development:
– programming models,
– testing,
– deployment,
– maintenance.
● There are best practices and patterns for designing your distributed system.
● Many existing systems (Cassandra, Hadoop, Solr, Riak, the Netflix platform) are implementations of these patterns. Look under the hood. Use the patterns to "roll your own".
16. Questions