Distributed systems and scalability rules


JavaDay 2013 presentation on the main principles and concepts of distributed systems and on basic scalability rules.

Published in: Technology

  1. Distributed Systems and Scalability Rules. Oleg Tsal-Tsalko. Email: oleg.tsalko@gmail.com, Skype: oleg.tsalko, Twitter: @tsaltsol
  2. What is a distributed system? A distributed system is a collection of independent computers that coordinate their activity, share resources, and appear to users as a single coherent system.
  3. Why do we need distributed systems?
     • The nature of the application requires a distributed network/system
     • Availability/reliability (no single point of failure)
     • Performance (a bunch of commodity servers can deliver more performance than one supercomputer)
     • Cost efficiency (a bunch of commodity servers costs less than one supercomputer)
  4. Examples
     • Telecom networks (telephone/computer networks)
     • WWW, peer-to-peer networks
     • Multiplayer online games
     • Distributed databases
     • Network file systems
     • Aircraft control systems
     • Scientific computing (cluster/grid computing)
     • Distributed rendering
  5. Distributed systems characteristics
     • Lack of a global clock
     • Multiple autonomous components
     • Components are not shared by all users
     • Resources may not be accessible
     • Software runs in concurrent processes on different processors
     • Multiple points of control (distributed management)
     • Multiple points of failure (fault tolerance)
     • The structure of the system (network topology, network latency, number of computers) is not known in advance
     • Each computer has only a limited, incomplete view of the system
  6. Advantages over centralized systems
     • Scalability: the system can easily be expanded by adding more machines as needed.
     • Redundancy: several machines can provide the same services, so if one is unavailable, work does not stop.
     • Economics: a collection of microprocessors offers a better price/performance ratio than mainframes, which makes it a cost-effective way to increase computing power.
     • Reliability: if one machine crashes, the system as a whole can still survive.
     • Speed: a distributed system may have more total computing power than a mainframe.
     • Incremental growth: computing power can be added in small increments.
  7. Advantages over independent PCs
     • Data sharing: allows many users to access common data
     • Resource sharing: allows shared access to common resources
     • Communication: enhances human-to-human communication
     • Flexibility: spreads the workload over the available machines
  8. Parallel computing vs. distributed computing
     • In parallel computing, all processors may have access to a shared memory to exchange information between processors.
     • In distributed computing, each processor has its own private memory (distributed memory). Information is exchanged by passing messages between the processors.
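The difference between the two models can be illustrated with a minimal sketch. It uses threads inside one process purely as a stand-in for processors; the function names `shared_memory_sum` and `message_passing_sum` are hypothetical and not part of the presentation.

```python
import queue
import threading


def shared_memory_sum(values, workers=4):
    """Parallel style: every worker updates one shared total, guarded by a lock."""
    total = 0
    lock = threading.Lock()

    def work(chunk):
        nonlocal total
        partial = sum(chunk)
        with lock:  # shared memory needs explicit synchronization
            total += partial

    threads = [threading.Thread(target=work, args=(values[i::workers],))
               for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


def message_passing_sum(values, workers=4):
    """Distributed style: workers keep private state and only send messages."""
    inbox = queue.Queue()

    def work(chunk):
        inbox.put(sum(chunk))  # the only interaction is a message

    threads = [threading.Thread(target=work, args=(values[i::workers],))
               for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(inbox.get() for _ in range(workers))
```

In a real distributed system the queue would be replaced by a network channel, which adds latency and the possibility of lost messages.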
  9. Algorithms
     • Parallel algorithms in the shared-memory model: all computers have access to a shared memory. The algorithm designer chooses the program executed by each computer.
     • Parallel algorithms in the message-passing model: the algorithm designer chooses the structure of the network, as well as the program executed by each computer.
     • Distributed algorithms in the message-passing model: the algorithm designer only chooses the computer program. All computers run the same program. The system must work correctly regardless of the structure of the network.
  10. It turns out that distributed systems have some fundamental problems!
  11. Byzantine fault-tolerance problem. The objective of Byzantine fault tolerance is to defend against Byzantine failures, in which components of a system fail in arbitrary ways. Known algorithms can ensure correct operation only if fewer than 1/3 of the processes are faulty.
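The "fewer than 1/3" bound can be restated as n >= 3f + 1, where n is the number of processes and f the number of faulty ones. A small sketch of that arithmetic follows; the function names are hypothetical helpers, not part of any library.

```python
def max_byzantine_faults(n):
    """Largest f such that f < n / 3, i.e. n >= 3f + 1."""
    if n < 1:
        raise ValueError("need at least one process")
    return (n - 1) // 3


def min_processes_for(f):
    """Smallest n that keeps f faulty processes below one third."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return 3 * f + 1
```

For example, a group of 4 processes can tolerate 1 Byzantine process, while a group of 3 cannot tolerate any.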
  12. Byzantine Generals and Two Generals problems. It has been proved that there is no solution to these problems other than a probabilistic one…
  13. Consensus problem
      • Agreeing on the identity of a leader
      • State-machine replication
      • Atomic broadcasts
      There are a number of protocols that solve the consensus problem in distributed systems, such as the widely used Paxos consensus protocol: http://en.wikipedia.org/wiki/Paxos_algorithm
  14. Types of distributed systems
      • Cluster computing systems
      • Grid computing systems
  15. Grid computing. Grid computing is the collection of computer resources from multiple locations to reach a common goal. What distinguishes grid computing from conventional high-performance computing systems such as cluster computing is that grids tend to be more loosely coupled, heterogeneous, and geographically dispersed.
  16. Cluster computing. Computer clustering relies on a centralized management approach which makes the nodes available as orchestrated shared servers. It is distinct from other approaches such as peer-to-peer or grid computing, which also use many nodes but with a far more distributed nature.
  17. Distributed systems design and architecture principles
      • The art of simplicity
      • Scaling out (X/Y/Z-axis)
      • Aggressive use of caching
      • Using messaging whenever possible
      • Redundancy to achieve HA
      • Replication
      • Sharding
      • Scaling your database layer
      • Data locality
      • Consistency
      • Fault tolerance
      • CAP theorem
  18. Architecture principles ® Copyright 2012 Gigaspaces Ltd. All Rights Reserved
  19. HA nodes configuration
      • Active/active (load balanced): traffic intended for the failed node is either passed on to an existing node or load balanced across the remaining nodes.
      • Active/passive: provides a fully redundant instance of each node, which is only brought online when its associated primary node fails:
        o Hot standby: software components are installed and available on both primary and secondary nodes.
        o Warm standby: the software component is installed and available on the secondary node. The secondary node is up and running.
        o Cold standby: the secondary node acts as a backup of another identical primary system. It will be installed and configured only when the primary node breaks down for the first time.
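As an illustration of the active/active configuration, here is a minimal sketch of a round-robin load balancer that spreads traffic across the remaining nodes when one fails. The `LoadBalancer` class and its method names are assumptions made for this example; real load balancers detect failures through health checks rather than explicit `mark_down` calls.

```python
class LoadBalancer:
    """Round-robin routing that skips nodes marked as failed."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.healthy = set(self.nodes)
        self._next = 0

    def mark_down(self, node):
        self.healthy.discard(node)

    def mark_up(self, node):
        if node in self.nodes:
            self.healthy.add(node)

    def route(self):
        # Try each node at most once per request, starting after the last pick.
        for _ in range(len(self.nodes)):
            node = self.nodes[self._next % len(self.nodes)]
            self._next += 1
            if node in self.healthy:
                return node
        raise RuntimeError("no healthy nodes available")
```

With nodes a, b and c, marking b as down leaves requests alternating between a and c until b is marked up again.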
  20. Redundancy as is
      • Redundant Web/App servers
      • Redundant databases
      • Disk mirroring
      • Redundant network
      • Redundant storage network
      • Redundant electrical power
  21. Redundancy in HA cluster
      • Easy start/stop procedures
      • Using NAS/SAN shared storage
      • The app should be able to store its state in shared storage
      • The app should be able to restart from the stored shared state on another node
      • The app shouldn’t corrupt data if it crashes or is restarted
  22. Replication. Replication in computing involves sharing information so as to ensure consistency between redundant resources.
      • Primary-backup (master-slave) schema: only the primary node processes requests.
      • Multi-primary (multi-master) schema: all nodes process requests simultaneously and distribute state between each other.
      Backup differs from replication in that it saves a copy of data unchanged for a long period of time. Replicas, on the other hand, undergo frequent updates and quickly lose any historical state.
  23. Replication models
      • Transactional replication: synchronous replication to a number of nodes.
      • State machine replication: uses a state machine based on the Paxos algorithm.
      • Virtual synchrony (performance over fault tolerance): sends asynchronous events to other nodes.
      • Synchronous replication (consistency over performance): guarantees "zero data loss" by means of an atomic write operation.
      • Asynchronous replication (performance over consistency, eventual consistency): a write is considered complete as soon as local storage acknowledges it. Remote storage is updated, but probably with a small lag.
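The trade-off between synchronous and asynchronous replication can be sketched with an in-memory primary and its backups. This is a toy model under stated assumptions: the `Primary` and `Replica` classes are hypothetical, there is no network, and `flush` stands in for the background process that ships pending updates.

```python
from collections import deque


class Replica:
    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value


class Primary:
    def __init__(self, backups, synchronous):
        self.data = {}
        self.backups = backups
        self.synchronous = synchronous
        self.pending = deque()  # updates not yet shipped (async mode only)

    def write(self, key, value):
        self.data[key] = value
        if self.synchronous:
            # Consistency over performance: update every backup before acking.
            for backup in self.backups:
                backup.apply(key, value)
        else:
            # Performance over consistency: ack after the local write only.
            self.pending.append((key, value))
        return "ack"

    def flush(self):
        """Ship queued updates to the backups (the replication lag ends here)."""
        while self.pending:
            key, value = self.pending.popleft()
            for backup in self.backups:
                backup.apply(key, value)
```

In asynchronous mode a crash of the primary before `flush` runs would lose the queued updates, which is the data-loss window that synchronous replication closes.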
  24. Sharding (Partitioning). Sharding is the process of storing data records across multiple machines to meet the demands of data growth. Why sharding?
      • High query rates can exhaust the CPU capacity of the server.
      • Larger data sets exceed the storage capacity of a single machine.
      • Finally, working set sizes larger than the system’s RAM stress the I/O capacity of disk drives.
  25. Sharding (Partitioning)
      • Sharding reduces the number of operations each shard handles.
      • Sharding reduces the amount of data that each server needs to store.
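One common way to decide which shard holds a record is to hash its key. The sketch below assumes hash-based routing with a fixed number of shards; the presentation does not say which scheme it has in mind, and `shard_for` and `ShardedStore` are hypothetical names.

```python
import hashlib


def shard_for(key, num_shards):
    """Map a string key to a shard index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Each dict stands in for a separate machine."""

    def __init__(self, num_shards):
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, key, value):
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key):
        return self.shards[shard_for(key, len(self.shards))].get(key)
```

Each shard stores and serves only its own subset of keys. Note that simple modulo routing remaps most keys when the number of shards changes; consistent hashing is a frequent alternative when shards are added or removed often.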
  26. Data Partitioning Principles. [Diagrams: "Partitioned Data", with a feeder writing to partitions hosted on separate virtual machines; "Partitioned Data with Backup Per Partition", with a feeder writing to Primary 1 and Primary 2, each replicated to its own backup (Backup 1, Backup 2) on a separate virtual machine.]
  27. Split-brain problem. Occurs when connectivity between nodes in a cluster is lost and the cluster is divided into several parts. Solutions:
      • Optimistic approach (availability over consistency)
        o Leave as is and rely on a later resync (Hazelcast)
      • Pessimistic approach (consistency over availability)
        o Keep only one partition live until connectivity is restored (MongoDB)
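One common way to implement the pessimistic approach is a majority quorum: only a partition that can see more than half of the cluster stays live. The following is a sketch of that idea under the assumption that cluster membership is known in advance; it does not model how any particular product behaves, and the function names are hypothetical.

```python
def has_quorum(partition_size, cluster_size):
    """True if the partition holds a strict majority of the cluster."""
    return partition_size > cluster_size // 2


def live_partitions(partitions):
    """Return the partitions allowed to keep serving requests."""
    cluster_size = sum(len(p) for p in partitions)
    return [p for p in partitions if has_quorum(len(p), cluster_size)]
```

At most one partition can hold a strict majority, so two sides can never both accept writes. An even split leaves no partition live, which is one reason clusters are often sized with an odd number of nodes.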
  28. Consistency
      • Strong: after an update completes, any subsequent access will return the updated value.
      • Weak: the system does not guarantee that subsequent accesses will return the updated value.
      • Eventual: the storage system guarantees that if no new updates are made to the object, eventually all accesses will return the last updated value.
  29. Eventually consistent
      • Strong => W + R > N
      • Weak/Eventual => W + R <= N
      • Optimized read => R = 1, W = N
      • Optimized write => W = 1, R = N
      where N is the number of nodes, W is the number of replicas that must acknowledge an update, and R is the number of replicas contacted for a read.
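The quorum rule can be checked mechanically. The sketch below classifies a configuration from N, W and R; the function name and the returned labels are hypothetical. The intuition is that when W + R > N, every read set overlaps every write set in at least one replica, so a read always reaches a replica that has the latest acknowledged write.

```python
def consistency_level(n, w, r):
    """Classify a replica configuration using the W + R > N rule."""
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("W and R must be between 1 and N")
    return "strong" if w + r > n else "weak/eventual"
```

For N = 3, the pair W = 2, R = 2 is strong, while W = 1, R = 1 is only eventually consistent.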
  30. Fault tolerance (Architecture concepts). Approaches to building a fault-tolerant system:
      • No single point of failure
      • Fault isolation
      • Roll-back/roll-forward procedures
      • Replication
      • Redundancy
      • Diversity: several alternative implementations of some functionality
  31. Fault tolerance (Design principles)
      • Design using fault-isolated “swimlanes”
      • Never trust a single point of failure
      • Avoid putting systems in series
      • Ensure you have a “switch on/switch off” for your new functionality
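The "switch on/switch off" principle is often implemented as a feature toggle that can be flipped at runtime without a redeploy. A minimal in-memory sketch follows; the `FeatureToggles` class, the `new_pricing` flag and the `checkout` function are all hypothetical, and a production system would typically read flags from shared configuration.

```python
class FeatureToggles:
    """Runtime switches for new functionality; unknown flags are off."""

    def __init__(self, defaults=None):
        self._flags = dict(defaults or {})

    def enable(self, name):
        self._flags[name] = True

    def disable(self, name):
        self._flags[name] = False

    def is_enabled(self, name):
        return self._flags.get(name, False)


def checkout(toggles, cart_total):
    """Use the new code path only while its switch is on."""
    if toggles.is_enabled("new_pricing"):
        return round(cart_total * 0.9, 2)  # new, riskier behavior
    return cart_total  # proven behavior stays the fallback
```

If the new path misbehaves, calling `disable("new_pricing")` returns traffic to the old path immediately.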
  32. Data locality. Put data closer to clients by scaling along the Z-axis. Locate processing units near the data to be processed.
  33. BASE
      • Basically Available
      • Soft state
      • Eventual consistency
      An alternative to the well-known ACID model, used in distributed systems to relax strong consistency constraints in favor of higher availability together with partition tolerance, as per the CAP theorem.
  34. CAP theorem
  35. CAP proof
  36. Eric Brewer’s quote: “Because partitions are rare, CAP should allow perfect C and A most of the time, but when partitions are present or perceived, a strategy that detects partitions and explicitly accounts for them is in order. This strategy should have three steps: detect partitions, enter an explicit partition mode that can limit some operations, and initiate a recovery process to restore consistency and compensate for mistakes made during a partition.”
  37. Eric Brewer’s recipe
  38. Design principles
  39. Scaling out (X/Y/Z axis)
      • X-axis: horizontal duplication (design to clone things)
      • Y-axis: split by function, service or resource (design to split different things)
      • Z-axis: lookup-based split (design to split similar things)
  40. The art of simplicity
      • KISS (keep it simple). Don’t overengineer a solution.
      • Simplify the solution 3 times over (scope, design, implementation)
      • Reduce DNS lookups
      • Reduce objects where possible (Google main page)
      • Use homogeneous networks where possible
      • Avoid too many traffic redirects
      • Don’t check your work (avoid defensive programming)
      • Relax temporal constraints where possible
  41. Aggressive use of caching
      • Use expires headers
      • Cache AJAX calls
      • Leverage page caches (proxy web servers)
      • Utilize application caches
      • Use object caches (ORM level)
      • Put caches in their own tier
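Several of these rules share one mechanism: keep a computed value for a limited time, then refresh it, much like an HTTP expires header does for a browser. Below is a minimal sketch of an application-level cache with time-based expiry. The `TTLCache` class and its `loader` callback are assumptions made for illustration and are not tied to any caching product.

```python
import time


class TTLCache:
    """Cache values for ttl_seconds, then reload them on the next access."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries = {}   # key -> (value, expires_at)

    def get(self, key, loader):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]  # cache hit: the backend is not touched
        value = loader(key)  # cache miss or expired entry: go to the source
        self._entries[key] = (value, now + self._ttl)
        return value
```

A shorter TTL keeps data fresher at the cost of more backend calls; the right value depends on how stale a response the clients can tolerate.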
  42. Caching on different levels
  43. Using messaging whenever possible
      • Communicate asynchronously as much as possible
      • Ensure your message bus can scale
      • Avoid overcrowding your message bus
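Asynchronous communication can be sketched with a bounded in-process queue standing in for the message bus. The `run_pipeline` function and its `max_pending` parameter are hypothetical; a real message bus would be a separate service, but the producer/consumer shape is the same.

```python
import queue
import threading


def run_pipeline(messages, handler, max_pending=100):
    """Send messages through a bounded queue to a single consumer thread."""
    bus = queue.Queue(maxsize=max_pending)  # bounded, so the bus cannot overfill
    results = []
    stop = object()  # sentinel that tells the consumer to finish

    def consumer():
        while True:
            message = bus.get()
            if message is stop:
                break
            results.append(handler(message))

    worker = threading.Thread(target=consumer)
    worker.start()
    for message in messages:
        bus.put(message)  # returns once queued; blocks only when the bus is full
    bus.put(stop)
    worker.join()
    return results
```

The bound on the queue applies back-pressure to the producer, which is one simple way to avoid overcrowding the bus.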
  44. Scaling your database layer
      • Denormalize data where possible, because relationships are costly.
      • Use the right type of lock.
      • Avoid using multiphase commits and distributed transactions.
      • Avoid using “select for update” statements.
      • Don’t select everything.
  45. Thank you and questions! Oleg Tsal-Tsalko. Email: oleg.tsalko@gmail.com, Skype: oleg.tsalko, Twitter: @tsaltsol