Designing Large-Scale Distributed Systems

Presentation Transcript

  • Designing Large-Scale Distributed Systems
    Ashwani Priyedarshi
  • “The network is the computer.”
    John Gage, Sun Microsystems
  • “A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable.”
    Leslie Lamport
  • “Of three properties of distributed data systems – consistency, availability, partition-tolerance – choose two.”
    Eric Brewer, CAP Theorem, PODC 2000
  • Agenda
    ● Consistency Models
    ● Transactions
    ● Why to distribute?
    ● Decentralized Architecture
    ● Design Techniques & Tradeoffs
    ● Few Real-World Examples
    ● Conclusions
  • Consistency Model
    ● Restricts the possible values that a read operation on an item can return
      – Some models are very restrictive, others less so
      – The less restrictive ones are easier to implement
    ● The most natural semantics for a storage system is “a read should return the last written value”
      – With concurrent accesses and multiple replicas, it is not easy to identify what “last write” means
  • Strict Consistency
    ● Assumes the existence of absolute global time
    ● Impossible to implement in a large distributed system
    ● No two operations (in different clients) are allowed at the same time
    ● Example: sequence (a) satisfies strict consistency, but sequence (b) does not (figure not preserved in this transcript)
  • Sequential Consistency
    ● The result of any execution is the same as if
      – the read and write operations by all processes on the data store were executed in some sequential order, and
      – the operations of each individual process appear in this sequence in the order specified by its program
    ● All processes see the same interleaving of operations
    ● Many interleavings are valid
    ● Different runs of a program might act differently
    ● Example: sequence (a) satisfies sequential consistency, but sequence (b) does not (figure not preserved in this transcript)
  • Consistency vs. Availability
    ● In large shared-data distributed systems, network partitions are a given
    ● Choose consistency or availability
    ● Either choice requires the client developer to be aware of what the system is offering
  • Eventual Consistency
    ● An eventually consistent storage system guarantees that if no new updates are made to an object, eventually all accesses will return the last updated value
    ● If no failures occur, the maximum size of the inconsistency window can be determined from factors such as:
      – load on the system
      – communication delays
      – number of replicas
    ● The most popular system that implements eventual consistency is DNS
  • Quorum-Based Technique
    ● Enforces consistent operation in a distributed system
    ● Consider the following parameters:
      – N = total number of replicas
      – W = replicas that must acknowledge a write
      – R = replicas accessed during a read
    ● If W + R > N
      – the read set and the write set always overlap, and one can guarantee strong consistency (see the sketch below)
    ● If W + R <= N
      – the read and write sets might not overlap, and consistency cannot be guaranteed
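The overlap argument lends itself to a quick demonstration. Below is a minimal, hypothetical Python sketch (not from the deck): it keeps N replica copies, writes to any W of them, reads from any R, and uses version numbers to pick the newest value. Because W + R > N forces the two quorums to intersect, every read sees the latest write.

    import random

    class QuorumStore:
        """Toy quorum-replicated register: N replicas, write quorum W, read quorum R."""

        def __init__(self, n, w, r):
            assert w <= n and r <= n
            self.n, self.w, self.r = n, w, r
            self.replicas = [(0, None)] * n   # each replica holds (version, value)
            self.version = 0

        def write(self, value):
            # Wait for acknowledgements from W replicas (here: any W of them).
            self.version += 1
            for i in random.sample(range(self.n), self.w):
                self.replicas[i] = (self.version, value)

        def read(self):
            # Contact R replicas and return the value with the highest version.
            contacted = random.sample(range(self.n), self.r)
            return max(self.replicas[i] for i in contacted)[1]

    store = QuorumStore(n=5, w=3, r=3)   # W + R = 6 > N = 5
    store.write("v1")
    store.write("v2")
    assert store.read() == "v2"          # quorums overlap, so the read is current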
  • Agenda
    ● Consistency Models
    ● Transactions
    ● Why to distribute?
    ● Decentralized Architecture
    ● Design Techniques & Tradeoffs
    ● Few Real-World Examples
    ● Conclusions
  • Transactions
    ● Extended form of consistency across multiple operations
    ● Example: transfer money from A to B (sketched in code below)
      – Subtract from A
      – Add to B
    ● What if something happens in between?
      – Another transaction on A or B
      – Machine crashes
      – ...
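As a concrete rendering of the transfer example, here is a minimal sketch using Python's built-in sqlite3 module (the choice of database is an assumption; the deck names none). The with-block opens a transaction, so the subtraction and the addition either both commit or both roll back.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 100), ("B", 0)])

    try:
        with conn:  # transaction: commits on success, rolls back on error
            conn.execute("UPDATE accounts SET balance = balance - 50 WHERE name = 'A'")
            # A failure here would undo the subtraction above as well.
            conn.execute("UPDATE accounts SET balance = balance + 50 WHERE name = 'B'")
    except sqlite3.Error:
        pass  # the transfer happened completely or not at all

    print(dict(conn.execute("SELECT name, balance FROM accounts")))
    # {'A': 50, 'B': 50}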
  • Why Transactions?
    ● Correctness
    ● Consistency
    ● Enforce invariants
    ● ACID
  • Agenda
    ● Consistency Models
    ● Transactions
    ● Why to distribute?
    ● Decentralized Architecture
    ● Design Techniques & Tradeoffs
    ● Few Real-World Examples
    ● Conclusions
  • Why to distribute?
    ● Catastrophic failures
    ● Expected failures
    ● Routine maintenance
    ● Geolocality
      – CDN, edge caching
  • Why NOT to distribute?
    ● Within a datacenter
      – High bandwidth: 1-100 Gbps interconnects
      – Low latency: < 1 ms within a rack, < 5 ms across
      – Little to no cost
    ● Between datacenters
      – Low bandwidth: 10 Mbps - 1 Gbps
      – High latency: expect 100s of ms
      – High cost for fiber
  • Agenda
    ● Consistency Models
    ● Transactions
    ● Why to distribute?
    ● Decentralized Architecture
    ● Design Techniques & Tradeoffs
    ● Few Real-World Examples
    ● Conclusions
  • Decentralized Architecture
    ● Operating from multiple datacenters simultaneously
    ● A hard problem
    ● Maintaining consistency? Harder
    ● Transactions? Hardest
  • Option 1: Don't
    ● Most common
      – Make sure the datacenter never goes down
    ● Bad at catastrophic failure
      – Large-scale data loss
    ● Not great for serving
      – No geolocation
  • Option 2: Primary with Hot Failover(s)
    ● Better, but not ideal
      – Mediocre at catastrophic failure
      – Window of lost data
      – Failover data may be inconsistent
    ● Geolocated for reads, not for writes
  • Option 3: Truly Distributed
    ● Simultaneous writes in different DCs while maintaining consistency
    ● Two-way: hard
    ● N-way: harder
    ● Handles catastrophic failure and geolocality
    ● But high latency
  • Agenda
    ● Consistency Models
    ● Transactions
    ● Why to distribute?
    ● Decentralized Architecture
    ● Design Techniques & Tradeoffs
    ● Few Real-World Examples
    ● Conclusions
  • Tradeoffs
                    Backups   M/S        MM          2PC         Paxos
    Consistency
    Transactions
    Latency
    Throughput
    Data Loss
    Failover
  • Backups
    ● Make a copy
    ● Weak consistency
    ● Usually no transactions
  • Tradeoffs – Backups
                    Backups   M/S        MM          2PC         Paxos
    Consistency     Weak
    Transactions    No
    Latency         Low
    Throughput      High
    Data Loss       High
    Failover        Down
  • Master/Slave Replication
    ● Usually asynchronous
      – Good for throughput and latency
    ● Weak/eventual consistency
    ● Supports transactions
  • Tradeoffs – Master/Slave
                    Backups   M/S        MM          2PC         Paxos
    Consistency     Weak      Eventual
    Transactions    No        Full
    Latency         Low       Low
    Throughput      High      High
    Data Loss       High      Some
    Failover        Down      Read Only
  • Multi-Master Replication
    ● Asynchronous, eventual consistency
    ● Concurrent writes
    ● Needs a serialization protocol
      – e.g. monotonically increasing timestamps (see the sketch below)
      – either with master election or a distributed consensus protocol
    ● No strong consistency
    ● No global transactions
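One common way to realize serialization via monotonically increasing timestamps is a last-write-wins register, sketched below (a hypothetical Python illustration, not the deck's design): each master stamps its writes, every master keeps whichever write carries the higher stamp, and ties are broken by master ID, so all masters converge regardless of delivery order.

    import time

    class LWWRegister:
        """Last-write-wins register kept by one master in a multi-master setup."""

        def __init__(self, master_id):
            self.master_id = master_id
            self.value, self.stamp = None, (0, "")

        def local_write(self, value):
            # time_ns() is a stand-in for a monotonically increasing timestamp.
            self.stamp = (time.time_ns(), self.master_id)
            self.value = value
            return value, self.stamp   # replicated asynchronously to peers

        def apply_remote(self, value, stamp):
            if stamp > self.stamp:     # keep the later write; ties break on ID
                self.value, self.stamp = value, stamp

    a, b = LWWRegister("dc-a"), LWWRegister("dc-b")
    va, sa = a.local_write("from A")   # concurrent writes in two datacenters
    vb, sb = b.local_write("from B")
    a.apply_remote(vb, sb)
    b.apply_remote(va, sa)
    assert a.value == b.value          # both masters converge on one winner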
  • Tradeoffs – Multi-Master
                    Backups   M/S        MM          2PC         Paxos
    Consistency     Weak      Eventual   Eventual
    Transactions    No        Full       Local
    Latency         Low       Low        Low
    Throughput      High      High       High
    Data Loss       High      Some       Some
    Failover        Down      Read Only  Read/write
  • Two-Phase Commit
    ● Semi-distributed consensus protocol
      – deterministic coordinator
    ● Phase 1: request; Phase 2: commit/abort (see the sketch below)
    ● Heavyweight, synchronous, high latency
    ● 3PC: asynchronous (one extra round trip)
    ● Poor throughput
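Here is a minimal in-process sketch of the two phases, with hypothetical Participant objects standing in for remote resource managers (a real implementation also needs durable logs, timeouts, and recovery):

    class Participant:
        def __init__(self, name):
            self.name, self.staged, self.state = name, None, {}

        def prepare(self, update):      # phase 1: "can you commit this?"
            self.staged = update
            return True                 # vote yes (a real participant may vote no)

        def commit(self):               # phase 2a
            self.state.update(self.staged)

        def abort(self):                # phase 2b
            self.staged = None

    def two_phase_commit(participants, update):
        # Phase 1: the coordinator asks every participant to prepare.
        votes = [p.prepare(update) for p in participants]
        # Phase 2: commit only if every vote was yes; otherwise abort everywhere.
        if all(votes):
            for p in participants:
                p.commit()
            return "committed"
        for p in participants:
            p.abort()
        return "aborted"

    nodes = [Participant("n1"), Participant("n2"), Participant("n3")]
    print(two_phase_commit(nodes, {"x": 42}))   # committed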
  • Tradeoffs – 2PC
                    Backups   M/S        MM          2PC         Paxos
    Consistency     Weak      Eventual   Eventual    Strong
    Transactions    No        Full       Local       Full
    Latency         Low       Low        Low         High
    Throughput      High      High       High        Low
    Data Loss       High      Some       Some        None
    Failover        Down      Read Only  Read/write  Read/write
  • Paxos
    ● Decentralized, distributed consensus protocol
    ● Similar in shape to 2PC/3PC
      – lighter, but still high latency
    ● Three classes of agents: proposers, acceptors, learners
    ● 1. a) prepare, b) promise; 2. a) accept, b) accepted (see the sketch below)
    ● Survives minority failure
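Below is a minimal single-decree Paxos sketch, assuming synchronous in-process calls and no message loss (learners, retries with higher proposal numbers, and failure handling are omitted). It shows the prepare/promise and accept/accepted exchanges and the rule that a proposer must adopt the highest-numbered value already accepted:

    class Acceptor:
        def __init__(self):
            self.promised = -1          # highest proposal number promised
            self.accepted = None        # (number, value) last accepted, if any

        def prepare(self, n):           # phase 1b: promise
            if n > self.promised:
                self.promised = n
                return True, self.accepted
            return False, None

        def accept(self, n, value):     # phase 2b: accepted
            if n >= self.promised:
                self.promised = n
                self.accepted = (n, value)
                return True
            return False

    def propose(acceptors, n, value):
        majority = len(acceptors) // 2 + 1
        # Phase 1a: prepare. Collect promises from a majority.
        promises = [a.prepare(n) for a in acceptors]
        granted = [acc for ok, acc in promises if ok]
        if len(granted) < majority:
            return None
        # Adopt the highest-numbered value any acceptor has already accepted.
        prior = [acc for acc in granted if acc is not None]
        if prior:
            value = max(prior)[1]
        # Phase 2a: accept. The value is chosen once a majority accepts it.
        acks = sum(a.accept(n, value) for a in acceptors)
        return value if acks >= majority else None

    acceptors = [Acceptor() for _ in range(5)]
    print(propose(acceptors, n=1, value="A"))   # 'A' is chosen
    print(propose(acceptors, n=2, value="B"))   # still 'A': the decision sticks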
  • Tradeoffs
                    Backups   M/S        MM          2PC         Paxos
    Consistency     Weak      Eventual   Eventual    Strong      Strong
    Transactions    No        Full       Local       Full        Full
    Latency         Low       Low        Low         High        High
    Throughput      High      High       High        Low         Medium
    Data Loss       High      Some       Some        None        None
    Failover        Down      Read Only  Read/write  Read/write  Read/write
  • Agenda
    ● Consistency Models
    ● Transactions
    ● Why to distribute?
    ● Decentralized Architecture
    ● Design Techniques & Tradeoffs
    ● Few Real-World Examples
    ● Conclusions
  • Examples
    ● Megastore
      – Google's scalable, highly available datastore
      – Strong consistency, Paxos
      – Optimized for reads
    ● Dynamo
      – Amazon's highly available key-value store
      – Eventual consistency, consistent hashing, vector clocks
      – Optimized for writes
    ● PNUTS
      – Yahoo's massively parallel and distributed database system
      – Timeline consistency
      – Optimized for reads
  • Conclusions
    ● No silver bullet
      – There are no simple solutions
    ● Design systems based on application needs
  • The End
  • Backup Slides
  • Vector Clocks
    ● Used to capture causality between different versions of the same object
    ● A vector clock is a list of (node, counter) pairs
    ● Every version of every object is associated with one vector clock
    ● If all the counters in the first object's clock are less than or equal to the corresponding counters in the second clock, then the first is an ancestor of the second and can be forgotten
  • Vector Clock Example (figure not preserved in this transcript; see the sketch below)
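Since the example figure does not survive in this transcript, here is a minimal Python sketch of the ancestor test described above (the node names "sx" and "sy" are made up):

    def increment(clock, node):
        """Return a copy of the (node -> counter) clock with node's counter bumped."""
        new = dict(clock)
        new[node] = new.get(node, 0) + 1
        return new

    def is_ancestor(a, b):
        """True if the version with clock a causally precedes (or equals) clock b."""
        return all(count <= b.get(node, 0) for node, count in a.items())

    v1 = increment({}, "sx")    # {'sx': 1}
    v2 = increment(v1, "sx")    # {'sx': 2}: descends from v1
    v3 = increment(v1, "sy")    # {'sx': 1, 'sy': 1}: concurrent with v2

    print(is_ancestor(v1, v2))  # True: v1 can be forgotten
    print(is_ancestor(v2, v3), is_ancestor(v3, v2))  # False False: a conflict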
  • Partitioning Algorithm
    ● Consistent hashing
      – The output range of a hash function is treated as a fixed circular space or “ring”
    ● Virtual nodes
      – Each node can be responsible for more than one virtual node
      – When a new node is added, it is assigned multiple positions on the ring
      – Various advantages (see the sketch below)
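A minimal, hypothetical sketch of a consistent-hash ring with virtual nodes (MD5 and 100 virtual nodes per physical node are arbitrary choices):

    import bisect
    import hashlib

    def ring_hash(key):
        # Map a string onto the ring: the hash function's output range.
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    class HashRing:
        def __init__(self, nodes, vnodes=100):
            # Each physical node claims `vnodes` positions on the ring.
            self.ring = sorted((ring_hash(f"{node}#{i}"), node)
                               for node in nodes for i in range(vnodes))
            self.points = [point for point, _ in self.ring]

        def node_for(self, key):
            # Walk clockwise to the first virtual node at or after the key's hash.
            i = bisect.bisect(self.points, ring_hash(key)) % len(self.ring)
            return self.ring[i][1]

    ring = HashRing(["node-a", "node-b", "node-c"])
    print(ring.node_for("user:42"))
    # Adding a fourth node moves only the keys that land on its new positions.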