Replicated data
Presentation Transcript

  • Transactions with Replicated Data (Distributed Software Systems)
  • One-copy serializability
    - Replicated transactional service
      - Each replica manager provides concurrency control and recovery of its own data items in the same way as it would for non-replicated data
    - The effects of transactions performed by various clients on replicated data items are the same as if they had been performed one at a time on a single data item
    - Additional complications: failures, network partitions
      - Failures should be serialized wrt transactions, i.e. any failure observed by a transaction must appear to have happened before the transaction started
  • [Figure 14.18: Replicated transactional service. Transactions T and U from two clients (each via a front end) issue GetBalance(A) and Deposit(B,3) against replica managers holding copies of A and B. From the Instructor's Guide for Coulouris, Dollimore and Kindberg, Distributed Systems: Concepts and Design, 2nd edn., © Addison-Wesley 1994; later figures are from the same source.]
  • Replication Schemes
    - Read one / Write all
      - Cannot handle network partitions
    - Schemes that can handle network partitions
      - Available copies with validation
      - Quorum consensus
      - Virtual partition
  • Read one / Write all
    - One-copy serializability
      - Each write operation sets a write lock at each replica manager
      - Each read sets a read lock at one replica manager
    - Two-phase commit
      - Two-level nested transaction
        - Coordinator -> Workers
        - If either the coordinator or a worker is a replica manager, it has to communicate with replica managers
    - Primary copy replication
      - ALL client requests are directed to a single primary server
        - Different from the scheme discussed earlier
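
A minimal sketch of the read one/write all locking rule, in Python. The ReplicaManager class and its method names are invented for illustration (the slides specify only the rule itself), and asserts stand in for the blocking a real lock manager would do:

    import random

    class ReplicaManager:
        """Hypothetical replica manager holding copies of data items."""
        def __init__(self):
            self.store = {}        # item -> value
            self.read_locks = {}   # item -> set of transaction ids
            self.write_locks = {}  # item -> transaction id

        def set_read_lock(self, tx, item):
            # Read locks are compatible with each other, not with a write lock.
            assert self.write_locks.get(item) in (None, tx)
            self.read_locks.setdefault(item, set()).add(tx)

        def set_write_lock(self, tx, item):
            assert self.write_locks.get(item) in (None, tx)
            assert self.read_locks.get(item, set()) <= {tx}
            self.write_locks[item] = tx

    def read(tx, item, replicas):
        # Read one: a read lock at a single replica manager suffices.
        rm = random.choice(replicas)
        rm.set_read_lock(tx, item)
        return rm.store.get(item)

    def write(tx, item, value, replicas):
        # Write all: a write lock is set at every replica manager.
        for rm in replicas:
            rm.set_write_lock(tx, item)
        for rm in replicas:
            rm.store[item] = value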
  • Available copies replication
    - Can handle the case where some replica managers are unavailable, because they have failed or because of a communication failure
    - Reads can be performed by any available replica manager, but writes must be performed by all available replica managers
    - The normal case is like read one/write all
      - As long as the set of available replica managers does not change during a transaction
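
Under available copies only the set of participating managers changes. A sketch reusing the hypothetical ReplicaManager above, extended with an assumed available flag (failure detection itself is not shown):

    def read_available(tx, item, replicas):
        # Read one: any single available replica manager will do.
        for rm in replicas:
            if rm.available:
                rm.set_read_lock(tx, item)
                return rm.store.get(item)
        raise RuntimeError("no available replica manager for " + item)

    def write_available(tx, item, value, replicas):
        # Write all available: failed managers are skipped; they must
        # catch up when they recover.
        reached = [rm for rm in replicas if rm.available]
        for rm in reached:
            rm.set_write_lock(tx, item)
            rm.store[item] = value
        return reached

The set of managers actually reached is returned because, as the following slides show, the transaction must re-examine it at commit time.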
  • [Figure 14.19: Available copies. Copies of A are held at replica managers X and Y, copies of B at M, N and P. Transaction T performs GetBalance(A) and Deposit(B,3); transaction U performs GetBalance(B) and Deposit(A,3).]
  • Available copies replication
    - Failure case
      - One-copy serializability requires that failures and recoveries be serialized wrt transactions
      - This is not achieved when different transactions make conflicting failure observations
      - The example shows that local concurrency control is not enough
      - An additional concurrency control procedure (called local validation) has to be performed to ensure correctness
    - Available copies with local validation assumes no network partition, i.e. functioning replica managers can communicate with one another
  • Local validation - example
    - Assume X fails just after T has performed GetBalance, and N fails just after U has performed GetBalance
    - Assume X and N fail before T and U have performed their Deposit operations
      - T's Deposit will be performed at M & P, while U's Deposit will be performed at Y
      - Concurrency control on A at X does not prevent U from updating A at Y; similarly, concurrency control on B at N does not prevent T from updating B at M & P
      - Local concurrency control is not enough!
  • Local validation cont'd
    - T has read from an item at X, so X's failure must be after T
    - T observes the failure of N, so N's failure must be before T
      - N fails -> T reads A at X; T writes B at M & P -> T commits -> X fails
      - Similarly, we can argue: X fails -> U reads B at N; U writes A at Y -> U commits -> N fails
  • Local validation cont'd
    - Local validation ensures that such incompatible sequences cannot both occur
    - Before a transaction commits, it checks for failures (and recoveries) of the replica managers of data items it has accessed
    - In the example, if T validates before U, T would check that N is still unavailable and that X, M and P are available; if so, it can commit
    - U's validation would fail because N has already failed
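
The commit-time check itself can be sketched as follows; the slides describe it only in words, so the tx.accessed and tx.observed_failed fields are invented names for the two sets a transaction would have to record:

    def local_validation(tx):
        """Run just before commit; True means the transaction may commit."""
        # Every replica manager the transaction read from or wrote to
        # must still be available ...
        for rm in tx.accessed:
            if not rm.available:
                return False  # e.g. U read B at N, and N has since failed
        # ... and every failure the transaction observed must still hold,
        # so that the failure can be serialized before the transaction.
        for rm in tx.observed_failed:
            if rm.available:
                return False  # an observed-failed manager has recovered
        # In the example: T validates first (N still down; X, M, P still up)
        # and commits; U's validation then fails because N has failed.
        return True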
  • [Figure 14.20: Network partition. Transactions T and U (Deposit(B,3) and Withdraw(B,4)) execute on opposite sides of a network partition, each seeing different copies of B.]
  • Handling Network Partitions
    - Network partitions separate replica managers into two or more subgroups, in such a way that the members of a subgroup can communicate with one another but members of different subgroups cannot communicate
    - Optimistic approaches
      - Available copies with validation
    - Pessimistic approaches
      - Quorum consensus
  • Available Copies with Validation
    - The available copies algorithm is applied within each partition
      - Maintains availability for Read operations
    - When the partition is repaired, possibly conflicting transactions in separate partitions are validated
      - The effects of a committed transaction that is now aborted on validation will have to be undone
        - Only feasible for applications where such compensating actions can be taken
  • Available copies with validation cont'd
    - Validation
      - Version vectors (detect Write-Write conflicts)
      - Precedence graphs (each partition maintains a log of the data items affected by the Read and Write operations of transactions)
      - The log is used to construct a precedence graph whose nodes are transactions and whose edges represent conflicts between Read and Write operations
        - There are no cycles in the graph corresponding to any single partition
      - If there are cycles in the combined graph, validation fails
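
The version-vector test for Write-Write conflicts can be sketched with the standard componentwise ordering; the slides name the technique but do not spell the comparison out, and the names here are illustrative:

    def dominated(a, b):
        """True if version vector a <= b componentwise."""
        return all(a.get(k, 0) <= b.get(k, 0) for k in set(a) | set(b))

    def write_write_conflict(a, b):
        # Copies in two partitions conflict iff neither vector dominates:
        # each partition performed writes the other has not seen.
        return not dominated(a, b) and not dominated(b, a)

    # Partitions P1 and P2 each updated the item independently:
    p1 = {"X": 2, "Y": 1}
    p2 = {"X": 1, "Y": 2}
    print(write_write_conflict(p1, p2))                # True  -> validation fails
    print(write_write_conflict(p1, {"X": 2, "Y": 1}))  # False -> identical copies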
  • Quorum consensus
    - A quorum is a subgroup of replica managers whose size gives it the right to carry out operations
    - Majority voting is one instance of a quorum consensus scheme
      - R + W > total number of votes in the group
      - W > half the total votes
      - Ensures that each read quorum intersects a write quorum, and any two write quora intersect
    - Each replica has a version number that is used to detect whether the replica is up to date
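
The voting constraints are directly checkable arithmetic; a small sketch, assuming one vote per replica manager in the examples:

    def valid_quorum(R, W, total_votes):
        # Majority-voting constraints: R + W > N forces every read quorum
        # to intersect every write quorum; W > N/2 forces any two write
        # quora to intersect (so there is a single latest version).
        return R + W > total_votes and 2 * W > total_votes

    def quorum_read(copies):
        # copies: (version, value) pairs collected from a read quorum.
        # Intersection with the last write quorum guarantees the highest
        # version number present belongs to an up-to-date copy.
        return max(copies)[1]

    print(valid_quorum(R=3, W=3, total_votes=5))  # True
    print(valid_quorum(R=1, W=5, total_votes=5))  # True (read one/write all)
    print(valid_quorum(R=2, W=3, total_votes=5))  # False: R + W is not > N
    print(quorum_read([(7, "old"), (8, "new"), (8, "new")]))  # "new"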
  • Virtual Partitions scheme
    - Combines available copies and quorum consensus
    - Virtual partition = a set of replica managers that have a read and a write quorum
    - If a virtual partition can be formed, available copies is used
      - Improves the performance of Reads
    - If a failure occurs and the virtual partition changes during a transaction, the transaction is aborted
    - Have to ensure that virtual partitions do not overlap
  • [Figure 14.21: Two network partitions, dividing replica managers V, X, Y and Z into subgroups while a transaction T is in progress.]
  • [Figure 14.22: Virtual partition, shown over replica managers V, X, Y and Z together with a network partition.]
  • [Figure 14.23: Two overlapping virtual partitions, V1 and V2, over replica managers V, X, Y and Z.]
  • Figure 14.24: Creating a virtual partition.
    Phase 1:
    - The initiator sends a Join request to each potential member. The argument of Join is a proposed logical timestamp for the new virtual partition.
    - When a replica manager receives a Join request, it compares the proposed logical timestamp with that of its current virtual partition:
      - If the proposed logical timestamp is greater, it agrees to join and replies Yes;
      - If it is less, it refuses to join and replies No.
    Phase 2:
    - If the initiator has received sufficient Yes replies to have Read and Write quora, it may complete the creation of the new virtual partition by sending a Confirmation message to the sites that agreed to join. The creation timestamp and list of actual members are sent as arguments.
    - Replica managers receiving the Confirmation message join the new virtual partition and record its creation timestamp and list of actual members.
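
A sketch of this two-phase protocol, with message passing reduced to direct calls and all class and parameter names invented:

    from dataclasses import dataclass

    @dataclass
    class RM:
        """Hypothetical replica manager state for the join protocol."""
        name: str
        vp_timestamp: int = 0   # logical timestamp of current virtual partition
        vp_members: tuple = ()

        def on_join(self, proposed_ts):
            # Phase 1: reply Yes only if the proposal is newer than the
            # virtual partition this manager currently belongs to.
            return proposed_ts > self.vp_timestamp

        def on_confirm(self, ts, members):
            # Phase 2: join the new virtual partition and record its
            # creation timestamp and list of actual members.
            self.vp_timestamp, self.vp_members = ts, members

    def create_virtual_partition(proposed_ts, potential, R, W):
        # Phase 1: send Join to each potential member, collect Yes replies.
        yes = [rm for rm in potential if rm.on_join(proposed_ts)]
        # Phase 2: confirm only if the agreeing set contains Read and
        # Write quora (one vote per manager assumed here).
        if len(yes) >= R and len(yes) >= W:
            members = tuple(rm.name for rm in yes)
            for rm in yes:
                rm.on_confirm(proposed_ts, members)
            return members
        return None   # insufficient Yes replies; no new virtual partition

    rms = [RM("V"), RM("X"), RM("Y"), RM("Z")]
    print(create_virtual_partition(1, rms, R=2, W=3))  # ('V', 'X', 'Y', 'Z')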