One-copy serializability
- Replicated transactional service
  - Each replica manager provides concurrency control and recovery of its own data items in the same way as it would for non-replicated data
- Effects of transactions performed by various clients on replicated data items are the same as if they had been performed one at a time on a single data item
- Additional complications: failures, network partitions
  - Failures should be serialized wrt transactions, i.e. any failure observed by a transaction must appear to have happened before the transaction started
Replication schemes
- Read one/write all
  - Cannot handle network partitions
- Schemes that can handle network partitions
  - Available copies with validation
  - Quorum consensus
  - Virtual partitions
Read one/write all
- One-copy serializability
  - Each write operation sets a write lock at each replica manager
  - Each read sets a read lock at one replica manager
- Two-phase commit
  - Two-level nested transaction
    - Coordinator -> Workers
    - If either the coordinator or a worker is a replica manager, it has to communicate with its replica managers
- Primary copy replication
  - ALL client requests are directed to a single primary server
    - Different from the scheme discussed earlier
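The locking rule above can be sketched in Python. This is a toy model, not a real protocol implementation: the `ReplicaManager` class, its lock table, and the `read`/`write` helpers are all illustrative names, and two-phase commit is omitted.

```python
class ReplicaManager:
    """Toy replica manager holding per-item locks (illustrative only)."""
    def __init__(self, name):
        self.name = name
        self.locks = {}   # data item -> "read" or "write"
        self.data = {}

    def set_read_lock(self, item):
        # A read lock conflicts only with an existing write lock.
        if self.locks.get(item) == "write":
            raise RuntimeError(f"{self.name}: write lock held on {item}")
        self.locks[item] = "read"

    def set_write_lock(self, item):
        # A write lock conflicts with any existing lock.
        if item in self.locks:
            raise RuntimeError(f"{self.name}: lock already held on {item}")
        self.locks[item] = "write"

def read(replicas, item):
    # Read one: a read lock at a single replica manager suffices.
    rm = replicas[0]
    rm.set_read_lock(item)
    return rm.data.get(item)

def write(replicas, item, value):
    # Write all: EVERY replica manager must grant a write lock first.
    for rm in replicas:
        rm.set_write_lock(item)
    for rm in replicas:
        rm.data[item] = value
```

Because a write must lock the item everywhere, a single unavailable replica manager blocks all writes — which is why this scheme cannot tolerate partitions.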
Available copies replication
- Can handle the case where some replica managers are unavailable because they have failed or because of communication failures
- Reads can be performed by any available replica manager, but writes must be performed by all available replica managers
- Normal case is like read one/write all
  - As long as the set of available replica managers does not change during a transaction
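The read-one/write-all-available rule can be sketched as follows. The `Replica` class and its `available` flag are assumptions for illustration; a real system would discover availability via timeouts rather than a boolean field.

```python
from dataclasses import dataclass, field

@dataclass
class Replica:
    name: str
    available: bool = True
    data: dict = field(default_factory=dict)

def read_available(replicas, item):
    # Read one: ANY available replica manager will do.
    for rm in replicas:
        if rm.available:
            return rm.data.get(item)
    raise RuntimeError("no replica manager available")

def write_available(replicas, item, value):
    # Write all AVAILABLE: failed replica managers are simply skipped.
    targets = [rm for rm in replicas if rm.available]
    if not targets:
        raise RuntimeError("no replica manager available")
    for rm in targets:
        rm.data[item] = value
```

Note the contrast with plain read-one/write-all: a failed replica manager no longer blocks writes, but it silently misses updates, which is exactly what makes the failure case below subtle.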
Available copies replication
- Failure case
  - One-copy serializability requires that failures and recoveries be serialized wrt transactions
  - This is not achieved when different transactions make conflicting failure observations
  - Example shows local concurrency control is not enough
  - An additional concurrency control procedure (called local validation) has to be performed to ensure correctness
- Available copies with local validation assumes no network partition - i.e. functioning replica managers can communicate with one another
Local validation - example
- (In this example, data item A is replicated at replica managers X and Y, and data item B at M, N and P)
- Assume X fails just after T has performed GetBalance and N fails just after U has performed GetBalance
- Assume X and N fail before T & U have performed their Deposit operations
  - T's Deposit will be performed at M & P, while U's Deposit will be performed at Y
  - Concurrency control on A at X does not prevent U from updating A at Y; similarly, concurrency control on B at N does not prevent T from updating B at M & P
  - Local concurrency control is not enough!
Local validation cont'd
- T has read from an item at X, so X's failure must be after T
- T observes the failure of N, so N's failure must be before T
  - N fails -> T reads A at X; T writes B at M & P -> T commits -> X fails
  - Similarly, we can argue: X fails -> U reads B at N; U writes A at Y -> U commits -> N fails
Local validation cont'd
- Local validation ensures such incompatible sequences cannot both occur
- Before a transaction commits, it checks for failures (and recoveries) of the replica managers of data items it has accessed
- In the example, if T validates before U, T would check that N is still unavailable and that X, M and P are available. If so, it can commit
- U's validation would fail because N has already failed
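The commit-time check can be sketched as a single predicate. The parameter names are illustrative; a real replica manager would gather this state by probing its peers.

```python
def validate(accessed, observed_failed, now_available):
    """Local validation at commit time (a sketch, names are illustrative).

    accessed:        replica managers this transaction read from or wrote to
    observed_failed: replica managers this transaction observed as failed
    now_available:   replica managers currently reachable
    """
    # Every replica manager the transaction used must still be available,
    # and every failure it observed must still be a failure; otherwise a
    # conflicting failure observation has occurred and the transaction aborts.
    return (all(rm in now_available for rm in accessed)
            and all(rm not in now_available for rm in observed_failed))
```

In the slide's example, T accessed X, M and P and observed N's failure, so its validation succeeds only while X, M and P remain available and N remains failed.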
Handling network partitions
- Network partitions separate replica managers into two or more subgroups, in such a way that the members of a subgroup can communicate with one another but members of different subgroups cannot communicate
- Optimistic approaches
  - Available copies with validation
- Pessimistic approaches
  - Quorum consensus
Available copies with validation
- Available copies algorithm applied within each partition
  - Maintains availability for Read operations
- When the partition is repaired, possibly conflicting transactions in separate partitions are validated
  - The effects of a committed transaction that is now aborted on validation will have to be undone
    - Only feasible for applications where such compensating actions can be taken
Available copies with validation cont'd
- Validation
  - Version vectors (detect Write-Write conflicts)
  - Precedence graphs (each partition maintains a log of data items affected by the Read and Write operations of transactions)
  - The log is used to construct a precedence graph whose nodes are transactions and whose edges represent conflicts between Read and Write operations
    - There are no cycles in the graph corresponding to each single partition
  - If there are cycles in the combined graph, validation fails
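The cycle check on the combined precedence graph is a standard graph-colouring DFS. This sketch assumes the merged graph is given as an adjacency mapping from each transaction to the set of transactions it precedes; building the graph from the per-partition logs is not shown.

```python
def has_cycle(graph):
    """Detect a cycle in a directed graph {node: set of successors}."""
    WHITE, GREY, BLACK = 0, 1, 2       # unvisited / on stack / done
    colour = {node: WHITE for node in graph}

    def dfs(node):
        colour[node] = GREY
        for succ in graph.get(node, ()):
            if colour.get(succ, WHITE) == GREY:
                return True            # back edge: cycle found
            if colour.get(succ, WHITE) == WHITE and dfs(succ):
                return True
        colour[node] = BLACK
        return False

    return any(colour[n] == WHITE and dfs(n) for n in graph)

def validation_passes(merged_graph):
    # Validation fails iff the precedence graph merged across the
    # repaired partitions contains a cycle.
    return not has_cycle(merged_graph)
```

Within a single partition local concurrency control already guarantees acyclicity; cycles can only arise when the partitions' graphs are merged, which is exactly what validation has to catch.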
Quorum consensus
- A quorum is a subgroup of replica managers whose size gives it the right to carry out operations
- Majority voting is one instance of a quorum consensus scheme
  - R + W > total number of votes in the group
  - W > half the total votes
  - Ensures that each read quorum intersects a write quorum, and that any two write quorums intersect
- Each replica has a version number that is used to detect whether the replica is up to date
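The two quorum conditions and the role of version numbers can be shown in a few lines. The function names and the dict-based replica representation are assumptions for illustration.

```python
def valid_quorum(total_votes, r, w):
    # The slide's two conditions: R + W > N and W > N/2.
    return r + w > total_votes and 2 * w > total_votes

def read_from_quorum(replica_quorum):
    # Because every read quorum intersects every write quorum, the copy
    # with the highest version number in the quorum is up to date.
    latest = max(replica_quorum, key=lambda rep: rep["version"])
    return latest["value"]
```

For example, with N = 5 votes and one vote per replica, R = 2, W = 4 is a valid configuration (reads are cheap, writes expensive), while R = 2, W = 3 is not, since R + W = 5 is not greater than N and a read could miss the latest write.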
Virtual partitions scheme
- Combines available copies and quorum consensus
- Virtual partition = a set of replica managers that have a read and a write quorum
- If a virtual partition can be formed, available copies is used
  - Improves performance of Reads
- If a failure occurs and the virtual partition changes during a transaction, the transaction is aborted
- Have to ensure virtual partitions do not overlap
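The formation condition can be sketched as a simple check, assuming per-replica voting weights as in the quorum scheme above. This ignores the agreement protocol needed to make all members adopt the same virtual partition; the names are illustrative.

```python
def can_form_virtual_partition(reachable, votes, r, w):
    """True if the reachable replica managers contain both quorums.

    reachable: names of replica managers this subgroup can contact
    votes:     mapping from replica manager name to its voting weight
    r, w:      read and write quorum sizes (in votes)
    """
    total = sum(votes[rm] for rm in reachable)
    # The subgroup must be able to assemble a read AND a write quorum;
    # only then may it run available copies internally.
    return total >= r and total >= w
```

Because the write quorum satisfies W > N/2, at most one subgroup can meet this condition at a time, which is what keeps virtual partitions from overlapping.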