2011 Db Distributed
  • Transcript

    • 1. Distributed Transactions Alan Medlar
    • 2. Motivation
        • Distributed database: a collection of sites, each with its own database
        • Each site processes local transactions
        • Local transactions can only access the local database
        • Distributed transactions require co-ordination among sites
    • 3. Advantages
        • Distributed databases can improve availability (especially if the database is replicated)
        • Processing sub-transactions in parallel at individual sites, instead of all at one site, can improve performance
    • 4. Disadvantages
        • Cost: hardware, software development, network (leased lines?)
        • Operational overhead: network traffic, co-ordination overhead
        • Technical: harder to debug, harder to secure, greater complexity
        • ACID properties are harder to achieve
    • 5. Main Issues
        • Transparency: a database provides an abstraction layer above data access, so a distributed database should be accessed in the same way as a centralised one
        • Distributed transactions: local transactions are processed at only one site, whereas global transactions must preserve ACID across multiple sites and need distributed query processing (e.g. a distributed join)
        • Atomicity: either all sites in a global transaction commit or none do
        • Consistency: all schedules must be conflict serializable (last lecture!)
    • 6. Failures
        • Site failures: exactly the same as for local databases (hardware failure, out of memory, etc.)
        • Network failures:
            • Failure of a network link: no way of communicating with the other database site
            • Loss of messages: the link might be up but congested, causing packet loss and TCP timeouts
            • Network partition: more relevant to replication; the set of replicas may be divided in two, with each side updating only the replicas in its own partition
    • 7. Fragmentation
        • Divide a relation into sections that can be allocated to different sites to optimise transaction processing (reduce processing time and network traffic overhead)
        • Horizontal and vertical fragmentation
    • 8. Example relation:

          Branch   Account no   Customer   Balance
          Euston   1234         Alice      200
          Euston   2345         Bob        100
          Euston   3456         Eve        5
          Harrow   4567         Richard    550
          Harrow   5678         Jane       75
          Harrow   6789         Graham     175
    • 9. Horizontal Fragmentation (in this case taking advantage of usage locality)

          Euston fragment:
          Branch   Account no   Customer   Balance
          Euston   1234         Alice      200
          Euston   2345         Bob        100
          Euston   3456         Eve        5

          Harrow fragment:
          Branch   Account no   Customer   Balance
          Harrow   4567         Richard    550
          Harrow   5678         Jane       75
          Harrow   6789         Graham     175
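Horizontal fragmentation is a relational selection: each fragment keeps the rows matching a predicate, and the original relation is the union of the fragments. The minimal sketch below assumes the relation fits in memory as a list of tuples; `Account` and `horizontal_fragment` are hypothetical names used only for illustration.

```python
from typing import NamedTuple


class Account(NamedTuple):
    branch: str
    account_no: int
    customer: str
    balance: int


# The example relation from slide 8.
ACCOUNTS = [
    Account("Euston", 1234, "Alice", 200),
    Account("Euston", 2345, "Bob", 100),
    Account("Euston", 3456, "Eve", 5),
    Account("Harrow", 4567, "Richard", 550),
    Account("Harrow", 5678, "Jane", 75),
    Account("Harrow", 6789, "Graham", 175),
]


def horizontal_fragment(relation, predicate):
    """Selection: keep only the tuples (rows) that satisfy the predicate."""
    return [t for t in relation if predicate(t)]


# One fragment per branch, so each site stores the accounts it uses most.
euston = horizontal_fragment(ACCOUNTS, lambda t: t.branch == "Euston")
harrow = horizontal_fragment(ACCOUNTS, lambda t: t.branch == "Harrow")

# The original relation is the union of the fragments.
reconstructed = set(euston) | set(harrow)
print(len(euston), len(harrow), reconstructed == set(ACCOUNTS))  # prints: 3 3 True
```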
    • 10. (The same relation as in slide 8.)
    • 11. Vertical Fragmentation

          Branch   Customer   Id    |   Id   Account no   Balance
          Euston   Alice      0     |   0    1234         200
          Euston   Bob        1     |   1    2345         100
          Euston   Eve        2     |   2    3456         5
          Harrow   Richard    3     |   3    4567         550
          Harrow   Jane       4     |   4    5678         75
          Harrow   Graham     5     |   5    6789         175

          The additional Id attribute allows a join to recreate the original relation.
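Vertical fragmentation is a projection: each fragment keeps a subset of the columns plus the shared Id, and a join on Id rebuilds the original tuples. This is a minimal in-memory sketch; `vertical_fragments` and `join_on_id` are hypothetical helpers, and the Id is assumed to be the row's position in the original list, as in the slide.

```python
# Original relation (slide 8) as plain tuples: (branch, account_no, customer, balance).
ACCOUNTS = [
    ("Euston", 1234, "Alice", 200),
    ("Euston", 2345, "Bob", 100),
    ("Euston", 3456, "Eve", 5),
    ("Harrow", 4567, "Richard", 550),
    ("Harrow", 5678, "Jane", 75),
    ("Harrow", 6789, "Graham", 175),
]


def vertical_fragments(relation):
    """Project each tuple into two fragments that share a surrogate Id."""
    frag1 = []  # (branch, customer, id)
    frag2 = []  # (id, account_no, balance)
    for tid, (branch, account_no, customer, balance) in enumerate(relation):
        frag1.append((branch, customer, tid))
        frag2.append((tid, account_no, balance))
    return frag1, frag2


def join_on_id(frag1, frag2):
    """Join the two fragments on Id to rebuild the original tuples."""
    by_id = {tid: (account_no, balance) for tid, account_no, balance in frag2}
    rebuilt = []
    for branch, customer, tid in frag1:
        account_no, balance = by_id[tid]
        rebuilt.append((branch, account_no, customer, balance))
    return rebuilt


f1, f2 = vertical_fragments(ACCOUNTS)
print(join_on_id(f1, f2) == ACCOUNTS)  # prints: True
```

The Id column is what makes the split lossless: without it there would be no reliable way to pair a customer with their balance.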
    • 12. Problem
        • Now our data is split into fragments and each fragment is at a separate site
        • How do we access these sites using transactions, whilst maintaining the ACID properties?
    • 13. 2-Phase Commit
        • A distributed algorithm that lets all nodes in a distributed system agree on whether to commit a transaction: either all sites commit or all sites abort
        • Tolerates network and node failures, although sites may have to block until a failed coordinator recovers
        • Necessary to provide atomicity
    • 14. 2-Phase Commit
        • Voting phase: each site is polled as to whether the transaction should commit (i.e. whether its sub-transaction can commit)
        • Decision phase: if any site says “abort” or does not reply, then all sites must be told to abort; otherwise all sites are told to commit
        • Logging is performed for failure recovery (as usual)
    • 15-24. [Diagram, built up over ten slides: message flow between the client, the transaction coordinator (TC) and sites A and B. The client sends start to the TC; the TC sends prepare to A and B; A and B each reply ready; the TC sends commit to A and B; the TC replies OK to the client.]
    • 25. Voting Phase
        • The TC (transaction co-ordinator) writes <prepare Ti> to its log
        • The TC sends a prepare message to all sites (A, B)
        • Each site's local DBMS decides whether to commit its part of the transaction or abort: to commit, it writes <ready Ti> to its log, otherwise it writes <no Ti>
        • The site sends a ready or abort message back to the TC
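The voting phase can be sketched as a coordinator that logs `<prepare Ti>` and then collects one vote per site, each site logging its own vote before replying. This is a single-process model under stated assumptions: `Site`, `Vote` and `voting_phase` are hypothetical names, the `can_commit` flag stands in for the local DBMS's decision, and logs are plain Python lists rather than stable storage.

```python
from enum import Enum


class Vote(Enum):
    READY = "ready"
    ABORT = "abort"


class Site:
    """A participant site in 2PC (hypothetical minimal model)."""

    def __init__(self, name, can_commit):
        self.name = name
        self.can_commit = can_commit  # stands in for the local DBMS's decision
        self.log = []

    def on_prepare(self, tid):
        # The vote is logged before it is sent back to the TC.
        if self.can_commit:
            self.log.append(f"<ready {tid}>")
            return Vote.READY
        self.log.append(f"<no {tid}>")
        return Vote.ABORT


def voting_phase(tc_log, sites, tid):
    """TC logs <prepare Ti>, then polls every site for its vote."""
    tc_log.append(f"<prepare {tid}>")
    return {site.name: site.on_prepare(tid) for site in sites}


tc_log = []
votes = voting_phase(tc_log, [Site("A", True), Site("B", False)], "T1")
print(votes["A"], votes["B"])  # prints: Vote.READY Vote.ABORT
```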
    • 26. Decision Phase
        • After receiving all replies to the prepare messages (or after a timeout), the TC decides whether the entire transaction should commit
        • If any site replied “abort” or timed out, the TC aborts the entire transaction by logging <abort Ti> and then sending an “abort” message to all sites
        • If all sites replied “ready”, the TC commits by logging <commit Ti> and sending a commit message to all sites
        • On receiving a commit message, each site logs <commit Ti> and only then alters the database in memory
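The decision rule is a unanimity check: commit only if every site voted ready, and treat a missing reply (a timeout) exactly like an abort. In the minimal sketch below, `decide` and `decision_phase` are hypothetical helpers, a vote of `None` is an assumed encoding of "timed out", and message sending is modelled as appending to each site's log.

```python
from typing import Dict, List, Optional


def decide(votes: Dict[str, Optional[str]]) -> str:
    """Commit only if every site voted "ready"; None (timeout) or "abort" forces abort."""
    if votes and all(v == "ready" for v in votes.values()):
        return "commit"
    return "abort"


def decision_phase(tc_log: List[str], site_logs: Dict[str, List[str]],
                   tid: str, votes: Dict[str, Optional[str]]) -> str:
    outcome = decide(votes)
    # The TC logs its decision before telling any site.
    tc_log.append(f"<{outcome} {tid}>")
    for log in site_logs.values():
        # Each site logs the outcome on receipt; on commit it only then applies its changes.
        log.append(f"<{outcome} {tid}>")
    return outcome


tc_log: List[str] = []
site_logs = {"A": ["<ready T1>"], "B": ["<ready T1>"]}
print(decision_phase(tc_log, site_logs, "T1", {"A": "ready", "B": "ready"}))  # prints: commit
```

Logging the decision at the TC first is what lets the recovery rules on the following slides work: once `<commit Ti>` is in the TC's log, the outcome is fixed even if the TC crashes mid-broadcast.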
    • 27. Failure Example 1
        • One of the database sites (A, B) fails
        • On recovery, the site examines its log:
            • if the log contains <commit Ti>, redo the changes of the transaction
            • if the log contains <abort Ti>, undo the changes
            • if the log contains <ready Ti> but no commit or abort, contact the TC for the outcome of Ti; if the TC is down, ask the other sites
            • if the log contains no ready, commit or abort record, the failure must have occurred before the site responded to “prepare Ti”, so the TC will have aborted the transaction and the site undoes it
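The recovery rules above form a short decision procedure over the site's own log. In this sketch, `recover_site` is a hypothetical helper, logs are lists of strings like `<ready T1>`, and `ask_tc` is an assumed callback that asks the TC (or the other sites) and returns `"commit"`, `"abort"`, or `None` when nobody reachable knows the outcome yet.

```python
from typing import Callable, List, Optional


def recover_site(log: List[str], tid: str,
                 ask_tc: Callable[[str], Optional[str]]) -> str:
    """Return "redo", "undo", or "wait" for transaction tid after a site restarts."""
    if f"<commit {tid}>" in log:
        return "redo"
    if f"<abort {tid}>" in log:
        return "undo"
    if f"<ready {tid}>" in log:
        # Voted ready but never learned the outcome: it must be fetched from elsewhere.
        outcome = ask_tc(tid)
        if outcome == "commit":
            return "redo"
        if outcome == "abort":
            return "undo"
        return "wait"  # outcome unknown: the site must block until someone knows
    # No ready/commit/abort record: the site never voted ready, so the TC aborted Ti.
    return "undo"


print(recover_site(["<ready T1>"], "T1", lambda tid: "commit"))  # prints: redo
```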
    • 28. Failure Example 2
        • The transaction coordinator (TC) fails while sites A and B are waiting for a commit/abort message
        • The log at each active site is examined:
            • if any site's log contains <commit Ti>, Ti must be committed at all sites
            • if any site's log contains <abort Ti> or <no Ti>, Ti must be aborted at all sites
            • if any site's log does not contain <ready Ti>, the TC cannot have decided to commit, so Ti can be aborted
            • if none of the above apply, all active sites have <ready Ti> (but no <commit Ti> or <abort Ti>), so the TC must be consulted when it comes back online; until then the sites are blocked
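When the TC is down, the surviving sites can apply these rules to their pooled logs, checking them in the order given on the slide. `resolve_without_tc` is a hypothetical helper, and the input is assumed to be a mapping from each active site's name to its log.

```python
from typing import Dict, List


def resolve_without_tc(site_logs: Dict[str, List[str]], tid: str) -> str:
    """Decide Ti's fate from the active sites' logs while the TC is down."""
    logs = list(site_logs.values())
    if any(f"<commit {tid}>" in log for log in logs):
        return "commit"
    if any(f"<abort {tid}>" in log or f"<no {tid}>" in log for log in logs):
        return "abort"
    if any(f"<ready {tid}>" not in log for log in logs):
        # Some site never voted ready, so the TC cannot have decided to commit.
        return "abort"
    # Everyone is ready and nobody knows the decision: blocked until the TC recovers.
    return "wait for TC"


print(resolve_without_tc({"A": ["<ready T1>"], "B": ["<ready T1>"]}, "T1"))  # prints: wait for TC
```

The final case is the weakness of two-phase commit: no combination of the sites' logs can decide the outcome, so they must hold their locks and wait.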
    • 29. Network Faults
        • Failure of the network (a failed link or a partition)
        • From the perspective of entities on one side of the network failure, entities on the other side have failed (apply the previous strategies)
    • 30. Locking (non-replicated system)
        • Each local site has a lock manager, which:
            • administers lock requests for data items stored at that site
            • receives a lock request whenever a transaction requires a data item to be locked
            • blocks the request until the lock can be granted
        • Problem: deadlocks are clearly more complicated to resolve in a distributed system, because a cycle of waiting transactions can span several sites
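A deadlock that spans sites is invisible to each local lock manager on its own. One common approach, assumed here rather than taken from the slide, is to collect every site's waits-for graph at a single detector, take their union, and look for a cycle. `global_waits_for` and `has_cycle` are hypothetical helpers; an edge `T1 -> T2` means T1 is waiting for a lock held by T2.

```python
from typing import Dict, Iterable, Set

Graph = Dict[str, Set[str]]


def global_waits_for(local_graphs: Iterable[Graph]) -> Graph:
    """Union the per-site waits-for graphs into one global graph."""
    merged: Graph = {}
    for graph in local_graphs:
        for waiter, holders in graph.items():
            merged.setdefault(waiter, set()).update(holders)
    return merged


def has_cycle(graph: Graph) -> bool:
    """Depth-first search; reaching a node still on the stack (GREY) means a cycle, i.e. deadlock."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour: Dict[str, int] = {}

    def visit(node: str) -> bool:
        colour[node] = GREY
        for nxt in graph.get(node, ()):
            state = colour.get(nxt, WHITE)
            if state == GREY:
                return True
            if state == WHITE and visit(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(colour.get(n, WHITE) == WHITE and visit(n) for n in list(graph))


site_a = {"T1": {"T2"}}  # at site A, T1 waits for T2
site_b = {"T2": {"T1"}}  # at site B, T2 waits for T1
print(has_cycle(site_a), has_cycle(site_b), has_cycle(global_waits_for([site_a, site_b])))
# prints: False False True
```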
    • 31. Locking (single co-ordinator)
        • A single lock manager for the whole distributed database:
            • manages locks at all sites
            • a read needs a lock on any one replica
            • a write needs locks on all replicas
        • Simpler deadlock handling
        • Single point of failure
        • Bottleneck?
    • 32. Locking (replicated system)
        • Majority protocol, where each local site has a lock manager
        • A transaction wants a lock on a data item that is replicated at n sites:
            • it must obtain a lock on that data item at more than n/2 sites
            • it cannot operate until it holds locks at more than half of the replica sites (only one transaction can do this at a time, since any two majorities overlap)
        • If the item is written, all replicas must be updated...
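The majority rule can be sketched with one exclusive lock table per replica site. To keep it short, this sketch makes simplifying assumptions: `LockManager` and `majority_lock` are hypothetical, lock requests fail immediately instead of blocking, and a transaction that misses a majority releases the locks it did get rather than holding them (which would risk deadlock with a competing transaction).

```python
from typing import Dict, List


class LockManager:
    """Per-site exclusive lock table (hypothetical minimal model)."""

    def __init__(self) -> None:
        self.holders: Dict[str, str] = {}  # data item -> transaction id

    def try_lock(self, item: str, tid: str) -> bool:
        holder = self.holders.get(item)
        if holder is None or holder == tid:
            self.holders[item] = tid
            return True
        return False

    def unlock(self, item: str, tid: str) -> None:
        if self.holders.get(item) == tid:
            del self.holders[item]


def majority_lock(managers: List[LockManager], item: str, tid: str) -> bool:
    """Request the lock at every replica site; succeed only with more than n/2 grants."""
    granted = [m for m in managers if m.try_lock(item, tid)]
    if len(granted) > len(managers) // 2:
        return True
    for m in granted:  # release partial locks so a competing transaction can proceed
        m.unlock(item, tid)
    return False


sites = [LockManager() for _ in range(3)]
print(majority_lock(sites, "x", "T1"), majority_lock(sites, "x", "T2"))  # prints: True False
```

For an integer n, "more than n/2" is the same as "more than n // 2", which is why the integer division is safe here.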
    • 33. Updating Replicas
        • Replication makes reading more reliable: if each replica is unavailable with probability $p$, independently of the others, the probability that all $n$ replicas are unavailable is $p^n$
        • Replication makes writing less reliable: the probability that all $n$ replicas are available to be updated by a write is $(1-p)^n$
        • Writing must succeed even if not all replicas are available...
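The two formulas can be evaluated directly to see the trade-off grow with n. This sketch assumes, as the formulas do, that replicas fail independently; the function names and the value p = 0.1 are illustrative only.

```python
def p_all_unavailable(p: float, n: int) -> float:
    """Probability that a read finds no replica up: p ** n."""
    return p ** n


def p_all_available(p: float, n: int) -> float:
    """Probability that a write-to-all finds every replica up: (1 - p) ** n."""
    return (1 - p) ** n


# With p = 0.1, reads get steadily safer while write-to-all gets steadily riskier.
for n in (1, 3, 5):
    print(n, p_all_unavailable(0.1, n), p_all_available(0.1, n))
```

For example, with $p = 0.1$ and $n = 3$, a read fails only with probability $0.001$, but a write to all replicas succeeds only with probability $0.729$.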
    • 34. Updating Replicas (2)
        • Majority update protocol!
        • Update more than half of the replicas (the rest are treated as “failed” and can be updated later), but this time also record a timestamp or version number
        • To read a data item, read more than half of the replicas and use the one with the most recent timestamp; because any two majorities overlap, at least one replica read holds the latest write
        • Writing becomes more reliable, but reading becomes more complex!
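The majority update protocol can be sketched with version numbers: a write goes to every reachable replica, provided more than half are reachable, and a read picks the highest version among a majority. `Replica`, `majority_write` and `majority_read` are hypothetical names; the `up` flag simulates failures, and concurrent writers are assumed to be serialised by locking (e.g. the majority locking protocol above).

```python
from typing import List


class Replica:
    def __init__(self) -> None:
        self.value = None
        self.version = 0
        self.up = True  # simulated availability


def _majority(replicas: List[Replica]) -> List[Replica]:
    live = [r for r in replicas if r.up]
    if len(live) <= len(replicas) // 2:
        raise RuntimeError("cannot reach a majority of replicas")
    return live


def majority_write(replicas: List[Replica], value) -> int:
    live = _majority(replicas)
    # Any majority overlaps the previous write's majority, so this max is the latest version.
    new_version = max(r.version for r in live) + 1
    for r in live:
        r.value, r.version = value, new_version
    return new_version


def majority_read(replicas: List[Replica]):
    live = _majority(replicas)
    return max(live, key=lambda r: r.version).value


rs = [Replica() for _ in range(5)]
majority_write(rs, "a")
rs[0].up = rs[1].up = False          # two replicas miss the next write
majority_write(rs, "b")
rs[0].up = rs[1].up = True
rs[2].up = rs[3].up = False          # a different majority serves the read
print(majority_read(rs))  # prints: b
```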
    • 35. ~ Fin ~ (Graphics lectures begin on Monday 9th March)