
Distributed Database Management System

It covers all concepts in distributed database management systems.

Published in: Engineering

  1. 1. Distributed Databases and Client-Server Architectures
  2. 2. Outline 1 Distributed Database Concepts 2 Data Fragmentation, Replication and Allocation 3 Types of Distributed Database Systems 4 Query Processing 5 Concurrency Control and Recovery 6 3-Tier Client-Server Architecture
  3. 3. Distributed Database Concepts A distributed database system processes a unit of execution (a transaction) in a distributed manner; that is, a transaction can be executed by multiple networked computers in a unified way. It can be defined as follows: a distributed database (DDB) is a collection of multiple, logically related databases distributed over a computer network, and a distributed database management system is a software system that manages a distributed database while making the distribution transparent to the user.
  4. 4. Shared nothing architecture
  5. 5. Centralized database
  6. 6. Distributed database
  7. 7. Distributed Database System Advantages 1. Management of distributed data with different levels of transparency: This refers to the physical placement of data (files, relations, etc.), which is not known to the user (distribution transparency). [Diagram: five sites (Site 1–Site 5) connected by a communications network]
  8. 8. Distributed Database System Advantages The EMPLOYEE, PROJECT, and WORKS_ON tables may be fragmented horizontally and stored with possible replication, as shown below. [Diagram: sites connected by a communications network] Chicago (headquarters): EMPLOYEES - All, PROJECTS - All, WORKS_ON - All. New York: EMPLOYEES - New York, PROJECTS - All, WORKS_ON - New York employees. San Francisco: EMPLOYEES - San Francisco and LA, PROJECTS - San Francisco, WORKS_ON - San Francisco employees. Los Angeles: EMPLOYEES - LA, PROJECTS - LA and San Francisco, WORKS_ON - LA employees. Atlanta: EMPLOYEES - Atlanta, PROJECTS - Atlanta, WORKS_ON - Atlanta employees.
  9. 9. Distributed Database System Advantages • Distribution and network transparency: Users do not have to worry about the operational details of the network. Location transparency refers to the freedom to issue a command from any location without affecting its working; naming transparency allows access to any named object (files, relations, etc.) from any location. • Replication transparency: Allows copies of a data item to be stored at multiple sites, as shown in the diagram above. This is done to minimize access time to the required data. • Fragmentation transparency: Allows a relation to be fragmented horizontally (creating a subset of the tuples of a relation) or vertically (creating a subset of the columns of a relation).
  10. 10. Distributed Database System Advantages 2. Increased reliability and availability: Reliability refers to system uptime, that is, the system is running efficiently most of the time. Availability is the probability that the system is continuously available (usable or accessible) during a time interval. A distributed database system has multiple nodes (computers), and if one fails the others are available to do the job. 3. Improved performance: A distributed DBMS fragments the database to keep data closer to where it is needed most. This reduces data management (access and modification) time significantly. 4. Easier expansion (scalability): New nodes (computers) can be added at any time without changing the entire configuration.
  11. 11. Types of Distributed Database Systems Homogeneous All sites of the database system have an identical setup, i.e., the same database system software. The underlying operating systems may be different. For example, all sites run Oracle, DB2, Sybase, or some other database system, while the underlying operating systems can be a mixture of Linux, Windows, Unix, etc. The clients thus have to use identical client software. [Diagram: five sites, each running Oracle on Linux, Windows, or Unix, connected by a communications network]
  12. 12. Types of Distributed Database Systems Heterogeneous Federated: Each site may run a different database system, but data access is managed through a single conceptual schema. This implies that the degree of local autonomy is minimal: each site must adhere to a centralized access policy, and there may be a global schema. Multidatabase: There is no single conceptual global schema. For data access, a schema is constructed dynamically as needed by the application software. [Diagram: five sites running network, relational, object-oriented, and hierarchical DBMSs on a mixture of Linux, Unix, and Windows, connected by a communications network]
  13. 13. Types of Distributed Database Systems Federated Database Management Systems Issues • Differences in data models: Relational, object-oriented, hierarchical, network, etc. • Differences in constraints: Each site may have its own data access and processing constraints. • Differences in query language: Some sites may use SQL, some SQL-89, some SQL-92, and so on.
  14. 14. Data Fragmentation, Replication and Allocation Data Fragmentation decides which site should store which portion of the database by splitting a relation into logically related and correct parts. The simplest logical unit is the relation. A relation can be fragmented in two ways:
  15. 15. Horizontal fragmentation • Divides a relation “horizontally.” • A horizontal fragment is a subset of a relation containing those tuples that satisfy the selection conditions. • Consider the Employee relation with selection condition (DNO = 5). All tuples satisfying this condition form a subset that is a horizontal fragment of the Employee relation. • A selection condition may be composed of several conditions connected by AND or OR. • Derived horizontal fragmentation: the partitioning of a secondary relation according to a primary relation to which it is related by a foreign key.
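As a minimal sketch (the tuples and attribute values are hypothetical, not from any real database), a horizontal fragment is simply the subset of tuples satisfying a selection condition, here the slide's DNO = 5 example:

```python
# Hypothetical Employee tuples; names and values are illustrative only.
employees = [
    {"eno": 1, "name": "Smith",  "dno": 5, "salary": 30000},
    {"eno": 2, "name": "Wong",   "dno": 5, "salary": 40000},
    {"eno": 3, "name": "Zelaya", "dno": 4, "salary": 25000},
]

def horizontal_fragment(relation, condition):
    """sigma_C(R): return the subset of tuples satisfying the selection condition."""
    return [t for t in relation if condition(t)]

# Fragment holding department 5's employees, as in the DNO = 5 example.
dept5 = horizontal_fragment(employees, lambda t: t["dno"] == 5)
```

Conditions combined with AND or OR map directly onto the lambda passed in.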
  16. 16. Vertical fragmentation • Divides a relation vertically by columns. • A vertical fragment is created from a subset of a relation's columns and contains the values of the selected columns; no selection condition is used in vertical fragmentation. • Consider the Employee relation with two vertical fragments: 1) Name, Bdate, Sex, and Address; 2) ENO, Salary, Supereno, Dno. • Because there is no condition for creating a vertical fragment, each fragment must include the primary key attribute of the parent relation Employee. In this way all vertical fragments of a relation remain connected.
  17. 17. Data Fragmentation, Replication and Allocation Representation Horizontal fragmentation Each horizontal fragment of a relation can be specified by a σCi(R) operation in the relational algebra. Complete horizontal fragmentation A set of horizontal fragments whose conditions C1, C2, …, Cn include all the tuples in R, that is, every tuple in R satisfies (C1 OR C2 OR … OR Cn). Disjoint complete horizontal fragmentation: No tuple in R satisfies (Ci AND Cj) for any i ≠ j. To reconstruct R from horizontal fragments, a UNION is applied.
  18. 18. Data Fragmentation, Replication and Allocation Representation Vertical fragmentation A vertical fragment of a relation can be specified by a ΠLi(R) operation in the relational algebra. Complete vertical fragmentation A set of vertical fragments whose projection lists L1, L2, …, Ln include all the attributes in R but share only the primary key of R. In this case the projection lists satisfy the following two conditions: L1 ∪ L2 ∪ … ∪ Ln = ATTRS(R), and Li ∩ Lj = PK(R) for any i ≠ j, where ATTRS(R) is the set of attributes of R and PK(R) is the primary key of R. To reconstruct R from complete vertical fragments, an OUTER JOIN is applied.
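The two conditions above can be sketched concretely: each fragment keeps only its projection list (which must contain the primary key), and R is rebuilt by joining the fragments on that shared key. The data below is hypothetical.

```python
# Hypothetical Employee tuples; attribute names follow the slides' example.
employees = [
    {"ssn": 1, "name": "Smith", "bdate": "1980-01-01", "salary": 30000, "dno": 5},
    {"ssn": 2, "name": "Wong",  "bdate": "1975-06-15", "salary": 40000, "dno": 4},
]

def vertical_fragment(relation, attrs):
    """Pi_L(R): keep only the listed attributes; L must include the primary key."""
    return [{a: t[a] for a in attrs} for t in relation]

# Complete vertical fragmentation: the two lists cover ATTRS(R) and share only ssn.
f1 = vertical_fragment(employees, ["ssn", "name", "bdate"])
f2 = vertical_fragment(employees, ["ssn", "salary", "dno"])

def reconstruct(frag1, frag2, pk="ssn"):
    """Join the fragments on the shared primary key to rebuild R."""
    index = {t[pk]: t for t in frag2}
    return [{**t, **index[t[pk]]} for t in frag1]

rebuilt = reconstruct(f1, f2)
```

Because the fragments are complete, the ordinary key join recovers the original relation exactly.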
  19. 19. Data Fragmentation, Replication and Allocation Mixed (Hybrid) fragmentation A combination of vertical and horizontal fragmentation, achieved by a SELECT-PROJECT operation, represented by ΠLi(σCi(R)). If C = True (select all tuples) and L ≠ ATTRS(R), we get a vertical fragment; if C ≠ True and L = ATTRS(R), we get a horizontal fragment; if C ≠ True and L ≠ ATTRS(R), we get a mixed fragment. If C = True and L = ATTRS(R), then R itself can be considered a fragment.
  20. 20. Data Fragmentation, Replication and Allocation Examples: vertical fragments L1 = {ssn, name, bdate, address, sex} and L2 = {ssn, salary, super_ssn, dno}; a horizontal fragment with condition Salary > 5000 AND dno = 4; mixed fragments with projection lists L1 = {name, address} and L2 = {ssn, name, salary}.
  21. 21. Data Fragmentation, Replication and Allocation Fragmentation schema A definition of a set of fragments (horizontal, vertical, or both) that includes all attributes and tuples in the database and satisfies the condition that the whole database can be reconstructed from the fragments by applying some sequence of OUTER UNION (or OUTER JOIN) and UNION operations. Allocation schema Describes the distribution of fragments to the sites of the distributed database; the allocation can be fully replicated, partially replicated, or partitioned.
  22. 22. Data Fragmentation, Replication and Allocation Data Replication In full replication the entire database is replicated at every site; in partial replication selected parts are replicated to some of the sites. Data replication is achieved through a replication schema. Full replication improves availability, and hence performance for retrieval queries, but slows down update operations. No replication is called nonredundant allocation.
  23. 23. Data Fragmentation, Replication and Allocation Data Distribution (Data Allocation) Each fragment, or each copy of a fragment, must be assigned to a particular site. This is relevant only in the case of partial replication or partitioning: the selected portion of the database is distributed to the database sites. The choice of sites and the degree of replication depend on the performance and availability goals of the system.
  24. 24. Architectural models of distributed databases  Systems sharing multiple DBMSs are characterized with respect to autonomy, distribution, and heterogeneity.  Autonomy: refers to the distribution of control, not of data. Its dimensions are design autonomy, communication autonomy, and execution autonomy.  Distribution: refers to the distribution of data, either client-server or peer-to-peer.  Heterogeneity: ranges over hardware, networking protocols, data models, query languages, and transaction management protocols.
  25. 25. Distributed database architecture Client server Peer-to-peer
  26. 26. Types of client-server architecture There are two types of client-server architecture: Multiple clients, single server. Multiple clients, multiple servers: – Each client manages its own connections to the appropriate servers. This simplifies the server code but loads client machines with additional responsibilities (the “heavy client” approach). – Each client knows only its home server, which then communicates with other servers as required. This loads the servers more and is called the “light client” approach.
  27. 27. Peer-to-peer database architecture
  28. 28. Peer-to-peer database architecture Global conceptual schema: gives the enterprise view of the data; the union of all local conceptual schemas. Fragmentation and allocation schema: takes care of the fragmentation and allocation of data. Local conceptual schema: the logical organization of data at each site. External schemas: user access and user applications.
  29. 29. Client-server architecture
  30. 30. Components of DDBMS
  31. 31. Components of DDBMS Contd… User interface handler: interprets user commands as they come in and formats the result data as it is sent back to the user. Semantic data controller: uses the integrity constraints and authorizations defined as part of the global conceptual schema to check whether the user query can be processed. Global query optimizer and decomposer: determines an execution strategy to minimize a cost function and translates global queries into local ones using the global and local conceptual schemas as well as the global directory. Global execution monitor: coordinates the distributed execution of the user request, communicating with the execution monitors at other sites.
  32. 32. Components of DDBMS Contd… The data processor consists of the Local query processor: acts as the access path selector, responsible for choosing the best access path to any data item. Local recovery manager: responsible for making sure that the local database remains consistent even when failures occur. Runtime support processor: the interface to the OS; contains the database buffer manager, which is responsible for maintaining the main-memory buffers and managing the data accesses.
  33. 33. Query Processing and Optimization in Distributed Databases • Distributed Query Processing • Data Transfer Costs of Distributed Query Processing • Distributed Query Processing Using Semijoin • Query and Update Decomposition
  34. 34. Distributed Query Processing 1. Query Mapping 2. Localization 3. Global Query Optimization 4. Local Query Optimization
  35. 35. Query Processing in Distributed Databases
  36. 36. Query Processing in Distributed Databases The query is submitted at site 3, and the result is sent to this site. Problem: the Employee and Department relations are not present at site 3. The result of this query will have 10,000 tuples, assuming that every employee is related to a department, and each result tuple is 40 bytes long. Strategies: 1. Transfer Employee and Department to site 3. Total transfer = 1,000,000 + 3,500 = 1,003,500 bytes. 2. Transfer Employee to site 2, execute the join at site 2, and send the result to site 3. Query result size = 40 * 10,000 = 400,000 bytes. Total transfer = 400,000 + 1,000,000 = 1,400,000 bytes.
  37. 37. Query Processing in Distributed Databases Strategies: 3. Transfer the Department relation to site 1, execute the join at site 1, and send the result to site 3. Total bytes transferred = 400,000 + 3,500 = 403,500 bytes. Optimization criterion: minimizing data transfer. Preferred approach: strategy 3. Now consider the query Q’: for each department, retrieve the department name and the name of the department manager. Relational algebra expression: ΠFname,Lname,Dname (Employee ⋈Mgrssn=SSN Department)
  38. 38. Query Processing in Distributed Databases The result of this query will have 100 tuples, assuming that every department has a manager. The execution strategies are: 1. Transfer Employee and Department to the result site and perform the join at site 3. Total bytes transferred = 1,000,000 + 3,500 = 1,003,500 bytes. 2. Transfer Employee to site 2, execute the join at site 2, and send the result to site 3. Query result size = 40 * 100 = 4,000 bytes. Total transfer = 4,000 + 1,000,000 = 1,004,000 bytes. 3. Transfer the Department relation to site 1, execute the join at site 1, and send the result to site 3. Total transfer = 4,000 + 3,500 = 7,500 bytes.
  39. 39. Query Processing in Distributed Databases Preferred strategy: choose strategy 3. Now suppose the result site is site 2. Possible strategies: 1. Transfer the Employee relation to site 2, execute the query, and present the result to the user at site 2. Total transfer size = 1,000,000 bytes for both queries Q and Q’. 2. Transfer the Department relation to site 1, execute the join at site 1, and send the result back to site 2. Total transfer size for Q = 400,000 + 3,500 = 403,500 bytes, and for Q’ = 4,000 + 3,500 = 7,500 bytes.
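The strategy arithmetic above can be reproduced with a small cost helper. This is only a sketch of the slides' byte counting, using the relation and result sizes they give; it is not part of any DBMS.

```python
# Relation and result sizes in bytes, as given in the slides.
EMP_SIZE  = 1_000_000   # Employee relation
DEPT_SIZE = 3_500       # Department relation
RESULT_Q  = 400_000     # Q:  10,000 result tuples * 40 bytes
RESULT_Q2 = 4_000       # Q': 100 result tuples * 40 bytes

def strategy_costs(result_size):
    """Total bytes moved by each strategy when the result site is site 3."""
    return {
        1: EMP_SIZE + DEPT_SIZE,     # ship both relations to site 3
        2: EMP_SIZE + result_size,   # ship Employee to site 2, result to site 3
        3: DEPT_SIZE + result_size,  # ship Department to site 1, result to site 3
    }

costs_q = strategy_costs(RESULT_Q)
```

Picking the minimum of these totals recovers the slides' conclusion that strategy 3 is preferred for both Q and Q'.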
  40. 40. Query Processing in Distributed Databases Semijoin: the objective is to reduce the number of tuples in a relation before transferring it to another site. Example execution of Q or Q’: 1. Project the join attributes of Department at site 2 and transfer them to site 1. For Q, 4 * 100 = 400 bytes are transferred; for Q’, 9 * 100 = 900 bytes. 2. Join the transferred file with the Employee relation at site 1, and transfer the required attributes from the resulting file to site 2. For Q, 34 * 10,000 = 340,000 bytes are transferred; for Q’, 39 * 100 = 3,900 bytes. 3. Execute the query by joining the transferred file with Department, and present the result to the user at site 2.
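The semijoin transfer totals can be sketched the same way, using the attribute widths and tuple counts from the example above (a cost-counting sketch only, not an implementation of the semijoin itself):

```python
def semijoin_cost(join_attr_bytes, result_attr_bytes, dept_tuples, matching_tuples):
    """Bytes moved: ship Department's projected join column to site 1,
    then ship only the needed Employee attributes back to site 2."""
    step1 = join_attr_bytes * dept_tuples        # projected join column to site 1
    step2 = result_attr_bytes * matching_tuples  # reduced Employee data back to site 2
    return step1 + step2

# Q:  4-byte join attribute, 100 departments; 34 bytes per match, 10,000 matches.
cost_q  = semijoin_cost(4, 34, 100, 10_000)
# Q': 9-byte Mgrssn, 100 departments; 39 bytes per manager, 100 matches.
cost_q2 = semijoin_cost(9, 39, 100, 100)
```

For Q' this gives 4,800 bytes, well under the 7,500 bytes of the best non-semijoin strategy.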
  41. 41. Replication Availability, increased parallelism, increased overhead on update.
  42. 42. Replication Synchronous: All copies of a modified relation must be updated before the modifying transaction commits; before an update transaction commits, it synchronizes all copies of the modified data. Asynchronous: Copies of a modified relation are updated only periodically, and a transaction that reads different copies of the same relation may see different values; this compromises distributed data independence but can be implemented more efficiently.
  43. 43. Synchronous Replication There are two basic techniques for ensuring that transactions see the same value regardless of which copy of an object they access. 1) Voting: a transaction must write a majority of copies in order to modify an object, and read at least enough copies to make sure that one of the copies is current. E.g., if there are 10 copies and 7 copies are written by update transactions, then at least 4 copies must be read.  Each copy has a version number, and the copy with the highest version number is current. This technique is not attractive in most situations because reading an object requires reading multiple copies; in most applications, objects are read much more frequently than they are updated, and efficient performance on reads is very important.
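The version-number rule can be sketched directly: because any 4 copies must overlap the 7 that were written, the highest version number seen among the copies read is guaranteed to be current. The copy values below are invented for illustration.

```python
def current_value(copies_read):
    """Among the copies read, the one with the highest version number is current."""
    return max(copies_read, key=lambda c: c["version"])["value"]

# 10 copies; an update wrote 7 of them (version 3), 3 still hold version 2.
copies = [{"version": 3, "value": "new"}] * 7 + [{"version": 2, "value": "old"}] * 3
# Reading any 4 copies must include at least one written copy, since 7 + 4 > 10.
reads = copies[5:9]   # a sample straddling new and stale copies
value = current_value(reads)
```

However the 4 copies are chosen, `current_value` returns the post-update value, which is why the majority write quorum makes reads safe.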
  44. 44. Synchronous Replication 2) Read-any write-all To read an object, a transaction can read any one copy, but to write an object, it must write all copies. Reads are fast, especially if we have a local copy, but writes are slower, relative to the first technique. This technique is attractive when reads are much more frequent than writes, and it is usually adopted for implementing synchronous replication.
  45. 45. Synchronous Replication Synchronous replication comes at a significant cost. 1. Before an update transaction can commit, it must obtain exclusive locks on all copies of modified data, assuming that the read-any write-all technique is used. 2. The transaction may have to send lock requests to remote sites and wait for the locks to be granted, and during this potentially long period it continues to hold all its other locks. 3. If sites or communication links fail, the transaction cannot commit until all sites at which it has modified data recover and are reachable. 4. Finally, even if locks are obtained readily and there are no failures, committing a transaction requires several additional messages to be sent as part of a commit protocol.
  46. 46. Primary Site versus Peer-to-Peer Replication In primary site asynchronous replication, one copy of a relation is designated as the primary or master copy. Replicas of the entire relation or of fragments of the relation can be created at other sites; these are secondary copies, and, unlike the primary copy, they cannot be updated. A common mechanism for setting up primary and secondary copies is that users first register or publish the relation at the primary site and subsequently subscribe to a fragment of a registered relation from another (secondary) site.
  47. 47. Primary Site versus Peer-to-Peer Replication  In peer-to-peer asynchronous replication, more than one copy can be designated as updatable, that is, as a master copy.  In addition to propagating changes, a conflict resolution strategy must be used to deal with conflicting changes made at different sites.  For example, Joe’s age may be changed to 35 at one site and to 38 at another. Which value is ‘correct’?  Many more subtle kinds of conflicts can arise, leading to ad hoc conflict resolution.  Peer-to-peer replication is best used in situations where such conflicts do not arise.
  48. 48. Naming Transparency Data items—such as relations, fragments, and replicas— must have unique names. This property is easy to ensure in a centralized database. In a distributed database, however, we must take care to ensure that two sites do not use the same name for distinct data items.
  49. 49. Naming Transparency 1) Name server. The name server helps to ensure that the same name does not get used for different data items.  We can also use the name server to locate a data item, given the name of the item. Disadvantages: 1. The name server may become a performance bottleneck, resulting in poor performance. 2. If the name server crashes, it may not be possible for any site in the distributed system to continue to run.
  50. 50. Naming Transparency 2) Site prefix. Each site prefixes its own site identifier to any name that it generates. This approach ensures that no two sites generate the same name, and no central control is required. However, it fails to achieve location transparency, since site identifiers are attached to names: the account relation might be referred to as site17.account, or account@site17, rather than simply as account. Many database systems use the internet address of a site to identify it.
  51. 51. Solution: The database system can create a set of alternative names or aliases for data items. A user may thus refer to data items by simple names that are translated by the system to complete names. The mapping of aliases to the real names can be stored at each site. With aliases, the user can be unaware of the physical location of a data item. Furthermore, the user will be unaffected if the database administrator decides to move a data item from one site to another. Users should not have to refer to a specific replica of a data item. Instead, the system should determine which replica to reference on a read request, and should update all replicas on a write request. This can be ensured by maintaining a catalog table, which the system uses to determine all replicas for the data item.
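The alias mechanism can be sketched as a simple catalog lookup. The `site17.account` name comes from the slide; the `branch` entry and site numbers used for the move are invented for illustration.

```python
# Hypothetical alias catalog mapping user-visible names to site-qualified names.
aliases = {
    "account": "site17.account",
    "branch":  "site4.branch",
}

def resolve(name, catalog=aliases):
    """Translate a simple alias to its complete, site-prefixed name."""
    return catalog.get(name, name)   # unknown names pass through unchanged

# Moving a relation to another site only requires updating the catalog;
# user queries that say "account" are unaffected.
aliases["account"] = "site22.account"
```

This is the sense in which aliases restore location transparency: the mapping changes, not the user-facing name.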
  52. 52. Concurrency Control and Recovery Distributed Databases encounter a number of concurrency control and recovery problems which are not present in centralized databases. Some of them are listed below.  Dealing with multiple copies of data items: The concurrency control must maintain global consistency. Likewise the recovery mechanism must recover all copies and maintain consistency after recovery.  Failure of individual sites: Database availability must not be affected due to the failure of one or two sites and the recovery scheme must recover them before they are available for use.
  53. 53. Concurrency Control and Recovery  Communication link failure: This failure may create a network partition, which would affect database availability even though all database sites may be running.  Distributed commit: A transaction may be fragmented, with its fragments executed at a number of sites. This requires a two- or three-phase commit approach for transaction commit.  Distributed deadlock: Since transactions are processed at multiple sites, two or more sites may get involved in deadlock. This must be resolved in a distributed manner.
  54. 54. Concurrency Control and Recovery Distributed concurrency control based on a distinguished copy of a data item Primary site technique: A single site is designated as the primary site, which serves as the coordinator for transaction management. [Diagram: five sites connected by a communications network; one is designated the primary site]
  55. 55. Concurrency Control and Recovery 1. Transaction management: Concurrency control and commit are managed by this site. In two-phase locking, this site manages the locking and releasing of data items. If all transactions follow the two-phase policy at all sites, then serializability is guaranteed. 2. Advantages: An extension of centralized two-phase locking, so implementation and management are simple. Data items are locked at only one site but can be accessed at any site. Deadlock can be detected easily by applying a directed (wait-for) graph at the primary site. 3. Disadvantages: All transaction management activities go to the primary site, which is likely to become overloaded. If the primary site fails, the entire system is inaccessible. 4. To aid recovery, a backup site is designated which behaves as a shadow of the primary site.
  56. 56. Concurrency Control and Recovery Primary site with backup site: In case of primary site failure, the backup site can act as the primary site. All locking information is maintained at both the primary and the backup site. 1. Advantages: Simplifies the process of recovery. 2. Disadvantages: Slows down the process of acquiring locks, and the original problem is not solved: if both sites fail, the system is again inaccessible.
  57. 57. Concurrency Control and Recovery Primary copy technique: In this approach, instead of designating a whole site, one copy of each data item is designated as the primary copy. To lock a data item, just the primary copy of that data item is locked. 1. Advantages: Since primary copies are distributed across various sites, a single site is not overloaded with locking and unlocking requests. 2. Disadvantages: Identifying the primary copy is complex; a distributed directory must be maintained, possibly at all sites.
  58. 58. Concurrency Control and Recovery Recovery from coordinator failure In both approaches, a coordinator site or copy may become unavailable, requiring the selection of a new coordinator. Primary site approach with no backup site: Abort and restart all active transactions at all sites, elect a new coordinator, and initiate transaction processing. Primary site approach with backup site: Suspend all active transactions, designate the backup site as the primary site, and identify a new backup site. The new primary site receives all transaction management information and resumes processing. Primary and backup sites fail, or no backup site: Use an election process to select a new coordinator site.
  59. 59. Concurrency Control and Recovery Concurrency control based on voting (majority protocol) There is no primary copy or coordinator.  Send the lock request to the sites that have the data item.  If a majority of sites grant the lock, then the requesting transaction gets the data item.  The locking information (granted or denied) is sent to all these sites.  To avoid an unacceptably long wait, a time-out period is defined.  If the requesting transaction does not receive the vote information within the time-out period, it is aborted.
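The majority rule can be sketched in a few lines (site names and votes are hypothetical; real implementations also handle time-outs and lock modes):

```python
def majority_lock(sites_with_item, grants):
    """Grant the lock only if a strict majority of the sites holding the item vote yes."""
    votes = sum(1 for s in sites_with_item if grants.get(s, False))
    return votes > len(sites_with_item) // 2

# 5 sites hold a copy; 3 grant the lock, so the request succeeds.
sites = ["s1", "s2", "s3", "s4", "s5"]
granted = majority_lock(sites, {"s1": True, "s2": True, "s3": True})
```

Sites that have not answered (a time-out) simply count as not granting, which is why a slow site can force an abort.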
  60. 60. Concurrency Control and Recovery Biased Protocol The difference from the majority protocol is that requests for shared locks are given more favorable treatment than requests for exclusive locks. Shared locks: When a transaction needs to lock data item Q, it simply requests a lock on Q from the lock manager at one site that contains a replica of Q. Exclusive locks: When a transaction needs to lock data item Q, it requests a lock on Q from the lock manager at all sites that contain a replica of Q. Advantage: less overhead on read operations. Disadvantages: additional overhead on writes and added complexity in handling deadlock.
  61. 61. Concurrency Control and Recovery Quorum Consensus Protocol A generalization of the majority protocol. It assigns each site a nonnegative weight, and assigns read and write operations on an item x two integers, called the read quorum Qr and write quorum Qw. If S is the total weight of all sites at which x resides, then Qr + Qw > S and 2 * Qw > S must hold. To execute a read operation, enough replicas must be read that their total weight is ≥ Qr. To execute a write operation, enough replicas must be written that their total weight is ≥ Qw. The protocol can selectively reduce the cost of either reads or writes by appropriately defining the read and write quorums.
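As a sketch of the quorum checks (the site weights below are invented), the two constraints ensure that any read quorum overlaps any write quorum, and that two write quorums overlap:

```python
def valid_quorums(weights, qr, qw):
    """Check the quorum-consensus constraints: Qr + Qw > S and 2*Qw > S."""
    s = sum(weights.values())
    return qr + qw > s and 2 * qw > s

def quorum_met(weights, chosen_sites, quorum):
    """A read/write may proceed once the chosen replicas' total weight reaches the quorum."""
    return sum(weights[s] for s in chosen_sites) >= quorum

weights = {"s1": 1, "s2": 2, "s3": 1, "s4": 2}   # S = 6
ok = valid_quorums(weights, qr=3, qw=4)           # 3 + 4 > 6 and 2*4 > 6
```

Setting Qr low (and Qw correspondingly high) cheapens reads at the expense of writes, which is the tuning knob the slide describes.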
  62. 62. Concurrency Control and Recovery Timestamping Each transaction is given a unique timestamp that the system uses in deciding the serialization order. There are two primary methods for generating unique timestamps: 1. Centralized scheme: a single site distributes the timestamps, using either a logical counter or its own local clock. 2. Distributed scheme: each site generates a unique local timestamp using either a logical counter or the local clock; the unique global timestamp is the concatenation of the local timestamp and the site identifier, which must itself be unique.
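The distributed scheme can be sketched with counter-based local clocks: representing a global timestamp as the pair (local counter, site id) makes timestamps unique and totally ordered, with the site id breaking ties. Site ids here are arbitrary.

```python
import itertools

class SiteClock:
    """Distributed scheme: global timestamp = (local counter, unique site id).
    Tuples compare lexicographically, so equal counters are ordered by site id."""
    def __init__(self, site_id):
        self.site_id = site_id
        self.counter = itertools.count(1)   # logical counter standing in for a clock

    def timestamp(self):
        return (next(self.counter), self.site_id)

a, b = SiteClock(1), SiteClock(2)
t1, t2, t3 = a.timestamp(), b.timestamp(), a.timestamp()
```

In practice the local counters must also be kept roughly in step (e.g., advanced on message receipt), or one busy site would always sort after the others.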
  63. 63. System Structure Each site has its own local transaction manager, whose function is to ensure the ACID properties of those transactions that execute at that site. The various transaction managers cooperate to execute global transactions. Each site contains two subsystems: 1) The transaction manager manages the execution of those transactions (or subtransactions) that access data stored at that site; each such transaction may be either a local transaction or part of a global transaction. 2) The transaction coordinator coordinates the execution of the various transactions (both local and global) initiated at that site.
  64. 64. Each transaction manager is responsible for maintaining a log for recovery purposes and participating in an appropriate concurrency-control scheme to coordinate the concurrent execution of the transactions executing at that site. A transaction coordinator is responsible for coordinating the execution of all the transactions initiated at that site. For each such transaction, the coordinator is responsible for: • Starting the execution of the transaction • Breaking the transaction into a number of subtransactions and distributing these subtransactions to the appropriate sites for execution • Coordinating the termination of the transaction, which may result in the transaction being committed at all sites or aborted at all sites.
  65. 65. Commit Protocols If we are to ensure atomicity, all the sites in which a transaction T executed must agree on the final outcome of the execution. T must either commit at all sites, or it must abort at all sites. To ensure this property, the transaction coordinator of T must execute a commit protocol. Two-Phase Commit Consider a transaction T initiated at site Si, where the transaction coordinator is Ci. When T completes its execution—that is, when all the sites at which T has executed inform Ci that T has completed— Ci starts the 2PC protocol.
  66. 66. Phase 1. Ci adds the record <prepare T> to the log, and forces the log onto stable storage. It then sends a prepare T message to all sites at which T executed. On receiving such a message, the transaction manager at that site determines whether it is willing to commit its portion of T. If the answer is no, it adds a record <no T> to the log, and then responds by sending an abort T message to Ci. If the answer is yes, it adds a record <ready T> to the log, and forces the log onto stable storage. The transaction manager then replies with a ready T message to Ci. 2 Phase Commit (2PC)
  67. 67. 2 Phase Commit (2PC) Phase 2. When Ci receives responses to the prepare T message from all the sites, or when a prespecified interval of time has elapsed since the prepare T message was sent out, Ci can determine whether the transaction T can be committed or aborted. Transaction T can be committed if Ci received a ready T message from all the participating sites. Otherwise, transaction T must be aborted. Depending on the verdict, either a record <commit T> or a record <abort T> is added to the log and the log is forced onto stable storage. At this point, the fate of the transaction has been sealed. Following this point, the coordinator sends either a commit T or an abort T message to all participating sites. When a site receives that message, it records the message in the log.
  68. 68. A site at which T executed can unconditionally abort T at any time before it sends the message ready T to the coordinator. Once the message is sent, the transaction is said to be in the ready state at the site. The ready T message is, in effect, a promise by a site to follow the coordinator’s order to commit T or to abort T. To make such a promise, the needed information must first be stored in stable storage. Otherwise, if the site crashes after sending ready T, it may be unable to make good on its promise. Further, locks acquired by the transaction must continue to be held till the transaction completes. 2 Phase Commit (2PC)
  69. 2 Phase Commit (2PC). The fate of T is sealed as soon as at least one site responds abort T. Since the coordinator site Si is one of the sites at which T executed, the coordinator can decide unilaterally to abort T. The final verdict regarding T is determined at the time that the coordinator writes that verdict (commit or abort) to the log and forces that verdict to stable storage. In some implementations of the 2PC protocol, a site sends an acknowledge T message to the coordinator at the end of the second phase of the protocol. When the coordinator receives the acknowledge T message from all the sites, it adds the record <complete T> to the log.
  70. Handling of Failures. Failure of a participating site. If the site fails before responding with a ready T message to Ci, the coordinator assumes that it responded with an abort T message. If the site fails after the coordinator has received the ready T message from the site, the coordinator executes the rest of the commit protocol in the normal fashion, ignoring the failure of the site.
  71. Handling of Failures. Failure of a participating site. When a participating site Sk recovers from a failure, it must examine its log to determine the fate of those transactions that were in the midst of execution when the failure occurred. Let T be one such transaction. If the log contains a <commit T> record, the site executes redo(T). If the log contains an <abort T> record, the site executes undo(T). If the log contains no control records (abort, commit, ready) concerning T, then Sk failed before responding to the prepare T message from Ci; Ci must therefore have aborted T, so Sk must execute undo(T).
  72. If the log contains a <ready T> record, the site must consult Ci to determine the fate of T. If Ci is up, it notifies Sk regarding whether T committed or aborted, and Sk executes either redo(T) or undo(T). If Ci is down, Sk must try to find the fate of T from other sites. It sends a querystatus T message to all the sites in the system. On receiving such a message, a site must consult its log to determine whether T has executed there, and if T has, whether T committed or aborted. It then notifies Sk about this outcome. If no site has the appropriate information, then Sk can neither abort nor commit T. The decision concerning T is postponed until Sk can obtain the needed information. Thus, Sk must periodically resend the querystatus message to the other sites.
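The recovery rules for a participating site can be condensed into one small decision function. This is a sketch, assuming log records are simple (kind, transaction) tuples; the function name and record layout are illustrative.

```python
def recover_participant(log, t):
    """Decide the recovery action for transaction t from a site's log.

    Log records are (kind, txn) tuples; this mirrors the case analysis
    a recovering site Sk performs for each in-doubt transaction.
    """
    kinds = {kind for kind, txn in log if txn == t}
    if "commit" in kinds:
        return "redo"             # verdict was commit: replay T's writes
    if "abort" in kinds:
        return "undo"             # verdict was abort: roll T back
    if "ready" in kinds:
        return "ask-coordinator"  # in doubt: consult Ci (or query other sites)
    return "undo"                 # no control record: the site failed before
                                  # voting, so Ci must have aborted T
```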
  73. Handling of Failures. Failure of the coordinator. If the coordinator fails in the midst of the execution of the commit protocol for transaction T, then the participating sites must decide the fate of T. In certain cases, the participating sites cannot decide whether to commit or abort T, and therefore these sites must wait for the recovery of the failed coordinator.
  74. Handling of Failures. Failure of the coordinator. If an active site contains a <commit T> record in its log, then T must be committed. If an active site contains an <abort T> record in its log, then T must be aborted. If some active site does not contain a <ready T> record in its log, then the failed coordinator Ci cannot have decided to commit T; the coordinator may have decided to abort T, but it cannot have decided to commit T. Rather than wait for Ci to recover, it is preferable to abort T.
  75. Handling of Failures. If none of the preceding cases holds, then all active sites must have a <ready T> record in their logs, but no additional control records (such as <abort T> or <commit T>). Since the coordinator has failed, it is impossible to determine whether a decision has been made, and if one has, what that decision is, until the coordinator recovers. Thus, the active sites must wait for Ci to recover. Since the fate of T remains in doubt, T may continue to hold system resources; this is the blocking problem.
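The coordinator-failure rules above can likewise be sketched as one function over the logs of the active sites; the "wait" outcome is the blocking case. The function name and log layout are illustrative assumptions.

```python
def decide_without_coordinator(site_logs, t):
    """Decide the fate of transaction t from the logs of the active sites.

    site_logs is a list of logs, one per active site, each a list of
    (kind, txn) tuples. Returns 'commit', 'abort', or 'wait'.
    """
    kinds_per_site = [{k for k, txn in log if txn == t} for log in site_logs]
    if any("commit" in ks for ks in kinds_per_site):
        return "commit"   # some site saw the commit verdict
    if any("abort" in ks for ks in kinds_per_site):
        return "abort"    # some site saw the abort verdict
    if any("ready" not in ks for ks in kinds_per_site):
        # Some active site never voted ready, so the failed coordinator
        # cannot have decided to commit; abort rather than wait.
        return "abort"
    # Every active site is in the ready state with no verdict recorded:
    # the blocking case, wait for the coordinator to recover.
    return "wait"
```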
  76. Handling of Failures. Network partition. 1. The coordinator and all its participants remain in one partition. In this case, the failure has no effect on the commit protocol. 2. The coordinator and its participants belong to several partitions. From the viewpoint of the sites in one of the partitions, it appears that the sites in the other partitions have failed. Sites that are not in the coordinator's partition treat this as a failure of the coordinator. The coordinator and the sites in its partition follow the usual commit protocol, assuming that the sites in the other partitions have failed.
  77. Recovery and Concurrency Control. The recovery procedure must treat in-doubt transactions specially. The recovering site must determine the commit–abort status of such transactions by contacting other sites. Normal transaction processing at the site cannot begin until all in-doubt transactions have been committed or rolled back. Finding the status of in-doubt transactions can be slow. Further, recovery potentially could become blocked if 2PC is used. As a result, the site performing restart recovery may remain unusable for a long period.
  78. Recovery and Concurrency Control. To circumvent this problem, recovery algorithms typically provide support for noting lock information in the log. Instead of writing a <ready T> log record, the algorithm writes a <ready T, L> log record, where L is the list of write locks held by T when the record is written. At recovery time, after performing local recovery actions, for every in-doubt transaction T, all the write locks noted in the <ready T, L> log record are reacquired. After this, transaction processing can start at the site, even before the commit–abort status of the in-doubt transactions is determined. The commit or rollback of in-doubt transactions proceeds concurrently with the execution of new transactions. Thus, site recovery is faster, and never gets blocked.
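A sketch of the lock-reacquisition step, assuming <ready T, L> records are stored as ("ready", T, L) tuples with L the list of locked data items; the lock-table representation is an illustrative simplification.

```python
def reacquire_locks(log, lock_table):
    """After local recovery, re-take the write locks noted in <ready T, L>
    records so new transactions can run while in-doubt ones are resolved.

    lock_table maps a data item to the transaction holding its write lock.
    Returns the list of in-doubt transactions found in the log.
    """
    in_doubt = []
    for record in log:
        if record[0] == "ready":
            _, t, locks = record
            for item in locks:
                lock_table[item] = t   # write lock held on T's behalf
            in_doubt.append(t)
    return in_doubt
```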
  79. Three-Phase Commit. The three-phase commit (3PC) protocol is an extension of the two-phase commit protocol that avoids the blocking problem under certain assumptions. It is assumed that no network partition occurs, and that no more than k sites fail, where k is some predetermined number. The protocol avoids blocking by introducing an extra third phase in which multiple sites are involved in the decision to commit. Instead of directly noting the commit decision in its persistent storage, the coordinator first ensures that at least k other sites know that it intended to commit the transaction. If the coordinator fails, the remaining sites first select a new coordinator.
  80. Three-Phase Commit. This new coordinator checks the status of the protocol from the remaining sites; if the old coordinator had decided to commit, at least one of the other k sites that it informed will be up and will ensure that the commit decision is respected. The new coordinator restarts the third phase of the protocol if some site knew that the old coordinator intended to commit the transaction. Otherwise the new coordinator aborts the transaction.
  81. Three-Phase Commit. While the 3PC protocol has the desirable property of not blocking unless k sites fail, it has the drawback that a partitioning of the network will appear to be the same as more than k sites failing, which would lead to blocking. The protocol also has to be carefully implemented to ensure that network partitioning does not result in inconsistencies, where a transaction is committed in one partition and aborted in another. Because of its overhead, the 3PC protocol is not widely used.
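The new coordinator's decision rule in 3PC can be sketched as follows. Here 'precommit' stands for a site that learned the old coordinator's intent to commit; the status names are illustrative assumptions, not terms from any particular implementation.

```python
def new_coordinator_decision(statuses):
    """3PC sketch: after the old coordinator fails, the new coordinator
    polls the remaining sites. statuses maps site -> what it knows about T.

    If any site saw the old coordinator's intent to commit ('precommit'),
    the new coordinator restarts the third phase and commits; otherwise
    it aborts the transaction.
    """
    if any(s == "precommit" for s in statuses.values()):
        return "commit"   # the old commit decision must be respected
    return "abort"        # no site saw an intent to commit
```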
  82. Client-Server Database Architecture. It consists of clients running client software, a set of servers which provide all database functionality, and a reliable communication infrastructure. (Figure: clients 1 through n connected to servers 1 through n over the network.)
  83. Client-Server Database Architecture. Clients contact servers for the desired service, but servers do not contact clients.  The server software is responsible for local data management at a site, much like centralized DBMS software.  The client software is responsible for most of the distribution function.  The communication software manages communication among clients and servers.
  84. Client-Server Database Architecture. The processing of an SQL query proceeds as follows:  The client parses a user query and decomposes it into a number of independent subqueries. Each subquery is sent to the appropriate site for execution.  Each server processes its subquery and sends the result to the client.  The client combines the results of the subqueries and produces the final result.
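Those three client-side steps can be sketched as follows, with hypothetical in-memory servers standing in for the remote sites; the function names, server callables, and sample data are all illustrative.

```python
def run_distributed_query(subqueries, servers, combine):
    """Client-side sketch: ship each subquery to its site, gather the
    partial results, and combine them into the final answer.

    subqueries maps site name -> SQL text; servers maps site name -> a
    callable that executes SQL at that site and returns rows.
    """
    partials = [servers[site](sql) for site, sql in subqueries.items()]
    return combine(partials)

# Hypothetical sites holding horizontal fragments of an employee table.
tables = {
    "NewYork": [("Ann", "NY"), ("Bob", "NY")],
    "Atlanta": [("Cho", "ATL")],
}
# Stand-in servers: each ignores the SQL text and returns its local rows.
servers = {site: (lambda rows: (lambda sql: rows))(rows)
           for site, rows in tables.items()}
```

For example, a union over both fragments combines the partial results with `lambda parts: sum(parts, [])`, yielding all three employee rows.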
