Distributed Database Architecture
https://rizveeredwan.github.io
Homogeneous vs. Heterogeneous Databases
Homogeneous Distributed Databases
● All sites use identical Database Management System (DBMS) software.
● Sites are aware of each other and cooperate to process user requests.
● Local sites cede some autonomy (e.g., changing schemas or DBMS software).
● DBMS software cooperates across sites for transaction processing.
Heterogeneous Distributed Databases
● Different sites may use different schemas and/or DBMS software.
● Sites may not be aware of one another.
● Offer limited cooperation for transaction processing.
● Challenges:
○ Schema differences complicate query processing.
○ Software divergence hinders multi-site transaction processing.
(Key dimensions of difference between sites: schema and DBMS software.)
Data Replication
Definition
● A copy of relation r is stored at two or more sites.
● Full Replication: A copy is stored at every site in the system.
Advantages
1. Availability: If a site containing r fails, r can still be found at another site, allowing queries to continue.
2. Increased Parallelism: For read-heavy operations, multiple sites can process queries involving r in parallel. More
replicas increase the chance of finding data locally, minimizing data movement.
Disadvantages
1. Increased Overhead on Update: All replicas of r must be kept consistent. Any update to r must be propagated to all sites containing replicas, leading to higher overhead.
Summary
● Replication improves read performance and data availability for read-only transactions.
● Update transactions incur higher overhead and more complex concurrency control.
● Primary Copy Scheme: A common simplification where one replica is designated as the primary copy.
Data Fragmentation
Definition
● Relation r is divided into fragments r1, r2, ..., rn.
● Fragments must contain sufficient information to reconstruct the original relation r.
Types of Fragmentation
1. Horizontal Fragmentation
● Splits by Tuple: Assigns each tuple of r to one or more fragments.
● Purpose: Usually used to keep tuples at sites where they are most frequently used, minimizing data
transfer.
● Definition: A fragment ri is defined by a selection predicate Pi on the global relation r:
○ ri = σPi (r)
● Reconstruction: The original relation r is reconstructed by taking the union of all fragments:
○ r = r1 ∪ r2 ∪ ... ∪ rn
● Example: account relation fragmented by branch_name (e.g., account_Hillside, account_Valleyview).
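The fragment-and-reconstruct steps above can be traced in a few lines. The sketch below is plain Python over made-up in-memory account rows (not a real DBMS): it applies a selection predicate per fragment and then rebuilds r as the union of the fragments.

    # Minimal sketch of horizontal fragmentation (assumed in-memory rows, not a real DBMS).
    account = [
        {"account_number": "A-101", "branch_name": "Hillside",   "balance": 500},
        {"account_number": "A-215", "branch_name": "Valleyview", "balance": 700},
        {"account_number": "A-305", "branch_name": "Hillside",   "balance": 350},
    ]

    def sigma(predicate, relation):
        """Relational selection: keep the tuples that satisfy the predicate."""
        return [t for t in relation if predicate(t)]

    # Each fragment ri is defined by a selection predicate Pi on the global relation r.
    account_Hillside   = sigma(lambda t: t["branch_name"] == "Hillside",   account)
    account_Valleyview = sigma(lambda t: t["branch_name"] == "Valleyview", account)

    # Reconstruction: r = r1 ∪ r2 (union of the fragments; here the fragments are disjoint).
    reconstructed = account_Hillside + account_Valleyview
    assert sorted(t["account_number"] for t in reconstructed) == \
           sorted(t["account_number"] for t in account)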
2. Vertical Fragmentation
● Splits by Attribute: Decomposes the schema R of relation r into subsets of attributes R1, R2, ..., Rn.
● Requirement: R = R1 ∪ R2 ∪ ... ∪ Rn
● Definition: Each fragment ri of r is defined by projecting r onto the attribute subset Ri:
○ ri = ΠRi (r)
● Reconstruction: The original relation r is reconstructed by taking the natural join of all fragments:
○ r = r1 ⋈ r2 ⋈ ... ⋈ rn
● Ensuring Reconstruction: Include primary-key attributes (or a superkey/tuple-id) in each Ri.
● Example: employee_info fragmented into employee_private_info (employee_id, salary) and
employee_public_info (employee_id, name, designation).
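A matching sketch for vertical fragmentation, again over hypothetical in-memory rows: each fragment is a projection that keeps the key (employee_id), and the original relation is rebuilt by a natural join on that key.

    # Minimal sketch of vertical fragmentation (assumed in-memory rows).
    employee_info = [
        {"employee_id": 1, "name": "Anik",  "designation": "Engineer", "salary": 90000},
        {"employee_id": 2, "name": "Sadia", "designation": "Manager",  "salary": 120000},
    ]

    def project(relation, attributes):
        """Relational projection: keep only the listed attributes of each tuple."""
        return [{a: t[a] for a in attributes} for t in relation]

    # Each Ri includes the key (employee_id) so the fragments can be rejoined.
    employee_private_info = project(employee_info, ["employee_id", "salary"])
    employee_public_info  = project(employee_info, ["employee_id", "name", "designation"])

    def natural_join_on_key(r1, r2, key):
        """Natural join of two fragments on the shared key attribute."""
        index = {t[key]: t for t in r2}
        return [{**t, **index[t[key]]} for t in r1 if t[key] in index]

    # Reconstruction: r = r1 ⋈ r2.
    reconstructed = natural_join_on_key(employee_private_info, employee_public_info, "employee_id")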
[Figure: horizontal fragmentation vs. vertical fragmentation]
Transparency in Distributed Databases
Concept of Data Transparency
● Users should not need to know the physical location or access methods of data.
● Aims to simplify user interaction and application development.
Forms of Data Transparency
1. Fragmentation Transparency:
○ Users are not required to know how a relation has been fragmented.
○ The system handles the underlying partitioning.
2. Replication Transparency:
○ Users perceive each data object as logically unique.
○ The system manages data replication for performance or availability, without user involvement.
○ Users are unaware of which objects are replicated or where replicas are placed.
3. Location Transparency:
○ Users do not need to know the physical location of the data.
○ The distributed database system can find any data item given its identifier.
Naming in Distributed Databases
● Challenge: Ensuring unique names for data items across multiple sites.
● Solution 1: Central Name Server:
○ All names registered centrally.
○ Pros: Ensures unique names, helps locate data.
○ Cons: Performance bottleneck, single point of failure if server crashes.
● Solution 2: Site Identifier Prefixing:
○ Each site prefixes its unique identifier to names it generates (e.g., site17.account).
○ Pros: No central control needed, ensures uniqueness.
○ Cons: Fails to achieve location transparency (users see site IDs).
● Solution 3: Aliases (Preferred):
○ The database system creates alternative, simple names (aliases) for data items.
○ These aliases are translated by the system to complete names (e.g., account instead of site17.account).
○ The mapping of aliases to complete names is stored locally at each site.
○ Benefits: Achieves location transparency, users unaffected by data movement.
Managing Replicas with Transparency
● Users should not specify a particular replica for read requests.
● The system determines which replica to use for reads.
● On write requests, the system automatically updates all replicas.
● Mechanism: A catalog table is maintained by the system to track all replicas for each data item.
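A minimal sketch of how a system might combine an alias catalog with a replica catalog to provide this transparency. The catalog contents, site names, and the read/write helpers below are illustrative assumptions, not a concrete DBMS API.

    # Hypothetical name and replica catalogs illustrating location and replication transparency.
    alias_catalog = {"account": "site17.account"}             # alias -> complete (global) name
    replica_catalog = {"site17.account": ["S1", "S3", "S5"]}  # global name -> sites holding replicas

    def read(alias):
        global_name = alias_catalog[alias]          # users never see site identifiers
        site = replica_catalog[global_name][0]      # system picks a replica (e.g., nearest or primary)
        return f"read {global_name} at {site}"

    def write(alias, value):
        global_name = alias_catalog[alias]
        # The system, not the user, propagates the update to every replica.
        return [f"write {value} to {global_name} at {site}"
                for site in replica_catalog[global_name]]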
Distributed Query Processing
Goal
● Minimize the time taken to compute the answer to a query.
Centralized Systems: Primary Cost Criterion
● Number of disk accesses.
Distributed Systems: Additional Considerations
In a distributed system, query processing strategies must also account for:
● Cost of Data Transmission: The overhead of transferring data over the network.
● Parallel Processing Gain: The potential performance improvement from having multiple sites process parts of the
query concurrently.
Optimization Challenge
● The relative costs of network data transfer and disk I/O vary significantly.
● Query optimization must find a good trade-off between these two cost factors, rather than focusing solely on one.
Query Transformation in Distributed Databases
Simple Query Example
● Consider a simple query: "Find all the tuples in the account relation."
● Even for simple queries, processing is non-trivial if the account relation is fragmented, replicated, or both.
Impact of Replication and Fragmentation
● Replicated but not fragmented: Choose the replica with the lowest transmission cost.
● Fragmented: More complex, as joins or unions may be needed to reconstruct the account relation.
● Challenge: The number of possible strategies for query optimization can be very large, making exhaustive
enumeration impractical.
Fragmentation Transparency and Query Transformation
● Fragmentation transparency allows users to write queries without knowing the underlying fragmentation.
● Example Query: σbranch_name = "Hillside" (account)
● Name Translation: If account is defined as account1 ∪ account2 (where account1 and account2 are fragments), the query becomes:
○ σbranch_name = "Hillside" (account1 ∪ account2)
● Query Optimization: Using optimization techniques, this expression can be simplified to:
○ σbranch_name = "Hillside" (account1) ∪ σbranch_name = "Hillside" (account2)
● Further Optimization:
○ σbranch_name = "Hillside" (account1): If account1 contains only tuples from the Hillside branch, the selection is redundant, and account1 can be used directly in its place.
○ σbranch_name = "Hillside" (account2): Substituting the definition of account2 (e.g., σbranch_name = "Valleyview" (account)) gives:
■ σbranch_name = "Hillside" (σbranch_name = "Valleyview" (account))
■ This expression is the empty set. If the optimizer knows this, it discards the subexpression.
● Final Strategy: The Hillside site returns account1 as the result of the query.
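The pruning argument above can be phrased as a tiny rewrite rule: push the selection through the union and drop any fragment whose defining predicate contradicts it. The sketch below assumes fragments defined by equality predicates on branch_name.

    # Sketch: push σ_{branch_name = "Hillside"} through account1 ∪ account2 and prune empty branches.
    fragment_predicates = {            # defining predicate of each horizontal fragment (assumed)
        "account1": "Hillside",
        "account2": "Valleyview",
    }

    def transform(query_branch):
        plan = []
        for fragment, frag_branch in fragment_predicates.items():
            if frag_branch == query_branch:
                # The selection is implied by the fragment definition: use the fragment directly.
                plan.append(fragment)
            # Otherwise σ_{b=Hillside}(σ_{b=Valleyview}(account)) is empty: discard this branch.
        return " ∪ ".join(plan) if plan else "∅"

    print(transform("Hillside"))   # -> account1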
Simple Join Processing in Distributed Databases
Key Decision
● Choosing a join strategy is a major decision in distributed query processing.
Example Relational-Algebra Expression
● account ⋈ depositor ⋈ branch
Scenario
● Three relations (account, depositor, branch) are neither replicated nor fragmented.
● account is at Site S1.
● depositor is at Site S2.
● branch is at Site S3.
● The query is issued at Site S1.
● The system needs to produce the result at Site S1.
Possible Strategies for Processing this Query
1. Ship all relations to the query site:
○ Ship copies of all three relations (account, depositor, branch) to Site S1.
○ Compute the entire query locally at Site S1 using centralized query processing techniques.
2. Distributed Join with Intermediate Results:
○ Ship a copy of the account relation (from S1) to Site S2.
○ Compute temp1 = account ⋈ depositor at Site S2.
○ Ship temp1 (from S2) to Site S3.
○ Compute temp2 = temp1 ⋈ branch at Site S3.
○ Ship temp2 (from S3) to Site S1 (the query site) as the final result.
3. Variations of Strategy 2:
○ Devise similar strategies by exchanging the roles of Site S1, S2, and S3. (This implies different orders of
shipping and intermediate join computations).
Factors to Consider for Strategy Selection
● Volume of Data Shipped: The amount of data transferred across the network.
● Cost of Transmitting a Block: The network transmission cost per data block between sites.
● Relative Speed of Processing: The processing capabilities and speed of each site.
Comparison of Strategies (Example Discussion)
● Strategy 1 (Ship all to S1):
○ If indices are present at S1, they are useful.
○ If indices are not present at S1, re-creation of indices (or extra processing) would be needed, which is
expensive.
● Strategy 2 (Distributed Join):
○ Potentially large intermediate relations (e.g., account ⋈ depositor) might need to be shipped.
○ Depending on the size of the intermediate results, this could require more network transmission than Strategy 1.
Conclusion
● No single strategy is always the best.
● The optimal strategy depends on a trade-off between data transmission costs and local processing costs, considering
factors like data volume, network speed, and site processing capabilities.
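A back-of-the-envelope way to compare the two strategies is simply to count shipped blocks. In the sketch below, every relation size, intermediate-result size, and the per-block cost are hypothetical inputs, not measured values.

    # Back-of-the-envelope transmission-cost comparison (all numbers are hypothetical inputs).
    def blocks_shipped_strategy1(depositor_blocks, branch_blocks):
        # Ship depositor (from S2) and branch (from S3) to S1; account is already at S1.
        return depositor_blocks + branch_blocks

    def blocks_shipped_strategy2(account_blocks, temp1_blocks, temp2_blocks):
        # Ship account to S2, temp1 = account ⋈ depositor to S3, temp2 = temp1 ⋈ branch back to S1.
        return account_blocks + temp1_blocks + temp2_blocks

    cost_per_block = 1.0   # relative network cost per block
    s1 = blocks_shipped_strategy1(depositor_blocks=400, branch_blocks=50)
    s2 = blocks_shipped_strategy2(account_blocks=1000, temp1_blocks=1200, temp2_blocks=900)
    print("Strategy 1:", s1 * cost_per_block, "Strategy 2:", s2 * cost_per_block)
    # With a large account ⋈ depositor intermediate, Strategy 2 can ship far more data than Strategy 1.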
Semijoin Strategy
● Goal: Efficiently evaluate r1 ⋈ r2, where r1 is at Site S1 and r2 is at Site S2, with the result needed at S1.
● Problem: Shipping all of r2 to S1 (or r1 to S2) might be costly, especially if many tuples in r2 do not
contribute to the final join result.
● Strategy Steps:
1. Compute Projection: temp1 ← ΠR1∩R2 (r1) at S1. (This projects r1 onto the common attributes used for
joining with r2).
2. Ship Projection: Ship temp1 from S1 to S2. (Only the relevant join attributes from r1 are sent).
3. Compute Semijoin: temp2 ← r2 ⋉ temp1 at S2. (This performs a semijoin: it selects the tuples of r2 that match some tuple of temp1 on the join attributes).
4. Ship Result: Ship temp2 from S2 to S1. (Only the relevant tuples from r2 are sent back).
5. Final Join: Compute r1 ⋈ temp2 at S1. (This computes the final join locally at S1).
● Correctness: This strategy correctly computes r1 ⋈ r2.
● Advantages: Particularly advantageous when relatively few tuples of r2 contribute to the join, reducing
network transmission costs. This often occurs when r1 is the result of a selection operation.
● Disadvantages: Incurs additional cost for shipping temp1 to S2.
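The five steps can be traced directly in code. The sketch below uses small in-memory stand-ins for r1 and r2 and ordinary Python comprehensions for the projection, semijoin, and final join; the tuples and attribute names are made up for illustration.

    # Minimal semijoin sketch for r1 ⋈ r2 with r1 at S1 and r2 at S2 (in-memory stand-ins).
    r1 = [{"branch_name": "Hillside", "assets": 900}]                    # at S1 (e.g., after a selection)
    r2 = [{"branch_name": "Hillside",   "account_number": "A-101"},
          {"branch_name": "Valleyview", "account_number": "A-215"}]      # at S2
    join_attrs = ["branch_name"]                                         # R1 ∩ R2

    # Step 1 (at S1): temp1 = Π_{R1∩R2}(r1)
    temp1 = [{a: t[a] for a in join_attrs} for t in r1]
    # Step 2: ship temp1 to S2 (only the join-attribute values cross the network).
    # Step 3 (at S2): temp2 = r2 ⋉ temp1 — tuples of r2 that match some tuple of temp1.
    keys = {tuple(t[a] for a in join_attrs) for t in temp1}
    temp2 = [t for t in r2 if tuple(t[a] for a in join_attrs) in keys]
    # Step 4: ship temp2 back to S1 (only the matching tuples of r2 are sent).
    # Step 5 (at S1): final join r1 ⋈ temp2.
    result = [{**a, **b} for a in r1 for b in temp2
              if all(a[k] == b[k] for k in join_attrs)]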
Join Strategies that Exploit Parallelism
● Scenario: Consider a join of four relations: r1 ⋈ r2 ⋈ r3 ⋈ r4, where each ri is at a different site Si. The result must be presented at site S1.
● Parallel Evaluation: There are multiple strategies to exploit parallelism.
● Example Strategy:
1. r1 is shipped to S2.
2. r1 ⋈ r2 is computed at S2.
3. r3 is shipped to S4.
4. r3 ⋈ r4 is computed at S4.
5. Site S2 ships tuples of (r1 ⋈ r2) to S1 as they are produced (pipelined join).
6. Site S4 ships tuples of (r3 ⋈ r4) to S1 as they are produced.
7. Once tuples from both (r1 ⋈ r2) and (r3 ⋈ r4) arrive at S1, the computation of the final join (r1 ⋈ r2) ⋈ (r3 ⋈ r4) can begin in parallel with the ongoing computations at S2 and S4.
● Benefit: This strategy allows parts of the join to be computed concurrently at different sites, potentially
speeding up the overall query execution.
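A rough way to picture this strategy is to let threads stand in for sites. The sketch below computes r1 ⋈ r2 and r3 ⋈ r4 concurrently and then combines them at "S1"; unlike the pipelined strategy above, it waits for each partial result to finish rather than streaming tuples, and the relations are tiny made-up examples.

    # Sketch of the parallel strategy with threads as stand-ins for sites (data is hypothetical).
    from concurrent.futures import ThreadPoolExecutor

    def join(left, right, key):
        """Simple hash join on a shared key attribute."""
        index = {}
        for t in right:
            index.setdefault(t[key], []).append(t)
        return [{**a, **b} for a in left for b in index.get(a[key], [])]

    r1 = [{"id": 1, "a": "x"}]; r2 = [{"id": 1, "b": "y"}]
    r3 = [{"id": 1, "c": "z"}]; r4 = [{"id": 1, "d": "w"}]

    with ThreadPoolExecutor() as pool:
        f12 = pool.submit(join, r1, r2, "id")   # "site S2" computes r1 ⋈ r2
        f34 = pool.submit(join, r3, r4, "id")   # "site S4" computes r3 ⋈ r4
        # "Site S1" combines the two partial results once they arrive.
        result = join(f12.result(), f34.result(), "id")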
Commit Protocols in Distributed Databases
Purpose
● To ensure atomicity in distributed transactions.
● All sites where a transaction T executed must agree on the final outcome:
○ T must commit at all sites, OR
○ T must abort at all sites.
Mechanism
● The transaction coordinator of T executes a commit protocol.
Key Commit Protocols
1. Two-Phase Commit Protocol (2PC)
○ Simple and widely used.
○ (Details described in Section 19.4.1 of the original text).
2. Three-Phase Commit Protocol (3PC)
○ An alternative to 2PC.
○ Aims to avoid certain disadvantages of 2PC.
○ Adds to complexity and overhead.
○ (Briefly outlined in Section 19.4.2 of the original text).
Two-Phase Commit Protocol (2PC)
Purpose
● Ensures atomicity for distributed transactions: a transaction either commits at all participating sites or
aborts at all sites.
Participants
● Transaction T: The distributed transaction.
● Site Si: The site where transaction T was initiated.
● Coordinator Ci: The transaction coordinator for T, located at Site Si.
● Participating Sites: All other sites where T executed.
2PC Protocol: Normal Operation
The 2PC protocol begins when all participating sites inform the coordinator Ci that their portion of transaction T has
completed.
Phase 1: Prepare Phase
1. Coordinator (Ci):
○ Writes <prepare T> record to its log.
○ Forces the log to stable storage.
○ Sends a prepare T message to all participating sites.
2. Participating Sites (Transaction Manager):
○ Upon receiving prepare T, determines willingness to commit its portion of T.
○ If unwilling to commit:
■ Writes <no T> record to its log.
■ Sends an abort T message to Ci.
○ If willing to commit:
■ Writes <ready T> record to its log (along with all T's log records).
■ Forces the log to stable storage.
■ Sends a ready T message to Ci.
○ Ready State: Once ready T is sent, the transaction is in the "ready" state at that site, meaning it promises to follow Ci's
commit/abort order. Locks are held until completion.
Phase 2: Commit/Abort Phase
1. Coordinator (Ci):
○ Waits for responses from all sites, or for a timeout.
○ Decision Rule:
■ If ready T messages are received from all participating sites, T can be committed.
■ Otherwise (if any abort T is received, or timeout occurs), T must be aborted. (Unanimity is
required for commit; one abort means global abort).
○ Verdict Recording:
■ Writes <commit T> or <abort T> record to its log based on the decision.
■ Forces the log to stable storage. At this point, the fate of T is sealed.
○ Message Sending: Sends commit T or abort T message to all participating sites.
2. Participating Sites (Transaction Manager):
○ Upon receiving commit T or abort T, records the message in its log.
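The coordinator's side of the two phases fits in a short function. The sketch below is only an outline of the control flow: message transport and the stable log are injected stand-ins, and timeouts, retransmission, and per-site acknowledgements are omitted.

    # Minimal sketch of the coordinator's side of 2PC (transport and log are stand-ins).
    def two_phase_commit(T, participants, send, force_log):
        # Phase 1: prepare.
        force_log(f"<prepare {T}>")
        votes = [send(site, f"prepare {T}") for site in participants]   # each reply: "ready" or "abort"

        # Phase 2: commit only on unanimous "ready"; otherwise abort.
        decision = "commit" if all(v == "ready" for v in votes) else "abort"
        force_log(f"<{decision} {T}>")   # the fate of T is sealed once this record is on stable storage
        for site in participants:
            send(site, f"{decision} {T}")
        return decision

    # Toy usage: every participant votes ready, so the coordinator commits.
    log = []
    decision = two_phase_commit("T1", ["S2", "S3"],
                                send=lambda site, msg: "ready" if msg.startswith("prepare") else "ok",
                                force_log=log.append)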
Key Characteristics
● Unconditional Abort: A participating site can unilaterally abort T at any time before sending ready T.
● Coordinator Unilateral Abort: The coordinator Ci can unilaterally abort T if it's one of the participating
sites and decides to abort.
● Fate Sealed: The transaction's fate is irrevocably decided once the coordinator writes and forces its verdict
(<commit T> or <abort T>) to stable storage.
● Optional Acknowledge: Some 2PC implementations include an optional third step where sites send
acknowledge T to the coordinator after Phase 2. Ci then writes <complete T> to its log upon receiving all
acknowledgements.
Two-Phase Commit (2PC): Handling Failures
1. Failure of a Participating Site
● If Coordinator (Ci) detects a site failure:
○ Before the failed site sent ready T: Ci treats the failure as an abort T vote and aborts T at all participating sites.
○ After the failed site sent ready T: Ci proceeds with the rest of the commit protocol normally, ignoring the failed site. Because the failed site had already cast its ready T vote, Ci still has the unanimous agreement it needs and does not abort; the failed site learns T's fate when it recovers.
● When a Participating Site (Sk) recovers: Sk examines its log for transactions in progress during the failure.
○ Log contains <commit T>: Sk executes redo(T).
○ Log contains <abort T>: Sk executes undo(T).
○ Log contains <ready T>: Sk must consult Ci, since Sk cannot tell on its own whether T should commit or abort.
■ If Ci is up: Ci informs Sk of T's fate (commit/abort); Sk performs redo(T) or undo(T).
■ If Ci is down: Sk sends query-status T to all other sites.
■ If another site has <commit T> or <abort T>, Sk is notified and performs redo(T) or undo(T).
■ If no site has the information, Sk postpones the decision and periodically re-sends query-status until Ci or
another informative site recovers.
○ Log contains no control records for T: Sk failed before sending ready T. Ci must have aborted T. Sk executes undo(T).
(Possible control records for T in Sk's log: <ready T>, <commit T>, <abort T>.)
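The recovery cases for Sk can be collapsed into one decision function. In the sketch below the log is a collection of record strings, and ask_coordinator / ask_other_sites are hypothetical stand-ins that return "commit", "abort", or None when the answer is not yet known.

    # Sketch of a recovering participant's decision logic for one transaction T (helpers are stand-ins).
    def recover_participant(log_records, T, ask_coordinator, ask_other_sites):
        if f"<commit {T}>" in log_records:
            return "redo(T)"
        if f"<abort {T}>" in log_records:
            return "undo(T)"
        if f"<ready {T}>" in log_records:
            # In-doubt: consult the coordinator first, then the other sites.
            fate = ask_coordinator(T) or ask_other_sites(T)
            if fate == "commit":
                return "redo(T)"
            if fate == "abort":
                return "undo(T)"
            return "postpone; re-send query-status T periodically"
        # No control record: Sk failed before voting ready, so Ci must have aborted T.
        return "undo(T)"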
2. Failure of the Coordinator (Ci)
● Participating sites must decide T's fate. In some cases, they must wait for Ci's recovery.
● Decision Rules for Active Sites:
○ If any active site has <commit T>: T must be committed.
○ If any active site has <abort T>: T must be aborted.
○ If some active site does not have <ready T>: Ci cannot have decided to commit T. It's preferable to abort T rather than wait for Ci to
recover.
○ If all active sites have <ready T> but no <commit T> or <abort T>:
■ The fate of T is undetermined.
■ Active sites must wait for Ci to recover.
■ Blocking Problem: T continues to hold system resources (e.g., locks), making data unavailable on active sites, potentially for a
long time. This is a major disadvantage of 2PC.
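These decision rules can be expressed as a small function over the control records visible at the active sites. This is only a sketch of the decision logic, not of the surrounding messaging; each entry in site_logs stands for one active site's set of control records for T.

    # Sketch of the rule active sites apply when the coordinator has failed.
    def decide_without_coordinator(site_logs, T):
        if any(f"<commit {T}>" in log for log in site_logs):
            return "commit"
        if any(f"<abort {T}>" in log for log in site_logs):
            return "abort"
        if any(f"<ready {T}>" not in log for log in site_logs):
            # Ci cannot have decided to commit yet, so aborting is safe.
            return "abort"
        # Every active site is ready but none knows the verdict: block until Ci recovers.
        return "wait for coordinator (blocking)"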
3. Network Partition (the network splits into partitions: sites within the same partition can communicate, but sites in different partitions cannot)
● Coordinator and all participants in one partition: No effect on the commit protocol.
● Coordinator and participants in different partitions:
○ Sites not in the coordinator's partition handle it as a coordinator failure.
○ The coordinator and sites in its partition proceed assuming sites in other partitions have failed.
Major Disadvantage of 2PC
● Blocking Problem: Coordinator failure can lead to a transaction being blocked indefinitely, holding resources, until the coordinator recovers.
Recovery and Concurrency Control
Recovery Procedure for Failed Sites
● Standard Recovery: When a failed site restarts, it performs the standard recovery algorithm.
● Special Handling for In-Doubt Transactions:
○ Definition of In-Doubt: Transactions for which a <ready T> log record is found, but neither a <commit T>
nor an <abort T> log record is found.
○ Action: The recovering site must contact other sites (as described in Failure of the Coordinator) to
determine the commit/abort status of these in-doubt transactions.
Problem with Basic Recovery for In-Doubt Transactions
● Blocking Normal Processing: Normal transaction processing at the recovering site cannot begin until all in-
doubt transactions have their status resolved (committed or rolled back).
● Slow Status Determination: Finding the status can be slow, requiring contact with multiple sites.
● Potential for Indefinite Blocking: If the coordinator has failed and no other site has the commit/abort status, the
recovering site can become blocked indefinitely if 2PC is used, making the site unusable for a long period.
Enhanced Recovery with Lock Information
To circumvent the blocking problem and speed up recovery, algorithms provide support for noting lock information in the log:
1. Logging Lock Information:
○ Instead of just <ready T>, the algorithm writes a <ready T, L> log record.
○ L is a list of all write locks held by transaction T when the log record is written.
2. Lock Reacquisition during Recovery:
○ After performing local recovery actions, for every in-doubt transaction T:
■ All write locks noted in its <ready T, L> log record are reacquired.
3. Concurrent Processing:
○ After lock reacquisition is complete for all in-doubt transactions, normal transaction processing can start at the site.
○ The commit or rollback of in-doubt transactions proceeds concurrently with the execution of new transactions.
Benefits of Enhanced Recovery
● Faster Site Recovery: The site becomes usable much quicker.
● Avoids Blocking: Recovery itself never gets blocked, even if the coordinator is down.
Trade-off
● New transactions that have a lock conflict with any write locks held by in-doubt transactions will be blocked until the conflicting in-
doubt transactions are resolved (committed or rolled back).
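A sketch of the restart path under this scheme: scan the log for in-doubt transactions, reacquire the write locks recorded in their <ready T, L> records, and hand them back for later resolution while new transactions run. The log-record shape and the lock_manager interface below are assumptions for illustration only.

    # Sketch of site restart with <ready T, L> records (record shape and lock_manager are assumed).
    # Assumed log-record shape: ("ready", "T1", ["data_item_A"]), ("commit", "T1", None), ...
    def restart_site(log_records, lock_manager):
        in_doubt = []
        for rec in log_records:
            is_resolved = any(r[0] in ("commit", "abort") and r[1] == rec[1] for r in log_records)
            if rec[0] == "ready" and not is_resolved:
                T, locks = rec[1], rec[2]
                for item in locks:
                    lock_manager.acquire_write(item, T)   # conflicting new transactions block on these
                in_doubt.append(T)
        # Normal processing can now resume; in-doubt transactions are resolved concurrently.
        return in_doubt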
