Distributed Database Architecture
https://rizveeredwan.github.io
Homogeneous vs. Heterogeneous Databases
Homogeneous Distributed Databases
● All sites use identical Database Management System (DBMS) software.
● Sites are aware of each other and cooperate to process user requests.
● Local sites cede some autonomy (e.g., changing schemas or DBMS software).
● DBMS software cooperates across sites for transaction processing.
Heterogeneous Distributed Databases
● Different sites may use different schemas and/or DBMS software.
● Sites may not be aware of one another.
● Offer limited cooperation for transaction processing.
● Challenges:
○ Schema differences complicate query processing.
○ Software divergence hinders multi-site transaction processing.
(Key dimensions of difference between sites: schema and DBMS software.)
Data Replication
Definition
● A copy of relation r is stored at two or more sites.
● Full Replication: A copy is stored at every site in the system.
Advantages
1. Availability: If a site containing r fails, r can still be found at another site, allowing queries to continue.
2. Increased Parallelism: For read-heavy operations, multiple sites can process queries involving r in parallel. More
replicas increase the chance of finding data locally, minimizing data movement.
Disadvantages
1. Increased Overhead on Update: All replicas of r must be kept consistent. Any update to r must be propagated to all sites containing replicas, leading to higher overhead.
Summary
● Replication improves read performance and data availability for read-only transactions.
● Update transactions incur higher overhead and more complex concurrency control.
● Primary Copy Scheme: A common simplification where one replica is designated as the primary copy.
Data Fragmentation
Definition
● Relation r is divided into fragments r1, r2, ..., rn.
● Fragments must contain sufficient information to reconstruct the original relation r.
Types of Fragmentation
1. Horizontal Fragmentation
● Splits by Tuple: Assigns each tuple of r to one or more fragments.
● Purpose: Usually used to keep tuples at sites where they are most frequently used, minimizing data
transfer.
● Definition: A fragment ri is defined by a selection predicate Pi on the global relation r:
○ ri = σPi (r)
● Reconstruction: The original relation r is reconstructed by taking the union of all fragments:
○ r = r1 ∪ r2 ∪ ... ∪ rn
● Example: account relation fragmented by branch_name (e.g., account_Hillside, account_Valleyview).
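The fragment-and-reconstruct steps above can be traced in a few lines. The sketch below is plain Python over made-up in-memory account rows (not a real DBMS): it applies a selection predicate per fragment and then rebuilds r as the union of the fragments.

    # Minimal sketch of horizontal fragmentation (assumed in-memory rows, not a real DBMS).
    account = [
        {"account_number": "A-101", "branch_name": "Hillside",   "balance": 500},
        {"account_number": "A-215", "branch_name": "Valleyview", "balance": 700},
        {"account_number": "A-305", "branch_name": "Hillside",   "balance": 350},
    ]

    def sigma(predicate, relation):
        """Relational selection: keep the tuples that satisfy the predicate."""
        return [t for t in relation if predicate(t)]

    # Each fragment ri is defined by a selection predicate Pi on the global relation r.
    account_Hillside   = sigma(lambda t: t["branch_name"] == "Hillside",   account)
    account_Valleyview = sigma(lambda t: t["branch_name"] == "Valleyview", account)

    # Reconstruction: r = r1 ∪ r2 (union of the fragments; here the fragments are disjoint).
    reconstructed = account_Hillside + account_Valleyview
    assert sorted(t["account_number"] for t in reconstructed) == \
           sorted(t["account_number"] for t in account)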
2. Vertical Fragmentation
● Splits by Attribute: Decomposes the schema R of relation r into subsets of attributes R1, R2, ..., Rn.
● Requirement: R = R1 ∪ R2 ∪ ... ∪ Rn
● Definition: Each fragment ri of r is defined by projecting r onto the attribute subset Ri:
○ ri = ΠRi (r)
● Reconstruction: The original relation r is reconstructed by taking the natural join of all fragments:
○ r = r1 ⋈ r2 ⋈ ... ⋈ rn
● Ensuring Reconstruction: Include primary-key attributes (or a superkey/tuple-id) in each Ri.
● Example: employee_info fragmented into employee_private_info (employee_id, salary) and
employee_public_info (employee_id, name, designation).
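A matching sketch for vertical fragmentation, again over hypothetical in-memory rows: each fragment is a projection that keeps the key (employee_id), and the original relation is rebuilt by a natural join on that key.

    # Minimal sketch of vertical fragmentation (assumed in-memory rows).
    employee_info = [
        {"employee_id": 1, "name": "Anik",  "designation": "Engineer", "salary": 90000},
        {"employee_id": 2, "name": "Sadia", "designation": "Manager",  "salary": 120000},
    ]

    def project(relation, attributes):
        """Relational projection: keep only the listed attributes of each tuple."""
        return [{a: t[a] for a in attributes} for t in relation]

    # Each Ri includes the key (employee_id) so the fragments can be rejoined.
    employee_private_info = project(employee_info, ["employee_id", "salary"])
    employee_public_info  = project(employee_info, ["employee_id", "name", "designation"])

    def natural_join_on_key(r1, r2, key):
        """Natural join of two fragments on the shared key attribute."""
        index = {t[key]: t for t in r2}
        return [{**t, **index[t[key]]} for t in r1 if t[key] in index]

    # Reconstruction: r = r1 ⋈ r2.
    reconstructed = natural_join_on_key(employee_private_info, employee_public_info, "employee_id")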
[Figure: horizontal fragmentation vs. vertical fragmentation]
Transparency in Distributed Databases
Concept of Data Transparency
● Users should not need to know the physical location or access methods of data.
● Aims to simplify user interaction and application development.
Forms of Data Transparency
1. Fragmentation Transparency:
○ Users are not required to know how a relation has been fragmented.
○ The system handles the underlying partitioning.
2. Replication Transparency:
○ Users perceive each data object as logically unique.
○ The system manages data replication for performance or availability, without user involvement.
○ Users are unaware of which objects are replicated or where replicas are placed.
3. Location Transparency:
○ Users do not need to know the physical location of the data.
○ The distributed database system can find any data item given its identifier.
Naming in Distributed Databases
● Challenge: Ensuring unique names for data items across multiple sites.
● Solution 1: Central Name Server:
○ All names registered centrally.
○ Pros: Ensures unique names, helps locate data.
○ Cons: Performance bottleneck, single point of failure if server crashes.
● Solution 2: Site Identifier Prefixing:
○ Each site prefixes its unique identifier to names it generates (e.g., site17.account).
○ Pros: No central control needed, ensures uniqueness.
○ Cons: Fails to achieve location transparency (users see site IDs).
● Solution 3: Aliases (Preferred):
○ The database system creates alternative, simple names (aliases) for data items.
○ These aliases are translated by the system to complete names (e.g., account instead of site17.account).
○ The mapping of aliases to complete names is stored locally at each site.
○ Benefits: Achieves location transparency, users unaffected by data movement.
Managing Replicas with Transparency
● Users should not specify a particular replica for read requests.
● The system determines which replica to use for reads.
● On write requests, the system automatically updates all replicas.
● Mechanism: A catalog table is maintained by the system to track all replicas for each data item.
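A minimal sketch of how a system might combine an alias catalog with a replica catalog to provide this transparency. The catalog contents, site names, and the read/write helpers below are illustrative assumptions, not a concrete DBMS API.

    # Hypothetical name and replica catalogs illustrating location and replication transparency.
    alias_catalog = {"account": "site17.account"}             # alias -> complete (global) name
    replica_catalog = {"site17.account": ["S1", "S3", "S5"]}  # global name -> sites holding replicas

    def read(alias):
        global_name = alias_catalog[alias]          # users never see site identifiers
        site = replica_catalog[global_name][0]      # system picks a replica (e.g., nearest or primary)
        return f"read {global_name} at {site}"

    def write(alias, value):
        global_name = alias_catalog[alias]
        # The system, not the user, propagates the update to every replica.
        return [f"write {value} to {global_name} at {site}"
                for site in replica_catalog[global_name]]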
Distributed Query Processing
Goal
● Minimize the time taken to compute the answer to a query.
Centralized Systems: Primary Cost Criterion
● Number of disk accesses.
Distributed Systems: Additional Considerations
In a distributed system, query processing strategies must also account for:
● Cost of Data Transmission: The overhead of transferring data over the network.
● Parallel Processing Gain: The potential performance improvement from having multiple sites process parts of the
query concurrently.
Optimization Challenge
● The relative costs of network data transfer and disk I/O vary significantly.
● Query optimization must find a good trade-off between these two cost factors, rather than focusing solely on one.
Query Transformation in Distributed Databases
Simple Query Example
● Consider a simple query: "Find all the tuples in the account relation."
● Even for simple queries, processing is non-trivial if the account relation is fragmented, replicated, or both.
Impact of Replication and Fragmentation
● Replicated but not fragmented: Choose the replica with the lowest transmission cost.
● Fragmented: More complex, as joins or unions may be needed to reconstruct the account relation.
● Challenge: The number of possible strategies for query optimization can be very large, making exhaustive
enumeration impractical.
Fragmentation Transparency and Query Transformation
● Fragmentation transparency allows users to write queries without knowing the underlying fragmentation.
● Example Query: σbranch_name = "Hillside" (account)
● Name Translation: If account is defined as account1 ∪ account2 (where account1 and account2 are fragments), the query becomes:
○ σbranch_name = "Hillside" (account1 ∪ account2)
● Query Optimization: Using optimization techniques, this expression can be simplified to:
○ σbranch_name = "Hillside" (account1) ∪ σbranch_name = "Hillside" (account2)
● Further Optimization:
○ σbranch_name = "Hillside" (account1): If account1 contains only tuples from the Hillside branch, the selection is redundant, and account1 can be used directly in its place.
○ σbranch_name = "Hillside" (account2): Substituting the definition of account2 (e.g., σbranch_name = "Valleyview" (account)) gives:
■ σbranch_name = "Hillside" (σbranch_name = "Valleyview" (account))
■ This expression is the empty set. If the optimizer knows this, it discards the subexpression.
● Final Strategy: The Hillside site returns account1 as the result of the query.
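The pruning argument above can be phrased as a tiny rewrite rule: push the selection through the union and drop any fragment whose defining predicate contradicts it. The sketch below assumes fragments defined by equality predicates on branch_name.

    # Sketch: push σ_{branch_name = "Hillside"} through account1 ∪ account2 and prune empty branches.
    fragment_predicates = {            # defining predicate of each horizontal fragment (assumed)
        "account1": "Hillside",
        "account2": "Valleyview",
    }

    def transform(query_branch):
        plan = []
        for fragment, frag_branch in fragment_predicates.items():
            if frag_branch == query_branch:
                # The selection is implied by the fragment definition: use the fragment directly.
                plan.append(fragment)
            # Otherwise σ_{b=Hillside}(σ_{b=Valleyview}(account)) is empty: discard this branch.
        return " ∪ ".join(plan) if plan else "∅"

    print(transform("Hillside"))   # -> account1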
Simple Join Processing in Distributed Databases
Key Decision
● Choosing a join strategy is a major decision in distributed query processing.
Example Relational-Algebra Expression
● account ⋈ depositor ⋈ branch
Scenario
● Three relations (account, depositor, branch) are neither replicated nor fragmented.
● account is at Site S1.
● depositor is at Site S2.
● branch is at Site S3.
● The query is issued at Site S1.
● The system needs to produce the result at Site S1.
Possible Strategies for Processing this Query
1. Ship all relations to the query site:
○ Ship copies of all three relations (account, depositor, branch) to Site S1.
○ Compute the entire query locally at Site S1 using centralized query processing techniques.
2. Distributed Join with Intermediate Results:
○ Ship a copy of the account relation (from S1) to Site S2.
○ Compute temp1 = account ⋈ depositor at Site S2.
○ Ship temp1 (from S2) to Site S3.
○ Compute temp2 = temp1 ⋈ branch at Site S3.
○ Ship temp2 (from S3) to Site S1 (the query site) as the final result.
3. Variations of Strategy 2:
○ Devise similar strategies by exchanging the roles of Site S1, S2, and S3. (This implies different orders of
shipping and intermediate join computations).
Factors to Consider for Strategy Selection
● Volume of Data Shipped: The amount of data transferred across the network.
● Cost of Transmitting a Block: The network transmission cost per data block between sites.
● Relative Speed of Processing: The processing capabilities and speed of each site.
Comparison of Strategies (Example Discussion)
● Strategy 1 (Ship all to S1):
○ If indices are present at S1, they are useful.
○ If indices are not present at S1, re-creation of indices (or extra processing) would be needed, which is
expensive.
● Strategy 2 (Distributed Join):
○ Potentially large intermediate relations (e.g., account ⋈ depositor) might need to be shipped.
○ Depending on the size of the intermediate results, this could require more network transmission than Strategy 1.
Conclusion
● No single strategy is always the best.
● The optimal strategy depends on a trade-off between data transmission costs and local processing costs, considering
factors like data volume, network speed, and site processing capabilities.
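A back-of-the-envelope way to compare the two strategies is simply to count shipped blocks. In the sketch below, every relation size, intermediate-result size, and the per-block cost are hypothetical inputs, not measured values.

    # Back-of-the-envelope transmission-cost comparison (all numbers are hypothetical inputs).
    def blocks_shipped_strategy1(depositor_blocks, branch_blocks):
        # Ship depositor (from S2) and branch (from S3) to S1; account is already at S1.
        return depositor_blocks + branch_blocks

    def blocks_shipped_strategy2(account_blocks, temp1_blocks, temp2_blocks):
        # Ship account to S2, temp1 = account ⋈ depositor to S3, temp2 = temp1 ⋈ branch back to S1.
        return account_blocks + temp1_blocks + temp2_blocks

    cost_per_block = 1.0   # relative network cost per block
    s1 = blocks_shipped_strategy1(depositor_blocks=400, branch_blocks=50)
    s2 = blocks_shipped_strategy2(account_blocks=1000, temp1_blocks=1200, temp2_blocks=900)
    print("Strategy 1:", s1 * cost_per_block, "Strategy 2:", s2 * cost_per_block)
    # With a large account ⋈ depositor intermediate, Strategy 2 can ship far more data than Strategy 1.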
Semijoin Strategy
● Goal: Efficiently evaluate r1 ⋈ r2, where r1 is at Site S1 and r2 is at Site S2, with the result needed at S1.
● Problem: Shipping all of r2 to S1 (or r1 to S2) might be costly, especially if many tuples in r2 do not
contribute to the final join result.
● Strategy Steps:
1. Compute Projection: temp1 ← ΠR1∩R2 (r1) at S1. (This projects r1 onto the common attributes used for
joining with r2).
2. Ship Projection: Ship temp1 from S1 to S2. (Only the relevant join attributes from r1 are sent).
3. Compute Semijoin: temp2 ← r2 ⋉ temp1 at S2. (This performs a semijoin: it selects the tuples of r2 that match some tuple of temp1 on the join attributes).
4. Ship Result: Ship temp2 from S2 to S1. (Only the relevant tuples from r2 are sent back).
5. Final Join: Compute r1 ⋈ temp2 at S1. (This computes the final join locally at S1).
● Correctness: This strategy correctly computes r1 ⋈ r2.
● Advantages: Particularly advantageous when relatively few tuples of r2 contribute to the join, reducing
network transmission costs. This often occurs when r1 is the result of a selection operation.
● Disadvantages: Incurs additional cost for shipping temp1 to S2.
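The five steps can be traced directly in code. The sketch below uses small in-memory stand-ins for r1 and r2 and ordinary Python comprehensions for the projection, semijoin, and final join; the tuples and attribute names are made up for illustration.

    # Minimal semijoin sketch for r1 ⋈ r2 with r1 at S1 and r2 at S2 (in-memory stand-ins).
    r1 = [{"branch_name": "Hillside", "assets": 900}]                    # at S1 (e.g., after a selection)
    r2 = [{"branch_name": "Hillside",   "account_number": "A-101"},
          {"branch_name": "Valleyview", "account_number": "A-215"}]      # at S2
    join_attrs = ["branch_name"]                                         # R1 ∩ R2

    # Step 1 (at S1): temp1 = Π_{R1∩R2}(r1)
    temp1 = [{a: t[a] for a in join_attrs} for t in r1]
    # Step 2: ship temp1 to S2 (only the join-attribute values cross the network).
    # Step 3 (at S2): temp2 = r2 ⋉ temp1 — tuples of r2 that match some tuple of temp1.
    keys = {tuple(t[a] for a in join_attrs) for t in temp1}
    temp2 = [t for t in r2 if tuple(t[a] for a in join_attrs) in keys]
    # Step 4: ship temp2 back to S1 (only the matching tuples of r2 are sent).
    # Step 5 (at S1): final join r1 ⋈ temp2.
    result = [{**a, **b} for a in r1 for b in temp2
              if all(a[k] == b[k] for k in join_attrs)]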
Join Strategies that Exploit Parallelism
● Scenario: Consider a join of four relations: r1 ⋈ r2 ⋈ r3 ⋈ r4, where each ri is at a different site Si. The result must be presented at site S1.
● Parallel Evaluation: There are multiple strategies to exploit parallelism.
● Example Strategy:
1. r1 is shipped to S2.
2. r1 ⋈ r2 is computed at S2.
3. r3 is shipped to S4.
4. r3 ⋈ r4 is computed at S4.
5. Site S2 ships tuples of (r1 ⋈ r2) to S1 as they are produced (pipelined join).
6. Site S4 ships tuples of (r3 ⋈ r4) to S1 as they are produced.
7. Once tuples from both (r1 ⋈ r2) and (r3 ⋈ r4) arrive at S1, the computation of the final join (r1 ⋈ r2) ⋈ (r3 ⋈ r4) can begin in parallel with the ongoing computations at S2 and S4.
● Benefit: This strategy allows parts of the join to be computed concurrently at different sites, potentially
speeding up the overall query execution.
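A rough way to picture this strategy is to let threads stand in for sites. The sketch below computes r1 ⋈ r2 and r3 ⋈ r4 concurrently and then combines them at "S1"; unlike the pipelined strategy above, it waits for each partial result to finish rather than streaming tuples, and the relations are tiny made-up examples.

    # Sketch of the parallel strategy with threads as stand-ins for sites (data is hypothetical).
    from concurrent.futures import ThreadPoolExecutor

    def join(left, right, key):
        """Simple hash join on a shared key attribute."""
        index = {}
        for t in right:
            index.setdefault(t[key], []).append(t)
        return [{**a, **b} for a in left for b in index.get(a[key], [])]

    r1 = [{"id": 1, "a": "x"}]; r2 = [{"id": 1, "b": "y"}]
    r3 = [{"id": 1, "c": "z"}]; r4 = [{"id": 1, "d": "w"}]

    with ThreadPoolExecutor() as pool:
        f12 = pool.submit(join, r1, r2, "id")   # "site S2" computes r1 ⋈ r2
        f34 = pool.submit(join, r3, r4, "id")   # "site S4" computes r3 ⋈ r4
        # "Site S1" combines the two partial results once they arrive.
        result = join(f12.result(), f34.result(), "id")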
Commit Protocols in Distributed Databases
Purpose
● To ensure atomicity in distributed transactions.
● All sites where a transaction T executed must agree on the final outcome:
○ T must commit at all sites, OR
○ T must abort at all sites.
Mechanism
● The transaction coordinator of T executes a commit protocol.
Key Commit Protocols
1. Two-Phase Commit Protocol (2PC)
○ Simple and widely used.
○ (Details described in Section 19.4.1 of the original text).
2. Three-Phase Commit Protocol (3PC)
○ An alternative to 2PC.
○ Aims to avoid certain disadvantages of 2PC.
○ Adds to complexity and overhead.
○ (Briefly outlined in Section 19.4.2 of the original text).
Two-Phase Commit Protocol (2PC)
Purpose
● Ensures atomicity for distributed transactions: a transaction either commits at all participating sites or
aborts at all sites.
Participants
● Transaction T: The distributed transaction.
● Site Si: The site where transaction T was initiated.
● Coordinator Ci: The transaction coordinator for T, located at Site Si.
● Participating Sites: All other sites where T executed.
2PC Protocol: Normal Operation
The 2PC protocol begins when all participating sites inform the coordinator Ci that their portion of transaction T has
completed.
Phase 1: Prepare Phase
1. Coordinator (Ci):
○ Writes <prepare T> record to its log.
○ Forces the log to stable storage.
○ Sends a prepare T message to all participating sites.
2. Participating Sites (Transaction Manager):
○ Upon receiving prepare T, determines willingness to commit its portion of T.
○ If unwilling to commit:
■ Writes <no T> record to its log.
■ Sends an abort T message to Ci.
○ If willing to commit:
■ Writes <ready T> record to its log (along with all T's log records).
■ Forces the log to stable storage.
■ Sends a ready T message to Ci.
○ Ready State: Once ready T is sent, the transaction is in the "ready" state at that site, meaning it promises to follow Ci's
commit/abort order. Locks are held until completion.
Phase 2: Commit/Abort Phase
1. Coordinator (Ci):
○ Waits for responses from all sites, or for a timeout.
○ Decision Rule:
■ If ready T messages are received from all participating sites, T can be committed.
■ Otherwise (if any abort T is received, or timeout occurs), T must be aborted. (Unanimity is
required for commit; one abort means global abort).
○ Verdict Recording:
■ Writes <commit T> or <abort T> record to its log based on the decision.
■ Forces the log to stable storage. At this point, the fate of T is sealed.
○ Message Sending: Sends commit T or abort T message to all participating sites.
2. Participating Sites (Transaction Manager):
○ Upon receiving commit T or abort T, records the message in its log.
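The coordinator's side of the two phases fits in a short function. The sketch below is only an outline of the control flow: message transport and the stable log are injected stand-ins, and timeouts, retransmission, and per-site acknowledgements are omitted.

    # Minimal sketch of the coordinator's side of 2PC (transport and log are stand-ins).
    def two_phase_commit(T, participants, send, force_log):
        # Phase 1: prepare.
        force_log(f"<prepare {T}>")
        votes = [send(site, f"prepare {T}") for site in participants]   # each reply: "ready" or "abort"

        # Phase 2: commit only on unanimous "ready"; otherwise abort.
        decision = "commit" if all(v == "ready" for v in votes) else "abort"
        force_log(f"<{decision} {T}>")   # the fate of T is sealed once this record is on stable storage
        for site in participants:
            send(site, f"{decision} {T}")
        return decision

    # Toy usage: every participant votes ready, so the coordinator commits.
    log = []
    decision = two_phase_commit("T1", ["S2", "S3"],
                                send=lambda site, msg: "ready" if msg.startswith("prepare") else "ok",
                                force_log=log.append)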
Key Characteristics
● Unconditional Abort: A participating site can unilaterally abort T at any time before sending ready T.
● Coordinator Unilateral Abort: The coordinator Ci can unilaterally abort T if it's one of the participating
sites and decides to abort.
● Fate Sealed: The transaction's fate is irrevocably decided once the coordinator writes and forces its verdict
(<commit T> or <abort T>) to stable storage.
● Optional Acknowledge: Some 2PC implementations include an optional third step where sites send
acknowledge T to the coordinator after Phase 2. Ci then writes <complete T> to its log upon receiving all
acknowledgements.
Two-Phase Commit (2PC): Handling Failures
1. Failure of a Participating Site
● If Coordinator (Ci) detects a site failure:
○ Before the failed site sent ready T: Ci treats the failure as an abort T vote and aborts T at all participating sites.
○ After the failed site sent ready T: Ci proceeds with the rest of the commit protocol normally, ignoring the failed site. Because the failed site had already cast its ready T vote, Ci still has the unanimous agreement it needs and does not abort; the failed site learns T's fate when it recovers.
● When a Participating Site (Sk) recovers: Sk examines its log for transactions in progress during the failure.
○ Log contains <commit T>: Sk executes redo(T).
○ Log contains <abort T>: Sk executes undo(T).
○ Log contains <ready T>: Sk must consult Ci, since Sk cannot tell on its own whether T should commit or abort.
■ If Ci is up: Ci informs Sk of T's fate (commit/abort); Sk performs redo(T) or undo(T).
■ If Ci is down: Sk sends query-status T to all other sites.
■ If another site has <commit T> or <abort T>, Sk is notified and performs redo(T) or undo(T).
■ If no site has the information, Sk postpones the decision and periodically re-sends query-status until Ci or
another informative site recovers.
○ Log contains no control records for T: Sk failed before sending ready T. Ci must have aborted T. Sk executes undo(T).
(Possible control records for T in Sk's log: <ready T>, <commit T>, <abort T>.)
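The recovery cases for Sk can be collapsed into one decision function. In the sketch below the log is a collection of record strings, and ask_coordinator / ask_other_sites are hypothetical stand-ins that return "commit", "abort", or None when the answer is not yet known.

    # Sketch of a recovering participant's decision logic for one transaction T (helpers are stand-ins).
    def recover_participant(log_records, T, ask_coordinator, ask_other_sites):
        if f"<commit {T}>" in log_records:
            return "redo(T)"
        if f"<abort {T}>" in log_records:
            return "undo(T)"
        if f"<ready {T}>" in log_records:
            # In-doubt: consult the coordinator first, then the other sites.
            fate = ask_coordinator(T) or ask_other_sites(T)
            if fate == "commit":
                return "redo(T)"
            if fate == "abort":
                return "undo(T)"
            return "postpone; re-send query-status T periodically"
        # No control record: Sk failed before voting ready, so Ci must have aborted T.
        return "undo(T)"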
2. Failure of the Coordinator (Ci)
● Participating sites must decide T's fate. In some cases, they must wait for Ci's recovery.
● Decision Rules for Active Sites:
○ If any active site has <commit T>: T must be committed.
○ If any active site has <abort T>: T must be aborted.
○ If some active site does not have <ready T>: Ci cannot have decided to commit T. It's preferable to abort T rather than wait for Ci to
recover.
○ If all active sites have <ready T> but no <commit T> or <abort T>:
■ The fate of T is undetermined.
■ Active sites must wait for Ci to recover.
■ Blocking Problem: T continues to hold system resources (e.g., locks), making data unavailable on active sites, potentially for a
long time. This is a major disadvantage of 2PC.
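These decision rules can be expressed as a small function over the control records visible at the active sites. This is only a sketch of the decision logic, not of the surrounding messaging; each entry in site_logs stands for one active site's set of control records for T.

    # Sketch of the rule active sites apply when the coordinator has failed.
    def decide_without_coordinator(site_logs, T):
        if any(f"<commit {T}>" in log for log in site_logs):
            return "commit"
        if any(f"<abort {T}>" in log for log in site_logs):
            return "abort"
        if any(f"<ready {T}>" not in log for log in site_logs):
            # Ci cannot have decided to commit yet, so aborting is safe.
            return "abort"
        # Every active site is ready but none knows the verdict: block until Ci recovers.
        return "wait for coordinator (blocking)"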
3. Network Partition (the network splits into partitions: sites within the same partition can communicate, but sites in different partitions cannot)
● Coordinator and all participants in one partition: No effect on the commit protocol.
● Coordinator and participants in different partitions:
○ Sites not in the coordinator's partition handle it as a coordinator failure.
○ The coordinator and sites in its partition proceed assuming sites in other partitions have failed.
Major Disadvantage of 2PC
● Blocking Problem: Coordinator failure can lead to a transaction being blocked indefinitely, holding resources, until the coordinator recovers.
Recovery and Concurrency Control
Recovery Procedure for Failed Sites
● Standard Recovery: When a failed site restarts, it performs the standard recovery algorithm.
● Special Handling for In-Doubt Transactions:
○ Definition of In-Doubt: Transactions for which a <ready T> log record is found, but neither a <commit T>
nor an <abort T> log record is found.
○ Action: The recovering site must contact other sites (as described in Failure of the Coordinator) to
determine the commit/abort status of these in-doubt transactions.
Problem with Basic Recovery for In-Doubt Transactions
● Blocking Normal Processing: Normal transaction processing at the recovering site cannot begin until all in-
doubt transactions have their status resolved (committed or rolled back).
● Slow Status Determination: Finding the status can be slow, requiring contact with multiple sites.
● Potential for Indefinite Blocking: If the coordinator has failed and no other site has the commit/abort status, the
recovering site can become blocked indefinitely if 2PC is used, making the site unusable for a long period.
Enhanced Recovery with Lock Information
To circumvent the blocking problem and speed up recovery, algorithms provide support for noting lock information in the log:
1. Logging Lock Information:
○ Instead of just <ready T>, the algorithm writes a <ready T, L> log record.
○ L is a list of all write locks held by transaction T when the log record is written.
2. Lock Reacquisition during Recovery:
○ After performing local recovery actions, for every in-doubt transaction T:
■ All write locks noted in its <ready T, L> log record are reacquired.
3. Concurrent Processing:
○ After lock reacquisition is complete for all in-doubt transactions, normal transaction processing can start at the site.
○ The commit or rollback of in-doubt transactions proceeds concurrently with the execution of new transactions.
Benefits of Enhanced Recovery
● Faster Site Recovery: The site becomes usable much quicker.
● Avoids Blocking: Recovery itself never gets blocked, even if the coordinator is down.
Trade-off
● New transactions that have a lock conflict with any write locks held by in-doubt transactions will be blocked until the conflicting in-
doubt transactions are resolved (committed or rolled back).
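A sketch of the restart path under this scheme: scan the log for in-doubt transactions, reacquire the write locks recorded in their <ready T, L> records, and hand them back for later resolution while new transactions run. The log-record shape and the lock_manager interface below are assumptions for illustration only.

    # Sketch of site restart with <ready T, L> records (record shape and lock_manager are assumed).
    # Assumed log-record shape: ("ready", "T1", ["data_item_A"]), ("commit", "T1", None), ...
    def restart_site(log_records, lock_manager):
        in_doubt = []
        for rec in log_records:
            is_resolved = any(r[0] in ("commit", "abort") and r[1] == rec[1] for r in log_records)
            if rec[0] == "ready" and not is_resolved:
                T, locks = rec[1], rec[2]
                for item in locks:
                    lock_manager.acquire_write(item, T)   # conflicting new transactions block on these
                in_doubt.append(T)
        # Normal processing can now resume; in-doubt transactions are resolved concurrently.
        return in_doubt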
