
  1. Pachaiyappa's College for Men, Kanchipuram
     Department of Computer Science
     M.Sc. Computer Science & Technology
     Distributed Database

     Outline
     Introduction
     « What is a distributed DBMS
     « Problems
     « Current state of affairs
     Distributed Database Design
     « Fragmentation
     « Data Location
     Distributed Query Processing
     « Query Processing Methodology
     « Distributed Query Optimization
     Distributed Transaction Management
     « Transaction Concepts and Models
     « Distributed Concurrency Control
     « Distributed Reliability

     What is distributed?
     Processing logic, functions, data, and control.

     What is a Distributed Database System?
     A distributed database (DDB) is a collection of multiple, logically interrelated databases distributed over a computer network.
     A distributed database management system (D-DBMS) is the software that manages the DDB and provides an access mechanism that makes this distribution transparent to the users.
     Distributed database system (DDBS) = DDB + D-DBMS

     What is not a DDBS?
     « A timesharing computer system
     « A loosely or tightly coupled multiprocessor system
     « A database system which resides at one of the nodes of a network of computers - this is a centralized database on a network node

     Applications
     « Manufacturing, especially multi-plant manufacturing
     « Military command and control
     « Airlines
     « Hotel chains
     « Any organization with a decentralized organization structure
  2. Distributed DBMS Promises
     « Transparent management of distributed, fragmented, and replicated data
     « Improved reliability/availability through distributed transactions
     « Improved performance
     « Easier and more economical system expansion

     Transparency
     Transparency is the separation of the higher-level semantics of a system from the lower-level implementation issues.
     The fundamental issue is to provide data independence in the distributed environment:
     « Network (distribution) transparency
     « Replication transparency
     « Fragmentation transparency
       horizontal fragmentation: selection
       vertical fragmentation: projection
       hybrid

     Potentially Improved Performance
     Proximity of data to its points of use
     « Requires some support for fragmentation and replication
     Parallelism in execution
     « Inter-query parallelism
     « Intra-query parallelism

     Distributed DBMS Issues
     Distributed Database Design
     « how to distribute the database
     « replicated & non-replicated database distribution
     « a related problem in directory management
     Query Processing
     « convert user transactions to data manipulation instructions
     « optimization problem: min{cost = data transmission + local processing}
     « general formulation is NP-hard
     Concurrency Control
     « synchronization of concurrent accesses
     « consistency and isolation of transactions' effects
     « deadlock management
     Reliability
     « how to make the system resilient to failures
     « atomicity and durability
  3. Operating System Support
     « operating system with proper support for database operations
     « dichotomy between general-purpose processing requirements and database processing requirements
     Open Systems and Interoperability
     « distributed multidatabase systems
     « more probable scenario
     « parallel issues

     Design Problem
     In the general setting: making decisions about the placement of data and programs across the sites of a computer network, as well as possibly designing the network itself.
     In a distributed DBMS, the placement of applications entails
     « placement of the distributed DBMS software; and
     « placement of the applications that run on the database

     Distribution Design
     Top-down
     « mostly in designing systems from scratch
     « mostly in homogeneous systems
     Bottom-up
     « when the databases already exist at a number of sites

     Distribution Design Issues
     Why fragment at all? How to fragment? How much to fragment? How to test correctness? How to allocate? What are the information requirements?

     Fragmentation
     Can't we just distribute relations? What is a reasonable unit of distribution?
     « whole relations
       + views are subsets of relations, so locality of access improves
       - extra communication when applications at different sites need the same relation
     « fragments of relations (sub-relations)
       + concurrent execution of a number of transactions that access different portions of a relation
       - views that cannot be defined on a single fragment will require extra processing
       - semantic data control (especially integrity enforcement) is more difficult
  4. Correctness of Fragmentation
     Completeness
     « Decomposition of relation R into fragments R1, R2, ..., Rn is complete if and only if each data item in R can also be found in some Ri.
     Reconstruction
     « If relation R is decomposed into fragments R1, R2, ..., Rn, then there should exist some relational operator ∇ such that R = ∇1≤i≤n Ri.
     Disjointness
     « If relation R is decomposed into fragments R1, R2, ..., Rn, and data item di is in Rj, then di should not be in any other fragment Rk (k ≠ j).

     Allocation Alternatives
     Non-replicated
     « partitioned: each fragment resides at only one site
     Replicated
     « fully replicated: each fragment at each site
     « partially replicated: each fragment at some of the sites

     Information Requirements
     Four categories: database information, application information, communication network information, computer system information.

     Fragmentation
     Horizontal Fragmentation (HF)
     « Primary Horizontal Fragmentation (PHF)
     « Derived Horizontal Fragmentation (DHF)
     Vertical Fragmentation (VF)
     Hybrid Fragmentation (HF)

     Primary Horizontal Fragmentation
     Definition: Rj = σFj(R), 1 ≤ j ≤ w
     where Fj is a selection formula, which is (preferably) a minterm predicate.
     Therefore, a horizontal fragment Ri of relation R consists of all the tuples of R which satisfy a minterm predicate mi.
     ⇒ Given a set of minterm predicates M, there are as many horizontal fragments of relation R as there are minterm predicates.
     The set of horizontal fragments is also referred to as minterm fragments.
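The three correctness rules above can be checked mechanically on a toy horizontal fragmentation. A minimal Python sketch (the relation contents and the SAL-based predicates are invented for illustration; for horizontal fragments the reconstruction operator ∇ is set union):

```python
# Toy check of completeness, reconstruction, and disjointness for a
# horizontal fragmentation of a relation with schema (ENO, SAL).

R = {(1, 25000), (2, 31000), (3, 40000)}      # relation as a set of tuples
R1 = {t for t in R if t[1] <= 30000}          # fragment: SAL <= 30000
R2 = {t for t in R if t[1] > 30000}           # fragment: SAL > 30000
fragments = [R1, R2]

# Completeness: every data item of R appears in some fragment Ri
complete = all(any(t in f for f in fragments) for t in R)

# Reconstruction: for horizontal fragments, R = union of all Ri
reconstructed = set().union(*fragments) == R

# Disjointness: no tuple appears in two different fragments
disjoint = all(f.isdisjoint(g) for i, f in enumerate(fragments)
               for g in fragments[i + 1:])

print(complete, reconstructed, disjoint)      # True True True
```

Because the minterm predicates SAL ≤ 30000 and SAL > 30000 are mutually exclusive and exhaustive, all three properties hold by construction here.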
  5. PHF – Algorithm
     Given: a relation R and the set of simple predicates Pr
     Output: the set of fragments of R = {R1, R2, ..., Rw} which obey the fragmentation rules.
     Preliminaries:
     « Pr should be complete
     « Pr should be minimal

     PHF – Example
     Two candidate relations: PAY and PROJ.
     Fragmentation of relation PAY
     « Application: check the salary info and determine raise.
     « Employee records kept at two sites ⇒ application run at two sites
     « Simple predicates
       p1: SAL ≤ 30000
       p2: SAL > 30000
       Pr = {p1, p2}, which is complete and minimal; Pr' = Pr
     « Minterm predicates
       m1: (SAL ≤ 30000)
       m2: NOT(SAL ≤ 30000) = (SAL > 30000)

     Derived Horizontal Fragmentation
     Defined on a member relation of a link according to a selection operation specified on its owner.
     « Each link is an equijoin.
     « An equijoin can be implemented by means of semijoins.

     DHF – Definition
     Given a link L where owner(L) = S and member(L) = R, the derived horizontal fragments of R are defined as
     Ri = R ⋉ Si, 1 ≤ i ≤ w
     where w is the maximum number of fragments that will be defined on R and
     Si = σFi(S)
     where Fi is the formula according to which the primary horizontal fragment Si is defined.

     DHF – Example
     Given link L1 where owner(L1) = SKILL and member(L1) = EMP:
     EMP1 = EMP ⋉ SKILL1
     EMP2 = EMP ⋉ SKILL2
     where
     SKILL1 = σSAL≤30000(SKILL)
     SKILL2 = σSAL>30000(SKILL)
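The PHF and DHF steps above can be sketched in Python on toy data. The SKILL and EMP contents and the join attribute name (TITLE) are invented for illustration; only the predicates SAL ≤ 30000 / SAL > 30000 come from the slide:

```python
# PHF of the owner relation SKILL, then DHF of the member relation EMP
# via semijoin (R semijoin S keeps the member tuples that join with S).

SKILL = [{"title": "Elect. Eng.", "sal": 40000},
         {"title": "Programmer", "sal": 24000}]
EMP = [{"eno": "E1", "title": "Elect. Eng."},
       {"eno": "E2", "title": "Programmer"}]

# Primary horizontal fragments of SKILL on the two minterm predicates
SKILL1 = [s for s in SKILL if s["sal"] <= 30000]
SKILL2 = [s for s in SKILL if s["sal"] > 30000]

def semijoin(member, owner_frag, attr):
    """Member tuples whose join attribute matches some owner-fragment tuple."""
    keys = {o[attr] for o in owner_frag}
    return [m for m in member if m[attr] in keys]

# Derived horizontal fragments of EMP, one per owner fragment
EMP1 = semijoin(EMP, SKILL1, "title")
EMP2 = semijoin(EMP, SKILL2, "title")
print([e["eno"] for e in EMP1], [e["eno"] for e in EMP2])  # ['E2'] ['E1']
```

Each EMP tuple lands in exactly one derived fragment because the owner fragments are disjoint and complete, which is what makes the equijoin implementable by semijoins.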
  6. Vertical Fragmentation
     Has been studied within the centralized context
     « design methodology
     « physical clustering
     More difficult than horizontal fragmentation, because more alternatives exist.
     Two approaches:
     « grouping attributes into fragments ⇒ overlapping fragments
     « splitting a relation into fragments ⇒ non-overlapping fragments
     We do not consider the replicated key attributes to be overlapping.
     Advantage: easier to enforce functional dependencies (for integrity checking, etc.)

     Query Processing Components
     Query language that is used
     ➠ SQL: "intergalactic dataspeak"
     Query execution methodology
     ➠ the steps that one goes through in executing high-level (declarative) user queries
     Query optimization
     ➠ how do we determine the "best" execution plan?

     Selecting Alternatives
     SELECT ENAME
     FROM EMP, ASG
     WHERE EMP.ENO = ASG.ENO
     AND DUR > 37

     Strategy 1: ΠENAME(σDUR>37 ∧ EMP.ENO=ASG.ENO(EMP × ASG))
     Strategy 2: ΠENAME(EMP ⋈ENO σDUR>37(ASG))
     Strategy 2 avoids the Cartesian product, so it is "better".
  7. Cost of Alternatives
     Assume:
     ➠ size(EMP) = 400, size(ASG) = 1000
     ➠ tuple access cost = 1 unit; tuple transfer cost = 10 units

     Strategy 1
     produce ASG': (10 + 10) ∗ tuple access cost = 20
     transfer ASG' to the sites of EMP: (10 + 10) ∗ tuple transfer cost = 200
     produce EMP': (10 + 10) ∗ tuple access cost ∗ 2 = 40
     transfer EMP' to result site: (10 + 10) ∗ tuple transfer cost = 200
     Total cost: 460

     Strategy 2
     transfer EMP to site 5: 400 ∗ tuple transfer cost = 4,000
     transfer ASG to site 5: 1000 ∗ tuple transfer cost = 10,000
     produce ASG': 1000 ∗ tuple access cost = 1,000
     join EMP and ASG': 400 ∗ 20 ∗ tuple access cost = 8,000
     Total cost: 23,000

     Query Optimization Objectives
     Minimize a cost function: I/O cost + CPU cost + communication cost
     These might have different weights in different distributed environments.
     Wide area networks
     ➠ communication cost will dominate
       low bandwidth, low speed, high protocol overhead
     ➠ most algorithms ignore all other cost components
     Local area networks
     ➠ communication cost not that dominant
     ➠ total cost function should be considered
     Can also maximize throughput.

     Query Optimization Issues – Types of Optimizers
     Exhaustive search
     ➠ cost-based
     ➠ optimal
     ➠ combinatorial complexity in the number of relations
     Heuristics
     ➠ not optimal
     ➠ regroup common sub-expressions
     ➠ perform selection and projection first
     ➠ replace a join by a series of semijoins
     ➠ reorder operations to reduce intermediate relation size
     ➠ optimize individual operations
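The cost arithmetic above is easy to reproduce. A short Python check of the two totals, using only the figures given on the slide:

```python
# Reproduce the slide's cost comparison: tuple access cost = 1 unit,
# tuple transfer cost = 10 units, size(EMP) = 400, size(ASG) = 1000.

access, transfer = 1, 10

# Strategy 1: move only small intermediate results (20 tuples total)
s1 = ((10 + 10) * access        # produce ASG'
      + (10 + 10) * transfer    # transfer ASG' to the sites of EMP
      + (10 + 10) * access * 2  # produce EMP'
      + (10 + 10) * transfer)   # transfer EMP' to the result site

# Strategy 2: ship both whole relations to one site and join there
s2 = (400 * transfer            # transfer EMP to site 5
      + 1000 * transfer         # transfer ASG to site 5
      + 1000 * access           # produce ASG'
      + 400 * 20 * access)      # join EMP and ASG'

print(s1, s2)  # 460 23000
```

The fifty-fold gap comes almost entirely from transfer cost, which is why WAN optimizers focus on minimizing data movement.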
  8. Query Optimization Issues – Optimization Granularity
     Single query at a time
     ➠ cannot use common intermediate results
     Multiple queries at a time
     ➠ efficient if many similar queries
     ➠ decision space is much larger

     Query Optimization Issues – Optimization Timing
     Static
     ➠ compilation ⇒ optimize prior to execution
     ➠ difficult to estimate the size of the intermediate results ⇒ error propagation
     ➠ can amortize over many executions
     ➠ R*
     Dynamic
     ➠ run-time optimization
     ➠ exact information on the intermediate relation sizes
     ➠ have to reoptimize for multiple executions
     ➠ Distributed INGRES
     Hybrid
     ➠ compile using a static algorithm
     ➠ if the error in estimated sizes > threshold, reoptimize at run time
     ➠ MERMAID

     Query Optimization Issues – Statistics
     Relation
     ➠ cardinality
     ➠ size of a tuple
     ➠ fraction of tuples participating in a join with another relation
     Attribute
     ➠ cardinality of domain
     ➠ actual number of distinct values
     Common assumptions
     ➠ independence between different attribute values
     ➠ uniform distribution of attribute values within their domain

     Query Optimization Issues – Decision Sites
     Centralized
     ➠ a single site determines the "best" schedule
     ➠ simple
     ➠ needs knowledge about the entire distributed database
  9. Distributed
     ➠ cooperation among sites to determine the schedule
     ➠ needs only local information
     ➠ cost of cooperation
     Hybrid
     ➠ one site determines the global schedule
     ➠ each site optimizes the local subqueries

     Query Optimization Issues – Network Topology
     Wide area networks (WAN) – point-to-point
     ➠ characteristics: low bandwidth, low speed, high protocol overhead
     ➠ communication cost will dominate; ignore all other cost factors
     ➠ global schedule to minimize communication cost
     ➠ local schedules according to centralized query optimization
     Local area networks (LAN)
     ➠ communication cost not that dominant
     ➠ total cost function should be considered
     ➠ broadcasting can be exploited (joins)
     ➠ special algorithms exist for star networks

     Cost-Based Optimization
     Solution space
     ➠ the set of equivalent algebra expressions (query trees)
     Cost function (in terms of time)
     ➠ I/O cost + CPU cost + communication cost
     ➠ these might have different weights in different distributed environments (LAN vs. WAN)
     ➠ can also maximize throughput
     Search algorithm
     ➠ how do we move inside the solution space?
     ➠ exhaustive search, heuristic algorithms (iterative improvement, simulated annealing, genetic, ...)

     Join Ordering in Fragment Queries
     Ordering joins
     ➠ Distributed INGRES
     ➠ System R*
     Semijoin ordering
     ➠ SDD-1
  10. Join Ordering
     Consider two relations only:
     if size(R) < size(S), ship R to the site of S (R → S);
     if size(R) > size(S), ship S to the site of R (S → R).
     Multiple relations are more difficult because there are too many alternatives.
     ➠ Compute the cost of all alternatives and select the best one.
       This requires computing the size of intermediate relations, which is difficult.
     ➠ Use heuristics.

     Join Ordering – Example
     Execution alternatives:
     1. EMP → Site 2; Site 2 computes EMP' = EMP ⋈ ASG; EMP' → Site 3; Site 3 computes EMP' ⋈ PROJ
     2. ASG → Site 1; Site 1 computes EMP' = EMP ⋈ ASG; EMP' → Site 3; Site 3 computes EMP' ⋈ PROJ
     3. ASG → Site 3; Site 3 computes ASG' = ASG ⋈ PROJ; ASG' → Site 1; Site 1 computes ASG' ⋈ EMP
     4. PROJ → Site 2; Site 2 computes PROJ' = PROJ ⋈ ASG; PROJ' → Site 1; Site 1 computes PROJ' ⋈ EMP
     5. EMP → Site 2 and PROJ → Site 2; Site 2 computes EMP ⋈ PROJ ⋈ ASG

     Semijoin Algorithms
     Consider the join of two relations:
     ➠ R[A] (located at site 1)
     ➠ S[A] (located at site 2)
     Alternatives:
     1. Do the join R ⋈A S
     2. Perform one of the semijoin equivalents
     3. Perform the join

     SDD-1 Algorithm
     Based on the Hill Climbing Algorithm
     ➠ semijoins
     ➠ no replication
     ➠ no fragmentation
     ➠ the cost of transferring the result from the final result site to the user site is not considered
     ➠ can minimize either total time or response time
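The semijoin equivalent of alternative 2 can be sketched concretely: ship only the join-attribute projection of S to site 1, reduce R to R ⋉ S there, then ship the reduced R to site 2 and complete the join. The data below is invented for illustration:

```python
# Semijoin strategy for R (at site 1) join S (at site 2) on attribute A.

R = [(1, "r1"), (2, "r2"), (3, "r3"), (4, "r4")]  # at site 1, schema (A, ...)
S = [(2, "s2"), (4, "s4")]                        # at site 2, schema (A, ...)

proj_A_S = {a for a, _ in S}                      # ship Pi_A(S) to site 1
R_reduced = [t for t in R if t[0] in proj_A_S]    # compute R semijoin S at site 1

# Ship R_reduced to site 2 and finish the join there
result = [(a, r, s) for a, r in R_reduced for a2, s in S if a == a2]
print(R_reduced, result)
```

The semijoin pays off when size(Π_A(S)) + size(R ⋉ S) is smaller than size(R), i.e. when S is selective on the join attribute, as here (only 2 of R's 4 tuples survive).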
  11. Hill Climbing Algorithm
     Assume the join is between three relations.
     Step 1: Do initial processing.
     Step 2: Select the initial feasible solution (ES0):
       2.1 Determine the candidate result sites – sites where a relation referenced in the query exists.
       2.2 Compute the cost of transferring all the other referenced relations to each candidate site.
       2.3 ES0 = candidate site with minimum cost.
     Step 3: Determine candidate splits of ES0 into {ES1, ES2}:
       3.1 ES1 consists of sending one of the relations to the other relation's site.
       3.2 ES2 consists of sending the join of the relations to the final result site.
     Step 4: Replace ES0 with the split schedule which gives
       cost(ES1) + cost(local join) + cost(ES2) < cost(ES0).
     Step 5: Recursively apply Steps 3–4 on ES1 and ES2 until no such plans can be found.
     Step 6: Check for redundant transmissions in the final plan and eliminate them.

     SDD-1 Algorithm
     Initialization
     Step 1: In the execution strategy (call it ES), include all the local processing.
     Step 2: Reflect the effects of local processing on the database profile.
     Step 3: Construct a set of beneficial semijoin operations (BS) as follows:
       BS = Ø
       for each semijoin SJi: BS ← BS ∪ {SJi} if cost(SJi) < benefit(SJi)

     SDD-1 Algorithm – Example
     Consider the following query:
     SELECT R3.C
     FROM R1, R2, R3
     WHERE R1.A = R2.A
     AND R2.B = R3.B

     Iterative Process
     Step 4: Remove the most beneficial SJi from BS and append it to ES.
     Step 5: Modify the database profile accordingly.
     Step 6: Modify BS appropriately:
       ➠ compute new benefit/cost values
       ➠ check if any new semijoins need to be included in BS
     Step 7: If BS ≠ Ø, go back to Step 4.

     Assembly Site Selection
     Step 8: Find the site where the largest amount of data resides and select it as the assembly site.
     Example: amount of data stored at the sites:
       Site 1: 360
       Site 2: 360
       Site 3: 2000
     Therefore, Site 3 will be chosen as the assembly site.
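Step 3 of SDD-1 (building the beneficial-semijoin set BS) and Step 8 (assembly site selection) are simple enough to sketch. The per-semijoin cost/benefit numbers and site sizes below are invented for illustration, not taken from the slide's example:

```python
# SDD-1 Step 3: keep only semijoins whose cost is less than their benefit.
# cost    = transferring the join-attribute projection (assumed figure)
# benefit = data the semijoin would eliminate before shipping (assumed figure)

semijoins = {
    "SJ1": {"cost": 36, "benefit": 900},
    "SJ2": {"cost": 80, "benefit": 50},
    "SJ3": {"cost": 60, "benefit": 0},
}
BS = {name for name, s in semijoins.items() if s["cost"] < s["benefit"]}

# Step 8: the assembly site is the site holding the most data,
# using the slide's site sizes (360, 360, 2000).
data_at_site = {"Site 1": 360, "Site 2": 360, "Site 3": 2000}
assembly = max(data_at_site, key=data_at_site.get)

print(sorted(BS), assembly)  # ['SJ1'] Site 3
```

Only SJ1 enters BS; the iterative phase would then repeatedly remove the most beneficial member of BS, update the profile, and recompute until BS is empty.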
  12. Postprocessing
     Step 9: For each Ri at the assembly site, find the semijoins of the type Ri ⋉ Rj where the total cost of ES without this semijoin is smaller than the cost with it, and remove the semijoin from ES.
     Note: there might be indirect benefits.
     ➠ Example: no semijoins are removed.
     Step 10: Permute the order of semijoins if doing so would improve the total cost of ES.
     ➠ Example: final strategy:
       Send (R2 ⋉ R1) ⋉ R3 to Site 3
       Send R1 ⋉ R2 to Site 3

     Distributed Query Optimization Problems
     Cost model
     ➠ multiple query optimization
     ➠ heuristics to cut down on alternatives
     Larger set of queries
     ➠ optimization only on select-project-join queries
     ➠ also need to handle complex queries (e.g., unions, disjunctions, aggregations, and sorting)
     Optimization cost vs. execution cost tradeoff
     ➠ heuristics to cut down on alternatives
     ➠ controllable search strategies
     Optimization/reoptimization interval
     ➠ extent of changes in the database profile before reoptimization is necessary

     Transaction
     A transaction is a collection of actions that make consistent transformations of system states while preserving system consistency.
     « concurrency transparency
     « failure transparency

     Transaction Example – A Simple SQL Query
     Transaction BUDGET_UPDATE
     begin
       EXEC SQL UPDATE PROJ
         SET BUDGET = BUDGET ∗ 1.1
         WHERE PNAME = "CAD/CAM"
     end.

     Example Database
     Consider an airline reservation example with the relations:
     FLIGHT(FNO, DATE, SRC, DEST, STSOLD, CAP)
     CUST(CNAME, ADDR, BAL)
     FC(FNO, DATE, CNAME, SPECIAL)
  13. Example Transaction – SQL Version
     Begin_transaction Reservation
     begin
       input(flight_no, date, customer_name);
       EXEC SQL UPDATE FLIGHT
         SET STSOLD = STSOLD + 1
         WHERE FNO = flight_no AND DATE = date;
       EXEC SQL INSERT INTO FC(FNO, DATE, CNAME, SPECIAL)
         VALUES (flight_no, date, customer_name, null);
       output("reservation completed")
     end. {Reservation}

     Termination of Transactions
     Begin_transaction Reservation
     begin
       input(flight_no, date, customer_name);
       EXEC SQL SELECT STSOLD, CAP
         INTO temp1, temp2
         FROM FLIGHT
         WHERE FNO = flight_no AND DATE = date;
       if temp1 = temp2 then
         output("no free seats");
         Abort
       else
         EXEC SQL UPDATE FLIGHT
           SET STSOLD = STSOLD + 1
           WHERE FNO = flight_no AND DATE = date;
         EXEC SQL INSERT INTO FC(FNO, DATE, CNAME, SPECIAL)
           VALUES (flight_no, date, customer_name, null);
         Commit;
         output("reservation completed")
       endif
     end. {Reservation}

     Example Transaction – Reads & Writes
     Begin_transaction Reservation
     begin
       input(flight_no, date, customer_name);
       temp ← Read(flight(date).stsold);
       if temp = flight(date).cap then
       begin
         output("no free seats");
         Abort
       end
       else begin
         Write(flight(date).stsold, temp + 1);
  14.  Write(flight(date).cname, customer_name);
         Write(flight(date).special, null);
         Commit;
         output("reservation completed")
       end
     end. {Reservation}

     Properties of Transactions
     ATOMICITY
     « all or nothing
     CONSISTENCY
     « no violation of integrity constraints
     ISOLATION
     « concurrent changes invisible ⇒ serializable
     DURABILITY
     « committed updates persist

     Atomicity
     Either all or none of the transaction's operations are performed.
     Atomicity requires that if a transaction is interrupted by a failure, its partial results must be undone.
     The activity of preserving the transaction's atomicity in the presence of transaction aborts due to input errors, system overloads, or deadlocks is called transaction recovery.
     The activity of ensuring atomicity in the presence of system crashes is called crash recovery.

     Consistency
     Internal consistency
     « A transaction which executes alone against a consistent database leaves it in a consistent state.
     « Transactions do not violate database integrity constraints.
     Transactions are correct programs.

     Consistency Degrees
     Degree 0
     « Transaction T does not overwrite dirty data of other transactions.
     « Dirty data refers to data values that have been updated by a transaction prior to its commitment.
     Degree 1
     « T does not overwrite dirty data of other transactions.
     « T does not commit any writes before EOT.
     Degree 2
     « T does not overwrite dirty data of other transactions.
     « T does not commit any writes before EOT.
     « T does not read dirty data from other transactions.
  15. Degree 3
     « T does not overwrite dirty data of other transactions.
     « T does not commit any writes before EOT.
     « T does not read dirty data from other transactions.
     « Other transactions do not dirty any data read by T before T completes.

     Isolation
     Serializability
     « If several transactions are executed concurrently, the results must be the same as if they were executed serially in some order.
     Incomplete results
     « An incomplete transaction cannot reveal its results to other transactions before its commitment.
     « Necessary to avoid cascading aborts.

     Durability
     Once a transaction commits, the system must guarantee that the results of its operations will never be lost, in spite of subsequent failures.
     Database recovery

     Characterization of Transactions
     Based on
     « Application areas
       non-distributed vs. distributed
       compensating transactions
       heterogeneous transactions
     « Timing
       on-line (short-life) vs. batch (long-life)
     « Organization of read and write actions
       two-step
       restricted
       action model
     « Structure
       flat (or simple) transactions
       nested transactions
       workflows

     Transaction Structure
     Flat transaction
     « Consists of a sequence of primitive operations embraced between begin and end markers.
       Begin_transaction Reservation
       ...
       end.
  16. Nested transaction
     « The operations of a transaction may themselves be transactions.
       Begin_transaction Reservation
         ...
         Begin_transaction Airline
           ...
         end. {Airline}
         Begin_transaction Hotel
           ...
         end. {Hotel}
       end. {Reservation}

     Transaction Processing Issues
     Transaction structure (usually called transaction model)
     « flat (simple), nested
     Internal database consistency
     « semantic data control (integrity enforcement) algorithms
     Reliability protocols
     « atomicity & durability
     « local recovery protocols
     « global commit protocols
     Concurrency control algorithms
     « how to synchronize concurrent transaction executions (correctness criterion)
     « intra-transaction consistency, isolation
     Replica control protocols
     « how to control the mutual consistency of replicated data
     « one-copy equivalence and ROWA

     Concurrency Control
     The problem of synchronizing concurrent transactions such that the consistency of the database is maintained while, at the same time, the maximum degree of concurrency is achieved.
     Anomalies:
     « Lost updates
       The effects of some transactions are not reflected in the database.
     « Inconsistent retrievals
       A transaction, if it reads the same data item more than once, should always read the same value.
  17. Serializability in Distributed DBMS
     Somewhat more involved: two kinds of histories have to be considered:
     « local histories
     « global history
     For global transactions (i.e., the global history) to be serializable, two conditions are necessary:
     « Each local history should be serializable.
     « Two conflicting operations should be in the same relative order in all of the local histories where they appear together.

     Concurrency Control Algorithms
     Pessimistic
     « Two-Phase Locking-based (2PL)
       Centralized (primary site) 2PL
       Primary copy 2PL
       Distributed 2PL
     « Timestamp Ordering (TO)
       Basic TO
       Multiversion TO
       Conservative TO
     « Hybrid
     Optimistic
     « Locking-based
     « Timestamp ordering-based

     Two-Phase Locking (2PL)
     A transaction locks an object before using it.
     When an object is locked by another transaction, the requesting transaction must wait.
     When a transaction releases a lock, it may not request another lock.

     Centralized 2PL
     There is only one 2PL scheduler in the distributed system.
     Lock requests are issued to the central scheduler.

     Distributed 2PL
     2PL schedulers are placed at each site. Each scheduler handles lock requests for data at that site.
     A transaction may read any of the replicated copies of item x by obtaining a read lock on one of the copies of x. Writing into x requires obtaining write locks on all copies of x.

     Timestamp Ordering
     Transaction Ti is assigned a globally unique timestamp ts(Ti).
     The transaction manager attaches the timestamp to all operations issued by the transaction.
     Each data item is assigned a write timestamp (wts) and a read timestamp (rts):
     « rts(x) = largest timestamp of any read on x
  18. « wts(x) = largest timestamp of any write on x
     Conflicting operations are resolved by timestamp order.
     Basic T/O:
     for Ri(x):
       if ts(Ti) < wts(x) then reject Ri(x)
       else accept Ri(x); rts(x) ← ts(Ti)
     for Wi(x):
       if ts(Ti) < rts(x) or ts(Ti) < wts(x) then reject Wi(x)
       else accept Wi(x); wts(x) ← ts(Ti)

     Deadlock
     A transaction is deadlocked if it is blocked and will remain blocked until there is intervention.
     Locking-based concurrency control algorithms may cause deadlocks.
     TO-based algorithms that involve waiting may cause deadlocks.
     Wait-for graph (WFG)
     « If transaction Ti waits for another transaction Tj to release a lock on an entity, then Ti → Tj in the WFG.

     Deadlock Management
     Ignore
     « Let the application programmer deal with it, or restart the system.
     Prevention
     « Guarantee that deadlocks can never occur in the first place. Check a transaction when it is initiated. Requires no run-time support.
     Avoidance
     « Detect potential deadlocks in advance and take action to ensure that deadlock will not occur. Requires run-time support.
     Detection and Recovery
     « Allow deadlocks to form and then find and break them. As in the avoidance scheme, this requires run-time support.

     Deadlock Prevention
     All resources which may be needed by a transaction must be predeclared.
     « The system must guarantee that none of the resources will be needed by an ongoing transaction.
     « Resources must only be reserved, but not necessarily allocated a priori.
     « Unsuitability of the scheme in a database environment.
     « Suitable for systems that have no provisions for undoing processes.
     Evaluation:
     – reduced concurrency due to preallocation
     – evaluating whether an allocation is safe leads to added overhead
     – difficult to determine (partial order)
     + no transaction rollback or restart is involved

     Deadlock Avoidance
     Transactions are not required to request resources a priori.
     Transactions are allowed to proceed unless a requested resource is unavailable.
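The Basic T/O accept/reject rules can be sketched directly in Python. This is a toy with a single data item x and no transaction restarts (invented simplifications); the tests follow the rules as stated above:

```python
# Basic timestamp ordering for one data item x.
state = {"rts": 0, "wts": 0}   # read/write timestamps of x

def read(ts):
    """Ri(x): reject if Ti would read a value written in its 'future'."""
    if ts < state["wts"]:
        return "reject"
    state["rts"] = max(state["rts"], ts)
    return "accept"

def write(ts):
    """Wi(x): reject if a younger transaction has already read or written x."""
    if ts < state["rts"] or ts < state["wts"]:
        return "reject"
    state["wts"] = ts
    return "accept"

print(read(5), write(3), write(7), read(6))  # accept reject accept reject
```

The trace shows the ordering at work: after T5 reads x, the older write W3 is rejected; after W7 installs a newer value, the read R6 is rejected because it arrived too late.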
In case of conflict, transactions may be allowed to wait for a fixed time interval. Order either the data items or the sites and always request locks in that order. More attractive than prevention in a database environment.
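Deadlock detection, discussed next, reduces to finding a cycle in the wait-for graph, where an edge Ti → Tj means Ti waits for Tj. A minimal DFS sketch on an invented WFG:

```python
# Cycle detection in a wait-for graph given as {transaction: [waited-for, ...]}.

def has_cycle(wfg):
    visiting, done = set(), set()

    def dfs(t):
        if t in visiting:
            return True            # back edge: t is on the current path
        if t in done:
            return False           # already fully explored, no cycle through t
        visiting.add(t)
        found = any(dfs(u) for u in wfg.get(t, []))
        visiting.discard(t)
        done.add(t)
        return found

    return any(dfs(t) for t in list(wfg))

print(has_cycle({"T1": ["T2"], "T2": ["T3"], "T3": ["T1"]}))  # True
print(has_cycle({"T1": ["T2"], "T2": []}))                    # False
```

In the distributed case each site runs this check on its local WFG, augmented with the external edges received from other sites, exactly as the next slide describes.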
  19. Distributed Deadlock Detection
     Sites cooperate in the detection of deadlocks.
     One example:
     « The local WFGs are formed at each site and passed on to other sites. Each local WFG is modified as follows:
       · Since each site receives the potential deadlock cycles from other sites, these edges are added to the local WFGs.
       · The edges in the local WFG which show that local transactions are waiting for transactions at other sites are joined with edges in the local WFGs which show that remote transactions are waiting for local ones.
     « Each local deadlock detector:
       · looks for a cycle that does not involve the external edge. If it exists, there is a local deadlock which can be handled locally.
       · looks for a cycle involving the external edge. If it exists, it indicates a potential global deadlock. Pass the information on to the next site.

     Reliability
     Problem: how to maintain the atomicity and durability properties of transactions.

     Fundamental Definitions
     Reliability
     « A measure of the success with which a system conforms to some authoritative specification of its behavior.
     « The probability that the system has not experienced any failures within a given time period.
     « Typically used to describe systems that cannot be repaired or where the continuous operation of the system is critical.
     Availability
     « The fraction of the time that a system meets its specification.
     « The probability that the system is operational at a given time t.

     All The Best...