SlideShare a Scribd company logo
Chapter 25
Distributed Databases and
Client-Server Architectures
Copyright © 2004 Pearson Education, Inc.
Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 25-3
Chapter 25 Outline
1 Distributed Database Concepts
2 Data Fragmentation, Replication and Allocation
3 Types of Distributed Database Systems
4 Query Processing
5 Concurrency Control and Recovery
6 3-Tier Client-Server Architecture
Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 25-4
Distributed Database Concepts
It is a system to process Unit of execution (a transaction) in
a distributed manner. That is, a transaction can be executed
by multiple networked computers in a unified manner.
It can be defined as
A distributed database (DDB) is a collection of multiple
logically related database distributed over a computer
network, and a distributed database management system
as a software system that manages a distributed
database while making the distribution transparent to
the user.
Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 25-5
Distributed Database System
Advantages
1. Management of distributed data with different
levels of transparency: This refers to the physical
placement of data (files, relations, etc.) which is not
known to the user (distribution transparency).
Communications neteork
Site 5
Site 1
Site 2
Site 4
Site 3
Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 25-6
Distributed Database System
Advantages
The EMPLOYEE, PROJECT, and WORKS_ON tables may be
fragmented horizontally and stored with possible replication as shown
below.
Communications neteork
Atlanta
San Francisco
EMPLOYEES - All
PROJECTS - All
WORKS_ON - All
Chicago
(headquarters)
New York
EMPLOYEES - New York
PROJECTS - All
WORKS_ON - New York Employees
EMPLOYEES - San Francisco and LA
PROJECTS - San Francisco
WORKS_ON - San Francisco Employees
Los Angeles
EMPLOYEES - LA
PROJECTS - LA and San Francisco
WORKS_ON - LA Employees
EMPLOYEES - Atlanta
PROJECTS - Atlanta
WORKS_ON - Atlanta Employees
Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 25-7
Distributed Database System
Advantages
• Distribution and Network transparency: Users do not have to
worry about operational details of the network. There is Location
transparency, which refers to freedom of issuing command from
any location without affecting its working. Then there is Naming
transparency, which allows access to any names object (files,
relations, etc.) from any location.
• Replication transparency: It allows to store copies of a data at
multiple sites as shown in the above diagram. This is done to
minimize access time to the required data.
• Fragmentation transparency: Allows to fragment a relation
horizontally (create a subset of tuples of a relation) or vertically
(create a subset of columns of a relation).
Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 25-8
Distributed Database System
Advantages
2. Increased reliability and availability: Reliability refers to
system live time, that is, system is running efficiently most of the
time. Availability is the probability that the system is
continuously available (usable or accessible) during a time
interval. A distributed database system has multiple nodes
(computers) and if one fails then others are available to do the
job.
3. Improved performance: A distributed DBMS fragments the
database to keep data closer to where it is needed most. This
reduces data management (access and modification) time
significantly.
4. Easier expansion (scalability): Allows new nodes (computers)
to be added anytime without chaining the entire configuration.
Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 25-9
Data Fragmentation, Replication and
Allocation
Data Fragmentation
Split a relation into logically related and correct parts. A
relation can be fragmented in two ways:
Horizontal fragmentation
It is a horizontal subset of a relation which contain those of
tuples which satisfy selection conditions.
Consider the Employee relation with selection condition (DNO
= 5). All tuples satisfy this condition will create a subset which
will be a horizontal fragment of Employee relation.
A selection condition may be composed of several conditions
connected by AND or OR.
Derived horizontal fragmentation: It is the partitioning of a
primary relation to other secondary relations which are related
with Foreign keys.
Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 25-10
Data Fragmentation, Replication and
Allocation
Vertical fragmentation
It is a subset of a relation which is created by a subset of
columns. Thus a vertical fragment of a relation will contain
values of selected columns. There is no selection condition used
in vertical fragmentation.
Consider the Employee relation. A vertical fragment of can be
created by keeping the values of Name, Bdate, Sex, and
Address.
Because there is no condition for creating a vertical fragment,
each fragment must include the primary key attribute of the
parent relation Employee. In this way all vertical fragments of
a relation are connected.
Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 25-11
Data Fragmentation, Replication and
Allocation
Representation
Horizontal fragmentation
Each horizontal fragment on a relation can be specified by a sCi
(R) operation in the relational algebra.
Complete horizontal fragmentation
A set of horizontal fragments whose conditions C1, C2, …, Cn
include all the tuples in R- that is, every tuple in R satisfies (C1
OR C2 OR … OR Cn).
Disjoint complete horizontal fragmentation: No tuple in R
satisfies (Ci AND Cj) where i ≠ j.
To reconstruct R from horizontal fragments a UNION is
applied.
Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 25-12
Data Fragmentation, Replication and
Allocation
Representation
Vertical fragmentation
A vertical fragment on a relation can be specified by a Li(R)
operation in the relational algebra.
Complete vertical fragmentation
A set of vertical fragments whose projection lists L1, L2, …, Ln
include all the attributes in R but share only the primary key of
R. In this case the projection lists satisfy the following two
conditions:
L1  L2  ...  Ln = ATTRS (R)
Li  Lj = PK(R) for any i j, where ATTRS (R) is the set of
attributes of R and PK(R) is the primary key of R.
To reconstruct R from complete vertical fragments a OUTER
UNION is applied.
Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 25-13
Data Fragmentation, Replication and
Allocation
Mixed (Hybrid) fragmentation
A combination of Vertical fragmentation and Horizontal
fragmentation.
This is achieved by SELECT-PROJECT operations which is
represented by Li(sCi (R)).
If C = True (Select all tuples) and L ≠ ATTRS(R), we get a
vertical fragment, and if C ≠ True and L ≠ ATTRS(R), we get a
mixed fragment.
If C = True and L = ATTRS(R), then R can be considered a
fragment.
Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 25-14
Data Fragmentation, Replication and
Allocation
Fragmentation schema
A definition of a set of fragments (horizontal or vertical or
horizontal and vertical) that includes all attributes and tuples
in the database that satisfies the condition that the whole
database can be reconstructed from the fragments by applying
some sequence of UNION (or OUTER JOIN) and UNION
operations.
Allocation schema
It describes the distribution of fragments to sites of distributed
databases. It can be fully or partially replicated or can be
partitioned.
Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 25-15
Data Fragmentation, Replication and
Allocation
Data Replication
Database is replicated to all sites. In full replication the entire
database is replicated and in partial replication some selected
part is replicated to some of the sites. Data replication is
achieved through a replication schema.
Data Distribution (Data Allocation)
This is relevant only in the case of partial replication or
partition. The selected portion of the database is distributed to
the database sites.
Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition
pter 25-16
Disadvantages of distributed databases
 Complexity —
– Extra work must be done by the DBAs to ensure that the distributed nature of the
system is transparent.
– Extra work must also be done to maintain multiple disparate systems, instead of one
big one.
– Extra database design work must also be done to account for the disconnected nature
of the database — for example, joins become prohibitively expensive when performed
across multiple systems.
 Economics — increased complexity and a more extensive infrastructure means
extra labour costs.
 Security — remote database fragments must be secured, and they are not
centralized so the remote sites must be secured as well. The infrastructure must
also be secured (eg: by encrypting the network links between remote sites).
 Difficult to maintain integrity — in a distributed database enforcing integrity over
a network may require too much networking resources to be feasible.
 No reliance on central site .
Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 25-17
Additional Functions of
Distributed Databases
 Distributed query processing
 Distributed transaction management (e.g.,two-phase commit)
 Replication management
 Distributed database recovery
 Distributed security issues
Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition
Allocation of fragments to sites.
(a) Relation fragments at site 2
corresponding to department 5.
Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition
Allocation of fragments to sites.
(b) Relation fragments at site 3
corresponding to department 4.
Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Slide 25-20
FIGURE 25.4
Complete and disjoint fragments of the WORKS_ON relation. (a)
Fragments of WORKS_ON for employees working in department 5
(C=[ESSN IN (SELECT SSN FROM EMPLOYEE WHERE DNO=5)]).
Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Slide 25-21
FIGURE 25.4 (continued)
Complete and disjoint fragments of the WORKS_ON relation. (b)
Fragments of WORKS_ON for employees working in department 4
(C=[ESSN IN (SELECT SSN FROM EMPLOYEE WHERE DNO=4)]).
Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Slide 25-22
FIGURE 25.4 (continued)
Complete and disjoint fragments of the WORKS_ON relation. (c)
Fragments of WORKS_ON for employees working in department 1
(C=[ESSN IN (SELECT SSN FROM EMPLOYEE WHERE DNO=1)]).
Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition
Works_on tuples for
Employees who work
for dept 5
Works_on tuples for
Employees who work
for dept 4
Projects controlled by
dept5
G1,G2,G3,G4,G7
G4,G5,G6,G2,G8
• Union of G1,G2,G3
• Union of G4,G5,G6
• Union of G7,G8,G9
• Union of G1,G4,G7
• Fragments at site2
• Fragments at site 3
Works_on tuples for
Employees who work
for dept 5
Works_on tuples for
Employees who work
for dept 4
Works_on tuples for
Employees who work
for dept 1
Projects controlled by
dept5
G1,G2,G3,G4,G7
G4,G5,G6,G2,G8
Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 25-24
Types of Distributed Database Systems
Homogeneous
All sites of the database system have identical setup, i.e., same database system
software. The underlying operating system may be different. For example, all
sites run Oracle or DB2, or Sybase or some other database system. The
underlying operating systems can be a mixture of Linux, Window, Unix, etc.
The clients thus have to use identical client software.
Communications
neteork
Site 5
Site 1
Site 2
Site 3
Oracle Oracle
Oracle
Oracle
Site 4
Oracle
Linux
Linux
Window
Window
Unix
Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 25-25
Types of Distributed Database Systems
Heterogeneous
Federated: Each site may run different database system but the data
access is managed through a single conceptual schema. This implies
that the degree of local autonomy is minimum. Each site must adhere
to a centralized access policy. There may be a global schema.
Multidatabase: There is no one conceptual global schema. For data
access a schema is constructed dynamically as needed by the
application software.
Communications
network
Site 5
Site 1
Site 2
Site 3
Network
DBMS
Relational
Site 4
Object
Oriented
Linux
Linux
Unix
Hierarchical
Object
Oriented
Relational
Unix
Window
Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 25-26
Homogeneous vs heterogeneous
– Hardware
– Data model: relational, object, hierarchical
– Brand DBMS
– Version of query language
Level of local autonomy
Local vs global schema
– Federated: some global schema
– Multidatabase: no (or limited) global schema
Types of Distributed Database Systems
Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 25-27
Five level architecture of FDBS
 In this one local schema is the conceptual schema
(complete database definition) of a component
database.
 Component schema is derived by translating the local
schema into a common data model for the FDBS.
 The export schema represents the subset of the
component schema that is available to the FDBS.
 The federated schema is the global schema or view,
which is the result of integrating all the shareable
export schemas.
 The external schema define the schema for a user
group or an application.
Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Slide 25-28
FIGURE 25.5
The five-level
schema architecture
in a federated
database system
(FDBS). Source:
Adapted from Sheth
and Larson,
Federated Database
Systems for
Managing
Distributed
Heterogeneous
Autonomous
Databases. ACM
Computing Surveys
(Vol. 22: No. 3,
September 1990).
Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 25-29
Data Transfer Costs of Distributed Query Processing
Distributed Query Processing Using Semijoin
Query and Update Decomposition
Query Processing in Distributed
Databases
Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 25-30
Examples to illustrate volume of data transferred.
Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 25-31
Query Processing in Distributed Databases
Cost of transferring data (files and results) over the network.
Fname Minit Lname SSN Bdate Address Sex Salary Superssn Dno
Issues
This cost is usually high so some optimization is necessary.
Example relations: Employee at site 1 and Department at Site 2
Dname Dnumber Mgrssn Mgrstartdate
Employee at site 1. 10, 000 rows. Row size = 100 bytes. Table size = 106 bytes.
Department at Site 2. 100 rows. Row size = 35 bytes. Table size = 3500 bytes.
Q: For each employee, retrieve employee name and department
nameWhere the employee works.
Q: Fname,Lname,Dname (Employee Dno = Dnumber Department)
Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 25-32
Query Processing in Distributed Databases
Result
Stretagies:
1. Transfer Employee and Department to site 3. Total transfer bytes
= 1,000,000 + 3500 = 1,003,500 bytes.
2. Transfer Employee to site 2, execute join at site 2 and send the
result to site 3. Query result size = 40 * 10,000 = 400,000 bytes.
Total transfer size = 400,000 + 1,000,000 = 1,400,000 bytes.
The result of this query will have 10,000 tuples, assuming that every
employee is related to a department.
Suppose each result tuple is 40 bytes long. The query is submitted at
site 3 and the result is sent to this site.
Problem: Employee and Department relations are not present at site 3.
Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 25-33
Query Processing in Distributed Databases
Stretagies:
3. Transfer Department relation to site 1, execute the join at site 1,
and send the result to site 3. Total bytes transferred = 400,000 +
3500 = 403,500 bytes.
Optimization criteria: minimizing data transfer.
Preferred approach: strategy 3.
Consider the query
Q’: For each department, retrieve the department name and the
name of the department manager
Relational Algebra expression:
Fname,Lname,Dname (Employee Mgrssn = SSN Department)
Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 25-34
Query Processing in Distributed Databases
The result of this query will have 100 tuples, assuming that every
department has a manager, the execution strategies are:
Stretagies:
1. Transfer Employee and Department to the result site and perorm
the join at site 3. Total bytes transferred = 1,000,000 + 3500 =
1,003,500 bytes.
2. Transfer Employee to site 2, execute join at site 2 and send the
result to site 3. Query result size = 40 * 100 = 4000 bytes. Total
transfer size = 4000 + 1,000,000 = 1,004,000 bytes.
3. Transfer Department relation to site 1, execute join at site 1 and
send the result to site 3. Total transfer size = 4000 + 3500 = 7500
bytes.
Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 25-35
Query Processing in Distributed Databases
Preferred strategy: Chose strategy 3.
Now suppose the result site is 2. Possible strategies:
Possible strategies :
1. Transfer Employee relation to site 2, execute the query and present
the result to the user at site 2. Total transfer size = 1,000,000 bytes
for both queries Q and Q’.
2. Transfer Department relation to site 1, execute join at site 1 and
send the result back to site 2. Total transfer size for Q = 400,000 +
3500 = 403,500 bytes and for Q’ = 4000 + 3500 = 7500 bytes.
Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 25-36
Query Processing in Distributed Databases
Semijoin: Objective is to reduce the number of tuples in a relation
before transferring it to another site.
Example execution of Q or Q’:
1. Project the join attributes of Department at site 2, and transfer
them to site 1. For Q, 4 * 100 = 400 bytes are transferred and for
Q’, 9 * 100 = 900 bytes are transferred.
2. Join the transferred file with the Employee relation at site 1, and
transfer the required attributes from the resulting file to site 2.
For Q, 34 * 10,000 = 340,000 bytes are transferred and for Q’, 39 *
100 = 3900 bytes are transferred.
3. Execute the query by joining the transferred file with Department
and present the result to the user at site 2.
Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 25-37
Query and Update decomposition
• For queries , query decomposition module must break up
or decompose a query into subqueries that can be executed at the
individual sites.
• A strategy must be there for combining the results of the
subqueries to form the query result.
• Particular replica related to the query must be selected.
• For making such decisions ,DDBMS must maintain catalog
consisting fragmentation, replication and distribution of
information.
• For vertical fragmentation the attribute list for each fragment is
stored.
• For horizontal fragmentation , Guard condition is stored.
• For mixed both are stored.
Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Slide 25-38
FIGURE 25.7
Guard conditions and attributes lists for fragments.
(a) Site 2 fragments.
Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Slide 25-39
FIGURE 25.7 (continued)
Guard conditions and attributes lists for fragments.
(b) Site 3 fragments.
Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition
User wants to insert a new employee tuple
<‘Alex’, ‘B’, ‘coleman’, ‘345671239’, ‘22-
APR-04’, ‘3306 SAND-STONE’, ‘M’,
33000, ‘957654321’, 4>
Decomposed by the DDBMS into two insert
request.
– Insert the full tuple in employee at site 1.
– Insert the projected tuple <‘Alex’, ‘B’,
‘coleman’, ‘345671239’, 33000, ‘957654321’,
4> in the EMPD4 fragment at site 3.
Chapter 25-40
Query and Update decomposition
Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition
Retrieve the names and hours per week for
each employee who works on some project
controlled by department 5.
 Query is submitted at site 2
– SQL : SELECT fname,lname,hours
FROM employee, projects, works_on
WHERE dnum = 5 and pnumber = pno
and essn = ssn;
Chapter 25-41
Query and Update decomposition
Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition
 The DDBMS can determine from the guard condition on
projs5 and works_on5 that all tuples satisfying the condition (
dnum = 5 and dnumber = pno ) reside at site2.
 Decompose the query
– T1 ∏ESSN ( PROJS5PNUMBER=PNO WORKS_ON5)
– T2 ∏ESSN, FNAME, LNAME ( T1ESSN=SSN EMPLOYEE)
– RESULT ∏FNAME, LNAME, HOURS ( T2 * WORKS_ON5 )
 EXECUTE USING SEMIJOIN
– Execute T1 in site 2 & send ESSN to site 1
– Execute T2 in site 1 & send result to site 2
– Calculate result at site 2
Chapter 25-42
Query and Update decomposition
Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 25-43
Two-phase commit protocol
 A global recovery manager (or coordinator) performs the
two-phase commit protocol:
1)When all participating databases signal the coordinator that
the part of the transaction involving each has concluded, the
coordinator sends a message “prepare for commit” to each
participant,
2)If all participating databases reply “OK”, the transaction is
successful and the coordinator sends a “commit” signal to the
participating databases; otherwise it sends a message “roll
back” or UNDO the local effect of the transaction to each
participating database.
Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 25-44
Distributed Transactions
 The Two Phase Commit protocol (2PC)
 After prepare phase: participants are ready to commit or to abort; they still
hold locks
 If one of the participants does not reply or is not able to commit for some
reason, the global transaction has to be aborted.
 Problem: if coordinator is unavailable after the prepare phase, resources may
be locked for a long time.
Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition
Dealing with multiple copies of data items: The concurrency control must
maintain global consistency. Likewise the recovery mechanism must recover all
copies and maintain consistency after recovery.
Failure of individual sites: Database availability must not be affected due to the
failure of one or two sites and the recovery scheme must recover them before
they are available for use.
Communication link failure: This failure may create network partition which
would affect database availability even though all database sites may be running.
Distributed commit: A transaction may be fragmented and they may be
executed by a number of sites. This require a two-phase commit approach for
transaction commit.
Distributed deadlock: Since transactions are processed at multiple sites, two or
more sites may get involved in deadlock. This must be resolved in a distributed
manner.
Chapter 25-45
Distributed Databases encounter a number of concurrency control and
recovery problems which are not present in centralized databases. Some of
them are listed below.
Concurrency Control and Recovery
Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition
Concurrency Control and Recovery
1. Distributed Concurrency control based on a distinguished copy of a data
item
Communications neteork
Site 5
Site 1
Site 2
Site 4
Site 3
Primary site
 Primary site technique
– A single site is designated as a primary site which serves as a coordinator for
transaction management.
– All locks are kept at that site, and all requests for locking or unlocking are sent
here.
Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 25-47
Concurrency Control and Recovery
• Transaction management:
– Concurrency control and commit are managed by this site. In two phase
locking, this site manages locking and releasing data items. If all transactions
follow two-phase policy at all sites, then serializability is guaranteed.
• Advantages:
– An extension to the centralized two phase locking so implementation and
management is simple. Data items are locked only at one site but they can be
accessed at any site.
• Disadvantages:
– All transaction management activities go to primary site which is likely to
overload the site. If the primary site fails, the entire system is inaccessible.
• To aid recovery a backup site is designated which behaves as a
shadow of primary site. In case of primary site failure, backup
site can act as primary site.
Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 25-48
 Primary Copy Technique
– In this approach, instead of a site, a data item partition is designated as
primary copy. To lock a data item just the primary copy of the data item
is locked.
• Advantages:
– Since primary copies are distributed at various sites, a single site is not
overloaded with locking and unlocking requests.
• Disadvantages:
– Identification of a primary copy is complex. A distributed directory
must be maintained, possibly at all sites.
 Recovery from a coordinator failure
• In both approaches a coordinator site or copy may become unavailable.
This will require the selection of a new coordinator.
• Primary site approach with no backup site: Aborts and restarts all active
transactions at all sites. Elects a new coordinator and initiates transaction
processing.
• Primary site approach with backup site: Suspends all active transactions,
designates the backup site as the primary site and identifies a new back up
site. Primary site receives all transaction management information to
resume processing.
• Primary and backup sites fail or no backup site: Use election process to
select a new coordinator site.
Concurrency Control and Recovery
Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 25-49
Concurrency Control and Recovery
2. Distributed Concurrency control based on voting: There is no primary
copy of coordinator.
• No distinguished copy.
• Send lock request to sites that have data item.
• Each copy maintains it own lock and can grant or deny the
request.
• If majority of sites grant lock then the requesting transaction gets
the data item.
• Locking information (grant or denied) is sent to all these sites.
• To avoid unacceptably long wait, a time-out period is defined. If
the requesting transaction does not get any vote information then
the transaction is aborted.
Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 25-50
Distributed recovery
 The recovery process in DDBMS is quite involved.
 It is quite difficult to determine whether a site is down without
exchanging numerous message with other sites.
 Example
– Site X sends a message to site Y and expects a response from Y but
does not receive it.
 The message was not delivered to Y because of communication failure.
 Site Y is down and could not respond.
 Site Y is running and sent a response, but the response was not delivered.
 Another problem with distributed recovery is distributed
commit.
– When a transaction is updating data at several sites, it cannot commit
until it is sure that the effects of the transaction on every site cannot be
lost.

More Related Content

Similar to DDBMS (1).pptanmhfvnmbjk bjhjjjhkjkhjkjk

Distributed databases
Distributed databasesDistributed databases
Distributed databases
Suneel Dogra
 
Distributed Database System
Distributed Database SystemDistributed Database System
Distributed Database System
Sulemang
 
MongoDB Replication and Sharding
MongoDB Replication and ShardingMongoDB Replication and Sharding
MongoDB Replication and Sharding
Tharun Srinivasa
 
Veritas Failover3
Veritas Failover3Veritas Failover3
Veritas Failover3
grogers1124
 

Similar to DDBMS (1).pptanmhfvnmbjk bjhjjjhkjkhjkjk (20)

1 ddbms jan 2011_u
1 ddbms jan 2011_u1 ddbms jan 2011_u
1 ddbms jan 2011_u
 
Pptofdistributeddb
PptofdistributeddbPptofdistributeddb
Pptofdistributeddb
 
DDBS PPT (1).pptx
DDBS PPT (1).pptxDDBS PPT (1).pptx
DDBS PPT (1).pptx
 
chap1.pdf
chap1.pdfchap1.pdf
chap1.pdf
 
Rdbms
RdbmsRdbms
Rdbms
 
Lec 8 (distributed database)
Lec 8 (distributed database)Lec 8 (distributed database)
Lec 8 (distributed database)
 
Distributed design and architechture .ppt
Distributed design and architechture .pptDistributed design and architechture .ppt
Distributed design and architechture .ppt
 
Hw fdb(2)
Hw fdb(2)Hw fdb(2)
Hw fdb(2)
 
Hw fdb(2)
Hw fdb(2)Hw fdb(2)
Hw fdb(2)
 
Bt0066 database management system1
Bt0066 database management system1Bt0066 database management system1
Bt0066 database management system1
 
A relational model of data for large shared data banks
A relational model of data for large shared data banksA relational model of data for large shared data banks
A relational model of data for large shared data banks
 
unit 1.pptx
unit 1.pptxunit 1.pptx
unit 1.pptx
 
DISTRIBUTED DATABASE WITH RECOVERY TECHNIQUES
DISTRIBUTED DATABASE WITH RECOVERY TECHNIQUESDISTRIBUTED DATABASE WITH RECOVERY TECHNIQUES
DISTRIBUTED DATABASE WITH RECOVERY TECHNIQUES
 
Data Mining And Data Warehousing Laboratory File Manual
Data Mining And Data Warehousing Laboratory File ManualData Mining And Data Warehousing Laboratory File Manual
Data Mining And Data Warehousing Laboratory File Manual
 
Distributed databases
Distributed databasesDistributed databases
Distributed databases
 
Distributed Database Management System
Distributed Database Management SystemDistributed Database Management System
Distributed Database Management System
 
Distributed Database System
Distributed Database SystemDistributed Database System
Distributed Database System
 
distributed db unit1 korth.ppt
distributed db unit1 korth.pptdistributed db unit1 korth.ppt
distributed db unit1 korth.ppt
 
MongoDB Replication and Sharding
MongoDB Replication and ShardingMongoDB Replication and Sharding
MongoDB Replication and Sharding
 
Veritas Failover3
Veritas Failover3Veritas Failover3
Veritas Failover3
 

Recently uploaded

Online blood donation management system project.pdf
Online blood donation management system project.pdfOnline blood donation management system project.pdf
Online blood donation management system project.pdf
Kamal Acharya
 
DR PROF ING GURUDUTT SAHNI WIKIPEDIA.pdf
DR PROF ING GURUDUTT SAHNI WIKIPEDIA.pdfDR PROF ING GURUDUTT SAHNI WIKIPEDIA.pdf
DR PROF ING GURUDUTT SAHNI WIKIPEDIA.pdf
DrGurudutt
 
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptxCFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
R&R Consult
 
Laundry management system project report.pdf
Laundry management system project report.pdfLaundry management system project report.pdf
Laundry management system project report.pdf
Kamal Acharya
 
Hall booking system project report .pdf
Hall booking system project report  .pdfHall booking system project report  .pdf
Hall booking system project report .pdf
Kamal Acharya
 

Recently uploaded (20)

RESORT MANAGEMENT AND RESERVATION SYSTEM PROJECT REPORT.pdf
RESORT MANAGEMENT AND RESERVATION SYSTEM PROJECT REPORT.pdfRESORT MANAGEMENT AND RESERVATION SYSTEM PROJECT REPORT.pdf
RESORT MANAGEMENT AND RESERVATION SYSTEM PROJECT REPORT.pdf
 
Online blood donation management system project.pdf
Online blood donation management system project.pdfOnline blood donation management system project.pdf
Online blood donation management system project.pdf
 
Explosives Industry manufacturing process.pdf
Explosives Industry manufacturing process.pdfExplosives Industry manufacturing process.pdf
Explosives Industry manufacturing process.pdf
 
Cloud-Computing_CSE311_Computer-Networking CSE GUB BD - Shahidul.pptx
Cloud-Computing_CSE311_Computer-Networking CSE GUB BD - Shahidul.pptxCloud-Computing_CSE311_Computer-Networking CSE GUB BD - Shahidul.pptx
Cloud-Computing_CSE311_Computer-Networking CSE GUB BD - Shahidul.pptx
 
Online resume builder management system project report.pdf
Online resume builder management system project report.pdfOnline resume builder management system project report.pdf
Online resume builder management system project report.pdf
 
Furniture showroom management system project.pdf
Furniture showroom management system project.pdfFurniture showroom management system project.pdf
Furniture showroom management system project.pdf
 
DR PROF ING GURUDUTT SAHNI WIKIPEDIA.pdf
DR PROF ING GURUDUTT SAHNI WIKIPEDIA.pdfDR PROF ING GURUDUTT SAHNI WIKIPEDIA.pdf
DR PROF ING GURUDUTT SAHNI WIKIPEDIA.pdf
 
An improvement in the safety of big data using blockchain technology
An improvement in the safety of big data using blockchain technologyAn improvement in the safety of big data using blockchain technology
An improvement in the safety of big data using blockchain technology
 
The battle for RAG, explore the pros and cons of using KnowledgeGraphs and Ve...
The battle for RAG, explore the pros and cons of using KnowledgeGraphs and Ve...The battle for RAG, explore the pros and cons of using KnowledgeGraphs and Ve...
The battle for RAG, explore the pros and cons of using KnowledgeGraphs and Ve...
 
KIT-601 Lecture Notes-UNIT-5.pdf Frame Works and Visualization
KIT-601 Lecture Notes-UNIT-5.pdf Frame Works and VisualizationKIT-601 Lecture Notes-UNIT-5.pdf Frame Works and Visualization
KIT-601 Lecture Notes-UNIT-5.pdf Frame Works and Visualization
 
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptxCFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
 
Democratizing Fuzzing at Scale by Abhishek Arya
Democratizing Fuzzing at Scale by Abhishek AryaDemocratizing Fuzzing at Scale by Abhishek Arya
Democratizing Fuzzing at Scale by Abhishek Arya
 
KIT-601 Lecture Notes-UNIT-4.pdf Frequent Itemsets and Clustering
KIT-601 Lecture Notes-UNIT-4.pdf Frequent Itemsets and ClusteringKIT-601 Lecture Notes-UNIT-4.pdf Frequent Itemsets and Clustering
KIT-601 Lecture Notes-UNIT-4.pdf Frequent Itemsets and Clustering
 
Laundry management system project report.pdf
Laundry management system project report.pdfLaundry management system project report.pdf
Laundry management system project report.pdf
 
The Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdfThe Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdf
 
2024 DevOps Pro Europe - Growing at the edge
2024 DevOps Pro Europe - Growing at the edge2024 DevOps Pro Europe - Growing at the edge
2024 DevOps Pro Europe - Growing at the edge
 
Hall booking system project report .pdf
Hall booking system project report  .pdfHall booking system project report  .pdf
Hall booking system project report .pdf
 
Pharmacy management system project report..pdf
Pharmacy management system project report..pdfPharmacy management system project report..pdf
Pharmacy management system project report..pdf
 
Quality defects in TMT Bars, Possible causes and Potential Solutions.
Quality defects in TMT Bars, Possible causes and Potential Solutions.Quality defects in TMT Bars, Possible causes and Potential Solutions.
Quality defects in TMT Bars, Possible causes and Potential Solutions.
 
Event Management System Vb Net Project Report.pdf
Event Management System Vb Net  Project Report.pdfEvent Management System Vb Net  Project Report.pdf
Event Management System Vb Net Project Report.pdf
 

DDBMS (1).pptanmhfvnmbjk bjhjjjhkjkhjkjk

  • 1.
  • 2. Chapter 25 Distributed Databases and Client-Server Architectures Copyright © 2004 Pearson Education, Inc.
  • 3. Copyright © 2004 Ramez Elmasri and Shamkant Navathe Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 25-3 Chapter 25 Outline 1 Distributed Database Concepts 2 Data Fragmentation, Replication and Allocation 3 Types of Distributed Database Systems 4 Query Processing 5 Concurrency Control and Recovery 6 3-Tier Client-Server Architecture
  • 4. Copyright © 2004 Ramez Elmasri and Shamkant Navathe Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 25-4 Distributed Database Concepts It is a system to process Unit of execution (a transaction) in a distributed manner. That is, a transaction can be executed by multiple networked computers in a unified manner. It can be defined as A distributed database (DDB) is a collection of multiple logically related database distributed over a computer network, and a distributed database management system as a software system that manages a distributed database while making the distribution transparent to the user.
  • 5. Copyright © 2004 Ramez Elmasri and Shamkant Navathe Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 25-5 Distributed Database System Advantages 1. Management of distributed data with different levels of transparency: This refers to the physical placement of data (files, relations, etc.) which is not known to the user (distribution transparency). Communications neteork Site 5 Site 1 Site 2 Site 4 Site 3
  • 6. Copyright © 2004 Ramez Elmasri and Shamkant Navathe Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 25-6 Distributed Database System Advantages The EMPLOYEE, PROJECT, and WORKS_ON tables may be fragmented horizontally and stored with possible replication as shown below. Communications neteork Atlanta San Francisco EMPLOYEES - All PROJECTS - All WORKS_ON - All Chicago (headquarters) New York EMPLOYEES - New York PROJECTS - All WORKS_ON - New York Employees EMPLOYEES - San Francisco and LA PROJECTS - San Francisco WORKS_ON - San Francisco Employees Los Angeles EMPLOYEES - LA PROJECTS - LA and San Francisco WORKS_ON - LA Employees EMPLOYEES - Atlanta PROJECTS - Atlanta WORKS_ON - Atlanta Employees
  • 7. Copyright © 2004 Ramez Elmasri and Shamkant Navathe Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 25-7 Distributed Database System Advantages • Distribution and Network transparency: Users do not have to worry about operational details of the network. There is Location transparency, which refers to freedom of issuing command from any location without affecting its working. Then there is Naming transparency, which allows access to any names object (files, relations, etc.) from any location. • Replication transparency: It allows to store copies of a data at multiple sites as shown in the above diagram. This is done to minimize access time to the required data. • Fragmentation transparency: Allows to fragment a relation horizontally (create a subset of tuples of a relation) or vertically (create a subset of columns of a relation).
  • 8. Copyright © 2004 Ramez Elmasri and Shamkant Navathe Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 25-8 Distributed Database System Advantages 2. Increased reliability and availability: Reliability refers to system live time, that is, system is running efficiently most of the time. Availability is the probability that the system is continuously available (usable or accessible) during a time interval. A distributed database system has multiple nodes (computers) and if one fails then others are available to do the job. 3. Improved performance: A distributed DBMS fragments the database to keep data closer to where it is needed most. This reduces data management (access and modification) time significantly. 4. Easier expansion (scalability): Allows new nodes (computers) to be added anytime without chaining the entire configuration.
  • 9. Copyright © 2004 Ramez Elmasri and Shamkant Navathe Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 25-9 Data Fragmentation, Replication and Allocation Data Fragmentation Split a relation into logically related and correct parts. A relation can be fragmented in two ways: Horizontal fragmentation It is a horizontal subset of a relation which contain those of tuples which satisfy selection conditions. Consider the Employee relation with selection condition (DNO = 5). All tuples satisfy this condition will create a subset which will be a horizontal fragment of Employee relation. A selection condition may be composed of several conditions connected by AND or OR. Derived horizontal fragmentation: It is the partitioning of a primary relation to other secondary relations which are related with Foreign keys.
  • 10. Copyright © 2004 Ramez Elmasri and Shamkant Navathe Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 25-10 Data Fragmentation, Replication and Allocation Vertical fragmentation It is a subset of a relation which is created by a subset of columns. Thus a vertical fragment of a relation will contain values of selected columns. There is no selection condition used in vertical fragmentation. Consider the Employee relation. A vertical fragment of can be created by keeping the values of Name, Bdate, Sex, and Address. Because there is no condition for creating a vertical fragment, each fragment must include the primary key attribute of the parent relation Employee. In this way all vertical fragments of a relation are connected.
  • 11. Copyright © 2004 Ramez Elmasri and Shamkant Navathe Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 25-11 Data Fragmentation, Replication and Allocation Representation Horizontal fragmentation Each horizontal fragment on a relation can be specified by a sCi (R) operation in the relational algebra. Complete horizontal fragmentation A set of horizontal fragments whose conditions C1, C2, …, Cn include all the tuples in R- that is, every tuple in R satisfies (C1 OR C2 OR … OR Cn). Disjoint complete horizontal fragmentation: No tuple in R satisfies (Ci AND Cj) where i ≠ j. To reconstruct R from horizontal fragments a UNION is applied.
  • 12. Copyright © 2004 Ramez Elmasri and Shamkant Navathe Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 25-12 Data Fragmentation, Replication and Allocation Representation Vertical fragmentation A vertical fragment on a relation can be specified by a Li(R) operation in the relational algebra. Complete vertical fragmentation A set of vertical fragments whose projection lists L1, L2, …, Ln include all the attributes in R but share only the primary key of R. In this case the projection lists satisfy the following two conditions: L1  L2  ...  Ln = ATTRS (R) Li  Lj = PK(R) for any i j, where ATTRS (R) is the set of attributes of R and PK(R) is the primary key of R. To reconstruct R from complete vertical fragments a OUTER UNION is applied.
  • 13. Copyright © 2004 Ramez Elmasri and Shamkant Navathe Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 25-13 Data Fragmentation, Replication and Allocation Mixed (Hybrid) fragmentation A combination of Vertical fragmentation and Horizontal fragmentation. This is achieved by SELECT-PROJECT operations which is represented by Li(sCi (R)). If C = True (Select all tuples) and L ≠ ATTRS(R), we get a vertical fragment, and if C ≠ True and L ≠ ATTRS(R), we get a mixed fragment. If C = True and L = ATTRS(R), then R can be considered a fragment.
  • 14. Copyright © 2004 Ramez Elmasri and Shamkant Navathe Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 25-14 Data Fragmentation, Replication and Allocation Fragmentation schema A definition of a set of fragments (horizontal or vertical or horizontal and vertical) that includes all attributes and tuples in the database that satisfies the condition that the whole database can be reconstructed from the fragments by applying some sequence of UNION (or OUTER JOIN) and UNION operations. Allocation schema It describes the distribution of fragments to sites of distributed databases. It can be fully or partially replicated or can be partitioned.
  • 15. Copyright © 2004 Ramez Elmasri and Shamkant Navathe Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 25-15 Data Fragmentation, Replication and Allocation Data Replication Database is replicated to all sites. In full replication the entire database is replicated and in partial replication some selected part is replicated to some of the sites. Data replication is achieved through a replication schema. Data Distribution (Data Allocation) This is relevant only in the case of partial replication or partition. The selected portion of the database is distributed to the database sites.
  • 16. Copyright © 2004 Ramez Elmasri and Shamkant Navathe Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition pter 25-16 Disadvantages of distributed databases  Complexity — – Extra work must be done by the DBAs to ensure that the distributed nature of the system is transparent. – Extra work must also be done to maintain multiple disparate systems, instead of one big one. – Extra database design work must also be done to account for the disconnected nature of the database — for example, joins become prohibitively expensive when performed across multiple systems.  Economics — increased complexity and a more extensive infrastructure means extra labour costs.  Security — remote database fragments must be secured, and they are not centralized so the remote sites must be secured as well. The infrastructure must also be secured (eg: by encrypting the network links between remote sites).  Difficult to maintain integrity — in a distributed database enforcing integrity over a network may require too much networking resources to be feasible.  No reliance on central site .
  • 17. Copyright © 2004 Ramez Elmasri and Shamkant Navathe Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 25-17 Additional Functions of Distributed Databases  Distributed query processing  Distributed transaction management (e.g.,two-phase commit)  Replication management  Distributed database recovery  Distributed security issues
  • 18. Copyright © 2004 Ramez Elmasri and Shamkant Navathe Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Allocation of fragments to sites. (a) Relation fragments at site 2 corresponding to department 5.
  • 19. Copyright © 2004 Ramez Elmasri and Shamkant Navathe Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Allocation of fragments to sites. (b) Relation fragments at site 3 corresponding to department 4.
  • 20. Copyright © 2004 Ramez Elmasri and Shamkant Navathe Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Slide 25-20 FIGURE 25.4 Complete and disjoint fragments of the WORKS_ON relation. (a) Fragments of WORKS_ON for employees working in department 5 (C=[ESSN IN (SELECT SSN FROM EMPLOYEE WHERE DNO=5)]).
  • 21. Copyright © 2004 Ramez Elmasri and Shamkant Navathe Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Slide 25-21 FIGURE 25.4 (continued) Complete and disjoint fragments of the WORKS_ON relation. (b) Fragments of WORKS_ON for employees working in department 4 (C=[ESSN IN (SELECT SSN FROM EMPLOYEE WHERE DNO=4)]).
  • 22. Copyright © 2004 Ramez Elmasri and Shamkant Navathe Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Slide 25-22 FIGURE 25.4 (continued) Complete and disjoint fragments of the WORKS_ON relation. (c) Fragments of WORKS_ON for employees working in department 1 (C=[ESSN IN (SELECT SSN FROM EMPLOYEE WHERE DNO=1)]).
  • 23. Copyright © 2004 Ramez Elmasri and Shamkant Navathe Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Works_on tuples for Employees who work for dept 5 Works_on tuples for Employees who work for dept 4 Projects controlled by dept5 G1,G2,G3,G4,G7 G4,G5,G6,G2,G8 • Union of G1,G2,G3 • Union of G4,G5,G6 • Union of G7,G8,G9 • Union of G1,G4,G7 • Fragments at site2 • Fragments at site 3 Works_on tuples for Employees who work for dept 5 Works_on tuples for Employees who work for dept 4 Works_on tuples for Employees who work for dept 1 Projects controlled by dept5 G1,G2,G3,G4,G7 G4,G5,G6,G2,G8
  • 24. Copyright © 2004 Ramez Elmasri and Shamkant Navathe Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 25-24 Types of Distributed Database Systems Homogeneous All sites of the database system have identical setup, i.e., same database system software. The underlying operating system may be different. For example, all sites run Oracle or DB2, or Sybase or some other database system. The underlying operating systems can be a mixture of Linux, Window, Unix, etc. The clients thus have to use identical client software. Communications neteork Site 5 Site 1 Site 2 Site 3 Oracle Oracle Oracle Oracle Site 4 Oracle Linux Linux Window Window Unix
  • 25. Copyright © 2004 Ramez Elmasri and Shamkant Navathe Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 25-25 Types of Distributed Database Systems Heterogeneous Federated: Each site may run different database system but the data access is managed through a single conceptual schema. This implies that the degree of local autonomy is minimum. Each site must adhere to a centralized access policy. There may be a global schema. Multidatabase: There is no one conceptual global schema. For data access a schema is constructed dynamically as needed by the application software. Communications network Site 5 Site 1 Site 2 Site 3 Network DBMS Relational Site 4 Object Oriented Linux Linux Unix Hierarchical Object Oriented Relational Unix Window
  • 26. Copyright © 2004 Ramez Elmasri and Shamkant Navathe Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 25-26 Homogeneous vs heterogeneous – Hardware – Data model: relational, object, hierarchical – Brand DBMS – Version of query language Level of local autonomy Local vs global schema – Federated: some global schema – Multidatabase: no (or limited) global schema Types of Distributed Database Systems
  • 27. Copyright © 2004 Ramez Elmasri and Shamkant Navathe Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 25-27 Five level architecture of FDBS  In this one local schema is the conceptual schema (complete database definition) of a component database.  Component schema is derived by translating the local schema into a common data model for the FDBS.  The export schema represents the subset of the component schema that is available to the FDBS.  The federated schema is the global schema or view, which is the result of integrating all the shareable export schemas.  The external schema define the schema for a user group or an application.
  • 28. Copyright © 2004 Ramez Elmasri and Shamkant Navathe Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Slide 25-28 FIGURE 25.5 The five-level schema architecture in a federated database system (FDBS). Source: Adapted from Sheth and Larson, Federated Database Systems for Managing Distributed Heterogeneous Autonomous Databases. ACM Computing Surveys (Vol. 22: No. 3, September 1990).
  • 29. Copyright © 2004 Ramez Elmasri and Shamkant Navathe Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 25-29 Data Transfer Costs of Distributed Query Processing Distributed Query Processing Using Semijoin Query and Update Decomposition Query Processing in Distributed Databases
  • 30. Copyright © 2004 Ramez Elmasri and Shamkant Navathe Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 25-30 Examples to illustrate volume of data transferred.
  • 31. Copyright © 2004 Ramez Elmasri and Shamkant Navathe Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 25-31 Query Processing in Distributed Databases Cost of transferring data (files and results) over the network. Fname Minit Lname SSN Bdate Address Sex Salary Superssn Dno Issues This cost is usually high so some optimization is necessary. Example relations: Employee at site 1 and Department at Site 2 Dname Dnumber Mgrssn Mgrstartdate Employee at site 1. 10, 000 rows. Row size = 100 bytes. Table size = 106 bytes. Department at Site 2. 100 rows. Row size = 35 bytes. Table size = 3500 bytes. Q: For each employee, retrieve employee name and department nameWhere the employee works. Q: Fname,Lname,Dname (Employee Dno = Dnumber Department)
  • 32. Copyright © 2004 Ramez Elmasri and Shamkant Navathe Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 25-32 Query Processing in Distributed Databases Result Stretagies: 1. Transfer Employee and Department to site 3. Total transfer bytes = 1,000,000 + 3500 = 1,003,500 bytes. 2. Transfer Employee to site 2, execute join at site 2 and send the result to site 3. Query result size = 40 * 10,000 = 400,000 bytes. Total transfer size = 400,000 + 1,000,000 = 1,400,000 bytes. The result of this query will have 10,000 tuples, assuming that every employee is related to a department. Suppose each result tuple is 40 bytes long. The query is submitted at site 3 and the result is sent to this site. Problem: Employee and Department relations are not present at site 3.
  • 33. Copyright © 2004 Ramez Elmasri and Shamkant Navathe Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 25-33 Query Processing in Distributed Databases Stretagies: 3. Transfer Department relation to site 1, execute the join at site 1, and send the result to site 3. Total bytes transferred = 400,000 + 3500 = 403,500 bytes. Optimization criteria: minimizing data transfer. Preferred approach: strategy 3. Consider the query Q’: For each department, retrieve the department name and the name of the department manager Relational Algebra expression: Fname,Lname,Dname (Employee Mgrssn = SSN Department)
  • 34. Copyright © 2004 Ramez Elmasri and Shamkant Navathe Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 25-34 Query Processing in Distributed Databases The result of this query will have 100 tuples, assuming that every department has a manager, the execution strategies are: Stretagies: 1. Transfer Employee and Department to the result site and perorm the join at site 3. Total bytes transferred = 1,000,000 + 3500 = 1,003,500 bytes. 2. Transfer Employee to site 2, execute join at site 2 and send the result to site 3. Query result size = 40 * 100 = 4000 bytes. Total transfer size = 4000 + 1,000,000 = 1,004,000 bytes. 3. Transfer Department relation to site 1, execute join at site 1 and send the result to site 3. Total transfer size = 4000 + 3500 = 7500 bytes.
  • 35. Copyright © 2004 Ramez Elmasri and Shamkant Navathe Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 25-35 Query Processing in Distributed Databases Preferred strategy: Chose strategy 3. Now suppose the result site is 2. Possible strategies: Possible strategies : 1. Transfer Employee relation to site 2, execute the query and present the result to the user at site 2. Total transfer size = 1,000,000 bytes for both queries Q and Q’. 2. Transfer Department relation to site 1, execute join at site 1 and send the result back to site 2. Total transfer size for Q = 400,000 + 3500 = 403,500 bytes and for Q’ = 4000 + 3500 = 7500 bytes.
  • 36. Copyright © 2004 Ramez Elmasri and Shamkant Navathe Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 25-36 Query Processing in Distributed Databases Semijoin: Objective is to reduce the number of tuples in a relation before transferring it to another site. Example execution of Q or Q’: 1. Project the join attributes of Department at site 2, and transfer them to site 1. For Q, 4 * 100 = 400 bytes are transferred and for Q’, 9 * 100 = 900 bytes are transferred. 2. Join the transferred file with the Employee relation at site 1, and transfer the required attributes from the resulting file to site 2. For Q, 34 * 10,000 = 340,000 bytes are transferred and for Q’, 39 * 100 = 3900 bytes are transferred. 3. Execute the query by joining the transferred file with Department and present the result to the user at site 2.
  • 37. Copyright © 2004 Ramez Elmasri and Shamkant Navathe Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 25-37 Query and Update decomposition • For queries , query decomposition module must break up or decompose a query into subqueries that can be executed at the individual sites. • A strategy must be there for combining the results of the subqueries to form the query result. • Particular replica related to the query must be selected. • For making such decisions ,DDBMS must maintain catalog consisting fragmentation, replication and distribution of information. • For vertical fragmentation the attribute list for each fragment is stored. • For horizontal fragmentation , Guard condition is stored. • For mixed both are stored.
  • 38. Copyright © 2004 Ramez Elmasri and Shamkant Navathe Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Slide 25-38 FIGURE 25.7 Guard conditions and attributes lists for fragments. (a) Site 2 fragments.
  • 39. Copyright © 2004 Ramez Elmasri and Shamkant Navathe Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Slide 25-39 FIGURE 25.7 (continued) Guard conditions and attributes lists for fragments. (b) Site 3 fragments.
  • 40. Copyright © 2004 Ramez Elmasri and Shamkant Navathe Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition User wants to insert a new employee tuple <‘Alex’, ‘B’, ‘coleman’, ‘345671239’, ‘22- APR-04’, ‘3306 SAND-STONE’, ‘M’, 33000, ‘957654321’, 4> Decomposed by the DDBMS into two insert request. – Insert the full tuple in employee at site 1. – Insert the projected tuple <‘Alex’, ‘B’, ‘coleman’, ‘345671239’, 33000, ‘957654321’, 4> in the EMPD4 fragment at site 3. Chapter 25-40 Query and Update decomposition
  • 41. Copyright © 2004 Ramez Elmasri and Shamkant Navathe Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Retrieve the names and hours per week for each employee who works on some project controlled by department 5.  Query is submitted at site 2 – SQL : SELECT fname,lname,hours FROM employee, projects, works_on WHERE dnum = 5 and pnumber = pno and essn = ssn; Chapter 25-41 Query and Update decomposition
  • 42. Copyright © 2004 Ramez Elmasri and Shamkant Navathe Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition  The DDBMS can determine from the guard condition on projs5 and works_on5 that all tuples satisfying the condition ( dnum = 5 and dnumber = pno ) reside at site2.  Decompose the query – T1 ∏ESSN ( PROJS5PNUMBER=PNO WORKS_ON5) – T2 ∏ESSN, FNAME, LNAME ( T1ESSN=SSN EMPLOYEE) – RESULT ∏FNAME, LNAME, HOURS ( T2 * WORKS_ON5 )  EXECUTE USING SEMIJOIN – Execute T1 in site 2 & send ESSN to site 1 – Execute T2 in site 1 & send result to site 2 – Calculate result at site 2 Chapter 25-42 Query and Update decomposition
  • 43. Copyright © 2004 Ramez Elmasri and Shamkant Navathe Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 25-43 Two-phase commit protocol  A global recovery manager (or coordinator) performs the two-phase commit protocol: 1)When all participating databases signal the coordinator that the part of the transaction involving each has concluded, the coordinator sends a message “prepare for commit” to each participant, 2)If all participating databases reply “OK”, the transaction is successful and the coordinator sends a “commit” signal to the participating databases; otherwise it sends a message “roll back” or UNDO the local effect of the transaction to each participating database.
  • 44. Copyright © 2004 Ramez Elmasri and Shamkant Navathe Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 25-44 Distributed Transactions  The Two Phase Commit protocol (2PC)  After prepare phase: participants are ready to commit or to abort; they still hold locks  If one of the participants does not reply or is not able to commit for some reason, the global transaction has to be aborted.  Problem: if coordinator is unavailable after the prepare phase, resources may be locked for a long time.
  • 45. Copyright © 2004 Ramez Elmasri and Shamkant Navathe Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Dealing with multiple copies of data items: The concurrency control must maintain global consistency. Likewise the recovery mechanism must recover all copies and maintain consistency after recovery. Failure of individual sites: Database availability must not be affected due to the failure of one or two sites and the recovery scheme must recover them before they are available for use. Communication link failure: This failure may create network partition which would affect database availability even though all database sites may be running. Distributed commit: A transaction may be fragmented and they may be executed by a number of sites. This require a two-phase commit approach for transaction commit. Distributed deadlock: Since transactions are processed at multiple sites, two or more sites may get involved in deadlock. This must be resolved in a distributed manner. Chapter 25-45 Distributed Databases encounter a number of concurrency control and recovery problems which are not present in centralized databases. Some of them are listed below. Concurrency Control and Recovery
  • 46. Copyright © 2004 Ramez Elmasri and Shamkant Navathe Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Concurrency Control and Recovery 1. Distributed Concurrency control based on a distinguished copy of a data item Communications neteork Site 5 Site 1 Site 2 Site 4 Site 3 Primary site  Primary site technique – A single site is designated as a primary site which serves as a coordinator for transaction management. – All locks are kept at that site, and all requests for locking or unlocking are sent here.
  • 47. Copyright © 2004 Ramez Elmasri and Shamkant Navathe Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 25-47 Concurrency Control and Recovery • Transaction management: – Concurrency control and commit are managed by this site. In two phase locking, this site manages locking and releasing data items. If all transactions follow two-phase policy at all sites, then serializability is guaranteed. • Advantages: – An extension to the centralized two phase locking so implementation and management is simple. Data items are locked only at one site but they can be accessed at any site. • Disadvantages: – All transaction management activities go to primary site which is likely to overload the site. If the primary site fails, the entire system is inaccessible. • To aid recovery a backup site is designated which behaves as a shadow of primary site. In case of primary site failure, backup site can act as primary site.
  • 48. Copyright © 2004 Ramez Elmasri and Shamkant Navathe Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 25-48  Primary Copy Technique – In this approach, instead of a site, a data item partition is designated as primary copy. To lock a data item just the primary copy of the data item is locked. • Advantages: – Since primary copies are distributed at various sites, a single site is not overloaded with locking and unlocking requests. • Disadvantages: – Identification of a primary copy is complex. A distributed directory must be maintained, possibly at all sites.  Recovery from a coordinator failure • In both approaches a coordinator site or copy may become unavailable. This will require the selection of a new coordinator. • Primary site approach with no backup site: Aborts and restarts all active transactions at all sites. Elects a new coordinator and initiates transaction processing. • Primary site approach with backup site: Suspends all active transactions, designates the backup site as the primary site and identifies a new back up site. Primary site receives all transaction management information to resume processing. • Primary and backup sites fail or no backup site: Use election process to select a new coordinator site. Concurrency Control and Recovery
  • 49. Copyright © 2004 Ramez Elmasri and Shamkant Navathe Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 25-49 Concurrency Control and Recovery 2. Distributed Concurrency control based on voting: There is no primary copy of coordinator. • No distinguished copy. • Send lock request to sites that have data item. • Each copy maintains it own lock and can grant or deny the request. • If majority of sites grant lock then the requesting transaction gets the data item. • Locking information (grant or denied) is sent to all these sites. • To avoid unacceptably long wait, a time-out period is defined. If the requesting transaction does not get any vote information then the transaction is aborted.
  • 50. Copyright © 2004 Ramez Elmasri and Shamkant Navathe Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 25-50 Distributed recovery  The recovery process in DDBMS is quite involved.  It is quite difficult to determine whether a site is down without exchanging numerous message with other sites.  Example – Site X sends a message to site Y and expects a response from Y but does not receive it.  The message was not delivered to Y because of communication failure.  Site Y is down and could not respond.  Site Y is running and sent a response, but the response was not delivered.  Another problem with distributed recovery is distributed commit. – When a transaction is updating data at several sites, it cannot commit until it is sure that the effects of the transaction on every site cannot be lost.