Distributed Databases  and  Client-Server  Architectures  Chapter 25
Distributed Database Concepts  Data Fragmentation, Replication, and Allocation Techniques for Distributed Database Design Types of Distributed Database Systems Query Processing in Distributed Databases Overview of Concurrency Control and Recovery in Distributed Databases Distributed Databases in Oracle Chapter Outline
A number of processing elements that are interconnected by a computer network and cooperate in performing certain assigned tasks is called a distributed computing system. A distributed computing system partitions a big problem into smaller pieces and solves them efficiently in a coordinated manner. There are two types of multiprocessor systems: Shared memory (tightly coupled) architecture – processors share secondary storage (disk) and primary memory. Shared disk (loosely coupled) architecture – processors share secondary storage, but each has its own primary memory. Distributed Database Concepts
A Distributed Database (DDB) is a collection of multiple logically interrelated databases distributed over a computer network. A Distributed Database Management System (DDBMS) is a software system that manages a DDB. Database management systems developed using multiprocessor architectures are called parallel database management systems. In a shared nothing architecture, every processor has its own primary and secondary memory. Distributed Database Concepts
FIGURE 25.1 Some different database system architectures. (a) Shared nothing architecture. Distributed Database Concepts
FIGURE 25.1 (continued) Some different database system architectures.  (b) A networked architecture with a centralized database at one of the sites. Distributed Database Concepts
FIGURE 25.1 (continued) Some different database system architectures.  (c) A truly distributed database architecture. Distributed Database Concepts
Advantages of distributed databases: Management of distributed data with different levels of transparency: hiding the details of where each file (table, relation) is physically stored within the system. The possible types of transparency are: Distribution or network transparency – freedom for the user from the operational details of the network. It can be divided into location and naming transparency. Replication transparency – makes the user unaware of the existence of copies. Fragmentation transparency – makes the user unaware of the existence of fragments. Advantages of Distributed Databases
Increased reliability and availability: two of the most commonly cited potential advantages of a DDB: Reliability – the probability that a system is running (not down) at a certain time point. Availability – the probability that the system is continuously available during a time interval. Improved performance: a DDBMS fragments the database and keeps the data closer to where it is needed most (data localization reduces contention for CPU and I/O services). Easier expansion: adding more data, increasing database sizes, or adding more processors is much easier in a DDB. Advantages of Distributed Databases
FIGURE 25.2  Data distribution and replication among  distributed databases. Advantages of Distributed Databases
A DDBMS must provide the following functions: Keeping track of data: the ability to keep track of the data distribution, fragmentation, and replication. Distributed query processing: the ability to access remote sites via a communication network. Distributed transaction management: the ability to execute transactions that access data from many sites. Replicated data management: the ability to decide which copy of a replicated data item to access and to maintain the consistency of copies of replicated data items. Additional Functions of DDB
(cont.) Distributed database recovery: the ability to recover from individual site crashes (or communication failures). Security: proper management of the security of the data and the authorization/access privileges of users. Distributed directory (catalog) management: design and policy issues for the placement of the directory. Additional Functions of DDB
Fragmentation refers to the techniques used to break up a database into logical units, called fragments. Data replication permits certain data to be stored at more than one site. Allocation is the process of assigning fragments (or replicas of fragments) to sites for storage. Information concerning data fragmentation, allocation, and replication is stored in a global directory that is accessed by the DDB applications as needed. Data Fragmentation, Replication
In a DDB, decisions must be made regarding which site should contain which portion of the DB. A horizontal fragment of a relation is a subset of the tuples in the relation (e.g., the tuples with DNO=5). A vertical fragment of a relation keeps only certain attributes of the relation (e.g., SSN, SEX). A mixed (hybrid) fragment of a relation combines horizontal and vertical fragmentation. Data Fragmentation, Replication
A horizontal fragment of a relation R can be specified by a selection $\sigma_{C_i}(R)$. A set of horizontal fragments whose conditions $C_1, C_2, \ldots, C_n$ include all the tuples in R is called a complete horizontal fragmentation of R. A vertical fragment of a relation R can be specified by a projection $\pi_{L_i}(R)$. A set of vertical fragments whose projection lists $L_1, L_2, \ldots, L_n$ include all the attributes in R but share only the primary key attribute of R is called a complete vertical fragmentation of R. Data Fragmentation, Replication
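The selection and projection definitions above can be tried out on a toy relation. The following is a minimal Python sketch, assuming a relation is represented as a list of dictionaries; the sample rows and the helper names (`horizontal_fragment`, `vertical_fragment`, `is_complete_horizontal`, `is_complete_vertical`) are hypothetical and chosen only for illustration.

```python
# Toy EMPLOYEE relation as a list of dicts (rows are hypothetical sample data).
EMPLOYEE = [
    {"SSN": "111", "NAME": "Ann", "SEX": "F", "DNO": 5},
    {"SSN": "222", "NAME": "Bob", "SEX": "M", "DNO": 4},
    {"SSN": "333", "NAME": "Cy", "SEX": "M", "DNO": 5},
]


def horizontal_fragment(relation, condition):
    """Selection: keep the tuples that satisfy the condition C_i."""
    return [row for row in relation if condition(row)]


def vertical_fragment(relation, attributes):
    """Projection: keep only the attributes in the list L_i."""
    return [{a: row[a] for a in attributes} for row in relation]


def is_complete_horizontal(relation, conditions):
    """True if every tuple of R satisfies at least one condition C_i."""
    return all(any(c(row) for c in conditions) for row in relation)


def is_complete_vertical(all_attributes, attribute_lists, key):
    """True if the lists cover all attributes and pairwise share only the key."""
    covered = set().union(*attribute_lists) == set(all_attributes)
    share_only_key = all(
        set(a) & set(b) == {key}
        for i, a in enumerate(attribute_lists)
        for b in attribute_lists[i + 1:]
    )
    return covered and share_only_key


dept5 = horizontal_fragment(EMPLOYEE, lambda r: r["DNO"] == 5)
ssn_sex = vertical_fragment(EMPLOYEE, ["SSN", "SEX"])
```

Keeping the primary key in every vertical fragment is what allows the original tuples to be rebuilt by joining the fragments on that key.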
Replication is useful in improving the availability of data. In a fully replicated distributed database the whole database is replicated at every site. The opposite extreme from full replication is no replication (i.e., each fragment is stored at exactly one site). In partial replication of the data, some fragments of the database may be replicated whereas others may not. Data Fragmentation, Replication
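The three replication schemes can be told apart by looking at where each fragment is stored. The sketch below assumes an allocation is recorded as a mapping from fragment name to the set of sites holding a copy; the function name, fragment names, and site names are hypothetical.

```python
def replication_scheme(allocation, sites):
    """Classify an allocation given as {fragment: set of sites holding a copy}."""
    copies = [set(s) for s in allocation.values()]
    if all(c == set(sites) for c in copies):
        return "full replication"     # every fragment is stored at every site
    if all(len(c) == 1 for c in copies):
        return "no replication"       # each fragment is stored at exactly one site
    return "partial replication"      # some fragments are replicated, others not


SITES = {"site1", "site2", "site3"}
full = replication_scheme({"F1": SITES, "F2": SITES}, SITES)
none = replication_scheme({"F1": {"site2"}, "F2": {"site3"}}, SITES)
partial = replication_scheme({"F1": {"site1", "site2"}, "F2": {"site3"}}, SITES)
```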
FIGURE 25.3 Allocation of fragments to sites. (a) Relation fragments at site 2 corresponding to department 5. Data Fragmentation, Replication
FIGURE 25.3 (continued)  Allocation of fragments to sites.  (b) Relation fragments at site 3 corresponding to department 4. Data Fragmentation, Replication
FIGURE 25.4 Complete and disjoint fragments of the WORKS_ON relation. (a) Fragments of WORKS_ON for employees working in department 5 (C=[ESSN IN (SELECT SSN FROM EMPLOYEE WHERE DNO=5)]). Data Fragmentation, Replication
FIGURE 25.4 (continued) Complete and disjoint fragments of the WORKS_ON relation.  (b)  Fragments of WORKS_ON for employees working in department 4 (C=[ESSN IN (SELECT SSN FROM EMPLOYEE WHERE DNO=4)]). Data Fragmentation, Replication
FIGURE 25.4 (continued) Complete and disjoint fragments of the WORKS_ON relation. (c) Fragments of WORKS_ON for employees working in department 1 (C=[ESSN IN (SELECT SSN FROM EMPLOYEE WHERE DNO=1)]). Data Fragmentation, Replication
The degree of homogeneity of the DDBMS: Homogeneous DDBMS: all servers (or individual local DBMSs) use identical software and all users (clients) use identical software. Heterogeneous DDBMS: servers and/or users may use different software. The degree of local autonomy of the DDBMS: If there is no provision for the local site to function as a stand-alone DBMS, then the system has no local autonomy; otherwise (e.g., if direct access by local transactions to a server is permitted), the system has some degree of local autonomy. Types of Distributed Database Systems
In a  federated DDBMS , each server is an independent  and autonomous centralized DBMS that has its own local users, local transactions, and DBA (i.e., a very high level of local autonomy). In a  federated database system  (FDBS) there is some global view or schema of a federation of  databases that is shared by the applications. In a  multidatabase system  there exists no global schema. It interactively constructs one as needed by the application. Types of Distributed Database Systems
The types of heterogeneity present in FDBSs may arise from: Differences in Data Models: e.g., one server may be a relational DBMS, another an object or hierarchical DBMS. Differences in Constraints: e.g., facilities for specifying referential integrity constraints and triggers vary from system to system. Differences in Query Languages: e.g., even with the same data model, languages and their versions vary; SQL has multiple versions such as SQL-89, SQL-92, and SQL-99. Types of Distributed Database Systems
Communication autonomy refers to the ability of a component DBS to decide whether to communicate with another component DBS. Execution autonomy refers to the ability of a component DBS to execute local operations without interference from external operations by other component DBSs (and to decide the order in which to execute them). Association autonomy refers to the ability of a component DBS to decide whether and how much to share its functionality and resources with other component DBSs. Types of Distributed Database Systems
FIGURE 25.5 The five-level schema  architecture in a  federated database  system (FDBS).  Source:  Adapted from  Sheth and Larson,  Federated Database  Systems for Managing  Distributed Heterogeneous  Autonomous Databases .  ACM Computing Surveys   (Vol. 22: No. 3,  September 1990). Types of Distributed Database Systems
Suppose that the EMPLOYEE and the DEPARTMENT relations are distributed. The query $\pi_{\text{FNAME, LNAME, DNAME}}(\text{EMPLOYEE} \bowtie_{\text{DNO}=\text{DNUMBER}} \text{DEPARTMENT})$ may be executed using one of the following strategies: (1) Transfer both the EMPLOYEE and the DEPARTMENT relations to the result site and execute the query there. (2) Transfer the EMPLOYEE relation to site A (where the DEPARTMENT relation is located), execute the query, and send the output to the result site. (3) Transfer the DEPARTMENT relation to site B (where the EMPLOYEE relation is located), execute the query, and send the output to the result site. Query Processing in Distributed Databases
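The three strategies differ in how many bytes cross the network, which is easy to compare with a small calculation. The sketch below assumes purely illustrative sizes (10,000 EMPLOYEE records of 100 bytes, 100 DEPARTMENT records of 35 bytes, and a result of 10,000 records of 40 bytes); these figures and the function name `transfer_costs` are assumptions and are not stated in the text above.

```python
def transfer_costs(emp_bytes, dept_bytes, result_bytes):
    """Bytes shipped over the network under each of the three strategies."""
    return {
        # Both relations are shipped to the result site.
        "ship both to result site": emp_bytes + dept_bytes,
        # EMPLOYEE goes to site A, then the result goes to the result site.
        "ship EMPLOYEE to site A": emp_bytes + result_bytes,
        # DEPARTMENT goes to site B, then the result goes to the result site.
        "ship DEPARTMENT to site B": dept_bytes + result_bytes,
    }


# Assumed sizes, for illustration only.
EMP_BYTES = 10_000 * 100     # 1,000,000 bytes
DEPT_BYTES = 100 * 35        # 3,500 bytes
RESULT_BYTES = 10_000 * 40   # 400,000 bytes

costs = transfer_costs(EMP_BYTES, DEPT_BYTES, RESULT_BYTES)
cheapest = min(costs, key=costs.get)
```

Under these assumed sizes, shipping the small DEPARTMENT relation to the site that already holds EMPLOYEE moves the least data; with other sizes a different strategy may win.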
FIGURE 25.6 Examples to illustrate volume of data transferred. Query Processing in Distributed Databases
A DDBMS that supports full distribution, fragmentation, and replication transparency allows the query decomposition module to break up, or decompose, a query into subqueries that can be executed at the individual sites. For horizontal fragmentation, a selection condition (called a guard) specifies which tuples exist in the fragment. For vertical fragmentation, the attribute list for each fragment is kept in the catalog. Query Processing in Distributed Databases
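One way to picture how guards are used is as a filter over the catalog: a subquery is sent only to sites whose fragment could contain matching tuples. The following is a simplified sketch, assuming guards of the form DNO = value; the catalog layout, the fragment names `EMPD5` and `EMPD4`, and the function `decompose` are hypothetical.

```python
# Hypothetical catalog: each horizontal fragment of EMPLOYEE records its site
# and the DNO value admitted by its guard condition (guard: DNO = value).
CATALOG = [
    {"fragment": "EMPD5", "site": 2, "guard_dno": 5},
    {"fragment": "EMPD4", "site": 3, "guard_dno": 4},
]


def decompose(query_dno):
    """Return (site, subquery) pairs for a query selecting DNO = query_dno."""
    plan = []
    for entry in CATALOG:
        # Skip fragments whose guard contradicts the query's selection.
        if entry["guard_dno"] == query_dno:
            plan.append((entry["site"], f"SELECT * FROM {entry['fragment']}"))
    return plan


plan_for_dept5 = decompose(5)
plan_for_dept1 = decompose(1)
```

A real decomposition module would handle arbitrary predicates and vertical fragments as well; this sketch only shows the pruning idea.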
FIGURE 25.7 Guard conditions and attribute lists for fragments. (a) Site 2 fragments. Query Processing in Distributed Databases
FIGURE 25.7 (continued) Guard conditions and attribute lists for fragments. (b) Site 3 fragments. Query Processing in Distributed Databases
A DDBMS environment raises concurrency control and recovery problems that are not encountered in a centralized DBMS environment: Dealing with multiple copies of the data items, Failure of individual sites, Failure of communication links, Distributed commit, Distributed deadlock. Overview of Concurrency Control and Recovery
Distributed concurrency control based on a distinguished copy designates a particular copy of each data item. The locks for this data item are associated with the distinguished copy, and all locking and unlocking requests are sent to the site that contains that copy. If all distinguished copies are kept at a single site, the method is called the primary site technique; otherwise it is called the primary copy technique. A site that holds a distinguished copy of a data item acts as the coordinator site for concurrency control on that item. Overview of Concurrency Control and Recovery
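The routing of lock requests to the coordinator site can be sketched in a few lines. The example below is a simplified illustration that assumes exclusive locks only and ignores queuing, deadlock, and site failure; the class names `LockSite` and `DistinguishedCopyRouter` are hypothetical.

```python
class LockSite:
    """A site holding distinguished copies; grants exclusive locks on them."""

    def __init__(self, name):
        self.name = name
        self.locks = {}  # data item -> transaction currently holding the lock

    def lock(self, item, txn):
        holder = self.locks.get(item)
        if holder is None or holder == txn:
            self.locks[item] = txn
            return True
        return False  # another transaction holds the lock

    def unlock(self, item, txn):
        if self.locks.get(item) == txn:
            del self.locks[item]


class DistinguishedCopyRouter:
    """Sends every lock/unlock request to the item's coordinator site."""

    def __init__(self, coordinator_of):
        self.coordinator_of = coordinator_of  # data item -> LockSite

    def lock(self, item, txn):
        return self.coordinator_of[item].lock(item, txn)

    def unlock(self, item, txn):
        self.coordinator_of[item].unlock(item, txn)

    def technique(self):
        """Primary site if one site coordinates all items, else primary copy."""
        sites = {site.name for site in self.coordinator_of.values()}
        return "primary site" if len(sites) == 1 else "primary copy"


site1, site2 = LockSite("site1"), LockSite("site2")
router = DistinguishedCopyRouter({"X": site1, "Y": site2})
```

With `X` coordinated by `site1` and `Y` by `site2`, `router.technique()` reports the primary copy technique; mapping both items to `site1` would make it the primary site technique.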
The Oracle system is divided into two parts: A front-end, the client portion, which interacts with the user. The client has no data access responsibility and merely handles the requesting, processing, and presentation of data managed by the server. A back-end, the server portion, which runs Oracle and handles the functions related to concurrent shared access. Oracle uses a two-phase commit protocol to deal with concurrent distributed transactions. Distributed Databases in Oracle
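The general idea of a two-phase commit can be sketched as a vote followed by a decision. The following is a generic illustration and not Oracle's implementation; it assumes each participant is represented by a callable that returns its vote, and it omits logging, timeouts, and failure handling. The function and participant names are hypothetical.

```python
def two_phase_commit(participants):
    """Run a simplified two-phase commit.

    participants: dict mapping a site name to a callable that returns
    True (ready to commit) or False (cannot commit).
    """
    # Phase 1 (voting): the coordinator asks every participant to prepare.
    votes = {name: prepare() for name, prepare in participants.items()}
    # Phase 2 (decision): commit only if all participants voted ready.
    decision = "COMMIT" if all(votes.values()) else "ABORT"
    return decision, votes


all_ready = two_phase_commit({"site1": lambda: True, "site2": lambda: True})
one_refuses = two_phase_commit({"site1": lambda: True, "site2": lambda: False})
```

The key property shown is atomicity across sites: a single negative vote forces every site to abort.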
All Oracle databases in a DDBS use Oracle's networking software, Net8, which allows databases to communicate across networks to support remote and distributed transactions. Oracle supports database links that define a one-way communication path from one Oracle database to another. For example, CREATE DATABASE LINK sales.us.americas; establishes a connection to the sales database under the network domain us, which in turn comes under the domain americas. Distributed Databases in Oracle
Data in an Oracle DDBS can be replicated using snapshots or replicated master tables. Replication is provided at the following levels: Basic replication: replicas of tables are managed for read-only access. For updates, data must be accessed at a single primary site. Advanced (symmetric) replication: applications can update table replicas throughout a replicated DDBS. Data can be read and updated at any site. Distributed Databases in Oracle
FIGURE 25.9 Oracle distributed database systems. Source: From Oracle (1997a). Copyright © Oracle Corporation 1997. All rights reserved. Distributed Databases in Oracle
