Chapter25

Distributed Databases and Client-Server Architectures Chapter 25

Chapter Outline
1. Distributed Database Concepts
2. Data Fragmentation, Replication, and Allocation Techniques for Distributed Database Design
3. Types of Distributed Database Systems
4. Query Processing in Distributed Databases
5. Overview of Concurrency Control and Recovery in Distributed Databases
6. Distributed Databases in Oracle

A number of processing elements that are interconnected by a computer network and cooperate in performing certain assigned tasks is called a distributed computing system. A distributed computing system partitions a big problem into smaller pieces and solves it efficiently in a coordinated manner. There are two types of multiprocessor systems: Shared memory (tightly coupled) architecture – processors share secondary storage (disk) and primary memory. Shared disk (loosely coupled) architecture – processors share secondary storage, but each has its own primary memory. Distributed Database Concepts

A Distributed Database (DDB) is a collection of multiple logically interrelated databases distributed over a computer network. A Distributed Database Management System (DDBMS) refers to a software system that manages a DDB. Database management systems developed using multiprocessor architectures are called parallel database management systems. In shared nothing architecture every processor has its own primary and secondary memory. Distributed Database Concepts

FIGURE 25.1 Some different database system architectures. (a) Shared nothing architecture. Distributed Database Concepts

FIGURE 25.1 (continued) Some different database system architectures. (b) A networked architecture with a centralized database at one of the sites. Distributed Database Concepts

FIGURE 25.1 (continued) Some different database system architectures. (c) A truly distributed database architecture. Distributed Database Concepts

Advantages of distributed databases: Management of distributed data with different levels of transparency: hiding the details of where each file (table, relation) is physically stored within the system. The possible types of transparency are: Distribution or network transparency – freedom for the user from the operational details of the network. It can be divided into location and naming transparency. Replication transparency – makes the user unaware of the existence of copies. Fragmentation transparency – makes the user unaware of the existence of fragments. Advantages of Distributed Databases

Increased reliability and availability: two of the most common potential advantages cited for DDBs: Reliability – the probability that a system is running (not down) at a certain time point. Availability – the probability that the system is continuously available during a time interval. Improved performance: a DDBMS fragments the database by keeping the data closer to where it is needed most (data localization reduces contention for CPU and I/O services). Easier expansion: adding more data, increasing database sizes, or adding more processors is much easier in a DDB. Advantages of Distributed Databases

FIGURE 25.2 Data distribution and replication among distributed databases. Advantages of Distributed Databases

A DDBMS must provide the following functions: Keeping track of data: the ability to keep track of the data distribution, fragmentation, and replication. Distributed query processing: the ability to access remote sites via a communication network. Distributed transaction management: the ability to execute transactions that access data from many sites. Replicated data management: the ability to decide which copy of a replicated data item to access and to maintain the consistency of copies of replicated data items. Additional Functions of DDB

(cont.) Distributed database recovery: the ability to recover from individual site crashes (or communication failure), Security: proper management of the security of the data and the authorization/access privileges of users, Distributed directory (catalog) management: design and policy issues for the placement of directory. Additional Functions of DDB

Fragmentation refers to the techniques used to break up a database into logical units, called fragments. Data replication permits certain data to be stored at more than one site. Allocation is the process of assigning fragments, or replicas of fragments, to sites for storage. Information concerning data fragmentation, allocation, and replication is stored in a global directory that is accessed by the DDB applications as needed. Data Fragmentation, Replication

In a DDB, decisions must be made regarding which site should contain which portion of the DB. A horizontal fragment of a relation is a subset of the tuples in the relation (e.g., the tuples with DNO=5). A vertical fragment of a relation keeps only certain attributes of a relation (e.g., SSN, SEX). A mixed (hybrid) fragment of a relation combines horizontal and vertical fragmentation. Data Fragmentation, Replication

A horizontal fragment of a relation R can be specified by a selection $\sigma_{C_i}(R)$. A set of horizontal fragments whose conditions $C_1, C_2, \ldots, C_n$ include all the tuples in R is called a complete horizontal fragmentation of R. A vertical fragment of a relation R can be specified by a projection $\pi_{L_i}(R)$. A set of vertical fragments whose projection lists $L_1, L_2, \ldots, L_n$ include all the attributes in R but share only the primary key attribute of R is called a complete vertical fragmentation of R. Data Fragmentation, Replication
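The definitions above can be tried out on a toy relation. The following is a minimal sketch, not a DDBMS implementation: the tuples, the attribute NAME, and all function names are made up for illustration. It builds horizontal fragments by selection and vertical fragments by projection, then checks the two completeness conditions.

```python
# Illustrative EMPLOYEE tuples; the values are invented for this sketch.
EMPLOYEE = [
    {"SSN": "111", "NAME": "Smith", "SEX": "M", "DNO": 5},
    {"SSN": "222", "NAME": "Wong", "SEX": "M", "DNO": 5},
    {"SSN": "333", "NAME": "Zelaya", "SEX": "F", "DNO": 4},
    {"SSN": "444", "NAME": "Borg", "SEX": "M", "DNO": 1},
]


def select(relation, condition):
    """Horizontal fragment: the tuples of `relation` that satisfy `condition`."""
    return [t for t in relation if condition(t)]


def project(relation, attrs):
    """Vertical fragment: keep only the attributes listed in `attrs`."""
    return [{a: t[a] for a in attrs} for t in relation]


def is_complete_horizontal(relation, conditions):
    """Every tuple must satisfy at least one fragment condition."""
    return all(any(c(t) for c in conditions) for t in relation)


def is_complete_vertical(attributes, key, attr_lists):
    """The lists must cover all attributes and pairwise share only the key."""
    covers = set().union(*attr_lists) == set(attributes)
    shares_only_key = all(
        set(a) & set(b) == {key}
        for i, a in enumerate(attr_lists)
        for b in attr_lists[i + 1:]
    )
    return covers and shares_only_key


def join_on_key(left, right, key):
    """Rebuild full tuples from two vertical fragments by matching on the key."""
    index = {t[key]: t for t in right}
    return [{**t, **index[t[key]]} for t in left]


# One selection condition per department.
conditions = [
    lambda t: t["DNO"] == 5,
    lambda t: t["DNO"] == 4,
    lambda t: t["DNO"] == 1,
]
horizontal = [select(EMPLOYEE, c) for c in conditions]

# Two projection lists that share only the primary key SSN.
attr_lists = [["SSN", "NAME", "DNO"], ["SSN", "SEX"]]
vertical = [project(EMPLOYEE, attrs) for attrs in attr_lists]
```

In this sketch the original relation can be recovered from the horizontal fragments by taking their union, and from the vertical fragments with `join_on_key(vertical[0], vertical[1], "SSN")`.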

Replication is useful in improving the availability of data. In a fully replicated distributed database, the whole database is replicated at every site. The other extreme from full replication is no replication (i.e., each fragment is stored at exactly one site). In partial replication of the data, some fragments of the database may be replicated whereas others may not. Data Fragmentation, Replication
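The three replication schemes can be told apart by looking at an allocation table. The sketch below assumes a hypothetical representation in which each fragment maps to the set of sites holding a copy. The fragment names F1 to F3 and both function names are illustrative only. It also shows why replication helps availability: a fragment stays readable as long as one site holding a copy is up.

```python
def replication_kind(allocation, sites):
    """Classify an allocation given as {fragment: set of sites holding a copy}."""
    all_sites = set(sites)
    if all(set(holders) == all_sites for holders in allocation.values()):
        return "full"      # every fragment is stored at every site
    if all(len(holders) == 1 for holders in allocation.values()):
        return "none"      # every fragment is stored at exactly one site
    return "partial"       # some fragments are replicated, others are not


def available_fragments(allocation, failed_sites):
    """Fragments that still have a copy at some site that has not failed."""
    failed = set(failed_sites)
    return {f for f, holders in allocation.items() if set(holders) - failed}


sites = [1, 2, 3]
full = {"F1": {1, 2, 3}, "F2": {1, 2, 3}}
none = {"F1": {1}, "F2": {2}}
partial = {"F1": {1, 2}, "F2": {1, 3}, "F3": {1}}
```

With the `partial` allocation, a failure of site 1 leaves F1 and F2 readable at sites 2 and 3, while F3 becomes unavailable because its only copy was at the failed site.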

FIGURE 25.3 Allocation of fragments to sites. (a) Relation fragments at site 2 corresponding to department 5. Data Fragmentation, Replication

FIGURE 25.3 (continued) Allocation of fragments to sites. (b) Relation fragments at site 3 corresponding to department 4. Data Fragmentation, Replication

FIGURE 25.4 Complete and disjoint fragments of the WORKS_ON relation. (a) Fragments of WORKS_ON for employees working in department 5 (C=[ESSN IN (SELECT SSN FROM EMPLOYEE WHERE DNO=5)]). Data Fragmentation, Replication

FIGURE 25.4 (continued) Complete and disjoint fragments of the WORKS_ON relation. (b) Fragments of WORKS_ON for employees working in department 4 (C=[ESSN IN (SELECT SSN FROM EMPLOYEE WHERE DNO=4)]). Data Fragmentation, Replication

FIGURE 25.4 (continued) Complete and disjoint fragments of the WORKS_ON relation. (c) Fragments of WORKS_ON for employees working in department 1 (C=[ESSN IN (SELECT SSN FROM EMPLOYEE WHERE DNO=1)]). Data Fragmentation, Replication

The degree of homogeneity of the DDBMS: Homogeneous DDBMS: all servers (or individual local DBMSs) use identical software and all users (clients) use identical software. Heterogeneous DDBMS: servers and/or users may use different software. The degree of local autonomy of the DDBMS: If there is no provision for the local site to function as a stand-alone DBMS, then the system has no local autonomy; otherwise (e.g., if direct access by local transactions to a server is permitted), the system has some degree of local autonomy. Types of Distributed Database Systems

In a federated DDBMS, each server is an independent and autonomous centralized DBMS that has its own local users, local transactions, and DBA (i.e., a very high level of local autonomy). In a federated database system (FDBS) there is some global view or schema of the federation of databases that is shared by the applications. In a multidatabase system there is no global schema; one is constructed interactively as the application needs it. Types of Distributed Database Systems

The types of heterogeneity present in FDBSs may arise from: Differences in Data Models: e.g., one server may be a relational DBMS, another an object or hierarchical DBMS. Differences in Constraints: e.g., facilities for specifying referential integrity constraints and triggers vary from system to system. Differences in Query Languages: even with the same data model, the languages and their versions vary (e.g., SQL has multiple versions such as SQL-89, SQL-92, and SQL-99). Types of Distributed Database Systems

Communication autonomy refers to the ability to decide whether to communicate with another component DBS. Execution autonomy refers to the ability of a component DBS to execute local operations without interference from external operations by other component DBSs (and to decide the order in which to execute them). Association autonomy refers to the ability to decide whether and how much to share its functionality and resources with other component DBSs. Types of Distributed Database Systems

FIGURE 25.5 The five-level schema architecture in a federated database system (FDBS). Source: Adapted from Sheth and Larson, Federated Database Systems for Managing Distributed Heterogeneous Autonomous Databases . ACM Computing Surveys (Vol. 22: No. 3, September 1990). Types of Distributed Database Systems

Suppose that the EMPLOYEE and the DEPARTMENT relations are distributed. The query $\pi_{\text{FNAME, LNAME, DNAME}}(\text{EMPLOYEE} \bowtie_{\text{DNO=DNUMBER}} \text{DEPARTMENT})$ may be executed using one of the following strategies: (1) Transfer both the EMPLOYEE and the DEPARTMENT relations to the result site and perform the join there. (2) Transfer the EMPLOYEE relation to site A (where the DEPARTMENT relation is located), execute the query there, and send the result to the result site. (3) Transfer the DEPARTMENT relation to site B (where the EMPLOYEE relation is located), execute the query there, and send the result to the result site. Query Processing in Distributed Databases
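A common criterion for choosing among these strategies is the volume of data transferred over the network. The sketch below compares the three strategies under assumed relation sizes. The record counts and record lengths are illustrative assumptions, not values taken from this document, and the result site is assumed to be distinct from sites A and B.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    name: str
    records: int
    record_bytes: int

    @property
    def size(self):
        """Total bytes that must be shipped to move the whole relation."""
        return self.records * self.record_bytes


# Assumed sizes for illustration only.
employee = Relation("EMPLOYEE", 10_000, 100)
department = Relation("DEPARTMENT", 100, 35)
# Assumes every employee belongs to one department, so the join
# produces one (FNAME, LNAME, DNAME) row per employee.
result = Relation("RESULT", 10_000, 40)

# Bytes transferred by each strategy.
strategies = {
    "ship both relations to the result site": employee.size + department.size,
    "ship EMPLOYEE to site A, then ship the result": employee.size + result.size,
    "ship DEPARTMENT to site B, then ship the result": department.size + result.size,
}

best = min(strategies, key=strategies.get)
```

Under these assumptions the third strategy moves the least data (403,500 bytes, against 1,003,500 and 1,400,000), because it ships only the small DEPARTMENT relation and the result. With different sizes, or with the result needed at site A or B, the ranking can change.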

FIGURE 25.6 Examples to illustrate volume of data transferred. Query Processing in Distributed Databases

A DDBMS that supports full distribution, fragmentation, and replication transparency allows the query decomposition module to break up, or decompose, a query into subqueries that can be executed at the individual sites. For horizontal fragmentation, a selection condition (called a guard) specifies which tuples exist in the fragment. For vertical fragmentation, the attribute list for each fragment is kept in the catalog. Query Processing in Distributed Databases
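One way to picture how guards and attribute lists drive decomposition is the sketch below. It is a simplification under stated assumptions: guards are limited to attribute = value equalities, the catalog layout and the fragment names EMP_D5 and EMP_D4 are hypothetical, and a real decomposition module would also handle joins, replicas, and vertical reassembly.

```python
# Hypothetical catalog entries: each fragment records its site, its guard
# (restricted here to attribute = value equalities), and its attribute list.
CATALOG = [
    {"name": "EMP_D5", "site": 2, "guard": {"DNO": 5},
     "attrs": {"SSN", "FNAME", "LNAME", "DNO"}},
    {"name": "EMP_D4", "site": 3, "guard": {"DNO": 4},
     "attrs": {"SSN", "FNAME", "LNAME", "DNO"}},
]


def guard_allows(guard, condition):
    """False only when the guard and the query condition fix the same
    attribute to different values, so the fragment cannot hold a match."""
    return all(condition.get(attr, value) == value
               for attr, value in guard.items())


def decompose(catalog, attrs, condition):
    """Return one subquery per fragment that may hold qualifying tuples
    and stores every requested attribute."""
    return [
        {"site": f["site"], "fragment": f["name"],
         "attrs": sorted(attrs), "condition": dict(condition)}
        for f in catalog
        if guard_allows(f["guard"], condition) and set(attrs) <= f["attrs"]
    ]


# A query restricted to department 5 only needs the fragment at site 2.
only_d5 = decompose(CATALOG, ["FNAME", "LNAME"], {"DNO": 5})
# A query with no department restriction must visit both fragments.
everyone = decompose(CATALOG, ["FNAME", "LNAME"], {})
```

The benefit of keeping guards in the catalog is that fragments whose guard contradicts the query condition are pruned before any subquery is sent, so no network traffic is spent on sites that cannot contribute tuples.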

FIGURE 25.7 Guard conditions and attributes lists for fragments. Site 2 fragments. Query Processing in Distributed Databases

FIGURE 25.7 (continued) Guard conditions and attributes lists for fragments. Site 3 fragments. Query Processing in Distributed Databases

Concurrency control and recovery in a DDBMS environment raise problems that are not encountered in a centralized DBMS environment: (1) dealing with multiple copies of the data items, (2) failure of individual sites, (3) failure of communication links, (4) distributed commit, and (5) distributed deadlock. Overview of Concurrency Control and Recovery

Distributed concurrency control based on a distinguished copy designates a particular copy of each data item as the distinguished copy. The locks for the data item are associated with the distinguished copy, and all locking and unlocking requests are sent to the site that contains that copy. If all distinguished copies are kept at a single site, the method is called the primary site technique; otherwise it is called the primary copy technique. A site that holds the distinguished copy of a data item acts as the coordinator site for concurrency control on that item. Overview of Concurrency Control and Recovery
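The routing of lock requests under the distinguished copy approach can be sketched as follows. This is a simplified illustration, not a full lock manager: it supports only exclusive locks, ignores waiting queues, site failures, and message passing, and all class and method names are hypothetical.

```python
class LockManager:
    """Exclusive locks kept at one site, for the items whose
    distinguished copy is stored there."""

    def __init__(self):
        self.holder = {}  # item -> id of the transaction holding the lock

    def lock(self, item, txn):
        # Grant the lock if the item is free or already held by `txn`.
        if self.holder.get(item, txn) != txn:
            return False
        self.holder[item] = txn
        return True

    def unlock(self, item, txn):
        # Only the current holder may release the lock.
        if self.holder.get(item) == txn:
            del self.holder[item]


class DistinguishedCopyLocking:
    """Routes each lock request to the site holding the distinguished copy."""

    def __init__(self, distinguished_site):
        self.distinguished_site = dict(distinguished_site)  # item -> site
        self.managers = {
            site: LockManager()
            for site in set(self.distinguished_site.values())
        }

    def technique(self):
        # One site for all distinguished copies means primary site.
        return "primary site" if len(self.managers) == 1 else "primary copy"

    def coordinator(self, item):
        return self.distinguished_site[item]

    def lock(self, item, txn):
        return self.managers[self.coordinator(item)].lock(item, txn)

    def unlock(self, item, txn):
        self.managers[self.coordinator(item)].unlock(item, txn)


# Distinguished copies spread over two sites: the primary copy technique.
cc = DistinguishedCopyLocking({"X": 1, "Y": 2})
```

For example, `cc.lock("X", "T1")` is handled by site 1 and succeeds, a later `cc.lock("X", "T2")` is refused until T1 calls `cc.unlock("X", "T1")`, and requests for Y go to site 2 without involving site 1.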

The Oracle system is divided into two parts: A front end as the client portion, which interacts with the user. The client has no data access responsibility and merely handles the requesting, processing, and presentation of data managed by the server. A back end as the server portion, which runs Oracle and handles the functions related to concurrent shared access. Oracle uses a two-phase commit protocol to deal with concurrent distributed transactions. Distributed Databases in Oracle
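The general idea of a two-phase commit can be sketched in a few lines. The code below is a generic textbook-style illustration and does not describe Oracle's actual implementation. The class and function names are hypothetical, and logging, timeouts, and recovery from coordinator or participant failure are omitted.

```python
class Participant:
    """A site taking part in a distributed transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        # Phase 1: vote yes only if the local work can be made durable.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Coordinator logic: commit everywhere or abort everywhere."""
    # Phase 1 (voting): ask every participant to prepare and collect votes.
    votes = [p.prepare() for p in participants]

    # Phase 2 (decision): commit only if all votes were yes.
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"

    for p in participants:
        p.abort()
    return "aborted"
```

The point of the two phases is atomicity across sites: no participant commits until every participant has promised that it can, so a single "no" vote aborts the transaction at all sites.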

All Oracle databases in a DDBS use Oracle’s networking software, Net8, which allows databases to communicate across networks to support remote and distributed transactions. Oracle supports database links that define a one-way communication path from one Oracle database to another. For example, CREATE DATABASE LINK sales.us.americas; establishes a connection to the sales database in the network domain us, which in turn belongs to the domain americas. Distributed Databases in Oracle

Data in an Oracle DDBS can be replicated using snapshots or replicated master tables. Replication is provided at the following levels: Basic replication: replicas of tables are managed for read-only access. For updates, data must be accessed at a single primary site. Advanced (symmetric) replication: applications can update table replicas throughout a replicated DDBS. Data can be read and updated at any site. Distributed Databases in Oracle
