Faculty of Degree Engineering - 083
Department of CE/IT (07/16)
Enrollment no: _____________________ Seat No: _____________________
B.E. – SEMESTER – VII PRE-FINAL EXAMINATION – 2016 (ODD)
Subject Code: 2170714 Date: 17-10-2016
Subject Name: Distributed DBMS Time: 10:00 AM To 12:30 PM
Duration: 2:30 hr. Total Marks: 70
Instructions:
1. Attempt all questions.
2. Make suitable assumptions where necessary.
3. Figures to the right indicate full marks.
Q. 1 (A) Explain the potential problems with DDBMS. [07]
A. 1 (A)
1.6.1 Distributed Database Design
The question that is being addressed is how the database and the applications that run against it should be placed across the sites. There are two basic alternatives to placing data: partitioned (or non-replicated) and replicated. In the partitioned scheme the database is divided into a number of disjoint partitions, each of which is placed at a different site. Replicated designs can be either fully replicated (also called fully duplicated), where the entire database is stored at each site, or partially replicated (or partially duplicated), where each partition of the database is stored at more than one site, but not at all the sites. The two fundamental design issues are fragmentation, the separation of the database into partitions called fragments, and distribution, the optimum distribution of fragments.
The research in this area mostly involves mathematical programming in order to minimize the combined cost of storing the database, processing transactions against it, and message communication among sites. The general problem is NP-hard. Therefore, the proposed solutions are based on heuristics.
1.6.2 Distributed Directory Management
A directory contains information (such as descriptions and locations) about data items in the database. Problems related to directory management are similar in nature to the database placement problem discussed in the preceding section. A directory may be global to the entire DDBS or local to each site; it can be centralized at one site or distributed over several sites; there can be a single copy or multiple copies.
1.6.3 Distributed Query Processing
Query processing deals with designing algorithms that analyze queries and convert them into a series of data manipulation operations. The problem is how to decide on a strategy for executing each query over the network in the most cost-effective way, however cost is defined. The factors to be considered are the distribution of data, communication costs, and lack of sufficient locally available information. The objective is to optimize where the inherent parallelism is used to improve the performance of executing the transaction, subject to the above-mentioned constraints. The problem is NP-hard in nature, and the approaches are usually heuristic.
1.6.4 Distributed Concurrency Control
Concurrency control involves the synchronization of accesses to the distributed database, such that the integrity of the database is maintained. It is, without any doubt, one of the most extensively studied problems in the DDBS field. The concurrency control problem in a distributed context is somewhat different than in a centralized framework. One not only has to worry about the integrity of a single database, but also about the consistency of multiple copies of the database. The condition that requires all the values of multiple copies of every data item to converge to the same value is called mutual consistency.
1.6.5 Distributed Deadlock Management
The deadlock problem in DDBSs is similar in nature to that encountered in operating
systems. The competition among users for access to a set of resources (data, in this
case) can result in a deadlock if the synchronization mechanism is based on locking.
The well-known alternatives of prevention, avoidance, and detection/recovery also
apply to DDBSs.
1.6.6 Reliability of Distributed DBMS
We mentioned earlier that one of the potential advantages of distributed systems is improved reliability and availability. This, however, is not a feature that comes automatically. It is important that mechanisms be provided to ensure the consistency of the database as well as to detect failures and recover from them. The implication for DDBSs is that when a failure occurs and various sites become either inoperable or inaccessible, the databases at the operational sites remain consistent and up to date. Furthermore, when the computer system or network recovers from the failure, the DDBSs should be able to recover and bring the databases at the failed sites up-to-date. This may be especially difficult in the case of network partitioning, where the sites are divided into two or more groups with no communication among them.
1.6.7 Replication
If the distributed database is (partially or fully) replicated, it is necessary to implement protocols that ensure the consistency of the replicas, i.e., that copies of the same data item have the same value. These protocols can be eager, in that they force the updates to be applied to all the replicas before the transaction completes, or they may be lazy, so that the transaction updates one copy (called the master) from which updates are propagated to the others after the transaction completes.
1.6.8 Relationship among Problems
The design of distributed databases affects many areas. It affects directory management, because the definition of fragments and their placement determine the contents of the directory (or directories) as well as the strategies that may be employed to manage them. The same information (i.e., fragment structure and placement) is used by the query processor to determine the query evaluation strategy. On the other hand, the access and usage patterns that are determined by the query processor are used as inputs to the data distribution and fragmentation algorithms. Similarly, directory placement and contents influence the processing of queries.
There is a strong relationship among the concurrency control problem, the deadlock management problem, and reliability issues. This is to be expected, since together they are usually called the transaction management problem. The concurrency control algorithm that is employed will determine whether or not a separate deadlock management facility is required. If a locking-based algorithm is used, deadlocks will occur, whereas they will not if time stamping is the chosen alternative.
Q (B) Explain RAID Level Recovery Technique. [07]
A (B) RAID, or Redundant Array of Independent Disks, is a technology to connect multiple secondary storage devices and use them as a single storage medium. RAID consists of an array of disks in which multiple disks are connected together to achieve different goals. RAID levels define the use of disk arrays.
RAID 0
In this level, a striped array of disks is implemented. The data is broken down into blocks and the blocks are distributed among disks. Each disk receives a block of data to write/read in parallel. It enhances the speed and performance of the storage device. There is no parity and backup in Level 0.
RAID 1
RAID 1 uses mirroring techniques. When data is sent to a RAID controller, it sends a copy of the data to all the disks in the array. RAID level 1 is also called mirroring and provides 100% redundancy in case of a failure.
RAID 2
• RAID Level 2 uses the concept of a parallel access technique. It works at the word (byte) level, so each strip stores one bit. It takes data striping to the extreme, writing only 1 bit per strip instead of arbitrary-size blocks. For this reason, it requires a minimum of 8 surfaces to write data to the hard disk.
• In RAID level 2, strips are very small, so when a block is read, all disks are accessed in parallel.
• Hamming code generation is time consuming, therefore RAID level 2 is too slow for most commercial applications.
RAID 3
RAID 3 stripes the data onto multiple disks. The parity bit generated for a data word is stored on a different disk. This technique makes it possible to recover from single disk failures.
RAID 4
In this level, an entire block of data is written onto data disks and then the parity is generated and stored on a different disk. Note that level 3 uses byte-level striping, whereas level 4 uses block-level striping. Both level 3 and level 4 require at least three disks to implement RAID.
RAID 5
RAID 5 writes whole data blocks onto different disks, but the parity bits generated for a data block stripe are distributed among all the data disks rather than being stored on a different dedicated disk.
RAID 6
RAID 6 is an extension of level 5. In this level, two independent parities are generated and stored in distributed fashion among multiple disks. Two parities provide additional fault tolerance. This level requires at least four disk drives to implement RAID.
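The recovery behind the parity-based levels (RAID 3-5) is simple XOR arithmetic: the parity block is the XOR of the data blocks in a stripe, so any one lost block can be rebuilt from the survivors. Below is a minimal Python sketch with made-up block contents; no real controller works on Python byte strings like this.

    # Minimal sketch: XOR parity across one stripe, as used conceptually
    # by RAID 4/5. Block contents here are invented byte strings.
    from functools import reduce

    def xor_blocks(blocks):
        # XOR a list of equally sized byte blocks together
        return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

    stripe = [b"\x01\x02", b"\x0a\x0b", b"\xff\x00"]   # data blocks on 3 disks
    parity = xor_blocks(stripe)                        # stored on a 4th disk

    # Disk holding stripe[1] fails: rebuild its block from survivors + parity.
    rebuilt = xor_blocks([stripe[0], stripe[2], parity])
    assert rebuilt == stripe[1]                        # lost block recovered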
Q. 2 (A) What is Concurrency? List out methods of Concurrency control and explain any one of them. [07]
A. 2 (A) Concurrency: In computer science, concurrency is the execution of several instruction sequences at the same time.
• In a distributed database system, the database is typically used by many users. These systems usually allow multiple transactions to run concurrently at the same time.
• It must support parallel execution of transactions.
• Communication delay is less.
• It must recover from site and communication failures.
Methods of concurrency control
Locking-Based Concurrency Control
• The main idea of locking-based concurrency control is to ensure that a data item that is shared by conflicting operations is accessed by one operation at a time.
• This lock is set by a transaction before the item is accessed and is reset at the end of its use.
• There are two types of locks: read lock (rl) and write lock (wl); a sketch of a simple 2PL lock manager is given below.
Locking-Based Concurrency Control Algorithms
2PL Lock Graph
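As referenced above, here is a minimal sketch of a strict two-phase locking (2PL) lock manager in Python. The LockManager class, its compatibility rule, and the transaction names are invented for illustration; they are not any particular DBMS's API.

    # Minimal strict-2PL sketch: read locks are shared, write locks exclusive,
    # and all locks are held until commit (the shrinking phase happens all at
    # once at commit/abort). All names are illustrative.
    class LockManager:
        def __init__(self):
            self.locks = {}          # item -> (mode, set of holder txns)

        def lock(self, txn, item, mode):
            # Try to acquire mode ("rl" or "wl") on item; return success.
            held = self.locks.get(item)
            if held is None:
                self.locks[item] = (mode, {txn})
                return True
            held_mode, holders = held
            if mode == "rl" and held_mode == "rl":   # read locks are shared
                holders.add(txn)
                return True
            if holders == {txn}:                     # upgrade our own lock
                self.locks[item] = ("wl" if mode == "wl" else held_mode, {txn})
                return True
            return False                             # conflict: caller waits/aborts

        def release_all(self, txn):
            # Called at commit/abort: release every lock txn holds.
            for item in list(self.locks):
                mode, holders = self.locks[item]
                holders.discard(txn)
                if not holders:
                    del self.locks[item]

    lm = LockManager()
    assert lm.lock("T1", "x", "rl")
    assert lm.lock("T2", "x", "rl")       # shared read is allowed
    assert not lm.lock("T2", "x", "wl")   # write conflicts with T1's read lock
    lm.release_all("T1")
    assert lm.lock("T2", "x", "wl")       # now T2 can upgrade and write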
Q (B) Explain layers of query processing. [07]
A (B) Query Decomposition
• The first layer decomposes the calculus query into an algebraic query on global relations. The information needed for this transformation is found in the global conceptual schema describing the global relations.
• Query decomposition can be viewed as four successive steps.
• First, the calculus query is rewritten in a normalized form that is suitable for subsequent manipulation. Normalization of a query generally involves the manipulation of the query quantifiers and of the query qualification by applying logical operator priority.
• Second, the normalized query is analyzed semantically so that incorrect queries are detected and rejected as early as possible. Techniques to detect incorrect queries exist only for a subset of relational calculus. Typically, they use some sort of graph that captures the semantics of the query.
• Third, the correct query (still expressed in relational calculus) is simplified. One way to simplify a query is to eliminate redundant predicates. Note that redundant queries are likely to arise when a query is the result of system transformations applied to the user query. Such transformations are used for performing semantic data control (views, protection, and semantic integrity control).
• Fourth, the calculus query is restructured as an algebraic query. The traditional way to do this transformation toward a “better” algebraic specification is to start with an initial algebraic query and transform it in order to find a “good” one.
• The algebraic query generated by this layer is good in the sense that the worst executions are typically avoided.
Data Localization
• The input to the second layer is an algebraic query on global relations. The main role of the second layer is to localize the query’s data using data distribution information in the fragment schema.
• This layer determines which fragments are involved in the query and transforms the distributed query into a query on fragments.
• A global relation can be reconstructed by applying the fragmentation rules, and then deriving a program, called a localization program, of relational algebra operators, which then act on fragments.
• Generating a query on fragments is done in two steps:
• First, the query is mapped into a fragment query by substituting each relation by its reconstruction program (also called a materialization program).
• Second, the fragment query is simplified and restructured to produce another “good” query.
Global Query Optimization
• The input to the third layer is an algebraic query on fragments. The goal of query optimization is to find an execution strategy for the query which is close to optimal.
• The previous layers have already optimized the query, for example, by eliminating redundant expressions. However, this optimization is independent of fragment characteristics such as fragment allocation and cardinalities.
• Query optimization consists of finding the “best” ordering of operators in the query, including communication operators that minimize a cost function.
• The output of the query optimization layer is an optimized algebraic query with communication operators included on fragments. It is typically represented and saved (for future executions) as a distributed query execution plan.
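As a rough illustration only, the following Python sketch models what such a distributed execution plan could look like as an operator tree whose leaves are fragment scans annotated with sites. The node kinds, fragment names, and site labels are all invented for the example.

    # Illustrative operator tree for a distributed execution plan. Real
    # optimizers use far richer plan representations than this.
    from dataclasses import dataclass, field

    @dataclass
    class Op:
        kind: str                 # "scan", "select", "join", "union", "send"
        arg: str = ""             # fragment name, predicate, or target site
        children: list = field(default_factory=list)

    # EMP is horizontally fragmented into EMP1 (site A) and EMP2 (site B);
    # the localized query unions the fragment scans, then selects and joins.
    plan = Op("join", "EMP.deptno = DEPT.deptno", [
        Op("select", "salary > 50000", [
            Op("union", "", [
                Op("send", "to site C", [Op("scan", "EMP1@siteA")]),
                Op("send", "to site C", [Op("scan", "EMP2@siteB")]),
            ]),
        ]),
        Op("scan", "DEPT@siteC"),
    ])

    def show(op, depth=0):
        # Print the plan tree, one level of indent per child
        print("  " * depth + f"{op.kind} {op.arg}".rstrip())
        for child in op.children:
            show(child, depth + 1)

    show(plan)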
Minterm Properties
• Minterm selectivity: sel(mi)
• The number of records that satisfy a minterm
• e.g., sel(m1) = 1; sel(m2) = 1; sel(m4) = 0
• Access frequency by applications and users
• Q = {q1, q2, …, qq} is the set of queries
• acc(q1) is the frequency of access of query 1
Algorithm for Determining Minterms
Rule 1: a fragment is partitioned into at least two parts that are accessed differently by at least one application.
Definitions
R - relation
Pr - set of simple predicates
Pr' - another set of simple predicates
F - set of minterm fragments
COM_MIN (R, Pr)
{ // Compute minterms
  // find a pi such that pi partitions R according to Rule 1
  Pr' = pi
  Pr = Pr - pi
  F = fi  // fi is the minterm fragment according to pi
  while (Pr' is incomplete)
  {
    find a pj in Pr that partitions some fk of Pr'
    Pr' = Pr' U pj
    Pr = Pr - pj
    F = F U fj  // fj is the minterm fragment according to pj
    if (there exists a pk in Pr' which is non-relevant)
    { // this is complex
      Pr' = Pr' - pk
      F = F - fk
    }
  }
  return Pr'
}
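To make the minterm idea concrete, here is a small Python sketch that enumerates minterm predicates (every conjunction of each simple predicate taken positively or negated) and counts how many rows satisfy each one, i.e., sel(mi). The predicates and rows are invented; this shows only the enumeration step, not COM_MIN's relevance pruning.

    # Minimal sketch: enumerate minterm predicates from simple predicates.
    # Each minterm takes every simple predicate either positively or negated.
    from itertools import product

    simple = {
        "p1": lambda r: r["budget"] <= 200000,     # invented predicates
        "p2": lambda r: r["loc"] == "Montreal",
    }

    def minterms(preds):
        # Yield (label, test) for every conjunction of p / NOT p
        names = sorted(preds)
        for signs in product([True, False], repeat=len(names)):
            label = " AND ".join(n if s else f"NOT {n}"
                                 for n, s in zip(names, signs))
            def test(r, signs=signs, names=names):
                return all(preds[n](r) == s for n, s in zip(names, signs))
            yield label, test

    rows = [{"budget": 150000, "loc": "Montreal"},
            {"budget": 250000, "loc": "Paris"}]
    for label, test in minterms(simple):
        frag = [r for r in rows if test(r)]
        print(f"{label}: {len(frag)} row(s)")      # sel(mi) per minterm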
Q.3 (A) Explain following in context of Relational algebra: [07]
1. Selection 2. Natural Join 3. Projection
A.3 (A) 1. Selection
• Produces a horizontal subset of the operand relation
• General form:
  σF(R) = {t | t ∈ R and F(t) is true}
  where
  • R is a relation, t is a tuple variable
  • F is a formula consisting of
    • operands that are constants or attributes
    • arithmetic comparison operators: <, >, =, ≠, ≤, ≥
    • logical operators: ∧, ∨, ¬
Selection Example
2. Natural Join
• Equi-join of two relations R and S over an attribute (or attributes) common to both R and S, projecting out one copy of those attributes
• R ⋈ S = ΠR∪S(σF(R × S))
Natural Join Example
3. Projection
• Produces a vertical slice of a relation
• General form:
  ΠA1,…,An(R) = {t[A1,…,An] | t ∈ R}
  where
  • R is a relation, t is a tuple variable
  • {A1,…,An} is a subset of the attributes of R over which the projection will be performed
• Note: projection can generate duplicate tuples. Commercial systems (and SQL) allow this and provide both:
  • Projection with duplicate elimination
  • Projection without duplicate elimination
Projection Example
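A small Python sketch of the three operators, modeling relations as lists of dicts. The relation contents and attribute names are invented for illustration.

    # Minimal sketch of selection, projection, and natural join over
    # relations modeled as lists of dicts. Data is invented.
    def select(rel, pred):
        # Selection: horizontal subset of tuples satisfying pred
        return [t for t in rel if pred(t)]

    def project(rel, attrs, distinct=True):
        # Projection: vertical slice; optionally eliminate duplicates
        out = [{a: t[a] for a in attrs} for t in rel]
        if not distinct:
            return out
        seen, uniq = set(), []
        for t in out:
            key = tuple(sorted(t.items()))
            if key not in seen:
                seen.add(key)
                uniq.append(t)
        return uniq

    def natural_join(r, s):
        # Natural join: equi-join on common attributes, one copy kept
        common = set(r[0]) & set(s[0]) if r and s else set()
        return [{**tr, **ts} for tr in r for ts in s
                if all(tr[a] == ts[a] for a in common)]

    emp  = [{"eno": 1, "dno": 10}, {"eno": 2, "dno": 20}]
    dept = [{"dno": 10, "dname": "CS"}, {"dno": 20, "dname": "IT"}]
    print(select(emp, lambda t: t["dno"] == 10))   # [{'eno': 1, 'dno': 10}]
    print(project(emp, ["dno"]))                   # [{'dno': 10}, {'dno': 20}]
    print(natural_join(emp, dept))                 # joined on dno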
Q (B) Explain Client Server architecture for Distributed DBMS. [07]
A (B)
• This provides a two-level architecture which makes it easier to manage the complexity of modern DBMSs and the complexity of distribution.
• The server does most of the data management work (query processing and optimization, transaction management, storage management).
• The client is the application and the user interface (managing the data that is cached to the client, managing the transaction locks).
• This architecture is quite common in relational systems where the communication between the clients and the server(s) is at the level of SQL statements.
• Multiple client - single server
From a data management perspective, this is not much different from centralized databases since the database is stored on only one machine (the server)
which also hosts the software to manage it. However, there are some differences from centralized systems in the way transactions are executed and caches are managed.
• Multiple client - multiple server
In this case, two alternative management strategies are possible: either each client manages its own connection to the appropriate server or each client knows of only its “home server” which then communicates with other servers as required.
OR
Q. 3 (A) Explain ACID property in concept of DDBMS. [07]
A. 3 (A)
• The consistency and reliability aspects of transactions are due to four properties:
1. Atomicity
2. Consistency
3. Isolation
4. Durability
• Together, these are commonly referred to as the ACID properties of transactions.
1. Atomicity
• Atomicity refers to the fact that a transaction is treated as a unit of operation. Therefore, either all the transaction’s actions are completed, or none of them are. This is also known as the “all-or-nothing property.”
2. Consistency
• The consistency of a transaction is simply its correctness.
• In other words, a transaction is a correct program that maps one consistent database state to another.
3. Isolation
• Isolation is the property of transactions that requires each transaction to see a consistent database at all times.
• In other words, an executing transaction cannot reveal its results to other concurrent transactions before its commitment.
4. Durability
• Durability refers to that property of transactions which ensures that once a transaction commits, its results are permanent and cannot be erased from the database.
Q (B) Explain AA matrix, CA matrix and BEA algorithm in context of vertical fragmentation. [07]
A (B) Determining Affinity
• The attribute use matrix does not help us yet
• We cannot determine the affinity of the attributes, because we don't know the access frequency of the attribute groups
• We need this to calculate attribute affinity - aff(Ai, Aj)
• aff(Ai, Aj) defines how often Ai and Aj are accessed together
• It depends on the frequency of query requests for attributes Ai and Aj simultaneously.
Affinity Measure
• The attribute affinity between two attributes Ai and Aj of a relation is:
  aff(Ai, Aj) = Σ over all queries qk that use both Ai and Aj of ( Σ over all sites Sl of refl(qk) · accl(qk) )
• where refl(qk) is the number of accesses to attributes Ai and Aj for each execution of application qk at site Sl
• where accl(qk) is the access frequency of qk for each site
Attribute Affinity Matrix – Example
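A small Python sketch of how an AA matrix can be computed from an attribute usage matrix and per-site access frequencies. All numbers are invented, and ref is simplified to 1 access per execution.

    # Minimal sketch: build the attribute affinity (AA) matrix from an
    # attribute usage matrix and per-site access frequencies (invented data).
    import numpy as np

    use = np.array([[1, 0, 1, 0],     # use(qk, Ai): rows = queries q1..q4,
                    [0, 1, 1, 0],     # columns = attributes A1..A4
                    [0, 1, 0, 1],
                    [1, 0, 0, 1]])
    acc = np.array([[15, 20, 10],     # acc_l(qk): rows = queries,
                    [ 5,  0,  0],     # columns = sites S1..S3
                    [25, 25, 25],
                    [ 3,  0,  0]])

    freq = acc.sum(axis=1)            # total frequency of each query
    n_attr = use.shape[1]
    AA = np.zeros((n_attr, n_attr), dtype=int)
    for i in range(n_attr):
        for j in range(n_attr):
            # sum the frequencies of queries that access both Ai and Aj
            both = (use[:, i] == 1) & (use[:, j] == 1)
            AA[i, j] = freq[both].sum()
    print(AA)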
Clustering Algorithm
• We want to find which attributes belong together in a vertically fragmented table
• Examining the matrix by hand for this small case is sufficient
Bond Energy Algorithm (BEA)
• Initialize - pick any column
• Iterate:
  Select the next column and try to place it in the matrix
  Choose the place that maximizes the global affinity
  Repeat
• Repeat the process for rows.
  However, since AA is symmetric, just reorder the rows!
  We will reorder the columns later to create a symmetric matrix
BEA Pseudocode
Bond Energy Algorithm (BEA) {
  input: AA   // attribute affinity matrix
  output: CA  // clustered attribute matrix
  // put the first two columns in
  CA(*, 1) <- AA(*, 1)
  CA(*, 2) <- AA(*, 2)
  index <- 3
  // for each of the rest of the columns of AA,
  // choose the best placement
  while (index <= n) {
    // calculate the contribution cont() of each possible place for the new column
    for (i = 1; i < index; i++)                       // iterate over the columns
      calculate cont(A(i-1), A(index), A(i));
    calculate cont(A(index-1), A(index), A(index+1)); // boundary
    loc <- placement given by maximum cont()
    for (j = index; j > loc; j--)                     // shift columns right
      CA(*, j) <- CA(*, j-1);
    CA(*, loc) <- AA(*, index);
    index <- index + 1;
  }
  // reorder the rows according to the placement of columns
}
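For reference, here is a compact runnable version of the column-placement loop in Python, using the usual textbook definitions bond(Ax, Ay) = Σz AA[z][x]·AA[z][y] and cont(Ai, Ak, Aj) = 2·bond(Ai, Ak) + 2·bond(Ak, Aj) - 2·bond(Ai, Aj). The AA values are an invented example.

    # Runnable sketch of BEA column ordering on a small AA matrix.
    import numpy as np

    AA = np.array([[45,  0, 45,  0],
                   [ 0, 80,  5, 75],
                   [45,  5, 53,  3],
                   [ 0, 75,  3, 78]])

    def bond(x, y):
        # bond of 0 for the imaginary columns beyond the matrix edges
        if x is None or y is None:
            return 0
        return int(AA[:, x] @ AA[:, y])

    def cont(i, k, j):
        # contribution of placing column k between columns i and j
        return 2 * bond(i, k) + 2 * bond(k, j) - 2 * bond(i, j)

    order = [0, 1]                    # initialize with the first two columns
    for k in range(2, AA.shape[1]):   # place each remaining column
        pairs = list(zip([None] + order, order + [None]))   # neighbor pairs
        best = max(range(len(pairs)),
                   key=lambda p: cont(pairs[p][0], k, pairs[p][1]))
        order.insert(best, k)

    CA = AA[np.ix_(order, order)]     # reorder rows the same way (AA symmetric)
    print(order)                      # clusters affine attributes together
    print(CA)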
Q.4 (A) Explain Two phase commit protocol. [07]
A.4 (A)
Two-phase commit (2PC) is a very simple and elegant protocol that ensures the atomic commitment of distributed transactions. It extends the effects of local atomic commit actions to distributed transactions by insisting that all sites involved in the execution of a distributed transaction agree to commit the transaction before its effects are made permanent. There are a number of reasons why such synchronization among sites is necessary. First, depending on the type of concurrency control algorithm that is used, some schedulers may not be ready to terminate a transaction. For example, if a transaction has read a value of a data item that is updated by another transaction that has not yet committed, the associated scheduler may not want to commit the former. Of course, strict concurrency control algorithms that avoid cascading aborts would not permit the updated value of a data item to be read by any other transaction until the updating transaction terminates. This is sometimes called the recoverability condition.
A brief description of the 2PC protocol that does not consider failures is as follows. Initially, the coordinator writes a begin commit record in its log, sends a “prepare” message to all participant sites, and enters the WAIT state. When a participant receives a “prepare” message, it checks if it could commit the transaction. If so, the participant writes a ready record in the log, sends a “vote-commit” message to the coordinator, and enters READY state; otherwise, the participant writes an abort record and sends a “vote-abort” message to the coordinator. If the decision of the site is to abort, it can forget about that transaction, since an abort decision serves as a veto (i.e., unilateral abort). After the coordinator has received a reply from every participant, it decides whether to commit or to abort the transaction. If even one participant has registered a negative vote, the coordinator has to abort the transaction globally. So it writes an abort record, sends a “global-abort” message to all participant sites, and enters the ABORT state; otherwise, it writes a commit record, sends a “global-commit” message to all participants, and enters the COMMIT state. The participants either commit or abort the transaction according to the coordinator’s instructions and send back an acknowledgment, at which point the coordinator terminates the transaction by writing an end of transaction record in the log.
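The failure-free flow described above can be condensed into a short Python sketch. The participant vote functions, the log list, and all the names are illustrative stand-ins, not a real DDBMS interface.

    # Failure-free 2PC sketch following the description above. Participants
    # are modeled as vote functions; the log is just a list.
    def two_phase_commit(participants, log):
        # Phase 1: coordinator solicits votes.
        log.append("begin_commit")                 # coordinator enters WAIT
        votes = []
        for name, can_commit in participants.items():
            if can_commit():                       # participant writes ready
                votes.append((name, "vote-commit"))
            else:                                  # unilateral abort = veto
                votes.append((name, "vote-abort"))
        # Phase 2: coordinator decides and broadcasts.
        if all(v == "vote-commit" for _, v in votes):
            log.append("commit")                   # enter COMMIT
            decision = "global-commit"
        else:
            log.append("abort")                    # enter ABORT
            decision = "global-abort"
        # Participants act on the decision and acknowledge.
        acks = [name for name, _ in votes]
        log.append("end_of_transaction")           # all acks received
        return decision, acks

    log = []
    decision, _ = two_phase_commit(
        {"site1": lambda: True, "site2": lambda: False}, log)
    print(decision, log)   # global-abort, since site2 vetoed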
State Transitions in 2PC Protocol
Q (B) Explain Top-down and Bottom-up design strategies. [07]
A (B) Top-Down Design Process
• Conceptual design of the data is the ER model of the whole enterprise
• Federated by the views
• Must anticipate new views/usages
• Must describe semantics of the data as used in the domain/enterprise
• This is almost identical to typical DB design
• However, we are concerned with Distribution Design
  We need to place tables "geographically" on the network
  We also need to fragment tables
Bottom-Up Design Process
• Top-down design is the choice when you have the liberty of starting from scratch
  Unfortunately, this is not usually the case
  Some element of bottom-up design is more common
• Bottom-up design is integrating independent/semi-independent schemas into a Global Conceptual Schema (GCS)
  Must deal with schema mapping issues
  May deal with heterogeneous integration issues
OR
RAID 1
• RAID 1 uses mirroring techniques. When data is sent to a RAID controller, it sends a copy of the data to all the disks in the array. RAID level 1 is also called mirroring and provides 100% redundancy in case of a failure.
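As a toy sketch of the mirroring idea (illustrative only; a real controller works at the block and device level, and the Raid1 class below is invented for this note), every write goes to all disks so any surviving mirror can serve a read:

# Toy illustration of RAID 1 mirroring. Not a real controller.

class Raid1:
    def __init__(self, n_disks=2):
        self.disks = [dict() for _ in range(n_disks)]   # block_no -> data

    def write(self, block_no, data):
        for disk in self.disks:            # 100% redundancy: copy to every disk
            disk[block_no] = data

    def read(self, block_no, failed=()):
        for i, disk in enumerate(self.disks):
            if i not in failed:            # any working mirror can answer
                return disk[block_no]
        raise IOError("all mirrors failed")

r = Raid1()
r.write(7, b"payload")
print(r.read(7, failed={0}))               # read survives the loss of disk 0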
RAID 2
• RAID Level 2 uses the concept of a parallel access technique. It works on the word (byte) level, so each strip stores one bit. It takes data striping to the extreme, writing only 1 bit per strip instead of arbitrary-size blocks. For this reason, it requires a minimum of 8 surfaces to write data to the hard disk.
• In RAID level 2, strips are very small, so when a block is read, all disks are accessed in parallel.
• Hamming code generation is time consuming; therefore RAID level 2 is too slow for most commercial applications.
(1) The Operation-based Approach
• Principle:
– Record all changes made to the state of a process (‘audit trail’ or ‘log’) such that the process can be returned to a previous state
– Example: A transaction-based environment where transactions update a database
– It is possible to commit or undo updates on a per-transaction basis
– A commit indicates that the transaction on the object was successful and changes are permanent
(2) State-based Approach
• Principle: establish frequent ‘recovery points’ or ‘checkpoints’ saving the entire state of the process
• Actions:
– ‘Checkpointing’ or ‘taking a checkpoint’: saving process state
– ‘Rolling back’ a process: restoring a process to a prior state
Log Based Recovery
• When failures occur, the following operations that use the log are executed (a small sketch follows):
• UNDO: restore the database to the state prior to execution.
• REDO: perform the changes to the database over again.
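A hedged Python sketch of UNDO/REDO replay from a log after a crash. The log record format and the sample values are assumptions made for illustration:

# Replay a log after a crash: REDO committed work, UNDO uncommitted work.

log = [
    ("begin",  "T1"),
    ("write",  "T1", "x", 0, 10),   # (op, txn, item, old value, new value)
    ("commit", "T1"),
    ("begin",  "T2"),
    ("write",  "T2", "y", 5, 99),   # T2 never commits before the crash
]

db = {"x": 0, "y": 99}              # state found on disk after the crash
committed = {rec[1] for rec in log if rec[0] == "commit"}

for rec in log:                     # REDO pass (forward): committed transactions
    if rec[0] == "write" and rec[1] in committed:
        _, txn, item, old, new = rec
        db[item] = new              # re-apply after-image

for rec in reversed(log):           # UNDO pass (backward): uncommitted transactions
    if rec[0] == "write" and rec[1] not in committed:
        _, txn, item, old, new = rec
        db[item] = old              # restore before-image

print(db)                           # {'x': 10, 'y': 5}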
Write-Ahead Logging Protocol
• Write-ahead logging (WAL) is a family of techniques for providing atomicity and durability in database systems.
• In a system using WAL, all modifications are written to a log before they are applied. Usually both redo and undo information is stored in the log.
• The purpose of this can be illustrated by an example. Imagine a program that is in the middle of performing some operation when the machine it is running on loses power. Upon restart, that program might well need to know whether the operation it was performing succeeded, half-succeeded, or failed. If a write-ahead log is used, the program can check this log and compare what it was supposed to be doing when it unexpectedly lost power to what was actually done. On the basis of this comparison, the program could decide to undo what it had started, complete what it had started, or keep things as they are.
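A minimal sketch of the WAL discipline, assuming a simple one-JSON-record-per-line log format invented for this example. The essential point is only the ordering: the log record is forced to stable storage before the in-place change is applied.

import json
import os

def wal_update(log_file, db, item, new_value):
    record = {"item": item, "undo": db.get(item), "redo": new_value}
    log_file.write(json.dumps(record) + "\n")
    log_file.flush()
    os.fsync(log_file.fileno())   # force the log record to disk first...
    db[item] = new_value          # ...only then apply the modification in place

db = {"x": 1}
with open("wal.log", "a") as log_file:
    wal_update(log_file, db, "x", 2)
print(db)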
Q (B) What is Semantic Integrity Control? Explain it in the context of centralized and distributed environments. 07
A (B) Semantic Integrity Control
• Semantic integrity control ensures database consistency by rejecting update transactions that lead to inconsistent database states, or by activating specific actions on the database state, which compensate for the effects of the update transactions.
• Two main types of integrity constraints can be distinguished: structural constraints and behavioral constraints.
• Structural constraints express basic semantic properties inherent to a model. Examples of such constraints are unique key constraints in the relational model, or one-to-many associations between objects in the object-oriented model.
• Behavioral constraints are essential in the database design process. They can express associations between objects, such as inclusion dependency in the relational model, or describe object properties and structures.
Centralized Semantic Integrity Control
• Specification of Integrity Constraints
• Triggers (event-condition-action rules) can be used to automatically propagate updates,
and thus to maintain semantic integrity.
• We can distinguish between three types of integrity constraints: predefined, precondition, or general constraints.
• EMP(ENO, ENAME, TITLE)
• PROJ(PNO, PNAME, BUDGET)
• ASG(ENO, PNO, RESP, DUR)
• Predefined constraints are based on simple keywords. Through them, it is possible to express concisely the more common constraints of the relational model, such as non-null attribute, unique key, foreign key, or functional dependency.
• Employee number in relation EMP cannot be null.
ENO NOT NULL IN EMP
• The project number PNO in relation ASG is a foreign key matching the primary key PNO of relation PROJ.
PNO IN ASG REFERENCES PNO IN PROJ
• Precondition constraints express conditions that must be satisfied by all tuples in a relation for a given update type. The update type, which might be INSERT, DELETE, or MODIFY, permits restricting the integrity control.
• Precondition constraints can be expressed with the SQL CHECK statement enriched with the ability to specify the update type.
CHECK ON <relation name> WHEN <update type>
(<qualification over relation name>)
• The budget of a project is between 500K and 1000K.
CHECK ON PROJ (BUDGET >= 500000 AND BUDGET <= 1000000)
• Only the tuples whose budget is 0 may be deleted.
CHECK ON PROJ WHEN DELETE (BUDGET = 0)
• General constraints are formulas of tuple relational calculus where all variables are quantified. The database system must ensure that those formulas are always true.
CHECK ON list of <variable name>:<relation name>, (<qualification>)
• The total duration for all employees in the CAD project is less than 100.
CHECK ON g:ASG, j:PROJ (SUM(g.DUR WHERE g.PNO=j.PNO)<100 IF j.PNAME="CAD/CAM")
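A small Python sketch (my own illustration of the assertion above, with invented sample tuples) that evaluates the CAD/CAM constraint over in-memory relations:

# Check the general constraint: total DUR assigned to any project named
# "CAD/CAM" must stay below 100. Sample data is invented for illustration.

ASG  = [{"ENO": "E1", "PNO": "P1", "DUR": 40},
        {"ENO": "E2", "PNO": "P1", "DUR": 50},
        {"ENO": "E3", "PNO": "P2", "DUR": 80}]
PROJ = [{"PNO": "P1", "PNAME": "CAD/CAM"},
        {"PNO": "P2", "PNAME": "Maintenance"}]

def check_cad_cam(asg, proj):
    for j in proj:
        if j["PNAME"] == "CAD/CAM":
            total = sum(g["DUR"] for g in asg if g["PNO"] == j["PNO"])
            if total >= 100:
                return False      # an update violating the assertion is rejected
    return True

print(check_cad_cam(ASG, PROJ))   # True: 40 + 50 < 100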
Distributed Semantic Integrity Control
• Definition of Distributed Integrity Constraints
• Assertions can involve data stored at different sites, so the storage of the constraints must be decided so as to minimize the cost of integrity checking. There is a strategy based on a taxonomy of integrity constraints that distinguishes three classes:
• Individual constraints: single-relation single-variable constraints. They refer only to tuples to be updated independently of the rest of the database.
• Set-oriented constraints: include single-relation multivariable constraints such as
functional dependency and multirelation multivariable constraints such as foreign key constraints.
• Constraints involving aggregates: require special processing because of the cost of evaluating the aggregates.
Individual constraints
• Consider relation EMP, horizontally fragmented across three sites using the predicates below, and the domain constraint C: ENO < “E4”.
p1 : ENO < “E3”
p2 : “E3” ≤ ENO ≤ “E6”
p3 : ENO > “E6”
• Constraint C is compatible with p1 (if C is true, p1 is not necessarily false) and p2 (if C is true, p2 is not necessarily false), but not with p3 (if C is true, then p3 is false). Therefore, constraint C should be globally rejected because the tuples at site 3 cannot satisfy C, and thus relation EMP does not satisfy C.
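A hedged sketch that tests the compatibility of constraint C with each fragmentation predicate by sampling ENO values (an illustration only, not a general implication prover):

# Compatibility check by sampling: a predicate is compatible with C if some
# value can satisfy both. Sample values are invented for this illustration.

SAMPLES = [f"E{i}" for i in range(1, 10)]          # "E1" .. "E9"

C  = lambda eno: eno < "E4"
p1 = lambda eno: eno < "E3"
p2 = lambda eno: "E3" <= eno <= "E6"
p3 = lambda eno: eno > "E6"

def compatible(pred):
    return any(C(e) and pred(e) for e in SAMPLES)

for name, p in [("p1", p1), ("p2", p2), ("p3", p3)]:
    print(name, "compatible" if compatible(p) else "incompatible")
# p1 and p2 are compatible; p3 is incompatible, so C is globally rejected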
Set-oriented constraints
• Set-oriented constraints are multivariable; that is, they involve join predicates.
• Three cases, given in increasing cost of checking, can occur:
1. The fragmentation of R is derived from that of S based on a semijoin on the attribute used in the assertion join predicate.
2. S is fragmented on the join attribute.
3. S is not fragmented on the join attribute.
• In the first case, compatibility checking is cheap since the tuple of S matching a tuple
of R is at the same site.
• In the second case, each tuple of R must be compared with at most one fragment of S,
because the join attribute value of the tuple of R can be used to find the site of the
corresponding fragment of S.
• In the third case, each tuple of R must be compared with all fragments of S. If compatibility is found for all tuples of R, the constraint can be stored at each site.
Constraints involving aggregates
• These constraints are among the most costly to test because they require the calculation of the aggregate functions.
• The aggregate functions generally manipulated are MIN, MAX, SUM, and COUNT.
• Each aggregate function contains a projection part and a selection part.
Q.5 (A) Explain Query Processing in Distributed Systems. 07
A.5 (A) Query Processing in Distributed Systems
• In a distributed DBMS the catalog has to store additional information including the location of relations and their replicas. The catalog must also include system-wide information such as the number of sites in the system along with their identifiers.
Mapping Global Query to Local Queries
• The tables required in a global query have fragments distributed across multiple sites. The local databases have information only about local data. The controlling site uses the global data dictionary to gather information about the distribution and reconstructs the global view from the fragments.
• If there is no replication, the global optimizer runs local queries at the sites where the fragments are stored. If there is replication, the global optimizer selects the site based upon communication cost, workload, and server speed.
• The global optimizer generates a distributed execution plan so that the least amount of data transfer occurs across the sites. The plan states the location of the fragments, the order in which query steps need to be executed, and the processes involved in transferring intermediate results.
• The local queries are optimized by the local database servers. Finally, the local query results are merged together through a union operation in case of horizontal fragments and a join operation for vertical fragments.
Example
• For example, let us consider that the following Project schema is horizontally fragmented according to City, the cities being New Delhi, Kolkata and Hyderabad.
• PROJECT
• Suppose there is a query to retrieve details of all projects whose status is “Ongoing”.
• The global query will be
σstatus="ongoing"(PROJECT)
• Query in New Delhi’s server will be
σstatus="ongoing"(NewD−PROJECT)
• Query in Kolkata’s server will be
σstatus="ongoing"(Kol−PROJECT)
• Query in Hyderabad’s server will be
σstatus="ongoing"(Hyd−PROJECT)
• In order to get the overall result, we need to union the results of the three queries as follows:
σstatus="ongoing"(NewD−PROJECT) ∪ σstatus="ongoing"(Kol−PROJECT) ∪ σstatus="ongoing"(Hyd−PROJECT)
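A short Python sketch of this mapping, with invented fragment contents: the same selection runs on each horizontal fragment, and the union of the local results reconstructs the global answer.

# Run the selection on each horizontal fragment, then union the results.
# Fragment data below is invented for illustration.

NewD_PROJECT = [{"PNO": "P1", "CITY": "New Delhi", "STATUS": "Ongoing"}]
Kol_PROJECT  = [{"PNO": "P2", "CITY": "Kolkata",   "STATUS": "Completed"}]
Hyd_PROJECT  = [{"PNO": "P3", "CITY": "Hyderabad", "STATUS": "Ongoing"}]

def select_ongoing(fragment):                     # sigma STATUS = "Ongoing"
    return [t for t in fragment if t["STATUS"] == "Ongoing"]

result = (select_ongoing(NewD_PROJECT)            # union of the three local results
          + select_ongoing(Kol_PROJECT)
          + select_ongoing(Hyd_PROJECT))
print(result)                                     # projects P1 and P3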
Q (B) Explain Deadlock in Distributed Systems. 07
A (B) Deadlock
• A deadlock can occur because transactions wait for one another. Informally, a deadlock situation is a set of requests that can never be granted by the concurrency control mechanism.
• A deadlock can be indicated by a cycle in the wait-for graph (WFG).
• In computer science, deadlock refers to a specific condition when two or more processes are each waiting for another to release a resource, or more than two processes are waiting for resources in a circular chain.
Deadlock Detection
• Deadlock detection is the process of actually determining that a deadlock exists and identifying the processes and resources involved in the deadlock.
• Detection of a cycle in the WFG proceeds concurrently with normal operation in the main approaches of deadlock detection; a small cycle-detection sketch follows.
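A hedged sketch of deadlock detection as a cycle search over the WFG (a depth-first search; the example graph is invented):

# Edge T1 -> T2 means "transaction T1 waits for T2". A cycle means deadlock.

def has_cycle(wfg):
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {t: WHITE for t in wfg}

    def dfs(t):
        color[t] = GRAY
        for u in wfg.get(t, ()):
            if color.get(u, WHITE) == GRAY:   # back edge -> cycle -> deadlock
                return True
            if color.get(u, WHITE) == WHITE and dfs(u):
                return True
        color[t] = BLACK
        return False

    return any(color[t] == WHITE and dfs(t) for t in list(wfg))

wfg = {"T1": ["T2"], "T2": ["T3"], "T3": ["T1"]}  # circular wait
print(has_cycle(wfg))                             # True -> deadlock detected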
Deadlock Prevention
• The deadlock prevention approach does not allow any transaction to acquire locks that will lead to deadlocks. The convention is that when more than one transaction requests a lock on the same data item, only one of them is granted the lock.
• One of the most popular deadlock prevention methods is pre-acquisition of all the locks.
Deadlock Avoidance
• The deadlock avoidance approach handles deadlocks before they occur. It analyzes the transactions and the locks to determine whether or not waiting leads to a deadlock.
• There are two algorithms for this purpose, namely wait-die and wound-wait.
• Let us assume that there are two transactions, T1 and T2, where T1 tries to lock a data item which is already locked by T2. The algorithms are as follows, with a worked sketch after this list −
• Wait-Die − If T1 is older than T2, T1 is allowed to wait. Otherwise, if T1 is younger than T2, T1 is aborted and later restarted.
• Wound-Wait − If T1 is older than T2, T2 is aborted and later restarted. Otherwise, if T1 is younger than T2, T1 is allowed to wait.
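A worked sketch of the two schemes, assuming (as is conventional) that a smaller timestamp means an older transaction:

# Timestamp-based deadlock prevention rules.

def wait_die(ts_requester, ts_holder):
    # older requester waits; younger requester dies (aborts, restarts later)
    return "wait" if ts_requester < ts_holder else "abort requester"

def wound_wait(ts_requester, ts_holder):
    # older requester wounds (aborts) the lock holder; younger requester waits
    return "abort holder" if ts_requester < ts_holder else "wait"

# T1 (ts=5) requests an item locked by T2 (ts=9): T1 is the older transaction
print(wait_die(5, 9))     # wait
print(wound_wait(5, 9))   # abort holder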
OR
Q.5 (A) What is authorization control? How do you apply authorization control in a distributed environment? 07
A.5 (A) Authorization control
• Authorization control consists of checking whether a given triple (subject, operation, object) can be allowed to proceed.
• The introduction of a subject in the system is typically done by a pair (user name, password).
• The objects to protect are subsets of the database. Relational systems provide finer and
more general protection granularity than do earlier systems.
• A right expresses a relationship between a subject and an object for a particular set of
operations.
GRANT <operation type(s)> ON <object> TO <subject(s)>
REVOKE <operation type(s)> FROM <object> TO <subject(s)>
Multilevel Access Control
• Discretionary access control has some limitations. One problem is that a malicious user can access unauthorized data through an authorized user.
• For instance, consider user A who has authorized access to relations R and S and user B who has authorized access to relation S only. If B somehow manages to modify an application program used by A so it writes R data into S, then B can read unauthorized data without violating authorization rules.
• Multilevel access control answers this problem and further improves security by defining different security levels for both subjects and data objects.
• A process has a security level, also called clearance, derived from that of the user.
• In its simplest form, the security levels are Top Secret (TS), Secret (S), Confidential (C) and Unclassified (U), ordered as TS > S > C > U, where “>” means “more secure”.
• Access in read and write modes by subjects is restricted by two simple rules:
Rule 1 (called “no read up”)
• protects data from unauthorized disclosure, i.e., a subject at a given security level can only read objects at the same or lower security levels.
Rule 2 (called “no write down”)
• protects data from unauthorized change, i.e., a subject at a given security level can only write objects at the same or higher security levels.
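A small sketch of the two rules above over the TS > S > C > U ordering (Bell-LaPadula style; the numeric LEVELS encoding is an assumption of this illustration):

# "No read up" and "no write down" checks over the security lattice.

LEVELS = {"U": 0, "C": 1, "S": 2, "TS": 3}

def can_read(subject, obj):
    # Rule 1, "no read up": read only at the same or lower levels
    return LEVELS[subject] >= LEVELS[obj]

def can_write(subject, obj):
    # Rule 2, "no write down": write only at the same or higher levels
    return LEVELS[subject] <= LEVELS[obj]

print(can_read("S", "C"))    # True: a Secret subject may read Confidential data
print(can_write("S", "C"))   # False: writing down could leak Secret data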
Distributed Access Control
• The additional problems of access control in a distributed environment stem from the
fact that objects and subjects are distributed and that messages with sensitive data can
be read by unauthorized users.
• These problems are: remote user authentication, management of discretionary access
rules, handling of views and of user groups, and enforcing multilevel access control.
• Remote user authentication is necessary since any site of a distributed DBMS may
accept programs initiated, and authorized, at remote sites.
• Three solutions are possible for managing authentication:
1. Authentication information is maintained at a central site for global users, who can then be authenticated only once and then accessed from multiple sites.
2. The information for authenticating users (user name and password) is replicated at all sites in the catalog.
3. Intersite communication is protected by the use of a site password. Once the initiating site has been authenticated, there is no need for authenticating its remote users.
Q (B) Write a short note on Query Optimization. 07
A (B) Query optimization refers to the process of producing a query execution plan (QEP) which represents an execution strategy for the query. This QEP minimizes an objective cost function. A query optimizer, the software module that performs query optimization, is usually seen as consisting of three components: a search space, a cost model, and a search strategy. The search space is the set of alternative execution plans that represent the input query. These plans are equivalent, in the sense that they yield the same result, but they differ in the execution order of operations and the way these operations are implemented, and therefore in their performance. The cost model predicts the cost of a given execution plan. To be accurate, the cost model must have good knowledge about the distributed execution environment. The search strategy explores the search space and selects the best plan, using the cost model. It defines which plans are examined and in which order. The details of the environment (centralized versus distributed) are captured by the search space and the cost model.
1 Search Space
Query execution plans are typically abstracted by means of operator trees, which define the order in which the operations are executed. They are enriched with additional information, such as the best algorithm chosen for each operation. For a given query, the search space can thus be defined as the set of equivalent operator trees that can be produced using transformation rules. To characterize query optimizers, it is useful to concentrate on join trees, which are operator trees whose operators are join or Cartesian product. This is because permutations of the join order have the most important effect on performance of relational queries.
2 Search Strategy
The most popular search strategy used by query optimizers is dynamic programming, which is deterministic. Deterministic strategies proceed by building plans, starting from base relations, joining one more relation at each step until complete plans are obtained. Dynamic programming builds all possible plans, breadth-first, before it chooses the “best” plan. To reduce the optimization cost, partial plans that are not likely to lead to the optimal plan are pruned (i.e., discarded) as soon as possible. By contrast, another deterministic strategy, the greedy algorithm, builds only one plan, depth-first. A small sketch of the dynamic-programming approach follows.
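A hedged Python sketch of dynamic-programming join ordering: plans are built breadth-first over subsets, and only the cheapest plan per subset is kept (the pruning). The base-relation cardinalities, the 1/10 selectivity, and the product-based cost are toy assumptions made for illustration:

# Dynamic-programming join enumeration over a toy cost model.

from itertools import combinations

CARD = {"EMP": 400, "ASG": 1000, "PROJ": 100}     # toy base-relation cardinalities

def dp_join_order(relations):
    # best[S] = (cost, plan, cardinality) for each non-empty subset S
    best = {frozenset([r]): (0, r, CARD[r]) for r in relations}
    for size in range(2, len(relations) + 1):
        for subset in map(frozenset, combinations(relations, size)):
            candidates = []
            for r in subset:                      # extend a smaller plan by one relation
                rest = subset - {r}
                if rest in best:
                    cost, plan, card = best[rest]
                    join_card = card * CARD[r] // 10   # toy selectivity of 1/10
                    candidates.append((cost + card * CARD[r],
                                       f"({plan} JOIN {r})", join_card))
            best[subset] = min(candidates)        # prune: keep cheapest plan per subset
    return best[frozenset(relations)]

cost, plan, _ = dp_join_order(["EMP", "ASG", "PROJ"])
print(plan, "cost =", cost)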
3 Distributed Cost Model
An optimizer’s cost model includes cost functions to predict the cost of operators, statistics and base data, and formulas to evaluate the sizes of intermediate results. The cost is in terms of execution time, so a cost function represents the execution time of a query.
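As a concrete instance, the total cost is commonly broken down (following Özsu and Valduriez) into CPU, I/O, and communication components:

Total cost = T_CPU × #insts + T_I/O × #I/Os + T_MSG × #msgs + T_TR × #bytes

where T_CPU is the time of a CPU instruction, T_I/O the time of a disk I/O, T_MSG the fixed time of initiating and receiving a message, and T_TR the time to transmit a unit of data between two sites.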
*****Best of Luck*****
Kamal Acharya
 
Query optimization and processing for advanced database systems
Query optimization and processing for advanced database systemsQuery optimization and processing for advanced database systems
Query optimization and processing for advanced database systems
meharikiros2
 
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
AldoGarca30
 
Integrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - NeometrixIntegrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - Neometrix
Neometrix_Engineering_Pvt_Ltd
 

Recently uploaded (20)

Online food ordering system project report.pdf
Online food ordering system project report.pdfOnline food ordering system project report.pdf
Online food ordering system project report.pdf
 
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak HamilCara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
 
Electromagnetic relays used for power system .pptx
Electromagnetic relays used for power system .pptxElectromagnetic relays used for power system .pptx
Electromagnetic relays used for power system .pptx
 
Hospital management system project report.pdf
Hospital management system project report.pdfHospital management system project report.pdf
Hospital management system project report.pdf
 
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKARHAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
 
8th International Conference on Soft Computing, Mathematics and Control (SMC ...
8th International Conference on Soft Computing, Mathematics and Control (SMC ...8th International Conference on Soft Computing, Mathematics and Control (SMC ...
8th International Conference on Soft Computing, Mathematics and Control (SMC ...
 
8086 Microprocessor Architecture: 16-bit microprocessor
8086 Microprocessor Architecture: 16-bit microprocessor8086 Microprocessor Architecture: 16-bit microprocessor
8086 Microprocessor Architecture: 16-bit microprocessor
 
AIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech studentsAIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech students
 
Signal Processing and Linear System Analysis
Signal Processing and Linear System AnalysisSignal Processing and Linear System Analysis
Signal Processing and Linear System Analysis
 
Basic Electronics for diploma students as per technical education Kerala Syll...
Basic Electronics for diploma students as per technical education Kerala Syll...Basic Electronics for diploma students as per technical education Kerala Syll...
Basic Electronics for diploma students as per technical education Kerala Syll...
 
Query optimization and processing for advanced database systems
Query optimization and processing for advanced database systemsQuery optimization and processing for advanced database systems
Query optimization and processing for advanced database systems
 
Ground Improvement Technique: Earth Reinforcement
Ground Improvement Technique: Earth ReinforcementGround Improvement Technique: Earth Reinforcement
Ground Improvement Technique: Earth Reinforcement
 
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
 
Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the start
 
fitting shop and tools used in fitting shop .ppt
fitting shop and tools used in fitting shop .pptfitting shop and tools used in fitting shop .ppt
fitting shop and tools used in fitting shop .ppt
 
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptxHOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
 
Integrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - NeometrixIntegrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - Neometrix
 
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
 
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
COST-EFFETIVE  and Energy Efficient BUILDINGS ptxCOST-EFFETIVE  and Energy Efficient BUILDINGS ptx
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
 
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
 

DDBMS Paper with Solution

Faculty of Degree Engineering - 083
Department of CE/IT (07/16)
Enrollment No: _____________________ Seat No: _____________________
B.E. – SEMESTER – VII PRE-FINAL EXAMINATION – 2016 (ODD)
Subject Code: 2170714 Date: 17-10-2016
Subject Name: Distributed DBMS Time: 10:00 AM to 12:30 PM
Duration: 2:30 hr. Total Marks: 70
Instructions:
1. Attempt all questions.
2. Make suitable assumptions where necessary.
3. Figures to the right indicate full marks.

Q. 1 (A) Explain the potential problems with DDBMS. [07]

A. 1 (A)
1.6.1 Distributed Database Design
The question being addressed is how the database and the applications that run against it should be placed across the sites. There are two basic alternatives to placing data: partitioned (or non-replicated) and replicated. In the partitioned scheme the database is divided into a number of disjoint partitions, each of which is placed at a different site. Replicated designs can be either fully replicated (also called fully duplicated), where the entire database is stored at each site, or partially replicated (or partially duplicated), where each partition of the database is stored at more than one site, but not at all the sites. The two fundamental design issues are fragmentation, the separation of the database into partitions called fragments, and distribution, the optimum distribution of fragments. The research in this area mostly involves mathematical programming in order to minimize the combined cost of storing the database, processing transactions against it, and message communication among sites. The general problem is NP-hard. Therefore, the proposed solutions are based on heuristics.

1.6.2 Distributed Directory Management
A directory contains information (such as descriptions and locations) about data items in the database. Problems related to directory management are similar in nature to the database placement problem discussed in the preceding section. A directory may be global to the entire DDBS or local to each site; it can be centralized at one site or distributed over several sites; there can be a single copy or multiple copies.

1.6.3 Distributed Query Processing
Query processing deals with designing algorithms that analyze queries and convert them into a series of data manipulation operations. The problem is how to decide on a strategy for executing each query over the network in the most cost-effective way, however cost is defined. The factors to be considered are the distribution of data, communication costs, and the lack of sufficient locally available information. The objective is to optimize where the inherent parallelism is used to improve the performance of executing the transaction, subject to the above-mentioned constraints. The problem is NP-hard in nature, and the approaches are usually heuristic.

1.6.4 Distributed Concurrency Control
Concurrency control involves the synchronization of accesses to the distributed database, such that the integrity of the database is maintained. It is, without any doubt, one of the most extensively studied problems in the DDBS field. The concurrency control problem in a distributed context is somewhat different than in a centralized framework. One not only has to worry about the integrity of a single database, but
also about the consistency of multiple copies of the database. The condition that requires all the values of multiple copies of every data item to converge to the same value is called mutual consistency.

1.6.5 Distributed Deadlock Management
The deadlock problem in DDBSs is similar in nature to that encountered in operating systems. The competition among users for access to a set of resources (data, in this case) can result in a deadlock if the synchronization mechanism is based on locking. The well-known alternatives of prevention, avoidance, and detection/recovery also apply to DDBSs.

1.6.6 Reliability of Distributed DBMS
We mentioned earlier that one of the potential advantages of distributed systems is improved reliability and availability. This, however, is not a feature that comes automatically. It is important that mechanisms be provided to ensure the consistency of the database as well as to detect failures and recover from them. The implication for DDBSs is that when a failure occurs and various sites become either inoperable or inaccessible, the databases at the operational sites remain consistent and up to date. Furthermore, when the computer system or network recovers from the failure, the DDBSs should be able to recover and bring the databases at the failed sites up-to-date. This may be especially difficult in the case of network partitioning, where the sites are divided into two or more groups with no communication among them.

1.6.7 Replication
If the distributed database is (partially or fully) replicated, it is necessary to implement protocols that ensure the consistency of the replicas, i.e., that copies of the same data item have the same value. These protocols can be eager, in that they force the updates to be applied to all the replicas before the transaction completes, or they may be lazy, so that the transaction updates one copy (called the master), from which updates are propagated to the others after the transaction completes.

1.6.8 Relationship among Problems
The design of distributed databases affects many areas. It affects directory management, because the definition of fragments and their placement determine the contents of the directory (or directories) as well as the strategies that may be employed to manage them. The same information (i.e., fragment structure and placement) is used by the query processor to determine the query evaluation strategy. On the other hand, the access and usage patterns that are determined by the query processor are used as inputs to the data distribution and fragmentation algorithms. Similarly, directory placement and contents influence the processing of queries. There is a strong relationship among the concurrency control problem, the deadlock management problem, and reliability issues. This is to be expected, since together they are usually called the transaction management problem. The concurrency control algorithm that is employed will determine whether or not a separate deadlock management facility is required. If a locking-based algorithm is used, deadlocks will occur, whereas they will not if timestamping is the chosen alternative.
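The eager/lazy distinction above can be made concrete with a small sketch; everything here (the class, the pending queue, all names) is invented for illustration and is not part of the original answer:

# Illustrative sketch of eager vs. lazy replica update propagation.
class Replica:
    def __init__(self):
        self.data = {}
    def apply(self, key, value):
        self.data[key] = value

def eager_write(replicas, key, value):
    # Eager: every replica is updated before the transaction completes.
    for r in replicas:
        r.apply(key, value)

def lazy_write(master, pending, key, value):
    # Lazy: only the master copy is updated inside the transaction;
    # propagation to the other replicas happens after commit.
    master.apply(key, value)
    pending.append((key, value))      # queued for later propagation

def propagate(slaves, pending):
    # Runs after the transaction completes.
    for key, value in pending:
        for s in slaves:
            s.apply(key, value)
    pending.clear()

master, slave, pending = Replica(), Replica(), []
lazy_write(master, pending, "x", 1)   # slave is stale until propagation
propagate([slave], pending)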
Q (B) Explain RAID Level Recovery Technique. [07]

A (B) RAID, or Redundant Array of Independent Disks, is a technology used to connect multiple secondary storage devices and use them as a single storage medium. RAID consists of an array of disks in which multiple disks are connected together to achieve different goals. RAID levels define the use of disk arrays.

RAID 0
In this level, a striped array of disks is implemented. The data is broken down into blocks and the blocks are distributed among the disks. Each disk receives a block of data to write/read in parallel. This enhances the speed and performance of the storage device. There is no parity and no backup in Level 0.
RAID 1
RAID 1 uses mirroring techniques. When data is sent to a RAID controller, it sends a copy of the data to all the disks in the array. RAID level 1 is also called mirroring and provides 100% redundancy in case of a failure.

RAID 2
• RAID level 2 uses the concept of the parallel access technique. It works at the word (byte) level, so each strip stores one bit. It takes data striping to the extreme, writing only 1 bit per strip instead of blocks of arbitrary size. For this reason, it requires a minimum of 8 surfaces to write data to the hard disk.
• In RAID level 2, strips are very small, so when a block is read, all disks are accessed in parallel.
• Hamming-code generation is time consuming; therefore RAID level 2 is too slow for most commercial applications.
RAID 3
RAID 3 stripes the data onto multiple disks. The parity bit generated for each data word is stored on a separate disk. This technique makes it possible to overcome single-disk failures.

RAID 4
In this level, an entire block of data is written onto data disks and then the parity is generated and stored on a different disk. Note that level 3 uses byte-level striping, whereas level 4 uses block-level striping. Both level 3 and level 4 require at least three disks to implement RAID.
RAID 5
RAID 5 writes whole data blocks onto different disks, but the parity bits generated for a data-block stripe are distributed among all the data disks rather than being stored on a dedicated parity disk.

RAID 6
RAID 6 is an extension of level 5. In this level, two independent parities are generated and stored in distributed fashion among multiple disks. The two parities provide additional fault tolerance. This level requires at least four disk drives to implement RAID.
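The parity-based levels (RAID 3, 4, 5 and, with a second parity, 6) all rest on the same XOR property: the parity block is the XOR of the data blocks, so any one lost block can be rebuilt from the survivors. A minimal Python sketch of this recovery idea (illustrative only, not part of the original answer):

# XOR parity as used by RAID 3/4/5: parity = d0 ^ d1 ^ ... ^ d(n-1),
# so a single failed disk can be reconstructed from the remaining ones.
from functools import reduce

def parity(blocks):
    # XOR all blocks byte by byte to produce the parity block.
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def rebuild(surviving_blocks, parity_block):
    # XOR of the surviving data blocks and the parity block
    # yields the contents of the lost block.
    return parity(surviving_blocks + [parity_block])

d0, d1, d2 = b"ABCD", b"EFGH", b"IJKL"   # stripes on three data disks
p = parity([d0, d1, d2])                 # stored on the parity disk
assert rebuild([d0, d2], p) == d1        # disk 1 failed; recover it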
Q. 2 (A) What is Concurrency? List out the methods of concurrency control and explain any one of them. [07]

A. 2 (A) Concurrency: In computer science, concurrency is the execution of several instruction sequences at the same time.
• In a distributed database system, the database is typically used by many users. These systems usually allow multiple transactions to run concurrently at the same time.
• The system must support the parallel execution of transactions.
• Communication delay is low.
• The system must be able to recover from site and communication failures.

(Figure: methods of concurrency control, e.g., locking-based and timestamp-based approaches.)

Locking-Based Concurrency Control
• The main idea of locking-based concurrency control is to ensure that a data item shared by conflicting operations is accessed by only one operation at a time.
• A lock is set by a transaction before the data item is accessed and is reset at the end of its use.
• There are two types of locks: read lock (rl) and write lock (wl).

(Figure: locking-based concurrency control algorithms.)
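As a rough illustration of the read/write lock rule above, here is a minimal lock-manager sketch; the names are hypothetical, and a real scheduler would queue conflicting requests and enforce the two-phase rule rather than simply refusing them:

# Minimal lock-manager sketch: shared read locks (rl), exclusive
# write locks (wl). Conflicting requests are refused here; a real
# DDBMS would block or queue the requesting transaction instead.
class LockManager:
    def __init__(self):
        self.readers = {}   # item -> set of transaction ids
        self.writer = {}    # item -> transaction id

    def read_lock(self, txn, item):
        w = self.writer.get(item)
        if w is not None and w != txn:
            return False                      # rl conflicts with wl
        self.readers.setdefault(item, set()).add(txn)
        return True

    def write_lock(self, txn, item):
        w = self.writer.get(item)
        rs = self.readers.get(item, set())
        if (w is not None and w != txn) or (rs - {txn}):
            return False                      # wl conflicts with any other lock
        self.writer[item] = txn
        return True

    def release(self, txn, item):
        # In strict 2PL this happens only at commit/abort time.
        self.readers.get(item, set()).discard(txn)
        if self.writer.get(item) == txn:
            del self.writer[item]

lm = LockManager()
assert lm.read_lock("T1", "x")
assert lm.read_lock("T2", "x")       # read locks are compatible
assert not lm.write_lock("T2", "x")  # T1 still holds a read lock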
(Figure: 2PL lock graph.)

Q (B) Explain the layers of query processing. [07]

A (B)
Query Decomposition
• The first layer decomposes the calculus query into an algebraic query on global relations. The information needed for this transformation is found in the global conceptual schema describing the global relations.
• Query decomposition can be viewed as four successive steps.
• First, the calculus query is rewritten in a normalized form that is suitable for subsequent manipulation. Normalization of a query generally involves the manipulation of the query quantifiers and of the query qualification by applying logical operator priority.
• Second, the normalized query is analyzed semantically so that incorrect queries are detected and rejected as early as possible. Techniques to detect incorrect queries exist only for a subset of relational calculus. Typically, they use some sort of graph that captures the semantics of the query.
• Third, the correct query (still expressed in relational calculus) is simplified. One way to simplify a query is to eliminate redundant predicates. Note that redundant queries are likely to arise when a query is the result of system transformations applied to the user query. Such transformations are used for performing semantic data control (views, protection, and semantic integrity control).
• Fourth, the calculus query is restructured as an algebraic query. The traditional way to do this transformation toward a "better" algebraic specification is to start with an initial algebraic query and transform it in order to find a "good" one.
• The algebraic query generated by this layer is good in the sense that the worst executions are typically avoided.

Data Localization
• The input to the second layer is an algebraic query on global relations. The main role of the second layer is to localize the query's data using the data distribution information in the fragment schema.
• This layer determines which fragments are involved in the query and transforms the distributed query into a query on fragments.
• A global relation can be reconstructed by applying the fragmentation rules, and
then deriving a program, called a localization program, of relational algebra operators, which then act on fragments.
• Generating a query on fragments is done in two steps.
• First, the query is mapped into a fragment query by substituting each relation by its reconstruction program (also called a materialization program).
• Second, the fragment query is simplified and restructured to produce another "good" query.

Global Query Optimization
• The input to the third layer is an algebraic query on fragments. The goal of query optimization is to find an execution strategy for the query which is close to optimal.
• The previous layers have already optimized the query, for example, by eliminating redundant expressions. However, this optimization is independent of fragment characteristics such as fragment allocation and cardinalities.
• Query optimization consists of finding the "best" ordering of operators in the query, including communication operators, that minimizes a cost function.
• The output of the query optimization layer is an optimized algebraic query, with communication operators included, on fragments. It is typically represented and saved (for future executions) as a distributed query execution plan.
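To see what data localization buys, consider a toy reduction sketch (the fragment names and predicates are invented, not from the paper): a query whose selection constant contradicts a fragment's placement predicate never needs to touch that fragment.

# Data localization sketch: EMP is horizontally fragmented on ENO.
# The generic query on EMP is first mapped to the union of its
# fragments, then reduced by dropping fragments that cannot
# contribute tuples to the selection predicate.
fragments = {
    "EMP1": lambda eno: eno <= "E3",   # placement predicate of EMP1
    "EMP2": lambda eno: eno > "E3",    # placement predicate of EMP2
}

def localize(query_eno):
    # Keep only the fragments whose placement predicate can be
    # satisfied by the query constant (a very simple compatibility test).
    return [name for name, pred in fragments.items() if pred(query_eno)]

print(localize("E5"))   # ['EMP2'] -- EMP1 is eliminated from the plan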
Minterm Properties
• Minterm selectivity: the number of records that satisfy a given minterm predicate.
• Example: sel(m1) = 1; sel(m2) = 1; sel(m4) = 0.
• Access frequency by applications and users: Q = {q1, q2, …, qq} is the set of queries, and acc(q1) is the frequency of access of query q1.

Algorithm for Determining Minterms
Rule 1: a fragment is partitioned into at least two parts that are accessed differently by at least one application.

Definitions:
R - relation
Pr - set of simple predicates
Pr' - another set of simple predicates
F - set of minterm fragments

COM_MIN(R, Pr) {  // compute minterms
  find a pi in Pr such that pi partitions R according to Rule 1
  Pr' = {pi}
  Pr = Pr - {pi}
  F = {fi}                          // fi is the minterm fragment according to pi
  while (Pr' is incomplete) {
    find a pj in Pr that partitions some fragment fk of Pr'
    Pr' = Pr' ∪ {pj}
    Pr = Pr - {pj}
    F = F ∪ {fj}                    // fj is the minterm fragment according to pj
    if (there is a pk in Pr' which is non-relevant) {   // this is complex
      Pr' = Pr' - {pk}
      F = F - {fk}
    }
  }
  return Pr'
}
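To make the minterm idea concrete, here is a small illustrative sketch (the predicates are invented, not from the paper) that enumerates the 2^n minterms of n simple predicates; every tuple satisfies exactly one of them:

# Minterm sketch: given simple predicates p1..pn, every minterm is a
# conjunction that takes each predicate either positively or negated.
from itertools import product

def minterms(predicates):
    # Each element of the result is a function testing one minterm.
    result = []
    for signs in product([True, False], repeat=len(predicates)):
        def m(t, signs=signs):
            return all(p(t) == s for p, s in zip(predicates, signs))
        result.append(m)
    return result

p1 = lambda t: t["sal"] <= 30000        # simple predicate on SAL
p2 = lambda t: t["loc"] == "Mumbai"     # simple predicate on LOC
ms = minterms([p1, p2])                 # 2^2 = 4 minterm predicates
print(sum(m({"sal": 20000, "loc": "Delhi"}) for m in ms))  # exactly 1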
Q. 3 (A) Explain the following in the context of Relational algebra: 1. Selection 2. Natural Join 3. Projection [07]

A. 3 (A)
1. Selection
• Produces a horizontal subset of the operand relation.
• General form: σF(R) = { t | t ∈ R and F(t) is true }, where R is a relation, t is a tuple variable, and F is a formula consisting of operands that are constants or attributes, the arithmetic comparison operators <, >, =, ≠, ≤, ≥, and the logical operators ∧, ∨, ¬.
• (Figure: selection example.)

2. Natural Join
• The equi-join of two relations R and S over an attribute (or attributes) common to both R and S, projecting out one copy of those attributes.
• R ⋈ S = Π(R∪S)(σF(R × S))
• (Figure: natural join example.)

3. Projection
• Produces a vertical slice of a relation.
• General form: Π(A1,…,An)(R) = { t[A1,…,An] | t ∈ R }, where R is a relation, t is a tuple variable, and {A1,…,An} is a subset of the attributes of R over which the projection will be performed.
• Note: projection can generate duplicate tuples. Commercial systems (and SQL) allow this and provide both projection with duplicate elimination and projection without duplicate elimination.
• (Figure: projection example.)
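These three operators are easy to prototype over relations represented as lists of dictionaries; the sketch below (toy data, no optimization) is for intuition only, not how a DBMS implements them:

# Toy relational algebra over relations as lists of dicts.
def selection(relation, predicate):
    # sigma_F(R): horizontal subset, keep tuples where F(t) is true.
    return [t for t in relation if predicate(t)]

def projection(relation, attrs):
    # pi_{A1..An}(R): vertical slice, with duplicate elimination.
    seen, out = set(), []
    for t in relation:
        row = tuple(t[a] for a in attrs)
        if row not in seen:
            seen.add(row)
            out.append(dict(zip(attrs, row)))
    return out

def natural_join(r, s):
    # R join S: equi-join on the common attributes, one copy kept.
    # Assumes non-empty relations for simplicity.
    common = set(r[0]) & set(s[0])
    return [{**tr, **ts} for tr in r for ts in s
            if all(tr[a] == ts[a] for a in common)]

EMP = [{"eno": "E1", "ename": "A", "title": "Engineer"},
       {"eno": "E2", "ename": "B", "title": "Analyst"}]
ASG = [{"eno": "E1", "pno": "P1"}, {"eno": "E2", "pno": "P2"}]
print(selection(EMP, lambda t: t["title"] == "Analyst"))
print(projection(EMP, ["title"]))
print(natural_join(EMP, ASG))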
Q (B) Explain the Client-Server architecture for Distributed DBMS. [07]

A (B)
• This provides a two-level architecture which makes it easier to manage the complexity of modern DBMSs and the complexity of distribution.
• The server does most of the data management work (query processing and optimization, transaction management, storage management).
• The client is the application and the user interface (it manages the data that is cached at the client and manages the transaction locks).
• This architecture is quite common in relational systems, where the communication between the clients and the server(s) is at the level of SQL statements.
• Multiple client - single server: from a data management perspective, this is not much different from centralized databases, since the database is stored on only one machine (the server), which also hosts the software to manage it. However, there are some differences from centralized systems in the way transactions are executed and caches are managed.
• Multiple client - multiple server: in this case, two alternative management strategies are possible: either each client manages its own connection to the appropriate server, or each client knows of only its "home server", which then communicates with other servers as required.

OR

Q. 3 (A) Explain the ACID properties in the context of DDBMS. [07]

A. 3 (A)
• The consistency and reliability aspects of transactions are due to four properties:
1. Atomicity
2. Consistency
3. Isolation
4. Durability
• Together, these are commonly referred to as the ACID properties of transactions.

1. Atomicity
• Atomicity refers to the fact that a transaction is treated as a unit of operation. Therefore, either all the transaction's actions are completed, or none of them are. This is also known as the "all-or-nothing property."

2. Consistency
• The consistency of a transaction is simply its correctness. In other words, a transaction is a correct program that maps one consistent database state to another.

3. Isolation
• Isolation is the property of transactions that requires each transaction to see a consistent database at all times. In other words, an executing transaction cannot reveal its results to other concurrent transactions before its commitment.

4. Durability
• Durability refers to that property of transactions which ensures that once a transaction commits, its results are permanent and cannot be erased from the database.
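Atomicity is easiest to see in a local transaction. A minimal demonstration with Python's built-in sqlite3 module (a centralized stand-in for illustration, not a distributed example):

# Atomicity demo: either both updates of the transfer are applied,
# or neither is.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE acct (name TEXT PRIMARY KEY, bal INT)")
con.executemany("INSERT INTO acct VALUES (?, ?)", [("A", 100), ("B", 0)])
con.commit()

try:
    con.execute("UPDATE acct SET bal = bal - 50 WHERE name = 'A'")
    raise RuntimeError("simulated crash between the two steps")
    con.execute("UPDATE acct SET bal = bal + 50 WHERE name = 'B'")
    con.commit()
except RuntimeError:
    con.rollback()   # all-or-nothing: the first update is undone

print(list(con.execute("SELECT * FROM acct")))  # [('A', 100), ('B', 0)]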
Q (B) Explain the AA matrix, CA matrix, and BEA algorithm in the context of vertical fragmentation. [07]

A (B)
Determining Affinity
• The attribute usage matrix alone does not let us determine the affinity of the attributes, because we don't know the access frequency of the attribute groups.
• We need this to calculate the attribute affinity aff(Ai, Aj), which defines how often Ai and Aj are accessed together.
• It depends on the frequency of query requests for attributes Ai and Aj simultaneously.
Affinity Measure
• The attribute affinity between two attributes Ai and Aj of a relation is

  aff(Ai, Aj) = Σ (over all queries qk that access both Ai and Aj) Σ (over all sites Sm) refm(qk) · accm(qk)

  where refm(qk) is the number of accesses to attributes Ai and Aj for each execution of application qk at site Sm, and accm(qk) is the access frequency of qk at site Sm.

(Figure: attribute affinity matrix, example.)

Clustering Algorithm
• We want to find which attributes belong together in a vertically fragmented table.
• For a small case, examining the matrix by hand is sufficient.

Bond Energy Algorithm (BEA)
• Initialize: pick any column.
• Iterate: select the next column and try to place it in the matrix; choose the place that maximizes the global affinity; repeat.
• Repeat the process for rows. However, since AA is symmetric, just reorder the rows; we will reorder the columns later to create a symmetric matrix.
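Before the pseudocode, a small sketch of how the AA matrix can be computed from an attribute usage matrix and per-site access frequencies; the matrix values below are invented for illustration, and ref is taken as 1 for simplicity:

# Compute the attribute affinity matrix AA from the usage matrix
# use[q][a] (1 if query q uses attribute a) and per-site access
# frequencies acc[q][site].
use = {                      # queries x attributes A1..A4
    "q1": [1, 0, 1, 0],
    "q2": [0, 1, 1, 0],
    "q3": [0, 1, 0, 1],
}
acc = {                      # access frequency of each query per site
    "q1": [15, 20, 10],
    "q2": [5, 0, 0],
    "q3": [25, 25, 25],
}

n = 4
AA = [[0] * n for _ in range(n)]
for q, used in use.items():
    freq = sum(acc[q])       # sum over all sites (ref assumed 1)
    for i in range(n):
        for j in range(n):
            if used[i] and used[j]:
                AA[i][j] += freq

for row in AA:
    print(row)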
BEA Pseudocode

BondEnergyAlgorithm {
  input:  AA   // attribute affinity matrix (n × n)
  output: CA   // clustered attribute matrix

  // put the first two columns in CA
  CA(*, 1) <- AA(*, 1)
  CA(*, 2) <- AA(*, 2)
  index <- 3
  // for each of the remaining columns of AA, choose the best placement
  while (index <= n) {
    // calculate the contribution of each possible place for the new column
    for (i = 1; i < index; i++)                        // iterate over the placed columns
      calculate cont(A(i-1), A(index), A(i))
    calculate cont(A(index-1), A(index), A(index+1))   // boundary placement
    loc <- placement given by maximum cont()
    for (j = index; j > loc; j--)                      // shift columns to make room
      CA(*, j) <- CA(*, j-1)
    CA(*, loc) <- AA(*, index)
    index <- index + 1
  }
  // reorder the rows according to the placement of the columns
}
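The cont() used in the pseudocode is defined in terms of the bond between columns. A compact sketch following the textbook-style definitions (assuming the AA matrix from the previous step; boundary columns contribute a bond of 0):

# bond(Ax, Ay) = sum over z of AA[z][x] * AA[z][y]
# cont(Ai, Ak, Aj) = 2*bond(Ai,Ak) + 2*bond(Ak,Aj) - 2*bond(Ai,Aj)
def bond(AA, x, y):
    n = len(AA)
    if not (0 <= x < n and 0 <= y < n):
        return 0                      # imaginary boundary column
    return sum(AA[z][x] * AA[z][y] for z in range(n))

def cont(AA, i, k, j):
    # Net affinity gained by placing column k between columns i and j.
    return 2 * bond(AA, i, k) + 2 * bond(AA, k, j) - 2 * bond(AA, i, j)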
There are a number of reasons why such synchronization among sites is necessary. First, depending on the type of concurrency control algorithm that is used, some schedulers may not be ready to terminate a transaction. For example, if a transaction has read a value of a data item that is updated by another transaction that has not yet committed, the associated scheduler may not want to commit the former. Of course, strict concurrency control algorithms that avoid cascading aborts would not permit the updated value of a data item to be read by any other transaction until the updating transaction terminates. This is sometimes called the recoverability condition.
A brief description of the 2PC protocol that does not consider failures is as follows. Initially, the coordinator writes a begin_commit record in its log, sends a "prepare" message to all participant sites, and enters the WAIT state. When a participant receives a "prepare" message, it checks if it could commit the transaction. If so, the participant writes a ready record in the log, sends a "vote-commit" message to the coordinator, and enters the READY state; otherwise, the participant writes an abort record and sends a "vote-abort" message to the coordinator.
If the decision of the site is to abort, it can forget about that transaction, since an abort decision serves as a veto (i.e., unilateral abort). After the coordinator has received a reply from every participant, it decides whether to commit or to abort the transaction. If even one participant has registered a negative vote, the coordinator has to abort the transaction globally. So it writes an abort record, sends a "global-abort" message to all participant sites, and enters the ABORT state; otherwise, it writes a commit record, sends a "global-commit" message to all participants, and enters the COMMIT state. The participants either commit or abort the transaction according to the coordinator's instructions and send back an acknowledgment, at which point the coordinator terminates the transaction by writing an end_of_transaction record in the log.
State Transitions in 2PC Protocol
[Figure omitted: state-transition diagrams for the coordinator (INITIAL -> WAIT -> COMMIT or ABORT) and for a participant (INITIAL -> READY -> COMMIT or ABORT).]
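The exchange described above can be made concrete with a small, failure-free simulation. This is only a sketch: it runs in a single process with in-memory "messages", and the class and record names are illustrative rather than taken from any particular system:

    class Participant:
        def __init__(self, name, can_commit=True):
            self.name, self.can_commit = name, can_commit
            self.log, self.state = [], "INITIAL"

        def on_prepare(self):
            if self.can_commit:
                self.log.append("ready")       # write ready record, then vote
                self.state = "READY"
                return "vote-commit"
            self.log.append("abort")           # unilateral abort acts as a veto
            self.state = "ABORT"
            return "vote-abort"

        def on_decision(self, decision):
            self.state = "COMMIT" if decision == "global-commit" else "ABORT"
            self.log.append(self.state.lower())
            return "ack"

    def run_2pc(participants):
        coord_log = ["begin_commit"]                     # coordinator enters WAIT
        votes = [p.on_prepare() for p in participants]   # "prepare" sent to all
        if all(v == "vote-commit" for v in votes):
            decision, record = "global-commit", "commit"
        else:
            decision, record = "global-abort", "abort"   # one negative vote suffices
        coord_log.append(record)
        acks = [p.on_decision(decision) for p in participants]
        if len(acks) == len(participants):               # all acknowledgments received
            coord_log.append("end_of_transaction")
        return decision, coord_log

    decision, log = run_2pc([Participant("site1"), Participant("site2", can_commit=False)])
    print(decision, log)   # global-abort ['begin_commit', 'abort', 'end_of_transaction']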
Q (B) Explain Top-down and Bottom-up design strategies.
A (B) Top-Down Design Process
• Conceptual design of the data is the ER model of the whole enterprise, federated by the views
• Must anticipate new views/usages
• Must describe the semantics of the data as used in the domain/enterprise
• This is almost identical to typical DB design
• However, we are concerned with Distribution Design:
  – We need to place tables "geographically" on the network
  – We also need to fragment tables

Bottom-Up
• Top-down design is the choice when you have the liberty of starting from scratch. Unfortunately, this is not usually the case; some element of bottom-up design is more common
• Bottom-up design is integrating independent/semi-independent schemas into a Global Conceptual Schema (GCS)
• Must deal with schema mapping issues
• May deal with heterogeneous integration issues

OR
RAID 1
RAID 1 uses mirroring techniques. When data is sent to a RAID controller, it sends a copy of the data to all the disks in the array. RAID level 1 is also called mirroring and provides 100% redundancy in case of a failure.

RAID 2
• RAID Level 2 uses the concept of the parallel access technique. It works on the word (byte) level, so each strip stores one bit. It takes data striping to the extreme, writing only 1 bit per strip instead of arbitrary-size blocks. For this reason, it requires a minimum of 8 surfaces to write data to the hard disk.
• In RAID level 2, strips are very small, so when a block is read, all disks are accessed in parallel.
• Hamming code generation is time consuming; therefore RAID level 2 is too slow for most commercial applications.
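The mirroring idea of RAID 1 can be illustrated with a toy sketch. Real controllers work at the block-device level; the in-memory "disks" here are purely hypothetical:

    # Two mirrored "disks", each mapping a block number to its data.
    disks = [dict(), dict()]

    def raid1_write(block, data):
        for d in disks:                 # controller sends a copy to every disk in the array
            d[block] = data

    def raid1_read(block):
        for d in disks:                 # 100% redundancy: any surviving disk can serve the read
            if block in d:
                return d[block]
        raise IOError("block lost on all mirrors")

    raid1_write(0, b"payload")
    disks[0].clear()                    # simulate failure of one disk
    print(raid1_read(0))                # still readable from the mirror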
(1) The Operation-based Approach
• Principle:
  – Record all changes made to the state of a process (an 'audit trail' or 'log') such that the process can be returned to a previous state
  – Example: a transaction-based environment where transactions update a database
    - It is possible to commit or undo updates on a per-transaction basis
    - A commit indicates that the transaction on the object was successful and its changes are permanent

(2) State-based Approach
• Principle: establish frequent 'recovery points' or 'checkpoints' saving the entire state of the process
• Actions:
  – 'Checkpointing' or 'taking a checkpoint': saving the process state
  – 'Rolling back' a process: restoring a process to a prior state

Log-Based Recovery
• When failures occur, the following operations that use the log are executed (see the sketch below):
• UNDO: restore the database to the state prior to execution.
• REDO: perform the changes to the database over again.
[Figure omitted: UNDO and REDO applied to the log.]
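A minimal sketch of UNDO and REDO, assuming a log of (transaction, item, before-image, after-image) records. Note that write() appends the log record before touching the database, which is exactly the write-ahead rule described in the next answer:

    log = []                 # stable log: (txn, item, before_image, after_image)
    db = {"x": 1, "y": 2}    # hypothetical database state

    def write(txn, item, value):
        log.append((txn, item, db[item], value))   # log record written BEFORE the update
        db[item] = value

    def undo(txn):
        # restore the state prior to txn by applying before-images in reverse order
        for t, item, before, _ in reversed(log):
            if t == txn:
                db[item] = before

    def redo(txn):
        # re-apply txn's changes using after-images, in log order
        for t, item, _, after in log:
            if t == txn:
                db[item] = after

    write("T1", "x", 10)
    undo("T1"); print(db)    # {'x': 1, 'y': 2}
    redo("T1"); print(db)    # {'x': 10, 'y': 2}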
Write-Ahead Logging Protocol
• Write-ahead logging (WAL) is a family of techniques for providing atomicity and durability in database systems.
• In a system using WAL, all modifications are written to a log before they are applied. Usually both redo and undo information is stored in the log.
• The purpose of this can be illustrated by an example. Imagine a program that is in the middle of performing some operation when the machine it is running on loses power. Upon restart, that program might well need to know whether the operation it was performing succeeded, half-succeeded, or failed. If a write-ahead log is used, the program can check this log and compare what it was supposed to be doing when it unexpectedly lost power to what was actually done. On the basis of this comparison, the program could decide to undo what it had started, complete what it had started, or keep things as they are.

Q (B) What is Semantic Integrity control? Explain in concept of Centralized and Distributed environment.
A (B) Semantic Integrity Control
• Semantic integrity control ensures database consistency by rejecting update transactions that lead to inconsistent database states, or by activating specific actions on the database state, which compensate for the effects of the update transactions.
• Two main types of integrity constraints can be distinguished: structural constraints and behavioral constraints.
• Structural constraints express basic semantic properties inherent to a model. Examples of such constraints are unique key constraints in the relational model, or one-to-many associations between objects in the object-oriented model.
• Behavioral constraints are essential in the database design process. They can express associations between objects, such as inclusion dependency in the relational model, or describe object properties and structures.
Centralized Semantic Integrity Control
• Specification of Integrity Constraints
• Triggers (event-condition-action rules) can be used to automatically propagate updates, and thus to maintain semantic integrity.
• We can distinguish between three types of integrity constraints: predefined, precondition, or general constraints.
• Consider the schema:
  EMP(ENO, ENAME, TITLE)
  PROJ(PNO, PNAME, BUDGET)
  ASG(ENO, PNO, RESP, DUR)
• Predefined constraints are based on simple keywords. Through them, it is possible to express concisely the more common constraints of the relational model, such as non-null attribute, unique key, foreign key, or functional dependency.
• Employee number in relation EMP cannot be null:
  ENO NOT NULL IN EMP
• The project number PNO in relation ASG is a foreign key matching the primary key PNO of relation PROJ:
  PNO IN ASG REFERENCES PNO IN PROJ
• Precondition constraints express conditions that must be satisfied by all tuples in a relation for a given update type. The update type, which might be INSERT, DELETE, or MODIFY, permits restricting the integrity control.
• Precondition constraints can be expressed with the SQL CHECK statement enriched with the ability to specify the update type:
  CHECK ON <relation name> WHEN <update type> (<qualification over relation name>)
• The budget of a project is between 500K and 1000K:
  CHECK ON PROJ (BUDGET >= 500000 AND BUDGET <= 1000000)
• Only the tuples whose budget is 0 may be deleted:
  CHECK ON PROJ WHEN DELETE (BUDGET = 0)
• General constraints are formulas of tuple relational calculus where all variables are quantified. The database system must ensure that those formulas are always true:
  CHECK ON list of <variable name>:<relation name>, (<qualification>)
• The total duration for all employees in the CAD project is less than 100:
  CHECK ON g:ASG, j:PROJ (SUM(g.DUR WHERE g.PNO=j.PNO) < 100 IF j.PNAME="CAD/CAM")

Distributed Semantic Integrity Control
• Definition of Distributed Integrity Constraints
• Since assertions can involve data stored at different sites, the storage of the constraints must be decided so as to minimize the cost of integrity checking. There is a strategy based on a taxonomy of integrity constraints that distinguishes three classes:
• Individual constraints: single-relation single-variable constraints. They refer only to tuples to be updated independently of the rest of the database.
• Set-oriented constraints: include single-relation multivariable constraints such as
functional dependency, and multirelation multivariable constraints such as foreign key constraints.
• Constraints involving aggregates: require special processing because of the cost of evaluating the aggregates.

Individual constraints
• Consider relation EMP, horizontally fragmented across three sites using the predicates
  p1: 0 <= ENO < "E3"
  p2: "E3" <= ENO <= "E6"
  p3: ENO > "E6"
  and the domain constraint C: ENO < "E4".
• Constraint C is compatible with p1 (if C is true, p1 is true) and p2 (if C is true, p2 is not necessarily false), but not with p3 (if C is true, then p3 is false). Therefore, constraint C should be globally rejected because the tuples at site 3 cannot satisfy C, and thus relation EMP does not satisfy C.

Set-oriented constraints
• Set-oriented constraints are multivariable; that is, they involve join predicates.
• Three cases, given in increasing cost of checking, can occur:
  1. The fragmentation of R is derived from that of S based on a semijoin on the attribute used in the assertion join predicate.
  2. S is fragmented on the join attribute.
  3. S is not fragmented on the join attribute.
• In the first case, compatibility checking is cheap since the tuple of S matching a tuple of R is at the same site.
• In the second case, each tuple of R must be compared with at most one fragment of S, because the join attribute value of the tuple of R can be used to find the site of the corresponding fragment of S.
• In the third case, each tuple of R must be compared with all fragments of S. If compatibility is found for all tuples of R, the constraint can be stored at each site.

Constraints involving aggregates
• These constraints are among the most costly to test because they require the calculation of the aggregate functions.
• The aggregate functions generally manipulated are MIN, MAX, SUM, and COUNT.
• Each aggregate function contains a projection part and a selection part.

Q.5 (A) Explain Query Processing in Distributed Systems.
A.5 (A) Query Processing in Distributed Systems
• In a distributed DBMS the catalog has to store additional information, including the location of relations and their replicas. The catalog must also include system-wide information such as the number of sites in the system along with their identifiers.
Mapping Global Query to Local
• The tables required in a global query have fragments distributed across multiple sites. The local databases have information only about local data. The controlling site uses the global data dictionary to gather information about the distribution and reconstructs the global view from the fragments.
• If there is no replication, the global optimizer runs local queries at the sites where the fragments are stored. If there is replication, the global optimizer selects the site based upon communication cost, workload, and server speed.
• The global optimizer generates a distributed execution plan so that the least amount of data transfer occurs across the sites. The plan states the location of the fragments, the order in which query steps need to be executed, and the processes involved in transferring intermediate results.
• The local queries are optimized by the local database servers. Finally, the local query results are merged together through a union operation in case of horizontal fragments and a join operation for vertical fragments.

Example
• For example, let us consider that the following PROJECT schema is horizontally fragmented according to City, the cities being New Delhi, Kolkata and Hyderabad.
• PROJECT [schema table omitted; the relation includes at least City and Status attributes]
• Suppose there is a query to retrieve details of all projects whose status is "Ongoing".
• The global query will be
  σstatus="ongoing"(PROJECT)
• Query in New Delhi's server will be
  σstatus="ongoing"(NewD−PROJECT)
• Query in Kolkata's server will be
  σstatus="ongoing"(Kol−PROJECT)
• Query in Hyderabad's server will be
  σstatus="ongoing"(Hyd−PROJECT)
• In order to get the overall result, we need to union the results of the three queries as follows (see the sketch below):
  σstatus="ongoing"(NewD−PROJECT) ∪ σstatus="ongoing"(Kol−PROJECT) ∪ σstatus="ongoing"(Hyd−PROJECT)
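A toy sketch of this example, with hypothetical rows: each site evaluates the same selection on its horizontal fragment, and the controlling site unions the partial results:

    newd_project = [{"pno": "P1", "city": "New Delhi", "status": "ongoing"}]
    kol_project  = [{"pno": "P2", "city": "Kolkata",   "status": "finished"}]
    hyd_project  = [{"pno": "P3", "city": "Hyderabad", "status": "ongoing"}]

    def local_query(fragment):
        # sigma status="ongoing", evaluated locally at each site
        return [row for row in fragment if row["status"] == "ongoing"]

    # the union of the three local results reconstructs the global answer
    result = local_query(newd_project) + local_query(kol_project) + local_query(hyd_project)
    print(result)    # projects P1 and P3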
Q (B) Explain Deadlock in Distributed Systems.
A (B) Deadlock
• A deadlock can occur because transactions wait for one another. Informally, a deadlock situation is a set of requests that can never be granted by the concurrency control mechanism.
• A deadlock can be indicated by a cycle in the wait-for graph (WFG).
• In computer science, deadlock refers to a specific condition when two or more processes are each waiting for another to release a resource, or more than two processes are waiting for resources in a circular chain.

Deadlock Detection
• Deadlock detection is the process of actually determining that a deadlock exists and identifying the processes and resources involved in the deadlock (a code sketch is given below).
• Detection of a cycle in the WFG proceeds concurrently with normal operation in the main method of deadlock detection.

Deadlock Prevention
• The deadlock prevention approach does not allow any transaction to acquire locks that will lead to deadlocks. The convention is that when more than one transaction requests a lock on the same data item, only one of them is granted the lock.
• One of the most popular deadlock prevention methods is pre-acquisition of all the locks.

Deadlock Avoidance
• The deadlock avoidance approach handles deadlocks before they occur. It analyzes the transactions and the locks to determine whether or not waiting leads to a deadlock.
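The cycle test used by deadlock detection can be sketched as a depth-first search over the WFG. This is a minimal, single-site version; a distributed detector would additionally have to merge the per-site local WFGs:

    # wfg maps each transaction to the set of transactions it waits for
    def has_deadlock(wfg):
        nodes = set(wfg) | {t for waits in wfg.values() for t in waits}
        WHITE, GREY, BLACK = 0, 1, 2
        color = dict.fromkeys(nodes, WHITE)

        def visit(t):
            color[t] = GREY                      # t is on the current DFS path
            for u in wfg.get(t, ()):
                if color[u] == GREY:             # back edge: a cycle, hence a deadlock
                    return True
                if color[u] == WHITE and visit(u):
                    return True
            color[t] = BLACK
            return False

        return any(color[t] == WHITE and visit(t) for t in nodes)

    # T1 waits for T2 and T2 waits for T1: a circular chain, so a deadlock
    print(has_deadlock({"T1": {"T2"}, "T2": {"T1"}}))    # True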
• There are two deadlock avoidance algorithms for this purpose, namely wait-die and wound-wait.
• Let us assume that there are two transactions, T1 and T2, where T1 tries to lock a data item which is already locked by T2. The algorithms are as follows (a sketch is given below) −
• Wait-Die − If T1 is older than T2, T1 is allowed to wait. Otherwise, if T1 is younger than T2, T1 is aborted and later restarted.
• Wound-Wait − If T1 is older than T2, T2 is aborted and later restarted. Otherwise, if T1 is younger than T2, T1 is allowed to wait.
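Both rules compare transaction timestamps, where "older" means a smaller timestamp. A minimal sketch (the return strings are illustrative labels):

    # requester asks for a lock currently held by holder
    def wait_die(requester_ts, holder_ts):
        # older requester waits; younger requester dies (is aborted and restarted)
        return "wait" if requester_ts < holder_ts else "abort requester"

    def wound_wait(requester_ts, holder_ts):
        # older requester wounds (aborts) the younger holder; younger requester waits
        return "abort holder" if requester_ts < holder_ts else "wait"

    print(wait_die(1, 2), wait_die(2, 1))      # wait, abort requester
    print(wound_wait(1, 2), wound_wait(2, 1))  # abort holder, wait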
OR

Q.5 (A) What is authorization control? How do we apply authorization control in a distributed environment?
A.5 (A) Authorization Control
• Authorization control consists of checking whether a given triple (subject, operation, object) can be allowed to proceed.
• The introduction of a subject in the system is typically done by a pair (user name, password).
• The objects to protect are subsets of the database. Relational systems provide finer and more general protection granularity than do earlier systems.
• A right expresses a relationship between a subject and an object for a particular set of operations:
  GRANT <operation type(s)> ON <object> TO <subject(s)>
  REVOKE <operation type(s)> FROM <object> TO <subject(s)>

Multilevel Access Control
• Discretionary access control has some limitations. One problem is that a malicious user can access unauthorized data through an authorized user.
• For instance, consider user A who has authorized access to relations R and S and user B who has authorized access to relation S only. If B somehow manages to modify an application program used by A so it writes R data into S, then B can read unauthorized data without violating authorization rules.
• Multilevel access control answers this problem and further improves security by defining different security levels for both subjects and data objects.
• A process has a security level, also called clearance, derived from that of the user.
• In its simplest form, the security levels are Top Secret (TS), Secret (S), Confidential (C) and Unclassified (U), ordered as TS > S > C > U, where ">" means "more secure".
• Access in read and write modes by subjects is restricted by two simple rules (sketched below):
Rule 1 (called "no read up")
• protects data from unauthorized disclosure, i.e., a subject at a given security level can only read objects at the same or lower security levels.
Rule 2 (called "no write down")
• protects data from unauthorized change, i.e., a subject at a given security level can only write objects at the same or higher security levels.
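The two rules reduce to simple comparisons on the TS > S > C > U ordering. A minimal sketch:

    LEVEL = {"U": 0, "C": 1, "S": 2, "TS": 3}   # Unclassified < Confidential < Secret < Top Secret

    def can_read(subject_level, object_level):
        # Rule 1, "no read up": read only objects at the same or lower level
        return LEVEL[subject_level] >= LEVEL[object_level]

    def can_write(subject_level, object_level):
        # Rule 2, "no write down": write only objects at the same or higher level
        return LEVEL[subject_level] <= LEVEL[object_level]

    assert can_read("S", "C") and not can_read("C", "S")
    assert can_write("C", "S") and not can_write("S", "C")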
Distributed Access Control
• The additional problems of access control in a distributed environment stem from the fact that objects and subjects are distributed and that messages with sensitive data can be read by unauthorized users.
• These problems are: remote user authentication, management of discretionary access rules, handling of views and of user groups, and enforcing multilevel access control.
• Remote user authentication is necessary since any site of a distributed DBMS may accept programs initiated, and authorized, at remote sites.
• Three solutions are possible for managing authentication:
  1. Authentication information is maintained at a central site for global users, who can then be authenticated only once and then accessed from multiple sites.
  2. The information for authenticating users (user name and password) is replicated at all sites in the catalog.
  3. Intersite communication is protected by the use of a site password. Once the initiating site has been authenticated, there is no need for authenticating their remote users.

Q (B) Write a short note on Query Optimization.
A (B) Query optimization refers to the process of producing a query execution plan (QEP) which represents an execution strategy for the query. This QEP minimizes an objective cost function. A query optimizer, the software module that performs query optimization, is usually seen as consisting of three components: a search space, a cost model, and a search strategy. The search space is the set of alternative execution plans that represent the input query. These plans are equivalent, in the sense that they yield the same result, but they differ in the execution order of operations and the way these operations are implemented, and therefore in their performance. The cost model predicts the cost of a given execution plan.
To be accurate, the cost model must have good knowledge about the distributed execution environment. The search strategy explores the search space and selects the best plan, using the cost model. It defines which plans are examined and in which order. The details of the environment (centralized versus distributed) are captured by the search space and the cost model.
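To tie the three components together, here is a toy dynamic-programming join-order enumerator (the details of the search space and strategy are given below). The search space is the set of join trees over subsets of relations, the search strategy is breadth-first dynamic programming that keeps only the best plan per subset, and the cost model is a deliberately crude size estimate; the cardinalities and the 0.01 selectivity are hypothetical:

    from itertools import combinations

    CARD = {"R": 1000, "S": 200, "T": 50}    # hypothetical base-relation cardinalities
    SEL = 0.01                               # hypothetical, uniform join selectivity

    def best_plan(relations):
        # plans maps a frozenset of relations to (cost, result_size, plan_tree)
        plans = {frozenset([r]): (0, CARD[r], r) for r in relations}
        for k in range(2, len(relations) + 1):
            for subset in map(frozenset, combinations(relations, k)):
                best = None
                for m in range(1, k):                     # every split into left/right
                    for left in map(frozenset, combinations(sorted(subset), m)):
                        right = subset - left
                        lcost, lsize, ltree = plans[left]
                        rcost, rsize, rtree = plans[right]
                        size = max(1, int(lsize * rsize * SEL))
                        cost = lcost + rcost + size       # toy cost: total intermediate size
                        if best is None or cost < best[0]:
                            best = (cost, size, (ltree, rtree))
                plans[subset] = best                      # prune: keep the best plan per subset
        return plans[frozenset(relations)]

    print(best_plan(["R", "S", "T"]))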
1 Search Space
Query execution plans are typically abstracted by means of operator trees, which define the order in which the operations are executed. They are enriched with additional information, such as the best algorithm chosen for each operation. For a given query, the search space can thus be defined as the set of equivalent operator trees that can be produced using transformation rules. To characterize query optimizers, it is useful to concentrate on join trees, which are operator trees whose operators are join or Cartesian product. This is because permutations of the join order have the most important effect on the performance of relational queries.

2 Search Strategy
The most popular search strategy used by query optimizers is dynamic programming, which is deterministic. Deterministic strategies proceed by building plans, starting from base relations, joining one more relation at each step until complete plans are obtained, as in the figure. Dynamic programming builds all possible plans, breadth-first, before it chooses the "best" plan. To reduce the optimization cost, partial plans that are not likely to lead to the optimal plan are pruned (i.e., discarded) as soon as possible. By contrast, another deterministic strategy, the greedy algorithm, builds only one plan, depth-first.

3 Distributed Cost Model
An optimizer's cost model includes cost functions to predict the cost of operators, statistics and base data, and formulas to evaluate the sizes of intermediate results. The cost is in terms of execution time, so a cost function represents the execution time of a query.

*****Best of Luck*****